Git Repo - linux.git/log

mm: memcg: deprecate the non-hierarchical mode

Patch series "mm: memcg: deprecate cgroup v1 non-hierarchical mode", v1.

The non-hierarchical cgroup v1 mode is a legacy of early days
of the memory controller and doesn't bring any value today.
However, it complicates the code and creates many edge cases
all over the memory controller code.

It's a good time to deprecate it completely. This patchset removes
the internal logic, adjusts the user interface and updates
the documentation. The last patch removes some bits of the cgroup
core code, which become obsolete.

Michal Hocko said:
  "All that we know today is that we have a warning in place to complain
   loudly when somebody relies on use_hierarchy=0 with a deeper
   hierarchy. For all those years we have seen _zero_ reports that would
   describe a sensible usecase.

   Moreover we (SUSE) have backported this warning into old distribution
   kernels (since 3.0 based kernels) to extend the coverage and didn't
   hear even for users who adopt new kernels only very slowly. The only
   report we have seen so far was a LTP test suite which doesn't really
   reflect any real life usecase"

This patch (of 3):

The non-hierarchical cgroup v1 mode is a legacy of early days of the
memory controller and doesn't bring any value today.  However, it
complicates the code and creates many edge cases all over the memory
controller code.

It's a good time to deprecate it completely.

Functionally this patch enables hierarchical mode by default for all cgroups and forbids
switching it off.  Nothing changes if cgroup v2 is used: hierarchical mode
was enforced from scratch.

To protect the ABI, the memory.use_hierarchy interface is preserved with
limited functionality: reading always returns "1", writing of "1" passes
silently, writing of any other value fails with -EINVAL and a warning to
dmesg (on the first occasion).
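
The preserved interface semantics can be sketched in user space as below.
This is only a model of the behaviour described above, not the kernel
handler: the function names, the warned_once flag and the message text are
made up for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Set once the deprecation warning has been printed (hypothetical name). */
static bool warned_once;

/* Reading always reports hierarchical mode. */
static int use_hierarchy_read(void)
{
	return 1;
}

/* Writing 1 passes silently; any other value fails with -EINVAL. */
static int use_hierarchy_write(unsigned long val)
{
	if (val == 1)
		return 0;

	/* Warn only on the first rejected write. */
	if (!warned_once) {
		warned_once = true;
		fprintf(stderr, "non-hierarchical mode is deprecated\n");
	}
	return -EINVAL;
}
```

With this model, writing "1" never warns, and repeated writes of "0" fail
every time but warn only once.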

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcg: fix obsolete code comments

This patch fixes/removes some obsolete comments in the code related
to the kernel memory accounting:

- kmem_cache->memcg_params.memcg_caches has been removed by commit
   9855609bde03 ("mm: memcg/slab: use a single set of kmem_caches for
   all accounted allocations")

- memcg->kmemcg_id is not used as a gate for kmem accounting since
   commit 0b8f73e10428 ("mm: memcontrol: clean up alloc, online,
   offline, free functions")

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memcg: update page struct member in comments

The page->mem_cgroup member has been replaced by memcg_data, and a helper
page_memcg() was added for it. Update the comments to avoid confusion.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alex Shi <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/rmap: always do TTU_IGNORE_ACCESS

Since commit 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic
v2"), the code to check the secondary MMU's page table access bit is
broken for !(TTU_IGNORE_ACCESS) because the page is unmapped from the
secondary MMU's page table before the check.  More specifically for those
secondary MMUs which unmap the memory in
mmu_notifier_invalidate_range_start() like kvm.

However memory reclaim is the only user of !(TTU_IGNORE_ACCESS) or the
absence of TTU_IGNORE_ACCESS and it explicitly performs the page table
access check before trying to unmap the page.  So, at worst the reclaim
will miss accesses in a very short window if we remove page table access
check in unmapping code.

There is an unintended consequence of !(TTU_IGNORE_ACCESS) for memcg
reclaim.  In memcg reclaim, page_referenced() only accounts the
accesses from the processes which are in the same memcg as the target page,
but the unmapping code considers accesses from all the processes, thus
decreasing the effectiveness of memcg reclaim.

The simplest solution is to always assume TTU_IGNORE_ACCESS in unmapping
code.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic v2")
Signed-off-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcg/slab: fix use after free in obj_cgroup_charge

The rcu_read_lock/unlock can only guarantee that the memcg will not be
freed, but it cannot guarantee the success of css_get on the memcg.

If the whole process of a cgroup offlining is completed between reading an
objcg->memcg pointer and bumping the css reference on another CPU, and
there are exactly 0 external references to this memory cgroup (how we get
to the obj_cgroup_charge() then?), css_get() can change the ref counter
from 0 back to 1.
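
The difference between an unconditional "get" and a conditional "tryget"
can be sketched with a plain atomic counter.  This is a simplified
user-space model under the assumption of a single atomic reference count;
the real css reference counting is more involved, and struct ref, ref_get()
and ref_tryget() are illustrative names only.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct ref {
	atomic_long count;
};

/* Unconditional get: happily moves the counter from 0 back to 1. */
static void ref_get(struct ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

/* Conditional get: succeeds only while at least one reference is held. */
static bool ref_tryget(struct ref *r)
{
	long old = atomic_load(&r->count);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
		/* A failed compare-exchange reloaded old; check it again. */
	}
	return false;
}
```

In this model a caller that sees ref_tryget() fail knows the object is
already on its way out and has to look up a live one instead.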

Link: https://lkml.kernel.org/r/[email protected]
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Yafang Shao <[email protected]>
Cc: Chris Down <[email protected]>
Cc: Christian Brauner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcg/slab: fix return of child memcg objcg for root memcg

Consider the following memcg hierarchy.

                    root
                   /    \
                  A      B

If we fail to get a reference on the objcg of memcg A,
get_obj_cgroup_from_current() can return the wrong objcg for the root
memcg.
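
One way such a bug can arise is a loop that walks from a memcg towards the
root and keeps a stale candidate pointer after a failed tryget.  The sketch
below is a user-space illustration with stand-in types (struct memcg,
struct objcg, objcg_tryget() and find_objcg() are all hypothetical); it
shows the general pattern of resetting the candidate, and is not claimed to
be the exact kernel change.

```c
#include <stdbool.h>
#include <stddef.h>

struct objcg {
	bool alive;		/* stand-in for "reference can still be taken" */
};

struct memcg {
	struct memcg *parent;
	struct objcg *objcg;
};

static bool objcg_tryget(struct objcg *o)
{
	return o->alive;
}

/* Walk towards root; return a pinned objcg, or NULL if root is reached. */
static struct objcg *find_objcg(struct memcg *memcg, const struct memcg *root)
{
	struct objcg *objcg = NULL;

	for (; memcg != root; memcg = memcg->parent) {
		objcg = memcg->objcg;
		if (objcg && objcg_tryget(objcg))
			break;
		/* Forget the candidate we could not pin before moving up. */
		objcg = NULL;
	}
	return objcg;
}
```

Without the reset, a failed tryget on A followed by reaching the root would
return A's objcg even though no reference was taken on it.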

Link: https://lkml.kernel.org/r/[email protected]
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Yafang Shao <[email protected]>
Cc: Chris Down <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Eugene Syromiatnikov <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Adrian Reber <[email protected]>
Cc: Marco Elver <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcontrol: eliminate redundant check in __mem_cgroup_insert_exceeded()

The mz->usage_in_excess >= mz_node->usage_in_excess check is exactly the
else case of mz->usage_in_excess < mz_node->usage_in_excess. So we could
replace else if (mz->usage_in_excess >= mz_node->usage_in_excess) with
else equally. Also drop the comment which doesn't really explain much.
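
The same simplification on a plain binary search tree, as a minimal sketch.
The kernel code uses its rb-tree helpers; the unbalanced tree, struct node
and insert() below are stand-ins chosen only to show that the second
comparison adds nothing.

```c
#include <stddef.h>

struct node {
	unsigned long key;
	struct node *left, *right;
};

/* Insert n into the tree rooted at *root; equal keys go to the right. */
static void insert(struct node **root, struct node *n)
{
	struct node **p = root;

	n->left = n->right = NULL;
	while (*p) {
		if (n->key < (*p)->key)
			p = &(*p)->left;
		else	/* n->key >= (*p)->key is the only remaining case */
			p = &(*p)->right;
	}
	*p = n;
}
```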

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcontrol: remove unused mod_memcg_obj_state()

Since commit 991e7673859e ("mm: memcontrol: account kernel stack per
node") there is no user of the mod_memcg_obj_state(). So just remove
it.

Also rework type of the idx parameter of the mod_objcg_state() from int
to enum node_stat_item.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yafang Shao <[email protected]>
Cc: Chris Down <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcontrol: add file_thp, shmem_thp to memory.stat

As huge page usage in the page cache and for shmem files proliferates in
our production environment, the performance monitoring team has asked for
per-cgroup stats on those pages.

We already track and export anon_thp per cgroup. We already track file
THP and shmem THP per node, so making them per-cgroup is only a matter of
switching from node to lruvec counters. All callsites are in places where
the pages are charged and locked, so page->memcg is stable.

[[email protected]: add documentation]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Song Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

tmpfs: fix Documentation nits

Fix a typo, punctuation, use uppercase for CPUs, and limit
tmpfs to keeping only its files in virtual memory (phrasing).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Chris Down <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/shmem.c: make shmem_mapping() inline

shmem_mapping() isn't worth an out-of-line call from any callsite.

So make it inline by
- make shmem_aops global
- export shmem_aops
- inline the shmem_mapping()

and replace the direct call 'shmem_aops' with shmem_mapping()
in shmem.c.
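
The shape of such an inline helper, sketched in user space.  The two struct
definitions are stripped-down stand-ins for the kernel types, and shmem_aops
is defined locally here only so the example compiles on its own; in the
kernel it would be declared in a header and defined once in shmem.c.

```c
#include <stdbool.h>

/* Stand-ins for the kernel types; only what the example needs. */
struct address_space_operations {
	int unused;
};

struct address_space {
	const struct address_space_operations *a_ops;
};

/* Local definition for the sketch only. */
static const struct address_space_operations shmem_aops = { 0 };

/* A mapping belongs to shmem if it uses the shmem operations table. */
static inline bool shmem_mapping(const struct address_space *mapping)
{
	return mapping->a_ops == &shmem_aops;
}
```

Because the body is a single pointer comparison, an out-of-line call costs
more than the check itself.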

Link: https://lkml.kernel.org/r/20201115165207.GA265355@rlk
Signed-off-by: Hui Su <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: remove pagevec_lookup_range_nr_tag()

With the merge of commit 2e1692966034 ("ceph: have ceph_writepages_start
call pagevec_lookup_range_tag"), nothing calls this anymore.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/swapfile.c: use memset to fill the swap_map with SWAP_HAS_CACHE

We could use helper memset to fill the swap_map with SWAP_HAS_CACHE instead
of a direct loop here to simplify the code. Also we can remove the local
variable i and map this way.
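
A user-space sketch of the two equivalent forms.  The swap_map is assumed
to be an array of unsigned char and SWAP_HAS_CACHE is assumed to be 0x40;
fill_loop() and fill_memset() are illustrative names.

```c
#include <string.h>

#define SWAP_HAS_CACHE	0x40	/* assumed flag value */

/* Open-coded version: needs an index variable. */
static void fill_loop(unsigned char *map, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		map[i] = SWAP_HAS_CACHE;
}

/* Equivalent version: one byte per entry, so memset() does the same job. */
static void fill_memset(unsigned char *map, size_t nr)
{
	memset(map, SWAP_HAS_CACHE, nr);
}
```

This only works because each entry is a single byte; memset() could not
replace the loop for wider element types.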

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/swapfile.c: remove unnecessary out label in __swap_duplicate()

When the code reaches the out label, p must be NULL, so all the out label
really does is a redundant if check followed by returning err.  Remove
this unnecessary out label, since it does not free resources or do any
other cleanup.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/swap_state: skip meaningless swap cache readahead when ra_info.win == 0

swap_ra_info() may leave ra_info untouched in non_swap_entry() case as
page table lock is not held. In this case, we have ra_info.nr_pte == 0
and it is meaningless to continue with swap cache readahead. Skip such
operations by initializing ra_info.win = 1.

[[email protected]: clean up struct init]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/swapfile.c: use helper function swap_count() in add_swap_count_continuation()

Commit 570a335b8e22 ("swap_info: swap count continuations") introduced the
func add_swap_count_continuation() but forgot to use the helper function
swap_count() introduced by commit 355cfa73ddff ("mm: modify swap_map and
add SWAP_HAS_CACHE flag").
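
For reference, a helper of this kind masks the cache flag out of a swap_map
entry so that only the count bits remain.  The sketch below assumes
SWAP_HAS_CACHE is 0x40 and that entries are unsigned char; it is a
user-space illustration rather than a copy of the kernel source.

```c
#define SWAP_HAS_CACHE	0x40	/* assumed flag value */

/* Strip the "has cache" flag, leaving the count part of the entry. */
static inline unsigned char swap_count(unsigned char ent)
{
	return ent & ~SWAP_HAS_CACHE;
}
```

Using the helper instead of open-coding the mask keeps every reader of
swap_map agreeing on which bits are flags and which are count.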

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: handle zone device pages in release_pages()

release_pages() is an optimized, inlined version of __put_page() except
that zone device struct pages that are not page_is_devmap_managed() (i.e.,
memory_type MEMORY_DEVICE_GENERIC and MEMORY_DEVICE_PCI_P2PDMA), fall
through to the code that could return the zone device page to the page
allocator instead of adjusting the pgmap reference count.

Clearly, pages of these types do not have their reference count decremented
to zero via release_pages(), or page allocation problems would be seen.
Just to be safe, handle the 1 to zero case in release_pages() like
__put_page() does.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ralph Campbell <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/gup: combine put_compound_head() and unpin_user_page()

These functions accomplish the same thing but have different
implementations.

unpin_user_page() has a bug where it calls mod_node_page_state() after
calling put_page() which creates a risk that the page could have been
hot-unplugged from the system.

Fix this by using put_compound_head() as the only implementation.

__unpin_devmap_managed_user_page() and related can be deleted as well in
favour of the simpler, but slower, version in put_compound_head() that has
an extra atomic page_ref_sub, but always calls put_page() which internally
contains the special devmap code.

Move put_compound_head() to be directly after try_grab_compound_head() so
people can find it in future.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1970dc6f5226 ("mm/gup: /proc/vmstat: pin_user_pages (FOLL_PIN) reporting")
Signed-off-by: Jason Gunthorpe <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
CC: Joao Martins <[email protected]>
CC: Jonathan Corbet <[email protected]>
CC: Dan Williams <[email protected]>
CC: Dave Chinner <[email protected]>
CC: Christoph Hellwig <[email protected]>
CC: Jane Chu <[email protected]>
CC: "Kirill A. Shutemov" <[email protected]>
CC: Michal Hocko <[email protected]>
CC: Mike Kravetz <[email protected]>
CC: Shuah Khan <[email protected]>
CC: Muchun Song <[email protected]>
CC: Vlastimil Babka <[email protected]>
CC: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/gup: remove the vma allocation from gup_longterm_locked()

Long ago there wasn't a FOLL_LONGTERM flag so this DAX check was done by
post-processing the VMA list.

These days it is trivial to just check each VMA to see if it is DAX before
processing it inside __get_user_pages() and return failure if a DAX VMA is
encountered with FOLL_LONGTERM.

Removing the allocation of the VMA list is a significant speed up for many
call sites.

Add an IS_ENABLED to vma_is_fsdax so that code generation is unchanged
when DAX is compiled out.

Remove the dummy version of __gup_longterm_locked() as !CONFIG_CMA already
makes memalloc_nocma_save(), check_and_migrate_cma_pages(), and
memalloc_nocma_restore() into a NOP.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/gup: prevent gup_fast from racing with COW during fork

Since commit 70e806e4e645 ("mm: Do early cow for pinned pages during
fork() for ptes") pages under a FOLL_PIN will not be write protected
during COW for fork.  This means that pages returned from
pin_user_pages(FOLL_WRITE) should not become write protected while the pin
is active.

However, there is a small race where get_user_pages_fast(FOLL_PIN) can
establish a FOLL_PIN at the same time copy_present_page() is write
protecting it:

        CPU 0                             CPU 1
   get_user_pages_fast()
    internal_get_user_pages_fast()
                                       copy_page_range()
                                         pte_alloc_map_lock()
                                           copy_present_page()
                                             atomic_read(has_pinned) == 0
     page_maybe_dma_pinned() == false
     atomic_set(has_pinned, 1);
     gup_pgd_range()
      gup_pte_range()
       pte_t pte = gup_get_pte(ptep)
       pte_access_permitted(pte)
       try_grab_compound_head()
                                             pte = pte_wrprotect(pte)
                                     set_pte_at();
                                         pte_unmap_unlock()
      // GUP now returns with a write protected page

The first attempt to resolve this by using the write protect caused
problems (and was missing a barrier), see commit f3c64eda3e50 ("mm: avoid
early COW write protect games during fork()").

Instead wrap copy_p4d_range() with the write side of a seqcount and check
the read side around gup_pgd_range().  If there is a collision then
get_user_pages_fast() fails and falls back to slow GUP.

Slow GUP is safe against this race because copy_page_range() is only
called while holding the exclusive side of the mmap_lock on the src
mm_struct.
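
The seqcount idea can be modelled in user space with a single atomic
counter: the writer makes it odd while it works and even again when done,
and the reader fails if it started during a write or if the counter moved.
This is a simplified sketch using C11 atomics; struct seqcount and the four
helpers are illustrative and are not the kernel's seqcount_t API.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct seqcount {
	atomic_uint seq;
};

/* Writer side: the counter is odd while a write is in progress. */
static void write_begin(struct seqcount *s)
{
	atomic_fetch_add(&s->seq, 1);
}

static void write_end(struct seqcount *s)
{
	atomic_fetch_add(&s->seq, 1);
}

/* Reader side: sample the counter before the lockless work. */
static unsigned int read_begin(struct seqcount *s)
{
	return atomic_load(&s->seq);
}

/* True if a writer was active at read_begin() or has run since then. */
static bool read_retry(struct seqcount *s, unsigned int start)
{
	return (start & 1) || atomic_load(&s->seq) != start;
}
```

In this model the fast path would give up and fall back to the slow path
whenever read_retry() returns true, instead of looping.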

[[email protected]: coding style fixes]
Link: https://lore.kernel.org/r/CAHk-=wi=iCnYCARbPGjkVJu9eyYeZ13N64tZYLdOB8CP5Q_PLw@mail.gmail.com
Link: https://lkml.kernel.org/r/[email protected]
Fixes: f3c64eda3e50 ("mm: avoid early COW write protect games during fork()")
Signed-off-by: Jason Gunthorpe <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Acked-by: "Ahmed S. Darwish" <[email protected]> [seqcount_t parts]
Cc: Andrea Arcangeli <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Kirill Shutemov <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/gup: reorganize internal_get_user_pages_fast()

Patch series "Add a seqcount between gup_fast and copy_page_range()", v4.

As discussed and suggested by Linus use a seqcount to close the small race
between gup_fast and copy_page_range().

Ahmed confirms that raw_write_seqcount_begin() is the correct API to use
in this case and it doesn't trigger any lockdeps.

I was able to test it using two threads, one forking and the other using
ibv_reg_mr() to trigger GUP fast.  Modifying copy_page_range() to sleep
made the window large enough to hit reliably and so test the logic.

This patch (of 2):

The next patch in this series makes the lockless flow a little more
complex, so move the entire block into a new function and remove a level
of indentation.  Tidy a bit of cruft:

- addr is always the same as start, so use start

- Use the modern check_add_overflow() for computing end = start + len

- nr_pinned/pages << PAGE_SHIFT needs the LHS to be unsigned long to
   avoid shift overflow, make the variables unsigned long to avoid coding
   casts in both places. nr_pinned was missing its cast

- The handling of ret and nr_pinned can be streamlined a bit

No functional change.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/gup_test: GUP_TEST depends on DEBUG_FS

Without DEBUG_FS, all the code in gup_benchmark becomes meaningless.
The kernel does provide debugfs stubs when DEBUG_FS is disabled, but
the point here is that GUP_TEST can do nothing without DEBUG_FS.

[[email protected]: add comment as a prompt to users as commented by John and Randy]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Barry Song <[email protected]>
Suggested-by: John Garry <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Cc: Ralph Campbell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/gup_test.c: mark gup_test_init as __init function

gup_test_init() is only called during initialization, mark it as __init to
save some memory.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Barry Song <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Ralph Campbell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftests/vm: 2x speedup for run_vmtests.sh

Each invocation of userfaultfd for "anon" and "shmem" was taking about
6.5 sec to run, contributing to an overall run time of about 22 sec for
run_vmtests.sh.

Reduce the size and bounce input values to the userfaultfd invocation
within run_vmtests.sh, enough to get each invocation down to about 1.0
sec. This should still provide a reasonable smoke test, while staying
within a nominal time budget of around 1 second or so per test. And this
brings the overall running time of run_vmtests.sh down to 11 seconds.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftests/vm: hmm-tests: remove the libhugetlbfs dependency

HMM selftests are incredibly useful, but they are only effective if people
actually build and run them.  All the other tests in selftests/vm can be
built with very standard, always-available libraries: libpthread, librt.
The hmm-tests.c program, on the other hand, requires something that is
(much) less readily available: libhugetlbfs.  And so the build will
typically fail for many developers.

A simple attempt to install libhugetlbfs will also run into complications
on some common distros these days: Fedora and Arch Linux (yes, Arch AUR
has it, but that's fragile, as always with AUR).  The library is not
maintained actively enough at the moment, for distros to deal with it.  I
had to build it from source, for Fedora, and that didn't go too smoothly
either.

It turns out that, out of 21 tests in hmm-tests.c, only 2 actually require
functionality from libhugetlbfs.  Therefore, if libhugetlbfs is missing,
simply ifdef those two tests out and allow the developer to at least have
the other 19 tests, if they don't want to pause to work through the above
issues.  Also issue a warning, so that it's clear that there is an
imperfection in the build.

In order to do that, a tiny shell script (check_config.sh) runs a quick
compile (not link, that's too prone to false failures with library paths),
and basically, if the compiler doesn't find hugetlbfs.h in its standard
locations, then the script concludes that libhugetlbfs is not available.
The output is in two files, one for inclusion in hmm-tests.c
(local_config.h), and one for inclusion in the Makefile (local_config.mk).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftests/vm: run_vmtests.sh: update and clean up gup_test invocation

Run benchmarks on the _fast variants of gup and pup, as originally
intended.

Run the new gup_test sub-test: dump pages. In addition to exercising the
dump_page() call, it also demonstrates the various options you can use to
specify which pages to dump, and how.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftests/vm: gup_test: introduce the dump_pages() sub-test

For quite a while, I was doing a quick hack to gup_test.c (previously,
gup_benchmark.c) whenever I wanted to try out my changes to dump_page().
This makes that hack unnecessary, and instead allows anyone to easily get
the same coverage from a user space program.  That saves a lot of time
because you don't have to change the kernel, in order to test different
pages and options.

The new sub-test takes advantage of the existing gup_test infrastructure,
which already provides a simple user space program, some allocated user
space pages, an ioctl call, pinning of those pages (via either
get_user_pages or pin_user_pages) and a corresponding kernel-side test
invocation.  There's not much more required, mainly just a couple of
inputs from the user.

In fact, the new test re-uses the existing command line options in order
to get various helpful combinations (THP or normal, _fast or slow gup, gup
vs.  pup, and more).

New command line options are: which pages to dump, and what type of
"get/pin" to use.

In order to figure out which pages to dump, the logic is:

* If the user doesn't specify anything, the page 0 (the first page in
  the address range that the program sets up for testing) is dumped.

* Or, the user can type up to 8 page indices anywhere on the command
  line.  If you type more than 8, then it uses the first 8 and ignores the
  remaining items.

For example:

    ./gup_test -ct -F 1 0 19 0x1000

Meaning:
    -c:          dump pages sub-test
    -t:          use THP pages
    -F 1:        use pin_user_pages() instead of get_user_pages()
    0 19 0x1000: dump pages 0, 19, and 4096
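
The page-selection rule above could be implemented along these lines.  This
is a hypothetical sketch, not the gup_test.c source: MAX_PAGES_TO_DUMP and
parse_page_indices() are made-up names, and the arguments are assumed to be
the non-option words left over after option parsing.

```c
#include <stdlib.h>

#define MAX_PAGES_TO_DUMP	8	/* hypothetical name for the limit */

/*
 * Fill out[] with up to MAX_PAGES_TO_DUMP page indices taken from argv.
 * Extra arguments are ignored; with no arguments, page 0 is selected.
 * Returns the number of indices stored.
 */
static int parse_page_indices(int argc, char **argv, unsigned long *out)
{
	int n = 0;
	int i;

	for (i = 0; i < argc && n < MAX_PAGES_TO_DUMP; i++)
		out[n++] = strtoul(argv[i], NULL, 0);	/* base 0: 0x1000 == 4096 */

	if (n == 0)
		out[n++] = 0;

	return n;
}
```

Passing base 0 to strtoul() is what lets decimal and 0x-prefixed indices be
mixed on the same command line, as in the example above.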

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftests/vm: only some gup_test items are really benchmarks

Therefore, some minor cleanup and improvements are in order:

1. Rename the other items appropriately.

2. Stop reporting timing information on the non-benchmark items. It's
   still being recorded and is available, but there's no point in
   cluttering up the report with data that no one reasonably needs to
   check.

3. Don't do iterations, for non-benchmark items.

4. Print out a shorter, more appropriate report for the non-benchmark
   tests.

5. Add the command that was run, to the report. This really helps, as
   there are quite a lot of options now.

6. Use a larger integer type for cmd, now that it's being compared.
   Otherwise it doesn't work, because in this case cmd is about 3 billion,
   which is the perfect size for problems with signed vs unsigned int.
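
The signedness problem in item 6 can be shown with fixed-width types.  The
value used in the usage note is a hypothetical command number with its top
two bits set (about 3.2 billion); seen_through_int32() is an illustrative
helper that reinterprets the same bits as a 32-bit signed integer.

```c
#include <stdint.h>
#include <string.h>

/* Value of cmd as it appears after being stored in a 32-bit signed int. */
static int64_t seen_through_int32(uint32_t cmd)
{
	int32_t stored;

	/* Same 32 bits, signed view; memcpy avoids any conversion rules. */
	memcpy(&stored, &cmd, sizeof(stored));
	return stored;
}
```

For a small value such as 0x1234 the result is unchanged, but 0xC0000000
comes back as -1073741824, so a comparison against the original constant in
a wider type no longer matches.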

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftests/vm: minor cleanup: Makefile and gup_test.c

A few cleanups that don't deserve separate patches, but that also should
not clutter up other functional changes:

1. Remove an unnecessary #include <prctl.h>

2. Restore the sorted order of TEST_GEN_FILES.

3. Add -lpthread to the common LDLIBS, as it is harmless and several
tests use it. This gets rid of one special rule already.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftests/vm: rename run_vmtests --> run_vmtests.sh

Rename to *.sh, in order to match the conventions of all of the other
items in selftests/vm.

The only reason not to use a .sh suffix on a shell script like this might be
to make it look more like a normal program, but that's not an issue here.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftests/vm: use a common gup_test.h

Avoid the need to copy-paste the gup_test ioctl commands and the struct
gup_test definition, between the kernel and the user space application, by
providing a new header file for these.  This allows easier and safer
adding of new ioctl calls, as well as reducing the overall line count.

Details: The header file has to be able to compile independently, because
of the arguably unfortunate way that the Makefile is written: the Makefile
tries to build all of its prerequisites, when really it should be only
building the .c files, and leaving the other prerequisites (LOCAL_HDRS) as
pure dependencies.

That Makefile limitation is probably not worth fixing, but it explains why
one of the includes had to be moved into the new header file.

Also: simplify the ioctl struct (struct gup_test), by deleting the unused
__expansion[10] field.  This sort of thing is what you might see in a
stable ABI, but this low-level, kernel-developer-oriented selftests/vm
system is very much not subject to ABI stability.  So "expansion" and
"reserved" fields are unnecessary here.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/gup_benchmark: rename to mm/gup_test

Patch series "selftests/vm: gup_test, hmm-tests, assorted improvements", v3.

Summary: This series provides two main things, and a number of smaller
supporting goodies.  The two main points are:

1) Add a new sub-test to gup_test, which in turn is a renamed version
   of gup_benchmark.  This sub-test allows nicer testing of dump_pages(),
   at least on user-space pages.

   For quite a while, I was doing a quick hack to gup_test.c whenever I
   wanted to try out changes to dump_page().  Then Matthew Wilcox asked me
   what I meant when I said "I used my dump_page() unit test", and I
   realized that it might be nice to check in a polished up version of
   that.

   Details about how it works and how to use it are in the commit
   description for patch #6 ("selftests/vm: gup_test: introduce the
   dump_pages() sub-test").

2) Fixes a limitation of hmm-tests: these tests are incredibly useful,
   but only if people actually build and run them.  And it turns out that
   libhugetlbfs is a little too effective at throwing a wrench in the
   works, there.  So I've added a little configuration check that removes
   just two of the 21 hmm-tests, if libhugetlbfs is not available.

   Further details in the commit description of patch #8
   ("selftests/vm: hmm-tests: remove the libhugetlbfs dependency").

Other smaller things that this series does:

a) Remove code duplication by creating gup_test.h.

b) Clear up the sub-test organization, and their invocation within
   run_vmtests.sh.

c) Other minor assorted improvements.

[1] v2 is here:
https://lore.kernel.org/linux-doc/20200929212747 [email protected]/

[2] https://lore.kernel.org/r/CAHk-=wgh-TMPHLY3jueHX7Y2fWh3D+nMBqVS__AZm6-oorquWA@mail.gmail.com

This patch (of 9):

Rename nearly every "gup_benchmark" reference and file name to "gup_test".
The one exception is for the actual gup benchmark test itself.

The current code already does a *little* bit more than benchmarking, and
definitely covers more than get_user_pages_fast().  More importantly,
however, subsequent patches are about to add some functionality that is
non-benchmark related.

Closely related changes:

* Kconfig: in addition to renaming the options from GUP_BENCHMARK to
  GUP_TEST, update the help text to reflect that it's no longer a
  benchmark-only test.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/filemap.c: remove else after a return

The `else' is not useful after a `return' in __lock_page_or_retry().
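
As a generic illustration of the style point (a sketch with hypothetical
names, not the __lock_page_or_retry() code itself), the two functions below
behave identically, and the second simply drops the redundant `else':

```c
#include <stdbool.h>

/* Hypothetical example: the shape of the code before the cleanup. */
int status_with_else(bool must_retry)
{
	if (must_retry) {
		return 0;
	} else {
		return 1;
	}
}

/* Same behavior: the early return already ends the "retry" path. */
int status_without_else(bool must_retry)
{
	if (must_retry)
		return 0;

	return 1;
}
```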

[[email protected]: coding style fixes]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hailong Liu <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/truncate: add parameter explanation for invalidate_mapping_pagevec

To fix a kernel-doc markup issue:

  mm/truncate.c:646: warning: Function parameter or member 'mapping' not described in 'invalidate_mapping_pagevec'
  mm/truncate.c:646: warning: Function parameter or member 'start' not described in 'invalidate_mapping_pagevec'
  mm/truncate.c:646: warning: Function parameter or member 'end' not described in 'invalidate_mapping_pagevec'
  mm/truncate.c:646: warning: Function parameter or member 'nr_pagevec' not described in 'invalidate_mapping_pagevec'

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alex Shi <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/filemap.c: generic_file_buffered_read() now uses find_get_pages_contig

Convert generic_file_buffered_read() to get pages to read from in batches,
and then copy data to userspace from many pages at once - in particular,
we now don't touch any cachelines that might be contended while we're in
the loop to copy data to userspace.

This is a performance improvement on workloads that do buffered reads
with large blocksizes, and a very large performance improvement if that
file is also being accessed concurrently by different threads.

On smaller reads (512 bytes), there's a very small performance improvement
(1%, within the margin of error).

akpm: kernel test robot found a 32% speedup on one test:
https://lkml.kernel.org/r/20201030081456.GY31092@shao2-debian

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: kernel test robot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/filemap.c: break generic_file_buffered_read up into multiple functions

Patch series "generic_file_buffered_read() improvements", v2.

generic_file_buffered_read() has turned into a real monstrosity to work
with.  This series breaks it up, and it is also a major performance
improvement, for both small random and large sequential reads.  On my test
box, 4k buffered random reads go from ~150k to ~250k iops, and the
improvements to big sequential reads are even bigger.

This incorporates the fix for IOCB_WAITQ handling that Jens just posted as
well, and also factors out lock_page_for_iocb() to improve handling of the
various iocb flags.

This patch (of 2):

This is prep work for changing generic_file_buffered_read() to use
find_get_pages_contig() to batch up all the pagecache lookups.

This patch should be functionally identical to the existing code and
changes as little of the flow control as possible.  More refactoring
could be done; this patch is intended to be relatively minimal.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kent Overstreet <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/page_owner: record timestamp and pid

Collect the time for each allocation recorded in page owner so that
allocation "surges" can be measured.

Record the pid for each allocation recorded in page owner so that the
source of allocation "surges" can be better identified.

The above is very useful when doing memory analysis. On a crash for
example, we can get this information from kdump (or ramdump) and parse it
to figure out memory allocation problems.

Please note that on x86_64 this increases the size of struct page_owner
from 16 bytes to 32.

Vlastimil: it's not a functionality intended for production, so unless
somebody says they need to enable page_owner for debugging and this
increase prevents them from fitting into available memory, let's not
complicate things with making this optional.

[[email protected]: v3]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liam Mark <[email protected]>
Signed-off-by: Georgi Djakov <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Joonsoo Kim <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix page_owner initializing issue for arm32

Page owner information for pages used by page owner itself is missing on
arm32 targets.  The reason is that dummy_handle and failure_handle are not
initialized correctly.  The buddy allocator is used to initialize these two
handles.  However, the buddy allocator is not ready when page owner calls
it.  This change fixes that by initializing page owner after buddy
initialization.

The working flow before and after this change is:

original logic:
1. allocate memory for page_ext (using memblock).
2. invoke the init callback of page_ext_ops like page_owner (using buddy
    allocator).
3. initialize buddy.

after this change:
1. allocate memory for page_ext (using memblock).
2. initialize buddy.
3. invoke the init callback of page_ext_ops like page_owner (using buddy
    allocator).

With the change, failure_handle and dummy_handle get their correct values,
and the page owner output now includes, for example, the entry for page
owner itself:

  Page allocated via order 2, mask 0x6202c0(GFP_USER|__GFP_NOWARN), pid 1006, ts 67278156558 ns
  PFN 543776 type Unmovable Block 531 type Unmovable Flags 0x0()
    init_page_owner+0x28/0x2f8
    invoke_init_callbacks_flatmem+0x24/0x34
    start_kernel+0x33c/0x5d8

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zhenhua Huang <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

device-dax/kmem: use struct_size()

Linus notes the kernel has had a nice helper for the 'size of struct with
variable array member at the end' operation for a couple of years now.
Use it.
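
For readers unfamiliar with the helper, here is a rough userspace
approximation of what struct_size() computes.  struct res_table and both
function names are hypothetical and are not taken from device-dax/kmem; the
saturate-to-SIZE_MAX behavior mirrors how the kernel helper is documented.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure ending in a flexible array member. */
struct res_table {
	size_t nr;
	const char *names[];
};

/*
 * Userspace approximation of struct_size(): header size plus nr trailing
 * elements, saturating to SIZE_MAX on overflow so the allocation fails.
 */
size_t res_table_size(size_t nr)
{
	size_t base = sizeof(struct res_table);
	size_t elem = sizeof(const char *);

	if (nr > (SIZE_MAX - base) / elem)
		return SIZE_MAX;

	return base + nr * elem;
}

/* Allocate a zeroed table with room for nr names; NULL on failure. */
struct res_table *res_table_alloc(size_t nr)
{
	struct res_table *t = calloc(1, res_table_size(nr));

	if (t)
		t->nr = nr;
	return t;
}
```

The point of the helper is that the open-coded "sizeof(*t) + nr *
sizeof(t->names[0])" has no overflow check, while the helper form does.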

Link: http://lore.kernel.org/r/CAHk-=wgNTLbvAD8mNTvh+GQyapNWeX20PXhU_+frqEvVq4298w@mail.gmail.com
Link: https://lkml.kernel.org/r/160288261564.3242821.6055291930923876456.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/slub: let number of online CPUs determine the slub page order

The page order of the slab that gets chosen for a given slab cache depends
on the number of objects that can be fit in the slab while meeting other
requirements.  We start with a value of minimum objects based on
nr_cpu_ids that is driven by possible number of CPUs and hence could be
higher than the actual number of CPUs present in the system.  This leads
to calculate_order() choosing a page order that is on the higher side
leading to increased slab memory consumption on systems that have bigger
page sizes.

Hence rely on the number of online CPUs when determining the minimum
objects, thereby increasing the chances of choosing a lower conservative
page order for the slab.
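
To make the effect concrete, here is a small userspace sketch.  It assumes
the minimum-objects heuristic has the form 4 * (fls(nr_cpus) + 1); that
formula and the names fls_sketch() and min_objects_for() are assumptions
made for illustration, not a quote of mm/slub.c.

```c
/* 1-based position of the most significant set bit; 0 when x is 0. */
int fls_sketch(unsigned int x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Assumed heuristic: minimum objects per slab grows with log2(CPUs). */
unsigned int min_objects_for(unsigned int nr_cpus)
{
	return 4 * (fls_sketch(nr_cpus) + 1);
}
```

Under that assumption, a system with 4 online CPUs but 1024 possible CPUs
would start from 16 minimum objects instead of 48.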

Vlastimil said:
  "Ideally, we would react to hotplug events and update existing caches
   accordingly. But for that, recalculation of order for existing caches
   would have to be made safe, while not affecting hot paths. We have
   removed the sysfs interface with 32a6f409b693 ("mm, slub: remove
   runtime allocation order changes") as it didn't seem easy and worth
   the trouble.

   In case somebody wants to start with a large order right from the
   boot because they know they will hotplug lots of cpus later, they can
   use slub_min_objects= boot param to override this heuristic. So in
   case this change regresses somebody's performance, there's a way
   around it and thus the risk is low IMHO"

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bharata B Rao <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, slub: use kmem_cache_debug_flags() in deactivate_slab()

Commit 9cf7a1118365 ("mm/slub: make add_full() condition more explicit")
replaced an unnecessarily generic kmem_cache_debug(s) check with an
explicit check of SLAB_STORE_USER and #ifdef CONFIG_SLUB_DEBUG.

We can achieve the same specific check with the recently added
kmem_cache_debug_flags() which removes the #ifdef and restores the
no-branch-overhead benefit of static key check when slub debugging is not
enabled.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Abel Wu <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Liu Xiang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/slab: perform init_on_free earlier

Currently in CONFIG_SLAB init_on_free happens too late, and heap objects
go to the heap quarantine without being erased.

Let's move init_on_free clearing before calling kasan_slab_free(). In that
case heap quarantine will store erased objects, similarly to CONFIG_SLUB=y
behavior.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Popov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Joonsoo Kim <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, slab, slub: clear the slab_cache field when freeing page

The page allocator expects that page->mapping is NULL for a page being
freed. SLAB and SLUB use the slab_cache field which is in union with
mapping, but before freeing the page, the field is referenced with the
"mapping" name when set to NULL.

It's IMHO more correct (albeit functionally the same) to use the
slab_cache name as that's the field we use in SL*B, and document why we
clear it in a comment (we don't clear fields such as s_mem or freelist, as
page allocator doesn't care about those). While using the 'mapping' name
would automagically keep the code correct if the unions in struct page
changed, such changes should be done consciously and needed changes
evaluated - the comment should help with that.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Joonsoo Kim <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dma-buf: use krealloc_array()

Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Acked-by: Christian König <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hwtracing: intel: use krealloc_array()

Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drm: atomic: use krealloc_array()

Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Acked-by: Daniel Vetter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

edac: ghes: use krealloc_array()

Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

pinctrl: use krealloc_array()

Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

vhost: vringh: use krealloc_array()

Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ALSA: pcm: use krealloc_array()

Use the helper that checks for overflows internally instead of manually
calculating the size of the new array.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Takashi Iwai <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: slab: provide krealloc_array()

When allocating an array of elements, users should check for
multiplication overflow or preferably use one of the provided helpers
like: kmalloc_array().

There's no krealloc_array() counterpart but there are many users who use
regular krealloc() to reallocate arrays. Let's provide an actual
krealloc_array() implementation.
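
The idea can be sketched in userspace as follows.  This is only an analogy
built on realloc(): realloc_array_sketch() is a hypothetical name, there is
no gfp flags argument, and the real helper lives in the kernel headers.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Resize p to hold new_n elements of new_size bytes each.  Returns NULL
 * and leaves p untouched if new_n * new_size would overflow size_t.
 */
void *realloc_array_sketch(void *p, size_t new_n, size_t new_size)
{
	if (new_size != 0 && new_n > SIZE_MAX / new_size) {
		errno = ENOMEM;
		return NULL;
	}

	return realloc(p, new_n * new_size);
}
```

A caller that previously wrote realloc(p, n * sizeof(*p)) would pass n and
sizeof(*p) as separate arguments instead, so the multiplication is checked
in one place.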

While at it: add some documentation regarding krealloc.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian König <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: James Morse <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: slab: clarify krealloc()'s behavior with __GFP_ZERO

Patch series "slab: provide and use krealloc_array()", v3.

Andy brought to my attention the fact that users allocating an array of
equally sized elements should check if the size multiplication doesn't
overflow. This is why we have helpers like kmalloc_array().

However we don't have krealloc_array() equivalent and there are many users
who do their own multiplication when calling krealloc() for arrays.

This series provides krealloc_array() and uses it in a couple places.

A separate series will follow adding devm_krealloc_array() which is needed
in the xilinx adc driver.

This patch (of 9):

__GFP_ZERO is ignored by krealloc() (unless we fall-back to kmalloc()
path, in which case it's honored). Point that out in the kerneldoc.
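
A practical consequence is that a caller who needs the newly grown tail
zeroed has to clear it explicitly.  The userspace sketch below shows that
pattern with realloc(), which likewise does not zero new memory;
grow_zeroed() is a hypothetical helper, not a kernel API.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Grow (or shrink) a buffer from old_size to new_size bytes and zero only
 * the bytes that were not part of the old buffer.  Returns NULL on
 * failure, in which case p is still valid.
 */
void *grow_zeroed(void *p, size_t old_size, size_t new_size)
{
	unsigned char *q = realloc(p, new_size);

	if (q && new_size > old_size)
		memset(q + old_size, 0, new_size - old_size);

	return q;
}
```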

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Christian König <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: James Morse <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: "Michael S . Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/slab_common.c: use list_for_each_entry in dump_unreclaimable_slab()

dump_unreclaimable_slab() acquires the slab_mutex first, and it won't
remove any slab_caches list entry when iterating the slab_caches list.

Thus we do not need list_for_each_entry_safe here, which guards against
removal of list entries.
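
The difference between the two iteration styles can be sketched in
userspace with a plain singly linked list.  struct node and the functions
below are hypothetical and do not use the kernel's list.h macros; they only
show why a saved "next" pointer matters when the loop body frees entries
and is unnecessary when it does not.

```c
#include <stdlib.h>

struct node {
	int val;
	struct node *next;
};

/* Prepend a node; returns the new head, or the old head on failure. */
struct node *push_node(struct node *head, int val)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->val = val;
	n->next = head;
	return n;
}

/* Plain iteration: fine because the body never unlinks or frees n. */
int sum_nodes(const struct node *head)
{
	int sum = 0;

	for (const struct node *n = head; n; n = n->next)
		sum += n->val;
	return sum;
}

/* "Safe" iteration: next is saved before the body frees n. */
void free_nodes(struct node *head)
{
	struct node *n, *tmp;

	for (n = head; n; n = tmp) {
		tmp = n->next;
		free(n);
	}
}
```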

Link: https://lkml.kernel.org/r/20200926043440.GA180545@rlk
Signed-off-by: Hui Su <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arch/Kconfig: fix spelling mistakes

There are a few spelling mistakes in the Kconfig comments and help text.
Fix these.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ocfs2: ratelimit the 'max lookup times reached' notice

Running stress-ng on ocfs2 completely fills the kernel log with 'max
lookup times reached, filesystem may have nested directories.'

Let's ratelimit this message as done with others in the code.
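
For readers who have not looked at how rate limiting of this kind works, a
minimal userspace sketch of the general interval/burst idea follows.  It is
not the ocfs2 or printk code: struct ratelimit_sketch, ratelimit_allow()
and the abstract "now" tick value are all hypothetical.

```c
/* Allow at most `burst` messages per `interval` ticks. */
struct ratelimit_sketch {
	long interval;	/* window length, in caller-defined ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages allowed in this window */
	int missed;	/* messages suppressed in this window */
};

/* Returns 1 if the message may be printed, 0 if it is suppressed. */
int ratelimit_allow(struct ratelimit_sketch *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* A new window starts: reset the counters. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}

	rs->missed++;
	return 0;
}
```

With a burst of 10, a flood of 1000 messages inside one window lets 10
through and counts the other 990 as suppressed.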

Test-case:

  # mkfs.ocfs2 --mount local $DEV
  # mount $DEV $MNT
  # cd $MNT

  # dmesg -C
  # stress-ng --dirdeep 1 --dirdeep-ops 1000
  # dmesg | grep -c 'max lookup times reached'

Before:

  # dmesg -C
  # stress-ng --dirdeep 1 --dirdeep-ops 1000
  ...
  stress-ng: info:  [11116] successful run completed in 3.03s

  # dmesg | grep -c 'max lookup times reached'
  967

After:

  # dmesg -C
  # stress-ng --dirdeep 1 --dirdeep-ops 1000
  ...
  stress-ng: info:  [739] successful run completed in 0.96s

  # dmesg | grep -c 'max lookup times reached'
  10

  # dmesg
  [  259.086086] ocfs2_check_if_ancestor: 1990 callbacks suppressed
  [  259.086092] (stress-ng-dirde,740,1):ocfs2_check_if_ancestor:1091 max lookup times reached, filesystem may have nested directories, src inode: 18007, dest inode: 17940.
  ...

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mauricio Faria de Oliveira <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/ocfs2/cluster/tcp.c: remove unneeded break

A break is not needed if it is preceded by a goto.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tom Rix <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/ntfs: remove unused variable attr_len

This variable isn't used anymore; remove it to silence the W=1 warning:

fs/ntfs/inode.c:2350:6: warning: variable `attr_len' set but not used [-Wunused-but-set-variable]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alex Shi <[email protected]>
Acked-by: Anton Altaparmakov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/ntfs: remove unused varibles

We don't actually use these variables, so remove them to avoid these gcc warnings:

fs/ntfs/file.c:326:14: warning: variable `base_ni' set but not used [-Wunused-but-set-variable]
fs/ntfs/logfile.c:481:21: warning: variable `log_page_mask' set but not used [-Wunused-but-set-variable]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alex Shi <[email protected]>
Acked-by: Anton Altaparmakov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ide: remove BUG_ON(in_interrupt() || irqs_disabled()) from ide_unregister()

In the discussion about preempt count consistency across kernel
configurations:

https://lore.kernel.org/r/20200914204209.256266093@linutronix.de/

it was concluded that the usage of in_interrupt() and related context
checks should be removed from non-core code.

Both BUG_ON()s in ide-probe.c were introduced in commit
4015c949fb465 ("[PATCH] update ide core")

when ide_unregister() was extended with semaphore based locking. Neither
check complains about disabled preemption, which is also wrong.

The might_sleep() in today's mutex_lock() will complain about the
misuses.

Remove the BUG_ON() statements.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Jens Axboe <[email protected]>
Cc: "David S. Miller" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ide/falcon: remove in_interrupt() usage

falconide_get_lock() is called by ide_lock_host() and its caller
(ide_issue_rq()) already has a might_sleep() check.

stdma_lock() has wait_event() which also has a might_sleep() check.

Remove the in_interrupt() check.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Cc: "David S. Miller" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

uapi: move constants from <linux/kernel.h> to <linux/const.h>

and include <linux/const.h> in UAPI headers instead of <linux/kernel.h>.

The reason is to avoid indirect <linux/sysinfo.h> include when using
some network headers: <linux/netlink.h> or others -> <linux/kernel.h>
-> <linux/sysinfo.h>.

On musl, this indirect include causes a redefinition of struct sysinfo
when both <sys/sysinfo.h> and some of the UAPI headers are included:

    In file included from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/kernel.h:5,
                     from x86_64-buildroot-linux-musl/sysroot/usr/include/linux/netlink.h:5,
                     from ../include/tst_netlink.h:14,
                     from tst_crypto.c:13:
    x86_64-buildroot-linux-musl/sysroot/usr/include/linux/sysinfo.h:8:8: error: redefinition of `struct sysinfo'
     struct sysinfo {
            ^~~~~~~
    In file included from ../include/tst_safe_macros.h:15,
                     from ../include/tst_test.h:93,
                     from tst_crypto.c:11:
    x86_64-buildroot-linux-musl/sysroot/usr/include/sys/sysinfo.h:10:8: note: originally defined here

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Vorel <[email protected]>
Suggested-by: Rich Felker <[email protected]>
Acked-by: Rich Felker <[email protected]>
Cc: Peter Korsgaard <[email protected]>
Cc: Baruch Siach <[email protected]>
Cc: Florian Weimer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kthread_worker: document CPU hotplug handling

The kthread worker API is simple.  In short, it allows users to create,
use, and destroy workers.  kthread_create_worker_on_cpu() just binds a
newly created worker to a given CPU.

It is up to the API user how to handle CPU hotplug.  They have to decide
how to handle pending work items, prevent queuing new ones, and restore
the functionality when the CPU goes off and on.  There are a few catches:

   + The worker's CPU affinity gets lost when it is scheduled on an
     offline CPU.

   + The worker might not exist if the CPU was off when the user
     created the workers.

A good practice is to implement two CPU hotplug callbacks and
destroy/create the worker when CPU goes down/up.

Mention this in the function description.

[[email protected]: grammar tweaks]

Link: https://lore.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Reported-by: Zhang Qiang <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kthread: add kthread_work tracepoints

While migrating some code from wq to kthread_worker, I found that I missed
the execute_start/end tracepoints. So add similar tracepoints for
kthread_work. For completeness, also add a queue_work tracepoint (although
this one differs slightly from the matching workqueue tracepoint).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rob Clark <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Phil Auld <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Thara Gopinath <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ilias Stamatis <[email protected]>
Cc: Liang Chen <[email protected]>
Cc: Ben Dooks <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "J. Bruce Fields" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Linux 5.10

Merge tag 'x86-urgent-2020-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"A set of x86 and membarrier fixes:

   - Correct a few problems in the x86 and the generic membarrier
     implementation. Small corrections for assumptions about visibility
     which have turned out not to be true.

   - Make the PAT bits for memory encryption correct vs 4K and 2M/1G
     page table entries as they are at a different location.

   - Fix a concurrency issue in the local bandwidth readout of
     resource control leading to incorrect values

   - Fix the ordering of allocating a vector for an interrupt. The order
     failed to respect the provided cpumask when the first attempt of
     allocating node local in the mask fails. It then tries the node
     instead of trying the full provided mask first. This leads to
     erroneous error messages and breaks the (user) supplied affinity
     request. Reorder it.

   - Make the INT3 padding detection in optprobe work correctly"

* tag 'x86-urgent-2020-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kprobes: Fix optprobe to detect INT3 padding correctly
  x86/apic/vector: Fix ordering in vector assignment
  x86/resctrl: Fix incorrect local bandwidth when mba_sc is enabled
  x86/mm/mem_encrypt: Fix definition of PMD_FLAGS_DEC_WP
  membarrier: Execute SYNC_CORE on the calling thread
  membarrier: Explicitly sync remote cores when SYNC_CORE is requested
  membarrier: Add an actual barrier before rseq_preempt()
  x86/membarrier: Get rid of a dubious optimization

Merge tag 'block-5.10-2020-12-12' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"This should be it for 5.10.

  Mike and Song looked into the warning case, and thankfully it appears
  the fix was pretty trivial - we can just change the md device chunk
  type to unsigned int to get rid of it. They cannot currently be < 0,
  and nobody is checking for that either.

  We're reverting the discard changes as the corruption reports came in
  very late, and there's just no time to attempt to deal with it at this
  point. Reverting the changes in question is the right call for 5.10"

* tag 'block-5.10-2020-12-12' of git://git.kernel.dk/linux-block:
  md: change mddev 'chunk_sectors' from int to unsigned
  Revert "md: add md_submit_discard_bio() for submitting discard bio"
  Revert "md/raid10: extend r10bio devs to raid disks"
  Revert "md/raid10: pull codes that wait for blocked dev into one function"
  Revert "md/raid10: improve raid10 discard request"
  Revert "md/raid10: improve discard request for far layout"
  Revert "dm raid: remove unnecessary discard limits for raid10"

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Five small fixes.  Four in drivers:

   - hisi_sas: fix internal queue timeout

   - be2iscsi: revert a prior fix causing problems

   - bnx2i: add missing dependency

   - storvsc: late arriving revert of a problem fix

  and one in the core.

  The core one is a minor change to stop paying attention to the busy
  count when returning out of resources because there's a race window
  where the queue might not restart due to missing returning I/O"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  Revert "scsi: storvsc: Validate length of incoming packet in storvsc_on_channel_callback()"
  scsi: hisi_sas: Select a suitable queue for internal I/Os
  scsi: core: Fix race between handling STS_RESOURCE and completion
  scsi: be2iscsi: Revert "Fix a theoretical leak in beiscsi_create_eqs()"
  scsi: bnx2i: Requires MMU

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fix from Wolfram Sang:
"Bugfix for the AT24 EEPROM driver"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  misc: eeprom: at24: fix NVMEM name with custom AT24 device name

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Bugfixes for ARM, x86 and tools"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  tools/kvm_stat: Exempt time-based counters
  KVM: mmu: Fix SPTE encoding of MMIO generation upper half
  kvm: x86/mmu: Use cpuid to determine max gfn
  kvm: svm: de-allocate svm_cpu_data for all cpus in svm_cpu_uninit()
  selftests: kvm/set_memory_region_test: Fix race in move region test
  KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort()
  KVM: arm64: Fix handling of merging tables into a block entry
  KVM: arm64: Fix memory leak on stage2 update of a valid PTE

Merge tag 'for-linus-5.10c-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"A short series fixing a regression introduced in 5.9 for running as
  Xen dom0 on a system with NVMe backed storage"

* tag 'for-linus-5.10c-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: don't use page->lru for ZONE_DEVICE memory
  xen: add helpers for caching grant mapping pages

Merge tag 'riscv-for-linus-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fix from Palmer Dabbelt:
"Just one fix. It's nothing critical, just a randconfig that wasn't
  building. That said, it does seem pretty safe and is technically a
  regression so I'm sending it along for 5.10:

   - define get_cycles64() all the time, as it's used by most
     configurations"

* tag 'riscv-for-linus-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: Define get_cycles64() regardless of M-mode

Merge tag 'io_uring-5.10-2020-12-11' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"Two fixes in here, fixing issues introduced in this merge window"

* tag 'io_uring-5.10-2020-12-11' of git://git.kernel.dk/linux-block:
  io_uring: fix file leak on error path of io ctx creation
  io_uring: fix mis-seting personality's creds

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

- a fix for cm109 stomping on its own control URB if it tries to toggle
   buzzer immediately after userspace opens input device (found by
   syzkaller)

- another fix for Raydium touchscreens that do not like splitting
   command transfers

- quirks for i8042, soc_button_array, and goodix drivers to make them
   work better with certain hardware.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: goodix - add upside-down quirk for Teclast X98 Pro tablet
  Input: cm109 - do not stomp on control URB
  Input: i8042 - add Acer laptops to the i8042 reset list
  Input: cros_ec_keyb - send 'scancodes' in addition to key events
  Input: soc_button_array - add Lenovo Yoga Tablet2 1051L to the dmi_use_low_level_irq list
  Input: raydium_ts_i2c - do not split tx transactions

md: change mddev 'chunk_sectors' from int to unsigned

Commit e2782f560c29 ("Revert "dm raid: remove unnecessary discard
limits for raid10"") exposed compiler warnings introduced by commit
e0910c8e4f87 ("dm raid: fix discard limits for raid1 and raid10"):

In file included from ./include/linux/kernel.h:14,
                 from ./include/asm-generic/bug.h:20,
                 from ./arch/x86/include/asm/bug.h:93,
                 from ./include/linux/bug.h:5,
                 from ./include/linux/mmdebug.h:5,
                 from ./include/linux/gfp.h:5,
                 from ./include/linux/slab.h:15,
                 from drivers/md/dm-raid.c:8:
drivers/md/dm-raid.c: In function ‘raid_io_hints’:
./include/linux/minmax.h:18:28: warning: comparison of distinct pointer types lacks a cast
  (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                            ^~
./include/linux/minmax.h:32:4: note: in expansion of macro ‘__typecheck’
   (__typecheck(x, y) && __no_side_effects(x, y))
    ^~~~~~~~~~~
./include/linux/minmax.h:42:24: note: in expansion of macro ‘__safe_cmp’
  __builtin_choose_expr(__safe_cmp(x, y), \
                        ^~~~~~~~~~
./include/linux/minmax.h:51:19: note: in expansion of macro ‘__careful_cmp’
#define min(x, y) __careful_cmp(x, y, <)
                   ^~~~~~~~~~~~~
./include/linux/minmax.h:84:39: note: in expansion of macro ‘min’
  __x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
                                       ^~~
drivers/md/dm-raid.c:3739:33: note: in expansion of macro ‘min_not_zero’
   limits->max_discard_sectors = min_not_zero(rs->md.chunk_sectors,
                                 ^~~~~~~~~~~~

Fix this by changing the chunk_sectors member of 'struct mddev' from
int to 'unsigned int' to match the type used for the 'chunk_sectors'
member of 'struct queue_limits'.  Various MD code still uses 'int' but
none of it appears to ever make use of signed int; and storing
positive signed int in unsigned is perfectly safe.

Reported-by: Song Liu <[email protected]>
Fixes: e2782f560c29 ("Revert "dm raid: remove unnecessary discard limits for raid10"")
Fixes: e0910c8e4f87 ("dm raid: fix discard limits for raid1 and raid10")
Cc: stable@vger.kernel.org # e0910c8e4f87 was marked for stable@
Signed-off-by: Mike Snitzer <[email protected]>
Reviewed-by: Song Liu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

x86/kprobes: Fix optprobe to detect INT3 padding correctly

Commit

7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")

changed the padding bytes between functions from NOP to INT3. However,
when optprobe decodes a target function it finds INT3 and gives up the
jump optimization.

Instead of giving up any INT3 detection, check whether the rest of the
bytes to the end of the function are INT3. If all of them are INT3,
those come from the linker. In that case, continue the optprobe jump
optimization.

[ bp: Massage commit message. ]

Fixes: 7705dc855797 ("x86/vmlinux: Use INT3 instead of NOP for linker fill bytes")
Reported-by: Adam Zabrocki <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/160767025681.3880685.16021570341428835411.stgit@devnote2

Input: goodix - add upside-down quirk for Teclast X98 Pro tablet

The touchscreen on the Teclast x98 Pro is also mounted upside-down in
relation to the display orientation.

Signed-off-by: Simon Beginn <[email protected]>
Signed-off-by: Bastien Nocera <[email protected]>
Link: https://lore.kernel.org/r/20201117004253.27A5A27EFD@localhost
Signed-off-by: Dmitry Torokhov <[email protected]>

tools/kvm_stat: Exempt time-based counters

The new counters halt_poll_success_ns and halt_poll_fail_ns do not count
events. Instead they provide a time, and mess up our statistics. Therefore,
we should exclude them.
Removal is currently implemented with an exempt list. If more counters like
these appear, we can think about a more general rule like excluding all
fields named "*_ns", in case that's a standing convention.

Signed-off-by: Stefan Raspl <[email protected]>
Tested-and-reported-by: Christian Borntraeger <[email protected]>
Message-Id: <20201208210829 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: mmu: Fix SPTE encoding of MMIO generation upper half

Commit cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling")
cleaned up the computation of MMIO generation SPTE masks, however it
introduced a bug in how the upper part was encoded:
SPTE bits 52-61 were supposed to contain bits 10-19 of the current
generation number, however a missing shift encoded bits 1-10 there instead
(mostly duplicating the lower part of the encoded generation number that
then consisted of bits 1-9).

In the meantime, the upper part was shrunk by one bit and moved by
subsequent commits to become an upper half of the encoded generation number
(bits 9-17 of bits 0-17 encoded in a SPTE).

In addition to the above, commit 56871d444bc4 ("KVM: x86: fix overlap between SPTE_MMIO_MASK and generation")
has changed the SPTE bit range assigned to encode the generation number and
the total number of bits encoded but did not update them in the comment
attached to their defines, nor in the KVM MMU doc.
Let's do it here, too, since it is too trivial a thing to warrant a
separate commit.

Fixes: cae7ed3c2cb0 ("KVM: x86: Refactor the MMIO SPTE generation handling")
Signed-off-by: Maciej S. Szmigiero <[email protected]>
Message-Id: <156700708db2a5296c5ed7a8b9ac71f1e9765c85.1607129096 [email protected]>
Cc: [email protected]
[Reorganize macros so that everything is computed from the bit ranges. - Paolo]
Signed-off-by: Paolo Bonzini <[email protected]>

Merge tag 'mtd/fixes-for-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull mtd fixes from Miquel Raynal:
"Second series of fixes for raw NAND drivers initiated because of a
  rework of the ECC engine subsystem.

  The location of the DT parsing logic got moved, breaking several
  drivers which in fact were not doing the ECC engine initialization at
  the right place.

  These drivers have been fixed by enforcing a particular ECC engine
  type and algorithm, software Hamming, while the algorithm may be
  overwritten by a DT property. This merge request fixes this in the
  xway, socrates, plat_nand, pasemi, orion, mpc5121, gpio, au1550 and
  ams-delta controller drivers"

* tag 'mtd/fixes-for-5.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: rawnand: xway: Do not force a particular software ECC engine
  mtd: rawnand: socrates: Do not force a particular software ECC engine
  mtd: rawnand: plat_nand: Do not force a particular software ECC engine
  mtd: rawnand: pasemi: Do not force a particular software ECC engine
  mtd: rawnand: orion: Do not force a particular software ECC engine
  mtd: rawnand: mpc5121: Do not force a particular software ECC engine
  mtd: rawnand: gpio: Do not force a particular software ECC engine
  mtd: rawnand: au1550: Do not force a particular software ECC engine
  mtd: rawnand: ams-delta: Do not force a particular software ECC engine

Merge tag 'mmc-v5.10-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes:

  MMC core:
   - Fixup condition for CMD13 polling for RPMB requests

  MMC host:
   - mtk-sd: Fix system suspend/resume support for CQHCI
   - mtk-sd: Extend SDIO IRQ fix to more variants
   - sdhci-of-arasan: Fix clock registration error for Keem Bay SOC
   - tmio: Bring HW to a sane state after a power off"

* tag 'mmc-v5.10-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: mediatek: mark PM functions as __maybe_unused
  mmc: block: Fixup condition for CMD13 polling for RPMB requests
  mmc: tmio: improve bringing HW to a sane state with MMC_POWER_OFF
  mmc: sdhci-of-arasan: Fix clock registration error for Keem Bay SOC
  mmc: mediatek: Extend recheck_sdio_irq fix to more variants
  mmc: mediatek: Fix system suspend/resume support for CQHCI

Merge tag 'at24-fixes-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current

at24 fixes for v5.10

- fix NVMEM name with custom AT24 device name

Merge tag 'zonefs-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fix from Damien Le Moal:
"A single patch in this pull request to fix a BIO and page reference
leak when writing sequential zone files"

* tag 'zonefs-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: fix page reference and BIO leak

bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers

Remove the bpf_ prefix, which causes these helpers to be reported in the
verifier dump as bpf_bpf_this_cpu_ptr() and bpf_bpf_per_cpu_ptr(),
respectively. Let's fix it while it is still possible, before the UAPI
freezes on these helpers.

Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()")
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"8 patches.

  Subsystems affected by this patch series: proc, selftests, kbuild, and
  mm (pagecache, kasan, hugetlb)"

* emailed patches from Andrew Morton <[email protected]>:
  mm/hugetlb: clear compound_nr before freeing gigantic pages
  kasan: fix object remaining in offline per-cpu quarantine
  elfcore: fix building with clang
  initramfs: fix clang build failure
  kbuild: avoid static_assert for genksyms
  selftest/fpu: avoid clang warning
  proc: use untagged_addr() for pagemap_read addresses
  revert "mm/filemap: add static for function __add_to_page_cache_locked"

mm/hugetlb: clear compound_nr before freeing gigantic pages

Commit 1378a5ee451a ("mm: store compound_nr as well as compound_order")
added compound_nr counter to first tail struct page, overlaying with
page->mapping.  The overlay itself is fine, but while freeing gigantic
hugepages via free_contig_range(), a "bad page" check will trigger for
non-NULL page->mapping on the first tail page:

  BUG: Bad page state in process bash  pfn:380001
  page:00000000c35f0856 refcount:0 mapcount:0 mapping:00000000126b68aa index:0x0 pfn:0x380001
  aops:0x0
  flags: 0x3ffff00000000000()
  raw: 3ffff00000000000 0000000000000100 0000000000000122 0000000100000000
  raw: 0000000000000000 0000000000000000 ffffffff00000000 0000000000000000
  page dumped because: non-NULL mapping
  Modules linked in:
  CPU: 6 PID: 616 Comm: bash Not tainted 5.10.0-rc7-next-20201208 #1
  Hardware name: IBM 3906 M03 703 (LPAR)
  Call Trace:
    show_stack+0x6e/0xe8
    dump_stack+0x90/0xc8
    bad_page+0xd6/0x130
    free_pcppages_bulk+0x26a/0x800
    free_unref_page+0x6e/0x90
    free_contig_range+0x94/0xe8
    update_and_free_page+0x1c4/0x2c8
    free_pool_huge_page+0x11e/0x138
    set_max_huge_pages+0x228/0x300
    nr_hugepages_store_common+0xb8/0x130
    kernfs_fop_write+0xd2/0x218
    vfs_write+0xb0/0x2b8
    ksys_write+0xac/0xe0
    system_call+0xe6/0x288
  Disabling lock debugging due to kernel taint

This is because only the compound_order is cleared in
destroy_compound_gigantic_page(), and compound_nr is set to
1U << order == 1 for order 0 in set_compound_order(page, 0).

Fix this by explicitly clearing compound_nr for first tail page after
calling set_compound_order(page, 0).

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1378a5ee451a ("mm: store compound_nr as well as compound_order")
Signed-off-by: Gerald Schaefer <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: <[email protected]> [5.9+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kasan: fix object remaining in offline per-cpu quarantine

We hit this issue in our internal test.  When enabling generic kasan, a
kfree()'d object is put into per-cpu quarantine first.  If the cpu goes
offline, the object still remains in the per-cpu quarantine.  If we call
kmem_cache_destroy() now, slub will report an "Objects remaining" error.

  =============================================================================
  BUG test_module_slab (Not tainted): Objects remaining in test_module_slab on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------

  Disabling lock debugging due to kernel taint
  INFO: Slab 0x(____ptrval____) objects=34 used=1 fp=0x(____ptrval____) flags=0x2ffff00000010200
  CPU: 3 PID: 176 Comm: cat Tainted: G    B             5.10.0-rc1-00007-g4525c8781ec0-dirty #10
  Hardware name: linux,dummy-virt (DT)
  Call trace:
     dump_backtrace+0x0/0x2b0
     show_stack+0x18/0x68
     dump_stack+0xfc/0x168
     slab_err+0xac/0xd4
     __kmem_cache_shutdown+0x1e4/0x3c8
     kmem_cache_destroy+0x68/0x130
     test_version_show+0x84/0xf0
     module_attr_show+0x40/0x60
     sysfs_kf_seq_show+0x128/0x1c0
     kernfs_seq_show+0xa0/0xb8
     seq_read+0x1f0/0x7e8
     kernfs_fop_read+0x70/0x338
     vfs_read+0xe4/0x250
     ksys_read+0xc8/0x180
     __arm64_sys_read+0x44/0x58
     el0_svc_common.constprop.0+0xac/0x228
     do_el0_svc+0x38/0xa0
     el0_sync_handler+0x170/0x178
     el0_sync+0x174/0x180
  INFO: Object 0x(____ptrval____) @offset=15848
  INFO: Allocated in test_version_show+0x98/0xf0 age=8188 cpu=6 pid=172
     stack_trace_save+0x9c/0xd0
     set_track+0x64/0xf0
     alloc_debug_processing+0x104/0x1a0
     ___slab_alloc+0x628/0x648
     __slab_alloc.isra.0+0x2c/0x58
     kmem_cache_alloc+0x560/0x588
     test_version_show+0x98/0xf0
     module_attr_show+0x40/0x60
     sysfs_kf_seq_show+0x128/0x1c0
     kernfs_seq_show+0xa0/0xb8
     seq_read+0x1f0/0x7e8
     kernfs_fop_read+0x70/0x338
     vfs_read+0xe4/0x250
     ksys_read+0xc8/0x180
     __arm64_sys_read+0x44/0x58
     el0_svc_common.constprop.0+0xac/0x228
  kmem_cache_destroy test_module_slab: Slab cache still has objects

Register a cpu hotplug function to remove all objects in the offline
per-cpu quarantine when cpu is going offline.  Set a per-cpu variable to
indicate this cpu is offline.

[[email protected]: fix slab double free when cpu-hotplug]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Ying Lee <[email protected]>
Signed-off-by: Zqiang <[email protected]>
Suggested-by: Dmitry Vyukov <[email protected]>
Reported-by: Guangye Yang <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Matthias Brugger <[email protected]>
Cc: Nicholas Tang <[email protected]>
Cc: Miles Chen <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

elfcore: fix building with clang

kernel/elfcore.c only contains weak symbols, which triggers a bug with
clang in combination with recordmcount:

  Cannot find symbol for section 2: .text.
  kernel/elfcore.o: failed

Move the empty stubs into linux/elfcore.h as inline functions.  As only
two architectures use these, just use the architecture specific Kconfig
symbols to key off the declaration.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Barret Rhoden <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

initramfs: fix clang build failure

There is only one function in init/initramfs.c that is in the .text
section, and it is marked __weak.  When building with clang-12 and the
integrated assembler, this leads to a bug with recordmcount:

  ./scripts/recordmcount  "init/initramfs.o"
  Cannot find symbol for section 2: .text.
  init/initramfs.o: failed

I'm not quite sure what exactly goes wrong, but I notice that this
function is only ever called from an __init function, and normally
inlined.  Marking it __init as well is clearly correct and it leads to
recordmcount no longer complaining.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Barret Rhoden <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kbuild: avoid static_assert for genksyms

genksyms does not know or care about the _Static_assert() built-in, and
sometimes falls back to ignoring the later symbols, which causes
undefined behavior such as

  WARNING: modpost: EXPORT symbol "ethtool_set_ethtool_phy_ops" [vmlinux] version generation failed, symbol will not be versioned.
  ld: net/ethtool/common.o: relocation R_AARCH64_ABS32 against `__crc_ethtool_set_ethtool_phy_ops' can not be used when making a shared object
  net/ethtool/common.o:(_ftrace_annotated_branch+0x0): dangerous relocation: unsupported relocation

Redefine static_assert for genksyms to avoid that.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Suggested-by: Ard Biesheuvel <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Rikard Falkeborn <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftest/fpu: avoid clang warning

With extra warnings enabled, clang complains about the redundant
-mhard-float argument:

clang: error: argument unused during compilation: '-mhard-float' [-Werror,-Wunused-command-line-argument]

Move this into the gcc-only part of the Makefile.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4185b3b92792 ("selftests/fpu: Add an FPU selftest")
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Petteri Aimonen <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

proc: use untagged_addr() for pagemap_read addresses

When we try to visit the pagemap of a tagged userspace pointer, we find
that the start_vaddr is not correct because of the tag.
To fix it, we should untag the userspace pointers in pagemap_read().

I tested with 5.10-rc4 and the issue remains.

Explanation from Catalin in [1]:

"Arguably, that's a user-space bug since tagged file offsets were never
  supported. In this case it's not even a tag at bit 56 as per the arm64
  tagged address ABI but rather down to bit 47. You could say that the
  problem is caused by the C library (malloc()) or whoever created the
  tagged vaddr and passed it to this function. It's not a kernel
  regression as we've never supported it.

  Now, pagemap is a special case where the offset is usually not
  generated as a classic file offset but rather derived by shifting a
  user virtual address. I guess we can make a concession for pagemap
  (only) and allow such offset with the tag at bit (56 - PAGE_SHIFT + 3)"

My test code is based on [2]:

A userspace pointer which has been tagged by 0xb4: 0xb400007662f541c8

userspace program:

  uint64 OsLayer::VirtualToPhysical(void *vaddr) {
    uint64 frame, paddr, pfnmask, pagemask;
    int pagesize = sysconf(_SC_PAGESIZE);
    off64_t off = ((uintptr_t)vaddr) / pagesize * 8; // off = 0xb400007662f541c8 / pagesize * 8 = 0x5a00003b317aa0
    int fd = open(kPagemapPath, O_RDONLY);
    ...

    if (lseek64(fd, off, SEEK_SET) != off || read(fd, &frame, 8) != 8) {
      int err = errno;
      string errtxt = ErrorString(err);
      if (fd >= 0)
        close(fd);
      return 0;
    }
    ...
  }

kernel fs/proc/task_mmu.c:

  static ssize_t pagemap_read(struct file *file, char __user *buf,
                              size_t count, loff_t *ppos)
  {
	...
	src = *ppos;
	svpfn = src / PM_ENTRY_BYTES; // svpfn == 0xb400007662f54
	start_vaddr = svpfn << PAGE_SHIFT; // start_vaddr == 0xb400007662f54000
	end_vaddr = mm->task_size;

	/* watch out for wraparound */
	// svpfn == 0xb400007662f54
	// (mm->task_size >> PAGE_SHIFT) == 0x8000000
	if (svpfn > mm->task_size >> PAGE_SHIFT) // the condition is true because of the tag 0xb4
		start_vaddr = end_vaddr;

	ret = 0;
	while (count && (start_vaddr < end_vaddr)) { // never entered: start_vaddr was set to end_vaddr, so the correct entry is not visited
		int len;
		unsigned long end;
		...
	}
	...
  }
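
The arithmetic above can be replayed in userspace. The following is only a
sketch under stated assumptions, not the kernel change itself: 4 KiB pages,
a task size chosen to match the 0x8000000 page limit quoted above, and a
hypothetical untag_user_addr() helper that stands in for the kernel's
untagged_addr() by clearing the top byte. The helper names
start_vaddr_raw() and start_vaddr_untagged() are illustrative as well.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12	/* assumption: 4 KiB pages */
#define PM_ENTRY_BYTES 8	/* one 64-bit pagemap entry per page */
/* assumption: matches (mm->task_size >> PAGE_SHIFT) == 0x8000000 */
#define TASK_SIZE      (UINT64_C(0x8000000) << PAGE_SHIFT)

/* Hypothetical stand-in for untagged_addr(): clear the top-byte tag. */
static uint64_t untag_user_addr(uint64_t addr)
{
	return addr & ((UINT64_C(1) << 56) - 1);
}

/* The file offset userspace derives from a (possibly tagged) pointer. */
static uint64_t pagemap_offset(uint64_t vaddr)
{
	return (vaddr >> PAGE_SHIFT) * PM_ENTRY_BYTES;
}

/* Start address without untagging: the tag pushes svpfn past the limit. */
static uint64_t start_vaddr_raw(uint64_t off)
{
	uint64_t svpfn = off / PM_ENTRY_BYTES;

	if (svpfn > (TASK_SIZE >> PAGE_SHIFT))
		return TASK_SIZE;	/* nothing will be read */
	return svpfn << PAGE_SHIFT;
}

/* Start address with the tag stripped before the range check. */
static uint64_t start_vaddr_untagged(uint64_t off)
{
	uint64_t svpfn = off / PM_ENTRY_BYTES;
	uint64_t start;

	if (svpfn > (UINT64_MAX >> PAGE_SHIFT))
		return TASK_SIZE;	/* the shift would wrap around */
	start = untag_user_addr(svpfn << PAGE_SHIFT);
	if (start > TASK_SIZE)
		return TASK_SIZE;	/* outside the task */
	return start;
}
```

With the tagged pointer 0xb400007662f541c8 from the example, the raw variant
collapses to the end of the address space, while the untagged variant lands
on the page that actually contains the pointer.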

[1] https://lore.kernel.org/patchwork/patch/1343258/
[2] https://github.com/stressapptest/stressapptest/blob/master/src/os.cc#L158

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miles Chen <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Song Bao Hua (Barry Song) <[email protected]>
Cc: <[email protected]> [5.4-]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

revert "mm/filemap: add static for function __add_to_page_cache_locked"

Revert commit 3351b16af494 ("mm/filemap: add static for function
__add_to_page_cache_locked") due to incompatibility with
ALLOW_ERROR_INJECTION, which results in build errors.

Link: https://lkml.kernel.org/r/CAADnVQJ6tmzBXvtroBuEH6QA0H+q7yaSKxrVvVxhqr3KBZdEXg@mail.gmail.com
Tested-by: Justin Forbes <[email protected]>
Tested-by: Greg Thelen <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Cc: Michal Kubecek <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Input: cm109 - do not stomp on control URB

We need to make sure we are not stomping on the control URB that was
issued when opening the device when attempting to toggle the buzzer.
To do that we need to mark it as pending in cm109_open().

Reported-and-tested-by: [email protected]
Cc: [email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>

mtd: rawnand: xway: Do not force a particular software ECC engine

Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.

Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.

Add the necessary logic to be sure Hamming keeps being only a default.
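
The intent of "only a default" can be sketched as follows. This is a
simplified userspace model, not the driver code: the enums, struct ecc_conf
and attach_chip_defaults() are hypothetical stand-ins for the raw NAND ECC
settings, and the model assumes the DT-selected algorithm has already been
parsed into the structure before the attach step runs.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the raw NAND ECC settings. */
enum ecc_engine { ECC_ENGINE_SOFT, ECC_ENGINE_ON_HOST };
enum ecc_algo { ECC_ALGO_UNKNOWN, ECC_ALGO_HAMMING, ECC_ALGO_BCH };

struct ecc_conf {
	enum ecc_engine engine;
	enum ecc_algo algo;	/* ECC_ALGO_UNKNOWN if nothing was requested */
};

/*
 * Fall back to Hamming only when a software engine is in use and no
 * algorithm was requested; an explicit choice is left untouched.
 */
static void attach_chip_defaults(struct ecc_conf *ecc)
{
	if (ecc->engine == ECC_ENGINE_SOFT && ecc->algo == ECC_ALGO_UNKNOWN)
		ecc->algo = ECC_ALGO_HAMMING;
}
```

In this model a configuration that arrives with ECC_ALGO_BCH keeps it, while
one that arrives with ECC_ALGO_UNKNOWN ends up with ECC_ALGO_HAMMING.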

Fixes: d525914b5bd8 ("mtd: rawnand: xway: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/linux-mtd/[email protected]

mtd: rawnand: socrates: Do not force a particular software ECC engine

Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.

Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.

Add the necessary logic to be sure Hamming keeps being only a default.

Fixes: b36bf0a0fe5d ("mtd: rawnand: socrates: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/linux-mtd/[email protected]

mtd: rawnand: plat_nand: Do not force a particular software ECC engine

Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.

Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.

Add the necessary logic to be sure Hamming keeps being only a default.

Fixes: 612e048e6aab ("mtd: rawnand: plat_nand: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/linux-mtd/[email protected]

mtd: rawnand: pasemi: Do not force a particular software ECC engine

Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.

Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.

Add the necessary logic to be sure Hamming keeps being only a default.

Fixes: 8fc6f1f042b2 ("mtd: rawnand: pasemi: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/linux-mtd/[email protected]

mtd: rawnand: orion: Do not force a particular software ECC engine

Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.

Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.

Add the necessary logic to be sure Hamming keeps being only a default.

Reported-by: Chris Packham <[email protected]>
Fixes: 553508cec2e8 ("mtd: rawnand: orion: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <[email protected]>
Tested-by: Chris Packham <[email protected]>
Link: https://lore.kernel.org/linux-mtd/[email protected]

mtd: rawnand: mpc5121: Do not force a particular software ECC engine

Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.

Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.

Add the necessary logic to be sure Hamming keeps being only a default.

Fixes: 6dd09f775b72 ("mtd: rawnand: mpc5121: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/linux-mtd/[email protected]

mtd: rawnand: gpio: Do not force a particular software ECC engine

Originally, commit d7157ff49a5b ("mtd: rawnand: Use the ECC framework
user input parsing bits") kind of broke the logic around the
initialization of several ECC engines.

Unfortunately, the fix (which indeed moved the ECC initialization to
the right place) did not take into account the fact that a different
ECC algorithm could have been used thanks to a DT property,
considering the "Hamming" algorithm entry a configuration while it was
only a default.

Add the necessary logic to be sure Hamming keeps being only a default.

Fixes: f6341f6448e0 ("mtd: rawnand: gpio: Move the ECC initialization to ->attach_chip()")
Signed-off-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/linux-mtd/[email protected]