Git Repo - linux.git/log

mm: gup: remove set but unused local variable major

Patch series "Cleanups and fixup for gup".

This series contains cleanups to remove unneeded variable, useless BUG_ON
and use helper to improve readability.  Also we fix a potential pgmap
refcnt leak.  More details can be found in the respective changelogs.

This patch (of 5):

Since commit a2beb5f1efed ("mm: clean up the last pieces of page fault
accountings"), the local variable major is unused.  Remove it.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

include/linux/buffer_head.h: fix boolreturn.cocci warnings

./include/linux/buffer_head.h:412:64-65:WARNING:return of 0/1 in
function 'has_bh_in_lru' with return type bool

Return statements in functions returning bool should use true/false
instead of 1/0.

Generated by: scripts/coccinelle/misc/boolreturn.cocci

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jing Yangyang <[email protected]>
Reported-by: Zeal Robot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

writeback: memcg: simplify cgroup_writeback_by_id

Currently cgroup_writeback_by_id calls mem_cgroup_wb_stats() to get dirty
pages for a memcg.  However mem_cgroup_wb_stats() does a lot more than
just get the number of dirty pages.  Just directly get the number of dirty
pages instead of calling mem_cgroup_wb_stats().  Also
cgroup_writeback_by_id() is only called for best-effort dirty flushing, so
remove the unused 'nr' parameter and no need to explicitly flush memcg
stats.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Shakeel Butt <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: inode: count invalidated shadow pages in pginodesteal

pginodesteal is supposed to capture the impact that inode reclaim has on
the page cache state. Currently, it doesn't consider shadow pages that
get dropped this way, even though this can have a significant impact on
paging behavior, memory pressure calculations etc.

To improve visibility into these effects, make sure shadow pages get
counted when they get dropped through inode reclaim.

This changes the return value semantics of invalidate_mapping_pages()
semantics slightly, but the only two users are the inode shrinker itsel
and a usb driver that logs it for debugging purposes.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: drop_caches: fix skipping over shadow cache inodes

When drop_caches truncates the page cache in an inode it also includes any
shadow entries for evicted pages. However, there is a preliminary check
on whether the inode has pages: if it has *only* shadow entries, it will
skip running truncation on the inode and leave it behind.

Fix the check to mapping_empty(), such that it runs truncation on any
inode that has cache entries at all.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Reported-by: Roman Gushchin <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: remove irqsave/restore locking from contexts with irqs enabled

The page cache deletion paths all have interrupts enabled, so no need to
use irqsafe/irqrestore locking variants.

They used to have irqs disabled by the memcg lock added in commit
c4843a7593a9 ("memcg: add per cgroup dirty page accounting"), but that has
since been replaced by memcg taking the page lock instead, commit
0a31bc97c80c ("mm: memcontrol: rewrite uncharge AP").

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

writeback: use READ_ONCE for unlocked reads of writeback stats

We do some unlocked reads of writeback statistics like
avg_write_bandwidth, dirty_ratelimit, or bw_time_stamp. Generally we are
fine with getting somewhat out-of-date values but actually getting
different values in various parts of the functions because the compiler
decided to reload value from original memory location could confuse
calculations. Use READ_ONCE for these unlocked accesses and WRITE_ONCE
for the updates to be on the safe side.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jan Kara <[email protected]>
Cc: Michael Stapelberg <[email protected]>
Cc: Wu Fengguang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

writeback: rename domain_update_bandwidth()

Rename domain_update_bandwidth() to domain_update_dirty_limit(). The
original name is a misnomer. The function has nothing to do with a
bandwidth, it updates dirty limits.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jan Kara <[email protected]>
Cc: Michael Stapelberg <[email protected]>
Cc: Wu Fengguang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

writeback: fix bandwidth estimate for spiky workload

Michael Stapelberg has reported that for workload with short big spikes of
writes (GCC linker seem to trigger this frequently) the write throughput
is heavily underestimated and tends to steadily sink until it reaches
zero.  This has rather bad impact on writeback throttling (causing
stalls).  The problem is that writeback throughput estimate gets updated
at most once per 200 ms.  One update happens early after we submit pages
for writeback (at that point writeout of only small fraction of pages is
completed and thus observed throughput is tiny).  Next update happens only
during the next write spike (updates happen only from inode writeback and
dirty throttling code) and if that is more than 1s after previous spike,
we decide system was idle and just ignore whatever was written until this
moment.

Fix the problem by making sure writeback throughput estimate is also
updated shortly after writeback completes to get reasonable estimate of
throughput for spiky workloads.

[[email protected]: avoid division by 0 in wb_update_dirty_ratelimit()]

Link: https://lore.kernel.org/lkml/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jan Kara <[email protected]>
Reported-by: Michael Stapelberg <[email protected]>
Tested-by: Michael Stapelberg <[email protected]>
Cc: Wu Fengguang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

writeback: reliably update bandwidth estimation

Currently we trigger writeback bandwidth estimation from
balance_dirty_pages() and from wb_writeback(). However neither of these
need to trigger when the system is relatively idle and writeback is
triggered e.g. from fsync(2). Make sure writeback estimates happen
reliably by triggering them from do_writepages().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jan Kara <[email protected]>
Cc: Michael Stapelberg <[email protected]>
Cc: Wu Fengguang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

writeback: track number of inodes under writeback

Patch series "writeback: Fix bandwidth estimates", v4.

Fix estimate of writeback throughput when device is not fully busy doing
writeback. Michael Stapelberg has reported that such workload (e.g.
generated by linking) tends to push estimated throughput down to 0 and as
a result writeback on the device is practically stalled.

The first three patches fix the reported issue, the remaining two patches
are unrelated cleanups of problems I've noticed when reading the code.

This patch (of 4):

Track number of inodes under writeback for each bdi_writeback structure.
We will use this to decide whether wb does any IO and so we can estimate
its writeback throughput. In principle we could use number of pages under
writeback (WB_WRITEBACK counter) for this however normal percpu counter
reads are too inaccurate for our purposes and summing the counter is too
expensive.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jan Kara <[email protected]>
Cc: Wu Fengguang <[email protected]>
Cc: Michael Stapelberg <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: add kernel_misc_reclaimable in show_free_areas

Print NR_KERNEL_MISC_RECLAIMABLE stat from show_free_areas() so users can
check whether the shrinker is working correctly and to show the current
memory usage.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: liuhailong <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: report a more useful address for reclaim acquisition

A recent lockdep report included these lines:

[   96.177910] 3 locks held by containerd/770:
[   96.177934]  #0: ffff88810815ea28 (&mm->mmap_lock#2){++++}-{3:3},
at: do_user_addr_fault+0x115/0x770
[   96.177999]  #1: ffffffff82915020 (rcu_read_lock){....}-{1:2}, at:
get_swap_device+0x33/0x140
[   96.178057]  #2: ffffffff82955ba0 (fs_reclaim){+.+.}-{0:0}, at:
__fs_reclaim_acquire+0x5/0x30

While it was not useful to that bug report to know where the reclaim lock
had been acquired, it might be useful under other circumstances.  Allow
the caller of __fs_reclaim_acquire to specify the instruction pointer to
use.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Omar Sandoval <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Boqun Feng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/debug_vm_pgtable: fix corrupted page flag

In page table entry modifying tests, set_xxx_at() are used to populate
the page table entries. On ARM64, PG_arch_1 (PG_dcache_clean) flag is
set to the target page flag if execution permission is given. The logic
exits since commit 4f04d8f00545 ("arm64: MMU definitions"). The page
flag is kept when the page is free'd to buddy's free area list. However,
it will trigger page checking failure when it's pulled from the buddy's
free area list, as the following warning messages indicate.

   BUG: Bad page state in process memhog  pfn:08000
   page:0000000015c0a628 refcount:0 mapcount:0 \
        mapping:0000000000000000 index:0x1 pfn:0x8000
   flags: 0x7ffff8000000800(arch_1|node=0|zone=0|lastcpupid=0xfffff)
   raw: 07ffff8000000800 dead000000000100 dead000000000122 0000000000000000
   raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
   page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set

This fixes the issue by clearing PG_arch_1 through flush_dcache_page()
after set_xxx_at() is called. For architectures other than ARM64, the
unexpected overhead of cache flushing is acceptable.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: a5c3b9ffb0f4 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/debug_vm_pgtable: remove unused code

The variables used by old implementation isn't needed as we switched to
"struct pgtable_debug_args". Lets remove them and related code in
debug_vm_pgtable().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/debug_vm_pgtable: use struct pgtable_debug_args in PGD and P4D modifying tests

This uses struct pgtable_debug_args in PGD/P4D modifying tests. No
allocated huge page is used in these tests. Besides, the unused variable
@saved_p4dp and @saved_pudp are dropped.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/debug_vm_pgtable: use struct pgtable_debug_args in PUD modifying tests

This uses struct pgtable_debug_args in PUD modifying tests.  The allocated
huge page is used when set_pud_at() is used.  The corresponding tests are
skipped if the huge page doesn't exist.  Besides, the following unused
variables in debug_vm_pgtable() are dropped: @prot, @paddr, @pud_aligned.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/debug_vm_pgtable: use struct pgtable_debug_args in PMD modifying tests

This uses struct pgtable_debug_args in PMD modifying tests.  The allocated
huge page is used when set_pmd_at() is used.  The corresponding tests are
skipped if the huge page doesn't exist.  Besides, the unused variable
@pmd_aligned in debug_vm_pgtable() is dropped.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/debug_vm_pgtable: use struct pgtable_debug_args in PTE modifying tests

This uses struct pgtable_debug_args in PTE modifying tests.  The allocated
page is used as set_pte_at() is used there.  The tests are skipped if the
allocated page doesn't exist.  It's notable that args->ptep need to be
mapped before the tests.  The reason why we don't map args->ptep at the
beginning is PTE entry is only mapped and accessible in atomic context
when CONFIG_HIGHPTE is enabled.  So we avoid to do that so that atomic
context is only enabled if needed.

Besides, the unused variable @pte_aligned and @ptep in debug_vm_pgtable()
are dropped.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/debug_vm_pgtable: use struct pgtable_debug_args in migration and thp tests

This uses struct pgtable_debug_args in the migration and thp test
functions. It's notable that the pre-allocated page is used in
swap_migration_tests() as set_pte_at() is used there.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/debug_vm_pgtable: use struct pgtable_debug_args in soft_dirty and swap tests

This uses struct pgtable_debug_args in the soft_dirty and swap test
functions.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/debug_vm_pgtable: use struct pgtable_debug_args in protnone and devmap tests

This uses struct pgtable_debug_args in protnone and devmap test functions.
After that, the unused variable @protnone in debug_vm_pgtable() is
dropped.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/debug_vm_pgtable: use struct pgtable_debug_args in leaf and savewrite tests

This uses struct pgtable_debug_args in the leaf and savewrite test
functions.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/debug_vm_pgtable: use struct pgtable_debug_args in basic tests

This uses struct pgtable_debug_args in the basic test functions. The
unused variables @pgd_aligned and @p4d_aligned in debug_vm_pgtable() are
dropped.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chunyu Hu <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/debug_vm_pgtable: introduce struct pgtable_debug_args

Patch series "mm/debug_vm_pgtable: Enhancements", v6.

There are a couple of issues with current implementations and this series
tries to resolve the issues:

  (a) All needed information are scattered in variables, passed to various
      test functions. The code is organized in pretty much relaxed fashion.

  (b) The page isn't allocated from buddy during page table entry modifying
      tests. The page can be invalid, conflicting to the implementations
      of set_xxx_at() on ARM64. The target page is accessed so that the
      iCache can be flushed when execution permission is given on ARM64.
      Besides, the target page can be unmapped and accessing to it causes
      kernel crash.

"struct pgtable_debug_args" is introduced to address issue (a).  For issue
(b), the used page is allocated from buddy in page table entry modifying
tests.  The corresponding tets will be skipped if we fail to allocate the
(huge) page.  For other test cases, the original page around to kernel
symbol (@start_kernel) is still used.

The patches are organized as below.  PATCH[2-10] could be combined to one
patch, but it will make the review harder:

  PATCH[1] introduces "struct pgtable_debug_args" as place holder of all
           needed information. With it, the old and new implementation
           can coexist.
  PATCH[2-10] uses "struct pgtable_debug_args" in various test functions.
  PATCH[11] removes the unused code for old implementation.
  PATCH[12] fixes the issue of corrupted page flag for ARM64

This patch (of 6):

In debug_vm_pgtable(), there are many local variables introduced to track
the needed information and they are passed to the functions for various
test cases.  It'd better to introduce a struct as place holder for these
information.  With it, what the tests functions need is the struct.  In
this way, the code is simplified and easier to be maintained.

Besides, set_xxx_at() could access the data on the corresponding pages in
the page table modifying tests.  So the accessed pages in the tests should
have been allocated from buddy.  Otherwise, we're accessing pages that
aren't owned by us.  This causes issues like page flag corruption or
kernel crash on accessing unmapped page when CONFIG_DEBUG_PAGEALLOC is
enabled.

This introduces "struct pgtable_debug_args".  The struct is initialized
and destroyed, but the information in the struct isn't used yet.  It will
be used in subsequent patches.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Tested-by: Christophe Leroy <[email protected]> [powerpc 8xx]
Tested-by: Gerald Schaefer <[email protected]> [s390]
Cc: Anshuman Khandual <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Chunyu Hu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arch/csky/kernel/probes/kprobes.c: fix bugon.cocci warnings

Use BUG_ON instead of a if condition followed by BUG.

Generated by: scripts/coccinelle/misc/bugon.cocci

Link: https://lkml.kernel.org/r/alpine.DEB.2.22.394.2107061049150.7197@hadrien
Fixes: 7d37cb2c912d ("lib: fix kconfig dependency on ARCH_WANT_FRAME_POINTERS")
Signed-off-by: kernel test robot <[email protected]>
Signed-off-by: Julia Lawall <[email protected]>
Reported-by: kernel test robot <[email protected]>
Cc: Julian Braha <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ocfs2: ocfs2_downconvert_lock failure results in deadlock

Usually, ocfs2_downconvert_lock() function always downconverts dlm lock to
the expected level for satisfy dlm bast requests from the other nodes.

But there is a rare situation.  When dlm lock conversion is being
canceled, ocfs2_downconvert_lock() function will return -EBUSY.  You need
to be aware that ocfs2_cancel_convert() function is asynchronous in fsdlm
implementation.

If we does not requeue this lockres entry, ocfs2 downconvert thread no
longer handles this dlm lock bast request.  Then, the other nodes will not
get the dlm lock again, the current node's process will be blocked when
acquire this dlm lock again.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gang He <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ocfs2: quota_local: fix possible uninitialized-variable access in ocfs2_local_read_info()

A memory block is allocated through kmalloc(), and its return value is
assigned to the pointer oinfo. However, oinfo->dqi_gqinode is not
initialized but it is accessed in:
iput(oinfo->dqi_gqinode);

To fix this possible uninitialized-variable access, assign NULL to
oinfo->dqi_gqinode, and add ocfs2_qinfo_lock_res_init() behind the
assignment in ocfs2_local_read_info(). Remove ocfs2_qinfo_lock_res_init()
in ocfs2_global_read_info().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tuo Li <[email protected]>
Reported-by: TOTE Robot <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ocfs2: remove an unnecessary condition

The case where "tmp_oh" is NULL is handled at the start of the function.
At this point we know it's non-NULL so this will always return 1.

Link: https://lkml.kernel.org/r/YOcItgIXtisi3MaO@mwanda
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: Larry Chen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ia64: make num_rsvd_regions static

Commit f62800992e5917f2 ("ia64: switch to NO_BOOTMEM") removed the last
user of num_rsvd_regions outside arch/ia64/kernel/setup.c.

Link: https://lkml.kernel.org/r/a377b5437e3e9da93d02f996fe06a2b956cb0990.1629884459.git.geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <[email protected]>
Cc: Frank Rowand <[email protected]>
Cc: Jay Lan <[email protected]>
Cc: Magnus Damm <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ia64: make reserve_elfcorehdr() static

There never was a reason for reserve_elfcorehdr() to be global. Make the
function static, and move it before its sole caller.

Link: https://lkml.kernel.org/r/fe236cd73b64abc4abd03dd808cb015c907f4c8c.1629884459.git.geert+renesas@glider.be
Fixes: cee87af2a5f75713 ("[IA64] kexec: Use EFI_LOADER_DATA for ELF core header")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Cc: Frank Rowand <[email protected]>
Cc: Jay Lan <[email protected]>
Cc: Magnus Damm <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Tony Luck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ia64: fix #endif comment for reserve_elfcorehdr()

Patch series "ia64: Miscellaneous fixes and cleanups".

This patch series contains some miscellaneous fixes and cleanups for ia64.
The second patch fixes a naming conflict triggered by a patch for the FDT
code.

This patch (of 3):

The definition of reserve_elfcorehdr() depends on CONFIG_CRASH_DUMP, not
CONFIG_PROC_VMCORE.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/77b4c0648f200cab7e1c2c5171c06763e09362aa.1629884459.git.geert+renesas@glider.be
Fixes: d9a9855d0b06ca6d ("always reserve elfcore header memory in crash kernel")
Fixes: 17c1f07ed70afa4f ("[IA64] Reserve elfcorehdr memory in CONFIG_CRASH_DUMP")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Jay Lan <[email protected]>
Cc: Magnus Damm <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Frank Rowand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ia64: fix typo in a comment

s/when when/when/

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: update documentation of get_write_access() and friends

As VM_DENYWRITE does no longer exists, let's spring-clean the
documentation of get_write_access() and friends.

Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>

mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()

Let's also remove masking off MAP_DENYWRITE from ksys_mmap_pgoff():
the last in-tree occurrence of MAP_DENYWRITE is now in LEGACY_MAP_MASK,
which accepts the flag e.g., for MAP_SHARED_VALIDATE; however, the flag
is ignored throughout the kernel now.

Add a comment to LEGACY_MAP_MASK stating that MAP_DENYWRITE is ignored.

Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>

mm: remove VM_DENYWRITE

All in-tree users of MAP_DENYWRITE are gone. MAP_DENYWRITE cannot be
set from user space, so all users are gone; let's remove it.

Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>

binfmt: remove in-tree usage of MAP_DENYWRITE

At exec time when we mmap the new executable via MAP_DENYWRITE we have it
opened via do_open_execat() and already deny_write_access()'ed the file
successfully. Once exec completes, we allow_write_acces(); however,
we set mm->exe_file in begin_new_exec() via set_mm_exe_file() and
also deny_write_access() as long as mm->exe_file remains set. We'll
effectively deny write access to our executable via mm->exe_file
until mm->exe_file is changed -- when the process is removed, on new
exec, or via sys_prctl(PR_SET_MM_MAP/EXE_FILE).

Let's remove all usage of MAP_DENYWRITE, it's no longer necessary for
mm->exe_file.

In case of an elf interpreter, we'll now only deny write access to the file
during exec. This is somewhat okay, because the interpreter behaves
(and sometime is) a shared library; all shared libraries, especially the
ones loaded directly in user space like via dlopen() won't ever be mapped
via MAP_DENYWRITE, because we ignore that from user space completely;
these shared libraries can always be modified while mapped and executed.
Let's only special-case the main executable, denying write access while
being executed by a process. This can be considered a minor user space
visible change.

While this is a cleanup, it also fixes part of a problem reported with
VM_DENYWRITE on overlayfs, as VM_DENYWRITE is effectively unused with
this patch and will be removed next:
"Overlayfs did not honor positive i_writecount on realfile for
VM_DENYWRITE mappings." [1]

[1] https://lore.kernel.org/r/[email protected]/

Reported-by: Chengguang Xu <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>

kernel/fork: always deny write access to current MM exe_file

We want to remove VM_DENYWRITE only currently only used when mapping the
executable during exec. During exec, we already deny_write_access() the
executable, however, after exec completes the VMAs mapped
with VM_DENYWRITE effectively keeps write access denied via
deny_write_access().

Let's deny write access when setting or replacing the MM exe_file. With
this change, we can remove VM_DENYWRITE for mapping executables.

Make set_mm_exe_file() return an error in case deny_write_access()
fails; note that this should never happen, because exec code does a
deny_write_access() early and keeps write access denied when calling
set_mm_exe_file. However, it makes the code easier to read and makes
set_mm_exe_file() and replace_mm_exe_file() look more similar.

This represents a minor user space visible change:
sys_prctl(PR_SET_MM_MAP/EXE_FILE) can now fail if the file is already
opened writable. Also, after sys_prctl(PR_SET_MM_MAP/EXE_FILE) the file
cannot be opened writable. Note that we can already fail with -EACCES if
the file doesn't have execute permissions.

Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>

kernel/fork: factor out replacing the current MM exe_file

Let's factor the main logic out into replace_mm_exe_file(), such that
all mm->exe_file logic is contained in kernel/fork.c.

While at it, perform some simple cleanups that are possible now that
we're simplifying the individual functions.

Acked-by: Christian Brauner <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>

binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()

uselib() is the legacy systemcall for loading shared libraries.
Nowadays, applications use dlopen() to load shared libraries, completely
implemented in user space via mmap().

For example, glibc uses MAP_COPY to mmap shared libraries. While this
maps to MAP_PRIVATE | MAP_DENYWRITE on Linux, Linux ignores any
MAP_DENYWRITE specification from user space in mmap.

With this change, all remaining in-tree users of MAP_DENYWRITE use it
to map an executable. We will be able to open shared libraries loaded
via uselib() writable, just as we already can via dlopen() from user
space.

This is one step into the direction of removing MAP_DENYWRITE from the
kernel. This can be considered a minor user space visible change.

Acked-by: "Eric W. Biederman" <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>

netfilter: socket: icmp6: fix use-after-scope

Bug reported by KASAN:

BUG: KASAN: use-after-scope in inet6_ehashfn (net/ipv6/inet6_hashtables.c:40)
Call Trace:
(...)
inet6_ehashfn (net/ipv6/inet6_hashtables.c:40)
(...)
nf_sk_lookup_slow_v6 (net/ipv6/netfilter/nf_socket_ipv6.c:91
net/ipv6/netfilter/nf_socket_ipv6.c:146)

It seems that this bug has already been fixed by Eric Dumazet in the
past in:
commit 78296c97ca1f ("netfilter: xt_socket: fix a stack corruption bug")

But a variant of the same issue has been introduced in
commit d64d80a2cde9 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match")

`daddr` and `saddr` potentially hold a reference to ipv6_var that is no
longer in scope when the call to `nf_socket_get_sock_v6` is made.

Fixes: d64d80a2cde9 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match")
Acked-by: Matthieu Baerts <[email protected]>
Signed-off-by: Benjamin Hesmans <[email protected]>
Reviewed-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft into HEAD

* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft:
iscsi_ibft: Fix isa_bus_to_virt not working under ARM

io_uring: fix possible poll event lost in multi shot mode

IIUC, IORING_POLL_ADD_MULTI is similar to epoll's edge-triggered mode,
that means once one pure poll request returns one event(cqe), we'll
need to read or write continually until EAGAIN is returned, then I think
there is a possible poll event lost race in multi shot mode:

t1  poll request add |                         |
t2                   |                         |
t3  event happens    |                         |
t4  task work add    |                         |
t5                   | task work run           |
t6                   |   commit one cqe        |
t7                   |                         | user app handles cqe
t8                   |   new event happen      |
t9                   |   add back to waitqueue |
t10                  |

After t6 but before t9, if new event happens, there'll be no wakeup
operation, and if user app has picked up this cqe in t7, read or write
until EAGAIN is returned. In t8, new event happens and will be lost,
though this race window maybe small.

To fix this possible race, add poll request back to waitqueue before
committing cqe.

Fixes: 88e41cf928a6 ("io_uring: add multishot mode for IORING_OP_POLL_ADD")
Signed-off-by: Xiaoguang Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

libata: Add ATA_HORKAGE_NO_NCQ_ON_ATI for Samsung 860 and 870 SSD.

Many users are reporting that the Samsung 860 and 870 SSD are having
various issues when combined with AMD/ATI (vendor ID 0x1002) SATA
controllers and only completely disabling NCQ helps to avoid these
issues.

Always disabling NCQ for Samsung 860/870 SSDs regardless of the host
SATA adapter vendor will cause I/O performance degradation with well
behaved adapters. To limit the performance impact to ATI adapters,
introduce the ATA_HORKAGE_NO_NCQ_ON_ATI flag to force disable NCQ
only for these adapters.

Also, two libata.force parameters (noncqati and ncqati) are introduced
to disable and enable the NCQ for the system which equipped with ATI
SATA adapter and Samsung 860 and 870 SSDs. The user can determine NCQ
function to be enabled or disabled according to the demand.

After verifying the chipset from the user reports, the issue appears
on AMD/ATI SB7x0/SB8x0/SB9x0 SATA Controllers and does not appear on
recent AMD SATA adapters. The vendor ID of ATI should be 0x1002.
Therefore, ATA_HORKAGE_NO_NCQ_ON_AMD was modified to
ATA_HORKAGE_NO_NCQ_ON_ATI.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201693
Signed-off-by: Kate Hsuan <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Martin K. Petersen <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs

Commit ca6bfcb2f6d9 ("libata: Enable queued TRIM for Samsung SSD 860")
limited the existing ATA_HORKAGE_NO_NCQ_TRIM quirk from "Samsung SSD 8*",
covering all Samsung 800 series SSDs, to only apply to "Samsung SSD 840*"
and "Samsung SSD 850*" series based on information from Samsung.

But there is a large number of users which is still reporting issues
with the Samsung 860 and 870 SSDs combined with Intel, ASmedia or
Marvell SATA controllers and all reporters also report these problems
going away when disabling queued trims.

Note that with AMD SATA controllers users are reporting even worse
issues and only completely disabling NCQ helps there, this will be
addressed in a separate patch.

Fixes: ca6bfcb2f6d9 ("libata: Enable queued TRIM for Samsung SSD 860")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203475
Cc: [email protected]
Cc: Kate Hsuan <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

bio: fix kerneldoc documentation for bio_alloc_kiocb()

Apparently the last fixup got butter fingered a bit, the correct variable
name is 'nr_vecs', not 'nr_iovecs'.

Link: https://lore.kernel.org/lkml/[email protected]/
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge branch 'fixes' into next

Merge our fixes branch into next.

That lets us resolve a conflict in arch/powerpc/sysdev/xive/common.c.

Between cbc06f051c52 ("powerpc/xive: Do not skip CPU-less nodes when
creating the IPIs"), which moved request_irq() out of xive_init_ipis(),
and 17df41fec5b8 ("powerpc: use IRQF_NO_DEBUG for IPIs") which added
IRQF_NO_DEBUG to that request_irq() call, which has now moved.

net: remove the unnecessary check in cipso_v4_doi_free

The commit 733c99ee8be9 ("net: fix NULL pointer reference in
cipso_v4_doi_free") was merged by a mistake, this patch try
to cleanup the mess.

And we already have the commit e842cb60e8ac ("net: fix NULL
pointer reference in cipso_v4_doi_free") which fixed the root
cause of the issue mentioned in it's description.

Suggested-by: Paul Moore <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: bridge: mcast: fix vlan port router deadlock

Before vlan/port mcast router support was added
br_multicast_set_port_router was used only with bh already disabled due
to the bridge port lock, but that is no longer the case and when it is
called to configure a vlan/port mcast router we can deadlock with the
timer, so always disable bh to make sure it can be called from contexts
with both enabled and disabled bh.

Fixes: 2796d846d74a ("net: bridge: vlan: convert mcast router global option to per-vlan entry")
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: cs89x0: disable compile testing on powerpc

The ISA DMA API is inconsistent between architectures, and while
powerpc implements most of what the others have, it does not provide
isa_virt_to_bus():

../drivers/net/ethernet/cirrus/cs89x0.c: In function ‘net_open’:
../drivers/net/ethernet/cirrus/cs89x0.c:897:20: error: implicit declaration of function ‘isa_virt_to_bus’ [-Werror=implicit-function-declaration]
(unsigned long)isa_virt_to_bus(lp->dma_buff));
../drivers/net/ethernet/cirrus/cs89x0.c:894:3: note: in expansion of macro ‘cs89_dbg’
cs89_dbg(1, debug, "%s: dma %lx %lx\n",

I tried a couple of approaches to handle this consistently across
all architectures, but as this driver is really only used on
ARM, I ended up taking the easy way out and just disable compile
testing on powerpc.

Reported-by: Guenter Roeck <[email protected]>
Reported-by: Stephen Rothwell <[email protected]>
Reported-by: Reported-by: kernel test robot <[email protected]>
Fixes: 47fd22f2b847 ("cs89x0: rework driver configuration")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: usb: qmi_wwan: add Telit 0x1060 composition

This patch adds support for Telit LN920 0x1060 composition

0x1060: tty, adb, rmnet, tty, tty, tty, tty

Signed-off-by: Carlo Lobrano <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

io_uring: prolong tctx_task_work() with flushing

io_submit_flush_completions() may enqueue linked requests for task_work
execution, so don't leave tctx_task_work() right after the tw list is
exhausted, but try to flush and then retry.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/0755d4c2c36301447c63bdd4146c10477cea4249.1630539342.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

io_uring: don't disable kiocb_done() CQE batching

Not passing issue_flags from kiocb_done() into __io_complete_rw() means
that completion batching for this case is disabled, e.g. for most of
buffered reads.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/b2689462835c3ee28a5999ef4f9a581e24be04a2.1630539342.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL

SQPOLL has a different thread doing submissions, we need to check for
that and use the right task context when updating the worker values.
Just hold the sqd->lock across the operation, this ensures that the
thread cannot go away while we poke at ->io_uring.

Link: https://github.com/axboe/liburing/issues/420
Fixes: 2e480058ddc2 ("io-wq: provide a way to limit max number of workers")
Reported-by: Johannes Lundberg <[email protected]>
Tested-by: Johannes Lundberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

perf tests: Add test for PMU aliases

A perf uncore PMU may have two PMU names, a real name and an alias.

Add one test case to verify that the real and alias names have the same
effect.

Iterate sysfs to get one event which has an alias and create an evlist
by adding two evsels. Evsel1 is created by event and evsel2 is created
by alias.

Test asserts:

evsel1->core.attr.type == evsel2->core.attr.type
evsel1->core.attr.config == evsel2->core.attr.config

Signed-off-by: Jin Yao <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf pmu: Add PMU alias support

A perf uncore PMU may have two PMU names, a real name and an alias. The
alias is exported at /sys/bus/event_source/devices/uncore_*/alias.
The perf tool should support the alias as well.

Add alias_name in the struct perf_pmu to store the alias. For the PMU
which doesn't have an alias. It's NULL.

Introduce two X86 specific functions to retrieve the real name and the
alias separately.

Only go through the sysfs to retrieve the mapping between the real name
and the alias once. The result is cached in a list, uncore_pmu_list.

Nothing changed for the other ARCHs.

With the patch, the perf tool can monitor the PMU with either the real
name or the alias.

Use the real name,
$ perf stat -e uncore_cha_2/event=1/ -x,
4044879584,,uncore_cha_2/event=1/,2528059205,100.00,,

Use the alias,
$ perf stat -e uncore_type_0_2/event=1/ -x,
3659675336,,uncore_type_0_2/event=1/,2287306455,100.00,,

Committer notes:

Rename 'struct perf_pmu_alias_name' to 'pmu_alias', the 'perf_' prefix
should be used for libperf, things inside just tools/perf/ are being
moved away from that prefix.

Also 'pmu_alias' is shorter and reflects the abstraction.

Also don't use 'pmu' as the name for variables for that type, we should
use that for the 'struct perf_pmu' variables, avoiding confusion. Use
'pmu_alias' for 'struct pmu_alias' variables.

Co-developed-by: Jin Yao <[email protected]>
Co-developed-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Jin Yao <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

seg6_iptunnel: Remove redundant initialization of variable err

The variable err is being initialized with a value that is never read, it
is being updated later on. The assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

perf session: Report collisions in AUX records

Just like the other flags in the AUX records, report a summary of the
Collisions if there were any.

Signed-off-by: Suzuki Poulouse <[email protected]>
Reviewed-by: Leo Yan <[email protected]>
Reviewed-by: Mathieu Poirier <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: [email protected]
Cc: [email protected]
LPU-Reference: 20210728091219 [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf script python: Allow reporting the [un]throttle PERF_RECORD_ meta event

perf_events may sometimes throttle an event due to creating too many
samples during a given timer tick.

As of now, the perf tool will not report on throttling, which means this
is a silent error.

Implement a callback for the throttle and unthrottle events within the
Python scripting engine, which can allow scripts to detect and report
when events may have been lost due to throttling.

The simplest script to report throttle events is:

  def throttle(*args):
      print("throttle" + repr(args))

  def unthrottle(*args):
      print("unthrottle" + repr(args))

Signed-off-by: Stephen Brennan <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf build: Report failure for testing feature libopencsd

When build perf tool with passing option 'CORESIGHT=1' explicitly, if
the feature test fails for library libopencsd, the build doesn't
complain the feature failure and continue to build the tool with
disabling the CoreSight feature insteadly.

This patch changes the building behaviour, when build perf tool with the
option 'CORESIGHT=1' and detect the failure for testing feature
libopencsd, the build process will be aborted and it shows the complaint
info.

Committer testing:

First make sure there is no opencsd library installed:

  $ rpm -qa | grep -i csd
  $ sudo rm -rf `find /usr/local -name "*csd*"`
  $ find /usr/local -name "*csd*"
  $

Then cleanup the perf build output directory:

  $ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ;
  $

And try to build explicitely asking for coresight:

  $ make O=/tmp/build/perf CORESIGHT=1 O=/tmp/build/perf -C tools/perf install-bin
  make: Entering directory '/var/home/acme/git/perf/tools/perf'
    BUILD:   Doing 'make -j24' parallel build
    HOSTCC  /tmp/build/perf/fixdep.o
    HOSTLD  /tmp/build/perf/fixdep-in.o
    LINK    /tmp/build/perf/fixdep
  Makefile.config:493: *** Error: No libopencsd library found or the version is not up-to-date. Please install recent libopencsd to build with CORESIGHT=1.  Stop.
  make[1]: *** [Makefile.perf:238: sub-make] Error 2
  make: *** [Makefile:113: install-bin] Error 2
  make: Leaving directory '/var/home/acme/git/perf/tools/perf'
  $

Now install the opencsd library present in Fedora 34:

  $ sudo dnf install opencsd-devel
  <SNIP>
  Installed:
    opencsd-1.0.0-1.fc34.x86_64 opencsd-devel-1.0.0-1.fc34.x86_64
  Complete!
  $

Try again building with coresight:

  $ make O=/tmp/build/perf CORESIGHT=1 O=/tmp/build/perf -C tools/perf install-bin
  make: Entering directory '/var/home/acme/git/perf/tools/perf'
    BUILD:   Doing 'make -j24' parallel build
  Makefile.config:493: *** Error: No libopencsd library found or the version is not up-to-date. Please install recent libopencsd to build with CORESIGHT=1.  Stop.
  make[1]: *** [Makefile.perf:238: sub-make] Error 2
  make: *** [Makefile:113: install-bin] Error 2
  make: Leaving directory '/var/home/acme/git/perf/tools/perf'
  $

Since Fedora 34 is pretty recent, one assumes we need to get it from its
upstream git repository, use rpm to find where that is:

  $ rpm -q --qf "%{URL}\n" opencsd
  https://github.com/Linaro/OpenCSD
  $

Go there, clone the repo, build it and install into /usr/local, then try
again:

  $ cd ~acme/git/perf
  $ make O=/tmp/build/perf VF=1 CORESIGHT=1 O=/tmp/build/perf -C tools/perf install-bin | grep -i opencsd
  ...                    libopencsd: [ on  ]
    PERF_VERSION = 5.14.g454719f67a3d
  $ export LD_LIBRARY_PATH=/usr/local/lib
  $ ldd ~/bin/perf | grep opencsd
   libopencsd_c_api.so.1 => /usr/local/lib/libopencsd_c_api.so.1 (0x00007f28f78a4000)
   libopencsd.so.1 => /usr/local/lib/libopencsd.so.1 (0x00007f28f6a2e000)
  $

Now it works.

Requested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Show a warning for an unknown magic number

Currently perf reports "Cannot allocate memory" which isn't very helpful
for a potentially user facing issue. If we add a new magic number in
the future, perf will be able to report unrecognised magic numbers.

Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https //lore.kernel.org/r/20210806134109.1182235 [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Print the decoder name

Use the real name of the decoder instead of hard-coding "ETM" to avoid
confusion when the trace is ETE. This also now distinguishes between
ETMv3 and ETMv4.

Reviewed-by: Leo Yan <[email protected]>
Reviewed-by: Suzuki Poulouse <[email protected]>
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https //lore.kernel.org/r/20210806134109.1182235 [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Create ETE decoder

If the magic number indicates ETE instantiate a OCSD_BUILTIN_DCD_ETE
decoder instead of OCSD_BUILTIN_DCD_ETMV4I. ETE is the new trace feature
for Armv9.

Testing performed
=================

* Old files with v0 and v1 headers for ETMv4 still open correctly
* New files with new magic number open on new versions of perf
* New files with new magic number fail to open on old versions of perf
* Decoding with the ETE decoder results in the same output as the ETMv4
decoder as long as there are no new ETE packet types

Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
Acked-by: Suzuki Poulouse <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https //lore.kernel.org/r/20210806134109.1182235 [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Update OpenCSD decoder for ETE

OpenCSD v1.1.1 has a bug fix for the installation of the ETE decoder
headers. This also means that including headers separately for each
decoder is unnecessary so remove these.

Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
Acked-by: Suzuki Poulouse <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https //lore.kernel.org/r/20210806134109.1182235 [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Fix typo

TRCIRD2 should be TRCIDR2

Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
Acked-by: Suzuki Poulouse <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https //lore.kernel.org/r/20210806134109.1182235 [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Save TRCDEVARCH register

When ETE is present save the TRCDEVARCH register and set a new magic
number. It will be used to configure the decoder in a later commit.

Old versions of perf will not be able to open files with this new magic
number, but old files will still work with newer versions of perf.

Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
Acked-by: Suzuki Poulouse <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https //lore.kernel.org/r/20210806134109.1182235 [email protected]
[ Addressed some cosmetic suggestions by Suzuki Poulouse ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Refactor out ETMv4 header saving

Extract a function for saving the ETMv4 header because this will be used
for ETE in a later commit.

Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
Acked-by: Suzuki Poulouse <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https //lore.kernel.org/r/20210806134109.1182235 [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Initialise architecture based on TRCIDR1

Currently the architecture is hard coded as ARCH_V8, but from ETMv4.4
onwards this should be ARCH_AA64.

Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https //lore.kernel.org/r/20210806134109.1182235 [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Refactor initialisation of decoder params.

The initialisation of the decoder params is duplicated between
creation of the packet printer and packet decoder. Put them both
into one function so that future changes only need to be made in one
place.

Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
Acked-by: Suzuki Poulouse <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https //lore.kernel.org/r/20210806134109.1182235 [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tipc: clean up inconsistent indenting

There is a statement that is indented one character too deeply,
clean this up.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

skbuff: clean up inconsistent indenting

There is a statement that is indented one character too deeply,
clean this up.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drivers: net: smc911x: clean up inconsistent indenting

There are various function arguments that are not indented correctly,
clean these up with correct indentation.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: 3com: 3c59x: clean up inconsistent indenting

There is a statement that is not indented correctly, add in the
missing tab.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: Only send extra TCP acks in eligible socket states

Recent changes exposed a bug where specifically-timed requests to the
path manager netlink API could trigger a divide-by-zero in
__tcp_select_window(), as syzkaller does:

divide error: 0000 [#1] SMP KASAN NOPTI
CPU: 0 PID: 9667 Comm: syz-executor.0 Not tainted 5.14.0-rc6+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:__tcp_select_window+0x509/0xa60 net/ipv4/tcp_output.c:3016
Code: 44 89 ff e8 c9 29 e9 fd 45 39 e7 0f 8d 20 ff ff ff e8 db 28 e9 fd 44 89 e3 e9 13 ff ff ff e8 ce 28 e9 fd 44 89 e0 44 89 e3 99 <f7> 7c 24 04 29 d3 e9 fc fe ff ff e8 b7 28 e9 fd 44 89 f1 48 89 ea
RSP: 0018:ffff888031ccf020 EFLAGS: 00010216
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000040000
RDX: 0000000000000000 RSI: ffff88811532c080 RDI: 0000000000000002
RBP: 0000000000000000 R08: ffffffff835807c2 R09: 0000000000000000
R10: 0000000000000004 R11: ffffed1020b92441 R12: 0000000000000000
R13: 1ffff11006399e08 R14: 0000000000000000 R15: 0000000000000000
FS: 00007fa4c8344700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2f424000 CR3: 000000003e4e2003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
tcp_select_window net/ipv4/tcp_output.c:264 [inline]
__tcp_transmit_skb+0xc00/0x37a0 net/ipv4/tcp_output.c:1351
__tcp_send_ack.part.0+0x3ec/0x760 net/ipv4/tcp_output.c:3972
__tcp_send_ack net/ipv4/tcp_output.c:3978 [inline]
tcp_send_ack+0x7d/0xa0 net/ipv4/tcp_output.c:3978
mptcp_pm_nl_addr_send_ack+0x1ab/0x380 net/mptcp/pm_netlink.c:654
mptcp_pm_remove_addr+0x161/0x200 net/mptcp/pm.c:58
mptcp_nl_remove_id_zero_address+0x197/0x460 net/mptcp/pm_netlink.c:1328
mptcp_nl_cmd_del_addr+0x98b/0xd40 net/mptcp/pm_netlink.c:1359
genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731
genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792
netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504
genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0x14e/0x190 net/socket.c:724
____sys_sendmsg+0x709/0x870 net/socket.c:2403
___sys_sendmsg+0xff/0x170 net/socket.c:2457
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

mptcp_pm_nl_addr_send_ack() was attempting to send a TCP ACK on the
first subflow in the MPTCP socket's connection list without validating
that the subflow was in a suitable connection state. To address this,
always validate subflow state when sending extra ACKs on subflows
for address advertisement or subflow priority change.

Fixes: 84dfe3677a6f ("mptcp: send out dedicated ADD_ADDR packet")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/229
Co-developed-by: Paolo Abeni <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Acked-by: Geliang Tang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

pktgen: remove unused variable

pktgen_thread_worker() no longer needs wait variable, delete it.

Fixes: ef87979c273a ("pktgen: better scheduler friendliness")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ionic: fix double use of queue-lock

Deadlock seen in an instance where the hwstamp configuration
is changed while the driver is running:

[ 3988.736671]  schedule_preempt_disabled+0xe/0x10
[ 3988.736676]  __mutex_lock.isra.5+0x276/0x4e0
[ 3988.736683]  __mutex_lock_slowpath+0x13/0x20
[ 3988.736687]  ? __mutex_lock_slowpath+0x13/0x20
[ 3988.736692]  mutex_lock+0x2f/0x40
[ 3988.736711]  ionic_stop_queues_reconfig+0x16/0x40 [ionic]
[ 3988.736726]  ionic_reconfigure_queues+0x43e/0xc90 [ionic]
[ 3988.736738]  ionic_lif_config_hwstamp_rxq_all+0x85/0x90 [ionic]
[ 3988.736751]  ionic_lif_hwstamp_set_ts_config+0x29c/0x360 [ionic]
[ 3988.736763]  ionic_lif_hwstamp_set+0x76/0xf0 [ionic]
[ 3988.736776]  ionic_eth_ioctl+0x33/0x40 [ionic]
[ 3988.736781]  dev_ifsioc+0x12c/0x420
[ 3988.736785]  dev_ioctl+0x316/0x720

This can be demonstrated with "ptp4l -m -i <intf>"

To fix this, we pull the use of the queue_lock further up above the
callers of ionic_reconfigure_queues() and ionic_stop_queues_reconfig().

Fixes: 7ee99fc5ed2e ("ionic: pull hwstamp queue_lock up a level")
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

parisc: Fix unaligned-access crash in bootloader

Kernel v5.14 has various changes to optimize unaligned memory accesses,
e.g. commit 0652035a5794 ("asm-generic: unaligned: remove byteshift helpers").

Those changes triggered an unalignment-exception and thus crashed the
bootloader on parisc because the unaligned "output_len" variable now suddenly
was read word-wise while it was read byte-wise in the past.

Fix this issue by declaring the external output_len variable as char which then
forces the compiler to generate byte-accesses.

Signed-off-by: Helge Deller <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: John David Anglin <[email protected]>
Bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162
Fixes: 8c031ba63f8f ("parisc: Unbreak bootloader due to gcc-7 optimizations")
Fixes: 0652035a5794 ("asm-generic: unaligned: remove byteshift helpers")
Cc: <[email protected]> # v5.14+

kbuild: redo fake deps at include/ksym/*.h

Commit 0e0345b77ac4 ("kbuild: redo fake deps at include/config/*.h")
simplified the Kconfig/fixdep interaction a lot.

For CONFIG_FOO_BAR_BAZ, Kconfig now touches include/config/FOO_BAR_BAZ
instead of the previous include/config/foo/bar/baz.h .

This commit simplifies the TRIM_UNUSED_KSYMS feature in a similar way:

  - delete .h suffix
  - delete tolower()
  - put everything in 1 directory

For EXPORT_SYMBOL(FOO_BAR_BAZ), scripts/adjust_autoksyms.sh now touches
include/ksym/FOO_BAR_BAZ instead of include/ksym/foo/bar/baz.h .

This is more precise, avoiding possibly unnecessary rebuilds.

  EXPORT_SYMBOL(FOO_BAR_BAZ)
  EXPORT_SYMBOL(_FOO_BAR_BAZ)
  EXPORT_SYMBOL(__FOO_BAR_BAZ)

were previously mapped to the same header, include/ksym/foo/bar/baz.h
but now are handled separately.

Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: clean up objtool_args slightly

The code:

$(if $(or $(CONFIG_GCOV_KERNEL),$(CONFIG_LTO_CLANG)), ...)

... can be simpled to:

$(if $(CONFIG_GCOV_KERNEL)$(CONFIG_LTO_CLANG), ...)

Also, remove meaningless commas at the end of $(if ...).

Signed-off-by: Masahiro Yamada <[email protected]>

modpost: get the *.mod file path more simply

get_src_version() strips 'o' or 'lto.o' from the end of the object file
path (so, postfixlen is 1 or 5), then adds 'mod'.

If you look at the code closely, mod->name already holds the base path
with the extension stripped.

Most of the code changes made by commit 7ac204b545f2 ("modpost: lto:
strip .lto from module names") was actually unneeded.

sumversion.c does not need strends(), so it can get back local in
modpost.c again.

Signed-off-by: Masahiro Yamada <[email protected]>

checkkconfigsymbols.py: Fix the '--ignore' option

It seems like the implementation of the --ignore option is broken.

In check_symbols_helper, when going through the list of files, a file is
added to the list of source files to check if it matches the ignore
pattern. Instead, as stated in the comment below this condition, the
file should be added if it doesn't match the pattern.

This means that when providing an ignore pattern, the only files that
will be checked will be the ones we want the ignore, in addition to the
Kconfig files that don't match the pattern (the check in
parse_kconfig_files is done right)

Signed-off-by: Ariel Marcovitch <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: merge vmlinux_link() between ARCH=um and other architectures

For ARCH=um, ${CC} is used as the linker driver. Hence, the linker
options are prefixed with -Wl, .

Merge the similar code.

I replaced the -T option with the long option --script= so that it
works well with/without ${wl}.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Kees Cook <[email protected]>

kbuild: do not remove 'linux' link in scripts/link-vmlinux.sh

arch/um/Makefile passes the -f option to the ln command:

  linux: vmlinux
          @echo '  LINK $@'
          $(Q)ln -f $< $@

So, the hard link is always re-created, and the old one is removed
anyway.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Kees Cook <[email protected]>

kbuild: merge vmlinux_link() between the ordinary link and Clang LTO

When Clang LTO is enabled, vmlinux_link() reuses vmlinux.o instead of
re-linking ${KBUILD_VMLINUX_OBJS} and ${KBUILD_VMLINUX_LIBS}.

That is the only difference here, so merge the similar code.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Kees Cook <[email protected]>

kbuild: remove stale *.symversions

cmd_update_lto_symversions merges all the existing *.symversions, but
some of them might be stale.

If the last EXPORT_SYMBOL is removed from a C file, the *.symversions
file is not deleted or updated. It contains stale CRCs, but still they
will be used for linking the vmlinux or modules.

It is not a big deal when the EXPORT_SYMBOL is really removed. However,
when the EXPORT_SYMBOL is moved to another file, the same __crc_<symbol>
will appear twice in the merged *.symversions, possibly with different
CRCs if the function argument is changed at the same time. It would
confuse module versioning.

If no EXPORT_SYMBOL is found, let's remove *.symversions explicitly.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Kees Cook <[email protected]>

kbuild: remove unused quiet_cmd_update_lto_symversions

This is not used anywhere because the short log is displayed when
it is used through a $(call cmd,...) invocation.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Kees Cook <[email protected]>

gen_compile_commands: extract compiler command from a series of commands

The current gen_compile_commands.py assumes that objects are always
built by a single command.

It makes sense to support cases where objects are built by a series of
commands:

cmd_<object> := <command1> ; <command2>

One use-case is that <command1> is a compiler command, and <command2>
an objtool command.

It allows *.cmd files to contain an objtool command so that any change
in it triggers object rebuilds.

If ; appears after the C source file, take the first command.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Kees Cook <[email protected]>

x86: remove cc-option-yn test for -mtune=

As noted in the comment, -mtune= has been supported since GCC 3.4. The
minimum required version of GCC to build the kernel (as specified in
Documentation/process/changes.rst) is GCC 4.9.

tune is not immediately expanded. Instead it defines a macro that will
test via cc-option later values for -mtune=. But we can skip the test
whether to use -mtune= vs. -mcpu=.

Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Reviewed-by: Miguel Ojeda <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

arc: replace cc-option-yn uses with cc-option

cc-option-yn can be replaced with cc-option. ie.
Checking for support:
ifeq ($(call cc-option-yn,$(FLAG)),y)
becomes:
ifneq ($(call cc-option,$(FLAG)),)

Checking for lack of support:
ifeq ($(call cc-option-yn,$(FLAG)),n)
becomes:
ifeq ($(call cc-option,$(FLAG)),)

This allows us to pursue removing cc-option-yn.

Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

s390: replace cc-option-yn uses with cc-option

cc-option-yn can be replaced with cc-option. ie.
Checking for support:
ifeq ($(call cc-option-yn,$(FLAG)),y)
becomes:
ifneq ($(call cc-option,$(FLAG)),)

Checking for lack of support:
ifeq ($(call cc-option-yn,$(FLAG)),n)
becomes:
ifeq ($(call cc-option,$(FLAG)),)

This allows us to pursue removing cc-option-yn.

Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Acked-by: Heiko Carstens <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

ia64: move core-y in arch/ia64/Makefile to arch/ia64/Kbuild

Use obj-y to clean up Makefile.

Signed-off-by: Masahiro Yamada <[email protected]>

sparc: move the install rule to arch/sparc/Makefile

Currently, the install target in arch/sparc/Makefile descends into
arch/sparc/boot/Makefile to invoke the shell script, but there is no
good reason to do so.

arch/sparc/Makefile can run the shell script directly.

Signed-off-by: Masahiro Yamada <[email protected]>

security: remove unneeded subdir-$(CONFIG_...)

All of these are unneeded. The directories to descend are specified
by obj-$(CONFIG_...).

Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: sh: remove unused install script

The sh arch has a install.sh script, but no Makefile actually calls it.
Remove it to keep anyone from accidentally calling it in the future.

Signed-off-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y

When CONFIG_TRIM_UNUSED_KSYMS is enabled, I see some warnings like this:

nm: arch/x86/entry/vdso/vdso32/note.o: no symbols

$NM (both GNU nm and llvm-nm) warns when no symbol is found in the
object. Suppress the stderr.

Fangrui Song mentioned binutils>=2.37 `nm -q` can be used to suppress
"no symbols" [1], and llvm-nm>=13.0.0 supports -q as well.

We cannot use it for now, but note it as a TODO.

[1]: https://sourceware.org/bugzilla/show_bug.cgi?id=27408

Fixes: bbda5ec671d3 ("kbuild: simplify dependency generation for CONFIG_TRIM_UNUSED_KSYMS")
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>

kbuild: Switch to 'f' variants of integrated assembler flag

It has been brought up a few times in various code reviews that clang
3.5 introduced -f{,no-}integrated-as as the preferred way to enable and
disable the integrated assembler, mentioning that -{no-,}integrated-as
are now considered legacy flags.

Switch the kernel over to using those variants in case there is ever a
time where clang decides to remove the non-'f' variants of the flag.

Also, fix a typo in a comment ("intergrated" -> "integrated").

Link: https://releases.llvm.org/3.5.0/tools/clang/docs/ReleaseNotes.html#new-compiler-flags
Reviewed-by: Nick Desaulniers <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: Shuffle blank line to improve comment meaning

-Wunused-but-set-variable and -Wunused-const-variable are both disabled
for the same reason but there is a blank line between them and no blank
line between -Wno-unused-const-variable and the block.

Shuffle the new line so that it is clear that the comment applied to
both flags and the next block is separate from them.

Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: Add a comment above -Wno-gnu

Whenever a warning is disabled, it is helpful for future travelers to
understand why the warning is disabled and why it is acceptable to do
so. Add a comment for -Wno-gnu so that people understand why it is
disabled.

Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: Remove -Wno-format-invalid-specifier from clang block

Turning on -Wformat does not reveal any instances of this warning across
several different builds so remove this line to keep the number of
disabled warnings as slim as possible.

This has been disabled since commit 61163efae020 ("kbuild: LLVMLinux:
Add Kbuild support for building kernel with Clang"), which does not
explain exactly why it was turned off but since it was so long ago in
terms of both the kernel and LLVM so it is possible that some bug got
fixed along the way.

Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: warn if FORCE is missing for if_changed(_dep,_rule) and filechk

if_changed, if_changed_dep, and if_changed_rule must have FORCE as a
prerequisite so the command line change is detected.

Documentation/kbuild/makefiles.rst clearly explains it:

Note: It is a typical mistake to forget the FORCE prerequisite.

However, not all people follow the document.

This mistake occurred again and again, so a compelling force is needed.

Show a warning if FORCE is missing in the prerequisite of if_changed
and friends. Same for filechk.

Signed-off-by: Masahiro Yamada <[email protected]>
Tested-by: Nicolas Schier <[email protected]>