Git Repo - linux.git/log

fs/epoll: restore waking from ep_done_scan()

Commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested
epoll") changed the userspace visible behavior of exclusive waiters
blocked on a common epoll descriptor upon a single event becoming ready.

Previously, all tasks doing epoll_wait would awake, and now only one is
awoken, potentially causing missed wakeups on applications that rely on
this behavior, such as Apache Qpid.

While the aforementioned commit aims at having only a wakeup single path
in ep_poll_callback (with the exceptions of epoll_ctl cases), we need to
restore the wakeup in what was the old ep_scan_ready_list() such that
the next thread can be awoken, in a cascading style, after the waker's
corresponding ep_send_events().

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Roman Penyaev <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kselftest: introduce new epoll test case

Patch series "fs/epoll: restore user-visible behavior upon event ready".

This series tries to address a change in user visible behavior, reported
in https://bugzilla.kernel.org/show_bug.cgi?id=208943.

Epoll does not report an event to all the threads running epoll_wait()
on the same epoll descriptor. Unsurprisingly, this was bisected back to
339ddb53d373 (fs/epoll: remove unnecessary wakeups of nested epoll), which
has had various problems in the past, beyond only nested epoll usage.

This patch (of 2):

This incorporates the testcase originally reported in:

https://bugzilla.kernel.org/show_bug.cgi?id=208943

Which ensures an event is reported to all threads blocked on the same
epoll descriptor, otherwise only a single thread will receive the wakeup
once the event become ready.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Roman Penyaev <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

checkpatch: improve ALLOC_ARRAY_ARGS test

The devm_ variant of 'kcalloc()' and 'kmalloc_array()' are not tested
Add the corresponding check.

Link: https://lkml.kernel.org/r/205fc4847972fb6779abcc8818f39c14d1b45af1.1618595794.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <[email protected]>
Acked-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

checkpatch: exclude four preprocessor sub-expressions from MACRO_ARG_REUSE

__must_be_array, offsetof, sizeof_field and __stringify are all
preprocessor macros and do not evaluate their arguments. As such, it is
safe not to warn when arguments are being reused in those four
sub-expressions.

Exclude those so that they can pass checkpatch.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vincent Mailhol <[email protected]>
Acked-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

checkpatch: warn when missing newline in return sysfs_emit() formats

return sysfs_emit() uses should include a newline.

Suggest adding a newline when one is missing.
Add one using --fix too.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

include/linux/compat.h: remove unneeded declaration from COMPAT_SYSCALL_DEFINEx()

compat_sys##name is declared twice, just one line below.

With this removal SYSCALL_DEFINEx() (defined in <linux/syscalls.h>)
and COMPAT_SYSCALL_DEFINEx() look symmetrical.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: parser: clean up kernel-doc

Mark match_uint() as kernel-doc notation since it is already fully
annotated as such. Use % prefix on constants in kernel-doc comments.
Convert function return descriptions to use the "Return:" kernel-doc
notation.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib/genalloc: add parameter description to fix doc compile warning

Commit 52fbf1134d47 ("lib/genalloc.c: fix allocation of aligned buffer
from non-aligned chunk") added a new parameter 'start_addr' w/o
description for it. That causes some doc compile warning:

  lib/genalloc.c:649: warning: Function parameter or member 'start_addr' not described in 'gen_pool_first_fit'
  lib/genalloc.c:667: warning: Function parameter or member 'start_addr' not described in 'gen_pool_first_fit_align'
  lib/genalloc.c:694: warning: Function parameter or member 'start_addr' not described in 'gen_pool_fixed_alloc'
  lib/genalloc.c:729: warning: Function parameter or member 'start_addr' not described in 'gen_pool_first_fit_order_align'
  lib/genalloc.c:752: warning: Function parameter or member 'start_addr' not described in 'gen_pool_best_fit'

This fixes it by adding a parameter descriptions.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alex Shi <[email protected]>
Cc: Alexey Skidanov <[email protected]>
Cc: Huang Shijie <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Bhaskar Chowdhury <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib/percpu_counter: tame kernel-doc compile warning

commit 3e8f399da490 ("writeback: rework wb_[dec|inc]_stat family of
functions") add some function description of percpu_counter_add_batch.
but the double '*' in comments means a kernel-doc format comment which
isn't right.

Since the whole file of lib/percpu_counter.c has no any other kernel-doc
format comments, we'd better to remove this incomplete one to tame the
kernel-doc warning:

lib/percpu_counter.c:83: warning: Function parameter or member 'fbc' not described in 'percpu_counter_add_batch'
lib/percpu_counter.c:83: warning: Function parameter or member 'amount' not described in 'percpu_counter_add_batch'
lib/percpu_counter.c:83: warning: Function parameter or member 'batch' not described in 'percpu_counter_add_batch'

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alex Shi <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Cc: Stephen Boyd <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: stackdepot: turn depot_lock spinlock to raw_spinlock

In RT system, the spin_lock will be replaced by sleepable rt_mutex lock,
in __call_rcu(), disable interrupts before calling
kasan_record_aux_stack(), will trigger this calltrace:

  BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:951
  in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 19, name: pgdatinit0
  Call Trace:
    ___might_sleep.cold+0x1b2/0x1f1
    rt_spin_lock+0x3b/0xb0
    stack_depot_save+0x1b9/0x440
    kasan_save_stack+0x32/0x40
    kasan_record_aux_stack+0xa5/0xb0
    __call_rcu+0x117/0x880
    __exit_signal+0xafb/0x1180
    release_task+0x1d6/0x480
    exit_notify+0x303/0x750
    do_exit+0x678/0xcf0
    kthread+0x364/0x4f0
    ret_from_fork+0x22/0x30

Replace spinlock with raw_spinlock.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zqiang <[email protected]>
Reported-by: Andrew Halaney <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Vijayanand Jitta <[email protected]>
Cc: Vinayak Menon <[email protected]>
Cc: Yogesh Lal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: crc8: pointer to data block should be const

crc8() does not change the data passed to it, so the pointer argument
should be declared const. This avoids callers that receive const data
having to cast it to a non-const pointer to call crc8().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Richard Fitzgerald <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib/genalloc.c: Fix a typo

s/macthing/matching/

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bhaskar Chowdhury <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib/list_sort.c: fix typo in function description

Replace beautiully with beautifully

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: ShihCheng Tu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: fix inconsistent indenting in process_bit1()

Smatch gives the warning:
lib/decompress_unlzma.c:395 process_bit1() warn: inconsistent indenting

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Wang Qing <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib/bch.c: fix a typo in the file bch.c

s/buid/build/

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bhaskar Chowdhury <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: add entry for the bitmap API

Add myself as maintainer for bitmap API and Andy and Rasmus as reviewers.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Acked-by: Andy Shevchenko <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Jianpeng Ma <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

tools: sync lib/find_bit implementation

Add fast paths to find_*_bit() functions as per kernel implementation.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Jianpeng Ma <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: add fast path for find_first_*_bit() and find_last_bit()

Similarly to bitmap functions, users would benefit if we'll handle a case
of small-size bitmaps that fit into a single word.

While here, move the find_last_bit() declaration to bitops/find.h where
other find_*_bit() functions sit.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Acked-by: Andy Shevchenko <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Jianpeng Ma <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: add fast path for find_next_*_bit()

Similarly to bitmap functions, find_next_*_bit() users will benefit if
we'll handle a case of bitmaps that fit into a single word inline.  In the
very best case, the compiler may replace a function call with a few
instructions.

This is the quite typical find_next_bit() user:

unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
/* -1 is a legal arg here. */
if (n != -1)
cpumask_check(n);
return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
}
EXPORT_SYMBOL(cpumask_next);

Currently, on ARM64 the generated code looks like this:
0000000000000000 <cpumask_next>:
   0:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
   4:   11000402        add     w2, w0, #0x1
   8:   aa0103e0        mov     x0, x1
   c:   d2800401        mov     x1, #0x40                       // #64
  10:   910003fd        mov     x29, sp
  14:   93407c42        sxtw    x2, w2
  18:   94000000        bl      0 <find_next_bit>
  1c:   a8c17bfd        ldp     x29, x30, [sp], #16
  20:   d65f03c0        ret
  24:   d503201f        nop

After applying this patch:
0000000000000140 <cpumask_next>:
140:   11000400        add     w0, w0, #0x1
144:   93407c00        sxtw    x0, w0
148:   f100fc1f        cmp     x0, #0x3f
14c:   54000168        b.hi    178 <cpumask_next+0x38>  // b.pmore
150:   f9400023        ldr     x3, [x1]
154:   92800001        mov     x1, #0xffffffffffffffff         // #-1
158:   9ac02020        lsl     x0, x1, x0
15c:   52800802        mov     w2, #0x40                       // #64
160:   8a030001        and     x1, x0, x3
164:   dac00020        rbit    x0, x1
168:   f100003f        cmp     x1, #0x0
16c:   dac01000        clz     x0, x0
170:   1a800040        csel    w0, w2, w0, eq  // eq = none
174:   d65f03c0        ret
178:   52800800        mov     w0, #0x40                       // #64
17c:   d65f03c0        ret

find_next_bit() call is replaced with 6 instructions.  find_next_bit()
itself is 41 instructions plus function call overhead.

Despite inlining, the scripts/bloat-o-meter report smaller .text size
after applying the series:
add/remove: 11/9 grow/shrink: 233/176 up/down: 5780/-6768 (-988)

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Acked-by: Andy Shevchenko <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Jianpeng Ma <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

tools: sync find_next_bit implementation

Sync the implementation with recent kernel changes.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Jianpeng Ma <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: inline _find_next_bit() wrappers

lib/find_bit.c declares five single-line wrappers for _find_next_bit().
We may turn those wrappers to inline functions. It eliminates unneeded
function calls and opens room for compile-time optimizations.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Acked-by: Andy Shevchenko <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Jianpeng Ma <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

tools: sync small_const_nbits() macro with the kernel

Sync implementation with the kernel and move the macro from
tools/include/linux/bitmap.h to tools/include/asm-generic/bitsperlong.h

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Jianpeng Ma <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib: extend the scope of small_const_nbits() macro

find_bit would also benefit from small_const_nbits() optimizations. The
detailed comment is provided by Rasmus Villemoes.

Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Yury Norov <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Acked-by: Andy Shevchenko <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Jianpeng Ma <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arch: rearrange headers inclusion order in asm/bitops for m68k, sh and h8300

m68k and sh include bitmap/{find,le}.h prior to ffs/fls headers. New
fast-path implementation in find.h requires ffs/fls. Reordering the
headers inclusion sequence helps to prevent compile-time implicit function
declaration error.

[[email protected]: h8300: rearrange headers inclusion order in asm/bitops]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Jianpeng Ma <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

tools: sync BITMAP_LAST_WORD_MASK() macro with the kernel

Kernel version generates better code.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Jianpeng Ma <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

tools: bitmap: sync function declarations with the kernel

Some functions in tools/include/linux/bitmap.h declare nbits as int. In
the kernel nbits is declared as unsigned int.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Jianpeng Ma <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

tools: disable -Wno-type-limits

Patch series "lib/find_bit: fast path for small bitmaps", v6.

Bitmap operations are much simpler and faster in case of small bitmaps
which fit into a single word.  In linux/bitmap.c we have a machinery that
allows compiler to replace actual function call with a few instructions if
bitmaps passed into the function are small and their size is known at
compile time.

find_*_bit() API lacks this functionality; but users will benefit from it
a lot.  One important example is cpumask subsystem when NR_CPUS <=
BITS_PER_LONG.

This patch (of 12):

GENMASK(h, l) may be passed with unsigned types.  In such case,
type-limits warning is generated for example in case of GENMASK(h, 0).

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Acked-by: Rasmus Villemoes <[email protected]>
Cc: Alexey Klimov <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Jianpeng Ma <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: John Paul Adrian Glaubitz <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Stefano Brivio <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Wolfram Sang <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/cred.c: make init_groups static

init_groups is declared in both cred.h and init_task.h, but it is not
actually referenced anywhere outside of cred.c where it is defined. So
make it static and remove the declarations.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/async.c: fix pr_debug statement

An async_func_t returns void - any errors encountered it has to stash
somewhere for consumers to discover later.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rasmus Villemoes <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

linux/profile.h: remove unnecessary declaration

Declaring struct pt_regs is unnecessary. On the one hand, there is no
function using it; on the other hand, struct pt_regs has been declared in
linux/kernel.h. Remove them.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Wan Jiabing <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel.h: drop inclusion in bitmap.h

The bitmap.h header is used in a lot of code around the kernel.  Besides
that it includes kernel.h which sometimes makes a loop.

The problem here is many unneeded loops that make header hell
dependencies.  For example, how may you move bitmap_zalloc() from C-file
to the header?  Currently it's impossible.  And bitmap.h here is only the
tip of an iceberg.

kerne.h is a dump of everything that even has nothing in common at all.
We may still have it, but in my new code I prefer to include only the
headers that I want to use, without the bulk of unneeded kernel code.

Break the loop by introducing align.h, including it in kernel.h and
bitmap.h followed by replacing kernel.h with limits.h.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Yury Norov <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

include: remove pagemap.h from blkdev.h

My UEK-derived config has 1030 files depending on pagemap.h before this
change.  Afterwards, just 326 files need to be rebuilt when I touch
pagemap.h.  I think blkdev.h is probably included too widely, but
untangling that dependency is harder and this solves my problem.  x86
allmodconfig builds, but there may be implicit include problems on other
architectures.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Dan Williams <[email protected]> [nvdimm]
Acked-by: Jens Axboe <[email protected]> [block]
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Coly Li <[email protected]> [bcache]
Acked-by: Martin K. Petersen <[email protected]> [scsi]
Reviewed-by: William Kucharski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

proc/sysctl: fix function name error in comments

The function name should be modified to register_sysctl_paths instead of
register_sysctl_table_path.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: zhouchuangao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftests: proc: test subset=pid

Test that /proc instance mounted with

mount -t proc -o subset=pid

contains only ".", "..", "self", "thread-self" and pid directories.

Note:
Currently "subset=pid" doesn't return "." and ".." via readdir.
This must be a bug.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexey Dobriyan <[email protected]>
Acked-by: Alexey Gladkov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

proc: delete redundant subset=pid check

Two checks in lookup and readdir code should be enough to not have third
check in open code.

Can't open what can't be looked up?

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexey Dobriyan <[email protected]>
Acked-by: Alexey Gladkov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

proc: mandate ->proc_lseek in "struct proc_ops"

Now that proc_ops are separate from file_operations and other operations
it easy to check all instances to have ->proc_lseek hook and remove check
in main code.

Note:
nonseekable_open() files naturally don't require ->proc_lseek.

Garbage collect pde_lseek() function.

[[email protected]: smoke test lseek()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/YFYX0Bzwxlc7aBa/@localhost.localdomain
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

proc: save LOC in __xlate_proc_name()

Can't look at this verbosity anymore.

Link: https://lkml.kernel.org/r/YFYXAp/[email protected]
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/proc/generic.c: fix incorrect pde_is_permanent check

Currently the pde_is_permanent() check is being run on root multiple times
rather than on the next proc directory entry. This looks like a
copy-paste error. Fix this by replacing root with next.

Addresses-Coverity: ("Copy-paste error")
Link: https://lkml.kernel.org/r/[email protected]
Fixes: d919b33dafb3 ("proc: faster open/read/close with "permanent" files")
Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Reviewed-by: Alexey Dobriyan <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

alpha: csum_partial_copy.c: add function prototypes from <net/checksum.h>

Fix "no previous prototype" W=1 warnings from the kernel test robot:

  arch/alpha/lib/csum_partial_copy.c:349:1: error: no previous prototype for 'csum_and_copy_from_user' [-Werror=missing-prototypes]
  349 | csum_and_copy_from_user(const void __user *src, void *dst, int len)
      | ^~~~~~~~~~~~~~~~~~~~~~~
  arch/alpha/lib/csum_partial_copy.c:358:1: error: no previous prototype for 'csum_partial_copy_nocheck' [-Werror=missing-prototypes]
  358 | csum_partial_copy_nocheck(const void *src, void *dst, int len)
      | ^~~~~~~~~~~~~~~~~~~~~~~~~

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 808b49da54e6 ("alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user()")
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: kernel test robot <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

alpha: eliminate old-style function definitions

'make ARCH=alpha W=1' reports a couple of old-style function
definitions with missing parameter list, so fix those.

  arch/alpha/kernel/pc873xx.c: In function 'pc873xx_get_base':
  arch/alpha/kernel/pc873xx.c:16:21: warning: old-style function definition [-Wold-style-definition]
   16 | unsigned int __init pc873xx_get_base()

  arch/alpha/kernel/pc873xx.c: In function 'pc873xx_get_model':
  arch/alpha/kernel/pc873xx.c:21:14: warning: old-style function definition [-Wold-style-definition]
   21 | char *__init pc873xx_get_model()

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'drm-intel-next-fixes-2021-04-30' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

drm/i915 GVT fixes for v5.13-rc1:
- Fix a possible division by zero in vgpu display rate calculation

Signed-off-by: Dave Airlie <[email protected]>
From: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

tcp: Specify cmsgbuf is user pointer for receive zerocopy.

A prior change (1f466e1f15cf) introduces separate handling for
->msg_control depending on whether the pointer is a kernel or user
pointer. However, while tcp receive zerocopy is using this field, it
is not properly annotating that the buffer in this case is a user
pointer. This can cause faults when the improper mechanism is used
within put_cmsg().

This patch simply annotates tcp receive zerocopy's use as explicitly
being a user pointer.

Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.")
Signed-off-by: Arjun Roy <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: spectrum_mr: Update egress RIF list before route's action

Each multicast route that is forwarding packets (as opposed to trapping
them) points to a list of egress router interfaces (RIFs) through which
packets are replicated.

A route's action can transition from trap to forward when a RIF is
created for one of the route's egress virtual interfaces (eVIF). When
this happens, the route's action is first updated and only later the
list of egress RIFs is committed to the device.

This results in the route pointing to an invalid list. In case the list
pointer is out of range (due to uninitialized memory), the device will
complain:

mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=5733bf490000905c,reg_id=300f(pefa),type=write,status=7(bad parameter))

Fix this by first committing the list of egress RIFs to the device and
only later update the route's action.

Note that a fix is not needed in the reverse function (i.e.,
mlxsw_sp_mr_route_evif_unresolve()), as there the route's action is
first updated and only later the RIF is removed from the list.

Cc: [email protected]
Fixes: c011ec1bbfd6 ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: ipa: fix inter-EE IRQ register definitions

In gsi_irq_setup(), two registers are written with the intention of
disabling inter-EE channel and event IRQs.

But the wrong registers are used (and defined); the ones used are
read-only registers that indicate whether the interrupt condition is
present.

Define the mask registers instead of the status registers, and use
them to disable the inter-EE interrupt types.

Fixes: 46f748ccaf01 ("net: ipa: explicitly disallow inter-EE interrupts")
Signed-off-by: Alex Elder <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'linux-can-fixes-for-5.13-20210506' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2021-05-06

The first two patches target the mcp251xfd driver. Dan Carpenter's
patch fixes a NULL pointer dereference in the probe function's error
path. A patch by me adds the missing can_rx_offload_del() in error
path of the probe function.

Frieder Schrempf contributes a patch for the mcp251x driver, the patch
fixes the resume from sleep before interface was brought up.

The last patch is by me and fixes a race condition in the TX path of
the m_can driver for peripheral (SPI) based m_can cores.

* tag 'linux-can-fixes-for-5.13-20210506' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
  can: mcp251x: fix resume from sleep before interface was brought up
  can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
  can: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probe
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Input: xpad - add support for Amazon Game Controller

The Amazon Luna controller (product name "Amazon Game Controller") behaves
like an Xbox 360 controller when connected over USB.

Signed-off-by: Matt Reynolds <[email protected]>
Reviewed-by: Harry Cutts <[email protected]>
Link: https://lore.kernel.org/r/20210429103548.1.If5f9a44cb81e25b9350f7c6c0b3c88b4ecd81166@changeid
Signed-off-by: Dmitry Torokhov <[email protected]>

Input: ili210x - add missing negation for touch indication on ili210x

This adds the negation needed for proper finger detection on Ilitek
ili2107/ili210x. This fixes polling issues (on Amazon Kindle Fire)
caused by returning false for the cooresponding finger on the touchscreen.

Signed-off-by: Hansem Ro <[email protected]>
Fixes: e3559442afd2a ("ili210x - rework the touchscreen sample processing")
Cc: [email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>

Merge tag 's390-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Heiko Carstens:

- add support for system call stack randomization

- handle stale PCI deconfiguration events

- couple of defconfig updates

- some fixes and cleanups

* tag 's390-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: fix detection of vector enhancements facility 1 vs. vector packed decimal facility
  s390/entry: add support for syscall stack randomization
  s390/configs: change CONFIG_VIRTIO_CONSOLE to "m"
  s390/cio: remove invalid condition on IO_SCH_UNREG
  s390/cpumf: remove call to perf_event_update_userpage
  s390/cpumf: move counter set size calculation to common place
  s390/cpumf: beautify if-then-else indentation
  s390/configs: enable CONFIG_PCI_IOV
  s390/pci: handle stale deconfiguration events
  s390/pci: rename zpci_configure_device()

Merge tag 'vfio-v5.13-rc1pt2' of git://github.com/awilliam/linux-vfio

Pull more VFIO updates from Alex Williamson:
"A second small set of commits for this merge window, primarily to
  unbreak some deletions from our uAPI header.

   - Additional mdev sample driver cleanup (Dan Carpenter)

   - Doc fix (Alyssa Ross)

   - Unbreak uAPI from NVLink2 support removal (Alex Williamson)"

* tag 'vfio-v5.13-rc1pt2' of git://github.com/awilliam/linux-vfio:
  docs: vfio: fix typo
  vfio/pci: Revert nvlink removal uAPI breakage
  vfio/mdev: remove unnecessary NULL check in mbochs_create()

futex: Make syscall entry points less convoluted

The futex and the compat syscall entry points do pretty much the same
except for the timespec data types and the corresponding copy from
user function.

Split out the rest into inline functions and share the functionality.

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

futex: Get rid of the val2 conditional dance

There is no point in checking which FUTEX operand treats the utime pointer
as 'val2' argument because that argument to do_futex() is only used by
exactly these operands.

So just handing it in unconditionally is not making any difference, but
removes a lot of pointless gunk.

Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

futex: Do not apply time namespace adjustment on FUTEX_LOCK_PI

FUTEX_LOCK_PI does not require to have the FUTEX_CLOCK_REALTIME bit set
because it has been using CLOCK_REALTIME based absolute timeouts
forever. Due to that, the time namespace adjustment which is applied when
FUTEX_CLOCK_REALTIME is not set, will wrongly take place for FUTEX_LOCK_PI
and wreckage the timeout.

Exclude it from that procedure.

Fixes: c2f7d08cccf4 ("futex: Adjust absolute futex timeouts with per time namespace offset")
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]

Revert 337f13046ff0 ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")

The FUTEX_WAIT operand has historically a relative timeout which means that
the clock id is irrelevant as relative timeouts on CLOCK_REALTIME are not
subject to wall clock changes and therefore are mapped by the kernel to
CLOCK_MONOTONIC for simplicity.

If a caller would set FUTEX_CLOCK_REALTIME for FUTEX_WAIT the timeout is
still treated relative vs. CLOCK_MONOTONIC and then the wait arms that
timeout based on CLOCK_REALTIME which is broken and obviously has never
been used or even tested.

Reject any attempt to use FUTEX_CLOCK_REALTIME with FUTEX_WAIT again.

The desired functionality can be achieved with FUTEX_WAIT_BITSET and a
FUTEX_BITSET_MATCH_ANY argument.

Fixes: 337f13046ff0 ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]

Merge branch 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux

Pull pcmcia updates from Dominik Brodowski:
"A number of patches fixing W=1 kernel build warnings, and one patch
  removing an always-false NULL check"

* 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
  pcmcia: rsrc_nonstatic: Fix call-back function as reference formatting
  pcmcia: pcmcia_resource: Fix some kernel-doc formatting/disparities and demote others
  pcmcia: ds: Fix function name disparity in header
  pcmcia: pcmcia_cis: Demote non-conforming kernel-doc headers to standard kernel-doc
  pcmcia: cistpl: Demote non-conformant kernel-doc headers to standard comments
  pcmcia: rsrc_nonstatic: Demote kernel-doc abuses
  pcmcia: ds: Remove if with always false condition

Merge tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
"Notable items here are

   - a series to take advantage of David Howells' netfs helper library
     from Jeff

   - three new filesystem client metrics from Xiubo

   - ceph.dir.rsnaps vxattr from Yanhu

   - two auth-related fixes from myself, marked for stable.

  Interspersed is a smattering of assorted fixes and cleanups across the
  filesystem"

* tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client: (24 commits)
  libceph: allow addrvecs with a single NONE/blank address
  libceph: don't set global_id until we get an auth ticket
  libceph: bump CephXAuthenticate encoding version
  ceph: don't allow access to MDS-private inodes
  ceph: fix up some bare fetches of i_size
  ceph: convert some PAGE_SIZE invocations to thp_size()
  ceph: support getting ceph.dir.rsnaps vxattr
  ceph: drop pinned_page parameter from ceph_get_caps
  ceph: fix inode leak on getattr error in __fh_to_dentry
  ceph: only check pool permissions for regular files
  ceph: send opened files/pinned caps/opened inodes metrics to MDS daemon
  ceph: avoid counting the same request twice or more
  ceph: rename the metric helpers
  ceph: fix kerneldoc copypasta over ceph_start_io_direct
  ceph: use attach/detach_page_private for tracking snap context
  ceph: don't use d_add in ceph_handle_snapdir
  ceph: don't clobber i_snap_caps on non-I_NEW inode
  ceph: fix fall-through warnings for Clang
  ceph: convert ceph_readpages to ceph_readahead
  ceph: convert ceph_write_begin to netfs_write_begin
  ...

Merge tag 'ecryptfs-5.13-rc1-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs updates from Tyler Hicks:
"Code cleanups and a bug fix

   - W=1 compiler warning cleanups

   - Mutex initialization simplification

   - Protect against NULL pointer exception during mount"

* tag 'ecryptfs-5.13-rc1-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  ecryptfs: fix kernel panic with null dev_name
  ecryptfs: remove unused helpers
  ecryptfs: Fix typo in message
  eCryptfs: Use DEFINE_MUTEX() for mutex lock
  ecryptfs: keystore: Fix some kernel-doc issues and demote non-conformant headers
  ecryptfs: inode: Help out nearly-there header and demote non-conformant ones
  ecryptfs: mmap: Help out one function header and demote other abuses
  ecryptfs: crypto: Supply some missing param descriptions and demote abuses
  ecryptfs: miscdev: File headers are not good kernel-doc candidates
  ecryptfs: main: Demote a bunch of non-conformant kernel-doc headers
  ecryptfs: messaging: Add missing param descriptions and demote abuses
  ecryptfs: super: Fix formatting, naming and kernel-doc abuses
  ecryptfs: file: Demote kernel-doc abuses
  ecryptfs: kthread: Demote file header and provide description for 'cred'
  ecryptfs: dentry: File headers are not good candidates for kernel-doc
  ecryptfs: debug: Demote a couple of kernel-doc abuses
  ecryptfs: read_write: File headers do not make good candidates for kernel-doc
  ecryptfs: use DEFINE_MUTEX() for mutex lock
  eCryptfs: add a semicolon

Merge tag 'trace-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
"Fix probes written to the set_ftrace_filter file

  Now that there's a library that accesses the tracefs file system
  (libtracefs), the way the files are interacted with is slightly
  different than the command line. For instance, the write() system call
  is used directly instead of an echo. This exposes some old bugs.

  If a probe is written to "set_ftrace_filter" without any white space
  after it, it will be ignored. This is because the write expects that a
  string written to it that does not end with white spaces thinks there
  is more to come. But if the file is closed, the release function needs
  to finish it. The "set_ftrace_filter" release function handles the
  filter part of the "set_ftrace_filter" file, but did not handle the
  probe part"

* tag 'trace-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Handle commands when closing set_ftrace_filter file

Merge tag 'acpi-5.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These revert one recent commit that turned out to be problematic,
  address two issues in the ACPI "custom method" interface and update
  GPIO properties documentation.

  Specifics:

   - Revent recent commit related to the handling of ACPI power
     resources during initialization, because it turned out to cause
     problems to occur on some systems (Rafael Wysocki).

   - Fix potential use-after-free and potential memory leak in the ACPI
     "custom method" debugfs interface (Mark Langsdorf).

   - Update ACPI GPIO properties documentation to cover assumptions
     regarding GPIO polarity (Andy Shevchenko)"

* tag 'acpi-5.13-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPI: scan: Turn off unused power resources during initialization"
  ACPI: custom_method: fix a possible memory leak
  ACPI: custom_method: fix potential use-after-free issue
  Documentation: firmware-guide: gpio-properties: Add note to SPI CS case

Merge tag 'devicetree-fixes-for-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Several Renesas binding fixes to fix warnings

- Remove duplicate compatibles in 8250 binding

- Remove orphaned Sigma Designs Tango bindings

- Fix bcm2711-hdmi binding to use 'additionalProperties'

- Fix idt,32434-pic warning for missing 'interrupts' property

- Fix 'stored but not read' warnings in DT overlay code

* tag 'devicetree-fixes-for-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: net: renesas,etheravb: Fix optional second clock name
  dt-bindings: display: renesas,du: Add missing power-domains property
  dt-bindings: media: renesas,vin: Make resets optional on R-Car Gen1
  dt-bindings: PCI: rcar-pci-host: Document missing R-Car H1 support
  of: overlay: Remove redundant assignment to ret
  dt-bindings: serial: 8250: Remove duplicated compatible strings
  dt-bindings: Remove unused Sigma Designs Tango bindings
  dt-bindings: bcm2711-hdmi: Fix broken schema
  dt-bindings: interrupt-controller: idt,32434-pic: Add missing interrupts property

riscv: remove unused handle_exception symbol

Since commit 79b1feba5455 ("RISC-V: Setup exception vector early")
exception vectors are setup early and the handle_exception symbol from
the asm files is no longer referenced in traps.c. Remove it.

Signed-off-by: Rouven Czerwinski <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: Consistify protect_kernel_linear_mapping_text_rodata() use

The various uses of protect_kernel_linear_mapping_text_rodata() are
not consistent:
  - Its definition depends on "64BIT && !XIP_KERNEL",
  - Its forward declaration depends on MMU,
  - Its single caller depends on "STRICT_KERNEL_RWX && 64BIT && MMU &&
    !XIP_KERNEL".

Fix this by settling on the dependencies of the caller, which can be
simplified as STRICT_KERNEL_RWX depends on "MMU && !XIP_KERNEL".
Provide a dummy definition, as the caller is protected by
"IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)" instead of "#ifdef
CONFIG_STRICT_KERNEL_RWX".

Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Alexandre Ghiti <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: enable SiFive errata CIP-453 and CIP-1200 Kconfig only if CONFIG_64BIT=y

The corresponding hardware issues of CONFIG_ERRATA_SIFIVE_CIP_453 and
CONFIG_ERRATA_SIFIVE_CIP_1200 only exist in the SiFive 64bit CPU cores.
Therefore, these two errata are required only if CONFIG_64BIT=y

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Vincent Chen <[email protected]>
Fixes: bff3ff525460 ("riscv: sifive: Apply errata "cip-1200" patch")
Fixes: 800149a77c2c ("riscv: sifive: Apply errata "cip-453" patch")
Signed-off-by: Palmer Dabbelt <[email protected]>

riscv: Only extend kernel reservation if mapped read-only

When the kernel mapping was moved outside of the linear mapping, the
kernel memory reservation was increased, to take into account mapping
granularity. However, this is done unconditionally, regardless of
whether the kernel memory is mapped read-only or not.

If this extension is not needed, up to 2 MiB may be lost, which has a
big impact on e.g. Canaan K210 (64-bit nommu) platforms with only 8 MiB
of RAM.

Reclaim the lost memory by only extending the reserved region when
needed, i.e. depending on a simplified version of the conditional logic
around the call to protect_kernel_linear_mapping_text_rodata().

Fixes: 2bfc6cd81bd17e43 ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Alexandre Ghiti <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM updates from Russell King:

- Fix BSS size calculation for LLVM

- Improve robustness of kernel entry around v7_invalidate_l1

- Fix and update kprobes assembly

- Correct breakpoint overflow handler check

- Pause function graph tracer when suspending a CPU

- Switch to generic syscallhdr.sh and syscalltbl.sh

- Remove now unused set_kernel_text_r[wo] functions

- Updates for ptdump (__init marking and using DEFINE_SHOW_ATTRIBUTE)

- Fix for interrupted SMC (secure) calls

- Remove Compaq Personal Server platform

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: footbridge: remove personal server platform
  ARM: 9075/1: kernel: Fix interrupted SMC calls
  ARM: 9074/1: ptdump: convert to DEFINE_SHOW_ATTRIBUTE
  ARM: 9073/1: ptdump: add __init section marker to three functions
  ARM: 9072/1: mm: remove set_kernel_text_r[ow]()
  ARM: 9067/1: syscalls: switch to generic syscallhdr.sh
  ARM: 9068/1: syscalls: switch to generic syscalltbl.sh
  ARM: 9066/1: ftrace: pause/unpause function graph tracer in cpu_suspend()
  ARM: 9064/1: hw_breakpoint: Do not directly check the event's overflow_handler hook
  ARM: 9062/1: kprobes: rewrite test-arm.c in UAL
  ARM: 9061/1: kprobes: fix UNPREDICTABLE warnings
  ARM: 9060/1: kexec: Remove unused kexec_reinit callback
  ARM: 9059/1: cache-v7: get rid of mini-stack
  ARM: 9058/1: cache-v7: refactor v7_invalidate_l1 to avoid clobbering r5/r6
  ARM: 9057/1: cache-v7: add missing ISB after cache level selection
  ARM: 9056/1: decompressor: fix BSS size calculation for LLVM ld.lld

Merge tag 'riscv-for-linus-5.13-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

- Support for the memtest= kernel command-line argument.

- Support for building the kernel with FORTIFY_SOURCE.

- Support for generic clockevent broadcasts.

- Support for the buildtar build target.

- Some build system cleanups to pass more LLVM-friendly arguments.

- Support for kprobes.

- A rearranged kernel memory map, the first part of supporting sv48
   systems.

- Improvements to kexec, along with support for kdump and crash
   kernels.

- An alternatives-based errata framework, along with support for
   handling a pair of errata that manifest on some SiFive designs
   (including the HiFive Unmatched).

- Support for XIP.

- A device tree for the Microchip PolarFire ICICLE SoC and associated
   dev board.

... along with a bunch of cleanups.  There are already a handful of fixes
on the list so there will likely be a part 2.

* tag 'riscv-for-linus-5.13-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (45 commits)
  RISC-V: Always define XIP_FIXUP
  riscv: Remove 32b kernel mapping from page table dump
  riscv: Fix 32b kernel build with CONFIG_DEBUG_VIRTUAL=y
  RISC-V: Fix error code returned by riscv_hartid_to_cpuid()
  RISC-V: Enable Microchip PolarFire ICICLE SoC
  RISC-V: Initial DTS for Microchip ICICLE board
  dt-bindings: riscv: microchip: Add YAML documentation for the PolarFire SoC
  RISC-V: Add Microchip PolarFire SoC kconfig option
  RISC-V: enable XIP
  RISC-V: Add crash kernel support
  RISC-V: Add kdump support
  RISC-V: Improve init_resources()
  RISC-V: Add kexec support
  RISC-V: Add EM_RISCV to kexec UAPI header
  riscv: vdso: fix and clean-up Makefile
  riscv/mm: Use BUG_ON instead of if condition followed by BUG.
  riscv/kprobe: fix kernel panic when invoking sys_read traced by kprobe
  riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU
  riscv: module: Create module allocations without exec permissions
  riscv: bpf: Avoid breaking W^X
  ...

Merge tag 'hexagon-5.13-0' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux

Pull Hexagon updates from Brian Cain:
"Hexagon architecture build fixes + builtins

  Small build fixes applied:

   - use -mlong-calls to build

   - extend jumps in futex_atomic_*

   - etc

  Also, for convenience and portability, the hexagon compiler builtin
  functions like memcpy etc have been added to the kernel -- following
  the idiom used by other architectures"

* tag 'hexagon-5.13-0' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux:
  Hexagon: add target builtins to kernel
  Hexagon: remove DEBUG from comet config
  Hexagon: change jumps to must-extend in futex_atomic_*
  Hexagon: fix build errors

Merge tag 'docs-5.13-2' of git://git.lwn.net/linux

Pull documentation fixes from Jonathan Corbet:
"A few late-arriving documentation fixes, including some oprofile
  cleanup, a kernel-doc fix, some regression-reporting updates, and the
  usual minor fixes"

* tag 'docs-5.13-2' of git://git.lwn.net/linux:
  Enlisted oprofile version line removed
  oprofiled version output line removed from the list
  Removed the oprofiled version option
  docs: reporting-issues.rst: CC subsystem and maintainers on regressions
  docs: correct URL to bios and kernel developer's guide
  docs/core-api: Consistent code style
  docs/zh_CN: Adjust order and content of zh_CN/index.rst
  Documentation: input: joydev file corrections
  docs: Fix typo in Documentation/x86/x86_64/5level-paging.rst
  kernel-doc: Add support for __deprecated

block: reexpand iov_iter after read/write

We get a bug:

BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404
lib/iov_iter.c:1139
Read of size 8 at addr ffff0000d3fb11f8 by task

CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted
5.10.0-00843-g352c8610ccd2 #2
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x110/0x164 lib/dump_stack.c:118
print_address_description+0x78/0x5c8 mm/kasan/report.c:385
__kasan_report mm/kasan/report.c:545 [inline]
kasan_report+0x148/0x1e4 mm/kasan/report.c:562
check_memory_region_inline mm/kasan/generic.c:183 [inline]
__asan_load8+0xb4/0xbc mm/kasan/generic.c:252
iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139
io_read fs/io_uring.c:3421 [inline]
io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943
__io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
io_submit_sqe fs/io_uring.c:6395 [inline]
io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
__do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
__se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
__arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
__invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670

Allocated by task 12570:
stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
kasan_save_stack mm/kasan/common.c:48 [inline]
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
__kmalloc+0x23c/0x334 mm/slub.c:3970
kmalloc include/linux/slab.h:557 [inline]
__io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210
io_setup_async_rw fs/io_uring.c:3229 [inline]
io_read fs/io_uring.c:3436 [inline]
io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943
__io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
io_submit_sqe fs/io_uring.c:6395 [inline]
io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
__do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
__se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
__arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
__invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670

Freed by task 12570:
stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
kasan_save_stack mm/kasan/common.c:48 [inline]
kasan_set_track+0x38/0x6c mm/kasan/common.c:56
kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
__kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
slab_free_hook mm/slub.c:1544 [inline]
slab_free_freelist_hook mm/slub.c:1577 [inline]
slab_free mm/slub.c:3142 [inline]
kfree+0x104/0x38c mm/slub.c:4124
io_dismantle_req fs/io_uring.c:1855 [inline]
__io_free_req+0x70/0x254 fs/io_uring.c:1867
io_put_req_find_next fs/io_uring.c:2173 [inline]
__io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279
__io_req_task_submit+0x154/0x21c fs/io_uring.c:2051
io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063
task_work_run+0xdc/0x128 kernel/task_work.c:151
get_signal+0x6f8/0x980 kernel/signal.c:2562
do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658
do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722
work_pending+0xc/0x180

blkdev_read_iter can truncate iov_iter's count since the count + pos may
exceed the size of the blkdev. This will confuse io_read that we have
consume the iovec. And once we do the iov_iter_revert in io_read, we
will trigger the slab-out-of-bounds. Fix it by reexpand the count with
size has been truncated.

blkdev_write_iter can trigger the problem too.

Signed-off-by: yangerkun <[email protected]>
Acked-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Merge branches 'acpi-pm' and 'acpi-docs'

* acpi-pm:
Revert "ACPI: scan: Turn off unused power resources during initialization"

* acpi-docs:
Documentation: firmware-guide: gpio-properties: Add note to SPI CS case

locking/qrwlock: Cleanup queued_write_lock_slowpath()

Make the code more readable by replacing the atomic_cmpxchg_acquire()
by an equivalent atomic_try_cmpxchg_acquire() and change atomic_add()
to atomic_or().

For architectures that use qrwlock, I do not find one that has an
atomic_add() defined but not an atomic_or(). I guess it should be fine
by changing atomic_add() to atomic_or().

Note that the previous use of atomic_add() isn't wrong as only one
writer that is the wait_lock owner can set the waiting flag and the
flag will be cleared later on when acquiring the write lock.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Will Deacon <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

smp: Fix smp_call_function_single_async prototype

As of commit 966a967116e6 ("smp: Avoid using two cache lines for struct
call_single_data"), the smp code prefers 32-byte aligned call_single_data
objects for performance reasons, but the block layer includes an instance
of this structure in the main 'struct request' that is more senstive
to size than to performance here, see 4ccafe032005 ("block: unalign
call_single_data in struct request").

The result is a violation of the calling conventions that clang correctly
points out:

block/blk-mq.c:630:39: warning: passing 8-byte aligned argument to 32-byte aligned parameter 2 of 'smp_call_function_single_async' may result in an unaligned pointer access [-Walign-mismatch]
smp_call_function_single_async(cpu, &rq->csd);

It does seem that the usage of the call_single_data without cache line
alignment should still be allowed by the smp code, so just change the
function prototype so it accepts both, but leave the default alignment
unchanged for the other users. This seems better to me than adding
a local hack to shut up an otherwise correct warning in the caller.

Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Jens Axboe <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

x86/events/amd/iommu: Fix invalid Perf result due to IOMMU PMC power-gating

On certain AMD platforms, when the IOMMU performance counter source
(csource) field is zero, power-gating for the counter is enabled, which
prevents write access and returns zero for read access.

This can cause invalid perf result especially when event multiplexing
is needed (i.e. more number of events than available counters) since
the current logic keeps track of the previously read counter value,
and subsequently re-program the counter to continue counting the event.
With power-gating enabled, we cannot gurantee successful re-programming
of the counter.

Workaround this issue by :

1. Modifying the ordering of setting/reading counters and enabing/
   disabling csources to only access the counter when the csource
   is set to non-zero.

2. Since AMD IOMMU PMU does not support interrupt mode, the logic
   can be simplified to always start counting with value zero,
   and accumulate the counter value when stopping without the need
   to keep track and reprogram the counter with the previously read
   counter value.

This has been tested on systems with and without power-gating.

Fixes: 994d6608efe4 ("iommu/amd: Remove performance counter pre-initialization test")
Suggested-by: Alexander Monakov <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

sched/fair: Fix unfairness caused by missing load decay

This fixes an issue where old load on a cfs_rq is not properly decayed,
resulting in strange behavior where fairness can decrease drastically.
Real workloads with equally weighted control groups have ended up
getting a respective 99% and 1%(!!) of cpu time.

When an idle task is attached to a cfs_rq by attaching a pid to a cgroup,
the old load of the task is attached to the new cfs_rq and sched_entity by
attach_entity_cfs_rq. If the task is then moved to another cpu (and
therefore cfs_rq) before being enqueued/woken up, the load will be moved
to cfs_rq->removed from the sched_entity. Such a move will happen when
enforcing a cpuset on the task (eg. via a cgroup) that force it to move.

The load will however not be removed from the task_group itself, making
it look like there is a constant load on that cfs_rq. This causes the
vruntime of tasks on other sibling cfs_rq's to increase faster than they
are supposed to; causing severe fairness issues. If no other task is
started on the given cfs_rq, and due to the cpuset it would not happen,
this load would never be properly unloaded. With this patch the load
will be properly removed inside update_blocked_averages. This also
applies to tasks moved to the fair scheduling class and moved to another
cpu, and this path will also fix that. For fork, the entity is queued
right away, so this problem does not affect that.

This applies to cases where the new process is the first in the cfs_rq,
issue introduced 3d30544f0212 ("sched/fair: Apply more PELT fixes"), and
when there has previously been load on the cgroup but the cgroup was
removed from the leaflist due to having null PELT load, indroduced
in 039ae8bcf7a5 ("sched/fair: Fix O(nr_cgroups) in the load balancing
path").

For a simple cgroup hierarchy (as seen below) with two equally weighted
groups, that in theory should get 50/50 of cpu time each, it often leads
to a load of 60/40 or 70/30.

parent/
  cg-1/
    cpu.weight: 100
    cpuset.cpus: 1
  cg-2/
    cpu.weight: 100
    cpuset.cpus: 1

If the hierarchy is deeper (as seen below), while keeping cg-1 and cg-2
equally weighted, they should still get a 50/50 balance of cpu time.
This however sometimes results in a balance of 10/90 or 1/99(!!) between
the task groups.

$ ps u -C stress
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       18568  1.1  0.0   3684   100 pts/12   R+   13:36   0:00 stress --cpu 1
root       18580 99.3  0.0   3684   100 pts/12   R+   13:36   0:09 stress --cpu 1

parent/
  cg-1/
    cpu.weight: 100
    sub-group/
      cpu.weight: 1
      cpuset.cpus: 1
  cg-2/
    cpu.weight: 100
    sub-group/
      cpu.weight: 10000
      cpuset.cpus: 1

This can be reproduced by attaching an idle process to a cgroup and
moving it to a given cpuset before it wakes up. The issue is evident in
many (if not most) container runtimes, and has been reproduced
with both crun and runc (and therefore docker and all its "derivatives"),
and with both cgroup v1 and v2.

Fixes: 3d30544f0212 ("sched/fair: Apply more PELT fixes")
Fixes: 039ae8bcf7a5 ("sched/fair: Fix O(nr_cgroups) in the load balancing path")
Signed-off-by: Odin Ugedal <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

sched: Fix out-of-bound access in uclamp

Util-clamp places tasks in different buckets based on their clamp values
for performance reasons. However, the size of buckets is currently
computed using a rounding division, which can lead to an off-by-one
error in some configurations.

For instance, with 20 buckets, the bucket size will be 1024/20=51. A
task with a clamp of 1024 will be mapped to bucket id 1024/51=20. Sadly,
correct indexes are in range [0,19], hence leading to an out of bound
memory access.

Clamp the bucket id to fix the issue.

Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
Suggested-by: Qais Yousef <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

psi: Fix psi state corruption when schedule() races with cgroup move

4117cebf1a9f ("psi: Optimize task switch inside shared cgroups")
introduced a race condition that corrupts internal psi state. This
manifests as kernel warnings, sometimes followed by bogusly high IO
pressure:

  psi: task underflow! cpu=1 t=2 tasks=[0 0 0 0] clear=c set=0
  (schedule() decreasing RUNNING and ONCPU, both of which are 0)

  psi: incosistent task state! task=2412744:systemd cpu=17 psi_flags=e clear=3 set=0
  (cgroup_move_task() clearing MEMSTALL and IOWAIT, but task is MEMSTALL | RUNNING | ONCPU)

What the offending commit does is batch the two psi callbacks in
schedule() to reduce the number of cgroup tree updates. When prev is
deactivated and removed from the runqueue, nothing is done in psi at
first; when the task switch completes, TSK_RUNNING and TSK_IOWAIT are
updated along with TSK_ONCPU.

However, the deactivation and the task switch inside schedule() aren't
atomic: pick_next_task() may drop the rq lock for load balancing. When
this happens, cgroup_move_task() can run after the task has been
physically dequeued, but the psi updates are still pending. Since it
looks at the task's scheduler state, it doesn't move everything to the
new cgroup that the task switch that follows is about to clear from
it. cgroup_move_task() will leak the TSK_RUNNING count in the old
cgroup, and psi_sched_switch() will underflow it in the new cgroup.

A similar thing can happen for iowait. TSK_IOWAIT is usually set when
a p->in_iowait task is dequeued, but again this update is deferred to
the switch. cgroup_move_task() can see an unqueued p->in_iowait task
and move a non-existent TSK_IOWAIT. This results in the inconsistent
task state warning, as well as a counter underflow that will result in
permanent IO ghost pressure being reported.

Fix this bug by making cgroup_move_task() use task->psi_flags instead
of looking at the potentially mismatching scheduler state.

[ We used the scheduler state historically in order to not rely on
  task->psi_flags for anything but debugging. But that ship has sailed
  anyway, and this is simpler and more robust.

  We previously already batched TSK_ONCPU clearing with the
  TSK_RUNNING update inside the deactivation call from schedule(). But
  that ordering was safe and didn't result in TSK_ONCPU corruption:
  unlike most places in the scheduler, cgroup_move_task() only checked
  task_current() and handled TSK_ONCPU if the task was still queued. ]

Fixes: 4117cebf1a9f ("psi: Optimize task switch inside shared cgroups")
Signed-off-by: Johannes Weiner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

sched,doc: sched_debug_verbose cmdline should be sched_verbose

The cmdline should include sched_verbose but not sched_debug_verbose
as sched_debug_verbose is only the variant name in code.

Fixes: 9406415f46 ("sched/debug: Rename the sched_debug parameter to sched_verbose")
Signed-off-by: Barry Song <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

arm64: kernel: Update the stale comment

Commit af391b15f7b5 ("arm64: kernel: rename __cpu_suspend to keep it aligned with arm")
has used @index instead of @arg, but the comment is stale, update it.

Cc: Sudeep Holla <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Shaokun Zhang <[email protected]>
Reviewed-by: Sudeep Holla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

ALSA: hda: generic: change the DAC ctl name for LO+SPK or LO+HP

Without this change, the DAC ctl's name could be changed only when
the machine has both Speaker and Headphone, but we met some machines
which only has Lineout and Headhpone, and the Lineout and Headphone
share the Audio Mixer0 and DAC0, the ctl's name is set to "Front".

On most of machines, the "Front" is used for Speaker only or Lineout
only, but on this machine it is shared by Lineout and Headphone,
This introduces an issue in the pipewire and pulseaudio, suppose users
want the Headphone to be on and the Speaker/Lineout to be off, they
could turn off the "Front", this works on most of the machines, but on
this machine, the "Front" couldn't be turned off otherwise the
headphone will be off too. Here we do some change to let the ctl's
name change to "Headphone+LO" on this machine, and pipewire and
pulseaudio already could handle "Headphone+LO" and "Speaker+LO".
(https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/747)

BugLink: http://bugs.launchpad.net/bugs/804178
Signed-off-by: Hui Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

can: m_can: m_can_tx_work_queue(): fix tx_skb race condition

The m_can_start_xmit() function checks if the cdev->tx_skb is NULL and
returns with NETDEV_TX_BUSY in case tx_sbk is not NULL.

There is a race condition in the m_can_tx_work_queue(), where first
the skb is send to the driver and then the case tx_sbk is set to NULL.
A TX complete IRQ might come in between and wake the queue, which
results in tx_skb not being cleared yet.

Fixes: f524f829b75a ("can: m_can: Create a m_can platform framework")
Tested-by: Torin Cooper-Bennun <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: mcp251x: fix resume from sleep before interface was brought up

Since 8ce8c0abcba3 the driver queues work via priv->restart_work when
resuming after suspend, even when the interface was not previously
enabled. This causes a null dereference error as the workqueue is only
allocated and initialized in mcp251x_open().

To fix this we move the workqueue init to mcp251x_can_probe() as there
is no reason to do it later and repeat it whenever mcp251x_open() is
called.

Fixes: 8ce8c0abcba3 ("can: mcp251x: only reset hardware as required")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Frieder Schrempf <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
[mkl: fix error handling in mcp251x_stop()]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path

This patch adds the missing can_rx_offload_del(), that must be called
if mcp251xfd_register() fails.

Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: mcp251xfd: mcp251xfd_probe(): fix an error pointer dereference in probe

When we converted this code to use dev_err_probe() we accidentally
removed a return. It means that if devm_clk_get() it will lead to an
Oops when we call clk_get_rate() on the next line.

Fixes: cf8ee6de2543 ("can: mcp251xfd: mcp251xfd_probe(): use dev_err_probe() to simplify error handling")
Link: https://lore.kernel.org/r/YJANZf13Qxd5Mhr1@mwanda
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

drm/amdgpu: Use device specific BO size & stride check.

The builtin size check isn't really the right thing for AMD
modifiers due to a couple of reasons:

1) In the format structs we don't do set any of the tilesize / blocks
etc. to avoid having format arrays per modifier/GPU
2) The pitch on the main plane is pixel_pitch * bytes_per_pixel even
for tiled ...
3) The pitch for the DCC planes is really the pixel pitch of the main
surface that would be covered by it ...

Note that we only handle GFX9+ case but we do this after converting
the implicit modifier to an explicit modifier, so on GFX9+ all
framebuffers should be checked here.

There is a TODO about DCC alignment, but it isn't worse than before
and I'd need to dig a bunch into the specifics. Getting this out in
a reasonable timeframe to make sure it gets the appropriate testing
seemed more important.

Finally as I've found that debugging addfb2 failures is a pita I was
generous adding explicit error messages to every failure case.

Fixes: f258907fdd83 ("drm/amdgpu: Verify bo size can fit framebuffer size on init.")
Tested-by: Simon Ser <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Init GFX10_ADDR_CONFIG for VCN v3 in DPG mode.

Otherwise tiling modes that require the values form this field
(In particular _*_X) would be corrupted upon video decode.

Copied from the VCN v2 code.

Fixes: 99541f392b4d ("drm/amdgpu: add mc resume DPG mode for VCN3.0")
Reviewed-and-Tested by: Leo Liu <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

x86/process: setup io_threads more like normal user space threads

As io_threads are fully set up USER threads it's clearer to separate the
code path from the KTHREAD logic.

The only remaining difference to user space threads is that io_threads
never return to user space again. Instead they loop within the given
worker function.

The fact that they never return to user space means they don't have an
user space thread stack. In order to indicate that to tools like gdb we
reset the stack and instruction pointers to 0.

This allows gdb attach to user space processes using io-uring, which like
means that they have io_threads, without printing worrying message like
this:

  warning: Selected architecture i386:x86-64 is not compatible with reported target architecture i386

  warning: Architecture rejected target-supplied description

The output will be something like this:

  (gdb) info threads
    Id   Target Id                  Frame
  * 1    LWP 4863 "io_uring-cp-for" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
    2    LWP 4864 "iou-mgr-4863"    0x0000000000000000 in ?? ()
    3    LWP 4865 "iou-wrk-4863"    0x0000000000000000 in ?? ()
  (gdb) thread 3
  [Switching to thread 3 (LWP 4865)]
  #0  0x0000000000000000 in ?? ()
  (gdb) bt
  #0  0x0000000000000000 in ?? ()
  Backtrace stopped: Cannot access memory at address 0x0

Fixes: 4727dc20e042 ("arch: setup PF_IO_WORKER threads like PF_KTHREAD")
Link: https://lore.kernel.org/io-uring/[email protected]/T/#m1bbf5727e3d4e839603f6ec7ed79c7eebfba6267
Signed-off-by: Stefan Metzmacher <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Andy Lutomirski <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

netfilter: nftables: Fix a memleak from userdata error path in new objects

Release object name if userdata allocation fails.

Fixes: b131c96496b3 ("netfilter: nf_tables: add userdata support for nft_object")
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: remove BUG_ON() after skb_header_pointer()

Several conntrack helpers and the TCP tracker assume that
skb_header_pointer() never fails based on upfront header validation.
Even if this should not ever happen, BUG_ON() is a too drastic measure,
remove them.

Signed-off-by: Pablo Neira Ayuso <[email protected]>

MAINTAINERS: add io_uring tool to IO_URING

The files in ./tools/io_uring/ are maintained by the IO_URING maintainers.
Reflect that fact in MAINTAINERS.

Signed-off-by: Lukas Bulwahn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers

Read and write operations are capped to MAX_RW_COUNT. Some read ops rely on
that limit, and that is not guaranteed by the IORING_OP_PROVIDE_BUFFERS.

Truncate those lengths when doing io_add_buffers, so buffer addresses still
use the uncapped length.

Also, take the chance and change struct io_buffer len member to __u32, so
it matches struct io_provide_buffer len member.

This fixes CVE-2021-3491, also reported as ZDI-CAN-13546.

Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Reported-by: Billy Jheng Bing-Jhong (@st424204)
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

KVM: x86: Consolidate guest enter/exit logic to common helpers

Move the enter/exit logic in {svm,vmx}_vcpu_enter_exit() to common
helpers. Opportunistically update the somewhat stale comment about the
updates needing to occur immediately after VM-Exit.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

context_tracking: KVM: Move guest enter/exit wrappers to KVM's domain

Move the guest enter/exit wrappers to kvm_host.h so that KVM can manage
its context tracking vs. vtime accounting without bleeding too many KVM
details into the context tracking code.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

context_tracking: Consolidate guest enter/exit wrappers

Consolidate the guest enter/exit wrappers, providing and tweaking stubs
as needed. This will allow moving the wrappers under KVM without having
to bleed #ifdefs into the soon-to-be KVM code.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

sched/vtime: Move guest enter/exit vtime accounting to vtime.h

Provide separate helpers for guest enter vtime accounting (in addition to
the existing guest exit helpers), and move all vtime accounting helpers
to vtime.h where the existing #ifdef infrastructure can be leveraged to
better delineate the different types of accounting. This will also allow
future cleanups via deduplication of context tracking code.

Opportunstically delete the vtime_account_kernel() stub now that all
callers are wrapped with CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

sched/vtime: Move vtime accounting external declarations above inlines

Move the blob of external declarations (and their stubs) above the set of
inline definitions (and their stubs) for vtime accounting. This will
allow a future patch to bring in more inline definitions without also
having to shuffle large chunks of code.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

KVM: x86: Defer vtime accounting 'til after IRQ handling

Defer the call to account guest time until after servicing any IRQ(s)
that happened in the guest or immediately after VM-Exit. Tick-based
accounting of vCPU time relies on PF_VCPU being set when the tick IRQ
handler runs, and IRQs are blocked throughout the main sequence of
vcpu_enter_guest(), including the call into vendor code to actually
enter and exit the guest.

This fixes a bug where reported guest time remains '0', even when
running an infinite loop in the guest:

https://bugzilla.kernel.org/show_bug.cgi?id=209831

Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs")
Suggested-by: Thomas Gleixner <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]

context_tracking: Move guest exit vtime accounting to separate helpers

Provide separate vtime accounting functions for guest exit instead of
open coding the logic within the context tracking code. This will allow
KVM x86 to handle vtime accounting slightly differently when using
tick-based accounting.

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

context_tracking: Move guest exit context tracking to separate helpers

Provide separate context tracking helpers for guest exit, the standalone
helpers will be called separately by KVM x86 in later patches to fix
tick-based accounting.

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

KVM/VMX: Invoke NMI non-IST entry instead of IST entry

In VMX, the host NMI handler needs to be invoked after NMI VM-Exit.
Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect
call instead of INTn"), this was done by INTn ("int $2"). But INTn
microcode is relatively expensive, so the commit reworked NMI VM-Exit
handling to invoke the kernel handler by function call.

But this missed a detail. The NMI entry point for direct invocation is
fetched from the IDT table and called on the kernel stack. But on 64-bit
the NMI entry installed in the IDT expects to be invoked on the IST stack.
It relies on the "NMI executing" variable on the IST stack to work
correctly, which is at a fixed position in the IST stack. When the entry
point is unexpectedly called on the kernel stack, the RSP-addressed "NMI
executing" variable is obviously also on the kernel stack and is
"uninitialized" and can cause the NMI entry code to run in the wrong way.

Provide a non-ist entry point for VMX which shares the C-function with
the regular NMI entry and invoke the new asm entry point instead.

On 32-bit this just maps to the regular NMI entry point as 32-bit has no
ISTs and is not affected.

[ tglx: Made it independent for backporting, massaged changelog ]

Fixes: 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn")
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Lai Jiangshan <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]

Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
"The remainder of the main mm/ queue.

  143 patches.

  Subsystems affected by this patch series (all mm): pagecache, hugetlb,
  userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap,
  kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and
  kfence"

* emailed patches from Andrew Morton <[email protected]>: (143 commits)
  kfence: use power-efficient work queue to run delayed work
  kfence: maximize allocation wait timeout duration
  kfence: await for allocation using wait_event
  kfence: zero guard page after out-of-bounds access
  mm/process_vm_access.c: remove duplicate include
  mm/mempool: minor coding style tweaks
  mm/highmem.c: fix coding style issue
  btrfs: use memzero_page() instead of open coded kmap pattern
  iov_iter: lift memzero_page() to highmem.h
  mm/zsmalloc: use BUG_ON instead of if condition followed by BUG.
  mm/zswap.c: switch from strlcpy to strscpy
  arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
  x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE
  mm,memory_hotplug: add kernel boot option to enable memmap_on_memory
  acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported
  mm,memory_hotplug: allocate memmap from the added memory range
  mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count()
  mm,memory_hotplug: relax fully spanned sections check
  drivers/base/memory: introduce memory_block_{online,offline}
  mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove
  ...

Merge tag 'nfsd-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull more nfsd updates from Chuck Lever:
"Additional fixes and clean-ups for NFSD since tags/nfsd-5.13,
  including a fix to grant read delegations for files open for writing"

* tag 'nfsd-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix null pointer dereference in svc_rqst_free()
  SUNRPC: fix ternary sign expansion bug in tracing
  nfsd: Fix fall-through warnings for Clang
  nfsd: grant read delegations to clients holding writes
  nfsd: reshuffle some code
  nfsd: track filehandle aliasing in nfs4_files
  nfsd: hash nfs4_files by inode number
  nfsd: ensure new clients break delegations
  nfsd: removed unused argument in nfsd_startup_generic()
  nfsd: remove unused function
  svcrdma: Pass a useful error code to the send_err tracepoint
  svcrdma: Rename goto labels in svc_rdma_sendto()
  svcrdma: Don't leak send_ctxt on Send errors