Git Repo - linux.git/log

]> Git Repo - linux.git/log

Eric Biggers [Wed, 12 Aug 2020 01:35:33 +0000 (18:35 -0700)]

fs/minix: set s_maxbytes correctly

The minix filesystem leaves super_block::s_maxbytes at MAX_NON_LFS rather
than setting it to the actual filesystem-specific limit. This is broken
because it means userspace doesn't see the standard behavior like getting
EFBIG and SIGXFSZ when exceeding the maximum file size.

Fix this by setting s_maxbytes correctly.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Qiujun Huang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Eric Biggers [Wed, 12 Aug 2020 01:35:30 +0000 (18:35 -0700)]

fs/minix: reject too-large maximum file size

If the minix filesystem tries to map a very large logical block number to
its on-disk location, block_to_path() can return offsets that are too
large, causing out-of-bounds memory accesses when accessing indirect index
blocks. This should be prevented by the check against the maximum file
size, but this doesn't work because the maximum file size is read directly
from the on-disk superblock and isn't validated itself.

Fix this by validating the maximum file size at mount time.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Qiujun Huang <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Eric Biggers [Wed, 12 Aug 2020 01:35:27 +0000 (18:35 -0700)]

fs/minix: don't allow getting deleted inodes

If an inode has no links, we need to mark it bad rather than allowing it
to be accessed. This avoids WARNINGs in inc_nlink() and drop_nlink() when
doing directory operations on a fuzzed filesystem.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: [email protected]
Reported-by: [email protected]
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Qiujun Huang <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Eric Biggers [Wed, 12 Aug 2020 01:35:24 +0000 (18:35 -0700)]

fs/minix: check return value of sb_getblk()

Patch series "fs/minix: fix syzbot bugs and set s_maxbytes".

This series fixes all syzbot bugs in the minix filesystem:

KASAN: null-ptr-deref Write in get_block
KASAN: use-after-free Write in get_block
KASAN: use-after-free Read in get_block
WARNING in inc_nlink
KMSAN: uninit-value in get_block
WARNING in drop_nlink

It also fixes the minix filesystem to set s_maxbytes correctly, so that
userspace sees the correct behavior when exceeding the max file size.

This patch (of 6):

sb_getblk() can fail, so check its return value.

This fixes a NULL pointer dereference.

Originally from Qiujun Huang.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: [email protected]
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Qiujun Huang <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:35:21 +0000 (18:35 -0700)]

autofs: fix doubled word

Change doubled word "is" to "it is".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Ian Kent <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Joe Perches [Wed, 12 Aug 2020 01:35:19 +0000 (18:35 -0700)]

checkpatch: remove missing switch/case break test

This test doesn't work well and newer compilers are much better
at emitting this warning.

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Cambda Zhu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Joe Perches [Wed, 12 Aug 2020 01:35:16 +0000 (18:35 -0700)]

checkpatch: add test for repeated words

Try to avoid adding repeated words either on the same line or consecutive
comment lines in a block

e.g.:

duplicated word in comment block

/*
* this is a comment block where the last word of the previous
* previous line is also the first word of the next line
*/

and simple duplication

/* test this this again */

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Inspired-by: Randy Dunlap <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Quentin Monnet [Wed, 12 Aug 2020 01:35:13 +0000 (18:35 -0700)]

checkpatch: fix CONST_STRUCT when const_structs.checkpatch is missing

Checkpatch reports warnings when some specific structs are not declared as
const in the code.  The list of structs to consider was initially defined
in the checkpatch.pl script itself, but it was later moved to an external
file (scripts/const_structs.checkpatch), in commit bf1fa1dae68e
("checkpatch: externalize the structs that should be const").  This
introduced two minor issues:

- When file scripts/const_structs.checkpatch is not present (for
  example, if checkpatch is run outside of the kernel directory with the
  "--no-tree" option), a warning is printed to stderr to tell the user
  that "No structs that should be const will be found". This is fair,
  but the warning is printed unconditionally, even if the option
  "--ignore CONST_STRUCT" is passed. In the latter case, we explicitly
  ask checkpatch to skip this check, so no warning should be printed.

- When scripts/const_structs.checkpatch is missing, or even when trying
  to silence the warning by adding an empty file, $const_structs is set
  to "", and the regex used for finding structs that should be const,
  "$line =~ /struct\s+($const_structs)(?!\s*\{)/)", matches all
  structs found in the code, thus reporting a number of false positives.

Let's fix the first item by skipping scripts/const_structs.checkpatch
processing if "CONST_STRUCT" checks are ignored, and the second one by
skipping the test if $const_structs is not defined. Since we modify the
read_words() function a little bit, update the checks for
$typedefsfile/$typeOtherTypedefs as well.

Signed-off-by: Quentin Monnet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Joe Perches <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Joe Perches [Wed, 12 Aug 2020 01:35:10 +0000 (18:35 -0700)]

checkpatch: add --fix option for ASSIGN_IN_IF

Add a --fix option for 2 types of single-line assignment in if statements

if ((foo = bar(...)) < BAZ) {
expands to:
foo = bar(..);
if (foo < BAZ) {
and
if ((foo = bar(...)) {
expands to:
foo = bar(...);
if (foo) {

if statements with assignments spanning multiple lines are
not converted with the --fix option.

if statements with additional logic are also not converted.

e.g.: if ((foo = bar(...)) & BAZ == BAZ) {

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Julia Lawall <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Joe Perches [Wed, 12 Aug 2020 01:35:07 +0000 (18:35 -0700)]

checkpatch: add test for possible misuse of IS_ENABLED() without CONFIG_

IS_ENABLED is almost always used with CONFIG_<FOO> defines.

Add a test to verify that the #define being tested starts with CONFIG_.

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Rikard Falkeborn [Wed, 12 Aug 2020 01:35:03 +0000 (18:35 -0700)]

lib/test_bits.c: add tests of GENMASK

Add tests of GENMASK and GENMASK_ULL.

A few test cases that should fail compilation are provided under #ifdef
TEST_GENMASK_FAILURES

[[email protected]: add MODULE_LICENSE()]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: make some functions static]
Link: http://lkml.kernel.org/r/[email protected]
Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: Rikard Falkeborn <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: William Breathitt Gray <[email protected]>
Cc: Emil Velikov <[email protected]>
Cc: Syed Nayyar Waris <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Kars Mulder [Wed, 12 Aug 2020 01:34:56 +0000 (18:34 -0700)]

kstrto*: do not describe simple_strto*() as obsolete/replaced

The documentation of the kstrto*() functions describes kstrto*() as
"replacements" of the "obsolete" simple_strto*() functions. Both of these
terms are inaccurate: they're not replacements because they have different
behaviour, and the simple_strto*() are not obsolete because there are
cases where they have benefits over kstrto*().

Remove usage of the terms "replacement" and "obsolete" in reference to
simple_strto*(), and instead use the term "preferred over".

Fixes: 4c925d6031f71 ("kstrto*: add documentation")
Fixes: 885e68e8b7b13 ("kernel.h: update comment about simple_strto<foo>() functions")
Signed-off-by: Kars Mulder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Eldad Zack <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Mans Rullgard <[email protected]>
Cc: Petr Mladek <[email protected]>
Link: http://lkml.kernel.org/r/29b9-5f234c80-13-4e3aa200@244003027
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Kars Mulder [Wed, 12 Aug 2020 01:34:53 +0000 (18:34 -0700)]

kstrto*: correct documentation references to simple_strto*()

The documentation of the kstrto*() functions reference the simple_strtoull
function by "used as a replacement for [the obsolete] simple_strtoull".
All these functions describes themselves as replacements for the function
simple_strtoull, even though a function like kstrtol() would be more aptly
described as a replacement of simple_strtol().

Fix these references by making the documentation of kstrto*() reference
the closest simple_strto*() equivalent available. The functions
kstrto[u]int() do not have direct simple_strto[u]int() equivalences, so
these are made to refer to simple_strto[u]l() instead.

Furthermore, add parentheses after function names, as is standard in
kernel documentation.

Fixes: 4c925d6031f71 ("kstrto*: add documentation")
Signed-off-by: Kars Mulder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Eldad Zack <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Mans Rullgard <[email protected]>
Cc: Petr Mladek <[email protected]>
Link: http://lkml.kernel.org/r/1ee1-5f234c00-f3-165a6440@234394593
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Alexander A. Klimov [Wed, 12 Aug 2020 01:34:50 +0000 (18:34 -0700)]

lib/: replace HTTP links with HTTPS ones

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Signed-off-by: Alexander A. Klimov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Coly Li <[email protected]> [crc64.c]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Tiezhu Yang [Wed, 12 Aug 2020 01:34:47 +0000 (18:34 -0700)]

lib/test_lockup.c: fix return value of test_lockup_init()

Since filp_open() returns an error pointer, we should use IS_ERR() to
check the return value and then return PTR_ERR() if failed to get the
actual return value instead of always -EINVAL.

E.g. without this patch:

[root@localhost loongson]# ls no_such_file
ls: cannot access no_such_file: No such file or directory
[root@localhost loongson]# modprobe test_lockup file_path=no_such_file lock_sb_umount time_secs=60 state=S
modprobe: ERROR: could not insert 'test_lockup': Invalid argument
[root@localhost loongson]# dmesg | tail -1
[ 126.100596] test_lockup: cannot find file_path

With this patch:

[root@localhost loongson]# ls no_such_file
ls: cannot access no_such_file: No such file or directory
[root@localhost loongson]# modprobe test_lockup file_path=no_such_file lock_sb_umount time_secs=60 state=S
modprobe: ERROR: could not insert 'test_lockup': Unknown symbol in module, or unknown parameter (see dmesg)
[root@localhost loongson]# dmesg | tail -1
[ 95.134362] test_lockup: failed to open no_such_file: -2

Fixes: aecd42df6d39 ("lib/test_lockup.c: add parameters for locking generic vfs locks")
Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Kees Cook <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Tiezhu Yang [Wed, 12 Aug 2020 01:34:44 +0000 (18:34 -0700)]

lib/Kconfig.debug: make TEST_LOCKUP depend on module

Since test_lockup is a test module to generate lockups, it is better to
limit TEST_LOCKUP to module (=m) or disabled (=n) because we can not use
the module parameters when CONFIG_TEST_LOCKUP=y.

Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Kees Cook <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Wei Yongjun [Wed, 12 Aug 2020 01:34:41 +0000 (18:34 -0700)]

lib/test_lockup.c: make symbol 'test_works' static

Fix sparse build warning:

lib/test_lockup.c:403:1: warning:
symbol '__pcpu_scope_test_works' was not declared. Should it be static?

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Geert Uytterhoeven [Wed, 12 Aug 2020 01:34:38 +0000 (18:34 -0700)]

lib/test_bitops: do the full test during module init

Currently, the bitops test consists of two parts: one part is executed
during module load, the second part during module unload. This is
cumbersome for the user, as he has to perform two steps to execute all
tests, and is different from most (all?) other tests.

Merge the two parts, so both are executed during module load.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Luc Van Oostenryck [Wed, 12 Aug 2020 01:34:35 +0000 (18:34 -0700)]

lib/generic-radix-tree.c: remove unneeded __rcu

struct __genradix is defined as having its member 'root'
annotated as __rcu. But in the corresponding API RCU is not used.
Sparse reports this type mismatch as:
lib/generic-radix-tree.c:56:35: warning: incorrect type in initializer (different address spaces)
lib/generic-radix-tree.c:56:35: expected struct genradix_root *r
lib/generic-radix-tree.c:56:35: got struct genradix_root [noderef] <asn:4> *__val
with 6 other ones.

So, correct root's type by removing this unneeded __rcu.

Signed-off-by: Luc Van Oostenryck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Kent Overstreet <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Stefano Brivio [Wed, 12 Aug 2020 01:34:32 +0000 (18:34 -0700)]

lib/test_bitmap.c: add test for bitmap_cut()

Inspired by an original patch from Yury Norov: introduce a test for
bitmap_cut() that also makes sure functionality is as described for
partially overlapping src and dst.

Signed-off-by: Stefano Brivio <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Yury Norov <[email protected]>
Link: http://lkml.kernel.org/r/5fc45e6bbd4fa837cd9577f8a0c1d639df90a4ce.1592155364.git.sbrivio@redhat.com
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Stefano Brivio [Wed, 12 Aug 2020 01:34:29 +0000 (18:34 -0700)]

lib/bitmap.c: fix bitmap_cut() for partial overlapping case

Patch series "lib: Fix bitmap_cut() for overlaps, add test"

This patch (of 2):

Yury Norov reports that bitmap_cut() will not produce the right outcome if
src and dst partially overlap, with src pointing at some location after
dst, because the memmove() affects src before we store the bits that we
need to keep, that is, the bits preceding the cut -- as long as we the
beginning of the cut is not aligned to a long.

Fix this by storing those bits before the memmove().

Note that this is just a theoretical concern so far, as the only user of
this function, pipapo_drop() from the nftables set back-end implemented in
net/netfilter/nft_set_pipapo.c, always supplies entirely overlapping src
and dst.

Fixes: 2092767168f0 ("bitmap: Introduce bitmap_cut(): cut bits and shift remaining")
Reported-by: Yury Norov <[email protected]>
Signed-off-by: Stefano Brivio <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/003e38d4428cd6091ef00b5b03354f1bd7d9091e.1592155364.git.sbrivio@redhat.com
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Luc Van Oostenryck [Wed, 12 Aug 2020 01:34:26 +0000 (18:34 -0700)]

sparse: group the defines by functionality

By popular demand, reorder the defines for sparse annotations and group
them by functionality.

Signed-off-by: Luc Van Oostenryck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Miguel Ojeda <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: lore.kernel.org/r/CAMuHMdWQsirja-h3wBcZezk+H2Q_HShhAks8Hc8ps5fTAp=ObQ@mail.gmail.com
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Matthew Wilcox [Wed, 12 Aug 2020 01:34:23 +0000 (18:34 -0700)]

include/linux/poison.h: remove obsolete comment

When the definition was changed, the comment became stale. Just remove
it since there isn't anything useful to say here.

Fixes: b8a0255db958 ("include/linux/poison.h: use POISON_POINTER_DELTA for poison pointers")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Vasily Kulikov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Alexander A. Klimov [Wed, 12 Aug 2020 01:34:19 +0000 (18:34 -0700)]

include/: replace HTTP links with HTTPS ones

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Signed-off-by: Alexander A. Klimov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Arvind Sankar [Wed, 12 Aug 2020 01:34:16 +0000 (18:34 -0700)]

kernel.h: remove duplicate include of asm/div64.h

This seems to have been added inadvertently in commit
72deb455b5ec ("block: remove CONFIG_LBDAF")

Fixes: 72deb455b5ec ("block: remove CONFIG_LBDAF")
Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Feng Tang [Wed, 12 Aug 2020 01:34:13 +0000 (18:34 -0700)]

./Makefile: add debug option to enable function aligned on 32 bytes

Recently 0day reported many strange performance changes (regression or
improvement), in which there was no obvious relation between the culprit
commit and the benchmark at the first look, and it causes people to doubt
the test itself is wrong.

Upon further check, many of these cases are caused by the change to the
alignment of kernel text or data, as whole text/data of kernel are linked
together, change in one domain may affect alignments of other domains.

gcc has an option '-falign-functions=n' to force text aligned, and with
that option enabled, some of those performance changes will be gone, like
[1][2][3].

Add this option so that developers and 0day can easily find performance
bump caused by text alignment change, as tracking these strange bump is
quite time consuming.  Though it can't help in other cases like data
alignment changes like [4].

Following is some size data for v5.7 kernel built with a RHEL config used
in 0day:

    text      data      bss dec    filename
  19738771  13292906  5554236  38585913 vmlinux.noalign
  19758591  13297002  5529660  38585253 vmlinux.align32

Raw vmlinux size in bytes:

v5.7 v5.7+align32
253950832 254018000 +0.02%

Some benchmark data, most of them have no big change:

  * hackbench: [ -1.8%,  +0.5%]

  * fsmark: [ -3.2%,  +3.4%]  # ext4/xfs/btrfs

  * kbuild: [ -2.0%,  +0.9%]

  * will-it-scale: [ -0.5%,  +1.8%]  # mmap1/pagefault3

  * netperf:
    - TCP_CRR [+16.6%, +97.4%]
    - TCP_RR [-18.5%,  -1.8%]
    - TCP_STREAM [ -1.1%,  +1.9%]

[1] https://lore.kernel.org/lkml/20200114085637.GA29297@shao2-debian/
[2] https://lore.kernel.org/lkml/20200330011254.GA14393@feng-iot/
[3] https://lore.kernel.org/lkml/1d98d1f0-fe84-6df7-f5bd-f4cb2cdb7f45@intel.com/
[4] https://lore.kernel.org/lkml/20200205123216.GO12867@shao2-debian/

Signed-off-by: Feng Tang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 12 Aug 2020 01:34:10 +0000 (18:34 -0700)]

kernel: add a kernel_wait helper

Add a helper that waits for a pid and stores the status in the passed in
kernel pointer. Use it to fix the usage of kernel_wait4 in
call_usermodehelper_exec_sync that only happens to work due to the
implicit set_fs(KERNEL_DS) for kernel threads.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:34:07 +0000 (18:34 -0700)]

include/linux/xz.h: drop duplicated word

Drop the doubled word "than" in a comment.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Lasse Collin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:34:04 +0000 (18:34 -0700)]

include/linux/async_tx.h: drop duplicated word in a comment

Drop the doubled word "the" in a comment.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Dan Williams <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:34:00 +0000 (18:34 -0700)]

include/linux/exportfs.h: drop duplicated word in a comment

Drop the doubled word "a" in a comment.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:33:57 +0000 (18:33 -0700)]

include/linux/compiler-clang.h: drop duplicated word in a comment

Drop the doubled word "the" in a comment.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Luc Van Oostenryck [Wed, 12 Aug 2020 01:33:54 +0000 (18:33 -0700)]

alpha: fix annotation of io{read,write}{16,32}be()

These accessors must be used to read/write a big-endian bus. The value
returned or written is native-endian.

However, these accessors are defined using be{16,32}_to_cpu() or
cpu_to_be{16,32}() to make the endian conversion but these expect a
__be{16,32} when none is present. Keeping them would need a force cast
that would solve nothing at all.

So, do the conversion using swab{16,32}, like done in asm-generic for
similar situations.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Luc Van Oostenryck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 12 Aug 2020 01:33:50 +0000 (18:33 -0700)]

exec: use force_uaccess_begin during exec and exit

Both exec and exit want to ensure that the uaccess routines actually do
access user pointers. Use the newly added force_uaccess_begin helper
instead of an open coded set_fs for that to prepare for kernel builds
where set_fs() does not exist.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 12 Aug 2020 01:33:47 +0000 (18:33 -0700)]

uaccess: add force_uaccess_{begin,end} helpers

Add helpers to wrap the get_fs/set_fs magic for undoing any damange done
by set_fs(KERNEL_DS). There is no real functional benefit, but this
documents the intent of these calls better, and will allow stubbing the
functions out easily for kernels builds that do not allow address space
overrides in the future.

[[email protected]: drop two incorrect hunks, fix a commit log typo]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Acked-by: Greentime Hu <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 12 Aug 2020 01:33:44 +0000 (18:33 -0700)]

uaccess: remove segment_eq

segment_eq is only used to implement uaccess_kernel. Just open code
uaccess_kernel in the arch uaccess headers and remove one layer of
indirection.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Greentime Hu <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 12 Aug 2020 01:33:41 +0000 (18:33 -0700)]

riscv: include <asm/pgtable.h> in <asm/uaccess.h>

To ensure TASK_SIZE is defined for USER_DS.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 12 Aug 2020 01:33:38 +0000 (18:33 -0700)]

nds32: use uaccess_kernel in show_regs

Use the uaccess_kernel helper instead of duplicating it.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Greentime Hu <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Christoph Hellwig [Wed, 12 Aug 2020 01:33:34 +0000 (18:33 -0700)]

syscalls: use uaccess_kernel in addr_limit_user_check

Patch series "clean up address limit helpers", v2.

In preparation for eventually phasing out direct use of set_fs(), this
series removes the segment_eq() arch helper that is only used to implement
or duplicate the uaccess_kernel() API, and then adds descriptive helpers
to force the kernel address limit.

This patch (of 6):

Use the uaccess_kernel helper instead of duplicating it.

[[email protected]: arm: don't call addr_limit_user_check for nommu]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:33:31 +0000 (18:33 -0700)]

mm/zsmalloc.c: fix duplicated words

Change "as as" to "as a".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:33:28 +0000 (18:33 -0700)]

mm/zpool.c: delete duplicated word and fix grammar

Drop the repeated word "if".
Fix subject/verb agreement.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:33:26 +0000 (18:33 -0700)]

mm/vmscan.c: delete or fix duplicated words

Drop the repeated word "marked".
Change "time time" to "same time".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:33:23 +0000 (18:33 -0700)]

mm/usercopy.c: delete duplicated word

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:33:19 +0000 (18:33 -0700)]

mm/slab_common.c: delete duplicated word

Drop the repeated word "and".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:33:17 +0000 (18:33 -0700)]

mm/shmem.c: delete duplicated word

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:33:14 +0000 (18:33 -0700)]

mm/page_alloc.c: delete or fix duplicated words

Drop the repeated word "them" and "that".
Change "the the" to "to the".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:33:11 +0000 (18:33 -0700)]

mm/nommu.c: delete duplicated words

Drop the repeated word "that" in two places.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:33:08 +0000 (18:33 -0700)]

mm/migrate.c: delete duplicated word

Drop the repeated word "and".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:33:05 +0000 (18:33 -0700)]

mm/memory.c: delete duplicated words

Drop the repeated word "to" in two places.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:33:02 +0000 (18:33 -0700)]

mm/memcontrol.c: delete duplicated words

Drop the repeated word "down".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:32:59 +0000 (18:32 -0700)]

mm/hugetlb.c: delete duplicated words

Drop the repeated word "the" in two places.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:32:56 +0000 (18:32 -0700)]

mm/hmm.c: delete duplicated word

Drop the repeated word "pages".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:32:53 +0000 (18:32 -0700)]

mm/filemap.c: delete duplicated word

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:32:49 +0000 (18:32 -0700)]

mm/compaction.c: delete duplicated word

Drop the repeated word "a".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Arvind Sankar [Wed, 12 Aug 2020 01:32:46 +0000 (18:32 -0700)]

sparc: drop unused MAX_PHYSADDR_BITS

The macro is not used anywhere, so remove the definition.

Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Arvind Sankar [Wed, 12 Aug 2020 01:32:43 +0000 (18:32 -0700)]

sh/mm: drop unused MAX_PHYSADDR_BITS

The macro is not used anywhere, so remove the definition.

Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:32:40 +0000 (18:32 -0700)]

include/linux/memcontrol.h: drop duplicate word and fix spello

Drop the doubled word "for" in a comment.
Fix spello of "incremented".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Chris Down <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:32:36 +0000 (18:32 -0700)]

include/linux/frontswap.h: drop duplicated word in a comment

Drop the doubled word "in" in a comment.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:32:33 +0000 (18:32 -0700)]

include/linux/highmem.h: fix duplicated words in a comment

Change the doubled word "is" in a comment to "it is".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:32:30 +0000 (18:32 -0700)]

mm: drop duplicated words in <linux/mm.h>

Drop the doubled words "to" and "the".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Randy Dunlap [Wed, 12 Aug 2020 01:32:27 +0000 (18:32 -0700)]

mm: drop duplicated words in <linux/pgtable.h>

Drop the doubled words "used" and "by".

Drop the repeated acronym "TLB" and make several other fixes around it.
(capital letters, spellos)

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Charan Teja Reddy [Wed, 12 Aug 2020 01:32:24 +0000 (18:32 -0700)]

mm, memory_hotplug: update pcp lists everytime onlining a memory block

When onlining a first memory block in a zone, pcp lists are not updated
thus pcp struct will have the default setting of ->high = 0,->batch = 1.

This means till the second memory block in a zone(if it have) is onlined
the pcp lists of this zone will not contain any pages because pcp's
->count is always greater than ->high thus free_pcppages_bulk() is called
to free batch size(=1) pages every time system wants to add a page to the
pcp list through free_unref_page().

To put this in a word, system is not using benefits offered by the pcp
lists when there is a single onlineable memory block in a zone. Correct
this by always updating the pcp lists when memory block is onlined.

Fixes: 1f522509c77a ("mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset")
Signed-off-by: Charan Teja Reddy <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vinayak Menon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Jia He [Wed, 12 Aug 2020 01:32:20 +0000 (18:32 -0700)]

mm/memory_hotplug: fix unpaired mem_hotplug_begin/done

When check_memblock_offlined_cb() returns failed rc(e.g. the memblock is
online at that time), mem_hotplug_begin/done is unpaired in such case.

Therefore a warning:
Call Trace:
  percpu_up_write+0x33/0x40
  try_remove_memory+0x66/0x120
  ? _cond_resched+0x19/0x30
  remove_memory+0x2b/0x40
  dev_dax_kmem_remove+0x36/0x72 [kmem]
  device_release_driver_internal+0xf0/0x1c0
  device_release_driver+0x12/0x20
  bus_remove_device+0xe1/0x150
  device_del+0x17b/0x3e0
  unregister_dev_dax+0x29/0x60
  devm_action_release+0x15/0x20
  release_nodes+0x19a/0x1e0
  devres_release_all+0x3f/0x50
  device_release_driver_internal+0x100/0x1c0
  driver_detach+0x4c/0x8f
  bus_remove_driver+0x5c/0xd0
  driver_unregister+0x31/0x50
  dax_pmem_exit+0x10/0xfe0 [dax_pmem]

Fixes: f1037ec0cc8a ("mm/memory_hotplug: fix remove_memory() lockdep splat")
Signed-off-by: Jia He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: <[email protected]> [5.6+]
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chuhong Yuan <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kaly Xin <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Jia He [Wed, 12 Aug 2020 01:32:16 +0000 (18:32 -0700)]

mm/memory_hotplug: introduce default dummy memory_add_physaddr_to_nid()

This is to introduce a general dummy helper. memory_add_physaddr_to_nid()
is a fallback option to get the nid in case NUMA_NO_NID is detected.

After this patch, arm64/sh/s390 can simply use the general dummy version.
PowerPC/x86/ia64 will still use their specific version.

This is the preparation to set a fallback value for dev_dax->target_node.

Signed-off-by: Jia He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Chuhong Yuan <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kaly Xin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Daniel Jordan [Wed, 12 Aug 2020 01:32:12 +0000 (18:32 -0700)]

x86/mm: use max memory block size on bare metal

Some of our servers spend significant time at kernel boot initializing
memory block sysfs directories and then creating symlinks between them and
the corresponding nodes.  The slowness happens because the machines get
stuck with the smallest supported memory block size on x86 (128M), which
results in 16,288 directories to cover the 2T of installed RAM.  The
search for each memory block is noticeable even with commit 4fb6eabf1037
("drivers/base/memory.c: cache memory blocks in xarray to accelerate
lookup").

Commit 078eb6aa50dc ("x86/mm/memory_hotplug: determine block size based on
the end of boot memory") chooses the block size based on alignment with
memory end.  That addresses hotplug failures in qemu guests, but for bare
metal systems whose memory end isn't aligned to even the smallest size, it
leaves them at 128M.

Make kernels that aren't running on a hypervisor use the largest supported
size (2G) to minimize overhead on big machines.  Kernel boot goes 7%
faster on the aforementioned servers, shaving off half a second.

[[email protected]: v3]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Daniel Jordan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Krzysztof Kozlowski [Wed, 12 Aug 2020 01:32:09 +0000 (18:32 -0700)]

mm: mmu_notifier: fix and extend kerneldoc

Fix W=1 compile warnings (invalid kerneldoc):

    mm/mmu_notifier.c:187: warning: Function parameter or member 'interval_sub' not described in 'mmu_interval_read_bgin'
    mm/mmu_notifier.c:708: warning: Function parameter or member 'subscription' not described in 'mmu_notifier_registr'
    mm/mmu_notifier.c:708: warning: Excess function parameter 'mn' description in 'mmu_notifier_register'
    mm/mmu_notifier.c:880: warning: Function parameter or member 'subscription' not described in 'mmu_notifier_put'
    mm/mmu_notifier.c:880: warning: Excess function parameter 'mn' description in 'mmu_notifier_put'
    mm/mmu_notifier.c:982: warning: Function parameter or member 'ops' not described in 'mmu_interval_notifier_insert'

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Waiman Long [Wed, 12 Aug 2020 01:32:06 +0000 (18:32 -0700)]

include/linux/sched/mm.h: optimize current_gfp_context()

The current_gfp_context() converts a number of PF_MEMALLOC_* per-process
flags into the corresponding GFP_* flags for memory allocation.  In that
function, current->flags is accessed 3 times.  That may lead to duplicated
access of the same memory location.

This is not usually a problem with minimal debug config options on as the
compiler can optimize away the duplicated memory accesses.  With most of
the debug config options on, however, that may not be the case.  For
example, the x86-64 object size of the __need_fs_reclaim() in a debug
kernel that calls current_gfp_context() was 309 bytes.  With this patch
applied, the object size is reduced to 202 bytes.  This is a saving of 107
bytes and will probably be slightly faster too.

Use READ_ONCE() to access current->flags to prevent the compiler from
possibly accessing current->flags multiple times.

Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Mike Kravetz [Wed, 12 Aug 2020 01:32:03 +0000 (18:32 -0700)]

cma: don't quit at first error when activating reserved areas

The routine cma_init_reserved_areas is designed to activate all
reserved cma areas.  It quits when it first encounters an error.
This can leave some areas in a state where they are reserved but
not activated.  There is no feedback to code which performed the
reservation.  Attempting to allocate memory from areas in such a
state will result in a BUG.

Modify cma_init_reserved_areas to always attempt to activate all
areas.  The called routine, cma_activate_area is responsible for
leaving the area in a valid state.  No one is making active use
of returned error codes, so change the routine to void.

How to reproduce:  This example uses kernelcore, hugetlb and cma
as an easy way to reproduce.  However, this is a more general cma
issue.

Two node x86 VM 16GB total, 8GB per node
Kernel command line parameters, kernelcore=4G hugetlb_cma=8G
Related boot time messages,
  hugetlb_cma: reserve 8192 MiB, up to 4096 MiB per node
  cma: Reserved 4096 MiB at 0x0000000100000000
  hugetlb_cma: reserved 4096 MiB on node 0
  cma: Reserved 4096 MiB at 0x0000000300000000
  hugetlb_cma: reserved 4096 MiB on node 1
  cma: CMA area hugetlb could not be activated

# echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  ...
  Call Trace:
    bitmap_find_next_zero_area_off+0x51/0x90
    cma_alloc+0x1a5/0x310
    alloc_fresh_huge_page+0x78/0x1a0
    alloc_pool_huge_page+0x6f/0xf0
    set_max_huge_pages+0x10c/0x250
    nr_hugepages_store_common+0x92/0x120
    ? __kmalloc+0x171/0x270
    kernfs_fop_write+0xc1/0x1a0
    vfs_write+0xc7/0x1f0
    ksys_write+0x5f/0xe0
    do_syscall_64+0x4d/0x90
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: c64be2bb1c6e ("drivers: add Contiguous Memory Allocator")
Signed-off-by: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Acked-by: Barry Song <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Barry Song [Wed, 12 Aug 2020 01:32:00 +0000 (18:32 -0700)]

mm: hugetlb: fix the name of hugetlb CMA

Once we enable CMA_DEBUGFS, we will get the below errors: directory
'cma-hugetlb' with parent 'cma' already present.

We should have different names for different CMA areas.

Signed-off-by: Barry Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Barry Song [Wed, 12 Aug 2020 01:31:57 +0000 (18:31 -0700)]

mm: cma: fix the name of CMA areas

Patch series "mm: fix the names of general cma and hugetlb cma", v2.

The current code of CMA can only work when users pass a const string as
name parameter.  we need to fix the way to handle names in CMA.  On the
other hand, to avoid name conflicts after enabling CMA_DEBUGFS, each
hugetlb should get a different CMA name.

This patch (of 2):

If users give a name saved in stack, the current code will generate magic
pointer.  if users don't give a name(NULL), kasprintf() will always return
NULL as we are at the early stage.  that means cma_init_reserved_mem()
will return -ENOMEM if users set name parameter as NULL.

[[email protected]: return cma->name directly in cma_get_name]
Link: https://github.com/ClangBuiltLinux/linux/issues/1063
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Barry Song <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Jianqun Xu [Wed, 12 Aug 2020 01:31:54 +0000 (18:31 -0700)]

mm/cma.c: fix NULL pointer dereference when cma could not be activated

In some case the cma area could not be activated, but the cma_alloc be
used under this case, then the kernel will crash caused by NULL pointer
dereference.

Add bitmap valid check in cma_alloc to avoid this issue.

Signed-off-by: Jianqun Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Anshuman Khandual [Wed, 12 Aug 2020 01:31:51 +0000 (18:31 -0700)]

mm/vmstat: add events for THP migration without split

Add following new vmstat events which will help in validating THP
migration without split. Statistics reported through these new VM events
will help in performance debugging.

1. THP_MIGRATION_SUCCESS
2. THP_MIGRATION_FAILURE
3. THP_MIGRATION_SPLIT

In addition, these new events also update normal page migration statistics
appropriately via PGMIGRATE_SUCCESS and PGMIGRATE_FAILURE. While here,
this updates current trace event 'mm_migrate_pages' to accommodate now
available THP statistics.

[[email protected]: s/hpage_nr_pages/thp_nr_pages/]
[[email protected]: v2]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: s/thp_nr_pages/hpage_nr_pages/]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Daniel Jordan <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Yang Shi [Wed, 12 Aug 2020 01:31:48 +0000 (18:31 -0700)]

mm: thp: remove debug_cow switch

Since commit 3917c80280c93a7123f ("thp: change CoW semantics for
anon-THP"), the CoW page fault of THP has been rewritten, debug_cow is not
used anymore. So, just remove it.

Signed-off-by: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Ralph Campbell [Wed, 12 Aug 2020 01:31:45 +0000 (18:31 -0700)]

mm/migrate: add migrate-shared test for migrate_vma_*()

Add a migrate_vma_*() self test for mmap(MAP_SHARED) to verify that
!vma_anonymous() ranges won't be migrated.

Signed-off-by: Ralph Campbell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: "Bharata B Rao" <[email protected]>
Cc: Shuah Khan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Ralph Campbell [Wed, 12 Aug 2020 01:31:41 +0000 (18:31 -0700)]

mm/migrate: optimize migrate_vma_setup() for holes

Patch series "mm/migrate: optimize migrate_vma_setup() for holes".

A simple optimization for migrate_vma_*() when the source vma is not an
anonymous vma and a new test case to exercise it.

This patch (of 2):

When migrating system memory to device private memory, if the source
address range is a valid VMA range and there is no memory or a zero page,
the source PFN array is marked as valid but with no PFN.

This lets the device driver allocate private memory and clear it, then
insert the new device private struct page into the CPU's page tables when
migrate_vma_pages() is called. migrate_vma_pages() only inserts the new
page if the VMA is an anonymous range.

There is no point in telling the device driver to allocate device private
memory and then not migrate the page. Instead, mark the source PFN array
entries as not migrating to avoid this overhead.

[[email protected]: v2]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ralph Campbell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: "Bharata B Rao" <[email protected]>
Cc: Shuah Khan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Mike Kravetz [Wed, 12 Aug 2020 01:31:38 +0000 (18:31 -0700)]

hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem

Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") requires callers of huge_pte_alloc to hold i_mmap_rwsem
in at least read mode.  This is because the explicit locking in
huge_pmd_share (called by huge_pte_alloc) was removed.  When restructuring
the code, the call to huge_pte_alloc in the else block at the beginning of
hugetlb_fault was missed.

Unfortunately, that else clause is exercised when there is no page table
entry.  This will likely lead to a call to huge_pmd_share.  If
huge_pmd_share thinks pmd sharing is possible, it will traverse the
mapping tree (i_mmap) without holding i_mmap_rwsem.  If someone else is
modifying the tree, bad things such as addressing exceptions or worse
could happen.

Simply remove the else clause.  It should have been removed previously.
The code following the else will call huge_pte_alloc with the appropriate
locking.

To prevent this type of issue in the future, add routines to assert that
i_mmap_rwsem is held, and call these routines in huge pmd sharing
routines.

Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: "Kirill A.Shutemov" <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Prakash Sangappa <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Mike Kravetz [Wed, 12 Aug 2020 01:31:35 +0000 (18:31 -0700)]

hugetlbfs: prevent filesystem stacking of hugetlbfs

syzbot found issues with having hugetlbfs on a union/overlay as reported
in [1].  Due to the limitations (no write) and special functionality of
hugetlbfs, it does not work well in filesystem stacking.  There are no
know use cases for hugetlbfs stacking.  Rather than making modifications
to get hugetlbfs working in such environments, simply prevent stacking.

[1] https://lore.kernel.org/linux-mm/000000000000b4684e05a2968ca6@google.com/

Reported-by: [email protected]
Suggested-by: Amir Goldstein <[email protected]>
Signed-off-by: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Miklos Szeredi <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Colin Walters <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Yafang Shao [Wed, 12 Aug 2020 01:31:32 +0000 (18:31 -0700)]

mm, oom: show process exiting information in __oom_kill_process()

When the OOM killer finds a victim and tryies to kill it, if the victim is
already exiting, the task mm will be NULL and no process will be killed.
But the dump_header() has been already executed, so it will be strange to
dump so much information without killing a process. We'd better show some
helpful information to indicate why this happens.

Suggested-by: David Rientjes <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Qian Cai <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Michal Hocko [Wed, 12 Aug 2020 01:31:28 +0000 (18:31 -0700)]

doc, mm: clarify /proc/<pid>/oom_score value range

The exported value includes oom_score_adj so the range is no [0, 1000] as
described in the previous section but rather [0, 2000]. Mention that fact
explicitly.

Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Yafang Shao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Michal Hocko [Wed, 12 Aug 2020 01:31:25 +0000 (18:31 -0700)]

doc, mm: sync up oom_score_adj documentation

There are at least two notes in the oom section. The 3% discount for root
processes is gone since d46078b28889 ("mm, oom: remove 3% bonus for
CAP_SYS_ADMIN processes").

Likewise children of the selected oom victim are not sacrificed since
bbbe48029720 ("mm, oom: remove 'prefer children over parent' heuristic")

Drop both of them.

Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Yafang Shao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Yafang Shao [Wed, 12 Aug 2020 01:31:22 +0000 (18:31 -0700)]

mm, oom: make the calculation of oom badness more accurate

Recently we found an issue on our production environment that when memcg
oom is triggered the oom killer doesn't chose the process with largest
resident memory but chose the first scanned process.  Note that all
processes in this memcg have the same oom_score_adj, so the oom killer
should chose the process with largest resident memory.

Bellow is part of the oom info, which is enough to analyze this issue.
[7516987.983223] memory: usage 16777216kB, limit 16777216kB, failcnt 52843037
[7516987.983224] memory+swap: usage 16777216kB, limit 9007199254740988kB, failcnt 0
[7516987.983225] kmem: usage 301464kB, limit 9007199254740988kB, failcnt 0
[...]
[7516987.983293] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[7516987.983510] [ 5740]     0  5740      257        1    32768        0          -998 pause
[7516987.983574] [58804]     0 58804     4594      771    81920        0          -998 entry_point.bas
[7516987.983577] [58908]     0 58908     7089      689    98304        0          -998 cron
[7516987.983580] [58910]     0 58910    16235     5576   163840        0          -998 supervisord
[7516987.983590] [59620]     0 59620    18074     1395   188416        0          -998 sshd
[7516987.983594] [59622]     0 59622    18680     6679   188416        0          -998 python
[7516987.983598] [59624]     0 59624  1859266     5161   548864        0          -998 odin-agent
[7516987.983600] [59625]     0 59625   707223     9248   983040        0          -998 filebeat
[7516987.983604] [59627]     0 59627   416433    64239   774144        0          -998 odin-log-agent
[7516987.983607] [59631]     0 59631   180671    15012   385024        0          -998 python3
[7516987.983612] [61396]     0 61396   791287     3189   352256        0          -998 client
[7516987.983615] [61641]     0 61641  1844642    29089   946176        0          -998 client
[7516987.983765] [ 9236]     0  9236     2642      467    53248        0          -998 php_scanner
[7516987.983911] [42898]     0 42898    15543      838   167936        0          -998 su
[7516987.983915] [42900]  1000 42900     3673      867    77824        0          -998 exec_script_vr2
[7516987.983918] [42925]  1000 42925    36475    19033   335872        0          -998 python
[7516987.983921] [57146]  1000 57146     3673      848    73728        0          -998 exec_script_J2p
[7516987.983925] [57195]  1000 57195   186359    22958   491520        0          -998 python2
[7516987.983928] [58376]  1000 58376   275764    14402   290816        0          -998 rosmaster
[7516987.983931] [58395]  1000 58395   155166     4449   245760        0          -998 rosout
[7516987.983935] [58406]  1000 58406 18285584  3967322 37101568        0          -998 data_sim
[7516987.984221] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=3aa16c9482ae3a6f6b78bda68a55d32c87c99b985e0f11331cddf05af6c4d753,mems_allowed=0-1,oom_memcg=/kubepods/podf1c273d3-9b36-11ea-b3df-246e9693c184,task_memcg=/kubepods/podf1c273d3-9b36-11ea-b3df-246e9693c184/1f246a3eeea8f70bf91141eeaf1805346a666e225f823906485ea0b6c37dfc3d,task=pause,pid=5740,uid=0
[7516987.984254] Memory cgroup out of memory: Killed process 5740 (pause) total-vm:1028kB, anon-rss:4kB, file-rss:0kB, shmem-rss:0kB
[7516988.092344] oom_reaper: reaped process 5740 (pause), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

We can find that the first scanned process 5740 (pause) was killed, but
its rss is only one page.  That is because, when we calculate the oom
badness in oom_badness(), we always ignore the negtive point and convert
all of these negtive points to 1.  Now as oom_score_adj of all the
processes in this targeted memcg have the same value -998, the points of
these processes are all negtive value.  As a result, the first scanned
process will be killed.

The oom_socre_adj (-998) in this memcg is set by kubelet, because it is a
a Guaranteed pod, which has higher priority to prevent from being killed
by system oom.

To fix this issue, we should make the calculation of oom point more
accurate.  We can achieve it by convert the chosen_point from 'unsigned
long' to 'long'.

[[email protected]: reported a issue in the previous version]
[[email protected]: fixed the issue reported by Cai]
[[email protected]: add the comment in proc_oom_score()]
[[email protected]: v3]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Naresh Kamboju <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Qian Cai <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Yanfei Xu [Wed, 12 Aug 2020 01:31:19 +0000 (18:31 -0700)]

include/linux/mempolicy.h: fix typo

Change "interlave" to "interleave".

Signed-off-by: Yanfei Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Wenchao Hao [Wed, 12 Aug 2020 01:31:16 +0000 (18:31 -0700)]

mm/mempolicy.c: check parameters first in kernel_get_mempolicy

Previous implementatoin calls untagged_addr() before error check, while if
the error check failed and return EINVAL, the untagged_addr() call is just
useless work.

Signed-off-by: Wenchao Hao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Krzysztof Kozlowski [Wed, 12 Aug 2020 01:31:13 +0000 (18:31 -0700)]

mm: mempolicy: fix kerneldoc of numa_map_to_online_node()

Fix W=1 compile warnings (invalid kerneldoc):

mm/mempolicy.c:137: warning: Function parameter or member 'node' not described in 'numa_map_to_online_node'
mm/mempolicy.c:137: warning: Excess function parameter 'nid' description in 'numa_map_to_online_node'

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Alex Shi [Wed, 12 Aug 2020 01:31:10 +0000 (18:31 -0700)]

mm/compaction: correct the comments of compact_defer_shift

There is no compact_defer_limit. It should be compact_defer_shift in
use. and add compact_order_failed explanation.

Signed-off-by: Alex Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Nitin Gupta [Wed, 12 Aug 2020 01:31:07 +0000 (18:31 -0700)]

mm: use unsigned types for fragmentation score

Proactive compaction uses per-node/zone "fragmentation score" which is
always in range [0, 100], so use unsigned type of these scores as well as
for related constants.

Signed-off-by: Nitin Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Nitin Gupta [Wed, 12 Aug 2020 01:31:04 +0000 (18:31 -0700)]

mm: fix compile error due to COMPACTION_HPAGE_ORDER

Fix compile error when COMPACTION_HPAGE_ORDER is assigned to
HUGETLB_PAGE_ORDER. The correct way to check if this constant is defined
is to check for CONFIG_HUGETLBFS.

Reported-by: Nathan Chancellor <[email protected]>
Signed-off-by: Nitin Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Nitin Gupta [Wed, 12 Aug 2020 01:31:00 +0000 (18:31 -0700)]

mm: proactive compaction

For some applications, we need to allocate almost all memory as hugepages.
However, on a running system, higher-order allocations can fail if the
memory is fragmented.  Linux kernel currently does on-demand compaction as
we request more hugepages, but this style of compaction incurs very high
latency.  Experiments with one-time full memory compaction (followed by
hugepage allocations) show that kernel is able to restore a highly
fragmented memory state to a fairly compacted memory state within <1 sec
for a 32G system.  Such data suggests that a more proactive compaction can
help us allocate a large fraction of memory as hugepages keeping
allocation latencies low.

For a more proactive compaction, the approach taken here is to define a
new sysctl called 'vm.compaction_proactiveness' which dictates bounds for
external fragmentation which kcompactd tries to maintain.

The tunable takes a value in range [0, 100], with a default of 20.

Note that a previous version of this patch [1] was found to introduce too
many tunables (per-order extfrag{low, high}), but this one reduces them to
just one sysctl.  Also, the new tunable is an opaque value instead of
asking for specific bounds of "external fragmentation", which would have
been difficult to estimate.  The internal interpretation of this opaque
value allows for future fine-tuning.

Currently, we use a simple translation from this tunable to [low, high]
"fragmentation score" thresholds (low=100-proactiveness, high=low+10%).
The score for a node is defined as weighted mean of per-zone external
fragmentation.  A zone's present_pages determines its weight.

To periodically check per-node score, we reuse per-node kcompactd threads,
which are woken up every 500 milliseconds to check the same.  If a node's
score exceeds its high threshold (as derived from user-provided
proactiveness value), proactive compaction is started until its score
reaches its low threshold value.  By default, proactiveness is set to 20,
which implies threshold values of low=80 and high=90.

This patch is largely based on ideas from Michal Hocko [2].  See also the
LWN article [3].

Performance data
================

System: x64_64, 1T RAM, 80 CPU threads.
Kernel: 5.6.0-rc3 + this patch

echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

Before starting the driver, the system was fragmented from a userspace
program that allocates all memory and then for each 2M aligned section,
frees 3/4 of base pages using munmap.  The workload is mainly anonymous
userspace pages, which are easy to move around.  I intentionally avoided
unmovable pages in this test to see how much latency we incur when
hugepage allocations hit direct compaction.

1. Kernel hugepage allocation latencies

With the system in such a fragmented state, a kernel driver then allocates
as many hugepages as possible and measures allocation latency:

(all latency values are in microseconds)

- With vanilla 5.6.0-rc3

  percentile latency
  –––––––––– –––––––
   5    7894
  10    9496
  25   12561
  30   15295
  40   18244
  50   21229
  60   27556
  75   30147
  80   31047
  90   32859
  95   33799

Total 2M hugepages allocated = 383859 (749G worth of hugepages out of 762G
total free => 98% of free memory could be allocated as hugepages)

- With 5.6.0-rc3 + this patch, with proactiveness=20

sysctl -w vm.compaction_proactiveness=20

  percentile latency
  –––––––––– –––––––
   5       2
  10       2
  25       3
  30       3
  40       3
  50       4
  60       4
  75       4
  80       4
  90       5
  95     429

Total 2M hugepages allocated = 384105 (750G worth of hugepages out of 762G
total free => 98% of free memory could be allocated as hugepages)

2. JAVA heap allocation

In this test, we first fragment memory using the same method as for (1).

Then, we start a Java process with a heap size set to 700G and request the
heap to be allocated with THP hugepages.  We also set THP to madvise to
allow hugepage backing of this heap.

/usr/bin/time
java -Xms700G -Xmx700G -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch

The above command allocates 700G of Java heap using hugepages.

- With vanilla 5.6.0-rc3

17.39user 1666.48system 27:37.89elapsed

- With 5.6.0-rc3 + this patch, with proactiveness=20

8.35user 194.58system 3:19.62elapsed

Elapsed time remains around 3:15, as proactiveness is further increased.

Note that proactive compaction happens throughout the runtime of these
workloads.  The situation of one-time compaction, sufficient to supply
hugepages for following allocation stream, can probably happen for more
extreme proactiveness values, like 80 or 90.

In the above Java workload, proactiveness is set to 20.  The test starts
with a node's score of 80 or higher, depending on the delay between the
fragmentation step and starting the benchmark, which gives more-or-less
time for the initial round of compaction.  As t he benchmark consumes
hugepages, node's score quickly rises above the high threshold (90) and
proactive compaction starts again, which brings down the score to the low
threshold level (80).  Repeat.

bpftrace also confirms proactive compaction running 20+ times during the
runtime of this Java benchmark.  kcompactd threads consume 100% of one of
the CPUs while it tries to bring a node's score within thresholds.

Backoff behavior
================

Above workloads produce a memory state which is easy to compact.  However,
if memory is filled with unmovable pages, proactive compaction should
essentially back off.  To test this aspect:

- Created a kernel driver that allocates almost all memory as hugepages
  followed by freeing first 3/4 of each hugepage.
- Set proactiveness=40
- Note that proactive_compact_node() is deferred maximum number of times
  with HPAGE_FRAG_CHECK_INTERVAL_MSEC of wait between each check
  (=> ~30 seconds between retries).

[1] https://patchwork.kernel.org/patch/11098289/
[2] https://lore.kernel.org/linux-mm/20161230131412 [email protected]/
[3] https://lwn.net/Articles/817905/

Signed-off-by: Nitin Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Reviewed-by: Khalid Aziz <[email protected]>
Reviewed-by: Oleksandr Natalenko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Oleksandr Natalenko <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Michal Koutný [Wed, 12 Aug 2020 01:30:57 +0000 (18:30 -0700)]

/proc/PID/smaps: consistent whitespace output format

The keys in smaps output are padded to fixed width with spaces. All
except for THPeligible that uses tabs (only since commit c06306696f83
("mm: thp: fix false negative of shmem vma's THP eligibility")).

Unify the output formatting to save time debugging some naïve parsers.
(Part of the unification is also aligning FilePmdMapped with others.)

Signed-off-by: Michal Koutný <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Yang Shi <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Joonsoo Kim [Wed, 12 Aug 2020 01:30:54 +0000 (18:30 -0700)]

mm/vmscan: restore active/inactive ratio for anonymous LRU

Now that workingset detection is implemented for anonymous LRU, we don't
need large inactive list to allow detecting frequently accessed pages
before they are reclaimed, anymore. This effectively reverts the
temporary measure put in by commit "mm/vmscan: make active/inactive ratio
as 1:1 for anon lru".

Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Joonsoo Kim [Wed, 12 Aug 2020 01:30:50 +0000 (18:30 -0700)]

mm/swap: implement workingset detection for anonymous LRU

This patch implements workingset detection for anonymous LRU. All the
infrastructure is implemented by the previous patches so this patch just
activates the workingset detection by installing/retrieving the shadow
entry and adding refault calculation.

Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Joonsoo Kim [Wed, 12 Aug 2020 01:30:47 +0000 (18:30 -0700)]

mm/swapcache: support to handle the shadow entries

Workingset detection for anonymous page will be implemented in the
following patch and it requires to store the shadow entries into the
swapcache. This patch implements an infrastructure to store the shadow
entry in the swapcache.

Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Joonsoo Kim [Wed, 12 Aug 2020 01:30:43 +0000 (18:30 -0700)]

mm/workingset: prepare the workingset detection infrastructure for anon LRU

To prepare the workingset detection for anon LRU, this patch splits
workingset event counters for refault, activate and restore into anon and
file variants, as well as the refaults counter in struct lruvec.

Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Joonsoo Kim [Wed, 12 Aug 2020 01:30:40 +0000 (18:30 -0700)]

mm/vmscan: protect the workingset on anonymous LRU

In current implementation, newly created or swap-in anonymous page is
started on active list.  Growing active list results in rebalancing
active/inactive list so old pages on active list are demoted to inactive
list.  Hence, the page on active list isn't protected at all.

Following is an example of this situation.

Assume that 50 hot pages on active list.  Numbers denote the number of
pages on active/inactive list (active | inactive).

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(uo) | 50(h)

3. workload: another 50 newly created (used-once) pages
50(uo) | 50(uo), swap-out 50(h)

This patch tries to fix this issue.  Like as file LRU, newly created or
swap-in anonymous pages will be inserted to the inactive list.  They are
promoted to active list if enough reference happens.  This simple
modification changes the above example as following.

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(h) | 50(uo)

3. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(uo)

As you can see, hot pages on active list would be protected.

Note that, this implementation has a drawback that the page cannot be
promoted and will be swapped-out if re-access interval is greater than the
size of inactive list but less than the size of total(active+inactive).
To solve this potential issue, following patch will apply workingset
detection similar to the one that's already applied to file LRU.

Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Joonsoo Kim [Wed, 12 Aug 2020 01:30:36 +0000 (18:30 -0700)]

mm/vmscan: make active/inactive ratio as 1:1 for anon lru

Patch series "workingset protection/detection on the anonymous LRU list", v7.

* PROBLEM
In current implementation, newly created or swap-in anonymous page is
started on the active list.  Growing the active list results in
rebalancing active/inactive list so old pages on the active list are
demoted to the inactive list.  Hence, hot page on the active list isn't
protected at all.

Following is an example of this situation.

Assume that 50 hot pages on active list and system can contain total 100
pages.  Numbers denote the number of pages on active/inactive list (active
| inactive).  (h) stands for hot pages and (uo) stands for used-once
pages.

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(uo) | 50(h)

3. workload: another 50 newly created (used-once) pages
50(uo) | 50(uo), swap-out 50(h)

As we can see, hot pages are swapped-out and it would cause swap-in later.

* SOLUTION
Since this is what we want to avoid, this patchset implements workingset
protection.  Like as the file LRU list, newly created or swap-in anonymous
page is started on the inactive list.  Also, like as the file LRU list, if
enough reference happens, the page will be promoted.  This simple
modification changes the above example as following.

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(h) | 50(uo)

3. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(uo)

hot pages remains in the active list. :)

* EXPERIMENT
I tested this scenario on my test bed and confirmed that this problem
happens on current implementation. I also checked that it is fixed by
this patchset.

* SUBJECT
workingset detection

* PROBLEM
Later part of the patchset implements the workingset detection for the
anonymous LRU list.  There is a corner case that workingset protection
could cause thrashing.  If we can avoid thrashing by workingset detection,
we can get the better performance.

Following is an example of thrashing due to the workingset protection.

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (will be hot) pages
50(h) | 50(wh)

3. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(wh)

4. workload: 50 (will be hot) pages
50(h) | 50(wh), swap-in 50(wh)

5. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(wh)

6. repeat 4, 5

Without workingset detection, this kind of workload cannot be promoted and
thrashing happens forever.

* SOLUTION
Therefore, this patchset implements workingset detection.  All the
infrastructure for workingset detecion is already implemented, so there is
not much work to do.  First, extend workingset detection code to deal with
the anonymous LRU list.  Then, make swap cache handles the exceptional
value for the shadow entry.  Lastly, install/retrieve the shadow value
into/from the swap cache and check the refault distance.

* EXPERIMENT
I made a test program to imitates above scenario and confirmed that
problem exists.  Then, I checked that this patchset fixes it.

My test setup is a virtual machine with 8 cpus and 6100MB memory.  But,
the amount of the memory that the test program can use is about 280 MB.
This is because the system uses large ram-backed swap and large ramdisk to
capture the trace.

Test scenario is like as below.

1. allocate cold memory (512MB)
2. allocate hot-1 memory (96MB)
3. activate hot-1 memory (96MB)
4. allocate another hot-2 memory (96MB)
5. access cold memory (128MB)
6. access hot-2 memory (96MB)
7. repeat 5, 6

Since hot-1 memory (96MB) is on the active list, the inactive list can
contains roughly 190MB pages.  hot-2 memory's re-access interval (96+128
MB) is more 190MB, so it cannot be promoted without workingset detection
and swap-in/out happens repeatedly.  With this patchset, workingset
detection works and promotion happens.  Therefore, swap-in/out occurs
less.

Here is the result. (average of 5 runs)

type swap-in swap-out
base 863240 989945
patch 681565 809273

As we can see, patched kernel do less swap-in/out.

* OVERALL TEST (ebizzy using modified random function)
ebizzy is the test program that main thread allocates lots of memory and
child threads access them randomly during the given times.  Swap-in will
happen if allocated memory is larger than the system memory.

The random function that represents the zipf distribution is used to make
hot/cold memory.  Hot/cold ratio is controlled by the parameter.  If the
parameter is high, hot memory is accessed much larger than cold one.  If
the parameter is low, the number of access on each memory would be
similar.  I uses various parameters in order to show the effect of
patchset on various hot/cold ratio workload.

My test setup is a virtual machine with 8 cpus, 1024 MB memory and 5120 MB
ram swap.

Result format is as following.

param: 1-1024-0.1
- 1 (number of thread)
- 1024 (allocated memory size, MB)
- 0.1 (zipf distribution alpha,
0.1 works like as roughly uniform random,
1.3 works like as small portion of memory is hot and the others are cold)

pswpin: smaller is better
std: standard deviation
improvement: negative is better

* single thread
           param        pswpin       std       improvement
      base 1-1024.0-0.1 14101983.40   79441.19
      prot 1-1024.0-0.1 14065875.80  136413.01  (   -0.26 )
    detect 1-1024.0-0.1 13910435.60  100804.82  (   -1.36 )
      base 1-1024.0-0.7 7998368.80   43469.32
      prot 1-1024.0-0.7 7622245.80   88318.74  (   -4.70 )
    detect 1-1024.0-0.7 7618515.20   59742.07  (   -4.75 )
      base 1-1024.0-1.3 1017400.80   38756.30
      prot 1-1024.0-1.3  940464.60   29310.69  (   -7.56 )
    detect 1-1024.0-1.3  945511.40   24579.52  (   -7.07 )
      base 1-1280.0-0.1 22895541.40   50016.08
      prot 1-1280.0-0.1 22860305.40   51952.37  (   -0.15 )
    detect 1-1280.0-0.1 22705565.20   93380.35  (   -0.83 )
      base 1-1280.0-0.7 13717645.60   46250.65
      prot 1-1280.0-0.7 12935355.80   64754.43  (   -5.70 )
    detect 1-1280.0-0.7 13040232.00   63304.00  (   -4.94 )
      base 1-1280.0-1.3 1654251.40    4159.68
      prot 1-1280.0-1.3 1522680.60   33673.50  (   -7.95 )
    detect 1-1280.0-1.3 1599207.00   70327.89  (   -3.33 )
      base 1-1536.0-0.1 31621775.40   31156.28
      prot 1-1536.0-0.1 31540355.20   62241.36  (   -0.26 )
    detect 1-1536.0-0.1 31420056.00  123831.27  (   -0.64 )
      base 1-1536.0-0.7 19620760.60   60937.60
      prot 1-1536.0-0.7 18337839.60   56102.58  (   -6.54 )
    detect 1-1536.0-0.7 18599128.00   75289.48  (   -5.21 )
      base 1-1536.0-1.3 2378142.40   20994.43
      prot 1-1536.0-1.3 2166260.60   48455.46  (   -8.91 )
    detect 1-1536.0-1.3 2183762.20   16883.24  (   -8.17 )
      base 1-1792.0-0.1 40259714.80   90750.70
      prot 1-1792.0-0.1 40053917.20   64509.47  (   -0.51 )
    detect 1-1792.0-0.1 39949736.40  104989.64  (   -0.77 )
      base 1-1792.0-0.7 25704884.40   69429.68
      prot 1-1792.0-0.7 23937389.00   79945.60  (   -6.88 )
    detect 1-1792.0-0.7 24271902.00   35044.30  (   -5.57 )
      base 1-1792.0-1.3 3129497.00   32731.86
      prot 1-1792.0-1.3 2796994.40   19017.26  (  -10.62 )
    detect 1-1792.0-1.3 2886840.40   33938.82  (   -7.75 )
      base 1-2048.0-0.1 48746924.40   50863.88
      prot 1-2048.0-0.1 48631954.40   24537.30  (   -0.24 )
    detect 1-2048.0-0.1 48509419.80   27085.34  (   -0.49 )
      base 1-2048.0-0.7 32046424.40   78624.22
      prot 1-2048.0-0.7 29764182.20   86002.26  (   -7.12 )
    detect 1-2048.0-0.7 30250315.80  101282.14  (   -5.60 )
      base 1-2048.0-1.3 3916723.60   24048.55
      prot 1-2048.0-1.3 3490781.60   33292.61  (  -10.87 )
    detect 1-2048.0-1.3 3585002.20   44942.04  (   -8.47 )

* multi thread
           param        pswpin       std       improvement
      base 8-1024.0-0.1 16219822.60  329474.01
      prot 8-1024.0-0.1 15959494.00  654597.45  (   -1.61 )
    detect 8-1024.0-0.1 15773790.80  502275.25  (   -2.75 )
      base 8-1024.0-0.7 9174107.80  537619.33
      prot 8-1024.0-0.7 8571915.00  385230.08  (   -6.56 )
    detect 8-1024.0-0.7 8489484.20  364683.00  (   -7.46 )
      base 8-1024.0-1.3 1108495.60   83555.98
      prot 8-1024.0-1.3 1038906.20   63465.20  (   -6.28 )
    detect 8-1024.0-1.3  941817.80   32648.80  (  -15.04 )
      base 8-1280.0-0.1 25776114.20  450480.45
      prot 8-1280.0-0.1 25430847.00  465627.07  (   -1.34 )
    detect 8-1280.0-0.1 25282555.00  465666.55  (   -1.91 )
      base 8-1280.0-0.7 15218968.00  702007.69
      prot 8-1280.0-0.7 13957947.80  492643.86  (   -8.29 )
    detect 8-1280.0-0.7 14158331.20  238656.02  (   -6.97 )
      base 8-1280.0-1.3 1792482.80   30512.90
      prot 8-1280.0-1.3 1577686.40   34002.62  (  -11.98 )
    detect 8-1280.0-1.3 1556133.00   22944.79  (  -13.19 )
      base 8-1536.0-0.1 33923761.40  575455.85
      prot 8-1536.0-0.1 32715766.20  300633.51  (   -3.56 )
    detect 8-1536.0-0.1 33158477.40  117764.51  (   -2.26 )
      base 8-1536.0-0.7 20628907.80  303851.34
      prot 8-1536.0-0.7 19329511.20  341719.31  (   -6.30 )
    detect 8-1536.0-0.7 20013934.00  385358.66  (   -2.98 )
      base 8-1536.0-1.3 2588106.40  130769.20
      prot 8-1536.0-1.3 2275222.40   89637.06  (  -12.09 )
    detect 8-1536.0-1.3 2365008.40  124412.55  (   -8.62 )
      base 8-1792.0-0.1 43328279.20  946469.12
      prot 8-1792.0-0.1 41481980.80  525690.89  (   -4.26 )
    detect 8-1792.0-0.1 41713944.60  406798.93  (   -3.73 )
      base 8-1792.0-0.7 27155647.40  536253.57
      prot 8-1792.0-0.7 24989406.80  502734.52  (   -7.98 )
    detect 8-1792.0-0.7 25524806.40  263237.87  (   -6.01 )
      base 8-1792.0-1.3 3260372.80  137907.92
      prot 8-1792.0-1.3 2879187.80   63597.26  (  -11.69 )
    detect 8-1792.0-1.3 2892962.20   33229.13  (  -11.27 )
      base 8-2048.0-0.1 50583989.80  710121.48
      prot 8-2048.0-0.1 49599984.40  228782.42  (   -1.95 )
    detect 8-2048.0-0.1 50578596.00  660971.66  (   -0.01 )
      base 8-2048.0-0.7 33765479.60  812659.55
      prot 8-2048.0-0.7 30767021.20  462907.24  (   -8.88 )
    detect 8-2048.0-0.7 32213068.80  211884.24  (   -4.60 )
      base 8-2048.0-1.3 3941675.80   28436.45
      prot 8-2048.0-1.3 3538742.40   76856.08  (  -10.22 )
    detect 8-2048.0-1.3 3579397.80   58630.95  (   -9.19 )

As we can see, all the cases show improvement.  Especially, test case with
zipf distribution 1.3 show more improvements.  It means that if there is a
hot/cold tendency in anon pages, this patchset works better.

This patch (of 6):

Current implementation of LRU management for anonymous page has some
problems.  Most important one is that it doesn't protect the workingset,
that is, pages on the active LRU list.  Although, this problem will be
fixed in the following patchset, the preparation is required and this
patch does it.

What following patch does is to implement workingset protection.  After
the following patchset, newly created or swap-in pages will start their
lifetime on the inactive list.  If inactive list is too small, there is
not enough chance to be referenced and the page cannot become the
workingset.

In order to provide the newly anonymous or swap-in pages enough chance to
be referenced again, this patch makes active/inactive LRU ratio as 1:1.

This is just a temporary measure.  Later patch in the series introduces
workingset detection for anonymous LRU that will be used to better decide
if pages should start on the active and inactive list.  Afterwards this
patch is effectively reverted.

Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Muchun Song [Wed, 12 Aug 2020 01:30:32 +0000 (18:30 -0700)]

mm/hugetlb: add mempolicy check in the reservation routine

In the reservation routine, we only check whether the cpuset meets the
memory allocation requirements.  But we ignore the mempolicy of MPOL_BIND
case.  If someone mmap hugetlb succeeds, but the subsequent memory
allocation may fail due to mempolicy restrictions and receives the SIGBUS
signal.  This can be reproduced by the follow steps.

1) Compile the test case.
    cd tools/testing/selftests/vm/
    gcc map_hugetlb.c -o map_hugetlb

2) Pre-allocate huge pages. Suppose there are 2 numa nodes in the
    system. Each node will pre-allocate one huge page.
    echo 2 > /proc/sys/vm/nr_hugepages

3) Run test case(mmap 4MB). We receive the SIGBUS signal.
    numactl --membind=3D0 ./map_hugetlb 4

With this patch applied, the mmap will fail in the step 3) and throw
"mmap: Cannot allocate memory".

[[email protected]: include sched.h for `current']

Reported-by: Jianchao Guo <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Signed-off-by: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Baoquan He <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Roman Gushchin [Wed, 12 Aug 2020 01:30:29 +0000 (18:30 -0700)]

kselftests: cgroup: add perpcu memory accounting test

Add a simple test to check the percpu memory accounting. The test creates
a cgroup tree with 1000 child cgroups and checks values of memory.current
and memory.stat::percpu.

Signed-off-by: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Tobin C. Harding <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Michal Koutný <[email protected]>
Cc: Bixuan Cui <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Roman Gushchin [Wed, 12 Aug 2020 01:30:25 +0000 (18:30 -0700)]

mm: memcg: charge memcg percpu memory to the parent cgroup

Memory cgroups are using large chunks of percpu memory to store vmstat
data. Yet this memory is not accounted at all, so in the case when there
are many (dying) cgroups, it's not exactly clear where all the memory is.

Because the size of memory cgroup internal structures can dramatically
exceed the size of object or page which is pinning it in the memory, it's
not a good idea to simply ignore it. It actually breaks the isolation
between cgroups.

Let's account the consumed percpu memory to the parent cgroup.

[[email protected]: add WARN_ON_ONCE()s, per Johannes]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Tobin C. Harding <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Bixuan Cui <[email protected]>
Cc: Michal Koutný <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Roman Gushchin [Wed, 12 Aug 2020 01:30:21 +0000 (18:30 -0700)]

mm: memcg/percpu: per-memcg percpu memory statistics

Percpu memory can represent a noticeable chunk of the total memory
consumption, especially on big machines with many CPUs.  Let's track
percpu memory usage for each memcg and display it in memory.stat.

A percpu allocation is usually scattered over multiple pages (and nodes),
and can be significantly smaller than a page.  So let's add a byte-sized
counter on the memcg level: MEMCG_PERCPU_B.  Byte-sized vmstat infra
created for slabs can be perfectly reused for percpu case.

[[email protected]: v3]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Tobin C. Harding <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Bixuan Cui <[email protected]>
Cc: Michal Koutný <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Roman Gushchin [Wed, 12 Aug 2020 01:30:17 +0000 (18:30 -0700)]

mm: memcg/percpu: account percpu memory to memory cgroups

Percpu memory is becoming more and more widely used by various subsystems,
and the total amount of memory controlled by the percpu allocator can make
a good part of the total memory.

As an example, bpf maps can consume a lot of percpu memory, and they are
created by a user.  Also, some cgroup internals (e.g.  memory controller
statistics) can be quite large.  On a machine with many CPUs and big
number of cgroups they can consume hundreds of megabytes.

So the lack of memcg accounting is creating a breach in the memory
isolation.  Similar to the slab memory, percpu memory should be accounted
by default.

To implement the perpcu accounting it's possible to take the slab memory
accounting as a model to follow.  Let's introduce two types of percpu
chunks: root and memcg.  What makes memcg chunks different is an
additional space allocated to store memcg membership information.  If
__GFP_ACCOUNT is passed on allocation, a memcg chunk should be be used.
If it's possible to charge the corresponding size to the target memory
cgroup, allocation is performed, and the memcg ownership data is recorded.
System-wide allocations are performed using root chunks, so there is no
additional memory overhead.

To implement a fast reparenting of percpu memory on memcg removal, we
don't store mem_cgroup pointers directly: instead we use obj_cgroup API,
introduced for slab accounting.

[[email protected]: fix CONFIG_MEMCG_KMEM=n build errors and warning]
[[email protected]: move unreachable code, per Roman]
[[email protected]: mm/percpu: fix 'defined but not used' warning]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Roman Gushchin <[email protected]>
Signed-off-by: Bixuan Cui <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Tobin C. Harding <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Bixuan Cui <[email protected]>
Cc: Michal Koutný <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

commit | commitdiff | tree

Roman Gushchin [Wed, 12 Aug 2020 01:30:14 +0000 (18:30 -0700)]

percpu: return number of released bytes from pcpu_free_area()

Patch series "mm: memcg accounting of percpu memory", v3.

This patchset adds percpu memory accounting to memory cgroups.  It's based
on the rework of the slab controller and reuses concepts and features
introduced for the per-object slab accounting.

Percpu memory is becoming more and more widely used by various subsystems,
and the total amount of memory controlled by the percpu allocator can make
a good part of the total memory.

As an example, bpf maps can consume a lot of percpu memory, and they are
created by a user.  Also, some cgroup internals (e.g.  memory controller
statistics) can be quite large.  On a machine with many CPUs and big
number of cgroups they can consume hundreds of megabytes.

So the lack of memcg accounting is creating a breach in the memory
isolation.  Similar to the slab memory, percpu memory should be accounted
by default.

Percpu allocations by their nature are scattered over multiple pages, so
they can't be tracked on the per-page basis.  So the per-object tracking
introduced by the new slab controller is reused.

The patchset implements charging of percpu allocations, adds memcg-level
statistics, enables accounting for percpu allocations made by memory
cgroup internals and provides some basic tests.

To implement the accounting of percpu memory without a significant memory
and performance overhead the following approach is used: all accounted
allocations are placed into a separate percpu chunk (or chunks).  These
chunks are similar to default chunks, except that they do have an attached
vector of pointers to obj_cgroup objects, which is big enough to save a
pointer for each allocated object.  On the allocation, if the allocation
has to be accounted (__GFP_ACCOUNT is passed, the allocating process
belongs to a non-root memory cgroup, etc), the memory cgroup is getting
charged and if the maximum limit is not exceeded the allocation is
performed using a memcg-aware chunk.  Otherwise -ENOMEM is returned or the
allocation is forced over the limit, depending on gfp (as any other kernel
memory allocation).  The memory cgroup information is saved in the
obj_cgroup vector at the corresponding offset.  On the release time the
memcg information is restored from the vector and the cgroup is getting
uncharged.  Unaccounted allocations (at this point the absolute majority
of all percpu allocations) are performed in the old way, so no additional
overhead is expected.

To avoid pinning dying memory cgroups by outstanding allocations,
obj_cgroup API is used instead of directly saving memory cgroup pointers.
obj_cgroup is basically a pointer to a memory cgroup with a standalone
reference counter.  The trick is that it can be atomically swapped to
point at the parent cgroup, so that the original memory cgroup can be
released prior to all objects, which has been charged to it.  Because all
charges and statistics are fully recursive, it's perfectly correct to
uncharge the parent cgroup instead.  This scheme is used in the slab
memory accounting, and percpu memory can just follow the scheme.

This patch (of 5):

To implement accounting of percpu memory we need the information about the
size of freed object.  Return it from pcpu_free_area().

Signed-off-by: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Tobin C. Harding <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Waiman Long <[email protected]>
cC: Michal Koutný[email protected]>
Cc: Bixuan Cui <[email protected]>
Cc: Michal Koutný <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

Empty description

RSS Atom

This page took 0.149475 seconds and 4 git commands to generate.