]> Git Repo - linux.git/log
linux.git
4 years agodrivers/rapidio/devices/rio_mport_cdev.c: use struct_size() helper
Gustavo A. R. Silva [Wed, 12 Aug 2020 01:36:37 +0000 (18:36 -0700)]
drivers/rapidio/devices/rio_mport_cdev.c: use struct_size() helper

Make use of the struct_size() helper instead of an open-coded version in
order to avoid any potential type mistakes.

This issue was found with the help of Coccinelle and, audited and fixed
manually.

Addresses KSPP ID: https://github.com/KSPP/linux/issues/83

Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Matt Porter <[email protected]>
Cc: Alexandre Bounine <[email protected]>
Link: http://lkml.kernel.org/r/20200619170843.GA24923@embeddedor
Signed-off-by: Linus Torvalds <[email protected]>
4 years agokdump: append kernel build-id string to VMCOREINFO
Vijay Balakrishna [Wed, 12 Aug 2020 01:36:33 +0000 (18:36 -0700)]
kdump: append kernel build-id string to VMCOREINFO

Make kernel GNU build-id available in VMCOREINFO.  Having build-id in
VMCOREINFO facilitates presenting appropriate kernel namelist image with
debug information file to kernel crash dump analysis tools.  Currently
VMCOREINFO lacks uniquely identifiable key for crash analysis automation.

Regarding if this patch is necessary or matching of linux_banner and
OSRELEASE in VMCOREINFO employed by crash(8) meets the need -- IMO,
build-id approach more foolproof, in most instances it is a cryptographic
hash generated using internal code/ELF bits unlike kernel version string
upon which linux_banner is based that is external to the code.  I feel
each is intended for a different purpose.  Also OSRELEASE is not suitable
when two different kernel builds from same version with different features
enabled.

Currently for most linux (and non-linux) systems build-id can be extracted
using standard methods for file types such as user mode crash dumps,
shared libraries, loadable kernel modules etc., This is an exception for
linux kernel dump.  Having build-id in VMCOREINFO brings some uniformity
for automation tools.

Tyler said:

: I think this is a nice improvement over today's linux_banner approach for
: correlating vmlinux to a kernel dump.
:
: The elf notes parsing in this patch lines up with what is described in in
: the "Notes (Nhdr)" section of the elf(5) man page.
:
: BUILD_ID_MAX is sufficient to hold a sha1 build-id, which is the default
: build-id type today in GNU ld(2).  It is also sufficient to hold the
: "fast" build-id, which is the default build-id type today in LLVM lld(2).

Signed-off-by: Vijay Balakrishna <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoexec: move path_noexec() check earlier
Kees Cook [Wed, 12 Aug 2020 01:36:30 +0000 (18:36 -0700)]
exec: move path_noexec() check earlier

The path_noexec() check, like the regular file check, was happening too
late, letting LSMs see impossible execve()s.  Check it earlier as well in
may_open() and collect the redundant fs/exec.c path_noexec() test under
the same robustness comment as the S_ISREG() check.

My notes on the call path, and related arguments, checks, etc:

do_open_execat()
    struct open_flags open_exec_flags = {
        .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
        .acc_mode = MAY_EXEC,
        ...
    do_filp_open(dfd, filename, open_flags)
        path_openat(nameidata, open_flags, flags)
            file = alloc_empty_file(open_flags, current_cred());
            do_open(nameidata, file, open_flags)
                may_open(path, acc_mode, open_flag)
                    /* new location of MAY_EXEC vs path_noexec() test */
                    inode_permission(inode, MAY_OPEN | acc_mode)
                        security_inode_permission(inode, acc_mode)
                vfs_open(path, file)
                    do_dentry_open(file, path->dentry->d_inode, open)
                        security_file_open(f)
                        open()
    /* old location of path_noexec() test */

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Aleksa Sarai <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoexec: move S_ISREG() check earlier
Kees Cook [Wed, 12 Aug 2020 01:36:26 +0000 (18:36 -0700)]
exec: move S_ISREG() check earlier

The execve(2)/uselib(2) syscalls have always rejected non-regular files.
Recently, it was noticed that a deadlock was introduced when trying to
execute pipes, as the S_ISREG() test was happening too late.  This was
fixed in commit 73601ea5b7b1 ("fs/open.c: allow opening only regular files
during execve()"), but it was added after inode_permission() had already
run, which meant LSMs could see bogus attempts to execute non-regular
files.

Move the test into the other inode type checks (which already look for
other pathological conditions[1]).  Since there is no need to use
FMODE_EXEC while we still have access to "acc_mode", also switch the test
to MAY_EXEC.

Also include a comment with the redundant S_ISREG() checks at the end of
execve(2)/uselib(2) to note that they are present to avoid any mistakes.

My notes on the call path, and related arguments, checks, etc:

do_open_execat()
    struct open_flags open_exec_flags = {
        .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
        .acc_mode = MAY_EXEC,
        ...
    do_filp_open(dfd, filename, open_flags)
        path_openat(nameidata, open_flags, flags)
            file = alloc_empty_file(open_flags, current_cred());
            do_open(nameidata, file, open_flags)
                may_open(path, acc_mode, open_flag)
    /* new location of MAY_EXEC vs S_ISREG() test */
                    inode_permission(inode, MAY_OPEN | acc_mode)
                        security_inode_permission(inode, acc_mode)
                vfs_open(path, file)
                    do_dentry_open(file, path->dentry->d_inode, open)
                        /* old location of FMODE_EXEC vs S_ISREG() test */
                        security_file_open(f)
                        open()

[1] https://lore.kernel.org/lkml/202006041910.9EF0C602@keescook/

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Aleksa Sarai <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoexec: change uselib(2) IS_SREG() failure to EACCES
Kees Cook [Wed, 12 Aug 2020 01:36:23 +0000 (18:36 -0700)]
exec: change uselib(2) IS_SREG() failure to EACCES

Patch series "Relocate execve() sanity checks", v2.

While looking at the code paths for the proposed O_MAYEXEC flag, I saw
some things that looked like they should be fixed up.

  exec: Change uselib(2) IS_SREG() failure to EACCES
This just regularizes the return code on uselib(2).

  exec: Move S_ISREG() check earlier
This moves the S_ISREG() check even earlier than it was already.

  exec: Move path_noexec() check earlier
This adds the path_noexec() check to the same place as the
S_ISREG() check.

This patch (of 3):

Change uselib(2)' S_ISREG() error return to EACCES instead of EINVAL so
the behavior matches execve(2), and the seemingly documented value.  The
"not a regular file" failure mode of execve(2) is explicitly
documented[1], but it is not mentioned in uselib(2)[2] which does,
however, say that open(2) and mmap(2) errors may apply.  The documentation
for open(2) does not include a "not a regular file" error[3], but mmap(2)
does[4], and it is EACCES.

[1] http://man7.org/linux/man-pages/man2/execve.2.html#ERRORS
[2] http://man7.org/linux/man-pages/man2/uselib.2.html#ERRORS
[3] http://man7.org/linux/man-pages/man2/open.2.html#ERRORS
[4] http://man7.org/linux/man-pages/man2/mmap.2.html#ERRORS

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Cc: Aleksa Sarai <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agocoredump: add %f for executable filename
Lepton Wu [Wed, 12 Aug 2020 01:36:20 +0000 (18:36 -0700)]
coredump: add %f for executable filename

The document reads "%e" should be "executable filename" while actually it
could be changed by things like pr_ctl PR_SET_NAME.  People who uses "%e"
in core_pattern get surprised when they find out they get thread name
instead of executable filename.

This is either a bug of document or a bug of code.  Since the behavior of
"%e" is there for long time, it could bring another surprise for users if
we "fix" the code.

So we just "fix" the document.  And more, for users who really need the
"executable filename" in core_pattern, we introduce a new "%f" for the
real executable filename.  We already have "%E" for executable path in
kernel, so just reuse most of its code for the new added "%f" format.

Signed-off-by: Lepton Wu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agotest_kmod: avoid potential double free in trigger_config_run_type()
Tiezhu Yang [Wed, 12 Aug 2020 01:36:16 +0000 (18:36 -0700)]
test_kmod: avoid potential double free in trigger_config_run_type()

Reset the member "test_fs" of the test configuration after a call of the
function "kfree_const" to a null pointer so that a double memory release
will not be performed.

Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader")
Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Luis Chamberlain <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: David Howells <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: James Morris <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: J. Bruce Fields <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Lars Ellenberg <[email protected]>
Cc: Nikolay Aleksandrov <[email protected]>
Cc: Philipp Reisner <[email protected]>
Cc: Roopa Prabhu <[email protected]>
Cc: "Serge E. Hallyn" <[email protected]>
Cc: Sergei Trofimovich <[email protected]>
Cc: Sergey Kvachonok <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Tony Vroon <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agokmod: remove redundant "be an" in the comment
Tiezhu Yang [Wed, 12 Aug 2020 01:36:12 +0000 (18:36 -0700)]
kmod: remove redundant "be an" in the comment

There exists redundant "be an" in the comment, remove it.

Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Luis Chamberlain <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: David Howells <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: James Morris <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: J. Bruce Fields <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Lars Ellenberg <[email protected]>
Cc: Nikolay Aleksandrov <[email protected]>
Cc: Philipp Reisner <[email protected]>
Cc: Roopa Prabhu <[email protected]>
Cc: "Serge E. Hallyn" <[email protected]>
Cc: Sergei Trofimovich <[email protected]>
Cc: Sergey Kvachonok <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Tony Vroon <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoselftests: kmod: use variable NAME in kmod_test_0001()
Tiezhu Yang [Wed, 12 Aug 2020 01:36:08 +0000 (18:36 -0700)]
selftests: kmod: use variable NAME in kmod_test_0001()

Patch series "kmod/umh: a few fixes".

Tiezhu Yang had sent out a patch set with a slew of kmod selftest fixes,
and one patch which modified kmod to return 254 when a module was not
found.  This opened up pandora's box about why that was being used for and
low and behold its because when UMH_WAIT_PROC is used we call a
kernel_wait4() call but have never unwrapped the error code.  The commit
log for that fix details the rationale for the approach taken.  I'd
appreciate some review on that, in particular nfs folks as it seems a case
was never really hit before.

This patch (of 5):

Use the variable NAME instead of "\000" directly in kmod_test_0001().

Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Luis Chamberlain <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Philipp Reisner <[email protected]>
Cc: Lars Ellenberg <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: J. Bruce Fields <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: Roopa Prabhu <[email protected]>
Cc: Nikolay Aleksandrov <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: David Howells <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: James Morris <[email protected]>
Cc: "Serge E. Hallyn" <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Sergei Trofimovich <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Sergey Kvachonok <[email protected]>
Cc: Tony Vroon <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agofs/signalfd.c: fix inconsistent return codes for signalfd4
Helge Deller [Wed, 12 Aug 2020 01:36:04 +0000 (18:36 -0700)]
fs/signalfd.c: fix inconsistent return codes for signalfd4

The kernel signalfd4() syscall returns different error codes when called
either in compat or native mode.  This behaviour makes correct emulation
in qemu and testing programs like LTP more complicated.

Fix the code to always return -in both modes- EFAULT for unaccessible user
memory, and EINVAL when called with an invalid signal mask.

Signed-off-by: Helge Deller <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Laurent Vivier <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agofat: fix fat_ra_init() for data clusters == 0
OGAWA Hirofumi [Wed, 12 Aug 2020 01:36:01 +0000 (18:36 -0700)]
fat: fix fat_ra_init() for data clusters == 0

If data clusters == 0, fat_ra_init() calls the ->ent_blocknr() for the
cluster beyond ->max_clusters.

This checks the limit before initialization to suppress the warning.

Reported-by: [email protected]
Signed-off-by: OGAWA Hirofumi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoVFAT/FAT/MSDOS FILESYSTEM: replace HTTP links with HTTPS ones
Alexander A. Klimov [Wed, 12 Aug 2020 01:35:59 +0000 (18:35 -0700)]
VFAT/FAT/MSDOS FILESYSTEM: replace HTTP links with HTTPS ones

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `xmlns`:
        For each link, `http://[^#  ]*(?:\w|/)`:
  If neither `gnu\.org/license`, nor `mozilla\.org/MPL`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: OGAWA Hirofumi <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agofatfs: switch write_lock to read_lock in fat_ioctl_get_attributes
Yubo Feng [Wed, 12 Aug 2020 01:35:56 +0000 (18:35 -0700)]
fatfs: switch write_lock to read_lock in fat_ioctl_get_attributes

There is no need to hold write_lock in fat_ioctl_get_attributes.
write_lock may make an impact on concurrency of fat_ioctl_get_attributes.

Signed-off-by: Yubo Feng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: OGAWA Hirofumi <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agofs/ufs: avoid potential u32 multiplication overflow
Colin Ian King [Wed, 12 Aug 2020 01:35:53 +0000 (18:35 -0700)]
fs/ufs: avoid potential u32 multiplication overflow

The 64 bit ino is being compared to the product of two u32 values,
however, the multiplication is being performed using a 32 bit multiply so
there is a potential of an overflow.  To be fully safe, cast uspi->s_ncg
to a u64 to ensure a 64 bit multiplication occurs to avoid any chance of
overflow.

Fixes: f3e2a520f5fb ("ufs: NFS support")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Evgeniy Dushistov <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Addresses-Coverity: ("Unintentional integer overflow")
Signed-off-by: Linus Torvalds <[email protected]>
4 years agonilfs2: use a more common logging style
Joe Perches [Wed, 12 Aug 2020 01:35:49 +0000 (18:35 -0700)]
nilfs2: use a more common logging style

Add macros for nilfs_<level>(sb, fmt, ...) and convert the uses of
'nilfs_msg(sb, KERN_<LEVEL>, ...)' to 'nilfs_<level>(sb, ...)' so nilfs2
uses a logging style more like the typical kernel logging style.

Miscellanea:

o Realign arguments for these uses

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agonilfs2: convert __nilfs_msg to integrate the level and format
Joe Perches [Wed, 12 Aug 2020 01:35:46 +0000 (18:35 -0700)]
nilfs2: convert __nilfs_msg to integrate the level and format

Reduce object size a bit by removing the KERN_<LEVEL> as a separate
argument and adding it to the format string.

Reduce overall object size by about ~.5% (x86-64 defconfig w/ nilfs2)

old:
$ size -t fs/nilfs2/built-in.a | tail -1
 191738    8676      44  200458   30f0a (TOTALS)

new:
$ size -t fs/nilfs2/built-in.a | tail -1
 190971    8676      44  199691   30c0b (TOTALS)

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agonilfs2: only call unlock_new_inode() if I_NEW
Eric Biggers [Wed, 12 Aug 2020 01:35:43 +0000 (18:35 -0700)]
nilfs2: only call unlock_new_inode() if I_NEW

Patch series "nilfs2 updates".

This patch (of 3):

unlock_new_inode() is only meant to be called after a new inode has
already been inserted into the hash table.  But nilfs_new_inode() can call
it even before it has inserted the inode, triggering the WARNING in
unlock_new_inode().  Fix this by only calling unlock_new_inode() if the
inode has the I_NEW flag set, indicating that it's in the table.

Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agofs/minix: remove expected error message in block_to_path()
Eric Biggers [Wed, 12 Aug 2020 01:35:39 +0000 (18:35 -0700)]
fs/minix: remove expected error message in block_to_path()

When truncating a file to a size within the last allowed logical block,
block_to_path() is called with the *next* block.  This exceeds the limit,
causing the "block %ld too big" error message to be printed.

This case isn't actually an error; there are just no more blocks past that
point.  So, remove this error message.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Qiujun Huang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agofs/minix: fix block limit check for V1 filesystems
Eric Biggers [Wed, 12 Aug 2020 01:35:36 +0000 (18:35 -0700)]
fs/minix: fix block limit check for V1 filesystems

The minix filesystem reads its maximum file size from its on-disk
superblock.  This value isn't necessarily a multiple of the block size.
When it's not, the V1 block mapping code doesn't allow mapping the last
possible block.  Commit 6ed6a722f9ab ("minixfs: fix block limit check")
fixed this in the V2 mapping code.  Fix it in the V1 mapping code too.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Qiujun Huang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agofs/minix: set s_maxbytes correctly
Eric Biggers [Wed, 12 Aug 2020 01:35:33 +0000 (18:35 -0700)]
fs/minix: set s_maxbytes correctly

The minix filesystem leaves super_block::s_maxbytes at MAX_NON_LFS rather
than setting it to the actual filesystem-specific limit.  This is broken
because it means userspace doesn't see the standard behavior like getting
EFBIG and SIGXFSZ when exceeding the maximum file size.

Fix this by setting s_maxbytes correctly.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Qiujun Huang <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agofs/minix: reject too-large maximum file size
Eric Biggers [Wed, 12 Aug 2020 01:35:30 +0000 (18:35 -0700)]
fs/minix: reject too-large maximum file size

If the minix filesystem tries to map a very large logical block number to
its on-disk location, block_to_path() can return offsets that are too
large, causing out-of-bounds memory accesses when accessing indirect index
blocks.  This should be prevented by the check against the maximum file
size, but this doesn't work because the maximum file size is read directly
from the on-disk superblock and isn't validated itself.

Fix this by validating the maximum file size at mount time.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Qiujun Huang <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agofs/minix: don't allow getting deleted inodes
Eric Biggers [Wed, 12 Aug 2020 01:35:27 +0000 (18:35 -0700)]
fs/minix: don't allow getting deleted inodes

If an inode has no links, we need to mark it bad rather than allowing it
to be accessed.  This avoids WARNINGs in inc_nlink() and drop_nlink() when
doing directory operations on a fuzzed filesystem.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: [email protected]
Reported-by: [email protected]
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Qiujun Huang <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agofs/minix: check return value of sb_getblk()
Eric Biggers [Wed, 12 Aug 2020 01:35:24 +0000 (18:35 -0700)]
fs/minix: check return value of sb_getblk()

Patch series "fs/minix: fix syzbot bugs and set s_maxbytes".

This series fixes all syzbot bugs in the minix filesystem:

KASAN: null-ptr-deref Write in get_block
KASAN: use-after-free Write in get_block
KASAN: use-after-free Read in get_block
WARNING in inc_nlink
KMSAN: uninit-value in get_block
WARNING in drop_nlink

It also fixes the minix filesystem to set s_maxbytes correctly, so that
userspace sees the correct behavior when exceeding the max file size.

This patch (of 6):

sb_getblk() can fail, so check its return value.

This fixes a NULL pointer dereference.

Originally from Qiujun Huang.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: [email protected]
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Qiujun Huang <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoautofs: fix doubled word
Randy Dunlap [Wed, 12 Aug 2020 01:35:21 +0000 (18:35 -0700)]
autofs: fix doubled word

Change doubled word "is" to "it is".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Ian Kent <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agocheckpatch: remove missing switch/case break test
Joe Perches [Wed, 12 Aug 2020 01:35:19 +0000 (18:35 -0700)]
checkpatch: remove missing switch/case break test

This test doesn't work well and newer compilers are much better
at emitting this warning.

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Cambda Zhu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agocheckpatch: add test for repeated words
Joe Perches [Wed, 12 Aug 2020 01:35:16 +0000 (18:35 -0700)]
checkpatch: add test for repeated words

Try to avoid adding repeated words either on the same line or consecutive
comment lines in a block

e.g.:

duplicated word in comment block

/*
 * this is a comment block where the last word of the previous
 * previous line is also the first word of the next line
 */

and simple duplication

/* test this this again */

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Inspired-by: Randy Dunlap <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
4 years agocheckpatch: fix CONST_STRUCT when const_structs.checkpatch is missing
Quentin Monnet [Wed, 12 Aug 2020 01:35:13 +0000 (18:35 -0700)]
checkpatch: fix CONST_STRUCT when const_structs.checkpatch is missing

Checkpatch reports warnings when some specific structs are not declared as
const in the code.  The list of structs to consider was initially defined
in the checkpatch.pl script itself, but it was later moved to an external
file (scripts/const_structs.checkpatch), in commit bf1fa1dae68e
("checkpatch: externalize the structs that should be const").  This
introduced two minor issues:

- When file scripts/const_structs.checkpatch is not present (for
  example, if checkpatch is run outside of the kernel directory with the
  "--no-tree" option), a warning is printed to stderr to tell the user
  that "No structs that should be const will be found". This is fair,
  but the warning is printed unconditionally, even if the option
  "--ignore CONST_STRUCT" is passed. In the latter case, we explicitly
  ask checkpatch to skip this check, so no warning should be printed.

- When scripts/const_structs.checkpatch is missing, or even when trying
  to silence the warning by adding an empty file, $const_structs is set
  to "", and the regex used for finding structs that should be const,
  "$line =~ /struct\s+($const_structs)(?!\s*\{)/)", matches all
  structs found in the code, thus reporting a number of false positives.

Let's fix the first item by skipping scripts/const_structs.checkpatch
processing if "CONST_STRUCT" checks are ignored, and the second one by
skipping the test if $const_structs is not defined. Since we modify the
read_words() function a little bit, update the checks for
$typedefsfile/$typeOtherTypedefs as well.

Signed-off-by: Quentin Monnet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Joe Perches <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agocheckpatch: add --fix option for ASSIGN_IN_IF
Joe Perches [Wed, 12 Aug 2020 01:35:10 +0000 (18:35 -0700)]
checkpatch: add --fix option for ASSIGN_IN_IF

Add a --fix option for 2 types of single-line assignment in if statements

if ((foo = bar(...)) < BAZ) {
expands to:
foo = bar(..);
if (foo < BAZ) {
and
if ((foo = bar(...)) {
expands to:
foo = bar(...);
if (foo) {

if statements with assignments spanning multiple lines are
not converted with the --fix option.

if statements with additional logic are also not converted.

e.g.: if ((foo = bar(...)) & BAZ == BAZ) {

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Julia Lawall <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agocheckpatch: add test for possible misuse of IS_ENABLED() without CONFIG_
Joe Perches [Wed, 12 Aug 2020 01:35:07 +0000 (18:35 -0700)]
checkpatch: add test for possible misuse of IS_ENABLED() without CONFIG_

IS_ENABLED is almost always used with CONFIG_<FOO> defines.

Add a test to verify that the #define being tested starts with CONFIG_.

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agolib/test_bits.c: add tests of GENMASK
Rikard Falkeborn [Wed, 12 Aug 2020 01:35:03 +0000 (18:35 -0700)]
lib/test_bits.c: add tests of GENMASK

Add tests of GENMASK and GENMASK_ULL.

A few test cases that should fail compilation are provided under #ifdef
TEST_GENMASK_FAILURES

[[email protected]: add MODULE_LICENSE()]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: make some functions static]
Link: http://lkml.kernel.org/r/[email protected]
Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: Rikard Falkeborn <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: William Breathitt Gray <[email protected]>
Cc: Emil Velikov <[email protected]>
Cc: Syed Nayyar Waris <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agokstrto*: do not describe simple_strto*() as obsolete/replaced
Kars Mulder [Wed, 12 Aug 2020 01:34:56 +0000 (18:34 -0700)]
kstrto*: do not describe simple_strto*() as obsolete/replaced

The documentation of the kstrto*() functions describes kstrto*() as
"replacements" of the "obsolete" simple_strto*() functions.  Both of these
terms are inaccurate: they're not replacements because they have different
behaviour, and the simple_strto*() are not obsolete because there are
cases where they have benefits over kstrto*().

Remove usage of the terms "replacement" and "obsolete" in reference to
simple_strto*(), and instead use the term "preferred over".

Fixes: 4c925d6031f71 ("kstrto*: add documentation")
Fixes: 885e68e8b7b13 ("kernel.h: update comment about simple_strto<foo>() functions")
Signed-off-by: Kars Mulder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Eldad Zack <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Mans Rullgard <[email protected]>
Cc: Petr Mladek <[email protected]>
Link: http://lkml.kernel.org/r/29b9-5f234c80-13-4e3aa200@244003027
Signed-off-by: Linus Torvalds <[email protected]>
4 years agokstrto*: correct documentation references to simple_strto*()
Kars Mulder [Wed, 12 Aug 2020 01:34:53 +0000 (18:34 -0700)]
kstrto*: correct documentation references to simple_strto*()

The documentation of the kstrto*() functions reference the simple_strtoull
function by "used as a replacement for [the obsolete] simple_strtoull".
All these functions describes themselves as replacements for the function
simple_strtoull, even though a function like kstrtol() would be more aptly
described as a replacement of simple_strtol().

Fix these references by making the documentation of kstrto*() reference
the closest simple_strto*() equivalent available.  The functions
kstrto[u]int() do not have direct simple_strto[u]int() equivalences, so
these are made to refer to simple_strto[u]l() instead.

Furthermore, add parentheses after function names, as is standard in
kernel documentation.

Fixes: 4c925d6031f71 ("kstrto*: add documentation")
Signed-off-by: Kars Mulder <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Eldad Zack <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Mans Rullgard <[email protected]>
Cc: Petr Mladek <[email protected]>
Link: http://lkml.kernel.org/r/1ee1-5f234c00-f3-165a6440@234394593
Signed-off-by: Linus Torvalds <[email protected]>
4 years agolib/: replace HTTP links with HTTPS ones
Alexander A. Klimov [Wed, 12 Aug 2020 01:34:50 +0000 (18:34 -0700)]
lib/: replace HTTP links with HTTPS ones

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Signed-off-by: Alexander A. Klimov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Coly Li <[email protected]> [crc64.c]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agolib/test_lockup.c: fix return value of test_lockup_init()
Tiezhu Yang [Wed, 12 Aug 2020 01:34:47 +0000 (18:34 -0700)]
lib/test_lockup.c: fix return value of test_lockup_init()

Since filp_open() returns an error pointer, we should use IS_ERR() to
check the return value and then return PTR_ERR() if failed to get the
actual return value instead of always -EINVAL.

E.g. without this patch:

[root@localhost loongson]# ls no_such_file
ls: cannot access no_such_file: No such file or directory
[root@localhost loongson]# modprobe test_lockup file_path=no_such_file lock_sb_umount time_secs=60 state=S
modprobe: ERROR: could not insert 'test_lockup': Invalid argument
[root@localhost loongson]# dmesg | tail -1
[  126.100596] test_lockup: cannot find file_path

With this patch:

[root@localhost loongson]# ls no_such_file
ls: cannot access no_such_file: No such file or directory
[root@localhost loongson]# modprobe test_lockup file_path=no_such_file lock_sb_umount time_secs=60 state=S
modprobe: ERROR: could not insert 'test_lockup': Unknown symbol in module, or unknown parameter (see dmesg)
[root@localhost loongson]# dmesg | tail -1
[   95.134362] test_lockup: failed to open no_such_file: -2

Fixes: aecd42df6d39 ("lib/test_lockup.c: add parameters for locking generic vfs locks")
Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Kees Cook <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agolib/Kconfig.debug: make TEST_LOCKUP depend on module
Tiezhu Yang [Wed, 12 Aug 2020 01:34:44 +0000 (18:34 -0700)]
lib/Kconfig.debug: make TEST_LOCKUP depend on module

Since test_lockup is a test module to generate lockups, it is better to
limit TEST_LOCKUP to module (=m) or disabled (=n) because we can not use
the module parameters when CONFIG_TEST_LOCKUP=y.

Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Kees Cook <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agolib/test_lockup.c: make symbol 'test_works' static
Wei Yongjun [Wed, 12 Aug 2020 01:34:41 +0000 (18:34 -0700)]
lib/test_lockup.c: make symbol 'test_works' static

Fix sparse build warning:

lib/test_lockup.c:403:1: warning:
 symbol '__pcpu_scope_test_works' was not declared. Should it be static?

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agolib/test_bitops: do the full test during module init
Geert Uytterhoeven [Wed, 12 Aug 2020 01:34:38 +0000 (18:34 -0700)]
lib/test_bitops: do the full test during module init

Currently, the bitops test consists of two parts: one part is executed
during module load, the second part during module unload.  This is
cumbersome for the user, as he has to perform two steps to execute all
tests, and is different from most (all?) other tests.

Merge the two parts, so both are executed during module load.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agolib/generic-radix-tree.c: remove unneeded __rcu
Luc Van Oostenryck [Wed, 12 Aug 2020 01:34:35 +0000 (18:34 -0700)]
lib/generic-radix-tree.c: remove unneeded __rcu

struct __genradix is defined as having its member 'root'
annotated as __rcu. But in the corresponding API RCU is not used.
Sparse reports this type mismatch as:
lib/generic-radix-tree.c:56:35: warning: incorrect type in initializer (different address spaces)
lib/generic-radix-tree.c:56:35:    expected struct genradix_root *r
lib/generic-radix-tree.c:56:35:    got struct genradix_root [noderef] <asn:4> *__val
with 6 other ones.

So, correct root's type by removing this unneeded __rcu.

Signed-off-by: Luc Van Oostenryck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Kent Overstreet <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agolib/test_bitmap.c: add test for bitmap_cut()
Stefano Brivio [Wed, 12 Aug 2020 01:34:32 +0000 (18:34 -0700)]
lib/test_bitmap.c: add test for bitmap_cut()

Inspired by an original patch from Yury Norov: introduce a test for
bitmap_cut() that also makes sure functionality is as described for
partially overlapping src and dst.

Signed-off-by: Stefano Brivio <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Yury Norov <[email protected]>
Link: http://lkml.kernel.org/r/5fc45e6bbd4fa837cd9577f8a0c1d639df90a4ce.1592155364.git.sbrivio@redhat.com
Signed-off-by: Linus Torvalds <[email protected]>
4 years agolib/bitmap.c: fix bitmap_cut() for partial overlapping case
Stefano Brivio [Wed, 12 Aug 2020 01:34:29 +0000 (18:34 -0700)]
lib/bitmap.c: fix bitmap_cut() for partial overlapping case

Patch series "lib: Fix bitmap_cut() for overlaps, add test"

This patch (of 2):

Yury Norov reports that bitmap_cut() will not produce the right outcome if
src and dst partially overlap, with src pointing at some location after
dst, because the memmove() affects src before we store the bits that we
need to keep, that is, the bits preceding the cut -- as long as we the
beginning of the cut is not aligned to a long.

Fix this by storing those bits before the memmove().

Note that this is just a theoretical concern so far, as the only user of
this function, pipapo_drop() from the nftables set back-end implemented in
net/netfilter/nft_set_pipapo.c, always supplies entirely overlapping src
and dst.

Fixes: 2092767168f0 ("bitmap: Introduce bitmap_cut(): cut bits and shift remaining")
Reported-by: Yury Norov <[email protected]>
Signed-off-by: Stefano Brivio <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/003e38d4428cd6091ef00b5b03354f1bd7d9091e.1592155364.git.sbrivio@redhat.com
Signed-off-by: Linus Torvalds <[email protected]>
4 years agosparse: group the defines by functionality
Luc Van Oostenryck [Wed, 12 Aug 2020 01:34:26 +0000 (18:34 -0700)]
sparse: group the defines by functionality

By popular demand, reorder the defines for sparse annotations and group
them by functionality.

Signed-off-by: Luc Van Oostenryck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Miguel Ojeda <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: lore.kernel.org/r/CAMuHMdWQsirja-h3wBcZezk+H2Q_HShhAks8Hc8ps5fTAp=ObQ@mail.gmail.com
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoinclude/linux/poison.h: remove obsolete comment
Matthew Wilcox [Wed, 12 Aug 2020 01:34:23 +0000 (18:34 -0700)]
include/linux/poison.h: remove obsolete comment

When the definition was changed, the comment became stale.  Just remove
it since there isn't anything useful to say here.

Fixes: b8a0255db958 ("include/linux/poison.h: use POISON_POINTER_DELTA for poison pointers")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Vasily Kulikov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoinclude/: replace HTTP links with HTTPS ones
Alexander A. Klimov [Wed, 12 Aug 2020 01:34:19 +0000 (18:34 -0700)]
include/: replace HTTP links with HTTPS ones

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Signed-off-by: Alexander A. Klimov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agokernel.h: remove duplicate include of asm/div64.h
Arvind Sankar [Wed, 12 Aug 2020 01:34:16 +0000 (18:34 -0700)]
kernel.h: remove duplicate include of asm/div64.h

This seems to have been added inadvertently in commit
  72deb455b5ec ("block: remove CONFIG_LBDAF")

Fixes: 72deb455b5ec ("block: remove CONFIG_LBDAF")
Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years ago./Makefile: add debug option to enable function aligned on 32 bytes
Feng Tang [Wed, 12 Aug 2020 01:34:13 +0000 (18:34 -0700)]
./Makefile: add debug option to enable function aligned on 32 bytes

Recently 0day reported many strange performance changes (regression or
improvement), in which there was no obvious relation between the culprit
commit and the benchmark at the first look, and it causes people to doubt
the test itself is wrong.

Upon further check, many of these cases are caused by the change to the
alignment of kernel text or data, as whole text/data of kernel are linked
together, change in one domain may affect alignments of other domains.

gcc has an option '-falign-functions=n' to force text aligned, and with
that option enabled, some of those performance changes will be gone, like
[1][2][3].

Add this option so that developers and 0day can easily find performance
bump caused by text alignment change, as tracking these strange bump is
quite time consuming.  Though it can't help in other cases like data
alignment changes like [4].

Following is some size data for v5.7 kernel built with a RHEL config used
in 0day:

    text      data      bss  dec    filename
  19738771  13292906  5554236  38585913  vmlinux.noalign
  19758591  13297002  5529660  38585253  vmlinux.align32

Raw vmlinux size in bytes:

v5.7 v5.7+align32
253950832 254018000 +0.02%

Some benchmark data, most of them have no big change:

  * hackbench: [ -1.8%,  +0.5%]

  * fsmark: [ -3.2%,  +3.4%]  # ext4/xfs/btrfs

  * kbuild: [ -2.0%,  +0.9%]

  * will-it-scale: [ -0.5%,  +1.8%]  # mmap1/pagefault3

  * netperf:
    - TCP_CRR [+16.6%, +97.4%]
    - TCP_RR [-18.5%,  -1.8%]
    - TCP_STREAM [ -1.1%,  +1.9%]

[1] https://lore.kernel.org/lkml/20200114085637.GA29297@shao2-debian/
[2] https://lore.kernel.org/lkml/20200330011254.GA14393@feng-iot/
[3] https://lore.kernel.org/lkml/1d98d1f0-fe84-6df7-f5bd-f4cb2cdb7f45@intel.com/
[4] https://lore.kernel.org/lkml/20200205123216.GO12867@shao2-debian/

Signed-off-by: Feng Tang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agokernel: add a kernel_wait helper
Christoph Hellwig [Wed, 12 Aug 2020 01:34:10 +0000 (18:34 -0700)]
kernel: add a kernel_wait helper

Add a helper that waits for a pid and stores the status in the passed in
kernel pointer.  Use it to fix the usage of kernel_wait4 in
call_usermodehelper_exec_sync that only happens to work due to the
implicit set_fs(KERNEL_DS) for kernel threads.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoinclude/linux/xz.h: drop duplicated word
Randy Dunlap [Wed, 12 Aug 2020 01:34:07 +0000 (18:34 -0700)]
include/linux/xz.h: drop duplicated word

Drop the doubled word "than" in a comment.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Lasse Collin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoinclude/linux/async_tx.h: drop duplicated word in a comment
Randy Dunlap [Wed, 12 Aug 2020 01:34:04 +0000 (18:34 -0700)]
include/linux/async_tx.h: drop duplicated word in a comment

Drop the doubled word "the" in a comment.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Dan Williams <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoinclude/linux/exportfs.h: drop duplicated word in a comment
Randy Dunlap [Wed, 12 Aug 2020 01:34:00 +0000 (18:34 -0700)]
include/linux/exportfs.h: drop duplicated word in a comment

Drop the doubled word "a" in a comment.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Alexander Viro <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoinclude/linux/compiler-clang.h: drop duplicated word in a comment
Randy Dunlap [Wed, 12 Aug 2020 01:33:57 +0000 (18:33 -0700)]
include/linux/compiler-clang.h: drop duplicated word in a comment

Drop the doubled word "the" in a comment.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoalpha: fix annotation of io{read,write}{16,32}be()
Luc Van Oostenryck [Wed, 12 Aug 2020 01:33:54 +0000 (18:33 -0700)]
alpha: fix annotation of io{read,write}{16,32}be()

These accessors must be used to read/write a big-endian bus.  The value
returned or written is native-endian.

However, these accessors are defined using be{16,32}_to_cpu() or
cpu_to_be{16,32}() to make the endian conversion but these expect a
__be{16,32} when none is present.  Keeping them would need a force cast
that would solve nothing at all.

So, do the conversion using swab{16,32}, like done in asm-generic for
similar situations.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Luc Van Oostenryck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoexec: use force_uaccess_begin during exec and exit
Christoph Hellwig [Wed, 12 Aug 2020 01:33:50 +0000 (18:33 -0700)]
exec: use force_uaccess_begin during exec and exit

Both exec and exit want to ensure that the uaccess routines actually do
access user pointers.  Use the newly added force_uaccess_begin helper
instead of an open coded set_fs for that to prepare for kernel builds
where set_fs() does not exist.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agouaccess: add force_uaccess_{begin,end} helpers
Christoph Hellwig [Wed, 12 Aug 2020 01:33:47 +0000 (18:33 -0700)]
uaccess: add force_uaccess_{begin,end} helpers

Add helpers to wrap the get_fs/set_fs magic for undoing any damange done
by set_fs(KERNEL_DS).  There is no real functional benefit, but this
documents the intent of these calls better, and will allow stubbing the
functions out easily for kernels builds that do not allow address space
overrides in the future.

[[email protected]: drop two incorrect hunks, fix a commit log typo]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Acked-by: Greentime Hu <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agouaccess: remove segment_eq
Christoph Hellwig [Wed, 12 Aug 2020 01:33:44 +0000 (18:33 -0700)]
uaccess: remove segment_eq

segment_eq is only used to implement uaccess_kernel.  Just open code
uaccess_kernel in the arch uaccess headers and remove one layer of
indirection.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Greentime Hu <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoriscv: include <asm/pgtable.h> in <asm/uaccess.h>
Christoph Hellwig [Wed, 12 Aug 2020 01:33:41 +0000 (18:33 -0700)]
riscv: include <asm/pgtable.h> in <asm/uaccess.h>

To ensure TASK_SIZE is defined for USER_DS.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agonds32: use uaccess_kernel in show_regs
Christoph Hellwig [Wed, 12 Aug 2020 01:33:38 +0000 (18:33 -0700)]
nds32: use uaccess_kernel in show_regs

Use the uaccess_kernel helper instead of duplicating it.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Acked-by: Greentime Hu <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agosyscalls: use uaccess_kernel in addr_limit_user_check
Christoph Hellwig [Wed, 12 Aug 2020 01:33:34 +0000 (18:33 -0700)]
syscalls: use uaccess_kernel in addr_limit_user_check

Patch series "clean up address limit helpers", v2.

In preparation for eventually phasing out direct use of set_fs(), this
series removes the segment_eq() arch helper that is only used to implement
or duplicate the uaccess_kernel() API, and then adds descriptive helpers
to force the kernel address limit.

This patch (of 6):

Use the uaccess_kernel helper instead of duplicating it.

[[email protected]: arm: don't call addr_limit_user_check for nommu]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Cc: Nick Hu <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/zsmalloc.c: fix duplicated words
Randy Dunlap [Wed, 12 Aug 2020 01:33:31 +0000 (18:33 -0700)]
mm/zsmalloc.c: fix duplicated words

Change "as as" to "as a".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/zpool.c: delete duplicated word and fix grammar
Randy Dunlap [Wed, 12 Aug 2020 01:33:28 +0000 (18:33 -0700)]
mm/zpool.c: delete duplicated word and fix grammar

Drop the repeated word "if".
Fix subject/verb agreement.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/vmscan.c: delete or fix duplicated words
Randy Dunlap [Wed, 12 Aug 2020 01:33:26 +0000 (18:33 -0700)]
mm/vmscan.c: delete or fix duplicated words

Drop the repeated word "marked".
Change "time time" to "same time".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/usercopy.c: delete duplicated word
Randy Dunlap [Wed, 12 Aug 2020 01:33:23 +0000 (18:33 -0700)]
mm/usercopy.c: delete duplicated word

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/slab_common.c: delete duplicated word
Randy Dunlap [Wed, 12 Aug 2020 01:33:19 +0000 (18:33 -0700)]
mm/slab_common.c: delete duplicated word

Drop the repeated word "and".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/shmem.c: delete duplicated word
Randy Dunlap [Wed, 12 Aug 2020 01:33:17 +0000 (18:33 -0700)]
mm/shmem.c: delete duplicated word

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/page_alloc.c: delete or fix duplicated words
Randy Dunlap [Wed, 12 Aug 2020 01:33:14 +0000 (18:33 -0700)]
mm/page_alloc.c: delete or fix duplicated words

Drop the repeated word "them" and "that".
Change "the the" to "to the".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/nommu.c: delete duplicated words
Randy Dunlap [Wed, 12 Aug 2020 01:33:11 +0000 (18:33 -0700)]
mm/nommu.c: delete duplicated words

Drop the repeated word "that" in two places.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/migrate.c: delete duplicated word
Randy Dunlap [Wed, 12 Aug 2020 01:33:08 +0000 (18:33 -0700)]
mm/migrate.c: delete duplicated word

Drop the repeated word "and".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/memory.c: delete duplicated words
Randy Dunlap [Wed, 12 Aug 2020 01:33:05 +0000 (18:33 -0700)]
mm/memory.c: delete duplicated words

Drop the repeated word "to" in two places.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/memcontrol.c: delete duplicated words
Randy Dunlap [Wed, 12 Aug 2020 01:33:02 +0000 (18:33 -0700)]
mm/memcontrol.c: delete duplicated words

Drop the repeated word "down".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/hugetlb.c: delete duplicated words
Randy Dunlap [Wed, 12 Aug 2020 01:32:59 +0000 (18:32 -0700)]
mm/hugetlb.c: delete duplicated words

Drop the repeated word "the" in two places.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/hmm.c: delete duplicated word
Randy Dunlap [Wed, 12 Aug 2020 01:32:56 +0000 (18:32 -0700)]
mm/hmm.c: delete duplicated word

Drop the repeated word "pages".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/filemap.c: delete duplicated word
Randy Dunlap [Wed, 12 Aug 2020 01:32:53 +0000 (18:32 -0700)]
mm/filemap.c: delete duplicated word

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/compaction.c: delete duplicated word
Randy Dunlap [Wed, 12 Aug 2020 01:32:49 +0000 (18:32 -0700)]
mm/compaction.c: delete duplicated word

Drop the repeated word "a".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agosparc: drop unused MAX_PHYSADDR_BITS
Arvind Sankar [Wed, 12 Aug 2020 01:32:46 +0000 (18:32 -0700)]
sparc: drop unused MAX_PHYSADDR_BITS

The macro is not used anywhere, so remove the definition.

Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agosh/mm: drop unused MAX_PHYSADDR_BITS
Arvind Sankar [Wed, 12 Aug 2020 01:32:43 +0000 (18:32 -0700)]
sh/mm: drop unused MAX_PHYSADDR_BITS

The macro is not used anywhere, so remove the definition.

Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoinclude/linux/memcontrol.h: drop duplicate word and fix spello
Randy Dunlap [Wed, 12 Aug 2020 01:32:40 +0000 (18:32 -0700)]
include/linux/memcontrol.h: drop duplicate word and fix spello

Drop the doubled word "for" in a comment.
Fix spello of "incremented".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Chris Down <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoinclude/linux/frontswap.h: drop duplicated word in a comment
Randy Dunlap [Wed, 12 Aug 2020 01:32:36 +0000 (18:32 -0700)]
include/linux/frontswap.h: drop duplicated word in a comment

Drop the doubled word "in" in a comment.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoinclude/linux/highmem.h: fix duplicated words in a comment
Randy Dunlap [Wed, 12 Aug 2020 01:32:33 +0000 (18:32 -0700)]
include/linux/highmem.h: fix duplicated words in a comment

Change the doubled word "is" in a comment to "it is".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm: drop duplicated words in <linux/mm.h>
Randy Dunlap [Wed, 12 Aug 2020 01:32:30 +0000 (18:32 -0700)]
mm: drop duplicated words in <linux/mm.h>

Drop the doubled words "to" and "the".

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm: drop duplicated words in <linux/pgtable.h>
Randy Dunlap [Wed, 12 Aug 2020 01:32:27 +0000 (18:32 -0700)]
mm: drop duplicated words in <linux/pgtable.h>

Drop the doubled words "used" and "by".

Drop the repeated acronym "TLB" and make several other fixes around it.
(capital letters, spellos)

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm, memory_hotplug: update pcp lists everytime onlining a memory block
Charan Teja Reddy [Wed, 12 Aug 2020 01:32:24 +0000 (18:32 -0700)]
mm, memory_hotplug: update pcp lists everytime onlining a memory block

When onlining a first memory block in a zone, pcp lists are not updated
thus pcp struct will have the default setting of ->high = 0,->batch = 1.

This means till the second memory block in a zone(if it have) is onlined
the pcp lists of this zone will not contain any pages because pcp's
->count is always greater than ->high thus free_pcppages_bulk() is called
to free batch size(=1) pages every time system wants to add a page to the
pcp list through free_unref_page().

To put this in a word, system is not using benefits offered by the pcp
lists when there is a single onlineable memory block in a zone.  Correct
this by always updating the pcp lists when memory block is onlined.

Fixes: 1f522509c77a ("mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset")
Signed-off-by: Charan Teja Reddy <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vinayak Menon <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/memory_hotplug: fix unpaired mem_hotplug_begin/done
Jia He [Wed, 12 Aug 2020 01:32:20 +0000 (18:32 -0700)]
mm/memory_hotplug: fix unpaired mem_hotplug_begin/done

When check_memblock_offlined_cb() returns failed rc(e.g. the memblock is
online at that time), mem_hotplug_begin/done is unpaired in such case.

Therefore a warning:
 Call Trace:
  percpu_up_write+0x33/0x40
  try_remove_memory+0x66/0x120
  ? _cond_resched+0x19/0x30
  remove_memory+0x2b/0x40
  dev_dax_kmem_remove+0x36/0x72 [kmem]
  device_release_driver_internal+0xf0/0x1c0
  device_release_driver+0x12/0x20
  bus_remove_device+0xe1/0x150
  device_del+0x17b/0x3e0
  unregister_dev_dax+0x29/0x60
  devm_action_release+0x15/0x20
  release_nodes+0x19a/0x1e0
  devres_release_all+0x3f/0x50
  device_release_driver_internal+0x100/0x1c0
  driver_detach+0x4c/0x8f
  bus_remove_driver+0x5c/0xd0
  driver_unregister+0x31/0x50
  dax_pmem_exit+0x10/0xfe0 [dax_pmem]

Fixes: f1037ec0cc8a ("mm/memory_hotplug: fix remove_memory() lockdep splat")
Signed-off-by: Jia He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: <[email protected]> [5.6+]
Cc: Andy Lutomirski <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chuhong Yuan <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kaly Xin <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/memory_hotplug: introduce default dummy memory_add_physaddr_to_nid()
Jia He [Wed, 12 Aug 2020 01:32:16 +0000 (18:32 -0700)]
mm/memory_hotplug: introduce default dummy memory_add_physaddr_to_nid()

This is to introduce a general dummy helper.  memory_add_physaddr_to_nid()
is a fallback option to get the nid in case NUMA_NO_NID is detected.

After this patch, arm64/sh/s390 can simply use the general dummy version.
PowerPC/x86/ia64 will still use their specific version.

This is the preparation to set a fallback value for dev_dax->target_node.

Signed-off-by: Jia He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Chuhong Yuan <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Kaly Xin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agox86/mm: use max memory block size on bare metal
Daniel Jordan [Wed, 12 Aug 2020 01:32:12 +0000 (18:32 -0700)]
x86/mm: use max memory block size on bare metal

Some of our servers spend significant time at kernel boot initializing
memory block sysfs directories and then creating symlinks between them and
the corresponding nodes.  The slowness happens because the machines get
stuck with the smallest supported memory block size on x86 (128M), which
results in 16,288 directories to cover the 2T of installed RAM.  The
search for each memory block is noticeable even with commit 4fb6eabf1037
("drivers/base/memory.c: cache memory blocks in xarray to accelerate
lookup").

Commit 078eb6aa50dc ("x86/mm/memory_hotplug: determine block size based on
the end of boot memory") chooses the block size based on alignment with
memory end.  That addresses hotplug failures in qemu guests, but for bare
metal systems whose memory end isn't aligned to even the smallest size, it
leaves them at 128M.

Make kernels that aren't running on a hypervisor use the largest supported
size (2G) to minimize overhead on big machines.  Kernel boot goes 7%
faster on the aforementioned servers, shaving off half a second.

[[email protected]: v3]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Daniel Jordan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Sistare <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm: mmu_notifier: fix and extend kerneldoc
Krzysztof Kozlowski [Wed, 12 Aug 2020 01:32:09 +0000 (18:32 -0700)]
mm: mmu_notifier: fix and extend kerneldoc

Fix W=1 compile warnings (invalid kerneldoc):

    mm/mmu_notifier.c:187: warning: Function parameter or member 'interval_sub' not described in 'mmu_interval_read_bgin'
    mm/mmu_notifier.c:708: warning: Function parameter or member 'subscription' not described in 'mmu_notifier_registr'
    mm/mmu_notifier.c:708: warning: Excess function parameter 'mn' description in 'mmu_notifier_register'
    mm/mmu_notifier.c:880: warning: Function parameter or member 'subscription' not described in 'mmu_notifier_put'
    mm/mmu_notifier.c:880: warning: Excess function parameter 'mn' description in 'mmu_notifier_put'
    mm/mmu_notifier.c:982: warning: Function parameter or member 'ops' not described in 'mmu_interval_notifier_insert'

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoinclude/linux/sched/mm.h: optimize current_gfp_context()
Waiman Long [Wed, 12 Aug 2020 01:32:06 +0000 (18:32 -0700)]
include/linux/sched/mm.h: optimize current_gfp_context()

The current_gfp_context() converts a number of PF_MEMALLOC_* per-process
flags into the corresponding GFP_* flags for memory allocation.  In that
function, current->flags is accessed 3 times.  That may lead to duplicated
access of the same memory location.

This is not usually a problem with minimal debug config options on as the
compiler can optimize away the duplicated memory accesses.  With most of
the debug config options on, however, that may not be the case.  For
example, the x86-64 object size of the __need_fs_reclaim() in a debug
kernel that calls current_gfp_context() was 309 bytes.  With this patch
applied, the object size is reduced to 202 bytes.  This is a saving of 107
bytes and will probably be slightly faster too.

Use READ_ONCE() to access current->flags to prevent the compiler from
possibly accessing current->flags multiple times.

Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agocma: don't quit at first error when activating reserved areas
Mike Kravetz [Wed, 12 Aug 2020 01:32:03 +0000 (18:32 -0700)]
cma: don't quit at first error when activating reserved areas

The routine cma_init_reserved_areas is designed to activate all
reserved cma areas.  It quits when it first encounters an error.
This can leave some areas in a state where they are reserved but
not activated.  There is no feedback to code which performed the
reservation.  Attempting to allocate memory from areas in such a
state will result in a BUG.

Modify cma_init_reserved_areas to always attempt to activate all
areas.  The called routine, cma_activate_area is responsible for
leaving the area in a valid state.  No one is making active use
of returned error codes, so change the routine to void.

How to reproduce:  This example uses kernelcore, hugetlb and cma
as an easy way to reproduce.  However, this is a more general cma
issue.

Two node x86 VM 16GB total, 8GB per node
Kernel command line parameters, kernelcore=4G hugetlb_cma=8G
Related boot time messages,
  hugetlb_cma: reserve 8192 MiB, up to 4096 MiB per node
  cma: Reserved 4096 MiB at 0x0000000100000000
  hugetlb_cma: reserved 4096 MiB on node 0
  cma: Reserved 4096 MiB at 0x0000000300000000
  hugetlb_cma: reserved 4096 MiB on node 1
  cma: CMA area hugetlb could not be activated

 # echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  ...
  Call Trace:
    bitmap_find_next_zero_area_off+0x51/0x90
    cma_alloc+0x1a5/0x310
    alloc_fresh_huge_page+0x78/0x1a0
    alloc_pool_huge_page+0x6f/0xf0
    set_max_huge_pages+0x10c/0x250
    nr_hugepages_store_common+0x92/0x120
    ? __kmalloc+0x171/0x270
    kernfs_fop_write+0xc1/0x1a0
    vfs_write+0xc7/0x1f0
    ksys_write+0x5f/0xe0
    do_syscall_64+0x4d/0x90
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: c64be2bb1c6e ("drivers: add Contiguous Memory Allocator")
Signed-off-by: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Acked-by: Barry Song <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm: hugetlb: fix the name of hugetlb CMA
Barry Song [Wed, 12 Aug 2020 01:32:00 +0000 (18:32 -0700)]
mm: hugetlb: fix the name of hugetlb CMA

Once we enable CMA_DEBUGFS, we will get the below errors: directory
'cma-hugetlb' with parent 'cma' already present.

We should have different names for different CMA areas.

Signed-off-by: Barry Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm: cma: fix the name of CMA areas
Barry Song [Wed, 12 Aug 2020 01:31:57 +0000 (18:31 -0700)]
mm: cma: fix the name of CMA areas

Patch series "mm: fix the names of general cma and hugetlb cma", v2.

The current code of CMA can only work when users pass a const string as
name parameter.  we need to fix the way to handle names in CMA.  On the
other hand, to avoid name conflicts after enabling CMA_DEBUGFS, each
hugetlb should get a different CMA name.

This patch (of 2):

If users give a name saved in stack, the current code will generate magic
pointer.  if users don't give a name(NULL), kasprintf() will always return
NULL as we are at the early stage.  that means cma_init_reserved_mem()
will return -ENOMEM if users set name parameter as NULL.

[[email protected]: return cma->name directly in cma_get_name]
Link: https://github.com/ClangBuiltLinux/linux/issues/1063
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Barry Song <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/cma.c: fix NULL pointer dereference when cma could not be activated
Jianqun Xu [Wed, 12 Aug 2020 01:31:54 +0000 (18:31 -0700)]
mm/cma.c: fix NULL pointer dereference when cma could not be activated

In some case the cma area could not be activated, but the cma_alloc be
used under this case, then the kernel will crash caused by NULL pointer
dereference.

Add bitmap valid check in cma_alloc to avoid this issue.

Signed-off-by: Jianqun Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/vmstat: add events for THP migration without split
Anshuman Khandual [Wed, 12 Aug 2020 01:31:51 +0000 (18:31 -0700)]
mm/vmstat: add events for THP migration without split

Add following new vmstat events which will help in validating THP
migration without split.  Statistics reported through these new VM events
will help in performance debugging.

1. THP_MIGRATION_SUCCESS
2. THP_MIGRATION_FAILURE
3. THP_MIGRATION_SPLIT

In addition, these new events also update normal page migration statistics
appropriately via PGMIGRATE_SUCCESS and PGMIGRATE_FAILURE.  While here,
this updates current trace event 'mm_migrate_pages' to accommodate now
available THP statistics.

[[email protected]: s/hpage_nr_pages/thp_nr_pages/]
[[email protected]: v2]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: s/thp_nr_pages/hpage_nr_pages/]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Daniel Jordan <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm: thp: remove debug_cow switch
Yang Shi [Wed, 12 Aug 2020 01:31:48 +0000 (18:31 -0700)]
mm: thp: remove debug_cow switch

Since commit 3917c80280c93a7123f ("thp: change CoW semantics for
anon-THP"), the CoW page fault of THP has been rewritten, debug_cow is not
used anymore.  So, just remove it.

Signed-off-by: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/migrate: add migrate-shared test for migrate_vma_*()
Ralph Campbell [Wed, 12 Aug 2020 01:31:45 +0000 (18:31 -0700)]
mm/migrate: add migrate-shared test for migrate_vma_*()

Add a migrate_vma_*() self test for mmap(MAP_SHARED) to verify that
!vma_anonymous() ranges won't be migrated.

Signed-off-by: Ralph Campbell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: "Bharata B Rao" <[email protected]>
Cc: Shuah Khan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm/migrate: optimize migrate_vma_setup() for holes
Ralph Campbell [Wed, 12 Aug 2020 01:31:41 +0000 (18:31 -0700)]
mm/migrate: optimize migrate_vma_setup() for holes

Patch series "mm/migrate: optimize migrate_vma_setup() for holes".

A simple optimization for migrate_vma_*() when the source vma is not an
anonymous vma and a new test case to exercise it.

This patch (of 2):

When migrating system memory to device private memory, if the source
address range is a valid VMA range and there is no memory or a zero page,
the source PFN array is marked as valid but with no PFN.

This lets the device driver allocate private memory and clear it, then
insert the new device private struct page into the CPU's page tables when
migrate_vma_pages() is called.  migrate_vma_pages() only inserts the new
page if the VMA is an anonymous range.

There is no point in telling the device driver to allocate device private
memory and then not migrate the page.  Instead, mark the source PFN array
entries as not migrating to avoid this overhead.

[[email protected]: v2]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ralph Campbell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: "Bharata B Rao" <[email protected]>
Cc: Shuah Khan <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agohugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem
Mike Kravetz [Wed, 12 Aug 2020 01:31:38 +0000 (18:31 -0700)]
hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem

Commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") requires callers of huge_pte_alloc to hold i_mmap_rwsem
in at least read mode.  This is because the explicit locking in
huge_pmd_share (called by huge_pte_alloc) was removed.  When restructuring
the code, the call to huge_pte_alloc in the else block at the beginning of
hugetlb_fault was missed.

Unfortunately, that else clause is exercised when there is no page table
entry.  This will likely lead to a call to huge_pmd_share.  If
huge_pmd_share thinks pmd sharing is possible, it will traverse the
mapping tree (i_mmap) without holding i_mmap_rwsem.  If someone else is
modifying the tree, bad things such as addressing exceptions or worse
could happen.

Simply remove the else clause.  It should have been removed previously.
The code following the else will call huge_pte_alloc with the appropriate
locking.

To prevent this type of issue in the future, add routines to assert that
i_mmap_rwsem is held, and call these routines in huge pmd sharing
routines.

Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: "Kirill A.Shutemov" <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Prakash Sangappa <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agohugetlbfs: prevent filesystem stacking of hugetlbfs
Mike Kravetz [Wed, 12 Aug 2020 01:31:35 +0000 (18:31 -0700)]
hugetlbfs: prevent filesystem stacking of hugetlbfs

syzbot found issues with having hugetlbfs on a union/overlay as reported
in [1].  Due to the limitations (no write) and special functionality of
hugetlbfs, it does not work well in filesystem stacking.  There are no
know use cases for hugetlbfs stacking.  Rather than making modifications
to get hugetlbfs working in such environments, simply prevent stacking.

[1] https://lore.kernel.org/linux-mm/000000000000b4684e05a2968ca6@google.com/

Reported-by: [email protected]
Suggested-by: Amir Goldstein <[email protected]>
Signed-off-by: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Miklos Szeredi <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Colin Walters <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm, oom: show process exiting information in __oom_kill_process()
Yafang Shao [Wed, 12 Aug 2020 01:31:32 +0000 (18:31 -0700)]
mm, oom: show process exiting information in __oom_kill_process()

When the OOM killer finds a victim and tryies to kill it, if the victim is
already exiting, the task mm will be NULL and no process will be killed.
But the dump_header() has been already executed, so it will be strange to
dump so much information without killing a process.  We'd better show some
helpful information to indicate why this happens.

Suggested-by: David Rientjes <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Qian Cai <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agodoc, mm: clarify /proc/<pid>/oom_score value range
Michal Hocko [Wed, 12 Aug 2020 01:31:28 +0000 (18:31 -0700)]
doc, mm: clarify /proc/<pid>/oom_score value range

The exported value includes oom_score_adj so the range is no [0, 1000] as
described in the previous section but rather [0, 2000].  Mention that fact
explicitly.

Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Yafang Shao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agodoc, mm: sync up oom_score_adj documentation
Michal Hocko [Wed, 12 Aug 2020 01:31:25 +0000 (18:31 -0700)]
doc, mm: sync up oom_score_adj documentation

There are at least two notes in the oom section.  The 3% discount for root
processes is gone since d46078b28889 ("mm, oom: remove 3% bonus for
CAP_SYS_ADMIN processes").

Likewise children of the selected oom victim are not sacrificed since
bbbe48029720 ("mm, oom: remove 'prefer children over parent' heuristic")

Drop both of them.

Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Yafang Shao <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agomm, oom: make the calculation of oom badness more accurate
Yafang Shao [Wed, 12 Aug 2020 01:31:22 +0000 (18:31 -0700)]
mm, oom: make the calculation of oom badness more accurate

Recently we found an issue on our production environment that when memcg
oom is triggered the oom killer doesn't chose the process with largest
resident memory but chose the first scanned process.  Note that all
processes in this memcg have the same oom_score_adj, so the oom killer
should chose the process with largest resident memory.

Bellow is part of the oom info, which is enough to analyze this issue.
[7516987.983223] memory: usage 16777216kB, limit 16777216kB, failcnt 52843037
[7516987.983224] memory+swap: usage 16777216kB, limit 9007199254740988kB, failcnt 0
[7516987.983225] kmem: usage 301464kB, limit 9007199254740988kB, failcnt 0
[...]
[7516987.983293] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[7516987.983510] [ 5740]     0  5740      257        1    32768        0          -998 pause
[7516987.983574] [58804]     0 58804     4594      771    81920        0          -998 entry_point.bas
[7516987.983577] [58908]     0 58908     7089      689    98304        0          -998 cron
[7516987.983580] [58910]     0 58910    16235     5576   163840        0          -998 supervisord
[7516987.983590] [59620]     0 59620    18074     1395   188416        0          -998 sshd
[7516987.983594] [59622]     0 59622    18680     6679   188416        0          -998 python
[7516987.983598] [59624]     0 59624  1859266     5161   548864        0          -998 odin-agent
[7516987.983600] [59625]     0 59625   707223     9248   983040        0          -998 filebeat
[7516987.983604] [59627]     0 59627   416433    64239   774144        0          -998 odin-log-agent
[7516987.983607] [59631]     0 59631   180671    15012   385024        0          -998 python3
[7516987.983612] [61396]     0 61396   791287     3189   352256        0          -998 client
[7516987.983615] [61641]     0 61641  1844642    29089   946176        0          -998 client
[7516987.983765] [ 9236]     0  9236     2642      467    53248        0          -998 php_scanner
[7516987.983911] [42898]     0 42898    15543      838   167936        0          -998 su
[7516987.983915] [42900]  1000 42900     3673      867    77824        0          -998 exec_script_vr2
[7516987.983918] [42925]  1000 42925    36475    19033   335872        0          -998 python
[7516987.983921] [57146]  1000 57146     3673      848    73728        0          -998 exec_script_J2p
[7516987.983925] [57195]  1000 57195   186359    22958   491520        0          -998 python2
[7516987.983928] [58376]  1000 58376   275764    14402   290816        0          -998 rosmaster
[7516987.983931] [58395]  1000 58395   155166     4449   245760        0          -998 rosout
[7516987.983935] [58406]  1000 58406 18285584  3967322 37101568        0          -998 data_sim
[7516987.984221] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=3aa16c9482ae3a6f6b78bda68a55d32c87c99b985e0f11331cddf05af6c4d753,mems_allowed=0-1,oom_memcg=/kubepods/podf1c273d3-9b36-11ea-b3df-246e9693c184,task_memcg=/kubepods/podf1c273d3-9b36-11ea-b3df-246e9693c184/1f246a3eeea8f70bf91141eeaf1805346a666e225f823906485ea0b6c37dfc3d,task=pause,pid=5740,uid=0
[7516987.984254] Memory cgroup out of memory: Killed process 5740 (pause) total-vm:1028kB, anon-rss:4kB, file-rss:0kB, shmem-rss:0kB
[7516988.092344] oom_reaper: reaped process 5740 (pause), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

We can find that the first scanned process 5740 (pause) was killed, but
its rss is only one page.  That is because, when we calculate the oom
badness in oom_badness(), we always ignore the negtive point and convert
all of these negtive points to 1.  Now as oom_score_adj of all the
processes in this targeted memcg have the same value -998, the points of
these processes are all negtive value.  As a result, the first scanned
process will be killed.

The oom_socre_adj (-998) in this memcg is set by kubelet, because it is a
a Guaranteed pod, which has higher priority to prevent from being killed
by system oom.

To fix this issue, we should make the calculation of oom point more
accurate.  We can achieve it by convert the chosen_point from 'unsigned
long' to 'long'.

[[email protected]: reported a issue in the previous version]
[[email protected]: fixed the issue reported by Cai]
[[email protected]: add the comment in proc_oom_score()]
[[email protected]: v3]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Naresh Kamboju <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Qian Cai <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
4 years agoinclude/linux/mempolicy.h: fix typo
Yanfei Xu [Wed, 12 Aug 2020 01:31:19 +0000 (18:31 -0700)]
include/linux/mempolicy.h: fix typo

Change "interlave" to "interleave".

Signed-off-by: Yanfei Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
This page took 0.133989 seconds and 4 git commands to generate.