]> Git Repo - linux.git/log
linux.git
7 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 9 Sep 2017 17:30:07 +0000 (10:30 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

 - most of the rest of MM

 - a small number of misc things

 - lib/ updates

 - checkpatch

 - autofs updates

 - ipc/ updates

* emailed patches from Andrew Morton <[email protected]>: (126 commits)
  ipc: optimize semget/shmget/msgget for lots of keys
  ipc/sem: play nicer with large nsops allocations
  ipc/sem: drop sem_checkid helper
  ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t
  ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t
  ipc: convert ipc_namespace.count from atomic_t to refcount_t
  kcov: support compat processes
  sh: defconfig: cleanup from old Kconfig options
  mn10300: defconfig: cleanup from old Kconfig options
  m32r: defconfig: cleanup from old Kconfig options
  drivers/pps: use surrounding "if PPS" to remove numerous dependency checks
  drivers/pps: aesthetic tweaks to PPS-related content
  cpumask: make cpumask_next() out-of-line
  kmod: move #ifdef CONFIG_MODULES wrapper to Makefile
  kmod: split off umh headers into its own file
  MAINTAINERS: clarify kmod is just a kernel module loader
  kmod: split out umh code into its own file
  test_kmod: flip INT checks to be consistent
  test_kmod: remove paranoid UINT_MAX check on uint range processing
  vfat: deduplicate hex2bin()
  ...

7 years agoremove gperf left-overs from build system
Linus Torvalds [Sat, 9 Sep 2017 17:00:15 +0000 (10:00 -0700)]
remove gperf left-overs from build system

I removed all the gperf use, but not the Makefile rules.  Sam Ravnborg
says I get bonus points for cleaning this up.  I'll hold him to it.

Requested-by: Sam Ravnborg <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoipc: optimize semget/shmget/msgget for lots of keys
Guillaume Knispel [Fri, 8 Sep 2017 23:17:55 +0000 (16:17 -0700)]
ipc: optimize semget/shmget/msgget for lots of keys

ipc_findkey() used to scan all objects to look for the wanted key.  This
is slow when using a high number of keys.  This change adds an rhashtable
of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n).

This change gives a 865% improvement of benchmark reaim.jobs_per_min on a
56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1]

Other (more micro) benchmark results, by the author: On an i5 laptop, the
following loop executed right after a reboot took, without and with this
change:

    for (int i = 0, k=0x424242; i < KEYS; ++i)
        semget(k++, 1, IPC_CREAT | 0600);

                 total       total          max single  max single
   KEYS        without        with        call without   call with

      1            3.5         4.9   Âµs            3.5         4.9
     10            7.6         8.6   Âµs            3.7         4.7
     32           16.2        15.9   Âµs            4.3         5.3
    100           72.9        41.8   Âµs            3.7         4.7
   1000        5,630.0       502.0   Âµs             *           *
  10000    1,340,000.0     7,240.0   Âµs             *           *
  31900   17,600,000.0    22,200.0   Âµs             *           *

 *: unreliable measure: high variance

The duration for a lookup-only usage was obtained by the same loop once
the keys are present:

                 total       total          max single  max single
   KEYS        without        with        call without   call with

      1            2.1         2.5   Âµs            2.1         2.5
     10            4.5         4.8   Âµs            2.2         2.3
     32           13.0        10.8   Âµs            2.3         2.8
    100           82.9        25.1   Âµs             *          2.3
   1000        5,780.0       217.0   Âµs             *           *
  10000    1,470,000.0     2,520.0   Âµs             *           *
  31900   17,400,000.0     7,810.0   Âµs             *           *

Finally, executing each semget() in a new process gave, when still
summing only the durations of these syscalls:

creation:
                 total       total
   KEYS        without        with

      1            3.7         5.0   Âµs
     10           32.9        36.7   Âµs
     32          125.0       109.0   Âµs
    100          523.0       353.0   Âµs
   1000       20,300.0     3,280.0   Âµs
  10000    2,470,000.0    46,700.0   Âµs
  31900   27,800,000.0   219,000.0   Âµs

lookup-only:
                 total       total
   KEYS        without        with

      1            2.5         2.7   Âµs
     10           25.4        24.4   Âµs
     32          106.0        72.6   Âµs
    100          591.0       352.0   Âµs
   1000       22,400.0     2,250.0   Âµs
  10000    2,510,000.0    25,700.0   Âµs
  31900   28,200,000.0   115,000.0   Âµs

[1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop

Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix
Signed-off-by: Guillaume Knispel <[email protected]>
Reviewed-by: Marc Pardo <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Manfred Spraul <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: Andrey Vagin <[email protected]>
Cc: Guillaume Knispel <[email protected]>
Cc: Marc Pardo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoipc/sem: play nicer with large nsops allocations
Davidlohr Bueso [Fri, 8 Sep 2017 23:17:52 +0000 (16:17 -0700)]
ipc/sem: play nicer with large nsops allocations

Replacing semop()'s kmalloc for kvmalloc was originally proposed by
Manfred on the premise that it can be called for large (than order-1)
sizes.  For example, while Oracle recommends setting SEMOPM to a _minimum_
of 100, some distros[1] encourage the setting to be a factor of the amount
of db tasks (PROCESSES), which can get fishy for large systems (easily
going beyond 1000).

[1] An Example of Semaphore Settings
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Setting_Semaphores-An_Example_of_Semaphore_Settings.html

So let's just convert this to kvmalloc, just like the rest of the
allocations we do in ipc.  While the fallback vmalloc obviously involves
more overhead, this by far the uncommon path, and it's better for the user
than just erroring out with kmalloc.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Manfred Spraul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoipc/sem: drop sem_checkid helper
Davidlohr Bueso [Fri, 8 Sep 2017 23:17:49 +0000 (16:17 -0700)]
ipc/sem: drop sem_checkid helper

... 'tis not used.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Manfred Spraul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t
Elena Reshetova [Fri, 8 Sep 2017 23:17:45 +0000 (16:17 -0700)]
ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t

refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter.  This allows to avoid
accidental refcounter overflows that might lead to use-after-free
situations.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Hans Liljestrand <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David Windsor <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Manfred Spraul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoipc: convert sem_undo_list.refcnt from atomic_t to refcount_t
Elena Reshetova [Fri, 8 Sep 2017 23:17:42 +0000 (16:17 -0700)]
ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t

refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter.  This allows to avoid
accidental refcounter overflows that might lead to use-after-free
situations.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Hans Liljestrand <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David Windsor <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Manfred Spraul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoipc: convert ipc_namespace.count from atomic_t to refcount_t
Elena Reshetova [Fri, 8 Sep 2017 23:17:38 +0000 (16:17 -0700)]
ipc: convert ipc_namespace.count from atomic_t to refcount_t

refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter.  This allows to avoid
accidental refcounter overflows that might lead to use-after-free
situations.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Elena Reshetova <[email protected]>
Signed-off-by: Hans Liljestrand <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David Windsor <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Manfred Spraul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agokcov: support compat processes
Dmitry Vyukov [Fri, 8 Sep 2017 23:17:35 +0000 (16:17 -0700)]
kcov: support compat processes

Support compat processes in KCOV by providing compat_ioctl callback.
Compat mode uses the same ioctl callback: we have 2 commands that do not
use the argument and 1 that already checks that the arg does not overflow
INT_MAX.  This allows to use KCOV-guided fuzzing in compat processes.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dmitry Vyukov <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agosh: defconfig: cleanup from old Kconfig options
Krzysztof Kozlowski [Fri, 8 Sep 2017 23:17:31 +0000 (16:17 -0700)]
sh: defconfig: cleanup from old Kconfig options

Remove old, dead Kconfig options (in order appearing in this commit):
 - EXPERIMENTAL is gone since v3.9;
 - INET_LRO: commit 7bbf3cae65b6 ("ipv4: Remove inet_lro library");
 - MTD_CONCAT: commit f53fdebcc3e1 ("mtd: drop MTD_CONCAT from Kconfig
   entirely");
 - MTD_PARTITIONS: commit 6a8a98b22b10 ("mtd: kill
   CONFIG_MTD_PARTITIONS");
 - MTD_CHAR: commit 660685d9d1b4 ("mtd: merge mtdchar module with
   mtdcore");
 - NETDEV_1000 and NETDEV_10000: commit f860b0522f65 ("drivers/net:
   Kconfig and Makefile cleanup"); NET_ETHERNET should be replaced with
   just ETHERNET but that is separate change;
 - HID_SUPPORT: commit 1f41a6a99476 ("HID: Fix the generic Kconfig
   options");
 - RCU_CPU_STALL_DETECTOR: commit a00e0d714fbd ("rcu: Remove conditional
   compilation for RCU CPU stall warnings");
 - SYSCTL_SYSCALL_CHECK: commit 7c60c48f58a7 ("sysctl: Improve the
   sysctl sanity checks");
 - VIDEO_OUTPUT_CONTROL: commit f167a64e9d67 ("video / output: Drop
   display output class support");
 - MISC_DEVICES: commit 7c5763b8453a ("drivers: misc: Remove
   MISC_DEVICES config option");
 - AUTOFS_FS: commit 561c5cf9236a ("staging: Remove autofs3");
 - IP_NF_QUEUE: commit 3dd6664fac7e ("netfilter: remove unused "config
   IP_NF_QUEUE"");
 - USB_DEVICE_CLASS: commit 007bab91324e ("USB: remove
   CONFIG_USB_DEVICE_CLASS");
 - USB_LIBUSUAL: commit f61870ee6f8c ("usb: remove libusual");
 - DISPLAY_SUPPORT: commit 5a6b5e02d673 ("fbdev: remove display
   subsystem");
 - IP_NF_TARGET_ULOG: commit d4da843e6fad ("netfilter: kill remnants of
   ulog targets");
 - IP6_NF_QUEUE: commit d16cf20e2f2f ("netfilter: remove ip_queue
   support");
 - IP6_NF_TARGET_LOG: commit 6939c33a757b ("netfilter: merge ipt_LOG and
   ip6_LOG into xt_LOG");

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomn10300: defconfig: cleanup from old Kconfig options
Krzysztof Kozlowski [Fri, 8 Sep 2017 23:17:28 +0000 (16:17 -0700)]
mn10300: defconfig: cleanup from old Kconfig options

Remove old, dead Kconfig options (in order appearing in this commit):
 - EXPERIMENTAL is gone since v3.9;
 - INET_LRO: commit 7bbf3cae65b6 ("ipv4: Remove inet_lro library");
 - MTD_PARTITIONS: commit 6a8a98b22b10 ("mtd: kill
   CONFIG_MTD_PARTITIONS");
 - MTD_CHAR: commit 660685d9d1b4 ("mtd: merge mtdchar module with
   mtdcore");
 - NETDEV_1000 and NETDEV_10000: commit f860b0522f65 ("drivers/net:
   Kconfig and Makefile cleanup"); NET_ETHERNET should be replaced with
   just ETHERNET but that is separate change;
 - HID_SUPPORT: commit 1f41a6a99476 ("HID: Fix the generic Kconfig
   options");
 - RCU_CPU_STALL_DETECTOR: commit a00e0d714fbd ("rcu: Remove conditional
   compilation for RCU CPU stall warnings");

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agom32r: defconfig: cleanup from old Kconfig options
Krzysztof Kozlowski [Fri, 8 Sep 2017 23:17:25 +0000 (16:17 -0700)]
m32r: defconfig: cleanup from old Kconfig options

Remove old, dead Kconfig options (in order appearing in this commit):
 - EXPERIMENTAL is gone since v3.9;
 - IP_NF_QUEUE: commit 3dd6664fac7e ("netfilter: remove unused "config
   IP_NF_QUEUE"");
 - IP_NF_TARGET_ULOG: commit d4da843e6fad ("netfilter: kill remnants of
   ulog targets");
 - VIDEO_OUTPUT_CONTROL: commit f167a64e9d67 ("video / output: Drop
   display output class support");
 - MTD_PARTITIONS: commit 6a8a98b22b10 ("mtd: kill
   CONFIG_MTD_PARTITIONS");
 - MTD_CHAR: commit 660685d9d1b4 ("mtd: merge mtdchar module with
   mtdcore");
 - MTD_CONCAT: commit f53fdebcc3e1 ("mtd: drop MTD_CONCAT from Kconfig
   entirely");

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Acked-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agodrivers/pps: use surrounding "if PPS" to remove numerous dependency checks
Robert P. J. Day [Fri, 8 Sep 2017 23:17:22 +0000 (16:17 -0700)]
drivers/pps: use surrounding "if PPS" to remove numerous dependency checks

Adding high-level "if PPS" makes lower-level dependency tests superfluous.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Robert P. J. Day <[email protected]>
Acked-by: Rodolfo Giometti <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agodrivers/pps: aesthetic tweaks to PPS-related content
Robert P. J. Day [Fri, 8 Sep 2017 23:17:19 +0000 (16:17 -0700)]
drivers/pps: aesthetic tweaks to PPS-related content

Collection of aesthetic adjustments to various PPS-related files,
directories and Documentation, some quite minor just for the sake of
consistency, including:

 * Updated example of pps device tree node (courtesy Rodolfo G.)
 * "PPS-API" -> "PPS API"
 * "pps_source_info_s" -> "pps_source_info"
 * "ktimer driver" -> "pps-ktimer driver"
 * "ppstest /dev/pps0" -> "ppstest /dev/pps1" to match example
 * Add missing PPS-related entries to MAINTAINERS file
 * Other trivialities

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Robert P. J. Day <[email protected]>
Acked-by: Rodolfo Giometti <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agocpumask: make cpumask_next() out-of-line
Alexey Dobriyan [Fri, 8 Sep 2017 23:17:15 +0000 (16:17 -0700)]
cpumask: make cpumask_next() out-of-line

Every for_each_XXX_cpu() invocation calls cpumask_next() which is an
inline function:

static inline unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
        /* -1 is a legal arg here. */
        if (n != -1)
                cpumask_check(n);
        return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
}

However!

find_next_bit() is regular out-of-line function which means "nr_cpu_ids"
load and increment happen at the caller resulting in a lot of bloat

x86_64 defconfig:
add/remove: 3/0 grow/shrink: 8/373 up/down: 155/-5668 (-5513)
x86_64 allyesconfig-ish:
add/remove: 3/1 grow/shrink: 57/634 up/down: 3515/-28177 (-24662) !!!

Some archs redefine find_next_bit() but it is OK:

m68k inline but SMP is not supported
arm out-of-line
unicore32 out-of-line

Function call will happen anyway, so move load and increment into callee.

Link: http://lkml.kernel.org/r/20170824230010.GA1593@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agokmod: move #ifdef CONFIG_MODULES wrapper to Makefile
Luis R. Rodriguez [Fri, 8 Sep 2017 23:17:12 +0000 (16:17 -0700)]
kmod: move #ifdef CONFIG_MODULES wrapper to Makefile

The entire file is now conditionally compiled only when CONFIG_MODULES is
enabled, and this this is a bool.  Just move this conditional to the
Makefile as its easier to read this way.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Matt Redfearn <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Daniel Mentz <[email protected]>
Cc: David Binderman <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agokmod: split off umh headers into its own file
Luis R. Rodriguez [Fri, 8 Sep 2017 23:17:08 +0000 (16:17 -0700)]
kmod: split off umh headers into its own file

In the future usermode helper users do not need to carry in all the of
kmod headers declarations.

Since kmod.h still includes umh.h this change has no functional changes,
each umh user can be cleaned up separately later and with time.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Matt Redfearn <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Daniel Mentz <[email protected]>
Cc: David Binderman <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoMAINTAINERS: clarify kmod is just a kernel module loader
Luis R. Rodriguez [Fri, 8 Sep 2017 23:17:04 +0000 (16:17 -0700)]
MAINTAINERS: clarify kmod is just a kernel module loader

This should make it clearer what the kmod code is now that
the umh code is split out separately.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Matt Redfearn <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Daniel Mentz <[email protected]>
Cc: David Binderman <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agokmod: split out umh code into its own file
Luis R. Rodriguez [Fri, 8 Sep 2017 23:17:00 +0000 (16:17 -0700)]
kmod: split out umh code into its own file

Patch series "kmod: few code cleanups to split out umh code"

The usermode helper has a provenance from the old usb code which first
required a usermode helper.  Eventually this was shoved into kmod.c and
the kernel's modprobe calls was converted over eventually to share the
same code.  Over time the list of usermode helpers in the kernel has grown
-- so kmod is just but one user of the API.

This series is a simple logical cleanup which acknowledges the code
evolution of the usermode helper and shoves the UMH API into its own
dedicated file.  This way users of the API can later just include umh.h
instead of kmod.h.

Note despite the diff state the first patch really is just a code shove,
no functional changes are done there.  I did use git format-patch -M to
generate the patch, but in the end the split was not enough for git to
consider it a rename hence the large diffstat.

I've put this through 0-day and it gives me their machine compilation
blessings with all tests as OK.

This patch (of 4):

There's a slew of usermode helper users and kmod is just one of them.
Split out the usermode helper code into its own file to keep the logic and
focus split up.

This change provides no functional changes.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Matt Redfearn <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Daniel Mentz <[email protected]>
Cc: David Binderman <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agotest_kmod: flip INT checks to be consistent
Dan Carpenter [Fri, 8 Sep 2017 23:16:57 +0000 (16:16 -0700)]
test_kmod: flip INT checks to be consistent

Most checks will check for min and then max, except the int check.  Flip
the checks to be consistent with the other code.

[[email protected]: massaged commit log]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: David Binderman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agotest_kmod: remove paranoid UINT_MAX check on uint range processing
Dan Carpenter [Fri, 8 Sep 2017 23:16:53 +0000 (16:16 -0700)]
test_kmod: remove paranoid UINT_MAX check on uint range processing

The UINT_MAX comparison is not needed because "max" is already an unsigned
int, and we expect developer C code max value input to have a sensible 0 -
UINT_MAX range.  Note that if it so happens to be UINT_MAX + 1 it would
lead to an issue, but we expect the developer to know this.

[[email protected]: massaged commit log]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: David Binderman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agovfat: deduplicate hex2bin()
OGAWA Hirofumi [Fri, 8 Sep 2017 23:16:50 +0000 (16:16 -0700)]
vfat: deduplicate hex2bin()

We may use hex2bin() instead of custom approach.

Link: http://lkml.kernel.org/r/87zibktpil.fsf@devron
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: OGAWA Hirofumi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoautofs: use unsigned int/long instead of uint/ulong for ioctl args
Tomohiro Kusumi [Fri, 8 Sep 2017 23:16:47 +0000 (16:16 -0700)]
autofs: use unsigned int/long instead of uint/ulong for ioctl args

The standard types unsigned int and unsigned long should be used for
.compat_ioctl.  autofs is the only fs using uing/ulong for this, and these
are even the only uint/ulong in the entire autofs code.

Drop unneeded long cast in return value of autofs_dev_ioctl_compat().
It's already long.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tomohiro Kusumi <[email protected]>
Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoautofs: drop wrong comment
Tomohiro Kusumi [Fri, 8 Sep 2017 23:16:43 +0000 (16:16 -0700)]
autofs: drop wrong comment

This comment was correct when it was added in 8d7b48e0 ("autofs4: add
miscellaneous device for ioctls") in 2008, but not after 4e44b685 "Get rid
of path_lookup in autofs4" in 2009 which introduced find_autofs_mount().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tomohiro Kusumi <[email protected]>
Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoautofs: use AUTOFS_DEV_IOCTL_SIZE
Tomohiro Kusumi [Fri, 8 Sep 2017 23:16:40 +0000 (16:16 -0700)]
autofs: use AUTOFS_DEV_IOCTL_SIZE

Use a macro which defines misc-dev ioctl parameter size (excluding a path
beyond &path[0]) since it's been used to initialize and copy this
structure ever since it first appeared in 8d7b48e0 in 2008.

(or simply get rid of this if this is just unnecessary abstraction when
all it needs is sizeof(struct autofs_dev_ioctl))

Edit: [email protected]
That's a good point but I'd prefer to keep the macro define.
End edit: [email protected]

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tomohiro Kusumi <[email protected]>
Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoautofs: non functional header inclusion cleanup
Tomohiro Kusumi [Fri, 8 Sep 2017 23:16:37 +0000 (16:16 -0700)]
autofs: non functional header inclusion cleanup

Having header includes before any macro (without any dependency) simply
looks normal.  No reason to have these macros in between.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tomohiro Kusumi <[email protected]>
Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoautofs: remove unused AUTOFS_IOC_EXPIRE_DIRECT/INDIRECT
Tomohiro Kusumi [Fri, 8 Sep 2017 23:16:34 +0000 (16:16 -0700)]
autofs: remove unused AUTOFS_IOC_EXPIRE_DIRECT/INDIRECT

These are not used by either kernel or userspace, although
AUTOFS_IOC_EXPIRE_DIRECT once seems to have been used by userspace in
around 2006-2008, which was technically just an alias of the existing
ioctl AUTOFS_IOC_EXPIRE_MULTI.

ioctls for autofs are already complicated enough that they could be
removed unless these are staying here to be able to compile userspace code
of certain period of time from a decade ago.

Edit: [email protected]
Yes, this is indeed very old and anything that still uses must
be updated becuase it will be using broken functionality.
End edit: [email protected]

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tomohiro Kusumi <[email protected]>
Signed-off-by: Ian Kent <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoautofs: make dev ioctl version and ismountpoint user accessible
Ian Kent [Fri, 8 Sep 2017 23:16:30 +0000 (16:16 -0700)]
autofs: make dev ioctl version and ismountpoint user accessible

Some of the autofs miscellaneous device ioctls need to be accessable to
user space applications without CAP_SYS_ADMIN to get information about
autofs mounts.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Kent <[email protected]>
Cc: Colin Walters <[email protected]>
Cc: Ondrej Holy <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoautofs: make disc device user accessible
Ian Kent [Fri, 8 Sep 2017 23:16:27 +0000 (16:16 -0700)]
autofs: make disc device user accessible

The autofs miscellanous device ioctls that shouldn't require
CAP_SYS_ADMIN need to be accessible to user space applications in order
to be able to get information about autofs mounts.

The module checks capabilities so the miscelaneous device should be fine
with broad permissions.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ian Kent <[email protected]>
Cc: Colin Walters <[email protected]>
Cc: Ondrej Holy <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoautofs: fix AT_NO_AUTOMOUNT not being honored
Ian Kent [Fri, 8 Sep 2017 23:16:24 +0000 (16:16 -0700)]
autofs: fix AT_NO_AUTOMOUNT not being honored

The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT which
is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering of an
automount by the call.  But this flag is unconditionally cleared for all
stat family system calls except statx().

stat family system calls have always triggered mount requests for the
negative dentry case in follow_automount() which is intended but prevents
the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.

In order to handle the AT_NO_AUTOMOUNT for both system calls the negative
dentry case in follow_automount() needs to be changed to return ENOENT
when the LOOKUP_AUTOMOUNT flag is clear (and the other required flags are
clear).

AFAICT this change doesn't have any noticable side effects and may, in
some use cases (although I didn't see it in testing) prevent unnecessary
callbacks to the automount daemon.

It's also possible that a stat family call has been made with a path that
is in the process of being mounted by some other process.  But stat family
calls should return the automount state of the path as it is "now" so it
shouldn't wait for mount completion.

This is the same semantic as the positive dentry case already handled.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: deccf497d804a4c5fca ("Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()")
Signed-off-by: Ian Kent <[email protected]>
Cc: David Howells <[email protected]>
Cc: Colin Walters <[email protected]>
Cc: Ondrej Holy <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoinit/main.c: extract early boot entropy from the passed cmdline
Daniel Micay [Fri, 8 Sep 2017 23:16:20 +0000 (16:16 -0700)]
init/main.c: extract early boot entropy from the passed cmdline

Feed the boot command-line as to the /dev/random entropy pool

Existing Android bootloaders usually pass data which may not be known by
an external attacker on the kernel command-line.  It may also be the
case on other embedded systems.  Sample command-line from a Google Pixel
running CopperheadOS....

    console=ttyHSL0,115200,n8 androidboot.console=ttyHSL0
    androidboot.hardware=sailfish user_debug=31 ehci-hcd.park=3
    lpm_levels.sleep_disabled=1 cma=32M@0-0xffffffff buildvariant=user
    veritykeyid=id:dfcb9db0089e5b3b4090a592415c28e1cb4545ab
    androidboot.bootdevice=624000.ufshc androidboot.verifiedbootstate=yellow
    androidboot.veritymode=enforcing androidboot.keymaster=1
    androidboot.serialno=FA6CE0305299 androidboot.baseband=msm
    mdss_mdp.panel=1:dsi:0:qcom,mdss_dsi_samsung_ea8064tg_1080p_cmd:1:none:cfg:single_dsi
    androidboot.slot_suffix=_b fpsimd.fpsimd_settings=0
    app_setting.use_app_setting=0 kernelflag=0x00000000 debugflag=0x00000000
    androidboot.hardware.revision=PVT radioflag=0x00000000
    radioflagex1=0x00000000 radioflagex2=0x00000000 cpumask=0x00000000
    androidboot.hardware.ddr=4096MB,Hynix,LPDDR4 androidboot.ddrinfo=00000006
    androidboot.ddrsize=4GB androidboot.hardware.color=GRA00
    androidboot.hardware.ufs=32GB,Samsung androidboot.msm.hw_ver_id=268824801
    androidboot.qf.st=2 androidboot.cid=11111111 androidboot.mid=G-2PW4100
    androidboot.bootloader=8996-012001-1704121145
    androidboot.oem_unlock_support=1 androidboot.fp_src=1
    androidboot.htc.hrdump=detected androidboot.ramdump.opt=mem@2g:2g,mem@4g:2g
    androidboot.bootreason=reboot androidboot.ramdump_enable=0 ro
    root=/dev/dm-0 dm="system none ro,0 1 android-verity /dev/sda34"
    rootwait skip_initramfs init=/init androidboot.wificountrycode=US
    androidboot.boottime=1BLL:85,1BLE:669,2BLL:0,2BLE:1777,SW:6,KL:8136

Among other things, it contains a value unique to the device
(androidboot.serialno=FA6CE0305299), unique to the OS builds for the
device variant (veritykeyid=id:dfcb9db0089e5b3b4090a592415c28e1cb4545ab)
and timings from the bootloader stages in milliseconds
(androidboot.boottime=1BLL:85,1BLE:669,2BLL:0,2BLE:1777,SW:6,KL:8136).

[[email protected]: changelog tweak]
[[email protected]: line-wrapped command line]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Daniel Micay <[email protected]>
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Nick Kralevich <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoinit: move stack canary initialization after setup_arch
Laura Abbott [Fri, 8 Sep 2017 23:16:17 +0000 (16:16 -0700)]
init: move stack canary initialization after setup_arch

Patch series "Command line randomness", v3.

A series to add the kernel command line as a source of randomness.

This patch (of 2):

Stack canary intialization involves getting a random number.  Getting this
random number may involve accessing caches or other architectural specific
features which are not available until after the architecture is setup.
Move the stack canary initialization later to accommodate this.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Daniel Micay <[email protected]>
Cc: Nick Kralevich <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agobinfmt_flat: delete two error messages for a failed memory allocation in decompress_e...
Markus Elfring [Fri, 8 Sep 2017 23:16:14 +0000 (16:16 -0700)]
binfmt_flat: delete two error messages for a failed memory allocation in decompress_exec()

Omit extra messages for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Markus Elfring <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agocheckpatch: add 6 missing types to --list-types
Jean Delvare [Fri, 8 Sep 2017 23:16:11 +0000 (16:16 -0700)]
checkpatch: add 6 missing types to --list-types

Unlike all other types, LONG_LINE, LONG_LINE_COMMENT and LONG_LINE_STRING
are passed to WARN() through a variable.  This causes the parser in
list_types() to miss them and consequently they are not present in the
output of --list-types.

Additionally, types TYPO_SPELLING, FSF_MAILING_ADDRESS and AVOID_BUG are
passed with a variable level, causing the parser to miss them too.

So modify the regex to also catch these special cases.

Link: http://lkml.kernel.org/r/20170902175610.7e4a7c9d@endymion
Fixes: 3beb42eced39 ("checkpatch: add --list-types to show message types to show or ignore")
Signed-off-by: Jean Delvare <[email protected]>
Acked-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agocheckpatch: rename variables to avoid confusion
Jean Delvare [Fri, 8 Sep 2017 23:16:07 +0000 (16:16 -0700)]
checkpatch: rename variables to avoid confusion

The variable name "$msg_type" is sometimes used to set the message type,
and sometimes used to set the message level.  This works but is kind of
confusing.  Use "$msg_level" in the latter case instead, to make the code
clearer.

Link: http://lkml.kernel.org/r/20170902175345.175db33a@endymion
Signed-off-by: Jean Delvare <[email protected]>
Acked-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agocheckpatch: fix typo in comment
Jean Delvare [Fri, 8 Sep 2017 23:16:04 +0000 (16:16 -0700)]
checkpatch: fix typo in comment

Link: http://lkml.kernel.org/r/20170902175249.15bb77f2@endymion
Signed-off-by: Jean Delvare <[email protected]>
Acked-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agocheckpatch: add --strict check for ifs with unnecessary parentheses
Joe Perches [Fri, 8 Sep 2017 23:16:01 +0000 (16:16 -0700)]
checkpatch: add --strict check for ifs with unnecessary parentheses

An if statement test like
if ((foo == bar) && (baz != qux))
can arguably be better written without the parentheses as
if (foo == bar && baz != qux)

Add a test to find these cases.

Link: http://lkml.kernel.org/r/dcd0561ddd0fa43c51a420d53b550d738bf42001.1502734458.git.joe@perches.com
Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/oid_registry.c: X.509: fix the buffer overflow in the utility function for OID...
Takashi Iwai [Fri, 8 Sep 2017 23:15:58 +0000 (16:15 -0700)]
lib/oid_registry.c: X.509: fix the buffer overflow in the utility function for OID string

The sprint_oid() utility function doesn't properly check the buffer size
that it causes that the warning in vsnprintf() be triggered.  For
example on v4.1 kernel:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 2357 at lib/vsprintf.c:1867 vsnprintf+0x5a7/0x5c0()
  ...

We can trigger this issue by injecting maliciously crafted x509 cert in
DER format.  Just using hex editor to change the length of OID to over
the length of the SEQUENCE container.  For example:

    0:d=0  hl=4 l= 980 cons: SEQUENCE
    4:d=1  hl=4 l= 700 cons:  SEQUENCE
    8:d=2  hl=2 l=   3 cons:   cont [ 0 ]
   10:d=3  hl=2 l=   1 prim:    INTEGER           :02
   13:d=2  hl=2 l=   9 prim:   INTEGER           :9B47FAF791E7D1E3
   24:d=2  hl=2 l=  13 cons:   SEQUENCE
   26:d=3  hl=2 l=   9 prim:    OBJECT            :sha256WithRSAEncryption
   37:d=3  hl=2 l=   0 prim:    NULL
   39:d=2  hl=2 l= 121 cons:   SEQUENCE
   41:d=3  hl=2 l=  22 cons:    SET
   43:d=4  hl=2 l=  20 cons:     SEQUENCE      <=== the SEQ length is 20
   45:d=5  hl=2 l=   3 prim:      OBJECT            :organizationName
<=== the original length is 3, change the length of OID to over the length of SEQUENCE

Pawel Wieczorkiewicz reported this problem and Takashi Iwai provided
patch to fix it by checking the bufsize in sprint_oid().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: "Lee, Chun-Yi" <[email protected]>
Reported-by: Pawel Wieczorkiewicz <[email protected]>
Cc: David Howells <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Pawel Wieczorkiewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoradix-tree: must check __radix_tree_preload() return value
Eric Dumazet [Fri, 8 Sep 2017 23:15:54 +0000 (16:15 -0700)]
radix-tree: must check __radix_tree_preload() return value

__radix_tree_preload() only disables preemption if no error is returned.

So we really need to make sure callers always check the return value.

idr_preload() contract is to always disable preemption, so we need
to add a missing preempt_disable() if an error happened.

Similarly, ida_pre_get() only needs to call preempt_enable() in the
case no error happened.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Fixes: 7ad3d4d85c7a ("ida: Move ida_bitmap to a percpu variable")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: <[email protected]> [4.11+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/cmdline.c: remove meaningless comment
Baoquan He [Fri, 8 Sep 2017 23:15:51 +0000 (16:15 -0700)]
lib/cmdline.c: remove meaningless comment

One line of code was commented out by c++ style comment for debugging, but
forgot removing it.

Clean it up.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Baoquan He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/string.c: check for kmalloc() failure
Dan Carpenter [Fri, 8 Sep 2017 23:15:48 +0000 (16:15 -0700)]
lib/string.c: check for kmalloc() failure

This is mostly to keep the number of static checker warnings down so we
can spot new bugs instead of them being drowned in noise.  This function
doesn't return normal kernel error codes but instead the return value is
used to display exactly which memory failed.  I chose -1 as hopefully
that's a helpful thing to print.

Link: http://lkml.kernel.org/r/20170817115420.uikisjvfmtrqkzjn@mwanda
Signed-off-by: Dan Carpenter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Heikki Krogerus <[email protected]>
Cc: Daniel Micay <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/rhashtable: fix comment on locks_mul default value
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:45 +0000 (16:15 -0700)]
lib/rhashtable: fix comment on locks_mul default value

As of commit 4cf0b354d92 ("rhashtable: avoid large lock-array
allocations"), the default value for the locks multiplier was reduced
from 128 to 32.

Update the header file to reflect this.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Florian Westphal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agobitmap: introduce BITMAP_FROM_U64()
Yury Norov [Fri, 8 Sep 2017 23:15:41 +0000 (16:15 -0700)]
bitmap: introduce BITMAP_FROM_U64()

The macro is the compile-time analogue of bitmap_from_u64() with the same
purpose: convert the 64-bit number to the properly ordered pair of 32-bit
parts, suitable for filling the bitmap in 32-bit BE environment.

Use it to make test_bitmap_parselist() correct for 32-bit BE ABIs.

Tested on BE mips/qemu.

[[email protected]: tweak code comment]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Cc: Noam Camus <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/test_bitmap.c: add test for bitmap_parselist()
Yury Norov [Fri, 8 Sep 2017 23:15:38 +0000 (16:15 -0700)]
lib/test_bitmap.c: add test for bitmap_parselist()

Do some basic checks for bitmap_parselist().

[[email protected]: fix printk warning]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Cc: Noam Camus <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/bitmap.c: make bitmap_parselist() thread-safe and much faster
Yury Norov [Fri, 8 Sep 2017 23:15:34 +0000 (16:15 -0700)]
lib/bitmap.c: make bitmap_parselist() thread-safe and much faster

Current implementation of bitmap_parselist() uses a static variable to
save local state while setting bits in the bitmap.  It is obviously wrong
if we assume execution in multiprocessor environment.  Fortunately, it's
possible to rewrite this portion of code to avoid using the static
variable.

It is also possible to set bits in the mask per-range with bitmap_set(),
not per-bit, as it is implemented now, with set_bit(); which is way
faster.

The important side effect of this change is that setting bits in this
function from now is not per-bit atomic and less memory-ordered.  This is
because set_bit() guarantees the order of memory accesses, while
bitmap_set() does not.  I think that it is the advantage of the new
approach, because the bitmap_parselist() is intended to initialise bit
arrays, and user should protect the whole bitmap during initialisation if
needed.  So protecting individual bits looks expensive and useless.  Also,
other range-oriented functions in lib/bitmap.c don't worry much about
atomicity.

With all that, setting 2k bits in map with the pattern like 0-2047:128/256
becomes ~50 times faster after applying the patch in my testing
environment (arm64 hosted on qemu).

The second patch of the series adds the test for bitmap_parselist().  It's
not intended to cover all tricky cases, just to make sure that I didn't
screw up during rework.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yury Norov <[email protected]>
Cc: Noam Camus <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib: add test module for CONFIG_DEBUG_VIRTUAL
Florian Fainelli [Fri, 8 Sep 2017 23:15:31 +0000 (16:15 -0700)]
lib: add test module for CONFIG_DEBUG_VIRTUAL

Add a test module that allows testing that CONFIG_DEBUG_VIRTUAL works
correctly, at least that it can catch invalid calls to virt_to_phys()
against the non-linear kernel virtual address map.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Florian Fainelli <[email protected]>
Cc: "Luis R. Rodriguez" <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/hexdump.c: return -EINVAL in case of error in hex2bin()
Andy Shevchenko [Fri, 8 Sep 2017 23:15:28 +0000 (16:15 -0700)]
lib/hexdump.c: return -EINVAL in case of error in hex2bin()

In some cases caller would like to use error code directly without
shadowing.

-EINVAL feels a rightful code to return in case of error in hex2bin().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoblock/cfq: cache rightmost rb_node
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:25 +0000 (16:15 -0700)]
block/cfq: cache rightmost rb_node

For the same reasons we already cache the leftmost pointer, apply the same
optimization for rb_last() calls.  Users must explicitly do this as
rb_root_cached only deals with the smallest node.

[[email protected]: brain fart #1]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomem/memcg: cache rightmost node
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:21 +0000 (16:15 -0700)]
mem/memcg: cache rightmost node

Such that we can optimize __mem_cgroup_largest_soft_limit_node().  The
only overhead is the extra footprint for the cached pointer, but this
should not be an issue for mem_cgroup_tree_per_node.

[[email protected]: brain fart #2]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agofs/epoll: use faster rb_first_cached()
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:18 +0000 (16:15 -0700)]
fs/epoll: use faster rb_first_cached()

...  such that we can avoid the tree walks to get the node with the
smallest key.  Semantically the same, as the previously used rb_first(),
but O(1).  The main overhead is the extra footprint for the cached rb_node
pointer, which should not matter for epoll.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoprocfs: use faster rb_first_cached()
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:15 +0000 (16:15 -0700)]
procfs: use faster rb_first_cached()

...  such that we can avoid the tree walks to get the node with the
smallest key.  Semantically the same, as the previously used rb_first(),
but O(1).  The main overhead is the extra footprint for the cached rb_node
pointer, which should not matter for procfs.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/interval-tree: correct comment wrt generic flavor
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:12 +0000 (16:15 -0700)]
lib/interval-tree: correct comment wrt generic flavor

interval_tree.h _is_ the generic flavor.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/interval_tree: fast overlap detection
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:08 +0000 (16:15 -0700)]
lib/interval_tree: fast overlap detection

Allow interval trees to quickly check for overlaps to avoid unnecesary
tree lookups in interval_tree_iter_first().

As of this patch, all interval tree flavors will require using a
'rb_root_cached' such that we can have the leftmost node easily
available.  While most users will make use of this feature, those with
special functions (in addition to the generic insert, delete, search
calls) will avoid using the cached option as they can do funky things
with insertions -- for example, vma_interval_tree_insert_after().

[[email protected]: fix deadlock from typo vm_lock_anon_vma()]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Jérôme Glisse <[email protected]>
Acked-by: Christian König <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Doug Ledford <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Christian Benvenuti <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoblock/cfq: replace cfq_rb_root leftmost caching
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:05 +0000 (16:15 -0700)]
block/cfq: replace cfq_rb_root leftmost caching

... with the generic rbtree flavor instead. No changes
in semantics whatsoever.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolocking/rtmutex: replace top-waiter and pi_waiters leftmost caching
Davidlohr Bueso [Fri, 8 Sep 2017 23:15:01 +0000 (16:15 -0700)]
locking/rtmutex: replace top-waiter and pi_waiters leftmost caching

... with the generic rbtree flavor instead. No changes
in semantics whatsoever.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agosched/deadline: replace earliest dl and rq leftmost caching
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:58 +0000 (16:14 -0700)]
sched/deadline: replace earliest dl and rq leftmost caching

... with the generic rbtree flavor instead. No changes
in semantics whatsoever.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agosched/fair: replace cfs_rq->rb_leftmost
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:55 +0000 (16:14 -0700)]
sched/fair: replace cfs_rq->rb_leftmost

... with the generic rbtree flavor instead. No changes
in semantics whatsoever.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/rbtree_test.c: support rb_root_cached
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:52 +0000 (16:14 -0700)]
lib/rbtree_test.c: support rb_root_cached

We can work with a single rb_root_cached root to test both cached and
non-cached rbtrees.  In addition, also add a test to measure latencies
between rb_first and its fast counterpart.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/rbtree_test.c: add (inorder) traversal test
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:49 +0000 (16:14 -0700)]
lib/rbtree_test.c: add (inorder) traversal test

This adds a second test for regular rb-tree testing in that there is no
need to repeat it for the augmented flavor.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/rbtree_test.c: make input module parameters
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:46 +0000 (16:14 -0700)]
lib/rbtree_test.c: make input module parameters

Allows for more flexible debugging.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agorbtree: add some additional comments for rebalancing cases
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:42 +0000 (16:14 -0700)]
rbtree: add some additional comments for rebalancing cases

While overall the code is very nicely commented, it might not be
immediately obvious from the diagrams what is going on.  Add a very
brief summary of each case.  Opposite cases where the node is the left
child are left untouched.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agorbtree: optimize root-check during rebalancing loop
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:39 +0000 (16:14 -0700)]
rbtree: optimize root-check during rebalancing loop

The only times the nil-parent (root node) condition is true is when the
node is the first in the tree, or after fixing rbtree rule #4 and the
case 1 rebalancing made the node the root.  Such conditions do not apply
most of the time:

(i) The common case in an rbtree is to have more than a single node,
    so this is only true for the first rb_insert().

(ii) While there is a chance only one first rotation is needed, cases
    where the node's uncle is black (cases 2,3) are more common as we can
    have the following scenarios during the rotation looping:

    case1 only, case1+1, case2+3, case1+2+3, case3 only, etc.

This patch, therefore, adds an unlikely() optimization to this
conditional.  When profiling with CONFIG_PROFILE_ANNOTATED_BRANCHES, a
kernel build shows that the incorrect rate is less than 15%, and for
workloads that involve insert mostly trees overtime tend to have less
than 2% incorrect rate.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agorbtree: cache leftmost node internally
Davidlohr Bueso [Fri, 8 Sep 2017 23:14:36 +0000 (16:14 -0700)]
rbtree: cache leftmost node internally

Patch series "rbtree: Cache leftmost node internally", v4.

A series to extending rbtrees to internally cache the leftmost node such
that we can have fast overlap check optimization for all interval tree
users[1].  The benefits of this series are that:

(i)   Unify users that do internal leftmost node caching.
(ii)  Optimize all interval tree users.
(iii) Convert at least two new users (epoll and procfs) to the new interface.

This patch (of 16):

Red-black tree semantics imply that nodes with smaller or greater (or
equal for duplicates) keys always be to the left and right,
respectively.  For the kernel this is extremely evident when considering
our rb_first() semantics.  Enabling lookups for the smallest node in the
tree in O(1) can save a good chunk of cycles in not having to walk down
the tree each time.  To this end there are a few core users that
explicitly do this, such as the scheduler and rtmutexes.  There is also
the desire for interval trees to have this optimization allowing faster
overlap checking.

This patch introduces a new 'struct rb_root_cached' which is just the
root with a cached pointer to the leftmost node.  The reason why the
regular rb_root was not extended instead of adding a new structure was
that this allows the user to have the choice between memory footprint
and actual tree performance.  The new wrappers on top of the regular
rb_root calls are:

 - rb_first_cached(cached_root) -- which is a fast replacement
     for rb_first.

 - rb_insert_color_cached(node, cached_root, new)

 - rb_erase_cached(node, cached_root)

In addition, augmented cached interfaces are also added for basic
insertion and deletion operations; which becomes important for the
interval tree changes.

With the exception of the inserts, which adds a bool for updating the
new leftmost, the interfaces are kept the same.  To this end, porting rb
users to the cached version becomes really trivial, and keeping current
rbtree semantics for users that don't care about the optimization
requires zero overhead.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agobitops: avoid integer overflow in GENMASK(_ULL)
Matthias Kaehlcke [Fri, 8 Sep 2017 23:14:33 +0000 (16:14 -0700)]
bitops: avoid integer overflow in GENMASK(_ULL)

GENMASK(_ULL) performs a left-shift of ~0UL(L), which technically
results in an integer overflow.  clang raises a warning if the overflow
occurs in a preprocessor expression.  Clear the low-order bits through a
substraction instead of the left-shift to avoid the overflow.

(akpm: no change in .text size in my testing)

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthias Kaehlcke <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoinclude: warn for inconsistent endian config definition
Babu Moger [Fri, 8 Sep 2017 23:14:29 +0000 (16:14 -0700)]
include: warn for inconsistent endian config definition

We have seen some generic code use config parameter CONFIG_CPU_BIG_ENDIAN
to decide the endianness.

Here are the few examples.
include/asm-generic/qrwlock.h
drivers/of/base.c
drivers/of/fdt.c
drivers/tty/serial/earlycon.c
drivers/tty/serial/serial_core.c

Display warning if CPU_BIG_ENDIAN is not defined on big endian
architecture and also warn if it defined on little endian architectures.

Here is our original discussion
https://lkml.org/lkml/2017/5/24/620

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Babu Moger <[email protected]>
Suggested-by: Arnd Bergmann <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]> (powerpc)
Cc: Michal Simek <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoarch/microblaze: add choice for endianness and update Makefile
Babu Moger [Fri, 8 Sep 2017 23:14:25 +0000 (16:14 -0700)]
arch/microblaze: add choice for endianness and update Makefile

microblaze architectures can be configured for either little or big endian
formats.  Add a choice option for the user to select the correct endian
format(default to big endian).

Also update the Makefile so toolchain can compile for the format it is
configured for.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Greg KH <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]> (powerpc)
Cc: Peter Zijlstra <[email protected]>
Cc: Stafford Horne <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoarch: define CPU_BIG_ENDIAN for all fixed big endian archs
Babu Moger [Fri, 8 Sep 2017 23:14:22 +0000 (16:14 -0700)]
arch: define CPU_BIG_ENDIAN for all fixed big endian archs

Patch series "Define CPU_BIG_ENDIAN or warn for inconsistencies", v3.

While working on enabling queued rwlock on SPARC, found this following
code in include/asm-generic/qrwlock.h which uses CONFIG_CPU_BIG_ENDIAN to
clear a byte.

static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
 {
return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
 }

Problem is many of the fixed big endian architectures don't define
CPU_BIG_ENDIAN and clears the wrong byte.

Define CPU_BIG_ENDIAN for all the fixed big endian architecture to fix it.

Also found few more references of this config parameter in
drivers/of/base.c
drivers/of/fdt.c
drivers/tty/serial/earlycon.c
drivers/tty/serial/serial_core.c
Be aware that this may cause regressions if someone has worked-around
problems in the above code already. Remove the work-around.

Here is our original discussion
https://lkml.org/lkml/2017/5/24/620

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Babu Moger <[email protected]>
Suggested-by: Arnd Bergmann <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Stafford Horne <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Stefan Kristiansson <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Michael Ellerman <[email protected]> (powerpc)
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agotreewide: make "nr_cpu_ids" unsigned
Alexey Dobriyan [Fri, 8 Sep 2017 23:14:18 +0000 (16:14 -0700)]
treewide: make "nr_cpu_ids" unsigned

First, number of CPUs can't be negative number.

Second, different signnnedness leads to suboptimal code in the following
cases:

1)
kmalloc(nr_cpu_ids * sizeof(X));

"int" has to be sign extended to size_t.

2)
while (loff_t *pos < nr_cpu_ids)

MOVSXD is 1 byte longed than the same MOV.

Other cases exist as well. Basically compiler is told that nr_cpu_ids
can't be negative which can't be deduced if it is "int".

Code savings on allyesconfig kernel: -3KB

add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
function                                     old     new   delta
coretemp_cpu_online                          450     512     +62
rcu_init_one                                1234    1272     +38
pci_device_probe                             374     399     +25

...

pgdat_reclaimable_pages                      628     556     -72
select_fallback_rq                           446     369     -77
task_numa_find_cpu                          1923    1807    -116

Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agovga: optimise console scrolling
Matthew Wilcox [Fri, 8 Sep 2017 23:14:15 +0000 (16:14 -0700)]
vga: optimise console scrolling

Where possible, call memset16(), memmove() or memcpy() instead of using
open-coded loops.  I don't like the calling convention that uses a byte
count instead of a count of u16s, but it's a little late to change that.
Reduces code size of fbcon.o by almost 400 bytes on my laptop build.

[[email protected]: fix build]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: David Miller <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agodrivers/scsi/sym53c8xx_2/sym_hipd.c: convert to use memset32
Matthew Wilcox [Fri, 8 Sep 2017 23:14:11 +0000 (16:14 -0700)]
drivers/scsi/sym53c8xx_2/sym_hipd.c: convert to use memset32

memset32() can be used to initialise these three arrays.  Minor code
footprint reduction.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: David Miller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agodrivers/block/zram/zram_drv.c: convert to using memset_l
Matthew Wilcox [Fri, 8 Sep 2017 23:14:07 +0000 (16:14 -0700)]
drivers/block/zram/zram_drv.c: convert to using memset_l

zram was the motivation for creating memset_l().  Minchan Kim sees a 7%
performance improvement on x86 with 100MB of non-zero deduplicatable
data:

        perf stat -r 10 dd if=/dev/zram0 of=/dev/null

vanilla:        0.232050465 seconds time elapsed ( +-  0.51% )
memset_l: 0.217219387 seconds time elapsed ( +-  0.07% )

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Tested-by: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: David Miller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoalpha: add support for memset16
Matthew Wilcox [Fri, 8 Sep 2017 23:14:04 +0000 (16:14 -0700)]
alpha: add support for memset16

Alpha already had an optimised fill-memory-with-16-bit-quantity
assembler routine called memsetw().  It has a slightly different calling
convention from memset16() in that it takes a byte count, not a count of
words.  That's the same convention used by ARM's __memset routines, so
rename Alpha's routine to match and add a memset16() wrapper around it.
Then convert Alpha's scr_memsetw() to call memset16() instead of
memsetw().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: David Miller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoARM: implement memset32 & memset64
Matthew Wilcox [Fri, 8 Sep 2017 23:14:00 +0000 (16:14 -0700)]
ARM: implement memset32 & memset64

Reuse the existing optimised memset implementation to implement an
optimised memset32 and memset64.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Reviewed-by: Russell King <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: David Miller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agox86: implement memset16, memset32 & memset64
Matthew Wilcox [Fri, 8 Sep 2017 23:13:56 +0000 (16:13 -0700)]
x86: implement memset16, memset32 & memset64

These are single instructions on x86.  There's no 64-bit instruction for
x86-32, but we don't yet have any user for memset64() on 32-bit
architectures, so don't bother to implement it.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: David Miller <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/string.c: add testcases for memset16/32/64
Matthew Wilcox [Fri, 8 Sep 2017 23:13:52 +0000 (16:13 -0700)]
lib/string.c: add testcases for memset16/32/64

[[email protected]: minor tweaks]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: David Miller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolib/string.c: add multibyte memset functions
Matthew Wilcox [Fri, 8 Sep 2017 23:13:48 +0000 (16:13 -0700)]
lib/string.c: add multibyte memset functions

Patch series "Multibyte memset variations", v4.

A relatively common idiom we're missing is a function to fill an area of
memory with a pattern which is larger than a single byte.  I first
noticed this with a zram patch which wanted to fill a page with an
'unsigned long' value.  There turn out to be quite a few places in the
kernel which can benefit from using an optimised function rather than a
loop; sometimes text size, sometimes speed, and sometimes both.  The
optimised PowerPC version (not included here) improves performance by
about 30% on POWER8 on just the raw memset_l().

Most of the extra lines of code come from the three testcases I added.

This patch (of 8):

memset16(), memset32() and memset64() are like memset(), but allow the
caller to fill the destination with a value larger than a single byte.
memset_l() and memset_p() allow the caller to use unsigned long and
pointer values respectively.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: David Miller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agolinux/kernel.h: move DIV_ROUND_DOWN_ULL() macro
Masahiro Yamada [Fri, 8 Sep 2017 23:13:45 +0000 (16:13 -0700)]
linux/kernel.h: move DIV_ROUND_DOWN_ULL() macro

This macro is useful to avoid link error on 32-bit systems.

We have the same definition in two drivers, so move it to
include/linux/kernel.h

While we are here, refactor DIV_ROUND_UP_ULL() by using
DIV_ROUND_DOWN_ULL().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Mark Brown <[email protected]>
Cc: Cyrille Pitchen <[email protected]>
Cc: Jaroslav Kysela <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Liam Girdwood <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Marek Vasut <[email protected]>
Cc: Brian Norris <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: David Woodhouse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agofs, proc: unconditional cond_resched when reading smaps
David Rientjes [Fri, 8 Sep 2017 23:13:41 +0000 (16:13 -0700)]
fs, proc: unconditional cond_resched when reading smaps

If there are large numbers of hugepages to iterate while reading
/proc/pid/smaps, the page walk never does cond_resched().  On archs
without split pmd locks, there can be significant and observable
contention on mm->page_table_lock which cause lengthy delays without
rescheduling.

Always reschedule in smaps_pte_range() if necessary since the pagewalk
iteration can be expensive.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Rientjes <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agoproc: uninline proc_create()
Alexey Dobriyan [Fri, 8 Sep 2017 23:13:38 +0000 (16:13 -0700)]
proc: uninline proc_create()

Save some code from ~320 invocations all clearing last argument.

add/remove: 3/0 grow/shrink: 0/158 up/down: 45/-702 (-657)
function                                     old     new   delta
proc_create                                    -      17     +17
__ksymtab_proc_create                          -      16     +16
__kstrtab_proc_create                          -      12     +12
yam_init_driver                              301     298      -3

...

cifs_proc_init                               249     228     -21
via_fb_pci_probe                            2304    2280     -24

Link: http://lkml.kernel.org/r/20170819094702.GA27864@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agofs, proc: remove priv argument from is_stack
Michal Hocko [Fri, 8 Sep 2017 23:13:35 +0000 (16:13 -0700)]
fs, proc: remove priv argument from is_stack

Commit b18cb64ead40 ("fs/proc: Stop trying to report thread stacks")
removed the priv parameter user in is_stack so the argument is
redundant.  Drop it.

[[email protected]: remove unused variable]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm/mempolicy.c: remove BUG_ON() checks for VMA inside mpol_misplaced()
Anshuman Khandual [Fri, 8 Sep 2017 23:13:32 +0000 (16:13 -0700)]
mm/mempolicy.c: remove BUG_ON() checks for VMA inside mpol_misplaced()

VMA and its address bounds checks are too late in this function.  They
must have been verified earlier in the page fault sequence.  Hence just
remove them.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Suggested-by: Vlastimil Babka <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm/swapfile.c: fix swapon frontswap_map memory leak on error
David Rientjes [Fri, 8 Sep 2017 23:13:29 +0000 (16:13 -0700)]
mm/swapfile.c: fix swapon frontswap_map memory leak on error

Free frontswap_map if an error is encountered before enable_swap_info().

Signed-off-by: David Rientjes <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: <[email protected]> [4.12+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm: kvfree the swap cluster info if the swap file is unsatisfactory
Darrick J. Wong [Fri, 8 Sep 2017 23:13:25 +0000 (16:13 -0700)]
mm: kvfree the swap cluster info if the swap file is unsatisfactory

If initializing a small swap file fails because the swap file has a
problem (holes, etc.) then we need to free the cluster info as part of
cleanup.  Unfortunately a previous patch changed the code to use kvzalloc
but did not change all the vfree calls to use kvfree.

Found by running generic/357 from xfstests.

Link: http://lkml.kernel.org/r/20170831233515.GR3775@magnolia
Fixes: 54f180d3c181 ("mm, swap: use kvzalloc to allocate some swap data structures")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: <[email protected]> [4.12+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm/page_alloc.c: apply gfp_allowed_mask before the first allocation attempt
Tetsuo Handa [Fri, 8 Sep 2017 23:13:22 +0000 (16:13 -0700)]
mm/page_alloc.c: apply gfp_allowed_mask before the first allocation attempt

We are by error initializing alloc_flags before gfp_allowed_mask is
applied.  This could cause problems after pm_restrict_gfp_mask() is called
during suspend operation.  Apply gfp_allowed_mask before initializing
alloc_flags so that the first allocation attempt uses correct flags.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 83d4ca8148fd9092 ("mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath")
Signed-off-by: Tetsuo Handa <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agotools/testing/selftests/kcmp/kcmp_test.c: add KCMP_EPOLL_TFD testing
Cyrill Gorcunov [Fri, 8 Sep 2017 23:13:19 +0000 (16:13 -0700)]
tools/testing/selftests/kcmp/kcmp_test.c: add KCMP_EPOLL_TFD testing

KCMP's KCMP_EPOLL_TFD mode merged in commit 0791e3644e5ef2 ("kcmp: add
KCMP_EPOLL_TFD mode to compare epoll target files") we've had no selftest
for it yet (except in criu development list).  Thus add it.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Cyrill Gorcunov <[email protected]>
Cc: Andrey Vagin <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm/sparse.c: fix typo in online_mem_sections
Michal Hocko [Fri, 8 Sep 2017 23:13:15 +0000 (16:13 -0700)]
mm/sparse.c: fix typo in online_mem_sections

online_mem_sections() accidentally marks online only the first section
in the given range.  This is a typo which hasn't been noticed because I
haven't tested large 2GB blocks previously.  All users of
pfn_to_online_page would get confused on the the rest of the pfn range
in the block.

All we need to fix this is to use iterator (pfn) rather than start_pfn.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm/memory.c: fix mem_cgroup_oom_disable() call missing
Laurent Dufour [Fri, 8 Sep 2017 23:13:12 +0000 (16:13 -0700)]
mm/memory.c: fix mem_cgroup_oom_disable() call missing

Seen while reading the code, in handle_mm_fault(), in the case
arch_vma_access_permitted() is failing the call to
mem_cgroup_oom_disable() is not made.

To fix that, move the call to mem_cgroup_oom_enable() after calling
arch_vma_access_permitted() as it should not have entered the memcg OOM.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: bae473a423f6 ("mm: introduce fault_env")
Signed-off-by: Laurent Dufour <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm: memcontrol: use per-cpu stocks for socket memory uncharging
Roman Gushchin [Fri, 8 Sep 2017 23:13:09 +0000 (16:13 -0700)]
mm: memcontrol: use per-cpu stocks for socket memory uncharging

We've noticed a quite noticeable performance overhead on some hosts with
significant network traffic when socket memory accounting is enabled.

Perf top shows that socket memory uncharging path is hot:
  2.13%  [kernel]                [k] page_counter_cancel
  1.14%  [kernel]                [k] __sk_mem_reduce_allocated
  1.14%  [kernel]                [k] _raw_spin_lock
  0.87%  [kernel]                [k] _raw_spin_lock_irqsave
  0.84%  [kernel]                [k] tcp_ack
  0.84%  [kernel]                [k] ixgbe_poll
  0.83%  < workload >
  0.82%  [kernel]                [k] enqueue_entity
  0.68%  [kernel]                [k] __fget
  0.68%  [kernel]                [k] tcp_delack_timer_handler
  0.67%  [kernel]                [k] __schedule
  0.60%  < workload >
  0.59%  [kernel]                [k] __inet6_lookup_established
  0.55%  [kernel]                [k] __switch_to
  0.55%  [kernel]                [k] menu_select
  0.54%  libc-2.20.so            [.] __memcpy_avx_unaligned

To address this issue, the existing per-cpu stock infrastructure can be
used.

refill_stock() can be called from mem_cgroup_uncharge_skmem() to move
charge to a per-cpu stock instead of calling atomic
page_counter_uncharge().

To prevent the uncontrolled growth of per-cpu stocks, refill_stock()
will explicitly drain the cached charge, if the cached value exceeds
CHARGE_BATCH.

This allows significantly optimize the load:
  1.21%  [kernel]                [k] _raw_spin_lock
  1.01%  [kernel]                [k] ixgbe_poll
  0.92%  [kernel]                [k] _raw_spin_lock_irqsave
  0.90%  [kernel]                [k] enqueue_entity
  0.86%  [kernel]                [k] tcp_ack
  0.85%  < workload >
  0.74%  perf-11120.map          [.] 0x000000000061bf24
  0.73%  [kernel]                [k] __schedule
  0.67%  [kernel]                [k] __fget
  0.63%  [kernel]                [k] __inet6_lookup_established
  0.62%  [kernel]                [k] menu_select
  0.59%  < workload >
  0.59%  [kernel]                [k] __switch_to
  0.57%  libc-2.20.so            [.] __memcpy_avx_unaligned

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm: fadvise: avoid fadvise for fs without backing device
Shakeel Butt [Fri, 8 Sep 2017 23:13:05 +0000 (16:13 -0700)]
mm: fadvise: avoid fadvise for fs without backing device

The fadvise() manpage is silent on fadvise()'s effect on memory-based
filesystems (shmem, hugetlbfs & ramfs) and pseudo file systems (procfs,
sysfs, kernfs).  The current implementaion of fadvise is mostly a noop
for such filesystems except for FADV_DONTNEED which will trigger
expensive remote LRU cache draining.  This patch makes the noop of
fadvise() on such file systems very explicit.

However this change has two side effects for ramfs and one for tmpfs.
First fadvise(FADV_DONTNEED) could remove the unmapped clean zero'ed
pages of ramfs (allocated through read, readahead & read fault) and
tmpfs (allocated through read fault).  Also fadvise(FADV_WILLNEED) could
create such clean zero'ed pages for ramfs.  This change removes those
possibilities.

One of our generic libraries does fadvise(FADV_DONTNEED).  Recently we
observed high latency in fadvise() and noticed that the users have
started using tmpfs files and the latency was due to expensive remote
LRU cache draining.  For normal tmpfs files (have data written on them),
fadvise(FADV_DONTNEED) will always trigger the unneeded remote cache
draining.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Shakeel Butt <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Greg Thelen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm/zsmalloc.c: change stat type parameter to int
Matthias Kaehlcke [Fri, 8 Sep 2017 23:13:02 +0000 (16:13 -0700)]
mm/zsmalloc.c: change stat type parameter to int

zs_stat_inc/dec/get() uses enum zs_stat_type for the stat type, however
some callers pass an enum fullness_group value.  Change the type to int to
reflect the actual use of the functions and get rid of 'enum-conversion'
warnings

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthias Kaehlcke <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Doug Anderson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm/mlock.c: use page_zone() instead of page_zone_id()
Joonsoo Kim [Fri, 8 Sep 2017 23:12:59 +0000 (16:12 -0700)]
mm/mlock.c: use page_zone() instead of page_zone_id()

page_zone_id() is a specialized function to compare the zone for the pages
that are within the section range.  If the section of the pages are
different, page_zone_id() can be different even if their zone is the same.
This wrong usage doesn't cause any actual problem since
__munlock_pagevec_fill() would be called again with failed index.
However, it's better to use more appropriate function here.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm: consider the number in local CPUs when reading NUMA stats
Kemi Wang [Fri, 8 Sep 2017 23:12:55 +0000 (16:12 -0700)]
mm: consider the number in local CPUs when reading NUMA stats

To avoid deviation, the per cpu number of NUMA stats in
vm_numa_stat_diff[] is included when a user *reads* the NUMA stats.

Since NUMA stats does not be read by users frequently, and kernel does not
need it to make a decision, it will not be a problem to make the readers
more expensive.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemi Wang <[email protected]>
Reported-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Aaron Lu <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Ying Huang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm: update NUMA counter threshold size
Kemi Wang [Fri, 8 Sep 2017 23:12:52 +0000 (16:12 -0700)]
mm: update NUMA counter threshold size

There is significant overhead in cache bouncing caused by zone counters
(NUMA associated counters) update in parallel in multi-threaded page
allocation (suggested by Dave Hansen).

This patch updates NUMA counter threshold to a fixed size of MAX_U16 - 2,
as a small threshold greatly increases the update frequency of the global
counter from local per cpu counter(suggested by Ying Huang).

The rationality is that these statistics counters don't affect the
kernel's decision, unlike other VM counters, so it's not a problem to use
a large threshold.

With this patchset, we see 31.3% drop of CPU cycles(537-->369) for per
single page allocation and reclaim on Jesper's page_bench03 benchmark.

Benchmark provided by Jesper D Brouer(increase loop times to 10000000):
https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/
bench

 Threshold   CPU cycles    Throughput(88 threads)
     32          799         241760478
     64          640         301628829
     125         537         358906028 <==> system by default (base)
     256         468         412397590
     512         428         450550704
     4096        399         482520943
     20000       394         489009617
     30000       395         488017817
     65533       369(-31.3%) 521661345(+45.3%) <==> with this patchset
     N/A         342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemi Wang <[email protected]>
Reported-by: Jesper Dangaard Brouer <[email protected]>
Suggested-by: Dave Hansen <[email protected]>
Suggested-by: Ying Huang <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Aaron Lu <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Tim Chen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm: change the call sites of numa statistics items
Kemi Wang [Fri, 8 Sep 2017 23:12:48 +0000 (16:12 -0700)]
mm: change the call sites of numa statistics items

Patch series "Separate NUMA statistics from zone statistics", v2.

Each page allocation updates a set of per-zone statistics with a call to
zone_statistics().  As discussed in 2017 MM summit, these are a
substantial source of overhead in the page allocator and are very rarely
consumed.  This significant overhead in cache bouncing caused by zone
counters (NUMA associated counters) update in parallel in multi-threaded
page allocation (pointed out by Dave Hansen).

A link to the MM summit slides:
  http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf

To mitigate this overhead, this patchset separates NUMA statistics from
zone statistics framework, and update NUMA counter threshold to a fixed
size of MAX_U16 - 2, as a small threshold greatly increases the update
frequency of the global counter from local per cpu counter (suggested by
Ying Huang).  The rationality is that these statistics counters don't
need to be read often, unlike other VM counters, so it's not a problem
to use a large threshold and make readers more expensive.

With this patchset, we see 31.3% drop of CPU cycles(537-->369, see
below) for per single page allocation and reclaim on Jesper's
page_bench03 benchmark.  Meanwhile, this patchset keeps the same style
of virtual memory statistics with little end-user-visible effects (only
move the numa stats to show behind zone page stats, see the first patch
for details).

I did an experiment of single page allocation and reclaim concurrently
using Jesper's page_bench03 benchmark on a 2-Socket Broadwell-based
server (88 processors with 126G memory) with different size of threshold
of pcp counter.

Benchmark provided by Jesper D Brouer(increase loop times to 10000000):
  https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench

   Threshold   CPU cycles    Throughput(88 threads)
      32        799         241760478
      64        640         301628829
      125       537         358906028 <==> system by default
      256       468         412397590
      512       428         450550704
      4096      399         482520943
      20000     394         489009617
      30000     395         488017817
      65533     369(-31.3%) 521661345(+45.3%) <==> with this patchset
      N/A       342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics

This patch (of 3):

In this patch, NUMA statistics is separated from zone statistics
framework, all the call sites of NUMA stats are changed to use
numa-stats-specific functions, it does not have any functionality change
except that the number of NUMA stats is shown behind zone page stats
when users *read* the zone info.

E.g. cat /proc/zoneinfo
    ***Base***                           ***With this patch***
nr_free_pages 3976                         nr_free_pages 3976
nr_zone_inactive_anon 0                    nr_zone_inactive_anon 0
nr_zone_active_anon 0                      nr_zone_active_anon 0
nr_zone_inactive_file 0                    nr_zone_inactive_file 0
nr_zone_active_file 0                      nr_zone_active_file 0
nr_zone_unevictable 0                      nr_zone_unevictable 0
nr_zone_write_pending 0                    nr_zone_write_pending 0
nr_mlock     0                             nr_mlock     0
nr_page_table_pages 0                      nr_page_table_pages 0
nr_kernel_stack 0                          nr_kernel_stack 0
nr_bounce    0                             nr_bounce    0
nr_zspages   0                             nr_zspages   0
numa_hit 0                                *nr_free_cma  0*
numa_miss 0                                numa_hit     0
numa_foreign 0                             numa_miss    0
numa_interleave 0                          numa_foreign 0
numa_local   0                             numa_interleave 0
numa_other   0                             numa_local   0
*nr_free_cma 0*                            numa_other 0
    ...                                        ...
vm stats threshold: 10                     vm stats threshold: 10
    ...                                        ...

The next patch updates the numa stats counter size and threshold.

[[email protected]: coding-style fixes]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kemi Wang <[email protected]>
Reported-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ying Huang <[email protected]>
Cc: Aaron Lu <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm/memory.c: remove reduntant check for write access
Anshuman Khandual [Fri, 8 Sep 2017 23:12:45 +0000 (16:12 -0700)]
mm/memory.c: remove reduntant check for write access

Flags argument has been copied into vmf.flags and it is not changed in
between.  Hence a single write access check can be used for both PUD and
PMD.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agouserfaultfd: non-cooperative: closing the uffd without triggering SIGBUS
Andrea Arcangeli [Fri, 8 Sep 2017 23:12:42 +0000 (16:12 -0700)]
userfaultfd: non-cooperative: closing the uffd without triggering SIGBUS

This is an enhancement to avoid a non cooperative userfaultfd manager
having to unregister all regions before it can close the uffd after all
userfaultfd activity completed.

The UFFDIO_UNREGISTER would serialize against the handle_userfault by
taking the mmap_sem for writing, but we can simply repeat the page fault
if we detect the uffd was closed and so the regular page fault paths
should takeover.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrea Arcangeli <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm: remove useless vma parameter to offset_il_node
Laurent Dufour [Fri, 8 Sep 2017 23:12:39 +0000 (16:12 -0700)]
mm: remove useless vma parameter to offset_il_node

While reading the code I found that offset_il_node() has a vm_area_struct
pointer parameter which is unused.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laurent Dufour <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm/hmm: fix build when HMM is disabled
Jérôme Glisse [Fri, 8 Sep 2017 23:12:35 +0000 (16:12 -0700)]
mm/hmm: fix build when HMM is disabled

Combinatorial Kconfig is painfull. Withi this patch all below combination
build.

1)

2)
CONFIG_HMM_MIRROR=y

3)
CONFIG_DEVICE_PRIVATE=y

4)
CONFIG_DEVICE_PUBLIC=y

5)
CONFIG_HMM_MIRROR=y
CONFIG_DEVICE_PUBLIC=y

6)
CONFIG_HMM_MIRROR=y
CONFIG_DEVICE_PRIVATE=y

7)
CONFIG_DEVICE_PRIVATE=y
CONFIG_DEVICE_PUBLIC=y

8)
CONFIG_HMM_MIRROR=y
CONFIG_DEVICE_PRIVATE=y
CONFIG_DEVICE_PUBLIC=y

Link: http://lkml.kernel.org/r/[email protected]
Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Jérôme Glisse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm/hmm: avoid bloating arch that do not make use of HMM
Jérôme Glisse [Fri, 8 Sep 2017 23:12:32 +0000 (16:12 -0700)]
mm/hmm: avoid bloating arch that do not make use of HMM

This moves all new code including new page migration helper behind kernel
Kconfig option so that there is no codee bloat for arch or user that do
not want to use HMM or any of its associated features.

arm allyesconfig (without all the patchset, then with and this patch):
   text    data     bss     dec     hex filename
83721896 46511131 27582964 157815991 96814b7 ../without/vmlinux
83722364 46511131 27582964 157816459 968168b vmlinux

[[email protected]: struct hmm is only use by HMM mirror functionality]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: fix build (arm multi_v7_defconfig)]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
7 years agomm/hmm: add new helper to hotplug CDM memory region
Jérôme Glisse [Fri, 8 Sep 2017 23:12:28 +0000 (16:12 -0700)]
mm/hmm: add new helper to hotplug CDM memory region

Unlike unaddressable memory, coherent device memory has a real resource
associated with it on the system (as CPU can address it).  Add a new
helper to hotplug such memory within the HMM framework.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jérôme Glisse <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
Cc: Aneesh Kumar <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: David Nellans <[email protected]>
Cc: Evgeny Baskakov <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Mark Hairgrove <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Sherry Cheung <[email protected]>
Cc: Subhash Gutti <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Bob Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
This page took 0.135468 seconds and 4 git commands to generate.