One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
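The example that originally followed is not reproduced in this digest; a minimal sketch of the pattern being described, using hypothetical struct and variable names, looks like this:

    struct foo {
            int stuff;
            struct boo entry[];     /* hypothetical zero-sized / flexible array at the end */
    };

    /* allocate the struct plus room for 'count' trailing elements */
    instance = kmalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

(The struct_size() helper in include/linux/overflow.h expresses the same calculation with overflow checking.)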
Daniel Jordan [Tue, 16 Jul 2019 23:30:54 +0000 (16:30 -0700)]
mm: add account_locked_vm utility function
locked_vm accounting is done roughly the same way in five places, so
unify them in a helper.
Include the helper's caller in the debug print to distinguish between
callsites.
Error codes stay the same, so user-visible behavior does too. The one
exception is that the -EPERM case in tce_account_locked_vm is removed
because Alexey has never seen it triggered.
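A minimal usage sketch, assuming the helper keeps the int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc) shape and that 'npages' is the caller's page count:

    /* charge npages against RLIMIT_MEMLOCK; pass false later to undo (npages is illustrative) */
    ret = account_locked_vm(current->mm, npages, true);
    if (ret)
            return ret;     /* typically -ENOMEM when the lock limit would be exceeded */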
Robin Murphy [Tue, 16 Jul 2019 23:30:51 +0000 (16:30 -0700)]
arm64: mm: implement pte_devmap support
In order for things like get_user_pages() to work on ZONE_DEVICE memory,
we need a software PTE bit to identify device-backed PFNs. Hook this up
along with the relevant helpers to join in with ARCH_HAS_PTE_DEVMAP.
Robin Murphy [Tue, 16 Jul 2019 23:30:47 +0000 (16:30 -0700)]
mm: introduce ARCH_HAS_PTE_DEVMAP
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
with the long-out-of-date comment can lead to the impression that an
architecture may just enable it (since __add_pages() now "comprehends
device memory" for itself) and expect things to work.
In practice, however, ZONE_DEVICE users have little chance of
functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
dependency so the real situation is clearer.
Robin Murphy [Tue, 16 Jul 2019 23:30:44 +0000 (16:30 -0700)]
mm: clean up is_device_*_page() definitions
Refactor is_device_{public,private}_page() with is_pci_p2pdma_page() to
make them all consistent in depending on their respective config options
even when CONFIG_DEV_PAGEMAP_OPS is enabled for other reasons. This
allows a little more compile-time optimisation as well as some conceptual
and cosmetic cleanup.
Two architectures that use arch-specific MMAP flags are powerpc and
sparc. We still have a few flag values common across them and other
architectures. Consolidate these in mman-common.h.
Also update the comment to indicate where to find HugeTLB-specific
reserved values.
This enables support for synchronous DAX faults on powerpc.
The generic changes were added as part of b6fb293f2497 ("mm: Define
MAP_SYNC and VM_SYNC flags").
Without this, mmap returns EOPNOTSUPP for MAP_SYNC with
MAP_SHARED_VALIDATE.
Instead of adding MAP_SYNC with the same value to
arch/powerpc/include/uapi/asm/mman.h, I am moving the #define to
asm-generic/mman-common.h. The two architectures using mman-common.h
directly are sparc and powerpc. We should be able to consolidate more
#defines into mman-common.h. That can be done as a separate patch.
Pavel Tatashin [Tue, 16 Jul 2019 23:30:35 +0000 (16:30 -0700)]
device-dax: "Hotremove" persistent memory that is used like normal RAM
It is now possible to use persistent memory like regular RAM, but
currently there is no way to remove this memory until the machine is
rebooted.
This work expands the functionality to also allow hotremoving
previously hotplugged persistent memory, and to recover the device for
use for other purposes.
To hotremove persistent memory, the management software must first
offline all memory blocks of the dax region, and then unbind it from the
device-dax/kmem driver. So, the operations should look like this:
Note: if unbind is done without offlining memory beforehand, it won't be
possible to do the dax0.0 hotremove, and the dax device's memory is going
to be part of System RAM until reboot.
Pavel Tatashin [Tue, 16 Jul 2019 23:30:31 +0000 (16:30 -0700)]
mm/hotplug: make remove_memory() interface usable
Presently the remove_memory() interface is inherently broken. It tries
to remove memory but panics if some memory is not offline. The problem
is that it is impossible to ensure that all memory blocks are offline, as
this function also takes lock_device_hotplug, which is required to change
memory state via sysfs.
So, between calling this function and offlining all memory blocks there
is always a window when lock_device_hotplug is released, and therefore,
there is always a chance for a panic during this window.
Make this interface return an error if memory removal fails. This
way it is safe to call this function without panicking the machine, and it
also makes it symmetric to add_memory(), which already returns an error.
Pavel Tatashin [Tue, 16 Jul 2019 23:30:27 +0000 (16:30 -0700)]
device-dax: fix memory and resource leak if hotplug fails
Patch series ""Hotremove" persistent memory", v6.
Recently, support for using persistent memory like regular RAM was
added to Linux. This work extends that functionality to also allow hot
removing persistent memory.
We (Microsoft) have an important use case for this functionality.
The requirement is for physical machines with a small amount of RAM (~8G)
to be able to reboot in a very short period of time (<1s). Yet, there
is userland state that is expensive to recreate (~2G).
The solution is to boot machines with 2G reserved for persistent
memory.
Copy the state, and hotadd the persistent memory so the machine still has
all 8G available for runtime. Before reboot, offline and hotremove the
2G of device-dax memory, copy the memory that needs to be preserved to the
pmem0 device, and reboot.
The series of operations look like this:
1. After boot, restore /dev/pmem0 to a ramdisk to be consumed by apps,
and free the ramdisk.
2. Convert raw pmem0 to devdax
ndctl create-namespace --mode devdax --map mem -e namespace0.0 -f
3. Hotadd to System RAM
echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
echo online_movable > /sys/devices/system/memoryXXX/state
4. Before reboot hotremove device-dax memory from System RAM
echo offline > /sys/devices/system/memoryXXX/state
echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
5. Create raw pmem0 device
ndctl create-namespace --mode raw -e namespace0.0 -f
6. Copy the state that was stored by apps to ramdisk to pmem device
7. Do a kexec reboot, or reboot through firmware if the firmware does not
zero memory in the pmem0 region (these machines have only regular
volatile memory). So, to have a pmem0 device, either the memmap kernel
parameter is used, or device nodes in the dtb are specified.
This patch (of 3):
When add_memory() fails, the resource and the memory should be freed.
if (mq_open("/testing", 0x40, 3, &attr) == (mqd_t) -1)
        perror("mq_open");
mqueue_get_inode() was correctly rejecting the giant mq_msgsize, and
preparing to return -EINVAL. During the cleanup, it calls
mqueue_evict_inode(), which performed resource usage tracking math for
updating "user" before checking whether there was a valid "user" at all
(which would indicate that the calculations would be sane). Instead,
delay this math until after a valid "user" has been seen.
The overflow was real, but the results went unused, so while the flaw is
harmless, it's noisy for kernel fuzzers, so just fix it by moving the
calculation under the non-NULL "user" where it actually gets used.
include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
For architectures using __WARN_TAINT, the WARN_ON macro did not print
out the "cut here" string. The other WARN_XXX macros would print "cut
here" inside __warn_printk, which is not called for WARN_ON since it
doesn't have a message to print.
Leonard Crestez [Tue, 16 Jul 2019 23:30:15 +0000 (16:30 -0700)]
scripts/gdb: add helpers to find and list devices
Add helper commands and functions for finding pointers to struct device
by enumerating linux device bus/class infrastructure. This can be used
to fetch subsystem and driver-specific structs:
(gdb) p *$container_of($lx_device_find_by_class_name("net", "eth0"), "struct net_device", "dev")
(gdb) p *$container_of($lx_device_find_by_bus_name("i2c", "0-004b"), "struct i2c_client", "dev")
(gdb) p *(struct imx_port*)$lx_device_find_by_class_name("tty", "ttymxc1")->parent->driver_data
Several generic "lx-device-list" functions are included to enumerate
devices by bus and class:
(gdb) lx-device-list-bus usb
(gdb) lx-device-list-class
(gdb) lx-device-list-tree &platform_bus
Similar information is available in /sys but pointer values are
deliberately hidden.
Miroslav Lichvar [Tue, 16 Jul 2019 23:30:09 +0000 (16:30 -0700)]
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
The PPS assert/clear offset corrections are set by the PPS_SETPARAMS
ioctl in the pps_ktime structs, which also contain flags. The flags are
not initialized by applications (using the timepps.h header) and they
are not used by the kernel for anything except returning them back in
the PPS_GETPARAMS ioctl.
Set the flags to zero to make it clear they are unused, and to avoid
leaking uninitialized data of the PPS_SETPARAMS caller to other
applications that have read access to the PPS device.
kernel/pid.c: convert struct pid count to refcount_t
struct pid's count is an atomic_t field used as a refcount. Use
refcount_t for it which is basically atomic_t but does additional
checking to prevent use-after-free bugs.
For memory ordering, the only change is with the following:
- if ((atomic_read(&pid->count) == 1) ||
- atomic_dec_and_test(&pid->count)) {
+ if (refcount_dec_and_test(&pid->count)) {
kmem_cache_free(ns->pid_cachep, pid);
Here the change is from fully ordered to RELEASE + ACQUIRE (as per
refcount-vs-atomic.rst). This ACQUIRE should take care of making sure the
free happens after the refcount_dec_and_test().
The above hunk also removes atomic_read() since it is not needed for the
code to work and it is unclear how beneficial it is. The removal lets
refcount_dec_and_test() check for cases where get_pid() happened before
the object was freed.
Dan Carpenter [Tue, 16 Jul 2019 23:30:03 +0000 (16:30 -0700)]
drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
The dev_info.name[] array has space for RIO_MAX_DEVNAME_SZ + 1
characters. But the problem here is that we don't ensure that the user
put a NUL terminator on the end of the string. This could lead to an
out-of-bounds read.
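A hedged sketch of the kind of fix this calls for (not the verbatim patch; the copy_from_user() call and the 'arg' pointer are assumed from context):

    if (copy_from_user(&dev_info, arg, sizeof(dev_info)))
            return -EFAULT;
    /* don't trust userspace to have NUL-terminated the name */
    dev_info.name[sizeof(dev_info.name) - 1] = '\0';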
select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
Now that restore_saved_sigmask_unless() is always called with the same
argument right before poll_select_copy_remaining() we can move it into
poll_select_copy_remaining() and make it the only caller of restore() in
fs/select.c.
The patch also renames poll_select_copy_remaining() to
poll_select_finish(), which looks better after this change.
kern_select() doesn't use set_user_sigmask(), so in this case
poll_select_finish() does restore_saved_sigmask_unless() "for no
reason". But this won't hurt, and WARN_ON(!TIF_SIGPENDING) is still
valid.
select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
do_poll() returns -EINTR if interrupted and after that all its callers
have to translate it into -ERESTARTNOHAND. Change do_poll() to return
-ERESTARTNOHAND and update (simplify) the callers.
Note that this also unifies all users of restore_saved_sigmask_unless(),
see the next patch.
Linus:
: The *right* return value will actually be then chosen by
: poll_select_copy_remaining(), which will turn ERESTARTNOHAND to EINTR
: when it can't update the timeout.
:
: Except for the cases that use restart_block and do that instead and
: don't have the whole timeout restart issue as a result.
task->saved_sigmask and ->restore_sigmask are only used in the ret-from-
syscall paths. This means that set_user_sigmask() can save ->blocked in
->saved_sigmask and do set_restore_sigmask() to indicate that ->blocked
was modified.
This way the callers do not need two sigset_t's passed to set/restore,
and restore_user_sigmask(), renamed to restore_saved_sigmask_unless(),
turns into a trivial helper which just calls restore_saved_sigmask().
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets the ptracer obtain
details of the syscall the tracee is blocked in.
There are two reasons for a special syscall-related ptrace request.
Firstly, with the current ptrace API there are cases when ptracer cannot
retrieve necessary information about syscalls. Some examples include:
* The notorious int-0x80-from-64-bit-task issue. See [1] for details.
In short, if a 64-bit task performs a syscall through int 0x80, its
tracer has no reliable means to find out that the syscall was, in
fact, a compat syscall, and misidentifies it.
* Syscall-enter-stop and syscall-exit-stop look the same for the
tracer. Common practice is to keep track of the sequence of
ptrace-stops in order not to mix the two syscall-stops up. But it is
not as simple as it looks; for example, strace had a (just recently
fixed) long-standing bug where attaching strace to a tracee that is
performing the execve system call led to the tracer identifying the
following syscall-exit-stop as syscall-enter-stop, which messed up
all the state tracking.
* Since the introduction of commit 84d77d3f06e7 ("ptrace: Don't allow
accessing an undumpable mm"), both PTRACE_PEEKDATA and
process_vm_readv become unavailable when the process dumpable flag is
cleared. On such architectures as ia64 this results in all syscall
arguments being unavailable for the tracer.
Secondly, ptracers also have to support a lot of arch-specific code for
obtaining information about the tracee. For some architectures, this
requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
argument and return value.
ptrace(2) man page:
long ptrace(enum __ptrace_request request, pid_t pid,
void *addr, void *data);
...
PTRACE_GET_SYSCALL_INFO
Retrieve information about the syscall that caused the stop.
The information is placed into the buffer pointed to by the "data"
argument, which should be a pointer to a buffer of type
"struct ptrace_syscall_info".
The "addr" argument contains the size of the buffer pointed to
by the "data" argument (i.e., sizeof(struct ptrace_syscall_info)).
The return value contains the number of bytes available
to be written by the kernel.
If the size of the data to be written by the kernel exceeds the size
specified by the "addr" argument, the output is truncated.
syscall_get_error() is required to be implemented on this architecture in
addition to already implemented syscall_get_nr(), syscall_get_arguments(),
syscall_get_return_value(), and syscall_get_arch() functions in order to
extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request.
syscall_get_error() is required to be implemented on all architectures in
addition to already implemented syscall_get_nr(), syscall_get_arguments(),
syscall_get_return_value(), and syscall_get_arch() functions in order to
extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request.
hexagon: define syscall_get_error() and syscall_get_return_value()
syscall_get_* functions are required to be implemented on all
architectures in order to extend the generic ptrace API with
PTRACE_GET_SYSCALL_INFO request.
This adds the remaining two syscall_get_* functions as documented in
asm-generic/syscall.h: syscall_get_error() and syscall_get_return_value().
PTRACE_GET_SYSCALL_INFO returns the following structure:
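The structure definition itself is not reproduced in this digest; an abbreviated sketch of its layout follows (alignment attributes and padding omitted; see include/uapi/linux/ptrace.h for the authoritative definition):

    struct ptrace_syscall_info {
            /* abbreviated sketch, not the full UAPI definition */
            __u8  op;                       /* PTRACE_SYSCALL_INFO_{ENTRY,EXIT,SECCOMP} */
            __u32 arch;                     /* AUDIT_ARCH_* value for the syscall */
            __u64 instruction_pointer;
            __u64 stack_pointer;
            union {
                    struct {
                            __u64 nr;
                            __u64 args[6];
                    } entry;
                    struct {
                            __s64 rval;
                            __u8  is_error;
                    } exit;
                    struct {
                            __u64 nr;
                            __u64 args[6];
                            __u32 ret_data;
                    } seccomp;
            };
    };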
The structure was chosen according to [2], except for the following
changes:
* seccomp substructure was added as a superset of entry substructure
* the type of nr field was changed from int to __u64 because syscall
numbers are, as a practical matter, 64 bits
* stack_pointer field was added along with instruction_pointer field
since it is readily available and can save the tracer from extra
PTRACE_GETREGS/PTRACE_GETREGSET calls
* arch is always initialized to aid with tracing system calls such as
execve()
* instruction_pointer and stack_pointer are always initialized so they
could be easily obtained for non-syscall stops
* a boolean is_error field was added along with the rval field; this way
the tracer can more reliably distinguish a return value from an error
value
strace has been ported to PTRACE_GET_SYSCALL_INFO. Starting with
release 4.26, strace uses PTRACE_GET_SYSCALL_INFO API as the preferred
mechanism of obtaining syscall information.
All syscall_get_*() and syscall_set_*() functions must be defined as
static inline as on all other architectures, otherwise asm/syscall.h
cannot be included in more than one compilation unit.
This bug has to be fixed in order to extend the generic
ptrace API with PTRACE_GET_SYSCALL_INFO request.
strncpy() was used to copy a fixed-size buffer. Since a NUL-terminated
string is not required here, prefer memcpy(). The generated
code (ppc32) remains the same.
Silence the following warning triggered using W=1:
fs/hfsplus/xattr.c:410:3: warning: 'strncpy' output truncated before terminating nul copying 4 bytes from a string of the same length [-Wstringop-truncation]
Pedro Cuadra [Tue, 16 Jul 2019 23:29:13 +0000 (16:29 -0700)]
coda: add hinting support for partial file caching
This adds support for partial file caching in Coda. Every read, write
and mmap informs the userspace cache manager about what part of a file
is about to be accessed so that the cache manager can ensure the
relevant parts are available before the operation is allowed to proceed.
When a read or write operation completes, this is also reported to allow
the cache manager to track when partially cached content can be
released.
If the cache manager does not support partial file caching, or when the
entire file has been fetched into the local cache, the cache manager may
return an EOPNOTSUPP error to indicate that intent upcalls are no longer
necessary until the file is closed.
Jan Harkes [Tue, 16 Jul 2019 23:28:44 +0000 (16:28 -0700)]
coda: bump module version
The out-of-tree module version has been bumped several times already,
but we haven't kept this in-tree one in sync, partly because most
changes go from here to the out-of-tree copy.
Dan Carpenter [Tue, 16 Jul 2019 23:28:38 +0000 (16:28 -0700)]
coda: get rid of CODA_ALLOC()
These days we have kvzalloc() so we can delete CODA_ALLOC().
I made a couple of related changes in coda_psdev_write(). First, I added
some error handling to avoid a NULL dereference if the allocation
failed. Second, I used kvmalloc() instead of kvzalloc() because we copy
over the memory on the next line, so there is no need to zero it first.
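A hedged sketch of the resulting pattern in coda_psdev_write() (identifier names are illustrative, not quoted from the patch):

    /* kvmalloc() rather than kvzalloc(): the buffer is fully overwritten below */
    void *dcbuf = kvmalloc(nbytes, GFP_KERNEL);     /* dcbuf/nbytes are illustrative names */
    if (!dcbuf)
            return -ENOMEM;         /* the error handling that was previously missing */
    if (copy_from_user(dcbuf, buf, nbytes)) {
            kvfree(dcbuf);
            return -EFAULT;
    }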
We exchange file timestamps with user space using psdev device
read/write operations with a fixed but architecture specific binary
layout.
On 32-bit systems, this uses a 'timespec' structure that is defined by
the C library to contain two 32-bit values for seconds and nanoseconds.
As we get ready for the year 2038 overflow of the 32-bit signed seconds,
the kernel now uses 64-bit timestamps internally, and user space will do
the same change by changing the 'timespec' definition in the future.
Unfortunately, this breaks the layout of the coda_vattr structure, so we
need to redefine that in terms of something that does not change. I'm
introducing a new 'struct vtimespec' structure here that keeps the
existing layout, and the same change has to be done in the coda user
space copy of linux/coda.h before anyone can use that on a 32-bit
architecture with 64-bit time_t.
An open question is what should happen to actual times past y2038, as
they are now truncated to the last valid date when sent to user space,
and interpreted as pre-1970 times when a timestamp with the MSB set is
read back into the kernel. Alternatively, we could change the new
timespec64_to_coda()/coda_to_timespec64() functions to use a different
interpretation and extend the available range further to the future by
disallowing past timestamps. This would require more changes in the
user space side though.
Sam Protsenko [Tue, 16 Jul 2019 23:28:20 +0000 (16:28 -0700)]
coda: fix build using bare-metal toolchain
The kernel is a self-contained project and can be built with a bare-metal
toolchain. But a bare-metal toolchain doesn't define __linux__. Because
of this, the u_quad_t type is not defined when using a bare-metal toolchain
and the codafs build fails. This patch fixes it by defining the u_quad_t
type unconditionally.
Jan Harkes [Tue, 16 Jul 2019 23:28:16 +0000 (16:28 -0700)]
coda: potential buffer overflow in coda_psdev_write()
Add checks to make sure the downcall message we got from the Coda cache
manager is large enough to contain the data it is supposed to have.
i.e. when we get a CODA_ZAPDIR we can access &out->coda_zapdir.CodaFid.
Fixes these include/uapi/linux/coda_psdev.h compilation errors in userspace:
linux/coda_psdev.h:12:19: error: field `uc_chain' has incomplete type
struct list_head uc_chain;
^
linux/coda_psdev.h:13:2: error: unknown type name `caddr_t'
caddr_t uc_data;
^
linux/coda_psdev.h:14:2: error: unknown type name `u_short'
u_short uc_flags;
^
linux/coda_psdev.h:15:2: error: unknown type name `u_short'
u_short uc_inSize; /* Size is at most 5000 bytes */
^
linux/coda_psdev.h:16:2: error: unknown type name `u_short'
u_short uc_outSize;
^
linux/coda_psdev.h:17:2: error: unknown type name `u_short'
u_short uc_opcode; /* copied from data to save lookup */
^
linux/coda_psdev.h:19:2: error: unknown type name `wait_queue_head_t'
wait_queue_head_t uc_sleep; /* process' wait queue */
^
Jan Harkes [Tue, 16 Jul 2019 23:28:04 +0000 (16:28 -0700)]
coda: pass the host file in vma->vm_file on mmap
Patch series "Coda updates".
The following patch series is a collection of various fixes for Coda,
most of which were collected from linux-fsdevel or linux-kernel but
which have as yet not found their way upstream.
This patch (of 22):
Various file systems expect that vma->vm_file points at their own file
handle; several use file_inode(vma->vm_file) to get at their inode, or
use vma->vm_file->private_data. However, the way Coda wrapped mmap on a
host file broke this assumption: vm_file was still pointing at the Coda
file, and the host file systems would scribble over Coda's inode and
private file data.
This patch fixes the incorrect expectation and wraps vm_ops->open and
vm_ops->close to allow Coda to track when the vm_area_struct is
destroyed so we still release the reference on the Coda file handle at
the right time.
mm, kprobes: generalize and rename notify_page_fault() as kprobe_page_fault()
Architectures which support kprobes have very similar boilerplate around
calling kprobe_fault_handler(). Use a helper function in kprobes.h to
unify them, based on the x86 code.
This changes the behaviour for other architectures when preemption is
enabled. Previously, they would have disabled preemption while calling
the kprobe handler. However, preemption would already be disabled if this
fault were due to a kprobe, so when we are preemptible we know the fault
was not due to a kprobe handler and can simply return failure.
This behaviour was introduced in commit a980c0ef9f6d ("x86/kprobes:
Refactor kprobes_fault() like kprobe_exceptions_notify()")
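A hedged sketch of what such a helper can look like, following the x86-derived logic described above (the in-tree kprobe_page_fault() may differ in detail):

    /* sketch only; the actual helper in kprobes.h may differ */
    static inline bool kprobe_page_fault(struct pt_regs *regs, unsigned int trap)
    {
            if (!IS_ENABLED(CONFIG_KPROBES) || user_mode(regs))
                    return false;
            /*
             * Kprobe handlers run with preemption disabled, so if we are
             * currently preemptible this fault cannot have come from a kprobe.
             */
            if (preemptible())
                    return false;
            if (!kprobe_running())
                    return false;
            return kprobe_fault_handler(regs, trap);
    }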
This fixes a couple typos I noticed in the slab Kconfig:
sacrifies -> sacrifices
accellerate -> accelerate
Seeing as no other instances of these typos are found elsewhere in the
kernel and that I originally added one of the two, I can only assume
working on slab must have caused damage to the spelling centers of my
brain.
checkpatch.pl: warn on duplicate sysctl local variable
Commit d91bff3011cf ("proc/sysctl: add shared variables for range
check") adds some shared const variables to be used instead of a local
copy in each source file. Warn when a chunk duplicates one of these
values in a ctl_table struct:
$ scripts/checkpatch.pl 0001-test-commit.patch
WARNING: duplicated sysctl range checking value 'zero', consider using the shared one in include/linux/sysctl.h
#27: FILE: arch/arm/kernel/isa.c:48:
+ .extra1 = &zero,
WARNING: duplicated sysctl range checking value 'int_max', consider using the shared one in include/linux/sysctl.h
#28: FILE: arch/arm/kernel/isa.c:49:
+ .extra2 = &int_max,
lib/rbtree: avoid generating code twice for the cached versions
As was already noted in rbtree.h, the logic to cache rb_first (or
rb_last) can easily be implemented externally to the core rbtree api.
Change the implementation to do just that. Previously the update of
rb_leftmost was wired deeper into the implementation, but there were some
disadvantages to that - mostly, lib/rbtree.c had separate instantiations
for rb_insert_color() vs rb_insert_color_cached(), as well as rb_erase()
vs rb_erase_cached(), which were doing exactly the same thing save for
the rb_leftmost update at the start of either function.
   text    data     bss     dec     hex filename
   5405     120       0    5525    1595 lib/rbtree.o-vanilla
   3827      96       0    3923     f53 lib/rbtree.o-patch
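For illustration, once the leftmost caching is maintained outside the core API, the cached insert can collapse into a thin inline wrapper along these lines (a sketch, not necessarily the exact in-tree definition):

    /* sketch of the externalised caching */
    static inline void rb_insert_color_cached(struct rb_node *node,
                                              struct rb_root_cached *root,
                                              bool leftmost)
    {
            if (leftmost)
                    root->rb_leftmost = node;       /* keep the cache up to date */
            rb_insert_color(node, &root->rb_root);  /* reuse the uncached rebalancing */
    }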
Fix the following issues in test_meminit.c:
- |size| in fill_with_garbage_skip() should be signed so that it
doesn't overflow if it's not aligned on sizeof(*p);
- fill_with_garbage_skip() should actually skip |skip| bytes;
- do_kmem_cache_size() should deallocate memory in the RCU case.
The conditional logic is too complicated for the compiler to fully
comprehend:
lib/test_meminit.c: In function 'test_meminit_init':
lib/test_meminit.c:236:5: error: 'buf_copy' may be used uninitialized in this function [-Werror=maybe-uninitialized]
kfree(buf_copy);
^~~~~~~~~~~~~~~
lib/test_meminit.c:201:14: note: 'buf_copy' was declared here
mm/ioremap: probe platform for p4d huge map support
Finish up what commit c2febafc6773 ("mm: convert generic code to 5-level
paging") started by bringing P4D huge mapping support up to par with
PUD and PMD. A new arch callback arch_ioremap_p4d_supported() is added,
which just maintains the status quo (P4D huge map not supported) on x86,
arm64 and powerpc.
When HAVE_ARCH_HUGE_VMAP is enabled, it's just a simple check from the
arch about the support, hence runtime effects are minimal.
mm/ioremap: check virtual address alignment while creating huge mappings
Virtual address alignment is essential in ensuring correct clearing of
all intermediate-level pgtable entries and freeing of associated pgtable
pages. An unaligned address can end up randomly freeing a pgtable page
that potentially still contains valid mappings. Hence also check its
alignment along with the existing phys_addr check.
Add tests for heap and pagealloc initialization. These can be used to
check init_on_alloc and init_on_free implementations as well as other
approaches to initialization.
Expected test output in the case the kernel provides heap initialization
(e.g. when running with either init_on_alloc=1 or init_on_free=1):
test_meminit: all 10 tests in test_pages passed
test_meminit: all 40 tests in test_kvmalloc passed
test_meminit: all 60 tests in test_kmemcache passed
test_meminit: all 10 tests in test_rcu_persistent passed
test_meminit: all 120 tests passed!
lib/test_overflow.c: avoid tainting the kernel and fix wrap size
This adds __GFP_NOWARN to the kmalloc()-portions of the overflow test to
avoid tainting the kernel. Additionally, it fixes up the math on wrap size
to be architecture- and page-size-agnostic.
If a memsetXX implementation is completely broken and fails in the first
iteration, when i, j, and k are all zero, the failure is masked as zero
is returned. Failing in the first iteration is perhaps the most likely
failure, so this makes the tests pretty much useless. Avoid the
situation by always setting a random unused bit in the result on
failure.
Peter Rosin [Tue, 16 Jul 2019 23:27:15 +0000 (16:27 -0700)]
lib/string.c: allow searching for NUL with strnchr
Patch series "lib/string: search for NUL with strchr/strnchr".
I noticed an inconsistency where strchr and strnchr do not behave the
same with respect to the trailing NUL. strchr is standardised and the
kernel function conforms, and the kernel relies on the behavior. So,
naturally strchr stays as-is and strnchr is what I change.
While writing a few tests to verify that my new strnchr loop was sane, I
noticed that the tests for memset16/32/64 had a problem. Since it's all
about the lib/string.c file I made a short series of it all...
This patch (of 3):
strchr considers the terminating NUL to be part of the string, and NUL
can thus be searched for with that function. For consistency, do the
same with strnchr.
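A small sketch of the behavioural difference (buffer and count are illustrative):

    const char *s = "abc";          /* three characters plus the terminating NUL */

    strchr(s, '\0');                /* returns &s[3], the terminator */
    strnchr(s, 4, '\0');            /* previously NULL; with this change it also
                                       returns &s[3], since the NUL lies within
                                       the 4 bytes searched */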
Joe Perches [Tue, 16 Jul 2019 23:27:09 +0000 (16:27 -0700)]
get_maintainer: add ability to skip moderated mailing lists
Add a command line switch --no-moderated to skip L: mailing lists marked
with 'moderated'.
Some people prefer not emailing moderated mailing lists as the
moderation time can be indeterminate and some emails can be
intentionally dropped by a moderator.
This can cause fragmentation of email threads when some are subscribed
to a moderated list but others are not and emails are dropped.
Qian Cai [Tue, 16 Jul 2019 23:27:06 +0000 (16:27 -0700)]
asm-generic: fix a compilation warning
Fix this compilation warning on x86 by making flush_cache_vmap() inline.
lib/ioremap.c: In function 'ioremap_page_range':
lib/ioremap.c:214:16: warning: variable 'start' set but not used [-Wunused-but-set-variable]
unsigned long start;
^~~~~
While at it, convert all other similar functions to inline for
consistency.
linux/bits.h: make BIT(), GENMASK(), and friends available in assembly
BIT(), GENMASK(), etc. are useful to define register bits of hardware.
However, low-level code is often written in assembly, where they are
not available due to the hard-coded 1UL, 0UL.
In fact, in-kernel headers such as arch/arm64/include/asm/sysreg.h
use _BITUL() instead of BIT() so that the register bit macros are
available in assembly.
There are two reasons for using the macros in include/uapi/linux/const.h:
[1] For use in uapi headers
We should use underscore-prefixed variants for user-space.
[2] For use in assembly code
Since _BITUL() uses UL(1) instead of 1UL, it can be used as an
alternative to BIT().
For [2], it is pretty easy to change BIT() etc. for use in assembly.
This allows replacing _BITUL() in kernel-space headers with BIT().
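As a hedged sketch (the register names below are hypothetical), a header like the following can then be shared between C and assembly sources instead of falling back to _BITUL():

    #include <linux/bits.h>

    /* hypothetical register bits; usable from both .c and .S files
       once BIT()/GENMASK() are asm-safe */
    #define FOO_CTRL_ENABLE         BIT(0)
    #define FOO_CTRL_MODE_MASK      GENMASK(7, 4)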
fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.
Normally, the inode's i_uid/i_gid are translated relative to s_user_ns,
but this is not a correct behavior for proc. Since sysctl permission
check in test_perm is done against GLOBAL_ROOT_[UG]ID, it makes more
sense to use these values in u_[ug]id of proc inodes. In other words:
although uid/gid in the inode is not read during test_perm, the inode
logically belongs to the root of the namespace. I have confirmed this
with Eric Biederman at LPC and in this thread:
https://lore.kernel.org/lkml/[email protected]
Consequences
============
Since the i_[ug]id values of proc nodes are not used for permissions
checks, this change usually makes no functional difference. However, it
causes an issue in a setup where:
* a namespace container is created without root user in container -
hence the i_[ug]id of proc nodes are set to INVALID_[UG]ID
* container creator tries to configure it by writing /proc/sys files,
e.g. writing /proc/sys/kernel/shmmax to configure shared memory limit
The kernel does not allow opening an inode for writing if its i_[ug]id are
invalid, making it impossible to write shmmax and thus to configure the
container.
Using a container with no root mapping is apparently rare, but we do use
this configuration at Google. Also, we use a generic tool to configure
the container limits, and the inability to write any of them causes a
failure.
History
=======
The invalid uids/gids in inodes first appeared due to 81754357770e (fs:
Update i_[ug]id_(read|write) to translate relative to s_user_ns).
However, AFAIK, this did not immediately cause any issues. The
inability to write to these "invalid" inodes was only caused by a later
commit 0bd23d09b874 (vfs: Don't modify inodes with a uid or gid unknown
to the vfs).
Tested: Used a repro program that creates a user namespace without any
mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
Before the change, it shows the overflow uid; with the change, it's 0.
The overflow uid indicates that the uid in the inode is not correct and
thus it is not possible to open the file for writing.
This is a kind-of-precursor for "struct proc_ops".
Note:
typeof(pde->proc_fops->...) ...;
can't be used because ->proc_fops is "const struct file_operations *".
"const" prevents assignment down the code and it can't be deleted in the
type system.
Kairui Song [Tue, 16 Jul 2019 23:26:39 +0000 (16:26 -0700)]
vmcore: add a kernel parameter novmcoredd
Since commit 2724273e8fd0 ("vmcore: add API to collect hardware dump in
second kernel"), drivers are allowed to add device related dump data to
vmcore as they want by using the device dump API. This has a potential
issue, the data is stored in memory, drivers may append too much data
and use too much memory. The vmcore is typically used in a kdump kernel
which runs in a pre-reserved small chunk of memory. So as a result it
will make kdump unusable at all due to OOM issues.
So introduce a new 'novmcoredd' command line option that lets the user
disable device dump to reduce memory usage. This is helpful if device dump
is using too much memory; disabling device dump makes sure that a regular
vmcore without device dump data is still available.
tools/testing/selftests/proc/proc-pid-vm.c: hide "segfault at ffffffffff600000" dmesg spam
The test tries to access the vsyscall page and, if it doesn't exist, gets
SIGSEGV, which can spam dmesg. However, the segfault happens by design.
Handle it and carry the information to the parent via the exit code.
The whole header file deals with swap entries and PTEs, none of which
can exist for nommu builds. The current nommu ports have lots of stubs
to allow the inline functions in swapops.h to compile, but as none of
this functionality is actually used there is no point in even providing
it. This way we don't have to provide the stubs for the upcoming RISC-V
nommu port, and can eventually remove them from the existing ports.
We can't expose UAPI symbols differently based on CONFIG_ symbols, as
userspace won't have them available. Instead always define the flag,
but only respect it based on the config option.
mm/cma.c: fail if fixed declaration can't be honored
The description of cma_declare_contiguous() indicates that if the
'fixed' argument is true the reserved contiguous area must be exactly at
the address of the 'base' argument.
However, the function currently allows the 'base', 'size', and 'limit'
arguments to be silently adjusted to meet alignment constraints. This
commit enforces the documented behavior through explicit checks that
return an error if the requested fixed region cannot be honored exactly.
Henry Burns [Tue, 16 Jul 2019 23:26:21 +0000 (16:26 -0700)]
mm/z3fold.c: reinitialize zhdr structs after migration
z3fold_page_migration() calls memcpy(new_zhdr, zhdr, PAGE_SIZE).
However, zhdr contains fields that can't be directly copied over (e.g.
list_head, a circular linked list). We only need to initialize the
linked lists in new_zhdr, as z3fold_isolate_page() already ensures that
these lists are empty.
Additionally it is possible that zhdr->work has been placed in a
workqueue. In this case we shouldn't migrate the page, as zhdr->work
references zhdr as opposed to new_zhdr.
Andrew Morton [Tue, 16 Jul 2019 23:26:15 +0000 (16:26 -0700)]
mm/vmscan.c: add checks for incorrect handling of current->reclaim_state
Six sites are presently altering current->reclaim_state. There is a
risk that one function stomps on a caller's value. Use a helper
function to catch such errors.
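A hedged sketch of such a helper (close to, though not necessarily identical to, what lands in mm/vmscan.c):

    /* sketch; the in-tree helper may differ */
    static void set_task_reclaim_state(struct task_struct *task,
                                       struct reclaim_state *rs)
    {
            /* Catch a caller's reclaim_state being silently overwritten */
            WARN_ON_ONCE(rs && task->reclaim_state);

            /* Catch clearing a reclaim_state that was never set */
            WARN_ON_ONCE(!rs && !task->reclaim_state);

            task->reclaim_state = rs;
    }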
The slab caches reclaimed in these paths are only calculated in the
above three paths.
There are some drawbacks if we don't calculate the reclaimed slab caches.
- The sc->nr_reclaimed isn't correct if some slab caches are
reclaimed in this path.
- The slab caches may be reclaimed excessively if there are lots of
reclaimable slab caches and few page caches.
Let's take an easy example for this case. If one memcg is full of
slab caches and its limit is 512M, in other words there are
approximately 512M of slab caches in this memcg, then the limit of the
memcg is reached and the memcg reclaim begins, and in this memcg
reclaim path it will continuously reclaim the slab caches until
sc->priority drops to 0. After this reclaim stops, you will find
there are few slab caches left, less than 20M in my test
case. With this patch applied, the number is greater than 300M
and sc->priority only drops to 3.
mm/vmscan.c: add a new member reclaim_state in struct shrink_control
Patch series "mm/vmscan: calculate reclaimed slab in all reclaim paths".
This patchset fixes issues with shrink slab accounting.
There are six different reclaim paths by now:
- kswapd reclaim path
- node reclaim path
- hibernate preallocate memory reclaim path
- direct reclaim path
- memcg reclaim path
- memcg softlimit reclaim path
The slab caches reclaimed in these paths are only calculated in the
first three of these paths. The issues are explained in detail in patch #2. We
should calculate the reclaimed slab caches in every reclaim path. In
order to do it, the struct reclaim_state is placed into the struct
shrink_control.
In the node reclaim path, there is another issue with shrinking slab, which
is addressed in "mm/vmscan: shrink slab in node reclaim"
(https://lore.kernel.org/linux-mm/1559874946[email protected]/).
This patch (of 2):
The struct reclaim_state is used to record how many slab caches are
reclaimed in one reclaim path. The struct shrink_control is used to
control one reclaim path. So we'd better put reclaim_state into
shrink_control.
mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones
After commit 815744d75152 ("mm: memcontrol: don't batch updates of local
VM stats and events"), the local VM counters are not in sync with the
hierarchical ones.
Below is one example in a leaf memcg on my server (with 8 CPUs):
We find that the deviation is very large because the 'val' in
__mod_memcg_state() is in pages while the effective value in
memcg_stat_show() is in bytes.
So the maximum of this deviation between the local VM stats and the total
VM stats can be (32 * number_of_cpus * PAGE_SIZE), which may be an
unacceptably large value.
We should keep the local VM stats in sync with the total stats. In
order to keep this behavior the same across counters, this patch updates
__mod_lruvec_state() and __count_memcg_events() as well.
Henry Burns [Tue, 16 Jul 2019 23:26:03 +0000 (16:26 -0700)]
mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc
One of the gfp flags used to indicate that a page is movable is
__GFP_HIGHMEM. Currently z3fold_alloc() fails when __GFP_HIGHMEM is
passed. Now that z3fold pages are movable, we allow __GFP_HIGHMEM. We
strip the movability-related flags from the call to kmem_cache_alloc()
for our slots since it is a kernel allocation.
The mpi library contains some rather old inline assembly statements that
produce a lot of warnings for 32-bit x86, such as:
lib/mpi/mpih-div.c:76:16: error: invalid use of a cast in a inline asm context requiring an l-value: remove the cast or build with -fheinous-gnu-extensions
udiv_qrnnd(qp[i], n1, n1, np[i], d);
~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
lib/mpi/longlong.h:423:20: note: expanded from macro 'udiv_qrnnd'
: "=a" ((USItype)(q)), \
~~~~~~~~~~^~
There is no point in doing a type cast for the output of an inline
assembler statement, so just remove the cast here, as we have done for
other architectures in the past.
See also dea632cadd12 ("lib/mpi: fix build with clang").