Git Repo - linux.git/log

Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical"

This reverts commit 0f6d24f87856 ("mm/page-writeback.c: print a warning
if the vm dirtiness settings are illogical") because it causes false
positive warnings during OOM situations as noticed by Tetsuo Handa:

  Node 0 active_anon:3525940kB inactive_anon:8372kB active_file:216kB inactive_file:1872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2504kB dirty:52kB writeback:0kB shmem:8660kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 636928kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
  Node 0 DMA free:14848kB min:284kB low:352kB high:420kB active_anon:992kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 2687 3645 3645
  Node 0 DMA32 free:53004kB min:49608kB low:62008kB high:74408kB active_anon:2712648kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2773132kB mlocked:0kB kernel_stack:96kB pagetables:5096kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 0 958 958
  Node 0 Normal free:17140kB min:17684kB low:22104kB high:26524kB active_anon:812300kB inactive_anon:8372kB active_file:1228kB inactive_file:1868kB unevictable:0kB writepending:52kB present:1048576kB managed:981224kB mlocked:0kB kernel_stack:3520kB pagetables:8552kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  [...]
  Out of memory: Kill process 8459 (a.out) score 999 or sacrifice child
  Killed process 8459 (a.out) total-vm:4180kB, anon-rss:88kB, file-rss:0kB, shmem-rss:0kB
  oom_reaper: reaped process 8459 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
  vm direct limit must be set greater than background limit.

The problem is that both thresh and bg_thresh will be 0 if
available_memory is less than 4 pages when evaluating
global_dirtyable_memory.

While this might be worked around the whole point of the warning is
dubious at best.  We do rely on admins to do sensible things when
changing tunable knobs.  Dirty memory writeback knobs are not any
special in that regards so revert the warning rather than adding more
hacks to work this around.

Debugged by Yafang Shao.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 0f6d24f87856 ("mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical")
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: Tetsuo Handa <[email protected]>
Cc: Yafang Shao <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/madvise.c: fix madvise() infinite loop under special circumstances

MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation.  The calling
convention is quite subtle there.  madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never
advance to the next vma and it will keep looping for ever without a way
to get out of the kernel.

It seems this has been broken since introduction.  Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.

[[email protected]: rewrite changelog]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <[email protected]>
Signed-off-by: guoxuenan <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: zhangyi (F) <[email protected]>
Cc: Miao Xie <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Carsten Otte <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

exec: avoid RLIMIT_STACK races with prlimit()

While the defense-in-depth RLIMIT_STACK limit on setuid processes was
protected against races from other threads calling setrlimit(), I missed
protecting it against races from external processes calling prlimit().
This adds locking around the change and makes sure that rlim_max is set
too.

Link: http://lkml.kernel.org/r/20171127193457.GA11348@beast
Fixes: 64701dee4178e ("exec: Use sane stack rlimit under secureexec")
Signed-off-by: Kees Cook <[email protected]>
Reported-by: Ben Hutchings <[email protected]>
Reported-by: Brad Spengler <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Cc: James Morris <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

IB/core: disable memory registration of filesystem-dax vmas

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow RDMA to create long standing memory registrations
against filesytem-dax vmas.

Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Christoph Hellwig <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Jason Gunthorpe <[email protected]>
Acked-by: Doug Ledford <[email protected]>
Cc: Sean Hefty <[email protected]>
Cc: Hal Rosenstock <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

v4l2: disable filesystem-dax mapping support

V4L2 memory registrations are incompatible with filesystem-dax that
needs the ability to revoke dma access to a mapping at will, or
otherwise allow the kernel to wait for completion of DMA. The
filesystem-dax implementation breaks the traditional solution of
truncate of active file backed mappings since there is no page-cache
page we can orphan to sustain ongoing DMA.

If v4l2 wants to support long lived DMA mappings it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.

Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Jan Kara <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Hal Rosenstock <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Sean Hefty <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fail get_vaddr_frames() for filesystem-dax mappings

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow V4L2, Exynos, and other frame vector users to create
long standing / irrevocable memory registrations against filesytem-dax
vmas.

[[email protected]: add comment for vma_is_fsdax() check in get_vaddr_frames(), per Jan]
Link: http://lkml.kernel.org/r/151197874035.26211.4061781453123083667.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Hal Rosenstock <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Sean Hefty <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: introduce get_user_pages_longterm

Patch series "introduce get_user_pages_longterm()", v2.

Here is a new get_user_pages api for cases where a driver intends to
keep an elevated page count indefinitely.  This is distinct from usages
like iov_iter_get_pages where the elevated page counts are transient.
The iov_iter_get_pages cases immediately turn around and submit the
pages to a device driver which will put_page when the i/o operation
completes (under kernel control).

In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future.  This is untenable for
filesystem-dax case where the filesystem is in control of the lifetime
of the block / page and needs reasonable limits on how long it can wait
for pages in a mapping to become idle.

Fixing filesystems to actually wait for dax pages to be idle before
blocks from a truncate/hole-punch operation are repurposed is saved for
a later patch series.

Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.

I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings
were supported by the kernel.  The behavior regression this policy
change implies is one of the reasons we maintain the "dax enabled.
Warning: EXPERIMENTAL, use at your own risk" notification when mounting
a filesystem in dax mode.

It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.

This patch (of 4):

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow long standing memory registrations against
filesytem-dax vmas.  Device-dax vmas do not have this problem and are
explicitly allowed.

This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and
V4L2).

[[email protected]: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <[email protected]>
Suggested-by: Christoph Hellwig <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Hal Rosenstock <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Sean Hefty <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

device-dax: implement ->split() to catch invalid munmap attempts

Similar to how device-dax enforces that the 'address', 'offset', and
'len' parameters to mmap() be aligned to the device's fundamental
alignment, the same constraints apply to munmap().  Implement ->split()
to fail munmap calls that violate the alignment constraint.

Otherwise, we later fail VM_BUG_ON checks in the unmap_page_range() path
with crash signatures of the form:

    vma ffff8800b60c8a88 start 00007f88c0000000 end 00007f88c0e00000
    next           (null) prev           (null) mm ffff8800b61150c0
    prot 8000000000000027 anon_vma           (null) vm_ops ffffffffa0091240
    pgoff 0 file ffff8800b638ef80 private_data           (null)
    flags: 0x380000fb(read|write|shared|mayread|maywrite|mayexec|mayshare|softdirty|mixedmap|hugepage)
    ------------[ cut here ]------------
    kernel BUG at mm/huge_memory.c:2014!
    [..]
    RIP: 0010:__split_huge_pud+0x12a/0x180
    [..]
    Call Trace:
     unmap_page_range+0x245/0xa40
     ? __vma_adjust+0x301/0x990
     unmap_vmas+0x4c/0xa0
     unmap_region+0xae/0x120
     ? __vma_rb_erase+0x11a/0x230
     do_munmap+0x276/0x410
     vm_munmap+0x6a/0xa0
     SyS_munmap+0x1d/0x30

Link: http://lkml.kernel.org/r/151130418681.4029.7118245855057952010.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Jeff Moyer <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, hugetlbfs: introduce ->split() to vm_operations_struct

Patch series "device-dax: fix unaligned munmap handling"

When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges.  It
would be messy to teach the munmap path about device-dax alignment
constraints in the same (hstate) way that hugetlbfs communicates this
constraint.  Instead, these patches introduce a new ->split() vm
operation.

This patch (of 2):

The device-dax interface has similar constraints as hugetlbfs in that it
requires the munmap path to unmap in huge page aligned units.  Rather
than add more custom vma handling code in __split_vma() introduce a new
vm operation to perform this vma specific check.

Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

scripts/faddr2line: extend usage on generic arch

When cross-compiling, fadd2line should use the binary tool used for the
target system, rather than that of the host.

Link: http://lkml.kernel.org/r/20171121092911.GA150711@sofia
Signed-off-by: Liu Changcheng <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: replace pte_write with pte_access_permitted in fault + gup paths

The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:

- validate that the mapping is writable from a protection keys
standpoint

- validate that the pte has _PAGE_USER set since all fault paths where
pte_write is must be referencing user-memory.

Link: http://lkml.kernel.org/r/151043111604.2842.8051684481794973100.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: replace pmd_write with pmd_access_permitted in fault + gup paths

The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:

- validate that the mapping is writable from a protection keys
standpoint

- validate that the pte has _PAGE_USER set since all fault paths where
pmd_write is must be referencing user-memory.

Link: http://lkml.kernel.org/r/151043111049.2842.15241454964150083466.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: replace pud_write with pud_access_permitted in fault + gup paths

The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:

- validate that the mapping is writable from a protection keys
standpoint

- validate that the pte has _PAGE_USER set since all fault paths where
pud_write is must be referencing user-memory.

[[email protected]: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129127237.37405.16073414520854722485.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151043110453.2842.2166049702068628177.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE

In response to compile breakage introduced by a series that added the
pud_write helper to x86, Stephen notes:

    did you consider using the other paradigm:

    In arch include files:
    #define pud_write       pud_write
    static inline int pud_write(pud_t pud)
     .....

    Then in include/asm-generic/pgtable.h:

    #ifndef pud_write
    tatic inline int pud_write(pud_t pud)
    {
            ....
    }
    #endif

    If you had, then the powerpc code would have worked ... ;-) and many
    of the other interfaces in include/asm-generic/pgtable.h are
    protected that way ...

Given that some architecture already define pmd_write() as a macro, it's
a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE.

Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Suggested-by: Stephen Rothwell <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Oliver OHalloran <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix device-dax pud write-faults triggered by get_user_pages()

Currently only get_user_pages_fast() can safely handle the writable gup
case due to its use of pud_access_permitted() to check whether the pud
entry is writable.  In the gup slow path pud_write() is used instead of
pud_access_permitted() and to date it has been unimplemented, just calls
BUG_ON().

    kernel BUG at ./include/linux/hugetlb.h:244!
    [..]
    RIP: 0010:follow_devmap_pud+0x482/0x490
    [..]
    Call Trace:
     follow_page_mask+0x28c/0x6e0
     __get_user_pages+0xe4/0x6c0
     get_user_pages_unlocked+0x130/0x1b0
     get_user_pages_fast+0x89/0xb0
     iov_iter_get_pages_alloc+0x114/0x4a0
     nfs_direct_read_schedule_iovec+0xd2/0x350
     ? nfs_start_io_direct+0x63/0x70
     nfs_file_direct_read+0x1e0/0x250
     nfs_file_read+0x90/0xc0

For now this just implements a simple check for the _PAGE_RW bit similar
to pmd_write.  However, this implies that the gup-slow-path check is
missing the extra checks that the gup-fast-path performs with
pud_access_permitted.  Later patches will align all checks to use the
'access_permitted' helper if the architecture provides it.

Note that the generic 'access_permitted' helper fallback is the simple
_PAGE_RW check on architectures that do not define the
'access_permitted' helper(s).

[[email protected]: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Stephen Rothwell <[email protected]>
Acked-by: Thomas Gleixner <[email protected]> [x86]
Cc: Kirill A. Shutemov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/cma: fix alloc_contig_range ret code/potential leak

If the call __alloc_contig_migrate_range() in alloc_contig_range returns
-EBUSY, processing continues so that test_pages_isolated() is called
where there is a tracepoint to identify the busy pages.  However, it is
possible for busy pages to become available between the calls to these
two routines.  In this case, the range of pages may be allocated.
Unfortunately, the original return code (ret == -EBUSY) is still set and
returned to the caller.  Therefore, the caller believes the pages were
not allocated and they are leaked.

Update the comment to indicate that allocation is still possible even if
__alloc_contig_migrate_range returns -EBUSY.  Also, clear return code in
this case so that it is not accidentally used or returned to caller.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Signed-off-by: Mike Kravetz <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Joonsoo Kim <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, oom_reaper: gather each vma to prevent leaking TLB entry

tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory
space.  In this case, tlb->fullmm is true.  Some archs like arm64
doesn't flush TLB when tlb->fullmm is true:

  commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1").

Which causes leaking of tlb entries.

Will clarifies his patch:
"Basically, we tag each address space with an ASID (PCID on x86) which
  is resident in the TLB. This means we can elide TLB invalidation when
  pulling down a full mm because we won't ever assign that ASID to
  another mm without doing TLB invalidation elsewhere (which actually
  just nukes the whole TLB).

  I think that means that we could potentially not fault on a kernel
  uaccess, because we could hit in the TLB"

There could be a window between complete_signal() sending IPI to other
cores and all threads sharing this mm are really kicked off from cores.
In this window, the oom reaper may calls tlb_flush_mmu_tlbonly() to
flush TLB then frees pages.  However, due to the above problem, the TLB
entries are not really flushed on arm64.  Other threads are possible to
access these pages through TLB entries.  Moreover, a copy_to_user() can
also write to these pages without generating page fault, causes
use-after-free bugs.

This patch gathers each vma instead of gathering full vm space.  In this
case tlb->fullmm is not true.  The behavior of oom reaper become similar
to munmapping before do_exit, which should be safe for all archs.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: aac453635549 ("mm, oom: introduce oom reaper")
Signed-off-by: Wang Nan <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Bob Liu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, memory_hotplug: do not back off draining pcp free pages from kworker context

drain_all_pages backs off when called from a kworker context since
commit 0ccce3b92421 ("mm, page_alloc: drain per-cpu pages from workqueue
context") because the original IPI based pcp draining has been replaced
by a WQ based one and the check wanted to prevent from recursion and
inter workers dependencies. This has made some sense at the time
because the system WQ has been used and one worker holding the lock
could be blocked while waiting for new workers to emerge which can be a
problem under OOM conditions.

Since then commit ce612879ddc7 ("mm: move pcp and lru-pcp draining into
single wq") has moved draining to a dedicated (mm_percpu_wq) WQ with a
rescuer so we shouldn't depend on any other WQ activity to make a
forward progress so calling drain_all_pages from a worker context is
safe as long as this doesn't happen from mm_percpu_wq itself which is
not the case because all workers are required to _not_ depend on any MM
locks.

Why is this a problem in the first place? ACPI driven memory hot-remove
(acpi_device_hotplug) is executed from the worker context. We end up
calling __offline_pages to free all the pages and that requires both
lru_add_drain_all_cpuslocked and drain_all_pages to do their job
otherwise we can have dangling pages on pcp lists and fail the offline
operation (__test_page_isolated_in_pageblock would see a page with 0 ref
count but without PageBuddy set).

Fix the issue by removing the worker check in drain_all_pages.
lru_add_drain_all_cpuslocked doesn't have this restriction so it works
as expected.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue context")
Signed-off-by: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: <[email protected]> [4.11+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
"I screwed up my merge window pull request; I only sent half of what I
  meant to.

  There were no new features, just bugfixes of various importance and
  some very minor cleanup, so I think it's all still appropriate for
  -rc2.

  Highlights:

   - Fixes from Trond for some races in the NFSv4 state code.

   - Fix from Naofumi Honda for a typo in the blocked lock notificiation
     code

   - Fixes from Vasily Averin for some problems starting and stopping
     lockd especially in network namespaces"

* tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits)
  lockd: fix "list_add double add" caused by legacy signal interface
  nlm_shutdown_hosts_net() cleanup
  race of nfsd inetaddr notifiers vs nn->nfsd_serv change
  race of lockd inetaddr notifiers vs nlmsvc_rqst change
  SUNRPC: make cache_detail structures const
  NFSD: make cache_detail structures const
  sunrpc: make the function arg as const
  nfsd: check for use of the closed special stateid
  nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
  lockd: lost rollback of set_grace_period() in lockd_down_net()
  lockd: added cleanup checks in exit_net hook
  grace: replace BUG_ON by WARN_ONCE in exit_net hook
  nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
  lockd: remove net pointer from messages
  nfsd: remove net pointer from debug messages
  nfsd: Fix races with check_stateid_generation()
  nfsd: Ensure we check stateid validity in the seqid operation checks
  nfsd: Fix race in lock stateid creation
  nfsd4: move find_lock_stateid
  nfsd: Ensure we don't recognise lock stateids after freeing them
  ...

Merge tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"We've collected some fixes in since the pre-merge window freeze.

  There's technically only one regression fix for 4.15, but the rest
  seems important and candidates for stable.

   - fix missing flush bio puts in error cases (is serious, but rarely
     happens)

   - fix reporting stat::st_blocks for buffered append writes

   - fix space cache invalidation

   - fix out of bound memory access when setting zlib level

   - fix potential memory corruption when fsync fails in the middle

   - fix crash in integrity checker

   - incremetnal send fix, path mixup for certain unlink/rename
     combination

   - pass flags to writeback so compressed writes can be throttled
     properly

   - error handling fixes"

* tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: incremental send, fix wrong unlink path after renaming file
  btrfs: tree-checker: Fix false panic for sanity test
  Btrfs: fix list_add corruption and soft lockups in fsync
  btrfs: Fix wild memory access in compression level parser
  btrfs: fix deadlock when writing out space cache
  btrfs: clear space cache inode generation always
  Btrfs: fix reported number of inode blocks after buffered append writes
  Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
  Btrfs: bail out gracefully rather than BUG_ON
  btrfs: dev_alloc_list is not protected by RCU, use normal list_del
  btrfs: add missing device::flush_bio puts
  btrfs: Fix transaction abort during failure in btrfs_rm_dev_item
  Btrfs: add write_flags for compression bio

Merge tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze

Pull Microblaze fix from Michal Simek:
"Add missing header to mmu_context_mm.h"

* tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: add missing include to mmu_context_mm.h

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

Pull sparc fix from David Miller:
"Sparc T4 and later cpu bootup regression fix"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix boot on T4 and later.

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) The forcedeth conversion from pci_*() DMA interfaces to dma_*() ones
    missed one spot. From Zhu Yanjun.

2) Missing CRYPTO_SHA256 Kconfig dep in cfg80211, from Johannes Berg.

3) Fix checksum offloading in thunderx driver, from Sunil Goutham.

4) Add SPDX to vm_sockets_diag.h, from Stephen Hemminger.

5) Fix use after free of packet headers in TIPC, from Jon Maloy.

6) "sizeof(ptr)" vs "sizeof(*ptr)" bug in i40e, from Gustavo A R Silva.

7) Tunneling fixes in mlxsw driver, from Petr Machata.

8) Fix crash in fanout_demux_rollover() of AF_PACKET, from Mike
    Maloney.

9) Fix race in AF_PACKET bind() vs. NETDEV_UP notifier, from Eric
    Dumazet.

10) Fix regression in sch_sfq.c due to one of the timer_setup()
    conversions. From Paolo Abeni.

11) SCTP does list_for_each_entry() using wrong struct member, fix from
    Xin Long.

12) Don't use big endian netlink attribute read for
    IFLA_BOND_AD_ACTOR_SYSTEM, it is in cpu endianness. Also from Xin
    Long.

13) Fix mis-initialization of q->link.clock in CBQ scheduler, preventing
    adding filters there. From Jiri Pirko.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
  ethernet: dwmac-stm32: Fix copyright
  net: via: via-rhine: use %p to format void * address instead of %x
  net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit
  myri10ge: Update MAINTAINERS
  net: sched: cbq: create block for q->link.block
  atm: suni: remove extraneous space to fix indentation
  atm: lanai: use %p to format kernel addresses instead of %x
  VSOCK: Don't set sk_state to TCP_CLOSE before testing it
  atm: fore200e: use %pK to format kernel addresses instead of %x
  ambassador: fix incorrect indentation of assignment statement
  vxlan: use __be32 type for the param vni in __vxlan_fdb_delete
  bonding: use nla_get_u64 to extract the value for IFLA_BOND_AD_ACTOR_SYSTEM
  sctp: use right member as the param of list_for_each_entry
  sch_sfq: fix null pointer dereference at timer expiration
  cls_bpf: don't decrement net's refcount when offload fails
  net/packet: fix a race in packet_bind() and packet_notifier()
  packet: fix crash in fanout_demux_rollover()
  sctp: remove extern from stream sched
  sctp: force the params with right types for sctp csum apis
  sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail
  ...

sparc64: Fix boot on T4 and later.

If we don't put the NG4fls.o object into the same part of
the link as the generic sparc64 objects for fls() and __fls()
then the relocation in the branch we use for patching will
not fit.

Move NG4fls.o into lib-y to fix this problem.

Fixes: 46ad8d2d22c1 ("sparc64: Use sparc optimized fls and __fls for T4 and above")
Signed-off-by: David S. Miller <[email protected]>
Reported-by: Anatoly Pugachev <[email protected]>
Tested-by: Anatoly Pugachev <[email protected]>

drm/radeon: remove init of CIK VMIDs 8-16 for amdkfd

VMIDs 8-16 in Kaveri were reserved for use by the amdkfd driver.
Because we removed amdkfd support from radeon, those VMIDs are now
used by radeon and are initialized by radeon.

This patch removes the function that initialized those VMIDs for amdkfd
use.
This initialization overridden the radeon initialization and caused GPU
faults and GUI crashed.

Fixes: f4fa88ab28ab ("drm/radeon: deprecate and remove KFD interface")
Rported-by: Michel Dänzer <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-and-Tested-by: Michel Dänzer <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/ttm: fix populate_and_map() functions once more

This reverts "drm/ttm: Fix configuration error around populate_and_map()
functions".

This fix has gone into the wrong direction. Those helpers should be
available even when neither CONFIG_INTEL_IOMMU nor CONFIG_SWIOTLB are
set.

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

vsprintf: don't use 'restricted_pointer()' when not restricting

Instead, just fall back on the new '%p' behavior which hashes the
pointer.

Otherwise, '%pK' - that was intended to mark a pointer as restricted -
just ends up leaking pointers that a normal '%p' wouldn't leak. Which
just make the whole thing pointless.

I suspect we should actually get rid of '%pK' entirely, and make it just
work as '%p' regardless, but this is the minimal obvious fix. People
who actually use 'kptr_restrict' should weigh in on which behavior they
want.

Cc: Tobin Harding <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

SUNRPC: Allow connect to return EHOSTUNREACH

Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
Tested-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"

gcc 4.4.4 is too old to have full C11 anonymous union support, so
the current initialiser fails to compile.

Reported-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
(compile-)Tested-by: Boris Ostrovsky <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

kallsyms: take advantage of the new '%px' format

The conditional kallsym hex printing used a special fixed-width '%lx'
output (KALLSYM_FMT) in preparation for the hashing of %p, but that
series ended up adding a %px specifier to help with the conversions.

Use it, and avoid the "print pointer as an unsigned long" code.

Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'printk-hash-pointer-4.15-rc2' of git://github.com/tcharding/linux

Pull printk pointer hashing update from Tobin Harding:
"Here is the patch set that implements hashing of printk specifier %p.

  First we have two clean up patches then we do the hashing. Hashing is
  done via the SipHash algorithm. The next patch adds printk specifier
  %px for printing pointers when we _really_ want to see the address i.e
  %px is functionally equivalent to %lx. Final patch in the set fixes
  KASAN since we break it by hashing %p.

  For the record here is the justification for the series:

    Currently there exist approximately 14 000 places in the Kernel
    where addresses are being printed using an unadorned %p. This
    potentially leaks sensitive information about the Kernel layout in
    memory. Many of these calls are stale, instead of fixing every call
    we hash the address by default before printing. We then add %px to
    provide a way to print the actual address. Although this is
    achievable using %lx, using %px will assist us if we ever want to
    change pointer printing behaviour. %px is more uniquely grep'able
    (there are already >50 000 uses of %lx).

    The added advantage of hashing %p is that security is now opt-out,
    if you _really_ want the address you have to work a little harder
    and use %px.

  This will of course break some users, forcing code printing needed
  addresses to be updated"

[ I do expect this to be an annoyance, and a number of %px users to be
  added for debuggability. But nobody is willing to audit existing %p
  users for information leaks, and a number of places really only use
  the pointer as an object identifier rather than really 'I need the
  address'.

  IOW - sorry for the inconvenience, but it's the least inconvenient of
  the options.    - Linus ]

* tag 'printk-hash-pointer-4.15-rc2' of git://github.com/tcharding/linux:
  kasan: use %px to print addresses instead of %p
  vsprintf: add printk specifier %px
  printk: hash addresses printed with %p
  vsprintf: refactor %pK code out of pointer()
  docs: correct documentation for %pK

Revert "mm, thp: Do not make pmd/pud dirty without a reason"

This reverts commit 152e93af3cfe2d29d8136cc0a02a8612507136ee.

It was a nice cleanup in theory, but as Nicolai Stange points out, we do
need to make the page dirty for the copy-on-write case even when we
didn't end up making it writable, since the dirty bit is what we use to
check that we've gone through a COW cycle.

Reported-by: Michal Hocko <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'nvme-4.15' of git://git.infradead.org/nvme into for-linus

Pull NVMe fixes from Christoph:

"A few more nvme updates for 4.15. A single small PCIe fix, and a number
of patches for RDMA that are a little larger than what I'd like to see
for -rc2, but they fix important issues seen in the wild."

quota: Check for register_shrinker() failure.

register_shrinker() might return -ENOMEM error since Linux 3.12.
Call panic() as with other failure checks in this function if
register_shrinker() failed.

Fixes: 1d3d4437eae1 ("vmscan: per-node deferred work")
Signed-off-by: Tetsuo Handa <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Michal Hocko <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Signed-off-by: Jan Kara <[email protected]>

ethernet: dwmac-stm32: Fix copyright

Uniformize STMicroelectronics copyrights header

Signed-off-by: Benjamin Gaignard <[email protected]>
CC: Alexandre Torgue <[email protected]>
Acked-by: Alexandre TORGUE <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

eeprom: at24: check at24_read/write arguments

So far we completely rely on the caller to provide valid arguments.
To be on the safe side perform an own sanity check.

Cc: [email protected]
Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

eeprom: at24: fix reading from 24MAC402/24MAC602

Chip datasheet mentions that word addresses other than the actual
start position of the MAC delivers undefined results. So fix this.
Current implementation doesn't work due to this wrong offset.

Cc: [email protected]
Fixes: 0b813658c115 ("eeprom: at24: add support for at24mac series")
Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

net: via: via-rhine: use %p to format void * address instead of %x

Don't use %x and casting to print out an address, instead use %p
and remove the casting. Cleans up smatch warnings:

drivers/net/ethernet/via/via-rhine.c:998 rhine_init_one_common()
warn: argument 4 to %lx specifier is cast from pointer

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit

On 64-bit (e.g. powerpc64/allmodconfig):

    drivers/net/ethernet/xilinx/ll_temac_main.c: In function 'temac_start_xmit_done':
    drivers/net/ethernet/xilinx/ll_temac_main.c:633:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
dev_kfree_skb_irq((struct sk_buff *)cur_p->app4);
  ^

cdmac_bd.app4 is u32, so it is too small to hold a kernel pointer.

Note that several other fields in struct cdmac_bd are also too small to
hold physical addresses on 64-bit platforms.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm/fb_helper: Disable all crtc's when initial setup fails.

Some drivers like i915 start with crtc's enabled, but with deferred
fbcon setup they were no longer disabled as part of fbdev setup.
Headless units could no longer enter pc3 state because the crtc was
still enabled.

Fix this by calling restore_fbdev_mode when we would have called
it otherwise once during initial fbdev setup.

Signed-off-by: Maarten Lankhorst <[email protected]>
Fixes: ca91a2758fce ("drm/fb-helper: Support deferred setup")
Cc: <[email protected]> # v4.14+
Reported-by: Thomas Voegtle <[email protected]>
Tested-by: Thomas Voegtle <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

myri10ge: Update MAINTAINERS

Change the maintainer to Chris Lee who has access to Myricom hardware
and can test/review. Update the website URL.

Signed-off-by: Hyong-Youb Kim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

eeprom: at24: correctly set the size for at24mac402

There's an ilog2() expansion in AT24_DEVICE_MAGIC() which rounds down
the actual size of EUI-48 byte array in at24mac402 eeproms to 4 from 6,
making it impossible to read it all.

Fix it by manually adjusting the value in probe().

This patch contains a temporary fix that is suitable for stable
branches. Eventually we'll probably remove the call to ilog2() while
converting the magic values to actual structs.

Cc: [email protected]
Fixes: 0b813658c115 ("eeprom: at24: add support for at24mac series")
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>

drm/atomic: make drm_atomic_helper_wait_for_vblanks more agressive

drm_atomic_helper_setup_commit expects that flipping of previous commits
has happened when it is called to set up a new commit. This can be violated
by commits where userspace doesn't get a flip completion event to
synchronize against i.e. legacy modesets and property changes.

The expectation is that those are done by blocking commits, which wait for
completion. Most drivers call drm_atomic_helper_wait_for_vblanks in the
commit_tail to ensure completion, but the wait for next vblank might not
actually happen if the commit didn't change any planes.

Make the wait more agressive by also waiting if no planes changed. This
is the minimal regression fix for the 4.15 kernel series. Long term
drivers should switch away from drm_atomic_helper_wait_for_vblanks and
use drm_atomic_helper_wait_for_flip_done instead.

Fixes: de39bec1a0c4 ("drm/atomic: Remove waits in drm_atomic_helper_commit_cleanup_done, v2.")
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Maarten Lankhorst <[email protected]>
Signed-off-by: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

mmc: core: prepend 0x to OCR entry in sysfs

The sysfs entry "ocr" was missing the 0x prefix to identify it as hex
formatted.

Fixes: 5fb06af7a33b ("mmc: core: Extend sysfs with OCR register")
Signed-off-by: Bastian Stender <[email protected]>
Cc: <[email protected]> # v4.8+
[Ulf: Amended change to also cover SD-cards]
Signed-off-by: Ulf Hansson <[email protected]>

mmc: core: prepend 0x to pre_eol_info entry in sysfs

The sysfs entry "pre_eol_info" was missing the 0x prefix to identify it
as hex formatted.

Fixes: 46bc5c408e4e ("mmc: core: Export device lifetime information through sysfs")
Signed-off-by: Bastian Stender <[email protected]>
Cc: <[email protected]> # v4.11+
Signed-off-by: Ulf Hansson <[email protected]>

powerpc: Do not assign thread.tidr if already assigned

If set_thread_tidr() is called twice for same task_struct then it will
allocate a new tidr value to it leaving the previous value still
dangling in the vas_thread_ida table.

To fix this the patch changes set_thread_tidr() to check if a tidr
value is already assigned to the task_struct and if yes then returns
zero.

Fixes: ec233ede4c86("powerpc: Add support for setting SPRN_TIDR")
Signed-off-by: Vaibhav Jain <[email protected]>
Reviewed-by: Andrew Donnellan <[email protected]>
[mpe: Modify to return 0 in the success case, not the TID value]
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: Avoid signed to unsigned conversion in set_thread_tidr()

There is an unsafe signed to unsigned conversion in set_thread_tidr()
that may cause an error value to be assigned to SPRN_TIDR register and
used as thread-id.

The issue happens as assign_thread_tidr() returns an int and
thread.tidr is an unsigned-long. So a negative error code returned
from assign_thread_tidr() will fail the error check and gets assigned
as tidr as a large positive value.

To fix this the patch assigns the return value of assign_thread_tidr()
to a temporary int and assigns it to thread.tidr iff its '> 0'.

The patch shouldn't impact the calling convention of set_thread_tidr()
i.e all -ve return-values are error codes and a return value of '0'
indicates success.

Fixes: ec233ede4c86("powerpc: Add support for setting SPRN_TIDR")
Signed-off-by: Vaibhav Jain <[email protected]>
Reviewed-by: Christophe Lombard [email protected]
Signed-off-by: Michael Ellerman <[email protected]>

kasan: use %px to print addresses instead of %p

Pointers printed with %p are now hashed by default. Kasan needs the
actual address. We can use the new printk specifier %px for this
purpose.

Use %px instead of %p to print addresses.

Signed-off-by: Tobin C. Harding <[email protected]>

vsprintf: add printk specifier %px

printk specifier %p now hashes all addresses before printing. Sometimes
we need to see the actual unmodified address. This can be achieved using
%lx but then we face the risk that if in future we want to change the
way the Kernel handles printing of pointers we will have to grep through
the already existent 50 000 %lx call sites. Let's add specifier %px as a
clear, opt-in, way to print a pointer and maintain some level of
isolation from all the other hex integer output within the Kernel.

Add printk specifier %px to print the actual unmodified address.

Signed-off-by: Tobin C. Harding <[email protected]>

printk: hash addresses printed with %p

Currently there exist approximately 14 000 places in the kernel where
addresses are being printed using an unadorned %p. This potentially
leaks sensitive information regarding the Kernel layout in memory. Many
of these calls are stale, instead of fixing every call lets hash the
address by default before printing. This will of course break some
users, forcing code printing needed addresses to be updated.

Code that _really_ needs the address will soon be able to use the new
printk specifier %px to print the address.

For what it's worth, usage of unadorned %p can be broken down as
follows (thanks to Joe Perches).

$ git grep -E '%p[^A-Za-z0-9]' | cut -f1 -d"/" | sort | uniq -c
   1084 arch
     20 block
     10 crypto
     32 Documentation
   8121 drivers
   1221 fs
    143 include
    101 kernel
     69 lib
    100 mm
   1510 net
     40 samples
      7 scripts
     11 security
    166 sound
    152 tools
      2 virt

Add function ptr_to_id() to map an address to a 32 bit unique
identifier. Hash any unadorned usage of specifier %p and any malformed
specifiers.

Signed-off-by: Tobin C. Harding <[email protected]>

vsprintf: refactor %pK code out of pointer()

Currently code to handle %pK is all within the switch statement in
pointer(). This is the wrong level of abstraction. Each of the other switch
clauses call a helper function, pK should do the same.

Refactor code out of pointer() to new function restricted_pointer().

Signed-off-by: Tobin C. Harding <[email protected]>

docs: correct documentation for %pK

Current documentation indicates that %pK prints a leading '0x'. This is
not the case.

Correct documentation for printk specifier %pK.

Signed-off-by: Tobin C. Harding <[email protected]>

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

- avoid potential bogus alignment for some AEAD operations

- fix crash in algif_aead

- avoid sleeping in softirq context with async af_alg

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: skcipher - Fix skcipher_walk_aead_common
  crypto: af_alg - remove locking in async callback
  crypto: algif_aead - skip SGL entries with NULL page

drm/amd/display: USB-C / thunderbolt dock specific workaround

reading dpcd 0x600 cause link loss for a particular USB-C dock with
thurderbolt. workaround by avoiding dcpd 0x600 read unless it's
necessary.

Signed-off-by: Hersen Wu <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Switch to drm_atomic_helper_wait_for_flip_done

This new helper function is advised to be used for drviers that
use the nonblocking commit tracking support instead of
drm_atomic_helper_wait_for_vblanks.

Signed-off-by: Andrey Grodzovsky <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Reviewed-and-Tested-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: fix gamma setting

Adding gamma changed check as condition for affected plane.
We ignored adding plane as affected if modeset was not required.
But for color management change we still need it.

Signed-off-by: Roman Li <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Do not put drm_atomic_state on resume

drm_atomic_helper_resume now puts it for us. See relevant patch here:
https://lists.freedesktop.org/archives/dri-devel/2017-October/154268.html

Signed-off-by: Leo (Sunpeng) Li <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix couple more inconsistent NULL checks in dc_resource

Found by smatch:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_resource.c:1001
acquire_free_pipe_for_stream() error: we previously assumed 'head_pipe'
could be null (see line 998)
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_resource.c:1808
dc_validate_global_state() error: we previously assumed 'new_ctx' could
be null (see line 1778)

Signed-off-by: Harry Wentland <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix potential NULL and mem leak in create_links

Found by smatch:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:148 create_links()
error: potential null dereference 'link->link_enc'. (kzalloc returns
null)

Signed-off-by: Harry Wentland <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix hubp check in set_cursor_position

Found by smatch:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:298
dc_stream_set_cursor_position() error: we previously assumed 'hubp'
could be null (see line 294)

Signed-off-by: Harry Wentland <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix use before NULL check in validate_timing

Found by smatch:
drivers/gpu/drm/amd/amdgpu/../display/dc/dce110/dce110_timing_generator.c:1124
dce110_timing_generator_validate_timing() warn: variable dereferenced
before check 'timing' (see line 1116)

Signed-off-by: Harry Wentland <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Bunch of smatch error and warning fixes in DC

drivers/gpu/drm/amd/amdgpu/../display/dc/basics/log_helpers.c:79
dc_conn_log() error: buffer overflow 'signal_type_info_tbl' 10 <= 10
drivers/gpu/drm/amd/amdgpu/../display/dc/bios/bios_parser.c:266
bios_parser_get_dst_obj() error: uninitialized symbol 'id'.
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_audio.c:357
dce_aud_az_enable() warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_resource.c:958
dcn10_acquire_idle_pipe_for_layer() error: we previously assumed
'head_pipe' could be null (see line 952)

Signed-off-by: Harry Wentland <[email protected]>
Reviewed-by: Sun peng Li <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix amdgpu_dm bugs found by smatch

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2760
create_eml_sink() warn: variable dereferenced before check
'aconnector->base.edid_blob_ptr' (see line 2758)
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:4270
amdgpu_dm_atomic_commit_tail() warn: variable dereferenced before check
'dm_new_crtc_state->stream' (see line 4266)
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:4417
dm_restore_drm_connector_state() warn: variable dereferenced before
check 'disconnected_acrtc' (see line 4415)

Signed-off-by: Harry Wentland <[email protected]>
Reviewed-by: Sun peng Li <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: try to find matching audio inst for enc inst first

[Description]
in eDP+ HDMI/DP clone or extended configuration, audio inst changed from inst 1 to inst0.
No failure related this though, just playback device endpoint inst changed.
Also remove one addition register read.

Signed-off-by: Charlene Liu <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: fix seq issue: turn on clock before programming afmt.

Signed-off-by: Charlene Liu <[email protected]>
Reviewed-by: Krunoslav Kovac <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: fix memory leaks on error exit return

Currently in the case where some of the allocations fail for dce110_tgv,
dce110_xfmv, dce110_miv or dce110_oppv then the exit return path ends
up leaking allocated objects. Fix this by kfree'ing them before returning.
Also re-work the comparison of the null pointers to use the !ptr idiom.

Detected by CoverityScan, CID#1460246, 1460325, 1460324, 1460392
("Resource Leak")

Fixes: c4562236b3bc ("drm/amd/dc: Add dc display driver (v2)")
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: check plane state before validating fbc

While validation fbc, array_mode of the pipe is accessed
without checking plane_state exists for it.
Causing to null pointer dereferencing followed by
reboot when a crtc associated with external display(not
connected) is page flipped.

This patch adds a check for plane_state before using
it to validate fbc.

Signed-off-by: Shirish S <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Do DC mode-change check when adding CRTCs

Within atomic check, dm_update_crtcs_state is called twice. First to
remove from the dc_state, and subsequently to add to it.

In both calls, a secondary mode-change check is done using dc-level
states. We shouldn't be doing this while removing, since a new
dc_stream_state has not been created to do the necessary comparison.
Because of this, the mode_changed flag within the DRM state can be
mistakenly set to false. Doing so only when adding prevents this.

We are also guaranteed that a call to add will come after remove, or
else the atomic check fails (and a commit will not happen).

Signed-off-by: Leo (Sunpeng) Li <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Revert noisy assert messages

This partially reverts
commit 4fb48bb66211 ("dc: fix split viewport rounding error").

Signed-off-by: Jordan Lazare <[email protected]>
Reviewed-by: Dmytro Laktyushkin <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: fix split viewport rounding error

Signed-off-by: Dmytro Laktyushkin <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Check aux channel before MST resume

It is to fix: MST display failed to resume from S3

At the beginning of resume from S3, need to check if mgr->aux is
NULL. Fake MST encoder doesn't have real aux channel.

Signed-off-by: Jerry (Fangzhi) Zuo <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: fix split recout offset

Previous recout calculation fix changed recout size rounding
and affected the offset when it should not have

Signed-off-by: Dmytro Laktyushkin <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Don't reject 3D timings

Signed-off-by: Andrew Jiang <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Handle as MST first and then DP dongle if sink support both

Signed-off-by: Hersen Wu <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Reviewed-by: Wenjing Liu <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: fix split recout calculation

Recout split rounding code was wrong

Signed-off-by: Dmytro Laktyushkin <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix S3 topology change

Clean fake sink flag on resume if real sink connected.
Fixing S3 topology change problem like this:
1) x desktop with 1 or > displays
2) unplug display
3) suspend
4) replug same display
5) resume
without this change replugged display doesn't light up

Signed-off-by: Roman Li <[email protected]>
Reviewed-by: Sun peng Li <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Reviewed-by: Andrey Grodzovsky <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Add timing validation against dongle cap

For DP active dongles, the dpcd dongle caps are read but not
used to validate mode timing. This addresses this.

In particular, this change fixes light up on the HDMI 4k TV
connected through DP active dongle. Since the 4k TV defaults
to YCbCr420, which the dongle don't support.

This change does not address MST cases, a more generalized
approach must be taken for that.

Signed-off-by: Eric Yang <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Should disable when new stream is null

core_link_disable_stream should be called when the new stream is null
(i.e. want to disable). Modify the if condition to reflect that.

Signed-off-by: Leo (Sunpeng) Li <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Add null check for 24BPP (xfm and dpp)

Fixes Nullptr error when trying 24BPP

Signed-off-by: Bhawanpreet Lakha <[email protected]>
Reviewed-by: Tony Cheng <[email protected]>
Acked-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: drop experimental flag for raven

Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: don't try to move pinned BOs

Never try to move pinned BOs during CS.

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Use unsigned ring indices in amdgpu_queue_mgr_map

This matches the corresponding UAPI fields. Treating the ring index as
signed could result in accessing random unrelated memory if the MSB was
set.

Fixes: effd924d2f3b ("drm/amdgpu: untie user ring ids from kernel ring
ids v6")
Cc: [email protected]
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Set adev->vcn.irq.num_types for VCN

We were setting adev->uvd.irq.num_types instead.

Fixes: 9b257116e784 ("drm/amdgpu: add vcn enc irq support")
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

Revert "drm/amdgpu: fix rmmod KCQ disable failed error"

This reverts commit 446947b44fb8cabc0213ff4efd706931e36b1963.

this patch is incorrrect, amdgpu_ucode_bo_fini always
called after gfx_hw_fini.

Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: used cached gca values for cik_read_register

Using the cached values has less latency for bare metal and
prevents reading back bogus values if the engine is powergated.

This was implemented for VI and SI, but somehow CIK got missed.

Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/gfx7: cache raster_config values

We did this for gfx6 and 8, but somehow missed gfx7.

Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: move UVD/VCE and VCN structure out from union

With the enablement of VCN Dec and Enc from user space, User space queries
kernel for the IP information, if HW has UVD/VCE, the info comes from these
IP blocks, but this could end up mis-interpret for VCN when they are in the
union, the other way same when HW with VCN block.

Signed-off-by: Leo Liu <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Fixes: 95d0906f8506 ("drm/amdgpu: add initial vcn support and decode tests")
Cc: [email protected]
Reviewed-and-Tested-by: Michel Dänzer <[email protected]>

RISC-V: remove spin_unlock_wait()

This was removed from the other architectures in commit
952111d7db02 ("arch: Remove spin_unlock_wait() arch-specific
definitions"). That landed between when we got upstream and when our
patches were reviewed, so this is a followup patch.

Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: `sfence.vma` orderes the instruction cache

This is just a comment change, but it's one that bit me on the mailing
list. It turns out that issuing a `sfence.vma` enforces instruction
cache ordering in addition to TLB ordering. This isn't explicitly
called out in the ISA manual, but Andrew will be making that more clear
in a future revision.

CC: Andrew Waterman <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Add READ_ONCE in arch_spin_is_locked()

This was just incorrect in the original version.

Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: __test_and_op_bit_ord should be strongly ordered

I mis-read the documentation. After looking at it again the
documentation is actually as clear as it can be, it's just that I didn't
actually read it in order and therefor did the wrong thing.

Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Remove smb_mb__{before,after}_spinlock()

These are obselete.

Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Remove __smp_bp__{before,after}_atomic

These duplicate the asm-generic definitions are therefor aren't useful.

Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Comment on why {,cmp}xchg is ordered how it is

This is another memory model FIXME.

Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Remove unused arguments from ATOMIC_OP

Our atomics are generated from a complicated series of preprocessor
macros, each of which is slightly different from the last. When writing
the macros I'd accidentally left some unused arguments floating around.
This patch removes the unused macro arguments.

Signed-off-by: Palmer Dabbelt <[email protected]>

net: sched: cbq: create block for q->link.block

q->link.block is not initialized, that leads to EINVAL when one tries to
add filter there. So initialize it properly.

This can be reproduced by:
$ tc qdisc add dev eth0 root handle 1: cbq avpkt 1000 rate 1000Mbit bandwidth 1000Mbit
$ tc filter add dev eth0 parent 1: protocol ip prio 100 u32 match ip protocol 0 0x00 flowid 1:1

Reported-by: Jaroslav Aster <[email protected]>
Reported-by: Ivan Vecera <[email protected]>
Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Jiri Pirko <[email protected]>
Acked-by: Eelco Chaudron <[email protected]>
Reviewed-by: Ivan Vecera <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

atm: suni: remove extraneous space to fix indentation

Remove a leading space, fixes indentation

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

atm: lanai: use %p to format kernel addresses instead of %x

Don't use %x and casting to print out a kernel address, instead use %p
and remove the casting. Cleans up smatch warnings:

drivers/atm/lanai.c:1589 service_buffer_allocate() warn: argument 2 to
%08lX specifier is cast from pointer
drivers/atm/lanai.c:2221 lanai_dev_open() warn: argument 4 to %lx
specifier is cast from pointer

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

VSOCK: Don't set sk_state to TCP_CLOSE before testing it

A recent commit (3b4477d2dcf2) converted the sk_state to use
TCP constants. In that change, vmci_transport_handle_detach
was changed such that sk->sk_state was set to TCP_CLOSE before
we test whether it is TCP_SYN_SENT. This change moves the
sk_state change back to the original locations in that function.

Signed-off-by: Jorgen Hansen <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

atm: fore200e: use %pK to format kernel addresses instead of %x

Don't use %x and casting to print out a kernel address, instead use the
%pK and remove the casting. Cleans up smatch warning:

drivers/atm/fore200e.c:3093 fore200e_proc_read() warn: argument 3 to %08x
specifier is cast from pointer

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>