]> Git Repo - linux.git/log
linux.git
5 years agomm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
John Hubbard [Fri, 31 Jan 2020 06:13:05 +0000 (22:13 -0800)]
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()

Convert process_vm_access to use the new pin_user_pages_remote() call,
which sets FOLL_PIN.  Setting FOLL_PIN is now required for code that
requires tracking of pinned pages.

Also, release the pages via put_user_page*().

Also, rename "pages" to "pinned_pages", as this makes for easier reading
of process_vm_rw_single_vec().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agoIB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
John Hubbard [Fri, 31 Jan 2020 06:13:02 +0000 (22:13 -0800)]
IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP

Convert infiniband to use the new pin_user_pages*() calls.

Also, revert earlier changes to Infiniband ODP that had it using
put_user_page().  ODP is "Case 3" in
Documentation/core-api/pin_user_pages.rst, which is to say, normal
get_user_pages() and put_page() is the API to use there.

The new pin_user_pages*() calls replace corresponding get_user_pages*()
calls, and set the FOLL_PIN flag.  The FOLL_PIN flag requires that the
caller must return the pages via put_user_page*() calls, but infiniband
was already doing that as part of an earlier commit.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agogoldish_pipe: convert to pin_user_pages() and put_user_page()
John Hubbard [Fri, 31 Jan 2020 06:12:58 +0000 (22:12 -0800)]
goldish_pipe: convert to pin_user_pages() and put_user_page()

1. Call the new global pin_user_pages_fast(), from
   pin_goldfish_pages().

2. As required by pin_user_pages(), release these pages via
   put_user_page().  In this case, do so via put_user_pages_dirty_lock().

That has the side effect of calling set_page_dirty_lock(), instead of
set_page_dirty().  This is probably more accurate.

As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
dealing with a file backed page where we have reference on the inode it
hangs off." [1]

Another side effect is that the release code is simplified because the
page[] loop is now in gup.c instead of here, so just delete the local
release_user_pages() entirely, and call put_user_pages_dirty_lock()
directly, instead.

[1] https://lore.kernel.org/r/20190723153640[email protected]

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/gup: introduce pin_user_pages*() and FOLL_PIN
John Hubbard [Fri, 31 Jan 2020 06:12:54 +0000 (22:12 -0800)]
mm/gup: introduce pin_user_pages*() and FOLL_PIN

Introduce pin_user_pages*() variations of get_user_pages*() calls, and
also pin_longterm_pages*() variations.

For now, these are placeholder calls, until the various call sites are
converted to use the correct get_user_pages*() or pin_user_pages*() API.

These variants will eventually all set FOLL_PIN, which is also
introduced, and thoroughly documented.

    pin_user_pages()
    pin_user_pages_remote()
    pin_user_pages_fast()

All pages that are pinned via the above calls, must be unpinned via
put_user_page().

The underlying rules are:

* FOLL_PIN is a gup-internal flag, so the call sites should not directly
  set it.  That behavior is enforced with assertions.

* Call sites that want to indicate that they are going to do DirectIO
  ("DIO") or something with similar characteristics, should call a
  get_user_pages()-like wrapper call that sets FOLL_PIN.  These wrappers
  will:

    * Start with "pin_user_pages" instead of "get_user_pages".  That
      makes it easy to find and audit the call sites.

    * Set FOLL_PIN

* For pages that are received via FOLL_PIN, those pages must be returned
  via put_user_page().

Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases in
this documentation.  (I've reworded it and expanded upon it.)

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]> [Documentation]
Reviewed-by: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomedia/v4l2-core: set pages dirty upon releasing DMA buffers
John Hubbard [Fri, 31 Jan 2020 06:12:50 +0000 (22:12 -0800)]
media/v4l2-core: set pages dirty upon releasing DMA buffers

After DMA is complete, and the device and CPU caches are synchronized,
it's still required to mark the CPU pages as dirty, if the data was
coming from the device.  However, this driver was just issuing a bare
put_page() call, without any set_page_dirty*() call.

Fix the problem, by calling set_page_dirty_lock() if the CPU pages were
potentially receiving data from the device.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Hans Verkuil <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agoIB/umem: use get_user_pages_fast() to pin DMA pages
John Hubbard [Fri, 31 Jan 2020 06:12:47 +0000 (22:12 -0800)]
IB/umem: use get_user_pages_fast() to pin DMA pages

And get rid of the mmap_sem calls, as part of that.  Note that
get_user_pages_fast() will, if necessary, fall back to
__gup_longterm_unlocked(), which takes the mmap_sem as needed.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/gup: allow FOLL_FORCE for get_user_pages_fast()
John Hubbard [Fri, 31 Jan 2020 06:12:43 +0000 (22:12 -0800)]
mm/gup: allow FOLL_FORCE for get_user_pages_fast()

Commit 817be129e6f2 ("mm: validate get_user_pages_fast flags") allowed
only FOLL_WRITE and FOLL_LONGTERM to be passed to get_user_pages_fast().
This, combined with the fact that get_user_pages_fast() falls back to
"slow gup", which *does* accept FOLL_FORCE, leads to an odd situation:
if you need FOLL_FORCE, you cannot call get_user_pages_fast().

There does not appear to be any reason for filtering out FOLL_FORCE.
There is nothing in the _fast() implementation that requires that we
avoid writing to the pages.  So it appears to have been an oversight.

Fix by allowing FOLL_FORCE to be set for get_user_pages_fast().

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 817be129e6f2 ("mm: validate get_user_pages_fast flags")
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agovfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
John Hubbard [Fri, 31 Jan 2020 06:12:39 +0000 (22:12 -0800)]
vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call

Update VFIO to take advantage of the recently loosened restriction on
FOLL_LONGTERM with get_user_pages_remote().  Also, now it is possible to
fix a bug: the VFIO caller is logically a FOLL_LONGTERM user, but it
wasn't setting FOLL_LONGTERM.

Also, remove an unnessary pair of calls that were releasing and
reacquiring the mmap_sem.  There is no need to avoid holding mmap_sem
just in order to call page_to_pfn().

Also, now that the the DAX check ("if a VMA is DAX, don't allow long
term pinning") is in the internals of get_user_pages_remote() and
__gup_longterm_locked(), there's no need for it at the VFIO call site.  So
remove it.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Tested-by: Alex Williamson <[email protected]>
Acked-by: Alex Williamson <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Suggested-by: Jason Gunthorpe <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
John Hubbard [Fri, 31 Jan 2020 06:12:36 +0000 (22:12 -0800)]
mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM

As it says in the updated comment in gup.c: current FOLL_LONGTERM
behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the FS
DAX check requirement on vmas.

However, the corresponding restriction in get_user_pages_remote() was
slightly stricter than is actually required: it forbade all
FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
that do not set the "locked" arg.

Update the code and comments to loosen the restriction, allowing
FOLL_LONGTERM in some cases.

Also, copy the DAX check ("if a VMA is DAX, don't allow long term
pinning") from the VFIO call site, all the way into the internals of
get_user_pages_remote() and __gup_longterm_locked().  That is:
get_user_pages_remote() calls __gup_longterm_locked(), which in turn
calls check_dax_vmas().  This check will then be removed from the VFIO
call site in a subsequent patch.

Thanks to Jason Gunthorpe for pointing out a clean way to fix this, and
to Dan Williams for helping clarify the DAX refactoring.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Tested-by: Alex Williamson <[email protected]>
Acked-by: Alex Williamson <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Suggested-by: Jason Gunthorpe <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agogoldish_pipe: rename local pin_user_pages() routine
John Hubbard [Fri, 31 Jan 2020 06:12:32 +0000 (22:12 -0800)]
goldish_pipe: rename local pin_user_pages() routine

Avoid naming conflicts: rename local static function from
"pin_user_pages()" to "goldfish_pin_pages()".

An upcoming patch will introduce a global pin_user_pages() function.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
John Hubbard [Fri, 31 Jan 2020 06:12:28 +0000 (22:12 -0800)]
mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages

An upcoming patch changes and complicates the refcounting and especially
the "put page" aspects of it.  In order to keep everything clean,
refactor the devmap page release routines:

* Rename put_devmap_managed_page() to page_is_devmap_managed(), and
  limit the functionality to "read only": return a bool, with no side
  effects.

* Add a new routine, put_devmap_managed_page(), to handle decrementing
  the refcount for ZONE_DEVICE pages.

* Change callers (just release_pages() and put_page()) to check
  page_is_devmap_managed() before calling the new
  put_devmap_managed_page() routine.  This is a performance point:
  put_page() is a hot path, so we need to avoid non- inline function calls
  where possible.

* Rename __put_devmap_managed_page() to free_devmap_managed_page(), and
  limit the functionality to unconditionally freeing a devmap page.

This is originally based on a separate patch by Ira Weiny, which applied
to an early version of the put_user_page() experiments.  Since then,
Jérôme Glisse suggested the refactoring described above.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
Suggested-by: Jérôme Glisse <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm: Cleanup __put_devmap_managed_page() vs ->page_free()
Dan Williams [Fri, 31 Jan 2020 06:12:24 +0000 (22:12 -0800)]
mm: Cleanup __put_devmap_managed_page() vs ->page_free()

After the removal of the device-public infrastructure there are only 2
->page_free() call backs in the kernel.  One of those is a
device-private callback in the nouveau driver, the other is a generic
wakeup needed in the DAX case.  In the hopes that all ->page_free()
callbacks can be migrated to common core kernel functionality, move the
device-private specific actions in __put_devmap_managed_page() under the
is_device_private_page() conditional, including the ->page_free()
callback.  For the other page types just open-code the generic wakeup.

Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
case.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/gup: move try_get_compound_head() to top, fix minor issues
John Hubbard [Fri, 31 Jan 2020 06:12:21 +0000 (22:12 -0800)]
mm/gup: move try_get_compound_head() to top, fix minor issues

An upcoming patch uses try_get_compound_head() more widely, so move it to
the top of gup.c.

Also fix a tiny spelling error and a checkpatch.pl warning.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/gup: factor out duplicate code from four routines
John Hubbard [Fri, 31 Jan 2020 06:12:17 +0000 (22:12 -0800)]
mm/gup: factor out duplicate code from four routines

Patch series "mm/gup: prereqs to track dma-pinned pages: FOLL_PIN", v12.

Overview:

This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017.  :)

A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.

I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on.  That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".

In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup").  In other words, opt-in by
changing from this:

    get_user_pages() (sets FOLL_GET)
    put_page()

to this:
    pin_user_pages() (sets FOLL_PIN)
    unpin_user_page()

Testing:

* I've done some overall kernel testing (LTP, and a few other goodies),
  and some directed testing to exercise some of the changes. And as you
  can see, gup_benchmark is enhanced to exercise this. Basically, I've
  been able to runtime test the core get_user_pages() and
  pin_user_pages() and related routines, but not so much on several of
  the call sites--but those are generally just a couple of lines
  changed, each.

  Not much of the kernel is actually using this, which on one hand
  reduces risk quite a lot. But on the other hand, testing coverage
  is low. So I'd love it if, in particular, the Infiniband and PowerPC
  folks could do a smoke test of this series for me.

  Runtime testing for the call sites so far is pretty light:

    * io_uring: Some directed tests from liburing exercise this, and
                they pass.
    * process_vm_access.c: A small directed test passes.
    * gup_benchmark: the enhanced version hits the new gup.c code, and
                     passes.
    * infiniband: Ran rdma-core tests: rdma-core/build/bin/run_tests.py
    * VFIO: compiles (I'm vowing to set up a run time test soon, but it's
                      not ready just yet)
    * powerpc: it compiles...
    * drm/via: compiles...
    * goldfish: compiles...
    * net/xdp: compiles...
    * media/v4l2: compiles...

[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/

This patch (of 22):

There are four locations in gup.c that have a fair amount of code
duplication.  This means that changing one requires making the same
changes in four places, not to mention reading the same code four times,
and wondering if there are subtle differences.

Factor out the common code into static functions, thus reducing the
overall line count and the code's complexity.

Also, take the opportunity to slightly improve the efficiency of the
error cases, by doing a mass subtraction of the refcount, surrounded by
get_page()/put_page().

Also, further simplify (slightly), by waiting until the the successful
end of each routine, to increment *nr.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jérôme Glisse <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/gup.c: use is_vm_hugetlb_page() to check whether to follow huge
Wei Yang [Fri, 31 Jan 2020 06:12:14 +0000 (22:12 -0800)]
mm/gup.c: use is_vm_hugetlb_page() to check whether to follow huge

No functional change, just leverage the helper function to improve
readability as others.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Yang <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm: fix gup_pud_range
Qiujun Huang [Fri, 31 Jan 2020 06:12:10 +0000 (22:12 -0800)]
mm: fix gup_pud_range

sorry for not processing for a long time.  I met it again.

patch v1   https://lkml.org/lkml/2019/9/20/656

do_machine_check()
  do_memory_failure()
    memory_failure()
      hw_poison_user_mappings()
        try_to_unmap()
          pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));

...and now we have a swap entry that indicates that the page entry
refers to a bad (and poisoned) page of memory, but gup_fast() at this
level of the page table was ignoring swap entries, and incorrectly
assuming that "!pxd_none() == valid and present".

And this was not just a poisoned page problem, but a generaly swap entry
problem.  So, any swap entry type (device memory migration, numa
migration, or just regular swapping) could lead to the same problem.

Fix this by checking for pxd_present(), instead of pxd_none().

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Qiujun Huang <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/filemap.c: clean up filemap_write_and_wait()
Ira Weiny [Fri, 31 Jan 2020 06:12:07 +0000 (22:12 -0800)]
mm/filemap.c: clean up filemap_write_and_wait()

At some point filemap_write_and_wait() and
filemap_write_and_wait_range() got the exact same implementation with
the exception of the range being specified in *_range()

Similar to other functions in fs.h which call *_range(..., 0,
LLONG_MAX), change filemap_write_and_wait() to be a static inline which
calls filemap_write_and_wait_range()

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ira Weiny <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/debug.c: always print flags in dump_page()
Vlastimil Babka [Fri, 31 Jan 2020 06:12:03 +0000 (22:12 -0800)]
mm/debug.c: always print flags in dump_page()

Commit 76a1850e4572 ("mm/debug.c: __dump_page() prints an extra line")
inadvertently removed printing of page flags for pages that are neither
anon nor ksm nor have a mapping.  Fix that.

Using pr_cont() again would be a solution, but the commit explicitly
removed its use.  Avoiding the danger of mixing up split lines from
multiple CPUs might be beneficial for near-panic dumps like this, so fix
this without reintroducing pr_cont().

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 76a1850e4572 ("mm/debug.c: __dump_page() prints an extra line")
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Anshuman Khandual <[email protected]>
Reported-by: Michal Hocko <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Ralph Campbell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/kmemleak: turn kmemleak_lock and object->lock to raw_spinlock_t
He Zhe [Fri, 31 Jan 2020 06:12:00 +0000 (22:12 -0800)]
mm/kmemleak: turn kmemleak_lock and object->lock to raw_spinlock_t

kmemleak_lock as a rwlock on RT can possibly be acquired in atomic
context which does work.

Since the kmemleak operation is performed in atomic context make it a
raw_spinlock_t so it can also be acquired on RT.  This is used for
debugging and is not enabled by default in a production like environment
(where performance/latency matters) so it makes sense to make it a
raw_spinlock_t instead trying to get rid of the atomic context.  Turn
also the kmemleak_object->lock into raw_spinlock_t which is acquired
(nested) while the kmemleak_lock is held.

The time spent in "echo scan > kmemleak" slightly improved on 64core box
with this patch applied after boot.

[[email protected]: redo the description, update comments. Merge the individual bits:  He Zhe did the kmemleak_lock, Liu Haitao the ->lock and Yongxin Liu forwarded Liu's patch.]
Link: http://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: He Zhe <[email protected]>
Signed-off-by: Liu Haitao <[email protected]>
Signed-off-by: Yongxin Liu <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/slub.c: avoid slub allocation while holding list_lock
Yu Zhao [Fri, 31 Jan 2020 06:11:57 +0000 (22:11 -0800)]
mm/slub.c: avoid slub allocation while holding list_lock

If we are already under list_lock, don't call kmalloc().  Otherwise we
will run into a deadlock because kmalloc() also tries to grab the same
lock.

Fix the problem by using a static bitmap instead.

  WARNING: possible recursive locking detected
  --------------------------------------------
  mount-encrypted/4921 is trying to acquire lock:
  (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

  but task is already holding lock:
  (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&n->list_lock)->rlock);
    lock(&(&n->list_lock)->rlock);

   *** DEADLOCK ***

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yu Zhao <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agoocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction
wangyan [Fri, 31 Jan 2020 06:11:53 +0000 (22:11 -0800)]
ocfs2: use ocfs2_update_inode_fsync_trans() to access t_tid in handle->h_transaction

For the uniform format, we use ocfs2_update_inode_fsync_trans() to
access t_tid in handle->h_transaction

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yan Wang <[email protected]>
Reviewed-by: Jun Piao <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agoocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()
wangyan [Fri, 31 Jan 2020 06:11:50 +0000 (22:11 -0800)]
ocfs2: fix a NULL pointer dereference when call ocfs2_update_inode_fsync_trans()

I found a NULL pointer dereference in ocfs2_update_inode_fsync_trans(),
handle->h_transaction may be NULL in this situation:

ocfs2_file_write_iter
  ->__generic_file_write_iter
      ->generic_perform_write
        ->ocfs2_write_begin
          ->ocfs2_write_begin_nolock
            ->ocfs2_write_cluster_by_desc
              ->ocfs2_write_cluster
                ->ocfs2_mark_extent_written
                  ->ocfs2_change_extent_flag
                    ->ocfs2_split_extent
                      ->ocfs2_try_to_merge_extent
                        ->ocfs2_extend_rotate_transaction
                          ->ocfs2_extend_trans
                            ->jbd2_journal_restart
                              ->jbd2__journal_restart
                                // handle->h_transaction is NULL here
                                ->handle->h_transaction = NULL;
                                ->start_this_handle
                                  /* journal aborted due to storage
                                     network disconnection, return error */
                                  ->return -EROFS;
                         /* line 3806 in ocfs2_try_to_merge_extent (),
                            it will ignore ret error. */
                        ->ret = 0;
        ->...
        ->ocfs2_write_end
          ->ocfs2_write_end_nolock
            ->ocfs2_update_inode_fsync_trans
              // NULL pointer dereference
              ->oi->i_sync_tid = handle->h_transaction->t_tid;

The information of NULL pointer dereference as follows:
    JBD2: Detected IO errors while flushing file data on dm-11-45
    Aborting journal on device dm-11-45.
    JBD2: Error -5 detected when updating journal superblock for dm-11-45.
    (dd,22081,3):ocfs2_extend_trans:474 ERROR: status = -30
    (dd,22081,3):ocfs2_try_to_merge_extent:3877 ERROR: status = -30
    Unable to handle kernel NULL pointer dereference at
    virtual address 0000000000000008
    Mem abort info:
      ESR = 0x96000004
      Exception class = DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
    Data abort info:
      ISV = 0, ISS = 0x00000004
      CM = 0, WnR = 0
    user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000e74e1338
    [0000000000000008] pgd=0000000000000000
    Internal error: Oops: 96000004 [#1] SMP
    Process dd (pid: 22081, stack limit = 0x00000000584f35a9)
    CPU: 3 PID: 22081 Comm: dd Kdump: loaded
    Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
    pstate: 60400009 (nZCv daif +PAN -UAO)
    pc : ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2]
    lr : ocfs2_write_end_nolock+0x2a0/0x550 [ocfs2]
    sp : ffff0000459fba70
    x29: ffff0000459fba70 x28: 0000000000000000
    x27: ffff807ccf7f1000 x26: 0000000000000001
    x25: ffff807bdff57970 x24: ffff807caf1d4000
    x23: ffff807cc79e9000 x22: 0000000000001000
    x21: 000000006c6cd000 x20: ffff0000091d9000
    x19: ffff807ccb239db0 x18: ffffffffffffffff
    x17: 000000000000000e x16: 0000000000000007
    x15: ffff807c5e15bd78 x14: 0000000000000000
    x13: 0000000000000000 x12: 0000000000000000
    x11: 0000000000000000 x10: 0000000000000001
    x9 : 0000000000000228 x8 : 000000000000000c
    x7 : 0000000000000fff x6 : ffff807a308ed6b0
    x5 : ffff7e01f10967c0 x4 : 0000000000000018
    x3 : d0bc661572445600 x2 : 0000000000000000
    x1 : 000000001b2e0200 x0 : 0000000000000000
    Call trace:
     ocfs2_write_end_nolock+0x2b8/0x550 [ocfs2]
     ocfs2_write_end+0x4c/0x80 [ocfs2]
     generic_perform_write+0x108/0x1a8
     __generic_file_write_iter+0x158/0x1c8
     ocfs2_file_write_iter+0x668/0x950 [ocfs2]
     __vfs_write+0x11c/0x190
     vfs_write+0xac/0x1c0
     ksys_write+0x6c/0xd8
     __arm64_sys_write+0x24/0x30
     el0_svc_common+0x78/0x130
     el0_svc_handler+0x38/0x78
     el0_svc+0x8/0xc

To prevent NULL pointer dereference in this situation, we use
is_handle_aborted() before using handle->h_transaction->t_tid.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yan Wang <[email protected]>
Reviewed-by: Jun Piao <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agoocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider use
Andy Shevchenko [Fri, 31 Jan 2020 06:11:47 +0000 (22:11 -0800)]
ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider use

There are users already and will be more of BITS_TO_BYTES() macro.  Move
it to bitops.h for wider use.

In the case of ocfs2 the replacement is identical.

As for bnx2x, there are two places where floor version is used.  In the
first case to calculate the amount of structures that can fit one memory
page.  In this case obviously the ceiling variant is correct and
original code might have a potential bug, if amount of bits % 8 is not
0.  In the second case the macro is used to calculate bytes transmitted
in one microsecond.  This will work for all speeds which is multiply of
1Gbps without any change, for the rest new code will give ceiling value,
for instance 100Mbps will give 13 bytes, while old code gives 12 bytes
and the arithmetically correct one is 12.5 bytes.  Further the value is
used to setup timer threshold which in any case has its own margins due
to certain resolution.  I don't see here an issue with slightly shifting
thresholds for low speed connections, the card is supposed to utilize
highest available rate, which is usually 10Gbps.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Acked-by: Sudarsana Reddy Kalluru <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agoocfs2/dlm: remove redundant assignment to ret
Colin Ian King [Fri, 31 Jan 2020 06:11:43 +0000 (22:11 -0800)]
ocfs2/dlm: remove redundant assignment to ret

The variable ret is being initialized with a value that is never read
and it is being updated later with a new value.  The initialization is
redundant and can be removed.

Addresses Coverity ("Unused value")

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agoocfs2: make local header paths relative to C files
Masahiro Yamada [Fri, 31 Jan 2020 06:11:40 +0000 (22:11 -0800)]
ocfs2: make local header paths relative to C files

Gang He reports the failure of building fs/ocfs2/ as an external module
of the kernel installed on the system:

 $ cd fs/ocfs2
 $ make -C /lib/modules/`uname -r`/build M=`pwd` modules

If you want to make it work reliably, I'd recommend to remove ccflags-y
from the Makefiles, and to make header paths relative to the C files.  I
think this is the correct usage of the #include "..." directive.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Gang He <[email protected]>
Reported-by: Gang He <[email protected]>
Reviewed-by: Gang He <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agoocfs2: remove unneeded semicolons
zhengbin [Fri, 31 Jan 2020 06:11:36 +0000 (22:11 -0800)]
ocfs2: remove unneeded semicolons

Fixes coccicheck warnings:

  fs/ocfs2/cluster/quorum.c:76:2-3: Unneeded semicolon
  fs/ocfs2/dlmglue.c:573:2-3: Unneeded semicolon

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: zhengbin <[email protected]>
Reported-by: Hulk Robot <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agofs: ocfs: remove unnecessary assertion in dlm_migrate_lockres
Aditya Pakki [Fri, 31 Jan 2020 06:11:33 +0000 (22:11 -0800)]
fs: ocfs: remove unnecessary assertion in dlm_migrate_lockres

In the only caller of dlm_migrate_lockres() - dlm_empty_lockres(),
target is checked for O2NM_MAX_NODES.  Thus, the assertion in
dlm_migrate_lockres() is unnecessary and can be removed.  The patch
eliminates such a check.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Aditya Pakki <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agoscripts/spelling.txt: add "issus" typo
Luca Ceresoli [Fri, 31 Jan 2020 06:11:30 +0000 (22:11 -0800)]
scripts/spelling.txt: add "issus" typo

Add "issus" and correct it as "issues".

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Luca Ceresoli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agoscripts/spelling.txt: add more spellings to spelling.txt
Xiong [Fri, 31 Jan 2020 06:11:27 +0000 (22:11 -0800)]
scripts/spelling.txt: add more spellings to spelling.txt

Here are some of the common spelling mistakes and typos that I've found
while fixing up spelling mistakes in the kernel.  Most of them still
exist in more than two source files.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiong <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Chris Paterson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm: move_pages: report the number of non-attempted pages
Yang Shi [Fri, 31 Jan 2020 06:11:24 +0000 (22:11 -0800)]
mm: move_pages: report the number of non-attempted pages

Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"), the
semantic of move_pages() has changed to return the number of
non-migrated pages if they were result of a non-fatal reasons (usually a
busy page).

This was an unintentional change that hasn't been noticed except for LTP
tests which checked for the documented behavior.

There are two ways to go around this change.  We can even get back to
the original behavior and return -EAGAIN whenever migrate_pages is not
able to migrate pages due to non-fatal reasons.  Another option would be
to simply continue with the changed semantic and extend move_pages
documentation to clarify that -errno is returned on an invalid input or
when migration simply cannot succeed (e.g.  -ENOMEM, -EBUSY) or the
number of pages that couldn't have been migrated due to ephemeral
reasons (e.g.  page is pinned or locked for other reasons).

This patch implements the second option because this behavior is in
place for some time without anybody complaining and possibly new users
depending on it.  Also it allows to have a slightly easier error
handling as the caller knows that it is worth to retry when err > 0.

But since the new semantic would be aborted immediately if migration is
failed due to ephemeral reasons, need include the number of
non-attempted pages in the return value too.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
Signed-off-by: Yang Shi <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Cc: <[email protected]> [4.17+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm: thp: don't need care deferred split queue in memcg charge move path
Wei Yang [Fri, 31 Jan 2020 06:11:20 +0000 (22:11 -0800)]
mm: thp: don't need care deferred split queue in memcg charge move path

If compound is true, this means it is a PMD mapped THP.  Which implies
the page is not linked to any defer list.  So the first code chunk will
not be executed.

Also with this reason, it would not be proper to add this page to a
defer list.  So the second code chunk is not correct.

Based on this, we should remove the defer list related code.

[[email protected]: better patch title]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware")
Signed-off-by: Wei Yang <[email protected]>
Suggested-by: Kirill A. Shutemov <[email protected]>
Acked-by: Yang Shi <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: <[email protected]> [5.4+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/memory_hotplug: fix remove_memory() lockdep splat
Dan Williams [Fri, 31 Jan 2020 06:11:17 +0000 (22:11 -0800)]
mm/memory_hotplug: fix remove_memory() lockdep splat

The daxctl unit test for the dax_kmem driver currently triggers the
(false positive) lockdep splat below.  It results from the fact that
remove_memory_block_devices() is invoked under the mem_hotplug_lock()
causing lockdep entanglements with cpu_hotplug_lock() and sysfs (kernfs
active state tracking).  It is a false positive because the sysfs
attribute path triggering the memory remove is not the same attribute
path associated with memory-block device.

sysfs_break_active_protection() is not applicable since there is no real
deadlock conflict, instead move memory-block device removal outside the
lock.  The mem_hotplug_lock() is not needed to synchronize the
memory-block device removal vs the page online state, that is already
handled by lock_device_hotplug().  Specifically, lock_device_hotplug()
is sufficient to allow try_remove_memory() to check the offline state of
the memblocks and be assured that any in progress online attempts are
flushed / blocked by kernfs_drain() / attribute removal.

The add_memory() path safely creates memblock devices under the
mem_hotplug_lock().  There is no kernfs active state synchronization in
the memblock device_register() path, so nothing to fix there.

This change is only possible thanks to the recent change that refactored
memory block device removal out of arch_remove_memory() (commit
4c4b7f9ba948 "mm/memory_hotplug: remove memory block devices before
arch_remove_memory()"), and David's due diligence tracking down the
guarantees afforded by kernfs_drain().  Not flagged for -stable since
this only impacts ongoing development and lockdep validation, not a
runtime issue.

    ======================================================
    WARNING: possible circular locking dependency detected
    5.5.0-rc3+ #230 Tainted: G           OE
    ------------------------------------------------------
    lt-daxctl/6459 is trying to acquire lock:
    ffff99c7f0003510 (kn->count#241){++++}, at: kernfs_remove_by_name_ns+0x41/0x80

    but task is already holding lock:
    ffffffffa76a5450 (mem_hotplug_lock.rw_sem){++++}, at: percpu_down_write+0x20/0xe0

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #2 (mem_hotplug_lock.rw_sem){++++}:
           __lock_acquire+0x39c/0x790
           lock_acquire+0xa2/0x1b0
           get_online_mems+0x3e/0xb0
           kmem_cache_create_usercopy+0x2e/0x260
           kmem_cache_create+0x12/0x20
           ptlock_cache_init+0x20/0x28
           start_kernel+0x243/0x547
           secondary_startup_64+0xb6/0xc0

    -> #1 (cpu_hotplug_lock.rw_sem){++++}:
           __lock_acquire+0x39c/0x790
           lock_acquire+0xa2/0x1b0
           cpus_read_lock+0x3e/0xb0
           online_pages+0x37/0x300
           memory_subsys_online+0x17d/0x1c0
           device_online+0x60/0x80
           state_store+0x65/0xd0
           kernfs_fop_write+0xcf/0x1c0
           vfs_write+0xdb/0x1d0
           ksys_write+0x65/0xe0
           do_syscall_64+0x5c/0xa0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe

    -> #0 (kn->count#241){++++}:
           check_prev_add+0x98/0xa40
           validate_chain+0x576/0x860
           __lock_acquire+0x39c/0x790
           lock_acquire+0xa2/0x1b0
           __kernfs_remove+0x25f/0x2e0
           kernfs_remove_by_name_ns+0x41/0x80
           remove_files.isra.0+0x30/0x70
           sysfs_remove_group+0x3d/0x80
           sysfs_remove_groups+0x29/0x40
           device_remove_attrs+0x39/0x70
           device_del+0x16a/0x3f0
           device_unregister+0x16/0x60
           remove_memory_block_devices+0x82/0xb0
           try_remove_memory+0xb5/0x130
           remove_memory+0x26/0x40
           dev_dax_kmem_remove+0x44/0x6a [kmem]
           device_release_driver_internal+0xe4/0x1c0
           unbind_store+0xef/0x120
           kernfs_fop_write+0xcf/0x1c0
           vfs_write+0xdb/0x1d0
           ksys_write+0x65/0xe0
           do_syscall_64+0x5c/0xa0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe

    other info that might help us debug this:

    Chain exists of:
      kn->count#241 --> cpu_hotplug_lock.rw_sem --> mem_hotplug_lock.rw_sem

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(mem_hotplug_lock.rw_sem);
                                   lock(cpu_hotplug_lock.rw_sem);
                                   lock(mem_hotplug_lock.rw_sem);
      lock(kn->count#241);

     *** DEADLOCK ***

No fixes tag as this has been a long standing issue that predated the
addition of kernfs lockdep annotations.

Link: http://lkml.kernel.org/r/157991441887.2763922.4770790047389427325.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/migrate.c: also overwrite error when it is bigger than zero
Wei Yang [Fri, 31 Jan 2020 06:11:14 +0000 (22:11 -0800)]
mm/migrate.c: also overwrite error when it is bigger than zero

If we get here after successfully adding page to list, err would be 1 to
indicate the page is queued in the list.

Current code has two problems:

  * on success, 0 is not returned
  * on error, if add_page_for_migratioin() return 1, and the following err1
    from do_move_pages_to_node() is set, the err1 is not returned since err
    is 1

And these behaviors break the user interface.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: e0153fc2c760 ("mm: move_pages: return valid node id in status if the page is already on the target node").
Signed-off-by: Wei Yang <[email protected]>
Acked-by: Yang Shi <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/sparse.c: reset section's mem_map when fully deactivated
Pingfan Liu [Fri, 31 Jan 2020 06:11:10 +0000 (22:11 -0800)]
mm/sparse.c: reset section's mem_map when fully deactivated

After commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug"),
when a mem section is fully deactivated, section_mem_map still records
the section's start pfn, which is not used any more and will be
reassigned during re-addition.

In analogy with alloc/free pattern, it is better to clear all fields of
section_mem_map.

Beside this, it breaks the user space tool "makedumpfile" [1], which
makes assumption that a hot-removed section has mem_map as NULL, instead
of checking directly against SECTION_MARKED_PRESENT bit.  (makedumpfile
will be better to change the assumption, and need a patch)

The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
trigger a crash, and save vmcore by makedumpfile

[1]: makedumpfile, commit e73016540293 ("[v1.6.7] Update version")

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Pingfan Liu <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Kazuhito Hagio <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomm/mempolicy.c: fix out of bounds write in mpol_parse_str()
Dan Carpenter [Fri, 31 Jan 2020 06:11:07 +0000 (22:11 -0800)]
mm/mempolicy.c: fix out of bounds write in mpol_parse_str()

What we are trying to do is change the '=' character to a NUL terminator
and then at the end of the function we restore it back to an '='.  The
problem is there are two error paths where we jump to the end of the
function before we have replaced the '=' with NUL.

We end up putting the '=' in the wrong place (possibly one element
before the start of the buffer).

Link: http://lkml.kernel.org/r/[email protected]
Reported-by: [email protected]
Fixes: 095f1fc4ebf3 ("mempolicy: rework shmem mpol parsing and display")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Dmitry Vyukov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agomemcg: fix a crash in wb_workfn when a device disappears
Theodore Ts'o [Fri, 31 Jan 2020 06:11:04 +0000 (22:11 -0800)]
memcg: fix a crash in wb_workfn when a device disappears

Without memcg, there is a one-to-one mapping between the bdi and
bdi_writeback structures.  In this world, things are fairly
straightforward; the first thing bdi_unregister() does is to shutdown
the bdi_writeback structure (or wb), and part of that writeback ensures
that no other work queued against the wb, and that the wb is fully
drained.

With memcg, however, there is a one-to-many relationship between the bdi
and bdi_writeback structures; that is, there are multiple wb objects
which can all point to a single bdi.  There is a refcount which prevents
the bdi object from being released (and hence, unregistered).  So in
theory, the bdi_unregister() *should* only get called once its refcount
goes to zero (bdi_put will drop the refcount, and when it is zero,
release_bdi gets called, which calls bdi_unregister).

Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about
the Brave New memcg World, and calls bdi_unregister directly.  It does
this without informing the file system, or the memcg code, or anything
else.  This causes the root wb associated with the bdi to be
unregistered, but none of the memcg-specific wb's are shutdown.  So when
one of these wb's are woken up to do delayed work, they try to
dereference their wb->bdi->dev to fetch the device name, but
unfortunately bdi->dev is now NULL, thanks to the bdi_unregister()
called by del_gendisk().  As a result, *boom*.

Fortunately, it looks like the rest of the writeback path is perfectly
happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to
create a bdi_dev_name() function which can handle bdi->dev being NULL.
This also allows us to bulletproof the writeback tracepoints to prevent
them from dereferencing a NULL pointer and crashing the kernel if one is
tracing with memcg's enabled, and an iSCSI device dies or a USB storage
stick is pulled.

The most common way of triggering this will be hotremoval of a device
while writeback with memcg enabled is going on.  It was triggering
several times a day in a heavily loaded production environment.

Google Bug Id: 145475544

Link: https://lore.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agolib/test_bitmap: correct test data offsets for 32-bit
Andy Shevchenko [Fri, 31 Jan 2020 06:11:01 +0000 (22:11 -0800)]
lib/test_bitmap: correct test data offsets for 32-bit

On 32-bit platform the size of long is only 32 bits which makes wrong
offset in the array of 64 bit size.

Calculate offset based on BITS_PER_LONG.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 30544ed5de43 ("lib/bitmap: introduce bitmap_replace() helper")
Signed-off-by: Andy Shevchenko <[email protected]>
Reported-by: Guenter Roeck <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Yury Norov <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
5 years agoMerge tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 31 Jan 2020 17:30:41 +0000 (09:30 -0800)]
Merge tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "This is the first batch of KVM changes.

  ARM:
   - cleanups and corner case fixes.

  PPC:
   - Bugfixes

  x86:
   - Support for mapping DAX areas with large nested page table entries.

   - Cleanups and bugfixes here too. A particularly important one is a
     fix for FPU load when the thread has TIF_NEED_FPU_LOAD. There is
     also a race condition which could be used in guest userspace to
     exploit the guest kernel, for which the embargo expired today.

   - Fast path for IPI delivery vmexits, shaving about 200 clock cycles
     from IPI latency.

   - Protect against "Spectre-v1/L1TF" (bring data in the cache via
     speculative out of bound accesses, use L1TF on the sibling
     hyperthread to read it), which unfortunately is an even bigger
     whack-a-mole game than SpectreV1.

  Sean continues his mission to rewrite KVM. In addition to a sizable
  number of x86 patches, this time he contributed a pretty large
  refactoring of vCPU creation that affects all architectures but should
  not have any visible effect.

  s390 will come next week together with some more x86 patches"

* tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  x86/KVM: Clean up host's steal time structure
  x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed
  x86/kvm: Cache gfn to pfn translation
  x86/kvm: Introduce kvm_(un)map_gfn()
  x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit
  KVM: PPC: Book3S PR: Fix -Werror=return-type build failure
  KVM: PPC: Book3S HV: Release lock on page-out failure path
  KVM: arm64: Treat emulated TVAL TimerValue as a signed 32-bit integer
  KVM: arm64: pmu: Only handle supported event counters
  KVM: arm64: pmu: Fix chained SW_INCR counters
  KVM: arm64: pmu: Don't mark a counter as chained if the odd one is disabled
  KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset
  KVM: x86: Use a typedef for fastop functions
  KVM: X86: Add 'else' to unify fastop and execute call path
  KVM: x86: inline memslot_valid_for_gpte
  KVM: x86/mmu: Use huge pages for DAX-backed files
  KVM: x86/mmu: Remove lpage_is_disallowed() check from set_spte()
  KVM: x86/mmu: Fold max_mapping_level() into kvm_mmu_hugepage_adjust()
  KVM: x86/mmu: Zap any compound page when collapsing sptes
  KVM: x86/mmu: Remove obsolete gfn restoration in FNAME(fetch)
  ...

5 years agoMerge tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh...
Linus Torvalds [Fri, 31 Jan 2020 00:11:50 +0000 (16:11 -0800)]
Merge tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx

Pull x86 MPX removal from Dave Hansen:
 "MPX requires recompiling applications, which requires compiler
  support. Unfortunately, GCC 9.1 is expected to be be released without
  support for MPX. This means that there was only a relatively small
  window where folks could have ever used MPX. It failed to gain wide
  adoption in the industry, and Linux was the only mainstream OS to ever
  support it widely.

  Support for the feature may also disappear on future processors.

  This set completes the process that we started during the 5.4 merge
  window when the MPX prctl()s were removed. XSAVE support is left in
  place, which allows MPX-using KVM guests to continue to function"

* tag 'mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-mpx:
  x86/mpx: remove MPX from arch/x86
  mm: remove arch_bprm_mm_init() hook
  x86/mpx: remove bounds exception code
  x86/mpx: remove build infrastructure
  x86/alternatives: add missing insn.h include

5 years agoMerge tag 'mtd/for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Linus Torvalds [Thu, 30 Jan 2020 23:46:02 +0000 (15:46 -0800)]
Merge tag 'mtd/for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD updates from Miquel Raynal:
 "MTD core
   - block2mtd: page index should use pgoff_t
   - maps: physmap: minimal Runtime PM support
   - maps: pcmciamtd: avoid possible sleep-in-atomic-context bugs
   - concat: Fix a comment referring to an unknown symbol

  Raw NAND:
   - Macronix: Use match_string() helper
   - Atmel: switch to using devm_fwnode_gpiod_get()
   - Denali: rework the SKIP_BYTES feature and add reset controlling
   - Brcmnand: set appropriate DMA mask
   - Cadence: add unspecified HAS_IOMEM dependency
   - Various cleanup.

  Onenand:
   - Rename Samsung and Omap2 drivers to avoid possible build warnings
   - Enable compile testing
   - Various build issues
   - Kconfig cleanup

  SPI-NAND:
   - Support for Toshiba TC58CVG2S0HRAIJ

  SPI-NOR:
   - Add support for TB selection using SR bit 6,
   - Add support for few flashes"

* tag 'mtd/for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (41 commits)
  mtd: concat: Fix a comment referring to an unknown symbol
  mtd: rawnand: add unspecified HAS_IOMEM dependency
  mtd: block2mtd: page index should use pgoff_t
  mtd: maps: physmap: Add minimal Runtime PM support
  mtd: maps: pcmciamtd: fix possible sleep-in-atomic-context bugs in pcmciamtd_set_vpp()
  mtd: onenand: Rename omap2 driver to avoid a build warning
  mtd: onenand: Use a better name for samsung driver
  mtd: rawnand: atmel: switch to using devm_fwnode_gpiod_get()
  mtd: spinand: add support for Toshiba TC58CVG2S0HRAIJ
  mtd: rawnand: macronix: Use match_string() helper to simplify the code
  mtd: sharpslpart: Fix unsigned comparison to zero
  mtd: onenand: Enable compile testing of OMAP and Samsung drivers
  mtd: onenand: samsung: Fix printing format for size_t on 64-bit
  mtd: onenand: samsung: Fix pointer cast -Wpointer-to-int-cast warnings on 64 bit
  mtd: rawnand: denali: remove hard-coded DENALI_DEFAULT_OOB_SKIP_BYTES
  mtd: rawnand: denali_dt: add reset controlling
  dt-bindings: mtd: denali_dt: document reset property
  mtd: rawnand: denali_dt: Add support for configuring SPARE_AREA_SKIP_BYTES
  mtd: rawnand: denali_dt: error out if platform has no associated data
  mtd: rawnand: brcmnand: Set appropriate DMA mask
  ...

5 years agoMerge tag 'upstream-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw...
Linus Torvalds [Thu, 30 Jan 2020 23:44:12 +0000 (15:44 -0800)]
Merge tag 'upstream-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBI/UBIFS updates from Miquel Raynal:
 "This pull request contains mostly fixes for UBI and UBIFS:

  UBI:
   - Fixes for memory leaks in error paths
   - Fix for an logic error in a fastmap selfcheck

  UBIFS:
   - Fix for FS_IOC_SETFLAGS related to fscrypt flag
   - Support for FS_ENCRYPT_FL
   - Fix for a dead lock in bulk-read mode"

Sent on behalf of Richard Weinberger who is traveling.

* tag 'upstream-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubi: Fix an error pointer dereference in error handling code
  ubifs: Fix memory leak from c->sup_node
  ubifs: Fix ino_t format warnings in orphan_delete()
  ubifs: Fix deadlock in concurrent bulk-read and writepage
  ubifs: Fix wrong memory allocation
  ubi: Free the normal volumes in error paths of ubi_attach_mtd_dev()
  ubi: Check the presence of volume before call ubi_fastmap_destroy_checkmap()
  ubifs: Add support for FS_ENCRYPT_FL
  ubifs: Fix FS_IOC_SETFLAGS unexpectedly clearing encrypt flag
  ubi: wl: Remove set but not used variable 'prev_e'
  ubi: fastmap: Fix inverted logic in seen selfcheck

5 years agoMerge tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk...
Linus Torvalds [Thu, 30 Jan 2020 23:39:24 +0000 (15:39 -0800)]
Merge tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this series, we've implemented transparent compression
  experimentally. It supports LZO and LZ4, but will add more later as we
  investigate in the field more.

  At this point, the feature doesn't expose compressed space to user
  directly in order to guarantee potential data updates later to the
  space. Instead, the main goal is to reduce data writes to flash disk
  as much as possible, resulting in extending disk life time as well as
  relaxing IO congestion.

  Alternatively, we're also considering to add ioctl() to reclaim
  compressed space and show it to user after putting the immutable bit.

  Enhancements:
   - add compression support
   - avoid unnecessary locks in quota ops
   - harden power-cut scenario for zoned block devices
   - use private bio_set to avoid IO congestion
   - replace GC mutex with rwsem to serialize callers

  Bug fixes:
   - fix dentry consistency and memory corruption in rename()'s error case
   - fix wrong swap extent reports
   - fix casefolding bugs
   - change lock coverage to avoid deadlock
   - avoid GFP_KERNEL under f2fs_lock_op

  And, we've cleaned up sysfs entries to prepare no debugfs"

* tag 'f2fs-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (31 commits)
  f2fs: fix race conditions in ->d_compare() and ->d_hash()
  f2fs: fix dcache lookup of !casefolded directories
  f2fs: Add f2fs stats to sysfs
  f2fs: delete duplicate information on sysfs nodes
  f2fs: change to use rwsem for gc_mutex
  f2fs: update f2fs document regarding to fsync_mode
  f2fs: add a way to turn off ipu bio cache
  f2fs: code cleanup for f2fs_statfs_project()
  f2fs: fix miscounted block limit in f2fs_statfs_project()
  f2fs: show the CP_PAUSE reason in checkpoint traces
  f2fs: fix deadlock allocating bio_post_read_ctx from mempool
  f2fs: remove unneeded check for error allocating bio_post_read_ctx
  f2fs: convert inline_dir early before starting rename
  f2fs: fix memleak of kobject
  f2fs: fix to add swap extent correctly
  f2fs: run fsck when getting bad inode during GC
  f2fs: support data compression
  f2fs: free sysfs kobject
  f2fs: declare nested quota_sem and remove unnecessary sems
  f2fs: don't put new_page twice in f2fs_rename
  ...

5 years agoMerge tag 'for_v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Linus Torvalds [Thu, 30 Jan 2020 23:37:41 +0000 (15:37 -0800)]
Merge tag 'for_v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull UDF, quota, reiserfs, ext2 fixes and cleanups from Jan Kara:
 "A few assorted fixes and cleanups for udf, quota, reiserfs, and ext2"

* tag 'for_v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fs/reiserfs: remove unused macros
  fs/quota: remove unused macro
  udf: Clarify meaning of f_files in udf_statfs
  udf: Allow writing to 'Rewritable' partitions
  udf: Disallow R/W mode for disk with Metadata partition
  udf: Fix meaning of ENTITYID_FLAGS_* macros to be really bitwise-or flags
  udf: Fix free space reporting for metadata and virtual partitions
  udf: Update header files to UDF 2.60
  udf: Move OSTA Identifier Suffix macros from ecma_167.h to osta_udf.h
  udf: Fix spelling in EXT_NEXT_EXTENT_ALLOCDESCS
  ext2: Adjust indentation in ext2_fill_super
  quota: avoid time_t in v1_disk_dqblk definition
  reiserfs: Fix spurious unlock in reiserfs_fill_super() error handling
  reiserfs: Fix memory leak of journal device string
  ext2: set proper errno in error case of ext2_fill_super()

5 years agoMerge tag 'xfs-5.6-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Thu, 30 Jan 2020 23:24:24 +0000 (15:24 -0800)]
Merge tag 'xfs-5.6-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "In this release we clean out the last of the old 32-bit timestamp
  code, fix a number of bugs and memory corruptions on 32-bit platforms,
  and a refactoring of some of the extended attribute code.

  I think I'll be back next week with some refactoring of how the XFS
  buffer code returns error codes, however I prefer to hold onto that
  for another week to let it soak a while longer

  Summary:

   - Get rid of compat_time_t

   - Convert time_t to time64_t in quota code

   - Remove shadow variables

   - Prevent ATTR_ flag misuse in the attrmulti ioctls

   - Clean out strlen in the attr code

   - Remove some bogus asserts

   - Fix various file size limit calculation errors with 32-bit kernels

   - Pack xfs_dir2_sf_entry_t to fix build errors on arm oabi

   - Fix nowait inode locking calls for directio aio reads

   - Fix memory corruption bugs when invalidating remote xattr value
     buffers

   - Streamline remote attr value removal

   - Make the buffer log format size consistent across platforms

   - Strengthen buffer log format size checking

   - Fix messed up return types of xfs_inode_need_cow

   - Fix some unused variable warnings"

* tag 'xfs-5.6-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (24 commits)
  xfs: remove unused variable 'done'
  xfs: fix uninitialized variable in xfs_attr3_leaf_inactive
  xfs: change return value of xfs_inode_need_cow to int
  xfs: check log iovec size to make sure it's plausibly a buffer log format
  xfs: make struct xfs_buf_log_format have a consistent size
  xfs: complain if anyone tries to create a too-large buffer log item
  xfs: clean up xfs_buf_item_get_format return value
  xfs: streamline xfs_attr3_leaf_inactive
  xfs: fix memory corruption during remote attr value buffer invalidation
  xfs: refactor remote attr value buffer invalidation
  xfs: fix IOCB_NOWAIT handling in xfs_file_dio_aio_read
  xfs: Add __packed to xfs_dir2_sf_entry_t definition
  xfs: fix s_maxbytes computation on 32-bit kernels
  xfs: truncate should remove all blocks, not just to the end of the page cache
  xfs: introduce XFS_MAX_FILEOFF
  xfs: remove bogus assertion when online repair isn't enabled
  xfs: Remove all strlen in all xfs_attr_* functions for attr names.
  xfs: fix misuse of the XFS_ATTR_INCOMPLETE flag
  xfs: also remove cached ACLs when removing the underlying attr
  xfs: reject invalid flags combinations in XFS_IOC_ATTRMULTI_BY_HANDLE
  ...

5 years agoMerge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso...
Linus Torvalds [Thu, 30 Jan 2020 23:17:05 +0000 (15:17 -0800)]
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "This merge window, we've added some performance improvements in how we
  handle inode locking in the read/write paths, and improving the
  performance of Direct I/O overwrites.

  We also now record the error code which caused the first and most
  recent ext4_error() report in the superblock, to make it easier to
  root cause problems in production systems.

  There are also many of the usual cleanups and miscellaneous bug fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (49 commits)
  jbd2: clean __jbd2_journal_abort_hard() and __journal_abort_soft()
  jbd2: make sure ESHUTDOWN to be recorded in the journal superblock
  ext4, jbd2: ensure panic when aborting with zero errno
  jbd2: switch to use jbd2_journal_abort() when failed to submit the commit record
  jbd2_seq_info_next should increase position index
  jbd2: remove pointless assertion in __journal_remove_journal_head
  ext4,jbd2: fix comment and code style
  jbd2: delete the duplicated words in the comments
  ext4: fix extent_status trace points
  ext4: fix symbolic enum printing in trace output
  ext4: choose hardlimit when softlimit is larger than hardlimit in ext4_statfs_project()
  ext4: fix race conditions in ->d_compare() and ->d_hash()
  ext4: make dioread_nolock the default
  ext4: fix extent_status fragmentation for plain files
  jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal
  ext4: drop ext4_kvmalloc()
  ext4: Add EXT4_IOC_FSGETXATTR/EXT4_IOC_FSSETXATTR to compat_ioctl
  ext4: remove unused macro MPAGE_DA_EXTENT_TAIL
  ext4: add missing braces in ext4_ext_drop_refs()
  ext4: fix some nonstandard indentation in extents.c
  ...

5 years agoMerge branch 'cve-2019-3016' into kvm-next-5.6
Paolo Bonzini [Thu, 30 Jan 2020 17:47:38 +0000 (18:47 +0100)]
Merge branch 'cve-2019-3016' into kvm-next-5.6

From Boris Ostrovsky:

The KVM hypervisor may provide a guest with ability to defer remote TLB
flush when the remote VCPU is not running. When this feature is used,
the TLB flush will happen only when the remote VPCU is scheduled to run
again. This will avoid unnecessary (and expensive) IPIs.

Under certain circumstances, when a guest initiates such deferred action,
the hypervisor may miss the request. It is also possible that the guest
may mistakenly assume that it has already marked remote VCPU as needing
a flush when in fact that request had already been processed by the
hypervisor. In both cases this will result in an invalid translation
being present in a vCPU, potentially allowing accesses to memory locations
in that guest's address space that should not be accessible.

Note that only intra-guest memory is vulnerable.

The five patches address both of these problems:
1. The first patch makes sure the hypervisor doesn't accidentally clear
a guest's remote flush request
2. The rest of the patches prevent the race between hypervisor
acknowledging a remote flush request and guest issuing a new one.

Conflicts:
arch/x86/kvm/x86.c [move from kvm_arch_vcpu_free to kvm_arch_vcpu_destroy]

5 years agox86/KVM: Clean up host's steal time structure
Boris Ostrovsky [Fri, 6 Dec 2019 15:36:12 +0000 (15:36 +0000)]
x86/KVM: Clean up host's steal time structure

Now that we are mapping kvm_steal_time from the guest directly we
don't need keep a copy of it in kvm_vcpu_arch.st. The same is true
for the stime field.

This is part of CVE-2019-3016.

Signed-off-by: Boris Ostrovsky <[email protected]>
Reviewed-by: Joao Martins <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
5 years agox86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed
Boris Ostrovsky [Thu, 5 Dec 2019 03:45:32 +0000 (03:45 +0000)]
x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed

There is a potential race in record_steal_time() between setting
host-local vcpu->arch.st.steal.preempted to zero (i.e. clearing
KVM_VCPU_PREEMPTED) and propagating this value to the guest with
kvm_write_guest_cached(). Between those two events the guest may
still see KVM_VCPU_PREEMPTED in its copy of kvm_steal_time, set
KVM_VCPU_FLUSH_TLB and assume that hypervisor will do the right
thing. Which it won't.

Instad of copying, we should map kvm_steal_time and that will
guarantee atomicity of accesses to @preempted.

This is part of CVE-2019-3016.

Signed-off-by: Boris Ostrovsky <[email protected]>
Reviewed-by: Joao Martins <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
5 years agox86/kvm: Cache gfn to pfn translation
Boris Ostrovsky [Thu, 5 Dec 2019 01:30:51 +0000 (01:30 +0000)]
x86/kvm: Cache gfn to pfn translation

__kvm_map_gfn()'s call to gfn_to_pfn_memslot() is
* relatively expensive
* in certain cases (such as when done from atomic context) cannot be called

Stashing gfn-to-pfn mapping should help with both cases.

This is part of CVE-2019-3016.

Signed-off-by: Boris Ostrovsky <[email protected]>
Reviewed-by: Joao Martins <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
5 years agox86/kvm: Introduce kvm_(un)map_gfn()
Boris Ostrovsky [Tue, 12 Nov 2019 16:35:06 +0000 (16:35 +0000)]
x86/kvm: Introduce kvm_(un)map_gfn()

kvm_vcpu_(un)map operates on gfns from any current address space.
In certain cases we want to make sure we are not mapping SMRAM
and for that we can use kvm_(un)map_gfn() that we are introducing
in this patch.

This is part of CVE-2019-3016.

Signed-off-by: Boris Ostrovsky <[email protected]>
Reviewed-by: Joao Martins <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
5 years agox86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit
Boris Ostrovsky [Wed, 30 Oct 2019 19:01:31 +0000 (19:01 +0000)]
x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit

kvm_steal_time_set_preempted() may accidentally clear KVM_VCPU_FLUSH_TLB
bit if it is called more than once while VCPU is preempted.

This is part of CVE-2019-3016.

(This bug was also independently discovered by Jim Mattson
<[email protected]>)

Signed-off-by: Boris Ostrovsky <[email protected]>
Reviewed-by: Joao Martins <[email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>
5 years agoMerge tag 'kvm-ppc-next-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulu...
Paolo Bonzini [Thu, 30 Jan 2020 17:14:26 +0000 (18:14 +0100)]
Merge tag 'kvm-ppc-next-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

Second KVM PPC update for 5.6

* Fix compile warning on 32-bit machines
* Fix locking error in secure VM support

5 years agoMerge tag 'kvmarm-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm...
Paolo Bonzini [Thu, 30 Jan 2020 17:13:14 +0000 (18:13 +0100)]
Merge tag 'kvmarm-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for Linux 5.6

- Fix MMIO sign extension
- Fix HYP VA tagging on tag space exhaustion
- Fix PSTATE/CPSR handling when generating exception
- Fix MMU notifier's advertizing of young pages
- Fix poisoned page handling
- Fix PMU SW event handling
- Fix TVAL register access
- Fix AArch32 external abort injection
- Fix ITS unmapped collection handling
- Various cleanups

5 years agoMerge tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Thu, 30 Jan 2020 16:04:01 +0000 (08:04 -0800)]
Merge tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Davbe Airlie:
 "This is the main pull request for graphics for 5.6. Usual selection of
  changes all over.

  I've got one outstanding vmwgfx pull that touches mm so kept it
  separate until after all of this lands. I'll try and get it to you
  soon after this, but it might be early next week (nothing wrong with
  code, just my schedule is messy)

  This also hits a lot of fbdev drivers with some cleanups.

  Other notables:
   - vulkan timeline semaphore support added to syncobjs
   - nouveau turing secureboot/graphics support
   - Displayport MST display stream compression support

  Detailed summary:

  uapi:
   - dma-buf heaps added (and fixed)
   - command line add support for panel oreientation
   - command line allow overriding penguin count

  drm:
   - mipi dsi definition updates
   - lockdep annotations for dma_resv
   - remove dma-buf kmap/kunmap support
   - constify fb_ops in all fbdev drivers
   - MST fix for daisy chained hotplug-
   - CTA-861-G modes with VIC >= 193 added
   - fix drm_panel_of_backlight export
   - LVDS decoder support
   - more device based logging support
   - scanline alighment for dumb buffers
   - MST DSC helpers

  scheduler:
   - documentation fixes
   - job distribution improvements

  panel:
   - Logic PD type 28 panel support
   - Jimax8729d MIPI-DSI
   - igenic JZ4770
   - generic DSI devicetree bindings
   - sony acx424AKP panel
   - Leadtek LTK500HD1829
   - xinpeng XPP055C272
   - AUO B116XAK01
   - GiantPlus GPM940B0
   - BOE NV140FHM-N49
   - Satoz SAT050AT40H12R2
   - Sharp LS020B1DD01D panels.

  ttm:
   - use blocking WW lock

  i915:
   - hw/uapi state separation
   - Lock annotation improvements
   - selftest improvements
   - ICL/TGL DSI VDSC support
   - VBT parsing improvments
   - Display refactoring
   - DSI updates + fixes
   - HDCP 2.2 for CFL
   - CML PCI ID fixes
   - GLK+ fbc fix
   - PSR fixes
   - GEN/GT refactor improvments
   - DP MST fixes
   - switch context id alloc to xarray
   - workaround updates
   - LMEM debugfs support
   - tiled monitor fixes
   - ICL+ clock gating programming removed
   - DP MST disable sequence fixed
   - LMEM discontiguous object maps
   - prefaulting for discontiguous objects
   - use LMEM for dumb buffers if possible
   - add LMEM mmap support

  amdgpu:
   - enable sync object timelines for vulkan
   - MST atomic routines
   - enable MST DSC support
   - add DMCUB display microengine support
   - DC OEM i2c support
   - Renoir DC fixes
   - Initial HDCP 2.x support
   - BACO support for Arcturus
   - Use BACO for runtime PM power save
   - gfxoff on navi10
   - gfx10 golden updates and fixes
   - DCN support on POWER
   - GFXOFF for raven1 refresh
   - MM engine idle handlers cleanup
   - 10bpc EDP panel fixes
   - renoir watermark fixes
   - SR-IOV fixes
   - Arcturus VCN fixes
   - GDDR6 training fixes
   - freesync fixes
   - Pollock support

  amdkfd:
   - unify more codepath with amdgpu
   - use KIQ to setup HIQ rather than MMIO

  radeon:
   - fix vma fault handler race
   - PPC DMA fix
   - register check fixes for r100/r200

  nouveau:
   - mmap_sem vs dma_resv fix
   - rewrite the ACR secure boot code for Turing
   - TU10x graphics engine support (TU11x pending)
   - Page kind mapping for turing
   - 10-bit LUT support
   - GP10B Tegra fixes
   - HD audio regression fix

  hisilicon/hibmc:
   - use generic fbdev code and helpers

  rockchip:
   - dsi/px30 support

  virtio:
   - fb damage support
   - static some functions

  vc4:
   - use dma_resv lock wrappers

  msm:
   - use dma_resv lock wrappers
   - sc7180 display + DSI support
   - a618 support
   - UBWC support improvements

  vmwgfx:
   - updates + new logging uapi

  exynos:
   - enable/disable callback cleanups

  etnaviv:
   - use dma_resv lock wrappers

  atmel-hlcdc:
   - clock fixes

  mediatek:
   - cmdq support
   - non-smooth cursor fixes
   - ctm property support

  sun4i:
   - suspend support
   - A64 mipi dsi support

  rcar-du:
   - Color management module support
   - LVDS encoder dual-link support
   - R8A77980 support

  analogic:
   - add support for an6345

  ast:
   - atomic modeset support
   - primary plane garbage fix

  arcgpu:
   - fixes for fourcc handling

  tegra:
   - minor fixes and improvments

  mcde:
   - vblank support

  meson:
   - OSD1 plane AFBC commit

  gma500:
   - add pageflip support
   - reomve global drm_dev

  komeda:
   - tweak debugfs output
   - d32 support
   - runtime PM suppotr

  udl:
   - use generic shmem helpers
   - cleanup and fixes"

* tag 'drm-next-2020-01-30' of git://anongit.freedesktop.org/drm/drm: (1998 commits)
  drm/nouveau/fb/gp102-: allow module to load even when scrubber binary is missing
  drm/nouveau/acr: return error when registering LSF if ACR not supported
  drm/nouveau/disp/gv100-: not all channel types support reporting error codes
  drm/nouveau/disp/nv50-: prevent oops when no channel method map provided
  drm/nouveau: support synchronous pushbuf submission
  drm/nouveau: signal pending fences when channel has been killed
  drm/nouveau: reject attempts to submit to dead channels
  drm/nouveau: zero vma pointer even if we only unreference it rather than free
  drm/nouveau: Add HD-audio component notifier support
  drm/nouveau: fix build error without CONFIG_IOMMU_API
  drm/nouveau/kms/nv04: remove set but not used variable 'width'
  drm/nouveau/kms/nv50: remove set but not unused variable 'nv_connector'
  drm/nouveau/mmu: fix comptag memory leak
  drm/nouveau/gr/gp10b: Use gp100_grctx and gp100_gr_zbc
  drm/nouveau/pmu/gm20b,gp10b: Fix Falcon bootstrapping
  drm/exynos: Rename Exynos to lowercase
  drm/exynos: change callback names
  drm/mst: Don't do atomic checks over disabled managers
  drm/amdgpu: add the lost mutex_init back
  drm/amd/display: skip opp blank or unblank if test pattern enabled
  ...

5 years agoMerge tag 'for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power...
Linus Torvalds [Thu, 30 Jan 2020 15:51:24 +0000 (07:51 -0800)]
Merge tag 'for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull power supply and reset updates from Sebastian Reichel:
 "Core:
   - Add battery internal resistance temperature table support

  Drivers:
   - sc27xx: Optimize the battery resistance with measuring temperature
   - max17042-battery: Add MAX17055 support
   - bq25890-charger: Add support of BQ25892 and BQ25896 chips
   - misc fixes"

* tag 'for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (44 commits)
  power: supply: ipaq_micro_battery: remove unneeded semicolon
  power: supply: bq25890_charger: fix incorrect error return when bq25890_field_read fails
  power: supply: axp20x_usb_power: Only poll while offline
  power: supply: axp20x_usb_power: Add wakeup control
  power: supply: axp20x_usb_power: Allow offlining
  power: supply: axp20x_usb_power: Use a match structure
  power: suppy: ucs1002: Make the symbol 'ucs1002_regulator_enable' static
  power: reset: at91-poweroff: use proper master clock register offset
  power: reset: at91-poweroff: introduce struct shdwc_reg_config
  power: supply: bq25890_charger: Add DT and I2C ids for all supported chips
  dt-bindings: Add new chips to bq25890 binding documentation
  power: supply: bq25890_charger: Add support of BQ25892 and BQ25896 chips
  power: supply: core: Update sysfs-class-power ABI document
  power: supply: sbs-battery: Fix a signedness bug in sbs_get_battery_capacity()
  power: supply: ltc2941-battery-gauge: fix use-after-free
  power: supply: max17040: Correct IRQ wake handling
  power: supply: axp20x_usb_power: Remove unused device_node
  power: supply: axp20x_ac_power: Add wakeup control
  power: supply: axp20x_ac_power: Allow offlining
  power: supply: axp20x_ac_power: Fix reporting online status
  ...

5 years agoMerge tag 'devicetree-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh...
Linus Torvalds [Thu, 30 Jan 2020 15:47:58 +0000 (07:47 -0800)]
Merge tag 'devicetree-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:

 - Update dtc to upstream v1.5.1-22-gc40aeb60b47a (plus 1 revert)

 - Fix for DMA coherent devices on Power

 - Rework and simplify the DT phandle cache code

 - DT schema conversions for LEDS, gpio-leds, STM32 dfsdm, STM32 UART,
   STM32 ROMEM, STM32 watchdog, STM32 DMAs, STM32 mlahb, STM32 RTC,
   STM32 RCC, STM32 syscon, rs485, Renesas rCar CSI2, Faraday FTIDE010,
   DWC2, Arm idle-states, Allwinner legacy resets, PRCM and clocks,
   Allwinner H6 OPP, Allwinner AHCI, Allwinner MBUS, Allwinner A31 CSI,
   Allwinner h/w codec, Allwinner A10 system ctrl, Allwinner SRAM,
   Allwinner USB PHY, Renesas CEU, generic PCI host, Arm Versatile PCI

 - New binding schemas for SATA and PATA controllers, TI and Infineon VR
   controllers, MAX31730

 - New compatible strings for i.MX8QM, WCN3991, renesas,r8a77961-wdt,
   renesas,etheravb-r8a77961

 - Add USB 'super-speed-plus' as a documented speed

 - Vendor prefixes for broadmobi, calaosystems, kam, and mps

 - Clean-up the multiple flavors of ST-Ericsson vendor prefixes

* tag 'devicetree-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (66 commits)
  scripts/dtc: Revert "yamltree: Ensure consistent bracketing of properties with phandles"
  of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc
  dt-bindings: leds: Convert gpio-leds to DT schema
  dt-bindings: leds: Convert common LED binding to schema
  dt-bindings: PCI: Convert generic host binding to DT schema
  dt-bindings: PCI: Convert Arm Versatile binding to DT schema
  dt-bindings: Be explicit about installing deps
  dt-bindings: stm32: convert dfsdm to json-schema
  dt-bindings: serial: Convert STM32 UART to json-schema
  dt-bindings: serial: Convert rs485 bindings to json-schema
  dt-bindings: timer: Use non-empty ranges in example
  dt-bindings: arm-boards: typo fix
  dt-bindings: Add TI and Infineon VR Controllers as trivial devices
  dt-binding: usb: add "super-speed-plus"
  dt-bindings: rcar-csi2: Convert bindings to json-schema
  dt-bindings: iio: adc: ad7606: Fix wrong maxItems value
  dt-bindings: Convert Faraday FTIDE010 to DT schema
  dt-bindings: Create DT bindings for PATA controllers
  dt-bindings: Create DT bindings for SATA controllers
  dt: bindings: add vendor prefix for Kamstrup A/S
  ...

5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 30 Jan 2020 15:42:00 +0000 (07:42 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

 1) Various mptcp fixupes from Florian Westphal and Geery Uytterhoeven.

 2) Don't clear the node/port GUIDs after we've assigned the correct
    values to them. From Leon Romanovsky.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net:
  net/core: Do not clear VF index for node/port GUIDs query
  mptcp: Fix undefined mptcp_handle_ipv6_mapped for modular IPV6
  net: drop_monitor: Use kstrdup
  udp: document udp_rcv_segment special case for looped packets
  mptcp: MPTCP_HMAC_TEST should depend on MPTCP
  mptcp: Fix incorrect IPV6 dependency check
  Revert "MAINTAINERS: mptcp@ mailing list is moderated"
  mptcp: handle tcp fallback when using syn cookies
  mptcp: avoid a lockdep splat when mcast group was joined
  mptcp: fix panic on user pointer access
  mptcp: defer freeing of cached ext until last moment
  net: mvneta: fix XDP support if sw bm is used as fallback
  sch_choke: Use kvcalloc
  mptcp: Fix build with PROC_FS disabled.
  MAINTAINERS: mptcp@ mailing list is moderated

5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide
Linus Torvalds [Thu, 30 Jan 2020 15:39:10 +0000 (07:39 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide

Pull IDE updates from David Miller:

 1) Fix mem region name in tx4949ide driver, from Christophe JAILLET.

 2) Make drive->dn read only, it should not be changeable by users. From
    Dan Carpenter.

 3) Several cast fixups from Krzysztof Kozlowski.

There is also going to be a removal of a now unused IDE driver, but that
will come via the MIPS tree.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
  ide: make drive->dn read only
  ide: serverworks: potential overflow in svwks_set_pio_mode()
  cmd64x: potential buffer overflow in cmd64x_program_timings()
  ide: remove unneeded header include path to drivers/ide
  ide: qd65xx: Fix cast to pointer from integer of different size
  ide: ht6560b: Fix cast to pointer from integer of different size
  ide: remove set but not used variable 'hwif'
  ide: remove unnecessary touch_softlockup_watchdog
  ide: tx4939ide: Fix the name used in a 'devm_request_mem_region()' call
  ide: Use dev_get_drvdata where possible

5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Thu, 30 Jan 2020 15:36:43 +0000 (07:36 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

Pull sparc updates from David Miller:

 1) Add a proper .exit.data section.

 2) Fix ipc64_perm type definition, from Arnd Bergmann.

 3) Support folded p4d page tables on sparc64, from Mike Rapport.

 4) Remove uses of struct timex, also from Arnd Bergmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  y2038: sparc: remove use of struct timex
  sparc64: add support for folded p4d page tables
  sparc/console: kill off obsolete declarations
  sparc32: fix struct ipc64_perm type definition
  sparc32, leon: Stop adding vendor and device id to prom ambapp path components
  sparc: Add .exit.data section.
  sparc: remove unneeded uapi/asm/statfs.h

5 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Thu, 30 Jan 2020 15:34:33 +0000 (07:34 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 KVM fix from Catalin Marinas:
 "Set the correct MDCR_EL2 register value on the first run of a vCPU"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  KVM: arm64: Write arch.mdcr_el2 changes since last vcpu_load on VHE

5 years agonet/core: Do not clear VF index for node/port GUIDs query
Leon Romanovsky [Thu, 30 Jan 2020 12:59:49 +0000 (14:59 +0200)]
net/core: Do not clear VF index for node/port GUIDs query

VF numbers were assigned to node_guid and port_guid, but cleared
right before such query calls were issued. It caused to return
node/port GUIDs of VF index 0 for all VFs.

Fixes: 30aad41721e0 ("net/core: Add support for getting VF GUIDs")
Reported-by: Adrian Chiris <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 years agoy2038: sparc: remove use of struct timex
Arnd Bergmann [Mon, 16 Dec 2019 14:48:53 +0000 (15:48 +0100)]
y2038: sparc: remove use of struct timex

'struct timex' is one of the last users of 'struct timeval' and is
only referenced in one place in the kernel any more, to convert the
user space timex into the kernel-internal version on sparc64, with a
different tv_usec member type.

As a preparation for hiding the time_t definition and everything
using that in the kernel, change the implementation once more
to only convert the timeval member, and then enclose the
struct definition in an #ifdef.

Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Julian Calaby <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 years agosparc64: add support for folded p4d page tables
Mike Rapoport [Sun, 24 Nov 2019 08:57:20 +0000 (10:57 +0200)]
sparc64: add support for folded p4d page tables

Implement primitives necessary for the 4th level folding, add walks of p4d
level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h.

Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 years agoide: make drive->dn read only
Dan Carpenter [Tue, 21 Jan 2020 13:06:42 +0000 (16:06 +0300)]
ide: make drive->dn read only

The IDE core always sets ->dn correctly so changing it is never
required.

Setting it to a different value than assigned by IDE core is very likely
to result in data corruption (due to wrong transfer timings being set on
the controller etc.)

Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Bartlomiej Zolnierkiewicz <[email protected]>
Tested-by: Bartlomiej Zolnierkiewicz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 years agomptcp: Fix undefined mptcp_handle_ipv6_mapped for modular IPV6
Geert Uytterhoeven [Thu, 30 Jan 2020 09:45:26 +0000 (10:45 +0100)]
mptcp: Fix undefined mptcp_handle_ipv6_mapped for modular IPV6

If CONFIG_MPTCP=y, CONFIG_MPTCP_IPV6=n, and CONFIG_IPV6=m:

    ERROR: "mptcp_handle_ipv6_mapped" [net/ipv6/ipv6.ko] undefined!

This does not happen if CONFIG_MPTCP_IPV6=y, as CONFIG_MPTCP_IPV6
selects CONFIG_IPV6, and thus forces CONFIG_IPV6 builtin.

As exporting a symbol for an empty function would be a bit wasteful, fix
this by providing a dummy version of mptcp_handle_ipv6_mapped() for the
CONFIG_MPTCP_IPV6=n case.

Rename mptcp_handle_ipv6_mapped() to mptcpv6_handle_mapped(), to make it
clear this is a pure-IPV6 function, just like mptcpv6_init().

Fixes: cec37a6e41aae7bf ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 years agonet: drop_monitor: Use kstrdup
Joe Perches [Tue, 28 Jan 2020 19:02:50 +0000 (11:02 -0800)]
net: drop_monitor: Use kstrdup

Convert the equivalent but rather odd uses of kmemdup with
__GFP_ZERO to the more common kstrdup and avoid unnecessary
zeroing of copied over memory.

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 years agoudp: document udp_rcv_segment special case for looped packets
Willem de Bruijn [Wed, 29 Jan 2020 20:20:17 +0000 (15:20 -0500)]
udp: document udp_rcv_segment special case for looped packets

Commit 6cd021a58c18a ("udp: segment looped gso packets correctly")
fixes an issue with rare udp gso multicast packets looped onto the
receive path.

The stable backport makes the narrowest change to target only these
packets, when needed. As opposed to, say, expanding __udp_gso_segment,
which is harder to reason to be free from unintended side-effects.

But the resulting code is hardly self-describing.
Document its purpose and rationale.

Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 years agomptcp: MPTCP_HMAC_TEST should depend on MPTCP
Geert Uytterhoeven [Wed, 29 Jan 2020 18:02:24 +0000 (19:02 +0100)]
mptcp: MPTCP_HMAC_TEST should depend on MPTCP

As the MPTCP HMAC test is integrated into the MPTCP code, it can be
built only when MPTCP is enabled.  Hence when MPTCP is disabled, asking
the user if the test code should be enabled is futile.

Wrap the whole block of MPTCP-specific config options inside a check for
MPTCP.  While at it, drop the "default n" for MPTCP_HMAC_TEST, as that
is the default anyway.

Fixes: 65492c5a6ab5df50 ("mptcp: move from sha1 (v0) to sha256 (v1)")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 years agomptcp: Fix incorrect IPV6 dependency check
Geert Uytterhoeven [Wed, 29 Jan 2020 18:01:17 +0000 (19:01 +0100)]
mptcp: Fix incorrect IPV6 dependency check

If CONFIG_MPTCP=y, CONFIG_MPTCP_IPV6=n, and CONFIG_IPV6=m:

    net/mptcp/protocol.o: In function `__mptcp_tcp_fallback':
    protocol.c:(.text+0x786): undefined reference to `inet6_stream_ops'

Fix this by checking for CONFIG_MPTCP_IPV6 instead of CONFIG_IPV6, like
is done in all other places in the mptcp code.

Fixes: 8ab183deb26a3b79 ("mptcp: cope with later TCP fallback")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 years agoMerge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-next
Dave Airlie [Thu, 30 Jan 2020 05:18:33 +0000 (15:18 +1000)]
Merge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-next

A couple of OOPS fixes, fixes for TU1xx if firmware isn't available,
better behaviour in the face of GPU faults, and a patch to make HD
audio work again after runpm changes.

Signed-off-by: Dave Airlie <[email protected]>
From: Ben Skeggs <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/
5 years agoMerge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Thu, 30 Jan 2020 03:56:50 +0000 (19:56 -0800)]
Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull mmu_notifier updates from Jason Gunthorpe:
 "This small series revises the names in mmu_notifier to make the code
  clearer and more readable"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  mm/mmu_notifiers: Use 'interval_sub' as the variable for mmu_interval_notifier
  mm/mmu_notifiers: Use 'subscription' as the variable name for mmu_notifier
  mm/mmu_notifier: Rename struct mmu_notifier_mm to mmu_notifier_subscriptions

5 years agoMerge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner...
Linus Torvalds [Thu, 30 Jan 2020 03:38:34 +0000 (19:38 -0800)]
Merge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread management updates from Christian Brauner:
 "Sargun Dhillon over the last cycle has worked on the pidfd_getfd()
  syscall.

  This syscall allows for the retrieval of file descriptors of a process
  based on its pidfd. A task needs to have ptrace_may_access()
  permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and
  Andy) on the target.

  One of the main use-cases is in combination with seccomp's user
  notification feature. As a reminder, seccomp's user notification
  feature was made available in v5.0. It allows a task to retrieve a
  file descriptor for its seccomp filter. The file descriptor is usually
  handed of to a more privileged supervising process. The supervisor can
  then listen for syscall events caught by the seccomp filter of the
  supervisee and perform actions in lieu of the supervisee, usually
  emulating syscalls. pidfd_getfd() is needed to expand its uses.

  There are currently two major users that wait on pidfd_getfd() and one
  future user:

   - Netflix, Sargun said, is working on a service mesh where users
     should be able to connect to a dns-based VIP. When a user connects
     to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be
     redirected to an envoy process. This service mesh uses seccomp user
     notifications and pidfd to intercept all connect calls and instead
     of connecting them to 1.2.3.4:80 connects them to e.g.
     127.0.0.1:8080.

   - LXD uses the seccomp notifier heavily to intercept and emulate
     mknod() and mount() syscalls for unprivileged containers/processes.
     With pidfd_getfd() more uses-cases e.g. bridging socket connections
     will be possible.

   - The patchset has also seen some interest from the browser corner.
     Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a
     broker process. In the future glibc will start blocking all signals
     during dlopen() rendering this type of sandbox impossible. Hence,
     in the future Firefox will switch to a seccomp-user-nofication
     based sandbox which also makes use of file descriptor retrieval.
     The thread for this can be found at
     https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html

  With pidfd_getfd() it is e.g. possible to bridge socket connections
  for the supervisee (binding to a privileged port) and taking actions
  on file descriptors on behalf of the supervisee in general.

  Sargun's first version was using an ioctl on pidfds but various people
  pushed for it to be a proper syscall which he duely implemented as
  well over various review cycles. Selftests are of course included.
  I've also added instructions how to deal with merge conflicts below.

  There's also a small fix coming from the kernel mentee project to
  correctly annotate struct sighand_struct with __rcu to fix various
  sparse warnings. We've received a few more such fixes and even though
  they are mostly trivial I've decided to postpone them until after -rc1
  since they came in rather late and I don't want to risk introducing
  build warnings.

  Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is
  needed to avoid allocation recursions triggerable by storage drivers
  that have userspace parts that run in the IO path (e.g. dm-multipath,
  iscsi, etc). These allocation recursions deadlock the device.

  The new prctl() allows such privileged userspace components to avoid
  allocation recursions by setting the PF_MEMALLOC_NOIO and
  PF_LESS_THROTTLE flags. The patch carries the necessary acks from the
  relevant maintainers and is routed here as part of prctl()
  thread-management."

* tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
  sched.h: Annotate sighand_struct with __rcu
  test: Add test for pidfd getfd
  arch: wire up pidfd_getfd syscall
  pid: Implement pidfd_getfd syscall
  vfs, fdtable: Add fget_task helper

5 years agoMerge tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block
Linus Torvalds [Thu, 30 Jan 2020 02:53:37 +0000 (18:53 -0800)]
Merge tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:

 - Support for various new opcodes (fallocate, openat, close, statx,
   fadvise, madvise, openat2, non-vectored read/write, send/recv, and
   epoll_ctl)

 - Faster ring quiesce for fileset updates

 - Optimizations for overflow condition checking

 - Support for max-sized clamping

 - Support for probing what opcodes are supported

 - Support for io-wq backend sharing between "sibling" rings

 - Support for registering personalities

 - Lots of little fixes and improvements

* tag 'for-5.6/io_uring-vfs-2020-01-29' of git://git.kernel.dk/linux-block: (64 commits)
  io_uring: add support for epoll_ctl(2)
  eventpoll: support non-blocking do_epoll_ctl() calls
  eventpoll: abstract out epoll_ctl() handler
  io_uring: fix linked command file table usage
  io_uring: support using a registered personality for commands
  io_uring: allow registering credentials
  io_uring: add io-wq workqueue sharing
  io-wq: allow grabbing existing io-wq
  io_uring/io-wq: don't use static creds/mm assignments
  io-wq: make the io_wq ref counted
  io_uring: fix refcounting with batched allocations at OOM
  io_uring: add comment for drain_next
  io_uring: don't attempt to copy iovec for READ/WRITE
  io_uring: honor IOSQE_ASYNC for linked reqs
  io_uring: prep req when do IOSQE_ASYNC
  io_uring: use labeled array init in io_op_defs
  io_uring: optimise sqe-to-req flags translation
  io_uring: remove REQ_F_IO_DRAINED
  io_uring: file switch work needs to get flushed on exit
  io_uring: hide uring_fd in ctx
  ...

5 years agoMerge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Thu, 30 Jan 2020 02:16:16 +0000 (18:16 -0800)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series is slightly unusual because it includes Arnd's compat
  ioctl tree here:

    1c46a2cf2dbd Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue

  Excluding Arnd's changes, this is mostly an update of the usual
  drivers: megaraid_sas, mpt3sas, qla2xxx, ufs, lpfc, hisi_sas.

  There are a couple of core and base updates around error propagation
  and atomicity in the attribute container base we use for the SCSI
  transport classes.

  The rest is minor changes and updates"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (149 commits)
  scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask
  scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic affinity
  scsi: hisi_sas: Modify the file permissions of trigger_dump to write only
  scsi: hisi_sas: Replace magic number when handle channel interrupt
  scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock
  scsi: hisi_sas: use threaded irq to process CQ interrupts
  scsi: ufs: Use UFS device indicated maximum LU number
  scsi: ufs: Add max_lu_supported in struct ufs_dev_info
  scsi: ufs: Delete is_init_prefetch from struct ufs_hba
  scsi: ufs: Inline two functions into their callers
  scsi: ufs: Move ufshcd_get_max_pwr_mode() to ufshcd_device_params_init()
  scsi: ufs: Split ufshcd_probe_hba() based on its called flow
  scsi: ufs: Delete struct ufs_dev_desc
  scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails
  scsi: ufs-mediatek: enable low-power mode for hibern8 state
  scsi: ufs: export some functions for vendor usage
  scsi: ufs-mediatek: add dbg_register_dump implementation
  scsi: qla2xxx: Fix a NULL pointer dereference in an error path
  scsi: qla1280: Make checking for 64bit support consistent
  scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1
  ...

5 years agoMerge tag 'for-5.6/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/devic...
Linus Torvalds [Thu, 30 Jan 2020 02:08:49 +0000 (18:08 -0800)]
Merge tag 'for-5.6/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Fix DM core's potential for q->make_request_fn NULL pointer in the
   unlikely case that a DM device is created without a DM table and then
   accessed due to upper-layer userspace code or user error.

 - Fix DM thin-provisioning's metadata_pre_commit_callback to not use
   memory after it is free'd. Also refactor code to disallow changing
   the thin-pool's data device once in use -- doing so guarantees smae
   lifetime of pool's data device relative to the pool metadata.

 - Fix DM space maps used by DM thinp and DM cache to avoid reuse of a
   already used block. This race was identified with extremely heavy
   snapshot use in the context of DM thin provisioning.

 - Fix DM raid's table status relative to an active rebuild.

 - Fix DM crypt to use GFP_NOIO rather than GFP_NOFS in call to
   skcipher_request_alloc(). Also fix benbi IV constructor crash if used
   in authenticated mode.

 - Add DM crypt support for Elephant diffuser to allow for Bitlocker
   compatibility.

 - Fix DM verity target to not prefetch hash blocks for data that has
   already been verified.

 - Fix DM writecache's incorrect flush sequence during commit when in
   SSD mode.

 - Improve DM writecache's sequential write performance on SSDs.

 - Add DM zoned target support for zone sizes smaller than 128MiB.

 - Add DM multipath 'queue_if_no_path_timeout_secs' module param to
   allow timeout if path isn't reinstated. This allows users a kernel
   safety-net against IO hanging indefinitely, due to no active paths,
   that has historically only been provided by multipathd userspace.

 - Various DM code cleanups to use true/false rather than 1/0, a
   variable rename in dm-dust, and fix for a math error in comment for
   DM thin metadata's ondisk format.

* tag 'for-5.6/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (21 commits)
  dm: fix potential for q->make_request_fn NULL pointer
  dm writecache: improve performance of large linear writes on SSDs
  dm mpath: Add timeout mechanism for queue_if_no_path
  dm thin: change data device's flush_bio to be member of struct pool
  dm thin: don't allow changing data device during thin-pool reload
  dm thin: fix use-after-free in metadata_pre_commit_callback
  dm thin metadata: use pool locking at end of dm_pool_metadata_close
  dm writecache: fix incorrect flush sequence when doing SSD mode commit
  dm crypt: fix benbi IV constructor crash if used in authenticated mode
  dm crypt: Implement Elephant diffuser for Bitlocker compatibility
  dm space map common: fix to ensure new block isn't already in use
  dm verity: don't prefetch hash blocks for already-verified data
  dm crypt: fix GFP flags passed to skcipher_request_alloc()
  dm thin metadata: Fix trivial math error in on-disk format documentation
  dm thin metadata: use true/false for bool variable
  dm snapshot: use true/false for bool variable
  dm bio prison v2: use true/false for bool variable
  dm mpath: use true/false for bool variable
  dm zoned: support zone sizes smaller than 128MiB
  dm raid: table line rebuild status fixes
  ...

5 years agoMerge tag 'docs-5.6' of git://git.lwn.net/linux
Linus Torvalds [Wed, 29 Jan 2020 23:27:31 +0000 (15:27 -0800)]
Merge tag 'docs-5.6' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It has been a relatively quiet cycle for documentation, but there's
  still a couple of things of note:

   - Conversion of the NFS documentation to RST

   - A new document on how to help with documentation (and a maintainer
     profile entry too)

  Plus the usual collection of typo fixes, etc"

* tag 'docs-5.6' of git://git.lwn.net/linux: (40 commits)
  docs: filesystems: add overlayfs to index.rst
  docs: usb: remove some broken references
  scripts/find-unused-docs: Fix massive false positives
  docs: nvdimm: use ReST notation for subsection
  zram: correct documentation about sysfs node of huge page writeback
  Documentation: zram: various fixes in zram.rst
  Add a maintainer entry profile for documentation
  Add a document on how to contribute to the documentation
  docs: Keep up with the location of NoUri
  Documentation: Call out example SYM_FUNC_* usage as x86-specific
  Documentation: nfs: fault_injection: convert to ReST
  Documentation: nfs: pnfs-scsi-server: convert to ReST
  Documentation: nfs: convert pnfs-block-server to ReST
  Documentation: nfs: idmapper: convert to ReST
  Documentation: convert nfsd-admin-interfaces to ReST
  Documentation: nfs-rdma: convert to ReST
  Documentation: nfsroot.rst: COSMETIC: refill a paragraph
  Documentation: nfsroot.txt: convert to ReST
  Documentation: convert nfs.txt to ReST
  Documentation: filesystems: convert vfat.txt to RST
  ...

5 years agoMerge tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Wed, 29 Jan 2020 23:25:34 +0000 (15:25 -0800)]
Merge tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest kunit updates from Shuah Khan:
 "This kunit update consists of:

   - Support for building kunit as a module from Alan Maguire

   - AppArmor KUnit tests for policy unpack from Mike Salvatore"

* tag 'linux-kselftest-5.6-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: building kunit as a module breaks allmodconfig
  kunit: update documentation to describe module-based build
  kunit: allow kunit to be loaded as a module
  kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds
  kunit: allow kunit tests to be loaded as a module
  kunit: hide unexported try-catch interface in try-catch-impl.h
  kunit: move string-stream.h to lib/kunit
  apparmor: add AppArmor KUnit tests for policy unpack

5 years agoMerge tag 'linux-kselftest-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 29 Jan 2020 23:24:03 +0000 (15:24 -0800)]
Merge tag 'linux-kselftest-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest update from Shuah Khan:
 "This Kselftest update consists of several fixes to framework and
  individual tests.

  In addition, it enables LKDTM tests adding lkdtm target to kselftest
  Makefile"

* tag 'linux-kselftest-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ftrace: fix glob selftest
  selftests: settings: tests can be in subsubdirs
  kselftest: Minimise dependency of get_size on C library interfaces
  selftests/livepatch: Remove unused local variable in set_ftrace_enabled()
  selftests/livepatch: Replace set_dynamic_debug() with setup_config() in README
  selftests/lkdtm: Add tests for LKDTM targets
  selftests: Uninitialized variable in test_cgcore_proc_migration()
  selftests: fix build behaviour on targets' failures

5 years agoMerge tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux...
Linus Torvalds [Wed, 29 Jan 2020 22:55:47 +0000 (14:55 -0800)]
Merge tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull y2038 updates from Arnd Bergmann:
 "Core, driver and file system changes

  These are updates to device drivers and file systems that for some
  reason or another were not included in the kernel in the previous
  y2038 series.

  I've gone through all users of time_t again to make sure the kernel is
  in a long-term maintainable state, replacing all remaining references
  to time_t with safe alternatives.

  Some related parts of the series were picked up into the nfsd, xfs,
  alsa and v4l2 trees. A final set of patches in linux-mm removes the
  now unused time_t/timeval/timespec types and helper functions after
  all five branches are merged for linux-5.6, ensuring that no new users
  get merged.

  As a result, linux-5.6, or my backport of the patches to 5.4 [1],
  should be the first release that can serve as a base for a 32-bit
  system designed to run beyond year 2038, with a few remaining caveats:

   - All user space must be compiled with a 64-bit time_t, which will be
     supported in the coming musl-1.2 and glibc-2.32 releases, along
     with installed kernel headers from linux-5.6 or higher.

   - Applications that use the system call interfaces directly need to
     be ported to use the time64 syscalls added in linux-5.1 in place of
     the existing system calls. This impacts most users of futex() and
     seccomp() as well as programming languages that have their own
     runtime environment not based on libc.

   - Applications that use a private copy of kernel uapi header files or
     their contents may need to update to the linux-5.6 version, in
     particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h,
     linux/elfcore.h, linux/sockios.h, linux/timex.h and
     linux/can/bcm.h.

   - A few remaining interfaces cannot be changed to pass a 64-bit
     time_t in a compatible way, so they must be configured to use
     CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit
     timestamps. Most importantly this impacts all users of 'struct
     input_event'.

   - All y2038 problems that are present on 64-bit machines also apply
     to 32-bit machines. In particular this affects file systems with
     on-disk timestamps using signed 32-bit seconds: ext4 with
     ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs"

[1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame

* tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits)
  Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC"
  y2038: sh: remove timeval/timespec usage from headers
  y2038: sparc: remove use of struct timex
  y2038: rename itimerval to __kernel_old_itimerval
  y2038: remove obsolete jiffies conversion functions
  nfs: fscache: use timespec64 in inode auxdata
  nfs: fix timstamp debug prints
  nfs: use time64_t internally
  sunrpc: convert to time64_t for expiry
  drm/etnaviv: avoid deprecated timespec
  drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC
  drm/msm: avoid using 'timespec'
  hfs/hfsplus: use 64-bit inode timestamps
  hostfs: pass 64-bit timestamps to/from user space
  packet: clarify timestamp overflow
  tsacct: add 64-bit btime field
  acct: stop using get_seconds()
  um: ubd: use 64-bit time_t where possible
  xtensa: ISS: avoid struct timeval
  dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD
  ...

5 years agoMerge tag 'printk-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek...
Linus Torvalds [Wed, 29 Jan 2020 22:53:23 +0000 (14:53 -0800)]
Merge tag 'printk-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk

Pull printk update from Petr Mladek:
 "Prevent replaying log on all consoles"

* tag 'printk-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  printk: fix exclusive_console replaying

5 years agoio_uring: add support for epoll_ctl(2)
Jens Axboe [Wed, 8 Jan 2020 22:18:09 +0000 (15:18 -0700)]
io_uring: add support for epoll_ctl(2)

This adds IORING_OP_EPOLL_CTL, which can perform the same work as the
epoll_ctl(2) system call.

Signed-off-by: Jens Axboe <[email protected]>
5 years agoeventpoll: support non-blocking do_epoll_ctl() calls
Jens Axboe [Wed, 8 Jan 2020 22:05:37 +0000 (15:05 -0700)]
eventpoll: support non-blocking do_epoll_ctl() calls

Also make it available outside of epoll, along with the helper that
decides if we need to copy the passed in epoll_event.

Signed-off-by: Jens Axboe <[email protected]>
5 years agoeventpoll: abstract out epoll_ctl() handler
Jens Axboe [Wed, 8 Jan 2020 21:35:13 +0000 (14:35 -0700)]
eventpoll: abstract out epoll_ctl() handler

No functional changes in this patch.

Signed-off-by: Jens Axboe <[email protected]>
5 years agoio_uring: fix linked command file table usage
Jens Axboe [Wed, 29 Jan 2020 20:46:44 +0000 (13:46 -0700)]
io_uring: fix linked command file table usage

We're not consistent in how the file table is grabbed and assigned if we
have a command linked that requires the use of it.

Add ->file_table to the io_op_defs[] array, and use that to determine
when to grab the table instead of having the handlers set it if they
need to defer. This also means we can kill the IO_WQ_WORK_NEEDS_FILES
flag. We always initialize work->files, so io-wq can just check for
that.

Signed-off-by: Jens Axboe <[email protected]>
5 years agoMerge tag 'erofs-for-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang...
Linus Torvalds [Wed, 29 Jan 2020 19:47:08 +0000 (11:47 -0800)]
Merge tag 'erofs-for-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
 "A regression fix, several cleanups and (maybe) plus an upcoming new
  mount api convert patch as a part of vfs update are considered
  available for this cycle.

  All commits have been in linux-next and tested with no smoke out.

  Summary:

   - fix an out-of-bound read access introduced in v5.3, which could
     rarely cause data corruption

   - various cleanup patches"

* tag 'erofs-for-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: clean up z_erofs_submit_queue()
  erofs: fold in postsubmit_is_all_bypassed()
  erofs: fix out-of-bound read for shifted uncompressed block
  erofs: remove void tagging/untagging of workgroup pointers
  erofs: remove unused tag argument while registering a workgroup
  erofs: remove unused tag argument while finding a workgroup
  erofs: correct indentation of an assigned structure inside a function

5 years agoMerge branch 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Wed, 29 Jan 2020 19:45:09 +0000 (11:45 -0800)]
Merge branch 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull adfs updates from Al Viro:
 "adfs stuff for this cycle"

* 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
  fs/adfs: bigdir: Fix an error code in adfs_fplus_read()
  Documentation: update adfs filesystem documentation
  fs/adfs: mostly divorse inode number from indirect disc address
  fs/adfs: super: add support for E and E+ floppy image formats
  fs/adfs: super: extract filesystem block probe
  fs/adfs: dir: remove debug in adfs_dir_update()
  fs/adfs: super: fix inode dropping
  fs/adfs: bigdir: implement directory update support
  fs/adfs: bigdir: calculate and validate directory checkbyte
  fs/adfs: bigdir: directory validation strengthening
  fs/adfs: bigdir: extract directory validation
  fs/adfs: bigdir: factor out directory entry offset calculation
  fs/adfs: newdir: split out directory commit from update
  fs/adfs: newdir: clean up adfs_f_update()
  fs/adfs: newdir: merge adfs_dir_read() into adfs_f_read()
  fs/adfs: newdir: improve directory validation
  fs/adfs: newdir: factor out directory format validation
  fs/adfs: dir: use pointers to access directory head/tails
  fs/adfs: dir: add more efficient iterate() per-format method
  fs/adfs: dir: switch to iterate_shared method
  ...

5 years agoRevert "MAINTAINERS: mptcp@ mailing list is moderated"
Mat Martineau [Wed, 29 Jan 2020 17:41:37 +0000 (09:41 -0800)]
Revert "MAINTAINERS: mptcp@ mailing list is moderated"

This reverts commit 74759e1693311a8d1441de836c4080c192374238.

[email protected] accepts messages from non-subscribers. There was an
invisible and unexpected server-wide rule limiting the number of
recipients for subscribers and non-subscribers alike, and that has now
been turned off for this list.

Cc: Randy Dunlap <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 years agoMerge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Wed, 29 Jan 2020 19:20:24 +0000 (11:20 -0800)]
Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull openat2 support from Al Viro:
 "This is the openat2() series from Aleksa Sarai.

  I'm afraid that the rest of namei stuff will have to wait - it got
  zero review the last time I'd posted #work.namei, and there had been a
  leak in the posted series I'd caught only last weekend. I was going to
  repost it on Monday, but the window opened and the odds of getting any
  review during that... Oh, well.

  Anyway, openat2 part should be ready; that _did_ get sane amount of
  review and public testing, so here it comes"

From Aleksa's description of the series:
 "For a very long time, extending openat(2) with new features has been
  incredibly frustrating. This stems from the fact that openat(2) is
  possibly the most famous counter-example to the mantra "don't silently
  accept garbage from userspace" -- it doesn't check whether unknown
  flags are present[1].

  This means that (generally) the addition of new flags to openat(2) has
  been fraught with backwards-compatibility issues (O_TMPFILE has to be
  defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
  kernels gave errors, since it's insecure to silently ignore the
  flag[2]). All new security-related flags therefore have a tough road
  to being added to openat(2).

  Furthermore, the need for some sort of control over VFS's path
  resolution (to avoid malicious paths resulting in inadvertent
  breakouts) has been a very long-standing desire of many userspace
  applications.

  This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
  (which was a variant of David Drysdale's O_BENEATH patchset[4] which
  was a spin-off of the Capsicum project[5]) with a few additions and
  changes made based on the previous discussion within [6] as well as
  others I felt were useful.

  In line with the conclusions of the original discussion of
  AT_NO_JUMPS, the flag has been split up into separate flags. However,
  instead of being an openat(2) flag it is provided through a new
  syscall openat2(2) which provides several other improvements to the
  openat(2) interface (see the patch description for more details). The
  following new LOOKUP_* flags are added:

  LOOKUP_NO_XDEV:

     Blocks all mountpoint crossings (upwards, downwards, or through
     absolute links). Absolute pathnames alone in openat(2) do not
     trigger this. Magic-link traversal which implies a vfsmount jump is
     also blocked (though magic-link jumps on the same vfsmount are
     permitted).

  LOOKUP_NO_MAGICLINKS:

     Blocks resolution through /proc/$pid/fd-style links. This is done
     by blocking the usage of nd_jump_link() during resolution in a
     filesystem. The term "magic-links" is used to match with the only
     reference to these links in Documentation/, but I'm happy to change
     the name.

     It should be noted that this is different to the scope of
     ~LOOKUP_FOLLOW in that it applies to all path components. However,
     you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
     will *not* fail (assuming that no parent component was a
     magic-link), and you will have an fd for the magic-link.

     In order to correctly detect magic-links, the introduction of a new
     LOOKUP_MAGICLINK_JUMPED state flag was required.

  LOOKUP_BENEATH:

     Disallows escapes to outside the starting dirfd's
     tree, using techniques such as ".." or absolute links. Absolute
     paths in openat(2) are also disallowed.

     Conceptually this flag is to ensure you "stay below" a certain
     point in the filesystem tree -- but this requires some additional
     to protect against various races that would allow escape using
     "..".

     Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
     can trivially beam you around the filesystem (breaking the
     protection). In future, there might be similar safety checks done
     as in LOOKUP_IN_ROOT, but that requires more discussion.

  In addition, two new flags are added that expand on the above ideas:

  LOOKUP_NO_SYMLINKS:

     Does what it says on the tin. No symlink resolution is allowed at
     all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
     can still be used with NOFOLLOW to open an fd for the symlink as
     long as no parent path had a symlink component.

  LOOKUP_IN_ROOT:

     This is an extension of LOOKUP_BENEATH that, rather than blocking
     attempts to move past the root, forces all such movements to be
     scoped to the starting point. This provides chroot(2)-like
     protection but without the cost of a chroot(2) for each filesystem
     operation, as well as being safe against race attacks that
     chroot(2) is not.

     If a race is detected (as with LOOKUP_BENEATH) then an error is
     generated, and similar to LOOKUP_BENEATH it is not permitted to
     cross magic-links with LOOKUP_IN_ROOT.

     The primary need for this is from container runtimes, which
     currently need to do symlink scoping in userspace[7] when opening
     paths in a potentially malicious container.

     There is a long list of CVEs that could have bene mitigated by
     having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
     CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
     few).

  In order to make all of the above more usable, I'm working on
  libpathrs[8] which is a C-friendly library for safe path resolution.
  It features a userspace-emulated backend if the kernel doesn't support
  openat2(2). Hopefully we can get userspace to switch to using it, and
  thus get openat2(2) support for free once it's ready.

  Future work would include implementing things like
  RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
  programs to be sure they don't hit DoSes though stale NFS handles)"

* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Documentation: path-lookup: include new LOOKUP flags
  selftests: add openat2(2) selftests
  open: introduce openat2(2) syscall
  namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
  namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
  namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
  namei: LOOKUP_NO_XDEV: block mountpoint crossing
  namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
  namei: LOOKUP_NO_SYMLINKS: block symlink resolution
  namei: allow set_root() to produce errors
  namei: allow nd_jump_link() to produce errors
  nsfs: clean-up ns_get_path() signature to return int
  namei: only return -ECHILD from follow_dotdot_rcu()

5 years agoMerge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 29 Jan 2020 19:04:49 +0000 (11:04 -0800)]
Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu

Pull RCU warning removal from Paul McKenney:
 "A single commit that fixes an embarrassing bug discussed here:

      https://lore.kernel.org/lkml/20200125131425[email protected]/

  which apparently also affects smaller systems"

[ This was sent to Ingo, but since I see the issue on the laptop I use for
  testing during the merge window, I'm doing the pull directly     - Linus ]

* 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu: Forgive slow expedited grace periods at boot time

5 years agoMerge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 29 Jan 2020 19:03:21 +0000 (11:03 -0800)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core fixes from Ingo Molnar:
 "Three objtool fixes, plus marking SFI as obsolete"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Skip samples subdirectory
  objtool: Fix ARCH=x86_64 build error
  objtool: Silence build output
  MAINTAINERS: Mark simple firmware interface (SFI) obsolete

5 years agoMerge tag 'char-misc-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Wed, 29 Jan 2020 18:35:54 +0000 (10:35 -0800)]
Merge tag 'char-misc-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the big char/misc/whatever driver changes for 5.6-rc1

  Included in here are loads of things from a variety of different
  driver subsystems:
   - soundwire updates
   - binder updates
   - nvmem updates
   - firmware drivers updates
   - extcon driver updates
   - various misc driver updates
   - fpga driver updates
   - interconnect subsystem and driver updates
   - bus driver updates
   - uio driver updates
   - mei driver updates
   - w1 driver cleanups
   - various other small driver updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (86 commits)
  mei: me: add jasper point DID
  char: hpet: Use flexible-array member
  binder: fix log spam for existing debugfs file creation.
  mei: me: add comet point (lake) H device ids
  nvmem: add QTI SDAM driver
  dt-bindings: nvmem: add binding for QTI SPMI SDAM
  dt-bindings: imx-ocotp: Add i.MX8MP compatible
  dt-bindings: soundwire: fix example
  soundwire: cadence: fix kernel-doc parameter descriptions
  soundwire: intel: report slave_ids for each link to SOF driver
  siox: Use the correct style for SPDX License Identifier
  w1: omap-hdq: Simplify driver with PM runtime autosuspend
  firmware: stratix10-svc: Remove unneeded semicolon
  firmware: google: Probe for a GSMI handler in firmware
  firmware: google: Unregister driver_info on failure and exit in gsmi
  firmware: google: Release devices before unregistering the bus
  slimbus: qcom: add missed clk_disable_unprepare in remove
  slimbus: Use the correct style for SPDX License Identifier
  slimbus: qcom-ngd-ctrl: Use dma_request_chan() instead dma_request_slave_channel()
  dt-bindings: SLIMBus: add slim devices optional properties
  ...

5 years agoMerge tag 'driver-core-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 29 Jan 2020 18:18:20 +0000 (10:18 -0800)]
Merge tag 'driver-core-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is a small set of changes for 5.6-rc1 for the driver core and
  some firmware subsystem changes.

  Included in here are:
   - device.h splitup like you asked for months ago
   - devtmpfs minor cleanups
   - firmware core minor changes
   - debugfs fix for lockdown mode
   - kernfs cleanup fix
   - cpu topology minor fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (22 commits)
  firmware: Rename FW_OPT_NOFALLBACK to FW_OPT_NOFALLBACK_SYSFS
  devtmpfs: factor out common tail of devtmpfs_{create,delete}_node
  devtmpfs: initify a bit
  devtmpfs: simplify initialization of mount_dev
  devtmpfs: factor out setup part of devtmpfsd()
  devtmpfs: fix theoretical stale pointer deref in devtmpfsd()
  driver core: platform: fix u32 greater or equal to zero comparison
  cpu-topology: Don't error on more than CONFIG_NR_CPUS CPUs in device tree
  debugfs: Return -EPERM when locked down
  driver core: Print device when resources present in really_probe()
  driver core: Fix test_async_driver_probe if NUMA is disabled
  driver core: platform: Prevent resouce overflow from causing infinite loops
  fs/kernfs/dir.c: Clean code by removing always true condition
  component: do not dereference opaque pointer in debugfs
  drivers/component: remove modular code
  debugfs: Fix warnings when building documentation
  device.h: move 'struct driver' stuff out to device/driver.h
  device.h: move 'struct class' stuff out to device/class.h
  device.h: move 'struct bus' stuff out to device/bus.h
  device.h: move dev_printk()-like functions to dev_printk.h
  ...

5 years agoMerge tag 'staging-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Wed, 29 Jan 2020 18:15:11 +0000 (10:15 -0800)]
Merge tag 'staging-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO updates from Greg KH:
 "Here is the big staging/iio driver patches for 5.6-rc1

  Included in here are:

   - lots of new IIO drivers and updates for that subsystem

   - the usual huge quantity of minor cleanups for staging drivers

   - removal of the following staging drivers:
       - isdn/avm
       - isdn/gigaset
       - isdn/hysdn
       - octeon-usb
       - octeon ethernet

  Overall we deleted far more lines than we added, removing over 40k of
  old and obsolete driver code.

  All of these changes have been in linux-next for a while with no
  reported issues"

* tag 'staging-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (353 commits)
  staging: most: usb: check for NULL device
  staging: next: configfs: fix release link
  staging: most: core: fix logging messages
  staging: most: core: remove container struct
  staging: most: remove struct device core driver
  staging: most: core: drop device reference
  staging: most: remove device from interface structure
  staging: comedi: drivers: fix spelling mistake "to" -> "too"
  staging: exfat: remove fs_func struct.
  staging: wilc1000: avoid mutex unlock without lock in wilc_wlan_handle_txq()
  staging: wilc1000: return zero on success and non-zero on function failure
  staging: axis-fifo: replace spinlock with mutex
  staging: wilc1000: remove unused code prior to throughput enhancement in SPI
  staging: wilc1000: added 'wilc_' prefix for 'struct assoc_resp' name
  staging: wilc1000: move firmware API struct's to separate header file
  staging: wilc1000: remove use of infinite loop conditions
  staging: kpc2000: rename variables with kpc namespace
  staging: vt6656: Remove memory buffer from vnt_download_firmware.
  staging: vt6656: Just check NEWRSR_DECRYPTOK for RX_FLAG_DECRYPTED.
  staging: vt6656: Use vnt_rx_tail struct for tail variables.
  ...

5 years agoMerge tag 'tty-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Wed, 29 Jan 2020 18:13:27 +0000 (10:13 -0800)]
Merge tag 'tty-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here are the big set of tty and serial driver updates for 5.6-rc1

  Included in here are:
   - dummy_con cleanups (touches lots of arch code)
   - sysrq logic cleanups (touches lots of serial drivers)
   - samsung driver fixes (wasn't really being built)
   - conmakeshash move to tty subdir out of scripts
   - lots of small tty/serial driver updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (140 commits)
  tty: n_hdlc: Use flexible-array member and struct_size() helper
  tty: baudrate: SPARC supports few more baud rates
  tty: baudrate: Synchronise baud_table[] and baud_bits[]
  tty: serial: meson_uart: Add support for kernel debugger
  serial: imx: fix a race condition in receive path
  serial: 8250_bcm2835aux: Document struct bcm2835aux_data
  serial: 8250_bcm2835aux: Use generic remapping code
  serial: 8250_bcm2835aux: Allocate uart_8250_port on stack
  serial: 8250_bcm2835aux: Suppress register_port error on -EPROBE_DEFER
  serial: 8250_bcm2835aux: Suppress clk_get error on -EPROBE_DEFER
  serial: 8250_bcm2835aux: Fix line mismatch on driver unbind
  serial_core: Remove unused member in uart_port
  vt: Correct comment documenting do_take_over_console()
  vt: Delete comment referencing non-existent unbind_con_driver()
  arch/xtensa/setup: Drop dummy_con initialization
  arch/x86/setup: Drop dummy_con initialization
  arch/unicore32/setup: Drop dummy_con initialization
  arch/sparc/setup: Drop dummy_con initialization
  arch/sh/setup: Drop dummy_con initialization
  arch/s390/setup: Drop dummy_con initialization
  ...

5 years agoMerge tag 'usb-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Wed, 29 Jan 2020 18:09:44 +0000 (10:09 -0800)]
Merge tag 'usb-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB/Thunderbolt/PHY driver updates from Greg KH:
 "Here is the big USB and Thunderbolt and PHY driver updates for
  5.6-rc1.

  With the advent of USB4, "Thunderbolt" has really become USB4, so the
  renaming of the Kconfig option and starting to share subsystem code
  has begun, hence both subsystems coming in through the same tree here.

  PHY driver updates also touched USB drivers, so that is coming in
  through here as well.

  Major stuff included in here are:
   - USB 4 initial support added (i.e. Thunderbolt)
   - musb driver updates
   - USB gadget driver updates
   - PHY driver updates
   - USB PHY driver updates
   - lots of USB serial stuff fixed up
   - USB typec updates
   - USB-IP fixes
   - lots of other smaller USB driver updates

  All of these have been in linux-next for a while now (the usb-serial
  tree is already tested in linux-next on its own before merged into
  here), with no reported issues"

[ Removed an incorrect compile test enablement for PHY_EXYNOS5250_SATA
  that causes configuration warnings    - Linus ]

* tag 'usb-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (207 commits)
  Doc: ABI: add usb charger uevent
  usb: phy: show USB charger type for user
  usb: cdns3: fix spelling mistake and rework grammar in text
  usb: phy: phy-gpio-vbus-usb: Convert to GPIO descriptors
  USB: serial: cyberjack: fix spelling mistake "To" -> "Too"
  USB: serial: ir-usb: simplify endpoint check
  USB: serial: ir-usb: make set_termios synchronous
  USB: serial: ir-usb: fix IrLAP framing
  USB: serial: ir-usb: fix link-speed handling
  USB: serial: ir-usb: add missing endpoint sanity check
  usb: typec: fusb302: fix "op-sink-microwatt" default that was in mW
  usb: typec: wcove: fix "op-sink-microwatt" default that was in mW
  usb: dwc3: pci: add ID for the Intel Comet Lake -V variant
  usb: typec: tcpci: mask event interrupts when remove driver
  usb: host: xhci-tegra: set MODULE_FIRMWARE for tegra186
  usb: chipidea: add inline for ci_hdrc_host_driver_init if host is not defined
  usb: chipidea: handle single role for usb role class
  usb: musb: fix spelling mistake: "periperal" -> "peripheral"
  phy: ti: j721e-wiz: Fix build error without CONFIG_OF_ADDRESS
  USB: usbfs: Always unlink URBs in reverse order
  ...

5 years agoMerge tag 'pinctrl-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Wed, 29 Jan 2020 17:51:36 +0000 (09:51 -0800)]
Merge tag 'pinctrl-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control updates from Linus Walleij:
 "This is the bulk of pin control changes, nothing too exciting about
  this.

  Some changes hit arch/sh and arch/arm but are well isolated and
  acknowledged by the respective arch maintainers.

  Core changes:

   - Dropped the chained IRQ setup callback into GPIOLIB as we got rid
     of the last users of that in this changeset.

  New drivers:

   - New driver for Ingenic X1830.

   - New driver for Freescale i.MX8MP.

  Driver enhancements:

   - Fix all remaining Intel drivers to pass their IRQ chips along with
     the GPIO chips.

   - Intel Baytrail allocates its irqchip dynamically.

   - Intel Lynxpoint is thoroughly rewritten and modernized.

   - Aspeed AST2600 pin muxing and configuration is much improved.

   - Qualcomm SC7180 functions are updated and wakeup interrupt map is
     provided.

   - A whole slew of Renesas SH-PFC cleanups and improvements.

   - Fix up the Intel DT bindings to use the generic YAML DT bindings
     schema (a first user of this)"

* tag 'pinctrl-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (99 commits)
  pinctrl: madera: Remove extra blank line
  pinctrl: qcom: Don't lock around irq_set_irq_wake()
  pinctrl: mvebu: armada-37xx: use use platform api
  gpio: Drop the chained IRQ handler assign function
  pinctrl: freescale: Add i.MX8MP pinctrl driver support
  dt-bindings: imx: Add pinctrl binding doc for i.MX8MP
  pinctrl: tigerlake: Tiger Lake uses _HID enumeration
  pinctrl: sunrisepoint: Add Coffee Lake-S ACPI ID
  pinctrl: iproc: Use platform_get_irq_optional() to avoid error message
  pinctrl: dt-bindings: Fix some errors in the lgm and pinmux schema
  pinctrl: intel: Pass irqchip when adding gpiochip
  pinctrl: intel: Add GPIO <-> pin mapping ranges via callback
  pinctrl: baytrail: Replace WARN with dev_info_once when setting direct-irq pin to output
  pinctrl: baytrail: Do not clear IRQ flags on direct-irq enabled pins
  pinctrl: sunrisepoint: Add missing Interrupt Status register offset
  pinctrl: sh-pfc: Split R-Car H3 support in two independent drivers
  pinctrl: artpec6: fix __iomem on reg in set
  pinctrl: ingenic: Use devm_platform_ioremap_resource()
  pinctrl: ingenic: Factorize irq_set_type function
  pinctrl: ingenic: Remove duplicated ingenic_chip_info structures
  ...

5 years agoMerge tag 'gpio-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux...
Linus Torvalds [Wed, 29 Jan 2020 17:43:39 +0000 (09:43 -0800)]
Merge tag 'gpio-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO updates from Linus Walleij:
 "This is the bulk of GPIO changes for the v5.6 kernel cycle.

  This is a pretty calm cycle so far, nothing special going on really.
  Some more changes will come in from the irqchip and pin control trees.

  I also deleted an orphan include file for FMC that was dangling since
  subsystem was removed.

  Core changes:

   - Document the usecases for the kernelspace vs userspace handling of
     GPIOs.

   - Handle MSI (message signalled interrupts) properly in the core
     hierarchical irqdomain code.

   - Fix a rare race condition while initializing the descriptor array.

  New drivers:

   - Xylon LogiCVC GPIO driver.

   - WDC934x GPIO controller driver.

  Driver improvements:

   - Implemented suspend/resume in the Tegra driver.

   - MPC8xx edge detection fixup.

   - Properly convert ThunderX to use hierarchical irqdomain with
     GPIOLIB_IRQCHIP on top of the revert of the previous buggy
     switchover. This time it works (hopefully).

  Misc:

   - Drop a FMC remnant file <linux/ipmi-fru.h>

   - A slew of fixes"

* tag 'gpio-v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (48 commits)
  MAINTAINERS: Replace Tien Hock Loh as Altera PIO maintainer
  gpiolib: hold gpio devices lock until ->descs array is initialised
  gpio: aspeed-sgpio: fixed typos
  gpio: mvebu: clear irq in edge cause register before unmask edge irq
  gpiolib: Lower verbosity when allocating hierarchy irq
  gpiolib: Remove duplicated function gpio_do_set_config()
  gpio: Fix the no return statement warning
  gpio: wcd934x: Add support to wcd934x gpio controller
  gpiolib: remove set but not used variable 'config'
  gpio: vx855: fixed a typo
  gpio: mockup: sort headers alphabetically
  gpio: mockup: update the license tag
  gpio: Remove the unused flags
  gpiolib: Set lockdep class for hierarchical irq domains
  gpio: thunderx: Switch to GPIOLIB_IRQCHIP
  gpiolib: Add the support for the msi parent domain
  gpiolib: Add support for the irqdomain which doesn't use irq_fwspec as arg
  gpio: Add use guidance documentation
  dt-bindings: gpio: wcd934x: Add bindings for gpio
  gpio: altera: change to platform_get_irq_optional to avoid false-positive error
  ...

5 years agoMerge branch 'mptcp-fix-sockopt-crash-and-lockdep-splats'
David S. Miller [Wed, 29 Jan 2020 16:45:20 +0000 (17:45 +0100)]
Merge branch 'mptcp-fix-sockopt-crash-and-lockdep-splats'

Florian Westphal says:

====================
mptcp: fix sockopt crash and lockdep splats

Christoph Paasch reported a few bugs and lockdep splats triggered by
syzkaller.

One patch fixes a crash in set/getsockopt.
Two patches fix lockdep splats related to the order in which RTNL
and socket lock are taken.

Last patch fixes out-of-bounds access when TCP syncookies are used.

Change since last iteration on mptcp-list:
 - add needed refcount in patch 2
 - call tcp_get/setsockopt directly in patch 2

 Other patches unchanged except minor amends to commit messages.
====================

Signed-off-by: David S. Miller <[email protected]>
5 years agomptcp: handle tcp fallback when using syn cookies
Florian Westphal [Wed, 29 Jan 2020 14:54:46 +0000 (15:54 +0100)]
mptcp: handle tcp fallback when using syn cookies

We can't deal with syncookie mode yet, the syncookie rx path will create
tcp reqsk, i.e. we get OOB access because we treat tcp reqsk as mptcp reqsk one:

TCP: SYN flooding on port 20002. Sending cookies.
BUG: KASAN: slab-out-of-bounds in subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191
Read of size 1 at addr ffff8881167bc148 by task syz-executor099/2120
 subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191
 tcp_get_cookie_sock+0xcf/0x520 net/ipv4/syncookies.c:209
 cookie_v6_check+0x15a5/0x1e90 net/ipv6/syncookies.c:252
 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1123 [inline]
 [..]

Bug can be reproduced via "sysctl net.ipv4.tcp_syncookies=2".

Note that MPTCP should work with syncookies (4th ack would carry needed
state), but it appears better to sort that out in -next so do tcp
fallback for now.

I removed the MPTCP ifdef for tcp_rsk "is_mptcp" member because
if (IS_ENABLED()) is easier to read than "#ifdef IS_ENABLED()/#endif" pair.

Cc: Eric Dumazet <[email protected]>
Fixes: cec37a6e41aae7bf ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <[email protected]>
Tested-by: Christoph Paasch <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
5 years agomptcp: avoid a lockdep splat when mcast group was joined
Florian Westphal [Wed, 29 Jan 2020 14:54:45 +0000 (15:54 +0100)]
mptcp: avoid a lockdep splat when mcast group was joined

syzbot triggered following lockdep splat:

ffffffff82d2cd40 (rtnl_mutex){+.+.}, at: ip_mc_drop_socket+0x52/0x180
but task is already holding lock:
ffff8881187a2310 (sk_lock-AF_INET){+.+.}, at: mptcp_close+0x18/0x30

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:
-> #1 (sk_lock-AF_INET){+.+.}:
       lock_acquire+0xee/0x230
       lock_sock_nested+0x89/0xc0
       do_ip_setsockopt.isra.0+0x335/0x22f0
       ip_setsockopt+0x35/0x60
       tcp_setsockopt+0x5d/0x90
       __sys_setsockopt+0xf3/0x190
       __x64_sys_setsockopt+0x61/0x70
       do_syscall_64+0x72/0x300
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
-> #0 (rtnl_mutex){+.+.}:
       check_prevs_add+0x2b7/0x1210
       __lock_acquire+0x10b6/0x1400
       lock_acquire+0xee/0x230
       __mutex_lock+0x120/0xc70
       ip_mc_drop_socket+0x52/0x180
       inet_release+0x36/0xe0
       __sock_release+0xfd/0x130
       __mptcp_close+0xa8/0x1f0
       inet_release+0x7f/0xe0
       __sock_release+0x69/0x130
       sock_close+0x18/0x20
       __fput+0x179/0x400
       task_work_run+0xd5/0x110
       do_exit+0x685/0x1510
       do_group_exit+0x7e/0x170
       __x64_sys_exit_group+0x28/0x30
       do_syscall_64+0x72/0x300
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

The trigger is:
  socket(AF_INET, SOCK_STREAM, 0x106 /* IPPROTO_MPTCP */) = 4
  setsockopt(4, SOL_IP, MCAST_JOIN_GROUP, {gr_interface=7, gr_group={sa_family=AF_INET, sin_port=htons(20003), sin_addr=inet_addr("224.0.0.2")}}, 136) = 0
  exit(0)

Which results in a call to rtnl_lock while we are holding
the parent mptcp socket lock via
mptcp_close -> lock_sock(msk) -> inet_release -> ip_mc_drop_socket -> rtnl_lock().

>From lockdep point of view we thus have both
'rtnl_lock; lock_sock' and 'lock_sock; rtnl_lock'.

Fix this by stealing the msk conn_list and doing the subflow close
without holding the msk lock.

Fixes: cec37a6e41aae7bf ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
This page took 0.201241 seconds and 4 git commands to generate.