Git Repo - linux.git/log

libceph: don't set global_id until we get an auth ticket

With the introduction of enforcing mode, setting global_id as soon
as we get it in the first MAuth reply will result in EACCES if the
connection is reset before we get the second MAuth reply containing
an auth ticket -- because on retry we would attempt to reclaim that
global_id with no auth ticket at hand.

Neither ceph_auth_client nor ceph_mon_client depend on global_id
being set ealy, so just delay the setting until we get and process
the second MAuth reply. While at it, complain if the monitor sends
a zero global_id or changes our global_id as the session is likely
to fail after that.

Cc: [email protected] # needs backporting for < 5.11
Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Sage Weil <[email protected]>

libceph: bump CephXAuthenticate encoding version

A dummy v3 encoding (exactly the same as v2) was introduced so that
the monitors can distinguish broken clients that may not include their
auth ticket in CEPHX_GET_AUTH_SESSION_KEY request on reconnects, thus
failing to prove previous possession of their global_id (one part of
CVE-2021-20288).

The kernel client has always included its auth ticket, so it is
compatible with enforcing mode as is. However we want to bump the
encoding version to avoid having to authenticate twice on the initial
connect -- all legacy (CephXAuthenticate < v3) are now forced do so in
order to expose insecure global_id reclaim.

Marking for stable since at least for 5.11 and 5.12 it is trivial
(v2 -> v3).

Cc: [email protected] # 5.11+
URL: https://tracker.ceph.com/issues/50452
Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Sage Weil <[email protected]>

ceph: don't allow access to MDS-private inodes

The MDS reserves a set of inodes for its own usage, and these should
never be accessible to clients. Add a new helper to vet a proposed
inode number against that range, and complain loudly and refuse to
create or look it up if it's in it.

Also, ensure that the MDS doesn't try to delegate inodes that are in
that range or lower. Print a warning if it does, and don't save the
range in the xarray.

URL: https://tracker.ceph.com/issues/49922
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Xiubo Li <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: fix up some bare fetches of i_size

We need to use i_size_read(), which properly handles the torn read
case on 32-bit arches.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: convert some PAGE_SIZE invocations to thp_size()

Start preparing to allow the use of THPs in the pagecache with ceph by
making it use thp_size() in lieu of PAGE_SIZE in the appropriate places.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: support getting ceph.dir.rsnaps vxattr

Add support for grabbing the rsnaps value out of the inode info in
traces, and exposing that via ceph.dir.rsnaps xattr.

Signed-off-by: Yanhu Cao <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: drop pinned_page parameter from ceph_get_caps

All of the existing callers that don't set this to NULL just drop the
page reference at some arbitrary point later in processing. There's no
point in keeping a page reference that we don't use, so just drop the
reference immediately after checking the Uptodate flag.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: fix inode leak on getattr error in __fh_to_dentry

Fixes: 878dabb64117 ("ceph: don't return -ESTALE if there's still an open file")
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Xiubo Li <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: only check pool permissions for regular files

There is no need to do a ceph_pool_perm_check() on anything that isn't a
regular file, as the MDS is what handles talking to the OSD in those
cases. Just return 0 if it's not a regular file.

Reported-by: Luis Henriques <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Xiubo Li <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: send opened files/pinned caps/opened inodes metrics to MDS daemon

For the old ceph version, if it received this metric info, it will just
ignore them.

URL: https://tracker.ceph.com/issues/46866
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: avoid counting the same request twice or more

If the request will retry, skip updating the latency metric.

Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: rename the metric helpers

Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: fix kerneldoc copypasta over ceph_start_io_direct

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: use attach/detach_page_private for tracking snap context

There is some ambiguity around the use of PagePrivate. It's
generally expected in core code that if PagePrivate is set then
you have a reference to it. It's not clear that ceph always
does (and I believe it may not).

Change ceph to use attach/detach_page_private so that we keep a
reference to the page until the snap context is detached.

Link: https://lore.kernel.org/ceph-devel/[email protected]/T/#mf29e5abbb0ec8035cde0de30778690de7d956f84
Reported-by: David Howells <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: don't use d_add in ceph_handle_snapdir

It's possible ceph_get_snapdir could end up finding a (disconnected)
inode that already exists in the cache. Change the prototype for
ceph_handle_snapdir to return a dentry pointer and have it use
d_splice_alias so we don't end up with an aliased dentry in the cache.

Reported-by: Al Viro <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: don't clobber i_snap_caps on non-I_NEW inode

We want the snapdir to mirror the non-snapped directory's attributes for
most things, but i_snap_caps represents the caps granted on the snapshot
directory by the MDS itself. A misbehaving MDS could issue different
caps for the snapdir and we lose them here.

Only reset i_snap_caps when the inode is I_NEW. Also, move the setting
of i_op and i_fop inside the if block since they should never change
anyway.

Reported-by: Al Viro <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple
of warnings by explicitly adding a break and a goto statements instead
of just letting the code fall through to the next case.

URL: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: convert ceph_readpages to ceph_readahead

Convert ceph_readpages to ceph_readahead and make it use
netfs_readahead. With this we can rip out a lot of the old
readpage/readpages infrastructure.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: convert ceph_write_begin to netfs_write_begin

Convert ceph_write_begin to use the netfs_write_begin helper. Most of
the ops we need for it are already in place from the readpage conversion
but we do add a new check_write_begin op since ceph needs to be able to
vet whether there is an incompatible writeback already in flight before
reading in the page.

With this, we can also remove the old ceph_do_readpage helper.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: convert ceph_readpage to netfs_readpage

Have the ceph KConfig select NETFS_SUPPORT. Add a new netfs ops
structure and the operations for it. Convert ceph_readpage to use
the new netfs_readpage helper.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: fix fscache invalidation

Ensure that we invalidate the fscache whenever we invalidate the
pagecache.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: rework PageFsCache handling

With the new fscache API, the PageFsCache bit now indicates that the
page is being written to the cache and shouldn't be modified or released
until it's finished.

Change releasepage and invalidatepage to wait on that bit before
returning.

Also define FSCACHE_USE_NEW_IO_API so that we opt into the new fscache
API.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: rip out old fscache readpage handling

With the new netfs read helper functions, we won't need a lot of this
infrastructure as it handles the pagecache pages itself. Rip out the
read handling for now, and much of the old infrastructure that deals in
individual pages.

The cookie handling is mostly unchanged, however.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

Merge remote-tracking branch 'dhowells/netfs-lib'

Pick up David Howells' netfs helper library and the new fscache API.

Signed-off-by: Ilya Dryomov <[email protected]>

netfs: Miscellaneous fixes

Fix some miscellaneous things in the new netfs lib[1]:

(1) The kerneldoc for netfs_readpage() shouldn't say netfs_page().

(2) netfs_readpage() can get an integer overflow on 32-bit when it
     multiplies page_index(page) by PAGE_SIZE.  It should use
     page_file_offset() instead.

(3) netfs_write_begin() should use page_offset() to avoid the same
     overflow.

Note that netfs_readpage() needs to use page_file_offset() rather than
page_offset() as it may see swap-over-NFS.

Reported-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Link: https://lore.kernel.org/r/161789062190.6155.12711584466338493050.stgit@warthog.procyon.org.uk/

iov_iter: Four fixes for ITER_XARRAY

Fix four things[1] in the patch that adds ITER_XARRAY[2]:

(1) Remove the address_space struct predeclaration.  This is a holdover
     from when it was ITER_MAPPING.

(2) Fix _copy_mc_to_iter() so that the xarray segment updates count and
     iov_offset in the iterator before returning.

(3) Fix iov_iter_alignment() to not loop in the xarray case.  Because the
     middle pages are all whole pages, only the end pages need be
     considered - and this can be reduced to just looking at the start
     position in the xarray and the iteration size.

(4) Fix iov_iter_advance() to limit the size of the advance to no more
     than the remaining iteration size.

Reported-by: Al Viro <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/161918448151.3145707.11541538916600921083.stgit@warthog.procyon.org.uk

Linux 5.12

Merge tag 'perf-tools-fixes-for-v5.12-2021-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

- Fix potential NULL pointer dereference in the auxtrace option parser

- Fix access to PID in an array when setting a PID filter in 'perf ftrace'

- Fix error return code in the 'perf data' tool and in maps__clone(),
   found using a static analysis tool from Huawei

* tag 'perf-tools-fixes-for-v5.12-2021-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf map: Fix error return code in maps__clone()
  perf ftrace: Fix access to pid in array when setting a pid filter
  perf auxtrace: Fix potential NULL pointer dereference
  perf data: Fix error return code in perf_data__create_dir()

Merge tag 'perf_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf fixes from Borislav Petkov:

- Fix Broadwell Xeon's stepping in the PEBS isolation table of CPUs

- Fix a panic when initializing perf uncore machinery on Haswell and
   Broadwell servers

* tag 'perf_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]
  perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3

Merge tag 'locking_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Borislav Petkov:
"Fix ordering in the queued writer lock's slowpath"

* tag 'locking_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/qrwlock: Fix ordering in queued_write_lock_slowpath()

Merge tag 'sched_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:
"Fix a typo in a macro ifdeffery"

* tag 'sched_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
preempt/dynamic: Fix typo in macro conditional statement

Merge tag 'x86_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:
"Fix an out-of-bounds memory access when setting up a crash kernel with
kexec"

* tag 'x86_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fix from Paolo Bonzini:
"Fix SRCU bug introduced in the merge window"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/xen: Take srcu lock when accessing kvm_memslots()

Revert "net/rds: Avoid potential use after free in rds_send_remove_from_sock"

This reverts commit 0c85a7e87465f2d4cbc768e245f4f45b2f299b05.

The games with 'rm' are on (two separate instances) of a local variable,
and make no difference.

Quoting Aditya Pakki:
"I was the author of the patch and it was the cause of the giant UMN
  revert.

  The patch is garbage and I was unaware of the steps involved in
  retracting it. I *believed* the maintainers would pull it, given it
  was already under Greg's list. The patch does not introduce any bugs
  but is pointless and is stupid. I accept my incompetence and for not
  requesting a revert earlier."

Link: https://lwn.net/Articles/854319/
Requested-by: Aditya Pakki <[email protected]>
Cc: Santosh Shilimkar <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'pinctrl-v5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
"Late pin control fixes, would have been in the main pull request
  normally but hey I got lucky and we got another week to polish up
  v5.12 so here we go.

  One driver fix and one making the core debugfs work:

   - Fix the number of pins in the community of the Intel Lewisburg SoC

   - Show pin numbers for controllers with base = 0 in the new debugfs
     feature"

* tag 'pinctrl-v5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: core: Show pin numbers for the controllers with base = 0
  pinctrl: lewisburg: Update number of pins in community

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"5 patches.

  Subsystems affected by this patch series: coda, overlayfs, and
  mm (pagecache and memcg)"

* emailed patches from Andrew Morton <[email protected]>:
  tools/cgroup/slabinfo.py: updated to work on current kernel
  mm/filemap: fix mapping_seek_hole_data on THP & 32-bit
  mm/filemap: fix find_lock_entries hang on 32-bit THP
  ovl: fix reference counting in ovl_mmap error path
  coda: fix reference counting in coda_file_mmap error path

Merge tag 'block-5.12-2021-04-23' of git://git.kernel.dk/linux-block

Pull block fix from Jens Axboe:
"A single fix for a behavioral regression in this series, when
re-reading the partition table with partitions open"

* tag 'block-5.12-2021-04-23' of git://git.kernel.dk/linux-block:
block: return -EBUSY when there are open partitions in blkdev_reread_part

tools/cgroup/slabinfo.py: updated to work on current kernel

slabinfo.py script does not work with actual kernel version.

First, it was unable to recognise SLUB susbsytem, and when I specified
it manually it failed again with

  AttributeError: 'struct page' has no member 'obj_cgroups'

.. and then again with

  File "tools/cgroup/memcg_slabinfo.py", line 221, in main
    memcg.kmem_caches.address_of_(),
  AttributeError: 'struct mem_cgroup' has no member 'kmem_caches'

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vasily Averin <[email protected]>
Tested-by: Roman Gushchin <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/filemap: fix mapping_seek_hole_data on THP & 32-bit

No problem on 64-bit, or without huge pages, but xfstests generic/285
and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs, and on
32-bit architectures, with the new mapping_seek_hole_data(). Several
different bugs turned out to need fixing.

u64 cast to stop losing bits when converting unsigned long to loff_t
(and let's use shifts throughout, rather than mixed with * and /).

Use round_up() when advancing pos, to stop assuming that pos was already
THP-aligned when advancing it by THP-size. (This use of round_up()
assumes that any THP has THP-aligned index: true at present and true
going forward, but could be recoded to avoid the assumption.)

Use xas_set() when iterating away from a THP, so that xa_index stays in
synch with start, instead of drifting away to return bogus offset.

Check start against end to avoid wrapping 32-bit xa_index to 0 (and to
handle these additional cases, seek_data or not, it's easier to break
the loop than goto: so rearrange exit from the function).

[[email protected]: remove unneeded u64 casts, per Matthew]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 41139aa4c3a3 ("mm/filemap: add mapping_seek_hole_data")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: William Kucharski <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/filemap: fix find_lock_entries hang on 32-bit THP

No problem on 64-bit, or without huge pages, but xfstests generic/308
hung uninterruptibly on 32-bit huge tmpfs.

Since commit 0cc3b0ec23ce ("Clarify (and fix) in 4.13 MAX_LFS_FILESIZE
macros"), MAX_LFS_FILESIZE is only a PAGE_SIZE away from wrapping 32-bit
xa_index to 0, so the new find_lock_entries() has to be extra careful
when handling a THP.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5c211ba29deb ("mm: add and use find_lock_entries")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: William Kucharski <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Yang Shi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ovl: fix reference counting in ovl_mmap error path

mmap_region() now calls fput() on the vma->vm_file.

Fix this by using vma_set_file() so it doesn't need to be handled
manually here any more.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1527f926fd04 ("mm: mmap: fix fput in error path v2")
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Cc: Jan Harkes <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: <[email protected]> [5.11+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

coda: fix reference counting in coda_file_mmap error path

mmap_region() now calls fput() on the vma->vm_file.

So we need to drop the extra reference on the coda file instead of the
host file.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1527f926fd04 ("mm: mmap: fix fput in error path v2")
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Acked-by: Jan Harkes <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: <[email protected]> [5.11+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

KVM: x86/xen: Take srcu lock when accessing kvm_memslots()

kvm_memslots() will be called by kvm_write_guest_offset_cached() so we should
take the srcu lock. Let's pull the srcu lock operation from kvm_steal_time_set_preempted()
again to fix xen part.

Fixes: 30b5c851af7 ("KVM: x86/xen: Add support for vCPU runstate information")
Signed-off-by: Wanpeng Li <[email protected]>
Message-Id: <1619166200 [email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge tag 'arm-fixes-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"These should be the final fixes for v5.12.

  There is one fix for SD card detection on one Allwinner board, and a
  few fixes for the Tegra platform that I had already queued up for
  v5.13 due to a communication problem. This addresses MMC device
  ordering on multiple machines, audio support on Jetson AGX Xavier and
  suspend/resume on Jetson TX2"

* tag 'arm-fixes-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS
  arm64: tegra: Move clocks from RT5658 endpoint to device node
  arm64: tegra: Fix mmc0 alias for Jetson Xavier NX
  arm64: tegra: Set fw_devlink=on for Jetson TX2
  arm64: tegra: Add unit-address for ACONNECT on Tegra186

perf map: Fix error return code in maps__clone()

Although 'err' has been initialized to -ENOMEM, but it will be reassigned
by the "err = unwind__prepare_access(...)" statement in the for loop. So
that, the value of 'err' is unknown when map__clone() failed.

Fixes: 6c502584438bda63 ("perf unwind: Call unwind__prepare_access for forked thread")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: zhen lei <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Fix access to pid in array when setting a pid filter

Command 'perf ftrace -v -- ls' fails in s390 (at least 5.12.0rc6).

The root cause is a missing pointer dereference which causes an
array element address to be used as PID.

Fix this by extracting the PID.

Output before:
  # ./perf ftrace -v -- ls
  function_graph tracer is used
  write '-263732416' to tracing/set_ftrace_pid failed: Invalid argument
  failed to set ftrace pid
  #

Output after:
   ./perf ftrace -v -- ls
   function_graph tracer is used
   # tracer: function_graph
   #
   # CPU  DURATION                  FUNCTION CALLS
   # |     |   |                     |   |   |   |
   4)               |  rcu_read_lock_sched_held() {
   4)   0.552 us    |    rcu_lockdep_current_cpu_online();
   4)   6.124 us    |  }

Reported-by: Alexander Schmidt <[email protected]>
Signed-off-by: Thomas Richter <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Sumanth Korikkar <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf auxtrace: Fix potential NULL pointer dereference

In the function auxtrace_parse_snapshot_options(), the callback pointer
"itr->parse_snapshot_options" can be NULL if it has not been set during
the AUX record initialization. This can cause tool crashing if the
callback pointer "itr->parse_snapshot_options" is dereferenced without
performing NULL check.

Add a NULL check for the pointer "itr->parse_snapshot_options" before
invoke the callback.

Fixes: d20031bb63dd6dde ("perf tools: Add AUX area tracing Snapshot Mode")
Signed-off-by: Leo Yan <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tiezhu Yang <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Merge tag 'drm-fixes-2021-04-23' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Just some small i915 and amdgpu fixes this week, should be all until
  you open the merge window.

  amdgpu:
   - Fix gpuvm page table update issue
   - Modifier fixes
   - Register fix for dimgrey cavefish

  i915:
   - GVT's BDW regression fix for cmd parser
   - Fix modesetting in case of unexpected AUX timeouts"

* tag 'drm-fixes-2021-04-23' of git://anongit.freedesktop.org/drm/drm:
  drm/amdgpu: fix GCR_GENERAL_CNTL offset for dimgrey_cavefish
  amd/display: allow non-linear multi-planar formats
  drm/amd/display: Update modifier list for gfx10_3
  drm/amdgpu: reserve fence slot to update page table
  drm/i915: Fix modesetting in case of unexpected AUX timeouts
  drm/i915/gvt: Fix BDW command parser regression

Merge tag 'gpio-fixes-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fix from Bartosz Golaszewski:
"Save and restore the sysconfig register in gpio-omap to fix a
power-management issue"

* tag 'gpio-fixes-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: omap: Save and restore sysconfig

Merge branch 'tegra/dt64' into arm/fixes

arm64: tegra: Device tree fixes for v5.12-rc6

This contains a couple of device tree fixes for the v5.12 release cycle.
These are needed for proper audio support on Jetson AGX Xavier, to boot
the Jetson Xavier NX from an SD card and to be able to suspend/resume
the Jetson TX2.

* tegra/dt64:
  arm64: tegra: Move clocks from RT5658 endpoint to device node
  arm64: tegra: Fix mmc0 alias for Jetson Xavier NX
  arm64: tegra: Set fw_devlink=on for Jetson TX2
  arm64: tegra: Add unit-address for ACONNECT on Tegra186

Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/
Signed-off-by: Arnd Bergmann <[email protected]>

fscache, cachefiles: Add alternate API to use kiocb for read/write to cache

Add an alternate API by which the cache can be accessed through a kiocb,
doing async DIO, rather than using the current API that tells the cache
where all the pages are.

The new API is intended to be used in conjunction with the netfs helper
library.  A filesystem must pick one or the other and not mix them.

Filesystems wanting to use the new API must #define FSCACHE_USE_NEW_IO_API
before #including the header.  This prevents them from continuing to use
the old API at the same time as there are incompatibilities in how the
PG_fscache page bit is used.

Changes:
v6:
- Provide a routine to shape a write so that the start and length can be
   aligned for DIO[3].

v4:
- Use the vfs_iocb_iter_read/write() helpers[1]
- Move initial definition of fscache_begin_read_operation() here.
- Remove a commented-out line[2]
- Combine ki->term_func calls in cachefiles_read_complete()[2].
- Remove explicit NULL initialiser[2].
- Remove extern on func decl[2].
- Put in param names on func decl[2].
- Remove redundant else[2].
- Fill out the kdoc comment for fscache_begin_read_operation().
- Rename fs/fscache/page2.c to io.c to match later patches.

Signed-off-by: David Howells <[email protected]>
Reviewed-and-tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/161781047695.463527.7463536103593997492.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161118142558.1232039.17993829899588971439.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161161037850.2537118.8819808229350326503.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161340402057.1303470.8038373593844486698.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161539545919.286939.14573472672781434757.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161653801477.2770958.10543270629064934227.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161789084517.6155.12799689829859169640.stgit@warthog.procyon.org.uk/

netfs: Add a tracepoint to log failures that would be otherwise unseen

Add a tracepoint to log internal failures (such as cache errors) that we
don't otherwise want to pass back to the netfs.

Signed-off-by: David Howells <[email protected]>
Tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/161781048813.463527.1557000804674707986.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161789082749.6155.15498680577213140870.stgit@warthog.procyon.org.uk/

netfs: Define an interface to talk to a cache

Add an interface to the netfs helper library for reading data from the
cache instead of downloading it from the server and support for writing
data just downloaded or cleared to the cache.

The API passes an iov_iter to the cache read/write routines to indicate the
data/buffer to be used.  This is done using the ITER_XARRAY type to provide
direct access to the netfs inode's pagecache.

When the netfs's ->begin_cache_operation() method is called, this must fill
in the cache_resources in the netfs_read_request struct, including the
netfs_cache_ops used by the helper lib to talk to the cache.  The helper
lib does not directly access the cache.

Changes:
v6:
- Call trace_netfs_read() after beginning the cache op so that the cookie
  debug ID can be logged[3].
- Don't record the error from writing to the cache.  We don't want to pass
  it back to the netfs[4].
- Fix copy-to-cache subreq amalgamation to not round up as it goes along
  otherwise it overcalculates the length of the write[5].

v5:
- Use end_page_fscache() rather than unlock_page_fscache()[2].

v4:
- Added flag to netfs_subreq_terminated() to indicate that the caller may
  have been running async and stuff that might sleep needs punting to a
  workqueue (can't use in_softirq()[1]).
- Add missing inc of netfs_n_rh_read stat.
- Move initial definition of fscache_begin_read_operation() elsewhere.
- Need to call op->begin_cache_operation() from netfs_write_begin().

Signed-off-by: David Howells <[email protected]>
Reviewed-and-tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/161781045123.463527.14533348855710902201.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161781046256.463527.18158681600085556192.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161781047695.463527.7463536103593997492.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161118141321.1232039.8296910406755622458.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161161036700.2537118.11170748455436854978.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161340399569.1303470.1138884774643385730.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161539542874.286939.13337898213448136687.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161653799826.2770958.9015430297426331950.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161789081462.6155.3853904866933313256.stgit@warthog.procyon.org.uk/

netfs: Gather stats

Gather statistics from the netfs interface that can be exported through a
seqfile. This is intended to be called by a later patch when viewing
/proc/fs/fscache/stats.

Signed-off-by: David Howells <[email protected]>
Reviewed-and-tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/161118139247.1232039.10556850937548511068.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161161034669.2537118.2761232524997091480.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161340397101.1303470.17581910581108378458.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161539539959.286939.6794352576462965914.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161653797700.2770958.5801990354413178228.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161789079281.6155.17141344853277186500.stgit@warthog.procyon.org.uk/

netfs: Add tracepoints

Add three tracepoints to track the activity of the read helpers:

(1) netfs/netfs_read

     This logs entry to the read helpers and also expansion of the range in
     a readahead request.

(2) netfs/netfs_rreq

     This logs the progress of netfs_read_request objects which track
     read requests.  A read request may be a compound of multiple
     subrequests.

(3) netfs/netfs_sreq

     This logs the progress of netfs_read_subrequest objects, which track
     the contributions from various sources to a read request.

Signed-off-by: David Howells <[email protected]>
Reviewed-and-tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/161118138060.1232039.5353374588021776217.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161161033468.2537118.14021843889844001905.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161340395843.1303470.7355519662919639648.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161539538693.286939.10171713520419106334.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161653796447.2770958.1870655382450862155.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161789078003.6155.17814844411672989942.stgit@warthog.procyon.org.uk/

netfs: Provide readahead and readpage netfs helpers

Add a pair of helper functions:

(*) netfs_readahead()
(*) netfs_readpage()

to do the work of handling a readahead or a readpage, where the page(s)
that form part of the request may be split between the local cache, the
server or just require clearing, and may be single pages and transparent
huge pages.  This is all handled within the helper.

Note that while both will read from the cache if there is data present,
only netfs_readahead() will expand the request beyond what it was asked to
do, and only netfs_readahead() will write back to the cache.

netfs_readpage(), on the other hand, is synchronous and only fetches the
page (which might be a THP) it is asked for.

The netfs gives the helper parameters from the VM, the cache cookie it
wants to use (or NULL) and a table of operations (only one of which is
mandatory):

(*) expand_readahead() [optional]

     Called to allow the netfs to request an expansion of a readahead
     request to meet its own alignment requirements.  This is done by
     changing rreq->start and rreq->len.

(*) clamp_length() [optional]

     Called to allow the netfs to cut down a subrequest to meet its own
     boundary requirements.  If it does this, the helper will generate
     additional subrequests until the full request is satisfied.

(*) is_still_valid() [optional]

     Called to find out if the data just read from the cache has been
     invalidated and must be reread from the server.

(*) issue_op() [required]

     Called to ask the netfs to issue a read to the server.  The subrequest
     describes the read.  The read request holds information about the file
     being accessed.

     The netfs can cache information in rreq->netfs_priv.

     Upon completion, the netfs should set the error, transferred and can
     also set FSCACHE_SREQ_CLEAR_TAIL and then call
     fscache_subreq_terminated().

(*) done() [optional]

     Called after the pages have been unlocked.  The read request is still
     pinning the file and mapping and may still be pinning pages with
     PG_fscache.  rreq->error indicates any error that has been
     accumulated.

(*) cleanup() [optional]

     Called when the helper is disposing of a finished read request.  This
     allows the netfs to clear rreq->netfs_priv.

Netfs support is enabled with CONFIG_NETFS_SUPPORT=y.  It will be built
even if CONFIG_FSCACHE=n and in this case much of it should be optimised
away, allowing the filesystem to use it even when caching is disabled.

Changes:
v5:
- Comment why netfs_readahead() is putting pages[2].
- Use page_file_mapping() rather than page->mapping[2].
- Use page_index() rather than page->index[2].
- Use set_page_fscache()[3] rather then SetPageFsCache() as this takes an
   appropriate ref too[4].

v4:
- Folded in a kerneldoc comment fix.
- Folded in a fix for the error handling in the case that ENOMEM occurs.
- Added flag to netfs_subreq_terminated() to indicate that the caller may
   have been running async and stuff that might sleep needs punting to a
   workqueue (can't use in_softirq()[1]).

Signed-off-by: David Howells <[email protected]>
Reviewed-and-tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pUm3ww@mail.gmail.com/
Link: https://lore.kernel.org/r/160588497406.3465195.18003475695899726222.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161118136849.1232039.8923686136144228724.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161161032290.2537118.13400578415247339173.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161340394873.1303470.6237319335883242536.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161539537375.286939.16642940088716990995.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161653795430.2770958.4947584573720000554.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161789076581.6155.6745849361504760209.stgit@warthog.procyon.org.uk/

netfs, mm: Add set/end/wait_on_page_fscache() aliases

Add set/end/wait_on_page_fscache() as aliases of
set/end/wait_page_private_2(). These allow a page to marked with
PG_fscache, the flag to be removed and waiters woken and waiting for the
flag to be cleared. A ref on the page is also taken and dropped.

[Linus suggested putting the fscache-themed functions into the
caching-specific headers rather than pagemap.h[1]]

Changes:
v5:
- Mirror the changes to the core routines[2].

Signed-off-by: David Howells <[email protected]>
Tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
cc: Linus Torvalds <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/
Link: https://lore.kernel.org/r/161340393568.1303470.4997526899111310530.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161539536093.286939.5076448803512118764.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/161653793873.2770958.12157243390965814502.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161789075327.6155.7432127924219092385.stgit@warthog.procyon.org.uk/

netfs, mm: Move PG_fscache helper funcs to linux/netfs.h

Move the PG_fscache related helper funcs (such as SetPageFsCache()) to
linux/netfs.h rather than linux/fscache.h as the intention is to move to a
model where they're used by the network filesystem and the helper library,
but not by fscache/cachefiles itself.

Signed-off-by: David Howells <[email protected]>
Tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/161340392347.1303470.18065131603507621762.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161539534516.286939.6265142985563005000.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161653792959.2770958.5386546945273988117.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161789073997.6155.18442271115255650614.stgit@warthog.procyon.org.uk/

netfs: Make a netfs helper module

Make a netfs helper module to manage read request segmentation, caching
support and transparent huge page support on behalf of a network
filesystem.

Signed-off-by: David Howells <[email protected]>
Reviewed-and-tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/160588496284.3465195.10102643717770106661.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161118135638.1232039.1622182202673126285.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161161031028.2537118.1213974428943508753.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161340391427.1303470.14884950716721956560.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161539531569.286939.18317119181653706665.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161653790328.2770958.6710423217716151549.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161789071202.6155.16519256513958534906.stgit@warthog.procyon.org.uk/

mm: Implement readahead_control pageset expansion

Provide a function, readahead_expand(), that expands the set of pages
specified by a readahead_control object to encompass a revised area with a
proposed size and length.

The proposed area must include all of the old area and may be expanded yet
more by this function so that the edges align on (transparent huge) page
boundaries as allocated.

The expansion will be cut short if a page already exists in either of the
areas being expanded into.  Note that any expansion made in such a case is
not rolled back.

This will be used by fscache so that reads can be expanded to cache granule
boundaries, thereby allowing whole granules to be stored in the cache, but
there are other potential users also.

Changes:
v6:
- Fold in a patch from Matthew Wilcox to tell the ondemand readahead
  algorithm about the expansion so that the next readahead starts at the
  right place[2].

v4:
- Moved the declaration of readahead_expand() to a better place[1].

Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Mike Marshall <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/159974633888.2094769.8326206446358128373.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160588479816.3465195.553952688795241765.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161118131787.1232039.4863969952441067985.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161161028670.2537118.13831420617039766044.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161340389201.1303470.14353807284546854878.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161539530488.286939.18085961677838089157.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161653789422.2770958.2108046612147345000.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161789069829.6155.4295672417565512161.stgit@warthog.procyon.org.uk/

mm/readahead: Handle ractl nr_pages being modified

Filesystems are not currently permitted to modify the number of pages
in the ractl. An upcoming patch to add readahead_expand() changes that
rule, so remove the check and resync the loop counter after every call
to the filesystem.

Tested-by: Jeff Layton <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/

fs: Document file_ra_state

Turn the comments into kernel-doc and improve the wording slightly.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: David Howells <[email protected]>
Tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/161789068619.6155.1397999970593531574.stgit@warthog.procyon.org.uk/

mm/filemap: Pass the file_ra_state in the ractl

For readahead_expand(), we need to modify the file ra_state, so pass it
down by adding it to the ractl. We have to do this because it's not always
the same as f_ra in the struct file that is already being passed.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: David Howells <[email protected]>
Tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/161789067431.6155.8063840447229665720.stgit@warthog.procyon.org.uk/

mm: Add set/end/wait functions for PG_private_2

Add three functions to manipulate PG_private_2:

(*) set_page_private_2() - Set the flag and take an appropriate reference
     on the flagged page.

(*) end_page_private_2() - Clear the flag, drop the reference and wake up
     any waiters, somewhat analogously with end_page_writeback().

(*) wait_on_page_private_2() - Wait for the flag to be cleared.

Wrappers will need to be placed in the netfs lib header in the patch that
adds that.

[This implements a suggestion by Linus[1] to not mix the terminology of
PG_private_2 and PG_fscache in the mm core function]

Changes:
v7:
- Use compound_head() in all the functions to make them THP safe[6].

v5:
- Add set and end functions, calling the end function end rather than
  unlock[3].
- Keep a ref on the page when PG_private_2 is set[4][5].

v4:
- Remove extern from the declaration[2].

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/161340387944.1303470.7944159520278177652.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161539528910.286939.1252328699383291173.stgit@warthog.procyon.org.uk
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pUm3ww@mail.gmail.com/
Link: https://lore.kernel.org/r/CAHk-=wjSGsRj7xwhSMQ6dAQiz53xA39pOG+XA_WeTgwBBu4uqg@mail.gmail.com/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/161653788200.2770958.9517755716374927208.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161789066013.6155.9816857201817288382.stgit@warthog.procyon.org.uk/

iov_iter: Add ITER_XARRAY

Add an iterator, ITER_XARRAY, that walks through a set of pages attached to
an xarray, starting at a given page and offset and walking for the
specified amount of bytes. The iterator supports transparent huge pages.

The iterate_xarray() macro calls the helper function with rcu_access()
helped. I think that this is only a problem for iov_iter_for_each_range()
- and that returns an error for ITER_XARRAY (also, this function does not
appear to be called).

The caller must guarantee that the pages are all present and they must be
locked using PG_locked, PG_writeback or PG_fscache to prevent them from
going away or being migrated whilst they're being accessed.

This is useful for copying data from socket buffers to inodes in network
filesystems and for transferring data between those inodes and the cache
using direct I/O.

Whilst it is true that ITER_BVEC could be used instead, that would require
a bio_vec array to be allocated to refer to all the pages - which should be
redundant if inode->i_pages also points to all these pages.

Note that older versions of this patch implemented an ITER_MAPPING instead,
which was almost the same.

Changes:
v7:
- Rename iter_xarray_copy_pages() to iter_xarray_populate_pages()[1].

Signed-off-by: David Howells <[email protected]>
Reviewed-and-tested-by: Jeff Layton <[email protected]>
Tested-by: Dave Wysochanski <[email protected]>
Tested-By: Marc Dionne <[email protected]>
cc: Alexander Viro <[email protected]>
cc: Matthew Wilcox (Oracle) <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/158861205740.340223.16592990225607814022.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/159465785214.1376674.6062549291411362531.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160588477334.3465195.3608963255682568730.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161118129703.1232039.17141248432017826976.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161161026313.2537118.14676007075365418649.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161340386671.1303470.10752208972482479840.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161539527815.286939.14607323792547049341.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161653786033.2770958.14154191921867463240.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/161789064740.6155.11932541175173658065.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'drm-intel-fixes-2021-04-22' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- GVT's BDW regression fix for cmd parser (Zhenyu)
- Fix modesetting in case of unexpected AUX timeouts (Imre)

Signed-off-by: Dave Airlie <[email protected]>
From: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'amd-drm-fixes-5.12-2021-04-21' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.12-2021-04-21:

amdgpu:
- Fix gpuvm page table update issue
- Modifier fixes
- Register fix for dimgrey cavefish

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
"Very late in the cycle but both risky if left unfixed and more or less
  obvious.."

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa/mlx5: Set err = -ENOMEM in case dma_map_sg_attrs fails
  vhost-vdpa: protect concurrent access to vhost device iotlb

vdpa/mlx5: Set err = -ENOMEM in case dma_map_sg_attrs fails

Set err = -ENOMEM if dma_map_sg_attrs() fails so the function reutrns
error.

Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
Signed-off-by: Eli Cohen <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>

vhost-vdpa: protect concurrent access to vhost device iotlb

Protect vhost device iotlb by vhost_dev->mutex. Otherwise,
it might cause corruption of the list and interval tree in
struct vhost_iotlb if userspace sends the VHOST_IOTLB_MSG_V2
message concurrently.

Fixes: 4c8cf318("vhost: introduce vDPA-based backend")
Cc: [email protected]
Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>

Merge tag 'sunxi-fixes-for-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes

One fix for the MMC card detect on the Pine H64 board

* tag 'sunxi-fixes-for-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS

Link: https://lore.kernel.org/r/45fc5e4d-ef48-4729-a869-79a8f288bb83.lettre@localhost
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/tpmdd

Pull tpm fix from James Bottomley:
"This is an urgent regression fix for a tpm patch set that went in this
  merge window. It looks like a rebase before the original pull request
  lost a tpm_try_get_ops() so we have a lock imbalance in our code which
  is causing oopses. The original patch was correct on the mailing list.

  I'm sending this in agreement with Mimi (as joint maintainers of
  trusted keys) because Jarkko is off communing with the Reindeer or
  whatever it is Finns do when on holiday"

* tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/tpmdd:
  KEYS: trusted: Fix TPM reservation for seal/unseal

perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]

The only stepping of Broadwell Xeon parts is stepping 1. Fix the
relevant isolation_ucodes[] entry, which previously enumerated
stepping 2.

Although the original commit was characterized as an optimization, it
is also a workaround for a correctness issue.

If a PMI arrives between kvm's call to perf_guest_get_msrs() and the
subsequent VM-entry, a stale value for the IA32_PEBS_ENABLE MSR may be
restored at the next VM-exit. This is because, unbeknownst to kvm, PMI
throttling may clear bits in the IA32_PEBS_ENABLE MSR. CPUs with "PEBS
isolation" don't suffer from this issue, because perf_guest_get_msrs()
doesn't report the IA32_PEBS_ENABLE value.

Fixes: 9b545c04abd4f ("perf/x86/kvm: Avoid unnecessary work in guest filtering")
Signed-off-by: Jim Mattson <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Peter Shier <[email protected]>
Acked-by: Andi Kleen <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS

Commit 941432d00768 ("arm64: dts: allwinner: Drop non-removable from
SoPine/LTS SD card") enabled the card detect GPIO for the SOPine module,
along the way with the Pine64-LTS, which share the same base .dtsi.

This was based on the observation that the Pine64-LTS has as "push-push"
SD card socket, and that the schematic mentions the card detect GPIO.

After having received two reports about failing SD card access with that
patch, some more research and polls on that subject revealed that there
are at least two different versions of the Pine64-LTS out there:
- On some boards (including mine) the card detect pin is "stuck" at
high, regardless of an microSD card being inserted or not.
- On other boards the card-detect is working, but is active-high, by
virtue of an explicit inverter circuit, as shown in the schematic.

To cover all versions of the board out there, and don't take any chances,
let's revert the introduction of the active-low CD GPIO, but let's use
the broken-cd property for the Pine64-LTS this time. That should avoid
regressions and should work for everyone, even allowing SD card changes
now.
The SOPine card detect has proven to be working, so let's keep that
GPIO in place.

Fixes: 941432d00768 ("arm64: dts: allwinner: Drop non-removable from SoPine/LTS SD card")
Reported-by: Michael Weiser <[email protected]>
Reported-by: Daniel Kulesz <[email protected]>
Suggested-by: Chen-Yu Tsai <[email protected]>
Signed-off-by: Andre Przywara <[email protected]>
Tested-by: Michael Weiser <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

pinctrl: core: Show pin numbers for the controllers with base = 0

The commit f1b206cf7c57 ("pinctrl: core: print gpio in pins debugfs file")
enabled GPIO pin number and label in debugfs for pin controller. However,
it limited that feature to the chips where base is positive number. This,
in particular, excluded chips where base is 0 for the historical or backward
compatibility reasons. Refactor the code to include the latter as well.

Fixes: f1b206cf7c57 ("pinctrl: core: print gpio in pins debugfs file")
Cc: Drew Fustini <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Tested-by: Drew Fustini <[email protected]>
Reviewed-by: Drew Fustini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

KEYS: trusted: Fix TPM reservation for seal/unseal

The original patch 8c657a0590de ("KEYS: trusted: Reserve TPM for seal
and unseal operations") was correct on the mailing list:

https://lore.kernel.org/linux-integrity/20210128235621 [email protected]/

But somehow got rebased so that the tpm_try_get_ops() in
tpm2_seal_trusted() got lost. This causes an imbalanced put of the
TPM ops and causes oopses on TIS based hardware.

This fix puts back the lost tpm_try_get_ops()

Fixes: 8c657a0590de ("KEYS: trusted: Reserve TPM for seal and unseal operations")
Reported-by: Mimi Zohar <[email protected]>
Acked-by: Mimi Zohar <[email protected]>
Signed-off-by: James Bottomley <[email protected]>

Merge tag 'mmc-v5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fix from Ulf Hansson:
"Replace WARN_ONCE with dev_warn_once for non-optimal sg-alignment in
the meson-gx host driver"

* tag 'mmc-v5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: meson-gx: replace WARN_ONCE with dev_warn_once about scatterlist size alignment in block mode

block: return -EBUSY when there are open partitions in blkdev_reread_part

The switch to go through blkdev_get_by_dev means we now ignore the
return value from bdev_disk_changed in __blkdev_get. Add a manual
check to restore the old semantics.

Fixes: 4601b4b130de ("block: reopen the device in blkdev_reread_part")
Reported-by: Karel Zak <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

drm/amdgpu: fix GCR_GENERAL_CNTL offset for dimgrey_cavefish

dimgrey_cavefish has similar gc_10_3 ip with sienna_cichlid,
so follow its registers offset setting.

Signed-off-by: Jiansong Chen <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

amd/display: allow non-linear multi-planar formats

Accept non-linear buffers which use a multi-planar format, as long
as they don't use DCC.

Tested on GFX9 with NV12.

Signed-off-by: Simon Ser <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Harry Wentland <[email protected]>
Cc: Nicholas Kazlauskas <[email protected]>
Cc: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/display: Update modifier list for gfx10_3

[Why]
Current list supports modifiers that have DCC_MAX_COMPRESSED_BLOCK
set to AMD_FMT_MOD_DCC_BLOCK_128B, while AMD_FMT_MOD_DCC_BLOCK_64B
is used instead by userspace.

[How]
Replace AMD_FMT_MOD_DCC_BLOCK_128B with AMD_FMT_MOD_DCC_BLOCK_64B
for modifiers with DCC supported.

Fixes: faa37f54ce0462 ("drm/amd/display: Expose modifiers")
Signed-off-by: Qingqing Zhuo <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Bas Nieuwenhuizen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amdgpu: reserve fence slot to update page table

Forgot to reserve a fence slot to use sdma to update page table, cause
below kernel BUG backtrace to handle vm retry fault while application is
exiting.

[  133.048143] kernel BUG at /home/yangp/git/compute_staging/kernel/drivers/dma-buf/dma-resv.c:281!
[  133.048487] Workqueue: events amdgpu_irq_handle_ih1 [amdgpu]
[  133.048506] RIP: 0010:dma_resv_add_shared_fence+0x204/0x280
[  133.048672]  amdgpu_vm_sdma_commit+0x134/0x220 [amdgpu]
[  133.048788]  amdgpu_vm_bo_update_range+0x220/0x250 [amdgpu]
[  133.048905]  amdgpu_vm_handle_fault+0x202/0x370 [amdgpu]
[  133.049031]  gmc_v9_0_process_interrupt+0x1ab/0x310 [amdgpu]
[  133.049165]  ? kgd2kfd_interrupt+0x9a/0x180 [amdgpu]
[  133.049289]  ? amdgpu_irq_dispatch+0xb6/0x240 [amdgpu]
[  133.049408]  amdgpu_irq_dispatch+0xb6/0x240 [amdgpu]
[  133.049534]  amdgpu_ih_process+0x9b/0x1c0 [amdgpu]
[  133.049657]  amdgpu_irq_handle_ih1+0x21/0x60 [amdgpu]
[  133.049669]  process_one_work+0x29f/0x640
[  133.049678]  worker_thread+0x39/0x3f0
[  133.049685]  ? process_one_work+0x640/0x640

Signed-off-by: Philip Yang <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 5.11.x

gpio: omap: Save and restore sysconfig

As we are using cpu_pm to save and restore context, we must also save and
restore the GPIO sysconfig register. This is needed because we are not
calling PM runtime functions at all with cpu_pm.

We need to save the sysconfig on idle as it's value can get reconfigured by
PM runtime and can be different from the init time value. Device specific
flags like "ti,no-idle-on-init" can affect the init value.

Fixes: b764a5863fd8 ("gpio: omap: Remove custom PM calls and use cpu_pm instead")
Cc: Aaro Koskinen <[email protected]>
Cc: Adam Ford <[email protected]>
Cc: Andreas Kemnade <[email protected]>
Cc: Grygorii Strashko <[email protected]>
Cc: Peter Ujfalusi <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>
Acked-by: Grygorii Strashko <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3

There may be a kernel panic on the Haswell server and the Broadwell
server, if the snbep_pci2phy_map_init() return error.

The uncore_extra_pci_dev[HSWEP_PCI_PCU_3] is used in the cpu_init() to
detect the existence of the SBOX, which is a MSR type of PMON unit.
The uncore_extra_pci_dev is allocated in the uncore_pci_init(). If the
snbep_pci2phy_map_init() returns error, perf doesn't initialize the
PCI type of the PMON units, so the uncore_extra_pci_dev will not be
allocated. But perf may continue initializing the MSR type of PMON
units. A null dereference kernel panic will be triggered.

The sockets in a Haswell server or a Broadwell server are identical.
Only need to detect the existence of the SBOX once.
Current perf probes all available PCU devices and stores them into the
uncore_extra_pci_dev. It's unnecessary.
Use the pci_get_device() to replace the uncore_extra_pci_dev. Only
detect the existence of the SBOX on the first available PCU device once.

Factor out hswep_has_limit_sbox(), since the Haswell server and the
Broadwell server uses the same way to detect the existence of the SBOX.

Add some macros to replace the magic number.

Fixes: 5306c31c5733 ("perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes")
Reported-by: Steve Wahl <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Steve Wahl <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

Merge tag 'trace-v5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
"Fix tp_printk command line and trace events

  Masami added a wrapper to be able to unhash trace event pointers as
  they are only read by root anyway, and they can also be extracted by
  the raw trace data buffers. But this wrapper utilized the iterator to
  have a temporary buffer to manipulate the text with.

  tp_printk is a kernel command line option that will send the trace
  output of a trace event to the console on boot up (useful when the
  system crashes before finishing the boot). But the code used the same
  wrapper that Masami added, and its iterator did not have a buffer, and
  this caused the system to crash.

  Have the wrapper just print the trace event normally if the iterator
  has no temporary buffer"

* tag 'trace-v5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix checking event hash pointer logic when tp_printk is enabled

capabilities: require CAP_SETFCAP to map uid 0

cap_setfcap is required to create file capabilities.

Since commit 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"),
a process running as uid 0 but without cap_setfcap is able to work
around this as follows: unshare a new user namespace which maps parent
uid 0 into the child namespace.

While this task will not have new capabilities against the parent
namespace, there is a loophole due to the way namespaced file
capabilities are represented as xattrs.  File capabilities valid in
userns 1 are distinguished from file capabilities valid in userns 2 by
the kuid which underlies uid 0.  Therefore the restricted root process
can unshare a new self-mapping namespace, add a namespaced file
capability onto a file, then use that file capability in the parent
namespace.

To prevent that, do not allow mapping parent uid 0 if the process which
opened the uid_map file does not have CAP_SETFCAP, which is the
capability for setting file capabilities.

As a further wrinkle: a task can unshare its user namespace, then open
its uid_map file itself, and map (only) its own uid.  In this case we do
not have the credential from before unshare, which was potentially more
restricted.  So, when creating a user namespace, we record whether the
creator had CAP_SETFCAP.  Then we can use that during map_write().

With this patch:

1. Unprivileged user can still unshare -Ur

   ubuntu@caps:~$ unshare -Ur
   root@caps:~# logout

2. Root user can still unshare -Ur

   ubuntu@caps:~$ sudo bash
   root@caps:/home/ubuntu# unshare -Ur
   root@caps:/home/ubuntu# logout

3. Root user without CAP_SETFCAP cannot unshare -Ur:

   root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
   root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
   unable to set CAP_SETFCAP effective capability: Operation not permitted
   root@caps:/home/ubuntu# unshare -Ur
   unshare: write failed /proc/self/uid_map: Operation not permitted

Note: an alternative solution would be to allow uid 0 mappings by
processes without CAP_SETFCAP, but to prevent such a namespace from
writing any file capabilities.  This approach can be seen at [1].

Background history: commit 95ebabde382 ("capabilities: Don't allow
writing ambiguous v3 file capabilities") tried to fix the issue by
preventing v3 fscaps to be written to disk when the root uid would map
to the same uid in nested user namespaces.  This led to regressions for
various workloads.  For example, see [2].  Ultimately this is a valid
use-case we have to support meaning we had to revert this change in
3b0c2d3eaa83 ("Revert 95ebabde382c ("capabilities: Don't allow writing
ambiguous v3 file capabilities")").

Link: https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
Link: https://github.com/containers/buildah/issues/3071
Signed-off-by: Serge Hallyn <[email protected]>
Reviewed-by: Andrew G. Morgan <[email protected]>
Tested-by: Christian Brauner <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Tested-by: Giuseppe Scrivano <[email protected]>
Cc: Eric Biederman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

perf data: Fix error return code in perf_data__create_dir()

Although 'ret' has been initialized to -1, but it will be reassigned by
the "ret = open(...)" statement in the for loop. So that, the value of
'ret' is unknown when asprintf() failed.

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zhen Lei <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access

Commit in Fixes: added support for kexec-ing a kernel on panic using a
new system call. As part of it, it does prepare a memory map for the new
kernel.

However, while doing so, it wrongly accesses memory it has not
allocated: it accesses the first element of the cmem->ranges[] array in
memmap_exclude_ranges() but it has not allocated the memory for it in
crash_setup_memmap_entries(). As KASAN reports:

  BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
  Write of size 8 at addr ffffc90000426008 by task kexec/1187

  (gdb) list *crash_setup_memmap_entries+0x17e
  0xffffffff8107cafe is in crash_setup_memmap_entries (arch/x86/kernel/crash.c:322).
  317                                      unsigned long long mend)
  318     {
  319             unsigned long start, end;
  320
  321             cmem->ranges[0].start = mstart;
  322             cmem->ranges[0].end = mend;
  323             cmem->nr_ranges = 1;
  324
  325             /* Exclude elf header region */
  326             start = image->arch.elf_load_addr;
  (gdb)

Make sure the ranges array becomes a single element allocated.

[ bp: Write a proper commit message. ]

Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Dave Young <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

tracing: Fix checking event hash pointer logic when tp_printk is enabled

Pointers in events that are printed are unhashed if the flags allow it,
and the logic to do so is called before processing the event output from
the raw ring buffer. In most cases, this is done when a user reads one of
the trace files.

But if tp_printk is added on the kernel command line, this logic is done
for trace events when they are triggered, and their output goes out via
printk. The unhash logic (and even the validation of the output) did not
support the tp_printk output, and would crash.

Link: https://lore.kernel.org/linux-tegra/[email protected]/
Fixes: efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
Reported-by: Jon Hunter <[email protected]>
Tested-by: Jon Hunter <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

Merge tag 'gvt-fixes-2021-04-20' of https://github.com/intel/gvt-linux into drm-intel-fixes

gvt-fixes-2021-04-20

- Fix cmd parser regression on BDW (Zhenyu)

Signed-off-by: Rodrigo Vivi <[email protected]>
From: Zhenyu Wang <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Revert "gcov: clang: fix clang-11+ build"

This reverts commit 04c53de57cb6435738961dace8b1b71d3ecd3c39.

Nathan Chancellor points out that it should not have been merged into
mainline by itself. It was a fix for "gcov: use kvmalloc()", which is
still in -mm/-next. Merging it alone has broken the build.

Link: https://github.com/ClangBuiltLinux/continuous-integration2/runs/2384465683?check_suite_focus=true
Reported-by: Nathan Chancellor <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drm/i915: Fix modesetting in case of unexpected AUX timeouts

In case AUX failures happen unexpectedly during a modeset, the driver
should still complete the modeset. In particular the driver should
perform the link training sequence steps even in case of an AUX failure,
as this sequence also includes port initialization steps. Not doing that
can leave the port/pipe in a broken state and lead for instance to a
flip done timeout.

Fix this by continuing with link training (in a no-LTTPR mode) if the
DPRX DPCD readout failed for some reason at the beginning of link
training. After a successful connector detection we already have the
DPCD read out and cached, so the failed repeated read for it should not
cause a problem. Note that a partial AUX read could in theory partly
overwrite the cached DPCD (and return error) but this overwrite should
not happen if the returned values are corrupted (due to a timeout or
some other IO error).

Kudos to Ville to root cause the problem.

Fixes: 7dffbdedb96a ("drm/i915: Disable LTTPR support when the DPCD rev < 1.4")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3308
Cc: [email protected] # 5.11
Cc: Ville Syrjälä <[email protected]>
Signed-off-by: Imre Deak <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit e42e7e585984b85b0fb9dd1fefc85ee4800ca629)
Signed-off-by: Rodrigo Vivi <[email protected]>
[adjusted Fixes: tag]

preempt/dynamic: Fix typo in macro conditional statement

Commit 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched()
static call") tried to provide irqentry_exit_cond_resched() static call
in irqentry_exit, but has a typo in macro conditional statement.

Fixes: 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched() static call")
Signed-off-by: Zhouyi Zhou <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

mmc: meson-gx: replace WARN_ONCE with dev_warn_once about scatterlist size alignment in block mode

Since commit e085b51c74cc ("mmc: meson-gx: check for scatterlist size alignment in block mode"),
support for SDIO SD_IO_RW_EXTENDED transferts are properly filtered but some driver
like brcmfmac still gives a block sg buffer size not aligned with SDIO block,
triggerring a WARN_ONCE() with scary stacktrace even if the transfer works fine
but with possible degraded performances.

Simply replace with dev_warn_once() to inform user this should be fixed to avoid
degraded performance.

This should be ultimately fixed in brcmfmac, but since it's only a performance issue
the warning should be removed.

Fixes: e085b51c74cc ("mmc: meson-gx: check for scatterlist size alignment in block mode")
Reported-by: Marek Szyprowski <[email protected]>
Signed-off-by: Neil Armstrong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>

Linux 5.12-rc8

Merge tag 'arm-fixes-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"Another smaller set of fixes for three of the Arm platforms:

  TI OMAP:

     Fix swapped mmc device order also for omap3 that got changed with
     the recent PROBE_PREFER_ASYNCHRONOUS changes. While eventually the
     aliases should be board specific, all the mmc device instances are
     all there in the SoC, and we do probe them by default so that PM
     runtime can idle the devices if left enabled from the bootloader.

  Qualcomm Snapdragon:

     This bypasses the recently introduced interconnect handling in
     the GENI (serial engine) driver when running off ACPI, as this
     causes the GENI probe to fail and the Lenovo Yoga C630 to boot
     without keyboard and touchpad.

  Allwinner:

     One 32kHz clock fix for the beelink gs1, a CD polarity fix for the
     SoPine, some MAINTAINERS maintainance, and a clk / reset switch to
     our headers"

* tag 'arm-fixes-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: dts: allwinner: h6: beelink-gs1: Remove ext. 32 kHz osc reference
  MAINTAINERS: Match on allwinner keyword
  MAINTAINERS: Add our new mailing-list
  arm64: dts: allwinner: Fix SD card CD GPIO for SOPine systems
  arm64: dts: allwinner: h6: Switch to macros for RSB clock/reset indices
  ARM: OMAP2+: Fix uninitialized sr_inst
  ARM: dts: Fix swapped mmc order for omap3
  ARM: OMAP2+: Fix warning for omap_init_time_of()
  soc: qcom: geni: shield geni_icc_get() for ACPI boot

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:

- Halve maximum number of CPUs if DEBUG_KMAP_LOCAL is enabled

- Fix conversion for_each_membock() to for_each_mem_range()

- Fix footbridge PCI mapping

- Avoid uprobes hooking on thumb instructions

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9071/1: uprobes: Don't hook on thumb instructions
  ARM: footbridge: fix PCI interrupt mapping
  ARM: 9069/1: NOMMU: Fix conversion for_each_membock() to for_each_mem_range()
  ARM: 9063/1: mm: reduce maximum number of CPUs if DEBUG_KMAP_LOCAL is enabled

ARM: 9071/1: uprobes: Don't hook on thumb instructions

Since uprobes is not supported for thumb, check that the thumb bit is
not set when matching the uprobes instruction hooks.

The Arm UDF instructions used for uprobes triggering
(UPROBE_SWBP_ARM_INSN and UPROBE_SS_ARM_INSN) coincidentally share the
same encoding as a pair of unallocated 32-bit thumb instructions (not
UDF) when the condition code is 0b1111 (0xf). This in effect makes it
possible to trigger the uprobes functionality from thumb, and at that
using two unallocated instructions which are not permanently undefined.

Signed-off-by: Fredrik Strupe <[email protected]>
Cc: [email protected]
Fixes: c7edc9e326d5 ("ARM: add uprobes support")
Signed-off-by: Russell King <[email protected]>