Git Repo - linux.git/log

KVM: X86: Handle implicit supervisor access with SMAP

There are two kinds of implicit supervisor access
implicit supervisor access when CPL = 3
implicit supervisor access when CPL < 3

Current permission_fault() handles only the first kind for SMAP.

But if the access is implicit when SMAP is on, data may not be read
nor write from any user-mode address regardless the current CPL.

So the second kind should be also supported.

The first kind can be detect via CPL and access mode: if it is
supervisor access and CPL = 3, it must be implicit supervisor access.

But it is not possible to detect the second kind without extra
information, so this patch adds an artificial PFERR_EXPLICIT_ACCESS
into @access. This extra information also works for the first kind, so
the logic is changed to use this information for both cases.

The value of PFERR_EXPLICIT_ACCESS is deliberately chosen to be bit 48
which is in the most significant 16 bits of u64 and less likely to be
forced to change due to future hardware uses it.

This patch removes the call to ->get_cpl() for access mode is determined
by @access.  Not only does it reduce a function call, but also remove
confusions when the permission is checked for nested TDP.  The nested
TDP shouldn't have SMAP checking nor even the L2's CPL have any bearing
on it.  The original code works just because it is always user walk for
NPT and SMAP fault is not set for EPT in update_permission_bitmask.

Signed-off-by: Lai Jiangshan <[email protected]>
Message-Id: <20220311070346 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: X86: Rename variable smap to not_smap in permission_fault()

Comments above the variable says the bit is set when SMAP is overridden
or the same meaning in update_permission_bitmask(): it is not subjected
to SMAP restriction.

Renaming it to reflect the negative implication and make the code better
readability.

Signed-off-by: Lai Jiangshan <[email protected]>
Message-Id: <20220311070346 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: X86: Fix comments in update_permission_bitmask

The commit 09f037aa48f3 ("KVM: MMU: speedup update_permission_bitmask")
refactored the code of update_permission_bitmask() and change the
comments. It added a condition into a list to match the new code,
so the number/order for conditions in the comments should be updated
too.

Signed-off-by: Lai Jiangshan <[email protected]>
Message-Id: <20220311070346 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: X86: Change the type of access u32 to u64

Change the type of access u32 to u64 for FNAME(walk_addr) and
->gva_to_gpa().

The kinds of accesses are usually combinations of UWX, and VMX/SVM's
nested paging adds a new factor of access: is it an access for a guest
page table or for a final guest physical address.

And SMAP relies a factor for supervisor access: explicit or implicit.

So @access in FNAME(walk_addr) and ->gva_to_gpa() is better to include
all these information to do the walk.

Although @access(u32) has enough bits to encode all the kinds, this
patch extends it to u64:
o Extra bits will be in the higher 32 bits, so that we can
  easily obtain the traditional access mode (UWX) by converting
  it to u32.
o Reuse the value for the access kind defined by SVM's nested
  paging (PFERR_GUEST_FINAL_MASK and PFERR_GUEST_PAGE_MASK) as
  @error_code in kvm_handle_page_fault().

Signed-off-by: Lai Jiangshan <[email protected]>
Message-Id: <20220311070346 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Remove dirty handling from gfn_to_pfn_cache completely

It isn't OK to cache the dirty status of a page in internal structures
for an indefinite period of time.

Any time a vCPU exits the run loop to userspace might be its last; the
VMM might do its final check of the dirty log, flush the last remaining
dirty pages to the destination and complete a live migration. If we
have internal 'dirty' state which doesn't get flushed until the vCPU
is finally destroyed on the source after migration is complete, then
we have lost data because that will escape the final copy.

This problem already exists with the use of kvm_vcpu_unmap() to mark
pages dirty in e.g. VMX nesting.

Note that the actual Linux MM already considers the page to be dirty
since we have a writeable mapping of it. This is just about the KVM
dirty logging.

For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to
track which gfn_to_pfn_caches have been used and explicitly mark the
corresponding pages dirty before returning to userspace. But we would
have needed external tracking of that anyway, rather than walking the
full list of GPCs to find those belonging to this vCPU which are dirty.

So let's rely *solely* on that external tracking, and keep it simple
rather than laying a tempting trap for callers to fall into.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <20220303154127 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Use enum to track if cached PFN will be used in guest and/or host

Replace the guest_uses_pa and kernel_map booleans in the PFN cache code
with a unified enum/bitmask. Using explicit names makes it easier to
review and audit call sites.

Opportunistically add a WARN to prevent passing garbage; instantating a
cache without declaring its usage is either buggy or pointless.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <20220303154127 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: SVM: Fix kvm_cache_regs.h inclusions for is_guest_mode()

Include kvm_cache_regs.h to pick up the definition of is_guest_mode(),
which is referenced by nested_svm_virtualize_tpr() in svm.h. Remove
include from svm_onhpyerv.c which was done only because of lack of
include in svm.h.

Fixes: 883b0a91f41ab ("KVM: SVM: Move Nested SVM Implementation to nested.c")
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Peter Gonda <[email protected]>
Message-Id: <20220304161032.2270688 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86/pmu: Use different raw event masks for AMD and Intel

The third nybble of AMD's event select overlaps with Intel's IN_TX and
IN_TXCP bits. Therefore, we can't use AMD64_RAW_EVENT_MASK on Intel
platforms that support TSX.

Declare a raw_event_mask in the kvm_pmu structure, initialize it in
the vendor-specific pmu_refresh() functions, and use that mask for
PERF_TYPE_RAW configurations in reprogram_gp_counter().

Fixes: 710c47651431 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW")
Signed-off-by: Jim Mattson <[email protected]>
Message-Id: <20220308012452.3468611 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Don't actually set a request when evicting vCPUs for GFN cache invd

Don't actually set a request bit in vcpu->requests when making a request
purely to force a vCPU to exit the guest.  Logging a request but not
actually consuming it would cause the vCPU to get stuck in an infinite
loop during KVM_RUN because KVM would see the pending request and bail
from VM-Enter to service the request.

Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as
nothing in KVM is wired up to set guest_uses_pa=true.  But, it'd be all
too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init()
without implementing handling of the request, especially since getting
test coverage of MMU notifier interaction with specific KVM features
usually requires a directed test.

Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus
to evict_vcpus.  The purpose of the request is to get vCPUs out of guest
mode, it's supposed to _avoid_ waking vCPUs that are blocking.

Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to
what it wants to accomplish, and to genericize the name so that it can
used for similar but unrelated scenarios, should they arise in the future.
Add a comment and documentation to explain why the "no action" request
exists.

Add compile-time assertions to help detect improper usage.  Use the inner
assertless helper in the one s390 path that makes requests without a
hardcoded request.

Cc: David Woodhouse <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20220223165302.3205276 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: avoid double put_page with gfn-to-pfn cache

If the cache's user host virtual address becomes invalid, there
is still a path from kvm_gfn_to_pfn_cache_refresh() where __release_gpc()
could release the pfn but the gpc->pfn field has not been overwritten
with an error value. If this happens, kvm_gfn_to_pfn_cache_unmap will
call put_page again on the same page.

Cc: [email protected]
Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86/mmu: Zap only TDP MMU leafs in zap range and mmu_notifier unmap

Re-introduce zapping only leaf SPTEs in kvm_zap_gfn_range() and
kvm_tdp_mmu_unmap_gfn_range(), this time without losing a pending TLB
flush when processing multiple roots (including nested TDP shadow roots).
Dropping the TLB flush resulted in random crashes when running Hyper-V
Server 2019 in a guest with KSM enabled in the host (or any source of
mmu_notifier invalidations, KSM is just the easiest to force).

This effectively revert commits 873dd122172f8cce329113cfb0dfe3d2344d80c0
and fcb93eb6d09dd302cbef22bd95a5858af75e4156, and thus restores commit
cf3e26427c08ad9015956293ab389004ac6a338e, plus this delta on top:

bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
        struct kvm_mmu_page *root;

        for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
-               flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, false);
+               flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush);

        return flush;
}

Cc: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Tested-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <20220325230348.2587437 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: SVM: fix panic on out-of-bounds guest IRQ

As guest_irq is coming from KVM_IRQFD API call, it may trigger
crash in svm_update_pi_irte() due to out-of-bounds:

crash> bt
PID: 22218  TASK: ffff951a6ad74980  CPU: 73  COMMAND: "vcpu8"
#0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
#1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
#2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
#3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
#4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
#5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
#6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
    [exception RIP: svm_update_pi_irte+227]
    RIP: ffffffffc0761b53  RSP: ffffb1ba6707fd08  RFLAGS: 00010086
    RAX: ffffb1ba6707fd78  RBX: ffffb1ba66d91000  RCX: 0000000000000001
    RDX: 00003c803f63f1c0  RSI: 000000000000019a  RDI: ffffb1ba66db2ab8
    RBP: 000000000000019a   R8: 0000000000000040   R9: ffff94ca41b82200
    R10: ffffffffffffffcf  R11: 0000000000000001  R12: 0000000000000001
    R13: 0000000000000001  R14: ffffffffffffffcf  R15: 000000000000005f
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
#8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
#9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
    RIP: 00007f143c36488b  RSP: 00007f143a4e04b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f05780041d0  RCX: 00007f143c36488b
    RDX: 00007f05780041d0  RSI: 000000004008ae6a  RDI: 0000000000000020
    RBP: 00000000000004e8   R8: 0000000000000008   R9: 00007f05780041e0
    R10: 00007f0578004560  R11: 0000000000000246  R12: 00000000000004e0
    R13: 000000000000001a  R14: 00007f1424001c60  R15: 00007f0578003bc0
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

Vmx have been fix this in commit 3a8b0677fc61 (KVM: VMX: Do not BUG() on
out-of-bounds guest IRQ), so we can just copy source from that to fix
this.

Co-developed-by: Yi Liu <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Yi Wang <[email protected]>
Message-Id: <20220309113025 [email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: MMU: propagate alloc_workqueue failure

If kvm->arch.tdp_mmu_zap_wq cannot be created, the failure has
to be propagated up to kvm_mmu_init_vm and kvm_arch_init_vm.
kvm_arch_init_vm also has to undo all the initialization, so
group all the MMU initialization code at the beginning and
handle cleaning up of kvm_page_track_init.

Signed-off-by: Paolo Bonzini <[email protected]>

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs updates from Al Viro:
"Assorted bits and pieces"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  aio: drop needless assignment in aio_read()
  clean overflow checks in count_mounts() a bit
  seq_file: fix NULL pointer arithmetic warning
  uml/x86: use x86 load_unaligned_zeropad()
  asm/user.h: killed unused macros
  constify struct path argument of finish_automount()/do_add_mount()
  fs: Remove FIXME comment in generic_write_checks()

Merge tag 'vfs-5.18-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull vfs fix from Darrick Wong:
"The erofs developers felt that FIEMAP should handle ranged requests
  starting at s_maxbytes by returning EFBIG instead of passing the
  filesystem implementation a nonsense 0-byte request.

  Not sure why they keep tagging this 'iomap', but the VFS shouldn't be
  asking for information about ranges of a file that the filesystem
  already declared that it does not support.

   - Fix a potential infinite loop in FIEMAP by fixing an off by one
     error when comparing the requested range against s_maxbytes"

* tag 'vfs-5.18-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  fs: fix an infinite loop in iomap_fiemap

Merge tag 'xfs-5.18-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
"This fixes multiple problems in the reserve pool sizing functions: an
  incorrect free space calculation, a pointless infinite loop, and even
  more braindamage that could result in the pool being overfilled. The
  pile of patches from Dave fix myriad races and UAF bugs in the log
  recovery code that much to our mutual surprise nobody's tripped over.
  Dave also fixed a performance optimization that had turned into a
  regression.

  Dave Chinner is taking over as XFS maintainer starting Sunday and
  lasting until 5.19-rc1 is tagged so that I can focus on starting a
  massive design review for the (feature complete after five years)
  online repair feature. From then on, he and I will be moving XFS to a
  co-maintainership model by trading duties every other release.

  NOTE: I hope very strongly that the other pieces of the (X)FS
  ecosystem (fstests and xfsprogs) will make similar changes to spread
  their maintenance load.

  Summary:

   - Fix an incorrect free space calculation in xfs_reserve_blocks that
     could lead to a request for free blocks that will never succeed.

   - Fix a hang in xfs_reserve_blocks caused by an infinite loop and the
     incorrect free space calculation.

   - Fix yet a third problem in xfs_reserve_blocks where multiple racing
     threads can overfill the reserve pool.

   - Fix an accounting error that lead to us reporting reserved space as
     "available".

   - Fix a race condition during abnormal fs shutdown that could cause
     UAF problems when memory reclaim and log shutdown try to clean up
     inodes.

   - Fix a bug where log shutdown can race with unmount to tear down the
     log, thereby causing UAF errors.

   - Disentangle log and filesystem shutdown to reduce confusion.

   - Fix some confusion in xfs_trans_commit such that a race between
     transaction commit and filesystem shutdown can cause unlogged dirty
     inode metadata to be committed, thereby corrupting the filesystem.

   - Remove a performance optimization in the log as it was discovered
     that certain storage hardware handle async log flushes so poorly as
     to cause serious performance regressions. Recent restructuring of
     other parts of the logging code mean that no performance benefit is
     seen on hardware that handle it well"

* tag 'xfs-5.18-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: drop async cache flushes from CIL commits.
  xfs: shutdown during log recovery needs to mark the log shutdown
  xfs: xfs_trans_commit() path must check for log shutdown
  xfs: xfs_do_force_shutdown needs to block racing shutdowns
  xfs: log shutdown triggers should only shut down the log
  xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks
  xfs: shutdown in intent recovery has non-intent items in the AIL
  xfs: aborting inodes on shutdown may need buffer lock
  xfs: don't report reserved bnobt space as available
  xfs: fix overfilling of reserve pool
  xfs: always succeed at setting the reserve pool size
  xfs: remove infinite loop when reserving free block pool
  xfs: don't include bnobt blocks when reserving free block pool
  xfs: document the XFS_ALLOC_AGFL_RESERVE constant

Merge tag 'riscv-for-linus-5.18-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fix from Palmer Dabbelt:

- Fix the RISC-V section of the generic CPU idle bindings to comply
with the recently tightened DT schema.

* tag 'riscv-for-linus-5.18-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
dt-bindings: Fix phandle-array issues in the idle-states bindings

Merge tag 'for-5.18/drivers-2022-04-01' of git://git.kernel.dk/linux-block

Pull block driver fixes from Jens Axboe:
"Followup block driver updates and fixes for the 5.18-rc1 merge window.
  In detail:

   - NVMe pull request
       - Fix multipath hang when disk goes live over reconnect (Anton
         Eidelman)
       - fix RCU hole that allowed for endless looping in multipath
         round robin (Chris Leech)
       - remove redundant assignment after left shift (Colin Ian King)
       - add quirks for Samsung X5 SSDs (Monish Kumar R)
       - fix the read-only state for zoned namespaces with unsupposed
         features (Pankaj Raghav)
       - use a private workqueue instead of the system workqueue in
         nvmet (Sagi Grimberg)
       - allow duplicate NSIDs for private namespaces (Sungup Moon)
       - expose use_threaded_interrupts read-only in sysfs (Xin Hao)"

   - nbd minor allocation fix (Zhang)

   - drbd fixes and maintainer addition (Lars, Jakob, Christoph)

   - n64cart build fix (Jackie)

   - loop compat ioctl fix (Carlos)

   - misc fixes (Colin, Dongli)"

* tag 'for-5.18/drivers-2022-04-01' of git://git.kernel.dk/linux-block:
  drbd: remove check of list iterator against head past the loop body
  drbd: remove usage of list iterator variable after loop
  nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
  MAINTAINERS: add drbd co-maintainer
  drbd: fix potential silent data corruption
  loop: fix ioctl calls using compat_loop_info
  nvme-multipath: fix hang when disk goes live over reconnect
  nvme: fix RCU hole that allowed for endless looping in multipath round robin
  nvme: allow duplicate NSIDs for private namespaces
  nvmet: remove redundant assignment after left shift
  nvmet: use a private workqueue instead of the system workqueue
  nvme-pci: add quirks for Samsung X5 SSDs
  nvme-pci: expose use_threaded_interrupts read-only in sysfs
  nvme: fix the read-only state for zoned namespaces with unsupposed features
  n64cart: convert bi_disk to bi_bdev->bd_disk fix build
  xen/blkfront: fix comment for need_copy
  xen-blkback: remove redundant assignment to variable i

Merge tag 'for-5.18/block-2022-04-01' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Either fixes or a few additions that got missed in the initial merge
  window pull. In detail:

   - List iterator fix to avoid leaking value post loop (Jakob)

   - One-off fix in minor count (Christophe)

   - Fix for a regression in how io priority setting works for an
     exiting task (Jiri)

   - Fix a regression in this merge window with blkg_free() being called
     in an inappropriate context (Ming)

   - Misc fixes (Ming, Tom)"

* tag 'for-5.18/block-2022-04-01' of git://git.kernel.dk/linux-block:
  blk-wbt: remove wbt_track stub
  block: use dedicated list iterator variable
  block: Fix the maximum minor value is blk_alloc_ext_minor()
  block: restore the old set_task_ioprio() behaviour wrt PF_EXITING
  block: avoid calling blkg_free() in atomic context
  lib/sbitmap: allocate sb->map via kvzalloc_node

Merge tag 'for-5.18/io_uring-2022-04-01' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"A little bit all over the map, some regression fixes for this merge
  window, and some general fixes that are stable bound. In detail:

   - Fix an SQPOLL memory ordering issue (Almog)

   - Accept fixes (Dylan)

   - Poll fixes (me)

   - Fixes for provided buffers and recycling (me)

   - Tweak to IORING_OP_MSG_RING command added in this merge window (me)

   - Memory leak fix (Pavel)

   - Misc fixes and tweaks (Pavel, me)"

* tag 'for-5.18/io_uring-2022-04-01' of git://git.kernel.dk/linux-block:
  io_uring: defer msg-ring file validity check until command issue
  io_uring: fail links if msg-ring doesn't succeeed
  io_uring: fix memory leak of uid in files registration
  io_uring: fix put_kbuf without proper locking
  io_uring: fix invalid flags for io_put_kbuf()
  io_uring: improve req fields comments
  io_uring: enable EPOLLEXCLUSIVE for accept poll
  io_uring: improve task work cache utilization
  io_uring: fix async accept on O_NONBLOCK sockets
  io_uring: remove IORING_CQE_F_MSG
  io_uring: add flag for disabling provided buffer recycling
  io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly
  io_uring: don't recycle provided buffer if punted to async worker
  io_uring: fix assuming triggered poll waitqueue is the single poll
  io_uring: bump poll refs to full 31-bits
  io_uring: remove poll entry from list when canceling all
  io_uring: fix memory ordering when SQPOLL thread goes to sleep
  io_uring: ensure that fsnotify is always called
  io_uring: recycle provided before arming poll

Merge tag 'for-5.18/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- Fix DM integrity shrink crash due to journal entry not being marked
   unused.

- Fix DM bio polling to handle possibility that underlying device(s)
   return BLK_STS_AGAIN during submission.

- Fix dm_io and dm_target_io flags race condition on Alpha.

- Add some pr_err debugging to help debug cases when DM ioctl structure
   is corrupted.

* tag 'for-5.18/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: fix bio polling to handle possibile BLK_STS_AGAIN
  dm: fix dm_io and dm_target_io flags race condition on Alpha
  dm integrity: set journal entry unused when shrinking device
  dm ioctl: log an error if the ioctl structure is corrupted

dt-bindings: Fix phandle-array issues in the idle-states bindings

As per 39bd2b6a3783 ("dt-bindings: Improve phandle-array schemas"), the
phandle-array bindings have been disambiguated. This fixes the new
RISC-V idle-states bindings to comply with the schema.

Fixes: 1bd524f7e8d8 ("dt-bindings: Add common bindings for ARM and RISC-V idle states")
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

Merge tag '5.18-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull ksmbd updates from Steve French:

- three cleanup fixes

- shorten module load warning

- two documentation fixes

* tag '5.18-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: replace usage of found with dedicated list iterator variable
  ksmbd: Remove a redundant zeroing of memory
  MAINTAINERS: ksmbd: switch Sergey to reviewer
  ksmbd: shorten experimental warning on loading the module
  ksmbd: use netif_is_bridge_port
  Documentation: ksmbd: update Feature Status table

Merge tag '5.18-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull more cifs updates from Steve French:

- three fixes for big endian issues in how Persistent and Volatile file
   ids were stored

- Various misc. fixes: including some for oops, 2 for ioctls, 1 for
   writeback

- cleanup of how tcon (tree connection) status is tracked

- Four changesets to move various duplicated protocol definitions
   (defined both in cifs.ko and ksmbd) into smbfs_common/smb2pdu.h

- important performance improvement to use cached handles in some key
   compounding code paths (reduces numbers of opens/closes sent in some
   workloads)

- fix to allow alternate DFS target to be used to retry on a failed i/o

* tag '5.18-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix NULL ptr dereference in smb2_ioctl_query_info()
  cifs: prevent bad output lengths in smb2_ioctl_query_info()
  smb3: fix ksmbd bigendian bug in oplock break, and move its struct to smbfs_common
  smb3: cleanup and clarify status of tree connections
  smb3: move defines for query info and query fsinfo to smbfs_common
  smb3: move defines for ioctl protocol header and SMB2 sizes to smbfs_common
  [smb3] move more common protocol header definitions to smbfs_common
  cifs: fix incorrect use of list iterator after the loop
  ksmbd: store fids as opaque u64 integers
  cifs: fix bad fids sent over wire
  cifs: change smb2_query_info_compound to use a cached fid, if available
  cifs: convert the path to utf16 in smb2_query_info_compound
  cifs: writeback fix
  cifs: do not skip link targets when an I/O fails

Merge tag 'exfat-for-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

- Add keep_last_dots mount option to allow access to paths with
   trailing dots

- Avoid repetitive volume dirty bit set/clear to improve storage life
   time

* tag 'exfat-for-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: do not clear VolumeDirty in writeback
  exfat: allow access to paths with trailing dots

Merge tag 'folio-5.18d' of git://git.infradead.org/users/willy/pagecache

Pull more filesystem folio updates from Matthew Wilcox:
"A mixture of odd changes that didn't quite make it into the original
  pull and fixes for things that did. Also the readpages changes had to
  wait for the NFS tree to be pulled first.

   - Remove ->readpages infrastructure

   - Remove AOP_FLAG_CONT_EXPAND

   - Move read_descriptor_t to networking code

   - Pass the iocb to generic_perform_write

   - Minor updates to iomap, btrfs, ext4, f2fs, ntfs"

* tag 'folio-5.18d' of git://git.infradead.org/users/willy/pagecache:
  btrfs: Remove a use of PAGE_SIZE in btrfs_invalidate_folio()
  ntfs: Correct mark_ntfs_record_dirty() folio conversion
  f2fs: Get the superblock from the mapping instead of the page
  f2fs: Correct f2fs_dirty_data_folio() conversion
  ext4: Correct ext4_journalled_dirty_folio() conversion
  filemap: Remove AOP_FLAG_CONT_EXPAND
  fs: Pass an iocb to generic_perform_write()
  fs, net: Move read_descriptor_t to net.h
  fs: Remove read_actor_t
  iomap: Simplify is_partially_uptodate a little
  readahead: Update comments
  mm: remove the skip_page argument to read_pages
  mm: remove the pages argument to read_pages
  fs: Remove ->readpages address space operation
  readahead: Remove read_cache_pages()

Merge tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarray

Pull XArray updates from Matthew Wilcox:

- Documentation update

- Fix test-suite build after move of bitmap.h

- Fix xas_create_range() when a large entry is already present

- Fix xas_split() of a shadow entry

* tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarray:
  XArray: Update the LRU list in xas_split()
  XArray: Fix xas_create_range() when multi-order entry present
  XArray: Include bitmap.h from xarray.h
  XArray: Document the locking requirement for the xa_state

Merge tag 'riscv-for-linus-5.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:
"This has a handful of new features:

   - Support for CURRENT_STACK_POINTER, which enables some extra stack
     debugging for HARDENED_USERCOPY.

   - Support for the new SBI CPU idle extension, via cpuidle and suspend
     drivers.

   - Profiling has been enabled in the defconfigs.

  but is mostly fixes and cleanups"

* tag 'riscv-for-linus-5.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits)
  RISC-V: K210 defconfigs: Drop redundant MEMBARRIER=n
  RISC-V: defconfig: Drop redundant SBI HVC and earlycon
  Documentation: riscv: remove non-existent directory from table of contents
  riscv: cpu.c: don't use kernel-doc markers for comments
  RISC-V: Enable profiling by default
  RISC-V: module: fix apply_r_riscv_rcv_branch_rela typo
  RISC-V: Declare per cpu boot data as static
  RISC-V: Fix a comment typo in riscv_of_parent_hartid()
  riscv: Increase stack size under KASAN
  riscv: Fix fill_callchain return value
  riscv: dts: canaan: Fix SPI3 bus width
  riscv: Rename "sp_in_global" to "current_stack_pointer"
  riscv module: remove (NOLOAD)
  RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine
  dt-bindings: Add common bindings for ARM and RISC-V idle states
  cpuidle: Add RISC-V SBI CPU idle driver
  cpuidle: Factor-out power domain related code from PSCI domain driver
  RISC-V: Add SBI HSM suspend related defines
  RISC-V: Add arch functions for non-retentive suspend entry/exit
  RISC-V: Rename relocate() and make it global
  ...

Merge tag 's390-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Vasily Gorbik:

- Add kretprobes framepointer verification and return address recovery
   in stacktrace.

- Support control domain masks on custom zcrypt devices and filter
   admin requests.

- Cleanup timer API usage.

- Rework absolute lowcore access helpers.

- Other various small improvements and fixes.

* tag 's390-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (26 commits)
  s390/alternatives: avoid using jgnop mnemonic
  s390/pci: rename get_zdev_by_bus() to zdev_from_bus()
  s390/pci: improve zpci_dev reference counting
  s390/smp: use physical address for SIGP_SET_PREFIX command
  s390: cleanup timer API use
  s390/zcrypt: fix using the correct variable for sizeof()
  s390/vfio-ap: fix kernel doc and signature of group notifier functions
  s390/maccess: rework absolute lowcore accessors
  s390/smp: cleanup control register update routines
  s390/smp: cleanup target CPU callback starting
  s390/test_unwind: verify __kretprobe_trampoline is replaced
  s390/unwind: avoid duplicated unwinding entries for kretprobes
  s390/unwind: recover kretprobe modified return address in stacktrace
  s390/kprobes: enable kretprobes framepointer verification
  s390/test_unwind: extend kretprobe test
  s390/ap: adjust whitespace
  s390/ap: use insn format for new instructions
  s390/alternatives: use insn format for new instructions
  s390/alternatives: use instructions instead of byte patterns
  s390/traps: improve panic message for translation-specification exception
  ...

Merge tag 'soc-fixes-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd BergmannL
"The introduction of vmap-stack on 32-bit arm caused a regression on a
  few omap3/omap4 machines that pass a stack variable into a firmware
  interface.

  The early pre-ACPI AMD Seattle machines have been broken for a while,
  Ard Biesheuvel has a series to bring them back for now.

  A few machines with multiple DMA channels used on a device have the
  channels in the wrong order according to the binding, which causes a
  harmless warning. Reversing the order is easier than fixing the tools
  to suppress the warning"

* tag 'soc-fixes-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  arm64: dts: ls1046a: Update i2c node dma properties
  arm64: dts: ls1043a: Update i2c dma properties
  ARM: dts: spear1340: Update serial node properties
  ARM: dts: spear13xx: Update SPI dma properties
  ARM: OMAP2+: Fix regression for smc calls for vmap stack
  dt: amd-seattle: add a description of the CPUs and caches
  dt: amd-seattle: disable IPMI controller and some GPIO blocks on B0
  dt: amd-seattle: add description of the SATA/CCP SMMUs
  dt: amd-seattle: add a description of the PCIe SMMU
  dt: amd-seattle: fix PCIe legacy interrupt routing
  dt: amd-seattle: upgrade AMD Seattle XGBE to new SMMU binding
  dt: amd-seattle: remove Overdrive revision A0 support
  dt: amd-seattle: remove Husky platform

perf python: Convert tracepoint.py example to python3

Convert the tracepoint.py file to python3 as many of the files in
tools/perf are already written in python3.

Committer testing:

  # export PYTHONPATH=/tmp/build/perf/python/
  # python3 ~acme/git/perf/tools/perf/python/tracepoint.py | head
  time 67394457376909 prev_comm=swapper/12 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=gnome-terminal- next_pid=3313 next_prio=120
  time 67394457807669 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
  time 67394457811859 prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=python3 next_pid=1485930 next_prio=120
  time 67394457824929 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
  time 67394457831899 prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=python3 next_pid=1485930 next_prio=120
  time 67394457842299 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
  time 67394457844179 prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=python3 next_pid=1485930 next_prio=120
  time 67394457853879 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
  time 67394457856339 prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=python3 next_pid=1485930 next_prio=120
  time 67394457865659 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
  Traceback (most recent call last):
    File "/var/home/acme/git/perf/tools/perf/python/tracepoint.py", line 48, in <module>
      main()
    File "/var/home/acme/git/perf/tools/perf/python/tracepoint.py", line 37, in main
      print("time %u prev_comm=%s prev_pid=%d prev_prio=%d prev_state=0x%x ==> next_comm=%s next_pid=%d next_prio=%d" % (
  BrokenPipeError: [Errno 32] Broken pipe
  #

Signed-off-by: Tanu M <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/linux-perf-users/CAPS78prawYzRZnyhWjgOnGw4EwoswNwztvfZFdCOPOydFzVwzQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf evlist: Directly return instead of using local ret variable

Addresses this coccinelle warning:

./tools/perf/util/evlist.c:1333:5-8: Unneeded variable: "err". Return
"- ENOMEM" on line 1358

Signed-off-by: Haowen Bai <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cpumap: More cpu map reuse by merge.

perf_cpu_map__merge() will reuse one of its arguments if they are equal or
the other argument is NULL.

The arguments could be reused if it is known one set of values is a
subset of the other.

For example, a map of 0-1 and a map of just 0 when merged yields the map
of 0-1.

Currently a new map is created rather than adding a reference count to
the original 0-1 map.

Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: German Gomez <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: John Garry <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cpumap: Add is_subset function

Returns true if the second argument is a subset of the first.

Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: German Gomez <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: John Garry <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf evlist: Rename cpus to user_requested_cpus

evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps
of all evsels.

For non-task targets, cpus is set to be cpus requested from the command
line, defaulting to all online cpus if no cpus are specified.

For an uncore event, all_cpus may be just CPU 0 or every online CPU.

This causes all_cpus to have fewer values than the cpus variable which
is confusing given the 'all' in the name.

To try to make the behavior clearer, rename cpus to user_requested_cpus
and add comments on the two struct variables.

Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: German Gomez <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: John Garry <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tools: Stop depending on .git files for building PERF-VERSION-FILE

This essentially reverts commit c72e3f04b45fb2e5 ("tools/perf/build:
Speed up git-version test on re-make") and commit 4e666cdb06eede20
("perf tools: Fix dependency for version file creation")

In commit c72e3f04b45fb2e5 ("tools/perf/build: Speed up git-version test
on re-make"), a makefile dependency on .git/HEAD was added. The
background is that running PERF-VERSION-FILE is relatively slow, and
commands like "git describe" are particularly slow.

In commit 4e666cdb06eede20 ("perf tools: Fix dependency for version file
creation"), an additional dependency on .git/ORIG_HEAD was added, as
.git/HEAD may not change for "git reset --hard HEAD^" command. However,
depending on whether we're on a branch or not, a "git cherry-pick" may
not lead to the version being updated.

As discussed with the git community in [0], using git internal files for
dependencies is not reliable. Commit 4e666cdb06ee also breaks some build
scenarios [1].

As mentioned, c72e3f04b45fb2e5 ("tools/perf/build: Speed up git-version
test on re-make") was added to speed up the build. However in commit
7572733b84997d23 ("perf tools: Fix version kernel tag") we removed the
call to "git describe", so just revert Makefile.perf back to same as pre
c72e3f04b45fb2e5 ("tools/perf/build: Speed up git-version test on
re-make") and the build should not be so slow, as below:

Pre 7572733b8499:

  $> time util/PERF-VERSION-GEN
    PERF_VERSION = 5.17.rc8.g4e666cdb06ee

  real    0m0.110s
  user    0m0.091s
  sys     0m0.019s

Post 7572733b8499:

  $> time util/PERF-VERSION-GEN
    PERF_VERSION = 5.17.rc8.g7572733b8499

  real    0m0.039s
  user    0m0.036s
  sys     0m0.007s

[0] https://lore.kernel.org/git/[email protected]/T/#m4a4dd6de52fdbe21179306cd57b3761eb07f45f8
[1] https://lore.kernel.org/linux-perf-users/20220329093120.4173283 [email protected]/T/#u

Committer testing:

After a fresh rebuild using 'make -C tools/perf O=/tmp/build/perf install-bin':

  $ perf -v
  perf version 5.17.g162f9db407b6
  $ git log --oneline -1
  162f9db407b6a6e5 (HEAD -> perf/core) perf tools: Stop depending on .git files for building PERF-VERSION-FILE
  $

Now using a detached tarball, i.e. outside the kernel source tree:

  $ ls -la perf*tar
  ls: cannot access 'perf*tar': No such file or directory
  $ make perf-tar-src-pkg
    TAR
    PERF_VERSION = 5.17.g31d10b3ef133
  $ ls -la perf*tar
  -rw-r--r--. 1 acme acme 22241280 Mar 30 13:26 perf-5.17.0.tar
  $ mv perf-5.17.0.tar /tmp
  $ cd /tmp
  $ tar xf perf-5.17.0.tar
  $ cd perf-5.17.0/
  $ make -C tools/perf |& tail
    CC      util/pmu.o
    CC      util/pmu-flex.o
    CC      util/expr-flex.o
    CC      util/expr.o
    LD      util/scripting-engines/perf-in.o
    LD      util/intel-pt-decoder/perf-in.o
    LD      util/perf-in.o
    LD      perf-in.o
    LINK    perf
  make: Leaving directory '/tmp/perf-5.17.0/tools/perf'
  $ tools/perf/perf -v
  perf version 5.17.g31d10b3ef133
  $ pwd
  /tmp/perf-5.17.0
  $ cat PERF-VERSION-FILE
  #define PERF_VERSION "5.17.g31d10b3ef133"
  $

Fixes: 4e666cdb06eede20 ("perf tools: Fix dependency for version file creation")
Reported-by: Matthieu Baerts <[email protected]>
Signed-off-by: John Garry <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Matthieu Baerts <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers cpufeatures: Sync with the kernel sources

To pick the changes from:

  991625f3dd2cbc4b ("x86/ibt: Add IBT feature, MSR and #CP handling")

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers UAPI: Sync drm/i915_drm.h with the kernel sources

To pick up the changes in:

  caa574ffc4aaf4f2 ("drm/i915/uapi: document behaviour for DG2 64K support")

That don't add any new ioctl, so no changes in tooling.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
  diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h

Cc: Lucas De Marchi <[email protected]>
Cc: Matthew Auld <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers UAPI: Sync linux/kvm.h with the kernel sources

To pick the changes in:

  6d8491910fcd3324 ("KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2")
  ef11c9463ae00630 ("KVM: s390: Add vm IOCTL for key checked guest absolute memory access")
  e9e9feebcbc14b17 ("KVM: s390: Add optional storage key checking to MEMOP IOCTL")

That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the the 'perf trace' ioctl syscall argument
beautifiers.

This is also by now used by tools/testing/selftests/kvm/, a simple test
build succeeded.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: Christian Borntraeger <[email protected]>
Cc: Janis Schoetterl-Glausch <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools kvm headers arm64: Update KVM headers from the kernel sources

To pick the changes from:

  34739fd95fab3a5e ("KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field")
  583cda1b0e7d5d49 ("KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU")

That don't causes any changes in tooling (when built on x86), only
addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
  diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h

Cc: Marc Zyngier <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools arch x86: Sync the msr-index.h copy with the kernel sources

To pick up the changes in:

  991625f3dd2cbc4b ("x86/ibt: Add IBT feature, MSR and #CP handling")

Addressing these tools/perf build warnings:

    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
    Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'

That makes the beautification scripts to pick some new entries:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  --- before 2022-03-29 16:23:07.678740040 -0300
  +++ after 2022-03-29 16:23:16.960978524 -0300
  @@ -220,6 +220,13 @@
    [0x00000669] = "MC6_DEMOTION_POLICY_CONFIG",
    [0x00000680] = "LBR_NHM_FROM",
    [0x00000690] = "CORE_PERF_LIMIT_REASONS",
  + [0x000006a0] = "IA32_U_CET",
  + [0x000006a2] = "IA32_S_CET",
  + [0x000006a4] = "IA32_PL0_SSP",
  + [0x000006a5] = "IA32_PL1_SSP",
  + [0x000006a6] = "IA32_PL2_SSP",
  + [0x000006a7] = "IA32_PL3_SSP",
  + [0x000006a8] = "IA32_INT_SSP_TAB",
    [0x000006B0] = "GFX_PERF_LIMIT_REASONS",
    [0x000006B1] = "RING_PERF_LIMIT_REASONS",
    [0x000006c0] = "LBR_NHM_TO",
  $

And this gets rebuilt:

  CC      /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
  LD      /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
  LD      /tmp/build/perf/trace/beauty/perf-in.o
  CC      /tmp/build/perf/util/amd-sample-raw.o
  LD      /tmp/build/perf/util/perf-in.o
  LD      /tmp/build/perf/perf-in.o
  LINK    /tmp/build/perf/perf

Now one can trace systemwide asking to see backtraces to where those
MSRs are being read/written with:

  # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
  ^C#

If we use -v (verbose mode) we can see what it does behind the scenes:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
  Using CPUID AuthenticAMD-25-21-0
  0x6a0
  0x6a8
  New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
  0x6a0
  0x6a8
  New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
  mmap size 528384B
  ^C#

Example with a frequent msr:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
  Using CPUID AuthenticAMD-25-21-0
  0x48
  New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  0x48
  New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  mmap size 528384B
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
     0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       __switch_to_xtra ([kernel.kallsyms])
                                       __switch_to ([kernel.kallsyms])
                                       __schedule ([kernel.kallsyms])
                                       schedule ([kernel.kallsyms])
                                       futex_wait_queue_me ([kernel.kallsyms])
                                       futex_wait ([kernel.kallsyms])
                                       do_futex ([kernel.kallsyms])
                                       __x64_sys_futex ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                       __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
     0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       __switch_to_xtra ([kernel.kallsyms])
                                       __switch_to ([kernel.kallsyms])
                                       __schedule ([kernel.kallsyms])
                                       schedule_idle ([kernel.kallsyms])
                                       do_idle ([kernel.kallsyms])
                                       cpu_startup_entry ([kernel.kallsyms])
                                       secondary_startup_64_no_verify ([kernel.kallsyms])
  #

Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers UAPI: Sync asm-generic/mman-common.h with the kernel

To pick the changes from:

  9457056ac426e5ed ("mm: madvise: MADV_DONTNEED_LOCKED")

That result in these changes in the tools:

  $ diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
  --- tools/include/uapi/asm-generic/mman-common.h 2022-03-29 16:17:50.461694991 -0300
  +++ include/uapi/asm-generic/mman-common.h 2022-03-27 19:12:48.923250468 -0300
  @@ -75,6 +75,8 @@
   #define MADV_POPULATE_READ 22 /* populate (prefault) page tables readable */
   #define MADV_POPULATE_WRITE 23 /* populate (prefault) page tables writable */

  +#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */
  +
   /* compatibility flags */
   #define MAP_FILE 0

  $ tools/perf/trace/beauty/madvise_behavior.sh > before
  $ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h
  $ tools/perf/trace/beauty/madvise_behavior.sh > after
  $ diff -u before after
  --- before 2022-03-29 16:18:04.091044244 -0300
  +++ after 2022-03-29 16:18:11.692238906 -0300
  @@ -20,6 +20,7 @@
    [21] = "PAGEOUT",
    [22] = "POPULATE_READ",
    [23] = "POPULATE_WRITE",
  + [24] = "DONTNEED_LOCKED",
    [100] = "HWPOISON",
    [101] = "SOFT_OFFLINE",
   };
  $

I.e. now when madvise gets those behaviours as args, 'perf trace' will
be able to translate from the number to a human readable string and to
use the strings in tracepoint filter expressions.

This addresses the following perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
  diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h

Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf beauty: Update copy of linux/socket.h with the kernel sources

To pick the changes in:

  a6a6fe27bab48f0d ("net/smc: Dynamic control handshake limitation by socket options")

This automagically adds support for the SOL_MNC socket level:

  $ diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
  --- tools/perf/trace/beauty/include/linux/socket.h 2022-03-14 17:55:22.277148656 -0300
  +++ include/linux/socket.h 2022-03-27 19:12:48.908250063 -0300
  @@ -366,6 +366,7 @@
   #define SOL_XDP 283
   #define SOL_MPTCP 284
   #define SOL_MCTP 285
  +#define SOL_SMC 286

   /* IPX options */
   #define IPX_TYPE 1
  $ tools/perf/trace/beauty/socket.sh > before
  $ cp include/linux/socket.h tools/perf/trace/beauty/include/linux/socket.h
  $ tools/perf/trace/beauty/socket.sh > after
  $ diff -u before after
  --- before 2022-03-29 11:47:56.390258780 -0300
  +++ after 2022-03-29 11:48:03.158436189 -0300
  @@ -67,6 +67,7 @@
    [283] = "XDP",
    [284] = "MPTCP",
    [285] = "MCTP",
  + [286] = "SMC",
   };

   DEFINE_STRARRAY(socket_level, "SOL_");
  $

This will allow 'perf trace' to translate 286 into "SMC" as is done with
the other socket levels:

  # perf trace -e setsockopt --max-events 4
   344.916 ( 0.003 ms): Socket Thread/3816 setsockopt(fd: 168, level: TCP, optname: 5, optval: 0x7f5797b9c4f8, optlen: 4) = 0
   344.920 ( 0.002 ms): Socket Thread/3816 setsockopt(fd: 168, level: TCP, optname: 6, optval: 0x7f5797b9c4f4, optlen: 4) = 0
  1246.974 ( 0.010 ms): systemd-resolv/1128 setsockopt(fd: 22, level: IP, optname: 11, optval: 0x7ffc96cd7244, optlen: 4) = 0
  1246.986 ( 0.002 ms): systemd-resolv/1128 setsockopt(fd: 22, level: IP, optname: 8, optval: 0x7ffc96cd7264, optlen: 4) = 0

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
  diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h

Cc: Adrian Hunter <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: D. Wythe <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tools: Update copy of libbpf's hashmap.c

To pick the changes in:

  fba60b171a032283 ("libbpf: Use IS_ERR_OR_NULL() in hashmap__free()")

That don't entail any changes in tools/perf.

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
  diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h

Not a kernel ABI, its just that this uses the mechanism in place for
checking kernel ABI files drift.

Cc: Andrii Nakryiko <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mauricio Vásquez <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf stat: Avoid SEGV if core.cpus isn't set

Passing NULL to perf_cpu_map__max doesn't make sense as there is no
valid max. Avoid this problem by null checking in
perf_stat_init_aggr_mode.

Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Antonov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Alexey Bayduraev <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: German Gomez <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: John Garry <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge still more updates from Andrew Morton:
"16 patches.

  Subsystems affected by this patch series: ofs2, nilfs2, mailmap, and
  mm (madvise, mlock, mfence, memory-failure, kasan, debug, kmemleak,
  and damon)"

* emailed patches from Andrew Morton <[email protected]>:
  mm/damon: prevent activated scheme from sleeping by deactivated schemes
  mm/kmemleak: reset tag when compare object pointer
  doc/vm/page_owner.rst: remove content related to -c option
  tools/vm/page_owner_sort.c: remove -c option
  mm, kasan: fix __GFP_BITS_SHIFT definition breaking LOCKDEP
  mm,hwpoison: unmap poisoned page before invalidation
  mailmap: update Kirill's email
  mm: kfence: fix objcgs vector allocation
  mm/munlock: protect the per-CPU pagevec by a local_lock_t
  mm/munlock: update Documentation/vm/unevictable-lru.rst
  mm/munlock: add lru_add_drain() to fix memcg_stat_test
  nilfs2: get rid of nilfs_mapping_init()
  nilfs2: fix lockdep warnings during disk space reclamation
  nilfs2: fix lockdep warnings in page operations for btree nodes
  ocfs2: fix crash when mount with quota enabled
  Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"

mm/damon: prevent activated scheme from sleeping by deactivated schemes

In the DAMON, the minimum wait time of the schemes decides whether the
kernel wakes up 'kdamon_fn()'. But since the minimum wait time is
initialized to zero, there are corner cases against the original
objective.

For example, if we have several schemes for one target, and if the wait
time of the first scheme is zero, the minimum wait time will set zero,
which means 'kdamond_fn()' should wake up to apply this scheme.
However, in the following scheme, wait time can be set to non-zero.
Thus, the mininum wait time will be set to non-zero, which can cause
sleeping this interval for 'kdamon_fn()' due to one deactivated last
scheme.

This commit prevents making DAMON monitoring inactive state due to other
deactivated schemes.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jonghyeon Kim <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/kmemleak: reset tag when compare object pointer

When we use HW-tag based kasan and enable vmalloc support, we hit the
following bug.  It is due to comparison between tagged object and
non-tagged pointer.

We need to reset the kasan tag when we need to compare tagged object and
non-tagged pointer.

  kmemleak: [name:kmemleak&]Scan area larger than object 0xffffffe77076f440
  CPU: 4 PID: 1 Comm: init Tainted: G S      W         5.15.25-android13-0-g5cacf919c2bc #1
  Hardware name: MT6983(ENG) (DT)
  Call trace:
   add_scan_area+0xc4/0x244
   kmemleak_scan_area+0x40/0x9c
   layout_and_allocate+0x1e8/0x288
   load_module+0x2c8/0xf00
   __se_sys_finit_module+0x190/0x1d0
   __arm64_sys_finit_module+0x20/0x30
   invoke_syscall+0x60/0x170
   el0_svc_common+0xc8/0x114
   do_el0_svc+0x28/0xa0
   el0_svc+0x60/0xf8
   el0t_64_sync_handler+0x88/0xec
   el0t_64_sync+0x1b4/0x1b8
  kmemleak: [name:kmemleak&]Object 0xf5ffffe77076b000 (size 32768):
  kmemleak: [name:kmemleak&]  comm "init", pid 1, jiffies 4294894197
  kmemleak: [name:kmemleak&]  min_count = 0
  kmemleak: [name:kmemleak&]  count = 0
  kmemleak: [name:kmemleak&]  flags = 0x1
  kmemleak: [name:kmemleak&]  checksum = 0
  kmemleak: [name:kmemleak&]  backtrace:
       module_alloc+0x9c/0x120
       move_module+0x34/0x19c
       layout_and_allocate+0x1c4/0x288
       load_module+0x2c8/0xf00
       __se_sys_finit_module+0x190/0x1d0
       __arm64_sys_finit_module+0x20/0x30
       invoke_syscall+0x60/0x170
       el0_svc_common+0xc8/0x114
       do_el0_svc+0x28/0xa0
       el0_svc+0x60/0xf8
       el0t_64_sync_handler+0x88/0xec
       el0t_64_sync+0x1b4/0x1b8

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kuan-Ying Lee <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Matthias Brugger <[email protected]>
Cc: Chinwen Chang <[email protected]>
Cc: Nicholas Tang <[email protected]>
Cc: Yee Lee <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

doc/vm/page_owner.rst: remove content related to -c option

-c option has been removed from page_owner_sort.c.

Remove the usage of -c option from Documentation.

This work is coauthored by
        Shenghong Han
        Yixuan Cao
        Chongxi Zhao
        Jiajian Ye
        Yuhong Feng
        Yongqiang Liu

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yinan Zhang <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Sean Anderson <[email protected]>
Cc: Tang Bin <[email protected]>
Cc: Zhenliang Wei <[email protected]>
Cc: Georgi Djakov <[email protected]>
Cc: Chongxi Zhao <[email protected]>
Cc: Jiajian Ye <[email protected]>
Cc: Yixuan Cao <[email protected]>
Cc: Yuhong Feng <[email protected]>
Cc: Yongqiang Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

tools/vm/page_owner_sort.c: remove -c option

The -c option is used to cull by stacktrace.  Now, --cull option has
been Added in page_owner_sort.c.  Culling by stacktrace is one of the
function of "--cull".  No need to set an extra parameter.  So remove -c
option.

Remove parsing of -c when parse parameter and remove "-c" from usage.

This work is coauthored by
        Shenghong Han
        Yixuan Cao
        Chongxi Zhao
        Jiajian Ye
        Yuhong Feng
        Yongqiang Liu

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yinan Zhang <[email protected]>
Cc: Chongxi Zhao <[email protected]>
Cc: Georgi Djakov <[email protected]>
Cc: Jiajian Ye <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Sean Anderson <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Tang Bin <[email protected]>
Cc: Yixuan Cao <[email protected]>
Cc: Yongqiang Liu <[email protected]>
Cc: Yuhong Feng <[email protected]>
Cc: Zhenliang Wei <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, kasan: fix __GFP_BITS_SHIFT definition breaking LOCKDEP

KASAN changes that added new GFP flags mistakenly updated
__GFP_BITS_SHIFT as the total number of GFP bits instead of as a shift
used to define __GFP_BITS_MASK.

This broke LOCKDEP, as __GFP_BITS_MASK now gets the 25th bit enabled
instead of the 28th for __GFP_NOLOCKDEP.

Update __GFP_BITS_SHIFT to always count KASAN GFP bits.

In the future, we could handle all combinations of KASAN and LOCKDEP to
occupy as few bits as possible. For now, we have enough GFP bits to be
inefficient in this quick fix.

Link: https://lkml.kernel.org/r/462ff52742a1fcc95a69778685737f723ee4dfb3.1648400273.git.andreyknvl@google.com
Fixes: 9353ffa6e9e9 ("kasan, page_alloc: allow skipping memory init for HW_TAGS")
Fixes: 53ae233c30a6 ("kasan, page_alloc: allow skipping unpoisoning for HW_TAGS")
Fixes: f49d9c5bb15c ("kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS")
Signed-off-by: Andrey Konovalov <[email protected]>
Reported-by: Sebastian Andrzej Siewior <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm,hwpoison: unmap poisoned page before invalidation

In some cases it appears the invalidation of a hwpoisoned page fails
because the page is still mapped in another process.  This can cause a
program to be continuously restarted and die when it page faults on the
page that was not invalidated.  Avoid that problem by unmapping the
hwpoisoned page when we find it.

Another issue is that sometimes we end up oopsing in finish_fault, if
the code tries to do something with the now-NULL vmf->page.  I did not
hit this error when submitting the previous patch because there are
several opportunities for alloc_set_pte to bail out before accessing
vmf->page, and that apparently happened on those systems, and most of
the time on other systems, too.

However, across several million systems that error does occur a handful
of times a day.  It can be avoided by returning VM_FAULT_NOPAGE which
will cause do_read_fault to return before calling finish_fault.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
Signed-off-by: Rik van Riel <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Tested-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mailmap: update Kirill's email

My new email address is [email protected].

Link: https://lkml.kernel.org/r/164846762354.278960.13129571556274098855.stgit@pro
Signed-off-by: Kirill Tkhai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: kfence: fix objcgs vector allocation

If the kfence object is allocated to be used for objects vector, then
this slot of the pool eventually being occupied permanently since the
vector is never freed. The solutions could be (1) freeing vector when
the kfence object is freed or (2) allocating all vectors statically.

Since the memory consumption of object vectors is low, it is better to
chose (2) to fix the issue and it is also can reduce overhead of vectors
allocating in the future.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB")
Signed-off-by: Muchun Song <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Xiongchun Duan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/munlock: protect the per-CPU pagevec by a local_lock_t

The access to mlock_pvec is protected by disabling preemption via
get_cpu_var() or implicit by having preemption disabled by the caller
(in mlock_page_drain() case). This breaks on PREEMPT_RT since
folio_lruvec_lock_irq() acquires a sleeping lock in this section.

Create struct mlock_pvec which consits of the local_lock_t and the
pagevec. Acquire the local_lock() before accessing the per-CPU pagevec.
Replace mlock_page_drain() with a _local() version which is invoked on
the local CPU and acquires the local_lock_t and a _remote() version
which uses the pagevec from a remote CPU which offline.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/munlock: update Documentation/vm/unevictable-lru.rst

Update Documentation/vm/unevictable-lru.rst to reflect the changes made
by the mm/munlock series: keeping an mlock_count instead of page_mlock()
(formerly try_to_munlock()) and munlock_vma_pages_all() etc. Also make
other little updates or cleanups wherever noticed.

But, I apologize, this is already out of date, in that "folio" appears
nowhere: 5.18 will be in a transitional state from "page" to "folio",
and documenting its current mix of the two does not help to understand
"the Unevictable LRU". Should be revisited when naming is more settled.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/munlock: add lru_add_drain() to fix memcg_stat_test

Mike reports that LTP memcg_stat_test usually leads to

  memcg_stat_test 3 TINFO: Test unevictable with MAP_LOCKED
  memcg_stat_test 3 TINFO: Running memcg_process --mmap-lock1 -s 135168
  memcg_stat_test 3 TINFO: Warming up pid: 3460
  memcg_stat_test 3 TINFO: Process is still here after warm up: 3460
  memcg_stat_test 3 TFAIL: unevictable is 122880, 135168 expected

but may also lead to

  memcg_stat_test 4 TINFO: Test unevictable with mlock
  memcg_stat_test 4 TINFO: Running memcg_process --mmap-lock2 -s 135168
  memcg_stat_test 4 TINFO: Warming up pid: 4271
  memcg_stat_test 4 TINFO: Process is still here after warm up: 4271
  memcg_stat_test 4 TFAIL: unevictable is 122880, 135168 expected

or both.  A wee bit flaky.

follow_page_pte() used to have an lru_add_drain() per each page mlocked,
and the test came to rely on accurate stats.  The pagevec to be drained
is different now, but still covered by lru_add_drain(); and, never mind
the test, I believe it's in everyone's interest that a bulk faulting
interface like populate_vma_page_range() or faultin_vma_page_range()
should drain its local pagevecs at the end, to save others sometimes
needing the much more expensive lru_add_drain_all().

This does not absolutely guarantee exact stats - the mlocking task can
be migrated between CPUs as it proceeds - but it's good enough and the
tests pass.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: b67bf49ce7aa ("mm/munlock: delete FOLL_MLOCK and FOLL_POPULATE")
Signed-off-by: Hugh Dickins <[email protected]>
Reported-by: Mike Galbraith <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

nilfs2: get rid of nilfs_mapping_init()

After applying the lockdep warning fixes, nilfs_mapping_init() is no
longer used, so delete it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hao Sun <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

nilfs2: fix lockdep warnings during disk space reclamation

During disk space reclamation, nilfs2 still emits the following lockdep
warning due to page/folio operations on shadowed page caches that nilfs2
uses to get a snapshot of DAT file in memory:

  WARNING: CPU: 0 PID: 2643 at include/linux/backing-dev.h:272 __folio_mark_dirty+0x645/0x670
  ...
  RIP: 0010:__folio_mark_dirty+0x645/0x670
  ...
  Call Trace:
    filemap_dirty_folio+0x74/0xd0
    __set_page_dirty_nobuffers+0x85/0xb0
    nilfs_copy_dirty_pages+0x288/0x510 [nilfs2]
    nilfs_mdt_save_to_shadow_map+0x50/0xe0 [nilfs2]
    nilfs_clean_segments+0xee/0x5d0 [nilfs2]
    nilfs_ioctl_clean_segments.isra.19+0xb08/0xf40 [nilfs2]
    nilfs_ioctl+0xc52/0xfb0 [nilfs2]
    __x64_sys_ioctl+0x11d/0x170

This fixes the remaining warning by using inode objects to hold those
page caches.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Tested-by: Ryusuke Konishi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hao Sun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

nilfs2: fix lockdep warnings in page operations for btree nodes

Patch series "nilfs2 lockdep warning fixes".

The first two are to resolve the lockdep warning issue, and the last one
is the accompanying cleanup and low priority.

Based on your comment, this series solves the issue by separating inode
object as needed.  Since I was worried about the impact of the object
composition changes, I tested the series carefully not to cause
regressions especially for delicate functions such like disk space
reclamation and snapshots.

This patch (of 3):

If CONFIG_LOCKDEP is enabled, nilfs2 hits lockdep warnings at
inode_to_wb() during page/folio operations for btree nodes:

  WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 inode_to_wb include/linux/backing-dev.h:269 [inline]
  WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 folio_account_dirtied mm/page-writeback.c:2460 [inline]
  WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 __folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509
  Modules linked in:
  ...
  RIP: 0010:inode_to_wb include/linux/backing-dev.h:269 [inline]
  RIP: 0010:folio_account_dirtied mm/page-writeback.c:2460 [inline]
  RIP: 0010:__folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509
  ...
  Call Trace:
    __set_page_dirty include/linux/pagemap.h:834 [inline]
    mark_buffer_dirty+0x4e6/0x650 fs/buffer.c:1145
    nilfs_btree_propagate_p fs/nilfs2/btree.c:1889 [inline]
    nilfs_btree_propagate+0x4ae/0xea0 fs/nilfs2/btree.c:2085
    nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337
    nilfs_collect_dat_data+0x45/0xd0 fs/nilfs2/segment.c:625
    nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1009
    nilfs_segctor_scan_file+0x47a/0x700 fs/nilfs2/segment.c:1048
    nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1224 [inline]
    nilfs_segctor_collect fs/nilfs2/segment.c:1494 [inline]
    nilfs_segctor_do_construct+0x14f3/0x6c60 fs/nilfs2/segment.c:2036
    nilfs_segctor_construct+0x7a7/0xb30 fs/nilfs2/segment.c:2372
    nilfs_segctor_thread_construct fs/nilfs2/segment.c:2480 [inline]
    nilfs_segctor_thread+0x3c3/0xf90 fs/nilfs2/segment.c:2563
    kthread+0x405/0x4f0 kernel/kthread.c:327
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

This is because nilfs2 uses two page caches for each inode and
inode->i_mapping never points to one of them, the btree node cache.

This causes inode_to_wb(inode) to refer to a different page cache than
the caller page/folio operations such like __folio_start_writeback(),
__folio_end_writeback(), or __folio_mark_dirty() acquired the lock.

This patch resolves the issue by allocating and using an additional
inode to hold the page cache of btree nodes.  The inode is attached
one-to-one to the traditional nilfs2 inode if it requires a block
mapping with b-tree.  This setup change is in memory only and does not
affect the disk format.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: Hao Sun <[email protected]>
Reported-by: David Hildenbrand <[email protected]>
Tested-by: Ryusuke Konishi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ocfs2: fix crash when mount with quota enabled

There is a reported crash when mounting ocfs2 with quota enabled.

  RIP: 0010:ocfs2_qinfo_lock_res_init+0x44/0x50 [ocfs2]
  Call Trace:
    ocfs2_local_read_info+0xb9/0x6f0 [ocfs2]
    dquot_load_quota_sb+0x216/0x470
    dquot_load_quota_inode+0x85/0x100
    ocfs2_enable_quotas+0xa0/0x1c0 [ocfs2]
    ocfs2_fill_super.cold+0xc8/0x1bf [ocfs2]
    mount_bdev+0x185/0x1b0
    legacy_get_tree+0x27/0x40
    vfs_get_tree+0x25/0xb0
    path_mount+0x465/0xac0
    __x64_sys_mount+0x103/0x140

It is caused by when initializing dqi_gqlock, the corresponding dqi_type
and dqi_sb are not properly initialized.

This issue is introduced by commit 6c85c2c72819, which wants to avoid
accessing uninitialized variables in error cases.  So make global quota
info properly initialized.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1007141
Fixes: 6c85c2c72819 ("ocfs2: quota_local: fix possible uninitialized-variable access in ocfs2_local_read_info()")
Signed-off-by: Joseph Qi <[email protected]>
Reported-by: Dayvison <[email protected]>
Tested-by: Valentin Vidic <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"

This reverts commit 08095d6310a7 ("mm: madvise: skip unmapped vma holes
passed to process_madvise") as process_madvise() fails to return the
exact processed bytes in other cases too.

As an example: if process_madvise() hits mlocked pages after processing
some initial bytes passed in [start, end), it just returns EINVAL
although some bytes are processed. Thus making an exception only for
ENOMEM is partially fixing the problem of returning the proper advised
bytes.

Thus revert this patch and return proper bytes advised.

Link: https://lkml.kernel.org/r/e73da1304a88b6a8a11907045117cccf4c2b8374.1648046642.git.quic_charante@quicinc.com
Fixes: 08095d6310a7ce ("mm: madvise: skip unmapped vma holes passed to process_madvise")
Signed-off-by: Charan Teja Kalla <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

btrfs: Remove a use of PAGE_SIZE in btrfs_invalidate_folio()

While btrfs doesn't use large folios yet, this should have been changed
as part of the conversion from invalidatepage to invalidate_folio.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>

ntfs: Correct mark_ntfs_record_dirty() folio conversion

We've already done the work of block_dirty_folio() here, leaving
only the work that needs to be done by filemap_dirty_folio().
This was a misconversion where I misread __set_page_dirty_nobuffers()
as __set_page_dirty_buffers().

Fixes: e621900ad28b ("fs: Convert __set_page_dirty_buffers to block_dirty_folio")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>

f2fs: Get the superblock from the mapping instead of the page

It's slightly more efficient to go directly from the mapping to the
superblock than to go from the page. Now that these routines have
the mapping passed to them, there's no reason not to use it.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>

f2fs: Correct f2fs_dirty_data_folio() conversion

I got the return value wrong. Very little checks the return value
from set_page_dirty(), so nobody noticed during testing.

Fixes: 4f5e34f71318 ("f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>

ext4: Correct ext4_journalled_dirty_folio() conversion

This should use the new folio_buffers() instead of page_has_buffers().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>

filemap: Remove AOP_FLAG_CONT_EXPAND

This flag is no longer used, so remove it.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>

fs: Pass an iocb to generic_perform_write()

We can extract both the file pointer and the pos from the iocb.
This simplifies each caller as well as allowing generic_perform_write()
to see more of the iocb contents in the future.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>

fs, net: Move read_descriptor_t to net.h

fs.h has no more need for this typedef; networking is now the sole user
of the read_descriptor_t.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>

fs: Remove read_actor_t

This typedef is not used any more.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>

iomap: Simplify is_partially_uptodate a little

Remove the unnecessary variable 'len' and fix a comment to refer to
the folio instead of the page.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>

readahead: Update comments

- Refer to folios where appropriate, not pages (Matthew Wilcox)
- Eliminate references to the internal PG_readhead
- Use "readahead" consistently - not "read-ahead" or "read ahead"
(mostly Neil Brown)
- Clarify some sections that, on reflection, weren't very clear (Neil
Brown)
- Minor punctuation/spelling fixes (Neil Brown)

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

mm: remove the skip_page argument to read_pages

The skip_page argument to read_pages controls if rac->_index is
incremented before returning from the function. Just open code that in
the callers.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

mm: remove the pages argument to read_pages

This is always an empty list or NULL with the removal of the ->readahead
support, so remove it.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

fs: Remove ->readpages address space operation

All filesystems have now been converted to use ->readahead, so
remove the ->readpages operation and fix all the comments that
used to refer to it.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>

readahead: Remove read_cache_pages()

With no remaining users, remove this function and the related
infrastructure.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>

Merge tag 'sound-fix-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Just a few fixes that have been gathered since the previous pull:

   - An additional fix for potential PCM deadlocks

   - A series of HD-audio CS8409 codec patches for new models

   - Other device specific fixes for HD-audio, ASoC mediatek, Intel,
     fsl, rockchip"

* tag 'sound-fix-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: pcm: Fix potential AB/BA lock with buffer_mutex and mmap_lock
  ALSA: hda: Avoid unsol event during RPM suspending
  ALSA: hda/realtek: Fix audio regression on Mi Notebook Pro 2020
  ALSA: hda/cs8409: Add new Dolphin HW variants
  ALSA: hda/cs8409: Disable HSBIAS_SENSE_EN for Cyborg
  ALSA: hda/cs8409: Support new Warlock MLK Variants
  ALSA: hda/cs8409: Fix Full Scale Volume setting for all variants
  ALSA: hda/cs8409: Re-order quirk table into ascending order
  ALSA: hda/cs8409: Fix Warlock to use mono mic configuration
  ALSA: cs4236: fix an incorrect NULL check on list iterator
  ALSA: hda/realtek: Enable headset mic on Lenovo P360
  ASoC: SOF: Intel: Fix build error without SND_SOC_SOF_PCI_DEV
  ALSA: hda/realtek: Add mute and micmut LED support for Zbook Fury 17 G9
  ASoC: rockchip: i2s_tdm: Fixup config for SND_SOC_DAIFMT_DSP_A/B
  ASoC: fsl-asoc-card: Fix jack_event() always return 0
  ASoC: mediatek: mt6358: add missing EXPORT_SYMBOLs

Merge tag 'gpio-fixes-for-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- grammar and formatting fixes in comments for gpio-ts4900

- correct links in gpio-ts5500

- fix a warning in doc generation for the core GPIO documentation

* tag 'gpio-fixes-for-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: ts5500: Fix Links to Technologic Systems web resources
  gpio: Properly document parent data union
  gpio: ts4900: Fix comment formatting and grammar

dm: fix bio polling to handle possibile BLK_STS_AGAIN

Expanded testing of DM's bio polling support (using more fio threads
to dm-linear ontop of null_blk) exposed the possibility for polled
bios to hang (repeatedly polling in io_uring) when null_blk responds
with BLK_STS_AGAIN (due to lack of resources):

1) io_complete_rw_iopoll() is called from blkdev_bio_end_io_async() to
   notify kiocb is done, that is the completion interface between block
   layer and io_uring

2) io_complete_rw_iopoll() is called from io_do_iopoll()

3) dm returns BLK_STS_AGAIN for one bio (on behalf of underlying
   driver), then io_complete_rw_iopoll is called, but io_do_iopoll()
   doesn't handle -EAGAIN at all (due to logic in io_rw_should_reissue)

4) reason for dm's BLK_STS_AGAIN is underlying null_blk driver ran out
   of requests (easier to reproduce by setting low hw_queue_depth).

5) dm should handle BLK_STS_AGAIN for POLLED underlying IO, and may
   retry in dm layer.

This fix adds REQ_POLLED specific BLK_STS_AGAIN handling to
dm_io_complete() that clears REQ_POLLED and requeues the bio to DM
using queue_io().

Fixes: b99fdcdc3636 ("dm: support bio polling")
Signed-off-by: Ming Lei <[email protected]>
[snitzer: revised header, reused dm_io_complete's REQ_POLLED case]
Signed-off-by: Mike Snitzer <[email protected]>

dm: fix dm_io and dm_target_io flags race condition on Alpha

Early alpha processors cannot write a single byte or short; they read 8
bytes, modify the value in registers and write back 8 bytes.

This could cause race condition in the structure dm_io - if the fields
flags and io_count are modified simultaneously.

Fix this bug by using 32-bit flags if we are on Alpha and if we are
compiling for a processor that doesn't have the byte-word-extension.

Signed-off-by: Mikulas Patocka <[email protected]>
Fixes: bd4a6dd241ae ("dm: reduce size of dm_io and dm_target_io structs")
[snitzer: Jens allowed this change since Mikulas owns a relevant Alpha!]
Acked-by: Jens Axboe <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:

- a revert of a patch resetting extra buttons on touchpads claiming to
   be buttonpads as this caused regression on certain Dell devices

- a new driver for Mediatek MT6779 keypad

- a new driver for Imagis touchscreen

- rework of Google/Chrome OS "Vivaldi" keyboard handling

- assorted driver fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (31 commits)
  Revert "Input: clear BTN_RIGHT/MIDDLE on buttonpads"
  Input: adi - remove redundant variable z
  Input: add Imagis touchscreen driver
  dt-bindings: input/touchscreen: bindings for Imagis
  Input: synaptics - enable InterTouch on ThinkPad T14/P14s Gen 1 AMD
  Input: stmfts - fix reference leak in stmfts_input_open
  Input: add bounds checking to input_set_capability()
  Input: iqs5xx - use local input_dev pointer
  HID: google: modify HID device groups of eel
  HID: google: Add support for vivaldi to hid-hammer
  HID: google: extract Vivaldi hid feature mapping for use in hid-hammer
  Input: extract ChromeOS vivaldi physmap show function
  HID: google: switch to devm when registering keyboard backlight LED
  Input: mt6779-keypad - fix signedness bug
  Input: mt6779-keypad - add MediaTek keypad driver
  dt-bindings: input: Add bindings for Mediatek matrix keypad
  Input: da9063 - use devm_delayed_work_autocancel()
  Input: goodix - fix race on driver unbind
  Input: goodix - use input_copy_abs() helper
  Input: add input_copy_abs() function
  ...

Merge tag 'rtc-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
"The bulk of the patches are about replacing the uie_unsupported struct
  rtc_device member by a feature bit.

  Subsystem:

   - remove uie_unsupported, all users have been converted to clear
     RTC_FEATURE_UPDATE_INTERRUPT and provide a reason

   - RTCs with an alarm with a resolution of a minute are now letting
     the core handle rounding down the alarm time

   - fix use-after-free on device removal

  New driver:

   - OP-TEE RTC PTA

  Drivers:

   - sun6i: Add H616 support

   - cmos: Fix the AltCentury for AMD platforms

   - spear: set range"

* tag 'rtc-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (56 commits)
  rtc: check if __rtc_read_time was successful
  rtc: gamecube: Fix refcount leak in gamecube_rtc_read_offset_from_sram
  rtc: mc146818-lib: Fix the AltCentury for AMD platforms
  rtc: optee: add RTC driver for OP-TEE RTC PTA
  rtc: pm8xxx: Return -ENODEV if set_time disallowed
  rtc: pm8xxx: Attach wake irq to device
  clk: sunxi-ng: sun6i-rtc: include clk/sunxi-ng.h
  rtc: remove uie_unsupported
  rtc: xgene: stop using uie_unsupported
  rtc: hym8563: switch to RTC_FEATURE_UPDATE_INTERRUPT
  rtc: hym8563: let the core handle the alarm resolution
  rtc: hym8563: switch to devm_rtc_allocate_device
  rtc: efi: switch to RTC_FEATURE_UPDATE_INTERRUPT
  rtc: efi: switch to devm_rtc_allocate_device
  rtc: add new RTC_FEATURE_ALARM_WAKEUP_ONLY feature
  rtc: spear: fix spear_rtc_read_time
  rtc: spear: drop uie_unsupported
  rtc: spear: set range
  rtc: spear: switch to devm_rtc_allocate_device
  rtc: pcf8563: switch to RTC_FEATURE_UPDATE_INTERRUPT
  ...

Merge branches 'fixes' and 'misc' into for-linus

Revert "um: clang: Strip out -mno-global-merge from USER_CFLAGS"

This reverts commit 6580c5c18fb3df2b11c5e0452372f815deeff895.

This patch is buggy, as noted in the patch linked below. The root cause
has been solved by removing '-mno-global-merge' for the entire kernel.

Link: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: David Gow <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: Remove '-mno-global-merge'

This flag is specific to clang, where it is only used by the 32-bit and
64-bit ARM backends. In certain situations, the presence of this flag
will cause a warning, as shown by commit 6580c5c18fb3 ("um: clang: Strip
out -mno-global-merge from USER_CFLAGS").

Since commit 61163efae020 ("kbuild: LLVMLinux: Add Kbuild support for
building kernel with Clang") that added this flag back in 2014, there
have been quite a few changes to the GlobalMerge pass in LLVM. Building
several different ARCH=arm and ARCH=arm64 configurations with LLVM 11
(minimum) and 15 (current main version) with this flag removed (i.e.,
with the default of '-mglobal-merge') reveals no modpost warnings, so it
is likely that the issue noted in the comment is no longer relevant due
to changes in LLVM or modpost, meaning this flag can be removed.

If any new warnings show up that are a result of the removal of this
flag, it can be added back under arch/arm{,64}/Makefile to avoid
warnings on other architectures.

Signed-off-by: Nathan Chancellor <[email protected]>
Tested-by: David Gow <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Reviewed-by: Sedat Dilek <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: fix empty ${PYTHON} in scripts/link-vmlinux.sh

The two commits

d8d2d38275c1 ("kbuild: remove PYTHON variable")
a8cccdd95473 ("init: lto: ensure initcall ordering")

were applied in the same development cycle, into two different trees.

After they were merged together, this ${PYTHON} expands to an empty
string.

Therefore, ${srctree}/scripts/jobserver-exec is executed directly.
(it has the executable bit set)

This is working but let's fix the code into the intended form.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Sedat Dilek <[email protected]>

kconfig: remove stale comment about removed kconfig_print_symbol()

This comment is about kconfig_print_symbol(), which was removed by
commit 6ce45a91a982 ("kconfig: refactor conf_write_symbol()").

Signed-off-by: Masahiro Yamada <[email protected]>

dm integrity: set journal entry unused when shrinking device

Commit f6f72f32c22c ("dm integrity: don't replay journal data past the
end of the device") skips journal replay if the target sector points
beyond the end of the device. Unfortunatelly, it doesn't set the
journal entry unused, which resulted in this BUG being triggered:
BUG_ON(!journal_entry_is_unused(je))

Fix this by calling journal_entry_set_unused() for this case.

Fixes: f6f72f32c22c ("dm integrity: don't replay journal data past the end of the device")
Cc: [email protected] # v5.7+
Signed-off-by: Mikulas Patocka <[email protected]>
Tested-by: Milan Broz <[email protected]>
[snitzer: revised header]
Signed-off-by: Mike Snitzer <[email protected]>

dm ioctl: log an error if the ioctl structure is corrupted

This will help triage bugs when userspace is passing invalid ioctl
structure to the kernel.

Signed-off-by: Mikulas Patocka <[email protected]>
[snitzer: log errors using DMERR instead of DMWARN]
Signed-off-by: Mike Snitzer <[email protected]>

ARM: 9191/1: arm/stacktrace, kasan: Silence KASAN warnings in unwind_frame()

The following KASAN warning is detected by QEMU.

==================================================================
BUG: KASAN: stack-out-of-bounds in unwind_frame+0x508/0x870
Read of size 4 at addr c36bba90 by task cat/163

CPU: 1 PID: 163 Comm: cat Not tainted 5.10.0-rc1 #40
Hardware name: ARM-Versatile Express
[<c0113fac>] (unwind_backtrace) from [<c010e71c>] (show_stack+0x10/0x14)
[<c010e71c>] (show_stack) from [<c0b805b4>] (dump_stack+0x98/0xb0)
[<c0b805b4>] (dump_stack) from [<c0b7d658>] (print_address_description.constprop.0+0x58/0x4bc)
[<c0b7d658>] (print_address_description.constprop.0) from [<c031435c>] (kasan_report+0x154/0x170)
[<c031435c>] (kasan_report) from [<c0113c44>] (unwind_frame+0x508/0x870)
[<c0113c44>] (unwind_frame) from [<c010e298>] (__save_stack_trace+0x110/0x134)
[<c010e298>] (__save_stack_trace) from [<c01ce0d8>] (stack_trace_save+0x8c/0xb4)
[<c01ce0d8>] (stack_trace_save) from [<c0313520>] (kasan_set_track+0x38/0x60)
[<c0313520>] (kasan_set_track) from [<c0314cb8>] (kasan_set_free_info+0x20/0x2c)
[<c0314cb8>] (kasan_set_free_info) from [<c0313474>] (__kasan_slab_free+0xec/0x120)
[<c0313474>] (__kasan_slab_free) from [<c0311e20>] (kmem_cache_free+0x7c/0x334)
[<c0311e20>] (kmem_cache_free) from [<c01c35dc>] (rcu_core+0x390/0xccc)
[<c01c35dc>] (rcu_core) from [<c01013a8>] (__do_softirq+0x180/0x518)
[<c01013a8>] (__do_softirq) from [<c0135214>] (irq_exit+0x9c/0xe0)
[<c0135214>] (irq_exit) from [<c01a40e4>] (__handle_domain_irq+0xb0/0x110)
[<c01a40e4>] (__handle_domain_irq) from [<c0691248>] (gic_handle_irq+0xa0/0xb8)
[<c0691248>] (gic_handle_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x94)
Exception stack(0xc36bb928 to 0xc36bb970)
b920: c36bb9c0 00000000 c0126919 c0101228 c36bb9c0 b76d7730
b940: c36b8000 c36bb9a0 c3335b00 c01ce0d8 00000003 c36bba3c c36bb940 c36bb978
b960: c010e298 c011373c 60000013 ffffffff
[<c0100b0c>] (__irq_svc) from [<c011373c>] (unwind_frame+0x0/0x870)
[<c011373c>] (unwind_frame) from [<00000000>] (0x0)

The buggy address belongs to the page:
page:(ptrval) refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0x636bb
flags: 0x0()
raw: 00000000 00000000 ef867764 00000000 00000000 00000000 ffffffff 00000000
page dumped because: kasan: bad access detected

addr c36bba90 is located in stack of task cat/163 at offset 48 in frame:
stack_trace_save+0x0/0xb4

this frame has 1 object:
[32, 48) 'trace'

Memory state around the buggy address:
c36bb980: f1 f1 f1 f1 00 04 f2 f2 00 00 f3 f3 00 00 00 00
c36bba00: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
>c36bba80: 00 00 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
^
c36bbb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c36bbb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

There is a same issue on x86 and has been resolved by the commit f7d27c35ddff
("x86/mm, kasan: Silence KASAN warnings in get_wchan()").
The solution could be applied to arm architecture too.

Signed-off-by: Lin Yujun <[email protected]>
Reported-by: He Ying <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>

ARM: 9190/1: kdump: add invalid input check for 'crashkernel=0'

Add invalid input check expression when 'crashkernel=0' is specified
running kdump.

Signed-off-by: Austin Kim <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>

MIPS: crypto: Fix CRC32 code

Commit 67512a8cf5a7 ("MIPS: Avoid macro redefinitions") changed how the
MIPS register macros were defined, in order to allow the code to compile
under LLVM/Clang.

The MIPS CRC32 code however wasn't updated accordingly, causing a build
bug when using a MIPS32r6 toolchain without CRC support.

Update the CRC32 code to use the macros correctly, to fix the build
failures.

Fixes: 67512a8cf5a7 ("MIPS: Avoid macro redefinitions")
Cc: <[email protected]>
Signed-off-by: Paul Cercueil <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Thomas Bogendoerfer <[email protected]>

dma-mapping: move pgprot_decrypted out of dma_pgprot

pgprot_decrypted is used by AMD SME systems to allow access to memory
that was set to not encrypted using set_memory_decrypted. That only
happens for dma-direct memory as the IOMMU solves the addressing
challenges for the encryption bit using its own remapping.

Move the pgprot_decrypted call out of dma_pgprot which is also used
by the IOMMU mappings and into dma-direct so that it is only used with
memory that was set decrypted.

Fixes: f5ff79fddf0e ("dma-mapping: remove CONFIG_DMA_REMAP")
Reported-by: Alex Xu (Hello71) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Alex Xu (Hello71) <[email protected]>

Revert "Input: clear BTN_RIGHT/MIDDLE on buttonpads"

This reverts commit 37ef4c19b4c659926ce65a7ac709ceaefb211c40.

The touchpad present in the Dell Precision 7550 and 7750 laptops
reports a HID_DG_BUTTONTYPE of type MT_BUTTONTYPE_CLICKPAD. However,
the device is not a clickpad, it is a touchpad with physical buttons.

In order to fix this issue, a quirk for the device was introduced in
libinput [1] [2] to disable the INPUT_PROP_BUTTONPAD property:

[Precision 7x50 Touchpad]
MatchBus=i2c
MatchUdevType=touchpad
MatchDMIModalias=dmi:*svnDellInc.:pnPrecision7?50*
AttrInputPropDisable=INPUT_PROP_BUTTONPAD

However, because of the change introduced in 37ef4c19b4 ("Input: clear
BTN_RIGHT/MIDDLE on buttonpads") the BTN_RIGHT key bit is not mapped
anymore breaking the device right click button and making impossible to
workaround it in user space.

In order to avoid breakage on other present or future devices, revert
the patch causing the issue.

Signed-off-by: José Expósito <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
Acked-by: Peter Hutterer <[email protected]>
Acked-by: Benjamin Tissoires <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>

exfat: do not clear VolumeDirty in writeback

Before this commit, VolumeDirty will be cleared first in
writeback if 'dirsync' or 'sync' is not enabled. If the power
is suddenly cut off after cleaning VolumeDirty but other
updates are not written, the exFAT filesystem will not be able
to detect the power failure in the next mount.

And VolumeDirty will be set again but not cleared when updating
the parent directory. It means that BootSector will be written at
least once in each write-back, which will shorten the life of the
device.

Reviewed-by: Andy Wu <[email protected]>
Reviewed-by: Aoyama Wataru <[email protected]>
Signed-off-by: Yuezhang Mo <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>

exfat: allow access to paths with trailing dots

The Linux kernel exfat driver currently unconditionally strips
trailing periods '.' from path components. This isdone intentionally,
loosely following Windows behaviour and specifications
which state:

  #exFAT
  The concatenated file name has the same set of illegal characters as
  other FAT-based file systems (see Table 31).

  #FAT
  ...
  Leading and trailing spaces in a long name are ignored.
  Leading and embedded periods are allowed in a name and are stored in
  the long name. Trailing periods are ignored.

Note: Leading and trailing space ' ' characters are currently retained
by Linux kernel exfat, in conflict with the above specification.
On Windows 10, trailing and leading space ' ' characters are stripped
from the filenames.
Some implementations, such as fuse-exfat, don't perform path trailer
removal. When mounting images which contain trailing-dot paths, these
paths are unreachable, e.g.:

  + mount.exfat-fuse /dev/zram0 /mnt/test/
  FUSE exfat 1.3.0
  + cd /mnt/test/
  + touch fuse_created_dots... '  fuse_created_spaces  '
  + ls -l
  total 0
  -rwxrwxrwx 1 root 0 0 Aug 18 09:45 '  fuse_created_spaces  '
  -rwxrwxrwx 1 root 0 0 Aug 18 09:45  fuse_created_dots...
  + cd /
  + umount /mnt/test/
  + mount -t exfat /dev/zram0 /mnt/test
  + cd /mnt/test
  + ls -l
  ls: cannot access 'fuse_created_dots...': No such file or directory
  total 0
  -rwxr-xr-x 1 root 0 0 Aug 18 09:45 '  fuse_created_spaces  '
  -????????? ? ?    ? ?            ?  fuse_created_dots...
  + touch kexfat_created_dots... '  kexfat_created_spaces  '
  + ls -l
  ls: cannot access 'fuse_created_dots...': No such file or directory
  total 0
  -rwxr-xr-x 1 root 0 0 Aug 18 09:45 '  fuse_created_spaces  '
  -rwxr-xr-x 1 root 0 0 Aug 18 09:45 '  kexfat_created_spaces  '
  -????????? ? ?    ? ?            ?  fuse_created_dots...
  -rwxr-xr-x 1 root 0 0 Aug 18 09:45  kexfat_created_dots
  + cd /
  + umount /mnt/test/

This commit adds "keep_last_dots" mount option that controls whether or
not trailing periods '.' are stripped
from path components during file lookup or file creation.
This mount option can be used to access
paths with trailing periods and disallow creating files with names with
trailing periods. E.g. continuing from the previous example:

  + mount -t exfat -o keep_last_dots /dev/zram0 /mnt/test
  + cd /mnt/test
  + ls -l
  total 0
  -rwxr-xr-x 1 root 0 0 Aug 18 10:32 '  fuse_created_spaces  '
  -rwxr-xr-x 1 root 0 0 Aug 18 10:32 '  kexfat_created_spaces  '
  -rwxr-xr-x 1 root 0 0 Aug 18 10:32  fuse_created_dots...
  -rwxr-xr-x 1 root 0 0 Aug 18 10:32  kexfat_created_dots

  + echo > kexfat_created_dots_again...
  sh: kexfat_created_dots_again...: Invalid argument

Link: https://bugzilla.suse.com/show_bug.cgi?id=1188964
Link: https://lore.kernel.org/linux-fsdevel/003b01d755e4$31fb0d80$95f12880$
@samsung.com/
Link: https://docs.microsoft.com/en-us/windows/win32/fileio/exfat-specification
Suggested-by: Takashi Iwai <[email protected]>
Signed-off-by: Vasant Karasulli <[email protected]>
Co-developed-by: David Disseldorp <[email protected]>
Signed-off-by: David Disseldorp <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>

RISC-V: K210 defconfigs: Drop redundant MEMBARRIER=n

As of 93917ad50972 ("RISC-V: Add support for restartable sequence") we
have support for restartable sequences, which default to enabled. These
select MEMBARRIER, so disabling it is now redundant.

Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: defconfig: Drop redundant SBI HVC and earlycon

As of 3938d5a2f936 ("riscv: default to CONFIG_RISCV_SBI_V01=n") we no
longer default to enabling SBI-0.1 support, so these dependent configs
no longer have any effect. Remove them to avoid clutter.

Signed-off-by: Palmer Dabbelt <[email protected]>

platform/chrome: cros_ec_debugfs: detach log reader wq from devm

Debugfs console_log uses devm memory (e.g. debug_info in
cros_ec_console_log_poll()).  However, lifecycles of device and debugfs
are independent.  An use-after-free issue is observed if userland
program operates the debugfs after the memory has been freed.

The call trace:
do_raw_spin_lock
_raw_spin_lock_irqsave
remove_wait_queue
ep_unregister_pollwait
ep_remove
do_epoll_ctl

A Python example to reproduce the issue:
... import select
... p = select.epoll()
... f = open('/sys/kernel/debug/cros_scp/console_log')
... p.register(f, select.POLLIN)
... p.poll(1)
[(4, 1)]                    # 4=fd, 1=select.POLLIN

[ shutdown cros_scp at the point ]

... p.poll(1)
[(4, 16)]                   # 4=fd, 16=select.POLLHUP
... p.unregister(f)

An use-after-free issue raises here.  It called epoll_ctl with
EPOLL_CTL_DEL which in turn to use the workqueue in the devm (i.e.
log_wq).

Detaches log reader's workqueue from devm to make sure it is persistent
even if the device has been removed.

Signed-off-by: Tzung-Bi Shih <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Benson Leung <[email protected]>