Git Repo - linux.git/log

Merge tag 'm68k-for-v5.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k

Pull m68k updates from Geert Uytterhoeven:

  - Conversion of the Mac IDE driver to a platform driver

  - Minor cleanups and fixes

* tag 'm68k-for-v5.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  ide/macide: Convert Mac IDE driver to platform driver
  m68k: Replace HTTP links with HTTPS ones
  m68k: mm: Remove superfluous memblock_alloc*() casts
  m68k: mm: Use PAGE_ALIGNED() helper
  m68k: Sort selects in main Kconfig
  m68k: amiga: Clean up Amiga hardware configuration
  m68k: Revive _TIF_* masks
  m68k: Correct some typos in comments
  m68k: Use get_kernel_nofault() in show_registers()
  zorro: Fix address space collision message with RAM expansion boards
  m68k: amiga: Fix Denise detection on OCS

Merge tag 'microblaze-v5.10' of git://git.monstr.eu/linux-2.6-microblaze

Pull Microblaze build warning fix from Michal Simek.

* tag 'microblaze-v5.10' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: fix kbuild redundant file warning

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
"There's quite a lot of code here, but much of it is due to the
  addition of a new PMU driver as well as some arm64-specific selftests
  which is an area where we've traditionally been lagging a bit.

  In terms of exciting features, this includes support for the Memory
  Tagging Extension which narrowly missed 5.9, hopefully allowing
  userspace to run with use-after-free detection in production on CPUs
  that support it. Work is ongoing to integrate the feature with KASAN
  for 5.11.

  Another change that I'm excited about (assuming they get the hardware
  right) is preparing the ASID allocator for sharing the CPU page-table
  with the SMMU. Those changes will also come in via Joerg with the
  IOMMU pull.

  We do stray outside of our usual directories in a few places, mostly
  due to core changes required by MTE. Although much of this has been
  Acked, there were a couple of places where we unfortunately didn't get
  any review feedback.

  Other than that, we ran into a handful of minor conflicts in -next,
  but nothing that should post any issues.

  Summary:

   - Userspace support for the Memory Tagging Extension introduced by
     Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.

   - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
     switching.

   - Fix and subsequent rewrite of our Spectre mitigations, including
     the addition of support for PR_SPEC_DISABLE_NOEXEC.

   - Support for the Armv8.3 Pointer Authentication enhancements.

   - Support for ASID pinning, which is required when sharing
     page-tables with the SMMU.

   - MM updates, including treating flush_tlb_fix_spurious_fault() as a
     no-op.

   - Perf/PMU driver updates, including addition of the ARM CMN PMU
     driver and also support to handle CPU PMU IRQs as NMIs.

   - Allow prefetchable PCI BARs to be exposed to userspace using normal
     non-cacheable mappings.

   - Implementation of ARCH_STACKWALK for unwinding.

   - Improve reporting of unexpected kernel traps due to BPF JIT
     failure.

   - Improve robustness of user-visible HWCAP strings and their
     corresponding numerical constants.

   - Removal of TEXT_OFFSET.

   - Removal of some unused functions, parameters and prototypes.

   - Removal of MPIDR-based topology detection in favour of firmware
     description.

   - Cleanups to handling of SVE and FPSIMD register state in
     preparation for potential future optimisation of handling across
     syscalls.

   - Cleanups to the SDEI driver in preparation for support in KVM.

   - Miscellaneous cleanups and refactoring work"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
  Revert "arm64: initialize per-cpu offsets earlier"
  arm64: random: Remove no longer needed prototypes
  arm64: initialize per-cpu offsets earlier
  kselftest/arm64: Check mte tagged user address in kernel
  kselftest/arm64: Verify KSM page merge for MTE pages
  kselftest/arm64: Verify all different mmap MTE options
  kselftest/arm64: Check forked child mte memory accessibility
  kselftest/arm64: Verify mte tag inclusion via prctl
  kselftest/arm64: Add utilities and a test to validate mte memory
  perf: arm-cmn: Fix conversion specifiers for node type
  perf: arm-cmn: Fix unsigned comparison to less than zero
  arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
  arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
  arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
  arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
  KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
  arm64: Get rid of arm64_ssbd_state
  KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
  KVM: arm64: Get rid of kvm_arm_have_ssbd()
  KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
  ...

Merge tag 'tpmdd-next-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm updates from Jarkko Sakkinen:
"Support for a new TPM device and fixes and Git URL change (infraded ->
  korg)"

* tag 'tpmdd-next-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  MAINTAINERS: TPM DEVICE DRIVER: Update GIT
  tpm_tis: Add a check for invalid status
  tpm: use %*ph to print small buffer
  dt-bindings: Add SynQucer TPM MMIO as a trivial device
  tpm: tis: add support for MMIO TPM on SynQuacer

Merge branch 'efi/urgent' into efi/core, to pick up fixes

These fixes missed the v5.9 merge window, pick them up for early v5.10 merge.

Signed-off-by: Ingo Molnar <[email protected]>

perf/core: Fix race in the perf_mmap_close() function

There's a possible race in perf_mmap_close() when checking ring buffer's
mmap_count refcount value. The problem is that the mmap_count check is
not atomic because we call atomic_dec() and atomic_read() separately.

  perf_mmap_close:
  ...
   atomic_dec(&rb->mmap_count);
   ...
   if (atomic_read(&rb->mmap_count))
      goto out_put;

   <ring buffer detach>
   free_uid

out_put:
  ring_buffer_put(rb); /* could be last */

The race can happen when we have two (or more) events sharing same ring
buffer and they go through atomic_dec() and then they both see 0 as refcount
value later in atomic_read(). Then both will go on and execute code which
is meant to be run just once.

The code that detaches ring buffer is probably fine to be executed more
than once, but the problem is in calling free_uid(), which will later on
demonstrate in related crashes and refcount warnings, like:

  refcount_t: addition on 0; use-after-free.
  ...
  RIP: 0010:refcount_warn_saturate+0x6d/0xf
  ...
  Call Trace:
  prepare_creds+0x190/0x1e0
  copy_creds+0x35/0x172
  copy_process+0x471/0x1a80
  _do_fork+0x83/0x3a0
  __do_sys_wait4+0x83/0x90
  __do_sys_clone+0x85/0xa0
  do_syscall_64+0x5b/0x1e0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Using atomic decrease and check instead of separated calls.

Tested-by: Michael Petlan <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Acked-by: Wade Mealing <[email protected]>
Fixes: 9bb5d40cd93c ("perf: Fix mmap() accounting hole");
Link: https://lore.kernel.org/r/20200916115311.GE2301783@krava

Merge branch 'edac-drivers' into edac-updates-for-v5.10

Signed-off-by: Borislav Petkov <[email protected]>

Linux 5.9

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"Five fixes.

  Subsystems affected by this patch series: MAINTAINERS, mm/pagemap,
  mm/swap, and mm/hugetlb"

* emailed patches from Andrew Morton <[email protected]>:
  mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
  mm: validate inode in mapping_set_error()
  mm: mmap: Fix general protection fault in unlink_file_vma()
  MAINTAINERS: Antoine Tenart's email address
  MAINTAINERS: change hardening mailing list

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
"Fixes an obvious bug (memory leak introduced in 5.8)"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
pipe: Fix memory leaks in create_pipe_files()

Merge tag 'x86-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"Two fixes:

   - Fix a (hopefully final) IRQ state tracking bug vs MCE handling

   - Fix a documentation link"

* tag 'x86-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation/x86: Fix incorrect references to zero-page.txt
  x86/mce: Use idtentry_nmi_enter/exit()

Merge tag 'irqchip-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Pull irqchip updates from Marc Zyngier:

Core changes:

  - Allow irq retriggering to follow a hierarchy
  - Allow interrupt hierarchies to be trimmed at allocation time
  - Allow interrupts to be hidden from /proc/interrupts (IPIs)
  - Introduce stub for set_handle_irq() when !GENERIC_IRQ_MULTI_HANDLER
  - New per-cpu IPI handling flow

Architecture changes:
  - Move arm/arm64 IPI handling to the core interrupt code, removing
    the home brewed accounting

Driver updates:
- New driver for the MStar (and more recently Mediatek) platforms
- New driver for the Actions Owl SIRQ controller
- New driver for the TI PRUSS infrastructure
- Wake-up support for the Qualcomm PDC controller
- Primary interrupt controller support for the Designware APB ICTL
- Convert the IPI code for GIC, GICv3, hip04, armada-270-xp and bcm2836
   to using standard interrupts
- Improve GICv3 pseudo-NMI support to deal with both non-secure and secure
   priorities on arm64
- Convert the GIC/GICv3 drivers to using HW-based irq retrigger
- A sprinkling of dev_err_probe() conversion
- A set of NVIDIA Tegra fixes for interrupt hierarchy corruption
- A reset fix for the Loongson HTVEC driver
- A couple of error handling fixes in the TI SCI drivers

Merge tag 'perf-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Ingo Molnar:
"Fix an error handling bug that can cause a lockup if a CPU is offline
(doh ...)"

* tag 'perf-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix task_function_call() error handling

mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged

When memory is hotplug added or removed the min_free_kbytes should be
recalculated based on what is expected by khugepaged. Currently after
hotplug, min_free_kbytes will be set to a lower default and higher
default set when THP enabled is lost.

This change restores min_free_kbytes as expected for THP consumers.

[[email protected]: v5]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: f000565adb77 ("thp: set recommended min free kbytes")
Signed-off-by: Vijay Balakrishna <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Allen Pais <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Song Liu <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: validate inode in mapping_set_error()

The swap address_space doesn't have host. Thus, it makes kernel crash once
swap write meets error. Fix it.

Fixes: 735e4ae5ba28 ("vfs: track per-sb writeback errors and report them to syncfs")
Signed-off-by: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Jeff Layton <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: David Howells <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: mmap: Fix general protection fault in unlink_file_vma()

The syzbot reported the below general protection fault:

  general protection fault, probably for non-canonical address
  0xe00eeaee0000003b: 0000 [#1] PREEMPT SMP KASAN
  KASAN: maybe wild-memory-access in range [0x00777770000001d8-0x00777770000001df]
  CPU: 1 PID: 10488 Comm: syz-executor721 Not tainted 5.9.0-rc3-syzkaller #0
  RIP: 0010:unlink_file_vma+0x57/0xb0 mm/mmap.c:164
  Call Trace:
     free_pgtables+0x1b3/0x2f0 mm/memory.c:415
     exit_mmap+0x2c0/0x530 mm/mmap.c:3184
     __mmput+0x122/0x470 kernel/fork.c:1076
     mmput+0x53/0x60 kernel/fork.c:1097
     exit_mm kernel/exit.c:483 [inline]
     do_exit+0xa8b/0x29f0 kernel/exit.c:793
     do_group_exit+0x125/0x310 kernel/exit.c:903
     get_signal+0x428/0x1f00 kernel/signal.c:2757
     arch_do_signal+0x82/0x2520 arch/x86/kernel/signal.c:811
     exit_to_user_mode_loop kernel/entry/common.c:136 [inline]
     exit_to_user_mode_prepare+0x1ae/0x200 kernel/entry/common.c:167
     syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:242
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

It's because the ->mmap() callback can change vma->vm_file and fput the
original file.  But the commit d70cec898324 ("mm: mmap: merge vma after
call_mmap() if possible") failed to catch this case and always fput()
the original file, hence add an extra fput().

[ Thanks Hillf for pointing this extra fput() out. ]

Fixes: d70cec898324 ("mm: mmap: merge vma after call_mmap() if possible")
Reported-by: [email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Christian König <[email protected]>
Cc: Hongxiang Lou <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: John Hubbard <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: Antoine Tenart's email address

Use my kernel.org address instead of my bootlin.com one.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: change hardening mailing list

As more email from git history gets aimed at the OpenWall
kernel-hardening@ list, there has been a desire to separate "new topics"
from "on-going" work.

To handle this, the superset of hardening email topics are now to be
directed to [email protected].

Update the MAINTAINERS file and the .mailmap to accomplish this, so that
linux-hardening@ can be treated like any other regular upstream kernel
development list.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Emese Revfy <[email protected]>
Cc: "Tobin C. Harding" <[email protected]>
Cc: Tycho Andersen <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Link: https://lore.kernel.org/linux-hardening/202010051443.279CC265D@keescook/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Some more driver bugfixes for I2C. Including a revert - the updated
  series for it will come during the next merge window"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: owl: Clear NACK and BUS error bits
  Revert "i2c: imx: Fix reset of I2SR_IAL flag"
  i2c: meson: fixup rate calculation with filter delay
  i2c: meson: keep peripheral clock enabled
  i2c: meson: fix clock setting overwrite
  i2c: imx: Fix reset of I2SR_IAL flag

cifs: Fix incomplete memory allocation on setxattr path

On setxattr() syscall path due to an apprent typo the size of a dynamically
allocated memory chunk for storing struct smb2_file_full_ea_info object is
computed incorrectly, to be more precise the first addend is the size of
a pointer instead of the wanted object size. Coincidentally it makes no
difference on 64-bit platforms, however on 32-bit targets the following
memcpy() writes 4 bytes of data outside of the dynamically allocated memory.

  =============================================================================
  BUG kmalloc-16 (Not tainted): Redzone overwritten
  -----------------------------------------------------------------------------

  Disabling lock debugging due to kernel taint
  INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc
  INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201
  INFO: Object 0x6f171df3 @offset=352 fp=0x00000000

  Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
  Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69  ........snrub.fi
  Redzone 79e69a6f: 73 68 32 0a                                      sh2.
  Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
  CPU: 0 PID: 8196 Comm: attr Tainted: G    B             5.9.0-rc8+ #3
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
  Call Trace:
   dump_stack+0x54/0x6e
   print_trailer+0x12c/0x134
   check_bytes_and_report.cold+0x3e/0x69
   check_object+0x18c/0x250
   free_debug_processing+0xfe/0x230
   __slab_free+0x1c0/0x300
   kfree+0x1d3/0x220
   smb2_set_ea+0x27d/0x540
   cifs_xattr_set+0x57f/0x620
   __vfs_setxattr+0x4e/0x60
   __vfs_setxattr_noperm+0x4e/0x100
   __vfs_setxattr_locked+0xae/0xd0
   vfs_setxattr+0x4e/0xe0
   setxattr+0x12c/0x1a0
   path_setxattr+0xa4/0xc0
   __ia32_sys_lsetxattr+0x1d/0x20
   __do_fast_syscall_32+0x40/0x70
   do_fast_syscall_32+0x29/0x60
   do_SYSENTER_32+0x15/0x20
   entry_SYSENTER_32+0x9f/0xf2

Fixes: 5517554e4313 ("cifs: Add support for writing attributes on SMB2+")
Signed-off-by: Vladimir Zapolskiy <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/khugepaged: fix filemap page_to_pgoff(page) != offset

There have been elusive reports of filemap_fault() hitting its
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
with CONFIG_READ_ONLY_THP_FOR_FS=y.

Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
without NUMA reuses the same huge page after collapse_file() failed
(whereas NUMA targets its allocation to the respective node each time).
And most of us were usually testing with CONFIG_NUMA=y kernels.

collapse_file(old start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=old offset
  new_page->mapping = mapping
  xas_store(&xas, new_page)

                          filemap_fault
                            page = find_get_page(mapping, offset)
                            // if offset falls inside hpage then
                            // compound_head(page) == hpage
                            lock_page_maybe_drop_mmap()
                              __lock_page(page)

  // collapse fails
  xas_store(&xas, old page)
  new_page->mapping = NULL
  unlock_page(new_page)

collapse_file(new start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=new offset
  new_page->mapping = mapping // mapping becomes valid again

                            // since compound_head(page) == hpage
                            // page_to_pgoff(page) got changed
                            VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)

An initial patch replaced __SetPageLocked() by lock_page(), which did
fix the race which Suren illustrates above.  But testing showed that it's
not good enough: if the racing task's __lock_page() gets delayed long
after its find_get_page(), then it may follow collapse_file(new start)'s
successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.

It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
check and retry (as is done for mapping), with similar relaxations in
find_lock_entry() and pagecache_get_page(): but it's not obvious what
else might get caught out; and khugepaged non-NUMA appears to be unique
in exposing a page to page cache, then revoking, without going through
a full cycle of freeing before reuse.

Instead, non-NUMA khugepaged_prealloc_page() release the old page
if anyone else has a reference to it (1% of cases when I tested).

Although never reported on huge tmpfs, I believe its find_lock_entry()
has been at similar risk; but huge tmpfs does not rely on khugepaged
for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.

Reported-by: Denis Lisov <[email protected]>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
Reported-by: Qian Cai <[email protected]>
Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
Reported-and-analyzed-by: Suren Baghdasaryan <[email protected]>
Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: [email protected] # v4.9+
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

io_uring: keep a pointer ref_node in file_data

->cur_refs of struct fixed_file_data always points to percpu_ref
embedded into struct fixed_file_ref_node. Don't overuse container_of()
and offsetting, and point directly to fixed_file_ref_node.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: refactor *files_register()'s error paths

Don't keep repeating cleaning sequences in error paths, write it once
in the and use labels. It's less error prone and looks cleaner.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: clean file_data access in files_register

Keep file_data in a local var and replace with it complex references
such as ctx->file_data.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: don't delay io_init_req() error check

Don't postpone io_init_req() error checks and do that right after
calling it. There is no control-flow statements or dependencies with
sqe/submitted accounting, so do those earlier, that makes the code flow
a bit more natural.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: clean leftovers after splitting issue

Kill extra if in io_issue_sqe() and place send/recv[msg] calls
appropriately under switch's cases.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: remove timeout.list after hrtimer cancel

Remove timeouts from ctx->timeout_list after hrtimer_try_to_cancel()
successfully cancels it. With this we don't need to care whether there
was a race and it was removed in io_timeout_fn(), and that will be handy
for following patches.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: use a separate struct for timeout_remove

Don't use struct io_timeout for both IORING_OP_TIMEOUT and
IORING_OP_TIMEOUT_REMOVE, they're quite different. Split them in two,
that allows to remove an unused field in struct io_timeout, and btw kill
->flags not used by either. This also easier to follow, especially for
timeout remove.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: improve submit_state.ios_left accounting

state->ios_left isn't decremented for requests that don't need a file,
so it might be larger than number of SQEs left. That in some
circumstances makes us to grab more files that is needed so imposing
extra put.
Deaccount one ios_left for each request.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: simplify io_file_get()

Keep ->needs_file_no_error check out of io_file_get(), and let callers
handle it. It makes it more straightforward. Also, as the only error it
can hand back -EBADF, make it return a file or NULL.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: kill extra check in fixed io_file_get()

ctx->nr_user_files == 0 IFF ctx->file_data == NULL and there fixed files
are not used. Hence, verifying fds only against ctx->nr_user_files is
enough. Remove the other check from hot path.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: clean up ->files grabbing

Move work.files grabbing into io_prep_async_work() to all other work
resources initialisation. We don't need to keep it separately now, as
->ring_fd/file are gone. It also allows to not grab it when a request
is not going to io-wq.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: don't io_prep_async_work() linked reqs

There is no real reason left for preparing io-wq work context for linked
requests in advance, remove it as this might become a bottleneck in some
cases.

Reported-by: Roman Gershman <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge branch 'irq/mstar' into irq/irqchip-next

Signed-off-by: Marc Zyngier <[email protected]>

dt-bindings: interrupt-controller: Add MStar interrupt controller

Add binding for MStar interrupt controller.

Signed-off-by: Mark-PK Tsai <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

irqchip/irq-mst: Add MStar interrupt controller support

Add MStar interrupt controller support using hierarchy irq
domain.

Signed-off-by: Mark-PK Tsai <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Tested-by: Daniel Palmer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge branch 'irq/irqchip-fixes' into irq/irqchip-next

Signed-off-by: Marc Zyngier <[email protected]>

Merge branch 'irq/tegra-pmc' into irq/irqchip-next

Signed-off-by: Marc Zyngier <[email protected]>

i2c: owl: Clear NACK and BUS error bits

When the NACK and BUS error bits are set by the hardware, the driver is
responsible for clearing them by writing "1" into the corresponding
status registers.

Hence perform the necessary operations in owl_i2c_interrupt().

Fixes: d211e62af466 ("i2c: Add Actions Semiconductor Owl family S900 I2C driver")
Reported-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Cristian Ciocaltea <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>

soc/tegra: pmc: Don't create fake interrupt hierarchy levels

The Tegra PMC driver does ungodly things with the interrupt hierarchy,
repeatedly corrupting it by pulling hwirq numbers out of thin air,
overriding existing IRQ mappings and changing the handling flow
of unsuspecting users.

All of this is done in the name of preserving the interrupt hierarchy
even when these levels do not exist in the HW. Together with the use
of proper IRQs for IPIs, this leads to an unbootable system as the
rescheduling IPI gets repeatedly repurposed for random drivers...

Instead, let's simply mark the level from which the hierarchy does
not make sense for the HW, and let the core code trim the usused
levels from the hierarchy.

Signed-off-by: Marc Zyngier <[email protected]>

soc/tegra: pmc: Allow optional irq parent callbacks

Make the PMC driver resistent to variable depth interrupt hierarchy,
which we are about to introduce.

Signed-off-by: Marc Zyngier <[email protected]>

gpio: tegra186: Allow optional irq parent callbacks

Make the tegra186 GPIO driver resistent to variable depth
interrupt hierarchy, which we are about to introduce.

No functionnal change yet.

Signed-off-by: Marc Zyngier <[email protected]>

genirq/irqdomain: Allow partial trimming of irq_data hierarchy

It appears that some HW is ugly enough that not all the interrupts
connected to a particular interrupt controller end up with the same
hierarchy depth (some of them are terminated early). This leaves
the irqchip hacker with only two choices, both equally bad:

- create discrete domain chains, one for each "hierarchy depth",
  which is very hard to maintain

- create fake hierarchy levels for the shallow paths, leading
  to all kind of problems (what are the safe hwirq values for these
  fake levels?)

Implement the ability to cut short a single interrupt hierarchy
from a level marked as being disconnected by using the new
irq_domain_disconnect_hierarchy() helper.

The irqdomain allocation code will then perform the trimming

Signed-off-by: Marc Zyngier <[email protected]>

Revert "i2c: imx: Fix reset of I2SR_IAL flag"

This reverts commit fa4d30556883f2eaab425b88ba9904865a4d00f3. An updated
version was sent. So, revert this version and give the new version more
time for testing.

Signed-off-by: Wolfram Sang <[email protected]>

Merge tag 'spi-fix-v5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fix from Mark Brown:
"One last minute fix for v5.9 which has been causing crashes in test
  systems with the fsl-dspi driver when they hit deferred probe (and
  which I probably let cook in next a bit longer than is ideal).

  And an update to MAINTAINERS reflecting Serge's extensive and
  detailed recent work on the DesignWare driver"

* tag 'spi-fix-v5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  MAINTAINERS: Add maintainer of DW APB SSI driver
  spi: fsl-dspi: fix NULL pointer dereference

Merge tag 'riscv-for-linus-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
"Two fixes this week:

   - A fix to actually reserve the device tree's memory. Without this
     the device tree can be overwritten on systems that don't otherwise
     reserve it. This issue should only manifest on !MMU systems.

   - A workaround for a BUG() that triggers when the memory that
     originally contained initdata is freed and later repurposed. This
     triggers a BUG() on builds that had HARDENED_USERCOPY enabled"

* tag 'riscv-for-linus-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fixup bootup failure with HARDENED_USERCOPY
  RISC-V: Make sure memblock reserves the memory containing DT

ata: ahci: mvebu: Make SATA PHY optional for Armada 3720

Older ATF does not provide SMC call for SATA phy power on functionality and
therefore initialization of ahci_mvebu is failing when older version of ATF
is using. In this case phy_power_on() function returns -EOPNOTSUPP.

This patch adds a new hflag AHCI_HFLAG_IGN_NOTSUPP_POWER_ON which cause
that ahci_platform_enable_phys() would ignore -EOPNOTSUPP errors from
phy_power_on() call.

It fixes initialization of ahci_mvebu on Espressobin boards where is older
Marvell's Arm Trusted Firmware without SMC call for SATA phy power.

This is regression introduced in commit 8e18c8e58da64 ("arm64: dts: marvell:
armada-3720-espressobin: declare SATA PHY property") where SATA phy was
defined and therefore ahci_platform_enable_phys() on Espressobin started
failing.

Fixes: 8e18c8e58da64 ("arm64: dts: marvell: armada-3720-espressobin: declare SATA PHY property")
Signed-off-by: Pali Rohár <[email protected]>
Tested-by: Tomasz Maciej Nowak <[email protected]>
Cc: <[email protected]> # 5.1+: ea17a0f153af: phy: marvell: comphy: Convert internal SMCC firmware return codes to errno
Signed-off-by: Jens Axboe <[email protected]>

block: fix uapi blkzoned.h comments

Update the kdoc comments for struct blk_zone (capacity field description
missing) and for struct blk_zone_report (flags field description
missing).

Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue

blk_exit_queue will free elevator_data, while blk_mq_run_work_fn
will access it. Move cancel of hctx->run_work to the front of
blk_exit_queue to avoid use-after-free.

Fixes: 1b97871b501f ("blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release")
Signed-off-by: Yang Yang <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'for-v5.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull power supply fix from Sebastian Reichel:
"Just a single change to revert enablement of packet error checking for
  battery data on Chromebooks, since some of their embedded controllers
  do not handle it correctly"

* tag 'for-v5.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: sbs-battery: chromebook workaround for PEC

blk-mq: get rid of the dead flush handle code path

After commit 923218f6166a ("blk-mq: don't allocate driver tag upfront
for flush rq"), blk_mq_submit_bio() will call blk_insert_flush()
directly to handle flush request rather than blk_mq_sched_insert_request()
in the case of elevator.

Then, all flush request either have set RQF_FLUSH_SEQ flag when call
blk_mq_sched_insert_request(), or have inserted into hctx->dispatch.
So, remove the dead code path.

Signed-off-by: Yufen Yu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

block: get rid of unnecessary local variable

Since whole elevator register is protectd by sysfs_lock, we
don't need extras 'has_elevator'. Just use q->elevator directly.

Signed-off-by: Yufen Yu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

block: fix comment and add lockdep assert

After commit b89f625e28d4 ("block: don't release queue's sysfs
lock during switching elevator"), whole elevator register and
unregister function are covered by sysfs_lock. So, remove wrong
comment and add lockdep assert.

Signed-off-by: Yufen Yu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

blk-mq: use helper function to test hw stopped

We have introduced helper function blk_mq_hctx_stopped() to test
BLK_MQ_S_STOPPED.

Signed-off-by: Yufen Yu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

block: use helper function to test queue register

We have defined common interface blk_queue_registered() to
test QUEUE_FLAG_REGISTERED. Just use it.

Signed-off-by: Yufen Yu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

block: remove redundant mq check

elv_support_iosched() will check queue_is_mq() for us. So, remove
the redundant check to clean code.

Signed-off-by: Yufen Yu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

block: invoke blk_mq_exit_sched no matter whether have .exit_sched

We will register debugfs for scheduler no matter whether it have
defined callback funciton .exit_sched. So, blk_mq_exit_sched()
is always needed to unregister debugfs. Also, q->elevator should
be set as NULL after exiting scheduler.

For now, since all register scheduler have defined .exit_sched,
it will not cause any actual problem. But It will be more reasonable
to do this change.

Signed-off-by: Yufen Yu <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'gpio-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
"Some late fixes: one IRQ issue and one compilation issue for UML.

   - Fix a compilation issue with User Mode Linux

   - Handle spurious interrupts properly in the PCA953x driver"

* tag 'gpio-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: pca953x: Survive spurious interrupts
  gpiolib: Disable compat ->read() code in UML case

percpu_ref: don't refer to ref->data if it isn't allocated

We can't check ref->data->confirm_switch directly in __percpu_ref_exit(), since
ref->data may not be allocated in one not-initialized refcount.

Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
Reported-by: [email protected]
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

EDAC/amd64: Set proper family type for Family 19h Models 20h-2Fh

AMD Family 19h Models 20h-2Fh use the same PCI IDs as Family 17h Models
70h-7Fh. The same family ops and number of channels also apply.

Use the Family17h Model 70h family_type and ops for Family 19h Models
20h-2Fh. Update the controller name to match the system.

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

Merge tag 'mmc-v5.9-rc4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fix from Ulf Hansson:
"Assign a proper discard granularity rather than incorrectly set it to
zero"

* tag 'mmc-v5.9-rc4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: core: don't set limits.discard_granularity as 0

Merge tag 'drm-fixes-2020-10-09' of git://anongit.freedesktop.org/drm/drm

Pull amdgpu drm fixes from Dave Airlie:
"Fixes trickling in this week.

  Alex had a final fix for the newest GPU they introduced in rc1, along
  with one build regression and one crasher fix.

  Cross my fingers that's it for 5.9:

   - Fix a crash on renoir if you override the IP discovery parameter

   - Fix the build on ARC platforms

   - Display fix for Sienna Cichlid"

* tag 'drm-fixes-2020-10-09' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Change ABM config init interface
  drm/amdgpu/swsmu: fix ARC build errors
  drm/amdgpu: fix NULL pointer dereference for Renoir

Documentation: better locations for sysfs-pci, sysfs-tagging

sysfs-pci and sysfs-tagging were mis-filed: their locations within
Documentation/ implied that they were related to file systems. Actually,
each topic is about a very specific *use* of sysfs, and sysfs *happens*
to be a (virtual) filesystem, so this is not really the right place.

It's jarring to be reading about filesystems in general and then come
across these specific details about PCI, and tagging...and then back to
general filesystems again.

Move sysfs-pci to PCI, and move sysfs-tagging to networking. (Thanks to
Jonathan Corbet for coming up with the final locations.)

Signed-off-by: John Hubbard <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>

MAINTAINERS: remove LIBATA PATA DRIVERS entry

Even though there is not much happening for libata PATA drivers
I don't have time to look after them anymore.

Since Jens is maintaining the whole libata anyway just remove
"LIBATA PATA DRIVERS" entry.

Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.10/drivers

Pull MD updates from Song:

"The main changes are:
- Bug fixes in bitmap code, from Zhao Heming.
- Fix a work queue check, from Guoqing Jiang.
- Fix raid5 oops with reshape, from Song Liu.
- Clean up unused code, from Jason Yan."

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md/raid5: fix oops during stripe resizing
  md/bitmap: fix memory leak of temporary bitmap
  md: fix the checking of wrong work queue
  md/bitmap: md_bitmap_get_counter returns wrong blocks
  md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks
  md/raid0: remove unused function is_io_in_chunk_boundary()

Merge remote-tracking branch 'spi/for-5.10' into spi-next

Merge remote-tracking branch 'spi/for-5.9' into spi-linus

spi: cadence: Add SPI transfer delays

When processing an SPI transfer, honor the delay that might be passed
along with it.

Signed-off-by: Daniel Mack <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

io_uring: Convert advanced XArray uses to the normal API

There are no bugs here that I've spotted, it's just easier to use the
normal API and there are no performance advantages to using the more
verbose advanced API.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: Fix XArray usage in io_uring_add_task_file

The xas_store() wasn't paired with an xas_nomem() loop, so if it couldn't
allocate memory using GFP_NOWAIT, it would leak the reference to the file
descriptor. Also the node pointed to by the xas could be freed between
the call to xas_load() under the rcu_read_lock() and the acquisition of
the xa_lock.

It's easier to just use the normal xa_load/xa_store interface here.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
[axboe: fix missing assign after alloc, cur_uring -> tctx rename]
Signed-off-by: Jens Axboe <[email protected]>

io_uring: Fix use of XArray in __io_uring_files_cancel

We have to drop the lock during each iteration, so there's no advantage
to using the advanced API. Convert this to a standard xa_for_each() loop.

Reported-by: [email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Revert "arm64: initialize per-cpu offsets earlier"

This reverts commit 353e228eb355be5a65a3c0996c774a0f46737fda.

Qian Cai reports that TX2 no longer boots with his .config as it appears
that task_cpu() gets instrumented and used before KASAN has been
initialised.

Although Mark has a proposed fix, let's take the safe option of reverting
this for now and sorting it out properly later.

Link: https://lore.kernel.org/r/[email protected]
Reported-by: Qian Cai <[email protected]>
Tested-by: Mark Rutland <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

mmc: sdhci_am654: Fix module autoload

Add a MODULE_DEVICE_TABLE() entry so that the driver is autoloaded
when built as a module.

Signed-off-by: Faiz Abbas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>

Merge branch 'fixes' into next

Merge branch 'lkmm' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/core

Pull LKMM changes for v5.10 from Paul E. McKenney.

Various documentation updates.

Signed-off-by: Ingo Molnar <[email protected]>

Merge branch 'kcsan' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/core

Pull KCSAN updates for v5.10 from Paul E. McKenney:

- Improve kernel messages.

- Be more permissive with bitops races under KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y.

- Optimize debugfs stat counters.

- Introduce the instrument_*read_write() annotations, to provide a
   finer description of certain ops - using KCSAN's compound instrumentation.
   Use them for atomic RNW and bitops, where appropriate.
   Doing this might find new races.
   (Depends on the compiler having tsan-compound-read-before-write=1 support.)

- Support atomic built-ins, which will help certain architectures, such as s390.

- Misc enhancements and smaller fixes.

Signed-off-by: Ingo Molnar <[email protected]>

Merge branch 'locking/urgent' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <[email protected]>

lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"

The thinking in commit:

fddf9055a60d ("lockdep: Use raw_cpu_*() for per-cpu variables")

is flawed. While it is true that when we're migratable both CPUs will
have a 0 value, it doesn't hold that when we do get migrated in the
middle of a raw_cpu_op(), the old CPU will still have 0 by the time we
get around to reading it on the new CPU.

Luckily, the reason for that commit (s390 using preempt_disable()
instead of preempt_disable_notrace() in their percpu code), has since
been fixed by commit:

1196f12a2c96 ("s390: don't trace preemption in percpu macros")

An audit of arch/*/include/asm/percpu*.h shows there are no other
architectures affected by this particular issue.

Fixes: fddf9055a60d ("lockdep: Use raw_cpu_*() for per-cpu variables")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

lockdep: Fix lockdep recursion

Steve reported that lockdep_assert*irq*(), when nested inside lockdep
itself, will trigger a false-positive.

One example is the stack-trace code, as called from inside lockdep,
triggering tracing, which in turn calls RCU, which then uses
lockdep_assert_irqs_disabled().

Fixes: a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
Reported-by: Steven Rostedt <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

lockdep: Fix usage_traceoverflow

Basically print_lock_class_header()'s for loop is out of sync with the
the size of of ->usage_traces[].

Also clean things up a bit while at it, to avoid such mishaps in the future.

Fixes: 23870f122768 ("locking/lockdep: Fix "USED" <- "IN-NMI" inversions")
Reported-by: Qian Cai <[email protected]>
Debugged-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Qian Cai <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

mmc: core: don't set limits.discard_granularity as 0

In mmc_queue_setup_discard() the mmc driver queue's discard_granularity
might be set as 0 (when card->pref_erase > max_discard) while the mmc
device still declares to support discard operation. This is buggy and
triggered the following kernel warning message,

WARNING: CPU: 0 PID: 135 at __blkdev_issue_discard+0x200/0x294
CPU: 0 PID: 135 Comm: f2fs_discard-17 Not tainted 5.9.0-rc6 #1
Hardware name: Google Kevin (DT)
pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--)
pc : __blkdev_issue_discard+0x200/0x294
lr : __blkdev_issue_discard+0x54/0x294
sp : ffff800011dd3b10
x29: ffff800011dd3b10 x28: 0000000000000000 x27: ffff800011dd3cc4 x26: ffff800011dd3e18 x25: 000000000004e69b x24: 0000000000000c40 x23: ffff0000f1deaaf0 x22: ffff0000f2849200 x21: 00000000002734d8 x20: 0000000000000008 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000394 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 00000000000008b0 x9 : ffff800011dd3cb0 x8 : 000000000004e69b x7 : 0000000000000000 x6 : ffff0000f1926400 x5 : ffff0000f1940800 x4 : 0000000000000000 x3 : 0000000000000c40 x2 : 0000000000000008 x1 : 00000000002734d8 x0 : 0000000000000000 Call trace:
__blkdev_issue_discard+0x200/0x294
__submit_discard_cmd+0x128/0x374
__issue_discard_cmd_orderly+0x188/0x244
__issue_discard_cmd+0x2e8/0x33c
issue_discard_thread+0xe8/0x2f0
kthread+0x11c/0x120
ret_from_fork+0x10/0x1c
---[ end trace e4c8023d33dfe77a ]---

This patch fixes the issue by setting discard_granularity as SECTOR_SIZE
instead of 0 when (card->pref_erase > max_discard) is true. Now no more
complain from __blkdev_issue_discard() for the improper value of discard
granularity.

This issue is exposed after commit b35fd7422c2f ("block: check queue's
limits.discard_granularity in __blkdev_issue_discard()"), a "Fixes:" tag
is also added for the commit to make sure people won't miss this patch
after applying the change of __blkdev_issue_discard().

Fixes: e056a1b5b67b ("mmc: queue: let host controllers specify maximum discard timeout")
Fixes: b35fd7422c2f ("block: check queue's limits.discard_granularity in __blkdev_issue_discard()").
Reported-and-tested-by: Vicente Bergas <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Cc: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>

perf: Fix task_function_call() error handling

The error handling introduced by commit:

2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")

looses any return value from smp_call_function_single() that is not
{0, -EINVAL}. This is a problem because it will return -EXNIO when the
target CPU is offline. Worse, in that case it'll turn into an infinite
loop.

Fixes: 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
Reported-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Kajol Jain <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Barret Rhoden <[email protected]>
Tested-by: Srikar Dronamraju <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

md/raid5: fix oops during stripe resizing

KoWei reported crash during raid5 reshape:

[ 1032.252932] Oops: 0002 [#1] SMP PTI
[...]
[ 1032.252943] RIP: 0010:memcpy_erms+0x6/0x10
[...]
[ 1032.252947] RSP: 0018:ffffba1ac0c03b78 EFLAGS: 00010286
[ 1032.252949] RAX: 0000784ac0000000 RBX: ffff91bec3d09740 RCX: 0000000000001000
[ 1032.252951] RDX: 0000000000001000 RSI: ffff91be6781c000 RDI: 0000784ac0000000
[ 1032.252953] RBP: ffffba1ac0c03bd8 R08: 0000000000001000 R09: ffffba1ac0c03bf8
[ 1032.252954] R10: 0000000000000000 R11: 0000000000000000 R12: ffffba1ac0c03bf8
[ 1032.252955] R13: 0000000000001000 R14: 0000000000000000 R15: 0000000000000000
[ 1032.252958] FS:  0000000000000000(0000) GS:ffff91becf500000(0000) knlGS:0000000000000000
[ 1032.252959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1032.252961] CR2: 0000784ac0000000 CR3: 000000031780a002 CR4: 00000000001606e0
[ 1032.252962] Call Trace:
[ 1032.252969]  ? async_memcpy+0x179/0x1000 [async_memcpy]
[ 1032.252977]  ? raid5_release_stripe+0x8e/0x110 [raid456]
[ 1032.252982]  handle_stripe_expansion+0x15a/0x1f0 [raid456]
[ 1032.252988]  handle_stripe+0x592/0x1270 [raid456]
[ 1032.252993]  handle_active_stripes.isra.0+0x3cb/0x5a0 [raid456]
[ 1032.252999]  raid5d+0x35c/0x550 [raid456]
[ 1032.253002]  ? schedule+0x42/0xb0
[ 1032.253006]  ? schedule_timeout+0x10e/0x160
[ 1032.253011]  md_thread+0x97/0x160
[ 1032.253015]  ? wait_woken+0x80/0x80
[ 1032.253019]  kthread+0x104/0x140
[ 1032.253022]  ? md_start_sync+0x60/0x60
[ 1032.253024]  ? kthread_park+0x90/0x90
[ 1032.253027]  ret_from_fork+0x35/0x40

This is because cache_size_mutex was unlocked too early in resize_stripes,
which races with grow_one_stripe() that grow_one_stripe() allocates a
stripe with wrong pool_size.

Fix this issue by unlocking cache_size_mutex after updating pool_size.

Cc: <[email protected]> # v4.4+
Reported-by: KoWei Sung <[email protected]>
Signed-off-by: Song Liu <[email protected]>

md/bitmap: fix memory leak of temporary bitmap

Callers of get_bitmap_from_slot() are responsible to free the bitmap.

Suggested-by: Guoqing Jiang <[email protected]>
Signed-off-by: Zhao Heming <[email protected]>
Signed-off-by: Song Liu <[email protected]>

md: fix the checking of wrong work queue

It should check md_rdev_misc_wq instead of md_misc_wq.

Fixes: cc1ffe61c026 ("md: add new workqueue for delete rdev")
Cc: <[email protected]> # v5.8+
Signed-off-by: Guoqing Jiang <[email protected]>
Signed-off-by: Song Liu <[email protected]>

md/bitmap: md_bitmap_get_counter returns wrong blocks

md_bitmap_get_counter() has code:

```
    if (bitmap->bp[page].hijacked ||
        bitmap->bp[page].map == NULL)
        csize = ((sector_t)1) << (bitmap->chunkshift +
                      PAGE_COUNTER_SHIFT - 1);
```

The minus 1 is wrong, this branch should report 2048 bits of space.
With "-1" action, this only report 1024 bit of space.

This bug code returns wrong blocks, but it doesn't inflence bitmap logic:
1. Most callers focus this function return value (the counter of offset),
   not the parameter blocks.
2. The bug is only triggered when hijacked is true or map is NULL.
   the hijacked true condition is very rare.
   the "map == null" only true when array is creating or resizing.
3. Even the caller gets wrong blocks, current code makes caller just to
   call md_bitmap_get_counter() one more time.

Signed-off-by: Zhao Heming <[email protected]>
Signed-off-by: Song Liu <[email protected]>

md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks

The patched code is used to get chunks number, should use round-up div
to replace current sector_div. The same code is in md_bitmap_resize():
```
chunks = DIV_ROUND_UP_SECTOR_T(blocks, 1 << chunkshift);
```

Signed-off-by: Zhao Heming <[email protected]>
Signed-off-by: Song Liu <[email protected]>

md/raid0: remove unused function is_io_in_chunk_boundary()

This function is no longger needed after commit 20d0189b1012 ("block:
Introduce new bio_split()").

Signed-off-by: Jason Yan <[email protected]>
Signed-off-by: Song Liu <[email protected]>

Merge tag 'amd-drm-fixes-5.9-2020-10-08' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

amd-drm-fixes-5.9-2020-10-08:

amdgpu:
- Fix a crash on renoir if you override the IP discovery parameter
- Fix the build on ARC platforms
- Display fix for Sienna Cichlid

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

io_uring: fix break condition for __io_uring_register() waiting

Colin reports that there's unreachable code, since we only ever break
if ret == 0. This is correct, and is due to a reversed logic condition
in when to break or not.

Break out of the loop if we don't process any task work, in that case
we do want to return -EINTR.

Fixes: af9c1a44f8de ("io_uring: process task work in io_uring_register()")
Reported-by: Colin Ian King <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

erofs: remove unnecessary enum entries

Opt_nouser_xattr and Opt_noacl are useless, so just remove them.

Signed-off-by: Chengguang Xu <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Gao Xiang <[email protected]>

Merge tag 'block5.9-2020-10-08' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"A few fixes that should go into this release:

   - NVMe controller error path reference fix (Chaitanya)

   - Fix regression with IBM partitions on non-dasd devices (Christoph)

   - Fix a missing clear in the compat CDROM packet structure (Peilin)"

* tag 'block5.9-2020-10-08' of git://git.kernel.dk/linux-block:
  partitions/ibm: fix non-DASD devices
  nvme-core: put ctrl ref when module ref get fail
  block/scsi-ioctl: Fix kernel-infoleak in scsi_put_cdrom_generic_arg()

power: supply: sbs-battery: chromebook workaround for PEC

Looks like the I2C tunnel implementation from Chromebook's
embedded controller does not handle PEC correctly. Fix this
by disabling PEC for batteries behind those I2C tunnels as
a workaround.

Note, that some Chromebooks actually have been reported to
have working PEC support (with I2C tunnel). Since the problem
has not yet been fully understood this simply reverts all
Chromebooks to not use PEC for now.

Reported-by: "Milan P. Stanić" <[email protected]>
Reported-by: Vicente Bergas <[email protected]>
CC: Enric Balletbo i Serra <[email protected]>
Fixes: 7222bd603dd2 ("power: supply: sbs-battery: add PEC support")
Tested-by: Vicente Bergas <[email protected]>
Tested-by: "Milan P. Stanić" <[email protected]>
Signed-off-by: Sebastian Reichel <[email protected]>

spi: dw: Add Baikal-T1 SPI Controller bindings

These controllers are based on the DW APB SSI IP-core and embedded into
the SoC, so two of them are equipped with IRQ, DMA, 64 words FIFOs and 4
native CS, while another one as being utilized by the Baikal-T1 System
Boot Controller has got a very limited resources: no IRQ, no DMA, only a
single native chip-select and just 8 bytes Tx/Rx FIFOs available. That's
why we have to mark the IRQ to be optional for the later interface.

The SPI controller embedded into the Baikal-T1 System Boot Controller can
be also used to directly access an external SPI flash by means of a
dedicated FSM. The corresponding MMIO region availability is switchable by
the embedded multiplexor, which phandle can be specified in the dts node.

* We added a new example to test out the non-standard Baikal-T1 System
Boot SPI Controller DT binding.

Co-developed-by: Ramil Zaripov <[email protected]>
Signed-off-by: Ramil Zaripov <[email protected]>
Signed-off-by: Serge Semin <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

spi: dw: Add Baikal-T1 SPI Controller glue driver

Baikal-T1 is equipped with three DW APB SSI-based MMIO SPI controllers.
Two of them are pretty much normal: with IRQ, DMA, FIFOs of 64 words
depth, 4x CSs, but the third one as being a part of the Baikal-T1 System
Boot Controller has got a very limited resources: no IRQ, no DMA, only a
single native chip-select and Tx/Rx FIFO with just 8 words depth
available. In order to provide a transparent initial boot code execution
the Boot SPI controller is also utilized by an vendor-specific IP-block,
which exposes an SPI flash direct mapping interface. Since both direct
mapping and SPI controller normal utilization are mutual exclusive only
one of these interfaces can be used to access an external SPI slave
device. That's why a dedicated mux is embedded into the System Boot
Controller. All of that is taken into account in the Baikal-T1-specific DW
APB SSI glue driver implemented by means of the DW SPI core module.

Co-developed-by: Ramil Zaripov <[email protected]>
Signed-off-by: Ramil Zaripov <[email protected]>
Signed-off-by: Serge Semin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

spi: dw: Add poll-based SPI transfers support

A functionality of the poll-based transfer has been removed by
commit 1ceb09717e98 ("spi: dw: remove cs_control and poll_mode members
from chip_data") with a justification that "there is no user of one
anymore". It turns out one of our DW APB SSI core is synthesized with no
IRQ line attached and the only possible way of using it is to implement a
poll-based SPI transfer procedure. So we have to get the removed
functionality back, but with some alterations described below.

First of all the poll-based transfer is activated only if the DW SPI
controller doesn't have an IRQ line attached and the Linux IRQ number is
initialized with the IRQ_NOTCONNECTED value. Secondly the transfer
procedure is now executed with a delay performed between writer and reader
methods. The delay value is calculated based on the number of data words
expected to be received on the current iteration. Finally the errors
status is checked on each iteration.

Signed-off-by: Serge Semin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

spi: dw: Introduce max mem-ops SPI bus frequency setting

In some circumstances the current implementation of the SPI memory
operations may occasionally fail even though they are executed in the
atomic context. This may happen if the system bus is relatively slow in
comparison to the SPI bus frequency, or there is a concurrent access to
it, which makes the MMIO-operations occasionally stalling before
push-pulling data from the DW APB SPI FIFOs. These two problems we've
discovered on the Baikal-T1 SoC. In order to fix them we have no choice
but to set an artificial limitation on the SPI bus speed.

Note currently this limitation will be only applicable for the memory
operations, since the standard SPI core interface is implemented with an
assumption that there is no problem with the automatic CS toggling.

Signed-off-by: Serge Semin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

spi: dw: Add memory operations support

Aside from the synchronous Tx-Rx mode, which has been utilized to create
the normal SPI transfers in the framework of the DW SSI driver, DW SPI
controller supports Tx-only and EEPROM-read modes. The former one just
enables the controller to transmit all the data from the Tx FIFO ignoring
anything retrieved from the MISO lane. The later mode is so called
write-then-read operation: DW SPI controller first pushes out all the data
from the Tx FIFO, after that it'll automatically receive as much data as
has been specified by means of the CTRLR1 register. Both of those modes
can be used to implement the memory operations supported by the SPI-memory
subsystem.

The memory operation implementation is pretty much straightforward, except
a few peculiarities we have had to take into account to make things
working. Since DW SPI controller doesn't provide a way to directly set and
clear the native CS lane level, but instead automatically de-asserts it
when a transfer going on, we have to make sure the Tx FIFO isn't empty
during entire Tx procedure. In addition we also need to read data from the
Rx FIFO as fast as possible to prevent it' overflow with automatically
fetched incoming traffic. The denoted peculiarities get to cause even more
problems if DW SSI controller is equipped with relatively small FIFO and
is connected to a relatively slow system bus (APB) (with respect to the
SPI bus speed). In order to workaround the problems for as much as it's
possible, the memory operation execution procedure collects all the Tx
data into a single buffer and disables the local IRQs to speed the
write-then-optionally-read method up.

Note the provided memory operations are utilized by default only if
a glue driver hasn't provided a custom version of ones and this is not
a DW APB SSI controller with fixed automatic CS toggle functionality.

Co-developed-by: Ramil Zaripov <[email protected]>
Signed-off-by: Ramil Zaripov <[email protected]>
Signed-off-by: Serge Semin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

spi: dw: Add generic DW SSI status-check method

The DW SSI errors handling method can be generically implemented for all
types of the transfers: IRQ, DMA and poll-based ones. It will be a
function which checks the overflow/underflow error flags and resets the
controller if any of them is set. In the framework of this commit we make
use of the new method to detect the errors in the IRQ- and DMA-based SPI
transfer execution procedures.

Signed-off-by: Serge Semin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

spi: dw: Move num-of retries parameter to the header file

The parameter will be needed for another wait-done method being added in
the framework of the SPI memory operation modification in a further
commit.

Signed-off-by: Serge Semin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>