Linus Torvalds [Mon, 12 Oct 2020 17:10:56 +0000 (10:10 -0700)]
Merge tag 'm68k-for-v5.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- Conversion of the Mac IDE driver to a platform driver
- Minor cleanups and fixes
* tag 'm68k-for-v5.10-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
ide/macide: Convert Mac IDE driver to platform driver
m68k: Replace HTTP links with HTTPS ones
m68k: mm: Remove superfluous memblock_alloc*() casts
m68k: mm: Use PAGE_ALIGNED() helper
m68k: Sort selects in main Kconfig
m68k: amiga: Clean up Amiga hardware configuration
m68k: Revive _TIF_* masks
m68k: Correct some typos in comments
m68k: Use get_kernel_nofault() in show_registers()
zorro: Fix address space collision message with RAM expansion boards
m68k: amiga: Fix Denise detection on OCS
Linus Torvalds [Mon, 12 Oct 2020 17:00:51 +0000 (10:00 -0700)]
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"There's quite a lot of code here, but much of it is due to the
addition of a new PMU driver as well as some arm64-specific selftests
which is an area where we've traditionally been lagging a bit.
In terms of exciting features, this includes support for the Memory
Tagging Extension which narrowly missed 5.9, hopefully allowing
userspace to run with use-after-free detection in production on CPUs
that support it. Work is ongoing to integrate the feature with KASAN
for 5.11.
Another change that I'm excited about (assuming they get the hardware
right) is preparing the ASID allocator for sharing the CPU page-table
with the SMMU. Those changes will also come in via Joerg with the
IOMMU pull.
We do stray outside of our usual directories in a few places, mostly
due to core changes required by MTE. Although much of this has been
Acked, there were a couple of places where we unfortunately didn't get
any review feedback.
Other than that, we ran into a handful of minor conflicts in -next,
but nothing that should post any issues.
Summary:
- Userspace support for the Memory Tagging Extension introduced by
Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.
- Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
switching.
- Fix and subsequent rewrite of our Spectre mitigations, including
the addition of support for PR_SPEC_DISABLE_NOEXEC.
- Support for the Armv8.3 Pointer Authentication enhancements.
- Support for ASID pinning, which is required when sharing
page-tables with the SMMU.
- MM updates, including treating flush_tlb_fix_spurious_fault() as a
no-op.
- Perf/PMU driver updates, including addition of the ARM CMN PMU
driver and also support to handle CPU PMU IRQs as NMIs.
- Allow prefetchable PCI BARs to be exposed to userspace using normal
non-cacheable mappings.
- Implementation of ARCH_STACKWALK for unwinding.
- Improve reporting of unexpected kernel traps due to BPF JIT
failure.
- Improve robustness of user-visible HWCAP strings and their
corresponding numerical constants.
- Removal of TEXT_OFFSET.
- Removal of some unused functions, parameters and prototypes.
- Removal of MPIDR-based topology detection in favour of firmware
description.
- Cleanups to handling of SVE and FPSIMD register state in
preparation for potential future optimisation of handling across
syscalls.
- Cleanups to the SDEI driver in preparation for support in KVM.
- Miscellaneous cleanups and refactoring work"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
Revert "arm64: initialize per-cpu offsets earlier"
arm64: random: Remove no longer needed prototypes
arm64: initialize per-cpu offsets earlier
kselftest/arm64: Check mte tagged user address in kernel
kselftest/arm64: Verify KSM page merge for MTE pages
kselftest/arm64: Verify all different mmap MTE options
kselftest/arm64: Check forked child mte memory accessibility
kselftest/arm64: Verify mte tag inclusion via prctl
kselftest/arm64: Add utilities and a test to validate mte memory
perf: arm-cmn: Fix conversion specifiers for node type
perf: arm-cmn: Fix unsigned comparison to less than zero
arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
arm64: Get rid of arm64_ssbd_state
KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
KVM: arm64: Get rid of kvm_arm_have_ssbd()
KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
...
Linus Torvalds [Mon, 12 Oct 2020 16:54:39 +0000 (09:54 -0700)]
Merge tag 'tpmdd-next-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"Support for a new TPM device and fixes and Git URL change (infraded ->
korg)"
* tag 'tpmdd-next-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
MAINTAINERS: TPM DEVICE DRIVER: Update GIT
tpm_tis: Add a check for invalid status
tpm: use %*ph to print small buffer
dt-bindings: Add SynQucer TPM MMIO as a trivial device
tpm: tis: add support for MMIO TPM on SynQuacer
Jiri Olsa [Wed, 16 Sep 2020 11:53:11 +0000 (13:53 +0200)]
perf/core: Fix race in the perf_mmap_close() function
There's a possible race in perf_mmap_close() when checking ring buffer's
mmap_count refcount value. The problem is that the mmap_count check is
not atomic because we call atomic_dec() and atomic_read() separately.
perf_mmap_close:
...
atomic_dec(&rb->mmap_count);
...
if (atomic_read(&rb->mmap_count))
goto out_put;
<ring buffer detach>
free_uid
out_put:
ring_buffer_put(rb); /* could be last */
The race can happen when we have two (or more) events sharing same ring
buffer and they go through atomic_dec() and then they both see 0 as refcount
value later in atomic_read(). Then both will go on and execute code which
is meant to be run just once.
The code that detaches ring buffer is probably fine to be executed more
than once, but the problem is in calling free_uid(), which will later on
demonstrate in related crashes and refcount warnings, like:
Linus Torvalds [Sun, 11 Oct 2020 18:18:04 +0000 (11:18 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"Five fixes.
Subsystems affected by this patch series: MAINTAINERS, mm/pagemap,
mm/swap, and mm/hugetlb"
* emailed patches from Andrew Morton <[email protected]>:
mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
mm: validate inode in mapping_set_error()
mm: mmap: Fix general protection fault in unlink_file_vma()
MAINTAINERS: Antoine Tenart's email address
MAINTAINERS: change hardening mailing list
Linus Torvalds [Sun, 11 Oct 2020 17:53:37 +0000 (10:53 -0700)]
Merge tag 'x86-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two fixes:
- Fix a (hopefully final) IRQ state tracking bug vs MCE handling
- Fix a documentation link"
* tag 'x86-urgent-2020-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/x86: Fix incorrect references to zero-page.txt
x86/mce: Use idtentry_nmi_enter/exit()
Thomas Gleixner [Sun, 11 Oct 2020 17:53:13 +0000 (19:53 +0200)]
Merge tag 'irqchip-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
Core changes:
- Allow irq retriggering to follow a hierarchy
- Allow interrupt hierarchies to be trimmed at allocation time
- Allow interrupts to be hidden from /proc/interrupts (IPIs)
- Introduce stub for set_handle_irq() when !GENERIC_IRQ_MULTI_HANDLER
- New per-cpu IPI handling flow
Architecture changes:
- Move arm/arm64 IPI handling to the core interrupt code, removing
the home brewed accounting
Driver updates:
- New driver for the MStar (and more recently Mediatek) platforms
- New driver for the Actions Owl SIRQ controller
- New driver for the TI PRUSS infrastructure
- Wake-up support for the Qualcomm PDC controller
- Primary interrupt controller support for the Designware APB ICTL
- Convert the IPI code for GIC, GICv3, hip04, armada-270-xp and bcm2836
to using standard interrupts
- Improve GICv3 pseudo-NMI support to deal with both non-secure and secure
priorities on arm64
- Convert the GIC/GICv3 drivers to using HW-based irq retrigger
- A sprinkling of dev_err_probe() conversion
- A set of NVIDIA Tegra fixes for interrupt hierarchy corruption
- A reset fix for the Loongson HTVEC driver
- A couple of error handling fixes in the TI SCI drivers
mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
When memory is hotplug added or removed the min_free_kbytes should be
recalculated based on what is expected by khugepaged. Currently after
hotplug, min_free_kbytes will be set to a lower default and higher
default set when THP enabled is lost.
This change restores min_free_kbytes as expected for THP consumers.
It's because the ->mmap() callback can change vma->vm_file and fput the
original file. But the commit d70cec898324 ("mm: mmap: merge vma after
call_mmap() if possible") failed to catch this case and always fput()
the original file, hence add an extra fput().
[ Thanks Hillf for pointing this extra fput() out. ]
Kees Cook [Sun, 11 Oct 2020 06:16:27 +0000 (23:16 -0700)]
MAINTAINERS: change hardening mailing list
As more email from git history gets aimed at the OpenWall
kernel-hardening@ list, there has been a desire to separate "new topics"
from "on-going" work.
To handle this, the superset of hardening email topics are now to be
directed to [email protected].
Update the MAINTAINERS file and the .mailmap to accomplish this, so that
linux-hardening@ can be treated like any other regular upstream kernel
development list.
Linus Torvalds [Sat, 10 Oct 2020 23:09:12 +0000 (16:09 -0700)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Some more driver bugfixes for I2C. Including a revert - the updated
series for it will come during the next merge window"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: owl: Clear NACK and BUS error bits
Revert "i2c: imx: Fix reset of I2SR_IAL flag"
i2c: meson: fixup rate calculation with filter delay
i2c: meson: keep peripheral clock enabled
i2c: meson: fix clock setting overwrite
i2c: imx: Fix reset of I2SR_IAL flag
cifs: Fix incomplete memory allocation on setxattr path
On setxattr() syscall path due to an apprent typo the size of a dynamically
allocated memory chunk for storing struct smb2_file_full_ea_info object is
computed incorrectly, to be more precise the first addend is the size of
a pointer instead of the wanted object size. Coincidentally it makes no
difference on 64-bit platforms, however on 32-bit targets the following
memcpy() writes 4 bytes of data outside of the dynamically allocated memory.
Disabling lock debugging due to kernel taint
INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc
INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201
INFO: Object 0x6f171df3 @offset=352 fp=0x00000000
Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69 ........snrub.fi
Redzone 79e69a6f: 73 68 32 0a sh2.
Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
CPU: 0 PID: 8196 Comm: attr Tainted: G B 5.9.0-rc8+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
Call Trace:
dump_stack+0x54/0x6e
print_trailer+0x12c/0x134
check_bytes_and_report.cold+0x3e/0x69
check_object+0x18c/0x250
free_debug_processing+0xfe/0x230
__slab_free+0x1c0/0x300
kfree+0x1d3/0x220
smb2_set_ea+0x27d/0x540
cifs_xattr_set+0x57f/0x620
__vfs_setxattr+0x4e/0x60
__vfs_setxattr_noperm+0x4e/0x100
__vfs_setxattr_locked+0xae/0xd0
vfs_setxattr+0x4e/0xe0
setxattr+0x12c/0x1a0
path_setxattr+0xa4/0xc0
__ia32_sys_lsetxattr+0x1d/0x20
__do_fast_syscall_32+0x40/0x70
do_fast_syscall_32+0x29/0x60
do_SYSENTER_32+0x15/0x20
entry_SYSENTER_32+0x9f/0xf2
Fixes: 5517554e4313 ("cifs: Add support for writing attributes on SMB2+") Signed-off-by: Vladimir Zapolskiy <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
There have been elusive reports of filemap_fault() hitting its
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
with CONFIG_READ_ONLY_THP_FOR_FS=y.
Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
without NUMA reuses the same huge page after collapse_file() failed
(whereas NUMA targets its allocation to the respective node each time).
And most of us were usually testing with CONFIG_NUMA=y kernels.
An initial patch replaced __SetPageLocked() by lock_page(), which did
fix the race which Suren illustrates above. But testing showed that it's
not good enough: if the racing task's __lock_page() gets delayed long
after its find_get_page(), then it may follow collapse_file(new start)'s
successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.
It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
check and retry (as is done for mapping), with similar relaxations in
find_lock_entry() and pagecache_get_page(): but it's not obvious what
else might get caught out; and khugepaged non-NUMA appears to be unique
in exposing a page to page cache, then revoking, without going through
a full cycle of freeing before reuse.
Instead, non-NUMA khugepaged_prealloc_page() release the old page
if anyone else has a reference to it (1% of cases when I tested).
Although never reported on huge tmpfs, I believe its find_lock_entry()
has been at similar risk; but huge tmpfs does not rely on khugepaged
for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.
Pavel Begunkov [Sat, 10 Oct 2020 17:34:16 +0000 (18:34 +0100)]
io_uring: keep a pointer ref_node in file_data
->cur_refs of struct fixed_file_data always points to percpu_ref
embedded into struct fixed_file_ref_node. Don't overuse container_of()
and offsetting, and point directly to fixed_file_ref_node.
Pavel Begunkov [Sat, 10 Oct 2020 17:34:13 +0000 (18:34 +0100)]
io_uring: don't delay io_init_req() error check
Don't postpone io_init_req() error checks and do that right after
calling it. There is no control-flow statements or dependencies with
sqe/submitted accounting, so do those earlier, that makes the code flow
a bit more natural.
Pavel Begunkov [Sat, 10 Oct 2020 17:34:11 +0000 (18:34 +0100)]
io_uring: remove timeout.list after hrtimer cancel
Remove timeouts from ctx->timeout_list after hrtimer_try_to_cancel()
successfully cancels it. With this we don't need to care whether there
was a race and it was removed in io_timeout_fn(), and that will be handy
for following patches.
Pavel Begunkov [Sat, 10 Oct 2020 17:34:10 +0000 (18:34 +0100)]
io_uring: use a separate struct for timeout_remove
Don't use struct io_timeout for both IORING_OP_TIMEOUT and
IORING_OP_TIMEOUT_REMOVE, they're quite different. Split them in two,
that allows to remove an unused field in struct io_timeout, and btw kill
->flags not used by either. This also easier to follow, especially for
timeout remove.
state->ios_left isn't decremented for requests that don't need a file,
so it might be larger than number of SQEs left. That in some
circumstances makes us to grab more files that is needed so imposing
extra put.
Deaccount one ios_left for each request.
Pavel Begunkov [Sat, 10 Oct 2020 17:34:08 +0000 (18:34 +0100)]
io_uring: simplify io_file_get()
Keep ->needs_file_no_error check out of io_file_get(), and let callers
handle it. It makes it more straightforward. Also, as the only error it
can hand back -EBADF, make it return a file or NULL.
Pavel Begunkov [Sat, 10 Oct 2020 17:34:07 +0000 (18:34 +0100)]
io_uring: kill extra check in fixed io_file_get()
ctx->nr_user_files == 0 IFF ctx->file_data == NULL and there fixed files
are not used. Hence, verifying fds only against ctx->nr_user_files is
enough. Remove the other check from hot path.
Pavel Begunkov [Sat, 10 Oct 2020 17:34:06 +0000 (18:34 +0100)]
io_uring: clean up ->files grabbing
Move work.files grabbing into io_prep_async_work() to all other work
resources initialisation. We don't need to keep it separately now, as
->ring_fd/file are gone. It also allows to not grab it when a request
is not going to io-wq.
Pavel Begunkov [Sat, 10 Oct 2020 17:34:05 +0000 (18:34 +0100)]
io_uring: don't io_prep_async_work() linked reqs
There is no real reason left for preparing io-wq work context for linked
requests in advance, remove it as this might become a bottleneck in some
cases.
When the NACK and BUS error bits are set by the hardware, the driver is
responsible for clearing them by writing "1" into the corresponding
status registers.
Hence perform the necessary operations in owl_i2c_interrupt().
The Tegra PMC driver does ungodly things with the interrupt hierarchy,
repeatedly corrupting it by pulling hwirq numbers out of thin air,
overriding existing IRQ mappings and changing the handling flow
of unsuspecting users.
All of this is done in the name of preserving the interrupt hierarchy
even when these levels do not exist in the HW. Together with the use
of proper IRQs for IPIs, this leads to an unbootable system as the
rescheduling IPI gets repeatedly repurposed for random drivers...
Instead, let's simply mark the level from which the hierarchy does
not make sense for the HW, and let the core code trim the usused
levels from the hierarchy.
Marc Zyngier [Tue, 6 Oct 2020 09:10:20 +0000 (10:10 +0100)]
genirq/irqdomain: Allow partial trimming of irq_data hierarchy
It appears that some HW is ugly enough that not all the interrupts
connected to a particular interrupt controller end up with the same
hierarchy depth (some of them are terminated early). This leaves
the irqchip hacker with only two choices, both equally bad:
- create discrete domain chains, one for each "hierarchy depth",
which is very hard to maintain
- create fake hierarchy levels for the shallow paths, leading
to all kind of problems (what are the safe hwirq values for these
fake levels?)
Implement the ability to cut short a single interrupt hierarchy
from a level marked as being disconnected by using the new
irq_domain_disconnect_hierarchy() helper.
The irqdomain allocation code will then perform the trimming
Wolfram Sang [Sat, 10 Oct 2020 11:03:54 +0000 (13:03 +0200)]
Revert "i2c: imx: Fix reset of I2SR_IAL flag"
This reverts commit fa4d30556883f2eaab425b88ba9904865a4d00f3. An updated
version was sent. So, revert this version and give the new version more
time for testing.
Linus Torvalds [Sat, 10 Oct 2020 01:05:12 +0000 (18:05 -0700)]
Merge tag 'spi-fix-v5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One last minute fix for v5.9 which has been causing crashes in test
systems with the fsl-dspi driver when they hit deferred probe (and
which I probably let cook in next a bit longer than is ideal).
And an update to MAINTAINERS reflecting Serge's extensive and
detailed recent work on the DesignWare driver"
* tag 'spi-fix-v5.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
MAINTAINERS: Add maintainer of DW APB SSI driver
spi: fsl-dspi: fix NULL pointer dereference
Linus Torvalds [Fri, 9 Oct 2020 18:49:22 +0000 (11:49 -0700)]
Merge tag 'riscv-for-linus-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"Two fixes this week:
- A fix to actually reserve the device tree's memory. Without this
the device tree can be overwritten on systems that don't otherwise
reserve it. This issue should only manifest on !MMU systems.
- A workaround for a BUG() that triggers when the memory that
originally contained initdata is freed and later repurposed. This
triggers a BUG() on builds that had HARDENED_USERCOPY enabled"
* tag 'riscv-for-linus-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fixup bootup failure with HARDENED_USERCOPY
RISC-V: Make sure memblock reserves the memory containing DT
Pali Rohár [Fri, 9 Oct 2020 08:42:44 +0000 (10:42 +0200)]
ata: ahci: mvebu: Make SATA PHY optional for Armada 3720
Older ATF does not provide SMC call for SATA phy power on functionality and
therefore initialization of ahci_mvebu is failing when older version of ATF
is using. In this case phy_power_on() function returns -EOPNOTSUPP.
This patch adds a new hflag AHCI_HFLAG_IGN_NOTSUPP_POWER_ON which cause
that ahci_platform_enable_phys() would ignore -EOPNOTSUPP errors from
phy_power_on() call.
It fixes initialization of ahci_mvebu on Espressobin boards where is older
Marvell's Arm Trusted Firmware without SMC call for SATA phy power.
This is regression introduced in commit 8e18c8e58da64 ("arm64: dts: marvell:
armada-3720-espressobin: declare SATA PHY property") where SATA phy was
defined and therefore ahci_platform_enable_phys() on Espressobin started
failing.
Yang Yang [Fri, 9 Oct 2020 08:00:14 +0000 (01:00 -0700)]
blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
blk_exit_queue will free elevator_data, while blk_mq_run_work_fn
will access it. Move cancel of hctx->run_work to the front of
blk_exit_queue to avoid use-after-free.
Fixes: 1b97871b501f ("blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release") Signed-off-by: Yang Yang <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
Linus Torvalds [Fri, 9 Oct 2020 18:38:07 +0000 (11:38 -0700)]
Merge tag 'for-v5.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply fix from Sebastian Reichel:
"Just a single change to revert enablement of packet error checking for
battery data on Chromebooks, since some of their embedded controllers
do not handle it correctly"
* tag 'for-v5.9-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: sbs-battery: chromebook workaround for PEC
Yufen Yu [Fri, 9 Oct 2020 03:26:33 +0000 (23:26 -0400)]
blk-mq: get rid of the dead flush handle code path
After commit 923218f6166a ("blk-mq: don't allocate driver tag upfront
for flush rq"), blk_mq_submit_bio() will call blk_insert_flush()
directly to handle flush request rather than blk_mq_sched_insert_request()
in the case of elevator.
Then, all flush request either have set RQF_FLUSH_SEQ flag when call
blk_mq_sched_insert_request(), or have inserted into hctx->dispatch.
So, remove the dead code path.
Yufen Yu [Fri, 9 Oct 2020 03:26:31 +0000 (23:26 -0400)]
block: fix comment and add lockdep assert
After commit b89f625e28d4 ("block: don't release queue's sysfs
lock during switching elevator"), whole elevator register and
unregister function are covered by sysfs_lock. So, remove wrong
comment and add lockdep assert.
Yufen Yu [Fri, 9 Oct 2020 03:26:27 +0000 (23:26 -0400)]
block: invoke blk_mq_exit_sched no matter whether have .exit_sched
We will register debugfs for scheduler no matter whether it have
defined callback funciton .exit_sched. So, blk_mq_exit_sched()
is always needed to unregister debugfs. Also, q->elevator should
be set as NULL after exiting scheduler.
For now, since all register scheduler have defined .exit_sched,
it will not cause any actual problem. But It will be more reasonable
to do this change.
Linus Torvalds [Fri, 9 Oct 2020 18:33:48 +0000 (11:33 -0700)]
Merge tag 'gpio-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Some late fixes: one IRQ issue and one compilation issue for UML.
- Fix a compilation issue with User Mode Linux
- Handle spurious interrupts properly in the PCA953x driver"
* tag 'gpio-v5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: pca953x: Survive spurious interrupts
gpiolib: Disable compat ->read() code in UML case
Ming Lei [Fri, 9 Oct 2020 04:03:56 +0000 (12:03 +0800)]
percpu_ref: don't refer to ref->data if it isn't allocated
We can't check ref->data->confirm_switch directly in __percpu_ref_exit(), since
ref->data may not be allocated in one not-initialized refcount.
Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") Reported-by: [email protected] Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
John Hubbard [Fri, 9 Oct 2020 07:01:28 +0000 (00:01 -0700)]
Documentation: better locations for sysfs-pci, sysfs-tagging
sysfs-pci and sysfs-tagging were mis-filed: their locations within
Documentation/ implied that they were related to file systems. Actually,
each topic is about a very specific *use* of sysfs, and sysfs *happens*
to be a (virtual) filesystem, so this is not really the right place.
It's jarring to be reading about filesystems in general and then come
across these specific details about PCI, and tagging...and then back to
general filesystems again.
Move sysfs-pci to PCI, and move sysfs-tagging to networking. (Thanks to
Jonathan Corbet for coming up with the final locations.)
Jens Axboe [Fri, 9 Oct 2020 15:03:20 +0000 (09:03 -0600)]
Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.10/drivers
Pull MD updates from Song:
"The main changes are:
- Bug fixes in bitmap code, from Zhao Heming.
- Fix a work queue check, from Guoqing Jiang.
- Fix raid5 oops with reshape, from Song Liu.
- Clean up unused code, from Jason Yan."
* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md/raid5: fix oops during stripe resizing
md/bitmap: fix memory leak of temporary bitmap
md: fix the checking of wrong work queue
md/bitmap: md_bitmap_get_counter returns wrong blocks
md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks
md/raid0: remove unused function is_io_in_chunk_boundary()
io_uring: Convert advanced XArray uses to the normal API
There are no bugs here that I've spotted, it's just easier to use the
normal API and there are no performance advantages to using the more
verbose advanced API.
io_uring: Fix XArray usage in io_uring_add_task_file
The xas_store() wasn't paired with an xas_nomem() loop, so if it couldn't
allocate memory using GFP_NOWAIT, it would leak the reference to the file
descriptor. Also the node pointed to by the xas could be freed between
the call to xas_load() under the rcu_read_lock() and the acquisition of
the xa_lock.
It's easier to just use the normal xa_load/xa_store interface here.
Ingo Molnar [Fri, 9 Oct 2020 06:35:01 +0000 (08:35 +0200)]
Merge branch 'kcsan' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/core
Pull KCSAN updates for v5.10 from Paul E. McKenney:
- Improve kernel messages.
- Be more permissive with bitops races under KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y.
- Optimize debugfs stat counters.
- Introduce the instrument_*read_write() annotations, to provide a
finer description of certain ops - using KCSAN's compound instrumentation.
Use them for atomic RNW and bitops, where appropriate.
Doing this might find new races.
(Depends on the compiler having tsan-compound-read-before-write=1 support.)
- Support atomic built-ins, which will help certain architectures, such as s390.
Peter Zijlstra [Mon, 5 Oct 2020 07:56:57 +0000 (09:56 +0200)]
lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"
The thinking in commit:
fddf9055a60d ("lockdep: Use raw_cpu_*() for per-cpu variables")
is flawed. While it is true that when we're migratable both CPUs will
have a 0 value, it doesn't hold that when we do get migrated in the
middle of a raw_cpu_op(), the old CPU will still have 0 by the time we
get around to reading it on the new CPU.
Luckily, the reason for that commit (s390 using preempt_disable()
instead of preempt_disable_notrace() in their percpu code), has since
been fixed by commit:
1196f12a2c96 ("s390: don't trace preemption in percpu macros")
An audit of arch/*/include/asm/percpu*.h shows there are no other
architectures affected by this particular issue.
Peter Zijlstra [Fri, 2 Oct 2020 09:04:21 +0000 (11:04 +0200)]
lockdep: Fix lockdep recursion
Steve reported that lockdep_assert*irq*(), when nested inside lockdep
itself, will trigger a false-positive.
One example is the stack-trace code, as called from inside lockdep,
triggering tracing, which in turn calls RCU, which then uses
lockdep_assert_irqs_disabled().
Fixes: a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables") Reported-by: Steven Rostedt <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
Coly Li [Fri, 2 Oct 2020 01:38:52 +0000 (09:38 +0800)]
mmc: core: don't set limits.discard_granularity as 0
In mmc_queue_setup_discard() the mmc driver queue's discard_granularity
might be set as 0 (when card->pref_erase > max_discard) while the mmc
device still declares to support discard operation. This is buggy and
triggered the following kernel warning message,
This patch fixes the issue by setting discard_granularity as SECTOR_SIZE
instead of 0 when (card->pref_erase > max_discard) is true. Now no more
complain from __blkdev_issue_discard() for the improper value of discard
granularity.
This issue is exposed after commit b35fd7422c2f ("block: check queue's
limits.discard_granularity in __blkdev_issue_discard()"), a "Fixes:" tag
is also added for the commit to make sure people won't miss this patch
after applying the change of __blkdev_issue_discard().
Kajol Jain [Thu, 27 Aug 2020 06:47:32 +0000 (12:17 +0530)]
perf: Fix task_function_call() error handling
The error handling introduced by commit:
2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
looses any return value from smp_call_function_single() that is not
{0, -EINVAL}. This is a problem because it will return -EXNIO when the
target CPU is offline. Worse, in that case it'll turn into an infinite
loop.
This is because cache_size_mutex was unlocked too early in resize_stripes,
which races with grow_one_stripe() that grow_one_stripe() allocates a
stripe with wrong pool_size.
Fix this issue by unlocking cache_size_mutex after updating pool_size.
The minus 1 is wrong, this branch should report 2048 bits of space.
With "-1" action, this only report 1024 bit of space.
This bug code returns wrong blocks, but it doesn't inflence bitmap logic:
1. Most callers focus this function return value (the counter of offset),
not the parameter blocks.
2. The bug is only triggered when hijacked is true or map is NULL.
the hijacked true condition is very rare.
the "map == null" only true when array is creating or resizing.
3. Even the caller gets wrong blocks, current code makes caller just to
call md_bitmap_get_counter() one more time.
The patched code is used to get chunks number, should use round-up div
to replace current sector_div. The same code is in md_bitmap_resize():
```
chunks = DIV_ROUND_UP_SECTOR_T(blocks, 1 << chunkshift);
```
Jens Axboe [Fri, 9 Oct 2020 01:09:46 +0000 (19:09 -0600)]
io_uring: fix break condition for __io_uring_register() waiting
Colin reports that there's unreachable code, since we only ever break
if ret == 0. This is correct, and is due to a reversed logic condition
in when to break or not.
Break out of the loop if we don't process any task work, in that case
we do want to return -EINTR.
Fixes: af9c1a44f8de ("io_uring: process task work in io_uring_register()") Reported-by: Colin Ian King <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
- Fix regression with IBM partitions on non-dasd devices (Christoph)
- Fix a missing clear in the compat CDROM packet structure (Peilin)"
* tag 'block5.9-2020-10-08' of git://git.kernel.dk/linux-block:
partitions/ibm: fix non-DASD devices
nvme-core: put ctrl ref when module ref get fail
block/scsi-ioctl: Fix kernel-infoleak in scsi_put_cdrom_generic_arg()
power: supply: sbs-battery: chromebook workaround for PEC
Looks like the I2C tunnel implementation from Chromebook's
embedded controller does not handle PEC correctly. Fix this
by disabling PEC for batteries behind those I2C tunnels as
a workaround.
Note, that some Chromebooks actually have been reported to
have working PEC support (with I2C tunnel). Since the problem
has not yet been fully understood this simply reverts all
Chromebooks to not use PEC for now.
Serge Semin [Wed, 7 Oct 2020 23:55:09 +0000 (02:55 +0300)]
spi: dw: Add Baikal-T1 SPI Controller bindings
These controllers are based on the DW APB SSI IP-core and embedded into
the SoC, so two of them are equipped with IRQ, DMA, 64 words FIFOs and 4
native CS, while another one as being utilized by the Baikal-T1 System
Boot Controller has got a very limited resources: no IRQ, no DMA, only a
single native chip-select and just 8 bytes Tx/Rx FIFOs available. That's
why we have to mark the IRQ to be optional for the later interface.
The SPI controller embedded into the Baikal-T1 System Boot Controller can
be also used to directly access an external SPI flash by means of a
dedicated FSM. The corresponding MMIO region availability is switchable by
the embedded multiplexor, which phandle can be specified in the dts node.
* We added a new example to test out the non-standard Baikal-T1 System
Boot SPI Controller DT binding.
Serge Semin [Wed, 7 Oct 2020 23:55:10 +0000 (02:55 +0300)]
spi: dw: Add Baikal-T1 SPI Controller glue driver
Baikal-T1 is equipped with three DW APB SSI-based MMIO SPI controllers.
Two of them are pretty much normal: with IRQ, DMA, FIFOs of 64 words
depth, 4x CSs, but the third one as being a part of the Baikal-T1 System
Boot Controller has got a very limited resources: no IRQ, no DMA, only a
single native chip-select and Tx/Rx FIFO with just 8 words depth
available. In order to provide a transparent initial boot code execution
the Boot SPI controller is also utilized by an vendor-specific IP-block,
which exposes an SPI flash direct mapping interface. Since both direct
mapping and SPI controller normal utilization are mutual exclusive only
one of these interfaces can be used to access an external SPI slave
device. That's why a dedicated mux is embedded into the System Boot
Controller. All of that is taken into account in the Baikal-T1-specific DW
APB SSI glue driver implemented by means of the DW SPI core module.
Serge Semin [Wed, 7 Oct 2020 23:55:08 +0000 (02:55 +0300)]
spi: dw: Add poll-based SPI transfers support
A functionality of the poll-based transfer has been removed by
commit 1ceb09717e98 ("spi: dw: remove cs_control and poll_mode members
from chip_data") with a justification that "there is no user of one
anymore". It turns out one of our DW APB SSI core is synthesized with no
IRQ line attached and the only possible way of using it is to implement a
poll-based SPI transfer procedure. So we have to get the removed
functionality back, but with some alterations described below.
First of all the poll-based transfer is activated only if the DW SPI
controller doesn't have an IRQ line attached and the Linux IRQ number is
initialized with the IRQ_NOTCONNECTED value. Secondly the transfer
procedure is now executed with a delay performed between writer and reader
methods. The delay value is calculated based on the number of data words
expected to be received on the current iteration. Finally the errors
status is checked on each iteration.
Serge Semin [Wed, 7 Oct 2020 23:55:07 +0000 (02:55 +0300)]
spi: dw: Introduce max mem-ops SPI bus frequency setting
In some circumstances the current implementation of the SPI memory
operations may occasionally fail even though they are executed in the
atomic context. This may happen if the system bus is relatively slow in
comparison to the SPI bus frequency, or there is a concurrent access to
it, which makes the MMIO-operations occasionally stalling before
push-pulling data from the DW APB SPI FIFOs. These two problems we've
discovered on the Baikal-T1 SoC. In order to fix them we have no choice
but to set an artificial limitation on the SPI bus speed.
Note currently this limitation will be only applicable for the memory
operations, since the standard SPI core interface is implemented with an
assumption that there is no problem with the automatic CS toggling.
Serge Semin [Wed, 7 Oct 2020 23:55:06 +0000 (02:55 +0300)]
spi: dw: Add memory operations support
Aside from the synchronous Tx-Rx mode, which has been utilized to create
the normal SPI transfers in the framework of the DW SSI driver, DW SPI
controller supports Tx-only and EEPROM-read modes. The former one just
enables the controller to transmit all the data from the Tx FIFO ignoring
anything retrieved from the MISO lane. The later mode is so called
write-then-read operation: DW SPI controller first pushes out all the data
from the Tx FIFO, after that it'll automatically receive as much data as
has been specified by means of the CTRLR1 register. Both of those modes
can be used to implement the memory operations supported by the SPI-memory
subsystem.
The memory operation implementation is pretty much straightforward, except
a few peculiarities we have had to take into account to make things
working. Since DW SPI controller doesn't provide a way to directly set and
clear the native CS lane level, but instead automatically de-asserts it
when a transfer going on, we have to make sure the Tx FIFO isn't empty
during entire Tx procedure. In addition we also need to read data from the
Rx FIFO as fast as possible to prevent it' overflow with automatically
fetched incoming traffic. The denoted peculiarities get to cause even more
problems if DW SSI controller is equipped with relatively small FIFO and
is connected to a relatively slow system bus (APB) (with respect to the
SPI bus speed). In order to workaround the problems for as much as it's
possible, the memory operation execution procedure collects all the Tx
data into a single buffer and disables the local IRQs to speed the
write-then-optionally-read method up.
Note the provided memory operations are utilized by default only if
a glue driver hasn't provided a custom version of ones and this is not
a DW APB SSI controller with fixed automatic CS toggle functionality.
Serge Semin [Wed, 7 Oct 2020 23:55:05 +0000 (02:55 +0300)]
spi: dw: Add generic DW SSI status-check method
The DW SSI errors handling method can be generically implemented for all
types of the transfers: IRQ, DMA and poll-based ones. It will be a
function which checks the overflow/underflow error flags and resets the
controller if any of them is set. In the framework of this commit we make
use of the new method to detect the errors in the IRQ- and DMA-based SPI
transfer execution procedures.