Parav Pandit [Mon, 2 Nov 2020 10:41:28 +0000 (12:41 +0200)]
net/mlx5: E-switch, Avoid extack error log for disabled vport
When E-switch vport is disabled, querying its hardware address is
unsupported.
Avoid setting extack error log message in such case.
Fixes: f099fde16db3 ("net/mlx5: E-switch, Support querying port function mac address") Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Maor Gottlieb [Wed, 21 Oct 2020 05:42:49 +0000 (08:42 +0300)]
net/mlx5: Fix deletion of duplicate rules
When a rule is duplicated, the refcount of the rule is increased so only
the second deletion of the rule should cause destruction of the FTE.
Currently, the FTE will be destroyed in the first deletion of rule since
the modify_mask will be 0.
Fix it and call to destroy FTE only if all the rules (FTE's children)
have been removed.
Fixes: 718ce4d601db ("net/mlx5: Consolidate update FTE for all removal changes") Signed-off-by: Maor Gottlieb <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
async_icosq_lock may be taken from softirq and non-softirq contexts. It
requires protection with spin_lock_bh, otherwise a softirq may be
triggered in the middle of the critical section, and it may deadlock if
it tries to take the same lock. This patch fixes such a scenario by
using spin_lock_bh to disable softirqs on that CPU while inside the
critical section.
Fixes: 8d94b590f1e4 ("net/mlx5e: Turn XSK ICOSQ into a general asynchronous one") Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Vlad Buslov [Mon, 31 Aug 2020 13:17:29 +0000 (16:17 +0300)]
net/mlx5e: Protect encap route dev from concurrent release
In functions mlx5e_route_lookup_ipv{4|6}() route_dev can be arbitrary net
device and not necessary mlx5 eswitch port representor. As such, in order
to ensure that route_dev is not destroyed concurrent the code needs either
explicitly take reference to the device before releasing reference to
rtable instance or ensure that caller holds rtnl lock. First approach is
chosen as a fix since rtnl lock dependency was intentionally removed from
mlx5 TC layer.
To prevent unprotected usage of route_dev in encap code take a reference to
the device before releasing rt. Don't save direct pointer to the device in
mlx5_encap_entry structure and use ifindex instead. Modify users of
route_dev pointer to properly obtain the net device instance from its
ifindex.
Fixes: 61086f391044 ("net/mlx5e: Protect encap hash table with mutex") Fixes: 6707f74be862 ("net/mlx5e: Update hw flows when encap source mac changed") Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Modify header actions are allocated during parse tc actions and only
freed during the flow creation, however, on error flow the allocated
memory is wrongly unfreed.
Fix this by calling dealloc_mod_hdr_actions in __mlx5e_add_fdb_flow
and mlx5e_add_nic_flow error flow.
Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions") Fixes: 2f4fe4cab073 ("net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions") Signed-off-by: Maor Dickman <[email protected]> Reviewed-by: Paul Blakey <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Linus Torvalds [Thu, 5 Nov 2020 19:52:17 +0000 (11:52 -0800)]
Merge tag 'linux-kselftest-kunit-fixes-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kunit fixes from Shuah Khan:
"Several kunit_tool and documentation fixes"
* tag 'linux-kselftest-kunit-fixes-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: tools: fix kunit_tool tests for parsing test plans
Documentation: kunit: Update Kconfig parts for KUNIT's module support
kunit: test: fix remaining kernel-doc warnings
kunit: Don't fail test suites if one of them is empty
kunit: Fix kunit.py --raw_output option
Linus Torvalds [Thu, 5 Nov 2020 19:41:38 +0000 (11:41 -0800)]
Merge tag 'trace-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix off-by-one error in retrieving the context buffer for
trace_printk()
- Fix off-by-one error in stack nesting limit
- Fix recursion to not make all NMI code false positive as recursing
- Stop losing events in function tracing when transitioning between irq
context
- Stop losing events in ring buffer when transitioning between irq
context
- Fix return code of error pointer in parse_synth_field() to prevent
NULL pointer dereference.
- Fix false positive of NMI recursion in kprobe event handling
* tag 'trace-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
kprobes: Tell lockdep about kprobe nesting
tracing: Make -ENOMEM the default error for parse_synth_field()
ring-buffer: Fix recursion protection transitions between interrupt context
tracing: Fix the checking of stackidx in __ftrace_trace_stack
ftrace: Handle tracing when switching between context
ftrace: Fix recursion check for NMI test
tracing: Fix out of bounds write in get_trace_buf
Linus Torvalds [Thu, 5 Nov 2020 19:32:03 +0000 (11:32 -0800)]
Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- clarify a comment (Michael Kelley)
- change a pr_warn() to pr_info() (Olaf Hering)
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Clarify comment on x2apic mode
hv_balloon: disable warning when floor reached
Linus Torvalds [Thu, 5 Nov 2020 19:25:02 +0000 (11:25 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"A few more merge window regressions that didn't make rc1:
- New validation in the DMA layer triggers wrong use of the DMA layer
in rxe, siw and rdmavt
- Accidental change of a hypervisor facing ABI when widening the port
speed u8 to u16 in vmw_pvrdma
- Memory leak on error unwind in SRP target"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/srpt: Fix typo in srpt_unregister_mad_agent docstring
RDMA/vmw_pvrdma: Fix the active_speed and phys_state value
IB/srpt: Fix memory leak in srpt_add_one
RDMA: Fix software RDMA drivers for dma mapping error
Linus Torvalds [Thu, 5 Nov 2020 19:16:34 +0000 (11:16 -0800)]
Merge tag 'spi-fix-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A small collection of driver specific fixes that have come in since
the merge window, nothing too major here but all good to have"
* tag 'spi-fix-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fsl-dspi: fix wrong pointer in suspend/resume
spi: bcm2835: fix gpio cs level inversion
spi: imx: fix runtime pm support for !CONFIG_PM
Linus Torvalds [Thu, 5 Nov 2020 19:11:40 +0000 (11:11 -0800)]
Merge tag 'regulator-fix-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"An addition to MAINTAINERS plus a fix for a nasty bootstrapping
problem which caused problems when we need to read the voltage of a
regulator that is not yet available during initialization, we were not
correctly distinguishing between this case and the case where a
regulator is put into a bypass mode"
* tag 'regulator-fix-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: defer probe when trying to get voltage from unresolved supply
MAINTAINERS: Add entry for Qualcomm IPQ4019 VQMMC regulator
Linus Torvalds [Thu, 5 Nov 2020 19:04:29 +0000 (11:04 -0800)]
Merge tag 'pm-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix the device links support in runtime PM, correct mistakes in
the cpuidle documentation, fix the handling of policy limits changes
in the schedutil cpufreq governor, fix assorted issues in the OPP
(operating performance points) framework and make one janitorial
change.
Specifics:
- Unify the handling of managed and stateless device links in the
runtime PM framework and prevent runtime PM references to devices
from being leaked after device link removal (Rafael Wysocki).
- Fix two mistakes in the cpuidle documentation (Julia Lawall).
- Prevent the schedutil cpufreq governor from missing policy limits
updates in some cases (Viresh Kumar).
- Prevent static OPPs from being dropped by mistake (Viresh Kumar).
- Prevent helper function in the OPP framework from returning
prematurely (Viresh Kumar).
- Prevent opp_table_lock from being held too long during removal of
OPP tables with no more active references (Viresh Kumar).
- Drop redundant semicolon from the Intel RAPL power capping driver
(Tom Rix)"
* tag 'pm-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: runtime: Resume the device earlier in __device_release_driver()
PM: runtime: Drop pm_runtime_clean_up_links()
PM: runtime: Drop runtime PM references to supplier on link removal
powercap/intel_rapl: remove unneeded semicolon
Documentation: PM: cpuidle: correct path name
Documentation: PM: cpuidle: correct typo
cpufreq: schedutil: Don't skip freq update if need_freq_update is set
opp: Reduce the size of critical section in _opp_table_kref_release()
opp: Fix early exit from dev_pm_opp_register_set_opp_helper()
opp: Don't always remove static OPPs in _of_add_opp_table_v1()
Linus Torvalds [Thu, 5 Nov 2020 18:57:01 +0000 (10:57 -0800)]
Merge tag 'fixes-2020-11-05' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull highmem initialization fix from Mike Rapoport:
"Fix highmem initialization on arm and xtensa
Recent refactoring of memblock iterators has broken initialization of
highmem on arm and xtensa because it changed the way beginning and end
of memory regions are rounded to PFNs. This fix restores the original
behaviour"
* tag 'fixes-2020-11-05' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
ARM, xtensa: highmem: avoid clobbering non-page aligned memory reservations
Linus Torvalds [Thu, 5 Nov 2020 18:51:51 +0000 (10:51 -0800)]
Merge tag 'gfs2-v5.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
"Various gfs2 fixes"
* tag 'gfs2-v5.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Wake up when sd_glock_disposal becomes zero
gfs2: Don't call cancel_delayed_work_sync from within delete work function
gfs2: check for live vs. read-only file system in gfs2_fitrim
gfs2: don't initialize statfs_change inodes in spectator mode
gfs2: Split up gfs2_meta_sync into inode and rgrp versions
gfs2: init_journal's undo directive should also undo the statfs inodes
gfs2: Add missing truncate_inode_pages_final for sd_aspace
gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free
Linus Torvalds [Thu, 5 Nov 2020 18:41:14 +0000 (10:41 -0800)]
Merge tag 'pci-v5.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Fix ACS regression that broke device pass-through (Rajat Jain)
- Revert DesignWare ATU memory resource to use last entry to fix
Tegra194 regression (Rob Herring)
- Remove duplicate mvebu resource requests to fix regression on Turris
Omnia (Rob Herring)
* tag 'pci-v5.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: mvebu: Fix duplicate resource requests
PCI: dwc: Restore ATU memory resource setup to use last entry
PCI: Always enable ACS even if no ACS Capability
Atish Patra [Wed, 7 Oct 2020 21:51:59 +0000 (14:51 -0700)]
RISC-V: Remove any memblock representing unusable memory area
RISC-V limits the physical memory size by -PAGE_OFFSET. Any memory beyond
that size from DRAM start is unusable. Just remove any memblock pointing
to those memory region without worrying about computing the maximum size.
Jens Axboe [Thu, 5 Nov 2020 16:50:16 +0000 (09:50 -0700)]
io_uring: use correct pointer for io_uring_show_cred()
Previous commit changed how we index the registered credentials, but
neglected to update one spot that is used when the personalities are
iterated through ->show_fdinfo(). Ensure we use the right struct type
for the iteration.
Pavel Begunkov [Thu, 5 Nov 2020 14:06:19 +0000 (14:06 +0000)]
io_uring: don't forget to task-cancel drained reqs
If there is a long-standing request of one task locking up execution of
deferred requests, and the defer list contains requests of another task
(all files-less), then a potential execution of __io_uring_task_cancel()
by that another task will sleep until that first long-standing request
completion, and that may take long.
E.g.
tsk1: req1/read(empty_pipe) -> tsk2: req(DRAIN)
Then __io_uring_task_cancel(tsk2) waits for req1 completion.
It seems we even can manufacture a complicated case with many tasks
sharing many rings that can lock them forever.
Cancel deferred requests for __io_uring_task_cancel() as well.
Kent Gibson [Thu, 5 Nov 2020 10:40:49 +0000 (18:40 +0800)]
gpiolib: fix sysfs when cdev is not selected
In gpiochip_setup_dev() the call to gpiolib_cdev_register() indirectly
calls device_add(). This is still required for the sysfs even when
CONFIG_GPIO_CDEV is not selected in the build.
Replace the stubbed functions in gpiolib-cdev.h with macros in gpiolib.c
that perform the required device_add() and device_del() when
CONFIG_GPIO_CDEV is not selected.
Billy Tsai [Fri, 30 Oct 2020 05:54:50 +0000 (13:54 +0800)]
pinctrl: aspeed: Fix GPI only function problem.
Some gpio pin at aspeed soc is input only and the prefix name of these
pin is "GPI" only.
This patch fine-tune the condition of GPIO check from "GPIO" to "GPI"
and it will fix the usage error of banks D and E in the AST2400/AST2500
and banks T and U in the AST2600.
Jens Axboe [Thu, 5 Nov 2020 14:10:50 +0000 (07:10 -0700)]
Merge tag 'nvme-5.10-2020-11-05' of git://git.infradead.org/nvme into block-5.10
Pull NVMe fixes from Christoph:
"nvme fixes for 5.10:
- revert a nvme_queue size optimization (Keith Bush)
- fabrics timeout races fixes (Chao Leng and Sagi Grimberg)"
* tag 'nvme-5.10-2020-11-05' of git://git.infradead.org/nvme:
nvme-tcp: avoid repeated request completion
nvme-rdma: avoid repeated request completion
nvme-tcp: avoid race between time out and tear down
nvme-rdma: avoid race between time out and tear down
nvme: introduce nvme_sync_io_queues
Revert "nvme-pci: remove last_sq_tail"
Christophe Leroy [Mon, 12 Oct 2020 08:54:33 +0000 (08:54 +0000)]
powerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entry
When _PAGE_ACCESSED is not set, a minor fault is expected.
To do this, TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED
into the L2 entry valid bit.
To simplify the processing and reduce the number of instructions in
TLB miss exceptions, manage it as an APG bit and get it next to
_PAGE_GUARDED bit to allow a copy in one go. Then declare the
corresponding groups as handling all accesses as user accesses.
As the PP bits always define user as No Access, it will generate
a fault.
* pm-opp:
opp: Reduce the size of critical section in _opp_table_kref_release()
opp: Fix early exit from dev_pm_opp_register_set_opp_helper()
opp: Don't always remove static OPPs in _of_add_opp_table_v1()
Anand Jain [Thu, 29 Oct 2020 22:53:56 +0000 (06:53 +0800)]
btrfs: dev-replace: fail mount if we don't have replace item with target device
If there is a device BTRFS_DEV_REPLACE_DEVID without the device replace
item, then it means the filesystem is inconsistent state. This is either
corruption or a crafted image. Fail the mount as this needs a closer
look what is actually wrong.
As of now if BTRFS_DEV_REPLACE_DEVID is present without the replace
item, in __btrfs_free_extra_devids() we determine that there is an
extra device, and free those extra devices but continue to mount the
device.
However, we were wrong in keeping tack of the rw_devices so the syzbot
testcase failed:
David Sterba [Thu, 9 Jul 2020 09:25:40 +0000 (11:25 +0200)]
btrfs: scrub: update message regarding read-only status
Based on user feedback update the message printed when scrub fails to
start due to write requirements. To make a distinction add a device id
to the messages.
Dan Carpenter [Fri, 23 Oct 2020 11:26:33 +0000 (14:26 +0300)]
btrfs: clean up NULL checks in qgroup_unreserve_range()
Smatch complains that this code dereferences "entry" before checking
whether it's NULL on the next line. Fortunately, rb_entry() will never
return NULL so it doesn't cause a problem. We can clean up the NULL
checking a bit to silence the warning and make the code more clear.
Josef Bacik [Mon, 26 Oct 2020 20:57:27 +0000 (16:57 -0400)]
btrfs: fix min reserved size calculation in merge_reloc_root
The minimum reserve size was adjusted to take into account the height of
the tree we are merging, however we can have a root with a level == 0.
What we want is root_level + 1 to get the number of nodes we may have to
cow. This fixes the enospc_debug warning pops with btrfs/101.
Nikolay: this fixes failures on btrfs/060 btrfs/062 btrfs/063 and
btrfs/195 That I was seeing, the call trace was:
btrfs: fix potential overflow in cluster_pages_for_defrag on 32bit arch
On 32-bit systems, this shift will overflow for files larger than 4GB as
start_index is unsigned long while the calls to btrfs_delalloc_*_space
expect u64.
CC: [email protected] # 4.4+ Fixes: df480633b891 ("btrfs: extent-tree: Switch to new delalloc space reserve and release") Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: David Sterba <[email protected]>
[ define the variable instead of repeating the shift ] Signed-off-by: David Sterba <[email protected]>
Casey Bowman [Wed, 7 Oct 2020 23:13:07 +0000 (16:13 -0700)]
thunderbolt: Add uaccess dependency to debugfs interface
Some calls in the debugfs interface are made to the linux/uaccess.h header,
but the header is not referenced. So, for x86_64 architectures, this
dependency seems to be pulled in elsewhere, which leads to a successful
compilation. However, on arm/arm64 architectures, it was found to error out
on implicit declarations.
This change fixes the implicit declaration error by adding the
linux/uaccess.h header.
Mika Westerberg [Wed, 7 Oct 2020 14:06:17 +0000 (17:06 +0300)]
thunderbolt: Fix memory leak if ida_simple_get() fails in enumerate_services()
The svc->key field is not released as it should be if ida_simple_get()
fails so fix that.
Fixes: 9aabb68568b4 ("thunderbolt: Fix to check return value of ida_simple_get") Cc: [email protected] Signed-off-by: Mika Westerberg <[email protected]>
Andy Shevchenko [Fri, 9 Oct 2020 18:08:55 +0000 (21:08 +0300)]
pinctrl: mcp23s08: Use full chunk of memory for regmap configuration
It appears that simplification of mcp23s08_spi_regmap_init() made
a regression due to wrong size calculation for dev_kmemdup() call.
It misses the fact that config variable is already a pointer, thus
the sizeof() calculation is wrong and only 4 or 8 bytes were copied.
Fix the parameters to devm_kmemdup() to copy a full chunk of memory.
Can Guo [Tue, 3 Nov 2020 06:24:40 +0000 (22:24 -0800)]
scsi: ufs: Try to save power mode change and UIC cmd completion timeout
Use the uic_cmd->cmd_active as a flag to track the lifecycle of an UIC cmd.
The flag is set before sending the UIC cmd and cleared in IRQ handler. When
a PMC or UIC cmd completion timeout happens, if the flag is not set,
instead of returning timeout error, we still treat it as a successful
operation. This is to deal with the scenario in which completion has been
raised but the one waiting for the completion cannot be awaken in time due
to kernel scheduling problem.
Can Guo [Tue, 3 Nov 2020 06:24:39 +0000 (22:24 -0800)]
scsi: ufs: Fix unbalanced scsi_block_reqs_cnt caused by ufshcd_hold()
The scsi_block_reqs_cnt increased in ufshcd_hold() is supposed to be
decreased back in ufshcd_ungate_work() in a paired way. However, if
specific ufshcd_hold/release sequences are met, it is possible that
scsi_block_reqs_cnt is increased twice but only one ungate work is
queued. To make sure scsi_block_reqs_cnt is handled by ufshcd_hold() and
ufshcd_ungate_work() in a paired way, increase it only if queue_work()
returns true.
Heiner Kallweit [Tue, 3 Nov 2020 17:52:18 +0000 (18:52 +0100)]
r8169: work around short packet hw bug on RTL8125
Network problems with RTL8125B have been reported [0] and with help
from Realtek it turned out that this chip version has a hw problem
with short packets (similar to RTL8168evl). Having said that activate
the same workaround as for RTL8168evl.
Realtek suggested to activate the workaround for RTL8125A too, even
though they're not 100% sure yet which RTL8125 versions are affected.
Peng Fan [Sun, 1 Nov 2020 11:23:54 +0000 (19:23 +0800)]
clk: imx8m: fix bus critical clk registration
noc/axi/ahb are bus clk, not peripheral clk.
Since peripheral clk has a limitation that for peripheral clock slice,
IP clock slices must be stopped to change the clock source.
However if the bus clk is marked as critical clk peripheral, the
assigned clock parent operation will fail.
So we added CLK_SET_PARENT_GATE flag to avoid glitch.
And add imx8m_clk_hw_composite_bus_critical for bus critical clock usage
Notice there are no stores other than to the stack. There should be a
stw in there for the store to current->set_child_tid.
This is only seen with GCC 4.9 era compilers (tested with 4.9.3 and
4.9.4), and only when CONFIG_PPC_KUAP is disabled.
When CONFIG_PPC_KUAP=y, the inline asm that's part of the isync()
and mtspr() inlined via allow_user_access() seems to be enough to
avoid the bug.
We already have a macro to work around this (or a similar bug), called
asm_volatile_goto which includes an empty asm block to tickle the
compiler into generating the right code. So use that.
With this applied the code generation looks more like it will work:
Magnus Karlsson [Tue, 3 Nov 2020 09:41:30 +0000 (10:41 +0100)]
libbpf: Fix possible use after free in xsk_socket__delete
Fix a possible use after free in xsk_socket__delete that will happen
if xsk_put_ctx() frees the ctx. To fix, save the umem reference taken
from the context and just use that instead.
Jeff Layton [Mon, 12 Oct 2020 13:39:06 +0000 (09:39 -0400)]
ceph: check session state after bumping session->s_seq
Some messages sent by the MDS entail a session sequence number
increment, and the MDS will drop certain types of requests on the floor
when the sequence numbers don't match.
In particular, a REQUEST_CLOSE message can cross with one of the
sequence morphing messages from the MDS which can cause the client to
stall, waiting for a response that will never come.
Originally, this meant an up to 5s delay before the recurring workqueue
job kicked in and resent the request, but a recent change made it so
that the client would never resend, causing a 60s stall unmounting and
sometimes a blockisting event.
Add a new helper for incrementing the session sequence and then testing
to see whether a REQUEST_CLOSE needs to be resent, and move the handling
of CEPH_MDS_SESSION_CLOSING into that function. Change all of the
bare sequence counter increments to use the new helper.
Reorganize check_session_state with a switch statement. It should no
longer be called when the session is CLOSING, so throw a warning if it
ever is (but still handle that case sanely).
Rob Herring [Fri, 23 Oct 2020 14:52:52 +0000 (09:52 -0500)]
PCI: mvebu: Fix duplicate resource requests
With commit 669cbc708122 ("PCI: Move DT resource setup into
devm_pci_alloc_host_bridge()"), the DT 'ranges' is parsed and populated
into resources when the host bridge is allocated. The resources are
requested as well, but that happens a second time for the mvebu driver in
mvebu_pcie_parse_request_resources(). We should only be requesting the
additional resources added in mvebu_pcie_parse_request_resources(). These
are not added by default because they use custom properties rather than
standard DT address translation.
Also, the bus ranges was also populated by default, so we can remove it
from mvebu_pci_host_probe().
Rob Herring [Mon, 26 Oct 2020 15:48:52 +0000 (10:48 -0500)]
PCI: dwc: Restore ATU memory resource setup to use last entry
Prior to commit 0f71c60ffd26 ("PCI: dwc: Remove storing of PCI resources"),
the DWC driver was setting up the last memory resource rather than the
first memory resource. This doesn't matter for most platforms which only
have 1 memory resource, but it broke Tegra194 which has a 2nd
(prefetchable) memory region that requires an ATU entry. The first region
on Tegra194 relies on the default 1:1 pass-thru of outbound transactions
and doesn't need an ATU entry.
Jakub Kicinski [Wed, 4 Nov 2020 18:36:37 +0000 (10:36 -0800)]
Merge tag 'linux-can-fixes-for-5.10-20201103' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2020-11-03
The first two patches are by Oleksij Rempel and they add a generic
can-controller Device Tree yaml binding and convert the text based binding
of the flexcan driver to a yaml based binding.
Zhang Changzhong's patch fixes a remove_proc_entry warning in the AF_CAN
core.
A patch by me fixes a kfree_skb() call from IRQ context in the rx-offload
helper.
Vincent Mailhol contributes a patch to prevent a call to kfree_skb() in
hard IRQ context in can_get_echo_skb().
Oliver Hartkopp's patch fixes the length calculation for RTR CAN frames
in the __can_get_echo_skb() helper.
Oleksij Rempel's patch fixes a use-after-free that shows up with j1939 in
can_create_echo_skb().
Yegor Yefremov contributes 4 patches to enhance the j1939 documentation.
Zhang Changzhong's patch fixes a hanging task problem in j1939_sk_bind()
if the netdev is down.
Then there are three patches for the newly added CAN_ISOTP protocol. Geert
Uytterhoeven enhances the kconfig help text. Oliver Hartkopp's patch adds
missing RX timeout handling in listen-only mode and Colin Ian King's patch
decreases the generated object code by 926 bytes.
Zhang Changzhong contributes a patch for the ti_hecc driver that fixes the
error path in the probe function.
Navid Emamdoost's patch for the xilinx_can driver fixes the error handling
in case of failing pm_runtime_get_sync().
There are two patches for the peak_usb driver. Dan Carpenter adds range
checking in decode operations and Stephane Grosjean's patch fixes
a timestamp wrapping problem.
Stephane Grosjean's patch for th peak_canfd driver fixes echo management if
loopback is on.
The next three patches all target the mcp251xfd driver. The first one is
by me and it increased the severity of CRC read error messages. The kernel
test robot removes an unneeded semicolon and Tom Rix removes unneeded
break in several switch-cases.
The last 4 patches are by Joakim Zhang and target the flexcan driver,
the first three fix ECC related device specific quirks for the LS1021A,
LX2160A and the VF610 SoC. The last patch disable wakeup completely upon
driver remove.
* tag 'linux-can-fixes-for-5.10-20201103' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: (27 commits)
can: flexcan: flexcan_remove(): disable wakeup completely
can: flexcan: add ECC initialization for VF610
can: flexcan: add ECC initialization for LX2160A
can: flexcan: remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
can: mcp251xfd: remove unneeded break
can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings
can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages
can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is on
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
can: peak_usb: add range checking in decode operations
can: xilinx_can: handle failure cases of pm_runtime_get_sync
can: ti_hecc: ti_hecc_probe(): add missed clk_disable_unprepare() in error path
can: isotp: padlen(): make const array static, makes object smaller
can: isotp: isotp_rcv_cf(): enable RX timeout handling in listen-only mode
can: isotp: Explain PDU in CAN_ISOTP help text
can: j1939: j1939_sk_bind(): return failure if netdev is down
can: j1939: use backquotes for code samples
can: j1939: swap addr and pgn in the send example
can: j1939: fix syntax and spelling
can: j1939: rename jacd tool
...
====================
Zhao Qiang [Tue, 3 Nov 2020 02:05:46 +0000 (10:05 +0800)]
spi: fsl-dspi: fix wrong pointer in suspend/resume
Since commit 530b5affc675 ("spi: fsl-dspi: fix use-after-free in
remove path"), this driver causes a "NULL pointer dereference"
in dspi_suspend/resume.
This is because since this commit, the drivers private data point to
"dspi" instead of "ctlr", the codes in suspend and resume func were
not modified correspondly.
Pavel Begunkov [Wed, 4 Nov 2020 13:39:31 +0000 (13:39 +0000)]
io_uring: fix overflowed cancel w/ linked ->files
Current io_match_files() check in io_cqring_overflow_flush() is useless
because requests drop ->files before going to the overflow list, however
linked to it request do not, and we don't check them.
Jens Axboe [Tue, 3 Nov 2020 19:19:07 +0000 (12:19 -0700)]
io_uring: drop req/tctx io_identity separately
We can't bundle this into one operation, as the identity may not have
originated from the tctx to begin with. Drop one ref for each of them
separately, if they don't match the static assignment. If we don't, then
if the identity is a lookup from registered credentials, we could be
freeing that identity as we're dropping a reference assuming it came from
the tctx. syzbot reports this as a use-after-free, as the identity is
still referencable from idr lookup:
==================================================================
BUG: KASAN: use-after-free in instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
BUG: KASAN: use-after-free in atomic_fetch_add_relaxed include/asm-generic/atomic-instrumented.h:142 [inline]
BUG: KASAN: use-after-free in __refcount_add include/linux/refcount.h:193 [inline]
BUG: KASAN: use-after-free in __refcount_inc include/linux/refcount.h:250 [inline]
BUG: KASAN: use-after-free in refcount_inc include/linux/refcount.h:267 [inline]
BUG: KASAN: use-after-free in io_init_req fs/io_uring.c:6700 [inline]
BUG: KASAN: use-after-free in io_submit_sqes+0x15a9/0x25f0 fs/io_uring.c:6774
Write of size 4 at addr ffff888011e08e48 by task syz-executor165/8487
Memory state around the buggy address: ffff888011e08d00: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc ffff888011e08d80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
> ffff888011e08e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
^ ffff888011e08e80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc ffff888011e08f00: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
==================================================================
Track if a given task io_uring context contains SQPOLL instances, so we
can iterate those for cancelation (and request counts). This ensures that
we properly wait on SQPOLL contexts, and find everything that needs
canceling.
Jens Axboe [Fri, 30 Oct 2020 15:36:41 +0000 (09:36 -0600)]
io-wq: cancel request if it's asking for files and we don't have them
This can't currently happen, but will be possible shortly. Handle missing
files just like we do not being able to grab a needed mm, and mark the
request as needing cancelation.
Thomas Gleixner [Wed, 4 Nov 2020 13:06:23 +0000 (14:06 +0100)]
entry: Fix the incorrect ordering of lockdep and RCU check
When an exception/interrupt hits kernel space and the kernel is not
currently in the idle task then RCU must be watching.
irqentry_enter() validates this via rcu_irq_enter_check_tick(), which in
turn invokes lockdep when taking a lock. But at that point lockdep does not
yet know about the fact that interrupts have been disabled by the CPU,
which triggers a lockdep splat complaining about inconsistent state.
Invoking trace_hardirqs_off() before rcu_irq_enter_check_tick() defeats the
point of rcu_irq_enter_check_tick() because trace_hardirqs_off() uses RCU.
So use the same sequence as for the idle case and tell lockdep about the
irq state change first, invoke the RCU check and then do the lockdep and
tracer update.
In commit 7588cbeec6df, we tried to fix a race stemming from the lack of
coordination between higher level code that wants to allocate and remap
CoW fork extents into the data fork. Christoph cites as examples the
always_cow mode, and a directio write completion racing with writeback.
According to the comments before the goto retry, we want to restart the
lookup to catch the extent in the data fork, but we don't actually reset
whichfork or cow_fsb, which means the second try executes using stale
information. Up until now I think we've gotten lucky that either
there's something left in the CoW fork to cause cow_fsb to be reset, or
either data/cow fork sequence numbers have advanced enough to force a
fresh lookup from the data fork. However, if we reach the retry with an
empty stable CoW fork and a stable data fork, neither of those things
happens. The retry foolishly re-calls xfs_convert_blocks on the CoW
fork which fails again. This time, we toss the write.
I've recently been working on extending reflink to the realtime device.
When the realtime extent size is larger than a single block, we have to
force the page cache to CoW the entire rt extent if a write (or
fallocate) are not aligned with the rt extent size. The strategy I've
chosen to deal with this is derived from Dave's blocksize > pagesize
series: dirtying around the write range, and ensuring that writeback
always starts mapping on an rt extent boundary. This has brought this
race front and center, since generic/522 blows up immediately.
However, I'm pretty sure this is a bug outright, independent of that.
Fixes: 7588cbeec6df ("xfs: retry COW fork delalloc conversion when no extent was found") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
Brian Foster [Thu, 29 Oct 2020 21:30:49 +0000 (14:30 -0700)]
iomap: clean up writeback state logic on writepage error
The iomap writepage error handling logic is a mash of old and
slightly broken XFS writepage logic. When keepwrite writeback state
tracking was introduced in XFS in commit 0d085a529b42 ("xfs: ensure
WB_SYNC_ALL writeback handles partial pages correctly"), XFS had an
additional cluster writeback context that scanned ahead of
->writepage() to process dirty pages over the current ->writepage()
extent mapping. This context expected a dirty page and required
retention of the TOWRITE tag on partial page processing so the
higher level writeback context would revisit the page (in contrast
to ->writepage(), which passes a page with the dirty bit already
cleared).
The cluster writeback mechanism was eventually removed and some of
the error handling logic folded into the primary writeback path in
commit 150d5be09ce4 ("xfs: remove xfs_cancel_ioend"). This patch
accidentally conflated the two contexts by using the keepwrite logic
in ->writepage() without accounting for the fact that the page is
not dirty. Further, the keepwrite logic has no practical effect on
the core ->writepage() caller (write_cache_pages()) because it never
revisits a page in the current function invocation.
Technically, the page should be redirtied for the keepwrite logic to
have any effect. Otherwise, write_cache_pages() may find the tagged
page but will skip it since it is clean. Even if the page was
redirtied, however, there is still no practical effect to keepwrite
since write_cache_pages() does not wrap around within a single
invocation of the function. Therefore, the dirty page would simply
end up retagged on the next writeback sequence over the associated
range.
All that being said, none of this really matters because redirtying
a partially processed page introduces a potential infinite redirty
-> writeback failure loop that deviates from the current design
principle of clearing the dirty state on writepage failure to avoid
building up too much dirty, unreclaimable memory on the system.
Therefore, drop the spurious keepwrite usage and dirty state
clearing logic from iomap_writepage_map(), treat the partially
processed page the same as a fully processed page, and let the
imminent ioend failure clean up the writeback state.
Brian Foster [Thu, 29 Oct 2020 21:30:48 +0000 (14:30 -0700)]
iomap: support partial page discard on writeback block mapping failure
iomap writeback mapping failure only calls into ->discard_page() if
the current page has not been added to the ioend. Accordingly, the
XFS callback assumes a full page discard and invalidation. This is
problematic for sub-page block size filesystems where some portion
of a page might have been mapped successfully before a failure to
map a delalloc block occurs. ->discard_page() is not called in that
error scenario and the bio is explicitly failed by iomap via the
error return from ->prepare_ioend(). As a result, the filesystem
leaks delalloc blocks and corrupts the filesystem block counters.
Since XFS is the only user of ->discard_page(), tweak the semantics
to invoke the callback unconditionally on mapping errors and provide
the file offset that failed to map. Update xfs_discard_page() to
discard the corresponding portion of the file and pass the range
along to iomap_invalidatepage(). The latter already properly handles
both full and sub-page scenarios by not changing any iomap or page
state on sub-page invalidations.
Brian Foster [Thu, 29 Oct 2020 21:30:48 +0000 (14:30 -0700)]
xfs: flush new eof page on truncate to avoid post-eof corruption
It is possible to expose non-zeroed post-EOF data in XFS if the new
EOF page is dirty, backed by an unwritten block and the truncate
happens to race with writeback. iomap_truncate_page() will not zero
the post-EOF portion of the page if the underlying block is
unwritten. The subsequent call to truncate_setsize() will, but
doesn't dirty the page. Therefore, if writeback happens to complete
after iomap_truncate_page() (so it still sees the unwritten block)
but before truncate_setsize(), the cached page becomes inconsistent
with the on-disk block. A mapped read after the associated page is
reclaimed or invalidated exposes non-zero post-EOF data.
For example, consider the following sequence when run on a kernel
modified to explicitly flush the new EOF page within the race
window:
$ xfs_io -fc "falloc 0 4k" -c fsync /mnt/file
$ xfs_io -c "pwrite 0 4k" -c "truncate 1k" /mnt/file
...
$ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file 00000400: 00 00 00 00 00 00 00 00 ........
$ umount /mnt/; mount <dev> /mnt/
$ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file 00000400: cd cd cd cd cd cd cd cd ........
Update xfs_setattr_size() to explicitly flush the new EOF page prior
to the page truncate to ensure iomap has the latest state of the
underlying block.
Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path") Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
Since the kprobe handlers have protection that prohibits other handlers from
executing in other contexts (like if an NMI comes in while processing a
kprobe, and executes the same kprobe, it will get fail with a "busy"
return). Lockdep is unaware of this protection. Use lockdep's nesting api to
differentiate between locks taken in INT3 context and other context to
suppress the false warnings.
Fangrui Song [Tue, 3 Nov 2020 01:23:58 +0000 (17:23 -0800)]
x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S
Commit
393f203f5fd5 ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions")
added .weak directives to arch/x86/lib/mem*_64.S instead of changing the
existing ENTRY macros to WEAK. This can lead to the assembly snippet
.weak memcpy
...
.globl memcpy
which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy
with LLVM's integrated assembler before LLVM 12. LLVM 12 (since
https://reviews.llvm.org/D90108) will error on such an overridden symbol
binding.
Commit
ef1e03152cb0 ("x86/asm: Make some functions local")
changed ENTRY in arch/x86/lib/memcpy_64.S to SYM_FUNC_START_LOCAL, which
was ineffective due to the preceding .weak directive.
The write-URB busy flag was being cleared before the completion handler
was done with the URB, something which could lead to corrupt transfers
due to a racing write request if the URB is resubmitted.
Fixes: 507ca9bc0476 ("[PATCH] USB: add ability for usb-serial drivers to determine if their write urb is currently being used.") Cc: stable <[email protected]> # 2.6.13 Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Johan Hovold <[email protected]>
free_highpages() iterates over the free memblock regions in high
memory, and marks each page as available for the memory management
system.
Until commit cddb5ddf2b76 ("arm, xtensa: simplify initialization of
high memory pages") it rounded beginning of each region upwards and end of
each region downwards.
However, after that commit free_highmem() rounds the beginning and end of
each region downwards, and we may end up freeing a page that is
memblock_reserve()d, resulting in memory corruption.
Restore the original rounding of the region boundaries to avoid freeing
reserved pages.
Arnd Bergmann [Mon, 26 Oct 2020 16:08:06 +0000 (17:08 +0100)]
habanalabs: fix kernel pointer type
All throughout the driver, normal kernel pointers are
stored as 'u64' struct members, which is kind of silly
and requires casting through a uintptr_t to void* every
time they are used.
There is one line that missed the intermediate uintptr_t
case, which leads to a compiler warning:
drivers/misc/habanalabs/common/command_buffer.c: In function 'hl_cb_mmap':
drivers/misc/habanalabs/common/command_buffer.c:512:44: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
512 | rc = hdev->asic_funcs->cb_mmap(hdev, vma, (void *) cb->kernel_address,
Rather than adding one more cast, just fix the type and
remove all the other casts.
Robert Hancock [Tue, 3 Nov 2020 19:33:15 +0000 (13:33 -0600)]
hwmon: (pmbus) Add mutex locking for sysfs reads
As part of commit a919ba06979a7 ("hwmon: (pmbus) Stop caching register
values"), the update of the sensor value is now triggered directly by the
sensor attribute value being read from sysfs. This created (or at least
made much more likely) a locking issue, since nothing protected the device
page selection from being unexpectedly modified by concurrent reads. If
sensor values on different pages on the same device were being concurrently
read by multiple threads, this could cause spurious read errors due to the
page register not reading back the same value last written, or sensor
values being read from the incorrect page.
Add locking of the update_lock mutex in pmbus_show_sensor and
pmbus_show_samples so that these cannot result in concurrent reads from the
underlying device.