This happens because at btrfs_read_qgroup_config() we can call
qgroup_rescan_init() while holding a read lock on a quota btree leaf,
acquired by the previous call to btrfs_search_slot_for_read(), and
qgroup_rescan_init() acquires the mutex qgroup_rescan_lock.
A qgroup rescan worker does the opposite: it acquires the mutex
qgroup_rescan_lock, at btrfs_qgroup_rescan_worker(), and then tries to
update the qgroup status item in the quota btree through the call to
update_qgroup_status_item(). This inversion of locking order
between the qgroup_rescan_lock mutex and quota btree locks causes the
splat.
Fix this simply by releasing and freeing the path before calling
qgroup_rescan_init() at btrfs_read_qgroup_config().
David Sterba [Mon, 16 Nov 2020 18:53:52 +0000 (19:53 +0100)]
btrfs: tree-checker: add missing returns after data_ref alignment checks
There are sectorsize alignment checks that are reported but then
check_extent_data_ref continues. This was not intended, wrong alignment
is not a minor problem and we should return with error.
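A minimal sketch of the pattern described above, assuming a hypothetical helper check_data_ref_offset() and using -EINVAL as a stand-in error code (neither is taken from the patch): the alignment problem is reported and the check then returns instead of continuing.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for one of the data_ref alignment checks. */
static int check_data_ref_offset(uint64_t offset, uint32_t sectorsize)
{
	if (offset % sectorsize != 0) {
		fprintf(stderr, "invalid data ref offset %llu, not aligned to %u\n",
			(unsigned long long)offset, sectorsize);
		/* The previously missing step: stop here with an error. */
		return -EINVAL; /* assumed error code, for illustration only */
	}
	return 0;
}
```

Without the return, a misaligned value would be logged but the caller would still see success.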
btrfs: don't access possibly stale fs_info data for printing duplicate device
Syzbot reported a possible use-after-free when printing a duplicate device
warning in device_list_add().
At this point it can happen that a btrfs_device::fs_info is not correctly
setup yet, so we're accessing stale data, when printing the warning
message using the btrfs_printk() wrappers.
==================================================================
BUG: KASAN: use-after-free in btrfs_printk+0x3eb/0x435 fs/btrfs/super.c:245
Read of size 8 at addr ffff8880878e06a8 by task syz-executor225/7068
The syzkaller reproducer for this use-after-free crafts a filesystem image
and loop mounts it twice in a loop. The mount will fail as the crafted
image has an invalid chunk tree. When this happens btrfs_mount_root() will
call deactivate_locked_super(), which then cleans up fs_info and
fs_info::sb. If a second thread now adds the same block-device to the
filesystem, it will get detected as a duplicate device and
device_list_add() will reject the duplicate and print a warning. But as
the fs_info pointer passed in is non-NULL this will result in a
use-after-free.
Instead of printing possibly uninitialized or already freed memory in
btrfs_printk(), explicitly pass in a NULL fs_info so the printing of the
device name will be skipped altogether.
There was a slightly different approach discussed in
https://lore.kernel.org/linux-btrfs/20200114060920[email protected]/t/#u
Merge tag 'misc-habanalabs-fixes-2020-11-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-linus
Oded writes:
This tag contains the following habanalabs driver fix for 5.10-rc6:
- Add missing statements and a break; in the switch statement of the ECC
handling. Without this fix, the handling of that interrupt will be erroneous.
* tag 'misc-habanalabs-fixes-2020-11-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux:
habanalabs/gaudi: fix missing code in ECC handling
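The kind of bug fixed by the habanalabs change can be sketched as follows. The event names, the struct and the values are hypothetical and not taken from the driver; the point is only that each case needs its own statements and a break, otherwise control falls through and the next case overwrites the result.

```c
/* Hypothetical event identifiers, for illustration only. */
enum toy_ecc_event { TOY_ECC_EVENT_A, TOY_ECC_EVENT_B };

struct toy_ecc_params {
	int block_id;
	int needs_extract;
};

static struct toy_ecc_params toy_ecc_params_for(enum toy_ecc_event ev)
{
	struct toy_ecc_params p = { -1, 0 };

	switch (ev) {
	case TOY_ECC_EVENT_A:
		p.block_id = 1;
		p.needs_extract = 0;
		break; /* without this, EVENT_A falls through into EVENT_B */
	case TOY_ECC_EVENT_B:
		p.block_id = 2;
		p.needs_extract = 1;
		break;
	}
	return p;
}
```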
ASoC: qcom: Fix enabling BCLK and LRCLK in LPAIF invalid state
Fix enabling BCLK and LRCLK only when LPAIF is in an invalid state and
the bit clock is in the enabled state.
In the device suspend/resume scenario LPAIF goes to the reset state,
which causes LRCLK to be disabled and BCLK to be enabled.
Avoid such inconsistency by removing the unnecessary cpu dai prepare API,
which does the LRCLK enable, and by maintaining BCLK state information.
Maxime Ripard [Fri, 20 Nov 2020 14:42:45 +0000 (15:42 +0100)]
drm/vc4: kms: Don't disable the muxing of an active CRTC
The current HVS muxing code will consider the CRTCs in a given state to
setup their muxing in the HVS, and disable the other CRTCs muxes.
However, it's valid to only update a single CRTC with a state, and in this
situation we would mux out a CRTC that was enabled but left untouched by
the new state.
Fix this by setting a flag on the CRTC state when the muxing has been
changed, and only change the muxing configuration when that flag is there.
Maxime Ripard [Fri, 20 Nov 2020 14:42:44 +0000 (15:42 +0100)]
drm/vc4: kms: Store the unassigned channel list in the state
If a CRTC is enabled but not active, and we're then doing a page
flip on another CRTC, drm_atomic_get_crtc_state will bring the first
CRTC state into the global state, and will make us wait for its vblank
as well, even though that might never occur.
Instead of creating the list of the free channels each time atomic_check
is called, and calling drm_atomic_get_crtc_state to retrieve the
allocated channels, let's create a private state object in the main
atomic state, and use it to store the available channels.
Since vc4 has a semaphore (with a value of 1, so a lock) in its commit
implementation to serialize all the commits, even the nonblocking ones, we
are free from the use-after-free race if two subsequent commits are not run
in their submission order.
Merge tag 'icc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-linus
Georgi writes:
interconnect fixes for v5.10
This contains a few driver fixes and one core fix:
- Fix an excessive of_node_put() in the core.
- Fix boot regression and integer overflow on msm8974 platforms.
- Fix a minor issue on qcs404 and msm8916 platforms.
Signed-off-by: Georgi Djakov <[email protected]>
* tag 'icc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
interconnect: fix memory trashing in of_count_icc_providers()
interconnect: qcom: qcs404: Remove GPU and display RPM IDs
interconnect: qcom: msm8916: Remove rpm-ids from non-RPM nodes
interconnect: qcom: msm8974: Don't boost the NoC rate during boot
interconnect: qcom: msm8974: Prevent integer overflow in rate
Arnd Bergmann [Mon, 23 Nov 2020 16:30:24 +0000 (17:30 +0100)]
Merge tag 'v5.10-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes
Fixed ordering for MMC devices on rk3399, due to an mmc change jumbling
all ordering, a fix to make the Odroid Go Advance actually power down,
and using the correct clock name on the NanoPi R2S.
* tag 'v5.10-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Reorder LED triggers from mmc devices on rk3399-roc-pc.
arm64: dts: rockchip: Assign a fixed index to mmc devices on rk3399 boards.
arm64: dts: rockchip: Remove system-power-controller from pmic on Odroid Go Advance
arm64: dts: rockchip: fix NanoPi R2S GMAC clock name
Will Deacon [Fri, 20 Nov 2020 13:57:48 +0000 (13:57 +0000)]
arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()
With hardware dirty bit management, calling pte_wrprotect() on a writable,
dirty PTE will lose the dirty state and return a read-only, clean entry.
Move the logic from ptep_set_wrprotect() into pte_wrprotect() to ensure that
the dirty bit is preserved for writable entries, as this is required for
soft-dirty bit management if we enable it in the future.
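The idea can be sketched with a toy model. The bit positions and toy_* names below are assumptions chosen for illustration and do not match the real arm64 definitions; the model only assumes that hardware marks a writable entry dirty by clearing its read-only bit, so write-protecting must first latch that state into a software dirty bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy bit layout, for illustration only. */
#define TOY_PTE_WRITE  (1u << 0) /* entry may be written */
#define TOY_PTE_RDONLY (1u << 1) /* cleared by hardware on first write */
#define TOY_PTE_DIRTY  (1u << 2) /* software dirty bit */

static bool toy_pte_hw_dirty(uint32_t pte)
{
	return (pte & TOY_PTE_WRITE) && !(pte & TOY_PTE_RDONLY);
}

static bool toy_pte_dirty(uint32_t pte)
{
	return (pte & TOY_PTE_DIRTY) || toy_pte_hw_dirty(pte);
}

static uint32_t toy_pte_wrprotect(uint32_t pte)
{
	/* Latch the hardware dirty state before the bits encoding it change. */
	if (toy_pte_hw_dirty(pte))
		pte |= TOY_PTE_DIRTY;

	pte &= ~TOY_PTE_WRITE;
	pte |= TOY_PTE_RDONLY;
	return pte;
}
```

In this model, dropping the first step of toy_pte_wrprotect() would turn a writable, dirty entry into a read-only, clean one, which is the loss of state the commit message describes.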
Will Deacon [Fri, 20 Nov 2020 13:28:01 +0000 (13:28 +0000)]
arm64: pgtable: Fix pte_accessible()
pte_accessible() is used by ptep_clear_flush() to figure out whether TLB
invalidation is necessary when unmapping pages for reclaim. Although our
implementation is correct according to the architecture, returning true
only for valid, young ptes in the absence of racing page-table
modifications, this is in fact flawed due to lazy invalidation of old
ptes in ptep_clear_flush_young() where we elide the expensive DSB
instruction for completing the TLB invalidation.
Rather than penalise the aging path, adjust pte_accessible() to return
true for any valid pte, even if the access flag is cleared.
Shameer Kolothum [Thu, 19 Nov 2020 16:58:46 +0000 (16:58 +0000)]
iommu: Check return of __iommu_attach_device()
Currently iommu_create_device_direct_mappings() is called
without checking the return of __iommu_attach_device(). This
may result in failures in the iommu driver if the device attach
returns an error.
John Stultz [Thu, 12 Nov 2020 22:05:19 +0000 (22:05 +0000)]
arm-smmu-qcom: Ensure the qcom_scm driver has finished probing
Robin Murphy pointed out that if the arm-smmu driver probes before
the qcom_scm driver, we may call qcom_scm_qsmmu500_wait_safe_toggle()
before the __scm is initialized.
Now, getting this to happen is a bit contrived, as in my efforts it
required enabling asynchronous probing for both drivers, moving the
firmware dts node to the end of the dtsi file, as well as forcing a
long delay in the qcom_scm_probe function.
With those tweaks we ran into the following crash:
[ 2.631040] arm-smmu 15000000.iommu: Stage-1: 48-bit VA -> 48-bit IPA
[ 2.633372] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
[ 2.633402] [0000000000000000] user address but active_mm is swapper
[ 2.633409] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 2.633415] Modules linked in:
[ 2.633427] CPU: 5 PID: 117 Comm: kworker/u16:2 Tainted: G W 5.10.0-rc1-mainline-00025-g272a618fc36-dirty #3971
[ 2.633430] Hardware name: Thundercomm Dragonboard 845c (DT)
[ 2.633448] Workqueue: events_unbound async_run_entry_fn
[ 2.633456] pstate: 80c00005 (Nzcv daif +PAN +UAO -TCO BTYPE=--)
[ 2.633465] pc : qcom_scm_qsmmu500_wait_safe_toggle+0x78/0xb0
[ 2.633473] lr : qcom_smmu500_reset+0x58/0x78
[ 2.633476] sp : ffffffc0105a3b60
...
[ 2.633567] Call trace:
[ 2.633572] qcom_scm_qsmmu500_wait_safe_toggle+0x78/0xb0
[ 2.633576] qcom_smmu500_reset+0x58/0x78
[ 2.633581] arm_smmu_device_reset+0x194/0x270
[ 2.633585] arm_smmu_device_probe+0xc94/0xeb8
[ 2.633592] platform_drv_probe+0x58/0xa8
[ 2.633597] really_probe+0xec/0x398
[ 2.633601] driver_probe_device+0x5c/0xb8
[ 2.633606] __driver_attach_async_helper+0x64/0x88
[ 2.633610] async_run_entry_fn+0x4c/0x118
[ 2.633617] process_one_work+0x20c/0x4b0
[ 2.633621] worker_thread+0x48/0x460
[ 2.633628] kthread+0x14c/0x158
[ 2.633634] ret_from_fork+0x10/0x18
[ 2.633642] Code: a9034fa0d0007f7329107fa091342273 (f9400020)
To avoid this, this patch adds a check on qcom_scm_is_available() in
the qcom_smmu_impl_init() function, returning -EPROBE_DEFER if it's
not ready.
This allows the driver to try to probe again later after qcom_scm has
finished probing.
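A userspace sketch of the deferral check, under stated assumptions: toy_scm_ready stands in for whatever qcom_scm_is_available() reports, toy_smmu_impl_init() is a hypothetical simplification of the init function, and TOY_EPROBE_DEFER is defined locally because userspace errno.h does not provide EPROBE_DEFER.

```c
#include <stdbool.h>

/* Local stand-in; the kernel's EPROBE_DEFER is not visible to userspace. */
#define TOY_EPROBE_DEFER 517

/* Stand-in for the state that qcom_scm_is_available() would report. */
static bool toy_scm_ready;

static int toy_smmu_impl_init(void)
{
	/* Dependency not probed yet: ask to be probed again later. */
	if (!toy_scm_ready)
		return -TOY_EPROBE_DEFER;

	/* Safe to make calls that rely on the dependency from here on. */
	return 0;
}
```

A caller that sees the deferral code would simply retry the probe later, by which time the dependency may have finished probing.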
Ran Wang [Mon, 23 Nov 2020 02:57:15 +0000 (10:57 +0800)]
spi: spi-nxp-fspi: fix fspi panic by unexpected interrupts
In case the bootloader's (such as UEFI's) FSPI driver did not handle all
interrupts before loading the kernel, those legacy interrupts will assert
immediately once the kernel's FSPI driver enables them. Further, if the
interrupt is FSPI_INTR_IPCMDDONE, the irq handler nxp_fspi_irq_handler()
will call complete(&f->c) to notify others. However, f->c might not be
initialized yet at that time, which causes a kernel panic.
Of course, we should fix this issue within the bootloader. But it would be
better to have this patch to make the driver more robust (by clearing all
interrupt status bits before enabling interrupts).
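The ordering can be illustrated with a toy register model. The struct, the toy_* helpers and the write-1-to-clear behaviour are assumptions made for this sketch, not details taken from the driver: stale status left by the bootloader is cleared first, and only then are interrupts enabled.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_INTR_IPCMDDONE (1u << 0)

struct toy_fspi {
	uint32_t intr;  /* pending status bits (write-1-to-clear in this model) */
	uint32_t inten; /* interrupt enable mask */
};

/* In this model an interrupt asserts when a pending bit is also enabled. */
static bool toy_irq_asserted(const struct toy_fspi *f)
{
	return (f->intr & f->inten) != 0;
}

/* Model of a write-1-to-clear status register write. */
static void toy_intr_w1c(struct toy_fspi *f, uint32_t val)
{
	f->intr &= ~val;
}

static void toy_default_setup(struct toy_fspi *f)
{
	/* Drop anything left pending by earlier firmware... */
	toy_intr_w1c(f, 0xffffffffu);
	/* ...and only then enable the interrupt we care about. */
	f->inten = TOY_INTR_IPCMDDONE;
}
```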
iommu/amd: Enforce 4k mapping for certain IOMMU data structures
AMD IOMMU requires 4k-aligned pages for the event log, the PPR log,
and the completion wait write-back regions. However, when allocating
the pages, they could be part of a large mapping (e.g. a 2M page).
This causes a #PF because the SNP RMP hardware enforces the check based
on the page level for these data structures.
So, fix by calling set_memory_4k() on the allocated pages.
Marek Majtyka [Fri, 20 Nov 2020 15:14:43 +0000 (16:14 +0100)]
xsk: Fix incorrect netdev reference count
Fix incorrect netdev reference count in xsk_bind operation. Incorrect
reference count of the device appears when a user calls bind with the
XDP_ZEROCOPY flag on an interface which does not support zero-copy.
In such a case, an error is returned but the reference count is not
decreased. This change fixes the fault, by decreasing the reference count
in case of such an error.
The problem being corrected first appeared in '162c820ed896', and the
code was moved to a new file location over time with commit
'c2d3d6a47462'. This specific patch applies to all versions starting
from 'c2d3d6a47462'. The same solution should be applied, but to a
different file (net/xdp/xdp_umem.c) and function (xdp_umem_assign_dev),
for versions from '162c820ed896' up to, but excluding, 'c2d3d6a47462'.
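The reference count fix for the xsk bind error path can be sketched like this. The struct, the toy_* helpers and the -EOPNOTSUPP error code are assumptions for illustration, not the actual xsk code: a reference taken at the start of bind has to be dropped again when bind fails.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_netdev {
	int refcnt;
	bool zc_supported;
};

static void toy_dev_hold(struct toy_netdev *dev) { dev->refcnt++; }
static void toy_dev_put(struct toy_netdev *dev)  { dev->refcnt--; }

static int toy_bind(struct toy_netdev *dev, bool want_zerocopy)
{
	toy_dev_hold(dev);

	if (want_zerocopy && !dev->zc_supported) {
		/* Error path: give back the reference taken above. */
		toy_dev_put(dev);
		return -EOPNOTSUPP; /* assumed error code */
	}

	/* Success: the reference stays with the bound socket. */
	return 0;
}
```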
Shiju Jose [Wed, 14 Oct 2020 09:31:39 +0000 (10:31 +0100)]
ACPI/IORT: Fix doc warnings in iort.c
Fix following warnings caused by mismatch between
function parameters and function comments.
drivers/acpi/arm64/iort.c:55: warning: Function parameter or member 'iort_node' not described in 'iort_set_fwnode'
drivers/acpi/arm64/iort.c:55: warning: Excess function parameter 'node' description in 'iort_set_fwnode'
drivers/acpi/arm64/iort.c:682: warning: Function parameter or member 'id' not described in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:682: warning: Function parameter or member 'bus_token' not described in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:682: warning: Excess function parameter 'req_id' description in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:1142: warning: Function parameter or member 'dma_size' not described in 'iort_dma_setup'
drivers/acpi/arm64/iort.c:1142: warning: Excess function parameter 'size' description in 'iort_dma_setup'
drivers/acpi/arm64/iort.c:1534: warning: Function parameter or member 'ops' not described in 'iort_add_platform_device'
Randy Dunlap [Mon, 23 Nov 2020 04:45:10 +0000 (20:45 -0800)]
arm64/fpsimd: add <asm/insn.h> to <asm/kprobes.h> to fix fpsimd build
Adding <asm/exception.h> brought in <asm/kprobes.h> which uses
<asm/probes.h>, which uses 'pstate_check_t' so the latter needs to
#include <asm/insn.h> for this typedef.
Fixes this build error:
In file included from arch/arm64/include/asm/kprobes.h:24,
from arch/arm64/include/asm/exception.h:11,
from arch/arm64/kernel/fpsimd.c:35:
arch/arm64/include/asm/probes.h:16:2: error: unknown type name 'pstate_check_t'
16 | pstate_check_t *pstate_cc;
Sven Schnelle [Fri, 20 Nov 2020 13:17:52 +0000 (14:17 +0100)]
s390: fix fpu restore in entry.S
We need to disable interrupts in load_fpu_regs(). Otherwise an
interrupt might come in after the registers are loaded, but before
CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns,
CIF_FPU will be cleared and the registers will never be restored.
The entry.S code usually saves the interrupt state in __SF_EMPTY on the
stack when disabling/restoring interrupts. sie64a however saves the pointer
to the sie control block in __SF_SIE_CONTROL, which references the same
location. This is non-obvious to the reader. To avoid clobbering the sie
control block pointer in load_fpu_regs(), move the __SIE_* offsets eight
bytes after __SF_EMPTY on the stack.
Stephen Rothwell [Mon, 23 Nov 2020 07:40:16 +0000 (18:40 +1100)]
powerpc/64s: Fix allnoconfig build since uaccess flush
Using DECLARE_STATIC_KEY_FALSE needs linux/jump_table.h.
Otherwise the build fails with eg:
arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: data definition has no type or storage class
66 | DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
Michael Ellerman [Mon, 23 Nov 2020 10:16:27 +0000 (21:16 +1100)]
Merge tag 'powerpc-cve-2020-4788' into fixes
From Daniel's cover letter:
IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.
However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern.
This patch series flushes the L1 cache on kernel entry (patch 2) and after the
kernel performs any user accesses (patch 3). It also adds a self-test and
performs some related cleanups.
Sudeep Holla [Fri, 20 Nov 2020 10:12:52 +0000 (10:12 +0000)]
cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK
Commit 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a
dummy clock provider") registers a dummy clock provider using
devm_of_clk_add_hw_provider. These *_hw_provider functions are defined
only when CONFIG_COMMON_CLK=y. One possible fix is to add the Kconfig
dependency, but since we plan to move away from the clock dependency
for scmi cpufreq, it is preferable to avoid that.
Let us just conditionally compile out the offending call to
devm_of_clk_add_hw_provider. The variable 'dev' is also used outside
of the #ifdef block to avoid a build warning.
drm/exynos: depend on COMMON_CLK to fix compile tests
The Exynos DRM uses Common Clock Framework thus it cannot be built on
platforms without it (e.g. compile test on MIPS with RALINK and
SOC_RT305X):
/usr/bin/mips-linux-gnu-ld: drivers/gpu/drm/exynos/exynos_mixer.o: in function `mixer_bind':
exynos_mixer.c:(.text+0x958): undefined reference to `clk_set_parent'
Linus Torvalds [Sun, 22 Nov 2020 22:36:06 +0000 (14:36 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- Various functionality / regression fixes for Logitech devices from
Hans de Goede
- Fix for (recently added) GPIO support in mcp2221 driver from Lars
Povlsen
- Power management handling fix/quirk in i2c-hid driver for certain
BIOSes that have a strange approach to power-cycle from Hans de Goede
- a few device ID additions and device-specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: logitech-dj: Fix Dinovo Mini when paired with a MX5x00 receiver
HID: logitech-dj: Fix an error in mse_bluetooth_descriptor
HID: Add Logitech Dinovo Edge battery quirk
HID: logitech-hidpp: Add HIDPP_CONSUMER_VENDOR_KEYS quirk for the Dinovo Edge
HID: logitech-dj: Handle quad/bluetooth keyboards with a builtin trackpad
HID: add HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE for Gamevice devices
HID: mcp2221: Fix GPIO output handling
HID: hid-sensor-hub: Fix issue with devices with no report ID
HID: i2c-hid: Put ACPI enumerated devices in D3 on shutdown
HID: add support for Sega Saturn
HID: cypress: Support Varmilo Keyboards' media hotkeys
HID: ite: Replace ABS_MISC 120/121 events with touchpad on/off keypresses
HID: logitech-hidpp: Add PID for MX Anywhere 2
HID: uclogic: Add ID for Trust Flex Design Tablet
Linus Torvalds [Sun, 22 Nov 2020 21:26:07 +0000 (13:26 -0800)]
Merge tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"A couple of scheduler fixes:
- Make the conditional update of the overutilized state work
correctly by caching the relevant flags state before overwriting
them and checking them afterwards.
- Fix a data race in the wakeup path which caused loadavg on ARM64
platforms to become a random number generator.
- Fix the ordering of the iowaiter accounting operations so it can't
be decremented before it is incremented.
- Fix a bug in the deadline scheduler vs. priority inheritance when a
non-deadline task A has inherited the parameters of a deadline task
B and then blocks on a non-deadline task C.
The second inheritance step used the static deadline parameters of
task A, which are usually 0, instead of further propagating task
B's parameters. The zero initialized parameters trigger a bug in
the deadline scheduler"
* tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix priority inheritance with multiple scheduling classes
sched: Fix rq->nr_iowait ordering
sched: Fix data-race in wakeup
sched/fair: Fix overutilized update in enqueue_task_fair()
Linus Torvalds [Sun, 22 Nov 2020 21:23:43 +0000 (13:23 -0800)]
Merge tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
"A single fix for the x86 perf sysfs interfaces which used kobject
attributes instead of device attributes and therefore making clang's
control flow integrity checker upset"
* tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: fix sysfs type mismatches
Linus Torvalds [Sun, 22 Nov 2020 21:19:53 +0000 (13:19 -0800)]
Merge tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
"A single fix for lockdep which makes the recursion protection cover
graph lock/unlock"
* tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Put graph lock/unlock under lock_recursion protection
Linus Torvalds [Sun, 22 Nov 2020 21:05:48 +0000 (13:05 -0800)]
Merge tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Borislav Petkov:
"Forwarded EFI fixes from Ard Biesheuvel:
- fix memory leak in efivarfs driver
- fix HYP mode issue in 32-bit ARM version of the EFI stub when built
in Thumb2 mode
- avoid leaking EFI pgd pages on allocation failure"
* tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/x86: Free efi_pgd with free_pages()
efivarfs: fix memory leak in efivarfs_create()
efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP
Linus Torvalds [Sun, 22 Nov 2020 20:55:50 +0000 (12:55 -0800)]
Merge tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of
same because the proper one is going through the IOMMU tree (Thomas
Gleixner)
- An Intel microcode loader fix to save the correct microcode patch to
apply during resume (Chen Yu)
- A fix to not access user memory of other processes when dumping
opcode bytes (Thomas Gleixner)
* tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "iommu/vt-d: Take CONFIG_PCI_ATS into account"
x86/dumpstack: Do not try to access user space code of other tasks
x86/microcode/intel: Check patch signature before saving microcode for early loading
iommu/vt-d: Take CONFIG_PCI_ATS into account
Linus Torvalds [Sun, 22 Nov 2020 20:14:46 +0000 (12:14 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"8 patches.
Subsystems affected by this patch series: mm (madvise, pagemap,
readahead, memcg, userfaultfd), kbuild, and vfs"
* emailed patches from Andrew Morton <[email protected]>:
mm: fix madvise WILLNEED performance problem
libfs: fix error cast of negative value in simple_attr_write()
mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
mm: memcg/slab: fix root memcg vmstats
mm: fix readahead_page_batch for retry entries
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
compiler-clang: remove version check for BPF Tracing
mm/madvise: fix memory leak from process_madvise
Linus Torvalds [Sun, 22 Nov 2020 19:58:49 +0000 (11:58 -0800)]
Merge tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging and IIO fixes from Greg KH:
"Here are some small Staging and IIO driver fixes for 5.10-rc5. They
include:
- IIO fixes for reported regressions and problems
- new device ids for IIO drivers
- new device id for rtl8723bs driver
- staging ralink driver Kconfig dependency fix
- staging mt7621-pci bus resource fix
All of these have been in linux-next all week with no reported issues"
* tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: accel: kxcjk1013: Add support for KIOX010A ACPI DSM for setting tablet-mode
iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum
docs: ABI: testing: iio: stm32: remove re-introduced unsupported ABI
iio: light: fix kconfig dependency bug for VCNL4035
iio/adc: ingenic: Fix AUX/VBAT readings when touchscreen is used
iio/adc: ingenic: Fix battery VREF for JZ4770 SoC
staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids
staging: ralink-gdma: fix kconfig dependency bug for DMA_RALINK
staging: mt7621-pci: avoid to request pci bus resources
iio: imu: st_lsm6dsx: set 10ms as min shub slave timeout
counter/ti-eqep: Fix regmap max_register
iio: adc: stm32-adc: fix a regression when using dma and irq
iio: adc: mediatek: fix unset field
iio: cros_ec: Use default frequencies when EC returns invalid information
Linus Torvalds [Sun, 22 Nov 2020 19:52:10 +0000 (11:52 -0800)]
Merge tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fixes from Greg KH:
"Here are some small tty/serial fixes for 5.10-rc5 that resolve some
reported issues:
- speakup crash when telling the kernel to use a device that isn't
really there
- imx serial driver fixes for reported problems
- ar933x_uart driver fix for probe error handling path
All have been in linux-next for a while with no reported issues"
* tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: ar933x_uart: disable clk on error handling path in probe
tty: serial: imx: keep console clocks always on
speakup: Do not let the line discipline be used several times
tty: serial: imx: fix potential deadlock
Linus Torvalds [Sun, 22 Nov 2020 19:39:32 +0000 (11:39 -0800)]
Merge tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A final set of miscellaneous bug fixes for ext4"
* tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix bogus warning in ext4_update_dx_flag()
jbd2: fix kernel-doc markups
ext4: drop fast_commit from /proc/mounts
David Howells [Sun, 22 Nov 2020 13:13:45 +0000 (13:13 +0000)]
afs: Fix speculative status fetch going out of order wrt to modifications
When doing a lookup in a directory, the afs filesystem uses a bulk
status fetch to speculatively retrieve the statuses of up to 48 other
vnodes found in the same directory and it will then either update extant
inodes or create new ones - effectively doing 'lookup ahead'.
To avoid the possibility of deadlocking itself, however, the filesystem
doesn't lock all of those inodes; rather just the directory inode is
locked (by the VFS).
When the operation completes, afs_inode_init_from_status() or
afs_apply_status() is called, depending on whether the inode already
exists, to commit the new status.
A case exists, however, where the speculative status fetch operation may
straddle a modification operation on one of those vnodes. What can then
happen is that the speculative bulk status RPC retrieves the old status,
and whilst that is happening, the modification happens - which returns
an updated status, then the modification status is committed, then we
attempt to commit the speculative status.
This results in something like the following being seen in dmesg:
showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
say that the vnode had data version 8 when we'd already recorded version
9 due to a local modification. This was causing the cache to be
invalidated for that vnode when it shouldn't have been. If it happens
on a data file, this might lead to local changes being lost.
Fix this by ignoring speculative status updates if the data version
doesn't match the expected value.
Note that it is possible to get a DV regression if a volume gets
restored from a backup - but we should get a callback break in such a
case that should trigger a recheck anyway. It might be worth checking
the volume creation time in the volsync info and, if a change is
observed in that (as would happen on a restore), invalidate all caches
associated with the volume.
Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
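A simplified model of the AFS fix described above, with hypothetical names (toy_vnode, toy_apply_status) and a guessed comparison that may differ from what the kernel actually does: a speculative reply is dropped when the vnode's data version no longer matches the version noted before the fetch was issued.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_vnode {
	uint64_t data_version;
	bool cache_valid;
};

/*
 * dv_before:  data version noted when the fetch was issued
 * fetched_dv: data version carried by the reply
 * Returns true if the reply was applied, false if it was ignored.
 */
static bool toy_apply_status(struct toy_vnode *v, uint64_t dv_before,
			     uint64_t fetched_dv, bool speculative)
{
	/* The vnode changed while the speculative fetch was in flight. */
	if (speculative && v->data_version != dv_before)
		return false;

	if (fetched_dv != v->data_version) {
		v->cache_valid = false;
		v->data_version = fetched_dv;
	}
	return true;
}
```

With a vnode already at version 9 and a speculative reply that was issued at version 8, the reply is ignored and the cache stays valid.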
Yicong Yang [Sun, 22 Nov 2020 06:17:19 +0000 (22:17 -0800)]
libfs: fix error cast of negative value in simple_attr_write()
The attr->set() callback receives a u64 value, but simple_strtoll() is
used for the conversion. This leads to a wrong cast if the user inputs a
negative value.
Use kstrtoull() instead of simple_strtoll() to convert a string from
the user to an unsigned value. The former returns '-EINVAL' if it
gets a negative value, but the latter can't handle the situation
correctly. Make 'val' unsigned long long, which is what kstrtoull() takes;
this eliminates the compile warning on non-64-bit architectures.
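A userspace analogue of the stricter parsing, not the kernel's kstrtoull() itself: toy_parse_u64() is a hypothetical helper built on the C library's strtoull(), and the explicit sign check is needed because strtoull() accepts a leading minus and wraps the result around.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse an unsigned value, rejecting negative input instead of wrapping. */
static int toy_parse_u64(const char *s, unsigned long long *out)
{
	char *end;
	unsigned long long v;

	while (isspace((unsigned char)*s))
		s++;
	if (*s == '-')
		return -EINVAL; /* strtoull() alone would silently wrap this */

	errno = 0;
	v = strtoull(s, &end, 0);
	if (end == s)
		return -EINVAL; /* no digits at all */
	if (errno == ERANGE)
		return -ERANGE; /* value does not fit */

	if (*end == '\n') /* tolerate one trailing newline, as from "echo" */
		end++;
	if (*end != '\0')
		return -EINVAL; /* trailing garbage */

	*out = v;
	return 0;
}
```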
Gerald Schaefer [Sun, 22 Nov 2020 06:17:15 +0000 (22:17 -0800)]
mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
Alexander reported a syzkaller / KASAN finding on s390, see below for
complete output.
In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be
freed in some cases. In the case of userfaultfd_missing(), this will
happen after calling handle_userfault(), which might have released the
mmap_lock. Therefore, the following pte_free(vma->vm_mm, pgtable) will
access an unstable vma->vm_mm, which could have been freed or re-used
already.
For all architectures other than s390 this will go w/o any negative
impact, because pte_free() simply frees the page and ignores the
passed-in mm. The implementation for SPARC32 would also access
mm->page_table_lock for pte_free(), but there is no THP support in
SPARC32, so the buggy code path will not be used there.
For s390, the mm->context.pgtable_list is being used to maintain the 2K
pagetable fragments, and operating on an already freed or even re-used
mm could result in various more or less subtle bugs due to list /
pagetable corruption.
Fix this by calling pte_free() before handle_userfault(), similar to how
it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
non-huge_zero_page case.
Commit 6b251fc96cf2c ("userfaultfd: call handle_userfault() for
userfaultfd_missing() faults") actually introduced both, the
do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page()
changes wrt to calling handle_userfault(), but only in the latter case
it put the pte_free() before calling handle_userfault().
BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334
Memory state around the buggy address:
 00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
>00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
 00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00
 00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Muchun Song [Sun, 22 Nov 2020 06:17:12 +0000 (22:17 -0800)]
mm: memcg/slab: fix root memcg vmstats
If we reparent the slab objects to the root memcg, when we free the slab
object, we need to update the per-memcg vmstats to keep it correct for
the root memcg. Now this at least affects the vmstat of
NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack size is
smaller than the PAGE_SIZE.
David said:
"I assume that without this fix that the root memcg's vmstat would
always be inflated if we reparented"
Both btrfs and fuse have reported faults caused by seeing a retry entry
instead of the page they were looking for. This was caused by a missing
check in the iterator.
As can be seen in the below panic log, accessing 0x402 causes a
panic. In xarray.h, 0x402 means RETRY_ENTRY.
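The value 0x402 can be reproduced with a small calculation. The helper below is a userspace copy of the internal-entry encoding as I understand it from xarray.h (value shifted left by two, low bits set to binary 10), and it assumes that the retry entry is internal entry number 256.

```c
#include <stdint.h>

/* Userspace copy of the assumed xarray internal-entry encoding. */
static uintptr_t toy_xa_mk_internal(uintptr_t v)
{
	return (v << 2) | 2;
}

/* Assumed to correspond to the retry entry: internal entry 256. */
static uintptr_t toy_xa_retry_entry(void)
{
	return toy_xa_mk_internal(256); /* (256 << 2) | 2 = 0x400 | 0x2 */
}
```

An iterator that hands such a value to a caller expecting a page pointer leads to a dereference of address 0x402, which matches the fault described above.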
Dan Williams [Sun, 22 Nov 2020 06:17:05 +0000 (22:17 -0800)]
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
The core-mm has a default __weak implementation of phys_to_target_node()
to mirror the weak definition of memory_add_physaddr_to_nid(). That
symbol is exported for modules. However, while the export in
mm/memory_hotplug.c exported the symbol in the configuration cases of:
Not only is that broken, but Christoph points out that the kernel should
not be exporting any __weak symbol, which means that the
memory_add_physaddr_to_nid() example that phys_to_target_node() copied
is broken too.
Rework the definition of phys_to_target_node() and
memory_add_physaddr_to_nid() to not require weak symbols. Move to the
common arch override design-pattern of an asm header defining a symbol
to replace the default implementation.
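The arch override pattern mentioned above can be sketched roughly as
follows. This is a single-file illustration, not the actual patch: the
SKETCH_* macros and the 1 GiB-per-node mapping are invented, and in the
kernel the two halves would live in an arch header and a generic header.

```c
/* Part 1: what an architecture header might provide (hypothetical).
 * Defining a macro with the function's own name tells generic code
 * that an override exists. */
#ifdef SKETCH_ARCH_HAS_OVERRIDE
static inline int phys_to_target_node(unsigned long long start)
{
	return (int)(start >> 30);	/* made-up mapping: 1 GiB per node */
}
#define phys_to_target_node phys_to_target_node
#endif

/* Part 2: the generic fallback, compiled only when no arch override
 * was seen. No __weak symbol is involved. */
#define SKETCH_NUMA_NO_NODE (-1)

#ifndef phys_to_target_node
static inline int phys_to_target_node(unsigned long long start)
{
	(void)start;
	return SKETCH_NUMA_NO_NODE;
}
#endif
```

Because the choice is made by the preprocessor, exactly one definition
exists per build and it can be exported like any ordinary symbol.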
The only common header that all memory_add_physaddr_to_nid() producing
architectures implement is asm/sparsemem.h. In fact, powerpc already
defines its memory_add_physaddr_to_nid() helper in sparsemem.h.
Double-down on that observation and define phys_to_target_node() where
necessary in asm/sparsemem.h. An alternate consideration that was
discarded was to put this override in asm/numa.h, but that entangles
with the definition of MAX_NUMNODES relative to the inclusion of
linux/nodemask.h, and requires powerpc to grow a new header.
The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
now that the symbol is properly exported / stubbed in all combinations
of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.
Nick Desaulniers [Sun, 22 Nov 2020 06:17:01 +0000 (22:17 -0800)]
compiler-clang: remove version check for BPF Tracing
bpftrace parses the kernel headers and uses Clang under the hood.
Remove the version check when __BPF_TRACING__ is defined (as bpftrace
does) so that this tool can continue to parse kernel headers, even with
older clang sources.
Xu Qiang [Sat, 7 Nov 2020 10:42:26 +0000 (10:42 +0000)]
irqchip/gic-v3-its: Unconditionally save/restore the ITS state on suspend
On systems without HW-based collections (i.e. anything except GIC-500),
we rely on firmware to perform the ITS save/restore. This doesn't
really work, as although FW can properly save everything, it cannot
fully restore the state of the command queue (the read-side is reset
to the head of the queue). This results in the ITS consuming previously
processed commands, potentially corrupting the state.
Instead, let's always save the ITS state on suspend, disabling it in the
process, and restore the full state on resume. This saves us from broken
FW as long as it doesn't enable the ITS by itself (for which we can't do
anything).
This amounts to simply dropping the ITS_FLAGS_SAVE_SUSPEND_STATE.
Lijun Pan [Fri, 20 Nov 2020 22:40:13 +0000 (16:40 -0600)]
ibmvnic: skip tx timeout reset while in resetting
Sometimes it takes longer than 5 seconds (watchdog timeout) to complete
failover, migration, and other resets. Instead of scheduling another
timeout reset, we wait for the current one to complete.
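A minimal sketch of that behaviour, assuming the adapter tracks whether a
reset is in flight. The struct and function names are hypothetical and do
not correspond to the driver's real data structures.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down adapter state. */
struct sketch_adapter {
	bool resetting;		/* a reset is already in progress */
	int queued_resets;	/* number of timeout resets scheduled */
};

/* Watchdog callback: only schedule a timeout reset when no other
 * reset is currently running. */
static void sketch_tx_timeout(struct sketch_adapter *a)
{
	if (a->resetting)
		return;		/* wait for the current reset to complete */
	a->queued_resets++;
}
```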
Lijun Pan [Fri, 20 Nov 2020 22:40:12 +0000 (16:40 -0600)]
ibmvnic: notify peers when failover and migration happen
Commit 61d3e1d9bc2a ("ibmvnic: Remove netdev notify for failover resets")
excluded the failover case for notify call because it said
netdev_notify_peers() can cause network traffic to stall or halt.
Current testing does not show network traffic stall
or halt because of the notify call for failover event.
netdev_notify_peers may be used when a device wants to inform the
rest of the network about some sort of a reconfiguration
such as failover or migration.
It is unnecessary to call that in other events like
FATAL, NON_FATAL, CHANGE_PARAM, and TIMEOUT resets
since in those scenarios the hardware does not change.
If the driver must do a hard reset, it is necessary to notify peers.
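The rule described above can be written down as a small predicate. This is
only a sketch: the enum below is a hypothetical stand-in for the driver's
reset reasons, and the hard_reset flag is an assumption about how the
caller knows a hard reset was needed.

```c
#include <stdbool.h>

/* Hypothetical reset reasons; the driver's real enum may differ. */
enum sketch_reset_reason {
	SKETCH_RESET_FAILOVER,
	SKETCH_RESET_MIGRATION,
	SKETCH_RESET_FATAL,
	SKETCH_RESET_NON_FATAL,
	SKETCH_RESET_CHANGE_PARAM,
	SKETCH_RESET_TIMEOUT,
};

/* Peers are notified when the hardware may have changed: on failover,
 * on migration, or whenever a hard reset had to be done. */
static bool sketch_should_notify_peers(enum sketch_reset_reason reason,
				       bool hard_reset)
{
	if (hard_reset)
		return true;
	return reason == SKETCH_RESET_FAILOVER ||
	       reason == SKETCH_RESET_MIGRATION;
}
```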
Lijun Pan [Fri, 20 Nov 2020 22:40:11 +0000 (16:40 -0600)]
ibmvnic: fix call_netdevice_notifiers in do_reset
When netdev_notify_peers was substituted in
commit 986103e7920c ("net/ibmvnic: Fix RTNL deadlock during device reset"),
call_netdevice_notifiers(NETDEV_RESEND_IGMP, dev) was missed.
Fix it now.
Fixes: 986103e7920c ("net/ibmvnic: Fix RTNL deadlock during device reset")
Signed-off-by: Lijun Pan <[email protected]>
Reviewed-by: Dany Madden <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Jens Axboe [Fri, 20 Nov 2020 14:59:54 +0000 (07:59 -0700)]
tun: honor IOCB_NOWAIT flag
tun only checks the file O_NONBLOCK flag, but it should also be checking
the iocb IOCB_NOWAIT flag. Any fops using ->read/write_iter() should check
both, otherwise it breaks users that correctly expect O_NONBLOCK semantics
if IOCB_NOWAIT is set.
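The check amounts to OR-ing the two sources of "do not block". A
user-space sketch follows; O_NONBLOCK comes from <fcntl.h>, while
SKETCH_IOCB_NOWAIT is a stand-in bit and not the kernel's actual value.

```c
#include <fcntl.h>
#include <stdbool.h>

#define SKETCH_IOCB_NOWAIT (1u << 0)	/* stand-in, not the kernel's value */

/* Non-blocking behaviour is requested if either the file was opened
 * with O_NONBLOCK or this particular request carries the NOWAIT flag. */
static bool sketch_noblock(unsigned int f_flags, unsigned int ki_flags)
{
	return (f_flags & O_NONBLOCK) || (ki_flags & SKETCH_IOCB_NOWAIT);
}
```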
Julian Wiedmann [Fri, 20 Nov 2020 10:06:57 +0000 (11:06 +0100)]
net/af_iucv: set correct sk_protocol for child sockets
Child sockets erroneously inherit their parent's sk_type (ie. SOCK_*),
instead of the PF_IUCV protocol that the parent was created with in
iucv_sock_create().
We're currently not using sk->sk_protocol ourselves, so this shouldn't
have much impact (except eg. getting the output in skb_dump() right).
Starting with iOS 14 released in September 2020, connectivity using the
personal hotspot USB tethering function of iOS devices is broken.
Communication between the host and the device (for example ICMP traffic
or DNS resolution using the DNS service running in the device itself)
works fine, but communication to endpoints further away doesn't work.
Investigation on the matter shows that no UDP and ICMP traffic from the
tethered host is reaching the Internet at all. For TCP traffic there are
exchanges between tethered host and server but packets are modified in
transit leading to impossible communication.
After some trials Matti Vuorela discovered that reducing the URB buffer
size by two bytes restored the previous behavior. While a better
solution might exist to fix the issue, since the protocol is not
publicly documented and considering the small size of the fix, let's do
that.
Tom Seewald [Fri, 20 Nov 2020 19:25:28 +0000 (13:25 -0600)]
cxgb4: Fix build failure when CONFIG_TLS=m
After commit 9d2e5e9eeb59 ("cxgb4/ch_ktls: decrypted bit is not enough")
whenever CONFIG_TLS=m and CONFIG_CHELSIO_T4=y, the following build
failure occurs:
ld: drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.o: in function
`cxgb_select_queue':
cxgb4_main.c:(.text+0x2dac): undefined reference to `tls_validate_xmit_skb'
Fix this by ensuring that if TLS is set to be a module, CHELSIO_T4 will
also be compiled as a module. As otherwise the cxgb4 driver will not be
able to access TLS' symbols.
This is a potential use-after-free if the sysfs nodes are being accessed
whilst removing the struct slave, so wait for the object destruction to
complete before freeing the struct slave itself.
Linus Torvalds [Sat, 21 Nov 2020 18:36:25 +0000 (10:36 -0800)]
Merge tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"The critical fixes are for a crash that someone reported in the xattr
code on 32-bit arm last week; and a revert of the rmap key comparison
change from last week as it was totally wrong. I need a vacation. :(
Summary:
- Fix various deficiencies in online fsck's metadata checking code
- Fix an integer casting bug in the xattr code on 32-bit systems
- Fix a hang in an inode walk when the inode index is corrupt
- Fix error codes being dropped when initializing per-AG structures
- Fix nowait directio writes that partially succeed but return EAGAIN
- Revert last week's rmap comparison patch because it was wrong"
* tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: revert "xfs: fix rmap key and record comparison functions"
xfs: don't allow NOWAIT DIO across extent boundaries
xfs: return corresponding errcode if xfs_initialize_perag() fail
xfs: ensure inobt record walks always make forward progress
xfs: fix forkoff miscalculation related to XFS_LITINO(mp)
xfs: directory scrub should check the null bestfree entries too
xfs: strengthen rmap record flags checking
xfs: fix the minrecs logic when dealing with inode root child blocks
Linus Torvalds [Sat, 21 Nov 2020 18:33:33 +0000 (10:33 -0800)]
Merge tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fanotify fix from Jan Kara:
"A single fanotify fix from Amir"
* tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: fix logic of reporting name info with watched parent
Linus Torvalds [Sat, 21 Nov 2020 18:24:05 +0000 (10:24 -0800)]
Merge tag 'seccomp-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
"This gets the seccomp selftests running again on powerpc and sh, and
fixes an audit reporting oversight noticed in both seccomp and ptrace.
- Fix typos in seccomp selftests on powerpc and sh (Kees Cook)
- Fix PF_SUPERPRIV audit marking in seccomp and ptrace (Mickaël
Salaün)"
* tag 'seccomp-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests/seccomp: sh: Fix register names
selftests/seccomp: powerpc: Fix typo in macro variable name
seccomp: Set PF_SUPERPRIV when checking capability
ptrace: Set PF_SUPERPRIV when checking capability
CK Hu [Fri, 13 Nov 2020 03:49:07 +0000 (11:49 +0800)]
drm/mediatek: dsi: Modify horizontal front/back porch byte formula
In the patch to be fixed, horizontal_backporch_byte becomes too large
for some panels, so roll back that patch. For panels with a small hfp
or hbp, using vm->hfront_porch + vm->hback_porch to calculate
horizontal_backporch_byte would make it negative, so
use horizontal_backporch_byte itself to make it positive.
Fixes: 35bf948f1edb ("drm/mediatek: dsi: Fix scrolling of panel with small hfp or hbp")
Signed-off-by: CK Hu <[email protected]>
Signed-off-by: Chun-Kuang Hu <[email protected]>
Tested-by: Bilal Wasim <[email protected]>
Jakub Kicinski [Sat, 21 Nov 2020 02:59:50 +0000 (18:59 -0800)]
Merge branch 's390-qeth-fixes-2020-11-20'
Julian Wiedmann says:
====================
s390/qeth: fixes 2020-11-20
This brings several fixes for qeth's af_iucv-specific code paths.
Also one fix by Alexandra for the recently added BR_LEARNING_SYNC
support. We want to trust the feature indication bit, so that HW can
mask it out if there are any issues on their end.
====================
Julian Wiedmann [Fri, 20 Nov 2020 09:09:39 +0000 (10:09 +0100)]
s390/qeth: fix tear down of async TX buffers
When qeth_iqd_tx_complete() detects that a TX buffer requires additional
async completion via QAOB, it might fail to replace the queue entry's
metadata (and ends up triggering recovery).
Assume now that the device gets torn down, overruling the recovery.
If the QAOB notification then arrives before the tear down has
sufficiently progressed, the buffer state is changed to
QETH_QDIO_BUF_HANDLED_DELAYED by qeth_qdio_handle_aob().
The tear down code calls qeth_drain_output_queue(), where
qeth_cleanup_handled_pending() will then attempt to replace such a
buffer _again_. If it succeeds this time, the buffer ends up dangling in
its replacement's ->next_pending list ... where it will never be freed,
since there's no further call to qeth_cleanup_handled_pending().
But the second attempt isn't actually needed, we can simply leave the
buffer on the queue and re-use it after a potential recovery has
completed. The qeth_clear_output_buffer() in qeth_drain_output_queue()
will ensure that it's in a clean state again.
Fixes: 72861ae792c2 ("qeth: recovery through asynchronous delivery")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Julian Wiedmann [Fri, 20 Nov 2020 09:09:38 +0000 (10:09 +0100)]
s390/qeth: fix af_iucv notification race
The two expected notification sequences are
1. TX_NOTIFY_PENDING with a subsequent TX_NOTIFY_DELAYED_*, when
our TX completion code first observed the pending TX and the QAOB
then completes at a later time; or
2. TX_NOTIFY_OK, when qeth_qdio_handle_aob() picked up the QAOB
completion before our TX completion code even noticed that the TX
was pending.
But as qeth_iqd_tx_complete() and qeth_qdio_handle_aob() can run
concurrently, we may end up with a race that results in a sequence of
TX_NOTIFY_DELAYED_* followed by TX_NOTIFY_PENDING. Which would confuse
the af_iucv code in its tracking of pending transmits.
Rework the notification code, so that qeth_qdio_handle_aob() defers its
notification if the TX completion code is still active.
Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Julian Wiedmann [Fri, 20 Nov 2020 09:09:37 +0000 (10:09 +0100)]
s390/qeth: make af_iucv TX notification call more robust
Calling into socket code is ugly already, at least check whether we are
dealing with the expected sk_family. Only looking at skb->protocol is
bound to cause troubles (consider eg. af_packet).
Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
====================
tcp: Address issues with ECT0 not being set in DCTCP packets
This patch set is meant to address issues seen with SYN/ACK packets not
containing the ECT0 bit when DCTCP is configured as the congestion control
algorithm for a TCP socket.
A simple test using "tcpdump" and "test_progs -t bpf_tcp_ca" makes the
issue obvious. Looking at the packets shows a SYN/ACK packet whose ECT0
bit does not match the other packets for the flow when the congestion
control algorithm is switched from the default. So for example, going
from non-DCTCP to a DCTCP congestion control algorithm we will see that
the SYN/ACK IPv6 header does not have ECT0 set while the other packets in
the flow do. Likewise, if we switch from a default of DCTCP to cubic we
will see the ECT0 bit set in the SYN/ACK while the other packets in the
flow do not have it.
====================
Alexander Duyck [Thu, 19 Nov 2020 21:23:58 +0000 (13:23 -0800)]
tcp: Set INET_ECN_xmit configuration in tcp_reinit_congestion_control
When setting congestion control via a BPF program it is seen that the
SYN/ACK for packets within a given flow will not include the ECT0 flag. A
bit of simple printk debugging shows that when this is configured without
BPF the INET_ECN_xmit value is initialized in
tcp_assign_congestion_control; however, when we configure this via BPF the
socket is in the closed state and as such it isn't configured, and I do not
see it being initialized when we transition the socket into the listen
state. The result of this is that the ECT0 bit is configured based on
whatever the default state is for the socket.
An easy way to reproduce this is to monitor the following with tcpdump:
tools/testing/selftests/bpf/test_progs -t bpf_tcp_ca
Without this patch the SYN/ACK will follow whatever the default is. If it
is dctcp, all SYN/ACK packets will have the ECT0 bit set, and if it is not
then ECT0 will be cleared on all SYN/ACK packets. With this patch applied
the SYN/ACK bit matches the value seen on the other packets in the given
stream.
Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Alexander Duyck [Thu, 19 Nov 2020 21:23:51 +0000 (13:23 -0800)]
tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header
An issue was recently found where DCTCP SYN/ACK packets did not have the
ECT bit set in the L3 header. A bit of code review found that the recent
change referenced below had gone through and added a mask that prevented the
ECN bits from being populated in the L3 header.
This patch addresses that by rolling back the mask so that it is only
applied to the flags coming from the incoming TCP request instead of
applying it to the socket tos/tclass field. Doing this the ECT bits were
restored in the SYN/ACK packets in my testing.
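One way to read the masking change is sketched below. The ECN field
occupies the two low bits of tos/tclass and ECT(0) is the value binary 10;
the function name, its parameters, and the exact combination of the two
fields are my assumptions for illustration, not a copy of the patch.

```c
#include <stdint.h>

#define SKETCH_ECN_MASK	0x3	/* low two bits of tos/tclass carry ECN */
#define SKETCH_ECT_0	0x2	/* ECT(0) codepoint */

/* tos for the SYN/ACK: when reflecting, take the DSCP bits from the
 * incoming SYN with its ECN bits masked off, and keep the ECN bits the
 * socket itself wants. Without reflection, use the socket tos as is. */
static uint8_t sketch_synack_tos(int reflect_tos, uint8_t syn_tos,
				 uint8_t sk_tos)
{
	if (!reflect_tos)
		return sk_tos;
	return (uint8_t)((syn_tos & ~SKETCH_ECN_MASK) |
			 (sk_tos & SKETCH_ECN_MASK));
}
```

With a socket tos of 0x02 (ECT(0) only) and an incoming SYN tos of 0xb9,
the reflected value is 0xba: DSCP from the SYN, ECT(0) from the socket.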
One thing that is not addressed by this patch set is the fact that
tcp_reflect_tos appears to be incompatible with ECN based congestion
avoidance algorithms. At a minimum the feature should likely be documented
which it currently isn't.
Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")
Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Wei Wang <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
* tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
s390/dasd: fix null pointer dereference for ERP requests
blk-cgroup: fix a hd_struct leak in blkcg_fill_root_iostats
nvme: fix memory leak freeing command effects
nvme: directly cache command effects log
nvme: free sq/cq dbbuf pointers when dbbuf set fails
block: mark flush request as IDLE when it is really finished
Linus Torvalds [Fri, 20 Nov 2020 19:47:22 +0000 (11:47 -0800)]
Merge tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Mostly regression or stable fodder:
- Disallow async path resolution of /proc/self
- Tighten constraints for segmented async buffered reads
- Fix double completion for a retry error case
- Fix for fixed file life times (Pavel)"
* tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
io_uring: order refnode recycling
io_uring: get an active ref_node from files_data
io_uring: don't double complete failed reissue request
mm: never attempt async page lock if we've transferred data already
io_uring: handle -EOPNOTSUPP on path resolution
proc: don't allow async path resolution of /proc/self components
Eric Biggers [Wed, 11 Nov 2020 21:48:55 +0000 (13:48 -0800)]
block/keyslot-manager: prevent crash when num_slots=1
If there is only one keyslot, then blk_ksm_init() computes
slot_hashtable_size=1 and log_slot_ht_size=0. This causes
blk_ksm_find_keyslot() to crash later because it uses
hash_ptr(key, log_slot_ht_size) to find the hash bucket containing the
key, and hash_ptr() doesn't support the bits == 0 case.
Fix this by making the hash table always have at least 2 buckets.
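The arithmetic can be replayed in user space. The sketch below assumes
the table size is the slot count rounded up to a power of two, and uses a
multiplicative hash in the style of hash_64(); the multiplier is the
64-bit golden-ratio constant as I recall it from include/linux/hash.h.
All sketch_* names are hypothetical.

```c
#include <stdint.h>

/* Smallest power of two >= n, for n >= 1. */
static unsigned int sketch_roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* log2 of a power of two. */
static unsigned int sketch_ilog2(unsigned int n)
{
	unsigned int bits = 0;

	while (n > 1) {
		n >>= 1;
		bits++;
	}
	return bits;
}

/* Number of hash bits for the slot table; apply_fix enforces the
 * "at least 2 buckets" rule. */
static unsigned int sketch_log_slot_ht_size(unsigned int num_slots,
					    int apply_fix)
{
	unsigned int size = sketch_roundup_pow2(num_slots);

	if (apply_fix && size < 2)
		size = 2;
	return sketch_ilog2(size);
}

/* Only valid for bits in 1..64: with bits == 0 the shift count would
 * be 64, which is undefined behaviour in C. */
static uint64_t sketch_hash(uint64_t val, unsigned int bits)
{
	return (val * 0x61C8864680B583EBull) >> (64 - bits);
}
```

With one slot and no fix the bit count is 0, which is exactly the case
the hash cannot handle; with the fix it becomes 1 and every hash value
is either 0 or 1.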
Vadim Fedorenko [Thu, 19 Nov 2020 15:59:48 +0000 (18:59 +0300)]
net/tls: missing received data after fast remote close
When a TCP socket receives a FIN after some data and the parser
hasn't started before the data is read, the caller will receive
an empty buffer. This behavior differs from a plain TCP socket and
requires special handling in user-space.
The flow that triggers the race is simple. Server sends small
amount of data right after the connection is configured to use TLS
and closes the connection. In this case receiver sees TLS Handshake
data, configures TLS socket right after Change Cipher Spec record.
While the configuration is in process, TCP socket receives small
Application Data record, Encrypted Alert record and FIN packet. So
the TCP socket changes sk_shutdown to RCV_SHUTDOWN and sk_flag with
SK_DONE bit set. The received data is not parsed upon arrival and is
never sent to user-space.
This patch unpauses the parser directly if there is unparsed data in
the TCP receive queue.
Linus Torvalds [Fri, 20 Nov 2020 18:20:16 +0000 (10:20 -0800)]
Merge tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull iommu fixes from Will Deacon:
"Two straightforward vt-d fixes:
- Fix boot when intel iommu initialisation fails under TXT (tboot)
- Fix intel iommu compilation error when DMAR is enabled without ATS
and temporarily update IOMMU MAINTAINERs entry"
* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
MAINTAINERS: Temporarily add myself to the IOMMU entry
iommu/vt-d: Fix compile error with CONFIG_PCI_ATS not set
iommu/vt-d: Avoid panic if iommu init fails in tboot system
Linus Torvalds [Fri, 20 Nov 2020 18:16:26 +0000 (10:16 -0800)]
Merge tag 'mmc-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes:
- sdhci-of-arasan: Stabilize communication by fixing tap value configs
- sdhci-pci: Use SDR25 timing for HS mode for BYT-based Intel HWs"
* tag 'mmc-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-of-arasan: Issue DLL reset explicitly
mmc: sdhci-of-arasan: Use Mask writes for Tap delays
mmc: sdhci-of-arasan: Allow configuring zero tap values
mmc: sdhci-pci: Prefer SDR25 timing for High Speed mode for BYT-based Intel controllers
Anmol Karn [Thu, 19 Nov 2020 19:10:43 +0000 (00:40 +0530)]
rose: Fix Null pointer dereference in rose_send_frame()
rose_send_frame() dereferences `neigh->dev` when called from
rose_transmit_clear_request(), and the first occurrence of the
`neigh` is in rose_loopback_timer() as `rose_loopback_neigh`,
and it is initialized in rose_add_loopback_neigh() as NULL.
i.e. when `rose_loopback_neigh` was used in rose_loopback_timer(),
its `->dev` was still NULL and rose_loopback_timer() was calling
rose_rx_call_request() without checking for NULL.
- net/rose/rose_link.c
This bug seems to get triggered in this line:
rose_call = (ax25_address *)neigh->dev->dev_addr;
Fix it by adding NULL checking for `rose_loopback_neigh->dev`
in rose_loopback_timer().
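The shape of the missing check, reduced to a user-space sketch. The
structs below are hypothetical miniatures of the neighbour and device
objects and only carry the one field that matters here.

```c
#include <stddef.h>

/* Hypothetical miniatures of the real structures. */
struct sketch_dev {
	const unsigned char *dev_addr;
};

struct sketch_neigh {
	struct sketch_dev *dev;		/* NULL until a device is attached */
};

/* Returns the address a frame would be sent from, or NULL when the
 * neighbour has no device yet, so the caller can bail out instead of
 * dereferencing a NULL ->dev. */
static const unsigned char *sketch_source_addr(const struct sketch_neigh *neigh)
{
	if (!neigh || !neigh->dev)
		return NULL;
	return neigh->dev->dev_addr;
}
```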
Linus Torvalds [Fri, 20 Nov 2020 17:56:16 +0000 (09:56 -0800)]
Merge tag 'sound-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes: the only core change is a minor error
code handling in the control API, and all the rest are device-specific
fixes, mostly quirks, fixups and ASoC Intel fixes.
It looks boring, and good so"
* tag 'sound-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: mixart: Fix mutex deadlock
ALSA: hda/ca0132: Fix compile warning without PCI
ASOC: Intel: kbl_rt5663_rt5514_max98927: Do not try to disable disabled clock
ALSA: usb-audio: Add delay quirk for all Logitech USB devices
ASoC: Intel: catpt: Correct clock selection for dai trigger
ASoC: Intel: catpt: Skip position update for unprepared streams
ASoC: qcom: lpass-platform: Fix memory leak
ASoC: Intel: KMB: Fix S24_LE configuration
ALSA: hda: Add Alderlake-S PCI ID and HDMI codec vid
ALSA: usb-audio: Use ALC1220-VB-DT mapping for ASUS ROG Strix TRX40 mobo
ALSA: firewire: Clean up a locking issue in copy_resp_to_buf()
ASoC: rt1015: increase the time to detect BCLK
ALSA: ctl: fix error path at adding user-defined element set
ALSA: hda/realtek - HP Headset Mic can't detect after boot
ALSA: hda/realtek - Add supported mute Led for HP
ALSA: hda/realtek: Add some Clove SSID in the ALC293(ALC1220)
ALSA: hda/realtek - Add supported for Lenovo ThinkPad Headset Button
ASoC: rt1015: add delay to fix pop noise from speaker
Linus Torvalds [Fri, 20 Nov 2020 17:49:25 +0000 (09:49 -0800)]
Merge tag 'drm-fixes-2020-11-20-2' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Weekly fixes pull.
This contains some fixes for sun4i/dw-hdmi probing, then amdgpu
enables arcturus hw without experimental flag and two other fixes and
a group of i915 fixes.
It also has a backported from next fix for the warn on reported in
ast/drm_gem_vram_helper code in the merge window. There's a separate
report which initially looked to be the same problem, but I'm going to
chase that up next week a bit more as I don't think the bisect landed
anywhere useful.
Summary:
core:
- vram helper TTM regression fix
amdgpu:
- Pageflip fix for navi1x with 5 or 6 displays
- Remove experimental flag for Arcturus
- Fix regression in atomic commit tail rework
* tag 'drm-fixes-2020-11-20-2' of git://anongit.freedesktop.org/drm/drm:
drm/i915/gt: Fixup tgl mocs for PTE tracking
drm/vram-helper: Fix use of top-down placement
drm/i915/gt: Remember to free the virtual breadcrumbs
drm/i915: Handle max_bpc==16
drm/amd/display: Always get CRTC updated constant values inside commit tail
drm/sun4i: backend: Fix probe failure with multiple backends
drm/sun4i: dw-hdmi: fix error return code in sun8i_dw_hdmi_bind()
drm/i915/selftests: Fix wrong return value of perf_request_latency()
drm/i915/selftests: Fix wrong return value of perf_series_engines()
drm/i915: Avoid memory leak with more than 16 workarounds on a list
drm/i915/tgl: Fix Media power gate sequence.
drm/amdgpu: remove experimental flag from arcturus
drm/amd/display: Add missing pflip irq for dcn2.0
drm/i915/gvt: return error when failing to take the module reference
drm: bridge: dw-hdmi: Avoid resetting force in the detect function
drm/i915/gvt: Set ENHANCED_FRAME_CAP bit
drm/i915/gvt: Temporarily disable vfio_edid for BXT/APL
Serge Semin [Tue, 17 Nov 2020 09:45:17 +0000 (12:45 +0300)]
spi: Take the SPI IO-mutex in the spi_setup() method
I've discovered that due to the recent commit 49d7d695ca4b ("spi: dw:
Explicitly de-assert CS on SPI transfer completion") concurrent usage of
the spidev devices with different chip-selects causes the "SPI transfer
timed out" error. The root cause of the problem has turned out to be a
race between the SPI-transfer execution procedure and the spi_setup()
method being called at the same time, in particular spi_set_cs(false)
being called while there is an SPI-transfer being executed. In my
case, due to the commit cited above, all CSs get switched off by
calling spi_setup() for /dev/spidev0.1 while there is a concurrent
SPI-transfer being performed on /dev/spidev0.0. Of course a situation
of spi_setup() being called while there is an SPI-transfer being
executed for two different SPI peripheral devices of the same controller
may happen not only for the spidev driver, but for instance for MMC SPI +
some other device, or spi_setup() being called from an SPI-peripheral
probe method while some other device has already been probed and is being
used by a corresponding driver...
Of course I could have provided a fix affecting the DW APB SSI driver
only, for instance, by creating a mutual exclusive access to the set_cs
callback and setting/clearing only the bit responsible for the
corresponding chip-select. But after a short research I've discovered that
the problem most likely affects a lot of the other drivers:
- drivers/spi/spi-sun4i.c - RMW the chip-select register;
- drivers/spi/spi-rockchip.c - RMW the chip-select register;
- drivers/spi/spi-qup.c - RMW a generic force-CS flag in a CSR.
- drivers/spi/spi-sifive.c - set a generic CS-mode flag in a CSR.
- drivers/spi/spi-bcm63xx-hsspi.c - uses an internal mutex to serialize
the bus config changes, but still isn't protected from the race
condition described above;
- drivers/spi/spi-geni-qcom.c - RMW a chip-select internal flag and set the
CS state in HW;
- drivers/spi/spi-orion.c - RMW a chip-select register;
- drivers/spi/spi-cadence.c - RMW a chip-select register;
- drivers/spi/spi-armada-3700.c - RMW a chip-select register;
- drivers/spi/spi-lantiq-ssc.c - overwrites the chip-select register;
- drivers/spi/spi-sun6i.c - RMW a chip-select register;
- drivers/spi/spi-synquacer.c - RMW a chip-select register;
- drivers/spi/spi-altera.c - directly sets the chip-select state;
- drivers/spi/spi-omap2-mcspi.c - RMW an internally cached CS state and
writes it to HW;
- drivers/spi/spi-mt65xx.c - RMW some CSR;
- drivers/spi/spi-jcore.c - directly sets the chip-selects state;
- drivers/spi/spi-mt7621.c - RMW a chip-select register;
I could have missed some drivers, but the scale of the problem is obvious.
As you can see, most of the drivers perform an unprotected
read-modify-write of the chip-select register in the set_cs callback.
Since spi_setup() calls spi_set_cs() and can be executed concurrently
with the SPI-transfer execution procedure, which also calls
spi_set_cs() in the SPI core spi_transfer_one_message() method, the race
on the register modification is evident.
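The lost update can be replayed deterministically without threads by
writing one bad interleaving out by hand. The register layout below (bit n
set means CS n is asserted) and all sketch_* names are invented for this
illustration; no real controller is modelled.

```c
#include <stdint.h>

/* Hypothetical chip-select register: bit n set means CS n is asserted. */
static uint32_t sketch_cs_reg;

static uint32_t sketch_read(void)
{
	return sketch_cs_reg;
}

static void sketch_write(uint32_t v)
{
	sketch_cs_reg = v;
}

/* One bad interleaving: the transfer path asserts CS0 between the read
 * and the write of a setup-path update that de-asserts CS1. */
static uint32_t sketch_unserialized(void)
{
	uint32_t setup_copy;

	sketch_write(0);
	setup_copy = sketch_read();		/* setup: read */
	sketch_write(sketch_read() | (1u << 0));	/* transfer: assert CS0 */
	setup_copy &= ~(1u << 1);		/* setup: modify */
	sketch_write(setup_copy);		/* setup: write, CS0 is lost */
	return sketch_read();
}

/* With a common lock held around each update, every read-modify-write
 * runs to completion before the next one starts. */
static uint32_t sketch_serialized(void)
{
	sketch_write(0);
	sketch_write(sketch_read() | (1u << 0));	/* transfer, lock held */
	sketch_write(sketch_read() & ~(1u << 1));	/* setup, lock held */
	return sketch_read();
}
```

In the first function CS0 ends up de-asserted in the middle of a
transfer, which is the kind of state that could produce the timeout
described above; in the second it stays asserted.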
To sum up, the problem described above affects each driver for a controller
having more than one chip-select lane and which:
1) performs the RMW to some CS-related register with no serialization;
2) directly disables any CS on spi_set_cs(dev, false).
* the latter is the case of the DW APB SSI driver.
Controllers equipped with a single CS can theoretically also
experience the problem, but in practice will not, since normally
spi_setup() isn't called concurrently with SPI-transfers executed on
the same SPI peripheral device.
In order to fix the bug generically I'd suggest serializing access to
the controller IO by taking the IO mutex in the spi_setup() method.
The mutex is held while there is SPI communication going on
on the SPI-bus of the corresponding SPI-controller. So calling
spi_setup() and disabling/updating the CS state within it is
safe as long as no SPI-transfers are being executed. Also note I
suppose it would be safer to protect the spi_controller->setup() callback
invocation too, seeing that some of the SPI-controller drivers update HW
state in there.
Alan Stern [Thu, 19 Nov 2020 17:02:28 +0000 (12:02 -0500)]
USB: core: Change %pK for __user pointers to %px
Commit 2f964780c03b ("USB: core: replace %p with %pK") used the %pK
format specifier for a bunch of __user pointers. But as the 'K' in
the specifier indicates, it is meant for kernel pointers. The reason
for the %pK specifier is to avoid leaks of kernel addresses, but when
the pointer is to an address in userspace the security implications
are minimal. In particular, no kernel information is leaked.
This patch changes the __user %pK specifiers (used in a bunch of
debugging output lines) to %px, which will always print the actual
address with no mangling. (Notably, there is no printk format
specifier particularly intended for __user pointers.)
Alan Stern [Thu, 19 Nov 2020 17:00:40 +0000 (12:00 -0500)]
USB: core: Fix regression in Hercules audio card
Commit 3e4f8e21c4f2 ("USB: core: fix check for duplicate endpoints")
aimed to make the USB stack more reliable by detecting and skipping
over endpoints that are duplicated between interfaces. This caused a
regression for a Hercules audio card (reported as Bugzilla #208357),
which contains such non-compliant duplications. Although the
duplications are harmless, skipping the valid endpoints prevented the
device from working.
This patch fixes the regression by adding ENDPOINT_IGNORE quirks for
the Hercules card, telling the kernel to ignore the invalid duplicate
endpoints and thereby allowing the valid endpoints to be used as
intended.
penghao [Wed, 18 Nov 2020 12:30:39 +0000 (20:30 +0800)]
USB: quirks: Add USB_QUIRK_DISCONNECT_SUSPEND quirk for Lenovo A630Z TIO built-in usb-audio card
Add a USB_QUIRK_DISCONNECT_SUSPEND quirk for the Lenovo TIO built-in
usb-audio. When the A630Z goes into S3, the system wakes up 7-8
seconds later because of a usb-audio disconnect interrupt; this quirk
avoids the issue.
eg dmesg:
....
[ 626.974091 ] usb 7-1.1: USB disconnect, device number 3
....
....
[ 1774.486691] usb 7-1.1: new full-speed USB device number 5 using xhci_hcd
[ 1774.947742] usb 7-1.1: New USB device found, idVendor=17ef, idProduct=a012, bcdDevice= 0.55
[ 1774.956588] usb 7-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1774.964339] usb 7-1.1: Product: Thinkcentre TIO24Gen3 for USB-audio
[ 1774.970999] usb 7-1.1: Manufacturer: Lenovo
[ 1774.975447] usb 7-1.1: SerialNumber: 000000000000
[ 1775.048590] usb 7-1.1: 2:1: cannot get freq at ep 0x1
.......
Seeking a better fix, we've tried a lot of things, including:
- Check that the device's power/wakeup is disabled
- Check that remote wakeup is off at the USB level
- All the quirks in drivers/usb/core/quirks.c
e.g. USB_QUIRK_RESET_RESUME,
USB_QUIRK_RESET,
USB_QUIRK_IGNORE_REMOTE_WAKEUP,
USB_QUIRK_NO_LPM.
but none of that makes any difference.
There are no errors in the logs showing any suspend/resume-related issues.
When the system wakes up due to the modem, log-wise it appears to be a
normal resume.
Introduce a quirk to disable the port during suspend when the modem is
detected.
Merge tag 'phy-fixes-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy into usb-linus
Vinod writes:
phy: fixes for 5.10
Bunch of fixes for phy drivers:
*) USB phy incorrect clearing of bits
*) Tegra xusb dangling pointer
*) qcom-qmp null ptr initialization
*) cpcap-usb irq flags
*) intel keembay kconfig depends
*) qualcomm OF dependency
*) mediatek typo
* tag 'phy-fixes-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: mediatek: fix spelling mistake in Kconfig "veriosn" -> "version"
phy: qualcomm: Fix 28 nm Hi-Speed USB PHY OF dependency
phy: qualcomm: usb: Fix SuperSpeed PHY OF dependency
phy: intel: PHY_INTEL_KEEMBAY_EMMC should depend on ARCH_KEEMBAY
phy: cpcap-usb: Use IRQF_ONESHOT
phy: qcom-qmp: Initialize another pointer to NULL
phy: tegra: xusb: Fix dangling pointer on probe failure
phy: usb: Fix incorrect clearing of tca_drv_sel bit in SETUP reg for 7211
Magnus Karlsson [Fri, 20 Nov 2020 11:53:39 +0000 (12:53 +0100)]
xsk: Fix umem cleanup bug at socket destruct
Fix a bug that is triggered when a partially set up socket is
destroyed. For a fully set up socket, i.e. one that has been bound to a
device, the cleanup of the umem is performed at the end of the buffer
pool's cleanup work queue item. This has to be performed in a work
queue, and not in RCU cleanup, as it is doing a vunmap that cannot
execute in interrupt context. However, when a socket has only been
partially set up so that a umem has been created but the buffer pool
has not, the code erroneously directly calls the umem cleanup function
instead of using a work queue, and this leads to a BUG_ON() in
vunmap().
As there is no buffer pool in this case, we cannot use its work queue,
so introduce a work queue item for the umem and schedule the cleanup
through it. If there is a pool, the cleanup of the umem is still
performed by the pool's work queue, as it is important that the umem is
cleaned up after the pool.
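The deferral described above can be sketched in plain userspace C. This
is a simplified model and not the xsk code: the sketch_* types, the
array-based queue and the sketch_in_atomic flag are all assumptions made
for illustration, and the assert() merely stands in for the BUG_ON() in
vunmap().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_work {
	void (*fn)(void *data);
	void *data;
};

struct sketch_umem { struct sketch_work work; bool released; };
struct sketch_pool { struct sketch_work work; struct sketch_umem *umem;
		     bool destroyed; };

#define SKETCH_QUEUE_MAX 8
static struct sketch_work *sketch_queue[SKETCH_QUEUE_MAX];
static size_t sketch_queue_len;
static bool sketch_in_atomic;	/* models a context that must not sleep */

static void sketch_schedule_work(struct sketch_work *w)
{
	assert(sketch_queue_len < SKETCH_QUEUE_MAX);
	sketch_queue[sketch_queue_len++] = w;
}

/* Run all queued items later, outside the atomic section. */
void sketch_run_workqueue(void)
{
	for (size_t i = 0; i < sketch_queue_len; i++)
		sketch_queue[i]->fn(sketch_queue[i]->data);
	sketch_queue_len = 0;
}

static void sketch_umem_release(void *data)
{
	struct sketch_umem *umem = data;

	assert(!sketch_in_atomic);	/* the unmap step may sleep */
	umem->released = true;
}

static void sketch_pool_release(void *data)
{
	struct sketch_pool *pool = data;

	pool->destroyed = true;
	sketch_umem_release(pool->umem);	/* umem goes after the pool */
}

/* Socket teardown: never release directly, always defer. */
void sketch_socket_destruct(struct sketch_pool *pool, struct sketch_umem *umem)
{
	sketch_in_atomic = true;
	if (pool) {		/* fully set up: the pool's work item does it */
		pool->work.fn = sketch_pool_release;
		pool->work.data = pool;
		sketch_schedule_work(&pool->work);
	} else if (umem) {	/* partial setup: the umem's own work item */
		umem->work.fn = sketch_umem_release;
		umem->work.data = umem;
		sketch_schedule_work(&umem->work);
	}
	sketch_in_atomic = false;
}
```

Calling sketch_umem_release() directly inside sketch_socket_destruct()
would trip the assertion, which mirrors the failure the commit fixes.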
Marek Szyprowski [Thu, 19 Nov 2020 10:37:46 +0000 (11:37 +0100)]
interconnect: fix memory trashing in of_count_icc_providers()
The of_count_icc_providers() function uses the
for_each_available_child_of_node() helper to recursively check all the
available nodes. This helper already
properly handles child nodes' reference count, so there is no need to do
it explicitly. Remove the excessive call to of_node_put(). This fixes
memory trashing when CONFIG_OF_DYNAMIC is enabled (for example
arm/multi_v7_defconfig).
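To make the reference counting problem concrete, here is a small
userspace model of an iterator that manages child references itself.
All sketch_* names are hypothetical, and the position of the extra put
(on the node passed in, after the loop) is an assumption about the
removed call, not a quote of the kernel source.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_node {
	int refcount;
	int is_provider;		/* stands in for a property check */
	struct sketch_node *child;	/* first child */
	struct sketch_node *sibling;	/* next sibling */
};

static void sketch_node_get(struct sketch_node *n)
{
	if (n)
		n->refcount++;
}

static void sketch_node_put(struct sketch_node *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/* Take a reference on the next child and drop the one held on prev. */
static struct sketch_node *sketch_next_child(struct sketch_node *parent,
					     struct sketch_node *prev)
{
	struct sketch_node *next = prev ? prev->sibling : parent->child;

	sketch_node_get(next);
	sketch_node_put(prev);
	return next;
}

#define sketch_for_each_child(parent, child)			\
	for (child = sketch_next_child(parent, NULL);		\
	     child != NULL;					\
	     child = sketch_next_child(parent, child))

/* Count provider nodes; extra_put != 0 reproduces the unbalanced put. */
int sketch_count_providers(struct sketch_node *np, int extra_put)
{
	struct sketch_node *child;
	int count = 0;

	sketch_for_each_child(np, child) {
		if (child->is_provider)
			count++;
		count += sketch_count_providers(child, extra_put);
	}
	if (extra_put)
		sketch_node_put(np);	/* not balanced by any get */
	return count;
}
```

Without the extra put every node ends with the reference count it
started with. With it, each visited node loses one reference, which in
a refcounted tree can free nodes that are still in use.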
Kailang Yang [Thu, 19 Nov 2020 09:04:21 +0000 (17:04 +0800)]
ALSA: hda/realtek - Fixed Dell AIO wrong sound tone
This platform has only one audio jack.
If a speaker is plugged in and then replaced with a speaker or headset,
the sound tone becomes abnormal.
The headset mic also can't record when this issue happens.
Georgi Djakov [Wed, 18 Nov 2020 11:10:44 +0000 (13:10 +0200)]
interconnect: qcom: qcs404: Remove GPU and display RPM IDs
The following errors are noticed during boot on a QCS404 board:
[ 2.926647] qcom_icc_rpm_smd_send mas 6 error -6
[ 2.934573] qcom_icc_rpm_smd_send mas 8 error -6
These errors show when we try to configure the GPU and display nodes.
Since these particular nodes aren't supported on RPM and are purely
local, we should just change their mas_rpm_id to -1 to avoid any
requests being sent for these master IDs.
Georgi Djakov [Thu, 12 Nov 2020 10:51:40 +0000 (12:51 +0200)]
interconnect: qcom: msm8916: Remove rpm-ids from non-RPM nodes
Some nodes are incorrectly marked as RPM-controlled (they have RPM
master and slave ids assigned), but are actually controlled by the
application CPU instead. The RPM complains when we send requests for
resources that it can't control. Let's fix this by replacing the IDs
with the default "-1", in which case no requests are sent.
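The effect of the "-1" default can be sketched as follows. This is a
userspace model under stated assumptions: the sketch_* names are
hypothetical, and the stand-in RPM simply rejects IDs 6 and 8 with
-ENXIO to mimic the "mas 6" and "mas 8" errors quoted in the QCS404
commit above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_icc_node {
	const char *name;
	int mas_rpm_id;		/* -1: not controlled by the RPM */
	int slv_rpm_id;		/* -1: not controlled by the RPM */
};

int sketch_requests_sent;	/* how many requests reached the RPM */

/* Stand-in for the RPM: assumed to reject IDs 6 and 8. */
static int sketch_rpm_send(int id)
{
	sketch_requests_sent++;
	return (id == 6 || id == 8) ? -ENXIO : 0;
}

/* Send bandwidth requests only for IDs the RPM actually controls. */
int sketch_icc_set(const struct sketch_icc_node *node)
{
	int ret;

	assert(node != NULL);
	if (node->mas_rpm_id != -1) {
		ret = sketch_rpm_send(node->mas_rpm_id);
		if (ret)
			return ret;
	}
	if (node->slv_rpm_id != -1) {
		ret = sketch_rpm_send(node->slv_rpm_id);
		if (ret)
			return ret;
	}
	return 0;
}
```

A node with both IDs set to -1 never reaches sketch_rpm_send(), so a
locally controlled node cannot trigger an error from the RPM.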