Git Repo - linux.git/log

sctp: check asoc strreset_chunk in sctp_generate_reconf_event

A null pointer reference issue can be triggered when the response of a
stream reconf request arrives after the timer is triggered, such as:

  send Incoming SSN Reset Request --->
  CPU0:
   reconf timer is triggered,
   go to the handler code before hold sk lock
                            <--- reply with Outgoing SSN Reset Request
  CPU1:
   process Outgoing SSN Reset Request,
   and set asoc->strreset_chunk to NULL
  CPU0:
   continue the handler code, hold sk lock,
   and try to hold asoc->strreset_chunk, crash!

In Ying Xu's testing, the call trace is:

  [ ] BUG: kernel NULL pointer dereference, address: 0000000000000010
  [ ] RIP: 0010:sctp_chunk_hold+0xe/0x40 [sctp]
  [ ] Call Trace:
  [ ]  <IRQ>
  [ ]  sctp_sf_send_reconf+0x2c/0x100 [sctp]
  [ ]  sctp_do_sm+0xa4/0x220 [sctp]
  [ ]  sctp_generate_reconf_event+0xbd/0xe0 [sctp]
  [ ]  call_timer_fn+0x26/0x130

This patch is to fix it by returning from the timer handler if asoc
strreset_chunk is already set to NULL.

Fixes: 7b9438de0cd4 ("sctp: add stream reconf timer")
Reported-by: Ying Xu <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
"One fix for an information leak caused by copying a buffer to
userspace without checking for error first in the sr driver"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sr: Do not leak information in ioctl

Merge tag 'for-linus-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"A simple cleanup patch and a refcount fix for Xen on Arm"

* tag 'for-linus-5.18-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
arm/xen: Fix some refcount leaks
xen: Convert kmap() to kmap_local_page()

Merge tag 'drm-fixes-2022-04-23' of git://anongit.freedesktop.org/drm/drm

Pull more drm fixes from Dave Airlie:
"Maarten was away, so Maxine stepped up and sent me the drm-fixes
  merge, so no point leaving it for another week.

  The big change is an OF revert around bridge/panels, it may have some
  driver fallout, but hopefully this revert gets them shook out in the
  next week easier.

  Otherwise it's a bunch of locking/refcounts across drivers, a radeon
  dma_resv logic fix and some raspberry pi panel fixes.

  panel:
   - revert of patch that broke panel/bridge issues

  dma-buf:
   - remove unused header file.

  amdgpu:
   - partial revert of locking change

  radeon:
   - fix dma_resv logic inversion

  panel:
   - pi touchscreen panel init fixes

  vc4:
   - build fix
   - runtime pm refcount fix

  vmwgfx:
   - refcounting fix"

* tag 'drm-fixes-2022-04-23' of git://anongit.freedesktop.org/drm/drm:
  drm/amdgpu: partial revert "remove ctx->lock" v2
  Revert "drm: of: Lookup if child node has panel or bridge"
  Revert "drm: of: Properly try all possible cases for bridge/panel detection"
  drm/vc4: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage
  drm/vmwgfx: Fix gem refcounting and memory evictions
  drm/vc4: Fix build error when CONFIG_DRM_VC4=y && CONFIG_RASPBERRYPI_FIRMWARE=m
  drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare
  drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised
  dma-buf-map: remove renamed header file
  drm/radeon: fix logic inversion in radeon_sync_resv

Merge tag 'input-for-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

- a new set of keycodes to be used by marine navigation systems

- minor fixes to omap4-keypad and cypress-sf drivers

* tag 'input-for-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: add Marine Navigation Keycodes
  Input: omap4-keypad - fix pm_runtime_get_sync() error checking
  Input: cypress-sf - register a callback to disable the regulators

Merge tag 'block-5.18-2022-04-22' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Just two small regression fixes for bcache"

* tag 'block-5.18-2022-04-22' of git://git.kernel.dk/linux-block:
bcache: fix wrong bdev parameter when calling bio_alloc_clone() in do_bio_hook()
bcache: put bch_bio_map() back to correct location in journal_write_unlocked()

Merge tag 'io_uring-5.18-2022-04-22' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"Just two small fixes - one fixing a potential leak for the iovec for
  larger requests added in this cycle, and one fixing a theoretical leak
  with CQE_SKIP and IOPOLL"

* tag 'io_uring-5.18-2022-04-22' of git://git.kernel.dk/linux-block:
  io_uring: fix leaks on IOPOLL and CQE_SKIP
  io_uring: free iovec if file assignment fails

Merge tag 'perf-tools-fixes-for-v5.18-2022-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

- Fix header include for LLVM >= 14 when building with libclang.

- Allow access to 'data_src' for auxtrace in 'perf script' with ARM SPE
   perf.data files, fixing processing data with such attributes.

- Fix error message for test case 71 ("Convert perf time to TSC") on
   s390, where it is not supported.

* tag 'perf-tools-fixes-for-v5.18-2022-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf test: Fix error message for test case 71 on s390, where it is not supported
  perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event
  perf script: Always allow field 'data_src' for auxtrace
  perf clang: Fix header include for LLVM >= 14

sparc: cacheflush_32.h needs struct page

Add a struct page forward declaration to cacheflush_32.h.
Fixes this build warning:

    CC      drivers/crypto/xilinx/zynqmp-sha.o
  In file included from arch/sparc/include/asm/cacheflush.h:11,
                   from include/linux/cacheflush.h:5,
                   from drivers/crypto/xilinx/zynqmp-sha.c:6:
  arch/sparc/include/asm/cacheflush_32.h:38:37: warning: 'struct page' declared inside parameter list will not be visible outside of this definition or declaration
     38 | void sparc_flush_page_to_ram(struct page *page);

Exposed by commit 0e03b8fd2936 ("crypto: xilinx - Turn SHA into a
tristate and allow COMPILE_TEST") but not Fixes: that commit because the
underlying problem is older.

Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: kernel test robot <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: [email protected]
Acked-by: Sam Ravnborg <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

topology: Fix up build warning in topology_is_visible()

Commit aa63a74d4535 ("topology/sysfs: Hide PPIN on systems that do not
support it.") caused a build warning on some configurations:

drivers/base/topology.c: In function 'topology_is_visible':
drivers/base/topology.c:158:24: warning: unused variable 'dev' [-Wunused-variable]
158 | struct device *dev = kobj_to_dev(kobj);

Fix this up by getting rid of the variable entirely.

Fixes: aa63a74d4535 ("topology/sysfs: Hide PPIN on systems that do not support it.")
Cc: Tony Luck <[email protected]>
Reported-by: Stephen Rothwell <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge tag 'usb-serial-5.18-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB-serial fixes for 5.18-rc4

Here's a fix for a potential overflow issue in the whiteheat driver when
using the old ARM ABI.

Included are also some new modem device ids.

All have been in linux-next with no reported issues.

* tag 'usb-serial-5.18-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
  USB: serial: whiteheat: fix heap overflow in WHITEHEAT_GET_DTR_RTS
  USB: serial: cp210x: add PIDs for Kamstrup USB Meter Reader
  USB: serial: option: add support for Cinterion MV32-WA/MV32-WB
  USB: serial: option: add Telit 0x1057, 0x1058, 0x1075 compositions

Merge tag 'drm-misc-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

Two fixes for the raspberrypi panel initialisation, one fix for a logic
inversion in radeon, a build and pm refcounting fix for vc4, two reverts
for drm_of_get_bridge that caused a number of regression and a locking
regression for amdgpu.

Signed-off-by: Dave Airlie <[email protected]>
From: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20220422084403.2xrhf3jusdej5yo4@houat

riscv: dts: microchip: reparent mpfs clocks

The 600M clock in the fabric is not the real reference, replace it with
a 125M clock which is the correct value for the icicle kit. Rename the
msspllclk node to mssrefclk since this is now the input to, not the
output of, the msspll clock. Control of the msspll clock has been moved
into the clock configurator, so add the register range for it to the clk
configurator. Finally, add a new output of the clock config block which
will provide the 1M reference clock for the MTIMER and the rtc.

Fixes: 528a5b1f2556 ("riscv: dts: microchip: add new peripherals to icicle kit device tree")
Fixes: 0fa6107eca41 ("RISC-V: Initial DTS for Microchip ICICLE board")
Reviewed-by: Daire McNamara <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

clk: microchip: mpfs: add RTCREF clock control

The reference clock used by the PolarFire SoC's onboard rtc was missing
from the clock driver. Add this clock at the "config" clock level, with
the external reference clock as its parent.

Fixes: 635e5e73370e ("clk: microchip: Add driver for Microchip PolarFire SoC")
Reviewed-by: Daire McNamara <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

clk: microchip: mpfs: re-parent the configurable clocks

Currently the mpfs clock driver uses a reference clock called the
"msspll", set in the device tree, as the parent for the cpu/axi/ahb
(config) clocks. The frequency of the msspll is determined by the FPGA
bitstream & the bootloader configures the clock to match the bitstream.
The real reference is provided by a 100 or 125 MHz off chip oscillator.

However, the msspll clock is not actually the parent of all clocks on
the system - the reference clock for the rtc/mtimer actually has the
off chip oscillator as its parent.

In order to fix this, add support for reading the configuration of the
msspll & reparent the "config" clocks so that they are derived from
this clock rather than the reference in the device tree.

Fixes: 635e5e73370e ("clk: microchip: Add driver for Microchip PolarFire SoC")
Reviewed-by: Daire McNamara <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

dt-bindings: rtc: add refclk to mpfs-rtc

The rtc on PolarFire SoC does not use the AHB clock as its reference
frequency, but rather a 1 MHz refclk that it shares with MTIMER. Add
this second clock to the binding as a required property.

Fixes: 4cbcc0d7b397 ("dt-bindings: rtc: add bindings for microchip mpfs rtc")
Reviewed-by: Daire McNamara <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

dt-bindings: clk: mpfs: add defines for two new clocks

The RTC reference and MSSPLL were previously not documented or defined,
as they were unused. Add their defines to the PolarFire SoC header.

Fixes: 2145bb687e3f ("dt-bindings: clk: microchip: Add Microchip PolarFire host binding")
Reviewed-by: Daire McNamara <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Krzysztof Kozlowski <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

dt-bindings: clk: mpfs document msspll dri registers

As there are two sections of registers that are responsible for clock
configuration on the PolarFire SoC: add the dynamic reconfiguration
interface section to the binding & describe what each of the sections
are used for.

Fixes: 2145bb687e3f ("dt-bindings: clk: microchip: Add Microchip PolarFire host binding")
Reviewed-by: Daire McNamara <[email protected]>
Acked-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

riscv: dts: microchip: fix usage of fic clocks on mpfs

The fic clocks passed to the pcie controller and other peripherals in
the device tree are not the clocks they actually run on. The fics are
actually clock domain crossers & the clock config blocks output is the
mss/cpu side input to the interconnect. The peripherals are actually
clocked by fixed frequency clocks embedded in the fpga fabric.

Fix the device tree so that these peripherals use the correct clocks.
The fabric side FIC0 & FIC1 inputs both use the same 125 MHz, so only
one clock is created for them.

Fixes: 528a5b1f2556 ("riscv: dts: microchip: add new peripherals to icicle kit device tree")
Reviewed-by: Daire McNamara <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

clk: microchip: mpfs: mark CLK_ATHENA as critical

CLK_ATHENA is another fabric interconnect and should be marked as critical
as with FIC0-3, since disabling it will cause part of the fabric to go
into reset.

Fixes: 635e5e73370e ("clk: microchip: Add driver for Microchip PolarFire SoC")
Reviewed-by: Daire McNamara <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

clk: microchip: mpfs: fix parents for FIC clocks

The fabric interconnects are on the AXI bus not AHB.
Update their parent clocks to fix this.

Fixes: 635e5e73370e ("clk: microchip: Add driver for Microchip PolarFire SoC")
Reviewed-by: Daire McNamara <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"Fix some syzbot-detected bugs, as well as other bugs found by I/O
  injection testing.

  Change ext4's fallocate to consistently drop set[ug]id bits when an
  fallocate operation might possibly change the user-visible contents of
  a file.

  Also, improve handling of potentially invalid values in the the
  s_overhead_cluster superblock field to avoid ext4 returning a negative
  number of free blocks"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: fix a potential race while discarding reserved buffers after an abort
  ext4: update the cached overhead value in the superblock
  ext4: force overhead calculation if the s_overhead_cluster makes no sense
  ext4: fix overhead calculation to account for the reserved gdt blocks
  ext4, doc: fix incorrect h_reserved size
  ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
  ext4: fix use-after-free in ext4_search_dir
  ext4: fix bug_on in start_this_handle during umount filesystem
  ext4: fix symlink file size not match to file content
  ext4: fix fallocate to use file_modified to update permissions consistently

Merge tag 'ata-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ATA fix from Damien Le Moal:
"A single fix to avoid a NULL pointer dereference in the pata_marvell
driver with adapters not supporting DMA, from Zheyu"

* tag 'ata-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_marvell: Check the 'bmdma_addr' beforing reading

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"The main and larger change here is a workaround for AMD's lack of
  cache coherency for encrypted-memory guests.

  I have another patch pending, but it's waiting for review from the
  architecture maintainers.

  RISC-V:

   - Remove 's' & 'u' as valid ISA extension

   - Do not allow disabling the base extensions 'i'/'m'/'a'/'c'

  x86:

   - Fix NMI watchdog in guests on AMD

   - Fix for SEV cache incoherency issues

   - Don't re-acquire SRCU lock in complete_emulated_io()

   - Avoid NULL pointer deref if VM creation fails

   - Fix race conditions between APICv disabling and vCPU creation

   - Bugfixes for disabling of APICv

   - Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume

  selftests:

   - Do not use bitfields larger than 32-bits, they differ between GCC
     and clang"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: selftests: introduce and use more page size-related constants
  kvm: selftests: do not use bitfields larger than 32-bits for PTEs
  KVM: SEV: add cache flush to solve SEV cache incoherency issues
  KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs
  KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
  KVM: selftests: Silence compiler warning in the kvm_page_table_test
  KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog
  x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
  KVM: SPDX style and spelling fixes
  KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled
  KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race
  KVM: nVMX: Defer APICv updates while L2 is active until L1 is active
  KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled
  KVM: Initialize debugfs_dentry when a VM is created to avoid NULL deref
  KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused
  KVM: RISC-V: Use kvm_vcpu.srcu_idx, drop RISC-V's unnecessary copy
  KVM: x86: Don't re-acquire SRCU lock in complete_emulated_io()
  RISC-V: KVM: Restrict the extensions that can be disabled
  RISC-V: KVM: Remove 's' & 'u' as valid ISA extension

net: ethernet: stmmac: fix write to sgmii_adapter_base

I made a mistake with the commit a6aaa0032424 ("net: ethernet: stmmac:
fix altr_tse_pcs function when using a fixed-link"). I should have
tested against both scenario of having a SGMII interface and one
without.

Without the SGMII PCS TSE adpater, the sgmii_adapter_base address is
NULL, thus a write to this address will fail.

Cc: [email protected]
Fixes: a6aaa0032424 ("net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link")
Signed-off-by: Dinh Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'wireguard-patches-for-5-18-rc4'

Jason A. Donenfeld says:

====================
wireguard patches for 5.18-rc4

Here are two small wireguard fixes for 5.18-rc4:

1) We enable ACPI in the QEMU test harness, so that multiple CPUs are
   actually used on x86 for testing for races.

2) Sending skbs with metadata dsts attached resulted in a null pointer
   dereference, triggerable from executing eBPF programs. The fix is a
   oneliner, changing a skb_dst() null check into a skb_valid_dst()
   boolean check.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

wireguard: device: check for metadata_dst with skb_valid_dst()

When we try to transmit an skb with md_dst attached through wireguard
we hit a null pointer dereference in wg_xmit() due to the use of
dst_mtu() which calls into dst_blackhole_mtu() which in turn tries to
dereference dst->dev.

Since wireguard doesn't use md_dsts we should use skb_valid_dst(), which
checks for DST_METADATA flag, and if it's set, then falls back to
wireguard's device mtu. That gives us the best chance of transmitting
the packet; otherwise if the blackhole netdev is used we'd get
ETH_MIN_MTU.

[  263.693506] BUG: kernel NULL pointer dereference, address: 00000000000000e0
[  263.693908] #PF: supervisor read access in kernel mode
[  263.694174] #PF: error_code(0x0000) - not-present page
[  263.694424] PGD 0 P4D 0
[  263.694653] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  263.694876] CPU: 5 PID: 951 Comm: mausezahn Kdump: loaded Not tainted 5.18.0-rc1+ #522
[  263.695190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
[  263.695529] RIP: 0010:dst_blackhole_mtu+0x17/0x20
[  263.695770] Code: 00 00 00 0f 1f 44 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 10 48 83 e0 fc 8b 40 04 85 c0 75 09 48 8b 07 <8b> 80 e0 00 00 00 c3 66 90 0f 1f 44 00 00 48 89 d7 be 01 00 00 00
[  263.696339] RSP: 0018:ffffa4a4422fbb28 EFLAGS: 00010246
[  263.696600] RAX: 0000000000000000 RBX: ffff8ac9c3553000 RCX: 0000000000000000
[  263.696891] RDX: 0000000000000401 RSI: 00000000fffffe01 RDI: ffffc4a43fb48900
[  263.697178] RBP: ffffa4a4422fbb90 R08: ffffffff9622635e R09: 0000000000000002
[  263.697469] R10: ffffffff9b69a6c0 R11: ffffa4a4422fbd0c R12: ffff8ac9d18b1a00
[  263.697766] R13: ffff8ac9d0ce1840 R14: ffff8ac9d18b1a00 R15: ffff8ac9c3553000
[  263.698054] FS:  00007f3704c337c0(0000) GS:ffff8acaebf40000(0000) knlGS:0000000000000000
[  263.698470] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  263.698826] CR2: 00000000000000e0 CR3: 0000000117a5c000 CR4: 00000000000006e0
[  263.699214] Call Trace:
[  263.699505]  <TASK>
[  263.699759]  wg_xmit+0x411/0x450
[  263.700059]  ? bpf_skb_set_tunnel_key+0x46/0x2d0
[   263.700382]  ? dev_queue_xmit_nit+0x31/0x2b0
[  263.700719]  dev_hard_start_xmit+0xd9/0x220
[  263.701047]  __dev_queue_xmit+0x8b9/0xd30
[  263.701344]  __bpf_redirect+0x1a4/0x380
[  263.701664]  __dev_queue_xmit+0x83b/0xd30
[  263.701961]  ? packet_parse_headers+0xb4/0xf0
[  263.702275]  packet_sendmsg+0x9a8/0x16a0
[  263.702596]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[  263.702933]  sock_sendmsg+0x5e/0x60
[  263.703239]  __sys_sendto+0xf0/0x160
[  263.703549]  __x64_sys_sendto+0x20/0x30
[  263.703853]  do_syscall_64+0x3b/0x90
[  263.704162]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  263.704494] RIP: 0033:0x7f3704d50506
[  263.704789] Code: 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
[  263.705652] RSP: 002b:00007ffe954b0b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[  263.706141] RAX: ffffffffffffffda RBX: 0000558bb259b490 RCX: 00007f3704d50506
[  263.706544] RDX: 000000000000004a RSI: 0000558bb259b7b2 RDI: 0000000000000003
[  263.706952] RBP: 0000000000000000 R08: 00007ffe954b0b90 R09: 0000000000000014
[  263.707339] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe954b0b90
[  263.707735] R13: 000000000000004a R14: 0000558bb259b7b2 R15: 0000000000000001
[  263.708132]  </TASK>
[  263.708398] Modules linked in: bridge netconsole bonding [last unloaded: bridge]
[  263.708942] CR2: 00000000000000e0

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Link: https://github.com/cilium/cilium/issues/19428
Reported-by: Martynas Pumputis <[email protected]>
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

wireguard: selftests: enable ACPI for SMP

It turns out that by having CONFIG_ACPI=n, we've been failing to boot
additional CPUs, and so these systems were functionally UP. The code
bloat is unfortunate for build times, but I don't see an alternative. So
this commit sets CONFIG_ACPI=y for x86_64 and i686 configs.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

tcp: ensure to use the most recently sent skb when filling the rate sample

If an ACK (s)acks multiple skbs, we favor the information
from the most recently sent skb by choosing the skb with
the highest prior_delivered count. But in the interval
between receiving ACKs, we send multiple skbs with the same
prior_delivered, because the tp->delivered only changes
when we receive an ACK.

We used RACK's solution, copying tcp_rack_sent_after() as
tcp_skb_sent_after() helper to determine "which packet was
sent last?". Later, we will use tcp_skb_sent_after() instead
in RACK.

Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection")
Signed-off-by: Pengcheng Yang <[email protected]>
Cc: Paolo Abeni <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Tested-by: Neal Cardwell <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: realtek: remove realtek,rtl8367s string

There is no need to add new compatible strings for each new supported
chip version. The compatible string is used only to select the subdriver
(rtl8365mb.c or rtl8366rb.c). Once in the subdriver, it will detect the
chip model by itself, ignoring which compatible string was used.

Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Luiz Angelo Daros de Luca <[email protected]>
Reviewed-by: Alvin Šipraga <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Acked-by: Arınç ÜNAL <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

dt-bindings: net: dsa: realtek: cleanup compatible strings

Compatible strings are used to help the driver find the chip ID/version
register for each chip family. After that, the driver can setup the
switch accordingly. Keep only the first supported model for each family
as a compatible string and reference other chip models in the
description.

The removed compatible strings have never been used in a released kernel.

Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Luiz Angelo Daros de Luca <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Acked-by: Arınç ÜNAL <[email protected]>
Reviewed-by: Alvin Šipraga <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: bridge: switchdev: check br_vlan_group() return value

br_vlan_group() can return NULL and thus return value must be checked
to avoid dereferencing a NULL pointer.

Fixes: 6284c723d9b9 ("net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations")
Signed-off-by: Clément Léger <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

pinctrl: stm32: Keep pinctrl block clock enabled when LEVEL IRQ requested

The current EOI handler for LEVEL triggered interrupts calls clk_enable(),
register IO, clk_disable(). The clock manipulation requires locking which
happens with IRQs disabled in clk_enable_lock(). Instead of turning the
clock on and off all the time, enable the clock in case LEVEL interrupt is
requested and keep the clock enabled until all LEVEL interrupts are freed.
The LEVEL interrupts are an exception on this platform and seldom used, so
this does not affect the common case.

This simplifies the LEVEL interrupt handling considerably and also fixes
the following splat found when using preempt-rt:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at kernel/locking/rtmutex.c:2040 __rt_mutex_trylock+0x37/0x62
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.109-rt65-stable-standard-00068-g6a5afc4b1217 #85
Hardware name: STM32 (Device Tree Support)
[<c010a45d>] (unwind_backtrace) from [<c010766f>] (show_stack+0xb/0xc)
[<c010766f>] (show_stack) from [<c06353ab>] (dump_stack+0x6f/0x84)
[<c06353ab>] (dump_stack) from [<c01145e3>] (__warn+0x7f/0xa4)
[<c01145e3>] (__warn) from [<c063386f>] (warn_slowpath_fmt+0x3b/0x74)
[<c063386f>] (warn_slowpath_fmt) from [<c063b43d>] (__rt_mutex_trylock+0x37/0x62)
[<c063b43d>] (__rt_mutex_trylock) from [<c063c053>] (rt_spin_trylock+0x7/0x16)
[<c063c053>] (rt_spin_trylock) from [<c036a2f3>] (clk_enable_lock+0xb/0x80)
[<c036a2f3>] (clk_enable_lock) from [<c036ba69>] (clk_core_enable_lock+0x9/0x18)
[<c036ba69>] (clk_core_enable_lock) from [<c034e9f3>] (stm32_gpio_get+0x11/0x24)
[<c034e9f3>] (stm32_gpio_get) from [<c034ef43>] (stm32_gpio_irq_trigger+0x1f/0x48)
[<c034ef43>] (stm32_gpio_irq_trigger) from [<c014aa53>] (handle_fasteoi_irq+0x71/0xa8)
[<c014aa53>] (handle_fasteoi_irq) from [<c0147111>] (generic_handle_irq+0x19/0x22)
[<c0147111>] (generic_handle_irq) from [<c014752d>] (__handle_domain_irq+0x55/0x64)
[<c014752d>] (__handle_domain_irq) from [<c0346f13>] (gic_handle_irq+0x53/0x64)
[<c0346f13>] (gic_handle_irq) from [<c0100ba5>] (__irq_svc+0x65/0xc0)
Exception stack(0xc0e01f18 to 0xc0e01f60)
1f00: 0000300c 00000000
1f20: 0000300c c010ff01 00000000 00000000 c0e00000 c0e07714 00000001 c0e01f78
1f40: c0e07758 00000000 ef7cd0ff c0e01f68 c010554b c0105542 40000033 ffffffff
[<c0100ba5>] (__irq_svc) from [<c0105542>] (arch_cpu_idle+0xc/0x1e)
[<c0105542>] (arch_cpu_idle) from [<c063be95>] (default_idle_call+0x21/0x3c)
[<c063be95>] (default_idle_call) from [<c01324f7>] (do_idle+0xe3/0x1e4)
[<c01324f7>] (do_idle) from [<c01327b3>] (cpu_startup_entry+0x13/0x14)
[<c01327b3>] (cpu_startup_entry) from [<c0a00c13>] (start_kernel+0x397/0x3d4)
[<c0a00c13>] (start_kernel) from [<00000000>] (0x0)
---[ end trace 0000000000000002 ]---

Power consumption measured on STM32MP157C DHCOM SoM is not increased or
is below noise threshold.

Fixes: 47beed513a85b ("pinctrl: stm32: Add level interrupt support to gpio irq chip")
Signed-off-by: Marek Vasut <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Fabien Dessenne <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: [email protected]
Cc: [email protected]
To: [email protected]
Reviewed-by: Fabien Dessenne <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

tcp: md5: incorrect tcp_header_len for incoming connections

In tcp_create_openreq_child we adjust tcp_header_len for md5 using the
remote address in newsk. But that address is still 0 in newsk at this
point, and it is only set later by the callers (tcp_v[46]_syn_recv_sock).
Use the address from the request socket instead.

Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.")
Signed-off-by: Francesco Ruggeri <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

perf test: Fix error message for test case 71 on s390, where it is not supported

Test case 71 'Convert perf time to TSC' is not supported on s390.

Subtest 71.1 is skipped with the correct message, but subtest 71.2 is
not skipped and fails.

The root cause is function evlist__open() called from
test__perf_time_to_tsc().  evlist__open() returns -ENOENT because the
event cycles:u is not supported by the selected PMU, for example
platform s390 on z/VM or an x86_64 virtual machine.

The PMU driver returns -ENOENT in this case. This error is leads to the
failure.

Fix this by returning TEST_SKIP on -ENOENT.

Output before:
71: Convert perf time to TSC:
71.1: TSC support:             Skip (This architecture does not support)
71.2: Perf time to TSC:        FAILED!

Output after:
71: Convert perf time to TSC:
71.1: TSC support:             Skip (This architecture does not support)
71.2: Perf time to TSC:        Skip (perf_read_tsc_conversion is not supported)

This also happens on an x86_64 virtual machine:
   # uname -m
   x86_64
   $ ./perf test -F 71
    71: Convert perf time to TSC  :
    71.1: TSC support             : Ok
    71.2: Perf time to TSC        : FAILED!
   $

Committer testing:

Continues to work on x86_64:

  $ perf test 71
   71: Convert perf time to TSC    :
   71.1: TSC support               : Ok
   71.2: Perf time to TSC          : Ok
  $

Fixes: 290fa68bdc458863 ("perf test tsc: Fix error message when not supported")
Signed-off-by: Thomas Richter <[email protected]>
Acked-by: Sumanth Korikkar <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Chengdong Li <[email protected]>
Cc: [email protected]
Cc: Heiko Carstens <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event

Since commit bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem
info is not available") "perf mem report" and "perf report --mem-mode"
don't report result if the PERF_SAMPLE_DATA_SRC bit is missed in sample
type.

The commit ffab487052054162 ("perf: arm-spe: Fix perf report
--mem-mode") partially fixes the issue. It adds PERF_SAMPLE_DATA_SRC
bit for Arm SPE event, this allows the perf data file generated by
kernel v5.18-rc1 or later version can be reported properly.

On the other hand, perf tool still fails to be backward compatibility
for a data file recorded by an older version's perf which contains Arm
SPE trace data. This patch is a workaround in reporting phase, when
detects ARM SPE PMU event and without PERF_SAMPLE_DATA_SRC bit, it will
force to set the bit in the sample type and give a warning info.

Fixes: bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem info is not available")
Reviewed-by: James Clark <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Tested-by: German Gomez <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf script: Always allow field 'data_src' for auxtrace

If use command 'perf script -F,+data_src' to dump memory samples with
Arm SPE trace data, it reports error:

# perf script -F,+data_src
Samples for 'dummy:u' event do not have DATA_SRC attribute set. Cannot print 'data_src' field.

This is because the 'dummy:u' event is absent DATA_SRC bit in its sample
type, so if a file contains AUX area tracing data then always allow
field 'data_src' to be selected as an option for perf script.

Fixes: e55ed3423c1bb29f ("perf arm-spe: Synthesize memory event")
Signed-off-by: Leo Yan <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf clang: Fix header include for LLVM >= 14

The header TargetRegistry.h has moved in LLVM/clang 14.

Committer notes:

The problem as noticed when building in ubuntu:22.04:

    90    98.61 ubuntu:22.04                  : FAIL gcc version 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
      util/c++/clang.cpp:23:10: fatal error: llvm/Support/TargetRegistry.h: No such file or directory
         23 | #include "llvm/Support/TargetRegistry.h"
            |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      compilation terminated.

Fixed after applying this patch.

Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Guilherme Amadio <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Link: https://twitter.com/GuilhermeAmadio/status/1514970524232921088
Link: http://lore.kernel.org/lkml/Ylp0M/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

pinctrl: rockchip: sort the rk3308_mux_recalced_data entries

All the entries are sorted according to num/pin except for two
entries. Sort them too.

Signed-off-by: Luca Ceresoli <[email protected]>
Reviewed-by: Heiko Stuebner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

pinctrl: rockchip: fix RK3308 pinmux bits

Some of the pinmuxing bits described in rk3308_mux_recalced_data are wrong,
pointing to non-existing registers.

Fix the entire table.

Also add a comment in front of each entry with the same string that appears
in the datasheet to make the table easier to compare with the docs.

This fix has been tested on real hardware for the gpio3b3_sel entry.

Fixes: 7825aeb7b208 ("pinctrl: rockchip: add rk3308 SoC support")
Signed-off-by: Luca Ceresoli <[email protected]>
Reviewed-by: Heiko Stuebner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>

Merge tag 'samsung-pinctrl-fixes-5.18' of https://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/samsung into fixes

Samsung pinctrl drivers fixes for v5.18

1. Fix sparse warning introduced in v5.18-rc1.
2. Fix possible unmet Kconfig dependency with COMPILE_TEST, present
since v4.3 or earlier.

gpio: Request interrupts after IRQ is initialized

Commit 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members
before initialization") attempted to fix a race condition that lead to a
NULL pointer, but in the process caused a regression for _AEI/_EVT
declared GPIOs.

This manifests in messages showing deferred probing while trying to
allocate IRQs like so:

  amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x0000 to IRQ, err -517
  amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x002C to IRQ, err -517
  amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x003D to IRQ, err -517
  [ .. more of the same .. ]

The code for walking _AEI doesn't handle deferred probing and so this
leads to non-functional GPIO interrupts.

Fix this issue by moving the call to `acpi_gpiochip_request_interrupts`
to occur after gc->irc.initialized is set.

Fixes: 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members before initialization")
Link: https://lore.kernel.org/linux-gpio/BL1PR12MB51577A77F000A008AA694675E2EF9@BL1PR12MB5157.namprd12.prod.outlook.com/
Link: https://bugzilla.suse.com/show_bug.cgi?id=1198697
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215850
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1979
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1976
Reported-by: Mario Limonciello <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Reviewed-by: Shreeya Patel <[email protected]>
Tested-By: Samuel Čavoj <[email protected]>
Tested-By: [email protected] Link:
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Linus Walleij <[email protected]>
Reviewed-and-tested-by: Takashi Iwai <[email protected]>
Cc: Shreeya Patel <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'riscv-for-linus-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes Palmer Dabbelt:

- A pair of build fixes for the recent cpuidle driver

- A fix for systems without sv57 that manifests as a crash
   early in boot

* tag 'riscv-for-linus-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: cpuidle: fix Kconfig select for RISCV_SBI_CPUIDLE
  RISC-V: mm: Fix set_satp_mode() for platform not having Sv57
  cpuidle: riscv: support non-SMP config

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"There's no real pattern to the fixes, but the main one fixes our
  pmd_leaf() definition to resolve a NULL dereference on the migration
  path.

   - Fix PMU event validation in the absence of any event counters

   - Fix allmodconfig build using clang in conjunction with binutils

   - Fix definitions of pXd_leaf() to handle PROT_NONE entries

   - More typo fixes"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: fix p?d_leaf()
  arm64: fix typos in comments
  arm64: Improve HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang
  arm_pmu: Validate single/group leader events

arm/xen: Fix some refcount leaks

The of_find_compatible_node() function returns a node pointer with
refcount incremented, We should use of_node_put() on it when done
Add the missing of_node_put() to release the refcount.

Fixes: 9b08aaa3199a ("ARM: XEN: Move xen_early_init() before efi_init()")
Fixes: b2371587fe0c ("arm/xen: Read extended regions from DT and init Xen resource")
Signed-off-by: Miaoqian Lin <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>

Merge tag 'xarray-5.18a' of git://git.infradead.org/users/willy/xarray

Pull xarray fixes from Matthew Wilcox:
"Syzbot found a nasty race between large page splitting and page
  lookup. Details in the commit log, but fortunately it has a reliable
  reproducer. I thought it better to send this one to you straight away.

  Also fix the test suite build for kmem_cache_alloc_lru()"

* tag 'xarray-5.18a' of git://git.infradead.org/users/willy/xarray:
  XArray: Disallow sibling entries of nodes
  tools: Add kmem_cache_alloc_lru()

Merge tag '5.18-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"Four fixes, two of them for stable:

   - fcollapse fix

   - reconnect lock fix

   - DFS oops fix

   - minor cleanup patch"

* tag '5.18-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: destage any unwritten data to the server before calling copychunk_write
  cifs: use correct lock type in cifs_reconnect()
  cifs: fix NULL ptr dereference in refresh_mounts()
  cifs: Use kzalloc instead of kmalloc/memset

Merge tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull mount_setattr fix from Christian Brauner:
"The recent cleanup in e257039f0fc7 ("mount_setattr(): clean the
  control flow and calling conventions") switched the mount attribute
  codepaths from do-while to for loops as they are more idiomatic when
  walking mounts.

  However, we did originally choose do-while constructs because if we
  request a mount or mount tree to be made read-only we need to hold
  writers in the following way: The mount attribute code will grab
  lock_mount_hash() and then call mnt_hold_writers() which will
  _unconditionally_ set MNT_WRITE_HOLD on the mount.

  Any callers that need write access have to call mnt_want_write(). They
  will immediately see that MNT_WRITE_HOLD is set on the mount and the
  caller will then either spin (on non-preempt-rt) or wait on
  lock_mount_hash() (on preempt-rt).

  The fact that MNT_WRITE_HOLD is set unconditionally means that once
  mnt_hold_writers() returns we need to _always_ pair it with
  mnt_unhold_writers() in both the failure and success paths.

  The do-while constructs did take care of this. But Al's change to a
  for loop in the failure path stops on the first mount we failed to
  change mount attributes _without_ going into the loop to call
  mnt_unhold_writers().

  This in turn means that once we failed to make a mount read-only via
  mount_setattr() - i.e. there are already writers on that mount - we
  will block any writers indefinitely. Fix this by ensuring that the for
  loop always unsets MNT_WRITE_HOLD including the first mount we failed
  to change to read-only. Also sprinkle a few comments into the cleanup
  code to remind people about what is happening including myself. After
  all, I didn't catch it during review.

  This is only relevant on mainline and was reported by syzbot. Details
  about the syzbot reports are all in the commit message"

* tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: unset MNT_WRITE_HOLD on failure

Merge tag 'sound-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"At this time, the majority of changes are for pending ASoC fixes while
  a few usual HD-audio and USB-audio quirks are found.

  Almost all patches are small device-specific fixes, and nothing
  worrisome stands out, so far"

* tag 'sound-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (37 commits)
  ALSA: hda/realtek: Add quirk for Clevo NP70PNP
  ALSA: hda: intel-dsp-config: Add RaptorLake PCI IDs
  ALSA: hda/realtek: Enable mute/micmute LEDs and limit mic boost on EliteBook 845/865 G9
  ALSA: usb-audio: Clear MIDI port active flag after draining
  ALSA: usb-audio: add mapping for MSI MAG X570S Torpedo MAX.
  ALSA: hda/i915: Fix one too many pci_dev_put()
  ALSA: hda/hdmi: add HDMI codec VID for Raptorlake-P
  ALSA: hda/hdmi: fix warning about PCM count when used with SOF
  sound/oss/dmasound: fix 'dmasound_setup' defined but not used
  firmware: cs_dsp: Fix overrun of unterminated control name string
  ASoC: codecs: Fix an error handling path in (rx|tx|va)_macro_probe()
  ASoC: Intel: sof_es8336: Add a quirk for Huawei Matebook D15
  ASoC: Intel: sof_es8336: add a quirk for headset at mic1 port
  ASoC: Intel: sof_es8336: support a separate gpio to control headphone
  ASoC: Intel: sof_es8336: simplify speaker gpio naming
  ASoC: wm8731: Disable the regulator when probing fails
  ASoC: Intel: soc-acpi: correct device endpoints for max98373
  ASoC: codecs: wcd934x: do not switch off SIDO Buck when codec is in use
  ASoC: SOF: topology: Fix memory leak in sof_control_load()
  ASoC: SOF: topology: cleanup dailinks on widget unload
  ...

XArray: Disallow sibling entries of nodes

There is a race between xas_split() and xas_load() which can result in
the wrong page being returned, and thus data corruption.  Fortunately,
it's hard to hit (syzbot took three months to find it) and often guarded
with VM_BUG_ON().

The anatomy of this race is:

thread A thread B
order-9 page is stored at index 0x200
lookup of page at index 0x274
page split starts
load of sibling entry at offset 9
stores nodes at offsets 8-15
load of entry at offset 8

The entry at offset 8 turns out to be a node, and so we descend into it,
and load the page at index 0x234 instead of 0x274.  This is hard to fix
on the split side; we could replace the entire node that contains the
order-9 page instead of replacing the eight entries.  Fixing it on
the lookup side is easier; just disallow sibling entries that point
to nodes.  This cannot ever be a useful thing as the descent would not
know the correct offset to use within the new node.

The test suite continues to pass, but I have not added a new test for
this bug.

Reported-by: [email protected]
Tested-by: [email protected]
Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

tools: Add kmem_cache_alloc_lru()

Turn kmem_cache_alloc() into a wrapper around kmem_cache_alloc_lru().

Fixes: 9bbdc0f32409 ("xarray: use kmem_cache_alloc_lru to allocate xa_node")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reported-by: Liam R. Howlett <[email protected]>
Reported-by: Li Wang <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"13 patches.

  Subsystems affected by this patch series: mm (memory-failure, memcg,
  userfaultfd, hugetlbfs, mremap, oom-kill, kasan, hmm), and kcov"

* emailed patches from Andrew Morton <[email protected]>:
  mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()
  kcov: don't generate a warning on vm_insert_page()'s failure
  MAINTAINERS: add Vincenzo Frascino to KASAN reviewers
  oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
  selftest/vm: add skip support to mremap_test
  selftest/vm: support xfail in mremap_test
  selftest/vm: verify remap destination address in mremap_test
  selftest/vm: verify mmap addr in mremap_test
  mm, hugetlb: allow for "high" userspace addresses
  userfaultfd: mark uffd_wp regardless of VM_WRITE flag
  memcg: sync flush only if periodic flush is delayed
  mm/memory-failure.c: skip huge_zero_page in memory_failure()
  mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()

mm/vmalloc: huge vmalloc backing pages should be split rather than compound

Huge vmalloc higher-order backing pages were allocated with __GFP_COMP
in order to allow the sub-pages to be refcounted by callers such as
"remap_vmalloc_page [sic]" (remap_vmalloc_range).

However a similar problem exists for other struct page fields callers
use, for example fb_deferred_io_fault() takes a vmalloc'ed page and
not only refcounts it but uses ->lru, ->mapping, ->index.

This is not compatible with compound sub-pages, and can cause bad page
state issues like

  BUG: Bad page state in process swapper/0  pfn:00743
  page:(____ptrval____) refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x743
  flags: 0x7ffff000000000(node=0|zone=0|lastcpupid=0x7ffff)
  raw: 007ffff000000000 c00c00000001d0c8 c00c00000001d0c8 0000000000000000
  raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: corrupted mapping in tail page
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc3-00082-gfc6fff4a7ce1-dirty #2810
  Call Trace:
    dump_stack_lvl+0x74/0xa8 (unreliable)
    bad_page+0x12c/0x170
    free_tail_pages_check+0xe8/0x190
    free_pcp_prepare+0x31c/0x4e0
    free_unref_page+0x40/0x1b0
    __vunmap+0x1d8/0x420
    ...

The correct approach is to use split high-order pages for the huge
vmalloc backing. These allow callers to treat them in exactly the same
way as individually-allocated order-0 pages.

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Nicholas Piggin <[email protected]>
Cc: Paul Menzel <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Rick Edgecombe <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook

xmit_check_hhlen() observes the dst for getting the device hard header
length to make sure a modified packet can fit. When a helper which changes
the dst - such as bpf_skb_set_tunnel_key() - is called as part of the
xmit program the accessed dst is no longer valid.

This leads to the following splat:

BUG: kernel NULL pointer dereference, address: 00000000000000de
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 798 Comm: ping Not tainted 5.18.0-rc2+ #103
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
RIP: 0010:bpf_xmit+0xfb/0x17f
Code: c6 c0 4d cd 8e 48 c7 c7 7d 33 f0 8e e8 42 09 fb ff 48 8b 45 58 48 8b 95 c8 00 00 00 48 2b 95 c0 00 00 00 48 83 e0 fe 48 8b 00 <0f> b7 80 de 00 00 00 39 c2 73 22 29 d0 b9 20 0a 00 00 31 d2 48 89
RSP: 0018:ffffb148c0bc7b98 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000240008 RCX: 0000000000000000
RDX: 0000000000000010 RSI: 00000000ffffffea RDI: 00000000ffffffff
RBP: ffff922a828a4e00 R08: ffffffff8f1350e8 R09: 00000000ffffdfff
R10: ffffffff8f055100 R11: ffffffff8f105100 R12: 0000000000000000
R13: ffff922a828a4e00 R14: 0000000000000040 R15: 0000000000000000
FS:  00007f414e8f0080(0000) GS:ffff922afdc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000de CR3: 0000000002d80006 CR4: 0000000000370ef0
Call Trace:
  <TASK>
  lwtunnel_xmit.cold+0x71/0xc8
  ip_finish_output2+0x279/0x520
  ? __ip_finish_output.part.0+0x21/0x130

Fix by fetching the device hard header length before running the BPF code.

Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
Signed-off-by: Eyal Birger <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

riscv: patch_text: Fixup last cpu should be master

These patch_text implementations are using stop_machine_cpuslocked
infrastructure with atomic cpu_count. The original idea: When the
master CPU patch_text, the others should wait for it. But current
implementation is using the first CPU as master, which couldn't
guarantee the remaining CPUs are waiting. This patch changes the
last CPU as the master to solve the potential risk.

Signed-off-by: Guo Ren <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Fixes: 043cb41a85de ("riscv: introduce interfaces to patch kernel code")
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

binder: Gracefully handle BINDER_TYPE_FDA objects with num_fds=0

Some android userspace is sending BINDER_TYPE_FDA objects with
num_fds=0. Like the previous patch, this is reproducible when
playing a video.

Before commit 09184ae9b575 BINDER_TYPE_FDA objects with num_fds=0
were 'correctly handled', as in no fixup was performed.

After commit 09184ae9b575 we aggregate fixup and skip regions in
binder_ptr_fixup structs and distinguish between the two by using
the skip_size field: if it's 0, then it's a fixup, otherwise skip.
When processing BINDER_TYPE_FDA objects with num_fds=0 we add a
skip region of skip_size=0, and this causes issues because now
binder_do_deferred_txn_copies will think this was a fixup region.

To address that, return early from binder_translate_fd_array to
avoid adding an empty skip region.

Fixes: 09184ae9b575 ("binder: defer copies of pre-patched txn data")
Acked-by: Todd Kjos <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Alessandro Astone <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

binder: Address corner cases in deferred copy and fixup

When handling BINDER_TYPE_FDA object we are pushing a parent fixup
with a certain skip_size but no scatter-gather copy object, since
the copy is handled standalone.
If BINDER_TYPE_FDA is the last children the scatter-gather copy
loop will never stop to skip it, thus we are left with an item in
the parent fixup list. This will trigger the BUG_ON().

This is reproducible in android when playing a video.
We receive a transaction that looks like this:
    obj[0] BINDER_TYPE_PTR, parent
    obj[1] BINDER_TYPE_PTR, child
    obj[2] BINDER_TYPE_PTR, child
    obj[3] BINDER_TYPE_FDA, child

Fixes: 09184ae9b575 ("binder: defer copies of pre-patched txn data")
Acked-by: Todd Kjos <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Alessandro Astone <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

thermal/governor: Remove deprecated information

The userspace governor is still in use on production systems and the
deprecating warning is scary.

Even if we want to get rid of the userspace governor, it is too soon
yet as the alternatives are not yet adopted.

Change the deprecated warning by an information message suggesting to
switch to the netlink thermal events.

Fixes: 0275c9fb0eff ("thermal/core: Make the userspace governor deprecated")
Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Revert "thermal/core: Deprecate changing cooling device state from userspace"

This reverts commit a67a46af4ad6342378e332b7420c1d1a2818c53f.

It has been reported the warning is annoying as the cooling device
state is still needed on some production system.

Meanwhile we provide a way to consolidate the thermal framework to
prevent multiple actors acting on the cooling devices with conflicting
decisions, let's revert this warning.

Signed-off-by: Daniel Lezcano <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

serial: 8250: Correct the clock for EndRun PTP/1588 PCIe device

The EndRun PTP/1588 dual serial port device is based on the Oxford
Semiconductor OXPCIe952 UART device with the PCI vendor:device ID set
for EndRun Technologies and is therefore driven by a fixed 62.5MHz clock
input derived from the 100MHz PCI Express clock. The clock rate is
divided by the oversampling rate of 16 as it is supplied to the baud
rate generator, yielding the baud base of 3906250.

Replace the incorrect baud base of 4000000 with the right value of
3906250 then, complementing commit 6cbe45d8ac93 ("serial: 8250: Correct
the clock for OxSemi PCIe devices").

Signed-off-by: Maciej W. Rozycki <[email protected]>
Cc: stable <[email protected]>
Fixes: 1bc8cde46a159 ("8250_pci: Added driver for Endrun Technologies PTP PCIe card.")
Reviewed-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: 8250: Also set sticky MCR bits in console restoration

Sticky MCR bits are lost in console restoration if console suspending
has been disabled. This currently affects the AFE bit, which works in
combination with RTS which we set, so we want to make sure the UART
retains control of its FIFO where previously requested. Also specific
drivers may need other bits in the future.

Signed-off-by: Maciej W. Rozycki <[email protected]>
Fixes: 4516d50aabed ("serial: 8250: Use canary to restart console after suspend")
Cc: [email protected] # v4.0+
Reviewed-by: Andy Shevchenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: n_gsm: fix software flow control handling

n_gsm is based on the 3GPP 07.010 and its newer version is the 3GPP 27.010.
See https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1516
The changes from 07.010 to 27.010 are non-functional. Therefore, I refer to
the newer 27.010 here. Chapter 5.4.8.1 states that XON/XOFF characters
shall be used instead of Fcon/Fcoff command in advanced option mode to
handle flow control. Chapter 5.4.8.2 describes how XON/XOFF characters
shall be handled. Basic option mode only used Fcon/Fcoff commands and no
XON/XOFF characters. These are treated as data bytes here.
The current implementation uses the gsm_mux field 'constipated' to handle
flow control from the remote peer and the gsm_dlci field 'constipated' to
handle flow control from each DLCI. The later is unrelated to this patch.
The gsm_mux field is correctly set for Fcon/Fcoff commands in
gsm_control_message(). However, the same is not true for XON/XOFF
characters in gsm1_receive().
Disable software flow control handling in the tty to allow explicit
handling by n_gsm.
Add the missing handling in advanced option mode for gsm_mux in
gsm1_receive() to comply with the standard.

This patch depends on the following commit:
Commit 8838b2af23ca ("tty: n_gsm: fix SW flow control encoding/handling")

Fixes: e1eaea46bb40 ("tty: n_gsm line discipline")
Cc: [email protected]
Signed-off-by: Daniel Starke <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: n_gsm: fix invalid use of MSC in advanced option

n_gsm is based on the 3GPP 07.010 and its newer version is the 3GPP 27.010.
See https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1516
The changes from 07.010 to 27.010 are non-functional. Therefore, I refer to
the newer 27.010 here. Chapter 5.4.6.3.7 states that the Modem Status
Command (MSC) shall only be used if the basic option was chosen.
The current implementation uses MSC frames even if advanced option was
chosen to inform the peer about modem line state updates. A standard
conform peer may choose to discard these frames in advanced option mode.
Furthermore, gsmtty_modem_update() is not part of the 'tty_operations'
functions despite its name.
Rename gsmtty_modem_update() to gsm_modem_update() to clarify this. Split
its function into gsm_modem_upd_via_data() and gsm_modem_upd_via_msc()
depending on the encoding and adaption. Introduce gsm_dlci_modem_output()
as adaption of gsm_dlci_data_output() to encode and queue empty frames in
advanced option mode. Use it in gsm_modem_upd_via_data().
gsm_modem_upd_via_msc() is based on the initial gsmtty_modem_update()
function which used only MSC frames to update modem states.

Fixes: e1eaea46bb40 ("tty: n_gsm line discipline")
Cc: [email protected]
Signed-off-by: Daniel Starke <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: n_gsm: fix broken virtual tty handling

Dynamic virtual tty registration was introduced to allow the user to handle
these cases with uevent rules. The following commits relate to this:
Commit 5b87686e3203 ("tty: n_gsm: Modify gsmtty driver register method when config requester")
Commit 0b91b5332368 ("tty: n_gsm: Save dlci address open status when config requester")
Commit 46292622ad73 ("tty: n_gsm: clean up indenting in gsm_queue()")

However, the following behavior can be seen with this implementation:
- n_gsm ldisc is activated via ioctl
- all configuration parameters are set to their default value (initiator=0)
- the mux gets activated and attached and gsmtty0 is being registered in
  in gsm_dlci_open() after DLCI 0 was established (DLCI 0 is the control
  channel)
- the user configures n_gsm via ioctl GSMIOC_SETCONF as initiator
- this re-attaches the n_gsm mux
- no new gsmtty devices are registered in gsmld_attach_gsm() because the
  mux is already active
- the initiator side registered only the control channel as gsmtty0
  (which should never happen) and no user channel tty

The commits above make it impossible to operate the initiator side as no
user channel tty is or will be available.
On the other hand, this behavior will make it also impossible to allow DLCI
parameter negotiation on responder side in the future. The responder side
first needs to provide a device for the application before the application
can set its parameters of the associated DLCI via ioctl.
Note that the user application is still able to detect a link establishment
without relaying to uevent by waiting for DTR open on responder side. This
is the same behavior as on a physical serial interface. And on initiator
side a tty hangup can be detected if a link establishment request failed.

Revert the commits above completely to always register all user channels
and no control channel after mux attachment. No other changes are made.

Fixes: 5b87686e3203 ("tty: n_gsm: Modify gsmtty driver register method when config requester")
Cc: [email protected]
Signed-off-by: Daniel Starke <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Revert "serial: sc16is7xx: Clear RS485 bits in the shutdown"

This reverts commit 927728a34f11b5a27f4610bdb7068317d6fdc72a.

Once the uart_port->rs485->flag is set to SER_RS485_ENABLED, the port
should always work in RS485 mode. If users want the port to leave
RS485 mode, they need to call ioctl() to clear SER_RS485_ENABLED.

So here we shouldn't clear the RS485 bits in the shutdown().

Fixes: 927728a34f11 ("serial: sc16is7xx: Clear RS485 bits in the shutdown")
Signed-off-by: Hui Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge tag 'icc-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-next

Georgi writes:

interconnect fixes for v5.18

This contains a fix for a reported issue on sc7180 platforms, where
one of the resources has been incorrectly modelled as both clock and
interconnect, which is causing a crash when both frameworks try to
manage it. Fix the same issue also on another platform that appears
to be affected by the same.

- interconnect: qcom: sc7180: Drop IP0 interconnects
- interconnect: qcom: sdx55: Drop IP0 interconnects

Signed-off-by: Georgi Djakov <[email protected]>
* tag 'icc-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
interconnect: qcom: sdx55: Drop IP0 interconnects
interconnect: qcom: sc7180: Drop IP0 interconnects

Merge tag 'phy-fixes-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy into char-misc-linus

Vinod writes:

phy: fixes for 5.18

Fixes for bunch of drivers:
- TI fixes for runtime disable, missing of_node_put and error handling
- Samsung fixes for device_put and of_node_put
- Amlogic error path handling

* tag 'phy-fixes-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: amlogic: fix error path in phy_g12a_usb3_pcie_probe()
  phy: ti: Add missing pm_runtime_disable() in serdes_am654_probe
  phy: mapphone-mdm6600: Fix PM error handling in phy_mdm6600_probe
  phy: ti: omap-usb2: Fix error handling in omap_usb2_enable_clocks
  phy: ti: tusb1210: Fix an error handling path in tusb1210_probe()
  phy: samsung: exynos5250-sata: fix missing device put in probe error paths
  phy: samsung: Fix missing of_node_put() in exynos_sata_phy_probe
  phy: ti: Fix missing of_node_put in ti_pipe3_get_sysctrl()
  phy: ti: tusb1210: Make tusb1210_chg_det_states static

netfilter: nft_set_rbtree: overlap detection with element re-addition after deletion

This patch fixes spurious EEXIST errors.

Extend d2df92e98a34 ("netfilter: nft_set_rbtree: handle element
re-addition after deletion") to deal with elements with same end flags
in the same transation.

Reset the overlap flag as described by 7c84d41416d8 ("netfilter:
nft_set_rbtree: Detect partial overlaps on insertion").

Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
Fixes: d2df92e98a34 ("netfilter: nft_set_rbtree: handle element re-addition after deletion")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Reviewed-by: Stefano Brivio <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge tag 'mhi-fixes-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-linus

Manivannan writes:

MHI fixes for v5.18

Couple of patches fixing the hibernation issue seen on MHI endpoint devices like
SDX65 modems:

- During hibernation, the host puts the device into D3cold after thaw() stage.
But at that time, the device would be in M0 state. So the device emits a
warning (not visible to the host but to device firmware only) stating invalid
transition. This is fixed by adding a poweroff() callback that puts the device
into M3 before D3cold.

- There is a possibility that the recovery worker might be running while trying
to powerdown the device. So flush the recovery worker before that.

* tag 'mhi-fixes-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi:
bus: mhi: host: pci_generic: Flush recovery worker during freeze
bus: mhi: host: pci_generic: Add missing poweroff() PM callback

usb: dwc3: core: Only handle soft-reset in DCTL

Make sure not to set run_stop bit or link state change request while
initiating soft-reset. Register read-modify-write operation may
unintentionally start the controller before the initialization completes
with its previous DCTL value, which can cause initialization failure.

Fixes: f59dcab17629 ("usb: dwc3: core: improve reset sequence")
Cc: <[email protected]>
Signed-off-by: Thinh Nguyen <[email protected]>
Link: https://lore.kernel.org/r/6aecbd78328f102003d40ccf18ceeebd411d3703.1650594792.git.Thinh.Nguyen@synopsys.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: dsa: Add missing of_node_put() in dsa_port_link_register_of

The device_node pointer is returned by of_parse_phandle() with refcount
incremented. We should use of_node_put() on it when done.
of_node_put() will check for NULL value.

Fixes: a20f997010c4 ("net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed")
Signed-off-by: Miaoqian Lin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

regulator: dt-bindings: Revise the rt5190a buck/ldo description

Revise the rt5190a bucks and ldo property description.

Signed-off-by: ChiYuan Huang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

arm64: mm: fix p?d_leaf()

The pmd_leaf() is used to test a leaf mapped PMD, however, it misses
the PROT_NONE mapped PMD on arm64. Fix it. A real world issue [1]
caused by this was reported by Qian Cai. Also fix pud_leaf().

Link: https://patchwork.kernel.org/comment/24798260/
Fixes: 8aa82df3c123 ("arm64: mm: add p?d_leaf() definitions")
Reported-by: Qian Cai <[email protected]>
Signed-off-by: Muchun Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

objtool: Fix code relocs vs weak symbols

Occasionally objtool driven code patching (think .static_call_sites
.retpoline_sites etc..) goes sideways and it tries to patch an
instruction that doesn't match.

Much head-scatching and cursing later the problem is as outlined below
and affects every section that objtool generates for us, very much
including the ORC data. The below uses .static_call_sites because it's
convenient for demonstration purposes, but as mentioned the ORC
sections, .retpoline_sites and __mount_loc are all similarly affected.

Consider:

foo-weak.c:

  extern void __SCT__foo(void);

  __attribute__((weak)) void foo(void)
  {
  return __SCT__foo();
  }

foo.c:

  extern void __SCT__foo(void);
  extern void my_foo(void);

  void foo(void)
  {
  my_foo();
  return __SCT__foo();
  }

These generate the obvious code
(gcc -O2 -fcf-protection=none -fno-asynchronous-unwind-tables -c foo*.c):

foo-weak.o:
0000000000000000 <foo>:
   0:   e9 00 00 00 00          jmpq   5 <foo+0x5>      1: R_X86_64_PLT32       __SCT__foo-0x4

foo.o:
0000000000000000 <foo>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   e8 00 00 00 00          callq  9 <foo+0x9>      5: R_X86_64_PLT32       my_foo-0x4
   9:   48 83 c4 08             add    $0x8,%rsp
   d:   e9 00 00 00 00          jmpq   12 <foo+0x12>    e: R_X86_64_PLT32       __SCT__foo-0x4

Now, when we link these two files together, you get something like
(ld -r -o foos.o foo-weak.o foo.o):

foos.o:
0000000000000000 <foo-0x10>:
   0:   e9 00 00 00 00          jmpq   5 <foo-0xb>      1: R_X86_64_PLT32       __SCT__foo-0x4
   5:   66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
   f:   90                      nop

0000000000000010 <foo>:
  10:   48 83 ec 08             sub    $0x8,%rsp
  14:   e8 00 00 00 00          callq  19 <foo+0x9>     15: R_X86_64_PLT32      my_foo-0x4
  19:   48 83 c4 08             add    $0x8,%rsp
  1d:   e9 00 00 00 00          jmpq   22 <foo+0x12>    1e: R_X86_64_PLT32      __SCT__foo-0x4

Noting that ld preserves the weak function text, but strips the symbol
off of it (hence objdump doing that funny negative offset thing). This
does lead to 'interesting' unused code issues with objtool when ran on
linked objects, but that seems to be working (fingers crossed).

So far so good.. Now lets consider the objtool static_call output
section (readelf output, old binutils):

foo-weak.o:

Relocation section '.rela.static_call_sites' at offset 0x2c8 contains 1 entry:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000000  0000000200000002 R_X86_64_PC32          0000000000000000 .text + 0
0000000000000004  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1

foo.o:

Relocation section '.rela.static_call_sites' at offset 0x310 contains 2 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000000  0000000200000002 R_X86_64_PC32          0000000000000000 .text + d
0000000000000004  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1

foos.o:

Relocation section '.rela.static_call_sites' at offset 0x430 contains 4 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000000  0000000100000002 R_X86_64_PC32          0000000000000000 .text + 0
0000000000000004  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1
0000000000000008  0000000100000002 R_X86_64_PC32          0000000000000000 .text + 1d
000000000000000c  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1

So we have two patch sites, one in the dead code of the weak foo and one
in the real foo. All is well.

*HOWEVER*, when the toolchain strips unused section symbols it
generates things like this (using new enough binutils):

foo-weak.o:

Relocation section '.rela.static_call_sites' at offset 0x2c8 contains 1 entry:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000000  0000000200000002 R_X86_64_PC32          0000000000000000 foo + 0
0000000000000004  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1

foo.o:

Relocation section '.rela.static_call_sites' at offset 0x310 contains 2 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000000  0000000200000002 R_X86_64_PC32          0000000000000000 foo + d
0000000000000004  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1

foos.o:

Relocation section '.rela.static_call_sites' at offset 0x430 contains 4 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000000000  0000000100000002 R_X86_64_PC32          0000000000000000 foo + 0
0000000000000004  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1
0000000000000008  0000000100000002 R_X86_64_PC32          0000000000000000 foo + d
000000000000000c  0000000d00000002 R_X86_64_PC32          0000000000000000 __SCT__foo + 1

And now we can see how that foos.o .static_call_sites goes side-ways, we
now have _two_ patch sites in foo. One for the weak symbol at foo+0
(which is no longer a static_call site!) and one at foo+d which is in
fact the right location.

This seems to happen when objtool cannot find a section symbol, in which
case it falls back to any other symbol to key off of, however in this
case that goes terribly wrong!

As such, teach objtool to create a section symbol when there isn't
one.

Fixes: 44f6a7c0755d ("objtool: Fix seg fault with Clang non-section symbols")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

objtool: Fix type of reloc::addend

Elf{32,64}_Rela::r_addend is of type: Elf{32,64}_Sword, that means
that our reloc::addend needs to be long or face tuncation issues when
we do elf_rebuild_reloc_section():

- 107: 48 b8 00 00 00 00 00 00 00 00 movabs $0x0,%rax 109: R_X86_64_64 level4_kernel_pgt+0x80000067
+ 107: 48 b8 00 00 00 00 00 00 00 00 movabs $0x0,%rax 109: R_X86_64_64 level4_kernel_pgt-0x7fffff99

Fixes: 627fce14809b ("objtool: Add ORC unwind table generation")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

net: cosa: fix error check return value of register_chrdev()

If major equal 0, register_chrdev() returns error code when it fails.
This function dynamically allocate a major and return its number on
success, so we should use "< 0" to check it instead of "!".

Reported-by: Zeal Robot <[email protected]>
Signed-off-by: Lv Ruyi <[email protected]>
Acked-By: Jan "Yenya" Kasprzak <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'drm-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Extra quiet after Easter, only have minor i915 and msm pulls. However
  I haven't seen a PR from our misc tree in a little while, I've cc'ed
  all the suspects. Once that unblocks I expect a bit larger bunch of
  patches to arrive.

  Otherwise as I said, one msm revert and two i915 fixes.

  msm:

   - revert iommu change that broke some platforms.

  i915:

   - Unset enable_psr2_sel_fetch if PSR2 detection fails

   - Fix to detect when VRR is turned off from panel settings"

* tag 'drm-fixes-2022-04-22' of git://anongit.freedesktop.org/drm/drm:
  drm/i915/display/psr: Unset enable_psr2_sel_fetch if other checks in intel_psr2_config_valid() fails
  drm/msm: Revert "drm/msm: Stop using iommu_present()"
  drm/i915/display/vrr: Reset VRR capable property on a long hpd

mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()

In some cases it is possible for mmu_interval_notifier_remove() to race
with mn_tree_inv_end() allowing it to return while the notifier data
structure is still in use.  Consider the following sequence:

  CPU0 - mn_tree_inv_end()            CPU1 - mmu_interval_notifier_remove()
  ----------------------------------- ------------------------------------
                                      spin_lock(subscriptions->lock);
                                      seq = subscriptions->invalidate_seq;
  spin_lock(subscriptions->lock);     spin_unlock(subscriptions->lock);
  subscriptions->invalidate_seq++;
                                      wait_event(invalidate_seq != seq);
                                      return;
  interval_tree_remove(interval_sub); kfree(interval_sub);
  spin_unlock(subscriptions->lock);
  wake_up_all();

As the wait_event() condition is true it will return immediately.  This
can lead to use-after-free type errors if the caller frees the data
structure containing the interval notifier subscription while it is
still on a deferred list.  Fix this by taking the appropriate lock when
reading invalidate_seq to ensure proper synchronisation.

I observed this whilst running stress testing during some development.
You do have to be pretty unlucky, but it leads to the usual problems of
use-after-free (memory corruption, kernel crash, difficult to diagnose
WARN_ON, etc).

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 99cb252f5e68 ("mm/mmu_notifier: add an interval tree notifier")
Signed-off-by: Alistair Popple <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Cc: Christian König <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kcov: don't generate a warning on vm_insert_page()'s failure

vm_insert_page()'s failure is not an unexpected condition, so don't do
WARN_ONCE() in such a case.

Instead, print a kernel message and just return an error code.

This flaw has been reported under an OOM condition by sysbot [1].

The message is mainly for the benefit of the test log, in this case the
fuzzer's log so that humans inspecting the log can figure out what was
going on. KCOV is a testing tool, so I think being a little more chatty
when KCOV unexpectedly is about to fail will save someone debugging
time.

We don't want the WARN, because it's not a kernel bug that syzbot should
report, and failure can happen if the fuzzer tries hard enough (as
above).

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b3d7fe86fbd0 ("kcov: properly handle subsequent mmap calls"),
Signed-off-by: Aleksandr Nogikh <[email protected]>
Acked-by: Marco Elver <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Taras Madan <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: add Vincenzo Frascino to KASAN reviewers

Add my email address to KASAN reviewers list to make sure that I am
Cc'ed in all the KASAN changes that may affect arm64 MTE.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vincenzo Frascino <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup

The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which
can be targeted by the oom reaper.  This mapping is used to store the
futex robust list head; the kernel does not keep a copy of the robust
list and instead references a userspace address to maintain the
robustness during a process death.

A race can occur between exit_mm and the oom reaper that allows the oom
reaper to free the memory of the futex robust list before the exit path
has handled the futex death:

    CPU1                               CPU2
    --------------------------------------------------------------------
    page_fault
    do_exit "signal"
    wake_oom_reaper
                                        oom_reaper
                                        oom_reap_task_mm (invalidates mm)
    exit_mm
    exit_mm_release
    futex_exit_release
    futex_cleanup
    exit_robust_list
    get_user (EFAULT- can't access memory)

If the get_user EFAULT's, the kernel will be unable to recover the
waiters on the robust_list, leaving userspace mutexes hung indefinitely.

Delay the OOM reaper, allowing more time for the exit path to perform
the futex cleanup.

Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer

Based on a patch by Michal Hocko.

Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: Joel Savitz <[email protected]>
Signed-off-by: Nico Pache <[email protected]>
Co-developed-by: Joel Savitz <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Rafael Aquini <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Herton R. Krzesinski <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joel Savitz <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftest/vm: add skip support to mremap_test

Allow the mremap test to be skipped due to errors such as failing to
parse the mmap_min_addr sysctl.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftest/vm: support xfail in mremap_test

Use ksft_test_result_xfail for the tests which are expected to fail.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftest/vm: verify remap destination address in mremap_test

Because mremap does not have a MAP_FIXED_NOREPLACE flag, it can destroy
existing mappings. This causes a segfault when regions such as text are
remapped and the permissions are changed.

Verify the requested mremap destination address does not overlap any
existing mappings by using mmap's MAP_FIXED_NOREPLACE flag. Keep
incrementing the destination address until a valid mapping is found or
fail the current test once the max address is reached.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftest/vm: verify mmap addr in mremap_test

Avoid calling mmap with requested addresses that are less than the
system's mmap_min_addr. When run as root, mmap returns EACCES when
trying to map addresses < mmap_min_addr. This is not one of the error
codes for the condition to retry the mmap in the test.

Rather than arbitrarily retrying on EACCES, don't attempt an mmap until
addr > vm.mmap_min_addr.

Add a munmap call after an alignment check as the mappings are retained
after the retry and can reach the vm.max_map_count sysctl.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, hugetlb: allow for "high" userspace addresses

This is a fix for commit f6795053dac8 ("mm: mmap: Allow for "high"
userspace addresses") for hugetlb.

This patch adds support for "high" userspace addresses that are
optionally supported on the system and have to be requested via a hint
mechanism ("high" addr parameter to mmap).

Architectures such as powerpc and x86 achieve this by making changes to
their architectural versions of hugetlb_get_unmapped_area() function.
However, arm64 uses the generic version of that function.

So take into account arch_get_mmap_base() and arch_get_mmap_end() in
hugetlb_get_unmapped_area().  To allow that, move those two macros out
of mm/mmap.c into include/linux/sched/mm.h

If these macros are not defined in architectural code then they default
to (TASK_SIZE) and (base) so should not introduce any behavioural
changes to architectures that do not define them.

For the time being, only ARM64 is affected by this change.

Catalin (ARM64) said
"We should have fixed hugetlb_get_unmapped_area() as well when we added
  support for 52-bit VA. The reason for commit f6795053dac8 was to
  prevent normal mmap() from returning addresses above 48-bit by default
  as some user-space had hard assumptions about this.

  It's a slight ABI change if you do this for hugetlb_get_unmapped_area()
  but I doubt anyone would notice. It's more likely that the current
  behaviour would cause issues, so I'd rather have them consistent.

  Basically when arm64 gained support for 52-bit addresses we did not
  want user-space calling mmap() to suddenly get such high addresses,
  otherwise we could have inadvertently broken some programs (similar
  behaviour to x86 here). Hence we added commit f6795053dac8. But we
  missed hugetlbfs which could still get such high mmap() addresses. So
  in theory that's a potential regression that should have bee addressed
  at the same time as commit f6795053dac8 (and before arm64 enabled
  52-bit addresses)"

Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu
Fixes: f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses")
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: <[email protected]> [5.0.x]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

userfaultfd: mark uffd_wp regardless of VM_WRITE flag

When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is
currently only marked as write-protected if the VMA has VM_WRITE flag
set. This seems incorrect or at least would be unexpected by the users.

Consider the following sequence of operations that are being performed
on a certain page:

mprotect(PROT_READ)
UFFDIO_COPY(UFFDIO_COPY_MODE_WP)
mprotect(PROT_READ|PROT_WRITE)

At this point the user would expect to still get UFFD notification when
the page is accessed for write, but the user would not get one, since
the PTE was not marked as UFFD_WP during UFFDIO_COPY.

Fix it by always marking PTEs as UFFD_WP regardless on the
write-permission in the VMA flags.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit")
Signed-off-by: Nadav Amit <[email protected]>
Acked-by: Peter Xu <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

memcg: sync flush only if periodic flush is delayed

Daniel Dao has reported [1] a regression on workloads that may trigger a
lot of refaults (anon and file).  The underlying issue is that flushing
rstat is expensive.  Although rstat flush are batched with (nr_cpus *
MEMCG_BATCH) stat updates, it seems like there are workloads which
genuinely do stat updates larger than batch value within short amount of
time.  Since the rstat flush can happen in the performance critical
codepaths like page faults, such workload can suffer greatly.

This patch fixes this regression by making the rstat flushing
conditional in the performance critical codepaths.  More specifically,
the kernel relies on the async periodic rstat flusher to flush the stats
and only if the periodic flusher is delayed by more than twice the
amount of its normal time window then the kernel allows rstat flushing
from the performance critical codepaths.

Now the question: what are the side-effects of this change? The worst
that can happen is the refault codepath will see 4sec old lruvec stats
and may cause false (or missed) activations of the refaulted page which
may under-or-overestimate the workingset size.  Though that is not very
concerning as the kernel can already miss or do false activations.

There are two more codepaths whose flushing behavior is not changed by
this patch and we may need to come to them in future.  One is the
writeback stats used by dirty throttling and second is the deactivation
heuristic in the reclaim.  For now keeping an eye on them and if there
is report of regression due to these codepaths, we will reevaluate then.

Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1f828223b799 ("memcg: flush lruvec stats in the refault")
Signed-off-by: Shakeel Butt <[email protected]>
Reported-by: Daniel Dao <[email protected]>
Tested-by: Ivan Babrou <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Koutný <[email protected]>
Cc: Frank Hofmann <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/memory-failure.c: skip huge_zero_page in memory_failure()

Kernel panic when injecting memory_failure for the global
huge_zero_page, when CONFIG_DEBUG_VM is enabled, as follows.

  Injecting memory failure for pfn 0x109ff9 at process virtual address 0x20ff9000
  page:00000000fb053fc3 refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x109e00
  head:00000000fb053fc3 order:9 compound_mapcount:0 compound_pincount:0
  flags: 0x17fffc000010001(locked|head|node=0|zone=2|lastcpupid=0x1ffff)
  raw: 017fffc000010001 0000000000000000 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000000000000 00000002ffffffff 0000000000000000
  page dumped because: VM_BUG_ON_PAGE(is_huge_zero_page(head))
  ------------[ cut here ]------------
  kernel BUG at mm/huge_memory.c:2499!
  invalid opcode: 0000 [#1] PREEMPT SMP PTI
  CPU: 6 PID: 553 Comm: split_bug Not tainted 5.18.0-rc1+ #11
  Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
  RIP: 0010:split_huge_page_to_list+0x66a/0x880
  Code: 84 9b fb ff ff 48 8b 7c 24 08 31 f6 e8 9f 5d 2a 00 b8 b8 02 00 00 e9 e8 fb ff ff 48 c7 c6 e8 47 3c 82 4c b
  RSP: 0018:ffffc90000dcbdf8 EFLAGS: 00010246
  RAX: 000000000000003c RBX: 0000000000000001 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffff823e4c4f RDI: 00000000ffffffff
  RBP: ffff88843fffdb40 R08: 0000000000000000 R09: 00000000fffeffff
  R10: ffffc90000dcbc48 R11: ffffffff82d68448 R12: ffffea0004278000
  R13: ffffffff823c6203 R14: 0000000000109ff9 R15: ffffea000427fe40
  FS:  00007fc375a26740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fc3757c9290 CR3: 0000000102174006 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   try_to_split_thp_page+0x3a/0x130
   memory_failure+0x128/0x800
   madvise_inject_error.cold+0x8b/0xa1
   __x64_sys_madvise+0x54/0x60
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7fc3754f8bf9
  Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 8
  RSP: 002b:00007ffeda93a1d8 EFLAGS: 00000217 ORIG_RAX: 000000000000001c
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3754f8bf9
  RDX: 0000000000000064 RSI: 0000000000003000 RDI: 0000000020ff9000
  RBP: 00007ffeda93a200 R08: 0000000000000000 R09: 0000000000000000
  R10: 00000000ffffffff R11: 0000000000000217 R12: 0000000000400490
  R13: 00007ffeda93a2e0 R14: 0000000000000000 R15: 0000000000000000

This makes huge_zero_page bail out explicitly before split in
memory_failure(), thus the panic above won't happen again.

Link: https://lkml.kernel.org/r/497d3835612610e370c74e697ea3c721d1d55b9c.1649775850.git.xuyu@linux.alibaba.com
Fixes: 6a46079cf57a ("HWPOISON: The high level memory error handler in the VM v7")
Signed-off-by: Xu Yu <[email protected]>
Reported-by: Abaci <[email protected]>
Suggested-by: Naoya Horiguchi <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()

There is a race condition between memory_failure_hugetlb() and hugetlb
free/demotion, which causes setting PageHWPoison flag on the wrong page.
The one simple result is that wrong processes can be killed, but another
(more serious) one is that the actual error is left unhandled, so no one
prevents later access to it, and that might lead to more serious results
like consuming corrupted data.

Think about the below race window:

  CPU 1                                   CPU 2
  memory_failure_hugetlb
  struct page *head = compound_head(p);
                                          hugetlb page might be freed to
                                          buddy, or even changed to another
                                          compound page.

  get_hwpoison_page -- page is not what we want now...

The current code first does prechecks roughly and then reconfirms after
taking refcount, but it's found that it makes code overly complicated,
so move the prechecks in a single hugetlb_lock range.

A newly introduced function, try_memory_failure_hugetlb(), always takes
hugetlb_lock (even for non-hugetlb pages).  That can be improved, but
memory_failure() is rare in principle, so should not be a big problem.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 761ad8d7c7b5 ("mm: hwpoison: introduce memory_failure_hugetlb()")
Signed-off-by: Naoya Horiguchi <[email protected]>
Reported-by: Mike Kravetz <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

clk: qcom: clk-rcg2: fix gfx3d frequency calculation

Since the commit 948fb0969eae ("clk: Always clamp the rounded rate"),
the clk_core_determine_round_nolock() would clamp the requested rate
between min and max rates from the rate request. Normally these fields
would be filled by clk_core_get_boundaries() called from
clk_round_rate().

However clk_gfx3d_determine_rate() uses a manually crafted rate request,
which did not have these fields filled. Thus the requested frequency
would be clamped to 0, resulting in weird frequencies being requested
from the hardware.

Fix this by filling min_rate and max_rate to the values valid for the
respective PLLs (0 and ULONG_MAX).

Fixes: 948fb0969eae ("clk: Always clamp the rounded rate")
Signed-off-by: Dmitry Baryshkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Bjorn Andersson <[email protected]>
Reported-by: Rob Clark <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

clk: microchip: mpfs: don't reset disabled peripherals

The current clock driver for PolarFire SoC puts the hardware behind
"periph" clocks into reset if their clock is disabled. CONFIG_PM was
recently added to the riscv defconfig and exposed issues caused by this
behaviour, where the Cadence GEM was being put into reset between its
bringup & the PHY bringup:

https://lore.kernel.org/linux-riscv/9f4b057d-1985-5fd3-65c0-f944161c7792@microchip.com/

Fix this (for now) by removing the reset from mpfs_periph_clk_disable.

Fixes: 635e5e73370e ("clk: microchip: Add driver for Microchip PolarFire SoC")
Reviewed-by: Daire McNamara <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stephen Boyd <[email protected]>

f2fs: should not truncate blocks during roll-forward recovery

If the file preallocated blocks and fsync'ed, we should not truncate them during
roll-forward recovery which will recover i_size correctly back.

Fixes: d4dd19ec1ea0 ("f2fs: do not expose unwritten blocks to user by DIO")
Cc: <[email protected]> # 5.17+
Signed-off-by: Jaegeuk Kim <[email protected]>

ata: pata_marvell: Check the 'bmdma_addr' beforing reading

Before detecting the cable type on the dma bar, the driver should check
whether the 'bmdma_addr' is zero, which means the adapter does not
support DMA, otherwise we will get the following error:

[    5.146634] Bad IO access at port 0x1 (return inb(port))
[    5.147206] WARNING: CPU: 2 PID: 303 at lib/iomap.c:44 ioread8+0x4a/0x60
[    5.150856] RIP: 0010:ioread8+0x4a/0x60
[    5.160238] Call Trace:
[    5.160470]  <TASK>
[    5.160674]  marvell_cable_detect+0x6e/0xc0 [pata_marvell]
[    5.161728]  ata_eh_recover+0x3520/0x6cc0
[    5.168075]  ata_do_eh+0x49/0x3c0

Signed-off-by: Zheyu Ma <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>

Merge tag 'drm-msm-fixes-2022-04-20' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

Revert to fix iommu regression.

Signed-off-by: Dave Airlie <[email protected]>
From: Rob Clark <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGtvPo4xD2peAztDMPP2n4utb7d9WQboMFwsba9E8U2rCw@mail.gmail.com

Merge tag 'dmaengine-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:
"A bunch of driver fixes:

   - idxd device RO checks and device cleanup

   - dw-edma unaligned access and alignment

   - qcom: missing minItems in binding

   - mediatek pm usage fix

   - imx init script"

* tag 'dmaengine-fix-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dt-bindings: dmaengine: qcom: gpi: Add minItems for interrupts
  dmaengine: idxd: skip clearing device context when device is read-only
  dmaengine: idxd: add RO check for wq max_transfer_size write
  dmaengine: idxd: add RO check for wq max_batch_size write
  dmaengine: idxd: fix retry value to be constant for duration of function call
  dmaengine: idxd: match type for retries var in idxd_enqcmds()
  dmaengine: dw-edma: Fix inconsistent indenting
  dmaengine: dw-edma: Fix unaligned 64bit access
  dmaengine: mediatek:Fix PM usage reference leak of mtk_uart_apdma_alloc_chan_resources
  dmaengine: imx-sdma: Fix error checking in sdma_event_remap
  dma: at_xdmac: fix a missing check on list iterator
  dmaengine: imx-sdma: fix init of uart scripts
  dmaengine: idxd: fix device cleanup on disable

RISC-V: cpuidle: fix Kconfig select for RISCV_SBI_CPUIDLE

There can be lots of build errors when building cpuidle-riscv-sbi.o.
They are all caused by a kconfig problem with this warning:

WARNING: unmet direct dependencies detected for RISCV_SBI_CPUIDLE
  Depends on [n]: CPU_IDLE [=y] && RISCV [=y] && RISCV_SBI [=n]
  Selected by [y]:
  - SOC_VIRT [=y] && CPU_IDLE [=y]

so make the 'select' of RISCV_SBI_CPUIDLE also depend on RISCV_SBI.

Fixes: c5179ef1ca0c ("RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine")
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: mm: Fix set_satp_mode() for platform not having Sv57

When Sv57 is not available the satp.MODE test in set_satp_mode() will
fail and lead to pgdir re-programming for Sv48. The pgdir re-programming
will fail as well due to pre-existing pgdir entry used for Sv57 and as
a result kernel fails to boot on RISC-V platform not having Sv57.

To fix above issue, we should clear the pgdir memory in set_satp_mode()
before re-programming.

Fixes: 011f09d12052 ("riscv: mm: Set sv57 on defaultly")
Reported-by: Mayuresh Chitale <[email protected]>
Signed-off-by: Anup Patel <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

Merge tag 'drm-intel-fixes-2022-04-20' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Unset enable_psr2_sel_fetch if PSR2 detection fails
- Fix to detect when VRR is turned off from panel settings

Signed-off-by: Dave Airlie <[email protected]>
From: Joonas Lahtinen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

kvm: selftests: introduce and use more page size-related constants

Clean up code that was hardcoding masks for various fields,
now that the masks are included in processor.h.

For more cleanup, define PAGE_SIZE and PAGE_MASK just like in Linux.
PAGE_SIZE in particular was defined by several tests.

Suggested-by: Sean Christopherson <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>