Git Repo - linux.git/log

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk updates from Stephen Boyd:
"Mainly driver updates this time around.

  There's a single patch to the core clk framework that simplifies a
  runtime PM call. Otherwise the majority of the diff falls to a few SoC
  drivers: Qualcomm, STM32 and MediaTek. Those SoCs gain some new
  hardware support and what comes along with that is quite a few lines
  of data and some clk_ops code.

  Beyond the new hardware support we have the usual pile of driver
  updates that add missing clks on already supported SoCs or fix up
  problems like bad clk tree descriptions. It's nice to see that more
  drivers are moving to clk_hw based APIs too.

  New Drivers:
   - Add STM32MP13 RCC driver (Reset Clock Controller)
   - MediaTek MT8186 SoC clk support
   - Airoha EN7523 SoC system clocks
   - Clock driver for exynosautov9 SoC
   - Renesas R-Car V4H and RZ/V2M SoCs
   - Renesas RZ/G2UL SoC
   - LPASS clk driver for Qualcomm sc7280 SoC
   - GCC clk driver for Qualcomm SC8280XP SoC

  Updates:
   - SDCC uses floor clk ops on Qualcomm MSM8976
   - Add modem reset and fix RPM clks on Qualcomm MSM8976
   - Add the two missing CLKOUT clocks for U8500/DB8500 SoC
   - Mark some clks critical on Ingenic X1000
   - Convert ux500 to clk_hw
   - Move MediaTek driver to clk_hw provider APIs
   - Use i2c driver probe_new to avoid id scans
   - Convert a number of Rockchip dt bindings to YAML
   - Mark hclk_vo critical on Rockchip rk3568
   - Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage
   - Various cleanups like memory allocation error checks and plugged
     leaks
   - Allwinner H6 RTC clock support
   - Allwinner H616 32 kHz clock support
   - Add the Universal Flash Storage clock on Renesas R-Car S4-8
   - Add I2C, SSIF-2 (sound), USB, CANFD, OSTM (timer), WDT, SPI Multi
     I/O Bus, RSPI, TSU (thermal), and ADC clocks and resets on Renesas
     RZ/G2UL
   - Add display clock support on Renesas RZ/G2L
   - Add RPC (QSPI/HyperFlash) clocks on Renesas R-Car E3 and D3
   - Add 27 MHz phy PLL ref clock on i.MX
   - Add mcore_booted module parameter to tell kernel M core has already
     booted for i.MX
   - Remove snvs clock on i.MX because it was for secure world only
   - Add dt bindings for i.MX8MN GPT
   - Add DISP2 pixel clock for i.MX8MP
   - Add clkout1/2 for i.MX8MP
   - Fix parent clock of ubs_root_clk for i.MX8MP
   - Implement better RCG parking on Qualcomm SoCs using the shared RCG
     clk ops
   - Kerneldoc fixes
   - Switch Tegra BPMP to determine_rate clk op
   - Add a pointer to dt schema for generic clock bindings"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (168 commits)
  Revert "clk: qcom: regmap-mux: add pipe clk implementation"
  Revert "clk: qcom: gcc-sc7280: use new clk_regmap_mux_safe_ops for PCIe pipe clocks"
  Revert "clk: qcom: gcc-sm8450: use new clk_regmap_mux_safe_ops for PCIe pipe clocks"
  clk: bcm: rpi: Use correct order for the parameters of devm_kcalloc()
  clk: stm32mp13: add safe mux management
  clk: stm32mp13: add multi mux function
  clk: stm32mp13: add all STM32MP13 kernel clocks
  clk: stm32mp13: add all STM32MP13 peripheral clocks
  clk: stm32mp13: manage secured clocks
  clk: stm32mp13: add composite clock
  clk: stm32mp13: add stm32 divider clock
  clk: stm32mp13: add stm32_gate management
  clk: stm32mp13: add stm32_mux clock management
  clk: stm32: Introduce STM32MP13 RCC drivers (Reset Clock Controller)
  dt-bindings: rcc: stm32: add new compatible for STM32MP13 SoC
  clk: ti: clkctrl: replace usage of found with dedicated list iterator variable
  clk: ti: composite: Prefer kcalloc over open coded arithmetic
  dt-bindings: clock: exynosautov9: correct count of NR_CLK
  clk: mediatek: mt8173: Switch to clk_hw provider APIs
  clk: mediatek: Switch to clk_hw provider APIs
  ...

Merge tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
"Resource management:

   - Restrict E820 clipping to PCI host bridge windows (Bjorn Helgaas)

   - Log E820 clipping better (Bjorn Helgaas)

   - Add kernel cmdline options to enable/disable E820 clipping (Hans de
     Goede)

   - Disable E820 reserved region clipping for IdeaPads, Yoga, Yoga
     Slip, Acer Spin 5, Clevo Barebone systems where clipping leaves no
     usable address space for touchpads, Thunderbolt devices, etc (Hans
     de Goede)

   - Disable E820 clipping by default starting in 2023 (Hans de Goede)

  PCI device hotplug:

   - Include files to remove implicit dependencies (Christophe Leroy)

   - Only put Root Ports in D3 if they can signal and wake from D3 so
     AMD Yellow Carp doesn't miss hotplug events (Mario Limonciello)

  Power management:

   - Define pci_restore_standard_config() only for CONFIG_PM_SLEEP since
     it's unused otherwise (Krzysztof Kozlowski)

   - Power up devices completely, including anything platform firmware
     needs to do, during runtime resume (Rafael J. Wysocki)

   - Move pci_resume_bus() to PM callbacks so we observe the required
     bridge power-up delays (Rafael J. Wysocki)

   - Drop unneeded runtime_d3cold device flag (Rafael J. Wysocki)

   - Split pci_raw_set_power_state() between pci_power_up() and a new
     pci_set_low_power_state() (Rafael J. Wysocki)

   - Set current_state to D3cold if config read returns ~0, indicating
     the device is not accessible (Rafael J. Wysocki)

   - Do not call pci_update_current_state() from pci_power_up() so BARs
     and ASPM config are restored correctly (Rafael J. Wysocki)

   - Write 0 to PMCSR in pci_power_up() in all cases (Rafael J. Wysocki)

   - Split pci_power_up() to pci_set_full_power_state() to avoid some
     redundant operations (Rafael J. Wysocki)

   - Skip restoring BARs if device is not in D0 (Rafael J. Wysocki)

   - Rearrange and clarify pci_set_power_state() (Rafael J. Wysocki)

   - Remove redundant BAR restores from pci_pm_thaw_noirq() (Rafael J.
     Wysocki)

  Virtualization:

   - Acquire device lock before config space access lock to avoid AB/BA
     deadlock with sriov_numvfs_store() (Yicong Yang)

  Error handling:

   - Clear MULTI_ERR_COR/UNCOR_RCV bits, which a race could previously
     leave permanently set (Kuppuswamy Sathyanarayanan)

  Peer-to-peer DMA:

   - Whitelist Intel Skylake-E Root Ports regardless of which devfn they
     are (Shlomo Pongratz)

  ASPM:

   - Override L1 acceptable latency advertised by Intel DG2 so ASPM L1
     can be enabled (Mika Westerberg)

  Cadence PCIe controller driver:

   - Set up device-specific register to allow PTM Responder to be
     enabled by the normal architected bit (Christian Gmeiner)

   - Override advertised FLR support since the controller doesn't
     implement FLR correctly (Parshuram Thombare)

  Cadence PCIe endpoint driver:

   - Correct bitmap size for the ob_region_map of outbound window usage
     (Dan Carpenter)

  Freescale i.MX6 PCIe controller driver:

   - Fix PERST# assertion/deassertion so we observe the required delays
     before accessing device (Francesco Dolcini)

  Freescale Layerscape PCIe controller driver:

   - Add "big-endian" DT property (Hou Zhiqiang)

   - Update SCFG DT property (Hou Zhiqiang)

   - Add "aer", "pme", "intr" DT properties (Li Yang)

   - Add DT compatible strings for ls1028a (Xiaowei Bao)

  Intel VMD host bridge driver:

   - Assign VMD IRQ domain before enumeration to avoid IOMMU interrupt
     remapping errors when MSI-X remapping is disabled (Nirmal Patel)

   - Revert VMD workaround that kept MSI-X remapping enabled when IOMMU
     remapping was enabled (Nirmal Patel)

  Marvell MVEBU PCIe controller driver:

   - Add of_pci_get_slot_power_limit() to parse the
     'slot-power-limit-milliwatt' DT property (Pali Rohár)

   - Add mvebu support for sending Set_Slot_Power_Limit message (Pali
     Rohár)

  MediaTek PCIe controller driver:

   - Fix refcount leak in mtk_pcie_subsys_powerup() (Miaoqian Lin)

  MediaTek PCIe Gen3 controller driver:

   - Reset PHY and MAC at probe time (AngeloGioacchino Del Regno)

  Microchip PolarFlare PCIe controller driver:

   - Add chained_irq_enter()/chained_irq_exit() calls to mc_handle_msi()
     and mc_handle_intx() to avoid lost interrupts (Conor Dooley)

   - Fix interrupt handling race (Daire McNamara)

  NVIDIA Tegra194 PCIe controller driver:

   - Drop tegra194 MSI register save/restore, which is unnecessary since
     the DWC core does it (Jisheng Zhang)

  Qualcomm PCIe controller driver:

   - Add SM8150 SoC DT binding and support (Bhupesh Sharma)

   - Fix pipe clock imbalance (Johan Hovold)

   - Fix runtime PM imbalance on probe errors (Johan Hovold)

   - Fix PHY init imbalance on probe errors (Johan Hovold)

   - Convert DT binding to YAML (Dmitry Baryshkov)

   - Update DT binding to show that resets aren't required for
     MSM8996/APQ8096 platforms (Dmitry Baryshkov)

   - Add explicit register names per chipset in DT binding (Dmitry
     Baryshkov)

   - Add sc7280-specific clock and reset definitions to DT binding
     (Dmitry Baryshkov)

  Rockchip PCIe controller driver:

   - Fix bitmap size when searching for free outbound region (Dan
     Carpenter)

  Rockchip DesignWare PCIe controller driver:

   - Remove "snps,dw-pcie" from rockchip-dwc DT "compatible" property
     because it's not fully compatible with rockchip (Peter Geis)

   - Reset rockchip-dwc controller at probe (Peter Geis)

   - Add rockchip-dwc INTx support (Peter Geis)

  Synopsys DesignWare PCIe controller driver:

   - Return error instead of success if DMA mapping of MSI area fails
     (Jiantao Zhang)

  Miscellaneous:

   - Change pci_set_dma_mask() documentation references to
     dma_set_mask() (Alex Williamson)"

* tag 'pci-v5.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (64 commits)
  dt-bindings: PCI: qcom: Add schema for sc7280 chipset
  dt-bindings: PCI: qcom: Specify reg-names explicitly
  dt-bindings: PCI: qcom: Do not require resets on msm8996 platforms
  dt-bindings: PCI: qcom: Convert to YAML
  PCI: qcom: Fix unbalanced PHY init on probe errors
  PCI: qcom: Fix runtime PM imbalance on probe errors
  PCI: qcom: Fix pipe clock imbalance
  PCI: qcom: Add SM8150 SoC support
  dt-bindings: pci: qcom: Document PCIe bindings for SM8150 SoC
  x86/PCI: Disable E820 reserved region clipping starting in 2023
  x86/PCI: Disable E820 reserved region clipping via quirks
  x86/PCI: Add kernel cmdline options to use/ignore E820 reserved regions
  PCI: microchip: Fix potential race in interrupt handling
  PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits
  PCI: cadence: Clear FLR in device capabilities register
  PCI: cadence: Allow PTM Responder to be enabled
  PCI: vmd: Revert 2565e5b69c44 ("PCI: vmd: Do not disable MSI-X remapping if interrupt remapping is enabled by IOMMU.")
  PCI: vmd: Assign VMD IRQ domain before enumeration
  PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store()
  PCI: rockchip-dwc: Add legacy interrupt support
  ...

f2fs: fix to tag gcing flag on page during file defragment

In order to garantee migrated data be persisted during checkpoint,
otherwise out-of-order persistency between data and node may cause
data corruption after SPOR.

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: replace F2FS_I(inode) and sbi by the local variable

We have define 'fi' at the begin of the functions, just use it,
rather than use F2FS_I(inode) again.

Signed-off-by: Yufen Yu <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
[Jaegeuk Kim: replace sbi]
Signed-off-by: Jaegeuk Kim <[email protected]>

Merge tag 'v5.19-rockchip-dts64-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/late

Clock properties for cru nodes to match the yaml-converted bindings
and renaming of Quartz-A bluetooth pin nodename to not conflict with
Yaml constraints.

* tag 'v5.19-rockchip-dts64-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: rename Quartz64-A bluetooth gpios
  arm64: dts: rockchip: add clocks property to cru node rk3368
  arm64: dts: rockchip: add clocks property to cru node rk3308
  arm64: dts: rockchip: add clocks to rk356x cru

Link: https://lore.kernel.org/r/7695907.Sb9uPGUboI@phil
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'v5.19-rockchip-dts32-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/late

Amba and clock fixes to conform better to actual dt-bindings.

* tag 'v5.19-rockchip-dts32-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  ARM: dts: rockchip: add clocks property to cru node rk3228
  ARM: dts: rockchip: add clocks property to cru node rk3036
  ARM: dts: rockchip: add clocks property to cru node rk3066a/rk3188
  ARM: dts: rockchip: add clocks property to cru node rk3288
  ARM: dts: rockchip: Remove "amba" bus nodes from rv1108
  ARM: dts: rockchip: add clocks property to cru node rv1108

Link: https://lore.kernel.org/r/4798587.jE0xQCEvom@phil
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'v5.19-rockchip-drivers2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/late

Refcount leak for a used of-node in the grf-init.

* tag 'v5.19-rockchip-drivers2' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
soc: rockchip: Fix refcount leak in rockchip_grf_init

Link: https://lore.kernel.org/r/4541398.Icojqenx9y@phil
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'mm-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull more MM updates from Andrew Morton:

- Two follow-on fixes for the post-5.19 series "Use pageblock_order for
   cma and alloc_contig_range alignment", from Zi Yan.

- A series of z3fold cleanups and fixes from Miaohe Lin.

- Some memcg selftests work from Michal Koutný <[email protected]>

- Some swap fixes and cleanups from Miaohe Lin

- Several individual minor fixups

* tag 'mm-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits)
  mm/shmem.c: suppress shift warning
  mm: Kconfig: reorganize misplaced mm options
  mm: kasan: fix input of vmalloc_to_page()
  mm: fix is_pinnable_page against a cma page
  mm: filter out swapin error entry in shmem mapping
  mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
  mm/swapfile: fix lost swap bits in unuse_pte()
  mm/swapfile: unuse_pte can map random data if swap read fails
  selftests: memcg: factor out common parts of memory.{low,min} tests
  selftests: memcg: remove protection from top level memcg
  selftests: memcg: adjust expected reclaim values of protected cgroups
  selftests: memcg: expect no low events in unprotected sibling
  selftests: memcg: fix compilation
  mm/z3fold: fix z3fold_page_migrate races with z3fold_map
  mm/z3fold: fix z3fold_reclaim_page races with z3fold_free
  mm/z3fold: always clear PAGE_CLAIMED under z3fold page lock
  mm/z3fold: put z3fold page back into unbuddied list when reclaim or migration fails
  revert "mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"
  mm/z3fold: throw warning on failure of trylock_page in z3fold_alloc
  ...

Merge tag 'mm-hotfixes-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
"Six hotfixes.

  The page_table_check one from Miaohe Lin is considered a minor thing
  so it isn't marked for -stable. The remainder address pre-5.19 issues
  and are cc:stable"

* tag 'mm-hotfixes-stable-2022-05-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/page_table_check: fix accessing unmapped ptep
  kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]
  mm/page_alloc: always attempt to allocate at least one page during bulk allocation
  hugetlb: fix huge_pmd_unshare address update
  zsmalloc: fix races between asynchronous zspage free and page migration
  Revert "mm/cma.c: remove redundant cma_mutex lock"

Merge tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc updates from Andrew Morton:
"The non-MM patch queue for this merge window.

  Not a lot of material this cycle. Many singleton patches against
  various subsystems. Most notably some maintenance work in ocfs2
  and initramfs"

* tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (65 commits)
  kcov: update pos before writing pc in trace function
  ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
  ocfs2: dlmfs: don't clear USER_LOCK_ATTACHED when destroying lock
  fs/ntfs: remove redundant variable idx
  fat: remove time truncations in vfat_create/vfat_mkdir
  fat: report creation time in statx
  fat: ignore ctime updates, and keep ctime identical to mtime in memory
  fat: split fat_truncate_time() into separate functions
  MAINTAINERS: add Muchun as a memcg reviewer
  proc/sysctl: make protected_* world readable
  ia64: mca: drop redundant spinlock initialization
  tty: fix deadlock caused by calling printk() under tty_port->lock
  relay: remove redundant assignment to pointer buf
  fs/ntfs3: validate BOOT sectors_per_clusters
  lib/string_helpers: fix not adding strarray to device's resource list
  kernel/crash_core.c: remove redundant check of ck_cmdline
  ELF, uapi: fixup ELF_ST_TYPE definition
  ipc/mqueue: use get_tree_nodev() in mqueue_get_tree()
  ipc: update semtimedop() to use hrtimer
  ipc/sem: remove redundant assignments
  ...

crypto: poly1305 - cleanup stray CRYPTO_LIB_POLY1305_RSIZE

When CRYPTO_LIB_POLY1305 is unset, CRYPTO_LIB_POLY1305_RSIZE
is still set in the Kconfig, cluttering things.

Fix this by making CRYPTO_LIB_POLY1305_RSIZE depend on
CRYPTO_LIB_POLY1305.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arm64/hugetlb: Fix building errors in huge_ptep_clear_flush()

Fix the arm64 build error which was caused by commit ae07562909f3 ("mm:
change huge_ptep_clear_flush() to return the original pte") interacting
with commit fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from
get_clear_flush()"):

  arch/arm64/mm/hugetlbpage.c: In function ‘huge_ptep_clear_flush’:
  arch/arm64/mm/hugetlbpage.c:515:9: error: implicit declaration of function ‘get_clear_flush’; did you mean ‘ptep_clear_flush’? [-Werror=implicit-function-declaration]
    515 |  return get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
        |         ^~~~~~~~~~~~~~~
        |         ptep_clear_flush

Due to the new get_clear_contig() has dropped TLB flush, we should add
an explicit TLB flush in huge_ptep_clear_flush() to keep original
semantics when changing to use new get_clear_contig().

Fixes: fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()").
Fixes: ae07562909f3 ("mm: change huge_ptep_clear_flush() to return the original pte")
Reported-and-tested-by: Linux Kernel Functional Testing <[email protected]>
Reported-by: Sudip Mukherjee <[email protected]>
Suggested-by: Catalin Marinas <[email protected]>
Signed-off-by: Baolin Wang <[email protected]>
Reviewed-by: Gavin Shan <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

pipe: Fix missing lock in pipe_resize_ring()

pipe_resize_ring() needs to take the pipe->rd_wait.lock spinlock to
prevent post_one_notification() from trying to insert into the ring
whilst the ring is being replaced.

The occupancy check must be done after the lock is taken, and the lock
must be taken after the new ring is allocated.

The bug can lead to an oops looking something like:

BUG: KASAN: use-after-free in post_one_notification.isra.0+0x62e/0x840
Read of size 4 at addr ffff88801cc72a70 by task poc/27196
...
Call Trace:
  post_one_notification.isra.0+0x62e/0x840
  __post_watch_notification+0x3b7/0x650
  key_create_or_update+0xb8b/0xd20
  __do_sys_add_key+0x175/0x340
  __x64_sys_add_key+0xbe/0x140
  do_syscall_64+0x5c/0xc0
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported by Selim Enes Karaduman @Enesdex working with Trend Micro Zero
Day Initiative.

Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: [email protected] # ZDI-CAN-17291
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arm64: dts: rockchip: rename Quartz64-A bluetooth gpios

The bluetooth binding for the Quartz64 Model A has incorrectly named
host-wakeup and device-wakeup gpios. Rename them to clear some dtbs_check
warnings.

Signed-off-by: Peter Geis <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>

arm64: dts: rockchip: add clocks property to cru node rk3368

Add clocks and clock-names because the device has to have
at least one input clock.
Also in case someone wants to add properties that start with
assign-xxx to fix warnings like:
'clocks' is a dependency of 'assigned-clocks'

Signed-off-by: Johan Jonker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>

arm64: dts: rockchip: add clocks property to cru node rk3308

Add clocks and clock-names to the rk3308 cru node, because
the device has to have at least one input clock.
Also in case someone wants to add properties that start with
assign-xxx to fix warnings like:
'clocks' is a dependency of 'assigned-clocks'
With the addition of new properties also sort the node properties
a little bit.

Signed-off-by: Johan Jonker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>

arm64: dts: rockchip: add clocks to rk356x cru

The rk356x cru requires a 24m clock input to function. Add the clocks
properties to the cru to clear some dtbs_check warnings.

Signed-off-by: Peter Geis <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>

ARM: dts: rockchip: add clocks property to cru node rk3228

Add clocks and clock-names to the rk3228 cru node, because
the device has to have at least one input clock.

Signed-off-by: Johan Jonker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>

ARM: dts: rockchip: add clocks property to cru node rk3036

Add clocks and clock-names to the rk3036 cru node, because
the device has to have at least one input clock.

Signed-off-by: Johan Jonker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>

ARM: dts: rockchip: add clocks property to cru node rk3066a/rk3188

Add clocks property to rk3066a/rk3188 cru node to fix warnings like:
'clocks' is a dependency of 'assigned-clocks'

Signed-off-by: Johan Jonker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>

ARM: dts: rockchip: add clocks property to cru node rk3288

Add clocks property to rk3288 cru node to fix warnings like:
'clocks' is a dependency of 'assigned-clocks'.

Signed-off-by: Johan Jonker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>

ARM: dts: rockchip: Remove "amba" bus nodes from rv1108

The "amba" bus nodes wrapping all the DMA-330 nodes serve no useful
purpose, and certainly bear no relation at all to the actual underlying
interconnect topology. They appear to be cargo-cult copying from a
design misstep in the very early days of FDT adoption on ARM, which was
righted with the "arm,primecell" compatible, and the last trace of the
idea finally purged by commit 2ef7d5f342c1 ("ARM, ARM64: dts: drop
"arm,amba-bus" in favor of "simple-bus"").

As such, they can simply be removed and the DMA-330 nodes fitted into
the normal sort order.
The node names should be generic, so rename it to "dma-controller".

Signed-off-by: Johan Jonker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>

ARM: dts: rockchip: add clocks property to cru node rv1108

Add clocks and clock-names to the rv1108 cru node, because
the device has to have at least one input clock.

Signed-off-by: Johan Jonker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Stuebner <[email protected]>

smb3: remove unneeded null check in cifs_readdir

Coverity pointed out an unneeded check.

Addresses-Coverity: 1518030 ("Null pointer dereferences")
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

mm/shmem.c: suppress shift warning

mm/shmem.c:1948 shmem_getpage_gfp() warn: should '(((1) << 12) / 512) << folio_order(folio)' be a 64 bit type?

On i386, so an unsigned long is 32-bit, but i_blocks is a 64-bit blkcnt_t.

Reported-by: kernel test robot <[email protected]>
Reported-by: Jessica Clarke <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: Kconfig: reorganize misplaced mm options

After commits 7b42f1041c98 ("mm: Kconfig: move swap and slab config
options to the MM section") and 519bcb797907 ("mm: Kconfig: group swap,
slab, hotplug and thp options into submenus") we now have nicely organized
mm related config options. I have noticed some that were still misplaced,
so this moves them from various places into the new structure:

VM_EVENT_COUNTERS, COMPAT_BRK, MMAP_ALLOW_UNINITIALIZED to mm/Kconfig and
general MM section.

SLUB_STATS to mm/Kconfig and the slab submenu.

DEBUG_SLAB, SLUB_DEBUG, SLUB_DEBUG_ON to mm/Kconfig.debug and the Kernel
hacking / Memory Debugging submenu.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: kasan: fix input of vmalloc_to_page()

When print virtual mapping info for vmalloc address, it should pass
the addr not page, fix it.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: c056a364e954 ("kasan: print virtual mapping info in reports")
Signed-off-by: Kefeng Wang <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: fix is_pinnable_page against a cma page

Pages in the CMA area could have MIGRATE_ISOLATE as well as MIGRATE_CMA so
the current is_pinnable_page() could miss CMA pages which have
MIGRATE_ISOLATE.  It ends up pinning CMA pages as longterm for the
pin_user_pages() API so CMA allocations keep failing until the pin is
released.

     CPU 0                                   CPU 1 - Task B

cma_alloc
alloc_contig_range
                                        pin_user_pages_fast(FOLL_LONGTERM)
change pageblock as MIGRATE_ISOLATE
                                        internal_get_user_pages_fast
                                        lockless_pages_from_mm
                                        gup_pte_range
                                        try_grab_folio
                                        is_pinnable_page
                                          return true;
                                        So, pinned the page successfully.
page migration failure with pinned page
                                        ..
                                        .. After 30 sec
                                        unpin_user_page(page)

CMA allocation succeeded after 30 sec.

The CMA allocation path protects the migration type change race using
zone->lock but what GUP path need to know is just whether the page is on
CMA area or not rather than exact migration type.  Thus, we don't need
zone->lock but just checks migration type in either of (MIGRATE_ISOLATE
and MIGRATE_CMA).

Adding the MIGRATE_ISOLATE check in is_pinnable_page could cause rejecting
of pinning pages on MIGRATE_ISOLATE pageblocks even though it's neither
CMA nor movable zone if the page is temporarily unmovable.  However, such
a migration failure by unexpected temporal refcount holding is general
issue, not only come from MIGRATE_ISOLATE and the MIGRATE_ISOLATE is also
transient state like other temporal elevated refcount problem.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Minchan Kim <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: filter out swapin error entry in shmem mapping

There might be swapin error entries in shmem mapping. Filter them out to
avoid "Bad swap file entry" complaint.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Howells <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/shmem: fix infinite loop when swap in shmem error at swapoff time

When swap in shmem error at swapoff time, there would be a infinite loop
in the while loop in shmem_unuse_inode().  It's because swapin error is
deliberately ignored now and thus info->swapped will never reach 0.  So we
can't escape the loop in shmem_unuse().

In order to fix the issue, swapin_error entry is stored in the mapping
when swapin error occurs.  So the swapcache page can be freed and the user
won't end up with a permanently mounted swap because a sector is bad.  If
the page is accessed later, the user process will be killed so that
corrupted data is never consumed.  On the other hand, if the page is never
accessed, the user won't even notice it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reported-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Howells <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range

Once the MADV_FREE operation has succeeded, callers can expect they might
get zero-fill pages if accessing the memory again. Therefore it should be
safe to delete the hwpoison entry and swapin error entry. There is no
reason to kill the process if it has called MADV_FREE on the range.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Suggested-by: Alistair Popple <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: David Howells <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/swapfile: fix lost swap bits in unuse_pte()

This is observed by code review only but not any real report.

When we turn off swapping we could have lost the bits stored in the swap
ptes. The new rmap-exclusive bit is fine since that turned into a page
flag, but not for soft-dirty and uffd-wp. Add them.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Suggested-by: Peter Xu <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: David Howells <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/swapfile: unuse_pte can map random data if swap read fails

Patch series "A few fixup patches for mm", v4.

This series contains a few patches to avoid mapping random data if swap
read fails and fix lost swap bits in unuse_pte.  Also we free hwpoison and
swapin error entry in madvise_free_pte_range and so on.  More details can
be found in the respective changelogs.

This patch (of 5):

There is a bug in unuse_pte(): when swap page happens to be unreadable,
page filled with random data is mapped into user address space.  In case
of error, a special swap entry indicating swap read fails is set to the
page table.  So the swapcache page can be freed and the user won't end up
with a permanently mounted swap because a sector is bad.  And if the page
is accessed later, the user process will be killed so that corrupted data
is never consumed.  On the other hand, if the page is never accessed, the
user won't even notice it.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: David Howells <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests: memcg: factor out common parts of memory.{low,min} tests

The memory protection test setup and runtime is almost equal for
memory.low and memory.min cases.

It makes modification of the common parts prone to mistakes, since the
protections are similar not only in setup but also in principle, factor
the common part out.

Past exceptions between the tests:
- missing memory.min is fine (kept),
- test_memcg_low protected orphaned pagecache (adapted like
test_memcg_min and we keep the processes of protected memory running).

The evaluation in two tests is different (OOM of allocator vs low events
of protégés), this is kept different.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Koutný <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
CC: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Richard Palethorpe <[email protected]>
Cc: David Vernet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests: memcg: remove protection from top level memcg

The reclaim is triggered by memory limit in a subtree, therefore the
testcase does not need configured protection against external reclaim.

Also, correct respective comments.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Koutný <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: David Vernet <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Richard Palethorpe <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests: memcg: adjust expected reclaim values of protected cgroups

The numbers are not easy to derive in a closed form (certainly mere
protections ratios do not apply), therefore use a simulation to obtain
expected numbers.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Koutný <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: David Vernet <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Richard Palethorpe <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests: memcg: expect no low events in unprotected sibling

This is effectively a revert of commit cdc69458a5f3 ("cgroup: account for
memory_recursiveprot in test_memcg_low()"). The case test_memcg_low will
fail with memory_recursiveprot until resolved in reclaim code.

However, this patch preserves the existing helpers and variables for later
uses.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Koutný <[email protected]>
Reviewed-by: David Vernet <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Richard Palethorpe <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests: memcg: fix compilation

Patch series "memcontrol selftests fixups", v2.

Flushing the patches to make memcontrol selftests check the events
behavior we had consensus about (test_memcg_low fails).

(test_memcg_reclaim, test_memcg_swap_max fail for me now but it's present
even before the refactoring.)

The two bigger changes are:
- adjustment of the protected values to make tests succeed with the given
  tolerance,
- both test_memcg_low and test_memcg_min check protection of memory in
  populated cgroups (actually as per Documentation/admin-guide/cgroup-v2.rst
  memory.min should not apply to empty cgroups, which is not the case
  currently. Therefore I unified tests with the populated case in order to to
  bring more broken tests).

This patch (of 5):

This fixes mis-applied changes from commit 72b1e03aa725 ("cgroup: account
for memory_localevents in test_memcg_oom_group_leaf_events()").

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Koutný <[email protected]>
Reviewed-by: David Vernet <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Richard Palethorpe <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/z3fold: fix z3fold_page_migrate races with z3fold_map

Think about the below scenario:

CPU1 CPU2
z3fold_page_migrate z3fold_map
  z3fold_page_trylock
  ...
  z3fold_page_unlock
  /* slots still points to old zhdr*/
get_z3fold_header
  get slots from handle
  get old zhdr from slots
  z3fold_page_trylock
  return *old* zhdr
  encode_handle(new_zhdr, FIRST|LAST|MIDDLE)
  put_page(page) /* zhdr is freed! */
but zhdr is still used by caller!

z3fold_map can map freed z3fold page and lead to use-after-free bug.  To
fix it, we add PAGE_MIGRATED to indicate z3fold page is migrated and soon
to be released.  So get_z3fold_header won't return such page.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/z3fold: fix z3fold_reclaim_page races with z3fold_free

Think about the below scenario:

CPU1 CPU2
z3fold_reclaim_page z3fold_free
spin_lock(&pool->lock) get_z3fold_header -- hold page_lock
kref_get_unless_zero
kref_put--zhdr->refcount can be 1 now
!z3fold_page_trylock
  kref_put -- zhdr->refcount is 0 now
   release_z3fold_page
    WARN_ON(!list_empty(&zhdr->buddy)); -- we're on buddy now!
    spin_lock(&pool->lock); -- deadlock here!

z3fold_reclaim_page might race with z3fold_free and will lead to pool lock
deadlock and zhdr buddy non-empty warning.  To fix this, defer getting the
refcount until page_lock is held just like what __z3fold_alloc does.  Note
this has the side effect that we won't break the reclaim if we meet a soon
to be released z3fold page now.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: dcf5aedb24f8 ("z3fold: stricter locking and more careful reclaim")
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/z3fold: always clear PAGE_CLAIMED under z3fold page lock

Think about the below race window:

CPU1 CPU2
z3fold_reclaim_page z3fold_free
test_and_set_bit PAGE_CLAIMED
failed to reclaim page
z3fold_page_lock(zhdr);
add back to the lru list;
z3fold_page_unlock(zhdr);
get_z3fold_header
page_claimed=test_and_set_bit PAGE_CLAIMED

clear_bit(PAGE_CLAIMED, &page->private);

if (!page_claimed) /* it's false true */
free_handle is not called

free_handle won't be called in this case. So z3fold_buddy_slots will leak.
Fix it by always clear PAGE_CLAIMED under z3fold page lock.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/z3fold: put z3fold page back into unbuddied list when reclaim or migration fails

When doing z3fold page reclaim or migration, the page is removed from
unbuddied list.  If reclaim or migration succeeds, it's fine as page is
released.  But in case it fails, the page is not put back into unbuddied
list now.  The page will be leaked until next compaction work, reclaim or
migration is done.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

revert "mm/z3fold.c: allow __GFP_HIGHMEM in z3fold_alloc"

Revert commit f1549cb5ab2b ("mm/z3fold.c: allow __GFP_HIGHMEM in
z3fold_alloc").

z3fold can't support GFP_HIGHMEM page now.  page_address is used directly
at all places.  Moreover, z3fold_header is on per cpu unbuddied list which
could be accessed anytime.  So we should remove the support of GFP_HIGHMEM
allocation for z3fold.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/z3fold: throw warning on failure of trylock_page in z3fold_alloc

If trylock_page fails, the page won't be non-lru movable page. When this
page is freed via free_z3fold_page, it will trigger bug on PageMovable
check in __ClearPageMovable. Throw warning on failure of trylock_page to
guard against such rare case just as what zsmalloc does.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/z3fold: remove buggy use of stale list for allocation

Currently if z3fold couldn't find an unbuddied page it would first try to
pull a page off the stale list.  But this approach is problematic.  If
init z3fold page fails later, the page should be freed via
free_z3fold_page to clean up the relevant resource instead of using
__free_page directly.  And if page is successfully reused, it will BUG_ON
later in __SetPageMovable because it's already non-lru movable page, i.e.
PAGE_MAPPING_MOVABLE is already set in page->mapping.  In order to fix all
of these issues, we can simply remove the buggy use of stale list for
allocation because can_sleep should always be false and we never really
hit the reusing code path now.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/z3fold: fix possible null pointer dereferencing

alloc_slots could fail to allocate memory under heavy memory pressure. So
we should check zhdr->slots against NULL to avoid future null pointer
dereferencing.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: fc5488651c7d ("z3fold: simplify freeing slots")
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/z3fold: fix sheduling while atomic

Patch series "A few fixup patches for z3fold".

This series contains a few fixup patches to fix sheduling while atomic,
fix possible null pointer dereferencing, fix various race conditions and
so on. More details can be found in the respective changelogs.

This patch (of 9):

z3fold's page_lock is always held when calling alloc_slots. So gfp should
be GFP_ATOMIC to avoid "scheduling while atomic" bug.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: fc5488651c7d ("z3fold: simplify freeing slots")
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: split free page with properly free memory accounting and without race

In isolate_single_pageblock(), free pages are checked without holding zone
lock, but they can go away in split_free_page() when zone lock is held.
Check the free page and its order again in split_free_page() when zone lock
is held. Recheck the page if the free page is gone under zone lock.

In addition, in split_free_page(), the free page was deleted from the page
list without changing free page accounting. Add the missing free page
accounting code.

Fix the type of order parameter in split_free_page().

Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
Signed-off-by: Zi Yan <[email protected]>
Reported-by: Doug Berger <[email protected]>
Link: https://lore.kernel.org/linux-mm/[email protected]/
Cc: David Hildenbrand <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Eric Ren <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Michael Walle <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: page-isolation: skip isolated pageblock in start_isolate_page_range()

start_isolate_page_range() first isolates the first and the last
pageblocks in the range and ensure pages across range boundaries are split
during isolation. But it missed the case when the range is <= a pageblock
and the first and the last pageblocks are the same one, so the second
isolate_single_pageblock() will always fail. To fix it, skip the
pageblock isolation in second isolate_single_pageblock().

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 88ee134320b8 ("mm: fix a potential infinite loop in start_isolate_page_range()")
Signed-off-by: Zi Yan <[email protected]>
Reported-by: Marek Szyprowski <[email protected]>
Tested-by: Marek Szyprowski <[email protected]>
Link: https://lore.kernel.org/linux-mm/[email protected]/
Reported-by: Michael Walle <[email protected]>
Tested-by: Michael Walle <[email protected]>
Link: https://lore.kernel.org/linux-mm/[email protected]/
Cc: Christophe Leroy <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Eric Ren <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

tools arch x86: Sync the msr-index.h copy with the kernel sources

To pick up the changes in:

  db1af12929c99d15 ("x86/msr-index: Define INTEGRITY_CAPABILITIES MSR")
  089be16d5992dd0b ("x86/msr: Add PerfCntrGlobal* registers")
  f52ba93190457aa2 ("tools/power turbostat: Add Power Limit4 support")

Addressing these tools/perf build warnings:

    diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
    Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'

That makes the beautification scripts to pick some new entries:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  --- before 2022-05-26 12:50:01.228612839 -0300
  +++ after 2022-05-26 12:50:07.699776166 -0300
  @@ -116,6 +116,7 @@
    [0x0000026f] = "MTRRfix4K_F8000",
    [0x00000277] = "IA32_CR_PAT",
    [0x00000280] = "IA32_MC0_CTL2",
  + [0x000002d9] = "INTEGRITY_CAPS",
    [0x000002ff] = "MTRRdefType",
    [0x00000309] = "CORE_PERF_FIXED_CTR0",
    [0x0000030a] = "CORE_PERF_FIXED_CTR1",
  @@ -176,6 +177,7 @@
    [0x00000586] = "IA32_RTIT_ADDR3_A",
    [0x00000587] = "IA32_RTIT_ADDR3_B",
    [0x00000600] = "IA32_DS_AREA",
  + [0x00000601] = "VR_CURRENT_CONFIG",
    [0x00000606] = "RAPL_POWER_UNIT",
    [0x0000060a] = "PKGC3_IRTL",
    [0x0000060b] = "PKGC6_IRTL",
  @@ -260,6 +262,10 @@
    [0xc0000102 - x86_64_specific_MSRs_offset] = "KERNEL_GS_BASE",
    [0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX",
    [0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO",
  + [0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG",
  + [0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS",
  + [0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL",
  + [0xc0000302 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS_CLR",
   };

   #define x86_AMD_V_KVM_MSRs_offset 0xc0010000
  @@ -318,4 +324,5 @@
    [0xc00102b4 - x86_AMD_V_KVM_MSRs_offset] = "AMD_CPPC_STATUS",
    [0xc00102f0 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN_CTL",
    [0xc00102f1 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN",
  + [0xc0010300 - x86_AMD_V_KVM_MSRs_offset] = "AMD_SAMP_BR_FROM",
   };
  $

Now one can trace systemwide asking to see backtraces to where those
MSRs are being read/written, see this example with a previous update:

  # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
  ^C#

If we use -v (verbose mode) we can see what it does behind the scenes:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
  Using CPUID AuthenticAMD-25-21-0
  0x6a0
  0x6a8
  New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
  0x6a0
  0x6a8
  New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
  mmap size 528384B
  ^C#

Example with a frequent msr:

  # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
  Using CPUID AuthenticAMD-25-21-0
  0x48
  New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  0x48
  New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
  mmap size 528384B
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
     0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       __switch_to_xtra ([kernel.kallsyms])
                                       __switch_to ([kernel.kallsyms])
                                       __schedule ([kernel.kallsyms])
                                       schedule ([kernel.kallsyms])
                                       futex_wait_queue_me ([kernel.kallsyms])
                                       futex_wait ([kernel.kallsyms])
                                       do_futex ([kernel.kallsyms])
                                       __x64_sys_futex ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                       __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
     0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
                                       do_trace_write_msr ([kernel.kallsyms])
                                       do_trace_write_msr ([kernel.kallsyms])
                                       __switch_to_xtra ([kernel.kallsyms])
                                       __switch_to ([kernel.kallsyms])
                                       __schedule ([kernel.kallsyms])
                                       schedule_idle ([kernel.kallsyms])
                                       do_idle ([kernel.kallsyms])
                                       cpu_startup_entry ([kernel.kallsyms])
                                       secondary_startup_64_no_verify ([kernel.kallsyms])
  #

Cc: Adrian Hunter <[email protected]>
Cc: Hans de Goede <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Sumeet Pawnikar <[email protected]>
Cc: Tony Luck <[email protected]>
Link: https://lore.kernel.org/lkml/Yo+i%[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf scripts python: Support Arm CoreSight trace data disassembly

This commit adds python script to parse CoreSight tracing event and
print out source line and disassembly, it generates readable program
execution flow for easier humans inspecting.

The script receives CoreSight tracing packet with below format:

                +------------+------------+------------+
  packet(n):    |    addr    |    ip      |    cpu     |
                +------------+------------+------------+
  packet(n+1):  |    addr    |    ip      |    cpu     |
                +------------+------------+------------+

packet::addr presents the start address of the coming branch sample, and
packet::ip is the last address of the branch smple.  Therefore, a code
section between branches starts from packet(n)::addr and it stops at
packet(n+1)::ip.  As results we combines the two continuous packets to
generate the address range for instructions:

  [ sample(n)::addr .. sample(n+1)::ip ]

The script supports both objdump or llvm-objdump for disassembly with
specifying option '-d'.  If doesn't specify option '-d', the script
simply outputs source lines and symbols.

Below shows usages with llvm-objdump or objdump to output disassembly.

  # perf script -s scripts/python/arm-cs-trace-disasm.py -- -d llvm-objdump-11 -k ./vmlinux
  ARM CoreSight Trace Data Assembler Dump
   ffff800008eb3198 <etm4_enable_hw>:
   ffff800008eb3310: c0 38 00 35   cbnz w0, 0xffff800008eb3a28 <etm4_enable_hw+0x890>
   ffff800008eb3314: 9f 3f 03 d5   dsb sy
   ffff800008eb3318: df 3f 03 d5   isb
   ffff800008eb331c: f5 5b 42 a9   ldp x21, x22, [sp, #32]
   ffff800008eb3320: fb 73 45 a9   ldp x27, x28, [sp, #80]
   ffff800008eb3324: e0 82 40 39   ldrb w0, [x23, #32]
   ffff800008eb3328: 60 00 00 34   cbz w0, 0xffff800008eb3334 <etm4_enable_hw+0x19c>
   ffff800008eb332c: e0 03 19 aa   mov x0, x25
   ffff800008eb3330: 8c fe ff 97   bl 0xffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>
              main  6728/6728  [0004]         0.000000000  etm4_enable_hw+0x198                    [kernel.kallsyms]
   ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>:
   ffff800008eb2d60: 1f 20 03 d5   nop
   ffff800008eb2d64: 1f 20 03 d5   nop
   ffff800008eb2d68: 3f 23 03 d5   hint #25
   ffff800008eb2d6c: 00 00 40 f9   ldr x0, [x0]
   ffff800008eb2d70: 9f 3f 03 d5   dsb sy
   ffff800008eb2d74: 00 c0 3e 91   add x0, x0, #4016
   ffff800008eb2d78: 1f 00 00 b9   str wzr, [x0]
   ffff800008eb2d7c: bf 23 03 d5   hint #29
   ffff800008eb2d80: c0 03 5f d6   ret
              main  6728/6728  [0004]         0.000000000  etm4_cs_lock.isra.0.part.0+0x20

  # perf script -s scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ./vmlinux
  ARM CoreSight Trace Data Assembler Dump
   ffff800008eb3310 <etm4_enable_hw+0x178>:
   ffff800008eb3310: 350038c0 cbnz w0, ffff800008eb3a28 <etm4_enable_hw+0x890>
   ffff800008eb3314: d5033f9f dsb sy
   ffff800008eb3318: d5033fdf isb
   ffff800008eb331c: a9425bf5 ldp x21, x22, [sp, #32]
   ffff800008eb3320: a94573fb ldp x27, x28, [sp, #80]
   ffff800008eb3324: 394082e0 ldrb w0, [x23, #32]
   ffff800008eb3328: 34000060 cbz w0, ffff800008eb3334 <etm4_enable_hw+0x19c>
   ffff800008eb332c: aa1903e0 mov x0, x25
   ffff800008eb3330: 97fffe8c bl ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>
              main  6728/6728  [0004]         0.000000000  etm4_enable_hw+0x198                    [kernel.kallsyms]
   ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>:
   ffff800008eb2d60: d503201f nop
   ffff800008eb2d64: d503201f nop
   ffff800008eb2d68: d503233f paciasp
   ffff800008eb2d6c: f9400000 ldr x0, [x0]
   ffff800008eb2d70: d5033f9f dsb sy
   ffff800008eb2d74: 913ec000 add x0, x0, #0xfb0
   ffff800008eb2d78: b900001f str wzr, [x0]
   ffff800008eb2d7c: d50323bf autiasp
   ffff800008eb2d80: d65f03c0 ret
              main  6728/6728  [0004]         0.000000000  etm4_cs_lock.isra.0.part.0+0x20

Signed-off-by: Leo Yan <[email protected]>
Co-authored-by: Al Grant <[email protected]>
Co-authored-by: Mathieu Poirier <[email protected]>
Co-authored-by: Tor Jeremiassen <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eelco Chaudron <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephen Brennan <[email protected]>
Cc: Tanmay Jagdale <[email protected]>
Cc: [email protected]
Cc: zengshun . wu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf scripting python: Expose dso and map information

This change adds dso build_id and corresponding map's start and end
address. The info of dso build_id can be used to find dso file path,
and we can validate if a branch address falls into the range of map's
start and end addresses.

In addition, the map's start address can be used as an offset for
disassembly.

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Cc: Al Grant <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Eelco Chaudron <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephen Brennan <[email protected]>
Cc: Tanmay Jagdale <[email protected]>
Cc: [email protected]
Cc: zengshun . wu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf jevents: Fix event syntax error caused by ExtSel

In the origin code, when "ExtSel" is 1, the eventcode will change to
"eventcode |= 1 << 21”. For event “UNC_Q_RxL_CREDITS_CONSUMED_VN0.DRS",
its "ExtSel" is "1", its eventcode will change from 0x1E to 0x20001E,
but in fact the eventcode should <=0x1FF, so this will cause the parse
fail:

  # perf stat -e "UNC_Q_RxL_CREDITS_CONSUMED_VN0.DRS" -a sleep 0.1
  event syntax error: '.._RxL_CREDITS_CONSUMED_VN0.DRS'
                                    \___ value too big for format, maximum is 511

On the perf kernel side, the kernel assumes the valid bits are continuous.
It will adjust the 0x100 (bit 8 for perf tool) to bit 21 in HW.

DEFINE_UNCORE_FORMAT_ATTR(event_ext, event, "config:0-7,21");

So the perf tool follows the kernel side and just set bit8 other than bit21.

Fixes: fedb2b518239cbc0 ("perf jevents: Add support for parsing uncore json files")
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Xing Zhengjun <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tools arm64: Add support for VG register

Add the name of the VG register so it can be used in --user-regs

The event will fail to open if the register is requested but not
available so only add it to the mask if the kernel supports sve and also
if it supports that specific register.

Committer notes:

Add conditional definition of HWCAP_SVE, as suggested by Leo Yan, to
build on older systems where this is not available in the system
headers.

Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: James Clark <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Poirier <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

mm/page_table_check: fix accessing unmapped ptep

ptep is unmapped too early, so ptep could theoretically be accessed while
it's unmapped. This might become a problem if/when CONFIG_HIGHPTE becomes
available on riscv.

Fix it by deferring pte_unmap() until page table checking is done.

[[email protected]: account for ptep alteration, per Matthew]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 80110bbfbba6 ("mm/page_table_check: check entries at pmd levels")
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Pasha Tatashin <[email protected]>
Cc: Qi Zheng <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]

Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [1], binutils (v2.36+) started dropping section symbols that
it thought were unused. This isn't an issue in general, but with
kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a
separate .text.unlikely section and the section symbol ".text.unlikely"
is being dropped. Due to this, recordmcount is unable to find a non-weak
symbol in .text.unlikely to generate a relocation record against.

Address this by dropping the weak attribute from these functions.
Instead, follow the existing pattern of having architectures #define the
name of the function they want to override in their headers.

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1

[[email protected]: arch/s390/include/asm/kexec.h needs linux/module.h]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Naveen N. Rao <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/page_alloc: always attempt to allocate at least one page during bulk allocation

Peter Pavlisko reported the following problem on kernel bugzilla 216007.

When I try to extract an uncompressed tar archive (2.6 milion
files, 760.3 GiB in size) on newly created (empty) XFS file system,
after first low tens of gigabytes extracted the process hangs in
iowait indefinitely. One CPU core is 100% occupied with iowait,
the other CPU core is idle (on 2-core Intel Celeron G1610T).

It was bisected to c9fa563072e1 ("xfs: use alloc_pages_bulk_array() for
buffers") but XFS is only the messenger.  The problem is that nothing is
waking kswapd to reclaim some pages at a time the PCP lists cannot be
refilled until some reclaim happens.  The bulk allocator checks that there
are some pages in the array and the original intent was that a bulk
allocator did not necessarily need all the requested pages and it was best
to return as quickly as possible.

This was fine for the first user of the API but both NFS and XFS require
the requested number of pages be available before making progress.  Both
could be adjusted to call the page allocator directly if a bulk allocation
fails but it puts a burden on users of the API.  Adjust the semantics to
attempt at least one allocation via __alloc_pages() before returning so
kswapd is woken if necessary.

It was reported via bugzilla that the patch addressed the problem and that
the tar extraction completed successfully.  This may also address bug
215975 but has yet to be confirmed.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216007
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215975
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
Signed-off-by: Mel Gorman <[email protected]>
Cc: "Darrick J. Wong" <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Jesper Dangaard Brouer <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: <[email protected]> [5.13+]
Signed-off-by: Andrew Morton <[email protected]>

hugetlb: fix huge_pmd_unshare address update

The routine huge_pmd_unshare() is passed a pointer to an address
associated with an area which may be unshared.  If unshare is successful
this address is updated to 'optimize' callers iterating over huge page
addresses.  For the optimization to work correctly, address should be
updated to the last huge page in the unmapped/unshared area.  However, in
the common case where the passed address is PUD_SIZE aligned, the address
is incorrectly updated to the address of the preceding huge page.  That
wastes CPU cycles as the unmapped/unshared range is scanned twice.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <[email protected]>
Acked-by: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/bpf: fix stacktrace_build_id with missing kprobe/urandom_read

Kernel function urandom_read is replaced with urandom_read_iter.
Therefore, kprobe on urandom_read is not working any more:

[root@eth50-1 bpf]# ./test_progs -n 161
test_stacktrace_build_id:PASS:skel_open_and_load 0 nsec
libbpf: kprobe perf_event_open() failed: No such file or directory
libbpf: prog 'oncpu': failed to create kprobe 'urandom_read+0x0' \
perf event: No such file or directory
libbpf: prog 'oncpu': failed to auto-attach: -2
test_stacktrace_build_id:FAIL:attach_tp err -2
161 stacktrace_build_id:FAIL

Fix this by replacing urandom_read with urandom_read_iter in the test.

Fixes: 1b388e7765f2 ("random: convert to using fops->read_iter()")
Reported-by: Mykola Lysenko <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Acked-by: David Vernet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

arm64: dts: sprd: use new 'dma-channels' property

The '#dma-channels' property was deprecated in favor of one defined by
generic dma-common DT bindings. Add new property while keeping old one
for backwards compatibility.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]'
Signed-off-by: Arnd Bergmann <[email protected]>

ARM: dts: da850: use new 'dma-channels' property

The '#dma-channels' property was deprecated in favor of one defined by
generic dma-common DT bindings. Add new property while keeping old one
for backwards compatibility.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]'
Signed-off-by: Arnd Bergmann <[email protected]>

ARM: dts: pxa: use new 'dma-channels/requests' properties

The '#dma-channels' and '#dma-requests' properties were deprecated in
favor of these defined by generic dma-common DT bindings. Add new
properties while keeping old ones for backwards compatibility.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]'
Signed-off-by: Arnd Bergmann <[email protected]>

soc: ixp4xx/qmgr: Fix unused match warning

The kernel test robot found this inconsistency:

>> drivers/soc/ixp4xx/ixp4xx-npe.c:737:34: warning:
'ixp4xx_npe_of_match' defined but not used [-Wunused-const-variable=]
737 | static const struct of_device_id ixp4xx_npe_of_match[] = {

This is because the match is enclosed in the of_match_ptr()
which compiles into NULL when OF is disabled and this
is unnecessary.

Fix it by dropping of_match_ptr() around the match.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]'
Signed-off-by: Arnd Bergmann <[email protected]>

ARM: ep93xx: Make ts72xx_register_flash() static

... and fix the warning/error:

arch/arm/mach-ep93xx/ts72xx.c:154:13: error: no previous prototype for function 'ts72xx_register_flash' [-Werror,-Wmissing-prototypes]
void __init ts72xx_register_flash(struct mtd_partition *parts, int n,
^
arch/arm/mach-ep93xx/ts72xx.c:154:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void __init ts72xx_register_flash(struct mtd_partition *parts, int n,
^
static

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Alexander Sverdlin <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/T/
Link: https://lore.kernel.org/r/[email protected]'
Signed-off-by: Arnd Bergmann <[email protected]>

ARM: configs: enable support for Kontron KSwitch D10

The Kontron KSwitch D10 is based on a Microchip LAN9668 SoC. It is a
managed ethernet network switch with either 8 copper ports or 6 copper
ports and 2 SFP cages.

Enable all required kconfig symbols, either as module where possible or
compiled-in where it is not possible.

Signed-off-by: Michael Walle <[email protected]>
Acked-by: Nicolas Ferre <[email protected]>
Link: https://lore.kernel.org/r/[email protected]'
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'at91-dt-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/late

AT91 DT #2 for 5.19:

- at91: more DT compliance updates for RTC and RTT nodes
- at91: sama7g5: add microphone support

* tag 'at91-dt-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
  ARM: dts: at91: sama7g5ek: add node for PDMC0
  ARM: dts: at91: sama7g5: add nodes for PDMC
  ARM: dts: at91: Use the generic "rtc" node name for the rtt IPs
  ARM: dts: at91: Add the required 'atmel, rtt-rtc-time-reg' property

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'at91-soc-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/late

AT91 SoC #2 for 5.19:

- One Kconfig fix for random build error

* tag 'at91-soc-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
ARM: at91: pm: Fix rand build error

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

ep93xx: clock: Do not return the address of the freed memory

Avoid return freed memory addresses,Modified to the actual error
return value of clk_register().

Fixes: 9645ccc7bd7a ("ep93xx: clock: convert in-place to COMMON_CLK")
Signed-off-by: Genjian Zhang <[email protected]>
Acked-by: Alexander Sverdlin <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>

Merge branch 'hpe/gxp-soc' into arm/late

Patch series from Nick Hawkins:

"The GXP is the HPE BMC SoC that is used in the majority of HPE current
generation servers. Traditionally the asic will last multiple
generations of server before being replaced.

Info about SoC:

  HPE GXP is the name of the HPE Soc. This SoC is used to implement many
  BMC features at HPE. It supports ARMv7 architecture based on the Cortex
  A9 core. It is capable of using an AXI bus to which a memory controller
  is attached. It has multiple SPI interfaces to connect boot flash and
  BIOS flash. It uses a 10/100/1000 MAC for network connectivity. It has
  multiple i2c engines to drive connectivity with a host infrastructure.
  The initial patches enable the watchdog and timer enabling the host to
  be able to boot."

* hpe/gxp-soc:
  MAINTAINERS: Introduce HPE GXP Architecture
  ARM: dts: Introduce HPE GXP Device tree
  dt-bindings: arm: hpe: add GXP Support
  dt-bindings: timer: hpe,gxp-timer: Add HPE GXP Timer and Watchdog
  clocksource/drivers/timer-gxp: Add HPE GXP Timer
  watchdog: hpe-wdt: Introduce HPE GXP Watchdog
  ARM: configs: multi_v7_defconfig: Add HPE GXP ARCH
  ARM: hpe: Introduce the HPE GXP architecture

Signed-off-by: Arnd Bergmann <[email protected]>

powerpc/64: Include cache.h directly in paca.h

paca.h uses ____cacheline_aligned without directly including cache.h,
where it's defined.

For Book3S builds that's OK because paca.h includes lppaca.h, and it
does include cache.h.

But Book3E builds have been getting cache.h indirectly via printk.h,
which is dicey, and in fact that include was recently removed, leading
to build errors such as:

ld: fs/isofs/dir.o:(.bss+0x0): multiple definition of `____cacheline_aligned'; fs/isofs/namei.o:(.bss+0x0): first defined here

So include cache.h directly to fix the build error.

Fixes: 534aa1dc975a ("printk: stop including cache.h from printk.h")
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

net: usb: qmi_wwan: add Telit 0x1250 composition

Add support for Telit LN910Cx 0x1250 composition

0x1250: rmnet, tty, tty, tty, tty

Signed-off-by: Carlo Lobrano <[email protected]>
Acked-by: Bjørn Mork <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: lan743x: PCI11010 / PCI11414 fix

Fix the MDIO interface declarations to reflect what is currently supported by
the PCI11010 / PCI11414 devices (C22 for RGMII and C22_C45 for SGMII)

Signed-off-by: Raju Lakkaraju <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "printk: wake up all waiters"

This reverts commit 938ba4084abcf6fdd21d9078513c52f8fb9b00d0.

The wait queue @log_wait never has exclusive waiters, so there
is no need to use wake_up_interruptible_all(). Using
wake_up_interruptible() was the correct function to wake all
waiters.

Since there are no exclusive waiters, erroneously changing
wake_up_interruptible() to wake_up_interruptible_all() did not
result in any behavior change. However, using
wake_up_interruptible_all() on a wait queue without exclusive
waiters is fundamentally wrong.

Go back to using wake_up_interruptible() to wake all waiters.

Signed-off-by: John Ogness <[email protected]>
Acked-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following contain more Netfilter fixes for net:

1) syzbot warning in nfnetlink bind, from Florian.

2) Refetch conntrack after __nf_conntrack_confirm(), from Florian Westphal.

3) Move struct nf_ct_timeout back at the bottom of the ctnl_time, to
where it before recent update, also from Florian.

4) Add NL_SET_BAD_ATTR() to nf_tables netlink for proper set element
commands error reporting.
====================

Signed-off-by: David S. Miller <[email protected]>

netfilter: nf_tables: set element extended ACK reporting support

Report the element that causes problems via netlink extended ACK for set
element commands.

Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: cttimeout: fix slab-out-of-bounds read in cttimeout_net_exit

syzbot reports:
BUG: KASAN: slab-out-of-bounds in __list_del_entry_valid+0xcc/0xf0 lib/list_debug.c:42
[..]
list_del include/linux/list.h:148 [inline]
cttimeout_net_exit+0x211/0x540 net/netfilter/nfnetlink_cttimeout.c:617

No reproducer so far. Looking at recent changes in this area
its clear that the free_head must not be at the end of the
structure because nf_ct_timeout structure has variable size.

Reported-by: <[email protected]>
Fixes: 78222bacfca9 ("netfilter: cttimeout: decouple unlink and free on netns destruction")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: conntrack: re-fetch conntrack after insertion

In case the conntrack is clashing, insertion can free skb->_nfct and
set skb->_nfct to the already-confirmed entry.

This wasn't found before because the conntrack entry and the extension
space used to free'd after an rcu grace period, plus the race needs
events enabled to trigger.

Reported-by: <[email protected]>
Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race")
Fixes: 2ad9d7747c10 ("netfilter: conntrack: free extension area immediately")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nfnetlink: fix warn in nfnetlink_unbind

syzbot reports following warn:
WARNING: CPU: 0 PID: 3600 at net/netfilter/nfnetlink.c:703 nfnetlink_unbind+0x357/0x3b0 net/netfilter/nfnetlink.c:694

The syzbot generated program does this:

socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER) = 3
setsockopt(3, SOL_NETLINK, NETLINK_DROP_MEMBERSHIP, [1], 4) = 0

... which triggers 'WARN_ON_ONCE(nfnlnet->ctnetlink_listeners == 0)' check.

Instead of counting, just enable reporting for every bind request
and check if we still have listeners on unbind.

While at it, also add the needed bounds check on nfnl_group2type[]
access.

Reported-by: <[email protected]>
Reported-by: <[email protected]>
Fixes: 2794cdb0b97b ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

net: dsa: mv88e6xxx: Fix refcount leak in mv88e6xxx_mdios_register

of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when done.

mv88e6xxx_mdio_register() pass the device node to of_mdiobus_register().
We don't need the device node after it.

Add missing of_node_put() to avoid refcount leak.

Fixes: a3c53be55c95 ("net: dsa: mv88e6xxx: Support multiple MDIO busses")
Signed-off-by: Miaoqian Lin <[email protected]>
Reviewed-by: Marek Behún <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: ti: am65-cpsw-nuss: Fix some refcount leaks

of_get_child_by_name() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
am65_cpsw_init_cpts() and am65_cpsw_nuss_probe() don't release
the refcount in error case.
Add missing of_node_put() to avoid refcount leak.

Fixes: b1f66a5bee07 ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support")
Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Miaoqian Lin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: mtk_eth_soc: out of bounds read in mtk_hwlro_get_fdir_entry()

The "fsp->location" variable comes from user via ethtool_get_rxnfc().
Check that it is valid to prevent an out of bounds read.

Fixes: 7aab747e5563 ("net: ethernet: mediatek: add ethtool functions to configure RX flows of HW LRO")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'for-5.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

- Enable DM core bioset's per-cpu bio cache if QUEUE_FLAG_POLL set.
   This change improves DM's hipri bio polling (REQ_POLLED) performance
   by 7 - 20% depending on the system.

- Update DM core to use jump_labels to further reduce cost of unlikely
   branches for zoned block devices, dm-stats and swap_bios throttling.

- Various DM core changes to reduce bio-based DM overhead and simplify
   IO accounting.

- Fundamental DM core improvements to dm_io reference counting and the
   elimination of using bio_split()+bio_chain() -- instead DM's
   bio-based IO accounting is updated to account that a split occurred.

- Improve DM core's abnormal bio processing to do less work.

- Improve DM core's hipri polling support to use a single list rather
   than an hlist.

- Update DM core to pass NULL bdev to bio_alloc_clone() so that
   initialization that isn't useful for DM can be elided.

- Add cond_resched to DM stats' various loops that loop over all
   entries.

- Fix incorrect error code return from DM integrity's constructor.

- Make DM crypt's printing of the key constant-time.

- Update bio-based DM multipath to provide high-resolution timer to the
   Historical Service Time (HST) path selector.

* tag 'for-5.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
  dm: pass NULL bdev to bio_alloc_clone
  dm cache metadata: remove unnecessary variable in __dump_mapping
  dm mpath: provide high-resolution timer to HST for bio-based
  dm crypt: make printing of the key constant-time
  dm integrity: fix error code in dm_integrity_ctr()
  dm stats: add cond_resched when looping over entries
  dm: improve abnormal bio processing
  dm: simplify bio-based IO accounting further
  dm: put all polled dm_io instances into a single list
  dm: improve dm_io reference counting
  dm: don't grab target io reference in dm_zone_map_bio
  dm: improve bio splitting and associated IO accounting
  dm: switch to bdev based IO accounting interfaces
  dm: pass dm_io instance to dm_io_acct directly
  dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct
  dm: use bio_sectors in dm_aceept_partial_bio
  dm: simplify basic targets
  dm: conditionally enable branching for less used features
  dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio
  dm: move hot dm_io members to same cacheline as dm_target_io
  ...

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
"Small collection of incremental improvement patches:

   - Minor code cleanup patches, comment improvements, etc from static
     tools

   - Clean the some of the kernel caps, reducing the historical stealth
     uAPI leftovers

   - Bug fixes and minor changes for rdmavt, hns, rxe, irdma

   - Remove unimplemented cruft from rxe

   - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs
     layer

   - flush_workqueue(system_unbound_wq) removal

   - Ensure rxe waits for objects to be unused before allowing the core
     to free them

   - Several rc quality bug fixes for hfi1"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits)
  RDMA/rtrs-clt: Fix one kernel-doc comment
  RDMA/hfi1: Remove all traces of diagpkt support
  RDMA/hfi1: Consolidate software versions
  RDMA/hfi1: Remove pointless driver version
  RDMA/hfi1: Fix potential integer multiplication overflow errors
  RDMA/hfi1: Prevent panic when SDMA is disabled
  RDMA/hfi1: Prevent use of lock before it is initialized
  RDMA/rxe: Fix an error handling path in rxe_get_mcg()
  IB/core: Fix typo in comment
  RDMA/core: Fix typo in comment
  IB/hf1: Fix typo in comment
  IB/qib: Fix typo in comment
  IB/iser: Fix typo in comment
  RDMA/mlx4: Avoid flush_scheduled_work() usage
  IB/isert: Avoid flush_scheduled_work() usage
  RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr()
  RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq()
  RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx()
  RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx()
  RDMA/irdma: Add SW mechanism to generate completions on error
  ...

Merge tag 'hardening-v5.19-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kernel hardening fix from Kees Cook:
"This fixes an unlucky build race condition when using the GCC plugins,
  noticed by a few folks.

   - Avoid GCC plugins needing utsrelease.h build target (Masahiro Yamada)"

* tag 'hardening-v5.19-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: use KERNELVERSION for plugin version

Merge tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
"We introduce 'courteous server' in this release. Previously NFSD would
  purge open and lock state for an unresponsive client after one lease
  period (typically 90 seconds). Now, after one lease period, another
  client can open and lock those files and the unresponsive client's
  lease is purged; otherwise if the unresponsive client's open and lock
  state is uncontended, the server retains that open and lock state for
  up to 24 hours, allowing the client's workload to resume after a
  lengthy network partition.

  A longstanding issue with NFSv4 file creation is also addressed.
  Previously a file creation can fail internally, returning an error to
  the client, but leave the newly created file in place as an artifact.
  The file creation code path has been reorganized so that internal
  failures and race conditions are less likely to result in an unwanted
  file creation.

  A fault injector has been added to help exercise paths that are run
  during kernel metadata cache invalidation. These caches contain
  information maintained by user space about exported filesystems. Many
  of our test workloads do not trigger cache invalidation.

  There is one patch that is needed to support PREEMPT_RT and a fix for
  an ancient 'sleep while spin-locked' splat that seems to have become
  easier to hit since v5.18-rc3"

* tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (36 commits)
  NFSD: nfsd_file_put() can sleep
  NFSD: Add documenting comment for nfsd4_release_lockowner()
  NFSD: Modernize nfsd4_release_lockowner()
  NFSD: Fix possible sleep during nfsd4_release_lockowner()
  nfsd: destroy percpu stats counters after reply cache shutdown
  nfsd: Fix null-ptr-deref in nfsd_fill_super()
  nfsd: Unregister the cld notifier when laundry_wq create failed
  SUNRPC: Use RMW bitops in single-threaded hot paths
  NFSD: Clean up the show_nf_flags() macro
  NFSD: Trace filecache opens
  NFSD: Move documenting comment for nfsd4_process_open2()
  NFSD: Fix whitespace
  NFSD: Remove dprintk call sites from tail of nfsd4_open()
  NFSD: Instantiate a struct file when creating a regular NFSv4 file
  NFSD: Clean up nfsd_open_verified()
  NFSD: Remove do_nfsd_create()
  NFSD: Refactor NFSv4 OPEN(CREATE)
  NFSD: Refactor NFSv3 CREATE
  NFSD: Refactor nfsd_create_setattr()
  NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
  ...

net: sched: fixed barrier to prevent skbuff sticking in qdisc backlog

In qdisc_run_begin(), smp_mb__before_atomic() used before test_bit()
does not provide any ordering guarantee as test_bit() is not an atomic
operation. This, added to the fact that the spin_trylock() call at
the beginning of qdisc_run_begin() does not guarantee acquire
semantics if it does not grab the lock, makes it possible for the
following statement :

if (test_bit(__QDISC_STATE_MISSED, &qdisc->state))

to be executed before an enqueue operation called before
qdisc_run_begin().

As a result the following race can happen :

           CPU 1                             CPU 2

      qdisc_run_begin()               qdisc_run_begin() /* true */
        set(MISSED)                            .
      /* returns false */                      .
          .                            /* sees MISSED = 1 */
          .                            /* so qdisc not empty */
          .                            __qdisc_run()
          .                                    .
          .                              pfifo_fast_dequeue()
----> /* may be done here */                  .
|         .                                clear(MISSED)
|         .                                    .
|         .                                smp_mb __after_atomic();
|         .                                    .
|         .                                /* recheck the queue */
|         .                                /* nothing => exit   */
|   enqueue(skb1)
|         .
|   qdisc_run_begin()
|         .
|     spin_trylock() /* fail */
|         .
|     smp_mb__before_atomic() /* not enough */
|         .
---- if (test_bit(MISSED))
        return false;   /* exit */

In the above scenario, CPU 1 and CPU 2 both try to grab the
qdisc->seqlock at the same time. Only CPU 2 succeeds and enters the
bypass code path, where it emits its skb then calls __qdisc_run().

CPU1 fails, sets MISSED and goes down the traditionnal enqueue() +
dequeue() code path. But when executing qdisc_run_begin() for the
second time, after enqueuing its skbuff, it sees the MISSED bit still
set (by itself) and consequently chooses to exit early without setting
it again nor trying to grab the spinlock again.

Meanwhile CPU2 has seen MISSED = 1, cleared it, checked the queue
and found it empty, so it returned.

At the end of the sequence, we end up with skb1 enqueued in the
backlog, both CPUs out of __dev_xmit_skb(), the MISSED bit not set,
and no __netif_schedule() called made. skb1 will now linger in the
qdisc until somebody later performs a full __qdisc_run(). Associated
to the bypass capacity of the qdisc, and the ability of the TCP layer
to avoid resending packets which it knows are still in the qdisc, this
can lead to serious traffic "holes" in a TCP connection.

We fix this by replacing the smp_mb__before_atomic() / test_bit() /
set_bit() / smp_mb__after_atomic() sequence inside qdisc_run_begin()
by a single test_and_set_bit() call, which is more concise and
enforces the needed memory barriers.

Fixes: 89837eb4b246 ("net: sched: add barrier to ensure correct ordering for lockless qdisc")
Signed-off-by: Vincent Ray <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: lan966x: check devm_of_phy_get() for -EDEFER_PROBE

At the moment, if devm_of_phy_get() returns an error the serdes
simply isn't set. While it is bad to ignore an error in general, there
is a particular bug that network isn't working if the serdes driver is
compiled as a module. In that case, devm_of_phy_get() returns
-EDEFER_PROBE and the error is silently ignored.

The serdes is optional, it is not there if the port is using RGMII, in
which case devm_of_phy_get() returns -ENODEV. Rearrange the error
handling so that -ENODEV will be handled but other error codes will
abort the probing.

Fixes: d28d6d2e37d1 ("net: lan966x: add port module support")
Signed-off-by: Michael Walle <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix UAF when creating non-stateful expression in set.

2) Set limit cost when cloning expression accordingly, from Phil Sutter.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_limit: Clone packet limits' cost value
netfilter: nf_tables: disallow non-stateful expression in sets earlier
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tracing: Fix comments for event_trigger_separate_filter()

The parameter name in comments of event_trigger_separate_filter() is
inconsistent with actual parameter name, fix it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: sunliming <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

x86/traceponit: Fix comment about irq vector tracepoints

Commit:

4b9a8dca0e58 ("x86/idt: Remove the tracing IDT completely")

removed the 'tracing IDT' from arch/x86/kernel/tracepoint.c,
but left related comment. So that the comment become anachronistic.
Just remove the comment.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: sunliming <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

x86,tracing: Remove unused headers

Commit 4b9a8dca0e58 ("x86/idt: Remove the tracing IDT completely")
removed the tracing IDT from the file arch/x86/kernel/tracepoint.c,
but left the related headers unused, remove it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: sunliming <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

ftrace: Clean up hash direct_functions on register failures

We see the following GPF when register_ftrace_direct fails:

[ ] general protection fault, probably for non-canonical address \
  0x200000000000010: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[...]
[ ] RIP: 0010:ftrace_find_rec_direct+0x53/0x70
[ ] Code: 48 c1 e0 03 48 03 42 08 48 8b 10 31 c0 48 85 d2 74 [...]
[ ] RSP: 0018:ffffc9000138bc10 EFLAGS: 00010206
[ ] RAX: 0000000000000000 RBX: ffffffff813e0df0 RCX: 000000000000003b
[ ] RDX: 0200000000000000 RSI: 000000000000000c RDI: ffffffff813e0df0
[ ] RBP: ffffffffa00a3000 R08: ffffffff81180ce0 R09: 0000000000000001
[ ] R10: ffffc9000138bc18 R11: 0000000000000001 R12: ffffffff813e0df0
[ ] R13: ffffffff813e0df0 R14: ffff888171b56400 R15: 0000000000000000
[ ] FS:  00007fa9420c7780(0000) GS:ffff888ff6a00000(0000) knlGS:000000000
[ ] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ ] CR2: 000000000770d000 CR3: 0000000107d50003 CR4: 0000000000370ee0
[ ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ ] Call Trace:
[ ]  <TASK>
[ ]  register_ftrace_direct+0x54/0x290
[ ]  ? render_sigset_t+0xa0/0xa0
[ ]  bpf_trampoline_update+0x3f5/0x4a0
[ ]  ? 0xffffffffa00a3000
[ ]  bpf_trampoline_link_prog+0xa9/0x140
[ ]  bpf_tracing_prog_attach+0x1dc/0x450
[ ]  bpf_raw_tracepoint_open+0x9a/0x1e0
[ ]  ? find_held_lock+0x2d/0x90
[ ]  ? lock_release+0x150/0x430
[ ]  __sys_bpf+0xbd6/0x2700
[ ]  ? lock_is_held_type+0xd8/0x130
[ ]  __x64_sys_bpf+0x1c/0x20
[ ]  do_syscall_64+0x3a/0x80
[ ]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ ] RIP: 0033:0x7fa9421defa9
[ ] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 9 f8 [...]
[ ] RSP: 002b:00007ffed743bd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[ ] RAX: ffffffffffffffda RBX: 00000000069d2480 RCX: 00007fa9421defa9
[ ] RDX: 0000000000000078 RSI: 00007ffed743bd80 RDI: 0000000000000011
[ ] RBP: 00007ffed743be00 R08: 0000000000bb7270 R09: 0000000000000000
[ ] R10: 00000000069da210 R11: 0000000000000246 R12: 0000000000000001
[ ] R13: 00007ffed743c4b0 R14: 00000000069d2480 R15: 0000000000000001
[ ]  </TASK>
[ ] Modules linked in: klp_vm(OK)
[ ] ---[ end trace 0000000000000000 ]---

One way to trigger this is:
  1. load a livepatch that patches kernel function xxx;
  2. run bpftrace -e 'kfunc:xxx {}', this will fail (expected for now);
  3. repeat #2 => gpf.

This is because the entry is added to direct_functions, but not removed.
Fix this by remove the entry from direct_functions when
register_ftrace_direct fails.

Also remove the last trailing space from ftrace.c, so we don't have to
worry about it anymore.

Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 763e34e74bb7 ("ftrace: Add register_ftrace_direct()")
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing: Fix comments of create_filter()

The name in comments of parameter "filter_string" in function
create_filter is annotated as "filter_str", just fix it.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: sunliming <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing: Disable kcov on trace_preemptirq.c

Functions in trace_preemptirq.c could be invoked from early interrupt
code that bypasses kcov trace function's in_task() check. Disable kcov
on this file to reduce random code coverage.

Link: https://lkml.kernel.org/r/[email protected]
Acked-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Congyu Liu <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing: Initialize integer variable to prevent garbage return value

Initialize the integer variable to 0 to fix the clang scan warning:
Undefined or garbage value returned to caller
[core.uninitialized.UndefReturn]
return ret;

Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 8993665abcce ("tracing/boot: Support multiple handlers for per-event histogram")
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Gautam Menghani <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

ftrace: Fix typo in comment

Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

ftrace: Remove return value of ftrace_arch_modify_*()

All instances of the function ftrace_arch_modify_prepare() and
ftrace_arch_modify_post_process() return zero. There's no point in
checking their return value. Just have them be void functions.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Li kunyu <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing: Cleanup code by removing init "char *name"

The pointer is assigned to "type->name" anyway. no need to
initialize with "preemption".

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: liqiong <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing: Change "char *" string form to "char []"

The "char []" string form declares a single variable. It is better
than "char *" which creates two variables in the final assembly.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: liqiong <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQ

There is no need to wakeup the timerlat/ thread if stop tracing is hit
at the timerlat's IRQ handler.

Return before waking up timerlat's thread.

Link: https://lkml.kernel.org/r/b392356c91b56aedd2b289513cc56a84cf87e60d.1652175637.git.bristot@kernel.org
Cc: Juri Lelli <[email protected]>
Cc: Clark Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>