Git Repo - linux.git/log

Merge branch 'for-6.9/lenovo' into for-linus

- 2nd version of code for applying proper quirk depending on firmware version
for lenovo/cptkbd (Mikhail Khvainitski)

Merge branch 'for-6.9/amd-sfh' into for-linus

- assorted fixes and optimizations for amd-sfh (Basavaraj Natikar)

Signed-off-by: Jiri Kosina <[email protected]>

Merge tag '6.9-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client updates from Steve French:

- fix for folios/netfs data corruption in cifs_extend_writeback

- additional tracepoint added

- updates for special files and symlinks: improvements to allow
   selecting use of either WSL or NFS reparse point format on creating
   special files

- allocation size improvement for cached files

- minor cleanup patches

- fix to allow changing the password on remount when password for the
   session is expired.

- lease key related fixes: caching hardlinked files, deletes of
   deferred close files, and an important fix to better reuse lease keys
   for compound operations, which also can avoid lease break timeouts
   when low on credits

- fix potential data corruption with write/readdir races

- compression cleanups and a fix for compression headers

* tag '6.9-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
  cifs: update internal module version number for cifs.ko
  smb: common: simplify compression headers
  smb: common: fix fields sizes in compression_pattern_payload_v1
  smb: client: negotiate compression algorithms
  smb3: add dynamic trace point for ioctls
  cifs: Fix writeback data corruption
  smb: client: return reparse type in /proc/mounts
  smb: client: set correct d_type for reparse DFS/DFSR and mount point
  smb: client: parse uid, gid, mode and dev from WSL reparse points
  smb: client: introduce SMB2_OP_QUERY_WSL_EA
  smb: client: Fix a NULL vs IS_ERR() check in wsl_set_xattrs()
  smb: client: add support for WSL reparse points
  smb: client: reduce number of parameters in smb2_compound_op()
  smb: client: fix potential broken compound request
  smb: client: move most of reparse point handling code to common file
  smb: client: introduce reparse mount option
  smb: client: retry compound request without reusing lease
  smb: client: do not defer close open handles to deleted files
  smb: client: reuse file lease key in compound operations
  smb3: update allocation size more accurately on write completion
  ...

block: limit block time caching to in_task() context

We should not have any callers of this from non-task context, but Jakub
ran [1] into one from blk-iocost. Rather than risk running into others,
or future ones, just limit blk_time_get_ns() to when it is called from
a task. Any other usage is invalid.

[1] https://lore.kernel.org/lkml/CAHk-=wiOaBLqarS2uFhM1YdwOvCX4CZaWkeyNDY1zONpbYw2ig@mail.gmail.com/

Fixes: da4c8c3d0975 ("block: cache current nsec time in struct blk_plug")
Reported-by: Jakub Kicinski <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

pidfs: remove config option

As Linus suggested this enables pidfs unconditionally. A key property to
retain is the ability to compare pidfds by inode number (cf. [1]).
That's extremely helpful just as comparing namespace file descriptors by
inode number is. They are used in a variety of scenarios where they need
to be compared, e.g., when receiving a pidfd via SO_PEERPIDFD from a
socket to trivially authenticate a the sender and various other
use-cases.

For 64bit systems this is pretty trivial to do. For 32bit it's slightly
more annoying as we discussed but we simply add a dumb ida based
allocator that gets used on 32bit. This gives the same guarantees about
inode numbers on 64bit without any overflow risk. Practically, we'll
never run into overflow issues because we're constrained by the number
of processes that can exist on 32bit and by the number of open files
that can exist on a 32bit system. On 64bit none of this matters and
things are very simple.

If 32bit also needs the uniqueness guarantee they can simply parse the
contents of /proc/<pid>/fd/<nr>. The uniqueness guarantees have a
variety of use-cases. One of the most obvious ones is that they will
make pidfiles (or "pidfdfiles", I guess) reliable as the unique
identifier can be placed into there that won't be reycled. Also a
frequent request.

Note, I took the chance and simplified path_from_stashed() even further.
Instead of passing the inode number explicitly to path_from_stashed() we
let the filesystem handle that internally. So path_from_stashed() ends
up even simpler than it is now. This is also a good solution allowing
the cleanup code to be clean and consistent between 32bit and 64bit. The
cleanup path in prepare_anon_dentry() is also switched around so we put
the inode before the dentry allocation. This means we only have to call
the cleanup handler for the filesystem's inode data once and can rely
->evict_inode() otherwise.

Aside from having to have a bit of extra code for 32bit it actually ends
up a nice cleanup for path_from_stashed() imho.

Tested on both 32 and 64bit including error injection.

Link: https://github.com/systemd/systemd/pull/31713
Link: https://lore.kernel.org/r/20240312-dingo-sehnlich-b3ecc35c6de7@brauner
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'modules-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull modules updates from Luis Chamberlain:
"Christophe Leroy did most of the work on this release, first with a
  few cleanups on CONFIG_STRICT_KERNEL_RWX and ending with error
  handling for when set_memory_XX() can fail.

  This is part of a larger effort to clean up all these callers which
  can fail, modules is just part of it"

* tag 'modules-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module: Don't ignore errors from set_memory_XX()
  lib/test_kmod: fix kernel-doc warnings
  powerpc: Simplify strict_kernel_rwx_enabled()
  modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around rodata_enabled
  init: Declare rodata_enabled and mark_rodata_ro() at all time
  module: Change module_enable_{nx/x/ro}() to more explicit names
  module: Use set_memory_rox()

Merge tag 'efi-next-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:

- Measure initrd and command line using the CC protocol if the ordinary
   TCG2 protocol is not implemented, typically on TDX confidential VMs

- Avoid creating mappings that are both writable and executable while
   running in the EFI boot services. This is a prerequisite for getting
   the x86 shim loader signed by MicroSoft again, which allows the
   distros to install on x86 PCs that ship with EFI secure boot enabled.

- API update for struct platform_driver::remove()

* tag 'efi-next-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  virt: efi_secret: Convert to platform remove callback returning void
  x86/efistub: Remap kernel text read-only before dropping NX attribute
  efi/libstub: Add get_event_log() support for CC platforms
  efi/libstub: Measure into CC protocol if TCG2 protocol is absent
  efi/libstub: Add Confidential Computing (CC) measurement typedefs
  efi/tpm: Use symbolic GUID name from spec for final events table
  efi/libstub: Use TPM event typedefs from the TCG PC Client spec

Merge branches 'clk-samsung', 'clk-imx', 'clk-rockchip', 'clk-clkdev' and 'clk-rate-exclusive' into clk-next

- Increase dev_id len for clkdev lookups

* clk-samsung: (25 commits)
  clk: samsung: Add CPU clock support for Exynos850
  clk: samsung: Pass mask to wait_until_mux_stable()
  clk: samsung: Keep register offsets in chip specific structure
  clk: samsung: Keep CPU clock chip specific data in a dedicated struct
  clk: samsung: Pass register layout type explicitly to CLK_CPU()
  clk: samsung: Pass actual CPU clock registers base to CPU_CLK()
  clk: samsung: Group CPU clock functions by chip
  clk: samsung: Use single CPU clock notifier callback for all chips
  clk: samsung: Reduce params count in exynos_register_cpu_clock()
  clk: samsung: Pull struct exynos_cpuclk into clk-cpu.c
  clk: samsung: Improve clk-cpu.c style
  dt-bindings: clock: exynos850: Add CMU_CPUCLK0 and CMU_CPUCL1
  clk: samsung: gs101: add support for cmu_peric1
  clk: samsung: gs101: drop extra empty line
  dt-bindings: clock: google,gs101-clock: add PERIC1 clock management unit
  clk: samsung: exynos850: Propagate SPI IPCLK rate change
  clk: samsung: gs101: gpio_peric0_pclk needs to be kept on
  clk: samsung: exynos850: Add PDMA clocks
  dt-bindings: clock: tesla,fsd: Fix spelling mistake
  clk: samsung: gs101: add support for cmu_peric0
  ...

* clk-imx:
  clk: imx: imx8mp: Fix SAI_MCLK_SEL definition
  clk: imx: scu: Use common error handling code in imx_clk_scu_alloc_dev()
  clk: imx: composite-8m: Delete two unnecessary initialisations in __imx8m_clk_hw_composite()
  clk: imx: composite-8m: Less function calls in __imx8m_clk_hw_composite() after error detection

* clk-rockchip:
  clk: rockchip: rk3399: Allow to set rate of clk_i2s0_frac's parent
  clk: rockchip: rk3588: use linked clock ID for GATE_LINK
  clk: rockchip: rk3588: fix indent
  clk: rockchip: rk3588: fix pclk_vo0grf and pclk_vo1grf
  dt-bindings: clock: rk3588: add missing PCLK_VO1GRF
  dt-bindings: clock: rk3588: drop CLK_NR_CLKS
  clk: rockchip: rk3588: fix CLK_NR_CLKS usage
  clk: rockchip: rk3568: Add PLL rate for 128MHz

* clk-clkdev:
  clkdev: Update clkdev id usage to allow for longer names

* clk-rate-exclusive:
  clk: Add a devm variant of clk_rate_exclusive_get()

Merge branches 'clk-remove', 'clk-amlogic', 'clk-qcom', 'clk-parent' and 'clk-microchip' into clk-next

* clk-remove:
  clk: starfive: jh7110-vout: Convert to platform remove callback returning void
  clk: starfive: jh7110-isp: Convert to platform remove callback returning void
  clk: imx: imx8-acm: Convert to platform remove callback returning void

* clk-amlogic:
  clk: meson: Add missing clocks to axg_clk_regmaps

* clk-qcom: (62 commits)
  clk: qcom: gcc-ipq5018: fix register offset for GCC_UBI0_AXI_ARES reset
  clk: qcom: gcc-ipq5018: fix 'halt_reg' offset of 'gcc_pcie1_pipe_clk'
  clk: qcom: gcc-ipq5018: fix 'enable_reg' offset of 'gcc_gmac0_sys_clk'
  clk: qcom: camcc-x1e80100: Fix missing DT_IFACE enum in x1e80100 camcc
  clk: qcom: mmcc-msm8974: fix terminating of frequency table arrays
  clk: qcom: mmcc-apq8084: fix terminating of frequency table arrays
  clk: qcom: camcc-sc8280xp: fix terminating of frequency table arrays
  clk: qcom: gcc-ipq9574: fix terminating of frequency table arrays
  clk: qcom: gcc-ipq8074: fix terminating of frequency table arrays
  clk: qcom: gcc-ipq6018: fix terminating of frequency table arrays
  clk: qcom: gcc-ipq5018: fix terminating of frequency table arrays
  clk: qcom: dispcc-sdm845: Adjust internal GDSC wait times
  dt-bindings: clk: qcom: drop the SC7180 Modem subsystem clock controller
  clk: qcom: drop the SC7180 Modem subsystem clock driver
  clk: qcom: Use qcom_branch_set_clk_en()
  clk: qcom: branch: Add a helper for setting the enable bit
  clk: qcom: dispcc-sm8250: Make clk_init_data and pll_vco const
  clk: qcom: gcc-sc8180x: Add missing UFS QREF clocks
  clk: qcom: gcc-msm8953: add more resets
  clk: qcom: videocc-*: switch to module_platform_driver
  ...

* clk-parent:
  clk: Fix clk_core_get NULL dereference

* clk-microchip:
  clk: microchip: mpfs: convert MSSPLL outputs to clk_divider
  clk: microchip: mpfs: add missing MSSPLL outputs
  clk: microchip: mpfs: setup for using other mss pll outputs
  clk: microchip: mpfs: split MSSPLL in two
  dt-bindings: can: mpfs: add missing required clock
  dt-bindings: clock: mpfs: add more MSSPLL output definitions

Merge branches 'clk-aspeed', 'clk-keystone', 'clk-mobileye' and 'clk-allwinner' into clk-next

* clk-aspeed:
  clk: ast2600: Add FSI parent clock with correct rate
  dt-bindings: clock: ast2600: Add FSI clock

* clk-keystone:
  clk: keystone: sci-clk: Adding support for non contiguous clocks

* clk-mobileye:
  dt-bindings: reset: mobileye,eyeq5-reset: add bindings
  dt-bindings: clock: mobileye,eyeq5-clk: add bindings
  clk: fixed-factor: add fwname-based constructor functions
  clk: fixed-factor: add optional accuracy support

* clk-allwinner:
  clk: sunxi: usb: fix kernel-doc warnings
  clk: sunxi: sun9i-cpus: fix kernel-doc warnings
  clk: sunxi: a20-gmac: fix kernel-doc warnings

Merge branches 'clk-renesas', 'clk-cleanup', 'clk-hisilicon', 'clk-mediatek' and 'clk-bulk' into clk-next

- Add a devm_clk_bulk_get_all_enable() API to get and enable all clks
   for a device
- Fix some static checker errors in the hisilicon clk driver

* clk-renesas: (25 commits)
  clk: renesas: r8a779h0: Add RPC-IF clock
  clk: renesas: r8a779h0: Add SYS-DMAC clocks
  clk: renesas: r8a779h0: Add SDHI clock
  clk: renesas: r8a779h0: Add EtherAVB clocks
  clk: renesas: r9a07g04[34]: Fix typo for sel_shdi variable
  clk: renesas: r9a07g04[34]: Use SEL_SDHI1_STS status configuration for SD1 mux
  clk: renesas: r8a779f0: Correct PFC/GPIO parent clock
  clk: renesas: r8a779g0: Correct PFC/GPIO parent clocks
  clk: renesas: r8a779h0: Add I2C clocks
  clk: renesas: r8a779h0: Add watchdog clock
  clk: renesas: r8a779h0: Add PFC/GPIO clocks
  clk: renesas: r8a779g0: Fix PCIe clock name
  clk: renesas: cpg-mssr: Add support for R-Car V4M
  clk: renesas: rcar-gen4: Add support for FRQCRC1
  clk: renesas: r9a07g043: Add clock and reset entries for CRU
  clk: renesas: r9a08g045: Add clock and reset support for watchdog
  dt-bindings: clock: Add R8A779H0 V4M CPG Core Clock Definitions
  dt-bindings: clock: renesas,cpg-mssr: Document R-Car V4M support
  dt-bindings: power: Add r8a779h0 SYSC power domain definitions
  dt-bindings: power: renesas,rcar-sysc: Document R-Car V4M support
  ...

* clk-cleanup:
  clk: zynq: Prevent null pointer dereference caused by kmalloc failure
  clk: fractional-divider: Use bit operations consistently
  clk: fractional-divider: Move mask calculations out of lock
  clk: ti: dpll3xxx: use correct function names in kernel-doc
  clk: clocking-wizard: Remove redundant initialization of pointer div_addr
  clk: keystone: sci-clk: match func name comment to actual
  clk: cdce925: Remove redundant assignment to variable 'rate'
  MAINTAINERS: drop Sekhar Nori

* clk-hisilicon:
  clk: hisilicon: Use devm_kcalloc() instead of devm_kzalloc()
  clk: hisilicon: hi3559a: Fix an erroneous devm_kfree()
  clk: hisilicon: hi3519: Release the correct number of gates in hi3519_clk_unregister()

* clk-mediatek:
  clk: mediatek: clk-mt8173-apmixedsys: Use common error handling code in clk_mt8173_apmixed_probe()
  clk: mediatek: add infracfg reset controller for mt7988
  dt-bindings: reset: mediatek: add MT7988 infracfg reset IDs
  dt-bindings: clock: mediatek: convert SSUSBSYS to the json-schema clock
  dt-bindings: clock: mediatek: convert PCIESYS to the json-schema clock
  dt-bindings: clock: mediatek: convert hifsys to the json-schema clock
  clk: mediatek: mt7981-topckgen: flag SGM_REG_SEL as critical
  clk: mediatek: mt8183: Correct parent of CLK_INFRA_SSPM_32K_SELF
  clk: mediatek: mt7622-apmixedsys: Fix an error handling path in clk_mt8135_apmixed_probe()
  clk: mediatek: mt8135: Fix an error handling path in clk_mt8135_apmixed_probe()

* clk-bulk:
  clk: Provide managed helper to get and enable bulk clocks

Merge tag 'tpmdd-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm updates from Jarkko Sakkinen:
"Small bug fixes and device tree updates. No new features"

* tag 'tpmdd-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: tis_i2c: Add compatible string nuvoton,npct75x
  tpm_tis: Add compatible string atmel,at97sc3204
  tpm_tis_spi: Add compatible string atmel,attpm20p
  dt-bindings: tpm: Add compatible string atmel,attpm20p
  tpm,tpm_tis: Avoid warning splat at shutdown
  tpm/tpm_ftpm_tee: fix all kernel-doc warnings

Merge tag 'mailbox-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox

Pull mailbox updates from Jassi Brar:

- imx: add support for i.MX95 ELE/V2X MU

- misc: I will be signing-off from my personal gmail id from now on

* tag 'mailbox-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
  mailbox: imx: support i.MX95 Generic/ELE/V2X MU
  mailbox: imx: populate sub-nodes
  mailbox: imx: get RR/TR registers num from Parameter register
  mailbox: imx: support return value of init
  dt-bindings: mailbox: fsl,mu: add i.MX95 Generic/ELE/V2X MU compatible

mm/zswap: remove the memcpy if acomp is not sleepable

Most compressors are actually CPU-based and won't sleep during compression
and decompression.  We should remove the redundant memcpy for them.

This patch checks if the algorithm is sleepable by testing the
CRYPTO_ALG_ASYNC algorithm flag.

Generally speaking, async and sleepable are semantically similar but not
equal.  But for compress drivers, they are basically equal at least due to
the below facts.

Firstly, scompress drivers - crypto/deflate.c, lz4.c, zstd.c, lzo.c etc
have no sleep.  Secondly, zRAM has been using these scompress drivers for
years in atomic contexts, and never worried those drivers going to sleep.

One exception is that an async driver can sometimes still return
synchronously per Herbert's clarification.  In this case, we are still
having a redundant memcpy.  But we can't know if one particular acomp
request will sleep or not unless crypto can expose more details for each
specific request from offload drivers.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Barry Song <[email protected]>
Tested-by: Chengming Zhou <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Acked-by: Yosry Ahmed <[email protected]>
Reviewed-by: Chengming Zhou <[email protected]>
Acked-by: Chris Li <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

crypto: introduce: acomp_is_async to expose if comp drivers might sleep

acomp's users might want to know if acomp is really async to optimize
themselves.  One typical user which can benefit from exposed async stat is
zswap.

In zswap, zsmalloc is the most commonly used allocator for (and perhaps
the only one).  For zsmalloc, we cannot sleep while we map the compressed
memory, so we copy it to a temporary buffer.  By knowing the alg won't
sleep can help zswap to avoid the need for a buffer.  This shows
noticeable improvement in load/store latency of zswap.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Barry Song <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Acked-by: Chris Li <[email protected]>
Cc: Chengming Zhou <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

memtest: use {READ,WRITE}_ONCE in memory scanning

memtest failed to find bad memory when compiled with clang. So use
{WRITE,READ}_ONCE to access memory to avoid compiler over optimization.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qiang Zhang <[email protected]>
Cc: Bill Wendling <[email protected]>
Cc: Justin Stitt <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: prohibit the last subpage from reusing the entire large folio

In a Copy-on-Write (CoW) scenario, the last subpage will reuse the entire
large folio, resulting in the waste of (nr_pages - 1) pages.  This wasted
memory remains allocated until it is either unmapped or memory reclamation
occurs.

The following small program can serve as evidence of this behavior

main()
{
#define SIZE 1024 * 1024 * 1024UL
         void *p = malloc(SIZE);
         memset(p, 0x11, SIZE);
         if (fork() == 0)
                 _exit(0);
         memset(p, 0x12, SIZE);
         printf("done\n");
         while(1);
}

For example, using a 1024KiB mTHP by:
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled

(1) w/o the patch, it takes 2GiB,

Before running the test program,
/ # free -m
                total        used        free      shared  buff/cache   available
Mem:            5754          84        5692           0          17        5669
Swap:              0           0           0

/ # /a.out &
/ # done

After running the test program,
/ # free -m
                 total        used        free      shared  buff/cache   available
Mem:            5754        2149        3627           0          19        3605
Swap:              0           0           0

(2) w/ the patch, it takes 1GiB only,

Before running the test program,
/ # free -m
                 total        used        free      shared  buff/cache   available
Mem:            5754          89        5687           0          17        5664
Swap:              0           0           0

/ # /a.out &
/ # done

After running the test program,
/ # free -m
                total        used        free      shared  buff/cache   available
Mem:            5754        1122        4655           0          17        4632
Swap:              0           0           0

This patch migrates the last subpage to a small folio and immediately
returns the large folio to the system. It benefits both memory availability
and anti-fragmentation.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Barry Song <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Lance Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: recover pud_leaf() definitions in nopmd case

This reverts one change in commit 924bd6a8c967 ("mm/x86: drop two
unnecessary pud_leaf() definitions").

One issue with that is it broke nopmd builds for at least both arm64 and
riscv (CONFIG_PGTABLE_LEVELS=2). The other issue is it was overlooked that
it's a common change rather than x86 specific (relevant to the commit
message of the commit).

Normally there's no need for empty definition of pXd_leaf() because of the
fallback functions, however this logic may not apply to pgtable-nopmd.h,
because that's a header that can even be used by arch *pgtable.h headers,
which can use the *_leaf() definitions _before_ the fallback functions are
defined. Leave it there to pass PGTABLE_LEVELS=2 builds.

Link: https://lkml.kernel.org/r/Ze8vFNV9YSdgC2S7@x1n
Fixes: 924bd6a8c967 ("mm/x86: drop two unnecessary pud_leaf() definitions")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Cc: Jason Gunthorpe <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Merge tag 'thermal-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control updates from Rafael Wysocki:
"These mostly change the thermal core in a few ways allowing thermal
  drivers to be simplified, in particular in their removal and failing
  probe handling parts that are notoriously prone to errors, and
  propagate the changes to several drivers.

  Apart from that, support for a new platform is added (Intel Lunar
  Lake-M), some bugs are fixed and some code is cleaned up, as usual.

  Specifics:

   - Store zone trips table and zone operations directly in struct
     thermal_zone_device (Rafael Wysocki)

   - Fix up flex array initialization during thermal zone device
     registration (Nathan Chancellor)

   - Rework writable trip points handling in the thermal core and
     several drivers (Rafael Wysocki)

   - Thermal core code cleanups (Dan Carpenter, Flavio Suligoi)

   - Use thermal zone accessor functions in the int340x Intel thermal
     driver (Rafael Wysocki)

   - Add Lunar Lake-M PCI ID to the int340x Intel thermal driver
     (Srinivas Pandruvada)

   - Minor fixes for thermal governors (Rafael Wysocki, Di Shen)

   - Trip point handling fixes for the iwlwifi wireless driver (Rafael
     Wysocki)

   - Code cleanups (Rafael J. Wysocki, AngeloGioacchino Del Regno)"

* tag 'thermal-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (29 commits)
  thermal: core: remove unnecessary check in trip_point_hyst_store()
  thermal: intel: int340x_thermal: Use thermal zone accessor functions
  thermal: core: Remove excess empty line from a comment
  thermal: int340x: processor_thermal: Add Lunar Lake-M PCI ID
  thermal: core: Eliminate writable trip points masks
  thermal: of: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  thermal: imx: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  wifi: iwlwifi: mvm: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  mlxsw: core_thermal: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  thermal: intel: Set THERMAL_TRIP_FLAG_RW_TEMP directly
  thermal: core: Drop the .set_trip_hyst() thermal zone operation
  thermal: core: Add flags to struct thermal_trip
  thermal: core: Move initial num_trips assignment before memcpy()
  thermal: Get rid of CONFIG_THERMAL_WRITABLE_TRIPS
  thermal: intel: Adjust ops handling during thermal zone registration
  thermal: ACPI: Constify acpi_thermal_zone_ops
  thermal: core: Store zone ops in struct thermal_zone_device
  thermal: intel: Discard trip tables after zone registration
  thermal: ACPI: Discard trips table after zone registration
  thermal: core: Store zone trips table in struct thermal_zone_device
  ...

Merge tag 'acpi-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI updates from Rafael Wysocki:
"These modify the ACPI device events and processor enumeration code to
  take the 'enabled' _STA bit into account as mandated by the ACPI
  specification, convert several platform drivers to using a remove
  callback that returns void, add some new quirks for ACPI IRQ override
  and other things, address assorted issues and clean up code.

  Specifics:

   - Rearrange Device Check and Bus Check notification handling in the
     ACPI device hotplug code to make it get the "enabled" _STA bit into
     account (Rafael Wysocki)

   - Modify acpi_processor_add() to skip processors with the "enabled"
     _STA bit clear, as per the specification (Rafael Wysocki)

   - Stop failing Device Check notification handling without a valid
     reason (Rafael Wysocki)

   - Defer enumeration of devices that depend on a device with an ACPI
     device ID equalt to INTC10CF to address probe ordering issues on
     some platforms (Wentong Wu)

   - Constify acpi_bus_type (Ricardo Marliere)

   - Make the ACPI-specific suspend-to-idle code take the Low-Power S0
     Idle MSFT UUID into account on non-AMD systems (Rafael Wysocki)

   - Add ACPI IRQ override quirks for some new platforms (Sergey
     Kalinichev, Maxim Kudinov, Alexey Froloff, Sviatoslav Harasymchuk,
     Nicolas Haye)

   - Make the NFIT parsing code use acpi_evaluate_dsm_typed() (Andy
     Shevchenko)

   - Fix a memory leak in acpi_processor_power_exit() (Armin Wolf)

   - Make it possible to quirk the CSI-2 and MIPI DisCo for Imaging
     properties parsing and add a quirk for Dell XPS 9315 (Sakari Ailus)

   - Prevent false-positive static checker warnings from triggering by
     intializing some variables in the ACPI thermal code to zero (Colin
     Ian King)

   - Add DELL0501 handling to acpi_quirk_skip_serdev_enumeration() and
     make that function generic (Hans de Goede)

   - Make the ACPI backlight code handle fetching EDID that is longer
     than 256 bytes (Mario Limonciello)

   - Skip initialization of GHES_ASSIST structures for Machine Check
     Architecture in APEI (Avadhut Naik)

   - Convert several plaform drivers in the ACPI subsystem to using a
     remove callback that returns void (Uwe Kleine-König)

   - Drop the long-deprecated custom_method debugfs interface that is
     problematic from the security standpoint (Rafael Wysocki)

   - Use %pe in a couple of places in the ACPI code for easier error
     decoding (Onkarnath)

   - Fix register width information handling during system memory
     accesses in the ACPI CPPC library (Jarred White)

   - Add AMD CPPC V2 support for family 17h processors to the ACPI CPPC
     library (Perry Yuan)"

* tag 'acpi-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits)
  ACPI: resource: Use IRQ override on Maibenben X565
  ACPI: CPPC: Use access_width over bit_width for system memory accesses
  ACPI: CPPC: enable AMD CPPC V2 support for family 17h processors
  ACPI: APEI: Skip initialization of GHES_ASSIST structures for Machine Check Architecture
  ACPI: scan: Consolidate Device Check and Bus Check notification handling
  ACPI: scan: Rework Device Check and Bus Check notification handling
  ACPI: scan: Make acpi_processor_add() check the device enabled bit
  ACPI: scan: Relocate acpi_bus_trim_one()
  ACPI: scan: Fix device check notification handling
  ACPI: resource: Add MAIBENBEN X577 to irq1_edge_low_force_override
  ACPI: pfr_update: Convert to platform remove callback returning void
  ACPI: pfr_telemetry: Convert to platform remove callback returning void
  ACPI: fan: Convert to platform remove callback returning void
  ACPI: GED: Convert to platform remove callback returning void
  ACPI: DPTF: Convert to platform remove callback returning void
  ACPI: AGDI: Convert to platform remove callback returning void
  ACPI: TAD: Convert to platform remove callback returning void
  ACPI: APEI: GHES: Convert to platform remove callback returning void
  ACPI: property: Polish ignoring bad data nodes
  ACPI: thermal_lib: Initialize temp_decik to zero
  ...

Merge tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"From the functional perspective, the most significant change here is
  the addition of support for Energy Models that can be updated
  dynamically at run time.

  There is also the addition of LZ4 compression support for hibernation,
  the new preferred core support in amd-pstate, new platforms support in
  the Intel RAPL driver, new model-specific EPP handling in intel_pstate
  and more.

  Apart from that, the cpufreq default transition delay is reduced from
  10 ms to 2 ms (along with some related adjustments), the system
  suspend statistics code undergoes a significant rework and there is a
  usual bunch of fixes and code cleanups all over.

  Specifics:

   - Allow the Energy Model to be updated dynamically (Lukasz Luba)

   - Add support for LZ4 compression algorithm to the hibernation image
     creation and loading code (Nikhil V)

   - Fix and clean up system suspend statistics collection (Rafael
     Wysocki)

   - Simplify device suspend and resume handling in the power management
     core code (Rafael Wysocki)

   - Fix PCI hibernation support description (Yiwei Lin)

   - Make hibernation take set_memory_ro() return values into account as
     appropriate (Christophe Leroy)

   - Set mem_sleep_current during kernel command line setup to avoid an
     ordering issue with handling it (Maulik Shah)

   - Fix wake IRQs handling when pm_runtime_force_suspend() is used as a
     driver's system suspend callback (Qingliang Li)

   - Simplify pm_runtime_get_if_active() usage and add a replacement for
     pm_runtime_put_autosuspend() (Sakari Ailus)

   - Add a tracepoint for runtime_status changes tracking (Vilas Bhat)

   - Fix section title markdown in the runtime PM documentation (Yiwei
     Lin)

   - Enable preferred core support in the amd-pstate cpufreq driver
     (Meng Li)

   - Fix min_perf assignment in amd_pstate_adjust_perf() and make the
     min/max limit perf values in amd-pstate always stay within the
     (highest perf, lowest perf) range (Tor Vic, Meng Li)

   - Allow intel_pstate to assign model-specific values to strings used
     in the EPP sysfs interface and make it do so on Meteor Lake
     (Srinivas Pandruvada)

   - Drop long-unused cpudata::prev_cummulative_iowait from the
     intel_pstate cpufreq driver (Jiri Slaby)

   - Prevent scaling_cur_freq from exceeding scaling_max_freq when the
     latter is an inefficient frequency (Shivnandan Kumar)

   - Change default transition delay in cpufreq to 2ms (Qais Yousef)

   - Remove references to 10ms minimum sampling rate from comments in
     the cpufreq code (Pierre Gondois)

   - Honour transition_latency over transition_delay_us in cpufreq (Qais
     Yousef)

   - Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar)

   - General enhancements / cleanups to ARM cpufreq drivers (tianyu2,
     Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia
     Belova)

   - Update cpufreq-dt-platdev to block/approve devices (Richard Acayan)

   - Make the SCMI cpufreq driver get a transition delay value from
     firmware (Pierre Gondois)

   - Prevent the haltpoll cpuidle governor from shrinking guest
     poll_limit_ns below grow_start (Parshuram Sangle)

   - Avoid potential overflow in integer multiplication when computing
     cpuidle state parameters (C Cheng)

   - Adjust MWAIT hint target C-state computation in the ACPI cpuidle
     driver and in intel_idle to return a correct value for C0 (He
     Rongguang)

   - Address multiple issues in the TPMI RAPL driver and add support for
     new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui)

   - Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel
     Lezcano)

   - Fix kernel-doc for dtpm_create_hierarchy() (Yang Li)

   - Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth
     Norway Ananda)

   - Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil)

   - Fix a couple of warnings in the OPP core code related to W=1 builds
     (Viresh Kumar)

   - Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh
     Kumar)

   - Extend dev_pm_opp_data with turbo support (Sibi Sankar)

   - dt-bindings: drop maxItems from inner items (David Heidelberg)"

* tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (95 commits)
  dt-bindings: opp: drop maxItems from inner items
  OPP: debugfs: Fix warning around icc_get_name()
  OPP: debugfs: Fix warning with W=1 builds
  cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h
  OPP: Extend dev_pm_opp_data with turbo support
  Fix cpupower-frequency-info.1 man page typo
  cpufreq: scmi: Set transition_delay_us
  firmware: arm_scmi: Populate fast channel rate_limit
  firmware: arm_scmi: Populate perf commands rate_limit
  cpuidle: ACPI/intel: fix MWAIT hint target C-state computation
  PM: sleep: wakeirq: fix wake irq warning in system suspend
  powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function
  cpufreq: Don't unregister cpufreq cooling on CPU hotplug
  PM: suspend: Set mem_sleep_current during kernel command line setup
  cpufreq: Honour transition_latency over transition_delay_us
  cpufreq: Limit resolving a frequency to policy min/max
  Documentation: PM: Fix runtime_pm.rst markdown syntax
  cpufreq: amd-pstate: adjust min/max limit perf
  cpufreq: Remove references to 10ms min sampling rate
  cpufreq: intel_pstate: Update default EPPs for Meteor Lake
  ...

Merge tag 'pmdomain-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm

Pull pmdomain updates from Ulf Hansson:
"Core:
   - Log a message when unused PM domains gets disabled
   - Scale down parent/child performance states in the reverse order

  Providers:
   - qcom: rpmpd: Add power domains support for MSM8974, MSM8974PRO,
     PMA8084 and PM8841
   - renesas: rcar-gen4-sysc: Reduce atomic delays
   - renesas: rcar-sysc: Adjust the waiting time to cover the worst case
   - renesas: r8a779h0-sysc: Add support for the r8a779h0 PM domains
   - imx: imx8mp-blk-ctrl: Add the fdcc clock to the hdmimix domains
   - imx: imx8mp-blk-ctrl: Error out if domains are missing in DT

  Improve support for multiple PM domains:
   - Add two helper functions to attach/detach multiple PM domains
   - Convert a couple of drivers to use the new helper functions"

* tag 'pmdomain-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: (22 commits)
  pmdomain: renesas: rcar-gen4-sysc: Reduce atomic delays
  pmdomain: renesas: Adjust the waiting time to cover the worst case
  pmdomain: qcom: rpmpd: Add MSM8974PRO+PMA8084 power domains
  pmdomain: qcom: rpmpd: Add MSM8974+PM8841 power domains
  pmdomain: core: constify of_phandle_args in add device and subdomain
  pmdomain: core: constify of_phandle_args in xlate
  media: venus: Convert to dev_pm_domain_attach|detach_list() for vcodec
  remoteproc: qcom_q6v5_adsp: Convert to dev_pm_domain_attach|detach_list()
  remoteproc: imx_rproc: Convert to dev_pm_domain_attach|detach_list()
  remoteproc: imx_dsp_rproc: Convert to dev_pm_domain_attach|detach_list()
  PM: domains: Add helper functions to attach/detach multiple PM domains
  pmdomain: imx8mp-blk-ctrl: imx8mp_blk: Add fdcc clock to hdmimix domain
  pmdomain: mediatek: Use devm_platform_ioremap_resource() in init_scp()
  pmdomain: renesas: r8a779h0-sysc: Add r8a779h0 support
  pmdomain: imx8mp-blk-ctrl: Error out if domains are missing in DT
  pmdomain: ti: Add a null pointer check to the omap_prm_domain_init
  pmdomain: renesas: rcar-gen4-sysc: Remove unneeded includes
  pmdomain: core: Print a message when unused power domains are disabled
  pmdomain: qcom: rpmpd: Keep one RPM handle for all RPMPDs
  pmdomain: core: Scale down parent/child performance states in reverse order
  ...

Merge tag 'hwmon-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon updates from Guenter Roeck:
"New drivers:
   - Amphenol ChipCap 2
   - ASPEED g6 PWM/Fan tach
   - Astera Labs PT5161L retimer
   - ASUS ROG RYUJIN II 360 AIO cooler
   - LTC4282
   - Microsoft Surface devices
   - MPS MPQ8785 Synchronous Step-Down Converter
   - NZXT Kraken X and Z series AIO CPU coolers

  Additional chip support in existing drivers:
   - Ayaneo Air Plus 7320u (oxp-sensors)
   - INA260 (ina2xx)
   - XPS 9315 (dell-smm)
   - MSI customer ID (nct6683)

  Devicetree bindings updates:
   - Common schema for hardware monitoring devices
   - Common schema for fans
   - Update chip descriptions to use common schema
   - Document regulator properties in several drivers
   - Explicit bindings for infineon buck converters

  Other improvements:
   - Replaced rbtree with maple tree register cache in several drivers
   - Added support for humidity min/max alarm and volatage fault
     attributes to hwmon core
   - Dropped non-functional I2C_CLASS_HWMON support for drivers w/o
     detect()
   - Dropped obsolete and redundant entried from MAINTAINERS
   - Cleaned up axi-fan-control and coretemp drivers
   - Minor fixes and improvements in several other drivers"

* tag 'hwmon-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (70 commits)
  hwmon: (dell-smm) Add XPS 9315 to fan control whitelist
  hwmon: (aspeed-g6-pwm-tacho): Support for ASPEED g6 PWM/Fan tach
  dt-bindings: hwmon: Support Aspeed g6 PWM TACH Control
  dt-bindings: hwmon: fan: Add fan binding to schema
  dt-bindings: hwmon: tda38640: Add interrupt & regulator properties
  hwmon: (amc6821) add of_match table
  dt-bindings: hwmon: lm75: use common hwmon schema
  hwmon: (sis5595) drop unused DIV_TO_REG function
  dt-bindings: hwmon: reference common hwmon schema
  dt-bindings: hwmon: lltc,ltc4286: use common hwmon schema
  dt-bindings: hwmon: adi,adm1275: use common hwmon schema
  dt-bindings: hwmon: ti,ina2xx: use common hwmon schema
  dt-bindings: hwmon: add common properties
  hwmon: (pmbus/ir38064) Use PMBUS_REGULATOR_ONE to declare regulator
  hwmon: (pmbus/lm25066) Use PMBUS_REGULATOR_ONE to declare regulator
  hwmon: (pmbus/tda38640) Use PMBUS_REGULATOR_ONE to declare regulator
  regulator: dt-bindings: promote infineon buck converters to their own binding
  dt-bindings: hwmon/pmbus: ti,lm25066: document regulators
  dt-bindings: hwmon: nuvoton,nct6775: Add compatible value for NCT6799
  MAINTAINERS: Drop redundant hwmon entries
  ...

Merge tag 'gpio-updates-for-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio updates from Bartosz Golaszewski:
"The biggest feature is the locking overhaul. Up until now the
  synchronization in the GPIO subsystem was broken. There was a single
  spinlock "protecting" multiple data structures but doing it wrong (as
  evidenced by several places where it would be released when a sleeping
  function was called and then reacquired without checking the protected
  state).

  We tried to use an RW semaphore before but the main issue with GPIO is
  that we have drivers implementing the interfaces in both sleeping and
  non-sleeping ways as well as user-facing interfaces that can be called
  both from process as well as atomic contexts. Both ends converge in
  the same code paths that can use neither spinlocks nor mutexes. The
  only reasonable way out is to use SRCU and go mostly lockless. To that
  end: we add several SRCU structs in relevant places and use them to
  assure consistency between API calls together with atomic reads and
  writes of GPIO descriptor flags where it makes sense.

  This code has spent several weeks in next and has received several
  fixes in the first week or two after which it stabilized nicely. The
  GPIO subsystem is now resilient to providers being suddenly unbound.
  We managed to also remove the existing character device RW semaphore
  and the obsolete global spinlock.

  Other than the locking rework we have one new driver (for Chromebook
  EC), much appreciated documentation improvements from Kent and the
  regular driver improvements, DT-bindings updates and GPIOLIB core
  tweaks.

  Serialization rework:
   - use SRCU to serialize access to the global GPIO device list, to
     GPIO device structs themselves and to GPIO descriptors
   - make the GPIO subsystem resilient to the GPIO providers being
     unbound while the API calls are in progress
   - don't dereference the SRCU-protected chip pointer if the
     information we need can be obtained from the GPIO device structure
   - move some of the information contained in struct gpio_chip to
     struct gpio_device to further reduce the need to dereference the
     former
   - pass the GPIO device struct instead of the GPIO chip to sysfs
     callback to, again, reduce the need for accessing the latter
   - get GPIO descriptors from the GPIO device, not from the chip for
     the same reason
   - allow for mostly lockless operation of the GPIO driver API: assure
     consistency with SRCU and atomic operations
   - remove the global GPIO spinlock
   - remove the character device RW semaphore

  Core GPIOLIB:
   - constify pointers in GPIO API where applicable
   - unify the GPIO counting APIs for ACPI and OF
   - provide a macro for iterating over all GPIOs, not only the ones
     that are requested
   - remove leftover typedefs
   - pass the consumer device to GPIO core in
     devm_fwnode_gpiod_get_index() for improved logging
   - constify the GPIO bus type
   - don't warn about removing GPIO chips with descriptors still held by
     users as we can now handle this situation gracefully
   - remove unused logging helpers
   - unexport functions that are only used internally in the GPIO
     subsystem
   - set the device type (assign the relevant struct device_type) for
     GPIO devices

  New drivers:
   - add the ChromeOS EC GPIO driver

  Driver improvements:
   - allow building gpio-vf610 with COMPILE_TEST as well as disabling it
     in menuconfig (before it was always built for i.MX cofigs)
   - count the number of EICs using the device properties instead of
     hard-coding it in gpio-eic-sprd
   - improve the device naming, extend the debugfs output and add
     lockdep asserts to gpio-sim

  DT bindings:
   - document the 'label' property for gpio-pca9570
   - convert aspeed,ast2400-gpio bindings to DT schema
   - disallow unevaluated properties for gpio-mvebu
   - document a new model in renesas,rcar-gpio

  Documentation:
   - improve the character device kerneldocs in user-space headers
   - add proper documentation for the character device uAPI (both v1 and v2)
   - move the sysfs and gpio-mockup docs into the "obsolete" section
   - improve naming consistency for GPIO terms
   - clarify the line values description for sysfs
   - minor docs improvements
   - improve the driver API contract for setting GPIO direction
   - mark unsafe APIs as deprecated in kerneldocs and suggest
     replacements

  Other:
   - remove an obsolete test from selftests"

* tag 'gpio-updates-for-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (79 commits)
  gpio: sysfs: repair export returning -EPERM on 1st attempt
  selftest: gpio: remove obsolete gpio-mockup test
  gpiolib: Deduplicate cleanup for-loop in gpiochip_add_data_with_key()
  dt-bindings: gpio: aspeed,ast2400-gpio: Convert to DT schema
  gpio: acpi: Make acpi_gpio_count() take firmware node as a parameter
  gpio: of: Make of_gpio_get_count() take firmware node as a parameter
  gpiolib: Pass consumer device through to core in devm_fwnode_gpiod_get_index()
  gpio: sim: use for_each_hwgpio()
  gpio: provide for_each_hwgpio()
  gpio: don't warn about removing GPIO chips with active users anymore
  gpio: sim: delimit the fwnode name with a ":" when generating labels
  gpio: sim: add lockdep asserts
  gpio: Add ChromeOS EC GPIO driver
  gpio: constify of_phandle_args in of_find_gpio_device_by_xlate()
  gpio: fix memory leak in gpiod_request_commit()
  gpio: constify opaque pointer "data" in gpio_device_find()
  gpio: cdev: fix a NULL-pointer dereference with DEBUG enabled
  gpio: uapi: clarify default_values being logical
  gpio: sysfs: fix inverted pointer logic
  gpio: don't let lockdep complain about inherently dangerous RCU usage
  ...

Merge tag 'spi-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi updates from Mark Brown:
"This release sees some exciting changes from David Lechner which
  implements some optimisations that have been talked about for a long
  time which allows client drivers to pre-prepare SPI messages for
  repeated or low latency use. This lets us move work out of latency
  sensitive paths and avoid repeating work for frequently performed
  operations. As well as being useful in itself this will also be used
  in future to allow controllers to directly trigger SPI operations (eg,
  from interrupts).

  Otherwise this release has mostly been focused on cleanups, plus a
  couple of new devices:

   - Support for pre-optimising messages

   - A big set of updates from Uwe Kleine-König moving drivers to use
     APIs with more modern terminology for controllers

   - Major overhaul of the s3c64xx driver

   - Support for Google GS101 and Samsung Exynos850"

* tag 'spi-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (122 commits)
  spi: Introduce SPI_INVALID_CS and is_valid_cs()
  spi: Fix types of the last chip select storage variables
  spi: Consistently use BIT for cs_index_mask
  spi: Exctract spi_dev_check_cs() helper
  spi: Exctract spi_set_all_cs_unused() helper
  spi: s3c64xx: switch exynos850 to new port config data
  spi: s3c64xx: switch gs101 to new port config data
  spi: s3c64xx: deprecate fifo_lvl_mask, rx_lvl_offset and port_id
  spi: s3c64xx: get rid of the OF alias ID dependency
  spi: s3c64xx: introduce s3c64xx_spi_set_port_id()
  spi: s3c64xx: let the SPI core determine the bus number
  spi: s3c64xx: allow FIFO depth to be determined from the compatible
  spi: s3c64xx: retrieve the FIFO depth from the device tree
  spi: s3c64xx: determine the fifo depth only once
  spi: s3c64xx: allow full FIFO masks
  spi: s3c64xx: define a magic value
  spi: dt-bindings: introduce FIFO depth properties
  spi: axi-spi-engine: use struct_size() macro
  spi: axi-spi-engine: use __counted_by() attribute
  spi: axi-spi-engine: remove p from struct spi_engine_message_state
  ...

Merge tag 'regulator-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator updates from Mark Brown:
"This has been a very quiet release, mostly cleanups, API updates and
  simple device additions. I messed up slightly and there are a couple
  of duplicated commits resulting from me leaving things in my inbox
  which didn't seem worth removing by the time I noticed them.

   - Conversion of several drivers to GPIO descriptors

   - Build out the features of of the MP8859 driver

   - Support for Qualcomm PM4125 and PM6150"

* tag 'regulator-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (31 commits)
  regulator: lp8788-buck: fix copy and paste bug in lp8788_dvs_gpio_request()
  regulator: core: make regulator_class constant
  regulator: da9121: Remove unused of_gpio.h
  regulator: userspace-consumer: add module device table
  regulator: dt-bindings: gpio-regulator: Fix "gpios-states" and "states" array bounds
  regulator: mp8859: Implement set_current_limit()
  regulator: mp8859: Report slew rate
  regulator: mp8859: Support status and error readback
  regulator: mp8859: Support active discharge control
  regulator: mp8859: Support mode operations
  regulator: mp8859: Support enable control
  regulator: mp8859: Validate and log device identifier information
  regulator: mp8859: Specify register accessibility and enable caching
  regulator: max8998: Convert to GPIO descriptors
  regulator: max8997: Convert to GPIO descriptors
  regulator: lp8788-buck: Fully convert to GPIO descriptors
  regulator: da9055: Fully convert to GPIO descriptors
  regulator: max8973: Finalize switch to GPIO descriptors
  regulator: dt-bindings: qcom,usb-vbus-regulator: add support for PM4125
  regulator: dt-bindings: qcom,usb-vbus-regulator: add support for PM4125
  ...

Merge tag 'regmap-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap updates from Mark Brown:
"Just two updates this time around, a rework of max_register handling
  which enables us to support devices with only one register better and
  a new test which will be used to validate use of some new SPI
  optimisations which will be coming in during this merge window"

* tag 'regmap-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: kunit: Add a test for ranges in combination with windows
  regmap: rework ->max_register handling

Merge tag 'mmc-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC updates from Ulf Hansson:
"MMC core:
   - Drop the use of BLK_BOUNCE_HIGH
   - Fix partition switch for GP3
   - Remove usage of the deprecated ida_simple API

  MMC host:
   - cqhci: Update bouncing email-addresses in MAINTAINERS
   - davinci_mmc: Use sg_miter for PIO
   - dw_mmc-hi3798cv200: Convert the DT bindings to YAML
   - dw_mmc-hi3798mv200: Add driver for the new dw_mmc variant
   - fsl-imx-esdhc: A couple of corrections/updates to the DT bindings
   - meson-mx-sdhc: Drop use of the ->card_hw_reset() callback
   - moxart-mmc: Use sg_miter for PIO
   - moxart-mmc: Fix accounting for DMA transfers
   - mvsdio: Use sg_miter for PIO
   - mxcmmc: Use sg_miter for PIO
   - omap: Use sg_miter for PIO
   - renesas,sdhi: Add support for R-Car V4M variant
   - sdhci-esdhc-mcf: Use sg_miter for swapping
   - sdhci-of-dwcmshc: Add support for Sophgo CV1800B and SG2002 variants
   - sh_mmcif: Use sg_miter for PIO
   - tmio: Avoid concurrent runs of mmc_request_done()"

* tag 'mmc-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (44 commits)
  mmc: core: make mmc_host_class constant
  mmc: core: Fix switch on gp3 partition
  mmc: tmio: comment the ERR_PTR usage in this driver
  mmc: mmc_spi: Don't mention DMA direction
  mmc: dw_mmc: Remove unused of_gpio.h
  mmc: dw_mmc: add support for hi3798mv200
  dt-bindings: mmc: hisilicon,hi3798cv200-dw-mshc: add Hi3798MV200 binding
  dt-bindings: mmc: dw-mshc-hi3798cv200: convert to YAML
  mmc: dw_mmc-hi3798cv200: remove MODULE_ALIAS()
  mmc: core: Use a struct device* as in-param to mmc_of_parse_clk_phase()
  mmc: wmt-sdmmc: remove an incorrect release_mem_region() call in the .remove function
  mmc: tmio: avoid concurrent runs of mmc_request_done()
  dt-bindings: mmc: fsl-imx-mmc: Document the required clocks
  mmc: sh_mmcif: Advance sg_miter before reading blocks
  mmc: sh_mmcif: sg_miter must not be atomic
  mmc: sdhci-esdhc-mcf: Flag the sg_miter as atomic
  dt-bindings: mmc: fsl-imx-esdhc: add default and 100mhz state
  mmc: core: constify the struct device_type usage
  mmc: sdhci-of-dwcmshc: Add support for Sophgo CV1800B and SG2002
  dt-bindings: mmc: sdhci-of-dwcmhsc: Add Sophgo CV1800B and SG2002 support
  ...

Merge tag 'pwm/for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux

Pull pwm updates from Uwe Kleine-König:
"This contains the usual amount of driver and device tree changes.
  Additionally there is a big rework of how pwm lowlevel drivers are
  registered to prepare adding character device support.

  Thanks to Dharma Balasubiramani, Dong Aisheng, Duje Mihanović, Jerome
  Brunet, Raag Jadav and Rafał Miłecki for their contributions. And
  sorry for those who still need some patience because I didn't manage
  to empty my review queue"

* tag 'pwm/for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux: (185 commits)
  pwm: imx-tpm: fix probe crash due to access registers without clock
  pwm: meson: generalize 4 inputs clock on meson8 pwm type
  dt-bindings: pwm: amlogic: Add a new binding for meson8 pwm types
  dt-bindings: pwm: amlogic: fix s4 bindings
  pwm: dwc: simplify error handling
  pwm: dwc: Add 16 channel support for Intel Elkhart Lake
  pwm: dwc: drop redundant error check
  staging: greybus: pwm: Make use of devm_pwmchip_alloc() function
  staging: greybus: pwm: Rework how the number of PWM lines is determined
  staging: greybus: pwm: Drop unused gb_connection_set_data()
  staging: greybus: pwm: Rely on pwm framework to pass a valid hwpwm
  staging: greybus: pwm: Make use of pwmchip_parent() accessor
  staging: greybus: pwm: Change prototype of helpers to prepare further changes
  leds: qcom-lpg: Make use of devm_pwmchip_alloc() function
  drm/bridge: ti-sn65dsi86: Make use of devm_pwmchip_alloc() function
  drm/bridge: ti-sn65dsi86: Make use of pwmchip_parent() accessor
  gpio: mvebu: Make use of devm_pwmchip_alloc() function
  pwm: xilinx: Make use of devm_pwmchip_alloc() function
  pwm: xilinx: Prepare removing pwm_chip from driver data
  pwm: vt8500: Make use of devm_pwmchip_alloc() function
  ...

Merge tag 'tag-chrome-platform-firmware-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome platform firmware updates from Tzung-Bi Shih:

- Allow userspace to automatically load coreboot modules by adding
   modaliases and sending uevents

- Make bus_type const

* tag 'tag-chrome-platform-firmware-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  firmware: coreboot: Replace tag with id table in driver struct
  firmware: coreboot: Generate aliases for coreboot modules
  firmware: coreboot: Generate modalias uevent for devices
  firmware: coreboot: make coreboot_bus_type const

Merge tag 'for-6.9/dm-vdo' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper VDO target from Mike Snitzer:
"Introduce the DM vdo target which provides block-level deduplication,
  compression, and thin provisioning. Please see:

      Documentation/admin-guide/device-mapper/vdo.rst
      Documentation/admin-guide/device-mapper/vdo-design.rst

  The DM vdo target handles its concurrency by pinning an IO, and
  subsequent stages of handling that IO, to a particular VDO thread.
  This aspect of VDO is "unique" but its overall implementation is very
  tightly coupled to its mostly lockless threading model. As such, VDO
  is not easily changed to use more traditional finer-grained locking
  and Linux workqueues. Please see the "Zones and Threading" section of
  vdo-design.rst

  The DM vdo target has been used in production for many years but has
  seen significant changes over the past ~6 years to prepare it for
  upstream inclusion. The codebase is still large but it is isolated to
  drivers/md/dm-vdo/ and has been made considerably more approachable
  and maintainable.

  Matt Sakai has been added to the MAINTAINERS file to reflect that he
  will send VDO changes upstream through the DM subsystem maintainers"

* tag 'for-6.9/dm-vdo' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (142 commits)
  dm vdo: document minimum metadata size requirements
  dm vdo: remove meaningless version number constant
  dm vdo: remove vdo_perform_once
  dm vdo block-map: Remove stray semicolon
  dm vdo string-utils: change from uds_ to vdo_ namespace
  dm vdo logger: change from uds_ to vdo_ namespace
  dm vdo funnel-queue: change from uds_ to vdo_ namespace
  dm vdo indexer: fix use after free
  dm vdo logger: remove log level to string conversion code
  dm vdo: document log_level parameter
  dm vdo: add 'log_level' module parameter
  dm vdo: remove all sysfs interfaces
  dm vdo target: eliminate inappropriate uses of UDS_SUCCESS
  dm vdo indexer: update ASSERT and ASSERT_LOG_ONLY usage
  dm vdo encodings: update some stale comments
  dm vdo permassert: audit all of ASSERT to test for VDO_SUCCESS
  dm-vdo funnel-workqueue: return VDO_SUCCESS from make_simple_work_queue
  dm vdo thread-utils: return VDO_SUCCESS on vdo_create_thread success
  dm vdo int-map: return VDO_SUCCESS on success
  dm vdo: check for VDO_SUCCESS return value from memory-alloc functions
  ...

Merge tag 'for-6.9/dm-bh-wq' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper BH workqueue conversion from Mike Snitzer:
"Convert the DM verity and crypt targets from (ab)using tasklets to
  using BH workqueues.

  These changes were coordinated with Tejun and are based ontop of DM's
  6.9 changes and Tejun's 6.9 workqueue tree"

* tag 'for-6.9/dm-bh-wq' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm-verity: Convert from tasklet to BH workqueue
  dm-crypt: Convert from tasklet to BH workqueue

Merge tag 'for-6.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

- Fix DM core's IO submission (which include dm-io and dm-bufio) such
   that a bio's IO priority is propagated. Work focused on enabling both
   DM crypt and verity targets to retain the appropriate IO priority

- Fix DM raid reshape logic to not allow an empty flush bio to be
   requeued due to false concern about the bio, which doesn't have a
   data payload, accessing beyond the end of the device

- Fix DM core's internal resume so that it properly calls both presume
   and resume methods, which fixes the potential for a postsuspend and
   resume imbalance

- Update DM verity target to set DM_TARGET_SINGLETON flag because it
   doesn't make sense to have a DM table with a mix of targets that
   include dm-verity

- Small cleanups in DM crypt, thin, and integrity targets

- Fix references to dm-devel mailing list to use latest list address

* tag 'for-6.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: call the resume method on internal suspend
  dm raid: fix false positive for requeue needed during reshape
  dm-integrity: set max_integrity_segments in dm_integrity_io_hints
  dm: update relevant MODULE_AUTHOR entries to latest dm-devel mailing list
  dm ioctl: update DM_DRIVER_EMAIL to new dm-devel mailing list
  dm verity: set DM_TARGET_SINGLETON feature flag
  dm crypt: Fix IO priority lost when queuing write bios
  dm verity: Fix IO priority lost when reading FEC and hash
  dm bufio: Support IO priority
  dm io: Support IO priority
  dm crypt: remove redundant state settings after waking up
  dm thin: add braces around conditional code that spans lines

Merge tag 'ata-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata updates from Niklas Cassel:

- Do not enable LPM for external ports (hotplug-capable ports or eSATA
   ports), as the HBA will not be able to detect hot plug removal events
   when LPM is enabled (me)

- Drop the board type board_ahci_low_power. Now when we make sure that
   we won't enable LPM for external ports, we can always set the LPM
   policy to CONFIG_SATA_MOBILE_LPM_POLICY for internal ports. There is
   thus no longer any need for the board type board_ahci_low_power, so
   it can be removed. (As before, LPM features not supported by the HBA
   and/or the device will not be enabled, regardless of the LPM policy
   Kconfig) (Mario Limonciello)

   Note that the default CONFIG_SATA_MOBILE_LPM_POLICY value is still 0
   (which will not try to enable any LPM features), however, most Linux
   distributions override this and set it to 3 (Medium power with DIPM).
   We intend to change the default to 3 in the coming cycles, but we
   will wait a cycle or two.

- Add board type board_ahci_pcs_quirk and make all legacy Intel
   platforms use it. The Intel PCS quirk was being applied to basically
   all Intel platforms, which caused some issues (the device failing to
   come back after a reset), when being applied to newer Intel platforms
   where it shouldn't have been applied.

   New platforms can be added using board type board_ahci, which will
   not have the quirk applied (me)

- Rename board_ahci_nosntf to board_ahci_pcs_quirk_no_sntf to more
   clearly highlight that it applies two different quirks (me)

- Modify the ahci_broken_devslp() quirk to be implemented like all the
   other quirks (i.e. define a board type for the quirk) (me)

- Drop unused board_ahci_noncq board type (me)

- Rename board_ahci_nomsi to board_ahci_no_msi to match the other board
   types (me)

- Make pata_parport_bus_type const (Ricardo B. Marliere)

- Remove at91 compact flash device tree binding. (The binding is not
   used by any driver.) (from Hari Prasath Gujulan Elango)

- Convert MediaTek device tree binding to json-schema (Rafał Miłecki)

- At boot, print the number of implemented ports, instead of printing
   the maximum number of ports supported by the HBA silicon (me)

* tag 'ata-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ahci: print the number of implemented ports
  dt-bindings: ata: convert MediaTek controller to the json-schema
  ahci: rename board_ahci_nomsi
  ahci: drop unused board_ahci_noncq
  ahci: clean up ahci_broken_devslp quirk
  ahci: rename board_ahci_nosntf
  ahci: clean up intel_pcs_quirk
  ata: ahci: Drop low power policy board type
  ata: ahci: do not enable LPM on external ports
  ata: ahci: drop hpriv param from ahci_update_initial_lpm_policy()
  ata: ahci: a hotplug capable port is an external port
  ata: ahci: move marking of external port earlier
  dt-bindings: ata: atmel: remove at91 compact flash documentation
  ata: pata_parport: make pata_parport_bus_type const

Merge tag 'dma-mapping-6.9-2024-03-11' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

- fix leaked pages on dma_set_decrypted() failure (Rick Edgecombe)

- add a new swiotlb debugfs file (ZhangPeng)

* tag 'dma-mapping-6.9-2024-03-11' of git://git.infradead.org/users/hch/dma-mapping:
dma-direct: Leak pages on dma_set_decrypted() failure
swiotlb: add debugfs to track swiotlb transient pool usage

Merge tag 'iommu-updates-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
"Core changes:
    - Constification of bus_type pointer
    - Preparations for user-space page-fault delivery
    - Use a named kmem_cache for IOVA magazines

  Intel VT-d changes from Lu Baolu:
    - Add RBTree to track iommu probed devices
    - Add Intel IOMMU debugfs document
    - Cleanup and refactoring

  ARM-SMMU Updates from Will Deacon:
    - Device-tree binding updates for a bunch of Qualcomm SoCs
    - SMMUv2: Support for Qualcomm X1E80100 MDSS
    - SMMUv3: Significant rework of the driver's STE manipulation and
      domain handling code. This is the initial part of a larger scale
      rework aiming to improve the driver's implementation of the
      IOMMU-API in preparation for hooking up IOMMUFD support.

  AMD-Vi Updates:
    - Refactor GCR3 table support for SVA
    - Cleanups

  Some smaller cleanups and fixes"

* tag 'iommu-updates-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (88 commits)
  iommu: Fix compilation without CONFIG_IOMMU_INTEL
  iommu/amd: Fix sleeping in atomic context
  iommu/dma: Document min_align_mask assumption
  iommu/vt-d: Remove scalabe mode in domain_context_clear_one()
  iommu/vt-d: Remove scalable mode context entry setup from attach_dev
  iommu/vt-d: Setup scalable mode context entry in probe path
  iommu/vt-d: Fix NULL domain on device release
  iommu: Add static iommu_ops->release_domain
  iommu/vt-d: Improve ITE fault handling if target device isn't present
  iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected
  PCI: Make pci_dev_is_disconnected() helper public for other drivers
  iommu/vt-d: Use device rbtree in iopf reporting path
  iommu/vt-d: Use rbtree to track iommu probed devices
  iommu/vt-d: Merge intel_svm_bind_mm() into its caller
  iommu/vt-d: Remove initialization for dynamically heap-allocated rcu_head
  iommu/vt-d: Remove treatment for revoking PASIDs with pending page faults
  iommu/vt-d: Add the document for Intel IOMMU debugfs
  iommu/vt-d: Use kcalloc() instead of kzalloc()
  iommu/vt-d: Remove INTEL_IOMMU_BROKEN_GFX_WA
  iommu: re-use local fwnode variable in iommu_ops_from_fwnode()
  ...

ALSA: usb-audio: Stop parsing channels bits when all channels are found.

If a usb audio device sets more bits than the amount of channels
it could write outside of the map array.

Signed-off-by: Johan Carlsson <[email protected]>
Fixes: 04324ccc75f9 ("ALSA: usb-audio: add channel map support")
Message-ID: <20240313081509 [email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

Revert "mm: add arch hook to validate mmap() prot flags"

This reverts commit cb1a393c40eee2f1692c995ea0cc6e45bfccde4d.

Since the arm64 WXN patch has been reverted, remove this hook as it
would not have any users.

Signed-off-by: Catalin Marinas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Revert "arm64: mm: add support for WXN memory translation attribute"

This reverts commit 50e3ed0f93f4f62ed2aa83de5db6cb84ecdd5707.

The SCTLR_EL1.WXN control forces execute-never when a page has write
permissions. While the idea of hardening such write/exec combinations is
good, with permissions indirection enabled (FEAT_PIE) this control
becomes RES0. FEAT_PIE introduces a slightly different form of WXN which
only has an effect when the base permission is RWX and the write is
toggled by the permission overlay (FEAT_POE, not yet supported by the
arm64 kernel). Revert the patch for now.

Signed-off-by: Catalin Marinas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf/x86/amd/core: Avoid register reset when CPU is dead

When bringing a CPU online, some of the PMC and LBR related registers
are reset. The same is done when a CPU is taken offline although that
is unnecessary. This currently happens in the "cpu_dead" callback which
is also incorrect as the callback runs on a control CPU instead of the
one that is being taken offline. This also affects hibernation and
suspend to RAM on some platforms as reported in the link below.

Fixes: 21d59e3e2c40 ("perf/x86/amd/core: Detect PerfMonV2 support")
Reported-by: Mario Limonciello <[email protected]>
Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/550a026764342cf7e5812680e3e2b91fe662b5ac.1706526029.git.sandipan.das@amd.com

perf/x86/amd/lbr: Discard erroneous branch entries

The Revision Guide for AMD Family 19h Model 10-1Fh processors declares
Erratum 1452 which states that non-branch entries may erroneously be
recorded in the Last Branch Record (LBR) stack with the valid and
spec bits set.

Such entries can be recognized by inspecting bit 61 of the corresponding
LastBranchStackToIp register. This bit is currently reserved but if found
to be set, the associated branch entry should be discarded.

Signed-off-by: Sandipan Das <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://bugzilla.kernel.org/attachment.cgi?id=305518
Link: https://lore.kernel.org/r/3ad2aa305f7396d41a40e3f054f740d464b16b7f.1706526029.git.sandipan.das@amd.com

ALSA: hda/tas2781: remove unnecessary runtime_pm calls

The runtime_pm handling seems to have been loosely inspired by the
cs32l41 driver, but in this case the get_noresume/put sequence is not
required.

Signed-off-by: Pierre-Louis Bossart <[email protected]>
Message-ID: <20240312161217 [email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: hda/realtek - ALC236 fix volume mute & mic mute LED on some HP models

Some HP laptops have received revisions that altered their board IDs
and therefore the current patches/quirks do not apply to them.
Specifically, for my Probook 440 G8, I have a board ID of 8a74.
It is necessary to add a line for that specific model.

Signed-off-by: Valentine Altair <[email protected]>
Cc: <[email protected]>
Message-ID: <kOqXRBcxkKt6m5kciSDCkGqMORZi_HB3ZVPTX5sD3W1pKxt83Pf-WiQ1V1pgKKI8pYr4oGvsujt3vk2zsCE-DDtnUADFG6NGBlS5N3U4xgA=@proton.me>
Signed-off-by: Takashi Iwai <[email protected]>

Merge branch 'for-6.9/cxl-fixes' into for-6.9/cxl

Pick up a parsing fix for the CDAT SSLBIS structure for v6.9.

Merge branch 'for-6.9/cxl-einj' into for-6.9/cxl

Pick up support for injecting errors via ACPI EINJ into the CXL protocol
for v6.9.

Merge branch 'for-6.9/cxl-qos' into for-6.9/cxl

Pick up support for CXL "HMEM reporting" for v6.9, i.e. build an HMAT
from CXL CDAT and PCIe switch information.

lib/firmware_table: Provide buffer length argument to cdat_table_parse()

There exist card implementations with a CDAT table using a fixed size
buffer, but with entries filled in that do not fill the whole table
length size. Then, the last entry in the CDAT table may not mark the
end of the CDAT table buffer specified by the length field in the CDAT
header. It can be shorter with trailing unused (zero'ed) data. The
actual table length is determined while reading all CDAT entries of
the table with DOE.

If the table is greater than expected (containing zero'ed trailing
data), the CDAT parser fails with:

[   48.691717] Malformed DSMAS table length: (24:0)
[   48.702084] [CDAT:0x00] Invalid zero length
[   48.711460] cxl_port endpoint1: Failed to parse CDAT: -22

In addition, a check of the table buffer length is missing to prevent
an out-of-bound access then parsing the CDAT table.

Hardening code against device returning borked table. Fix that by
providing an optional buffer length argument to
acpi_parse_entries_array() that can be used by cdat_table_parse() to
propagate the buffer size down to its users to check the buffer
length. This also prevents a possible out-of-bound access mentioned.

Add a check to warn about a malformed CDAT table length.

Cc: Rafael J. Wysocki <[email protected]>
Cc: Len Brown <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

cxl/pci: Get rid of pointer arithmetic reading CDAT table

Reading the CDAT table using DOE requires a Table Access Response
Header in addition to the CDAT entry. In current implementation this
has caused offsets with sizeof(__le32) to the actual buffers. This led
to hardly readable code and even bugs. E.g., see fix of devm_kfree()
in read_cdat_data():

commit c65efe3685f5 ("cxl/cdat: Free correct buffer on checksum error")

Rework code to avoid calculations with sizeof(__le32). Introduce
struct cdat_doe_rsp for this which contains the Table Access Response
Header and a variable payload size for various data structures
afterwards to access the CDAT table and its CDAT Data Structures
without recalculating buffer offsets.

Cc: Lukas Wunner <[email protected]>
Cc: Fan Ni <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

cxl/pci: Rename DOE mailbox handle to doe_mb

Trivial variable rename for the DOE mailbox handle from cdat_doe to
doe_mb. The variable name cdat_doe is too ambiguous, use doe_mb that
is commonly used for the mailbox.

Signed-off-by: Robert Richter <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

cxl: Fix the incorrect assignment of SSLBIS entry pointer initial location

The 'entry' pointer in cdat_sslbis_handler() is set to header +
sizeof(common header). However, the math missed the addition of the SSLBIS
main header. It should be header + sizeof(common header) + sizeof(*sslbis).
Use a defined struct for all the SSLBIS parts in order to avoid pointer
math errors.

The bug causes incorrect parsing of the SSLBIS table and introduces incorrect
performance values to the access_coordinates during the CXL access_coordinate
calculation path if there are CXL switches present in the topology.

The issue was found during testing of new code being added to add additional
checks for invalid CDAT values during CXL access_coordinate calculation. The
testing was done on qemu with a CXL topology including a CXL switch.

Fixes: 80aa780dda20 ("cxl: Add callback to parse the SSLBIS subtable from CDAT")
Signed-off-by: Dave Jiang <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Fan Ni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

cxl/core: Add CXL EINJ debugfs files

Export CXL helper functions in einj-cxl.c for getting/injecting
available CXL protocol error types to sysfs under kernel/debug/cxl.

The kernel/debug/cxl/einj_types file will print the available CXL
protocol errors in the same format as the available_error_types
file provided by the einj module. The
kernel/debug/cxl/$dport_dev/einj_inject file is functionally the same
as the error_type and error_inject files provided by the EINJ module,
i.e.: writing an error type into $dport_dev/einj_inject will inject
said error type into the CXL dport represented by $dport_dev.

Reviewed-by: Jonathan Cameron <[email protected]>
Signed-off-by: Ben Cheatham <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

EINJ, Documentation: Update EINJ kernel doc

Update EINJ kernel document to include how to inject CXL protocol error
types, build the kernel to include CXL error types, and give an example
injection.

Reviewed-by: Jonathan Cameron <[email protected]>
Signed-off-by: Ben Cheatham <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

EINJ: Add CXL error type support

Move CXL protocol error types from einj.c (now einj-core.c) to einj-cxl.c.
einj-cxl.c implements the necessary handling for CXL protocol error
injection and exposes an API for the CXL core to use said functionality,
while also allowing the EINJ module to be built without CXL support.
Because CXL error types targeting CXL 1.0/1.1 ports require special
handling, only allow them to be injected through the new cxl debugfs
interface (next commit) and return an error when attempting to inject
through the legacy interface.

Reviewed-by: Jonathan Cameron <[email protected]>
Signed-off-by: Ben Cheatham <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

EINJ: Migrate to a platform driver

Change the EINJ module to install a platform device/driver on module
init and move the module init() and exit() functions to driver probe and
remove. This change allows the EINJ module to load regardless of whether
setting up EINJ succeeds, which allows dependent modules to still load
(i.e. the CXL core).

Since EINJ may no longer be initialized when the module loads, any
functions that are called from dependent/external modules should
safegaurd against the case EINJ didn't load.

Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Signed-off-by: Ben Cheatham <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

Merge tag 'printk-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk updates from Petr Mladek:
"Improve the behavior during panic. The issues were found when testing
  the ongoing changes introducing atomic consoles and printk kthreads:

   - pr_flush() has to wait for the last reserved record instead of the
     last finalized one. Note that records are finalized in random order
     when generated by more CPUs in parallel.

   - Ignore non-finalized records during panic(). Messages printed on
     panic-CPU are always finalized. Messages printed by other CPUs
     might never be finalized when the CPUs get stopped.

   - Block new printk() calls on non-panic CPUs completely. Backtraces
     are printed before entering the panic mode. Later messages would
     just mess information printed by the panic CPU.

   - Do not take console_lock in console_flush_on_panic() at all. The
     original code did try_lock()/console_unlock(). The unlock part
     might cause a deadlock when panic() happened in a scheduler code.

   - Fix conversion of 64-bit sequence number for 32-bit atomic
     operations"

* tag 'printk-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  dump_stack: Do not get cpu_sync for panic CPU
  panic: Flush kernel log buffer at the end
  printk: Avoid non-panic CPUs writing to ringbuffer
  printk: Disable passing console lock owner completely during panic()
  printk: ringbuffer: Skip non-finalized records in panic
  printk: Wait for all reserved records with pr_flush()
  printk: ringbuffer: Cleanup reader terminology
  printk: Add this_cpu_in_panic()
  printk: For @suppress_panic_printk check for other CPU in panic
  printk: ringbuffer: Clarify special lpos values
  printk: ringbuffer: Do not skip non-finalized records with prb_next_seq()
  printk: Use prb_first_seq() as base for 32bit seq macros
  printk: Adjust mapping for 32bit seq macros
  printk: nbcon: Relocate 32bit seq macros

mm, slab: remove last vestiges of SLAB_MEM_SPREAD

Yes, yes, I know the slab people were planning on going slow and letting
every subsystem fight this thing on their own. But let's just rip off
the band-aid and get it over and done with. I don't want to see a
number of unnecessary pull requests just to get rid of a flag that no
longer has any meaning.

This was mainly done with a couple of 'sed' scripts and then some manual
cleanup of the end result.

Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'slab-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

- Freelist loading optimization (Chengming Zhou)

   When the per-cpu slab is depleted and a new one loaded from the cpu
   partial list, optimize the loading to avoid an irq enable/disable
   cycle. This results in a 3.5% performance improvement on the "perf
   bench sched messaging" test.

- Kernel boot parameters cleanup after SLAB removal (Xiongwei Song)

   Due to two different main slab implementations we've had boot
   parameters prefixed either slab_ and slub_ with some later becoming
   an alias as both implementations gained the same functionality (i.e.
   slab_nomerge vs slub_nomerge). In order to eventually get rid of the
   implementation-specific names, the canonical and documented
   parameters are now all prefixed slab_ and the slub_ variants become
   deprecated but still working aliases.

- SLAB_ kmem_cache creation flags cleanup (Vlastimil Babka)

   The flags had hardcoded #define values which became tedious and
   error-prone when adding new ones. Assign the values via an enum that
   takes care of providing unique bit numbers. Also deprecate
   SLAB_MEM_SPREAD which was only used by SLAB, so it's a no-op since
   SLAB removal. Assign it an explicit zero value. The removals of the
   flag usage are handled independently in the respective subsystems,
   with a final removal of any leftover usage planned for the next
   release.

- Misc cleanups and fixes (Chengming Zhou, Xiaolei Wang, Zheng Yejian)

   Includes removal of unused code or function parameters and a fix of a
   memleak.

* tag 'slab-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  slab: remove PARTIAL_NODE slab_state
  mm, slab: remove memcg_from_slab_obj()
  mm, slab: remove the corner case of inc_slabs_node()
  mm/slab: Fix a kmemleak in kmem_cache_destroy()
  mm, slab, kasan: replace kasan_never_merge() with SLAB_NO_MERGE
  mm, slab: use an enum to define SLAB_ cache creation flags
  mm, slab: deprecate SLAB_MEM_SPREAD flag
  mm, slab: fix the comment of cpu partial list
  mm, slab: remove unused object_size parameter in kmem_cache_flags()
  mm/slub: remove parameter 'flags' in create_kmalloc_caches()
  mm/slub: remove unused parameter in next_freelist_entry()
  mm/slub: remove full list manipulation for non-debug slab
  mm/slub: directly load freelist from cpu partial slab in the likely case
  mm/slub: make the description of slab_min_objects helpful in doc
  mm/slub: replace slub_$params with slab_$params in slub.rst
  mm/slub: unify all sl[au]b parameters with "slab_$param"
  Documentation: kernel-parameters: remove noaliencache

Merge tag 'lsm-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

- Promote IMA/EVM to a proper LSM

   This is the bulk of the diffstat, and the source of all the changes
   in the VFS code. Prior to the start of the LSM stacking work it was
   important that IMA/EVM were separate from the rest of the LSMs,
   complete with their own hooks, infrastructure, etc. as it was the
   only way to enable IMA/EVM at the same time as a LSM.

   However, now that the bulk of the LSM infrastructure supports
   multiple simultaneous LSMs, we can simplify things greatly by
   bringing IMA/EVM into the LSM infrastructure as proper LSMs. This is
   something I've wanted to see happen for quite some time and Roberto
   was kind enough to put in the work to make it happen.

- Use the LSM hook default values to simplify the call_int_hook() macro

   Previously the call_int_hook() macro required callers to supply a
   default return value, despite a default value being specified when
   the LSM hook was defined.

   This simplifies the macro by using the defined default return value
   which makes life easier for callers and should also reduce the number
   of return value bugs in the future (we've had a few pop up recently,
   hence this work).

- Use the KMEM_CACHE() macro instead of kmem_cache_create()

   The guidance appears to be to use the KMEM_CACHE() macro when
   possible and there is no reason why we can't use the macro, so let's
   use it.

- Fix a number of comment typos in the LSM hook comment blocks

   Not much to say here, we fixed some questionable grammar decisions in
   the LSM hook comment blocks.

* tag 'lsm-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (28 commits)
  cred: Use KMEM_CACHE() instead of kmem_cache_create()
  lsm: use default hook return value in call_int_hook()
  lsm: fix typos in security/security.c comment headers
  integrity: Remove LSM
  ima: Make it independent from 'integrity' LSM
  evm: Make it independent from 'integrity' LSM
  evm: Move to LSM infrastructure
  ima: Move IMA-Appraisal to LSM infrastructure
  ima: Move to LSM infrastructure
  integrity: Move integrity_kernel_module_request() to IMA
  security: Introduce key_post_create_or_update hook
  security: Introduce inode_post_remove_acl hook
  security: Introduce inode_post_set_acl hook
  security: Introduce inode_post_create_tmpfile hook
  security: Introduce path_post_mknod hook
  security: Introduce file_release hook
  security: Introduce file_post_open hook
  security: Introduce inode_post_removexattr hook
  security: Introduce inode_post_setattr hook
  security: Align inode_setattr hook definition with EVM
  ...

Merge tag 'selinux-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
"Really only a few notable changes:

   - Continue the coding style/formatting fixup work

     This is the bulk of the diffstat in this pull request, with the
     focus this time around being the security/selinux/ss directory.

     We've only got a couple of files left to cleanup and once we're
     done with that we can start enabling some automatic style
     verfication and introduce tooling to help new folks format their
     code correctly.

   - Don't restrict xattr copy-up when SELinux policy is not loaded

     This helps systems that use overlayfs, or similar filesystems,
     preserve their SELinux labels during early boot when the SELinux
     policy has yet to be loaded.

   - Reduce the work we do during inode initialization time

     This isn't likely to show up in any benchmark results, but we
     removed an unnecessary SELinux object class lookup/calculation
     during inode initialization.

   - Correct the return values in selinux_socket_getpeersec_dgram()

     We had some inconsistencies with respect to our return values
     across selinux_socket_getpeersec_dgram() and
     selinux_socket_getpeersec_stream().

     This provides a more uniform set of error codes across the two
     functions and should help make it easier for users to identify
     the source of a failure"

* tag 'selinux-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (24 commits)
  selinux: fix style issues in security/selinux/ss/symtab.c
  selinux: fix style issues in security/selinux/ss/symtab.h
  selinux: fix style issues in security/selinux/ss/sidtab.c
  selinux: fix style issues in security/selinux/ss/sidtab.h
  selinux: fix style issues in security/selinux/ss/services.h
  selinux: fix style issues in security/selinux/ss/policydb.c
  selinux: fix style issues in security/selinux/ss/policydb.h
  selinux: fix style issues in security/selinux/ss/mls_types.h
  selinux: fix style issues in security/selinux/ss/mls.c
  selinux: fix style issues in security/selinux/ss/mls.h
  selinux: fix style issues in security/selinux/ss/hashtab.c
  selinux: fix style issues in security/selinux/ss/hashtab.h
  selinux: fix style issues in security/selinux/ss/ebitmap.c
  selinux: fix style issues in security/selinux/ss/ebitmap.h
  selinux: fix style issues in security/selinux/ss/context.h
  selinux: fix style issues in security/selinux/ss/context.h
  selinux: fix style issues in security/selinux/ss/constraint.h
  selinux: fix style issues in security/selinux/ss/conditional.c
  selinux: fix style issues in security/selinux/ss/conditional.h
  selinux: fix style issues in security/selinux/ss/avtab.c
  ...

Revert "crypto: remove CONFIG_CRYPTO_STATS"

This reverts commit 2beb81fbf0c01a62515a1bcef326168494ee2bd0.

While removing CONFIG_CRYPTO_STATS is a worthy goal, this also
removed unrelated infrastructure such as crypto_comp_alg_common.

Signed-off-by: Herbert Xu <[email protected]>

Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
"Core & protocols:

   - Large effort by Eric to lower rtnl_lock pressure and remove locks:

      - Make commonly used parts of rtnetlink (address, route dumps
        etc) lockless, protected by RCU instead of rtnl_lock.

      - Add a netns exit callback which already holds rtnl_lock,
        allowing netns exit to take rtnl_lock once in the core instead
        of once for each driver / callback.

      - Remove locks / serialization in the socket diag interface.

      - Remove 6 calls to synchronize_rcu() while holding rtnl_lock.

      - Remove the dev_base_lock, depend on RCU where necessary.

   - Support busy polling on a per-epoll context basis. Poll length and
     budget parameters can be set independently of system defaults.

   - Introduce struct net_hotdata, to make sure read-mostly global
     config variables fit in as few cache lines as possible.

   - Add optional per-nexthop statistics to ease monitoring / debug of
     ECMP imbalance problems.

   - Support TCP_NOTSENT_LOWAT in MPTCP.

   - Ensure that IPv6 temporary addresses' preferred lifetimes are long
     enough, compared to other configured lifetimes, and at least 2 sec.

   - Support forwarding of ICMP Error messages in IPSec, per RFC 4301.

   - Add support for the independent control state machine for bonding
     per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
     control state machine.

   - Add "network ID" to MCTP socket APIs to support hosts with multiple
     disjoint MCTP networks.

   - Re-use the mono_delivery_time skbuff bit for packets which user
     space wants to be sent at a specified time. Maintain the timing
     information while traversing veth links, bridge etc.

   - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.

   - Simplify many places iterating over netdevs by using an xarray
     instead of a hash table walk (hash table remains in place, for use
     on fastpaths).

   - Speed up scanning for expired routes by keeping a dedicated list.

   - Speed up "generic" XDP by trying harder to avoid large allocations.

   - Support attaching arbitrary metadata to netconsole messages.

  Things we sprinkled into general kernel code:

   - Enforce VM_IOREMAP flag and range in ioremap_page_range and
     introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
     bpf_arena).

   - Rework selftest harness to enable the use of the full range of ksft
     exit code (pass, fail, skip, xfail, xpass).

  Netfilter:

   - Allow userspace to define a table that is exclusively owned by a
     daemon (via netlink socket aliveness) without auto-removing this
     table when the userspace program exits. Such table gets marked as
     orphaned and a restarting management daemon can re-attach/regain
     ownership.

   - Speed up element insertions to nftables' concatenated-ranges set
     type. Compact a few related data structures.

  BPF:

   - Add BPF token support for delegating a subset of BPF subsystem
     functionality from privileged system-wide daemons such as systemd
     through special mount options for userns-bound BPF fs to a trusted
     & unprivileged application.

   - Introduce bpf_arena which is sparse shared memory region between
     BPF program and user space where structures inside the arena can
     have pointers to other areas of the arena, and pointers work
     seamlessly for both user-space programs and BPF programs.

   - Introduce may_goto instruction that is a contract between the
     verifier and the program. The verifier allows the program to loop
     assuming it's behaving well, but reserves the right to terminate
     it.

   - Extend the BPF verifier to enable static subprog calls in spin lock
     critical sections.

   - Support registration of struct_ops types from modules which helps
     projects like fuse-bpf that seeks to implement a new struct_ops
     type.

   - Add support for retrieval of cookies for perf/kprobe multi links.

   - Support arbitrary TCP SYN cookie generation / validation in the TC
     layer with BPF to allow creating SYN flood handling in BPF
     firewalls.

   - Add code generation to inline the bpf_kptr_xchg() helper which
     improves performance when stashing/popping the allocated BPF
     objects.

  Wireless:

   - Add SPP (signaling and payload protected) AMSDU support.

   - Support wider bandwidth OFDMA, as required for EHT operation.

  Driver API:

   - Major overhaul of the Energy Efficient Ethernet internals to
     support new link modes (2.5GE, 5GE), share more code between
     drivers (especially those using phylib), and encourage more
     uniform behavior. Convert and clean up drivers.

   - Define an API for querying per netdev queue statistics from
     drivers.

   - IPSec: account in global stats for fully offloaded sessions.

   - Create a concept of Ethernet PHY Packages at the Device Tree level,
     to allow parameterizing the existing PHY package code.

   - Enable Rx hashing (RSS) on GTP protocol fields.

  Misc:

   - Improvements and refactoring all over networking selftests.

   - Create uniform module aliases for TC classifiers, actions, and
     packet schedulers to simplify creating modprobe policies.

   - Address all missing MODULE_DESCRIPTION() warnings in networking.

   - Extend the Netlink descriptions in YAML to cover message
     encapsulation or "Netlink polymorphism", where interpretation of
     nested attributes depends on link type, classifier type or some
     other "class type".

  Drivers:

   - Ethernet high-speed NICs:
      - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
      - Intel (100G, ice, idpf):
         - support E825-C devices
      - nVidia/Mellanox:
         - support devices with one port and multiple PCIe links
      - Broadcom (bnxt):
         - support n-tuple filters
         - support configuring the RSS key
      - Wangxun (ngbe/txgbe):
         - implement irq_domain for TXGBE's sub-interrupts
      - Pensando/AMD:
         - support XDP
         - optimize queue submission and wakeup handling (+17% bps)
         - optimize struct layout, saving 28% of memory on queues

   - Ethernet NICs embedded and virtual:
      - Google cloud vNIC:
         - refactor driver to perform memory allocations for new queue
           config before stopping and freeing the old queue memory
      - Synopsys (stmmac):
         - obey queueMaxSDU and implement counters required by 802.1Qbv
      - Renesas (ravb):
         - support packet checksum offload
         - suspend to RAM and runtime PM support

   - Ethernet switches:
      - nVidia/Mellanox:
         - support for nexthop group statistics
      - Microchip:
         - ksz8: implement PHY loopback
         - add support for KSZ8567, a 7-port 10/100Mbps switch

   - PTP:
      - New driver for RENESAS FemtoClock3 Wireless clock generator.
      - Support OCP PTP cards designed and built by Adva.

   - CAN:
      - Support recvmsg() flags for own, local and remote traffic on CAN
        BCM sockets.
      - Support for esd GmbH PCIe/402 CAN device family.
      - m_can:
         - Rx/Tx submission coalescing
         - wake on frame Rx

   - WiFi:
      - Intel (iwlwifi):
         - enable signaling and payload protected A-MSDUs
         - support wider-bandwidth OFDMA
         - support for new devices
         - bump FW API to 89 for AX devices; 90 for BZ/SC devices
      - MediaTek (mt76):
         - mt7915: newer ADIE version support
         - mt7925: radio temperature sensor support
      - Qualcomm (ath11k):
         - support 6 GHz station power modes: Low Power Indoor (LPI),
           Standard Power) SP and Very Low Power (VLP)
         - QCA6390 & WCN6855: support 2 concurrent station interfaces
         - QCA2066 support
      - Qualcomm (ath12k):
         - refactoring in preparation for Multi-Link Operation (MLO)
           support
         - 1024 Block Ack window size support
         - firmware-2.bin support
         - support having multiple identical PCI devices (firmware needs
           to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
         - QCN9274: support split-PHY devices
         - WCN7850: enable Power Save Mode in station mode
         - WCN7850: P2P support
      - RealTek:
         - rtw88: support for more rtw8811cu and rtw8821cu devices
         - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
         - rtlwifi: speed up USB firmware initialization
         - rtwl8xxxu:
             - RTL8188F: concurrent interface support
             - Channel Switch Announcement (CSA) support in AP mode
      - Broadcom (brcmfmac):
         - per-vendor feature support
         - per-vendor SAE password setup
         - DMI nvram filename quirk for ACEPC W5 Pro"

* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
  nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
  nexthop: Fix out-of-bounds access during attribute validation
  nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
  nexthop: Only parse NHA_OP_FLAGS for get messages that require it
  bpf: move sleepable flag from bpf_prog_aux to bpf_prog
  bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
  selftests/bpf: Add kprobe multi triggering benchmarks
  ptp: Move from simple ida to xarray
  vxlan: Remove generic .ndo_get_stats64
  vxlan: Do not alloc tstats manually
  devlink: Add comments to use netlink gen tool
  nfp: flower: handle acti_netdevs allocation failure
  net/packet: Add getsockopt support for PACKET_COPY_THRESH
  net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
  selftests/bpf: Add bpf_arena_htab test.
  selftests/bpf: Add bpf_arena_list test.
  selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
  bpf: Add helper macro bpf_addr_space_cast()
  libbpf: Recognize __arena global variables.
  bpftool: Recognize arena map type
  ...

Merge tag 'docs-6.9' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
"A moderatly busy cycle for development this time around.

   - Some cleanup of the main index page for easier navigation

   - Rework some of the other top-level pages for better readability
     and, with luck, fewer merge conflicts in the future.

   - Submit-checklist improvements, hopefully the first of many.

   - New Italian translations

   - A fair number of kernel-doc fixes and improvements. We have also
     dropped the recommendation to use an old version of Sphinx.

   - A new document from Thorsten on bisection

  ... and lots of fixes and updates"

* tag 'docs-6.9' of git://git.lwn.net/linux: (54 commits)
  docs: verify/bisect: fixes, finetuning, and support for Arch
  docs: Makefile: Add dependency to $(YNL_INDEX) for targets other than htmldocs
  docs: Move ja_JP/howto.rst to ja_JP/process/howto.rst
  docs: submit-checklist: use subheadings
  docs: submit-checklist: structure by category
  docs: new text on bisecting which also covers bug validation
  docs: drop the version constraints for sphinx and dependencies
  docs: kerneldoc-preamble.sty: Remove code for Sphinx <2.4
  docs: Restore "smart quotes" for quotes
  docs/zh_CN: accurate translation of "function"
  docs: Include simplified link titles in main index
  docs: Correct formatting of title in admin-guide/index.rst
  docs: kernel_feat.py: fix build error for missing files
  MAINTAINERS: Set the field name for subsystem profile section
  kasan: Add documentation for CONFIG_KASAN_EXTRA_INFO
  Fixed case issue with 'fault-injection' in documentation
  kernel-doc: handle #if in enums as well
  Documentation: update mailing list addresses
  doc: kerneldoc.py: fix indentation
  scripts/kernel-doc: simplify signature printing
  ...

Merge tag 'audit-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
"Two small audit patches:

   - Use the KMEM_CACHE() macro instead of kmem_cache_create()

     The guidance appears to be to use the KMEM_CACHE() macro when
     possible and there is no reason why we can't use the macro, so
     let's use it.

   - Remove an unnecessary assignment in audit_dupe_lsm_field()

     A return value variable was assigned a value in its declaration,
     but the declaration value is overwritten before the return value
     variable is ever referenced; drop the assignment at declaration
     time"

* tag 'audit-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: use KMEM_CACHE() instead of kmem_cache_create()
  audit: remove unnecessary assignment in audit_dupe_lsm_field()

Merge tag 'Smack-for-6.9' of https://github.com/cschaufler/smack-next

Pull smack updates from Casey Schaufler:

- Improvements to the initialization of in-memory inodes

- A fix in ramfs to propery ensure the initialization of in-memory
   inodes

- Removal of duplicated code in smack_cred_transfer()

* tag 'Smack-for-6.9' of https://github.com/cschaufler/smack-next:
  Smack: use init_task_smack() in smack_cred_transfer()
  ramfs: Initialize security of in-memory inodes
  smack: Initialize the in-memory inode in smack_inode_init_security()
  smack: Always determine inode labels in smack_inode_init_security()
  smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity()
  smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()

Merge tag 'seccomp-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:
"There are no core kernel changes here; it's entirely selftests and
  samples:

   - Improve reliability of selftests (Terry Tritton, Kees Cook)

   - Fix strict-aliasing warning in samples (Arnd Bergmann)"

* tag 'seccomp-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  samples: user-trap: fix strict-aliasing warning
  selftests/seccomp: Pin benchmark to single CPU
  selftests/seccomp: user_notification_addfd check nextfd is available
  selftests/seccomp: Change the syscall used in KILL_THREAD test
  selftests/seccomp: Handle EINVAL on unshare(CLONE_NEWPID)

cxl/region: Deal with numa nodes not enumerated by SRAT

For the numa nodes that are not created by SRAT, no memory_target is
allocated and is not managed by the HMAT_REPORTING code. Therefore
hmat_callback() memory hotplug notifier will exit early on those NUMA
nodes. The CXL memory hotplug notifier will need to call
node_set_perf_attrs() directly in order to setup the access sysfs
attributes.

In acpi_numa_init(), the last proximity domain (pxm) id created by SRAT is
stored. Add a helper function acpi_node_backed_by_real_pxm() in order to
check if a NUMA node id is defined by SRAT or created by CFMWS.

node_set_perf_attrs() symbol is exported to allow update of perf attribs
for a node. The sysfs path of
/sys/devices/system/node/nodeX/access0/initiators/* is created by
node_set_perf_attrs() for the various attributes where nodeX is matched
to the NUMA node of the CXL region.

Cc: Rafael J. Wysocki <[email protected]>
Reviewed-by: Alison Schofield <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

Merge tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:
"As is pretty normal for this tree, there are changes all over the
  place, especially for small fixes, selftest improvements, and improved
  macro usability.

  Some header changes ended up landing via this tree as they depended on
  the string header cleanups. Also, a notable set of changes is the work
  for the reintroduction of the UBSAN signed integer overflow sanitizer
  so that we can continue to make improvements on the compiler side to
  make this sanitizer a more viable future security hardening option.

  Summary:

   - string.h and related header cleanups (Tanzir Hasan, Andy
     Shevchenko)

   - VMCI memcpy() usage and struct_size() cleanups (Vasiliy Kovalev,
     Harshit Mogalapalli)

   - selftests/powerpc: Fix load_unaligned_zeropad build failure
     (Michael Ellerman)

   - hardened Kconfig fragment updates (Marco Elver, Lukas Bulwahn)

   - Handle tail call optimization better in LKDTM (Douglas Anderson)

   - Use long form types in overflow.h (Andy Shevchenko)

   - Add flags param to string_get_size() (Andy Shevchenko)

   - Add Coccinelle script for potential struct_size() use (Jacob
     Keller)

   - Fix objtool corner case under KCFI (Josh Poimboeuf)

   - Drop 13 year old backward compat CAP_SYS_ADMIN check (Jingzi Meng)

   - Add str_plural() helper (Michal Wajdeczko, Kees Cook)

   - Ignore relocations in .notes section

   - Add comments to explain how __is_constexpr() works

   - Fix m68k stack alignment expectations in stackinit Kunit test

   - Convert string selftests to KUnit

   - Add KUnit tests for fortified string functions

   - Improve reporting during fortified string warnings

   - Allow non-type arg to type_max() and type_min()

   - Allow strscpy() to be called with only 2 arguments

   - Add binary mode to leaking_addresses scanner

   - Various small cleanups to leaking_addresses scanner

   - Adding wrapping_*() arithmetic helper

   - Annotate initial signed integer wrap-around in refcount_t

   - Add explicit UBSAN section to MAINTAINERS

   - Fix UBSAN self-test warnings

   - Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL

   - Reintroduce UBSAN's signed overflow sanitizer"

* tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits)
  selftests/powerpc: Fix load_unaligned_zeropad build failure
  string: Convert helpers selftest to KUnit
  string: Convert selftest to KUnit
  sh: Fix build with CONFIG_UBSAN=y
  compiler.h: Explain how __is_constexpr() works
  overflow: Allow non-type arg to type_max() and type_min()
  VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
  lib/string_helpers: Add flags param to string_get_size()
  x86, relocs: Ignore relocations in .notes section
  objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks
  overflow: Use POD in check_shl_overflow()
  lib: stackinit: Adjust target string to 8 bytes for m68k
  sparc: vdso: Disable UBSAN instrumentation
  kernel.h: Move lib/cmdline.c prototypes to string.h
  leaking_addresses: Provide mechanism to scan binary files
  leaking_addresses: Ignore input device status lines
  leaking_addresses: Use File::Temp for /tmp files
  MAINTAINERS: Update LEAKING_ADDRESSES details
  fortify: Improve buffer overflow reporting
  fortify: Add KUnit tests for runtime overflows
  ...

Merge tag 'execve-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve updates from Kees Cook:

- Drop needless error path code in remove_arg_zero() (Li kunyu, Kees
   Cook)

- binfmt_elf_efpic: Don't use missing interpreter's properties (Max
   Filippov)

- Use /bin/bash for execveat selftests

* tag 'execve-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  exec: Simplify remove_arg_zero() error path
  selftests/exec: Perform script checks with /bin/bash
  exec: Delete unnecessary statements in remove_arg_zero()
  fs: binfmt_elf_efpic: don't use missing interpreter's properties

Merge tag 'pstore-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore updates from Kees Cook:

- Make PSTORE_RAM available by default on arm64 (Nícolas F R A Prado)

- Allow for dynamic initialization in modular build (Guilherme G
   Piccoli)

- Add missing allocation failure check (Kunwu Chan)

- Avoid duplicate memory zeroing (Christophe JAILLET)

- Avoid potential double-free during pstorefs umount

* tag 'pstore-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/zone: Don't clear memory twice
  pstore/zone: Add a null pointer check to the psz_kmsg_read
  efi: pstore: Allow dynamic initialization based on module parameter
  arm64: defconfig: Enable PSTORE_RAM
  pstore/ram: Register to module device table
  pstore: inode: Only d_invalidate() is needed

Merge tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
"The bulk of the patches for this release are optimizations, code
  clean-ups, and minor bug fixes.

  One new feature to mention is that NFSD administrators now have the
  ability to revoke NFSv4 open and lock state. NFSD's NFSv3 support has
  had this capability for some time.

  As always I am grateful to NFSD contributors, reviewers, and testers"

* tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (75 commits)
  NFSD: Clean up nfsd4_encode_replay()
  NFSD: send OP_CB_RECALL_ANY to clients when number of delegations reaches its limit
  NFSD: Document nfsd_setattr() fill-attributes behavior
  nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr()
  nfsd: Fix a regression in nfsd_setattr()
  NFSD: OP_CB_RECALL_ANY should recall both read and write delegations
  NFSD: handle GETATTR conflict with write delegation
  NFSD: add support for CB_GETATTR callback
  NFSD: Document the phases of CREATE_SESSION
  NFSD: Fix the NFSv4.1 CREATE_SESSION operation
  nfsd: clean up comments over nfs4_client definition
  svcrdma: Add Write chunk WRs to the RPC's Send WR chain
  svcrdma: Post WRs for Write chunks in svc_rdma_sendto()
  svcrdma: Post the Reply chunk and Send WR together
  svcrdma: Move write_info for Reply chunks into struct svc_rdma_send_ctxt
  svcrdma: Post Send WR chain
  svcrdma: Fix retry loop in svc_rdma_send()
  svcrdma: Prevent a UAF in svc_rdma_send()
  svcrdma: Fix SQ wake-ups
  svcrdma: Increase the per-transport rw_ctx count
  ...

Merge tag 'erofs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
"In this cycle, we introduce compressed inode support over fscache
  since a lot of native EROFS images are explicitly compressed so that
  EROFS over fscache can be more widely used even without Dragonfly
  Nydus [1].

  Apart from that, there are some folio conversions for compressed
  inodes available as well as a lockdep false positive fix.

  Summary:

   - Some folio conversions for compressed inodes;

   - Add compressed inode support over fscache;

   - Fix lockdep false positives of erofs_pseudo_mnt"

Link: https://nydus.dev
* tag 'erofs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: support compressed inodes over fscache
  erofs: make iov_iter describe target buffers over fscache
  erofs: fix lockdep false positives on initializing erofs_pseudo_mnt
  erofs: refine managed cache operations to folios
  erofs: convert z_erofs_submissionqueue_endio() to folios
  erofs: convert z_erofs_fill_bio_vec() to folios
  erofs: get rid of `justfound` debugging tag
  erofs: convert z_erofs_do_read_page() to folios
  erofs: convert z_erofs_onlinepage_.* to folios

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity update from Eric Biggers:
"Slightly improve data verification performance by eliminating an
unnecessary lock"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fsverity: remove hash page spin lock

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux

Pull fscrypt updates from Eric Biggers:
"Fix flakiness in a test by releasing the quota synchronously when a
  key is removed, and other minor cleanups"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  fscrypt: shrink the size of struct fscrypt_inode_info slightly
  fscrypt: write CBC-CTS instead of CTS-CBC
  fscrypt: clear keyring before calling key_put()
  fscrypt: explicitly require that inode->i_blkbits be set

assoc_array: fix the return value in assoc_array_insert_mid_shortcut()

Returning the edit variable is redundant because it is dereferenced right
before it is returned. It would be better to return true.

Found by Linux Verification Center (linuxtesting.org) with Svace.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Roman Smirnov <[email protected]>
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

buildid: use kmap_local_page()

Use kmap_local_page() instead of kmap_atomic() which has been deprecated.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peng Hao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

watchdog/core: remove sysctl handlers from public header

The functions are only used in the file where they are defined. Remove
them from the header and make them static.

Also guard proc_soft_watchdog with a #define-guard as it is not used
otherwise.

Link: https://lkml.kernel.org/r/20240306-const-sysctl-prep-watchdog-v1-1-bd45da3a41cf@weissschuh.net
Signed-off-by: Thomas Weißschuh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: use div64_ul() instead of do_div()

Fixes Coccinelle/coccicheck warnings reported by do_div.cocci.

Compared to do_div(), div64_ul() does not implicitly cast the divisor and
does not unnecessarily calculate the remainder.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Thorsten Blum <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mul_u64_u64_div_u64: increase precision by conditionally swapping a and b

As indicated in the added comment, the algorithm works better if b is big.
As multiplication is commutative, a and b can be swapped. Do this if a
is bigger than b.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Uwe Kleine-König <[email protected]>
Tested-by: Biju Das <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements

Now that run_vmtests.sh does not guarantee that the correct hugepage count
is available, skip the hugetlb-madvise test if the requirements are not
met rather than failing.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nico Pache <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Muhammad Usama Anjum <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/mm: skip uffd hugetlb tests with insufficient hugepages

Now that run_vmtests.sh does not guarantee that the correct hugepage count
is available, add a check inside the userfaultfd hugetlb test to verify
the nr_hugepages count before continuing.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nico Pache <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Muhammad Usama Anjum <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

selftests/mm: dont fail testsuite due to a lack of hugepages

Patch series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests", v2.

This series addresses issues related to hugepage requirements in the MM
selftests, ensuring tests are skipped rather than failing when the
necessary hugepage count is not met.

This adjustment allows for a more graceful handling for systems with
insufficient hugepages, preventing unnecessary test failures and improving
the overall robustness of the test suite.

This patch (of 3):

On systems that have large core counts and large page sizes, but limited
memory, the userfaultfd test hugepage requirement is too large.

Exiting early due to missing one test's requirements is a rather
aggressive strategy, and prevents a lot of other tests from running.
Remove the early exit to prevent this.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ee00479d6702 ("selftests: vm: Try harder to allocate huge pages")
Signed-off-by: Nico Pache <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Muhammad Usama Anjum <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/huge_memory: skip invalid debugfs new_order input for folio split

User can put arbitrary new_order via debugfs for folio split test.
Although new_order check is added to split_huge_page_to_list_order() in
the prior commit, these two additional checks can avoid unnecessary folio
locking and split_folio_to_order() calls.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zi Yan <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/linux-mm/[email protected]/
Cc: David Hildenbrand <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/huge_memory: check new folio order when split a folio

A folio can only be split into lower orders.

Since there are no new_order checks in debugfs, any new_order can be
passed via debugfs into split_huge_page_to_list_to_order().

Check new_order to make sure it is smaller than input folio order.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Zi Yan <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/linux-mm/[email protected]/
Cc: David Hildenbrand <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure

With cache_trim_mode on, reclaim logic doesn't bother reclaiming anon
pages.  However, it should be more careful to use the mode because it's
going to prevent anon pages from being reclaimed even if there are a huge
number of anon pages that are cold and should be reclaimed.  Even worse,
that leads kswapd_failures to reach MAX_RECLAIM_RETRIES and stopping
kswapd from functioning until direct reclaim eventually works to resume
kswapd.

So kswapd needs to retry its scan priority loop with cache_trim_mode off
again if the mode doesn't work for reclaim.

The problematic behavior can be reproduced by:

   CONFIG_NUMA_BALANCING enabled
   sysctl_numa_balancing_mode set to NUMA_BALANCING_MEMORY_TIERING
   numa node0 (8GB local memory, 16 CPUs)
   numa node1 (8GB slow tier memory, no CPUs)

   Sequence:

   1) echo 3 > /proc/sys/vm/drop_caches
   2) To emulate the system with full of cold memory in local DRAM, run
      the following dummy program and never touch the region:

         mmap(0, 8 * 1024 * 1024 * 1024, PROT_READ | PROT_WRITE,
              MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, -1, 0);

   3) Run any memory intensive work e.g. XSBench.
   4) Check if numa balancing is working e.i. promotion/demotion.
   5) Iterate 1) ~ 4) until numa balancing stops.

With this, you could see that promotion/demotion are not working because
kswapd has stopped due to ->kswapd_failures >= MAX_RECLAIM_RETRIES.

Interesting vmstat delta's differences between before and after are like:

   +-----------------------+-------------------------------+
   | interesting vmstat    | before        | after         |
   +-----------------------+-------------------------------+
   | nr_inactive_anon      | 321935        | 1664772       |
   | nr_active_anon        | 1780700       | 437834        |
   | nr_inactive_file      | 30425         | 40882         |
   | nr_active_file        | 14961         | 3012          |
   | pgpromote_success     | 356           | 1293122       |
   | pgpromote_candidate   | 21953245      | 1824148       |
   | pgactivate            | 1844523       | 3311907       |
   | pgdeactivate          | 50634         | 1554069       |
   | pgfault               | 31100294      | 6518806       |
   | pgdemote_kswapd       | 30856         | 2230821       |
   | pgscan_kswapd         | 1861981       | 7667629       |
   | pgscan_anon           | 1822930       | 7610583       |
   | pgscan_file           | 39051         | 57046         |
   | pgsteal_anon          | 386           | 2192033       |
   | pgsteal_file          | 30470         | 38788         |
   | pageoutrun            | 30            | 412           |
   | numa_hint_faults      | 27418279      | 2875955       |
   | numa_pages_migrated   | 356           | 1293122       |
   +-----------------------+-------------------------------+

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Byungchul Park <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: add an explicit smp_wmb() to UFFDIO_CONTINUE

Users of UFFDIO_CONTINUE may reasonably assume that a write memory barrier
is included as part of UFFDIO_CONTINUE. That is, a user may believe that
all writes it has done to a page that it is now UFFDIO_CONTINUE'ing are
guaranteed to be visible to anyone subsequently reading the page through
the newly mapped virtual memory region.

Today, such a user happens to be correct. mmget_not_zero(), for example,
is called as part of UFFDIO_CONTINUE (and comes before any PTE updates),
and it implicitly gives us a write barrier.

To be resilient against future changes, include an explicit smp_wmb().
While we're at it, optimize the smp_wmb() that is already incidentally
present for the HugeTLB case.

Merely making a syscall does not generally imply the memory ordering
constraints that we need (including on x86).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: James Houghton <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: fix list corruption in put_pages_list

My recent change to put_pages_list() dereferences folio->lru.next after
returning the folio to the page allocator. Usually this is now on the pcp
list with other free folios, so we try to free an already-free folio.
This only happens with lists that have more than 15 entries, so it wasn't
immediately discovered. Revert to using list_for_each_safe() so we
dereference lru.next before disposing of the folio.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 24835f899c01 ("mm: use free_unref_folios() in put_pages_list()")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reported-by: "Borah, Chaitanya Kumar" <[email protected]>
Closes: https://lore.kernel.org/intel-gfx/SJ1PR11MB61292145F3B79DA58ADDDA63B9232@SJ1PR11MB6129.namprd11.prod.outlook.com/
Signed-off-by: Andrew Morton <[email protected]>

mm: remove folio from deferred split list before uncharging it

When freeing a large folio, we must remove it from the deferred split list
before we uncharge it as each memcg has its own deferred split list (with
associated lock) and removing a folio from the deferred split list while
holding the wrong lock will corrupt that list and cause various related
problems.

Link: https://lore.kernel.org/linux-mm/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: f77171d241e3 (mm: allow non-hugetlb large folios to be batch processed)
Fixes: 29f3843026cf (mm: free folios directly in move_folios_to_lru())
Fixes: bc2ff4cbc329 (mm: free folios in a batch in shrink_folio_list())
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Debugged-by: Ryan Roberts <[email protected]>
Tested-by: Ryan Roberts <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Merge tag 'affs-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull affs update from David Sterba:
"One change to AFFS that removes use of SLAB_MEM_SPREAD, which is going
to be removed from MM code"

* tag 'affs-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
affs: remove SLAB_MEM_SPREAD flag usage

cxl/region: Add memory hotplug notifier for cxl region

When the CXL region is formed, the driver computes the performance data
for the region. However this data is not available at the node data
collection that has been populated by the HMAT during kernel
initialization. Add a memory hotplug notifier to update the access
coordinates to the 'struct memory_target' context kept by the
HMAT_REPORTING code.

Add CXL_CALLBACK_PRI for a memory hotplug callback priority. Set the
priority number to be called before HMAT_CALLBACK_PRI. The CXL update must
happen before hmat_callback().

A new HMAT_REPORTING helper hmat_update_target_coordinates() is added in
order to allow CXL to update the memory_target access coordinates.

A new ext_updated member is added to the memory_target to indicate that
the access coordinates within the memory_target has been updated by an
external agent such as CXL. This prevents data being overwritten by the
hmat_update_target_attrs() triggered by hmat_callback().

Cc: Andrew Morton <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Reviewed-by: Huang, Ying <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

cxl/region: Add sysfs attribute for locality attributes of CXL regions

Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
region. The bandwidth is the aggregated bandwidth of all devices that
contribute to the CXL region. The latency is the worst latency of the
device amongst all the devices that contribute to the CXL region.

Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

cxl/region: Calculate performance data for a region

Calculate and store the performance data for a CXL region. Find the worst
read and write latency for all the included ranges from each of the devices
that attributes to the region and designate that as the latency data. Sum
all the read and write bandwidth data for each of the device region and
that is the total bandwidth for the region.

The perf list is expected to be constructed before the endpoint decoders
are registered and thus there should be no early reading of the entries
from the region assemble action. The calling of the region qos calculate
function is under the protection of cxl_dpa_rwsem and will ensure that
all DPA associated work has completed.

Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

cxl: Set cxlmd->endpoint before adding port device

Move setting of cxlmd->endpoint to before calling add_device() on the port
device. Otherwise when referencing cxlmd->endpoint in region discovery code
that is triggered by the port driver probe function, the endpoint port
pointer is not valid.

Current code does not hit this issue yet since cxlmd->endpoint is not being
referenced during region discovery. However follow on code that does
performance calculations will.

Tested-by: Wonjae Lee <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

cxl: Move QoS class to be calculated from the nearest CPU

Retrieve the qos_class (QTG ID) using the access coordinates from the
nearest CPU rather than the nearst initiator that may not be a CPU.
This may be the more appropriate number that applications care about.

For most cases, access0 and access1 have the same values.

Link: https://lore.kernel.org/linux-cxl/[email protected]/
Suggested-by: Jonathan Cameron <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

cxl: Split out host bridge access coordinates

The difference between access class 0 and access class 1 for 'struct
access_coordinate', if any, is that class 0 is for the distance from
the target to the closest initiator and that class 1 is for the distance
from the target to the closest CPU. For CXL memory, the nearest initiator
may not necessarily be a CPU node. The performance path from the CXL
endpoint to the host bridge should remain the same. However, the numbers
extracted and stored from HMAT is the difference for the two access
classes. Split out the performance numbers for the host bridge (generic
target) from the calculation of the entire path in order to allow
calculation of both access classes for a CXL region.

Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

cxl: Split out combine_coordinates() for common shared usage

Refactor the common code of combining coordinates in order to reduce code.
Create a new function cxl_cooordinates_combine() it combine two 'struct
access_coordinate'.

Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access classes

Update acpi_get_genport_coordinates() to allow retrieval of both access
classes of the 'struct access_coordinate' for a generic target. The update
will allow CXL code to compute access coordinates for both access class.

Cc: Rafael J. Wysocki <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

ACPI: HMAT: Introduce 2 levels of generic port access class

In order to compute access0 and access1 classes for CXL memory, 2 levels
of generic port information must be stored. Access0 will indicate the
generic port access coordinates to the closest initiator and access1
will indicate the generic port access coordinates to the cloest CPU.

Cc: Rafael J. Wysocki <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

base/node / ACPI: Enumerate node access class for 'struct access_coordinate'

Both generic node and HMAT handling code have been using magic numbers to
indicate access classes for 'struct access_coordinate'. Introduce enums to
enumerate the access0 and access1 classes shared by the two subsystems.
Update the function parameters and callers as appropriate to utilize the
new enum.

Access0 is named to ACCESS_COORDINATE_LOCAL in order to indicate that the
access class is for 'struct access_coordinate' between a target node and
the nearest initiator node.

Access1 is named to ACCESS_COORDINATE_CPU in order to indicate that the
access class is for 'struct access_coordinate' between a target node and
the nearest CPU node.

Cc: Greg Kroah-Hartman <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

ACPI: HMAT: Remove register of memory node for generic target

For generic targets, there's no reason to call
register_memory_node_under_compute_node() with the access levels that are
only visible to HMAT handling code. Only update the attributes and rename
hmat_register_generic_target_initiators() to hmat_update_generic_target().

The original call path ends up triggering register_memory_node_under_compute_node().
Although the access level would be "3" and not impact any current node arrays, it
introduces unwanted data into the numa node access_coordinate array.

Fixes: a3a3e341f169 ("acpi: numa: Add setting of generic port system locality attributes")
Cc: Rafael J. Wysocki <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Tested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>

Merge tag 'for-6.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
"Mostly stabilization, refactoring and cleanup changes. There rest are
  minor performance optimizations due to caching or lock contention
  reduction and a few notable fixes.

  Performance improvements:

   - minor speedup in logging when repeatedly allocated structure is
     preallocated only once, improves latency and decreases lock
     contention

   - minor throughput increase (+6%), reduced lock contention after
     clearing delayed allocation bits, applies to several common
     workload types

   - skip full quota rescan if a new relation is added in the same
     transaction

  Fixes:

   - zstd fix for inline compressed file in subpage mode, updated
     version from the 6.8 time

   - proper qgroup inheritance ioctl parameter validation

   - more fiemap followup fixes after reduced locking done in 6.8:
      - fix race when detecting delalloc ranges

  Core changes:

   - more debugging code:
      - added assertions for a very rare crash in raid56 calculation
      - tree-checker dumps page state to give more insights into
        possible reference counting issues

   - add checksum calculation offloading sysfs knob, for now enabled
     under DEBUG only to determine a good heuristic for deciding the
     offload or synchronous, depends on various factors (block group
     profile, device speed) and is not as clear as initially thought
     (checksum type)

   - error handling improvements, added assertions

   - more page to folio conversion (defrag, truncate), cached size and
     shift

   - preparation for more fine grained locking of sectors in subpage
     mode

   - cleanups and refactoring:
      - include cleanups, forward declarations
      - pointer-to-structure helpers
      - redundant argument removals
      - removed unused code
      - slab cache updates, last use of SLAB_MEM_SPREAD removed"

* tag 'for-6.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (114 commits)
  btrfs: reuse cloned extent buffer during fiemap to avoid re-allocations
  btrfs: fix race when detecting delalloc ranges during fiemap
  btrfs: fix off-by-one chunk length calculation at contains_pending_extent()
  btrfs: qgroup: allow quick inherit if snapshot is created and added to the same parent
  btrfs: qgroup: validate btrfs_qgroup_inherit parameter
  btrfs: include device major and minor numbers in the device scan notice
  btrfs: mark btrfs_put_caching_control() static
  btrfs: remove SLAB_MEM_SPREAD flag use
  btrfs: qgroup: always free reserved space for extent records
  btrfs: tree-checker: dump the page status if hit something wrong
  btrfs: compression: remove dead comments in btrfs_compress_heuristic()
  btrfs: subpage: make writer lock utilize bitmap
  btrfs: subpage: make reader lock utilize bitmap
  btrfs: unexport btrfs_subpage_start_writer() and btrfs_subpage_end_and_test_writer()
  btrfs: pass a valid extent map cache pointer to __get_extent_map()
  btrfs: merge btrfs_del_delalloc_inode() helpers
  btrfs: pass btrfs_device to btrfs_scratch_superblocks()
  btrfs: handle transaction commit errors in flush_reservations()
  btrfs: use KMEM_CACHE() to create btrfs_free_space cache
  btrfs: use KMEM_CACHE() to create delayed ref caches
  ...