Linus Torvalds [Wed, 13 Mar 2024 20:15:24 +0000 (13:15 -0700)]
Merge tag '6.9-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- fix for folios/netfs data corruption in cifs_extend_writeback
- additional tracepoint added
- updates for special files and symlinks: improvements to allow
selecting use of either WSL or NFS reparse point format on creating
special files
- allocation size improvement for cached files
- minor cleanup patches
- fix to allow changing the password on remount when password for the
session is expired.
- lease key related fixes: caching hardlinked files, deletes of
deferred close files, and an important fix to better reuse lease keys
for compound operations, which also can avoid lease break timeouts
when low on credits
- fix potential data corruption with write/readdir races
- compression cleanups and a fix for compression headers
* tag '6.9-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
cifs: update internal module version number for cifs.ko
smb: common: simplify compression headers
smb: common: fix fields sizes in compression_pattern_payload_v1
smb: client: negotiate compression algorithms
smb3: add dynamic trace point for ioctls
cifs: Fix writeback data corruption
smb: client: return reparse type in /proc/mounts
smb: client: set correct d_type for reparse DFS/DFSR and mount point
smb: client: parse uid, gid, mode and dev from WSL reparse points
smb: client: introduce SMB2_OP_QUERY_WSL_EA
smb: client: Fix a NULL vs IS_ERR() check in wsl_set_xattrs()
smb: client: add support for WSL reparse points
smb: client: reduce number of parameters in smb2_compound_op()
smb: client: fix potential broken compound request
smb: client: move most of reparse point handling code to common file
smb: client: introduce reparse mount option
smb: client: retry compound request without reusing lease
smb: client: do not defer close open handles to deleted files
smb: client: reuse file lease key in compound operations
smb3: update allocation size more accurately on write completion
...
Jens Axboe [Tue, 12 Mar 2024 21:58:41 +0000 (15:58 -0600)]
block: limit block time caching to in_task() context
We should not have any callers of this from non-task context, but Jakub
ran [1] into one from blk-iocost. Rather than risk running into others,
or future ones, just limit blk_time_get_ns() to when it is called from
a task. Any other usage is invalid.
Fixes: da4c8c3d0975 ("block: cache current nsec time in struct blk_plug") Reported-by: Jakub Kicinski <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
As Linus suggested this enables pidfs unconditionally. A key property to
retain is the ability to compare pidfds by inode number (cf. [1]).
That's extremely helpful just as comparing namespace file descriptors by
inode number is. They are used in a variety of scenarios where they need
to be compared, e.g., when receiving a pidfd via SO_PEERPIDFD from a
socket to trivially authenticate a the sender and various other
use-cases.
For 64bit systems this is pretty trivial to do. For 32bit it's slightly
more annoying as we discussed but we simply add a dumb ida based
allocator that gets used on 32bit. This gives the same guarantees about
inode numbers on 64bit without any overflow risk. Practically, we'll
never run into overflow issues because we're constrained by the number
of processes that can exist on 32bit and by the number of open files
that can exist on a 32bit system. On 64bit none of this matters and
things are very simple.
If 32bit also needs the uniqueness guarantee they can simply parse the
contents of /proc/<pid>/fd/<nr>. The uniqueness guarantees have a
variety of use-cases. One of the most obvious ones is that they will
make pidfiles (or "pidfdfiles", I guess) reliable as the unique
identifier can be placed into there that won't be reycled. Also a
frequent request.
Note, I took the chance and simplified path_from_stashed() even further.
Instead of passing the inode number explicitly to path_from_stashed() we
let the filesystem handle that internally. So path_from_stashed() ends
up even simpler than it is now. This is also a good solution allowing
the cleanup code to be clean and consistent between 32bit and 64bit. The
cleanup path in prepare_anon_dentry() is also switched around so we put
the inode before the dentry allocation. This means we only have to call
the cleanup handler for the filesystem's inode data once and can rely
->evict_inode() otherwise.
Aside from having to have a bit of extra code for 32bit it actually ends
up a nice cleanup for path_from_stashed() imho.
Tested on both 32 and 64bit including error injection.
Linus Torvalds [Wed, 13 Mar 2024 19:40:58 +0000 (12:40 -0700)]
Merge tag 'modules-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull modules updates from Luis Chamberlain:
"Christophe Leroy did most of the work on this release, first with a
few cleanups on CONFIG_STRICT_KERNEL_RWX and ending with error
handling for when set_memory_XX() can fail.
This is part of a larger effort to clean up all these callers which
can fail, modules is just part of it"
* tag 'modules-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: Don't ignore errors from set_memory_XX()
lib/test_kmod: fix kernel-doc warnings
powerpc: Simplify strict_kernel_rwx_enabled()
modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around rodata_enabled
init: Declare rodata_enabled and mark_rodata_ro() at all time
module: Change module_enable_{nx/x/ro}() to more explicit names
module: Use set_memory_rox()
Linus Torvalds [Wed, 13 Mar 2024 19:37:41 +0000 (12:37 -0700)]
Merge tag 'efi-next-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
- Measure initrd and command line using the CC protocol if the ordinary
TCG2 protocol is not implemented, typically on TDX confidential VMs
- Avoid creating mappings that are both writable and executable while
running in the EFI boot services. This is a prerequisite for getting
the x86 shim loader signed by MicroSoft again, which allows the
distros to install on x86 PCs that ship with EFI secure boot enabled.
- API update for struct platform_driver::remove()
* tag 'efi-next-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
virt: efi_secret: Convert to platform remove callback returning void
x86/efistub: Remap kernel text read-only before dropping NX attribute
efi/libstub: Add get_event_log() support for CC platforms
efi/libstub: Measure into CC protocol if TCG2 protocol is absent
efi/libstub: Add Confidential Computing (CC) measurement typedefs
efi/tpm: Use symbolic GUID name from spec for final events table
efi/libstub: Use TPM event typedefs from the TCG PC Client spec
Stephen Boyd [Wed, 13 Mar 2024 19:36:21 +0000 (12:36 -0700)]
Merge branches 'clk-samsung', 'clk-imx', 'clk-rockchip', 'clk-clkdev' and 'clk-rate-exclusive' into clk-next
- Increase dev_id len for clkdev lookups
* clk-samsung: (25 commits)
clk: samsung: Add CPU clock support for Exynos850
clk: samsung: Pass mask to wait_until_mux_stable()
clk: samsung: Keep register offsets in chip specific structure
clk: samsung: Keep CPU clock chip specific data in a dedicated struct
clk: samsung: Pass register layout type explicitly to CLK_CPU()
clk: samsung: Pass actual CPU clock registers base to CPU_CLK()
clk: samsung: Group CPU clock functions by chip
clk: samsung: Use single CPU clock notifier callback for all chips
clk: samsung: Reduce params count in exynos_register_cpu_clock()
clk: samsung: Pull struct exynos_cpuclk into clk-cpu.c
clk: samsung: Improve clk-cpu.c style
dt-bindings: clock: exynos850: Add CMU_CPUCLK0 and CMU_CPUCL1
clk: samsung: gs101: add support for cmu_peric1
clk: samsung: gs101: drop extra empty line
dt-bindings: clock: google,gs101-clock: add PERIC1 clock management unit
clk: samsung: exynos850: Propagate SPI IPCLK rate change
clk: samsung: gs101: gpio_peric0_pclk needs to be kept on
clk: samsung: exynos850: Add PDMA clocks
dt-bindings: clock: tesla,fsd: Fix spelling mistake
clk: samsung: gs101: add support for cmu_peric0
...
* clk-imx:
clk: imx: imx8mp: Fix SAI_MCLK_SEL definition
clk: imx: scu: Use common error handling code in imx_clk_scu_alloc_dev()
clk: imx: composite-8m: Delete two unnecessary initialisations in __imx8m_clk_hw_composite()
clk: imx: composite-8m: Less function calls in __imx8m_clk_hw_composite() after error detection
* clk-rockchip:
clk: rockchip: rk3399: Allow to set rate of clk_i2s0_frac's parent
clk: rockchip: rk3588: use linked clock ID for GATE_LINK
clk: rockchip: rk3588: fix indent
clk: rockchip: rk3588: fix pclk_vo0grf and pclk_vo1grf
dt-bindings: clock: rk3588: add missing PCLK_VO1GRF
dt-bindings: clock: rk3588: drop CLK_NR_CLKS
clk: rockchip: rk3588: fix CLK_NR_CLKS usage
clk: rockchip: rk3568: Add PLL rate for 128MHz
* clk-clkdev:
clkdev: Update clkdev id usage to allow for longer names
* clk-rate-exclusive:
clk: Add a devm variant of clk_rate_exclusive_get()
Stephen Boyd [Wed, 13 Mar 2024 19:33:44 +0000 (12:33 -0700)]
Merge branches 'clk-renesas', 'clk-cleanup', 'clk-hisilicon', 'clk-mediatek' and 'clk-bulk' into clk-next
- Add a devm_clk_bulk_get_all_enable() API to get and enable all clks
for a device
- Fix some static checker errors in the hisilicon clk driver
* clk-renesas: (25 commits)
clk: renesas: r8a779h0: Add RPC-IF clock
clk: renesas: r8a779h0: Add SYS-DMAC clocks
clk: renesas: r8a779h0: Add SDHI clock
clk: renesas: r8a779h0: Add EtherAVB clocks
clk: renesas: r9a07g04[34]: Fix typo for sel_shdi variable
clk: renesas: r9a07g04[34]: Use SEL_SDHI1_STS status configuration for SD1 mux
clk: renesas: r8a779f0: Correct PFC/GPIO parent clock
clk: renesas: r8a779g0: Correct PFC/GPIO parent clocks
clk: renesas: r8a779h0: Add I2C clocks
clk: renesas: r8a779h0: Add watchdog clock
clk: renesas: r8a779h0: Add PFC/GPIO clocks
clk: renesas: r8a779g0: Fix PCIe clock name
clk: renesas: cpg-mssr: Add support for R-Car V4M
clk: renesas: rcar-gen4: Add support for FRQCRC1
clk: renesas: r9a07g043: Add clock and reset entries for CRU
clk: renesas: r9a08g045: Add clock and reset support for watchdog
dt-bindings: clock: Add R8A779H0 V4M CPG Core Clock Definitions
dt-bindings: clock: renesas,cpg-mssr: Document R-Car V4M support
dt-bindings: power: Add r8a779h0 SYSC power domain definitions
dt-bindings: power: renesas,rcar-sysc: Document R-Car V4M support
...
* clk-cleanup:
clk: zynq: Prevent null pointer dereference caused by kmalloc failure
clk: fractional-divider: Use bit operations consistently
clk: fractional-divider: Move mask calculations out of lock
clk: ti: dpll3xxx: use correct function names in kernel-doc
clk: clocking-wizard: Remove redundant initialization of pointer div_addr
clk: keystone: sci-clk: match func name comment to actual
clk: cdce925: Remove redundant assignment to variable 'rate'
MAINTAINERS: drop Sekhar Nori
* clk-hisilicon:
clk: hisilicon: Use devm_kcalloc() instead of devm_kzalloc()
clk: hisilicon: hi3559a: Fix an erroneous devm_kfree()
clk: hisilicon: hi3519: Release the correct number of gates in hi3519_clk_unregister()
* clk-mediatek:
clk: mediatek: clk-mt8173-apmixedsys: Use common error handling code in clk_mt8173_apmixed_probe()
clk: mediatek: add infracfg reset controller for mt7988
dt-bindings: reset: mediatek: add MT7988 infracfg reset IDs
dt-bindings: clock: mediatek: convert SSUSBSYS to the json-schema clock
dt-bindings: clock: mediatek: convert PCIESYS to the json-schema clock
dt-bindings: clock: mediatek: convert hifsys to the json-schema clock
clk: mediatek: mt7981-topckgen: flag SGM_REG_SEL as critical
clk: mediatek: mt8183: Correct parent of CLK_INFRA_SSPM_32K_SELF
clk: mediatek: mt7622-apmixedsys: Fix an error handling path in clk_mt8135_apmixed_probe()
clk: mediatek: mt8135: Fix an error handling path in clk_mt8135_apmixed_probe()
* clk-bulk:
clk: Provide managed helper to get and enable bulk clocks
Linus Torvalds [Wed, 13 Mar 2024 19:23:36 +0000 (12:23 -0700)]
Merge tag 'mailbox-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox
Pull mailbox updates from Jassi Brar:
- imx: add support for i.MX95 ELE/V2X MU
- misc: I will be signing-off from my personal gmail id from now on
* tag 'mailbox-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
mailbox: imx: support i.MX95 Generic/ELE/V2X MU
mailbox: imx: populate sub-nodes
mailbox: imx: get RR/TR registers num from Parameter register
mailbox: imx: support return value of init
dt-bindings: mailbox: fsl,mu: add i.MX95 Generic/ELE/V2X MU compatible
Barry Song [Thu, 22 Feb 2024 08:11:35 +0000 (21:11 +1300)]
mm/zswap: remove the memcpy if acomp is not sleepable
Most compressors are actually CPU-based and won't sleep during compression
and decompression. We should remove the redundant memcpy for them.
This patch checks if the algorithm is sleepable by testing the
CRYPTO_ALG_ASYNC algorithm flag.
Generally speaking, async and sleepable are semantically similar but not
equal. But for compress drivers, they are basically equal at least due to
the below facts.
Firstly, scompress drivers - crypto/deflate.c, lz4.c, zstd.c, lzo.c etc
have no sleep. Secondly, zRAM has been using these scompress drivers for
years in atomic contexts, and never worried those drivers going to sleep.
One exception is that an async driver can sometimes still return
synchronously per Herbert's clarification. In this case, we are still
having a redundant memcpy. But we can't know if one particular acomp
request will sleep or not unless crypto can expose more details for each
specific request from offload drivers.
Barry Song [Thu, 22 Feb 2024 08:11:34 +0000 (21:11 +1300)]
crypto: introduce: acomp_is_async to expose if comp drivers might sleep
acomp's users might want to know if acomp is really async to optimize
themselves. One typical user which can benefit from exposed async stat is
zswap.
In zswap, zsmalloc is the most commonly used allocator for (and perhaps
the only one). For zsmalloc, we cannot sleep while we map the compressed
memory, so we copy it to a temporary buffer. By knowing the alg won't
sleep can help zswap to avoid the need for a buffer. This shows
noticeable improvement in load/store latency of zswap.
Barry Song [Fri, 8 Mar 2024 09:27:21 +0000 (22:27 +1300)]
mm: prohibit the last subpage from reusing the entire large folio
In a Copy-on-Write (CoW) scenario, the last subpage will reuse the entire
large folio, resulting in the waste of (nr_pages - 1) pages. This wasted
memory remains allocated until it is either unmapped or memory reclamation
occurs.
The following small program can serve as evidence of this behavior
For example, using a 1024KiB mTHP by:
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled
(1) w/o the patch, it takes 2GiB,
Before running the test program,
/ # free -m
total used free shared buff/cache available
Mem: 5754 84 5692 0 17 5669
Swap: 0 0 0
/ # /a.out &
/ # done
After running the test program,
/ # free -m
total used free shared buff/cache available
Mem: 5754 2149 3627 0 19 3605
Swap: 0 0 0
(2) w/ the patch, it takes 1GiB only,
Before running the test program,
/ # free -m
total used free shared buff/cache available
Mem: 5754 89 5687 0 17 5664
Swap: 0 0 0
/ # /a.out &
/ # done
After running the test program,
/ # free -m
total used free shared buff/cache available
Mem: 5754 1122 4655 0 17 4632
Swap: 0 0 0
This patch migrates the last subpage to a small folio and immediately
returns the large folio to the system. It benefits both memory availability
and anti-fragmentation.
Peter Xu [Mon, 11 Mar 2024 16:10:45 +0000 (12:10 -0400)]
mm: recover pud_leaf() definitions in nopmd case
This reverts one change in commit 924bd6a8c967 ("mm/x86: drop two
unnecessary pud_leaf() definitions").
One issue with that is it broke nopmd builds for at least both arm64 and
riscv (CONFIG_PGTABLE_LEVELS=2). The other issue is it was overlooked that
it's a common change rather than x86 specific (relevant to the commit
message of the commit).
Normally there's no need for empty definition of pXd_leaf() because of the
fallback functions, however this logic may not apply to pgtable-nopmd.h,
because that's a header that can even be used by arch *pgtable.h headers,
which can use the *_leaf() definitions _before_ the fallback functions are
defined. Leave it there to pass PGTABLE_LEVELS=2 builds.
Linus Torvalds [Wed, 13 Mar 2024 19:03:57 +0000 (12:03 -0700)]
Merge tag 'thermal-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These mostly change the thermal core in a few ways allowing thermal
drivers to be simplified, in particular in their removal and failing
probe handling parts that are notoriously prone to errors, and
propagate the changes to several drivers.
Apart from that, support for a new platform is added (Intel Lunar
Lake-M), some bugs are fixed and some code is cleaned up, as usual.
Specifics:
- Store zone trips table and zone operations directly in struct
thermal_zone_device (Rafael Wysocki)
- Fix up flex array initialization during thermal zone device
registration (Nathan Chancellor)
- Rework writable trip points handling in the thermal core and
several drivers (Rafael Wysocki)
- Use thermal zone accessor functions in the int340x Intel thermal
driver (Rafael Wysocki)
- Add Lunar Lake-M PCI ID to the int340x Intel thermal driver
(Srinivas Pandruvada)
- Minor fixes for thermal governors (Rafael Wysocki, Di Shen)
- Trip point handling fixes for the iwlwifi wireless driver (Rafael
Wysocki)
- Code cleanups (Rafael J. Wysocki, AngeloGioacchino Del Regno)"
* tag 'thermal-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (29 commits)
thermal: core: remove unnecessary check in trip_point_hyst_store()
thermal: intel: int340x_thermal: Use thermal zone accessor functions
thermal: core: Remove excess empty line from a comment
thermal: int340x: processor_thermal: Add Lunar Lake-M PCI ID
thermal: core: Eliminate writable trip points masks
thermal: of: Set THERMAL_TRIP_FLAG_RW_TEMP directly
thermal: imx: Set THERMAL_TRIP_FLAG_RW_TEMP directly
wifi: iwlwifi: mvm: Set THERMAL_TRIP_FLAG_RW_TEMP directly
mlxsw: core_thermal: Set THERMAL_TRIP_FLAG_RW_TEMP directly
thermal: intel: Set THERMAL_TRIP_FLAG_RW_TEMP directly
thermal: core: Drop the .set_trip_hyst() thermal zone operation
thermal: core: Add flags to struct thermal_trip
thermal: core: Move initial num_trips assignment before memcpy()
thermal: Get rid of CONFIG_THERMAL_WRITABLE_TRIPS
thermal: intel: Adjust ops handling during thermal zone registration
thermal: ACPI: Constify acpi_thermal_zone_ops
thermal: core: Store zone ops in struct thermal_zone_device
thermal: intel: Discard trip tables after zone registration
thermal: ACPI: Discard trips table after zone registration
thermal: core: Store zone trips table in struct thermal_zone_device
...
Linus Torvalds [Wed, 13 Mar 2024 18:54:05 +0000 (11:54 -0700)]
Merge tag 'acpi-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These modify the ACPI device events and processor enumeration code to
take the 'enabled' _STA bit into account as mandated by the ACPI
specification, convert several platform drivers to using a remove
callback that returns void, add some new quirks for ACPI IRQ override
and other things, address assorted issues and clean up code.
Specifics:
- Rearrange Device Check and Bus Check notification handling in the
ACPI device hotplug code to make it get the "enabled" _STA bit into
account (Rafael Wysocki)
- Modify acpi_processor_add() to skip processors with the "enabled"
_STA bit clear, as per the specification (Rafael Wysocki)
- Stop failing Device Check notification handling without a valid
reason (Rafael Wysocki)
- Defer enumeration of devices that depend on a device with an ACPI
device ID equalt to INTC10CF to address probe ordering issues on
some platforms (Wentong Wu)
- Constify acpi_bus_type (Ricardo Marliere)
- Make the ACPI-specific suspend-to-idle code take the Low-Power S0
Idle MSFT UUID into account on non-AMD systems (Rafael Wysocki)
- Add ACPI IRQ override quirks for some new platforms (Sergey
Kalinichev, Maxim Kudinov, Alexey Froloff, Sviatoslav Harasymchuk,
Nicolas Haye)
- Make the NFIT parsing code use acpi_evaluate_dsm_typed() (Andy
Shevchenko)
- Fix a memory leak in acpi_processor_power_exit() (Armin Wolf)
- Make it possible to quirk the CSI-2 and MIPI DisCo for Imaging
properties parsing and add a quirk for Dell XPS 9315 (Sakari Ailus)
- Prevent false-positive static checker warnings from triggering by
intializing some variables in the ACPI thermal code to zero (Colin
Ian King)
- Add DELL0501 handling to acpi_quirk_skip_serdev_enumeration() and
make that function generic (Hans de Goede)
- Make the ACPI backlight code handle fetching EDID that is longer
than 256 bytes (Mario Limonciello)
- Skip initialization of GHES_ASSIST structures for Machine Check
Architecture in APEI (Avadhut Naik)
- Convert several plaform drivers in the ACPI subsystem to using a
remove callback that returns void (Uwe Kleine-König)
- Drop the long-deprecated custom_method debugfs interface that is
problematic from the security standpoint (Rafael Wysocki)
- Use %pe in a couple of places in the ACPI code for easier error
decoding (Onkarnath)
- Fix register width information handling during system memory
accesses in the ACPI CPPC library (Jarred White)
- Add AMD CPPC V2 support for family 17h processors to the ACPI CPPC
library (Perry Yuan)"
* tag 'acpi-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits)
ACPI: resource: Use IRQ override on Maibenben X565
ACPI: CPPC: Use access_width over bit_width for system memory accesses
ACPI: CPPC: enable AMD CPPC V2 support for family 17h processors
ACPI: APEI: Skip initialization of GHES_ASSIST structures for Machine Check Architecture
ACPI: scan: Consolidate Device Check and Bus Check notification handling
ACPI: scan: Rework Device Check and Bus Check notification handling
ACPI: scan: Make acpi_processor_add() check the device enabled bit
ACPI: scan: Relocate acpi_bus_trim_one()
ACPI: scan: Fix device check notification handling
ACPI: resource: Add MAIBENBEN X577 to irq1_edge_low_force_override
ACPI: pfr_update: Convert to platform remove callback returning void
ACPI: pfr_telemetry: Convert to platform remove callback returning void
ACPI: fan: Convert to platform remove callback returning void
ACPI: GED: Convert to platform remove callback returning void
ACPI: DPTF: Convert to platform remove callback returning void
ACPI: AGDI: Convert to platform remove callback returning void
ACPI: TAD: Convert to platform remove callback returning void
ACPI: APEI: GHES: Convert to platform remove callback returning void
ACPI: property: Polish ignoring bad data nodes
ACPI: thermal_lib: Initialize temp_decik to zero
...
Linus Torvalds [Wed, 13 Mar 2024 18:40:06 +0000 (11:40 -0700)]
Merge tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"From the functional perspective, the most significant change here is
the addition of support for Energy Models that can be updated
dynamically at run time.
There is also the addition of LZ4 compression support for hibernation,
the new preferred core support in amd-pstate, new platforms support in
the Intel RAPL driver, new model-specific EPP handling in intel_pstate
and more.
Apart from that, the cpufreq default transition delay is reduced from
10 ms to 2 ms (along with some related adjustments), the system
suspend statistics code undergoes a significant rework and there is a
usual bunch of fixes and code cleanups all over.
Specifics:
- Allow the Energy Model to be updated dynamically (Lukasz Luba)
- Add support for LZ4 compression algorithm to the hibernation image
creation and loading code (Nikhil V)
- Fix and clean up system suspend statistics collection (Rafael
Wysocki)
- Simplify device suspend and resume handling in the power management
core code (Rafael Wysocki)
- Fix PCI hibernation support description (Yiwei Lin)
- Make hibernation take set_memory_ro() return values into account as
appropriate (Christophe Leroy)
- Set mem_sleep_current during kernel command line setup to avoid an
ordering issue with handling it (Maulik Shah)
- Fix wake IRQs handling when pm_runtime_force_suspend() is used as a
driver's system suspend callback (Qingliang Li)
- Simplify pm_runtime_get_if_active() usage and add a replacement for
pm_runtime_put_autosuspend() (Sakari Ailus)
- Add a tracepoint for runtime_status changes tracking (Vilas Bhat)
- Fix section title markdown in the runtime PM documentation (Yiwei
Lin)
- Enable preferred core support in the amd-pstate cpufreq driver
(Meng Li)
- Fix min_perf assignment in amd_pstate_adjust_perf() and make the
min/max limit perf values in amd-pstate always stay within the
(highest perf, lowest perf) range (Tor Vic, Meng Li)
- Allow intel_pstate to assign model-specific values to strings used
in the EPP sysfs interface and make it do so on Meteor Lake
(Srinivas Pandruvada)
- Drop long-unused cpudata::prev_cummulative_iowait from the
intel_pstate cpufreq driver (Jiri Slaby)
- Prevent scaling_cur_freq from exceeding scaling_max_freq when the
latter is an inefficient frequency (Shivnandan Kumar)
- Change default transition delay in cpufreq to 2ms (Qais Yousef)
- Remove references to 10ms minimum sampling rate from comments in
the cpufreq code (Pierre Gondois)
- Honour transition_latency over transition_delay_us in cpufreq (Qais
Yousef)
- Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar)
- General enhancements / cleanups to ARM cpufreq drivers (tianyu2,
NÃcolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia
Belova)
- Update cpufreq-dt-platdev to block/approve devices (Richard Acayan)
- Make the SCMI cpufreq driver get a transition delay value from
firmware (Pierre Gondois)
- Prevent the haltpoll cpuidle governor from shrinking guest
poll_limit_ns below grow_start (Parshuram Sangle)
- Avoid potential overflow in integer multiplication when computing
cpuidle state parameters (C Cheng)
- Adjust MWAIT hint target C-state computation in the ACPI cpuidle
driver and in intel_idle to return a correct value for C0 (He
Rongguang)
- Address multiple issues in the TPMI RAPL driver and add support for
new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui)
- Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel
Lezcano)
- Fix kernel-doc for dtpm_create_hierarchy() (Yang Li)
- Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth
Norway Ananda)
- Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil)
- Fix a couple of warnings in the OPP core code related to W=1 builds
(Viresh Kumar)
- Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh
Kumar)
- Extend dev_pm_opp_data with turbo support (Sibi Sankar)
- dt-bindings: drop maxItems from inner items (David Heidelberg)"
* tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (95 commits)
dt-bindings: opp: drop maxItems from inner items
OPP: debugfs: Fix warning around icc_get_name()
OPP: debugfs: Fix warning with W=1 builds
cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h
OPP: Extend dev_pm_opp_data with turbo support
Fix cpupower-frequency-info.1 man page typo
cpufreq: scmi: Set transition_delay_us
firmware: arm_scmi: Populate fast channel rate_limit
firmware: arm_scmi: Populate perf commands rate_limit
cpuidle: ACPI/intel: fix MWAIT hint target C-state computation
PM: sleep: wakeirq: fix wake irq warning in system suspend
powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function
cpufreq: Don't unregister cpufreq cooling on CPU hotplug
PM: suspend: Set mem_sleep_current during kernel command line setup
cpufreq: Honour transition_latency over transition_delay_us
cpufreq: Limit resolving a frequency to policy min/max
Documentation: PM: Fix runtime_pm.rst markdown syntax
cpufreq: amd-pstate: adjust min/max limit perf
cpufreq: Remove references to 10ms min sampling rate
cpufreq: intel_pstate: Update default EPPs for Meteor Lake
...
Linus Torvalds [Wed, 13 Mar 2024 18:33:10 +0000 (11:33 -0700)]
Merge tag 'pmdomain-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain updates from Ulf Hansson:
"Core:
- Log a message when unused PM domains gets disabled
- Scale down parent/child performance states in the reverse order
Providers:
- qcom: rpmpd: Add power domains support for MSM8974, MSM8974PRO,
PMA8084 and PM8841
- renesas: rcar-gen4-sysc: Reduce atomic delays
- renesas: rcar-sysc: Adjust the waiting time to cover the worst case
- renesas: r8a779h0-sysc: Add support for the r8a779h0 PM domains
- imx: imx8mp-blk-ctrl: Add the fdcc clock to the hdmimix domains
- imx: imx8mp-blk-ctrl: Error out if domains are missing in DT
Improve support for multiple PM domains:
- Add two helper functions to attach/detach multiple PM domains
- Convert a couple of drivers to use the new helper functions"
* tag 'pmdomain-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: (22 commits)
pmdomain: renesas: rcar-gen4-sysc: Reduce atomic delays
pmdomain: renesas: Adjust the waiting time to cover the worst case
pmdomain: qcom: rpmpd: Add MSM8974PRO+PMA8084 power domains
pmdomain: qcom: rpmpd: Add MSM8974+PM8841 power domains
pmdomain: core: constify of_phandle_args in add device and subdomain
pmdomain: core: constify of_phandle_args in xlate
media: venus: Convert to dev_pm_domain_attach|detach_list() for vcodec
remoteproc: qcom_q6v5_adsp: Convert to dev_pm_domain_attach|detach_list()
remoteproc: imx_rproc: Convert to dev_pm_domain_attach|detach_list()
remoteproc: imx_dsp_rproc: Convert to dev_pm_domain_attach|detach_list()
PM: domains: Add helper functions to attach/detach multiple PM domains
pmdomain: imx8mp-blk-ctrl: imx8mp_blk: Add fdcc clock to hdmimix domain
pmdomain: mediatek: Use devm_platform_ioremap_resource() in init_scp()
pmdomain: renesas: r8a779h0-sysc: Add r8a779h0 support
pmdomain: imx8mp-blk-ctrl: Error out if domains are missing in DT
pmdomain: ti: Add a null pointer check to the omap_prm_domain_init
pmdomain: renesas: rcar-gen4-sysc: Remove unneeded includes
pmdomain: core: Print a message when unused power domains are disabled
pmdomain: qcom: rpmpd: Keep one RPM handle for all RPMPDs
pmdomain: core: Scale down parent/child performance states in reverse order
...
Linus Torvalds [Wed, 13 Mar 2024 18:26:58 +0000 (11:26 -0700)]
Merge tag 'hwmon-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon updates from Guenter Roeck:
"New drivers:
- Amphenol ChipCap 2
- ASPEED g6 PWM/Fan tach
- Astera Labs PT5161L retimer
- ASUS ROG RYUJIN II 360 AIO cooler
- LTC4282
- Microsoft Surface devices
- MPS MPQ8785 Synchronous Step-Down Converter
- NZXT Kraken X and Z series AIO CPU coolers
Additional chip support in existing drivers:
- Ayaneo Air Plus 7320u (oxp-sensors)
- INA260 (ina2xx)
- XPS 9315 (dell-smm)
- MSI customer ID (nct6683)
Devicetree bindings updates:
- Common schema for hardware monitoring devices
- Common schema for fans
- Update chip descriptions to use common schema
- Document regulator properties in several drivers
- Explicit bindings for infineon buck converters
Other improvements:
- Replaced rbtree with maple tree register cache in several drivers
- Added support for humidity min/max alarm and volatage fault
attributes to hwmon core
- Dropped non-functional I2C_CLASS_HWMON support for drivers w/o
detect()
- Dropped obsolete and redundant entried from MAINTAINERS
- Cleaned up axi-fan-control and coretemp drivers
- Minor fixes and improvements in several other drivers"
* tag 'hwmon-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (70 commits)
hwmon: (dell-smm) Add XPS 9315 to fan control whitelist
hwmon: (aspeed-g6-pwm-tacho): Support for ASPEED g6 PWM/Fan tach
dt-bindings: hwmon: Support Aspeed g6 PWM TACH Control
dt-bindings: hwmon: fan: Add fan binding to schema
dt-bindings: hwmon: tda38640: Add interrupt & regulator properties
hwmon: (amc6821) add of_match table
dt-bindings: hwmon: lm75: use common hwmon schema
hwmon: (sis5595) drop unused DIV_TO_REG function
dt-bindings: hwmon: reference common hwmon schema
dt-bindings: hwmon: lltc,ltc4286: use common hwmon schema
dt-bindings: hwmon: adi,adm1275: use common hwmon schema
dt-bindings: hwmon: ti,ina2xx: use common hwmon schema
dt-bindings: hwmon: add common properties
hwmon: (pmbus/ir38064) Use PMBUS_REGULATOR_ONE to declare regulator
hwmon: (pmbus/lm25066) Use PMBUS_REGULATOR_ONE to declare regulator
hwmon: (pmbus/tda38640) Use PMBUS_REGULATOR_ONE to declare regulator
regulator: dt-bindings: promote infineon buck converters to their own binding
dt-bindings: hwmon/pmbus: ti,lm25066: document regulators
dt-bindings: hwmon: nuvoton,nct6775: Add compatible value for NCT6799
MAINTAINERS: Drop redundant hwmon entries
...
Linus Torvalds [Wed, 13 Mar 2024 18:14:55 +0000 (11:14 -0700)]
Merge tag 'gpio-updates-for-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"The biggest feature is the locking overhaul. Up until now the
synchronization in the GPIO subsystem was broken. There was a single
spinlock "protecting" multiple data structures but doing it wrong (as
evidenced by several places where it would be released when a sleeping
function was called and then reacquired without checking the protected
state).
We tried to use an RW semaphore before but the main issue with GPIO is
that we have drivers implementing the interfaces in both sleeping and
non-sleeping ways as well as user-facing interfaces that can be called
both from process as well as atomic contexts. Both ends converge in
the same code paths that can use neither spinlocks nor mutexes. The
only reasonable way out is to use SRCU and go mostly lockless. To that
end: we add several SRCU structs in relevant places and use them to
assure consistency between API calls together with atomic reads and
writes of GPIO descriptor flags where it makes sense.
This code has spent several weeks in next and has received several
fixes in the first week or two after which it stabilized nicely. The
GPIO subsystem is now resilient to providers being suddenly unbound.
We managed to also remove the existing character device RW semaphore
and the obsolete global spinlock.
Other than the locking rework we have one new driver (for Chromebook
EC), much appreciated documentation improvements from Kent and the
regular driver improvements, DT-bindings updates and GPIOLIB core
tweaks.
Serialization rework:
- use SRCU to serialize access to the global GPIO device list, to
GPIO device structs themselves and to GPIO descriptors
- make the GPIO subsystem resilient to the GPIO providers being
unbound while the API calls are in progress
- don't dereference the SRCU-protected chip pointer if the
information we need can be obtained from the GPIO device structure
- move some of the information contained in struct gpio_chip to
struct gpio_device to further reduce the need to dereference the
former
- pass the GPIO device struct instead of the GPIO chip to sysfs
callback to, again, reduce the need for accessing the latter
- get GPIO descriptors from the GPIO device, not from the chip for
the same reason
- allow for mostly lockless operation of the GPIO driver API: assure
consistency with SRCU and atomic operations
- remove the global GPIO spinlock
- remove the character device RW semaphore
Core GPIOLIB:
- constify pointers in GPIO API where applicable
- unify the GPIO counting APIs for ACPI and OF
- provide a macro for iterating over all GPIOs, not only the ones
that are requested
- remove leftover typedefs
- pass the consumer device to GPIO core in
devm_fwnode_gpiod_get_index() for improved logging
- constify the GPIO bus type
- don't warn about removing GPIO chips with descriptors still held by
users as we can now handle this situation gracefully
- remove unused logging helpers
- unexport functions that are only used internally in the GPIO
subsystem
- set the device type (assign the relevant struct device_type) for
GPIO devices
New drivers:
- add the ChromeOS EC GPIO driver
Driver improvements:
- allow building gpio-vf610 with COMPILE_TEST as well as disabling it
in menuconfig (before it was always built for i.MX cofigs)
- count the number of EICs using the device properties instead of
hard-coding it in gpio-eic-sprd
- improve the device naming, extend the debugfs output and add
lockdep asserts to gpio-sim
DT bindings:
- document the 'label' property for gpio-pca9570
- convert aspeed,ast2400-gpio bindings to DT schema
- disallow unevaluated properties for gpio-mvebu
- document a new model in renesas,rcar-gpio
Documentation:
- improve the character device kerneldocs in user-space headers
- add proper documentation for the character device uAPI (both v1 and v2)
- move the sysfs and gpio-mockup docs into the "obsolete" section
- improve naming consistency for GPIO terms
- clarify the line values description for sysfs
- minor docs improvements
- improve the driver API contract for setting GPIO direction
- mark unsafe APIs as deprecated in kerneldocs and suggest
replacements
Other:
- remove an obsolete test from selftests"
* tag 'gpio-updates-for-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (79 commits)
gpio: sysfs: repair export returning -EPERM on 1st attempt
selftest: gpio: remove obsolete gpio-mockup test
gpiolib: Deduplicate cleanup for-loop in gpiochip_add_data_with_key()
dt-bindings: gpio: aspeed,ast2400-gpio: Convert to DT schema
gpio: acpi: Make acpi_gpio_count() take firmware node as a parameter
gpio: of: Make of_gpio_get_count() take firmware node as a parameter
gpiolib: Pass consumer device through to core in devm_fwnode_gpiod_get_index()
gpio: sim: use for_each_hwgpio()
gpio: provide for_each_hwgpio()
gpio: don't warn about removing GPIO chips with active users anymore
gpio: sim: delimit the fwnode name with a ":" when generating labels
gpio: sim: add lockdep asserts
gpio: Add ChromeOS EC GPIO driver
gpio: constify of_phandle_args in of_find_gpio_device_by_xlate()
gpio: fix memory leak in gpiod_request_commit()
gpio: constify opaque pointer "data" in gpio_device_find()
gpio: cdev: fix a NULL-pointer dereference with DEBUG enabled
gpio: uapi: clarify default_values being logical
gpio: sysfs: fix inverted pointer logic
gpio: don't let lockdep complain about inherently dangerous RCU usage
...
Linus Torvalds [Wed, 13 Mar 2024 18:07:37 +0000 (11:07 -0700)]
Merge tag 'spi-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"This release sees some exciting changes from David Lechner which
implements some optimisations that have been talked about for a long
time which allows client drivers to pre-prepare SPI messages for
repeated or low latency use. This lets us move work out of latency
sensitive paths and avoid repeating work for frequently performed
operations. As well as being useful in itself this will also be used
in future to allow controllers to directly trigger SPI operations (eg,
from interrupts).
Otherwise this release has mostly been focused on cleanups, plus a
couple of new devices:
- Support for pre-optimising messages
- A big set of updates from Uwe Kleine-König moving drivers to use
APIs with more modern terminology for controllers
- Major overhaul of the s3c64xx driver
- Support for Google GS101 and Samsung Exynos850"
* tag 'spi-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (122 commits)
spi: Introduce SPI_INVALID_CS and is_valid_cs()
spi: Fix types of the last chip select storage variables
spi: Consistently use BIT for cs_index_mask
spi: Exctract spi_dev_check_cs() helper
spi: Exctract spi_set_all_cs_unused() helper
spi: s3c64xx: switch exynos850 to new port config data
spi: s3c64xx: switch gs101 to new port config data
spi: s3c64xx: deprecate fifo_lvl_mask, rx_lvl_offset and port_id
spi: s3c64xx: get rid of the OF alias ID dependency
spi: s3c64xx: introduce s3c64xx_spi_set_port_id()
spi: s3c64xx: let the SPI core determine the bus number
spi: s3c64xx: allow FIFO depth to be determined from the compatible
spi: s3c64xx: retrieve the FIFO depth from the device tree
spi: s3c64xx: determine the fifo depth only once
spi: s3c64xx: allow full FIFO masks
spi: s3c64xx: define a magic value
spi: dt-bindings: introduce FIFO depth properties
spi: axi-spi-engine: use struct_size() macro
spi: axi-spi-engine: use __counted_by() attribute
spi: axi-spi-engine: remove p from struct spi_engine_message_state
...
Linus Torvalds [Wed, 13 Mar 2024 18:05:20 +0000 (11:05 -0700)]
Merge tag 'regulator-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"This has been a very quiet release, mostly cleanups, API updates and
simple device additions. I messed up slightly and there are a couple
of duplicated commits resulting from me leaving things in my inbox
which didn't seem worth removing by the time I noticed them.
- Conversion of several drivers to GPIO descriptors
- Build out the features of of the MP8859 driver
- Support for Qualcomm PM4125 and PM6150"
* tag 'regulator-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (31 commits)
regulator: lp8788-buck: fix copy and paste bug in lp8788_dvs_gpio_request()
regulator: core: make regulator_class constant
regulator: da9121: Remove unused of_gpio.h
regulator: userspace-consumer: add module device table
regulator: dt-bindings: gpio-regulator: Fix "gpios-states" and "states" array bounds
regulator: mp8859: Implement set_current_limit()
regulator: mp8859: Report slew rate
regulator: mp8859: Support status and error readback
regulator: mp8859: Support active discharge control
regulator: mp8859: Support mode operations
regulator: mp8859: Support enable control
regulator: mp8859: Validate and log device identifier information
regulator: mp8859: Specify register accessibility and enable caching
regulator: max8998: Convert to GPIO descriptors
regulator: max8997: Convert to GPIO descriptors
regulator: lp8788-buck: Fully convert to GPIO descriptors
regulator: da9055: Fully convert to GPIO descriptors
regulator: max8973: Finalize switch to GPIO descriptors
regulator: dt-bindings: qcom,usb-vbus-regulator: add support for PM4125
regulator: dt-bindings: qcom,usb-vbus-regulator: add support for PM4125
...
Linus Torvalds [Wed, 13 Mar 2024 18:01:21 +0000 (11:01 -0700)]
Merge tag 'regmap-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"Just two updates this time around, a rework of max_register handling
which enables us to support devices with only one register better and
a new test which will be used to validate use of some new SPI
optimisations which will be coming in during this merge window"
* tag 'regmap-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: kunit: Add a test for ranges in combination with windows
regmap: rework ->max_register handling
Linus Torvalds [Wed, 13 Mar 2024 17:59:28 +0000 (10:59 -0700)]
Merge tag 'mmc-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC updates from Ulf Hansson:
"MMC core:
- Drop the use of BLK_BOUNCE_HIGH
- Fix partition switch for GP3
- Remove usage of the deprecated ida_simple API
MMC host:
- cqhci: Update bouncing email-addresses in MAINTAINERS
- davinci_mmc: Use sg_miter for PIO
- dw_mmc-hi3798cv200: Convert the DT bindings to YAML
- dw_mmc-hi3798mv200: Add driver for the new dw_mmc variant
- fsl-imx-esdhc: A couple of corrections/updates to the DT bindings
- meson-mx-sdhc: Drop use of the ->card_hw_reset() callback
- moxart-mmc: Use sg_miter for PIO
- moxart-mmc: Fix accounting for DMA transfers
- mvsdio: Use sg_miter for PIO
- mxcmmc: Use sg_miter for PIO
- omap: Use sg_miter for PIO
- renesas,sdhi: Add support for R-Car V4M variant
- sdhci-esdhc-mcf: Use sg_miter for swapping
- sdhci-of-dwcmshc: Add support for Sophgo CV1800B and SG2002 variants
- sh_mmcif: Use sg_miter for PIO
- tmio: Avoid concurrent runs of mmc_request_done()"
* tag 'mmc-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (44 commits)
mmc: core: make mmc_host_class constant
mmc: core: Fix switch on gp3 partition
mmc: tmio: comment the ERR_PTR usage in this driver
mmc: mmc_spi: Don't mention DMA direction
mmc: dw_mmc: Remove unused of_gpio.h
mmc: dw_mmc: add support for hi3798mv200
dt-bindings: mmc: hisilicon,hi3798cv200-dw-mshc: add Hi3798MV200 binding
dt-bindings: mmc: dw-mshc-hi3798cv200: convert to YAML
mmc: dw_mmc-hi3798cv200: remove MODULE_ALIAS()
mmc: core: Use a struct device* as in-param to mmc_of_parse_clk_phase()
mmc: wmt-sdmmc: remove an incorrect release_mem_region() call in the .remove function
mmc: tmio: avoid concurrent runs of mmc_request_done()
dt-bindings: mmc: fsl-imx-mmc: Document the required clocks
mmc: sh_mmcif: Advance sg_miter before reading blocks
mmc: sh_mmcif: sg_miter must not be atomic
mmc: sdhci-esdhc-mcf: Flag the sg_miter as atomic
dt-bindings: mmc: fsl-imx-esdhc: add default and 100mhz state
mmc: core: constify the struct device_type usage
mmc: sdhci-of-dwcmshc: Add support for Sophgo CV1800B and SG2002
dt-bindings: mmc: sdhci-of-dwcmhsc: Add Sophgo CV1800B and SG2002 support
...
Linus Torvalds [Wed, 13 Mar 2024 17:51:39 +0000 (10:51 -0700)]
Merge tag 'pwm/for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm updates from Uwe Kleine-König:
"This contains the usual amount of driver and device tree changes.
Additionally there is a big rework of how pwm lowlevel drivers are
registered to prepare adding character device support.
Thanks to Dharma Balasubiramani, Dong Aisheng, Duje Mihanović, Jerome
Brunet, Raag Jadav and Rafał Miłecki for their contributions. And
sorry for those who still need some patience because I didn't manage
to empty my review queue"
* tag 'pwm/for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux: (185 commits)
pwm: imx-tpm: fix probe crash due to access registers without clock
pwm: meson: generalize 4 inputs clock on meson8 pwm type
dt-bindings: pwm: amlogic: Add a new binding for meson8 pwm types
dt-bindings: pwm: amlogic: fix s4 bindings
pwm: dwc: simplify error handling
pwm: dwc: Add 16 channel support for Intel Elkhart Lake
pwm: dwc: drop redundant error check
staging: greybus: pwm: Make use of devm_pwmchip_alloc() function
staging: greybus: pwm: Rework how the number of PWM lines is determined
staging: greybus: pwm: Drop unused gb_connection_set_data()
staging: greybus: pwm: Rely on pwm framework to pass a valid hwpwm
staging: greybus: pwm: Make use of pwmchip_parent() accessor
staging: greybus: pwm: Change prototype of helpers to prepare further changes
leds: qcom-lpg: Make use of devm_pwmchip_alloc() function
drm/bridge: ti-sn65dsi86: Make use of devm_pwmchip_alloc() function
drm/bridge: ti-sn65dsi86: Make use of pwmchip_parent() accessor
gpio: mvebu: Make use of devm_pwmchip_alloc() function
pwm: xilinx: Make use of devm_pwmchip_alloc() function
pwm: xilinx: Prepare removing pwm_chip from driver data
pwm: vt8500: Make use of devm_pwmchip_alloc() function
...
Linus Torvalds [Wed, 13 Mar 2024 17:10:30 +0000 (10:10 -0700)]
Merge tag 'tag-chrome-platform-firmware-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform firmware updates from Tzung-Bi Shih:
- Allow userspace to automatically load coreboot modules by adding
modaliases and sending uevents
- Make bus_type const
* tag 'tag-chrome-platform-firmware-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
firmware: coreboot: Replace tag with id table in driver struct
firmware: coreboot: Generate aliases for coreboot modules
firmware: coreboot: Generate modalias uevent for devices
firmware: coreboot: make coreboot_bus_type const
Linus Torvalds [Wed, 13 Mar 2024 16:57:23 +0000 (09:57 -0700)]
Merge tag 'for-6.9/dm-vdo' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper VDO target from Mike Snitzer:
"Introduce the DM vdo target which provides block-level deduplication,
compression, and thin provisioning. Please see:
The DM vdo target handles its concurrency by pinning an IO, and
subsequent stages of handling that IO, to a particular VDO thread.
This aspect of VDO is "unique" but its overall implementation is very
tightly coupled to its mostly lockless threading model. As such, VDO
is not easily changed to use more traditional finer-grained locking
and Linux workqueues. Please see the "Zones and Threading" section of
vdo-design.rst
The DM vdo target has been used in production for many years but has
seen significant changes over the past ~6 years to prepare it for
upstream inclusion. The codebase is still large but it is isolated to
drivers/md/dm-vdo/ and has been made considerably more approachable
and maintainable.
Matt Sakai has been added to the MAINTAINERS file to reflect that he
will send VDO changes upstream through the DM subsystem maintainers"
* tag 'for-6.9/dm-vdo' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (142 commits)
dm vdo: document minimum metadata size requirements
dm vdo: remove meaningless version number constant
dm vdo: remove vdo_perform_once
dm vdo block-map: Remove stray semicolon
dm vdo string-utils: change from uds_ to vdo_ namespace
dm vdo logger: change from uds_ to vdo_ namespace
dm vdo funnel-queue: change from uds_ to vdo_ namespace
dm vdo indexer: fix use after free
dm vdo logger: remove log level to string conversion code
dm vdo: document log_level parameter
dm vdo: add 'log_level' module parameter
dm vdo: remove all sysfs interfaces
dm vdo target: eliminate inappropriate uses of UDS_SUCCESS
dm vdo indexer: update ASSERT and ASSERT_LOG_ONLY usage
dm vdo encodings: update some stale comments
dm vdo permassert: audit all of ASSERT to test for VDO_SUCCESS
dm-vdo funnel-workqueue: return VDO_SUCCESS from make_simple_work_queue
dm vdo thread-utils: return VDO_SUCCESS on vdo_create_thread success
dm vdo int-map: return VDO_SUCCESS on success
dm vdo: check for VDO_SUCCESS return value from memory-alloc functions
...
Linus Torvalds [Wed, 13 Mar 2024 16:46:51 +0000 (09:46 -0700)]
Merge tag 'for-6.9/dm-bh-wq' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper BH workqueue conversion from Mike Snitzer:
"Convert the DM verity and crypt targets from (ab)using tasklets to
using BH workqueues.
These changes were coordinated with Tejun and are based ontop of DM's
6.9 changes and Tejun's 6.9 workqueue tree"
* tag 'for-6.9/dm-bh-wq' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-verity: Convert from tasklet to BH workqueue
dm-crypt: Convert from tasklet to BH workqueue
Linus Torvalds [Wed, 13 Mar 2024 16:38:10 +0000 (09:38 -0700)]
Merge tag 'for-6.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Fix DM core's IO submission (which include dm-io and dm-bufio) such
that a bio's IO priority is propagated. Work focused on enabling both
DM crypt and verity targets to retain the appropriate IO priority
- Fix DM raid reshape logic to not allow an empty flush bio to be
requeued due to false concern about the bio, which doesn't have a
data payload, accessing beyond the end of the device
- Fix DM core's internal resume so that it properly calls both presume
and resume methods, which fixes the potential for a postsuspend and
resume imbalance
- Update DM verity target to set DM_TARGET_SINGLETON flag because it
doesn't make sense to have a DM table with a mix of targets that
include dm-verity
- Small cleanups in DM crypt, thin, and integrity targets
- Fix references to dm-devel mailing list to use latest list address
* tag 'for-6.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: call the resume method on internal suspend
dm raid: fix false positive for requeue needed during reshape
dm-integrity: set max_integrity_segments in dm_integrity_io_hints
dm: update relevant MODULE_AUTHOR entries to latest dm-devel mailing list
dm ioctl: update DM_DRIVER_EMAIL to new dm-devel mailing list
dm verity: set DM_TARGET_SINGLETON feature flag
dm crypt: Fix IO priority lost when queuing write bios
dm verity: Fix IO priority lost when reading FEC and hash
dm bufio: Support IO priority
dm io: Support IO priority
dm crypt: remove redundant state settings after waking up
dm thin: add braces around conditional code that spans lines
Linus Torvalds [Wed, 13 Mar 2024 16:29:53 +0000 (09:29 -0700)]
Merge tag 'ata-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata updates from Niklas Cassel:
- Do not enable LPM for external ports (hotplug-capable ports or eSATA
ports), as the HBA will not be able to detect hot plug removal events
when LPM is enabled (me)
- Drop the board type board_ahci_low_power. Now when we make sure that
we won't enable LPM for external ports, we can always set the LPM
policy to CONFIG_SATA_MOBILE_LPM_POLICY for internal ports. There is
thus no longer any need for the board type board_ahci_low_power, so
it can be removed. (As before, LPM features not supported by the HBA
and/or the device will not be enabled, regardless of the LPM policy
Kconfig) (Mario Limonciello)
Note that the default CONFIG_SATA_MOBILE_LPM_POLICY value is still 0
(which will not try to enable any LPM features), however, most Linux
distributions override this and set it to 3 (Medium power with DIPM).
We intend to change the default to 3 in the coming cycles, but we
will wait a cycle or two.
- Add board type board_ahci_pcs_quirk and make all legacy Intel
platforms use it. The Intel PCS quirk was being applied to basically
all Intel platforms, which caused some issues (the device failing to
come back after a reset), when being applied to newer Intel platforms
where it shouldn't have been applied.
New platforms can be added using board type board_ahci, which will
not have the quirk applied (me)
- Rename board_ahci_nosntf to board_ahci_pcs_quirk_no_sntf to more
clearly highlight that it applies two different quirks (me)
- Modify the ahci_broken_devslp() quirk to be implemented like all the
other quirks (i.e. define a board type for the quirk) (me)
- Drop unused board_ahci_noncq board type (me)
- Rename board_ahci_nomsi to board_ahci_no_msi to match the other board
types (me)
- Make pata_parport_bus_type const (Ricardo B. Marliere)
- Remove at91 compact flash device tree binding. (The binding is not
used by any driver.) (from Hari Prasath Gujulan Elango)
- Convert MediaTek device tree binding to json-schema (Rafał Miłecki)
- At boot, print the number of implemented ports, instead of printing
the maximum number of ports supported by the HBA silicon (me)
* tag 'ata-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ahci: print the number of implemented ports
dt-bindings: ata: convert MediaTek controller to the json-schema
ahci: rename board_ahci_nomsi
ahci: drop unused board_ahci_noncq
ahci: clean up ahci_broken_devslp quirk
ahci: rename board_ahci_nosntf
ahci: clean up intel_pcs_quirk
ata: ahci: Drop low power policy board type
ata: ahci: do not enable LPM on external ports
ata: ahci: drop hpriv param from ahci_update_initial_lpm_policy()
ata: ahci: a hotplug capable port is an external port
ata: ahci: move marking of external port earlier
dt-bindings: ata: atmel: remove at91 compact flash documentation
ata: pata_parport: make pata_parport_bus_type const
Linus Torvalds [Wed, 13 Mar 2024 16:23:21 +0000 (09:23 -0700)]
Merge tag 'dma-mapping-6.9-2024-03-11' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- fix leaked pages on dma_set_decrypted() failure (Rick Edgecombe)
- add a new swiotlb debugfs file (ZhangPeng)
* tag 'dma-mapping-6.9-2024-03-11' of git://git.infradead.org/users/hch/dma-mapping:
dma-direct: Leak pages on dma_set_decrypted() failure
swiotlb: add debugfs to track swiotlb transient pool usage
Linus Torvalds [Wed, 13 Mar 2024 16:15:30 +0000 (09:15 -0700)]
Merge tag 'iommu-updates-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"Core changes:
- Constification of bus_type pointer
- Preparations for user-space page-fault delivery
- Use a named kmem_cache for IOVA magazines
Intel VT-d changes from Lu Baolu:
- Add RBTree to track iommu probed devices
- Add Intel IOMMU debugfs document
- Cleanup and refactoring
ARM-SMMU Updates from Will Deacon:
- Device-tree binding updates for a bunch of Qualcomm SoCs
- SMMUv2: Support for Qualcomm X1E80100 MDSS
- SMMUv3: Significant rework of the driver's STE manipulation and
domain handling code. This is the initial part of a larger scale
rework aiming to improve the driver's implementation of the
IOMMU-API in preparation for hooking up IOMMUFD support.
AMD-Vi Updates:
- Refactor GCR3 table support for SVA
- Cleanups
Some smaller cleanups and fixes"
* tag 'iommu-updates-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (88 commits)
iommu: Fix compilation without CONFIG_IOMMU_INTEL
iommu/amd: Fix sleeping in atomic context
iommu/dma: Document min_align_mask assumption
iommu/vt-d: Remove scalabe mode in domain_context_clear_one()
iommu/vt-d: Remove scalable mode context entry setup from attach_dev
iommu/vt-d: Setup scalable mode context entry in probe path
iommu/vt-d: Fix NULL domain on device release
iommu: Add static iommu_ops->release_domain
iommu/vt-d: Improve ITE fault handling if target device isn't present
iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected
PCI: Make pci_dev_is_disconnected() helper public for other drivers
iommu/vt-d: Use device rbtree in iopf reporting path
iommu/vt-d: Use rbtree to track iommu probed devices
iommu/vt-d: Merge intel_svm_bind_mm() into its caller
iommu/vt-d: Remove initialization for dynamically heap-allocated rcu_head
iommu/vt-d: Remove treatment for revoking PASIDs with pending page faults
iommu/vt-d: Add the document for Intel IOMMU debugfs
iommu/vt-d: Use kcalloc() instead of kzalloc()
iommu/vt-d: Remove INTEL_IOMMU_BROKEN_GFX_WA
iommu: re-use local fwnode variable in iommu_ops_from_fwnode()
...
The SCTLR_EL1.WXN control forces execute-never when a page has write
permissions. While the idea of hardening such write/exec combinations is
good, with permissions indirection enabled (FEAT_PIE) this control
becomes RES0. FEAT_PIE introduces a slightly different form of WXN which
only has an effect when the base permission is RWX and the write is
toggled by the permission overlay (FEAT_POE, not yet supported by the
arm64 kernel). Revert the patch for now.
Sandipan Das [Mon, 29 Jan 2024 11:06:26 +0000 (16:36 +0530)]
perf/x86/amd/core: Avoid register reset when CPU is dead
When bringing a CPU online, some of the PMC and LBR related registers
are reset. The same is done when a CPU is taken offline although that
is unnecessary. This currently happens in the "cpu_dead" callback which
is also incorrect as the callback runs on a control CPU instead of the
one that is being taken offline. This also affects hibernation and
suspend to RAM on some platforms as reported in the link below.
The Revision Guide for AMD Family 19h Model 10-1Fh processors declares
Erratum 1452 which states that non-branch entries may erroneously be
recorded in the Last Branch Record (LBR) stack with the valid and
spec bits set.
Such entries can be recognized by inspecting bit 61 of the corresponding
LastBranchStackToIp register. This bit is currently reserved but if found
to be set, the associated branch entry should be discarded.
Valentine Altair [Tue, 12 Mar 2024 14:42:00 +0000 (14:42 +0000)]
ALSA: hda/realtek - ALC236 fix volume mute & mic mute LED on some HP models
Some HP laptops have received revisions that altered their board IDs
and therefore the current patches/quirks do not apply to them.
Specifically, for my Probook 440 G8, I have a board ID of 8a74.
It is necessary to add a line for that specific model.
Robert Richter [Sat, 17 Feb 2024 21:39:46 +0000 (22:39 +0100)]
lib/firmware_table: Provide buffer length argument to cdat_table_parse()
There exist card implementations with a CDAT table using a fixed size
buffer, but with entries filled in that do not fill the whole table
length size. Then, the last entry in the CDAT table may not mark the
end of the CDAT table buffer specified by the length field in the CDAT
header. It can be shorter with trailing unused (zero'ed) data. The
actual table length is determined while reading all CDAT entries of
the table with DOE.
If the table is greater than expected (containing zero'ed trailing
data), the CDAT parser fails with:
In addition, a check of the table buffer length is missing to prevent
an out-of-bound access then parsing the CDAT table.
Hardening code against device returning borked table. Fix that by
providing an optional buffer length argument to
acpi_parse_entries_array() that can be used by cdat_table_parse() to
propagate the buffer size down to its users to check the buffer
length. This also prevents a possible out-of-bound access mentioned.
Add a check to warn about a malformed CDAT table length.
Robert Richter [Fri, 16 Feb 2024 15:58:43 +0000 (16:58 +0100)]
cxl/pci: Get rid of pointer arithmetic reading CDAT table
Reading the CDAT table using DOE requires a Table Access Response
Header in addition to the CDAT entry. In current implementation this
has caused offsets with sizeof(__le32) to the actual buffers. This led
to hardly readable code and even bugs. E.g., see fix of devm_kfree()
in read_cdat_data():
commit c65efe3685f5 ("cxl/cdat: Free correct buffer on checksum error")
Rework code to avoid calculations with sizeof(__le32). Introduce
struct cdat_doe_rsp for this which contains the Table Access Response
Header and a variable payload size for various data structures
afterwards to access the CDAT table and its CDAT Data Structures
without recalculating buffer offsets.
Robert Richter [Fri, 16 Feb 2024 15:58:42 +0000 (16:58 +0100)]
cxl/pci: Rename DOE mailbox handle to doe_mb
Trivial variable rename for the DOE mailbox handle from cdat_doe to
doe_mb. The variable name cdat_doe is too ambiguous, use doe_mb that
is commonly used for the mailbox.
Dave Jiang [Fri, 1 Mar 2024 21:09:48 +0000 (14:09 -0700)]
cxl: Fix the incorrect assignment of SSLBIS entry pointer initial location
The 'entry' pointer in cdat_sslbis_handler() is set to header +
sizeof(common header). However, the math missed the addition of the SSLBIS
main header. It should be header + sizeof(common header) + sizeof(*sslbis).
Use a defined struct for all the SSLBIS parts in order to avoid pointer
math errors.
The bug causes incorrect parsing of the SSLBIS table and introduces incorrect
performance values to the access_coordinates during the CXL access_coordinate
calculation path if there are CXL switches present in the topology.
The issue was found during testing of new code being added to add additional
checks for invalid CDAT values during CXL access_coordinate calculation. The
testing was done on qemu with a CXL topology including a CXL switch.
Ben Cheatham [Mon, 11 Mar 2024 14:25:07 +0000 (09:25 -0500)]
cxl/core: Add CXL EINJ debugfs files
Export CXL helper functions in einj-cxl.c for getting/injecting
available CXL protocol error types to sysfs under kernel/debug/cxl.
The kernel/debug/cxl/einj_types file will print the available CXL
protocol errors in the same format as the available_error_types
file provided by the einj module. The
kernel/debug/cxl/$dport_dev/einj_inject file is functionally the same
as the error_type and error_inject files provided by the EINJ module,
i.e.: writing an error type into $dport_dev/einj_inject will inject
said error type into the CXL dport represented by $dport_dev.
Ben Cheatham [Mon, 11 Mar 2024 14:25:08 +0000 (09:25 -0500)]
EINJ, Documentation: Update EINJ kernel doc
Update EINJ kernel document to include how to inject CXL protocol error
types, build the kernel to include CXL error types, and give an example
injection.
Ben Cheatham [Mon, 11 Mar 2024 14:25:06 +0000 (09:25 -0500)]
EINJ: Add CXL error type support
Move CXL protocol error types from einj.c (now einj-core.c) to einj-cxl.c.
einj-cxl.c implements the necessary handling for CXL protocol error
injection and exposes an API for the CXL core to use said functionality,
while also allowing the EINJ module to be built without CXL support.
Because CXL error types targeting CXL 1.0/1.1 ports require special
handling, only allow them to be injected through the new cxl debugfs
interface (next commit) and return an error when attempting to inject
through the legacy interface.
Ben Cheatham [Mon, 11 Mar 2024 14:25:05 +0000 (09:25 -0500)]
EINJ: Migrate to a platform driver
Change the EINJ module to install a platform device/driver on module
init and move the module init() and exit() functions to driver probe and
remove. This change allows the EINJ module to load regardless of whether
setting up EINJ succeeds, which allows dependent modules to still load
(i.e. the CXL core).
Since EINJ may no longer be initialized when the module loads, any
functions that are called from dependent/external modules should
safegaurd against the case EINJ didn't load.
Linus Torvalds [Wed, 13 Mar 2024 03:54:50 +0000 (20:54 -0700)]
Merge tag 'printk-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
"Improve the behavior during panic. The issues were found when testing
the ongoing changes introducing atomic consoles and printk kthreads:
- pr_flush() has to wait for the last reserved record instead of the
last finalized one. Note that records are finalized in random order
when generated by more CPUs in parallel.
- Ignore non-finalized records during panic(). Messages printed on
panic-CPU are always finalized. Messages printed by other CPUs
might never be finalized when the CPUs get stopped.
- Block new printk() calls on non-panic CPUs completely. Backtraces
are printed before entering the panic mode. Later messages would
just mess information printed by the panic CPU.
- Do not take console_lock in console_flush_on_panic() at all. The
original code did try_lock()/console_unlock(). The unlock part
might cause a deadlock when panic() happened in a scheduler code.
- Fix conversion of 64-bit sequence number for 32-bit atomic
operations"
* tag 'printk-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
dump_stack: Do not get cpu_sync for panic CPU
panic: Flush kernel log buffer at the end
printk: Avoid non-panic CPUs writing to ringbuffer
printk: Disable passing console lock owner completely during panic()
printk: ringbuffer: Skip non-finalized records in panic
printk: Wait for all reserved records with pr_flush()
printk: ringbuffer: Cleanup reader terminology
printk: Add this_cpu_in_panic()
printk: For @suppress_panic_printk check for other CPU in panic
printk: ringbuffer: Clarify special lpos values
printk: ringbuffer: Do not skip non-finalized records with prb_next_seq()
printk: Use prb_first_seq() as base for 32bit seq macros
printk: Adjust mapping for 32bit seq macros
printk: nbcon: Relocate 32bit seq macros
Linus Torvalds [Wed, 13 Mar 2024 03:32:19 +0000 (20:32 -0700)]
mm, slab: remove last vestiges of SLAB_MEM_SPREAD
Yes, yes, I know the slab people were planning on going slow and letting
every subsystem fight this thing on their own. But let's just rip off
the band-aid and get it over and done with. I don't want to see a
number of unnecessary pull requests just to get rid of a flag that no
longer has any meaning.
This was mainly done with a couple of 'sed' scripts and then some manual
cleanup of the end result.
Linus Torvalds [Wed, 13 Mar 2024 03:14:54 +0000 (20:14 -0700)]
Merge tag 'slab-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- Freelist loading optimization (Chengming Zhou)
When the per-cpu slab is depleted and a new one loaded from the cpu
partial list, optimize the loading to avoid an irq enable/disable
cycle. This results in a 3.5% performance improvement on the "perf
bench sched messaging" test.
- Kernel boot parameters cleanup after SLAB removal (Xiongwei Song)
Due to two different main slab implementations we've had boot
parameters prefixed either slab_ and slub_ with some later becoming
an alias as both implementations gained the same functionality (i.e.
slab_nomerge vs slub_nomerge). In order to eventually get rid of the
implementation-specific names, the canonical and documented
parameters are now all prefixed slab_ and the slub_ variants become
deprecated but still working aliases.
The flags had hardcoded #define values which became tedious and
error-prone when adding new ones. Assign the values via an enum that
takes care of providing unique bit numbers. Also deprecate
SLAB_MEM_SPREAD which was only used by SLAB, so it's a no-op since
SLAB removal. Assign it an explicit zero value. The removals of the
flag usage are handled independently in the respective subsystems,
with a final removal of any leftover usage planned for the next
release.
Includes removal of unused code or function parameters and a fix of a
memleak.
* tag 'slab-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slab: remove PARTIAL_NODE slab_state
mm, slab: remove memcg_from_slab_obj()
mm, slab: remove the corner case of inc_slabs_node()
mm/slab: Fix a kmemleak in kmem_cache_destroy()
mm, slab, kasan: replace kasan_never_merge() with SLAB_NO_MERGE
mm, slab: use an enum to define SLAB_ cache creation flags
mm, slab: deprecate SLAB_MEM_SPREAD flag
mm, slab: fix the comment of cpu partial list
mm, slab: remove unused object_size parameter in kmem_cache_flags()
mm/slub: remove parameter 'flags' in create_kmalloc_caches()
mm/slub: remove unused parameter in next_freelist_entry()
mm/slub: remove full list manipulation for non-debug slab
mm/slub: directly load freelist from cpu partial slab in the likely case
mm/slub: make the description of slab_min_objects helpful in doc
mm/slub: replace slub_$params with slab_$params in slub.rst
mm/slub: unify all sl[au]b parameters with "slab_$param"
Documentation: kernel-parameters: remove noaliencache
Linus Torvalds [Wed, 13 Mar 2024 03:03:34 +0000 (20:03 -0700)]
Merge tag 'lsm-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Promote IMA/EVM to a proper LSM
This is the bulk of the diffstat, and the source of all the changes
in the VFS code. Prior to the start of the LSM stacking work it was
important that IMA/EVM were separate from the rest of the LSMs,
complete with their own hooks, infrastructure, etc. as it was the
only way to enable IMA/EVM at the same time as a LSM.
However, now that the bulk of the LSM infrastructure supports
multiple simultaneous LSMs, we can simplify things greatly by
bringing IMA/EVM into the LSM infrastructure as proper LSMs. This is
something I've wanted to see happen for quite some time and Roberto
was kind enough to put in the work to make it happen.
- Use the LSM hook default values to simplify the call_int_hook() macro
Previously the call_int_hook() macro required callers to supply a
default return value, despite a default value being specified when
the LSM hook was defined.
This simplifies the macro by using the defined default return value
which makes life easier for callers and should also reduce the number
of return value bugs in the future (we've had a few pop up recently,
hence this work).
- Use the KMEM_CACHE() macro instead of kmem_cache_create()
The guidance appears to be to use the KMEM_CACHE() macro when
possible and there is no reason why we can't use the macro, so let's
use it.
- Fix a number of comment typos in the LSM hook comment blocks
Not much to say here, we fixed some questionable grammar decisions in
the LSM hook comment blocks.
* tag 'lsm-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (28 commits)
cred: Use KMEM_CACHE() instead of kmem_cache_create()
lsm: use default hook return value in call_int_hook()
lsm: fix typos in security/security.c comment headers
integrity: Remove LSM
ima: Make it independent from 'integrity' LSM
evm: Make it independent from 'integrity' LSM
evm: Move to LSM infrastructure
ima: Move IMA-Appraisal to LSM infrastructure
ima: Move to LSM infrastructure
integrity: Move integrity_kernel_module_request() to IMA
security: Introduce key_post_create_or_update hook
security: Introduce inode_post_remove_acl hook
security: Introduce inode_post_set_acl hook
security: Introduce inode_post_create_tmpfile hook
security: Introduce path_post_mknod hook
security: Introduce file_release hook
security: Introduce file_post_open hook
security: Introduce inode_post_removexattr hook
security: Introduce inode_post_setattr hook
security: Align inode_setattr hook definition with EVM
...
Linus Torvalds [Wed, 13 Mar 2024 02:48:03 +0000 (19:48 -0700)]
Merge tag 'selinux-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"Really only a few notable changes:
- Continue the coding style/formatting fixup work
This is the bulk of the diffstat in this pull request, with the
focus this time around being the security/selinux/ss directory.
We've only got a couple of files left to cleanup and once we're
done with that we can start enabling some automatic style
verfication and introduce tooling to help new folks format their
code correctly.
- Don't restrict xattr copy-up when SELinux policy is not loaded
This helps systems that use overlayfs, or similar filesystems,
preserve their SELinux labels during early boot when the SELinux
policy has yet to be loaded.
- Reduce the work we do during inode initialization time
This isn't likely to show up in any benchmark results, but we
removed an unnecessary SELinux object class lookup/calculation
during inode initialization.
- Correct the return values in selinux_socket_getpeersec_dgram()
We had some inconsistencies with respect to our return values
across selinux_socket_getpeersec_dgram() and
selinux_socket_getpeersec_stream().
This provides a more uniform set of error codes across the two
functions and should help make it easier for users to identify
the source of a failure"
* tag 'selinux-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (24 commits)
selinux: fix style issues in security/selinux/ss/symtab.c
selinux: fix style issues in security/selinux/ss/symtab.h
selinux: fix style issues in security/selinux/ss/sidtab.c
selinux: fix style issues in security/selinux/ss/sidtab.h
selinux: fix style issues in security/selinux/ss/services.h
selinux: fix style issues in security/selinux/ss/policydb.c
selinux: fix style issues in security/selinux/ss/policydb.h
selinux: fix style issues in security/selinux/ss/mls_types.h
selinux: fix style issues in security/selinux/ss/mls.c
selinux: fix style issues in security/selinux/ss/mls.h
selinux: fix style issues in security/selinux/ss/hashtab.c
selinux: fix style issues in security/selinux/ss/hashtab.h
selinux: fix style issues in security/selinux/ss/ebitmap.c
selinux: fix style issues in security/selinux/ss/ebitmap.h
selinux: fix style issues in security/selinux/ss/context.h
selinux: fix style issues in security/selinux/ss/context.h
selinux: fix style issues in security/selinux/ss/constraint.h
selinux: fix style issues in security/selinux/ss/conditional.c
selinux: fix style issues in security/selinux/ss/conditional.h
selinux: fix style issues in security/selinux/ss/avtab.c
...
Linus Torvalds [Wed, 13 Mar 2024 00:44:08 +0000 (17:44 -0700)]
Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Large effort by Eric to lower rtnl_lock pressure and remove locks:
- Make commonly used parts of rtnetlink (address, route dumps
etc) lockless, protected by RCU instead of rtnl_lock.
- Add a netns exit callback which already holds rtnl_lock,
allowing netns exit to take rtnl_lock once in the core instead
of once for each driver / callback.
- Remove locks / serialization in the socket diag interface.
- Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
- Remove the dev_base_lock, depend on RCU where necessary.
- Support busy polling on a per-epoll context basis. Poll length and
budget parameters can be set independently of system defaults.
- Introduce struct net_hotdata, to make sure read-mostly global
config variables fit in as few cache lines as possible.
- Add optional per-nexthop statistics to ease monitoring / debug of
ECMP imbalance problems.
- Support TCP_NOTSENT_LOWAT in MPTCP.
- Ensure that IPv6 temporary addresses' preferred lifetimes are long
enough, compared to other configured lifetimes, and at least 2 sec.
- Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
- Add support for the independent control state machine for bonding
per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
control state machine.
- Add "network ID" to MCTP socket APIs to support hosts with multiple
disjoint MCTP networks.
- Re-use the mono_delivery_time skbuff bit for packets which user
space wants to be sent at a specified time. Maintain the timing
information while traversing veth links, bridge etc.
- Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
- Simplify many places iterating over netdevs by using an xarray
instead of a hash table walk (hash table remains in place, for use
on fastpaths).
- Speed up scanning for expired routes by keeping a dedicated list.
- Speed up "generic" XDP by trying harder to avoid large allocations.
- Support attaching arbitrary metadata to netconsole messages.
Things we sprinkled into general kernel code:
- Enforce VM_IOREMAP flag and range in ioremap_page_range and
introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
bpf_arena).
- Rework selftest harness to enable the use of the full range of ksft
exit code (pass, fail, skip, xfail, xpass).
Netfilter:
- Allow userspace to define a table that is exclusively owned by a
daemon (via netlink socket aliveness) without auto-removing this
table when the userspace program exits. Such table gets marked as
orphaned and a restarting management daemon can re-attach/regain
ownership.
- Speed up element insertions to nftables' concatenated-ranges set
type. Compact a few related data structures.
BPF:
- Add BPF token support for delegating a subset of BPF subsystem
functionality from privileged system-wide daemons such as systemd
through special mount options for userns-bound BPF fs to a trusted
& unprivileged application.
- Introduce bpf_arena which is sparse shared memory region between
BPF program and user space where structures inside the arena can
have pointers to other areas of the arena, and pointers work
seamlessly for both user-space programs and BPF programs.
- Introduce may_goto instruction that is a contract between the
verifier and the program. The verifier allows the program to loop
assuming it's behaving well, but reserves the right to terminate
it.
- Extend the BPF verifier to enable static subprog calls in spin lock
critical sections.
- Support registration of struct_ops types from modules which helps
projects like fuse-bpf that seeks to implement a new struct_ops
type.
- Add support for retrieval of cookies for perf/kprobe multi links.
- Support arbitrary TCP SYN cookie generation / validation in the TC
layer with BPF to allow creating SYN flood handling in BPF
firewalls.
- Add code generation to inline the bpf_kptr_xchg() helper which
improves performance when stashing/popping the allocated BPF
objects.
Wireless:
- Add SPP (signaling and payload protected) AMSDU support.
- Support wider bandwidth OFDMA, as required for EHT operation.
Driver API:
- Major overhaul of the Energy Efficient Ethernet internals to
support new link modes (2.5GE, 5GE), share more code between
drivers (especially those using phylib), and encourage more
uniform behavior. Convert and clean up drivers.
- Define an API for querying per netdev queue statistics from
drivers.
- IPSec: account in global stats for fully offloaded sessions.
- Create a concept of Ethernet PHY Packages at the Device Tree level,
to allow parameterizing the existing PHY package code.
- Enable Rx hashing (RSS) on GTP protocol fields.
Misc:
- Improvements and refactoring all over networking selftests.
- Create uniform module aliases for TC classifiers, actions, and
packet schedulers to simplify creating modprobe policies.
- Address all missing MODULE_DESCRIPTION() warnings in networking.
- Extend the Netlink descriptions in YAML to cover message
encapsulation or "Netlink polymorphism", where interpretation of
nested attributes depends on link type, classifier type or some
other "class type".
Drivers:
- Ethernet high-speed NICs:
- Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
- Intel (100G, ice, idpf):
- support E825-C devices
- nVidia/Mellanox:
- support devices with one port and multiple PCIe links
- Broadcom (bnxt):
- support n-tuple filters
- support configuring the RSS key
- Wangxun (ngbe/txgbe):
- implement irq_domain for TXGBE's sub-interrupts
- Pensando/AMD:
- support XDP
- optimize queue submission and wakeup handling (+17% bps)
- optimize struct layout, saving 28% of memory on queues
- Ethernet NICs embedded and virtual:
- Google cloud vNIC:
- refactor driver to perform memory allocations for new queue
config before stopping and freeing the old queue memory
- Synopsys (stmmac):
- obey queueMaxSDU and implement counters required by 802.1Qbv
- Renesas (ravb):
- support packet checksum offload
- suspend to RAM and runtime PM support
- Ethernet switches:
- nVidia/Mellanox:
- support for nexthop group statistics
- Microchip:
- ksz8: implement PHY loopback
- add support for KSZ8567, a 7-port 10/100Mbps switch
- PTP:
- New driver for RENESAS FemtoClock3 Wireless clock generator.
- Support OCP PTP cards designed and built by Adva.
- CAN:
- Support recvmsg() flags for own, local and remote traffic on CAN
BCM sockets.
- Support for esd GmbH PCIe/402 CAN device family.
- m_can:
- Rx/Tx submission coalescing
- wake on frame Rx
- WiFi:
- Intel (iwlwifi):
- enable signaling and payload protected A-MSDUs
- support wider-bandwidth OFDMA
- support for new devices
- bump FW API to 89 for AX devices; 90 for BZ/SC devices
- MediaTek (mt76):
- mt7915: newer ADIE version support
- mt7925: radio temperature sensor support
- Qualcomm (ath11k):
- support 6 GHz station power modes: Low Power Indoor (LPI),
Standard Power) SP and Very Low Power (VLP)
- QCA6390 & WCN6855: support 2 concurrent station interfaces
- QCA2066 support
- Qualcomm (ath12k):
- refactoring in preparation for Multi-Link Operation (MLO)
support
- 1024 Block Ack window size support
- firmware-2.bin support
- support having multiple identical PCI devices (firmware needs
to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
- QCN9274: support split-PHY devices
- WCN7850: enable Power Save Mode in station mode
- WCN7850: P2P support
- RealTek:
- rtw88: support for more rtw8811cu and rtw8821cu devices
- rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
- rtlwifi: speed up USB firmware initialization
- rtwl8xxxu:
- RTL8188F: concurrent interface support
- Channel Switch Announcement (CSA) support in AP mode
- Broadcom (brcmfmac):
- per-vendor feature support
- per-vendor SAE password setup
- DMI nvram filename quirk for ACEPC W5 Pro"
* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
nexthop: Fix out-of-bounds access during attribute validation
nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
nexthop: Only parse NHA_OP_FLAGS for get messages that require it
bpf: move sleepable flag from bpf_prog_aux to bpf_prog
bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
selftests/bpf: Add kprobe multi triggering benchmarks
ptp: Move from simple ida to xarray
vxlan: Remove generic .ndo_get_stats64
vxlan: Do not alloc tstats manually
devlink: Add comments to use netlink gen tool
nfp: flower: handle acti_netdevs allocation failure
net/packet: Add getsockopt support for PACKET_COPY_THRESH
net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
selftests/bpf: Add bpf_arena_htab test.
selftests/bpf: Add bpf_arena_list test.
selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
bpf: Add helper macro bpf_addr_space_cast()
libbpf: Recognize __arena global variables.
bpftool: Recognize arena map type
...
Linus Torvalds [Tue, 12 Mar 2024 22:18:34 +0000 (15:18 -0700)]
Merge tag 'docs-6.9' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"A moderatly busy cycle for development this time around.
- Some cleanup of the main index page for easier navigation
- Rework some of the other top-level pages for better readability
and, with luck, fewer merge conflicts in the future.
- Submit-checklist improvements, hopefully the first of many.
- New Italian translations
- A fair number of kernel-doc fixes and improvements. We have also
dropped the recommendation to use an old version of Sphinx.
- A new document from Thorsten on bisection
... and lots of fixes and updates"
* tag 'docs-6.9' of git://git.lwn.net/linux: (54 commits)
docs: verify/bisect: fixes, finetuning, and support for Arch
docs: Makefile: Add dependency to $(YNL_INDEX) for targets other than htmldocs
docs: Move ja_JP/howto.rst to ja_JP/process/howto.rst
docs: submit-checklist: use subheadings
docs: submit-checklist: structure by category
docs: new text on bisecting which also covers bug validation
docs: drop the version constraints for sphinx and dependencies
docs: kerneldoc-preamble.sty: Remove code for Sphinx <2.4
docs: Restore "smart quotes" for quotes
docs/zh_CN: accurate translation of "function"
docs: Include simplified link titles in main index
docs: Correct formatting of title in admin-guide/index.rst
docs: kernel_feat.py: fix build error for missing files
MAINTAINERS: Set the field name for subsystem profile section
kasan: Add documentation for CONFIG_KASAN_EXTRA_INFO
Fixed case issue with 'fault-injection' in documentation
kernel-doc: handle #if in enums as well
Documentation: update mailing list addresses
doc: kerneldoc.py: fix indentation
scripts/kernel-doc: simplify signature printing
...
Linus Torvalds [Tue, 12 Mar 2024 22:10:51 +0000 (15:10 -0700)]
Merge tag 'audit-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Two small audit patches:
- Use the KMEM_CACHE() macro instead of kmem_cache_create()
The guidance appears to be to use the KMEM_CACHE() macro when
possible and there is no reason why we can't use the macro, so
let's use it.
- Remove an unnecessary assignment in audit_dupe_lsm_field()
A return value variable was assigned a value in its declaration,
but the declaration value is overwritten before the return value
variable is ever referenced; drop the assignment at declaration
time"
* tag 'audit-pr-20240312' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: use KMEM_CACHE() instead of kmem_cache_create()
audit: remove unnecessary assignment in audit_dupe_lsm_field()
Linus Torvalds [Tue, 12 Mar 2024 22:08:06 +0000 (15:08 -0700)]
Merge tag 'Smack-for-6.9' of https://github.com/cschaufler/smack-next
Pull smack updates from Casey Schaufler:
- Improvements to the initialization of in-memory inodes
- A fix in ramfs to propery ensure the initialization of in-memory
inodes
- Removal of duplicated code in smack_cred_transfer()
* tag 'Smack-for-6.9' of https://github.com/cschaufler/smack-next:
Smack: use init_task_smack() in smack_cred_transfer()
ramfs: Initialize security of in-memory inodes
smack: Initialize the in-memory inode in smack_inode_init_security()
smack: Always determine inode labels in smack_inode_init_security()
smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity()
smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()
Linus Torvalds [Tue, 12 Mar 2024 22:05:27 +0000 (15:05 -0700)]
Merge tag 'seccomp-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
"There are no core kernel changes here; it's entirely selftests and
samples:
- Improve reliability of selftests (Terry Tritton, Kees Cook)
- Fix strict-aliasing warning in samples (Arnd Bergmann)"
* tag 'seccomp-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
samples: user-trap: fix strict-aliasing warning
selftests/seccomp: Pin benchmark to single CPU
selftests/seccomp: user_notification_addfd check nextfd is available
selftests/seccomp: Change the syscall used in KILL_THREAD test
selftests/seccomp: Handle EINVAL on unshare(CLONE_NEWPID)
Dave Jiang [Fri, 8 Mar 2024 21:59:31 +0000 (14:59 -0700)]
cxl/region: Deal with numa nodes not enumerated by SRAT
For the numa nodes that are not created by SRAT, no memory_target is
allocated and is not managed by the HMAT_REPORTING code. Therefore
hmat_callback() memory hotplug notifier will exit early on those NUMA
nodes. The CXL memory hotplug notifier will need to call
node_set_perf_attrs() directly in order to setup the access sysfs
attributes.
In acpi_numa_init(), the last proximity domain (pxm) id created by SRAT is
stored. Add a helper function acpi_node_backed_by_real_pxm() in order to
check if a NUMA node id is defined by SRAT or created by CFMWS.
node_set_perf_attrs() symbol is exported to allow update of perf attribs
for a node. The sysfs path of
/sys/devices/system/node/nodeX/access0/initiators/* is created by
node_set_perf_attrs() for the various attributes where nodeX is matched
to the NUMA node of the CXL region.
Linus Torvalds [Tue, 12 Mar 2024 21:49:30 +0000 (14:49 -0700)]
Merge tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"As is pretty normal for this tree, there are changes all over the
place, especially for small fixes, selftest improvements, and improved
macro usability.
Some header changes ended up landing via this tree as they depended on
the string header cleanups. Also, a notable set of changes is the work
for the reintroduction of the UBSAN signed integer overflow sanitizer
so that we can continue to make improvements on the compiler side to
make this sanitizer a more viable future security hardening option.
Summary:
- string.h and related header cleanups (Tanzir Hasan, Andy
Shevchenko)
- Add comments to explain how __is_constexpr() works
- Fix m68k stack alignment expectations in stackinit Kunit test
- Convert string selftests to KUnit
- Add KUnit tests for fortified string functions
- Improve reporting during fortified string warnings
- Allow non-type arg to type_max() and type_min()
- Allow strscpy() to be called with only 2 arguments
- Add binary mode to leaking_addresses scanner
- Various small cleanups to leaking_addresses scanner
- Adding wrapping_*() arithmetic helper
- Annotate initial signed integer wrap-around in refcount_t
- Add explicit UBSAN section to MAINTAINERS
- Fix UBSAN self-test warnings
- Simplify UBSAN build via removal of CONFIG_UBSAN_SANITIZE_ALL
- Reintroduce UBSAN's signed overflow sanitizer"
* tag 'hardening-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (51 commits)
selftests/powerpc: Fix load_unaligned_zeropad build failure
string: Convert helpers selftest to KUnit
string: Convert selftest to KUnit
sh: Fix build with CONFIG_UBSAN=y
compiler.h: Explain how __is_constexpr() works
overflow: Allow non-type arg to type_max() and type_min()
VMCI: Fix possible memcpy() run-time warning in vmci_datagram_invoke_guest_handler()
lib/string_helpers: Add flags param to string_get_size()
x86, relocs: Ignore relocations in .notes section
objtool: Fix UNWIND_HINT_{SAVE,RESTORE} across basic blocks
overflow: Use POD in check_shl_overflow()
lib: stackinit: Adjust target string to 8 bytes for m68k
sparc: vdso: Disable UBSAN instrumentation
kernel.h: Move lib/cmdline.c prototypes to string.h
leaking_addresses: Provide mechanism to scan binary files
leaking_addresses: Ignore input device status lines
leaking_addresses: Use File::Temp for /tmp files
MAINTAINERS: Update LEAKING_ADDRESSES details
fortify: Improve buffer overflow reporting
fortify: Add KUnit tests for runtime overflows
...
- Avoid potential double-free during pstorefs umount
* tag 'pstore-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/zone: Don't clear memory twice
pstore/zone: Add a null pointer check to the psz_kmsg_read
efi: pstore: Allow dynamic initialization based on module parameter
arm64: defconfig: Enable PSTORE_RAM
pstore/ram: Register to module device table
pstore: inode: Only d_invalidate() is needed
Linus Torvalds [Tue, 12 Mar 2024 21:27:37 +0000 (14:27 -0700)]
Merge tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"The bulk of the patches for this release are optimizations, code
clean-ups, and minor bug fixes.
One new feature to mention is that NFSD administrators now have the
ability to revoke NFSv4 open and lock state. NFSD's NFSv3 support has
had this capability for some time.
As always I am grateful to NFSD contributors, reviewers, and testers"
* tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (75 commits)
NFSD: Clean up nfsd4_encode_replay()
NFSD: send OP_CB_RECALL_ANY to clients when number of delegations reaches its limit
NFSD: Document nfsd_setattr() fill-attributes behavior
nfsd: Fix NFSv3 atomicity bugs in nfsd_setattr()
nfsd: Fix a regression in nfsd_setattr()
NFSD: OP_CB_RECALL_ANY should recall both read and write delegations
NFSD: handle GETATTR conflict with write delegation
NFSD: add support for CB_GETATTR callback
NFSD: Document the phases of CREATE_SESSION
NFSD: Fix the NFSv4.1 CREATE_SESSION operation
nfsd: clean up comments over nfs4_client definition
svcrdma: Add Write chunk WRs to the RPC's Send WR chain
svcrdma: Post WRs for Write chunks in svc_rdma_sendto()
svcrdma: Post the Reply chunk and Send WR together
svcrdma: Move write_info for Reply chunks into struct svc_rdma_send_ctxt
svcrdma: Post Send WR chain
svcrdma: Fix retry loop in svc_rdma_send()
svcrdma: Prevent a UAF in svc_rdma_send()
svcrdma: Fix SQ wake-ups
svcrdma: Increase the per-transport rw_ctx count
...
Linus Torvalds [Tue, 12 Mar 2024 20:25:53 +0000 (13:25 -0700)]
Merge tag 'erofs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, we introduce compressed inode support over fscache
since a lot of native EROFS images are explicitly compressed so that
EROFS over fscache can be more widely used even without Dragonfly
Nydus [1].
Apart from that, there are some folio conversions for compressed
inodes available as well as a lockdep false positive fix.
Summary:
- Some folio conversions for compressed inodes;
- Add compressed inode support over fscache;
- Fix lockdep false positives of erofs_pseudo_mnt"
Link: https://nydus.dev
* tag 'erofs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: support compressed inodes over fscache
erofs: make iov_iter describe target buffers over fscache
erofs: fix lockdep false positives on initializing erofs_pseudo_mnt
erofs: refine managed cache operations to folios
erofs: convert z_erofs_submissionqueue_endio() to folios
erofs: convert z_erofs_fill_bio_vec() to folios
erofs: get rid of `justfound` debugging tag
erofs: convert z_erofs_do_read_page() to folios
erofs: convert z_erofs_onlinepage_.* to folios
Linus Torvalds [Tue, 12 Mar 2024 20:17:36 +0000 (13:17 -0700)]
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux
Pull fscrypt updates from Eric Biggers:
"Fix flakiness in a test by releasing the quota synchronously when a
key is removed, and other minor cleanups"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: shrink the size of struct fscrypt_inode_info slightly
fscrypt: write CBC-CTS instead of CTS-CBC
fscrypt: clear keyring before calling key_put()
fscrypt: explicitly require that inode->i_blkbits be set
mul_u64_u64_div_u64: increase precision by conditionally swapping a and b
As indicated in the added comment, the algorithm works better if b is big.
As multiplication is commutative, a and b can be swapped. Do this if a
is bigger than b.
Nico Pache [Wed, 6 Mar 2024 22:37:14 +0000 (15:37 -0700)]
selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
Now that run_vmtests.sh does not guarantee that the correct hugepage count
is available, skip the hugetlb-madvise test if the requirements are not
met rather than failing.
Nico Pache [Wed, 6 Mar 2024 22:37:13 +0000 (15:37 -0700)]
selftests/mm: skip uffd hugetlb tests with insufficient hugepages
Now that run_vmtests.sh does not guarantee that the correct hugepage count
is available, add a check inside the userfaultfd hugetlb test to verify
the nr_hugepages count before continuing.
Nico Pache [Wed, 6 Mar 2024 22:37:12 +0000 (15:37 -0700)]
selftests/mm: dont fail testsuite due to a lack of hugepages
Patch series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests", v2.
This series addresses issues related to hugepage requirements in the MM
selftests, ensuring tests are skipped rather than failing when the
necessary hugepage count is not met.
This adjustment allows for a more graceful handling for systems with
insufficient hugepages, preventing unnecessary test failures and improving
the overall robustness of the test suite.
This patch (of 3):
On systems that have large core counts and large page sizes, but limited
memory, the userfaultfd test hugepage requirement is too large.
Exiting early due to missing one test's requirements is a rather
aggressive strategy, and prevents a lot of other tests from running.
Remove the early exit to prevent this.
Zi Yan [Thu, 7 Mar 2024 18:18:54 +0000 (13:18 -0500)]
mm/huge_memory: skip invalid debugfs new_order input for folio split
User can put arbitrary new_order via debugfs for folio split test.
Although new_order check is added to split_huge_page_to_list_order() in
the prior commit, these two additional checks can avoid unnecessary folio
locking and split_folio_to_order() calls.
Byungchul Park [Mon, 4 Mar 2024 06:27:37 +0000 (15:27 +0900)]
mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
With cache_trim_mode on, reclaim logic doesn't bother reclaiming anon
pages. However, it should be more careful to use the mode because it's
going to prevent anon pages from being reclaimed even if there are a huge
number of anon pages that are cold and should be reclaimed. Even worse,
that leads kswapd_failures to reach MAX_RECLAIM_RETRIES and stopping
kswapd from functioning until direct reclaim eventually works to resume
kswapd.
So kswapd needs to retry its scan priority loop with cache_trim_mode off
again if the mode doesn't work for reclaim.
The problematic behavior can be reproduced by:
CONFIG_NUMA_BALANCING enabled
sysctl_numa_balancing_mode set to NUMA_BALANCING_MEMORY_TIERING
numa node0 (8GB local memory, 16 CPUs)
numa node1 (8GB slow tier memory, no CPUs)
Sequence:
1) echo 3 > /proc/sys/vm/drop_caches
2) To emulate the system with full of cold memory in local DRAM, run
the following dummy program and never touch the region:
3) Run any memory intensive work e.g. XSBench.
4) Check if numa balancing is working e.i. promotion/demotion.
5) Iterate 1) ~ 4) until numa balancing stops.
With this, you could see that promotion/demotion are not working because
kswapd has stopped due to ->kswapd_failures >= MAX_RECLAIM_RETRIES.
Interesting vmstat delta's differences between before and after are like:
James Houghton [Thu, 7 Mar 2024 01:02:50 +0000 (01:02 +0000)]
mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
Users of UFFDIO_CONTINUE may reasonably assume that a write memory barrier
is included as part of UFFDIO_CONTINUE. That is, a user may believe that
all writes it has done to a page that it is now UFFDIO_CONTINUE'ing are
guaranteed to be visible to anyone subsequently reading the page through
the newly mapped virtual memory region.
Today, such a user happens to be correct. mmget_not_zero(), for example,
is called as part of UFFDIO_CONTINUE (and comes before any PTE updates),
and it implicitly gives us a write barrier.
To be resilient against future changes, include an explicit smp_wmb().
While we're at it, optimize the smp_wmb() that is already incidentally
present for the HugeTLB case.
Merely making a syscall does not generally imply the memory ordering
constraints that we need (including on x86).
My recent change to put_pages_list() dereferences folio->lru.next after
returning the folio to the page allocator. Usually this is now on the pcp
list with other free folios, so we try to free an already-free folio.
This only happens with lists that have more than 15 entries, so it wasn't
immediately discovered. Revert to using list_for_each_safe() so we
dereference lru.next before disposing of the folio.
Link: https://lkml.kernel.org/r/[email protected] Fixes: 24835f899c01 ("mm: use free_unref_folios() in put_pages_list()") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reported-by: "Borah, Chaitanya Kumar" <[email protected]> Closes: https://lore.kernel.org/intel-gfx/SJ1PR11MB61292145F3B79DA58ADDDA63B9232@SJ1PR11MB6129.namprd11.prod.outlook.com/ Signed-off-by: Andrew Morton <[email protected]>
mm: remove folio from deferred split list before uncharging it
When freeing a large folio, we must remove it from the deferred split list
before we uncharge it as each memcg has its own deferred split list (with
associated lock) and removing a folio from the deferred split list while
holding the wrong lock will corrupt that list and cause various related
problems.
Dave Jiang [Fri, 8 Mar 2024 21:59:30 +0000 (14:59 -0700)]
cxl/region: Add memory hotplug notifier for cxl region
When the CXL region is formed, the driver computes the performance data
for the region. However this data is not available at the node data
collection that has been populated by the HMAT during kernel
initialization. Add a memory hotplug notifier to update the access
coordinates to the 'struct memory_target' context kept by the
HMAT_REPORTING code.
Add CXL_CALLBACK_PRI for a memory hotplug callback priority. Set the
priority number to be called before HMAT_CALLBACK_PRI. The CXL update must
happen before hmat_callback().
A new HMAT_REPORTING helper hmat_update_target_coordinates() is added in
order to allow CXL to update the memory_target access coordinates.
A new ext_updated member is added to the memory_target to indicate that
the access coordinates within the memory_target has been updated by an
external agent such as CXL. This prevents data being overwritten by the
hmat_update_target_attrs() triggered by hmat_callback().
Dave Jiang [Fri, 8 Mar 2024 21:59:29 +0000 (14:59 -0700)]
cxl/region: Add sysfs attribute for locality attributes of CXL regions
Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
region. The bandwidth is the aggregated bandwidth of all devices that
contribute to the CXL region. The latency is the worst latency of the
device amongst all the devices that contribute to the CXL region.
Dave Jiang [Fri, 8 Mar 2024 21:59:28 +0000 (14:59 -0700)]
cxl/region: Calculate performance data for a region
Calculate and store the performance data for a CXL region. Find the worst
read and write latency for all the included ranges from each of the devices
that attributes to the region and designate that as the latency data. Sum
all the read and write bandwidth data for each of the device region and
that is the total bandwidth for the region.
The perf list is expected to be constructed before the endpoint decoders
are registered and thus there should be no early reading of the entries
from the region assemble action. The calling of the region qos calculate
function is under the protection of cxl_dpa_rwsem and will ensure that
all DPA associated work has completed.
Dave Jiang [Fri, 8 Mar 2024 21:59:27 +0000 (14:59 -0700)]
cxl: Set cxlmd->endpoint before adding port device
Move setting of cxlmd->endpoint to before calling add_device() on the port
device. Otherwise when referencing cxlmd->endpoint in region discovery code
that is triggered by the port driver probe function, the endpoint port
pointer is not valid.
Current code does not hit this issue yet since cxlmd->endpoint is not being
referenced during region discovery. However follow on code that does
performance calculations will.
Dave Jiang [Fri, 8 Mar 2024 21:59:26 +0000 (14:59 -0700)]
cxl: Move QoS class to be calculated from the nearest CPU
Retrieve the qos_class (QTG ID) using the access coordinates from the
nearest CPU rather than the nearst initiator that may not be a CPU.
This may be the more appropriate number that applications care about.
For most cases, access0 and access1 have the same values.
Dave Jiang [Fri, 8 Mar 2024 21:59:25 +0000 (14:59 -0700)]
cxl: Split out host bridge access coordinates
The difference between access class 0 and access class 1 for 'struct
access_coordinate', if any, is that class 0 is for the distance from
the target to the closest initiator and that class 1 is for the distance
from the target to the closest CPU. For CXL memory, the nearest initiator
may not necessarily be a CPU node. The performance path from the CXL
endpoint to the host bridge should remain the same. However, the numbers
extracted and stored from HMAT is the difference for the two access
classes. Split out the performance numbers for the host bridge (generic
target) from the calculation of the entire path in order to allow
calculation of both access classes for a CXL region.
Dave Jiang [Fri, 8 Mar 2024 21:59:24 +0000 (14:59 -0700)]
cxl: Split out combine_coordinates() for common shared usage
Refactor the common code of combining coordinates in order to reduce code.
Create a new function cxl_cooordinates_combine() it combine two 'struct
access_coordinate'.
Dave Jiang [Fri, 8 Mar 2024 21:59:23 +0000 (14:59 -0700)]
ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access classes
Update acpi_get_genport_coordinates() to allow retrieval of both access
classes of the 'struct access_coordinate' for a generic target. The update
will allow CXL code to compute access coordinates for both access class.
Dave Jiang [Fri, 8 Mar 2024 21:59:22 +0000 (14:59 -0700)]
ACPI: HMAT: Introduce 2 levels of generic port access class
In order to compute access0 and access1 classes for CXL memory, 2 levels
of generic port information must be stored. Access0 will indicate the
generic port access coordinates to the closest initiator and access1
will indicate the generic port access coordinates to the cloest CPU.
Dave Jiang [Fri, 8 Mar 2024 21:59:21 +0000 (14:59 -0700)]
base/node / ACPI: Enumerate node access class for 'struct access_coordinate'
Both generic node and HMAT handling code have been using magic numbers to
indicate access classes for 'struct access_coordinate'. Introduce enums to
enumerate the access0 and access1 classes shared by the two subsystems.
Update the function parameters and callers as appropriate to utilize the
new enum.
Access0 is named to ACCESS_COORDINATE_LOCAL in order to indicate that the
access class is for 'struct access_coordinate' between a target node and
the nearest initiator node.
Access1 is named to ACCESS_COORDINATE_CPU in order to indicate that the
access class is for 'struct access_coordinate' between a target node and
the nearest CPU node.
Dave Jiang [Fri, 8 Mar 2024 21:59:20 +0000 (14:59 -0700)]
ACPI: HMAT: Remove register of memory node for generic target
For generic targets, there's no reason to call
register_memory_node_under_compute_node() with the access levels that are
only visible to HMAT handling code. Only update the attributes and rename
hmat_register_generic_target_initiators() to hmat_update_generic_target().
The original call path ends up triggering register_memory_node_under_compute_node().
Although the access level would be "3" and not impact any current node arrays, it
introduces unwanted data into the numa node access_coordinate array.
Linus Torvalds [Tue, 12 Mar 2024 19:28:34 +0000 (12:28 -0700)]
Merge tag 'for-6.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Mostly stabilization, refactoring and cleanup changes. There rest are
minor performance optimizations due to caching or lock contention
reduction and a few notable fixes.
Performance improvements:
- minor speedup in logging when repeatedly allocated structure is
preallocated only once, improves latency and decreases lock
contention
- minor throughput increase (+6%), reduced lock contention after
clearing delayed allocation bits, applies to several common
workload types
- skip full quota rescan if a new relation is added in the same
transaction
Fixes:
- zstd fix for inline compressed file in subpage mode, updated
version from the 6.8 time
- more fiemap followup fixes after reduced locking done in 6.8:
- fix race when detecting delalloc ranges
Core changes:
- more debugging code:
- added assertions for a very rare crash in raid56 calculation
- tree-checker dumps page state to give more insights into
possible reference counting issues
- add checksum calculation offloading sysfs knob, for now enabled
under DEBUG only to determine a good heuristic for deciding the
offload or synchronous, depends on various factors (block group
profile, device speed) and is not as clear as initially thought
(checksum type)
- error handling improvements, added assertions
- more page to folio conversion (defrag, truncate), cached size and
shift
- preparation for more fine grained locking of sectors in subpage
mode
- cleanups and refactoring:
- include cleanups, forward declarations
- pointer-to-structure helpers
- redundant argument removals
- removed unused code
- slab cache updates, last use of SLAB_MEM_SPREAD removed"
* tag 'for-6.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (114 commits)
btrfs: reuse cloned extent buffer during fiemap to avoid re-allocations
btrfs: fix race when detecting delalloc ranges during fiemap
btrfs: fix off-by-one chunk length calculation at contains_pending_extent()
btrfs: qgroup: allow quick inherit if snapshot is created and added to the same parent
btrfs: qgroup: validate btrfs_qgroup_inherit parameter
btrfs: include device major and minor numbers in the device scan notice
btrfs: mark btrfs_put_caching_control() static
btrfs: remove SLAB_MEM_SPREAD flag use
btrfs: qgroup: always free reserved space for extent records
btrfs: tree-checker: dump the page status if hit something wrong
btrfs: compression: remove dead comments in btrfs_compress_heuristic()
btrfs: subpage: make writer lock utilize bitmap
btrfs: subpage: make reader lock utilize bitmap
btrfs: unexport btrfs_subpage_start_writer() and btrfs_subpage_end_and_test_writer()
btrfs: pass a valid extent map cache pointer to __get_extent_map()
btrfs: merge btrfs_del_delalloc_inode() helpers
btrfs: pass btrfs_device to btrfs_scratch_superblocks()
btrfs: handle transaction commit errors in flush_reservations()
btrfs: use KMEM_CACHE() to create btrfs_free_space cache
btrfs: use KMEM_CACHE() to create delayed ref caches
...