]> Git Repo - linux.git/log
linux.git
6 months agodrm/amdgpu/gfx11: fallback to driver reset compute queue directly (v2)
Prike Liang [Fri, 14 Jun 2024 13:25:44 +0000 (21:25 +0800)]
drm/amdgpu/gfx11: fallback to driver reset compute queue directly (v2)

Since the MES FW resets kernel compute queue always failed, this
may caused by the KIQ failed to process unmap KCQ. So, before MES
FW work properly that will fallback to driver executes dequeue and
resets SPI directly. Besides, rework the ring reset function and make
the busy ring type reset in each function respectively.

Acked-by: Vitaly Prosyak <[email protected]>
Signed-off-by: Prike Liang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amd/display: 3.2.299
Aric Cyr [Sun, 25 Aug 2024 23:40:51 +0000 (19:40 -0400)]
drm/amd/display: 3.2.299

This version brings along the following:

- DCN35 fixes
- DML2 fixes
- IPS fixes
- ODM fixes
- Miscellaneous cleanups
- MST fixes
- SPL fixes

Acked-by: Aurabindo Pillai <[email protected]>
Signed-off-by: Aric Cyr <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amd/display: Fix flickering caused by dccg
Hansen Dsouza [Wed, 14 Aug 2024 15:20:08 +0000 (11:20 -0400)]
drm/amd/display: Fix flickering caused by dccg

Always allow un-gating. Follow legacy workaround for repeated
dppclk dto updates

Reviewed-by: Muhammad Ahmed <[email protected]>
Signed-off-by: Hansen Dsouza <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amd/display: Block timing sync for different signals in PMO
Dillon Varone [Thu, 22 Aug 2024 21:52:57 +0000 (17:52 -0400)]
drm/amd/display: Block timing sync for different signals in PMO

PMO assumes that like timings can be synchronized, but DC only allows
this if the signal types match.

Reviewed-by: Austin Zheng <[email protected]>
Signed-off-by: Dillon Varone <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amd/display: fix graphics hang in multi-display mst case
Gabe Teeger [Fri, 23 Aug 2024 13:50:22 +0000 (09:50 -0400)]
drm/amd/display: fix graphics hang in multi-display mst case

[what]
Graphics hang observed with 3 displays connected to DP2.0 mst dock.

[why]
There's a mismatch in dml and dc between the assignments of hpo link
encoders.

[how]
Add a new array in dml that tracks the current mapping of HPO stream
encoders to HPO link encoders in dc.

Reviewed-by: Sung joon Kim <[email protected]>
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Signed-off-by: Gabe Teeger <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amd/display: Add sharpness control interface
Relja Vojvodic [Wed, 21 Aug 2024 13:34:21 +0000 (09:34 -0400)]
drm/amd/display: Add sharpness control interface

- Add interface for controlling shapness level input into DCN.
- Update SPL to support custom sharpness values.
- Add support for different sharpness values depending on YUV/RGB
  content.

Reviewed-by: Samson Tam <[email protected]>
Signed-off-by: Relja Vojvodic <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agoRevert "drm/amd/display: Wait for all pending cleared before full update"
Dillon Varone [Tue, 20 Aug 2024 19:13:14 +0000 (15:13 -0400)]
Revert "drm/amd/display: Wait for all pending cleared before full update"

This reverts commit f0b7dcf25834afd17df316367dfe5d4c890c713c.

It is causing graphics hangs.

Reviewed-by: Martin Leung <[email protected]>
Signed-off-by: Dillon Varone <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amd/display: disable sharpness if HDR Multiplier is too large
Samson Tam [Thu, 22 Aug 2024 00:17:23 +0000 (20:17 -0400)]
drm/amd/display: disable sharpness if HDR Multiplier is too large

[Why]
Certain profiles have higher HDR multiplier than SDR boost max which
is not currently supported

[How]
Disable sharpness for these profiles

Fixes: 1b0ce903fe74 ("drm/amd/display: add improvements for text display and HDR DWM and MPO")
Reviewed-by: Martin Leung <[email protected]>
Signed-off-by: Samson Tam <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amd/display: Add dpia debug option to control power management
Meenakshikumar Somasundaram [Tue, 20 Aug 2024 17:15:38 +0000 (13:15 -0400)]
drm/amd/display: Add dpia debug option to control power management

[Why]
To provide option to dpia control power management

[How]
By adding disable_usb4_pm_support bit field in dpia_debug option to
control dpia power management

Reviewed-by: Jun Lei <[email protected]>
Signed-off-by: Meenakshikumar Somasundaram <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amdgpu/gfx11: add ring reset callbacks
Alex Deucher [Fri, 24 May 2024 16:20:10 +0000 (12:20 -0400)]
drm/amdgpu/gfx11: add ring reset callbacks

Add ring reset callbacks for gfx and compute.

Acked-by: Vitaly Prosyak <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amd/display: re-enable Dynamic ODM policy
Samson Tam [Wed, 21 Aug 2024 15:03:11 +0000 (11:03 -0400)]
drm/amd/display: re-enable Dynamic ODM policy

[Why]
Previous disable ODM policy due to underflow issue with sharpener.
Issue is resolved after updating sharpening policy to apply to
both windowed and fullscreen video

[How]
Remove sharpness check disabling Dynamic ODM policy

Reviewed-by: Martin Leung <[email protected]>
Signed-off-by: Samson Tam <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amd/display: Lock DC and exit IPS when changing backlight
Leo Li [Tue, 20 Aug 2024 18:34:15 +0000 (14:34 -0400)]
drm/amd/display: Lock DC and exit IPS when changing backlight

Backlight updates require aux and/or register access. Therefore, driver
needs to disallow IPS beforehand.

So, acquire the dc lock before calling into dc to update backlight - we
should be doing this regardless of IPS. Then, while the lock is held,
disallow IPS before calling into dc, then allow IPS afterwards (if it
was previously allowed).

Reviewed-by: Aurabindo Pillai <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Leo Li <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amd/display: only trigger BIOS related assert for older ASICs
Daniel Sa [Tue, 20 Aug 2024 18:19:26 +0000 (14:19 -0400)]
drm/amd/display: only trigger BIOS related assert for older ASICs

[Why]
Some asserts are always hit on startup/Pnp when they should only be used
to indicate when something has gone wrong.

[How]
Ignore result of getting function from bios cmd table for newer asics.

Reviewed-by: Jun Lei <[email protected]>
Signed-off-by: Daniel Sa <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amd/display: Fix DCN35 set min dispclk logic
Nicholas Susanto [Tue, 20 Aug 2024 15:05:54 +0000 (11:05 -0400)]
drm/amd/display: Fix DCN35 set min dispclk logic

[Why]

Setting min dispclk to 50Mhz outside clock lowering function causes
unnecessary calls to SMU to lower dispclk and causes dentist hangs when
there is no stream on the pipes.

[How]

Move the set minimum dispclk logic inside the lowering dispclk if
statement.

Fixes: 234441320552 ("DCN35 set min dispclk to 50Mhz")
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Signed-off-by: Nicholas Susanto <[email protected]>
Signed-off-by: Hamza Mahfooz <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agodrm/amdgpu/gfx9.4.3: Implement compute pipe reset
Prike Liang [Thu, 29 Aug 2024 03:47:12 +0000 (11:47 +0800)]
drm/amdgpu/gfx9.4.3: Implement compute pipe reset

Implement the compute pipe reset, and the driver will
fallback to pipe reset when queue reset fails.
The pipe reset only deactivates the queue which is
scheduled in the pipe, and meanwhile the MEC pipe
will be reset to the firmware _start pointer. So,
it seems pipe reset will cost more cycles than the
queue reset; therefore, the driver tries to recover
by doing queue reset first.

Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Prike Liang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
6 months agoKVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM
Tom Dohrmann [Mon, 2 Sep 2024 14:42:19 +0000 (14:42 +0000)]
KVM: x86: Only advertise KVM_CAP_READONLY_MEM when supported by VM

Until recently, KVM_CAP_READONLY_MEM was unconditionally supported on
x86, but this is no longer the case for SEV-ES and SEV-SNP VMs.

When KVM_CHECK_EXTENSION is invoked on a VM, only advertise
KVM_CAP_READONLY_MEM when it's actually supported.

Fixes: 66155de93bcf ("KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)")
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Michael Roth <[email protected]>
Signed-off-by: Tom Dohrmann <[email protected]>
Message-ID: <20240902144219.3716974[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
6 months agoMerge tag 'kvm-x86-fixes-6.11-rcN' of https://github.com/kvm-x86/linux into kvm-master
Paolo Bonzini [Mon, 2 Sep 2024 14:55:27 +0000 (10:55 -0400)]
Merge tag 'kvm-x86-fixes-6.11-rcN' of https://github.com/kvm-x86/linux into kvm-master

KVM x86 fixes for 6.11

 - Fixup missed comments from the REMOVED_SPTE=>FROZEN_SPTE rename.

 - Ensure a root is successfully loaded when pre-faulting SPTEs.

 - Grab kvm->srcu when handling KVM_SET_VCPU_EVENTS to guard against accessing
   memslots if toggling SMM happens to force a VM-Exit.

 - Emulate MSR_{FS,GS}_BASE on SVM even though interception is always disabled,
   so that KVM does the right thing if KVM's emulator encounters {RD,WR}MSR.

 - Explicitly clear BUS_LOCK_DETECT from KVM's caps on AMD, as KVM doesn't yet
   virtualize BUS_LOCK_DETECT on AMD.

 - Cleanup the help message for CONFIG_KVM_AMD_SEV, and call out that KVM now
   supports SEV-SNP too.

6 months agohwmon: (hp-wmi-sensors) Check if WMI event data exists
Armin Wolf [Sun, 1 Sep 2024 03:10:51 +0000 (05:10 +0200)]
hwmon: (hp-wmi-sensors) Check if WMI event data exists

The BIOS can choose to return no event data in response to a
WMI event, so the ACPI object passed to the WMI notify handler
can be NULL.

Check for such a situation and ignore the event in such a case.

Fixes: 23902f98f8d4 ("hwmon: add HP WMI Sensors driver")
Signed-off-by: Armin Wolf <[email protected]>
Reviewed-by: Ilpo Järvinen <[email protected]>
Message-ID: <20240901031055[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
6 months agogpio: modepin: Enable module autoloading
Liao Chen [Mon, 2 Sep 2024 11:58:48 +0000 (11:58 +0000)]
gpio: modepin: Enable module autoloading

Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based
on the alias from of_device_id table.

Fixes: 7687a5b0ee93 ("gpio: modepin: Add driver support for modepin GPIO controller")
Signed-off-by: Liao Chen <[email protected]>
Reviewed-by: Michal Simek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
6 months agoigc: Unlock on error in igc_io_resume()
Dan Carpenter [Thu, 29 Aug 2024 19:22:45 +0000 (22:22 +0300)]
igc: Unlock on error in igc_io_resume()

Call rtnl_unlock() on this error path, before returning.

Fixes: bc23aa949aeb ("igc: Add pcie error handler support")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Gerhard Engleder <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 months agogpio: rockchip: fix OF node leak in probe()
Krzysztof Kozlowski [Mon, 26 Aug 2024 15:08:32 +0000 (17:08 +0200)]
gpio: rockchip: fix OF node leak in probe()

Driver code is leaking OF node reference from of_get_parent() in
probe().

Fixes: 936ee2675eee ("gpio/rockchip: add driver for rockchip gpio")
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Reviewed-by: Heiko Stuebner <[email protected]>
Reviewed-by: Shawn Lin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
6 months agodrm/xe/display: remove unused compat kdev_to_i915() and pdev_to_i915()
Jani Nikula [Thu, 29 Aug 2024 14:47:48 +0000 (17:47 +0300)]
drm/xe/display: remove unused compat kdev_to_i915() and pdev_to_i915()

The display code no longer uses kdev_to_i915() or pdev_to_i915()
helpers. Remove them.

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/05b948f9012fc7c0b97d567c70b0bac8791d554a.1724942754.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
6 months agodrm/i915/hdcp: migrate away from kdev_to_i915() in GSC messaging
Jani Nikula [Thu, 29 Aug 2024 14:47:47 +0000 (17:47 +0300)]
drm/i915/hdcp: migrate away from kdev_to_i915() in GSC messaging

Use to_intel_display() instead of kdev_to_i915() in the HDCP component
API hooks. Avoid further drive-by changes at this point, and just
convert the display pointer to i915, and leave the struct intel_display
conversion for later.

The NULL error checking in the hooks make this a bit cumbersome. I'm not
actually sure they're really required, but don't go down that rabbit
hole just now.

Cc: Gustavo Sousa <[email protected]>
Reviewed-by: Ankit Nautiyal <[email protected]>
Reviewed-by: Suraj Kandpal <[email protected]>
Reviewed-by: Arun R Murthy <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/efd5c4c164c01b7ee50ad43f202b074b373fb810.1724942754.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
6 months agodrm/xe: Fix merge fails related to display runtime PM
Maarten Lankhorst [Mon, 2 Sep 2024 11:20:02 +0000 (13:20 +0200)]
drm/xe: Fix merge fails related to display runtime PM

The most recent merge commits introduced some fails to drm/drm-next,
I've noticed these when looking at the xe patches.

Solve it!

Fixes: 8bdb468dd7a5 ("Merge tag 'drm-xe-next-2024-08-28' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next")
Signed-off-by: Maarten Lankhorst <[email protected]>
[sima: add fixes line, and drop 3rd hunk because that's just a bugfix,
not mismerge, which should go in seperately with proper fixes line and
review/testing.]
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/i915/hdcp: migrate away from kdev_to_i915() in bind/unbind
Jani Nikula [Thu, 29 Aug 2024 14:47:46 +0000 (17:47 +0300)]
drm/i915/hdcp: migrate away from kdev_to_i915() in bind/unbind

Use to_intel_display() instead of kdev_to_i915() in the HDCP component
API hooks. Avoid further drive-by changes at this point, and just
convert the display pointer to i915, and leave the struct intel_display
conversion for later.

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/0beedaa438e912828b48d9980f017807e079d7ab.1724942754.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
6 months agodrm/i915/audio: migrate away from kdev_to_i915()
Jani Nikula [Thu, 29 Aug 2024 14:47:45 +0000 (17:47 +0300)]
drm/i915/audio: migrate away from kdev_to_i915()

Use to_intel_display() instead of kdev_to_i915() in the audio component
API hooks. Avoid further drive-by changes at this point, and just
convert the display pointer to i915, and leave the struct intel_display
conversion for later.

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/35ef00470db0088eb82b0406e4f7730154f54baf.1724942754.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
6 months agodrm/i915: support struct device and pci_dev in to_intel_display()
Jani Nikula [Thu, 29 Aug 2024 14:47:44 +0000 (17:47 +0300)]
drm/i915: support struct device and pci_dev in to_intel_display()

Now that both xe and i915 store struct drm_device in drvdata, we can
trivially support struct device and struct pci_dev in
to_intel_display().

We do need to check for NULL drvdata before converting it into struct
intel_device pointer, though. Do it in __drm_device_to_intel_display().

v2: Add NULL check in __drm_device_to_intel_display() (Gustavo)

Reviewed-by: Gustavo Sousa <[email protected]> # v1
Link: https://patchwork.freedesktop.org/patch/msgid/f025a3fa4422725c78baac4501ad3ecc9e5b40d5.1724942754.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
6 months agodrm/i915 & drm/xe: save struct drm_device to drvdata
Jani Nikula [Thu, 29 Aug 2024 14:47:43 +0000 (17:47 +0300)]
drm/i915 & drm/xe: save struct drm_device to drvdata

In the future, the display code shall not have any idea about struct
xe_device or struct drm_i915_private, but will need to get at the struct
drm_device via drvdata. Store the struct drm_device pointer to drvdata
instead of the driver specific pointer.

Avoid passing NULL to container_of() via to_i915()/to_xe_device(). (It
does return NULL for NULL pointers when the offset happens to be 0, but
otherwise returns garbage pointers for NULL.)

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/946805b32e38d4785880cc7857e01e6a309126a9.1724942754.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <[email protected]>
6 months agospi: bcm63xx: Enable module autoloading
Liao Chen [Sat, 31 Aug 2024 09:42:31 +0000 (09:42 +0000)]
spi: bcm63xx: Enable module autoloading

Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based
on the alias from of_device_id table.

Signed-off-by: Liao Chen <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
6 months agodrm/i915/fence: Mark debug_fence_free() with __maybe_unused
Andy Shevchenko [Thu, 29 Aug 2024 15:58:38 +0000 (18:58 +0300)]
drm/i915/fence: Mark debug_fence_free() with __maybe_unused

When debug_fence_free() is unused
(CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS=n), it prevents kernel builds
with clang, `make W=1` and CONFIG_WERROR=y:

.../i915_sw_fence.c:118:20: error: unused function 'debug_fence_free' [-Werror,-Wunused-function]
  118 | static inline void debug_fence_free(struct i915_sw_fence *fence)
      |                    ^~~~~~~~~~~~~~~~

Fix this by marking debug_fence_free() with __maybe_unused.

See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").

Fixes: fc1584059d6c ("drm/i915: Integrate i915_sw_fence with debugobjects")
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Jani Nikula <[email protected]>
(cherry picked from commit 8be4dce5ea6f2368cc25edc71989c4690fa66964)
Signed-off-by: Joonas Lahtinen <[email protected]>
6 months agodrm/i915/fence: Mark debug_fence_init_onstack() with __maybe_unused
Andy Shevchenko [Thu, 29 Aug 2024 15:58:37 +0000 (18:58 +0300)]
drm/i915/fence: Mark debug_fence_init_onstack() with __maybe_unused

When debug_fence_init_onstack() is unused (CONFIG_DRM_I915_SELFTEST=n),
it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y:

.../i915_sw_fence.c:97:20: error: unused function 'debug_fence_init_onstack' [-Werror,-Wunused-function]
   97 | static inline void debug_fence_init_onstack(struct i915_sw_fence *fence)
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~

Fix this by marking debug_fence_init_onstack() with __maybe_unused.

See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").

Fixes: 214707fc2ce0 ("drm/i915/selftests: Wrap a timer into a i915_sw_fence")
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Jani Nikula <[email protected]>
(cherry picked from commit 5bf472058ffb43baf6a4cdfe1d7f58c4c194c688)
Signed-off-by: Joonas Lahtinen <[email protected]>
6 months agodrm/i915: Fix readout degamma_lut mismatch on ilk/snb
Ville Syrjälä [Wed, 10 Jul 2024 12:41:37 +0000 (15:41 +0300)]
drm/i915: Fix readout degamma_lut mismatch on ilk/snb

On ilk/snb the pipe may be configured to place the LUT before or
after the CSC depending on various factors, but as there is only
one LUT (no split mode like on IVB+) we only advertise a gamma_lut
and no degamma_lut in the uapi to avoid confusing userspace.

This can cause a problem during readout if the VBIOS/GOP enabled
the LUT in the pre CSC configuration. The current code blindly
assigns the results of the readout to the degamma_lut, which will
cause a failure during the next atomic_check() as we aren't expecting
anything to be in degamma_lut since it's not visible to userspace.

Fix the problem by assigning whatever LUT we read out from the
hardware into gamma_lut.

Cc: [email protected]
Fixes: d2559299d339 ("drm/i915: Make ilk_read_luts() capable of degamma readout")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11608
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Uma Shankar <[email protected]>
(cherry picked from commit 33eca84db6e31091cef63584158ab64704f78462)
Signed-off-by: Joonas Lahtinen <[email protected]>
6 months agodrm/i915: Do not attempt to load the GSC multiple times
Daniele Ceraolo Spurio [Tue, 20 Aug 2024 21:59:52 +0000 (14:59 -0700)]
drm/i915: Do not attempt to load the GSC multiple times

If the GSC FW fails to load the GSC HW hangs permanently; the only ways
to recover it are FLR or D3cold entry, with the former only being
supported on driver unload and the latter only on DGFX, for which we
don't need to load the GSC. Therefore, if GSC fails to load there is no
need to try again because the HW is stuck in the error state and the
submission to load the FW would just hang the GSCCS.

Note that, due to wa_14015076503, on MTL the GuC escalates all GSCCS
hangs to full GT resets, which would trigger a new attempt to load the
GSC FW in the post-reset HW re-init; this issue is also fixed by not
attempting to load the GSC FW after an error.

Fixes: 15bd4a67e914 ("drm/i915/gsc: GSC firmware loading")
Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Cc: Daniele Ceraolo Spurio <[email protected]>
Cc: Alan Previn <[email protected]>
Cc: John Harrison <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: <[email protected]> # v6.3+
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 03ded4d432a1fb7bb6c44c5856d14115f6f6c3b9)
Signed-off-by: Joonas Lahtinen <[email protected]>
6 months agoMerge tag 'timers-v6.11-rc7' of https://git.linaro.org/people/daniel.lezcano/linux...
Thomas Gleixner [Mon, 2 Sep 2024 09:56:59 +0000 (11:56 +0200)]
Merge tag 'timers-v6.11-rc7' of https://git.linaro.org/people/daniel.lezcano/linux into timers/urgent

Pull clocksource driver fixes from Daniel Lezcano:

  - Remove percpu irq related code in the timer-of initialization
    routine as it is broken but also unused (Daniel Lezcano)

  - Fix return -ETIME when delta exceeds INT_MAX and the next event not
    taking effect sometimes (Jacky Bai)

Link: https://lore.kernel.org/all/[email protected]
6 months agodrm/panel: ili9341: Remove duplicate code
Andy Shevchenko [Tue, 13 Aug 2024 09:12:58 +0000 (12:12 +0300)]
drm/panel: ili9341: Remove duplicate code

Remove duplicate code that is handled by tinyDRM,
i.e. drivers/gpu/drm/tiny/ili9341.c.

Suggested-by: Maxime Ripard <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Maxime Ripard <[email protected]>
6 months agodrm/panthor: Use the BITS_PER_LONG macro
Jinjie Ruan [Mon, 2 Sep 2024 09:44:04 +0000 (17:44 +0800)]
drm/panthor: Use the BITS_PER_LONG macro

sizeof(unsigned long) * 8 is the number of bits in an unsigned long
variable, replace it with BITS_PER_LONG macro to make them simpler.

And fix the warning:
WARNING: Comparisons should place the constant on the right side of the test
#23: FILE: drivers/gpu/drm/panthor/panthor_mmu.c:2696:
+       if (BITS_PER_LONG < va_bits) {

Signed-off-by: Jinjie Ruan <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Signed-off-by: Steven Price <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agonet: microchip: vcap: Fix use-after-free error in kunit test
Jens Emil Schulz Østergaard [Thu, 29 Aug 2024 09:52:54 +0000 (11:52 +0200)]
net: microchip: vcap: Fix use-after-free error in kunit test

This is a clear use-after-free error. We remove it, and rely on checking
the return code of vcap_del_rule.

Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/kernel-janitors/[email protected]/
Fixes: c956b9b318d9 ("net: microchip: sparx5: Adding KUNIT tests of key/action values in VCAP API")
Signed-off-by: Jens Emil Schulz Østergaard <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 months agoMAINTAINERS: Remove Wedson as Rust maintainer
Wedson Almeida Filho [Wed, 28 Aug 2024 21:11:17 +0000 (18:11 -0300)]
MAINTAINERS: Remove Wedson as Rust maintainer

I am retiring from the project, so removing myself from MAINTAINERS as I won't
have time to dedicate to it.

Signed-off-by: Wedson Almeida Filho <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
6 months agodrm/panfrost: Add cycle counter job requirement
Mary Guillemard [Mon, 19 Aug 2024 08:02:23 +0000 (10:02 +0200)]
drm/panfrost: Add cycle counter job requirement

Extend the uAPI with a new job requirement flag for cycle
counters. This requirement is used by userland to indicate that a job
requires cycle counters or system timestamp to be propagated. (for use
with write value timestamp jobs)

We cannot enable cycle counters unconditionally as this would result in
an increase of GPU power consumption. As a result, they should be left
off unless required by the application.

If a job requires cycle counters or system timestamps propagation, we
must enable cycle counting before issuing a job and disable it right
after the job completes.

Since this extends the uAPI and because userland needs a way to advertise
features like VK_KHR_shader_clock conditionally, we bumps the driver
minor version.

v2:
- Rework commit message
- Squash uAPI changes and implementation in this commit
- Simplify changes based on Steven Price comments

v3:
- Add Steven Price r-b
- Fix a codestyle issue

Signed-off-by: Mary Guillemard <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Tested-by: Heiko Stuebner <[email protected]>
Signed-off-by: Steven Price <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/panfrost: Add SYSTEM_TIMESTAMP and SYSTEM_TIMESTAMP_FREQUENCY parameters
Mary Guillemard [Mon, 19 Aug 2024 08:02:22 +0000 (10:02 +0200)]
drm/panfrost: Add SYSTEM_TIMESTAMP and SYSTEM_TIMESTAMP_FREQUENCY parameters

Expose system timestamp and frequency supported by the GPU.

Mali uses an external timer as GPU system time. On ARM, this is wired to
the generic arch timer so we wire cntfrq_el0 as device frequency.

This new uAPI will be used in Mesa to implement timestamp queries and
VK_KHR_calibrated_timestamps.

v2:
- Rewrote to use GPU timestamp register
- Add missing include for arch_timer_get_cntfrq
- Rework commit message

v3:
- Move panfrost_cycle_counter_get and panfrost_cycle_counter_put to
  panfrost_ioctl_query_timestamp
- Handle possible overflow in panfrost_timestamp_read

Signed-off-by: Mary Guillemard <[email protected]>
Tested-by: Heiko Stuebner <[email protected]>
Reviewed-by: Steven Price <[email protected]>
Signed-off-by: Steven Price <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agopinctrl: qcom: x1e80100: Bypass PDC wakeup parent for now
Stephan Gerhold [Fri, 30 Aug 2024 09:09:07 +0000 (11:09 +0200)]
pinctrl: qcom: x1e80100: Bypass PDC wakeup parent for now

On X1E80100, GPIO interrupts for wakeup-capable pins have been broken since
the introduction of the pinctrl driver. This prevents keyboard and touchpad
from working on most of the X1E laptops. So far we have worked around this
by manually building a kernel with the "wakeup-parent" removed from the
pinctrl node in the device tree, but we cannot expect all users to do that.

Implement a similar workaround in the driver by clearing the wakeirq_map
for X1E80100. This avoids using the PDC wakeup parent for all GPIOs
and handles the interrupts directly in the pinctrl driver instead.

The PDC driver needs additional changes to support X1E80100 properly.
Adding a workaround separately first allows to land the necessary PDC
changes through the normal release cycle, while still solving the more
critical problem with keyboard and touchpad on the current stable kernel
versions. Bypassing the PDC is enough for now, because we have not yet
enabled the deep idle states where using the PDC becomes necessary.

Cc: [email protected]
Fixes: 05e4941d97ef ("pinctrl: qcom: Add X1E80100 pinctrl driver")
Signed-off-by: Stephan Gerhold <[email protected]>
Reviewed-by: Johan Hovold <[email protected]>
Tested-by: Johan Hovold <[email protected]>
Reviewed-by: Konrad Dybcio <[email protected]>
Reviewed-by: Abel Vesa <[email protected]>
Link: https://lore.kernel.org/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
6 months agodrm/imagination: Use memdup_user() helper
Jinjie Ruan [Mon, 2 Sep 2024 02:33:00 +0000 (10:33 +0800)]
drm/imagination: Use memdup_user() helper

Switching to memdup_user(), which combines kmalloc() and copy_from_user(),
and it can simplfy code.

Signed-off-by: Jinjie Ruan <[email protected]>
Suggested-by: Christophe JAILLET <[email protected]>
Reviewed-by: Frank Binns <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Matt Coster <[email protected]>
6 months agodrm/imagination: Use memdup_user() helper to simplify code
Jinjie Ruan [Sat, 31 Aug 2024 10:29:30 +0000 (18:29 +0800)]
drm/imagination: Use memdup_user() helper to simplify code

Switching to memdup_user(), which combines kmalloc() and copy_from_user(),
and it can simplfy code.

Signed-off-by: Jinjie Ruan <[email protected]>
Reviewed-by: Frank Binns <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Matt Coster <[email protected]>
6 months agodrm/imagination: Free pvr_vm_gpuva after unlink
Matt Coster [Mon, 2 Sep 2024 08:48:48 +0000 (09:48 +0100)]
drm/imagination: Free pvr_vm_gpuva after unlink

This caused a measurable memory leak. Although the individual
allocations are small, the leaks occurs in a high-usage codepath
(remapping or unmapping device memory) so they add up quickly.

Fixes: ff5f643de0bf ("drm/imagination: Add GEM and VM related code")
Cc: [email protected]
Reviewed-by: Frank Binns <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Matt Coster <[email protected]>
6 months agodrm/imagination: Use pvr_vm_context_get()
Matt Coster [Fri, 30 Aug 2024 15:06:01 +0000 (15:06 +0000)]
drm/imagination: Use pvr_vm_context_get()

I missed this open-coded kref_get() while trying to debug a refcount
bug, so let's use the helper function here to avoid that waste of time
again in the future.

Fixes: ff5f643de0bf ("drm/imagination: Add GEM and VM related code")
Reviewed-by: Frank Binns <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Matt Coster <[email protected]>
6 months agoclocksource/drivers/imx-tpm: Fix next event not taking effect sometime
Jacky Bai [Thu, 25 Jul 2024 19:33:55 +0000 (15:33 -0400)]
clocksource/drivers/imx-tpm: Fix next event not taking effect sometime

The value written into the TPM CnV can only be updated into the hardware
when the counter increases. Additional writes to the CnV write buffer are
ignored until the register has been updated. Therefore, we need to check
if the CnV has been updated before continuing. This may require waiting for
1 counter cycle in the worst case.

Cc: [email protected]
Fixes: 059ab7b82eec ("clocksource/drivers/imx-tpm: Add imx tpm timer support")
Signed-off-by: Jacky Bai <[email protected]>
Reviewed-by: Peng Fan <[email protected]>
Reviewed-by: Ye Li <[email protected]>
Reviewed-by: Jason Liu <[email protected]>
Signed-off-by: Frank Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Daniel Lezcano <[email protected]>
6 months agoclocksource/drivers/imx-tpm: Fix return -ETIME when delta exceeds INT_MAX
Jacky Bai [Thu, 25 Jul 2024 19:33:54 +0000 (15:33 -0400)]
clocksource/drivers/imx-tpm: Fix return -ETIME when delta exceeds INT_MAX

In tpm_set_next_event(delta), return -ETIME by wrong cast to int when delta
is larger than INT_MAX.

For example:

tpm_set_next_event(delta = 0xffff_fffe)
{
        ...
        next = tpm_read_counter(); // assume next is 0x10
        next += delta; // next will 0xffff_fffe + 0x10 = 0x1_0000_000e
        now = tpm_read_counter();  // now is 0x10
        ...

        return (int)(next - now) <= 0 ? -ETIME : 0;
                     ^^^^^^^^^^
                     0x1_0000_000e - 0x10 = 0xffff_fffe, which is -2 when
                     cast to int. So return -ETIME.
}

To fix this, introduce a 'prev' variable and check if 'now - prev' is
larger than delta.

Cc: [email protected]
Fixes: 059ab7b82eec ("clocksource/drivers/imx-tpm: Add imx tpm timer support")
Signed-off-by: Jacky Bai <[email protected]>
Reviewed-by: Peng Fan <[email protected]>
Reviewed-by: Ye Li <[email protected]>
Reviewed-by: Jason Liu <[email protected]>
Signed-off-by: Frank Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Daniel Lezcano <[email protected]>
6 months agoclocksource/drivers/timer-of: Remove percpu irq related code
Daniel Lezcano [Mon, 19 Aug 2024 10:03:35 +0000 (12:03 +0200)]
clocksource/drivers/timer-of: Remove percpu irq related code

GCC's named address space checks errors out with:

drivers/clocksource/timer-of.c: In function ‘timer_of_irq_exit’:
drivers/clocksource/timer-of.c:29:46: error: passing argument 2 of
‘free_percpu_irq’ from pointer to non-enclosed address space
  29 |                 free_percpu_irq(of_irq->irq, clkevt);
     |                                              ^~~~~~
In file included from drivers/clocksource/timer-of.c:8:
./include/linux/interrupt.h:201:43: note: expected ‘__seg_gs void *’
but argument is of type ‘struct clock_event_device *’
 201 | extern void free_percpu_irq(unsigned int, void __percpu *);
     |                                           ^~~~~~~~~~~~~~~
drivers/clocksource/timer-of.c: In function ‘timer_of_irq_init’:
drivers/clocksource/timer-of.c:74:51: error: passing argument 4 of
‘request_percpu_irq’ from pointer to non-enclosed address space
  74 |                                    np->full_name, clkevt) :
     |                                                   ^~~~~~
./include/linux/interrupt.h:190:56: note: expected ‘__seg_gs void *’
but argument is of type ‘struct clock_event_device *’
 190 |                    const char *devname, void __percpu *percpu_dev_id)

Sparse warns about:

timer-of.c:29:46: warning: incorrect type in argument 2 (different address spaces)
timer-of.c:29:46:    expected void [noderef] __percpu *
timer-of.c:29:46:    got struct clock_event_device *clkevt
timer-of.c:74:51: warning: incorrect type in argument 4 (different address spaces)
timer-of.c:74:51:    expected void [noderef] __percpu *percpu_dev_id
timer-of.c:74:51:    got struct clock_event_device *clkevt

It appears the code is incorrect as reported by Uros Bizjak:

"The referred code is questionable as it tries to reuse
the clkevent pointer once as percpu pointer and once as generic
pointer, which should be avoided."

This change removes the percpu related code as no drivers is using it.

[Daniel: Fixed the description]

Fixes: dc11bae785295 ("clocksource/drivers: Add timer-of common init routine")
Reported-by: Uros Bizjak <[email protected]>
Tested-by: Uros Bizjak <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Daniel Lezcano <[email protected]>
6 months agodrm/i915/fence: Mark debug_fence_free() with __maybe_unused
Andy Shevchenko [Thu, 29 Aug 2024 15:58:38 +0000 (18:58 +0300)]
drm/i915/fence: Mark debug_fence_free() with __maybe_unused

When debug_fence_free() is unused
(CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS=n), it prevents kernel builds
with clang, `make W=1` and CONFIG_WERROR=y:

.../i915_sw_fence.c:118:20: error: unused function 'debug_fence_free' [-Werror,-Wunused-function]
  118 | static inline void debug_fence_free(struct i915_sw_fence *fence)
      |                    ^~~~~~~~~~~~~~~~

Fix this by marking debug_fence_free() with __maybe_unused.

See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").

Fixes: fc1584059d6c ("drm/i915: Integrate i915_sw_fence with debugobjects")
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Jani Nikula <[email protected]>
6 months agodrm/i915/fence: Mark debug_fence_init_onstack() with __maybe_unused
Andy Shevchenko [Thu, 29 Aug 2024 15:58:37 +0000 (18:58 +0300)]
drm/i915/fence: Mark debug_fence_init_onstack() with __maybe_unused

When debug_fence_init_onstack() is unused (CONFIG_DRM_I915_SELFTEST=n),
it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y:

.../i915_sw_fence.c:97:20: error: unused function 'debug_fence_init_onstack' [-Werror,-Wunused-function]
   97 | static inline void debug_fence_init_onstack(struct i915_sw_fence *fence)
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~

Fix this by marking debug_fence_init_onstack() with __maybe_unused.

See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static
inline functions for W=1 build").

Fixes: 214707fc2ce0 ("drm/i915/selftests: Wrap a timer into a i915_sw_fence")
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Jani Nikula <[email protected]>
6 months agorust: macros: provide correct provenance when constructing THIS_MODULE
Boqun Feng [Wed, 28 Aug 2024 18:01:29 +0000 (11:01 -0700)]
rust: macros: provide correct provenance when constructing THIS_MODULE

Currently while defining `THIS_MODULE` symbol in `module!()`, the
pointer used to construct `ThisModule` is derived from an immutable
reference of `__this_module`, which means the pointer doesn't have
the provenance for writing, and that means any write to that pointer
is UB regardless of data races or not. However, the usage of
`THIS_MODULE` includes passing this pointer to functions that may write
to it (probably in unsafe code), and this will create soundness issues.

One way to fix this is using `addr_of_mut!()` but that requires the
unstable feature "const_mut_refs". So instead of `addr_of_mut()!`,
an extern static `Opaque` is used here: since `Opaque<T>` is transparent
to `T`, an extern static `Opaque` will just wrap the C symbol (defined
in a C compile unit) in an `Opaque`, which provides a pointer with
writable provenance via `Opaque::get()`. This fix the potential UBs
because of pointer provenance unmatched.

Reported-by: Alice Ryhl <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Reviewed-by: Alice Ryhl <[email protected]>
Reviewed-by: Trevor Gross <[email protected]>
Reviewed-by: Benno Lossin <[email protected]>
Reviewed-by: Gary Guo <[email protected]>
Closes: https://rust-for-linux.zulipchat.com/#narrow/stream/x/topic/x/near/465412664
Fixes: 1fbde52bde73 ("rust: add `macros` crate")
Cc: [email protected] # 6.6.x: be2ca1e03965: ("rust: types: Make Opaque::get const")
Link: https://lore.kernel.org/r/[email protected]
[ Fixed two typos, reworded title. - Miguel ]
Signed-off-by: Miguel Ojeda <[email protected]>
6 months agoMerge tag 'ata-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata...
Linus Torvalds [Mon, 2 Sep 2024 02:59:59 +0000 (19:59 -0700)]
Merge tag 'ata-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fix from Damien Le Moal:

 - Fix a potential memory leak in the ata host initialization code (from
   Zheng)

* tag 'ata-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata: Fix memory leak for error path in ata_host_alloc()

6 months agoalloc_tag: fix allocation tag reporting when CONFIG_MODULES=n
Suren Baghdasaryan [Wed, 28 Aug 2024 23:15:36 +0000 (16:15 -0700)]
alloc_tag: fix allocation tag reporting when CONFIG_MODULES=n

codetag_module_init() is used to initialize sections containing allocation
tags.  This function is used to initialize module sections as well as core
kernel sections, in which case the module parameter is set to NULL.  This
function has to be called even when CONFIG_MODULES=n to initialize core
kernel allocation tag sections.  When CONFIG_MODULES=n, this function is a
NOP, which is wrong.  This leads to /proc/allocinfo reported as empty.
Fix this by making it independent of CONFIG_MODULES.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 916cc5167cc6 ("lib: code tagging framework")
Signed-off-by: Suren Baghdasaryan <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: Sourav Panda <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]> [6.10+]
Signed-off-by: Andrew Morton <[email protected]>
6 months agomm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area
Adrian Huang [Thu, 29 Aug 2024 13:06:33 +0000 (21:06 +0800)]
mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area

When running the vmalloc stress on a 448-core system, observe the average
latency of purge_vmap_node() is about 2 seconds by using the eBPF/bcc
'funclatency.py' tool [1].

  # /your-git-repo/bcc/tools/funclatency.py -u purge_vmap_node & pid1=$! && sleep 8 && modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x7; kill -SIGINT $pid1

     usecs             : count    distribution
        0 -> 1         : 0       |                                        |
        2 -> 3         : 29      |                                        |
        4 -> 7         : 19      |                                        |
        8 -> 15        : 56      |                                        |
       16 -> 31        : 483     |****                                    |
       32 -> 63        : 1548    |************                            |
       64 -> 127       : 2634    |*********************                   |
      128 -> 255       : 2535    |*********************                   |
      256 -> 511       : 1776    |**************                          |
      512 -> 1023      : 1015    |********                                |
     1024 -> 2047      : 573     |****                                    |
     2048 -> 4095      : 488     |****                                    |
     4096 -> 8191      : 1091    |*********                               |
     8192 -> 16383     : 3078    |*************************               |
    16384 -> 32767     : 4821    |****************************************|
    32768 -> 65535     : 3318    |***************************             |
    65536 -> 131071    : 1718    |**************                          |
   131072 -> 262143    : 2220    |******************                      |
   262144 -> 524287    : 1147    |*********                               |
   524288 -> 1048575   : 1179    |*********                               |
  1048576 -> 2097151   : 822     |******                                  |
  2097152 -> 4194303   : 906     |*******                                 |
  4194304 -> 8388607   : 2148    |*****************                       |
  8388608 -> 16777215  : 4497    |*************************************   |
 16777216 -> 33554431  : 289     |**                                      |

  avg = 2041714 usecs, total: 78381401772 usecs, count: 38390

  The worst case is over 16-33 seconds, so soft lockup is triggered [2].

[Root Cause]
1) Each purge_list has the long list. The following shows the number of
   vmap_area is purged.

   crash> p vmap_nodes
   vmap_nodes = $27 = (struct vmap_node *) 0xff2de5a900100000
   crash> vmap_node 0xff2de5a900100000 128 | grep nr_purged
     nr_purged = 663070
     ...
     nr_purged = 821670
     nr_purged = 692214
     nr_purged = 726808
     ...

2) atomic_long_sub() employs the 'lock' prefix to ensure the atomic
   operation when purging each vmap_area. However, the iteration is over
   600000 vmap_area (See 'nr_purged' above).

   Here is objdump output:

     $ objdump -D vmlinux
     ffffffff813e8c80 <purge_vmap_node>:
     ...
     ffffffff813e8d70:  f0 48 29 2d 68 0c bb  lock sub %rbp,0x2bb0c68(%rip)
     ...

   Quote from "Instruction tables" pdf file [3]:
     Instructions with a LOCK prefix have a long latency that depends on
     cache organization and possibly RAM speed. If there are multiple
     processors or cores or direct memory access (DMA) devices, then all
     locked instructions will lock a cache line for exclusive access,
     which may involve RAM access. A LOCK prefix typically costs more
     than a hundred clock cycles, even on single-processor systems.

   That's why the latency of purge_vmap_node() dramatically increases
   on a many-core system: One core is busy on purging each vmap_area of
   the *long* purge_list and executing atomic_long_sub() for each
   vmap_area, while other cores free vmalloc allocations and execute
   atomic_long_add_return() in free_vmap_area_noflush().

[Solution]
Employ a local variable to record the total purged pages, and execute
atomic_long_sub() after the traversal of the purge_list is done. The
experiment result shows the latency improvement is 99%.

[Experiment Result]
1) System Configuration: Three servers (with HT-enabled) are tested.
     * 72-core server: 3rd Gen Intel Xeon Scalable Processor*1
     * 192-core server: 5th Gen Intel Xeon Scalable Processor*2
     * 448-core server: AMD Zen 4 Processor*2

2) Kernel Config
     * CONFIG_KASAN is disabled

3) The data in column "w/o patch" and "w/ patch"
     * Unit: micro seconds (us)
     * Each data is the average of 3-time measurements

         System        w/o patch (us)   w/ patch (us)    Improvement (%)
     ---------------   --------------   -------------    -------------
     72-core server          2194              14            99.36%
     192-core server       143799            1139            99.21%
     448-core server      1992122            6883            99.65%

[1] https://github.com/iovisor/bcc/blob/master/tools/funclatency.py
[2] https://gist.github.com/AdrianHuang/37c15f67b45407b83c2d32f918656c12
[3] https://www.agner.org/optimize/instruction_tables.pdf

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Adrian Huang <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agomailmap: update entry for Jan Kuliga
Jan Kuliga [Fri, 30 Aug 2024 09:56:58 +0000 (11:56 +0200)]
mailmap: update entry for Jan Kuliga

Soon I won't be able to use my current email address.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jan Kuliga <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agocodetag: debug: mark codetags for poisoned page as empty
Hao Ge [Sun, 25 Aug 2024 16:36:49 +0000 (00:36 +0800)]
codetag: debug: mark codetags for poisoned page as empty

When PG_hwpoison pages are freed they are treated differently in
free_pages_prepare() and instead of being released they are isolated.

Page allocation tag counters are decremented at this point since the page
is considered not in use.  Later on when such pages are released by
unpoison_memory(), the allocation tag counters will be decremented again
and the following warning gets reported:

[  113.930443][ T3282] ------------[ cut here ]------------
[  113.931105][ T3282] alloc_tag was not set
[  113.931576][ T3282] WARNING: CPU: 2 PID: 3282 at ./include/linux/alloc_tag.h:130 pgalloc_tag_sub.part.66+0x154/0x164
[  113.932866][ T3282] Modules linked in: hwpoison_inject fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_man4
[  113.941638][ T3282] CPU: 2 UID: 0 PID: 3282 Comm: madvise11 Kdump: loaded Tainted: G        W          6.11.0-rc4-dirty #18
[  113.943003][ T3282] Tainted: [W]=WARN
[  113.943453][ T3282] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[  113.944378][ T3282] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  113.945319][ T3282] pc : pgalloc_tag_sub.part.66+0x154/0x164
[  113.946016][ T3282] lr : pgalloc_tag_sub.part.66+0x154/0x164
[  113.946706][ T3282] sp : ffff800087093a10
[  113.947197][ T3282] x29: ffff800087093a10 x28: ffff0000d7a9d400 x27: ffff80008249f0a0
[  113.948165][ T3282] x26: 0000000000000000 x25: ffff80008249f2b0 x24: 0000000000000000
[  113.949134][ T3282] x23: 0000000000000001 x22: 0000000000000001 x21: 0000000000000000
[  113.950597][ T3282] x20: ffff0000c08fcad8 x19: ffff80008251e000 x18: ffffffffffffffff
[  113.952207][ T3282] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800081746210
[  113.953161][ T3282] x14: 0000000000000000 x13: 205d323832335420 x12: 5b5d353031313339
[  113.954120][ T3282] x11: ffff800087093500 x10: 000000000000005d x9 : 00000000ffffffd0
[  113.955078][ T3282] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008236ba90 x6 : c0000000ffff7fff
[  113.956036][ T3282] x5 : ffff000b34bf4dc8 x4 : ffff8000820aba90 x3 : 0000000000000001
[  113.956994][ T3282] x2 : ffff800ab320f000 x1 : 841d1e35ac932e00 x0 : 0000000000000000
[  113.957962][ T3282] Call trace:
[  113.958350][ T3282]  pgalloc_tag_sub.part.66+0x154/0x164
[  113.959000][ T3282]  pgalloc_tag_sub+0x14/0x1c
[  113.959539][ T3282]  free_unref_page+0xf4/0x4b8
[  113.960096][ T3282]  __folio_put+0xd4/0x120
[  113.960614][ T3282]  folio_put+0x24/0x50
[  113.961103][ T3282]  unpoison_memory+0x4f0/0x5b0
[  113.961678][ T3282]  hwpoison_unpoison+0x30/0x48 [hwpoison_inject]
[  113.962436][ T3282]  simple_attr_write_xsigned.isra.34+0xec/0x1cc
[  113.963183][ T3282]  simple_attr_write+0x38/0x48
[  113.963750][ T3282]  debugfs_attr_write+0x54/0x80
[  113.964330][ T3282]  full_proxy_write+0x68/0x98
[  113.964880][ T3282]  vfs_write+0xdc/0x4d0
[  113.965372][ T3282]  ksys_write+0x78/0x100
[  113.965875][ T3282]  __arm64_sys_write+0x24/0x30
[  113.966440][ T3282]  invoke_syscall+0x7c/0x104
[  113.966984][ T3282]  el0_svc_common.constprop.1+0x88/0x104
[  113.967652][ T3282]  do_el0_svc+0x2c/0x38
[  113.968893][ T3282]  el0_svc+0x3c/0x1b8
[  113.969379][ T3282]  el0t_64_sync_handler+0x98/0xbc
[  113.969980][ T3282]  el0t_64_sync+0x19c/0x1a0
[  113.970511][ T3282] ---[ end trace 0000000000000000 ]---

To fix this, clear the page tag reference after the page got isolated
and accounted for.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty")
Signed-off-by: Hao Ge <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Acked-by: Suren Baghdasaryan <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hao Ge <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Pasha Tatashin <[email protected]>
Cc: <[email protected]> [6.10+]
Signed-off-by: Andrew Morton <[email protected]>
6 months agomm/memcontrol: respect zswap.writeback setting from parent cg too
Mike Yuan [Fri, 23 Aug 2024 16:27:06 +0000 (16:27 +0000)]
mm/memcontrol: respect zswap.writeback setting from parent cg too

Currently, the behavior of zswap.writeback wrt.  the cgroup hierarchy
seems a bit odd.  Unlike zswap.max, it doesn't honor the value from parent
cgroups.  This surfaced when people tried to globally disable zswap
writeback, i.e.  reserve physical swap space only for hibernation [1] -
disabling zswap.writeback only for the root cgroup results in subcgroups
with zswap.writeback=1 still performing writeback.

The inconsistency became more noticeable after I introduced the
MemoryZSwapWriteback= systemd unit setting [2] for controlling the knob.
The patch assumed that the kernel would enforce the value of parent
cgroups.  It could probably be workarounded from systemd's side, by going
up the slice unit tree and inheriting the value.  Yet I think it's more
sensible to make it behave consistently with zswap.max and friends.

[1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
[2] https://github.com/systemd/systemd/pull/31734

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 501a06fe8e4c ("zswap: memcontrol: implement zswap writeback disabling")
Signed-off-by: Mike Yuan <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Acked-by: Yosry Ahmed <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Michal Koutný <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agoscripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum
Marc Zyngier [Fri, 23 Aug 2024 16:38:50 +0000 (17:38 +0100)]
scripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum

Richard reports that since 772dd0342727c ("mm: enumerate all gfp flags"),
gfp-translate is broken, as the bit numbers are implicit, leaving the
shell script unable to extract them.  Even more, some bits are now at a
variable location, making it double extra hard to parse using a simple
shell script.

Use a brute-force approach to the problem by generating a small C stub
that will use the enum to dump the interesting bits.

As an added bonus, we are now able to identify invalid bits for a given
configuration.  As an added drawback, we cannot parse include files that
predate this change anymore.  Tough luck.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 772dd0342727 ("mm: enumerate all gfp flags")
Signed-off-by: Marc Zyngier <[email protected]>
Reported-by: Richard Weinberger <[email protected]>
Cc: Petr Tesařík <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agoRevert "mm: skip CMA pages when they are not available"
Usama Arif [Wed, 21 Aug 2024 19:26:07 +0000 (20:26 +0100)]
Revert "mm: skip CMA pages when they are not available"

This reverts commit 5da226dbfce3 ("mm: skip CMA pages when they are not
available") and b7108d66318a ("Multi-gen LRU: skip CMA pages when they are
not eligible").

lruvec->lru_lock is highly contended and is held when calling
isolate_lru_folios.  If the lru has a large number of CMA folios
consecutively, while the allocation type requested is not MIGRATE_MOVABLE,
isolate_lru_folios can hold the lock for a very long time while it skips
those.  For FIO workload, ~150million order=0 folios were skipped to
isolate a few ZONE_DMA folios [1].  This can cause lockups [1] and high
memory pressure for extended periods of time [2].

Remove skipping CMA for MGLRU as well, as it was introduced in sort_folio
for the same resaon as 5da226dbfce3a2f44978c2c7cf88166e69a6788b.

[1] https://lore.kernel.org/all/CAOUHufbkhMZYz20aM_3rHZ3OcK4m2puji2FGpUpn_-DevGk3Kg@mail.gmail.com/
[2] https://lore.kernel.org/all/[email protected]/

[[email protected]: also revert b7108d66318a, per Johannes]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5da226dbfce3 ("mm: skip CMA pages when they are not available")
Signed-off-by: Usama Arif <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Breno Leitao <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Zhaoyang Huang <[email protected]>
Cc: Zhaoyang Huang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agomaple_tree: remove rcu_read_lock() from mt_validate()
Liam R. Howlett [Tue, 20 Aug 2024 17:54:17 +0000 (13:54 -0400)]
maple_tree: remove rcu_read_lock() from mt_validate()

The write lock should be held when validating the tree to avoid updates
racing with checks.  Holding the rcu read lock during a large tree
validation may also cause a prolonged rcu read window and "rcu_preempt
detected stalls" warnings.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <[email protected]>
Reported-by: [email protected]
Cc: Hillf Danton <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agokexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
Petr Tesarik [Mon, 5 Aug 2024 15:07:50 +0000 (17:07 +0200)]
kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y

Fix the condition to exclude the elfcorehdr segment from the SHA digest
calculation.

The j iterator is an index into the output sha_regions[] array, not into
the input image->segment[] array.  Once it reaches
image->elfcorehdr_index, all subsequent segments are excluded.  Besides,
if the purgatory segment precedes the elfcorehdr segment, the elfcorehdr
may be wrongly included in the calculation.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f7cc804a9fd4 ("kexec: exclude elfcorehdr from the segment digest")
Signed-off-by: Petr Tesarik <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Hari Bathini <[email protected]>
Cc: Sourabh Jain <[email protected]>
Cc: Eric DeVolder <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agomm/slub: add check for s->flags in the alloc_tagging_slab_free_hook
Hao Ge [Fri, 16 Aug 2024 01:33:36 +0000 (09:33 +0800)]
mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook

When enable CONFIG_MEMCG & CONFIG_KFENCE & CONFIG_KMEMLEAK, the following
warning always occurs,This is because the following call stack occurred:
mem_pool_alloc
    kmem_cache_alloc_noprof
        slab_alloc_node
            kfence_alloc

Once the kfence allocation is successful,slab->obj_exts will not be empty,
because it has already been assigned a value in kfence_init_pool.

Since in the prepare_slab_obj_exts_hook function,we perform a check for
s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE),the alloc_tag_add function
will not be called as a result.Therefore,ref->ct remains NULL.

However,when we call mem_pool_free,since obj_ext is not empty, it
eventually leads to the alloc_tag_sub scenario being invoked.  This is
where the warning occurs.

So we should add corresponding checks in the alloc_tagging_slab_free_hook.
For __GFP_NO_OBJ_EXT case,I didn't see the specific case where it's using
kfence,so I won't add the corresponding check in
alloc_tagging_slab_free_hook for now.

[    3.734349] ------------[ cut here ]------------
[    3.734807] alloc_tag was not set
[    3.735129] WARNING: CPU: 4 PID: 40 at ./include/linux/alloc_tag.h:130 kmem_cache_free+0x444/0x574
[    3.735866] Modules linked in: autofs4
[    3.736211] CPU: 4 UID: 0 PID: 40 Comm: ksoftirqd/4 Tainted: G        W          6.11.0-rc3-dirty #1
[    3.736969] Tainted: [W]=WARN
[    3.737258] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[    3.737875] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    3.738501] pc : kmem_cache_free+0x444/0x574
[    3.738951] lr : kmem_cache_free+0x444/0x574
[    3.739361] sp : ffff80008357bb60
[    3.739693] x29: ffff80008357bb70 x28: 0000000000000000 x27: 0000000000000000
[    3.740338] x26: ffff80008207f000 x25: ffff000b2eb2fd60 x24: ffff0000c0005700
[    3.740982] x23: ffff8000804229e4 x22: ffff800082080000 x21: ffff800081756000
[    3.741630] x20: fffffd7ff8253360 x19: 00000000000000a8 x18: ffffffffffffffff
[    3.742274] x17: ffff800ab327f000 x16: ffff800083398000 x15: ffff800081756df0
[    3.742919] x14: 0000000000000000 x13: 205d344320202020 x12: 5b5d373038343337
[    3.743560] x11: ffff80008357b650 x10: 000000000000005d x9 : 00000000ffffffd0
[    3.744231] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008237bad0 x6 : c0000000ffff7fff
[    3.744907] x5 : ffff80008237ba78 x4 : ffff8000820bbad0 x3 : 0000000000000001
[    3.745580] x2 : 68d66547c09f7800 x1 : 68d66547c09f7800 x0 : 0000000000000000
[    3.746255] Call trace:
[    3.746530]  kmem_cache_free+0x444/0x574
[    3.746931]  mem_pool_free+0x44/0xf4
[    3.747306]  free_object_rcu+0xc8/0xdc
[    3.747693]  rcu_do_batch+0x234/0x8a4
[    3.748075]  rcu_core+0x230/0x3e4
[    3.748424]  rcu_core_si+0x14/0x1c
[    3.748780]  handle_softirqs+0x134/0x378
[    3.749189]  run_ksoftirqd+0x70/0x9c
[    3.749560]  smpboot_thread_fn+0x148/0x22c
[    3.749978]  kthread+0x10c/0x118
[    3.750323]  ret_from_fork+0x10/0x20
[    3.750696] ---[ end trace 0000000000000000 ]---

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Signed-off-by: Hao Ge <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agonilfs2: fix state management in error path of log writing function
Ryusuke Konishi [Wed, 14 Aug 2024 10:11:19 +0000 (19:11 +0900)]
nilfs2: fix state management in error path of log writing function

After commit a694291a6211 ("nilfs2: separate wait function from
nilfs_segctor_write") was applied, the log writing function
nilfs_segctor_do_construct() was able to issue I/O requests continuously
even if user data blocks were split into multiple logs across segments,
but two potential flaws were introduced in its error handling.

First, if nilfs_segctor_begin_construction() fails while creating the
second or subsequent logs, the log writing function returns without
calling nilfs_segctor_abort_construction(), so the writeback flag set on
pages/folios will remain uncleared.  This causes page cache operations to
hang waiting for the writeback flag.  For example,
truncate_inode_pages_final(), which is called via nilfs_evict_inode() when
an inode is evicted from memory, will hang.

Second, the NILFS_I_COLLECTED flag set on normal inodes remain uncleared.
As a result, if the next log write involves checkpoint creation, that's
fine, but if a partial log write is performed that does not, inodes with
NILFS_I_COLLECTED set are erroneously removed from the "sc_dirty_files"
list, and their data and b-tree blocks may not be written to the device,
corrupting the block mapping.

Fix these issues by uniformly calling nilfs_segctor_abort_construction()
on failure of each step in the loop in nilfs_segctor_do_construct(),
having it clean up logs and segment usages according to progress, and
correcting the conditions for calling nilfs_redirty_inodes() to ensure
that the NILFS_I_COLLECTED flag is cleared.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: a694291a6211 ("nilfs2: separate wait function from nilfs_segctor_write")
Signed-off-by: Ryusuke Konishi <[email protected]>
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agonilfs2: fix missing cleanup on rollforward recovery error
Ryusuke Konishi [Sat, 10 Aug 2024 06:52:42 +0000 (15:52 +0900)]
nilfs2: fix missing cleanup on rollforward recovery error

In an error injection test of a routine for mount-time recovery, KASAN
found a use-after-free bug.

It turned out that if data recovery was performed using partial logs
created by dsync writes, but an error occurred before starting the log
writer to create a recovered checkpoint, the inodes whose data had been
recovered were left in the ns_dirty_files list of the nilfs object and
were not freed.

Fix this issue by cleaning up inodes that have read the recovery data if
the recovery routine fails midway before the log writer starts.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0f3e1c7f23f8 ("nilfs2: recovery functions")
Signed-off-by: Ryusuke Konishi <[email protected]>
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agonilfs2: protect references to superblock parameters exposed in sysfs
Ryusuke Konishi [Sun, 11 Aug 2024 10:03:20 +0000 (19:03 +0900)]
nilfs2: protect references to superblock parameters exposed in sysfs

The superblock buffers of nilfs2 can not only be overwritten at runtime
for modifications/repairs, but they are also regularly swapped, replaced
during resizing, and even abandoned when degrading to one side due to
backing device issues.  So, accessing them requires mutual exclusion using
the reader/writer semaphore "nilfs->ns_sem".

Some sysfs attribute show methods read this superblock buffer without the
necessary mutual exclusion, which can cause problems with pointer
dereferencing and memory access, so fix it.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group")
Signed-off-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agouserfaultfd: don't BUG_ON() if khugepaged yanks our page table
Jann Horn [Tue, 13 Aug 2024 20:25:22 +0000 (22:25 +0200)]
userfaultfd: don't BUG_ON() if khugepaged yanks our page table

Since khugepaged was changed to allow retracting page tables in file
mappings without holding the mmap lock, these BUG_ON()s are wrong - get
rid of them.

We could also remove the preceding "if (unlikely(...))" block, but then we
could reach pte_offset_map_lock() with transhuge pages not just for file
mappings but also for anonymous mappings - which would probably be fine
but I think is not necessarily expected.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Signed-off-by: Jann Horn <[email protected]>
Reviewed-by: Qi Zheng <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agouserfaultfd: fix checks for huge PMDs
Jann Horn [Tue, 13 Aug 2024 20:25:21 +0000 (22:25 +0200)]
userfaultfd: fix checks for huge PMDs

Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.

The pmd_trans_huge() code in mfill_atomic() is wrong in three different
ways depending on kernel version:

1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
   the right two race windows) - I've tested this in a kernel build with
   some extra mdelay() calls. See the commit message for a description
   of the race scenario.
   On older kernels (before 6.5), I think the same bug can even
   theoretically lead to accessing transhuge page contents as a page table
   if you hit the right 5 narrow race windows (I haven't tested this case).
2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
   detecting PMDs that don't point to page tables.
   On older kernels (before 6.5), you'd just have to win a single fairly
   wide race to hit this.
   I've tested this on 6.1 stable by racing migration (with a mdelay()
   patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86
   VM, that causes a kernel oops in ptlock_ptr().
3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed
   to yank page tables out from under us (though I haven't tested that),
   so I think the BUG_ON() checks in mfill_atomic() are just wrong.

I decided to write two separate fixes for these (one fix for bugs 1+2, one
fix for bug 3), so that the first fix can be backported to kernels
affected by bugs 1+2.

This patch (of 2):

This fixes two issues.

I discovered that the following race can occur:

  mfill_atomic                other thread
  ============                ============
                              <zap PMD>
  pmdp_get_lockless() [reads none pmd]
  <bail if trans_huge>
  <if none:>
                              <pagefault creates transhuge zeropage>
    __pte_alloc [no-op]
                              <zap PMD>
  <bail if pmd_trans_huge(*dst_pmd)>
  BUG_ON(pmd_none(*dst_pmd))

I have experimentally verified this in a kernel with extra mdelay() calls;
the BUG_ON(pmd_none(*dst_pmd)) triggers.

On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow
pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
a BUG_ON(), since the page table access helpers are actually designed to
deal with page tables concurrently disappearing; but on older kernels
(<=6.4), I think we could probably theoretically race past the two
BUG_ON() checks and end up treating a hugepage as a page table.

The second issue is that, as Qi Zheng pointed out, there are other types
of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
(in particular, migration PMDs).

On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
PMD that contains a migration entry (which just requires winning a single,
fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
assumes that the PMD points to a page table.

Breakage follows: First, the kernel tries to take the PTE lock (which will
crash or maybe worse if there is no "struct page" for the address bits in
the migration entry PMD - I think at least on X86 there usually is no
corresponding "struct page" thanks to the PTE inversion mitigation, amd64
looks different).

If that didn't crash, the kernel would next try to write a PTE into what
it wrongly thinks is a page table.

As part of fixing these issues, get rid of the check for pmd_trans_huge()
before __pte_alloc() - that's redundant, we're going to have to check for
that after the __pte_alloc() anyway.

Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Signed-off-by: Jann Horn <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Qi Zheng <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agomm: vmalloc: ensure vmap_block is initialised before adding to queue
Will Deacon [Mon, 12 Aug 2024 17:16:06 +0000 (18:16 +0100)]
mm: vmalloc: ensure vmap_block is initialised before adding to queue

Commit 8c61291fd850 ("mm: fix incorrect vbq reference in
purge_fragmented_block") extended the 'vmap_block' structure to contain a
'cpu' field which is set at allocation time to the id of the initialising
CPU.

When a new 'vmap_block' is being instantiated by new_vmap_block(), the
partially initialised structure is added to the local 'vmap_block_queue'
xarray before the 'cpu' field has been initialised.  If another CPU is
concurrently walking the xarray (e.g.  via vm_unmap_aliases()), then it
may perform an out-of-bounds access to the remote queue thanks to an
uninitialised index.

This has been observed as UBSAN errors in Android:

 | Internal error: UBSAN: array index out of bounds: 00000000f2005512 [#1] PREEMPT SMP
 |
 | Call trace:
 |  purge_fragmented_block+0x204/0x21c
 |  _vm_unmap_aliases+0x170/0x378
 |  vm_unmap_aliases+0x1c/0x28
 |  change_memory_common+0x1dc/0x26c
 |  set_memory_ro+0x18/0x24
 |  module_enable_ro+0x98/0x238
 |  do_init_module+0x1b0/0x310

Move the initialisation of 'vb->cpu' in new_vmap_block() ahead of the
addition to the xarray.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8c61291fd850 ("mm: fix incorrect vbq reference in purge_fragmented_block")
Signed-off-by: Will Deacon <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Cc: Zhaoyang Huang <[email protected]>
Cc: Hailong.Liu <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agoselftests: mm: fix build errors on armhf
Muhammad Usama Anjum [Fri, 9 Aug 2024 08:25:11 +0000 (13:25 +0500)]
selftests: mm: fix build errors on armhf

The __NR_mmap isn't found on armhf.  The mmap() is commonly available
system call and its wrapper is present on all architectures.  So it should
be used directly.  It solves problem for armhf and doesn't create problem
for other architectures.

Remove sys_mmap() functions as they aren't doing anything else other than
calling mmap().  There is no need to set errno = 0 manually as glibc
always resets it.

For reference errors are as following:

  CC       seal_elf
seal_elf.c: In function 'sys_mmap':
seal_elf.c:39:33: error: '__NR_mmap' undeclared (first use in this function)
   39 |         sret = (void *) syscall(__NR_mmap, addr, len, prot,
      |                                 ^~~~~~~~~

mseal_test.c: In function 'sys_mmap':
mseal_test.c:90:33: error: '__NR_mmap' undeclared (first use in this function)
   90 |         sret = (void *) syscall(__NR_mmap, addr, len, prot,
      |                                 ^~~~~~~~~

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4926c7a52de7 ("selftest mm/mseal memory sealing")
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Cc: Jeff Xu <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
6 months agodrm/msm/dpu: enable writeback on SM6350
Dmitry Baryshkov [Sun, 3 Dec 2023 00:32:03 +0000 (03:32 +0300)]
drm/msm/dpu: enable writeback on SM6350

Enable WB2 hardware block, enabling writeback support on this platform.

Signed-off-by: Dmitry Baryshkov <[email protected]>
Tested-by: Luca Weiss <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/570194/
Link: https://lore.kernel.org/r/[email protected]
6 months agodrm/msm/dpu: enable writeback on SM6125
Dmitry Baryshkov [Sun, 3 Dec 2023 00:32:02 +0000 (03:32 +0300)]
drm/msm/dpu: enable writeback on SM6125

Enable WB2 hardware block, enabling writeback support on this platform.

Signed-off-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Abhinav Kumar <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/570193/
Link: https://lore.kernel.org/r/[email protected]
6 months agodrm/msm/dpu: enable writeback on SC8108X
Dmitry Baryshkov [Sun, 3 Dec 2023 00:32:01 +0000 (03:32 +0300)]
drm/msm/dpu: enable writeback on SC8108X

Enable WB2 hardware block, enabling writeback support on this platform.

Signed-off-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Abhinav Kumar <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/570196/
Link: https://lore.kernel.org/r/[email protected]
6 months agodrm/msm/dpu: enable writeback on SM8150
Dmitry Baryshkov [Sun, 3 Dec 2023 00:32:00 +0000 (03:32 +0300)]
drm/msm/dpu: enable writeback on SM8150

Enable WB2 hardware block, enabling writeback support on this platform.

Reviewed-by: Abhinav Kumar <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/570192/
Link: https://lore.kernel.org/r/[email protected]
[DB: picked up WB_SDM845_MASK from sdm845 patch]
Signed-off-by: Dmitry Baryshkov <[email protected]>
6 months agodrm/msm: fix %s null argument error
Sherry Yang [Tue, 27 Aug 2024 16:53:37 +0000 (09:53 -0700)]
drm/msm: fix %s null argument error

The following build error was triggered because of NULL string argument:

BUILDSTDERR: drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c: In function 'mdp5_smp_dump':
BUILDSTDERR: drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c:352:51: error: '%s' directive argument is null [-Werror=format-overflow=]
BUILDSTDERR:   352 |                         drm_printf(p, "%s:%d\t%d\t%s\n",
BUILDSTDERR:       |                                                   ^~
BUILDSTDERR: drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c:352:51: error: '%s' directive argument is null [-Werror=format-overflow=]

This happens from the commit a61ddb4393ad ("drm: enable (most) W=1
warnings by default across the subsystem"). Using "(null)" instead
to fix it.

Fixes: bc5289eed481 ("drm/msm/mdp5: add debugfs to show smp block status")
Signed-off-by: Sherry Yang <[email protected]>
Reviewed-by: Abhinav Kumar <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/611071/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
6 months agodrm/msm/dsi: correct programming sequence for SM8350 / SM8450
Dmitry Baryshkov [Sun, 4 Aug 2024 05:40:07 +0000 (08:40 +0300)]
drm/msm/dsi: correct programming sequence for SM8350 / SM8450

According to the display-drivers, 5nm DSI PLL (v4.2, v4.3) have
different boundaries for pll_clock_inverters programming. Follow the
vendor code and use correct values.

Fixes: 2f9ae4e395ed ("drm/msm/dsi: add support for DSI-PHY on SM8350 and SM8450")
Signed-off-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Abhinav Kumar <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/606947/
Link: https://lore.kernel.org/r/[email protected]
6 months agodrm/msm/dp: enable widebus on all relevant chipsets
Abhinav Kumar [Tue, 30 Jul 2024 19:50:11 +0000 (12:50 -0700)]
drm/msm/dp: enable widebus on all relevant chipsets

Hardware document indicates that widebus is recommended on DP on all
MDSS chipsets starting version 5.x.x and above.

Follow the guideline and mark widebus support on all relevant
chipsets for DP.

Fixes: 766f705204a0 ("drm/msm/dp: Remove now unused connector_type from desc")
Fixes: 1b2d98bdd7b7 ("drm/msm/dp: Add DisplayPort controller for SM8650")
Signed-off-by: Abhinav Kumar <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Fixes: 757a2f36ab09 ("drm/msm/dp: enable widebus feature for display port")
Fixes: 1b2d98bdd7b7 ("drm/msm/dp: Add DisplayPort controller for SM8650")
Patchwork: https://patchwork.freedesktop.org/patch/606556/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
6 months agodrm/msm: add msm8998 hdmi phy/pll support
Arnaud Vrac [Wed, 24 Jul 2024 15:01:37 +0000 (17:01 +0200)]
drm/msm: add msm8998 hdmi phy/pll support

Add support for the HDMI PHY as present on the Qualcomm MSM8998 SoC.
This code is mostly copy & paste of the vendor code from msm-4.4
kernel.lnx.4.4.r38-rel.

Signed-off-by: Arnaud Vrac <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/605631/
Link: https://lore.kernel.org/r/[email protected]
[DB: replaced division with do_div64 to fix build issues on ARM32]
Signed-off-by: Dmitry Baryshkov <[email protected]>
6 months agodrm/msm/hdmi: add "qcom,hdmi-tx-8998" compatible
Marc Gonzalez [Wed, 24 Jul 2024 15:01:36 +0000 (17:01 +0200)]
drm/msm/hdmi: add "qcom,hdmi-tx-8998" compatible

Current driver already supports the msm8998 HDMI TX.
We just need to add the compatible string.

Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/605632/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
6 months agodt-bindings: display/msm: hdmi: add qcom,hdmi-tx-8998
Marc Gonzalez [Wed, 24 Jul 2024 15:01:35 +0000 (17:01 +0200)]
dt-bindings: display/msm: hdmi: add qcom,hdmi-tx-8998

HDMI TX block embedded in the APQ8098.

Reviewed-by: Rob Herring (Arm) <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/605638/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
6 months agodt-bindings: phy: add qcom,hdmi-phy-8998
Marc Gonzalez [Wed, 24 Jul 2024 15:01:34 +0000 (17:01 +0200)]
dt-bindings: phy: add qcom,hdmi-phy-8998

HDMI PHY block embedded in the APQ8098.

Acked-by: Rob Herring (Arm) <[email protected]>
Acked-by: Vinod Koul <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/605634/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
6 months agodrm/msm/dpu: Configure DP INTF/PHY selector
Dmitry Baryshkov [Sat, 31 Aug 2024 10:10:44 +0000 (13:10 +0300)]
drm/msm/dpu: Configure DP INTF/PHY selector

Some platforms provides a mechanism for configuring the mapping between
(one or two) DisplayPort intfs and their PHYs.

In particular SC8180X requires this to be configured, since on this
platform there are fewer controllers than PHYs.

The change implements the logic for optionally configuring which PHY
each of the DP INTFs should be connected to and marks the SC8180X DPU to
program 2 entries.

For now the request is simply to program the mapping 1:1, any support
for alternative mappings is left until the use case arrise.

Note that e.g. msm-4.14 unconditionally maps INTF 0 to PHY 0 on all
platforms, so perhaps this is needed in order to get DisplayPort working
on some other platforms as well.

Co-developed-by: Bjorn Andersson <[email protected]>
Signed-off-by: Bjorn Andersson <[email protected]>
Signed-off-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Abhinav Kumar <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/600895/
Link: https://lore.kernel.org/r/[email protected]
6 months agoMerge tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 1 Sep 2024 21:43:08 +0000 (14:43 -0700)]
Merge tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:

 - x2apic_disable() clears x2apic_state and x2apic_mode unconditionally,
   even when the state is X2APIC_ON_LOCKED, which prevents the kernel to
   disable it thereby creating inconsistent state.

   Reorder the logic so it actually works correctly

 - The XSTATE logic for handling LBR is incorrect as it assumes that
   XSAVES supports LBR when the CPU supports LBR. In fact both
   conditions need to be true. Otherwise the enablement of LBR in the
   IA32_XSS MSR fails and subsequently the machine crashes on the next
   XRSTORS operation because IA32_XSS is not initialized.

   Cache the XSTATE support bit during init and make the related
   functions use this cached information and the LBR CPU feature bit to
   cure this.

 - Cure a long standing bug in KASLR

   KASLR uses the full address space between PAGE_OFFSET and vaddr_end
   to randomize the starting points of the direct map, vmalloc and
   vmemmap regions. It thereby limits the size of the direct map by
   using the installed memory size plus an extra configurable margin for
   hot-plug memory. This limitation is done to gain more randomization
   space because otherwise only the holes between the direct map,
   vmalloc, vmemmap and vaddr_end would be usable for randomizing.

   The limited direct map size is not exposed to the rest of the kernel,
   so the memory hot-plug and resource management related code paths
   still operate under the assumption that the available address space
   can be determined with MAX_PHYSMEM_BITS.

   request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
   downwards. That means the first allocation happens past the end of
   the direct map and if unlucky this address is in the vmalloc space,
   which causes high_memory to become greater than VMALLOC_START and
   consequently causes iounmap() to fail for valid ioremap addresses.

   Cure this by exposing the end of the direct map via PHYSMEM_END and
   use that for the memory hot-plug and resource management related
   places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case
   PHYSMEM_END maps to a variable which is initialized by the KASLR
   initialization and otherwise it is based on MAX_PHYSMEM_BITS as
   before.

 - Prevent a data leak in mmio_read(). The TDVMCALL exposes the value of
   an initialized variabled on the stack to the VMM. The variable is
   only required as output value, so it does not have to exposed to the
   VMM in the first place.

 - Prevent an array overrun in the resource control code on systems with
   Sub-NUMA Clustering enabled because the code failed to adjust the
   index by the number of SNC nodes per L3 cache.

* tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix arch_mbm_* array overrun on SNC
  x86/tdx: Fix data leak in mmio_read()
  x86/kaslr: Expose and use the end of the physical memory address space
  x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported
  x86/apic: Make x2apic_disable() work correctly

6 months agoMerge tag 'perf-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 1 Sep 2024 21:33:31 +0000 (14:33 -0700)]
Merge tag 'perf-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Thomas Gleixner:
 "A single fix for x86 performance monitoring.

  Haswell PMUs suffer from several errata and require a limit the
  minimal period for counter events, otherwise they suffer from endless
  loops in the PMU interrupt"

* tag 'perf-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Limit the period on Haswell

6 months agoMerge tag 'locking-urgent-2024-08-25' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 1 Sep 2024 21:26:33 +0000 (14:26 -0700)]
Merge tag 'locking-urgent-2024-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Thomas Gleixner:
 "A single fix for rt_mutex.

  The deadlock detection code drops into an infinite scheduling loop
  while still holding rt_mutex::wait_lock, which rightfully triggers a
  'scheduling in atomic' warning.

  Unlock it before that"

* tag 'locking-urgent-2024-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rtmutex: Drop rt_mutex::wait_lock before scheduling

6 months agoMerge tag 'irq-urgent-2024-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 1 Sep 2024 21:19:00 +0000 (14:19 -0700)]
Merge tag 'irq-urgent-2024-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A set of fixes for interrupt chip drivers:

   - Unbreak the PLIC driver for Allwinner D1 systems

     The recent conversion of the PLIC driver to a platform driver broke
     Allwinnder D1 systems due to the deferred probing of platform
     drivers.

     Due to that the only timer available on D1 systems cannot get an
     interrupt, which causes the system to hang at boot. Other RISCV
     platforms are not affected because they provide the architected SBI
     timer which uses the built in core interrupt controller.

     Cure this by probing PLIC early on D1 systems

   - Cure a regression in ARM/GIC-V3 on 32-bit ARM systems caused by the
     recent addition of a initialization function, which accesses system
     registers before they are enabled. On 64-bit ARM they are enabled
     prior to that by sheer luck.

     Ensure they are enabled.

   - Cure a use before check problem in the MSI library. The existing
     NULL pointer check is too late.

   - Cure a lock order inversion in the ARM/GIC-V4 driver

   - Fix a IS_ERR() vs. NULL pointer check issue in the RISCV APLIC
     driver

   - Plug a reference count leak in the ARM/GIC-V2 driver"

* tag 'irq-urgent-2024-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-msi-lib: Check for NULL ops in msi_lib_irq_domain_select()
  irqchip/gic-v3: Init SRE before poking sysregs
  irqchip/gic-v2m: Fix refcount leak in gicv2m_of_init()
  irqchip/riscv-aplic: Fix an IS_ERR() vs NULL bug in probe()
  irqchip/gic-v4: Fix ordering between vmapp and vpe locks
  irqchip/sifive-plic: Probe plic driver early for Allwinner D1 platform

6 months agobcachefs: fix rebalance accounting
Kent Overstreet [Sun, 1 Sep 2024 19:53:03 +0000 (15:53 -0400)]
bcachefs: fix rebalance accounting

Fixes: 49aa7830396b ("bcachefs: Fix rebalance_work accounting")
Signed-off-by: Kent Overstreet <[email protected]>
6 months agoMerge branch 'mctp-serial-tx-escapes'
David S. Miller [Sun, 1 Sep 2024 17:14:02 +0000 (18:14 +0100)]
Merge branch 'mctp-serial-tx-escapes'

Matt Johnston says:

====================
net: mctp-serial: Fix for missing tx escapes

The mctp-serial code to add escape characters was incorrect due to an
off-by-one error. This series adds a test for the chunking which splits
by escape characters, and fixes the bug.

v2: Fix kunit param const pointer
====================

Signed-off-by: David S. Miller <[email protected]>
6 months agonet: mctp-serial: Fix missing escapes on transmit
Matt Johnston [Thu, 29 Aug 2024 07:43:46 +0000 (15:43 +0800)]
net: mctp-serial: Fix missing escapes on transmit

0x7d and 0x7e bytes are meant to be escaped in the data portion of
frames, but this didn't occur since next_chunk_len() had an off-by-one
error. That also resulted in the final byte of a payload being written
as a separate tty write op.

The chunk prior to an escaped byte would be one byte short, and the
next call would never test the txpos+1 case, which is where the escaped
byte was located. That meant it never hit the escaping case in
mctp_serial_tx_work().

Example Input: 01 00 08 c8 7e 80 02

Previous incorrect chunks from next_chunk_len():

01 00 08
c8 7e 80
02

With this fix:

01 00 08 c8
7e
80 02

Cc: [email protected]
Fixes: a0c2ccd9b5ad ("mctp: Add MCTP-over-serial transport binding")
Signed-off-by: Matt Johnston <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 months agonet: mctp-serial: Add kunit test for next_chunk_len()
Matt Johnston [Thu, 29 Aug 2024 07:43:45 +0000 (15:43 +0800)]
net: mctp-serial: Add kunit test for next_chunk_len()

Test various edge cases of inputs that contain characters
that need escaping.

This adds a new kunit suite for mctp-serial.

Signed-off-by: Matt Johnston <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 months agodrm/msm/adreno: Add A306A support
Otto Pflüger [Mon, 22 Jul 2024 14:58:19 +0000 (16:58 +0200)]
drm/msm/adreno: Add A306A support

Add support for Adreno 306A GPU what is found in MSM8917 SoC.
This GPU marketing name is Adreno 308.

Signed-off-by: Otto Pflüger <[email protected]>
[use internal name of the GPU, reword the commit message]
Reviewed-by: Konrad Dybcio <[email protected]>
Signed-off-by: Barnabás Czémán <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/605403/
Signed-off-by: Rob Clark <[email protected]>
6 months agodrm/msm/a6xx: Add A621 support
Konrad Dybcio [Wed, 28 Aug 2024 15:06:59 +0000 (17:06 +0200)]
drm/msm/a6xx: Add A621 support

A621 is a clear A662 derivative (same lineage as A650), no explosions
or sick features, other than a NoC bug which can stall the GPU..

Add support for it.

Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/611100/
Signed-off-by: Rob Clark <[email protected]>
6 months agodrm/msm/a6xx: Set GMU CGC properties on a6xx too
Konrad Dybcio [Wed, 28 Aug 2024 15:06:58 +0000 (17:06 +0200)]
drm/msm/a6xx: Set GMU CGC properties on a6xx too

This was apparently never done before.. Program the expected values.

This also gets rid of sneakily setting that register through the HWCG
reg list on A690.

Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/611098/
Signed-off-by: Rob Clark <[email protected]>
6 months agodrm/msm/a6xx: Use the per-GPU value for gmu_cgc_mode
Konrad Dybcio [Wed, 28 Aug 2024 15:06:57 +0000 (17:06 +0200)]
drm/msm/a6xx: Use the per-GPU value for gmu_cgc_mode

This register's magic value differs wildly between different GPUs, use
the hardcoded data instead of trying to make some logic out of it.

Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/611096/
Signed-off-by: Rob Clark <[email protected]>
6 months agodrm/msm/a6xx: Store correct gmu_cgc_mode in struct a6xx_info
Konrad Dybcio [Wed, 28 Aug 2024 15:06:56 +0000 (17:06 +0200)]
drm/msm/a6xx: Store correct gmu_cgc_mode in struct a6xx_info

Store the correct values that we happen to have for some A7xx SKUs in
the GPU info struct and fill out the missing information for A6xx GPUs
based on downstream kernel information.

Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/611094/
[add missing entry to a615 catalog to resolve conflict]
Signed-off-by: Rob Clark <[email protected]>
6 months agodrm/msm/a6xx: Store primFifoThreshold in struct a6xx_info
Konrad Dybcio [Wed, 28 Aug 2024 15:06:55 +0000 (17:06 +0200)]
drm/msm/a6xx: Store primFifoThreshold in struct a6xx_info

The if-else monster is so unmaintainable that one case is repeated
twice. Get rid of it.

Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/611092/
[add missing entry to a615 catalog to resolve conflict]
Signed-off-by: Rob Clark <[email protected]>
6 months agodrm/msm/a6xx: Evaluate adreno_is_a650_family in pdc_in_aop check
Konrad Dybcio [Fri, 19 Jul 2024 10:03:26 +0000 (12:03 +0200)]
drm/msm/a6xx: Evaluate adreno_is_a650_family in pdc_in_aop check

A650 family includes A660 family (they've got a big family), A650
itself, and some more A6XX_GEN3 SKUs, all of which should fall into
the same branch of the if-condition. Simplify that.

Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/605206/
Signed-off-by: Rob Clark <[email protected]>
6 months agodrm/msm/a5xx: workaround early ring-buffer emptiness check
Vladimir Lypak [Sun, 1 Sep 2024 13:54:03 +0000 (13:54 +0000)]
drm/msm/a5xx: workaround early ring-buffer emptiness check

There is another cause for soft lock-up of GPU in empty ring-buffer:
race between GPU executing last commands and CPU checking ring for
emptiness. On GPU side IRQ for retire is triggered by CACHE_FLUSH_TS
event and RPTR shadow (which is used to check ring emptiness) is updated
a bit later from CP_CONTEXT_SWITCH_YIELD. Thus if GPU is executing its
last commands slow enough or we check that ring too fast we will miss a
chance to trigger switch to lower priority ring because current ring isn't
empty just yet. This can escalate to lock-up situation described in
previous patch.
To work-around this issue we keep track of last submit sequence number
for each ring and compare it with one written to memptrs from GPU during
execution of CACHE_FLUSH_TS event.

Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
Signed-off-by: Vladimir Lypak <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/612047/
Signed-off-by: Rob Clark <[email protected]>
6 months agodrm/msm/a5xx: fix races in preemption evaluation stage
Vladimir Lypak [Sun, 1 Sep 2024 13:54:02 +0000 (13:54 +0000)]
drm/msm/a5xx: fix races in preemption evaluation stage

On A5XX GPUs when preemption is used it's invietable to enter a soft
lock-up state in which GPU is stuck at empty ring-buffer doing nothing.
This appears as full UI lockup and not detected as GPU hang (because
it's not). This happens due to not triggering preemption when it was
needed. Sometimes this state can be recovered by some new submit but
generally it won't happen because applications are waiting for old
submits to retire.

One of the reasons why this happens is a race between a5xx_submit and
a5xx_preempt_trigger called from IRQ during submit retire. Former thread
updates ring->cur of previously empty and not current ring right after
latter checks it for emptiness. Then both threads can just exit because
for first one preempt_state wasn't NONE yet and for second one all rings
appeared to be empty.

To prevent such situations from happening we need to establish guarantee
for preempt_trigger to make decision after each submit or retire. To
implement this we serialize preemption initiation using spinlock. If
switch is already in progress we need to re-trigger preemption when it
finishes.

Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
Signed-off-by: Vladimir Lypak <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/612045/
Signed-off-by: Rob Clark <[email protected]>
6 months agodrm/msm/a5xx: properly clear preemption records on resume
Vladimir Lypak [Sun, 1 Sep 2024 13:54:01 +0000 (13:54 +0000)]
drm/msm/a5xx: properly clear preemption records on resume

Two fields of preempt_record which are used by CP aren't reset on
resume: "data" and "info". This is the reason behind faults which happen
when we try to switch to the ring that was active last before suspend.
In addition those faults can't be recovered from because we use suspend
and resume to do so (keeping values of those fields again).

Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
Signed-off-by: Vladimir Lypak <[email protected]>
Reviewed-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/612043/
Signed-off-by: Rob Clark <[email protected]>
6 months agodrm/msm/a5xx: disable preemption in submits by default
Vladimir Lypak [Sun, 1 Sep 2024 13:54:00 +0000 (13:54 +0000)]
drm/msm/a5xx: disable preemption in submits by default

Fine grain preemption (switching from/to points within submits)
requires extra handling in command stream of those submits, especially
when rendering with tiling (using GMEM). However this handling is
missing at this point in mesa (and always was). For this reason we get
random GPU faults and hangs if more than one priority level is used
because local preemption is enabled prior to executing command stream
from submit.
With that said it was ahead of time to enable local preemption by
default considering the fact that even on downstream kernel it is only
enabled if requested via UAPI.

Fixes: a7a4c19c36de ("drm/msm/a5xx: fix setting of the CP_PREEMPT_ENABLE_LOCAL register")
Signed-off-by: Vladimir Lypak <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/612041/
Signed-off-by: Rob Clark <[email protected]>
This page took 0.144281 seconds and 4 git commands to generate.