Git Repo - linux.git/log
linux.git
6 months ago drm/xe/xe2hpg: Add Wa_15016589081
Tejas Upadhyay [Wed, 4 Sep 2024 10:13:33 +0000 (15:43 +0530)]
drm/xe/xe2hpg: Add Wa_15016589081

Wa_15016589081 applies to xe2_hpg renderCS

V2(Gustavo)
  - rename bit macro

Signed-off-by: Tejas Upadhyay <[email protected]>
Reviewed-by: Gustavo Sousa <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
6 months ago drm/xe: Add a xe_bo subtest for shrinking / swapping
Thomas Hellström [Mon, 9 Sep 2024 08:56:54 +0000 (10:56 +0200)]
drm/xe: Add a xe_bo subtest for shrinking / swapping

Add a subtest that tries to allocate twice the amount of
buffer object memory available, write data to it and then read
all the data back verifying data integrity.
In order to be able to do this on systems that
have no or not enough swap-space available, allocate some memory
as purgeable, and introduce a function to purge such memory from
the TTM swap_notify path.

This test is intended to add test coverage to the current
bo swap path and upcoming shrinking path.

The test has previously been part of the xe bo shrinker series.

v2:
- Skip test if the execution time is expected to be too long.
- Minor code cleanups.

v3:
- Print random seed. (Matthew Auld)

Cc: Rodrigo Vivi <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Matthew Auld <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe: fix build warning with CONFIG_PM=n
Arnd Bergmann [Mon, 9 Sep 2024 20:25:08 +0000 (20:25 +0000)]
drm/xe: fix build warning with CONFIG_PM=n

The 'runtime_status' field is an implementation detail of the
power management code, so a device driver should not normally
touch this:

drivers/gpu/drm/xe/xe_pm.c: In function 'xe_pm_suspending_or_resuming':
drivers/gpu/drm/xe/xe_pm.c:606:26: error: 'struct dev_pm_info' has no member named 'runtime_status'
  606 |         return dev->power.runtime_status == RPM_SUSPENDING ||
      |                          ^
drivers/gpu/drm/xe/xe_pm.c:607:27: error: 'struct dev_pm_info' has no member named 'runtime_status'
  607 |                 dev->power.runtime_status == RPM_RESUMING;
      |                           ^
drivers/gpu/drm/xe/xe_pm.c:608:1: error: control reaches end of non-void function [-Werror=return-type]

Add an #ifdef check to avoid the build regression.

Fixes: cb85e39dc5d1 ("drm/xe: Suppress missing outer rpm protection warning")
Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
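A minimal userspace sketch of the kind of #ifdef guard the commit above describes. The types below are simplified stand-ins, not the kernel's definitions, and the helper name is hypothetical; the point is only that the field access is compiled when CONFIG_PM is defined and replaced by a constant otherwise.

```c
#include <stdbool.h>

/* Simplified stand-ins for kernel types; CONFIG_PM normally comes from Kconfig. */
enum demo_rpm_status { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED, RPM_SUSPENDING };

struct demo_pm_info {
#ifdef CONFIG_PM
	enum demo_rpm_status runtime_status;	/* exists only with PM support */
#else
	int unused;				/* keeps the struct non-empty */
#endif
};

struct demo_device {
	struct demo_pm_info power;
};

/* Hypothetical helper mirroring the shape of the check in the build log. */
static bool demo_pm_suspending_or_resuming(struct demo_device *dev)
{
#ifdef CONFIG_PM
	return dev->power.runtime_status == RPM_SUSPENDING ||
	       dev->power.runtime_status == RPM_RESUMING;
#else
	(void)dev;	/* no runtime PM state to inspect */
	return false;
#endif
}
```

Building with -DCONFIG_PM selects the first branch; without it the function still compiles and always reports false.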
6 months ago drm/xe: Don't keep stale pointer to bo->ggtt_node
Michal Wajdeczko [Fri, 6 Sep 2024 22:03:48 +0000 (00:03 +0200)]
drm/xe: Don't keep stale pointer to bo->ggtt_node

When we fail to map a BO in the GGTT, we release our GGTT node
placeholder, but leave a stale bo->ggtt_node pointer to it, which
triggers an assert immediately followed by a crash, due to UAF:

[ ] xe 0000:00:02.0: [drm] Assertion `bo->ggtt_node->base.size == bo->size` failed!
[ ] WARNING: CPU: 4 PID: 126 at drivers/gpu/drm/xe/xe_ggtt.c:689 xe_ggtt_remove_bo+0x1d9/0x250 [xe]
[ ] RIP: 0010:xe_ggtt_remove_bo+0x1d9/0x250 [xe]
[ ] Call Trace:
[ ]  <TASK>
[ ]  ? __warn+0x88/0x190
[ ]  ? xe_ggtt_remove_bo+0x1d9/0x250 [xe]
[ ]  ? report_bug+0x1c3/0x1d0
[ ]  ? handle_bug+0x42/0x70
[ ]  ? exc_invalid_op+0x14/0x70
[ ]  ? asm_exc_invalid_op+0x16/0x20
[ ]  ? xe_ggtt_remove_bo+0x1d9/0x250 [xe]
[ ]  ? xe_ggtt_remove_bo+0x1d9/0x250 [xe]
[ ]  xe_ttm_bo_destroy+0x11f/0x260 [xe]
[ ]  ? ttm_bo_release+0x31c/0x350 [ttm]
[ ]  ? __mutex_unlock_slowpath+0x35/0x270
[ ]  __xe_bo_create_locked+0x4a0/0x550 [xe]
[ ]  ? mark_held_locks+0x49/0x80
[ ]  xe_bo_create_pin_map_at+0x37/0x200 [xe]
[ ]  xe_bo_create_pin_map+0x11/0x20 [xe]

While at it, for a similar reason, also don't keep an error pointer
if we fail to allocate the ggtt_node placeholder.

Fixes: 34e804220f69 ("drm/xe: Make xe_ggtt_node struct independent")
Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
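The stale-pointer pattern from the commit above can be sketched in plain C. The structs and function names here are hypothetical miniatures, not the xe driver's; the sketch only shows why the owner's pointer is cleared on the failure path so that a later teardown does not touch freed memory.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniatures of a buffer object and its address-space node. */
struct demo_node {
	size_t size;
};

struct demo_bo {
	size_t size;
	struct demo_node *ggtt_node;
};

/* Maps the BO; 'fail' simulates a mapping error after the node was allocated. */
static int demo_map_bo(struct demo_bo *bo, int fail)
{
	bo->ggtt_node = malloc(sizeof(*bo->ggtt_node));
	if (!bo->ggtt_node)
		return -1;
	bo->ggtt_node->size = bo->size;

	if (fail) {
		free(bo->ggtt_node);
		bo->ggtt_node = NULL;	/* do not leave a dangling pointer */
		return -1;
	}
	return 0;
}

/* Teardown relies on NULL meaning "nothing mapped". */
static void demo_remove_bo(struct demo_bo *bo)
{
	if (!bo->ggtt_node)
		return;
	free(bo->ggtt_node);
	bo->ggtt_node = NULL;
}
```

Without the NULL assignment in the failure branch, demo_remove_bo() would read and free memory that was already released.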
6 months ago drm/xe: Mark reserved engines in snapshot
Lucas De Marchi [Fri, 6 Sep 2024 20:56:09 +0000 (13:56 -0700)]
drm/xe: Mark reserved engines in snapshot

When printing <debugfs>/gt*/hw_engines, it's useful to mark
what engines are reserved so it doesn't mislead developers
while debugging.

Cc: José Roberto de Souza <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
6 months ago drm/xe: Fix arg to pci_iomap()
Lucas De Marchi [Fri, 6 Sep 2024 03:25:07 +0000 (20:25 -0700)]
drm/xe: Fix arg to pci_iomap()

Commit 2d8865b27724 ("drm/xe: Move BAR definitions to dedicated file")
moved the BAR definition to the header, but replaced the wrong arg in
the pci_iomap() function - the last arg is actually the length, not the
BAR. Luckily GTTMMADR_BAR == 0, so it still works. Fix the argument
to avoid confusion.

Cc: Michal Wajdeczko <[email protected]>
Reviewed-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Alan Previn <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
6 months ago drm/xe: Update runtime detection of has_flat_ccs
Lucas De Marchi [Wed, 4 Sep 2024 16:22:38 +0000 (09:22 -0700)]
drm/xe: Update runtime detection of has_flat_ccs

It's confusing to have a *set* function that actually probes the
hardware rather than receiving a parameter. Rename it to *probe* along
with prefix removal and comment in the relevant places that the
has_flat_ccs flag may be overridden in runtime.

While at it, fix the mixed declaration of struct xe_gt.

Reviewed-by: Matt Roper <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
6 months ago drm/xe: Cleanup has_flat_ccs handling
Lucas De Marchi [Wed, 4 Sep 2024 16:22:37 +0000 (09:22 -0700)]
drm/xe: Cleanup has_flat_ccs handling

The flag is set in XE_HP_FEATURES, but then overridden in all but one
xe_graphics_desc. Make it set only where needed.

Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
6 months ago drm/xe: fix missing 'xe_vm_put'
Dafna Hirschfeld [Sun, 1 Sep 2024 04:42:27 +0000 (07:42 +0300)]
drm/xe: fix missing 'xe_vm_put'

Fix memleak caused by missing xe_vm_put

Fixes: 852856e3b6f6 ("drm/xe: Use reserved copy engine for user binds on faulting devices")
Signed-off-by: Dafna Hirschfeld <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months ago drm/xe: Suppress missing outer rpm protection warning
Rodrigo Vivi [Thu, 5 Sep 2024 14:02:15 +0000 (10:02 -0400)]
drm/xe: Suppress missing outer rpm protection warning

Do not raise a WARN if we are likely within the suspending or resuming
path. The following is likely a false positive:

rpm_status:           0000:03:00.0 status=RPM_SUSPENDING
console:              xe_bo_evict_all (called from suspend)
xe_sched_job_create:  dev=0000:03:00.0, ...
xe_sched_job_exec:    dev=0000:03:00.0, ...
xe_pm_runtime_put:    dev=0000:03:00.0, ...
xe_sched_job_run:     dev=0000:03:00.0, ...
rpm_usage:            0000:03:00.0 flags-0 cnt-2  ...
rpm_usage:            0000:03:00.0 flags-0 cnt-2  ...
rpm_usage:            0000:03:00.0 flags-0 cnt-2  ...
console:              xe 0000:03:00.0: [drm] Missing outer runtime
                                                     PM protection
console:               xe_guc_ct_send+0x15/0x50 [xe]
console:               guc_exec_queue_run_job+0x1509/0x3950 [xe]
[snip]
console:               drm_sched_run_job_work+0x649/0xc20

At this point, BOs are getting evicted from VRAM with rpm
usage-counter = 2, but rpm status = SUSPENDING.

The xe->pm_callback_task won't be equal to 'current' because this call is
coming from a work queue.

So, pm_runtime_get_if_active() will be called and return 0 because rpm
status != ACTIVE (it is SUSPENDING or RESUMING).

v2: Still get the reference even on non suspending/resuming
    path (Jonathan, Brost).

Cc: Matthew Brost <[email protected]>
Cc: Matthew Auld <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months ago drm/xe/xe_gt_idle: add debugfs entry for powergating info
Riana Tauro [Fri, 6 Sep 2024 07:11:26 +0000 (12:41 +0530)]
drm/xe/xe_gt_idle: add debugfs entry for powergating info

Coarse Powergating is a power saving technique where Render and Media
can be power-gated independently irrespective of the rest of the GT.

For debug purposes, it is useful to expose the powergating information.

v2: move to debugfs
    add details to commit message
    add per-slice status for media
    define reg bits in descending order (Matt Roper)

v3: fix return statement
    fix kernel-doc
    use loop for media slices
    use helper function for status (Michal)

v4: add pg prefix
    do not wake GT if in C6 (Badal)

Signed-off-by: Riana Tauro <[email protected]>
Reviewed-by: Badal Nilawar <[email protected]>
Acked-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months ago drm/xe/xe_gt_idle: modify powergate enable condition
Riana Tauro [Fri, 6 Sep 2024 07:11:25 +0000 (12:41 +0530)]
drm/xe/xe_gt_idle: modify powergate enable condition

Modify powergate enable condition based on the type of GT or presence of
media engines. Also have a copy of the value written to powergate enable
register.

v2: add condition to enable render or media powergating (Badal)

v3: fix commit message (Shekhar)
    fix kernel-doc

Signed-off-by: Riana Tauro <[email protected]>
Reviewed-by: Shekhar Chauhan <[email protected]>
Reviewed-by: Badal Nilawar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months ago drm/xe: use IS_ENABLED() instead of defined() on config options
Jani Nikula [Wed, 4 Sep 2024 14:52:31 +0000 (17:52 +0300)]
drm/xe: use IS_ENABLED() instead of defined() on config options

Prefer IS_ENABLED() instead of defined() for checking whether a kconfig
option is enabled.

Reviewed-by: Badal Nilawar <[email protected]>
Reviewed-by: Ashutosh Dixit <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Jani Nikula <[email protected]>
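To illustrate why IS_ENABLED() is preferred, here is a simplified userspace re-creation of the preprocessor trick. It is a sketch only: the macro names and the CONFIG_DEMO_* options are made up for the example, and the kernel's real definition in include/linux/kconfig.h differs in detail. Unlike a bare defined(CONFIG_FOO), the macro is also true when the option is built as a module (CONFIG_FOO_MODULE), and it expands to 0 or 1 so it works in ordinary C expressions as well as in #if.

```c
/* Hypothetical options, written the way Kconfig emits them for =y and =m. */
#define CONFIG_DEMO_FEATURE 1
#define CONFIG_DEMO_DRIVER_MODULE 1
/* CONFIG_DEMO_ABSENT is deliberately left undefined. */

/* If the option expands to 1, the pasted name becomes "0," and shifts the args. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND_ARG(ignored, val, ...) val

#define DEMO_IS_SET(x) DEMO_IS_SET_(x)
#define DEMO_IS_SET_(val) DEMO_IS_SET__(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_SET__(arg1_or_junk) DEMO_TAKE_SECOND_ARG(arg1_or_junk 1, 0)

/* True for built-in (=y) or modular (=m) options. */
#define DEMO_IS_ENABLED(option) \
	(DEMO_IS_SET(option) || DEMO_IS_SET(option##_MODULE))

static int demo_feature_level(void)
{
	/* Both branches are parsed and type-checked, even the dead one. */
	if (DEMO_IS_ENABLED(CONFIG_DEMO_FEATURE))
		return 1;
	return 0;
}
```

Note that the two-argument expansion relies on the compiler accepting an empty variadic argument list, which GCC and Clang do by default.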
6 months ago drm/xe/pciids: separate ARL and MTL PCI IDs
Jani Nikula [Wed, 4 Sep 2024 09:46:49 +0000 (12:46 +0300)]
drm/xe/pciids: separate ARL and MTL PCI IDs

Avoid including the PCI IDs of one platform in the PCI IDs of another. It's
clearer to deal with them completely separately at the PCI ID macro
level.

Reviewed-by: Shekhar Chauhan <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/a30cb0da7694a8eccceba66d676ac59aa0e96176.1725443121.git.jani.nikula@intel.com
6 months ago drm/xe/pciids: separate RPL-U and RPL-P PCI IDs
Jani Nikula [Wed, 4 Sep 2024 09:46:48 +0000 (12:46 +0300)]
drm/xe/pciids: separate RPL-U and RPL-P PCI IDs

Avoid including the PCI IDs of one platform in the PCI IDs of another. It's
clearer to deal with them completely separately at the PCI ID macro
level.

Reviewed-by: Sai Teja Pottumuttu <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/4868d36fbfa8c38ea2d490bca82cf6370b8d65dd.1725443121.git.jani.nikula@intel.com
6 months ago drm/xe/pciids: add some missing ADL-N PCI IDs
Jani Nikula [Wed, 4 Sep 2024 09:46:47 +0000 (12:46 +0300)]
drm/xe/pciids: add some missing ADL-N PCI IDs

Similar to commit 425b463859ed ("drm/i915: Update ADL-N PCI IDs").

Reviewed-by: Sai Teja Pottumuttu <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/47d543393e4026588401a03c4e3ce12ce29780e3.1725443121.git.jani.nikula@intel.com
6 months ago drm/xe/pat: sanity check compression and coh_mode
Matthew Auld [Wed, 28 Aug 2024 09:22:58 +0000 (10:22 +0100)]
drm/xe/pat: sanity check compression and coh_mode

There is an implicit assumption in the driver that compression and
coh_1way+ are mutually exclusive. If this is ever not true then userptr
and imported dma-buf from external device will have uncleared ccs state.
Add a build bug for this so we don't forget.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Nirmoy Das <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
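A build-time check of this kind can be sketched with C11 static_assert. Everything below is illustrative: the enum, the struct, and the DEMO_* macros are assumptions for the example, not the driver's actual PAT table or its BUILD_BUG_ON() usage. The idea is that each table entry is paired with a compile-time assertion, so an entry that combines compression with a coherent mode fails the build.

```c
#include <assert.h>	/* static_assert (C11) */

/* Hypothetical coherency modes; values chosen only for the example. */
enum demo_coh_mode {
	DEMO_COH_NONE = 1,
	DEMO_COH_AT_LEAST_1WAY = 2,
};

struct demo_pat_entry {
	unsigned int compressed;
	enum demo_coh_mode coh_mode;
};

/* Fails the build if an entry is both compressed and coherent. */
#define DEMO_PAT_CHECK(comp, coh) \
	static_assert(!((comp) && (coh) >= DEMO_COH_AT_LEAST_1WAY), \
		      "compression and coh_1way+ must be mutually exclusive")

#define DEMO_PAT(comp, coh) { .compressed = (comp), .coh_mode = (coh) }

DEMO_PAT_CHECK(0, DEMO_COH_AT_LEAST_1WAY);	/* coherent, uncompressed: ok */
DEMO_PAT_CHECK(1, DEMO_COH_NONE);		/* compressed, non-coherent: ok */

static const struct demo_pat_entry demo_pat_table[] = {
	DEMO_PAT(0, DEMO_COH_AT_LEAST_1WAY),
	DEMO_PAT(1, DEMO_COH_NONE),
};
```

Adding DEMO_PAT_CHECK(1, DEMO_COH_AT_LEAST_1WAY) would stop compilation with the message above, which is the "so we don't forget" effect the commit is after.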
6 months ago drm/xe: prevent potential UAF in pf_provision_vf_ggtt()
Matthew Auld [Wed, 28 Aug 2024 10:43:42 +0000 (11:43 +0100)]
drm/xe: prevent potential UAF in pf_provision_vf_ggtt()

The node ptr can point to an already freed ptr, if we hit the path with
an already allocated node. We later dereference that pointer with:

xe_gt_assert(gt, !xe_ggtt_node_allocated(node));

which is a potential UAF. Fix this by not stashing the ptr for node.
Also since it is likely a bad idea to leave config->ggtt_region pointing
to a stale ptr, also set that to NULL by calling
pf_release_vf_config_ggtt() instead of pf_release_ggtt().

Fixes: 34e804220f69 ("drm/xe: Make xe_ggtt_node struct independent")
Signed-off-by: Matthew Auld <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe: Replace double space with single space after comma
Nitin Gote [Fri, 23 Aug 2024 08:06:43 +0000 (13:36 +0530)]
drm/xe: Replace double space with single space after comma

Avoid using double space, ",  " in function or macro parameters
where it's not required by any alignment purpose. Replace it with
a single space, ", ".

Signed-off-by: Nitin Gote <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
6 months ago drm/xe/pf: Sanitize VF scratch registers on FLR
Michal Wajdeczko [Mon, 2 Sep 2024 19:29:53 +0000 (21:29 +0200)]
drm/xe/pf: Sanitize VF scratch registers on FLR

Some VF accessible registers (like GuC scratch registers) must be
explicitly reset during the FLR. While this is today done by the GuC
firmware, according to the design, this should be the responsibility of
the PF driver, as future platforms may require more registers to be
reset. Like the GuC, the PF can access VF registers by adding some
platform specific offset to the original register address.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe: Use xe_pm_runtime_get in xe_bo_move() if reclaim-safe.
Thomas Hellström [Tue, 3 Sep 2024 09:42:32 +0000 (11:42 +0200)]
drm/xe: Use xe_pm_runtime_get in xe_bo_move() if reclaim-safe.

xe_bo_move() might be called in the TTM swapout path from validation
by another TTM device. If so, we are not likely to have an RPM
reference. So iff xe_pm_runtime_get() is safe to call from reclaim,
use it instead of xe_pm_runtime_get_noresume().

Strictly this is currently needed only if handle_system_ccs is true,
but use xe_pm_runtime_get() if possible anyway to increase test
coverage.

At the same time warn if handle_system_ccs is true and we can't
call xe_pm_runtime_get() from reclaim context. This will likely trip
if someone tries to enable SRIOV on LNL, without fixing Xe SRIOV
runtime resume / suspend.

Cc: Rodrigo Vivi <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Matthew Auld <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/display: Avoid encoder_suspend at runtime suspend
Rodrigo Vivi [Fri, 30 Aug 2024 18:35:07 +0000 (14:35 -0400)]
drm/xe/display: Avoid encoder_suspend at runtime suspend

Fix circular locking dependency on runtime suspend.

<4> [74.952215] ======================================================
<4> [74.952217] WARNING: possible circular locking dependency detected
<4> [74.952219] 6.10.0-rc7-xe #1 Not tainted
<4> [74.952221] ------------------------------------------------------
<4> [74.952223] kworker/7:1/82 is trying to acquire lock:
<4> [74.952226] ffff888120548488 (&dev->mode_config.mutex){+.+.}-{3:3}, at: drm_modeset_lock_all+0x40/0x1e0 [drm]
<4> [74.952260]
but task is already holding lock:
<4> [74.952262] ffffffffa0ae59c0 (xe_pm_runtime_lockdep_map){+.+.}-{0:0}, at: xe_pm_runtime_suspend+0x2f/0x340 [xe]
<4> [74.952322]
which lock already depends on the new lock.

The commit 'b1d90a86 ("drm/xe: Use the encoder suspend helper also used
by the i915 driver")' didn't do anything wrong. It actually fixed a
critical bug, because the encoder_suspend was never actually getting
called, since it was returning if (has_display(xe)) instead of
if (!has_display(xe)). However, this ended up introducing the encoder
suspend calls in the runtime routines as well, causing the circular
locking dependency.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2304
Fixes: b1d90a862c89 ("drm/xe: Use the encoder suspend helper also used by the i915 driver")
Cc: Imre Deak <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months ago drm/xe: Add missing runtime reference to wedged upon gt_reset
Rodrigo Vivi [Fri, 30 Aug 2024 18:35:06 +0000 (14:35 -0400)]
drm/xe: Add missing runtime reference to wedged upon gt_reset

Fixes this missed case:

xe 0000:00:02.0: [drm] Missing outer runtime PM protection
WARNING: CPU: 99 PID: 1455 at drivers/gpu/drm/xe/xe_pm.c:564 xe_pm_runtime_get_noresume+0x48/0x60 [xe]
Call Trace:
<TASK>
? show_regs+0x67/0x70
? __warn+0x94/0x1b0
? xe_pm_runtime_get_noresume+0x48/0x60 [xe]
? report_bug+0x1b7/0x1d0
? handle_bug+0x46/0x80
? exc_invalid_op+0x19/0x70
? asm_exc_invalid_op+0x1b/0x20
? xe_pm_runtime_get_noresume+0x48/0x60 [xe]
xe_device_declare_wedged+0x91/0x280 [xe]
gt_reset_worker+0xa2/0x250 [xe]

v2: Also move get and get the right Fixes tag (Himal, Brost)

Fixes: fb74b205cdd2 ("drm/xe: Introduce a simple wedged state")
Cc: Himal Prasad Ghimiray <[email protected]>
Cc: Matthew Brost <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months ago drm/xe: Remove redundant [drm] tag from xe_assert() message
Michal Wajdeczko [Mon, 2 Sep 2024 19:07:26 +0000 (21:07 +0200)]
drm/xe: Remove redundant [drm] tag from xe_assert() message

Since commit 178c0a33c421 ("drm/print: Add generic drm dev printk
function") the output from drm_WARN() includes the previously missing
[drm] tag, so now xe_assert() is printing it twice:

  [ ] xe 0000:00:02.0: [drm] [drm] Assertion `false` failed!

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/pf: Reset thresholds when releasing a VF config
Michal Wajdeczko [Fri, 30 Aug 2024 13:21:00 +0000 (15:21 +0200)]
drm/xe/pf: Reset thresholds when releasing a VF config

As part of the VF config release, we should reset all parameters,
including thresholds, to always start with the clean VF config.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/pf: Add thresholds to the VF KLV config
Michal Wajdeczko [Fri, 30 Aug 2024 13:20:59 +0000 (15:20 +0200)]
drm/xe/pf: Add thresholds to the VF KLV config

We are pushing threshold KLV to the GuC immediately during the
threshold provisioning, but those configs will be lost during a
GT reset.  Include threshold KLVs while encoding full VF config
buffer to make sure the GuC receives all of the config KLVs.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe: Invalidate media_gt TLBs in PT code
Matthew Brost [Mon, 26 Aug 2024 17:01:44 +0000 (10:01 -0700)]
drm/xe: Invalidate media_gt TLBs in PT code

Testing on LNL has shown that the media GT's TLBs need to be invalidated via
the GuC; update the PT code accordingly.

v2:
 - Do dma_fence_get before first call of invalidation_fence_init (Himal)
 - No need to check for valid chain fence (Himal)
v3:
 - Use dma-fence-array

Fixes: 3330361543fc ("drm/xe/lnl: Add LNL platform definition")
Signed-off-by: Matthew Brost <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago dma-buf: Split out dma fence array create into alloc and arm functions
Matthew Brost [Mon, 26 Aug 2024 17:01:43 +0000 (10:01 -0700)]
dma-buf: Split out dma fence array create into alloc and arm functions

Useful to preallocate a dma fence array and then arm it in the path of
reclaim or a dma fence.

v2:
 - s/arm/init (Christian)
 - Drop !array warn (Christian)
v3:
 - Fix kernel doc typos (dim)

Cc: Sumit Semwal <[email protected]>
Cc: Christian König <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
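The split described in the commit above follows a common two-phase pattern: allocate where failure and sleeping are acceptable, then initialize later in a context where neither is. The sketch below is a userspace miniature under that assumption; the struct and the demo_* functions are hypothetical and do not reproduce the kernel's dma_fence_array API or its signatures.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature; integers stand in for fence pointers. */
struct demo_fence_array {
	unsigned int num_fences;
	int armed;
	int fences[];	/* flexible array member sized at allocation time */
};

/* Phase 1: may fail, so call it while allocation is still allowed. */
static struct demo_fence_array *demo_fence_array_alloc(unsigned int num_fences)
{
	return calloc(1, sizeof(struct demo_fence_array) +
			 num_fences * sizeof(int));
}

/* Phase 2: no allocation and no failure path. */
static void demo_fence_array_init(struct demo_fence_array *array,
				  unsigned int num_fences, const int *fences)
{
	unsigned int i;

	for (i = 0; i < num_fences; i++)
		array->fences[i] = fences[i];
	array->num_fences = num_fences;
	array->armed = 1;
}
```

A caller would run demo_fence_array_alloc() up front, handle a NULL return there, and only call demo_fence_array_init() once it reaches the restricted path.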
6 months ago drm/xe/hwmon: Treat hwmon as a per-device concept
Matt Roper [Thu, 29 Aug 2024 22:06:22 +0000 (15:06 -0700)]
drm/xe/hwmon: Treat hwmon as a per-device concept

There's only one instance of hwmon per device, and MMIO access to it is
always done through the root tile.  The code has been passing around a
pointer to the root tile's primary GT, which is confusing since this
isn't really a GT-level concept.  Replace that pointer with an xe_device
pointer and use xe_root_mmio_gt(xe) to get a pointer when we need to do
register MMIO.  This makes things easier to follow, and also cleans up
the code in preparation for a much larger MMIO register access overhaul
that's coming soon.

Signed-off-by: Matt Roper <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/pcode: Treat pcode as per-tile rather than per-GT
Matt Roper [Thu, 29 Aug 2024 22:06:21 +0000 (15:06 -0700)]
drm/xe/pcode: Treat pcode as per-tile rather than per-GT

There's only one instance of the pcode per tile, and for GT-related
accesses both the primary and media GT share the same register
interface.  Since Xe was using per-GT locking, the pcode mutex wasn't
actually protecting everything that it should since concurrent accesses
related to a tile's primary GT and media GT were possible.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matt Roper <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/display: Drop unnecessary xe_gt.h includes
Matt Roper [Thu, 29 Aug 2024 23:03:08 +0000 (16:03 -0700)]
drm/xe/display: Drop unnecessary xe_gt.h includes

None of the Xe display files work directly with the GT or need anything
from xe_gt.h.  Drop the unnecessary include.

Signed-off-by: Matt Roper <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe: Fix memory leak on xe_alloc_pf_queue failure
Nirmoy Das [Mon, 26 Aug 2024 16:20:35 +0000 (18:20 +0200)]
drm/xe: Fix memory leak on xe_alloc_pf_queue failure

Simplify memory unwinding on error, which also fixes a memory
leak that can currently happen on error.

v2: use devm_kcalloc (Matt A)

Fixes: 3338e4f90c14 ("drm/xe: Use topology to determine page fault queue size")
Cc: Matthew Auld <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Stuart Summers <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
6 months ago drm/xe/pf: Improve VF control
Michal Wajdeczko [Wed, 28 Aug 2024 21:08:09 +0000 (23:08 +0200)]
drm/xe/pf: Improve VF control

Our initial VF control implementation was focused on providing
very minimal support for the VF_STATE_NOTIFY events just to
meet GuC requirements, without tracking a VF state or doing any
expected actions (like cleanup in case of the FLR notification).

Try to improve this by defining a set of VF state machines, each
responsible for processing one activity (PAUSE, RESUME, STOP or
FLR). All required steps defined by the VF state machine are then
executed by the PF worker from the dedicated workqueue.

Any external requests or notifications simply try to transition
between the states to trigger a work and then wait for that work
to finish. Some predefined default timeouts are used to avoid
changing existing API calls, but it should be easy to extend the
control API to also accept specific timeout values.

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Piotr Piórkowski <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/pf: Drop GuC notifications for non-existing VF
Michal Wajdeczko [Wed, 28 Aug 2024 21:08:08 +0000 (23:08 +0200)]
drm/xe/pf: Drop GuC notifications for non-existing VF

It is unlikely that GuC will ever send a G2H notification with an
invalid VFID and it is currently harmless if that actually happens.
But in upcoming patches we will start using that VFID as an index
and we must be sure it is valid to avoid a crash due to buggy
firmware or a corrupted G2H message.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
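The validation described above amounts to a bounds check before an untrusted identifier is used as an array index. A minimal sketch, assuming a fixed VF count and a per-VF state table; all names, the table layout, and the choice of -EINVAL as the error code are illustrative and not taken from the driver.

```c
#include <errno.h>

#define DEMO_TOTAL_VFS 4	/* hypothetical number of enabled VFs */

struct demo_vf_state {
	unsigned int flr_count;
};

/* Index 0 is reserved for the PF, so valid VFIDs are 1..DEMO_TOTAL_VFS. */
static struct demo_vf_state demo_vf_states[DEMO_TOTAL_VFS + 1];

/* Handles a notification whose VFID comes from an untrusted message. */
static int demo_handle_vf_flr(unsigned int vfid)
{
	if (vfid == 0 || vfid > DEMO_TOTAL_VFS)
		return -EINVAL;	/* drop it instead of indexing out of bounds */

	demo_vf_states[vfid].flr_count++;
	return 0;
}
```

Rejecting the message early keeps a buggy sender from turning a bad identifier into an out-of-bounds write.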
6 months ago drm/xe/pf: Fix documentation formatting
Michal Wajdeczko [Wed, 28 Aug 2024 21:08:07 +0000 (23:08 +0200)]
drm/xe/pf: Fix documentation formatting

Current formatting of "The VF FLR Flow with GuC" only looks fine,
but it will not render properly when included in htmldocs due to:

  WARNING: Block quote ends without a blank line; unexpected unindent.
  CRITICAL: Missing matching underline for section title overline.

Fix that by adding proper indent and using list markup.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/pf: Add function to sanitize VF resources
Michal Wajdeczko [Wed, 28 Aug 2024 21:08:06 +0000 (23:08 +0200)]
drm/xe/pf: Add function to sanitize VF resources

On current platforms it is the PF driver's responsibility to clear
some of the VF's resources during a VF FLR. Add a simple function
that will clear configured VF resources (GGTT, LMEM). We will
start using this function soon.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/gsc: Wedge the device if the GSCCS reset fails
Daniele Ceraolo Spurio [Wed, 28 Aug 2024 22:14:57 +0000 (15:14 -0700)]
drm/xe/gsc: Wedge the device if the GSCCS reset fails

Due to the special handling of the GSCCS in HW, we can't escalate to GT
reset when we receive the reset failure interrupt; the specs indicate
that we should trigger an FLR instead, but we do not have support for
that at the moment, so the HW will stay permanently in a broken state.
We should therefore mark the device as wedged, the same as if the GT
reset had failed.

Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Reviewed-by: Julia Filipchuk <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/gsc: Add debugfs to print GSC info
Daniele Ceraolo Spurio [Wed, 28 Aug 2024 21:51:57 +0000 (14:51 -0700)]
drm/xe/gsc: Add debugfs to print GSC info

This is useful for debug, in case something goes wrong with the GSC. The
info includes the version information and the current value of the HECI1
status registers.

Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Cc: John Harrison <[email protected]>
Cc: Alan Previn <[email protected]>
Reviewed-by: Julia Filipchuk <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/gsc: Track the platform in the compatibility version
Daniele Ceraolo Spurio [Wed, 28 Aug 2024 21:51:56 +0000 (14:51 -0700)]
drm/xe/gsc: Track the platform in the compatibility version

The GSC compatibility version number is reset for each new platform. To
indicate this, the version includes a number that identifies the
platform (102 = MTL, 104 = LNL); this matches what happens for the
release version, where the major number also identifies a platform.

To make it clearer in our logs that the compatibility version is
specific to the platform, it is useful to include this platform number.
However, given that our binary names already include the platform, it is
not necessary to add this extra number there.

Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Cc: John Harrison <[email protected]>
Cc: Alan Previn <[email protected]>
Reviewed-by: Julia Filipchuk <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/gsc: Fix FW status if the firmware is already loaded
Daniele Ceraolo Spurio [Wed, 28 Aug 2024 21:51:55 +0000 (14:51 -0700)]
drm/xe/gsc: Fix FW status if the firmware is already loaded

We set the FW status to "TRANSFERRED" after the load completes and to
"RUNNING" once we're done with proxy init, so do the same if we're trying
to re-load the FW and it is already loaded.

Note that there is no difference in driver behavior between the 2
states, but it's useful to be accurate when we dump the status for
debug.

Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Cc: John Harrison <[email protected]>
Cc: Alan Previn <[email protected]>
Reviewed-by: Julia Filipchuk <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/gsc: Do not attempt to load the GSC multiple times
Daniele Ceraolo Spurio [Wed, 28 Aug 2024 21:51:54 +0000 (14:51 -0700)]
drm/xe/gsc: Do not attempt to load the GSC multiple times

The GSC HW is only reset by driver FLR or D3cold entry. We don't support
the former at runtime, while the latter is only supported on DGFX, for
which we don't support GSC. Therefore, if GSC failed to load previously
there is no need to try again because the HW is stuck in the error state.

An assert has been added so that if we ever add DGFX support we'll know
we need to handle the D3 case.

v2: use "< 0" instead of "!= 0" in the FW state error check (Julia).

Fixes: dd0e89e5edc2 ("drm/xe/gsc: GSC FW load")
Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Cc: John Harrison <[email protected]>
Cc: Alan Previn <[email protected]>
Cc: <[email protected]> # v6.8+
Reviewed-by: Julia Filipchuk <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: replace #include <drm/xe_drm.h> with <uapi/drm/xe_drm.h>
Jani Nikula [Tue, 27 Aug 2024 09:15:39 +0000 (12:15 +0300)]
drm/xe: replace #include <drm/xe_drm.h> with <uapi/drm/xe_drm.h>

include/drm/xe_drm.h does not exist. Prefer the explicit uapi include.

Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe/hwmon: Fix WRITE_I1 param from u32 to u16
Karthik Poosa [Tue, 27 Aug 2024 15:53:01 +0000 (21:23 +0530)]
drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16

WRITE_I1 sub-command of the POWER_SETUP pcode command accepts a u16
parameter instead of u32. This change prevents potential illegal
sub-command errors.

v2: Mask uval instead of changing the prototype. (Badal)

v3: Rephrase commit message. (Badal)
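
For illustration only, the masking approach mentioned in v2 can be sketched in plain C. This is a minimal userspace sketch, not the driver's code: the helper name mask_to_u16 is hypothetical, and only the idea of keeping the low 16 bits of a u32 comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the xe driver): keep only the low
 * 16 bits of a 32-bit value before passing it to a command that accepts
 * a 16-bit parameter. The function prototype stays u32, as in v2.
 */
static uint32_t mask_to_u16(uint32_t uval)
{
	return uval & 0xFFFFu;
}
```

Masking at the call site keeps the existing prototype unchanged while ensuring the upper bits never reach the sub-command.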

Signed-off-by: Karthik Poosa <[email protected]>
Fixes: 92d44a422d0d ("drm/xe/hwmon: Expose card reactive critical power")
Reviewed-by: Badal Nilawar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: move the kernel lrc from hwe to execlist port
Ilia Levi [Mon, 26 Aug 2024 10:06:55 +0000 (13:06 +0300)]
drm/xe: move the kernel lrc from hwe to execlist port

The kernel lrc is used solely by the execlist infra.
Move it to the execlist port struct and initialize it only when
execlists are used.

v2: Rebase, improve error handling readability (Jonathan)

Signed-off-by: Ilia Levi <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe/bmg: Drop force_probe requirement
Balasubramani Vivekanandan [Wed, 28 Aug 2024 08:21:52 +0000 (13:51 +0530)]
drm/xe/bmg: Drop force_probe requirement

Battlemage platform is sufficiently tested and found stable. CI is also
pretty stable. Remove the force_probe requirement to enable the platform
support by default.

Cc: Thomas Hellström <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jani Nikula <[email protected]>
Signed-off-by: Balasubramani Vivekanandan <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
6 months agodrm/xe: Fix NPD in ggtt_node_remove()
Himal Prasad Ghimiray [Wed, 28 Aug 2024 09:22:29 +0000 (14:52 +0530)]
drm/xe: Fix NPD in ggtt_node_remove()

Make sure that ggtt_node_remove() is invoked only if both node and ggtt
are not null. Move the null checks to the caller function
xe_ggtt_node_remove().

v2: Move null check below declarations (Tejas)

Fixes: 919bb54e989c ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node")
Cc: Rodrigo Vivi <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Tejas Upadhyay <[email protected]>
Reviewed-by: Tejas Upadhyay <[email protected]>
Signed-off-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: Use separate rpm lockdep map for non-d3cold-capable devices
Thomas Hellström [Mon, 26 Aug 2024 14:34:50 +0000 (16:34 +0200)]
drm/xe: Use separate rpm lockdep map for non-d3cold-capable devices

For non-d3cold-capable devices we'd like to be able to wake up the
device from reclaim. In particular, for Lunar Lake we'd like to be
able to blit CCS metadata to system at shrink time; at least from
kswapd where it's reasonably OK to wait for rpm resume and a
preceding rpm suspend.

Therefore use a separate lockdep map for such devices and prime it
reclaim-tainted.

v2:
- Rename lockmap acquire- and release functions. (Rodrigo Vivi).
- Reinstate the old xe_pm_runtime_lockdep_prime() function and
  rename it to xe_rpm_might_enter_cb(). (Matthew Auld).
- Introduce a separate xe_pm_runtime_lockdep_prime function
  called from module init for known required locking orders.
v3:
- Actually hook up the prime function at module init.
v4:
- Rebase.
v5:
- Don't use reclaim-safe RPM with sriov.

Cc: "Vivi, Rodrigo" <[email protected]>
Cc: "Auld, Matthew" <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agoRevert "drm/ttm: Add a flag to allow drivers to skip clear-on-free"
Nirmoy Das [Wed, 28 Aug 2024 08:36:35 +0000 (10:36 +0200)]
Revert "drm/ttm: Add a flag to allow drivers to skip clear-on-free"

Remove TTM_TT_FLAG_CLEARED_ON_FREE now that XE stopped using this
flag.

This reverts commit decbfaf06db05fa1f9b33149ebb3c145b44e878f.

Cc: Christian König <[email protected]>
Cc: Himal Prasad Ghimiray <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Thomas Hellström <[email protected]>
Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
6 months agoRevert "drm/xe/lnl: Offload system clear page activity to GPU"
Nirmoy Das [Wed, 28 Aug 2024 08:36:34 +0000 (10:36 +0200)]
Revert "drm/xe/lnl: Offload system clear page activity to GPU"

This optimization relied on having to clear CCS on allocations.
If there is no need to clear CCS on allocations then this would mostly
help in reducing CPU utilization.

Revert this patch at this moment because of:
1. Currently Xe can't do clear-on-free and is using an invalid TTM flag,
TTM_TT_FLAG_CLEARED_ON_FREE, which could poison the global TTM pool on
multi-device setups.

2. Also, for LNL, CPU:WB doesn't require clearing CCS, as such a BO will
not be allowed to bind with a compression PTE. A subsequent patch will
disable clearing CCS for CPU:WB BOs for LNL.

This reverts commit 23683061805be368c8d1c7e7ff52abc470cac275.

Cc: Christian König <[email protected]>
Cc: Himal Prasad Ghimiray <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Thomas Hellström <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Signed-off-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
6 months agodrm/xe: Support 'nomodeset' kernel command-line option
Thomas Zimmermann [Tue, 27 Aug 2024 12:09:05 +0000 (14:09 +0200)]
drm/xe: Support 'nomodeset' kernel command-line option

Setting 'nomodeset' on the kernel command line disables all graphics
drivers with modesetting capabilities, leaving only firmware drivers,
such as simpledrm or efifb.

Most DRM drivers automatically support 'nomodeset' via DRM's module
helper macros. In xe, which uses regular module_init(), manually call
drm_firmware_drivers_only() to test for 'nomodeset'. Do not register
the driver if set.

v2:
- use xe's init table (Lucas)
- do NULL test for init/exit functions

Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
6 months agodrm/xe: Remove unrequired NULL check in xe_sched_job_free_fences
Himal Prasad Ghimiray [Tue, 20 Aug 2024 09:02:30 +0000 (14:32 +0530)]
drm/xe: Remove unrequired NULL check in xe_sched_job_free_fences

dma_fence_chain_free() can handle NULL input, there is no need for NULL
check by caller.

Signed-off-by: Himal Prasad Ghimiray <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Jagmeet Randhawa <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
6 months agodrm/xe: Remove unrequired NULL checks in xe_sync_entry_cleanup
Himal Prasad Ghimiray [Tue, 20 Aug 2024 09:02:29 +0000 (14:32 +0530)]
drm/xe: Remove unrequired NULL checks in xe_sync_entry_cleanup

dma_fence_put() and dma_fence_chain_free() can handle NULL input,
there is no need for NULL check by caller.

Signed-off-by: Himal Prasad Ghimiray <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
6 months agodrm/xe: Remove extra dma_fence_put on xe_sync_entry_add_deps failure
Himal Prasad Ghimiray [Tue, 20 Aug 2024 09:02:28 +0000 (14:32 +0530)]
drm/xe: Remove extra dma_fence_put on xe_sync_entry_add_deps failure

drm_sched_job_add_dependency() drops references even in case of error,
no need for caller to call dma_fence_put.

Signed-off-by: Himal Prasad Ghimiray <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Ashutosh Dixit <[email protected]>
Acked-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
6 months agodrm/xe/lnl: Drop force_probe requirement
Lucas De Marchi [Thu, 22 Aug 2024 22:46:15 +0000 (15:46 -0700)]
drm/xe/lnl: Drop force_probe requirement

Lunar Lake has been usable for a while in a desktop setup. Bugs are
sporadically showing up in CI, but being promptly fixed. Nothing very
concerning.

All the uapi changes related to fundamental platform usage have been
finalized.

Remove the force_probe requirement and enable the platform by default.

Cc: Thomas Hellström <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jani Nikula <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>
6 months agodrm/xe: Remove NULL check of lrc->bo in xe_lrc_snapshot_capture()
Apoorva Singh [Fri, 16 Aug 2024 08:03:55 +0000 (13:33 +0530)]
drm/xe: Remove NULL check of lrc->bo in xe_lrc_snapshot_capture()

- lrc->bo NULL check is not needed in xe_lrc_snapshot_capture() as
  it's already been taken care of in xe_lrc_init().

Signed-off-by: Apoorva Singh <[email protected]>
Acked-by: Rodrigo Vivi <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
6 months agodrm/xe: Fix total initialization in xe_ggtt_print_holes()
Nathan Chancellor [Sat, 24 Aug 2024 03:47:13 +0000 (20:47 -0700)]
drm/xe: Fix total initialization in xe_ggtt_print_holes()

Clang warns (or errors with CONFIG_DRM_WERROR or CONFIG_WERROR):

  drivers/gpu/drm/xe/xe_ggtt.c:810:3: error: variable 'total' is uninitialized when used here [-Werror,-Wuninitialized]
    810 |                 total += hole_size;
        |                 ^~~~~
  drivers/gpu/drm/xe/xe_ggtt.c:798:11: note: initialize the variable 'total' to silence this warning
    798 |         u64 total;
        |                  ^
        |                   = 0
  1 error generated.

Move the zero initialization of total from
xe_gt_sriov_pf_config_print_available_ggtt() to xe_ggtt_print_holes() to
resolve the warning.
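
The pattern behind the fix can be sketched in standalone C. This is an assumption-laden illustration, not the driver's code: sum_holes and its parameters are hypothetical names; the only point taken from the commit is that the accumulator is zero-initialized in the function that does the accumulating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a function that walks holes and adds up
 * their sizes. The accumulator is initialized here, where "+=" is
 * applied, rather than relying on a caller to have zeroed it.
 */
static uint64_t sum_holes(const uint64_t *hole_sizes, size_t count)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < count; i++)
		total += hole_sizes[i];

	return total;
}
```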

Fixes: 136367290ea5 ("drm/xe: Introduce xe_ggtt_print_holes")
Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20240823-drm-xe-fix-total-in-xe_ggtt_print_holes-v1-1-12b02d079327@kernel.org
Signed-off-by: Lucas De Marchi <[email protected]>
6 months agodrm/xe/display: handle HPD polling in display runtime suspend/resume
Vinod Govindapillai [Fri, 23 Aug 2024 11:21:48 +0000 (14:21 +0300)]
drm/xe/display: handle HPD polling in display runtime suspend/resume

In XE, display runtime suspend / resume routines are called only
if d3cold is allowed. This makes the driver unable to detect any
HPDs once the device goes into runtime suspend state in platforms
like LNL. Update the display runtime suspend / resume routines
to include HPD polling regardless of d3cold status.

While xe_display_pm_suspend/resume() performs steps during runtime
suspend/resume that shouldn't happen, like suspending MST, and is
missing other steps like enabling DC9, this patchset is meant
to keep the current behavior wrt. these, leaving the corresponding
updates for a follow-up.

v2: have a separate function for display runtime s/r (Rodrigo)

v3: better streamlining of system s/r and runtime s/r calls (Imre)

v4: rebased

Reviewed-by: Arun R Murthy <[email protected]>
Signed-off-by: Vinod Govindapillai <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: Handle polling only for system s/r in xe_display_pm_suspend/resume()
Imre Deak [Fri, 23 Aug 2024 11:21:47 +0000 (14:21 +0300)]
drm/xe: Handle polling only for system s/r in xe_display_pm_suspend/resume()

This is a preparation for the follow-up patch where polling
will be handled properly for all cases during runtime suspend/resume.

v2: rebased

Reviewed-by: Arun R Murthy <[email protected]>
Signed-off-by: Imre Deak <[email protected]>
Signed-off-by: Vinod Govindapillai <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: Suspend/resume user access only during system s/r
Imre Deak [Fri, 23 Aug 2024 11:21:46 +0000 (14:21 +0300)]
drm/xe: Suspend/resume user access only during system s/r

Enable/Disable user access only during system suspend/resume.
This should not happen during runtime s/r.

v2: rebased

Reviewed-by: Arun R Murthy <[email protected]>
Signed-off-by: Imre Deak <[email protected]>
Signed-off-by: Vinod Govindapillai <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: Update xe_sa to use xe_managed_bo_create_pin_map
Matthew Brost [Tue, 20 Aug 2024 17:29:58 +0000 (10:29 -0700)]
drm/xe: Update xe_sa to use xe_managed_bo_create_pin_map

Preferred way to create kernel BOs is xe_managed_bo_create_pin_map, use
it.

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: Move hw_engine_fini to devm managed
Matthew Brost [Tue, 20 Aug 2024 17:29:56 +0000 (10:29 -0700)]
drm/xe: Move hw_engine_fini to devm managed

Kernel BOs are destroyed with GGTT mappings, this is hardware
interaction so use devm.

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: Drop warn on xe_guc_pc_gucrc_disable in guc pc fini
Matthew Brost [Tue, 20 Aug 2024 17:29:55 +0000 (10:29 -0700)]
drm/xe: Drop warn on xe_guc_pc_gucrc_disable in guc pc fini

Not a big deal if CT is down as driver is unloading, no need to warn.

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Jagmeet Randhawa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: Set firmware state to loadable before registering guc_fini_hw
Matthew Brost [Tue, 20 Aug 2024 17:29:54 +0000 (10:29 -0700)]
drm/xe: Set firmware state to loadable before registering guc_fini_hw

The registered guc_fini_hw calls __xe_uc_fw_status, which is only
expected to be called after initializing the fw state. Move that
initialization before registering guc_fini_hw.

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: Move ggtt_fini to devm managed
Matthew Brost [Tue, 20 Aug 2024 17:29:53 +0000 (10:29 -0700)]
drm/xe: Move ggtt_fini to devm managed

ggtt->scratch is destroyed via devm, and ggtt_fini sets ggtt->scratch to
NULL. Since ggtt->scratch is used in GGTT clears, ensure it is set to NULL
before the BO is destroyed.

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agoRevert "drm/xe: Invalidate media_gt TLBs in PT code"
Matthew Brost [Fri, 23 Aug 2024 16:22:07 +0000 (09:22 -0700)]
Revert "drm/xe: Invalidate media_gt TLBs in PT code"

This reverts commit 40520283e0fd11237ed9dfc0991503b3403d5fa4.

We can't install dma-fence-chain in timeline sync objs.

Signed-off-by: Matthew Brost <[email protected]>
Acked-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: Fix missing runtime outer protection for ggtt_remove_node
Rodrigo Vivi [Wed, 21 Aug 2024 19:38:42 +0000 (15:38 -0400)]
drm/xe: Fix missing runtime outer protection for ggtt_remove_node

Defer the ggtt node removal to a thread if runtime_pm is not active.

The ggtt node removal can be called from multiple places, including
places where we cannot protect with outer callers and places we are
within other locks. So, try to grab the runtime reference if the
device is already active, otherwise defer the removal to a separate
thread from where we are sure we can wake the device up.

v2: - use xe wq instead of system wq (Matt and CI)
    - Avoid GFP_KERNEL to be future proof since this removal can
    be called from outside our drivers and we don't want to block
    if atomic is needed. (Brost)
v3: amend forgot chunk declaring xe_device.
v4: Use a xe_ggtt_region to encapsulate the node and removal info,
    without the need for any memory allocation at runtime.
v5: Actually fill the delayed_removal.invalidate (Brost)
v6: - Ensure that ggtt_region is not freed before work finishes (Auld)
    - Own wq to ensure that the queued works are flushed before
      ggtt_fini (Brost)
v7: also free ggtt_region on early !bound return (Auld)
v8: Address the null deref (CI)
v9: Based on the new xe_ggtt_node for the proper care of the lifetime
    of the object.
v10: Redo the lost v5 change. (Brost)
v11: Simplify the invalidate_on_remove (Lucas)

Cc: Matthew Auld <[email protected]>
Cc: Paulo Zanoni <[email protected]>
Cc: Francois Dugast <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: Make xe_ggtt_node struct independent
Rodrigo Vivi [Wed, 21 Aug 2024 19:38:41 +0000 (15:38 -0400)]
drm/xe: Make xe_ggtt_node struct independent

In some rare cases, the drm_mm node cannot be removed synchronously
due to runtime PM conditions. In this situation, the node removal will
be delegated to a workqueue that will be able to wake up the device
before removing the node.

However, in this situation, the lifetime of the xe_ggtt_node cannot
be restricted to the lifetime of the parent object. So, this patch
introduces the infrastructure so the xe_ggtt_node struct can be
allocated in advance and freed when needed.

By having the ggtt backpointer, it also ensures that the init function
is always called before any attempt to insert or reserve the node
in the GGTT.

v2: s/xe_ggtt_node_force_fini/xe_ggtt_node_fini and use it
    internally (Brost)
v3: - Use GFP_NOFS for node allocation (CI)
    - Avoid ggtt argument, now that we have it inside the node (Lucas)
    - Fix some missed fini cases (CI)
v4: - Fix SRIOV critical case where config->ggtt_region was
      lost (Michal)
    - Avoid ggtt argument also on removal (missed case on v3) (Michal)
    - Remove useless checks (Michal)
    - Return 0 instead of negative errno on a u32 addr. (Michal)
    - s/xe_ggtt_assign/xe_ggtt_node_assign for coherence, while we
      are touching it (Michal)
v5: - Fix VFs' ggtt_balloon

Cc: Matthew Auld <[email protected]>
Cc: Michal Wajdeczko <[email protected]>
Cc: Matthew Brost <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: Refactor xe_ggtt balloon functions to make the node clear
Rodrigo Vivi [Wed, 21 Aug 2024 19:38:40 +0000 (15:38 -0400)]
drm/xe: Refactor xe_ggtt balloon functions to make the node clear

These operations are related to node. Convert them to the
new appropriate name space xe_ggtt_node.

v2: Also move arguments around for consistency (Lucas).
v3: s/node_balloon/node_insert_balloon and
    s/node_deballoon/node_remove_balloon (Michal).

Reviewed-by: Lucas De Marchi <[email protected]>
Cc: Michal Wajdeczko <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: Introduce xe_ggtt_print_holes
Rodrigo Vivi [Wed, 21 Aug 2024 19:38:39 +0000 (15:38 -0400)]
drm/xe: Introduce xe_ggtt_print_holes

Introduce a new xe_ggtt_print_holes helper that attends the SRIOV
demand and finishes the goal of limiting drm_mm access to xe_ggtt.

Cc: Michal Wajdeczko <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: Introduce xe_ggtt_largest_hole
Rodrigo Vivi [Wed, 21 Aug 2024 19:38:38 +0000 (15:38 -0400)]
drm/xe: Introduce xe_ggtt_largest_hole

Introduce a new xe_ggtt_largest_hole helper that attends the SRIOV
demand and continues with the goal of limiting drm_mm access to xe_ggtt.

v2: Fix a typo (Michal)

Cc: Michal Wajdeczko <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: Limit drm_mm_node_allocated access to xe_ggtt_node
Rodrigo Vivi [Wed, 21 Aug 2024 19:38:37 +0000 (15:38 -0400)]
drm/xe: Limit drm_mm_node_allocated access to xe_ggtt_node

Continue with the encapsulation of drm_mm_node inside xe_ggtt.

Cc: Michal Wajdeczko <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: Rename xe_ggtt_node related functions
Rodrigo Vivi [Wed, 21 Aug 2024 19:38:36 +0000 (15:38 -0400)]
drm/xe: Rename xe_ggtt_node related functions

Bring some consistency and prepare for more xe_ggtt_node related
functions to be introduced.

Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: Encapsulate drm_mm_node inside xe_ggtt_node
Rodrigo Vivi [Wed, 21 Aug 2024 19:38:35 +0000 (15:38 -0400)]
drm/xe: Encapsulate drm_mm_node inside xe_ggtt_node

The xe_ggtt component uses drm_mm to manage the GGTT.
The drm_mm_node is just a node inside drm_mm, but in Xe we use that
only in the GGTT context. So, this patch encapsulates the drm_mm_node
into a xe_ggtt's new struct.

This is the first step towards limiting all the drm_mm access
through xe_ggtt. The ultimate goal is to have a better control of
the node insertion and removal, so the removal can be delegated
to a delayed workqueue.

v2: Fix includes and typos (Michal and Brost)

Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/{i915, xe}: Avoid direct inspection of dpt_vma from outside dpt
Rodrigo Vivi [Wed, 21 Aug 2024 19:38:34 +0000 (15:38 -0400)]
drm/{i915, xe}: Avoid direct inspection of dpt_vma from outside dpt

DPT code is heavily dependent on the i915 vma implementation and is not
yet ported to Xe.

This patch limits inspection to DPT's VMA struct to intel_dpt
component only, so the Xe GGTT code can evolve.

Cc: Matthew Brost <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Juha-Pekka Heikkila <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: Remove unnecessary drm_mm.h includes
Rodrigo Vivi [Wed, 21 Aug 2024 19:38:33 +0000 (15:38 -0400)]
drm/xe: Remove unnecessary drm_mm.h includes

These includes are no longer necessary, and where appropriate
are replaced by the linux/types.h one.

Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: Introduce GGTT documentation
Rodrigo Vivi [Wed, 21 Aug 2024 19:38:32 +0000 (15:38 -0400)]
drm/xe: Introduce GGTT documentation

Document xe_ggtt and ensure it is part of the built kernel docs.

v2: - Accepted all Michal's suggestions
    - Rebased on top of new set_pte per platform/wa function pointer
v3: - Typos and other acronym fixes (Michal)

Cc: Matthew Brost <[email protected]>
Cc: Michal Wajdeczko <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]> #v1
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: Removed unused xe_ggtt_printk
Rodrigo Vivi [Wed, 21 Aug 2024 19:38:31 +0000 (15:38 -0400)]
drm/xe: Removed unused xe_ggtt_printk

Apparently this was only useful when enabling ggtt support
for the very first time and never used again.
It is also not useful now that we have the ggtt_dump available
through debugfs.

Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
6 months agodrm/xe: fixup xe_alloc_pf_queue
Matthew Auld [Wed, 21 Aug 2024 17:19:18 +0000 (18:19 +0100)]
drm/xe: fixup xe_alloc_pf_queue

kzalloc expects number of bytes, therefore we should convert the number
of dw into bytes, otherwise we are likely just accessing beyond the
array causing all kinds of carnage. Also fixup the error handling while
we are here.

v2:
 - Prefer kcalloc (dim)
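
The sizing issue can be shown with a small userspace analogue. This is a sketch under stated assumptions: calloc() stands in for the kernel's kcalloc(), and alloc_dwords and dwords_to_bytes are hypothetical names, not functions from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Number of bytes needed to hold num_dw 32-bit dwords. */
static size_t dwords_to_bytes(size_t num_dw)
{
	return num_dw * sizeof(uint32_t);
}

/*
 * Hypothetical allocator for an array of dwords. Passing the element
 * count and the element size separately, as calloc() does here and
 * kcalloc() does in the kernel, avoids passing a dword count where a
 * byte count is expected.
 */
static uint32_t *alloc_dwords(size_t num_dw)
{
	return calloc(num_dw, sizeof(uint32_t));
}
```

With a byte-count allocator, passing num_dw directly would reserve only a quarter of the space that num_dw dwords occupy, which is the overrun the commit message describes.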

Fixes: 3338e4f90c14 ("drm/xe: Use topology to determine page fault queue size")
Signed-off-by: Matthew Auld <[email protected]>
Cc: Stuart Summers <[email protected]>
Cc: Matthew Brost <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: Invalidate media_gt TLBs in PT code
Matthew Brost [Tue, 20 Aug 2024 16:16:32 +0000 (09:16 -0700)]
drm/xe: Invalidate media_gt TLBs in PT code

Testing on LNL has shown media GT's TLBs need to be invalidated via the
GuC, update PT code appropriately.

v2:
 - Do dma_fence_get before first call of invalidation_fence_init (Himal)
 - No need to check for valid chain fence (Himal)

Fixes: 3330361543fc ("drm/xe/lnl: Add LNL platform definition")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: Invalidate media_gt TLBs
Matthew Brost [Tue, 20 Aug 2024 16:01:29 +0000 (09:01 -0700)]
drm/xe: Invalidate media_gt TLBs

Testing on LNL has shown media TLBs need to be invalidated via the GuC,
update xe_vm_invalidate_vma appropriately.

v2: Fix 2 tile case
v3: Include missing local change

Fixes: 3330361543fc ("drm/xe/lnl: Add LNL platform definition")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: Free job before xe_exec_queue_put
Matthew Brost [Tue, 20 Aug 2024 20:23:09 +0000 (13:23 -0700)]
drm/xe: Free job before xe_exec_queue_put

Free job depends on job->vm being valid, the last xe_exec_queue_put can
destroy the VM. Prevent UAF by freeing job before xe_exec_queue_put.
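
The ordering can be sketched with plain C objects. This is an illustrative userspace model only: struct vm, struct exec_queue, struct job and the three functions are hypothetical simplifications, not the driver's types or helpers.

```c
#include <assert.h>
#include <stdlib.h>

struct vm { int id; };
struct exec_queue { int refcount; struct vm *vm; };
struct job { struct exec_queue *q; struct vm *vm; };

/* Dropping the last queue reference also destroys the VM it owns. */
static void exec_queue_put(struct exec_queue *q)
{
	if (--q->refcount == 0) {
		free(q->vm);
		free(q);
	}
}

/* Freeing a job dereferences job->vm, so the VM must still be alive. */
static int job_free(struct job *job)
{
	int id = job->vm->id;

	free(job);
	return id;
}

/* Free the job first, then drop the queue reference. */
static int job_release(struct job *job)
{
	struct exec_queue *q = job->q; /* saved before the job is freed */
	int id = job_free(job);

	exec_queue_put(q);
	return id;
}
```

Reversing the two calls in job_release() would let exec_queue_put() free the VM before job_free() reads it, which is the use-after-free described above.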

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Reviewed-by: Jagmeet Randhawa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe: Drop HW fence pointer to HW fence ctx
Matthew Brost [Thu, 15 Aug 2024 19:35:22 +0000 (12:35 -0700)]
drm/xe: Drop HW fence pointer to HW fence ctx

The HW fence ctx objects are not ref counted; rather, they are tied to the
life of an LRC object. HW fences reference the HW fence ctx, and HW fences
can outlive LRCs, thus resulting in UAF. Drop the HW fence pointer to the
HW fence ctx and instead store what is needed directly in the HW fence.

v2:
 - Fix typo in commit (Ashutosh)
 - Use snprintf (Ashutosh)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Ashutosh Dixit <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months agodrm/xe/guc: Bump the G2H queue size to account for page faults
Stuart Summers [Sat, 17 Aug 2024 02:47:32 +0000 (02:47 +0000)]
drm/xe/guc: Bump the G2H queue size to account for page faults

With the increase in the size of the recoverable page fault
queue, we want to ensure the initial messages from GuC in
the G2H buffer have space while we transfer those out to the
actual pf_queue. Bump the G2H queue size to account for this
increase in the pf_queue size.

Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Stuart Summers <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/4c2b6974801bcffd8a010d838c8733fa4092573d.1723862633.git.stuart.summers@intel.com
6 months agodrm/xe: Use topology to determine page fault queue size
Stuart Summers [Sat, 17 Aug 2024 02:47:31 +0000 (02:47 +0000)]
drm/xe: Use topology to determine page fault queue size

Currently the page fault queue size is hard coded. However
the hardware supports faulting for each EU and each CS.
For some applications running on hardware with a large
number of EUs and CSs, this can result in an overflow of
the page fault queue.

Add a small calculation to determine the page fault queue
size based on the number of EUs and CSs in the platform as
determined by fuses.

Signed-off-by: Stuart Summers <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/24d582a3b48c97793b8b6a402f34b4b469471636.1723862633.git.stuart.summers@intel.com
6 months agodrm/xe: Fix missing workqueue destroy in xe_gt_pagefault
Stuart Summers [Sat, 17 Aug 2024 02:47:30 +0000 (02:47 +0000)]
drm/xe: Fix missing workqueue destroy in xe_gt_pagefault

On driver reload we never free up the memory for the pagefault and
access counter workqueues. Add those destroy calls here.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Stuart Summers <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/c9a951505271dc3a7aee76de7656679f69c11518.1723862633.git.stuart.summers@intel.com
6 months ago drm/xe/lnl: Offload system clear page activity to GPU
Nirmoy Das [Fri, 16 Aug 2024 13:51:54 +0000 (15:51 +0200)]
drm/xe/lnl: Offload system clear page activity to GPU

On LNL, because of flat CCS, the driver creates a migrate job to clear
CCS metadata. Extend that to also clear system pages using the GPU.
Inform TTM to allocate pages without __GFP_ZERO, avoiding double page
clearing, by clearing the TTM_TT_FLAG_ZERO_ALLOC flag, and set
TTM_TT_FLAG_CLEARED_ON_FREE while freeing to skip the TTM pool's clear
on free, as Xe now takes care of clearing pages. If a BO is in system
placement, such as a BO created with DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING,
and there is a CPU map, then the GPU clear is avoided for that BO, as
there is no DMA mapping for it at that moment with which to create
migration jobs.
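
The flag handling described above can be sketched roughly as follows.
This is a userspace model, not driver code: the struct, the function
names and the bit values are placeholders, and the real TTM_TT_FLAG_*
definitions live in the TTM headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values, chosen only for this sketch. */
#define SKETCH_FLAG_ZERO_ALLOC      (1u << 0)
#define SKETCH_FLAG_CLEARED_ON_FREE (1u << 1)

/* Stand-in for the object that carries the page flags. */
struct sketch_tt {
	uint32_t page_flags;
};

/* Before populating: skip CPU zeroing when the GPU will clear the pages. */
void sketch_prepare_alloc(struct sketch_tt *tt, bool gpu_clears_sys_pages)
{
	assert(tt != NULL);
	if (gpu_clears_sys_pages)
		tt->page_flags &= ~SKETCH_FLAG_ZERO_ALLOC;
}

/* Before releasing: tell the pool it need not clear the pages again. */
void sketch_prepare_free(struct sketch_tt *tt, bool gpu_clears_sys_pages)
{
	assert(tt != NULL);
	if (gpu_clears_sys_pages)
		tt->page_flags |= SKETCH_FLAG_CLEARED_ON_FREE;
}
```

When the GPU clear is not in use (for example the deferred-backing case
mentioned above), both helpers leave the flags untouched, so the default
CPU clearing still applies.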

Tested this patch with api_overhead_benchmark_l0 from
https://github.com/intel/compute-benchmarks

Without the patch:
api_overhead_benchmark_l0 --testFilter=UsmMemoryAllocation:
UsmMemoryAllocation(api=l0 type=Host size=4KB) 84.206 us
UsmMemoryAllocation(api=l0 type=Host size=1GB) 105775.56 us
Perf tool top 5 entries:
71.44% api_overhead_be  [kernel.kallsyms]   [k] clear_page_erms
6.34%  api_overhead_be  [kernel.kallsyms]   [k] __pageblock_pfn_to_page
2.24%  api_overhead_be  [kernel.kallsyms]   [k] cpa_flush
2.15%  api_overhead_be  [kernel.kallsyms]   [k] pages_are_mergeable
1.94%  api_overhead_be  [kernel.kallsyms]   [k] find_next_iomem_res

With the patch:
api_overhead_benchmark_l0 --testFilter=UsmMemoryAllocation:
UsmMemoryAllocation(api=l0 type=Host size=4KB) 79.439 us
UsmMemoryAllocation(api=l0 type=Host size=1GB) 98677.75 us
Perf tool top 5 entries:
11.16% api_overhead_be  [kernel.kallsyms]   [k] __pageblock_pfn_to_page
7.85%  api_overhead_be  [kernel.kallsyms]   [k] cpa_flush
7.59%  api_overhead_be  [kernel.kallsyms]   [k] find_next_iomem_res
7.24%  api_overhead_be  [kernel.kallsyms]   [k] pages_are_mergeable
5.53%  api_overhead_be  [kernel.kallsyms]   [k] lookup_address_in_pgd_attr

Without this patch, clear_page_erms() dominates execution time and is
not pipelined with migration jobs. With this patch, page clearing is
pipelined with the migration job, which frees the CPU for other work.
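
For reference, the relative change in the latencies quoted above can be
worked out with a small helper. The function name is hypothetical; the
inputs are the figures from the benchmark output in this message.

```c
#include <assert.h>

/* Relative reduction of after_us versus before_us, in percent. */
double sketch_percent_reduction(double before_us, double after_us)
{
	assert(before_us > 0.0);
	return (before_us - after_us) / before_us * 100.0;
}
```

Applied to the numbers above, the 4KB host allocation goes from 84.206 us
to 79.439 us (about 5.7% lower) and the 1GB host allocation from
105775.56 us to 98677.75 us (about 6.7% lower).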

v2: Handle regression on dgfx (Himal).
    Update commit message, as no TTM API changes are needed.
v3: Fix KUnit test.
v4: Handle data leak on CPU mmap (Thomas).
v5: s/gpu_page_clear/gpu_page_clear_sys and move setting
    it to xe_ttm_sys_mgr_init(), plus other nits (Matt Auld).
v6: Disable it when init_on_alloc and/or init_on_free is active (Matt).
    Use compute-benchmarks, since the reporter used it to report this
    allocation latency issue and it is a more suitable test application
    than mime. In v5, the test showed a significant reduction in alloc
    latency, but that is no longer the case; I think this was mostly
    because the previous test was done on an IFWI which had low memory
    bandwidth from the CPU.

Cc: Himal Prasad Ghimiray <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
6 months ago drm/ttm: Add a flag to allow drivers to skip clear-on-free
Nirmoy Das [Fri, 16 Aug 2024 13:51:53 +0000 (15:51 +0200)]
drm/ttm: Add a flag to allow drivers to skip clear-on-free

Add TTM_TT_FLAG_CLEARED_ON_FREE, which DRM drivers can set before
releasing backing stores if they want to skip clear-on-free.

Cc: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Suggested-by: Christian König <[email protected]>
Reviewed-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
6 months ago drm/xe/oa: Use vma_pages() helper function in xe_oa_mmap()
Thorsten Blum [Mon, 19 Aug 2024 09:57:52 +0000 (11:57 +0200)]
drm/xe/oa: Use vma_pages() helper function in xe_oa_mmap()

Use the vma_pages() helper function and remove the following
Coccinelle/coccicheck warning reported by vma_pages.cocci:

  WARNING: Consider using vma_pages helper on vma
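
As an illustration of the kind of change involved, the helper wraps the
page-count arithmetic that would otherwise be open coded. The struct and
names below are userspace stand-ins, and a 4 KiB page size is assumed.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Minimal stand-in for the two mapping fields involved. */
struct sketch_vma {
	unsigned long vm_start;	/* first byte of the mapping */
	unsigned long vm_end;	/* first byte after the mapping */
};

/* Open-coded form of the kind the Coccinelle script flags. */
unsigned long sketch_pages_open_coded(const struct sketch_vma *vma)
{
	return (vma->vm_end - vma->vm_start) >> SKETCH_PAGE_SHIFT;
}

/* Helper form: the same arithmetic behind a descriptive name. */
unsigned long sketch_vma_pages(const struct sketch_vma *vma)
{
	assert(vma->vm_end >= vma->vm_start);
	return (vma->vm_end - vma->vm_start) >> SKETCH_PAGE_SHIFT;
}
```

Both forms return the same value; the helper only makes the intent
easier to read at the call site.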

Reviewed-by: Ashutosh Dixit <[email protected]>
Signed-off-by: Thorsten Blum <[email protected]>
Signed-off-by: Ashutosh Dixit <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe/display: Make display suspend/resume work on discrete
Maarten Lankhorst [Tue, 6 Aug 2024 10:50:44 +0000 (12:50 +0200)]
drm/xe/display: Make display suspend/resume work on discrete

We should unpin before evicting all memory, and repin after GT resume.
This way, we preserve the contents of the framebuffers, and won't hang
on resume due to migration engine not being restored yet.

Signed-off-by: Maarten Lankhorst <[email protected]>
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: [email protected] # v6.8+
Reviewed-by: Uma Shankar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Maarten Lankhorst <[email protected]>
6 months ago drm/xe/display: Match i915 driver suspend/resume sequences better
Maarten Lankhorst [Tue, 6 Aug 2024 10:50:43 +0000 (12:50 +0200)]
drm/xe/display: Match i915 driver suspend/resume sequences better

Suspend fbdev sooner, and disable user access before suspending to
prevent some races. I've noticed this when comparing xe suspend to
i915's.

Matches the following commits from i915:
24b412b1bfeb ("drm/i915: Disable intel HPD poll after DRM poll init/enable")
1ef28d86bea9 ("drm/i915: Suspend the framebuffer console earlier during system suspend")
bd738d859e71 ("drm/i915: Prevent modesets during driver init/shutdown")

Thanks to Imre for pointing me to those commits.

Driver shutdown is currently missing, but I have some idea how to
implement it next.

Signed-off-by: Maarten Lankhorst <[email protected]>
Cc: Imre Deak <[email protected]>
Reviewed-by: Uma Shankar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Maarten Lankhorst <[email protected]>
6 months ago drm/xe: prevent UAF around preempt fence
Matthew Auld [Wed, 14 Aug 2024 11:01:30 +0000 (12:01 +0100)]
drm/xe: prevent UAF around preempt fence

The fence lock is part of the queue, therefore in the current design
anything locking the fence should then also hold a ref to the queue to
prevent the queue from being freed.

However, it currently looks like we signal the fence and then drop the
queue ref. If something is waiting on the fence, the waiter is kicked
to wake up at some later point, whereupon it first grabs the lock
before checking the fence state. But if we have already dropped the
queue ref, then the lock might already have been freed as part of the
queue, leading to a UAF.

To prevent this, move the fence lock into the fence itself so we don't
run into lifetime issues. An alternative might be to have a device-level
lock, or to only release the queue in the fence release callback;
however, that might require pushing to another worker to avoid locking
issues.
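
The lifetime argument can be modelled in a few lines of userspace C. This
is a toy sketch with made-up types and a boolean standing in for a real
lock; it only shows that a lock embedded in the fence stays valid after
the queue reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_lock { bool held; };	/* toy stand-in for a spinlock */

struct sketch_queue { int refcount; };

struct sketch_fence {
	struct sketch_lock lock;	/* embedded: lives as long as the fence */
	struct sketch_queue *q;		/* the fence's reference on the queue */
	bool signaled;
	int refcount;
};

void sketch_queue_put(struct sketch_queue *q)
{
	if (--q->refcount == 0)
		free(q);
}

struct sketch_fence *sketch_fence_create(struct sketch_queue *q)
{
	struct sketch_fence *f = calloc(1, sizeof(*f));

	if (!f)
		return NULL;
	q->refcount++;
	f->q = q;
	f->refcount = 1;
	return f;
}

/* Signal, then drop the queue reference; the queue may be freed here. */
void sketch_fence_signal(struct sketch_fence *f)
{
	f->lock.held = true;
	f->signaled = true;
	f->lock.held = false;
	sketch_queue_put(f->q);
	f->q = NULL;
}

/* Waiter path: takes the lock first, touching only fence memory. */
bool sketch_fence_is_signaled(struct sketch_fence *f)
{
	bool s;

	assert(f->refcount > 0);
	f->lock.held = true;
	s = f->signaled;
	f->lock.held = false;
	return s;
}

void sketch_fence_put(struct sketch_fence *f)
{
	if (--f->refcount == 0)
		free(f);
}
```

Had the lock been a member of sketch_queue instead, the waiter path would
dereference memory that sketch_fence_signal() may already have freed.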

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2454
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2342
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2020
Signed-off-by: Matthew Auld <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: <[email protected]> # v6.8+
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 months ago drm/xe: Remove redundant param from xe_bo_create_user
Nirmoy Das [Fri, 16 Aug 2024 10:22:48 +0000 (12:22 +0200)]
drm/xe: Remove redundant param from xe_bo_create_user

A BO from xe_bo_create_user() will always be of type
ttm_bo_type_device, so remove that redundant parameter.

Cc: Matthew Auld <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>
7 months ago drm/xe/device: Remove unused xe_device::usm::num_vm_in_*
Francois Dugast [Fri, 9 Aug 2024 15:51:36 +0000 (17:51 +0200)]
drm/xe/device: Remove unused xe_device::usm::num_vm_in_*

Those counters were used to keep track of the number of VMs in fault mode
and in non-fault mode, to determine if the whole device was in fault mode
or not. This is no longer needed so remove those variables and their
usages.

Signed-off-by: Francois Dugast <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
7 months ago drm/xe/vm: Remove restriction that all VMs must be faulting if one is
Francois Dugast [Fri, 9 Aug 2024 15:51:35 +0000 (17:51 +0200)]
drm/xe/vm: Remove restriction that all VMs must be faulting if one is

With this restriction, all VMs on the device must be faulting VMs if there
is already one faulting VM, in which case the device is considered in
fault mode. This prevents for example an application from running 3D jobs
for the compositor while submitting a SVM compute job on the same device.

Now that mutual exclusion of faulting LR jobs and dma fence jobs is
ensured on the hw engine group, remove this restriction to allow running
faulting and non-faulting VMs on the same device.

Signed-off-by: Francois Dugast <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
7 months ago drm/xe/exec: Switch hw engine group execution mode upon job submission
Francois Dugast [Fri, 9 Aug 2024 15:51:34 +0000 (17:51 +0200)]
drm/xe/exec: Switch hw engine group execution mode upon job submission

If the job about to be submitted is a dma-fence job, update the current
execution mode of the hw engine group. This triggers an immediate suspend
of the exec queues running faulting long-running jobs.

If the job about to be submitted is a long-running job, kick a new worker
used to resume the exec queues running faulting long-running jobs once
the dma-fence jobs have completed.

v2: Kick the resume worker from exec IOCTL, switch to unordered workqueue,
    destroy it after use (Matt Brost)

v3: Do not resume if no exec queue was suspended (Matt Brost)

v4: Squash commits (Matt Brost)

v5: Do not kick the worker when xe_vm_in_preempt_fence_mode (Matt Brost)

Signed-off-by: Francois Dugast <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
7 months ago drm/xe/hw_engine_group: Ensure safe transition between execution modes
Francois Dugast [Fri, 9 Aug 2024 15:51:33 +0000 (17:51 +0200)]
drm/xe/hw_engine_group: Ensure safe transition between execution modes

Provide a way to safely transition execution modes of the hw engine group
ahead of the actual execution. When necessary, either wait for running
jobs to complete or preempt them, thus ensuring mutual exclusion between
execution modes.

Unlike a mutex, the rw_semaphore used in this context allows multiple
submissions in the same mode.
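
The reader/writer idea can be modelled single-threaded as below. This is
only a sketch under stated assumptions: the types and function names are
invented, counters stand in for the rw_semaphore, and a real semaphore
would block where this model returns false.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_exec_mode { SKETCH_MODE_LR, SKETCH_MODE_DMA_FENCE };

struct sketch_group {
	enum sketch_exec_mode cur_mode;
	int readers;		/* submissions currently holding the mode */
	int mode_switches;	/* number of exclusive transitions so far */
};

/*
 * Enter the group in the requested mode. Same-mode submissions share
 * access (reader side). A different mode needs exclusive access (writer
 * side), which is only possible once no submission holds the group.
 */
bool sketch_group_get_mode(struct sketch_group *g, enum sketch_exec_mode mode)
{
	if (g->cur_mode != mode) {
		if (g->readers > 0)
			return false;	/* a real rw_semaphore would block */
		/* Exclusive section: wait for or preempt the other mode. */
		g->cur_mode = mode;
		g->mode_switches++;
	}
	g->readers++;
	return true;
}

/* Leave the group once the submission is done. */
void sketch_group_put(struct sketch_group *g)
{
	assert(g->readers > 0);
	g->readers--;
}
```

Two submissions in the same mode both succeed without a transition,
whereas a mutex would serialize them; a submission in the other mode has
to wait until both have left.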

v2: Use lockdep_assert_held_write, add annotations (Matt Brost)

v3: Fix kernel doc, remove redundant code (Matt Brost)

v4: Now that xe_hw_engine_group_suspend_faulting_lr_jobs can fail,
    propagate the error to the caller (Matt Brost)

Signed-off-by: Francois Dugast <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
7 months ago drm/xe/hw_engine_group: Add helper to wait for dma fence jobs
Francois Dugast [Fri, 9 Aug 2024 15:51:32 +0000 (17:51 +0200)]
drm/xe/hw_engine_group: Add helper to wait for dma fence jobs

This is a required feature for faulting long running jobs not to be
submitted while dma fence jobs are running on the hw engine group.

v2: Switch to lockdep_assert_held_write in worker, get a proper reference
    for the last fence (Matt Brost)

v3: Directly call dma_fence_put with the fence ref (Matt Brost)

Signed-off-by: Francois Dugast <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
7 months ago drm/xe/exec_queue: Prepare last fence for hw engine group resume context
Francois Dugast [Fri, 9 Aug 2024 15:51:31 +0000 (17:51 +0200)]
drm/xe/exec_queue: Prepare last fence for hw engine group resume context

Ensure we can safely take a ref of the exec queue's last fence from the
context of resuming jobs from the hw engine group. The locking requirements
differ from the general case, hence the introduction of this new function.

v2: Add kernel doc, rework the code to prevent code duplication

v3: Fix kernel doc, remove now unnecessary lockdep variants (Matt Brost)

v4: Remove new put function (Matt Brost)

Signed-off-by: Francois Dugast <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
7 months ago drm/xe/exec_queue: Remove duplicated code
Francois Dugast [Fri, 9 Aug 2024 15:51:30 +0000 (17:51 +0200)]
drm/xe/exec_queue: Remove duplicated code

This code section is the same as the body of
xe_exec_queue_last_fence_put_unlocked() so call the function instead and
remove duplicated code to make maintenance easier.

Signed-off-by: Francois Dugast <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
7 months ago drm/xe/hw_engine_group: Add helper to suspend faulting LR jobs
Francois Dugast [Fri, 9 Aug 2024 15:51:29 +0000 (17:51 +0200)]
drm/xe/hw_engine_group: Add helper to suspend faulting LR jobs

This is a required feature for dma fence jobs to preempt faulting long
running jobs in order to ensure mutual exclusion on a given hw engine
group.

v2: Pipeline calls to suspend(q) and suspend_wait(q) to improve
    efficiency, switch to lockdep_assert_held_write (Matt Brost)

v3: Return error on suspend_wait failure to propagate on the call stack
    up to IOCTL (Matt Brost)
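
The pipelining and error propagation mentioned in v2 and v3 might look
roughly like this. It is a userspace sketch with hypothetical types and
names: every queue is asked to suspend first, and only then is each one
waited on, so the suspends overlap instead of running back to back.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an exec queue running a faulting long-running job. */
struct sketch_lr_queue {
	int suspend_requested;	/* set once suspend has been kicked off */
	int wait_result;	/* what the wait reports: 0 or negative error */
};

/* Kick off the suspend and return immediately. */
void sketch_lr_suspend(struct sketch_lr_queue *q)
{
	q->suspend_requested = 1;
}

/* Wait for a previously requested suspend to finish. */
int sketch_lr_suspend_wait(struct sketch_lr_queue *q)
{
	assert(q->suspend_requested);
	return q->wait_result;
}

/* Request all suspends first, then wait; return the first error seen. */
int sketch_suspend_all(struct sketch_lr_queue *qs, size_t n)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++)
		sketch_lr_suspend(&qs[i]);

	for (i = 0; i < n; i++) {
		err = sketch_lr_suspend_wait(&qs[i]);
		if (err)
			return err;	/* propagate up the call stack */
	}
	return 0;
}
```

The caller can hand the returned error straight back to its own caller,
which is the propagation path v3 describes.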

Signed-off-by: Francois Dugast <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]