Git Repo - linux.git/log

drm/xe/guc: Allow CTB G2H processing without G2H IRQ

During early initialization, in the xe_guc_min_load_for_hwconfig()
function, we are successfully enabling CTB communication, but it
will only allow us to send non-blocking H2G messages, as due to
not yet enabled IRQs, including G2H IRQs, we will not notice any
new G2H message sent by the GuC, including replies to our blocking
H2G request messages. And those successful replies are mandatory
for the VF drivers to continue normal operations.

As attempt to workaround this driver initialization ordering issue,
introduce special safe-mode CTB worker, that will periodically
trigger G2H processing, like original IRQ handler, in case no
MSI/MSIX IRQs were enabled on the driver yet. Once we detect that
IRQ were enabled, we will stop this worker.

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Matthew Brost <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/guc: Split g2h worker function

In the next patch we will want to perform the same steps that
g2h worker function is doing but from the different worker.

Suggested-by: Matthew Brost <[email protected]>
Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: move disable_c6 call

disable c6 called in guc_pc_fini_hw is unreachable.

GuC PC init returns earlier if skip_guc_pc is true and never
registers the finish call thus making disable_c6 unreachable.

move this call to gt idle.

v2: rebase
v3: add fixes tag (Himal)

Fixes: 975e4a3795d4 ("drm/xe: Manually setup C6 when skip_guc_pc is set")
Signed-off-by: Riana Tauro <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe/xe_gt_idle: use GT forcewake domain assertion

The rc6 registers used in disable_c6 function belong
to the GT forcewake domain. Hence change the forcewake
assertion to check GT forcewake domain.

v2: add fixes tag (Himal)

Fixes: 975e4a3795d4 ("drm/xe: Manually setup C6 when skip_guc_pc is set")
Signed-off-by: Riana Tauro <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe/xe_gt_debugfs: Add synchronous gt reset debugfs

We currently have debugfs support that allows the userspace to initiate
an asynchronous gt reset on command. However, userspace may also wish
to wait for the completion of the gt reset before performing any
additional work. To that end, add a version of the force_reset gt
debugfs function that operates synchronously.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1068
Suggested-by: Matthew Brost <[email protected]>
Signed-off-by: Jonathan Cavitt <[email protected]>
CC: John Harrison <[email protected]>
CC: Stuart Summers <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Do not dereference NULL job->fence in trace points

job->fence is not assigned until xe_sched_job_arm(), check for
job->fence in xe_sched_job_seqno() so any usage of this function (trace
points) do not result in NULL ptr dereference. Also check job->fence
before assigning error in job trace points.

Fixes: 0ac7a2c745e8 ("drm/xe: Don't initialize fences at xe_sched_job_create()")
Cc: Thomas Hellström <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Custom GT restart

Only few steps from the GT restart phase are applicable for the
VF drivers, as initialization of PAT, WOPCM, MOCS or CCS mode can
be done only by the native or PF drivers. Use custom GT restart
function if running in VF mode.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Custom GuC reset

The VF drivers can't trigger real GuC firmware reset using GDRST
register, but for the VF drivers it is sufficient to send VF_RESET
message to reset any VF specific state maintained by the GuC.
Use our existing VF bootstrap function as VF_RESET is part of it.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Custom uC initialization

VF drivers can't modify WOPCM registers nor upload firmwares to
GuC, HuC or GSC. Modify xe_uc initialization functions to skip
those steps if running in the VF mode, or defer to a new custom
helper function that would not include those steps.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Support only GuC/HuC firmwares

Only GuC/HuC firmwares can be used by the VFs, don't claim support
for any other firmware (like GSC) even if the driver can support it.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Don't overmap identity VRAM mapping

Overmapping the identity VRAM mapping is triggering hardware bugs on
certain platforms. Use 2M pages for the last unaligned (to 1G) VRAM
chunk.

v2:
- Always use 2M pages for last chunk (Fei Yang)
- break loop when 2M pages are used
- Add assert for usable_size being 2M aligned
v3:
- Fix checkpatch

Cc: Maarten Lankhorst <[email protected]>
Cc: Fei Yang <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Fei Yang <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: flush engine buffers before signalling user fence on all engines

Tests show that user fence signalling requires kind of write barrier,
otherwise not all writes performed by the workload will be available
to userspace. It is already done for render and compute, we need it
also for the rest: video, gsc, copy.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Andrzej Hajda <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Revert "drm/xe: flush gtt before signalling user fence on all engines"

This reverts commit 38007fa96419a9db9719f170b9e8a7877821cdd1.

Signaling user-fence after seqno write does not seem to be good solution.
Instead of changing order separate barrier should be put before user-fence,
this will be done in separate patch.

v2: added fixes tag in case reverted patch gets backported to stable

Fixes: 38007fa96419 ("drm/xe: flush gtt before signalling user fence on all engines")
Signed-off-by: Andrzej Hajda <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vm: Simplify if condition

The if condition !A || A && B can be simplified to !A || B.

Fixes the following Coccinelle/coccicheck warning reported by
excluded_middle.cocci:

WARNING !A || A && B is equivalent to !A || B

Compile-tested only.

Signed-off-by: Thorsten Blum <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: drop redundant W=1 warnings from Makefile

Since commit a61ddb4393ad ("drm: enable (most) W=1 warnings by default
across the subsystem"), most of the extra warnings in the driver
Makefile are redundant. Remove them.

Note that -Wmissing-declarations and -Wmissing-prototypes are always
enabled by default in scripts/Makefile.extrawarn.

Acked-by: Lucas De Marchi <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/958a67adcbb64d3a387d2a07d83b05d71176e938.1716471145.git.jani.nikula@intel.com

drm/xe: Use missing lock in relay_needs_worker

Add missing lock that is protecting relay->incoming_actions.

Cc: Michal Wajdeczko <[email protected]>
Reviewed-by: Michal Wajdeczko <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>

drm/xe/xe2lpg: Add permanent wa_14020756599

For xe2_lpg render Wa_14020756599 is applied to
all steppings.

Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Signed-off-by: Tejas Upadhyay <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/xe2lpm: Add permanent Wa_14020756599

For xe2_lpm Wa_14020756599 is applied to all steppings and
when RCS is present on graphics GT.

V5(MattR):
  - Add more comments about new API
V4:
  - Make it part of lrc wa
  - Check for RCS as rtp rule
V3(MattR):
  - Rename rtp api name
  - Use MEDIA_VERx100
V2:
  - Remove engine filter video decode
  - Fix typo GRAPHICS/MEDIA/s - Himal

Reviewed-by: Matt Roper <[email protected]>
Signed-off-by: Tejas Upadhyay <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Add kernel-doc to some xe_lrc interfaces

Add kernel-doc to xe_lrc_create/destroy and xe_lrc_get/put
interfaces.

v2: Fix kernel-doc for xe_lrc_create(), drop Fixes tag.
(Matt Brost, Michal Wajdeczko)

Signed-off-by: Niranjana Vishwanathapura <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Fix NULL ptr dereference in devcoredump

Kernel VM do not have an Xe file. Include a check for Xe file in the VM
before trying to get pid from VM's Xe file when taking a devcoredump.

Fixes: b10d0c5e9df7 ("drm/xe: Add process name to devcoredump")
Cc: Rodrigo Vivi <[email protected]>
Cc: José Roberto de Souza <[email protected]>
Cc: [email protected]
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/pf: Update the LMTT when freeing VF GT config

The LMTT must be updated whenever we change the VF LMEM configuration.
We missed that step when freeing the whole VF GT config, which could
result in stale PTE in LMTT or LMTT PT object leaks. Fix that.

Fixes: ac6598aed1b3 ("drm/xe/pf: Add support to configure SR-IOV VFs")
Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Split MCR initialization

The initialization order of GT topology, MCR, PAT and GuC HWconfig
as done today by native/PF driver, can't be followed as-is by the
VF driver, since fuse registers used in GT topology discovery will
be obtained by the VF driver from the GuC in HWconfig step.

While native/PF drivers need to program the HW PAT table prior to
loading the GuC, this requires only multicast writes support from
the MCR code, which could be initialized separately from the full
MCR support that requires the GT topology to setup steering data.

Split MCR initialization into two steps to avoid introducing VF
specific code paths. This also fixes duplicated spin_lock inits.

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Michał Winiarski <[email protected]>
Cc: Zhanjun Dong <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Setup VRAM based on received config data

VF drivers will obtain VRAM configuration from the GuC as part of
the VF self config. Use that configuration instead of trying to
read inaccessible registers.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Promote VRAM initialization function to own file

There is no point in mixing register access and VRAM code in the
same file. Move and rename the VRAM probe function to a new file
(there are no other changes other then new simple kernel-doc).

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Matt Roper <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Drop xe_ prefix from static functions in xe_mmio.c

Rename static functions to align with our typical coding style.
While at it, downgrade the existing kernel-doc for internal
function to normal comment.

Suggested-by: Matt Roper <[email protected]>
Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Move BAR definitions to dedicated file

We should keep all hardware definitions separated from the driver
code. Move LMEM_BAR definition to new regs/xe_bars.h file and also
add there GTTMMADR_BAR definition to avoid using magic 0 resource.

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Matt Roper <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Move XEHP_MTCFG_ADDR register definition to xe_regs.h

We should not define registers directly in the code while we have
dedicated files for all register definitions. Move XEHP_MTCFG_ADDR
to regs/xe_regs.h

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Matt Roper <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Revert "drm/xe: make gt_remove use devm"

This reverts commit cd506a33b0d9759e0a58556799b1b38650fa3698.

The gt_remove function was explicitly added as part of the remove flow
instead of using drmm/devm automatic cleanup due to it being illegal
to remove a component after the driver has been detached from the pci
device; the GSC proxy component is removed as part of gt_remove, so we
need to do it in the pci cleanup flow. The function already has a
comment above it to explain this.

Note that the change to use the devm also caused an invalid pointer
deref in the gsc_proxy unbind function, but I didn't bother to debug
which pointer was bad since we shouldn't be calling the unbind that
late anyway and this revert fixes it.

Both issue were not seen in CI because the GSC loading is temporarily
disabled due to a critical bug, which means we're not binding the
component.

Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: replace format-less snprintf() with strscpy()

Using snprintf() with a format string from task->comm is a bit
dangerous since the string may be controlled by unprivileged
userspace:

drivers/gpu/drm/xe/xe_devcoredump.c: In function 'devcoredump_snapshot':
drivers/gpu/drm/xe/xe_devcoredump.c:184:9: error: format not a string literal and no format arguments [-Werror=format-security]
184 | snprintf(ss->process_name, sizeof(ss->process_name), process_name);
| ^~~~~~~~

In this case there is no reason for an snprintf(), so use a simpler
string copy.

Fixes: b10d0c5e9df7 ("drm/xe: Add process name to devcoredump")
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Decouple xe_exec_queue and xe_lrc

Decouple xe_lrc from xe_exec_queue and reference count xe_lrc.
Removing hard coupling between xe_exec_queue and xe_lrc allows
flexible design where the user interface xe_exec_queue can be
destroyed independent of the hardware/firmware interface xe_lrc.

v2: Fix lrc indexing in wq_item_append()

Signed-off-by: Niranjana Vishwanathapura <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Remove unwanted mutex locking

Do not hold xef->exec_queue.lock mutex while parsing the xarray
xef->exec_queue.xa in xe_file_close() as it is not needed and
will cause an unwanted dependency between this lock and the vm->lock.

This lock protects the exec queue lookup and reference taking which
doesn't apply to this code path. When FD is closing, IOCTLs presumably
can't be modifying the xarray.

v2: Update commit text (Matt Brost)
v3: Add more code comment (Rodrigo Vivi)
v4: Further expand code comment (Rodirgo Vivi)

Signed-off-by: Niranjana Vishwanathapura <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Jagmeet Randhawa <[email protected]>
Acked-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Drop undesired prefix from the platform name

We don't have to use exact names of the enumerators as the
potentially user-facing platform names. When constructing
platform descriptor fields, use the unique platform tag and
add the XE_ prefix only to the generated enum field.

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/hwmon: Expose card power and energy attributes of BMG

In BMG there are separate registers for card/platform power and
energy.
These are exposed through channel 0 i.e power_1/energy1_xxx.

Signed-off-by: Karthik Poosa <[email protected]>
Reviewed-by: Badal Nilawar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Balasubramani Vivekanandan <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe/hwmon: Add HWMON support for BMG

Add HWMON support for BMG. Exposing the pkg power, current,
energy info.

Signed-off-by: Karthik Poosa <[email protected]>
Reviewed-by: Badal Nilawar <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Balasubramani Vivekanandan <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Add engine name to the engine reset and cat-err log

Add engine name to the engine reset and cat error log
which should be useful while debugging.

v2: Add logical mask and engine class(Matt)
Use xe_gt_{info|dbg} (Michal)

Cc: Matthew Brost <[email protected]>
Cc: Michal Wajdeczko <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>

drm/xe: Check empty pinned BO list with lock held.

Use lock that is meant to use for accessing the BO pin list.

Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>

drm/xe/guc: Fix uninitialised count in GuC load debug prints

The debug prints about how long the GuC load takes have a loop
counter. However that was neither initialised nor incremented! Plus,
counting loops is no longer meaningful given the wait function returns
early for any change in the status value. So fix it to only count
loops due to actual timeouts.

Signed-off-by: John Harrison <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Fixes: b0ac1b42dbdc ("drm/xe/guc: Port over the slow GuC loading support from i915")
Cc: John Harrison <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Oded Gabbay <[email protected]>
Cc: "Thomas Hellström" <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Michal Wajdeczko <[email protected]>
Cc: Fei Yang <[email protected]>
Cc: [email protected]
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

MAINTAINERS: update Xe driver maintainers

Because I left Intel, I'm removing myself from the list
of Xe driver maintainers.

Signed-off-by: Oded Gabbay <[email protected]>
Acked-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Enable Coarse Power Gating

Enable power gating for all units and sub-pipes that
are disabled by default.

v2: change the init function name
    use symmetric calls for enable/disable pg
    re-pharase commit message (Rodrigo)
    modify the sub-pipe power gating condition

v3: set hysteresis value for render and media
    when GuC PC is disabled
    skip CPG for PVC (Vinay)

v4: rebase

Signed-off-by: Riana Tauro <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]> #v2
Reviewed-by: Vinay Belgaumkar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Standardize power gate registers

Standardize power gate registers

No functional changes

v2: change commit message (Rodrigo)

Signed-off-by: Riana Tauro <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Don't refer to general LRC initialization as a "wa"

During engine LRC initialization a number of registers need to be
programmed as general setup. This programming is not a "workaround" so
naming the RTP table as "lrc_was" is misleading; switch to the name
"lrc_setup" to more accurately describe what the table is actually for.

Signed-off-by: Matt Roper <[email protected]>
Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Use platform name in xe_assert()

We can now use more user-friendly platform name instead of
previosly used magic platform enumerator value:

  [ ] xe 0000:00:02.0: [drm] Assertion `false` failed!
      platform: ALDERLAKE_S ...
  [ ] xe 0000:03:00.0: [drm] Assertion `false` failed!
      platform: DG2 ...

vs

  [ ] xe 0000:00:02.0: [drm] Assertion `false` failed!
      platform: 3 ...
  [ ] xe 0000:03:00.0: [drm] Assertion `false` failed!
      platform: 7 ...

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Store platform name in xe_device.info

We already maintain the platform name as part of the device
descriptor, but in xe_device.info we only store platform enum,
which is not the best for use in any user-facing messages.

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: allow unaligned start and size xe_res_cursor parameters

xe_res_cursor code does not depend on the alignment. On the other side
unaligned accesses are useful from pread/pwrite point of view.

Signed-off-by: Andrzej Hajda <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>

drm/xe: flush gtt before signalling user fence on all engines

Tests show that user fence signalling requires kind of write barrier,
otherwise not all writes performed by the workload will be available
to userspace. It is already done for render and compute, we need it
also for the rest: video, gsc, copy.

v2: added gsc and copy engines, added fixes and r-b tags

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1488
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Andrzej Hajda <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>

drm/xe: Do not access xe file when updating exec queue run_ticks

The current code is running into a use after free case where xe file is
closed before the exec queue run_ticks can be updated. This is occurring
in the xe_file_close path. To fix that, do not access xe file when
updating the exec queue run_ticks. Instead store the exec queue run_ticks
locally in the exec queue object and accumulate it when the user dumps
the drm client stats. We know that the xe file is valid when user is
dumping the run_ticks for the drm client, so this effectively
removes the dependency on xe file object in
xe_exec_queue_update_run_ticks().

v2:
- Fix the accumulation of q->run_ticks delta into xe file run_ticks
- s/runtime/run_ticks/ (Rodrigo)

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1908
Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime")
Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Use run_ticks instead of runtime for client stats

Note that runtime is also used in the pm context, so it is confusing to
use the same name to denote run time of the drm client. Use a more
appropriate name for the client utilization.

While at it, drop the incorrect multi-lrc comment in the helper
description

v2: s/show_runtime/show_run_ticks/ (Rodrigo)

Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Move job creation out of the struct xe_migrate::job_mutex

In order to be able to run gpu jobs from reclaim context,
move job creation (where allocation takes place) out of the
struct xe_migrate::job_mutex, and prime that mutex as reclaim
tainted.

Jobs that may need to run from reclaim context include
CCS metadata extraction at shrinking time.

Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Remove xe_lrc_create_seqno_fence()

It's not used anymore.

Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Don't initialize fences at xe_sched_job_create()

Pre-allocate but don't initialize fences at xe_sched_job_create(),
and initialize / arm them instead at xe_sched_job_arm(). This
makes it possible to move xe_sched_job_create() with its memory
allocation out of any lock that is required for fence
initialization, and that may not allow memory allocation under it.

Replaces the struct dma_fence_array for parallell jobs with a
struct dma_fence_chain, since the former doesn't allow
a split-up between allocation and initialization.

v2:
- Rebase.
- Don't always use the first lrc when initializing parallel
lrc fences.
- Use dma_fence_chain_contained() to access the lrc fences.

v4:
- Add an assert that job->lrc_seqno == fence->seqno.
(Matthew Brost)

Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]> #v2
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Split lrc seqno fence creation up

Since sometimes a lock is required to initialize a seqno fence,
and it might be desirable not to hold that lock while performing
memory allocations, split the lrc seqno fence creation up into an
allocation phase and an initialization phase.

Since lrc seqno fences under the hood are hw_fences, do the same
for these and remove the xe_hw_fence_create() function since it
is not used anymore.

Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Decouple job seqno and lrc seqno

Tightly coupling these seqno presents problems if alternative fences for
jobs are used. Decouple these for correctness.

v2:
- Slightly reword commit message (Thomas)
- Make sure the lrc fence ops are used in comparison (Thomas)
- Assume seqno is unsigned rather than signed in format string (Thomas)

Cc: Thomas Hellström <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Signed-off-by: Thomas Hellström <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Use only assigned GGTT region

Each VF is assigned a limited range of the GGTT address space.
To ensure that the VF driver does not use GGTT allocations outside
of the assigned region, explicitly reserve GGTT space below and
above this region when initializing GGTT.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Michał Winiarski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Read VF configuration prior to GGTT initialization

Each VF will be assigned with only a limited range of the GGTT
address space. Make sure that VF driver will read its own GGTT
configuration before starting any GGTT initialization.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Michał Winiarski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Treat GMDID as another runtime register

While the GMDID registers are not part of the runtime register list
shared by the PF driver, we may still return cached values from our
VF specific read32() helper function.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Cache value of the GMDID register

Read and cache value of the GMDID register as part of the config
query that VF driver is doing over MMIO.

While the VF driver likely already obtained the value of the GMDID
register once during the early driver probe, we couldn't cache it
then as the GT structures were not ready yet.

Cache it now, in case the driver needs it later when the GuC MMIO
communication, required to query GMDID from GuC, could be no longer
desired as it will be replaced by the CTB communication.

While around, assert that we will query GMDID only when applicable.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Provide early access to GMDID register

VFs do not have direct access to the GMDID register and must obtain
its value from the GuC. Since we need GMDID value very early in the
driver probe flow, before we even start the full setup of GT and GuC
data structures, we must do some early initializations ourselves.

Additionally, since we also need GMDID for the media GT, which isn't
created yet, temporarly tweak the root GT type into MEDIA to allow
communication with the correct GuC, as only it can provide the value
of the media GMDID register.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Obtain value of GMDID register from GuC

VFs don't have access to the GMDID register and must obtain it
value using GuC VF ABI KLV query. Add function for doing that.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/guc: Add GLOBAL_CFG_GMD_ID KLV definition

VF drivers can't access GMD_ID register over MMIO.
The value of the GMD_ID register must be queried from GuC.
It is available as GLOBAL_CFG_GMD_ID KLV.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Use register values obtained from the PF

As part of the its initialization, the VF driver has already
obtained a list of the runtime (fuse) register values from the
PF driver. When VF driver is attempting to read register that is
inaccessible to the VF, use the values from this list instead.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/guc: Port over the slow GuC loading support from i915

GuC loading can take longer than it is supposed to for various
reasons. So add in the code to cope with that and to report it when it
happens. There are also many different reasons why GuC loading can
fail, so add in the code for checking for those and for reporting
issues in a meaningful manner rather than just hitting a timeout and
saying 'fail: status = %x'.

Also, remove the 'FIXME' comment about an i915 bug that has never been
applicable to Xe!

v2: Actually report the requested and granted frequencies rather than
showing granted twice (review feedback from Badal).
v3: Locally code all the timeout and end condition handling because a
helper function is not allowed (review feedback from Lucas/Rodrigo).
v4: Add more documentation comments and rename a define to add units
(review feedback from Lucas).
v5: Fix copy/paste error in xe_mmio_wait32_not (review feedback from
Lucas) and rebase (no more return value from guc_wait_ucode).

Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Make read_perf_limit_reasons globally accessible

Other driver code beyond the sysfs interface wants to know about
throttling. So make the query function globally accessible.

v2: Revert include order change (review feedback from Lucas)
v3: Remove '_sysfs' from throttle file names and keep limit query in
the same file rather than moving elsewhere (review feedback from
Rodrigo).
v4: Correct #include while renaming header file (review feedback
from Lucas).

Signed-off-by: John Harrison <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Nuke simple error capture

This error capture prints into dmesg HW state when a gpu hang happens.
It was useful when we did not had devcoredump, now it is a incompleted
version of devcoredump that has potential to flood dmesg.

Cc: Rodrigo Vivi <[email protected]>
Cc: John Harrison <[email protected]>
Signed-off-by: José Roberto de Souza <[email protected]>
Reviewed-by: John Harrison <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Add process name to devcoredump

Process name help us track what application caused the gpug hang, this
is crucial when running several applications at the same time.

v2:
- handle Xe KMD exec_queues without VM

v3:
- use get_pid_task() (suggested by Nirmoy)

Cc: Rodrigo Vivi <[email protected]>
Cc: Nirmoy Das <[email protected]>
Signed-off-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: remove unused struct 'xe_gt_desc'

'xe_gt_desc' is unused since
commit 1e6c20be6c83 ("drm/xe: Drop extra_gts[] declarations and
XE_GT_TYPE_REMOTE").

Remove it.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Link:
https://patchwork.freedesktop.org/patch/msgid/20240522175840 [email protected]
Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Enable D3Cold on 'low' VRAM utilization

Now that we eliminated all the mem_access get/put with its
locking issues from the inner calls of migration, we can
allow D3Cold.

Enable it when VRAM utilization is lower then 300Mb. On
higher utilization we only allow D3hot so we don't increase
so much the latency on runtime resume due to the memory
restoration.

Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Anshuman Gupta <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Stop checking for power_lost on D3Cold

GuC reset status is not reliable for this purpose and it is
once in a while ending up in a situation of D3Cold, where
power_reset is false and without the proper memory restoration
the GuC reload and Display will fail to come back from D3Cold.

So, let's do a full restoration of everything if we have a risk
of losing power, without further optimizations.

v2: also remove the gut_in_reset function (Anshuman)

Cc: Anshuman Gupta <[email protected]>
Reviewed-by: Anshuman Gupta <[email protected]>
Reviewed-by: Badal Nilawar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Prepare display for D3Cold

Prepare power-well and DC handling for a full power
lost during D3Cold, then sanitize it upon D3->D0.
Otherwise we get a bunch of state mismatch.

Ideally we could leave DC9 enabled and wouldn't need
to move DC9->DC0 on every runtime resume, however,
the disable_DC is part of the power-well checks and
intrinsic to the dc_off power well. In the future that
can be detangled so we can have even bigger power savings.
But for now, let's focus on getting a D3Cold, which saves
much more power by itself.

v2: create new functions to avoid full-suspend-resume path,
which would result in a deadlock between xe_gem_fault and the
modeset-ioctl.

v3: Only avoid the full modeset to avoid the race, for a more
robust suspend-resume.

Cc: Anshuman Gupta <[email protected]>
Cc: Uma Shankar <[email protected]>
Tested-by: Francois Dugast <[email protected]>
Reviewed-by: Anshuman Gupta <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Relax runtime pm protection around VM

In the regular use case scenario, user space will create a
VM, and keep it alive for the entire duration of its workload.

For the regular desktop cases, it means that the VM
is alive even on idle scenarios where display goes off. This
is unacceptable since this would entirely block runtime PM
indefinitely, blocking deeper Package-C state. This would be
a waste drainage of power.

Limit the VM protection solely for long-running workloads that
are not protected by the scheduler references.
By design, run_job for long-running workloads returns NULL and
the scheduler drops all the references of it, hence protecting
the VM for this case is necessary.

v2: Update commit message to a more imperative language and to
    reflect why the VM protection is really needed.
    Also add a comment in the code to let the reason visbible.

v3: Remove vma_access case and the mentions to mmap. Mmap cases
    are already protected by the gem page fault.

Cc: Thomas Hellström <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Matthew Brost <[email protected]>
Tested-by: Francois Dugast <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Relax runtime pm protection during execution

Limit the protection only during moments of actual job execution,
and introduce protection for guc submit fini, which is currently
unprotected due to the absence of exec_queue life protection.

In the regular use case scenario, user space will create an
exec queue, and keep it alive to reuse that until it is done
with that kind of workload.

For the regular desktop cases, it means that the exec_queue
is alive even on idle scenarios where display goes off. This
is unacceptable since this would entirely block runtime PM
indefinitely, blocking deeper Package-C state. This would be
a waste drainage of power.

Cc: Matthew Brost <[email protected]>
Tested-by: Francois Dugast <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Fix xe_pm_runtime_get_if_in_use documentation

Let's be clear on what it is actually doing and align with
xe_pm_runtime_get_if_active doc style.

Tested-by: Francois Dugast <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Fix xe_pm_runtime_get_if_active return

Current callers of this function are already taking the result
to a boolean and using in an if. It might be a problem because
current function might return negative error codes on failure,
without increasing the reference counter.

In this scenario we could end up with extra 'put' call ending
in unbalanced scenarios.

Let's fix it, while aligning with the current xe_pm_get_if_in_use
style.

Tested-by: Francois Dugast <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe: Properly handle alloc_guc_id() failure

Release the submission_state lock if alloc_guc_id() fails.

v2: Add Fixes tag and CC stable kernel

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: <[email protected]> # v6.8+
Signed-off-by: Niranjana Vishwanathapura <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/uc: Don't emit false error if running in execlist mode

When running in execlist mode (using force_execlist=1 modparam)
we incorrectly select the error path in xe_uc_init(), leading to
an unwanted error message like this:

[ ] xe 0000:00:00.0: [drm] *ERROR* GT0: Failed to initialize uC (0000000000000000)

Fix that by doing early return like we do in other similar cases.

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Daniele Ceraolo Spurio <[email protected]>
Reviewed-by: Daniele Ceraolo Spurio <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/display: move device_remove over to drmm

i915 display calls this when releasing the drm_device, match this also
in xe by using drmm. intel_display_device_remove() is freeing purely
software state for the drm_device.

v2: fix build error

Signed-off-by: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/display: stop calling domains_driver_remove twice

Unclear why we call this twice.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/display: move display fini stuff to devm

Match the i915 display handling here with calling both no_irq and
noaccel when removing the device.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: reset mmio mappings with devm

Set our various mmio mappings to NULL. This should make it easier to
catch something rogue trying to mess with mmio after device removal. For
example, we might unmap everything and then start hitting some mmio
address which has already been unmamped by us and then remapped by
something else, causing all kinds of carnage.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/mmio: move mmio_fini over to devm

Not valid to touch mmio once the device is removed, so make sure we
unmap on removal and not just when driver instance goes away. Also set
the mmio pointers to NULL to hopefully catch such issues more easily.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: make gt_remove use devm

No need to hand roll the onion unwind here, just move gt_remove over to
devm which will already have the correct ordering.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/gt: break out gt_fini into sw vs hw state

Have a cleaner separation between hw vs sw.

v2: Fix missing return

Signed-off-by: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/coredump: move over to devm

Here we are using drmm to ensure we release the coredump when unloading
the module, however the coredump is very much tied to the struct device
underneath. We can see this when we hotunplug the device, for which we
have already got a coredump attached. In such a case the coredump still
remains and adding another is not possible. However we still register
the release action via xe_driver_devcoredump_fini(), so in effect two or
more releases for one dump. The other consideration is that the
coredump state is embedded in the xe_driver instance, so technically
once the drmm release action fires we might free the coredumpe state
from a different driver instance, assuming we have two release actions
and they can race. Rather use devm here to remove the coredump when the
device is released.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1679
Signed-off-by: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/device: move xe_device_sanitize over to devm

Disable GuC submission when removing the device.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/device: move flr to devm

Should be called when driver is removed, not when this particular driver
instance is destroyed.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/irq: move irq_uninstall over to devm

Makes sense to trigger this when the device is removed.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/guc_pc: s/pc_fini/pc_fini_hw/

Make it clear that is about cleaning up the HW/FW side, and not software
state.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/guc_pc: move pc_fini to devm

Here we are touching the HW/GuC and presumably this should happen when
the device is removed. Currently if you hotunplug the device this is
skipped if there is already open driver instance.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/guc: s/guc_fini/guc_fini_hw/

Make it clear that is about cleaning up the HW/FW side, and not software
state.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/guc: move guc_fini over to devm

Make sure to actually call this when the device is removed. Currently we
only trigger it when the driver instance goes away, but that doesn't
work too well with hotunplug, since device can be removed and re-probed
with a new driver instance, where the guc_fini() is called too late.
Move the fini over to devm to ensure this is called when device is
removed.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717
Signed-off-by: Matthew Auld <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/ggtt: use drm_dev_enter to mark device section

Device can be hotunplugged before we start destroying gem objects. In
such a case don't touch the GGTT entries, trigger any invalidations or
mess around with rpm.  This should already be taken care of when
removing the device, we just need to take care of dealing with the
software state, like removing the mm node.

v2: (Andrzej)
  - Avoid some duplication by tracking the bound status and checking
    that instead.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1717
Signed-off-by: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Reviewed-by: Jagmeet Randhawa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: covert sysfs over to devm

Hotunplugging the device seems to result in stuff like:

kobject_add_internal failed for tile0 with -EEXIST, don't try to
register things with the same name in the same directory.

We only remove the sysfs as part of drmm, however that is tied to the
lifetime of the driver instance and not the device underneath. Attempt
to fix by using devm for all of the remaining sysfs stuff related to the
device.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1667
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1432
Signed-off-by: Matthew Auld <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/pci: remove broken driver_release

This is quite broken since we are nuking the pdev link to the private
driver struct, but note here that driver_release is called when the
drm_device is released (poor mans drmm), which can be long after the
device has been removed. So here what we are actually doing is nuking
the pdev link for what is potentially bound to a different drm_device.
If that happens before our pci remove callback is triggered (for the new
drm_device) we silently exit and skip some important cleanup steps,
resulting in hilarity.

There should be no reason to implement driver_release, when we already
have nicer stuff like drmm, so just remove completely. The actual pdev
link is already nuked when removing the device.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Custom GuC initialization if VF

The GuC firmware is loaded and initialized by the PF driver. Make
sure VF drivers only perform permitted operations. For submission
initialization, use number of GuC context IDs from self config.

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: John Harrison <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/guc: Allow to initialize submission with limited set of IDs

While PF and native drivers may initialize submission code to use
all available GuC contexts IDs, the VF driver may only use limited
number of IDs. Update init function to accept number of context
IDs available for use.

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Himal Prasad Ghimiray <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Cleanup xe_mmio.h

We don't need <linux/delay.h> include since commit 5c09bd6ccd41
("drm/xe/mmio: Move xe_mmio_wait32() to xe_mmio.c").

We don't need <linux/io-64-nonatomic-lo-hi.h> include since commit
54c659660d63 ("drm/xe: Make xe_mmio_read|write() functions non-inline").

And since commit 924e6a9789a0 ("drm/xe/uapi: Remove MMIO ioctl")
we don't need forward declarations of drm_device and drm_file.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Don't rely on indirect includes from xe_mmio.h

These compilation units use udelay() or some GT oriented printk
functions without explicitly including proper header files, and
relying on #includes from the xe_mmio.h instead. Fix that.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Francois Dugast <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/i915/display: Add missing include to intel_vga.c

This compilation unit uses udelay() function without including
it's header file. Fix that to break dependency on other code.

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Jani Nikula <[email protected]>
Acked-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Fix xe_guc_pc.h

Prefer forward declaration over #include xe_guc_pc_types.h

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Francois Dugast <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Fix xe_huc.h

Prefer forward declaration over #include xe_huc_types.h

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Francois Dugast <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Fix xe_gsc.h

Prefer forward declaration over #include xe_gsc_types.h

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Francois Dugast <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]