Git Repo - linux.git/log

drm/xe: Move and export xe_hw_engine lookup.

Move and export xe_hw_engine lookup. This is in preparation
to use this in eudebug code where we want to find active
engine.

v2: s/tile/gt due to uapi changes (Mika)

Signed-off-by: Dominik Grzegorzek <[email protected]>
Signed-off-by: Mika Kuoppala <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Use dma_fence_chain_free in chain fence unused as a sync

A chain fence is uninitialized if not installed in a drm sync obj. Thus
if xe_sync_entry_cleanup is called and sync->chain_fence is non-NULL the
proper cleanup is dma_fence_chain_free rather than a dma-fence put.

Reported-by: Paulo Zanoni <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2411
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2261
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/oa/uapi: Make bit masks unsigned

When building with gcc-5:

    In function ‘decode_oa_format.isra.26’,
inlined from ‘xe_oa_set_prop_oa_format’ at drivers/gpu/drm/xe/xe_oa.c:1664:6:
    ././include/linux/compiler_types.h:510:38: error: call to ‘__compiletime_assert_1336’ declared with attribute error: FIELD_GET: mask is not constant
    [...]
    ./include/linux/bitfield.h:155:3: note: in expansion of macro ‘__BF_FIELD_CHECK’
       __BF_FIELD_CHECK(_mask, _reg, 0U, "FIELD_GET: "); \
       ^
    drivers/gpu/drm/xe/xe_oa.c:1573:18: note: in expansion of macro ‘FIELD_GET’
      u32 bc_report = FIELD_GET(DRM_XE_OA_FORMAT_MASK_BC_REPORT, fmt);
      ^

Fixes: b6fd51c62119 ("drm/xe/oa/uapi: Define and parse OA stream properties")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/xe2hpg: Introduce performance tuning changes for Xe2_HPG

Add performance tuning changes for Xe2_HPG

Bspec: 72161
Signed-off-by: Sai Teja Pottumuttu <[email protected]>
Reviewed-by: Gustavo Sousa <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Migrate OOB WAs to OR rules

Now that rtp has OR rules, it's not needed to extend it to process OOB
WAs. Previously if an entry had no name, it was considered as "a set of
rules OR'ed with the last named entry".

Instead of generating new entries, add OR rules. The syntax for
xe_wa_oob.rules remains the same, with xe_gen_wa_oob generating the
slightly different table. Object sizes delta are negligible, but having
just one logic makes it easier to maintain:

add/remove: 0/0 grow/shrink: 1/2 up/down: 160/-269 (-109)
Function                                     old     new   delta
__compound_literal                          6104    6264    +160
xe_wa_dump                                  1839    1810     -29
oob_was                                      816     576    -240
Total: Before=17257, After=17148, chg -0.63%

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/rtp: Expand max rules/actions per entry again

Like commit 512660cd1f1a ("drm/xe/rtp: Expand max rules/actions per
entry") did, expand the maximum number of actions/rules. That commit was
too conservative, just incrementing 2. Other than the ugliness of these
macros and additional preprocessor steps when they are used, there are
no downsides on increasing the maximum: the tables in which they are
used use a sentinel to mark the last element.

With rtp processing now supporting OR rules, it's possible to migrate
the extension made for OOB WAs that "entries with name are OR'ed in
previous entry". For that the maximum number of rules needs to be
increased.

Just double it. Hopefully 12 is sufficient for longer than 6 was.

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/rtp: Simplify marking active workarounds

Stop doing the calculation both in rtp_mark_active() and in its caller.
The caller easily knows the number of entries to mark, so just pass it
forward. That also simplifies rtp_mark_active() since now it doesn't
have a special case when handling 1 entry.

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/kunit: Test rtp with no actions

The OOB WAs use xe_rtp_process(), without passing an sr to save result
of the actions since there are none. They are also executed in a gt-only
context, making it harder to share the implementation. Thus, introduce a
new set of tests to check these RTP entries. The only check that can be
done is if the entry was marked as active.

Before commit fd6797ec50c5 ("drm/xe/rtp: Fix off-by-one when processing
rules") several of these tests were failing: the processing of OR'ed
entries would make the subsequent entry to be inadvertently enabled.

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/kunit: Rename rtp test cases

Those tests check the behavior of xe_rtp_process_to_sr(), so name them
accordingly to allow adding tests for xe_rtp_process() later.

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/kunit: Test active rtp entries

Enabling active tracking in the rtp context and check for all the tests
the expected entries become active.

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/kunit: Rename count to count_sr_entries

The RTP tests check both the result of processing the RTP entries and
the outcome saved as SR entries. Rename "count" to be explicit about
what's being counted.

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/kunit: Test WAs for BMG

Add one variant for BMG to make sure the workarounds do not conflict.
This matches the machine with BMG in CI:

BATTLEMAGE e20b:0000 dgfx:1
gfx:Xe2_LPG / Xe2_HPG (20.01)
media:Xe2_LPM / Xe2_HPM (13.01)
display:yes dma_m_s:46 tc:1 gscfi:0 cscfi:1
Stepping = (G:A0, M:A1, D:**, B:**)

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/migrate: Future-proof compressed PAT check

Although all current Xe2 platforms support FlatCCS, we probably
shouldn't assume that will be universally true forever. In the past
we've had platforms like PVC that didn't support compression, and the
same could show up again at some point in the future. Future-proof the
migration code by adding an explicit check for FlatCCS support to the
condition that decides whether to use a compressed PAT index for
migration.

While we're at it, we can drop the IS_DGFX check since it's redundant
with the src_is_vram check (only dGPUs have VRAM).

Cc: Akshata Jahagirdar <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Reviewed-by: Akshata Jahagirdar <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/rtp: Fix off-by-one when processing rules

Gustavo noticed an odd "+ 2" in rtp_mark_active() while processing
rtp rules and pointed that it should be "+ 1". In fact, while processing
entries without actions (OOB workarounds), if the WA is activated and
has OR rules, it will also inadvertently activate the very next
workaround.

Test in a LNL B0 platform by moving 18024947630 on top of 16020292621,
makes the latter become active:

$ cat /sys/kernel/debug/dri/0/gt0/workarounds
...
OOB Workarounds
18024947630
16020292621
14018094691
16022287689
13011645652
22019338487_display

In future a kunit test will be added to cover the rtp checks for entries
without actions.

Fixes: fe19328b900c ("drm/xe/rtp: Add support for entries with no action")
Cc: Gustavo Sousa <[email protected]>
Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Assert G2H outstanding when releasing G2H

Ensure we are managing G2H credits correctly. Extra important now that
this is tied to PM.

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Michal Wajdeczko <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/mmio: Use single logic for waiting functions

The implementations for xe_mmio_wait32() and xe_mmio_wait32_not() are
almost identical. Let us avoid duplication of logic by having them
calling a common __xe_mmio_wait32() function.

Signed-off-by: Gustavo Sousa <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Remove stale declaration of xe_mmio_probe_vram()

The declaration of xe_mmio_probe_vram() became useless since
commit 638d1c79cbf1 ("drm/xe: Promote VRAM initialization function to
own file"). Remove it.

Signed-off-by: Gustavo Sousa <[email protected]>
Reviewed-by: Michal Wajdeczko <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/huc: Define HuC binary for BMG

Add the unversioned define for the BMG HuC FW.

Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/gsc: Define GSC binary for LNL

As with previous binaries, we match the compatibility version instead of
the build number.

Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/huc: Define HuC binary for LNL

Add the unversioned define for the LNL HuC FW.

All new binaries are GSC-enabled (and even if they weren't the driver
can detect the type of HuC binary), so the new lnl HuC filename doesn't
use the _gsc postfix to avoid confusion with the GSC binary.

Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Fix possible UAF in guc_exec_queue_process_msg

Store xe_device ahead of processing message as message can be free'd in
some cases.

v2:
- Including missing local changes
v3:
- Resend for CI

Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Fixes: d930c19fdff3 ("drm/xe: Build PM into GuC CT layer")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Delete unused register from xe_regs.h

Register SOFTWARE_FLAGS_SPR33 is unused; therefore, delete it.

Cc: Michal Wajdeczko <[email protected]>
Signed-off-by: Himal Prasad Ghimiray <[email protected]>
Reviewed-by: Tejas Upadhyay <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Add assert for XE_WA() usage

It's not always safe to call XE_WA() in the driver initialization. Add a
xe_gt_assert() so this doesn't go unnoticed.

While at it, fix typo in kernel-doc about OOB workarounds.

Reviewed-by: Tejas Upadhyay <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Refactor mmio setup for multi-tile

Extract functions to setup the multi-tile mmio space and extension
space, while better documenting the final memory layout. No change in
behavior.

Reviewed-by: Gustavo Sousa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Remove fence check from send_tlb_invalidation

'fence' argument in send_tlb_invalidation cannot be NULL, remove
non-NULL check from send_tlb_invalidation.

Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Fixes: 61ac035361ae ("drm/xe: Drop xe_gt_tlb_invalidation_wait")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>

drm/xe: Store process name and pid in xe file

An xe file can outlive the associated process as the GPU cleanup is just
triggered upon file close (process kill) and completes sometime later.
If the file close triggers error conditions (GPU hangs) the process
cannot be safely referenced to retrieve the name and pid for debug
information. Store the process name and pid directly in the xe file to
be safe.

v2:
- Access file->pid via rcu_access_pointer (Matthew Auld)

Fixes: b10d0c5e9df7 ("drm/xe: Add process name to devcoredump")
Fixes: f6ca930d974e ("drm/xe: Add process name and PID to job timedout message")
Signed-off-by: Matthew Brost <[email protected]>
Acked-by: Rodrigo Vivi <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Return -ENOBUFS if a kmalloc fails which is tied to an array of binds

The size of an array of binds is directly tied to several kmalloc in the
KMD, thus making these kmalloc more likely to fail. Return -ENOBUFS in
the case of these failures.

The expected UMD behavior upon returning -ENOBUFS is to split an array
of binds into a series of single binds.

v2:
- Resend for CI
v3:
- Resend for CI

Cc: Paulo Zanoni <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Fix xe_pt_abort_unbind

When restoring the children PT entries on a bind failure the incorrect
loop index has used resulting in PT entries being leaked. This is shown
by running xe_vm.bind-array-conflict-error-inject on a VRAM device going
into a suspend state after the test completes.

v2:
- s/childern/children (CI, Matt Auld)

Fixes: a708f6501c69 ("drm/xe: Update PT layer with better error handling")
Cc: Matthew Auld <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Fix warning on unreachable statement

eu_type_to_str() relies on -Wswitch to warn (and -Werror) to make sure
it handles all enum values. However it's perfectly legal to pass an int
to that function so in the end that function may happen to return
nothing. There's too much implicit knowledge about the initialization
of eu_type for a compiler to notice eu_type is never assigned to
anything other than those values.

Trying to reproduce this issue, none of gcc-9, gcc-10 and gcc-13
triggered for me, but this was reported in a different system with
gcc-10:

drivers/gpu/drm/xe/xe.o: warning: objtool: xe_gt_topology_dump() falls through to next function xe_gt_topology_init()

Also it was reported these warnings when building with clang:

drivers/gpu/drm/xe/xe.o: warning: objtool: xe_gt_topology_dump+0x77: sibling call from callable instruction with modified stack frame
drivers/gpu/drm/xe/xe.o: warning: objtool: xe_gt_topology_dump() falls through to next function xe_dss_mask_group_ffs()
drivers/gpu/drm/xe/xe.o: warning: objtool: xe_gt_topology_dump+0x77: can't find jump dest instruction at .text.xe_gt_topology_dump+0xc0

Since that value is not really possible in real world, just take the
simple approach and return NULL.

Fixes: 7108b4a589cd ("drm/xe/uapi: Expose SIMD16 EU mask in topology query")
Reviewed-by: Nathan Chancellor <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Reviewed-by: Michal Wajdeczko <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Add NEEDS_2M BO flag

In addition of NEEDS_64K BO flag, add similar one to force 2 MiB
alignment of the buffer objects. Explicitly use this flag during
VF LMEM provisioning as LMTT uses 2 MiB pages and one day we may
drop requirement of allocating pinned objects as contiguous.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Normalize NEEDS_64K BO flag

In commit 62742d126631 ("drm/xe: Normalize bo flags macros"),
we normalized all BO flags but XE_BO_NEEDS_64K. Do it now.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/tests: Skip xe_mocs live tests on VF device

There is no point to run those tests on VFs devices as they can't
access any of the MOCS registers. Skip testing on the VF device.

  [ ] =================== xe_mocs (1 subtest) ====================
  [ ] ================ xe_live_mocs_kernel_kunit  ================
  [ ] [PASSED] 0000:4d:00.0
  [ ] [SKIPPED] 0000:4d:00.1
  [ ] ============ [PASSED] xe_live_mocs_kernel_kunit ============
  [ ] ===================== [PASSED] xe_mocs =====================

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/tests: Convert xe_mocs live tests

Convert xe_mocs live tests to parameterized style.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/tests: Convert xe_migrate live tests

Convert xe_migrate live tests to parameterized style.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/tests: Convert xe_dma_buf live tests

Convert xe_dma_buf live tests to parameterized style.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/tests: Convert xe_bo live tests

Convert xe_bo live tests to parameterized style.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/tests: Add helpers for use in live tests

Instead of iterating over available Xe devices within a testcase,
without being able to distinguish potential failures from different
devices on system with many Xe devices, introduce helpers that will
allow to treat each Xe device as a parameter for the testcase like:

static void bar(struct kunit *test)
{
struct xe_device *xe = test->priv;
...
}

struct kunit_case foo_live_tests[] = {
KUNIT_CASE_PARAM(bar, xe_pci_live_device_gen_param),
{}
};

struct kunit_suite foo_suite = {
.name = "foo_live",
.test_cases = foo_live_tests,
.init = xe_kunit_helper_xe_device_live_test_init,
};

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Introduce const cast helper

Typically we want to preserve pointer constness when converting
from one xe pointer to another, but in some rare cases, like kunit
parameter conversions, we might want to discard this constness.
Add a helper that we will use to clearly indicate our intention.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]> #v1
Cc: Lucas De Marchi <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/oa: Don't use hardcoded values

The current implementation uses hardcoded values instead of common defines.

v2:
- Make the commit a regular commit instead of a fixup commit
- slightly modify commit message

Signed-off-by: Ohad Sharabi <[email protected]>
Reviewed-by: Ashutosh Dixit <[email protected]>
Signed-off-by: Ashutosh Dixit <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Build PM into GuC CT layer

Take PM ref when any G2H are outstanding, drop when none are
outstanding.

To safely ensure we have PM ref when in the GuC CT layer, a PM ref needs
to be held when scheduler messages are pending too.

v2:
- Add outer PM protections to xe_file_close (CI)
v3:
- Only take PM ref 0->1 and drop on 1->0 (Matthew Auld)
v4:
- Add assert to G2H increment function
v5:
- Rebase
v6:
- Declare xe as local variable in xe_file_close (CI)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Matthew Auld <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Nirmoy Das <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Hold a PM ref when GT TLB invalidations are inflight

Avoid GT TLB invalidation timeouts by holding a PM ref when
invalidations are inflight.

v2:
- Drop PM ref before signaling fence (CI)
v3:
- Move invalidation_fence_signal helper in tlb timeout to previous
patch (Matthew Auld)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Rodrigo Vivi <[email protected]>
Cc: Nirmoy Das <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Drop xe_gt_tlb_invalidation_wait

Having two methods to wait on GT TLB invalidations is not ideal. Remove
xe_gt_tlb_invalidation_wait and only use GT TLB invalidation fences.

In addition to two methods being less than ideal, once GT TLB
invalidations are coalesced the seqno cannot be assigned during
xe_gt_tlb_invalidation_ggtt/range. Thus xe_gt_tlb_invalidation_wait
would not have a seqno to wait one. A fence however can be armed and
later signaled.

v3:
- Add explaination about coalescing to commit message
v4:
- Don't put dma fence if defined on stack (CI)
v5:
- Initialize ret to zero (CI)
v6:
- Use invalidation_fence_signal helper in tlb timeout (Matthew Auld)

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Add xe_gt_tlb_invalidation_fence_init helper

Other layers should not be touching struct xe_gt_tlb_invalidation_fence
directly, add helper for initialization.

v2:
- Add dma_fence_get and list init to xe_gt_tlb_invalidation_fence_init

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Fix register value lookup

We should use the number of actual entries stored in the runtime
register buffer, not the maximum number of entries that this buffer
can hold, otherwise bsearch() may fail and we may miss the data and
wrongly report unexpected access to some registers.

Fixes: 4edadc41a3a4 ("drm/xe/vf: Use register values obtained from the PF")
Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Piotr Piórkowski <[email protected]>
Cc: Matt Roper <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Fix use after free when client stats are captured

xe_file_close triggers an asynchronous queue cleanup and then frees up
the xef object. Since queue cleanup flushes all pending jobs and the KMD
stores client usage stats into the xef object after jobs are flushed, we
see a use-after-free for the xef object. Resolve this by taking a
reference to xef from xe_exec_queue.

While at it, revert an earlier change that contained a partial work
around for this issue.

v2:
- Take a ref to xef even for the VM bind queue (Matt)
- Squash patches relevant to that fix and work around (Lucas)

v3: Fix typo (Lucas)

Fixes: ce62827bc294 ("drm/xe: Do not access xe file when updating exec queue run_ticks")
Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1908
Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Take a ref to xe file when user creates a VM

Take a reference to xef when user creates the VM and put the reference
when user destroys the VM.

Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Add ref counting for xe_file

Add ref counting for xe_file.

v2:
- Add kernel doc for exported functions (Matt)
- Instead of xe_file_destroy, export the get/put helpers (Lucas)

v3: Fixup the kernel-doc format and description (Matt, Lucas)

Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Move part of xe_file cleanup to a helper

In order to make xe_file ref counted, move destruction of xe_file
members to a helper.

v2: Move xe_vm_close_and_put back into xe_file_close (Matt)

Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/uapi: Expose SIMD16 EU mask in topology query

PVC, Xe2 and later platforms have 16-wide EUs. We were implicitly
reporting for PVC the number of 16-wide EUs without giving userspace any
hint that they were different than for other platforms. Xe2 and later
also have 16-wide, but in those cases the reported number would
correspond to the 8-wide count.

To avoid confusion and make sure the right number is used by userspace
depending on the platform, add a new item to the topology query and drop
the one that is not available. The new mask reported for both PVC and
Xe2 should now match the numbers reported via hwconfig.

v2: Use a different topo item with EU type in its name to report the
new mask instead of adding the type itself as the item (Matt Roper)

Reviewed-by: Matt Roper <[email protected]>
Acked-by: José Roberto de Souza <[email protected]>
Acked-by: Mateusz Jablonski <[email protected]>
Acked-by: Wenbin Lu <[email protected]>
Acked-by: Effie Yu <[email protected]>
Acked-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Remove unused xe_sync_entry_wait

xe_sync_entry_wait is no longer used, remove it.

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Validate user fence during creation

Fail invalid addresses during user fence creation.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/pm: Add trace for pm functions

Add trace for xe pm function for better debuggability.

v2: Fix indentation and add trace for xe_pm_runtime_get_ioctl

Cc: Matthew Brost <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>

drm/xe/fbdev: Limit the usage of stolen for LNL+

As per recommendation in the workarounds:
WA_22019338487

There is an issue with accessing Stolen memory pages due a
hardware limitation. Limit the usage of stolen memory for
fbdev for LNL+. Don't use BIOS FB from stolen on LNL+ and
assign the same from system memory.

v2: Corrected the WA Number, limited WA to LNL and
    Adopted XE_WA framework as suggested by Lucas and Matt.

v3: Introduced the waxxx_display to implement display side
    of WA changes on Lunarlake. Used xe_root_mmio_gt and
    avoid the for loop (Suggested by Lucas)

v4: Fixed some nits (Luca)

Reviewed-by: Lucas De Marchi <[email protected]>
Signed-off-by: Uma Shankar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/xe2: Do not run xe_bo_test for xe2+ dgfx

In xe2+ dgfx, we don't need to handle the copying of ccs
metadata during migration. This test validates the ccs data post
clear and copy during evict/restore operation. Thus, we can skip
this test on xe2+ dgfx.

Signed-off-by: Akshata Jahagirdar <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/57d9df82ad02e53c9b0d2a7d40bb27acce57b927.1721250309.git.akshata.jahagirdar@intel.com

drm/xe/migrate: Add kunit to test migration functionality for BMG

This part of kunit verifies that
- main data is decompressed and ccs data is clear post bo eviction.
- main data is raw copied and ccs data is clear post bo restore.

v2: Added missing bo_put()/bo_unlock() (Matt Auld)

Signed-off-by: Akshata Jahagirdar <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/1d36d4377c566508e42b3fb80d3fe4a588fd00ca.1721250309.git.akshata.jahagirdar@intel.com

drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx

During eviction (vram->sysmem), we use compressed -> uncompressed mapping.
During restore (sysmem->vram), we need to use mapping from
uncompressed -> uncompressed.
Handle logic for selecting the compressed identity map for eviction,
and selecting uncompressed map for restore operations.
v2: Move check of xe_migrate_ccs_emit() before calling
xe_migrate_ccs_copy(). (Nirmoy)

Signed-off-by: Akshata Jahagirdar <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/79b3a016e686a662ae68c32b5fc7f0f2ac8043e9.1721250309.git.akshata.jahagirdar@intel.com

drm/xe/xe2: Introduce identity map for compressed pat for vram

Xe2+ has unified compression (exactly one compression mode/format),
where compression is now controlled via PAT at PTE level.
This simplifies KMD operations, as it can now decompress freely
without concern for the buffer's original compression format—unlike DG2,
which had multiple compression formats and thus required copying the
raw CCS state during VRAM eviction. In addition mixed VRAM and system
memory buffers were not supported with compression enabled.

On Xe2 dGPU compression is still only supported with VRAM, however we
can now support compression with VRAM and system memory buffers,
with GPU access being seamless underneath. So long as when doing
VRAM -> system memory the KMD uses compressed -> uncompressed,
to decompress it. This also allows CPU access to such buffers,
assuming that userspace first decompress the corresponding
pages being accessed.
If the pages are already in system memory then KMD would have already
decompressed them. When restoring such buffers with sysmem -> VRAM
the KMD can't easily know which pages were originally compressed,
so we always use uncompressed -> uncompressed here.
With this it also means we can drop all the raw CCS handling on such
platforms (including needing to allocate extra CCS storage).

In order to support this we now need to have two different identity
mappings for compressed and uncompressed VRAM.
In this patch, we set up the additional identity map for the VRAM with
compressed pat_index. We then select the appropriate mapping during
migration/clear. During eviction (vram->sysmem), we use the mapping
from compressed -> uncompressed. During restore (sysmem->vram), we need
the mapping from uncompressed -> uncompressed.
Therefore, we need to have two different mappings for compressed and
uncompressed vram. We set up an additional identity map for the vram
with compressed pat_index.
We then select the appropriate mapping during migration/clear.

v2: Formatting nits, Updated code to match recent changes in
xe_migrate_prepare_vm(). (Matt)

v3: Move identity map loop to a helper function. (Matt Brost)

v4: Split helper function in different patch, and
add asserts and nits. (Matt Brost)

v5: Convert the 2 bool arguments of pte_update_size to flags
argument (Matt Brost)

v6: Formatting nits (Matt Brost)

Signed-off-by: Akshata Jahagirdar <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/b00db5c7267e54260cb6183ba24b15c1e6ae52a3.1721250309.git.akshata.jahagirdar@intel.com

drm/xe/migrate: Add helper function to program identity map

Add an helper function to program identity map.

v2: Formatting nits

Signed-off-by: Akshata Jahagirdar <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/91dc05f05bd33076fb9a9f74f8495b48d2abff53.1721250309.git.akshata.jahagirdar@intel.com

drm/xe/migrate: Add kunit to test clear functionality

This test verifies if the main and ccs data are cleared during bo creation.
The motivation to use Kunit instead of IGT is that, although we can verify
whether the data is zero following bo creation,
we cannot confirm whether the zero value after bo creation is the result of
our clear function or simply because the initial data present was zero.

v2: Updated the mutex_lock and unlock logic,
Changed out_unlock to out_put. (Matt)

v3: Added missing dma_fence_put(). (Nirmoy)

v4: Rebase.

v5: Add missing bo_put(), bo_unlock() calls. (Matt Auld)

Signed-off-by: Akshata Jahagirdar <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Acked-by: Nirmoy Das <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/c07603439b88cfc99e78c0e2069327e65d5aa87d.1721250309.git.akshata.jahagirdar@intel.com

drm/xe/migrate: Handle clear ccs logic for xe2 dgfx

For Xe2 dGPU, we clear the bo by modifying the VRAM using an
uncompressed pat index which then indirectly updates the
compression status as uncompressed i.e zeroed CCS.
So xe_migrate_clear() should be updated for BMG to not
emit CCS surf copy commands.

v2: Moved xe_device_needs_ccs_emit() to xe_migrate.c and changed
name to xe_migrate_needs_ccs_emit() since its very specific to
migration.(Matt)

Signed-off-by: Akshata Jahagirdar <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/8dd869dd8dda5e17ace28c04f1a48675f5540874.1721250309.git.akshata.jahagirdar@intel.com

drm/xe: Don't suspend device upon wedge

When wedging a device we shouldn't be suspending device as state for
debug will be lost.

Also this appears to not work as the below stack trace pops upon trying
to resume a wedged device:

[  304.245044] INFO: task cat:12115 blocked for more than 151 seconds.
[  304.251333]       Tainted: G        W          6.10.0-rc7-xe+ #3518
[  304.257617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  304.265459] task:cat             state:D stack:13384 pid:12115 tgid:12115 ppid:3986   flags:0x00000006
[  304.265465] Call Trace:
[  304.265467]  <TASK>
[  304.265469]  __schedule+0x3c4/0xdf0
[  304.265478]  schedule+0x3c/0x140
[  304.265481]  rpm_resume+0x1cc/0x740
[  304.265484]  ? __pfx_autoremove_wake_function+0x10/0x10
[  304.265489]  __pm_runtime_resume+0x49/0x80
[  304.265494]  guc_info+0x6b/0xb0 [xe]
[  304.265538]  ? __pfx___drm_printfn_seq_file+0x10/0x10
[  304.265541]  ? __pfx___drm_puts_seq_file+0x10/0x10
[  304.265545]  seq_read_iter+0x111/0x4c0
[  304.265551]  seq_read+0xfc/0x140
[  304.265556]  full_proxy_read+0x58/0x80
[  304.265560]  vfs_read+0xa7/0x360
[  304.265563]  ? find_held_lock+0x2b/0x80
[  304.265568]  ksys_read+0x64/0xe0
[  304.265571]  do_syscall_64+0x68/0x140
[  304.265575]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  304.265578] RIP: 0033:0x7f4254d14992
[  304.265580] RSP: 002b:00007ffc558666f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  304.265583] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f4254d14992
[  304.265584] RDX: 0000000000020000 RSI: 00007f4254ebb000 RDI: 0000000000000003
[  304.265586] RBP: 00007f4254ebb000 R08: 00007f4254eba010 R09: 00007f4254eba010
[  304.265587] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000
[  304.265588] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[  304.265593]  </TASK>
[  304.265594]
               Showing all locks held in the system:
[  304.265598] 1 lock held by khungtaskd/57:
[  304.265599]  #0: ffffffff8273b860 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x36/0x1c0
[  304.265607] 3 locks held by kworker/6:1/90:
[  304.265610] 1 lock held by in:imklog/547:
[  304.265611]  #0: ffff88810498cd88 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x76/0xc0
[  304.265620] 1 lock held by dmesg/1310:

v2: Drop local 'err' variable (Jonathan)

Fixes: 8ed9aaae39f3 ("drm/xe: Force wedged state and block GT reset upon any GPU hang")
Cc: Rodrigo Vivi <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Wedge the entire device

Wedge the entire device, not just GT which may have triggered the wedge.
To implement this, cleanup the layering so xe_device_declare_wedged()
calls into the lower layers (GT) to ensure entire device is wedged.

While we are here, also signal any pending GT TLB invalidations upon
wedging device.

Lastly, short circuit reset wait if device is wedged.

v2:
- Short circuit reset wait if device is wedged (Local testing)

Fixes: 8ed9aaae39f3 ("drm/xe: Force wedged state and block GT reset upon any GPU hang")
Cc: Rodrigo Vivi <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/gsc: add Battlemage support

Add heci_cscfi support bit for new CSC engine type.
It has same mmio offsets as DG2 GSC but separate interrupt flow.

Signed-off-by: Alexander Usyskin <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Daniele Ceraolo Spurio <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/vf: Track writes to inaccessible registers from VF

Only limited set of registers is accessible for the VF driver and
the hardware will silently drop writes to inaccessible registers.
To improve our VF driver lets intercept all such writes to warn
about such unexpected writes on debug builds or optionally allow
to provide some substitution (as a potential future extension).

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Gustavo Sousa <[email protected]>
Cc: Piotr Piórkowski <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/xe2: Add Wa_15015404425

Wa_15015404425 asks us to perform four "dummy" writes to a
non-existent register offset before every real register read.
Although the specific offset of the writes doesn't directly
matter, the workaround suggests offset 0x130030 as a good target
so that these writes will be easy to recognize and filter out in
debugging traces.

V5(MattR):
  - Avoid negating an equality comparison
V4(MattR):
  - Use writel and remove xe_reg usage
V3(MattR):
  - Define dummy reg local to function
  - Avoid tracing dummy writes
  - Update commit message
V2:
  - Add WA to 8/16/32bit reads also - MattR
  - Corrected dummy reg address - MattR
  - Use for loop to avoid mental pause - JaniN

Reviewed-by: Matt Roper <[email protected]>
Signed-off-by: Tejas Upadhyay <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/pf: Limit fair VF LMEM provisioning

Due to the current design of the BO and VRAM manager, any object
with XE_BO_FLAG_PINNED flag, which the PF driver uses during VF
LMEM provisionining, is created with the TTM_PL_FLAG_CONTIGUOUS
flag, which may cause VRAM fragmentation that prevents subsequent
allocations of larger objects, like fair VF LMEM provisioning.

To avoid such failures, round down fair VF LMEM provisioning size
to next power of two size, to compensate what xe_ttm_vram_mgr is
doing to achieve contiguous allocations.

Fixes: ac6598aed1b3 ("drm/xe/pf: Add support to configure SR-IOV VFs")
Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Piotr Piórkowski <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/exec: Fix minor bug related to xe_sync_entry_cleanup

Increment num_syncs after xe_sync_entry_parse() is successful to ensure
the xe_sync_entry_cleanup() logic under "err_syncs" label works correctly.

v2: Use the same pattern as that in xe_vm.c (Matt Brost)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Ashutosh Dixit <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/kunit: Simplify xe_mocs live tests code layout

The test case logic is implemented by the functions compiled as
part of the core Xe driver module and then exported to build and
register the test suite in the live test module.

But we don't need to export individual test case functions, we may
just export the entire test suite. And we don't need to register
this test suite in a separate file, it can be done in the main
file of the live test module.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/kunit: Simplify xe_migrate live tests code layout

The test case logic is implemented by the functions compiled as
part of the core Xe driver module and then exported to build and
register the test suite in the live test module.

But we don't need to export individual test case functions, we may
just export the entire test suite. And we don't need to register
this test suite in a separate file, it can be done in the main
file of the live test module.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/kunit: Simplify xe_dma_buf live tests code layout

The test case logic is implemented by the functions compiled as
part of the core Xe driver module and then exported to build and
register the test suite in the live test module.

But we don't need to export individual test case functions, we may
just export the entire test suite. And we don't need to register
this test suite in a separate file, it can be done in the main
file of the live test module.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/kunit: Simplify xe_bo live tests code layout

The test case logic is implemented by the functions compiled as
part of the core Xe driver module and then exported to build and
register the test suite in the live test module.

But we don't need to export individual test case functions, we may
just export the entire test suite. And we don't need to register
this test suite in a separate file, it can be done in the main
file of the live test module.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/kunit: Drop XE_TEST_EXPORT

It's unused and can be replaced with VISIBLE_IF_KUNIT if needed.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/kunit: Kill xe_cur_kunit()

We shouldn't use custom helper if there is a official one.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Add process name and PID to job timedout message

This will be very helpful for Mesa CI, where it uses PID to match
the exacly test that cause timedout/GPU hang and mark that test as
failing.

Also printing the process name as it might be relavant for human
readers.

Cc: Rodrigo Vivi <[email protected]>
Signed-off-by: José Roberto de Souza <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/xe2: Make subsequent L2 flush sequential

Issuing the flush on top of an ongoing flush is not desirable.
Lets use lock to make it sequential.

Reviewed-by: Nirmoy Das <[email protected]>
Signed-off-by: Tejas Upadhyay <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>

drm/xe/display/xe_hdcp_gsc: Free arbiter on driver removal

Free arbiter allocated in intel_hdcp_gsc_init().

Fixes: 152f2df954d8 ("drm/xe/hdcp: Enable HDCP for XE")
Cc: Suraj Kandpal <[email protected]>
Cc: Arun R Murthy <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Nirmoy Das <[email protected]>

drm/xe: Generate oob before compiling anything

Instead of keep adding more dependencies as WAs are needed in different
places of the driver, just add a rule with all the objects so the code
generation happens before anything else.

While at it, group lines related to wa_oob in the Makefile.

v2: Prefix $(obj) when declaring dependency

Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/gt: Remove double include

The header generated/xe_wa_oob.h is included twice. Remove one.

Fixes: 01570b446939 ("drm/xe/bmg: implement Wa_16023588340")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Reviewed-by: Michal Wajdeczko <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe/xe2lpg: Extend workaround 14021402888

workaround 14021402888 also applies to Xe2_LPG.
Replicate the existing entry to one specific for Xe2_LPG.

Signed-off-by: Bommu Krishnaiah <[email protected]>
Cc: Tejas Upadhyay <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Himal Prasad Ghimiray <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Lucas De Marchi <[email protected]>

drm/xe: Drop trace_xe_hw_fence_free

fence->ctx may be stale memory when trace_xe_hw_fence_free is called
resuling UAF bug when deriving the device name. This tracepoint is not
all that useful, so just drop it.

Fixes: 501c4255c409 ("drm/xe/trace: Print device_id in xe_trace events")
Cc: Ville Syrjälä <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Gustavo Sousa <[email protected]>
Cc: Radhakrishna Sripada <[email protected]>
Cc: Matt Roper <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/xe2lpm: Extend Wa_16021639441

Wa_16021639441 applies to Xe2_LPM.

Signed-off-by: Ngai-Mint Kwan <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Signed-off-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Use write-back caching mode for system memory on DGFX

The caching mode for buffer objects with VRAM as a possible
placement was forced to write-combined, regardless of placement.

However, write-combined system memory is expensive to allocate and
even though it is pooled, the pool is expensive to shrink, since
it involves global CPU TLB flushes.

Moreover write-combined system memory from TTM is only reliably
available on x86 and DGFX doesn't have an x86 restriction.

So regardless of the cpu caching mode selected for a bo,
internally use write-back caching mode for system memory on DGFX.

Coherency is maintained, but user-space clients may perceive a
difference in cpu access speeds.

v2:
- Update RB- and Ack tags.
- Rephrase wording in xe_drm.h (Matt Roper)
v3:
- Really rephrase wording.

Signed-off-by: Thomas Hellström <[email protected]>
Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode")
Cc: Pallavi Mishra <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: [email protected]
Cc: Joonas Lahtinen <[email protected]>
Cc: Effie Yu <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Jose Souza <[email protected]>
Cc: Michal Mrozek <[email protected]>
Cc: <[email protected]> # v6.8+
Acked-by: Matthew Auld <[email protected]>
Acked-by: José Roberto de Souza <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode")
Acked-by: Michal Mrozek <[email protected]>
Acked-by: Effie Yu <[email protected]> #On chat
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/i915: disable fbc due to Wa_16023588340

On BMG-G21 we need to disable fbc due to complications around the WA.

v2:
- Try to handle with i915_drv.h and compat layer. (Rodrigo)
v3:
- For simplicity retreat back to the original design for now.
- Drop the extra \ from the Makefile (Jani)

Signed-off-by: Matthew Auld <[email protected]>
Cc: Jonathan Cavitt <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Vinod Govindapillai <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: [email protected]
Reviewed-by: Jonathan Cavitt <[email protected]>
Acked-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/bmg: implement Wa_16023588340

This involves enabling l2 caching of host side memory access to VRAM
through the CPU BAR. The main fallout here is with display since VRAM
writes from CPU can now be cached in GPU l2, and display is never
coherent with caches, so needs various manual flushing. In the case of
fbc we disable it due to complications in getting this to work
correctly (in a later patch).

Signed-off-by: Matthew Auld <[email protected]>
Cc: Jonathan Cavitt <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Cc: Vinod Govindapillai <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Use VF_CAP_REG for device wmb

To force a write barrier on the device memory, we write to the
SOFTWARE_FLAGS_SPR33 register, but this particular register was
selected because it was one of the writable and unused register.

Since a write barrier should also work if we use the read-only
register, switch to VF_CAP_REG register that is also marked as
accessible for VFs.

While at it, add simple kernel-doc for xe_device_wmb() function.

Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Matt Roper <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Kill regs/xe_sriov_regs.h

There is no real benefit to maintain a separate file. The register
definitions related to SR-IOV can be placed in existing headers.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Fix register definition order in xe_regs.h

Swap XEHP_CLOCK_GATE_DIS(0x101014) with GU_DEBUG(x101018).

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Reviewed-by: Himal Prasad Ghimiray <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Add VM bind IOCTL error injection

Add VM bind IOCTL error injection which steals MSB of the bind flags
field which if set injects errors at various points in the VM bind
IOCTL. Intended to validate error paths. Enabled by CONFIG_DRM_XE_DEBUG.

v4:
- Change define layout (Jonathan)

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Update PT layer with better error handling

Update PT layer so if a memory allocation for a PTE fails the error can
be propagated to the user without requiring the VM to be killed.

v5:
- change return value invalidation_fence_init to void (Matthew Auld)
v7:
- Invert i,j usage in two places (Matthew Auld)
- s/0/NULL (Matthew Auld)
- Don't ignore return value of xe_pt_new_shared (Matthew Auld)
- Don't check for NULL in xe_pt_entry (Matthew Auld)

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Update VM trace events

The trace events have changed moving to a single job per VM bind IOCTL,
update the trace events align with old behavior as much as possible.

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Convert multiple bind ops into single job

This aligns with the uAPI of an array of binds or single bind that
results in multiple GPUVA ops to be considered a single atomic
operations.

The design is roughly:
- xe_vma_ops is a list of xe_vma_op (GPUVA op)
- each xe_vma_op resolves to 0-3 PT ops
- xe_vma_ops creates a single job
- if at any point during binding a failure occurs, xe_vma_ops contains
the information necessary unwind the PT and VMA (GPUVA) state

v2:
- add missing dma-resv slot reservation (CI, testing)
v4:
- Fix TLB invalidation (Paulo)
- Add missing xe_sched_job_last_fence_add/test_dep check (Inspection)
v5:
- Invert i, j usage (Matthew Auld)
- Add helper to test and add job dep (Matthew Auld)
- Return on anything but -ETIME for cpu bind (Matthew Auld)
- Return -ENOBUFS if suballoc of BB fails due to size (Matthew Auld)
- s/do/Do (Matthew Auld)
- Add missing comma (Matthew Auld)
- Do not assign return value to xe_range_fence_insert (Matthew Auld)
v6:
- s/0x1ff/MAX_PTE_PER_SDI (Matthew Auld, CI)
- Check to large of SA in Xe to avoid triggering WARN (Matthew Auld)
- Fix checkpatch issues
v7:
- Rebase
- Support more than 510 PTEs updates in a bind job (Paulo, mesa testing)
v8:
- Rebase

Cc: Thomas Hellström <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Add xe_exec_queue_last_fence_test_dep

Helpful to determine if a bind can immediately use CPU or needs to be
deferred a drm scheduler job.

v7:
- Better wording in kernel doc (Matthew Auld)

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Add xe_vm_pgtable_update_op to xe_vma_ops

Each xe_vma_op resolves to 0-3 pt_ops. Add storage for the pt_ops to
xe_vma_ops which is dynamically allocated based the number and types of
xe_vma_op in the xe_vma_ops list. Allocation only implemented in this
patch.

This will help with converting xe_vma_ops (multiple xe_vma_op) in a
atomic update unit.

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: s/xe_tile_migrate_engine/xe_tile_migrate_exec_queue

Engine is old nomenclature, replace with exec queue.

Signed-off-by: Matthew Brost <[email protected]>
Reviewed-by: Jonathan Cavitt <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/uapi: Rename xe perf layer as xe observation layer

In Xe, the perf layer allows capture of HW counter streams. These HW
counters are generally performance related but don't have to be necessarily
so. Also, the name "perf" is a carryover from i915 and is not preferred.

Here we propose the name "observation" for this common layer which allows
capture of different types of these counter streams.

v2: Rename observability layer to observation layer (Lucas/Rodrigo)
v3: Rename sysctl file to "observation_paranoid" (Jose)

Fixes: 52c2e956dceb ("drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types")
Fixes: fe8929bdf835 ("drm/xe/perf/uapi: Add perf_stream_paranoid sysctl")
Acked-by: Lucas De Marchi <[email protected]>
Acked-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Ashutosh Dixit <[email protected]>
Reviewed-by: Umesh Nerlige Ramappa <[email protected]>
Acked-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe: Add timeout to preempt fences

To adhere to dma fencing rules that fences must signal within a
reasonable amount of time, add a 5 second timeout to preempt fences. If
this timeout occurs, kill the associated VM as this fatal to the VM.

v2:
- Add comment for smp_wmb (Checkpatch)
- Fix kernel doc typo (Inspection)
- Add comment for killed check (Niranjana)
v3:
- Drop smp_wmb (Matthew Auld)
- Don't take vm->lock in preempt fence worker (Matthew Auld)
- Drop RB given changes to patch
v4:
- Add WRITE/READ_ONCE (Niranjana)
- Don't export xe_vm_kill (Niranjana)

Cc: Matthew Auld <[email protected]>
Cc: Niranjana Vishwanathapura <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Tested-by: Stuart Summers <[email protected]>
Reviewed-by: Niranjana Vishwanathapura <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/guc: Demote GuC IDs usage message to debug

Printing message at INFO level about available GuC IDs is not that
important, DEBUG level is enough. It will also match message about
available doorbells:

[ ] xe ... [drm:xe_guc_id_mgr_init [xe]] GT0: using 65535 GuC IDs
[ ] xe ... [drm:xe_guc_db_mgr_init [xe]] GT0: using 256 doorbells

While at it, use proper "GuC" name.

Signed-off-by: Michal Wajdeczko <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/xe/bmg: Apply Wa_22019338487

Extend this WA to BMG GT as well. In this case media GT is
not affected. The cap frequencies and max allowed ggtt writes
are different as well. On BMG, we need to do a flush after 1100
GGTT writes, and we need to limit the GT frequency request
to 2133 Mhz during driver load and leave it at that value after
driver unloads.

v3: Fix checkpatch issue

Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Vinay Belgaumkar <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe/guc: Prevent use of uninitialized mutex

When skip_guc_pc is set and/or this is for a VF.

Fixes: 3b1592fb7835 ("drm/xe/lnl: Apply Wa_22019338487")
Signed-off-by: Vinay Belgaumkar <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>

drm/xe/oa: Destroy the stream_lock mutex

The mutex allocated in xe_oa_stream_init() was never previously
destroyed. Do so now.

Fixes: e936f885f1e9 ("drm/xe/oa/uapi: Expose OA stream fd")
Cc: Michal Wajdeczko <[email protected]>
Signed-off-by: Ashutosh Dixit <[email protected]>
Reviewed-by: Umesh Nerlige Ramappa <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]