Matt Roper [Thu, 1 Jun 2023 21:52:30 +0000 (14:52 -0700)]
drm/xe: Drop extra_gts[] declarations and XE_GT_TYPE_REMOTE
Now that tiles and GTs are handled separately, extra_gts[] doesn't
really provide any useful information that we can't just infer directly.
The primary GT of the root tile and of the remote tiles behave the same
way and don't need independent handling.
When we re-add support for media GTs in a future patch, the presence of
media can be determined from MEDIA_VER() (i.e., >= 13) and media's GSI
offset handling is expected to remain constant for all foreseeable future
platforms, so it won't need to be provided in a definition structure
either.
Matt Roper [Thu, 1 Jun 2023 21:52:28 +0000 (14:52 -0700)]
drm/xe: Clarify 'gt' retrieval for primary tile
There are a bunch of places in the driver where we need to perform
non-GT MMIO against the platform's primary tile (display code, top-level
interrupt enable/disable, driver initialization, etc.). Rename
'to_gt()' to 'xe_primary_mmio_gt()' to clarify that we're trying to get
a primary MMIO handle for these top-level operations.
In the future we need to move away from xe_gt as the target for MMIO
operations (most of which are completely unrelated to GT).
v2:
- s/xe_primary_mmio_gt/xe_root_mmio_gt/ for more consistency with how
we refer to tile 0. (Lucas)
v3:
- Tweak comment on xe_root_mmio_gt(). (Lucas)
Matt Roper [Thu, 1 Jun 2023 21:52:27 +0000 (14:52 -0700)]
drm/xe: Move migration from GT to tile
Migration primarily focuses on the memory associated with a tile, so it
makes more sense to track this at the tile level (especially since the
driver was already skipping migration operations on media GTs).
Note that the blitter engine used to perform the migration always lives
in the tile's primary GT today. In theory that could change if media
GTs ever start including blitter engines, but we can extend the design
if/when that happens.
v2:
- Fix kunit test build
- Kerneldoc parameter name update
v3:
- Removed leftover prototype for removed function. (Gustavo)
- Remove unrelated / unwanted error handling change. (Gustavo)
Matt Roper [Thu, 1 Jun 2023 21:52:25 +0000 (14:52 -0700)]
drm/xe: Memory allocations are tile-based, not GT-based
Since memory and address spaces are a tile concept rather than a GT
concept, we need to plumb tile-based handling through lots of
memory-related code.
Note that one remaining shortcoming here that will need to be addressed
before media GT support can be re-enabled is that although the address
space is shared between a tile's GTs, each GT caches the PTEs
independently in their own TLB and thus TLB invalidation should be
handled at the GT level.
Matt Roper [Thu, 1 Jun 2023 21:52:23 +0000 (14:52 -0700)]
drm/xe: Move VRAM from GT to tile
On platforms with VRAM, the VRAM is associated with the tile, not the
GT.
v2:
- Unsquash the GGTT handling back into its own patch.
- Fix kunit test build
v3:
- Tweak the "FIXME" comment to clarify that this function will be
completely gone by the end of the series. (Lucas)
v4:
- Move a few changes that were supposed to be part of the GGTT patch
back to that commit. (Gustavo)
v5:
- Kerneldoc parameter name fix.
Matt Roper [Thu, 1 Jun 2023 21:52:19 +0000 (14:52 -0700)]
drm/xe: Move register MMIO into xe_tile
Each tile has its own register region in the BAR, containing instances
of all registers for the platform. In contrast, the multiple GTs within
a tile share the same MMIO space; there's just a small subset of
registers (the GSI registers) which have multiple copies at different
offsets (0x0 for primary GT, 0x380000 for media GT). Move the register
MMIO region size/pointers to the tile structure, leaving just the GSI
offset information in the GT structure.
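As a rough illustration of the layout described in "Move register MMIO into xe_tile", the sketch below computes the offset of a register inside a tile's MMIO region, adding the per-GT GSI offset only for GSI registers. The struct and function names are hypothetical stand-ins, not the driver's actual definitions, and the explicit is_gsi flag is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct sketch_mmio_tile {
	uint8_t *regs;   /* start of this tile's register region in the BAR */
	size_t size;     /* size of that region */
};

struct sketch_mmio_gt {
	struct sketch_mmio_tile *tile;
	uint32_t gsi_offset;   /* 0x0 for the primary GT, 0x380000 for the media GT */
};

/*
 * All GTs of a tile share the tile's MMIO space. Only GSI registers
 * exist in multiple copies, so only they get the per-GT offset applied.
 */
static size_t sketch_mmio_reg_offset(const struct sketch_mmio_gt *gt,
				     uint32_t reg, bool is_gsi)
{
	return is_gsi ? (size_t)reg + gt->gsi_offset : (size_t)reg;
}
```

With this split, the tile owns the mapping and the GT only contributes a constant that shifts its GSI accesses.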
Matt Roper [Thu, 1 Jun 2023 21:52:18 +0000 (14:52 -0700)]
drm/xe: Add for_each_tile iterator
As we start splitting tile handling out from GT handling, we'll need to
be able to iterate over tiles separately from GTs. This iterator will
be used in upcoming patches.
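A tile iterator of the kind introduced by "Add for_each_tile iterator" could look roughly like the sketch below. The device and tile structs are hypothetical, and the macro is a simplified userspace approximation rather than the driver's actual definition.

```c
#define DEMO_MAX_TILES 2

/* Hypothetical minimal tile and device types. */
struct demo_tile {
	int id;
};

struct demo_device {
	int tile_count;
	struct demo_tile tiles[DEMO_MAX_TILES];
};

/*
 * Walk all tiles of a device. The assignment inside the condition only
 * runs while id__ is in range, so tile__ never points past the array.
 */
#define demo_for_each_tile(tile__, xe__, id__) \
	for ((id__) = 0; \
	     (id__) < (xe__)->tile_count && \
	     ((tile__) = &(xe__)->tiles[(id__)]); \
	     (id__)++)

/* Example user: number the tiles and report how many were visited. */
static int demo_count_tiles(struct demo_device *xe)
{
	struct demo_tile *tile;
	int id, count = 0;

	demo_for_each_tile(tile, xe, id) {
		tile->id = id;
		count++;
	}

	return count;
}
```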
Matt Roper [Thu, 1 Jun 2023 21:52:15 +0000 (14:52 -0700)]
drm/xe: Introduce xe_tile
Create a new xe_tile structure to begin separating the concept of "tile"
from "GT." A tile is effectively a complete GPU, and a GT is just one
part of that. On platforms like MTL, there's only a single full GPU
(tile) which has its IP blocks provided by two GTs. In contrast, a
"multi-tile" platform like PVC is basically multiple complete GPUs
packed behind a single PCI device.
For now, just create xe_tile as a simple wrapper around xe_gt. The
items in xe_gt that are truly tied to the tile rather than the GT will
be moved in future patches. Support for multiple GTs per tile (i.e.,
the MTL standalone media case) will also be re-introduced in a future
patch.
v2:
- Fix kunit test build
- Move hunk from next patch to use local tile variable rather than
direct xe->tiles[id] accesses. (Lucas)
- Mention compute in kerneldoc. (Rodrigo)
Matt Roper [Thu, 1 Jun 2023 21:52:14 +0000 (14:52 -0700)]
drm/xe/mtl: Disable media GT
Xe incorrectly conflates the concept of 'tile' and 'GT.' Since MTL's
media support is not yet functioning properly, let's just disable it
completely for now while we fix the fundamental driver design. Support
for media GTs on platforms like MTL will be re-added later.
v2:
- Drop some unrelated code cleanup that didn't belong in this patch.
(Lucas)
v3:
- Drop unnecessary xe_gt.h include. (Gustavo)
Matthew Auld [Thu, 1 Jun 2023 12:35:05 +0000 (13:35 +0100)]
drm/xe/vm: fix double list add
It looks like the driver only wants to track one vma for each external
object per vm. However it looks like bo_has_vm_references_locked() will
ignore any vma that is marked as vma->destroyed (not actually destroyed
yet). If we then mark our externally tracked vma as destroyed and then
create a new vma for the same object and vm, we can have two externally
tracked vma for the same object and vm. When the destroy actually
happens it tries to move the external tracking to a different vma, but
in this case it is already being tracked, leading to double list add
errors. It should be safe to simply drop the destroyed check in
bo_has_vm_references(), since the actual destroy will switch the
external tracking to the next available vma.
__emit_job_gen12_render_compute() masks some PIPE_CONTROL bits that
do not exist on platforms without a render engine.
Replace the PVC check with something more generic that will
support any future platforms without a render engine.
When creating page tables from xe_exec_ioctl, we may end up freeing
memory we just validated. To be certain this does not happen, do not
allow the current reservation to be evicted from the ioctl.
Matthew Auld [Wed, 24 May 2023 17:56:54 +0000 (18:56 +0100)]
drm/xe: keep pulling mem_access_get further back
Lockdep is unhappy about ggtt->lock -> runtime_pm, where it seems
to think this can somehow get inverted. The ggtt->lock looks like a
potentially sensitive driver lock, so likely a sensible move to never
call the runtime_pm routines while holding it. Actually it looks like
d3cold wants to grab this, so perhaps this can indeed deadlock.
v2:
- Don't forget about xe_gt_tlb_invalidation_vma(), which now needs
explicit access_get.
Matthew Auld [Wed, 24 May 2023 17:56:53 +0000 (18:56 +0100)]
drm/xe: don't allocate under ct->lock
Seems to be a sensitive lock, where ct->lock looks to be primed with
fs_reclaim, so holding that and then allocating memory will cause
lockdep to complain. We need to change the ordering wrt grabbing the
ct->lock and potentially grabbing the runtime_pm, since some of the
runtime_pm routines can allocate memory (or at least that's what lockdep
seems to suggest).
Matthew Auld [Thu, 25 May 2023 11:45:43 +0000 (12:45 +0100)]
drm/xe/migrate: retain CCS aux state for vram -> vram
There is no mention that migrate_copy() will skip copying the CCS aux
state for all types of vram -> vram transfers. Currently we don't need
such a facility but it might be surprising if we ever do.
v2: (Lucas):
- s/lmem/vram/ in the commit message
- Tidy up the code a bit; use one emit_copy_ccs()
v3:
- Reword the commit message
Thomas Hellström [Wed, 24 May 2023 16:52:29 +0000 (16:52 +0000)]
drm/xe: Support copying of data between system memory bos
Modify the xe_migrate_copy() function somewhat to explicitly allow
copying of data between two buffer objects including system memory
buffer objects. Update the migrate test accordingly.
v2:
- Check that buffer object sizes match when copying (Matthew Auld)
Lucas De Marchi [Fri, 26 May 2023 16:43:58 +0000 (09:43 -0700)]
drm/xe/guc: Port Wa_14014475959 to xe_wa and fix it
Port Wa_14014475959 to xe_wa fixing its condition. The workaround should
only be applied on the primary GT, not on media. So just checking by
MTL platform is not enough: checking GT is of the right type is also
needed.
Since GRAPHICS_STEP() already checks the GT type, we could leave the
first check as a platform one: it would be easier to understand and
not go out of sync with the graphics_ip_map[] in
drivers/gpu/drm/xe/xe_pci.c. However it also means that new platforms
using the same IP wouldn't match. Prefer using the IP version.
Lucas De Marchi [Fri, 26 May 2023 16:43:57 +0000 (09:43 -0700)]
drm/xe/rtp: Also check gt type
When running rules on MTL and beyond that have media as a standalone GT,
the rule should only match if the gt passed as parameter matches the
version/range/stepping that the rule is checking. This allows
workarounds affecting only the media GT to be applied only on that GT
and vice-versa.
For platforms before MTL, the GT will not be of media type, even if it
includes media engines. Make sure to cover that case by checking if the
platform has standalone media.
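The matching behaviour described in "Also check gt type" can be sketched as follows. This is only an illustration under assumed names: the platform struct, the GT type enum and both helpers are hypothetical, and the real RTP rule evaluation covers more rule kinds than these two.

```c
#include <stdbool.h>

enum sketch_rtp_gt_type { SKETCH_RTP_GT_MAIN, SKETCH_RTP_GT_MEDIA };

/* Hypothetical platform description. */
struct sketch_rtp_platform {
	unsigned int graphics_verx100;
	unsigned int media_verx100;
	bool has_standalone_media;   /* true for MTL and beyond */
};

/* A graphics rule must not match on a standalone media GT. */
static bool sketch_rtp_match_graphics(const struct sketch_rtp_platform *p,
				      enum sketch_rtp_gt_type gt,
				      unsigned int ver)
{
	return p->graphics_verx100 == ver &&
	       (!p->has_standalone_media || gt != SKETCH_RTP_GT_MEDIA);
}

/*
 * A media rule matches on the media GT, or on any GT when the platform
 * has no standalone media (the GT then also contains the media engines).
 */
static bool sketch_rtp_match_media(const struct sketch_rtp_platform *p,
				   enum sketch_rtp_gt_type gt,
				   unsigned int ver)
{
	return p->media_verx100 == ver &&
	       (!p->has_standalone_media || gt == SKETCH_RTP_GT_MEDIA);
}
```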
Lucas De Marchi [Fri, 26 May 2023 16:43:55 +0000 (09:43 -0700)]
drm/xe/guc: Port Wa_16015675438/Wa_18020744125 to xe_wa
Wa_16015675438 and Wa_18020744125 apply to DG2 using the same action and
conditions. Add both to the oob rules so they are both reported as
active. Note that previously they were not checking by platform or IP
version, hence making them not future-proof. Those workarounds should
only be active in PVC and DG2, besides the check for "no render engine".
v2: From current WA database, Wa_16015675438 applies to all DG2
subplatforms except G11. Migrate condition to use subplatform and
remove G11 from the match (Matt Roper)
Lucas De Marchi [Fri, 26 May 2023 16:43:54 +0000 (09:43 -0700)]
drm/xe/guc: Port Wa_22012727170/Wa_22012727685 to xe_wa
Wa_22012727170 and Wa_22012727685 apply to DG2 using the same action and
conditions. Add both to the oob rules so they are both reported as
active.
Do not apply Wa_22012727170 to PVC and MTL since only early A* steppings are
affected.
v2: Remove DG2_G10 from Wa_22012727685 to match current WA database
(Matt Roper)
v3: GRAPHICS_STEP(A0, FOREVER) can be left alone for DG2 as this means
all steppings
Lucas De Marchi [Fri, 26 May 2023 16:43:52 +0000 (09:43 -0700)]
drm/xe/guc: Port Wa_14012197797/Wa_22011391025 to xe_wa
Wa_14012197797 and Wa_22011391025 apply to DG2 using the same action.
They apply to slightly different conditions. Add both to the oob rules
so they are both reported as active.
Lucas De Marchi [Fri, 26 May 2023 16:43:50 +0000 (09:43 -0700)]
drm/xe/guc: Port Wa_22012773006 to xe_wa
Let xe_guc.c start using XE_WA() for workarounds, starting from a simple
one: Wa_22012773006. It's also changed to start with graphics version
12, since that is the first supported by xe.
Lucas De Marchi [Fri, 26 May 2023 16:43:49 +0000 (09:43 -0700)]
drm/xe: Add support for OOB workarounds
There are WAs that, due to their nature, cannot be applied from a
central place like xe_wa.c. Those are peppered around the rest of the
code, as needed. Now they have a new name: "out-of-band workarounds".
These workarounds have their names and rules still grouped in xe_wa.c,
inside the xe_wa_oob array, which is generated at compile time by
xe_wa_oob.rules and the hostprog xe_gen_wa_oob. The code generation
guarantees that the header xe_wa_oob.h contains the IDs for the
workarounds that match the index in the table. This way the runtime
checks that are spread throughout the code are simple tests against the
bitmap saved during initialization.
v2: Fix prev_name tracking not working when it's empty, i.e. when there
is more than 1 continuation rule.
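The runtime side of the OOB workaround scheme from "Add support for OOB workarounds" boils down to testing a bit in a bitmap that was filled in at initialization. The sketch below shows that idea in isolation. The enum entries, struct and helper names are hypothetical; in the driver the ID list is generated at build time rather than written by hand.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical IDs; each one doubles as an index into the rules table. */
enum sketch_wa_oob_id {
	SKETCH_WA_FIRST,
	SKETCH_WA_SECOND,
	SKETCH_WA_THIRD,
	SKETCH_WA_OOB_COUNT,
};

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WA_LONGS \
	((SKETCH_WA_OOB_COUNT + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Per-GT record of which OOB workarounds matched during init. */
struct sketch_gt_wa {
	unsigned long oob[SKETCH_WA_LONGS];
};

/* Called while processing the rules table at initialization. */
static void sketch_wa_mark_active(struct sketch_gt_wa *wa,
				  enum sketch_wa_oob_id id)
{
	wa->oob[id / SKETCH_BITS_PER_LONG] |= 1UL << (id % SKETCH_BITS_PER_LONG);
}

/* The cheap check that call sites scattered around the code would use. */
static bool sketch_wa_is_active(const struct sketch_gt_wa *wa,
				enum sketch_wa_oob_id id)
{
	return (wa->oob[id / SKETCH_BITS_PER_LONG] &
		(1UL << (id % SKETCH_BITS_PER_LONG))) != 0;
}
```

Because the IDs match the table indexes, no lookup is needed at the call site: the rule evaluation cost is paid once, and each later check is a single bit test.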
Lucas De Marchi [Fri, 26 May 2023 16:43:48 +0000 (09:43 -0700)]
drm/xe: Include build directory
When doing out-of-tree builds with O= or KBUILD_OUTPUT=, it's important
to also add the directory where the target is saved. Otherwise any file
generated by the build system may not be available for other targets
depending on it.
The $(obj) is added automatically when building the entire kernel,
but it's not added when M=drivers/gpu/drm/xe is passed.
Lucas De Marchi [Fri, 26 May 2023 16:43:47 +0000 (09:43 -0700)]
drm/xe/rtp: Add support for entries with no action
Add a separate struct to hold entries in a table that has no action
associated with each of them. The goal is that the caller in future can
set a per-context callback, or just use the active entry marking
feature.
Lucas De Marchi [Fri, 26 May 2023 16:43:46 +0000 (09:43 -0700)]
drm/xe/rtp: Add check for media stepping
Start differentiating the media and graphics stepping as it will be
important for MTL. Note that RTP is still not prepared to handle the
different types of GT, i.e. checking for graphics version/range/stepping
on a media gt or vice versa still matches regardless of the gt being
passed as parameter. Changing it to accommodate MTL is left for a future
patch.
Lucas De Marchi [Fri, 26 May 2023 16:43:45 +0000 (09:43 -0700)]
drm/xe/rtp: Rename STEP to GRAPHICS_STEP
Rename the RTP match in order to prepare the code base to check for the
media version. Up until MTL, the graphics vs media distinction wrt
stepping was not very relevant as they were the same GT. However, with
MTL this is no longer true.
Lucas De Marchi [Fri, 26 May 2023 16:43:44 +0000 (09:43 -0700)]
drm/xe/debugfs: Dump active workarounds
Add a "workarounds" node in debugfs that can dump all the active
workarounds using the information recorded by rtp infra when those
workarounds were processed.
Lucas De Marchi [Fri, 26 May 2023 16:43:43 +0000 (09:43 -0700)]
drm/xe/wa: Track gt/engine/lrc active workarounds
Allocate the data to track workarounds on each gt of the device,
and pass that to RTP so the active workarounds are tracked.
Even if the workarounds available until now are mostly device
or platform centric, with the different IP versions for media and
graphics starting with MTL, it's possible that some workarounds
need to be applied only on select GTs. Also, given the workaround
database is per IP block, for tracking purposes there is no need to
differentiate the workarounds per engine class. Hence the bitmask
to track active workarounds can be tracked per GT.
v2: Move the tracking from per-device to per-GT basis (Matt Roper)
Lucas De Marchi [Fri, 26 May 2023 16:43:42 +0000 (09:43 -0700)]
drm/xe/rtp: Allow to track active workarounds
Add the metadata in struct xe_rtp_process_ctx, to be set by
xe_rtp_process_ctx_enable_active_tracking(), so rtp knows how to mark
the active entries while processing the table. This can be used by the
WA infra to record what are the active workarounds.
Lucas De Marchi [Fri, 26 May 2023 16:43:41 +0000 (09:43 -0700)]
drm/xe/rtp: Add "_sr" to entry/function names
The xe_rtp_process() function and xe_rtp_entry depend on the
save-restore struct. In future it will be desired to process rtp rules,
regardless of adding them to a save-restore. Rename the struct and
function so the intent is clear and the name is freed for future uses.
Lucas De Marchi [Fri, 26 May 2023 16:43:39 +0000 (09:43 -0700)]
drm/xe/rtp: Split rtp process initialization
The selection between hwe and gt is exposed to the outside of rtp, by
the xe_rtp_process() function. However it doesn't make sense from the
caller's point of view to pass a hwe and a gt as argument since the gt
should always be the one containing the hwe.
This clarifies the interface by separating the context creation into an
initializer. The initializer then passes the correct value and there
should never be a case with hwe and gt set: when hwe is passed, the gt
is the one containing it. Internally the functions continue receiving
the argument separately.
v2: Leave the device-only context to a separate patch if they are indeed
needed later
Lucas De Marchi [Fri, 26 May 2023 16:43:38 +0000 (09:43 -0700)]
drm/xe: Fix Wa_22011802037 annotation
It was missing one digit, so not showing up as a proper WA number. Add
the missing digit and annotate it with a FIXME as there is more to be
implemented to consider this WA done: ensure CS is stopped before doing a
reset, wait for pending.
Also, this WA applies to platforms up to graphics version 1270 (with the
exception of MTL A*, that are not supported in xe). Fix platform check.
Matt Roper [Wed, 24 May 2023 19:26:35 +0000 (12:26 -0700)]
drm/xe/pvc: Don't try to invalidate AuxCCS TLB
Generally !has_flatccs implies that a platform has AuxCCS compression
and thus needs to invalidate the AuxCCS TLB. However PVC is a special
case because it has no compression of either type (FlatCCS or AuxCCS)
so we should avoid writing to non-existent AuxCCS registers.
Pad the uAPI definition so that it would align identically between
64-bit and 32-bit uarch, so consumers using this header will work
correctly from 32-bit compat userspace on a 64-bit kernel. Do it
in a minimally invasive way, so that 64-bit userspace will still
work with the previous header, and so that no fields suddenly
change sizes.
The iommu_dma_map_sg() function ensures iova allocation doesn't
cross dma segment boundary. It does so by padding some sg elements.
This can cause overflow, ending up with sg->length being set to 0.
Avoid this by halving the maximum segment size (rounded down to
PAGE_SIZE).
Specify maximum segment size for sg elements by using
sg_alloc_table_from_pages_segment() to allocate sg_table.
v2: Use correct max segment size in dma_set_max_seg_size() call
Matt Roper [Wed, 24 May 2023 18:59:52 +0000 (11:59 -0700)]
drm/xe: Add stepping support for GMD_ID platforms
For platforms with GMD_ID registers, the IP stepping should be
determined from the 'revid' field of those registers rather than from
the PCI revid.
The hardware teams have indicated that they plan to keep the revid =>
stepping mapping consistent across all GMD_ID platforms, with major
steppings (A0, B0, C0, etc.) having revids that are multiples of 4, and
minor steppings (A1, A2, A3, etc.) taking the intermediate values. For
now we'll trust that hardware follows through on this plan; if they have
to change direction in the future (e.g., they wind up needing something
like an "A4" that doesn't fit this scheme), we can add a GMD_ID-based
lookup table when the time comes.
v2:
- Set xe->info.platform before finding stepping; the pre-GMD_ID code
relies on this value to pick a lookup table.
v3:
- Also set xe->info.subplatform before picking the stepping for
pre-GMD_ID lookup.
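The revid scheme described in "Add stepping support for GMD_ID platforms" (major steppings at multiples of 4, minor steppings in between) can be decoded arithmetically. The sketch below is a standalone illustration with hypothetical names; it does not reproduce the driver's stepping enum or any range checks it may apply.

```c
/* Hypothetical decoded stepping: a letter plus a minor number. */
struct sketch_step {
	char major;           /* 'A', 'B', 'C', ... */
	unsigned int minor;   /* 0..3 */
};

/*
 * revid 0, 4, 8, ... are A0, B0, C0, ...; the values in between are
 * the minor steppings of the preceding major one. Only meaningful for
 * small revid values, since the letter is not range-checked here.
 */
static struct sketch_step sketch_revid_to_step(unsigned int revid)
{
	struct sketch_step step = {
		.major = (char)('A' + revid / 4),
		.minor = revid % 4,
	};

	return step;
}
```

A hypothetical "A4" would not fit this encoding, which is the case the commit message says would require a lookup table instead.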
Gustavo Sousa [Thu, 18 May 2023 21:56:51 +0000 (18:56 -0300)]
drm/xe: Fail xe_device_create() if wq allocation fails
Let's make sure we give the driver a valid workqueue.
While at it, also make sure to call destroy_workqueue() only if the
workqueue is a valid one. That is necessary because xe_device_destroy()
is indirectly called as part of the cleanup process of a failed
xe_device_create().
Gustavo Sousa [Thu, 18 May 2023 21:56:50 +0000 (18:56 -0300)]
drm/xe: Call drmm_add_action_or_reset() early in xe_device_create()
Otherwise no cleanup is actually done if we branch to err_put.
This works for now: currently we do know that, once inside
xe_device_destroy(), ttm_device_init() was successful so we can safely
call ttm_device_fini(); and, for xe->ordered_wq, there is an upcoming
commit to check its value before calling destroy_workqueue().
However, we might need to change this in the future if we have more
initializers called that can fail in a way that we cannot know which
one it was once inside xe_device_destroy().
Rodrigo Vivi [Tue, 16 May 2023 14:54:16 +0000 (10:54 -0400)]
drm/xe: Limit CONFIG_DRM_XE_SIMPLE_ERROR_CAPTURE to itself.
There are multiple kinds of config prints and with the upcoming
devcoredump there will be another layer. Let's limit the config
to the top level functions and leave the clean-up work for the
compilers so we don't create a spider-web of configs.
No functional change. Just a preparation for the devcoredump.
Rodrigo Vivi [Tue, 16 May 2023 14:54:15 +0000 (10:54 -0400)]
drm/xe: Add HW Engine snapshot to xe_devcoredump.
Let's continue to add our existing simple logs to devcoredump one
by one. Any format change should come on follow-up work.
v2: remove unnecessary, and now duplicated, dma_fence annotation. (Matthew)
v3: avoid for_each with faulty_engine since that can be already freed at
the time of the read/free. Instead, iterate in the full array of
hw_engines. (Kasan)
Rodrigo Vivi [Tue, 16 May 2023 14:54:14 +0000 (10:54 -0400)]
drm/xe: Convert Xe HW Engine print to snapshot capture and print.
The goal is to allow for a snapshot capture to be taken at the time
of the crash, while the print out can happen at a later time through
the exposed devcoredump virtual device.
v2: Addressing these Matthew comments:
- Handle memory allocation failures.
- Do not use GFP_ATOMIC on cases like debugfs prints.
- placement of @reg doc.
- indentation issues.
v3: checkpatch
v4: Rebase and get back to GFP_ATOMIC only.
Rodrigo Vivi [Tue, 16 May 2023 14:54:12 +0000 (10:54 -0400)]
drm/xe: Convert GuC Engine print to snapshot capture and print.
The goal is to allow for a snapshot capture to be taken at the time
of the crash, while the print out can happen at a later time through
the exposed devcoredump virtual device.
v2: Handle memory allocation failures. (Matthew)
Do not use GFP_ATOMIC on cases like debugfs prints. (Matthew)
v3: checkpatch
v4: pending_list allocation needs to be atomic because of the
spin_lock. (Matthew)
get back to GFP_ATOMIC only. (lockdep).
Rodrigo Vivi [Tue, 16 May 2023 14:54:09 +0000 (10:54 -0400)]
drm/xe: Convert GuC CT print to snapshot capture and print.
The goal is to allow for a snapshot capture to be taken at the time
of the crash, while the print out can happen at a later time through
the exposed devcoredump virtual device.
v2: Handle memory allocation failures. (Matthew)
Do not use GFP_ATOMIC on cases like debugfs prints. (Matthew)
v3: checkpatch fixes
v4: Do not use atomic in the g2h_worker_func (Matthew)
Rodrigo Vivi [Tue, 16 May 2023 14:54:08 +0000 (10:54 -0400)]
drm/xe: Extract non mapped regions out of GuC CTB into its own struct.
No functional change here. The goal is to have a clear split between
the mapped portions of the CTB and the static information, so we can
easily capture snapshots that will be used for later read out with
the devcoredump infrastructure.
Rodrigo Vivi [Tue, 16 May 2023 14:54:07 +0000 (10:54 -0400)]
drm/xe: Do not take any action if our device was removed.
Unfortunately devcoredump infrastructure does not provide an
interface for us to force the device removal upon the pci_remove
time of our device.
The devcoredump is linked at the device level, so when in use
it will prevent the module removal, but it doesn't prevent the
call of the pci_remove callback. This callback cannot fail
anyway and we end up clearing and freeing the entire pci device.
Hence, after we removed the pci device, we shouldn't allow any
read or free operations to avoid segmentation fault.
Rodrigo Vivi [Thu, 18 May 2023 21:12:39 +0000 (17:12 -0400)]
drm/xe: Introduce the dev_coredump infrastructure.
The goal is to use devcoredump infrastructure to report error states
captured at the crash time.
The error state will contain useful information for GPU hang debug, such
as INSTDONE registers and the current buffers getting executed, as well
as any other information that helps user space and allows later replays of
the error.
The proposal here is to avoid a Xe only error_state like i915 and use
a standard dev_coredump infrastructure to expose the error state.
For our own case, the data is only useful if it is a snapshot of the
time when the GPU crash has happened, since we reset the GPU immediately
after and the registers might have changed. So the proposal here is to
have an internal snapshot to be printed out later.
Also, usually a subsequent GPU hang is only a consequence of the initial
one. So we only save the 'first' hang. The dev_coredump has a delayed
work queue where it removes the coredump and frees all the data within a
few moments of the error. When that happens we also reset our capture
state and allow further snapshots.
Right now this infra only prints out the time of the hang. More information
will be migrated here on subsequent work. Also, in order to organize the
dump better, the goal is to propose dev_coredump changes itself to allow
multiple files and different controls. But for now we start Xe usage of
it without any dependency on dev_coredump core changes.
v2: Add dma_fence annotation for capture that might happen during long
running. (Thomas and Matt)
Use xe->drm.primary->index on drm_info msg. (Jani)
v3: checkpatch fixes
v4: Fix building and locking issues found by Francois.
Actually let's kill all of the locking in here. gt_reset serialization
already guarantees that there will be only one capture at a time.
Also, the devcoredump has its own locking to protect the free and reads
and drivers don't need to duplicate it.
Besides this, the dma_fence locking was pushed to a following patch
since it is not needed in this one.
Fix a use after free identified by KASAN: Do not stash the faulty_engine
since that will be freed somewhere else.
v5: Fix Uptime - ktime_get_boottime actually returns the Uptime. (Francois)
Michal Wajdeczko [Mon, 16 Jan 2023 19:52:57 +0000 (20:52 +0100)]
drm/xe: Introduce GT oriented log messages
While debugging GT related problems, it's good to know which GT was
reporting problems. Introduce helper macros to allow prefixing GT logs
with the GT identifier. We will use them in upcoming patches.
v2: use xe_ prefix (Lucas)
v3: use correct include
Matthew Brost [Sun, 16 Apr 2023 23:14:26 +0000 (16:14 -0700)]
drm/xe: Allow dma-fences as in-syncs for compute / faulting VM
This is allowed and encouraged by the dma-fencing rules. This along with
allowing compute VMs to export dma-fences on binds will result in a
simpler compute UMD.
Gustavo Sousa [Thu, 11 May 2023 19:48:22 +0000 (16:48 -0300)]
drm/xe: Call exit functions when xe_register_pci_driver() fails
Move xe_register_pci_driver() and xe_unregister_pci_driver() to
init_funcs to make sure that exit functions are also called when
xe_register_pci_driver() fails.
Note that this also allows adding init functions to be run after
xe_register_pci_driver().
v2:
- Move functions to init_funcs instead of having a special case for
xe_register_pci_driver(). (Jani)
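The init_funcs approach from "Call exit functions when xe_register_pci_driver() fails" amounts to walking a table of init/exit pairs and unwinding the completed entries in reverse order when one init fails. The sketch below shows that pattern in plain C. All names are hypothetical, and the demo steps at the bottom exist only to exercise the unwinding.

```c
#include <stddef.h>

/* Hypothetical init/exit pair, one per module-level setup step. */
struct sketch_init_funcs {
	int (*init)(void);
	void (*exit)(void);
};

/*
 * Run every init in order. If one fails, call the exit functions of the
 * steps that already succeeded, newest first, and return the error.
 * The exit function of the failing step itself is not called.
 */
static int sketch_run_init(const struct sketch_init_funcs *funcs, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		int err = funcs[i].init();

		if (err) {
			while (i--)
				funcs[i].exit();
			return err;
		}
	}

	return 0;
}

/* Demo steps: count how many exits ran, and flag a wrongly called one. */
static int sketch_exit_calls;

static int sketch_ok_init(void) { return 0; }
static void sketch_ok_exit(void) { sketch_exit_calls++; }
static int sketch_failing_init(void) { return -1; }
static void sketch_failing_exit(void) { sketch_exit_calls += 100; }
```

Putting driver registration into the same table means it needs no special-case error path, and further steps can simply be appended after it.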
Lucas De Marchi [Fri, 12 May 2023 23:36:49 +0000 (16:36 -0700)]
drm/xe: Load HuC on Alderlake P
Alderlake P uses TGL HuC and it was not added together with ADL-S,
because it was failing for unrelated reasons. Now that those are fixed,
allow it to load HuC.
Matt Roper [Wed, 19 Apr 2023 21:37:03 +0000 (14:37 -0700)]
drm/xe/adln: Enable ADL-N
ADL-N is pretty much the same as ADL-P (i.e., Xe_LP graphics + Xe_M
media + Xe_LPD display). However unlike ADL-P, there's no GuC hwconfig
support so the "tgl" GuC firmware should be loaded (i.e., the same
situation as ADL-S).
Matt Roper [Wed, 19 Apr 2023 21:37:02 +0000 (14:37 -0700)]
drm/xe/adlp: Add revid => step mapping
Setup the mapping from PCI revid to IP stepping for ADL-P (and its RPL-P
subplatform) in case this information becomes important for implementing
workarounds.
Matthew Auld [Fri, 5 May 2023 14:49:10 +0000 (15:49 +0100)]
drm/xe: fix tlb_invalidation_seqno_past()
Checking seqno_recv >= seqno looks like it will incorrectly report true
when the seqno has wrapped (not unlikely given
TLB_INVALIDATION_SEQNO_MAX). Calling xe_gt_tlb_invalidation_wait() might
then return before the flush has been completed by the GuC.
Fix this by treating a large negative delta as an indication that the
seqno has wrapped around. Similar to how we treat a large positive delta
as an indication that the seqno_recv must have wrapped around, but in
that case the seqno has likely also signalled.
It looks like we could also potentially make the seqno use the full
32bits as supported by the GuC.
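The wrap handling described in "fix tlb_invalidation_seqno_past()" can be sketched as a pure function of the two sequence numbers. The constant below is an assumed wrap point chosen for illustration, and the function takes seqno_recv as a parameter instead of reading it from the GT, so this is a simplified model rather than the driver's code.

```c
#include <stdbool.h>

/* Assumed wrap point for the sequence numbers (hypothetical value). */
#define SKETCH_SEQNO_MAX 0x100000

/*
 * Has the invalidation identified by seqno completed, given the last
 * seqno received from the GuC?
 */
static bool sketch_seqno_past(int seqno_recv, int seqno)
{
	/* Large negative delta: seqno wrapped, seqno_recv has not caught up. */
	if (seqno - seqno_recv < -(SKETCH_SEQNO_MAX / 2))
		return false;

	/* Large positive delta: seqno_recv wrapped, so seqno already signalled. */
	if (seqno - seqno_recv > (SKETCH_SEQNO_MAX / 2))
		return true;

	return seqno_recv >= seqno;
}
```

A plain seqno_recv >= seqno comparison would report completion in the first case, which is the early return from the wait that the commit message warns about.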
Lucas De Marchi [Mon, 8 May 2023 22:53:19 +0000 (15:53 -0700)]
drm/xe/mmio: Use struct xe_reg
Convert all the callers to deal with xe_mmio_*() using struct xe_reg
instead of plain u32. In a few places there was also a rename
s/reg/reg_val/ when dealing with the value returned so it doesn't get
mixed up with the register address.
With multiple active VMs, under memory pressure, it is possible that
ttm_bo_validate() runs into -EDEADLK in ttm_mem_evict_wait_busy() and
returns -ENOMEM.
Until ttm properly handles locking in such scenarios, the best thing the
driver can do is unwind the lock and retry.
Update xe_exec_begin to retry validating BOs with a timeout upon
-ENOMEM.
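The retry-with-timeout behaviour described above can be sketched as a small loop. This is a userspace approximation under stated assumptions: the validate callback stands in for the whole lock, validate and unwind sequence, the helper names are hypothetical, and a real driver would likely back off between attempts rather than spin.

```c
#include <errno.h>
#include <time.h>

/*
 * Keep retrying while validation reports -ENOMEM and the timeout has
 * not expired. Any other result (success or another error) is returned
 * immediately. The callback is always attempted at least once.
 */
static int sketch_retry_validate(int (*validate)(void *ctx), void *ctx,
				 double timeout_sec)
{
	time_t start = time(NULL);
	int err;

	do {
		err = validate(ctx);
	} while (err == -ENOMEM && difftime(time(NULL), start) < timeout_sec);

	return err;
}

/* Demo callback: fails with -ENOMEM a given number of times, then succeeds. */
static int sketch_flaky_validate(void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return -ENOMEM;
	}

	return 0;
}
```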
With multiple active VMs, under memory pressure, it is possible that
ttm_bo_validate() runs into -EDEADLK in ttm_mem_evict_wait_busy() and
returns -ENOMEM.
Until ttm properly handles locking in such scenarios, the best thing the
driver can do is unwind the lock and retry.
Update preempt worker to retry validating BOs with a timeout upon
-ENOMEM.
Lucas De Marchi [Sat, 29 Apr 2023 06:23:27 +0000 (23:23 -0700)]
drm/xe/guc: Handle RCU_MODE as masked from definition
guc_mmio_regset_write() had a flags field for the registers to be added to the
GuC's regset list. The only register actually using that was RCU_MODE,
but it was setting the flags to a bogus value. From
struct xe_guc_fwif.h,
Cross checking with i915, the only flag to set in RCU_MODE is
GUC_REGSET_MASKED. That can be done automatically from the register, as
long as the definition is correct.
Add the XE_REG_OPTION_MASKED annotation to RCU_MODE and kill the "flags"
field in guc_mmio_regset_write(): guc_mmio_regset_write_one() can decide
that based on the register being passed.
Lucas De Marchi [Thu, 4 May 2023 07:32:45 +0000 (00:32 -0700)]
drm/xe: Fix comment on Wa_22013088509
On i915 the "see comment about Wa_22013088509" referred to the comment
in the graphics version >= 11 branch, where there were more details
about it. From the platforms supported by xe, only PVC needs
Wa_22013088509, but as the comment says, it's simpler to do it for all
platforms as there is no downside. Bring the missing comment over from
i915 and reword it to fit xe better.
Additional programming annotated with Wa_<number> should be reserved to
those that have an official workaround. Just pointing to a bug or
additional reference can be done with something else. Copy what i915
does and refer to it as "hsdes: ....".
Lucas De Marchi [Thu, 27 Apr 2023 18:44:09 +0000 (11:44 -0700)]
drm/xe: Fix media detection for pre-GMD_ID platforms
Reading the GMD_ID register on platforms before that register
became available is not reliable. The assumption was that since the
register was not allocated, it would return 0. But on PVC for example it
returns garbage (or a very specific number), triggering the following
error:
xe 0000:8c:00.0: [drm] *ERROR* Hardware reports unknown media version 1025.55
Fix it by no longer relying on the value returned by that register on
platforms before GMD_ID. Instead this relies on the graphics description
struct being already pre-set on the device: this can only ever be true
for platforms before the GMD_ID support. In that case, GMD_ID is skipped
and the hardcoded values are used. This should also help on early
bring-up in case the GMD_ID returns something not expected and we need to
temporarily hardcode values. With this, PVC doesn't trigger the error
and goes straight to:
Lucas De Marchi [Thu, 27 Apr 2023 22:32:56 +0000 (15:32 -0700)]
drm/xe: Move helper macros to separate header
The macros to handle the RTP tables are very scary, but shouldn't be
used outside of the header adding the infra. Move it to a separate
header and make sure it's only included when it can be.
Lucas De Marchi [Thu, 27 Apr 2023 22:32:55 +0000 (15:32 -0700)]
drm/xe: Plumb xe_reg into WAs, rtp, etc
Now that struct xe_reg and struct xe_reg_mcr are types that can be used
by xe, convert more of the driver to use them. Some notes about the
conversions:
- The RTP tables don't need the MASKED flags anymore in the
actions as that information now comes from the register
definition
- There is no need for the _XE_RTP_REG/_XE_RTP_REG_MCR macros
and the register types on RTP infra: that comes from the
register definitions.
- When declaring the RTP entries, there is no need anymore to
undef XE_REG and friends: the RTP macros deal with removing
the cast where needed due to not being able to use a compound
statement for initialization in the tables
- The index in the reg-sr xarray is the register offset only.
Otherwise we wouldn't catch mistakes such as adding both an
MCR-style and a normal-style register. For that, the register
is now also part of the entry, so the options can be compared
to check for compatible entries.
In order to be able to accomplish this, some improvements are needed on
the RTP macros. Change its implementation to concentrate on "pasting a prefix
to each argument" rather than the more general "call any macro for each
argument". Hopefully this will avoid trying to extend this infra and
making it more complex. With the use of tuples for building the
arguments, it's now possible to pass additional register fields and
use xe_reg in the RTP tables.
xe_mmio_* still need to be converted, from u32 to xe_reg, but that is
left for another change.