drm/xe/guc: Allow CTB G2H processing without G2H IRQ
During early initialization, xe_guc_min_load_for_hwconfig()
successfully enables CTB communication, but at that point we can
only send non-blocking H2G messages. IRQs, including G2H IRQs, are
not yet enabled, so we will not notice any new G2H message sent by
the GuC, including replies to our blocking H2G request messages.
Those successful replies are mandatory for the VF drivers to
continue normal operations.
As an attempt to work around this driver initialization ordering
issue, introduce a special safe-mode CTB worker that periodically
triggers G2H processing, like the original IRQ handler does, while
no MSI/MSIX IRQs are enabled on the driver yet. Once we detect that
IRQs are enabled, we stop this worker.
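The polling idea described above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct ctb_sim`, `process_g2h()` and `safe_mode_tick()` are hypothetical names invented for the sketch, not the driver's actual worker or data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CTB state; not the driver's real structure. */
struct ctb_sim {
	bool irq_enabled;	/* set once MSI/MSIX IRQs are live */
	int pending_g2h;	/* G2H messages waiting to be processed */
	int processed_g2h;	/* G2H messages handled so far */
};

/* Drain all pending G2H messages, as the IRQ handler would. */
static void process_g2h(struct ctb_sim *ct)
{
	ct->processed_g2h += ct->pending_g2h;
	ct->pending_g2h = 0;
}

/*
 * One tick of the safe-mode worker. Returns true if the worker should be
 * re-armed, false once IRQs are enabled and polling can stop.
 */
static bool safe_mode_tick(struct ctb_sim *ct)
{
	if (ct->irq_enabled)
		return false;

	process_g2h(ct);
	return true;
}
```

A caller would re-queue the tick for as long as it returns true; once IRQs are enabled, the regular IRQ handler takes over G2H processing.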
We currently have debugfs support that allows the userspace to initiate
an asynchronous gt reset on command. However, userspace may also wish
to wait for the completion of the gt reset before performing any
additional work. To that end, add a version of the force_reset gt
debugfs function that operates synchronously.
Matthew Brost [Wed, 5 Jun 2024 05:50:41 +0000 (22:50 -0700)]
drm/xe: Do not dereference NULL job->fence in trace points
job->fence is not assigned until xe_sched_job_arm(). Check for
job->fence in xe_sched_job_seqno() so that any usage of this function
(trace points) does not result in a NULL pointer dereference. Also
check job->fence before assigning error in job trace points.
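A minimal userspace sketch of such a guard is shown below. The types `struct job_fence` and `struct job_sim` and the helper `job_seqno()` are hypothetical stand-ins, and returning 0 for an unarmed job is an assumption of this sketch rather than a statement about the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the fence and job objects. */
struct job_fence {
	uint32_t seqno;
};

struct job_sim {
	struct job_fence *fence;	/* NULL until the job is armed */
};

/* Return the fence seqno, or 0 (assumed placeholder) before arming. */
static uint32_t job_seqno(const struct job_sim *job)
{
	return job->fence ? job->fence->seqno : 0;
}
```

With the check in one accessor, every trace point that goes through it is covered without repeating the NULL test at each call site.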
Only a few steps from the GT restart phase are applicable for the
VF drivers, as initialization of PAT, WOPCM, MOCS or CCS mode can
be done only by the native or PF drivers. Use custom GT restart
function if running in VF mode.
The VF drivers can't trigger real GuC firmware reset using GDRST
register, but for the VF drivers it is sufficient to send VF_RESET
message to reset any VF specific state maintained by the GuC.
Use our existing VF bootstrap function as VF_RESET is part of it.
VF drivers can't modify WOPCM registers or upload firmware to
GuC, HuC or GSC. Modify xe_uc initialization functions to skip
those steps if running in the VF mode, or defer to a new custom
helper function that would not include those steps.
Matthew Brost [Mon, 3 Jun 2024 18:18:24 +0000 (11:18 -0700)]
drm/xe: Don't overmap identity VRAM mapping
Overmapping the identity VRAM mapping is triggering hardware bugs on
certain platforms. Use 2M pages for the last unaligned (to 1G) VRAM
chunk.
v2:
- Always use 2M pages for last chunk (Fei Yang)
- break loop when 2M pages are used
- Add assert for usable_size being 2M aligned
v3:
- Fix checkpatch
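The page-size split can be illustrated with a small calculation. The sketch below follows the v2 note (the last chunk always uses 2M pages, everything before it uses 1G pages); `struct identity_plan` and `plan_identity_map()` are hypothetical names, and the real driver writes page-table entries rather than counting pages.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(2) << 20)
#define SZ_1G (UINT64_C(1) << 30)

/* Hypothetical result type: how many pages of each size to map. */
struct identity_plan {
	uint64_t pages_1g;
	uint64_t pages_2m;
};

/*
 * Map all but the last chunk with 1G pages and the last chunk (at most 1G)
 * with 2M pages, so the mapping never extends past usable_size.
 */
static struct identity_plan plan_identity_map(uint64_t usable_size)
{
	struct identity_plan plan = { 0, 0 };

	assert(usable_size % SZ_2M == 0);	/* mirrors the v2 assert */
	if (usable_size == 0)
		return plan;

	plan.pages_1g = (usable_size - 1) / SZ_1G;
	plan.pages_2m = (usable_size - plan.pages_1g * SZ_1G) / SZ_2M;
	return plan;
}
```

For example, 2G + 6M of usable VRAM yields two 1G pages and three 2M pages, while exactly 2G yields one 1G page and 512 2M pages; in both cases the mapped total equals the usable size.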
Andrzej Hajda [Wed, 5 Jun 2024 07:29:48 +0000 (09:29 +0200)]
drm/xe: flush engine buffers before signalling user fence on all engines
Tests show that user fence signalling requires some kind of write
barrier, otherwise not all writes performed by the workload will be
available to userspace. This is already done for render and compute;
we need it also for the rest: video, gsc, copy.
Signaling the user fence after the seqno write does not seem to be a
good solution. Instead of changing the order, a separate barrier should
be put before the user fence; this will be done in a separate patch.
v2: added fixes tag in case reverted patch gets backported to stable
Jani Nikula [Thu, 23 May 2024 13:37:06 +0000 (16:37 +0300)]
drm/xe: drop redundant W=1 warnings from Makefile
Since commit a61ddb4393ad ("drm: enable (most) W=1 warnings by default
across the subsystem"), most of the extra warnings in the driver
Makefile are redundant. Remove them.
Note that -Wmissing-declarations and -Wmissing-prototypes are always
enabled by default in scripts/Makefile.extrawarn.
Tejas Upadhyay [Mon, 3 Jun 2024 10:49:50 +0000 (16:19 +0530)]
drm/xe/xe2lpm: Add permanent Wa_14020756599
For xe2_lpm Wa_14020756599 is applied to all steppings and
when RCS is present on graphics GT.
V5(MattR):
- Add more comments about new API
V4:
- Make it part of lrc wa
- Check for RCS as rtp rule
V3(MattR):
- Rename rtp api name
- Use MEDIA_VERx100
V2:
- Remove engine filter video decode
- Fix typo GRAPHICS/MEDIA/s - Himal
Michal Wajdeczko [Mon, 27 May 2024 11:54:08 +0000 (13:54 +0200)]
drm/xe/pf: Update the LMTT when freeing VF GT config
The LMTT must be updated whenever we change the VF LMEM configuration.
We missed that step when freeing the whole VF GT config, which could
result in stale PTE in LMTT or LMTT PT object leaks. Fix that.
Michal Wajdeczko [Thu, 30 May 2024 11:58:14 +0000 (13:58 +0200)]
drm/xe: Split MCR initialization
The initialization order of GT topology, MCR, PAT and GuC HWconfig
as done today by native/PF driver, can't be followed as-is by the
VF driver, since fuse registers used in GT topology discovery will
be obtained by the VF driver from the GuC in HWconfig step.
While native/PF drivers need to program the HW PAT table prior to
loading the GuC, this requires only multicast writes support from
the MCR code, which could be initialized separately from the full
MCR support that requires the GT topology to set up steering data.
Split MCR initialization into two steps to avoid introducing VF
specific code paths. This also fixes duplicated spin_lock inits.
Michal Wajdeczko [Thu, 30 May 2024 13:35:27 +0000 (15:35 +0200)]
drm/xe/vf: Setup VRAM based on received config data
VF drivers will obtain VRAM configuration from the GuC as part of
the VF self config. Use that configuration instead of trying to
read inaccessible registers.
Michal Wajdeczko [Thu, 30 May 2024 13:35:26 +0000 (15:35 +0200)]
drm/xe: Promote VRAM initialization function to own file
There is no point in mixing register access and VRAM code in the
same file. Move and rename the VRAM probe function to a new file
(there are no changes other than the new simple kernel-doc).
Michal Wajdeczko [Thu, 30 May 2024 13:35:25 +0000 (15:35 +0200)]
drm/xe: Drop xe_ prefix from static functions in xe_mmio.c
Rename static functions to align with our typical coding style.
While at it, downgrade the existing kernel-doc for an internal
function to a normal comment.
Michal Wajdeczko [Thu, 30 May 2024 13:35:24 +0000 (15:35 +0200)]
drm/xe: Move BAR definitions to dedicated file
We should keep all hardware definitions separated from the driver
code. Move the LMEM_BAR definition to a new regs/xe_bars.h file and also
add a GTTMMADR_BAR definition there to avoid using a magic 0 resource.
The gt_remove function was explicitly added as part of the remove flow
instead of using drmm/devm automatic cleanup due to it being illegal
to remove a component after the driver has been detached from the pci
device; the GSC proxy component is removed as part of gt_remove, so we
need to do it in the pci cleanup flow. The function already has a
comment above it to explain this.
Note that the change to use the devm also caused an invalid pointer
deref in the gsc_proxy unbind function, but I didn't bother to debug
which pointer was bad since we shouldn't be calling the unbind that
late anyway and this revert fixes it.
Both issues were not seen in CI because the GSC loading is temporarily
disabled due to a critical bug, which means we're not binding the
component.
Arnd Bergmann [Tue, 28 May 2024 13:32:36 +0000 (15:32 +0200)]
drm/xe: replace format-less snprintf() with strscpy()
Using snprintf() with a format string from task->comm is a bit
dangerous since the string may be controlled by unprivileged
userspace:
drivers/gpu/drm/xe/xe_devcoredump.c: In function 'devcoredump_snapshot':
drivers/gpu/drm/xe/xe_devcoredump.c:184:9: error: format not a string literal and no format arguments [-Werror=format-security]
184 | snprintf(ss->process_name, sizeof(ss->process_name), process_name);
| ^~~~~~~~
In this case there is no reason for an snprintf(), so use a simpler
string copy.
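strscpy() is a kernel-only helper, so the userspace sketch below uses a hypothetical `bounded_copy()` to show the same idea: the source is copied as data and is never interpreted as a format string. The return convention (length on success, -1 on truncation) is an assumption of the sketch, loosely modelled on strscpy() reporting truncation as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst (capacity size), always NUL-terminating when size > 0.
 * Returns the copied length, or -1 if the string had to be truncated.
 * The bytes of src are copied verbatim, so '%' has no special meaning.
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -1;

	len = strlen(src);
	if (len >= size) {
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1;
	}

	memcpy(dst, src, len + 1);
	return (long)len;
}
```

A process name such as "%s%n%x" is copied verbatim here, whereas passing it as the format argument of snprintf() would make the library parse its conversion specifiers.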
Decouple xe_lrc from xe_exec_queue and reference count xe_lrc.
Removing hard coupling between xe_exec_queue and xe_lrc allows
flexible design where the user interface xe_exec_queue can be
destroyed independent of the hardware/firmware interface xe_lrc.
Do not hold xef->exec_queue.lock mutex while parsing the xarray
xef->exec_queue.xa in xe_file_close() as it is not needed and
will cause an unwanted dependency between this lock and the vm->lock.
This lock protects the exec queue lookup and reference taking which
doesn't apply to this code path. When FD is closing, IOCTLs presumably
can't be modifying the xarray.
v2: Update commit text (Matt Brost)
v3: Add more code comment (Rodrigo Vivi)
v4: Further expand code comment (Rodrigo Vivi)
Michal Wajdeczko [Tue, 21 May 2024 14:22:56 +0000 (16:22 +0200)]
drm/xe: Drop undesired prefix from the platform name
We don't have to use exact names of the enumerators as the
potentially user-facing platform names. When constructing
platform descriptor fields, use the unique platform tag and
add the XE_ prefix only to the generated enum field.
John Harrison [Fri, 24 May 2024 20:26:03 +0000 (13:26 -0700)]
drm/xe/guc: Fix uninitialised count in GuC load debug prints
The debug prints about how long the GuC load takes have a loop
counter. However that was neither initialised nor incremented! Plus,
counting loops is no longer meaningful given the wait function returns
early for any change in the status value. So fix it to only count
loops due to actual timeouts.
Riana Tauro [Fri, 24 May 2024 07:09:16 +0000 (12:39 +0530)]
drm/xe: Enable Coarse Power Gating
Enable power gating for all units and sub-pipes that
are disabled by default.
v2: change the init function name
use symmetric calls for enable/disable pg
re-phrase commit message (Rodrigo)
modify the sub-pipe power gating condition
v3: set hysteresis value for render and media
when GuC PC is disabled
skip CPG for PVC (Vinay)
Matt Roper [Fri, 24 May 2024 23:04:45 +0000 (16:04 -0700)]
drm/xe: Don't refer to general LRC initialization as a "wa"
During engine LRC initialization a number of registers need to be
programmed as general setup. This programming is not a "workaround" so
naming the RTP table as "lrc_was" is misleading; switch to the name
"lrc_setup" to more accurately describe what the table is actually for.
Michal Wajdeczko [Tue, 21 May 2024 14:22:55 +0000 (16:22 +0200)]
drm/xe: Store platform name in xe_device.info
We already maintain the platform name as part of the device
descriptor, but in xe_device.info we only store platform enum,
which is not the best for use in any user-facing messages.
Andrzej Hajda [Wed, 22 May 2024 07:27:27 +0000 (09:27 +0200)]
drm/xe: flush gtt before signalling user fence on all engines
Tests show that user fence signalling requires some kind of write
barrier, otherwise not all writes performed by the workload will be
available to userspace. This is already done for render and compute;
we need it also for the rest: video, gsc, copy.
v2: added gsc and copy engines, added fixes and r-b tags
drm/xe: Do not access xe file when updating exec queue run_ticks
The current code is running into a use after free case where xe file is
closed before the exec queue run_ticks can be updated. This is occurring
in the xe_file_close path. To fix that, do not access xe file when
updating the exec queue run_ticks. Instead store the exec queue run_ticks
locally in the exec queue object and accumulate it when the user dumps
the drm client stats. We know that the xe file is valid when user is
dumping the run_ticks for the drm client, so this effectively
removes the dependency on xe file object in
xe_exec_queue_update_run_ticks().
v2:
- Fix the accumulation of q->run_ticks delta into xe file run_ticks
- s/runtime/run_ticks/ (Rodrigo)
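The accounting scheme described above can be sketched as follows. All names (`struct queue_sim`, `struct client_sim`, `queue_update_run_ticks()`, `client_collect()`) are hypothetical; the point is only that the update path touches the queue alone, and the per-client total is folded in when stats are read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical exec queue: keeps its own tick counter. */
struct queue_sim {
	uint64_t run_ticks;	/* total ticks accumulated by the queue */
	uint64_t old_run_ticks;	/* portion already reported to the client */
};

/* Hypothetical per-client (per open file) stats. */
struct client_sim {
	uint64_t run_ticks;
};

/* Update path: never touches the client object. */
static void queue_update_run_ticks(struct queue_sim *q, uint64_t delta)
{
	q->run_ticks += delta;
}

/* Stats dump path: the client is known to be valid here. */
static uint64_t client_collect(struct client_sim *c, struct queue_sim *q)
{
	c->run_ticks += q->run_ticks - q->old_run_ticks;
	q->old_run_ticks = q->run_ticks;
	return c->run_ticks;
}
```

Tracking the already-reported portion means repeated dumps add only the new delta, which is the kind of accumulation issue the v2 note refers to.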
drm/xe: Use run_ticks instead of runtime for client stats
Note that runtime is also used in the pm context, so it is confusing to
use the same name to denote run time of the drm client. Use a more
appropriate name for the client utilization.
While at it, drop the incorrect multi-lrc comment in the helper
description
Thomas Hellström [Mon, 27 May 2024 13:59:12 +0000 (15:59 +0200)]
drm/xe: Move job creation out of the struct xe_migrate::job_mutex
In order to be able to run gpu jobs from reclaim context,
move job creation (where allocation takes place) out of the
struct xe_migrate::job_mutex, and prime that mutex as reclaim
tainted.
Jobs that may need to run from reclaim context include
CCS metadata extraction at shrinking time.
Thomas Hellström [Mon, 27 May 2024 13:59:10 +0000 (15:59 +0200)]
drm/xe: Don't initialize fences at xe_sched_job_create()
Pre-allocate but don't initialize fences at xe_sched_job_create(),
and initialize / arm them instead at xe_sched_job_arm(). This
makes it possible to move xe_sched_job_create() with its memory
allocation out of any lock that is required for fence
initialization, and that may not allow memory allocation under it.
Replaces the struct dma_fence_array for parallel jobs with a
struct dma_fence_chain, since the former doesn't allow
a split-up between allocation and initialization.
v2:
- Rebase.
- Don't always use the first lrc when initializing parallel
lrc fences.
- Use dma_fence_chain_contained() to access the lrc fences.
v4:
- Add an assert that job->lrc_seqno == fence->seqno.
(Matthew Brost)
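The allocate-then-arm split can be sketched in userspace C. `struct prealloc_fence`, `fence_alloc()` and `fence_arm()` are hypothetical names; the sketch only shows that the allocating step and the initializing step are separate calls, so the second can run under a lock that forbids allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fence object with a separate "armed" state. */
struct prealloc_fence {
	bool armed;
	uint32_t seqno;
};

/* Phase 1: may allocate, so call it outside any allocation-unsafe lock. */
static struct prealloc_fence *fence_alloc(void)
{
	return calloc(1, sizeof(struct prealloc_fence));
}

/* Phase 2: no allocation, safe to call with the lock held. */
static void fence_arm(struct prealloc_fence *fence, uint32_t seqno)
{
	assert(!fence->armed);
	fence->seqno = seqno;
	fence->armed = true;
}
```

Any failure to allocate is reported in phase 1, before the lock is taken, so phase 2 cannot fail for lack of memory.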
Thomas Hellström [Mon, 27 May 2024 13:59:09 +0000 (15:59 +0200)]
drm/xe: Split lrc seqno fence creation up
Since sometimes a lock is required to initialize a seqno fence,
and it might be desirable not to hold that lock while performing
memory allocations, split the lrc seqno fence creation up into an
allocation phase and an initialization phase.
Since lrc seqno fences under the hood are hw_fences, do the same
for these and remove the xe_hw_fence_create() function since it
is not used anymore.
Matthew Brost [Mon, 27 May 2024 13:59:08 +0000 (15:59 +0200)]
drm/xe: Decouple job seqno and lrc seqno
Tightly coupling these seqnos presents problems if alternative fences for
jobs are used. Decouple these for correctness.
v2:
- Slightly reword commit message (Thomas)
- Make sure the lrc fence ops are used in comparison (Thomas)
- Assume seqno is unsigned rather than signed in format string (Thomas)
Michal Wajdeczko [Mon, 27 May 2024 11:20:15 +0000 (13:20 +0200)]
drm/xe/vf: Use only assigned GGTT region
Each VF is assigned a limited range of the GGTT address space.
To ensure that the VF driver does not use GGTT allocations outside
of the assigned region, explicitly reserve GGTT space below and
above this region when initializing GGTT.
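As a rough illustration, the two reserved ranges can be computed from the assigned region as shown below. `struct ggtt_range` and `vf_reserved_ranges()` are hypothetical, and the sketch assumes the GGTT address space starts at 0, ignoring any other reserved areas the driver may account for.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open address range [start, end); empty when start == end. */
struct ggtt_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Given the total GGTT size and the VF's assigned region, compute the
 * ranges below and above it that must be reserved so that no allocation
 * can land outside the assigned region.
 */
static void vf_reserved_ranges(uint64_t ggtt_size, uint64_t vf_base,
			       uint64_t vf_size,
			       struct ggtt_range *below,
			       struct ggtt_range *above)
{
	assert(vf_base <= ggtt_size && vf_size <= ggtt_size - vf_base);

	below->start = 0;
	below->end = vf_base;
	above->start = vf_base + vf_size;
	above->end = ggtt_size;
}
```

Reserving these ranges up front lets the normal allocator enforce the limit, with no VF-specific checks on each allocation.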
Michal Wajdeczko [Fri, 24 May 2024 11:37:13 +0000 (13:37 +0200)]
drm/xe/vf: Read VF configuration prior to GGTT initialization
Each VF will be assigned only a limited range of the GGTT
address space. Make sure that the VF driver will read its own GGTT
configuration before starting any GGTT initialization.
Michal Wajdeczko [Thu, 23 May 2024 19:22:40 +0000 (21:22 +0200)]
drm/xe/vf: Treat GMDID as another runtime register
While the GMDID registers are not part of the runtime register list
shared by the PF driver, we may still return cached values from our
VF specific read32() helper function.
Michal Wajdeczko [Thu, 23 May 2024 19:22:39 +0000 (21:22 +0200)]
drm/xe/vf: Cache value of the GMDID register
Read and cache value of the GMDID register as part of the config
query that VF driver is doing over MMIO.
While the VF driver likely already obtained the value of the GMDID
register once during the early driver probe, we couldn't cache it
then as the GT structures were not ready yet.
Cache it now, in case the driver needs it later, when the GuC MMIO
communication required to query GMDID from the GuC may no longer be
desired because it will have been replaced by CTB communication.
While at it, assert that we query GMDID only when applicable.
Michal Wajdeczko [Thu, 23 May 2024 22:30:42 +0000 (00:30 +0200)]
drm/xe/vf: Provide early access to GMDID register
VFs do not have direct access to the GMDID register and must obtain
its value from the GuC. Since we need GMDID value very early in the
driver probe flow, before we even start the full setup of GT and GuC
data structures, we must do some early initializations ourselves.
Additionally, since we also need GMDID for the media GT, which isn't
created yet, temporarily tweak the root GT type into MEDIA to allow
communication with the correct GuC, as only it can provide the value
of the media GMDID register.
Michal Wajdeczko [Thu, 23 May 2024 19:22:36 +0000 (21:22 +0200)]
drm/xe/guc: Add GLOBAL_CFG_GMD_ID KLV definition
VF drivers can't access GMD_ID register over MMIO.
The value of the GMD_ID register must be queried from GuC.
It is available as GLOBAL_CFG_GMD_ID KLV.
Michal Wajdeczko [Thu, 23 May 2024 19:22:35 +0000 (21:22 +0200)]
drm/xe/vf: Use register values obtained from the PF
As part of its initialization, the VF driver has already
obtained a list of the runtime (fuse) register values from the
PF driver. When the VF driver attempts to read a register that is
inaccessible to the VF, use the values from this list instead.
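A minimal sketch of such a lookup is shown below. `struct runtime_reg`, `find_runtime_reg()` and `vf_read32()` here are hypothetical userspace stand-ins, the linear search is chosen for brevity, and returning 0 for a register missing from the list is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One cached (offset, value) pair as received from the PF. */
struct runtime_reg {
	uint32_t offset;
	uint32_t value;
};

/* Find the cached entry for a register offset, or NULL if absent. */
static const struct runtime_reg *
find_runtime_reg(const struct runtime_reg *regs, size_t count, uint32_t offset)
{
	for (size_t i = 0; i < count; i++)
		if (regs[i].offset == offset)
			return &regs[i];
	return NULL;
}

/* Serve a read from the cache; 0 for unknown registers (assumption). */
static uint32_t vf_read32(const struct runtime_reg *regs, size_t count,
			  uint32_t offset)
{
	const struct runtime_reg *rr = find_runtime_reg(regs, count, offset);

	return rr ? rr->value : 0;
}
```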
John Harrison [Sat, 18 May 2024 04:36:59 +0000 (21:36 -0700)]
drm/xe/guc: Port over the slow GuC loading support from i915
GuC loading can take longer than it is supposed to for various
reasons. So add in the code to cope with that and to report it when it
happens. There are also many different reasons why GuC loading can
fail, so add in the code for checking for those and for reporting
issues in a meaningful manner rather than just hitting a timeout and
saying 'fail: status = %x'.
Also, remove the 'FIXME' comment about an i915 bug that has never been
applicable to Xe!
v2: Actually report the requested and granted frequencies rather than
showing granted twice (review feedback from Badal).
v3: Locally code all the timeout and end condition handling because a
helper function is not allowed (review feedback from Lucas/Rodrigo).
v4: Add more documentation comments and rename a define to add units
(review feedback from Lucas).
v5: Fix copy/paste error in xe_mmio_wait32_not (review feedback from
Lucas) and rebase (no more return value from guc_wait_ucode).
John Harrison [Sat, 18 May 2024 04:36:58 +0000 (21:36 -0700)]
drm/xe: Make read_perf_limit_reasons globally accessible
Other driver code beyond the sysfs interface wants to know about
throttling. So make the query function globally accessible.
v2: Revert include order change (review feedback from Lucas)
v3: Remove '_sysfs' from throttle file names and keep limit query in
the same file rather than moving elsewhere (review feedback from
Rodrigo).
v4: Correct #include while renaming header file (review feedback
from Lucas).
This error capture prints the HW state into dmesg when a GPU hang happens.
It was useful when we did not have devcoredump; now it is an incomplete
version of devcoredump that has the potential to flood dmesg.
Rodrigo Vivi [Wed, 22 May 2024 17:01:05 +0000 (13:01 -0400)]
drm/xe: Enable D3Cold on 'low' VRAM utilization
Now that we eliminated all the mem_access get/put with its
locking issues from the inner calls of migration, we can
allow D3Cold.
Enable it when VRAM utilization is lower than 300 MB. At
higher utilization we only allow D3hot, so that memory
restoration does not add too much latency to runtime
resume.
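The threshold decision can be sketched as a small predicate. `d3cold_allowed()` and `D3COLD_VRAM_THRESHOLD_MB` are hypothetical names, the strict "lower than" comparison follows the wording above, and any other conditions the driver may check (for example platform capability) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Threshold from the commit text, in megabytes (name is hypothetical). */
#define D3COLD_VRAM_THRESHOLD_MB 300u

/*
 * Allow D3Cold only when VRAM utilization is below the threshold;
 * otherwise the device should stay in D3hot to keep resume latency low.
 */
static bool d3cold_allowed(uint32_t vram_used_mb, uint32_t threshold_mb)
{
	return vram_used_mb < threshold_mb;
}
```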
Rodrigo Vivi [Wed, 22 May 2024 17:01:04 +0000 (13:01 -0400)]
drm/xe: Stop checking for power_lost on D3Cold
GuC reset status is not reliable for this purpose: once in a
while we end up in D3Cold with power_reset false, and without
the proper memory restoration the GuC reload and Display fail
to come back from D3Cold.
So, let's do a full restoration of everything if we have a risk
of losing power, without further optimizations.
v2: also remove the gut_in_reset function (Anshuman)
Rodrigo Vivi [Wed, 22 May 2024 17:01:03 +0000 (13:01 -0400)]
drm/xe: Prepare display for D3Cold
Prepare power-well and DC handling for a full power
loss during D3Cold, then sanitize it upon D3->D0.
Otherwise we get a bunch of state mismatches.
Ideally we could leave DC9 enabled and wouldn't need
to move DC9->DC0 on every runtime resume, however,
the disable_DC is part of the power-well checks and
intrinsic to the dc_off power well. In the future that
can be detangled so we can have even bigger power savings.
But for now, let's focus on getting a D3Cold, which saves
much more power by itself.
v2: create new functions to avoid full-suspend-resume path,
which would result in a deadlock between xe_gem_fault and the
modeset-ioctl.
v3: Only avoid the full modeset to avoid the race, for a more
robust suspend-resume.
Rodrigo Vivi [Wed, 22 May 2024 17:01:02 +0000 (13:01 -0400)]
drm/xe: Relax runtime pm protection around VM
In the regular use case scenario, user space will create a
VM, and keep it alive for the entire duration of its workload.
For the regular desktop cases, it means that the VM
is alive even on idle scenarios where display goes off. This
is unacceptable since this would block runtime PM
indefinitely, blocking deeper Package-C states. This would be
a needless drain of power.
Limit the VM protection solely for long-running workloads that
are not protected by the scheduler references.
By design, run_job for long-running workloads returns NULL and
the scheduler drops all the references of it, hence protecting
the VM for this case is necessary.
v2: Update commit message to a more imperative language and to
reflect why the VM protection is really needed.
Also add a comment in the code to make the reason visible.
v3: Remove vma_access case and the mentions to mmap. Mmap cases
are already protected by the gem page fault.
Rodrigo Vivi [Wed, 22 May 2024 17:01:01 +0000 (13:01 -0400)]
drm/xe: Relax runtime pm protection during execution
Limit the protection only during moments of actual job execution,
and introduce protection for guc submit fini, which is currently
unprotected due to the absence of exec_queue life protection.
In the regular use case scenario, user space will create an
exec queue, and keep it alive to reuse that until it is done
with that kind of workload.
For the regular desktop cases, it means that the exec_queue
is alive even on idle scenarios where display goes off. This
is unacceptable since this would block runtime PM
indefinitely, blocking deeper Package-C states. This would be
a needless drain of power.
Rodrigo Vivi [Wed, 22 May 2024 17:00:59 +0000 (13:00 -0400)]
drm/xe: Fix xe_pm_runtime_get_if_active return
Current callers of this function already treat the result as a
boolean and use it in an if. This is a problem because the
function may return a negative error code on failure without
increasing the reference counter.
In that case we could end up with an extra 'put' call, leaving
the reference count unbalanced.
Let's fix it, while aligning with the current xe_pm_get_if_in_use
style.
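The pitfall can be reproduced in a small userspace sketch. `struct pm_sim`, `raw_get_if_active()` and `get_if_active()` are hypothetical; the raw helper mimics a tri-state return (negative errno, 0, or 1), and the wrapper collapses it to a boolean that is true only when a reference was actually taken.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical runtime PM state. */
struct pm_sim {
	bool pm_disabled;	/* raw helper fails with -EINVAL */
	bool active;		/* device currently active */
	int usage;		/* reference count */
};

/* Tri-state: <0 on error, 0 if inactive, 1 if a reference was taken. */
static int raw_get_if_active(struct pm_sim *pm)
{
	if (pm->pm_disabled)
		return -EINVAL;
	if (!pm->active)
		return 0;
	pm->usage++;
	return 1;
}

/* Boolean wrapper: true only when the caller now owns a reference. */
static bool get_if_active(struct pm_sim *pm)
{
	return raw_get_if_active(pm) > 0;
}
```

A caller that tested the raw value directly would treat -EINVAL as success, since any nonzero int is true in C, and would later issue a 'put' for a reference it never held.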
Michal Wajdeczko [Tue, 21 May 2024 11:48:57 +0000 (13:48 +0200)]
drm/xe/uc: Don't emit false error if running in execlist mode
When running in execlist mode (using force_execlist=1 modparam)
we incorrectly select the error path in xe_uc_init(), leading to
an unwanted error message like this:
Matthew Auld [Wed, 22 May 2024 10:22:01 +0000 (11:22 +0100)]
drm/xe/display: move device_remove over to drmm
i915 display calls this when releasing the drm_device, match this also
in xe by using drmm. intel_display_device_remove() is freeing purely
software state for the drm_device.
Matthew Auld [Wed, 22 May 2024 10:21:58 +0000 (11:21 +0100)]
drm/xe: reset mmio mappings with devm
Set our various mmio mappings to NULL. This should make it easier to
catch something rogue trying to mess with mmio after device removal. For
example, we might unmap everything and then start hitting some mmio
address which has already been unmapped by us and then remapped by
something else, causing all kinds of carnage.
Matthew Auld [Wed, 22 May 2024 10:21:57 +0000 (11:21 +0100)]
drm/xe/mmio: move mmio_fini over to devm
Not valid to touch mmio once the device is removed, so make sure we
unmap on removal and not just when driver instance goes away. Also set
the mmio pointers to NULL to hopefully catch such issues more easily.
Matthew Auld [Wed, 22 May 2024 10:21:54 +0000 (11:21 +0100)]
drm/xe/coredump: move over to devm
Here we are using drmm to ensure we release the coredump when unloading
the module, however the coredump is very much tied to the struct device
underneath. We can see this when we hotunplug the device, for which we
have already got a coredump attached. In such a case the coredump still
remains and adding another is not possible. However we still register
the release action via xe_driver_devcoredump_fini(), so in effect two or
more releases for one dump. The other consideration is that the
coredump state is embedded in the xe_driver instance, so technically
once the drmm release action fires we might free the coredump state
from a different driver instance, assuming we have two release actions
and they can race. Rather use devm here to remove the coredump when the
device is released.
Matthew Auld [Wed, 22 May 2024 10:21:49 +0000 (11:21 +0100)]
drm/xe/guc_pc: move pc_fini to devm
Here we are touching the HW/GuC and presumably this should happen when
the device is removed. Currently if you hotunplug the device this is
skipped if there is already an open driver instance.
Matthew Auld [Wed, 22 May 2024 10:21:47 +0000 (11:21 +0100)]
drm/xe/guc: move guc_fini over to devm
Make sure to actually call this when the device is removed. Currently we
only trigger it when the driver instance goes away, but that doesn't
work too well with hotunplug, since device can be removed and re-probed
with a new driver instance, where the guc_fini() is called too late.
Move the fini over to devm to ensure this is called when device is
removed.
Matthew Auld [Wed, 22 May 2024 10:21:46 +0000 (11:21 +0100)]
drm/xe/ggtt: use drm_dev_enter to mark device section
Device can be hotunplugged before we start destroying gem objects. In
such a case don't touch the GGTT entries, trigger any invalidations or
mess around with rpm. This should already be taken care of when
removing the device, we just need to take care of dealing with the
software state, like removing the mm node.
v2: (Andrzej)
- Avoid some duplication by tracking the bound status and checking
that instead.
Matthew Auld [Wed, 22 May 2024 10:21:45 +0000 (11:21 +0100)]
drm/xe: covert sysfs over to devm
Hotunplugging the device seems to result in stuff like:
kobject_add_internal failed for tile0 with -EEXIST, don't try to
register things with the same name in the same directory.
We only remove the sysfs as part of drmm, however that is tied to the
lifetime of the driver instance and not the device underneath. Attempt
to fix by using devm for all of the remaining sysfs stuff related to the
device.
Matthew Auld [Wed, 22 May 2024 10:21:44 +0000 (11:21 +0100)]
drm/xe/pci: remove broken driver_release
This is quite broken since we are nuking the pdev link to the private
driver struct, but note here that driver_release is called when the
drm_device is released (poor man's drmm), which can be long after the
device has been removed. So here what we are actually doing is nuking
the pdev link for what is potentially bound to a different drm_device.
If that happens before our pci remove callback is triggered (for the new
drm_device) we silently exit and skip some important cleanup steps,
resulting in hilarity.
There should be no reason to implement driver_release, when we already
have nicer stuff like drmm, so just remove completely. The actual pdev
link is already nuked when removing the device.
Michal Wajdeczko [Tue, 21 May 2024 09:25:18 +0000 (11:25 +0200)]
drm/xe/vf: Custom GuC initialization if VF
The GuC firmware is loaded and initialized by the PF driver. Make
sure VF drivers only perform permitted operations. For submission
initialization, use number of GuC context IDs from self config.
Michal Wajdeczko [Tue, 21 May 2024 09:25:17 +0000 (11:25 +0200)]
drm/xe/guc: Allow to initialize submission with limited set of IDs
While PF and native drivers may initialize submission code to use
all available GuC context IDs, the VF driver may only use a limited
number of IDs. Update the init function to accept the number of
context IDs available for use.
Michal Wajdeczko [Mon, 20 May 2024 18:18:13 +0000 (20:18 +0200)]
drm/xe: Don't rely on indirect includes from xe_mmio.h
These compilation units use udelay() or some GT oriented printk
functions without explicitly including the proper header files,
relying instead on #includes from xe_mmio.h. Fix that.