Matthew Auld [Tue, 1 Oct 2024 08:43:47 +0000 (09:43 +0100)]
drm/xe/ct: prevent UAF in send_recv()
Ensure we serialize with completion side to prevent UAF with fence going
out of scope on the stack, since we have no clue if it will fire after
the timeout before we can erase from the xa. Also we have some dependent
loads and stores for which we need the correct ordering, and we lack the
needed barriers. Fix this by grabbing the ct->lock after the wait, which
is also held by the completion side.
v2 (Badal):
- Also print done after acquiring the lock and seeing timeout.
Matthew Brost [Fri, 27 Sep 2024 23:22:28 +0000 (16:22 -0700)]
drm/xe: Fix memory leak when aborting binds
Make sure to call xe_pt_update_ops_fini in xe_pt_update_ops_abort to
free any memory the bind allocated.
Caught by kmemleak when running Vulkan CTS tests on LNL. The leak
seems to happen only when there's some kind of failure happening, like
the lack of memory. Example output:
drm/xe: Prevent null pointer access in xe_migrate_copy
xe_migrate_copy is designed to copy the content of TTM resources. When the
source resource is NULL, it will trigger a NULL pointer dereference in
xe_migrate_copy. To avoid this situation, set the lacks source flag to
true for this case; the flag will trigger xe_migrate_clear rather than
xe_migrate_copy.
drm/xe/oa: Don't reset OAC_CONTEXT_ENABLE on OA stream close
Mesa testing on Xe2+ revealed that when OA metrics are collected for an
exec_queue, after the OA stream is closed, future batch buffers submitted
on that exec_queue do not complete. Not resetting OAC_CONTEXT_ENABLE on OA
stream close resolves these hangs and should not have any adverse effects.
v2: Make the change that we don't reset the bit clearer (Ashutosh)
Also make the same fix for OAC as OAR (Ashutosh)
Matthew Auld [Wed, 25 Sep 2024 07:14:28 +0000 (08:14 +0100)]
drm/xe/queue: move xa_alloc to prevent UAF
Evil user can guess the next id of the queue before the ioctl completes
and then call queue destroy ioctl to trigger UAF since create ioctl is
still referencing the same queue. Move the xa_alloc all the way to the end
to prevent this.
Matthew Auld [Wed, 25 Sep 2024 07:14:27 +0000 (08:14 +0100)]
drm/xe/vm: move xa_alloc to prevent UAF
Evil user can guess the next id of the vm before the ioctl completes and
then call vm destroy ioctl to trigger UAF since create ioctl is still
referencing the same vm. Move the xa_alloc all the way to the end to
prevent this.
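The ordering used by both xa_alloc fixes above can be modeled in plain
userspace C. This is a minimal sketch, not driver code: toy_obj, id_table,
toy_publish(), toy_lookup() and toy_create() are hypothetical names, and a
fixed array stands in for the XArray. The point it illustrates is that the
id becomes visible to lookups only after the create path has finished
touching the object.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_IDS 16

/* Toy stand-in for an exec queue or VM object (hypothetical). */
struct toy_obj {
	int refcount;
	int initialized;
};

/* Toy stand-in for the per-file XArray: id -> object. */
static struct toy_obj *id_table[MAX_IDS];

/* Lookup as a concurrent destroy ioctl would perform it. */
static struct toy_obj *toy_lookup(int id)
{
	if (id < 0 || id >= MAX_IDS)
		return NULL;
	return id_table[id];
}

/* Allocate the first free id and make the object visible under it. */
static int toy_publish(struct toy_obj *obj)
{
	for (int id = 0; id < MAX_IDS; id++) {
		if (!id_table[id]) {
			id_table[id] = obj;
			return id;
		}
	}
	return -1;
}

/*
 * Create path with publication as the very last step. Once the id is
 * published, another caller may look the object up and destroy it, so
 * the create path must not dereference obj afterwards.
 */
static int toy_create(void)
{
	struct toy_obj *obj = calloc(1, sizeof(*obj));
	int id;

	if (!obj)
		return -1;

	obj->refcount = 1;
	/* ... all remaining setup that dereferences obj goes here ... */
	obj->initialized = 1;

	id = toy_publish(obj);	/* last use of obj on the success path */
	if (id < 0)
		free(obj);	/* never published, so safe to free here */
	return id;
}
```

A lookup can therefore never observe a half-constructed object, and a
guessed id does not resolve to anything until creation has completed.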
Matthew Brost [Wed, 24 Jul 2024 23:59:19 +0000 (16:59 -0700)]
drm/xe: Resume TDR after GT reset
Not starting the TDR after GT reset on exec queues which have been
restarted can lead to jobs being able to run forever. Fix this by
restarting the TDR.
Matt Roper [Mon, 23 Sep 2024 21:45:11 +0000 (14:45 -0700)]
drm/xe: Move IRQ-related registers to dedicated header
IRQ registers have a well-defined scope and make sense to collect in a
dedicated header file. This also reduces confusion about the GT IRQ
registers --- even though those registers relate to the GTs, they
actually live outside the GT (in the sgunit) and thus do not need to
worry about GT-specific register concepts like forcewake, steering, etc.
Matthew Auld [Mon, 23 Sep 2024 14:56:48 +0000 (15:56 +0100)]
drm/xe: fix UAF around queue destruction
We currently do stuff like queuing the final destruction step on a
random system wq, which will outlive the driver instance. With bad
timing we can tear down the driver with one or more work items still
alive, leading to various UAF splats. Add a fini step to ensure
user queues are properly torn down. At this point GuC should already be
nuked so queue itself should no longer be referenced from hw pov.
v2 (Matt B)
- Looks much safer to use a waitqueue and then just wait for the
xa_array to become empty before triggering the drain.
Matthew Auld [Tue, 24 Sep 2024 15:09:48 +0000 (16:09 +0100)]
drm/xe/guc_submit: add missing locking in wedged_fini
Any non-wedged queue can have a zero refcount here and can be running
concurrently with an async queue destroy, therefore dereferencing the
queue ptr to check wedge status after the lookup can trigger UAF if
queue is not wedged. Fix this by keeping the submission_state lock held
around the check to postpone the free and make the check safe, before
dropping again around the put() to avoid the deadlock.
Matthew Brost [Sat, 21 Sep 2024 01:17:12 +0000 (18:17 -0700)]
drm/xe: Clean up VM / exec queue file lock usage.
Both the VM / exec queue file locks protect the lookup and reference to
the object, nothing more. These locks are not intended to protect anything
else underneath them. XArrays have their own locking too, so there is no
need to take the VM / exec queue file lock aside from when doing a lookup
and reference get.
Add some kernel doc to make this clear and cleanup a few typos too.
drm/xe/xe2: Add performance tuning for L3 cache flushing
A recommended performance tuning for LNL related to L3 cache flushing
was recently introduced in Bspec. Implement it.
Unlike the other existing tuning settings, we limit this one to LNL
only, since there is no info about whether this would be applicable to
other platforms yet. In the future we can come back and use IP version
ranges if applicable.
v2:
- Fix reference to Bspec. (Sai Teja, Tejas)
- Use correct register name for "Tuning: L3 RW flush all Cache". (Sai
Teja)
- Use SCRATCH3_LBCF (with the underscore) for better readability.
v3:
- Limit setting to LNL only. (Matt)
drm/xe/xe2: Assume tuning settings also apply for future media GT
We already make the assumption that recommended tuning settings for
primary GT on Xe2 will also apply for future releases. Let's make the
same assumption for the media GT. We can come back and define closed
ranges when that becomes necessary.
With the exception of "Tuning: L3 cache - media", we are currently applying
recommended performance tuning settings only for the primary GT. Let's
also implement them for the media GT when applicable.
According to our spec, media GT registers CCCHKNREG1 and L3SQCREG* exist
only in Xe2_LPM and their offsets do not match their primary GT
counterparts. Furthermore, the range where CCCHKNREG1 belongs is not
listed as a multicast range on the media GT. As such, we need to have
Xe2_LPM-specific definitions for those registers and apply the setting
only for that specific IP.
Both Xe2_HPM and Xe2_LPM contain STATELESS_COMPRESSION_CTRL and the
offset on the media GT matches the one on the primary one. So we can
simply have a copy of "Tuning: Stateless compression control" for the
media GT.
v2:
- Fix implementation with respect to multicast vs non-multicast
registers. (Matt)
- Add missing XE2LPM_CCCHKNREG1 on second action of "Tuning:
Compression Overfetch - media".
v3:
- STATELESS_COMPRESSION_CTRL on Xe2_HPM is also a multicast register,
do not define a XE2HPM_STATELESS_COMPRESSION_CTRL register. (Tejas)
Ilia Levi [Wed, 18 Sep 2024 05:39:42 +0000 (08:39 +0300)]
drm/xe: memirq handler changes
Expose an interrupt processing handler for a single hw engine.
Refactor code to use this handler from the VF.
This handler also caters for the MSI-X mode, where the hardware engines
report interrupt source and status to the offset of engine instance zero
(this usage will be introduced in upcoming MSI-X enabling series).
Ilia Levi [Wed, 18 Sep 2024 05:39:41 +0000 (08:39 +0300)]
drm/xe: memirq infra changes for MSI-X
When using MSI-X, hw engines report interrupt status and source to engine
instance 0. For this scenario, in order to differentiate between the
engines, we need to pass different status/source pointers in the LRC.
The requirements on those pointers are:
- Interrupt status should be 4KiB aligned
- Interrupt source should be 64 bytes aligned
To accommodate this, we duplicate the current memirq page layout -
allocating a page for each engine instance and pass this page in the LRC.
Note that the same page can be reused for different engine types.
For example, an LRC executing on CCS #x will have pointers to page #x,
and an LRC executing on BCS #x will have the same pointers. Thus, to
locate the proper page, the pointer accessors were modified to receive
the hw engine.
Matt Roper [Tue, 17 Sep 2024 22:16:16 +0000 (15:16 -0700)]
drm/xe: Defer gt->mmio initialization until after multi-tile setup
With the recent xe_mmio redesign, tiles and GTs each have their own MMIO
accessor, with the GT inheriting some of the information (such as the
iomap pointer) from their containing tile. Given that non-root tiles
get initialized later than the root tile (and currently after the point
at which GT MMIO is initialized for _all_ GTs), we wind up incorrectly
inheriting uninitialized pointers for the initialization of GT MMIO for
GTs that reside on non-root tiles. This causes a driver crash on
multi-tile PVC platforms.
With the general xe_mmio redesign, it's now only necessary to do the
GT-level MMIO setup before the point we start reading/writing GT
registers. Move initialization of gt->mmio out of xe_info_init (which
runs before non-root tiles are initialized) and to the beginning of
where we start actually accessing the GTs themselves.
The high-level initialization flow now boils down to:
- General device init, software-only setup
- (no register access possible yet)
- Root tile initialization
- (access to device/tile0 registers possible via xe_root_tile_mmio())
- Initialization of non-root tiles
- (access to any tile's registers possible via tile->mmio)
- GT MMIO initialization, inheriting iomap from each GT's tile
- (access to any GT's registers possible via gt->mmio)
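The staged flow listed above can be expressed as a small ordering check.
This is only an illustrative sketch: the enum names and
mmio_access_possible() are hypothetical and do not exist in the driver.
It encodes which kind of register access is possible once a given init
stage has been reached.

```c
#include <stdbool.h>

/* Init stages in the order they are reached (hypothetical names). */
enum init_stage {
	STAGE_SW_ONLY,		/* general device init, no register access */
	STAGE_ROOT_TILE,	/* root tile initialized */
	STAGE_ALL_TILES,	/* non-root tiles initialized */
	STAGE_GT_MMIO,		/* gt->mmio set up, iomap inherited from tile */
};

/* The accessor a caller wants to use. */
enum mmio_scope {
	SCOPE_ROOT_TILE,	/* via xe_root_tile_mmio() */
	SCOPE_ANY_TILE,		/* via tile->mmio */
	SCOPE_ANY_GT,		/* via gt->mmio */
};

/* Return true if the given accessor is usable at the reached stage. */
static bool mmio_access_possible(enum init_stage reached, enum mmio_scope scope)
{
	switch (scope) {
	case SCOPE_ROOT_TILE:
		return reached >= STAGE_ROOT_TILE;
	case SCOPE_ANY_TILE:
		return reached >= STAGE_ALL_TILES;
	case SCOPE_ANY_GT:
		return reached >= STAGE_GT_MMIO;
	}
	return false;
}
```

In this model the original crash corresponds to using SCOPE_ANY_GT while
only STAGE_ROOT_TILE had been reached for GTs on non-root tiles.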
Matthew Auld [Mon, 16 Sep 2024 08:49:12 +0000 (09:49 +0100)]
drm/xe/vram: fix ccs offset calculation
Spec says SW is expected to round up to the nearest 128K, if not already
aligned for the CC unit view of CCS. We are seeing the assert sometimes
pop on BMG to tell us that there is a hole between GSM and CCS, as well
as popping other asserts with having a vram size with strange alignment,
which is likely caused by misaligned offset here.
v2 (Shuicheng):
- Do the round_up() on final SW address.
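For reference, the 128K round-up can be illustrated in userspace C. This
is a sketch of the arithmetic only: round_up_pow2() and ccs_sw_offset()
are hypothetical helpers, not the driver's functions, and the input is
assumed to be the final SW address mentioned in v2.

```c
#include <stdint.h>

#define SZ_128K (128u * 1024u)

/* Round x up to the next multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Hypothetical helper: align the final SW-visible offset to 128K. */
static uint64_t ccs_sw_offset(uint64_t offset)
{
	return round_up_pow2(offset, SZ_128K);
}
```

An already aligned offset is returned unchanged, while an offset one byte
past a 128K boundary moves up to the next boundary.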
Michal Wajdeczko [Thu, 12 Sep 2024 20:38:17 +0000 (22:38 +0200)]
drm/xe/pf: Allow to trigger VF GuC state restore from debugfs
For feature enabling and testing purposes, allow to restore saved
or replaced VF GuC state from debugfs, bypassing normal migration
flow. This is available only under strict debug config.
Michal Wajdeczko [Thu, 12 Sep 2024 20:38:15 +0000 (22:38 +0200)]
drm/xe/pf: Save VF GuC state when pausing VF
Since usually pausing the VF is done as a first step to migrate
that VF, immediately save VF GuC state as a final step of the VF
pausing to have that data ready to export when needed.
Michal Wajdeczko [Fri, 13 Sep 2024 12:00:13 +0000 (14:00 +0200)]
drm/xe/pf: Add functions to save and restore VF GuC state
To successfully migrate a VM with attached GPU VF we also need to
migrate VF's GuC state. Add necessary functions that interact with
GuC to save and restore a VF GuC state. We will start using them in
upcoming patches.
Since VF migration requires many more changes in the driver, enable
those functions only under debug config.
Michal Wajdeczko [Thu, 12 Sep 2024 20:38:13 +0000 (22:38 +0200)]
drm/xe/guc: Add PF2GUC_SAVE_RESTORE_VF to ABI
In upcoming patches we will add support to the PF driver to save
and restore a VF state maintained by the GuC to allow VF migration.
Add necessary H2G definitions to our GuC firmware ABI header.
By default xe_bb_create_job() appends a MI_BATCH_BUFFER_END to the batch
buffer. This is not a problem if the batch buffer is only used once, but
OA reuses the batch buffer for the same metric and at each call
it appends a MI_BATCH_BUFFER_END, printing the warning below and then
overflowing.
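The failure mode can be reproduced with a toy model. This is a sketch
under stated assumptions, not the driver's code or its actual fix:
toy_bb, toy_submit_naive() and toy_submit_reusable() are hypothetical,
the buffer is tiny on purpose, and the MI_BATCH_BUFFER_END value is
assumed for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MI_BATCH_BUFFER_END 0x05000000u	/* assumed encoding, illustration only */
#define BB_MAX_DWORDS 8

/* Toy batch buffer: a fixed array of dwords plus the current length. */
struct toy_bb {
	uint32_t cs[BB_MAX_DWORDS];
	size_t len;
};

/* Naive submit: appends a terminator on every call. False when full. */
static bool toy_submit_naive(struct toy_bb *bb)
{
	if (bb->len >= BB_MAX_DWORDS)
		return false;	/* the real buffer would overflow here */
	bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
	return true;
}

/* Reuse-safe submit: terminates the buffer only once. */
static bool toy_submit_reusable(struct toy_bb *bb)
{
	if (bb->len > 0 && bb->cs[bb->len - 1] == MI_BATCH_BUFFER_END)
		return true;
	return toy_submit_naive(bb);
}
```

With the naive variant the length grows by one dword per reuse until the
buffer is full; the reuse-safe variant keeps the length constant after
the first submission.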
Matthew Brost [Wed, 11 Sep 2024 01:18:20 +0000 (18:18 -0700)]
drm/xe: Do not run GPU page fault handler on a closed VM
Closing a VM removes page table memory thus we shouldn't touch page
tables when a VM is closed. Do not run the GPU page fault handler once
the VM is closed to avoid touching page tables.
Matthew Auld [Wed, 11 Sep 2024 15:55:29 +0000 (16:55 +0100)]
drm/xe/client: use mem_type from the current resource
Extract the mem_type from the current resource instead. Checking the
first potential placement doesn't really tell us where the bo is
currently allocated, especially if there are multiple potential
placements.
Matthew Auld [Wed, 11 Sep 2024 15:55:28 +0000 (16:55 +0100)]
drm/xe/client: add missing bo locking in show_meminfo()
bo_meminfo() wants to inspect bo state like tt and the ttm resource,
however this state can change at any point leading to stuff like NPD and
UAF, if the bo lock is not held. Grab the bo lock when calling
bo_meminfo(), ensuring we drop any spinlocks first. In the case of
object_idr we now also need to hold a ref.
Matthew Auld [Wed, 11 Sep 2024 15:55:27 +0000 (16:55 +0100)]
drm/xe/client: fix deadlock in show_meminfo()
There is a real deadlock as well as sleeping in atomic() bug in here, if
the bo put happens to be the last ref, since bo destruction wants to
grab the same spinlock and sleeping locks. Fix that by dropping the ref
using xe_bo_put_deferred(), and moving the final commit outside of the
lock. Dropping the lock around the put is tricky since the bo can go
out of scope and delete itself from the list, making it difficult to
navigate to the next list entry.
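The deferred-put pattern can be sketched in single-threaded userspace C.
All names here (toy_bo, toy_bo_put_deferred(), toy_bo_put_commit(),
toy_walk_and_put()) are hypothetical, and a plain flag models the
spinlock. The assert in the destructor stands for the rule that
destruction must never run while the list lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_bo {
	int refcount;
	bool freed;
	struct toy_bo *next_deferred;
};

static bool list_lock_held;	/* models the client's bo list spinlock */

/* Destruction wants the list lock and may sleep: never call it locked. */
static void toy_bo_destroy(struct toy_bo *bo)
{
	assert(!list_lock_held);
	bo->freed = true;
}

/* Drop a ref; if it was the last one, queue the bo instead of freeing. */
static void toy_bo_put_deferred(struct toy_bo *bo, struct toy_bo **deferred)
{
	if (--bo->refcount == 0) {
		bo->next_deferred = *deferred;
		*deferred = bo;
	}
}

/* Final commit: destroy everything queued, outside of the lock. */
static void toy_bo_put_commit(struct toy_bo **deferred)
{
	while (*deferred) {
		struct toy_bo *bo = *deferred;

		*deferred = bo->next_deferred;
		toy_bo_destroy(bo);
	}
}

/* Walk the bos under the lock, dropping one reference on each. */
static void toy_walk_and_put(struct toy_bo *bos, size_t count)
{
	struct toy_bo *deferred = NULL;

	list_lock_held = true;		/* spin_lock() */
	for (size_t i = 0; i < count; i++)
		toy_bo_put_deferred(&bos[i], &deferred);
	list_lock_held = false;		/* spin_unlock() */

	toy_bo_put_commit(&deferred);
}
```

Because nothing is destroyed inside the locked loop, the walk can keep
following list entries safely and the lock never has to be dropped
around the put.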
Matt Roper [Tue, 10 Sep 2024 23:48:03 +0000 (16:48 -0700)]
drm/xe/mmio: Drop compatibility macros
Now that all parts of the driver have switched over to using xe_mmio for
direct register access, we can drop the compatibility macros that allow
continued xe_gt usage.
v2:
- Move removal of 8/16-bit read and xe_mmio_wait32_not() wrappers to
this patch rather than removing them in earlier patches when last
caller was removed. (Rodrigo)
Matt Roper [Tue, 10 Sep 2024 23:47:37 +0000 (16:47 -0700)]
drm/xe/device: Convert register access to use xe_mmio
Stop using GT pointers for register access. Since a GT was passed as a
parameter to verify_lmem_ready() solely as a way to do MMIO accesses,
change the parameter to xe_device, which more accurately reflects that
this is a device-wide operation.
Matt Roper [Tue, 10 Sep 2024 23:47:33 +0000 (16:47 -0700)]
drm/xe/vram: Convert register access to use xe_mmio
Stop using GT pointers for register access. Note that MIRROR_FUSE3 is a
GT register and is accessed via gt->mmio, whereas GSMBASE is an sgunit
register so it is accessed via tile->mmio.
Matt Roper [Tue, 10 Sep 2024 23:47:31 +0000 (16:47 -0700)]
drm/xe/pcode: Convert register access to use xe_mmio
Stop using GT pointers for register access. Although some of the pcode
mailboxes are related to GTs, pcode itself (and the register interface
to access it) are outside the GT and should be accessed through the
tile's MMIO.
Matt Roper [Tue, 10 Sep 2024 23:47:30 +0000 (16:47 -0700)]
drm/xe/irq: Convert register access to use xe_mmio
Stop using GT pointers for register access. This misusage has been
especially confusing in interrupt code because even though some of the
interrupts are related to GTs (or engines within GTs), the interrupt
registers themselves live outside the GT, in the sgunit.
Matt Roper [Tue, 10 Sep 2024 23:47:29 +0000 (16:47 -0700)]
drm/xe: Switch MMIO interface to take xe_mmio instead of xe_gt
Since much of the MMIO register access done by the driver is to non-GT
registers, use of 'xe_gt' in these interfaces has been a long-standing
design flaw that's been hard to disentangle.
To avoid a flag day across the whole driver, munge the function names
and add temporary compatibility macros with the original function names
that can accept either the new xe_mmio or the old xe_gt structure as a
parameter. This will allow us to slowly convert parts of the driver
over to the new interface independently.
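One way a macro can accept either structure is C11 _Generic selection.
The following is a minimal sketch with toy types; toy_mmio, toy_gt,
to_toy_mmio() and toy_mmio_read32() are hypothetical names and this is
not claimed to be the exact form of the driver's temporary macros.

```c
/* Toy register block and a toy GT that embeds one (hypothetical). */
struct toy_mmio {
	unsigned int regs[4];
};

struct toy_gt {
	int id;
	struct toy_mmio mmio;
};

/* The "new" interface: takes the MMIO structure directly. */
static unsigned int toy_mmio_read32_impl(struct toy_mmio *mmio, unsigned int idx)
{
	return mmio->regs[idx];
}

/*
 * Map either pointer type to a struct toy_mmio pointer. Both branches
 * must compile for every argument type, hence the explicit casts.
 */
#define to_toy_mmio(ptr) \
	_Generic((ptr), \
		 struct toy_gt *: &((struct toy_gt *)(ptr))->mmio, \
		 struct toy_mmio *: (struct toy_mmio *)(ptr))

/* Compatibility wrapper under the original name: accepts both types. */
#define toy_mmio_read32(ptr, idx) \
	toy_mmio_read32_impl(to_toy_mmio(ptr), (idx))
```

Callers passing a GT pointer and callers already converted to the MMIO
pointer both compile against the same name, so each part of a driver can
be converted on its own schedule before the wrapper is dropped.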
Matt Roper [Tue, 10 Sep 2024 23:47:28 +0000 (16:47 -0700)]
drm/xe: Adjust mmio code to pass VF substructure to SRIOV code
Although we want to break the GT-centric nature of the MMIO code in the
general driver, the SRIOV handling still relies on data in a VF
substructure of the GT. So add a GT backpointer, but name it
sriov_vf_gt to make it clear that it's only for this one specific
special case and will not be set or usable for anything else.
v2:
- Store backpointer to the GT itself rather than the SRIOV-specific
substructure. (Michal)
Matt Roper [Tue, 10 Sep 2024 23:47:27 +0000 (16:47 -0700)]
drm/xe: Add xe_tile backpointer to xe_mmio
Once MMIO operations stop being (incorrectly) tied to a GT, we'll still
need a backpointer for feature checks, message logging, and tracepoints.
Use a tile backpointer since that may allow the most useful debugging
output, while also providing access to the xe_device.
v2:
- Make backpointer an xe_tile instead of xe_device. (Michal)
Matt Roper [Tue, 10 Sep 2024 23:47:25 +0000 (16:47 -0700)]
drm/xe: Populate GT's mmio iomap from tile during init
Each GT should share the same register iomap as its parent tile. Future
patches will switch to access the iomap through the GT's mmio substruct
rather than through the tile.
Matt Roper [Tue, 10 Sep 2024 23:47:24 +0000 (16:47 -0700)]
drm/xe: Move GSI offset adjustment fields into 'struct xe_mmio'
By moving the GSI adjustment fields into 'struct xe_mmio' we can replace
the GT's MMIO substructure with another instance of xe_mmio. At the
moment this means MMIO operations wind up pulling information from two
different places (the tile's xe_mmio for the iomap and the GT's xe_mmio
for the adjustment), but we'll address that in future patches.
The type headers change a bit with this change, meaning that various
files should be including xe_device_types.h instead of (or in addition
to) xe_gt_types.h.
v2:
- Fix pre-existing kerneldoc typo while moving the fields (Lucas)
v3:
- Add missing '@' in kerneldoc. (Rodrigo)
Matt Roper [Tue, 10 Sep 2024 23:47:23 +0000 (16:47 -0700)]
drm/xe: Clarify size of MMIO region
xe_mmio currently has a size parameter that is assigned but never used
anywhere. The current values assigned appear to be the size of the BAR
region assigned for the tile (both for registers and other purposes such
as the GGTT). Since the current field isn't being used for anything,
change the assignments to 4MB (the size of the register region on all
current platforms) and rename the field to 'regs_size' to more clearly
describe what it represents. We can use this value in later patches to
help ensure no register accesses accidentally go past the end of the
desired register space (which might not be caught easily if they still
fall within the iomap).
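A bounds check of the kind hinted at here could look like the sketch
below. It is an assumption about how regs_size might be used in later
patches, not existing driver code; toy_regs and toy_reg_in_range() are
hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_REGS_SIZE (4u * 1024u * 1024u)	/* 4MB register region */

/* Toy stand-in for the MMIO structure carrying 'regs_size'. */
struct toy_regs {
	size_t regs_size;
};

/*
 * Return true if an access of 'width' bytes at 'offset' stays inside the
 * register region. Written to avoid overflow in offset + width.
 */
static bool toy_reg_in_range(const struct toy_regs *mmio, uint32_t offset,
			     size_t width)
{
	return width <= mmio->regs_size &&
	       offset <= mmio->regs_size - width;
}
```

The last valid 32-bit register starts at regs_size - 4; anything beyond
that is rejected even if it would still land inside a larger iomap.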
Matt Roper [Tue, 10 Sep 2024 23:47:22 +0000 (16:47 -0700)]
drm/xe: Create dedicated xe_mmio structure
Pull the 'mmio' substructure from xe_tile out into a dedicated type.
Future patches will expand this structure and then eventually move MMIO
read/write operations over to using this type.
v2:
- Fix kerneldoc of 'size' field. The rename/refocusing of this field
got moved to the next patch of the series. (Lucas)
- Correct commit message; it's the tile, not the device, mmio that's
been pulled out to a separate type. (Michal)
Matt Roper [Tue, 10 Sep 2024 23:47:21 +0000 (16:47 -0700)]
drm/xe: Move forcewake to 'gt.pm' substructure
Forcewake is a general GT power management concept that isn't specific
to MMIO register access. Move the forcewake information for a GT out of
the 'mmio' substruct and into a 'pm' substruct. Also use the gt_to_fw()
helper in a few more places where it was being open-coded.
Ashutosh Dixit [Mon, 9 Sep 2024 16:59:33 +0000 (09:59 -0700)]
drm/xe/oa: Enable Xe2+ PES disaggregation
Enable Xe2+ PES disaggregation (for OAG) to retrieve disaggregated metrics
when disaggregated data is needed. Userspace can select whether to receive
aggregated or disaggregated metrics via the particular OA configuration it
uses (programmed via DRM_XE_OBSERVATION_OP_ADD_CONFIG).
The system is turning off, and we should probably put the device
in a safe power state. We don't need to evict VRAM or suspend running
jobs to a safe state, as the device is rebooted anyway.
This does not imply the system is necessarily reset, as we can
kexec into a new kernel. Without shutting down, things like
USB Type-C may mysteriously start failing.
drm/xe: Remove runtime argument from display s/r functions
The previous change ensures that pm_suspend is only called when
suspending or resuming. This ensures no further bugs like those
in the previous commit.
drm/xe: Add a xe_bo subtest for shrinking / swapping
Add a subtest that tries to allocate twice the amount of
buffer object memory available, write data to it and then read
all the data back verifying data integrity.
In order to be able to do this on systems that
have no or not enough swap-space available, allocate some memory
as purgeable, and introduce a function to purge such memory from
the TTM swap_notify path.
This test is intended to add test coverage to the current
bo swap path and upcoming shrinking path.
The test has previously been part of the xe bo shrinker series.
v2:
- Skip test if the execution time is expected to be too long.
- Minor code cleanups.
Dave Airlie [Wed, 11 Sep 2024 03:05:37 +0000 (13:05 +1000)]
Merge tag 'exynos-drm-next-for-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next
Three cleanups
- Drop stale exynos file pattern from MAINTAINERS file
The old "exynos" directory is removed from MAINTAINERS as Samsung Exynos display bindings have been relocated. This resolves a warning from get_maintainer.pl about no files matching the outdated directory.
- Constify struct exynos_drm_ipp_funcs
By making struct exynos_drm_ipp_funcs constant, the patch enhances security by moving the structure to a read-only section of memory. This change results in a slight reduction in the data section size.
- Remove unnecessary code
The function exynos_atomic_commit is removed as it became redundant after a previous update. This cleans up the code and eliminates unused function declarations.
One fixup
- Fix wrong assignment in gsc_bind()
A double assignment in gsc_bind() was flagged by the cocci tool and corrected to fix an incorrect assignment, addressing a potential issue introduced in a prior commit.