Git Repo - linux.git/log

drm/xe: Annotate masked registers used by RTP

Go over all registers used in xe_rtp tables and mark the registers as
masked if they were passed a XE_RTP_ACTION_FLAG(MASKED_REG) flag.
This will allow the flag to be removed in future when xe_rtp starts
using the real xe_reg_t type.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-9-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Use XE_REG/XE_REG_MCR

These should replace the _MMIO() and MCR_REG() from i915, with the goal
of being more extensible, allowing to pass the additional fields for
struct xe_reg and struct xe_reg_mcr. Replace all uses of _MMIO() and
MCR_REG() in xe.

Since the RTP, reg-save-restore and WA infra are not ready to use the
new type, just undef the macro like was done for the i915 types
previously. That conversion will come later.

v2: Remove MEDIA_SOFT_SCRATCH_COUNT/MEDIA_SOFT_SCRATCH re-added by
mistake (Matt Roper)

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-8-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Introduce xe_reg/xe_reg_mcr

Stop using i915 types for registers. Use our own types. Differently from
i915, this will keep under the register definition the knowledge for the
different types of registers. For now, the "flags"/"options" are mcr and
masked, although only the former is being used.

Additionally MCR registers have their own type. The only place that
should really look inside a xe_mcr_reg_t is that code dealing with the
steering and using other APIs when the register is MCR has been a source
of problem in the past.

Most of the driver is agnostic to the register differences since they
either use the definition from the header or already call the correct
MCR_REG()/_MMIO() macros. By embeding the struct xe_reg inside the
struct it's also possible to guarantee the compiler will break if
using RANDOM_MCR_REG.reg is attempted, since now the u32 is inside the
inner struct.

v2:
  - Deep a dedicated type for MCR registers to avoid misuse
    (Matt Roper, Jani)
  - Drop the typedef and just use a struct since it's not an opaque type
    (Jani)
  - Add more kernel-doc
v3:
  - Use only 22 bits for the register address since all the platforms
    supported so far have only 4MB of MMIO per tile  (Matt Roper)

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Clarify register types on PAT programming

Clarify a few things related to the PAT programming, particularly on
MTL:

- The register type doesn't change depending on the GT - what
  happens is that media GT writes to other set of registers that
  are not MCR
- Remove "UNICAST": otherwise it's confusing why it's not using
  MCR registers with the unicast function variant

Also, there isn't much reason to keep those parts as macros: promote
them to proper functions and let the compiler inline if it sees fit.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Use REG_FIELD/REG_BIT for all regs/*.h

Convert the macro declarations to the equivalent GENMASK and
and bitfield prep for all registers.

v2 (Matt Roper):
  - Fix wrong conversion of RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_MASK
  - Reorder fields of XEHP_SLICE_UNIT_LEVEL_CLKGATE for consistency
  - Simplify CTC_SOURCE_* by only defining CTC_SOURCE_DIVIDE_LOGIC
    as REG_BIT(0)

v3: Also remove DOP_CLOCK_GATE_ENABLE that is unused and wrongly defined

Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Drop gen afixes from registers

The defines for the registers were brought over from i915 while
bootstrapping the driver. As xe supports TGL and later only, it doesn't
make sense to keep the GEN* prefixes and suffixes in the registers: TGL
is graphics version 12, previously called "GEN12". So drop the prefix
everywhere.

v2:
  - Also drop _TGL suffix and reword commit message as suggested
    by Matt Roper. While at it, rename VSUNIT_CLKGATE_DIS_TGL to
    VSUNIT_CLKGATE2_DIS with the additional "2", so it doesn't clash
    with the define for the other register

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/guc: Convert GuC registers to REG_FIELD/REG_BIT

Cleanup GuC register declarations by converting them to use REG_FIELD,
REG_BIT and REG_GENMASK. While converting, also reorder the bitfields
so they follow the convention of declaring the higher bits first.

v2:
- Drop unused HUC_LOADING_AGENT_VCR and DMA_ADDRESS_SPACE_GTT (Matt Roper)
- Simplify HUC_LOADING_AGENT_GUC define (Matt Roper)

Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Remove extra xe_mmio_read32 from xe_mmio_wait32

Commit 7aaec3a623ad ("drm/xe: Let's return last value read on
xe_mmio_wait32.") mentions that we should return the last value read,
but we never actually return it. This breaks display which depends on
the value being actually returned where needed.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Fixes: 7aaec3a623ad ("drm/xe: Let's return last value read on xe_mmio_wait32.")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/257
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/guc: Move GuC registers to regs/

There's no good reason to keep the GuC registers outside the regs/
directory: move the header with GuC registers under that.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/guc: Rename GEN11_SOFT_SCRATCH for clarity

That register is a completely different register, it's not the same as
SOFT_SCRATCH for GEN11 and beyond. Rename to to the same name as the
bspec uses, including the new variant for media. Also, move the
definitions to the guc header.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Rename instruction field to avoid confusion

There was both BLT_DEPTH_32 and XY_FAST_COLOR_BLT_DEPTH_32 - also add
the prefix to the first to make it clear this is about the FAST_**COPY**
operation. While at it, remove the GEN9_ prefix.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Rename RC0/RC6 macros

Follow up commits will mass-remove the gen prefix/suffix. For GEN6_RC0
and GEN6_RC6 that would make the variable too short and easy to
conflict. So, add "GT_" prefix that is also part of the register name.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Cleanup page-related defines

Rename the following defines to lose the GEN* prefixes since they don't
make sense for xe:

GEN8_PTE_SHIFT -> XE_PTE_SHIFT
GEN8_PAGE_SIZE -> XE_PAGE_SIZE
GEN8_PTE_MASK -> XE_PTE_MASK
GEN8_PDE_SHIFT -> XE_PDE_SHIFT
GEN8_PDES -> XE_PDES
GEN8_PDE_MASK -> XE_PDE_MASK
GEN8_64K_PTE_SHIFT -> XE_64K_PTE_SHIFT
GEN8_64K_PAGE_SIZE -> XE_64K_PAGE_SIZE
GEN8_64K_PTE_MASK -> XE_64K_PTE_MASK
GEN8_64K_PDE_MASK -> XE_64K_PDE_MASK
GEN8_PDE_PS_2M -> XE_PDE_PS_2M
GEN8_PDPE_PS_1G -> XE_PDPE_PS_1G
GEN8_PDE_IPS_64K -> XE_PDE_IPS_64K
GEN12_GGTT_PTE_LM -> XE_GGTT_PTE_LM
GEN12_USM_PPGTT_PTE_AE -> XE_USM_PPGTT_PTE_AE
GEN12_PPGTT_PTE_LM -> XE_PPGTT_PTE_LM
GEN12_PDE_64K -> XE_PDE_64K
GEN12_PTE_PS64 -> XE_PTE_PS64
GEN8_PAGE_PRESENT -> XE_PAGE_PRESENT
GEN8_PAGE_RW -> XE_PAGE_RW
PTE_READ_ONLY -> XE_PTE_READ_ONLY

Keep an XE_ prefix to make sure we don't mix the defines for the CPU
(e.g. PAGE_SIZE) with the ones fro the GPU).

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Fix print of RING_EXECLIST_SQ_CONTENTS_HI

On xe_hw_engine_print_state we were printing:
value_of(0x510) + 4 instead of
value_of(0x514) as desired.

So, let's properly define a RING_EXECLIST_SQ_CONTENTS_HI
register to fix the issue and also to avoid other issues
like that.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe: Update comment on why d3cold is still blocked.

The main issue with buddy allocator was fixed, but then
we ended up on other issues, so we need to step back and rethink
our strategy with D3cold. So, let's update the comment with a
todo list so we don't get tempted in removing it before we are
really ready.

Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>

drm/xe: Limit the system memory size to half of the system memory

ttm_global_init() imposes this limitation.

Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Fix build without CONFIG_PM_SLEEP

Build without CONFIG_PM_SLEEP (such as for riscv) was failing due
to unused xe_pci_runtime_* functions.

Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/guc_pc: Reorder forcewake and xe_pm_runtime calls

When the device is runtime suspended, reading some of the sysfs
entries under device/gt#/ causes a resume error
This is due to the ordering of pm_runtime and forcewake calls.
Reorder to wake up using xe_pm_runtime_get and then forcewake

v2: add goto statements (Rodrigo)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Keep all resize bar related prints inside xe_resize_vram_bar

xe_resize_vram_bar() function is already printing the status of bar
resizing. It has prints covering both success and failure.
There is no need of additional prints in the caller which were not so
easily to follow.

Modified all BAR size prints to consistently print the size in MiB.

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Drop GFX_FLSH_CNTL_GEN6 write during GGTT invalidation

The write of GFX_FLSH_CNTL_GEN6 was inherited from the i915 codebase
where it was used to force a flush of the write-combine buffer in cases
where the GSM/GGTT were mapped as WC.  Since Xe never uses WC mappings
of the GGTT, this register write is unnecessary.  Furthermore, this
register was removed on Xe_HP-based platforms, so this write winds up
clobbering an unrelated register.

v2:
- Also drop GFX_FLSH_CNTL_GEN6 from the register file now that it's no
   longer used.  (Lucas)

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230418230247.3802438-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Fix xe_mmio_rmw32 operation

xe_mmio_rmw32 was failing to invert the passed in mask, resulting in a
register update that wasn't the expected RMW operation. Fortunately the
impact of this mistake was limited, since this function isn't heavily
used in Xe right now; this will mostly fix some GuC PM interrupt
unmasking.

v2:
- Rename parameters as 'clr' and 'set' to clarify semantics. (Lucas)

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230421145006.10940-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/lrc: give start_seqno a better default

If looking at the initial engine dump we should expect this to match
XE_FENCE_INITIAL_SEQNO - 1.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/sched_job: prefer dma_fence_is_later

Doesn't look like we are accounting for seqno wrap. Just use
__dma_fence_is_later() like we already do for xe_hw_fence_signaled().

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/sr: Apply masked registers properly

The 'clear' field for register save/restore entries was being placed in
the value bits of the register rather than the mask bits; make sure it
gets shifted into the mask bits.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230419224909.4000920-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/tlb: fix expected_seqno calculation

It looks like when tlb_invalidation.seqno overflows
TLB_INVALIDATION_SEQNO_MAX, we start counting again from one, as per
send_tlb_invalidation(). This is also inline with initial value we give
it in xe_gt_tlb_invalidation_init().  When calculating the
expected_seqno we should also take this into account.

While we are here also print out the values if we ever trigger the
warning.

v2 (José):
  - drm_WARN_ON() is preferred over plain WARN_ON(), since it gives
    information on the originating device.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/248
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Add Rocketlake device info

Add missing device info for Rocketlake.
While at it, also set the value for IS_ROCKETLAKE
macro which is right now set to 0.

v2: Also add abox_mask to the device info(Lucas)
v3: rebase
v4: Set IS_ROCKETLAKE (Anusha)

Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Tested-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>(v2)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/mmio: stop incorrectly triggering drm_warn

CI keeps triggering:

xe 0000:03:00.0: [drm] Restricting VRAM size to PCI resource size
(0x400000000->0x3fa000000)

Due to usable_size vs vram_size differences. However, we only want to
trigger the drm_warn() to let developers know that the system they are
using is going clamp the VRAM size to match the IO size, where they can
likely only use 256M of VRAM. Once we properly support small-bar we can
revisit this.

v2 (Lucas): Drop the TODO for now

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: GuC and HuC loading support for RKL

Rocketlake uses TGL GuC and HuC

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Only request PCODE_WRITE_MIN_FREQ_TABLE on LLC platforms

PCODE_WRITE_MIN_FREQ_TABLE is only applicable to platforms with an LLC.
Change the discrete GPU check to an LLC check instead; this take care of
skipping not only the discrete platforms, but also integrated platforms
like MTL that do not have an LLC.

Fixes MTL dmesg error:

xe 0000:00:02.0: [drm] *ERROR* PCODE Mailbox failed: 1 Illegal Command

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230410183910.2696628-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Track whether platform has LLC

Some driver initialization is conditional on the presence of an LLC.
Add an extra feature flag to support this.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230410183910.2696628-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Use packed bitfields for xe->info feature flags

Replace 'bool' fields with single bits to allow the various device
feature flags to pack more tightly.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20230410183910.2696628-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Don't grab runtime PM ref in engine create IOCTL

A VM had a runtime PM ref, a engine can't be created without a VM, and
the engine holds a ref to the VM thus this is unnecessary. Beyond that
taking a ref in the engine create IOCTL and dropping it in the destroy
IOCTL is wrong as a user doesn't have to call the destroy IOCTL (e.g.
they can just kill the process or close the driver FD). If a user does
this PM refs are leaked.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Let primary and media GT share a kernel_bb_pool

The media GT requires a valid gt->kernel_bb_pool during driver probe to
allocate the WA and NOOP batchbuffers used to record default context
images. Dynamically allocate the bb_pools so that the primary and media
GT can use the same pool during driver init.

The media GT still shouldn't be need the USM pool, so only hook up the
kernel_bb_pool for now.

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20230410200229.2726648-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Fix memory use after free

The wait_event_timeout() on g2h_fence.wq which is declared on
stack can return before the wake_up() gets called, resulting in a
stack out of bound access when wake_up() accesses the g2h_fene.wq.

Do not declare g2h_fence related wait_queue_head_t on stack.

Fixes the below KASAN BUG and associated kernel crashes.

BUG: KASAN: stack-out-of-bounds in do_raw_spin_lock+0x6f/0x1e0
Read of size 4 at addr ffff88826252f4ac by task kworker/u128:5/467

CPU: 25 PID: 467 Comm: kworker/u128:5 Tainted: G U 6.3.0-rc4-xe #1
Workqueue: events_unbound g2h_worker_func [xe]
Call Trace:
<TASK>
dump_stack_lvl+0x64/0xb0
print_report+0xc2/0x600
kasan_report+0x96/0xc0
do_raw_spin_lock+0x6f/0x1e0
_raw_spin_lock_irqsave+0x47/0x60
__wake_up_common_lock+0xc0/0x150
dequeue_one_g2h+0x20f/0x6a0 [xe]
g2h_worker_func+0xa9/0x180 [xe]
process_one_work+0x527/0x990
worker_thread+0x2d1/0x640
kthread+0x174/0x1b0
ret_from_fork+0x29/0x50
</TASK>

Tested-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: fix suspend-resume for dgfx

This stopped working now that TTM treats moving a pinned object through
ttm_bo_validate() as an error, for the general case. Add some new
routines to handle the new special casing needed for suspend-resume.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/244
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Tested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Clean up xe_device_desc

Now that most of the characteristics of a device are associated with the
graphics and media IPs, the remaining contents of xe_device_desc can be
cleaned up a bit:

* 'gt' is unused; drop it
* DEV_INFO_FOR_EACH_FLAG only covers two flags and is only used in this
one file; drop the unnecessary macro complexity
* Convert .has_4tile to a single bitfield bit so that it can be packed
with the other feature flags
* Move 'platform' lower in the structure for better packing

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230406235621.1914492-10-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Add KUnit test for xe_pci.c IP engine lists

Add a simple KUnit test to ensure that the hardware engine lists for
GMD_ID IP definitions are sensible (i.e., no graphics engines defined
for the media IP and vice versa).

Only the IP descriptors for GMD_ID platforms are checked for now.
Presumably the engine lists on older pre-GMD_ID platforms shouldn't be
changing. We can extend the KUnit testing in the future if we decide we
want to check those as well.

v2:
- Add missing 'const' in xe_call_for_each_media_ip to avoid compiler
warning.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230406235621.1914492-9-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Select graphics/media descriptors from GMD_ID

If graphics_desc and media_desc are not specified in a platform's
xe_device_desc, treat this as an indication that the IP version should
be determined from the hardware's GMD_ID register.

Note that leaving media_desc unset for a platform that simply doesn't
have the IP (e.g., PVC) is also okay --- a read of the GMD_ID register
offset will be attempted, but since there's no register at that location
a value of '0' will be returned, effectively disabling media support.

Mapping of version -> IP description is done via a table lookup; this
table will be re-used in future patches for some KUnit testing.

v2:
- Drop dummy structures.  NULL can be safely used for both the GMD_ID
   cases and the "media not present case."
- Use a table-based lookup of GMD_ID versions rather than a simple
   switch statement; the table will allow us to easily perform kunit
   testing of all the IP descriptors.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230406235621.1914492-8-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Add printable name to IP descriptors

Printing the name, along with the IP version number, will help reduce
confusion about which IP is present on a platform.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230406235621.1914492-7-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Clarify GT counting logic

The total number of GTs supported by a platform should be one primary
GT, one media GT (if media version >= 13), and a number of remote tile
GTs dependent on the graphics IP present. Express this more clearly in
the device setup.

Note that xe->info.tile_count is inaccurately named; the rest of the
driver treats this as the GT count, not just the tile count. This
will need to be cleaned up at some point down the road.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230406235621.1914492-6-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Move engine masks into IP descriptor structures

Break the top-level platform_engine_mask field into separate
hw_engine_mask fields in the graphics and media structures.  Since
hardware has more flexibility to mix-and-match IP versions going
forward, this allows each IP to list exactly which engines it provides;
the final per-GT engine list can then be constructured from those:

* On platforms without a standalone media GT (i.e., media IP versions
   prior to 13), the primary GT's engine list is the union of the
   graphics IP's engine list and the media IP's engine list.
* Otherwise, GT0's engine list is the graphics IP's engine list.
* For GT1 and beyond, the type of GT determines which IP's engine list
   is used.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230406235621.1914492-5-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Move most platform traits to graphics IP

Most of the traits currently in the device descriptor structures are
either tied to the graphics IP or should be inferred from the graphics
IP. This becomes important on MTL and beyond where IP versions are
supposed to be detected from the hardware's GMD_ID registers rather than
mapped from PCI devid.

Engine masks are left where they are for now; they'll be dealt with
separately in a future patch.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230406235621.1914492-4-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Set require_force_probe in each platform's description

Set require_force_probe explicitly in each platform's description
structure rather than embedding it within the FOO_FEATURES macros. Even
though we expect all platforms currently supported by the Xe driver to
be under force_probe protection, this will help prepare for some other
upcoming restructuring.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230406235621.1914492-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Start splitting xe_device_desc into graphics/media structures

Rather than storing all characteristics for an entire platform in the
xe_device_desc structure, create secondary graphics and media structures
to hold traits and feature flags specific to those IPs.  This will
eventually allow us to assign the graphics and media characteristics at
runtime based on the contents of the relevant GMD_ID registers.

For now, just move the IP versions into the new structures to keep
things simple.  Other IP-specific fields will migrate to these
structures in future patches.

Note that there's one functional change introduced by this:  previously
PVC was recognized as media version 12.60.  That's technically true, but
in practice the media engines are fused off on all production hardware.
By simply not assigning a media IP structure to PVC it will effectively
be treated as IP version 0.0 now (which the rest of the driver should
treat as non-existent media).

v2:
- Split the new structures out to their own header.  This will ease the
   addition of KUnit tests later.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230406235621.1914492-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Always log GuC/HuC firmware versions

When debugging issues related to GuC/HuC, it's important to know what is
the firmware version being used. The version from the filename can't be
relied upon, also because it normally only contains the major version
(except for the ones under experimental support).

Log the version from the blob after reading the CSS header. Example:

xe 0000:03:00.0: [drm] Using GuC firmware (70.5) from i915/dg2_guc_70.bin

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20230405224725.1993719-1-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Always write GEN12_RCU_MODE.GEN12_RCU_MODE_CCS_ENABLE for CCS engines

If CCS0 was fused we did not write this register thus CCS engine were
not enabled resulting in driver load failures.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Update GuC/HuC firmware autoselect logic

Update the logic to autoselect GuC/HuC for the platforms with the
following improvements:

- Document what is the firmware file that is expected to be
  loaded and what is checked from blob headers

- When the platform is under force-probe it's desired to enforce
  the full-version requirement so the correct firmware is used
  before widespread adoption and backward-compatibility
  commitments

- Directory from which we expect firmware blobs to be available in
  upstream linux-firmware repository depends on the platform: for
  the ones supported by i915 it uses the i915/ directory, but the ones
  expected to be supported by xe, it's on the xe/ directory. This
  means that for platforms in the intersection, the firmware is
  loaded from a different directory, but that is not much important
  in the firmware repo and it avoids firmware duplication.

- Make the table with the firmware definitions clearly state the
  versions being expected. Now with macros to select the version it's
  possible to choose between full-version/major-version for GuC and
  full-version/no-version for HuC. These are similar to the macros used
  in i915, but implemented in a slightly different way to avoid
  duplicating the macros for each firmware/type and functionality,
  besides adding the support for different directories.

- There is no check added regarding force-probe since xe should
  reuse the same firmware files published for i915 for past
  platforms. This can be improved later with additional
  kunit checking against a hardcoded list of platforms that
  falls in this category.

- As mentioned in the TODO, the major version fallback was not
  implemented before as currently each platform only supports one
  major. That can be easily added later.

- GuC version for MTL and PVC were updated to 70.6.4, using the exact
  full version, while the

After this the GuC firmware used by PVC changes to pvc_guc_70.5.2.bin
since it's using a file not published yet.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Link: https://lore.kernel.org/r/20230324051754.1346390-4-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Add test for GT workarounds and tunings

In order to avoid mistakes when populating the workarounds, it's good to
be able to test if the entries added are all compatible for a certain
platform. The platform itself is not needed as long as we create fake
devices with enough configuration for the RTP helpers to process the
tables.  Common mistakes that can be avoided:

- Entries clashing the bitfields being updated
- Register type being mixed (MCR vs regular / masked vs regular)
- Unexpected errors while adding the reg_sr entry

To test, inject a duplicate entry in gt_was, but with platform == tigerlake
rather than the currenct graphics version check:

       { XE_RTP_NAME("14011059788"),
XE_RTP_RULES(PLATFORM(TIGERLAKE)),
XE_RTP_ACTIONS(SET(GEN10_DFR_RATIO_EN_AND_CHICKEN, DFR_DISABLE))
       },

This produces the following result:

$  ./tools/testing/kunit/kunit.py run \
--kunitconfig drivers/gpu/drm/xe/.kunitconfig  xe_wa

[14:18:02] Starting KUnit Kernel (1/1)...
[14:18:02] ============================================================
[14:18:02] ==================== xe_wa (1 subtest) =====================
[14:18:02] ======================== xe_wa_gt  =========================
[14:18:02] [drm:xe_reg_sr_add] *ERROR* Discarding save-restore reg 9550 (clear: 00000200, set: 00000200, masked: no): ret=-22
[14:18:02]     # xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:116
[14:18:02]     Expected gt->reg_sr.errors == 0, but
[14:18:02]         gt->reg_sr.errors == 1 (0x1)
[14:18:02] [FAILED] TIGERLAKE (B0)
[14:18:02] [PASSED] DG1 (A0)
[14:18:02] [PASSED] DG1 (B0)
...

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20230401085151.1786204-8-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Add basic unit tests for rtp

Add some basic unit tests for rtp. This is intended to prove the
functionality of the rtp itself, like coalescing entries, rejecting
non-disjoint values, etc.

Contrary to the other tests in xe, this is a unit test to test the
sw-side only, so it can be executed on any machine - it doesn't interact
with the real hardware. Running it produces the following output:

$ ./tools/testing/kunit/kunit.py run --raw_output-kunit  \
--kunitconfig drivers/gpu/drm/xe/.kunitconfig xe_rtp
...
[01:26:27] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
    KTAP version 1
    # Subtest: xe_rtp
    1..1
KTAP version 1
# Subtest: xe_rtp_process_tests
ok 1 coalesce-same-reg
ok 2 no-match-no-add
ok 3 no-match-no-add-multiple-rules
ok 4 two-regs-two-entries
ok 5 clr-one-set-other
ok 6 set-field
[drm:xe_reg_sr_add] *ERROR* Discarding save-restore reg 0001 (clear: 00000001, set: 00000001, masked: no): ret=-22
ok 7 conflict-duplicate
[drm:xe_reg_sr_add] *ERROR* Discarding save-restore reg 0001 (clear: 00000003, set: 00000000, masked: no): ret=-22
ok 8 conflict-not-disjoint
[drm:xe_reg_sr_add] *ERROR* Discarding save-restore reg 0001 (clear: 00000002, set: 00000002, masked: no): ret=-22
[drm:xe_reg_sr_add] *ERROR* Discarding save-restore reg 0001 (clear: 00000001, set: 00000001, masked: yes): ret=-22
ok 9 conflict-reg-type
    # xe_rtp_process_tests: pass:9 fail:0 skip:0 total:9
    ok 1 xe_rtp_process_tests
# Totals: pass:9 fail:0 skip:0 total:9
ok 1 xe_rtp
...

Note that the ERRORs in the kernel log are expected since it's testing
incompatible entries.

v2:
  - Use parameterized table for tests  (Michał Winiarski)
  - Move everything to the xe_rtp_test.ko and only add a few exports to the
    right namespace
  - Add more tests to cover FIELD_SET, CLR, partially true rules, etc

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maarten Lankhorst<maarten.lankhorst@linux.intel.com> # v1
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20230401085151.1786204-7-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/reg_sr: Save errors for kunit integration

When there's an entry that is dropped when xe_reg_sr_add(), there's
not much we can do other than reporting the error - it's for certain a
driver issue or conflicting workarounds/tunings. Save the number of
errors to be used later by kunit to report where it happens.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230401085151.1786204-6-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Generalize fake device creation

Instead of requiring tests to initialize a fake device an keep it in
sync with xe_pci.c when it's platform-dependent, export a function from
xe_pci.c to be used and piggy back on the device info creation. For
simpler tests that don't need any specific platform and just need a fake
xe device to pass around, xe_pci_fake_device_init_any() can be used.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230401085151.1786204-5-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Use symbol namespace for kunit tests

Instead of simply using EXPORT_SYMBOL() to export the functions needed
in xe.ko to be be called across modules, use EXPORT_SYMBOL_IF_KUNIT()
which will export the symbol under the EXPORTED_FOR_KUNIT_TESTING
namespace.

This avoids accidentally "leaking" these functions and letting them be
called from outside the kunit tests. If these functiosn are accidentally
called from another module, they receive a modpost error like below:

ERROR: modpost: module XXXXXXX uses symbol
xe_ccs_migrate_kunit from namespace EXPORTED_FOR_KUNIT_TESTING,
but does not import it.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/20230401085151.1786204-4-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Move test infra out of xe_pci.[ch]

Move code out of xe_pci.[ch] into tests/*.[ch], like is done in other
similar compilation units. Even if this is not part of "tests for
xe_pci.c", they are functions exported and required by other tests. It's
better not to clutter the module headers and sources with them.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230401085151.1786204-3-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Extract function to initialize xe->info

Extract the part setting up from xe->info from xe_pci_probe() into its
own function. This pairs nicely with the display counterpart, avoids
info initialization to be placed elsewhere and helps future
improvements to build fake devices for tests.

While at it, normalize the names a little bit: the _get() suffix may be
mistaken by lock-related operation, so rename the function to
"find_subplatform()". Also rename the variable to subplatform_desc to
make it easier to understand, even if longer.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230401085151.1786204-2-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/irq: Don't clobber display interrupts on multi-tile platforms

Although our only multi-tile platform today (PVC) doesn't support
display, it's possible that some future multi-tile platform will.
If/when this happens, display interrupts (both traditional display and
ASLE backlight interrupts raised as a Gunit interrupt) should be
delivered to the primary tile. Save away tile0's master_ctl value so
that it can still be used for display interrupt handling after the GT
loop.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230401002106.588656-9-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/irq: Drop commented-out code for non-existent media engines

Although the hardware team has set aside some register bits for extra
media engines, no platform supported by the Xe driver today has VCS4-7
or VECS2-3. Drop the corresponding code (which was already commented
out); we can bring it back easily enough if such engines show up on a
future platform.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230401002106.588656-8-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/irq: Drop remaining "gen11_" prefix from IRQ functions

The remaining "gen11_*" IRQ functions are common to all platforms
supported by the Xe driver. Drop the unnecessary prefix.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230401002106.588656-7-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/irq: Rename and clarify top-level interrupt handling routines

Platforms supported by the Xe driver handle top-level interrupts in one
of two ways:
- Xe_LP platforms only have a "graphics master" register and lack a
   "master tile" register, so top-level interrupt detection and
   enable/disable happens in the graphics master.
- Xe_LP+ (aka DG1) and beyond have a "master tile" interrupt register
   that controls the enable/disable of top-level interrupts and must
   also be consulted to determine which tiles have received interrupts
   before the driver moves on the process the graphics master register.

For functions that are only relevant to the first set of platforms,
rename the function prefix to Xe_LP since "gen11" doesn't make sense in
the Xe driver.  Also add some comments briefly describing the two
top-level handlers.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230401002106.588656-6-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/irq: Drop unnecessary GEN11_ and GEN12_ register prefixes

Any interrupt registers that were introduced by platforms i915
considered to be "gen11" or "gen12" are present on all platforms that
the Xe driver supports; drop the unnecessary prefixes.

While working in the area, also convert a few open-coded bit
manipulations over to REG_BIT and REG_FIELD_GET notation.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230401002106.588656-5-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo: removed display. That was later squashed to the xe Display patch]

drm/xe/irq: Drop IRQ_INIT and IRQ_RESET macros

It's no longer necessary to wrap these operations in macros; a simple
function will suffice. Also switch to function names that more clearly
describe what operation is being performed: unmask_and_enable() and
mask_and_disable().

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230401002106.588656-4-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/irq: Add helpers to find ISR/IIR/IMR/IER registers

For cases where IRQ_INIT and IRQ_RESET are used, the relevant interrupt
registers are always consecutive and ordered ISR, IMR, IIR, IER.  Adding
helpers to look these up from a base offset will let us eliminate some
of the CPP pasting and simplify other upcoming patches.

v2:
- s/_REGS/_OFFSET/ for consistency.  (Lucas)
- Move IMR/IIR/IER helpers into xe_irq.c; they aren't needed anywhere
   else.  (Lucas)

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230401002106.588656-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/irq: Drop gen3_ prefixes

"Gen" terminology should be avoided in the Xe driver and "gen3" refers
to platforms that are 9 (!!) graphics generations earlier than the
oldest supported by the Xe driver, so this prefix really doesn't make
sense.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230401002106.588656-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: fix pvc unload issue

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740]  <TASK>
[ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317]  ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487]  ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054]  ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945]  drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904]  ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574]  drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617]  drm_dev_release+0x30/0x50 [drm]
[ 4850.671209]  devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Fix platform order

Platform order in enum xe_platform started to be used by some parts of
the code, like the GuC/HuC firmware loading logic. The order itself is
not very important, but it's better to follow a convention: as was
documented in the comment above the enum, reorder the platforms by
graphics version. While at it, remove the gen terminology.

v2:
  - Use "graphics version" instead of chronological order (Matt Roper)
  - Also change pciidlist to follow the same order
  - Remove "gen" from comments around enum xe_platform

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230331230902.1603294-1-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Use proper vram offset

In xe_migrate functions, use proper vram io offset of the
tiles while calculating addresses.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/tests: Set correct expectation

In xe_migrate_sanity_kunit test, use correct expected value as
the expected value was not only used for the xe_migrate_clear(),
but also for the xe_migrate_copy() operation.

v2: Add 'Fixes' tag and update commit text

Fixes: 11a2407ed5f0 ("drm/xe: Stop accepting value in xe_migrate_clear")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/tests: Use proper batch base address

In xe_migrate_sanity_kunit test, use proper batch base address
by considering usm case.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Remove unused revid from firmware name

The rev field is always 0 so it ends up never used. In i915 it was
introduced because of CML: up to rev 5 it reuses the guc and huc
firmware blobs from KBL. After that there is a specific firmware for
that platform.  This can be reintroduced later if ever needed.

With the removal of revid the packed attribute in
uc_fw_platform_requirement, which is there only for reducing the space
these tables take, can also be removed since it has even more limited
usefulness: currently there's only padding of 2 bytes. Remove the
attribute to avoid the unaligned access.

$ pahole -C uc_fw_platform_requirement build64/drivers/gpu/drm/xe/xe_uc_fw.o
struct uc_fw_platform_requirement {
enum xe_platform           p;                    /*     0     4 */
const struct uc_fw_blob    blob;                 /*     4    10 */

/* size: 16, cachelines: 1, members: 2 */
/* padding: 2 */
/* last cacheline: 16 bytes */
};

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230324051754.1346390-2-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Don't emit extra MI_BATCH_BUFFER_END in WA batchbuffer

The MI_BATCH_BUFFER_END is already added automatically by
__xe_bb_create_job(); including it in the construction of the workaround
batchbuffer results in an unnecessary duplicate.

Link: https://lore.kernel.org/r/20230329173334.4015124-4-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Adjust batchbuffer space warning when creating a job

We should WARN (not BUG) when creating a job if the batchbuffer does not
have sufficient space and padding. The hardware prefetch requirements
should also be considered.

Link: https://lore.kernel.org/r/20230329173334.4015124-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Include hardware prefetch buffer in batchbuffer allocations

The hardware prefetches several cachelines of data from batchbuffers
before they are parsed. This prefetching only stops when the parser
encounters an MI_BATCH_BUFFER_END instruction (or a nested
MI_BATCH_BUFFER_START), so we must ensure that there is enough padding
at the end of the batchbuffer to prevent the prefetcher from running
past the end of the allocation and potentially faulting.

Bspec: 45717
Link: https://lore.kernel.org/r/20230329173334.4015124-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Better error messages for xe_gt_record_default_lrcs

Add some error messages describing the problem when
xe_gt_record_default_lrcs fails.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: don't auto fall back to execlist mode if guc failed to init

In general, this is due to FW load failure, should just report
error and fail the probe so that user can easily retry again.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/pat: Define PAT tables as static

The tables are only used within this file; there's no reason for them
not to be static.

Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230327175824.2967914-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/bo: refactor try_add_vram

Get rid of some of the duplication here. In a future patch we need to
also consider [fpfn, lpfn], so better adjust in only one place.

Suggested-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: add XE_BO_CREATE_VRAM_MASK

So we don't have to keep repeating VRAM0 | VRAM1. Also if there are ever
more instances, then we have less places to update.

Suggested-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/mtl: Handle PAT_INDEX offset jump

Starting with MTL, the number of entries in the PAT table increased to
16. The register offset jumped between index 7 and index 8, so a slight
adjustment is needed to ensure the PAT_INDEX macros select the proper
offset for the upper half of the table.

Note that although there are 16 registers in the hardware, the driver is
currently only asked to program the first 5, and we leave the rest at
their hardware default values. That means we don't actually touch the
upper half of the PAT table in the driver today and this patch won't
have any functional effect [yet].

Bspec: 44235
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230324210415.2434992-7-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/mtl: Fix PAT table coherency settings

Re-sync our MTL PAT table with the bspec. 1-way coherency should only
be set on table entry 3. We do not want an incorrect setting here to
accidentally paper over other bugs elsewhere in the driver.

Bspec: 45101
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230324210415.2434992-6-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/pat: Clean up PAT register definitions

Replace the deprecated "GEN" terminology in the PAT definitions.

Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230324210415.2434992-5-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/pat: Handle unicast vs MCR PAT registers

The PAT_INDEX registers are MCR registers on some platforms and unicast
on others.  On MTL the handling even varies between GTs:  the primary GT
uses MCR registers while the media GT uses unicast registers.  Let's add
proper MCR programming on the relevant platforms/GTs.

Given that we PAT tables to change pretty regularly on future platforms,
we'll make PAT programming an exception to the usual model of assuming
new platforms should inherit the previous platform's behavior.  Instead
we'll raise a warning if the current platform isn't handled in the
if/else ladder.  This should help prevent subtle cache misbehavior if we
forget to add the table for a new platform.

Bspec: 66534, 67609, 67788
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230324210415.2434992-4-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/pat: Use table-based programming of PAT settings

Provide per-platform tables of PAT values rather than per-platform
functions. This will simplify the handling of unicast vs MCR registers
in the upcoming patches.

Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230324210415.2434992-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/pat: Move PAT setup to a dedicated file

PAT handling is growing in complexity and will continue to do so in
upcoming platforms. Separate it out to a dedicated file to keep things
tidy.

The code is moved as-is here (aside from a few unused #define's that are
just dropped); further changes will come in future patches.

Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://lore.kernel.org/r/20230324210415.2434992-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Decrement fault mode counts in xe_vm_close_and_put

Rather waiting for the VM to be destroyed (all refs to VM go to zero),
drop the fault mode counts when the VM is closed in xe_vm_close_and_put.
This avoids a window where user space can create a faulting VM, close
it, and a subsequent creation of a non-faulting VM fails.

v2 (Lucas): Drop VLK reference in commit message

Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Suggested-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Add max engine priority to xe query

Intel Vulkan driver needs to know what is the maximum priority to fill
a device info struct for applications.

Right now we getting this information by creating a engine and setting
priorities from min to high to know what is the maximum priority for
running process but this leads to info messages to be printed to
dmesg:

xe 0000:03:00.0: [drm] Ioctl argument check failed at drivers/gpu/drm/xe/xe_engine.c:178: value == DRM_SCHED_PRIORITY_HIGH && !capable(CAP_SYS_NICE)

It does not cause any harm but when executing a test suite like
crucible it causes thousands of those messages to be printed.

So here adding one more property to drm_xe_query_config to fetch the
max engine priority.

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Load HuC on Alderlake S

Alderlake S uses TGL HuC.

Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230323224651.1187366-3-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/huc: Support for loading unversiond HuC

Follow the new direction of firmware and add macro
support for loading unversioned HuC. Keep HuC
versioned loading support as well for platforms
that fall under force_probe support

Add check to ensure driver does not do any version check
for HuC if going through unversioned load.

v2: unversioned firmware to be the default for platforms
not under force_probe. Maintain versioned firmware macro support
for platforms under force-probe protection.
v3: Minor style and naming adjustments (Lucas)

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230323224651.1187366-2-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Fix potential deadlock handling page faults

Within a class the GuC will hault scheduling if the head of the queue
can't be scheduled the queue will block. This can lead to deadlock if
BCS0-7 all have faults and another engine on BCS0-7 is at head of the
GuC scheduling queue as the migration engine used to fix tthe fault will
be blocked. To work around this set the migration engine to the highest
priority when servicing page faults.

v2 (Maarten): Set priority to kernel once at creation

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Use BO's GT to determine dma_offset when programming PTEs

Rather than using the passed in GT, use the BO's GT determine dma_offset
when programming PTEs as these two GT's could differ (i.e. mapping a BO
from a remote GT). The BO's GT is correct GT to use as this where BO
resides, while the passed in GT is where the mapping is created.

v2:
(Thomas) - Kernel doc, extra new line
(CI) - Rebase to tip

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/gt: some error handling fixes

Make sure we pass along the correct errors.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Reinstate render / compute cache invalidation in ring ops

Render / compute engines have additional caches (not just TLBs) that
need to be invalidated each batch, reinstate these invalidations in ring
ops.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Use atomic instead of mutex for xe_device_mem_access_ongoing

xe_guc_ct_fast_path() is called from an irq context, and cannot lock
the mutex used by xe_device_mem_access_ongoing().

Fortunately it is easy to fix, and the atomic guarantees are good enough
to ensure xe->mem_access.hold_rpm is set before last ref is dropped.

As far as I can tell, the runtime ref in device access should be
killable, but don't dare to do it yet.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Drop zero length arrays

Zero-length arrays as fake flexible arrays are deprecated and we are
moving towards adopting C99 flexible-array members instead.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/buddy: add compatible and intersects hooks

Copy this from i915. We need .compatible for vram -> vram transfers, so
they don't just get nooped by ttm, if need to move something from
mappable to non-mappble or vice versa. The .intersects is needed for
eviction, to determine if a victim resource is worth eviction. e.g if we
need mappable space there is no point in evicting a resource that has
zero mappable pages.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/buddy: add visible tracking

Replace the allocation code with the i915 version. This simplifies the
code a little, and importantly we get the accounting at the mgr level,
which is useful for debug (and maybe userspace), plus per resource
tracking so we can easily check if a resource is using one or pages in
the mappable part of vram (useful for eviction), or if the resource is
completely within the mappable portion (useful for checking if the
resource can be safely CPU mapped).

v2: Fix missing PAGE_SHIFT
v3: (Gwan-gyeong Mun)
  - Fix incorrect usage of ilog2(mm.chunk_size).
  - Fix calculation when checking for impossible allocation sizes, also
    check much earlier.
v4: (Gwan-gyeong Mun)
  - Fix calculation when extending the [fpfn, lpfn] range due to the
    roundup_pow_of_two().
v5: (Gwan-gyeong Mun)
  - Move the check for running out of mappable VRAM to before doing any of
    the roundup_pow_of_two().
v6: (Jani)
  - Stop abusing BUG_ON(). We can easily just use WARN_ON() here and
    return a proper error to the caller, which is much nicer if we ever
    trigger these.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Stop accepting value in xe_migrate_clear

Although xe_migrate_clear() has a value argument, currently the driver
is only passing 0 at all the places this function is invoked with the
exception the kunit tests are using the parameter to validate this
function with different values.
xe_migrate_clear() is failing on platforms with link copy engines
because xe_migrate_clear() via emit_clear() is using the blitter
instruction XY_FAST_COLOR_BLT to clear the memory. But this instruction
is not supported by link copy engine.
So the solution is to use the alternate instruction MEM_SET when
platform contains link copy engine. But MEM_SET instruction accepts only
8-bit value for setting whereas the value agrument of xe_migrate_clear()
is 32-bit.
So instead of spreading this limitation around all invocations of
xe_migrate_clear() and causing more confusion, it was decided to not
accept any value itself as driver does not really need this currently.

All the kunit tests are adapted as per the new function prototype.

This will be followed by a patch to add support for link copy engines.

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe: Use max wopcm size when validating the preset GuC wopcm size

When the GuC wopcm base and size registers are populated by BIOS/IFWI,
validate the parameters against the maximum allowed wopcm size.

Bpsec: 44982

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/buddy: remove the virtualized start

Hopefully not needed anymore. We can add a .compatible() hook once we
need to differentiate between mappable and non-mappable vram. If the
allocation is not contiguous then the start value is kind of
meaningless, so rather just mark as invalid.

In upstream, TTM wants to eventually remove the ttm_resource.start
usage.

References: 544432703b2f ("drm/ttm: Add new callbacks to ttm res mgr")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/mcr: Separate version from engine type selection

In order to improve readability and make it more future proof,
split the engine type from the graphics/platform checks.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230317223441.3891073-1-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/vram: start tracking the io_size

First step towards supporting small-bar is to track the io_size for
vram. We can longer assume that the io_size == vram size. This way we
know how much is CPU accessible via the BAR, and how much is not.
Effectively giving us a two tiered vram, where in some later patches we
can support different allocation strategies depending on if the memory
needs to be CPU accessible or not.

Note as this stage we still clamp the vram size to the usable vram size.
Only in the final patch do we turn this on for real, and allow distinct
io_size and vram_size.

v2: (Lucas):
- Improve the commit message, plus improve the kernel-doc for the
io_size to give a better sense of what it actually is.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/xe/vm: Defer vm rebind until next exec if nothing to execute

If all compute engines of a vm in compute mode are idle,
defer a rebind to the next exec to avoid the VM unnecessarily trying
to make memory resident and compete with other VMs for available
memory space.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>