Ville Syrjälä [Thu, 18 Jul 2019 14:50:49 +0000 (17:50 +0300)]
drm/i915: Simplify intel_get_crtc_ycbcr_config()
Make intel_get_crtc_ycbcr_config() simpler and rename it
to bdw_get_pipemisc_output_format() to better reflect what
it does.
Also toss in some comments to document that the 4:2:0 PIPECONF
bits are glk+ only. They are mbz on earlier platforms so reading
them unconditionally is safe however.
Ville Syrjälä [Thu, 18 Jul 2019 14:50:48 +0000 (17:50 +0300)]
drm/i915: Don't look at unrelated PIPECONF bits for interlaced readout
Since HSW the PIPECONF progressive vs. interlaced selection is done
with just two bits instead of the earlier three. Let's not look at the
extra bit on HSW+. Also gen2 doesn't support interlaced displays at all.
This is actually fine as is currently because the extra bit is mbz (as
are all three bits on gen2). But just to avoid mishaps in the future
if the bits get reused let's only look at what's properly defined.
Ville Syrjälä [Thu, 18 Jul 2019 16:45:23 +0000 (19:45 +0300)]
drm/i915: Never set limited_color_range=true for YCbCr output
crtc_state->limited_color_range only applies to RGB output but
we're currently setting it even for YCbCr output. That will
lead to conflicting MSA and PIPECONF settings which can mess
up the image. Let's make sure limited_color_range stays unset
with YCbCr output.
Also WARN if we end up with such a bogus combination when
programming the MSA MISC bits as it's impossible to even
indicate quantization rangle for YCbCr via MSA MISC. YCbCr
output is simply assumed to be limited range always. Note
that VSC SDP does provide a mechanism for full range YCbCr,
so in the future we may want to rethink how we compute/store
this state.
And for good measure we add the same WARN to the HDMI path.
Ville Syrjälä [Thu, 18 Jul 2019 14:50:44 +0000 (17:50 +0300)]
drm/i915: Fix AVI infoframe quantization range for YCbCr output
We're configuring the AVI infoframe quantization range bits as if
we're always transmitting RGB pixels. Let's fix this so that we
correctly indicate limited range YCC quantization range when
transmitting YCbCr instead.
Looks like we're currently setting the MSA to xvYCC BT.709 instead
of the YCbCr BT.601 claimed by the comment. But even that comment
is wrong since we configure the CSC matrix to BT.709.
Let's remove the bogus statement from the comment and fix the
MSA to indicate YCbCr BT.709 so that it matches the actual
pixel data we're transmitting.
v2: Remove the separator parameter altogether from
__MAKE_UC_FW_PATH.(Daniele)
- Squash all firmware update patches (Daniele)
v3: s/huc/HuC
- Correct the order of platforms
- Change REVID of cml to 5(Michal)
- Code space changes in huc_def (Daniele)
Chris Wilson [Fri, 20 Sep 2019 08:12:54 +0000 (09:12 +0100)]
Revert "drm/i915/tgl: Implement Wa_1406941453"
Our sanitychecks indicate that while this register is context
saved/restore, the HW does not preserve this bit within the register --
it likely doesn't exist, or one of those mythical bits that the
architects insist does something despite all appearances to the
contrary.
For reference, SAMPLER_MODE is already in i915_reg.h as
GEN10_SAMPLER_MODE and is being setup in icl_ctx_workarounds_init() as
opposed to the chosen location here of rcs_engine_wa_init).
Chris Wilson [Thu, 19 Sep 2019 11:19:12 +0000 (12:19 +0100)]
drm/i915: Protect timeline->hwsp dereferencing
As not only is the signal->timeline volatile, so will be acquiring the
timeline's HWSP. We must first carefully acquire the timeline from the
signaling request and then lock the timeline. With the removal of the
struct_mutex serialisation of request construction, we can have multiple
timelines active at once, and so we must avoid using the nested mutex
lock as it is quite possible for both timelines to be establishing
semaphores on the other and so deadlock.
Chris Wilson [Thu, 19 Sep 2019 11:19:11 +0000 (12:19 +0100)]
drm/i915: Lock signaler timeline while navigating
As we need to take a walk back along the signaler timeline to find the
fence before upon which we want to wait, we need to lock that timeline
to prevent it being modified as we walk. Similarly, we also need to
acquire a reference to the earlier fence while it still exists!
Though we lack the correct locking today, we are saved by the
overarching struct_mutex -- but that protection is being removed.
v2: Tvrtko made me realise I was being lax and using annotations to
ignore the AB-BA deadlock from the timeline overlap. As it would be
possible to construct a second request that was using a semaphore from the
same timeline as ourselves, we could quite easily end up in a situation
where we deadlocked in our mutex waits. Avoid that by using a trylock
and falling back to a normal dma-fence await if contended.
v3: Eek, the signal->timeline is volatile and must be carefully
dereferenced to ensure it is valid.
Chris Wilson [Thu, 19 Sep 2019 11:19:10 +0000 (12:19 +0100)]
drm/i915: Mark i915_request.timeline as a volatile, rcu pointer
The request->timeline is only valid until the request is retired (i.e.
before it is completed). Upon retiring the request, the context may be
unpinned and freed, and along with it the timeline may be freed. We
therefore need to be very careful when chasing rq->timeline that the
pointer does not disappear beneath us. The vast majority of users are in
a protected context, either during request construction or retirement,
where the timeline->mutex is held and the timeline cannot disappear. It
is those few off the beaten path (where we access a second timeline) that
need extra scrutiny -- to be added in the next patch after first adding
the warnings about dangerous access.
One complication, where we cannot use the timeline->mutex itself, is
during request submission onto hardware (under spinlocks). Here, we want
to check on the timeline to finalize the breadcrumb, and so we need to
impose a second rule to ensure that the request->timeline is indeed
valid. As we are submitting the request, it's context and timeline must
be pinned, as it will be used by the hardware. Since it is pinned, we
know the request->timeline must still be valid, and we cannot submit the
idle barrier until after we release the engine->active.lock, ergo while
submitting and holding that spinlock, a second thread cannot release the
timeline.
v2: Don't be lazy inside selftests; hold the timeline->mutex for as long
as we need it, and tidy up acquiring the timeline with a bit of
refactoring (i915_active_add_request)
Chris Wilson [Thu, 19 Sep 2019 15:18:11 +0000 (16:18 +0100)]
drm/i915/tgl: Suspend pre-parser across GTT invalidations
Before we execute a batch, we must first issue any and all TLB
invalidations so that batch picks up the new page table entries.
Tigerlake's preparser is weakening our post-sync CS_STALL inside the
invalidate pipe-control and allowing the loading of the batch buffer
before we have setup its page table (and so it loads the wrong page and
executes indefinitely).
The igt_cs_tlb indicates that this issue can only be observed on rcs,
even though the preparser is common to all engines. Alternatively, we
could do TLB shootdown via mmio on updating the GTT.
By inserting the pre-parser disable inside EMIT_INVALIDATE, we will also
accidentally fixup execution that writes into subsequent batches, such
as gem_exec_whisper and even relocations performed on the GPU. We should
be careful not to allow this disable to become baked into the uABI! The
issue is that if userspace relies on our disabling of the HW
optimisation, when we are ready to enable that optimisation, userspace
will then be broken...
Ville Syrjälä [Wed, 18 Sep 2019 15:07:07 +0000 (18:07 +0300)]
drm/i915: Don't advertise modes that exceed the max plane size
Modern platforms allow the transcoders hdisplay/vdisplay to exceed the
planes' max resolution. This has the nasty implication that modes on the
connectors' mode list may not be usable when the user asks for a
fullscreen plane. Seeing as that is the most common use case it seems
prudent to filter out modes that don't allow for fullscreen planes to
be enabled.
Let's do that in the connetor .mode_valid() hook so that normally
such modes are kept hidden but the user is still able to forcibly
specify such a mode if they know they don't need fullscreen planes.
This is in line with ealier policies regarding certain clock limits.
The idea is to prevent the casual user from encountering a mode that
would fail under typical conditions, but allow the expert user to
force things if they so wish.
Maybe in the future we should consider automagically using two
planes when one can't cover the entire screen? Wouldn't be a
great match for the current uapi with explicit planes though,
but I guess no worse than using two pipes (which we apparently
have to in the future anyway). Either that or we'd have to
teach userspace to do it for us.
Ville Syrjälä [Thu, 5 Sep 2019 13:50:43 +0000 (16:50 +0300)]
drm/i915: Bump skl+ max plane width to 5k for linear/x-tiled
The officially validated plane width limit is 4k on skl+, however
we already had people using 5k displays before we started to enforce
the limit. Also it seems Windows allows 5k resolutions as well
(though not sure if they do it with one plane or two).
According to hw folks 5k should work with the possible
exception of the following features:
- Ytile (already limited to 4k)
- FP16 (already limited to 4k)
- render compression (already limited to 4k)
- KVMR sprite and cursor (don't care)
- horizontal panning (need to verify this)
- pipe and plane scaling (need to verify this)
So apart from last two items on that list we are already
fine. We should really verify what happens with those last
two items but I don't have a 5k display on hand atm so it'll
have to wait.
In the meantime let's just bump the limit back up to 5k since
several users have already been using it without apparent issues.
At least we'll be no worse off than we were prior to lowering
the limits.
Matt Roper [Wed, 18 Sep 2019 23:56:26 +0000 (16:56 -0700)]
drm/i915: Unify ICP and MCC hotplug pin tables
The MCC hpd table is just a subset of the ICP table; we can eliminate it
and use the ICP table everywhere. The extra pins in the table won't be
a problem for MCC since we still supply an appropriate hotplug trigger
mask anywhere the pin table is used.
Matt Roper [Wed, 18 Sep 2019 23:56:25 +0000 (16:56 -0700)]
drm/i915: Future-proof DDC pin mapping
We generally assume future platforms will inherit the behavior of the
most recent platforms, so update our DDC pin mapping defaults to match
how ICP/TGP behave (i.e., pins starting from GMBUS_PIN_1_BXT for combo
PHY's and pins starting from GMBUS_PIN_9_TC1_ICP for TC PHY's). MCC's
non-standard handling of combo PHY C seems like a platform-specific
quirk that is unlikely to be duplicated on future platforms, so continue
handling it as a special case.
Without this change, future platforms would default to gen4-style pin
mapping which is almost certainly not what we'll want.
Chris Wilson [Wed, 18 Sep 2019 14:54:50 +0000 (15:54 +0100)]
drm/i915: Verify the engine after acquiring the active.lock
When using virtual engines, the rq->engine is not stable until we hold
the engine->active.lock (as the virtual engine may be exchanged with the
sibling). Since commit 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
we may retire a request concurrently with resubmitting it to HW, we need
to be extra careful to verify we are holding the correct lock for the
request's active list. This is similar to the issue we saw with
rescheduling the virtual requests, see sched_lock_engine().
Our assumption that the we can ask the HW to lock the SFC even if not
currently in use does not match the HW commitment. The expectation from
the HW is that SW will not try to lock the SFC if the engine is not
using it and if we do that the behavior is undefined; on ICL the HW
ends up to returning the ack and ignoring our lock request, but this is
not guaranteed and we shouldn't expect it going forward.
Also, failing to get the ack while the SFC is in use means that we can't
cleanly reset it, so fail the engine reset in that scenario.
v2: drop rmw change, keep the log as debug and handle failure (Chris),
improve comments (Tvrtko).
Chris Wilson [Tue, 17 Sep 2019 19:47:46 +0000 (20:47 +0100)]
drm/i915: Extend Haswell GT1 PSMI workaround to all
A few times in CI, we have detected a GPU hang on our Haswell GT2
systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
was first introducted, it was applied to all Haswell, but later on we
found an erratum that supposedly restricted the issue to GT1 and so
constrained it only be applied on GT1. That may have been a mistake...
Matt Roper [Mon, 16 Sep 2019 23:32:51 +0000 (16:32 -0700)]
drm/i915/cml: Add second PCH ID for CMP
The CMP PCH ID we have in the driver is correct for the CML-U machines we have
in our CI system, but the CML-S and CML-H CI machines appear to use a
different PCH ID, leading our driver to detect no PCH for them.
Chris Wilson [Tue, 17 Sep 2019 12:30:55 +0000 (13:30 +0100)]
drm/i915/tgl: Extend MI_SEMAPHORE_WAIT
On Tigerlake, MI_SEMAPHORE_WAIT grew an extra dword, so be sure to
update the length field and emit that extra parameter and any padding
noop as required.
v2: Define the token shift while we are adding the updated MI_SEMAPHORE_WAIT
v3: Use int instead of bool in the addition so that readers are not left
wondering about the intricacies of the C spec. Now they just have to
worry what the integer value of a boolean operation is...
Chris Wilson [Tue, 17 Sep 2019 08:00:29 +0000 (09:00 +0100)]
drm/i915: Only apply a rmw mmio update if the value changes
If we try to clear, or even set, a bit in the register that doesn't
change the register state; skip the write. There's a slight danger in
that the register acts as a latch-on-write, but I do not think we use a
rmw cycle with any such latch registers.
Jani Nikula [Mon, 16 Sep 2019 09:29:01 +0000 (12:29 +0300)]
drm/i915: stop conflating HAS_DISPLAY() and disabled display
Stop setting ->pipe_mask to zero when display is disabled, allowing us
to have different code paths for not actually having display hardware,
and having display hardware disabled. This lets us develop those two
avenues independently.
There are no functional changes for when there is no display. However,
all uses of for_each_pipe() and for_each_pipe_masked() will start
running for the disabled display case. Put one of the more significant
ones behind checks for INTEL_DISPLAY_ENABLED(), otherwise the cases
should not be hit with disabled display, or they seem benign. Fingers
crossed.
All in all, this might not be the ideal solution. In fact we may have
had something along the lines of this in the past, but we ended up
conflating the two cases. Possibly even by recommendation by yours
truly; I did not dare dig up that part of the history. But the perfect
is the enemy of the good, this is a straightforward change, and lets us
get actual work done in both fronts without interfering with each other.
Ville Syrjälä [Fri, 13 Sep 2019 19:31:55 +0000 (22:31 +0300)]
drm/i915: Allow downscale factor of <3.0 on glk+ for all formats
Bspec says that glk+ max downscale factor is <3.0 for all pixel formats.
Older platforms had a max of <2.0 for NV12. Update the code to deal with
this.
Jani Nikula [Fri, 13 Sep 2019 10:04:07 +0000 (13:04 +0300)]
drm/i915: introduce INTEL_DISPLAY_ENABLED()
Prepare for making a distinction between not having display and having
disabled display. Add INTEL_DISPLAY_ENABLED() and use it where
HAS_DISPLAY() is used after intel_device_info_runtime_init(). This is
initially duplication, as disabling display still leads to ->pipe_mask =
0 and HAS_DISPLAY() being false.
Note that ever since i915.display_disable was introduced, it has not
affected PCH detection even if it uses HAS_DISPLAY(), as display disable
happens after that.
Since INTEL_DISPLAY_ENABLED() will not make sense unless HAS_DISPLAY()
is true, include a warning for catching misuses making decisions on
INTEL_DISPLAY_ENABLED() when HAS_DISPLAY() is false.
v2: Remove INTEL_DISPLAY_ENABLED() check from intel_detect_pch() (Chris)
Chris Wilson [Thu, 12 Sep 2019 16:08:34 +0000 (17:08 +0100)]
drm/i915: Don't mix srcu tag and negative error codes
While srcu may use an integer tag, it does not exclude potential error
codes and so may overlap with our own use of -EINTR. Use a separate
outparam to store the tag, and report the error code separately.
On ICL+, the max supported plane height is 4320, so bump it up
To support 4320, we need to increase the number of bits used to
read plane_height to 13 as opposed to older 12 bits.
v4:
* Adjust the width mask also since extra bits are mbz (Ville)
v3:
* Use 0xffff for mask as extra bits are mbz (Ville)
v2:
* ICL plane height supported is 4320 (Ville)
* Add a new line between max width and max height (Jose)
Chris Wilson [Fri, 13 Sep 2019 06:42:00 +0000 (07:42 +0100)]
drm/i915/gtt: Make sure the gen6 ppgtt is bound before first use
As we remove the struct_mutex protection from around the vma pinning,
counters need to be atomic and aware that there may be multiple threads
simultaneously active.
Chris Wilson [Thu, 12 Sep 2019 13:23:13 +0000 (14:23 +0100)]
drm/i915/tgl: Disable preemption while being debugged
We see failures where the context continues executing past a
preemption event, eventually leading to situations where a request has
executed before we have event submitted it to HW! It seems like tgl is
ignoring our RING_TAIL updates, but more likely is that there is a
missing update required for our semaphore waits around preemption.
Chris Wilson [Thu, 12 Sep 2019 12:48:13 +0000 (13:48 +0100)]
drm/i915/pmu: Use GT parked for estimating RC6 while asleep
As we track when we put the GT device to sleep upon idling, we can use
that callback to sample the current rc6 counters and record the
timestamp for estimating samples after that point while asleep.
v2: Stick to using ktime_t
v3: Track user_wakerefs that interfere with the new
intel_gt_pm_wait_for_idle
v4: No need for parked/unparked estimation if !CONFIG_PM
v5: Keep timer park/unpark logic as was
v6: Refactor duplicated estimate/update rc6 logic
v7: Pull intel_get_pm_get_if_awake() out from the pmu->lock.
Jani Nikula [Wed, 11 Sep 2019 20:29:08 +0000 (23:29 +0300)]
drm/i915: convert device info num_pipes to pipe_mask
Replace device info number of pipes with a bit mask of available
pipes. This will prove handy in the future. There's still a bunch of
future work to do to actually allow a non-consecutive mask of pipes, but
it's a start. No functional changes.
drm/i915/pmu: Skip busyness sampling when and where not needed
Since d0aa694b9239 ("drm/i915/pmu: Always sample an active ringbuffer")
the cost of sampling the engine state on execlists platforms became a
little bit higher when both engine busyness and one of the wait states are
being monitored. (Previously the busyness sampling on legacy platforms was
done via seqno comparison so there was no cost of mmio read.)
We can avoid that by skipping busyness sampling when engine supports
software busy stats and so avoid the cost of potential mmio read and
sample accumulation.
Chris Wilson [Thu, 12 Sep 2019 09:29:33 +0000 (10:29 +0100)]
drm/i915/execlists: Ensure the context is reloaded after a GPU reset
After we manipulate the context to allow replay after a GPU reset, force
that context to be reloaded. This should be a layer of paranoia, for if
the GPU was reset, the context will no longer be resident!
Chris Wilson [Thu, 12 Sep 2019 09:29:32 +0000 (10:29 +0100)]
drm/i915/execlists: Add a paranoid flush of the CSB pointers upon reset
After a GPU reset, we need to drain all the CS events so that we have an
accurate picture of the execlists state at the time of the reset. Be
paranoid and force a read of the CSB write pointer from memory.
Chris Wilson [Wed, 11 Sep 2019 17:59:26 +0000 (18:59 +0100)]
drm/i915: Disable FBC if BIOS reserved memory (stolen) is unavailable
The FBC requires a couple of contiguous buffers, which we allocate from
stolen memory. If stolen memory is unavailable, we cannot allocate those
buffers and so cannot support FBC. Mark it so.
Reuse the same .modeset_calc_cdclk() function for all bxt+.
The only difference in between the cnl/icl and the bxt variants
is the call to cnl_compute_min_voltage_level(). We can do that call
just fine on older platforms since they leave min_voltage_level[]
zeroed. Let's rename the function to bxt_compute_min_voltage_level()
just so it stays consistent with the rest of the naming scheme.
Ville Syrjälä [Wed, 11 Sep 2019 13:31:27 +0000 (16:31 +0300)]
drm/i915: Fix CD2X pipe select masking during cdclk sanitation
We're forgetting to mask off all three pipe select bits from the
CDCLK_CTL value on icl+ which may lead to the extra bit being
left in. That will cause us to consider the current hardware
cdclk state as invalid, and we proceed to sanitize it even
though the hardware may have active pipes and whatnot.
Fix up the mask so we get rid of all three pipe select bits
and thus hopefully no longer sanitize cdclk when it's already
correctly programmed.
Ville Syrjälä [Wed, 11 Sep 2019 13:31:26 +0000 (16:31 +0300)]
drm/i915: Fix cdclk bypass freq readout for tgl/bxt/glk
On tgl/bxt/glk the cdclk bypass frequency depends on the PLL
reference clock. So let's read out the ref clock before we
try to compute the bypass clock.
Chris Wilson [Wed, 11 Sep 2019 11:46:55 +0000 (12:46 +0100)]
drm/i915: Squeeze iommu status into debugfs/i915_capabilities
There's no easy way of checking whether iommu is enabled for the GPU
(you can grep dmesg if you know the device, or you can grep
i915_gpu_info if that's available). We do have a central
i915_capabilities with the intent of listing such pertinent information,
so add the iommu status.
Chris Wilson [Wed, 11 Sep 2019 07:47:27 +0000 (08:47 +0100)]
drm/i915/display: Add glk_cdclk_table
Commit 736da8112fee ("drm/i915: Use literal representation of cdclk
tables") pushed the cdclk logic into tables, adding glk_cdclk_table but
not using yet:
drivers/gpu/drm/i915/display/intel_cdclk.c:1173:38: error: ‘glk_cdclk_table’ defined but not used [-Werror=unused-const-variable=]
Chris Wilson [Wed, 11 Sep 2019 09:02:43 +0000 (10:02 +0100)]
drm/i915: Make i915_vma.flags atomic_t for mutex reduction
In preparation for reducing struct_mutex stranglehold around the vm,
make the vma.flags atomic so that we can acquire a pin on the vma
atomically before deciding if we need to take the mutex.
This allows userspace to use "legacy" mode for push constants, where
they are committed at 3DPRIMITIVE or flush time, rather than being
committed at 3DSTATE_BINDING_TABLE_POINTERS_XS time. Gen6-8 and Gen11
both use the "legacy" behavior - only Gen9 works in the "new" way.
Conflating push constants with binding tables is painful for userspace,
we would like to be able to avoid doing so.
Matt Roper [Tue, 10 Sep 2019 15:42:52 +0000 (08:42 -0700)]
drm/i915: Consolidate {bxt,cnl,icl}_init_cdclk
The BXT and CNL functions were already basically identical, whereas
ICL's function tried to do its own sanitization rather than calling
bxt_sanitize_cdclk.
This should actually fix a bug in our ICL initialization where it would
consider the /2 CD2X divider invalid and force an unnecessary
sanitization (we now have valid clock frequencies that use this
divider).
Matt Roper [Tue, 10 Sep 2019 15:42:51 +0000 (08:42 -0700)]
drm/i915: Enhance cdclk sanitization
When reading out the BIOS-programmed cdclk state, let's make sure that
the cdclk value is on the valid list for the platform, ensure that the
VCO matches the cdclk, and ensure that the CD2X divider was set
properly.
Matt Roper [Tue, 10 Sep 2019 15:42:47 +0000 (08:42 -0700)]
drm/i915: Combine bxt_set_cdclk and cnl_set_cdclk
We'd previously combined ICL/TGL logic into the cnl_set_cdclk function,
but BXT is pretty similar as well. Roll the cnl/icl/tgl logic back into
the bxt function; the only things we really need to handle separately
are punit notification and calling different functions to enable/disable
the cdclk PLL.
Matt Roper [Tue, 10 Sep 2019 16:15:06 +0000 (09:15 -0700)]
drm/i915: Use literal representation of cdclk tables
The bspec lays out legal cdclk frequencies, PLL ratios, and CD2X
dividers in an easy-to-read table for most recent platforms. We've been
translating the data from that table into platform-specific code logic,
but it's easy to overlook an area we need to update when adding new
cdclk values or enabling new platforms. Let's just add a form of the
bspec table to the code and then adjust our functions to pull what they
need directly out of the table.
v2: Fix comparison when finding best cdclk.
v3: Another logic fix for calc_cdclk.
v4:
- Use named initializers for cdclk tables. (Ville)
- Include refclk as a field in the table instead of adding all three
ratios for each entry. (Ville)
- Terminate tables with an empty entry to avoid needing to store the
table size. (Ville)
- Don't try so hard to return reasonable values from our lookup
functions if we get impossible inputs; just WARN and return 0.
(Ville)
- Keep a bxt_ prefix on the lookup functions since they're still only
used on bxt+ for now. We can rename them later if we extend this
table-based approach back to older platforms. (Ville)
v5:
- Fix cnl table's ratios for 24mhz refclk. (Ville)
- Don't miss the named initializers on the cnl table. (Ville)
- Represent refclk in table as u16 rather than u32. (Ville)
Matt Roper [Tue, 10 Sep 2019 16:05:20 +0000 (09:05 -0700)]
drm/i915: Consolidate bxt/cnl/icl cdclk readout
Aside from a few minor register changes and some different clock values,
cdclk design hasn't changed much since gen9lp. Let's consolidate the
handlers for bxt, cnl, and icl to keep the codeflow consistent.
Also, while we're at it, s/bxt_de_pll_update/bxt_de_pll_readout/ since
"update" makes me think we should be writing to hardware rather than
reading from it.
v2:
- Fix icl_calc_voltage_level() limits. (Ville)
- Use CNL_CDCLK_PLL_RATIO_MASK rather than BXT_DE_PLL_RATIO_MASK on
gen10+ to avoid confusion. (Ville)
v3:
- Also fix ehl_calc_voltage_level() limits. (Ville)
Chris Wilson [Tue, 10 Sep 2019 16:16:57 +0000 (17:16 +0100)]
drm/i915/tgl: Disable rc6 for debugging
Empirical evidence from CI tells us that our rc6 setup for Tigerlake is
off. Disable rc6 on tgl temporary so that we gain CI coverage as we
prepare a fix. It also appears that the BIOS on our tgl leaves rc6
enabled, so we have to explicitly disable it on init.
Chris Wilson [Tue, 10 Sep 2019 12:10:09 +0000 (13:10 +0100)]
drm/i915/selftests: Tighten the timeout testing for partial mmaps
Currently, if there is time remaining before the start of the loop, we
do one full iteration over many possible different chunks within the
object. A full loop may take 50+s (depending on speed of indirect GTT
mmapings) and we try separately with LINEAR, X and Y -- at which point
igt times out. If we check more frequently, we will interrupt the loop
upon our timeout -- it is hard to argue for as this significantly reduces
the test coverage as we dramatically reduce the runtime. In practical
terms, the coverage we should prioritise is in using different fence
setups, forcing verification of the tile row computations over the
current preference of checking extracting chunks. Though the exhaustive
search is great given an infinite timeout, to improve our current
coverage, we also add a randomised smoketest of partial mmaps. So let's
do both, add a randomised smoketest of partial tiling chunks and the
exhaustive (though time limited) search for failures.
Even in adding another subtest, we should shave 100s off BAT! (With,
hopefully, no loss in coverage, at least over multiple runs.)
Chris Wilson [Mon, 9 Sep 2019 11:00:06 +0000 (12:00 +0100)]
drm/i915/selftests: Take runtime wakeref for igt_ggtt_lowlevel
Being a "low-level" test, we opt to bypass the normal bind/unbind hooks
for the lower level insert_entries/clear_range. For ggtt, the
bind/unbind hooks provide the runtime wakeref and so we must also handle
this in exercising the low level hooks.
Chris Wilson [Mon, 9 Sep 2019 11:00:08 +0000 (12:00 +0100)]
drm/i915: Perform GGTT restore much earlier during resume
As soon as we re-enable the various functions within the HW, they may go
off and read data via a GGTT offset. Hence, if we have not yet restored
the GGTT PTE before then, they may read and even *write* random locations
in memory.
Chris Wilson [Mon, 9 Sep 2019 11:30:18 +0000 (12:30 +0100)]
drm/i915/ringbuffer: Flush writes before RING_TAIL update
Be paranoid and make sure we flush any and all writes out of the WCB
before performing the UC mmio to update the RING_TAIL. (An UC write
should itself be enough to do the flush, hence the paranoia here.) Quite
infrequently, we see problems where the GPU seems to overshoot the
RING_TAIL and so executes garbage, hence the speculation.
Chris Wilson [Sat, 7 Sep 2019 08:43:34 +0000 (09:43 +0100)]
drm/i915/execlists: Ignore lost completion events
Icelake hit an issue where it missed reporting a completion event and
instead jumped straight to a idle->active event (skipping over the
active->idle and not even hitting the lite-restore preemption).
For cherryview, add hw read out to create hw blob of gamma
lut values.
Review comments from previous series:
https://patchwork.freedesktop.org/patch/328252
v4: -No need to initialize *blob [Jani]
-Removed right shifts [Jani]
-Dropped dev local var [Jani]
v5: -Returned blob instead of assigning it internally within the
function [Ville]
-Renamed function cherryview_get_color_config() to chv_read_luts()
-Renamed cherryview_get_gamma_config() to chv_read_cgm_gamma_lut()
[Ville]
v9: -80 character limit [Uma]
-Made read func para as const [Ville, Uma]
-Renamed chv_read_cgm_gamma_lut() to chv_read_cgm_gamma_lut()
[Ville, Uma]
For i965, add hw read out to create hw blob of gamma
lut values.
Review comments from old series:
https://patchwork.freedesktop.org/series/58039/
v4: -No need to initialize *blob [Jani]
-Removed right shifts [Jani]
-Dropped dev local var [Jani]
v5: -Returned blob instead of assigning it internally
within the function [Ville]
-Renamed i965_get_color_config() to i965_read_lut() [Ville]
-Renamed i965_get_gamma_config_10p6() to i965_read_gamma_lut_10p6()
[Ville]
v9: -Typo and 80 character limit [Uma]
-Made read func para as const [Ville, Uma]
-Renamed i965_read_gamma_lut_10p6() to i965_read_lut_10p6() [Ville, Uma]
v10: -Swapped ldw and udw while creating hw blob [Jani]
-Added last index rgb lut value from PIPEGCMAX to h/w blob [Jani]
drm/i915/display: Add gamma precision function for CHV
intel_color_get_gamma_bit_precision() is extended for
cherryview by adding chv_gamma_precision(), i965 will use existing
i9xx_gamma_precision() func only.
Chris Wilson [Tue, 10 Sep 2019 08:02:08 +0000 (09:02 +0100)]
drm/i915/execlists: Clear STOP_RING bit on reset
During reset, we try to ensure no forward progress of the CS prior to
the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this
register is context saved and do we end up in the odd situation where we
save the STOP_RING bit and so try to stop the engine again immediately
upon resume. This is quite unexpected and causes us to complain about an
early CS completion event!
Matthew Auld [Mon, 9 Sep 2019 17:16:46 +0000 (18:16 +0100)]
drm/i915: include GTT page-size info in error state
It might prove useful in the future to know if the vma is utilising
huge-GTT-pages. Related to this is the GTT cache, where there is some HW
"quirkiness" where it must be disabled if using 2M pages, so include
that for good measure.
Matthew Auld [Mon, 9 Sep 2019 12:40:52 +0000 (13:40 +0100)]
drm/i915: cleanup cache-coloring
Try to tidy up the cache-coloring such that we rid the code of any
mm.color_adjust assumptions, this should hopefully make it more obvious
in the code when we need to actually use the cache-level as the color,
and as a bonus should make adding a different color-scheme simpler.
Chris Wilson [Sat, 7 Sep 2019 10:50:46 +0000 (11:50 +0100)]
drm/i915/execlists: Remove incorrect BUG_ON for schedule-out
As we may unwind incomplete requests (for preemption) prior to
processing the CSB and the schedule-out events, we may update rq->engine
(resetting it to point back to the parent virtual engine) prior to
calling execlists_schedule_out(), invalidating the assertion that the
request still points to the inflight engine. (The likelihood of this is
increased if the CSB interrupt processing is pushed to the ksoftirqd for
being too slow and direct submission overtakes it.)
Tvrtko summarised it as:
"So unwind from direct submission resets rq->engine and races with
process_csb from the tasklet which notices request has actually
completed."
Michel Thierry [Fri, 6 Sep 2019 12:23:14 +0000 (15:23 +0300)]
drm/i915/tgl: Register state context definition for Gen12
Gen12 has subtle changes in the reg state context offsets (some fields
are gone, some are in a different location), compared to previous Gens.
The simplest approach seems to be keeping Gen12 (and future platform)
changes apart from the previous gens, while keeping the registers that
are contiguous in functions we can reuse.
v2: alias, virtual engine, rpcs, prune unused regs
v3: use engine base (Daniele), take ctx_bb for all
Mika Kuoppala [Fri, 6 Sep 2019 13:49:57 +0000 (16:49 +0300)]
drm/i915: Use engine relative LRIs on context setup
Daniele pointed out that relative mmio works differently in
on context restore. Instead of adding the engine mmio base to offset,
it masks out the base and adds bits [12:2] to current engine base.
This should allow us to construct context register state to be
applicable to all instances, including virtual. And avoid the trouble
of updating the registers on virtual instances when submitting work.
v2: only enable for gen12 for now (Mika)
v3: make enabling readable (Chris)
Matt Roper [Thu, 5 Sep 2019 18:13:37 +0000 (11:13 -0700)]
drm/i915/tgl: Use refclk/2 as bypass frequency
Unlike gen11, which always ran at 50MHz when the cdclk PLL was disabled,
TGL runs at refclk/2. The 50MHz croclk/2 is only used by hardware
during some power state transitions.
Chris Wilson [Tue, 3 Sep 2019 06:21:33 +0000 (07:21 +0100)]
drm/i915: Protect debugfs per_file_stats with RCU lock
If we make sure we grab a strong reference to each object as we dump it,
we can reduce the locks outside of our iterators to an rcu_read_lock.
This should prevent errors like:
[ 2138.371911] BUG: KASAN: use-after-free in per_file_stats+0x43/0x380 [i915]
[ 2138.371924] Read of size 8 at addr ffff888223651000 by task cat/8293
drm/i915/mst: Do not hardcoded the crtcs that encoder can connect
Tiger Lake has up to 4 pipes so the mask would need to be 0xf instead of
0x7. Do not hardcode the mask so it allows the fake MST encoders to
connect to all pipes no matter how many the platform has.
Iterating over all pipes to keep consistent with intel_ddi_init().
Initialy this patch was replaced by commit 4eaceea3a00f ("drm/i915:
Fix DP-MST crtc_mask") but userspace it not correctly using
encoder.possible_crtcs and it was reverted by
commit e838bfa8e170 ("Revert "drm/i915: Fix DP-MST crtc_mask"")
Userspace should be fixed but it might take a while, so bringing this
patch back for now.
Lucas De Marchi [Wed, 4 Sep 2019 21:34:19 +0000 (14:34 -0700)]
drm/i915/tgl: add gen12 to stolen initialization
Add case for gen == 12 and add MISSING_CASE() for future gens. We were
already handling gen12 as the default, so this doesn't change the
current behavior.