Hung, Cruise [Fri, 13 May 2022 01:16:42 +0000 (09:16 +0800)]
drm/amd/display: Fix DMUB outbox trace in S4 (#4465)
[Why]
DMUB Outbox0 read/write pointer not sync after resumed from S4.
And that caused old traces were sent to outbox.
[How]
Disable DMUB Outbox0 interrupt
and clear DMUB Outbox0 read/write pointer when resumes from S4.
And then enable Outbox0 interrupt before starts DMCUB.
hengzhou [Sat, 7 May 2022 01:43:08 +0000 (09:43 +0800)]
drm/amd/display: Wait DMCUB to idle state before reset.
[WHY]
Very low rate to cause memory access issue while resetting
DMCUB after the halt command was sent to it.
The process of stopping fw of DMCUB may be timeout, that means
it is not in idle state, such as the window frames may still be
kept in cache, so reset by force will cause MMHUB hang.
[HOW]
After the halt command was sent, keep checking the DMCUB state until
it is idle.
drm/amd/display: Pass the new context into disable OTG WA
[Why]
When enabling an HPO stream for the first time after having previously
enabled a DIO stream there may be lingering DIO FIFO errors even though
the DIO is no longer enabled.
These can cause display clock change to hang if we don't apply the
OTG disable workaround since the ramping logic is tied to OTG on.
[How]
The workaround wasn't being applied in the sequence of:
1 DIO stream
0 streams
1 HPO stream
because current_state has no stream or planes in its context - and
it's only swapped after optimize has finished.
We should be using the incoming context instead to determine whether
this logic is needed or not.
Dave Airlie [Wed, 1 Jun 2022 00:56:11 +0000 (10:56 +1000)]
Merge tag 'amd-drm-next-5.19-2022-05-26-2' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.19-2022-05-26-2:
amdgpu:
- Update fdinfo to the common drm format
UAPI:
- Add VM_NOALLOC GPUVM attribute to prevent buffers for going into the MALL
Add AMDGPU_GEM_CREATE_DISCARDABLE flag to create buffers that can be discarded on eviction
Mesa code which uses these: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16466
Christian König [Wed, 11 May 2022 09:06:26 +0000 (11:06 +0200)]
drm/amdgpu: Convert to common fdinfo format v5
Convert fdinfo format to one documented in drm-usage-stats.rst.
It turned out that the existing implementation was actually completely
nonsense. The calculated percentages indeed represented the usage of the
engine, but with varying time slices.
So 10% usage for application A could mean something completely different
than 10% usage for application B.
Completely nuke that and just use the now standardized nanosecond
interface.
v2: drop the documentation change for now, nuke percentage calculation
v3: only account for each hw_ip, move the time_spend to the ctx mgr.
v4: move general ctx changes into separate patch, rework the fdinfo to
ctx_mgr interface so that all usages are calculated at once, drop
some unecessary and dangerous refcount dance.
v5: add one more comment how we calculate the time spend
Alex Deucher [Mon, 16 May 2022 18:12:33 +0000 (14:12 -0400)]
drm/amdgpu/discovery: validate VCN and SDMA instances
Validate the VCN and SDMA instances against the driver
structure sizes to make sure we don't get into a
situation where the firmware reports more instances than
the driver supports.
Sung Joon Kim [Thu, 19 May 2022 21:46:36 +0000 (17:46 -0400)]
drm/amd/display: add Coverage blend mode for overlay plane
According to the KMS man page, there is a
"Coverage" alpha blend mode that assumes the
pixel color values have NOT been pre-multiplied
and will be done when the actual blending to
the background color values happens.
Previously, this mode hasn't been enabled
in our driver and it was assumed that all
normal overlay planes are pre-multiplied
by default.
When a 3rd party app is used to input a image
in a specific format, e.g. PNG, as a source
of a overlay plane to blend with the background
primary plane, the pixel color values are not
pre-multiplied. So by adding "Coverage" blend
mode, our driver will support those cases.
Adding Coverage support also enables IGT
kms_plane_alpha_blend Coverage subtests:
1. coverage-7efc
2. coverage-vs-premult-vs-constant
Changes
1. Add DRM_MODE_BLEND_COVERAGE blend mode capability
2. Add "pre_multiplied_alpha" flag for Coverage case
3. Read the correct flag and set the DCN MPCC
pre_multiplied register bit (only on overlay plane)
Dan Carpenter [Mon, 16 May 2022 07:05:48 +0000 (10:05 +0300)]
drm/amdgpu: Off by one in dm_dmub_outbox1_low_irq()
The > ARRAY_SIZE() should be >= ARRAY_SIZE() to prevent an out of bounds
access.
Fixes: e27c41d5b068 ("drm/amd/display: Support for DMUB HPD interrupt handling") Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Evan Quan [Wed, 6 Apr 2022 06:14:50 +0000 (14:14 +0800)]
drm/amd/pm: correct the metrics version for SMU 11.0.11/12/13
Correct the metrics version used for SMU 11.0.11/12/13.
Fixes misreported GPU metrics (e.g., fan speed, etc.) depending
on which version of SMU firmware is loaded.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1925 Signed-off-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Jay Cornwall [Thu, 30 Dec 2021 13:32:06 +0000 (21:32 +0800)]
drm/amdkfd: Add gfx11 trap handler
Based on gfx10 with following changes:
- GPR_ALLOC.VGPR_SIZE field moved (and size corrected in gfx10)
- s_sendmsg_rtn_b64 replaces some s_sendmsg/s_getreg
- Buffer instructions no longer have direct-to-LDS modifier
Lijo Lazar [Thu, 19 May 2022 05:20:25 +0000 (10:50 +0530)]
drm/amd/pm: Fix missing thermal throttler status
On aldebaran, when thermal throttling happens due to excessive GPU
temperature, the reason for throttling event is missed in warning
message. This patch fixes it.
Sunil Khatri [Tue, 17 May 2022 05:57:11 +0000 (11:27 +0530)]
drm/amdgpu: move amdgpu_gmc_tmz_set after ip_version populated
To enable TMZ feature based on IP version needs adev->ip_version
populated but its empty. Move amdgpu_gmc_tmz_set to a place where
ip_version is populated.
Gong Yuanjun [Tue, 17 May 2022 09:57:00 +0000 (17:57 +0800)]
drm/radeon: fix a possible null pointer dereference
In radeon_fp_native_mode(), the return value of drm_mode_duplicate()
is assigned to mode, which will lead to a NULL pointer dereference
on failure of drm_mode_duplicate(). Add a check to avoid npd.
The failure status of drm_cvt_mode() on the other path is checked too.
Haohui Mai [Tue, 17 May 2022 06:06:35 +0000 (23:06 -0700)]
drm/amdgpu: Set CP_HQD_PQ_CONTROL.RPTR_BLOCK_SIZE correctly
Remove the accidental shifts on the values of RPTR_BLOCK_SIZE
in gfx_v8-v11. The bug essentially always programs the
corresponding fields to zero instead of the correct value.
The hardware clamps the min value to 5 so this resulted in a
value of 5 being programmed.
Jonathan Kim [Fri, 13 May 2022 00:38:18 +0000 (20:38 -0400)]
drm/amdkfd: simplify cpu hive assignment
CPU hive assignment currently assumes when a GPU hive is connected_to_cpu,
there is only one hive in the system.
Only assign CPUs to the hive if they are explicitly directly connected to
the GPU hive to get rid of the need for this assumption.
It's more efficient to do this when querying IO links since other non-CRAT
info has to be filled in anyways. Also, stop re-assigning the
same CPU to the same GPU hive if it has already been done before.
Aric Cyr [Mon, 9 May 2022 03:31:34 +0000 (23:31 -0400)]
drm/amd/display: 3.2.186
This version brings along the following:
- Improvements in link training fallback
- Adding individual edp hotplug support
- Fixes in DPIA HPD status, display clock change hang, etc.
- FPU isolation work for DCN30
drm/amd/display: Fic incorrect pipe being used for clk update
[Why]
we save the prev_dppclk value using "dpp_inst" but
when reading this value we use the index "i". In
a case where a pipe is fused off we can end up reading
the incorrect instance because i != dpp_inst in this
case.
[How]
read the prev_dppclk using dpp_inst instead of i
Jasdeep Dhillon [Fri, 6 May 2022 17:03:45 +0000 (13:03 -0400)]
drm/amd/display: Move FPU associated DCN30 code to DML folder
[why & how]
As part of the FPU isolation work documented in
https://patchwork.freedesktop.org/series/93042/, isolate
code that uses FPU in DCN30 to DML, where all FPU code
should locate.
drm/amd/display: Check zero planes for OTG disable W/A on clock change
[Why]
A display clock change hang can occur when switching between DIO and HPO
enabled modes during the optimize_bandwidth in dc_commit_state_no_check
call.
This happens when going from 4k120 8bpc 420 to 4k144 10bpc 444.
Display clock in the DIO case is 1200MHz, but pixel rate is 600MHz
because the pixel format is 420.
Display clock in the HPO case is less (800MHz?) because of ODM combine
which results in a smaller divider.
The DIO is still active in prepare but not active in the optimize which
results in the hang occuring.
During this change there are no planes on the stream so it's safe to
apply the workaround, but dpms_off = false and signal type is not
virtual.
[How]
Check for plane_count == 0, no planes on the stream.
It's easiest to check pipe->plane_state == NULL as an equivalent check
rather than trying to search for the stream status in the context
associated with the stream, so let's do that.
The primary, non MPO pipe should not have a NULL plane state.
Derek Lai [Thu, 5 May 2022 09:59:49 +0000 (17:59 +0800)]
drm/amd/display: Allow individual control of eDP hotplug support
[Why]
Second eDP can send display off notification through HPD
but DC isn't hooked up to handle. Some primary eDP panels
will toggle on/off incorrectly if it's enabled generically.
[How]
Extend the debug option to allow individually enabling hotplug
either the first eDP or the second eDP in a dual eDP system.
Paul Hsieh [Tue, 3 May 2022 06:26:41 +0000 (14:26 +0800)]
drm/amd/display: clear request when release aux engine
[Why]
when driver and dmub request aux engine at the same time,
dmub grant the aux engine but driver fail. Then driver
release aux engine but doesn't clear the request bit.
Then aux engine will be occupied by driver forever.
[How]
When driver release aux engine, clear request bit as well.
Jimmy Kizito [Thu, 14 Apr 2022 13:49:37 +0000 (09:49 -0400)]
drm/amd/display: Update link training fallback behaviour.
[Why]
Some displays may need several link training attempts before
link training succeeds.
[How]
If training succeeds after falling back to lower link bandwidth,
retry at original link bandwidth instead of abandoning link training
whenever link bandwidth is less than stream bandwidth.
Jani Nikula [Fri, 20 May 2022 09:46:00 +0000 (12:46 +0300)]
drm/i915/dsi: fix VBT send packet port selection for ICL+
The VBT send packet port selection was never updated for ICL+ where the
2nd link is on port B instead of port C as in VLV+ DSI.
First, single link DSI needs to use the configured port instead of
relying on the VBT sequence block port. Remove the hard-coded port C
check here and make it generic. For reference, see commit f915084edc5a
("drm/i915: Changes related to the sequence port no for") for the
original VLV specific fix.
Second, the sequence block port number is either 0 or 1, where 1
indicates the 2nd link. Remove the hard-coded port C here for 2nd
link. (This could be a "find second set bit" on DSI ports, but just
check the two possible options.)
Third, sanity check the result with a warning to avoid a NULL pointer
dereference.
Dave Airlie [Fri, 20 May 2022 06:34:29 +0000 (16:34 +1000)]
Merge tag 'msm-next-5.19-fixes' of https://gitlab.freedesktop.org/abhinavk/msm into drm-next
5.19 fixes for msm-next
- Limiting WB modes to max sspp linewidth
- Fixing the supported rotations to add 180 back for IGT
- Fix to handle pm_runtime_get_sync() errors to avoid unclocked access
in the bind() path for dpu driver
- Fix the irq_free() without request issue which was a big-time
hitter in the CI-runs.
Borislav Petkov [Wed, 18 May 2022 11:33:15 +0000 (14:33 +0300)]
drm/i915/uc: Fix undefined behavior due to shift overflowing the constant
Fix:
In file included from <command-line>:0:0:
drivers/gpu/drm/i915/gt/uc/intel_guc.c: In function ‘intel_guc_send_mmio’:
././include/linux/compiler_types.h:352:38: error: call to ‘__compiletime_assert_1047’ \
declared with attribute error: FIELD_PREP: mask is not constant
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
and other build errors due to shift overflowing values.
See https://lore.kernel.org/r/YkwQ6%[email protected] for the gory
details as to why it triggers with older gccs only.
Andi Shyti [Tue, 10 May 2022 14:04:47 +0000 (16:04 +0200)]
drm/i915/gt: Fix use of static in macro mismatch
The INTEL_GT_RPS_SYSFS_ATTR was creating to different structures
but. When called with the "static" keyword this is affecting only
the first structure, while the second is created as non static.
Move the static keyword inside the macros to affect both the
structures.
drm/i915: Fix CFI violation with show_dynamic_id()
When an attribute group is created with sysfs_create_group(), the
->sysfs_ops() callback is set to kobj_sysfs_ops, which sets the ->show()
callback to kobj_attr_show(). kobj_attr_show() uses container_of() to
get the ->show() callback from the attribute it was passed, meaning the
->show() callback needs to be the same type as the ->show() callback in
'struct kobj_attribute'.
However, show_dynamic_id() has the type of the ->show() callback in
'struct device_attribute', which causes a CFI violation when opening the
'id' sysfs node under drm/card0/metrics. This happens to work because
the layout of 'struct kobj_attribute' and 'struct device_attribute' are
the same, so the container_of() cast happens to allow the ->show()
callback to still work.
Change the type of show_dynamic_id() to match the ->show() callback in
'struct kobj_attributes' and update the type of sysfs_metric_id to
match, which resolves the CFI violation.
Imre Deak [Tue, 10 May 2022 11:49:57 +0000 (14:49 +0300)]
drm/i915: Fix 'mixing different enum types' warnings in intel_display_power.c
Fix the following sparse warnings:
drivers/gpu/drm/i915/display/intel_display_power.c:2431:34: warning: mixing different enum types:
drivers/gpu/drm/i915/display/intel_display_power.c:2431:34: unsigned int enum intel_display_power_domain
drivers/gpu/drm/i915/display/intel_display_power.c:2431:34: int enum port
drivers/gpu/drm/i915/display/intel_display_power.c:2442:37: warning: mixing different enum types:
drivers/gpu/drm/i915/display/intel_display_power.c:2442:37: unsigned int enum intel_display_power_domain
drivers/gpu/drm/i915/display/intel_display_power.c:2442:37: int enum port
drivers/gpu/drm/i915/display/intel_display_power.c:2468:43: warning: mixing different enum types:
drivers/gpu/drm/i915/display/intel_display_power.c:2468:43: unsigned int enum intel_display_power_domain
drivers/gpu/drm/i915/display/intel_display_power.c:2468:43: unsigned int enum aux_ch
drivers/gpu/drm/i915/display/intel_display_power.c:2479:35: warning: mixing different enum types:
drivers/gpu/drm/i915/display/intel_display_power.c:2479:35: unsigned int enum intel_display_power_domain
drivers/gpu/drm/i915/display/intel_display_power.c:2479:35: unsigned int enum aux_ch
Abhinav Kumar [Wed, 18 May 2022 22:34:07 +0000 (15:34 -0700)]
drm/msm/dpu: handle pm_runtime_get_sync() errors in bind path
If there are errors while trying to enable the pm in the
bind path, it will lead to unclocked access of hw revision
register thereby crashing the device.
This will not address why the pm_runtime_get_sync() fails
but at the very least we should be able to prevent the
crash by handling the error and bailing out earlier.
changes in v2:
- use pm_runtime_resume_and_get() instead of
pm_runtime_get_sync()
drm/msm: don't free the IRQ if it was not requested
As msm_drm_uninit() is called from the msm_drm_init() error path,
additional care should be necessary as not to call the free_irq() for
the IRQ that was not requested before (because an error occured earlier
than the request_irq() call).
This fixed the issue reported with the following backtrace:
drm/amd: Don't reset dGPUs if the system is going to s2idle
An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC
reset for handling aborted suspend can't work with s2idle.
This functionality was introduced in commit daf8de0874ab5b ("drm/amdgpu:
always reset the asic in suspend (v2)"). A few other commits have
gone on top of the ASIC reset, but this still doesn't work on the A+A
configuration in s2idle.
Avoid doing the reset on dGPUs specifically when using s2idle.
Miaoqian Lin [Thu, 12 May 2022 12:19:50 +0000 (16:19 +0400)]
drm/msm/a6xx: Fix refcount leak in a6xx_gpu_init
of_parse_phandle() returns a node pointer with refcount
incremented, we should use of_node_put() on it when not need anymore.
a6xx_gmu_init() passes the node to of_find_device_by_node()
and of_dma_configure(), of_find_device_by_node() will takes its
reference, of_dma_configure() doesn't need the node after usage.
Douglas Anderson [Fri, 13 May 2022 20:15:13 +0000 (13:15 -0700)]
drm/msm/dsi: don't powerup at modeset time for parade-ps8640
Commit 7d8e9a90509f ("drm/msm/dsi: move DSI host powerup to modeset
time") caused sc7180 Chromebooks that use the parade-ps8640 bridge
chip to fail to turn the display back on after it turns off.
Unfortunately, it doesn't look easy to fix the parade-ps8640 driver to
handle the new power sequence. The Linux driver has almost nothing in
it and most of the logic for this bridge chip is in black-box firmware
that the bridge chip uses.
Also unfortunately, reverting the patch will break "tc358762".
The long term solution here is probably Dave Stevenson's series [1]
that would give more flexibility. However, that is likely not a quick
fix.
For the short term, we'll look at the compatible of the next bridge in
the chain and go back to the old way for the Parade PS8640 bridge
chip. If it's found that other bridge chips also need this workaround
then we can add them to the list or consider inverting the
condition. However, the hope is that the framework will not take too
much longer to land and we won't have to add anything other than
ps8640 here.
Sathishkumar S [Wed, 11 May 2022 11:48:31 +0000 (17:18 +0530)]
drm/amd/pm: consistent approach for smartshift
create smartshift sysfs attributes from dGPU device even
on smartshift 1.0 platform to be consistent. Do not populate
the attributes on platforms that have APU only but not dGPU
or vice versa.
V2:
avoid checking for the number of VGA/DISPLAY devices (Lijo)
move code to read from dGPU or APU into a function and reuse (Lijo)
Graham Sider [Thu, 12 May 2022 18:34:22 +0000 (14:34 -0400)]
drm/amdkfd: Fix static checker warning on MES queue type
convert_to_mes_queue_type return can be negative, but
queue_input.queue_type is uint32_t. Put return in integer var and cast
to unsigned after negative check.
Sathishkumar S [Wed, 11 May 2022 11:05:59 +0000 (16:35 +0530)]
drm/amd/pm: update smartshift powerboost calc for smu13
smartshift apu and dgpu power boost are reported as percentage
with respect to their power limits. adjust the units of power before
calculating the percentage of boost.
Sathishkumar S [Wed, 11 May 2022 10:36:12 +0000 (16:06 +0530)]
drm/amd/pm: update smartshift powerboost calc for smu12
smartshift apu and dgpu power boost are reported as percentage with
respect to their power limits. This value[0-100] reflects the boost
for the respective device.
Lang Yu [Wed, 11 May 2022 07:37:27 +0000 (15:37 +0800)]
drm/amdkfd: allocate MMIO/DOORBELL BOs with AMDGPU_GEM_CREATE_PREEMPTIBLE
MMIO/DOORBELL BOs' backing resources(bus address resources that are
used to talk to the GPU) are not managed by GTT manager, but they
are counted by GTT manager. That makes no sense.
With AMDGPU_GEM_CREATE_PREEMPTIBLE flag, such BOs will be managed by
PREEMPT manager(for preemptible contexts, e.g., KFD). Then they won't
be evicted and don't need to be pinned as well.
But we still leave these BOs pinned to indicate that the underlying
resource never moves.