This version brings along following fixed:
- Guard DST_Y_PREFETCH register overflow in DCN21
- Add missing DCN21 IP parameter
- Fix PSR command version
- Add ETW logging for AUX failures
- Add ETW log to dmub_psr_get_state
- Fixed EdidUtility build errors
- Fix missing reg offset for the dmcub test debug registers
- Adding update authentication interface
- Remove unused functions of opm state query support
- Always wait for update lock status
- Refactor riommu invalidation wa
- Ensure dentist display clock update finished in DCN20
drm/amd/display: ensure dentist display clock update finished in DCN20
[Why]
We don't check DENTIST_DISPCLK_CHG_DONE to ensure dentist
display clockis updated to target value. In some scenarios with large
display clock margin, it will deliver unfinished display clock and cause
issues like display black screen.
[How]
Checking DENTIST_DISPCLK_CHG_DONE to ensure display clock
has been update to target value before driver do other clock related
actions.
Wenjing Liu [Thu, 15 Jul 2021 18:55:28 +0000 (14:55 -0400)]
drm/amd/display: remove unused functions
[why]
It has been decided that opm state query support will be dropped.
Therefore link encryption enabled and save current encryption states
won't be used anymore and there are no foreseeable usages in the future.
We will remove these two interfaces for clean up.
[why]
Previously to toggle authentication, we need to remove and
add the same display back with modified adjustment.
This method will toggle DTM state without actual hardware changes.
This is not per design and would cause potential issues in the long run.
[how]
We are creating a dedicated interface that does the same thing as
remove and add back the display without changing DTM state.
[why]
For dual eDP when setting the new settings we need to set
command version to DMUB_CMD_PSR_CONTROL_VERSION_1, otherwise
DMUB will not read panel_inst parameter.
[how]
Instead of PSR_VERSION_1 pass DMUB_CMD_PSR_CONTROL_VERSION_1
Chengzhe Liu [Thu, 22 Jul 2021 03:31:10 +0000 (11:31 +0800)]
drm/amdgpu: Add msix restore for pass-through mode
In pass-through mode, after mode 1 reset, msix enablement status would
lost and never receives interrupt again. So, we should restore msix
status after mode 1 reset.
Stylon Wang [Tue, 20 Jul 2021 02:58:19 +0000 (10:58 +0800)]
drm/amd/display: Fix ASSR regression on embedded panels
[Why]
Regression found in some embedded panels traces back to the earliest
upstreamed ASSR patch. The changed code flow are causing problems
with some panels.
[How]
- Change ASSR enabling code while preserving original code flow
as much as possible
- Simplify the code on guarding with internal display flag
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=213779
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1620 Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Stylon Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Chengzhe Liu [Tue, 20 Jul 2021 07:18:12 +0000 (15:18 +0800)]
drm/amdgpu: Clear doorbell interrupt status for Sienna Cichlid
On Sienna Cichlid, in pass-through mode, if we unload the driver in BACO
mode(RTPM), then the kernel would receive thousands of interrupts.
That's because there is doorbell monitor interrupt on BIF, so KVM keeps
injecting interrupts to the guest VM. So we should clear the doorbell
interrupt status after BACO exit.
drm/amd/pm: Fix a bug communicating with the SMU (v5)
This fixes a bug which if we probe a non-existing
I2C device, and the SMU returns 0xFF, from then on
we can never communicate with the SMU, because the
code before this patch reads and interprets 0xFF
as a terminal error, and thus we never write 0
into register 90 to clear the status (and
subsequently send a new command to the SMU.)
It is not an error that the SMU returns status
0xFF. This means that the SMU executed the last
command successfully (execution status), but the
command result is an error of some sort (execution
result), depending on what the command was.
When doing a status check of the SMU, before we
send a new command, the only status which
precludes us from sending a new command is 0--the
SMU hasn't finished executing a previous command,
and 0xFC--the SMU is busy.
This bug was seen as the following line in the
kernel log,
amdgpu: Msg issuing pre-check failed(0xff) and SMU may be not in the right state!
when subsequent SMU commands, not necessarily
related to I2C, were sent to the SMU.
This patch fixes this bug.
v2: Add a comment to the description of
__smu_cmn_poll_stat() to explain why we're NOT
defining the SMU FW return codes as macros, but
are instead hard-coding them. Such a change, can
be followed up by a subsequent patch.
v3: The changes are,
a) Add comments to break labels in
__smu_cmn_reg2errno().
b) When an unknown/unspecified/undefined result is
returned back from the SMU, map that to
-EREMOTEIO, to distinguish failure at the SMU
FW.
c) Add kernel-doc to
smu_cmn_send_msg_without_waiting(),
smu_cmn_wait_for_response(),
smu_cmn_send_smc_msg_with_param().
d) In smu_cmn_send_smc_msg_with_param(), since we
wait for completion of the command, if the
result of the completion is
undefined/unknown/unspecified, we print that to
the kernel log.
v4: a) Add macros as requested, though redundant, to
be removed when SMU consolidates for all
ASICs--see comment in code.
b) Get out if the SMU code is unknown.
drm/amd/amdgpu: consider kernel job always not guilty
[Why]
Currently all timedout job will be considered to be guilty. In SRIOV
multi-vf use case, the vf flr happens first and then job time out is
found. There can be several jobs timeout during a very small time slice.
And if the innocent sdma job time out is found before the real bad
job, then the innocent sdma job will be set to guilty. This will lead
to a page fault after resubmitting job.
[How]
If the job is a kernel job, we will always consider it not guilty
Anson Jacob [Mon, 19 Jul 2021 17:46:09 +0000 (13:46 -0400)]
drm/amdgpu: Fix documentaion for dm_dmub_outbox1_low_irq
Fix make htmldocs complaint:
./drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:628: warning: Excess function parameter 'interrupt_params' description in 'DMUB_TRACE_MAX_READ'
v2:
Moved DMUB_TRACE_MAX_READ macro above function documentation
Oak Zeng [Thu, 15 Jul 2021 23:34:25 +0000 (18:34 -0500)]
drm/amdkfd: Fix a concurrency issue during kfd recovery
start_cpsch and stop_cpsch can be called during kfd device
initialization or during gpu reset/recovery. So they can
run concurrently. Currently in start_cpsch and stop_cpsch,
pm_init and pm_uninit is not protected by the dpm lock.
Imagine such a case that user use packet manager's function
to submit a pm4 packet to hang hws (ie through command
cat /sys/class/kfd/kfd/topology/nodes/1/gpu_id | sudo tee
/sys/kernel/debug/kfd/hang_hws), while kfd device is under
device reset/recovery so packet manager can be not initialized.
There will be unpredictable protection fault in such case.
This patch moves pm_init/uninit inside the dpm lock and check
packet manager is initialized before using packet manager
function.
Oak Zeng [Wed, 14 Jul 2021 14:59:51 +0000 (09:59 -0500)]
drm/amdgpu: Change a few function names
Function name "psp_np_fw_load" is not proper as people don't
know _np_fw_ means "non psp firmware". Change the function
name to psp_load_non_psp_fw for better understanding. Same
thing for function psp_execute_np_fw_load.
Oak Zeng [Wed, 14 Jul 2021 14:50:37 +0000 (09:50 -0500)]
drm/amdgpu: Fix a printing message
The printing message "PSP loading VCN firmware" is mis-leading because
people might think driver is loading VCN firmware. Actually when this
message is printed, driver is just preparing some VCN ucode, not loading
VCN firmware yet. The actual VCN firmware loading will be in the PSP block
hw_init. Fix the printing message
Jonathan Kim [Wed, 12 May 2021 16:26:20 +0000 (12:26 -0400)]
drm/amdgpu: add psp command to get num xgmi links between direct peers
The TA can now be invoked to provide the number of xgmi links connecting
a direct source and destination peer.
Non-direct peers will report zero links.
Anson Jacob [Mon, 19 Jul 2021 15:09:40 +0000 (11:09 -0400)]
drm/amdgpu: Fix documentaion for amdgpu_bo_add_to_shadow_list
make htmldocs complaints about parameter for amdgpu_bo_add_to_shadow_list
./drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:739: warning: Excess function parameter 'bo' description in 'amdgpu_bo_add_to_shadow_list'
./drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:739: warning: Function parameter or member 'vmbo' not described in 'amdgpu_bo_add_to_shadow_list'
./drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:739: warning: Excess function parameter 'bo' description in 'amdgpu_bo_add_to_shadow_list'
[Why]
PMFW message which previously thought to only control Z9 controls both
Z9 and Z10. Also HW design team requested that Z9 must only be supported
on eDP due to content protection interop.
[How]
Change zstate support condition to match updated policy
Anthony Koo [Sun, 11 Jul 2021 02:03:07 +0000 (22:03 -0400)]
drm/amd/display: [FW Promotion] Release 0.0.75
- Add reserved bits for future feature development
- Fix issue with mismatch with type const
- Replaced problematic code with old memcpy and casted problematic
pointers to unsigned char pointers
drm/amd/display: Refine condition for cursor visibility
[why]
There's a special case where upper plane is not the main plane. If it owns
the cursor, it will be invisible in the majority of the screen.
[How]
The condition for disabling cursor is changed:
- check if upper viewport completely covers current. This was the
previous change that doesn't handle all scenarios with pipe splitting.
- if not, show the cursor only if it's not scaled or no upper pipe.
Eric Yang [Fri, 9 Jul 2021 16:57:50 +0000 (12:57 -0400)]
drm/amd/display: add workaround for riommu invalidation request hang
[Why]
When an riommu invalidation request come at the same time as a pipe is
disabled there can be a case where DCN cannot ACK the request if only
one VMID is setup in the inuse list.
[How]
Setup a second unused VMID will work around the issue.
DCN 3x increased Line buffer size for DCHUB latency hiding, from 4 lines
of 4K resolution lines to 5 lines of 4K resolution lines. All Line
Buffer can be used as extended memory for P State change latency hiding.
The maximum number of lines is increased to 32 lines. Finally,
LB_MEMORY_CONFIG_1 (LB memory piece 1) and LB_MEMORY _CONFIG_2 (LB
memory piece 2) are not affected, no change in size, only 3 pieces is
affected, i.e., when all 3 pieces are used in both LB_MEMORY_CONFIG_0
and LB_MEMORY_CONFIG_3 (for 4:2:0) modes.
drm/amd/display: DCN2X Prefer ODM over bottom pipe to find second pipe
[WHY]
When finding a second pipe for pipe split, currently will look for
bottom pipe in context first to decide the second pipe. This causes
issues in 2 plane to 1 plane transitions like fullscreen video where
bottom pipe no longer exists in the new configuration.
[HOW]
If previous context had an ODM pipe, use that to find the secondary pipe
first before looking at bottom pipe.
[Why & How]
We're missing a default value for dram_channel_width_bytes in the
DCN3.1 SOC bounding box and we don't currently have the interface in
place to query the actual value from VBIOS.
Put in a hardcoded default until we have the interface in place.
drm/amd/display: Fix max vstartup calculation for modes with borders
[Why]
Vertical and horizontal borders in timings are treated as increasing the
active area - vblank and hblank actually shrink.
Our input into DML does not include these borders so it incorrectly
assumes it has more time than available for vstartup and tmdl
calculations for some modes with borders.
An example of such a timing would be 640x480@72Hz:
Jake Wang [Wed, 30 Jun 2021 17:55:50 +0000 (13:55 -0400)]
drm/amd/display: Fixed hardware power down bypass during headless boot
[Why]
During headless boot, DIG may be on which causes HW/SW discrepancies.
To avoid this we power down hardware on boot if DIG is turned on. With
introduction of multiple eDP, hardware power down is being bypassed
under certain conditions.
[How]
Fixed hardware power down bypass, and ensured hardware will power down
if DIG is on and seamless boot is not enabled.
Zhan Liu [Tue, 29 Jun 2021 01:20:18 +0000 (21:20 -0400)]
drm/amd/display: Reduce delay when sink device not able to ACK 00340h write
[Why]
Theoretically, per DP 1.4a spec, sink device needs to AUX_ACK 00340h
write. However, due to hardware limitation, some sink devices have no
00340h dpcd address at all. This results in sink side fails to reply
ACK, and consequently cause source side keep retrying DPCD write on DPCD
00340h. This results in significant delay when DPCD 00340h write is
triggered (e.g. at S3 resume).
[How]
Check whether sink device could ACK on DPCD 00340h write on boot. If
sink device fails to ACK, then remember that, so we won't write to DPCD
00340h later on.
There will be a drm.debug KMS level message to inform user once a 00340h
DPCD write is skipped on purpose.
Aurabindo Pillai [Tue, 29 Jun 2021 01:41:08 +0000 (21:41 -0400)]
drm/amd/display: add debug print for DCC validation failure
[Why&How]
Print a debug message when dcc validation fails in the display driver.
Most DCC enablement related errors are from userspace. Adding a debug
print in case of a failure from display driver will aid quicker triage.
This tends to take miliseconds in certain scenarios and we'd rather not
wait that long. Due to how this interacts with det size update and
locking waiting should not be necessary as compbuf updates before
unlock.
Add a watch for config error instead as that is something we actually do
care about.