Git Repo - linux.git/log

drm/amdgpu: use string choice helpers

Use string choice helpers for better readability.

Reported-by: kernel test robot <[email protected]>
Reported-by: Julia Lawall <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Signed-off-by: R Sundar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix comment about amdgpu.abmlevel defaults

Since 040fdcde288a ("drm/amdgpu: respect the abmlevel module parameter value
if it is set"), the default value for amdgpu.abmlevel was set to -1, or auto.
However, the comment explaining the default value was not updated to reflect
the change (-1, or auto; not -1, or disabled).

Clarify that the default value (-1) means auto.

Fixes: 040fdcde288a ("drm/amdgpu: respect the abmlevel module parameter value if it is set")
Reported-by: Ruikai Liu <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Expose special on chip memory pools in fdinfo

In the past these specialized on chip memory pools were reported as system
memory (aka 'cpu') which was not correct and misleading. That has since
been removed so lets make them visible as their own respective memory
regions.

Reviewed-by: Christian König <[email protected]>
Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Christian König <[email protected]>
Cc: Yunxiang Li <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Stop reporting special chip memory pools as CPU memory in fdinfo

So far these specialized on chip memory pools were reported as system
memory (aka 'cpu') which is not correct and misleading. Lets remove that
and consider later making them visible as their own thing.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Suggested-by: Christian König <[email protected]>
Cc: Yunxiang Li <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: stop tracking visible memory stats

Since on modern systems all of vram can be made visible anyways, to
simplify the new implementation, drops tracking how much memory is
visible for now. If this is really needed we can add it back on top of
the new implementation, or just report all the BOs as visible.

Signed-off-by: Yunxiang Li <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: make drm-memory-* report resident memory

The old behavior reports the resident memory usage for this key and the
documentation say so as well. However this was accidentally changed to
include buffers that was evicted.

Fixes: 04bdba46542c ("drm/amdgpu: Use drm_print_memory_stats helper from fdinfo")
Signed-off-by: Yunxiang Li <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Tvrtko Ursulin <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Fix the memory allocation issue in amdgpu_discovery_get_nps_info()

Fix two issues with memory allocation in amdgpu_discovery_get_nps_info()
for mem_ranges:

- Add a check for allocation failure to avoid dereferencing a null
pointer.

- As suggested by Christophe, use kvcalloc() for memory allocation,
which checks for multiplication overflow.

Additionally, assign the output parameters nps_type and range_cnt after
the kvcalloc() call to prevent modifying the output parameters in case
of an error return.

Fixes: b194d21b9bcc ("drm/amdgpu: Use NPS ranges from discovery table")
Suggested-by: Christophe JAILLET <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Li Huafei <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add ring reset messages

Add messages to make it clear when a per ring reset
happens. This is helpful for debugging and aligns with
other reset methods.

v2: add ring name in success/fail messages (Lijo)

Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: Kent Russell <[email protected]> (v1)
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix fairness in enforce isolation handling

Make sure KFD gets a turn when serializing access to
the GC IP. Currently non-KFD jobs can starve KFD if they
submit often enough. This patch prevents that by stalling
non-KFD if its time period has elapsed.

v2: fix units
v3: check enablement properly

Acked-by: Srinivasan Shanmugam <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Remove last parts of timing_trace

Commit c2c2ce1e9623 ("drm/amd/display: Optimize passive update planes.")
removed the last caller of context_timing_trace.
Remove it.

With that gone, no one is now looking at the 'timing_trace' flag, remove
it and all the places that set it.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Remove unused cm3_helper_translate_curve_to_degamma_hw_format

cm3_helper_translate_curve_to_degamma_hw_format() since it was added in
2020's commit
03f54d7d3448 ("drm/amd/display: Add DCN3 DPP")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Remove unused regamma functions

calculate_user_regamma_coeff() and calculate_user_regamma_ramp() were
added in 2018 in commit
55a01d4023ce ("drm/amd/display: Add user_regamma to color module")

but never used.

Remove them and their helpers.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: add an interface to query whether is KFD is active

Add an interface to query whether KFD has any active queues.

v2: fix build issues

Acked-by: Srinivasan Shanmugam <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/smu13: fix profile reporting

The following 3 commits landed in parallel:
commit d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting")
commit 7a1613e47e65 ("drm/amdgpu/smu13: always apply the powersave optimization")
commit 7c210ca5a2d7 ("drm/amdgpu: handle default profile on on devices without fullscreen 3D")
While everything is set correctly, this caused the profile to be
reported incorrectly because both the powersave and fullscreen3d bits
were set in the mask and when the driver prints the profile, it looks
for the first bit set.

Fixes: d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting")
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: flag per-queue reset support for gfx9

Flag KFD support for per-queue reset on GFX9 devices.

Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: optimize ACA log print

- skip to print CE ACA log.
- optimize ACA log print for MCA.

Signed-off-by: Yang Wang <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add generic func to check if ta fw is applicable

Separated xgmi ta is required for specific APU, and driver needs
parse the ta binary properly with aux xgmi ta packed.

v2: make the check function more generic (Lijo)

Signed-off-by: Le Ma <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: clean up the suspend_complete

To check the status of S3 suspend completion,
use the PM core pm_suspend_global_flags bit(1)
to detect S3 abort events. Therefore, clean up
the AMDGPU driver's private flag suspend_complete.

Signed-off-by: Prike Liang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: correct the S3 abort check condition

In the normal S3 entry, the TOS cycle counter is not
reset during BIOS execution the _S3 method, so it doesn't
determine whether the _S3 method is executed exactly.
Howerver, the PM core performs the S3 suspend will set the
PM_SUSPEND_FLAG_FW_RESUME bit if all the devices suspend
successfully. Therefore, drivers can check the
pm_suspend_global_flags bit(1) to detect the S3 suspend
abort event.

Fixes: 6704dbf71928 ("drm/amdgpu: update suspend status for aborting from deeper suspend")
Signed-off-by: Prike Liang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: Vangogh: Fix kernel memory out of bounds write

KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:

[   33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[   33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[   33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G        W          6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[   33.861816] Tainted: [W]=WARN
[   33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   33.861822] Call Trace:
[   33.861826]  <TASK>
[   33.861829]  dump_stack_lvl+0x66/0x90
[   33.861838]  print_report+0xce/0x620
[   33.861853]  kasan_report+0xda/0x110
[   33.862794]  kasan_check_range+0xfd/0x1a0
[   33.862799]  __asan_memset+0x23/0x40
[   33.862803]  smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.863306]  vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.864257]  vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.865682]  amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.866160]  amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.867135]  dev_attr_show+0x43/0xc0
[   33.867147]  sysfs_kf_seq_show+0x1f1/0x3b0
[   33.867155]  seq_read_iter+0x3f8/0x1140
[   33.867173]  vfs_read+0x76c/0xc50
[   33.867198]  ksys_read+0xfb/0x1d0
[   33.867214]  do_syscall_64+0x90/0x160
...
[   33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[   33.867358]  kasan_save_stack+0x33/0x50
[   33.867364]  kasan_save_track+0x17/0x60
[   33.867367]  __kasan_kmalloc+0x87/0x90
[   33.867371]  vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[   33.867835]  smu_sw_init+0xa32/0x1850 [amdgpu]
[   33.868299]  amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[   33.868733]  amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[   33.869167]  amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[   33.869608]  local_pci_probe+0xda/0x180
[   33.869614]  pci_device_probe+0x43f/0x6b0

Empirically we can confirm that the former allocates 152 bytes for the
table, while the latter memsets the 168 large block.

Root cause appears that when GPU metrics tables for v2_4 parts were added
it was not considered to enlarge the table to fit.

The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.

v2:
* Drop impossible v3_0 case. (Mario)

Signed-off-by: Tvrtko Ursulin <[email protected]>
Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <[email protected]>
Cc: Evan Quan <[email protected]>
Cc: Wenyou Yang <[email protected]>
Cc: Alex Deucher <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: 3.2.307

This version brings along following fixes:
- Fix polling DSC registers during S0i3
- Fix idle optimizations entry log
- Change MPC Tree visual confirm colours
- Fix underflow when playing 8K video in full screen mode
- Optimize power up sequence for specific OLED

Acked-by: Wayne Lin <[email protected]>
Signed-off-by: Aric Cyr <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: [FW Promotion] Release 0.0.240.0

Add some scruct for secure display.

Acked-by: Wayne Lin <[email protected]>
Signed-off-by: Taimur Hassan <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: store sharpness 1dlut table in dscl_prog_data

[Why]
Previously dscl_prog_data stored pointer to sharpness 1dlut table.
SPL had four pre-generated tables, one for each setup. This allowed
us to minimize number of times we had to recalculate table when
switching between setups.
However, with dual display, this becomes an issue because for a given
setup, we could have a different per app sharpness value than the
global sharpness value. So the pre-generated table will change
but both displays may point to the same table and one of them
will have the wrong sharpness setting.

[How]
Store the sharpness 1dlut table in dscl_prog_data. This ensures
that each display can have its own sharpness setting.

Reviewed-by: Ilya Bakoulin <[email protected]>
Signed-off-by: Samson Tam <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Do not read DSC state if not in use

[why & how]
DSC may be power gated when coming out of S0i3, so avoid polling
DSC registers since it will fail anyways. Only read if it is known
that DSC is in use.

Reviewed-by: Charlene Liu <[email protected]>
Signed-off-by: Ovidiu Bunea <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix idle optimizations entry log

[Why & How]
Whether we really enter idle optimizations are decided within DC.
Printing into dmesg before calling the DC API gives an incorrect
indication that we are entering idle optimization in cases where its
disabled manually.

To fix this, remove the print in DM and add them in DC

Reviewed-by: Alvin Lee <[email protected]>
Signed-off-by: Aurabindo Pillai <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Change MPC Tree visual confirm colours

[Why]
MPC background colours that use fractional components look different if
MPC OGAM is in use vs in bypass mode. The current red and orange colours
look very similar when OGAM is in bypass, so the colours need to change
to be consistently very easy to tell apart.

[How]
Use colours that only have 0 or MAX values in each component

Reviewed-by: Alvin Lee <[email protected]>
Signed-off-by: Joshua Aberback <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Simplify dcn35_is_ips_supported()

[WHAT & HOW]
The variable "ips_supported" is redundant and we can return from
dcn35_smu_get_ips_supported directly.

This fixes 1 UNUSED_VALUE issue reported by Coverity.

Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Hung <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Remove useless assignments and variables

[WHAT & HOW]
misc0, temp and split_pipe are assigned but immediately re-assigned
to other values. The early assignments are useless and are removed.
Unused variables are removed as well.

This fixes 5 UNUSED_VALUE issues reported by Coverity.

Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Hung <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: fix handling of max_downscale_src_width fail check in SPL

[Why]
If max_downscale_src_width check fails, we exit early from
spl_calculate_scaler_params but dscl_prog_data is not fully
populated. If viewport is left at 0, it can cause crash in dml.

[How]
Call spl_set_dscl_prog_data before we exit early from
spl_calculate_scaler_params to populate dscl_prog_data
Populate taps in spl_get_optimal_number_of_taps

Reviewed-by: Alvin Lee <[email protected]>
Signed-off-by: Samson Tam <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix underflow when playing 8K video in full screen mode

[Why&How]
Flickering observed while playing 8k HEVC-10 bit video in full screen
mode with black border. We didn't support this case for subvp.
Make change to the existing check to disable subvp for this corner case.

Reviewed-by: Alvin Lee <[email protected]>
Signed-off-by: Leo Ma <[email protected]>
Signed-off-by: Dillon Varone <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Refactoring if and endif statements to enable DC_LOGGER

[Why]
For Header related changes for core

[How]
Refactoring if and endif statements to enable DC_LOGGER

Reviewed-by: Mounika Adhuri <[email protected]>
Reviewed-by: Alvin Lee <[email protected]>
Signed-off-by: Lohita Mudimela <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Reduce HPD Detection Interval for IPS

Fix DP Compliance test 4.2.1.3, 4.2.2.8, 4.3.1.12, 4.3.1.13
when IPS enabled.

Original HPD detection interval is set to 5s which violates DP
compliance.
Reduce the interval parameter, such that link training can be
finished within 5 seconds.

Fixes: afca033f10d3 ("drm/amd/display: Add periodic detection for IPS")
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Fangzhi Zuo <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"

This reverts
commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35")

[why & how]
The offending commit exposes a hang with lid close/open behavior.
Both issues seem to be related to ODM 2:1 mode switching, so there
is another issue generic to that sequence that needs to be
investigated.

Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Signed-off-by: Ovidiu Bunea <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Add P-State Stall Timeout Recovery Support for dcn401

[WHY&HOW]
Adds support for P-State stall timeout detection in DCHUBBUB.

Reviewed-by: Alvin Lee <[email protected]>
Signed-off-by: Dillon Varone <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Add a boot option to reduce phy ssc for HBR3

[Why]
Spread on DPREFCLK by 0.3 percent can have a negative effect on sink
when PHY SSC is also spread by 0.3 percent
[How]
Add boot option for DMU to lower PHY SSC

Reviewed-by: Nicholas Kazlauskas <[email protected]>
Signed-off-by: Hansen Dsouza <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Optimize power up sequence for specific OLED

[why & how]
OLED power up sequence takes an extra 150ms via hardcoded delay,
but there is a strict requirement on DisplayOn resume time.
For customer panel, remove these delays to meet target until a
cleaner solution is can be put in place.

Reviewed-by: Charlene Liu <[email protected]>
Signed-off-by: Ovidiu Bunea <[email protected]>
Signed-off-by: Tom Chung <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: drop volatile from ring buffer

Volatile only prevents the compiler from re-ordering reads and writes.
Since we always only modify the ring buffer from one CPU thread and have
an explicit barrier before signaling the HW this should have no effect at
all and just prevents compiler optimisations.

While at it drop the local variables as well.

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Sunil Khatri <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Fix amdgpu_ip_block_hw_fini()

This NULL check is reversed so the function doesn't work.

Fixes: dad01f93f432 ("drm/amdgpu: validate hw_fini before function call")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

Documentation/gpu/amdgpu: Add programming model for DCN

One of the challenges to contributing to the display code is the
complexity of the DC component. This commit adds a documentation page
that discusses the programming model used by DCN and an overview of how
the display code is organized.

Cc: Leo Li <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Hamza Mahfooz <[email protected]>
Cc: Harry Wentland <[email protected]>
Cc: Mario Limonciello <[email protected]>
Cc: Christian Konig <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

Documentation/gpu: Document how to narrow down display issues

The amdgpu driver is composed of multiple components, each of which can
be a source of some specific problem that the user/developer can see.
This commit introduces steps to narrow down and collect display
information.

Cc: Leo Li <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Hamza Mahfooz <[email protected]>
Cc: Harry Wentland <[email protected]>
Cc: Mario Limonciello <[email protected]>
Cc: Christian Konig <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

amdgpu: Don't print L2 status if there's nothing to print

If a 2nd fault comes in before the 1st is handled, the 1st fault will
clear out the FAULT STATUS registers before the 2nd fault is handled.
Thus we get a lot of zeroes. If status=0, just skip the L2 fault status
information, to avoid confusion of why some VM fault status prints in
dmesg are all zeroes.

Signed-off-by: Kent Russell <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: add missing tracepoint event in DM atomic_commit_tail

There are two events to trace the beginning and the end of
amdgpu_dm_atomic_commit_tail, but only the one ate the beginning was
placed. Place amdgpu_dm_atomic_commit_tail_finish tracepoint at the end
than.

Signed-off-by: Melissa Wen <[email protected]>
Reviewed-by: Leo Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: sever xgmi io link if host driver has disable sharing

Host drivers can create partial hives per guest by disabling xgmi sharing
between certain peers in the main hive.
Typically, these partial hives are fully connected per guest session.
In the event that the host makes a mistake by adding a non-shared node
to a guest session, have the KFD reflect sharing disabled by severing
the IO link.

Signed-off-by: Jonathan Kim <[email protected]>
Tested-by: James Yao <[email protected]>
Reviewed-by: Harish Kasiviswanathan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: refine error handling in amdgpu_ttm_tt_pin_userptr

Free sg table when dma_map_sgtable() failed to avoid memory leak.

Signed-off-by: Lang Yu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Fix the logic for NPS request failure

On a hive, NPS request is placed by the first one for all devices in the
hive. If the request fails, mark the mode as UNKNOWN so that subsequent
devices on unload don't request it. Also, fix the mutex double lock
issue in error condition, should have been mutex_unlock.

Fixes: ee52489d1210 ("drm/amdgpu: Place NPS mode request on unload")
Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Rajneesh Bhardwaj <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: remove extra use of volatile

as the adding of mb() should be sufficient in function unmap_queues_cpsch,
remove the add of volatile type as recommended

Signed-off-by: Victor Zhao <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Reduce redundant gpu resets on nbio v7.4

On nbio v7.4, ras controller interrupt and athub
interrupt are generated after injecting UE to PCIE,
but gpu reset only needs to be triggered once.

Signed-off-by: YiPeng Chai <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: handle default profile on on devices without fullscreen 3D

Some devices do not support fullscreen 3D.

v2: Make the check generic.

Fixes: 336568de918e ("drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs")
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: Kenneth Feng <[email protected]>
Cc: Lijo Lazar <[email protected]>

Revert "drm/amdkfd: SMI report dropped event count"

This reverts commit a3ab2d45b9887ee609cd3bea39f668236935774c.

The userspace side for this code is not ready yet so revert
for now.

Reviewed-by: Philip Yang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: Philip Yang <[email protected]>

drm/amdgpu: Dereference the ATCS ACPI buffer

Need to dereference the atcs acpi buffer after
the method is executed, otherwise it will result in
a memory leak.

Signed-off-by: Prike Liang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Save VCN shared memory with init reset

VCN shared memory is in framebuffer and there are some flags initialized
during sw_init. Ideally, such programming should be during hw_init.

Make sure the flags are saved during reset on initialization since that
reset will affect frame buffer region. For clarity, separate it out to
another function.

Fixes: 1e4acf4d93cd ("drm/amdgpu: Add reset on init handler for XGMI")
Signed-off-by: Lijo Lazar <[email protected]>
Reported-by: Hao Zhou <[email protected]>
Reviewed-by: Leo Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: clean unused functions of uvd/vcn/vce

Some of the functions pointers of amdgpu_ip_funcs
are not used and are left commented out. Hence this
cleans those up which arent used.

Cc: Leo Liu <[email protected]>
Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Disable PSR-SU on Parade 08-01 TCON too

Stuart Hayhurst has found that both at bootup and fullscreen VA-API video
is leading to black screens for around 1 second and kernel WARNING [1] traces
when calling dmub_psr_enable() with Parade 08-01 TCON.

These symptoms all go away with PSR-SU disabled for this TCON, so disable
it for now while DMUB traces [2] from the failure can be analyzed and the failure
state properly root caused.

Cc: Marc Rossi <[email protected]>
Cc: Hamza Mahfooz <[email protected]>
Link: https://gitlab.freedesktop.org/drm/amd/uploads/a832dd515b571ee171b3e3b566e99a13/dmesg.log
Link: https://gitlab.freedesktop.org/drm/amd/uploads/8f13ff3b00963c833e23e68aa8116959/output.log
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2645
Reviewed-by: Leo Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts for vega20_ih

Port this change to vega20_ih.c:
commit afbf7955ff01 ("drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts")

Original commit message:
"Why:
Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit
if RB_ENABLE is not set.

How to fix:
Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set.
The RB_ENABLE bit is required to be set, together with
WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit
would clear the RB_OVERFLOW."

Signed-off-by: Victor Lu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Clean the functions pointer set as NULL

We dont need to set the functions to NULL which arent
needed as global structure members are by default
set to zero or NULL for pointers.

Cc: Leo Liu <[email protected]>
Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: clean the dummy soft_reset functions

Remove the dummy soft_reset functions for all
ip blocks.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: clean the dummy wait_for_idle functions

Remove the dummy wait_for_idle functions for all
ip blocks.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: clean the dummy suspend functions

Remove the dummy suspend functions for all
ip blocks.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: clean the dummy resume functions

Remove the dummy resume functions for all
ip blocks.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: validate wait_for_idle before function call

Before making a function call to wait_for_idle,
validate the function pointer like we do in sw_init.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: validate resume before function call

Before making a function call to resume, validate
the function pointer like we do in sw_init.

Use the helper function amdgpu_ip_block_resume where
same checks and calls are repeated.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: validate suspend before function call

Before making a function call to suspend, validate
the function pointer like we do in sw_init.

Use the helper function amdgpu_ip_block_suspend where
same checks and calls are repeated.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: validate hw_fini before function call

Before making a function call to hw_fini, validate
the function pointer like we do in sw_init.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: fix the hang caused by the write reorder to fence_addr

make sure KFD_FENCE_INIT write to fence_addr before pm_send_query_status
called, to avoid qcm fence timeout caused by incorrect ordering.

Signed-off-by: Victor Zhao <[email protected]>
Reviewed-by: Philip Yang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.2

This commit adds the cleaner shader microcode for GFX9.4.2 GPUs. The
cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs).

Clearing these resources is important for ensuring data isolation
between different workloads running on the GPU. Without the cleaner
shader, residual data from a previous workload could potentially be
accessed by a subsequent workload, leading to data leaks and incorrect
computation results.

The cleaner shader microcode is represented as an array of 32-bit words
(`gfx_9_4_2_cleaner_shader_hex`). This array is the binary
representation of the cleaner shader code, which is written in a
low-level GPU instruction set.

Also, this patch updates the `gfx_v9_0_sw_init` function to initialize
the cleaner shader if the MEC firmware version is 88 or higher. It sets
the `cleaner_shader_ptr` and `cleaner_shader_size` to the appropriate
values and attempts to initialize the cleaner shader.

When the cleaner shader feature is enabled, the AMDGPU driver loads this
array into a specific location in the GPU memory. The GPU then reads
this memory location to fetch and execute the cleaner shader
instructions.

The cleaner shader is executed automatically by the GPU at the end of
each workload, before the next workload starts. This ensures that all
GPU resources are in a clean state before the start of each workload.

This change ensures that the GPU memory is properly cleared between
different processes, preventing data leakage and enhancing security. It
also aligns with the serialization mechanism between KGD and KFD,
ensuring that the GPU state is consistent across different workloads.

Cc: Christian König <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Suggested-by: Alex Deucher <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix typo for sdma6 constant fill packet

Fix typo for sdma6 constant fill packet

Signed-off-by: Frank Min <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix random data corruption for sdma 7

There is random data corruption caused by const fill, this is caused by
write compression mode not correctly configured.

So correct compression mode for const fill.

Signed-off-by: Frank Min <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: 3.2.306

This version brings along following fixes:
- Fix dcn401 idle optimization problem
- Fix cursor corruption on dcn35
- Fix DP LL compliance failures
- Fix SubVP Phantom VBlank End calculation

Acked-by: Tom Chung <[email protected]>
Signed-off-by: Aric Cyr <[email protected]>
Signed-off-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: To change dcn301_init.h guard.

[why & How]
The original guard is wrongly to be set as for dcn30.
Changed it from 30 to 301.

Reviewed-by: Dillon Varone <[email protected]>
Signed-off-by: Bhuvanachandra Pinninti <[email protected]>
Signed-off-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: update fullscreen status to SPL

[Why]
Current fullscreen check in SPL using dm_helpers is out-of-sync
with dc state. This causes an issue during minimal transition
where we pick an invalid intermediate state because the pre and
post fullscreen status are different.

[How]
Add sharpening_required flag to dc_stream_state. Use this flag to
indicate if we are in fullscreen or not. Propagate flag to SPL for
fullscreen status. Remove workaround in DML

Reviewed-by: Alvin Lee <[email protected]>
Signed-off-by: Samson Tam <[email protected]>
Signed-off-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Add a Precise Delay Routine

Fix DP compliance failures 4.2.2.12, 4.3.1.21, 4.9.1.19
caused by imprecise delay on fsleep().

Reviewed-by: Aric Cyr <[email protected]>
Signed-off-by: Fangzhi Zuo <[email protected]>
Signed-off-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Recalculate SubVP Phantom VBlank End in dml21

[WHY]
The phantom stream timing is copied from the main stream as most
parameters are identical, however some need to be recalculated.
Currently VBlank End is not recalculated and copied from the main
incorrectly.

[HOW]
Recalculate VBlank End for phantom stream timing.

Reviewed-by: Alvin Lee <[email protected]>
Signed-off-by: Dillon Varone <[email protected]>
Signed-off-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: temp w/a for DP Link Layer compliance

[Why&How]
Disabling P-State support on full updates for DCN401 results in
introducing additional communication with SMU. A UCLK hard min message
to SMU takes 4 seconds to go through, which was due to DCN not allowing
pstate switch, which was caused by incorrect value for TTU watermark
before blanking the HUBP prior to DPG on for servicing the test request.

Fix the issue temporarily by disallowing pstate changes for compliance
test while test request handler is reworked for a proper fix.

Fixes: 67ea53a4bd9d ("drm/amd/display: Disable DCN401 UCLK P-State support on full updates")
Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Reviewed-by: Dillon Varone <[email protected]>
Signed-off-by: Aurabindo Pillai <[email protected]>
Signed-off-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Adding array index check to prevent memory corruption

[Why & How]
Array indices out of bound caused memory corruption. Adding checks to
ensure that array index stays in bound.

Reviewed-by: Charlene Liu <[email protected]>
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Signed-off-by: Leo Chen <[email protected]>
Signed-off-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Reuse subvp enable check for DCN401

Reuse subvp enable check from DCN32 for IGT testing of Sub-Viewport
feature on DCN401

Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Aurabindo Pillai <[email protected]>
Signed-off-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: w/a to program DISPCLK_R_GATE_DISABLE DCN35

[WHY & HOW]
Cursor corruption observed on USBC display with specific system setup with a
reboot. Cursor memory might still in the lightsleep state due to voltage
issue, we need program DISPCLK_R_GATE_DISABLE to avoid this issue only on
DCN35.

Reviewed-by: Nicholas Kazlauskas <[email protected]>
Signed-off-by: Yihan Zhu <[email protected]>
Signed-off-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: temp w/a for dGPU to enter idle optimizations

[Why&How]
vblank immediate disable currently does not work for all asics. On
DCN401, the vblank interrupts never stop coming, and hence we never
get a chance to trigger idle optimizations.

Add a workaround to enable immediate disable only on APUs for now. This
adds a 2-frame delay for triggering idle optimization, which is a
negligible overhead.

Fixes: 58a261bfc967 ("drm/amd/display: use a more lax vblank enable policy for older ASICs")
Fixes: e45b6716de4b ("drm/amd/display: use a more lax vblank enable policy for DCN35+")
Cc: Mario Limonciello <[email protected]>
Cc: Alex Deucher <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Aurabindo Pillai <[email protected]>
Signed-off-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: clean the dummy sw_fini functions

Remove the dummy sw_fini functions for all
ip blocks.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Add hpd_source index check for dcn401 link encoder setup

This patch adds a boundary check for the hpd_source index during the
link encoder creation process for all dcn401 ip. The check ensures that the
index is within the valid range of the link_enc_hpd_regs array to
prevent out-of-bounds access.

Cc: Tom Chung <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: Roman Li <[email protected]>
Cc: Alex Hung <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Harry Wentland <[email protected]>
Cc: Hamza Mahfooz <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Add hpd_source index check for dcn10 link encoder setup

This patch adds a boundary check for the hpd_source index during the
link encoder creation process for all dcn10 ip. The check ensures that the
index is within the valid range of the link_enc_hpd_regs array to
prevent out-of-bounds access.

Cc: Tom Chung <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: Roman Li <[email protected]>
Cc: Alex Hung <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Harry Wentland <[email protected]>
Cc: Hamza Mahfooz <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Add hpd_source index check for DCE60/80/100/110/112/120 link encoders

This patch adds a boundary check for the hpd_source index during the
link encoder creation process for all DCExxx IP's. The check ensures
that the index is within the valid range of the link_enc_hpd_regs array
to prevent out-of-bounds access.

Cc: Tom Chung <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: Roman Li <[email protected]>
Cc: Alex Hung <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Harry Wentland <[email protected]>
Cc: Hamza Mahfooz <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Use SPX as default in partition config

In certain cases - ex: when a reset is required on initialization - XCP
manager won't have a valid partition mode. In such cases, use SPX as the
default selected mode for which partition configuration details are
populated.

Fixes: 4ae86dc87850 ("drm/amdgpu: Add sysfs nodes to get xcp details")
Signed-off-by: Lijo Lazar <[email protected]>
Reported-by: Hao Zhou <[email protected]>
Reviewed-by: Asad Kamal <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: validate sw_fini before function call

Before making a function call to sw_fini, validate
the function pointer like we do in sw_init.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: clean the dummy sw_init functions

Remove the dummy sw_init functions for all
IP blocks.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: validate sw_init before function call

Before making a function call to sw_init, validate
the function pointer like we do in late_init.

Signed-off-by: Sunil Khatri <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: Not restore userptr buffer if kfd process has been removed

When kfd process has been terminated not restore userptr buffer after mmu
notifier invalidates a range.

Signed-off-by: Xiaogang Chen <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: update deep sleep status on smu v14.0.2/3

disable deep sleep during the compute workload for the
potential performance loss on smu v14.0.2/3

Signed-off-by: Kenneth Feng <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: update overdrive function on smu v14.0.2/3

update overdrive function on smu v14.0.2/3

Signed-off-by: Kenneth Feng <[email protected]>
Acked-by: Yang Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Zero-initialize mqd backup memory

Zero-initialize mqd backup memory, otherwise the check for
'already-backed-up' could go wrong.

Signed-off-by: Lijo Lazar <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: update the driver-fw interface file for smu v14.0.2/3

update the driver-fw interface file for smu v14.0.2/3

Signed-off-by: Kenneth Feng <[email protected]>
Reviewed-by: Yang Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Ensure HPD source index is valid for dcn20/dcn201 link encoders

This patch adds a boundary check for the hpd_source index during the
link encoder creation process for dcn20/dcn201 IP's. The check ensures
that the index is within the valid range of the link_enc_hpd_regs array
to prevent out-of-bounds access.

Cc: Tom Chung <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: Roman Li <[email protected]>
Cc: Alex Hung <[email protected]>
Cc: Aurabindo Pillai <[email protected]>
Cc: Harry Wentland <[email protected]>
Cc: Hamza Mahfooz <[email protected]>
Signed-off-by: Srinivasan Shanmugam <[email protected]>
Reviewed-by: Roman Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix spelling mistake "tunndeling" -> "tunneling"

There is a spelling mistake in a dm_error message. Fix it.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

Revert "drm/amdgpu/gfx9: put queue resets behind a debug option"

This reverts commit 7c1a2d8aba6cadde0cc542b2d805edc0be667e79.

Extended validation has completed successfully, so enable
these features by default.

Acked-by: Jiadong Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: Jonathan Kim <[email protected]>
Cc: Jiadong Zhu <[email protected]>

drm/amdgpu: init saw registers for mmhub v1.0

This commits init registers in the Stand Along Walker
for mmhub v1.0, to support ISP use cases.

Signed-off-by: Zhu Lingshan <[email protected]>
Reported-and-tested-by: Du Bin <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/discovery: add ISP discovery entries for old APUs

Raven1/2 and Picasso have ISP 2.0.0, however their ISP blocks
are not in the IP discovery table yet.

This commit fixes this issue by adding new ISP entries for
Raven and Picasso in the IP discovery table.

Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Zhu Lingshan <[email protected]>
Acked-by: Alex Deucher <[email protected]>

drm/amd: Guard against bad data for ATIF ACPI method

If a BIOS provides bad data in response to an ATIF method call
this causes a NULL pointer dereference in the caller.

```
? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1))
? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434)
? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2))
? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1))
? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642)
? exc_page_fault (arch/x86/mm/fault.c:1542)
? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu
? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu
```

It has been encountered on at least one system, so guard for it.

Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/swsmu: add automatic parameter to set_soft_freq_range

On chips that support it, you can specificy 0 and 0xffff for
min and max and the PMFW will use that to determine the optimal
min and max. This enables optimal performance when the
user manually switches between performance levels using sysfs.
Previously we'd set soft min/max which could limit performance.

Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Fix off by one in current_memory_partition_show()

The >= ARRAY_SIZE() should be > ARRAY_SIZE() to prevent an out of
bounds read.

Fixes: 012be6f22c01 ("drm/amdgpu: Add sysfs interfaces for NPS mode")
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs

This uses more aggressive hueristics than the the bootup default
profile. On windows the OS has a special fullscreen 3D mode
where this is used. Since we don't have the equivalent on Linux
default to this profile for dGPUs.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3618
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1500
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3131
Fixes: c50fe289ed72 ("drm/amdgpu/swsmu: always force a state reprogram on init")
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/swsmu: Only force workload setup on init

Needed to set the workload type at init time so that
we can apply the navi3x margin optimization.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3618
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3131
Fixes: c50fe289ed72 ("drm/amdgpu/swsmu: always force a state reprogram on init")
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>