Kenneth Feng [Wed, 30 Oct 2024 05:22:44 +0000 (13:22 +0800)]
drm/amd/pm: correct the workload setting
Correct the workload setting in order not to mix the setting
with the end user. Update the workload mask accordingly.
v2: changes as below:
1. the end user cannot erase the workload from the driver, except for the default workload.
2. always show the real highest priority workload to the end user.
3. the real workload mask is combined with driver workload mask and end user workload mask.
v3: apply this to the other ASICs as well.
v4: simplify the code
v5: refine the code based on the review comments.
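The v2 rules above can be sketched in plain C. This is only an illustration under stated assumptions: the struct, the profile indices, and the rule that a higher bit index means higher priority are all hypothetical and are not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical profile indices; a higher index is assumed to mean higher priority. */
enum profile {
	PROFILE_DEFAULT = 0,
	PROFILE_FULLSCREEN3D = 1,
	PROFILE_POWERSAVE = 2,
	PROFILE_VIDEO = 3,
	PROFILE_COMPUTE = 5,
};

/* Hypothetical bookkeeping: one mask per requester. */
struct workload_state {
	uint32_t driver_mask; /* bits requested by the driver */
	uint32_t user_mask;   /* bits requested by the end user */
};

/* The real mask is the driver mask combined with the end user mask. */
static uint32_t effective_mask(const struct workload_state *s)
{
	return s->driver_mask | s->user_mask;
}

/* The end user replaces only their own bits; driver bits are never erased. */
static void user_set_profile(struct workload_state *s, enum profile p)
{
	s->user_mask = 1u << p;
}

/* Index of the highest set bit of the combined mask (PROFILE_DEFAULT if none). */
static int highest_priority_profile(const struct workload_state *s)
{
	uint32_t mask = effective_mask(s);
	int idx = PROFILE_DEFAULT;

	while (mask >>= 1)
		idx++;
	return idx;
}
```

With a driver request for PROFILE_COMPUTE in place, a user request for PROFILE_VIDEO adds a bit to the combined mask, and the reported profile stays PROFILE_COMPUTE.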
Jesse Zhang [Tue, 29 Oct 2024 02:14:35 +0000 (10:14 +0800)]
drm/amdgpu: add amdgpu_sdma_sched_mask debugfs
Userspace wants to run jobs on a specific sdma ring for verification purposes.
This debugfs entry helps to disable or enable submitting jobs to a specific ring.
This entry is populated only if there are two or more cores in the sdma ip.
Jesse Zhang [Tue, 29 Oct 2024 02:11:05 +0000 (10:11 +0800)]
drm/amdgpu: add amdgpu_gfx_sched_mask and amdgpu_compute_sched_mask debugfs
compute/gfx may have multiple rings on some hardware.
In some cases, userspace wants to run jobs on a specific ring for validation purposes.
This debugfs entry helps to disable or enable submitting jobs to a specific ring.
This entry is populated only if there are two or more cores in the gfx/compute ip.
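A scheduling mask of this kind can be sketched as one bit per ring. The sketch below is a userspace illustration only: `struct ring`, `apply_sched_mask()`, `MAX_RINGS`, and the rule that a mask selecting no valid ring is rejected are assumptions, not the debugfs implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RINGS 8 /* hypothetical upper bound for this sketch */

/* Hypothetical ring state: only the "accepts new jobs" flag is modelled. */
struct ring {
	bool ready;
};

/*
 * Bit i of the mask enables (1) or disables (0) job submission on ring i.
 * Returns 0 on success, -1 if the mask would leave no valid ring enabled.
 */
static int apply_sched_mask(struct ring *rings, size_t num_rings, uint64_t mask)
{
	uint64_t valid;
	size_t i;

	if (num_rings == 0 || num_rings > MAX_RINGS)
		return -1;

	valid = (1ull << num_rings) - 1;
	if ((mask & valid) == 0)
		return -1; /* assumption: refuse to disable every ring */

	for (i = 0; i < num_rings; i++)
		rings[i].ready = (mask >> i) & 1;
	return 0;
}
```

Writing 0x2 with four rings would leave only ring 1 accepting jobs, which is the "run on one specific ring" case the commit messages describe.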
Victor Zhao [Thu, 24 Oct 2024 05:40:39 +0000 (13:40 +0800)]
drm/amdgpu: skip amdgpu_device_cache_pci_state under sriov
Under sriov, the host driver saves and restores the vf pci cfg space during
reset. During device init under sriov, pci_restore_state happens after
full access is released, so it can race with the host side enabling mmio
protection, leading to missing interrupts.
drm/amdkfd: Use dynamic allocation for CU occupancy array in 'kfd_get_cu_occupancy()'
The `kfd_get_cu_occupancy` function previously declared a large
`cu_occupancy` array as a local variable, which could lead to stack
overflows due to excessive stack usage. This commit replaces the static
array allocation with dynamic memory allocation using `kcalloc`,
thereby reducing the stack size.
This change avoids the risk of stack overflows in kernel space, in
scenarios where `AMDGPU_MAX_QUEUES` is large. The allocated memory is
freed using `kfree` before the function returns to prevent memory
leaks.
Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c: In function ‘kfd_get_cu_occupancy’:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:322:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
322 | }
| ^
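The stack-to-heap change described above follows a common pattern, sketched here in userspace C with `calloc`/`free` standing in for `kcalloc`/`kfree`. The struct fields, the `MAX_QUEUES` value, and the function name are hypothetical placeholders, not the KFD code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_QUEUES 128 /* stand-in for AMDGPU_MAX_QUEUES; the value is hypothetical */

/* Hypothetical per-queue record. */
struct cu_occupancy_entry {
	unsigned int wave_cnt;
	unsigned int doorbell_off;
};

/*
 * Before: "struct cu_occupancy_entry occ[MAX_QUEUES];" lived on the stack.
 * After: the array is heap-allocated, zeroed, and freed on every exit path.
 */
static int sum_wave_counts(unsigned int per_queue_waves, unsigned int *total)
{
	struct cu_occupancy_entry *occ;
	unsigned int sum = 0;
	size_t i;

	occ = calloc(MAX_QUEUES, sizeof(*occ));
	if (!occ)
		return -ENOMEM;

	for (i = 0; i < MAX_QUEUES; i++)
		occ[i].wave_cnt = per_queue_waves;
	for (i = 0; i < MAX_QUEUES; i++)
		sum += occ[i].wave_cnt;

	free(occ); /* release before returning to avoid a leak */
	*total = sum;
	return 0;
}
```

The function's frame now holds a pointer instead of the whole array, which is what brings the frame size back under the warning threshold.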
Aric Cyr [Mon, 28 Oct 2024 02:37:01 +0000 (22:37 -0400)]
drm/amd/display: 3.2.308
This version brings along the following fixes:
- Prune Invalid Modes for HDMI Output
- SPL Cleanup
- Fix brightness level not retained over reboot
- Remove inaccessible registers from DMU diagnostics
Fangzhi Zuo [Thu, 17 Oct 2024 22:15:10 +0000 (18:15 -0400)]
drm/amd/display: Prune Invalid Modes For HDMI Output
[Why]
1. HDMI does not have 6 bpc support. Having 6 bpc pass validation
does not comply with spec.
2. Validate 420 only for native HDMI, but not apply to pcon use
case.
3. Current mode validation log is not readable.
[how]
1. Cap 8 bpc for dp-hdmi converter.
2. Validate yuv420 for pcon use case as well,
if rgb/yuv444 8bpc cannot fit into pcon bw limitation of
the link from the converter to HDMI sink.
3. Add readable pixel_format and color_depth into debug log.
Kaitlyn Tse [Thu, 3 Oct 2024 22:13:27 +0000 (18:13 -0400)]
drm/amd/display: Implement new backlight_level_params structure
[Why]
Implement the new backlight_level_params structure as part of the VBAC
framework, the information in this structure is needed to be passed down
to the DMCUB to identify the backlight control type, to adjust the
backlight of the panel and to perform any required conversions from PWM
to nits or vice versa.
[How]
Modified existing functions to include the new backlight_level_params
structure.
Taimur Hassan [Mon, 28 Oct 2024 00:12:59 +0000 (20:12 -0400)]
drm/amd/display: [FW Promotion] Release 0.0.241.0
- Add DPCS health check
- Update USB4 PHY SSC
- Fix FAMS2 SubVP Close to VBlank changes
- Create VESA Aux-based backlight control path
- Fix PSR1 CRC error during CTS test
Wayne Lin [Fri, 25 Oct 2024 04:27:26 +0000 (12:27 +0800)]
drm/amd/display: Don't write DP_MSTM_CTRL after LT
[Why]
We observe that after suspend/resume, we can't light up mst monitors under a
specific mst hub. The reason is that the driver still writes DPCD DP_MSTM_CTRL
after LT. It's forbidden even if we write the same value to that dpcd register.
[How]
We already resume the mst branch device dpcd settings during
resume_mst_branch_status(). Leverage drm_dp_mst_topology_queue_probe() to
only probe the topology, not calling drm_dp_mst_topology_mgr_resume() which
will set DP_MSTM_CTRL as well.
Ilya Bakoulin [Wed, 9 Oct 2024 19:26:48 +0000 (15:26 -0400)]
drm/amd/display: Minimize wait for pending updates
[Why/How]
Move the wait for pending updates past prepare_bandwidth if the previous
update was not a full update to reduce the average time it takes to
complete a full update.
Aurabindo Pillai [Fri, 18 Oct 2024 14:52:16 +0000 (10:52 -0400)]
drm/amd/display: parse umc_info or vram_info based on ASIC
An upstream bug report suggests that there are production dGPUs that are
older than DCN401 but still have a umc_info in VBIOS tables with the
same version as expected for a DCN401 product. Hence, reading these
tables should be guarded with a version check.
Samson Tam [Sun, 20 Oct 2024 02:07:31 +0000 (22:07 -0400)]
drm/amd/display: fix asserts in SPL during bootup
[Why]
During mode validation, there may be modes that fail
max_downscale_src_width check and scaling_quality
taps are 0. This will cause an assert to trigger
in spl_set_filters_data() because taps are 0.
[How]
Move taps calculation for non-adaptive scaling mode
to separate function and call it
if max_downscale_src_width fails. This will
populate taps if scaling_quality taps are 0.
drm/amd/display: fix rxstatus_msg_sz type narrowing
[Why]
Code reading rxstatus message size was incorrectly assigning it to
uint8_t, despite the value being 10 bits long (lower byte plus lowest
2 bits from upper byte). This caused the highest 2 bits to be ignored,
potentially missing invalid values.
[How]
Change all local variables holding rxstatus message size from uint8_t
to uint16_t, as in mod_hdcp_message_hdcp2::rx_id_list_size.
Replaced untyped HDCP_2_2_HMID_RXSTATUS_MSG_SZ_HI macro with function
hdcp_2_2_hmid_rxstatus_msg_sz(const uint8_t[2]) to encapsulate entire
calculation and return a typed result.
Removed spaces mixed with tabs to fix indentation on modified lines.
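The narrowing problem can be shown with a small sketch. The helper names and bodies below are assumptions written for illustration, following only the description above (lower byte plus the lowest 2 bits of the upper byte); they are not the code from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine the lower byte with the lowest 2 bits of
 * the upper byte into a 10-bit message size, returned as a typed uint16_t.
 */
static inline uint16_t rxstatus_msg_sz(const uint8_t rxstatus[2])
{
	return (uint16_t)(rxstatus[0] | ((rxstatus[1] & 0x3u) << 8));
}

/* The old behaviour: storing the result in uint8_t drops the top 2 bits. */
static inline uint8_t rxstatus_msg_sz_narrow(const uint8_t rxstatus[2])
{
	return (uint8_t)rxstatus_msg_sz(rxstatus);
}
```

For the bytes {0x10, 0x02} the full size is 0x210, while the narrowed variable would hold 0x10, so an out-of-range size could pass a later bounds check unnoticed.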
[why & how]
The offending commit caused a lighting issue for Samsung Odyssey G9
monitors when connecting via USB-C. The commit was intended to block certain UHBR rates.
Austin Zheng [Fri, 18 Oct 2024 18:55:21 +0000 (14:55 -0400)]
drm/amd/display: Do Not Fallback To SW Cursor If HW Cursor Required
[Why/How]
Tearing can occur if there is a flip immediate plane and SW cursor.
check_subvp_sw_cursor_fallback_req falls back to SW cursor if the
stream has the potential to use subVP.
Check for fallback not needed if HW cursor is required.
e.g. Fullscreen gaming
drm/amdgpu: fix comment about amdgpu.abmlevel defaults
Since 040fdcde288a ("drm/amdgpu: respect the abmlevel module parameter value
if it is set"), the default value for amdgpu.abmlevel was set to -1, or auto.
However, the comment explaining the default value was not updated to reflect
the change (-1, or auto; not -1, or disabled).
Clarify that the default value (-1) means auto.
Fixes: 040fdcde288a ("drm/amdgpu: respect the abmlevel module parameter value if it is set")
Reported-by: Ruikai Liu <[email protected]>
Signed-off-by: Mingcong Bai <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Tvrtko Ursulin [Thu, 24 Oct 2024 09:23:41 +0000 (10:23 +0100)]
drm/amdgpu: Expose special on chip memory pools in fdinfo
In the past these specialized on chip memory pools were reported as system
memory (aka 'cpu') which was not correct and misleading. That has since
been removed so lets make them visible as their own respective memory
regions.
Tvrtko Ursulin [Thu, 24 Oct 2024 09:23:40 +0000 (10:23 +0100)]
drm/amdgpu: Stop reporting special chip memory pools as CPU memory in fdinfo
So far these specialized on chip memory pools were reported as system
memory (aka 'cpu') which is not correct and misleading. Lets remove that
and consider later making them visible as their own thing.
Yunxiang Li [Thu, 24 Oct 2024 09:23:39 +0000 (10:23 +0100)]
drm/amdgpu: stop tracking visible memory stats
Since on modern systems all of vram can be made visible anyway, to
simplify the new implementation, drop tracking of how much memory is
visible for now. If this is really needed we can add it back on top of
the new implementation, or just report all the BOs as visible.
Yunxiang Li [Thu, 24 Oct 2024 09:23:38 +0000 (10:23 +0100)]
drm/amdgpu: make drm-memory-* report resident memory
The old behavior reported the resident memory usage for this key and the
documentation says so as well. However, this was accidentally changed to
include buffers that were evicted.
Li Huafei [Tue, 29 Oct 2024 20:27:58 +0000 (04:27 +0800)]
drm/amdgpu: Fix the memory allocation issue in amdgpu_discovery_get_nps_info()
Fix two issues with memory allocation in amdgpu_discovery_get_nps_info()
for mem_ranges:
- Add a check for allocation failure to avoid dereferencing a null
pointer.
- As suggested by Christophe, use kvcalloc() for memory allocation,
which checks for multiplication overflow.
Additionally, assign the output parameters nps_type and range_cnt after
the kvcalloc() call to prevent modifying the output parameters in case
of an error return.
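The allocation pattern described above can be sketched in userspace C, with `calloc` standing in for `kvcalloc()` (both take a count and an element size and fail rather than overflow the multiplication). The struct layout, function name, and parameters are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical range record. */
struct mem_range {
	uint64_t base;
	uint64_t limit;
};

/*
 * Outputs are written only after the allocation has succeeded, so a
 * failing call leaves the caller's variables untouched.
 */
static int get_nps_info_sketch(size_t count, uint32_t type,
			       uint32_t *nps_type,
			       struct mem_range **ranges,
			       size_t *range_cnt)
{
	struct mem_range *mem_ranges;
	size_t i;

	if (count == 0)
		return -EINVAL;

	mem_ranges = calloc(count, sizeof(*mem_ranges));
	if (!mem_ranges)
		return -ENOMEM; /* checked before any dereference */

	for (i = 0; i < count; i++) {
		mem_ranges[i].base = (uint64_t)i * 0x1000;
		mem_ranges[i].limit = mem_ranges[i].base + 0xfff;
	}

	*nps_type = type;
	*range_cnt = count;
	*ranges = mem_ranges;
	return 0;
}
```

The caller owns the returned array and is expected to free it; on any error return, `nps_type`, `ranges`, and `range_cnt` keep their previous values.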
Alex Deucher [Mon, 14 Oct 2024 15:58:34 +0000 (11:58 -0400)]
drm/amdgpu: fix fairness in enforce isolation handling
Make sure KFD gets a turn when serializing access to
the GC IP. Currently non-KFD jobs can starve KFD if they
submit often enough. This patch prevents that by stalling
non-KFD if its time period has elapsed.
calculate_user_regamma_coeff() and calculate_user_regamma_ramp() were
added in 2018 in commit 55a01d4023ce ("drm/amd/display: Add user_regamma to color module")
Alex Deucher [Wed, 23 Oct 2024 13:13:21 +0000 (09:13 -0400)]
drm/amdgpu/smu13: fix profile reporting
The following 3 commits landed in parallel:
commit d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting")
commit 7a1613e47e65 ("drm/amdgpu/smu13: always apply the powersave optimization")
commit 7c210ca5a2d7 ("drm/amdgpu: handle default profile on on devices without fullscreen 3D")
While everything is set correctly, this caused the profile to be
reported incorrectly because both the powersave and fullscreen3d bits
were set in the mask and when the driver prints the profile, it looks
for the first bit set.
Fixes: d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting")
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
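The reporting problem in the smu13 commit above can be illustrated with a short sketch. The profile indices and the assumption that "first bit set" means the lowest set bit are hypothetical; the driver's actual lookup may differ.

```c
#include <stdint.h>

/* Hypothetical profile bit positions. */
enum {
	PROFILE_FULLSCREEN3D = 1,
	PROFILE_POWERSAVE = 2,
};

/* Report the index of the first (lowest) set bit, or -1 for an empty mask. */
static int reported_profile(uint32_t mask)
{
	int i;

	for (i = 0; i < 32; i++) {
		if (mask & (1u << i))
			return i;
	}
	return -1;
}
```

With both bits set, the lookup returns PROFILE_FULLSCREEN3D even when powersave is the profile actually applied, which matches the misreporting described in the commit message.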
Prike Liang [Mon, 14 Oct 2024 07:25:35 +0000 (15:25 +0800)]
drm/amdgpu: clean up the suspend_complete
To check the status of S3 suspend completion,
use the PM core pm_suspend_global_flags bit(1)
to detect S3 abort events. Therefore, clean up
the AMDGPU driver's private flag suspend_complete.
In the normal S3 entry, the TOS cycle counter is not
reset while the BIOS executes the _S3 method, so it cannot be
used to determine whether the _S3 method was actually executed.
However, when the PM core performs the S3 suspend, it sets the
PM_SUSPEND_FLAG_FW_RESUME bit if all the devices suspend
successfully. Therefore, drivers can check the
pm_suspend_global_flags bit(1) to detect the S3 suspend
abort event.
Fixes: 6704dbf71928 ("drm/amdgpu: update suspend status for aborting from deeper suspend")
Signed-off-by: Prike Liang <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Tvrtko Ursulin [Fri, 25 Oct 2024 14:56:39 +0000 (15:56 +0100)]
drm/amd/pm: Vangogh: Fix kernel memory out of bounds write
KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:
Empirically we can confirm that the former allocates 152 bytes for the
table, while the latter memsets a 168-byte block.
The root cause appears to be that when GPU metrics tables for v2_4 parts were
added, enlarging the table to fit was not considered.
The fix in this patch is rather "brute force" and perhaps later should be
done in a smarter way, by extracting and consolidating the part version to
size logic to a common helper, instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the out of
bounds write.
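The "largest possible allocation" approach can be sketched as follows. The 152 and 168 byte sizes come from the report above; the struct names, the pairing of the 152-byte size with an older layout, and the helpers are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins sized like the two tables in the report. */
struct metrics_old  { unsigned char raw[152]; };
struct metrics_v2_4 { unsigned char raw[168]; };

#define MAX_SIZE(a, b) ((a) > (b) ? (a) : (b))

/* Size of the largest table any supported part version may initialise. */
static size_t largest_metrics_size(void)
{
	return MAX_SIZE(sizeof(struct metrics_old), sizeof(struct metrics_v2_4));
}

/* Allocate once, large enough for every version. */
static void *alloc_metrics_table(void)
{
	return calloc(1, largest_metrics_size());
}

/* Initialise as a given version would; safe when version_size <= allocation. */
static void init_metrics(void *table, size_t version_size)
{
	memset(table, 0xff, version_size);
}
```

A common helper mapping part version to table size, as the commit message suggests for later, would let the allocation match the version exactly instead of always using the maximum.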
Aric Cyr [Mon, 21 Oct 2024 01:41:45 +0000 (21:41 -0400)]
drm/amd/display: 3.2.307
This version brings along the following fixes:
- Fix polling DSC registers during S0i3
- Fix idle optimizations entry log
- Change MPC Tree visual confirm colours
- Fix underflow when playing 8K video in full screen mode
- Optimize power up sequence for specific OLED
Samson Tam [Fri, 18 Oct 2024 04:26:41 +0000 (00:26 -0400)]
drm/amd/display: store sharpness 1dlut table in dscl_prog_data
[Why]
Previously dscl_prog_data stored pointer to sharpness 1dlut table.
SPL had four pre-generated tables, one for each setup. This allowed
us to minimize number of times we had to recalculate table when
switching between setups.
However, with dual display, this becomes an issue because for a given
setup, we could have a different per app sharpness value than the
global sharpness value. So the pre-generated table will change
but both displays may point to the same table and one of them
will have the wrong sharpness setting.
[How]
Store the sharpness 1dlut table in dscl_prog_data. This ensures
that each display can have its own sharpness setting.
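The difference between pointing at a shared table and storing the table per display can be sketched like this. The table size, the generator, and both struct names are hypothetical and only illustrate the aliasing problem.

```c
#include <stddef.h>

#define LUT_SIZE 4 /* hypothetical; the real table is larger */

/* Hypothetical pre-generated table shared by everything using one setup. */
static unsigned int shared_lut[LUT_SIZE];

/* Old layout: each display only points at the shared table. */
struct prog_data_by_pointer {
	const unsigned int *lut;
};

/* New layout: each display stores its own copy of the table. */
struct prog_data_by_value {
	unsigned int lut[LUT_SIZE];
};

/* Hypothetical generator: fills a table for a given sharpness value. */
static void build_lut(unsigned int *out, unsigned int sharpness)
{
	size_t i;

	for (i = 0; i < LUT_SIZE; i++)
		out[i] = sharpness * (unsigned int)(i + 1);
}
```

With the pointer layout, regenerating `shared_lut` for the second display silently changes what the first display reads; with the by-value layout each display keeps the table built for its own sharpness.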
Ovidiu Bunea [Tue, 15 Oct 2024 22:20:54 +0000 (18:20 -0400)]
drm/amd/display: Do not read DSC state if not in use
[why & how]
DSC may be power gated when coming out of S0i3, so avoid polling
DSC registers since it will fail anyways. Only read if it is known
that DSC is in use.
Aurabindo Pillai [Wed, 16 Oct 2024 17:08:02 +0000 (13:08 -0400)]
drm/amd/display: Fix idle optimizations entry log
[Why & How]
Whether we really enter idle optimizations is decided within DC.
Printing into dmesg before calling the DC API gives an incorrect
indication that we are entering idle optimization in cases where it's
disabled manually.
To fix this, remove the print in DM and add it in DC.
Joshua Aberback [Thu, 17 Oct 2024 19:56:30 +0000 (15:56 -0400)]
drm/amd/display: Change MPC Tree visual confirm colours
[Why]
MPC background colours that use fractional components look different if
MPC OGAM is in use vs in bypass mode. The current red and orange colours
look very similar when OGAM is in bypass, so the colours need to change
to be consistently very easy to tell apart.
[How]
Use colours that only have 0 or MAX values in each component
Alex Hung [Wed, 16 Oct 2024 18:18:39 +0000 (12:18 -0600)]
drm/amd/display: Remove useless assignments and variables
[WHAT & HOW]
misc0, temp and split_pipe are assigned but immediately re-assigned
to other values. The early assignments are useless and are removed.
Unused variables are removed as well.
This fixes 5 UNUSED_VALUE issues reported by Coverity.
Samson Tam [Wed, 16 Oct 2024 18:11:35 +0000 (14:11 -0400)]
drm/amd/display: fix handling of max_downscale_src_width fail check in SPL
[Why]
If max_downscale_src_width check fails, we exit early from
spl_calculate_scaler_params but dscl_prog_data is not fully
populated. If viewport is left at 0, it can cause crash in dml.
[How]
Call spl_set_dscl_prog_data before we exit early from
spl_calculate_scaler_params to populate dscl_prog_data
Populate taps in spl_get_optimal_number_of_taps
Leo Ma [Fri, 11 Oct 2024 18:08:34 +0000 (14:08 -0400)]
drm/amd/display: Fix underflow when playing 8K video in full screen mode
[Why&How]
Flickering observed while playing 8k HEVC-10 bit video in full screen
mode with black border. We didn't support this case for subvp.
Make change to the existing check to disable subvp for this corner case.
Fangzhi Zuo [Tue, 15 Oct 2024 18:22:32 +0000 (14:22 -0400)]
drm/amd/display: Reduce HPD Detection Interval for IPS
Fix DP Compliance test 4.2.1.3, 4.2.2.8, 4.3.1.12, 4.3.1.13
when IPS enabled.
Original HPD detection interval is set to 5s which violates DP
compliance.
Reduce the interval parameter, such that link training can be
finished within 5 seconds.
This reverts
commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35")
[why & how]
The offending commit exposes a hang with lid close/open behavior.
Both issues seem to be related to ODM 2:1 mode switching, so there
is another issue generic to that sequence that needs to be
investigated.
Hansen Dsouza [Tue, 15 Oct 2024 21:33:15 +0000 (17:33 -0400)]
drm/amd/display: Add a boot option to reduce phy ssc for HBR3
[Why]
Spread on DPREFCLK by 0.3 percent can have a negative effect on sink
when PHY SSC is also spread by 0.3 percent
[How]
Add boot option for DMU to lower PHY SSC
Ovidiu Bunea [Fri, 11 Oct 2024 18:55:52 +0000 (14:55 -0400)]
drm/amd/display: Optimize power up sequence for specific OLED
[why & how]
OLED power up sequence takes an extra 150ms via hardcoded delay,
but there is a strict requirement on DisplayOn resume time.
For the customer panel, remove these delays to meet the target until a
cleaner solution can be put in place.
Christian König [Tue, 8 Oct 2024 15:23:22 +0000 (17:23 +0200)]
drm/amdgpu: drop volatile from ring buffer
Volatile only prevents the compiler from re-ordering reads and writes.
Since we always only modify the ring buffer from one CPU thread and have
an explicit barrier before signaling the HW this should have no effect at
all and just prevents compiler optimisations.
Rodrigo Siqueira [Thu, 17 Oct 2024 03:34:27 +0000 (21:34 -0600)]
Documentation/gpu/amdgpu: Add programming model for DCN
One of the challenges to contributing to the display code is the
complexity of the DC component. This commit adds a documentation page
that discusses the programming model used by DCN and an overview of how
the display code is organized.
Rodrigo Siqueira [Thu, 17 Oct 2024 03:34:26 +0000 (21:34 -0600)]
Documentation/gpu: Document how to narrow down display issues
The amdgpu driver is composed of multiple components, each of which can
be a source of some specific problem that the user/developer can see.
This commit introduces steps to narrow down and collect display
information.
Kent Russell [Wed, 16 Oct 2024 18:26:33 +0000 (14:26 -0400)]
amdgpu: Don't print L2 status if there's nothing to print
If a 2nd fault comes in before the 1st is handled, the 1st fault will
clear out the FAULT STATUS registers before the 2nd fault is handled.
Thus we get a lot of zeroes. If status=0, just skip the L2 fault status
information, to avoid confusion of why some VM fault status prints in
dmesg are all zeroes.
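The skip can be sketched as a guard in front of the formatting code. The helper below is a hypothetical userspace illustration that writes into a buffer instead of calling the kernel's logging functions; the message text is only an example.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format the L2 fault status line, or nothing at all when status is 0
 * (for example because an earlier fault already cleared the registers).
 * Returns the number of characters written, 0 when the line is skipped.
 */
static int format_l2_fault(char *buf, size_t len, uint32_t status)
{
	if (status == 0) {
		if (len)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "VM_L2_PROTECTION_FAULT_STATUS:0x%08X",
			(unsigned int)status);
}
```

A caller would print the buffer only when the return value is positive, so all-zero status lines no longer show up in the log.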
Melissa Wen [Wed, 23 Oct 2024 13:53:17 +0000 (10:53 -0300)]
drm/amd/display: add missing tracepoint event in DM atomic_commit_tail
There are two events to trace the beginning and the end of
amdgpu_dm_atomic_commit_tail, but only the one at the beginning was
placed. Place the amdgpu_dm_atomic_commit_tail_finish tracepoint at the
end, then.
Jonathan Kim [Fri, 20 Sep 2024 15:46:05 +0000 (11:46 -0400)]
drm/amdkfd: sever xgmi io link if host driver has disable sharing
Host drivers can create partial hives per guest by disabling xgmi sharing
between certain peers in the main hive.
Typically, these partial hives are fully connected per guest session.
In the event that the host makes a mistake by adding a non-shared node
to a guest session, have the KFD reflect sharing disabled by severing
the IO link.
Lijo Lazar [Thu, 17 Oct 2024 09:02:12 +0000 (14:32 +0530)]
drm/amdgpu: Fix the logic for NPS request failure
On a hive, the NPS request is placed by the first device for all devices in
the hive. If the request fails, mark the mode as UNKNOWN so that subsequent
devices on unload don't request it. Also, fix the mutex double lock
in the error path, which should have been a mutex_unlock.
Lijo Lazar [Tue, 15 Oct 2024 03:13:45 +0000 (08:43 +0530)]
drm/amdgpu: Save VCN shared memory with init reset
VCN shared memory is in framebuffer and there are some flags initialized
during sw_init. Ideally, such programming should be during hw_init.
Make sure the flags are saved during reset on initialization since that
reset will affect frame buffer region. For clarity, separate it out to
another function.
drm/amd/display: Disable PSR-SU on Parade 08-01 TCON too
Stuart Hayhurst has found that both bootup and fullscreen VA-API video
lead to black screens for around 1 second and kernel WARNING [1] traces
when calling dmub_psr_enable() with Parade 08-01 TCON.
These symptoms all go away with PSR-SU disabled for this TCON, so disable
it for now while DMUB traces [2] from the failure can be analyzed and the failure
state properly root caused.
Victor Lu [Thu, 18 Jul 2024 22:01:23 +0000 (18:01 -0400)]
drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts for vega20_ih
Port this change to vega20_ih.c:
commit afbf7955ff01 ("drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts")
Original commit message:
"Why:
Setting IH_RB_WPTR register to 0 will not clear the RB_OVERFLOW bit
if RB_ENABLE is not set.
How to fix:
Set WPTR_OVERFLOW_CLEAR bit after RB_ENABLE bit is set.
The RB_ENABLE bit is required to be set, together with
WPTR_OVERFLOW_ENABLE bit so that setting WPTR_OVERFLOW_CLEAR bit
would clear the RB_OVERFLOW."
This commit adds the cleaner shader microcode for GFX9.4.2 GPUs. The
cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs).
Clearing these resources is important for ensuring data isolation
between different workloads running on the GPU. Without the cleaner
shader, residual data from a previous workload could potentially be
accessed by a subsequent workload, leading to data leaks and incorrect
computation results.
The cleaner shader microcode is represented as an array of 32-bit words
(`gfx_9_4_2_cleaner_shader_hex`). This array is the binary
representation of the cleaner shader code, which is written in a
low-level GPU instruction set.
Also, this patch updates the `gfx_v9_0_sw_init` function to initialize
the cleaner shader if the MEC firmware version is 88 or higher. It sets
the `cleaner_shader_ptr` and `cleaner_shader_size` to the appropriate
values and attempts to initialize the cleaner shader.
When the cleaner shader feature is enabled, the AMDGPU driver loads this
array into a specific location in the GPU memory. The GPU then reads
this memory location to fetch and execute the cleaner shader
instructions.
The cleaner shader is executed automatically by the GPU at the end of
each workload, before the next workload starts. This ensures that all
GPU resources are in a clean state before the start of each workload.
This change ensures that the GPU memory is properly cleared between
different processes, preventing data leakage and enhancing security. It
also aligns with the serialization mechanism between KGD and KFD,
ensuring that the GPU state is consistent across different workloads.
Aric Cyr [Mon, 14 Oct 2024 00:21:39 +0000 (20:21 -0400)]
drm/amd/display: 3.2.306
This version brings along the following fixes:
- Fix dcn401 idle optimization problem
- Fix cursor corruption on dcn35
- Fix DP LL compliance failures
- Fix SubVP Phantom VBlank End calculation