Perry Yuan [Thu, 27 Jul 2023 05:45:30 +0000 (01:45 -0400)]
drm/amdgpu: ungate power gating when system suspend
[Why] During suspend, if GFX DPM is enabled and GFXOFF feature is
enabled the system may get hung. So, it is suggested to disable
GFXOFF feature during suspend and enable it after resume.
[How] Update the code to disable GFXOFF feature during suspend and enable
it after resume.
[ 311.396526] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[ 311.396530] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
[ 311.396531] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
José Pekkarinen [Tue, 31 Oct 2023 17:08:47 +0000 (19:08 +0200)]
drm/radeon: replace 1-element arrays with flexible-array members
Reported by coccinelle, the following patch will move the
following 1 element arrays to flexible arrays.
drivers/gpu/drm/radeon/atombios.h:5523:32-48: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:5545:32-48: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:5461:34-44: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4447:30-40: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4236:30-41: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7095:28-45: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:3896:27-37: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:5443:16-25: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:5454:34-43: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4603:21-32: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4628:32-46: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:6285:29-39: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4296:30-36: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4756:28-36: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:4064:22-35: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7327:9-24: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7332:32-53: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7362:26-41: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7369:29-44: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7349:24-32: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
drivers/gpu/drm/radeon/atombios.h:7355:27-35: WARNING use flexible-array member instead (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
Alex Deucher [Tue, 17 Oct 2023 19:40:01 +0000 (15:40 -0400)]
drm/amdgpu: don't use ATRM for external devices
The ATRM ACPI method is for fetching the dGPU vbios rom
image on laptops and all-in-one systems. It should not be
used for external add in cards. If the dGPU is thunderbolt
connected, don't try ATRM.
v2: pci_is_thunderbolt_attached only works for Intel. Use
pdev->external_facing instead.
v3: dev_is_removable() seems to be what we want
If the size returned by drm buddy allocator is higher than
the required size, we take the higher size to calculate
the buffer start address. This is required if we couldn't
trim the buffer to the requested size. This will fix the
display corruption issue on APU's which has limited VRAM
size.
Tong Liu01 [Fri, 27 Oct 2023 03:26:46 +0000 (11:26 +0800)]
drm/amdgpu: add unmap latency when gfx11 set kiq resources
[why]
If driver does not set unmap latency for KIQ, the default value of KIQ
unmap latency is zero. When do unmap queue, KIQ will return that almost
immediately after receiving unmap command. So, the queue status will be
saved to MQD incorrectly or lost in some chance.
[how]
Set unmap latency when do kiq set resources. The unmap latency is set to
be 1 second that is synchronized with Windows driver.
Kenneth Feng [Wed, 25 Oct 2023 03:20:38 +0000 (11:20 +0800)]
drm/amd/pm: fix the high voltage and temperature issue
fix the high voltage and temperature issue after the driver is unloaded on smu 13.0.0,
smu 13.0.7 and smu 13.0.10
v2 - fix the code format and make sure it is used on the unload case only.
Yifan Zhang [Thu, 26 Oct 2023 13:23:06 +0000 (21:23 +0800)]
drm/amdgpu: remove amdgpu_mes_self_test in gpu recover
gpu tlb flush is skipped if reset sem is held, it makes
mes_self_test fail since it involves add_hw_queue/remove_hw_queue
which needs tlb flush functional. Remove mes_self_test in gpu
recover sequence.
This patch is to fix the recover failure in gfx11.
[ 1831.768292] [drm] ring sdma_32769.3.3 was added
[ 1831.768313] [drm] ring gfx_32769.1.1 ib test pass
[ 1831.768337] [drm] ring compute_32769.2.2 ib test pass
[ 1831.768399] amdgpu 0000:c2:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32769, for process pid 0 thread pid 0)
[ 1831.768434] amdgpu 0000:c2:00.0: amdgpu: in page starting at address 0x0000aec200000000 from client 10
[ 1831.768456] amdgpu 0000:c2:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800A30
[ 1831.768473] amdgpu 0000:c2:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 1831.768489] amdgpu 0000:c2:00.0: amdgpu: MORE_FAULTS: 0x0
[ 1831.768501] amdgpu 0000:c2:00.0: amdgpu: WALKER_ERROR: 0x0
[ 1831.768513] amdgpu 0000:c2:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 1831.768521] amdgpu 0000:c2:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 1831.768529] amdgpu 0000:c2:00.0: amdgpu: RW: 0x0
[ 1831.931229] amdgpu 0000:c2:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma_32769.3.3 test failed (-110)
[ 1832.062917] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1832.063107] [drm:amdgpu_mes_remove_hw_queue [amdgpu]] *ERROR* failed to remove hardware queue, queue id = 3
Lijo Lazar [Fri, 27 Oct 2023 05:32:26 +0000 (11:02 +0530)]
drm/amd/pm: Fix warnings
Fixes warnings:
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0_6_ppt.c:286:45:
warning: '%s' directive output may be truncated writing up to 29 bytes
into a region of size 23 [-Wformat-truncation=]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0_6_ppt.c:286:52:
warning: '%s' directive output may be truncated writing up to 29 bytes
into a region of size 23 [-Wformat-truncation=]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu14/smu_v14_0.c:72:45: warning:
'%s' directive output may be truncated writing up to 29 bytes into a
region of size 23 [-Wformat-truncation=]
drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu14/smu_v14_0.c:72:52: warning:
'%s' directive output may be truncated writing up to 29 bytes into a
region of size 23 [-Wformat-truncation=]
Tao Zhou [Wed, 25 Oct 2023 03:03:24 +0000 (11:03 +0800)]
drm/amdgpu: check RAS supported first in ras_reset_error_count
Not all platforms support RAS.
Fixes: 73582be11ac8 ("drm/amdgpu: bypass RAS error reset in some conditions") Signed-off-by: Tao Zhou <[email protected]> Reviewed-by: Yang Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Dave Airlie [Tue, 31 Oct 2023 00:47:49 +0000 (10:47 +1000)]
Merge tag 'drm-misc-next-2023-10-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v6.7-rc1:
drm-misc-next-2023-10-19 + following:
UAPI Changes:
Cross-subsystem Changes:
- Convert fbdev drivers to use fbdev i/o mem helpers.
Core Changes:
- Use cross-references for macros in docs.
- Make drm_client_buffer_addb use addfb2.
- Add NV20 and NV30 YUV formats.
- Documentation updates for create_dumb ioctl.
- CI fixes.
- Allow variable number of run-queues in scheduler.
Driver Changes:
- Rename drm/ast constants.
- Make ili9882t its own driver.
- Assorted fixes in ivpu, vc4, bridge/synopsis, amdgpu.
- Add planar formats to rockchip.
David Francis [Thu, 12 Oct 2023 14:35:20 +0000 (10:35 -0400)]
drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
On gfx943 APU, EXT_COHERENT should give MTYPE_CC for local and
MTYPE_UC for nonlocal memory.
On NUMA systems, local memory gets the local mtype, set by an
override callback. If EXT_COHERENT is set, memory will be set as
MTYPE_UC by default, with local memory MTYPE_CC.
Add an option in the override function for this case, and
add a check to ensure it is not used on UNCACHED memory.
V2: Combined APU and NUMA code into one patch
V3: Fixed a potential nullptr in amdgpu_vm_bo_update
Hamza Mahfooz [Thu, 26 Oct 2023 15:50:45 +0000 (11:50 -0400)]
drm/amd/display: fix S/G display enablement
An assignment statement was reversed during a refactor which effectively
disabled S/G display outright. Since, we use
adev->mode_info.gpu_vm_support to indicate to the rest of the driver
that S/G display should be enabled and currently it is always set to
false. So, to fix this set adev->mode_info.gpu_vm_support's value to
that of init_data.flags.gpu_vm_support (and not vice versa).
Yifan Zhang [Tue, 24 Oct 2023 13:16:26 +0000 (21:16 +0800)]
drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported.
is_mode1_reset_supported may be called before smu init, when smu_context
is unitialized in driver load/unload test. Call smu_cmn_get_smc_version
explicitly in is_mode1_reset_supported.
v2: apply to aldebaran in case is_mode1_reset_supported will be
uncommented (Candice Li)
Aric Cyr [Mon, 9 Oct 2023 17:11:32 +0000 (13:11 -0400)]
drm/amd/display: 3.2.256
DC v3.2.256
Summary:
* Fixes null-deref regression after
"drm/amd/display: Update OPP counter from new interface"
* Fixes display flashing when VSR and HDR enabled on dcn32
* Fixes dcn3x intermittent hangs due to FPO
* Fixes MST Multi-Stream light up on dcn35
* Fixes green screen on DCN31x when DVI and HDMI monitors attached
* Adds DML2 improvements
* Adds idle power optimization improvements
* Accommodates panels with lower nit backlight
* Updates SDP VSC colorimetry from DP test automation request
* Reverts "drm/amd/display: allow edp updates for virtual signal"
drm/amd/display: Read before writing Backlight Mode Set Register
[HOW&WHY]
Reading the value from
DP_EDP_BACKLIGHT_MODE_SET_REGISTER, DPCD 0x721
before setting the
BP_EDP_PANEL_LUMINANC_CONTROL_ENABLE bit
to ensure there are no accidental overwrites.
Samson Tam [Thu, 5 Oct 2023 05:31:12 +0000 (01:31 -0400)]
drm/amd/display: fix num_ways overflow error
[Why]
Helper function calculates num_ways using 32-bit. But is
returned as 8-bit. If num_ways exceeds 8-bit, then it
reports back the incorrect num_ways and erroneously
uses MALL when it should not
[How]
Make returned value 32-bit and convert after it checks
against caps.cache_num_ways, which is under 8-bit
This commit adds the amdgpu_dm_plane_ prefix for all functions in the
amdgpu_dm_plane.c. This change enables an easy way to filter code paths
via ftrace.
drm/amd/display: Add prefix to amdgpu crtc functions
The ftrace debug feature allows filtering functions based on a prefix,
which can be helpful in some complex debug scenarios. The driver can
benefit more from this feature if the function name follows some
patterns; for this reason, this commit adds the prefix amdgpu_dm_crtc_
to all the functions that do not have it in the amdgpu_dm_crtc.c file.
Sung Joon Kim [Thu, 5 Oct 2023 18:56:24 +0000 (14:56 -0400)]
drm/amd/display: Fix HDMI framepack 3D test issue
[why]
Bandwidth validation failure on framepack tests.
Need to double pixel clock when 3D format is
framepack. Also for HDMI displays, we need to
keep the ITC flag to 1 by default.
[how]
Double the pixel clock when using framepack 3D format.
Set hdmi ITC bit to 1.
Swapnil Patel [Wed, 4 Oct 2023 19:58:57 +0000 (15:58 -0400)]
drm/amd/display: Reduce default backlight min from 5 nits to 1 nits
[Why & How]
Currently set_default_brightness_aux function uses 5 nits as lower limit
to check for valid default_backlight setting. However some newer panels
can support even lower default settings
George Shen [Mon, 2 Oct 2023 19:31:16 +0000 (15:31 -0400)]
drm/amd/display: Update SDP VSC colorimetry from DP test automation request
[Why]
Certain test equipment vendors check the SDP VSC for colorimetry against
the value from the test request during certain DP link layer tests for
YCbCr test cases.
[How]
Update SDP VSC with colorimetry from test automation request.
drm/amd: Explicitly disable ASPM when dynamic switching disabled
Currently there are separate but related checks:
* amdgpu_device_should_use_aspm()
* amdgpu_device_aspm_support_quirk()
* amdgpu_device_pcie_dynamic_switching_supported()
Simplify into checking whether DPM was enabled or not in the auto
case. This works because amdgpu_device_pcie_dynamic_switching_supported()
populates that value.
The change prevents migrating the entire range to VRAM because retry
fault restore_pages map the remaining system memory range to GPUs. It
will work correctly to submit together with partial mapping to GPU
patch later.
Qu Huang [Mon, 23 Oct 2023 12:56:37 +0000 (12:56 +0000)]
drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL
In certain types of chips, such as VEGA20, reading the amdgpu_regs_smc file could result in an abnormal null pointer access when the smc_rreg pointer is NULL. Below are the steps to reproduce this issue and the corresponding exception log:
Rodrigo Siqueira [Fri, 20 Oct 2023 16:06:50 +0000 (10:06 -0600)]
drm/amd/display: Fix DMUB errors introduced by DML2
When DML 2 was introduced, it changed part of the generic sequence of
DC, which caused issues on previous DCNs with DMUB support. This commit
ensures the new sequence only works for new DCNs from 3.5 and above.
Changes since V1:
- Harry: Use the attribute using_dml2 instead of check the DCN version.
Rodrigo Siqueira [Fri, 20 Oct 2023 21:17:09 +0000 (15:17 -0600)]
drm/amd/display: Set the DML2 attribute to false in all DCNs older than version 3.5
When DML2 was introduced, it targeted only new DCN versions. For
controlling which ASIC should use this new version of DML, it was
introduced the using_dml2 attribute. To avoid ambiguities, this commit
explicitly sets using_dml2 to false in all ASICs that do not support
DML2.
Alex Deucher [Wed, 25 Oct 2023 17:19:28 +0000 (13:19 -0400)]
drm/amdgpu: move buffer funcs setting up a level
Rather than doing this in the IP code for the SDMA paging
engine, move it up to the core device level init level.
This should fix the scheduler init ordering.
v2: drop extra parens
v3: drop SDMA helpers
v4: Added a Fixes tag because amdgpu dereferences an uninitialized
scheduler without this patch, and this patch fixes this. (Luben)
Luben Tuikov [Sun, 15 Oct 2023 01:15:35 +0000 (21:15 -0400)]
drm/sched: Convert the GPU scheduler to variable number of run-queues
The GPU scheduler has now a variable number of run-queues, which are set up at
drm_sched_init() time. This way, each driver announces how many run-queues it
requires (supports) per each GPU scheduler it creates. Note, that run-queues
correspond to scheduler "priorities", thus if the number of run-queues is set
to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
i.e. single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler and
a scheduled entity.
Helen Koike [Tue, 19 Sep 2023 18:22:49 +0000 (15:22 -0300)]
MAINTAINERS: drm/ci: add entries for xfail files
DRM CI keeps track of which tests are failing, flaking or being skipped
by the ci in the expectations files. Add entries for those files to the
corresponding driver maintainer, so they can be notified when they
change.
Helen Koike [Tue, 24 Oct 2023 00:45:24 +0000 (21:45 -0300)]
drm/ci: do not automatically retry on error
Since the kernel doesn't use a bot like Mesa that requires tests to pass
in order to merge the patches, leave it to developers and/or maintainers
to manually retry.
Helen Koike [Tue, 24 Oct 2023 00:45:22 +0000 (21:45 -0300)]
drm/ci: increase i915 job timeout to 1h30m
With the new sharding, the default job timeout is not enough for i915
and their jobs are failing before completing.
See below the current execution time:
🞋 job i915:tgl 8/8 has new status: success (37m3s)
🞋 job i915:tgl 7/8 has new status: success (19m43s)
🞋 job i915:tgl 6/8 has new status: success (21m47s)
🞋 job i915:tgl 5/8 has new status: success (18m16s)
🞋 job i915:tgl 4/8 has new status: success (21m43s)
🞋 job i915:tgl 3/8 has new status: success (17m59s)
🞋 job i915:tgl 2/8 has new status: success (22m15s)
🞋 job i915:tgl 1/8 has new status: success (18m52s)
🞋 job i915:cml 2/2 has new status: success (1h19m58s)
🞋 job i915:cml 1/2 has new status: success (55m45s)
🞋 job i915:whl 2/2 has new status: success (1h8m56s)
🞋 job i915:whl 1/2 has new status: success (54m3s)
🞋 job i915:kbl 3/3 has new status: success (37m43s)
🞋 job i915:kbl 2/3 has new status: success (36m37s)
🞋 job i915:kbl 1/3 has new status: success (34m52s)
🞋 job i915:amly 2/2 has new status: success (1h7m60s)
🞋 job i915:amly 1/2 has new status: success (59m18s)
🞋 job i915:glk 2/2 has new status: success (58m26s)
🞋 job i915:glk 1/2 has new status: success (50m23s)
🞋 job i915:apl 3/3 has new status: success (1h6m39s)
🞋 job i915:apl 2/3 has new status: success (1h4m45s)
🞋 job i915:apl 1/3 has new status: success (1h7m38s)
(generated with ci_run_n_monitor.py script)
The longest job is 1h19m58s, so adjust the timeout.
Helen Koike [Tue, 24 Oct 2023 00:45:21 +0000 (21:45 -0300)]
drm/ci: add subset-1-gfx to LAVA_TAGS and adjust shards
The Collabora Lava farm added a tag called `subset-1-gfx` to half of
devices the graphics community use.
Lets use this tag so we don't occupy all the resources.
This is particular important because Mesa3D shares the resources with
DRM-CI and use them to do pre-merge tests, so it can block developers
from getting their patches merged.
Helen Koike [Tue, 24 Oct 2023 00:45:20 +0000 (21:45 -0300)]
drm/ci: clean up xfails (specially flakes list)
Since the script that collected the list of the expectation files was
bogus and placing test to the flakes list incorrectly, restart the
expectation files with the correct script.
This reduces a lot the number of tests in the flakes list.
Helen Koike [Tue, 24 Oct 2023 00:45:19 +0000 (21:45 -0300)]
drm/ci: uprev IGT and make sure core_getversion is run
IGT has recently merged a patch that makes code_getversion test to fails
if the driver isn't loaded or if it isn't the expected one defined in
variable IGT_FORCE_DRIVER.
Without this test, jobs were passing when the driver didn't load or
probe for some reason, giving the illusion that everything was ok.
Uprev IGT to include this modification and include core_getversion test
in all the shards.
Helen Koike [Tue, 24 Oct 2023 00:45:18 +0000 (21:45 -0300)]
drm/ci: add helper script update-xfails.py
Add helper script that given a gitlab pipeline url, analyse which are
the failures and flakes and update the xfails folder accordingly.
Example:
Trigger a pipeline in gitlab infrastructure, than re-try a few jobs more
than once (so we can have data if failures are consistent across jobs
with the same name or if they are flakes) and execute: