Git Repo - linux.git/log

drm/amd/display: Fix two cursor duplication when using overlay

Our driver supports overlay planes, and as expected, some userspace
compositor takes advantage of these features. If the userspace is not
enabling the cursor, they can use multiple planes as they please.
Nevertheless, we start to have constraints when userspace tries to
enable hardware cursor with various planes. Basically, we cannot draw
the cursor at the same size and position on two separated pipes since it
uses extra bandwidth and DML only run with one cursor.

For those reasons, when we enable hardware cursor and multiple planes,
our driver should accept variations like the ones described below:

  +-------------+   +--------------+
  | +---------+ |   |              |
  | |Primary  | |   | Primary      |
  | |         | |   | Overlay      |
  | +---------+ |   |              |
  |Overlay      |   |              |
  +-------------+   +--------------+

In this scenario, we can have the desktop UI in the overlay and some
other framebuffer attached to the primary plane (e.g., video). However,
userspace needs to obey some rules and avoid scenarios like the ones
described below (when enabling hw cursor):

                                      +--------+
                                      |Overlay |
+-------------+    +-----+-------+ +-|        |--+
| +--------+  | +--------+       | | +--------+  |
| |Overlay |  | |Overlay |       | |             |
| |        |  | |        |       | |             |
| +--------+  | +--------+       | |             |
| Primary     |    | Primary     | | Primary     |
+-------------+    +-------------+ +-------------+

+-------------+   +-------------+
|     +--------+  |  Primary    |
|     |Overlay |  |             |
|     |        |  |             |
|     +--------+  | +--------+  |
| Primary     |   | |Overlay |  |
+-------------+   +-|        |--+
                     +--------+

If the userspace violates some of the above scenarios, our driver needs
to reject the commit; otherwise, we can have unexpected behavior. Since
we don't have a proper driver validation for the above case, we can see
some problems like a duplicate cursor in applications that use multiple
planes. This commit fixes the cursor issue and others by adding adequate
verification for multiple planes.

Change since V1 (Harry and Sean):
- Remove cursor verification from the equation.

Cc: Louis Li <[email protected]>
Cc: Nicholas Kazlauskas <[email protected]>
Cc: Harry Wentland <[email protected]>
Cc: Hersen Wu <[email protected]>
Cc: Sean Paul <[email protected]>
Signed-off-by: Rodrigo Siqueira <[email protected]>
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: enable gfx ras in aldebran by default

gfx ras now can be enabled by default in aldebaran

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: John Clements <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: switch to mmhub ras callback for ras fini

invoke callback function for mmhub ras fini

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: John Clements <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: retired reset_ras_error_count from hdp callbacks

It was moved to hdp ras callbacks

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: John Clements <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: enable ras error count query and reset for HDP

add hdp block ras error query and reset support in
amdgpu ras error count query and reset interface

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: John Clements <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: init/fini hdp v4_0 ras

invoke hdp v4_0 ras init in gmc late_init phase
while ras fini in gmc sw_fini phase

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: John Clements <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: initialize hdp v4_0 ras functions

hdp v4_0 support ras features

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: John Clements <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: implement hdp v4_0 ras functions

implement hdp v4_0 ras functions, including
ras init/fini, query/reset_error_counter

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: John Clements <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add helpers for hdp ras init/fini

hdp ras init/fini are common functions that
can be shared among hdp generations

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: John Clements <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add hdp ras structures

centralize all hdp ras operation to ras_funcs

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: John Clements <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

amdgpu: fix GEM obj leak in amdgpu_display_user_framebuffer_create

This error code-path is missing a drm_gem_object_put call. Other
error code-paths are fine.

Signed-off-by: Simon Ser <[email protected]>
Fixes: 1769152ac64b ("drm/amdgpu: Fail fb creation from imported dma-bufs. (v2)")
Cc: Alex Deucher <[email protected]>
Cc: Harry Wentland <[email protected]>
Cc: Nicholas Kazlauskas <[email protected]>
Cc: Bas Nieuwenhuizen <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix build warnings

Fix the following build warnings.

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:
In function ‘dm_update_mst_vcpi_slots_for_dsc’:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6242:46:
warning: variable ‘old_con_state’ set but not used

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:
In function ‘amdgpu_dm_commit_cursors’:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7709:44:
warning: variable ‘new_plane_state’ set but not used

The variables were introduced to be used in iterators, but not used.
Use other iterators which don't require the unused variables.

Fixes: 8ad278062de4e ("drm/amd/display: Disable cursors before disabling planes")
Fixes: 29b9ba74f6384 ("drm/amd/display: Recalculate VCPI slots for new DSC connectors")
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/amdgpu: Fix errors in documentation of function parameters

In the function documentation, I removed the excess parameters,
described the undocumented ones, and fixed the syntax errors.

Signed-off-by: Fabio M. De Francesco <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/display: add documentation for dmcub_trace_event_en

Was missing when this structure was updated.

Fixes: 46a83eba276cd3 ("drm/amd/display: Add debugfs to control DMUB trace buffer events")
Reviewed-by: Leo (Hanghong) Ma <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm/powerplay/hwmgr: Fix kernel-doc syntax in documentation

Fixed kernel-doc syntax errors in documentation of functions.

Signed-off-by: Fabio M. De Francesco <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Register VGA clients after init can no longer fail

When an amdgpu device fails to init, it makes another VGA device cause
kernel splat:
kernel: amdgpu 0000:08:00.0: amdgpu: amdgpu_device_ip_init failed
kernel: amdgpu 0000:08:00.0: amdgpu: Fatal error during GPU init
kernel: amdgpu: probe of 0000:08:00.0 failed with error -110
...
kernel: amdgpu 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000018
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0000 [#1] SMP NOPTI
kernel: CPU: 6 PID: 1080 Comm: Xorg Tainted: G        W         5.12.0-rc8+ #12
kernel: Hardware name: HP HP EliteDesk 805 G6/872B, BIOS S09 Ver. 02.02.00 12/30/2020
kernel: RIP: 0010:amdgpu_device_vga_set_decode+0x13/0x30 [amdgpu]
kernel: Code: 06 31 c0 c3 b8 ea ff ff ff 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 8b 87 90 06 00 00 48 89 e5 53 89 f3 <48> 8b 40 18 40 0f b6 f6 e8 40 58 39 fd 80 fb 01 5b 5d 19 c0 83 e0
kernel: RSP: 0018:ffffae3c0246bd68 EFLAGS: 00010002
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: ffff8dd1af5a8560 RSI: 0000000000000000 RDI: ffff8dce8c160000
kernel: RBP: ffffae3c0246bd70 R08: ffff8dd1af5985c0 R09: ffffae3c0246ba38
kernel: R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000246
kernel: R13: 0000000000000000 R14: 0000000000000003 R15: ffff8dce81490000
kernel: FS:  00007f9303d8fa40(0000) GS:ffff8dd1af580000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000018 CR3: 0000000103cfa000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel:  vga_arbiter_notify_clients.part.0+0x4a/0x80
kernel:  vga_get+0x17f/0x1c0
kernel:  vga_arb_write+0x121/0x6a0
kernel:  ? apparmor_file_permission+0x1c/0x20
kernel:  ? security_file_permission+0x30/0x180
kernel:  vfs_write+0xca/0x280
kernel:  ksys_write+0x67/0xe0
kernel:  __x64_sys_write+0x1a/0x20
kernel:  do_syscall_64+0x38/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f93041e02f7
kernel: Code: 75 05 48 83 c4 58 c3 e8 f7 33 ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
kernel: RSP: 002b:00007fff60e49b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f93041e02f7
kernel: RDX: 000000000000000b RSI: 00007fff60e49b40 RDI: 000000000000000f
kernel: RBP: 00007fff60e49b40 R08: 00000000ffffffff R09: 00007fff60e499d0
kernel: R10: 00007f93049350b5 R11: 0000000000000246 R12: 000056111d45e808
kernel: R13: 0000000000000000 R14: 000056111d45e7f8 R15: 000056111d46c980
kernel: Modules linked in: nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq input_leds snd_seq_device snd_timer snd soundcore joydev kvm_amd serio_raw k10temp mac_hid hp_wmi ccp kvm sparse_keymap wmi_bmof ucsi_acpi efi_pstore typec_ucsi rapl typec video wmi sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log hid_generic usbhid hid amdgpu drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul sysimgblt crc32_pclmul fb_sys_fops ghash_clmulni_intel cec rc_core aesni_intel crypto_simd psmouse cryptd r8169 i2c_piix4 drm ahci xhci_pci realtek libahci xhci_pci_renesas gpio_amdpt gpio_generic
kernel: CR2: 0000000000000018
kernel: ---[ end trace 76d04313d4214c51 ]---

Commit 4192f7b57689 ("drm/amdgpu: unmap register bar on device init
failure") makes amdgpu_driver_unload_kms() skips amdgpu_device_fini(),
so the VGA clients remain registered. So when
vga_arbiter_notify_clients() iterates over registered clients, it causes
NULL pointer dereference.

Since there's no reason to register VGA clients that early, so solve
the issue by putting them after all the goto cleanups.

v2:
- Remove redundant vga_switcheroo cleanup in failed: label.

Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure")
Signed-off-by: Kai-Heng Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown

The runtime resume PM op disregards the return value from
amdgpu_device_resume(), masking errors for failed resumes at the PM
layer.

Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Pavan Kumar Ramayanam <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix r initial values

Sriov gets suspend of IP block <dce_virtual> failed as return
value was not initialized.

v2: return 0 directly to align original code semantic before this
was broken out into a separate helper function instead of setting
initial values

Signed-off-by: Victor Zhao <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix no full coverage issue for gprs initialization

The wave's number per simd in aldebaran is changed to 8, so it is
impossible to use old algorithm to initiate all sgprs with one
threadgroup. The new algorithm firstly use three threadgroups to
initiate most sgprs simultaneously and then use another threadgroup with
4 waves to cover other uninitiated sgprs.

v2:
Add more description about the new algorithm to clear sgprs and add some
comment for shader binaries

Signed-off-by: Dennis Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: Add Aldebaran gws support

v2: updated MEC FW version after validating gws with debugger

Signed-off-by: Harish Kasiviswanathan <[email protected]>
Reviewed-by: Joseph Greathouse <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: enable subsequent retry fault

After draining the stale retry fault, or failed to validate the range
to recover, have to remove the fault address from fault filter ring, to
be able to handle subsequent retry interrupt on same address. Otherwise
the retry fault will not be processed to recover until timeout passed.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: address remove from fault filter

Add interface to remove address from fault filter ring by resetting
fault ring entry key, then future vm fault on the address will be
processed to recover.

Define fault key as atomic64_t type to use atomic read/set/cmpxchg key
to protect fault ring access by interrupt handler and interrupt deferred
work for vg20. Change fault->timestamp to 48-bit to share same uint64_t
with 8-bit fault->next, it is enough for 48bit IH timestamp.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: handle stale retry fault

Retry fault interrupt maybe pending in IH ring after GPU page table
is updated to recover the vm fault, because each page of the range
generate retry fault interrupt. There is race if application unmap
range to remove and free the range first and then retry fault work
restore_pages handle the retry fault interrupt, because range can not be
found, this vm fault can not be recovered and report incorrect GPU vm
fault to application.

Before unmap to remove and free range, drain retry fault interrupt
from IH ring1 to ensure no retry fault comes after the range is removed.

Drain retry fault interrupt skip the range which is on deferred list
to remove, or the range is child range, which is split by unmap, does
not add to svms and have interval notifier.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: return IH ring drain finished if ring is empty

Sometimes IH do not setup ring wptr overflow flag after wptr exceed
rptr. As a workaround, if IH rptr equals to wptr, ring is empty,
return true to indicate IH ring checkpoint is processed, IH ring drain
is finished.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: retry validation to recover range

GPU vm retry fault recover range need retry validation if

1. range is split in parallel by unmap while recover
2. range migrate to system memory and range is updated in system
memory while recover

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: fix spelling mistake in packet manager

The plural of 'process' should be 'processes'.

Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/amdgpu/sriov disable all ip hw status by default

Disable all ip's hw status to false before any hw_init.
Only set it to true until its hw_init is executed.

The old 5.9 branch has this change but somehow the 5.11 kernrel does
not have this fix.

Without this change, sriov tdr have gfx IB test fail.

Signed-off-by: Jack Zhang <[email protected]>
Review-by: Emily Deng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: restructure amdgpu_vram_mgr_new

Merge the two loops, loosen the restriction for big allocations.
This reduces the CPU overhead in the good case, but increases
it a bit under memory pressure.

Signed-off-by: Christian König <[email protected]>
Acked-and-Tested-by: Nirmoy Das <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: fix double free device pgmap resource

Use devm_memunmap_pages instead of memunmap_pages to release pgmap
and remove pgmap from device action, to avoid double free pgmap when
unloading driver module.

Release device memory region if failed to create device memory pages
structure.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: Fix spelling mistake "unregisterd" -> "unregistered"

There is a spelling mistake in a pr_debug message. Fix it.

Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Correct and simplify sdma 4.x irq.num_types

Correct and init the sdma4.x irq.num_types.

v2: squash in fix (Alex)

Signed-off-by: Feifei Xu <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Change the sdma interrupt print level

Change the print level into debug.

Signed-off-by: Feifei Xu <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/sriov: Remove clear vf fw support

PSP clear_vf_fw feature is outdated and has been removed.
Remove the related functions.

Signed-off-by: Victor Zhao <[email protected]>
Reviewed-by: Emily Deng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: 3.2.133

Signed-off-by: Aric Cyr <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: [FW Promotion] Release 0.0.63

Signed-off-by: Anthony Koo <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Add SE_DCN3_REG_LIST for control SDP num

[Why] New platform. Need to add corresponding register control

Signed-off-by: Max.Tseng <[email protected]>
Reviewed-by: Anthony Koo <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: avoid to authentication when DEVICE_COUNT=0

[why]
we don't support authentication with DEVICE_COUNT=0

[how]
check value DEVICE_COUNT before doing authentication

Signed-off-by: Yu-ting Shen <[email protected]>
Reviewed-by: Wenjing Liu <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: take max dsc stream bandwidth overhead into account

[why]
As hardware team suggested that we need to add a max dsc bw overhead
into existing stream bandwidth when DSC is used.
The formula as below:

max_dsc_bw_overhead =
v_addressable * slice_count * 256 bit * pixel clock / v_total / h_total

effective stream bandwidth = pixel clock * bpp

stream bandwidth = effective stream bandwidth + dsc stream overhead

Signed-off-by: Wenjing Liu <[email protected]>
Reviewed-by: Eric Bernstein <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: fix wrong statement in mst hpd debugfs

[why]
Previous statement would always evaluate to true
making it meaningless
[how]
Just check if a connector is MST by checking if its port exists.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Mikita Lipski <[email protected]>
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Clear MASTER_UPDATE_LOCK_DB_EN when disable doublebuffer lock

[Why]
It seems there's a typo to set MASTER_UPDATE_LOCK_DB_EN when disable
doublebuffer lock.

[How]
Clear MASTER_UPDATE_LOCK_DB_EN when disable doublebuffer lock

Signed-off-by: Robin Chen <[email protected]>
Reviewed-by: Joshua Aberback <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Add new DP_SEC registers for programming SDP Line number

[Why]
Add register for programming in new platform

Signed-off-by: Max.Tseng <[email protected]>
Reviewed-by: Anthony Koo <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix BSOD with NULL check

[Why]
CLK mgr is null for server settings.

[How]
Guard the function with NULL check.

Signed-off-by: Chris Park <[email protected]>
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Added multi instance support for ABM

[WHY & HOW]
ABM assumes only 1 eDP is connected. Refactored existing
ABM interface to support multiple instances.

Signed-off-by: Jake Wang <[email protected]>
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: ddc resource data need to be initialized

[WHY]
to initial ddc structure before we use them to avoid error checking.

[HOW]
add 0 as structure initialization value in DCN resource construct

Signed-off-by: Yu-ting Shen <[email protected]>
Reviewed-by: Meenakshikumar Somasundaram <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Expose internal display flag via debugfs

[Why & How]
Add a per-connector debugfs entry to expose internal display flag,
which is indication that the display is "internally connected"
and not hotpluggable.

Signed-off-by: Stylon Wang <[email protected]>
Reviewed-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: skip program clock when allow seamless boot

[Why]
Driver program dpp clock calculate by pipe split config but hw config is single pipe.

[How]
Skip programming clock when allow seamless boot.
After porgramming pipe config, seamless boot flag will be clear.

Signed-off-by: Lewis Huang <[email protected]>
Reviewed-by: Eric Yang <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Revert wait vblank on update dpp clock

[Why]
This change only fix dpp clock switch to lower case.
New solution later can fix both case, which is "dc: skip
program clock when allow seamless boot"

[How]
This reverts commit "dc: wait vblank when stream enabled
and update dpp clock"

Signed-off-by: Lewis Huang <[email protected]>
Reviewed-by: Eric Yang <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: fix HDCP reset sequence on reinitialize

[why]
When setup is called after hdcp has already setup,
it would cause to disable HDCP flow won’t execute.

[how]
Don't clean up hdcp content to be 0.

Signed-off-by: Brandon Syu <[email protected]>
Reviewed-by: Wenjing Liu <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Add new case to get spread spectrum info

[WHY]
New minor revision needs to be handled

[HOW]
Add switch case and assert to catch missing switch cases in the future

Signed-off-by: Michael Strauss <[email protected]>
Reviewed-by: Eric Yang <[email protected]>
Acked-by: Wayne Lin <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: remove unnecessary header include

amdgpu.h is included in kfd_priv.h

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: John Clements <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: provide socket/die id info in RAS message

Add socket/die information in RAS messages for platforms
that support query those information

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: John Clements <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: implement smuio callback to query socket id

get_socket_id is used to query socket id

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: John Clements <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Enable SDMA LS for Vangogh

Add flags AMD_CG_SUPPORT_SDMA_LS for Vangogh.
Start to open sdma ls from firmware version 70.

Signed-off-by: Jinzhou Su <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: Fix kernel-doc syntax error

Fixed a kernel-doc error in the documentation of a function.

Signed-off-by: Fabio M. De Francesco <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Added missing prototype

Kernel test robot throws below warning ->

>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:125:5: warning:
>> no previous prototype for 'kgd_arcturus_hqd_sdma_load'
>> [-Wmissing-prototypes]
     125 | int kgd_arcturus_hqd_sdma_load(struct kgd_dev *kgd, void
*mqd,

>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:195:5: warning:
>> no previous prototype for 'kgd_arcturus_hqd_sdma_dump'
>> [-Wmissing-prototypes]
     195 | int kgd_arcturus_hqd_sdma_dump(struct kgd_dev *kgd,

>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:227:6: warning:
>> no previous prototype for 'kgd_arcturus_hqd_sdma_is_occupied'
>> [-Wmissing-prototypes]
     227 | bool kgd_arcturus_hqd_sdma_is_occupied(struct kgd_dev *kgd,
void *mqd)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c:246:5: warning:
>> no previous prototype for 'kgd_arcturus_hqd_sdma_destroy'
>> [-Wmissing-prototypes]
     246 | int kgd_arcturus_hqd_sdma_destroy(struct kgd_dev *kgd, void
*mqd,
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Added prototype for these functions.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Souptick Joarder <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

amdgpu/pm: set pp_dpm_dcefclk to readonly on NAVI10 and newer gpus

v2 : change condition to apply to all chips after NAVI10

Writing to dcefclk causes the gpu to become unresponsive, and requires a reboot.
Patch prevents user from successfully writing to file pp_dpm_dcefclk on parts
NAVI10 and newer, and gives better user feedback that this operation is not allowed.

Signed-off-by: Darren Powell <[email protected]>
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

amdgpu/pm: Prevent force of DCEFCLK on NAVI10 and SIENNA_CICHLID

Writing to dcefclk causes the gpu to become unresponsive, and requires a reboot.
Patch ignores a .force_clk_levels(SMU_DCEFCLK) call and issues an
info message.

Signed-off-by: Darren Powell <[email protected]>
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

amdgpu/pm: add extra info to SMU msg pre-check failed message

Insert the value of the response to error message emitted when the
SMU msg pre-check failes

Signed-off-by: Darren Powell <[email protected]>
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: Update energy_accumulator in gpu metrics

Now that it is available.

v2: squash in typecast fix (Alex)

Signed-off-by: Harish Kasiviswanathan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/dc: Fix a missing check bug in dm_dp_mst_detect()

In dm_dp_mst_detect(), We should check whether or not @connector
has been unregistered from userspace. If the connector is unregistered,
we should return disconnected status.

Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)")
Signed-off-by: Yingjie Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Reject non-zero src_y and src_x for video planes

[Why]
This hasn't been well tested and leads to complete system hangs on DCN1
based systems, possibly others.

The system hang can be reproduced by gesturing the video on the YouTube
Android app on ChromeOS into full screen.

[How]
Reject atomic commits with non-zero drm_plane_state.src_x or src_y values.

v2:
- Add code comment describing the reason we're rejecting non-zero
src_x and src_y
- Drop gerrit Change-Id
- Add stable CC
- Based on amd-staging-drm-next

v3: removed trailing whitespace

Signed-off-by: Harry Wentland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Nicholas Kazlauskas <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Hersen Wu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix concurrent VM flushes on Vega/Navi v2

Starting with Vega the hardware supports concurrent flushes
of VMID which can be used to implement per process VMID
allocation.

But concurrent flushes are mutual exclusive with back to
back VMID allocations, fix this to avoid a VMID used in
two ways at the same time.

v2: don't set ring to NULL

Signed-off-by: Christian König <[email protected]>
Reviewed-by: James Zhu <[email protected]>
Tested-by: James Zhu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: remove AMDGPU_GEM_CREATE_SHADOW flag

Remove unused AMDGPU_GEM_CREATE_SHADOW flag.
Userspace is never allowed to use this.

Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: cleanup amdgpu_bo_create()

Remove shadow bo related code as vm code is creating shadow bo
using proper API. Without shadow bo code, amdgpu_bo_create() is basically
a wrapper around amdgpu_bo_do_create(). So rename amdgpu_bo_do_create()
to amdgpu_bo_create().

Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: create shadow bo using amdgpu_bo_create_shadow()

Shadow BOs are only needed for vm code so call amdgpu_bo_create_shadow()
directly instead of depending on amdgpu_bo_create().

Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: remove unused vm context flags

Remove unused AMDGPU_VM_CONTEXT_GFX and AMDGPU_VM_CONTEXT_COMPUTE
flags.

Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: cleanup amdgpu_vm_init()

Currently only way to create compute vm is through
amdgpu_vm_make_compute(). So vm_context isn't required
anymore for amdgpu_vm_init().

Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: expose amdgpu_bo_create_shadow()

Exposed amdgpu_bo_create_shadow() will be needed
for amdgpu_vm handling.

Signed-off-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix coding style and documentation in amdgpu_vram_mgr.c

No functional changes, just cleaning up some leftovers and improve
documentation.

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix coding style and documentation in amdgpu_gtt_mgr.c

Avoid the forward define, fix coding style, add documentation.

No functional change.

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Nirmoy Das <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: remove redundant initialization to variable r

The variable r is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: fix uint32 variable compared to less than zero

Currently the call to kfd_process_gpuidx_from_gpuid is returning an
int value and this is being assigned to a uint32_t variable gpuidx
and this is being checked for a negative error return which is always
going to be false. Fix this by making gpuidx an int32_t. This makes
gpuidx also type consistent with the use of gpuidx from the callers.

Addresses-Coverity: ("Unsigned compared against 0")
Fixes: cda0f85bfa5e ("drm/amdkfd: refine migration policy with xnack on")
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: set attribute access for default ranges

Attribute access value for default ranges is set, based on
process xnack on/off.
XNACK ON has GPU access attribute for unregistered ranges through page
fault. While XNACK OFF has no access attribute for unregistered ranges.

Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: svm ranges creation for unregistered memory

SVM ranges are created for unregistered memory, triggered
by page faults. These ranges are migrated/mapped to
GPU VRAM memory.

Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: extend xnack limit page fault timeout

Extending this timeout will prevent IH from storm interrupts coming
from SDMA while a page fault is active. Currently, on Aldebaran,
handling that many interrupts can take a lot of CPU time
(up to 4 seconds).
This eventually causes timeouts in other tasks.

Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: disable gfx ras by default in aldebaran

aldebaran gfx ras is still under development

Signed-off-by: Hawking Zhang <[email protected]>
Reviewed-by: Dennis Li <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: add per-vmid-debug map_process_support

In order to support multi-process debugging, HWS PM4 packet
MAP_PROCESS requires an extension of 5 DWORDS to support targeting of
per-vmid SPI debug control registers as well as watch points per process.

Signed-off-by: Jonathan Kim <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/amdgpu: add cgls

enable cgls to improve the runtime power efficiency.

Signed-off-by: Kenneth Feng <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: optimize gfx ras features flag clean

Signed-off-by: Stanley.Yang <[email protected]>
Reviewed-by: Feifei Xu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Enable SDMA MGCG for Vangogh

Add flags AMD_CG_SUPPORT_SDMA_MGCG for Vangogh.
Start to open sdma mgcg from firmware version 70.

Signed-off-by: Jinzhou Su <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add support for ras init flags

conditionally configure ras for dgpu mode or poison propogation mode

Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: John Clements <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: refine gprs init shaders to check coverage

Add codes to check whether all SIMDs are covered, make sure that all
GPRs are initialized.

Signed-off-by: Dennis Li <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/amdgpu/amdgpu_cs: Repair some function naming disparity

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:685: warning: expecting prototype for cs_parser_fini(). Prototype was for amdgpu_cs_parser_fini() instead
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1502: warning: expecting prototype for amdgpu_cs_wait_all_fence(). Prototype was for amdgpu_cs_wait_all_fences() instead
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1656: warning: expecting prototype for amdgpu_cs_find_bo_va(). Prototype was for amdgpu_cs_find_mapping() instead

Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/amdgpu/amdgpu_ring: Provide description for 'sched_score'

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:169: warning: Function parameter or member 'sched_score' not described in 'amdgpu_ring_init'

Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/amdgpu/amdgpu_ttm: Fix incorrectly documented function 'amdgpu_ttm_copy_mem_to_mem()'

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:311: warning: expecting prototype for amdgpu_copy_ttm_mem_to_mem(). Prototype was for amdgpu_ttm_copy_mem_to_mem() instead

Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/amdgpu/amdgpu_gart: Correct a couple of function names in the docs

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:73: warning: expecting prototype for amdgpu_dummy_page_init(). Prototype was for amdgpu_gart_dummy_page_init() instead
drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:96: warning: expecting prototype for amdgpu_dummy_page_fini(). Prototype was for amdgpu_gart_dummy_page_fini() instead

Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Nirmoy Das <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Nirmoy Das <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/amdgpu/amdgpu_fence: Provide description for 'sched_score'

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:444: warning: Function parameter or member 'sched_score' not described in 'amdgpu_fence_driver_init_ring'

Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Jerome Glisse <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/radeon/radeon_device: Provide function name in kernel-doc header

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/radeon/radeon_device.c:1101: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst

Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/amdgpu/amdgpu_device: Remove unused variable 'r'

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_suspend’:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3733:6: warning: variable ‘r’ set but not used [-Wunused-but-set-variable]

Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Lee Jones <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: Add CONFIG_HSA_AMD_SVM

Control whether to build SVM support into amdgpu with a Kconfig option.
This makes it easier to disable it in production kernels if this new
feature causes problems in production environments.

Use "depends on" instead of "select" for DEVICE_PRIVATE, as is
recommended for visible options.

Reviewed-by: Philip Yang <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: Add SVM API support capability bits

SVMAPISupported property added to HSA_CAPABILITY, the value match
HSA_CAPABILITY defined in Thunk spec:

SVMAPISupported: it will not be supported on older kernels that don't
have HMM or on systems with GFXv8 or older GPUs without support for
48-bit virtual addresses.

CoherentHostAccess property added to HSA_MEMORYPROPERTY, the value match
HSA_MEMORYPROPERTY defined in Thunk spec:

CoherentHostAccess: whether or not device memory can be coherently
accessed by the host CPU.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: multiple gpu migrate vram to vram

If prefetch range to gpu with acutal location is another gpu, or GPU
retry fault restore pages to migrate the range with acutal location is
gpu, then migrate from one gpu to another gpu.

Use system memory as bridge because sdma engine may not able to access
another gpu vram, use sdma of source gpu to migrate to system memory,
then use sdma of destination gpu to migrate from system memory to gpu.

Print out gpuid or gpuidx in debug messages.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: add svm range validate timestamp

With xnack on, add validate timestamp in order to handle GPU vm fault
from multiple GPUs.

If GPU retry fault need migrate the range to the best restore location,
use range validate timestamp to record system timestamp after range is
restored to update GPU page table.

Because multiple pages of same range have multiple retry fault, define
AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING to the long time period that
pending retry fault may still comes after page table update, to skip
duplicate retry fault of same range.

If difference between system timestamp and range last validate timestamp
is bigger than AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING, that means the
retry fault is from another GPU, then continue to handle retry fault
recover.

Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: refine migration policy with xnack on

With xnack on, GPU vm fault handler decide the best restore location,
then migrate range to the best restore location and update GPU mapping
to recover the GPU vm fault.

Signed-off-by: Philip Yang <[email protected]>
Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add svm_bo eviction to enable_signal cb

Add to amdgpu_amdkfd_fence.enable_signal callback, support
for svm_bo fence eviction.

Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: svm bo enable_signal call condition

[why]
To support svm bo eviction mechanism.

[how]
If the BO crated has AMDGPU_AMDKFD_CREATE_SVM_BO flag set,
enable_signal callback will be called inside amdgpu_evict_flags.
This also causes gutting of the BO by removing all placements,
so that TTM won't actually do an eviction. Instead it will discard
the memory held by the BO. This is needed for HMM migration to user
mode system memory pages.

Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: add svm_bo eviction mechanism support

svm_bo eviction mechanism is different from regular BOs.
Every SVM_BO created contains one eviction fence and one
worker item for eviction process.
SVM_BOs can be attached to one or more pranges.
For SVM_BO eviction mechanism, TTM will start to call
enable_signal callback for every SVM_BO until VRAM space
is available.
Here, all the ttm_evict calls are synchronous, this guarantees
that each eviction has completed and the fence has signaled before
it returns.

Signed-off-by: Alex Sierra <[email protected]>
Signed-off-by: Philip Yang <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add param bit flag to create SVM BOs

Add CREATE_SVM_BO define bit for SVM BOs.
Another define flag was moved to concentrate these
KFD type flags in one include file.

Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: add svm_bo reference for eviction fence

[why]
As part of the SVM functionality, the eviction mechanism used for
SVM_BOs is different. This mechanism uses one eviction fence per prange,
instead of one fence per kfd_process.

[how]
A svm_bo reference to amdgpu_amdkfd_fence to allow differentiate between
SVM_BO or regular BO evictions. This also include modifications to set the
reference at the fence creation call.

Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: SVM API call to restore page tables

Use SVM API to restore page tables when retry fault and
compute context are enabled.

Signed-off-by: Alex Sierra <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>