Dan Carpenter [Tue, 11 Jul 2017 19:53:29 +0000 (22:53 +0300)]
drm/amdgpu: Off by one sanity checks
This is just future proofing code, not something that can be triggered
in real life. We're testing to make sure we don't shift wrap when we
do "1ull << i" so "i" has to be in the 0-63 range. If it's 64 then we
have gone too far.
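A minimal sketch of the off-by-one described above, using hypothetical helper names that are not taken from the driver: a bound check written with ">" lets 64 through, while ">=" rejects it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant for illustration: width of the 64-bit mask. */
#define MASK_BITS 64u

/* Off-by-one version: accepts i == 64, where "1ull << i" is undefined. */
static bool index_ok_buggy(unsigned int i)
{
	return !(i > MASK_BITS);
}

/* Corrected version: only 0..63 are valid shift counts for a 64-bit value. */
static bool index_ok_fixed(unsigned int i)
{
	return !(i >= MASK_BITS);
}

/* Returns the single-bit mask for i, or 0 if i is out of range. */
static uint64_t bit_mask(unsigned int i)
{
	return index_ok_fixed(i) ? (1ull << i) : 0;
}
```

Only the comparison operator differs between the two checks, which is why this kind of bug is easy to miss in review.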
Jay Cornwall [Wed, 26 Apr 2017 19:51:57 +0000 (14:51 -0500)]
drm/amdgpu: Send no-retry XNACK for all fault types
A subset of VM fault types currently send retry XNACK to the client.
This causes a storm of interrupts from the VM to the host.
Until the storm is throttled by other means, send no-retry XNACK for
all fault types instead. No change in behavior to the client which
will stall indefinitely with the current configuration in any case.
Improves system stability under GC or MMHUB faults.
Felix Kuehling [Fri, 15 Jul 2016 22:37:05 +0000 (18:37 -0400)]
drm/amdgpu: Make SDMA phase quantum configurable
Set a configurable SDMA phase quantum when enabling SDMA context
switching. The default value significantly reduces SDMA latency
in page table updates when user-mode SDMA queues have concurrent
activity, compared to the initial HW setting.
shaoyunl [Fri, 4 Dec 2015 20:01:22 +0000 (15:01 -0500)]
drm/amdgpu: Enable SDMA_CNTL.ATC_L1_ENABLE for SDMA on CZ
For GFX context, the ATC bit in SDMA*_GFX_VIRTUAL_ADDRESS can be cleared
to perform in VM mode. For RLC context, to support ATC mode, the ATC bit in
SDMA*_RLC*_VIRTUAL_ADDRESS should be set. The SDMA_CNTL.ATC_L1_ENABLE bit is
a global setting that enables the L1-L2 translation for ATC addresses.
Michel Dänzer [Tue, 4 Jul 2017 08:16:42 +0000 (17:16 +0900)]
drm/amdgpu: Try evicting from CPU visible to invisible VRAM first
This gives BOs which haven't been accessed by the CPU since they were
moved to visible VRAM another chance to stay in VRAM when another BO
needs to go to visible VRAM.
This should allow BOs to stay in VRAM longer in some cases.
v2:
* Only do this for BOs which don't have the
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag set.
John Brooks [Wed, 28 Jun 2017 02:33:21 +0000 (22:33 -0400)]
drm/amdgpu: Don't force BOs into visible VRAM for page faults
There is no need for page faults to force BOs into visible VRAM if it's
full, and the time it takes to do so is great enough to cause noticeable
stuttering. Add GTT as a possible placement so that if visible VRAM is
full, page faults move BOs to GTT instead of evicting other BOs from VRAM.
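The fallback idea can be sketched roughly as follows. This is an illustration under assumed, simplified types (enum domain, struct pool and pick_placement are hypothetical, not TTM or amdgpu APIs): the caller lists acceptable placements in order of preference, and the first one with room wins.

```c
#include <stddef.h>

/* Hypothetical memory domains, for illustration only. */
enum domain { DOMAIN_VISIBLE_VRAM = 0, DOMAIN_GTT = 1, DOMAIN_NONE = 2 };

/* Hypothetical per-domain bookkeeping: bytes still free. */
struct pool {
	size_t free_bytes;
};

/*
 * Returns the first allowed domain that can hold bo_size bytes.
 * DOMAIN_NONE means nothing fits, i.e. the caller would have to evict.
 */
static enum domain pick_placement(const enum domain *allowed, size_t n,
				  const struct pool *pools, size_t bo_size)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pools[allowed[i]].free_bytes >= bo_size)
			return allowed[i];
	}
	return DOMAIN_NONE;
}
```

With only visible VRAM in the list, a full visible VRAM leaves eviction as the only option; with GTT appended, the same request resolves to GTT without evicting anything.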
John Brooks [Fri, 30 Jun 2017 15:31:08 +0000 (11:31 -0400)]
drm/amdgpu: Set/clear CPU_ACCESS flag on page fault and move to VRAM
When a BO is moved to VRAM, clear AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED.
This allows it to potentially later move to invisible VRAM if the CPU
does not access it again.
Setting the CPU_ACCESS flag in amdgpu_bo_fault_reserve_notify() also means
that we can remove the loop to restrict lpfn to the end of visible VRAM,
because amdgpu_ttm_placement_init() will do it for us.
v3 [Michel Dänzer]
* Use AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED instead of a new flag
(Christian König)
* Clear flag in amdgpu_bo_move instead of amdgpu_move_ram_vram
(Christian)
* Explicitly mention amdgpu_bo_fault_reserve_notify in amdgpu_bo_move
* Also clear flag in amdgpu_bo_create_restricted
The BO move throttling code is designed to allow VRAM to fill quickly if it
is relatively empty. However, this does not take into account situations
where the visible VRAM is smaller than total VRAM, and total VRAM may not
be close to full but the visible VRAM segment is under pressure. In such
situations, visible VRAM would experience unrestricted swapping and
performance would drop.
Add a separate counter specifically for moves involving visible VRAM, and
check it before moving BOs there.
v2: Only perform calculations for separate counter if visible VRAM is
smaller than total VRAM. (Michel Dänzer)
v3: [Michel Dänzer]
* Use BO's location rather than the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
flag to determine whether to account a move for visible VRAM in most
cases.
* Use a single
if (adev->mc.visible_vram_size < adev->mc.real_vram_size) {
block in amdgpu_cs_get_threshold_for_moves.
Fixes: 95844d20ae02 (drm/amdgpu: throttle buffer migrations at CS using a fixed MBps limit (v2))
Signed-off-by: John Brooks <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Michel Dänzer <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
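A rough sketch of the two-counter accounting described in this message, assuming simplified bookkeeping (struct move_budget and both functions are hypothetical and do not reproduce the driver's code): a move into visible VRAM must fit both the general budget and the separate visible-VRAM budget, and the second budget only applies when visible VRAM is smaller than total VRAM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping, not the driver's actual structures. */
struct move_budget {
	uint64_t total_vram_size;
	uint64_t visible_vram_size;
	uint64_t bytes_left;     /* budget for moves into any VRAM */
	uint64_t bytes_left_vis; /* budget for moves into visible VRAM */
};

/* The separate counter only matters when part of VRAM is CPU-invisible. */
static bool has_separate_visible_budget(const struct move_budget *b)
{
	return b->visible_vram_size < b->total_vram_size;
}

/* Charges the move against the budgets; returns true if it may go ahead. */
static bool try_account_move(struct move_budget *b, uint64_t size,
			     bool to_visible)
{
	if (size > b->bytes_left)
		return false;
	if (to_visible && has_separate_visible_budget(b)) {
		if (size > b->bytes_left_vis)
			return false;
		b->bytes_left_vis -= size;
	}
	b->bytes_left -= size;
	return true;
}
```

Under this scheme a move can be refused for visible VRAM while a move of the same size into invisible VRAM is still allowed.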
drm/amdgpu: set firmware loading type as direct by default for raven
Previously, the driver could not enable psp via the kernel parameter for raven.
We should open this path and set the loading type to direct by default until
psp firmware loading works.
Shaoyun Liu [Wed, 5 Jul 2017 14:56:14 +0000 (10:56 -0400)]
drm/amdgpu: NO KIQ usage on nbio hdp flush routine
The nbio hdp flush routine is called within atomic context.
Avoid using KIQ when writing to the HDP_MEM_COHERENCY_FLUSH_CNTL register,
since this register has its own VF copy.
Monk Liu [Tue, 6 Jun 2017 09:25:13 +0000 (17:25 +0800)]
drm/amdgpu:fix world switch hang
For SR-IOV, we must keep the pipeline-sync under the protection
of COND_EXEC, otherwise the commands consumed by CPG are not
consistent when a world switch is triggered, e.g.:
a world switch hits and the IB frame is skipped, so the fence
won't signal; CP then jumps to the next DMA frame's pipeline-sync
command, which makes CP hang forever.
After pipeline-sync is moved into COND_EXEC, consistency is
guaranteed.
ttm_place are not supposed to change at runtime. All functions
working with ttm_place provided by <drm/ttm/ttm_placement.h> work
with const ttm_place. So mark the non-const structs as const.
drm_prop_enum_lists are not supposed to change at runtime. All functions
working with drm_prop_enum_list provided by <drm/drm_property.h> work with
const drm_prop_enum_list. So mark the non-const structs as const.
File size before:
   text    data     bss     dec     hex filename
  18276     384       0   18660    48e4 drivers/gpu/drm/radeon/radeon_display.o
File size after adding 'const':
   text    data     bss     dec     hex filename
  18660       0       0   18660    48e4 drivers/gpu/drm/radeon/radeon_display.o
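The size shift from data to text reflects the table moving into read-only storage. A minimal sketch of the pattern, using a hypothetical stand-in struct rather than the real drm_prop_enum_list definition:

```c
#include <stddef.h>

/* Hypothetical stand-in for an enum-name table entry. */
struct enum_name {
	int type;
	const char *name;
};

/*
 * Declaring the table const lets the toolchain place it in read-only
 * data instead of the writable .data section.
 */
static const struct enum_name scaler_names[] = {
	{ 0, "off" },
	{ 1, "full" },
	{ 2, "center" },
};

/* Consumers only read the table, so a pointer to const is enough. */
static const char *lookup_name(const struct enum_name *list, size_t n,
			       int type)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (list[i].type == type)
			return list[i].name;
	}
	return NULL;
}
```

Because the lookup takes a pointer to const, the table can be made const without touching any caller.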
ttm_place are not supposed to change at runtime. All functions
working with ttm_place provided by <drm/ttm/ttm_placement.h> work
with const ttm_place. So mark the non-const structs as const.
File size before:
   text    data     bss     dec     hex filename
   9235     344     136    9715    25f3 drivers/gpu/drm/radeon/radeon_ttm.o
File size after adding 'const':
   text    data     bss     dec     hex filename
   9267     312     136    9715    25f3 drivers/gpu/drm/radeon/radeon_ttm.o
ozeng [Tue, 27 Jun 2017 19:45:18 +0000 (14:45 -0500)]
drm/amdgpu: Changed CU reservation golden settings
With the previous golden settings, compute tasks can't use the
reserved LDS (32K) on CU0 and CU1. On a 64K LDS system,
if a compute work group allocates more than 32K of LDS, it
can't be dispatched to CU0 and CU1 because of the
reservation. This change enables compute tasks to use the
reserved LDS on CU0 and CU1.
Gavin Wan [Fri, 23 Jun 2017 17:55:15 +0000 (13:55 -0400)]
drm/amdgpu: Support passing amdgpu critical error to host via GPU Mailbox.
This feature works in an SRIOV environment. In a non-SRIOV environment, the
trans_error function does nothing.
The error information includes error_code (16 bit), error_flags (16 bit)
and error_data (64 bit). Since there are not many errors, we keep the
errors in an array and transfer all of them to the host before the amdgpu
initialization function (amdgpu_device_init) exits.
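The record layout and the "collect in an array, then transfer" approach might look roughly like the sketch below. Only the three field widths come from the message above; the type names, the array capacity and both functions are assumptions made for illustration, and the mailbox transfer is modeled as a copy into a caller-supplied buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Field widths follow the commit message; the names are hypothetical. */
struct err_record {
	uint16_t code;
	uint16_t flags;
	uint64_t data;
};

#define MAX_ERRORS 16 /* assumed capacity, not stated in the source */

struct err_log {
	struct err_record rec[MAX_ERRORS];
	size_t count;
};

/* Queues one error; returns 0 on success, -1 if the array is full. */
static int err_log_put(struct err_log *log, uint16_t code, uint16_t flags,
		       uint64_t data)
{
	if (log->count >= MAX_ERRORS)
		return -1;
	log->rec[log->count].code = code;
	log->rec[log->count].flags = flags;
	log->rec[log->count].data = data;
	log->count++;
	return 0;
}

/* Copies up to out_len queued records into out; returns how many. */
static size_t err_log_drain(struct err_log *log, struct err_record *out,
			    size_t out_len)
{
	size_t n = log->count < out_len ? log->count : out_len;
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = log->rec[i];
	/* Keep any records that did not fit, moved to the front. */
	for (i = n; i < log->count; i++)
		log->rec[i - n] = log->rec[i];
	log->count -= n;
	return n;
}
```

A fixed-size array fits here because, as the message notes, the number of errors is expected to be small.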
Mario Kleiner [Wed, 21 Jun 2017 01:44:56 +0000 (03:44 +0200)]
drm/amdgpu: Allow vblank_disable_immediate.
With instantaneous high precision vblank timestamping
that updates at leading edge of vblank, a cooked hw
vblank counter which increments at leading edge of
vblank, and reliable page flip execution and completion
at leading edge of vblank, we should meet the requirements
for fast/immediate vblank irq disable/enable.
Testing on Linux-4.12-rc5 + drm-next on a Radeon R9 380
Tonga Pro (DCE 10) with timing measurement equipment
indicates this works fine, so allow immediate vblank
disable for power saving.
For debugging in case of unexpected trouble, booting
with kernel cmdline option drm.vblankoffdelay=0
(or echo 0 > /sys/module/drm/parameters/vblankoffdelay)
would keep vblank irqs permanently on to approximate old
behavior.
Mario Kleiner [Wed, 21 Jun 2017 01:44:55 +0000 (03:44 +0200)]
drm/radeon: Allow vblank_disable_immediate.
With instantaneous high precision vblank timestamping
that updates at leading edge of vblank, a cooked hw
vblank counter which increments at leading edge of
vblank, and reliable page flip execution and completion
at leading edge of vblank, we should meet the requirements
for fast/immediate vblank irq disable/enable.
Testing on Linux-4.12-rc5 + drm-next on a Radeon HD 5770
(DCE 4) with timing measurement equipment indicates this
works fine, so allow immediate vblank disable for power
saving.
For debugging in case of unexpected trouble, booting
with kernel cmdline option drm.vblankoffdelay=0
(or echo 0 > /sys/module/drm/parameters/vblankoffdelay)
would keep vblank irqs permanently on to approximate old
behavior.
Alex Deucher [Tue, 1 Nov 2016 17:15:29 +0000 (13:15 -0400)]
drm/amdgpu/gmc6: use the vram location programmed by the vbios
This makes mc programming much simpler in future patches.
Since evergreen, the vbios has been programming the fb location
to the proper vram size. The only reason to reprogram it would
be to change the location.
Alex Deucher [Tue, 1 Nov 2016 17:14:45 +0000 (13:14 -0400)]
drm/amdgpu/gmc7: use the vram location programmed by the vbios
This makes mc programming much simpler in future patches.
Since evergreen, the vbios has been programming the fb location
to the proper vram size. The only reason to reprogram it would
be to change the location.
Alex Deucher [Tue, 1 Nov 2016 17:08:33 +0000 (13:08 -0400)]
drm/amdgpu/gmc8: use the vram location programmed by the vbios
This makes mc programming much simpler in future patches.
Since evergreen, the vbios has been programming the fb location
to the proper vram size. The only reason to reprogram it would
be to change the location.
Alex Deucher [Mon, 19 Jun 2017 21:00:38 +0000 (17:00 -0400)]
drm/amdgpu: disable vga render in dce hw_init
This got dropped accidentally with the fb location changes, but for
some reason, this doesn't seem to cause an issue on all cards which
is why I never saw it despite extensive testing. I suspect it may
only be an issue on systems with a legacy sbios that enables vga.
Dave Airlie [Thu, 13 Jul 2017 01:22:34 +0000 (11:22 +1000)]
Merge tag 'drm-misc-next-fixes-2017-07-10' of git://anongit.freedesktop.org/git/drm-misc into drm-next
Core Changes:
- Fix empty timestamps on hw without vblank counter (Laurent)
- Clear atomic state before retrying ww/mutex acquisition in remove_fb (Maarten)