Maxime Ripard [Wed, 25 Oct 2023 13:24:27 +0000 (15:24 +0200)]
drm/tests: Remove slow tests
Both the drm_buddy and drm_mm tests have been converted from selftest to
kunit recently.
However, a significant portion of them are trying to exert some part of
their API over a huge number of iterations and with random variations of
their parameters. They are thus more a way to discover new bugs than
actual unit tests.
This is fine in itself but leads to very slow runtime (up to 25s for
some drm_test_mm_reserve and drm_test_mm_insert on a Ryzen 7950x while
running the tests in qemu) which make them a poor fit for kunit.
Let's remove those tests from the drm_mm and drm_buddy test suites for
now, and if it's ever needed we can always create proper unit tests for
them later on.
This made the entire DRM tests execution time (as of v6.6-rc1) come from
65s to less than .5s on a Ryzen 7950x system when running under qemu,
and from 9 minutes to about 4s on a RaspberryPi4.
Usages of DRM_IVPU_BO_UNCACHED should be replaced by DRM_IVPU_BO_WC.
There is no functional benefit from DRM_IVPU_BO_UNCACHED if these
buffers are never mapped to host VM.
This allows to cut the buffer handling code in the kernel driver
by half.
Usage of DRM_IVPU_BO_UNCACHED buffers was removed from user-space
driver and will not be part of first UMD release.
accel/ivpu: Allocate vpu_addr in gem->open() callback
Use gem->open() callback to simplify the code and prepare for gem_shmem
conversion. It is called during handle creation for a gem object,
during prime import and in BO_CREATE ioctl. Hence can be used for vpu_addr
allocation. On the way remove unused bo->user_ptr field.
Luben Tuikov [Thu, 2 Nov 2023 22:17:20 +0000 (18:17 -0400)]
drm/sched: Don't disturb the entity when in RR-mode scheduling
Don't call drm_sched_select_entity() in drm_sched_run_job_queue(). In fact,
rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
it do just that, schedule the work item for execution.
The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
to determine if the scheduler has an entity ready in one of its run-queues,
and in the case of the Round-Robin (RR) scheduling, the function
drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
which is ready, sets up the run-queue and completion and returns that
entity. The FIFO scheduling algorithm is unaffected.
Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
in the case of RR scheduling, that would result in drm_sched_select_entity()
having been called twice, which may result in skipping a ready entity if more
than one entity is ready. This commit fixes this by eliminating the call to
drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
in drm_sched_run_job_work().
v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
Add fixes-tag. (Tvrtko)
drm/v3d: Expose the total GPU usage stats on sysfs
The previous patch exposed the accumulated amount of active time per
client for each V3D queue. But this doesn't provide a global notion of
the GPU usage.
Therefore, provide the accumulated amount of active time for each V3D
queue (BIN, RENDER, CSD, TFU and CACHE_CLEAN), considering all the jobs
submitted to the queue, independent of the client.
This data is exposed through the sysfs interface, so that if the
interface is queried at two different points of time the usage percentage
of each of the queues can be calculated.
drm/v3d: Implement show_fdinfo() callback for GPU usage stats
This patch exposes the accumulated amount of active time per client
through the fdinfo infrastructure. The amount of active time is exposed
for each V3D queue: BIN, RENDER, CSD, TFU and CACHE_CLEAN.
In order to calculate the amount of active time per client, a CPU clock
is used through the function local_clock(). The point where the jobs has
started is marked and is finally compared with the time that the job had
finished.
Moreover, the number of jobs submitted to each queue is also exposed on
fdinfo through the identifier "v3d-jobs-<queue>".
drm: Do not round to megabytes for greater than 1MiB sizes in fdinfo stats
It is better not to lose precision and not revert to 1 MiB size
granularity for every size greater than 1 MiB.
Sizes in KiB should not be so troublesome to read (and in fact machine
parsing is I expect the norm here), they align with other api like
/proc/meminfo, and they allow writing tests for the interface without
having to embed drm.ko implementation knowledge into them. (Like knowing
that minimum buffer size one can use for successful verification has to be
1MiB aligned, and on top account for any pre-existing memory utilisation
outside of driver's control.)
But probably even more importantly I think that it is just better to show
the accurate sizes and not arbitrary lose precision for a little bit of a
stretched use case of eyeballing fdinfo text directly.
Tvrtko Ursulin [Thu, 2 Nov 2023 10:55:38 +0000 (10:55 +0000)]
drm/sched: Drop suffix from drm_sched_wakeup_if_can_queue
Because a) helper is exported to other parts of the scheduler and
b) there isn't a plain drm_sched_wakeup to begin with, I think we can
drop the suffix and by doing so separate the intimiate knowledge
between the scheduler components a bit better.
Tvrtko Ursulin [Thu, 2 Nov 2023 10:55:37 +0000 (10:55 +0000)]
drm/sched: Rename drm_sched_run_job_queue_if_ready and clarify kerneldoc
"If ready" is not immediately clear what it means - is the scheduler
ready or something else? Drop the suffix, clarify kerneldoc, and employ
the same naming scheme as in drm_sched_run_free_queue:
- drm_sched_run_job_queue - enqueues if there is something to enqueue
*and* scheduler is ready (can queue)
- __drm_sched_run_job_queue - low-level helper to simply queue the job
Tvrtko Ursulin [Thu, 2 Nov 2023 10:55:36 +0000 (10:55 +0000)]
drm/sched: Rename drm_sched_free_job_queue to be more descriptive
The current name makes it sound like helper will free a queue, while what
it does is it enqueues the free job worker.
Rename it to drm_sched_run_free_queue to align with existing
drm_sched_run_job_queue.
Despite that creating an illusion there are two queues, while in reality
there is only one, at least it creates a consistent naming for the two
enqueuing helpers.
At the same time simplify the "if done" helper by dropping the suffix and
adding a double underscore prefix to the one which just enqueues.
Tvrtko Ursulin [Thu, 2 Nov 2023 10:55:35 +0000 (10:55 +0000)]
drm/sched: Move free worker re-queuing out of the if block
Whether or not there are more jobs to clean up does not depend on the
existance of the current job, given both drm_sched_get_finished_job and
drm_sched_free_job_queue_if_done take and drop the job list lock.
Therefore it is confusing to make it read like there is a dependency.
Tvrtko Ursulin [Thu, 2 Nov 2023 10:55:34 +0000 (10:55 +0000)]
drm/sched: Rename drm_sched_get_cleanup_job to be more descriptive
"Get cleanup job" makes it sound like helper is returning a job which will
execute some cleanup, or something, while the kerneldoc itself accurately
says "fetch the next _finished_ job". So lets rename the helper to be self
documenting.
accel/qaic: Support for 0 resize slice execution in BO
Add support to partially execute a slice which is resized to zero.
Executing a zero size slice in a BO should mean that there is no DMA
transfers involved but you should still configure doorbell and semaphores.
For example consider a BO of size 18K and it is sliced into 3 6K slices
and user calls partial execute ioctl with resize as 10K.
slice 0 - size is 6k and offset is 0, so resize of 10K will not cut short
this slice hence we send the entire slice for execution.
slice 1 - size is 6k and offset is 6k, so resize of 10K will cut short this
slice and only the first 4k should be DMA along with configuring
doorbell and semaphores.
slice 2 - size is 6k and offset is 12k, so resize of 10k will cut short
this slice and no DMA transfer would be involved but we should
would configure doorbell and semaphores.
This change begs to change the behavior of 0 resize. Currently, 0 resize
partial execute ioctl behaves exactly like execute ioctl i.e. no resize.
After this patch all the slice in BO should behave exactly like slice 2 in
above example.
Refactor copy_partial_exec_reqs() to make it more readable and less
complex.
Carl Vanderlip [Fri, 27 Oct 2023 18:08:10 +0000 (12:08 -0600)]
accel/qaic: Quiet array bounds check on DMA abort message
Current wrapper is right-sized to the message being transferred;
however, this is smaller than the structure defining message wrappers
since the trailing element is a union of message/transfer headers of
various sizes (8 and 32 bytes on 32-bit system where issue was
reported). Using the smaller header with a small message
(wire_trans_dma_xfer is 24 bytes including header) ends up being smaller
than a wrapper with the larger header. There are no accesses outside of
the defined size, however they are possible if the larger union member
is referenced.
Abort messages are outside of hot-path and changing the wrapper struct
would require a larger rewrite, so having the memory allocated to the
message be 8 bytes too big is acceptable.
This patch updates a number of register addresses that have
been changed in Raspberry Pi 5 (V3D 7.1) and updates the
code to use the corresponding registers and addresses based
on the actual V3D version.
Thierry Reding [Wed, 1 Nov 2023 17:20:17 +0000 (18:20 +0100)]
fbdev/simplefb: Add support for generic power-domains
The simple-framebuffer device tree bindings document the power-domains
property, so make sure that simplefb supports it. This ensures that the
power domains remain enabled as long as simplefb is active.
v2: - remove unnecessary call to simplefb_detach_genpds() since that's
already done automatically by devres
- fix crash if power-domains property is missing in DT
Thierry Reding [Wed, 1 Nov 2023 17:20:16 +0000 (18:20 +0100)]
fbdev/simplefb: Support memory-region property
The simple-framebuffer bindings specify that the "memory-region"
property can be used as an alternative to the "reg" property to define
the framebuffer memory used by the display hardware. Implement support
for this in the simplefb driver.
Matthew Brost [Tue, 31 Oct 2023 03:24:37 +0000 (20:24 -0700)]
drm/sched: Split free_job into own work item
Rather than call free_job and run_job in same work item have a dedicated
work item for each. This aligns with the design and intended use of work
queues.
v2:
- Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
timestamp in free_job() work item (Danilo)
v3:
- Drop forward dec of drm_sched_select_entity (Boris)
- Return in drm_sched_run_job_work if entity NULL (Boris)
v4:
- Replace dequeue with peek and invert logic (Luben)
- Wrap to 100 lines (Luben)
- Update comments for *_queue / *_queue_if_ready functions (Luben)
v5:
- Drop peek argument, blindly reinit idle (Luben)
- s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben)
- Update work_run_job & work_free_job kernel doc (Luben)
v6:
- Do not move drm_sched_select_entity in file (Luben)
Matthew Brost [Tue, 31 Oct 2023 03:24:36 +0000 (20:24 -0700)]
drm/sched: Convert drm scheduler to use a work queue rather than kthread
In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd but let us explain the reasoning below.
1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to be the same completion even if targeting the same hardware
engine. This is because in Xe we have a firmware scheduler, the GuC,
which allowed to reorder, timeslice, and preempt submissions. If a using
shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
apart as the TDR expects submission order == completion order. Using a
dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.
2. In Xe submissions are done via programming a ring buffer (circular
buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the
limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow
control on the ring for free.
A problem with this design is currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_scheduler are used. To work around the scaling issue,
use a worker rather than kthread for submission / job cleanup.
v2:
- (Rob Clark) Fix msm build
- Pass in run work queue
v3:
- (Boris) don't have loop in worker
v4:
- (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
- (Boris) default to ordered work queue
v6:
- (Luben / checkpatch) fix alignment in msm_ringbuffer.c
- (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
- (Luben) Update comment for drm_sched_wqueue_enqueue
- (Luben) Positive check for submit_wq in drm_sched_init
- (Luben) s/alloc_submit_wq/own_submit_wq
v7:
- (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
- (Luben) Adjust var names / comments
Matthew Brost [Tue, 31 Oct 2023 03:24:35 +0000 (20:24 -0700)]
drm/sched: Add drm_sched_wqueue_* helpers
Add scheduler wqueue ready, stop, and start helpers to hide the
implementation details of the scheduler from the drivers.
v2:
- s/sched_wqueue/sched_wqueue (Luben)
- Remove the extra white line after the return-statement (Luben)
- update drm_sched_wqueue_ready comment (Luben)
Christian König [Fri, 8 Sep 2023 08:27:23 +0000 (10:27 +0200)]
dma-buf: add dma_fence_timestamp helper
When a fence signals there is a very small race window where the timestamp
isn't updated yet. sync_file solves this by busy waiting for the
timestamp to appear, but on other ocassions didn't handled this
correctly.
Provide a dma_fence_timestamp() helper function for this and use it in
all appropriate cases.
Another alternative would be to grab the spinlock when that happens.
v2 by teddy: add a wait parameter to wait for the timestamp to show up, in case
the accurate timestamp is needed and/or the timestamp is not based on
ktime (e.g. hw timestamp)
v3 chk: drop the parameter again for unified handling
VPU was rebranded as NPU (Neural Processing Unit) so user facing
strings have to be updated but the code remains as is and the module
is still called intel_vpu.ko.
Karol Wachowski [Sat, 28 Oct 2023 15:59:34 +0000 (17:59 +0200)]
accel/ivpu: Make DMA allocations for MMU600 write combined
Previously using dma_alloc_wc() API we created cache coherent
(mapped as write-back) mappings.
Because we disable MMU600 snooping it was required to do costly
page walk and cache flushes after each page table modification.
With write-combined buffers it's possible to do a single write memory
barrier to flush write-combined buffer to memory which simplifies the
driver and significantly reduce time of map/unmap operations.
Mapping time of 255 MB is reduced from 2.5 ms to 500 us.
Waking up process, which wait for particular condition, will go to
sleep again on wake_up() if the condition is not met. Add abort flag
to wake up IPC receivers, which will finish with -ECANCELED error.
This is only needed for reset, run time power management prevent to
suspend VPU when there is pending IPC processing or pending job.
Stop job_done thread when going to suspend. Use kthread_park() instead
of kthread_stop() to avoid memory allocation and potential failure
on resume.
Use separate function as thread wake up condition. Use spin lock to assure
rx_msg_list is properly protected against concurrent access. This avoid
race condition when the rx_msg_list list is modified and read in
ivpu_ipc_recive() at the same time.
accel/ivpu/40xx: Allow to change profiling frequency
Profiling freq is a debug firmware feature. It switches default clock
to higher resolution for fine-grained and more accurate firmware task
profiling. We already configure it during boot up of VPU4.
Add debugfs knob and helpers per HW generation that allow to change it.
For vpu37xx the implementation is empty as profiling frequency can only
be changed on VPU4 or newer.
The DMA-fence annotations cause a lockdep warning (see below). As per
https://patchwork.freedesktop.org/patch/462170/ it sounds like the
annotations don't work correctly.
======================================================
WARNING: possible circular locking dependency detected
6.5.0-rc2+ #2 Not tainted
------------------------------------------------------
kmstest/219 is trying to acquire lock: c4705838 (&hdmi->lock){+.+.}-{3:3}, at: hdmi5_bridge_mode_set+0x1c/0x50
but task is already holding lock: c11e1128 (dma_fence_map){++++}-{0:0}, at: omap_atomic_commit_tail+0x14/0xbc
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
3 locks held by kmstest/219:
#0: f1011de4 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_mode_atomic_ioctl+0xf0/0xc38
#1: c47059c8 (crtc_ww_class_mutex){+.+.}-{3:3}, at: modeset_lock+0xf8/0x230
#2: c11e1128 (dma_fence_map){++++}-{0:0}, at: omap_atomic_commit_tail+0x14/0xbc
stack backtrace:
CPU: 1 PID: 219 Comm: kmstest Not tainted 6.5.0-rc2+ #2
Hardware name: Generic DRA74X (Flattened Device Tree)
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x58/0x70
dump_stack_lvl from check_noncircular+0x164/0x198
check_noncircular from __lock_acquire+0x145c/0x29cc
__lock_acquire from lock_acquire.part.0+0xb4/0x258
lock_acquire.part.0 from __mutex_lock+0x90/0x950
__mutex_lock from mutex_lock_nested+0x1c/0x24
mutex_lock_nested from hdmi5_bridge_mode_set+0x1c/0x50
hdmi5_bridge_mode_set from drm_bridge_chain_mode_set+0x48/0x5c
drm_bridge_chain_mode_set from crtc_set_mode+0x188/0x1d0
crtc_set_mode from omap_atomic_commit_tail+0x2c/0xbc
omap_atomic_commit_tail from commit_tail+0x9c/0x188
commit_tail from drm_atomic_helper_commit+0x158/0x18c
drm_atomic_helper_commit from drm_atomic_commit+0xa4/0xe8
drm_atomic_commit from drm_mode_atomic_ioctl+0x9a4/0xc38
drm_mode_atomic_ioctl from drm_ioctl+0x210/0x4a8
drm_ioctl from sys_ioctl+0x138/0xf00
sys_ioctl from ret_fast_syscall+0x0/0x1c
Exception stack(0xf1011fa8 to 0xf1011ff0)
1fa0: 00466d58be9ab51000000003c03864bcbe9ab510be9ab4e0
1fc0: 00466d58be9ab510c03864bc0000003600466ef000466fc00046702000466f20
1fe0: b6bc7ef4be9ab4d0b6bbbb00b6cb2cc0
The DMA-fence annotations cause a lockdep warning (see below). As per
https://patchwork.freedesktop.org/patch/462170/ it sounds like the
annotations don't work correctly.
======================================================
WARNING: possible circular locking dependency detected
6.6.0-rc2+ #1 Not tainted
------------------------------------------------------
kmstest/733 is trying to acquire lock: ffff8000819377f0 (fs_reclaim){+.+.}-{0:0}, at: __kmem_cache_alloc_node+0x58/0x2d4
but task is already holding lock: ffff800081a06aa0 (dma_fence_map){++++}-{0:0}, at: tidss_atomic_commit_tail+0x20/0xc0 [tidss]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
Steven Price [Fri, 20 Oct 2023 10:44:05 +0000 (11:44 +0100)]
drm/panfrost: Remove incorrect IS_ERR() check
sg_page_iter_page() doesn't return an error code, so the IS_ERR() check
is wrong and the error path will never be executed. This also allows
simplifying the code to remove the local variable 'page'.
Maíra Canal [Mon, 23 Oct 2023 10:58:33 +0000 (07:58 -0300)]
drm/v3d: wait for all jobs to finish before unregistering
Currently, we are only warning the user if the BIN or RENDER jobs don't
finish before we unregister V3D. We must wait for all jobs to finish
before unregistering. Therefore, warn the user if TFU or CSD jobs
are not done by the time the driver is unregistered.
accel/ivpu: Add support for delayed D0i3 entry message
Currently the VPU firmware prepares for D0i3 every time the VPU
is entering D0i2 Idle state. This is not optimal as we might not
enter D0i3 every time we enter D0i2 Idle and this preparation
is quite costly.
This optimization moves D0i3 preparation to a dedicated
message sent from the host driver only when the driver is about
to enter D0i3 - this reduces power consumption and latency for
certain workloads, for example audio workloads that submit
inference every 10 ms.
The VPU needs non zero time to enter IDLE state after responding to
D0i3 entry message. If the driver does not wait for the VPU to enter
IDLE state it could cause warm boot failures.
Split ivpu_ipc_send_receive() implementation to have a version
that does not call pm_runtime_resume_and_get(). That implementation
can be invoked when device is up and runtime resume is prohibited
(for example at the end of boot sequence).
The new function will be used for D0i3 entry IPC message addition
in the separate change.
accel/ivpu: Pass D0i3 residency time to the VPU firmware
The firmware needs to know the time spent in D0i3/D3 to
calculate telemetry data. The D0i3/D3 residency time is
calculated by the driver and passed to the firmware
in the boot parameters.
The driver also passes VPU perf counter value captured
right before entering D0i3 - this allows the VPU firmware
to generate monotonic timestamps for the logs.
accel/ivpu: Add support for VPU_JOB_FLAGS_NULL_SUBMISSION_MASK
Add test_mode = 3 that add VPU_JOB_FLAGS_NULL_SUBMISSION_MASK
flag to the job send to the VPU device. Then the VPU will process
the job but won't execute commands (except the command to signal
the fence).
This can be used to estimate job processing overhead in the host
software and VPU firmware.
Unlike the null hardware mode, the null submission mode will
still work even if UMD uses VPU fences to track job completion.
Arnd Bergmann [Fri, 27 Oct 2023 15:26:23 +0000 (17:26 +0200)]
accel/ivpu: avoid build failure with CONFIG_PM=n
The usage count of struct dev_pm_info is an implementation detail that
is only available if CONFIG_PM is enabled, so printing it in a debug message
causes a build failure in configurations without PM:
In file included from include/linux/device.h:15,
from include/linux/pci.h:37,
from drivers/accel/ivpu/ivpu_pm.c:8:
drivers/accel/ivpu/ivpu_pm.c: In function 'ivpu_rpm_get_if_active':
drivers/accel/ivpu/ivpu_pm.c:254:51: error: 'struct dev_pm_info' has no member named 'usage_count'
254 | atomic_read(&vdev->drm.dev->power.usage_count));
| ^
include/linux/dev_printk.h:129:48: note: in definition of macro 'dev_printk'
129 | _dev_printk(level, dev, fmt, ##__VA_ARGS__); \
| ^~~~~~~~~~~
drivers/accel/ivpu/ivpu_drv.h:75:17: note: in expansion of macro 'dev_dbg'
75 | dev_dbg((vdev)->drm.dev, "[%s] " fmt, #type, ##args); \
| ^~~~~~~
drivers/accel/ivpu/ivpu_pm.c:253:9: note: in expansion of macro 'ivpu_dbg'
253 | ivpu_dbg(vdev, RPM, "rpm_get_if_active count %d\n",
| ^~~~~~~~
The print message does not seem essential, so the easiest workaround is
to just remove it.
Use QAIC_TIMESYNC MHI channel to send UTC time to device in SBL
environment. Remove support for QAIC_TIMESYNC MHI channel in AMSS
environment as it is not used in that environment.
Ajit Pal Singh [Mon, 16 Oct 2023 17:01:13 +0000 (11:01 -0600)]
accel/qaic: Add support for periodic timesync
Device and Host have a time synchronization mechanism that happens once
during boot when device is in SBL mode. After that, in mission-mode there
is no timesync. In an experiment after continuous operation, device time
drifted w.r.t. host by approximately 3 seconds per day. This drift leads
to mismatch in timestamp of device and Host logs. To correct this
implement periodic timesync in driver. This timesync is carried out via
QAIC_TIMESYNC_PERIODIC MHI channel.
Carl Vanderlip [Mon, 16 Oct 2023 17:00:36 +0000 (11:00 -0600)]
accel/qaic: Enable 1 MSI fallback mode
Several virtualization use-cases either don't support 32 MultiMSIs
(Xen/VMware) or have significant drawbacks to their use (KVM's vIOMMU,
which is required to support 32 MSI, needs to allocate an alternate
system memory space for each device using vIOMMU (e.g. 8GB VM mem and
2 cards => 8 + 2 * 8 = 24GB host memory required)). Support these
cases by enabling a 1 MSI fallback mode.
Whenever all 32 MSIs requested are not available, a second request for
a single MSI is made. Its success is the initiator of single MSI mode.
This mode causes all interrupts generated by the device to be directed
to the 0th MSI (firmware >=v1.10 will do this as a response to the PCIe
MSI capability configuration). Likewise, all interrupt handlers for the
device are registered to the 0th MSI.
Since the DBC interrupt handler checks if the DBC is in use or if
there is any pending changes, the 'spurious' interrupts are
disregarded. If there is work to be done, the standard threaded IRQ
handler is dispatched.
On every interrupt, the MHI handler wakes up its threaded interrupt
handler, and attempts to wake any waiters for MHI state events.
Performance is within +-0.6% for test cases that typify real world
use. Larger differences ([-4,+132]%, avg +47%) exist for very simple
tasks (e.g. addition) compiled for single NSPs. It is assumed that the
small work and many interrupts typically cause contention (e.g. 16 NSPs
vs 4 CPUs), as evidenced by the standard deviation between runs also
decreasing (r=-0.48 between delta(Performace_test) and
delta(StdDev_test/Avg_test))
Simon Ser [Mon, 23 Oct 2023 20:36:39 +0000 (20:36 +0000)]
drm/doc: describe PATH format for DP MST
This is already uAPI, xserver parses it. It's useful to document
since user-space might want to lookup the parent connector.
Additionally, people (me included) have misunderstood the PATH
property for being stable across reboots, but since a KMS object
ID is baked in there that's not the case. So PATH shouldn't be
used as-is in config files and such.
Simon Ser [Fri, 20 Oct 2023 10:19:45 +0000 (10:19 +0000)]
drm: introduce CLOSEFB IOCTL
This new IOCTL allows callers to close a framebuffer without
disabling planes or CRTCs. This takes inspiration from Rob Clark's
unref_fb IOCTL [1] and DRM_MODE_FB_PERSIST [2].
User-space patch for wlroots available at [3]. IGT test available
at [4].
v2: add an extra pad field just in case we want to extend this IOCTL
in the future (Pekka, Sima).
dt-bindings: display: ssd132x: Remove '-' before compatible enum
This is a leftover from when the binding schema had the compatible string
property enum as a 'oneOf' child and the '-' was not removed when 'oneOf'
got dropped during the binding review process.
Alex Deucher [Wed, 25 Oct 2023 17:19:28 +0000 (13:19 -0400)]
drm/amdgpu: move buffer funcs setting up a level
Rather than doing this in the IP code for the SDMA paging
engine, move it up to the core device level init level.
This should fix the scheduler init ordering.
v2: drop extra parens
v3: drop SDMA helpers
v4: Added a Fixes tag because amdgpu dereferences an uninitialized
scheduler without this patch, and this patch fixes this. (Luben)
Luben Tuikov [Sun, 15 Oct 2023 01:15:35 +0000 (21:15 -0400)]
drm/sched: Convert the GPU scheduler to variable number of run-queues
The GPU scheduler has now a variable number of run-queues, which are set up at
drm_sched_init() time. This way, each driver announces how many run-queues it
requires (supports) per each GPU scheduler it creates. Note, that run-queues
correspond to scheduler "priorities", thus if the number of run-queues is set
to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
i.e. single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler and
a scheduled entity.
Helen Koike [Tue, 19 Sep 2023 18:22:49 +0000 (15:22 -0300)]
MAINTAINERS: drm/ci: add entries for xfail files
DRM CI keeps track of which tests are failing, flaking or being skipped
by the ci in the expectations files. Add entries for those files to the
corresponding driver maintainer, so they can be notified when they
change.
Helen Koike [Tue, 24 Oct 2023 00:45:24 +0000 (21:45 -0300)]
drm/ci: do not automatically retry on error
Since the kernel doesn't use a bot like Mesa that requires tests to pass
in order to merge the patches, leave it to developers and/or maintainers
to manually retry.
Helen Koike [Tue, 24 Oct 2023 00:45:22 +0000 (21:45 -0300)]
drm/ci: increase i915 job timeout to 1h30m
With the new sharding, the default job timeout is not enough for i915
and their jobs are failing before completing.
See below the current execution time:
🞋 job i915:tgl 8/8 has new status: success (37m3s)
🞋 job i915:tgl 7/8 has new status: success (19m43s)
🞋 job i915:tgl 6/8 has new status: success (21m47s)
🞋 job i915:tgl 5/8 has new status: success (18m16s)
🞋 job i915:tgl 4/8 has new status: success (21m43s)
🞋 job i915:tgl 3/8 has new status: success (17m59s)
🞋 job i915:tgl 2/8 has new status: success (22m15s)
🞋 job i915:tgl 1/8 has new status: success (18m52s)
🞋 job i915:cml 2/2 has new status: success (1h19m58s)
🞋 job i915:cml 1/2 has new status: success (55m45s)
🞋 job i915:whl 2/2 has new status: success (1h8m56s)
🞋 job i915:whl 1/2 has new status: success (54m3s)
🞋 job i915:kbl 3/3 has new status: success (37m43s)
🞋 job i915:kbl 2/3 has new status: success (36m37s)
🞋 job i915:kbl 1/3 has new status: success (34m52s)
🞋 job i915:amly 2/2 has new status: success (1h7m60s)
🞋 job i915:amly 1/2 has new status: success (59m18s)
🞋 job i915:glk 2/2 has new status: success (58m26s)
🞋 job i915:glk 1/2 has new status: success (50m23s)
🞋 job i915:apl 3/3 has new status: success (1h6m39s)
🞋 job i915:apl 2/3 has new status: success (1h4m45s)
🞋 job i915:apl 1/3 has new status: success (1h7m38s)
(generated with ci_run_n_monitor.py script)
The longest job is 1h19m58s, so adjust the timeout.
Helen Koike [Tue, 24 Oct 2023 00:45:21 +0000 (21:45 -0300)]
drm/ci: add subset-1-gfx to LAVA_TAGS and adjust shards
The Collabora Lava farm added a tag called `subset-1-gfx` to half of
devices the graphics community use.
Lets use this tag so we don't occupy all the resources.
This is particular important because Mesa3D shares the resources with
DRM-CI and use them to do pre-merge tests, so it can block developers
from getting their patches merged.
Helen Koike [Tue, 24 Oct 2023 00:45:20 +0000 (21:45 -0300)]
drm/ci: clean up xfails (specially flakes list)
Since the script that collected the list of the expectation files was
bogus and placing test to the flakes list incorrectly, restart the
expectation files with the correct script.
This reduces a lot the number of tests in the flakes list.
Helen Koike [Tue, 24 Oct 2023 00:45:19 +0000 (21:45 -0300)]
drm/ci: uprev IGT and make sure core_getversion is run
IGT has recently merged a patch that makes code_getversion test to fails
if the driver isn't loaded or if it isn't the expected one defined in
variable IGT_FORCE_DRIVER.
Without this test, jobs were passing when the driver didn't load or
probe for some reason, giving the illusion that everything was ok.
Uprev IGT to include this modification and include core_getversion test
in all the shards.
Helen Koike [Tue, 24 Oct 2023 00:45:18 +0000 (21:45 -0300)]
drm/ci: add helper script update-xfails.py
Add helper script that given a gitlab pipeline url, analyse which are
the failures and flakes and update the xfails folder accordingly.
Example:
Trigger a pipeline in gitlab infrastructure, than re-try a few jobs more
than once (so we can have data if failures are consistent across jobs
with the same name or if they are flakes) and execute:
When building containers, some rust packages were installed without
locking the dependencies version, which got updated and started giving
errors like:
error: failed to compile `bindgen-cli v0.62.0`, intermediate artifacts can be found at `/tmp/cargo-installkNKRwf`
Caused by:
package `rustix v0.38.13` cannot be built because it requires rustc 1.63 or newer, while the currently active rustc version is 1.60.0
A patch to Mesa was added fixing this error, so update it.
Also, commit in linux kernel 6.6 rc3 broke booting in crosvm.
Mesa has upreved crosvm to fix this issue.
drm/ci: force-enable CONFIG_MSM_MMCC_8996 as built-in
Enable CONFIG_MSM_MMCC_8996, the multimedia clock controller on Qualcomm
MSM8996 to prevent the the board from hitting the probe deferral
timeouts in CI run.
drm/ci: pick up -external-fixes from the merge target repo
In case of the merge requests it might be useful to push repo-specific
fixes which have not yet propagated to the -external-fixes branch in the
main UPSTREAM_REPO. For example, in case of drm/msm development, we are
staging fixes locally for testing, before pushing them to the drm/drm
repo. Thus, if the CI run was triggered by merge request, also pick up
the -external fixes basing on the the CI_MERGE target repo / and branch.
Simon Ser [Mon, 21 Aug 2023 13:15:56 +0000 (13:15 +0000)]
drm/doc: document DRM_IOCTL_MODE_CREATE_DUMB
The main motivation is to repeat that dumb buffers should not be
abused for anything else than basic software rendering with KMS.
User-space devs are more likely to look at the IOCTL docs than to
actively search for the driver-oriented "Dumb Buffer Objects"
section.
v2: reference DRM_CAP_DUMB_BUFFER, DRM_CAP_DUMB_PREFERRED_DEPTH and
DRM_CAP_DUMB_PREFER_SHADOW (Pekka)
Jonas Karlman [Mon, 23 Oct 2023 17:37:15 +0000 (17:37 +0000)]
drm/rockchip: vop: Add NV15, NV20 and NV30 support
Add support for displaying 10-bit 4:2:0 and 4:2:2 formats produced by
the Rockchip Video Decoder on RK322X, RK3288, RK3328 and RK3399.
Also add support for 10-bit 4:4:4 format while at it.
V5: Use drm_format_info_min_pitch() for correct bpp
Add missing NV21, NV61 and NV42 formats
V4: Rework RK3328/RK3399 win0/1 data to not affect RK3368
V2: Added NV30 support
Jonas Karlman [Mon, 23 Oct 2023 17:37:14 +0000 (17:37 +0000)]
drm/fourcc: Add NV20 and NV30 YUV formats
DRM_FORMAT_NV20 and DRM_FORMAT_NV30 formats is the 2x1 and non-subsampled
variant of NV15, a 10-bit 2-plane YUV format that has no padding between
components. Instead, luminance and chrominance samples are grouped into 4s
so that each group is packed into an integer number of bytes:
YYYY = UVUV = 4 * 10 bits = 40 bits = 5 bytes
The '20' and '30' suffix refers to the optimum effective bits per pixel
which is achieved when the total number of luminance samples is a multiple
of 4.
Andy Yan [Wed, 18 Oct 2023 09:43:39 +0000 (17:43 +0800)]
drm/rockchip: vop2: rename window formats to show window type using them
formats_win_full_10bit is for cluster window,
formats_win_full_10bit_yuyv is for rk356x esmart, rk3588 esmart window
will support more format.
formats_win_lite is for smart window.
Rename it based the windows type may let meaning is clearer
Andy Yan [Wed, 18 Oct 2023 09:42:39 +0000 (17:42 +0800)]
drm/rockchip: vop2: remove the unsupported format of cluster window
The cluster window on vop2 doesn't support linear yuv
format(NV12/16/24), it only support afbc based yuv
format(DRM_FORMAT_YUV420_8BIT/10BIT), which will be
added in next patch.
Liu Ying [Wed, 18 Oct 2023 03:52:12 +0000 (11:52 +0800)]
drm/bridge: synopsys: dw-mipi-dsi: Fix hcomponent lbcc for burst mode
In order to support burst mode, vendor drivers set lane_mbps higher than
bandwidth through DPI interface. So, calculate horizontal component lane
byte clock cycle(lbcc) based on lane_mbps instead of pixel clock rate for
burst mode.
drm/client: Convert drm_client_buffer_addfb() to drm_mode_addfb2()
Currently drm_client_buffer_addfb() uses the legacy drm_mode_addfb(),
which uses bpp and depth to guess the wanted buffer format.
However, drm_client_buffer_addfb() already knows the exact buffer
format, so there is no need to convert back and forth between buffer
format and bpp/depth, and the function can just call drm_mode_addfb2()
directly instead.
Deepak R Varma [Wed, 18 Oct 2023 19:13:54 +0000 (00:43 +0530)]
accel/ivpu: Delete the TODO file
The work items listed in the TODO file of this driver file are either
completed or dropped. The file is no more significant according
to the maintainers. Hence removing it from the sources.
accel/ivpu: Do not initialize parameters on power up
Initialize HW specific parameters only once. We do not have to do this
on every power_up (performed during initialization and on resume). Move
corresponding code to ->info_init()