]>
Commit | Line | Data |
---|---|---|
5da98b82 | 1 | ====================== |
a6537be9 | 2 | Lightweight PI-futexes |
5da98b82 | 3 | ====================== |
a6537be9 SR |
4 | |
5 | We are calling them lightweight for 3 reasons: | |
6 | ||
7 | - in the user-space fastpath a PI-enabled futex involves no kernel work | |
8 | (or any other PI complexity) at all. No registration, no extra kernel | |
9 | calls - just pure fast atomic ops in userspace. | |
10 | ||
11 | - even in the slowpath, the system call and scheduling pattern is very | |
12 | similar to normal futexes. | |
13 | ||
14 | - the in-kernel PI implementation is streamlined around the mutex | |
15 | abstraction, with strict rules that keep the implementation | |
16 | relatively simple: only a single owner may own a lock (i.e. no | |
17 | read-write lock support), only the owner may unlock a lock, no | |
18 | recursive locking, etc. | |
19 | ||
20 | Priority Inheritance - why? | |
21 | --------------------------- | |
22 | ||
23 | The short reply: user-space PI helps achieving/improving determinism for | |
24 | user-space applications. In the best-case, it can help achieve | |
25 | determinism and well-bound latencies. Even in the worst-case, PI will | |
26 | improve the statistical distribution of locking related application | |
27 | delays. | |
28 | ||
5da98b82 MCC |
29 | The longer reply |
30 | ---------------- | |
a6537be9 SR |
31 | |
32 | Firstly, sharing locks between multiple tasks is a common programming | |
33 | technique that often cannot be replaced with lockless algorithms. As we | |
34 | can see it in the kernel [which is a quite complex program in itself], | |
35 | lockless structures are rather the exception than the norm - the current | |
36 | ratio of lockless vs. locky code for shared data structures is somewhere | |
37 | between 1:10 and 1:100. Lockless is hard, and the complexity of lockless | |
38 | algorithms often endangers to ability to do robust reviews of said code. | |
39 | I.e. critical RT apps often choose lock structures to protect critical | |
40 | data structures, instead of lockless algorithms. Furthermore, there are | |
41 | cases (like shared hardware, or other resource limits) where lockless | |
42 | access is mathematically impossible. | |
43 | ||
44 | Media players (such as Jack) are an example of reasonable application | |
45 | design with multiple tasks (with multiple priority levels) sharing | |
46 | short-held locks: for example, a highprio audio playback thread is | |
47 | combined with medium-prio construct-audio-data threads and low-prio | |
48 | display-colory-stuff threads. Add video and decoding to the mix and | |
49 | we've got even more priority levels. | |
50 | ||
51 | So once we accept that synchronization objects (locks) are an | |
52 | unavoidable fact of life, and once we accept that multi-task userspace | |
53 | apps have a very fair expectation of being able to use locks, we've got | |
54 | to think about how to offer the option of a deterministic locking | |
55 | implementation to user-space. | |
56 | ||
57 | Most of the technical counter-arguments against doing priority | |
58 | inheritance only apply to kernel-space locks. But user-space locks are | |
59 | different, there we cannot disable interrupts or make the task | |
60 | non-preemptible in a critical section, so the 'use spinlocks' argument | |
61 | does not apply (user-space spinlocks have the same priority inversion | |
62 | problems as other user-space locking constructs). Fact is, pretty much | |
63 | the only technique that currently enables good determinism for userspace | |
64 | locks (such as futex-based pthread mutexes) is priority inheritance: | |
65 | ||
66 | Currently (without PI), if a high-prio and a low-prio task shares a lock | |
67 | [this is a quite common scenario for most non-trivial RT applications], | |
68 | even if all critical sections are coded carefully to be deterministic | |
69 | (i.e. all critical sections are short in duration and only execute a | |
70 | limited number of instructions), the kernel cannot guarantee any | |
71 | deterministic execution of the high-prio task: any medium-priority task | |
72 | could preempt the low-prio task while it holds the shared lock and | |
73 | executes the critical section, and could delay it indefinitely. | |
74 | ||
5da98b82 MCC |
75 | Implementation |
76 | -------------- | |
a6537be9 SR |
77 | |
78 | As mentioned before, the userspace fastpath of PI-enabled pthread | |
79 | mutexes involves no kernel work at all - they behave quite similarly to | |
80 | normal futex-based locks: a 0 value means unlocked, and a value==TID | |
81 | means locked. (This is the same method as used by list-based robust | |
82 | futexes.) Userspace uses atomic ops to lock/unlock these mutexes without | |
83 | entering the kernel. | |
84 | ||
85 | To handle the slowpath, we have added two new futex ops: | |
86 | ||
5da98b82 MCC |
87 | - FUTEX_LOCK_PI |
88 | - FUTEX_UNLOCK_PI | |
a6537be9 SR |
89 | |
90 | If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to | |
91 | TID fails], then FUTEX_LOCK_PI is called. The kernel does all the | |
92 | remaining work: if there is no futex-queue attached to the futex address | |
93 | yet then the code looks up the task that owns the futex [it has put its | |
94 | own TID into the futex value], and attaches a 'PI state' structure to | |
95 | the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware, | |
96 | kernel-based synchronization object. The 'other' task is made the owner | |
97 | of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the | |
98 | futex value. Then this task tries to lock the rt-mutex, on which it | |
99 | blocks. Once it returns, it has the mutex acquired, and it sets the | |
100 | futex value to its own TID and returns. Userspace has no other work to | |
101 | perform - it now owns the lock, and futex value contains | |
102 | FUTEX_WAITERS|TID. | |
103 | ||
104 | If the unlock side fastpath succeeds, [i.e. userspace manages to do a | |
105 | TID -> 0 atomic transition of the futex value], then no kernel work is | |
106 | triggered. | |
107 | ||
108 | If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), | |
109 | then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the | |
110 | behalf of userspace - and it also unlocks the attached | |
111 | pi_state->rt_mutex and thus wakes up any potential waiters. | |
112 | ||
113 | Note that under this approach, contrary to previous PI-futex approaches, | |
114 | there is no prior 'registration' of a PI-futex. [which is not quite | |
115 | possible anyway, due to existing ABI properties of pthread mutexes.] | |
116 | ||
117 | Also, under this scheme, 'robustness' and 'PI' are two orthogonal | |
118 | properties of futexes, and all four combinations are possible: futex, | |
119 | robust-futex, PI-futex, robust+PI-futex. | |
120 | ||
121 | More details about priority inheritance can be found in | |
387b1468 | 122 | Documentation/locking/rt-mutex.rst. |