]>
Commit | Line | Data |
---|---|---|
992a8e60 MW |
1 | .. SPDX-License-Identifier: GPL-2.0+ |
2 | ||
3 | ====== | |
4 | XArray | |
5 | ====== | |
6 | ||
7 | :Author: Matthew Wilcox | |
8 | ||
9 | Overview | |
10 | ======== | |
11 | ||
12 | The XArray is an abstract data type which behaves like a very large array | |
13 | of pointers. It meets many of the same needs as a hash or a conventional | |
14 | resizable array. Unlike a hash, it allows you to sensibly go to the | |
15 | next or previous entry in a cache-efficient manner. In contrast to a | |
16 | resizable array, there is no need to copy data or change MMU mappings in | |
17 | order to grow the array. It is more memory-efficient, parallelisable | |
18 | and cache friendly than a doubly-linked list. It takes advantage of | |
19 | RCU to perform lookups without locking. | |
20 | ||
21 | The XArray implementation is efficient when the indices used are densely | |
22 | clustered; hashing the object and using the hash as the index will not | |
23 | perform well. The XArray is optimised for small indices, but still has | |
24 | good performance with large indices. If your index can be larger than | |
25 | ``ULONG_MAX`` then the XArray is not the data type for you. The most | |
26 | important user of the XArray is the page cache. | |
27 | ||
28 | Each non-``NULL`` entry in the array has three bits associated with | |
29 | it called marks. Each mark may be set or cleared independently of | |
30 | the others. You can iterate over entries which are marked. | |
31 | ||
32 | Normal pointers may be stored in the XArray directly. They must be 4-byte | |
33 | aligned, which is true for any pointer returned from :c:func:`kmalloc` and | |
34 | :c:func:`alloc_page`. It isn't true for arbitrary user-space pointers, | |
35 | nor for function pointers. You can store pointers to statically allocated | |
36 | objects, as long as those objects have an alignment of at least 4. | |
37 | ||
38 | You can also store integers between 0 and ``LONG_MAX`` in the XArray. | |
39 | You must first convert it into an entry using :c:func:`xa_mk_value`. | |
40 | When you retrieve an entry from the XArray, you can check whether it is | |
41 | a value entry by calling :c:func:`xa_is_value`, and convert it back to | |
42 | an integer by calling :c:func:`xa_to_value`. | |
43 | ||
44 | Some users want to store tagged pointers instead of using the marks | |
45 | described above. They can call :c:func:`xa_tag_pointer` to create an | |
46 | entry with a tag, :c:func:`xa_untag_pointer` to turn a tagged entry | |
47 | back into an untagged pointer and :c:func:`xa_pointer_tag` to retrieve | |
48 | the tag of an entry. Tagged pointers use the same bits that are used | |
49 | to distinguish value entries from normal pointers, so each user must | |
50 | decide whether they want to store value entries or tagged pointers in | |
51 | any particular XArray. | |
52 | ||
53 | The XArray does not support storing :c:func:`IS_ERR` pointers as some | |
54 | conflict with value entries or internal entries. | |
55 | ||
56 | An unusual feature of the XArray is the ability to create entries which | |
57 | occupy a range of indices. Once stored to, looking up any index in | |
58 | the range will return the same entry as looking up any other index in | |
59 | the range. Setting a mark on one index will set it on all of them. | |
60 | Storing to any index will store to all of them. Multi-index entries can | |
61 | be explicitly split into smaller entries, or storing ``NULL`` into any | |
62 | entry will cause the XArray to forget about the range. | |
63 | ||
64 | Normal API | |
65 | ========== | |
66 | ||
67 | Start by initialising an XArray, either with :c:func:`DEFINE_XARRAY` | |
68 | for statically allocated XArrays or :c:func:`xa_init` for dynamically | |
69 | allocated ones. A freshly-initialised XArray contains a ``NULL`` | |
70 | pointer at every index. | |
71 | ||
72 | You can then set entries using :c:func:`xa_store` and get entries | |
73 | using :c:func:`xa_load`. xa_store will overwrite any entry with the | |
74 | new entry and return the previous entry stored at that index. You can | |
75 | use :c:func:`xa_erase` instead of calling :c:func:`xa_store` with a | |
76 | ``NULL`` entry. There is no difference between an entry that has never | |
77 | been stored to and one that has most recently had ``NULL`` stored to it. | |
78 | ||
79 | You can conditionally replace an entry at an index by using | |
80 | :c:func:`xa_cmpxchg`. Like :c:func:`cmpxchg`, it will only succeed if | |
81 | the entry at that index has the 'old' value. It also returns the entry | |
82 | which was at that index; if it returns the same entry which was passed as | |
83 | 'old', then :c:func:`xa_cmpxchg` succeeded. | |
84 | ||
85 | If you want to only store a new entry to an index if the current entry | |
86 | at that index is ``NULL``, you can use :c:func:`xa_insert` which | |
87 | returns ``-EEXIST`` if the entry is not empty. | |
88 | ||
89 | You can enquire whether a mark is set on an entry by using | |
90 | :c:func:`xa_get_mark`. If the entry is not ``NULL``, you can set a mark | |
91 | on it by using :c:func:`xa_set_mark` and remove the mark from an entry by | |
92 | calling :c:func:`xa_clear_mark`. You can ask whether any entry in the | |
93 | XArray has a particular mark set by calling :c:func:`xa_marked`. | |
94 | ||
95 | You can copy entries out of the XArray into a plain array by calling | |
96 | :c:func:`xa_extract`. Or you can iterate over the present entries in | |
97 | the XArray by calling :c:func:`xa_for_each`. You may prefer to use | |
98 | :c:func:`xa_find` or :c:func:`xa_find_after` to move to the next present | |
99 | entry in the XArray. | |
100 | ||
0e9446c3 MW |
101 | Calling :c:func:`xa_store_range` stores the same entry in a range |
102 | of indices. If you do this, some of the other operations will behave | |
103 | in a slightly odd way. For example, marking the entry at one index | |
104 | may result in the entry being marked at some, but not all of the other | |
105 | indices. Storing into one index may result in the entry retrieved by | |
106 | some, but not all of the other indices changing. | |
107 | ||
992a8e60 MW |
108 | Finally, you can remove all entries from an XArray by calling |
109 | :c:func:`xa_destroy`. If the XArray entries are pointers, you may wish | |
110 | to free the entries first. You can do this by iterating over all present | |
111 | entries in the XArray using the :c:func:`xa_for_each` iterator. | |
112 | ||
371c752d MW |
113 | ID assignment |
114 | ------------- | |
115 | ||
116 | You can call :c:func:`xa_alloc` to store the entry at any unused index | |
117 | in the XArray. If you need to modify the array from interrupt context, | |
118 | you can use :c:func:`xa_alloc_bh` or :c:func:`xa_alloc_irq` to disable | |
119 | interrupts while allocating the ID. Unlike :c:func:`xa_store`, allocating | |
120 | a ``NULL`` pointer does not delete an entry. Instead it reserves an | |
121 | entry like :c:func:`xa_reserve` and you can release it using either | |
122 | :c:func:`xa_erase` or :c:func:`xa_release`. To use ID assignment, the | |
123 | XArray must be defined with :c:func:`DEFINE_XARRAY_ALLOC`, or initialised | |
124 | by passing ``XA_FLAGS_ALLOC`` to :c:func:`xa_init_flags`, | |
125 | ||
992a8e60 MW |
126 | Memory allocation |
127 | ----------------- | |
128 | ||
371c752d MW |
129 | The :c:func:`xa_store`, :c:func:`xa_cmpxchg`, :c:func:`xa_alloc`, |
130 | :c:func:`xa_reserve` and :c:func:`xa_insert` functions take a gfp_t | |
131 | parameter in case the XArray needs to allocate memory to store this entry. | |
992a8e60 MW |
132 | If the entry is being deleted, no memory allocation needs to be performed, |
133 | and the GFP flags specified will be ignored. | |
134 | ||
135 | It is possible for no memory to be allocatable, particularly if you pass | |
136 | a restrictive set of GFP flags. In that case, the functions return a | |
137 | special value which can be turned into an errno using :c:func:`xa_err`. | |
138 | If you don't need to know exactly which error occurred, using | |
139 | :c:func:`xa_is_err` is slightly more efficient. | |
140 | ||
141 | Locking | |
142 | ------- | |
143 | ||
144 | When using the Normal API, you do not have to worry about locking. | |
145 | The XArray uses RCU and an internal spinlock to synchronise access: | |
146 | ||
147 | No lock needed: | |
148 | * :c:func:`xa_empty` | |
149 | * :c:func:`xa_marked` | |
150 | ||
151 | Takes RCU read lock: | |
152 | * :c:func:`xa_load` | |
153 | * :c:func:`xa_for_each` | |
154 | * :c:func:`xa_find` | |
155 | * :c:func:`xa_find_after` | |
156 | * :c:func:`xa_extract` | |
157 | * :c:func:`xa_get_mark` | |
158 | ||
159 | Takes xa_lock internally: | |
160 | * :c:func:`xa_store` | |
161 | * :c:func:`xa_insert` | |
162 | * :c:func:`xa_erase` | |
163 | * :c:func:`xa_erase_bh` | |
164 | * :c:func:`xa_erase_irq` | |
165 | * :c:func:`xa_cmpxchg` | |
0e9446c3 | 166 | * :c:func:`xa_store_range` |
371c752d MW |
167 | * :c:func:`xa_alloc` |
168 | * :c:func:`xa_alloc_bh` | |
169 | * :c:func:`xa_alloc_irq` | |
992a8e60 MW |
170 | * :c:func:`xa_destroy` |
171 | * :c:func:`xa_set_mark` | |
172 | * :c:func:`xa_clear_mark` | |
173 | ||
174 | Assumes xa_lock held on entry: | |
175 | * :c:func:`__xa_store` | |
176 | * :c:func:`__xa_insert` | |
177 | * :c:func:`__xa_erase` | |
178 | * :c:func:`__xa_cmpxchg` | |
371c752d | 179 | * :c:func:`__xa_alloc` |
992a8e60 MW |
180 | * :c:func:`__xa_set_mark` |
181 | * :c:func:`__xa_clear_mark` | |
182 | ||
183 | If you want to take advantage of the lock to protect the data structures | |
184 | that you are storing in the XArray, you can call :c:func:`xa_lock` | |
185 | before calling :c:func:`xa_load`, then take a reference count on the | |
186 | object you have found before calling :c:func:`xa_unlock`. This will | |
187 | prevent stores from removing the object from the array between looking | |
188 | up the object and incrementing the refcount. You can also use RCU to | |
189 | avoid dereferencing freed memory, but an explanation of that is beyond | |
190 | the scope of this document. | |
191 | ||
192 | The XArray does not disable interrupts or softirqs while modifying | |
193 | the array. It is safe to read the XArray from interrupt or softirq | |
194 | context as the RCU lock provides enough protection. | |
195 | ||
196 | If, for example, you want to store entries in the XArray in process | |
197 | context and then erase them in softirq context, you can do that this way:: | |
198 | ||
199 | void foo_init(struct foo *foo) | |
200 | { | |
201 | xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH); | |
202 | } | |
203 | ||
204 | int foo_store(struct foo *foo, unsigned long index, void *entry) | |
205 | { | |
206 | int err; | |
207 | ||
208 | xa_lock_bh(&foo->array); | |
209 | err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL)); | |
210 | if (!err) | |
211 | foo->count++; | |
212 | xa_unlock_bh(&foo->array); | |
213 | return err; | |
214 | } | |
215 | ||
216 | /* foo_erase() is only called from softirq context */ | |
217 | void foo_erase(struct foo *foo, unsigned long index) | |
218 | { | |
219 | xa_lock(&foo->array); | |
220 | __xa_erase(&foo->array, index); | |
221 | foo->count--; | |
222 | xa_unlock(&foo->array); | |
223 | } | |
224 | ||
225 | If you are going to modify the XArray from interrupt or softirq context, | |
226 | you need to initialise the array using :c:func:`xa_init_flags`, passing | |
227 | ``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``. | |
228 | ||
229 | The above example also shows a common pattern of wanting to extend the | |
230 | coverage of the xa_lock on the store side to protect some statistics | |
231 | associated with the array. | |
232 | ||
233 | Sharing the XArray with interrupt context is also possible, either | |
234 | using :c:func:`xa_lock_irqsave` in both the interrupt handler and process | |
235 | context, or :c:func:`xa_lock_irq` in process context and :c:func:`xa_lock` | |
236 | in the interrupt handler. Some of the more common patterns have helper | |
237 | functions such as :c:func:`xa_erase_bh` and :c:func:`xa_erase_irq`. | |
238 | ||
239 | Sometimes you need to protect access to the XArray with a mutex because | |
240 | that lock sits above another mutex in the locking hierarchy. That does | |
241 | not entitle you to use functions like :c:func:`__xa_erase` without taking | |
242 | the xa_lock; the xa_lock is used for lockdep validation and will be used | |
243 | for other purposes in the future. | |
244 | ||
245 | The :c:func:`__xa_set_mark` and :c:func:`__xa_clear_mark` functions are also | |
246 | available for situations where you look up an entry and want to atomically | |
247 | set or clear a mark. It may be more efficient to use the advanced API | |
248 | in this case, as it will save you from walking the tree twice. | |
249 | ||
250 | Advanced API | |
251 | ============ | |
252 | ||
253 | The advanced API offers more flexibility and better performance at the | |
254 | cost of an interface which can be harder to use and has fewer safeguards. | |
255 | No locking is done for you by the advanced API, and you are required | |
256 | to use the xa_lock while modifying the array. You can choose whether | |
257 | to use the xa_lock or the RCU lock while doing read-only operations on | |
258 | the array. You can mix advanced and normal operations on the same array; | |
259 | indeed the normal API is implemented in terms of the advanced API. The | |
260 | advanced API is only available to modules with a GPL-compatible license. | |
261 | ||
262 | The advanced API is based around the xa_state. This is an opaque data | |
263 | structure which you declare on the stack using the :c:func:`XA_STATE` | |
264 | macro. This macro initialises the xa_state ready to start walking | |
265 | around the XArray. It is used as a cursor to maintain the position | |
266 | in the XArray and let you compose various operations together without | |
267 | having to restart from the top every time. | |
268 | ||
269 | The xa_state is also used to store errors. You can call | |
270 | :c:func:`xas_error` to retrieve the error. All operations check whether | |
271 | the xa_state is in an error state before proceeding, so there's no need | |
272 | for you to check for an error after each call; you can make multiple | |
273 | calls in succession and only check at a convenient point. The only | |
274 | errors currently generated by the XArray code itself are ``ENOMEM`` and | |
275 | ``EINVAL``, but it supports arbitrary errors in case you want to call | |
276 | :c:func:`xas_set_err` yourself. | |
277 | ||
278 | If the xa_state is holding an ``ENOMEM`` error, calling :c:func:`xas_nomem` | |
279 | will attempt to allocate more memory using the specified gfp flags and | |
280 | cache it in the xa_state for the next attempt. The idea is that you take | |
281 | the xa_lock, attempt the operation and drop the lock. The operation | |
282 | attempts to allocate memory while holding the lock, but it is more | |
283 | likely to fail. Once you have dropped the lock, :c:func:`xas_nomem` | |
284 | can try harder to allocate more memory. It will return ``true`` if it | |
285 | is worth retrying the operation (i.e. that there was a memory error *and* | |
286 | more memory was allocated). If it has previously allocated memory, and | |
287 | that memory wasn't used, and there is no error (or some error that isn't | |
288 | ``ENOMEM``), then it will free the memory previously allocated. | |
289 | ||
290 | Internal Entries | |
291 | ---------------- | |
292 | ||
293 | The XArray reserves some entries for its own purposes. These are never | |
294 | exposed through the normal API, but when using the advanced API, it's | |
295 | possible to see them. Usually the best way to handle them is to pass them | |
296 | to :c:func:`xas_retry`, and retry the operation if it returns ``true``. | |
297 | ||
298 | .. flat-table:: | |
299 | :widths: 1 1 6 | |
300 | ||
301 | * - Name | |
302 | - Test | |
303 | - Usage | |
304 | ||
305 | * - Node | |
306 | - :c:func:`xa_is_node` | |
307 | - An XArray node. May be visible when using a multi-index xa_state. | |
308 | ||
309 | * - Sibling | |
310 | - :c:func:`xa_is_sibling` | |
311 | - A non-canonical entry for a multi-index entry. The value indicates | |
312 | which slot in this node has the canonical entry. | |
313 | ||
314 | * - Retry | |
315 | - :c:func:`xa_is_retry` | |
316 | - This entry is currently being modified by a thread which has the | |
317 | xa_lock. The node containing this entry may be freed at the end | |
318 | of this RCU period. You should restart the lookup from the head | |
319 | of the array. | |
320 | ||
9f14d4f1 MW |
321 | * - Zero |
322 | - :c:func:`xa_is_zero` | |
323 | - Zero entries appear as ``NULL`` through the Normal API, but occupy | |
324 | an entry in the XArray which can be used to reserve the index for | |
325 | future use. | |
326 | ||
992a8e60 MW |
327 | Other internal entries may be added in the future. As far as possible, they |
328 | will be handled by :c:func:`xas_retry`. | |
329 | ||
330 | Additional functionality | |
331 | ------------------------ | |
332 | ||
333 | The :c:func:`xas_create_range` function allocates all the necessary memory | |
334 | to store every entry in a range. It will set ENOMEM in the xa_state if | |
335 | it cannot allocate memory. | |
336 | ||
337 | You can use :c:func:`xas_init_marks` to reset the marks on an entry | |
338 | to their default state. This is usually all marks clear, unless the | |
339 | XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set | |
340 | and all other marks are clear. Replacing one entry with another using | |
341 | :c:func:`xas_store` will not reset the marks on that entry; if you want | |
342 | the marks reset, you should do that explicitly. | |
343 | ||
344 | The :c:func:`xas_load` will walk the xa_state as close to the entry | |
345 | as it can. If you know the xa_state has already been walked to the | |
346 | entry and need to check that the entry hasn't changed, you can use | |
347 | :c:func:`xas_reload` to save a function call. | |
348 | ||
349 | If you need to move to a different index in the XArray, call | |
350 | :c:func:`xas_set`. This resets the cursor to the top of the tree, which | |
351 | will generally make the next operation walk the cursor to the desired | |
352 | spot in the tree. If you want to move to the next or previous index, | |
353 | call :c:func:`xas_next` or :c:func:`xas_prev`. Setting the index does | |
354 | not walk the cursor around the array so does not require a lock to be | |
355 | held, while moving to the next or previous index does. | |
356 | ||
357 | You can search for the next present entry using :c:func:`xas_find`. This | |
358 | is the equivalent of both :c:func:`xa_find` and :c:func:`xa_find_after`; | |
359 | if the cursor has been walked to an entry, then it will find the next | |
360 | entry after the one currently referenced. If not, it will return the | |
361 | entry at the index of the xa_state. Using :c:func:`xas_next_entry` to | |
362 | move to the next present entry instead of :c:func:`xas_find` will save | |
363 | a function call in the majority of cases at the expense of emitting more | |
364 | inline code. | |
365 | ||
366 | The :c:func:`xas_find_marked` function is similar. If the xa_state has | |
367 | not been walked, it will return the entry at the index of the xa_state, | |
368 | if it is marked. Otherwise, it will return the first marked entry after | |
369 | the entry referenced by the xa_state. The :c:func:`xas_next_marked` | |
370 | function is the equivalent of :c:func:`xas_next_entry`. | |
371 | ||
372 | When iterating over a range of the XArray using :c:func:`xas_for_each` | |
373 | or :c:func:`xas_for_each_marked`, it may be necessary to temporarily stop | |
374 | the iteration. The :c:func:`xas_pause` function exists for this purpose. | |
375 | After you have done the necessary work and wish to resume, the xa_state | |
376 | is in an appropriate state to continue the iteration after the entry | |
377 | you last processed. If you have interrupts disabled while iterating, | |
378 | then it is good manners to pause the iteration and reenable interrupts | |
379 | every ``XA_CHECK_SCHED`` entries. | |
380 | ||
381 | The :c:func:`xas_get_mark`, :c:func:`xas_set_mark` and | |
382 | :c:func:`xas_clear_mark` functions require the xa_state cursor to have | |
383 | been moved to the appropriate location in the xarray; they will do | |
384 | nothing if you have called :c:func:`xas_pause` or :c:func:`xas_set` | |
385 | immediately before. | |
386 | ||
387 | You can call :c:func:`xas_set_update` to have a callback function | |
388 | called each time the XArray updates a node. This is used by the page | |
389 | cache workingset code to maintain its list of nodes which contain only | |
390 | shadow entries. | |
391 | ||
392 | Multi-Index Entries | |
393 | ------------------- | |
394 | ||
395 | The XArray has the ability to tie multiple indices together so that | |
396 | operations on one index affect all indices. For example, storing into | |
397 | any index will change the value of the entry retrieved from any index. | |
398 | Setting or clearing a mark on any index will set or clear the mark | |
399 | on every index that is tied together. The current implementation | |
400 | only allows tying ranges which are aligned powers of two together; | |
401 | eg indices 64-127 may be tied together, but 2-6 may not be. This may | |
402 | save substantial quantities of memory; for example tying 512 entries | |
403 | together will save over 4kB. | |
404 | ||
405 | You can create a multi-index entry by using :c:func:`XA_STATE_ORDER` | |
406 | or :c:func:`xas_set_order` followed by a call to :c:func:`xas_store`. | |
407 | Calling :c:func:`xas_load` with a multi-index xa_state will walk the | |
408 | xa_state to the right location in the tree, but the return value is not | |
409 | meaningful, potentially being an internal entry or ``NULL`` even when there | |
410 | is an entry stored within the range. Calling :c:func:`xas_find_conflict` | |
411 | will return the first entry within the range or ``NULL`` if there are no | |
412 | entries in the range. The :c:func:`xas_for_each_conflict` iterator will | |
413 | iterate over every entry which overlaps the specified range. | |
414 | ||
415 | If :c:func:`xas_load` encounters a multi-index entry, the xa_index | |
416 | in the xa_state will not be changed. When iterating over an XArray | |
417 | or calling :c:func:`xas_find`, if the initial index is in the middle | |
418 | of a multi-index entry, it will not be altered. Subsequent calls | |
419 | or iterations will move the index to the first index in the range. | |
420 | Each entry will only be returned once, no matter how many indices it | |
421 | occupies. | |
422 | ||
423 | Using :c:func:`xas_next` or :c:func:`xas_prev` with a multi-index xa_state | |
424 | is not supported. Using either of these functions on a multi-index entry | |
425 | will reveal sibling entries; these should be skipped over by the caller. | |
426 | ||
427 | Storing ``NULL`` into any index of a multi-index entry will set the entry | |
428 | at every index to ``NULL`` and dissolve the tie. Splitting a multi-index | |
429 | entry into entries occupying smaller ranges is not yet supported. | |
430 | ||
431 | Functions and structures | |
432 | ======================== | |
433 | ||
434 | .. kernel-doc:: include/linux/xarray.h | |
435 | .. kernel-doc:: lib/xarray.c |