=========================
Atomic operations in QEMU
=========================

CPUs perform independent memory operations effectively in random order,
but this can be a problem for CPU-CPU interaction (including interactions
between QEMU and the guest).  Multi-threaded programs use various tools
to instruct the compiler and the CPU to restrict the order to something
that is consistent with the expectations of the programmer.

The most basic tool is locking.  Mutexes, condition variables and
semaphores are used in QEMU, and should be the default approach to
synchronization.  Anything else is considerably harder, but it's
also justified more often than one would like;
the most performance-critical parts of QEMU in particular require
a very low level approach to concurrency, involving memory barriers
and atomic operations.  The semantics of concurrent memory accesses are
governed by the C11 memory model.

QEMU provides a header, ``qemu/atomic.h``, which wraps C11 atomics to
provide better portability and a less verbose syntax.  ``qemu/atomic.h``
provides macros that fall in three camps:

- compiler barriers: ``barrier()``;

- weak atomic access and manual memory barriers: ``atomic_read()``,
  ``atomic_set()``, ``smp_rmb()``, ``smp_wmb()``, ``smp_mb()``, ``smp_mb_acquire()``,
  ``smp_mb_release()``, ``smp_read_barrier_depends()``;

- sequentially consistent atomic access: everything else.

In general, use of ``qemu/atomic.h`` should be wrapped with more easily
used data structures (e.g. the lock-free singly-linked list operations
``QSLIST_INSERT_HEAD_ATOMIC`` and ``QSLIST_MOVE_ATOMIC``) or synchronization
primitives (such as RCU, ``QemuEvent`` or ``QemuLockCnt``).  Bare use of
atomic operations and memory barriers should be limited to inter-thread
checking of flags and documented thoroughly.
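
For instance, the lock-free list operations can be combined into a simple
multiple-producer, single-consumer work list.  The following is only a
minimal sketch; the ``Item`` type, the field names and the ``process()``
helper are hypothetical::

    #include "qemu/queue.h"

    typedef struct Item {
        QSLIST_ENTRY(Item) next;
        int value;
    } Item;

    static QSLIST_HEAD(, Item) global_list =
        QSLIST_HEAD_INITIALIZER(global_list);

    /* Any thread can push an item; the insertion itself is atomic. */
    void push_item(Item *item)
    {
        QSLIST_INSERT_HEAD_ATOMIC(&global_list, item, next);
    }

    /* The consumer grabs the whole list in one shot and then walks its
     * private copy without any further atomic operations.
     */
    void drain_items(void)
    {
        QSLIST_HEAD(, Item) local = QSLIST_HEAD_INITIALIZER(local);
        Item *item;

        QSLIST_MOVE_ATOMIC(&local, &global_list);
        QSLIST_FOREACH(item, &local, next) {
            process(item->value);
        }
    }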


Compiler memory barrier
=======================

``barrier()`` prevents the compiler from moving the memory accesses on
either side of it to the other side.  The compiler barrier has no direct
effect on the CPU, which may then reorder things however it wishes.

``barrier()`` is mostly used within ``qemu/atomic.h`` itself.  On some
architectures, CPU guarantees are strong enough that blocking compiler
optimizations already ensures the correct order of execution.  In this
case, ``qemu/atomic.h`` will reduce stronger memory barriers to simple
compiler barriers.

Still, ``barrier()`` can be useful when writing code that can be interrupted
by signal handlers.
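
For instance, a thread that publishes a buffer to a signal handler running
in the same thread only has to prevent compiler reordering, because the
handler executes on the same CPU.  A minimal sketch (``fill_buffer()`` and
the variables are hypothetical)::

    #include <signal.h>

    static char buffer[128];
    static volatile sig_atomic_t buffer_ready;

    void prepare_request(void)
    {
        fill_buffer(buffer);
        /* Keep the compiler from moving the flag update above the
         * stores to buffer; no CPU barrier is needed because the
         * signal handler runs on the same CPU.
         */
        barrier();
        buffer_ready = 1;
    }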


Sequentially consistent atomic access
=====================================

Most of the operations in the ``qemu/atomic.h`` header ensure *sequential
consistency*, where "the result of any execution is the same as if the
operations of all the processors were executed in some sequential order,
and the operations of each individual processor appear in this sequence
in the order specified by its program".

``qemu/atomic.h`` provides the following set of atomic read-modify-write
operations::

  void atomic_inc(ptr)
  void atomic_dec(ptr)
  void atomic_add(ptr, val)
  void atomic_sub(ptr, val)
  void atomic_and(ptr, val)
  void atomic_or(ptr, val)

  typeof(*ptr) atomic_fetch_inc(ptr)
  typeof(*ptr) atomic_fetch_dec(ptr)
  typeof(*ptr) atomic_fetch_add(ptr, val)
  typeof(*ptr) atomic_fetch_sub(ptr, val)
  typeof(*ptr) atomic_fetch_and(ptr, val)
  typeof(*ptr) atomic_fetch_or(ptr, val)
  typeof(*ptr) atomic_fetch_xor(ptr, val)
  typeof(*ptr) atomic_fetch_inc_nonzero(ptr)
  typeof(*ptr) atomic_xchg(ptr, val)
  typeof(*ptr) atomic_cmpxchg(ptr, old, new)

all of which return the old value of ``*ptr`` (the ``void`` forms simply
discard it).  These operations are polymorphic; they operate on any type
that is as wide as a pointer or smaller.

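For example, a reference count can be built from these primitives.  The
following is only a minimal sketch; the ``Resource`` type and
``resource_free()`` are hypothetical::

    #include "qemu/atomic.h"

    typedef struct Resource {
        int refcount;
        /* ... payload ... */
    } Resource;

    void resource_ref(Resource *r)
    {
        atomic_inc(&r->refcount);
    }

    void resource_unref(Resource *r)
    {
        /* atomic_fetch_dec() returns the value before the decrement,
         * so a return value of 1 means this was the last reference.
         */
        if (atomic_fetch_dec(&r->refcount) == 1) {
            resource_free(r);
        }
    }
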
Similar operations return the new value of ``*ptr``::

  typeof(*ptr) atomic_inc_fetch(ptr)
  typeof(*ptr) atomic_dec_fetch(ptr)
  typeof(*ptr) atomic_add_fetch(ptr, val)
  typeof(*ptr) atomic_sub_fetch(ptr, val)
  typeof(*ptr) atomic_and_fetch(ptr, val)
  typeof(*ptr) atomic_or_fetch(ptr, val)
  typeof(*ptr) atomic_xor_fetch(ptr, val)

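``atomic_cmpxchg`` is typically used in a retry loop: read the current
value, compute the desired new value, and retry if another thread changed
the variable in the meantime.  A sketch with a hypothetical helper::

    #include <stdbool.h>
    #include "qemu/atomic.h"

    /* Increment *ptr, but never beyond limit.  Returns false if the
     * counter was already at the limit.
     */
    bool counter_inc_up_to(int *ptr, int limit)
    {
        int old = atomic_read(ptr);

        while (old < limit) {
            /* atomic_cmpxchg() returns the value that was actually
             * seen: if it equals old, the swap succeeded.
             */
            int seen = atomic_cmpxchg(ptr, old, old + 1);
            if (seen == old) {
                return true;
            }
            old = seen;     /* somebody else won the race; retry */
        }
        return false;
    }
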
``qemu/atomic.h`` also provides loads and stores that cannot be reordered
with each other::

  typeof(*ptr) atomic_mb_read(ptr)
  void atomic_mb_set(ptr, val)

However these do not provide sequential consistency and, in particular,
they do not participate in the total ordering enforced by
sequentially-consistent operations.  For this reason they are deprecated.
They should instead be replaced with any of the following (ordered from
easiest to hardest):

- accesses inside a mutex or spinlock

- lightweight synchronization primitives such as ``QemuEvent``

- RCU operations (``atomic_rcu_read``, ``atomic_rcu_set``) when publishing
  or accessing a new version of a data structure

- other atomic accesses: ``atomic_read`` and ``atomic_load_acquire`` for
  loads, ``atomic_set`` and ``atomic_store_release`` for stores, ``smp_mb``
  to forbid reordering subsequent loads before a store (sketched below).
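
A sketch of that last option, assuming a hypothetical ``flag`` variable::

    /* instead of atomic_mb_set(&flag, true): */
    atomic_store_release(&flag, true);
    smp_mb();     /* forbid reordering of later loads before the store */

    /* instead of val = atomic_mb_read(&flag): */
    val = atomic_load_acquire(&flag);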


Weak atomic access and manual memory barriers
=============================================

Compared to sequentially consistent atomic access, programming with
weaker consistency models can be considerably more complicated.
The only guarantees that you can rely upon in this case are:

- atomic accesses will not cause data races (and hence undefined behavior);
  ordinary accesses instead cause data races if they are concurrent with
  other accesses of which at least one is a write.  In order to ensure this,
  the compiler will not optimize accesses out of existence, create unsolicited
  accesses, or perform other similar optimizations.

- acquire operations will appear to happen, with respect to the other
  components of the system, before all the LOAD or STORE operations
  specified afterwards.

- release operations will appear to happen, with respect to the other
  components of the system, after all the LOAD or STORE operations
  specified before.

- release operations will *synchronize with* acquire operations;
  see :ref:`acqrel` for a detailed explanation.

When using this model, variables are accessed with:

- ``atomic_read()`` and ``atomic_set()``; these prevent the compiler from
  optimizing accesses out of existence and creating unsolicited
  accesses, but do not otherwise impose any ordering on loads and
  stores: both the compiler and the processor are free to reorder
  them.

- ``atomic_load_acquire()``, which guarantees the LOAD to appear to
  happen, with respect to the other components of the system,
  before all the LOAD or STORE operations specified afterwards.
  Operations coming before ``atomic_load_acquire()`` can still be
  reordered after it.

- ``atomic_store_release()``, which guarantees the STORE to appear to
  happen, with respect to the other components of the system,
  after all the LOAD or STORE operations specified before.
  Operations coming after ``atomic_store_release()`` can still be
  reordered before it.

Restrictions to the ordering of accesses can also be specified
using the memory barrier macros: ``smp_rmb()``, ``smp_wmb()``, ``smp_mb()``,
``smp_mb_acquire()``, ``smp_mb_release()``, ``smp_read_barrier_depends()``.

Memory barriers control the order of references to shared memory.
They come in six kinds:

- ``smp_rmb()`` guarantees that all the LOAD operations specified before
  the barrier will appear to happen before all the LOAD operations
  specified after the barrier with respect to the other components of
  the system.

  In other words, ``smp_rmb()`` puts a partial ordering on loads, but is not
  required to have any effect on stores.

- ``smp_wmb()`` guarantees that all the STORE operations specified before
  the barrier will appear to happen before all the STORE operations
  specified after the barrier with respect to the other components of
  the system.

  In other words, ``smp_wmb()`` puts a partial ordering on stores, but is not
  required to have any effect on loads.

- ``smp_mb_acquire()`` guarantees that all the LOAD operations specified before
  the barrier will appear to happen before all the LOAD or STORE operations
  specified after the barrier with respect to the other components of
  the system.

- ``smp_mb_release()`` guarantees that all the STORE operations specified *after*
  the barrier will appear to happen after all the LOAD or STORE operations
  specified *before* the barrier with respect to the other components of
  the system.

- ``smp_mb()`` guarantees that all the LOAD and STORE operations specified
  before the barrier will appear to happen before all the LOAD and
  STORE operations specified after the barrier with respect to the other
  components of the system.

  ``smp_mb()`` puts a partial ordering on both loads and stores.  It is
  stronger than both a read and a write memory barrier; it implies both
  ``smp_mb_acquire()`` and ``smp_mb_release()``, but it also prevents STOREs
  coming before the barrier from overtaking LOADs coming after the
  barrier and vice versa.

- ``smp_read_barrier_depends()`` is a weaker kind of read barrier.  On
  most processors, whenever two loads are performed such that the
  second depends on the result of the first (e.g., the first load
  retrieves the address to which the second load will be directed),
  the processor will guarantee that the first LOAD will appear to happen
  before the second with respect to the other components of the system.
  However, this is not always true---for example, it was not true on
  Alpha processors.  Whenever this kind of access happens to shared
  memory (that is not protected by a lock), a read barrier is needed,
  and ``smp_read_barrier_depends()`` can be used instead of ``smp_rmb()``.

  Note that the first load really has to have a *data* dependency and not
  a control dependency.  If the address for the second load is dependent
  on the first load, but the dependency is through a conditional rather
  than actually loading the address itself, then it's a *control*
  dependency and a full read barrier or better is required.
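
  As a sketch (all variables hypothetical), the two cases look like this::

      /* data dependency: the second load uses the loaded pointer,
       * so smp_read_barrier_depends() is sufficient.
       */
      p = atomic_read(&published_ptr);
      smp_read_barrier_depends();
      val = p->field;

      /* control dependency: the address of the second load does not
       * come from the first load, only the decision to do it does;
       * a full read barrier is needed.
       */
      if (atomic_read(&flag)) {
          smp_rmb();
          val = atomic_read(&data);
      }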


Memory barriers and ``atomic_load_acquire``/``atomic_store_release`` are
mostly used when a data structure has one thread that is always a writer
and one thread that is always a reader:

+----------------------------------+----------------------------------+
| thread 1                         | thread 2                         |
+==================================+==================================+
| ::                               | ::                               |
|                                  |                                  |
|     atomic_store_release(&a, x); |     y = atomic_load_acquire(&b); |
|     atomic_store_release(&b, y); |     x = atomic_load_acquire(&a); |
+----------------------------------+----------------------------------+

In this case, correctness is easy to check for using the "pairing"
trick that is explained below.

Sometimes, a thread is accessing many variables that are otherwise
unrelated to each other (for example because, apart from the current
thread, exactly one other thread will read or write each of these
variables).  In this case, it is possible to "hoist" the barriers
outside a loop.  For example:

+------------------------------------------+----------------------------------+
| before                                   | after                            |
+==========================================+==================================+
| ::                                       | ::                               |
|                                          |                                  |
|     n = 0;                               |     n = 0;                       |
|     for (i = 0; i < 10; i++)             |     for (i = 0; i < 10; i++)     |
|       n += atomic_load_acquire(&a[i]);   |       n += atomic_read(&a[i]);   |
|                                          |     smp_mb_acquire();            |
+------------------------------------------+----------------------------------+
| ::                                       | ::                               |
|                                          |                                  |
|                                          |     smp_mb_release();            |
|     for (i = 0; i < 10; i++)             |     for (i = 0; i < 10; i++)     |
|       atomic_store_release(&a[i], false);|       atomic_set(&a[i], false);  |
+------------------------------------------+----------------------------------+

Splitting a loop can also be useful to reduce the number of barriers:

+------------------------------------------+----------------------------------+
| before                                   | after                            |
+==========================================+==================================+
| ::                                       | ::                               |
|                                          |                                  |
|     n = 0;                               |     smp_mb_release();            |
|     for (i = 0; i < 10; i++) {           |     for (i = 0; i < 10; i++)     |
|       atomic_store_release(&a[i], false);|       atomic_set(&a[i], false);  |
|       smp_mb();                          |     smp_mb();                    |
|       n += atomic_read(&b[i]);           |     n = 0;                       |
|     }                                    |     for (i = 0; i < 10; i++)     |
|                                          |       n += atomic_read(&b[i]);   |
+------------------------------------------+----------------------------------+

In this case, a ``smp_mb_release()`` is also replaced with a (possibly
cheaper, and clearer as well) ``smp_wmb()``:

+------------------------------------------+----------------------------------+
| before                                   | after                            |
+==========================================+==================================+
| ::                                       | ::                               |
|                                          |                                  |
|                                          |     smp_mb_release();            |
|     for (i = 0; i < 10; i++) {           |     for (i = 0; i < 10; i++)     |
|       atomic_store_release(&a[i], false);|       atomic_set(&a[i], false);  |
|       atomic_store_release(&b[i], false);|     smp_wmb();                   |
|     }                                    |     for (i = 0; i < 10; i++)     |
|                                          |       atomic_set(&b[i], false);  |
+------------------------------------------+----------------------------------+


.. _acqrel:

Acquire/release pairing and the *synchronizes-with* relation
--------------------------------------------------------------

Atomic operations other than ``atomic_set()`` and ``atomic_read()`` have
either *acquire* or *release* semantics [#rmw]_.  This has two effects:

.. [#rmw] Read-modify-write operations can have both---acquire applies to the
   read part, and release to the write.

- within a thread, they are ordered either before subsequent operations
  (for acquire) or after previous operations (for release).

- if a release operation in one thread *synchronizes with* an acquire operation
  in another thread, the ordering constraints propagate from the first to the
  second thread.  That is, everything before the release operation in the
  first thread is guaranteed to *happen before* everything after the
  acquire operation in the second thread.

The concept of acquire and release semantics is not exclusive to atomic
operations; almost all higher-level synchronization primitives also have
acquire or release semantics.  For example:

- ``pthread_mutex_lock`` has acquire semantics, ``pthread_mutex_unlock`` has
  release semantics and synchronizes with a ``pthread_mutex_lock`` for the
  same mutex.

- ``pthread_cond_signal`` and ``pthread_cond_broadcast`` have release semantics;
  ``pthread_cond_wait`` has both release semantics (synchronizing with
  ``pthread_mutex_lock``) and acquire semantics (synchronizing with
  ``pthread_mutex_unlock`` and signaling of the condition variable).

- ``pthread_create`` has release semantics and synchronizes with the start
  of the new thread; ``pthread_join`` has acquire semantics and synchronizes
  with the exiting of the thread.

- ``qemu_event_set`` has release semantics, ``qemu_event_wait`` has
  acquire semantics.

For example, in the following code there are no atomic accesses, but still
thread 2 relies on the *synchronizes-with* relation between ``pthread_exit``
(release) and ``pthread_join`` (acquire):

+----------------------+-------------------------------+
| thread 1             | thread 2                      |
+======================+===============================+
| ::                   | ::                            |
|                      |                               |
|     *a = 1;          |                               |
|     pthread_exit(a); |     pthread_join(thread1, &a);|
|                      |     x = *a;                   |
+----------------------+-------------------------------+

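The same pairing appears with ``QemuEvent``.  This is only a sketch;
``ev``, ``result``, ``compute()`` and ``use()`` are hypothetical, and
``ev`` is assumed to have been initialized with ``qemu_event_init``:

+----------------------------------+----------------------------------+
| thread 1                         | thread 2                         |
+==================================+==================================+
| ::                               | ::                               |
|                                  |                                  |
|     result = compute();          |                                  |
|     qemu_event_set(&ev);         |     qemu_event_wait(&ev);        |
|                                  |     use(result);                 |
+----------------------------------+----------------------------------+
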
Synchronization between threads basically descends from this pairing of
a release operation and an acquire operation.  Therefore, atomic operations
other than ``atomic_set()`` and ``atomic_read()`` will almost always be
paired with another operation of the opposite kind: an acquire operation
will pair with a release operation and vice versa.  This rule of thumb is
extremely useful; in the case of QEMU, however, note that the other
operation may actually be in a driver that runs in the guest!

``smp_read_barrier_depends()``, ``smp_rmb()``, ``smp_mb_acquire()``,
``atomic_load_acquire()`` and ``atomic_rcu_read()`` all count
as acquire operations.  ``smp_wmb()``, ``smp_mb_release()``,
``atomic_store_release()`` and ``atomic_rcu_set()`` all count as release
operations.  ``smp_mb()`` counts as both acquire and release, therefore
it can pair with any other atomic operation.  Here is an example:

+------------------------+------------------------------+
| thread 1               | thread 2                     |
+========================+==============================+
| ::                     | ::                           |
|                        |                              |
|     atomic_set(&a, 1); |                              |
|     smp_wmb();         |                              |
|     atomic_set(&b, 2); |     x = atomic_read(&b);     |
|                        |     smp_rmb();               |
|                        |     y = atomic_read(&a);     |
+------------------------+------------------------------+

Note that a load-store pair only counts if the two operations access the
same variable: that is, a store-release on a variable ``x`` *synchronizes
with* a load-acquire on a variable ``x``, while a release barrier
synchronizes with any acquire operation.  The following example shows
correct synchronization:

+----------------------------------+----------------------------------+
| thread 1                         | thread 2                         |
+==================================+==================================+
| ::                               | ::                               |
|                                  |                                  |
|     atomic_set(&a, 1);           |                                  |
|     atomic_store_release(&b, 2); |     x = atomic_load_acquire(&b); |
|                                  |     y = atomic_read(&a);         |
+----------------------------------+----------------------------------+

Acquire and release semantics of higher-level primitives can also be
relied upon for the purpose of establishing the *synchronizes with*
relation.

Note that the "writing" thread is accessing the variables in the
opposite order from the "reading" thread.  This is expected: stores
before a release operation will normally match the loads after
the acquire operation, and vice versa.  In fact, this happened already
in the ``pthread_exit``/``pthread_join`` example above.

Finally, this more complex example has more than two accesses and data
dependency barriers.  It also does not use atomic accesses whenever there
cannot be a data race:

+------------------------+---------------------------------+
| thread 1               | thread 2                        |
+========================+=================================+
| ::                     | ::                              |
|                        |                                 |
|     b[2] = 1;          |                                 |
|     smp_wmb();         |                                 |
|     x->i = 2;          |                                 |
|     smp_wmb();         |                                 |
|     atomic_set(&a, x); |     x = atomic_read(&a);        |
|                        |     smp_read_barrier_depends(); |
|                        |     y = x->i;                   |
|                        |     smp_read_barrier_depends(); |
|                        |     z = b[y];                   |
+------------------------+---------------------------------+


Comparison with Linux kernel primitives
=======================================

Here is a list of differences between Linux kernel atomic operations
and memory barriers, and the equivalents in QEMU:

- atomic operations in Linux are always on a 32-bit int type and
  use a boxed ``atomic_t`` type; atomic operations in QEMU are polymorphic
  and use normal C types.

- Originally, ``atomic_read`` and ``atomic_set`` in Linux gave no guarantee
  at all.  Linux 4.1 updated them to implement volatile
  semantics via ``ACCESS_ONCE`` (or the more recent ``READ``/``WRITE_ONCE``).

  QEMU's ``atomic_read`` and ``atomic_set`` implement C11 atomic relaxed
  semantics if the compiler supports it, and volatile semantics otherwise.
  Both semantics prevent the compiler from doing certain transformations;
  the difference is that atomic accesses are guaranteed to be atomic,
  while volatile accesses aren't.  Thus, in the volatile case we just cross
  our fingers hoping that the compiler will generate atomic accesses,
  since we assume the variables passed are machine-word sized and
  properly aligned.

  No barriers are implied by ``atomic_read`` and ``atomic_set`` in either Linux
  or QEMU.

- atomic read-modify-write operations in Linux are of three kinds:

  ===================== =========================================
  ``atomic_OP``         returns void
  ``atomic_OP_return``  returns new value of the variable
  ``atomic_fetch_OP``   returns the old value of the variable
  ``atomic_cmpxchg``    returns the old value of the variable
  ===================== =========================================

  In QEMU, the second kind is named ``atomic_OP_fetch``.
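
  For instance, incrementing a counter and using the new value could be
  written as follows (a sketch; ``counter`` and ``new`` are hypothetical)::

      /* Linux */
      atomic_t counter;
      new = atomic_inc_return(&counter);

      /* QEMU */
      int counter;
      new = atomic_inc_fetch(&counter);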

- different atomic read-modify-write operations in Linux imply
  a different set of memory barriers; in QEMU, all of them enforce
  sequential consistency.

- in QEMU, ``atomic_read()`` and ``atomic_set()`` do not participate in
  the total ordering enforced by sequentially-consistent operations.
  This is because QEMU uses the C11 memory model.  The following example
  is correct in Linux but not in QEMU:

    +----------------------------------+----------------------------------+
    | Linux (correct)                  | QEMU (incorrect)                 |
    +==================================+==================================+
    | ::                               | ::                               |
    |                                  |                                  |
    |     a = atomic_fetch_add(&x, 2); |     a = atomic_fetch_add(&x, 2); |
    |     b = READ_ONCE(y);            |     b = atomic_read(&y);         |
    +----------------------------------+----------------------------------+

  because the read of ``y`` can be moved (by either the processor or the
  compiler) before the write of ``x``.

  Fixing this requires an ``smp_mb()`` memory barrier between the write
  of ``x`` and the read of ``y``.  In the common case where only one thread
  writes ``x``, it is also possible to write it like this:

    +--------------------------------+
    | QEMU (correct)                 |
    +================================+
    | ::                             |
    |                                |
    |     a = atomic_read(&x);       |
    |     atomic_set(&x, a + 2);     |
    |     smp_mb();                  |
    |     b = atomic_read(&y);       |
    +--------------------------------+


Sources
=======

- ``Documentation/memory-barriers.txt`` from the Linux kernel