-CACHE COHERENCY
----------------
-
-Life isn't quite as simple as it may appear above, however: for while the
-caches are expected to be coherent, there's no guarantee that that coherency
-will be ordered. This means that while changes made on one CPU will
-eventually become visible on all CPUs, there's no guarantee that they will
-become apparent in the same order on those other CPUs.
-
-
-Consider dealing with a system that has a pair of CPUs (1 & 2), each of which
-has a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D):
-
- :
- : +--------+
- : +---------+ | |
- +--------+ : +--->| Cache A |<------->| |
- | | : | +---------+ | |
- | CPU 1 |<---+ | |
- | | : | +---------+ | |
- +--------+ : +--->| Cache B |<------->| |
- : +---------+ | |
- : | Memory |
- : +---------+ | System |
- +--------+ : +--->| Cache C |<------->| |
- | | : | +---------+ | |
- | CPU 2 |<---+ | |
- | | : | +---------+ | |
- +--------+ : +--->| Cache D |<------->| |
- : +---------+ | |
- : +--------+
- :
-
-Imagine the system has the following properties:
-
- (*) an odd-numbered cache line may be in cache A, cache C or it may still be
- resident in memory;
-
- (*) an even-numbered cache line may be in cache B, cache D or it may still be
- resident in memory;
-
- (*) while the CPU core is interrogating one cache, the other cache may be
- making use of the bus to access the rest of the system - perhaps to
- displace a dirty cacheline or to do a speculative load;
-
- (*) each cache has a queue of operations that need to be applied to that cache
- to maintain coherency with the rest of the system;
-
- (*) the coherency queue is not flushed by normal loads to lines already
- present in the cache, even though the contents of the queue may
- potentially affect those loads.
-
-Imagine, then, that two writes are made on the first CPU, with a write barrier
-between them to guarantee that they will appear to reach that CPU's caches in
-the requisite order:
-
- CPU 1 CPU 2 COMMENT
- =============== =============== =======================================
- u == 0, v == 1 and p == &u, q == &u
- v = 2;
- smp_wmb(); Make sure change to v is visible before
- change to p
- <A:modify v=2> v is now in cache A exclusively
- p = &v;
- <B:modify p=&v> p is now in cache B exclusively
-
-The write memory barrier forces the other CPUs in the system to perceive that
-the local CPU's caches have apparently been updated in the correct order. But
-now imagine that the second CPU wants to read those values:
-
- CPU 1 CPU 2 COMMENT
- =============== =============== =======================================
- ...
- q = p;
- x = *q;
-
-The above pair of reads may then fail to happen in the expected order, as the
-cacheline holding p may get updated in one of the second CPU's caches while
-the update to the cacheline holding v is delayed in the other of the second
-CPU's caches by some other cache event:
-
- CPU 1 CPU 2 COMMENT
- =============== =============== =======================================
- u == 0, v == 1 and p == &u, q == &u
- v = 2;
- smp_wmb();
- <A:modify v=2> <C:busy>
- <C:queue v=2>
- p = &v; q = p;
- <D:request p>
- <B:modify p=&v> <D:commit p=&v>
- <D:read p>
- x = *q;
- <C:read *q> Reads from v before v updated in cache
- <C:unbusy>
- <C:commit v=2>
-
-Basically, while both cachelines will be updated on CPU 2 eventually, there's
-no guarantee that, without intervention, the order of update will be the same
-as that committed on CPU 1.
-
-
-To intervene, we need to interpolate a data dependency barrier or a read
-barrier between the loads (which as of v4.15 is supplied unconditionally
-by the READ_ONCE() macro). This will force the cache to commit its
-coherency queue before processing any further requests:
-
- CPU 1 CPU 2 COMMENT
- =============== =============== =======================================
- u == 0, v == 1 and p == &u, q == &u
- v = 2;
- smp_wmb();
- <A:modify v=2> <C:busy>
- <C:queue v=2>
- p = &v; q = p;
- <D:request p>
- <B:modify p=&v> <D:commit p=&v>
- <D:read p>
- smp_read_barrier_depends()
- <C:unbusy>
- <C:commit v=2>
- x = *q;
- <C:read *q> Reads from v after v updated in cache
-
-
-This sort of problem can be encountered on DEC Alpha processors as they have a
-split cache that improves performance by making better use of the data bus.
-While most CPUs do imply a data dependency barrier on the read when a memory
-access depends on a read, not all do, so it may not be relied on.
-
-Other CPUs may also have split caches, but must coordinate between the various
-cachelets for normal memory accesses. The semantics of the Alpha removes the
-need for hardware coordination in the absence of memory barriers, which
-permitted Alpha to sport higher CPU clock rates back in the day. However,
-please note that (again, as of v4.15) smp_read_barrier_depends() should not
-be used except in Alpha arch-specific code and within the READ_ONCE() macro.
-
-