Generic Mutex Subsystem

started by Ingo Molnar <[email protected]>

  "Why on earth do we need a new mutex subsystem, and what's wrong
   with semaphores?"

firstly, there's nothing wrong with semaphores. But if the simpler
mutex semantics are sufficient for your code, then there are a couple
of advantages of mutexes:

 - 'struct mutex' is smaller on most architectures: e.g. on x86,
   'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
   A smaller structure size means less RAM footprint, and better
   CPU-cache utilization.

 - tighter code. On x86 i get the following .text sizes when
   switching all mutex-alike semaphores in the kernel to the mutex
   subsystem:

        text    data     bss     dec     hex filename
     3280380  868188  396860 4545428  455b94 vmlinux-semaphore
     3255329  865296  396732 4517357  44eded vmlinux-mutex

   that's 25051 bytes of code saved, or a 0.76% win - off the hottest
   codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%.)
   Smaller code means better icache footprint, which is one of the
   major optimization goals in the Linux kernel currently.

 - the mutex subsystem is slightly faster and has better scalability for
   contended workloads. On an 8-way x86 system, running a mutex-based
   kernel and testing creat+unlink+close (of separate, per-task files)
   in /tmp with 16 parallel tasks, the average number of ops/sec is:

   Semaphores:                        Mutexes:

   $ ./test-mutex V 16 10             $ ./test-mutex V 16 10
   8 CPUs, running 16 tasks.          8 CPUs, running 16 tasks.
   checking VFS performance.          checking VFS performance.
   avg loops/sec:      34713          avg loops/sec:      84153
   CPU utilization:      63%          CPU utilization:      22%

   i.e. in this workload, the mutex based kernel was 2.4 times faster
   than the semaphore based kernel, _and_ it also had 2.8 times less CPU
   utilization. (In terms of 'ops per CPU cycle', the semaphore kernel
   performed 551 ops/sec per 1% of CPU time used, while the mutex kernel
   performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times
   more efficient.)

   the scalability difference is visible even on a 2-way P4 HT box:

   Semaphores:                        Mutexes:

   $ ./test-mutex V 16 10             $ ./test-mutex V 16 10
   4 CPUs, running 16 tasks.          4 CPUs, running 16 tasks.
   checking VFS performance.          checking VFS performance.
   avg loops/sec:      127659         avg loops/sec:      181082
   CPU utilization:      100%         CPU utilization:      34%

   (the straight performance advantage of mutexes is 41%, the per-cycle
   efficiency of mutexes is 4.1 times better.)

 - there are no fastpath tradeoffs, the mutex fastpath is just as tight
   as the semaphore fastpath. On x86, the locking fastpath is 2
   instructions:

     c0377ccb <mutex_lock>:
     c0377ccb:       f0 ff 08             lock decl (%eax)
     c0377cce:       78 0e                js     c0377cde <.text.lock.mutex>
     c0377cd0:       c3                   ret

   the unlocking fastpath is equally tight:

     c0377cd1 <mutex_unlock>:
     c0377cd1:       f0 ff 00             lock incl (%eax)
     c0377cd4:       7e 0f                jle    c0377ce5 <.text.lock.mutex+0x7>
     c0377cd6:       c3                   ret

 - 'struct mutex' semantics are well-defined and are enforced if
   CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have
   virtually no debugging code or instrumentation. The mutex subsystem
   checks and enforces the following rules (a short example of misuse
   that these checks catch follows the lists below):

   - only one task can hold the mutex at a time
   - only the owner can unlock the mutex
   - multiple unlocks are not permitted
   - recursive locking is not permitted
   - a mutex object must be initialized via the API
   - a mutex object must not be initialized via memset or copying
   - task may not exit with mutex held
   - memory areas where held locks reside must not be freed
   - held mutexes must not be reinitialized
   - mutexes may not be used in hardware or software interrupt
     contexts such as tasklets and timers

   furthermore, there are also convenience features in the debugging
   code:

   - uses symbolic names of mutexes, whenever they are printed in debug output
   - point-of-acquire tracking, symbolic lookup of function names
   - list of all locks held in the system, printout of them
   - owner tracking
   - detects self-recursing locks and prints out all relevant info
   - detects multi-task circular deadlocks and prints out all affected
     locks and tasks (and only those tasks)

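To make those checks more concrete, here is a minimal (and deliberately
buggy) sketch of the kind of misuse the debugging code is meant to catch.
The names (my_lock, buggy_unlock_worker, buggy_example) are invented for
this illustration and do not exist in the kernel:

    #include <linux/mutex.h>
    #include <linux/kthread.h>

    static DEFINE_MUTEX(my_lock);

    /* runs in a different task than the one that locked my_lock */
    static int buggy_unlock_worker(void *unused)
    {
            /* violates "only the owner can unlock the mutex" */
            mutex_unlock(&my_lock);
            return 0;
    }

    static void buggy_example(void)
    {
            mutex_lock(&my_lock);

            /* handing the unlock over to another task is a semaphore-style
               pattern; the debugging code flags the non-owner unlock done
               in buggy_unlock_worker() */
            kthread_run(buggy_unlock_worker, NULL, "mutex-misuse");

            mutex_unlock(&my_lock);
            /* violates "multiple unlocks are not permitted" */
            mutex_unlock(&my_lock);
    }
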
Disadvantages
-------------

The stricter mutex API means you cannot use mutexes the same way you
can use semaphores: e.g. they cannot be used from an interrupt context,
nor can they be unlocked from a different context than the one that
acquired them. [ I'm not aware of any other (e.g. performance)
disadvantages from using mutexes at the moment, please let me know if
you find any. ]

Implementation of mutexes
-------------------------

'struct mutex' is the new mutex type, defined in include/linux/mutex.h
and implemented in kernel/mutex.c. It is a counter-based mutex with a
spinlock and a wait-list. The counter has 3 states: 1 for "unlocked",
0 for "locked" and negative numbers (usually -1) for "locked, potential
waiters queued".

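As a rough sketch of that description (simplified for illustration - the
authoritative definition is the one in include/linux/mutex.h, which also
carries extra fields when CONFIG_DEBUG_MUTEXES is enabled), the type looks
conceptually like this:

    struct mutex {
            /* 1: unlocked, 0: locked, negative: locked, waiters queued */
            atomic_t                count;

            /* protects the wait-list below */
            spinlock_t              wait_lock;

            /* tasks blocked waiting for the mutex to be released */
            struct list_head        wait_list;
    };
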
the APIs of 'struct mutex' have been streamlined:

 DEFINE_MUTEX(name);

 mutex_init(mutex);

 void mutex_lock(struct mutex *lock);
 int  mutex_lock_interruptible(struct mutex *lock);
 int  mutex_trylock(struct mutex *lock);
 void mutex_unlock(struct mutex *lock);
 int  mutex_is_locked(struct mutex *lock);
 void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
 int  mutex_lock_interruptible_nested(struct mutex *lock,
                                      unsigned int subclass);

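As a quick usage sketch of the calls above (the names my_driver_lock,
my_driver_state, struct my_object and the helper functions are invented
for this example):

    #include <linux/mutex.h>
    #include <linux/errno.h>

    static DEFINE_MUTEX(my_driver_lock);    /* statically initialized */
    static int my_driver_state;

    /* plain lock: sleeps until the mutex is acquired */
    static void set_state(int val)
    {
            mutex_lock(&my_driver_lock);
            my_driver_state = val;
            mutex_unlock(&my_driver_lock);
    }

    /* interruptible lock: gives up and returns an error if a signal
       arrives while sleeping, so the return value must be checked */
    static int set_state_interruptible(int val)
    {
            if (mutex_lock_interruptible(&my_driver_lock))
                    return -EINTR;
            my_driver_state = val;
            mutex_unlock(&my_driver_lock);
            return 0;
    }

    /* a mutex embedded in a dynamically allocated object is set up
       with mutex_init() before first use */
    struct my_object {
            struct mutex    lock;
            int             data;
    };

    static void my_object_setup(struct my_object *obj)
    {
            mutex_init(&obj->lock);
            obj->data = 0;
    }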