Since the 2.6 kernel, the oom killer has slightly biased away from
CAP_SYS_ADMIN processes by discounting some of its memory usage in
comparison to other processes.
This has always been implicit and nothing exactly relies on the
behavior.
Gaurav notices that __task_cred() can dereference a potentially freed
pointer if the task under consideration is exiting because a reference
to the task_struct is not held.
Remove the CAP_SYS_ADMIN bias so that all processes are treated equally.
If any CAP_SYS_ADMIN process would like to be biased against, it is
always allowed to adjust /proc/pid/oom_score_adj.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Rientjes <[email protected]>
Reported-by: Gaurav Kohli <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
mm_pgtables_bytes(p->mm) / PAGE_SIZE;
task_unlock(p);
- /*
- * Root processes get 3% bonus, just like the __vm_enough_memory()
- * implementation used by LSMs.
- */
- if (has_capability_noaudit(p, CAP_SYS_ADMIN))
- points -= (points * 3) / 100;
-
/* Normalize to oom_score_adj units */
adj *= totalpages / 1000;
points += adj;