sched/core: Optimize ttwu() spinning on p->on_cpu
Both Rik and Mel reported seeing ttwu() spend significant time on:
smp_cond_load_acquire(&p->on_cpu, !VAL);
Attempt to avoid this by queueing the wakeup on the CPU that owns the
p->on_cpu value. This will then allow the ttwu() to complete without
further waiting.
Since we run schedule() with interrupts disabled, the IPI is
guaranteed to happen after p->on_cpu is cleared, this is what makes it
safe to queue early.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Jirka Hladky <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: [email protected]
Cc: Hillf Danton <[email protected]>
Cc: Rik van Riel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]