timer/migration: Remove buggy early return on deactivation
When a CPU enters into idle and deactivates itself from the timer
migration hierarchy without any global timer of its own to propagate,
the group event of that CPU is set to "ignore" and tmigr_update_events()
accordingly performs an early return without considering timers queued
by other CPUs.
If the hierarchy has a single level, and the CPU is the last one to
enter idle, it will ignore others' global timers, as in the following
layout:
[GRP0:0]
migrator = 0
active = 0
nextevt = T0i
/ \
0 1
active (T0i) idle (T1)
0) CPU 0 is active thus its event is ignored (the letter 'i') and so are
upper levels' events. CPU 1 is idle and has the timer T1 enqueued.
[GRP0:0]
migrator = NONE
active = NONE
nextevt = T0i
/ \
0 1
idle (T0i) idle (T1)
1) CPU 0 goes idle without global event queued. Therefore KTIME_MAX is
pushed as its next expiry and its own event kept as "ignore". As a result
tmigr_update_events() ignores T1 and CPU 0 goes to idle with T1
unhandled.
This isn't proper to single level hierarchy though. A similar issue,
although slightly different, may arise on multi-level:
[GRP1:0]
migrator = GRP0:0
active = GRP0:0
nextevt = T0:0i, T0:1
/ \
[GRP0:0] [GRP0:1]
migrator = 0 migrator = NONE
active = 0 active = NONE
nextevt = T0i nextevt = T2
/ \ / \
0 (T0i) 1 (T1) 2 (T2) 3
active idle idle idle
0) CPU 0 is active thus its event is ignored (the letter 'i') and so are
upper levels' events. CPU 1 is idle and has the timer T1 enqueued.
CPU 2 also has a timer. The expiry order is T0 (ignored) < T1 < T2
[GRP1:0]
migrator = GRP0:0
active = GRP0:0
nextevt = T0:0i, T0:1
/ \
[GRP0:0] [GRP0:1]
migrator = NONE migrator = NONE
active = NONE active = NONE
nextevt = T0i nextevt = T2
/ \ / \
0 (T0i) 1 (T1) 2 (T2) 3
idle idle idle idle
1) CPU 0 goes idle without global event queued. Therefore KTIME_MAX is
pushed as its next expiry and its own event kept as "ignore". As a result
tmigr_update_events() ignores T1. The change only propagated up to 1st
level so far.
[GRP1:0]
migrator = NONE
active = NONE
nextevt = T0:1
/ \
[GRP0:0] [GRP0:1]
migrator = NONE migrator = NONE
active = NONE active = NONE
nextevt = T0i nextevt = T2
/ \ / \
0 (T0i) 1 (T1) 2 (T2) 3
idle idle idle idle
2) The change now propagates up to the top. tmigr_update_events() finds
that the child event is ignored and thus removes it. The top level next
event is now T2 which is returned to CPU 0 as its next effective expiry
to take account for as the global idle migrator. However T1 has been
ignored along the way, leaving it unhandled.
Fix those issues with removing the buggy related early return. Ignored
child events must not prevent from evaluating the other events within
the same group.
Reported-by: Boqun Feng <[email protected]>
Reported-by: Florian Fainelli <[email protected]>
Reported-by: Thomas Gleixner <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/ZfOhB9ZByTZcBy4u@lothringen