pidfd: fix a poll race when setting exit_state
There is a race between reading task->exit_state in pidfd_poll and
writing it after do_notify_parent calls do_notify_pidfd. Expected
sequence of events is:
CPU 0 CPU 1
------------------------------------------------
exit_notify
do_notify_parent
do_notify_pidfd
tsk->exit_state = EXIT_DEAD
pidfd_poll
if (tsk->exit_state)
However nothing prevents the following sequence:
CPU 0 CPU 1
------------------------------------------------
exit_notify
do_notify_parent
do_notify_pidfd
pidfd_poll
if (tsk->exit_state)
tsk->exit_state = EXIT_DEAD
This causes a polling task to wait forever, since poll blocks because
exit_state is 0 and the waiting task is not notified again. A stress
test continuously doing pidfd poll and process exits uncovered this bug.
To fix it, we make sure that the task's exit_state is always set before
calling do_notify_pidfd.
Fixes: b53b0b9d9a6 ("pidfd: add polling support")
Cc: [email protected]
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[
[email protected]: adapt commit message and drop unneeded changes from wait_task_zombie]
Signed-off-by: Christian Brauner <[email protected]>