doc.2021.01.06a: Documentation updates.
fixes.2021.01.04b: Miscellaneous fixes.
kfree_rcu.2021.01.04a: kfree_rcu() updates.
mmdumpobj.2021.01.22a: Dump allocation point for memory blocks.
nocb.2021.01.06a: RCU callback offload updates and cblist segment lengths.
rt.2021.01.04a: Real-time updates.
stall.2021.01.06a: RCU CPU stall warning updates.
torture.2021.01.12a: Torture-test updates and polling SRCU grace-period API.
tortureall.2021.01.06a: Torture-test script updates.
to allow the various kernel subsystems (including RCU) to respond
appropriately to a given CPU-hotplug operation. Most RCU operations may
be invoked from CPU-hotplug notifiers, including even synchronous
- grace-period operations such as ``synchronize_rcu()`` and
- ``synchronize_rcu_expedited()``.
-
- However, all-callback-wait operations such as ``rcu_barrier()`` are also
- not supported, due to the fact that there are phases of CPU-hotplug
- operations where the outgoing CPU's callbacks will not be invoked until
- after the CPU-hotplug operation ends, which could also result in
- deadlock. Furthermore, ``rcu_barrier()`` blocks CPU-hotplug operations
- during its execution, which results in another type of deadlock when
- invoked from a CPU-hotplug notifier.
-------grace-period operations such as (``synchronize_rcu()`` and
-------``synchronize_rcu_expedited()``). However, these synchronous operations
++++++++grace-period operations such as (synchronize_rcu() and
++++++++synchronize_rcu_expedited()). However, these synchronous operations
+ do block and therefore cannot be invoked from notifiers that execute via
-------``stop_machine()``, specifically those between the ``CPUHP_AP_OFFLINE``
++++++++stop_machine(), specifically those between the ``CPUHP_AP_OFFLINE``
+ and ``CPUHP_AP_ONLINE`` states.
+
-------In addition, all-callback-wait operations such as ``rcu_barrier()`` may
++++++++In addition, all-callback-wait operations such as rcu_barrier() may
+ not be invoked from any CPU-hotplug notifier. This restriction exists
+ because there are phases of CPU-hotplug operations where the outgoing
+ CPU's callbacks will not be invoked until after the CPU-hotplug
+ operation ends, which could result in deadlock. Furthermore,
-------``rcu_barrier()`` blocks CPU-hotplug operations during its execution,
++++++++rcu_barrier() blocks CPU-hotplug operations during its execution,
+ which results in another type of deadlock when invoked from a CPU-hotplug
+ notifier.
+
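As a minimal sketch of these rules (not part of the diff above), consider a
dynamically allocated hotplug state whose callbacks run in process context
rather than under stop_machine(); the names my_cpu_online(),
my_cpu_going_down(), my_driver_init(), and "mydrv:online" are hypothetical::

   #include <linux/cpuhotplug.h>
   #include <linux/rcupdate.h>

   static int my_cpu_online(unsigned int cpu)
   {
           /* CPUHP_AP_ONLINE_DYN callbacks may sleep, so this is legal. */
           synchronize_rcu();
           return 0;
   }

   static int my_cpu_going_down(unsigned int cpu)
   {
           /*
            * Synchronous grace-period waits are also legal here, but
            * rcu_barrier() must not be invoked from any CPU-hotplug
            * notifier, as described above.
            */
           synchronize_rcu_expedited();
           return 0;
   }

   static int __init my_driver_init(void)
   {
           int ret;

           /* Hypothetical registration; a real driver would also unregister. */
           ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mydrv:online",
                                   my_cpu_online, my_cpu_going_down);
           return ret < 0 ? ret : 0;
   }
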
+ Finally, RCU must avoid deadlocks due to interactions between CPU hotplug,
+ timers, and grace-period processing. It does so by maintaining its own set
+ of books that duplicate the centrally maintained ``cpu_online_mask``,
+ and also by reporting quiescent states explicitly when a CPU goes
+ offline. This explicit reporting of quiescent states avoids any need
+ for the force-quiescent-state loop (FQS) to report quiescent states for
+ offline CPUs. However, as a debugging measure, the FQS loop does splat
+ if offline CPUs block an RCU grace period for too long.
+
+ An offline CPU's quiescent state will be reported either:
+
-------1. As the CPU goes offline using RCU's hotplug notifier (``rcu_report_dead()``).
-------2. When grace period initialization (``rcu_gp_init()``) detects a
++++++++1. As the CPU goes offline using RCU's hotplug notifier (rcu_report_dead()).
++++++++2. When grace period initialization (rcu_gp_init()) detects a
+ race either with CPU offlining or with a task unblocking on a leaf
+ ``rcu_node`` structure whose CPUs are all offline.
+
-------The CPU-online path (``rcu_cpu_starting()``) should never need to report
++++++++The CPU-online path (rcu_cpu_starting()) should never need to report
+ a quiescent state for an offline CPU. However, as a debugging measure,
+ it does emit a warning if a quiescent state was not already reported
+ for that CPU.
+
+ During the checking/modification of RCU's hotplug bookkeeping, the
+ corresponding CPU's leaf node lock is held. This avoids race conditions
+ between RCU's hotplug notifier hooks, the grace period initialization
+ code, and the FQS loop, all of which refer to or modify this bookkeeping.
Scheduler and RCU
~~~~~~~~~~~~~~~~~
The `SRCU
API <https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
--------includes ``srcu_read_lock()``, ``srcu_read_unlock()``,
--------``srcu_dereference()``, ``srcu_dereference_check()``,
--------``synchronize_srcu()``, ``synchronize_srcu_expedited()``,
--------``call_srcu()``, ``srcu_barrier()``, and ``srcu_read_lock_held()``. It
--------also includes ``DEFINE_SRCU()``, ``DEFINE_STATIC_SRCU()``, and
--------``init_srcu_struct()`` APIs for defining and initializing
++++++++includes srcu_read_lock(), srcu_read_unlock(),
++++++++srcu_dereference(), srcu_dereference_check(),
++++++++synchronize_srcu(), synchronize_srcu_expedited(),
++++++++call_srcu(), srcu_barrier(), and srcu_read_lock_held(). It
++++++++also includes DEFINE_SRCU(), DEFINE_STATIC_SRCU(), and
++++++++init_srcu_struct() APIs for defining and initializing
``srcu_struct`` structures.
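As a minimal sketch of this API (not part of the diff above; my_srcu,
struct my_node, my_read(), and my_update() are hypothetical names, and a
single updater is assumed)::

   #include <linux/slab.h>
   #include <linux/srcu.h>

   DEFINE_STATIC_SRCU(my_srcu);

   struct my_node {
           int value;
   };

   static struct my_node __rcu *my_ptr;

   /* Reader: SRCU read-side critical sections are allowed to sleep. */
   static int my_read(void)
   {
           struct my_node *p;
           int idx, val = -1;

           idx = srcu_read_lock(&my_srcu);
           p = srcu_dereference(my_ptr, &my_srcu);
           if (p)
                   val = p->value;
           srcu_read_unlock(&my_srcu, idx);
           return val;
   }

   /* Updater: publish a new node, wait for readers, then free the old one. */
   static int my_update(int value)
   {
           struct my_node *newp, *oldp;

           newp = kzalloc(sizeof(*newp), GFP_KERNEL);
           if (!newp)
                   return -ENOMEM;
           newp->value = value;
           oldp = rcu_replace_pointer(my_ptr, newp, true); /* single updater */
           synchronize_srcu(&my_srcu); /* or call_srcu() to defer the free */
           kfree(oldp);
           return 0;
   }
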
+++++++ +More recently, the SRCU API has added polling interfaces:
+++++++ +
+++++++ +#. start_poll_synchronize_srcu() returns a cookie identifying
+++++++ + the completion of a future SRCU grace period and ensures
+++++++ + that this grace period will be started.
+++++++ +#. poll_state_synchronize_srcu() returns ``true`` iff the
+++++++ + specified cookie corresponds to an already-completed
+++++++ + SRCU grace period.
+++++++ +#. get_state_synchronize_srcu() returns a cookie just like
+++++++ + start_poll_synchronize_srcu() does, but differs in that
+++++++ + it does nothing to ensure that any future SRCU grace period
+++++++ + will be started.
+++++++ +
+++++++ +These functions are used to avoid unnecessary SRCU grace periods in
+++++++ +certain types of buffer-cache algorithms having multi-stage age-out
+++++++ +mechanisms. The idea is that by the time the block has aged completely
+++++++ +from the cache, an SRCU grace period will be very likely to have elapsed.
+++++++ +
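As a further sketch (again not part of the diff above), the polling
interfaces might be used in such an age-out scheme as follows; struct
my_block and the my_block_*() helpers are hypothetical::

   #include <linux/slab.h>
   #include <linux/srcu.h>

   DEFINE_STATIC_SRCU(my_srcu);   /* As in the earlier sketch. */

   struct my_block {
           unsigned long gp_cookie;
           /* ... cached data ... */
   };

   /* First age-out stage: record a cookie and start a grace period. */
   static void my_block_start_ageout(struct my_block *b)
   {
           b->gp_cookie = start_poll_synchronize_srcu(&my_srcu);
   }

   /*
    * Final stage: free the block, blocking only if that grace period
    * has not yet completed (rare if age-out is slow enough).
    */
   static void my_block_evict(struct my_block *b)
   {
           if (!poll_state_synchronize_srcu(&my_srcu, b->gp_cookie))
                   synchronize_srcu(&my_srcu);
           kfree(b);
   }

Had get_state_synchronize_srcu() been used in my_block_start_ageout()
instead, the later poll_state_synchronize_srcu() call would succeed only if
some other activity happened to start the needed grace period.
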
Tasks RCU
~~~~~~~~~
#include <linux/moduleparam.h>
#include <linux/delay.h>
#include <linux/slab.h>
- -------#include <linux/percpu-rwsem.h>
#include <linux/torture.h>
+ #include <linux/reboot.h>
MODULE_LICENSE("GPL");
return 0;
}
+++++++ +// Used by writers to randomly choose from the available grace-period
+++++++ +// primitives. The only purpose of the initialization is to size the array.
+++++++ +static int synctype[] = { RTWS_DEF_FREE, RTWS_EXP_SYNC, RTWS_COND_GET, RTWS_POLL_GET, RTWS_SYNC };
+++++++ +static int nsynctypes;
+++++++ +
/*
------- - * RCU torture writer kthread. Repeatedly substitutes a new structure
------- - * for that pointed to by rcu_torture_current, freeing the old structure
------- - * after a series of grace periods (the "pipeline").
+++++++ + * Determine which grace-period primitives are available.
*/
------- -static int
------- -rcu_torture_writer(void *arg)
+++++++ +static void rcu_torture_write_types(void)
{
------- - bool can_expedite = !rcu_gp_is_expedited() && !rcu_gp_is_normal();
------- - int expediting = 0;
------- - unsigned long gp_snap;
bool gp_cond1 = gp_cond, gp_exp1 = gp_exp, gp_normal1 = gp_normal;
------- - bool gp_sync1 = gp_sync;
------- - int i;
- ----- - int oldnice = task_nice(current);
------- - struct rcu_torture *rp;
------- - struct rcu_torture *old_rp;
------- - static DEFINE_TORTURE_RANDOM(rand);
- ----- - bool stutter_waited;
------- - int synctype[] = { RTWS_DEF_FREE, RTWS_EXP_SYNC,
------- - RTWS_COND_GET, RTWS_SYNC };
------- - int nsynctypes = 0;
------- -
------- - VERBOSE_TOROUT_STRING("rcu_torture_writer task started");
------- - if (!can_expedite)
------- - pr_alert("%s" TORTURE_FLAG
------- - " GP expediting controlled from boot/sysfs for %s.\n",
------- - torture_type, cur_ops->name);
+++++++ + bool gp_poll1 = gp_poll, gp_sync1 = gp_sync;
/* Initialize synctype[] array. If none set, take default. */
------- - if (!gp_cond1 && !gp_exp1 && !gp_normal1 && !gp_sync1)
------- - gp_cond1 = gp_exp1 = gp_normal1 = gp_sync1 = true;
------- - if (gp_cond1 && cur_ops->get_state && cur_ops->cond_sync) {
+++++++ + if (!gp_cond1 && !gp_exp1 && !gp_normal1 && !gp_poll1 && !gp_sync1)
+++++++ + gp_cond1 = gp_exp1 = gp_normal1 = gp_poll1 = gp_sync1 = true;
+++++++ + if (gp_cond1 && cur_ops->get_gp_state && cur_ops->cond_sync) {
synctype[nsynctypes++] = RTWS_COND_GET;
pr_info("%s: Testing conditional GPs.\n", __func__);
------- - } else if (gp_cond && (!cur_ops->get_state || !cur_ops->cond_sync)) {
+++++++ + } else if (gp_cond && (!cur_ops->get_gp_state || !cur_ops->cond_sync)) {
pr_alert("%s: gp_cond without primitives.\n", __func__);
}
if (gp_exp1 && cur_ops->exp_sync) {
!rcu_gp_is_normal();
}
rcu_torture_writer_state = RTWS_STUTTER;
- if (stutter_wait("rcu_torture_writer") &&
+++++++ + boot_ended = rcu_inkernel_boot_has_ended();
+ stutter_waited = stutter_wait("rcu_torture_writer");
+ if (stutter_waited &&
!READ_ONCE(rcu_fwd_cb_nodelay) &&
!cur_ops->slow_gps &&
!torture_must_stop() &&
torture_stop_kthread(rcu_torture_reader,
reader_tasks[i]);
kfree(reader_tasks);
+ reader_tasks = NULL;
}
+++++++ + kfree(rcu_torture_reader_mbchk);
+++++++ + rcu_torture_reader_mbchk = NULL;
if (fakewriter_tasks) {
- for (i = 0; i < nfakewriters; i++) {
+ for (i = 0; i < nfakewriters; i++)
torture_stop_kthread(rcu_torture_fakewriter,
fakewriter_tasks[i]);
- }
kfree(fakewriter_tasks);
fakewriter_tasks = NULL;
}
rcu_spawn_tasks_kthread_generic(&rcu_tasks);
return 0;
}
- -------core_initcall(rcu_spawn_tasks_kthread);
- #ifndef CONFIG_TINY_RCU
- static void show_rcu_tasks_classic_gp_kthread(void)
+ #if !defined(CONFIG_TINY_RCU)
+ void show_rcu_tasks_classic_gp_kthread(void)
{
show_rcu_tasks_generic_gp_kthread(&rcu_tasks, "");
}
rcu_spawn_tasks_kthread_generic(&rcu_tasks_rude);
return 0;
}
- -------core_initcall(rcu_spawn_tasks_rude_kthread);
- #ifndef CONFIG_TINY_RCU
- static void show_rcu_tasks_rude_gp_kthread(void)
+ #if !defined(CONFIG_TINY_RCU)
+ void show_rcu_tasks_rude_gp_kthread(void)
{
show_rcu_tasks_generic_gp_kthread(&rcu_tasks_rude, "");
}
rcu_spawn_tasks_kthread_generic(&rcu_tasks_trace);
return 0;
}
- -------core_initcall(rcu_spawn_tasks_trace_kthread);
- #ifndef CONFIG_TINY_RCU
- static void show_rcu_tasks_trace_gp_kthread(void)
+ #if !defined(CONFIG_TINY_RCU)
+ void show_rcu_tasks_trace_gp_kthread(void)
{
char buf[64];
* go offline later. Please also refer to "Hotplug CPU" section
* of RCU's Requirements documentation.
*/
------ -- rcu_state.gp_state = RCU_GP_ONOFF;
++++++ ++ WRITE_ONCE(rcu_state.gp_state, RCU_GP_ONOFF);
rcu_for_each_leaf_node(rnp) {
+ smp_mb(); // Pair with barriers used when updating ->ofl_seq to odd values.
+ firstseq = READ_ONCE(rnp->ofl_seq);
+ if (firstseq & 0x1)
+ while (firstseq == READ_ONCE(rnp->ofl_seq))
+ schedule_timeout_idle(1); // Can't wake unless RCU is watching.
+ smp_mb(); // Pair with barriers used when updating ->ofl_seq to even values.
raw_spin_lock(&rcu_state.ofl_lock);
raw_spin_lock_irq_rcu_node(rnp);
if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
static void rcu_do_batch(struct rcu_data *rdp)
{
int div;
++++ ++++ bool __maybe_unused empty;
unsigned long flags;
- const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
- rcu_segcblist_is_offloaded(&rdp->cblist);
+ const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
struct rcu_head *rhp;
struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
---- ---- long bl, count;
++++ ++++ long bl, count = 0;
long pending, tlimit = 0;
/* If no callbacks are ready, just return. */
unsigned long flags;
struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
struct rcu_node *rnp = rdp->mynode;
- -- ---- const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
- const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
- rcu_segcblist_is_offloaded(&rdp->cblist);
++++ ++++ const bool do_batch = !rcu_segcblist_completely_offloaded(&rdp->cblist);
if (cpu_is_offline(smp_processor_id()))
return;
trace_rcu_callback(rcu_state.name, head,
rcu_segcblist_n_cbs(&rdp->cblist));
++++ ++++ trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
++++ ++++
/* Go handle any RCU core processing required. */
- if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
- unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
+ if (unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
} else {
__call_rcu_core(rdp, head, flags);
goto unlock_return;
}
- /*
- * Under high memory pressure GFP_NOWAIT can fail,
- * in that case the emergency path is maintained.
- */
++ ++++++ kasan_record_aux_stack(ptr);
success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
if (!success) {
+ run_page_cache_worker(krcp);
+
if (head == NULL)
// Inline if kvfree_rcu(one_arg) call.
goto unlock_return;