Red-black Trees (rbtree) in Linux
January 18, 2007
Rob Landley <[email protected]>
=============================

What are red-black trees, and what are they for?
------------------------------------------------

Red-black trees are a type of self-balancing binary search tree, used for
storing sortable key/value data pairs.  This differs from radix trees (which
are used to efficiently store sparse arrays and thus use long integer indexes
to insert/access/delete nodes) and hash tables (which are not kept sorted, so
they cannot easily be traversed in order, and which must be tuned for a
specific size and hash function, whereas rbtrees scale gracefully when
storing arbitrary keys).

Red-black trees are similar to AVL trees, but provide faster real-time bounded
worst-case performance for insertion and deletion (at most two rotations and
three rotations, respectively, to balance the tree), with slightly slower
(but still O(log n)) lookup time.

To quote Linux Weekly News:

    There are a number of red-black trees in use in the kernel.
    The deadline and CFQ I/O schedulers employ rbtrees to
    track requests; the packet CD/DVD driver does the same.
    The high-resolution timer code uses an rbtree to organize outstanding
    timer requests.  The ext3 filesystem tracks directory entries in a
    red-black tree.  Virtual memory areas (VMAs) are tracked with red-black
    trees, as are epoll file descriptors, cryptographic keys, and network
    packets in the "hierarchical token bucket" scheduler.

This document covers use of the Linux rbtree implementation.  For more
information on the nature and implementation of Red Black Trees, see:

  Linux Weekly News article on red-black trees
    http://lwn.net/Articles/184495/

  Wikipedia entry on red-black trees
    http://en.wikipedia.org/wiki/Red-black_tree

Linux implementation of red-black trees
---------------------------------------

Linux's rbtree implementation lives in the file "lib/rbtree.c".  To use it,
"#include <linux/rbtree.h>".

The Linux rbtree implementation is optimized for speed, and thus has one
less layer of indirection (and better cache locality) than more traditional
tree implementations.  Instead of using pointers to separate rb_node and data
structures, each instance of struct rb_node is embedded in the data structure
it organizes.  And instead of using a comparison callback function pointer,
users are expected to write their own tree search and insert functions
which call the provided rbtree functions.  Locking is also left up to the
user of the rbtree code.

Creating a new rbtree
---------------------

Data nodes in an rbtree are structures containing a struct rb_node member:

  struct mytype {
  	struct rb_node node;
  	char *keystring;
  };

When dealing with a pointer to the embedded struct rb_node, the containing data
structure may be accessed with the standard container_of() macro.  The
rb_entry(node, type, member) macro is an alias for container_of() and may be
used the same way, so individual members can be reached directly through it.

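As a rough userspace illustration of the embedding described above (not
kernel code), the sketch below recovers the containing structure from a
pointer to its embedded node.  The two-field struct rb_node, the simplified
container_of() built on offsetof(), and the key_of() helper are assumptions
made so the example compiles outside the kernel; the kernel's own
container_of() performs additional type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct rb_node (assumption: only the
 * child links matter for this illustration). */
struct rb_node {
	struct rb_node *rb_left;
	struct rb_node *rb_right;
};

/* Simplified userspace container_of(): step back from the member's
 * address by the member's offset within the containing type. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* rb_entry() is a plain alias for container_of(). */
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

struct mytype {
	struct rb_node node;
	char *keystring;
};

/* Hypothetical helper: recover the key from a pointer to the
 * embedded node. */
static char *key_of(struct rb_node *n)
{
	assert(n != NULL);
	return rb_entry(n, struct mytype, node)->keystring;
}
```

Because the node lives inside the data structure, no separate allocation or
back pointer is needed to get from a tree node to its payload.
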
At the root of each rbtree is an rb_root structure, which is initialized to be
empty via:

  struct rb_root mytree = RB_ROOT;

Searching for a value in an rbtree
----------------------------------

Writing a search function for your tree is fairly straightforward: start at the
root, compare each value, and follow the left or right branch as necessary.

Example:

  struct mytype *my_search(struct rb_root *root, char *string)
  {
  	struct rb_node *node = root->rb_node;

  	while (node) {
  		struct mytype *data = container_of(node, struct mytype, node);
  		int result;

  		result = strcmp(string, data->keystring);

  		if (result < 0)
  			node = node->rb_left;
  		else if (result > 0)
  			node = node->rb_right;
  		else
  			return data;
  	}
  	return NULL;
  }

Inserting data into an rbtree
-----------------------------

Inserting data in the tree involves first searching for the place to insert the
new node, then inserting the node and rebalancing ("recoloring") the tree.

The search for insertion differs from the previous search by finding the
location of the pointer on which to graft the new node.  The new node also
needs a link to its parent node for rebalancing purposes.

Example:

  int my_insert(struct rb_root *root, struct mytype *data)
  {
  	struct rb_node **new = &(root->rb_node), *parent = NULL;

  	/* Figure out where to put new node */
  	while (*new) {
  		struct mytype *this = container_of(*new, struct mytype, node);
  		int result = strcmp(data->keystring, this->keystring);

  		parent = *new;
  		if (result < 0)
  			new = &((*new)->rb_left);
  		else if (result > 0)
  			new = &((*new)->rb_right);
  		else
  			return false;	/* key already present */
  	}

  	/* Add new node and rebalance tree. */
  	rb_link_node(&data->node, parent, new);
  	rb_insert_color(&data->node, root);

  	return true;
  }

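The "pointer on which to graft the new node" can be tried out in userspace
with the following sketch.  It is not the kernel implementation: struct node,
link_node() and sketch_insert() are hypothetical stand-ins, the parent is kept
in a plain pointer, and no rebalancing is performed (in the kernel,
rb_insert_color() would run after the link step).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for struct rb_node with an explicit parent
 * pointer (assumption made for readability). */
struct node {
	struct node *parent, *left, *right;
	const char *key;
};

/* Modeled on the link step: record the parent, clear the children,
 * and store the node in the link found by the search. */
static void link_node(struct node *n, struct node *parent,
		      struct node **link)
{
	n->parent = parent;
	n->left = n->right = NULL;
	*link = n;
}

/* Returns 1 on success, 0 if the key is already present.
 * The tree is left unbalanced; this only shows the search-and-graft. */
static int sketch_insert(struct node **root, struct node *n)
{
	struct node **link = root, *parent = NULL;

	assert(root != NULL && n != NULL);
	while (*link) {
		int result = strcmp(n->key, (*link)->key);

		parent = *link;
		if (result < 0)
			link = &(*link)->left;
		else if (result > 0)
			link = &(*link)->right;
		else
			return 0;
	}
	link_node(n, parent, link);
	return 1;
}
```

Tracking the address of the child pointer, rather than the child itself,
means the root and interior cases need no special handling: the final store
through the link works the same way for both.
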
Removing or replacing existing data in an rbtree
------------------------------------------------

To remove an existing node from a tree, call:

  void rb_erase(struct rb_node *victim, struct rb_root *tree);

Example:

  struct mytype *data = my_search(&mytree, "walrus");

  if (data) {
  	rb_erase(&data->node, &mytree);
  	myfree(data);
  }

To replace an existing node in a tree with a new one with the same key, call:

  void rb_replace_node(struct rb_node *old, struct rb_node *new,
  			struct rb_root *tree);

Replacing a node this way does not re-sort the tree: If the new node doesn't
have the same key as the old node, the rbtree will probably become corrupted.

Iterating through the elements stored in an rbtree (in sort order)
------------------------------------------------------------------

Four functions are provided for iterating through an rbtree's contents in
sorted order.  These work on arbitrary trees, and should not need to be
modified or wrapped (except for locking purposes):

  struct rb_node *rb_first(struct rb_root *tree);
  struct rb_node *rb_last(struct rb_root *tree);
  struct rb_node *rb_next(struct rb_node *node);
  struct rb_node *rb_prev(struct rb_node *node);

To start iterating, call rb_first() or rb_last() with a pointer to the root
of the tree, which will return a pointer to the node structure contained in
the first or last element in the tree.  To continue, fetch the next or previous
node by calling rb_next() or rb_prev() on the current node.  This will return
NULL when there are no more nodes left.

The iterator functions return a pointer to the embedded struct rb_node, from
which the containing data structure may be accessed with the container_of()
macro, and individual members may be accessed directly via
rb_entry(node, type, member).

Example:

  struct rb_node *node;
  for (node = rb_first(&mytree); node; node = rb_next(node))
  	printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);

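For readers curious why these iterators need no comparison function, the
following userspace sketch shows one way in-order traversal can be done with
parent pointers alone.  It is an illustration under assumptions, not the
kernel source: struct node, first() and next() are hypothetical stand-ins
for struct rb_node, rb_first() and rb_next(), and the parent is stored in a
plain pointer.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *parent, *left, *right;
	int key;
};

/* Leftmost node of the tree, i.e. the smallest key. */
static struct node *first(struct node *root)
{
	if (!root)
		return NULL;
	while (root->left)
		root = root->left;
	return root;
}

/* In-order successor of n; NULL once the last node has been passed. */
static struct node *next(struct node *n)
{
	struct node *parent;

	assert(n != NULL);
	/* With a right child, the successor is the leftmost node of
	 * the right subtree. */
	if (n->right) {
		n = n->right;
		while (n->left)
			n = n->left;
		return n;
	}
	/* Otherwise climb until we arrive at a parent from its left
	 * child; that parent is the successor. */
	while ((parent = n->parent) && n == parent->right)
		n = parent;
	return parent;
}
```

The traversal relies only on the shape of the tree, which is why the same
iterator functions work for trees with any key type.
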
Support for Augmented rbtrees
-----------------------------

An augmented rbtree is an rbtree with some additional data stored in each
node.  This data can be used to add new functionality to the rbtree.
Augmented rbtrees are an optional feature built on top of the basic rbtree
infrastructure.  An rbtree user who wants this feature must call the
augmentation functions with a user-provided augmentation callback
when inserting and erasing nodes.

On insertion, the user must call rb_augment_insert() once the new node is in
place.  This will cause the augmentation function callback to be called for
each node between the new node and the root which has been affected by the
insertion.

When erasing a node, the user must call rb_augment_erase_begin() first to
retrieve the deepest node on the rebalance path.  Then, after erasing the
original node, the user must call rb_augment_erase_end() with the deepest
node found earlier.  This will cause the augmentation function to be called
for each affected node between the deepest node and the root.

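The idea of calling a callback on each node between a changed node and the
root can be sketched in userspace as follows.  This is a simplified model
rather than the kernel code: struct it_node, augment_fn, update_max_hi() and
augment_path() are hypothetical names, rotations are ignored, and the
augmented value is a per-subtree maximum (here including the node's own hi)
of the kind used by the interval tree discussed in this document.

```c
#include <assert.h>
#include <stddef.h>

struct it_node {
	struct it_node *parent, *left, *right;
	unsigned long lo, hi;
	unsigned long max_hi;	/* augmented data for this subtree */
};

/* Assumed callback shape: one node plus an opaque user pointer. */
typedef void (*augment_fn)(struct it_node *n, void *data);

/* Example callback: recompute max_hi from the node itself and its
 * immediate children only. */
static void update_max_hi(struct it_node *n, void *data)
{
	unsigned long max = n->hi;

	(void)data;	/* unused in this example */
	if (n->left && n->left->max_hi > max)
		max = n->left->max_hi;
	if (n->right && n->right->max_hi > max)
		max = n->right->max_hi;
	n->max_hi = max;
}

/* Call fn on n and then on every ancestor up to the root. */
static void augment_path(struct it_node *n, augment_fn fn, void *data)
{
	assert(fn != NULL);
	for (; n; n = n->parent)
		fn(n, data);
}
```

Walking bottom-up matters: each ancestor's value is recomputed only after
the values of its children are already correct.
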
An interval tree is an example of an augmented rbtree.  Reference -
"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein.
More details about interval trees:

A classical rbtree has a single key, so it cannot be used directly to store
interval ranges like [lo:hi] and do a quick lookup for any overlap with a new
lo:hi, or to find whether there is an exact match for a new lo:hi.

However, an rbtree can be augmented to store such interval ranges in a
structured way, making efficient overlap lookup and exact match possible.

The "extra information" stored in each node is the maximum hi
(max_hi) value among all the nodes that are its descendants.  This
information can be maintained at each node just by looking at the node
and its immediate children.  It is then used for an O(log n) lookup
of the lowest match (lowest start address among all possible matches)
with something like:

  find_lowest_match(lo, hi, node)
  {
  	lowest_match = NULL;
  	while (node) {
  		if (max_hi(node->left) > lo) {
  			// Lowest overlap if any must be on left side
  			node = node->left;
  		} else if (overlap(lo, hi, node)) {
  			lowest_match = node;
  			break;
  		} else if (lo > node->lo) {
  			// Lowest overlap if any must be on right side
  			node = node->right;
  		} else {
  			break;
  		}
  	}
  	return lowest_match;
  }

Finding an exact match means first finding the lowest match and then following
successor nodes looking for the exact match, until the start of a node is
beyond the hi value we are looking for.
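
A compilable userspace version of the lowest-match lookup might look like the
sketch below.  Several things here are assumptions rather than part of the
kernel API: struct it_node and fixup_max_hi() are hypothetical, intervals are
treated as closed ([lo, hi] with both ends included, hence ">=" where the
pseudocode uses ">"), max_hi includes the node's own hi, and the tree is
ordered by lo with no balancing.

```c
#include <assert.h>
#include <stddef.h>

struct it_node {
	struct it_node *left, *right;
	unsigned long lo, hi;	/* closed interval [lo, hi] */
	unsigned long max_hi;	/* largest hi in this subtree */
};

/* Fill in max_hi bottom-up for a hand-built tree and return the
 * value computed for n. */
static unsigned long fixup_max_hi(struct it_node *n)
{
	unsigned long m;

	assert(n != NULL);
	n->max_hi = n->hi;
	if (n->left && (m = fixup_max_hi(n->left)) > n->max_hi)
		n->max_hi = m;
	if (n->right && (m = fixup_max_hi(n->right)) > n->max_hi)
		n->max_hi = m;
	return n->max_hi;
}

/* Return the overlapping node with the lowest lo, or NULL if no
 * stored interval overlaps [lo, hi]. */
static struct it_node *find_lowest_match(struct it_node *node,
					 unsigned long lo, unsigned long hi)
{
	while (node) {
		if (node->left && node->left->max_hi >= lo) {
			/* Lowest overlap, if any, must be on the left. */
			node = node->left;
		} else if (node->lo <= hi && lo <= node->hi) {
			/* Nothing on the left overlaps, so this is it. */
			return node;
		} else if (lo > node->lo) {
			/* This node ends before lo; try larger starts. */
			node = node->right;
		} else {
			/* This node and everything to its right start
			 * after hi. */
			break;
		}
	}
	return NULL;
}
```

Each iteration descends one level, so the lookup cost is bounded by the
height of the tree, which is O(log n) when the tree is kept balanced.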