Commit | Line | Data |
---|---|---|
7f46a240 RL |
1 | ramfs, rootfs and initramfs |
2 | October 17, 2005 | |
3 | Rob Landley <[email protected]> | |
4 | ============================= | |
5 | ||
6 | What is ramfs? | |
7 | -------------- | |
8 | ||
9 | Ramfs is a very simple filesystem that exports Linux's disk caching | |
10 | mechanisms (the page cache and dentry cache) as a dynamically resizable | |
1810732e | 11 | RAM-based filesystem. |
7f46a240 RL |
12 | |
13 | Normally all files are cached in memory by Linux. Pages of data read from | |
14 | backing store (usually the block device the filesystem is mounted on) are kept | |
15 | around in case they're needed again, but marked as clean (freeable) in case the | |
16 | Virtual Memory system needs the memory for something else. Similarly, data | |
17 | written to files is marked clean as soon as it has been written to backing | |
18 | store, but kept around for caching purposes until the VM reallocates the | |
19 | memory. A similar mechanism (the dentry cache) greatly speeds up access to | |
20 | directories. | |
21 | ||
22 | With ramfs, there is no backing store. Files written into ramfs allocate | |
23 | dentries and page cache as usual, but there's nowhere to write them to. | |
24 | This means the pages are never marked clean, so they can't be freed by the | |
25 | VM when it's looking to recycle memory. | |
26 | ||
27 | The amount of code required to implement ramfs is tiny, because all the | |
28 | work is done by the existing Linux caching infrastructure. Basically, | |
29 | you're mounting the disk cache as a filesystem. Because of this, ramfs is not | |
30 | an optional component removable via menuconfig, since there would be negligible | |
31 | space savings. | |
32 | ||
33 | ramfs and ramdisk: | |
34 | ------------------ | |
35 | ||
36 | The older "ram disk" mechanism created a synthetic block device out of | |
1810732e | 37 | an area of RAM and used it as backing store for a filesystem. This block |
7f46a240 RL |
38 | device was of fixed size, so the filesystem mounted on it was of fixed |
39 | size. Using a ram disk also required unnecessarily copying memory from the | |
40 | fake block device into the page cache (and copying changes back out), as well | |
41 | as creating and destroying dentries. Plus it needed a filesystem driver | |
42 | (such as ext2) to format and interpret this data. | |
43 | ||
44 | Compared to ramfs, this wastes memory (and memory bus bandwidth), creates | |
45 | unnecessary work for the CPU, and pollutes the CPU caches. (There are tricks | |
46 | to avoid this copying by playing with the page tables, but they're unpleasantly | |
47 | complicated and turn out to be about as expensive as the copying anyway.) | |
48 | More to the point, all the work ramfs is doing has to happen _anyway_, | |
1810732e RD |
49 | since all file access goes through the page and dentry caches. The RAM |
50 | disk is simply unnecessary; ramfs is internally much simpler. | |
7f46a240 RL |
51 | |
52 | Another reason ramdisks are semi-obsolete is that the introduction of | |
53 | loopback devices offered a more flexible and convenient way to create | |
54 | synthetic block devices, now from files instead of from chunks of memory. | |
55 | See losetup (8) for details. | |
56 | ||
57 | ramfs and tmpfs: | |
58 | ---------------- | |
59 | ||
60 | One downside of ramfs is you can keep writing data into it until you fill | |
61 | up all memory, and the VM can't free it because the VM thinks that files | |
62 | should get written to backing store (rather than swap space), but ramfs hasn't | |
63 | got any backing store. Because of this, only root (or a trusted user) should | |
64 | be allowed write access to a ramfs mount. | |
65 | ||
66 | A ramfs derivative called tmpfs was created to add size limits, and the ability | |
67 | to write the data to swap space. Normal users can be allowed write access to | |
68 | tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information. | |
69 | ||
70 | What is rootfs? | |
71 | --------------- | |
72 | ||
e7b69055 RL |
73 | Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is |
74 | always present in 2.6 systems. You can't unmount rootfs for approximately the | |
75 | same reason you can't kill the init process; rather than having special code | |
76 | to check for and handle an empty list, it's smaller and simpler for the kernel | |
77 | to just make sure certain lists can't become empty. | |
7f46a240 | 78 | |
e7b69055 | 79 | Most systems just mount another filesystem over rootfs and ignore it. The |
7f46a240 RL |
80 | amount of space an empty instance of ramfs takes up is tiny. |
81 | ||
6e19eded RL |
82 | If CONFIG_TMPFS is enabled, rootfs will use tmpfs instead of ramfs by |
83 | default. To force ramfs, add "rootfstype=ramfs" to the kernel command | |
84 | line. | |
85 | ||
7f46a240 RL |
86 | What is initramfs? |
87 | ------------------ | |
88 | ||
89 | All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is | |
90 | extracted into rootfs when the kernel boots up. After extracting, the kernel | |
91 | checks to see if rootfs contains a file "init", and if so it executes it as PID | |
92 | 1. If found, this init process is responsible for bringing the system the | |
93 | rest of the way up, including locating and mounting the real root device (if | |
94 | any). If rootfs does not contain an init program after the embedded cpio | |
95 | archive is extracted into it, the kernel will fall through to the older code | |
96 | to locate and mount a root partition, then exec some variant of /sbin/init | |
97 | out of that. | |
98 | ||
99 | All this differs from the old initrd in several ways: | |
100 | ||
e7b69055 RL |
101 | - The old initrd was always a separate file, while the initramfs archive is |
102 | linked into the linux kernel image. (The directory linux-*/usr is devoted | |
103 | to generating this archive during the build.) | |
7f46a240 RL |
104 | |
105 | - The old initrd file was a gzipped filesystem image (in some file format, | |
e7b69055 | 106 | such as ext2, that needed a driver built into the kernel), while the new |
7f46a240 | 107 | initramfs archive is a gzipped cpio archive (like tar only simpler, |
ec4b78a0 | 108 | see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst). The |
e7b69055 | 109 | kernel's cpio extraction code is not only extremely small, it's also |
1810732e | 110 | __init text and data that can be discarded during the boot process. |
7f46a240 RL |
111 | |
112 | - The program run by the old initrd (which was called /initrd, not /init) did | |
113 | some setup and then returned to the kernel, while the init program from | |
114 | initramfs is not expected to return to the kernel. (If /init needs to hand | |
115 | off control it can overmount / with a new root device and exec another init | |
116 | program. See the switch_root utility, below.) | |
117 | ||
118 | - When switching to another root device, initrd would pivot_root and then | |
119 | umount the ramdisk. But initramfs is rootfs: you can neither pivot_root | |
120 | rootfs, nor unmount it. Instead delete everything out of rootfs to | |
121 | free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs | |
122 | with the new root (cd /newmount; mount --move . /; chroot .), attach | |
123 | stdin/stdout/stderr to the new /dev/console, and exec the new init. | |
124 | ||
33b13025 | 125 | Since this is a remarkably persnickety process (and involves deleting |
7f46a240 RL |
126 | commands before you can run them), the klibc package introduced a helper |
127 | program (utils/run_init.c) to do all this for you. Most other packages | |
128 | (such as busybox) have named this command "switch_root". | |
129 | ||
130 | Populating initramfs: | |
131 | --------------------- | |
132 | ||
133 | The 2.6 kernel build process always creates a gzipped cpio format initramfs | |
134 | archive and links it into the resulting kernel binary. By default, this | |
e7b69055 RL |
135 | archive is empty (consuming 134 bytes on x86). |
136 | ||
1838e392 | 137 | The config option CONFIG_INITRAMFS_SOURCE (in General Setup in menuconfig, |
138 | and living in usr/Kconfig) can be used to specify a source for the | |
139 | initramfs archive, which will automatically be incorporated into the | |
140 | resulting binary. This option can point to an existing gzipped cpio | |
141 | archive, a directory containing files to be archived, or a text file | |
142 | specification such as the following example: | |
7f46a240 RL |
143 | |
144 | dir /dev 755 0 0 | |
145 | nod /dev/console 644 0 0 c 5 1 | |
146 | nod /dev/loop0 644 0 0 b 7 0 | |
147 | dir /bin 755 1000 1000 | |
148 | slink /bin/sh busybox 777 0 0 | |
149 | file /bin/busybox initramfs/busybox 755 0 0 | |
150 | dir /proc 755 0 0 | |
151 | dir /sys 755 0 0 | |
152 | dir /mnt 755 0 0 | |
153 | file /init initramfs/init.sh 755 0 0 | |
154 | ||
99aef427 RL |
155 | Run "usr/gen_init_cpio" (after the kernel build) to get a usage message |
156 | documenting the above file format. | |
157 | ||
e7b69055 | 158 | One advantage of the configuration file is that root access is not required to |
7f46a240 RL |
159 | set permissions or create device nodes in the new archive. (Note that those |
160 | two example "file" entries expect to find files named "init.sh" and "busybox" in | |
161 | a directory called "initramfs", under the linux-2.6.* directory. See | |
ec4b78a0 | 162 | Documentation/driver-api/early-userspace/early_userspace_support.rst for more details.) |
7f46a240 | 163 | |
e7b69055 RL |
164 | The kernel does not depend on external cpio tools. If you specify a |
165 | directory instead of a configuration file, the kernel's build infrastructure | |
166 | creates a configuration file from that directory (usr/Makefile calls | |
f6f57a46 | 167 | usr/gen_initramfs_list.sh), and proceeds to package up that directory |
e7b69055 RL |
168 | using the config file (by feeding it to usr/gen_init_cpio, which is created |
169 | from usr/gen_init_cpio.c). The kernel's build-time cpio creation code is | |
170 | entirely self-contained, and the kernel's boot-time extractor is also | |
171 | (obviously) self-contained. | |
172 | ||
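
A rough sketch of the directory-to-list step described above. This is not usr/gen_initramfs_list.sh: list_tree is a hypothetical helper that gives every directory and file mode 755 (777 for symlinks) and owner 0:0, skips device nodes, and assumes path names without whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not the kernel's script): print a gen_init_cpio-style
# list for the directory tree rooted at $1.
# Simplifications: fixed modes, owner 0:0, no device nodes, no whitespace
# in path names.
list_tree() {
    src=$1
    # Sorting puts each directory ahead of the entries inside it.
    (cd "$src" && find . ! -name . | LC_ALL=C sort) | while read -r path; do
        name=${path#.}                    # "./bin/sh" becomes "/bin/sh"
        if [ -L "$src$name" ]; then       # test symlinks before -d and -f
            echo "slink $name $(readlink "$src$name") 777 0 0"
        elif [ -d "$src$name" ]; then
            echo "dir $name 755 0 0"
        elif [ -f "$src$name" ]; then
            echo "file $name $src$name 755 0 0"
        fi
    done
}
```

Calling "list_tree initramfs > initramfs_list" would write one "dir", "file" or "slink" line per entry, in the same shape as the example list earlier in this section.
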
173 | The one thing you might need external cpio utilities installed for is creating | |
174 | or extracting your own preprepared cpio files to feed to the kernel build | |
175 | (instead of a config file or directory). | |
176 | ||
177 | The following command line can extract a cpio image (created either by the | |
178 | script below or by the kernel build) back into its component files: | |
99aef427 RL |
179 | |
180 | cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames | |
181 | ||
e7b69055 RL |
182 | The following shell script can create a prebuilt cpio archive you can |
183 | use in place of the above config file: | |
184 | ||
185 | #!/bin/sh | |
186 | ||
187 | # Copyright 2006 Rob Landley <[email protected]> and TimeSys Corporation. | |
188 | # Licensed under GPL version 2 | |
189 | ||
190 | if [ $# -ne 2 ] | |
191 | then | |
192 | echo "usage: mkinitramfs directory imagename.cpio.gz" | |
193 | exit 1 | |
194 | fi | |
195 | ||
196 | if [ -d "$1" ] | |
197 | then | |
198 | echo "creating $2 from $1" | |
199 | (cd "$1"; find . | cpio -o -H newc | gzip) > "$2" | |
200 | else | |
201 | echo "First argument must be a directory" | |
202 | exit 1 | |
203 | fi | |
204 | ||
205 | Note: The cpio man page contains some bad advice that will break your initramfs | |
206 | archive if you follow it. It says "A typical way to generate the list | |
207 | of filenames is with the find command; you should give find the -depth option | |
208 | to minimize problems with permissions on directories that are unwritable or not | |
209 | searchable." Don't do this when creating initramfs.cpio.gz images; it won't | |
210 | work. The Linux kernel cpio extractor won't create files in a directory that | |
211 | doesn't exist, so the directory entries must go before the files that go in | |
212 | those directories. The above script gets them in the right order. | |
213 | ||
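
The ordering problem in the note above can be seen without building an archive. The sketch below only compares the order in which find lists entries with and without -depth; the demo/bin/busybox layout is made up for illustration.

```shell
#!/bin/sh
# Sketch: compare the entry order of plain find and find -depth.
# The directory layout is invented for this example.
tmp=$(mktemp -d)
mkdir -p "$tmp/demo/bin"
: > "$tmp/demo/bin/busybox"

# Plain find prints a directory before anything inside it.
plain_order=$(cd "$tmp/demo" && find . | tr '\n' ' ')
# find -depth prints a directory's contents first and the directory last.
depth_order=$(cd "$tmp/demo" && find . -depth | tr '\n' ' ')

echo "plain: $plain_order"
echo "depth: $depth_order"
rm -rf "$tmp"
```

With -depth, ./bin/busybox is listed before ./bin, so an archive built from that list would present the file to the extractor before its parent directory exists.
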
214 | External initramfs images: | |
215 | -------------------------- | |
216 | ||
217 | If the kernel has initrd support enabled, an external cpio.gz archive can also | |
218 | be passed into a 2.6 kernel in place of an initrd. In this case, the kernel | |
219 | will autodetect the type (initramfs, not initrd) and extract the external cpio | |
220 | archive into rootfs before trying to run /init. | |
221 | ||
222 | This has the memory efficiency advantages of initramfs (no ramdisk block | |
223 | device) but the separate packaging of initrd (which is nice if you have | |
224 | non-GPL code you'd like to run from initramfs, without conflating it with | |
225 | the GPL licensed Linux kernel binary). | |
226 | ||
1810732e | 227 | It can also be used to supplement the kernel's built-in initramfs image. The |
e7b69055 RL |
228 | files in the external archive will overwrite any conflicting files in |
229 | the built-in initramfs archive. Some distributors also prefer to customize | |
230 | a single kernel image with task-specific initramfs images, without recompiling. | |
231 | ||
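
When preparing an external image, it can help to check what kind of file you are about to pass to the boot loader. The sketch below guesses a file's type from its leading bytes. It is only an illustration of the idea and not the kernel's detection code: image_type is a hypothetical helper, and it recognizes just the gzip magic (1f 8b) and the "070701" magic of an uncompressed newc cpio archive.

```shell
#!/bin/sh
# Hypothetical helper: guess an image's type from its first bytes.
# Recognizes only gzip data and uncompressed newc cpio archives.
image_type() {
    # Dump up to six leading bytes as one lowercase hex string.
    magic=$(od -A n -t x1 -N 6 "$1" | tr -d ' \n')
    case $magic in
        1f8b*)        echo "gzip" ;;          # gzip header bytes
        303730373031) echo "cpio (newc)" ;;   # ASCII "070701"
        *)            echo "unknown" ;;
    esac
}
```

A gzipped archive reports "gzip" here; decompress it first (for example with zcat) to check the cpio header inside.
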
99aef427 RL |
232 | Contents of initramfs: |
233 | ---------------------- | |
234 | ||
e7b69055 | 235 | An initramfs archive is a complete self-contained root filesystem for Linux. |
7f46a240 RL |
236 | If you don't already understand what shared libraries, devices, and paths |
237 | you need to get a minimal root filesystem up and running, here are some | |
238 | references: | |
239 | http://www.tldp.org/HOWTO/Bootdisk-HOWTO/ | |
240 | http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html | |
241 | http://www.linuxfromscratch.org/lfs/view/stable/ | |
242 | ||
243 | The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is | |
244 | designed to be a tiny C library to statically link early userspace | |
245 | code against, along with some related utilities. It is BSD licensed. | |
246 | ||
247 | I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net) | |
99aef427 | 248 | myself. These are LGPL and GPL, respectively. (A self-contained initramfs |
e7b69055 | 249 | package is planned for the busybox 1.3 release.) |
7f46a240 RL |
250 | |
251 | In theory you could use glibc, but that's not well suited for small embedded | |
252 | uses like this. (A "hello world" program statically linked against glibc is | |
253 | over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do | |
254 | name lookups, even when otherwise statically linked.) | |
255 | ||
e7b69055 RL |
256 | A good first step is to get initramfs to run a statically linked "hello world" |
257 | program as init, and test it under an emulator like qemu (www.qemu.org) or | |
258 | User Mode Linux, like so: | |
259 | ||
260 | cat > hello.c << EOF | |
261 | #include <stdio.h> | |
262 | #include <unistd.h> | |
263 | ||
264 | int main(int argc, char *argv[]) | |
265 | { | |
266 | printf("Hello world!\n"); | |
267 | sleep(999999999); | |
268 | } | |
269 | EOF | |
dd1c53a6 | 270 | gcc -static hello.c -o init |
e7b69055 RL |
271 | echo init | cpio -o -H newc | gzip > test.cpio.gz |
272 | # Testing external initramfs using the initrd loading mechanism. | |
273 | qemu -kernel /boot/vmlinuz -initrd test.cpio.gz /dev/zero | |
274 | ||
275 | When debugging a normal root filesystem, it's nice to be able to boot with | |
276 | "init=/bin/sh". The initramfs equivalent is "rdinit=/bin/sh", and it's | |
277 | just as useful. | |
278 | ||
99aef427 RL |
279 | Why cpio rather than tar? |
280 | ------------------------- | |
281 | ||
282 | This decision was made back in December, 2001. The discussion started here: | |
283 | ||
284 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html | |
285 | ||
286 | And spawned a second thread (specifically on tar vs cpio), starting here: | |
287 | ||
288 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html | |
289 | ||
290 | The quick and dirty summary version (which is no substitute for reading | |
291 | the above threads) is: | |
292 | ||
293 | 1) cpio is a standard. It's decades old (from the AT&T days), and already | |
294 | widely used on Linux (inside RPM, Red Hat's device driver disks). Here's | |
295 | a Linux Journal article about it from 1996: | |
296 | ||
297 | http://www.linuxjournal.com/article/1213 | |
298 | ||
299 | It's not as popular as tar because the traditional cpio command line tools | |
300 | require _truly_hideous_ command line arguments. But that says nothing | |
301 | either way about the archive format, and there are alternative tools, | |
302 | such as: | |
303 | ||
1f8ee46b | 304 | http://freecode.com/projects/afio |
99aef427 RL |
305 | |
306 | 2) The cpio archive format chosen by the kernel is simpler and cleaner (and | |
307 | thus easier to create and parse) than any of the (literally dozens of) | |
308 | various tar archive formats. The complete initramfs archive format is | |
309 | explained in buffer-format.rst, created in usr/gen_init_cpio.c, and | |
310 | extracted in init/initramfs.c. All three together come to less than 26k | |
311 | total of human-readable text. | |
312 | ||
313 | 3) The GNU project standardizing on tar is approximately as relevant as | |
314 | Windows standardizing on zip. Linux is not part of either, and is free | |
315 | to make its own technical decisions. | |
316 | ||
317 | 4) Since this is a kernel internal format, it could easily have been | |
318 | something brand new. The kernel provides its own tools to create and | |
319 | extract this format anyway. Using an existing standard was preferable, | |
320 | but not essential. | |
321 | ||
322 | 5) Al Viro made the decision (quote: "tar is ugly as hell and not going to be | |
323 | supported on the kernel side"): | |
324 | ||
325 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html | |
326 | ||
327 | explained his reasoning: | |
328 | ||
329 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html | |
330 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html | |
331 | ||
332 | and, most importantly, designed and implemented the initramfs code. | |
333 | ||
7f46a240 RL |
334 | Future directions: |
335 | ------------------ | |
336 | ||
e7b69055 | 337 | Today (2.6.16), initramfs is always compiled in, but not always used. The |
7f46a240 RL |
338 | kernel falls back to legacy boot code that is reached only if initramfs does |
339 | not contain an /init program. The fallback is legacy code, there to ensure a | |
340 | smooth transition and to allow early boot functionality to gradually move to | |
341 | "early userspace" (i.e. initramfs). | |
342 | ||
343 | The move to early userspace is necessary because finding and mounting the real | |
344 | root device is complex. Root partitions can span multiple devices (raid or | |
345 | separate journal). They can be out on the network (requiring dhcp, setting a | |
1810732e | 346 | specific MAC address, logging into a server, etc). They can live on removable |
7f46a240 RL |
347 | media, with dynamically allocated major/minor numbers and persistent naming |
348 | issues requiring a full udev implementation to sort out. They can be | |
349 | compressed, encrypted, copy-on-write, loopback mounted, strangely partitioned, | |
350 | and so on. | |
351 | ||
352 | This kind of complexity (which inevitably includes policy) is rightly handled | |
353 | in userspace. Both klibc and busybox/uClibc are working on simple initramfs | |
e7b69055 | 354 | packages to drop into a kernel build. |
7f46a240 | 355 | |
e7b69055 RL |
356 | The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree. |
357 | The kernel's current early boot code (partition detection, etc) will probably | |
358 | be migrated into a default initramfs, automatically created and used by the | |
359 | kernel build. |