Commit | Line | Data |
---|---|---|
c49e51a5 | 1 | ============================= |
853afb71 | 2 | No-MMU memory mapping support |
c49e51a5 | 3 | ============================= |
1da177e4 LT |
4 | |
5 | The kernel has limited support for memory mapping under no-MMU conditions, such | |
6 | as are used in uClinux environments. From the userspace point of view, memory | |
7 | mapping is used through the mmap() system call, the shmat() | |
8 | call and the execve() system call. From the kernel's point of view, execve() | |
9 | mapping is actually performed by the binfmt drivers, which call back into the | |
10 | mmap() routines to do the actual work. | |
11 | ||
12 | Memory mapping behaviour also involves the way fork(), vfork(), clone() and | |
13 | ptrace() work. Under uClinux there is no fork(), and clone() must be supplied | |
14 | the CLONE_VM flag. | |
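
Since fork() is unavailable, process creation on such systems typically goes through vfork() followed immediately by execve(). The following is a minimal sketch of that pattern, not code taken from the kernel tree; the helper name `run_child` is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run the program at `path` with no arguments and
 * return its exit status, or -1 if the child could not be created or
 * did not exit normally.
 */
int run_child(const char *path)
{
    /* Build argv/envp before vfork(): the child borrows our memory. */
    char *const argv[] = { (char *)path, NULL };
    char *const envp[] = { NULL };
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: only execve() or _exit() are safe to call here. */
        execve(path, argv, envp);
        _exit(127);     /* reached only if execve() failed */
    }
    /* Parent resumes once the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child must not modify variables or return from the calling function, because it shares the parent's address space until execve() or _exit().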
15 | ||
16 | The behaviour is similar between the MMU and no-MMU cases, but not identical; | |
17 | and it's also much more restricted in the latter case: | |
18 | ||
c49e51a5 | 19 | (#) Anonymous mapping, MAP_PRIVATE |
1da177e4 LT |
20 | |
21 | In the MMU case: VM regions backed by arbitrary pages; copy-on-write | |
22 | across fork. | |
23 | ||
24 | In the no-MMU case: VM regions backed by arbitrary contiguous runs of | |
25 | pages. | |
26 | ||
c49e51a5 | 27 | (#) Anonymous mapping, MAP_SHARED |
1da177e4 LT |
28 | |
29 | These behave very much like private mappings, except that they're | |
30 | shared across fork() or clone() without CLONE_VM in the MMU case. Since | |
31 | the no-MMU case doesn't support these, behaviour is identical to | |
32 | MAP_PRIVATE there. | |
33 | ||
c49e51a5 | 34 | (#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, !PROT_WRITE |
1da177e4 LT |
35 | |
36 | In the MMU case: VM regions backed by pages read from file; changes to | |
37 | the underlying file are reflected in the mapping; copied across fork. | |
38 | ||
39 | In the no-MMU case: | |
40 | ||
41 | - If one exists, the kernel will re-use an existing mapping to the | |
42 | same segment of the same file if that has compatible permissions, | |
43 | even if this was created by another process. | |
44 | ||
45 | - If possible, the file mapping will be directly on the backing device | |
b4caecd4 | 46 | if the backing device has the NOMMU_MAP_DIRECT capability and |
1da177e4 LT |
47 | appropriate mapping protection capabilities. Ramfs, romfs, cramfs |
48 | and mtd might all permit this. | |
49 | ||
83d4fcb3 | 50 | - If the backing device can't or won't permit direct sharing, |
b4caecd4 | 51 | but does have the NOMMU_MAP_COPY capability, then a copy of the |
1da177e4 LT |
52 | appropriate bit of the file will be read into a contiguous bit of |
53 | memory and any extraneous space beyond the EOF will be cleared. | |
54 | ||
55 | - Writes to the file do not affect the mapping; writes to the mapping | |
56 | are visible in other processes (no MMU protection), but should not | |
57 | happen. | |
58 | ||
c49e51a5 | 59 | (#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, PROT_WRITE |
1da177e4 LT |
60 | |
61 | In the MMU case: like the non-PROT_WRITE case, except that the pages in | |
62 | question get copied before the write actually happens. From that point | |
63 | on writes to the file underneath that page no longer get reflected into | |
64 | the mapping's backing pages. The page is then backed by swap instead. | |
65 | ||
66 | In the no-MMU case: works much like the non-PROT_WRITE case, except | |
67 | that a copy is always taken and never shared. | |
68 | ||
c49e51a5 | 69 | (#) Regular file / blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE |
1da177e4 LT |
70 | |
71 | In the MMU case: VM regions backed by pages read from file; changes to | |
72 | pages written back to file; writes to file reflected into pages backing | |
73 | mapping; shared across fork. | |
74 | ||
75 | In the no-MMU case: not supported. | |
76 | ||
c49e51a5 | 77 | (#) Memory backed regular file, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE |
1da177e4 LT |
78 | |
79 | In the MMU case: As for ordinary regular files. | |
80 | ||
81 | In the no-MMU case: The filesystem providing the memory-backed file | |
82 | (such as ramfs or tmpfs) may choose to honour an open, truncate, mmap | |
83 | sequence by providing a contiguous sequence of pages to map. In that | |
84 | case, a shared-writable memory mapping will be possible. It will work | |
85 | as for the MMU case. If the filesystem does not provide any such | |
86 | support, then the mapping request will be denied. | |
87 | ||
c49e51a5 | 88 | (#) Memory backed blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE |
1da177e4 LT |
89 | |
90 | In the MMU case: As for ordinary regular files. | |
91 | ||
92 | In the no-MMU case: As for memory backed regular files, but the | |
93 | blockdev must be able to provide a contiguous run of pages without | |
94 | truncate being called. The ramdisk driver could do this if it allocated | |
95 | all its memory as a contiguous array upfront. | |
96 | ||
c49e51a5 | 97 | (#) Memory backed chardev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE |
1da177e4 LT |
98 | |
99 | In the MMU case: As for ordinary regular files. | |
100 | ||
101 | In the no-MMU case: The character device driver may choose to honour | |
102 | the mmap() by providing direct access to the underlying device if it | |
103 | provides memory or quasi-memory that can be accessed directly. Examples | |
104 | of such are frame buffers and flash devices. If the driver does not | |
105 | provide any such support, then the mapping request will be denied. | |
106 | ||
107 | ||
853afb71 | 108 | Further notes on no-MMU MMAP |
1da177e4 LT |
109 | ============================ |
110 | ||
c49e51a5 | 111 | (#) A request for a private mapping of a file may return a buffer that is not |
8feae131 DH |
112 | page-aligned. This is because XIP may take place, and the data may not be |
113 | page aligned in the backing store. | |
114 | ||
c49e51a5 | 115 | (#) A request for an anonymous mapping will always be page aligned. If |
8feae131 DH |
116 | possible, the size of the request should be a power of two; otherwise some |
117 | of the space may be wasted as the kernel must allocate a power-of-2 | |
118 | granule but will only discard the excess if appropriately configured as | |
119 | this has an effect on fragmentation. | |
120 | ||
c49e51a5 | 121 | (#) The memory allocated by a request for an anonymous mapping will normally |
ea637639 JZ |
122 | be cleared by the kernel before being returned in accordance with the |
123 | Linux man pages (ver 2.22 or later). | |
124 | ||
125 | In the MMU case this can be achieved with reasonable performance as | |
126 | regions are backed by virtual pages, with the contents only being mapped | |
127 | to cleared physical pages when a write happens on that specific page | |
128 | (prior to which, the pages are effectively mapped to the global zero page | |
129 | from which reads can take place). This spreads out the time it takes to | |
130 | initialize the contents of a page - depending on the write-usage of the | |
131 | mapping. | |
132 | ||
133 | In the no-MMU case, however, anonymous mappings are backed by physical | |
134 | pages, and the entire map is cleared at allocation time. This can cause | |
135 | significant delays during a userspace malloc() as the C library does an | |
136 | anonymous mapping and the kernel then does a memset for the entire map. | |
137 | ||
138 | However, for memory that isn't required to be precleared - such as that | |
139 | returned by malloc() - mmap() can take a MAP_UNINITIALIZED flag to | |
140 | indicate to the kernel that it shouldn't bother clearing the memory before | |
141 | returning it. Note that CONFIG_MMAP_ALLOW_UNINITIALIZED must be enabled | |
142 | to permit this, otherwise the flag will be ignored. | |
143 | ||
144 | uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses this | |
145 | to allocate the brk and stack region. | |
146 | ||
c49e51a5 | 147 | (#) A list of all the private copy and anonymous mappings on the system is |
8feae131 | 148 | visible through /proc/maps in no-MMU mode. |
1da177e4 | 149 | |
c49e51a5 | 150 | (#) A list of all the mappings in use by a process is visible through |
dbf8685c DH |
151 | /proc/<pid>/maps in no-MMU mode. |
152 | ||
c49e51a5 | 153 | (#) Supplying MAP_FIXED or requesting a particular mapping address will |
1da177e4 LT |
154 | result in an error. |
155 | ||
c49e51a5 | 156 | (#) Files mapped privately usually have to have a read method provided by the |
1da177e4 LT |
157 | driver or filesystem so that the contents can be read into the memory |
158 | allocated if mmap() chooses not to map the backing device directly. An | |
159 | error will result if they don't. This is most likely to be encountered | |
160 | with character device files, pipes, fifos and sockets. | |
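
The notes above on anonymous mappings, MAP_UNINITIALIZED and address hints can be illustrated with a short userspace sketch. The helper names `alloc_scratch` and `free_scratch` are hypothetical, and the fallback value for MAP_UNINITIALIZED is an assumption for C libraries whose headers do not define it.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Assumed fallback: the value used by the kernel's generic mman headers. */
#ifndef MAP_UNINITIALIZED
#define MAP_UNINITIALIZED 0x4000000
#endif

/*
 * Hypothetical helper: allocate scratch memory that does not need to be
 * zeroed. The address hint is NULL and MAP_FIXED is not used, since a
 * requested address results in an error on no-MMU.
 */
void *alloc_scratch(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_UNINITIALIZED, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Release a region obtained from alloc_scratch(); returns 0 on success. */
int free_scratch(void *p, size_t len)
{
    return munmap(p, len);
}
```

The caller must treat the returned memory as containing arbitrary data. Where the kernel does not honour the flag, the region is simply cleared as usual.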
161 | ||
6fa5f80b | 162 | |
853afb71 | 163 | Interprocess shared memory |
0112c4c6 DH |
164 | ========================== |
165 | ||
166 | Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU | |
167 | mode. The former through the usual mechanism, the latter through files created | |
168 | on ramfs or tmpfs mounts. | |
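
As a rough sketch of the open, truncate, mmap sequence that such file-backed shared memory relies on, the following userspace helper may be useful. The name `map_shared_file` is hypothetical, and on a no-MMU system the path is assumed to lie on a ramfs or tmpfs mount that supports shared mappings.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create or open `path`, size it to `len` bytes and
 * map it shared and writable. Returns NULL on any failure.
 */
void *map_shared_file(const char *path, size_t len)
{
    void *p;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return NULL;
    /* Growing an empty file is the cue for the filesystem to gather pages. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);          /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : p;
}
```

Two processes that call this with the same path and length should see each other's writes through the returned pointer.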
169 | ||
170 | ||
853afb71 | 171 | Futexes |
930e652a DH |
172 | ======= |
173 | ||
174 | Futexes are supported in NOMMU mode if the arch supports them. An error will | |
175 | be given if an address passed to the futex system call lies outside the | |
176 | mappings made by a process or if the mapping in which the address lies does not | |
177 | support futexes (such as an I/O chardev mapping). | |
178 | ||
179 | ||
853afb71 | 180 | No-MMU mremap |
6fa5f80b DH |
181 | ============= |
182 | ||
183 | The mremap() function is partially supported. It may change the size of a | |
c49e51a5 | 184 | mapping, and may move it [#]_ if MREMAP_MAYMOVE is specified and if the new size |
6fa5f80b DH |
185 | of the mapping exceeds the size of the slab object currently occupied by the |
186 | memory to which the mapping refers, or if a smaller slab object could be used. | |
187 | ||
188 | MREMAP_FIXED is not supported, though it is ignored if there's no change of | |
189 | address and the object does not need to be moved. | |
190 | ||
191 | Shared mappings may not be moved. Shareable mappings may not be moved either, | |
192 | even if they are not currently shared. | |
193 | ||
194 | The mremap() function must be given an exact match for base address and size of | |
195 | a previously mapped object. It may not be used to create holes in existing | |
196 | mappings, move parts of existing mappings or resize parts of mappings. It must | |
197 | act on a complete mapping. | |
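
A minimal sketch of a resize that respects these rules is shown here. The helper name `resize_mapping` is hypothetical; the caller is assumed to pass exactly the base address and length that were returned by, and given to, the original mmap() call.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: resize a complete mapping. `base` and `old_len`
 * must match the original mapping exactly. MREMAP_MAYMOVE allows the
 * kernel to relocate the mapping where that is supported, so the caller
 * must use the returned pointer rather than `base` afterwards.
 */
void *resize_mapping(void *base, size_t old_len, size_t new_len)
{
    void *p = mremap(base, old_len, new_len, MREMAP_MAYMOVE);

    return p == MAP_FAILED ? NULL : p;
}
```

On failure the original mapping is left in place, so `base` remains usable with its old length.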
198 | ||
c49e51a5 | 199 | .. [#] Not currently supported. |
6fa5f80b DH |
200 | |
201 | ||
853afb71 | 202 | Providing shareable character device support |
1da177e4 LT |
203 | ============================================ |
204 | ||
205 | To provide shareable character device support, a driver must provide a | |
206 | file->f_op->get_unmapped_area() operation. The mmap() routines will call this | |
207 | to get a proposed address for the mapping. This may return an error if it | |
208 | doesn't wish to honour the mapping because it's too long, at a weird offset, | |
209 | under some unsupported combination of flags or whatever. | |
210 | ||
211 | The driver should also provide backing device information with capabilities set | |
212 | to indicate the permitted types of mapping on such devices. The default is | |
213 | assumed to be readable and writable, not executable, and only shareable | |
214 | directly (can't be copied). | |
215 | ||
216 | The file->f_op->mmap() operation will be called to actually inaugurate the | |
217 | mapping. It can be rejected at that point. Returning the ENOSYS error will | |
b4caecd4 | 218 | cause the mapping to be copied instead if NOMMU_MAP_COPY is specified. |
1da177e4 LT |
219 | |
220 | The vm_ops->close() routine will be invoked when the last mapping on a chardev | |
221 | is removed. An existing mapping will be shared, partially or not, if possible | |
222 | without notifying the driver. | |
223 | ||
224 | It is permitted also for the file->f_op->get_unmapped_area() operation to | |
225 | return -ENOSYS. This will be taken to mean that this operation just doesn't | |
226 | want to handle it, despite the fact it's got an operation. For instance, it | |
227 | might try directing the call to a secondary driver which turns out not to | |
228 | implement it. Such is the case for the framebuffer driver which attempts to | |
229 | direct the call to the device-specific driver. Under such circumstances, the | |
b4caecd4 | 230 | mapping request will be rejected if NOMMU_MAP_COPY is not specified, and a |
1da177e4 LT |
231 | copy mapped otherwise. |
232 | ||
c49e51a5 | 233 | .. important:: |
1da177e4 LT |
234 | |
235 | Some types of device may present a different appearance to anyone | |
236 | looking at them in certain modes. Flash chips can be like this; for | |
237 | instance if they're in programming or erase mode, you might see the | |
238 | status reflected in the mapping, instead of the data. | |
239 | ||
240 | In such a case, care must be taken lest userspace see a shared or a | |
241 | private mapping showing such information when the driver is busy | |
242 | controlling the device. Remember especially: private executable | |
243 | mappings may still be mapped directly off the device under some | |
244 | circumstances! | |
245 | ||
246 | ||
853afb71 | 247 | Providing shareable memory-backed file support |
1da177e4 LT |
248 | ============================================== |
249 | ||
250 | Provision of shared mappings on memory backed files is similar to the provision | |
251 | of support for shared mapped character devices. The main difference is that the | |
252 | filesystem providing the service will probably allocate a contiguous collection | |
253 | of pages and permit mappings to be made on that. | |
254 | ||
255 | It is recommended that a truncate operation applied to such a file that | |
256 | increases the file size, if that file is empty, be taken as a request to gather | |
257 | enough pages to honour a mapping. This is required to support POSIX shared | |
258 | memory. | |
259 | ||
260 | Memory backed devices are indicated by the mapping's backing device info having | |
261 | the memory_backed flag set. | |
262 | ||
263 | ||
853afb71 | 264 | Providing shareable block device support |
1da177e4 LT |
265 | ======================================== |
266 | ||
267 | Provision of shared mappings on block device files is exactly the same as for | |
268 | character devices. If there isn't a real device underneath, then the driver | |
269 | should allocate sufficient contiguous memory to honour any supported mapping. | |
dd8632a1 PM |
270 | |
271 | ||
853afb71 | 272 | Adjusting page trimming behaviour |
dd8632a1 PM |
273 | ================================= |
274 | ||
275 | NOMMU mmap automatically rounds up to the nearest power-of-2 number of pages | |
276 | when performing an allocation. This can have adverse effects on memory | |
277 | fragmentation, and as such, is left configurable. The default behaviour is to | |
278 | aggressively trim allocations and discard any excess pages back into the page | |
279 | allocator. In order to retain finer-grained control over fragmentation, this | |
280 | behaviour can either be disabled completely, or bumped up to a higher page | |
281 | watermark where trimming begins. | |
282 | ||
c49e51a5 | 283 | Page trimming behaviour is configurable via the sysctl ``vm.nr_trim_pages``. |
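
To make the rounding concrete, the arithmetic can be sketched in userspace C. This only illustrates the power-of-2 rounding described above; it is not the kernel's implementation, the function names are hypothetical, and it does not model how ``vm.nr_trim_pages`` decides what to trim.

```c
#include <stddef.h>

/* Number of pages needed to hold `len` bytes (len must be non-zero). */
size_t pages_for(size_t len, size_t page_size)
{
    return (len + page_size - 1) / page_size;
}

/* Smallest power of two that is >= `pages` (pages must be >= 1). */
size_t pow2_pages(size_t pages)
{
    size_t n = 1;

    while (n < pages)
        n <<= 1;
    return n;
}

/* Pages allocated beyond what the request needs: the trimming candidates. */
size_t excess_pages(size_t len, size_t page_size)
{
    size_t need = pages_for(len, page_size);

    return pow2_pages(need) - need;
}
```

For example, a request of 20480 bytes with 4096-byte pages needs 5 pages, is rounded up to 8, and leaves 3 excess pages that may be trimmed or kept depending on configuration.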