\input texinfo
@setfilename ldint.info
@c Copyright 1992, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
@c 2003, 2005, 2006, 2007
@c Free Software Foundation, Inc.

@ifnottex
@dircategory Software development
@direntry
* Ld-Internals: (ldint). The GNU linker internals.
@end direntry
@end ifnottex

@copying
This file documents the internals of the GNU linker ld.

Copyright @copyright{} 1992, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2007
Free Software Foundation, Inc.
Contributed by Cygnus Support.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'' and ``Funding
Free Software'', the Front-Cover texts being (a) (see below), and with
the Back-Cover Texts being (b) (see below). A copy of the license is
included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

A GNU Manual

(b) The FSF's Back-Cover Text is:

You have freedom to copy and modify this GNU Manual, like GNU
software. Copies published by the Free Software Foundation raise
funds for GNU development.
@end copying

@iftex
@finalout
@setchapternewpage off
@settitle GNU Linker Internals
@titlepage
@title{A guide to the internals of the GNU linker}
@author Per Bothner, Steve Chamberlain, Ian Lance Taylor, DJ Delorie
@author Cygnus Support
@page

@tex
\def\$#1${{#1}} % Kluge: collect RCS revision info without $...$
\xdef\manvers{2.10.91} % For use in headers, footers too
{\parskip=0pt
\hfill Cygnus Support\par
\hfill \manvers\par
\hfill \TeX{}info \texinfoversion\par
}
@end tex

@vskip 0pt plus 1filll
Copyright @copyright{} 1992, 1993, 1994, 1995, 1996, 1997, 1998, 2000
Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the
section entitled "GNU Free Documentation License".

@end titlepage
@end iftex

@node Top
@top

This file documents the internals of the GNU linker @code{ld}. It is a
collection of miscellaneous information with little form at this point.
Mostly, it is a repository into which you can put information about
GNU @code{ld} as you discover it (or as you design changes to @code{ld}).

This document is distributed under the terms of the GNU Free
Documentation License. A copy of the license is included in the
section entitled "GNU Free Documentation License".

@menu
* README:: The README File
* Emulations:: How linker emulations are generated
* Emulation Walkthrough:: A Walkthrough of a Typical Emulation
* Architecture Specific:: Some Architecture Specific Notes
* GNU Free Documentation License:: GNU Free Documentation License
@end menu
93 | ||
94 | @node README | |
95 | @chapter The @file{README} File | |
96 | ||
97 | Check the @file{README} file; it often has useful information that does not | |
98 | appear anywhere else in the directory. | |
99 | ||
100 | @node Emulations | |
101 | @chapter How linker emulations are generated | |
102 | ||
103 | Each linker target has an @dfn{emulation}. The emulation includes the | |
104 | default linker script, and certain emulations also modify certain types | |
105 | of linker behaviour. | |
106 | ||
107 | Emulations are created during the build process by the shell script | |
108 | @file{genscripts.sh}. | |
109 | ||
110 | The @file{genscripts.sh} script starts by reading a file in the | |
111 | @file{emulparams} directory. This is a shell script which sets various | |
112 | shell variables used by @file{genscripts.sh} and the other shell scripts | |
113 | it invokes. | |
114 | ||
115 | The @file{genscripts.sh} script will invoke a shell script in the | |
116 | @file{scripttempl} directory in order to create default linker scripts | |
117 | written in the linker command language. The @file{scripttempl} script | |
118 | will be invoked 5 (or, in some cases, 6) times, with different | |
119 | assignments to shell variables, to create different default scripts. | |
120 | The choice of script is made based on the command line options. | |
121 | ||
122 | After creating the scripts, @file{genscripts.sh} will invoke yet another | |
123 | shell script, this time in the @file{emultempl} directory. That shell | |
124 | script will create the emulation source file, which contains C code. | |
125 | This C code permits the linker emulation to override various linker | |
126 | behaviours. Most targets use the generic emulation code, which is in | |
127 | @file{emultempl/generic.em}. | |
128 | ||
129 | To summarize, @file{genscripts.sh} reads three shell scripts: an | |
130 | emulation parameters script in the @file{emulparams} directory, a linker | |
131 | script generation script in the @file{scripttempl} directory, and an | |
132 | emulation source file generation script in the @file{emultempl} | |
133 | directory. | |
134 | ||
135 | For example, the Sun 4 linker sets up variables in | |
136 | @file{emulparams/sun4.sh}, creates linker scripts using | |
137 | @file{scripttempl/aout.sc}, and creates the emulation code using | |
138 | @file{emultempl/sunos.em}. | |
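
The naming rule behind these three files is mechanical. The C sketch below is not part of @code{ld} (@file{genscripts.sh} is a shell script); it only models how the three input file names follow from the emulation name and the @code{SCRIPT_NAME} and @code{TEMPLATE_NAME} settings described in the next section, with @samp{generic} as the template default. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the three scripts genscripts.sh reads for one
   emulation.  */
struct emul_inputs
{
  char params[256];  /* emulparams/EMUL.sh */
  char script[256];  /* scripttempl/SCRIPT_NAME.sc */
  char tmpl[256];    /* emultempl/TEMPLATE_NAME.em */
};

/* Fill IN from the emulation name and the SCRIPT_NAME and TEMPLATE_NAME
   values; a NULL TEMPLATE_NAME selects the "generic" template.  */
void
resolve_emul_inputs (struct emul_inputs *in, const char *emul,
                     const char *script_name, const char *template_name)
{
  if (template_name == NULL)
    template_name = "generic";
  snprintf (in->params, sizeof in->params, "emulparams/%s.sh", emul);
  snprintf (in->script, sizeof in->script, "scripttempl/%s.sc",
            script_name);
  snprintf (in->tmpl, sizeof in->tmpl, "emultempl/%s.em", template_name);
}
```

For the Sun 4 linker, calling @code{resolve_emul_inputs (&in, "sun4", "aout", "sunos")} yields exactly the three files named above.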
139 | ||
140 | Note that the linker can support several emulations simultaneously, | |
141 | depending upon how it is configured. An emulation can be selected with | |
142 | the @code{-m} option. The @code{-V} option will list all supported | |
143 | emulations. | |
144 | ||
145 | @menu | |
146 | * emulation parameters:: @file{emulparams} scripts | |
147 | * linker scripts:: @file{scripttempl} scripts | |
148 | * linker emulations:: @file{emultempl} scripts | |
149 | @end menu | |
150 | ||
151 | @node emulation parameters | |
152 | @section @file{emulparams} scripts | |
153 | ||
154 | Each target selects a particular file in the @file{emulparams} directory | |
155 | by setting the shell variable @code{targ_emul} in @file{configure.tgt}. | |
156 | This shell variable is used by the @file{configure} script to control | |
157 | building an emulation source file. | |
158 | ||
159 | Certain conventions are enforced. Suppose the @code{targ_emul} variable | |
160 | is set to @var{emul} in @file{configure.tgt}. The name of the emulation | |
161 | shell script will be @file{emulparams/@var{emul}.sh}. The | |
162 | @file{Makefile} must have a target named @file{e@var{emul}.c}; this | |
163 | target must depend upon @file{emulparams/@var{emul}.sh}, as well as the | |
164 | appropriate scripts in the @file{scripttempl} and @file{emultempl} | |
165 | directories. The @file{Makefile} target must invoke @code{GENSCRIPTS} | |
166 | with two arguments: @var{emul}, and the value of the make variable | |
167 | @code{tdir_@var{emul}}. The value of the latter variable will be set by | |
168 | the @file{configure} script, and is used to set the default target | |
169 | directory to search. | |
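
A minimal C sketch of the names these conventions derive from a single @var{emul} value. The helper is hypothetical; in the real build these names appear in @file{Makefile} rules, not in C code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of the names the Makefile conventions derive from
   one targ_emul value.  */
struct emul_make_names
{
  char source[128];    /* generated emulation source, eEMUL.c */
  char params[128];    /* required dependency, emulparams/EMUL.sh */
  char tdir_var[128];  /* make variable whose value is GENSCRIPTS' 2nd arg */
};

void
derive_make_names (struct emul_make_names *n, const char *emul)
{
  snprintf (n->source, sizeof n->source, "e%s.c", emul);
  snprintf (n->params, sizeof n->params, "emulparams/%s.sh", emul);
  snprintf (n->tdir_var, sizeof n->tdir_var, "tdir_%s", emul);
}
```

For @var{emul} set to @samp{sun4}, this gives @file{esun4.c}, @file{emulparams/sun4.sh} and @code{tdir_sun4}; the rule would pass the value of @code{$(tdir_sun4)} as the second argument to @code{GENSCRIPTS}.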
170 | ||
171 | By convention, the @file{emulparams/@var{emul}.sh} shell script should | |
172 | only set shell variables. It may set shell variables which are to be | |
173 | interpreted by the @file{scripttempl} and the @file{emultempl} scripts. | |
174 | Certain shell variables are interpreted directly by the | |
175 | @file{genscripts.sh} script. | |
176 | ||
177 | Here is a list of shell variables interpreted by @file{genscripts.sh}, | |
178 | as well as some conventional shell variables interpreted by the | |
179 | @file{scripttempl} and @file{emultempl} scripts. | |
180 | ||
181 | @table @code | |
182 | @item SCRIPT_NAME | |
183 | This is the name of the @file{scripttempl} script to use. If | |
184 | @code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will use | |
b45619c0 | 185 | the script @file{scripttempl/@var{script}.sc}. |
252b5132 RH |
186 | |
187 | @item TEMPLATE_NAME | |
b45619c0 | 188 | This is the name of the @file{emultempl} script to use. If |
252b5132 RH |
189 | @code{TEMPLATE_NAME} is set to @var{template}, @file{genscripts.sh} will |
190 | use the script @file{emultempl/@var{template}.em}. If this variable is | |
191 | not set, the default value is @samp{generic}. | |
192 | ||
193 | @item GENERATE_SHLIB_SCRIPT | |
194 | If this is set to a nonempty string, @file{genscripts.sh} will invoke | |
195 | the @file{scripttempl} script an extra time to create a shared library | |
196 | script. @ref{linker scripts}. | |
197 | ||
198 | @item OUTPUT_FORMAT | |
199 | This is normally set to indicate the BFD output format use (e.g., | |
200 | @samp{"a.out-sunos-big"}. The @file{scripttempl} script will normally | |
201 | use it in an @code{OUTPUT_FORMAT} expression in the linker script. | |
202 | ||
203 | @item ARCH | |
204 | This is normally set to indicate the architecture to use (e.g., | |
205 | @samp{sparc}). The @file{scripttempl} script will normally use it in an | |
206 | @code{OUTPUT_ARCH} expression in the linker script. | |
207 | ||
208 | @item ENTRY | |
209 | Some @file{scripttempl} scripts use this to set the entry address, in an | |
210 | @code{ENTRY} expression in the linker script. | |
211 | ||
212 | @item TEXT_START_ADDR | |
213 | Some @file{scripttempl} scripts use this to set the start address of the | |
214 | @samp{.text} section. | |
215 | ||
252b5132 RH |
216 | @item SEGMENT_SIZE |
217 | The @file{genscripts.sh} script uses this to set the default value of | |
218 | @code{DATA_ALIGNMENT} when running the @file{scripttempl} script. | |
219 | ||
220 | @item TARGET_PAGE_SIZE | |
221 | If @code{SEGMENT_SIZE} is not defined, the @file{genscripts.sh} script | |
222 | uses this to define it. | |
223 | ||
224 | @item ALIGNMENT | |
225 | Some @file{scripttempl} scripts set this to a number to pass to | |
226 | @code{ALIGN} to set the required alignment for the @code{end} symbol. | |
227 | @end table | |
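
The interplay of @code{SEGMENT_SIZE}, @code{TARGET_PAGE_SIZE} and @code{DATA_ALIGNMENT} can be sketched in C, with unset shell variables modelled as @code{NULL}. This is only an illustration: the exact text of the @code{ALIGN} expression that @file{genscripts.sh} produces is an assumption here, and the @code{-N} case follows the description of @code{DATA_ALIGNMENT} in the next section. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* SEGMENT_SIZE falls back to TARGET_PAGE_SIZE when the emulparams
   script leaves it unset (NULL here).  */
const char *
default_segment_size (const char *segment_size,
                      const char *target_page_size)
{
  return segment_size != NULL ? segment_size : target_page_size;
}

/* Write the DATA_ALIGNMENT value for one scripttempl run into BUF.
   FOR_N_SCRIPT is nonzero when generating the -N script, which is not
   page aligned.  The "ALIGN(SEGMENT_SIZE)" form is an assumption.  */
void
default_data_alignment (char *buf, size_t len, const char *segment_size,
                        int for_N_script)
{
  if (for_N_script)
    snprintf (buf, len, ".");
  else
    snprintf (buf, len, "ALIGN(%s)", segment_size);
}
```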
228 | ||
229 | @node linker scripts | |
230 | @section @file{scripttempl} scripts | |
231 | ||
232 | Each linker target uses a @file{scripttempl} script to generate the | |
233 | default linker scripts. The name of the @file{scripttempl} script is | |
234 | set by the @code{SCRIPT_NAME} variable in the @file{emulparams} script. | |
235 | If @code{SCRIPT_NAME} is set to @var{script}, @code{genscripts.sh} will | |
236 | invoke @file{scripttempl/@var{script}.sc}. | |
237 | ||
238 | The @file{genscripts.sh} script will invoke the @file{scripttempl} | |
e2a83dd0 | 239 | script 5 to 9 times. Each time it will set the shell variable |
252b5132 RH |
240 | @code{LD_FLAG} to a different value. When the linker is run, the |
241 | options used will direct it to select a particular script. (Script | |
242 | selection is controlled by the @code{get_script} emulation entry point; | |
243 | this describes the conventional behaviour). | |
244 | ||
245 | The @file{scripttempl} script should just write a linker script, written | |
246 | in the linker command language, to standard output. If the emulation | |
247 | name--the name of the @file{emulparams} file without the @file{.sc} | |
248 | extension--is @var{emul}, then the output will be directed to | |
249 | @file{ldscripts/@var{emul}.@var{extension}} in the build directory, | |
250 | where @var{extension} changes each time the @file{scripttempl} script is | |
251 | invoked. | |
252 | ||
253 | Here is the list of values assigned to @code{LD_FLAG}. | |
254 | ||
255 | @table @code | |
256 | @item (empty) | |
257 | The script generated is used by default (when none of the following | |
258 | cases apply). The output has an extension of @file{.x}. | |
259 | @item n | |
260 | The script generated is used when the linker is invoked with the | |
261 | @code{-n} option. The output has an extension of @file{.xn}. | |
262 | @item N | |
263 | The script generated is used when the linker is invoked with the | |
264 | @code{-N} option. The output has an extension of @file{.xbn}. | |
265 | @item r | |
266 | The script generated is used when the linker is invoked with the | |
267 | @code{-r} option. The output has an extension of @file{.xr}. | |
268 | @item u | |
269 | The script generated is used when the linker is invoked with the | |
270 | @code{-Ur} option. The output has an extension of @file{.xu}. | |
271 | @item shared | |
272 | The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to | |
273 | this value if @code{GENERATE_SHLIB_SCRIPT} is defined in the | |
274 | @file{emulparams} file. The @file{emultempl} script must arrange to use | |
275 | this script at the appropriate time, normally when the linker is invoked | |
276 | with the @code{-shared} option. The output has an extension of | |
277 | @file{.xs}. | |
db6751f2 JJ |
278 | @item c |
279 | The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to | |
280 | this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the | |
281 | @file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf}. The | |
282 | @file{emultempl} script must arrange to use this script at the appropriate | |
283 | time, normally when the linker is invoked with the @code{-z combreloc} | |
284 | option. The output has an extension of | |
285 | @file{.xc}. | |
@item cshared
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if the condition for @code{c} holds
(@code{GENERATE_COMBRELOC_SCRIPT} is defined in the @file{emulparams}
file, or @code{SCRIPT_NAME} is @code{elf}) and
@code{GENERATE_SHLIB_SCRIPT} is also defined in the @file{emulparams}
file. The @file{emultempl} script must arrange to use this script at the
appropriate time, normally when the linker is invoked with the
@code{-shared -z combreloc} options. The output has an extension of
@file{.xsc}.
@item auto_import
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_AUTO_IMPORT_SCRIPT} is defined in the
@file{emulparams} file. The @file{emultempl} script must arrange to
use this script at the appropriate time, normally when the linker is
invoked with the @code{--enable-auto-import} option. The output has
an extension of @file{.xa}.
@end table
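
The table above amounts to a lookup from @code{LD_FLAG} to output extension, plus a count of how many times the @file{scripttempl} script runs. The C sketch below transcribes it; the names are hypothetical, and it assumes that @samp{cshared} requires both the combreloc condition and @code{GENERATE_SHLIB_SCRIPT}, which is one reading of the table's wording.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* LD_FLAG values and the suffix of ldscripts/EMUL.SUFFIX each produces,
   transcribed from the table above.  "" is the default script.  */
struct ld_flag_ext
{
  const char *ld_flag;
  const char *ext;
};

static const struct ld_flag_ext ld_flag_exts[] = {
  { "", "x" }, { "n", "xn" }, { "N", "xbn" }, { "r", "xr" }, { "u", "xu" },
  { "shared", "xs" }, { "c", "xc" }, { "cshared", "xsc" },
  { "auto_import", "xa" },
};

/* Return the extension for LD_FLAG, or NULL for an unknown value.  */
const char *
ld_flag_extension (const char *ld_flag)
{
  size_t i;

  for (i = 0; i < sizeof ld_flag_exts / sizeof ld_flag_exts[0]; i++)
    if (strcmp (ld_flag_exts[i].ld_flag, ld_flag) == 0)
      return ld_flag_exts[i].ext;
  return NULL;
}

/* Count the scripttempl runs for one emulation.  COMBRELOC is nonzero
   when GENERATE_COMBRELOC_SCRIPT is set or SCRIPT_NAME is elf.  */
int
scripttempl_runs (int generate_shlib, int combreloc, int auto_import)
{
  int runs = 5;                    /* (empty), n, N, r, u */

  if (generate_shlib)
    runs++;                        /* shared */
  if (combreloc)
    runs++;                        /* c */
  if (combreloc && generate_shlib)
    runs++;                        /* cshared */
  if (auto_import)
    runs++;                        /* auto_import */
  return runs;
}
```

With none of the optional variables set, @code{scripttempl_runs} returns 5; with all of them set it returns 9, which matches the range given at the start of this section.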
302 | ||
303 | Besides the shell variables set by the @file{emulparams} script, and the | |
304 | @code{LD_FLAG} variable, the @file{genscripts.sh} script will set | |
305 | certain variables for each run of the @file{scripttempl} script. | |
306 | ||
307 | @table @code | |
308 | @item RELOCATING | |
309 | This will be set to a non-empty string when the linker is doing a final | |
310 | relocation (e.g., all scripts other than @code{-r} and @code{-Ur}). | |
311 | ||
312 | @item CONSTRUCTING | |
313 | This will be set to a non-empty string when the linker is building | |
314 | global constructor and destructor tables (e.g., all scripts other than | |
315 | @code{-r}). | |
316 | ||
317 | @item DATA_ALIGNMENT | |
318 | This will be set to an @code{ALIGN} expression when the output should be | |
319 | page aligned, or to @samp{.} when generating the @code{-N} script. | |
320 | ||
321 | @item CREATE_SHLIB | |
322 | This will be set to a non-empty string when generating a @code{-shared} | |
323 | script. | |
db6751f2 JJ |
324 | |
325 | @item COMBRELOC | |
326 | This will be set to a non-empty string when generating @code{-z combreloc} | |
327 | scripts to a temporary file name which can be used during script generation. | |
252b5132 RH |
328 | @end table |
329 | ||
330 | The conventional way to write a @file{scripttempl} script is to first | |
331 | set a few shell variables, and then write out a linker script using | |
332 | @code{cat} with a here document. The linker script will use variable | |
333 | substitutions, based on the above variables and those set in the | |
334 | @file{emulparams} script, to control its behaviour. | |
335 | ||
336 | When there are parts of the @file{scripttempl} script which should only | |
337 | be run when doing a final relocation, they should be enclosed within a | |
338 | variable substitution based on @code{RELOCATING}. For example, on many | |
339 | targets special symbols such as @code{_end} should be defined when doing | |
340 | a final link. Naturally, those symbols should not be defined when doing | |
1049f94e | 341 | a relocatable link using @code{-r}. The @file{scripttempl} script |
252b5132 RH |
342 | could use a construct like this to define those symbols: |
343 | @smallexample | |
344 | $@{RELOCATING+ _end = .;@} | |
345 | @end smallexample | |
346 | This will do the symbol assignment only if the @code{RELOCATING} | |
347 | variable is defined. | |
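
Two things are at work in this construct: which variables @file{genscripts.sh} sets for a given @code{LD_FLAG}, and the shell expansion @code{$@{@var{var}+@var{word}@}}, which yields @var{word} when @var{var} is set (even to the empty string) and nothing otherwise, unlike @code{$@{@var{var}:+@var{word}@}}, which also requires a non-empty value. The C sketch below models both; the function names are hypothetical, and in the build the shell does this work.

```c
#include <assert.h>
#include <string.h>

/* Whether genscripts.sh sets RELOCATING for a given LD_FLAG: it is
   unset for the -r ("r") and -Ur ("u") scripts only.  */
int
relocating_set (const char *ld_flag)
{
  return strcmp (ld_flag, "r") != 0 && strcmp (ld_flag, "u") != 0;
}

/* CONSTRUCTING is unset only for the -r script.  */
int
constructing_set (const char *ld_flag)
{
  return strcmp (ld_flag, "r") != 0;
}

/* Model of ${VAR+WORD}: WORD if VAR is set, otherwise the empty
   string.  */
const char *
expand_plus (int var_is_set, const char *word)
{
  return var_is_set ? word : "";
}
```

So @code{$@{RELOCATING+ _end = .;@}} contributes the assignment to the default and @code{-shared} scripts but nothing to the @code{-r} and @code{-Ur} scripts.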
348 | ||
349 | The basic job of the linker script is to put the sections in the correct | |
350 | order, and at the correct memory addresses. For some targets, the | |
351 | linker script may have to do some other operations. | |
352 | ||
353 | For example, on most MIPS platforms, the linker is responsible for | |
354 | defining the special symbol @code{_gp}, used to initialize the | |
355 | @code{$gp} register. It must be set to the start of the small data | |
356 | section plus @code{0x8000}. Naturally, it should only be defined when | |
357 | doing a final relocation. This will typically be done like this: | |
358 | @smallexample | |
359 | $@{RELOCATING+ _gp = ALIGN(16) + 0x8000;@} | |
360 | @end smallexample | |
361 | This line would appear just before the sections which compose the small | |
362 | data section (@samp{.sdata}, @samp{.sbss}). All those sections would be | |
363 | contiguous in memory. | |
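
The arithmetic behind the @code{_gp} assignment, as a C sketch. @code{ALIGN(16)} rounds the location counter up to a multiple of 16; adding @code{0x8000} places @code{$gp} so that signed 16-bit offsets (@minus{}0x8000 to 0x7fff) reach the first 64 KiB of the small data area. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value of ALIGN(ALIGNMENT) at location counter DOT: round DOT up to a
   multiple of ALIGNMENT, which must be a power of two.  */
uint64_t
align_up (uint64_t dot, uint64_t alignment)
{
  assert (alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (dot + alignment - 1) & ~(alignment - 1);
}

/* _gp as assigned by "_gp = ALIGN(16) + 0x8000;".  $gp - 0x8000 is then
   the aligned start of the small data area and $gp + 0x7fff its last
   reachable byte.  */
uint64_t
mips_gp_value (uint64_t dot)
{
  return align_up (dot, 16) + 0x8000;
}
```

For example, a location counter of @code{0x10004} gives @code{_gp} = @code{0x18010}.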
364 | ||
365 | Many COFF systems build constructor tables in the linker script. The | |
366 | compiler will arrange to output the address of each global constructor | |
367 | in a @samp{.ctor} section, and the address of each global destructor in | |
368 | a @samp{.dtor} section (this is done by defining | |
369 | @code{ASM_OUTPUT_CONSTRUCTOR} and @code{ASM_OUTPUT_DESTRUCTOR} in the | |
370 | @code{gcc} configuration files). The @code{gcc} runtime support | |
371 | routines expect the constructor table to be named @code{__CTOR_LIST__}. | |
372 | They expect it to be a list of words, with the first word being the | |
373 | count of the number of entries. There should be a trailing zero word. | |
374 | (Actually, the count may be -1 if the trailing word is present, and the | |
375 | trailing word may be omitted if the count is correct, but, as the | |
376 | @code{gcc} behaviour has changed slightly over the years, it is safest | |
377 | to provide both). Here is a typical way that might be handled in a | |
378 | @file{scripttempl} file. | |
379 | @smallexample | |
380 | $@{CONSTRUCTING+ __CTOR_LIST__ = .;@} | |
381 | $@{CONSTRUCTING+ LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)@} | |
382 | $@{CONSTRUCTING+ *(.ctors)@} | |
383 | $@{CONSTRUCTING+ LONG(0)@} | |
384 | $@{CONSTRUCTING+ __CTOR_END__ = .;@} | |
385 | $@{CONSTRUCTING+ __DTOR_LIST__ = .;@} | |
386 | $@{CONSTRUCTING+ LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)@} | |
387 | $@{CONSTRUCTING+ *(.dtors)@} | |
388 | $@{CONSTRUCTING+ LONG(0)@} | |
389 | $@{CONSTRUCTING+ __DTOR_END__ = .;@} | |
390 | @end smallexample | |
391 | The use of @code{CONSTRUCTING} ensures that these linker script commands | |
392 | will only appear when the linker is supposed to be building the | |
393 | constructor and destructor tables. This example is written for a target | |
394 | which uses 4 byte pointers. | |
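
Why the count expression is @code{(__CTOR_END__ - __CTOR_LIST__) / 4 - 2}: the region holds the count word, @var{n} constructor addresses and the trailing zero, that is, @var{n} + 2 four-byte words. The C sketch below checks that arithmetic and shows how a runtime might recover @var{n} under either convention mentioned above. It is a model written for this note, not code from the @code{gcc} runtime, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout for 4-byte pointers:
     __CTOR_LIST__: count, ctor[0] ... ctor[n-1], 0 | __CTOR_END__
   The region spans n + 2 words, so the count word is
   (__CTOR_END__ - __CTOR_LIST__) / 4 - 2.  */
uint32_t
ctor_count_word (uint32_t ctor_list, uint32_t ctor_end)
{
  return (ctor_end - ctor_list) / 4 - 2;
}

/* Recover the number of entries from such a table: trust the count
   word unless it is -1, in which case scan for the trailing zero.  */
size_t
ctor_table_length (const uint32_t *table)
{
  size_t n;

  if (table[0] != (uint32_t) -1)
    return table[0];
  for (n = 0; table[n + 1] != 0; n++)
    ;
  return n;
}
```

Three constructors span 5 words (20 bytes), so the linker writes 3 as the count word.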
395 | ||
396 | Embedded systems often need to set a stack address. This is normally | |
397 | best done by using the @code{PROVIDE} construct with a default stack | |
398 | address. This permits the user to easily override the stack address | |
399 | using the @code{--defsym} option. Here is an example: | |
400 | @smallexample | |
401 | $@{RELOCATING+ PROVIDE (__stack = 0x80000000);@} | |
402 | @end smallexample | |
403 | The value of the symbol @code{__stack} would then be used in the startup | |
404 | code to initialize the stack pointer. | |
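
A simplified C model of why @code{PROVIDE} makes the default easy to override: the provided value is used only when the symbol is referenced and nothing else defines it. The structure and function are hypothetical; the model treats a @code{--defsym} definition like any other definition, which is the behaviour this paragraph relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified state of one symbol such as __stack.  */
struct sym_state
{
  int referenced;         /* e.g. by the startup code */
  int defined_elsewhere;  /* by --defsym or by an input object */
  uint64_t value;         /* meaningful when defined_elsewhere */
};

/* Return 1 and store the final value in *OUT if the symbol ends up
   defined, 0 if it stays undefined.  */
int
resolve_provided (const struct sym_state *s, uint64_t provide_default,
                  uint64_t *out)
{
  if (s->defined_elsewhere)
    {
      *out = s->value;          /* an explicit definition wins */
      return 1;
    }
  if (s->referenced)
    {
      *out = provide_default;   /* PROVIDE fills the gap */
      return 1;
    }
  return 0;                     /* unreferenced: PROVIDE does nothing */
}
```

With no @code{--defsym}, a startup file that references @code{__stack} gets @code{0x80000000}; linking with @code{--defsym __stack=0x20001000} makes the same reference resolve to @code{0x20001000}.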
405 | ||
406 | @node linker emulations | |
407 | @section @file{emultempl} scripts | |
408 | ||
409 | Each linker target uses an @file{emultempl} script to generate the | |
410 | emulation code. The name of the @file{emultempl} script is set by the | |
411 | @code{TEMPLATE_NAME} variable in the @file{emulparams} script. If the | |
412 | @code{TEMPLATE_NAME} variable is not set, the default is | |
413 | @samp{generic}. If the value of @code{TEMPLATE_NAME} is @var{template}, | |
414 | @file{genscripts.sh} will use @file{emultempl/@var{template}.em}. | |
415 | ||
416 | Most targets use the generic @file{emultempl} script, | |
417 | @file{emultempl/generic.em}. A different @file{emultempl} script is | |
418 | only needed if the linker must support unusual actions, such as linking | |
419 | against shared libraries. | |
420 | ||
421 | The @file{emultempl} script is normally written as a simple invocation | |
422 | of @code{cat} with a here document. The document will use a few | |
423 | variable substitutions. Typically each function names uses a | |
424 | substitution involving @code{EMULATION_NAME}, for ease of debugging when | |
425 | the linker supports multiple emulations. | |
426 | ||
427 | Every function and variable in the emitted file should be static. The | |
428 | only globally visible object must be named | |
429 | @code{ld_@var{EMULATION_NAME}_emulation}, where @var{EMULATION_NAME} is | |
430 | the name of the emulation set in @file{configure.tgt} (this is also the | |
431 | name of the @file{emulparams} file without the @file{.sh} extension). | |
432 | The @file{genscripts.sh} script will set the shell variable | |
433 | @code{EMULATION_NAME} before invoking the @file{emultempl} script. | |
434 | ||
435 | The @code{ld_@var{EMULATION_NAME}_emulation} variable must be a | |
436 | @code{struct ld_emulation_xfer_struct}, as defined in @file{ldemul.h}. | |
437 | It defines a set of function pointers which are invoked by the linker, | |
438 | as well as strings for the emulation name (normally set from the shell | |
439 | variable @code{EMULATION_NAME} and the default BFD target name (normally | |
440 | set from the shell variable @code{OUTPUT_FORMAT} which is normally set | |
441 | by the @file{emulparams} file). | |
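
A heavily simplified C sketch of what a generated @file{e@var{emul}.c} exports, assuming @var{EMULATION_NAME} is @samp{sun4} and @code{OUTPUT_FORMAT} is @samp{"a.out-sunos-big"}. The structure below is a hypothetical stand-in with only four members; the real @code{struct ld_emulation_xfer_struct} in @file{ldemul.h} has many more, so consult it before writing an emulation. The function names are illustrative.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for the real
   struct ld_emulation_xfer_struct declared in ldemul.h.  */
struct demo_emulation_xfer
{
  void (*before_parse) (void);
  void (*after_parse) (void);
  const char *emulation_name;  /* normally from EMULATION_NAME */
  const char *target_name;     /* default BFD target, from OUTPUT_FORMAT */
};

/* Everything the generated file defines is static ...  */
static int sun4_after_parse_calls;

static void
gldsun4_before_parse (void)
{
  /* Nothing to set up in this sketch.  */
}

static void
gldsun4_after_parse (void)
{
  sun4_after_parse_calls++;
}

/* ... except the single global, ld_EMULATION_NAME_emulation.  */
struct demo_emulation_xfer ld_sun4_emulation = {
  .before_parse = gldsun4_before_parse,
  .after_parse = gldsun4_after_parse,
  .emulation_name = "sun4",
  .target_name = "a.out-sunos-big",
};
```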
442 | ||
443 | The @file{genscripts.sh} script will set the shell variable | |
444 | @code{COMPILE_IN} when it invokes the @file{emultempl} script for the | |
445 | default emulation. In this case, the @file{emultempl} script should | |
446 | include the linker scripts directly, and return them from the | |
447 | @code{get_scripts} entry point. When the emulation is not the default, | |
448 | the @code{get_scripts} entry point should just return a file name. See | |
449 | @file{emultempl/generic.em} for an example of how this is done. | |
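
The @code{COMPILE_IN} split can be sketched as follows. The option structure, script text and function name are hypothetical, @samp{demo} stands for the emulation name, and a runtime flag stands in for what is really a choice made when the @file{.em} script is expanded. The point is only that the compiled-in case returns script text while the other case returns a file name under @file{ldscripts}, with a flag telling the caller which it got; take the real entry point's signature from @file{ldemul.h}.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical options that decide which default script applies.  */
struct demo_opts
{
  int relocatable;  /* -r */
  int shared;       /* -shared */
};

/* Stand-ins for the script text a COMPILE_IN build would embed.  */
static const char script_x[] = "/* default script */";
static const char script_xr[] = "/* -r script */";
static const char script_xs[] = "/* -shared script */";

/* Sketch of a get_script entry point for an emulation named "demo".
   When COMPILE_IN is nonzero, return the embedded text and set *ISFILE
   to 0; otherwise return a file name and set *ISFILE to 1.  */
const char *
demo_get_script (const struct demo_opts *o, int compile_in, int *isfile)
{
  *isfile = !compile_in;
  if (o->relocatable)
    return compile_in ? script_xr : "ldscripts/demo.xr";
  if (o->shared)
    return compile_in ? script_xs : "ldscripts/demo.xs";
  return compile_in ? script_x : "ldscripts/demo.x";
}
```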
450 | ||
451 | At some point, the linker emulation entry points should be documented. | |
452 | ||
453 | @node Emulation Walkthrough | |
454 | @chapter A Walkthrough of a Typical Emulation | |
455 | ||
456 | This chapter is to help people who are new to the way emulations | |
457 | interact with the linker, or who are suddenly thrust into the position | |
458 | of having to work with existing emulations. It will discuss the files | |
459 | you need to be aware of. It will tell you when the given "hooks" in | |
460 | the emulation will be called. It will, hopefully, give you enough | |
461 | information about when and how things happen that you'll be able to | |
462 | get by. As always, the source is the definitive reference to this. | |
463 | ||
464 | The starting point for the linker is in @file{ldmain.c} where | |
465 | @code{main} is defined. The bulk of the code that's emulation | |
466 | specific will initially be in @code{emultempl/@var{emulation}.em} but | |
467 | will end up in @code{e@var{emulation}.c} when the build is done. | |
468 | Most of the work to select and interface with emulations is in | |
469 | @code{ldemul.h} and @code{ldemul.c}. Specifically, @code{ldemul.h} | |
470 | defines the @code{ld_emulation_xfer_struct} structure your emulation | |
471 | exports. | |
472 | ||
473 | Your emulation file exports a symbol | |
474 | @code{ld_@var{EMULATION_NAME}_emulation}. If your emulation is | |
475 | selected (it usually is, since usually there's only one), | |
476 | @code{ldemul.c} sets the variable @var{ld_emulation} to point to it. | |
477 | @code{ldemul.c} also defines a number of API functions that interface | |
478 | to your emulation, like @code{ldemul_after_parse} which simply calls | |
479 | your @code{ld_@var{EMULATION}_emulation.after_parse} function. For | |
480 | the rest of this section, the functions will be mentioned, but you | |
481 | should assume the indirect reference to your emulation also. | |
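
A minimal C model of the selection and forwarding just described: a table of configured emulations, a pointer playing the role of @code{ld_emulation}, and a wrapper in the style of @code{ldemul_after_parse}. All names here are hypothetical; the real table and wrappers are in @file{ldemul.c}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced emulation descriptor.  */
struct demo_emul
{
  const char *name;
  void (*after_parse) (void);
};

static int calls_a, calls_b;
static void after_parse_a (void) { calls_a++; }
static void after_parse_b (void) { calls_b++; }

static struct demo_emul demo_a = { "demo_a", after_parse_a };
static struct demo_emul demo_b = { "demo_b", after_parse_b };

/* Configured emulations (what -V would list); the first is the
   default.  */
static struct demo_emul *const demo_emulations[] = { &demo_a, &demo_b, NULL };

/* The selected emulation, playing the role of ld_emulation.  */
static struct demo_emul *demo_ld_emulation;

/* Select the emulation named by -m, or the default if M is NULL.
   Return 0 if no configured emulation has that name.  */
int
demo_choose_mode (const char *m)
{
  struct demo_emul *const *e;

  if (m == NULL)
    {
      demo_ld_emulation = demo_emulations[0];
      return 1;
    }
  for (e = demo_emulations; *e != NULL; e++)
    if (strcmp ((*e)->name, m) == 0)
      {
        demo_ld_emulation = *e;
        return 1;
      }
  return 0;
}

/* Wrapper in the style of ldemul_after_parse: forward to the selected
   emulation's hook.  */
void
demo_ldemul_after_parse (void)
{
  demo_ld_emulation->after_parse ();
}
```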
482 | ||
483 | We will also skip or gloss over parts of the link process that don't | |
484 | relate to emulations, like setting up internationalization. | |
485 | ||
486 | After initialization, @code{main} selects an emulation by pre-scanning | |
487 | the command line arguments. It calls @code{ldemul_choose_target} to | |
488 | choose a target. If you set @code{choose_target} to | |
489 | @code{ldemul_default_target}, it picks your @code{target_name} by | |
490 | default. | |
491 | ||
492 | @code{main} calls @code{ldemul_before_parse}, then @code{parse_args}. | |
493 | @code{parse_args} calls @code{ldemul_parse_args} for each arg, which | |
494 | must update the @code{getopt} globals if it recognizes the argument. | |
495 | If the emulation doesn't recognize it, then parse_args checks to see | |
496 | if it recognizes it. | |
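
The order of precedence in option parsing, sketched in C: each argument is offered to the emulation first and only falls through to the generic parser if the emulation declines it. The option @code{--demo-stubs} and all function names are invented for illustration, and the real hook works on the @code{getopt} state rather than on single strings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical emulation option hook: return 1 if it consumed ARG.  */
static int demo_stub_flag;

int
demo_emul_parse_arg (const char *arg)
{
  if (strcmp (arg, "--demo-stubs") == 0)
    {
      demo_stub_flag = 1;
      return 1;
    }
  return 0;
}

/* Generic fallback, standing in for ld's own option handling.  */
int
demo_generic_parse_arg (const char *arg)
{
  return strcmp (arg, "-r") == 0 || strcmp (arg, "-shared") == 0;
}

/* Offer each argument to the emulation first, then to the generic
   parser.  Return the number of arguments nobody recognized.  */
int
demo_parse_args (int argc, const char *const *argv)
{
  int i, unknown = 0;

  for (i = 1; i < argc; i++)
    if (!demo_emul_parse_arg (argv[i]) && !demo_generic_parse_arg (argv[i]))
      unknown++;
  return unknown;
}
```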
497 | ||
498 | Now that the emulation has had access to all its command-line options, | |
499 | @code{main} calls @code{ldemul_set_symbols}. This can be used for any | |
500 | initialization that may be affected by options. It is also supposed | |
501 | to set up any variables needed by the emulation script. | |
502 | ||
503 | @code{main} now calls @code{ldemul_get_script} to get the emulation | |
504 | script to use (based on arguments, no doubt, @pxref{Emulations}) and | |
505 | runs it. While parsing, @code{ldgram.y} may call @code{ldemul_hll} or | |
506 | @code{ldemul_syslib} to handle the @code{HLL} or @code{SYSLIB} | |
507 | commands. It may call @code{ldemul_unrecognized_file} if you asked | |
508 | the linker to link a file it doesn't recognize. It will call | |
509 | @code{ldemul_recognized_file} for each file it does recognize, in case | |
510 | the emulation wants to handle some files specially. All the while, | |
511 | it's loading the files (possibly calling | |
512 | @code{ldemul_open_dynamic_archive}) and symbols and stuff. After it's | |
513 | done reading the script, @code{main} calls @code{ldemul_after_parse}. | |
514 | Use the after-parse hook to set up anything that depends on stuff the | |
515 | script might have set up, like the entry point. | |
516 | ||
517 | @code{main} next calls @code{lang_process} in @code{ldlang.c}. This | |
518 | appears to be the main core of the linking itself, as far as emulation | |
519 | hooks are concerned(*). It first opens the output file's BFD, calling | |
520 | @code{ldemul_set_output_arch}, and calls | |
521 | @code{ldemul_create_output_section_statements} in case you need to use | |
522 | other means to find or create object files (i.e. shared libraries | |
523 | found on a path, or fake stub objects). Despite the name, nobody | |
524 | creates output sections here. | |
525 | ||
526 | (*) In most cases, the BFD library does the bulk of the actual | |
527 | linking, handling symbol tables, symbol resolution, relocations, and | |
528 | building the final output file. See the BFD reference for all the | |
529 | details. Your emulation is usually concerned more with managing | |
530 | things at the file and section level, like "put this here, add this | |
531 | section", etc. | |
532 | ||
533 | Next, the objects to be linked are opened and BFDs created for them, | |
534 | and @code{ldemul_after_open} is called. At this point, you have all | |
535 | the objects and symbols loaded, but none of the data has been placed | |
536 | yet. | |
537 | ||
538 | Next comes the Big Linking Thingy (except for the parts BFD does). | |
539 | All input sections are mapped to output sections according to the | |
540 | script. If a section doesn't get mapped by default, | |
541 | @code{ldemul_place_orphan} will get called to figure out where it goes. | |
542 | Next it figures out the offsets for each section, calling | |
543 | @code{ldemul_before_allocation} before and | |
544 | @code{ldemul_after_allocation} after deciding where each input section | |
545 | ends up in the output sections. | |
546 | ||
547 | The last part of @code{lang_process} is to figure out all the symbols' | |
548 | values. After assigning final values to the symbols, | |
549 | @code{ldemul_finish} is called, and after that, any undefined symbols | |
550 | are turned into fatal errors. | |
551 | ||
552 | OK, back to @code{main}, which calls @code{ldwrite} in | |
553 | @file{ldwrite.c}. @code{ldwrite} calls BFD's final_link, which does | |
554 | all the relocation fixups and writes the output bfd to disk, and we're | |
555 | done. | |
556 | ||
In summary:

@itemize @bullet

@item @code{main()} in @file{ldmain.c}
@item @file{emultempl/@var{EMULATION}.em} has your code
@item @code{ldemul_choose_target} (defaults to your @code{target_name})
@item @code{ldemul_before_parse}
@item parse argv, calling @code{ldemul_parse_args} for each argument
@item @code{ldemul_set_symbols}
@item @code{ldemul_get_script}
@item parse the linker script

@itemize @bullet
@item may call @code{ldemul_hll} or @code{ldemul_syslib}
@item may call @code{ldemul_open_dynamic_archive}
@end itemize

@item @code{ldemul_after_parse}
@item @code{lang_process()} in @file{ldlang.c}

@itemize @bullet
@item create @code{output_bfd}
@item @code{ldemul_set_output_arch}
@item @code{ldemul_create_output_section_statements}
@item read objects and create input BFDs; all symbols exist, but have no values yet
@item may call @code{ldemul_unrecognized_file}
@item will call @code{ldemul_recognized_file}
@item @code{ldemul_after_open}
@item map input sections to output sections
@item may call @code{ldemul_place_orphan} for the remaining (orphan) sections
@item @code{ldemul_before_allocation}
@item give input sections offsets into output sections, and place the output sections
@item @code{ldemul_after_allocation} (section addresses are now valid)
@item assign values to symbols
@item @code{ldemul_finish} (symbol values are now valid)
@end itemize

@item @code{ldwrite}: the output BFD is written to disk

@end itemize
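
The hook ordering in the summary can be made concrete with a small C sketch. The @code{toy_emulation} table, the @code{toy_link} driver and the tracing helpers are all hypothetical. ld's real emulation table (@code{ld_emulation_xfer_type}, declared in @file{ldemul.h}) has many more hooks with real arguments, and the real driver is spread across @code{main} and @code{lang_process}. The sketch only shows where each hook fires relative to the linker's own work. In this sketch a NULL hook simply means "nothing to do".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down emulation table: every hook takes no arguments
   so that only the calling order is visible. */
typedef void (*hook_fn)(void);

struct toy_emulation {
  hook_fn before_parse;
  hook_fn after_parse;
  hook_fn after_open;
  hook_fn before_allocation;
  hook_fn after_allocation;
  hook_fn finish;
};

/* Trace of the steps that ran, in order. */
static const char *trace[16];
static size_t trace_len;

static void record(const char *step) {
  if (trace_len < sizeof trace / sizeof trace[0])
    trace[trace_len++] = step;
}

/* Position of STEP in the trace, or (size_t)-1 if it never ran. */
size_t trace_index(const char *step) {
  for (size_t i = 0; i < trace_len; i++)
    if (strcmp(trace[i], step) == 0)
      return i;
  return (size_t)-1;
}

/* In this sketch a NULL hook is a no-op. */
static void call(hook_fn h) {
  if (h)
    h();
}

/* Driver following the order given in the summary. */
void toy_link(const struct toy_emulation *emul) {
  call(emul->before_parse);
  record("parse argv and script");
  call(emul->after_parse);
  record("open inputs");
  call(emul->after_open);
  record("map input sections");
  call(emul->before_allocation);
  record("allocate sections");
  call(emul->after_allocation);  /* section addresses are valid here */
  record("assign symbol values");
  call(emul->finish);            /* symbol values are valid here */
  record("write output");
}

/* An example emulation that only cares about two hooks. */
static void demo_after_allocation(void) { record("after_allocation"); }
static void demo_finish(void) { record("finish"); }

const struct toy_emulation demo_emulation = {
  .after_allocation = demo_after_allocation,
  .finish = demo_finish,
};
```

After @code{toy_link (&demo_emulation)}, @code{trace_index} shows two orderings. @code{after_allocation} runs after sections are allocated, and @code{finish} runs after symbol values are assigned. This is why an emulation may rely on section addresses in the former and on symbol values in the latter.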

@node Architecture Specific
@chapter Some Architecture Specific Notes

This is the place for notes on the behavior of @code{ld} on
specific platforms. Currently, only Intel x86 is documented (and
of that, only the auto-import behavior for DLLs).

@menu
* ix86::                        Intel x86
@end menu

@node ix86
@section Intel x86

@code{ld} can create DLLs that operate with various runtimes available
on x86 Windows. These runtimes include native (using the mingw
``platform''), cygwin, and pw.

@table @emph
@item auto-import from DLLs
@enumerate
@item
With this feature enabled, DLL clients can import variables from a DLL
without any special handling on their side (for example, without any
source code modifications). Auto-import can be enabled with the
@code{--enable-auto-import} flag and disabled with the
@code{--disable-auto-import} flag. Auto-import is disabled by default.

@item
This is done almost entirely within the bounds of the PE specification.
There is one minor violation of the specification, described below, but
in practice auto-import works on all known variants of Windows, so the
resulting DLL can be used with any other PE compiler or linker.

@item
Auto-import is fully compatible with the standard import method, in
which variables are decorated using attribute modifiers (such as
@code{__declspec(dllimport)}). Libraries of either type may be mixed
together.

@item
Overhead: in space, 8 bytes per imported symbol plus 20 bytes for each
reference to it; in load time, negligible; in virtual/physical memory,
it should be less than the effect of DLL relocation.
@end enumerate
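
The space figures above reduce to simple arithmetic. The following C function is a hypothetical helper, not part of @code{ld}, that computes the estimate from the two quoted constants. For example, 3 imported variables referenced 10 times in total cost 3 * 8 + 10 * 20 = 224 bytes.

```c
/* Hypothetical helper (not part of ld): estimated extra space, in bytes,
   for auto-importing SYMBOLS variables that are referenced REFERENCES
   times in total, using the figures quoted in the text:
   8 bytes per imported symbol plus 20 bytes per reference. */
unsigned long auto_import_overhead(unsigned long symbols,
                                   unsigned long references) {
  return 8UL * symbols + 20UL * references;
}
```

The per-reference term dominates as soon as a variable is used in more than one place. This is why the text calls the 20 bytes per reference the sustained overhead.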

Motivation

The only real way to get rid of the need for @code{dllimport}
decorations is to make the client access the variable directly in the
DLL, bypassing the extra dereference imposed by ordinary DLL runtime
linking. That is, whenever the client contains something like

@code{mov dll_var,%eax}

the address of @code{dll_var} in the instruction should be relocated to
point into the loaded DLL. The aim is to have the OS loader do this, and
to have @code{ld} set things up so that it can. The import section of a
PE image is laid out as follows: there is a vector of structures, each
describing the imports from a particular DLL. Each such structure points
to two other parallel vectors: one holding the imported names, and one
that will hold the address of the corresponding imported name. The
solution, then, is to de-vectorize these structures, so that import
locations become sparse and point directly into code.

Implementation

For each reference to a data symbol that is to be imported from a DLL
(a symbol @var{sym} is treated as such if @code{__imp_@var{sym}} is
found in the import library), an import fixup entry is generated. That
entry is an @code{IMAGE_IMPORT_DESCRIPTOR} stored in the
@code{.idata$3} subsection. Each fixup entry contains a pointer to the
place within the @code{.text} section where the symbol's address is used
(marked with a @code{__fu@var{N}_@var{sym}} symbol, where @var{N} is an
integer), a pointer to the DLL name (so the DLL name is referenced by
multiple entries), and a pointer to the symbol name thunk. The symbol
name thunk is a singleton vector (@code{__nm_th_@var{sym}}) pointing to
an @code{IMAGE_IMPORT_BY_NAME} structure (@code{__nm_@var{sym}}) that
directly contains the imported name.

This is where the minor violation of the specification mentioned above
comes in. The PE specification states that the name vector
(@code{OriginalFirstThunk}) should run in parallel with the address
vector (@code{FirstThunk}); that is, both should have the same number of
elements and be terminated with a zero entry. We violate this, since
@code{FirstThunk} points directly into machine code. In practice,
however, the OS loader is implemented the sane way: it walks
@code{OriginalFirstThunk} and stores the resolved addresses into
@code{FirstThunk}, without otherwise relying on the contents of
@code{FirstThunk}.

Note again that the DLL name and symbol name structures are shared
across fixup entries and would have to be present anyway to support
standard imports. The sustained overhead is therefore 20 bytes per
reference, the size of one @code{IMAGE_IMPORT_DESCRIPTOR}.

Another question is whether several @code{IMAGE_IMPORT_DESCRIPTOR}s for
the same DLL are allowed. The answer is yes; even the native
compiler/linker does this. The functions of libth32 actually reside in
the Windows 9x @file{kernel32.dll}, so a program that uses them has two
@code{IMAGE_IMPORT_DESCRIPTOR}s for @file{kernel32.dll}. Yet another
question is whether referencing the same PE structures several times is
valid. There is no reason why not: prohibiting it (that is, detecting
the violation) would require more work on the loader's part than
allowing it.
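
The following C sketch simulates the mechanism described above on a byte array that stands in for a loaded 32-bit image. The simulated loader walks @code{OriginalFirstThunk} and stores each resolved address in the parallel @code{FirstThunk} slot. When @code{FirstThunk} points at an instruction operand, that operand is therefore patched directly. The descriptor layout (five 32-bit RVA fields, 20 bytes) follows the PE format. Everything else is a made-up assumption for illustration: the image layout, the DLL name @file{foo.dll}, the resolver @code{demo_resolve}, and the address 0x10002000. Real loaders also handle hints, ordinals, forwarders and binding, which are ignored here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PE import descriptor: five 32-bit fields, 20 bytes in total.
   Every "pointer" in it is an RVA (offset from the image base). */
typedef struct {
  uint32_t OriginalFirstThunk; /* RVA of the name-thunk vector */
  uint32_t TimeDateStamp;
  uint32_t ForwarderChain;
  uint32_t Name;               /* RVA of the DLL name string */
  uint32_t FirstThunk;         /* RVA where resolved addresses are stored */
} import_descriptor;

_Static_assert(sizeof(import_descriptor) == 20, "descriptor must be 20 bytes");

/* Little-endian 32-bit access to the simulated image. */
static uint32_t get32(const uint8_t *image, uint32_t rva) {
  return (uint32_t)image[rva] | (uint32_t)image[rva + 1] << 8 |
         (uint32_t)image[rva + 2] << 16 | (uint32_t)image[rva + 3] << 24;
}

static void put32(uint8_t *image, uint32_t rva, uint32_t value) {
  for (int i = 0; i < 4; i++)
    image[rva + i] = (uint8_t)(value >> (8 * i));
}

/* Hypothetical resolver type: address of SYM exported by DLL. */
typedef uint32_t (*resolve_fn)(const char *dll, const char *sym);

/* Simplified loader.  The descriptor array ends with an all-zero entry.
   For each descriptor, walk the zero-terminated OriginalFirstThunk vector
   and store each resolved address in the parallel FirstThunk slot.
   FirstThunk is only written, never read, so it may point into code. */
void load_imports(uint8_t *image, uint32_t desc, resolve_fn resolve) {
  for (;; desc += sizeof(import_descriptor)) {
    uint32_t names =
        get32(image, desc + offsetof(import_descriptor, OriginalFirstThunk));
    uint32_t dll = get32(image, desc + offsetof(import_descriptor, Name));
    uint32_t slots =
        get32(image, desc + offsetof(import_descriptor, FirstThunk));
    if (names == 0 && dll == 0 && slots == 0)
      break;
    for (uint32_t i = 0;; i++) {
      uint32_t by_name = get32(image, names + 4 * i);
      if (by_name == 0)
        break;
      /* IMAGE_IMPORT_BY_NAME: a 2-byte hint, then the NUL-terminated name. */
      const char *sym = (const char *)image + by_name + 2;
      put32(image, slots + 4 * i, resolve((const char *)image + dll, sym));
    }
  }
}

/* Hypothetical demo image layout (RVAs). */
enum { CODE = 0x00, DESC = 0x10, NAME_THUNK = 0x40, BY_NAME = 0x50,
       DLL_NAME = 0x60, IMAGE_SIZE = 0x80 };

/* Build one mov dll_var,%eax instruction (encoded here as opcode 0xA1
   followed by a 32-bit address), one fixup descriptor whose FirstThunk
   is that operand (the __fu0_dll_var location), a singleton name thunk,
   and the two names. */
void build_demo_image(uint8_t *image) {
  memset(image, 0, IMAGE_SIZE);
  image[CODE] = 0xA1;
  put32(image, DESC + offsetof(import_descriptor, OriginalFirstThunk),
        NAME_THUNK);
  put32(image, DESC + offsetof(import_descriptor, Name), DLL_NAME);
  put32(image, DESC + offsetof(import_descriptor, FirstThunk), CODE + 1);
  put32(image, NAME_THUNK, BY_NAME);         /* next entry stays 0 */
  memcpy(image + BY_NAME + 2, "dll_var", 8); /* hint 0, then the name */
  memcpy(image + DLL_NAME, "foo.dll", 8);
}

/* Hypothetical resolver: pretend foo.dll exports dll_var at 0x10002000. */
uint32_t demo_resolve(const char *dll, const char *sym) {
  return strcmp(dll, "foo.dll") == 0 && strcmp(sym, "dll_var") == 0
             ? 0x10002000u
             : 0;
}
```

After @code{build_demo_image} and @code{load_imports (image, DESC, demo_resolve)}, the four operand bytes of the instruction hold 0x10002000. The opcode is left untouched. Because the name vector has exactly one entry, the loader writes exactly one 32-bit value through @code{FirstThunk}. This is why the missing zero terminator on the @code{FirstThunk} side is harmless in practice.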

@end table

@node GNU Free Documentation License
@chapter GNU Free Documentation License

@include fdl.texi

@contents
@bye