\input texinfo @c -*-texinfo-*-
@tex
\special{twoside}
@end tex
@setfilename as
@settitle as
@titlepage
@center @titlefont{as}
@sp 1
@center The GNU Assembler
@sp 2
@center Dean Elsner, Jay Fenlason & friends
@sp 13
The Free Software Foundation Inc. thanks The Nice Computer
Company of Australia for loaning Dean Elsner to write the
first (Vax) version of @code{as} for Project GNU.
The proprietors, management and staff of TNCCA thank FSF for
distracting the boss while they got some work
done.
@sp 3

Copyright @copyright{} 1986,1987 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through Tex and print the
results, provided the printed document carries copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).

@end ignore
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the same conditions as for modified versions.

@end titlepage
@node top, Syntax, top, top
@chapter Overview, Usage
@menu
* Syntax::              The (machine independent) syntax that assembly language
                        files must follow. The machine dependent syntax
                        can be found in the machine dependent section of
                        the manual for the machine that you are using.
* Segments::            How to use segments and subsegments, and how the
                        assembler and linker will relocate things.
* Symbols::             How to set up and manipulate symbols.
* Expressions::         And how the assembler deals with them.
* PseudoOps::           The assorted machine directives that tell the
                        assembler exactly what to do with its input.
* MachineDependent::    Information specific to each machine.
* Maintenance::         Keeping the assembler running.
* Retargeting::         Teaching the assembler about new machines.
@end menu

This document describes the GNU assembler @code{as}. This document
does @emph{not} describe what an assembler does, or how it works.
This document also does @emph{not} describe the opcodes, registers
or addressing modes that @code{as} uses on any particular computer
that @code{as} runs on. Consult a good book on assemblers or the
machine's architecture if you need that information.

This document describes the directives that @code{as} understands,
and their syntax. This document also describes some of the
machine-dependent features of various flavors of the assembler.
This document also describes how the assembler works internally, and
provides some information that may be useful to people attempting to
port the assembler to another machine.

Throughout this document, we assume that you are running @dfn{GNU},
the portable operating system from the @dfn{Free Software
Foundation, Inc.} This restricts our attention to certain kinds of
computer (in particular, the kinds of computers that GNU can run on);
once this assumption is granted, examples and definitions need less
qualification.

Readers should already understand these terms:
@itemize @bullet
@item
central processing unit
@item
registers
@item
memory address
@item
contents of memory address
@item
bit
@item
8-bit byte
@item
2's complement arithmetic
@end itemize

@code{as} is part of a team of programs that turn a high-level
human-readable series of instructions into a low-level
computer-readable series of instructions. Different versions of
@code{as} are used for different kinds of computer. In particular,
at the moment, @code{as} only works for the DEC Vax, the Motorola
680x0, the Intel 80386, the Sparc, and the National Semiconductor
32032/32532.
109 | ||
110 | @section Notation | |
111 | GNU and @code{as} assume the computer that will run the programs it | |
112 | assembles will obey these rules. | |
113 | ||
114 | A (memory) @dfn{address} is 32 bits. The lowest address is zero. | |
115 | ||
116 | The @dfn{contents} of any memory address is one @dfn{byte} of | |
117 | exactly 8 bits. | |
118 | ||
119 | A @dfn{word} is 16 bits stored in two bytes of memory. The addresses | |
120 | of the bytes differ by exactly 1. Notice that the interpretation of | |
121 | the bits in a word and of how to address a word depends on which | |
122 | particular computer you are assembling for. | |
123 | ||
124 | A @dfn{long word}, or @dfn{long}, is 32 bits composed of four bytes. | |
125 | It is stored in 4 bytes of memory; these bytes have contiguous | |
126 | addresses. Again the interpretation and addressing of those bits is | |
127 | machine dependent. National Semiconductor 32x32 computers say | |
128 | @i{double word} where we say @i{long}. | |
129 | ||
130 | Numeric quantities are usually @i{unsigned} or @i{2's complement}. | |
131 | Bytes, words and longs may store numbers. @code{as} manipulates | |
132 | integer expressions as 32-bit numbers in 2's complement format. | |
133 | When asked to store an integer in a byte or word, the lowest order | |
134 | bits are stored. The order of bytes in a word or long in memory is | |
135 | determined by what kind of computer will run the assembled program. | |
136 | We won't mention this important @i{caveat} again. | |
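
To make the storage rule concrete, here is a small Python sketch (Python is used only for illustration; @code{as} itself is not written this way). The helper name @code{store_integer} is hypothetical, and the target's byte order is passed in as a parameter, an assumption standing in for ``what kind of computer will run the assembled program''.

```python
def store_integer(value, size, byteorder):
    """Hypothetical helper: return the SIZE bytes that hold VALUE.

    VALUE is first reduced to a 32-bit 2's complement number, as as does
    for integer expressions; only its lowest-order 8*SIZE bits are kept.
    BYTEORDER ("little" or "big") stands in for the target machine.
    """
    value &= 0xFFFFFFFF                          # 32-bit 2's complement view
    low_bits = value & ((1 << (8 * size)) - 1)   # keep the lowest-order bits
    return low_bits.to_bytes(size, byteorder)
```

For example, @code{store_integer(-1, 1, "little")} gives the single byte 0xff, and storing 0x12345678 in a word keeps only 0x5678, laid out as 0x78 0x56 on a little-endian machine such as the Vax and as 0x56 0x78 on a big-endian one such as the 680x0.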
137 | ||
138 | The meaning of these terms has changed over time. Although @i{byte} | |
139 | used to mean any length of contiguous bits, @i{byte} now pervasively | |
140 | means exactly 8 contiguous bits. A @i{word} of 16 bits made sense | |
141 | for 16-bit computers. Even on 32-bit computers, a @i{word} still | |
142 | means 16 bits (to machine language programmers). To many other | |
143 | programmers of GNU a @i{word} means 32 bits, so beware. Similarly | |
144 | @i{long} means 32 bits: from ``long word''. National Semiconductor | |
145 | 32x32 machine language calls a 32-bit number a ``double word''. | |
146 | ||
147 | @example | |
148 | ||
149 | Names for integers of different sizes: some conventions | |
150 | ||
151 | ||
152 | length as vax 32x32 680x0 GNU C | |
153 | (bits) | |
154 | ||
155 | 8 byte byte byte byte char | |
156 | 16 word word word word short (int) | |
157 | 32 long long(-word) double-word long(-word) long (int) | |
158 | 64 quad quad(-word) | |
159 | 128 octa octa-word | |
160 | ||
161 | @end example | |
162 | ||
@section as, the GNU Assembler
@dfn{As} is an assembler; it is one of the team of programs that
`compile' your programs into the binary numbers that a computer uses
to `run' your program. Often @code{as} reads a @i{source} program
written by a compiler and writes an @dfn{object} program for the
linker (sometimes referred to as a @dfn{loader}) @code{ld} to read.

The source program consists of @dfn{statements} and comments. Each
statement might @dfn{assemble} to one (and only one) machine
language instruction or to one very simple datum.

Mostly you don't have to think about the assembler because the
compiler invokes it as needed; in that sense the assembler is just
another part of the compiler. If you write your own assembly
language program, then you must run the assembler yourself to get an
object file suitable for linking. The rest of this chapter describes how to do this.

@code{as} is only intended to assemble the output of the C compiler
@code{cc} for use by the linker @code{ld}. @code{as} tries to
assemble correctly everything that the standard assembler would
assemble, with a few exceptions (described in the machine-dependent
chapters). Note that this doesn't mean @code{as} will use the same
syntax as the standard assembler. For example, we know of several
incompatible syntaxes for the 680x0.

Each version of the assembler knows about just one kind of machine
language, but much is common between the versions, including object
file formats, (most) assembler directives (often called
@dfn{pseudo-ops}) and assembler syntax.

Unlike older assemblers, @code{as} tries to assemble a source program
in one pass of the source file. This subtly changes the meaning of
the @kbd{.org} directive (@pxref{Org}).

If you want to write assembly language programs, you must tell
@code{as} what numbers should be in a computer's memory, and which
addresses should contain them, so that the program may be executed
by the computer. Using symbols will prevent many bookkeeping
mistakes that can occur if you use raw numbers.

@section Command Line Synopsis
@example
as [ options @dots{} ] [ file1 @dots{} ]
@end example

After the program name @code{as}, the command line may contain
options and file names. Options may be in any order, and may be
before, after, or between file names. The order of file names is
significant.

@subsection Options

Except for @samp{--}, any command line argument that begins with a
hyphen (@samp{-}) is an option. Each option changes the behavior of
@code{as}. No option changes the way another option works. An
option is a @samp{-} followed by one or more letters; the case of
the letter is important. No option (letter) should be used twice on
the same command line. (Nobody has decided what two copies of the
same option should mean.) All options are optional.

Some options expect exactly one file name to follow them. The file
name may either immediately follow the option's letter (compatible
with older assemblers) or it may be the next command argument (GNU
standard). These two command lines are equivalent:

@example
as -o my-object-file.o mumble
as -omy-object-file.o mumble
@end example

@file{--} (that's two hyphens, not one) by itself always names the
standard input file.
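
The command-line rules above can be sketched in Python as follows. The function @code{parse_command_line} and its return value are hypothetical; the sketch assumes @samp{-o} is the only option that takes a file name (as in the option list later in this chapter), and it follows the rule, described under Input Files, that @code{as} reads standard input when no file names are given.

```python
def parse_command_line(args):
    """Hypothetical sketch: sort an as command line into its parts.

    Returns (object file name, option letters, input file names).
    """
    output = "a.out"      # default object file name
    letters = []          # other option letters, e.g. f, L, R, W
    inputs = []           # input files, kept in command-line order
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "--":
            inputs.append(arg)            # "--" names the standard input
        elif arg.startswith("-o"):
            if len(arg) > 2:
                output = arg[2:]          # older style: -omy-object-file.o
            elif i + 1 < len(args):
                i += 1
                output = args[i]          # GNU style: -o my-object-file.o
            else:
                raise ValueError("-o needs a file name")
        elif arg.startswith("-"):
            letters.extend(arg[1:])       # a hyphen and one or more letters
        else:
            inputs.append(arg)            # anything else is an input file
        i += 1
    if not inputs:
        inputs.append("--")               # no file names: read standard input
    return output, letters, inputs
```

Both spellings of the @samp{-o} example give the same result, @code{("my-object-file.o", [], ["mumble"])}.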
235 | ||
236 | @section Input File(s) | |
237 | ||
238 | We use the words @dfn{source program}, abbreviated @dfn{source}, to | |
239 | describe the program input to one run of @code{as}. The program may | |
240 | be in one or more files; how the source is partitioned into files | |
241 | doesn't change the meaning of the source. | |
242 | ||
243 | The source text is a catenation of the text in each file. | |
244 | ||
245 | Each time you run @code{as} it assembles exactly one source | |
246 | program. A source program text is made of one or more files. | |
247 | (The standard input is also a file.) | |
248 | ||
249 | You give @code{as} a command line that has zero or more input file | |
250 | names. The input files are read (from left file name to right). A | |
251 | command line argument (in any position) that has no special meaning | |
252 | is taken to be an input file name. If @code{as} is given no file | |
253 | names it attempts to read one input file from @code{as}'s standard | |
254 | input. | |
255 | ||
256 | Use @file{--} if you need to explicitly name the standard input file | |
257 | in your command line. | |
258 | ||
259 | It is OK to assemble an empty source. @code{as} will produce a | |
260 | small, empty object file. | |
261 | ||
262 | If you try to assemble no files then @code{as} will try to read | |
263 | standard input, which is normally your terminal. You may have to | |
264 | type @key{ctl-D} to tell @code{as} there is no more program to | |
265 | assemble. | |
266 | ||
267 | @subsection Input Filenames and Line-numbers | |
268 | A line is text up to and including the next newline. The first line | |
269 | of a file is numbered @b{1}, the next @b{2} and so on. | |
270 | ||
271 | There are two ways of locating a line in the input file(s) and both | |
272 | are used in reporting error messages. One way refers to a line | |
273 | number in a physical file; the other refers to a line number in a | |
274 | logical file. | |
275 | ||
276 | @dfn{Physical files} are those files named in the command line given | |
277 | to @code{as}. | |
278 | ||
279 | @dfn{Logical files} are ``pretend'' files which bear no relation to | |
280 | physical files. Logical file names help error messages reflect the | |
281 | proper source file. Often they are used when @code{as}' source is | |
282 | itself synthesized from other files. | |
283 | ||
284 | @section Output (Object) File | |
285 | Every time you run @code{as} it produces an output file, which is | |
286 | your assembly language program translated into numbers. This file | |
287 | is the object file; named @code{a.out} unless you tell @code{as} to | |
288 | give it another name by using the @code{-o} option. Conventionally, | |
289 | object file names end with @file{.o}. The default name of | |
290 | @file{a.out} is used for historical reasons. Older assemblers were | |
291 | capable of assembling self-contained programs directly into a | |
292 | runnable program. This may still work, but hasn't been tested. | |
293 | ||
294 | The object file is for input to the linker @code{ld}. It contains | |
295 | assembled program code, information to help @code{ld} to integrate | |
296 | the assembled program into a runnable file and (optionally) symbolic | |
297 | information for the debugger. The precise format of object files is | |
298 | described elsewhere. | |
299 | ||
300 | @comment link above to some info file(s) like the description of a.out. | |
301 | @comment don't forget to describe GNU info as well as Unix lossage. | |
302 | ||
303 | @section Error and Warning Messages | |
304 | ||
305 | @code{as} may write warnings and error messages to the standard | |
306 | error file (usually your terminal). This should not happen when | |
307 | @code{as} is run automatically by a compiler. Error messages are | |
308 | useful for those (few) people who still write in assembly language. | |
309 | ||
310 | Warnings report an assumption made so that @code{as} could keep | |
311 | assembling a flawed program. | |
312 | ||
313 | Errors report a grave problem that stops the assembly. | |
314 | ||
315 | Warning messages have the format | |
316 | @example | |
317 | file_name:line_number:Warning Message Text | |
318 | @end example | |
319 | If a logical file name has been given (@xref{File}.) it is used for | |
320 | the filename, otherwise the name of the current input file is used. | |
321 | If a logical line number was given (@xref{Line}.) then it is used to | |
322 | calculate the number printed, otherwise the actual line in the | |
323 | current source file is printed. The message text is intended to be | |
324 | self explanatory (In the grand Unix tradition). | |
325 | ||
326 | Error messages have the format | |
327 | @example | |
328 | file_name:line_number:FATAL:Error Message Text | |
329 | @end example | |
330 | The file name and line number are derived the same as for warning | |
331 | messages. The actual message text may be rather less explanatory | |
332 | because many of them aren't supposed to happen. | |
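
A hypothetical Python helper, @code{format_message}, shows how the two formats fit together. It assumes the logical line number has already been calculated (@pxref{Line}); it only chooses between logical and physical names and numbers and joins the fields with colons.

```python
def format_message(text, file_name, line_number,
                   logical_file=None, logical_line=None, fatal=False):
    """Hypothetical sketch of the warning and error message layout."""
    # A logical file name or line number, when given, replaces the physical one.
    if logical_file is not None:
        file_name = logical_file
    if logical_line is not None:
        line_number = logical_line
    fields = [file_name, str(line_number)]
    if fatal:
        fields.append("FATAL")            # errors carry the FATAL tag
    fields.append(text)
    return ":".join(fields)
```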
333 | ||
334 | @section Options | |
335 | @subsection -f Works Faster | |
336 | @samp{-f} should only be used when assembling programs written by a | |
337 | (trusted) compiler. @samp{-f} causes the assembler to not bother | |
338 | pre-processing the input file(s) before assembling them. Needless | |
339 | to say, if the files actually need to be pre-processed (if the | |
340 | contain comments, for example), @code{as} will not work correctly if | |
341 | @samp{-f} is used. | |
342 | ||
343 | @subsection -L Includes Local Labels | |
344 | For historical reasons, labels beginning with @samp{L} (upper case | |
345 | only) are called @dfn{local labels}. Normally you don't see such | |
346 | labels because they are intended for the use of programs (like | |
347 | compilers) that compose assembler programs, not for your notice. | |
348 | Normally both @code{as} and @code{ld} discard such labels, so you | |
349 | don't normally debug with them. | |
350 | ||
351 | This option tells @code{as} to retain those @samp{L@dots{}} symbols | |
352 | in the object file. Usually if you do this you also tell the linker | |
353 | @code{ld} to preserve symbols whose names begin with @samp{L}. | |
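
The effect of @samp{-L} on which symbols survive can be modelled in a line of Python. The function @code{symbols_to_keep} is hypothetical and ignores everything else @code{as} does when writing symbols; it only shows which names count as local labels.

```python
def symbols_to_keep(names, keep_local=False):
    """Hypothetical sketch: drop local labels unless -L was given.

    Local labels are those beginning with an upper-case "L" only.
    """
    return [name for name in names if keep_local or not name.startswith("L")]
```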
354 | ||
355 | @subsection -o Names the Object File | |
356 | There is always one object file output when you run @code{as}. By | |
357 | default it has the name @file{a.out}. You use this option (which | |
358 | takes exactly one filename) to give the object file a different name. | |
359 | ||
360 | Whatever the object file is called, @code{as} will overwrite any | |
361 | existing file of the same name. | |
362 | ||
363 | @subsection -R Folds Data Segment into Text Segment | |
364 | @code{-R} tells @code{as} to write the object file as if all | |
365 | data-segment data lives in the text segment. This is only done at | |
366 | the very last moment: your binary data are the same, but data | |
367 | segment parts are relocated differently. The data segment part of | |
368 | your object file is zero bytes long because all it bytes are | |
369 | appended to the text segment. (@xref{Segments}.) | |
370 | ||
371 | When you use @code{-R} it would be nice to generate shorter address | |
372 | displacements (possible because we don't have to cross segments) | |
373 | between text and data segment. We don't do this simply for | |
374 | compatibility with older versions of @code{as}. @code{-R} may work | |
375 | this way in future. | |
376 | ||
377 | @subsection -W Represses Warnings | |
378 | @code{as} should never give a warning or error message when | |
379 | assembling compiler output. But programs written by people often | |
380 | cause @code{as} to give a warning that a particular assumption was | |
381 | made. All such warnings are directed to the standard error file. | |
382 | If you use this option, any warning is repressed. This option only | |
383 | affects warning messages: it cannot change any detail of how | |
384 | @code{as} assembles your file. Errors, which stop the assembly, are | |
385 | still reported. | |
386 | ||
387 | @section Special Features to support Compilers | |
388 | ||
389 | In order to assemble compiler output into something that will work, | |
390 | @code{as} will occasionlly do strange things to @samp{.word} | |
391 | directives. In particular, when @code{gas} assembles a directive of | |
392 | the form @samp{.word sym1-sym2}, and the difference between | |
393 | @code{sym1} and @code{sym2} does not fit in 16 bits, @code{as} will | |
394 | create a @dfn{secondary jump table}, immediately before the next | |
395 | label. This @var{secondary jump table} will be preceeded by a | |
396 | short-jump to the first byte after the table. The short-jump | |
397 | prevents the flow-of-control from accidentally falling into the | |
398 | table. Inside the table will be a long-jump to @code{sym2}. The | |
399 | original @samp{.word} will contain @code{sym1} minus (the address of | |
400 | the long-jump to sym2) If there were several @samp{.word sym1-sym2} | |
401 | before the secondary jump table, all of them will be adjusted. If | |
402 | ther was a @samp{.word sym3-sym4}, that also did not fit in sixteen | |
403 | bits, a long-jump to @code{sym4} will be included in the secondary | |
404 | jump table, and the @code{.word}(s), will be adjusted to contain | |
405 | @code{sym3} minus (the address of the long-jump to sym4), etc. | |
406 | ||
407 | @emph{This feature may be disabled by compiling @code{as} with the | |
408 | @samp{-DWORKING_DOT_WORD} option.} This feature is likely to confuse | |
409 | assembly language programmers. | |
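
The adjustment can be followed step by step in this Python sketch. The name @code{build_secondary_jump_table} is hypothetical, and so are several details: the difference is assumed to need a signed 16-bit fit, the sizes of the short and long jumps are passed in because they are machine dependent, and the sketch ignores the fact that inserting the table moves any code that follows it.

```python
def fits_in_16_bits(value):
    """Assumption: a .word difference must fit as a signed 16-bit value."""
    return -0x8000 <= value <= 0x7FFF


def build_secondary_jump_table(words, table_addr, short_jump_size, long_jump_size):
    """Hypothetical sketch of the .word sym1-sym2 adjustment.

    WORDS is a list of (sym1, sym2) address pairs, one per .word directive
    seen before the next label.  TABLE_ADDR is where the secondary jump
    table starts.  Returns the final .word values and the list of
    (long-jump address, target) entries placed in the table.
    """
    next_entry = table_addr + short_jump_size   # skip the short jump over the table
    entry_for = dict()                          # target -> address of its long jump
    table = []
    values = []
    for sym1, sym2 in words:
        if fits_in_16_bits(sym1 - sym2):
            values.append(sym1 - sym2)          # fits: no adjustment needed
            continue
        if sym2 not in entry_for:               # one long jump per target
            entry_for[sym2] = next_entry
            table.append((next_entry, sym2))
            next_entry += long_jump_size
        values.append(sym1 - entry_for[sym2])   # sym1 minus address of long jump
    return values, table
```

With the table at 0x200, a 2-byte short jump and 6-byte long jumps, two @samp{.word} directives that both reach a far-away @code{sym2} share one long jump at 0x202, and each is rewritten to hold @code{sym1} minus 0x202.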
410 | ||
411 | @node Syntax, Segments, top, top | |
412 | @chapter Syntax | |
413 | This chapter informally defines the machine-independent syntax | |
414 | allowed in a source file. @code{as} has ordinary syntax; it tries | |
415 | to be upward compatible from BSD 4.2 assembler except @code{as} does | |
416 | not assemble Vax bit-fields. | |
417 | ||
418 | @section The Pre-processor | |
419 | The preprocess phase handles several aspects of the syntax. The | |
420 | pre-processor will be disabled by the @samp{-f} option, or if the | |
421 | first line of the source file is @code{#NO_APP}. The option to | |
422 | disable the pre-processor was designed to make compiler output | |
423 | assemble as fast as possible. | |
424 | ||
425 | The pre-processor adjusts and removes extra whitespace. It leaves | |
426 | one space or tab before the keywords on a line, and turns any other | |
427 | whitespace on the line into a single space. | |
428 | ||
429 | The pre-processor removes all comments, replacing them with a single | |
430 | space (for /* @dots{} */ comments), or an appropriate number of | |
431 | newlines. | |
432 | ||
433 | The pre-processor converts character constants into the appropriate | |
434 | numeric values. | |
435 | ||
436 | This means that excess whitespace, comments, and character constants | |
437 | cannot be used in the portions of the input text that are not | |
438 | pre-processed. | |
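
A much simplified Python model of the whitespace and comment handling described above follows. The function @code{preprocess} is hypothetical: it does not understand strings, character constants or the special @samp{#} lines, and keeping the newlines of a multi-line @samp{/*} comment is an assumption about what ``an appropriate number of newlines'' means.

```python
import re


def preprocess(text, line_comment="#"):
    """Hypothetical, much simplified model of the as pre-processor.

    It ignores strings and character constants, and the special meaning
    of "#" at the start of a line, so it is only a sketch of the idea.
    """
    def drop_block_comment(match):
        # A /* ... */ comment becomes one space.  Assumption: any newlines
        # inside it are kept so that later line numbers stay correct.
        return " " + "\n" * match.group(0).count("\n")

    text = re.sub(r"/\*.*?\*/", drop_block_comment, text, flags=re.S)
    lines = []
    for line in text.split("\n"):
        cut = line.find(line_comment)
        if cut >= 0:
            line = line[:cut]                        # drop a line comment
        lines.append(re.sub(r"[ \t]+", " ", line))   # whitespace -> one space
    return "\n".join(lines)
```

Passing @code{line_comment="|"} models the 680x0, where @samp{#} is free to mark immediate operands.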
439 | ||
440 | If the first line of an input file is @code{#NO_APP} or the | |
441 | @samp{-f} option is given, the input file will not be | |
442 | pre-processed. Within such an input file, parts of the file can be | |
443 | pre-processed by putting a line that says @code{#APP} before the | |
444 | text that should be pre-processed, and putting a line that says | |
445 | @code{#NO_APP} after them. This feature is mainly intend to support | |
446 | asm statements in compilers whose output normally does not need to | |
447 | be pre-processed. | |
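
The @code{#APP}/@code{#NO_APP} rules can be expressed as a small Python function that marks which lines the pre-processor would see. The name @code{mark_preprocessed} is hypothetical, and the sketch assumes the marker lines themselves are consumed rather than assembled.

```python
def mark_preprocessed(lines, fast=False):
    """Hypothetical sketch: pair each line with True if it is pre-processed.

    FAST stands for the -f option.
    """
    whole_file = not fast and not (lines and lines[0].strip() == "#NO_APP")
    result = []
    in_app = False
    for line in lines:
        marker = line.strip()
        if not whole_file and marker == "#APP":
            in_app = True                 # start of a pre-processed region
            continue
        if not whole_file and marker == "#NO_APP":
            in_app = False                # end of the region (or first line)
            continue
        result.append((line, whole_file or in_app))
    return result
```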
448 | ||
449 | @section Whitespace | |
450 | @dfn{Whitespace} is one or more blanks or tabs, in any order. | |
451 | Whitespace is used to separate symbols, and to make programs neater | |
452 | for people to read. Unless within character constants | |
453 | (@xref{Characters}.), any whitespace means the same as exactly one | |
454 | space. | |
455 | ||
456 | @section Comments | |
457 | There are two ways of rendering comments to @code{as}. In both | |
458 | cases the comment is equivalent to one space. | |
459 | ||
460 | Anything from @samp{/*} through the next @samp{*/} is a comment. | |
461 | ||
462 | @example | |
463 | /* | |
464 | The only way to include a newline ('\n') in a comment | |
465 | is to use this sort of comment. | |
466 | */ | |
467 | /* This sort of comment does not nest. */ | |
468 | @end example | |
469 | ||
470 | Anything from the @dfn{line comment} character to the next newline | |
471 | considered a comment and is ignored. The line comment character is | |
472 | @samp{#} on the Vax, and @samp{|} on the 680x0. | |
473 | @xref{MachineDependent}. On some machines there are two different | |
474 | line comment characters. One will only begin a comment if it is the | |
475 | first non-whitespace character on a line, while the other will | |
476 | always begin a comment. | |
477 | ||
478 | To be compatible with past assemblers a special interpretation is | |
479 | given to lines that begin with @samp{#}. Following the @samp{#} an | |
480 | absolute expression (@pxref{Expressions}) is expected: this will be | |
481 | the logical line number of the @b{next} line. Then a string | |
482 | (@xref{Strings}.) is allowed: if present it is a new logical file | |
483 | name. The rest of the line, if any, should be whitespace. | |
484 | ||
485 | If the first non-whitespace characters on the line are not numeric, | |
486 | the line is ignored. (Just like a comment.) | |
487 | @example | |
488 | # This is an ordinary comment. | |
489 | # 42-6 "new_file_name" # New logical file name | |
490 | # This is logical line # 36. | |
491 | @end example | |
492 | This feature is deprecated, and may disappear from future versions | |
493 | of @code{as}. | |
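
Parsing such a line might look like the Python sketch below. The function @code{parse_line_marker} is hypothetical, and it assumes the absolute expression uses only decimal numbers joined by @samp{+} and @samp{-}, which is enough for the example above but far less than @code{as} accepts.

```python
def parse_line_marker(line):
    """Hypothetical sketch for a line that begins with "#".

    Returns (logical line number of the next line, new logical file name
    or None), or None when the line is just a comment.  Assumption: the
    absolute expression is limited to decimal numbers joined by + and -.
    """
    body = line[1:].strip()                  # text after the leading "#"
    if not body or not body[0].isdigit():
        return None                          # not numeric: ignored like a comment
    expr, quote, rest = body.partition('"')
    name = rest.split('"', 1)[0] if quote else None
    total, sign, number = 0, 1, ""
    for ch in expr + " ":                    # trailing space flushes the last number
        if ch.isdigit():
            number += ch
            continue
        if number:
            total += sign * int(number)
            number = ""
        if ch == "+":
            sign = 1
        elif ch == "-":
            sign = -1
        elif ch not in " \t":
            break                            # anything else ends the expression
    return total, name
```

For the second line of the example, the sketch returns @code{(36, "new_file_name")}.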
494 | ||
495 | @section Symbols | |
496 | A @dfn{symbol} is one or more characters chosen from the set of all | |
497 | letters (both upper and lower case), digits and the three characters | |
498 | @samp{_.$}. No symbol may begin with a digit. Case is | |
499 | significant. There is no length limit: all characters are | |
500 | significant. Symbols are delimited by characters not in that set, | |
501 | or by begin/end-of-file. (@xref{Symbols}.) | |
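
The symbol rules translate directly into a short Python check. The name @code{is_symbol} is hypothetical, and the check covers only the machine-independent character set given here.

```python
import string

# Letters (both cases), digits, and the three characters _ . $
SYMBOL_CHARS = frozenset(string.ascii_letters + string.digits + "_.$")


def is_symbol(name):
    """Hypothetical check of the machine-independent symbol rules."""
    return (len(name) > 0
            and name[0] not in string.digits      # may not begin with a digit
            and all(ch in SYMBOL_CHARS for ch in name))
```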
502 | ||
503 | @section Statements | |
504 | A @dfn{statement} ends at a newline character (@samp{\n}) or at a | |
505 | semicolon (@samp{;}). The newline or semicolon is considered part | |
506 | of the preceding statement. Newlines and semicolons within | |
507 | character constants are an exception: they don't end statements. | |
508 | It is an error to end any statement with end-of-file: the last | |
509 | character of any input file should be a newline. | |
510 | ||
511 | You may write a statement on more than one line if you put a | |
512 | backslash (@kbd{\}) immediately in front of any newlines within the | |
513 | statement. When @code{as} reads a backslashed newline both | |
514 | characters are ignored. You can even put backslashed newlines in | |
515 | the middle of symbol names without changing the meaning of your | |
516 | source program. | |
517 | ||
518 | An empty statement is OK, and may include whitespace. It is ignored. | |
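
The statement rules can be sketched as a Python function that cuts pre-processed text into statements. The function @code{split_statements} is hypothetical; it assumes comments have already been removed, and it treats the character right after a single quote (and after @samp{'\}) as literal, following the rules for character constants.

```python
def split_statements(source):
    """Hypothetical sketch: cut pre-processed source text into statements.

    Each statement keeps its terminating newline or semicolon.  Comments
    are assumed to have been removed already.
    """
    source = source.replace("\\\n", "")   # a backslashed newline is ignored
    statements = []
    current = ""
    in_string = False
    i = 0
    while i < len(source):
        ch = source[i]
        current += ch
        if in_string:
            if ch == "\\" and i + 1 < len(source):
                i += 1
                current += source[i]        # escaped character, e.g. \"
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "'" and i + 1 < len(source):
            i += 1
            current += source[i]            # literal, even ";" or a newline
            if source[i] == "\\" and i + 1 < len(source):
                i += 1
                current += source[i]        # escaped character constant, e.g. '\J
        elif ch in "\n;":
            statements.append(current)      # the terminator ends the statement
            current = ""
        i += 1
    if current:
        raise ValueError("the last statement must end with a newline")
    return statements
```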
519 | ||
520 | Statements begin with zero or more labels, followed by a @dfn{key | |
521 | symbol} which determines what kind of statement it is. The key | |
522 | symbol determines the syntax of the rest of the statement. If the | |
523 | symbol begins with a dot (@t{.}) then the statement is an assembler | |
524 | directive: typically valid for any computer. If the symbol begins | |
525 | with a letter the statement is an assembly language | |
526 | @dfn{instruction}: it will assemble into a machine language | |
527 | instruction. Different versions of @code{as} for different | |
528 | computers will recognize different instructions. In fact, the same | |
529 | symbol may represent a different instruction in a different | |
530 | computer's assembly language. | |
531 | ||
532 | A label is usually a symbol immediately followed by a colon | |
533 | (@code{:}). Whitespace before a label or after a colon is OK. You | |
534 | may not have whitespace between a label's symbol and its colon. | |
535 | Labels are explained below. | |
536 | @xref{Labels}. | |
537 | ||
538 | @example | |
539 | label: .directive followed by something | |
540 | another$label: # This is an empty statement. | |
541 | instruction operand_1, operand_2, @dots{} | |
542 | @end example | |
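
Once a statement has been isolated, its labels and key symbol can be picked out as in this Python sketch. @code{classify_statement} is a hypothetical name, and the regular expressions assume the machine-independent symbol characters; real instruction parsing is machine dependent.

```python
import re

# A symbol: a letter, "_", "." or "$", then letters, digits, "_", "." or "$".
SYMBOL = r"[A-Za-z_.$][A-Za-z0-9_.$]*"
LABEL = re.compile(r"\s*(" + SYMBOL + r"):")   # no whitespace before the colon
KEY = re.compile(r"\s*(" + SYMBOL + r")")


def classify_statement(statement):
    """Hypothetical sketch: return (labels, kind, key symbol)."""
    labels = []
    pos = 0
    while True:
        match = LABEL.match(statement, pos)
        if match is None:
            break
        labels.append(match.group(1))
        pos = match.end()
    match = KEY.match(statement, pos)
    if match is None:
        return labels, "empty", None
    key = match.group(1)
    if key.startswith("."):
        kind = "directive"          # typically valid for any computer
    elif key[0].isalpha():
        kind = "instruction"        # machine dependent
    else:
        kind = "unknown"            # e.g. a key symbol starting with _ or $
    return labels, kind, key
```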
543 | ||
544 | @section Constants | |
545 | A constant is a number, written so that its value is known by | |
546 | inspection, without knowing any context. Like this: | |
547 | @example | |
548 | .byte 74, 0112, 092, 0x4A, 0X4a, 'J, '\J # All the same value. | |
549 | .ascii "Ring the bell\7" # A string constant. | |
550 | .octa 0x123456789abcdef0123456789ABCDEF0 # A bignum. | |
551 | .float 0f-314159265358979323846264338327\ | |
552 | 95028841971.693993751E-40 # - pi, a flonum. | |
553 | @end example | |
554 | ||
555 | @node Characters, Strings, , Syntax | |
556 | @subsection Character Constants | |
557 | There are two kinds of character constants. @dfn{Characters} stand | |
558 | for one character in one byte and their values may be used in | |
559 | numeric expressions. String constants (properly called string | |
560 | @i{literals}) are potentially many bytes and their values may not be | |
561 | used in arithmetic expressions. | |
562 | ||
563 | @node Strings, , Characters, Syntax | |
564 | @subsubsection Strings | |
565 | A @dfn{string} is written between double-quotes. It may contain | |
566 | double-quotes or null characters. The way to get weird characters | |
567 | into a string is to @dfn{escape} these characters: precede them with | |
568 | a backslash (@code{\}) character. For example @samp{\\} represents | |
569 | one backslash: the first @code{\} is an escape which tells | |
570 | @code{as} to interpret the second character literally as a backslash | |
571 | (which prevents @code{as} from recognizing the second @code{\} as an | |
572 | escape character). The complete list of escapes follows. | |
573 | ||
574 | @table @kbd | |
575 | @item \EOF | |
576 | A @kbd{\} followed by end-of-file erroneous. It is treated just | |
577 | like an end-of-file without a preceding backslash. | |
578 | @c @item \a | |
579 | @c Mnemonic for ACKnowledge; for ASCII this is octal code 007. | |
580 | @item \b | |
581 | Mnemonic for backspace; for ASCII this is octal code 010. | |
582 | @c @item \e | |
583 | @c Mnemonic for EOText; for ASCII this is octal code 004. | |
584 | @item \f | |
585 | Mnemonic for FormFeed; for ASCII this is octal code 014. | |
586 | @item \n | |
587 | Mnemonic for newline; for ASCII this is octal code 012. | |
588 | @c @item \p | |
589 | @c Mnemonic for prefix; for ASCII this is octal code 033, usually known as @code{escape}. | |
590 | @item \r | |
591 | Mnemonic for carriage-Return; for ASCII this is octal code 015. | |
592 | @c @item \s | |
593 | @c Mnemonic for space; for ASCII this is octal code 040. Included for compliance with | |
594 | @c other assemblers. | |
595 | @item \t | |
596 | Mnemonic for horizontal Tab; for ASCII this is octal code 011. | |
597 | @c @item \v | |
598 | @c Mnemonic for Vertical tab; for ASCII this is octal code 013. | |
599 | @c @item \x @var{digit} @var{digit} @var{digit} | |
600 | @c A hexadecimal character code. The numeric code is 3 hexadecimal digits. | |
601 | @item \ @var{digit} @var{digit} @var{digit} | |
602 | An octal character code. The numeric code is 3 octal digits. | |
603 | For compatibility with other Unix systems, 8 and 9 are legal digits | |
604 | with values 010 and 011 respectively. | |
605 | @item \\ | |
606 | Represents one @samp{\} character. | |
607 | @c @item \' | |
608 | @c Represents one @samp{'} (accent acute) character. | |
609 | @c This is needed in single character literals | |
610 | @c (@xref{Characters}.) to represent | |
611 | @c a @samp{'}. | |
612 | @item \" | |
613 | Represents one @samp{"} character. Needed in strings to represent | |
614 | this character, because an unescaped @samp{"} would end the string. | |
615 | @item \ @var{anything-else} | |
616 | Any other character when escaped by @kbd{\} will give a warning, but | |
617 | assemble as if the @samp{\} was not present. The idea is that if | |
618 | you used an escape sequence you clearly didn't want the literal | |
619 | interpretation of the following character. However @code{as} has no | |
620 | other interpretation, so @code{as} knows it is giving you the wrong | |
621 | code and warns you of the fact. | |
622 | @end table | |
623 | ||
624 | Which characters are escapable, and what those escapes represent, | |
625 | varies widely among assemblers. The current set is what we think | |
626 | BSD 4.2 @code{as} recognizes, and is a subset of what most C | |
627 | compilers recognize. If you are in doubt, don't use an escape | |
628 | sequence. | |
629 | ||
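
As an illustration only, the escapes described above can be modelled by a short decoder. This Python sketch is not the code inside @code{as}. The function name @code{decode_string} and its @code{warn} callback are hypothetical. The sketch handles only the form feed, newline, carriage return, tab, octal, backslash and double quote escapes listed above. It also assumes that an octal code is reduced to a single byte, a point the table does not settle.

@example
```python
# Hypothetical sketch of the string escapes described above; it is not
# the code inside as.  Only \f \n \r \t \\ \" and octal codes are handled.
SIMPLE = dict([("f", "\f"), ("n", "\n"), ("r", "\r"), ("t", "\t"),
               ("\\", "\\"), ('"', '"')])
DIGITS = "0123456789"


def decode_string(body, warn=print):
    """Decode the text between the double quotes of a string."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\" or i + 1 == len(body):
            out.append(ch)          # ordinary character (or a lone final \)
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in SIMPLE:
            out.append(SIMPLE[nxt])
            i += 2
        elif nxt in DIGITS:
            # Read up to three digits, weighting each as an octal digit;
            # 8 and 9 simply keep the values 010 and 011.
            code = 0
            j = i + 1
            while j < len(body) and j <= i + 3 and body[j] in DIGITS:
                code = code * 8 + int(body[j])
                j += 1
            out.append(chr(code & 0xFF))  # assume a byte-wide code
            i = j
        else:
            warn("warning: unknown escape \\" + nxt)
            out.append(nxt)         # assemble as if the \ were absent
            i += 2
    return "".join(out)
```
@end example

Under these assumptions @code{decode_string(r"\101")} returns @samp{A}, and @code{decode_string(r"\q")} reports a warning and returns @samp{q}.
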
@subsubsection Characters
A single character may be written as a single quote immediately
followed by that character. The same escapes apply to characters as
to strings. So if you want to write the character backslash, you
must write @kbd{'\\} where the first @code{\} escapes the second
@code{\}. As you can see, the quote is an accent acute, not an
accent grave. A newline (or semicolon (@samp{;})) immediately
following an accent acute is taken as a literal character and does
not count as the end of a statement. The value of a character
constant in a numeric expression is the machine's byte-wide code for
that character. @code{as} assumes your character code is ASCII: @kbd{'A}
means 65, @kbd{'B} means 66, and so on.

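
The rule for character constants can be sketched the same way. The Python helper @code{char_value} is hypothetical. It handles only the single-letter escapes from the table and leaves out octal escapes for brevity. Like the text, it assumes ASCII codes.

@example
```python
# Hypothetical sketch: value of a character constant such as 'A.
# Only the simple escapes are modelled; octal escapes are not.
ESCAPE_CODES = dict([("n", 10), ("t", 9), ("f", 12), ("r", 13),
                     ("\\", 92), ('"', 34)])


def char_value(text):
    """Return the ASCII code that a character constant stands for."""
    if len(text) < 2 or text[0] != "'":
        raise ValueError("not a character constant: " + repr(text))
    if text[1] != "\\":
        # Any character, even a newline or ';', is taken literally.
        return ord(text[1])
    if len(text) < 3:
        raise ValueError("incomplete escape: " + repr(text))
    # Unknown escapes fall back to the escaped character itself.
    return ESCAPE_CODES.get(text[2], ord(text[2]))
```
@end example

Under these assumptions @code{char_value("'A")} gives 65 and @code{char_value("';")} gives 59, because a semicolon right after the quote is taken literally.
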
@subsection Number Constants
@code{as} distinguishes 3 flavors of numbers according to how they
are stored in the target machine. @i{Integers} are numbers that
would fit into an @code{int} in the C language. @i{Bignums} are
integers, but they are stored in more than 32 bits. @i{Flonums}
are floating point numbers, described below.

@subsubsection Integers
An octal integer is @samp{0} followed by zero or more of the octal
digits (@samp{01234567}).

A decimal integer starts with a non-zero digit followed by zero or
more digits (@samp{0123456789}).

A hexadecimal integer is @samp{0x} or @samp{0X} followed by one or
more hexadecimal digits chosen from @samp{0123456789abcdefABCDEF}.

Integers have the obvious values. To denote a negative integer, use
the unary operator @samp{-} discussed under expressions
(@pxref{Unops}).

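
The three integer spellings map directly onto regular expressions. The Python sketch below uses the hypothetical name @code{parse_integer} and only illustrates the rules above. It rejects a leading @samp{-}, because negation is a separate unary operator.

@example
```python
import re

# Hypothetical sketch of the three integer spellings described above.
OCTAL = re.compile(r"0[0-7]*\Z")
DECIMAL = re.compile(r"[1-9][0-9]*\Z")
HEX = re.compile(r"0[xX][0-9a-fA-F]+\Z")


def parse_integer(text):
    """Return the value of an integer constant, or raise ValueError."""
    if HEX.match(text):
        return int(text[2:], 16)
    if OCTAL.match(text):
        return int(text, 8)   # "0" alone is octal zero
    if DECIMAL.match(text):
        return int(text, 10)
    raise ValueError("not an integer constant: " + text)
```
@end example
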
@subsubsection Bignums
A @dfn{bignum} has the same syntax and semantics as an integer
except that the number (or its negative) takes more than 32 bits to
represent in binary. The distinction is made because in some places
integers are permitted while bignums are not.

@subsubsection Flonums
A @dfn{flonum} represents a floating point number. The translation
is complex: a decimal floating point number from the text is
converted by @code{as} to a generic binary floating point number of
more than sufficient precision. This generic floating point number
is converted to the particular computer's floating point format(s)
by a portion of @code{as} specialized to that computer.

A flonum is written as the following parts, in order:
@itemize @bullet
@item
The digit @samp{0}.
@item
A letter, to tell @code{as} the rest of the number is a flonum.
@kbd{e} is recommended. Case is not important.
(Any otherwise illegal letter will work here,
but that might be changed. The Vax BSD 4.2 assembler
seems to allow any of @samp{defghDEFGH}.)
@item
An optional sign: either @samp{+} or @samp{-}.
@item
An optional integer part: zero or more decimal digits.
@item
An optional fraction part: @samp{.} followed by zero
or more decimal digits.
@item
An optional exponent, consisting of:
@itemize @bullet
@item
A letter; the exact significance varies according to
the computer that executes the program. @code{as}
accepts any letter for now. Case is not important.
@item
An optional sign: either @samp{+} or @samp{-}.
@item
One or more decimal digits.
@end itemize
@end itemize

At least one of @var{integer part} or @var{fraction part} must be
present. The floating point number has the obvious value.

The computer running @code{as} needs no floating point hardware:
@code{as} does all processing using integers.

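
The written form of a flonum can be checked with one regular expression. This Python sketch uses the hypothetical name @code{parse_flonum} and follows the list above. It excludes @samp{x} as the marker letter, because @samp{0x} introduces a hexadecimal integer. Unlike @code{as}, it converts the digits with the host's floating point arithmetic rather than with integer-only code.

@example
```python
import re

# Hypothetical sketch of the flonum syntax described above.  Unlike as,
# it converts the digits with the host's floating point hardware.
FLONUM = re.compile(
    r"0([A-WYZa-wyz])"                 # marker letter; x would mean hex
    r"([+-]?)"                         # optional sign
    r"([0-9]*)"                        # optional integer part
    r"(\.[0-9]*)?"                     # optional fraction part
    r"(?:[A-Za-z]([+-]?[0-9]+))?\Z"    # optional exponent
)


def parse_flonum(text):
    """Return the value of a flonum as a Python float."""
    m = FLONUM.match(text)
    if m is None or not (m.group(3) or m.group(4)):
        # At least one of the integer and fraction parts must be present.
        raise ValueError("not a flonum: " + text)
    sign, integer, fraction, exponent = m.group(2, 3, 4, 5)
    mantissa = (integer or "0") + (fraction or "")
    return float(sign + mantissa + "e" + (exponent or "0"))
```
@end example

For example, @samp{0e12.5e-1} is read as integer part 12, fraction part .5 and exponent -1, which gives 1.25.
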
@node Segments, Symbols, Syntax, top
@chapter (Sub)Segments & Relocation
Roughly, a @dfn{segment} is a range of addresses, with no gaps, with
all data ``in'' those addresses being treated the same. For example,
there may be a ``read only'' segment.

The linker @code{ld} reads many object files (partial programs) and
combines their contents to form a runnable program. When @code{as}
emits an object file, the partial program is assumed to start at
address 0. @code{ld} will assign the final addresses the partial
program occupies, so that different partial programs don't overlap.
That explanation is too simple, but it will suffice to explain how
@code{as} works.

@code{ld} moves blocks of bytes of your program to their run-time
addresses. These blocks slide to their run-time addresses as rigid
units; their length does not change and neither does the order of
bytes within them. Such a rigid unit is called a @i{segment}.
Assigning run-time addresses to segments is called
@dfn{relocation}. It includes the task of adjusting mentions of
object-file addresses so they refer to the proper run-time addresses.

An object file written by @code{as} has three segments, any of which
may be empty. These are named @i{text}, @i{data} and @i{bss}
segments. Within the object file, the text segment starts at
address 0, the data segment follows, and the bss segment follows the
data segment.

To let @code{ld} know which data will change when the segments are
relocated, and how to change that data, @code{as} also writes to the
object file details of the relocation needed. To perform relocation
@code{ld} must know, for each mention of an address in the object
file:
@itemize @bullet
@item
At what address in the object file does this mention of
an address begin?
@item
How long (in bytes) is this mention?
@item
Which segment does the address refer to?
@item
What is the numeric value of (@var{address} @t{-}
@var{start-address of segment})?
@item
Is the mention of an address ``Program counter relative''?
@end itemize

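
Here is a minimal sketch of how @code{ld} might use these facts. It assumes that each mention holds (@var{address} @t{-} @var{start-address of segment}) in little-endian byte order, as on the Vax. The record type @code{Reloc} and the function @code{relocate} are hypothetical, and program counter relative mentions are not modelled.

@example
```python
from collections import namedtuple

# Hypothetical record holding the facts ld needs for one mention of an
# address: where it is, how long it is, and which segment it refers to.
# PC-relative mentions are not modelled.
Reloc = namedtuple("Reloc", ["address", "length", "segment"])


def relocate(image, relocs, bases):
    """Patch each mention in image (a bytearray) in place.

    Each mention initially holds (address - start of its segment); adding
    the segment's run-time base gives the run-time address.  Byte order is
    assumed to be little-endian, as on the Vax.
    """
    for r in relocs:
        end = r.address + r.length
        offset = int.from_bytes(image[r.address:end], "little")
        value = (offset + bases[r.segment]) % (1 << (8 * r.length))
        image[r.address:end] = value.to_bytes(r.length, "little")
    return image
```
@end example

A segment given a base of 0 leaves its mentions unchanged, which is how the absolute segment behaves.
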
In fact, every address @code{as} ever thinks about is expressed as
(@var{segment} @t{+} @var{offset into segment}). Further, every
expression @code{as} computes is of this segmented nature. So an
@dfn{absolute expression} means an expression with segment
``absolute'' (@pxref{LdSegs}). A @dfn{pass1 expression} means an
expression with segment ``pass1'' (@pxref{MythSegs}). In this
document ``(segment, offset)'' will be written as @{ segment-name
(offset into segment) @}.

Apart from text, data and bss segments you need to know about the
@dfn{absolute} segment. When @code{ld} mixes partial programs,
addresses in the absolute segment remain unchanged. That is,
address @{absolute 0@} is ``relocated'' to run-time address 0 by
@code{ld}. Although two partial programs' data segments will not
overlap addresses after linking, @b{by definition} their absolute
segments will overlap. Address @{absolute 239@} in one partial
program will always be the same address when the program is running
as address @{absolute 239@} in any other partial program.

The idea of segments is extended to the @dfn{undefined} segment.
Any address whose segment is unknown at assembly time is by
definition rendered @{undefined (something, unknown yet)@}. Since
numbers are always defined, the only way to generate an undefined
address is to mention an undefined symbol. A reference to a named
common block would be such a symbol: its value is unknown at assembly
time so it has segment @i{undefined}.

By analogy, the word @i{segment} is also used to describe groups of
segments in the linked program. @code{ld} puts all partial programs'
text segments in contiguous addresses in the linked program. It is
customary to refer to the @i{text segment} of a program, meaning all
the addresses of all partial programs' text segments. Likewise for
data and bss segments.

@section Segments
Some segments are manipulated by @code{ld}; others are invented for
use of @code{as} and have no meaning except during assembly.

@node LdSegs, , ,
@subsection ld segments
@code{ld} deals with just 5 kinds of segments, summarized below.
@table @b
@item text segment
@itemx data segment
These segments hold your program bytes. @code{as} and @code{ld}
treat them as separate but equal segments. Anything you can say of
one segment is true of the other. When the program is running,
however, it is customary for the text segment to be unalterable: it
will contain instructions, constants and the like. The data segment
of a running program is usually alterable: for example, C variables
would be stored in the data segment.
@item bss segment
This segment contains zeroed bytes when your program begins
running. It is used to hold uninitialized variables or common
storage. The length of each partial program's bss segment is
important, but because it starts out containing zeroed bytes there
is no need to store explicit zero bytes in the object file. The bss
segment was invented to eliminate those explicit zeros from object
files.
@item absolute segment
Address 0 of this segment is always ``relocated'' to run-time address
0. This is useful if you want to refer to an address that @code{ld}
must not change when relocating. In this sense we speak of absolute
addresses being ``unrelocatable'': they don't change during
relocation.
@item undefined segment
This ``segment'' is a catch-all for address references to objects
not in the preceding segments. See the description of @file{a.out}
for details.
@end table
An idealized example of the 3 relocatable segments follows. Memory
addresses are on the horizontal axis.

@example
                      +-----+----+--+
partial program # 1:  |ttttt|dddd|00|
                      +-----+----+--+

                      text   data bss
                      seg.   seg. seg.

                      +---+---+---+
partial program # 2:  |TTT|DDD|000|
                      +---+---+---+

                      +--+---+-----+--+----+---+-----+~~
linked program:       |  |TTT|ttttt|  |dddd|DDD|00000|
                      +--+---+-----+--+----+---+-----+~~

    addresses:        0 @dots{}
@end example

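
The grouping shown in the diagram can be computed with a few lines of Python. This sketch uses the hypothetical name @code{link_layout}. It places the segments of each group in the order the partial programs are given and ignores alignment and padding, so it only approximates what a real @code{ld} does. The idealized diagram, for example, shows a gap before the text segments.

@example
```python
# Hypothetical sketch of how ld lays out partial programs: all text
# segments first, then all data segments, then all bss segments, each
# group contiguous.  Alignment, padding and headers are ignored.
TEXT, DATA, BSS = 0, 1, 2


def link_layout(programs, start=0):
    """programs: list of (text_size, data_size, bss_size) tuples.

    Returns the base address of each program's three segments and the
    first address after the linked program.
    """
    bases = [[0, 0, 0] for _ in programs]
    address = start
    for seg in (TEXT, DATA, BSS):
        for i, sizes in enumerate(programs):
            bases[i][seg] = address
            address += sizes[seg]
    return bases, address
```
@end example

With a single partial program, the result reproduces the object-file layout: text at 0, data right after it, and bss after the data.
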
@node MythSegs, , ,
@subsection Mythical Segments
These segments are invented for the internal use of @code{as}. They
have no meaning at run-time. You don't need to know about these
segments except that they might be mentioned in warning messages
from @code{as}. These segments are invented to permit the value of
every expression in your assembly language program to be a segmented
address.

@table @b
@item absent segment
An expression was expected and none was found.
@item goof segment
An internal assembler logic error has been found. This means there
is a bug in the assembler.
@item grand segment
A @dfn{grand number} is a bignum or a flonum, but not an integer.
If a number can't be written as a C @code{int} constant, it is a
grand number. @code{as} has to remember that a flonum or a bignum
does not fit into 32 bits, and cannot be a primary (@pxref{Primary})
in an expression: this is done by making a flonum or bignum be of
type ``grand''. This is purely for internal @code{as} convenience;
the grand segment behaves similarly to the absolute segment.
@item pass1 segment
The expression was impossible to evaluate in the first pass. The
assembler will attempt a second pass (second reading of the source)
to evaluate the expression. Your expression mentioned an undefined
symbol in a way that defies the one-pass (segment + offset in
segment) assembly process. No compiler need emit such an expression.
@item difference segment
As an assist to the C compiler, expressions of the forms
@itemize @bullet
@item
(undefined symbol) @t{-} (expression)
@item
(something) @t{-} (undefined symbol)
@item
(undefined symbol) @t{-} (undefined symbol)
@end itemize
are permitted to belong to the ``difference'' segment. @code{as}
re-evaluates such expressions after the source file has been read
and the symbol table built. If by that time there are no undefined
symbols in the expression then the expression assumes a new segment.
The intention is to permit statements like @samp{.word label -
base_of_table} to be assembled in one pass where both @code{label}
and @code{base_of_table} are undefined. This is useful for
compiling C and Algol switch statements, Pascal case statements,
FORTRAN computed goto statements and the like.
@end table

@section Sub-Segments
Assembled bytes fall into two segments: text and data. Because you
may have groups of text or data that you want to end up near to each
other in the object file, @code{as} allows you to use
@dfn{subsegments}. Within each segment, there can be numbered
subsegments with values from 0 to 8192. Objects assembled into the
same subsegment will be grouped with other objects in the same
subsegment when they are all put into the object file. For example,
a compiler might want to store constants in the text segment, but
might not want to have them interspersed with the program being
assembled. In this case, the compiler could issue a @samp{.text 0}
before each section of code being output, and a @samp{.text 1} before
each group of constants being output.

Subsegments are optional. If you don't use subsegments, everything
will be stored in subsegment number zero.

Each subsegment is zero-padded up to a multiple of four bytes.
(Subsegments may be padded a different amount on different flavors
of @code{as}.) Subsegments appear in your object file in numeric
order, lowest numbered to highest. (All this to be compatible with
other people's assemblers.) The object file, @code{ld} @i{etc.}
have no concept of subsegments. They just see all your text
subsegments as a text segment, and all your data subsegments as a
data segment.

To specify which subsegment you want subsequent statements assembled
into, use a @samp{.text @var{expression}} or a @samp{.data
@var{expression}} statement. @var{Expression} should be an absolute
expression (@pxref{Expressions}). If you just say @samp{.text}
then @samp{.text 0} is assumed. Likewise @samp{.data} means
@samp{.data 0}. Assembly begins in @code{text 0}.
For instance:
@example
        .text 0     # The default subsegment is text 0 anyway.
        .ascii "This lives in the first text subsegment. *"
        .text 1
        .ascii "But this lives in the second text subsegment."
        .data 0
        .ascii "This lives in the data segment,"
        .ascii "in the first data subsegment."
        .text 0
        .ascii "This lives in the first text subsegment,"
        .ascii "immediately following the asterisk (*)."
@end example

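
The effect of subsegments on the final segment can be sketched as follows. The Python function @code{build_segment} is hypothetical. It groups bytes by subsegment number and zero-pads each subsegment to a multiple of four bytes; as noted above, the real padding may differ between flavors of @code{as}. It then emits the subsegments in numeric order.

@example
```python
from collections import defaultdict


# Hypothetical sketch of how one segment is built from its subsegments.
def build_segment(chunks, pad_to=4):
    """chunks: (subsegment number, bytes) pairs in source order."""
    subsegments = defaultdict(bytearray)
    for number, data in chunks:
        subsegments[number] += data      # same subsegment: append in order
    segment = bytearray()
    for number in sorted(subsegments):   # lowest number first
        body = subsegments[number]
        body += bytes(-len(body) % pad_to)  # zero-pad to a multiple of pad_to
        segment += body
    return bytes(segment)
```
@end example

Given abbreviated versions of the text chunks in the example above, the sketch places the second @samp{.text 0} string right after the asterisk and before the @samp{.text 1} string.
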
Each segment has a @dfn{location counter} incremented by one for
every byte assembled into that segment. Because subsegments are
merely a convenience restricted to @code{as}, there is no concept of
a subsegment location counter. There is no way to directly
manipulate a location counter. The location counter of the segment
that statements are being assembled into is said to be the
@dfn{active} location counter.

@section Bss Segment
The @code{bss} segment is used for local common variable storage.
You may allocate address space in the @code{bss} segment, but you may
not dictate data to load into it before your program executes. When
your program starts running, all the contents of the @code{bss}
segment are zeroed bytes.

Addresses in the bss segment are allocated with a special statement;
you may not assemble anything directly into the bss segment. Hence
there are no bss subsegments.

@node Symbols, Expressions, Segments, top
@chapter Symbols
Because the linker uses symbols to link, the debugger uses symbols
to debug and the programmer uses symbols to name things, symbols are
a central concept. Symbols do not appear in the object file in the
order they are declared. This may break some debuggers.

@node Labels, , , Symbols
@section Labels
A @dfn{label} is written as a symbol immediately followed by a colon
(@samp{:}). The symbol then represents the current value of the
active location counter, and is, for example, a suitable instruction
operand. You are warned if you use the same symbol to represent two
different locations: the first definition overrides any other
definitions.

@section Giving Symbols Other Values
A symbol can be given an arbitrary value by writing a symbol followed
by an equals sign (@samp{=}) followed by an expression
(@pxref{Expressions}). This is equivalent to using the @code{.set}
directive (@pxref{Set}).

@section Symbol Names
Symbol names begin with a letter or with one of @samp{$._}. That
character may be followed by any string of digits, letters,
underscores and dollar signs. Case of letters is significant:
@code{foo} is a different symbol name from @code{Foo}.

Each symbol has exactly one name. Each name in an assembly program
refers to exactly one symbol. You may use that symbol name any
number of times in an assembly program.

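
The naming rule fits in one regular expression. This Python sketch uses the hypothetical name @code{is_symbol_name} and encodes exactly the rule stated above and nothing more. It does not check for the local symbol names described next.

@example
```python
import re

# Hypothetical check of the symbol-name rule stated above.
SYMBOL_NAME = re.compile(r"[A-Za-z$._][A-Za-z0-9_$]*\Z")


def is_symbol_name(name):
    """True if name starts with a letter or one of $._ and continues
    with letters, digits, underscores and dollar signs only."""
    return SYMBOL_NAME.match(name) is not None
```
@end example
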
@subsection Local Symbol Names

Local symbols help compilers and programmers use names temporarily.
There are ten @dfn{local} symbol names, which are re-used throughout
the program. Their names are @samp{0} @samp{1} @dots{} @samp{9}.
To define a local symbol, write a label of the form
@var{digit}@t{:}. To refer to the most recent previous definition
of that symbol write @var{digit}@t{b}, using the same digit as when
you defined the label. To refer to the next definition of a local
label, write @var{digit}@t{f} where @var{digit} gives you a choice
of 10 forward references. The @samp{b} stands for ``backwards'' and
the @samp{f} stands for ``forwards''.

Local symbols are not used by the current C compiler.

There is no restriction on how you can use these labels, but
remember that at any point in the assembly you can refer to at most
10 prior local labels and to at most 10 forward local labels.

Local symbol names are only a notation device. They are immediately
transformed into more conventional symbol names before the assembler
thinks about them. The symbol names stored in the symbol table,
appearing in error messages and optionally emitted to the object
file have these parts:
@table @kbd
@item L
All local labels begin with @samp{L}. Normally both @code{as} and
@code{ld} forget symbols that start with @samp{L}. These labels are
used for symbols you are never intended to see. If you give the
@samp{-L} option then @code{as} will retain these symbols in the
object file. By instructing @code{ld} to also retain these symbols,
you may use them in debugging.
@item @i{a digit}
If the label is written @samp{0:} then the digit is @samp{0}.
If the label is written @samp{1:} then the digit is @samp{1}.
And so on up through @samp{9:}.
@item @i{control}-A
This unusual character is included so you don't accidentally invent
a symbol of the same name. The character has ASCII value
@samp{\001}.
@item @i{an ordinal number}
This is like a serial number to keep the labels distinct. The first
@samp{0:} gets the number @samp{1}; the 15th @samp{0:} gets the
number @samp{15}; and so on. Likewise for the other labels @samp{1:}
through @samp{9:}.
@end table
For instance, the first @code{1:} is named @code{L1^A1}, and the 44th
@code{3:} is named @code{L3^A44}.

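
The renaming of local labels can be modelled with one counter per digit. In this Python sketch the class @code{LocalLabels} and its methods are hypothetical. @code{define} handles @var{digit}@t{:}, @code{backward} handles @var{digit}@t{b} and @code{forward} handles @var{digit}@t{f}. Each returns the name built from @samp{L}, the digit, control-A and the ordinal number.

@example
```python
def local_name(digit, ordinal):
    """Build the stored name: L, the digit, control-A, the ordinal."""
    return "L" + str(digit) + "\x01" + str(ordinal)


class LocalLabels:
    """Hypothetical tracker: one definition counter per digit 0-9."""

    def __init__(self):
        self.count = [0] * 10

    def define(self, digit):
        """A label such as 3: starts a new definition of that digit."""
        self.count[digit] += 1
        return local_name(digit, self.count[digit])

    def backward(self, digit):
        """digit b: the most recent previous definition."""
        if self.count[digit] == 0:
            raise ValueError(str(digit) + "b has no previous definition")
        return local_name(digit, self.count[digit])

    def forward(self, digit):
        """digit f: the next definition, which has not been seen yet."""
        return local_name(digit, self.count[digit] + 1)
```
@end example
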
@section The Special Dot Symbol

The special symbol @code{.} refers to the current address that
@code{as} is assembling into. Thus, the statement @samp{melvin:
.long .} will cause @var{melvin} to contain its own address.
Assigning a value to @code{.} is treated the same as a @code{.org}
directive. Thus, the statement @samp{.=.+4} is the same as saying
@samp{.space 4}.

@section Symbol Attributes
Every symbol has the attributes discussed below. The detailed
definitions are in @file{<a.out.h>}.

If you use a symbol without defining it, @code{as} assumes zero for
all these attributes, and probably won't warn you. This makes the
symbol an externally defined symbol, which is generally what you
would want.

@subsection Value
The value of a symbol is (usually) 32 bits, the size of one C
@code{int}. For a symbol which labels a location in the
@code{text}, @code{data}, @code{bss} or @code{absolute} segments the
value is the number of addresses from the start of that segment to
the label. Naturally, for @code{text}, @code{data} and @code{bss}
segments the value of a symbol changes as @code{ld} changes segment
base addresses during linking. @code{absolute} symbols' values do
not change during linking: that is why they are called absolute.

The value of an undefined symbol is treated in a special way. If it
is 0 then the symbol is not defined in this assembler source
program, and @code{ld} will try to determine its value from other
programs it is linked with. You make this kind of symbol simply by
mentioning a symbol name without defining it. A non-zero value
represents a @code{.comm} common declaration. The value is how much
common storage to reserve, in bytes (@i{i.e.} addresses). The
symbol refers to the first address of the allocated storage.

@subsection Type
The type attribute of a symbol is 8 bits encoded in a devious way.
We kept this coding standard for compatibility with older operating
systems.

@example

       7     6     5     4     3     2     1     0   bit numbers
    +-----+-----+-----+-----+-----+-----+-----+-----+
    |                 |                       |     |
    |   N_STAB bits   |      N_TYPE bits      |N_EXT|
    |                 |                       | bit |
    +-----+-----+-----+-----+-----+-----+-----+-----+

                       n_type byte
@end example

@subsubsection N_EXT bit
This bit is set if @code{ld} might need to use the symbol's value
and type bits. If this bit is clear then @code{ld} can ignore the
symbol while linking. It is set in two cases. If the symbol is
undefined, then @code{ld} is expected to find the symbol's value
elsewhere in another program module. Otherwise the symbol has the
value given, but this symbol name and value are revealed to any other
programs linked in the same executable program. This second use of
the @code{N_EXT} bit is most often made by a @code{.globl} statement.

@subsubsection N_TYPE bits
These establish the symbol's ``type'', which is mainly a relocation
concept. Common values are detailed in the manual describing the
executable file format.

@subsubsection N_STAB bits
Common values for these bits are described in the manual on the
executable file format.

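
The three fields of the @code{n_type} byte can be separated with bit masks. The mask values below are the ones in BSD @file{<a.out.h>}: @code{N_EXT} 0x01, @code{N_TYPE} 0x1e and @code{N_STAB} 0xe0. The type names are common @code{N_TYPE} values from that header. The function @code{decode_n_type} is a hypothetical Python helper, not part of @code{as}.

@example
```python
# Masks for the n_type byte, as defined in BSD <a.out.h>.
N_EXT = 0x01    # bit 0
N_TYPE = 0x1E   # bits 1-4
N_STAB = 0xE0   # bits 5-7

# Common N_TYPE values from <a.out.h>.
TYPE_NAMES = dict([(0x0, "N_UNDF"), (0x2, "N_ABS"), (0x4, "N_TEXT"),
                   (0x6, "N_DATA"), (0x8, "N_BSS")])


def decode_n_type(byte):
    """Split an n_type byte into (stab bits, type name, external flag)."""
    stab = byte & N_STAB
    kind = byte & N_TYPE
    return stab, TYPE_NAMES.get(kind, hex(kind)), bool(byte & N_EXT)
```
@end example

For example, a @code{.globl} symbol defined in the text segment would have @code{n_type} 0x05, which decodes as @code{N_TEXT} with the external bit set.
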
@subsection Desc(riptor)
This is an arbitrary 16-bit value. You may establish a symbol's
descriptor value by using a @code{.desc} statement (@pxref{Desc}).
A descriptor value means nothing to @code{as}.

@subsection Other
This is an arbitrary 8-bit value. It means nothing to @code{as}.

@node Expressions, PseudoOps, Symbols, top
@chapter Expressions
An @dfn{expression} specifies an address or numeric value.
Whitespace may precede and/or follow an expression.

@section Empty Expressions
An empty expression has no operands: it is just whitespace or null.
Wherever an absolute expression is required, you may omit the
expression and @code{as} will assume a value of (absolute) 0. This
is compatible with other assemblers.

@section Integer Expressions
An @dfn{integer expression} is one or more @i{primaries} delimited
by @i{operators}.

@node Primary, Unops, , Expressions
@subsection Primaries
@dfn{Primaries} are symbols, numbers or subexpressions. Other
languages might call primaries ``arithmetic operands'', but we don't
want them confused with ``instruction operands'' of the machine
language, so we give them a different name.

Symbols are evaluated to yield @{@var{segment} @var{value}@} where
@var{segment} is one of @b{text}, @b{data}, @b{bss}, @b{absolute},
or @b{undefined}. @var{value} is a signed 2's complement 32 bit
integer.

Numbers are usually integers.

A number can be a flonum or bignum. In this case, you are warned
that only the low order 32 bits are used, and @code{as} pretends
these 32 bits are an integer. You may write integer-manipulating
instructions that act on exotic constants, compatible with other
assemblers.

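
For a bignum used as a primary, keeping only the low order 32 bits amounts to a mask followed by a two's complement reinterpretation. The Python helpers @code{is_bignum} and @code{low_32_bits} are hypothetical. They treat bignums only, because the bit pattern of a flonum depends on the target's floating point format.

@example
```python
# Hypothetical sketch: how a bignum is treated when it is used where an
# integer primary is expected.
def is_bignum(value):
    """A number (or its negative) needing more than 32 bits."""
    return abs(value).bit_length() > 32


def low_32_bits(value):
    """Keep the low-order 32 bits and read them as a signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value
```
@end example
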
Subexpressions are a left parenthesis (@t{(}) followed by an integer
expression followed by a right parenthesis (@t{)}), or a unary
operator followed by a primary.

@subsection Operators
@dfn{Operators} are arithmetic marks, like @t{+} or @t{%}. Unary
operators are followed by a primary. Binary operators appear
between primaries. Operators may be preceded and/or followed by
whitespace.

@node Unops, , Primary, Expressions
@subsection Unary Operators
@code{as} has the following @dfn{unary operators}. They each take
one primary, which must be absolute.
@table @t
@item -
Hyphen. @dfn{Negation}. Two's complement negation.
@item ~
Tilde. @dfn{Complementation}. Bitwise not.
@end table

@subsection Binary Operators
@dfn{Binary operators} are infix. Operators are prioritized, but
equal priority operators are performed left to right. For operators
other than @samp{+} and @samp{-}, both primaries must be absolute, and
the result is absolute; otherwise one primary can be either undefined
or pass1, and the result is pass1.
@enumerate
@item
Highest Priority
@table @code
@item *
@dfn{Multiplication}.
@item /
@dfn{Division}. Truncation is the same as the C operator @samp{/}
of the compiler that compiled @code{as}.
@item %
@dfn{Remainder}.
@item <
@itemx <<
@dfn{Shift Left}. Same as the C operator @samp{<<} of
the compiler that compiled @code{as}.
@item >
@itemx >>
@dfn{Shift Right}. Same as the C operator @samp{>>} of
the compiler that compiled @code{as}.
@end table
@item
Intermediate priority
@table @t
@item |
@dfn{Bitwise Inclusive Or}.
@item &
@dfn{Bitwise And}.
@item ^
@dfn{Bitwise Exclusive Or}.
@item !
@dfn{Bitwise Or Not}.
@end table
@item
Lowest Priority
@table @t
@item +
@dfn{Addition}. If either primary is absolute, the result
has the segment of the other primary.
If either primary is pass1 or undefined, the result is pass1.
Otherwise @t{+} is illegal.
@item -
@dfn{Subtraction}. If the right primary is absolute, the
result has the segment of the left primary.
If either primary is pass1, the result is pass1.
If either primary is undefined, the result is in the difference segment.
If both primaries are in the same segment, the result is absolute,
provided that segment is one of text, data or bss.
Otherwise @t{-} is illegal.
@end table
@end enumerate

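
The priorities and operators above can be captured in a small recursive-descent evaluator for absolute expressions. This Python sketch is hypothetical: @code{evaluate} and the other names are not from @code{as}. It ignores segments and symbols, wraps every result to a signed 32-bit value, truncates division toward zero as C99 does, and uses Python's arithmetic right shift. As the table notes, @code{as} itself inherits whatever the C compiler that built it does.

@example
```python
import operator
import re

# Numbers (hex, octal, decimal), two-character shifts, then the
# one-character operators and parentheses.
TOKEN = re.compile(
    r"\s*(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*|<<|>>|[-~*/%<>|&^!+()])")


def c_div(a, b):
    """Division truncating toward zero, as C99 does (an assumption)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def c_rem(a, b):
    return a - c_div(a, b) * b


OPS = dict([("*", operator.mul), ("/", c_div), ("%", c_rem),
            ("<", operator.lshift), ("<<", operator.lshift),
            (">", operator.rshift), (">>", operator.rshift),
            ("|", operator.or_), ("&", operator.and_),
            ("^", operator.xor), ("!", lambda a, b: a | ~b),
            ("+", operator.add), ("-", operator.sub)])

# Binary operators grouped by priority, lowest first.
LEVELS = [("+", "-"), ("|", "&", "^", "!"),
          ("*", "/", "%", "<", "<<", ">", ">>")]


def to_int32(n):
    """Wrap n to a signed 32-bit two's complement value."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n & 0x80000000 else n


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError("bad input at " + repr(text[pos:]))
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(text):
    """Evaluate an absolute expression; an empty one counts as 0."""
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("expression ends too soon")
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = take()
        if tok == "(":
            value = binary(0)
            if take() != ")":
                raise SyntaxError("missing )")
            return value
        if tok == "-":
            return to_int32(-primary())      # unary negation
        if tok == "~":
            return to_int32(~primary())      # unary complement
        if tok[:2] in ("0x", "0X"):
            return to_int32(int(tok[2:], 16))
        if tok[0].isdigit():
            base = 8 if tok[0] == "0" else 10
            return to_int32(int(tok, base))
        raise SyntaxError("unexpected " + tok)

    def binary(level):
        # Operators of equal priority associate left to right.
        if level == len(LEVELS):
            return primary()
        value = binary(level + 1)
        while peek() in LEVELS[level]:
            op = take()
            value = to_int32(OPS[op](value, binary(level + 1)))
        return value

    if not tokens:
        return 0
    result = binary(0)
    if peek() is not None:
        raise SyntaxError("unexpected " + peek())
    return result
```
@end example

Note the effect of the unusual priorities: @samp{1 | 2 + 5} is evaluated as @samp{(1 | 2) + 5}, which is 8, whereas in C the same text would mean @samp{1 | 7}.
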
The sense of the rules is that you can't add or subtract quantities
from two different segments. If both primaries are in one of these
segments, they must be in the same segment: @b{text}, @b{data} or
@b{bss}, and the operator must be @samp{-}.

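
The segment rules for binary operators can be written as a small decision function. In this Python sketch the names @code{result_segment} and @code{SegmentError} are hypothetical. The rules are tried in the order the text lists them, so absolute plus undefined, for example, yields undefined by the first rule for @samp{+}.

@example
```python
class SegmentError(Exception):
    """Raised when an operator is illegal for the given segments."""


RELOCATABLE = ("text", "data", "bss")


def result_segment(op, left, right):
    """Segment of (left op right), following the rules above in order."""
    either = (left, right)
    if op == "+":
        if left == "absolute":
            return right
        if right == "absolute":
            return left
        if "pass1" in either or "undefined" in either:
            return "pass1"
    elif op == "-":
        if right == "absolute":
            return left
        if "pass1" in either:
            return "pass1"
        if "undefined" in either:
            return "difference"
        if left == right and left in RELOCATABLE:
            return "absolute"
    else:
        # Every other binary operator.
        if left == "absolute" and right == "absolute":
            return "absolute"
        if "pass1" in either or "undefined" in either:
            return "pass1"
    raise SegmentError(left + " " + op + " " + right + " is illegal")
```
@end example
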
1250 | @node PseudoOps, MachineDependent, Expressions, top | |
1251 | @chapter Assembler Directives | |
1252 | @menu | |
1253 | * Abort:: The Abort directive causes as to abort | |
1254 | * Align:: Pad the location counter to a power of 2 | |
1255 | * Ascii:: Fill memory with bytes of ASCII characters | |
1256 | * Asciz:: Fill memory with bytes of ASCII characters followed | |
1257 | by a null. | |
1258 | * Byte:: Fill memory with 8-bit integers | |
1259 | * Comm:: Reserve public space in the BSS segment | |
1260 | * Data:: Change to the data segment | |
1261 | * Desc:: Set the n_desc of a symbol | |
1262 | * Double:: Fill memory with double-precision floating-point numbers | |
1263 | * File:: Set the logical file name | |
1264 | * Fill:: Fill memory with repeated values | |
1265 | * Float:: Fill memory with single-precision floating-point numbers | |
1266 | * Global:: Make a symbol visible to the linker | |
1267 | * Int:: Fill memory with 32-bit integers | |
1268 | * Lcomm:: Reserve private space in the BSS segment | |
1269 | * Line:: Set the logical line number | |
1270 | * Long:: Fill memory with 32-bit integers | |
1271 | * Lsym:: Create a local symbol | |
1272 | * Octa:: Fill memory with 128-bit integers | |
1273 | * Org:: Change the location counter | |
1274 | * Quad:: Fill memory with 64-bit integers | |
1275 | * Set:: Set the value of a symbol | |
1276 | * Short:: Fill memory with 16-bit integers | |
1277 | * Space:: Fill memory with a repeated value | |
1278 | * Stab:: Store debugging information | |
1279 | * Text:: Change to the text segment | |
1280 | * Word:: Fill memory with 16-bit integers | |
1281 | @end menu | |
1282 | ||
All assembler directives have names that begin with a period
(@samp{.}). The rest of the name is letters, and their case does
not matter.
1286 | ||
1287 | @node Abort, Align, PseudoOps, PseudoOps | |
1288 | @section .abort | |
This directive stops the assembly immediately. It is for
compatibility with other assemblers. The original idea was that the
assembly language program would be piped into the assembler. If the
sender of the program wanted to quit, it could use this directive to
tell @code{as} to quit also. One day @code{.abort} will not be supported.
1294 | ||
1295 | @node Align, Ascii, Abort, PseudoOps | |
1296 | @section .align @var{absolute-expression} , @var{absolute-expression} | |
Pad the location counter (in the current subsegment) to a particular
storage boundary, such as a word or longword. The first expression is
the number of low-order zero bits the location counter will have after
advancement. For example, @samp{.align 3} advances the location
counter until it is a multiple of 8. If the location counter is
already a multiple of 8, no change is needed.
1303 | ||
1304 | The second expression gives the value to be stored in the padding | |
1305 | bytes. It (and the comma) may be omitted. If it is omitted, the | |
1306 | padding bytes are zeroed. | |
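
The padding rule can be sketched in Python as follows. The function name @code{align_padding} is hypothetical, and storing only the low 8 bits of the fill expression in each padding byte is an assumption that this manual does not spell out.

@example
# Hypothetical model of ".align power, fill"; not code from as.
def align_padding(location, power, fill=0):
    """Return the padding bytes emitted at location counter LOCATION."""
    boundary = 1 << power           # .align 3 means a multiple of 8
    count = -location % boundary    # 0 when already aligned
    return bytes([fill & 0xFF]) * count   # assumed: low 8 bits of fill
@end example

For example, @code{align_padding(13, 3)} returns three zero bytes, which moves the location counter from 13 to 16.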
1307 | ||
1308 | @node Ascii, Asciz, Align, PseudoOps | |
1309 | @section .ascii @var{strings} | |
This expects zero or more string literals (@pxref{Strings})
separated by commas. It assembles each string (with no automatic
trailing zero byte) into consecutive addresses.
1313 | ||
1314 | @node Asciz, Byte, Ascii, PseudoOps | |
1315 | @section .asciz @var{strings} | |
This is just like @code{.ascii}, but each string is followed by a zero
byte. The @samp{z} in @samp{.asciz} stands for ``zero''.
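
The difference between the two directives shows up clearly in the bytes they produce. This Python sketch uses the hypothetical helpers @code{ascii_bytes} and @code{asciz_bytes}, and assumes plain ASCII strings whose escape sequences have already been processed.

@example
# Hypothetical model of .ascii and .asciz; not code from as.
def ascii_bytes(*strings):
    """.ascii: the strings back to back, with no trailing zero byte."""
    return b"".join(s.encode("ascii") for s in strings)


def asciz_bytes(*strings):
    """.asciz: each string followed by one zero byte."""
    return b"".join(s.encode("ascii") + bytes(1) for s in strings)
@end example

@code{ascii_bytes("ab", "c")} yields the three bytes of @samp{abc}, while @code{asciz_bytes("ab", "c")} yields five bytes: @samp{a}, @samp{b}, zero, @samp{c}, zero.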
1318 | ||
1319 | @node Byte, Comm, Asciz, PseudoOps | |
1320 | @section .byte @var{expressions} | |
1321 | ||
1322 | This expects zero or more expressions, separated by commas. | |
1323 | Each expression is assembled into the next byte. | |
1324 | ||
1325 | @node Comm, Data, Byte, PseudoOps | |
1326 | @section .comm @var{symbol} , @var{length} | |
This declares a named common area in the bss segment. Normally
@code{ld} reserves memory addresses for it during linking, so no
partial program defines the location of the symbol. @var{length}
tells @code{ld} that the area must be at least @var{length} bytes
long. @code{ld} allocates space that is at least as long as the
longest @code{.comm} request in any of the partial programs linked.
@var{length} is an absolute expression.
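
The way @code{ld} combines several @code{.comm} requests for one symbol amounts to taking the largest @var{length}. The Python sketch below models only that rule; the function @code{common_sizes} is hypothetical and says nothing about how @code{ld} itself is written.

@example
# Hypothetical model of how .comm requests combine at link time.
def common_sizes(requests):
    """REQUESTS holds (symbol, length) pairs from every .comm in the
    linked partial programs; return the size reserved per symbol."""
    sizes = dict()
    for symbol, length in requests:
        # Keep the longest request seen so far for this symbol.
        sizes[symbol] = max(sizes.get(symbol, 0), length)
    return sizes
@end example

For example, if one partial program says @code{.comm buf, 16} and another says @code{.comm buf, 64}, then @code{common_sizes([("buf", 16), ("buf", 64)])} reports 64 bytes for @code{buf}.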
1334 | ||
1335 | @node Data, Desc, Comm, PseudoOps | |
1336 | @section .data @var{subsegment} | |
1337 | This tells @code{as} to assemble the following statements onto the | |
1338 | end of the data subsegment numbered @var{subsegment} (which is an | |
1339 | absolute expression). If @var{subsegment} is omitted, it defaults | |
1340 | to zero. | |
1341 | ||
1342 | @node Desc, Double, Data, PseudoOps | |
1343 | @section .desc @var{symbol}, @var{absolute-expression} | |
1344 | This sets @code{n_desc} of the symbol to the low 16 bits of | |
1345 | @var{absolute-expression}. | |
1346 | ||
1347 | @node Double, File, Desc, PseudoOps | |
1348 | @section .double @var{flonums} | |
This expects zero or more flonums, separated by commas. It assembles
floating point numbers. The exact kind of floating point numbers
emitted depends on what computer @code{as} is assembling for. For
more information, see the machine-specific part of this manual for
that computer.
1354 | ||
1355 | @node File, Fill, Double, PseudoOps | |
1356 | @section .file @var{string} | |
1357 | This tells @code{as} that we are about to start a new logical | |
1358 | file. @var{String} is the new file name. An empty file name | |
1359 | is OK, but you must still give the quotes: @code{""}. This | |
statement may go away in the future: it is only recognized to
1361 | be compatible with old @code{as} programs. | |
1362 | ||
1363 | @node Fill, Float, File, PseudoOps | |
1364 | @section .fill @var{repeat} , @var{size} , @var{value} | |
@var{repeat}, @var{size} and @var{value} are absolute expressions.
This emits @var{repeat} copies of @var{size} bytes. @var{Repeat}
may be zero or more. @var{Size} may be zero or more, but if it is
more than 8, then it is deemed to have the value 8, compatible with
other people's assemblers. The contents of each repetition are
taken from an 8-byte number. The highest order 4 bytes are
zero. The lowest order 4 bytes are @var{value} rendered in the
byte order of an integer on the computer @code{as} is assembling for.
Each repetition is taken from the lowest order @var{size} bytes of
this number. Again, this bizarre behavior is compatible with other
people's assemblers.
1376 | ||
1377 | @var{Size} and @var{value} are optional. | |
1378 | If the second comma and @var{value} are absent, @var{value} is | |
1379 | assumed zero. If the first comma and following tokens are absent, | |
1380 | @var{size} is assumed to be 1. | |
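
Because the byte layout of @code{.fill} is easy to get wrong, here is a Python sketch of the rules above. The function @code{fill_bytes} and its @var{byteorder} parameter (@code{"little"} or @code{"big"}, standing for the target's integer byte order) are hypothetical; treating @var{value} as a 32-bit two's complement number is an assumption.

@example
# Hypothetical model of ".fill repeat, size, value"; not code from as.
def fill_bytes(repeat, size=1, value=0, byteorder="little"):
    """Return the bytes emitted by .fill for a target with BYTEORDER."""
    size = min(size, 8)                  # sizes above 8 count as 8
    number = value & 0xFFFFFFFF          # 8-byte number, high 4 bytes zero
    low = number % (1 << (8 * size))     # lowest-order SIZE bytes
    return low.to_bytes(size, byteorder) * repeat
@end example

For instance, @code{fill_bytes(3, 2, 0x1234)} on a little-endian target gives the six bytes @code{34 12 34 12 34 12} (hexadecimal), and @code{fill_bytes(1, 8, 1, "big")} gives seven zero bytes followed by @code{01}.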
1381 | ||
1382 | @node Float, Global, Fill, PseudoOps | |
1383 | @section .float @var{flonums} | |
This directive assembles zero or more flonums, separated by commas.
The exact kind of floating point numbers emitted depends on what
computer @code{as} is assembling for. For more information, see the
machine-specific part of this manual for that computer.
1389 | ||
1390 | @node Global, Int, Float, PseudoOps | |
1391 | @section .global @var{symbol} | |
1392 | This makes the symbol visible to @code{ld}. If you define | |
1393 | @var{symbol} in your partial program, its value is made available to | |
1394 | other partial programs that are linked with it. Otherwise, | |
1395 | @var{symbol} will take its attributes from a symbol of the same name | |
1396 | from another partial program it is linked with. | |
1397 | ||
1398 | This is done by setting the @code{N_EXT} bit | |
1399 | of that symbol's @code{n_type} to 1. | |
1400 | ||
1401 | @node Int, Lcomm, Global, PseudoOps | |
1402 | @section .int @var{expressions} | |
This expects zero or more @var{expressions}, of any segment, separated
by commas. For each expression, it emits a 32-bit number that will, at
run time, be the value of that expression. The byte order of the
number depends on what kind of computer will run the program.
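
For an absolute expression, the effect of @code{.int} can be reproduced with Python's @code{struct} module. The helper @code{int_bytes} and its @var{big_endian} flag are hypothetical; relocatable expressions, whose final value is filled in by @code{ld}, are not modeled, and truncation to 32 bits is an assumption.

@example
import struct


# Hypothetical model of .int (and .long) for absolute expressions.
def int_bytes(values, big_endian=False):
    """One 32-bit integer per value, in the target's byte order."""
    order = ">" if big_endian else "<"
    words = [v & 0xFFFFFFFF for v in values]   # keep the low 32 bits
    return struct.pack(order + "I" * len(words), *words)
@end example

On a little-endian target, @code{int_bytes([1])} gives the bytes @code{01 00 00 00}; with @code{big_endian=True} it gives @code{00 00 00 01}.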
1407 | ||
1408 | @node Lcomm, Line, Int, PseudoOps | |
1409 | @section .lcomm @var{symbol} , @var{length} | |
Reserve @var{length} (an absolute expression) bytes for a local
common denoted by @var{symbol}. The segment and value of @var{symbol}
are those of the new local common. The addresses are allocated in the
@code{bss} segment, so at run time the bytes start off zeroed.
@var{Symbol} is not declared global (@pxref{Global}), so it is
normally not visible to @code{ld}.
1416 | ||
1417 | @node Line, Long, Lcomm, PseudoOps | |
1418 | @section .line @var{logical line number} | |
1419 | This tells @code{as} to change the logical line number. | |
1420 | @var{logical line number} is an absolute expression. The next line | |
1421 | will have that logical line number. So any other statements on the | |
1422 | current line (after a @code{;}) will be reported as on logical line | |
1423 | number @var{logical line number} - 1. One day this directive will | |
1424 | be unsupported: it is used only for compatibility with existing | |
1425 | assembler programs. | |
1426 | ||
1427 | @node Long, Lsym, Line, PseudoOps | |
1428 | @section .long @var{expressions} | |
This is the same as @samp{.int} (@pxref{Int}).
1430 | ||
1431 | @node Lsym, Octa, Long, PseudoOps | |
1432 | @section .lsym @var{symbol}, @var{expression} | |
This creates a new symbol named @var{symbol}, but does not put it in
1434 | the hash table, ensuring it cannot be referenced by name during the | |
1435 | rest of the assembly. This sets the attributes of the symbol to be | |
1436 | the same as the expression value. @code{n_other} = @code{n_desc} = | |
1437 | 0. @code{n_type} = (whatever segment the expression has); the | |
1438 | @code{N_EXT} bit of @code{n_type} is zero. @code{n_value} = | |
1439 | (expression's value). | |
1440 | ||
1441 | @node Octa, Org, Lsym, PseudoOps | |
1442 | @section .octa @var{bignums} | |
1443 | This expects zero or more bignums, separated by commas. For each | |
bignum, it emits a 16-byte (@b{octa}-word) integer.
1445 | ||
1446 | @node Org, Quad, Octa, PseudoOps | |
1447 | @section .org @var{new-lc} , @var{fill} | |
1448 | This will advance the location counter of the current segment to | |
1449 | @var{new-lc}. @var{new-lc} is either an absolute expression or an | |
1450 | expression with the same segment as the current subsegment. That | |
1451 | is, you can't use @code{.org} to cross segments. Because @code{as} | |
tries to assemble programs in one pass, @var{new-lc} must be defined.
1453 | If you really detest this restriction we eagerly await a chance to | |
1454 | share your improved assembler. To be compatible with former | |
1455 | assemblers, if the segment of @var{new-lc} is absolute then we | |
1456 | pretend the segment of @var{new-lc} is the same as the current | |
1457 | subsegment. | |
1458 | ||
1459 | Beware that the origin is relative to the start of the segment, not | |
1460 | to the start of the subsegment. This is compatible with other | |
1461 | people's assemblers. | |
1462 | ||
1463 | If the location counter (of the current subsegment) is advanced, the | |
1464 | intervening bytes are filled with @var{fill} which should be an | |
1465 | absolute expression. If the comma and @var{fill} are omitted, | |
1466 | @var{fill} defaults to zero. | |
1467 | ||
1468 | @node Quad, Set, Org, PseudoOps | |
1469 | @section .quad @var{bignums} | |
1470 | This expects zero or more bignums, separated by commas. For each | |
bignum, it emits an 8-byte (@b{quad}-word) integer. If the bignum
won't fit in a quad-word, @code{as} prints a warning message and uses
only the lowest order 8 bytes of the bignum.
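
The truncation rule can be sketched in Python. The function @code{quad_bytes} and its @var{byteorder} parameter are hypothetical, and the test for ``fits in a quad-word'' (any value that fits in 64 bits as either a signed or an unsigned number) is an assumption, since this manual does not say whether bignums are treated as signed.

@example
import warnings


# Hypothetical model of .quad; not code from as.
def quad_bytes(bignums, byteorder="little"):
    """One 8-byte integer per bignum, truncating oversized values."""
    out = b""
    for n in bignums:
        # Assumed fit test: signed or unsigned 64-bit range.
        if not -(1 << 63) <= n < (1 << 64):
            warnings.warn("bignum does not fit in a quad-word")
        out += (n % (1 << 64)).to_bytes(8, byteorder)   # low 8 bytes
    return out
@end example

For example, @code{quad_bytes([(1 << 64) + 5])} issues a warning and emits only the low 8 bytes, which hold 5.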
1474 | ||
1475 | @node Set, Short, Quad, PseudoOps | |
1476 | @section .set @var{symbol}, @var{expression} | |
1477 | ||
This sets the value of @var{symbol} to @var{expression}. This changes
@code{n_value} and @code{n_type} to conform to the @var{expression}.
If the @code{N_EXT} bit of @code{n_type} is set, it remains set.
1481 | ||
1482 | It is OK to @code{.set} a symbol many times in the same assembly. | |
1483 | If the expression's segment is unknowable during pass 1, a second | |
1484 | pass over the source program will be forced. The second pass is | |
1485 | currently not implemented. @code{as} will abort with an error | |
1486 | message if one is required. | |
1487 | ||
1488 | If you @code{.set} a global symbol, the value stored in the object | |
1489 | file is the last value stored into it. | |
1490 | ||
1491 | @node Short, Space, Set, PseudoOps | |
1492 | @section .short @var{expressions} | |
Except on the Sparc, this is the same as @samp{.word} (@pxref{Word}).
On the Sparc, this expects zero or more @var{expressions}, and emits
a 16-bit number for each.
1496 | ||
1497 | @node Space, Stab, Short, PseudoOps | |
1498 | @section .space @var{size} , @var{fill} | |
1499 | This emits @var{size} bytes, each of value @var{fill}. Both | |
1500 | @var{size} and @var{fill} are absolute expressions. If the comma | |
1501 | and @var{fill} are omitted, @var{fill} is assumed to be zero. | |
1502 | ||
1503 | @node Stab, Text, Space, PseudoOps | |
1504 | @section .stabd, .stabn, .stabs | |
1505 | There are three directives that begin @code{.stab@dots{}}. | |
1506 | All emit symbols, for use by symbolic debuggers. | |
The symbols are not entered in the @code{as} hash table, so they
1508 | cannot be referenced elsewhere in the source file. | |
1509 | Up to five fields are required: | |
1510 | @table @var | |
1511 | @item string | |
1512 | This is the symbol's name. It may contain any character except @samp{\000}, | |
so it is more general than ordinary symbol names. Some debuggers used to
1514 | code arbitrarily complex structures into symbol names using this technique. | |
1515 | @item type | |
1516 | An absolute expression. The symbol's @code{n_type} is set to the low 8 | |
1517 | bits of this expression. | |
1518 | Any bit pattern is permitted, but @code{ld} and debuggers will choke on | |
1519 | silly bit patterns. | |
1520 | @item other | |
1521 | An absolute expression. | |
1522 | The symbol's @code{n_other} is set to the low 8 bits of this expression. | |
1523 | @item desc | |
1524 | An absolute expression. | |
1525 | The symbol's @code{n_desc} is set to the low 16 bits of this expression. | |
1526 | @item value | |
1527 | An absolute expression which becomes the symbol's @code{n_value}. | |
1528 | @end table | |
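
The field truncations in the table above can be summarized in a short Python sketch. The names @code{StabSymbol} and @code{make_stab} are hypothetical; the fields mirror the a.out @code{n_type}, @code{n_other}, @code{n_desc} and @code{n_value} members named above.

@example
from collections import namedtuple

# Hypothetical record for the symbol a .stabs directive creates.
StabSymbol = namedtuple("StabSymbol", "name n_type n_other n_desc n_value")


def make_stab(string, type_, other, desc, value):
    """Build the symbol for .stabs STRING, TYPE, OTHER, DESC, VALUE."""
    if chr(0) in string:
        raise ValueError("a stab string may not contain a zero byte")
    return StabSymbol(
        name=string,
        n_type=type_ & 0xFF,     # low 8 bits
        n_other=other & 0xFF,    # low 8 bits
        n_desc=desc & 0xFFFF,    # low 16 bits
        n_value=value,           # used as given
    )
@end example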
1529 | ||
If a warning is detected while reading the @code{.stab@dots{}}
statement, the symbol has probably already been created, and you will
1532 | get a half-formed symbol in your object file. This is compatible | |
1533 | with earlier assemblers (!) | |
1534 | ||
1535 | .stabd @var{type} , @var{other} , @var{desc} | |
1536 | ||
1537 | The ``name'' of the symbol generated is not even an empty string. | |
1538 | It is a null pointer, for compatibility. Older assemblers used a | |
1539 | null pointer so they didn't waste space in object files with empty | |
1540 | strings. | |
1541 | ||
1542 | The symbol's @code{n_value} is set to the location counter, | |
1543 | relocatably. When your program is linked, the value of this symbol | |
1544 | will be where the location counter was when the @code{.stabd} was | |
1545 | assembled. | |
1546 | ||
1547 | .stabn @var{type} , @var{other} , @var{desc} , @var{value} | |
1548 | ||
1549 | The name of the symbol is set to the empty string @code{""}. | |
1550 | ||
1551 | .stabs @var{string} , @var{type} , @var{other} , @var{desc} , @var{value} | |
1552 | ||
1553 | @node Text, Word, Stab, PseudoOps | |
1554 | @section .text @var{subsegment} | |
1555 | Tells @code{as} to assemble the following statements onto the end of | |
1556 | the text subsegment numbered @var{subsegment}, which is an absolute | |
1557 | expression. If @var{subsegment} is omitted, subsegment number zero | |
1558 | is used. | |
1559 | ||
1560 | @node Word, , Text, PseudoOps | |
1561 | @section .word @var{expressions} | |
This expects zero or more @var{expressions}, of any segment,
separated by commas. For each expression, it emits a 16-bit number
that will, at run time, be the value of that expression. The byte
order of the number depends on what kind of computer will run the
program. On the Sparc, this directive produces 32-bit numbers
instead of 16-bit ones.
1568 | ||
1569 | @section Deprecated Directives | |
1570 | One day these directives won't work. | |
1571 | They are included for compatibility with older assemblers. | |
1572 | @table @t | |
1573 | @item .abort | |
1574 | @item .file | |
1575 | @item .line | |
1576 | @end table | |
1577 | ||
1578 | @node MachineDependent, Maintenance, PseudoOps, top | |
1579 | @chapter Machine Dependent Features | |
1580 | @section Vax | |
1581 | @subsection Options | |
1582 | ||
1583 | The Vax version of @code{as} accepts any of the following options, | |
1584 | gives a warning message that the option was ignored and proceeds. | |
1585 | These options are for compatibility with scripts designed for other | |
1586 | people's assemblers. | |
1587 | ||
1588 | @table @asis | |
1589 | @item @kbd{-D} (Debug) | |
1590 | @itemx @kbd{-S} (Symbol Table) | |
1591 | @itemx @kbd{-T} (Token Trace) | |
1592 | These are obsolete options used to debug old assemblers. | |
1593 | ||
1594 | @item @kbd{-d} (Displacement size for JUMPs) | |
1595 | This option expects a number following the @kbd{-d}. Like options | |
1596 | that expect filenames, the number may immediately follow the | |
1597 | @kbd{-d} (old standard) or constitute the whole of the command line | |
1598 | argument that follows @kbd{-d} (GNU standard). | |
1599 | ||
1600 | @item @kbd{-V} (Virtualize Interpass Temporary File) | |
1601 | Some other assemblers use a temporary file. This option | |
1602 | commanded them to keep the information in active memory rather | |
1603 | than in a disk file. @code{as} always does this, so this | |
1604 | option is redundant. | |
1605 | ||
1606 | @item @kbd{-J} (JUMPify Longer Branches) | |
1607 | Many 32-bit computers permit a variety of branch instructions | |
1608 | to do the same job. Some of these instructions are short (and | |
1609 | fast) but have a limited range; others are long (and slow) but | |
1610 | can branch anywhere in virtual memory. Often there are 3 | |
1611 | flavors of branch: short, medium and long. Some other | |
1612 | assemblers would emit short and medium branches, unless told by | |
1613 | this option to emit short and long branches. | |
1614 | ||
1615 | @item @kbd{-t} (Temporary File Directory) | |
Some other assemblers may use a temporary file, and this option
takes the name of the directory in which to place the temporary
file. @code{as} does not use a temporary disk file, so this
option makes no difference. @kbd{-t} needs exactly one
filename.
1621 | @end table | |
1622 | ||
The Vax version of the assembler accepts two options when
compiled for VMS. They are @kbd{-h} and @kbd{-+}. The
@kbd{-h} option prevents @code{as} from modifying the
symbol-table entries for symbols that contain lowercase
characters (I think). The @kbd{-+} option causes @code{as} to
print warning messages if the FILENAME part of the object file,
or any symbol name, is longer than 31 characters. The @kbd{-+}
option also inserts some code following the @samp{_main}
symbol so that the object file will be compatible with Vax-11
``C''.
1633 | ||
1634 | @subsection Floating Point | |
1635 | Conversion of flonums to floating point is correct, and | |
1636 | compatible with previous assemblers. Rounding is | |
1637 | towards zero if the remainder is exactly half the least significant bit. | |
1638 | ||
1639 | @code{D}, @code{F}, @code{G} and @code{H} floating point formats | |
1640 | are understood. | |
1641 | ||
1642 | Immediate floating literals (@i{e.g.} @samp{S`$6.9}) | |
1643 | are rendered correctly. Again, rounding is towards zero in the | |
1644 | boundary case. | |
1645 | ||
1646 | The @code{.float} directive produces @code{f} format numbers. | |
1647 | The @code{.double} directive produces @code{d} format numbers. | |
1648 | ||
1649 | @subsection Machine Directives | |
1650 | The Vax version of the assembler supports four directives for | |
1651 | generating Vax floating point constants. They are described in the | |
1652 | table below. | |
1653 | ||
1654 | @table @code | |
1655 | @item .dfloat | |
1656 | This expects zero or more flonums, separated by commas, and | |
1657 | assembles Vax @code{d} format 64-bit floating point constants. | |
1658 | ||
1659 | @item .ffloat | |
1660 | This expects zero or more flonums, separated by commas, and | |
1661 | assembles Vax @code{f} format 32-bit floating point constants. | |
1662 | ||
1663 | @item .gfloat | |
1664 | This expects zero or more flonums, separated by commas, and | |
1665 | assembles Vax @code{g} format 64-bit floating point constants. | |
1666 | ||
1667 | @item .hfloat | |
1668 | This expects zero or more flonums, separated by commas, and | |
1669 | assembles Vax @code{h} format 128-bit floating point constants. | |
1670 | ||
1671 | @end table | |
1672 | ||
1673 | @subsection Opcodes | |
1674 | All DEC mnemonics are supported. Beware that @code{case@dots{}} | |
1675 | instructions have exactly 3 operands. The dispatch table that | |
1676 | follows the @code{case@dots{}} instruction should be made with | |
@code{.word} statements. This is compatible with all Unix
1678 | assemblers we know of. | |
1679 | ||
1680 | @subsection Branch Improvement | |
1681 | Certain pseudo opcodes are permitted. They are for branch | |
1682 | instructions. They expand to the shortest branch instruction that | |
1683 | will reach the target. Generally these mnemonics are made by | |
1684 | substituting @samp{j} for @samp{b} at the start of a DEC mnemonic. | |
1685 | This feature is included both for compatibility and to help | |
1686 | compilers. If you don't need this feature, don't use these | |
1687 | opcodes. Here are the mnemonics, and the code they can expand into. | |
1688 | ||
1689 | @table @code | |
1690 | @item jbsb | |
1691 | @samp{Jsb} is already an instruction mnemonic, so we chose @samp{jbsb}. | |
1692 | @table @asis | |
1693 | @item (byte displacement) | |
1694 | @kbd{bsbb @dots{}} | |
1695 | @item (word displacement) | |
1696 | @kbd{bsbw @dots{}} | |
1697 | @item (long displacement) | |
1698 | @kbd{jsb @dots{}} | |
1699 | @end table | |
1700 | @item jbr | |
1701 | @itemx jr | |
1702 | Unconditional branch. | |
1703 | @table @asis | |
1704 | @item (byte displacement) | |
1705 | @kbd{brb @dots{}} | |
1706 | @item (word displacement) | |
1707 | @kbd{brw @dots{}} | |
1708 | @item (long displacement) | |
1709 | @kbd{jmp @dots{}} | |
1710 | @end table | |
1711 | @item j@var{COND} | |
1712 | @var{COND} may be any one of the conditional branches | |
1713 | @code{neq nequ eql eqlu gtr geq lss gtru lequ vc vs gequ cc lssu cs}. | |
1714 | @var{COND} may also be one of the bit tests | |
1715 | @code{bs bc bss bcs bsc bcc bssi bcci lbs lbc}. | |
1716 | @var{NOTCOND} is the opposite condition to @var{COND}. | |
1717 | @table @asis | |
1718 | @item (byte displacement) | |
1719 | @kbd{b@var{COND} @dots{}} | |
1720 | @item (word displacement) | |
@kbd{b@var{NOTCOND} foo ; brw @dots{} ; foo:}
@item (long displacement)
@kbd{b@var{NOTCOND} foo ; jmp @dots{} ; foo:}
1724 | @end table | |
1725 | @item jacb@var{X} | |
1726 | @var{X} may be one of @code{b d f g h l w}. | |
1727 | @table @asis | |
1728 | @item (word displacement) | |
1729 | @kbd{@var{OPCODE} @dots{}} | |
1730 | @item (long displacement) | |
1731 | @kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: jmp @dots{} ; bar:} | |
1732 | @end table | |
1733 | @item jaob@var{YYY} | |
1734 | @var{YYY} may be one of @code{lss leq}. | |
1735 | @item jsob@var{ZZZ} | |
1736 | @var{ZZZ} may be one of @code{geq gtr}. | |
1737 | @table @asis | |
1738 | @item (byte displacement) | |
1739 | @kbd{@var{OPCODE} @dots{}} | |
1740 | @item (word displacement) | |
1741 | @kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: brw @var{destination} ; bar:} | |
1742 | @item (long displacement) | |
1743 | @kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: jmp @var{destination} ; bar: } | |
1744 | @end table | |
1745 | @item aobleq | |
1746 | @itemx aoblss | |
1747 | @itemx sobgeq | |
1748 | @itemx sobgtr | |
1749 | @table @asis | |
1750 | @item (byte displacement) | |
1751 | @kbd{@var{OPCODE} @dots{}} | |
1752 | @item (word displacement) | |
1753 | @kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: brw @var{destination} ; bar:} | |
1754 | @item (long displacement) | |
1755 | @kbd{@var{OPCODE} @dots{}, foo ; brb bar ; foo: jmp @var{destination} ; bar:} | |
1756 | @end table | |
1757 | @end table | |
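
The choice among the expansions above can be sketched in Python. This is a simplification with hypothetical function names: it takes the signed branch displacement in bytes as already known, and ignores the fact that the displacement itself depends on which expansion is chosen (something a real assembler must resolve). The byte and word ranges are those of signed 8-bit and 16-bit displacements.

@example
# Hypothetical model of the Vax branch-improvement choices.
def fits(displacement, bits):
    """True if DISPLACEMENT fits in a signed field of BITS bits."""
    limit = 1 << (bits - 1)
    return -limit <= displacement < limit


def jbr_expansion(displacement):
    """Instruction that jbr (or jr) expands into."""
    if fits(displacement, 8):
        return "brb"
    if fits(displacement, 16):
        return "brw"
    return "jmp"


def jcond_expansion(cond, notcond, displacement):
    """Instruction sequence for jCOND; NOTCOND is the opposite test."""
    if fits(displacement, 8):
        return ["b" + cond + " dest"]
    far = "brw dest" if fits(displacement, 16) else "jmp dest"
    return ["b" + notcond + " foo", far, "foo:"]
@end example

For example, @code{jcond_expansion("neq", "eql", 4000)} returns @code{["beql foo", "brw dest", "foo:"]}: the opposite test skips over a word-displacement branch.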
1758 | ||
@subsection Operands
1760 | The immediate character is @samp{$} for Unix compatibility, not | |
1761 | @samp{#} as DEC writes it. | |
1762 | ||
1763 | The indirect character is @samp{*} for Unix compatibility, not | |
1764 | @samp{@@} as DEC writes it. | |
1765 | ||
The displacement sizing character is @samp{`} (a grave accent) for
1767 | Unix compatibility, not @samp{^} as DEC writes it. The letter | |
1768 | preceding @samp{`} may have either case. @samp{G} is not | |
1769 | understood, but all other letters (@code{b i l s w}) are understood. | |
1770 | ||
1771 | Register names understood are @code{r0 r1 r2 @dots{} r15 ap fp sp | |
1772 | pc}. Any case of letters will do. | |
1773 | ||
1774 | For instance | |
1775 | @example | |
1776 | tstb *w`$4(r5) | |
1777 | @end example | |
1778 | ||
1779 | Any expression is permitted in an operand. Operands are comma | |
1780 | separated. | |
1781 | ||
1782 | @c There is some bug to do with recognizing expressions | |
1783 | @c in operands, but I forget what it is. It is | |
1784 | @c a syntax clash because () is used as an address mode | |
1785 | @c and to encapsulate sub-expressions. | |
1786 | @subsection Not Supported | |
Vax bit fields cannot be assembled with @code{as}. Someone
1788 | can add the required code if they really need it. | |
1789 | ||
1790 | @section 680x0 | |
1791 | @subsection Options | |
1792 | The 680x0 version of @code{as} has two machine dependent options. | |
1793 | One shortens undefined references from 32 to 16 bits, while the | |
1794 | other is used to tell @code{as} what kind of machine it is | |
1795 | assembling for. | |
1796 | ||
1797 | You can use the @kbd{-l} option to shorten the size of references to | |
1798 | undefined symbols. If the @kbd{-l} option is not given, references | |
1799 | to undefined symbols will be a full long (32 bits) wide. (Since | |
1800 | @code{as} cannot know where these symbols will end up being, | |
1801 | @code{as} can only allocate space for the linker to fill in later. | |
1802 | Since @code{as} doesn't know how far away these symbols will be, it | |
1803 | allocates as much space as it can.) If this option is given, the | |
1804 | references will only be one word wide (16 bits). This may be useful | |
1805 | if you want the object file to be as small as possible, and you know | |
1806 | that the relevant symbols will be less than 17 bits away. | |
1807 | ||
1808 | The 680x0 version of @code{as} is usually used to assemble programs | |
1809 | for the Motorola MC68020 microprocessor. Occasionally it is used to | |
1810 | assemble programs for the mostly-similar-but-slightly-different | |
1811 | MC68000 or MC68010 microprocessors. You can give @code{as} the | |
1812 | options @samp{-m68000}, @samp{-mc68000}, @samp{-m68010}, | |
1813 | @samp{-mc68010}, @samp{-m68020}, and @samp{-mc68020} to tell it what | |
1814 | processor it should be assembling for. Unfortunately, these options | |
are almost entirely unused and untried. They may work, but nobody
1816 | has tested them much. | |
1817 | ||
1818 | @subsection Syntax | |
1819 | ||
1820 | The 680x0 version of @code{as} uses syntax similar to the Sun | |
assembler. Size modifiers are appended directly to the end of the
1822 | opcode without an intervening period. Thus, @samp{move.l} is | |
1823 | written @samp{movl}, etc. | |
1824 | ||
1825 | @c This is no longer true | |
1826 | @c Explicit size modifiers for branch instructions are ignored; @code{as} | |
1827 | @c automatically picks the smallest size that will reach the | |
@c destination.
1829 | ||
1830 | If @code{as} is compiled with SUN_ASM_SYNTAX defined, it will also | |
allow Sun-style local labels of the form @samp{1$} through @samp{9$}.
1832 | ||
1833 | In the following table @dfn{apc} stands for any of the address | |
registers (@samp{a0} through @samp{a7}), nothing (@samp{}), the
program counter (@samp{pc}), or the zero-address relative to the
program counter (@samp{zpc}).
1837 | ||
1838 | The following addressing modes are understood: | |
1839 | @table @dfn | |
1840 | @item Immediate | |
1841 | @samp{#@var{digits}} | |
1842 | ||
1843 | @item Data Register | |
1844 | @samp{d0} through @samp{d7} | |
1845 | ||
1846 | @item Address Register | |
1847 | @samp{a0} through @samp{a7} | |
1848 | ||
1849 | @item Address Register Indirect | |
1850 | @samp{a0@@} through @samp{a7@@} | |
1851 | ||
1852 | @item Address Register Postincrement | |
1853 | @samp{a0@@+} through @samp{a7@@+} | |
1854 | ||
1855 | @item Address Register Predecrement | |
1856 | @samp{a0@@-} through @samp{a7@@-} | |
1857 | ||
1858 | @item Indirect Plus Offset | |
1859 | @samp{@var{apc}@@(@var{digits})} | |
1860 | ||
1861 | @item Index | |
1862 | @samp{@var{apc}@@(@var{digits},@var{register}:@var{size}:@var{scale})} | |
1863 | or @samp{@var{apc}@@(@var{register}:@var{size}:@var{scale})} | |
1864 | ||
1865 | @item Postindex | |
1866 | @samp{@var{apc}@@(@var{digits})@@(@var{digits},@var{register}:@var{size}:@var{scale})} | |
1867 | or @samp{@var{apc}@@(@var{digits})@@(@var{register}:@var{size}:@var{scale})} | |
1868 | ||
1869 | @item Preindex | |
1870 | @samp{@var{apc}@@(@var{digits},@var{register}:@var{size}:@var{scale})@@(@var{digits})} | |
1871 | or @samp{@var{apc}@@(@var{register}:@var{size}:@var{scale})@@(@var{digits})} | |
1872 | ||
1873 | @item Memory Indirect | |
1874 | @samp{@var{apc}@@(@var{digits})@@(@var{digits})} | |
1875 | ||
1876 | @item Absolute | |
1877 | @samp{@var{symbol}}, or @samp{@var{digits}}, or either of the above followed | |
1878 | by @samp{:b}, @samp{:w}, or @samp{:l}. | |
1879 | @end table | |
1880 | ||
1881 | @subsection Floating Point | |
1882 | The floating point code is not too well tested, and may have | |
1883 | subtle bugs in it. | |
1884 | ||
1885 | Packed decimal (P) format floating literals are not supported. | |
1886 | Feel free to add the code yourself. | |
1887 | ||
1888 | The floating point formats generated by directives are these. | |
1889 | @table @code | |
1890 | @item .float | |
1891 | @code{Single} precision floating point constants. | |
1892 | @item .double | |
1893 | @code{Double} precision floating point constants. | |
1894 | @end table | |
1895 | ||
1896 | There is no directive to produce regions of memory holding | |
extended precision numbers; however, they can be used as
immediate operands to floating-point instructions. Adding a
directive to create extended precision numbers would not be
hard, but nobody has felt any burning need to do it.
1901 | ||
1902 | @subsection Machine Directives | |
1903 | In order to be compatible with the Sun assembler the 680x0 assembler | |
1904 | understands the following directives. | |
1905 | @table @code | |
1906 | @item .data1 | |
1907 | This directive is identical to a @code{.data 1} directive. | |
1908 | @item .data2 | |
1909 | This directive is identical to a @code{.data 2} directive. | |
1910 | @item .even | |
1911 | This directive is identical to a @code{.align 1} directive. | |
1912 | @c Is this true? does it work??? | |
1913 | @item .skip | |
1914 | This directive is identical to a @code{.space} directive. | |
1915 | @end table | |
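
The short sketch below uses these Sun-compatible directives. Each comment names the generic directive that the line stands for. @samp{msg} and @samp{buffer} are hypothetical labels.

@example
        .data1                  | same as .data 1
msg:    .ascii  "hello"         | msg: hypothetical label
        .even                   | same as .align 1
buffer: .skip   16              | same as .space 16; hypothetical label
@end example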

@subsection Opcodes
Danger: Several bugs have been found in the opcode table (and
fixed). More bugs may exist. Be careful when using obscure
instructions.

The assembler automatically chooses the proper size for branch
instructions. Most attempts to force a short displacement are,
however, honored. Branches that are forced to use a short
displacement will not be adjusted if the target is out of range.
Let The User Beware.

The immediate character is @samp{#} for Sun compatibility. The
line-comment character is @samp{|}. If a @samp{#} appears at the
beginning of a line, it is treated as a comment unless it looks like
@samp{# line file}, in which case it is treated normally.
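
The following lines illustrate these rules. The first line has the @samp{# line file} form, of the kind a C preprocessor emits, so it is not treated as a comment. Its file name and line number are hypothetical. The second line is an ordinary comment. The third line uses @samp{#} as the immediate character and @samp{|} to begin a trailing comment.

@example
# 12 "prog.c"
# This whole line is a comment; the file name and line above are hypothetical.
        movl    #10,d0          | #10 is an immediate operand
@end example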

@section 32x32
@subsection Options
The 32x32 version of @code{as} accepts a @kbd{-m32032} option to
specify that it is assembling for a 32032 processor, or a
@kbd{-m32532} option to specify that it is assembling for a 32532
processor. The default (if neither is specified) is chosen when the
assembler is compiled.

@subsection Syntax
I don't know anything about the 32x32 syntax assembled by
@code{as}. Someone who understands the processor (I've never seen
one) and the possible syntaxes should write this section.

@subsection Floating Point
The 32x32 uses IEEE floating point numbers, but @code{as} will only
create single or double precision values. I don't know if the 32x32
understands extended precision numbers.

@subsection Machine Directives
The 32x32 has no machine dependent directives.

@section Sparc
@subsection Options
The Sparc has no machine dependent options.

@subsection Syntax
I don't know anything about Sparc syntax. Someone who does
will have to write this section.

@subsection Floating Point
The Sparc uses IEEE floating-point numbers.

@subsection Machine Directives
The Sparc version of @code{as} supports the following additional
machine directives:

@table @code
@item .common
This must be followed by a symbol name, a positive number, and
@code{"bss"}. This behaves somewhat like @code{.comm}, but the
syntax is different.

@item .global
This is functionally identical to @code{.globl}.

@item .half
This is functionally identical to @code{.short}.

@item .proc
This directive is ignored. Any text following it on the same
line is also ignored.

@item .reserve
This must be followed by a symbol name, a positive number, and
@code{"bss"}. This behaves somewhat like @code{.lcomm}, but the
syntax is different.

@item .seg
This must be followed by @code{"text"}, @code{"data"}, or
@code{"data1"}. It behaves like @code{.text}, @code{.data}, or
@code{.data 1}.

@item .skip
This is functionally identical to the @code{.space} directive.

@item .word
On the Sparc, the @code{.word} directive produces 32-bit values,
instead of the 16-bit values it produces on every other machine.

@end table
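
The lines below use these directives together. This is a sketch under several assumptions. The symbol names @samp{counter}, @samp{buf}, and @samp{table} are hypothetical. The comma-separated operands for @code{.reserve} and @code{.common} are modeled on @code{.comm}, because this section does not spell out their exact syntax. @samp{!} is assumed to start a comment, as it does in Sun's Sparc assembler.

@example
        .seg    "data"          ! like .data
        .global counter         ! counter: hypothetical symbol
counter:
        .word   0               ! 4 bytes on the Sparc
        .half   1               ! 2 bytes, like .short
        .skip   6               ! like .space 6
        .reserve buf, 64, "bss" ! buf: hypothetical symbol
        .common table, 256, "bss" ! table: hypothetical symbol
        .seg    "text"          ! like .text
@end example

Because @code{.word} is 32 bits wide here, the data at @samp{counter} occupies 4 + 2 + 6 = 12 bytes.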

@section Intel 80386
@subsection Options
The 80386 has no machine dependent options.

@subsection AT&T Syntax versus Intel Syntax
In order to maintain compatibility with the output of @code{GCC},
@code{as} supports AT&T System V/386 assembler syntax. This is quite
different from Intel syntax. We mention these differences because
almost all 80386 documents use only Intel syntax. Notable differences
between the two syntaxes are:
@itemize @bullet
@item
AT&T immediate operands are preceded by @samp{$}; Intel immediate
operands are undelimited (Intel @samp{push 4} is AT&T @samp{pushl $4}).
AT&T register operands are preceded by @samp{%}; Intel register operands
are undelimited. AT&T absolute (as opposed to PC relative) jump/call
operands are prefixed by @samp{*}; they are undelimited in Intel syntax.

@item
AT&T and Intel syntax use the opposite order for source and destination
operands. Intel @samp{add eax, 4} is @samp{addl $4, %eax}. The
@samp{source, dest} convention is maintained for compatibility with
previous Unix assemblers.

@item
In AT&T syntax the size of memory operands is determined from the last
character of the opcode name. Opcode suffixes of @samp{b}, @samp{w},
and @samp{l} specify byte (8-bit), word (16-bit), and long (32-bit)
memory references. Intel syntax accomplishes this by prefixing memory
operands (@emph{not} the opcodes themselves) with @samp{byte ptr},
@samp{word ptr}, and @samp{dword ptr}. Thus, Intel @samp{mov al, byte
ptr @var{foo}} is @samp{movb @var{foo}, %al} in AT&T syntax.

@item
Immediate form long jumps and calls are
@samp{lcall/ljmp $@var{segment}, $@var{offset}} in AT&T syntax; the
Intel syntax is @samp{call/jmp far @var{segment}:@var{offset}}. Also,
the far return instruction is @samp{lret $@var{stack-adjust}} in AT&T
syntax; Intel syntax is @samp{ret far @var{stack-adjust}}.

@item
The AT&T assembler does not provide support for multiple segment
programs. Unix style systems expect all programs to be single segments.
@end itemize
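
The pairs below illustrate these rules. Each instruction is written in AT&T syntax, and the Intel equivalent appears in a comment introduced by @samp{#}, the comment character of the i386 version of @code{as}. @samp{foo} is a hypothetical data symbol. The segment and offset given to @samp{lcall} are arbitrary.

@example
        pushl   $4              # Intel: push 4
        addl    $4, %eax        # Intel: add eax, 4
        movb    foo, %al        # Intel: mov al, byte ptr foo (foo is hypothetical)
        call    *%eax           # absolute call through %eax
        lcall   $0x10, $0       # Intel: call far 0x10:0
        lret    $8              # Intel: ret far 8
@end example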

@subsection Opcode Naming
Opcode names are suffixed with one-character modifiers which specify the
size of operands. The letters @samp{b}, @samp{w}, and @samp{l} specify
byte, word, and long operands. If no suffix is specified by an
instruction and it contains no memory operands, then @code{as} tries to
fill in the missing suffix based on the destination register operand
(the last one by convention). Thus, @samp{mov %ax, %bx} is equivalent
to @samp{movw %ax, %bx}; also, @samp{mov $1, %bx} is equivalent to
@samp{movw $1, %bx}. Note that this is incompatible with the AT&T Unix
assembler, which assumes that a missing opcode suffix implies long
operand size. (This incompatibility does not affect compiler output
since compilers always explicitly specify the opcode suffix.)
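
Hand-written code that must assemble identically under @code{as} and the AT&T Unix assembler can simply spell out every suffix. A sketch follows, in which @samp{count} is a hypothetical memory location.

@example
        mov     %ax, %bx        # as infers movw from the %bx destination
        movw    %ax, %bx        # portable: the suffix is written out
        movl    $1, count       # count is hypothetical; memory operand needs a suffix
@end example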

Almost all opcodes have the same names in AT&T and Intel format. There
are a few exceptions. The sign extend and zero extend instructions need
two sizes to specify them. They need a size to sign/zero extend
@emph{from} and a size to sign/zero extend @emph{to}. This is accomplished
by using two opcode suffixes in AT&T syntax. Base names for sign extend
and zero extend are @samp{movs@dots{}} and @samp{movz@dots{}} in AT&T
syntax (@samp{movsx} and @samp{movzx} in Intel syntax). The opcode
suffixes are tacked on to this base name, the @emph{from} suffix before
the @emph{to} suffix. Thus, @samp{movsbl %al, %edx} is AT&T syntax for
``move sign extend @emph{from} %al @emph{to} %edx.'' Possible suffixes,
thus, are @samp{bl} (from byte to long), @samp{bw} (from byte to word),
and @samp{wl} (from word to long).
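
Combining the base names with the suffix pairs gives forms like the following. The registers are arbitrary, and each comment describes what the destination receives.

@example
        movsbl  %al, %edx       # %edx = %al sign extended to 32 bits
        movzbl  %al, %edx       # %edx = %al zero extended to 32 bits
        movsbw  %al, %dx        # %dx  = %al sign extended to 16 bits
        movswl  %ax, %ecx       # %ecx = %ax sign extended to 32 bits
        movzwl  %ax, %ecx       # %ecx = %ax zero extended to 32 bits
@end example

For instance, if @samp{%al} holds @samp{0xff}, @samp{movsbl} leaves @samp{0xffffffff} in @samp{%edx}, whereas @samp{movzbl} leaves @samp{0x000000ff}.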

The Intel syntax conversion instructions
@itemize @bullet
@item
@samp{cbw}: sign-extend byte in @samp{%al} to word in @samp{%ax},
@item
@samp{cwde}: sign-extend word in @samp{%ax} to long in @samp{%eax},
@item
@samp{cwd}: sign-extend word in @samp{%ax} to long in @samp{%dx:%ax},
@item
@samp{cdq}: sign-extend dword in @samp{%eax} to quad in @samp{%edx:%eax}
@end itemize
are called @samp{cbtw}, @samp{cwtl}, @samp{cwtd}, and @samp{cltd} in
AT&T naming. @code{as} accepts either naming for these instructions.

Far call/jump instructions are @samp{lcall} and @samp{ljmp} in
AT&T syntax, but are @samp{call far} and @samp{jmp far} in Intel
convention.
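
In the sketch below, the first two lines are two spellings of the same instruction. The remaining lines show a common use of @samp{cltd}: preparing a signed 32-bit division. @samp{dividend} and @samp{divisor} are hypothetical long-sized memory operands.

@example
        cwtl                    # AT&T name for Intel cwde
        cwde                    # Intel name, also accepted by as
        movl    dividend, %eax  # dividend, divisor: hypothetical longs
        cltd                    # sign-extend %eax into %edx:%eax
        idivl   divisor         # quotient in %eax, remainder in %edx
@end example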

@subsection Register Naming
Register operands are always prefixed with @samp{%}. The 80386 registers
consist of
@itemize @bullet
@item
the 8 32-bit registers @samp{%eax} (the accumulator), @samp{%ebx},
@samp{%ecx}, @samp{%edx}, @samp{%edi}, @samp{%esi}, @samp{%ebp} (the
frame pointer), and @samp{%esp} (the stack pointer).

@item
the 8 16-bit low-ends of these: @samp{%ax}, @samp{%bx}, @samp{%cx},
@samp{%dx}, @samp{%di}, @samp{%si}, @samp{%bp}, and @samp{%sp}.

@item
the 8 8-bit registers: @samp{%ah}, @samp{%al}, @samp{%bh},
@samp{%bl}, @samp{%ch}, @samp{%cl}, @samp{%dh}, and @samp{%dl} (these
are the high-bytes and low-bytes of @samp{%ax}, @samp{%bx},
@samp{%cx}, and @samp{%dx}).

@item
the 6 segment registers @samp{%cs} (code segment), @samp{%ds}
(data segment), @samp{%ss} (stack segment), @samp{%es}, @samp{%fs},
and @samp{%gs}.

@item
the 3 processor control registers @samp{%cr0}, @samp{%cr2}, and
@samp{%cr3}.

@item
the 6 debug registers @samp{%db0}, @samp{%db1}, @samp{%db2},
@samp{%db3}, @samp{%db6}, and @samp{%db7}.

@item
the 2 test registers @samp{%tr6} and @samp{%tr7}.

@item
the 8 floating point stack registers @samp{%st} (or equivalently
@samp{%st(0)}), @samp{%st(1)}, @samp{%st(2)}, @samp{%st(3)},
@samp{%st(4)}, @samp{%st(5)}, @samp{%st(6)}, and @samp{%st(7)}.
@end itemize
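
The instructions below each name registers from a different group in this list. They are illustrative only. Reading @samp{%cr0} requires privilege.

@example
        movb    %ah, %bl        # high byte of %ax into low byte of %bx
        movw    %ds, %ax        # copy a segment register
        movl    %cr0, %eax      # read control register 0 (privileged)
        fld     %st(2)          # push a copy of %st(2) onto the FP stack
@end example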

@subsection Opcode Prefixes
Opcode prefixes are used to modify the following opcode. They are used
to repeat string instructions, to provide segment overrides, to perform
bus lock operations, and to give operand and address size (16-bit
operands are specified in an instruction by prefixing what would
normally be 32-bit operands with an ``operand size'' opcode prefix).
Opcode prefixes are usually given as single-line instructions with no
operands, and must directly precede the instruction they act upon. For
example, the @samp{scas} (scan string) instruction is repeated with:
@example
repne
scas
@end example

Here is a list of opcode prefixes:
@itemize @bullet
@item
Segment override prefixes @samp{cs}, @samp{ds}, @samp{ss}, @samp{es},
@samp{fs}, @samp{gs}. These are added automatically when you use the
@var{segment}:@var{memory-operand} form for memory references.

@item
Operand/Address size prefixes @samp{data16} and @samp{addr16}
change 32-bit operands/addresses into 16-bit operands/addresses. Note
that 16-bit addressing modes (i.e. 8086 and 80286 addressing modes)
are not supported (yet).

@item
The bus lock prefix @samp{lock} asserts the bus lock signal during
execution of the instruction it precedes, giving the processor exclusive
use of shared memory for that instruction. (This is only valid with
certain instructions; see an 80386 manual for details.)

@item
The wait for coprocessor prefix @samp{wait} waits for the
coprocessor to complete the current instruction. This should never be
needed for the 80386/80387 combination.

@item
The @samp{rep}, @samp{repe}, and @samp{repne} prefixes are added
to string instructions to make them repeat up to @samp{%ecx} times.
@end itemize
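
As a sketch, the code below scans a buffer for a zero byte, with the @samp{repne} prefix on a line of its own as described above. It then performs a locked increment. @samp{buf} and @samp{counter} are hypothetical symbols, and the 64-byte limit is arbitrary.

@example
        movl    $buf, %edi      # buf: hypothetical buffer to scan
        movb    $0, %al         # byte to search for
        movl    $64, %ecx       # scan at most 64 bytes
        cld                     # scan upward in memory
        repne
        scasb                   # stop at a zero byte or when %ecx reaches 0
        lock
        incl    counter         # counter: hypothetical; locked increment
@end example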

@subsection Memory References
An Intel syntax indirect memory reference of the form
@example
@var{segment}:[@var{base} + @var{index}*@var{scale} + @var{disp}]
@end example
is translated into the AT&T syntax
@example
@var{segment}:@var{disp}(@var{base}, @var{index}, @var{scale})
@end example
where @var{base} and @var{index} are the optional 32-bit base and
index registers, @var{disp} is the optional displacement, and
@var{scale}, taking the values 1, 2, 4, and 8, multiplies @var{index}
to calculate the address of the operand. If no @var{scale} is
specified, @var{scale} is taken to be 1. @var{segment} specifies the
optional segment register for the memory operand, and may override the
default segment register (see an 80386 manual for segment register
defaults). Note that segment overrides in AT&T syntax @emph{must}
be preceded by a @samp{%}. If you specify a segment override which
coincides with the default segment register, @code{as} will @emph{not}
output any segment register override prefixes to assemble the given
instruction. Thus, segment overrides can be specified to emphasize which
segment register is used for a given memory operand.

Here are some examples of Intel and AT&T style memory references:
@table @asis

@item AT&T: @samp{-4(%ebp)}, Intel: @samp{[ebp - 4]}
@var{base} is @samp{%ebp}; @var{disp} is @samp{-4}. @var{segment} is
missing, and the default segment is used (@samp{%ss} for addressing with
@samp{%ebp} as the base register). @var{index} and @var{scale} are both missing.

@item AT&T: @samp{foo(,%eax,4)}, Intel: @samp{[foo + eax*4]}
@var{index} is @samp{%eax} (scaled by a @var{scale} of 4); @var{disp} is
@samp{foo}. All other fields are missing. The segment register here
defaults to @samp{%ds}.

@item AT&T: @samp{foo(,1)}, Intel: @samp{[foo]}
This uses the memory location at address @samp{foo} as the operand.
Note that @var{base} and @var{index} are both missing, but there is only
@emph{one} @samp{,}. This is a syntactic exception.

@item AT&T: @samp{%gs:foo}, Intel: @samp{gs:foo}
This selects the contents of the variable @samp{foo} with segment
register @var{segment} being @samp{%gs}.

@end table

Absolute (as opposed to PC relative) call and jump operands must be
prefixed with @samp{*}. If no @samp{*} is specified, @code{as} will
always choose PC relative addressing for jump/call labels.

Any instruction that has a memory operand @emph{must} specify its size (byte,
word, or long) with an opcode suffix (@samp{b}, @samp{w}, or @samp{l},
respectively).
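
The instructions below use these operand forms, with the Intel form of each memory operand in the comment. @samp{foo} and @samp{counter} are hypothetical symbols.

@example
        movl    -4(%ebp), %eax          # [ebp - 4]
        movl    foo(,%eax,4), %ebx      # [foo + eax*4]; foo is hypothetical
        movl    8(%ebx,%ecx,2), %edx    # [ebx + ecx*2 + 8]
        leal    8(%ebx,%ecx,2), %esi    # same address, computed but not loaded
        movl    %gs:foo, %eax           # gs:foo
        incl    counter                 # counter is hypothetical; l gives the size
@end example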

@subsection Handling of Jump Instructions
Jump instructions are always optimized to use the smallest possible
displacements. This is accomplished by using byte (8-bit) displacement
jumps whenever the target is sufficiently close. If a byte displacement
is insufficient, a long (32-bit) displacement is used. We do not support
word (16-bit) displacement jumps (i.e. prefixing the jump instruction
with the @samp{addr16} opcode prefix), since the 80386 insists upon masking
@samp{%eip} to 16 bits after the word displacement is added.

Note that the @samp{jcxz}, @samp{jecxz}, @samp{loop}, @samp{loopz},
@samp{loope}, @samp{loopnz} and @samp{loopne} instructions only come in
byte displacements. If the target of one of these instructions
(@code{GCC} does not use them) is out of range, the assembler will
print an error message (and generate incorrect code). The AT&T 80386
assembler tries to get around this problem by expanding @samp{jcxz foo} to
@example
jcxz cx_zero
jmp cx_nonzero
cx_zero: jmp foo
cx_nonzero:
@end example
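
When the target of one of these instructions may be out of byte range, you can get a similar effect by hand. Use instructions that accept long displacements, as sketched below with a hypothetical label @samp{target}. Unlike @samp{jecxz} and @samp{loop}, the replacement instructions @samp{testl} and @samp{decl} modify the condition codes.

@example
        testl   %ecx, %ecx      # instead of: jecxz target (target is hypothetical)
        jz      target
        decl    %ecx            # instead of: loop target
        jnz     target
@end example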

@subsection Floating Point
All 80387 floating point types except packed BCD are supported.
(BCD support could be added without much difficulty.) These data
types are 16-, 32-, and 64-bit integers, and single (32-bit),
double (64-bit), and extended (80-bit) precision floating point.
Each supported type has an opcode suffix and a constructor
associated with it. Opcode suffixes specify the operands' data
types. Constructors build these data types into memory.

@itemize @bullet
@item
Floating point constructors are @samp{.float} or @samp{.single},
@samp{.double}, and @samp{.tfloat} for 32-, 64-, and 80-bit formats.
These correspond to opcode suffixes @samp{s}, @samp{l}, and @samp{t}.
@samp{t} stands for temporary real. Note that the 80387 only supports
this format via the @samp{fldt} (load temporary real to stack top) and
@samp{fstpt} (store temporary real and pop stack) instructions.

@item
Integer constructors are @samp{.word}, @samp{.long} or @samp{.int}, and
@samp{.quad} for the 16-, 32-, and 64-bit integer formats. The corresponding
opcode suffixes are @samp{s} (short), @samp{l} (long), and @samp{q}
(quad). As with the temporary real format, the 64-bit @samp{q} format is
only present in the @samp{fildq} (load quad integer to stack top) and
@samp{fistpq} (store quad integer and pop stack) instructions.
@end itemize
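
The sketch below pairs each constructor with an instruction that uses the matching suffix. All labels are hypothetical and the values are arbitrary. The floating point stack is empty again at the end.

@example
        .data
half:   .float  0.5             # 32-bit single (labels are hypothetical)
pi:     .double 3.14159265358979 # 64-bit double
big:    .tfloat 0.0             # 80-bit extended (temporary real)
count:  .long   10              # 32-bit integer
small:  .word   0               # 16-bit integer
        .text
        flds    half            # push 0.5 (s: single precision)
        faddl   pi              # %st = %st + pi (l: double precision)
        fstpt   big             # store %st as 80-bit extended and pop
        fildl   count           # push 10 (l: 32-bit integer)
        fistps  small           # store as 16-bit integer and pop
@end example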

Register to register operations do not require opcode suffixes,
so that @samp{fst %st, %st(1)} is equivalent to @samp{fstl %st, %st(1)}.

Since the 80387 automatically synchronizes with the 80386, @samp{fwait}
instructions are almost never needed (this is not the case for the
80286/80287 and 8086/8087 combinations). Therefore, @code{as} suppresses
the @samp{fwait} instruction whenever it is implicitly selected by one
of the @samp{f@dots{}} instructions that have an @samp{fn@dots{}}
counterpart. For example, @samp{fsave} and @samp{fnsave} are treated
identically. In general, all the @samp{fn@dots{}} instructions are made
equivalent to @samp{f@dots{}} instructions. If @samp{fwait} is desired
it must be explicitly coded.
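
In the example below, @samp{fpstate} is a hypothetical 108-byte save area. The first two lines assemble to the same instruction. The explicit @samp{fwait} on the last line is the only way to get a wait emitted.

@example
        fsave   fpstate         # fpstate: hypothetical save area; no fwait emitted
        fnsave  fpstate         # assembled exactly like the line above
        fwait                   # an fwait appears only when written out
@end example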

@subsection Notes
There is some trickery concerning the @samp{mul} and @samp{imul}
instructions that deserves mention. The 16-, 32-, and 64-bit expanding
multiplies (base opcode @samp{0xf6}; extension 4 for @samp{mul} and 5
for @samp{imul}) can be output only in the one operand form. Thus,
@samp{imul %ebx, %eax} does @emph{not} select the expanding multiply;
the expanding multiply would clobber the @samp{%edx} register, and this
would confuse @code{GCC} output. Use @samp{imul %ebx} to get the
64-bit product in @samp{%edx:%eax}.

We have added a two operand form of @samp{imul} when the first operand
is an immediate mode expression and the second operand is a register.
This is just a shorthand, so that multiplying @samp{%eax} by 69, for
example, can be done with @samp{imul $69, %eax} rather than @samp{imul
$69, %eax, %eax}.
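
The three @samp{imul} forms described above look like this side by side. The register choices are arbitrary.

@example
        imul    %ebx            # %edx:%eax = %eax * %ebx (64-bit product)
        imul    %ebx, %eax      # %eax = %eax * %ebx; %edx is untouched
        imul    $69, %eax       # shorthand for imul $69, %eax, %eax
@end example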

@node Maintenance, Retargeting, MachineDependent, top
@chapter Maintaining the Assembler
[[this chapter is still being built]]

@section Design
We had these goals, in descending priority:
@table @b
@item Accuracy.
For every program composed by a compiler, @code{as} should emit
``correct'' code. This leaves some latitude in choosing addressing
modes, order of @code{relocation_info} structures in the object
file, @i{etc}.

@item Speed, for the usual case.
By far the most common use of @code{as} will be assembling compiler
output.

@item Upward compatibility for existing assembler code.
Well @dots{} we don't support Vax bit fields, but everything else
seems to be upward compatible.

@item Readability.
The code should be maintainable with few surprises. (JF: ha!)

@end table

We assumed that disk I/O was slow and expensive while memory was
fast and access to memory was cheap. We expect the in-memory data
structures to be less than 10 times the size of the emitted object
file. (Contrast this with the C compiler, where in-memory structures
might be 100 times object file size!)
This suggests:
@itemize @bullet
@item
Try to read the source file from disk only one time. For other
reasons, we keep large chunks of the source file in memory during
assembly, so this is not a problem. Also, the assembly algorithm
should only scan the source text once if the compiler composed the
text according to a few simple rules.
@item
Emit the object code bytes only once. Don't store values and then
backpatch later.
@item
Build the object file in memory and do direct writes to disk of
large buffers.
@end itemize

RMS suggested a one-pass algorithm which seems to work well. By not
parsing text during a second pass, considerable time is saved on
large programs (@i{e.g.} the sort of C program @code{yacc} would
emit).

It happened that the data structures needed to emit relocation
information to the object file were neatly subsumed into the data
structures that do backpatching of addresses after pass 1.

Many of the functions began life as re-usable modules, loosely
connected. RMS changed this to gain speed. For example, input
parsing routines which used to work on pre-sanitized strings now
must parse raw data. Hence they have to import knowledge of the
assembler's comment conventions @i{etc}.

@section Deprecated Feature(?)s
We have stopped supporting some features:
@itemize @bullet
@item
@code{.org} statements must have @b{defined} expressions.
@item
Vax Bit fields (@kbd{:} operator) are entirely unsupported.
@end itemize

It might be a good idea to not support these features in a future release:
@itemize @bullet
@item
@kbd{#} should begin a comment, even in column 1.
@item
Why support the logical line & file concept any more?
@item
Subsegments are a good candidate for flushing.
It depends on which compilers need them, I guess.
@end itemize

@section Bugs, Ideas, Further Work
Clearly the major improvement is DON'T USE A TEXT-READING
ASSEMBLER for the back end of a compiler. It is much faster to
interpret binary gobbledygook from a compiler's tables than to
ask the compiler to write out human-readable code just so the
assembler can parse it back to binary.

Assuming you use @code{as} for human written programs, here are
some ideas:
@itemize @bullet
@item
Document (here) @code{APP}.
@item
Take advantage of knowing that there are no spaces except after the
opcode to speed up @code{as}. (Modify @code{app.c} to flush useless
spaces: only keep spaces/tabs at the beginning of a line or between 2
symbols.)
@item
Put pointers in this documentation to @file{a.out} documentation.
@item
Split the assembler into parts so it can gobble direct binary
from @i{e.g.} @code{cc}. It is silly for @code{cc} to compose text
just so @code{as} can parse it back to binary.
@item
Rewrite hash functions: I want a more modular, faster library.
@item
Clean up LOTS of code.
@item
Include all the non-@file{.c} files in the maintenance chapter.
@item
Document flonums.
@item
Implement flonum short literals.
@item
Change all talk of expression operands to expression quantities,
or perhaps to expression primaries.
@item
Implement pass 2.
@item
Whenever a @code{.text} or @code{.data} statement is seen, we close
off the current frag with an imaginary @code{.fill 0}. This is
because we only have one obstack for frags, and we can't grow new
frags for a new subsegment, then go back to the old subsegment and
append bytes to the old frag. All this nonsense goes away if we
give each subsegment its own obstack. It makes code simpler in
about 10 places, but nobody has bothered to do it because C compiler
output rarely changes subsegments (compared to ending frags with
relaxable addresses, which is common).
@end itemize

@section Sources
@c The following files in the @file{as} directory
@c are symbolic links to other files, of
@c the same name, in a different directory.
@c @itemize @bullet
@c @item
@c @file{atof_generic.c}
@c @item
@c @file{atof_vax.c}
@c @item
@c @file{flonum_const.c}
@c @item
@c @file{flonum_copy.c}
@c @item
@c @file{flonum_get.c}
@c @item
@c @file{flonum_multip.c}
@c @item
@c @file{flonum_normal.c}
@c @item
@c @file{flonum_print.c}
@c @end itemize

Here is a list of the source files in the @file{as} directory.

@table @file
@item app.c
This contains the pre-processing phase, which deletes comments,
handles whitespace, etc. This was recently re-written, since app
used to be a separate program, but RMS wanted it to be inline.

@item append.c
This is a subroutine to append a string to another string, returning a
pointer just after the last @code{char} appended. (JF: All these
little routines should probably all be put in one file.)

@item as.c
Here you will find the main program of the assembler @code{as}.

@item expr.c
This is a branch office of @file{read.c}. This understands
expressions and primaries. Inside @code{as}, primaries are called
(expression) @i{operands}. This is confusing, because we also talk
(elsewhere) about instruction @i{operands}. Also, expression
operands are called @i{quantities} explicitly to avoid confusion
with instruction operands. What a mess.

@item frags.c
This implements the @b{frag} concept. Without frags, finding the
right size for branch instructions would be a lot harder.

@item hash.c
This contains the hashing functions for the symbol table, opcode
table, @i{etc.}

@item hex_value.c
This is a table of values of digits, for use in @code{atoi()}-type
functions. Could probably be flushed by using calls to @code{strtol()}, or
something similar.

@item input-file.c
This contains operating system dependent source file reading
routines. Since error messages often say where we are in reading
the source file, they live here too. Since @code{as} is intended to
run under GNU and Unix only, this might be worth flushing. Anyway,
almost all C compilers support stdio.

@item input-scrub.c
This deals with calling the pre-processor (if needed) and feeding the
chunks back to the rest of the assembler the right way.

@item messages.c
This contains operating system independent parts of fatal and
warning message reporting. See @file{append.c} above.

@item output-file.c
This contains operating system dependent functions that write an
object file for @code{as}. See @file{input-file.c} above.

@item read.c
This implements all the directives of @code{as}. This also deals
with passing input lines to the machine dependent part of the
assembler.

@item strstr.c
This is a C library function that isn't in most C libraries yet.
See @file{append.c} above.

@item subsegs.c
This implements subsegments.

@item symbols.c
This implements symbols.

@item write.c
This contains the code to perform relaxation, and to write out
the object file. It is mostly operating system independent, but
different OSes have different object file formats in any case.

@item xmalloc.c
This implements @code{malloc()} or bust. See @file{append.c} above.

@item xrealloc.c
This implements @code{realloc()} or bust. See @file{append.c} above.

@item atof-generic.c
The following files were taken from a machine-independent subroutine
library for manipulating floating point numbers and very large
integers.

@file{atof-generic.c} turns a string into a flonum internal format
floating-point number.

@item flonum-const.c
This contains some potentially useful floating point numbers in
flonum format.

@item flonum-copy.c
This copies a flonum.

@item flonum-multip.c
This multiplies two flonums together.

@item bignum-copy.c
This copies a bignum.

@end table
2569 | ||
Here is a table of all the machine-specific files (this includes
both source and header files). Typically, there is a
@file{@var{machine}.c} file, a @file{@var{machine}-opcode.h} file, and an
@file{atof-@var{machine}.c} file. The @file{@var{machine}-opcode.h} file
should be identical to the one used by GDB (which uses it for disassembly).

@table @file

@item atof-ieee.c
This contains code to turn a flonum into an IEEE literal constant.
It is used by the 680x0, 32x32, SPARC, and i386 versions of @code{as}.

@item i386-opcode.h
This is the opcode table for the i386 version of the assembler.

@item i386.c
This contains all the code for the i386 version of the assembler.

@item i386.h
This defines constants and macros used by the i386 version of the assembler.

@item m-generic.h
Generic 68020 header file. Link it to @file{m68k.h} on a system that
is neither a Sun3 nor an HP-UX system.

@item m-sun2.h
68010 header file for Sun2 workstations. Not well tested. Link it
to @file{m68k.h} on a Sun2. (See also @samp{-DSUN_ASM_SYNTAX} in the
@file{Makefile}.)

@item m-sun3.h
68020 header file for Sun3 workstations. Link it to @file{m68k.h} before
compiling on a Sun3 system. (See also @samp{-DSUN_ASM_SYNTAX} in the
@file{Makefile}.)

@item m-hpux.h
68020 header file for an HP-UX (System V?) machine. Which machine and
which version of HP-UX it targets is not known.

@item m68k.h
A hard or symbolic link to one of @file{m-generic.h},
@file{m-hpux.h} or @file{m-sun3.h}, depending on which kind of
680x0 you are assembling for. (See also @samp{-DSUN_ASM_SYNTAX} in the
@file{Makefile}.)

@item m68k-opcode.h
Opcode table for the 68020. This is now a link to the opcode table
in the @code{GDB} source directory.

@item m68k.c
All the mc680x0 code, in one huge, slow-to-compile file.

@item ns32k.c
This contains the code for the ns32032/ns32532 version of the
assembler.

@item ns32k-opcode.h
This contains the opcode table for the ns32032/ns32532 version
of the assembler.

@item vax-inst.h
Vax-specific file describing Vax operands and other Vax-ish things.

@item vax-opcode.h
Vax opcode table.

@item vax.c
Vax-specific parts of @code{as}. This also includes the former files
@file{vax-ins-parse.c}, @file{vax-reg-parse.c} and @file{vip-op.c}.

@item atof-vax.c
Turns a flonum into a Vax constant.

@item vms.c
This file contains the special code needed to put out a VMS-style
object file for the Vax.

@end table

Here is a list of the header files in the source directory.
(Warning: this section may not be very accurate. I didn't
write the header files; I just report them.) Also note that I
think many of these header files could be cleaned up or
eliminated.

@table @file

@item a.out.h
This describes the structures used to create the binary header data
inside the object file. Perhaps we should use the one in
@file{/usr/include}?

@item as.h
This defines all the globally useful things, and pulls in
@file{<stdio.h>} and @file{<assert.h>}.

@item bignum.h
This defines macros useful for dealing with bignums.

@item expr.h
Structures and macros for dealing with @code{expression()}.

@item flonum.h
This defines the structure for dealing with floating point
numbers. It includes @file{bignum.h}.

@item frags.h
This contains a macro for appending a byte to the current frag.

@item hash.h
Structures and function definitions for the hashing functions.

@item input-file.h
Function headers for the @file{input-file.c} functions.

@item md.h
Structures and function headers for things defined in the
machine dependent part of the assembler.

@item obstack.h
This is the GNU systemwide include file for manipulating obstacks.
Since nobody is running under real GNU yet, we include this file.

@item read.h
Macros and function headers for reading in source files.

@item struct-symbol.h
Structure definition and macros for dealing with the @code{as}
internal form of a symbol.

@item subsegs.h
Structure definition for dealing with the numbered subsegments
of the text and data segments.

@item symbols.h
Macros and function headers for dealing with symbols.

@item write.h
Structure for doing segment fixups.
@end table

@comment ~subsection Test Directory
@comment (Note: The test directory seems to have disappeared somewhere
@comment along the line. If you want it, you'll probably have to find a
@comment REALLY OLD dump tape~dots{})
@comment
@comment The ~file{test/} directory is used for regression testing.
@comment After you modify ~code{as}, you can get a quick go/nogo
@comment confidence test by running the new ~code{as} over the source
@comment files in this directory. You use a shell script ~file{test/do}.
@comment
@comment The tests in this suite are evolving. They are not comprehensive.
@comment They have, however, caught hundreds of bugs early in the debugging
@comment cycle of ~code{as}. Most test statements in this suite were naturally
@comment selected: they were used to demonstrate actual ~code{as} bugs rather
@comment than being written ~i{a priori}.
@comment
@comment Another testing suggestion: over 30 bugs have been found simply by
@comment running examples from this manual through ~code{as}.
@comment Some examples in this manual are selected
@comment to distinguish boundary conditions; they are good for testing ~code{as}.
@comment
@comment ~subsubsection Regression Testing
@comment Each regression test involves assembling a file and comparing the
@comment actual output of ~code{as} to ``known good'' output files. Both
@comment the object file and the error/warning message file (stderr) are
@comment inspected. Optionally ~code{as}' exit status may be checked.
@comment Discrepancies are reported. Each discrepancy means either that
@comment you broke some part of ~code{as} or that the ``known good'' files
@comment are now out of date and should be changed to reflect the new
@comment definition of ``good''.
@comment
@comment Each regression test lives in its own directory, in a tree
@comment rooted in the directory ~file{test/}. Each such directory
@comment has a name ending in ~file{.ret}, where `ret' stands for
@comment REgression Test. The ~file{.ret} ending allows ~code{find
@comment (1)} to find all regression tests in the tree, without
@comment needing to list them explicitly.
@comment
@comment Any ~file{.ret} directory must contain a file called
@comment ~file{input} which is the source file to assemble. During
@comment testing an object file ~file{output} is created, as well as
@comment a file ~file{stdouterr} which contains the output to both
@comment stdout and stderr. If there is a file ~file{output.good} in
@comment the directory, and if ~file{output} contains exactly the
@comment same data as ~file{output.good}, the file ~file{output} is
@comment deleted. Likewise ~file{stdouterr} is removed if it exactly
@comment matches a file ~file{stdouterr.good}. If file
@comment ~file{status.good} is present, containing a decimal number
@comment before a newline, the exit status of ~code{as} is compared
@comment to this number. If the status numbers are not equal, a file
@comment ~file{status} is written to the directory, containing the
@comment actual status as a decimal number followed by newline.
@comment
@comment Should any of the ~file{*.good} files fail to match their corresponding
@comment actual files, this is noted by a 1-line message on the screen during
@comment the regression test, and you can use ~code{find (1)} to find any
@comment files named ~file{status}, ~file{output} or ~file{stdouterr}.
@comment
@node Retargeting, , Maintenance, top
@chapter Teaching the Assembler about a New Machine

This chapter describes the steps required in order to make the
assembler work with another machine's assembly language. This
chapter is not complete, and only describes the steps in the
broadest terms. You should look at the source for the
currently supported machines in order to discover some of the
details that aren't mentioned here.

You should create a new file called @file{@var{machine}.c}, and
add the appropriate lines to the file @file{Makefile} so that
you can compile your new version of the assembler. This should
be straightforward; simply add lines similar to the ones there
for the four current versions of the assembler.

If you want to be compatible with GDB (and with the current
machine-dependent versions of the assembler), you should create
a file called @file{@var{machine}-opcode.h} which should
contain all the information about the names of the machine
instructions, their opcodes, and what addressing modes they
support. If you do this right, the assembler and GDB can share
this file, and you'll only have to write it once. Note that
while you're writing @code{as}, you may want to use an
independent program (if you have access to one) to make sure
that @code{as} is emitting the correct bytes. Since @code{as}
and @code{GDB} share the opcode table, an incorrect opcode
table entry may make invalid bytes look OK when you disassemble
them with @code{GDB}.

@section Functions You Will Have to Write

Your file @file{@var{machine}.c} should contain definitions for
the following functions and variables. It will need to include
some header files in order to use some of the structures
defined in the machine-independent part of the assembler. The
needed header files are mentioned in the descriptions of the
functions that will need them.

@table @code

@item long omagic;
This long integer holds the value to place at the beginning of
the @file{a.out} file. It is usually @samp{OMAGIC}, except on
machines that store additional information in the magic number.

@item char comment_chars[];
This character array holds the values of the characters that
start a comment anywhere in a line. Comments are stripped off
automatically by the machine independent part of the
assembler. Note that @samp{/*} always starts a
comment, and that only @samp{*/} ends a comment started by
@samp{/*}.

@item char line_comment_chars[];
This character array holds the values of the characters that start a
comment only if they are the first (non-whitespace) character
on a line. If the character @samp{#} does not appear in this
list, you may get unexpected results. (Various
machine-independent parts of the assembler treat the comments
@samp{#APP} and @samp{#NO_APP} specially, and assume that lines
that start with @samp{#} are comments.)

@item char EXP_CHARS[];
This character array holds the letters that can separate the
mantissa and the exponent of a floating point number. Typical
values are @samp{e} and @samp{E}.

@item char FLT_CHARS[];
This character array holds the letters that, when they appear
immediately after a leading zero, indicate that a number is a
floating-point number (much as @samp{0x} indicates that a
hexadecimal number follows).

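As an illustration only, a port for a hypothetical machine might define these four arrays as follows. The characters chosen here (@samp{|} for comments, and @samp{f}, @samp{F}, @samp{d} and @samp{D} for @code{FLT_CHARS}) are assumptions, not the values of any existing port. The helper @code{is_float_literal} is also hypothetical; it only shows how @code{FLT_CHARS} is meant to be read.

@verbatim
```c
#include <string.h>

/* Hypothetical values for an imaginary target. */

/* '|' starts a comment anywhere on a line. */
char comment_chars[] = "|";

/* '#' starts a comment only at the start of a line; keep it so that
   #APP and #NO_APP lines are still recognized. */
char line_comment_chars[] = "#";

/* Letters separating mantissa from exponent, as in 1.5e3. */
char EXP_CHARS[] = "eE";

/* Letters that, after a leading zero, mark a floating-point literal,
   as in 0f1.5 or 0d2.25 (hypothetical choice). */
char FLT_CHARS[] = "fFdD";

/* Hypothetical helper: does text start like a floating-point literal? */
int is_float_literal(const char *text)
{
    return text[0] == '0' && text[1] != '\0'
        && strchr(FLT_CHARS, text[1]) != NULL;
}
```
@end verbatim

The check on @code{text[1]} guards against @code{strchr} matching the terminating null character of @code{FLT_CHARS}.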
@item pseudo_typeS md_pseudo_table[];
(@var{pseudo_typeS} is defined in @file{md.h}.)
This array contains a list of the machine-dependent directives
the assembler must support. It contains the name of each
pseudo-op (without the leading @samp{.}), a pointer to a
function to be called when that directive is encountered, and
an integer argument to be passed to that function.

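Here is a sketch of such a table. Because @file{md.h} is not reproduced here, the sketch declares its own stand-in for @code{pseudo_typeS}. The field names, the null entry used to end the table, and the directives @samp{.half}, @samp{.quad} and @samp{.even} with their handlers are all assumptions for illustration. The function @code{run_pseudo} is a hypothetical, much simplified version of the lookup that @file{read.c} performs.

@verbatim
```c
#include <stddef.h>
#include <string.h>

/* Stand-in for pseudo_typeS from md.h (field names assumed). */
typedef struct {
    const char *poc_name;       /* directive name, without the leading '.' */
    void (*poc_handler)(int);   /* function called for the directive */
    int poc_val;                /* integer argument passed to it */
} pseudo_typeS;

static int last_arg;            /* lets the example observe the handlers */

/* Hypothetical handlers: a real port would emit bytes here. */
static void s_data(int nbytes) { last_arg = nbytes; }
static void s_even(int unused) { (void)unused; last_arg = -1; }

pseudo_typeS md_pseudo_table[] = {
    { "half", s_data, 2 },      /* .half: 2-byte data */
    { "quad", s_data, 8 },      /* .quad: 8-byte data */
    { "even", s_even, 0 },      /* .even: align to an even address */
    { NULL,   NULL,   0 }       /* null entry ends the table (assumed) */
};

/* Simplified dispatch: find the directive and call its handler. */
int run_pseudo(const char *name)
{
    const pseudo_typeS *p;

    for (p = md_pseudo_table; p->poc_name != NULL; p++) {
        if (strcmp(p->poc_name, name) == 0) {
            p->poc_handler(p->poc_val);
            return 1;
        }
    }
    return 0;   /* unknown directive */
}
```
@end verbatim

Passing the size as the integer argument lets one handler serve several directives of different widths.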
@item void md_begin(void)
This function is called as part of the assembler's
initialization. It should do any initialization required by
any of your other routines.

@item int md_parse_option(char **optionPTR, int *argcPTR, char ***argvPTR)
This routine is called once for each option on the command line
that the machine-independent part of @code{as} does not
understand. This function should return non-zero if the option
pointed to by @var{optionPTR} is a valid option. If it is not
a valid option, this routine should return zero. The variables
@var{argcPTR} and @var{argvPTR} are provided in case the option
requires a filename or something similar as an argument. If
the option is multi-character, @var{optionPTR} should be
advanced past the end of the option; otherwise every letter in
the option will be treated as a separate single-character
option.

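The sketch below handles a hypothetical single-letter option @samp{-k} and a hypothetical multi-character option @samp{-m@var{cpu}}. It assumes that @code{*optionPTR} points at the option letter (with the leading @samp{-} already skipped) and that the caller stops scanning the argument once it reaches the terminating null character. Neither option takes a separate argument, so @var{argcPTR} and @var{argvPTR} are unused.

@verbatim
```c
#include <string.h>

static int keep_locals;        /* set by the hypothetical -k option */
static char cpu_name[32];      /* set by the hypothetical -m<cpu> option */

int md_parse_option(char **optionPTR, int *argcPTR, char ***argvPTR)
{
    char *opt = *optionPTR;    /* assumed to point at the option letter */

    (void)argcPTR;             /* no option here takes a separate */
    (void)argvPTR;             /* command-line argument            */

    switch (*opt) {
    case 'k':                  /* single-character option */
        keep_locals = 1;
        return 1;
    case 'm':                  /* multi-character option: -m<cpu> */
        strncpy(cpu_name, opt + 1, sizeof cpu_name - 1);
        cpu_name[sizeof cpu_name - 1] = '\0';
        /* Move to the end of the option (its terminating '\0') so the
           remaining letters are not taken as single-letter options. */
        *optionPTR = opt + strlen(opt);
        return 1;
    default:
        return 0;              /* not an option this port understands */
    }
}
```
@end verbatim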
@item void md_assemble(char *string)
This routine is called for every machine-dependent
non-directive line in the source file. It does all the real
work involved in reading the opcode, parsing the operands,
etc. @var{string} is a pointer to a null-terminated string
that contains the input line, with all excess whitespace and
comments removed.

@item void md_number_to_chars(char *outputPTR,long value,int nbytes)
This routine is called to turn a C long int, short int, or char
into the series of bytes that represents that number on the
target machine. @var{outputPTR} points to an array where the
result should be stored; @var{value} is the value to store; and
@var{nbytes} is the number of bytes in @var{value} that should be
stored.

@item void md_number_to_imm(char *outputPTR,long value,int nbytes)
This routine is called to turn a C long int, short int, or char
into the series of bytes that represent an immediate value on
the target machine. It is identical to the function @code{md_number_to_chars},
except on NS32K machines.@refill

@item void md_number_to_disp(char *outputPTR,long value,int nbytes)
This routine is called to turn a C long int, short int, or char
into the series of bytes that represent a displacement value on
the target machine. It is identical to the function @code{md_number_to_chars},
except on NS32K machines.@refill

@item void md_number_to_field(char *outputPTR,long value,int nbytes)
This routine is identical to @code{md_number_to_chars},
except on NS32K machines.

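For a machine other than the NS32K, these four routines can be sketched as follows. The sketch assumes a big-endian target such as the 680x0, so the most significant byte is stored first; a little-endian port would fill the array in the opposite order. Having the other three routines simply call @code{md_number_to_chars} follows from the descriptions above.

@verbatim
```c
/* Sketch for a big-endian target: most significant byte first. */
void md_number_to_chars(char *outputPTR, long value, int nbytes)
{
    unsigned char *out = (unsigned char *)outputPTR;
    unsigned long v = (unsigned long)value;   /* avoid shifting a negative */
    int i;

    for (i = nbytes - 1; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* On non-NS32K machines the other three forms are the same. */
void md_number_to_imm(char *outputPTR, long value, int nbytes)
{
    md_number_to_chars(outputPTR, value, nbytes);
}

void md_number_to_disp(char *outputPTR, long value, int nbytes)
{
    md_number_to_chars(outputPTR, value, nbytes);
}

void md_number_to_field(char *outputPTR, long value, int nbytes)
{
    md_number_to_chars(outputPTR, value, nbytes);
}
```
@end verbatim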
@item void md_ri_to_chars(struct relocation_info *riPTR, struct relocation_info ri)
(@code{struct relocation_info} is defined in @file{a.out.h}.)
This routine emits the relocation info in @var{ri}
in the appropriate bit-pattern for the target machine.
The result should be stored in the location pointed
to by @var{riPTR}. This routine may be a no-op unless you are
attempting to do cross-assembly.

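The following sketch writes an eight-byte relocation record in big-endian order, as a cross-assembler for a 680x0 target might need to do on a host with a different byte order. The stand-in @code{struct relocation_info} and the bit positions used in the last byte follow the traditional BSD @file{a.out} layout, but both are assumptions here; check them against the @file{a.out.h} you actually use.

@verbatim
```c
/* Stand-in for the a.out relocation record (layout assumed). */
struct relocation_info {
    long r_address;                 /* offset of the item to relocate */
    unsigned int r_symbolnum : 24;  /* symbol index or segment number */
    unsigned int r_pcrel : 1;       /* 1 if PC-relative */
    unsigned int r_length : 2;      /* 0 = byte, 1 = word, 2 = long */
    unsigned int r_extern : 1;      /* 1 if r_symbolnum is a symbol index */
};

void md_ri_to_chars(struct relocation_info *riPTR, struct relocation_info ri)
{
    unsigned char *out = (unsigned char *)riPTR;
    unsigned long addr = (unsigned long)ri.r_address;

    /* Bytes 0-3: the address, most significant byte first. */
    out[0] = (addr >> 24) & 0xff;
    out[1] = (addr >> 16) & 0xff;
    out[2] = (addr >> 8) & 0xff;
    out[3] = addr & 0xff;
    /* Bytes 4-6: the 24-bit symbol number. */
    out[4] = (ri.r_symbolnum >> 16) & 0xff;
    out[5] = (ri.r_symbolnum >> 8) & 0xff;
    out[6] = ri.r_symbolnum & 0xff;
    /* Byte 7: the flag bits packed into the high-order bits. */
    out[7] = ((ri.r_pcrel << 7) & 0x80) | ((ri.r_length << 5) & 0x60)
           | ((ri.r_extern << 4) & 0x10);
}
```
@end verbatim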
@item char *md_atof(char type,char *outputPTR,int *sizePTR)
This routine turns a series of digits into the appropriate
internal representation for a floating-point number.
@var{type} is a character from @var{FLT_CHARS[]} that describes
what kind of floating point number is wanted; @var{outputPTR}
is a pointer to an array that the result should be stored in;
and @var{sizePTR} is a pointer to an integer where the size (in
bytes) of the result should be stored. This routine should
return an error message, or an empty string (not @code{(char *)0}) for
success.

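Real ports build on the flonum routines (see @file{atof-ieee.c}). The sketch below instead swaps in the C library's @code{strtod}, to show only the calling convention. It assumes that the digits are read from a stand-in @code{input_line_pointer}, that the host uses IEEE 754 @code{float} and @code{double}, that the target is big-endian, and that @samp{f} and @samp{d} are the target's @code{FLT_CHARS} for single and double precision.

@verbatim
```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

char *input_line_pointer;      /* stand-in for the assembler's input pointer */

/* Store the low 'nbytes' bytes of 'bits', most significant byte first. */
static void put_big_endian(char *out, uint64_t bits, int nbytes)
{
    unsigned char *o = (unsigned char *)out;
    int i;

    for (i = nbytes - 1; i >= 0; i--) {
        o[i] = (unsigned char)(bits & 0xff);
        bits >>= 8;
    }
}

char *md_atof(char type, char *outputPTR, int *sizePTR)
{
    char *end;
    double d = strtod(input_line_pointer, &end);

    *sizePTR = 0;
    if (end == input_line_pointer)
        return "bad floating-point constant";
    input_line_pointer = end;          /* consume the number */

    switch (type) {
    case 'f': case 'F': {              /* single precision, 4 bytes */
        float f = (float)d;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        put_big_endian(outputPTR, bits, 4);
        *sizePTR = 4;
        return "";                     /* empty string means success */
    }
    case 'd': case 'D': {              /* double precision, 8 bytes */
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);
        put_big_endian(outputPTR, bits, 8);
        *sizePTR = 8;
        return "";
    }
    default:
        return "unknown floating-point type";
    }
}
```
@end verbatim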
@item int md_short_jump_size;
This variable holds the (maximum) size in bytes of a short (16
bit or so) jump created by @code{md_create_short_jump()}. This
variable is used as part of the broken-word feature, and isn't
needed if the assembler is compiled with
@samp{-DWORKING_DOT_WORD}.

@item int md_long_jump_size;
This variable holds the (maximum) size in bytes of a long (32
bit or so) jump created by @code{md_create_long_jump()}. This
variable is used as part of the broken-word feature, and isn't
needed if the assembler is compiled with
@samp{-DWORKING_DOT_WORD}.

@item void md_create_short_jump(char *resultPTR, long from_addr, long to_addr, fragS *frag, symbolS *to_symbol)
This function emits a jump from @var{from_addr} to @var{to_addr} in
the array of bytes pointed to by @var{resultPTR}. If this creates a
type of jump that must be relocated, this function should call
@code{fix_new()} with @var{frag} and @var{to_symbol}. The jump
emitted by this function may be smaller than @var{md_short_jump_size},
but it must never be larger.
(If it creates a smaller jump, the extra bytes of memory will not be
used.) This function is used as part of the broken-word feature,
and isn't needed if the assembler is compiled with
@samp{-DWORKING_DOT_WORD}.@refill

@item void md_create_long_jump(char *ptr, long from_addr, long to_addr, fragS *frag, symbolS *to_symbol)
This function is similar to the previous function,
@code{md_create_short_jump()}, except that it creates a long
jump instead of a short one. This function is used as part of
the broken-word feature, and isn't needed if the assembler is
compiled with @samp{-DWORKING_DOT_WORD}.

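As a sketch, a 680x0 port could emit a short jump as a @code{BRA} instruction with a 16-bit displacement: the opcode word @code{0x6000} followed by the displacement, which is measured from the address of the opcode word plus 2. The opaque @code{fragS} and @code{symbolS} stand-ins exist only so the sketch compiles on its own. Because this jump is PC-relative within one segment, the sketch assumes no call to @code{fix_new()} is needed.

@verbatim
```c
/* Opaque stand-ins; the real types come from as.h and struct-symbol.h. */
typedef struct frag fragS;
typedef struct symbol symbolS;

int md_short_jump_size = 4;    /* opcode word + 16-bit displacement */

/* Store a 16-bit value, most significant byte first (680x0 order). */
static void put16(char *p, long value)
{
    unsigned long v = (unsigned long)value;
    unsigned char *o = (unsigned char *)p;

    o[0] = (unsigned char)((v >> 8) & 0xff);
    o[1] = (unsigned char)(v & 0xff);
}

void md_create_short_jump(char *resultPTR, long from_addr, long to_addr,
                          fragS *frag, symbolS *to_symbol)
{
    /* The displacement is relative to the opcode address plus 2. */
    long disp = to_addr - (from_addr + 2);

    (void)frag;                /* PC-relative within the segment, so */
    (void)to_symbol;           /* no fix_new() call in this sketch   */
    put16(resultPTR, 0x6000);          /* BRA with a 16-bit displacement */
    put16(resultPTR + 2, disp);
}
```
@end verbatim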
@item int md_estimate_size_before_relax(fragS *fragPTR,int segment_type)
This function does the initial setting up for relaxation. This
includes forcing references to still-undefined symbols to the
appropriate addressing modes.

@item relax_typeS md_relax_table[];
(@code{relax_typeS} is defined in @file{md.h}.)
This array describes the various machine dependent states a
frag may be in before relaxation. You will need one group of
entries for each type of addressing mode you intend to relax.

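The sketch below shows one such group, for a hypothetical branch that can use a byte, word or long displacement. The stand-in @code{relax_typeS}, its field names and their meanings (forward reach, backward reach, length of the variable part, and the next state to try) are assumptions, as is the convention that state 0 is unused. The hypothetical @code{relax_state} function shows the idea of walking the table until a state can reach the target; the real relaxation code in @file{write.c} must also account for the fact that growing one frag moves everything after it.

@verbatim
```c
typedef int relax_substateT;

/* Stand-in for relax_typeS from md.h (fields assumed). */
typedef struct {
    long rlx_forward;          /* farthest forward reach of this state */
    long rlx_backward;         /* farthest backward reach (negative) */
    unsigned char rlx_length;  /* bytes of variable part in this state */
    relax_substateT rlx_more;  /* next state to try, 0 if none */
} relax_typeS;

relax_typeS md_relax_table[] = {
    { 0, 0, 0, 0 },                           /* 0: unused */
    { 127, -128, 1, 2 },                      /* 1: byte displacement */
    { 32767, -32768, 2, 3 },                  /* 2: word displacement */
    { 2147483647L, -2147483647L - 1, 4, 0 }   /* 3: long displacement */
};

/* Move to a larger form while the displacement is out of reach. */
relax_substateT relax_state(relax_substateT state, long disp)
{
    while ((disp > md_relax_table[state].rlx_forward
            || disp < md_relax_table[state].rlx_backward)
           && md_relax_table[state].rlx_more != 0)
        state = md_relax_table[state].rlx_more;
    return state;
}
```
@end verbatim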
@item void md_convert_frag(fragS *fragPTR)
(@var{fragS} is defined in @file{as.h}.)
This routine does the required cleanup after relaxation.
Relaxation has changed the type of the frag to a type that can
reach its destination. This function should adjust the opcode
of the frag to use the appropriate addressing mode.
@var{fragPTR} points to the frag to clean up.

@item void md_end(void)
This function is called just before the assembler exits. It
need not free up memory unless the operating system doesn't do
it automatically on exit. (In which case you'll also have to
track down all the other places where the assembler allocates
space but never frees it.)

@end table

@section External Variables You Will Need to Use

You will need to refer to or change the following external variables
from within the machine-dependent part of the assembler.

@table @code
@item extern char flagseen[];
This array holds non-zero values in locations corresponding to
the options that were on the command line. Thus, if the
assembler was called with @samp{-W}, @var{flagseen['W']} would
be non-zero.

@item extern fragS *frag_now;
This pointer points to the current frag, that is, the frag that bytes
are currently being added to. If nothing else, you will need
to pass it as an argument to various machine-independent
functions. It is maintained automatically by the
frag-manipulating functions; you should never have to change it
yourself.

@item extern LITTLENUM_TYPE generic_bignum[];
(@var{LITTLENUM_TYPE} is defined in @file{bignum.h}.)
This is where @dfn{bignums} (numbers larger than 32 bits) are
returned when they are encountered in an expression. You will
need to use this if you need to implement directives (or
anything else) that must deal with these large numbers.
@code{Bignums} are of @code{segT} @code{SEG_BIG} (defined in
@file{as.h}), and have a positive @code{X_add_number}. The
@code{X_add_number} of a @code{bignum} is the number of
@code{LITTLENUMS} in @var{generic_bignum} that the number takes
up.

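To illustrate how @code{X_add_number} relates to @var{generic_bignum}, here is a hypothetical helper that folds a bignum back into a 64-bit value. It assumes that a @code{LITTLENUM_TYPE} holds 16 bits and that the least significant littlenum comes first; check both against @file{bignum.h}. The array declared here is only a stand-in for the assembler's own.

@verbatim
```c
#include <stdint.h>

typedef unsigned short LITTLENUM_TYPE;    /* assumed: 16-bit littlenums */
#define LITTLENUM_NUMBER_OF_BITS 16

LITTLENUM_TYPE generic_bignum[8];         /* stand-in for the real array */

/* Combine 'count' littlenums (the bignum's X_add_number), least
   significant first, into *result. Returns 0 if it won't fit. */
int bignum_to_u64(int count, uint64_t *result)
{
    uint64_t v = 0;
    int i;

    if (count < 0 || count * LITTLENUM_NUMBER_OF_BITS > 64)
        return 0;
    for (i = count - 1; i >= 0; i--)      /* most significant first */
        v = (v << LITTLENUM_NUMBER_OF_BITS) | generic_bignum[i];
    *result = v;
    return 1;
}
```
@end verbatim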
@item extern FLONUM_TYPE generic_floating_point_number;
(@var{FLONUM_TYPE} is defined in @file{flonum.h}.)
This is where @dfn{flonums} (floating-point numbers within
expressions) are returned. @code{Flonums} are of @code{segT}
@code{SEG_BIG}, and have a negative @code{X_add_number}.
@code{Flonums} are returned in a generic format. You will have
to write a routine to turn this generic format into the
appropriate floating-point format for your machine.

@item extern int need_pass_2;
If this variable is non-zero, the assembler has encountered an
expression that cannot be assembled in a single pass. Since
the second pass isn't implemented, this flag means that the
assembler is punting, and is only looking for additional syntax
errors. (Or something like that.)

@item extern segT now_seg;
This variable holds the value of the segment the assembler is
currently assembling into.

@end table

@section External Functions You Will Need

You will find the following external functions useful (or
indispensable) when you're writing the machine-dependent part
of the assembler.

@table @code

@item char *frag_more(int bytes)
This function allocates @var{bytes} more bytes in the current
frag (or starts a new frag, if it can't expand the current frag
any more) for you to store some object-file bytes in. It
returns a pointer to the bytes, ready for you to store data in.

@item void fix_new(fragS *frag, int where, short size, symbolS *add_symbol, symbolS *sub_symbol, long offset, int pcrel)
This function stores a relocation fixup to be acted on later.
@var{frag} points to the frag the relocation belongs in;
@var{where} is the location within the frag where the relocation begins;
@var{size} is the size of the relocation, and is usually 1 (a single byte),
2 (sixteen bits), or 4 (a longword).
The value @var{add_symbol} @minus{} @var{sub_symbol} + @var{offset} is added to the byte(s)
at @var{frag->fr_literal[where]}. If @var{pcrel} is non-zero, the address of the
location is subtracted from the result. A relocation entry is also added
to the @file{a.out} file. @var{add_symbol} and @var{sub_symbol} may be NULL,
and @var{offset} may be zero.@refill

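The arithmetic just described can be written out as follows. The function @code{fixup_value} is hypothetical (it is not part of @code{as}); it assumes the symbol values are already known and represents a missing symbol by a null pointer.

@verbatim
```c
/* Hypothetical helper: the value a fixup contributes once symbol
   values are known. A NULL pointer stands for a missing symbol. */
long fixup_value(const long *add_value, const long *sub_value,
                 long offset, int pcrel, long location)
{
    long v = offset;

    if (add_value)
        v += *add_value;       /* + add_symbol */
    if (sub_value)
        v -= *sub_value;       /* - sub_symbol */
    if (pcrel)
        v -= location;         /* address of the fixed-up bytes */
    return v;
}
```
@end verbatim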
@item char *frag_var(relax_stateT type, int max_chars, int var, relax_substateT subtype, symbolS *symbol, char *opcode)
This function creates a machine-dependent frag of type @var{type}
(usually @code{rs_machine_dependent}).
@var{max_chars} is the maximum size in bytes that the frag may grow by;
@var{var} is the current size of the variable end of the frag;
@var{subtype} is the sub-type of the frag. The sub-type is used to index into
@var{md_relax_table[]} during relaxation.
@var{symbol} is the symbol whose value should be used when relaxing this frag.
@var{opcode} points to a byte whose value may have to be modified if the
addressing mode used by this frag changes. It typically points into the
@var{fr_literal[]} of the previous frag, and is used to point to a location
that @code{md_convert_frag()} may have to change.@refill

@item void frag_wane(fragS *fragPTR)
This function is useful from within @code{md_convert_frag}. It
changes a frag to type @code{rs_fill}, and sets the variable-sized
piece of the frag to zero. The frag will never change in size
again.

@item segT expression(expressionS *retval)
(@var{segT} is defined in @file{as.h}; @var{expressionS} is defined in @file{expr.h}.)
This function parses the string pointed to by the external char
pointer @var{input_line_pointer}, and returns the segment-type
of the expression. It also stores the results in the
@var{expressionS} pointed to by @var{retval}.
@var{input_line_pointer} is advanced to point past the end of
the expression. (@var{input_line_pointer} is used by other
parts of the assembler. If you modify it, be sure to restore
it to its original value.)

@item as_warn(char *message,@dots{})
If warning messages are disabled, this function does nothing.
Otherwise, it prints out the current file name and the current
line number, then uses @code{fprintf} to print the
@var{message} and any arguments it was passed.

@item as_bad(char *message,@dots{})
This function should be called when @code{as} encounters
conditions that are bad enough that @code{as} should not
produce an object file, but should continue reading input and
printing warning and error messages.

@item as_fatal(char *message,@dots{})
This function prints out the current file name and line number,
prints the word @samp{FATAL:}, then uses @code{fprintf} to
print the @var{message} and any arguments it was passed. Then
the assembler exits. This function should only be used for
serious, unrecoverable errors.

@item void float_const(int float_type)
This function reads floating-point constants from the current
input line, and calls @code{md_atof} to assemble them. It is
useful as the function to call for the directives
@samp{.single}, @samp{.double}, @samp{.float}, etc.
@var{float_type} must be a character from @var{FLT_CHARS}.

@item void demand_empty_rest_of_line(void)
This function can be used by machine-dependent directives to
make sure the rest of the input line is empty. It prints a
warning message if there are additional characters on the line.

@item long int get_absolute_expression(void)
This function can be used by machine-dependent directives to
read an absolute number from the current input line. It
returns the result. If it isn't given an absolute expression,
it prints a warning message and returns zero.

@end table

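Several of these functions usually work together in a machine-dependent directive. The sketch below implements a hypothetical @samp{.hword} directive that reads an absolute expression and stores it in two bytes. To keep it self-contained, it supplies its own simplified stand-ins for @code{input_line_pointer}, @code{frag_more}, @code{get_absolute_expression}, @code{demand_empty_rest_of_line} and @code{md_number_to_chars} (assuming a big-endian target). The real routines do much more, such as growing frags and evaluating full expressions.

@verbatim
```c
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for assembler internals (assumptions). */
char *input_line_pointer;
static char obj_bytes[64];     /* pretend contents of the current frag */
static int obj_len;
static int junk_seen;

char *frag_more(int bytes)     /* no bounds check in this stand-in */
{
    char *p = obj_bytes + obj_len;
    obj_len += bytes;
    return p;
}

long get_absolute_expression(void)   /* decimal, octal or hex only */
{
    return strtol(input_line_pointer, &input_line_pointer, 0);
}

void demand_empty_rest_of_line(void)
{
    while (*input_line_pointer == ' ' || *input_line_pointer == '\t')
        input_line_pointer++;
    if (*input_line_pointer != '\0') {
        fprintf(stderr, "Warning: junk at end of line: %s\n",
                input_line_pointer);
        junk_seen = 1;
    }
}

void md_number_to_chars(char *outputPTR, long value, int nbytes)
{
    unsigned char *out = (unsigned char *)outputPTR;
    unsigned long v = (unsigned long)value;   /* big-endian target */
    int i;

    for (i = nbytes - 1; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/* Hypothetical directive ".hword EXPR"; a port might list it in
   md_pseudo_table with the integer argument 2. */
void s_hword(int nbytes)
{
    long value = get_absolute_expression();

    md_number_to_chars(frag_more(nbytes), value, nbytes);
    demand_empty_rest_of_line();
}
```
@end verbatim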
3133 | ||
3134 | @section The concept of Frags | |
3135 | ||
3136 | This assembler works to optimize the size of certain addressing | |
3137 | modes. (e.g. branch instructions) This means the size of many | |
3138 | pieces of object code cannot be determined until after assembly | |
3139 | is finished. (This means that the addresses of symbols cannot be | |
3140 | determined until assembly is finished.) In order to do this, | |
3141 | @code{as} stores the output bytes as @dfn{frags}. | |
3142 | ||
3143 | Here is the definition of a frag (from @file{as.h}) | |
3144 | @example | |
3145 | struct frag | |
3146 | @{ | |
3147 | long int fr_fix; | |
3148 | long int fr_var; | |
3149 | relax_stateT fr_type; | |
3150 | relax_substateT fr_substate; | |
3151 | unsigned long fr_address; | |
3152 | long int fr_offset; | |
3153 | struct symbol *fr_symbol; | |
3154 | char *fr_opcode; | |
3155 | struct frag *fr_next; | |
3156 | char fr_literal[]; | |
3157 | @} | |
3158 | @end example | |

@table @var
@item fr_fix
is the size of the fixed-size piece of the frag.

@item fr_var
is the size (possibly the maximum size) of the variable-sized piece
of the frag.

@item fr_type
is the type of the frag.  The current types are @code{rs_fill},
@code{rs_align}, @code{rs_org}, and @code{rs_machine_dependent}.

@item fr_substate
records which kind of machine-dependent frag this is: which
addressing mode is being used, and which size is being tried or is
known to fit.

@item fr_address
is only valid after relaxation is finished.  Before relaxation, the
only way to represent an address is as a pointer to the frag that
contains it plus an offset into that frag.

@item fr_offset
contains a number whose meaning depends on the type of the frag.
For machine-dependent frags, it is the offset from @var{fr_symbol}
of the location the frag needs to reach.  For branch instructions
it is therefore usually zero, unless the instruction was something
like @samp{jba foo+12}.

@item fr_symbol
For machine-dependent frags, this points to the symbol the frag
needs to reach.

@item fr_opcode
points to the location, in this frag or in a previous one, of the
opcode of the instruction that caused this frag to be created.
@var{fr_opcode} is needed when the opcode itself must change in
order to use a different form of the addressing mode.  For example,
if a conditional branch only comes in a tiny size, a large branch
can be implemented by reversing the sense of the test and turning
it into a tiny branch over a large jump, which requires changing
the opcode.

@item fr_next
points to the next frag in the singly-linked list.  This field is
usually only needed by the machine-independent part of @code{as}.

@item fr_literal
is a variable-size array that contains the actual object bytes.  A
frag consists of a fixed-size piece of object data (which may be
zero bytes long), followed by a piece of object data whose size may
not have been determined yet.  The other fields describe how that
variable piece is handled; for example, @var{fr_type} controls how
the frag is relaxed.

@end table
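
As an illustration of @var{fr_symbol}, @var{fr_offset} and @var{fr_opcode} working together, here is a sketch of how a conditional branch frag might be finished once addresses are known.  The instruction encoding is entirely hypothetical: a 1-byte opcode with an 8-bit displacement, condition pairs that differ only in the low opcode bit, and a 3-byte long jump with opcode 0x31.  The target is assumed to be the value of @var{fr_symbol} plus @var{fr_offset}, and @code{emit_cond_branch} is a made-up name.

```c
#include <assert.h>

/* Hypothetical toy encoding, not any real machine:
   - conditional branch: opcode byte + signed 8-bit displacement,
     where paired conditions differ only in the low opcode bit;
   - long jump: OP_LONG_JUMP + signed 16-bit little-endian displacement.
   Displacements are relative to the end of each instruction. */
enum { OP_LONG_JUMP = 0x31, SHORT_LEN = 2, JUMP_LEN = 3 };

/* OPCODE points at the branch opcode (as fr_opcode would), with room
   for 5 bytes.  ADDR is the branch's address after relaxation and
   TARGET is the symbol's value plus fr_offset.  Returns the number of
   bytes emitted. */
static int emit_cond_branch(unsigned char *opcode, long addr, long target)
{
    long disp = target - (addr + SHORT_LEN);
    unsigned long bits;

    if (disp >= -128 && disp <= 127) {
        opcode[1] = (unsigned char)((unsigned long)disp & 0xff);
        return SHORT_LEN;
    }

    /* Out of range: reverse the test and branch over a long jump. */
    opcode[0] ^= 1;
    opcode[1] = JUMP_LEN;
    disp = target - (addr + SHORT_LEN + JUMP_LEN);
    assert(disp >= -32768 && disp <= 32767);
    bits = (unsigned long)disp;           /* two's complement bytes */
    opcode[2] = OP_LONG_JUMP;
    opcode[3] = (unsigned char)(bits & 0xff);
    opcode[4] = (unsigned char)((bits >> 8) & 0xff);
    return SHORT_LEN + JUMP_LEN;
}
```

When the target is out of range, the function flips the condition bit and emits a tiny branch over a long jump; rewriting that opcode byte is exactly what @var{fr_opcode} makes possible.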

@c Is this really a good idea?
@iftex
@center [end of manual]
@end iftex
@summarycontents
@contents
@bye