========================
Decodetree Specification
========================

A *decodetree* is built from instruction *patterns*.  A pattern may
represent a single architectural instruction or a group of such
instructions, depending on what is convenient for further processing.

Each pattern has both *fixedbits* and *fixedmask*, the combination of which
describes the condition under which the pattern is matched::

  (insn & fixedmask) == fixedbits
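
As a minimal sketch of this test, written in Python rather than in the C
that the generator emits, the check can be modelled as follows.  The names
``matches``, ``FIXEDMASK`` and ``FIXEDBITS`` and the example values are
hypothetical, chosen only for illustration:

```python
def matches(insn, fixedmask, fixedbits):
    """Return True when every bit selected by fixedmask equals fixedbits."""
    return (insn & fixedmask) == fixedbits


# Hypothetical 32-bit pattern: the top six bits must be 010000.
FIXEDMASK = 0xFC000000
FIXEDBITS = 0x40000000

if __name__ == "__main__":
    print(matches(0x40221003, FIXEDMASK, FIXEDBITS))
    print(matches(0x80221003, FIXEDMASK, FIXEDBITS))
```

Bits that are clear in the mask never influence the result, which is what
leaves room for fields in the remainder of the instruction word.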

Each pattern may have *fields*, which are extracted from the insn and
passed along to the translator.  Examples of such are registers,
immediates, and sub-opcodes.

In support of patterns, one may declare *fields*, *argument sets*, and
*formats*, each of which may be re-used to simplify further definitions.

Fields
======

Syntax::

  field_def     := '%' identifier ( unnamed_field )+ ( !function=identifier )?
  unnamed_field := number ':' ( 's' )? number

For *unnamed_field*, the first number is the least-significant bit position
of the field and the second number is the length of the field.  If the 's' is
present, the field is considered signed.  If multiple *unnamed_field* elements
are present, they are concatenated, with the first one supplying the
most-significant bits.  In this way one can define disjoint fields.

If ``!function`` is specified, the concatenated result is passed through the
named function, taking and returning an integral value.

FIXME: the fields of the structure into which this result will be stored
are restricted to ``int``, which means that 64-bit items cannot be expanded.

Field examples:

+---------------------------+---------------------------------------------+
| Input                     | Generated code                              |
+===========================+=============================================+
| %disp   0:s16             | sextract(i, 0, 16)                          |
+---------------------------+---------------------------------------------+
| %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
+---------------------------+---------------------------------------------+
| %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
|                           |    extract(i, 1, 1) << 10 |                 |
|                           |    extract(i, 2, 10)                        |
+---------------------------+---------------------------------------------+
| %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
|   !function=expand_shimm8 |               extract(i, 13, 1))            |
+---------------------------+---------------------------------------------+
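
The expressions in the table can be checked with a small Python model.
This is only a sketch: ``extract`` and ``sextract`` here are Python
stand-ins for the C helpers named in the table, and the wrapper functions
``disp``, ``imm9`` and ``disp12`` are hypothetical names that mirror the
field definitions above:

```python
def extract(insn, pos, length):
    """Unsigned bitfield: `length` bits starting at bit `pos`."""
    return (insn >> pos) & ((1 << length) - 1)


def sextract(insn, pos, length):
    """Signed bitfield: as extract(), then sign-extend from the top bit."""
    value = extract(insn, pos, length)
    sign = 1 << (length - 1)
    return (value ^ sign) - sign


def disp(insn):
    # %disp 0:s16
    return sextract(insn, 0, 16)


def imm9(insn):
    # %imm9 16:6 10:3 -- the first sub-field lands in the high bits
    return (extract(insn, 16, 6) << 3) | extract(insn, 10, 3)


def disp12(insn):
    # %disp12 0:s1 1:1 2:10 -- only the first sub-field carries the sign
    return ((sextract(insn, 0, 1) << 11)
            | (extract(insn, 1, 1) << 10)
            | extract(insn, 2, 10))
```

For instance, ``disp(0xFFFF)`` evaluates to ``-1`` in this model, because
bit 15 is treated as the sign bit of the 16-bit field.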

Argument Sets
=============

Syntax::

  args_def    := '&' identifier ( args_elt )+ ( !extern )?
  args_elt    := identifier

Each *args_elt* defines an argument within the argument set.
Each argument set will be rendered as a C structure "arg_$name"
with each of the fields being one of the member arguments.

If ``!extern`` is specified, the backing structure is assumed
to have been already declared, typically via a second decoder.

Argument sets are useful when one wants to define helper functions
for the translator functions that can perform operations on a common
set of arguments.  This can ensure, for instance, that the ``AND``
pattern and the ``OR`` pattern put their operands into the same named
structure, so that a common ``gen_logic_insn`` may be able to handle
the operations common between the two.

Argument set examples::

  &reg3       ra rb rc
  &loadstore  reg base offset
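
To make the mapping concrete, the following Python sketch prints the kind
of C structure that an argument set corresponds to.  The function
``render_arg_struct`` is hypothetical, and the exact layout of the text is
an assumption; only the ``arg_$name`` naming and the ``int`` member type
come from the description above:

```python
def render_arg_struct(name, members):
    """Return C source text for the structure backing an argument set."""
    lines = ["typedef struct {"]
    # Every member is an int, per the restriction noted under Fields.
    lines += ["    int {};".format(member) for member in members]
    lines.append("}} arg_{};".format(name))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_arg_struct("reg3", ["ra", "rb", "rc"]))
    print(render_arg_struct("loadstore", ["reg", "base", "offset"]))
```

A helper shared between translator functions can then take a pointer to
``arg_reg3`` regardless of which pattern filled it in.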


Formats
=======

Syntax::

  fmt_def      := '@' identifier ( fmt_elt )+
  fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
  fixedbit_elt := [01.-]+
  field_elt    := identifier ':' 's'? number
  field_ref    := '%' identifier | identifier '=' '%' identifier
  args_ref     := '&' identifier

Defining a format is a handy way to avoid replicating groups of fields
across many instruction patterns.

A *fixedbit_elt* describes a contiguous sequence of bits that must
be 1, 0, or don't care.  The difference between '.' and '-'
is that '.' means that the bit will be covered with a field or a
final 0 or 1 from the pattern, and '-' means that the bit is really
ignored by the CPU and will not be specified.

A *field_elt* describes a simple field only given a width; the position of
the field is implied by its position with respect to other *fixedbit_elt*
and *field_elt*.

If any *fixedbit_elt* or *field_elt* appears, then all bits must be defined.
Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.

A *field_ref* incorporates a field by reference.  This is the only way to
add a complex field to a format.  A field may be renamed in the process
via assignment to another identifier.  This is intended to allow the
same argument set to be used with disjoint named fields.

A single *args_ref* may specify an argument set to use for the format.
The set of fields in the format must be a subset of the arguments in
the argument set.  If an argument set is not specified, one will be
inferred from the set of fields.

It is recommended, but not required, that all *field_ref* and *args_ref*
appear at the end of the line, not interleaving with *fixedbit_elt* or
*field_elt*.

Format examples::

  @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
  @opi    ...... ra:5 lit:8    1 ....... rc:5
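
The way positions are implied by ordering can be sketched in Python.  The
function ``layout`` below is a hypothetical illustration, not the
generator's own code: it walks the elements from the most-significant bit
downwards, assumes a 32-bit instruction, and handles only *fixedbit_elt*
and *field_elt*:

```python
def layout(elts, width=32):
    """Return (fixedbits, fixedmask, fields) for a list of elements.

    `fields` maps each field name to (lsb position, length, signed).
    """
    fixedbits = fixedmask = 0
    fields = {}
    pos = width
    for elt in elts:
        if ":" in elt:                      # field_elt such as "ra:5"
            name, spec = elt.split(":")
            signed = spec.startswith("s")
            length = int(spec.lstrip("s"))
            pos -= length
            fields[name] = (pos, length, signed)
        else:                               # fixedbit_elt such as "..." or "0"
            for ch in elt:
                if ch not in "01.-":
                    raise ValueError("bad fixedbit_elt: " + elt)
                pos -= 1
                if ch in "01":
                    fixedmask |= 1 << pos
                    fixedbits |= int(ch) << pos
    if pos != 0:                            # all bits must be defined
        raise ValueError("elements do not cover exactly %d bits" % width)
    return fixedbits, fixedmask, fields


OPR = layout("...... ra:5 rb:5 ... 0 ....... rc:5".split())
OPI = layout("...... ra:5 lit:8    1 ....... rc:5".split())
```

In this model both formats fix only bit 12, to 0 for ``@opr`` and to 1 for
``@opi``, while ``ra`` and ``rc`` sit at the same positions in each.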

Patterns
========

Syntax::

  pat_def      := identifier ( pat_elt )+
  pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
  fmt_ref      := '@' identifier
  const_elt    := identifier '=' number

The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
A pattern that does not specify a named format will have one inferred
from a referenced argument set (if present) and the set of fields.

A *const_elt* allows an argument to be set to a constant value.  This may
come in handy when fields overlap between patterns and one has to
include the values in the *fixedbit_elt* instead.

The decoder will call a translator function for each pattern matched.

Pattern examples::

  addl_r   010000 ..... ..... .... 0000000 ..... @opr
  addl_i   010000 ..... ..... .... 0000000 ..... @opi

which will, in part, invoke::

  trans_addl_r(ctx, &arg_opr, insn)

and::

  trans_addl_i(ctx, &arg_opi, insn)
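
The fixed bits written in a pattern and those supplied by its format both
take part in the match.  The Python sketch below combines the two for the
examples above.  The function ``combine`` and the constant names are
hypothetical, the mask values are worked out by hand from the example
lines, and raising an error on disagreement is an assumption about how a
conflict would be treated:

```python
def combine(pat, fmt):
    """Merge (fixedbits, fixedmask) pairs from a pattern and its format."""
    pat_bits, pat_mask = pat
    fmt_bits, fmt_mask = fmt
    overlap = pat_mask & fmt_mask
    if (pat_bits & overlap) != (fmt_bits & overlap):
        raise ValueError("pattern and format disagree on a fixed bit")
    return pat_bits | fmt_bits, pat_mask | fmt_mask


# 010000 ..... ..... .... 0000000 .....  (bits 31..26 and 11..5 are fixed)
ADDL = (0x40000000, 0xFC000FE0)
# @opr fixes bit 12 to 0, @opi fixes bit 12 to 1.
OPR = (0x00000000, 0x00001000)
OPI = (0x00001000, 0x00001000)

addl_r = combine(ADDL, OPR)
addl_i = combine(ADDL, OPI)


def matches(insn, pattern):
    fixedbits, fixedmask = pattern
    return (insn & fixedmask) == fixedbits
```

With these values the two patterns differ only in bit 12, so an
instruction word can match at most one of them.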

Pattern Groups
==============

Syntax::

  group    := '{' ( pat_def | group )+ '}'

A *group* begins with a lone open-brace, has all subsequent lines
indented two spaces, and ends with a lone close-brace.  Groups
may be nested, increasing the required indentation of the lines
within the nested group to two spaces per nesting level.

Unlike ungrouped patterns, grouped patterns are allowed to overlap.
Conflicts are resolved by selecting the patterns in order.  If all
of the fixedbits for a pattern match, its translate function will
be called.  If the translate function returns false, then subsequent
patterns within the group will be tried.

The following example from PA-RISC shows specialization of the *or*
instruction::

  {
    {
      nop   000010 ----- ----- 0000 001001 0 00000
      copy  000010 00000 r1:5  0000 001001 0 rt:5
    }
    or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
  }

When the *cf* field is zero, the instruction has no side effects,
and may be specialized.  When the *rt* field is zero, the output
is discarded and so the instruction has no effect.  When the *rt2*
field is zero, the operation is ``reg[r1] | 0`` and so encodes
the canonical register copy operation.

The output from the generator might look like::

  switch (insn & 0xfc000fe0) {
  case 0x08000240:
    /* 000010.. ........ ....0010 010..... */
    if ((insn & 0x0000f000) == 0x00000000) {
        /* 000010.. ........ 00000010 010..... */
        if ((insn & 0x0000001f) == 0x00000000) {
            /* 000010.. ........ 00000010 01000000 */
            extract_decode_Fmt_0(&u.f_decode0, insn);
            if (trans_nop(ctx, &u.f_decode0)) return true;
        }
        if ((insn & 0x03e00000) == 0x00000000) {
            /* 00001000 000..... 00000010 010..... */
            extract_decode_Fmt_1(&u.f_decode1, insn);
            if (trans_copy(ctx, &u.f_decode1)) return true;
        }
    }
    extract_decode_Fmt_2(&u.f_decode2, insn);
    if (trans_or(ctx, &u.f_decode2)) return true;
    return false;
  }
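
The fall-through behaviour can be modelled in Python.  This is a sketch
that mirrors the control flow of the C output above; the function
``decode`` and the ``trans`` dictionary of callbacks are hypothetical
stand-ins for the generated decoder and the translator functions:

```python
def decode(insn, trans):
    """Return the name of the pattern that accepted insn, or None.

    `trans` maps a pattern name to a callable that takes the instruction
    and returns True when it has handled it.
    """
    if (insn & 0xFC000FE0) != 0x08000240:
        return None
    if (insn & 0x0000F000) == 0:            # cf == 0
        if (insn & 0x0000001F) == 0:        # rt == 0
            if trans["nop"](insn):
                return "nop"
        if (insn & 0x03E00000) == 0:        # rt2 == 0
            if trans["copy"](insn):
                return "copy"
    if trans["or"](insn):                   # most general pattern, tried last
        return "or"
    return None


def accept(insn):
    return True


def refuse(insn):
    return False


ALL = {"nop": accept, "copy": accept, "or": accept}
NO_COPY = {"nop": accept, "copy": refuse, "or": accept}
```

With every callback accepting, ``0x08050243`` (*rt2* and *cf* zero, *rt*
non-zero) is handled as ``copy``; with ``NO_COPY`` the same word falls
through to ``or``.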