for cross compiling, it is assumed that the TCG target is different
from the host, although it is never the case for QEMU.
+In this document, we use "guest" to specify what architecture we are
+emulating; "target" always means the TCG target, the machine on which
+we are running QEMU.
+
A TCG "function" corresponds to a QEMU Translated Block (TB).
A TCG "temporary" is a variable only live in a basic
A TCG "basic block" corresponds to a list of instructions terminated
by a branch instruction.
+An operation with "undefined behavior" may result in a crash.
+
+An operation with "unspecified behavior" shall not crash. However,
+the result may be one of several possibilities so may be considered
+an "undefined result".
+
3) Intermediate representation
3.1) Introduction
Using the tcg_gen_helper_x_y it is possible to call any function
taking i32, i64 or pointer types. By default, before calling a helper,
all globals are stored at their canonical location and it is assumed
-that the function can modify them. This can be overridden by the
-TCG_CALL_CONST function modifier. By default, the helper is allowed to
-modify the CPU state or raise an exception. This can be overridden by
-the TCG_CALL_PURE function modifier, in which case the call to the
-function is removed if the return value is not used.
+that the function can modify them. By default, the helper is allowed to
+modify the CPU state or raise an exception.
+
+This can be overridden using the following function modifiers:
+- TCG_CALL_NO_READ_GLOBALS means that the helper does not read globals,
+ either directly or via an exception. They will not be saved to their
+ canonical locations before calling the helper.
+- TCG_CALL_NO_WRITE_GLOBALS means that the helper does not modify any globals.
+ They will only be saved to their canonical location before calling helpers,
+ but they won't be reloaded afterwise.
+- TCG_CALL_NO_SIDE_EFFECTS means that the call to the function is removed if
+ the return value is not used.
+
+Note that TCG_CALL_NO_READ_GLOBALS implies TCG_CALL_NO_WRITE_GLOBALS.
On some TCG targets (e.g. x86), several calling conventions are
supported.
* Branches:
-Use the instruction 'br' to jump to a label. Use 'jmp' to jump to an
-explicit address. Conditional branches can only jump to labels.
+Use the instruction 'br' to jump to a label.
3.3) Code Optimizations
********* Jumps/Labels
-* jmp t0
-
-Absolute jump to address t0 (pointer type).
-
* set_label $label
Define label 'label' at the current program point.
* shl_i32/i64 t0, t1, t2
-t0=t1 << t2. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
+t0=t1 << t2. Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64)
* shr_i32/i64 t0, t1, t2
-t0=t1 >> t2 (unsigned). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
+t0=t1 >> t2 (unsigned). Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64)
* sar_i32/i64 t0, t1, t2
-t0=t1 >> t2 (signed). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
+t0=t1 >> t2 (signed). Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64)
* rotl_i32/i64 t0, t1, t2
-Rotation of t2 bits to the left. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
+Rotation of t2 bits to the left.
+Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64)
* rotr_i32/i64 t0, t1, t2
-Rotation of t2 bits to the right. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64)
+Rotation of t2 bits to the right.
+Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64)
********* Misc
dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
+* extrl_i64_i32 t0, t1
+
+For 64-bit hosts only, extract the low 32-bits of input T1 and place it
+into 32-bit output T0. Depending on the host, this may be a simple move,
+or may require additional canonicalization.
+
+* extrh_i64_i32 t0, t1
+
+For 64-bit hosts only, extract the high 32-bits of input T1 and place it
+into 32-bit output T0. Depending on the host, this may be a simple shift,
+or may require additional canonicalization.
********* Conditional moves
write(t0, t1 + offset)
Write 8, 16, 32 or 64 bits to host memory.
-********* 64-bit target on 32-bit host support
+All this opcodes assume that the pointed host memory doesn't correspond
+to a global. In the latter case the behaviour is unpredictable.
+
+********* Multiword arithmetic support
+
+* add2_i32/i64 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high
+* sub2_i32/i64 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high
+
+Similar to add/sub, except that the double-word inputs T1 and T2 are
+formed from two single-word arguments, and the double-word output T0
+is returned in two single-word outputs.
+
+* mulu2_i32/i64 t0_low, t0_high, t1, t2
+
+Similar to mul, except two unsigned inputs T1 and T2 yielding the full
+double-word product T0. The later is returned in two single-word outputs.
+
+* muls2_i32/i64 t0_low, t0_high, t1, t2
+
+Similar to mulu2, except the two inputs T1 and T2 are signed.
+
+********* 64-bit guest on 32-bit host support
The following opcodes are internal to TCG. Thus they are to be implemented by
32-bit host code generators, but are not to be emitted by guest translators.
Similar to brcond, except that the 64-bit values T0 and T1
are formed from two 32-bit arguments.
-* add2_i32 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high
-* sub2_i32 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high
-
-Similar to add/sub, except that the 64-bit inputs T1 and T2 are
-formed from two 32-bit arguments, and the 64-bit output T0
-is returned in two 32-bit outputs.
-
-* mulu2_i32 t0_low, t0_high, t1, t2
-
-Similar to mul, except two 32-bit (unsigned) inputs T1 and T2 yielding
-the full 64-bit product T0. The later is returned in two 32-bit outputs.
-
* setcond2_i32 dest, t1_low, t1_high, t2_low, t2_high, cond
Similar to setcond, except that the 64-bit values T1 and T2 are
instructions. Only indices 0 and 1 are valid and tcg_gen_goto_tb may be issued
at most once with each slot index per TB.
-* qemu_ld8u t0, t1, flags
-qemu_ld8s t0, t1, flags
-qemu_ld16u t0, t1, flags
-qemu_ld16s t0, t1, flags
-qemu_ld32 t0, t1, flags
-qemu_ld32u t0, t1, flags
-qemu_ld32s t0, t1, flags
-qemu_ld64 t0, t1, flags
+* qemu_ld_i32/i64 t0, t1, flags, memidx
+* qemu_st_i32/i64 t0, t1, flags, memidx
-Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU address
-type. 'flags' contains the QEMU memory index (selects user or kernel access)
-for example.
+Load data at the guest address t1 into t0, or store data in t0 at guest
+address t1. The _i32/_i64 size applies to the size of the input/output
+register t0 only. The address t1 is always sized according to the guest,
+and the width of the memory operation is controlled by flags.
-Note that "qemu_ld32" implies a 32-bit result, while "qemu_ld32u" and
-"qemu_ld32s" imply a 64-bit result appropriately extended from 32 bits.
+Both t0 and t1 may be split into little-endian ordered pairs of registers
+if dealing with 64-bit quantities on a 32-bit host.
-* qemu_st8 t0, t1, flags
-qemu_st16 t0, t1, flags
-qemu_st32 t0, t1, flags
-qemu_st64 t0, t1, flags
+The memidx selects the qemu tlb index to use (e.g. user or kernel access).
+The flags are the TCGMemOp bits, selecting the sign, width, and endianness
+of the memory access.
-Store the data t0 at the QEMU CPU Address t1. t1 has the QEMU CPU
-address type. 'flags' contains the QEMU memory index (selects user or
-kernel access) for example.
+For a 32-bit host, qemu_ld/st_i64 is guaranteed to only be used with a
+64-bit memory access specified in flags.
+
+*********
Note 1: Some shortcuts are defined when the last operand is known to be
a constant (e.g. addi for add, movi for mov).
4) Backend
-tcg-target.h contains the target specific definitions. tcg-target.c
-contains the target specific code.
+tcg-target.h contains the target specific definitions. tcg-target.inc.c
+contains the target specific code; it is #included by tcg/tcg.c, rather
+than being a standalone C file.
4.1) Assumptions
few specific operations must be implemented to allow it (see add2_i32,
sub2_i32, brcond2_i32).
+On a 64 bit target, the values are transfered between 32 and 64-bit
+registers using the following ops:
+- trunc_shr_i64_i32
+- ext_i32_i64
+- extu_i32_i64
+
+They ensure that the values are correctly truncated or extended when
+moved from a 32-bit to a 64-bit register or vice-versa. Note that the
+trunc_shr_i64_i32 is an optional op. It is not necessary to implement
+it if all the following conditions are met:
+- 64-bit registers can hold 32-bit values
+- 32-bit values in a 64-bit register do not need to stay zero or
+ sign extended
+- all 32-bit TCG ops ignore the high part of 64-bit registers
+
Floating point operations are not supported in this version. A
previous incarnation of the code generator had full support of them,
but it is better to concentrate on integer operations first.
-On a 64 bit target, no assumption is made in TCG about the storage of
-the 32 bit values in 64 bit registers.
-
4.2) Constraints
GCC like constraints are used to define the constraints of every
a better generated code, but it reduces the memory usage of TCG and
the speed of the translation.
-- Don't hesitate to use helpers for complicated or seldom used target
+- Don't hesitate to use helpers for complicated or seldom used guest
instructions. There is little performance advantage in using TCG to
- implement target instructions taking more than about twenty TCG
+ implement guest instructions taking more than about twenty TCG
instructions. Note that this rule of thumb is more applicable to
helpers doing complex logic or arithmetic, where the C compiler has
scope to do a good job of optimisation; it is less relevant where
inline TCG may still be faster for longer sequences.
- The hard limit on the number of TCG instructions you can generate
- per target instruction is set by MAX_OP_PER_INSTR in exec-all.h --
+ per guest instruction is set by MAX_OP_PER_INSTR in exec-all.h --
you cannot exceed this without risking a buffer overrun.
- Use the 'discard' instruction if you know that TCG won't be able to
prove that a given global is "dead" at a given program point. The
- x86 target uses it to improve the condition codes optimisation.
+ x86 guest uses it to improve the condition codes optimisation.