X-Git-Url: https://repo.jachan.dev/qemu.git/blobdiff_plain/d262cb02861dd33375c08fc798930653b14769e9..bc994b74ea38579e18f9d9144021c6f8de597a34:/tcg/README diff --git a/tcg/README b/tcg/README index ec1ac79375..f4a8ac170b 100644 --- a/tcg/README +++ b/tcg/README @@ -14,6 +14,10 @@ the emulated architecture. As TCG started as a generic C backend used for cross compiling, it is assumed that the TCG target is different from the host, although it is never the case for QEMU. +In this document, we use "guest" to specify what architecture we are +emulating; "target" always means the TCG target, the machine on which +we are running QEMU. + A TCG "function" corresponds to a QEMU Translated Block (TB). A TCG "temporary" is a variable only live in a basic @@ -32,6 +36,12 @@ or a memory location which is stored in a register outside QEMU TBs A TCG "basic block" corresponds to a list of instructions terminated by a branch instruction. +An operation with "undefined behavior" may result in a crash. + +An operation with "unspecified behavior" shall not crash. However, +the result may be one of several possibilities so may be considered +an "undefined result". + 3) Intermediate representation 3.1) Introduction @@ -235,23 +245,25 @@ t0=t1|~t2 * shl_i32/i64 t0, t1, t2 -t0=t1 << t2. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64) +t0=t1 << t2. Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64) * shr_i32/i64 t0, t1, t2 -t0=t1 >> t2 (unsigned). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64) +t0=t1 >> t2 (unsigned). Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64) * sar_i32/i64 t0, t1, t2 -t0=t1 >> t2 (signed). Undefined behavior if t2 < 0 or t2 >= 32 (resp 64) +t0=t1 >> t2 (signed). Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64) * rotl_i32/i64 t0, t1, t2 -Rotation of t2 bits to the left. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64) +Rotation of t2 bits to the left. +Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64) * rotr_i32/i64 t0, t1, t2 -Rotation of t2 bits to the right. Undefined behavior if t2 < 0 or t2 >= 32 (resp 64) +Rotation of t2 bits to the right. +Unspecified behavior if t2 < 0 or t2 >= 32 (resp 64) ********* Misc @@ -302,6 +314,17 @@ This operation would be equivalent to dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00) +* extrl_i64_i32 t0, t1 + +For 64-bit hosts only, extract the low 32-bits of input T1 and place it +into 32-bit output T0. Depending on the host, this may be a simple move, +or may require additional canonicalization. + +* extrh_i64_i32 t0, t1 + +For 64-bit hosts only, extract the high 32-bits of input T1 and place it +into 32-bit output T0. Depending on the host, this may be a simple shift, +or may require additional canonicalization. ********* Conditional moves @@ -361,7 +384,25 @@ Write 8, 16, 32 or 64 bits to host memory. All this opcodes assume that the pointed host memory doesn't correspond to a global. In the latter case the behaviour is unpredictable. -********* 64-bit target on 32-bit host support +********* Multiword arithmetic support + +* add2_i32/i64 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high +* sub2_i32/i64 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high + +Similar to add/sub, except that the double-word inputs T1 and T2 are +formed from two single-word arguments, and the double-word output T0 +is returned in two single-word outputs. + +* mulu2_i32/i64 t0_low, t0_high, t1, t2 + +Similar to mul, except two unsigned inputs T1 and T2 yielding the full +double-word product T0. The later is returned in two single-word outputs. + +* muls2_i32/i64 t0_low, t0_high, t1, t2 + +Similar to mulu2, except the two inputs T1 and T2 are signed. + +********* 64-bit guest on 32-bit host support The following opcodes are internal to TCG. Thus they are to be implemented by 32-bit host code generators, but are not to be emitted by guest translators. @@ -372,18 +413,6 @@ They are emitted as needed by inline functions within "tcg-op.h". Similar to brcond, except that the 64-bit values T0 and T1 are formed from two 32-bit arguments. -* add2_i32 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high -* sub2_i32 t0_low, t0_high, t1_low, t1_high, t2_low, t2_high - -Similar to add/sub, except that the 64-bit inputs T1 and T2 are -formed from two 32-bit arguments, and the 64-bit output T0 -is returned in two 32-bit outputs. - -* mulu2_i32 t0_low, t0_high, t1, t2 - -Similar to mul, except two 32-bit (unsigned) inputs T1 and T2 yielding -the full 64-bit product T0. The later is returned in two 32-bit outputs. - * setcond2_i32 dest, t1_low, t1_high, t2_low, t2_high, cond Similar to setcond, except that the 64-bit values T1 and T2 are @@ -402,30 +431,25 @@ current TB was linked to this TB. Otherwise execute the next instructions. Only indices 0 and 1 are valid and tcg_gen_goto_tb may be issued at most once with each slot index per TB. -* qemu_ld8u t0, t1, flags -qemu_ld8s t0, t1, flags -qemu_ld16u t0, t1, flags -qemu_ld16s t0, t1, flags -qemu_ld32 t0, t1, flags -qemu_ld32u t0, t1, flags -qemu_ld32s t0, t1, flags -qemu_ld64 t0, t1, flags +* qemu_ld_i32/i64 t0, t1, flags, memidx +* qemu_st_i32/i64 t0, t1, flags, memidx + +Load data at the guest address t1 into t0, or store data in t0 at guest +address t1. The _i32/_i64 size applies to the size of the input/output +register t0 only. The address t1 is always sized according to the guest, +and the width of the memory operation is controlled by flags. -Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU address -type. 'flags' contains the QEMU memory index (selects user or kernel access) -for example. +Both t0 and t1 may be split into little-endian ordered pairs of registers +if dealing with 64-bit quantities on a 32-bit host. -Note that "qemu_ld32" implies a 32-bit result, while "qemu_ld32u" and -"qemu_ld32s" imply a 64-bit result appropriately extended from 32 bits. +The memidx selects the qemu tlb index to use (e.g. user or kernel access). +The flags are the TCGMemOp bits, selecting the sign, width, and endianness +of the memory access. -* qemu_st8 t0, t1, flags -qemu_st16 t0, t1, flags -qemu_st32 t0, t1, flags -qemu_st64 t0, t1, flags +For a 32-bit host, qemu_ld/st_i64 is guaranteed to only be used with a +64-bit memory access specified in flags. -Store the data t0 at the QEMU CPU Address t1. t1 has the QEMU CPU -address type. 'flags' contains the QEMU memory index (selects user or -kernel access) for example. +********* Note 1: Some shortcuts are defined when the last operand is known to be a constant (e.g. addi for add, movi for mov). @@ -436,8 +460,9 @@ function tcg_gen_xxx(args). 4) Backend -tcg-target.h contains the target specific definitions. tcg-target.c -contains the target specific code. +tcg-target.h contains the target specific definitions. tcg-target.inc.c +contains the target specific code; it is #included by tcg/tcg.c, rather +than being a standalone C file. 4.1) Assumptions @@ -448,13 +473,25 @@ On a 32 bit target, all 64 bit operations are converted to 32 bits. A few specific operations must be implemented to allow it (see add2_i32, sub2_i32, brcond2_i32). +On a 64 bit target, the values are transfered between 32 and 64-bit +registers using the following ops: +- trunc_shr_i64_i32 +- ext_i32_i64 +- extu_i32_i64 + +They ensure that the values are correctly truncated or extended when +moved from a 32-bit to a 64-bit register or vice-versa. Note that the +trunc_shr_i64_i32 is an optional op. It is not necessary to implement +it if all the following conditions are met: +- 64-bit registers can hold 32-bit values +- 32-bit values in a 64-bit register do not need to stay zero or + sign extended +- all 32-bit TCG ops ignore the high part of 64-bit registers + Floating point operations are not supported in this version. A previous incarnation of the code generator had full support of them, but it is better to concentrate on integer operations first. -On a 64 bit target, no assumption is made in TCG about the storage of -the 32 bit values in 64 bit registers. - 4.2) Constraints GCC like constraints are used to define the constraints of every @@ -515,9 +552,9 @@ register. a better generated code, but it reduces the memory usage of TCG and the speed of the translation. -- Don't hesitate to use helpers for complicated or seldom used target +- Don't hesitate to use helpers for complicated or seldom used guest instructions. There is little performance advantage in using TCG to - implement target instructions taking more than about twenty TCG + implement guest instructions taking more than about twenty TCG instructions. Note that this rule of thumb is more applicable to helpers doing complex logic or arithmetic, where the C compiler has scope to do a good job of optimisation; it is less relevant where @@ -525,9 +562,9 @@ register. inline TCG may still be faster for longer sequences. - The hard limit on the number of TCG instructions you can generate - per target instruction is set by MAX_OP_PER_INSTR in exec-all.h -- + per guest instruction is set by MAX_OP_PER_INSTR in exec-all.h -- you cannot exceed this without risking a buffer overrun. - Use the 'discard' instruction if you know that TCG won't be able to prove that a given global is "dead" at a given program point. The - x86 target uses it to improve the condition codes optimisation. + x86 guest uses it to improve the condition codes optimisation.