target-i386: optimize flags checking after sub using CC_SRCT
After a comparison or subtraction, the original value of the LHS will
currently be reconstructed using an addition. However, in most cases
it is already available: store it in a temp-local variable and save 1
or 2 TCG ops (2 if the result of the addition needs to be extended).
The temp-local can be declared dead as soon as the cc_op changes again,
or also before the translation block ends because gen_prepare_cc will
always make a copy before returning it. All this magic, plus copy
propagation and dead-code elimination, ensures that the temp local will
(almost) never be spilled.
Placing the CC_OP_DYNAMIC at the join is less effective than
before the branch, as the branch will have forced global registers
to their home locations. This way we have a chance to discard
CC_SRC2 before it gets stored.
A jump that ends a basic block or otherwise falls back to CC_OP_DYNAMIC
will always have to call gen_op_set_cc_op. However, not all jumps end
a basic block, so introduce a variant that does not do this.
This was partially undone earlier (i386: drop cc_op argument of gen_jcc1),
redo it now also to prepare for the introduction of src2.
Paolo Bonzini [Fri, 12 Oct 2012 13:04:10 +0000 (15:04 +0200)]
target-i386: kill cpu_T3
It is almost unused, and it is simpler to pass a TCG value directly
to gen_shiftd_rm_T1_T3. This value is then written to t2 without
going through a temporary register.
Paolo Bonzini [Sun, 7 Oct 2012 13:53:23 +0000 (15:53 +0200)]
target-i386: use CCPrepare to generate conditional jumps
This simplifies all the jump generation code. CCPrepare allows the
code to create an efficient brcond always, so there is no need to
duplicate the setcc and jcc code.
This makes the i386 front-end able to create CCPrepare structs for all
condition, not just those that come from a single flag. In particular,
JCC_L and JCC_LE can be optimized because gen_prepare_cc is not forced
to return a result in bit 0 (unlike gen_setcc_slow).
However, for now the slow jcc operations will still go through CC
computation in a single-bit temporary, followed by a brcond if the
temporary is nonzero.
Introduce a struct that describes how to build a *cond operation
that checks for a given x86 condition code. For now, just change
gen_compute_eflags_* to return the new struct, generate code for
the CCPrepare struct, and go on as before.
[rth: Use ctz with the proper width rather than ffs.]
Paolo Bonzini [Fri, 5 Oct 2012 21:00:10 +0000 (23:00 +0200)]
target-i386: optimize setcc instructions
Reconstruct the arguments for complex conditions involving CC_OP_SUBx (BE,
L, LE). In the others do it via setcond and gen_setcc_slow (which is
not that slow in many cases).
Do the switch at translation time, converting the helper templates to
TCG opcodes. In some cases CF can be computed with a single setcond,
though others it may require a little more work.
In the CC_OP_DYNAMIC case, compute the whole EFLAGS, same as for ZF/SF/PF.
target-i386: use inverted setcond when computing NS or NZ
Make gen_compute_eflags_z and gen_compute_eflags_s able to compute the
inverted condition, and use this in gen_setcc_slow_T0. We cannot do it
yet in gen_compute_eflags_c, but prepare the code for it anyway. It is
not worthwhile for PF, as usual.
shr+and+xor could be replaced by and+setcond. I'm not doing it yet.
ZF, SF and PF can always be computed from CC_DST except in the
CC_OP_EFLAGS case (and CC_OP_DYNAMIC, which just resolves to CC_OP_EFLAGS
in gen_compute_eflags). Use setcond to compute ZF and SF.
This gets us universal coverage, rather than scattering discards
around at various places. As a bonus, we do not emit redundant
discards e.g. between sequential logic insns.
target-i386: do not compute eflags multiple times consecutively
After calling gen_compute_eflags, leave the computed value in cc_reg_src
and set cc_op to CC_OP_EFLAGS. The next few patches will remove anyway
most calls to gen_compute_eflags.
As a result of this change it is more natural to remove the register
argument from gen_compute_eflags and change all the callers.
Paolo Bonzini [Fri, 5 Oct 2012 16:29:21 +0000 (18:29 +0200)]
target-i386: factor gen_op_set_cc_op/tcg_gen_discard_tl around computing flags
Before computing flags we need to store the cc_op to memory. Move this
to gen_compute_eflags_c and gen_compute_eflags rather than doing it all
over the place.
Alo, after computing the flags in cpu_cc_src we are in EFLAGS mode.
Set s->cc_op and discard cpu_cc_dst in gen_compute_eflags, rather than
doing it all over the place.
Always compute EFLAGS first since it is needed whenever
the shift is non-zero, i.e. most of the time. This makes it possible
to remove some writes of CC_OP_EFLAGS to cpu_cc_op and more importantly
removes cases where s->cc_op becomes CC_OP_DYNAMIC. Also, we can
remove cc_tmp and just modify cc_src from within the helper.
Finally, always follow gen_compute_eflags(cpu_cc_src) by setting s->cc_op
and discarding cpu_cc_dst.
Paolo Bonzini [Fri, 5 Oct 2012 22:18:55 +0000 (00:18 +0200)]
target-i386: move eflags computation closer to gen_op_set_cc_op
This ensures the invariant that cpu_cc_op matches s->cc_op when calling
the helpers. The next patches need this because gen_compute_eflags and
gen_compute_eflags_c will take care of setting cpu_cc_op.
Always compute EFLAGS first since it is needed whenever the shift is
non-zero, i.e. most of the time. This makes it possible to remove some
writes of CC_OP_EFLAGS to cpu_cc_op and more importantly removes cases
where s->cc_op becomes CC_OP_DYNAMIC. These are slow and we want to
avoid them: CC_OP_EFLAGS is quite efficient once we paid the initial
cost of computing the flags.
Finally, always follow gen_compute_eflags(cpu_cc_src) by setting s->cc_op
and discarding cpu_cc_dst.
Paolo Bonzini [Fri, 5 Oct 2012 22:18:55 +0000 (00:18 +0200)]
target-i386: move carry computation for inc/dec closer to gen_op_set_cc_op
This ensures the invariant that cpu_cc_op matches s->cc_op when calling
the helpers. The next patches need this because gen_compute_eflags and
gen_compute_eflags_c will take care of setting cpu_cc_op.
Paolo Bonzini [Fri, 5 Oct 2012 23:36:45 +0000 (01:36 +0200)]
target-i386: drop cc_op argument of gen_jcc1
As in the gen_repz_scas/gen_repz_cmps case, delay setting
CC_OP_DYNAMIC in gen_jcc until after code generation. All of
gen_jcc1/is_fast_jcc/gen_setcc_slow_T0 now work on s->cc_op, which makes
things a bit easier to follow and to patch.
Andre Przywara [Thu, 18 Oct 2012 09:16:58 +0000 (11:16 +0200)]
vnc-tls: Fix compilation with newer versions of GNU-TLS
In my installation of GNU-TLS (v3.0.23) the type
gnutls_anon_server_credentials is marked deprecated, so -Werror
breaks compilation.
Simply replacing it with the newer ..._t version fixed the compilation
on my machine (Slackware 14.0). I cannot tell how far back this "new"
type goes, at least the header file in RHEL 5.0 (v1.4.1) seems to have
it already. If someone finds a broken distribution, tell me and I
insert some compat code.
End tables before headings, start new ones afterwards. Fixes
incorrect indentation of headings "File system options" and "Virtual
File system pass-through options" in manual page and qemu-doc.
Normalize markup some to increase chances it survives future edits.
Andreas Färber [Sat, 16 Feb 2013 21:44:01 +0000 (22:44 +0100)]
libqtest: Convert macros to functions and clean up documentation
libqtest.h provides a number of shortcut macros to avoid tests feeding
it the QTestState they operate on. Most of these can easily be turned
into static inline functions, so let's do that for clarity.
This avoids getting off-by-one error messages when passing wrong args.
Some macros had a val argument but documented @value argument. Fix this.
While touching things, enforce gtk-doc markup for return values and for
referencing types.
Anthony Liguori [Mon, 18 Feb 2013 14:37:29 +0000 (08:37 -0600)]
Merge remote-tracking branch 'afaerber/qom-cpu' into staging
# By Andreas Färber
# Via Andreas Färber
* afaerber/qom-cpu: (47 commits)
target-i386: Split command line parsing out of cpu_x86_register()
target-i386: Move cpu_x86_init()
target-lm32: Drop unused cpu_lm32_close() prototype
target-s390x: Drop unused cpu_s390x_close() prototype
spapr_hcall: Replace open-coded CPU loop with qemu_get_cpu()
ppce500_spin: Replace open-coded CPU loop with qemu_get_cpu()
e500: Replace open-coded loop with qemu_get_cpu()
cpu: Add CPUArchState pointer to CPUState
cputlb: Pass CPUState to cpu_unlink_tb()
cpu: Move current_tb field to CPUState
cpu: Move exit_request field to CPUState
cpu: Move running field to CPUState
cpu: Move host_tid field to CPUState
target-cris: Introduce CRISCPU subclasses
target-m68k: Pass M68kCPU to m68k_set_irq_level()
mcf_intc: Pass M68kCPU to mcf_intc_init()
mcf5206: Pass M68kCPU to mcf5206_init()
target-m68k: Return M68kCPU from cpu_m68k_init()
ppc405_uc: Pass PowerPCCPU to ppc40x_{core,chip,system}_reset()
target-xtensa: Move TCG initialization to XtensaCPU initfn
...
The new formulation makes better use of add-with-carry type insns
that the host may have. Use gcc's sign adjustment trick to avoid
having to perform a 128-bit negation.
Replace some x86_64 specific inline assembly with something that
all 64-bit hosts ought to optimize well. At worst this becomes
a call to the gcc __multi3 routine, which is no worse than our
implementation in util/host-utils.c.
With gcc 4.7, we get identical code generation for x86_64. We
now get native multiplication on ia64 and s390x hosts. With minor
improvements to gcc we can get it for ppc64 as well.
Andreas Färber [Sat, 16 Feb 2013 22:21:24 +0000 (23:21 +0100)]
tcg/ppc: Fix build of tcg_qemu_tb_exec()
Commit 0b0d3320db74cde233ee7855ad32a9c121d20eb4 (TCG: Final globals
clean-up) moved code_gen_prologue but forgot to update ppc code.
This broke the build on 32-bit ppc. ppc64 is unaffected.
Andreas Färber [Fri, 15 Feb 2013 14:21:13 +0000 (15:21 +0100)]
e500: Replace open-coded loop with qemu_get_cpu()
Since we still need env for ppc-specific fields, obtain it via the new
env_ptr fields to avoid "cpu" name conflicts between CPUState and
PowerPCCPU for now.
This fixes a potential issue with env being NULL at the end of the loop
but cpu still being a valid pointer corresponding to a previous env.
Andreas Färber [Thu, 17 Jan 2013 11:13:41 +0000 (12:13 +0100)]
cpu: Add CPUArchState pointer to CPUState
The target-specific ENV_GET_CPU() macros have allowed us to navigate
from CPUArchState to CPUState. The reverse direction was not supported.
Avoid introducing CPU_GET_ENV() macros by initializing an untyped
pointer that is initialized in derived instance_init functions.
The field may not be called "env" due to it being poisoned.
Andreas Färber [Wed, 16 Jan 2013 18:29:31 +0000 (19:29 +0100)]
cpu: Move current_tb field to CPUState
Explictly NULL it on CPU reset since it was located before breakpoints.
Change vapic_report_tpr_access() argument to CPUState. This also
resolves the use of void* for cpu.h independence.
Change vAPIC patch_instruction() argument to X86CPU.
Andreas Färber [Sun, 20 Jan 2013 00:46:45 +0000 (01:46 +0100)]
target-xtensa: Move TCG initialization to XtensaCPU initfn
Combine this with breakpoint handler registration, guarding both with
tcg_enabled() to suppress also TCG init for qtest. Rename the handler to
xtensa_breakpoint_handler() since it needs to become global.
Andreas Färber [Sat, 19 Jan 2013 22:55:42 +0000 (23:55 +0100)]
target-cris: Move TCG initialization to CRISCPU initfn
Split out TCG initialization from cpu_cris_init(). Avoid CPUCRISState
dependency for v10-specific initialization and for non-v10 by inlining
the decision into the initfn as well.