Commit | Line | Data |
---|---|---|
78e87797 PB |
1 | @node Implementation notes |
2 | @appendix Implementation notes | |
debc7065 FB |
3 | |
4 | @menu | |
77d47e16 PB |
5 | * CPU emulation:: |
6 | * Translator Internals:: | |
77d47e16 | 7 | * QEMU compared to other emulators:: |
047f7038 | 8 | * Managed start up options:: |
77d47e16 | 9 | * Bibliography:: |
debc7065 | 10 | @end menu |
debc7065 | 11 | |
77d47e16 | 12 | @node CPU emulation |
78e87797 | 13 | @section CPU emulation |
1f673135 | 14 | |
debc7065 | 15 | @menu |
77d47e16 PB |
16 | * x86:: x86 and x86-64 emulation |
17 | * ARM:: ARM emulation | |
18 | * MIPS:: MIPS emulation | |
19 | * PPC:: PowerPC emulation | |
20 | * SPARC:: Sparc32 and Sparc64 emulation | |
21 | * Xtensa:: Xtensa emulation | |
debc7065 FB |
22 | @end menu |
23 | ||
77d47e16 | 24 | @node x86 |
78e87797 | 25 | @subsection x86 and x86-64 emulation |
1f673135 FB |
26 | |
27 | QEMU x86 target features: | |
28 | ||
5fafdf24 | 29 | @itemize |
1f673135 | 30 | |
5fafdf24 | 31 | @item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation. |
998a0501 BS |
32 | LDT/GDT and IDT are emulated. VM86 mode is also supported to run |
33 | DOSEMU. There is some support for MMX/3DNow!, SSE, SSE2, SSE3, SSSE3, | |
34 | and SSE4 as well as x86-64 SVM. | |
1f673135 FB |
35 | |
36 | @item Support for host page sizes bigger than 4KB in user mode emulation. | |
37 | ||
38 | @item QEMU can emulate itself on x86. | |
39 | ||
5fafdf24 | 40 | @item An extensive Linux x86 CPU test program is included as @file{tests/test-i386}. |
1f673135 FB |
41 | It can be used to test other x86 virtual CPUs. |
42 | ||
43 | @end itemize | |
44 | ||
45 | Current QEMU limitations: | |
46 | ||
5fafdf24 | 47 | @itemize |
1f673135 | 48 | |
998a0501 | 49 | @item Limited x86-64 support. |
1f673135 FB |
50 | |
51 | @item IPC syscalls are missing. | |
52 | ||
5fafdf24 | 53 | @item The x86 segment limits and access rights are not tested at every |
1f673135 FB |
54 | memory access (yet). Fortunately, very few OSes seem to rely on that for |
55 | normal use. | |
56 | ||
1f673135 FB |
57 | @end itemize |
58 | ||
77d47e16 | 59 | @node ARM |
78e87797 | 60 | @subsection ARM emulation |
1f673135 FB |
61 | |
62 | @itemize | |
63 | ||
64 | @item Full ARM 7 user emulation. | |
65 | ||
66 | @item NWFPE FPU support included in user Linux emulation. | |
67 | ||
68 | @item Can run most ARM Linux binaries. | |
69 | ||
70 | @end itemize | |
71 | ||
77d47e16 | 72 | @node MIPS |
78e87797 | 73 | @subsection MIPS emulation |
24d4de45 TS |
74 | |
75 | @itemize | |
76 | ||
77 | @item The system emulation allows full MIPS32/MIPS64 Release 2 emulation, | |
78 | including privileged instructions, FPU and MMU, in both little and big | |
79 | endian modes. | |
80 | ||
81 | @item The Linux userland emulation can run many 32 bit MIPS Linux binaries. | |
82 | ||
83 | @end itemize | |
84 | ||
85 | Current QEMU limitations: | |
86 | ||
87 | @itemize | |
88 | ||
89 | @item Self-modifying code is not always handled correctly. | |
90 | ||
91 | @item 64 bit userland emulation is not implemented. | |
92 | ||
93 | @item The system emulation is not complete enough to run real firmware. | |
94 | ||
b1f45238 TS |
95 | @item The watchpoint debug facility is not implemented. |
96 | ||
24d4de45 TS |
97 | @end itemize |
98 | ||
77d47e16 | 99 | @node PPC |
78e87797 | 100 | @subsection PowerPC emulation |
1f673135 FB |
101 | |
102 | @itemize | |
103 | ||
5fafdf24 | 104 | @item Full PowerPC 32 bit emulation, including privileged instructions, |
1f673135 FB |
105 | FPU and MMU. |
106 | ||
107 | @item Can run most PowerPC Linux binaries. | |
108 | ||
109 | @end itemize | |
110 | ||
77d47e16 | 111 | @node SPARC |
78e87797 | 112 | @subsection Sparc32 and Sparc64 emulation |
1f673135 FB |
113 | |
114 | @itemize | |
115 | ||
f6b647cd | 116 | @item Full SPARC V8 emulation, including privileged |
3475187d | 117 | instructions, FPU and MMU. SPARC V9 emulation includes most privileged |
a785e42e | 118 | and VIS instructions, FPU and I/D MMU. Alignment is fully enforced. |
1f673135 | 119 | |
a785e42e BS |
120 | @item Can run most 32-bit SPARC Linux binaries, SPARC32PLUS Linux binaries and |
121 | some 64-bit SPARC Linux binaries. | |
3475187d FB |
122 | |
123 | @end itemize | |
124 | ||
125 | Current QEMU limitations: | |
126 | ||
5fafdf24 | 127 | @itemize |
3475187d | 128 | |
3475187d FB |
129 | @item IPC syscalls are missing. |
130 | ||
1f587329 | 131 | @item Floating point exception support is buggy. |
3475187d FB |
132 | |
133 | @item Atomic instructions are not correctly implemented. | |
134 | ||
998a0501 BS |
135 | @item There are still some problems with Sparc64 emulators. |
136 | ||
137 | @end itemize | |
138 | ||
77d47e16 | 139 | @node Xtensa |
78e87797 | 140 | @subsection Xtensa emulation |
3aeaea65 MF |
141 | |
142 | @itemize | |
143 | ||
144 | @item Core Xtensa ISA emulation, including most options: code density, | |
145 | loop, extended L32R, 16- and 32-bit multiplication, 32-bit division, | |
044d003d MF |
146 | MAC16, miscellaneous operations, boolean, FP coprocessor, coprocessor |
147 | context, debug, multiprocessor synchronization, | |
3aeaea65 MF |
148 | conditional store, exceptions, relocatable vectors, unaligned exception, |
149 | interrupts (including high priority and timer), hardware alignment, | |
150 | region protection, region translation, MMU, windowed registers, thread | |
151 | pointer, processor ID. | |
152 | ||
044d003d MF |
153 | @item Options not implemented: data/instruction cache (including cache |
154 | prefetch and locking), XLMI, processor interface. Also options not | |
155 | covered by the core ISA (e.g. FLIX, wide branches) are not implemented. | |
3aeaea65 MF |
156 | |
157 | @item Can run most Xtensa Linux binaries. | |
158 | ||
159 | @item A new core configuration that requires no additional instructions | |
160 | may be created from an overlay with a minimal amount of hand-written code. | |
161 | ||
162 | @end itemize | |
163 | ||
77d47e16 | 164 | @node Translator Internals |
78e87797 | 165 | @section Translator Internals |
1f673135 | 166 | |
1f673135 FB |
167 | QEMU is a dynamic translator. When it first encounters a piece of code, |
168 | it converts it to the host instruction set. Usually dynamic translators | |
169 | are very complicated and highly CPU dependent. QEMU uses some tricks | |
170 | which make it relatively portable and simple while achieving good | |
171 | performance. | |
172 | ||
bf28a69e PB |
173 | QEMU's dynamic translation backend is called TCG, for "Tiny Code |
174 | Generator". For more information, please take a look at @code{tcg/README}. | |
1f673135 | 175 | |
36e4970e | 176 | Some notable features of QEMU's dynamic translator are: |
1f673135 | 177 | |
36e4970e PB |
178 | @table @strong |
179 | ||
180 | @item CPU state optimisations: | |
998a0501 BS |
181 | The target CPUs have many internal states which change the way they |
182 | evaluate instructions. In order to achieve good speed, the | |
183 | translation phase assumes that some state information of the virtual | |
184 | CPU cannot change within a translated block. The state is recorded in the Translation | |
185 | Block (TB). If the state changes (e.g. privilege level), a new TB will | |
186 | be generated and the previous TB won't be used anymore until the state | |
36e4970e PB |
187 | matches the state recorded in the previous TB. The same idea can be applied |
188 | to other aspects of the CPU state. For example, on x86, if the SS, | |
998a0501 BS |
189 | DS and ES segments have a zero base, then the translator does not even |
190 | generate an addition for the segment base. | |
1f673135 | 191 | |
36e4970e | 192 | @item Direct block chaining: |
1f673135 | 193 | After each translated basic block is executed, QEMU uses the simulated |
d274e07c | 194 | Program Counter (PC) and other CPU state information (such as the CS |
1f673135 FB |
195 | segment base value) to find the next basic block. |
196 | ||
197 | In order to accelerate the most common cases where the new simulated PC | |
198 | is known, QEMU can patch a basic block so that it jumps directly to the | |
199 | next one. | |
200 | ||
201 | The most portable code uses an indirect jump. An indirect jump makes | |
202 | it easier to make the jump target modification atomic. On some host | |
203 | architectures (such as x86 or PowerPC), the @code{JUMP} opcode is | |
204 | directly patched so that the block chaining has no overhead. | |
205 | ||
36e4970e | 206 | @item Self-modifying code and translated code invalidation: |
1f673135 FB |
207 | Self-modifying code is a special challenge in x86 emulation because no |
208 | instruction cache invalidation is signaled by the application when code | |
209 | is modified. | |
210 | ||
36e4970e PB |
211 | User-mode emulation marks a host page as write-protected (if it is |
212 | not already read-only) every time translated code is generated for a | |
213 | basic block. Then, if a write access is done to the page, Linux raises | |
214 | a SIGSEGV signal. QEMU then invalidates all the translated code in the page | |
215 | and enables write accesses to the page. For system emulation, write | |
216 | protection is achieved through the software MMU. | |
1f673135 FB |
217 | |
218 | Correct translated code invalidation is done efficiently by maintaining | |
219 | a linked list of every translated block contained in a given page. Other | |
5fafdf24 | 220 | linked lists are also maintained to undo direct block chaining. |
1f673135 | 221 | |
998a0501 BS |
222 | On RISC targets, correctly written software uses memory barriers and |
223 | cache flushes, so some of the protection above would not be | |
224 | necessary. However, QEMU still requires that the generated code always | |
225 | matches the target instructions in memory in order to handle | |
226 | exceptions correctly. | |
1f673135 | 227 | |
36e4970e | 228 | @item Exception support: |
1f673135 | 229 | longjmp() is used when an exception such as division by zero is |
5fafdf24 | 230 | encountered. |
1f673135 FB |
231 | |
232 | The host SIGSEGV and SIGBUS signal handlers are used to catch invalid | |
36e4970e PB |
233 | memory accesses. QEMU keeps a map from host program counter to |
234 | target program counter, and looks up where the exception happened | |
235 | based on the host program counter at the exception point. | |
236 | ||
237 | On some targets, some bits of the virtual CPU's state are not flushed to the | |
238 | memory until the end of the translation block. This is done for internal | |
239 | emulation state that is rarely accessed directly by the program and/or changes | |
240 | very often throughout the execution of a translation block---this includes | |
241 | condition codes on x86, delay slots on SPARC, conditional execution on | |
242 | ARM, and so on. This state is stored for each target instruction, and | |
243 | looked up on exceptions. | |
244 | ||
245 | @item MMU emulation: | |
246 | For system emulation QEMU uses a software MMU. In that mode, the MMU | |
998a0501 | 247 | virtual to physical address translation is done at every memory |
36e4970e | 248 | access. |
1f673135 | 249 | |
36e4970e | 250 | QEMU uses an address translation cache (TLB) to speed up the translation. |
1f673135 | 251 | In order to avoid flushing the translated code each time the MMU |
36e4970e | 252 | mappings change, all caches in QEMU are physically indexed. This |
5fafdf24 | 253 | means that each basic block is indexed with its physical address. |
1f673135 | 254 | |
36e4970e PB |
255 | In order to avoid invalidating the basic block chain when MMU mappings |
256 | change, chaining is only performed when the destination of the jump | |
257 | shares a page with the basic block that is performing the jump. | |
258 | ||
259 | The MMU can also distinguish RAM and ROM memory areas from MMIO memory | |
260 | areas. Access is faster for RAM and ROM because the translation cache also | |
261 | hosts the offset between guest address and host memory. Accessing MMIO | |
262 | memory areas instead calls out to C code for device emulation. | |
263 | Finally, the MMU helps track dirty pages and pages pointed to by | |
264 | translation blocks. | |
265 | @end table | |
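The CPU state optimisation and the physically indexed caches described above can be pictured as a lookup table keyed on more than the program counter. The following is a minimal sketch in Python, not QEMU's actual data structures: the names @code{TBCache}, @code{TranslationBlock} and @code{fake_translate}, and the choice of key fields, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical key layout for the sketch: (physical PC, CS base, state flags).
TBKey = Tuple[int, int, int]


@dataclass
class TranslationBlock:
    key: TBKey
    host_code: str  # stand-in for the generated host instructions


@dataclass
class TBCache:
    translate: Callable[[TBKey], str]
    blocks: Dict[TBKey, TranslationBlock] = field(default_factory=dict)
    translations: int = 0

    def lookup(self, phys_pc: int, cs_base: int, flags: int) -> TranslationBlock:
        """Return the TB recorded for this exact state, translating on a miss."""
        key = (phys_pc, cs_base, flags)
        tb = self.blocks.get(key)
        if tb is None:
            self.translations += 1
            tb = TranslationBlock(key, self.translate(key))
            self.blocks[key] = tb
        return tb


def fake_translate(key: TBKey) -> str:
    # Illustrative only: a real translator would emit host machine code.
    phys_pc, cs_base, flags = key
    return f"code@{phys_pc:#x}/cs={cs_base:#x}/flags={flags:#x}"


cache = TBCache(fake_translate)
user_tb = cache.lookup(0x1000, 0, flags=3)    # e.g. privilege level 3
kernel_tb = cache.lookup(0x1000, 0, flags=0)  # same PC, different state
again = cache.lookup(0x1000, 0, flags=3)      # state matches again: TB reused
```

Because the state is part of the key, a state change leads to a second translation of the same code, while the earlier block stays available for when the state matches again.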
998a0501 | 266 | |
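The user-mode invalidation scheme for self-modifying code described above (write-protect the page, invalidate every block in it on a write fault, undo direct chaining) can be modelled as follows. This is a simplified sketch under stated assumptions: the class @code{CodePages}, its method names and the 4096-byte page size are hypothetical, and sets and dictionaries stand in for the linked lists mentioned in the text.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size for the sketch


class CodePages:
    """Tracks which translated blocks were generated from which page."""

    def __init__(self):
        self.blocks_by_page = defaultdict(set)  # page number -> block ids
        self.protected = set()                  # pages modelled as write-protected
        self.chained_from = defaultdict(set)    # block id -> blocks that jump to it
        self.jump_target = {}                   # block id -> chained successor

    def add_block(self, block_id, guest_addr):
        page = guest_addr // PAGE_SIZE
        self.blocks_by_page[page].add(block_id)
        self.protected.add(page)  # models write-protecting the host page

    def chain(self, src, dst):
        """Record a direct jump patched from block src to block dst."""
        self.jump_target[src] = dst
        self.chained_from[dst].add(src)

    def on_write(self, guest_addr):
        """Model the write fault: drop every block in the page, then unprotect."""
        page = guest_addr // PAGE_SIZE
        if page not in self.protected:
            return set()
        dropped = self.blocks_by_page.pop(page, set())
        for block_id in dropped:
            # Undo chaining so that no surviving block jumps into stale code.
            for src in self.chained_from.pop(block_id, set()):
                self.jump_target.pop(src, None)
            # Forget the dropped block's own outgoing jump.
            dst = self.jump_target.pop(block_id, None)
            if dst is not None:
                self.chained_from[dst].discard(block_id)
        self.protected.discard(page)
        return dropped


pages = CodePages()
pages.add_block("A", 0x1000)
pages.add_block("B", 0x1800)  # same page as A
pages.add_block("C", 0x2000)  # next page
pages.chain("C", "A")         # C jumps directly into A
dropped = pages.on_write(0x1234)
```

After the write, blocks A and B are gone, C no longer jumps into A, and a second write to the same page triggers no further work because the page is no longer protected.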
77d47e16 | 267 | @node QEMU compared to other emulators |
78e87797 | 268 | @section QEMU compared to other emulators |
77d47e16 PB |
269 | |
270 | Like Bochs [1], QEMU emulates an x86 CPU. But QEMU is much faster than | |
271 | Bochs as it uses dynamic compilation. Bochs is closely tied to x86 PC | |
272 | emulation while QEMU can emulate several processors. | |
273 | ||
274 | Like Valgrind [2], QEMU does user space emulation and dynamic | |
275 | translation. Valgrind is mainly a memory debugger, a role QEMU does not | |
276 | support (QEMU could be used to detect out-of-bounds memory | |
277 | accesses as Valgrind does, but it cannot track uninitialised data | |
278 | as Valgrind can). The Valgrind dynamic translator generates better code | |
279 | than QEMU (in particular it does register allocation) but it is closely | |
280 | tied to an x86 host and target and has no support for precise exceptions | |
281 | and system emulation. | |
282 | ||
283 | EM86 [3] is the closest project to user space QEMU (and QEMU still uses | |
284 | some of its code, in particular the ELF file loader). EM86 was limited | |
285 | to an alpha host and used a proprietary and slow interpreter (the | |
286 | interpreter part of the FX!32 Digital Win32 code translator [4]). | |
287 | ||
288 | TWIN from Willows Software was a Windows API emulator like Wine. It is less | |
289 | accurate than Wine but includes a protected mode x86 interpreter to launch | |
290 | x86 Windows executables. Such an approach has greater potential because most | |
291 | of the Windows API is executed natively but it is far more difficult to | |
292 | develop because all the data structures and function parameters exchanged | |
293 | between the API and the x86 code must be converted. | |
294 | ||
295 | User mode Linux [5] was the only solution before QEMU to launch a | |
296 | Linux kernel as a process while not needing any host kernel | |
297 | patches. However, user mode Linux requires heavy kernel patches while | |
298 | QEMU accepts unpatched Linux kernels. The price to pay is that QEMU is | |
299 | slower. | |
300 | ||
301 | The Plex86 [6] PC virtualizer is done in the same spirit as the now | |
302 | obsolete qemu-fast system emulator. It requires a patched Linux kernel | |
303 | to work (you cannot launch the same kernel on your PC), but the | |
304 | patches are really small. As it is a PC virtualizer (no emulation is | |
305 | done except for some privileged instructions), it has the potential of | |
306 | being faster than QEMU. The downside is that a complicated (and | |
307 | potentially unsafe) host kernel patch is needed. | |
308 | ||
309 | The commercial PC Virtualizers (VMWare [7], VirtualPC [8]) are faster | |
310 | than QEMU (without virtualization), but they all need specific, proprietary | |
311 | and potentially unsafe host drivers. Moreover, they are unable to | |
312 | provide cycle exact simulation as an emulator can. | |
313 | ||
314 | VirtualBox [9], Xen [10] and KVM [11] are based on QEMU. QEMU-SystemC | |
315 | [12] uses QEMU to simulate a system where some hardware devices are | |
316 | developed in SystemC. | |
317 | ||
047f7038 IM |
318 | @node Managed start up options |
319 | @section Managed start up options | |
320 | ||
321 | In system mode emulation, it's possible to create a VM in a paused state using | |
322 | the -S command line option. In this state the machine is completely initialized | |
323 | according to command line options and ready to execute VM code but VCPU threads | |
324 | are not executing any code. The VM state in this paused state depends on the way | |
325 | QEMU was started. It could be in: | |
326 | @table @asis | |
327 | @item initial state (after reset/power on state) | |
328 | @item with direct kernel loading, the initial state could be amended to execute | |
329 | code loaded by QEMU in the VM's RAM and with incoming migration | |
330 | @item with incoming migration, the initial state will be amended with the migrated | |
331 | machine state after migration completes. | |
332 | @end table | |
333 | ||
334 | This paused state is typically used by users to query machine state and/or | |
335 | additionally configure the machine (by hotplugging devices) at runtime before | |
336 | allowing VM code to run. | |
337 | ||
338 | However, at the -S pause point, it's impossible to configure options that affect | |
339 | initial VM creation (like: -smp/-m/-numa ...) or cold plug devices. That's | |
340 | when the --preconfig command line option should be used. It allows pausing QEMU | |
341 | before the initial VM creation, in a new preconfig state, where additional | |
342 | queries and configuration can be performed via QMP before moving on to | |
343 | the resulting configuration startup. In the preconfig state, QEMU only allows | |
344 | a limited set of commands over the QMP monitor, where the commands do not | |
345 | depend on an initialized machine, including but not limited to: | |
346 | @table @asis | |
347 | @item qmp_capabilities | |
348 | @item query-qmp-schema | |
349 | @item query-commands | |
350 | @item query-status | |
351 | @item exit-preconfig | |
352 | @end table | |
353 | The full list of commands is in the QMP schema, which can be queried with | |
354 | query-qmp-schema; commands supported in the preconfig state have the option | |
355 | 'allow-preconfig' set to true. | |
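As a rough illustration of a preconfig session, the sketch below builds the QMP messages a client might send, in order. It only constructs JSON text and does not talk to a running QEMU. The helper names @code{qmp_command} and @code{preconfig_session} are hypothetical, and the command names are taken from the list above; the exact names accepted may differ between QEMU versions.

```python
import json


def qmp_command(name):
    """Encode an argument-free QMP command as one JSON line."""
    return json.dumps({"execute": name})


def preconfig_session(config_commands=()):
    """Return the ordered QMP lines for a session started with --preconfig.

    config_commands is a placeholder for any extra preconfig-safe commands
    the caller wants to run before leaving the preconfig state.
    """
    names = ["qmp_capabilities", "query-status", *config_commands, "exit-preconfig"]
    return [qmp_command(n) for n in names]


lines = preconfig_session()
for line in lines:
    print(line)
```

The capabilities negotiation comes first, and the command that leaves the preconfig state comes last, so any configuration commands sit between the two.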
356 | ||
debc7065 | 357 | @node Bibliography |
78e87797 | 358 | @section Bibliography |
1f673135 FB |
359 | |
360 | @table @asis | |
361 | ||
5fafdf24 | 362 | @item [1] |
8e9620a6 TH |
363 | @url{http://bochs.sourceforge.net/}, the Bochs IA-32 Emulator Project, |
364 | by Kevin Lawton et al. | |
1f673135 FB |
365 | |
366 | @item [2] | |
8e9620a6 TH |
367 | @url{http://www.valgrind.org/}, Valgrind, an open-source memory debugger |
368 | for GNU/Linux. | |
1f673135 FB |
369 | |
370 | @item [3] | |
8e9620a6 TH |
371 | @url{http://ftp.dreamtime.org/pub/linux/Linux-Alpha/em86/v0.2/docs/em86.html}, |
372 | the EM86 x86 emulator on Alpha-Linux. | |
1f673135 FB |
373 | |
374 | @item [4] | |
debc7065 | 375 | @url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/@/full_papers/chernoff/chernoff.pdf}, |
1f673135 FB |
376 | DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton |
377 | Chernoff and Ray Hookway. | |
378 | ||
8e9620a6 | 379 | @item [5] |
5fafdf24 | 380 | @url{http://user-mode-linux.sourceforge.net/}, |
1f673135 FB |
381 | The User-mode Linux Kernel. |
382 | ||
8e9620a6 | 383 | @item [6] |
5fafdf24 | 384 | @url{http://www.plex86.org/}, |
1f673135 FB |
385 | The new Plex86 project. |
386 | ||
8e9620a6 | 387 | @item [7] |
5fafdf24 | 388 | @url{http://www.vmware.com/}, |
1f673135 FB |
389 | The VMWare PC virtualizer. |
390 | ||
8e9620a6 TH |
391 | @item [8] |
392 | @url{https://www.microsoft.com/download/details.aspx?id=3702}, | |
1f673135 FB |
393 | The VirtualPC PC virtualizer. |
394 | ||
8e9620a6 | 395 | @item [9] |
998a0501 BS |
396 | @url{http://virtualbox.org/}, |
397 | The VirtualBox PC virtualizer. | |
398 | ||
8e9620a6 | 399 | @item [10] |
998a0501 BS |
400 | @url{http://www.xen.org/}, |
401 | The Xen hypervisor. | |
402 | ||
8e9620a6 TH |
403 | @item [11] |
404 | @url{http://www.linux-kvm.org/}, | |
998a0501 BS |
405 | Kernel Based Virtual Machine (KVM). |
406 | ||
8e9620a6 | 407 | @item [12] |
998a0501 BS |
408 | @url{http://www.greensocs.com/projects/QEMUSystemC}, |
409 | QEMU-SystemC, a hardware co-simulator. | |
410 | ||
1f673135 | 411 | @end table |