]>
Commit | Line | Data |
---|---|---|
d73abd6d PD |
1 | Copyright (c) 2010-2015 Institute for System Programming |
2 | of the Russian Academy of Sciences. | |
3 | ||
4 | This work is licensed under the terms of the GNU GPL, version 2 or later. | |
5 | See the COPYING file in the top-level directory. | |
6 | ||
7 | Record/replay | |
8 | ------------- | |
9 | ||
10 | Record/replay functions are used for the reverse execution and deterministic | |
11 | replay of qemu execution. This implementation of deterministic replay can | |
12 | be used for deterministic debugging of guest code through a gdb remote | |
13 | interface. | |
14 | ||
15 | Execution recording writes a non-deterministic events log, which can be later | |
16 | used for replaying the execution anywhere and for unlimited number of times. | |
17 | It also supports checkpointing for faster rewinding during reverse debugging. | |
18 | Execution replaying reads the log and replays all non-deterministic events | |
19 | including external input, hardware clocks, and interrupts. | |
20 | ||
21 | Deterministic replay has the following features: | |
22 | * Deterministically replays whole system execution and all contents of | |
23 | the memory, state of the hardware devices, clocks, and screen of the VM. | |
24 | * Writes execution log into the file for later replaying for multiple times | |
25 | on different machines. | |
26 | * Supports i386, x86_64, and ARM hardware platforms. | |
27 | * Performs deterministic replay of all operations with keyboard and mouse | |
28 | input devices. | |
29 | ||
30 | Usage of the record/replay: | |
31 | * First, record the execution, by adding the following arguments to the command line: | |
32 | '-icount shift=7,rr=record,rrfile=replay.bin -net none'. | |
33 | Block devices' images are not actually changed in the recording mode, | |
34 | because all of the changes are written to the temporary overlay file. | |
35 | * Then you can replay it by using another command | |
36 | line option: '-icount shift=7,rr=replay,rrfile=replay.bin -net none' | |
37 | * '-net none' option should also be specified if network replay patches | |
38 | are not applied. | |
39 | ||
40 | Papers with description of deterministic replay implementation: | |
41 | http://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html | |
42 | http://dl.acm.org/citation.cfm?id=2786805.2803179 | |
43 | ||
44 | Modifications of qemu include: | |
45 | * wrappers for clock and time functions to save their return values in the log | |
46 | * saving different asynchronous events (e.g. system shutdown) into the log | |
47 | * synchronization of the bottom halves execution | |
48 | * synchronization of the threads from thread pool | |
49 | * recording/replaying user input (mouse and keyboard) | |
50 | * adding internal checkpoints for cpu and io synchronization | |
51 | ||
52 | Non-deterministic events | |
53 | ------------------------ | |
54 | ||
55 | Our record/replay system is based on saving and replaying non-deterministic | |
56 | events (e.g. keyboard input) and simulating deterministic ones (e.g. reading | |
57 | from HDD or memory of the VM). Saving only non-deterministic events makes | |
58 | log file smaller, simulation faster, and allows using reverse debugging even | |
59 | for realtime applications. | |
60 | ||
61 | The following non-deterministic data from peripheral devices is saved into | |
62 | the log: mouse and keyboard input, network packets, audio controller input, | |
63 | USB packets, serial port input, and hardware clocks (they are non-deterministic | |
64 | too, because their values are taken from the host machine). Inputs from | |
65 | simulated hardware, memory of VM, software interrupts, and execution of | |
66 | instructions are not saved into the log, because they are deterministic and | |
67 | can be replayed by simulating the behavior of virtual machine starting from | |
68 | initial state. | |
69 | ||
70 | We had to solve three tasks to implement deterministic replay: recording | |
71 | non-deterministic events, replaying non-deterministic events, and checking | |
72 | that there is no divergence between record and replay modes. | |
73 | ||
74 | We changed several parts of QEMU to make event log recording and replaying. | |
75 | Devices' models that have non-deterministic input from external devices were | |
76 | changed to write every external event into the execution log immediately. | |
77 | E.g. network packets are written into the log when they arrive into the virtual | |
78 | network adapter. | |
79 | ||
80 | All non-deterministic events are coming from these devices. But to | |
81 | replay them we need to know at which moments they occur. We specify | |
82 | these moments by counting the number of instructions executed between | |
83 | every pair of consecutive events. | |
84 | ||
85 | Instruction counting | |
86 | -------------------- | |
87 | ||
88 | QEMU should work in icount mode to use record/replay feature. icount was | |
89 | designed to allow deterministic execution in absence of external inputs | |
90 | of the virtual machine. We also use icount to control the occurrence of the | |
91 | non-deterministic events. The number of instructions elapsed from the last event | |
92 | is written to the log while recording the execution. In replay mode we | |
93 | can predict when to inject that event using the instruction counter. | |
94 | ||
95 | Timers | |
96 | ------ | |
97 | ||
98 | Timers are used to execute callbacks from different subsystems of QEMU | |
99 | at the specified moments of time. There are several kinds of timers: | |
100 | * Real time clock. Based on host time and used only for callbacks that | |
101 | do not change the virtual machine state. For this reason real time | |
102 | clock and timers does not affect deterministic replay at all. | |
103 | * Virtual clock. These timers run only during the emulation. In icount | |
104 | mode virtual clock value is calculated using executed instructions counter. | |
105 | That is why it is completely deterministic and does not have to be recorded. | |
106 | * Host clock. This clock is used by device models that simulate real time | |
107 | sources (e.g. real time clock chip). Host clock is the one of the sources | |
108 | of non-determinism. Host clock read operations should be logged to | |
109 | make the execution deterministic. | |
110 | * Real time clock for icount. This clock is similar to real time clock but | |
111 | it is used only for increasing virtual clock while virtual machine is | |
112 | sleeping. Due to its nature it is also non-deterministic as the host clock | |
113 | and has to be logged too. | |
114 | ||
115 | Checkpoints | |
116 | ----------- | |
117 | ||
118 | Replaying of the execution of virtual machine is bound by sources of | |
119 | non-determinism. These are inputs from clock and peripheral devices, | |
120 | and QEMU thread scheduling. Thread scheduling affect on processing events | |
121 | from timers, asynchronous input-output, and bottom halves. | |
122 | ||
123 | Invocations of timers are coupled with clock reads and changing the state | |
124 | of the virtual machine. Reads produce non-deterministic data taken from | |
125 | host clock. And VM state changes should preserve their order. Their relative | |
126 | order in replay mode must replicate the order of callbacks in record mode. | |
127 | To preserve this order we use checkpoints. When a specific clock is processed | |
128 | in record mode we save to the log special "checkpoint" event. | |
129 | Checkpoints here do not refer to virtual machine snapshots. They are just | |
130 | record/replay events used for synchronization. | |
131 | ||
132 | QEMU in replay mode will try to invoke timers processing in random moment | |
133 | of time. That's why we do not process a group of timers until the checkpoint | |
134 | event will be read from the log. Such an event allows synchronizing CPU | |
135 | execution and timer events. | |
136 | ||
137 | Another checkpoints application in record/replay is instruction counting | |
138 | while the virtual machine is idle. This function (qemu_clock_warp) is called | |
139 | from the wait loop. It changes virtual machine state and must be deterministic | |
140 | then. That is why we added checkpoint to this function to prevent its | |
141 | operation in replay mode when it does not correspond to record mode. | |
142 | ||
143 | Bottom halves | |
144 | ------------- | |
145 | ||
146 | Disk I/O events are completely deterministic in our model, because | |
147 | in both record and replay modes we start virtual machine from the same | |
148 | disk state. But callbacks that virtual disk controller uses for reading and | |
149 | writing the disk may occur at different moments of time in record and replay | |
150 | modes. | |
151 | ||
152 | Reading and writing requests are created by CPU thread of QEMU. Later these | |
153 | requests proceed to block layer which creates "bottom halves". Bottom | |
154 | halves consist of callback and its parameters. They are processed when | |
155 | main loop locks the global mutex. These locks are not synchronized with | |
156 | replaying process because main loop also processes the events that do not | |
157 | affect the virtual machine state (like user interaction with monitor). | |
158 | ||
159 | That is why we had to implement saving and replaying bottom halves callbacks | |
160 | synchronously to the CPU execution. When the callback is about to execute | |
161 | it is added to the queue in the replay module. This queue is written to the | |
162 | log when its callbacks are executed. In replay mode callbacks are not processed | |
163 | until the corresponding event is read from the events log file. | |
164 | ||
165 | Sometimes the block layer uses asynchronous callbacks for its internal purposes | |
166 | (like reading or writing VM snapshots or disk image cluster tables). In this | |
167 | case bottom halves are not marked as "replayable" and do not saved | |
168 | into the log. |