]>
Commit | Line | Data |
---|---|---|
dce26c7d HS |
1 | .. SPDX-License-Identifier: GPL-2.0+ |
2 | .. Copyright (c) 2013 The Chromium OS Authors. | |
b2e16a85 SG |
3 | |
4 | Tracing in U-Boot | |
5 | ================= | |
6 | ||
3f80064f | 7 | U-Boot supports a simple tracing feature which allows a record of execution |
b2e16a85 SG |
8 | to be collected and sent to a host machine for analysis. At present the |
9 | main use for this is to profile boot time. | |
10 | ||
11 | ||
12 | Overview | |
13 | -------- | |
14 | ||
15 | The trace feature uses GCC's instrument-functions feature to trace all | |
16 | function entry/exit points. These are then recorded in a memory buffer. | |
17 | The memory buffer can be saved to the host over a network link using | |
e46b7243 | 18 | tftpput or by writing to an attached storage device such as MMC. |
b2e16a85 SG |
19 | |
20 | On the host, the file is first converted with a tool called 'proftool', | |
21 | which extracts useful information from it. The resulting trace output | |
22 | resembles that emitted by Linux's ftrace feature, so can be visually | |
e46b7243 SG |
23 | displayed by kernelshark (see kernelshark_) and used with |
24 | 'trace-cmd report' (see trace_cmd_). | |
25 | ||
26 | It is also possible to produce a flame graph for use with flamegraph.pl | |
27 | (see flamegraph_pl_). | |
b2e16a85 SG |
28 | |
29 | ||
30 | Quick-start using Sandbox | |
31 | ------------------------- | |
32 | ||
33 | Sandbox is a build of U-Boot that can run under Linux so it is a convenient | |
34 | way of trying out tracing before you use it on your actual board. To do | |
35 | this, follow these steps: | |
36 | ||
e46b7243 | 37 | Add the following to `config/sandbox_defconfig`: |
b2e16a85 | 38 | |
dce26c7d HS |
39 | .. code-block:: c |
40 | ||
89050244 | 41 | CONFIG_TRACE=y |
b2e16a85 SG |
42 | |
43 | Build sandbox U-Boot with tracing enabled: | |
44 | ||
dce26c7d HS |
45 | .. code-block:: console |
46 | ||
47 | $ make FTRACE=1 O=sandbox sandbox_config | |
48 | $ make FTRACE=1 O=sandbox | |
b2e16a85 SG |
49 | |
50 | Run sandbox, wait for a bit of trace information to appear, and then capture | |
51 | a trace: | |
52 | ||
dce26c7d HS |
53 | .. code-block:: console |
54 | ||
55 | $ ./sandbox/u-boot | |
56 | ||
57 | U-Boot 2013.04-rc2-00100-ga72fcef (Apr 17 2013 - 19:25:24) | |
58 | ||
59 | DRAM: 128 MiB | |
60 | trace: enabled | |
61 | Using default environment | |
62 | ||
63 | In: serial | |
64 | Out: serial | |
65 | Err: serial | |
66 | =>trace stats | |
67 | 671,406 function sites | |
68 | 69,712 function calls | |
69 | 0 untracked function calls | |
70 | 73,373 traced function calls | |
71 | 16 maximum observed call depth | |
72 | 15 call depth limit | |
73 | 66,491 calls not traced due to depth | |
74 | =>trace stats | |
75 | 671,406 function sites | |
76 | 1,279,450 function calls | |
77 | 0 untracked function calls | |
78 | 950,490 traced function calls (333217 dropped due to overflow) | |
79 | 16 maximum observed call depth | |
80 | 15 call depth limit | |
81 | 1,275,767 calls not traced due to depth | |
e46b7243 | 82 | =>trace calls 1000000 e00000 |
dce26c7d HS |
83 | Call list dumped to 00000000, size 0xae0a40 |
84 | ||
85 | baudrate=115200 | |
86 | profbase=0 | |
87 | profoffset=ae0a40 | |
88 | profsize=e00000 | |
89 | stderr=serial | |
90 | stdin=serial | |
91 | stdout=serial | |
b2e16a85 | 92 | |
dce26c7d | 93 | Environment size: 117/8188 bytes |
e46b7243 | 94 | =>host save hostfs - 1000000 trace ${profoffset} |
dce26c7d HS |
95 | 11405888 bytes written in 10 ms (1.1 GiB/s) |
96 | =>reset | |
b2e16a85 | 97 | |
b2e16a85 | 98 | |
dce26c7d | 99 | Then run proftool to convert the trace information to ftrace format |
b2e16a85 | 100 | |
dce26c7d | 101 | .. code-block:: console |
b2e16a85 | 102 | |
d53b2cfe | 103 | $ ./sandbox/tools/proftool -m sandbox/System.map -t trace dump-ftrace -o trace.dat |
b2e16a85 | 104 | |
e46b7243 | 105 | Finally run kernelshark to display it (note it only works with `.dat` files!): |
b2e16a85 | 106 | |
dce26c7d | 107 | .. code-block:: console |
b2e16a85 | 108 | |
e46b7243 | 109 | $ kernelshark trace.dat |
b2e16a85 | 110 | |
e46b7243 | 111 | Using this tool you can view the trace records and see the timestamp for each |
b2e16a85 SG |
112 | function. |
113 | ||
e46b7243 SG |
114 | .. image:: pics/kernelshark.png |
115 | :width: 800 | |
116 | :alt: Kernelshark showing function-trace records | |
117 | ||
118 | ||
119 | To see the records on the console, use trace-cmd: | |
120 | ||
121 | .. code-block:: console | |
122 | ||
123 | $ trace-cmd report trace.dat | less | |
124 | cpus=1 | |
125 | u-boot-1 [000] 3.116364: function: initf_malloc | |
126 | u-boot-1 [000] 3.116375: function: initf_malloc | |
127 | u-boot-1 [000] 3.116386: function: initf_bootstage | |
128 | u-boot-1 [000] 3.116396: function: bootstage_init | |
129 | u-boot-1 [000] 3.116408: function: malloc | |
130 | u-boot-1 [000] 3.116418: function: malloc_simple | |
131 | u-boot-1 [000] 3.116429: function: alloc_simple | |
132 | u-boot-1 [000] 3.116441: function: alloc_simple | |
133 | u-boot-1 [000] 3.116449: function: malloc_simple | |
134 | u-boot-1 [000] 3.116457: function: malloc | |
135 | ||
136 | Note that `pytimechart` is obsolete so cannot be used anymore. | |
137 | ||
138 | There is a -f option available to select a function graph: | |
139 | ||
140 | .. code-block:: console | |
141 | ||
aab60a51 | 142 | $ ./sandbox/tools/proftool -m sandbox/System.map -t trace -f funcgraph dump-ftrace -o trace.dat |
e46b7243 SG |
143 | |
144 | Again, you can use kernelshark or trace-cmd to look at the output. In this case | |
145 | you will see the time taken by each function shown against its exit record. | |
146 | ||
147 | .. image:: pics/kernelshark_fg.png | |
148 | :width: 800 | |
149 | :alt: Kernelshark showing function-graph records | |
150 | ||
151 | .. code-block:: console | |
152 | ||
153 | $ trace-cmd report trace.dat | less | |
154 | cpus=1 | |
155 | u-boot-1 [000] 3.116364: funcgraph_entry: 0.011 us | initf_malloc(); | |
156 | u-boot-1 [000] 3.116386: funcgraph_entry: | initf_bootstage() { | |
157 | u-boot-1 [000] 3.116396: funcgraph_entry: | bootstage_init() { | |
158 | u-boot-1 [000] 3.116408: funcgraph_entry: | malloc() { | |
159 | u-boot-1 [000] 3.116418: funcgraph_entry: | malloc_simple() { | |
160 | u-boot-1 [000] 3.116429: funcgraph_entry: 0.012 us | alloc_simple(); | |
161 | u-boot-1 [000] 3.116449: funcgraph_exit: 0.031 us | } | |
162 | u-boot-1 [000] 3.116457: funcgraph_exit: 0.049 us | } | |
163 | u-boot-1 [000] 3.116466: funcgraph_entry: 0.063 us | memset(); | |
164 | u-boot-1 [000] 3.116539: funcgraph_exit: 0.143 us | } | |
165 | ||
166 | Flame graph | |
167 | ----------- | |
168 | ||
169 | Some simple flame graph options are available as well, using the dump-flamegraph | |
170 | command: | |
171 | ||
172 | .. code-block:: console | |
173 | ||
aab60a51 | 174 | $ ./sandbox/tools/proftool -m sandbox/System.map -t trace dump-flamegraph -o trace.fg |
e46b7243 SG |
175 | $ flamegraph.pl trace.fg >trace.svg |
176 | ||
177 | You can load the .svg file into a viewer. If you use Chrome (and some other | |
178 | programs) you can click around and zoom in and out. | |
179 | ||
180 | .. image:: pics/flamegraph.png | |
181 | :width: 800 | |
182 | :alt: Chrome showing the flamegraph.pl output | |
183 | ||
184 | .. image:: pics/flamegraph_zoom.png | |
185 | :width: 800 | |
186 | :alt: Chrome showing zooming into the flamegraph.pl output | |
187 | ||
188 | ||
189 | A timing variant is also available, which gives an idea of how much time is | |
190 | spend in each call stack: | |
191 | ||
192 | .. code-block:: console | |
193 | ||
aab60a51 | 194 | $ ./sandbox/tools/proftool -m sandbox/System.map -t trace dump-flamegraph -f timing -o trace.fg |
e46b7243 SG |
195 | $ flamegraph.pl trace.fg >trace.svg |
196 | ||
197 | Note that trace collection does slow down execution so the timings will be | |
198 | inflated. They should be used to guide optimisation. For accurate boot timings, | |
199 | use bootstage. | |
200 | ||
201 | .. image:: pics/flamegraph_timing.png | |
202 | :width: 800 | |
203 | :alt: Chrome showing flamegraph.pl output with timing | |
b2e16a85 SG |
204 | |
205 | CONFIG Options | |
206 | -------------- | |
207 | ||
dce26c7d HS |
208 | CONFIG_TRACE |
209 | Enables the trace feature in U-Boot. | |
b2e16a85 | 210 | |
dce26c7d HS |
211 | CONFIG_CMD_TRACE |
212 | Enables the trace command. | |
b2e16a85 | 213 | |
dce26c7d HS |
214 | CONFIG_TRACE_BUFFER_SIZE |
215 | Size of trace buffer to allocate for U-Boot. This buffer is | |
216 | used after relocation, as a place to put function tracing | |
217 | information. The address of the buffer is determined by | |
218 | the relocation code. | |
b2e16a85 | 219 | |
dce26c7d HS |
220 | CONFIG_TRACE_EARLY |
221 | Define this to start tracing early, before relocation. | |
b2e16a85 | 222 | |
dce26c7d HS |
223 | CONFIG_TRACE_EARLY_SIZE |
224 | Size of 'early' trace buffer. Before U-Boot has relocated | |
225 | it doesn't have a proper trace buffer. On many boards | |
226 | you can define an area of memory to use for the trace | |
227 | buffer until the 'real' trace buffer is available after | |
228 | relocation. The contents of this buffer are then copied to | |
229 | the real buffer. | |
b2e16a85 | 230 | |
dce26c7d HS |
231 | CONFIG_TRACE_EARLY_ADDR |
232 | Address of early trace buffer | |
b2e16a85 | 233 | |
e46b7243 SG |
234 | CONFIG_TRACE_CALL_DEPTH_LIMIT |
235 | Sets the limit on trace call-depth. For a broad view, 10 is typically | |
236 | sufficient. Setting this too large creates enormous traces and distorts | |
237 | the overall timing considerable. | |
238 | ||
b2e16a85 SG |
239 | |
240 | Building U-Boot with Tracing Enabled | |
241 | ------------------------------------ | |
242 | ||
243 | Pass 'FTRACE=1' to the U-Boot Makefile to actually instrument the code. | |
244 | This is kept as a separate option so that it is easy to enable/disable | |
245 | instrumenting from the command line instead of having to change board | |
246 | config files. | |
247 | ||
248 | ||
e46b7243 SG |
249 | Board requirements |
250 | ------------------ | |
251 | ||
252 | Trace data collection relies on a microsecond timer, accessed through | |
253 | `timer_get_us()`. So the first thing you should do is make sure that | |
254 | this produces sensible results for your board. Suitable sources for | |
255 | this timer include high resolution timers, PWMs or profile timers if | |
256 | available. Most modern SOCs have a suitable timer for this. | |
257 | ||
258 | See `add_ftrace()` for where `timer_get_us()` is called. The `notrace` | |
259 | attribute must be used on each function called by `timer_get_us()` since | |
260 | recursive calls to `add_ftrace()` will cause a fault:: | |
261 | ||
262 | trace: recursion detected, disabling | |
263 | ||
264 | You cannot use driver model to obtain the microsecond timer, since tracing | |
265 | may be enabled before driver model is set up. Instead, provide a low-level | |
266 | function which accesses the timer, setting it up if needed. | |
267 | ||
268 | ||
b2e16a85 SG |
269 | Collecting Trace Data |
270 | --------------------- | |
271 | ||
272 | When you run U-Boot on your board it will collect trace data up to the | |
273 | limit of the trace buffer size you have specified. Once that is exhausted | |
274 | no more data will be collected. | |
275 | ||
e46b7243 | 276 | Collecting trace data affects execution time and performance. You |
3f80064f | 277 | will notice this particularly with trivial functions - the overhead of |
b2e16a85 SG |
278 | recording their execution may even exceed their normal execution time. |
279 | In practice this doesn't matter much so long as you are aware of the | |
3f80064f | 280 | effect. Once you have done your optimizations, turn off tracing before |
e46b7243 | 281 | doing end-to-end timing using bootstage. |
b2e16a85 SG |
282 | |
283 | The best time to start tracing is right at the beginning of U-Boot. The | |
284 | best time to stop tracing is right at the end. In practice it is hard | |
285 | to achieve these ideals. | |
286 | ||
e46b7243 SG |
287 | This implementation enables tracing early in `board_init_r()`, or |
288 | `board_init_f()` when `TRACE_EARLY` is enabled. This means | |
b2e16a85 SG |
289 | that it captures most of the board init process, missing only the |
290 | early architecture-specific init. However, it also misses the entire | |
e46b7243 | 291 | SPL stage if there is one. At present tracing is not supported in SPL. |
b2e16a85 SG |
292 | |
293 | U-Boot typically ends with a 'bootm' command which loads and runs an | |
294 | OS. There is useful trace data in the execution of that bootm | |
295 | command. Therefore this implementation provides a way to collect trace | |
296 | data after bootm has finished processing, but just before it jumps to | |
297 | the OS. In practical terms, U-Boot runs the 'fakegocmd' environment | |
298 | variable at this point. This variable should have a short script which | |
299 | collects the trace data and writes it somewhere. | |
300 | ||
e46b7243 SG |
301 | Controlling the trace |
302 | --------------------- | |
b2e16a85 | 303 | |
e46b7243 SG |
304 | U-Boot provides a command-line interface to the trace system for controlling |
305 | tracing and accessing the trace data. See :doc:`../usage/cmd/trace`. | |
b2e16a85 SG |
306 | |
307 | ||
308 | Environment Variables | |
309 | --------------------- | |
310 | ||
311 | The following are used: | |
312 | ||
dce26c7d HS |
313 | profbase |
314 | Base address of trace output buffer | |
b2e16a85 | 315 | |
dce26c7d HS |
316 | profoffset |
317 | Offset of first unwritten byte in trace output buffer | |
b2e16a85 | 318 | |
dce26c7d HS |
319 | profsize |
320 | Size of trace output buffer | |
b2e16a85 SG |
321 | |
322 | All of these are set by the 'trace calls' command. | |
323 | ||
324 | These variables keep track of the amount of data written to the trace | |
325 | output buffer by the 'trace' command. The trace commands which write data | |
326 | to the output buffer can use these to specify the buffer to write to, and | |
327 | update profoffset each time. This allows successive commands to append data | |
dce26c7d | 328 | to the same buffer, for example:: |
b2e16a85 | 329 | |
dce26c7d HS |
330 | => trace funclist 10000 e00000 |
331 | => trace calls | |
b2e16a85 SG |
332 | |
333 | (the latter command appends more data to the buffer). | |
334 | ||
335 | ||
dce26c7d HS |
336 | fakegocmd |
337 | Specifies commands to run just before booting the OS. This | |
338 | is a useful time to write the trace data to the host for | |
339 | processing. | |
b2e16a85 SG |
340 | |
341 | ||
342 | Writing Out Trace Data | |
343 | ---------------------- | |
344 | ||
345 | Once the trace data is in an output buffer in memory there are various ways | |
346 | to transmit it to the host. Notably you can use tftput to send the data | |
dce26c7d | 347 | over a network link:: |
b2e16a85 | 348 | |
dce26c7d HS |
349 | fakegocmd=trace pause; usb start; set autoload n; bootp; |
350 | trace calls 10000000 1000000; | |
351 | tftpput ${profbase} ${profoffset} 192.168.1.4:/tftpboot/calls | |
b2e16a85 SG |
352 | |
353 | This starts up USB (to talk to an attached USB Ethernet dongle), writes | |
354 | a trace log to address 10000000 and sends it to a host machine using | |
355 | TFTP. After this, U-Boot will boot the OS normally, albeit a little | |
356 | later. | |
357 | ||
e46b7243 SG |
358 | For a filesystem you may do something like:: |
359 | ||
360 | trace calls 10000000 1000000; | |
361 | save mmc 1:1 10000000 /trace ${profoffset} | |
362 | ||
363 | The trace buffer format is internal to the trace system. It consists of a | |
364 | header, a call count for each function site, followed by a list of trace | |
365 | records, once for each function call. | |
366 | ||
b2e16a85 | 367 | |
e46b7243 SG |
368 | Converting Trace Output Data (proftool) |
369 | --------------------------------------- | |
b2e16a85 SG |
370 | |
371 | The trace output data is kept in a binary format which is not documented | |
e46b7243 SG |
372 | here. See the `trace.h` header file if you are interested. To convert it into |
373 | something useful, you can use proftool. | |
b2e16a85 SG |
374 | |
375 | This tool must be given the U-Boot map file and the trace data received | |
e46b7243 SG |
376 | from running that U-Boot. It produces a binary output file. |
377 | ||
378 | It is also possible to provide a configuration file to indicate which functions | |
379 | should be included or dropped during conversion. This file consists of lines | |
380 | like:: | |
381 | ||
382 | include-func <regex> | |
383 | exclude-func <regex> | |
b2e16a85 | 384 | |
e46b7243 SG |
385 | where <regex> is a regular expression matched against function names. It |
386 | allows some functions to be dropped from the trace when producing ftrace | |
387 | records. | |
388 | ||
389 | Options: | |
390 | ||
391 | -c <config_file> | |
392 | Specify the optional configuration file, to control which functions are | |
393 | included in the output. | |
394 | ||
395 | -f <format> | |
396 | Specifies the format to use (see below) | |
b2e16a85 | 397 | |
dce26c7d | 398 | -m <map_file> |
e46b7243 SG |
399 | Specify U-Boot map file (`System.map`) |
400 | ||
401 | -o <output file> | |
402 | Specify the output filename | |
dce26c7d | 403 | |
e46b7243 SG |
404 | -t <trace_file> |
405 | Specify trace file, the data saved from U-Boot | |
406 | ||
407 | -v <0-4> | |
408 | Specify the verbosity, where 0 is the minimum and 4 is for debugging. | |
b2e16a85 SG |
409 | |
410 | Commands: | |
411 | ||
e46b7243 SG |
412 | dump-ftrace: |
413 | Write a binary dump of the file in Linux ftrace format. Two options are | |
414 | available: | |
415 | ||
416 | function | |
417 | write function-call records (caller/callee) | |
418 | ||
419 | funcgraph | |
420 | write function entry/exit records (graph) | |
421 | ||
422 | This format can be used with kernelshark_ and trace_cmd_. | |
423 | ||
424 | dump-flamegraph | |
425 | Write a list of stack records useful for producing a flame graph. Two | |
426 | options are available: | |
b2e16a85 | 427 | |
e46b7243 SG |
428 | calls |
429 | create a flamegraph of stack frames | |
430 | ||
431 | timing | |
432 | create a flamegraph of microseconds for each stack frame | |
433 | ||
434 | This format can be used with flamegraph_pl_. | |
b2e16a85 SG |
435 | |
436 | Viewing the Trace Data | |
437 | ---------------------- | |
438 | ||
e46b7243 SG |
439 | You can use kernelshark_ for a GUI, but note that version 2.0.x was broken. If |
440 | you have that version you could try building it from source. | |
b2e16a85 | 441 | |
e46b7243 SG |
442 | The file must have a .dat extension or it is ignored. The program has terse |
443 | user interface but is very convenient for viewing U-Boot profile information. | |
444 | ||
445 | Also available is trace_cmd_ which provides a command-line interface. | |
b2e16a85 SG |
446 | |
447 | Workflow Suggestions | |
448 | -------------------- | |
449 | ||
450 | The following suggestions may be helpful if you are trying to reduce boot | |
451 | time: | |
452 | ||
453 | 1. Enable CONFIG_BOOTSTAGE and CONFIG_BOOTSTAGE_REPORT. This should get | |
dce26c7d | 454 | you are helpful overall snapshot of the boot time. |
b2e16a85 SG |
455 | |
456 | 2. Build U-Boot with tracing and run it. Note the difference in boot time | |
dce26c7d | 457 | (it is common for tracing to add 10% to the time) |
b2e16a85 | 458 | |
3f80064f | 459 | 3. Collect the trace information as described above. Use this to find where |
dce26c7d | 460 | all the time is being spent. |
b2e16a85 | 461 | |
3f80064f HS |
462 | 4. Take a look at that code and see if you can optimize it. Perhaps it is |
463 | possible to speed up the initialization of a device, or remove an unused | |
dce26c7d | 464 | feature. |
b2e16a85 SG |
465 | |
466 | 5. Rebuild, run and collect again. Compare your results. | |
467 | ||
468 | 6. Keep going until you run out of steam, or your boot is fast enough. | |
469 | ||
470 | ||
471 | Configuring Trace | |
472 | ----------------- | |
473 | ||
474 | There are a few parameters in the code that you may want to consider. | |
475 | There is a function call depth limit (set to 15 by default). When the | |
476 | stack depth goes above this then no tracing information is recorded. | |
477 | The maximum depth reached is recorded and displayed by the 'trace stats' | |
e46b7243 SG |
478 | command. While it might be tempting to set the depth limit quite high, this |
479 | can dramatically increase the size of the trace output as well as the execution | |
480 | time. | |
b2e16a85 SG |
481 | |
482 | ||
483 | Future Work | |
484 | ----------- | |
485 | ||
486 | Tracing could be a little tidier in some areas, for example providing | |
487 | run-time configuration options for trace. | |
488 | ||
489 | Some other features that might be useful: | |
490 | ||
491 | - Trace filter to select which functions are recorded | |
492 | - Sample-based profiling using a timer interrupt | |
493 | - Better control over trace depth | |
494 | - Compression of trace information | |
495 | ||
496 | ||
e46b7243 SG |
497 | .. sectionauthor:: Simon Glass <[email protected]> |
498 | .. April 2013 | |
499 | .. Updated January 2023 | |
500 | ||
501 | .. _kernelshark: https://kernelshark.org/ | |
502 | .. _trace_cmd: https://www.trace-cmd.org/ | |
503 | .. _flamegraph_pl: https://github.com/brendangregg/FlameGraph/blob/master/flamegraph.pl |