=========================
Hardware Latency Detector
=========================

Introduction
-------------

The tracer hwlat_detector is a special purpose tracer that is used to
detect large system latencies induced by the behavior of certain underlying
hardware or firmware, independent of Linux itself. The code was developed
originally to detect SMIs (System Management Interrupts) on x86 systems,
but there is nothing x86 specific about this patchset. It was
originally written for use by the "RT" patch since the Real Time
kernel is highly latency sensitive.

SMIs are not serviced by the Linux kernel, which means that it does not
even know that they are occurring. SMIs are instead set up by BIOS code
and are serviced by BIOS code, usually for "critical" events such as
management of thermal sensors and fans. Sometimes though, SMIs are used for
other tasks and those tasks can spend an inordinate amount of time in the
handler (sometimes measured in milliseconds). Obviously this is a problem if
you are trying to keep event service latencies down in the microsecond range.

The hardware latency detector works by hogging one of the CPUs for configurable
amounts of time (with interrupts disabled), polling the CPU Time Stamp Counter
for some period, then looking for gaps in the TSC data. Any gap indicates a
time when the polling was interrupted, and since interrupts are disabled,
the only thing that could do that would be an SMI or other hardware hiccup
(or an NMI, but those can be tracked).
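The gap search described above can be illustrated with a small shell sketch.
This is not the kernel's implementation: ``find_gaps`` is a hypothetical helper
that takes a threshold and a list of already collected timestamps (in usecs)
and reports every gap between consecutive samples that exceeds the threshold.

```shell
# Illustrative only: report gaps between consecutive timestamp samples.
# Usage: find_gaps THRESH T0 T1 T2 ...   (all values in usecs)
find_gaps() {
    thresh=$1
    prev=$2
    shift 2
    for t in "$@"; do
        gap=$((t - prev))
        # A gap above the threshold means the polling loop was held off.
        if [ "$gap" -gt "$thresh" ]; then
            echo "gap of $gap usecs ending at $t"
        fi
        prev=$t
    done
}
```

For example, ``find_gaps 10 0 1 2 50 51`` prints ``gap of 48 usecs ending at 50``,
because only the jump from 2 to 50 exceeds the 10 usec threshold.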

Note that the hwlat detector should *NEVER* be used in a production environment.
It is intended to be run manually to determine if the hardware platform has a
problem with long system firmware service routines.

Usage
------

Write the ASCII text "hwlat" into the current_tracer file of the tracing system
(mounted at /sys/kernel/tracing). It is possible to redefine the threshold in
microseconds (us) above which latency spikes will be taken into account.

Example::

	# echo hwlat > /sys/kernel/tracing/current_tracer
	# echo 100 > /sys/kernel/tracing/tracing_thresh

The /sys/kernel/tracing/hwlat_detector interface contains the following files:

  - width - time period to sample with CPUs held (usecs)
            must be less than the total window size (enforced)
  - window - total period of sampling, width being inside (usecs)

By default the width is set to 500,000 and window to 1,000,000, meaning that
for every 1,000,000 usecs (1s) the hwlat detector will spin for 500,000 usecs
(0.5s). If tracing_thresh contains zero when hwlat tracer is enabled, it will
change to a default of 10 usecs. If any latency that exceeds the threshold is
observed, then the data will be written to the tracing ring buffer.
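As a sketch of how width and window might be set together, the hypothetical
helper below (``hwlat_configure`` is not a kernel-provided tool) checks the
documented "width must be less than window" rule before writing either file.
The tracefs directory is passed as an argument so the helper can be tried
against a scratch directory first. It assumes the kernel accepts the window
write before the width write; if each value is validated against the other
currently stored value, the order may need to be reversed in some cases.

```shell
# Sketch: set width and window (usecs) under a tracefs directory.
# hwlat_configure is a hypothetical helper, not a kernel-provided tool.
# Usage: hwlat_configure TRACEFS_DIR WIDTH WINDOW
hwlat_configure() {
    dir=$1/hwlat_detector
    width=$2
    window=$3
    # Mirror the documented rule: width must be less than window.
    if [ "$width" -ge "$window" ]; then
        echo "width ($width) must be less than window ($window)" >&2
        return 1
    fi
    echo "$window" > "$dir/window" && echo "$width" > "$dir/width"
}
```

Run as root, ``hwlat_configure /sys/kernel/tracing 200000 1000000`` would ask
the detector to spin for 0.2s out of every 1s.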

The minimum sleep time between periods is 1 millisecond, even if width
is less than 1 millisecond apart from window, so that the system is not
totally starved.
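The rule above amounts to a simple calculation. The sketch below is an
approximation for illustration only (``hwlat_sleep_us`` is a hypothetical
name, and the kernel's own rounding may differ): the sleep time is the
difference between window and width, raised to 1000 usecs if it would
otherwise be shorter.

```shell
# Sketch: sleep time (usecs) between sampling periods, floored at 1 ms.
# Usage: hwlat_sleep_us WIDTH WINDOW
hwlat_sleep_us() {
    sleep_us=$(($2 - $1))
    if [ "$sleep_us" -lt 1000 ]; then
        sleep_us=1000
    fi
    echo "$sleep_us"
}
```

With the default values, ``hwlat_sleep_us 500000 1000000`` prints ``500000``;
with a width only 500 usecs short of the window, ``hwlat_sleep_us 999500 1000000``
prints ``1000``.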

If tracing_thresh was zero when the hwlat detector was started, it will be set
back to zero if another tracer is loaded. Note that the last value in
tracing_thresh that the hwlat detector had will be saved, and this value will
be restored in tracing_thresh if it is still zero when the hwlat detector is
started again.
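The save and restore behavior can be pictured with a toy model. Everything
below is an assumption made for illustration: the variables and the
``hwlat_model_start``/``hwlat_model_stop`` functions are plain shell names
that only mimic the rules stated in the paragraph above, not kernel
interfaces.

```shell
# Toy model of the documented tracing_thresh handling; all names are
# hypothetical shell variables, not kernel interfaces.
tracing_thresh=0    # value the user sees in tracing_thresh
last_thresh=10      # value hwlat last ran with (default 10 usecs)

hwlat_model_start() {
    saved_thresh=$tracing_thresh
    # A zero threshold is replaced by the value hwlat last used.
    if [ "$tracing_thresh" -eq 0 ]; then
        tracing_thresh=$last_thresh
    fi
}

hwlat_model_stop() {
    # Remember the value hwlat ran with for the next start.
    last_thresh=$tracing_thresh
    # If the threshold was zero before hwlat started, set it back to zero.
    if [ "$saved_thresh" -eq 0 ]; then
        tracing_thresh=0
    fi
}
```

In this model, starting with a zero threshold yields 10, changing it to 50
while running and then stopping yields 0 again, and the next start yields 50.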

The following tracing directory files are used by the hwlat_detector:

in /sys/kernel/tracing:

 - tracing_thresh - minimum latency value to be considered (usecs)
 - tracing_max_latency - maximum hardware latency actually observed (usecs)
 - tracing_cpumask - the CPUs to move the hwlat thread across
 - hwlat_detector/width - specified amount of time to spin within window (usecs)
 - hwlat_detector/window - amount of time between (width) runs (usecs)
 - hwlat_detector/mode - the thread mode
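tracing_cpumask is written as a hexadecimal mask with one bit per CPU. As a
convenience sketch (``cpumask_hex`` is a hypothetical helper), the function
below turns a list of CPU numbers into such a mask. It relies on shell
integer arithmetic, so it is only meant for CPU numbers below 63.

```shell
# Sketch: build a hexadecimal CPU mask from a list of CPU numbers.
# Usage: cpumask_hex CPU...   (CPU numbers below 63)
cpumask_hex() {
    mask=0
    for cpu in "$@"; do
        # Set the bit that corresponds to this CPU.
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}
```

For example, ``cpumask_hex 0 2`` prints ``5``, which could then be written to
/sys/kernel/tracing/tracing_cpumask to restrict the hwlat thread to CPUs 0 and 2.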

By default, one hwlat detector kernel thread will migrate across each CPU
specified in tracing_cpumask at the beginning of a new window, in a round-robin
fashion. This behavior can be changed by changing the thread mode. The
available options are:

 - none: do not force migration
 - round-robin: migrate across each CPU specified in tracing_cpumask [default]
 - per-cpu: create one thread for each CPU in tracing_cpumask
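A minimal sketch of switching the thread mode follows. ``hwlat_set_mode`` is a
hypothetical helper that accepts only the three mode names listed above and
writes the chosen one to the mode file under the given tracefs directory.

```shell
# Sketch: select the hwlat thread mode.
# Usage: hwlat_set_mode TRACEFS_DIR MODE
hwlat_set_mode() {
    case $2 in
        none|round-robin|per-cpu)
            echo "$2" > "$1/hwlat_detector/mode"
            ;;
        *)
            # Reject anything that is not one of the documented modes.
            echo "unknown mode: $2" >&2
            return 1
            ;;
    esac
}
```

Run as root, ``hwlat_set_mode /sys/kernel/tracing per-cpu`` would request one
detector thread for each CPU in tracing_cpumask.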