James Clark [Wed, 26 Jun 2024 14:54:45 +0000 (15:54 +0100)]
perf pmu: Restore full PMU name wildcard support
Commit b2b9d3a3f021 ("perf pmu: Support wildcards on pmu name in dynamic
pmu events") gives the following example for wildcarding a subset of
PMUs:
E.g., in a system with the following dynamic pmus:
mypmu_0
mypmu_1
mypmu_2
mypmu_4
perf stat -e mypmu_[01]/<config>/
Since commit f91fa2ae6360 ("perf pmu: Refactor perf_pmu__match()"), only
"*" has been supported, removing the ability to subset PMUs, even though
parse-events.l still supports ? and [] characters.
Fix it by using fnmatch() when any glob character is detected and add a
test which covers that and other scenarios of
perf_pmu__match_ignoring_suffix().
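The matching rule described above can be sketched as follows. This is a
minimal Python illustration and not the perf implementation: the helper
name pmu_name_matches is hypothetical, Python's fnmatchcase() stands in
for C's fnmatch(), and the suffix handling done by
perf_pmu__match_ignoring_suffix() is not modelled.

```python
from fnmatch import fnmatchcase

# Characters that turn a PMU name into a glob pattern.
GLOB_CHARS = "*?["


def pmu_name_matches(pmu_name, pattern):
    """Hypothetical helper: glob match only when a glob character is present."""
    if any(ch in pattern for ch in GLOB_CHARS):
        return fnmatchcase(pmu_name, pattern)
    # No glob characters: fall back to a plain comparison.
    return pmu_name == pattern


pmus = ["mypmu_0", "mypmu_1", "mypmu_2", "mypmu_4"]
selected = [p for p in pmus if pmu_name_matches(p, "mypmu_[01]")]
print(selected)
```

With the four example PMUs, the pattern mypmu_[01] selects only the first
two, which is the subsetting behaviour the original commit described.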
Namhyung Kim [Thu, 27 Jun 2024 18:19:16 +0000 (11:19 -0700)]
perf report: Display progress bar on redirected pipe data
It's possible to save pipe output of perf record into a file.
$ perf record -o- ... > pipe.data
And you can use the data the same way as normal perf data.
$ perf report -i pipe.data
In that case, perf tools will treat the input as a pipe, but it can get
the total size of the input. This means it can show the progress bar
unlike the normal pipe input (which doesn't know the total size in
advance).
While at it, fix the string in __perf_session__process_dir_events().
perf test stat_bpf_counter.sh: Stabilize the test results
The test has been failing for some time. It records two separate runs
of a perf benchmark for the cycles event, once with the --bpf-counters
option and once without it, and compares their counts. The counts are
expected to be within a certain range of each other: at first the
difference was required to be within 10%, which was later raised to
20%. However, the test case keeps failing on certain architectures, as
recording the provided benchmark can produce completely different
counts depending on the current load of the system.
Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat
--no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t"
gives counts ranging from 400mil to 1400mil samples.
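A tolerance check of the kind the test relies on can be sketched as
below. This is an assumption-laden illustration: the function name
within_tolerance and the choice to measure the difference relative to the
larger count are hypothetical, and the shell test may compute it
differently.

```python
def within_tolerance(count_a, count_b, tolerance):
    """Return True if the two counts differ by at most `tolerance`
    (a fraction, e.g. 0.20), measured relative to the larger count."""
    larger = max(count_a, count_b)
    if larger == 0:
        return True  # both counts are zero, nothing to compare
    return abs(count_a - count_b) / larger <= tolerance


# The extremes of the cycles range quoted above cannot pass a 20% check.
print(within_tolerance(400_000_000, 1_400_000_000, 0.20))
```

Counts that spread as widely as the cycles runs above fail such a check
regardless of whether the limit is 10% or 20%.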
Instead of recording cycles, use the instructions event, which provides
more stable values. At the same time, change the tested workload to one
of the testing workloads provided by perf that is not based on the
scheduler, which can add another dependency on the current load.
Sampling the instructions event with the new workload provides much more
stable results on intel-eaglestream-spr-13 of "perf stat --no-big-num
-e instructions -- perf test -w brstack":
Performance counter stats for 'perf test -w brstack':
Ian Rogers [Tue, 25 Jun 2024 21:41:17 +0000 (14:41 -0700)]
perf python: Clean up build dependencies
The python build now depends on libraries and doesn't use
python-ext-sources except for the util/python.c dependency. Switch to
just directly depending on that file and util/setup.py. This allows
the removal of python-ext-sources.
Ian Rogers [Tue, 25 Jun 2024 21:41:16 +0000 (14:41 -0700)]
perf python: Switch module to linking libraries from building source
setup.py was building most perf sources causing setup.py to mimic the
Makefile logic as well as flex/bison code to be stubbed out, due to
complexity building. By using libraries fewer functions are stubbed
out, the build is faster and the Makefile logic is reused which should
simplify updating. The libraries are passed through LDFLAGS to avoid
complexity in python.
Force the -fPIC flag for libbpf.a to ensure it is suitable for linking
into the perf python module.
Ian Rogers [Tue, 25 Jun 2024 21:41:15 +0000 (14:41 -0700)]
perf util: Make util its own library
Make the util directory into its own library. This is done to avoid
compiling code twice, once for the perf tool and once for the perf
python module. For convenience:
arch/common.c
scripts/perl/Perf-Trace-Util/Context.c
scripts/python/Perf-Trace-Util/Context.c
are made part of this library.
Namhyung Kim [Fri, 21 Jun 2024 17:05:28 +0000 (10:05 -0700)]
perf mem: Fix a segfault with NULL event->name
Guilherme reported a crash in perf mem record. It's because the
perf_mem_event->name was NULL on his machine. It should just return
a NULL string when it has no format string in the name.
The backtrace at the crash is below:
Program received signal SIGSEGV, Segmentation fault.
__strchrnul_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:67
67 vmovdqu (%rdi), %ymm2
(gdb) bt
#0 __strchrnul_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:67
#1 0x00007ffff6c982de in __find_specmb (format=0x0) at printf-parse.h:82
#2 __printf_buffer (buf=buf@entry=0x7fffffffc760, format=format@entry=0x0, ap=ap@entry=0x7fffffffc880,
mode_flags=mode_flags@entry=0) at vfprintf-internal.c:649
#3 0x00007ffff6cb7840 in __vsnprintf_internal (string=<optimized out>, maxlen=<optimized out>, format=0x0,
args=0x7fffffffc880, mode_flags=mode_flags@entry=0) at vsnprintf.c:96
#4 0x00007ffff6cb787f in ___vsnprintf (string=<optimized out>, maxlen=<optimized out>, format=<optimized out>,
args=<optimized out>) at vsnprintf.c:103
#5 0x00005555557b9391 in scnprintf (buf=0x555555fe9320 <mem_loads_name> "", size=100, fmt=0x0)
at ../lib/vsprintf.c:21
#6 0x00005555557b74c3 in perf_pmu__mem_events_name (i=0, pmu=0x555556832180) at util/mem-events.c:106
#7 0x00005555557b7ab9 in perf_mem_events__record_args (rec_argv=0x55555684c000, argv_nr=0x7fffffffca20)
at util/mem-events.c:252
#8 0x00005555555e370d in __cmd_record (argc=3, argv=0x7fffffffd760, mem=0x7fffffffcd80) at builtin-mem.c:156
#9 0x00005555555e49c4 in cmd_mem (argc=4, argv=0x7fffffffd760) at builtin-mem.c:514
#10 0x000055555569716c in run_builtin (p=0x555555fcde80 <commands+672>, argc=8, argv=0x7fffffffd760) at perf.c:349
#11 0x0000555555697402 in handle_internal_command (argc=8, argv=0x7fffffffd760) at perf.c:402
#12 0x0000555555697560 in run_argv (argcp=0x7fffffffd59c, argv=0x7fffffffd590) at perf.c:446
#13 0x00005555556978a6 in main (argc=8, argv=0x7fffffffd760) at perf.c:562
Namhyung Kim [Fri, 21 Jun 2024 17:05:27 +0000 (10:05 -0700)]
perf tools: Fix a compiler warning of NULL pointer
A compiler warning says the second argument of bsearch() should not be
NULL, but there's a case where we might pass it. Let's return early if
we don't have any DSOs to search in __dsos__find_by_longname_id().
util/dsos.c:184:8: runtime error: null pointer passed as argument 2, which is declared to never be null
Namhyung Kim [Fri, 21 Jun 2024 17:05:26 +0000 (10:05 -0700)]
perf symbol: Simplify kernel module checking
In dso__load(), it checks if the dso is a kernel module by looking at
the symtab type. Actually, dso has an 'is_kmod' field to check that
easily, and dso__set_module_info() sets both the symtab type and the
is_kmod bit. So checking the is_kmod bit should give the same result.
Namhyung Kim [Fri, 21 Jun 2024 17:05:25 +0000 (10:05 -0700)]
perf report: Fix condition in sort__sym_cmp()
It's expected that both hist entries are in the same hists when
comparing two. But the current code in the function checks one without
dso sort key and other with the key. This would make the condition true
in any case.
I guess the intention of the original commit was to add '!' for the
right side too. But as it should be the same, let's just remove it.
Junhao He [Fri, 14 Jun 2024 09:43:18 +0000 (17:43 +0800)]
perf pmus: Fix always false result when comparing duplicate aliases
In the previous loop, all the members in the aliases[j-1] have been freed
and set to NULL. But in this loop, the function pmu_alias_is_duplicate()
compares the aliases[j] with the aliases[j-1] that has already been
disposed, so the function will always return false and duplicate aliases
will never be discarded.
If we find duplicate aliases, it skips the zfree of aliases[j], which
results in a memory leak.
We can use the next entry, aliases[j+1], to check for duplicate aliases
to fix the aliases NULL pointer dereference, then goto the zfree code
snippet to release it.
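The look-ahead comparison can be sketched as below. This is only an
illustration under stated assumptions: aliases are modelled as plain
strings in an already sorted list, the function name
print_unique_aliases is hypothetical, and setting the slot to None
stands in for the zfree of each entry.

```python
def print_unique_aliases(aliases):
    """Walk a sorted list, skipping an entry when the NEXT one is identical.

    Comparing against aliases[j + 1] is safe because that entry has not
    been released yet, unlike aliases[j - 1].
    """
    printed = []
    for j in range(len(aliases)):
        has_next = j + 1 < len(aliases)
        is_duplicate = has_next and aliases[j] == aliases[j + 1]
        if not is_duplicate:
            printed.append(aliases[j])
        # Release the entry in every case, duplicate or not.
        aliases[j] = None
    return printed


names = ["cpa_p0_rd_dat_32b", "cpa_p0_rd_dat_32b", "cpa_p0_rd_dat_64b"]
print(print_unique_aliases(names))
```

Because every entry is released at the end of its own iteration, the
duplicate path no longer leaks and no comparison touches a freed slot.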
After patch testing:
$ perf list --unit=hisi_sicl,cpa pmu
uncore cpa:
cpa_p0_rd_dat_32b
[Number of read ops transmitted by the P0 port which size is 32 bytes.
Unit: hisi_sicl,cpa]
cpa_p0_rd_dat_64b
[Number of read ops transmitted by the P0 port which size is 64 bytes.
Unit: hisi_sicl,cpa]
Yunseong Kim [Wed, 19 Jun 2024 20:34:29 +0000 (05:34 +0900)]
util: constant -1 with expression of type char
This patch resolves the following warning.
tools/perf/util/evsel.c:1620:9: error: result of comparison of constant
-1 with expression of type 'char' is always false
-Werror,-Wtautological-constant-out-of-range-compare
1620 | if (c == -1)
| ~ ^ ~~
Fernand Sieber [Tue, 18 Jun 2024 09:03:39 +0000 (11:03 +0200)]
perf: Timehist account sch delay for scheduled out running
When using perf timehist, sch delay is only computed for a waking task,
not for a preempted task. This patch changes sch delay to account for
both. This makes sense, as testing a scheduling policy needs to consider
the effect of scheduling delay globally, not only for waking tasks.
Example of a `perf timehist` report before the patch for `stress` tasks
competing with each other.
First column is wait time, second column sch delay, third column
runtime.
1.492060 [0000] s stress[81] 1.999 0.000 2.000 R next: stress[83]
1.494060 [0000] s stress[83] 2.000 0.000 2.000 R next: stress[81]
1.496060 [0000] s stress[81] 2.000 0.000 2.000 R next: stress[83]
1.498060 [0000] s stress[83] 2.000 0.000 1.999 R next: stress[81]
After the patch, it looks like this (note that sch delay is no longer
zero):
1.492060 [0000] s stress[81] 1.999 1.999 2.000 R next: stress[83]
1.494060 [0000] s stress[83] 2.000 2.000 2.000 R next: stress[81]
1.496060 [0000] s stress[81] 2.000 2.000 2.000 R next: stress[83]
1.498060 [0000] s stress[83] 2.000 2.000 1.999 R next: stress[81]
Adrian Hunter [Thu, 2 May 2024 10:58:52 +0000 (13:58 +0300)]
perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder
JMPABS is a 64-bit absolute direct jump instruction, encoded with a mandatory
REX2 prefix. JMPABS is designed to be used in the procedure linkage table
(PLT) to replace indirect jumps, because it has better performance. In that
case the jump target will be amended at run time. To enable Intel PT to
follow the code, a TIP packet is always emitted when JMPABS is traced under
Intel PT.
Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.
Decode JMPABS as an indirect jump, because it has an associated TIP packet
the same as an indirect jump and the control flow should follow the TIP
packet payload, and not assume it is the same as the on-file object code
JMPABS target address.
perf test: Check output of the probe ... --funcs command
Test "perf probe of function from different CU" only checks if the perf
command has failed and doesn't test the --funcs output. In the issue
reported in the previous commit, the garbage output of the --funcs
command was being ignored by the test when it could have been caught.
The script first makes use of --funcs option with the perf probe command
to check if the function "foo" exists in the testfile before adding a
probe to it in the next command. The output of probe...--funcs command
is redirected to stdout, therefore, add '| grep "foo"' to validate the
result.
Athira Rajeev [Sun, 23 Jun 2024 06:48:50 +0000 (12:18 +0530)]
tools/perf: Fix parallel-perf python script to replace new python syntax ":=" usage
perf test "perf script tests" fails as below on systems
with Python 3.6
File "/home/athira/linux/tools/perf/tests/shell/../../scripts/python/parallel-perf.py", line 442
if line := p.stdout.readline():
^
SyntaxError: invalid syntax
--- Cleaning up ---
---- end(-1) ----
92: perf script tests: FAILED!
This happens because ":=" is a new syntax that assigns values
to variables as part of a larger expression. It was introduced
in Python 3.8 and hence fails on setups with Python 3.6.
Address this by splitting the large expression and checking the
value in two steps:
Previous line: if line := p.stdout.readline():
Current change:
line = p.stdout.readline()
if line:
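The two-step form can be tried in isolation with the sketch below. The
function name first_line and the use of io.StringIO in place of
p.stdout are assumptions made only so the example is self-contained.

```python
import io


def first_line(stream):
    """Read one line and test it in two steps, which also works on
    Python 3.6 (the walrus form needs Python 3.8 or newer)."""
    # Python 3.8+ only:  if line := stream.readline():
    line = stream.readline()
    if line:
        return line.rstrip("\n")
    return None  # readline() returned "" at end of input


print(first_line(io.StringIO("hello\nworld\n")))
```

The behaviour is unchanged: readline() returns an empty string at end of
input, which is falsy in both forms.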
With patch
./perf test "perf script tests"
93: perf script tests: Ok
Athira Rajeev [Sun, 23 Jun 2024 06:48:49 +0000 (12:18 +0530)]
tools/perf: Use is_perf_pid_map_name helper function to check dso's of pattern /tmp/perf-%d.map
Commit 80d496be89ed ("perf report: Add support for profiling JIT
generated code") added support for profiling JIT generated code.
That patch handles dsos of the form "/tmp/perf-$PID.map".
Some of the references don't check for exactly the same pattern;
some use "if (!strncmp(dso_name, "/tmp/perf-", 10))". Fix
this by using the helper functions perf_pid_map_tid and
is_perf_pid_map_name, which look for the proper pattern of the
form "/tmp/perf-$PID.map", for these checks.
Athira Rajeev [Sun, 23 Jun 2024 06:48:48 +0000 (12:18 +0530)]
tools/perf: Fix the string match for "/tmp/perf-$PID.map" files in dso__load
Perf test for perf probe of function from different CU fails
as below:
./perf test -vv "test perf probe of function from different CU"
116: test perf probe of function from different CU:
--- start ---
test child forked, pid 2679
Failed to find symbol foo in /tmp/perf-uprobe-different-cu-sh.Msa7iy89bx/testfile
Error: Failed to add events.
--- Cleaning up ---
"foo" does not hit any event.
Error: Failed to delete events.
---- end(-1) ----
116: test perf probe of function from different CU : FAILED!
# ./perf probe -x /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile foo
Failed to find symbol foo in /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile
Error: Failed to add events.
Perf probe fails to find symbol foo in the executable placed in
/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7
# gcc -g -o test test.c
# cp test /tmp/perf-checkcWpuLRQI8j/
# nm /tmp/perf-checkcWpuLRQI8j/test | grep foo
00000000100006bc T foo
# ./perf probe -x /tmp/perf-checkcWpuLRQI8j/test foo
Failed to find symbol foo in /tmp/perf-checkcWpuLRQI8j/test
Error: Failed to add events.
But it works with any files like /tmp/perf/test. Only for
patterns with "/tmp/perf-", this fails.
On further debugging: commit 80d496be89ed ("perf report: Add support
for profiling JIT generated code") added support for profiling JIT
generated code. That patch handles dsos of the form
"/tmp/perf-$PID.map".
The check used "if (strncmp(self->name, "/tmp/perf-", 10) == 0)"
to match "/tmp/perf-$PID.map". With this check, any dso whose path
starts with "/tmp/perf-" will be considered separately for processing
(not only JIT created map files). Fix this by changing the
string pattern to check for "/tmp/perf-%d.map". Add a helper
function is_perf_pid_map_name to do this check. In "struct dso",
dso->long_name holds the long name of the dso file. Since the
/tmp/perf-$PID.map check uses the complete name, use dso__long_name for
the string name.
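The difference between the prefix check and the full-pattern check can
be sketched as below. This is an approximation for illustration only:
the function names are hypothetical, and a regular expression stands in
for the "/tmp/perf-%d.map" match (a C "%d" conversion also accepts a
sign, which this sketch does not).

```python
import re

# Whole-name pattern: /tmp/perf-<digits>.map and nothing else.
_PID_MAP_RE = re.compile(r"/tmp/perf-\d+\.map")


def old_prefix_check(name):
    """Mimics strncmp(name, "/tmp/perf-", 10) == 0."""
    return name.startswith("/tmp/perf-")


def is_pid_map_name(name):
    """Hypothetical stand-in for the stricter full-pattern check."""
    return _PID_MAP_RE.fullmatch(name) is not None


testfile = "/tmp/perf-uprobe-different-cu-sh.Msa7iy89bx/testfile"
print(old_prefix_check(testfile), is_pid_map_name(testfile))
```

The test binary under /tmp/perf-uprobe-different-cu-sh.* passes the
prefix check but not the full-pattern check, which is why it was being
treated as a JIT map file.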
With the fix,
# ./perf test "test perf probe of function from different CU"
117: test perf probe of function from different CU : Ok
James Clark [Wed, 12 Jun 2024 14:03:14 +0000 (15:03 +0100)]
perf test: Make test_arm_callgraph_fp.sh more robust
The 2 second sleep can cause the test to fail on very slow network file
systems because Perf ends up being killed before it finishes starting
up.
Fix it by making the leafloop workload end after a fixed time like the
other workloads so there is no need to kill it after 2 seconds.
Also remove the 1 second start sampling delay because it is similarly
fragile. Instead, search through all samples for a matching one, rather
than just checking the first sample and hoping it's in the right place.
perf build: Ensure libtraceevent and libtracefs versions have 3 components
When either of these have a shorter version, like 1.8, the expression
that computes the version has a syntax error that can be seen in the
output of make:
perf build: Use pkg-config for feature check for libtrace{event,fs}
Needed to add required include directories for the feature detection
to succeed. The header tracefs.h is installed either into the include
directory /usr/include/tracefs/tracefs.h when using the Makefile, or
into /usr/include/libtracefs/tracefs.h when using meson to build
libtracefs. The header tracefs.h uses #include <event-parse.h> from
libtraceevent, so pkg-config needs to pick the correct include directory
for libtracefs and add the one for libtraceevent to succeed.
Note that in baa2ca59ec1e31ccbe3f24ff0368152b36f68720 the variable
LIBTRACEEVENT_DIR was introduced, and now the method to compile against
non-standard locations requires PKG_CONFIG_PATH to be set instead, which
works for both libtraceevent and libtracefs.
Ian Rogers [Fri, 7 Jun 2024 06:53:43 +0000 (23:53 -0700)]
perf arm: Workaround ARM PMUs cpu maps having offline cpus
When PMUs have a cpu map in the 'cpus' or 'cpumask' file, perf will
try to open events on those CPUs. ARM doesn't remove offline CPUs
meaning taking a CPU offline will cause perf commands to fail unless a
CPU map is passed on the command line.
Kan Liang [Thu, 6 Jun 2024 18:03:16 +0000 (11:03 -0700)]
perf stat: Fix the hard-coded metrics calculation on the hybrid
The hard-coded metrics are wrongly calculated on hybrid machines.
$ perf stat -e cycles,instructions -a sleep 1
Performance counter stats for 'system wide':
18,205,487 cpu_atom/cycles/
9,733,603 cpu_core/cycles/
9,423,111 cpu_atom/instructions/ # 0.52 insn per cycle
4,268,965 cpu_core/instructions/ # 0.23 insn per cycle
The insn per cycle for cpu_core should be 4,268,965 / 9,733,603 = 0.44.
When finding the metric events, the find_stat() doesn't take the PMU
type into account. The cpu_atom/cycles/ is wrongly used to calculate
the IPC of the cpu_core.
In the hard-coded metrics, the only events from a different PMU are
SW_CPU_CLOCK and SW_TASK_CLOCK. They both have the stat type
STAT_NSECS. Except for the SW CLOCK events, check the PMU type as well.
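The arithmetic can be checked with the short sketch below, which uses
the counts from the output above. The dictionary layout and the ipc
helper are hypothetical and only illustrate why the divisor must come
from the same PMU as the instructions count.

```python
# Counts copied from the perf stat output above, keyed by (PMU, event).
counts = {
    ("cpu_atom", "cycles"): 18_205_487,
    ("cpu_core", "cycles"): 9_733_603,
    ("cpu_atom", "instructions"): 9_423_111,
    ("cpu_core", "instructions"): 4_268_965,
}


def ipc(pmu):
    """Instructions per cycle using cycles from the SAME PMU."""
    return counts[(pmu, "instructions")] / counts[(pmu, "cycles")]


# What the buggy lookup effectively did: cpu_core instructions divided
# by cpu_atom cycles, because the PMU type was ignored.
wrong_core_ipc = counts[("cpu_core", "instructions")] / counts[("cpu_atom", "cycles")]

print(round(ipc("cpu_atom"), 2), round(ipc("cpu_core"), 2), round(wrong_core_ipc, 2))
```

Dividing by the cpu_atom cycles reproduces the incorrect 0.23 shown in
the output, while matching the PMU gives the expected 0.44.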
The metrics aren't updated as they require retirement latency support
that is added in this series:
https://lore.kernel.org/lkml/20240613033631[email protected]/
Ravi Bangoria [Thu, 20 Jun 2024 05:41:04 +0000 (05:41 +0000)]
perf doc: Add AMD IBS usage document
Add a perf man page document that describes how to exploit AMD IBS with
Linux perf. A brief intro to IBS and simple one-liner examples will help
naive users to get started. This is not meant to be an exhaustive IBS
guide. Users should refer to the latest AMD64 Architecture Programmer's
Manual for a detailed description of IBS.
2. Regexp not found: "probe:vfs_mknod"
Regexp not found: "probe:vfs_create"
Regexp not found: "probe:vfs_rmdir"
Regexp not found: "probe:vfs_link"
Regexp not found: "probe:vfs_write"
-- [ FAIL ] -- perf_probe :: test_adding_kernel :: wildcard adding support (command exitcode + output regexp parsing)
3. Regexp not found: "Failed to find"
Regexp not found: "somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64"
Regexp not found: "in this function|at this address"
Line did not match any pattern: "The /boot/vmlinux file has no debug information."
Line did not match any pattern: "Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo package."
These three tests depend on kernel debug info.
1. Fail 1 expects file name along with probe which needs debuginfo
2. Fail 2 :
perf probe -nf --max-probes=512 -a 'vfs_* $params'
Debuginfo-analysis is not supported.
Error: Failed to add events.
3. Fail 3 :
perf probe 'vfs_read somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64'
Debuginfo-analysis is not supported.
Error: Failed to add events.
There is already a helper function, skip_if_no_debuginfo, in
lib/probe_vfs_getname.sh which does a perf probe and returns
"2" if debug info is not present. Use the skip_if_no_debuginfo
function and skip only the three tests which need debuginfo
based on the result.
With the patch:
83: perftool-testsuite_probe:
--- start ---
test child forked, pid 3927
-- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission ::
-- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: -a
-- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: --add
-- [ PASS ] -- perf_probe :: test_adding_kernel :: listing added probe :: perf list
Regexp not found: "\s*probe:inode_permission(?:_\d+)?\s+\(on inode_permission(?:[:\+][0-9A-Fa-f]+)?@.+\)"
-- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped
-- [ PASS ] -- perf_probe :: test_adding_kernel :: using added probe
-- [ PASS ] -- perf_probe :: test_adding_kernel :: deleting added probe
-- [ PASS ] -- perf_probe :: test_adding_kernel :: listing removed probe (should NOT be listed)
-- [ PASS ] -- perf_probe :: test_adding_kernel :: dry run :: adding probe
-- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: first probe adding
-- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: second probe adding (without force)
-- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: second probe adding (with force)
-- [ PASS ] -- perf_probe :: test_adding_kernel :: using doubled probe
-- [ PASS ] -- perf_probe :: test_adding_kernel :: removing multiple probes
Regexp not found: "probe:vfs_mknod"
Regexp not found: "probe:vfs_create"
Regexp not found: "probe:vfs_rmdir"
Regexp not found: "probe:vfs_link"
Regexp not found: "probe:vfs_write"
-- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped
Regexp not found: "Failed to find"
Regexp not found: "somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64"
Regexp not found: "in this function|at this address"
Line did not match any pattern: "The /boot/vmlinux file has no debug information."
Line did not match any pattern: "Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo package."
-- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped
-- [ PASS ] -- perf_probe :: test_adding_kernel :: function with retval :: add
-- [ PASS ] -- perf_probe :: test_adding_kernel :: function with retval :: record
-- [ PASS ] -- perf_probe :: test_adding_kernel :: function argument probing :: script
## [ PASS ] ## perf_probe :: test_adding_kernel SUMMARY
---- end(0) ----
83: perftool-testsuite_probe : Ok
Only the three specific tests are skipped and the remaining
ones ran successfully.
Namhyung Kim [Fri, 7 Jun 2024 20:29:17 +0000 (13:29 -0700)]
perf hist: Add symbol_conf.skip_empty
Add the skip_empty flag to symbol_conf and set the value from the report
command to preserve the existing behavior. This makes the code simpler
and will be needed by other code where it is hard to add a new argument.
Namhyung Kim [Fri, 7 Jun 2024 20:29:15 +0000 (13:29 -0700)]
perf hist: Factor out __hpp__fmt_print()
Split the logic to print the histogram values according to the format
string. This was used in 3 different places so it's better to move out
the logic into a function.
Fernand Sieber [Fri, 14 Jun 2024 07:35:17 +0000 (09:35 +0200)]
perf: sched map skips redundant lines with cpu filters
perf sched map supports cpu filters.
However, even with cpu filters active, any context switch currently
corresponds to a separate line.
As a result, context switches on irrelevant cpus result in redundant
lines, which makes the output particularly difficult to read on wide
architectures.
Ian Rogers [Wed, 12 Jun 2024 12:40:27 +0000 (05:40 -0700)]
perf test pmu: Warn don't fail for legacy mixed case event names
PowerPC has mixed case events matching legacy hardware cache
events. Warn but don't fail in this case. Event parsing will still
work in this case by matching the legacy case.
Athira Rajeev [Fri, 7 Jun 2024 04:43:54 +0000 (10:13 +0530)]
tools/perf: Fix timing issue with parallel threads in perf bench wake-up-parallel
perf bench futex fails as below and hangs intermittently when
run on a powerpc system:
./perf bench futex wake-parallel
Running 'futex/wake-parallel' benchmark:
Run summary [PID 88588]: blocking on 640 threads (at [private] futex 0x10464b8c), 640 threads waking up 1 at a time.
[Run 1]: Avg per-thread latency (waking 1/640 threads) in 0.1309 ms (+-53.27%)
[Run 2]: Avg per-thread latency (waking 1/640 threads) in 0.0120 ms (+-31.16%)
[Run 3]: Avg per-thread latency (waking 1/640 threads) in 0.1474 ms (+-92.47%)
[Run 4]: Avg per-thread latency (waking 1/640 threads) in 0.2883 ms (+-67.75%)
[Run 5]: Avg per-thread latency (waking 1/640 threads) in 0.4108 ms (+-39.60%)
[Run 6]: Avg per-thread latency (waking 1/640 threads) in 0.7843 ms (+-78.98%)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
The system where perf bench wake-up-parallel was run has a
configuration of 640 cpus. After debugging, this turned out to be
a timing issue. The benchmark creates threads equal to the number of
cpus and issues a futex_wait. Then it does a usleep for .1 second
before initiating futex_wake. In system configurations with more
threads, the usleep time is not enough. The patch changes the usleep
from 100000 to 200000.
With the patch, multiple iterations were run and no further issues
were seen.
Athira Rajeev [Fri, 7 Jun 2024 04:43:53 +0000 (10:13 +0530)]
tools/perf: Fix perf bench epoll to enable the run when some CPU's are offline
Perf bench epoll fails as below when run on
a powerpc system:
./perf bench epoll wait
Running 'epoll/wait' benchmark:
Run summary [PID 627653]: 79 threads monitoring on 64 file-descriptors for 8 secs.
perf: pthread_create: No such file or directory
In the setup where this perf bench was run, the difference was that
the partition had 640 CPUs, but not all CPUs were online. 80 CPUs
were online. While creating threads and using epoll_wait, the code
sets the affinity using a cpumask. The cpumask size used is 80,
which is picked from "nrcpus = perf_cpu_map__nr(cpu)". Here the
benchmark reports a failure while setting affinity for a cpu number
which is 80 or higher, because it attempts to set a bit
position which is not allocated in the cpumask. Fix this by changing
the size of the cpumask to the number of possible cpus and not the
number of online cpus.
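The sizing problem can be sketched with a toy bitmask. Everything here
is an assumption for illustration: the CpuMask class is hypothetical,
and the layout of online CPUs (every eighth CPU out of 640) is made up
to give 80 online CPUs with numbers above 79.

```python
class CpuMask:
    """Toy fixed-size bitmask, standing in for an allocated cpumask."""

    def __init__(self, nbits):
        self.nbits = nbits
        self.bits = bytearray((nbits + 7) // 8)

    def set(self, cpu):
        if not 0 <= cpu < self.nbits:
            raise IndexError("cpu %d outside mask of %d bits" % (cpu, self.nbits))
        self.bits[cpu // 8] |= 1 << (cpu % 8)

    def is_set(self, cpu):
        return bool(self.bits[cpu // 8] & (1 << (cpu % 8)))


possible_cpus = 640
online_cpus = list(range(0, possible_cpus, 8))  # 80 online CPUs (made up)

# Sizing by the number of possible CPUs leaves room for every CPU number.
mask = CpuMask(possible_cpus)
for cpu in online_cpus:
    mask.set(cpu)
print(len(online_cpus), mask.is_set(632))
```

A mask sized by len(online_cpus) has only 80 bits, so setting the bit
for CPU 632 would fall outside it; sizing by the number of possible CPUs
avoids that.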
Athira Rajeev [Fri, 7 Jun 2024 04:43:52 +0000 (10:13 +0530)]
tools/perf: Fix perf bench futex to enable the run when some CPU's are offline
Perf bench futex fails as below when run on
a powerpc system:
./perf bench futex all
Running futex/hash benchmark...
Run summary [PID 626307]: 80 threads, each operating on 1024 [private] futexes for 10 secs.
perf: pthread_create: No such file or directory
In the setup where this perf bench was run, the difference was that
the partition had 640 CPUs, but not all CPUs were online. 80 CPUs
were online. While blocking the threads with futex_wait, the code
sets the affinity using a cpumask. The cpumask size used is 80,
which is picked from "nrcpus = perf_cpu_map__nr(cpu)". Here the
benchmark reports a failure while setting affinity for a cpu number
which is 80 or higher, because it attempts to set a bit
position which is not allocated in the cpumask. Fix this by changing
the size of the cpumask to the number of possible cpus and not the
number of online cpus.
Ian Rogers [Fri, 3 May 2024 23:28:49 +0000 (16:28 -0700)]
perf evsel: Refactor tool events
Tool events unnecessarily open a dummy perf event which is useless
even with `perf record` which will still open a dummy event. Change
the behavior of tool events so:
- duration_time - call `rdclock` on open and then report the count as
a delta since the start in evsel__read_counter. This moves code out
of builtin-stat making it more general purpose.
- user_time/system_time - open the fd as either `/proc/pid/stat` or
`/proc/stat` for cases like system wide. evsel__read_counter will
read the appropriate field out of the procfs file. These values
were previously supplied by wait4, if the procfs read fails then
the wait4 values are used, assuming the process/thread terminated.
By reading user_time and system_time this way, interval mode, per
PID and per CPU can be supported although there are restrictions
given what the files provide (e.g. per PID can't be combined with
per CPU).
Opening any of the tool events for `perf record` is changed to return
invalid.
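Reading the user and system time fields out of a per-process stat file
can be sketched as below. This is not the perf code: the function name
parse_utime_stime is hypothetical, the sample line is made up, and only
the per-process file format is covered (utime and stime are fields 14
and 15 of /proc/<pid>/stat, in clock ticks).

```python
def parse_utime_stime(stat_line):
    """Return (utime, stime) in clock ticks from a /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may itself contain spaces or
    # parentheses, so split only what follows the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field 14 is rest[11], field 15 is rest[12].
    return int(rest[11]), int(rest[12])


# Made-up sample line; a real one would come from open("/proc/self/stat").
sample = "1234 (my prog) S 1 1234 1234 0 -1 4194304 100 0 0 0 57 13 0 0 20 0 1 0 12345"
print(parse_utime_stime(sample))
```

Because the file can be re-read at any time, values obtained this way
can be sampled at intervals, which is what makes interval mode possible.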
Thomas Richter [Fri, 7 Jun 2024 05:43:52 +0000 (07:43 +0200)]
perf test: Speed up test case 70 annotate basic tests
On some s390 linux machines (mostly older models) with debug
packages installed, the test case 'perf annotate basic tests' runs
for a rather long time.
Speed up the test by saving the output of the perf annotate command
in a temporary file, which is then used for pattern matching via
the grep command. This saves on invocations of perf annotate, which
runs for some time.
Ian Rogers [Wed, 5 Jun 2024 06:38:28 +0000 (23:38 -0700)]
perf stat: Choose the most disaggregate command line option
When multiple aggregation options are passed to perf stat the behavior
isn't clear. Consider "perf stat -A --per-socket .." and "perf stat
--per-socket -A ..", the first won't aggregate at all while the second
will do per-socket aggregation, even though the same options were
passed.
Rather than set an enum value, gather the options in a struct and
process them from most to least aggregate. This ensures the least
aggregate option always applies, so no aggregation if "-A" is passed.
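The order-independent selection can be sketched as below. The mode
names and the AGGR_ORDER list are hypothetical placeholders rather than
perf's actual option set; the point is only that walking the modes from
most to least aggregate lets the least aggregate requested one win.

```python
# Hypothetical mode names, ordered from most to least aggregate.
AGGR_ORDER = ["global", "per_socket", "per_core", "per_thread", "no_aggr"]


def choose_aggr_mode(requested):
    """Pick the least aggregate mode among those requested."""
    mode = "global"  # default when nothing is requested
    for name in AGGR_ORDER:
        if name in requested:
            mode = name  # a later, less aggregate entry overrides
    return mode


print(choose_aggr_mode(["no_aggr", "per_socket"]))
print(choose_aggr_mode(["per_socket", "no_aggr"]))
```

Both calls give the same answer because the result depends on the fixed
order in AGGR_ORDER and not on the order the options were passed.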
Ian Rogers [Wed, 5 Jun 2024 06:38:27 +0000 (23:38 -0700)]
perf stat: Make options local
Reduce the scope of stat_options to cmd_stat, and pass as an argument
to __cmd_record. This is done to make more localized changes to the
options in later patches. A side-effect of the change is to reduce the
size of a stripped PIE perf binary by 5952 bytes. The savings come
mainly in the dynamic relocation section.
Ian Rogers [Tue, 21 May 2024 16:51:09 +0000 (09:51 -0700)]
perf maps: Add/use a sorted insert for fixup overlap and insert
Data may have lots of overlapping mmaps. The regular insert adds at
the end and relies on a later sort. For data with overlapping mappings
the sort will happen during a subsequent maps__find or
__maps__fixup_overlap_and_insert, so there's never a period where the
inserted maps buffer up and a single sort happens. To avoid back to
back sorts, maintain the sort order when fixing up and
inserting. Previously the first_ending_after search was O(log n), where
n is the size of maps, and the insert was O(1), but because of the
continuous sorting the overall cost was becoming O(n*log(n)). By
maintaining sort order, the insert now becomes O(n) for a memmove.
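The trade-off can be illustrated with Python's bisect module, which does
a binary search for the position and then shifts the tail of the list,
much like the memmove mentioned above. The (start, end) tuples and the
sorted_insert name are assumptions for this sketch and do not reflect
the maps data structure.

```python
import bisect


def sorted_insert(maps, new_map):
    """Insert while keeping `maps` sorted by start address.

    Finding the slot is O(log n); shifting later entries is O(n).
    No separate sort pass is needed afterwards.
    """
    bisect.insort(maps, new_map)


maps = [(0x1000, 0x2000), (0x3000, 0x4000), (0x5000, 0x6000)]
sorted_insert(maps, (0x2000, 0x3000))
print([hex(start) for start, _ in maps])
```

Appending and re-sorting on every lookup would instead pay for a full
sort each time, which is the cost the commit removes.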
For a perf report on a perf.data file containing overlapping mappings
the time numbers are:
Ian Rogers [Tue, 21 May 2024 16:51:08 +0000 (09:51 -0700)]
perf maps: Reduce sorting for overlapping mappings
When an 'after' map is generated the 'new' map must be before it, so
terminate iterating and don't re-sort. If the entry 'pos' is entirely
overlapped by the 'new' mapping then don't remove and insert the
mapping, just replace it, again to avoid sorting.
For a perf report on a perf.data file containing overlapping mappings
the time numbers are:
perf: parse-events: Fix compilation error while defining DEBUG_PARSER
Compiling perf tool with 'PARSER_DEBUG=1' leads to errors:
$> make -C tools/perf PARSER_DEBUG=1 NO_LIBTRACEEVENT=1
...
CC util/expr-flex.o
CC util/expr.o
util/parse-events.c:33:12: error: redundant redeclaration of ‘parse_events_debug’ [-Werror=redundant-decls]
33 | extern int parse_events_debug;
| ^~~~~~~~~~~~~~~~~~
In file included from util/parse-events.c:18:
util/parse-events-bison.h:43:12: note: previous declaration of ‘parse_events_debug’ with type ‘int’
43 | extern int parse_events_debug;
| ^~~~~~~~~~~~~~~~~~
util/expr.c:27:12: error: redundant redeclaration of ‘expr_debug’ [-Werror=redundant-decls]
27 | extern int expr_debug;
| ^~~~~~~~~~
In file included from util/expr.c:11:
util/expr-bison.h:43:12: note: previous declaration of ‘expr_debug’ with type ‘int’
43 | extern int expr_debug;
| ^~~~~~~~~~
cc-1: all warnings being treated as errors
Remove the extern declaration from the parse-events.c file as it
conflicts with the ones generated using bison and yacc tools from the
files parse-events.[ly].
Ian Rogers [Fri, 24 May 2024 20:52:27 +0000 (13:52 -0700)]
perf top: Allow filters on events
Allow filters to be added to perf top events. One use is to work around
issues with:
```
$ perf top --uid="$(id -u)"
```
which tries to scan /proc to find processes belonging to the uid and can
fail if such a pid terminates between the scan and the
perf_event_open, reporting:
```
Error:
The sys_perf_event_open() syscall returned with 3 (No such process) for event (cycles:P).
/bin/dmesg | grep -i perf may provide additional information.
```
A similar filter:
```
$ perf top -e cycles:P --filter "uid == $(id -u)"
```
doesn't fail this way.
Ian Rogers [Fri, 24 May 2024 20:52:26 +0000 (13:52 -0700)]
perf bpf filter: Add uid and gid terms
Allow the BPF filter to use the uid and gid terms determined by the
bpf_get_current_uid_gid BPF helper. For example, the following will
record the cpu-clock event system wide discarding samples that don't
belong to the current user.
$ perf record -e cpu-clock --filter "uid == $(id -u)" -a sleep 0.1
Ian Rogers [Fri, 24 May 2024 20:52:25 +0000 (13:52 -0700)]
perf bpf filter: Give terms their own enum
Give the term types their own enum so that additional terms can be
added that don't correspond to a PERF_SAMPLE_xx flag. The term values
are numerically ascending rather than bit field positions, which means
they need translating to a PERF_SAMPLE_xx bit field in certain places
using a shift.
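A minimal sketch of that translation, assuming a hypothetical enum layout: the names filter_term, TERM_* and term_to_sample_flag() are illustrative only, and the SAMPLE_* macros merely mirror the first few PERF_SAMPLE_xx bit positions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-field flags mirroring the first few PERF_SAMPLE_xx values. */
#define SAMPLE_IP   (1U << 0)
#define SAMPLE_TID  (1U << 1)
#define SAMPLE_TIME (1U << 2)
#define SAMPLE_ADDR (1U << 3)

/* Hypothetical term enum: ascending values, not bit positions. */
enum filter_term {
	TERM_NONE = 0,
	TERM_SAMPLE_START,            /* first term backed by a sample bit */
	TERM_IP = TERM_SAMPLE_START,  /* maps to 1 << 0 */
	TERM_TID,                     /* maps to 1 << 1 */
	TERM_TIME,                    /* maps to 1 << 2 */
	TERM_ADDR,                    /* maps to 1 << 3 */
	TERM_SAMPLE_END = TERM_ADDR,
	TERM_UID,                     /* no sample flag behind these terms */
	TERM_GID,
};

/* Translate a term to its sample flag with a shift; 0 if it has none. */
static uint64_t term_to_sample_flag(enum filter_term term)
{
	if (term < TERM_SAMPLE_START || term > TERM_SAMPLE_END)
		return 0;
	return 1ULL << (term - TERM_SAMPLE_START);
}
```

Keeping the sample-backed terms contiguous means one subtraction and one shift recover the bit, while terms such as uid and gid can be appended without consuming a flag bit.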
Ian Rogers [Sun, 19 May 2024 18:17:16 +0000 (11:17 -0700)]
tools api io: Move filling the io buffer to its own function
In general a read fills 4kB, so filling the buffer is a 1 in 4096
operation. Move it out of the io__get_char function to avoid some
checking overhead and to better hint that the function is good to inline.
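The split between a small fast path and a rare refill can be sketched as below. This is a simplified illustration rather than the tools/lib/api code: struct reader, reader_init(), reader_fill() and reader_get_char() are hypothetical names, and timeouts and EINTR retries are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical buffered reader over a file descriptor. */
struct reader {
	int fd;
	char buf[4096];
	char *data;  /* next unread byte */
	char *end;   /* one past the last valid byte */
	bool eof;
};

static void reader_init(struct reader *r, int fd)
{
	r->fd = fd;
	r->data = r->end = r->buf;
	r->eof = false;
}

/*
 * Slow path: refill the buffer with a single read(). Any error,
 * including EINTR, is treated as end of input in this sketch.
 */
static int reader_fill(struct reader *r)
{
	ssize_t n = read(r->fd, r->buf, sizeof(r->buf));

	if (n <= 0) {
		r->eof = true;
		return -1;
	}
	r->data = r->buf;
	r->end = r->buf + n;
	return 0;
}

/* Fast path: small enough to inline; calls the refill only when empty. */
static inline int reader_get_char(struct reader *r)
{
	if (r->eof)
		return -1;
	if (r->data == r->end && reader_fill(r) < 0)
		return -1;
	return (unsigned char)*r->data++;
}
```

With the refill out of line, the common case in reader_get_char() is two comparisons and a pointer increment.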
For perf's IO intensive internal (non-rigorous) benchmarks there's a
small improvement to kallsyms-parsing with a default build.
Before:
```
$ perf bench internals all
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 146.322 usec (+- 0.305 usec)
Average num. events: 61.000 (+- 0.000)
Average time per event 2.399 usec
Average data synthesis took: 145.056 usec (+- 0.155 usec)
Average num. events: 329.000 (+- 0.000)
Average time per event 0.441 usec
Average kallsyms__parse took: 162.313 ms (+- 0.599 ms)
...
Computing performance of sysfs PMU event scan for 100 times
Average core PMU scanning took: 53.720 usec (+- 7.823 usec)
Average PMU scanning took: 375.145 usec (+- 23.974 usec)
```
After:
```
$ perf bench internals all
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 127.829 usec (+- 0.079 usec)
Average num. events: 61.000 (+- 0.000)
Average time per event 2.096 usec
Average data synthesis took: 133.652 usec (+- 0.101 usec)
Average num. events: 327.000 (+- 0.000)
Average time per event 0.409 usec
Average kallsyms__parse took: 150.415 ms (+- 0.313 ms)
...
Computing performance of sysfs PMU event scan for 100 times
Average core PMU scanning took: 47.790 usec (+- 1.178 usec)
Average PMU scanning took: 376.945 usec (+- 23.683 usec)
```
Changbin Du [Wed, 22 May 2024 03:35:42 +0000 (11:35 +0800)]
perf trace beauty: Always show mmap prot even though PROT_NONE
PROT_NONE is also useful information, so do not omit the mmap prot even
when it is 0. syscall_arg__scnprintf_mmap_prot() can print PROT_NONE
for prot 0.
Before: PROT_NONE is not shown.
$ sudo perf trace -e syscalls:sys_enter_mmap --filter prot==0 -- ls
0.000 ls/2979231 syscalls:sys_enter_mmap(len: 4220888, flags: PRIVATE|ANONYMOUS)
Breno Leitao [Fri, 17 May 2024 14:14:26 +0000 (07:14 -0700)]
perf list: Fix the --no-desc option
Currently, the --no-desc option in perf list isn't functioning as
intended.
This issue arises from the overwriting of struct option->desc with the
opposite value of struct option->long_desc. Consequently, whatever
parse_options() returns at struct option->desc gets overridden later,
rendering the --desc or --no-desc arguments ineffective.
To resolve this, set ->desc as true by default and allow parse_options()
to adjust it accordingly. This adjustment will fix the --no-desc
option while preserving the functionality of the other parameters.
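The pattern behind the bug and the fix can be sketched as follows. This is a simplified model and not the perf list code: struct list_opts, parse_flags(), parse_buggy() and parse_fixed() are hypothetical, and parse_flags() only stands in for parse_options().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of the print state. */
struct list_opts {
	bool desc;
	bool long_desc;
};

/* Stand-in for option parsing; understands only three flags. */
static void parse_flags(struct list_opts *o, int argc, const char **argv)
{
	for (int i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--desc"))
			o->desc = true;
		else if (!strcmp(argv[i], "--no-desc"))
			o->desc = false;
		else if (!strcmp(argv[i], "--long-desc"))
			o->long_desc = true;
	}
}

/* Buggy pattern: desc is recomputed after parsing, discarding the user's choice. */
static struct list_opts parse_buggy(int argc, const char **argv)
{
	struct list_opts o = { .desc = false, .long_desc = false };

	parse_flags(&o, argc, argv);
	o.desc = !o.long_desc;  /* overrides --desc / --no-desc */
	return o;
}

/* Fixed pattern: default desc to true and let parsing adjust it. */
static struct list_opts parse_fixed(int argc, const char **argv)
{
	struct list_opts o = { .desc = true, .long_desc = false };

	parse_flags(&o, argc, argv);
	return o;
}
```

In this model, passing --no-desc still leaves desc set in parse_buggy(), while parse_fixed() honours it and keeps desc enabled when no flag is given.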