Git Repo - linux.git/log

perf pmu: Restore full PMU name wildcard support

Commit b2b9d3a3f021 ("perf pmu: Support wildcards on pmu name in dynamic
pmu events") gives the following example for wildcarding a subset of
PMUs:

  E.g., in a system with the following dynamic pmus:

        mypmu_0
        mypmu_1
        mypmu_2
        mypmu_4

  perf stat -e mypmu_[01]/<config>/

Since commit f91fa2ae6360 ("perf pmu: Refactor perf_pmu__match()"), only
"*" has been supported, removing the ability to subset PMUs, even though
parse-events.l still supports ? and [] characters.

Fix it by using fnmatch() when any glob character is detected and add a
test which covers that and other scenarios of
perf_pmu__match_ignoring_suffix().

Fixes: f91fa2ae6360 ("perf pmu: Refactor perf_pmu__match()")
Signed-off-by: James Clark <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf report: Display pregress bar on redirected pipe data

It's possible to save pipe output of perf record into a file.

  $ perf record -o- ... > pipe.data

And you can use the data same as the normal perf data.

  $ perf report -i pipe.data

In that case, perf tools will treat the input as a pipe, but it can get
the total size of the input.  This means it can show the progress bar
unlike the normal pipe input (which doesn't know the total size in
advance).

While at it, fix the string in __perf_session__process_dir_events().

Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf test stat_bpf_counter.sh: Stabilize the test results

The test has been failing for some time when two separate runs of
perf benchmarks are recorded for cycles events and their counts are
compared, while once the recording was done with option --bpf-counters
and once without it. It is expected that the count of the samples
should be within a certain range, firstly the difference was set to be
within 10%, which was then later raised to 20%. However, the test case
keeps failing on certain architectures as recording the provided
benchmark can produce completely different counts based on the
current load of the system.

Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat
--no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t":

Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':

         396782898      cycles

       0.010051983 seconds time elapsed

       0.008664000 seconds user
       0.097058000 seconds sys

Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':

        1431133032      cycles

       0.021803714 seconds time elapsed

       0.023377000 seconds user
       0.349918000 seconds sys

, which is ranging from 400mil to 1400mil samples.

Instead of recording the cycles use instructions event, which provides
more stable values. At the same time change the tested workload to one
of the provided testing workloads by perf that is not based on a
scheduler, which can provide another dependency on the current load.

Sampling instructions event with the new workload provide much more
stable results on intel-eaglestream-spr-13 of "perf stat --no-big-num
-e instructions -- perf test -w brstack":

Performance counter stats for 'perf test -w brstack':

          64584494      instructions

       0.009173945 seconds time elapsed

       0.007262000 seconds user
       0.002071000 seconds sys

Performance counter stats for 'perf test -w brstack':

          64672669      instructions

       0.008888135 seconds time elapsed

       0.005018000 seconds user
       0.004018000 seconds sys

Signed-off-by: Veronika Molnarova <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf python: Clean up build dependencies

The python build now depends on libraries and doesn't use
python-ext-sources except for the util/python.c dependency. Switch to
just directly depending on that file and util/setup.py. This allows
the removal of python-ext-sources.

Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: John Garry <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: Björn Roy Baron <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Paul Walmsley <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf python: Switch module to linking libraries from building source

setup.py was building most perf sources causing setup.py to mimic the
Makefile logic as well as flex/bison code to be stubbed out, due to
complexity building. By using libraries fewer functions are stubbed
out, the build is faster and the Makefile logic is reused which should
simplify updating. The libraries are passed through LDFLAGS to avoid
complexity in python.

Force the -fPIC flag for libbpf.a to ensure it is suitable for linking
into the perf python module.

Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: John Garry <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: Björn Roy Baron <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Paul Walmsley <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf util: Make util its own library

Make the util directory into its own library. This is done to avoid
compiling code twice, once for the perf tool and once for the perf
python module. For convenience:
  arch/common.c
  scripts/perl/Perf-Trace-Util/Context.c
  scripts/python/Perf-Trace-Util/Context.c
are made part of this library.

Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: John Garry <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: Björn Roy Baron <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Paul Walmsley <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf bench: Make bench its own library

Make the benchmark code into a library so it may be linked against
things like the python module to avoid compiling code twice.

Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: John Garry <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: Björn Roy Baron <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Paul Walmsley <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf test: Make tests its own library

Make the tests code its own library. This is done to avoid compiling
code twice, once for the perf tool and once for the perf python
module.

Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: John Garry <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: Björn Roy Baron <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Paul Walmsley <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf pmu-events: Make pmu-events a library

Make pmu-events into a library so it may be linked against things like
the python module and not built from source.

Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: John Garry <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: Björn Roy Baron <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Paul Walmsley <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf ui: Make ui its own library

Make the ui code its own library. This is done to avoid compiling code
twice, once for the perf tool and once for the perf python module.

Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: John Garry <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: Björn Roy Baron <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Paul Walmsley <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf build: Add '*.a' to clean targets

Fix some excessively long lines by deploying '\'.

Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Nick Terrell <[email protected]>
Cc: Gary Guo <[email protected]>
Cc: Alex Gaynor <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Wedson Almeida Filho <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Alice Ryhl <[email protected]>
Cc: Andrei Vagin <[email protected]>
Cc: Yicong Yang <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: John Garry <[email protected]>
Cc: Benno Lossin <[email protected]>
Cc: Björn Roy Baron <[email protected]>
Cc: Andreas Hindborg <[email protected]>
Cc: Paul Walmsley <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf mem: Fix a segfault with NULL event->name

Guilherme reported a crash in perf mem record.  It's because the
perf_mem_event->name was NULL on his machine.  It should just return
a NULL string when it has no format string in the name.

The backtrace at the crash is below:

  Program received signal SIGSEGV, Segmentation fault.
  __strchrnul_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:67
  67              vmovdqu (%rdi), %ymm2
  (gdb) bt
  #0  __strchrnul_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:67
  #1  0x00007ffff6c982de in __find_specmb (format=0x0) at printf-parse.h:82
  #2  __printf_buffer (buf=buf@entry=0x7fffffffc760, format=format@entry=0x0, ap=ap@entry=0x7fffffffc880,
      mode_flags=mode_flags@entry=0) at vfprintf-internal.c:649
  #3  0x00007ffff6cb7840 in __vsnprintf_internal (string=<optimized out>, maxlen=<optimized out>, format=0x0,
      args=0x7fffffffc880, mode_flags=mode_flags@entry=0) at vsnprintf.c:96
  #4  0x00007ffff6cb787f in ___vsnprintf (string=<optimized out>, maxlen=<optimized out>, format=<optimized out>,
      args=<optimized out>) at vsnprintf.c:103
  #5  0x00005555557b9391 in scnprintf (buf=0x555555fe9320 <mem_loads_name> "", size=100, fmt=0x0)
      at ../lib/vsprintf.c:21
  #6  0x00005555557b74c3 in perf_pmu__mem_events_name (i=0, pmu=0x555556832180) at util/mem-events.c:106
  #7  0x00005555557b7ab9 in perf_mem_events__record_args (rec_argv=0x55555684c000, argv_nr=0x7fffffffca20)
      at util/mem-events.c:252
  #8  0x00005555555e370d in __cmd_record (argc=3, argv=0x7fffffffd760, mem=0x7fffffffcd80) at builtin-mem.c:156
  #9  0x00005555555e49c4 in cmd_mem (argc=4, argv=0x7fffffffd760) at builtin-mem.c:514
  #10 0x000055555569716c in run_builtin (p=0x555555fcde80 <commands+672>, argc=8, argv=0x7fffffffd760) at perf.c:349
  #11 0x0000555555697402 in handle_internal_command (argc=8, argv=0x7fffffffd760) at perf.c:402
  #12 0x0000555555697560 in run_argv (argcp=0x7fffffffd59c, argv=0x7fffffffd590) at perf.c:446
  #13 0x00005555556978a6 in main (argc=8, argv=0x7fffffffd760) at perf.c:562

Reported-by: Guilherme Amadio <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Closes: https://lore.kernel.org/linux-perf-users/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf tools: Fix a compiler warning of NULL pointer

A compiler warning on the second argument of bsearch() should not be
NULL, but there's a case we might pass it. Let's return early if we
don't have any DSOs to search in __dsos__find_by_longname_id().

util/dsos.c:184:8: runtime error: null pointer passed as argument 2, which is declared to never be null

Reported-by: kernel test robot <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf symbol: Simplify kernel module checking

In dso__load(), it checks if the dso is a kernel module by looking the
symtab type. Actually dso has 'is_kmod' field to check that easily and
dso__set_module_info() set the symtab type and the is_kmod bit. So it
should have the same result to check the is_kmod bit.

Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf report: Fix condition in sort__sym_cmp()

It's expected that both hist entries are in the same hists when
comparing two.  But the current code in the function checks one without
dso sort key and other with the key.  This would make the condition true
in any case.

I guess the intention of the original commit was to add '!' for the
right side too.  But as it should be the same, let's just remove it.

Fixes: 69849fc5d2119 ("perf hists: Move sort__has_dso into struct perf_hpp_list")
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf pmus: Fixes always false when compare duplicates aliases

In the previous loop, all the members in the aliases[j-1] have been freed
and set to NULL. But in this loop, the function pmu_alias_is_duplicate()
compares the aliases[j] with the aliases[j-1] that has already been
disposed, so the function will always return false and duplicate aliases
will never be discarded.

If we find duplicate aliases, it skips the zfree aliases[j], which is
accompanied by a memory leak.

We can use the next aliases[j+1] to theck for duplicate aliases to
fixes the aliases NULL pointer dereference, then goto zfree code snippet
to release it.

After patch testing:
$ perf list --unit=hisi_sicl,cpa pmu

uncore cpa:
   cpa_p0_rd_dat_32b
        [Number of read ops transmitted by the P0 port which size is 32 bytes.
         Unit: hisi_sicl,cpa]
   cpa_p0_rd_dat_64b
        [Number of read ops transmitted by the P0 port which size is 64 bytes.
         Unit: hisi_sicl,cpa]

Fixes: c3245d2093c1 ("perf pmu: Abstract alias/event struct")
Signed-off-by: Junhao He <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf unwind-libunwind: Add malloc() failure handling

Add malloc() failure handling in unread_unwind_spec_debug_frame().
This make caller find_proc_info() works well when the allocation failure.

Signed-off-by: Yunseong Kim <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Austin Kim <[email protected]>
Cc: [email protected]
Cc: Ze Gao <[email protected]>
Cc: Leo Yan <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

util: constant -1 with expression of type char

This patch resolve following warning.

  tools/perf/util/evsel.c:1620:9: error: result of comparison of constant
   -1 with expression of type 'char' is always false
   -Werror,-Wtautological-constant-out-of-range-compare
   1620 |                 if (c == -1)
        |                     ~ ^  ~~

Signed-off-by: Yunseong Kim <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Austin Kim <[email protected]>
Cc: [email protected]
Cc: Ze Gao <[email protected]>
Cc: Leo Yan <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf: Timehist account sch delay for scheduled out running

When using perf timehist, sch delay is only computed for a waking task,
not for a pre empted task. This patches changes sch delay to account for
both. This makes sense as testing scheduling policy need to consider the
effect of scheduling delay globally, not only for waking tasks.

Example of `perf timehist` report before the patch for `stress` task
competing with each other.

First column is wait time, second column sch delay, third column
runtime.

1.492060 [0000]  s    stress[81]                          1.999      0.000      2.000      R  next: stress[83]
1.494060 [0000]  s    stress[83]                          2.000      0.000      2.000      R  next: stress[81]
1.496060 [0000]  s    stress[81]                          2.000      0.000      2.000      R  next: stress[83]
1.498060 [0000]  s    stress[83]                          2.000      0.000      1.999      R  next: stress[81]

After the patch, it looks like this (note that all wait time is not zero
anymore):

1.492060 [0000]  s    stress[81]                          1.999      1.999      2.000      R  next: stress[83]
1.494060 [0000]  s    stress[83]                          2.000      2.000      2.000      R  next: stress[81]
1.496060 [0000]  s    stress[81]                          2.000      2.000      2.000      R  next: stress[83]
1.498060 [0000]  s    stress[83]                          2.000      2.000      1.999      R  next: stress[81]

Signed-off-by: Fernand Sieber <[email protected]>
Reviewed-by: Madadi Vineeth Reddy <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf tests: Add APX and other new instructions to x86 instruction decoder test

Add samples of APX and other new instructions to the 'x86 instruction
decoder - new instructions' test.

Note the test is only available if the perf tool has been built with
EXTRA_TESTS=1.

Example:

  $ make EXTRA_TESTS=1 -C tools/perf
  $ tools/perf/perf test -F -v 'new ins' |& grep -i 'jmpabs\|popp\|pushp'
  Decoded ok: d5 00 a1 ef cd ab 90 78 56 34 12    jmpabs $0x1234567890abcdef
  Decoded ok: d5 08 53                    pushp  %rbx
  Decoded ok: d5 18 50                    pushp  %r16
  Decoded ok: d5 19 57                    pushp  %r31
  Decoded ok: d5 19 5f                    popp   %r31
  Decoded ok: d5 18 58                    popp   %r16
  Decoded ok: d5 08 5b                    popp   %rbx

Signed-off-by: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chang S. Bae <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder

JMPABS is 64-bit absolute direct jump instruction, encoded with a mandatory
REX2 prefix. JMPABS is designed to be used in the procedure linkage table
(PLT) to replace indirect jumps, because it has better performance. In that
case the jump target will be amended at run time. To enable Intel PT to
follow the code, a TIP packet is always emitted when JMPABS is traced under
Intel PT.

Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.

Decode JMPABS as an indirect jump, because it has an associated TIP packet
the same as an indirect jump and the control flow should follow the TIP
packet payload, and not assume it is the same as the on-file object code
JMPABS target address.

Signed-off-by: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Chang S. Bae <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Nikolay Borisov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf test: Check output of the probe ... --funcs command

Test "perf probe of function from different CU" only checks if the perf
command has failed and doesn't test the --funcs output. In the issue
reported in the previous commit, the garbage output of the --funcs
command was being ignored by the test when it could have been caught.

The script first makes use of --funcs option with the perf probe command
to check if the function "foo" exists in the testfile before adding a
probe to it in the next command. The output of probe...--funcs command
is redirected to stdout, therefore, add '| grep "foo"' to validate the
result.

Signed-off-by: Chaitanya S Prakash <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

tools/perf: Fix parallel-perf python script to replace new python syntax ":=" usage

perf test "perf script tests" fails as below in systems
with python 3.6

File "/home/athira/linux/tools/perf/tests/shell/../../scripts/python/parallel-perf.py", line 442
if line := p.stdout.readline():
^
SyntaxError: invalid syntax
--- Cleaning up ---
---- end(-1) ----
92: perf script tests: FAILED!

This happens because ":=" is a new syntax that assigns values
to variables as part of a larger expression. This is introduced
from python 3.8 and hence fails in setup with python 3.6
Address this by splitting the large expression and check the
value in two steps:
Previous line: if line := p.stdout.readline():
Current change:
line = p.stdout.readline()
if line:

With patch

./perf test "perf script tests"
93: perf script tests: Ok

Signed-off-by: Athira Rajeev <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

tools/perf: Use is_perf_pid_map_name helper function to check dso's of pattern /tmp/perf-%d.map

commit 80d496be89ed ("perf report: Add support for profiling JIT
generated code") added support for profiling JIT generated code.
This patch handles dso's of form "/tmp/perf-$PID.map".

Some of the references doesn't check exactly for same pattern.
some uses "if (!strncmp(dso_name, "/tmp/perf-", 10))". Fix
this by using helper function perf_pid_map_tid and
is_perf_pid_map_name which looks for proper pattern of
form: "/tmp/perf-$PID.map" for these checks.

Signed-off-by: Athira Rajeev <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

tools/perf: Fix the string match for "/tmp/perf-$PID.map" files in dso__load

Perf test for perf probe of function from different CU fails
as below:

./perf test -vv "test perf probe of function from different CU"
116: test perf probe of function from different CU:
--- start ---
test child forked, pid 2679
Failed to find symbol foo in /tmp/perf-uprobe-different-cu-sh.Msa7iy89bx/testfile
  Error: Failed to add events.
--- Cleaning up ---
"foo" does not hit any event.
  Error: Failed to delete events.
---- end(-1) ----
116: test perf probe of function from different CU                   : FAILED!

The test does below to probe function "foo" :

# gcc -g -Og -flto -c /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.c
-o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.o
# gcc -g -Og -c /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.c
-o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.o
# gcc -g -Og -o /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile
/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-foo.o
/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile-main.o

# ./perf probe -x /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile foo
Failed to find symbol foo in /tmp/perf-uprobe-different-cu-sh.XniNxNEVT7/testfile
   Error: Failed to add events.

Perf probe fails to find symbol foo in the executable placed in
/tmp/perf-uprobe-different-cu-sh.XniNxNEVT7

Simple reproduce:

# mktemp -d /tmp/perf-checkXXXXXXXXXX
   /tmp/perf-checkcWpuLRQI8j

# gcc -g -o test test.c
# cp test /tmp/perf-checkcWpuLRQI8j/
# nm /tmp/perf-checkcWpuLRQI8j/test | grep foo
   00000000100006bc T foo

# ./perf probe -x /tmp/perf-checkcWpuLRQI8j/test foo
   Failed to find symbol foo in /tmp/perf-checkcWpuLRQI8j/test
      Error: Failed to add events.

But it works with any files like /tmp/perf/test. Only for
patterns with "/tmp/perf-", this fails.

Further debugging, commit 80d496be89ed ("perf report: Add support
for profiling JIT generated code") added support for profiling JIT
generated code. This patch handles dso's of form
"/tmp/perf-$PID.map" .

The check used "if (strncmp(self->name, "/tmp/perf-", 10) == 0)"
to match "/tmp/perf-$PID.map". With this commit, any dso in
/tmp/perf- folder will be considered separately for processing
(not only JIT created map files ). Fix this by changing the
string pattern to check for "/tmp/perf-%d.map". Add a helper
function is_perf_pid_map_name to do this check. In "struct dso",
dso->long_name holds the long name of the dso file. Since the
/tmp/perf-$PID.map check uses the complete name, use dso___long_name for
the string name.

With the fix,
# ./perf test "test perf probe of function from different CU"
117: test perf probe of function from different CU                   : Ok

Fixes: 56cbeacf1435 ("perf probe: Add test for regression introduced by switch to die_get_decl_file()")
Signed-off-by: Athira Rajeev <[email protected]>
Reviewed-by: Chaitanya S Prakash <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf test: Make test_arm_callgraph_fp.sh more robust

The 2 second sleep can cause the test to fail on very slow network file
systems because Perf ends up being killed before it finishes starting
up.

Fix it by making the leafloop workload end after a fixed time like the
other workloads so there is no need to kill it after 2 seconds.

Also remove the 1 second start sampling delay because it is similarly
fragile. Instead, search through all samples for a matching one, rather
than just checking the first sample and hoping it's in the right place.

Fixes: cd6382d82752 ("perf test arm64: Test unwinding using fame-pointer (fp) mode")
Signed-off-by: James Clark <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: German Gomez <[email protected]>
Cc: Spoorthy S <[email protected]>
Cc: Kajol Jain <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf build: Ensure libtraceevent and libtracefs versions have 3 components

When either of these have a shorter version, like 1.8, the expression
that computes the version has a syntax error that can be seen in the
output of make:

expr: syntax error: missing argument after +

Link: https://bugs.gentoo.org/917559
Reported-by: Peter Volkov <[email protected]>
Signed-off-by: Guilherme Amadio <[email protected]>
Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf build: Use pkg-config for feature check for libtrace{event,fs}

Needed to add required include directories for the feature detection
to succeed. The header tracefs.h is installed either into the include
directory /usr/include/tracefs/tracefs.h when using the Makefile, or
into /usr/include/libtracefs/tracefs.h when using meson to build
libtracefs. The header tracefs.h uses #include <event-parse.h> from
libtraceevent, so pkg-config needs to pick the correct include directory
for libtracefs and add the one for libtraceevent to succeed.

Note that in baa2ca59ec1e31ccbe3f24ff0368152b36f68720 the variable
LIBTRACEEVENT_DIR was introduced, and now the method to compile against
non-standard locations requires PKG_CONFIG_PATH to be set instead, which
works for both libtraceevent and libtracefs.

Signed-off-by: Guilherme Amadio <[email protected]>
Reviewed-by: Leo Yan <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf arm: Workaround ARM PMUs cpu maps having offline cpus

When PMUs have a cpu map in the 'cpus' or 'cpumask' file, perf will
try to open events on those CPUs. ARM doesn't remove offline CPUs
meaning taking a CPU offline will cause perf commands to fail unless a
CPU map is passed on the command line.

More context in:
https://lore.kernel.org/lkml/20240603092812 [email protected]/

Reported-by: Yicong Yang <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Yicong Yang <[email protected]>
Tested-by: Leo Yan <[email protected]>
Cc: James Clark <[email protected]>
Cc: Suzuki K Poulose <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mike Leach <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: John Garry <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf stat: Fix the hard-coded metrics calculation on the hybrid

The hard-coded metrics is wrongly calculated on the hybrid machine.

$ perf stat -e cycles,instructions -a sleep 1

Performance counter stats for 'system wide':

        18,205,487      cpu_atom/cycles/
         9,733,603      cpu_core/cycles/
         9,423,111      cpu_atom/instructions/     #  0.52  insn per cycle
         4,268,965      cpu_core/instructions/     #  0.23  insn per cycle

The insn per cycle for cpu_core should be 4,268,965 / 9,733,603 = 0.44.

When finding the metric events, the find_stat() doesn't take the PMU
type into account. The cpu_atom/cycles/ is wrongly used to calculate
the IPC of the cpu_core.

In the hard-coded metrics, the events from a different PMU are only
SW_CPU_CLOCK and SW_TASK_CLOCK. They both have the stat type,
STAT_NSECS. Except the SW CLOCK events, check the PMU type as well.

Fixes: 0a57b910807a ("perf stat: Use counts rather than saved_value")
Reported-by: Khalil, Amiri <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add westmereex counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add westmereep-sp counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add westmereep-dp counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add/update tigerlake events/metrics

Update events from v1.15 to v1.16.
Update TMA metrics from v4.7 to v4.8.

Bring in the event updates v1.16:
https://github.com/intel/perfmon/commit/43f3b8d6f82f3174bd3bffe8587e2179f086d2ce

The TMA 4.8 information was added in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Add counter information. The most recent RFC patch set using this
information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add snowridgex counter information

Update/remove events as per v1.23:
https://github.com/intel/perfmon/commit/9debd874e1b2b0cca42b9ba2342cacaaace2f0ce

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add/update skylakex events/metrics

Update events from v1.33 to v1.35.
Update TMA metrics from v4.7 to v4.8.

Bring in the event updates v1.35:
https://github.com/intel/perfmon/commit/c99b60c147b96f40f96dd961abfae54909f47e5f

The TMA 4.8 information was added in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Add counter information. The most recent RFC patch set using this
information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

Adds the event SW_PREFETCH_ACCESS.ANY.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add/update skylake events/metrics

Update events from v58 to v59.
Update TMA metrics from v4.7 to v4.8.

Bring in the event updates v59:
https://github.com/intel/perfmon/commit/5d36f1835b02f056031a06e777e4bf54a5964930

The TMA 4.8 information was added in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Add counter information. The most recent RFC patch set using this
information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

Adds the event SW_PREFETCH_ACCESS.ANY.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add silvermont counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add/update sierraforest events/metrics

Update events from v1.02 to v1.04.
Add TMA metrics v4.8.

Bring in the event updates v1.04:
https://github.com/intel/perfmon/commit/0a9546cdf63c8b07f5c33ebf6fe49e6ebec89f86
v1.03:
https://github.com/intel/perfmon/commit/c7dd26ce67ca4477d40fb4b55b6baa0584b3e5d6

The TMA 4.8 information was added in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Add counter information. The most recent RFC patch set using this
information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

New events are:
FP_INST_RETIRED.128B_DP,
FP_INST_RETIRED.128B_SP,
FP_INST_RETIRED.256B_DP,
FP_INST_RETIRED.32B_SP,
FP_INST_RETIRED.64B_DP,
OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM,
OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD,
OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM,
OCR.STREAMING_WR.ANY_RESPONSE,
UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR_LOCAL,
UNC_CHA_TOR_INSERTS.IO_ITOMCACHENEAR_REMOTE,
UNC_CHA_TOR_INSERTS.IO_ITOM_LOCAL,
UNC_CHA_TOR_INSERTS.IO_ITOM_REMOTE,
UNC_CHA_TOR_INSERTS.IO_MISS,
UNC_CHA_TOR_INSERTS.IO_MISS_ITOM,
UNC_CHA_TOR_INSERTS.IO_MISS_ITOMCACHENEAR,
UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_LOCAL,
UNC_CHA_TOR_INSERTS.IO_PCIRDCUR_REMOTE,
UNC_CXLCM_RxC_PACK_BUF_INSERTS.MEM_DATA,
UNC_CXLDP_TxC_AGF_INSERTS.M2S_DATA.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add/update sapphirerapids events/metrics

Update events from v1.20 to v1.23.
Update TMA metrics from v4.7 to v4.8.

Bring in the event updates v1.23:
https://github.com/intel/perfmon/commit/6ace93281c0f573b90d3f8f624486ad59dde1c93
v1.22:
https://github.com/intel/perfmon/commit/356eba05c07c4d54ed5b92c1164ce00fab545636

The TMA 4.8 information was added in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Add counter information. The most recent RFC patch set using this
information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

New events are:
EXE_ACTIVITY.2_3_PORTS_UTIL,
ICACHE_DATA.STALL_PERIODS,
L2_TRANS.L2_WB,
MEM_TRANS_RETIRED.LOAD_LATENCY_GT_1024,
OFFCORE_REQUESTS.DEMAND_CODE_RD,
OFFCORE_REQUESTS.DEMAND_RFO,
OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE_RD,
OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD,
RS.EMPTY_RESOURCE,
SW_PREFETCH_ACCESS.ANY,
UOPS_ISSUED.CYCLES.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update sandybridge metrics add event counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

The TMA 4.8 information was updated in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add/update rocketlake events/metrics

Update events from v1.02 to v1.03.
Update TMA metrics from v4.7 to v4.8.

Bring in the event updates v1.03:
https://github.com/intel/perfmon/commit/a7c75ffd56c7056494cd3acc2749336cd6363b90

The TMA 4.8 information was added in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Add counter information. The most recent RFC patch set using this
information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

Adds the event SW_PREFETCH_ACCESS.ANY.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add nehalemex counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add nehalemep counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update meteorlake events and add counter information

Update events from v1.08 to v1.10.

Bring in the event updates v1.10:
https://github.com/intel/perfmon/commit/3bee3dc150164df0bec5980ca5586930730e5778
v1.09:
https://github.com/intel/perfmon/commit/01c8c99f17a72460b2eaf7efe3495913f36c9d42

Add counter information. The most recent RFC patch set using this
information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

New events are:
EXE_ACTIVITY.2_3_PORTS_UTIL,
FP_INST_RETIRED.128B_DP,
FP_INST_RETIRED.128B_SP,
FP_INST_RETIRED.256B_DP,
FP_INST_RETIRED.32B_SP,
FP_INST_RETIRED.64B_DP,
FP_VINT_UOPS_EXECUTED.STD,
L2_LINES_OUT.USELESS_HWPF,
L2_RQSTS.SWPF_HIT,
L2_RQSTS.SWPF_MISS,
LOAD_HIT_PREFETCH.SWPF,
MACHINE_CLEARS.ANY,
MACHINE_CLEARS.MRN_NUKE,
MISC_RETIRED.LBR_INSERTS,
SW_PREFETCH_ACCESS.ANY.

The metrics aren't updated as they require retirement latency support
that is added in this series:
https://lore.kernel.org/lkml/20240613033631 [email protected]/

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add lunarlake counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add knightslanding counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update jaketown metrics add event counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

The TMA 4.8 information was updated in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update ivytown metrics add event counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

The TMA 4.8 information was updated in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update ivybridge metrics add event counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

The TMA 4.8 information was updated in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add/update icelakex events/metrics

Update events from v1.24 to v1.26.
Add TMA metrics v4.8.

Bring in the event updates v1.26:
https://github.com/intel/perfmon/commit/c607c739e05f2569f95998cc98e1283f042b4fd1
v1.25:
https://github.com/intel/perfmon/commit/42d996769069921ec06f6fbb600b0c663b9ec5a9

The TMA 4.8 information was added in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Adds the event SW_PREFETCH_ACCESS.ANY.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add/update icelake events/metrics

Update events from v1.21 to v1.22.
Add TMA metrics v4.8.

Bring in the event updates v1.22:
https://github.com/intel/perfmon/commit/e5640646e96d59e3c1c1e0d0100a475220ff1dfe

The TMA 4.8 information was added in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Adds the event SW_PREFETCH_ACCESS.ANY.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update haswellx metrics add event counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

The TMA 4.8 information was updated in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add haswell counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update graniterapids events and add counter information

Update events from v1.01 to v1.02.

Bring in the event updates v1.02:
https://github.com/intel/perfmon/commit/0ff9f681bd07d0e84026c52f4941d21b1cd4c171

Add counter information. The most recent RFC patch set using this
information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

There are over 1000 new events.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update/add grandridge events/metrics

Update events from v1.02 to v1.03.
Add TMA metrics v4.8.

Bring in the event updates v1.03:
https://github.com/intel/perfmon/commit/5ec7a252d0f6ec461f80cc397c9ac25abcd9184f

The TMA 4.8 information was added in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

New events are:
FP_INST_RETIRED.128B_DP,
FP_INST_RETIRED.128B_SP,
FP_INST_RETIRED.256B_DP,
FP_INST_RETIRED.32B_SP,
FP_INST_RETIRED.64B_DP,
OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM,
OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD,
OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM,
OCR.STREAMING_WR.ANY_RESPONSE.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add goldmontplus counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add goldmont counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add/update emeraldrapids events/metrics

Update events from v1.06 to v1.09.
Add TMA metrics v4.8.

Bring in the event updates v1.09:
https://github.com/intel/perfmon/commit/3fd5892bb4aece9c1e5c17630570d0462838e85d
v1.08:
https://github.com/intel/perfmon/commit/54525c4508f4a1ce4a8b854aa808a4ee2fb5930b

The TMA 4.8 information was added in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

New events are:
EXE_ACTIVITY.2_3_PORTS_UTIL,
ICACHE_DATA.STALL_PERIODS,
L2_TRANS.L2_WB,
MEM_TRANS_RETIRED.LOAD_LATENCY_GT_1024,
OFFCORE_REQUESTS.DEMAND_CODE_RD,
OFFCORE_REQUESTS.DEMAND_RFO,
OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE_RD,
OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD,
RS.EMPTY_RESOURCE,
SW_PREFETCH_ACCESS.ANY,
UNC_IIO_BANDWIDTH_OUT.PART[0-7]_FREERUN,
UOPS_ISSUED.CYCLES.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update elkhartlake events

Update events from v1.04 to v1.05. Bring in event updates from:
https://github.com/intel/perfmon/commit/fb91e1851ca40a5b443e2c3cd79bc7fc34c8237e

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update cascadelakex events/metrics

Update events from v1.21 to v1.22.

Bring in the event updates v1.22
https://github.com/intel/perfmon/commit/013877729c4ed96427932ca48722bc3bfd2a0075

The TMA 4.8 information was updated in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

New events are:
SW_PREFETCH_ACCESS.ANY

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update broadwellx metrics add event counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

The TMA 4.8 information was updated in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update broadwellde metrics add event counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

The TMA 4.8 information was updated in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update broadwell metrics add event counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

The TMA 4.8 information was updated in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Add bonnell counter information

Add counter information necessary for optimizing event grouping the
perf tool.

The most recent RFC patch set using this information:
https://lore.kernel.org/lkml/20240412210756 [email protected]/

The information was added in:
https://github.com/intel/perfmon/commit/475892a9690cb048949e593fe39cee65cd4765e1
and later patches.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update alderlaken events/metrics

Update events from v1.24 to v1.27.
Update e-core TMA metrics to v3.6.

Bring in the event updates v1.27:
https://github.com/intel/perfmon/commit/ea4f309a04c50ca77a00da2db130fd7cf06db978
v1.26:
https://github.com/intel/perfmon/commit/0052e68d24d9873d5ff22363677794fa3eb05313

The e-core TMA 3.6 information was updated in:
https://github.com/intel/perfmon/commit/d9c2faa70bafe03129dc10f9fe414ef03a95acd9

New events are:
MEM_UOPS_RETIRED.LOCK_LOADS,
SERIALIZATION.C01_MS_SCB,
UOPS_ISSUED.ANY.

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf vendor events: Update alderlake events/metrics

Update events from v1.24 to v1.27.
Update p-core TMA metrics from v4.7 to v4.8, and the e-core TMA
metrics to v3.6.

Bring in the event updates v1.27:
https://github.com/intel/perfmon/commit/ea4f309a04c50ca77a00da2db130fd7cf06db978
v1.26:
https://github.com/intel/perfmon/commit/0052e68d24d9873d5ff22363677794fa3eb05313

The p-core TMA 4.8 information was updated in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736
And e-core in:
https://github.com/intel/perfmon/commit/d9c2faa70bafe03129dc10f9fe414ef03a95acd9

New events are:
EXE_ACTIVITY.2_3_PORTS_UTIL,
ICACHE_DATA.STALL_PERIODS,
L2_TRANS.L2_WB,
MEM_TRANS_RETIRED.LOAD_LATENCY_GT_1024,
MEM_UOPS_RETIRED.LOCK_LOADS,
OFFCORE_REQUESTS.DEMAND_CODE_RD,
OFFCORE_REQUESTS.DEMAND_RFO,
OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_CODE_RD,
OFFCORE_REQUESTS_OUTSTANDING.DEMAND_CODE_RD,
RS.EMPTY_RESOURCE,
SERIALIZATION.C01_MS_SCB,
SW_PREFETCH_ACCESS.ANY,
UOPS_ISSUED.ANY,
UOPS_ISSUED.CYCLES

Co-authored-by: Weilin Wang <[email protected]>
Co-authored-by: Caleb Biggers <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf doc: Add AMD IBS usage document

Add a perf man page document that describes how to exploit AMD IBS with
Linux perf. Brief intro about IBS and simple one-liner examples will help
naive users to get started. This is not meant to be an exhaustive IBS
guide. User should refer latest AMD64 Architecture Programmer's Manual
for detailed description of IBS.

Usage:

$ man perf-amd-ibs

Signed-off-by: Ravi Bangoria <[email protected]>
Reviewed-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

tools/perf: Handle perftool-testsuite_probe testcases fail when kernel debuginfo is not present

Running "perftool-testsuite_probe" fails as below:

./perf test -v "perftool-testsuite_probe"
83: perftool-testsuite_probe  : FAILED

There are three fails:

1. Regexp not found: "\s*probe:inode_permission(?:_\d+)?\s+$on inode_permission(?:[:\+][0-9A-Fa-f]+)?@.+$"
   -- [ FAIL ] -- perf_probe :: test_adding_kernel :: listing added probe :: perf probe -l (output regexp parsing)

2. Regexp not found: "probe:vfs_mknod"
   Regexp not found: "probe:vfs_create"
   Regexp not found: "probe:vfs_rmdir"
   Regexp not found: "probe:vfs_link"
   Regexp not found: "probe:vfs_write"
   -- [ FAIL ] -- perf_probe :: test_adding_kernel :: wildcard adding support (command exitcode + output regexp parsing)

3. Regexp not found: "Failed to find"
   Regexp not found: "somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64"
   Regexp not found: "in this function|at this address"
   Line did not match any pattern: "The /boot/vmlinux file has no debug information."
   Line did not match any pattern: "Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo package."

These three tests depends on kernel debug info.
1. Fail 1 expects file name along with probe which needs debuginfo
2. Fail 2 :
    perf probe -nf --max-probes=512 -a 'vfs_* $params'
    Debuginfo-analysis is not supported.
     Error: Failed to add events.

3. Fail 3 :
   perf probe 'vfs_read somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64'
   Debuginfo-analysis is not supported.
   Error: Failed to add events.

There is already helper function skip_if_no_debuginfo in
lib/probe_vfs_getname.sh which does perf probe and returns
"2" if debug info is not present. Use the skip_if_no_debuginfo
function and skip only the three tests which needs debuginfo
based on the result.

With the patch:

    83: perftool-testsuite_probe:
   --- start ---
   test child forked, pid 3927
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission ::
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: -a
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: adding probe inode_permission :: --add
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: listing added probe :: perf list
   Regexp not found: "\s*probe:inode_permission(?:_\d+)?\s+$on inode_permission(?:[:\+][0-9A-Fa-f]+)?@.+$"
   -- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: using added probe
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: deleting added probe
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: listing removed probe (should NOT be listed)
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: dry run :: adding probe
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: first probe adding
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: second probe adding (without force)
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: force-adding probes :: second probe adding (with force)
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: using doubled probe
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: removing multiple probes
   Regexp not found: "probe:vfs_mknod"
   Regexp not found: "probe:vfs_create"
   Regexp not found: "probe:vfs_rmdir"
   Regexp not found: "probe:vfs_link"
   Regexp not found: "probe:vfs_write"
   -- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped
   Regexp not found: "Failed to find"
   Regexp not found: "somenonexistingrandomstuffwhichisalsoprettylongorevenlongertoexceed64"
   Regexp not found: "in this function|at this address"
   Line did not match any pattern: "The /boot/vmlinux file has no debug information."
   Line did not match any pattern: "Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo package."
   -- [ SKIP ] -- perf_probe :: test_adding_kernel :: 2 2 Skipped due to missing debuginfo :: testcase skipped
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: function with retval :: add
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: function with retval :: record
   -- [ PASS ] -- perf_probe :: test_adding_kernel :: function argument probing :: script
   ## [ PASS ] ## perf_probe :: test_adding_kernel SUMMARY
   ---- end(0) ----
   83: perftool-testsuite_probe                                        : Ok

Only the three specific tests are skipped and remaining
ran successfully.

Signed-off-by: Athira Rajeev <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf hist: Honor symbol_conf.skip_empty

So that it can skip events with no sample according to the config value.
This can omit the dummy event in the output of perf report --group.

An example output:

  $ sudo perf mem record -a sleep 1
  $ sudo perf report --group

Before)
  #
  # Samples: 232  of events 'cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u'
  # Event count (approx.): 3089861
  #
  #                 Overhead  Command      Shared Object      Symbol
  # ........................  ...........  .................  .....................................
  #
       9.29%   0.00%   0.00%  swapper      [kernel.kallsyms]  [k] update_blocked_averages
       5.26%   0.15%   0.00%  swapper      [kernel.kallsyms]  [k] __update_load_avg_se
       4.15%   0.00%   0.00%  perf-exec    [kernel.kallsyms]  [k] slab_update_freelist.isra.0
       3.87%   0.00%   0.00%  perf-exec    [kernel.kallsyms]  [k] memcg_slab_post_alloc_hook
       3.79%   0.17%   0.00%  swapper      [kernel.kallsyms]  [k] enqueue_task_fair
       3.63%   0.00%   0.00%  sleep        [kernel.kallsyms]  [k] next_uptodate_page
       2.86%   0.00%   0.00%  swapper      [kernel.kallsyms]  [k] __update_load_avg_cfs_rq
       2.78%   0.00%   0.00%  swapper      [kernel.kallsyms]  [k] __schedule
       2.34%   0.00%   0.00%  swapper      [kernel.kallsyms]  [k] intel_idle
       2.32%   0.97%   0.00%  swapper      [kernel.kallsyms]  [k] psi_group_change

After)
  #
  # Samples: 232  of events 'cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P'
  # Event count (approx.): 3089861
  #
  #         Overhead  Command      Shared Object      Symbol
  # ................  ...........  .................  .....................................
  #
       9.29%   0.00%  swapper      [kernel.kallsyms]  [k] update_blocked_averages
       5.26%   0.15%  swapper      [kernel.kallsyms]  [k] __update_load_avg_se
       4.15%   0.00%  perf-exec    [kernel.kallsyms]  [k] slab_update_freelist.isra.0
       3.87%   0.00%  perf-exec    [kernel.kallsyms]  [k] memcg_slab_post_alloc_hook
       3.79%   0.17%  swapper      [kernel.kallsyms]  [k] enqueue_task_fair
       3.63%   0.00%  sleep        [kernel.kallsyms]  [k] next_uptodate_page
       2.86%   0.00%  swapper      [kernel.kallsyms]  [k] __update_load_avg_cfs_rq
       2.78%   0.00%  swapper      [kernel.kallsyms]  [k] __schedule
       2.34%   0.00%  swapper      [kernel.kallsyms]  [k] intel_idle
       2.32%   0.97%  swapper      [kernel.kallsyms]  [k] psi_group_change

Now it doesn't have a column for the dummy event.

Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf hist: Add symbol_conf.skip_empty

Add the skip_empty flag to symbol_conf and set the value from the report
command to preserve the existing behavior. This makes the code simpler
and will be needed other code which is hard to add a new argument.

Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf hist: Simplify __hpp_fmt() using hpp_fmt_data

The struct hpp_fmt_data is to keep the values for each group members so
it doesn't need to check the event index in the group.

Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf hist: Factor out __hpp__fmt_print()

Split the logic to print the histogram values according to the format
string. This was used in 3 different places so it's better to move out
the logic into a function.

Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf: sched map skips redundant lines with cpu filters

perf sched map supports cpu filter.
However, even with cpu filters active, any context switch currently
corresponds to a separate line.
As result, context switches on irrelevant cpus result to redundant lines,
which makes the output particlularly difficult to read on wide
architectures.

Fix it by skipping printing for irrelevant CPUs.

Example snippet of output before fix:

  *B0       1.461147 secs
   B0
   B0
   B0
  *G0       1.517139 secs

After fix:

  *B0       1.461147 secs
  *G0       1.517139 secs

Signed-off-by: Fernand Sieber <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Reviewed-and-tested-by: Madadi Vineeth Reddy <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf test pmu: Warn don't fail for legacy mixed case event names

PowerPC has mixed case events matching legacy hardware cache
events. Warn but don't fail in this case. Event parsing will still
work in this case by matching the legacy case.

Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Kajol Jain <[email protected]>
Cc: James Clark <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

tools/perf: Fix timing issue with parallel threads in perf bench wake-up-parallel

perf bench futex fails as below and hangs intermittently when
attempted to run on on a powerpc system:

./perf bench futex wake-parallel
Running 'futex/wake-parallel' benchmark:
Run summary [PID 88588]: blocking on 640 threads (at [private] futex 0x10464b8c), 640 threads waking up 1 at a time.

[Run 1]: Avg per-thread latency (waking 1/640 threads) in 0.1309 ms (+-53.27%)
[Run 2]: Avg per-thread latency (waking 1/640 threads) in 0.0120 ms (+-31.16%)
[Run 3]: Avg per-thread latency (waking 1/640 threads) in 0.1474 ms (+-92.47%)
[Run 4]: Avg per-thread latency (waking 1/640 threads) in 0.2883 ms (+-67.75%)
[Run 5]: Avg per-thread latency (waking 1/640 threads) in 0.4108 ms (+-39.60%)
[Run 6]: Avg per-thread latency (waking 1/640 threads) in 0.7843 ms (+-78.98%)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)

In the system, where perf bench wake-up-parallel is has system
configuration of 640 cpus. After debugging, this turned out to be
a timing issue. The benchmark creates threads equal to number of
cpus and issues a futex_wait. Then it does a usleep for .1 second
before initiating futex_wake. In system configuration with more
threads, the usleep time is not enough. Patch changes the usleep
from 100000 to 200000

With the patch, ran multiple iterations and there were no issues
further seen

Reported-by: Disha Goel <[email protected]>
Signed-off-by: Athira Rajeev <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Tested-by: Disha Goel <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

tools/perf: Fix perf bench epoll to enable the run when some CPU's are offline

Perf bench epoll fails as below when attempted to run on
on a powerpc system:

   ./perf bench epoll wait
   Running 'epoll/wait' benchmark:
   Run summary [PID 627653]: 79 threads monitoring on 64 file-descriptors for 8 secs.

   perf: pthread_create: No such file or directory

In the setup where this perf bench was ran, difference was that
partition had 640 CPU's, but not all CPUs were online. 80 CPUs
were online. While creating threads and using epoll_wait , code
sets the affinity using cpumask. The cpumask size used is 80
which is picked from "nrcpus = perf_cpu_map__nr(cpu)". Here the
benchmark reports fail while setting affinity for cpu number which
is greater than 80 or higher, because it attempts to set a bit
position which is not allocated on the cpumask. Fix this by changing
the size of cpumask to number of possible cpus and not the number
of online cpus.

Signed-off-by: Athira Rajeev <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Tested-by: Disha Goel <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

tools/perf: Fix perf bench futex to enable the run when some CPU's are offline

Perf bench futex fails as below when attempted to run on
on a powerpc system:

./perf bench futex all
Running futex/hash benchmark...
Run summary [PID 626307]: 80 threads, each operating on 1024 [private] futexes for 10 secs.

perf: pthread_create: No such file or directory

In the setup where this perf bench was ran, difference was that
partition had 640 CPU's, but not all CPUs were online. 80 CPUs
were online. While blocking the threads with futex_wait, code
sets the affinity using cpumask. The cpumask size used is 80
which is picked from "nrcpus = perf_cpu_map__nr(cpu)". Here the
benchmark reports fail while setting affinity for cpu number which
is greater than 80 or higher, because it attempts to set a bit
position which is not allocated on the cpumask. Fix this by changing
the size of cpumask to number of possible cpus and not the number
of online cpus.

Signed-off-by: Athira Rajeev <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Tested-by: Disha Goel <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf record: Ensure space for lost samples

Previous allocation didn't account for sample ID written after the
lost samples event. Switch from malloc/free to a stack allocation.

Reported-by: Milian Wolff <[email protected]>
Closes: https://lore.kernel.org/linux-perf-users/23879991.0LEYPuXRzz@milian-workstation/
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf evsel: Refactor tool events

Tool events unnecessarily open a dummy perf event which is useless
even with `perf record` which will still open a dummy event. Change
the behavior of tool events so:

- duration_time - call `rdclock` on open and then report the count as
   a delta since the start in evsel__read_counter. This moves code out
   of builtin-stat making it more general purpose.

- user_time/system_time - open the fd as either `/proc/pid/stat` or
   `/proc/stat` for cases like system wide. evsel__read_counter will
   read the appropriate field out of the procfs file. These values
   were previously supplied by wait4, if the procfs read fails then
   the wait4 values are used, assuming the process/thread terminated.
   By reading user_time and system_time this way, interval mode, per
   PID and per CPU can be supported although there are restrictions
   given what the files provide (e.g. per PID can't be combined with
   per CPU).

Opening any of the tool events for `perf record` is changed to return
invalid.

Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Weilin Wang <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: James Clark <[email protected]>
Cc: Dmitrii Dolgov <[email protected]>
Cc: Ze Gao <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Leo Yan <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf test: Speed up test case 70 annotate basic tests

On some s390 linux machine (mostly older models) and with debug
packages installed, the test case 'perf annotate basic tests' runs
for some longer time.
Speed up the test and save the output of command perf annotate
in a temporary file. This is used to perform pattern matching via
grep command. This saves on invocation of perf annotate which
runs for some time.

Output before:
# time bash -x tests/shell/annotate.sh >/dev/null 2>&1; echo EXIT CODE $?

real   4m35.543s
user   3m19.442s
sys    1m14.322s
EXIT CODE 0
#
Output after:
# time bash -x tests/shell/annotate.sh >/dev/null 2>&1; echo EXIT CODE $?

real   2m2.881s
user   1m30.980s
sys    0m30.684s
EXIT CODE 0
#

Signed-off-by: Thomas Richter <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf stat: Choose the most disaggregate command line option

When multiple aggregation options are passed to perf stat the behavior
isn't clear. Consider "perf stat -A --per-socket .." and "perf stat
--per-socket -A ..", the first won't aggregate at all while the second
will do per-socket aggregation, even though the same options were
passed.

Rather than set an enum value, gather the options in a struct and
process them from most to least aggregate. This ensures the least
aggregate option always applies, so no aggregation if "-A" is passed.

Signed-off-by: Ian Rogers <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf stat: Make options local

Reduce the scope of stat_options to cmd_stat, and pass as an argument
to __cmd_record. This is done to make more localized changes to the
options in later patches. A side-effect of the change is to reduce the
size of a stripped PIE perf binary by 5952 bytes. The savings come
mainly in the dynamic relocation section.

Signed-off-by: Ian Rogers <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf maps: Add/use a sorted insert for fixup overlap and insert

Data may have lots of overlapping mmaps. The regular insert adds at
the end and relies on a later sort. For data with overlapping mappings
the sort will happen during a subsequent maps__find or
__maps__fixup_overlap_and_insert, there's never a period where the
inserted maps buffer up and a single sort happens. To avoid back to
back sorts, maintain the sort order when fixing up and
inserting. Previously the first_ending_after search was O(log n) where
n is the size of maps, and the insert was O(1) but because of the
continuous sorting was becoming O(n*log(n)). With maintaining sort
order, the insert now becomes O(n) for a memmove.

For a perf report on a perf.data file containing overlapping mappings
the time numbers are:

Before:
real    0m5.894s
user    0m5.650s
sys     0m0.231s

After:
real    0m0.675s
user    0m0.454s
sys     0m0.196s

Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Steinar H . Gunderson <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf maps: Reduce sorting for overlapping mappings

When an 'after' map is generated the 'new' map must be before it so
terminate iterating and don't resort. If the entry 'pos' is entirely
overlapped by the 'new' mapping then don't remove and insert the
mapping, just replace - again to remove sorting.

For a perf report on a perf.data file containing overlapping mappings
the time numbers are:

Before:
real    0m9.856s
user    0m9.637s
sys     0m0.204s

After:
real    0m5.894s
user    0m5.650s
sys     0m0.231s

Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Steinar H . Gunderson <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf maps: Fix use after free in __maps__fixup_overlap_and_insert

In the case 'before' and 'after' are broken out from pos,
maps_by_address may be changed by __maps__insert, as such it needs
re-reading.

Don't ignore the return value from __maps_insert.

Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses")
Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Steinar H . Gunderson <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf script: netdev-times: add location parameter to consume_skb

dd1b527831a3 ("net: add location to trace_consume_skb()") added a new
parameter to the consume_skb tracepoint. Adapt the script to match.

Signed-off-by: Lucas Stach <[email protected]>
Acked-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf: parse-events: Fix compilation error while defining DEBUG_PARSER

Compiling perf tool with 'DEBUG_PARSER=1' leads to errors:

$> make -C tools/perf PARSER_DEBUG=1 NO_LIBTRACEEVENT=1
...
  CC      util/expr-flex.o
  CC      util/expr.o
util/parse-events.c:33:12: error: redundant redeclaration of ‘parse_events_debug’ [-Werror=redundant-decls]
   33 | extern int parse_events_debug;
      |            ^~~~~~~~~~~~~~~~~~
In file included from util/parse-events.c:18:
util/parse-events-bison.h:43:12: note: previous declaration of ‘parse_events_debug’ with type ‘int’
   43 | extern int parse_events_debug;
      |            ^~~~~~~~~~~~~~~~~~
util/expr.c:27:12: error: redundant redeclaration of ‘expr_debug’ [-Werror=redundant-decls]
   27 | extern int expr_debug;
      |            ^~~~~~~~~~
In file included from util/expr.c:11:
util/expr-bison.h:43:12: note: previous declaration of ‘expr_debug’ with type ‘int’
   43 | extern int expr_debug;
      |            ^~~~~~~~~~
cc-1: all warnings being treated as errors

Remove extern declaration from the parse-envents.c file as there is a
conflict with the ones generated using bison and yacc tools from the file
parse-events.[ly].

Signed-off-by: Clément Le Goffic <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: John Garry <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf hisi-ptt: remove unused struct 'hisi_ptt_queue'

'hisi_ptt_queue' has been unused since the original
commit 5e91e57e6809 ("perf auxtrace arm64: Add support for parsing
HiSilicon PCIe Trace packet").

Remove it.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf genelf: remove unused struct 'options'

'options' has been unused since
commit fa7f7e735495 ("perf jit: Move test functionality in to a test").

Remove it.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf lock info: Display both map and thread by default

Change "perf lock info" argument handling to:

Display both map and thread info (rather than an error) when neither are
specified.

Display both map and thread info (rather than just thread info) when
both are requested.

Signed-off-by: Nick Forrington <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf top: Allow filters on events

Allow filters to be added to perf top events. One use is to workaround
issues with:
```
$ perf top --uid="$(id -u)"
```
which tries to scan /proc find processes belonging to the uid and can
fail in such a pid terminates between the scan and the
perf_event_open reporting:
```
Error:
The sys_perf_event_open() syscall returned with 3 (No such process) for event (cycles:P).
/bin/dmesg | grep -i perf may provide additional information.
```
A similar filter:
```
$ perf top -e cycles:P --filter "uid == $(id -u)"
```
doesn't fail this way.

Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf bpf filter: Add uid and gid terms

Allow the BPF filter to use the uid and gid terms determined by the
bpf_get_current_uid_gid BPF helper. For example, the following will
record the cpu-clock event system wide discarding samples that don't
belong to the current user.

$ perf record -e cpu-clock --filter "uid == $(id -u)" -a sleep 0.1

Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf bpf filter: Give terms their own enum

Give the term types their own enum so that additional terms can be
added that don't correspond to a PERF_SAMPLE_xx flag. The term values
are numerically ascending rather than bit field positions, this means
they need translating to a PERF_SAMPLE_xx bit field in certain places
using a shift.

Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Yang Jihong <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

tools api io: Move filling the io buffer to its own function

In general a read fills 4kb so filling the buffer is a 1 in 4096
operation, move it out of the io__get_char function to avoid some
checking overhead and to better hint the function is good to inline.

For perf's IO intensive internal (non-rigorous) benchmarks there's a
small improvement to kallsyms-parsing with a default build.

Before:
```
$ perf bench internals all
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 146.322 usec (+- 0.305 usec)
  Average num. events: 61.000 (+- 0.000)
  Average time per event 2.399 usec
  Average data synthesis took: 145.056 usec (+- 0.155 usec)
  Average num. events: 329.000 (+- 0.000)
  Average time per event 0.441 usec

  Average kallsyms__parse took: 162.313 ms (+- 0.599 ms)
...
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 53.720 usec (+- 7.823 usec)
  Average PMU scanning took: 375.145 usec (+- 23.974 usec)
```
After:
```
$ perf bench internals all
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 127.829 usec (+- 0.079 usec)
  Average num. events: 61.000 (+- 0.000)
  Average time per event 2.096 usec
  Average data synthesis took: 133.652 usec (+- 0.101 usec)
  Average num. events: 327.000 (+- 0.000)
  Average time per event 0.409 usec

  Average kallsyms__parse took: 150.415 ms (+- 0.313 ms)
...
Computing performance of sysfs PMU event scan for 100 times
  Average core PMU scanning took: 47.790 usec (+- 1.178 usec)
  Average PMU scanning took: 376.945 usec (+- 23.683 usec)
```

Signed-off-by: Ian Rogers <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf trace beauty: Always show mmap prot even though PROT_NONE

PROT_NONE is also useful information, so do not omit the mmap prot even
though it is 0. syscall_arg__scnprintf_mmap_prot() could print PROT_NONE
for prot 0.

Before: PROT_NONE is not shown.
$ sudo perf trace -e syscalls:sys_enter_mmap --filter prot==0  -- ls
     0.000 ls/2979231 syscalls:sys_enter_mmap(len: 4220888, flags: PRIVATE|ANONYMOUS)

After: PROT_NONE is displayed.
$ sudo perf trace -e syscalls:sys_enter_mmap --filter prot==0  -- ls
     0.000 ls/2975708 syscalls:sys_enter_mmap(len: 4220888, prot: NONE, flags: PRIVATE|ANONYMOUS)

Signed-off-by: Changbin Du <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf trace beauty: Always show param if show_zero is set

For some parameters, it is best to also display them when they are 0,
e.g. flags.

Here we only check the show_zero property and let arg printer handle
special cases.

Signed-off-by: Changbin Du <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf docs: Fix typos

Assorted typo fixes.

Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Cc: Changbin Du <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf list: Fix the --no-desc option

Currently, the --no-desc option in perf list isn't functioning as
intended.

This issue arises from the overwriting of struct option->desc with the
opposite value of struct option->long_desc. Consequently, whatever
parse_options() returns at struct option->desc gets overridden later,
rendering the --desc or --no-desc arguments ineffective.

To resolve this, set ->desc as true by default and allow parse_options()
to adjust it accordingly. This adjustment will fix the --no-desc
option while preserving the functionality of the other parameters.

Signed-off-by: Breno Leitao <[email protected]>
Reviewed-by: Ian Rogers <[email protected]>
Cc: [email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf arm-spe: Unaligned pointer work around

Use get_unaligned_leXX instead of leXX_to_cpu to handle unaligned
pointers. Such pointers occur with libFuzzer testing.

A similar change for intel-pt was done in:
https://lore.kernel.org/r/20231005190451 [email protected]

Signed-off-by: Ian Rogers <[email protected]>
Reviewed-by: James Clark <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]