Peter Zijlstra [Mon, 13 Nov 2017 13:28:30 +0000 (14:28 +0100)]
perf/core: Fix event schedule order
Scheduling in events with cpu=-1 before events with cpu=# changes
semantics and is undesirable in that it would priorize these events.
Given that groups->index is across all groups we actually have an
inter-group ordering, meaning we can merge-sort two groups, which is
just what we need to preserve semantics.
Change event groups into RB trees sorted by CPU and then by a 64bit
index, so that multiplexing hrtimer interrupt handler would be able
skipping to the current CPU's list and ignore groups allocated for the
other CPUs.
New API for manipulating event groups in the trees is implemented as well
as adoption on the API in the current implementation.
pinned_group_sched_in() and flexible_group_sched_in() API are
introduced to consolidate code enabling the whole group from pinned
and flexible groups appropriately.
Which results in doing a pmu::read() on an !ACTIVE event.
I _think_ this is 'new' since we added attr.inherit_stat, which added
the perf_event_read_event() to the exit path, without that
perf_event_read_output() would only trigger from samples and for
@event to trigger a sample, it's leader _must_ be ACTIVE too.
Still, adding this check makes it consistent with the @sub case for
the siblings.
Ingo Molnar [Fri, 9 Mar 2018 07:27:55 +0000 (08:27 +0100)]
Merge tag 'perf-core-for-mingo-4.17-20180308' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
- Support to display the IPC/Cycle in 'annotate' TUI, for systems
where this info can be obtained, like Intel's >= Skylake (Jin Yao)
- Support wildcards on PMU name in dynamic PMU events (Agustin Vega-Frias)
- Display pmu name when printing unmerged events in stat (Agustin Vega-Frias)
- Auto-merge PMU events created by prefix or glob match (Agustin Vega-Frias)
- Fix s390 'call' operations target function annotation (Thomas Richter)
- Handle s390 PC relative load and store instruction in the augmented
'annotate', code, used so far in the TUI modes of 'perf report' and
'perf annotate' (Thomas Richter)
- Provide libtraceevent with a kernel symbol resolver, so that
symbols in tracepoint fields can be resolved when showing them in
tools such as 'perf report' (Wang YanQing)
- Refactor the cgroups code to look more like other code in tools/perf,
using cgroup__{put,get} for refcount operations instead of its
open-coded equivalent, breaking larger functions, etc (Arnaldo Carvalho de Melo)
- Implement support for the -G/--cgroup target in 'perf trace', allowing
strace like tracing (plus other events, backtraces, etc) for cgroups
(Arnaldo Carvalho de Melo)
- Update thread shortname in 'perf sched map' when the thread's COMM
changes (Changbin Du)
- refcount 'struct mem_info', for better sharing it over several
users, avoid duplicating structs and fixing crashes related to
use after free (Jiri Olsa)
- Record the machine's memory topology information in a perf.data
feature section, to be used by tools such as 'perf c2c' (Jiri Olsa)
- Fix output of forced groups in the header for 'perf report' --stdio
and --tui (Jiri Olsa)
- Better support llvm, clang, cxx make tests in the build process (Jiri Olsa)
- Streamline the 'struct perf_mmap' methods, storing some info in the
struct instead of passing it via various methods, shortening its
signatures (Kan Liang)
- Update the quipper perf.data parser library site information (Stephane Eranian)
- Correct perf's man pages title markers for asciidoctor (Takashi Iwai)
- Intel PT fixes and refactorings paving the way for implementing
support for AUX area sampling (Adrian Hunter)
When the PEBS interrupt threshold is larger than one, there is no way
to get exact auto-reload times and value for userspace RDPMC. Disable
the userspace RDPMC usage when large PEBS is enabled.
The only exception is when the PEBS interrupt threshold is 1, in which
case user-space RDPMC works well even with auto-reload events.
Kan Liang [Mon, 12 Feb 2018 22:20:33 +0000 (14:20 -0800)]
perf/x86/intel/ds: Introduce ->read() function for auto-reload events and flush the PEBS buffer there
There is no way to get exact auto-reload times and values which are needed
for event updates unless we flush the PEBS buffer.
Introduce intel_pmu_auto_reload_read() to drain the PEBS buffer for
auto reload event. To prevent races with the hardware, we can only
call drain_pebs() when the PMU is disabled.
In fixed period mode, the auto-reload mechanism could be enabled for
PEBS events, but the calculation of event->count does not take the
auto-reload values into account.
Anyone who reads event->count will get the wrong result, e.g x86_pmu_read().
This bug was introduced with the auto-reload mechanism enabled since
commit:
851559e35fd5 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")
Introduce intel_pmu_save_and_restart_reload() to calculate the
event->count only for auto-reload.
Since the counter increments a negative counter value and overflows on
the sign switch, giving the interval:
[-period, 0]
the difference between two consequtive reads is:
A) value2 - value1;
when no overflows have happened in between,
B) (0 - value1) + (value2 - (-period));
when one overflow happened in between,
C) (0 - value1) + (n - 1) * (period) + (value2 - (-period));
when @n overflows happened in between.
Here A) is the obvious difference, B) is the extension to the discrete
interval, where the first term is to the top of the interval and the
second term is from the bottom of the next interval and C) the extension
to multiple intervals, where the middle term is the whole intervals
covered.
The equation for all cases is:
value2 - value1 + n * period
Previously the event->count is updated right before the sample output.
But for case A, there is no PEBS record ready. It needs to be specially
handled.
Remove the auto-reload code from x86_perf_event_set_period() since
we'll not longer call that function in this case.
Kan Liang [Tue, 20 Feb 2018 10:11:50 +0000 (02:11 -0800)]
perf/x86/intel: Properly save/restore the PMU state in the NMI handler
The PMU is disabled in intel_pmu_handle_irq(), but cpuc->enabled is not updated
accordingly.
This is fine in current usage because no-one checks it - but fix it
for future code: for example, the drain_pebs() will be modified to
fix an auto-reload bug.
Wang YanQing [Thu, 8 Mar 2018 03:28:50 +0000 (11:28 +0800)]
perf report: Provide libtraceevent with a kernel symbol resolver
So that beautifiers wanting to resolve kernel function addresses to
names can do its work, and when we use "perf report" for output of "perf
kmem record", we will get kernel symbol output.
This patch affect the output of "perf report" for the record data
generated by "perf kmem record" looks like below:
Jiri Olsa [Wed, 7 Mar 2018 15:50:08 +0000 (16:50 +0100)]
perf tools: Add MEM_TOPOLOGY feature to perf data file
Adding MEM_TOPOLOGY feature to perf data file,
that will carry physical memory map and its
node assignments.
The format of data in MEM_TOPOLOGY is as follows:
0 - version | for future changes
8 - block_size_bytes | /sys/devices/system/memory/block_size_bytes
16 - count | number of nodes
For each node we store map of physical indexes for
each node:
32 - node id | node index
40 - size | size of bitmap
48 - bitmap | bitmap of memory indexes that belongs to node
| /sys/devices/system/node/node<NODE>/memory<INDEX>
The MEM_TOPOLOGY could be displayed with following
report command:
Jiri Olsa [Wed, 7 Mar 2018 15:50:06 +0000 (16:50 +0100)]
perf tools: Add refcnt into struct mem_info
It's passed along several hists entries in --hierarchy mode, so it's
better we keep track of it.
The current fail I see is that it gets removed in hierarchy --mem-mode
mode, where it's shared in the different hierarchies, but removed from
the template hist entry, so the report crashes.
Thomas Richter [Wed, 7 Mar 2018 13:43:25 +0000 (14:43 +0100)]
perf annotate: Fix s390 target function disassembly
'perf annotate' displays function call assembler instructions with a
right arrow. Hitting enter on this line/instruction causes the browser
to disassemble this target function and show it on the screen. On s390
this results in an error message 'The called function was not found.'
The function call assembly line parsing does not handle the s390 bras
and brasl instructions. Function call__parse expects the target as first
operand:
callq e9140 <__fxstat>
S390 has a register number as first operand:
brasl %r14,41d60 <abort>
Therefore the target addresses on s390 are always zero which is an
invalid address.
Introduce a s390 specific call parsing function which skips the first
operand on s390.
Adrian Hunter [Wed, 7 Mar 2018 14:02:28 +0000 (16:02 +0200)]
perf intel-pt: Remove a check for sampling mode
Intel PT code already has some preparation for AUX area sampling mode.
However the implementation has changed from the first proposal and one
of the side-effects is that it will not be impossible to support snapshot
mode and sampling mode at the same time.
Although there are no plans to support it, let validation (not yet
implemented) control whether it is allowed rather than low-level
functions.
Adrian Hunter [Wed, 7 Mar 2018 14:02:23 +0000 (16:02 +0200)]
perf intel-pt: Fix error recovery from missing TIP packet
When a TIP packet is expected but there is a different packet, it is an
error. However the unexpected packet might be something important like a
TSC packet, so after the error, it is necessary to continue from there,
rather than the next packet. That is achieved by setting pkt_step to
zero.
Adrian Hunter [Wed, 7 Mar 2018 14:02:22 +0000 (16:02 +0200)]
perf intel-pt: Fix sync_switch
sync_switch is a facility to synchronize decoding more closely with the
point in the kernel when the context actually switched.
The flag when sync_switch is enabled was global to the decoding, whereas
it is really specific to the CPU.
The trace data for different CPUs is put on different queues, so add
sync_switch to the intel_pt_queue structure and use that in preference
to the global setting in the intel_pt structure.
That fixes problems decoding one CPU's trace because sync_switch was
disabled on a different CPU's queue.
Adrian Hunter [Wed, 7 Mar 2018 14:02:21 +0000 (16:02 +0200)]
perf intel-pt: Fix overlap detection to identify consecutive buffers correctly
Overlap detection was not not updating the buffer's 'consecutive' flag.
Marking buffers consecutive has the advantage that decoding begins from
the start of the buffer instead of the first PSB. Fix overlap detection
to identify consecutive buffers correctly.
Kan Liang [Tue, 6 Mar 2018 15:36:04 +0000 (10:36 -0500)]
perf mmap: Use stored 'overwrite' in perf_mmap__consume()
The 'overwrite' is set at allocation. It will not be changed. Using it
to replace the parameter of perf_mmap__consume(). The parameters will
be discarded later.
Kan Liang [Tue, 6 Mar 2018 15:36:03 +0000 (10:36 -0500)]
perf mmap: Use the stored data in perf_mmap__read_event()
Using the 'start', 'end' and 'overwrite' which are stored in
struct perf_mmap to replace the parameters of perf_mmap__read_event().
The parameters will be discarded later.
Kan Liang [Tue, 6 Mar 2018 15:36:02 +0000 (10:36 -0500)]
perf mmap: Use the stored scope data in perf_mmap__push()
Using the 'start' and 'end' which are stored in struct perf_mmap to
replace the temporary 'start' and 'end'.
The temporary variables will be discarded later.
It doesn't need to pass 'overwrite' to perf_mmap__push(). It's stored in
struct perf_mmap.
Kan Liang [Tue, 6 Mar 2018 15:36:01 +0000 (10:36 -0500)]
perf mmap: Store mmap scope in struct perf_mmap()
There is too much boilerplate in the perf_mmap__read*() interfaces.
The 'start' and 'end' variables should be stored in struct perf_mmap at
initialization. They will be used later.
The old 'startp' and 'endp' pointers are used by perf_mmap__read_event()
now. They cannot be removed. So the old 'startp/endp' and new
'md->start/md->end' will exist simultaneously now. The old one will be
removed later.
Kan Liang [Tue, 6 Mar 2018 15:36:00 +0000 (10:36 -0500)]
perf evlist: Store 'overwrite' in struct perf_mmap
It has been determined that the map is for overwrite mode
(evlist->overwrite_mmap) or non-overwrite mode (evlist->mmap) when
calling perf_evlist__alloc_mmap().
Store the information in struct perf_mmap, which will be used later to
simplify the perf_mmap__read*() interfaces.
perf pmu: Auto-merge PMU events created by prefix or glob match
Auto-merge for these events was disabled when auto-merging of non-alias
events was disabled in commit 63ce844 (perf stat: Only auto-merge events
that are PMU aliases).
Non-merging of legacy events is preserved:
$ perf stat -ag -e cache-misses,cache-misses sleep 1
perf pmu: Display pmu name when printing unmerged events in stat
To simplify creation of events accross multiple instances of the same
type of PMU stat supports two methods for creating multiple events from
a single event specification:
1. A prefix or glob can be used in the PMU name.
2. Aliases, which are listed immediately after the Kernel PMU events
by perf list, are used.
When the --no-merge option is passed and these events are displayed
individually the PMU name is lost and it's not possible to see which
count corresponds to which pmu:
$ perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null
perf pmu: Support wildcards on pmu name in dynamic pmu events
Starting on v4.12 event parsing code for dynamic pmu events already
supports prefix-based matching of multiple pmus when creating dynamic
events. E.g., in a system with the following dynamic pmus:
mypmu_0
mypmu_1
mypmu_2
mypmu_4
passing mypmu/<config>/ as an event spec will result in the creation of
the event in all of the pmus. This change expands this matching through
the use of fnmatch so glob-like expressions can be used to create events
in multiple pmus. E.g., in the system described above if a user only
wants to create the event in mypmu_0 and mypmu_1, mypmu_[01]/<config>/
can be passed.
Takashi Iwai [Wed, 7 Mar 2018 10:54:41 +0000 (11:54 +0100)]
perf tools: Correct title markers for asciidoctor
I've tested to process the perf man pages with asciidoctor that is
picker than asciidoc, and it revealed minor syntax errors in some
documents. Namely, the title markers aren't aligned with the previous
line, hence asciidoctor didn't recognize as titles.
This patch corrects these markers to be processed properly.
Adrian Hunter [Tue, 6 Mar 2018 09:13:16 +0000 (11:13 +0200)]
perf auxtrace: Make auxtrace_queues__add_buffer() return buffer_ptr
In preparation for supporting AUX area sampling buffers,
auxtrace_queues__add_buffer() needs to be more generic. To that end, make
it return buffer_ptr instead of the caller.
One can set a cgroup as a default cgroup to be used by all events or
set cgroups with the 'perf stat' and 'perf record' behaviour, i.e.
'-G A' will be the cgroup for events defined so far in the command line.
Here in my main machine, with a kvm instance running a rhel6 guinea pig
I have:
# ls -la /sys/fs/cgroup/perf_event/ | grep drw
drwxr-xr-x. 14 root root 360 Mar 6 12:04 ..
drwxr-xr-x. 3 root root 0 Mar 6 15:05 machine.slice
#
So I can go ahead and use that cgroup hierarchy, say lets see what
syscalls are being emitted by threads in that 'machine.slice' hierarchy
that are taking more than 100ms:
The usual thing is for a constructor to allocate space for its members,
not to require that the caller pass a pre-allocated 'name' and then, at
its destructor, to free something not allocated by it.
Fix it by making cgroup__new() to receive a const char pointer, then
allocate cgroup->name that then can continue to be freed at
cgroup__delete(), balancing the alloc/free operations inside the cgroup
struct methods.
This eases calling evlist__findnew_cgroup() from the custom 'perf trace'
cgroup parser, that will only call parse_cgroups() when the '-G cgroup'
is passed on the command line after '-e event' entries, when it'll
behave just like 'perf stat' and 'perf record', i.e. the previous
parse_cgroup() users that mandate that -G only can come after a -e.
So that tools like 'perf trace' can allow the user to set a cgroup
to be used for all the evsels still without a crgroup setup by
parse_cgroups(), such as the one to use for the syscalls, vfs_getname
and other events involved in strace like syscall tracing.
perf cgroup: Rename close_cgroup() to cgroup__put()
It is not really closing the cgroup, but instead dropping a reference
count and if it hits zero, then calling delete, which will, among other
cleanup shores, close the cgroup fd.
So it is really dropping a reference to that cgroup, and the method name
for that is "put", so rename close_cgroup() to cgroup__put() to follow
this naming convention.
Ingo Molnar [Wed, 7 Mar 2018 08:18:32 +0000 (09:18 +0100)]
Merge tag 'perf-urgent-for-mingo-4.16-20180306' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Be more robust when drawing arrows in the annotation TUI, avoiding a
segfault when jump instructions have as a target addresses in functions
other that the one currently being annotated. The full fix will come in
the following days, when jumping to other functions will work as call
instructions (Arnaldo Carvalho de Melo)
- Prevent auxtrace_queues__process_index() from queuing AUX area data for
decoding when the --no-itrace option has been used (Adrian Hunter)
- Sync copy of kvm UAPI headers and x86's cpufeatures.h (Arnaldo Carvalho de Melo)
- Fix 'perf stat' CSV output format for non-supported counters (Ilya Pronin)
Adrian Hunter [Wed, 28 Feb 2018 08:39:04 +0000 (10:39 +0200)]
perf tools: Fix trigger class trigger_on()
trigger_on() means that the trigger is available but not ready, however
trigger_on() was making it ready. That can segfault if the signal comes
before trigger_ready(). e.g. (USR2 signal delivery not shown)
Ingo Molnar [Tue, 6 Mar 2018 06:34:04 +0000 (07:34 +0100)]
Merge tag 'perf-core-for-mingo-4.17-20180305' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
- Be more robust when drawing arrows in the annotation TUI, avoiding a
segfault when jump instructions have as a target addresses in functions
other that the one currently being annotated. The full fix will come in
the following days, when jumping to other functions will work as call
instructions (Arnaldo Carvalho de Melo)
- Allow asking for the maximum allowed sample rate in 'top' and
'record', i.e. 'perf record -F max' will read the
kernel.perf_event_max_sample_rate sysctl and use it (Arnaldo Carvalho de Melo)
- When the user specifies a freq above kernel.perf_event_max_sample_rate,
Throttle it down to that max freq, and warn the user about it, add as
well --strict-freq so that the previous behaviour of not starting the
session when the desired freq can't be used can be selected (Arnaldo Carvalho de Melo)
- Find 'call' instruction target symbol at parsing time, used so far in
the TUI, part of the infrastructure changes that will end up allowing
for jumps to navigate to other functions, just like 'call'
instructions. (Arnaldo Carvalho de Melo)
- Use xyarray dimensions to iterate fds in 'perf stat' (Andi Kleen)
- Ignore threads for which the current user hasn't permissions when
enabling system-wide --per-thread (Jin Yao)
- Fix some backtrace perf test cases to use 'perf record' + 'perf script'
instead, till 'perf trace' starts using ordered_events or equivalent
to avoid symbol resolving artifacts due to reordering of
PERF_RECORD_MMAP events (Jiri Olsa)
- Fix crash in 'perf record' pipe mode, it needs to allocate the ID
array even for a single event, unlike non-pipe mode (Jiri Olsa)
- Make annoying fallback message on older kernels with newer 'perf top'
binaries trying to use overwrite mode and that not being present
in the older kernels (Kan Liang)
- Switch last users of old APIs to the newer perf_mmap__read_event()
one, then discard those old mmap read forward APIs (Kan Liang)
- Fix the usage on the 'perf kallsyms' man page (Sangwon Hong)
- Simplify cgroup arguments when tracking multiple events (weiping zhang)
The changes in dd84441a7971 ("x86/speculation: Use IBRS if available
before calling into firmware") don't need any kind of special treatment
in the current tools/perf/ codebase, so just update the copy to get rid
of the perf build warning:
BUILD: Doing 'make -j4' parallel build
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
In 801e459a6f3a ("KVM: x86: Add a framework for supporting MSR-based
features") a new ioctl was introduced, which with this sync of the kvm
UAPI headers, makes 'perf trace' know about it:
This also silences the perf build header copy drift verifier:
make: Entering directory '/home/acme/git/perf/tools/perf'
BUILD: Doing 'make -j4' parallel build
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
Jiri Olsa [Fri, 2 Mar 2018 16:13:54 +0000 (17:13 +0100)]
perf record: Fix crash in pipe mode
Currently we can crash perf record when running in pipe mode, like:
$ perf record ls | perf report
# To display the perf.data header info, please use --header/--header-only options.
#
perf: Segmentation fault
Error:
The - file has no samples!
The callstack of the crash is:
0x0000000000515242 in perf_event__synthesize_event_update_name
3513 ev = event_update_event__new(len + 1, PERF_EVENT_UPDATE__NAME, evsel->id[0]);
(gdb) bt
#0 0x0000000000515242 in perf_event__synthesize_event_update_name
#1 0x00000000005158a4 in perf_event__synthesize_extra_attr
#2 0x0000000000443347 in record__synthesize
#3 0x00000000004438e3 in __cmd_record
#4 0x000000000044514e in cmd_record
#5 0x00000000004cbc95 in run_builtin
#6 0x00000000004cbf02 in handle_internal_command
#7 0x00000000004cc054 in run_argv
#8 0x00000000004cc422 in main
The reason of the crash is that the evsel does not have ids array
allocated and the pipe's synthesize code tries to access it.
We don't force evsel ids allocation when we have single event, because
it's not needed. However we need it when we are in pipe mode even for
single event as a key for evsel update event.
Fixing this by forcing evsel ids allocation event for single event, when
we are in pipe mode.
I.e. jumps to a label inside that function (_cpp_lex_token), and those
works, but also this kind:
│1159e8b: ↓ jne c469be <cpp_named_operator2name@@Base+0xa72>
I.e. jumps to another function, outside _cpp_lex_token, which are not
being correctly handled generating as a side effect references to
ab->offset[] entries that are set to NULL, so to make this code more
robust, check that here.
A proper fix for will be put in place, looking at the function name
right after the '<' token and probably treating this like a 'call'
instruction.
Kan Liang [Mon, 26 Feb 2018 18:17:10 +0000 (10:17 -0800)]
perf top: Fix annoying fallback message on older kernels
On older (e.g. v4.4) kernels, an annoying fallback message can be
observed in 'perf top':
┌─Warning:──────────────────────┐
│fall back to non-overwrite mode│
│ │
│ │
│Press any key... │
└───────────────────────────────┘
The 'perf top' utility has been changed to overwrite mode since commit ebebbf082357 ("perf top: Switch default mode to overwrite mode").
For older kernels which don't have overwrite mode support, 'perf top'
will fall back to non-overwrite mode and print out the fallback message
using ui__warning(), which needs user's input to close.
The fallback message is not critical for end users. Turning it to debug
message which is printed when running with -vv.
Sangwon Hong [Sun, 11 Feb 2018 19:37:44 +0000 (04:37 +0900)]
perf kallsyms: Fix the usage on the man page
First, all man pages highlight only perf and subcommands except 'perf
kallsyms', which includes the full usage. Fix it for commands to
monopolize underlines.
Second, options can be ommited when executing 'perf kallsyms', so add
square brackets between <option>.
Kan Liang [Thu, 1 Mar 2018 23:08:58 +0000 (18:08 -0500)]
perf kvm: Switch to new perf_mmap__read_event() interface
The perf kvm still use the legacy interface.
Switch to the new perf_mmap__read_event() interface for perf kvm.
No functional change.
Committer notes:
Tested before and after running:
# perf kvm stat record
On a machine with a kvm guest, then used:
# perf kvm stat report
Before/after results match and look like:
# perf kvm stat record -a sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.132 MB perf.data.guest (1828 samples) ]
# perf kvm stat report
Analyze events for all VMs, all VCPUs:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
Jiri Olsa [Fri, 2 Mar 2018 16:13:54 +0000 (17:13 +0100)]
perf record: Fix crash in pipe mode
Currently we can crash perf record when running in pipe mode, like:
$ perf record ls | perf report
# To display the perf.data header info, please use --header/--header-only options.
#
perf: Segmentation fault
Error:
The - file has no samples!
The callstack of the crash is:
0x0000000000515242 in perf_event__synthesize_event_update_name
3513 ev = event_update_event__new(len + 1, PERF_EVENT_UPDATE__NAME, evsel->id[0]);
(gdb) bt
#0 0x0000000000515242 in perf_event__synthesize_event_update_name
#1 0x00000000005158a4 in perf_event__synthesize_extra_attr
#2 0x0000000000443347 in record__synthesize
#3 0x00000000004438e3 in __cmd_record
#4 0x000000000044514e in cmd_record
#5 0x00000000004cbc95 in run_builtin
#6 0x00000000004cbf02 in handle_internal_command
#7 0x00000000004cc054 in run_argv
#8 0x00000000004cc422 in main
The reason of the crash is that the evsel does not have ids array
allocated and the pipe's synthesize code tries to access it.
We don't force evsel ids allocation when we have single event, because
it's not needed. However we need it when we are in pipe mode even for
single event as a key for evsel update event.
Fixing this by forcing evsel ids allocation event for single event, when
we are in pipe mode.
perf record: Throttle user defined frequencies to the maximum allowed
# perf record -F 200000 sleep 1
warning: Maximum frequency rate (15,000 Hz) exceeded, throttling from 200,000 Hz to 15,000 Hz.
The limit can be raised via /proc/sys/kernel/perf_event_max_sample_rate.
The kernel will lower it when perf's interrupts take too long.
Use --strict-freq to disable this throttling, refusing to record.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (15 samples) ]
# perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 15000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
For those wanting that it fails if the desired frequency can't be used:
# perf record --strict-freq -F 200000 sleep 1
error: Maximum frequency rate (15,000 Hz) exceeded.
Please use -F freq option with a lower value or consider
tweaking /proc/sys/kernel/perf_event_max_sample_rate.
#
perf top: Allow asking for the maximum allowed sample rate
Add the handy '-F max' shortcut, just introduced to 'perf record', to
reading and using the kernel.perf_event_max_sample_rate value as the
user supplied sampling frequency: