]> Git Repo - linux.git/log
linux.git
2 years agowatch_queue: Fix missing locking in add_watch_to_object()
Linus Torvalds [Thu, 28 Jul 2022 09:31:12 +0000 (10:31 +0100)]
watch_queue: Fix missing locking in add_watch_to_object()

If a watch is being added to a queue, it needs to guard against
interference from addition of a new watch, manual removal of a watch and
removal of a watch due to some other queue being destroyed.

KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by
holding the key->sem writelocked and by holding refs on both the key and
the queue - but that doesn't prevent interaction from other {key,queue}
pairs.

While add_watch_to_object() does take the spinlock on the event queue,
it doesn't take the lock on the source's watch list.  The assumption was
that the caller would prevent that (say by taking key->sem) - but that
doesn't prevent interference from the destruction of another queue.

Fix this by locking the watcher list in add_watch_to_object().

Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: [email protected]
Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
2 years agowatch_queue: Fix missing rcu annotation
David Howells [Thu, 28 Jul 2022 09:31:06 +0000 (10:31 +0100)]
watch_queue: Fix missing rcu annotation

Since __post_watch_notification() walks wlist->watchers with only the
RCU read lock held, we need to use RCU methods to add to the list (we
already use RCU methods to remove from the list).

Fix add_watch_to_object() to use hlist_add_head_rcu() instead of
hlist_add_head() for that list.

Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2 years agonet/funeth: Fix fun_xdp_tx() and XDP packet reclaim
Dimitris Michailidis [Tue, 26 Jul 2022 21:59:23 +0000 (14:59 -0700)]
net/funeth: Fix fun_xdp_tx() and XDP packet reclaim

The current implementation of fun_xdp_tx(), used for XPD_TX, is
incorrect in that it takes an address/length pair and later releases it
with page_frag_free(). It is OK for XDP_TX but the same code is used by
ndo_xdp_xmit. In that case it loses the XDP memory type and releases the
packet incorrectly for some of the types. Assorted breakage follows.

Change fun_xdp_tx() to take xdp_frame and rely on xdp_return_frame() in
reclaim.

Fixes: db37bc177dae ("net/funeth: add the data path")
Signed-off-by: Dimitris Michailidis <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
2 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Thu, 28 Jul 2022 02:56:28 +0000 (19:56 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-07-26

This series contains updates to ice driver only.

Przemyslaw corrects accounting for VF VLANs to allow for correct number
of VLANs for untrusted VF. He also correct issue with checksum offload
on VXLAN tunnels.

Ani allows for two VSIs to share the same MAC address.

Maciej corrects checked bits for descriptor completion of loopback

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: do not setup vlan for loopback VSI
  ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
  ice: Fix VSIs unable to share unicast MAC
  ice: Fix tunnel checksum offload with fragmented traffic
  ice: Fix max VLANs available for VF
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agosctp: leave the err path free in sctp_stream_init to sctp_stream_free
Xin Long [Mon, 25 Jul 2022 22:11:06 +0000 (18:11 -0400)]
sctp: leave the err path free in sctp_stream_init to sctp_stream_free

A NULL pointer dereference was reported by Wei Chen:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  RIP: 0010:__list_del_entry_valid+0x26/0x80
  Call Trace:
   <TASK>
   sctp_sched_dequeue_common+0x1c/0x90
   sctp_sched_prio_dequeue+0x67/0x80
   __sctp_outq_teardown+0x299/0x380
   sctp_outq_free+0x15/0x20
   sctp_association_free+0xc3/0x440
   sctp_do_sm+0x1ca7/0x2210
   sctp_assoc_bh_rcv+0x1f6/0x340

This happens when calling sctp_sendmsg without connecting to server first.
In this case, a data chunk already queues up in send queue of client side
when processing the INIT_ACK from server in sctp_process_init() where it
calls sctp_stream_init() to alloc stream_in. If it fails to alloc stream_in
all stream_out will be freed in sctp_stream_init's err path. Then in the
asoc freeing it will crash when dequeuing this data chunk as stream_out
is missing.

As we can't free stream out before dequeuing all data from send queue, and
this patch is to fix it by moving the err path stream_out/in freeing in
sctp_stream_init() to sctp_stream_free() which is eventually called when
freeing the asoc in sctp_association_free(). This fix also makes the code
in sctp_process_init() more clear.

Note that in sctp_association_init() when it fails in sctp_stream_init(),
sctp_association_free() will not be called, and in that case it should
go to 'stream_free' err path to free stream instead of 'fail_init'.

Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Reported-by: Wei Chen <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agosfc: disable softirqs for ptp TX
Alejandro Lucero [Tue, 26 Jul 2022 06:45:04 +0000 (08:45 +0200)]
sfc: disable softirqs for ptp TX

Sending a PTP packet can imply to use the normal TX driver datapath but
invoked from the driver's ptp worker. The kernel generic TX code
disables softirqs and preemption before calling specific driver TX code,
but the ptp worker does not. Although current ptp driver functionality
does not require it, there are several reasons for doing so:

   1) The invoked code is always executed with softirqs disabled for non
      PTP packets.
   2) Better if a ptp packet transmission is not interrupted by softirq
      handling which could lead to high latencies.
   3) netdev_xmit_more used by the TX code requires preemption to be
      disabled.

Indeed a solution for dealing with kernel preemption state based on static
kernel configuration is not possible since the introduction of dynamic
preemption level configuration at boot time using the static calls
functionality.

Fixes: f79c957a0b537 ("drivers: net: sfc: use netdev_xmit_more helper")
Signed-off-by: Alejandro Lucero <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agoptp: ocp: Select CRC16 in the Kconfig.
Jonathan Lemon [Tue, 26 Jul 2022 22:06:04 +0000 (15:06 -0700)]
ptp: ocp: Select CRC16 in the Kconfig.

The crc16() function is used to check the firmware validity, but
the library was not explicitly selected.

Fixes: 3c3673bde50c ("ptp: ocp: Add firmware header checks")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Jonathan Lemon <[email protected]>
Acked-by: Vadim Fedorenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agoclk: sunxi-ng: Fix H6 RTC clock definition
Jernej Skrabec [Tue, 19 Jul 2022 18:37:25 +0000 (20:37 +0200)]
clk: sunxi-ng: Fix H6 RTC clock definition

While RTC clock was added in H616 ccu_common list, it was not in H6
list. That caused invalid pointer dereference like this:

Unable to handle kernel NULL pointer dereference at virtual address 000000000000020c
Mem abort info:
  ESR = 0x96000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000004
  CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000000004d574000
[000000000000020c] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
CPU: 3 PID: 339 Comm: cat Tainted: G    B             5.18.0-rc1+ #1352
Hardware name: Tanix TX6 (DT)
pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : ccu_gate_is_enabled+0x48/0x74
lr : ccu_gate_is_enabled+0x40/0x74
sp : ffff80000c0b76d0
x29: ffff80000c0b76d0 x28: 00000000016e3600 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000002 x24: ffff00000952fe08
x23: ffff800009611400 x22: ffff00000952fe79 x21: 0000000000000000
x20: 0000000000000001 x19: ffff80000aad6f08 x18: 0000000000000000
x17: 2d2d2d2d2d2d2d2d x16: 2d2d2d2d2d2d2d2d x15: 2d2d2d2d2d2d2d2d
x14: 0000000000000000 x13: 00000000f2f2f2f2 x12: ffff700001816e89
x11: 1ffff00001816e88 x10: ffff700001816e88 x9 : dfff800000000000
x8 : ffff80000c0b7447 x7 : 0000000000000001 x6 : ffff700001816e88
x5 : ffff80000c0b7440 x4 : 0000000000000001 x3 : ffff800008935c50
x2 : dfff800000000000 x1 : 0000000000000000 x0 : 000000000000020c
Call trace:
 ccu_gate_is_enabled+0x48/0x74
 clk_core_is_enabled+0x7c/0x1c0
 clk_summary_show_subtree+0x1dc/0x334
 clk_summary_show_subtree+0x250/0x334
 clk_summary_show_subtree+0x250/0x334
 clk_summary_show_subtree+0x250/0x334
 clk_summary_show_subtree+0x250/0x334
 clk_summary_show+0x90/0xdc
 seq_read_iter+0x248/0x6d4
 seq_read+0x17c/0x1fc
 full_proxy_read+0x90/0xf0
 vfs_read+0xdc/0x28c
 ksys_read+0xc8/0x174
 __arm64_sys_read+0x44/0x5c
 invoke_syscall+0x60/0x190
 el0_svc_common.constprop.0+0x7c/0x160
 do_el0_svc+0x38/0xa0
 el0_svc+0x68/0x160
 el0t_64_sync_handler+0x10c/0x140
 el0t_64_sync+0x18c/0x190
Code: d1006260 97e5c981 785e8260 8b0002a0 (b9400000)
---[ end trace 0000000000000000 ]---

Fix that by adding rtc clock to H6 ccu_common list too.

Fixes: 38d321b61bda ("clk: sunxi-ng: h6-r: Add RTC gate clock")
Signed-off-by: Jernej Skrabec <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Samuel Holland <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
2 years agotcp: md5: fix IPv4-mapped support
Eric Dumazet [Tue, 26 Jul 2022 11:57:43 +0000 (11:57 +0000)]
tcp: md5: fix IPv4-mapped support

After the blamed commit, IPv4 SYN packets handled
by a dual stack IPv6 socket are dropped, even if
perfectly valid.

$ nstat | grep MD5
TcpExtTCPMD5Failure             5                  0.0

For a dual stack listener, an incoming IPv4 SYN packet
would call tcp_inbound_md5_hash() with @family == AF_INET,
while tp->af_specific is pointing to tcp_sock_ipv6_specific.

Only later when an IPv4-mapped child is created, tp->af_specific
is changed to tcp_sock_ipv6_mapped_specific.

Fixes: 7bbb765b7349 ("net/tcp: Merge TCP-MD5 inbound callbacks")
Reported-by: Brian Vazquez <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Dmitry Safonov <[email protected]>
Tested-by: Leonard Crestez <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agoARM: 9216/1: Fix MAX_DMA_ADDRESS overflow
Florian Fainelli [Tue, 19 Jul 2022 16:33:21 +0000 (17:33 +0100)]
ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow

Commit 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis")
added a check to determine whether arm_dma_zone_size is exceeding the
amount of kernel virtual address space available between the upper 4GB
virtual address limit and PAGE_OFFSET in order to provide a suitable
definition of MAX_DMA_ADDRESS that should fit within the 32-bit virtual
address space. The quantity used for comparison was off by a missing
trailing 0, leading to MAX_DMA_ADDRESS to be overflowing a 32-bit
quantity.

This was caught thanks to CONFIG_DEBUG_VIRTUAL on the bcm2711 platform
where we define a dma_zone_size of 1GB and we have a PAGE_OFFSET value
of 0xc000_0000 (CONFIG_VMSPLIT_3G) leading to MAX_DMA_ADDRESS being
0x1_0000_0000 which overflows the unsigned long type used throughout
__pa() and then __virt_addr_valid(). Because the virtual address passed
to __virt_addr_valid() would now be 0, the function would loudly warn
and flood the kernel log, thus making the platform unable to boot
properly.

Fixes: 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis")
Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
2 years agoMerge tag 'asm-generic-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 27 Jul 2022 16:50:18 +0000 (09:50 -0700)]
Merge tag 'asm-generic-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic fixes from Arnd Bergmann:
 "Two more bug fixes for asm-generic, one addressing an incorrect
  Kconfig symbol reference and another one fixing a build failure for
  the perf tool on mips and possibly others"

* tag 'asm-generic-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: remove a broken and needless ifdef conditional
  tools: Fixed MIPS builds due to struct flock re-definition

2 years agoMerge tag 'soc-fixes-5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Wed, 27 Jul 2022 16:43:07 +0000 (09:43 -0700)]
Merge tag 'soc-fixes-5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "One last set of changes for the soc tree:

   - fix clock frequency on lan966x

   - fix incorrect GPIO numbers on some pxa machines

   - update Baolin's email address"

* tag 'soc-fixes-5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  ARM: pxa2xx: Fix GPIO descriptor tables
  mailmap: update Baolin Wang's email
  ARM: dts: lan966x: fix sys_clk frequency

2 years agoRevert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV"
Borislav Petkov [Wed, 27 Jul 2022 11:24:21 +0000 (13:24 +0200)]
Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV"

This reverts commit 007faec014cb5d26983c1f86fd08c6539b41392e.

Now that hyperv does its own protocol negotiation:

  49d6a3c062a1 ("x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM")

revert this exposure of the sev_es_ghcb_hv_call() helper.

Cc: Wei Liu <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by:Tianyu Lan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2 years agox86/configs: Update configs in x86_debug.config
Lukas Bulwahn [Fri, 22 Jul 2022 12:18:15 +0000 (14:18 +0200)]
x86/configs: Update configs in x86_debug.config

Commit

  4675ff05de2d ("kmemcheck: rip it out")

removed kmemcheck and its corresponding build config KMEMCHECK.

Commit

  0f620cefd775 ("objtool: Rename "VMLINUX_VALIDATION" -> "NOINSTR_VALIDATION"")

renamed the debug config option.

Adjust x86_debug.config to those changes in debug configs.

Signed-off-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2 years agoMerge tag 'nvme-5.19-2022-07-27' of git://git.infradead.org/nvme into block-5.19
Jens Axboe [Wed, 27 Jul 2022 16:03:40 +0000 (10:03 -0600)]
Merge tag 'nvme-5.19-2022-07-27' of git://git.infradead.org/nvme into block-5.19

Pull NVMe fix from Christoph:

"nvme fix for Linux 5.19

 - yet another duplicate ID quirk (Tobias Gruetzmacher)"

* tag 'nvme-5.19-2022-07-27' of git://git.infradead.org/nvme:
  nvme-pci: Crucial P2 has bogus namespace ids

2 years agoperf bpf: Remove undefined behavior from bpf_perf_object__next()
Ian Rogers [Tue, 26 Jul 2022 22:09:21 +0000 (15:09 -0700)]
perf bpf: Remove undefined behavior from bpf_perf_object__next()

bpf_perf_object__next() folded the last element in the list test with the
empty list test. However, this meant that offsets were computed against
null and that a struct list_head was compared against a 'struct
bpf_perf_object'.

Working around this with clang's undefined behavior sanitizer required
-fno-sanitize=null and -fno-sanitize=object-size.

Remove the undefined behavior by using the regular Linux list APIs and
handling the starting case separately from the end testing case.

Looking at uses like bpf_perf_object__for_each(), as the constant NULL
or non-NULL argument can be constant propagated, the code is no less
efficient.

Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Christy Lee <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Miaoqian Lin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Tom Rix <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf symbol: Skip symbols if SHF_ALLOC flag is not set
Leo Yan [Sun, 24 Jul 2022 06:00:13 +0000 (14:00 +0800)]
perf symbol: Skip symbols if SHF_ALLOC flag is not set

Some symbols are observed with the 'st_value' field zeroed.  E.g.
libc.so.6 in Ubuntu contains a symbol '__evoke_link_warning_getwd' which
resides in the '.gnu.warning.getwd' section.

Unlike normal sections, such kind of sections are used for linker
warning when a file calls deprecated functions, but they are not part of
memory images, the symbols in these sections should be dropped.

This patch checks the section attribute SHF_ALLOC bit, if the bit is not
set, it skips symbols to avoid spurious ones.

Suggested-by: Fangrui Song <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Chang Rui <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf symbol: Correct address for bss symbols
Leo Yan [Sun, 24 Jul 2022 06:00:12 +0000 (14:00 +0800)]
perf symbol: Correct address for bss symbols

When using 'perf mem' and 'perf c2c', an issue is observed that tool
reports the wrong offset for global data symbols.  This is a common
issue on both x86 and Arm64 platforms.

Let's see an example, for a test program, below is the disassembly for
its .bss section which is dumped with objdump:

  ...

  Disassembly of section .bss:

  0000000000004040 <completed.0>:
   ...

  0000000000004080 <buf1>:
   ...

  00000000000040c0 <buf2>:
   ...

  0000000000004100 <thread>:
   ...

First we used 'perf mem record' to run the test program and then used
'perf --debug verbose=4 mem report' to observe what's the symbol info
for 'buf1' and 'buf2' structures.

  # ./perf mem record -e ldlat-loads,ldlat-stores -- false_sharing.exe 8
  # ./perf --debug verbose=4 mem report
    ...
    dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 sh_addr: 0x4040 sh_offset: 0x3028
    symbol__new: buf2 0x30a8-0x30e8
    ...
    dso__load_sym_internal: adjusting symbol: st_value: 0x4080 sh_addr: 0x4040 sh_offset: 0x3028
    symbol__new: buf1 0x3068-0x30a8
    ...

The perf tool relies on libelf to parse symbols, in executable and
shared object files, 'st_value' holds a virtual address; 'sh_addr' is
the address at which section's first byte should reside in memory, and
'sh_offset' is the byte offset from the beginning of the file to the
first byte in the section.  The perf tool uses below formula to convert
a symbol's memory address to a file address:

  file_address = st_value - sh_addr + sh_offset
                    ^
                    ` Memory address

We can see the final adjusted address ranges for buf1 and buf2 are
[0x30a8-0x30e8) and [0x3068-0x30a8) respectively, apparently this is
incorrect, in the code, the structure for 'buf1' and 'buf2' specifies
compiler attribute with 64-byte alignment.

The problem happens for 'sh_offset', libelf returns it as 0x3028 which
is not 64-byte aligned, combining with disassembly, it's likely libelf
doesn't respect the alignment for .bss section, therefore, it doesn't
return the aligned value for 'sh_offset'.

Suggested by Fangrui Song, ELF file contains program header which
contains PT_LOAD segments, the fields p_vaddr and p_offset in PT_LOAD
segments contain the execution info.  A better choice for converting
memory address to file address is using the formula:

  file_address = st_value - p_vaddr + p_offset

This patch introduces elf_read_program_header() which returns the
program header based on the passed 'st_value', then it uses the formula
above to calculate the symbol file address; and the debugging log is
updated respectively.

After applying the change:

  # ./perf --debug verbose=4 mem report
    ...
    dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 p_vaddr: 0x3d28 p_offset: 0x2d28
    symbol__new: buf2 0x30c0-0x3100
    ...
    dso__load_sym_internal: adjusting symbol: st_value: 0x4080 p_vaddr: 0x3d28 p_offset: 0x2d28
    symbol__new: buf1 0x3080-0x30c0
    ...

Fixes: f17e04afaff84b5c ("perf report: Fix ELF symbol parsing")
Reported-by: Chang Rui <[email protected]>
Suggested-by: Fangrui Song <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf scripts python: Let script to be python2 compliant
Leo Yan [Mon, 25 Jul 2022 10:42:20 +0000 (18:42 +0800)]
perf scripts python: Let script to be python2 compliant

The mainline kernel can be used for relative old distros, e.g. RHEL 7.
The distro doesn't upgrade from python2 to python3, this causes the
building error that the python script is not python2 compliant.

To fix the building failure, this patch changes from the python f-string
format to traditional string format.

Fixes: 12fdd6c009da0d02 ("perf scripts python: Support Arm CoreSight trace data disassembly")
Reported-by: Akemi Yagi <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: ElRepo <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Leo Yan <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agotools headers cpufeatures: Sync with the kernel sources
Arnaldo Carvalho de Melo [Thu, 1 Jul 2021 16:39:15 +0000 (13:39 -0300)]
tools headers cpufeatures: Sync with the kernel sources

To pick the changes from:

  28a99e95f55c6185 ("x86/amd: Use IBPB for firmware calls")

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Adrian Hunter <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoMerge remote-tracking branch 'torvalds/master' into perf/core
Arnaldo Carvalho de Melo [Wed, 27 Jul 2022 14:08:48 +0000 (11:08 -0300)]
Merge remote-tracking branch 'torvalds/master' into perf/core

To get upstream fixes.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agovirtio-net: fix the race between refill work and close
Jason Wang [Mon, 25 Jul 2022 07:21:59 +0000 (15:21 +0800)]
virtio-net: fix the race between refill work and close

We try using cancel_delayed_work_sync() to prevent the work from
enabling NAPI. This is insufficient since we don't disable the source
of the refill work scheduling. This means an NAPI poll callback after
cancel_delayed_work_sync() can schedule the refill work then can
re-enable the NAPI that leads to use-after-free [1].

Since the work can enable NAPI, we can't simply disable NAPI before
calling cancel_delayed_work_sync(). So fix this by introducing a
dedicated boolean to control whether or not the work could be
scheduled from NAPI.

[1]
==================================================================
BUG: KASAN: use-after-free in refill_work+0x43/0xd4
Read of size 2 at addr ffff88810562c92e by task kworker/2:1/42

CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: events refill_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x34/0x44
 print_report.cold+0xbb/0x6ac
 ? _printk+0xad/0xde
 ? refill_work+0x43/0xd4
 kasan_report+0xa8/0x130
 ? refill_work+0x43/0xd4
 refill_work+0x43/0xd4
 process_one_work+0x43d/0x780
 worker_thread+0x2a0/0x6f0
 ? process_one_work+0x780/0x780
 kthread+0x167/0x1a0
 ? kthread_exit+0x50/0x50
 ret_from_fork+0x22/0x30
 </TASK>
...

Fixes: b2baed69e605c ("virtio_net: set/cancel work on ndo_open/ndo_stop")
Signed-off-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Xuan Zhuo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoEDAC/ghes: Set the DIMM label unconditionally
Toshi Kani [Thu, 21 Jul 2022 18:05:03 +0000 (12:05 -0600)]
EDAC/ghes: Set the DIMM label unconditionally

The commit

  cb51a371d08e ("EDAC/ghes: Setup DIMM label from DMI and use it in error reports")

enforced that both the bank and device strings passed to
dimm_setup_label() are not NULL.

However, there are BIOSes, for example on a

  HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019

which don't populate both strings:

  Handle 0x0020, DMI type 17, 84 bytes
  Memory Device
          Array Handle: 0x0013
          Error Information Handle: Not Provided
          Total Width: 72 bits
          Data Width: 64 bits
          Size: 32 GB
          Form Factor: DIMM
          Set: None
          Locator: PROC 1 DIMM 1        <===== device
          Bank Locator: Not Specified   <===== bank

This results in a buffer overflow because ghes_edac_register() calls
strlen() on an uninitialized label, which had non-zero values left over
from krealloc_array():

  detected buffer overflow in __fortify_strlen
   ------------[ cut here ]------------
   kernel BUG at lib/string_helpers.c:983!
   invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
   CPU: 1 PID: 1 Comm: swapper/0 Tainted: G          I       5.18.6-200.fc36.x86_64 #1
   Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019
   RIP: 0010:fortify_panic
   ...
   Call Trace:
    <TASK>
    ghes_edac_register.cold
    ghes_probe
    platform_probe
    really_probe
    __driver_probe_device
    driver_probe_device
    __driver_attach
    ? __device_attach_driver
    bus_for_each_dev
    bus_add_driver
    driver_register
    acpi_ghes_init
    acpi_init
    ? acpi_sleep_proc_init
    do_one_initcall

The label contains garbage because the commit in Fixes reallocs the
DIMMs array while scanning the system but doesn't clear the newly
allocated memory.

Change dimm_setup_label() to always initialize the label to fix the
issue. Set it to the empty string in case BIOS does not provide both
bank and device so that ghes_edac_register() can keep the default label
given by edac_mc_alloc_dimms().

  [ bp: Rewrite commit message. ]

Fixes: b9cae27728d1f ("EDAC/ghes: Scan the system once on driver init")
Co-developed-by: Robert Richter <[email protected]>
Signed-off-by: Robert Richter <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Tested-by: Robert Elliott <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2 years agomptcp: Do not return EINPROGRESS when subflow creation succeeds
Mat Martineau [Mon, 25 Jul 2022 20:52:31 +0000 (13:52 -0700)]
mptcp: Do not return EINPROGRESS when subflow creation succeeds

New subflows are created within the kernel using O_NONBLOCK, so
EINPROGRESS is the expected return value from kernel_connect().
__mptcp_subflow_connect() has the correct logic to consider EINPROGRESS
to be a successful case, but it has also used that error code as its
return value.

Before v5.19 this was benign: all the callers ignored the return
value. Starting in v5.19 there is a MPTCP_PM_CMD_SUBFLOW_CREATE generic
netlink command that does use the return value, so the EINPROGRESS gets
propagated to userspace.

Make __mptcp_subflow_connect() always return 0 on success instead.

Fixes: ec3edaa7ca6c ("mptcp: Add handling of outgoing MP_JOIN requests")
Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Wed, 27 Jul 2022 02:53:09 +0000 (19:53 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter updates for net

Three late fixes for netfilter:

1) If nf_queue user requests packet truncation below size of l3 header,
   we corrupt the skb, then crash.  Reject such requests.

2) add cond_resched() calls when doing cycle detection in the
   nf_tables graph.  This avoids softlockup warning with certain
   rulesets.

3) Reject rulesets that use nftables 'queue' expression in family/chain
   combinations other than those that are supported.  Currently the ruleset
   will load, but when userspace attempts to reinject you get WARN splat +
   packet drops.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_queue: only allow supported familes and hooks
  netfilter: nf_tables: add rescheduling points during loop detection walks
  netfilter: nf_queue: do not allow packet truncation below transport header offset
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agoMerge tag 'for-net-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Wed, 27 Jul 2022 02:48:24 +0000 (19:48 -0700)]
Merge tag 'for-net-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix early wakeup after suspend
 - Fix double free on error
 - Fix use-after-free on l2cap_chan_put

* tag 'for-net-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
  Bluetooth: Always set event mask on suspend
  Bluetooth: mgmt: Fix double free on error path
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agoMerge tag 'mm-hotfixes-stable-2022-07-26' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Wed, 27 Jul 2022 02:38:46 +0000 (19:38 -0700)]
Merge tag 'mm-hotfixes-stable-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "Thirteen hotfixes.

  Eight are cc:stable and the remainder are for post-5.18 issues or are
  too minor to warrant backporting"

* tag 'mm-hotfixes-stable-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mailmap: update Gao Xiang's email addresses
  userfaultfd: provide properly masked address for huge-pages
  Revert "ocfs2: mount shared volume without ha stack"
  hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte
  fs: sendfile handles O_NONBLOCK of out_fd
  ntfs: fix use-after-free in ntfs_ucsncmp()
  secretmem: fix unhandled fault in truncate
  mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range()
  mm: fix missing wake-up event for FSDAX pages
  mm: fix page leak with multiple threads mapping the same page
  mailmap: update Seth Forshee's email address
  tmpfs: fix the issue that the mount and remount results are inconsistent.
  mm: kfence: apply kmemleak_ignore_phys on early allocated pool

2 years agoscsi: ufs: core: Fix a race condition related to device management
Bart Van Assche [Wed, 20 Jul 2022 17:02:23 +0000 (10:02 -0700)]
scsi: ufs: core: Fix a race condition related to device management

If a device management command completion happens after
wait_for_completion_timeout() times out and before ufshcd_clear_cmds() is
called, then the completion code may crash on the complete() call in
__ufshcd_transfer_req_compl().

Fix the following crash:

  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
  Call trace:
   complete+0x64/0x178
   __ufshcd_transfer_req_compl+0x30c/0x9c0
   ufshcd_poll+0xf0/0x208
   ufshcd_sl_intr+0xb8/0xf0
   ufshcd_intr+0x168/0x2f4
   __handle_irq_event_percpu+0xa0/0x30c
   handle_irq_event+0x84/0x178
   handle_fasteoi_irq+0x150/0x2e8
   __handle_domain_irq+0x114/0x1e4
   gic_handle_irq.31846+0x58/0x300
   el1_irq+0xe4/0x1c0
   efi_header_end+0x110/0x680
   __irq_exit_rcu+0x108/0x124
   __handle_domain_irq+0x118/0x1e4
   gic_handle_irq.31846+0x58/0x300
   el1_irq+0xe4/0x1c0
   cpuidle_enter_state+0x3ac/0x8c4
   do_idle+0x2fc/0x55c
   cpu_startup_entry+0x84/0x90
   kernel_init+0x0/0x310
   start_kernel+0x0/0x608
   start_kernel+0x4ec/0x608

Link: https://lore.kernel.org/r/[email protected]
Fixes: 5a0b0cb9bee7 ("[SCSI] ufs: Add support for sending NOP OUT UPIU")
Cc: Adrian Hunter <[email protected]>
Cc: Avri Altman <[email protected]>
Cc: Bean Huo <[email protected]>
Cc: Stanley Chu <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
2 years agoscsi: core: Fix warning in scsi_alloc_sgtables()
Jason Yan [Wed, 20 Jul 2022 02:51:20 +0000 (10:51 +0800)]
scsi: core: Fix warning in scsi_alloc_sgtables()

As explained in SG_IO howto[1]:

"If iovec_count is non-zero then 'dxfer_len' should be equal to the sum of
iov_len lengths. If not, the minimum of the two is the transfer length."

When iovec_count is non-zero and dxfer_len is zero, the sg_io() just
genarated a null bio, and finally caused a warning below. To fix it, skip
generating a bio for this request if dxfer_len is zero.

[1] https://tldp.org/HOWTO/SCSI-Generic-HOWTO/x198.html

WARNING: CPU: 2 PID: 3643 at drivers/scsi/scsi_lib.c:1032 scsi_alloc_sgtables+0xc7d/0xf70 drivers/scsi/scsi_lib.c:1032
Modules linked in:

CPU: 2 PID: 3643 Comm: syz-executor397 Not tainted
5.17.0-rc3-syzkaller-00316-gb81b1829e7e3 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-204/01/2014
RIP: 0010:scsi_alloc_sgtables+0xc7d/0xf70 drivers/scsi/scsi_lib.c:1032
Code: e7 fc 31 ff 44 89 f6 e8 c1 4e e7 fc 45 85 f6 0f 84 1a f5 ff ff e8
93 4c e7 fc 83 c5 01 0f b7 ed e9 0f f5 ff ff e8 83 4c e7 fc <0f> 0b 41
   bc 0a 00 00 00 e9 2b fb ff ff 41 bc 09 00 00 00 e9 20 fb
RSP: 0018:ffffc90000d07558 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88801bfc96a0 RCX: 0000000000000000
RDX: ffff88801c876000 RSI: ffffffff849060bd RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff849055b9 R11: 0000000000000000 R12: ffff888012b8c000
R13: ffff88801bfc9580 R14: 0000000000000000 R15: ffff88801432c000
FS:  00007effdec8e700(0000) GS:ffff88802cc00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007effdec6d718 CR3: 00000000206d6000 CR4: 0000000000150ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 scsi_setup_scsi_cmnd drivers/scsi/scsi_lib.c:1219 [inline]
 scsi_prepare_cmd drivers/scsi/scsi_lib.c:1614 [inline]
 scsi_queue_rq+0x283e/0x3630 drivers/scsi/scsi_lib.c:1730
 blk_mq_dispatch_rq_list+0x6ea/0x22e0 block/blk-mq.c:1851
 __blk_mq_sched_dispatch_requests+0x20b/0x410 block/blk-mq-sched.c:299
 blk_mq_sched_dispatch_requests+0xfb/0x180 block/blk-mq-sched.c:332
 __blk_mq_run_hw_queue+0xf9/0x350 block/blk-mq.c:1968
 __blk_mq_delay_run_hw_queue+0x5b6/0x6c0 block/blk-mq.c:2045
 blk_mq_run_hw_queue+0x30f/0x480 block/blk-mq.c:2096
 blk_mq_sched_insert_request+0x340/0x440 block/blk-mq-sched.c:451
 blk_execute_rq+0xcc/0x340 block/blk-mq.c:1231
 sg_io+0x67c/0x1210 drivers/scsi/scsi_ioctl.c:485
 scsi_ioctl_sg_io drivers/scsi/scsi_ioctl.c:866 [inline]
 scsi_ioctl+0xa66/0x1560 drivers/scsi/scsi_ioctl.c:921
 sd_ioctl+0x199/0x2a0 drivers/scsi/sd.c:1576
 blkdev_ioctl+0x37a/0x800 block/ioctl.c:588
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:874 [inline]
 __se_sys_ioctl fs/ioctl.c:860 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7effdecdc5d9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 81 14 00 00 90 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007effdec8e2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007effded664c0 RCX: 00007effdecdc5d9
RDX: 0000000020002300 RSI: 0000000000002285 RDI: 0000000000000004
RBP: 00007effded34034 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 00007effded34054 R14: 2f30656c69662f2e R15: 00007effded664c8

Link: https://lore.kernel.org/r/[email protected]
Fixes: 25636e282fe9 ("block: fix SG_IO vector request data length handling")
Reported-by: [email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Jason Yan <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
2 years agoscsi: ufs: host: Hold reference returned by of_parse_phandle()
Liang He [Tue, 19 Jul 2022 07:15:29 +0000 (15:15 +0800)]
scsi: ufs: host: Hold reference returned by of_parse_phandle()

In ufshcd_populate_vreg(), we should hold the reference returned by
of_parse_phandle() and then use it to call of_node_put() for refcount
balance.

Link: https://lore.kernel.org/r/[email protected]
Fixes: aa4976130934 ("ufs: Add regulator enable support")
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Liang He <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
2 years agoscsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
David Jeffery [Fri, 22 Jul 2022 14:24:48 +0000 (10:24 -0400)]
scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown

During system shutdown or reboot, mpt3sas will reset the firmware back to
ready state. However, the driver leaves running a watchdog work item
intended to keep the firmware in operational state. This causes a second,
unneeded reset on shutdown and moves the firmware back to operational
instead of in ready state as intended. And if the mpt3sas_fwfault_debug
module parameter is set, this extra reset also panics the system.

mpt3sas's scsih_shutdown needs to stop the watchdog before resetting the
firmware back to ready state.

Link: https://lore.kernel.org/r/[email protected]
Fixes: fae21608c31c ("scsi: mpt3sas: Transition IOC to Ready state during shutdown")
Tested-by: Laurence Oberman <[email protected]>
Acked-by: Sreekanth Reddy <[email protected]>
Signed-off-by: David Jeffery <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
2 years agomailmap: update Gao Xiang's email addresses
Gao Xiang [Tue, 19 Jul 2022 15:42:46 +0000 (23:42 +0800)]
mailmap: update Gao Xiang's email addresses

I've been in Alibaba Cloud for more than one year, mainly to address
cloud-native challenges (such as high-performance container images) for
open source communities.

Update my email addresses on behalf of my current employer (Alibaba Cloud)
to support all my (team) work in this area.  Also add an outdated
@redhat.com address of me.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Gao Xiang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agouserfaultfd: provide properly masked address for huge-pages
Nadav Amit [Mon, 11 Jul 2022 16:59:06 +0000 (09:59 -0700)]
userfaultfd: provide properly masked address for huge-pages

Commit 824ddc601adc ("userfaultfd: provide unmasked address on
page-fault") was introduced to fix an old bug, in which the offset in the
address of a page-fault was masked.  Concerns were raised - although were
never backed by actual code - that some userspace code might break because
the bug has been around for quite a while.  To address these concerns a
new flag was introduced, and only when this flag is set by the user,
userfaultfd provides the exact address of the page-fault.

The commit however had a bug, and if the flag is unset, the offset was
always masked based on a base-page granularity.  Yet, for huge-pages, the
behavior prior to the commit was that the address is masked to the
huge-page granulrity.

While there are no reports on real breakage, fix this issue.  If the flag
is unset, use the address with the masking that was done before.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 824ddc601adc ("userfaultfd: provide unmasked address on page-fault")
Signed-off-by: Nadav Amit <[email protected]>
Reported-by: James Houghton <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Reviewed-by: James Houghton <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2 years agoBluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
Luiz Augusto von Dentz [Thu, 21 Jul 2022 16:10:50 +0000 (09:10 -0700)]
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put

This fixes the following trace which is caused by hci_rx_work starting up
*after* the final channel reference has been put() during sock_close() but
*before* the references to the channel have been destroyed, so instead
the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to
prevent referencing a channel that is about to be destroyed.

  refcount_t: increment on 0; use-after-free.
  BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
  Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705

  CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S      W
  4.14.234-00003-g1fb6d0bd49a4-dirty #28
  Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150
  Google Inc. MSM sm8150 Flame DVT (DT)
  Workqueue: hci0 hci_rx_work
  Call trace:
   dump_backtrace+0x0/0x378
   show_stack+0x20/0x2c
   dump_stack+0x124/0x148
   print_address_description+0x80/0x2e8
   __kasan_report+0x168/0x188
   kasan_report+0x10/0x18
   __asan_load4+0x84/0x8c
   refcount_dec_and_test+0x20/0xd0
   l2cap_chan_put+0x48/0x12c
   l2cap_recv_frame+0x4770/0x6550
   l2cap_recv_acldata+0x44c/0x7a4
   hci_acldata_packet+0x100/0x188
   hci_rx_work+0x178/0x23c
   process_one_work+0x35c/0x95c
   worker_thread+0x4cc/0x960
   kthread+0x1a8/0x1c4
   ret_from_fork+0x10/0x18

Cc: [email protected]
Reported-by: Lee Jones <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Tested-by: Lee Jones <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2 years agoBluetooth: Always set event mask on suspend
Abhishek Pandit-Subedi [Mon, 25 Jul 2022 22:34:21 +0000 (15:34 -0700)]
Bluetooth: Always set event mask on suspend

When suspending, always set the event mask once disconnects are
successful. Otherwise, if wakeup is disallowed, the event mask is not
set before suspend continues and can result in an early wakeup.

Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier")
Cc: [email protected]
Signed-off-by: Abhishek Pandit-Subedi <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2 years agoBluetooth: mgmt: Fix double free on error path
Dan Carpenter [Tue, 26 Jul 2022 15:13:44 +0000 (18:13 +0300)]
Bluetooth: mgmt: Fix double free on error path

Don't call mgmt_pending_remove() twice (double free).

Fixes: 6b88eff43704 ("Bluetooth: hci_sync: Refactor remove Adv Monitor")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2 years agowifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
Tetsuo Handa [Sun, 17 Jul 2022 12:21:52 +0000 (21:21 +0900)]
wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()

lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1],
for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif
that is being stopped") guards clear_bit() using fq.lock even before
fq_init() from ieee80211_txq_setup_flows() initializes this spinlock.

According to discussion [2], Toke was not happy with expanding usage of
fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we
can instead use synchronize_rcu() for flushing ieee80211_wake_txqs().

Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6
Link: https://lkml.kernel.org/r/[email protected]
Reported-by: syzbot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped")
Tested-by: syzbot <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ pick up commit 3598cb6e1862 ("wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()") from -next]
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agoice: do not setup vlan for loopback VSI
Maciej Fijalkowski [Thu, 7 Jul 2022 10:20:43 +0000 (12:20 +0200)]
ice: do not setup vlan for loopback VSI

Currently loopback test is failiing due to the error returned from
ice_vsi_vlan_setup(). Skip calling it when preparing loopback VSI.

Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
Maciej Fijalkowski [Thu, 7 Jul 2022 10:20:42 +0000 (12:20 +0200)]
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)

Tx side sets EOP and RS bits on descriptors to indicate that a
particular descriptor is the last one and needs to generate an irq when
it was sent. These bits should not be checked on completion path
regardless whether it's the Tx or the Rx. DD bit serves this purpose and
it indicates that a particular descriptor is either for Rx or was
successfully Txed. EOF is also set as loopback test does not xmit
fragmented frames.

Look at (DD | EOF) bits setting in ice_lbtest_receive_frames() instead
of EOP and RS pair.

Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Tested-by: George Kuruvinakunnel <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: Fix VSIs unable to share unicast MAC
Anirudh Venkataramanan [Thu, 21 Jul 2022 08:03:09 +0000 (10:03 +0200)]
ice: Fix VSIs unable to share unicast MAC

The driver currently does not allow two VSIs in the same PF domain
to have the same unicast MAC address. This is incorrect in the sense
that a policy decision is being made in the driver when it must be
left to the user. This approach was causing issues when rebooting
the system with VFs spawned not being able to change their MAC addresses.
Such errors were present in dmesg:

[ 7921.068237] ice 0000:b6:00.2 ens2f2: Unicast MAC 6a:0d:e4:70:ca:d1 already
exists on this PF. Preventing setting VF 7 unicast MAC address to 6a:0d:e4:70:ca:d1

Fix that by removing this restriction. Doing this also allows
us to remove some additional code that's checking if a unicast MAC
filter already exists.

Fixes: 47ebc7b02485 ("ice: Check if unicast MAC exists before setting VF MAC")
Signed-off-by: Anirudh Venkataramanan <[email protected]>
Signed-off-by: Sylwester Dziedziuch <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: Fix tunnel checksum offload with fragmented traffic
Przemyslaw Patynowski [Mon, 18 Jul 2022 11:34:27 +0000 (13:34 +0200)]
ice: Fix tunnel checksum offload with fragmented traffic

Fix checksum offload on VXLAN tunnels.
In case, when mpls protocol is not used, set l4 header to transport
header of skb. This fixes case, when user tries to offload checksums
of VXLAN tunneled traffic.

Steps for reproduction (requires link partner with tunnels):
ip l s enp130s0f0 up
ip a f enp130s0f0
ip a a 10.10.110.2/24 dev enp130s0f0
ip l s enp130s0f0 mtu 1600
ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev enp130s0f0 dstport 4789
ip l s vxlan12_sut up
ip a a 20.10.110.2/24 dev vxlan12_sut
iperf3 -c 20.10.110.1 #should connect

Offload params: td_offset, cd_tunnel_params were
corrupted, due to l4 header pointing wrong address. NIC would then drop
those packets internally, due to incorrect TX descriptor data,
which increased GLV_TEPC register.

Fixes: 69e66c04c672 ("ice: Add mpls+tso support")
Signed-off-by: Przemyslaw Patynowski <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoice: Fix max VLANs available for VF
Przemyslaw Patynowski [Mon, 4 Jul 2022 13:46:12 +0000 (15:46 +0200)]
ice: Fix max VLANs available for VF

Legacy VLAN implementation allows for untrusted VF to have 8 VLAN
filters, not counting VLAN 0 filters. Current VLAN_V2 implementation
lowers available filters for VF, by counting in VLAN 0 filter for both
TPIDs.
Fix this by counting only non zero VLAN filters.
Without this patch, untrusted VF would not be able to access 8 VLAN
filters.

Fixes: cc71de8fa133 ("ice: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2")
Signed-off-by: Przemyslaw Patynowski <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Marek Szlosek <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
2 years agoperf test: Avoid sysfs state affecting fake events
Ian Rogers [Mon, 25 Jul 2022 22:36:02 +0000 (15:36 -0700)]
perf test: Avoid sysfs state affecting fake events

Icelake has a slots event, on my Skylakex I have CPU events in sysfs of
topdown-slots-issued and topdown-total-slots.

Legacy event parsing would try to use '-' to separate parts of an event
and so perf_pmu__parse_init sets 'slots' to be a
PMU_EVENT_SYMBOL_SUFFIX2.

As such parsing the slots event for a fake PMU fails as a
PMU_EVENT_SYMBOL_SUFFIX2 isn't made into the PE_PMU_EVENT_FAKE token.

Resolve this issue by test initializing the PMU parsing state before
every parse. This must be done every parse as the state is removes after
each parse_events.

Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Kshipra Bopardikar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf vendor events intel: Update event list for haswellx
Zhengjun Xing [Tue, 14 Jun 2022 14:50:19 +0000 (22:50 +0800)]
perf vendor events intel: Update event list for haswellx

Update JSON core/uncore events for haswellx to perf.

Based on HSX JSON list v24:

https://download.01.org/perfmon/HSX

Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Xing Zhengjun <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf vendor events intel: Update event list for broadwellx
Zhengjun Xing [Tue, 14 Jun 2022 14:50:18 +0000 (22:50 +0800)]
perf vendor events intel: Update event list for broadwellx

Update JSON core/uncore events for broadwellx to perf.

Based on BDX JSON list v19:

https://download.01.org/perfmon/BDX

Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Xing Zhengjun <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf vendor events intel: Update event list for Snowridgex
Zhengjun Xing [Thu, 9 Jun 2022 09:42:22 +0000 (17:42 +0800)]
perf vendor events intel: Update event list for Snowridgex

More uncore events are added in the converter tool:

https://github.com/intel/event-converter-for-linux-perf

Keep both alias and the original name for the events, in case someone
already used the alias in their script.

Generate the perf events based on Snowridgex(SNR) event list v1.20:

https://download.01.org/perfmon/SNR/

Signed-off-by: Xing Zhengjun <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf vendor events intel: Rename tremontx to snowridgex
Zhengjun Xing [Thu, 9 Jun 2022 09:42:21 +0000 (17:42 +0800)]
perf vendor events intel: Rename tremontx to snowridgex

Tremontx was an old name for Snowridgex, so rename Tremontx to Snowridgex.

Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Xing Zhengjun <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf vendor events intel: Update event list for Sapphirerapids
Zhengjun Xing [Tue, 7 Jun 2022 09:27:49 +0000 (17:27 +0800)]
perf vendor events intel: Update event list for Sapphirerapids

Update JSON event list for Sapphirerapids to perf.

Based on JSON list v1.02:

https://download.01.org/perfmon/SPR/

Signed-off-by: Xing Zhengjun <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf vendor events intel: Update event list for Alderlake
Zhengjun Xing [Tue, 7 Jun 2022 09:27:48 +0000 (17:27 +0800)]
perf vendor events intel: Update event list for Alderlake

Update JSON event list for Alderlake to perf.

It is a hybrid event list for both Atom and Core.

Based on JSON list v1.11:

https://download.01.org/perfmon/ADL/

Signed-off-by: Xing Zhengjun <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf inject: Fix spelling mistake "theads" -> "threads"
Colin Ian King [Thu, 21 Jul 2022 12:45:28 +0000 (13:45 +0100)]
perf inject: Fix spelling mistake "theads" -> "threads"

There is a spelling mistake in a pr_err message. Fix it.

Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Add workqueue trace BPF support
Yang Jihong [Sat, 9 Jul 2022 01:50:33 +0000 (09:50 +0800)]
perf kwork: Add workqueue trace BPF support

Implements workqueue trace bpf function.

Test cases:

  # perf kwork -k workqueue lat -b
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
    Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
   --------------------------------------------------------------------------------------------------------------------------------
    (w)addrconf_verify_work        | 0002 |      5.856 ms |         1 |      5.856 ms |     111994.634313 s |     111994.640169 s |
    (w)vmstat_update               | 0001 |      1.247 ms |         1 |      1.247 ms |     111996.462651 s |     111996.463899 s |
    (w)neigh_periodic_work         | 0001 |      1.183 ms |         1 |      1.183 ms |     111996.462789 s |     111996.463973 s |
    (w)neigh_managed_work          | 0001 |      0.989 ms |         2 |      1.635 ms |     111996.462820 s |     111996.464455 s |
    (w)wb_workfn                   | 0000 |      0.667 ms |         1 |      0.667 ms |     111996.384273 s |     111996.384940 s |
    (w)bpf_prog_free_deferred      | 0001 |      0.495 ms |         1 |      0.495 ms |     111986.314201 s |     111986.314696 s |
    (w)mix_interrupt_randomness    | 0002 |      0.421 ms |         6 |      0.749 ms |     111995.927750 s |     111995.928499 s |
    (w)vmstat_shepherd             | 0000 |      0.374 ms |         2 |      0.385 ms |     111991.265242 s |     111991.265627 s |
    (w)e1000_watchdog              | 0002 |      0.356 ms |         5 |      0.390 ms |     111994.528380 s |     111994.528770 s |
    (w)vmstat_update               | 0000 |      0.231 ms |         2 |      0.365 ms |     111996.384407 s |     111996.384772 s |
    (w)flush_to_ldisc              | 0006 |      0.165 ms |         1 |      0.165 ms |     111995.930606 s |     111995.930771 s |
    (w)flush_to_ldisc              | 0000 |      0.094 ms |         2 |      0.095 ms |     111996.460453 s |     111996.460548 s |
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork -k workqueue rep -b
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    (w)e1000_watchdog              | 0002 |      0.627 ms |         2 |      0.324 ms |     112002.720665 s |     112002.720989 s |
    (w)flush_to_ldisc              | 0007 |      0.598 ms |         2 |      0.534 ms |     112000.875226 s |     112000.875761 s |
    (w)wq_barrier_func             | 0007 |      0.492 ms |         1 |      0.492 ms |     112000.876981 s |     112000.877473 s |
    (w)flush_to_ldisc              | 0007 |      0.281 ms |         1 |      0.281 ms |     112005.826882 s |     112005.827163 s |
    (w)mix_interrupt_randomness    | 0002 |      0.229 ms |         3 |      0.102 ms |     112005.825671 s |     112005.825774 s |
    (w)vmstat_shepherd             | 0000 |      0.202 ms |         1 |      0.202 ms |     112001.504511 s |     112001.504713 s |
    (w)bpf_prog_free_deferred      | 0001 |      0.181 ms |         1 |      0.181 ms |     112000.883251 s |     112000.883432 s |
    (w)wb_workfn                   | 0007 |      0.130 ms |         1 |      0.130 ms |     112001.505195 s |     112001.505325 s |
    (w)vmstat_update               | 0000 |      0.053 ms |         1 |      0.053 ms |     112001.504763 s |     112001.504815 s |
   --------------------------------------------------------------------------------------------------------------------------------

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Add softirq trace BPF support
Yang Jihong [Sat, 9 Jul 2022 01:50:32 +0000 (09:50 +0800)]
perf kwork: Add softirq trace BPF support

Implements softirq trace bpf function.

Test cases:
Trace softirq latency without filter:

  # perf kwork -k softirq lat -b
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
    Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
   --------------------------------------------------------------------------------------------------------------------------------
    (s)RCU:9                       | 0005 |      0.281 ms |         3 |      0.338 ms |     111295.752222 s |     111295.752560 s |
    (s)RCU:9                       | 0002 |      0.262 ms |        24 |      1.400 ms |     111301.335986 s |     111301.337386 s |
    (s)SCHED:7                     | 0005 |      0.177 ms |        14 |      0.212 ms |     111295.752270 s |     111295.752481 s |
    (s)RCU:9                       | 0007 |      0.161 ms |        47 |      2.022 ms |     111295.402159 s |     111295.404181 s |
    (s)NET_RX:3                    | 0003 |      0.149 ms |        12 |      1.261 ms |     111301.192964 s |     111301.194225 s |
    (s)TIMER:1                     | 0001 |      0.105 ms |         9 |      0.198 ms |     111301.180191 s |     111301.180389 s |
    ... <SNIP> ...
    (s)NET_RX:3                    | 0002 |      0.098 ms |         6 |      0.124 ms |     111295.403760 s |     111295.403884 s |
    (s)SCHED:7                     | 0001 |      0.093 ms |        19 |      0.242 ms |     111301.180256 s |     111301.180498 s |
    (s)SCHED:7                     | 0007 |      0.078 ms |        15 |      0.188 ms |     111300.064226 s |     111300.064415 s |
    (s)SCHED:7                     | 0004 |      0.077 ms |        11 |      0.213 ms |     111301.361759 s |     111301.361973 s |
    (s)SCHED:7                     | 0000 |      0.063 ms |        33 |      0.805 ms |     111295.401811 s |     111295.402616 s |
    (s)SCHED:7                     | 0003 |      0.063 ms |        14 |      0.085 ms |     111301.192255 s |     111301.192340 s |
   --------------------------------------------------------------------------------------------------------------------------------

Trace softirq latency with cpu filter:

  # perf kwork -k softirq lat -b -C 1
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
    Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
   --------------------------------------------------------------------------------------------------------------------------------
    (s)RCU:9                       | 0001 |      0.178 ms |         5 |      0.572 ms |     111435.534135 s |     111435.534707 s |
   --------------------------------------------------------------------------------------------------------------------------------

Trace softirq latency with name filter:

  # perf kwork -k softirq lat -b -n SCHED
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
    Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
   --------------------------------------------------------------------------------------------------------------------------------
    (s)SCHED:7                     | 0001 |      0.295 ms |        15 |      2.183 ms |     111452.534950 s |     111452.537133 s |
    (s)SCHED:7                     | 0002 |      0.215 ms |        10 |      0.315 ms |     111460.000238 s |     111460.000553 s |
    (s)SCHED:7                     | 0005 |      0.190 ms |        29 |      0.338 ms |     111457.032538 s |     111457.032876 s |
    (s)SCHED:7                     | 0003 |      0.097 ms |        10 |      0.319 ms |     111452.434351 s |     111452.434670 s |
    (s)SCHED:7                     | 0006 |      0.089 ms |         1 |      0.089 ms |     111450.737450 s |     111450.737539 s |
    (s)SCHED:7                     | 0007 |      0.085 ms |        17 |      0.169 ms |     111452.471333 s |     111452.471502 s |
    (s)SCHED:7                     | 0004 |      0.071 ms |        15 |      0.221 ms |     111452.535252 s |     111452.535473 s |
    (s)SCHED:7                     | 0000 |      0.044 ms |        32 |      0.130 ms |     111460.001982 s |     111460.002112 s |
   --------------------------------------------------------------------------------------------------------------------------------

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Add {} for multiline if blocks ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Add IRQ trace BPF support
Yang Jihong [Sat, 9 Jul 2022 01:50:31 +0000 (09:50 +0800)]
perf kwork: Add IRQ trace BPF support

Implements irq trace bpf function.

Test cases:
Trace irq without filter:

  # perf kwork -k irq rep -b
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    virtio0-requests:25            | 0000 |     31.026 ms |       285 |      1.493 ms |     110326.049963 s |     110326.051456 s |
    eth0:10                        | 0002 |      7.875 ms |        96 |      1.429 ms |     110313.916835 s |     110313.918264 s |
    ata_piix:14                    | 0002 |      2.510 ms |        28 |      0.396 ms |     110331.367987 s |     110331.368383 s |
   --------------------------------------------------------------------------------------------------------------------------------

Trace irq with cpu filter:

  # perf kwork -k irq rep -b -C 0
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    virtio0-requests:25            | 0000 |     34.288 ms |       282 |      2.061 ms |     110358.078968 s |     110358.081029 s |
   --------------------------------------------------------------------------------------------------------------------------------

Trace irq with name filter:

  # perf kwork -k irq rep -b -n eth0
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    eth0:10                        | 0002 |      2.184 ms |        21 |      0.572 ms |     110386.541699 s |     110386.542271 s |
   --------------------------------------------------------------------------------------------------------------------------------

Trace irq with summary:

  # perf kwork -k irq rep -b -S
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    virtio0-requests:25            | 0000 |     42.923 ms |       285 |      1.181 ms |     110418.128867 s |     110418.130049 s |
    eth0:10                        | 0002 |      2.085 ms |        20 |      0.668 ms |     110416.002935 s |     110416.003603 s |
    ata_piix:14                    | 0002 |      0.970 ms |         4 |      0.656 ms |     110424.034482 s |     110424.035138 s |
   --------------------------------------------------------------------------------------------------------------------------------
    Total count            :       309
    Total runtime   (msec) :    45.977 (0.003% load average)
    Total time span (msec) : 17017.655
   --------------------------------------------------------------------------------------------------------------------------------

Committer testing:

  # perf kwork -k irq rep -b
  Starting trace, Hit <Ctrl+C> to stop and report
  ^C
    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    nvme0q20:145                   | 0019 |      0.570 ms |        28 |      0.064 ms |      26966.635102 s |      26966.635167 s |
    amdgpu:162                     | 0002 |      0.568 ms |        29 |      0.068 ms |      26966.644346 s |      26966.644414 s |
    nvme0q4:129                    | 0003 |      0.565 ms |        31 |      0.037 ms |      26966.614830 s |      26966.614866 s |
    nvme0q16:141                   | 0015 |      0.205 ms |        66 |      0.012 ms |      26967.145161 s |      26967.145174 s |
    nvme0q29:154                   | 0028 |      0.154 ms |        44 |      0.014 ms |      26967.078970 s |      26967.078984 s |
    nvme0q10:135                   | 0009 |      0.134 ms |        43 |      0.011 ms |      26967.132093 s |      26967.132104 s |
    nvme0q2:127                    | 0001 |      0.132 ms |        26 |      0.011 ms |      26966.883584 s |      26966.883595 s |
    nvme0q25:150                   | 0024 |      0.127 ms |        32 |      0.014 ms |      26966.631419 s |      26966.631433 s |
    nvme0q14:139                   | 0013 |      0.110 ms |        21 |      0.017 ms |      26966.760843 s |      26966.760861 s |
    nvme0q30:155                   | 0029 |      0.102 ms |        30 |      0.022 ms |      26966.677171 s |      26966.677193 s |
    nvme0q13:138                   | 0012 |      0.088 ms |        20 |      0.015 ms |      26966.738733 s |      26966.738748 s |
    nvme0q6:131                    | 0005 |      0.087 ms |        13 |      0.020 ms |      26966.648445 s |      26966.648465 s |
    nvme0q28:153                   | 0027 |      0.066 ms |        12 |      0.015 ms |      26966.771431 s |      26966.771447 s |
    nvme0q26:151                   | 0025 |      0.060 ms |        13 |      0.012 ms |      26966.704266 s |      26966.704278 s |
    nvme0q21:146                   | 0020 |      0.054 ms |        20 |      0.011 ms |      26967.322082 s |      26967.322094 s |
    nvme0q1:126                    | 0000 |      0.046 ms |        11 |      0.013 ms |      26966.859754 s |      26966.859767 s |
    nvme0q17:142                   | 0016 |      0.046 ms |        10 |      0.011 ms |      26967.114513 s |      26967.114524 s |
    xhci_hcd:74                    | 0015 |      0.041 ms |         3 |      0.016 ms |      26967.086004 s |      26967.086020 s |
    nvme0q8:133                    | 0007 |      0.039 ms |        12 |      0.008 ms |      26966.712056 s |      26966.712063 s |
    nvme0q32:157                   | 0031 |      0.036 ms |        10 |      0.014 ms |      26966.627054 s |      26966.627068 s |
    nvme0q9:134                    | 0008 |      0.036 ms |        11 |      0.011 ms |      26967.258452 s |      26967.258462 s |
    nvme0q7:132                    | 0006 |      0.024 ms |         3 |      0.014 ms |      26966.767404 s |      26966.767418 s |
    nvme0q11:136                   | 0010 |      0.023 ms |         5 |      0.006 ms |      26966.935455 s |      26966.935461 s |
    nvme0q31:156                   | 0030 |      0.018 ms |         5 |      0.006 ms |      26966.627517 s |      26966.627524 s |
    nvme0q12:137                   | 0011 |      0.015 ms |         2 |      0.014 ms |      26966.799588 s |      26966.799602 s |
    enp5s0-rx-0:164                | 0006 |      0.009 ms |         2 |      0.005 ms |      26966.742024 s |      26966.742028 s |
    enp5s0-rx-1:165                | 0007 |      0.006 ms |         2 |      0.004 ms |      26966.939486 s |      26966.939490 s |
    enp5s0-tx-0:166                | 0008 |      0.005 ms |         1 |      0.005 ms |      26966.939484 s |      26966.939489 s |
    enp5s0-tx-1:167                | 0009 |      0.005 ms |         1 |      0.005 ms |      26966.939484 s |      26966.939489 s |
   --------------------------------------------------------------------------------------------------------------------------------

  #t

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Implement BPF trace
Yang Jihong [Sat, 9 Jul 2022 01:50:30 +0000 (09:50 +0800)]
perf kwork: Implement BPF trace

'perf record' generates perf.data, which generates extra interrupts
for hard disk, amount of data to be collected increases with time.

Using eBPF trace can process the data in kernel, which solves the
preceding two problems.

Add -b/--use-bpf option for latency and report to support
tracing kwork events using eBPF:

1. Create bpf prog and attach to tracepoints,
2. Start tracing after command is entered,
3. After user hit "ctrl+c", stop tracing and report,
4. Support CPU and name filtering.

This commit implements the framework code and
does not add specific event support.

Test cases:

  # perf kwork rep -h

   Usage: perf kwork report [<options>]

      -b, --use-bpf         Use BPF to measure kwork runtime
      -C, --cpu <cpu>       list of cpus to profile
      -i, --input <file>    input file name
      -n, --name <name>     event name to profile
      -s, --sort <key[,key2...]>
                            sort by key(s): runtime, max, count
      -S, --with-summary    Show summary with statistics
          --time <str>      Time span for analysis (start,stop)

  # perf kwork lat -h

   Usage: perf kwork latency [<options>]

      -b, --use-bpf         Use BPF to measure kwork latency
      -C, --cpu <cpu>       list of cpus to profile
      -i, --input <file>    input file name
      -n, --name <name>     event name to profile
      -s, --sort <key[,key2...]>
                            sort by key(s): avg, max, count
          --time <str>      Time span for analysis (start,stop)

  # perf kwork lat -b
  Unsupported bpf trace class irq

  # perf kwork rep -b
  Unsupported bpf trace class irq

Signed-off-by: Yang Jihong <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Simplify work_findnew() ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Implement perf kwork timehist
Yang Jihong [Sat, 9 Jul 2022 01:50:29 +0000 (09:50 +0800)]
perf kwork: Implement perf kwork timehist

Implements framework of perf kwork timehist,
to provide an analysis of kernel work events.

Test cases:

  # perf kwork tim
   Runtime start      Runtime end        Cpu     Kwork name                      Runtime     Delaytime
                                                 (TYPE)NAME:NUM                  (msec)      (msec)
   -----------------  -----------------  ------  ------------------------------  ----------  ----------
        91576.060290       91576.060344  [0000]  (s)RCU:9                             0.055       0.111
        91576.061470       91576.061547  [0000]  (s)SCHED:7                           0.077       0.073
        91576.062604       91576.062697  [0001]  (s)RCU:9                             0.094       0.409
        91576.064443       91576.064517  [0002]  (s)RCU:9                             0.074       0.114
        91576.065144       91576.065211  [0000]  (s)SCHED:7                           0.067       0.058
        91576.066564       91576.066609  [0003]  (s)RCU:9                             0.045       0.110
        91576.068495       91576.068559  [0000]  (s)SCHED:7                           0.064       0.059
        91576.068900       91576.068996  [0004]  (s)RCU:9                             0.096       0.726
        91576.069364       91576.069420  [0002]  (s)RCU:9                             0.056       0.082
        91576.069649       91576.069701  [0004]  (s)RCU:9                             0.052       0.111
        91576.070147       91576.070206  [0000]  (s)SCHED:7                           0.060       0.057
        91576.073147       91576.073202  [0000]  (s)SCHED:7                           0.054       0.060
  <SNIP>

  # perf kwork tim --max-stack 2 -g
   Runtime start      Runtime end        Cpu     Kwork name                      Runtime     Delaytime
                                                 (TYPE)NAME:NUM                  (msec)      (msec)
   -----------------  -----------------  ------  ------------------------------  ----------  ----------
        91576.060290       91576.060344  [0000]  (s)RCU:9                             0.055       0.111   irq_exit_rcu <- sysvec_apic_timer_interrupt
        91576.061470       91576.061547  [0000]  (s)SCHED:7                           0.077       0.073   irq_exit_rcu <- sysvec_call_function_single
        91576.062604       91576.062697  [0001]  (s)RCU:9                             0.094       0.409   irq_exit_rcu <- sysvec_apic_timer_interrupt
        91576.064443       91576.064517  [0002]  (s)RCU:9                             0.074       0.114   irq_exit_rcu <- sysvec_apic_timer_interrupt
        91576.065144       91576.065211  [0000]  (s)SCHED:7                           0.067       0.058   irq_exit_rcu <- sysvec_call_function_single
        91576.066564       91576.066609  [0003]  (s)RCU:9                             0.045       0.110   irq_exit_rcu <- sysvec_apic_timer_interrupt
        91576.068495       91576.068559  [0000]  (s)SCHED:7                           0.064       0.059   irq_exit_rcu <- sysvec_call_function_single
        91576.068900       91576.068996  [0004]  (s)RCU:9                             0.096       0.726   irq_exit_rcu <- sysvec_apic_timer_interrupt
        91576.069364       91576.069420  [0002]  (s)RCU:9                             0.056       0.082   irq_exit_rcu <- sysvec_apic_timer_interrupt
        91576.069649       91576.069701  [0004]  (s)RCU:9                             0.052       0.111   irq_exit_rcu <- sysvec_apic_timer_interrupt
  <SNIP>

Committer testing:

  # perf kwork -k workqueue timehist | head -40
   Runtime start      Runtime end        Cpu     Kwork name                      Runtime     Delaytime
                                                 (TYPE)NAME:NUM                  (msec)      (msec)
   -----------------  -----------------  ------  ------------------------------  ----------  ----------
        26520.211825       26520.211832  [0019]  (w)free_work                         0.007       0.004
        26520.212929       26520.212934  [0020]  (w)free_work                         0.005       0.004
        26520.213226       26520.213228  [0014]  (w)kfree_rcu_work                    0.002       0.004
        26520.214057       26520.214061  [0021]  (w)free_work                         0.004       0.004
        26520.221239       26520.221241  [0007]  (w)kfree_rcu_work                    0.002       0.009
        26520.223232       26520.223238  [0013]  (w)psi_avgs_work                     0.005       0.006
        26520.230057       26520.230060  [0020]  (w)free_work                         0.003       0.003
        26520.270428       26520.270434  [0015]  (w)free_work                         0.006       0.004
        26520.270546       26520.270550  [0014]  (w)free_work                         0.004       0.003
        26520.281626       26520.281629  [0015]  (w)free_work                         0.003       0.002
        26520.287225       26520.287230  [0012]  (w)psi_avgs_work                     0.005       0.008
        26520.287231       26520.287235  [0001]  (w)psi_avgs_work                     0.004       0.011
        26520.287236       26520.287239  [0001]  (w)psi_avgs_work                     0.003       0.012
        26520.329488       26520.329492  [0024]  (w)free_work                         0.004       0.004
        26520.330600       26520.330605  [0007]  (w)free_work                         0.005       0.004
        26520.334218       26520.334218  [0007]  (w)kfree_rcu_monitor                 0.001       0.002
        26520.335220       26520.335221  [0005]  (w)kfree_rcu_monitor                 0.001       0.004
        26520.343980       26520.343985  [0007]  (w)free_work                         0.005       0.002
        26520.345093       26520.345097  [0006]  (w)free_work                         0.004       0.003
        26520.351233       26520.351238  [0027]  (w)psi_avgs_work                     0.005       0.008
        26520.353228       26520.353229  [0007]  (w)kfree_rcu_work                    0.001       0.002
        26520.353229       26520.353231  [0005]  (w)kfree_rcu_work                    0.001       0.006
        26520.382381       26520.382383  [0006]  (w)free_work                         0.003       0.002
        26520.386547       26520.386548  [0006]  (w)free_work                         0.002       0.001
        26520.391243       26520.391245  [0015]  (w)console_callback                  0.002       0.016
        26520.415369       26520.415621  [0027]  (w)btrfs_work_helper                 0.252
        26520.415351       26520.416174  [0002]  (w)btrfs_work_helper                 0.823       0.037
        26520.415343       26520.416304  [0031]  (w)btrfs_work_helper                 0.961
        26520.415335       26520.417078  [0001]  (w)btrfs_work_helper                 1.743
        26520.415250       26520.417564  [0002]  (w)wb_workfn                         2.314
        26520.424777       26520.424787  [0002]  (w)btrfs_work_helper                 0.010
        26520.424788       26520.424798  [0002]  (w)btrfs_work_helper                 0.010
        26520.424790       26520.424805  [0001]  (w)btrfs_work_helper                 0.016       0.016
        26520.424801       26520.424807  [0002]  (w)btrfs_work_helper                 0.006
        26520.424809       26520.424831  [0002]  (w)btrfs_work_helper                 0.022       0.030
        26520.424824       26520.424835  [0027]  (w)btrfs_work_helper                 0.011
        26520.424809       26520.424867  [0001]  (w)btrfs_work_helper                 0.059       0.032
  #

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Add workqueue latency support
Yang Jihong [Sat, 9 Jul 2022 01:50:28 +0000 (09:50 +0800)]
perf kwork: Add workqueue latency support

Implements workqueue latency function.

Test cases:

  # perf kwork -k workqueue lat

    Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
   --------------------------------------------------------------------------------------------------------------------------------
    (w)vmstat_update               | 0001 |      5.004 ms |         1 |      5.004 ms |      44001.745646 s |      44001.750650 s |
    (w)vmstat_update               | 0006 |      1.773 ms |         1 |      1.773 ms |      44000.830840 s |      44000.832613 s |
    (w)vmstat_shepherd             | 0000 |      0.992 ms |         8 |      2.474 ms |      44007.717845 s |      44007.720318 s |
    (w)vmstat_update               | 0000 |      0.974 ms |         5 |      2.624 ms |      44004.785970 s |      44004.788594 s |
    (w)e1000_watchdog              | 0002 |      0.687 ms |         5 |      2.632 ms |      44005.009334 s |      44005.011966 s |
    (w)vmstat_update               | 0002 |      0.307 ms |         1 |      0.307 ms |      44004.817395 s |      44004.817702 s |
    (w)vmstat_update               | 0004 |      0.296 ms |         1 |      0.296 ms |      43997.913677 s |      43997.913973 s |
    (w)mix_interrupt_randomness    | 0000 |      0.283 ms |       285 |      3.724 ms |      44006.790889 s |      44006.794613 s |
    (w)neigh_managed_work          | 0001 |      0.271 ms |         1 |      0.271 ms |      43997.665542 s |      43997.665813 s |
    (w)vmstat_update               | 0005 |      0.261 ms |         1 |      0.261 ms |      44007.820542 s |      44007.820803 s |
    (w)neigh_managed_work          | 0004 |      0.220 ms |         1 |      0.220 ms |      44002.953287 s |      44002.953507 s |
    (w)neigh_periodic_work         | 0004 |      0.217 ms |         1 |      0.217 ms |      43999.929718 s |      43999.929935 s |
    (w)mix_interrupt_randomness    | 0002 |      0.199 ms |         5 |      0.310 ms |      44005.012316 s |      44005.012625 s |
    (w)vmstat_update               | 0003 |      0.199 ms |         4 |      0.307 ms |      44005.714391 s |      44005.714699 s |
    (w)gc_worker                   | 0001 |      0.071 ms |       173 |      1.128 ms |      44002.062579 s |      44002.063707 s |
   --------------------------------------------------------------------------------------------------------------------------------
    INFO: 0.020% skipped events (17 including 10 raise, 7 entry, 0 exit)

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Add softirq latency support
Yang Jihong [Sat, 9 Jul 2022 01:50:27 +0000 (09:50 +0800)]
perf kwork: Add softirq latency support

Implements softirq latency function.

Test cases:

  # perf kwork -k softirq lat

    Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
   --------------------------------------------------------------------------------------------------------------------------------
    (s)TIMER:1                     | 0006 |      1.048 ms |         1 |      1.048 ms |      44000.829759 s |      44000.830807 s |
    (s)TIMER:1                     | 0001 |      1.008 ms |         4 |      3.434 ms |      43997.662069 s |      43997.665503 s |
    (s)RCU:9                       | 0006 |      0.675 ms |         7 |      1.328 ms |      43997.670304 s |      43997.671632 s |
    (s)RCU:9                       | 0000 |      0.414 ms |       701 |      3.996 ms |      43997.661170 s |      43997.665167 s |
    (s)RCU:9                       | 0005 |      0.245 ms |        88 |      1.866 ms |      43997.683105 s |      43997.684971 s |
    (s)SCHED:7                     | 0000 |      0.158 ms |       677 |      2.639 ms |      44004.785716 s |      44004.788355 s |
    ... <SNIP> ...
    (s)RCU:9                       | 0002 |      0.141 ms |       932 |      1.662 ms |      44005.010206 s |      44005.011868 s |
    (s)RCU:9                       | 0003 |      0.129 ms |      2193 |      1.507 ms |      44006.010208 s |      44006.011715 s |
    (s)TIMER:1                     | 0005 |      0.128 ms |         1 |      0.128 ms |      44007.820346 s |      44007.820474 s |
    (s)SCHED:7                     | 0002 |      0.040 ms |      1731 |      0.211 ms |      44005.009237 s |      44005.009447 s |
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork -k softirq lat -C 1,2

    Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
   --------------------------------------------------------------------------------------------------------------------------------
    (s)TIMER:1                     | 0001 |      1.008 ms |         4 |      3.434 ms |      43997.662069 s |      43997.665503 s |
    (s)RCU:9                       | 0001 |      0.216 ms |      1619 |      3.659 ms |      43997.662069 s |      43997.665727 s |
    (s)RCU:9                       | 0002 |      0.141 ms |       932 |      1.662 ms |      44005.010206 s |      44005.011868 s |
    (s)NET_RX:3                    | 0002 |      0.106 ms |         5 |      0.163 ms |      44005.012255 s |      44005.012418 s |
    (s)TIMER:1                     | 0002 |      0.084 ms |         9 |      0.114 ms |      44005.009168 s |      44005.009282 s |
    (s)SCHED:7                     | 0001 |      0.049 ms |       655 |      0.837 ms |      44005.707998 s |      44005.708835 s |
    (s)SCHED:7                     | 0002 |      0.040 ms |      1731 |      0.211 ms |      44005.009237 s |      44005.009447 s |
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork -k softirq lat -n RCU

    Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
   --------------------------------------------------------------------------------------------------------------------------------
    (s)RCU:9                       | 0006 |      0.675 ms |         7 |      1.328 ms |      43997.670304 s |      43997.671632 s |
    (s)RCU:9                       | 0000 |      0.414 ms |       701 |      3.996 ms |      43997.661170 s |      43997.665167 s |
    (s)RCU:9                       | 0005 |      0.245 ms |        88 |      1.866 ms |      43997.683105 s |      43997.684971 s |
    (s)RCU:9                       | 0004 |      0.237 ms |        26 |      0.792 ms |      43997.683018 s |      43997.683810 s |
    (s)RCU:9                       | 0007 |      0.217 ms |       140 |      1.335 ms |      43997.671080 s |      43997.672415 s |
    (s)RCU:9                       | 0001 |      0.216 ms |      1619 |      3.659 ms |      43997.662069 s |      43997.665727 s |
    (s)RCU:9                       | 0002 |      0.141 ms |       932 |      1.662 ms |      44005.010206 s |      44005.011868 s |
    (s)RCU:9                       | 0003 |      0.129 ms |      2193 |      1.507 ms |      44006.010208 s |      44006.011715 s |
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork -k softirq lat -s count,avg -n RCU

    Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
   --------------------------------------------------------------------------------------------------------------------------------
    (s)RCU:9                       | 0003 |      0.129 ms |      2193 |      1.507 ms |      44006.010208 s |      44006.011715 s |
    (s)RCU:9                       | 0001 |      0.216 ms |      1619 |      3.659 ms |      43997.662069 s |      43997.665727 s |
    (s)RCU:9                       | 0002 |      0.141 ms |       932 |      1.662 ms |      44005.010206 s |      44005.011868 s |
    (s)RCU:9                       | 0000 |      0.414 ms |       701 |      3.996 ms |      43997.661170 s |      43997.665167 s |
    (s)RCU:9                       | 0007 |      0.217 ms |       140 |      1.335 ms |      43997.671080 s |      43997.672415 s |
    (s)RCU:9                       | 0005 |      0.245 ms |        88 |      1.866 ms |      43997.683105 s |      43997.684971 s |
    (s)RCU:9                       | 0004 |      0.237 ms |        26 |      0.792 ms |      43997.683018 s |      43997.683810 s |
    (s)RCU:9                       | 0006 |      0.675 ms |         7 |      1.328 ms |      43997.670304 s |      43997.671632 s |
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork -k softirq lat --time 43997,

    Kwork Name                     | Cpu  | Avg delay     | Count     | Max delay     | Max delay start     | Max delay end       |
   --------------------------------------------------------------------------------------------------------------------------------
    (s)TIMER:1                     | 0006 |      1.048 ms |         1 |      1.048 ms |      44000.829759 s |      44000.830807 s |
    (s)TIMER:1                     | 0001 |      1.008 ms |         4 |      3.434 ms |      43997.662069 s |      43997.665503 s |
    (s)RCU:9                       | 0006 |      0.675 ms |         7 |      1.328 ms |      43997.670304 s |      43997.671632 s |
    (s)RCU:9                       | 0000 |      0.414 ms |       701 |      3.996 ms |      43997.661170 s |      43997.665167 s |
    (s)TIMER:1                     | 0004 |      0.083 ms |        21 |      0.127 ms |      44004.969171 s |      44004.969298 s |
    ... <SNIP> ...
    (s)SCHED:7                     | 0005 |      0.050 ms |         4 |      0.086 ms |      43997.684852 s |      43997.684938 s |
    (s)SCHED:7                     | 0001 |      0.049 ms |       655 |      0.837 ms |      44005.707998 s |      44005.708835 s |
    (s)SCHED:7                     | 0007 |      0.044 ms |       171 |      0.077 ms |      43997.943265 s |      43997.943342 s |
    (s)SCHED:7                     | 0002 |      0.040 ms |      1731 |      0.211 ms |      44005.009237 s |      44005.009447 s |
   --------------------------------------------------------------------------------------------------------------------------------

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Implement perf kwork latency
Yang Jihong [Sat, 9 Jul 2022 01:50:26 +0000 (09:50 +0800)]
perf kwork: Implement perf kwork latency

Implements framework of perf kwork latency, which is used to report time
properties such as delay time and frequency.

Test cases:

  # perf kwork lat -h

   Usage: perf kwork latency [<options>]

      -C, --cpu <cpu>       list of cpus to profile
      -i, --input <file>    input file name
      -n, --name <name>     event name to profile
      -s, --sort <key[,key2...]>
                            sort by key(s): avg, max, count
          --time <str>      Time span for analysis (start,stop)

  # perf kwork lat -C 199
  Requested CPU 199 too large. Consider raising MAX_NR_CPUS
  Invalid cpu bitmap

  # perf kwork lat -i perf_no_exist.data
  failed to open perf_no_exist.data: No such file or directory

  # perf kwork lat -s avg1
    Error: Unknown --sort key: `avg1'

   Usage: perf kwork latency [<options>]

      -C, --cpu <cpu>       list of cpus to profile
      -i, --input <file>    input file name
      -n, --name <name>     event name to profile
      -s, --sort <key[,key2...]>
                            sort by key(s): avg, max, count
          --time <str>      Time span for analysis (start,stop)

  # perf kwork lat --time FFFF,
  Invalid time span

  # perf kwork lat

    Kwork Name                     | Cpu  | Avg delay     | Count    | Max delay     | Max delay start     | Max delay end       |
   --------------------------------------------------------------------------------------------------------------------------------
   --------------------------------------------------------------------------------------------------------------------------------
    INFO: 36.570% skipped events (31537 including 0 raise, 31537 entry, 0 exit)

Since there are no latency-enabled events, the output is empty.

Signed-off-by: Yang Jihong <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Add {} for multiline if blocks ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Add workqueue report support
Yang Jihong [Sat, 9 Jul 2022 01:50:25 +0000 (09:50 +0800)]
perf kwork: Add workqueue report support

Implements workqueue report function.

Test cases:

  # perf kwork -k workqueue rep

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    (w)gc_worker                   | 0001 |   1912.389 ms |       173 |     12.896 ms |      44002.050787 s |      44002.063683 s |
    (w)mix_interrupt_randomness    | 0000 |     24.308 ms |       285 |      3.349 ms |      44004.784908 s |      44004.788257 s |
    (w)e1000_watchdog              | 0002 |      5.332 ms |         5 |      2.059 ms |      44000.914366 s |      44000.916424 s |
    (w)vmstat_update               | 0005 |      0.989 ms |         2 |      0.953 ms |      43997.986991 s |      43997.987944 s |
    (w)vmstat_shepherd             | 0000 |      0.964 ms |         8 |      0.195 ms |      43997.986453 s |      43997.986648 s |
    (w)vmstat_update               | 0003 |      0.306 ms |         6 |      0.077 ms |      44004.689543 s |      44004.689620 s |
    (w)vmstat_update               | 0000 |      0.196 ms |         5 |      0.049 ms |      44005.713732 s |      44005.713781 s |
    (w)vmstat_update               | 0001 |      0.162 ms |         2 |      0.130 ms |      44000.192034 s |      44000.192164 s |
    (w)mix_interrupt_randomness    | 0002 |      0.114 ms |         5 |      0.037 ms |      44005.012625 s |      44005.012662 s |
    (w)vmstat_update               | 0002 |      0.084 ms |         2 |      0.043 ms |      44004.817702 s |      44004.817745 s |
    (w)vmstat_update               | 0006 |      0.067 ms |         2 |      0.041 ms |      43997.987214 s |      43997.987254 s |
    (w)neigh_periodic_work         | 0004 |      0.039 ms |         1 |      0.039 ms |      43999.929935 s |      43999.929974 s |
    (w)vmstat_update               | 0007 |      0.037 ms |         1 |      0.037 ms |      43997.988969 s |      43997.989006 s |
    (w)neigh_managed_work          | 0001 |      0.036 ms |         1 |      0.036 ms |      43997.665813 s |      43997.665849 s |
    (w)neigh_managed_work          | 0004 |      0.036 ms |         1 |      0.036 ms |      44002.953507 s |      44002.953543 s |
    (w)vmstat_update               | 0004 |      0.027 ms |         1 |      0.027 ms |      43997.913973 s |      43997.914000 s |
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork -k workqueue rep -S

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    (w)gc_worker                   | 0001 |   1912.389 ms |       173 |     12.896 ms |      44002.050787 s |      44002.063683 s |
    (w)mix_interrupt_randomness    | 0000 |     24.308 ms |       285 |      3.349 ms |      44004.784908 s |      44004.788257 s |
    (w)e1000_watchdog              | 0002 |      5.332 ms |         5 |      2.059 ms |      44000.914366 s |      44000.916424 s |
    (w)vmstat_update               | 0005 |      0.989 ms |         2 |      0.953 ms |      43997.986991 s |      43997.987944 s |
    (w)vmstat_shepherd             | 0000 |      0.964 ms |         8 |      0.195 ms |      43997.986453 s |      43997.986648 s |
    (w)vmstat_update               | 0003 |      0.306 ms |         6 |      0.077 ms |      44004.689543 s |      44004.689620 s |
    (w)vmstat_update               | 0000 |      0.196 ms |         5 |      0.049 ms |      44005.713732 s |      44005.713781 s |
    (w)vmstat_update               | 0001 |      0.162 ms |         2 |      0.130 ms |      44000.192034 s |      44000.192164 s |
    (w)mix_interrupt_randomness    | 0002 |      0.114 ms |         5 |      0.037 ms |      44005.012625 s |      44005.012662 s |
    (w)vmstat_update               | 0002 |      0.084 ms |         2 |      0.043 ms |      44004.817702 s |      44004.817745 s |
    (w)vmstat_update               | 0006 |      0.067 ms |         2 |      0.041 ms |      43997.987214 s |      43997.987254 s |
    (w)neigh_periodic_work         | 0004 |      0.039 ms |         1 |      0.039 ms |      43999.929935 s |      43999.929974 s |
    (w)vmstat_update               | 0007 |      0.037 ms |         1 |      0.037 ms |      43997.988969 s |      43997.989006 s |
    (w)neigh_managed_work          | 0001 |      0.036 ms |         1 |      0.036 ms |      43997.665813 s |      43997.665849 s |
    (w)neigh_managed_work          | 0004 |      0.036 ms |         1 |      0.036 ms |      44002.953507 s |      44002.953543 s |
    (w)vmstat_update               | 0004 |      0.027 ms |         1 |      0.027 ms |      43997.913973 s |      43997.914000 s |
   --------------------------------------------------------------------------------------------------------------------------------
    Total count            :       500
    Total runtime   (msec) :  1945.085 (0.192% load average)
    Total time span (msec) : 10155.026
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork -k workqueue rep -n vmstat_update

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    (w)vmstat_update               | 0005 |      0.989 ms |         2 |      0.953 ms |      43997.986991 s |      43997.987944 s |
    (w)vmstat_update               | 0003 |      0.306 ms |         6 |      0.077 ms |      44004.689543 s |      44004.689620 s |
    (w)vmstat_update               | 0000 |      0.196 ms |         5 |      0.049 ms |      44005.713732 s |      44005.713781 s |
    (w)vmstat_update               | 0001 |      0.162 ms |         2 |      0.130 ms |      44000.192034 s |      44000.192164 s |
    (w)vmstat_update               | 0002 |      0.084 ms |         2 |      0.043 ms |      44004.817702 s |      44004.817745 s |
    (w)vmstat_update               | 0006 |      0.067 ms |         2 |      0.041 ms |      43997.987214 s |      43997.987254 s |
    (w)vmstat_update               | 0007 |      0.037 ms |         1 |      0.037 ms |      43997.988969 s |      43997.989006 s |
    (w)vmstat_update               | 0004 |      0.027 ms |         1 |      0.027 ms |      43997.913973 s |      43997.914000 s |
   --------------------------------------------------------------------------------------------------------------------------------

Committer testing:

  # perf kwork -k workqueue rep -C 1 | head -20

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    (w)commit_work                 | 0001 |     25.896 ms |         2 |     13.200 ms |      26522.906700 s |      26522.919900 s |
    (w)commit_work                 | 0001 |     13.316 ms |         1 |     13.316 ms |      26522.573246 s |      26522.586562 s |
    (w)commit_work                 | 0001 |     13.177 ms |         1 |     13.177 ms |      26522.673406 s |      26522.686583 s |
    (w)commit_work                 | 0001 |     12.630 ms |         1 |     12.630 ms |      26522.123921 s |      26522.136551 s |
    (w)btrfs_work_helper           | 0001 |      3.544 ms |         1 |      3.544 ms |      26529.131296 s |      26529.134840 s |
    (w)btrfs_work_helper           | 0001 |      3.330 ms |         1 |      3.330 ms |      26529.137698 s |      26529.141028 s |
    (w)btrfs_work_helper           | 0001 |      2.855 ms |         1 |      2.855 ms |      26529.134842 s |      26529.137697 s |
    (w)btrfs_work_helper           | 0001 |      2.757 ms |         1 |      2.757 ms |      26529.124086 s |      26529.126843 s |
    (w)btrfs_work_helper           | 0001 |      2.182 ms |         1 |      2.182 ms |      26529.141030 s |      26529.143212 s |
    (w)btrfs_work_helper           | 0001 |      1.743 ms |         1 |      1.743 ms |      26520.415335 s |      26520.417078 s |
    (w)btrfs_work_helper           | 0001 |      1.499 ms |         1 |      1.499 ms |      26529.127774 s |      26529.129272 s |
    (w)btrfs_work_helper           | 0001 |      1.446 ms |         1 |      1.446 ms |      26529.129848 s |      26529.131294 s |
    (w)btrfs_work_helper           | 0001 |      1.373 ms |         1 |      1.373 ms |      26523.808270 s |      26523.809643 s |
    (w)wb_workfn                   | 0001 |      1.165 ms |         2 |      0.763 ms |      26527.071056 s |      26527.071819 s |
    (w)btrfs_work_helper           | 0001 |      0.926 ms |         1 |      0.926 ms |      26529.126846 s |      26529.127771 s |
    (w)btrfs_work_helper           | 0001 |      0.571 ms |         1 |      0.571 ms |      26529.129275 s |      26529.129846 s |
    (w)wb_workfn                   | 0001 |      0.525 ms |         1 |      0.525 ms |      26522.975151 s |      26522.975676 s |
  #

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Add softirq report support
Yang Jihong [Sat, 9 Jul 2022 01:50:24 +0000 (09:50 +0800)]
perf kwork: Add softirq report support

Implements softirq kwork report function.

Test cases:

  # perf kwork -k softirq rep

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    (s)TIMER:1                     | 0003 |    181.387 ms |      2476 |      1.240 ms |      44004.787960 s |      44004.789201 s |
    (s)RCU:9                       | 0003 |     91.573 ms |      2193 |      0.650 ms |      44004.790258 s |      44004.790908 s |
    (s)RCU:9                       | 0001 |     78.960 ms |      1619 |      1.195 ms |      44001.496553 s |      44001.497749 s |
    (s)SCHED:7                     | 0003 |     55.962 ms |      1255 |      0.954 ms |      44004.812008 s |      44004.812962 s |
    ... <SNIP> ...
    (s)RCU:9                       | 0004 |      0.830 ms |        26 |      0.058 ms |      43997.666418 s |      43997.666476 s |
    (s)TIMER:1                     | 0001 |      0.471 ms |         4 |      0.158 ms |      44007.834694 s |      44007.834852 s |
    (s)RCU:9                       | 0006 |      0.220 ms |         7 |      0.048 ms |      44004.833764 s |      44004.833812 s |
    (s)NET_RX:3                    | 0002 |      0.164 ms |         5 |      0.049 ms |      44005.012418 s |      44005.012466 s |
    (s)TIMER:1                     | 0005 |      0.164 ms |         1 |      0.164 ms |      44007.820474 s |      44007.820638 s |
    (s)TIMER:1                     | 0006 |      0.087 ms |         1 |      0.087 ms |      44000.830807 s |      44000.830894 s |
    (s)SCHED:7                     | 0006 |      0.080 ms |         2 |      0.044 ms |      43997.826145 s |      43997.826189 s |
   --------------------------------------------------------------------------------------------------------------------------------

  #
  # perf kwork -k softirq rep -S

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    (s)TIMER:1                     | 0003 |    181.387 ms |      2476 |      1.240 ms |      44004.787960 s |      44004.789201 s |
    (s)RCU:9                       | 0003 |     91.573 ms |      2193 |      0.650 ms |      44004.790258 s |      44004.790908 s |
    (s)RCU:9                       | 0001 |     78.960 ms |      1619 |      1.195 ms |      44001.496553 s |      44001.497749 s |
    (s)SCHED:7                     | 0000 |     63.631 ms |       680 |      2.690 ms |      44006.721976 s |      44006.724666 s |
    ... <SNIP> ...
    (s)SCHED:7                     | 0003 |     55.962 ms |      1255 |      0.954 ms |      44004.812008 s |      44004.812962 s |
    (s)RCU:9                       | 0006 |      0.220 ms |         7 |      0.048 ms |      44004.833764 s |      44004.833812 s |
    (s)NET_RX:3                    | 0002 |      0.164 ms |         5 |      0.049 ms |      44005.012418 s |      44005.012466 s |
    (s)TIMER:1                     | 0005 |      0.164 ms |         1 |      0.164 ms |      44007.820474 s |      44007.820638 s |
    (s)TIMER:1                     | 0006 |      0.087 ms |         1 |      0.087 ms |      44000.830807 s |      44000.830894 s |
    (s)SCHED:7                     | 0006 |      0.080 ms |         2 |      0.044 ms |      43997.826145 s |      43997.826189 s |
   --------------------------------------------------------------------------------------------------------------------------------
    Total count            :     12748
    Total runtime   (msec) :   661.433 (0.065% load average)
    Total time span (msec) : 10176.441
   --------------------------------------------------------------------------------------------------------------------------------

  #
  # perf kwork -k softirq rep -s count,max

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    (s)TIMER:1                     | 0003 |    181.387 ms |      2476 |      1.240 ms |      44004.787960 s |      44004.789201 s |
    (s)RCU:9                       | 0003 |     91.573 ms |      2193 |      0.650 ms |      44004.790258 s |      44004.790908 s |
    (s)SCHED:7                     | 0002 |     50.039 ms |      1731 |      0.074 ms |      44005.009447 s |      44005.009521 s |
    (s)RCU:9                       | 0001 |     78.960 ms |      1619 |      1.195 ms |      44001.496553 s |      44001.497749 s |
    (s)SCHED:7                     | 0003 |     55.962 ms |      1255 |      0.954 ms |      44004.812008 s |      44004.812962 s |
    ... <SNIP> ...
    (s)RCU:9                       | 0002 |     35.241 ms |       932 |      0.407 ms |      44005.009541 s |      44005.009949 s |
    (s)RCU:9                       | 0000 |     45.710 ms |       702 |      1.144 ms |      44004.787023 s |      44004.788167 s |
    (s)SCHED:7                     | 0006 |      0.080 ms |         2 |      0.044 ms |      43997.826145 s |      43997.826189 s |
    (s)TIMER:1                     | 0005 |      0.164 ms |         1 |      0.164 ms |      44007.820474 s |      44007.820638 s |
    (s)TIMER:1                     | 0006 |      0.087 ms |         1 |      0.087 ms |      44000.830807 s |      44000.830894 s |
   --------------------------------------------------------------------------------------------------------------------------------

Committer testing:

  # perf kwork -k softirq report -C 2 -s count,max

  Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
 --------------------------------------------------------------------------------------------------------------------------------
  (s)SCHED:7                     | 0002 |      0.980 ms |       159 |      0.024 ms |      26035.571037 s |      26035.571061 s |
  (s)RCU:9                       | 0002 |      0.124 ms |        88 |      0.021 ms |      26035.177050 s |      26035.177071 s |
  (s)TIMER:1                     | 0002 |      0.122 ms |        56 |      0.007 ms |      26035.468045 s |      26035.468052 s |
 --------------------------------------------------------------------------------------------------------------------------------

  #

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Add irq report support
Yang Jihong [Sat, 9 Jul 2022 01:50:23 +0000 (09:50 +0800)]
perf kwork: Add irq report support

Implements irq kwork report function.

Test cases:

  # perf kwork record -- sleep 10
  [ perf record: Woken up 0 times to write data ]
  [ perf record: Captured and wrote 6.134 MB perf.data ]

  # perf kwork report

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    virtio0-requests:25            | 0000 |   1167.501 ms |     18284 |      1.096 ms |      44004.464905 s |      44004.466001 s |
    eth0:10                        | 0002 |      0.185 ms |         5 |      0.058 ms |      44005.012222 s |      44005.012280 s |
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork report -C 2

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    eth0:10                        | 0002 |      0.185 ms |         5 |      0.058 ms |      44005.012222 s |      44005.012280 s |
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork report -C 3

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork report -i perf.data

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    virtio0-requests:25            | 0000 |   1167.501 ms |     18284 |      1.096 ms |      44004.464905 s |      44004.466001 s |
    eth0:10                        | 0002 |      0.185 ms |         5 |      0.058 ms |      44005.012222 s |      44005.012280 s |
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork report -s max,freq

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    virtio0-requests:25            | 0000 |   1167.501 ms |     18284 |      1.096 ms |      44004.464905 s |      44004.466001 s |
    eth0:10                        | 0002 |      0.185 ms |         5 |      0.058 ms |      44005.012222 s |      44005.012280 s |
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork report -S

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    virtio0-requests:25            | 0000 |   1167.501 ms |     18284 |      1.096 ms |      44004.464905 s |      44004.466001 s |
    eth0:10                        | 0002 |      0.185 ms |         5 |      0.058 ms |      44005.012222 s |      44005.012280 s |
   --------------------------------------------------------------------------------------------------------------------------------
    Total count            :     18289
    Total runtime   (msec) :  1167.686 (0.115% load average)
    Total time span (msec) : 10159.155
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork report --time 44005,

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    virtio0-requests:25            | 0000 |    402.173 ms |      4695 |      0.981 ms |      44007.831992 s |      44007.832973 s |
    eth0:10                        | 0002 |      0.089 ms |         2 |      0.058 ms |      44005.012222 s |      44005.012280 s |
   --------------------------------------------------------------------------------------------------------------------------------

Committer testing:

  # perf kwork report

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
    nvme0q5:130                    | 0004 |      1.101 ms |        49 |      0.051 ms |      26035.056403 s |      26035.056455 s |
    amdgpu:162                     | 0002 |      0.176 ms |         9 |      0.046 ms |      26035.268020 s |      26035.268066 s |
    nvme0q24:149                   | 0023 |      0.161 ms |        55 |      0.009 ms |      26035.655280 s |      26035.655288 s |
    nvme0q20:145                   | 0019 |      0.090 ms |        33 |      0.014 ms |      26035.939018 s |      26035.939032 s |
    nvme0q31:156                   | 0030 |      0.075 ms |        21 |      0.010 ms |      26035.052237 s |      26035.052247 s |
    nvme0q8:133                    | 0007 |      0.062 ms |        12 |      0.021 ms |      26035.416840 s |      26035.416861 s |
    nvme0q6:131                    | 0005 |      0.054 ms |        22 |      0.010 ms |      26035.199919 s |      26035.199929 s |
    nvme0q19:144                   | 0018 |      0.052 ms |        14 |      0.010 ms |      26035.110615 s |      26035.110625 s |
    nvme0q7:132                    | 0006 |      0.049 ms |        13 |      0.007 ms |      26035.125180 s |      26035.125187 s |
    nvme0q18:143                   | 0017 |      0.033 ms |        14 |      0.007 ms |      26035.169698 s |      26035.169705 s |
    nvme0q17:142                   | 0016 |      0.013 ms |         1 |      0.013 ms |      26035.565147 s |      26035.565160 s |
    enp5s0-rx-0:164                | 0006 |      0.004 ms |         4 |      0.002 ms |      26035.928882 s |      26035.928884 s |
    enp5s0-tx-0:166                | 0008 |      0.003 ms |         3 |      0.002 ms |      26035.870923 s |      26035.870925 s |
   --------------------------------------------------------------------------------------------------------------------------------

  #

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Implement 'report' subcommand
Yang Jihong [Sat, 9 Jul 2022 01:50:22 +0000 (09:50 +0800)]
perf kwork: Implement 'report' subcommand

Implements framework of 'perf kwork report', which is used to report
time properties such as run time and frequency:

Test cases:

  # perf kwork

   Usage: perf kwork [<options>] {record|report}

      -D, --dump-raw-trace  dump raw trace in ASCII
      -f, --force           don't complain, do it
      -k, --kwork <kwork>   list of kwork to profile (irq, softirq, workqueue, etc)
      -v, --verbose         be more verbose (show symbol address, etc)

  # perf kwork report -h

   Usage: perf kwork report [<options>]

      -C, --cpu <cpu>       list of cpus to profile
      -i, --input <file>    input file name
      -n, --name <name>     event name to profile
      -s, --sort <key[,key2...]>
                            sort by key(s): runtime, max, count
      -S, --with-summary    Show summary with statistics
          --time <str>      Time span for analysis (start,stop)

  # perf kwork report

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork report -S

    Kwork Name                     | Cpu  | Total Runtime | Count     | Max runtime   | Max runtime start   | Max runtime end     |
   --------------------------------------------------------------------------------------------------------------------------------
   --------------------------------------------------------------------------------------------------------------------------------
    Total count            :         0
    Total runtime   (msec) :     0.000 (0.000% load average)
    Total time span (msec) :     0.000
   --------------------------------------------------------------------------------------------------------------------------------

  # perf kwork report -C 0,100
  Requested CPU 100 too large. Consider raising MAX_NR_CPUS
  Invalid cpu bitmap

  # perf kwork report -s runtime1
    Error: Unknown --sort key: `runtime1'

   Usage: perf kwork report [<options>]

      -C, --cpu <cpu>       list of cpus to profile
      -i, --input <file>    input file name
      -n, --name <name>     event name to profile
      -s, --sort <key[,key2...]>
                            sort by key(s): runtime, max, count
      -S, --with-summary    Show summary with statistics
          --time <str>      Time span for analysis (start,stop)

  # perf kwork report -i perf_no_exist.data
  failed to open perf_no_exist.data: No such file or directory

  # perf kwork report --time 00FFF,
  Invalid time span

Since there are no report supported events, the output is empty.

Briefly describe the data structure:

1. "class" indicates event type. For example, irq and softiq correspond
to different types.

2. "cluster" refers to a specific event corresponding to a type. For
example, RCU and TIMER in softirq correspond to different clusters,
which contains three types of events: raise, entry, and exit.

3. "atom" includes time of each sample and sample of the previous phase.
(For example, exit corresponds to entry, which is used for timehist.)

Committer notes:

- Add {} for multiline if blocks.

- report_print_work() should either return that ret variable that
  accounts how many bytes were printed or stop accounting and be void.
  Do the former for now to avoid this:

builtin-kwork.c:534:6: error: variable 'ret' set but not used [-Werror,-Wunused-but-set-variable]
        int ret = 0;
            ^
1 error generated.

  When building with:

  â¬¢[acme@toolbox perf]$ clang --version
  clang version 13.0.0 (https://github.com/llvm/llvm-project e8991caea8690ec2d17b0b7e1c29bf0da6609076)

Also:

  -       if ((dst_type >= 0) && (dst_type < KWORK_TRACE_MAX)) {
  +       if (dst_type < KWORK_TRACE_MAX) {

Several versions of clang and at least this gcc:

   3    51.40 alpine:3.9                    : FAIL gcc version 8.3.0 (Alpine 8.3.0)
    builtin-kwork.c:411:16: error: comparison of unsigned enum expression >= 0 is
          always true [-Werror,-Wtautological-compare]
            if ((dst_type >= 0) && (dst_type < KWORK_TRACE_MAX)) {

As the first entry in a enum is zero.

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agonetfilter: nft_queue: only allow supported familes and hooks
Florian Westphal [Tue, 26 Jul 2022 17:49:00 +0000 (19:49 +0200)]
netfilter: nft_queue: only allow supported familes and hooks

Trying to use 'queue' statement in ingress (for example)
triggers a splat on reinject:

WARNING: CPU: 3 PID: 1345 at net/netfilter/nf_queue.c:291

... because nf_reinject cannot find the ruleset head.

The netdev family doesn't support async resume at the moment anyway,
so disallow loading such rulesets with a more appropriate
error message.

v2: add 'validate' callback and also check hook points, v1 did
allow ingress use in 'table inet', but that doesn't work either. (Pablo)

Signed-off-by: Florian Westphal <[email protected]>
Reviewed-by: Pablo Neira Ayuso <[email protected]>
2 years agonetfilter: nf_tables: add rescheduling points during loop detection walks
Florian Westphal [Tue, 26 Jul 2022 10:44:35 +0000 (12:44 +0200)]
netfilter: nf_tables: add rescheduling points during loop detection walks

Add explicit rescheduling points during ruleset walk.

Switching to a faster algorithm is possible but this is a much
smaller change, suitable for nf tree.

Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1460
Signed-off-by: Florian Westphal <[email protected]>
Acked-by: Pablo Neira Ayuso <[email protected]>
2 years agonetfilter: nf_queue: do not allow packet truncation below transport header offset
Florian Westphal [Tue, 26 Jul 2022 10:42:06 +0000 (12:42 +0200)]
netfilter: nf_queue: do not allow packet truncation below transport header offset

Domingo Dirutigliano and Nicola Guerrera report kernel panic when
sending nf_queue verdict with 1-byte nfta_payload attribute.

The IP/IPv6 stack pulls the IP(v6) header from the packet after the
input hook.

If user truncates the packet below the header size, this skb_pull() will
result in a malformed skb (skb->len < 0).

Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink")
Reported-by: Domingo Dirutigliano <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Reviewed-by: Pablo Neira Ayuso <[email protected]>
2 years agotools lib: Add list_last_entry_or_null()
Yang Jihong [Sat, 9 Jul 2022 01:50:21 +0000 (09:50 +0800)]
tools lib: Add list_last_entry_or_null()

Add list_last_entry_or_null() to get the last element from a list,
returns NULL if the list is empty.

Signed-off-by: Yang Jihong <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Add workqueue kwork record support
Yang Jihong [Sat, 9 Jul 2022 01:50:20 +0000 (09:50 +0800)]
perf kwork: Add workqueue kwork record support

Record workqueue events workqueue:workqueue_activate_work,
workqueue:workqueue_execute_start & workqueue:workqueue_execute_end

Tese cases:
Record all events:

  # perf kwork record -o perf_kwork.date -- sleep 1
  [ perf record: Woken up 0 times to write data ]
  [ perf record: Captured and wrote 0.857 MB perf_kwork.date ]
  #
  # perf evlist -i perf_kwork.date
  irq:irq_handler_entry
  irq:irq_handler_exit
  irq:softirq_raise
  irq:softirq_entry
  irq:softirq_exit
  workqueue:workqueue_activate_work
  workqueue:workqueue_execute_start
  workqueue:workqueue_execute_end
  dummy:HG
  # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

Record workqueue events:

  # perf kwork -k workqueue record -o perf_kwork.date -- sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.081 MB perf_kwork.date ]
  #
  # perf evlist -i perf_kwork.date
  workqueue:workqueue_activate_work
  workqueue:workqueue_execute_start
  workqueue:workqueue_execute_end
  dummy:HG
  # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

Committer testing:

  # perf kwork record sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 3.430 MB perf.data (24130 samples) ]
  # perf evlist -v
  irq:irq_handler_entry: type: 2, size: 128, config: 0x97, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  irq:irq_handler_exit: type: 2, size: 128, config: 0x96, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  irq:softirq_raise: type: 2, size: 128, config: 0x93, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  irq:softirq_entry: type: 2, size: 128, config: 0x95, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  irq:softirq_exit: type: 2, size: 128, config: 0x94, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  workqueue:workqueue_activate_work: type: 2, size: 128, config: 0x106, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  workqueue:workqueue_execute_start: type: 2, size: 128, config: 0x105, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  workqueue:workqueue_execute_end: type: 2, size: 128, config: 0x104, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  dummy:HG: type: 1, size: 128, config: 0x9, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|RAW|IDENTIFIER, read_format: ID, inherit: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
  # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
  # perf script | grep workqueue | head
           swapper     0 [018] 26035.043289: workqueue:workqueue_activate_work: work struct 0xffff8b8ffeeae368
   kworker/18:2-ev 70440 [018] 26035.043293: workqueue:workqueue_execute_start: work struct 0xffff8b8ffeeae368: function free_work
   kworker/18:2-ev 70440 [018] 26035.043301:   workqueue:workqueue_execute_end: work struct 0xffff8b8ffeeae368: function free_work
           swapper     0 [021] 26035.044704: workqueue:workqueue_activate_work: work struct 0xffff8b8ffef6e368
   kworker/21:0-ev 4080535 [021] 26035.044709: workqueue:workqueue_execute_start: work struct 0xffff8b8ffef6e368: function free_work
   kworker/21:0-ev 4080535 [021] 26035.044716:   workqueue:workqueue_execute_end: work struct 0xffff8b8ffef6e368: function free_work
           swapper     0 [018] 26035.045230: workqueue:workqueue_activate_work: work struct 0xffff8b8ffeeae368
   kworker/18:2-ev 70440 [018] 26035.045232: workqueue:workqueue_execute_start: work struct 0xffff8b8ffeeae368: function free_work
   kworker/18:2-ev 70440 [018] 26035.045235:   workqueue:workqueue_execute_end: work struct 0xffff8b8ffeeae368: function free_work
           swapper     0 [001] 26035.052046: workqueue:workqueue_activate_work: work struct 0xffff8b8108901590
  #

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Add softirq kwork record support
Yang Jihong [Sat, 9 Jul 2022 01:50:19 +0000 (09:50 +0800)]
perf kwork: Add softirq kwork record support

Record softirq events irq:softirq_raise, irq:softirq_entry &
irq:softirq_exit.

Test cases:
Record all events:

  # perf kwork record -o perf_kwork.date -- sleep 1
  [ perf record: Woken up 0 times to write data ]
  [ perf record: Captured and wrote 0.897 MB perf_kwork.date ]
  #
  # perf evlist -i perf_kwork.date
  irq:irq_handler_entry
  irq:irq_handler_exit
  irq:softirq_raise
  irq:softirq_entry
  irq:softirq_exit
  dummy:HG
  # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

Record softirq events:

  # perf kwork -k softirq record -o perf_kwork.date -- sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.141 MB perf_kwork.date ]
  #
  # perf evlist -i perf_kwork.date
  irq:softirq_raise
  irq:softirq_entry
  irq:softirq_exit
  dummy:HG
  # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

Committer testing:

  # perf kwork record sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 3.078 MB perf.data (17433 samples) ]
  # perf evlist -v
  irq:irq_handler_entry: type: 2, size: 128, config: 0x97, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  irq:irq_handler_exit: type: 2, size: 128, config: 0x96, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  irq:softirq_raise: type: 2, size: 128, config: 0x93, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  irq:softirq_entry: type: 2, size: 128, config: 0x95, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  irq:softirq_exit: type: 2, size: 128, config: 0x94, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
  dummy:HG: type: 1, size: 128, config: 0x9, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|RAW|IDENTIFIER, read_format: ID, inherit: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
  # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
  # perf script | head
      migration/12    73 [012] 25884.940992:     irq:softirq_raise: vec=9 [action=RCU]
      migration/12    73 [012] 25884.940994:     irq:softirq_entry: vec=9 [action=RCU]
      migration/12    73 [012] 25884.940995:      irq:softirq_exit: vec=9 [action=RCU]
           swapper     0 [004] 25884.940995:     irq:softirq_raise: vec=9 [action=RCU]
           swapper     0 [004] 25884.940998:     irq:softirq_entry: vec=9 [action=RCU]
           swapper     0 [004] 25884.940999:      irq:softirq_exit: vec=9 [action=RCU]
               cc1 71212 [021] 25884.941990:     irq:softirq_raise: vec=9 [action=RCU]
           swapper     0 [004] 25884.941991:     irq:softirq_raise: vec=9 [action=RCU]
               cc1 71212 [021] 25884.941992:     irq:softirq_raise: vec=7 [action=SCHED]
         perf-exec 71208 [013] 25884.941992:     irq:softirq_raise: vec=9 [action=RCU]
  #

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: Add irq kwork record support
Yang Jihong [Sat, 9 Jul 2022 01:50:18 +0000 (09:50 +0800)]
perf kwork: Add irq kwork record support

Record interrupt events irq:irq_handler_entry & irq_handler_exit

Test cases:

 # perf kwork record -o perf_kwork.date -- sleep 1
  [ perf record: Woken up 0 times to write data ]
  [ perf record: Captured and wrote 0.556 MB perf_kwork.date ]
  #
  # perf evlist -i perf_kwork.date
  irq:irq_handler_entry
  irq:irq_handler_exit
  dummy:HG
  # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
  #

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf kwork: New tool to trace time properties of kernel work (such as softirq, and...
Yang Jihong [Sat, 9 Jul 2022 01:50:17 +0000 (09:50 +0800)]
perf kwork: New tool to trace time properties of kernel work (such as softirq, and workqueue)

The 'perf kwork' tool is used to trace time properties of kernel work
(such as irq, softirq, and workqueue), including runtime, latency, and
timehist, using the infrastructure in the perf tools to allow tracing
extra targets.

This is the first commit to reuse the 'perf record' framework code to
implement a simple record function, kwork is not supported currently.

Test cases:

  # perf

   usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

   The most commonly used perf commands are:
  <SNIP>
     iostat          Show I/O performance metrics
     kallsyms        Searches running kernel for symbols
     kmem            Tool to trace/measure kernel memory properties
     kvm             Tool to trace/measure kvm guest os
     kwork           Tool to trace/measure kernel work properties (latencies)
     list            List all symbolic event types
     lock            Analyze lock events
     mem             Profile memory accesses
     record          Run a command and record its profile into perf.data
  <SNIP>
   See 'perf help COMMAND' for more information on a specific command.

  # perf kwork

   Usage: perf kwork [<options>] {record}

      -D, --dump-raw-trace  dump raw trace in ASCII
      -f, --force           don't complain, do it
      -k, --kwork <kwork>   list of kwork to profile
      -v, --verbose         be more verbose (show symbol address, etc)

  # perf kwork record -- sleep 1
  [ perf record: Woken up 0 times to write data ]
  [ perf record: Captured and wrote 1.787 MB perf.data ]

Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Clarke <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Add {} for multiline if blocks ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoMerge tag 's390-5.19-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Tue, 26 Jul 2022 17:03:53 +0000 (10:03 -0700)]
Merge tag 's390-5.19-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fix from Alexander GordeevL

 - Prevent relatively slow PRNO TRNG random number operation from being
   called from interrupt context. That could for example cause some
   network loads to timeout.

* tag 's390-5.19-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/archrandom: prevent CPACF trng invocations in interrupt context

2 years agomm: fix NULL pointer dereference in wp_page_reuse()
Qi Zheng [Tue, 26 Jul 2022 06:24:36 +0000 (14:24 +0800)]
mm: fix NULL pointer dereference in wp_page_reuse()

The vmf->page can be NULL when the wp_page_reuse() is invoked by
wp_pfn_shared(), it will cause the following panic:

  BUG: kernel NULL pointer dereference, address: 000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 18 PID: 923 Comm: Xorg Not tainted 5.19.0-rc8.bm.1-amd64 #263
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g14
  RIP: 0010:_compound_head+0x0/0x40
  [...]
  Call Trace:
    wp_page_reuse+0x1c/0xa0
    do_wp_page+0x1a5/0x3f0
    __handle_mm_fault+0x8cf/0xd20
    handle_mm_fault+0xd5/0x2a0
    do_user_addr_fault+0x1d0/0x680
    exc_page_fault+0x78/0x170
    asm_exc_page_fault+0x22/0x30

To fix it, this patch performs a NULL pointer check before dereferencing
the vmf->page.

Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2 years agodrm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()
Nathan Chancellor [Mon, 25 Jul 2022 23:36:29 +0000 (16:36 -0700)]
drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()

When booting a kernel compiled with clang's CFI protection
(CONFIG_CFI_CLANG), there is a CFI failure in
drm_simple_kms_crtc_mode_valid() when trying to call
simpledrm_simple_display_pipe_mode_valid() through ->mode_valid():

[    0.322802] CFI failure (target: simpledrm_simple_display_pipe_mode_valid+0x0/0x8):
...
[    0.324928] Call trace:
[    0.324969]  __ubsan_handle_cfi_check_fail+0x58/0x60
[    0.325053]  __cfi_check_fail+0x3c/0x44
[    0.325120]  __cfi_slowpath_diag+0x178/0x200
[    0.325192]  drm_simple_kms_crtc_mode_valid+0x58/0x80
[    0.325279]  __drm_helper_update_and_validate+0x31c/0x464
...

The ->mode_valid() member in 'struct drm_simple_display_pipe_funcs'
expects a return type of 'enum drm_mode_status', not 'int'. Correct it
to fix the CFI failure.

Cc: [email protected]
Fixes: 11e8f5fd223b ("drm: Add simpledrm driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/1647
Reported-by: Tomasz PaweÅ‚ Gajc <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Sami Tolvanen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2 years agobridge: Do not send empty IFLA_AF_SPEC attribute
Benjamin Poirier [Mon, 25 Jul 2022 00:12:36 +0000 (09:12 +0900)]
bridge: Do not send empty IFLA_AF_SPEC attribute

After commit b6c02ef54913 ("bridge: Netlink interface fix."),
br_fill_ifinfo() started to send an empty IFLA_AF_SPEC attribute when a
bridge vlan dump is requested but an interface does not have any vlans
configured.

iproute2 ignores such an empty attribute since commit b262a9becbcb
("bridge: Fix output with empty vlan lists") but older iproute2 versions as
well as other utilities have their output changed by the cited kernel
commit, resulting in failed test cases. Regardless, emitting an empty
attribute is pointless and inefficient.

Avoid this change by canceling the attribute if no AF_SPEC data was added.

Fixes: b6c02ef54913 ("bridge: Netlink interface fix.")
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: Benjamin Poirier <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
2 years agoMerge branch 'octeontx2-minor-tc-fixes'
Paolo Abeni [Tue, 26 Jul 2022 11:05:46 +0000 (13:05 +0200)]
Merge branch 'octeontx2-minor-tc-fixes'

Subbaraya Sundeep says:

====================
Octeontx2 minor tc fixes

This patch set fixes two problems found in tc code
wrt to ratelimiting and when installing UDP/TCP filters.

Patch 1: CN10K has different register format compared to
CN9xx hence fixes that.
Patch 2: Check flow mask also before installing a src/dst
port filter, otherwise installing for one port installs for other one too.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
2 years agoocteontx2-pf: Fix UDP/TCP src and dst port tc filters
Subbaraya Sundeep [Sun, 24 Jul 2022 08:21:14 +0000 (13:51 +0530)]
octeontx2-pf: Fix UDP/TCP src and dst port tc filters

Check the mask for non-zero value before installing tc filters
for L4 source and destination ports. Otherwise installing a
filter for source port installs destination port too and
vice-versa.

Fixes: 1d4d9e42c240 ("octeontx2-pf: Add tc flower hardware offload on ingress traffic")
Signed-off-by: Subbaraya Sundeep <[email protected]>
Signed-off-by: Sunil Goutham <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
2 years agoocteontx2-pf: cn10k: Fix egress ratelimit configuration
Sunil Goutham [Sun, 24 Jul 2022 08:21:13 +0000 (13:51 +0530)]
octeontx2-pf: cn10k: Fix egress ratelimit configuration

NIX_AF_TLXX_PIR/CIR register format has changed from OcteonTx2
to CN10K. CN10K supports larger burst size. Fix burst exponent
and burst mantissa configuration for CN10K.

Also fixed 'maxrate' from u32 to u64 since 'police.rate_bytes_ps'
passed by stack is also u64.

Fixes: e638a83f167e ("octeontx2-pf: TC_MATCHALL egress ratelimiting offload")
Signed-off-by: Sunil Goutham <[email protected]>
Signed-off-by: Subbaraya Sundeep <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
2 years agosctp: fix sleep in atomic context bug in timer handlers
Duoming Zhou [Sat, 23 Jul 2022 01:58:09 +0000 (09:58 +0800)]
sctp: fix sleep in atomic context bug in timer handlers

There are sleep in atomic context bugs in timer handlers of sctp
such as sctp_generate_t3_rtx_event(), sctp_generate_probe_event(),
sctp_generate_t1_init_event(), sctp_generate_timeout_event(),
sctp_generate_t3_rtx_event() and so on.

The root cause is sctp_sched_prio_init_sid() with GFP_KERNEL parameter
that may sleep could be called by different timer handlers which is in
interrupt context.

One of the call paths that could trigger bug is shown below:

      (interrupt context)
sctp_generate_probe_event
  sctp_do_sm
    sctp_side_effects
      sctp_cmd_interpreter
        sctp_outq_teardown
          sctp_outq_init
            sctp_sched_set_sched
              n->init_sid(..,GFP_KERNEL)
                sctp_sched_prio_init_sid //may sleep

This patch changes gfp_t parameter of init_sid in sctp_sched_set_sched()
from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic
context bugs.

Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Signed-off-by: Duoming Zhou <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agonet: dsa: fix reference counting for LAG FDBs
Vladimir Oltean [Sat, 23 Jul 2022 01:24:11 +0000 (04:24 +0300)]
net: dsa: fix reference counting for LAG FDBs

Due to an invalid conflict resolution on my side while working on 2
different series (LAG FDBs and FDB isolation), dsa_switch_do_lag_fdb_add()
does not store the database associated with a dsa_mac_addr structure.

So after adding an FDB entry associated with a LAG, dsa_mac_addr_find()
fails to find it while deleting it, because &a->db is zeroized memory
for all stored FDB entries of lag->fdbs, and dsa_switch_do_lag_fdb_del()
returns -ENOENT rather than deleting the entry.

Fixes: c26933639b54 ("net: dsa: request drivers to perform FDB isolation")
Signed-off-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agoi40e: Fix interface init with MSI interrupts (no MSI-X)
Michal Maloszewski [Fri, 22 Jul 2022 17:54:01 +0000 (10:54 -0700)]
i40e: Fix interface init with MSI interrupts (no MSI-X)

Fix the inability to bring an interface up on a setup with
only MSI interrupts enabled (no MSI-X).
Solution is to add a default number of QPs = 1. This is enough,
since without MSI-X support driver enables only a basic feature set.

Fixes: bc6d33c8d93f ("i40e: Fix the number of queues available to be mapped for use")
Signed-off-by: Dawid Lukwinski <[email protected]>
Signed-off-by: Michal Maloszewski <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years agodrm/amdgpu: Re-enable DCN for 64-bit powerpc
Michael Ellerman [Mon, 25 Jul 2022 04:37:29 +0000 (14:37 +1000)]
drm/amdgpu: Re-enable DCN for 64-bit powerpc

Commit d11219ad53dc ("amdgpu: disable powerpc support for the newer
display engine") disabled the DCN driver for all of powerpc due to
unresolved build failures with some compilers.

Further digging shows that the build failures only occur with compilers
that default to 64-bit long double.

Both the ppc64 and ppc64le ABIs define long double to be 128-bits, but
there are compilers in the wild that default to 64-bits. The compilers
provided by the major distros (Fedora, Ubuntu) default to 128-bits and
are not affected by the build failure.

There is a compiler flag to force 128-bit long double, which may be the
correct long term fix, but as an interim fix only allow building the DCN
driver if long double is 128-bits by default.

The bisection in commit d11219ad53dc must have gone off the rails at
some point, the build failure occurs all the way back to the original
commit that enabled DCN support on powerpc, at least with some
toolchains.

Depends-on: d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine")
Fixes: 16a9dea110a6 ("amdgpu: Enable initial DCN support on POWER")
Signed-off-by: Michael Ellerman <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Reviewed-by: Dan Horák <[email protected]>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2100
Link: https://lore.kernel.org/r/[email protected]
2 years agoperf data: Add missing unistd.h header needed for pid_t
Arnaldo Carvalho de Melo [Mon, 25 Jul 2022 21:09:08 +0000 (18:09 -0300)]
perf data: Add missing unistd.h header needed for pid_t

Noticed when processing 'perf kwork' that includes util/data.h without,
by luck, having included unistd.h indirectly to get the pid_t typedef.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf lock: Support -t option for 'contention' subcommand
Namhyung Kim [Mon, 25 Jul 2022 18:31:24 +0000 (11:31 -0700)]
perf lock: Support -t option for 'contention' subcommand

Like perf lock report, it can report lock contention stat of each task.

  $ perf lock contention -t
   contended   total wait     max wait     avg wait          pid   comm

           5    945.20 us    902.08 us    189.04 us       316167   EventManager_De
          33     98.17 us      6.78 us      2.97 us       766063   kworker/0:1-get
           7     92.47 us     61.26 us     13.21 us       316170   EventManager_De
          14     76.31 us     12.87 us      5.45 us        12949   timedcall
          24     76.15 us     12.27 us      3.17 us       767992   sched-pipe
          15     75.62 us     11.93 us      5.04 us        15127   switchto-defaul
          24     71.84 us      5.59 us      2.99 us       629168   kworker/u513:2-
          17     67.41 us      7.94 us      3.96 us        13504   coroner-
           1     59.56 us     59.56 us     59.56 us       316165   EventManager_De
          14     56.21 us      6.89 us      4.01 us            0   swapper

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf lock: Add -k and -F options to 'contention' subcommand
Namhyung Kim [Mon, 25 Jul 2022 18:31:23 +0000 (11:31 -0700)]
perf lock: Add -k and -F options to 'contention' subcommand

Like perf lock report, add -k/--key and -F/--field options to control
output formatting and sorting.  Note that it has slightly different
default options as some fields are not available and to optimize the
screen space.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf lock: Add 'contention' subcommand
Namhyung Kim [Mon, 25 Jul 2022 18:31:22 +0000 (11:31 -0700)]
perf lock: Add 'contention' subcommand

The 'perf lock contention' processes the lock contention events and
displays the result like perf lock report.  Right now, there's not
much difference between the two but the lock contention specific
features will come soon.

  $ perf lock contention
   contended   total wait     max wait     avg wait         type   caller

         238      1.41 ms     29.20 us      5.94 us     spinlock   update_blocked_averages+0x4c
           1    902.08 us    902.08 us    902.08 us      rwsem:R   do_user_addr_fault+0x1dd
          81    330.30 us     17.24 us      4.08 us     spinlock   _nohz_idle_balance+0x172
           2     89.54 us     61.26 us     44.77 us     spinlock   do_anonymous_page+0x16d
          24     78.36 us     12.27 us      3.27 us        mutex   pipe_read+0x56
           2     71.58 us     59.56 us     35.79 us     spinlock   __handle_mm_fault+0x6aa
           6     25.68 us      6.89 us      4.28 us     spinlock   do_idle+0x28d
           1     18.46 us     18.46 us     18.46 us      rtmutex   exec_fw_cmd+0x21b
           3     15.25 us      6.26 us      5.08 us     spinlock   tick_do_update_jiffies64+0x2c

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf lock: Add lock aggregation enum
Namhyung Kim [Mon, 25 Jul 2022 18:31:21 +0000 (11:31 -0700)]
perf lock: Add lock aggregation enum

Introduce the aggr_mode variable to prepare a later code change.

The default is LOCK_AGGR_ADDR which aggregates the result for the lock
instances.

When -t/--threads option is given, it'd be set to LOCK_AGGR_TASK.  The
LOCK_AGGR_CALLER is for the contention analysis and it'd aggregate the
stat by comparing the callstacks.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf lock: Add flags field in the lock_stat
Namhyung Kim [Mon, 25 Jul 2022 18:31:20 +0000 (11:31 -0700)]
perf lock: Add flags field in the lock_stat

For lock contention tracepoint analysis, it needs to keep the flags.
As nr_readlock and nr_trylock fields are not used for it, let's make
it a union.

Signed-off-by: Namhyung Kim <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agointel_idle: Fix false positive RCU splats due to incorrect hardirqs state
Waiman Long [Sat, 23 Jul 2022 19:59:32 +0000 (15:59 -0400)]
intel_idle: Fix false positive RCU splats due to incorrect hardirqs state

Commit 32d4fd5751ea ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
uses raw_local_irq_enable/local_irq_disable() around call to
__intel_idle() in intel_idle_irq().

With interrupt enabled, timer tick interrupt can happen and a
subsequently call to __do_softirq() may change the lockdep hardirqs state
of a debug kernel back to 'on'. This will result in a mismatch between
the cpu hardirqs state (off) and the lockdep hardirqs state (on) causing
a number of false positive "WARNING: suspicious RCU usage" splats.

Fix that by using local_irq_disable() to disable interrupt in
intel_idle_irq().

Fixes: 32d4fd5751ea ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
Signed-off-by: Waiman Long <[email protected]>
Cc: 5.16+ <[email protected]> # 5.16+
Signed-off-by: Rafael J. Wysocki <[email protected]>
2 years agoperf test: Add test for #system_tsc_freq in metrics
Ian Rogers [Mon, 18 Jul 2022 16:43:12 +0000 (09:43 -0700)]
perf test: Add test for #system_tsc_freq in metrics

The value should be non-zero on Intel while zero on everything else.

Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kshipra Bopardikar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf tsc: Add cpuinfo fall back for arch_get_tsc_freq()
Ian Rogers [Mon, 18 Jul 2022 16:43:11 +0000 (09:43 -0700)]
perf tsc: Add cpuinfo fall back for arch_get_tsc_freq()

The CPUID method of arch_get_tsc_freq fails for older Intel processors,
such as Skylake. Compute using /proc/cpuinfo.

Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kshipra Bopardikar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agoperf tsc: Add arch TSC frequency information
Kan Liang [Mon, 18 Jul 2022 16:43:10 +0000 (09:43 -0700)]
perf tsc: Add arch TSC frequency information

The TSC frequency information is required for the event metrics with the
literal, system_tsc_freq. For the newer Intel platform, the TSC
frequency information can be retrieved from the CPUID leaf 0x15.  If the
TSC frequency information isn't present the /proc/cpuinfo approach is
used.

Refactor cpuid() for this use. Note, the previous stack pushing/popping
approach was broken on x86-64 that has stack red zones that would be
clobbered.

Committer testing:

Before:

  $ perf record sleep 0.0001
  [ perf record: Woken up 1 times to write data ]
  $ perf report --header-only |& grep cpuid
  # cpuid : AuthenticAMD,25,33,0
  $

After the patch:

  $ perf record sleep 0.0001
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.002 MB perf.data (8 samples) ]
  $ perf report --header-only |& grep cpuid
  # cpuid : AuthenticAMD,25,33,0
  $

Signed-off-by: Kan Liang <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Caleb Biggers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kshipra Bopardikar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Perry Taylor <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Xing Zhengjun <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ian Rogers <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2 years agodrm/i915/reset: Add additional steps for Wa_22011802037 for execlist backend
Umesh Nerlige Ramappa [Tue, 21 Jun 2022 19:21:05 +0000 (12:21 -0700)]
drm/i915/reset: Add additional steps for Wa_22011802037 for execlist backend

For execlists backend, current implementation of Wa_22011802037 is to
stop the CS before doing a reset of the engine. This WA was further
extended to wait for any pending MI FORCE WAKEUPs before issuing a
reset. Add the extended steps in the execlist path of reset.

In addition, extend the WA to gen11.

v2: (Tvrtko)
- Clarify comments, commit message, fix typos
- Use IS_GRAPHICS_VER for gen 11/12 checks

v3: (Daneile)
- Drop changes to intel_ring_submission since WA does not apply to it
- Log an error if MSG IDLE is not defined for an engine

Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
Fixes: f6aa0d713c88 ("drm/i915: Add Wa_22011802037 force cs halt")
Acked-by: Tvrtko Ursulin <[email protected]>
Reviewed-by: Daniele Ceraolo Spurio <[email protected]>
Signed-off-by: John Harrison <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 0667429ce68e0b08f9f1fec8fd0b1f57228f605e)
Signed-off-by: Tvrtko Ursulin <[email protected]>
2 years agoMerge branch 'net-sysctl-races-part-6'
David S. Miller [Mon, 25 Jul 2022 11:42:10 +0000 (12:42 +0100)]
Merge branch 'net-sysctl-races-part-6'

Kuniyuki Iwashima says:

====================
sysctl: Fix data-races around ipv4_net_table (Round 6, Final).

This series fixes data-races around 11 knobs after tcp_pacing_ss_ratio
ipv4_net_table, and this is the final round for ipv4_net_table.

While at it, other data-races around these related knobs are fixed.

  - decnet_mem
  - decnet_rmem
  - tipc_rmem

There are still 58 tables possibly missing some fixes under net/.

  $ grep -rnE "struct ctl_table.*?\[\] =" net/ | wc -l
  60
====================

Signed-off-by: David S. Miller <[email protected]>
2 years agoipv4: Fix data-races around sysctl_fib_notify_on_flag_change.
Kuniyuki Iwashima [Fri, 22 Jul 2022 18:22:05 +0000 (11:22 -0700)]
ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.

While reading sysctl_fib_notify_on_flag_change, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 680aea08e78c ("net: ipv4: Emit notification when fib hardware flags are changed")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agotcp: Fix data-races around sysctl_tcp_reflect_tos.
Kuniyuki Iwashima [Fri, 22 Jul 2022 18:22:04 +0000 (11:22 -0700)]
tcp: Fix data-races around sysctl_tcp_reflect_tos.

While reading sysctl_tcp_reflect_tos, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Acked-by: Wei Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agotcp: Fix a data-race around sysctl_tcp_comp_sack_nr.
Kuniyuki Iwashima [Fri, 22 Jul 2022 18:22:03 +0000 (11:22 -0700)]
tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.

While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 9c21d2fc41c0 ("tcp: add tcp_comp_sack_nr sysctl")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agotcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.
Kuniyuki Iwashima [Fri, 22 Jul 2022 18:22:02 +0000 (11:22 -0700)]
tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.

While reading sysctl_tcp_comp_sack_slack_ns, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: a70437cc09a1 ("tcp: add hrtimer slack to sack compression")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agotcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.
Kuniyuki Iwashima [Fri, 22 Jul 2022 18:22:01 +0000 (11:22 -0700)]
tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.

While reading sysctl_tcp_comp_sack_delay_ns, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 6d82aa242092 ("tcp: add tcp_comp_sack_delay_ns sysctl")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agonet: Fix data-races around sysctl_[rw]mem(_offset)?.
Kuniyuki Iwashima [Fri, 22 Jul 2022 18:22:00 +0000 (11:22 -0700)]
net: Fix data-races around sysctl_[rw]mem(_offset)?.

While reading these sysctl variables, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.

  - .sysctl_rmem
  - .sysctl_rwmem
  - .sysctl_rmem_offset
  - .sysctl_wmem_offset
  - sysctl_tcp_rmem[1, 2]
  - sysctl_tcp_wmem[1, 2]
  - sysctl_decnet_rmem[1]
  - sysctl_decnet_wmem[1]
  - sysctl_tipc_rmem[1]

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agotcp: Fix data-races around sk_pacing_rate.
Kuniyuki Iwashima [Fri, 22 Jul 2022 18:21:59 +0000 (11:21 -0700)]
tcp: Fix data-races around sk_pacing_rate.

While reading sysctl_tcp_pacing_(ss|ca)_ratio, they can be changed
concurrently.  Thus, we need to add READ_ONCE() to their readers.

Fixes: 43e122b014c9 ("tcp: refine pacing rate determination")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
This page took 0.183783 seconds and 4 git commands to generate.