Git Repo - linux.git/log

Merge tag 'irq-core-2022-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull interrupt updates from Thomas Gleixner:
"Core code:

   - Provide a generic wrapper which can be utilized in drivers to
     handle the problem of force threaded demultiplex interrupts on RT
     enabled kernels. This avoids conditionals and horrible quirks in
     drivers all over the place

   - Fix up affected pinctrl and GPIO drivers to make them cleanly RT
     safe

  Interrupt drivers:

   - A new driver for the FSL MU platform specific MSI implementation

   - Make irqchip_init() available for pure ACPI based systems

   - Provide a functional DT binding for the Realtek RTL interrupt chip

   - The usual DT updates and small code improvements all over the
     place"

* tag 'irq-core-2022-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  irqchip: IMX_MU_MSI should depend on ARCH_MXC
  irqchip/imx-mu-msi: Fix wrong register offset for 8ulp
  irqchip/ls-extirq: Fix invalid wait context by avoiding to use regmap
  dt-bindings: irqchip: Describe the IMX MU block as a MSI controller
  irqchip: Add IMX MU MSI controller driver
  dt-bindings: irqchip: renesas,irqc: Add r8a779g0 support
  irqchip/gic-v3: Fix typo in comment
  dt-bindings: interrupt-controller: ti,sci-intr: Fix missing reg property in the binding
  dt-bindings: irqchip: ti,sci-inta: Fix warning for missing #interrupt-cells
  irqchip: Allow extra fields to be passed to IRQCHIP_PLATFORM_DRIVER_END
  platform-msi: Export symbol platform_msi_create_irq_domain()
  irqchip/realtek-rtl: use parent interrupts
  dt-bindings: interrupt-controller: realtek,rtl-intc: require parents
  irqchip/realtek-rtl: use irq_domain_add_linear()
  irqchip: Make irqchip_init() usable on pure ACPI systems
  bcma: gpio: Use generic_handle_irq_safe()
  gpio: mlxbf2: Use generic_handle_irq_safe()
  platform/x86: intel_int0002_vgpio: Use generic_handle_irq_safe()
  ssb: gpio: Use generic_handle_irq_safe()
  pinctrl: amd: Use generic_handle_irq_safe()
  ...

ring-buffer: Fix kernel-doc

kernel/trace/ring_buffer.c:895: warning: expecting prototype for ring_buffer_nr_pages_dirty(). Prototype was for ring_buffer_nr_dirty_pages() instead.
kernel/trace/ring_buffer.c:5313: warning: expecting prototype for ring_buffer_reset_cpu(). Prototype was for ring_buffer_reset_online_cpus() instead.
kernel/trace/ring_buffer.c:5382: warning: expecting prototype for rind_buffer_empty(). Prototype was for ring_buffer_empty() instead.

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2340
Link: https://lkml.kernel.org/r/[email protected]
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Chong <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

mctp: prevent double key removal and unref

Currently, we have a bug where a simultaneous DROPTAG ioctl and socket
close may race, as we attempt to remove a key from lists twice, and
perform an unref for each removal operation. This may result in a uaf
when we attempt the second unref.

This change fixes the race by making __mctp_key_remove tolerant to being
called on a key that has already been removed from the socket/net lists,
and only performs the unref when we do the actual remove. We also need
to hold the list lock on the ioctl cleanup path.

This fix is based on a bug report and comprehensive analysis from
butt3rflyh4ck <[email protected]>, found via syzkaller.

Cc: [email protected]
Fixes: 63ed1aab3d40 ("mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag control")
Reported-by: butt3rflyh4ck <[email protected]>
Signed-off-by: Jeremy Kerr <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
netfilter fixes for net

This series from Phil Sutter for the *net* tree fixes a problem with a change
from the 6.1 development phase: the change to nft_fib should have used
the more recent flowic_l3mdev field. Pointed out by Guillaume Nault.
This also makes the older iptables module follow the same pattern.

Also add selftest case and avoid test failure in nft_fib.sh when the
host environment has set rp_filter=1.
====================

Signed-off-by: David S. Miller <[email protected]>

selftests: netfilter: Fix nft_fib.sh for all.rp_filter=1

If net.ipv4.conf.all.rp_filter is set, it overrides the per-interface
setting and thus defeats the fix from bbe4c0896d250 ("selftests:
netfilter: disable rp_filter on router"). Unset it as well to cover that
case.

Fixes: bbe4c0896d250 ("selftests: netfilter: disable rp_filter on router")
Signed-off-by: Phil Sutter <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>

netfilter: rpfilter/fib: Populate flowic_l3mdev field

Use the introduced field for correct operation with VRF devices instead
of conditionally overwriting flowic_oif. This is a partial revert of
commit b575b24b8eee3 ("netfilter: Fix rpfilter dropping vrf packets by
mistake"), implementing a simpler solution.

Signed-off-by: Phil Sutter <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Guillaume Nault <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>

selftests: netfilter: Test reverse path filtering

Test reverse path (filter) matches in iptables, ip6tables and nftables.
Both with a regular interface and a VRF.

Signed-off-by: Phil Sutter <[email protected]>
Reviewed-by: Guillaume Nault <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>

ftrace: Fix char print issue in print_ip_ins()

When ftrace bug happened, following log shows every hex data in
problematic ip address:
actual: ffffffe8:6b:ffffffd9:01:21

But so many 'f's seem a little confusing, and that is because format
'%x' being used to print signed chars in array 'ins'. As suggested
by Joe, change to use format "%*phC" to print array 'ins'.

After this patch, the log is like:
actual: e8:6b:d9:01:21

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 6c14133d2d3f ("ftrace: Do not blindly read the ip address in ftrace_bug()")
Suggested-by: Joe Perches <[email protected]>
Signed-off-by: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

LoongArch: Update Loongson-3 default config file

1, Enable ZBOOT, KEXEC and BPF_JIT;
2, Add more patition types;
3, Add some USB Type-C options;
4, Add some common network options;
5, Add some Bluetooth device drivers;
6, Remove obsolete config options (for some detailed information, see
Link).

Link: https://lore.kernel.org/kernel-janitors/[email protected]/
Co-developed-by: Tiezhu Yang <[email protected]>
Signed-off-by: Tiezhu Yang <[email protected]>
Co-developed-by: Youling Tang <[email protected]>
Signed-off-by: Youling Tang <[email protected]>
Co-developed-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Add ACPI-based generic laptop driver

This add ACPI-based generic laptop driver for Loongson-3. Some of the
codes are derived from drivers/platform/x86/thinkpad_acpi.c.

Signed-off-by: Jianmin Lv <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Add BPF JIT support

BPF programs are normally handled by a BPF interpreter, add BPF JIT
support for LoongArch to allow the kernel to generate native code when
a program is loaded into the kernel. This will significantly speed-up
processing of BPF programs.

Co-developed-by: Youling Tang <[email protected]>
Signed-off-by: Youling Tang <[email protected]>
Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Add some instruction opcodes and formats

According to the "Table of Instruction Encoding" in LoongArch Reference
Manual [1], add some instruction opcodes and formats which are used in
the BPF JIT for LoongArch.

[1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#table-of-instruction-encoding

Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Move {signed,unsigned}_imm_check() to inst.h

{signed,unsigned}_imm_check() will also be used in the bpf jit, so move
them from module.c to inst.h, this is preparation for later patches.

Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Add kdump support

This patch adds support for kdump. In kdump case the normal kernel will
reserve a region for the crash kernel and jump there on panic.

Arch-specific functions are added to allow for implementing a crash dump
file interface, /proc/vmcore, which can be viewed as a ELF file.

A user-space tool, such as kexec-tools, is responsible for allocating a
separate region for the core's ELF header within the crash kdump kernel
memory and filling it in when executing kexec_load().

Then, its location will be advertised to the crash dump kernel via a
command line argument "elfcorehdr=", and the crash dump kernel will
preserve this region for later use with arch_reserve_vmcore() at boot
time.

At the same time, the crash kdump kernel is also limited within the
"crashkernel" area via a command line argument "mem=", so as not to
destroy the original kernel dump data.

In the crash dump kernel environment, /proc/vmcore is used to access the
primary kernel's memory with copy_oldmem_page().

I tested kdump on LoongArch machines (Loongson-3A5000) and it works as
expected (suggested crashkernel parameter is "crashkernel=512M@2560M"),
you may test it by triggering a crash through /proc/sysrq-trigger:

$ sudo kexec -p /boot/vmlinux-kdump --reuse-cmdline --append="nr_cpus=1"
# echo c > /proc/sysrq-trigger

Signed-off-by: Youling Tang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Add kexec support

Add three new files, kexec.h, machine_kexec.c and relocate_kernel.S to
the LoongArch architecture, so as to add support for the kexec re-boot
mechanism (CONFIG_KEXEC) on LoongArch platforms.

Kexec supports loading vmlinux.elf in ELF format and vmlinux.efi in PE
format.

I tested kexec on LoongArch machines (Loongson-3A5000) and it works as
expected:

$ sudo kexec -l /boot/vmlinux.efi --reuse-cmdline
$ sudo kexec -e

Signed-off-by: Youling Tang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Use generic BUG() handler

Inspired by commit 9fb7410f955("arm64/BUG: Use BRK instruction for
generic BUG traps"), do similar for LoongArch to use generic BUG()
handler.

This patch uses the BREAK software breakpoint instruction to generate
a trap instead, similarly to most other arches, with the generic BUG
code generating the dmesg boilerplate.

This allows bug metadata to be moved to a separate table and reduces
the amount of inline code at BUG() and WARN() sites. This also avoids
clobbering any registers before they can be dumped.

To mitigate the size of the bug table further, this patch makes use of
the existing infrastructure for encoding addresses within the bug table
as 32-bit relative pointers instead of absolute pointers.

(Note: this limits the max kernel size to 2GB.)

Before patch:
[ 3018.338013] lkdtm: Performing direct entry BUG
[ 3018.342445] Kernel bug detected[#5]:
[ 3018.345992] CPU: 2 PID: 865 Comm: cat Tainted: G D 6.0.0-rc6+ #35

After patch:
[  125.585985] lkdtm: Performing direct entry BUG
[  125.590433] ------------[ cut here ]------------
[  125.595020] kernel BUG at drivers/misc/lkdtm/bugs.c:78!
[  125.600211] Oops - BUG[#1]:
[  125.602980] CPU: 3 PID: 410 Comm: cat Not tainted 6.0.0-rc6+ #36

Out-of-line file/line data information obtained compared to before.

Signed-off-by: Youling Tang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Add SysRq-x (TLB Dump) support

Add SysRq-x (TLB Dump) support for LoongArch, which is useful for
debugging.

Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Add perf events support

The perf events infrastructure of LoongArch is very similar to old MIPS-
based Loongson, so most of the codes are derived from MIPS.

Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Add qspinlock support

On NUMA system, the performance of qspinlock is better than generic
spinlock. Below is the UnixBench test results on a 8 nodes (4 cores
per node, 32 cores in total) machine.

A. With generic spinlock:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  449574022.5  38523.9
Double-Precision Whetstone                       55.0      85190.4  15489.2
Execl Throughput                                 43.0      14696.2   3417.7
File Copy 1024 bufsize 2000 maxblocks          3960.0     143157.8    361.5
File Copy 256 bufsize 500 maxblocks            1655.0      37631.8    227.4
File Copy 4096 bufsize 8000 maxblocks          5800.0     444814.2    766.9
Pipe Throughput                               12440.0    5047490.7   4057.5
Pipe-based Context Switching                   4000.0    2021545.7   5053.9
Process Creation                                126.0      23829.8   1891.3
Shell Scripts (1 concurrent)                     42.4      33756.7   7961.5
Shell Scripts (8 concurrent)                      6.0       4062.9   6771.5
System Call Overhead                          15000.0    2479748.6   1653.2
                                                                   ========
System Benchmarks Index Score                                        2955.6

B. With qspinlock:

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  449467876.9  38514.8
Double-Precision Whetstone                       55.0      85174.6  15486.3
Execl Throughput                                 43.0      14769.1   3434.7
File Copy 1024 bufsize 2000 maxblocks          3960.0     146150.5    369.1
File Copy 256 bufsize 500 maxblocks            1655.0      37496.8    226.6
File Copy 4096 bufsize 8000 maxblocks          5800.0     447527.0    771.6
Pipe Throughput                               12440.0    5175989.2   4160.8
Pipe-based Context Switching                   4000.0    2207747.8   5519.4
Process Creation                                126.0      25125.5   1994.1
Shell Scripts (1 concurrent)                     42.4      33461.2   7891.8
Shell Scripts (8 concurrent)                      6.0       4024.7   6707.8
System Call Overhead                          15000.0    2917278.6   1944.9
                                                                   ========
System Benchmarks Index Score                                        3040.1

Signed-off-by: Rui Wang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Use TLB for ioremap()

We can support more cache attributes (e.g., CC, SUC and WUC) and page
protection when we use TLB for ioremap(). The implementation is based
on GENERIC_IOREMAP.

The existing simple ioremap() implementation has better performance so
we keep it and introduce ARCH_IOREMAP to control the selection.

We move pagetable_init() earlier to make early ioremap() works, and we
modify the PCI ecam mapping because the TLB-based version of ioremap()
will actually take the size into account.

Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Support access filter to /dev/mem interface

Accidental access to /dev/mem is obviously disastrous, but specific
access can be used by people debugging the kernel. So select GENERIC_
LIB_DEVMEM_IS_ALLOWED, as well as define ARCH_HAS_VALID_PHYS_ADDR_RANGE
and related helpers, to support access filter to /dev/mem interface.

Signed-off-by: Weihao Li <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Refactor cache probe and flush methods

Current cache probe and flush methods have some drawbacks:
1, Assume there are 3 cache levels and only 3 levels;
2, Assume L1 = I + D, L2 = V, L3 = S, V is exclusive, S is inclusive.

However, the fact is I + D, I + D + V, I + D + S and I + D + V + S are
all valid. So, refactor the cache probe and flush methods to adapt more
types of cache hierarchy.

Signed-off-by: Huacai Chen <[email protected]>

LoongArch: mm: Refactor TLB exception handlers

This patch simplifies TLB load, store and modify exception handlers:

1. Reduce instructions, such as alu/csr and memory access;
2. Execute tlb search instruction only in the fast path;
3. Return directly from the fast path for both normal and huge pages;
4. Re-tab the assembly for better vertical alignment.

And fixes the concurrent modification issue of fast path for huge pages.

This issue will occur in the following steps:

   CPU-1 (In TLB exception)         CPU-2 (In THP splitting)
1: Load PMD entry (HUGE=1)
2: Goto huge path
3:                                  Store PMD entry (HUGE=0)
4: Reload PMD entry (HUGE=0)
5: Fill TLB entry (PA is incorrect)

This patch also slightly improves the TLB processing performance:

* Normal pages: 2.15%, Huge pages: 1.70%.

  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/mman.h>

  int main(int argc, char *argv[])
  {
        size_t page_size;
        size_t mem_size;
        size_t off;
        void *base;
        int flags;
        int i;

        if (argc < 2) {
                fprintf(stderr, "%s MEM_SIZE [HUGE]\n", argv[0]);
                return -1;
        }

        page_size = sysconf(_SC_PAGESIZE);
        flags = MAP_PRIVATE | MAP_ANONYMOUS;
        mem_size = strtoul(argv[1], NULL, 10);
        if (argc > 2)
                flags |= MAP_HUGETLB;

        for (i = 0; i < 10; i++) {
                base = mmap(NULL, mem_size, PROT_READ, flags, -1, 0);
                if (base == MAP_FAILED) {
                        fprintf(stderr, "Map memory failed!\n");
                        return -1;
                }

                for (off = 0; off < mem_size; off += page_size)
                        *(volatile int *)(base + off);

                munmap(base, mem_size);
        }

        return 0;
  }

Signed-off-by: Rui Wang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Support R_LARCH_GOT_PC_{LO12,HI20} in modules

GCC >= 13 and GNU assembler >= 2.40 use these relocations to address
external symbols, so we need to add them.

Let the module loader emit GOT entries for data symbols so we would be
able to handle GOT relocations. The GOT entry is just the data's symbol
address.

In module.lds, emit a stub .got section for a section header entry. The
actual content of the section entry will be filled at runtime by module_
frob_arch_sections().

Tested-by: WANG Xuerui <[email protected]>
Signed-off-by: Xi Ruoyao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Support PC-relative relocations in modules

Binutils >= 2.40 uses R_LARCH_B26 instead of R_LARCH_SOP_PUSH_PLT_PCREL,
and R_LARCH_PCALA* instead of R_LARCH_SOP_PUSH_PCREL.

Handle R_LARCH_B26 and R_LARCH_PCALA* in the module loader. For R_LARCH_
B26, also create a PLT entry as needed.

Tested-by: WANG Xuerui <[email protected]>
Signed-off-by: Xi Ruoyao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Define ELF relocation types added in ABIv2.0

These relocation types are used by GNU binutils >= 2.40 and GCC >= 13.
Add their definitions so we will be able to use them in later patches.

Link: https://github.com/loongson/LoongArch-Documentation/pull/57
Tested-by: WANG Xuerui <[email protected]>
Signed-off-by: Xi Ruoyao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Adjust symbol addressing for AS_HAS_EXPLICIT_RELOCS

If explicit relocation hints are used by the toolchain, -Wa,-mla-*
options will be useless for the C code. So only use them for the
!CONFIG_AS_HAS_EXPLICIT_RELOCS case.

Replace "la" with "la.pcrel" in head.S to keep the semantic consistent
with new and old toolchains for the low level startup code.

For per-CPU variables, the "address" of the symbol is actually an offset
from $r21. The value is near the loading address of main kernel image,
but far from the loading address of modules. So we use model("extreme")
attibute to tell the compiler that a PC-relative addressing with 32-bit
offset is not sufficient for local per-CPU variables.

The behavior with different assemblers and compilers are summarized in
the following table:

AS has            CC has
explicit relocs   explicit relocs * Behavior
==============================================================
No                No                Use la.* macros.
                                    No change from Linux 6.0.
--------------------------------------------------------------
No                Yes               Disable explicit relocs.
                                    No change from Linux 6.0.
--------------------------------------------------------------
Yes               No                Not supported.
--------------------------------------------------------------
Yes               Yes               Enable explicit relocs.
                                    No -Wa,-mla* options used.
==============================================================
*: We assume CC must have model attribute if it has explicit relocs.
   Both features are added in GCC 13 development cycle, so any GCC
   release >= 13 should be OK. Using early GCC 13 development snapshots
   may produce modules with unsupported relocations.

Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f09482a
Link: https://gcc.gnu.org/r13-1834
Link: https://gcc.gnu.org/r13-2199
Tested-by: WANG Xuerui <[email protected]>
Signed-off-by: Xi Ruoyao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Add Kconfig option AS_HAS_EXPLICIT_RELOCS

GNU as >= 2.40 and GCC >= 13 will support using explicit relocation
hints in the assembly code, instead of la.* macros. The usage of
explicit relocation hints can improve code generation so it's enabled
by default by GCC >= 13.

Introduce a Kconfig option AS_HAS_EXPLICIT_RELOCS as the switch for
"use explicit relocation hints or not".

Tested-by: WANG Xuerui <[email protected]>
Signed-off-by: Xi Ruoyao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Kconfig: Fix spelling mistake "delibrately" -> "deliberately"

There is a spelling mistake in a commented section. Fix it.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Mark __xchg() and __cmpxchg() as __always_inline

Commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") allows compiler to uninline functions marked as 'inline'.
In case of __xchg()/__cmpxchg() this would cause to reference
BUILD_BUG(), which is an error case for catching bugs and will not
happen for correct code, if __xchg()/__cmpxchg() is inlined.

This bug can be produced with CONFIG_DEBUG_SECTION_MISMATCH enabled,
and the solution is similar to below commits:
46f1619500d0225 ("MIPS: include: Mark __xchg as __always_inline"),
88356d09904bc60 ("MIPS: include: Mark __cmpxchg as __always_inline").

Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Flush TLB earlier at initialization

Move local_flush_tlb_all() earlier (just after setup_ptwalker() and
before page allocation). This can avoid stale TLB entries misguiding
the later page allocation. Without this patch the second kernel of
kexec/kdump fails to boot SMP.

BTW, move output_pgtable_bits_defines() into tlb_init() since it has
nothing to do with tlb handler setup.

Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Do not create sysfs control file for io master CPUs

Now io master CPUs are not hotpluggable on LoongArch, but in the current
code only /sys/devices/system/cpu/cpu0/online is not created. Let us set
the hotpluggable field of all the io master CPUs as 0, then prevent to
create sysfs control file for all the io master CPUs which confuses some
user space tools. This is similar with commit 9cce844abf07 ("MIPS: CPU#0
is not hotpluggable").

Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Fix cpu name after CPU-hotplug

Don't overwrite the SMBIOS-provided CPU name on coming back from CPU-
hotplug (including S3/S4) if it is already initialized.

Reviewed-by: WANG Xuerui <[email protected]>
Signed-off-by: Jianmin Lv <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

net/mlx5: Make ASO poll CQ usable in atomic context

Poll CQ functions shouldn't sleep as they are called in atomic context.
The following splat appears once the mlx5_aso_poll_cq() is used in such
flow.

BUG: scheduling while atomic: swapper/17/0/0x00000100
Modules linked in: sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core fuse [last unloaded: mlx5_core]
CPU: 17 PID: 0 Comm: swapper/17 Tainted: G        W          6.0.0-rc2+ #13
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
  <IRQ>
  dump_stack_lvl+0x34/0x44
  __schedule_bug.cold+0x47/0x53
  __schedule+0x4b6/0x670
  ? hrtimer_start_range_ns+0x28d/0x360
  schedule+0x50/0x90
  schedule_hrtimeout_range_clock+0x98/0x120
  ? __hrtimer_init+0xb0/0xb0
  usleep_range_state+0x60/0x90
  mlx5_aso_poll_cq+0xad/0x190 [mlx5_core]
  mlx5e_ipsec_aso_update_curlft+0x81/0xb0 [mlx5_core]
  xfrm_timer_handler+0x6b/0x360
  ? xfrm_find_acq_byseq+0x50/0x50
  __hrtimer_run_queues+0x139/0x290
  hrtimer_run_softirq+0x7d/0xe0
  __do_softirq+0xc7/0x272
  irq_exit_rcu+0x87/0xb0
  sysvec_apic_timer_interrupt+0x72/0x90
  </IRQ>
  <TASK>
  asm_sysvec_apic_timer_interrupt+0x16/0x20
RIP: 0010:default_idle+0x18/0x20
Code: ae 7d ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 8b 05 b5 30 0d 01 85 c0 7e 07 0f 00 2d 0a e3 53 00 fb f4 <c3> 0f 1f 80 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 80 ad 01 00
RSP: 0018:ffff888100883ee0 EFLAGS: 00000242
RAX: 0000000000000001 RBX: ffff888100849580 RCX: 4000000000000000
RDX: 0000000000000001 RSI: 0000000000000083 RDI: 000000000008863c
RBP: 0000000000000011 R08: 00000064e6977fa9 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  default_idle_call+0x37/0xb0
  do_idle+0x1cd/0x1e0
  cpu_startup_entry+0x19/0x20
  start_secondary+0xfe/0x120
  secondary_startup_64_no_verify+0xcd/0xdb
  </TASK>
softirq: huh, entered softirq 8 HRTIMER 00000000a97c08cb with preempt_count 00000100, exited with 00000000?

Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'irqchip-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Pull irqchip fixes from Marc Zyngier:

  - Fix IMX-MU Kconfig, keeping it private to IMX

  - Fix a register offset for the same IMX-MU driver

  - Fix the ls-extirq irqchip driver that would use the wrong
    flavour of spinlocks

Link: https://lore.kernel.org/r/[email protected]

tcp: cdg: allow tcp_cdg_release() to be called multiple times

Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]

Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.

[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567

CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>

Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571

Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'inet-ping-fixes'

Eric Dumazet says:

====================
inet: ping: give ping some care

First patch fixes an ipv6 ping bug that has been there forever,
for large sizes.

Second patch fixes a recent and elusive bug, that can potentially
crash the host. This is what I mentioned privately to Paolo and
Jakub at LPC in Dublin.
====================

Signed-off-by: David S. Miller <[email protected]>

inet: ping: fix recent breakage

Blamed commit broke the assumption used by ping sendmsg() that
allocated skb would have MAX_HEADER bytes in skb->head.

This patch changes the way ping works, by making sure
the skb head contains space for the icmp header,
and adjusting ping_getfrag() which was desperate
about going past the icmp header :/

This is adopting what UDP does, mostly.

syzbot is able to crash a host using both kfence and following repro in a loop.

fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6)
connect(fd, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0),
inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28
sendmsg(fd, {msg_name=NULL, msg_namelen=0, msg_iov=[
{iov_base="\200\0\0\0\23\0\0\0\0\0\0\0\0\0"..., iov_len=65496}],
msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0

When kfence triggers, skb->head only has 64 bytes, immediately followed
by struct skb_shared_info (no extra headroom based on ksize(ptr))

Then icmpv6_push_pending_frames() is overwriting first bytes
of skb_shinfo(skb), making nr_frags bigger than MAX_SKB_FRAGS,
and/or setting shinfo->gso_size to a non zero value.

If nr_frags is mangled, a crash happens in skb_release_data()

If gso_size is mangled, we have the following report:

lo: caps=(0x00000516401d7c69, 0x00000516401d7c69)
WARNING: CPU: 0 PID: 7548 at net/core/dev.c:3239 skb_warn_bad_offload+0x119/0x230 net/core/dev.c:3239
Modules linked in:
CPU: 0 PID: 7548 Comm: syz-executor268 Not tainted 6.0.0-syzkaller-02754-g557f050166e5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
RIP: 0010:skb_warn_bad_offload+0x119/0x230 net/core/dev.c:3239
Code: 70 03 00 00 e8 58 c3 24 fa 4c 8d a5 e8 00 00 00 e8 4c c3 24 fa 4c 89 e9 4c 89 e2 4c 89 f6 48 c7 c7 00 53 f5 8a e8 13 ac e7 01 <0f> 0b 5b 5d 41 5c 41 5d 41 5e e9 28 c3 24 fa e8 23 c3 24 fa 48 89
RSP: 0018:ffffc9000366f3e8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88807a9d9d00 RCX: 0000000000000000
RDX: ffff8880780c0000 RSI: ffffffff8160f6f8 RDI: fffff520006cde6f
RBP: ffff888079952000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000400 R11: 0000000000000000 R12: ffff8880799520e8
R13: ffff88807a9da070 R14: ffff888079952000 R15: 0000000000000000
FS: 0000555556be6300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020010000 CR3: 000000006eb7b000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
gso_features_check net/core/dev.c:3521 [inline]
netif_skb_features+0x83e/0xb90 net/core/dev.c:3554
validate_xmit_skb+0x2b/0xf10 net/core/dev.c:3659
__dev_queue_xmit+0x998/0x3ad0 net/core/dev.c:4248
dev_queue_xmit include/linux/netdevice.h:3008 [inline]
neigh_hh_output include/net/neighbour.h:530 [inline]
neigh_output include/net/neighbour.h:544 [inline]
ip6_finish_output2+0xf97/0x1520 net/ipv6/ip6_output.c:134
__ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227
dst_output include/net/dst.h:445 [inline]
ip6_local_out+0xaf/0x1a0 net/ipv6/output_core.c:161
ip6_send_skb+0xb7/0x340 net/ipv6/ip6_output.c:1966
ip6_push_pending_frames+0xdd/0x100 net/ipv6/ip6_output.c:1986
icmpv6_push_pending_frames+0x2af/0x490 net/ipv6/icmp.c:303
ping_v6_sendmsg+0xc44/0x1190 net/ipv6/ping.c:190
inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x712/0x8c0 net/socket.c:2482
___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
__sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f21aab42b89
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff1729d038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f21aab42b89
RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
R10: 000000000000000d R11: 0000000000000246 R12: 00007fff1729d050
R13: 00000000000f4240 R14: 0000000000021dd1 R15: 00007fff1729d044
</TASK>

Fixes: 47cf88993c91 ("net: unify alloclen calculation for paged requests")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Lorenzo Colitti <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Cc: Maciej Żenczykowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: ping: fix wrong checksum for large frames

For a given ping datagram, ping_getfrag() is called once
per skb fragment.

A large datagram requiring more than one page fragment
is currently getting the checksum of the last fragment,
instead of the cumulative one.

After this patch, "ping -s 35000 ::1" is working correctly.

Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Lorenzo Colitti <[email protected]>
Cc: Maciej Żenczykowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: ti: am65-cpsw: set correct devlink flavour for unused ports

am65_cpsw_nuss_register_ndevs() skips calling devlink_port_type_eth_set()
for ports without assigned netdev, triggering the following warning when
DEVLINK_PORT_TYPE_WARN_TIMEOUT elapses after 3600s:

Type was not set for devlink port.
WARNING: CPU: 0 PID: 129 at net/core/devlink.c:8095 devlink_port_type_warn+0x18/0x30

Fixes: 0680e20af5fb ("net: ethernet: ti: am65-cpsw: Fix devlink port register sequence")
Signed-off-by: Matthias Schiffer <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

irqchip: IMX_MU_MSI should depend on ARCH_MXC

The Freescale/NXP i.MX Messaging Unit is only present on Freescale/NXP
i.MX SoCs. Hence add a dependency on ARCH_MXC, to prevent asking the
user about this driver when configuring a kernel without Freescale/NXP
i.MX SoC family support.

While at it, expand "MU" to "Messaging Unit" in the help text.

Fixes: 70afdab904d2d1e6 ("irqchip: Add IMX MU MSI controller driver")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/7f3bd932614ddbff46a1b750ef45b231130364ad.1664900434.git.geert+renesas@glider.be

xen: Kconfig: Fix spelling mistake "Maxmium" -> "Maximum"

There is a spelling mistake in a Kconfig description. Fix it.

Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Stefano Stabellini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>

selftests/bpf: Alphabetize DENYLISTs

The DENYLIST and DENYLIST.s390x files are used to specify testcases
which should not be run on CI. Currently, testcases are appended to the
end of these files as needed. This can make it a pain to resolve merge
conflicts. This patch alphabetizes the DENYLIST files to ease this
burden.

Signed-off-by: David Vernet <[email protected]>
Acked-by: Daniel Müller <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>

Merge tag 'memblock-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock updates from Mike Rapoport:
"Test suite improvements:

   - Added verification that memblock allocations zero the allocated
     memory

   - Added more test cases for memblock_add(), memblock_remove(),
     memblock_reserve() and memblock_free()

   - Added tests for memblock_*_raw() family

   - Added tests for NUMA-aware allocations in memblock_alloc_try_nid()
     and memblock_alloc_try_nid_raw()"

* tag 'memblock-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  memblock tests: add generic NUMA tests for memblock_alloc_try_nid*
  memblock tests: add bottom-up NUMA tests for memblock_alloc_try_nid*
  memblock tests: add top-down NUMA tests for memblock_alloc_try_nid*
  memblock tests: add simulation of physical memory with multiple NUMA nodes
  memblock_tests: move variable declarations to single block
  memblock tests: remove 'cleared' from comment blocks
  memblock tests: add tests for memblock_trim_memory
  memblock tests: add tests for memblock_*bottom_up functions
  memblock tests: update alloc_nid_api to test memblock_alloc_try_nid_raw
  memblock tests: update alloc_api to test memblock_alloc_raw
  memblock tests: add additional tests for basic api and memblock_alloc
  memblock tests: add labels to verbose output for generic alloc tests
  memblock tests: update zeroed memory check for memblock_alloc_* tests
  memblock tests: update tests to check if memblock_alloc zeroed memory
  memblock tests: update reference to obsolete build option in comments
  memblock tests: add command line help option

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
"The main batch of ARM + RISC-V changes, and a few fixes and cleanups
  for x86 (PMU virtualization and selftests).

  ARM:

   - Fixes for single-stepping in the presence of an async exception as
     well as the preservation of PSTATE.SS

   - Better handling of AArch32 ID registers on AArch64-only systems

   - Fixes for the dirty-ring API, allowing it to work on architectures
     with relaxed memory ordering

   - Advertise the new kvmarm mailing list

   - Various minor cleanups and spelling fixes

  RISC-V:

   - Improved instruction encoding infrastructure for instructions not
     yet supported by binutils

   - Svinval support for both KVM Host and KVM Guest

   - Zihintpause support for KVM Guest

   - Zicbom support for KVM Guest

   - Record number of signal exits as a VCPU stat

   - Use generic guest entry infrastructure

  x86:

   - Misc PMU fixes and cleanups.

   - selftests: fixes for Hyper-V hypercall

   - selftests: fix nx_huge_pages_test on TDP-disabled hosts

   - selftests: cleanups for fix_hypercall_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (57 commits)
  riscv: select HAVE_POSIX_CPU_TIMERS_TASK_WORK
  RISC-V: KVM: Use generic guest entry infrastructure
  RISC-V: KVM: Record number of signal exits as a vCPU stat
  RISC-V: KVM: add __init annotation to riscv_kvm_init()
  RISC-V: KVM: Expose Zicbom to the guest
  RISC-V: KVM: Provide UAPI for Zicbom block size
  RISC-V: KVM: Make ISA ext mappings explicit
  RISC-V: KVM: Allow Guest use Zihintpause extension
  RISC-V: KVM: Allow Guest use Svinval extension
  RISC-V: KVM: Use Svinval for local TLB maintenance when available
  RISC-V: Probe Svinval extension form ISA string
  RISC-V: KVM: Change the SBI specification version to v1.0
  riscv: KVM: Apply insn-def to hlv encodings
  riscv: KVM: Apply insn-def to hfence encodings
  riscv: Introduce support for defining instructions
  riscv: Add X register names to gpr-nums
  KVM: arm64: Advertise new kvmarm mailing list
  kvm: vmx: keep constant definition format consistent
  kvm: mmu: fix typos in struct kvm_arch
  KVM: selftests: Fix nx_huge_pages_test on TDP-disabled hosts
  ...

nilfs2: fix leak of nilfs_root in case of writer thread creation failure

If nilfs_attach_log_writer() failed to create a log writer thread, it
frees a data structure of the log writer without any cleanup.  After
commit e912a5b66837 ("nilfs2: use root object to get ifile"), this causes
a leak of struct nilfs_root, which started to leak an ifile metadata inode
and a kobject on that struct.

In addition, if the kernel is booted with panic_on_warn, the above
ifile metadata inode leak will cause the following panic when the
nilfs2 kernel module is removed:

  kmem_cache_destroy nilfs2_inode_cache: Slab cache still has objects when
  called from nilfs_destroy_cachep+0x16/0x3a [nilfs2]
  WARNING: CPU: 8 PID: 1464 at mm/slab_common.c:494 kmem_cache_destroy+0x138/0x140
  ...
  RIP: 0010:kmem_cache_destroy+0x138/0x140
  Code: 00 20 00 00 e8 a9 55 d8 ff e9 76 ff ff ff 48 8b 53 60 48 c7 c6 20 70 65 86 48 c7 c7 d8 69 9c 86 48 8b 4c 24 28 e8 ef 71 c7 00 <0f> 0b e9 53 ff ff ff c3 48 81 ff ff 0f 00 00 77 03 31 c0 c3 53 48
  ...
  Call Trace:
   <TASK>
   ? nilfs_palloc_freev.cold.24+0x58/0x58 [nilfs2]
   nilfs_destroy_cachep+0x16/0x3a [nilfs2]
   exit_nilfs_fs+0xa/0x1b [nilfs2]
    __x64_sys_delete_module+0x1d9/0x3a0
   ? __sanitizer_cov_trace_pc+0x1a/0x50
   ? syscall_trace_enter.isra.19+0x119/0x190
   do_syscall_64+0x34/0x80
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
   ...
   </TASK>
  Kernel panic - not syncing: panic_on_warn set ...

This patch fixes these issues by calling nilfs_detach_log_writer() cleanup
function if spawning the log writer thread fails.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: e912a5b66837 ("nilfs2: use root object to get ifile")
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()

If the i_mode field in inode of metadata files is corrupted on disk, it
can cause the initialization of bmap structure, which should have been
called from nilfs_read_inode_common(), not to be called. This causes a
lockdep warning followed by a NULL pointer dereference at
nilfs_bmap_lookup_at_level().

This patch fixes these issues by adding a missing sanitiy check for the
i_mode field of metadata file's inode.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Reported-by: Tetsuo Handa <[email protected]>
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: fix use-after-free bug of struct nilfs_root

If the beginning of the inode bitmap area is corrupted on disk, an inode
with the same inode number as the root inode can be allocated and fail
soon after. In this case, the subsequent call to nilfs_clear_inode() on
that bogus root inode will wrongly decrement the reference counter of
struct nilfs_root, and this will erroneously free struct nilfs_root,
causing kernel oopses.

This fixes the problem by changing nilfs_new_inode() to skip reserved
inode numbers while repairing the inode bitmap.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Reported-by: Khalid Masum <[email protected]>
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/damon/core: initialize damon_target->list in damon_new_target()

'struct damon_target' creation function, 'damon_new_target()' is not
initializing its '->list' field, unlike other DAMON structs creator
functions such as 'damon_new_region()'. Normal users of
'damon_new_target()' initializes the field by adding the target to DAMON
context's targets list, but some code could access the uninitialized
field.

This commit avoids the case by initializing the field in
'damon_new_target()'.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: f23b8eee1871 ("mm/damon/core: implement region-based sampling")
Signed-off-by: SeongJae Park <[email protected]>
Reported-by: Hyeonggon Yoo <[email protected]>
Tested-by: Hyeonggon Yoo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page

On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and
1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.

So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
hugetlb in follow_page_pte().  However this pte entry lock is incorrect
for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
the correct lock, which is mm->page_table_lock.

That means the pte entry of the CONT-PTE size hugetlb under current pte
lock is unstable in follow_page_pte(), we can continue to migrate or
poison the pte entry of the CONT-PTE size hugetlb, which can cause some
potential race issues, even though they are under the 'pte lock'.

For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
page by move_pages() syscall under the lock, however antoher thread B can
migrate the CONT-PTE hugetlb page at the same time, which will cause
thread A to get an incorrect page, if thread A also wants to do page
migration, then data inconsistency error occurs.

Moreover we have the same issue for CONT-PMD size hugetlb in
follow_huge_pmd().

To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte()
to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to
get the correct pte entry lock to make the pte entry stable.

Mike said:

Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb:
refactor find_num_contig()").  Patch series "Support for contiguous pte
hugepages", v4.  However, I do not believe these code paths were
executed until migration support was added with 5480280d3f2d ("arm64/mm:
enable HugeTLB migration for contiguous bit HugeTLB pages") I would go
with 5480280d3f2d for the Fixes: targe.

Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com
Fixes: 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <[email protected]>
Suggested-by: Mike Kravetz <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Merge tag 'wireless-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.1

First set of fixes for v6.1. Quite a lot of fixes in stack but also
for mt76.

cfg80211/mac80211
- fix locking error in mac80211's hw addr change
- fix TX queue stop for internal TXQs
- handling of very small (e.g. STP TCN) packets
- two memcpy() hardening fixes
- fix probe request 6 GHz capability warning
- fix various connection prints
- fix decapsulation offload for AP VLAN

mt76
- fix rate reporting, LLC packets and receive checksum offload on specific chipsets

iwlwifi
- fix crash due to list corruption

ath11k
- fix a compiler warning with GCC 11 and KASAN

* tag 'wireless-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: ath11k: mac: fix reading 16 bytes from a region of size 0 warning
  wifi: iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (other cases)
  wifi: mt76: fix rx checksum offload on mt7615/mt7915/mt7921
  wifi: mt76: fix receiving LLC packets on mt7615/mt7915
  wifi: nl80211: Split memcpy() of struct nl80211_wowlan_tcp_data_token flexible array
  wifi: wext: use flex array destination for memcpy()
  wifi: cfg80211: fix ieee80211_data_to_8023_exthdr handling of small packets
  wifi: mac80211: netdev compatible TX stop for iTXQ drivers
  wifi: mac80211: fix decap offload for stations on AP_VLAN interfaces
  wifi: mac80211: unlock on error in ieee80211_can_powered_addr_change()
  wifi: mac80211: remove/avoid misleading prints
  wifi: mac80211: fix probe req HE capabilities access
  wifi: mac80211: do not drop packets smaller than the LLC-SNAP header on fast-rx
  wifi: mt76: fix rate reporting / throughput regression on mt7915 and newer
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype

The argument has_signal of arch_do_signal_or_restart() has been removed in
commit 8ba62d37949e ("task_work: Call tracehook_notify_signal from
get_signal on all architectures"), let us remove the related comment.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8ba62d37949e ("task_work: Call tracehook_notify_signal from get_signal on all architectures")
Signed-off-by: Tiezhu Yang <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

proc: test how it holds up with mapping'less process

Create process without mappings and check

/proc/*/maps
/proc/*/numa_maps
/proc/*/smaps
/proc/*/smaps_rollup

They must be empty (excluding vsyscall page) or full of zeroes.

Retroactively this test should've caught embarassing /proc/*/smaps_rollup
oops:

[17752.703567] BUG: kernel NULL pointer dereference, address: 0000000000000000
[17752.703580] #PF: supervisor read access in kernel mode
[17752.703583] #PF: error_code(0x0000) - not-present page
[17752.703587] PGD 0 P4D 0
[17752.703593] Oops: 0000 [#1] PREEMPT SMP PTI
[17752.703598] CPU: 0 PID: 60649 Comm: cat Tainted: G W 5.19.9-100.fc35.x86_64 #1
[17752.703603] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme6/3.1, BIOS P3.30 08/05/2016
[17752.703607] RIP: 0010:show_smaps_rollup+0x159/0x2e0

Note 1:
ProtectionKey field in /proc/*/smaps is optional,
so check most of its contents, not everything.

Note 2:
due to the nature of this test, child process hardly can signal
its readiness (after unmapping everything!) to parent.
I feel like "sleep(1)" is justified.
If you know how to do it without sleep please tell me.

Note 3:
/proc/*/statm is not tested but can be.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mailmap: update Frank Rowand email address

Frank is no longer at Sony, add an entry for his latest Sony email

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Frank Rowand <[email protected]>
Cc: Tim Bird <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

ia64: mca: use strscpy() is more robust and safer

The implementation of strscpy() is more robust and safer. That's now the
recommended way to copy NUL terminated strings.

Link: https://lkml.kernel.org/r/[email protected]
Reported-by: Zeal Robot <[email protected]>
Signed-off-by: Xu Panda <[email protected]>
Signed-off-by: xu xin <[email protected]>
Cc: Haowen Bai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

init/Kconfig: fix unmet direct dependencies

Commit 3c07bfce92a5 ("proc: make config PROC_CHILDREN depend on PROC_FS")
make config PROC_CHILDREN depend on PROC_FS.

When CONFIG_PROC_FS is not set and CONFIG_CHECKPOINT_RESTORE=y,
make menuconfig screams like this:

WARNING: unmet direct dependencies detected for PROC_CHILDREN
  Depends on [n]: PROC_FS [=n]
  Selected by [y]:
  - CHECKPOINT_RESTORE [=y]

CHECKPOINT_RESTORE would select PROC_CHILDREN which depends on PROC_FS,
so add depends on PROC_FS to CHECKPOINT_RESTORE to fix this.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 3c07bfce92a5 ("proc: make config PROC_CHILDREN depend on PROC_FS")
Signed-off-by: Ren Zhijie <[email protected]>
Reviewed-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

ia64: update config files

Clean up config files by:
  - removing configs that were deleted in the past
  - removing configs not in tree and without recently pending patches
  - adding new configs that are replacements for old configs in the file

For some detailed information, see Link.

Link: https://lore.kernel.org/kernel-janitors/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure

If creation or finalization of a checkpoint fails due to anomalies in the
checkpoint metadata on disk, a kernel warning is generated.

This patch replaces the WARN_ONs by nilfs_error, so that a kernel, booted
with panic_on_warn, does not panic. A nilfs_error is appropriate here to
handle the abnormal filesystem condition.

This also replaces the detected error codes with an I/O error so that
neither of the internal error codes is returned to callers.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Signed-off-by: Andrew Morton <[email protected]>

fork: remove duplicate included header files

linux/sched/mm.h is included more than once.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Xu Panda <[email protected]>
Reported-by: Zeal Robot <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Christian Brauner (Microsoft) <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Merge tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools updates from Arnaldo Carvalho de Melo:

- Add support for AMD on 'perf mem' and 'perf c2c', the kernel
   enablement patches went via tip.

   Example:

      $ sudo perf mem record -- -c 10000
      ^C[ perf record: Woken up 227 times to write data ]
      [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

      $ sudo perf mem report -F mem,sample,snoop
      Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
      Memory access                  Samples  Snoop
      N/A                             700620  N/A
      L1 hit                          126675  N/A
      L2 hit                             424  N/A
      L3 hit                             664  HitM
      L3 hit                              10  N/A
      Local RAM hit                        2  N/A
      Remote RAM (1 hop) hit            8558  N/A
      Remote Cache (1 hop) hit             3  N/A
      Remote Cache (1 hop) hit             2  HitM
      Remote Cache (2 hops) hit           10  HitM
      Remote Cache (2 hops) hit            6  N/A
      Uncached hit                         4  N/A
      $

- "perf lock" improvements:

     - Add -E/--entries option to limit the number of entries to
       display, say to ask for just the top 5 contended locks.

     - Add -q/--quiet option to suppress header and debug messages.

     - Add a 'perf test' kernel lock contention entry to test 'perf
       lock'.

- "perf lock contention" improvements:

     - Ask BPF's bpf_get_stackid() to skip some callchain entries.

       The ones closer to the tooling are bpf related and not that
       interesting, the ones calling the locking function are the ones
       we're interested in, example of a full, unskipped callstack:

     - Allow changing the callstack depth and number of entries to skip.

           1     10.74 us     10.74 us     10.74 us     spinlock   __bpf_trace_contention_begin+0xb
                          0xffffffffc03b5c47  bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
                          0xffffffffc03b5c47  bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
                          0xffffffffbb8b8e75  bpf_trace_run2+0x35
                          0xffffffffbb7eab9b  __bpf_trace_contention_begin+0xb
                          0xffffffffbb7ebe75  queued_spin_lock_slowpath+0x1f5
                          0xffffffffbc1c26ff  _raw_spin_lock+0x1f
                          0xffffffffbb841015  tick_do_update_jiffies64+0x25
                          0xffffffffbb8409ee  tick_irq_enter+0x9e

     - Show full callstack in verbose mode (-v option), sometimes this
       is desirable instead of showing just one callstack entry.

- Allow multiple time ranges in 'perf record --delay' to help in
   reducing the amount of data collected from hardware tracing (Intel
   PT, etc) when there is a rough idea of periods of time where events
   of interest take time.

- Add Intel PT to record only decoder debug messages when error
   happens.

- Improve layout of Intel PT man page.

- Add new branch types: alignment, data and inst faults and arch
   specific ones, such as fiq, debug_halt, debug_exit, debug_inst and
   debug_data on arm64.

   Kernel enablement went thru the tip tree.

- Fix 'perf probe' error log check in 'perf test' when no debuginfo is
   available.

- Fix 'perf stat' aggregation mode logic, it should be looking at the
   CPU not at the core number.

- Fix flags parsing in 'perf trace' filters.

- Introduce compact encoding of CPU range encoding on perf.data, to
   avoid having a bitmap with all the CPUs.

- Improvements to the 'perf stat' metrics, including adding
   "core_wide", and computing "smt" from the CPU topology.

- Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format,
   that allows tooling to ask for the precise number of lost samples for
   a given event.

- Add 'addr' sort key to see just the address of sampled instructions:

      $ perf record -o- true | perf report -i- -s addr
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.000 MB - ]
      # Samples: 12  of event 'cycles:u'
      # Event count (approx.): 252512
      #
      # Overhead  Address
      # ........  ..................
          42.96%  0x7f96f08443d7
          29.55%  0x7f96f0859b50
          14.76%  0x7f96f0852e02
           8.30%  0x7f96f0855028
           4.43%  0xffffffff8de01087

      perf annotate: Toggle full address <-> offset display

- Add 'f' hotkey to the 'perf annotate' TUI interface when in
   'disassembler output' mode ('o' hotkey) to toggle showing full
   virtual address or just the offset.

- Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for
   pre-existing threads, at the start of a 'perf record' session,
   speeding up that record startup phase.

- Add a command line option to specify build ids in 'perf inject'.

- Update JSON event files for the Intel alderlake, broadwell,
   broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake,
   icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids,
   skylake, skylakex, and tigerlake processors.

- Update vendor JSON event files for the ARM Neoverse V1 and E1
   platforms.

- Add a 'perf test' entry for 'perf mem' where a struct has false
   sharing and this gets detected in the 'perf mem' output, tested with
   Intel, AMD and ARM64 systems.

- Add a 'perf test' entry to test the resolution of java symbols, where
   an output like this is expected:

       8.18%  jshell    jitted-50116-29.so    [.] Interpreter
       0.75%  Thread-1  jitted-83602-1670.so  [.] jdk.internal.jimage.BasicImageReader.getString(int)

- Add tests for the ARM64 CoreSight hardware tracing feature, with
   specially crafted pureloop, memcpy, thread loop and unroll tread that
   then gets traced and the output compared with expected output.

   Documentation explaining it is also included.

- Add per thread Intel PT 'perf test' entry to check that
   PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a
   mixture of per thread and per CPU events and mmaps, verify that this
   gets all recorded correctly.

- Introduce pthread mutex wrappers to allow for building with clang's
   -Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by"
   "lockable", "exclusive_lock_function", "exclusive_trylock_function",
   "exclusive_locks_required", and "no_thread_safety_analysis" compiler
   function attributes.

- Fix empty version number when building outside of a git repo.

- Improve feature detection display when multiple versions of a feature
   are present, such as for binutils libbfd, that has a mix of possible
   ways to detect according to the Linux distribution.

   Previously in some cases we had:

      Auto-detecting system features
      <SNIP>
      ...                                  libbfd: [ on  ]
      ...                          libbfd-liberty: [ on  ]
      ...                        libbfd-liberty-z: [ on  ]
      <SNIP>

   Now for this case we show just the main feature:

      Auto-detecting system features
      <SNIP>
      ...                                  libbfd: [ on  ]
      <SNIP>

- Remove some unused structs, variables, macros, function prototypes
   and includes from various places.

* tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits)
  perf script: Add missing fields in usage hint
  perf mem: Print "LFB/MAB" for PERF_MEM_LVLNUM_LFB
  perf mem/c2c: Avoid printing empty lines for unsupported events
  perf mem/c2c: Add load store event mappings for AMD
  perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
  perf mem: Add support for printing PERF_MEM_LVLNUM_{CXL|IO}
  perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel
  tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel
  perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout()
  perf test coresight: Add relevant documentation about ARM64 CoreSight testing
  perf test: Add git ignore for tmp and output files of ARM CoreSight tests
  perf test coresight: Add unroll thread test shell script
  perf test coresight: Add unroll thread test tool
  perf test coresight: Add thread loop test shell scripts
  perf test coresight: Add thread loop test tool
  perf test coresight: Add memcpy thread test shell script
  perf test coresight: Add memcpy thread test tool
  perf test: Add git ignore for perf data generated by the ARM CoreSight tests
  perf test: Add arm64 asm pureloop test shell script
  perf test: Add asm pureloop test tool
  ...

Merge tag 'pci-v6.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci updates from Bjorn Helgaas:
"Resource management:

   - Distribute spare resources to unconfigured hotplug bridges at
     boot-time (not just when hot-adding such a bridge), which makes
     hot-adding devices to docks work better.

   - Revert to a BAR assignment inherited from firmware only when the
     address is actually reachable via any upstream bridges, which fixes
     some cases where firmware doesn't configure all devices.

   - Add a sysfs interface to resize BARs so this can be done before
     assigning devices to a VM through VFIO.

  Power management:

   - Disable Precision Time Management for all devices on suspend to
     enable lower-power PM state. We previously did this just for Root
     Ports, which isn't enough because downstream devices can still
     generate PTM messages, which cause errors if it's disabled in the
     Root Port.

   - Save and restore the ASPM L1 PM Substates configuration for
     suspend/ resume. Previously this configuration was lost, so L1.x
     states likely stopped working after resume.

   - Check whether the L1 PM Substates Capability exists. If it didn't
     exist, we previously read junk and tried to configure L1 Substates
     based on that.

   - Fix the LTR_L1.2_THRESHOLD computation, which previously set a
     threshold for entering L1.2 that was too low in some cases.

   - Reduce the delay after transitions to or from D3cold by using
     usleep_range() rather than msleep(), which often slept for ~19ms
     instead of the 10ms normally required. The spec says 10ms is
     enough, but it's possible we could trip over devices that need a
     little more.

  Error handling:

   - Work around a BIOS bug that caused Intel Root Ports to advertise a
     Root Port Programmed I/O (RP PIO) log size of zero, which caused
     annoying warnings and prevented the kernel from dumping log
     registers for DPC errors.

  Qualcomm PCIe controller driver:

   - Add support for SC8280XP and SA8540P host controllers and SM8450
     endpoint controller.

   - Disable Master AXI clock on endpoint controllers to save power when
     link is idle or in L1.x.

   - Expose link state transition counts via debugfs to help debug
     issues with low-power states.

   - Add auto-loading module support.

  Synopsys DesignWare PCIe controller driver:

   - Remove a dependency on ZONE_DMA32 by allocating the MSI target page
     differently. There's more work to do related to eDMA controllers,
     so it's not completely settled"

* tag 'pci-v6.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (71 commits)
  PCI: qcom-ep: Check platform_get_resource_byname() return value
  PCI: qcom-ep: Add support for SM8450 SoC
  dt-bindings: PCI: qcom-ep: Add support for SM8450 SoC
  dt-bindings: PCI: qcom-ep: Define clocks per platform
  PCI: qcom-ep: Make PERST separation optional
  dt-bindings: PCI: qcom-ep: Make PERST separation optional
  PCI: qcom-ep: Disable Master AXI Clock when there is no PCIe traffic
  PCI: Expose PCIe Resizable BAR support via sysfs
  PCI/ASPM: Correct LTR_L1.2_THRESHOLD computation
  PCI/ASPM: Ignore L1 PM Substates if device lacks capability
  PCI/ASPM: Factor out L1 PM Substates configuration
  PCI: qcom-ep: Gate Master AXI clock to MHI bus during L1SS
  PCI: qcom-ep: Expose link transition counts via debugfs
  PCI: qcom-ep: Disable IRQs during driver remove
  PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
  PCI/ASPM: Refactor L1 PM Substates Control Register programming
  PCI: qcom-ep: Make use of the cached dev pointer
  PCI: qcom-ep: Rely on the clocks supplied by devicetree
  PCI: qcom-ep: Add kernel-doc for qcom_pcie_ep structure
  phy: freescale: imx8m-pcie: Fix the wrong order of phy_init() and phy_power_on()
  ...

Merge tag 'i2c-for-6.1-rc1-batch2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull more i2c updates from Wolfram Sang:

- correct a variable type in the new pci1xxxx driver

- add a new SoC to the qcom-cci driver

- fix an issue with the designware driver which now got enough testing

- the aspeed driver now handles busy target backends better

* tag 'i2c-for-6.1-rc1-batch2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: aspeed: Assert NAK when slave is busy
  i2c: designware: Fix handling of real but unexpected device interrupts
  i2c: qcom-cci: Add MSM8226 compatible
  dt-bindings: i2c: qcom,i2c-cci: Document clocks for MSM8974
  dt-bindings: i2c: qcom,i2c-cci: Document MSM8226 compatible
  i2c: microchip: pci1xxxx: Fix comparison of -EPERM against an unsigned variable

Merge tag 'pinctrl-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control updates from Linus Walleij:
"There is nothing exciting going on, no core changes, just a few
  drivers and cleanups.

  New drivers:

   - Cypress CY8C95x0 chip pin control support, along with an immediate
     cleanup

   - Mediatek MT8188 SoC pin control support

   - Qualcomm SM8450 and SC8280XP LPASS (low power audio subsystem) pin
     control support

   - Qualcomm PM7250, PM8450

   - Rockchip RV1126 SoC pin control support

  Improvements:

   - Fix some missing pins in the Armada 37xx driver

   - Convert Broadcom and Nomadik drivers to use PINCTRL_PINGROUP()
     macro

   - Fix some GPIO irq_chips to be immutable

   - Massive Qualcomm device tree binding cleanup, with more to come"

* tag 'pinctrl-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (119 commits)
  MAINTAINERS: adjust STARFIVE JH7100 PINCTRL DRIVER after file movement
  pinctrl: starfive: Rename "pinctrl-starfive" to "pinctrl-starfive-jh7100"
  pinctrl: Create subdirectory for StarFive drivers
  dt-bindings: pinctrl: st,stm32: Document interrupt-controller property
  dt-bindings: pinctrl: st,stm32: Document gpio-hog pattern property
  dt-bindings: pinctrl: st,stm32: Document gpio-line-names
  pinctrl: st: stop abusing of_get_named_gpio()
  pinctrl: wpcm450: Correct the fwnode_irq_get() return value check
  pinctrl: bcm: Remove unused struct bcm6328_pingroup
  pinctrl: qcom: restrict drivers per ARM/ARM64
  pinctrl: bcm: ns: Remove redundant dev_err call
  gpio: rockchip: request GPIO mux to pinctrl when setting direction
  pinctrl: rockchip: add pinmux_ops.gpio_set_direction callback
  pinctrl: cy8c95x0: Align function names in cy8c95x0_pmxops
  pinctrl: cy8c95x0: Drop atomicity on operations on push_pull
  pinctrl: cy8c95x0: Lock register accesses in cy8c95x0_set_mux()
  pinctrl: sunxi: sun50i-h5: Switch to use dev_err_probe() helper
  pinctrl: stm32: Switch to use dev_err_probe() helper
  dt-bindings: qcom-pmic-gpio: Add PM7250B and PM8450 bindings
  pinctrl: qcom: spmi-gpio: Add compatible for PM7250B
  ...

Merge tag 'input-for-v6.1-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:

- a new driver for IBM Operational Panel

- a new driver for PinePhone keyboards

- RT5120 PMIC power key support

- various enhancements and support for new models in xpad (Xbox) driver

- a new compatible ID for Elan touchscreen driver

- rework of adp5588-keys driver to support configuring via device
   properties (OF, ACPI, etc) instead of platform data, and proper
   support of optional gpiochip functionality (and removal of
   gpio-adp5588 driver)

- improvements to firmware update handling in Synaptics RMI4 driver

- support for double key matrix in mt6779-keypad

- support for polled mode in adc-joystick driver

- other assorted driver fixes, cleanups and improvements

* tag 'input-for-v6.1-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (90 commits)
  Input: i8042 - fix refount leak on sparc
  Input: i8042 - add LoongArch support in i8042-acpipnpio.h
  Input: i8042 - rename i8042-x86ia64io.h to i8042-acpipnpio.h
  Input: pinephone-keyboard - support the proxied I2C bus
  Input: pinephone-keyboard - add PinePhone keyboard driver
  dt-bindings: input: Add the PinePhone keyboard binding
  dt-bindings: input: Convert hid-over-i2c to DT schema
  input: drop empty comment blocks
  Input: xpad - add X-Box Adaptive Profile button
  Input: add ABS_PROFILE to uapi and documentation
  Input: xpad - add X-Box Adaptive XBox button
  Input: xpad - add X-Box Adaptive support
  Input: ims-pcu - fix spelling mistake "BOOLTLOADER" -> "BOOTLOADER"
  Input: ibm-panel - add missing MODULE_DEVICE_TABLE
  Input: icn8505 - utilize acpi_get_subsystem_id()
  Input: xpad - decipher xpadone packages with GIP defines
  Input: xpad - refactor using BIT() macro
  Input: synaptics-rmi4 - convert to use sysfs_emit() APIs
  Input: twl4030-pwrbutton - add missing of.h include
  Input: applespi - replace zero-length array with DECLARE_FLEX_ARRAY() helper
  ...

Merge tag 'fbdev-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev updates from Helge Deller:
"Here's a fix for the smscufx USB graphics card to prevent a kernel
  crash if it's plugged in/out too fast.

  The other patches are mostly small cleanups, fixes in failure paths
  and code removal:

   - fix an use-after-free in smscufx USB graphics driver

   - add missing pci_disable_device() in tridentfb failure paths

   - correctly handle irq detection failure in mb862xx driver

   - fix resume code in omapfb/dss

   - drop unused code in controlfb, tridentfb, arkfb, imxfb and udlfb

   - convert uvesafb to use scnprintf() instead of snprintf()

   - convert gbefb to use dev_groups

   - add MODULE_DEVICE_TABLE() entry to vga16fb"

* tag 'fbdev-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: mb862xx: Fix check of return value from irq_of_parse_and_map()
  fbdev: vga16fb: Add missing MODULE_DEVICE_TABLE() entry
  fbdev: tridentfb: Fix missing pci_disable_device() in probe and remove
  fbdev: smscufx: Fix use-after-free in ufx_ops_open()
  fbdev: gbefb: Convert to use dev_groups
  fbdev: imxfb: Remove redundant dev_err() call
  fbdev: omapfb/dss: Use pm_runtime_resume_and_get() instead of pm_runtime_get_sync()
  fbdev: uvesafb: Convert snprintf to scnprintf
  fbdev: arkfb: Remove the unused function dac_read_reg()
  fbdev: tridentfb: Remove the unused function shadowmode_off()
  fbdev: controlfb: Remove the unused function VAR_MATCH()
  fbdev: udlfb: Remove redundant initialization to variable identical

Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

Pull dmi updates from Jean Delvare.

* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
firmware: dmi: Fortify entry point length checks

Merge tag 'for-linus-6.1-1' of https://github.com/cminyard/linux-ipmi

Pull IPMI updates from Corey Minyard:
"Fix a bunch of little problems in IPMI

  This is mostly just doc, config, and little tweaks. Nothing big, which
  is why there was nothing for 6.0. There is one crash fix, but it's not
  something that I think anyone is using yet"

* tag 'for-linus-6.1-1' of https://github.com/cminyard/linux-ipmi:
  ipmi: Remove unused struct watcher_entry
  ipmi: kcs: aspeed: Update port address comments
  ipmi: Add __init/__exit annotations to module init/exit funcs
  ipmi:ipmb: Don't call ipmi_unregister_smi() on a register failure
  ipmi:ipmb: Fix a vague comment and a typo
  dt-binding: ipmi: add fallback to npcm845 compatible
  ipmi: Fix comment typo
  char: ipmi: modify NPCM KCS configuration
  dt-bindings: ipmi: Add npcm845 compatible

alpha: remove the needless aliases osf_{readv,writev}

Commit 987f20a9dcce ("a.out: Remove the a.out implementation") removes
CONFIG_OSF4_COMPAT and its functionality. Hence, sys_osf_{readv,writev}
are now just aliases of sys_{readv,writev}.

Remove these needless aliases.

[ Identical patch also posted by Jason A. Donenfeld ]

Link: https://lore.kernel.org/lkml/CAHk-=wjwvBc3VQMNtUVUrMBVoMPSPu26OuatZ_+1gZ2m-PmmRA@mail.gmail.com/
Link: https://lore.kernel.org/all/[email protected]/
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

powerpc: Fix 85xx build

The merge of the kbuild tree dropped the renaming of the FSL_BOOKE
kconfig option.

Fixes: 8afc66e8d43b ("Merge tag 'kbuild-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild")
Signed-off-by: Joel Stanley <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'linux-can-fixes-for-6.1-20221011' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2022-10-11

this is a pull request of 4 patches for net/main.

Anssi Hannula and Jimmy Assarsson contribute 4 patches for the
kvaser_usb driver. A check for actual received length of USB transfers
is added, the use of an uninitialized completion is fixed, the TX
queue is re-synced after restart, and the CAN state is fixed after
restart.

* tag 'linux-can-fixes-for-6.1-20221011' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: kvaser_usb_leaf: Fix CAN state after restart
  can: kvaser_usb_leaf: Fix TX queue out of sync after restart
  can: kvaser_usb: Fix use of uninitialized completion
  can: kvaser_usb_leaf: Fix overread with an invalid command
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

xen/pv: support selecting safe/unsafe msr accesses

Instead of always doing the safe variants for reading and writing MSRs
in Xen PV guests, make the behavior controllable via Kconfig option
and a boot parameter.

The default will be the current behavior, which is to always use the
safe variant.

Signed-off-by: Juergen Gross <[email protected]>

xen/pv: refactor msr access functions to support safe and unsafe accesses

Refactor and rename xen_read_msr_safe() and xen_write_msr_safe() to
support both cases of MSR accesses, safe ones and potentially GP-fault
generating ones.

This will prepare to no longer swallow GPs silently in xen_read_msr()
and xen_write_msr().

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

xen/pv: fix vendor checks for pmu emulation

The CPU vendor checks for pmu emulation are rather limited today, as
the assumption seems to be that only Intel and AMD are existing and/or
supported vendors.

Fix that by handling Centaur and Zhaoxin CPUs the same way as Intel,
and Hygon the same way as AMD.

While at it fix the return type of is_intel_pmu_msr().

Suggested-by: Jan Beulich <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

xen/pv: add fault recovery control to pmu msr accesses

Today pmu_msr_read() and pmu_msr_write() fall back to the safe variants
of read/write MSR in case the MSR access isn't emulated via Xen. Allow
the caller to select that faults should not be recovered from by passing
NULL for the error pointer.

Restructure the code to make it more readable.

Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

wifi: ath11k: mac: fix reading 16 bytes from a region of size 0 warning

Linaro reported stringop-overread warnings in ath11k (this is one of many):

drivers/net/wireless/ath/ath11k/mac.c:2238:29: error: 'ath11k_peer_assoc_h_he_limit' reading 16 bytes from a region of size 0 [-Werror=stringop-overread]

My further investigation showed that these warnings happen on GCC 11.3 but not
with GCC 12.2, and with only the kernel config Linaro provided:

https://builds.tuxbuild.com/2F4W7nZHNx3T88RB0gaCZ9hBX6c/config

I saw the same warnings both with arm64 and x86_64 builds and KASAN seems to be
the reason triggering these warnings with GCC 11. Nobody else has reported
this so this seems to be quite rare corner case. I don't know what specific
commit started emitting this warning so I can't provide a Fixes tag. The
function hasn't been touched for a year.

I decided to workaround this by converting the pointer to a new array in stack,
and then copying the data to the new array. It's only 16 bytes anyway and this
is executed during association, so not in a hotpath.

Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.9

Reported-by: Linux Kernel Functional Testing <[email protected]>
Link: https://lore.kernel.org/all/CA+G9fYsZ_qypa=jHY_dJ=tqX4515+qrV9n2SWXVDHve826nF7Q@mail.gmail.com/
Signed-off-by: Kalle Valo <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

wifi: iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (other cases)

BUGs like this are still reproducible:

[   31.509616] list_add corruption. prev->next should be next (ffff8f8644242300), but was ffff8f86493fd300. (prev=ffff8f86493fd300).
[   31.521544] ------------[ cut here ]------------
[   31.526248] kernel BUG at lib/list_debug.c:30!
[   31.530781] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   31.535831] CPU: 1 PID: 626 Comm: wpa_supplicant Not tainted 6.0.0+ #7
[   31.542450] Hardware name: Dell Inc. Inspiron 660s/0478VN       , BIOS A07 08/24/2012
[   31.550484] RIP: 0010:__list_add_valid.cold+0x3a/0x5b
[   31.555537] Code: f2 4c 89 c1 48 89 fe 48 c7 c7 28 20 69 89 e8 4c e3 fd ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 d0 1f 69 89 e8 35 e3 fd ff <0f> 0b 4c 89 c1 48 c7 c7 78 1f 69 89 e8 24 e3 fd ff 0f 0b 48 c7 c7
[   31.574605] RSP: 0018:ffff9f6f00dc3748 EFLAGS: 00010286
[   31.579990] RAX: 0000000000000075 RBX: ffff8f8644242080 RCX: 0000000000000000
[   31.587155] RDX: 0000000000000201 RSI: ffffffff8967862d RDI: 00000000ffffffff
[   31.594482] RBP: ffff8f86493fd2e8 R08: 0000000000000000 R09: 00000000ffffdfff
[   31.601735] R10: ffff9f6f00dc3608 R11: ffffffff89f46128 R12: ffff8f86493fd300
[   31.608986] R13: ffff8f86493fd300 R14: ffff8f8644242300 R15: ffff8f8643dd3f2c
[   31.616151] FS:  00007f3bb9a707c0(0000) GS:ffff8f865a300000(0000) knlGS:0000000000000000
[   31.624447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   31.630286] CR2: 00007fe3647d5600 CR3: 00000001125a6002 CR4: 00000000000606e0
[   31.637539] Call Trace:
[   31.639936]  <TASK>
[   31.642143]  iwl_mvm_mac_wake_tx_queue+0x71/0x90 [iwlmvm]
[   31.647569]  ieee80211_queue_skb+0x4b6/0x720 [mac80211]
...

So, it is necessary to extend the applied solution with commit 14a3aacf517a9
("iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue")
to all other cases where the station queues are invalidated and the related
lists are not emptied. Because, otherwise as before, if some new element is
added later to the list in iwl_mvm_mac_wake_tx_queue, it can match with the
old one and produce the same commented BUG.

That is, in order to avoid this problem completely, we must also remove the
related lists for the other cases when station queues are invalidated.

Fixes: cfbc6c4c5b91c ("iwlwifi: mvm: support mac80211 TXQs model")
Reported-by: Petr Stourac <[email protected]>
Tested-by: Petr Stourac <[email protected]>
Signed-off-by: Jose Ignacio Tornos Martinez <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

wifi: mt76: fix rx checksum offload on mt7615/mt7915/mt7921

Checking the relevant rxd bits for the checksum information only indicates
if the checksum verification was performed by the hardware and doesn't show
actual checksum errors. Checksum errors are indicated in the info field of
the DMA descriptor. Fix packets erroneously marked as CHECKSUM_UNNECESSARY
by checking the extra bits as well.
Those bits are only passed to the driver for MMIO devices at the moment, so
limit checksum offload to those.

Fixes: 2122dfbfd0bd ("mt76: mt7615: add rx checksum offload support")
Fixes: 94244d2ea503 ("mt76: mt7915: add rx checksum offload support")
Fixes: 0e75732764e8 ("mt76: mt7921: enable rx csum offload")
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

wifi: mt76: fix receiving LLC packets on mt7615/mt7915

When 802.3 decap offload is enabled, the hardware indicates header translation
failure, whenever either the LLC-SNAP header was not found, or a VLAN header
with an unregcognized tag is present.
In that case, the hardware inserts a 2-byte length fields after the MAC
addresses. For VLAN packets, this tag needs to be removed. However,
for 802.3 LLC packets, the length bytes should be preserved, since there
is no separate ethertype field in the data.
This fixes an issue where the length field was omitted for LLC frames, causing
them to be malformed after hardware decap.

Fixes: 1eeff0b4c1a6 ("mt76: mt7915: fix decap offload corner case with 4-addr VLAN frames")
Reported-by: Chad Monroe <[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge patch series "can: kvaser_usb: Various fixes"

Jimmy Assarsson <[email protected]> says:

Changes in v5: https://lore.kernel.org/all/20221010150829 [email protected]
- Split series [1], kept only critical bug fixes that should go into
   stable, since v4 got rejected [2].
   Non-critical fixes are posted in a separate series.

Changes in v4: https://lore.kernel.org/all/20220903182344 [email protected]
- Add Tested-by: Anssi Hannula to
   [PATCH v4 04/15] can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device
- Update commit message in
   [PATCH v4 04/15] can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device

Changes in v3: https://lore.kernel.org/all/20220901122729 [email protected]
- Rebase on top of commit
   1d5eeda23f36 ("can: kvaser_usb: advertise timestamping capabilities and add ioctl support")
- Add Tested-by: Anssi Hannula
- Add [email protected] to CC.
- Add my S-o-b to all patches
- Fix regression introduced in
   [PATCH v2 04/15] can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device
   found by Anssi Hannula
   https://lore.kernel.org/all/b25bc059-d776-146d-0b3c-41aecf4bd9f8@bitwise.fi

v2: https://lore.kernel.org/all/20220708115709 [email protected]

v1: https://lore.kernel.org/all/20220516134748.3724796 [email protected]

[1] https://lore.kernel.org/linux-can/20220903182344 [email protected]
[2] https://lore.kernel.org/linux-can/20220920192708 [email protected]

Link: https://lore.kernel.org/all/[email protected]
[mkl: add/update links]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: kvaser_usb_leaf: Fix CAN state after restart

can_restart() expects CMD_START_CHIP to set the error state to
ERROR_ACTIVE as it calls netif_carrier_on() immediately afterwards.

Otherwise the user may immediately trigger restart again and hit a
BUG_ON() in can_restart().

Fix kvaser_usb_leaf set_mode(CMD_START_CHIP) to set the expected state.

Cc: [email protected]
Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices")
Tested-by: Jimmy Assarsson <[email protected]>
Signed-off-by: Anssi Hannula <[email protected]>
Signed-off-by: Jimmy Assarsson <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: kvaser_usb_leaf: Fix TX queue out of sync after restart

The TX queue seems to be implicitly flushed by the hardware during
bus-off or bus-off recovery, but the driver does not reset the TX
bookkeeping.

Despite not resetting TX bookkeeping the driver still re-enables TX
queue unconditionally, leading to "cannot find free context" /
NETDEV_TX_BUSY errors if the TX queue was full at bus-off time.

Fix that by resetting TX bookkeeping on CAN restart.

Tested with 0bfd:0124 Kvaser Mini PCI Express 2xHS FW 4.18.778.

Cc: [email protected]
Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices")
Tested-by: Jimmy Assarsson <[email protected]>
Signed-off-by: Anssi Hannula <[email protected]>
Signed-off-by: Jimmy Assarsson <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: kvaser_usb: Fix use of uninitialized completion

flush_comp is initialized when CMD_FLUSH_QUEUE is sent to the device and
completed when the device sends CMD_FLUSH_QUEUE_RESP.

This causes completion of uninitialized completion if the device sends
CMD_FLUSH_QUEUE_RESP before CMD_FLUSH_QUEUE is ever sent (e.g. as a
response to a flush by a previously bound driver, or a misbehaving
device).

Fix that by initializing flush_comp in kvaser_usb_init_one() like the
other completions.

This issue is only triggerable after RX URBs have been set up, i.e. the
interface has been opened at least once.

Cc: [email protected]
Fixes: aec5fb2268b7 ("can: kvaser_usb: Add support for Kvaser USB hydra family")
Tested-by: Jimmy Assarsson <[email protected]>
Signed-off-by: Anssi Hannula <[email protected]>
Signed-off-by: Jimmy Assarsson <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: kvaser_usb_leaf: Fix overread with an invalid command

For command events read from the device,
kvaser_usb_leaf_read_bulk_callback() verifies that cmd->len does not
exceed the size of the received data, but the actual kvaser_cmd handlers
will happily read any kvaser_cmd fields without checking for cmd->len.

This can cause an overread if the last cmd in the buffer is shorter than
expected for the command type (with cmd->len showing the actual short
size).

Maximum overread seems to be 22 bytes (CMD_LEAF_LOG_MESSAGE), some of
which are delivered to userspace as-is.

Fix that by verifying the length of command before handling it.

This issue can only occur after RX URBs have been set up, i.e. the
interface has been opened at least once.

Cc: [email protected]
Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices")
Tested-by: Jimmy Assarsson <[email protected]>
Signed-off-by: Anssi Hannula <[email protected]>
Signed-off-by: Jimmy Assarsson <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

Merge tag 'xfs-6.1-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Dave Chinner:
"There are relatively few updates this cycle; half the cycle was eaten
  by a grue, the other half was eaten by a tricky data corruption issue
  that I still haven't entirely solved.

  Hence there's no major changes in this cycle and it's largely just
  minor cleanups and small bug fixes:

   - fixes for filesystem shutdown procedure during a DAX memory failure
     notification

   - bug fixes

   - logic cleanups

   - log message cleanups

   - updates to use vfs{g,u}id_t helpers where appropriate"

* tag 'xfs-6.1-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: on memory failure, only shut down fs after scanning all mappings
  xfs: rearrange the logic and remove the broken comment for xfs_dir2_isxx
  xfs: trim the mapp array accordingly in xfs_da_grow_inode_int
  xfs: do not need to check return value of xlog_kvmalloc()
  xfs: port to vfs{g,u}id_t and associated helpers
  xfs: remove xfs_setattr_time() declaration
  xfs: Remove the unneeded result variable
  xfs: missing space in xfs trace log
  xfs: simplify if-else condition in xfs_reflink_trim_around_shared
  xfs: simplify if-else condition in xfs_validate_new_dalign
  xfs: replace unnecessary seq_printf with seq_puts
  xfs: clean up "%Ld/%Lu" which doesn't meet C standard
  xfs: remove redundant else for clean code
  xfs: remove the redundant word in comment

Merge tag 'f2fs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"This round looks fairly small comparing to the previous updates and
  includes mostly minor bug fixes. Nevertheless, as we've still
  interested in improving the stability, Chao added some debugging
  methods to diagnoze subtle runtime inconsistency problem.

  Enhancements:
   - store all the corruption or failure reasons in superblock
   - detect meta inode, summary info, and block address inconsistency
   - increase the limit for reserve_root for low-end devices
   - add the number of compressed IO in iostat

  Bug fixes:
   - DIO write fix for zoned devices
   - do out-of-place writes for cold files
   - fix some stat updates (FS_CP_DATA_IO, dirty page count)
   - fix race condition on setting FI_NO_EXTENT flag
   - fix data races when freezing super
   - fix wrong continue condition check in GC
   - do not allow ATGC for LFS mode

  In addition, there're some code enhancement and clean-ups as usual"

* tag 'f2fs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits)
  f2fs: change to use atomic_t type form sbi.atomic_files
  f2fs: account swapfile inodes
  f2fs: allow direct read for zoned device
  f2fs: support recording errors into superblock
  f2fs: support recording stop_checkpoint reason into super_block
  f2fs: remove the unnecessary check in f2fs_xattr_fiemap
  f2fs: introduce cp_status sysfs entry
  f2fs: fix to detect corrupted meta ino
  f2fs: fix to account FS_CP_DATA_IO correctly
  f2fs: code clean and fix a type error
  f2fs: add "c_len" into trace_f2fs_update_extent_tree_range for compressed file
  f2fs: fix to do sanity check on summary info
  f2fs: port to vfs{g,u}id_t and associated helpers
  f2fs: fix to do sanity check on destination blkaddr during recovery
  f2fs: let FI_OPU_WRITE override FADVISE_COLD_BIT
  f2fs: fix race condition on setting FI_NO_EXTENT flag
  f2fs: remove redundant check in f2fs_sanity_check_cluster
  f2fs: add static init_idisk_time function to reduce the code
  f2fs: fix typo
  f2fs: fix wrong dirty page count when race between mmap and fallocate.
  ...

Merge tag '9p-for-6.1' of https://github.com/martinetd/linux

Pull 9p updates from Dominique Martinet:
"Smaller buffers for small messages and fixes.

  The highlight of this is Christian's patch to allocate smaller buffers
  for most metadata requests: 9p with a big msize would try to allocate
  large buffers when just 4 or 8k would be more than enough; this brings
  in nice performance improvements.

  There's also a few fixes for problems reported by syzkaller (thanks to
  Schspa Shi, Tetsuo Handa for tests and feedback/patches) as well as
  some minor cleanup"

* tag '9p-for-6.1' of https://github.com/martinetd/linux:
  net/9p: clarify trans_fd parse_opt failure handling
  net/9p: add __init/__exit annotations to module init/exit funcs
  net/9p: use a dedicated spinlock for trans_fd
  9p/trans_fd: always use O_NONBLOCK read/write
  net/9p: allocate appropriate reduced message buffers
  net/9p: add 'pooled_rbuffers' flag to struct p9_trans_module
  net/9p: add p9_msg_buf_size()
  9p: add P9_ERRMAX for 9p2000 and 9p2000.u
  net/9p: split message size argument into 't_size' and 'r_size' pair
  9p: trans_fd/p9_conn_cancel: drop client lock earlier

Merge tag 'gfs2-nopid-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 debugfs updates from Andreas Gruenbacher:

- Improve the way how the state of glocks is reported in debugfs for
   glocks which are not held by processes, but rather by other resouces
   like cached inodes or flocks.

* tag 'gfs2-nopid-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Mark the remaining process-independent glock holders as GL_NOPID
  gfs2: Mark flock glock holders as GL_NOPID
  gfs2: Add GL_NOPID flag for process-independent glock holders
  gfs2: Add flocks to glockfd debugfs file
  gfs2: Add glockfd debugfs file

Merge tag 'gfs2-v6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

- Make sure to initialize the filesystem work queues before registering
   the filesystem; this prevents them from being used uninitialized.

- On filesystem withdraw: prevent a a double iput() and immediately
   reject pending locking requests that can no longer succeed.

- Use TRY lock in gfs2_inode_lookup() to prevent a rare glock hang
   during evict.

- During filesystem mount, explicitly make sure that the sb_bsize and
   sb_bsize_shift super block fields are consistent with each other.
   This prevents messy error messages during fuzz testing.

- Switch from strlcpy to strscpy.

* tag 'gfs2-v6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Register fs after creating workqueues
  gfs2: Check sb_bsize_shift after reading superblock
  gfs2: Switch from strlcpy to strscpy
  gfs2: Clear flags when withdraw prevents xmote
  gfs2: Dequeue waiters when withdrawn
  gfs2: Prevent double iput for journal on error
  gfs2: Use TRY lock in gfs2_inode_lookup for UNLINKED inodes

Merge tag '6.1-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs updates from Steve French:

- data corruption fix when cache disabled

- four RDMA (smbdirect) improvements, including enabling support for
   SoftiWARP

- four signing improvements

- three directory lease improvements

- four cleanup fixes

- minor security fix

- two debugging improvements

* tag '6.1-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (21 commits)
  smb3: fix oops in calculating shash_setkey
  cifs: secmech: use shash_desc directly, remove sdesc
  smb3: rename encryption/decryption TFMs
  cifs: replace kfree() with kfree_sensitive() for sensitive data
  cifs: remove initialization value
  cifs: Replace a couple of one-element arrays with flexible-array members
  smb3: do not log confusing message when server returns no network interfaces
  smb3: define missing create contexts
  cifs: store a pointer to a fid in the cfid structure instead of the struct
  cifs: improve handlecaching
  cifs: Make tcon contain a wrapper structure cached_fids instead of cached_fid
  smb3: add dynamic trace points for tree disconnect
  Fix formatting of client smbdirect RDMA logging
  Handle variable number of SGEs in client smbdirect send.
  Reduce client smbdirect max receive segment size
  Decrease the number of SMB3 smbdirect client SGEs
  cifs: Fix the error length of VALIDATE_NEGOTIATE_INFO message
  cifs: destage dirty pages before re-reading them for cache=none
  cifs: return correct error in ->calc_signature()
  MAINTAINERS: Add Tom Talpey as cifs.ko reviewer
  ...

Merge tag 'nfsd-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull more nfsd updates from Chuck Lever:

- filecache code clean-ups

* tag 'nfsd-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: rework hashtable handling in nfsd_do_file_acquire
nfsd: fix nfsd_file_unhash_and_dispose

Merge tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs tmpfile updates from Al Viro:
"Miklos' ->tmpfile() signature change; pass an unopened struct file to
  it, let it open the damn thing. Allows to add tmpfile support to FUSE"

* tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fuse: implement ->tmpfile()
  vfs: open inside ->tmpfile()
  vfs: move open right after ->tmpfile()
  vfs: make vfs_tmpfile() static
  ovl: use vfs_tmpfile_open() helper
  cachefiles: use vfs_tmpfile_open() helper
  cachefiles: only pass inode to *mark_inode_inuse() helpers
  cachefiles: tmpfile error handling cleanup
  hugetlbfs: cleanup mknod and tmpfile
  vfs: add vfs_tmpfile_open() helper

nfp: flower: fix incorrect struct type in GRE key_size

Looks like a copy-paste error sneaked in here at some point,
causing the key_size for these tunnels to be calculated
incorrectly. This size ends up being send to the firmware,
causing unexpected behaviour in some cases.

Fixes: 78a722af4ad9 ("nfp: flower: compile match for IPv6 tunnels")
Reported-by: Chaoyong He <[email protected]>
Signed-off-by: Louis Peens <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: sfp: fill also 5gbase-r and 25gbase-r modes in sfp_parse_support()

Fill in also 5gbase-r and 25gbase-r PHY interface modes into the
phy_interface_t bitmap in sfp_parse_support().

Fixes: fd580c983031 ("net: sfp: augment SFP parsing with phy_interface_t bitmap")
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: systemport: Enable all RX descriptors for SYSTEMPORT Lite

The original commit that added support for the SYSTEMPORT Lite variant
halved the number of RX descriptors due to a confusion between the
number of descriptors and the number of descriptor words. There are 512
descriptor *words* which means 256 descriptors total.

Fixes: 44a4524c54af ("net: systemport: Add support for SYSTEMPORT Lite")
Signed-off-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: prestera: span: do not unbind things things that were never bound

Fixes: 13defa275eef ("net: marvell: prestera: Add matchall support")
Signed-off-by: Maksym Glubokiy <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

- Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

- userfaultfd updates from Axel Rasmussen

- zsmalloc cleanups from Alexey Romanov

- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

- memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

- memcg cleanups from Kairui Song.

- memcg fixes and cleanups from Johannes Weiner.

- Vishal Moola provides more folio conversions

- Zhang Yi removed ll_rw_block() :(

- migration enhancements from Peter Xu

- migration error-path bugfixes from Huang Ying

- Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

- vma merging improvements from Jakub Matěn.

- NUMA hinting cleanups from David Hildenbrand.

- xu xin added aditional userspace visibility into KSM merging
   activity.

- THP & KSM code consolidation from Qi Zheng.

- more folio work from Matthew Wilcox.

- KASAN updates from Andrey Konovalov.

- DAMON cleanups from Kaixu Xia.

- DAMON work from SeongJae Park: fixes, cleanups.

- hugetlb sysfs cleanups from Muchun Song.

- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...

Merge tag 'x86_mm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Dave Hansen:
"There are some small things here, plus one big one.

  The big one detected and refused to create W+X kernel mappings. This
  caused a bit of trouble and it is entirely disabled on 32-bit due to
  known unfixable EFI issues. It also oopsed on some systemd eBPF use,
  which kept some users from booting.

  The eBPF issue is fixed, but those troubles were caught relatively
  recently which made me nervous that there are more lurking. The final
  commit in here retains the warnings, but doesn't actually refuse to
  create W+X mappings.

  Summary:

   - Detect insecure W+X mappings and warn about them, including a few
     bug fixes and relaxing the enforcement

   - Do a long-overdue defconfig update and enabling W+X boot-time
     detection

   - Cleanup _PAGE_PSE handling (follow-up on an earlier bug)

   - Rename a change_page_attr function"

* tag 'x86_mm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Ease W^X enforcement back to just a warning
  x86/mm: Disable W^X detection and enforcement on 32-bit
  x86/mm: Add prot_sethuge() helper to abstract out _PAGE_PSE handling
  x86/mm/32: Fix W^X detection when page tables do not support NX
  x86/defconfig: Enable CONFIG_DEBUG_WX=y
  x86/defconfig: Refresh the defconfigs
  x86/mm: Refuse W^X violations
  x86/mm: Rename set_memory_present() to set_memory_p()

Merge branch 'Add _opts variant for bpf_*_get_fd_by_id()'

Roberto Sassu says:

====================
From: Roberto Sassu <[email protected]>

Add the _opts variant for bpf_*_get_fd_by_id() functions, to be able to
pass to the kernel more options, when requesting a fd of an eBPF object.

Pass the options through a newly introduced structure,
bpf_get_fd_by_id_opts, which currently contains open_flags (the other two
members are for compatibility and for padding).

open_flags allows the caller to request specific permissions to access a
map (e.g. read-only). This is useful for example in the situation where a
map is write-protected.

Besides patches 2-6, which introduce the new variants and the data
structure, patch 1 fixes the LIBBPF_1.0.0 declaration in libbpf.map.

Changelog

v1:
- Don't CC stable kernel mailing list for patch 1 (suggested by Andrii)
- Rename bpf_get_fd_opts struct to bpf_get_fd_by_id_opts (suggested by
   Andrii)
- Move declaration of _opts variants after non-opts variants (suggested by
   Andrii)
- Correctly initialize bpf_map_info, fix style issues, use map from
   skeleton, check valid fd in the test (suggested by Andrii)
- Rename libbpf_get_fd_opts test to libbpf_get_fd_by_id_opts
====================

Signed-off-by: Andrii Nakryiko <[email protected]>

selftests/bpf: Add tests for _opts variants of bpf_*_get_fd_by_id()

Introduce the data_input map, write-protected with a small eBPF program
implementing the lsm/bpf_map hook.

Then, ensure that bpf_map_get_fd_by_id() and bpf_map_get_fd_by_id_opts()
with NULL opts don't succeed due to requesting read-write access to the
write-protected map. Also, ensure that bpf_map_get_fd_by_id_opts() with
open_flags in opts set to BPF_F_RDONLY instead succeeds.

After obtaining a read-only fd, ensure that only map lookup succeeds and
not update. Ensure that update works only with the read-write fd obtained
at program loading time, when the write protection was not yet enabled.

Finally, ensure that the other _opts variants of bpf_*_get_fd_by_id() don't
work if the BPF_F_RDONLY flag is set in opts (due to the kernel not
handling the open_flags member of bpf_attr).

Signed-off-by: Roberto Sassu <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Introduce bpf_link_get_fd_by_id_opts()

Introduce bpf_link_get_fd_by_id_opts(), for symmetry with
bpf_map_get_fd_by_id_opts(), to let the caller pass the newly introduced
data structure bpf_get_fd_by_id_opts. Keep the existing
bpf_link_get_fd_by_id(), and call bpf_link_get_fd_by_id_opts() with NULL as
opts argument, to prevent setting open_flags.

Currently, the kernel does not support non-zero open_flags for
bpf_link_get_fd_by_id_opts(), and a call with them will result in an error
returned by the bpf() system call. The caller should always pass zero
open_flags.

Signed-off-by: Roberto Sassu <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]