Git Repo - linux.git/log

mm/slub: Convert pfmemalloc_match() to take a struct slab

Preparatory for mass conversion. Use the new slab_test_pfmemalloc()
helper. As it doesn't do VM_BUG_ON(!PageSlab()) we no longer need the
pfmemalloc_match_unsafe() variant.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Hyeonggon Yoo <[email protected]>

mm/slub: Convert __free_slab() to use struct slab

__free_slab() is on the boundary of distinguishing struct slab and
struct page so start with struct slab but convert to folio for working
with flags and folio_page() to call functions that require struct page.

Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Hyeonggon Yoo <[email protected]>
Tested-by: Hyeonggon Yoo <[email protected]>

mm/slub: Convert alloc_slab_page() to return a struct slab

Preparatory, callers convert back to struct page for now.

Also move setting page flags to alloc_slab_page() where we still operate
on a struct page. This means the page->slab_cache pointer is now set
later than the PageSlab flag, which could theoretically confuse some pfn
walker assuming PageSlab means there would be a valid cache pointer. But
as the code had no barriers and used __set_bit() anyway, it could have
happened already, so there shouldn't be such a walker.

Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Hyeonggon Yoo <[email protected]>
Tested-by: Hyeonggon Yoo <[email protected]>

mm/slub: Convert print_page_info() to print_slab_info()

Improve the type safety and prepare for further conversion. For flags
access, convert to folio internally.

[ [email protected]: access flags via folio_flags() ]

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Hyeonggon Yoo <[email protected]>

mm/slub: Convert __slab_lock() and __slab_unlock() to struct slab

These functions operate on the PG_locked page flag, but make them accept
struct slab to encapsulate this implementation detail.

Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>

mm/slub: Convert kfree() to use a struct slab

Convert kfree(), kmem_cache_free() and ___cache_free() to resolve object
addresses to struct slab, using folio as intermediate step where needed.
Keep passing the result as struct page for now in preparation for mass
conversion of internal functions.

[ [email protected]: Use folio as intermediate step when checking for
large kmalloc pages, and when freeing them - rename
free_nonslab_page() to free_large_kmalloc() that takes struct folio ]

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>

mm/slub: Convert detached_freelist to use a struct slab

This gives us a little bit of extra typesafety as we know that nobody
called virt_to_page() instead of virt_to_head_page().

[ [email protected]: Use folio as intermediate step when filtering out
large kmalloc pages ]

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>

mm: Convert check_heap_object() to use struct slab

Ensure that we're not seeing a tail page inside __check_heap_object() by
converting to a slab instead of a page.  Take the opportunity to mark
the slab as const since we're not modifying it.  Also move the
declaration of __check_heap_object() to mm/slab.h so it's not available
to the wider kernel.

[ [email protected]: in check_heap_object() only convert to struct slab for
  actual PageSlab pages; use folio as intermediate step instead of page ]

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>

mm: Use struct slab in kmem_obj_info()

All three implementations of slab support kmem_obj_info() which reports
details of an object allocated from the slab allocator. By using the
slab type instead of the page type, we make it obvious that this can
only be called for slabs.

[ [email protected]: also convert the related kmem_valid_obj() to folios ]

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>

mm: Convert __ksize() to struct slab

In SLUB, use folios, and struct slab to access slab_cache field.
In SLOB, use folios to properly resolve pointers beyond
PAGE_SIZE offset of the object.

[ [email protected]: use folios, and only convert folio_test_slab() == true
folios to struct slab ]

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>

mm: Convert virt_to_cache() to use struct slab

This function is entirely self-contained, so can be converted from page
to slab.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Hyeonggon Yoo <[email protected]>

mm: Convert [un]account_slab_page() to struct slab

Convert the parameter of these functions to struct slab instead of
struct page and drop _page from the names. For now their callers just
convert page to slab.

[ [email protected]: replace existing functions instead of calling them ]

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>

mm: Split slab into its own type

Make struct slab independent of struct page. It still uses the
underlying memory in struct page for storing slab-specific data, but
slab and slub can now be weaned off using struct page directly.  Some of
the wrapper functions (slab_address() and slab_order()) still need to
cast to struct folio, but this is a significant disentanglement.

[ [email protected]: Rebase on folios, use folio instead of page where
  possible.

  Do not duplicate flags field in struct slab, instead make the related
  accessors go through slab_folio(). For testing pfmemalloc use the
  folio_*_active flag accessors directly so the PageSlabPfmemalloc
  wrappers can be removed later.

  Make folio_slab() expect only folio_test_slab() == true folios and
  virt_to_slab() return NULL when folio_test_slab() == false.

  Move struct slab to mm/slab.h.

  Don't represent with struct slab pages that are not true slab pages,
  but just a compound page obtained directly rom page allocator (with
  large kmalloc() for SLUB and SLOB). ]

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>

mm/slub: Make object_err() static

There are no callers outside of mm/slub.c anymore.

Move freelist_corrupted() that calls object_err() to avoid a need for
forward declaration.

Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>

mm/slab: Dissolve slab_map_pages() in its caller

The function no longer does what its name and comment suggests, and just
sets two struct page fields, which can be done directly in its sole
caller.

Signed-off-by: Vlastimil Babka <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Hyeonggon Yoo <[email protected]>

Merge tag 'socfpga_fix_for_v5.16_part_3' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into arm/fixes

SoCFPGA dts updates for v5.16, part 3
- Change the SoCFPGA compatible to "intel,socfpga-qspi"
- Update dt-bindings document to include "intel,socfpga-qspi"

* tag 'socfpga_fix_for_v5.16_part_3' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux: (361 commits)
  ARM: dts: socfpga: change qspi to "intel,socfpga-qspi"
  dt-bindings: spi: cadence-quadspi: document "intel,socfpga-qspi"
  Linux 5.16-rc7
  mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()
  mm/damon/dbgfs: protect targets destructions with kdamond_lock
  mm/page_alloc: fix __alloc_size attribute for alloc_pages_exact_nid
  mm: delete unsafe BUG from page_cache_add_speculative()
  mm, hwpoison: fix condition in free hugetlb page path
  MAINTAINERS: mark more list instances as moderated
  kernel/crash_core: suppress unknown crashkernel parameter warning
  mm: mempolicy: fix THP allocations escaping mempolicy restrictions
  kfence: fix memory leak when cat kfence objects
  platform/x86: intel_pmc_core: fix memleak on registration failure
  net: stmmac: dwmac-visconti: Fix value of ETHER_CLK_SEL_FREQ_SEL_2P5M
  r8152: sync ocp base
  r8152: fix the force speed doesn't work for RTL8156
  net: bridge: fix ioctl old_deviceless bridge argument
  net: stmmac: ptp: fix potentially overflowing expression
  net: dsa: tag_ocelot: use traffic class to map priority on injected header
  veth: ensure skb entering GRO are not cloned.
  ...

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Olof Johansson <[email protected]>

Merge tag 'reset-fixes-for-v5.16-2' of git://git.pengutronix.de/pza/linux into arm/fixes

Reset controller fixes for v5.16, part 2

Fix pm_runtime_resume_and_get() error handling in the
reset-rzg2l-usbphy-ctrl driver.

* tag 'reset-fixes-for-v5.16-2' of git://git.pengutronix.de/pza/linux:
reset: renesas: Fix Runtime PM usage
reset: tegra-bpmp: Revert Handle errors in BPMP response

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Olof Johansson <[email protected]>

tracing: Tag trace_percpu_buffer as a percpu pointer

Tag trace_percpu_buffer as a percpu pointer to resolve warnings
reported by sparse:
  /linux/kernel/trace/trace.c:3218:46: warning: incorrect type in initializer (different address spaces)
  /linux/kernel/trace/trace.c:3218:46:    expected void const [noderef] __percpu *__vpp_verify
  /linux/kernel/trace/trace.c:3218:46:    got struct trace_buffer_struct *
  /linux/kernel/trace/trace.c:3234:9: warning: incorrect type in initializer (different address spaces)
  /linux/kernel/trace/trace.c:3234:9:    expected void const [noderef] __percpu *__vpp_verify
  /linux/kernel/trace/trace.c:3234:9:    got int *

Link: https://lkml.kernel.org/r/ebabd3f23101d89cb75671b68b6f819f5edc830b.1640255304.git.naveen.n.rao@linux.vnet.ibm.com
Cc: [email protected]
Reported-by: kernel test robot <[email protected]>
Fixes: 07d777fe8c398 ("tracing: Add percpu buffers for trace_printk()")
Signed-off-by: Naveen N. Rao <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>

tracing: Fix check for trace_percpu_buffer validity in get_trace_buf()

With the new osnoise tracer, we are seeing the below splat:
    Kernel attempted to read user page (c7d880000) - exploit attempt? (uid: 0)
    BUG: Unable to handle kernel data access on read at 0xc7d880000
    Faulting instruction address: 0xc0000000002ffa10
    Oops: Kernel access of bad area, sig: 11 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
    ...
    NIP [c0000000002ffa10] __trace_array_vprintk.part.0+0x70/0x2f0
    LR [c0000000002ff9fc] __trace_array_vprintk.part.0+0x5c/0x2f0
    Call Trace:
    [c0000008bdd73b80] [c0000000001c49cc] put_prev_task_fair+0x3c/0x60 (unreliable)
    [c0000008bdd73be0] [c000000000301430] trace_array_printk_buf+0x70/0x90
    [c0000008bdd73c00] [c0000000003178b0] trace_sched_switch_callback+0x250/0x290
    [c0000008bdd73c90] [c000000000e70d60] __schedule+0x410/0x710
    [c0000008bdd73d40] [c000000000e710c0] schedule+0x60/0x130
    [c0000008bdd73d70] [c000000000030614] interrupt_exit_user_prepare_main+0x264/0x270
    [c0000008bdd73de0] [c000000000030a70] syscall_exit_prepare+0x150/0x180
    [c0000008bdd73e10] [c00000000000c174] system_call_vectored_common+0xf4/0x278

osnoise tracer on ppc64le is triggering osnoise_taint() for negative
duration in get_int_safe_duration() called from
trace_sched_switch_callback()->thread_exit().

The problem though is that the check for a valid trace_percpu_buffer is
incorrect in get_trace_buf(). The check is being done after calculating
the pointer for the current cpu, rather than on the main percpu pointer.
Fix the check to be against trace_percpu_buffer.

Link: https://lkml.kernel.org/r/a920e4272e0b0635cf20c444707cbce1b2c8973d.1640255304.git.naveen.n.rao@linux.vnet.ibm.com
Cc: [email protected]
Fixes: e2ace001176dc9 ("tracing: Choose static tp_printk buffer by explicit nesting count")
Signed-off-by: Naveen N. Rao <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>

ftrace/samples: Add missing prototypes direct functions

There's another compilation fail (first here [1]) reported by kernel
test robot for W=1 clang build:

  >> samples/ftrace/ftrace-direct-multi-modify.c:7:6: warning: no previous
  prototype for function 'my_direct_func1' [-Wmissing-prototypes]
     void my_direct_func1(unsigned long ip)

Direct functions in ftrace direct sample modules need to have prototypes
defined. They are already global in order to be visible for the inline
assembly, so there's no problem.

The kernel test robot reported just error for ftrace-direct-multi-modify,
but I got same errors also for the rest of the modules touched by this patch.

[1] 67d4f6e3bf5d ftrace/samples: Add missing prototype for my_direct_func

Link: https://lkml.kernel.org/r/[email protected]
Reported-by: kernel test robot <[email protected]>
Fixes: e1067a07cfbc ("ftrace/samples: Add module to test multi direct modify interface")
Fixes: ae0cc3b7e7f5 ("ftrace/samples: Add a sample module that implements modify_ftrace_direct()")
Fixes: 156473a0ff4f ("ftrace: Add another example of register_ftrace_direct() use case")
Fixes: b06457c83af6 ("ftrace: Add sample module that uses register_ftrace_direct()")
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>

Merge tag 'net-5.16-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski"
"Networking fixes, including fixes from bpf, and WiFi. One last pull
  request, turns out some of the recent fixes did more harm than good.

  Current release - regressions:

   - Revert "xsk: Do not sleep in poll() when need_wakeup set", made the
     problem worse

   - Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in
     __fixed_phy_register", broke EPROBE_DEFER handling

   - Revert "net: usb: r8152: Add MAC pass-through support for more
     Lenovo Docks", broke setups without a Lenovo dock

  Current release - new code bugs:

   - selftests: set amt.sh executable

  Previous releases - regressions:

   - batman-adv: mcast: don't send link-local multicast to mcast routers

  Previous releases - always broken:

   - ipv4/ipv6: check attribute length for RTA_FLOW / RTA_GATEWAY

   - sctp: hold endpoint before calling cb in
     sctp_transport_lookup_process

   - mac80211: mesh: embed mesh_paths and mpp_paths into
     ieee80211_if_mesh to avoid complicated handling of sub-object
     allocation failures

   - seg6: fix traceroute in the presence of SRv6

   - tipc: fix a kernel-infoleak in __tipc_sendmsg()"

* tag 'net-5.16-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (36 commits)
  selftests: set amt.sh executable
  Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks"
  sfc: The RX page_ring is optional
  iavf: Fix limit of total number of queues to active queues of VF
  i40e: Fix incorrect netdev's real number of RX/TX queues
  i40e: Fix for displaying message regarding NVM version
  i40e: fix use-after-free in i40e_sync_filters_subtask()
  i40e: Fix to not show opcode msg on unsuccessful VF MAC change
  ieee802154: atusb: fix uninit value in atusb_set_extended_addr
  mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh
  mac80211: initialize variable have_higher_than_11mbit
  sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc
  netrom: fix copying in user data in nr_setsockopt
  udp6: Use Segment Routing Header for dest address if present
  icmp: ICMPV6: Examine invoking packet for Segment Route Headers.
  seg6: export get_srh() for ICMP handling
  Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register"
  ipv6: Do cleanup if attribute validation fails in multipath route
  ipv6: Continue processing multipath route even if gateway attribute is invalid
  net/fsl: Remove leftover definition in xgmac_mdio
  ...

RDMA/core: Don't infoleak GRH fields

If dst->is_global field is not set, the GRH fields are not cleared
and the following infoleak is reported.

=====================================================
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline]
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x1c9/0x270 lib/usercopy.c:33
instrument_copy_to_user include/linux/instrumented.h:121 [inline]
_copy_to_user+0x1c9/0x270 lib/usercopy.c:33
copy_to_user include/linux/uaccess.h:209 [inline]
ucma_init_qp_attr+0x8c7/0xb10 drivers/infiniband/core/ucma.c:1242
ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732
vfs_write+0x8ce/0x2030 fs/read_write.c:588
ksys_write+0x28b/0x510 fs/read_write.c:643
__do_sys_write fs/read_write.c:655 [inline]
__se_sys_write fs/read_write.c:652 [inline]
__ia32_sys_write+0xdb/0x120 fs/read_write.c:652
do_syscall_32_irqs_on arch/x86/entry/common.c:114 [inline]
__do_fast_syscall_32+0x96/0xf0 arch/x86/entry/common.c:180
do_fast_syscall_32+0x34/0x70 arch/x86/entry/common.c:205
do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:248
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

Local variable resp created at:
ucma_init_qp_attr+0xa4/0xb10 drivers/infiniband/core/ucma.c:1214
ucma_write+0x637/0x6c0 drivers/infiniband/core/ucma.c:1732

Bytes 40-59 of 144 are uninitialized
Memory access of size 144 starts at ffff888167523b00
Data copied to user address 0000000020000100

CPU: 1 PID: 25910 Comm: syz-executor.1 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
=====================================================

Fixes: 4ba66093bdc6 ("IB/core: Check for global flag when using ah_attr")
Link: https://lore.kernel.org/r/0e9dd51f93410b7b2f4f5562f52befc878b71afa.1641298868.git.leonro@nvidia.com
Reported-by: [email protected]
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

selftests: set amt.sh executable

amt.sh test script will not work because it doesn't have execution
permission. So, it adds execution permission.

Reported-by: Hangbin Liu <[email protected]>
Fixes: c08e8baea78e ("selftests: add amt interface selftest script")
Signed-off-by: Taehee Yoo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

RDMA/uverbs: Check for null return of kmalloc_array

Because of the possible failure of the allocation, data might be NULL
pointer and will cause the dereference of the NULL pointer later.
Therefore, it might be better to check it and return -ENOMEM.

Fixes: 6884c6c4bd09 ("RDMA/verbs: Store the write/write_ex uapi entry points in the uverbs_api")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jiasheng Jiang <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

Merge branches 'for-next/misc', 'for-next/cache-ops-dzp', 'for-next/stacktrace', 'for-next/xor-neon', 'for-next/kasan', 'for-next/armv8_7-fp', 'for-next/atomics', 'for-next/bti', 'for-next/sve', 'for-next/kselftest' and 'for-next/kcsan', remote-tracking branch 'arm64/for-next/perf' into for-next/core

* arm64/for-next/perf: (32 commits)
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: perf: Support new DT compatibles
  arm64: perf: Simplify registration boilerplate
  arm64: perf: Support Denver and Carmel PMUs
  drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
  docs: perf: Add description for HiSilicon PCIe PMU driver
  dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
  drivers: perf: Add LLC-TAD perf counter support
  perf/smmuv3: Synthesize IIDR from CoreSight ID registers
  perf/smmuv3: Add devicetree support
  dt-bindings: Add Arm SMMUv3 PMCG binding
  perf/arm-cmn: Add debugfs topology info
  perf/arm-cmn: Add CI-700 Support
  dt-bindings: perf: arm-cmn: Add CI-700
  perf/arm-cmn: Support new IP features
  perf/arm-cmn: Demarcate CMN-600 specifics
  perf/arm-cmn: Move group validation data off-stack
  perf/arm-cmn: Optimise DTC counter accesses
  ...

* for-next/misc:
  : Miscellaneous patches
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64/fp: Add comments documenting the usage of state restore functions
  arm64: mm: Use asid feature macro for cheanup
  arm64: mm: Rename asid2idx() to ctxid2asid()
  arm64: kexec: reduce calls to page_address()
  arm64: extable: remove unused ex_handler_t definition
  arm64: entry: Use SDEI event constants
  arm64: Simplify checking for populated DT
  arm64/kvm: Fix bitrotted comment for SVE handling in handle_exit.c

* for-next/cache-ops-dzp:
  : Avoid DC instructions when DCZID_EL0.DZP == 1
  arm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1
  arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1

* for-next/stacktrace:
  : Unify the arm64 unwind code
  arm64: Make some stacktrace functions private
  arm64: Make dump_backtrace() use arch_stack_walk()
  arm64: Make profile_pc() use arch_stack_walk()
  arm64: Make return_address() use arch_stack_walk()
  arm64: Make __get_wchan() use arch_stack_walk()
  arm64: Make perf_callchain_kernel() use arch_stack_walk()
  arm64: Mark __switch_to() as __sched
  arm64: Add comment for stack_info::kr_cur
  arch: Make ARCH_STACKWALK independent of STACKTRACE

* for-next/xor-neon:
  : Use SHA3 instructions to speed up XOR
  arm64/xor: use EOR3 instructions when available

* for-next/kasan:
  : Log potential KASAN shadow aliases
  arm64: mm: log potential KASAN shadow alias
  arm64: mm: use die_kernel_fault() in do_mem_abort()

* for-next/armv8_7-fp:
  : Add HWCAPS for ARMv8.7 FEAT_AFP amd FEAT_RPRES
  arm64: cpufeature: add HWCAP for FEAT_RPRES
  arm64: add ID_AA64ISAR2_EL1 sys register
  arm64: cpufeature: add HWCAP for FEAT_AFP

* for-next/atomics:
  : arm64 atomics clean-ups and codegen improvements
  arm64: atomics: lse: define RETURN ops in terms of FETCH ops
  arm64: atomics: lse: improve constraints for simple ops
  arm64: atomics: lse: define ANDs in terms of ANDNOTs
  arm64: atomics lse: define SUBs in terms of ADDs
  arm64: atomics: format whitespace consistently

* for-next/bti:
  : BTI clean-ups
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: Use BTI C directly and unconditionally
  arm64: Unconditionally override SYM_FUNC macros
  arm64: Add macro version of the BTI instruction
  arm64: ftrace: add missing BTIs
  arm64: kexec: use __pa_symbol(empty_zero_page)
  arm64: update PAC description for kernel

* for-next/sve:
  : SVE code clean-ups and refactoring in prepararation of Scalable Matrix Extensions
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME

* for-next/kselftest:
  : arm64 kselftest additions
  kselftest/arm64: Add pidbench for floating point syscall cases
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information

* for-next/kcsan:
  : Enable KCSAN for arm64
  arm64: Enable KCSAN

Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks"

This reverts commit f77b83b5bbab53d2be339184838b19ed2c62c0a5.

This change breaks multiple usb to ethernet dongles attached on Lenovo
USB hub.

Fixes: f77b83b5bbab ("net: usb: r8152: Add MAC passthrough support for more Lenovo Docks")
Signed-off-by: Aaron Ma <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mm: Make SLAB_MERGE_DEFAULT depend on SL[AU]B

SLOB always manage objects of different caches in same page regardless of
SLAB_MERGE_DEFAULT. Because it has no effect on SLOB, make it depend on
SLAB || SLUB.

Signed-off-by: Hyeonggon Yoo <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'gpio-fixes-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
"Here are two last fixes for this release cycle from the GPIO
  subsystem:

   - fix irq offset calculation in gpio-aspeed-sgpio

   - update the MAINTAINERS entry for gpio-brcmstb"

* tag 'gpio-fixes-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  MAINTAINERS: update gpio-brcmstb maintainers
  gpio: gpio-aspeed-sgpio: Fix wrong hwirq base in irq handler

Merge tag 'ieee802154-for-net-2022-01-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan

Stefan Schmidt says:

====================
pull-request: ieee802154 for net 2022-01-05

Below I have a last minute fix for the atusb driver.

Pavel fixes a KASAN uninit report for the driver. This version is the
minimal impact fix to ease backporting. A bigger rework of the driver to
avoid potential similar problems is ongoing and will come through net-next
when ready.

* tag 'ieee802154-for-net-2022-01-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan:
ieee802154: atusb: fix uninit value in atusb_set_extended_addr
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

arm64: Use correct method to calculate nomap region boundaries

Nomap regions are treated as "reserved". When region boundaries are not
page aligned, we usually increase the "reserved" regions rather than
decrease them. So, we should use memblock_region_reserved_base_pfn()/
memblock_region_reserved_end_pfn() instead of memblock_region_memory_
base_pfn()/memblock_region_memory_base_pfn() to calculate boundaries.

Signed-off-by: Huacai Chen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

Revert "RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow"

This patch is not the full fix and still causes to call traces
during mlx5_ib_dereg_mr().

This reverts commit f0ae4afe3d35e67db042c58a52909e06262b740f.

Fixes: f0ae4afe3d35 ("RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maor Gottlieb <[email protected]>
Acked-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

arm64: Drop outdated links in comments

As started by commit 05a5f51ca566 ("Documentation: Replace lkml.org links
with lore"), an effort was made to replace lkml.org links with lore to
better use a single source that's more likely to stay available long-term.
However, it seems these links don't offer much value here, so just
remove them entirely.

Cc: Joe Perches <[email protected]>
Suggested-by: Will Deacon <[email protected]>
Link: https://lore.kernel.org/lkml/20210211100213.GA29813@willie-the-truck/
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[[email protected]: removed the arch/arm changes]
Signed-off-by: Catalin Marinas <[email protected]>

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-01-04

This series contains updates to i40e and iavf drivers.

Mateusz adjusts displaying of failed VF MAC message when the failure is
expected as well as modifying an NVM info message to not confuse the user
for i40e.

Di Zhu fixes a use-after-free issue MAC filters for i40e.

Jedrzej fixes an issue with misreporting of Rx and Tx queues during
reinitialization for i40e.

Karen correct checking of channel queue configuration to occur against
active queues for iavf.
====================

Signed-off-by: David S. Miller <[email protected]>

sfc: The RX page_ring is optional

The RX page_ring is an optional feature that improves
performance. When allocation fails the driver can still
function, but possibly with a lower bandwidth.
Guard against dereferencing a NULL page_ring.

Fixes: 2768935a4660 ("sfc: reuse pages to avoid DMA mapping/unmapping costs")
Signed-off-by: Martin Habets <[email protected]>
Reported-by: Jiasheng Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

iavf: Fix limit of total number of queues to active queues of VF

In the absence of this validation, if the user requests to
configure queues more than the enabled queues, it results in
sending the requested number of queues to the kernel stack
(due to the asynchronous nature of VF response), in which
case the stack might pick a queue to transmit that is not
enabled and result in Tx hang. Fix this bug by
limiting the total number of queues allocated for VF to
active queues of VF.

Fixes: d5b33d024496 ("i40evf: add ndo_setup_tc callback to i40evf")
Signed-off-by: Ashwin Vijayavel <[email protected]>
Signed-off-by: Karen Sornek <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

i40e: Fix incorrect netdev's real number of RX/TX queues

There was a wrong queues representation in sysfs during
driver's reinitialization in case of online cpus number is
less than combined queues. It was caused by stopped
NetworkManager, which is responsible for calling vsi_open
function during driver's initialization.
In specific situation (ex. 12 cpus online) there were 16 queues
in /sys/class/net/<iface>/queues. In case of modifying queues with
value higher, than number of online cpus, then it caused write
errors and other errors.
Add updating of sysfs's queues representation during driver
initialization.

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Lukasz Cieplicki <[email protected]>
Signed-off-by: Jedrzej Jagielski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

i40e: Fix for displaying message regarding NVM version

When loading the i40e driver, it prints a message like: 'The driver for the
device detected a newer version of the NVM image v1.x than expected v1.y.
Please install the most recent version of the network driver.' This is
misleading as the driver is working as expected.

Fix that by removing the second part of message and changing it from
dev_info to dev_dbg.

Fixes: 4fb29bddb57f ("i40e: The driver now prints the API version in error message")
Signed-off-by: Mateusz Palczewski <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

i40e: fix use-after-free in i40e_sync_filters_subtask()

Using ifconfig command to delete the ipv6 address will cause
the i40e network card driver to delete its internal mac_filter and
i40e_service_task kernel thread will concurrently access the mac_filter.
These two processes are not protected by lock
so causing the following use-after-free problems.

print_address_description+0x70/0x360
? vprintk_func+0x5e/0xf0
kasan_report+0x1b2/0x330
i40e_sync_vsi_filters+0x4f0/0x1850 [i40e]
i40e_sync_filters_subtask+0xe3/0x130 [i40e]
i40e_service_task+0x195/0x24c0 [i40e]
process_one_work+0x3f5/0x7d0
worker_thread+0x61/0x6c0
? process_one_work+0x7d0/0x7d0
kthread+0x1c3/0x1f0
? kthread_park+0xc0/0xc0
ret_from_fork+0x35/0x40

Allocated by task 2279810:
kasan_kmalloc+0xa0/0xd0
kmem_cache_alloc_trace+0xf3/0x1e0
i40e_add_filter+0x127/0x2b0 [i40e]
i40e_add_mac_filter+0x156/0x190 [i40e]
i40e_addr_sync+0x2d/0x40 [i40e]
__hw_addr_sync_dev+0x154/0x210
i40e_set_rx_mode+0x6d/0xf0 [i40e]
__dev_set_rx_mode+0xfb/0x1f0
__dev_mc_add+0x6c/0x90
igmp6_group_added+0x214/0x230
__ipv6_dev_mc_inc+0x338/0x4f0
addrconf_join_solict.part.7+0xa2/0xd0
addrconf_dad_work+0x500/0x980
process_one_work+0x3f5/0x7d0
worker_thread+0x61/0x6c0
kthread+0x1c3/0x1f0
ret_from_fork+0x35/0x40

Freed by task 2547073:
__kasan_slab_free+0x130/0x180
kfree+0x90/0x1b0
__i40e_del_filter+0xa3/0xf0 [i40e]
i40e_del_mac_filter+0xf3/0x130 [i40e]
i40e_addr_unsync+0x85/0xa0 [i40e]
__hw_addr_sync_dev+0x9d/0x210
i40e_set_rx_mode+0x6d/0xf0 [i40e]
__dev_set_rx_mode+0xfb/0x1f0
__dev_mc_del+0x69/0x80
igmp6_group_dropped+0x279/0x510
__ipv6_dev_mc_dec+0x174/0x220
addrconf_leave_solict.part.8+0xa2/0xd0
__ipv6_ifa_notify+0x4cd/0x570
ipv6_ifa_notify+0x58/0x80
ipv6_del_addr+0x259/0x4a0
inet6_addr_del+0x188/0x260
addrconf_del_ifaddr+0xcc/0x130
inet6_ioctl+0x152/0x190
sock_do_ioctl+0xd8/0x2b0
sock_ioctl+0x2e5/0x4c0
do_vfs_ioctl+0x14e/0xa80
ksys_ioctl+0x7c/0xa0
__x64_sys_ioctl+0x42/0x50
do_syscall_64+0x98/0x2c0
entry_SYSCALL_64_after_hwframe+0x65/0xca

Fixes: 41c445ff0f48 ("i40e: main driver core")
Signed-off-by: Di Zhu <[email protected]>
Signed-off-by: Rui Zhang <[email protected]>
Tested-by: Gurucharan G <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

i40e: Fix to not show opcode msg on unsuccessful VF MAC change

Hide i40e opcode information sent during response to VF in case when
untrusted VF tried to change MAC on the VF interface.

This is implemented by adding an additional parameter 'hide' to the
response sent to VF function that hides the display of error
information, but forwards the error code to VF.

Previously it was not possible to send response with some error code
to VF without displaying opcode information.

Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface")
Signed-off-by: Grzegorz Szczurek <[email protected]>
Signed-off-by: Mateusz Palczewski <[email protected]>
Reviewed-by: Paul M Stillwell Jr <[email protected]>
Reviewed-by: Aleksandr Loktionov <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ieee802154: atusb: fix uninit value in atusb_set_extended_addr

Alexander reported a use of uninitialized value in
atusb_set_extended_addr(), that is caused by reading 0 bytes via
usb_control_msg().

Fix it by validating if the number of bytes transferred is actually
correct, since usb_control_msg() may read less bytes, than was requested
by caller.

Fail log:

BUG: KASAN: uninit-cmp in ieee802154_is_valid_extended_unicast_addr include/linux/ieee802154.h:310 [inline]
BUG: KASAN: uninit-cmp in atusb_set_extended_addr drivers/net/ieee802154/atusb.c:1000 [inline]
BUG: KASAN: uninit-cmp in atusb_probe.cold+0x29f/0x14db drivers/net/ieee802154/atusb.c:1056
Uninit value used in comparison: 311daa649a2003bd stack handle: 000000009a2003bd
ieee802154_is_valid_extended_unicast_addr include/linux/ieee802154.h:310 [inline]
atusb_set_extended_addr drivers/net/ieee802154/atusb.c:1000 [inline]
atusb_probe.cold+0x29f/0x14db drivers/net/ieee802154/atusb.c:1056
usb_probe_interface+0x314/0x7f0 drivers/usb/core/driver.c:396

Fixes: 7490b008d123 ("ieee802154: add support for atusb transceiver")
Reported-by: Alexander Potapenko <[email protected]>
Acked-by: Alexander Aring <[email protected]>
Signed-off-by: Pavel Skripkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stefan Schmidt <[email protected]>

EDAC/i10nm: Release mdev/mbase when failing to detect HBM

On systems without HBM (High Bandwidth Memory) mdev/mbase are not
released/unmapped.

Add the code to release mdev/mbase when failing to detect HBM.

[Tony: re-word commit message]

Cc: <[email protected]>
Fixes: c945088384d0 ("EDAC/i10nm: Add support for high bandwidth memory")
Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'mac80211-for-net-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Two more changes:
- mac80211: initialize a variable to avoid using it uninitialized
- mac80211 mesh: put some data structures into the container to
   fix bugs with and not have to deal with allocation failures

* tag 'mac80211-for-net-2022-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
  mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh
  mac80211: initialize variable have_higher_than_11mbit
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

arm64: perf: Don't register user access sysctl handler multiple times

Commit e2012600810c ("arm64: perf: Add userspace counter access disable
switch") introduced a new 'perf_user_access' sysctl file to enable and
disable direct userspace access to the PMU counters. Sadly, Geert
reports that on his big.LITTLE SoC ('Renesas Salvator-XS w/ R-Car H3'),
the file is created for each PMU type probed, resulting in a splat
during boot:

  | hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
  | sysctl duplicate entry: /kernel//perf_user_access
  | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc3-arm64-renesas-00003-ge2012600810c #1420
  | Hardware name: Renesas Salvator-X 2nd version board based on r8a77951 (DT)
  | Call trace:
  |  dump_backtrace+0x0/0x190
  |  show_stack+0x14/0x20
  |  dump_stack_lvl+0x88/0xb0
  |  dump_stack+0x14/0x2c
  |  __register_sysctl_table+0x384/0x818
  |  register_sysctl+0x20/0x28
  |  armv8_pmu_init.constprop.0+0x118/0x150
  |  armv8_a57_pmu_init+0x1c/0x28
  |  arm_pmu_device_probe+0x1b4/0x558
  |  armv8_pmu_device_probe+0x18/0x20
  |  platform_probe+0x64/0xd0
  |  hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available

Introduce a state variable to track creation of the sysctl file and
ensure that it is only created once.

Reported-by: Geert Uytterhoeven <[email protected]>
Fixes: e2012600810c ("arm64: perf: Add userspace counter access disable switch")
Link: https://lore.kernel.org/r/CAMuHMdVcDxR9sGzc5pcnORiotonERBgc6dsXZXMd6wTvLGA9iw@mail.gmail.com
Signed-off-by: Will Deacon <[email protected]>

RDMA/rxe: Prevent double freeing rxe_map_set()

The same rxe_map_set could be freed twice:

rxe_reg_user_mr()
  -> rxe_mr_init_user()
    -> rxe_mr_free_map_set() # 1st

  -> rxe_drop_ref()
   ...
    -> rxe_mr_cleanup()
      -> rxe_mr_free_map_set() # 2nd

Follow normal convection and put resource cleanup either in the error
unwind of the allocator, or the overall free function. Leave the object
unchanged with a NULL cur_map_set on failure and remove the unncessary
free in rxe_mr_init_user().

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Li Zhijian <[email protected]>
Acked-by: Zhu Yanjun <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh

Syzbot hit NULL deref in rhashtable_free_and_destroy(). The problem was
in mesh_paths and mpp_paths being NULL.

mesh_pathtbl_init() could fail in case of memory allocation failure, but
nobody cared, since ieee80211_mesh_init_sdata() returns void. It led to
leaving 2 pointers as NULL. Syzbot has found null deref on exit path,
but it could happen anywhere else, because code assumes these pointers are
valid.

Since all ieee80211_*_setup_sdata functions are void and do not fail,
let's embedd mesh_paths and mpp_paths into parent struct to avoid
adding error handling on higher levels and follow the pattern of others
setup_sdata functions

Fixes: 60854fd94573 ("mac80211: mesh: convert path table to rhashtable")
Reported-and-tested-by: [email protected]
Signed-off-by: Pavel Skripkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

mac80211: initialize variable have_higher_than_11mbit

Clang static analysis reports this warnings

mlme.c:5332:7: warning: Branch condition evaluates to a
  garbage value
    have_higher_than_11mbit)
    ^~~~~~~~~~~~~~~~~~~~~~~

have_higher_than_11mbit is only set to true some of the time in
ieee80211_get_rates() but is checked all of the time.  So
have_higher_than_11mbit needs to be initialized to false.

Fixes: 5d6a1b069b7f ("mac80211: set basic rates earlier")
Signed-off-by: Tom Rix <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check

The devm_ioremap() function does not return error pointers. It returns
NULL.

Fixes: 036a7584bede ("drivers: perf: Add LLC-TAD perf counter support")
Signed-off-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/r/20211217145907.GA16611@kili
Signed-off-by: Will Deacon <[email protected]>

perf/smmuv3: Fix unused variable warning when CONFIG_OF=n

The kbuild robot reports that building the SMMUv3 PMU driver with
CONFIG_OF=n results in a warning for W=1 builds:

>> drivers/perf/arm_smmuv3_pmu.c:889:34: warning: unused variable 'smmu_pmu_of_match' [-Wunused-const-variable]
static const struct of_device_id smmu_pmu_of_match[] = {
^

Guard the match table with #ifdef CONFIG_OF.

Link: https://lore.kernel.org/r/[email protected]
Fixes: 3f7be4356176 ("perf/smmuv3: Add devicetree support")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc

tx_queue_len can be set to ~0U, we need to be more
careful about overflows.

__fls(0) is undefined, as this report shows:

UBSAN: shift-out-of-bounds in net/sched/sch_qfq.c:1430:24
shift exponent 51770272 is too large for 32-bit type 'int'
CPU: 0 PID: 25574 Comm: syz-executor.0 Not tainted 5.16.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x201/0x2d8 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:151 [inline]
__ubsan_handle_shift_out_of_bounds+0x494/0x530 lib/ubsan.c:330
qfq_init_qdisc+0x43f/0x450 net/sched/sch_qfq.c:1430
qdisc_create+0x895/0x1430 net/sched/sch_api.c:1253
tc_modify_qdisc+0x9d9/0x1e20 net/sched/sch_api.c:1660
rtnetlink_rcv_msg+0x934/0xe60 net/core/rtnetlink.c:5571
netlink_rcv_skb+0x200/0x470 net/netlink/af_netlink.c:2496
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x814/0x9f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0xaea/0xe60 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg net/socket.c:724 [inline]
____sys_sendmsg+0x5b9/0x910 net/socket.c:2409
___sys_sendmsg net/socket.c:2463 [inline]
__sys_sendmsg+0x280/0x370 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netrom: fix copying in user data in nr_setsockopt

This code used to copy in an unsigned long worth of data before
the sockptr_t conversion, so restore that.

Fixes: a7b75c5a8c41 ("net: pass a sockptr_t into ->setsockopt")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'srv6-traceroute'

Andrew Lunn says:

====================
Fix traceroute in the presence of SRv6

When using SRv6 the destination IP address in the IPv6 header is not
always the true destination, it can be a router along the path that
SRv6 is using.

When ICMP reports an error, e.g, time exceeded, which is what
traceroute uses, it included the packet which invoked the error into
the ICMP message body. Upon receiving such an ICMP packet, the
invoking packet is examined and an attempt is made to find the socket
which sent the packet, so the error can be reported. Lookup is
performed using the source and destination address. If the
intermediary router IP address from the IP header is used, the lookup
fails. It is necessary to dig into the header and find the true
destination address in the Segment Router header, SRH.

v2:
Play games with the skb->network_header rather than clone the skb
v3:
Move helpers into seg6.c
v4:
Move short helper into header file.
Rework getting SRH destination address
v5:
Fix comment to describe function, not caller

Patch 1 exports a helper which can find the SRH in a packet
Patch 2 does the actual examination of the invoking packet
Patch 3 makes use of the results when trying to find the socket.
====================

Signed-off-by: David S. Miller <[email protected]>

udp6: Use Segment Routing Header for dest address if present

When finding the socket to report an error on, if the invoking packet
is using Segment Routing, the IPv6 destination address is that of an
intermediate router, not the end destination. Extract the ultimate
destination address from the segment address.

This change allows traceroute to function in the presence of Segment
Routing.

Signed-off-by: Andrew Lunn <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

icmp: ICMPV6: Examine invoking packet for Segment Route Headers.

RFC8754 says:

ICMP error packets generated within the SR domain are sent to source
nodes within the SR domain.  The invoking packet in the ICMP error
message may contain an SRH.  Since the destination address of a packet
with an SRH changes as each segment is processed, it may not be the
destination used by the socket or application that generated the
invoking packet.

For the source of an invoking packet to process the ICMP error
message, the ultimate destination address of the IPv6 header may be
required.  The following logic is used to determine the destination
address for use by protocol-error handlers.

*  Walk all extension headers of the invoking IPv6 packet to the
   routing extension header preceding the upper-layer header.

   -  If routing header is type 4 Segment Routing Header (SRH)

      o  The SID at Segment List[0] may be used as the destination
         address of the invoking packet.

Mangle the skb so the network header points to the invoking packet
inside the ICMP packet. The seg6 helpers can then be used on the skb
to find any segment routing headers. If found, mark this fact in the
IPv6 control block of the skb, and store the offset into the packet of
the SRH. Then restore the skb back to its old state.

Signed-off-by: Andrew Lunn <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

seg6: export get_srh() for ICMP handling

An ICMP error message can contain in its message body part of an IPv6
packet which invoked the error. Such a packet might contain a segment
router header. Export get_srh() so the ICMP code can make use of it.

Since his changes the scope of the function from local to global, add
the seg6_ prefix to keep the namespace clean. And move it into seg6.c
so it is always available, not just when IPV6_SEG6_LWTUNNEL is
enabled.

Signed-off-by: Andrew Lunn <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-5.16

Pull MD fix from Song, fixing a raid1 regression with missing bitmap
updates.

* 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md/raid1: fix missing bitmap update w/o WriteMostly devices

Merge tag 'batadv-net-pullrequest-20220103' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here is a batman-adv bugfix:

- avoid sending link-local multicast to multicast routers,
by Linus Lüssing

* tag 'batadv-net-pullrequest-20220103' of git://git.open-mesh.org/linux-merge:
batman-adv: mcast: don't send link-local multicast to mcast routers
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Revert "net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register"

This reverts commit b45396afa4177f2b1ddfeff7185da733fade1dc3 ("net: phy:
fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register")
since it prevents any system that uses a fixed PHY without a GPIO
descriptor from properly working:

[    5.971952] brcm-systemport 9300000.ethernet: failed to register fixed PHY
[    5.978854] brcm-systemport: probe of 9300000.ethernet failed with error -22
[    5.986047] brcm-systemport 9400000.ethernet: failed to register fixed PHY
[    5.992947] brcm-systemport: probe of 9400000.ethernet failed with error -22

Fixes: b45396afa417 ("net: phy: fixed_phy: Fix NULL vs IS_ERR() checking in __fixed_phy_register")
Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

md/raid1: fix missing bitmap update w/o WriteMostly devices

commit [1] causes missing bitmap updates when there isn't any WriteMostly
devices.

Detailed steps to reproduce by Norbert (which somehow didn't make to lore):

   # setup md10 (raid1) with two drives (1 GByte sparse files)
   dd if=/dev/zero of=disk1 bs=1024k seek=1024 count=0
   dd if=/dev/zero of=disk2 bs=1024k seek=1024 count=0

   losetup /dev/loop11 disk1
   losetup /dev/loop12 disk2

   mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/loop11 /dev/loop12

   # add bitmap (aka write-intent log)
   mdadm /dev/md10 --grow --bitmap=internal

   echo check > /sys/block/md10/md/sync_action

   root:# cat /sys/block/md10/md/mismatch_cnt
   0
   root:#

   # remove member drive disk2 (loop12)
   mdadm /dev/md10 -f loop12 ; mdadm /dev/md10 -r loop12

   # modify degraded md device
   dd if=/dev/urandom of=/dev/md10 bs=512 count=1

   # no blocks recorded as out of sync on the remaining member disk1/loop11
   root:# mdadm -X /dev/loop11 | grep Bitmap
             Bitmap : 16 bits (chunks), 0 dirty (0.0%)
   root:#

   # re-add disk2, nothing synced because of empty bitmap
   mdadm /dev/md10 --re-add /dev/loop12

   # check integrity again
   echo check > /sys/block/md10/md/sync_action

   # disk1 and disk2 are no longer in sync, reads return differend data
   root:# cat /sys/block/md10/md/mismatch_cnt
   128
   root:#

   # clean up
   mdadm -S /dev/md10
   losetup -d /dev/loop11
   losetup -d /dev/loop12
   rm disk1 disk2

Fix this by moving the WriteMostly check to the if condition for
alloc_behind_master_bio().

[1] commit fd3b6975e9c1 ("md/raid1: only allocate write behind bio for WriteMostly device")
Fixes: fd3b6975e9c1 ("md/raid1: only allocate write behind bio for WriteMostly device")
Cc: [email protected] # v5.12+
Cc: Guoqing Jiang <[email protected]>
Cc: Jens Axboe <[email protected]>
Reported-by: Norbert Warmuth <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Song Liu <[email protected]>

ipv6: Do cleanup if attribute validation fails in multipath route

As Nicolas noted, if gateway validation fails walking the multipath
attribute the code should jump to the cleanup to free previously
allocated memory.

Fixes: 1ff15a710a86 ("ipv6: Check attribute length for RTA_GATEWAY when deleting multipath route")
Signed-off-by: David Ahern <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ipv6: Continue processing multipath route even if gateway attribute is invalid

ip6_route_multipath_del loop continues processing the multipath
attribute even if delete of a nexthop path fails. For consistency,
do the same if the gateway attribute is invalid.

Fixes: 1ff15a710a86 ("ipv6: Check attribute length for RTA_GATEWAY when deleting multipath route")
Signed-off-by: David Ahern <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: update gpio-brcmstb maintainers

Add Doug and Florian as maintainers for gpio-brcmstb, and remove myself.

Signed-off-by: Gregory Fong <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

gpio: gpio-aspeed-sgpio: Fix wrong hwirq base in irq handler

Each aspeed sgpio bank has 64 gpio pins(32 input pins and 32 output pins).
The hwirq base for each sgpio bank should be multiples of 64 rather than
multiples of 32.

Signed-off-by: Steven Lee <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

Linux 5.16-rc8

Merge tag 'perf-tools-fixes-for-v5.16-2022-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

- Fix TUI exit screen refresh race condition in 'perf top'.

- Fix parsing of Intel PT VM time correlation arguments.

- Honour CPU filtering command line request of a script's switch events
   in 'perf script'.

- Fix printing of switch events in Intel PT python script.

- Fix duplicate alias events list printing in 'perf list', noticed on
   heterogeneous arm64 systems.

- Fix return value of ids__new(), users expect NULL for failure, not
   ERR_PTR(-ENOMEM).

* tag 'perf-tools-fixes-for-v5.16-2022-01-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf top: Fix TUI exit screen refresh race condition
  perf pmu: Fix alias events list
  perf scripts python: intel-pt-events.py: Fix printing of switch events
  perf script: Fix CPU filtering of a script's switch events
  perf intel-pt: Fix parsing of VM time correlation arguments
  perf expr: Fix return value of ids__new()

net/fsl: Remove leftover definition in xgmac_mdio

commit 26eee0210ad7 ("net/fsl: fix a bug in xgmac_mdio") fixed a bug in
the QorIQ mdio driver but left the (now unused) incorrect bit definition
for MDIO_DATA_BSY in the code. This commit removes it.

Signed-off-by: Markus Koch <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Better input validation for compat ioctls and a documentation bugfix
  for 5.16"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  Docs: Fixes link to I2C specification
  i2c: validate user data in compat ioctl

Merge tag 'x86_urgent_for_v5.16_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:

- Use the proper CONFIG symbol in a preprocessor check.

* tag 'x86_urgent_for_v5.16_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Use the proper name CONFIG_FW_LOADER

rndis_host: support Hytera digital radios

Hytera makes a range of digital (DMR) radios. These radios can be
programmed to a allow a computer to control them over Ethernet over USB,
either using NCM or RNDIS.

This commit adds support for RNDIS for Hytera radios. I tested with a
Hytera PD785 and a Hytera MD785G. When these radios are programmed to
set up a Radio to PC Network using RNDIS, an USB interface will be added
with class 2 (Communications), subclass 2 (Abstract Modem Control) and
an interface protocol of 255 ("vendor specific" - lsusb even hints "MSFT
RNDIS?").

This patch is similar to the solution of this StackOverflow user, but
that only works for the Hytera MD785:
https://stackoverflow.com/a/53550858

To use the "Radio to PC Network" functionality of Hytera DMR radios, the
radios need to be programmed correctly in CPS (Hytera's Customer
Programming Software). "Forward to PC" should be checked in "Network"
(under "General Setting" in "Conventional") and the "USB Network
Communication Protocol" should be set to RNDIS.

Signed-off-by: Thomas Toye <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

perf top: Fix TUI exit screen refresh race condition

When the following command is executed several times, a coredump file is
generated.

$ timeout -k 9 5 perf top -e task-clock
*******
*******
*******
0.01%  [kernel]                  [k] __do_softirq
0.01%  libpthread-2.28.so        [.] __pthread_mutex_lock
0.01%  [kernel]                  [k] __ll_sc_atomic64_sub_return
double free or corruption (!prev) perf top --sort comm,dso
timeout: the monitored command dumped core

When we terminate "perf top" using sending signal method,
SLsmg_reset_smg() called. SLsmg_reset_smg() resets the SLsmg screen
management routines by freeing all memory allocated while it was active.

However SLsmg_reinit_smg() maybe be called by another thread.

SLsmg_reinit_smg() will free the same memory accessed by
SLsmg_reset_smg(), thus it results in a double free.

SLsmg_reinit_smg() is called already protected by ui__lock, so we fix
the problem by adding pthread_mutex_trylock of ui__lock when calling
SLsmg_reset_smg().

Signed-off-by: Wenyu Liu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Hewenliang <[email protected]>
Signed-off-by: yaowenbin <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf pmu: Fix alias events list

Commit 0e0ae8742207c3b4 ("perf list: Display hybrid PMU events with cpu
type") changes the event list for uncore PMUs or arm64 heterogeneous CPU
systems, such that duplicate aliases are incorrectly listed per PMU
(which they should not be), like:

  # perf list
  ...
  unc_cbo_cache_lookup.any_es
  [Unit: uncore_cbox L3 Lookup any request that access cache and found
  line in E or S-state]
  unc_cbo_cache_lookup.any_es
  [Unit: uncore_cbox L3 Lookup any request that access cache and found
  line in E or S-state]
  unc_cbo_cache_lookup.any_i
  [Unit: uncore_cbox L3 Lookup any request that access cache and found
  line in I-state]
  unc_cbo_cache_lookup.any_i
  [Unit: uncore_cbox L3 Lookup any request that access cache and found
  line in I-state]
  ...

Notice how the events are listed twice.

The named commit changed how we remove duplicate events, in that events
for different PMUs are not treated as duplicates. I suppose this is to
handle how "Each hybrid pmu event has been assigned with a pmu name".

Fix PMU alias listing by restoring behaviour to remove duplicates for
non-hybrid PMUs.

Fixes: 0e0ae8742207c3b4 ("perf list: Display hybrid PMU events with cpu type")
Signed-off-by: John Garry <[email protected]>
Tested-by: Zhengjun Xing <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

sctp: hold endpoint before calling cb in sctp_transport_lookup_process

The same fix in commit 5ec7d18d1813 ("sctp: use call_rcu to free endpoint")
is also needed for dumping one asoc and sock after the lookup.

Fixes: 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the dump")
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ena-fixes'

Arthur Kiyanovski says:

====================
ENA driver bug fixes

Patchset V2 chages:
-------------------
Updated SHA1 of Fixes tag in patch 3/3 to be 12 digits long

Original cover letter:
----------------------
ENA driver bug fixes
====================

Signed-off-by: David S. Miller <[email protected]>

net: ena: Fix error handling when calculating max IO queues number

The role of ena_calc_max_io_queue_num() is to return the number
of queues supported by the device, which means the return value
should be >=0.

The function that calls ena_calc_max_io_queue_num(), checks
the return value. If it is 0, it means the device reported
it supports 0 IO queues. This case is considered an error
and is handled by the calling function accordingly.

However the current implementation of ena_calc_max_io_queue_num()
is wrong, since when it detects the device supports 0 IO queues,
it returns -EFAULT.

In such a case the calling function doesn't detect the error,
and therefore doesn't handle it.

This commit changes ena_calc_max_io_queue_num() to return 0
in case the device reported it supports 0 queues, allowing the
calling function to properly handle the error case.

Fixes: 736ce3f414cc ("net: ena: make ethtool -l show correct max number of queues")
Signed-off-by: Shay Agroskin <[email protected]>
Signed-off-by: Arthur Kiyanovski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ena: Fix wrong rx request id by resetting device

A wrong request id received from the device is a sign that
something is wrong with it, therefore trigger a device reset.

Also add some debug info to the "Page is NULL" print to make
it easier to debug.

Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ena: Fix undefined state when tx request id is out of bounds

ena_com_tx_comp_req_id_get() checks the req_id of a received completion,
and if it is out of bounds returns -EINVAL. This is a sign that
something is wrong with the device and it needs to be reset.

The current code does not reset the device in this case, which leaves
the driver in an undefined state, where this completion is not properly
handled.

This commit adds a call to handle_invalid_req_id() in ena_clean_tx_irq()
and ena_clean_xdp_irq() which resets the device to fix the issue.

This commit also removes unnecessary request id checks from
validate_tx_req_id() and validate_xdp_req_id(). This check is unneeded
because it was already performed in ena_com_tx_comp_req_id_get(), which
is called right before these functions.

Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
Signed-off-by: Shay Agroskin <[email protected]>
Signed-off-by: Arthur Kiyanovski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mctp: Remove only static neighbour on RTM_DELNEIGH

Add neighbour source flag in mctp_neigh_remove(...) to allow removal of
only static neighbours.

This should be a no-op change and might be useful later when mctp can
have MCTP_NEIGH_DISCOVER neighbours.

Signed-off-by: Gagan Kumar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

batman-adv: mcast: don't send link-local multicast to mcast routers

The addition of routable multicast TX handling introduced a
bug/regression for packets with a link-local multicast destination:
These packets would be sent to all batman-adv nodes with a multicast
router and to all batman-adv nodes with an old version without multicast
router detection.

This even disregards the batman-adv multicast fanout setting, which can
potentially lead to an unwanted, high number of unicast transmissions or
even congestion.

Fixing this by avoiding to send link-local multicast packets to nodes in
the multicast router list.

Fixes: 11d458c1cb9b ("batman-adv: mcast: apply optimizations for routable packets, too")
Signed-off-by: Linus Lüssing <[email protected]>
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Simon Wunderlich <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:
"Two small fixups for spaceball joystick driver and appletouch touchpad
  driver"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: spaceball - fix parsing of movement data packets
  Input: appletouch - initialize work before device registration

net ticp:fix a kernel-infoleak in __tipc_sendmsg()

struct tipc_socket_addr.ref has a 4-byte hole,and __tipc_getname() currently
copying it to user space,causing kernel-infoleak.

BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline]
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] lib/usercopy.c:33
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 lib/usercopy.c:33
instrument_copy_to_user include/linux/instrumented.h:121 [inline]
instrument_copy_to_user include/linux/instrumented.h:121 [inline] lib/usercopy.c:33
_copy_to_user+0x1c9/0x270 lib/usercopy.c:33 lib/usercopy.c:33
copy_to_user include/linux/uaccess.h:209 [inline]
copy_to_user include/linux/uaccess.h:209 [inline] net/socket.c:287
move_addr_to_user+0x3f6/0x600 net/socket.c:287 net/socket.c:287
__sys_getpeername+0x470/0x6b0 net/socket.c:1987 net/socket.c:1987
__do_sys_getpeername net/socket.c:1997 [inline]
__se_sys_getpeername net/socket.c:1994 [inline]
__do_sys_getpeername net/socket.c:1997 [inline] net/socket.c:1994
__se_sys_getpeername net/socket.c:1994 [inline] net/socket.c:1994
__x64_sys_getpeername+0xda/0x120 net/socket.c:1994 net/socket.c:1994
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_x64 arch/x86/entry/common.c:51 [inline] arch/x86/entry/common.c:82
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x44/0xae

Uninit was stored to memory at:
tipc_getname+0x575/0x5e0 net/tipc/socket.c:757 net/tipc/socket.c:757
__sys_getpeername+0x3b3/0x6b0 net/socket.c:1984 net/socket.c:1984
__do_sys_getpeername net/socket.c:1997 [inline]
__se_sys_getpeername net/socket.c:1994 [inline]
__do_sys_getpeername net/socket.c:1997 [inline] net/socket.c:1994
__se_sys_getpeername net/socket.c:1994 [inline] net/socket.c:1994
__x64_sys_getpeername+0xda/0x120 net/socket.c:1994 net/socket.c:1994
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_x64 arch/x86/entry/common.c:51 [inline] arch/x86/entry/common.c:82
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x44/0xae

Uninit was stored to memory at:
msg_set_word net/tipc/msg.h:212 [inline]
msg_set_destport net/tipc/msg.h:619 [inline]
msg_set_word net/tipc/msg.h:212 [inline] net/tipc/socket.c:1486
msg_set_destport net/tipc/msg.h:619 [inline] net/tipc/socket.c:1486
__tipc_sendmsg+0x44fa/0x5890 net/tipc/socket.c:1486 net/tipc/socket.c:1486
tipc_sendmsg+0xeb/0x140 net/tipc/socket.c:1402 net/tipc/socket.c:1402
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg net/socket.c:724 [inline]
sock_sendmsg_nosec net/socket.c:704 [inline] net/socket.c:2409
sock_sendmsg net/socket.c:724 [inline] net/socket.c:2409
____sys_sendmsg+0xe11/0x12c0 net/socket.c:2409 net/socket.c:2409
___sys_sendmsg net/socket.c:2463 [inline]
___sys_sendmsg net/socket.c:2463 [inline] net/socket.c:2492
__sys_sendmsg+0x704/0x840 net/socket.c:2492 net/socket.c:2492
__do_sys_sendmsg net/socket.c:2501 [inline]
__se_sys_sendmsg net/socket.c:2499 [inline]
__do_sys_sendmsg net/socket.c:2501 [inline] net/socket.c:2499
__se_sys_sendmsg net/socket.c:2499 [inline] net/socket.c:2499
__x64_sys_sendmsg+0xe2/0x120 net/socket.c:2499 net/socket.c:2499
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_x64 arch/x86/entry/common.c:51 [inline] arch/x86/entry/common.c:82
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x44/0xae

Local variable skaddr created at:
__tipc_sendmsg+0x2d0/0x5890 net/tipc/socket.c:1419 net/tipc/socket.c:1419
tipc_sendmsg+0xeb/0x140 net/tipc/socket.c:1402 net/tipc/socket.c:1402

Bytes 4-7 of 16 are uninitialized
Memory access of size 16 starts at ffff888113753e00
Data copied to user address 0000000020000280

Reported-by: [email protected]
Signed-off-by: Haimin Zhang <[email protected]>
Acked-by: Jon Maloy <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: net: udpgro_fwd.sh: explicitly checking the available ping feature

As Paolo pointed out, the result of ping IPv6 address depends on
the running distro. So explicitly checking the available ping feature,
as e.g. do the bareudp.sh self-tests.

Fixes: 8b3170e07539 ("selftests: net: using ping6 for IPv6 in udpgro_fwd.sh")
Signed-off-by: Jianguo Wu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2021-12-31

We've added 2 non-merge commits during the last 14 day(s) which contain
a total of 2 files changed, 3 insertions(+), 3 deletions(-).

The main changes are:

1) Revert of an earlier attempt to fix xsk's poll() behavior where it
   turned out that the fix for a rare problem made it much worse in
   general, from Magnus Karlsson. (Fyi, Magnus mentioned that a proper
   fix is coming early next year, so the revert is mainly to avoid
   slipping the behavior into 5.16.)

2) Minor misc spell fix in BPF selftests, from Colin Ian King.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf, selftests: Fix spelling mistake "tained" -> "tainted"
  Revert "xsk: Do not sleep in poll() when need_wakeup set"
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mm: vmscan: reduce throttling due to a failure to make progress -fix

Hugh Dickins reported the following

My tmpfs swapping load (tweaked to use huge pages more heavily
than in real life) is far from being a realistic load: but it was
notably slowed down by your throttling mods in 5.16-rc, and this
patch makes it well again - thanks.

But: it very quickly hit NULL pointer until I changed that last
line to

        if (first_pgdat)
                consider_reclaim_throttle(first_pgdat, sc);

The likely issue is that huge pages are a major component of the test
workload.  When this is the case, first_pgdat may never get set if
compaction is ready to continue due to this check

        if (IS_ENABLED(CONFIG_COMPACTION) &&
            sc->order > PAGE_ALLOC_COSTLY_ORDER &&
            compaction_ready(zone, sc)) {
                sc->compaction_ready = true;
                continue;
        }

If this was true for every zone in the zonelist, first_pgdat would never
get set resulting in a NULL pointer exception.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1b4e3f26f9f75 ("mm: vmscan: Reduce throttling due to a failure to make progress")
Signed-off-by: Mel Gorman <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: vmscan: Reduce throttling due to a failure to make progress

Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
problems due to reclaim throttling for excessive lengths of time.  In
Alexey's case, a memory hog that should go OOM quickly stalls for
several minutes before stalling.  In Mike and Darrick's cases, a small
memcg environment stalled excessively even though the system had enough
memory overall.

Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is
being made") introduced the problem although commit a19594ca4a8b
("mm/vmscan: increase the timeout if page reclaim is not making
progress") made it worse.  Systems at or near an OOM state that cannot
be recovered must reach OOM quickly and memcg should kill tasks if a
memcg is near OOM.

To address this, only stall for the first zone in the zonelist, reduce
the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if
the scan control nr_reclaimed is 0, kswapd is still active and there
were excessive pages pending for writeback.  If kswapd has stopped
reclaiming due to excessive failures, do not stall at all so that OOM
triggers relatively quickly.  Similarly, if an LRU is simply congested,
only lightly throttle similar to NOPROGRESS.

Alexey's original case was the most straight forward

for i in {1..3}; do tail /dev/zero; done

On vanilla 5.16-rc1, this test stalled heavily, after the patch the test
completes in a few seconds similar to 5.15.

Alexey's second test case added watching a youtube video while tail runs
10 times.  On 5.15, playback only jitters slightly, 5.16-rc1 stalls a
lot with lots of frames missing and numerous audio glitches.  With this
patch applies, the video plays similarly to 5.15.

[[email protected]: Fix W=1 build warning]

Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://linux-regtracking.leemhuis.info/regzbot/regression/[email protected]/
Reported-and-tested-by: Alexey Avramov <[email protected]>
Reported-and-tested-by: Mike Galbraith <[email protected]>
Reported-and-tested-by: Darrick J. Wong <[email protected]>
Reported-by: kernel test robot <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Tracked-by: Thorsten Leemhuis <[email protected]>
Fixes: 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being made")
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge misc mm fixes from Andrew Morton:
"2 patches.

  Subsystems affected by this patch series: mm (userfaultfd and damon)"

* akpm:
  mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'
  userfaultfd/selftests: fix hugetlb area allocations

x86/mce: Reduce number of machine checks taken during recovery

When any of the copy functions in arch/x86/lib/copy_user_64.S take a
fault, the fixup code copies the remaining byte count from %ecx to %edx
and unconditionally jumps to .Lcopy_user_handle_tail to continue the
copy in case any more bytes can be copied.

If the fault was #PF this may copy more bytes (because the page fault
handler might have fixed the fault). But when the fault is a machine
check the original copy code will have copied all the way to the poisoned
cache line. So .Lcopy_user_handle_tail will just take another machine
check for no good reason.

Every code path to .Lcopy_user_handle_tail comes from an exception fixup
path, so add a check there to check the trap type (in %eax) and simply
return the count of remaining bytes if the trap was a machine check.

Doing this reduces the number of machine checks taken during synthetic
tests from four to three.

As well as reducing the number of machine checks, this also allows
Skylake generation Xeons to recover some cases that currently fail. The
is because REP; MOVSB is only recoverable when source and destination
are well aligned and the byte count is large. That useless call to
.Lcopy_user_handle_tail may violate one or more of these conditions and
generate a fatal machine check.

  [ Tony: Add more details to commit message. ]
  [ bp: Fixup comment.
    Also, another tip patchset which is adding straight-line speculation
    mitigation changes the "ret" instruction to an all-caps macro "RET".
    But, since gas is case-insensitive, use "RET" in the newly added asm block
    already in order to simplify tip branch merging on its way upstream.
  ]

Signed-off-by: Youquan Song <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Three fixes, all in drivers. The lpfc one doesn't look exploitable,
  but nasty things could happen in string operations if mybuf ends up
  with an on stack unterminated string"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: vmw_pvscsi: Set residual data length conditionally
  scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()
  scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()

mm/damon/dbgfs: fix 'struct pid' leaks in 'dbgfs_target_ids_write()'

DAMON debugfs interface increases the reference counts of 'struct pid's
for targets from the 'target_ids' file write callback
('dbgfs_target_ids_write()'), but decreases the counts only in DAMON
monitoring termination callback ('dbgfs_before_terminate()').

Therefore, when 'target_ids' file is repeatedly written without DAMON
monitoring start/termination, the reference count is not decreased and
therefore memory for the 'struct pid' cannot be freed. This commit
fixes this issue by decreasing the reference counts when 'target_ids' is
written.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4bc05954d007 ("mm/damon: implement a debugfs-based user space interface")
Signed-off-by: SeongJae Park <[email protected]>
Cc: <[email protected]> [5.15+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

userfaultfd/selftests: fix hugetlb area allocations

Currently, userfaultfd selftest for hugetlb as run from run_vmtests.sh
or any environment where there are 'just enough' hugetlb pages will
always fail with:

  testing events (fork, remap, remove):
ERROR: UFFDIO_COPY error: -12 (errno=12, line=616)

The ENOMEM error code implies there are not enough hugetlb pages.
However, there are free hugetlb pages but they are all reserved.  There
is a basic problem with the way the test allocates hugetlb pages which
has existed since the test was originally written.

Due to the way 'cleanup' was done between different phases of the test,
this issue was masked until recently.  The issue was uncovered by commit
8ba6e8640844 ("userfaultfd/selftests: reinitialize test context in each
test").

For the hugetlb test, src and dst areas are allocated as PRIVATE
mappings of a hugetlb file.  This means that at mmap time, pages are
reserved for the src and dst areas.  At the start of event testing (and
other tests) the src area is populated which results in allocation of
huge pages to fill the area and consumption of reserves associated with
the area.  Then, a child is forked to fault in the dst area.  Note that
the dst area was allocated in the parent and hence the parent owns the
reserves associated with the mapping.  The child has normal access to
the dst area, but can not use the reserves created/owned by the parent.
Thus, if there are no other huge pages available allocation of a page
for the dst by the child will fail.

Fix by not creating reserves for the dst area.  In this way the child
can use free (non-reserved) pages.

Also, MAP_PRIVATE of a file only makes sense if you are interested in
the contents of the file before making a COW copy.  The test does not do
this.  So, just use MAP_ANONYMOUS | MAP_HUGETLB to create an anonymous
hugetlb mapping.  There is no need to create a hugetlb file in the
non-shared case.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Kravetz <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Mina Almasry <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'mpr-len-checks'
David Ahern says:

====================
net: Length checks for attributes within multipath routes

Add length checks for attributes within a multipath route (attributes
within RTA_MULTIPATH). Motivated by the syzbot report in patch 1 and
then expanded to other attributes as noted by Ido.
====================

Signed-off-by: David S. Miller <[email protected]>

lwtunnel: Validate RTA_ENCAP_TYPE attribute length

lwtunnel_valid_encap_type_attr is used to validate encap attributes
within a multipath route. Add length validation checking to the type.

lwtunnel_valid_encap_type_attr is called converting attributes to
fib{6,}_config struct which means it is used before fib_get_nhs,
ip6_route_multipath_add, and ip6_route_multipath_del - other
locations that use rtnh_ok and then nla_get_u16 on RTA_ENCAP_TYPE
attribute.

Fixes: 9ed59592e3e3 ("lwtunnel: fix autoload of lwt modules")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: Check attribute length for RTA_GATEWAY when deleting multipath route

Make sure RTA_GATEWAY for IPv6 multipath route has enough bytes to hold
an IPv6 address.

Fixes: 6b9ea5a64ed5 ("ipv6: fix multipath route replace error recovery")
Signed-off-by: David Ahern <[email protected]>
Cc: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: Check attribute length for RTA_GATEWAY in multipath route

Commit referenced in the Fixes tag used nla_memcpy for RTA_GATEWAY as
does the current nla_get_in6_addr. nla_memcpy protects against accessing
memory greater than what is in the attribute, but there is no check
requiring the attribute to have an IPv6 address. Add it.

Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
Signed-off-by: David Ahern <[email protected]>
Cc: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv4: Check attribute length for RTA_FLOW in multipath route

Make sure RTA_FLOW is at least 4B before using.

Fixes: 4e902c57417c ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv4: Check attribute length for RTA_GATEWAY in multipath route

syzbot reported uninit-value:
============================================================
  BUG: KMSAN: uninit-value in fib_get_nhs+0xac4/0x1f80
  net/ipv4/fib_semantics.c:708
   fib_get_nhs+0xac4/0x1f80 net/ipv4/fib_semantics.c:708
   fib_create_info+0x2411/0x4870 net/ipv4/fib_semantics.c:1453
   fib_table_insert+0x45c/0x3a10 net/ipv4/fib_trie.c:1224
   inet_rtm_newroute+0x289/0x420 net/ipv4/fib_frontend.c:886

Add helper to validate RTA_GATEWAY length before using the attribute.

Fixes: 4e902c57417c ("[IPv4]: FIB configuration using struct fib_config")
Reported-by: [email protected]
Signed-off-by: David Ahern <[email protected]>
Cc: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm/amdgpu: disable runpm if we are the primary adapter

If we are the primary adapter (i.e., the one used by the firwmare
framebuffer), disable runtime pm. This fixes a regression caused
by commit 55285e21f045 which results in the displays waking up
shortly after they go to sleep due to the device coming out of
runtime suspend and sending a hotplug uevent.

v2: squash in reworked fix from Evan

Fixes: 55285e21f045 ("fbdev/efifb: Release PCI device's runtime PM ref during FB destroy")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215203
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1840
Signed-off-by: Alex Deucher <[email protected]>

fbdev: fbmem: add a helper to determine if an aperture is used by a fw fb

Add a function for drivers to check if the a firmware initialized
fb is corresponds to their aperture. This allows drivers to check if the
device corresponds to what the firmware set up as the display device.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215203
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1840
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: keep the BACO feature enabled for suspend

To pair with the workaround which always reset the ASIC in suspend.
Otherwise, the reset which relies on BACO will fail.

Fixes: daf8de0874ab5b ("drm/amdgpu: always reset the asic in suspend (v2)")
Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Guchun Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

Docs: Fixes link to I2C specification

The link to the I2C specification is broken. Although
"https://www.nxp.com" hosts Rev 7 (2021) of this specification, it is
behind a login-wall. Thus, an additional link has been added (which
doesn't require a login) and the NXP official docs link has been
updated.

Signed-off-by: Deep Majumder <[email protected]>
[wsa: minor updates to text and commit message]
Signed-off-by: Wolfram Sang <[email protected]>

i2c: validate user data in compat ioctl

Wrong user data may cause warning in i2c_transfer(), ex: zero msgs.
Userspace should not be able to trigger warnings, so this patch adds
validation checks for user data in compact ioctl to prevent reported
warnings

Reported-and-tested-by: [email protected]
Fixes: 7d5cb45655f2 ("i2c compat ioctls: move to ->compat_ioctl()")
Signed-off-by: Pavel Skripkin <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>

Input: spaceball - fix parsing of movement data packets

The spaceball.c module was not properly parsing the movement reports
coming from the device. The code read axis data as signed 16-bit
little-endian values starting at offset 2.

In fact, axis data in Spaceball movement reports are signed 16-bit
big-endian values starting at offset 3. This was determined first by
visually inspecting the data packets, and later verified by consulting:
http://spacemice.org/pdf/SpaceBall_2003-3003_Protocol.pdf

If this ever worked properly, it was in the time before Git...

Signed-off-by: Leo L. Schwab <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>