Git Repo - linux.git/log

rv: Update rv_en(dis)able_monitor doc to match kernel-doc

The patch updates the function documentation comment for
rv_en(dis)able_monitor to adhere to the kernel-doc specification.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Fixes: 102227b970a15 ("rv: Add Runtime Verification (RV) interface")
Signed-off-by: Yang Li <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test

Fix the 'make W=1' warning:

WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/trace/preemptirq_delay_test.o

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Mathieu Desnoyers <[email protected]>
Fixes: f96e8577da10 ("lib: Add module for testing preemptoff/irqsoff latency tracers")
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Jeff Johnson <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

ring-buffer: Fix a race between readers and resize checks

The reader code in rb_get_reader_page() swaps a new reader page into the
ring buffer by doing cmpxchg on old->list.prev->next to point it to the
new page. Following that, if the operation is successful,
old->list.next->prev gets updated too. This means the underlying
doubly-linked list is temporarily inconsistent: for some page in the
ring buffer, page->prev->next or page->next->prev might not point back
to page.
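
As a rough userspace illustration of that window (not the kernel code),
the two-step swap can be modelled on a plain circular list. All names
below (struct node, swap_step1(), swap_step2(), ring_is_consistent())
are hypothetical, and the sketch ignores the cmpxchg and the flag bits
that the real ring buffer keeps in its list pointers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a ring-buffer page list node. */
struct node {
	struct node *next;
	struct node *prev;
};

/* Link n nodes into a circular doubly-linked list. */
static void ring_init(struct node *nodes, size_t n)
{
	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		nodes[i].next = &nodes[(i + 1) % n];
		nodes[i].prev = &nodes[(i + n - 1) % n];
	}
}

/* Walk the ring once and verify that both neighbours point back. */
static bool ring_is_consistent(const struct node *head)
{
	const struct node *p = head;

	do {
		if (p->next->prev != p || p->prev->next != p)
			return false;
		p = p->next;
	} while (p != head);
	return true;
}

/* Step 1 of the swap: old's predecessor now points at new_node. */
static void swap_step1(struct node *old, struct node *new_node)
{
	new_node->next = old->next;
	new_node->prev = old->prev;
	old->prev->next = new_node;
}

/* Step 2 of the swap: old's successor now points back at new_node. */
static void swap_step2(struct node *old, struct node *new_node)
{
	old->next->prev = new_node;
}
```

A consistency walk that runs between swap_step1() and swap_step2() sees
a successor whose prev pointer still refers to the old node, which is
the kind of mismatch the integrity check reports.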

The resize operation in ring_buffer_resize() can be invoked in parallel.
It calls rb_check_pages() which can detect the described inconsistency
and stop further tracing:

[  190.271762] ------------[ cut here ]------------
[  190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0
[  190.271789] Modules linked in: [...]
[  190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1
[  190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G            E      6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f
[  190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[  190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0
[  190.272023] Code: [...]
[  190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206
[  190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80
[  190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700
[  190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000
[  190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720
[  190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000
[  190.272053] FS:  00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000
[  190.272057] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0
[  190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  190.272077] Call Trace:
[  190.272098]  <TASK>
[  190.272189]  ring_buffer_resize+0x2ab/0x460
[  190.272199]  __tracing_resize_ring_buffer.part.0+0x23/0xa0
[  190.272206]  tracing_resize_ring_buffer+0x65/0x90
[  190.272216]  tracing_entries_write+0x74/0xc0
[  190.272225]  vfs_write+0xf5/0x420
[  190.272248]  ksys_write+0x67/0xe0
[  190.272256]  do_syscall_64+0x82/0x170
[  190.272363]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  190.272373] RIP: 0033:0x7f1bd657d263
[  190.272381] Code: [...]
[  190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263
[  190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001
[  190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000
[  190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500
[  190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002
[  190.272412]  </TASK>
[  190.272414] ---[ end trace 0000000000000000 ]---

Note that ring_buffer_resize() calls rb_check_pages() only if the parent
trace_buffer has recording disabled. Recent commit d78ab792705c
("tracing: Stop current tracer when resizing buffer") makes this always
the case, which makes the issue more likely to occur.

The window to hit this race is nonetheless very small. To help
reproduce it, one can add a delay loop in rb_get_reader_page():

	ret = rb_head_page_replace(reader, cpu_buffer->reader_page);
	if (!ret)
		goto spin;
	for (unsigned i = 0; i < 1U << 26; i++)  /* inserted delay loop */
		__asm__ __volatile__ ("" : : : "memory");
	rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;

... and then run the following commands on the target system:

echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable
while true; do
	echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
	echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
done &
while true; do
	for i in /sys/kernel/tracing/per_cpu/*; do
		timeout 0.1 cat $i/trace_pipe; sleep 0.2
	done
done

To fix the problem, make sure ring_buffer_resize() doesn't invoke
rb_check_pages() concurrently with a reader operating on the same
ring_buffer_per_cpu by taking its cpu_buffer->reader_lock.
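
The locking idea can be sketched in userspace with a pthread mutex
standing in for cpu_buffer->reader_lock. This is only an analogy under
stated assumptions: struct cpu_buf, reader_swap() and resize_check() are
hypothetical names, and the kernel uses a raw spinlock with interrupts
disabled rather than a mutex:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct page_node {
	struct page_node *next;
	struct page_node *prev;
};

/* Hypothetical per-CPU buffer: one lock shared by readers and the checker. */
struct cpu_buf {
	pthread_mutex_t reader_lock;
	struct page_node *head;
};

static void cpu_buf_init(struct cpu_buf *cb, struct page_node *pages, size_t n)
{
	assert(n > 0);
	pthread_mutex_init(&cb->reader_lock, NULL);
	for (size_t i = 0; i < n; i++) {
		pages[i].next = &pages[(i + 1) % n];
		pages[i].prev = &pages[(i + n - 1) % n];
	}
	cb->head = &pages[0];
}

static bool pages_consistent(const struct page_node *head)
{
	const struct page_node *p = head;

	do {
		if (p->next->prev != p || p->prev->next != p)
			return false;
		p = p->next;
	} while (p != head);
	return true;
}

/* Reader side: both pointer updates happen while reader_lock is held. */
static void reader_swap(struct cpu_buf *cb, struct page_node *old,
			struct page_node *new_page)
{
	assert(old->next != old);	/* sketch assumes at least two pages */
	pthread_mutex_lock(&cb->reader_lock);
	new_page->next = old->next;
	new_page->prev = old->prev;
	old->prev->next = new_page;
	/* The list is inconsistent here, but no checker can run. */
	old->next->prev = new_page;
	if (cb->head == old)
		cb->head = new_page;
	pthread_mutex_unlock(&cb->reader_lock);
}

/* Resize side: the integrity check takes the same lock as the reader. */
static bool resize_check(struct cpu_buf *cb)
{
	bool ok;

	pthread_mutex_lock(&cb->reader_lock);
	ok = pages_consistent(cb->head);
	pthread_mutex_unlock(&cb->reader_lock);
	return ok;
}
```

Because the check and the swap serialize on one lock, the check can only
observe the list before or after a complete swap, never in between.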

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Fixes: 659f451ff213 ("ring-buffer: Add integrity check at end of iter read")
Signed-off-by: Petr Pavlu <[email protected]>
[ Fixed whitespace ]
Signed-off-by: Steven Rostedt (Google) <[email protected]>

ring-buffer: Correct stale comments related to non-consuming readers

Adjust the following code documentation:

* Kernel-doc comments for ring_buffer_read_prepare() and
  ring_buffer_read_finish() mention that recording to the ring buffer is
  disabled when the read is active. Remove mention of this restriction
  because it was already lifted in commit 1039221cc278 ("ring-buffer: Do
  not disable recording when there is an iterator").

* Function ring_buffer_read_finish() performs a self-check of the
  ring-buffer by locking cpu_buffer->reader_lock and then calling
  rb_check_pages(). The preceding comment explains that the lock is
  needed because rb_check_pages() clears the HEAD flag required by
  readers which might be running in parallel. Remove this explanation
  because commit 8843e06f67b1 ("ring-buffer: Handle race between
  rb_move_tail and rb_check_pages") simplified the function so it no
  longer resets the mentioned flag. Nonetheless, the lock is still
  needed because a reader swapping a page into the ring buffer can make
  the underlying doubly-linked list temporarily inconsistent.

This is a non-functional change.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Signed-off-by: Petr Pavlu <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Arnaldo Carvalho de Melo:
"General:

   - Integrate the shellcheck utility with the build of perf to allow
     catching shell problems early in areas such as 'perf test', 'perf
     trace' scrape scripts, etc

   - Add 'uretprobe' variant in the 'perf bench uprobe' tool

   - Add script to run instances of 'perf script' in parallel

   - Allow parsing tracepoint names that start with digits, such as
     9p/9p_client_req, etc. Make sure 'perf test' tests it even on
     systems where those tracepoints aren't available

   - Add Kan Liang to MAINTAINERS as a perf tools reviewer

   - Add support for using the 'capstone' disassembler library in
     various tools, such as 'perf script' and 'perf annotate'. This is
     an alternative for the use of the 'xed' and 'objdump' disassemblers

  Data-type profiling improvements:

   - Resolve types for a->b->c by backtracking the assignments until it
     finds DWARF info for one of those members

   - Support for global variables, keeping a cache to speed up lookups

   - Handle the 'call' instruction, dealing with effects on registers
     and handling its return when tracking register data types

   - Handle x86's segment based addressing like %gs:0x28, to support
     things like per CPU variables, the stack canary, etc

   - Data-type profiling got big speedups when using capstone for
     disassembling. The objdump output parsing method is left as a
     fallback when capstone fails or isn't available. There are patches
     posted for 6.11 that use an LLVM disassembler

   - Support event group display in the TUI when annotating types with
     --data-type, for instance to show memory load and store events for
     the data type fields

   - Optimize the 'perf annotate' data structures, reducing memory usage

   - Add an initial 'perf test' for 'perf annotate', checking that a
     target symbol appears on the output, specifying objdump via the
     command line, etc

  Vendor Events:

   - Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand
     Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra
     Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics
     erroneously in TopdownL1

   - Add AMD's Zen 5 core and uncore events and metrics. Those come from
     the "Performance Monitor Counters for AMD Family 1Ah Model 00h-0Fh
     Processors" document, with events that capture information on op
     dispatch, execution and retirement, branch prediction, L1 and L2
     cache activity, TLB activity, etc

   - Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/
     AmpereOneX

  Miscellaneous:

   - Sync header copies with the kernel sources

   - Move some header copies used only for generating translation string
     tables for ioctl cmds and other syscall integer arguments to a new
     directory under tools/perf/beauty/, to separate from copies in
     tools/include/ that are used to build the tools

   - Introduce scrape script for several syscall 'flags'/'mask'
     arguments

   - Improve cpumap utilization, fixing up pairing of refcounts, using
     the right iterators (perf_cpu_map__for_each_cpu), etc

   - Give more details about raw event encodings in 'perf list', show
     tracepoint encoding in the detailed output

   - Refactor the DSOs handling code, reducing memory usage

   - Document the BPF event modifier and add a 'perf test' for it

   - Improve the event parser, better error messages and add further
     'perf test's for it

   - Add reference count checking to 'struct comm_str' and 'struct
     mem_info'

   - Make ARM64's 'perf test' entries for the Neoverse N1 more robust

   - Tweak the ARM64's Coresight 'perf test's

   - Improve ARM64's CoreSight ETM version detection and error reporting

   - Fix handling of symbols when using kcore

   - Fix PAI (Processor Activity Instrumentation) counter names for s390
     virtual machines in 'perf report'

   - Fix -g/--call-graph option failure in 'perf sched timehist'

   - Add LIBTRACEEVENT_DIR build option to allow building with
     libtraceevent installed in non-standard directories, such as when
     doing cross builds

   - Various 'perf test' and 'perf bench' fixes

   - Improve 'perf probe' error message for long C++ probe names"

* tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits)
  tools lib subcmd: Show parent options in help
  perf pmu: Count sys and cpuid JSON events separately
  perf stat: Don't display metric header for non-leader uncore events
  perf annotate-data: Ensure the number of type histograms
  perf annotate: Fix segfault on sample histogram
  perf daemon: Fix file leak in daemon_session__control
  libsubcmd: Fix parse-options memory leak
  perf lock: Avoid memory leaks from strdup()
  perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency
  perf tools: Ignore deleted cgroups
  perf parse: Allow tracepoint names to start with digits
  perf parse-events: Add new 'fake_tp' parameter for tests
  perf parse-events: pass parse_state to add_tracepoint
  perf symbols: Fix ownership of string in dso__load_vmlinux()
  perf symbols: Update kcore map before merging in remaining symbols
  perf maps: Re-use __maps__free_maps_by_name()
  perf symbols: Remove map from list before updating addresses
  perf tracepoint: Don't scan all tracepoints to test if one exists
  perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT
  perf thread: Fixes to thread__new() related to initializing comm
  ...

Merge tag 'bitmap-for-6.10v2' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:

- topology_span_sane() optimization from Kyle Meyer

- fns() rework from Kuan-Wei Chiu (used in cpumask_local_spread() and
   other places)

- headers cleanup from Andy

- add a MAINTAINERS record for bitops API

* tag 'bitmap-for-6.10v2' of https://github.com/norov/linux:
  usercopy: Don't use "proxy" headers
  bitops: Move aligned_byte_mask() to wordpart.h
  MAINTAINERS: add BITOPS API record
  bitmap: relax find_nth_bit() limitation on return value
  lib: make test_bitops compilable into the kernel image
  bitops: Optimize fns() for improved performance
  lib/test_bitops: Add benchmark test for fns()
  Compiler Attributes: Add __always_used macro
  sched/topology: Optimize topology_span_sane()
  cpumask: Add for_each_cpu_from()

Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc vfs updates from Al Viro:
"Assorted commits that had missed the last merge window..."

* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  remove call_{read,write}_iter() functions
  do_dentry_open(): kill inode argument
  kernel_file_open(): get rid of inode argument
  get_file_rcu(): no need to check for NULL separately
  fd_is_open(): move to fs/file.c
  close_on_exec(): pass files_struct instead of fdtable

Merge tag 'pull-bd_flags-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull bdev flags update from Al Viro:
"Compactifying bdev flags.

  We can easily have up to 24 flags with sane atomicity, _without_
  pushing anything out of the first cacheline of struct block_device"
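
One way to read the "up to 24 flags" figure is a single 32-bit atomic
word whose low 8 bits hold the partition number and whose remaining 24
bits hold flags. The sketch below is a userspace illustration under that
assumption, using C11 atomics; struct blockdev_word, the flag names and
the helper functions are hypothetical and are not the kernel's API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PARTNO_MASK	0xffu	/* assumed: low 8 bits hold the partition number */
#define FLAG_SHIFT	8	/* flags occupy bits 8..31 */
#define MAX_FLAGS	24

/* Hypothetical flag indices, for illustration only. */
enum { FLAG_READ_ONLY, FLAG_WRITE_HOLDER, FLAG_RO_WARNED };

struct blockdev_word {
	_Atomic uint32_t word;
};

static void bdev_word_init(struct blockdev_word *b, unsigned int partno)
{
	atomic_init(&b->word, partno & PARTNO_MASK);
}

/* Atomically set one flag without touching the partition number. */
static void flag_set(struct blockdev_word *b, unsigned int flag)
{
	assert(flag < MAX_FLAGS);
	atomic_fetch_or(&b->word, UINT32_C(1) << (flag + FLAG_SHIFT));
}

/* Atomically clear one flag without touching the partition number. */
static void flag_clear(struct blockdev_word *b, unsigned int flag)
{
	assert(flag < MAX_FLAGS);
	atomic_fetch_and(&b->word, ~(UINT32_C(1) << (flag + FLAG_SHIFT)));
}

static bool flag_test(struct blockdev_word *b, unsigned int flag)
{
	assert(flag < MAX_FLAGS);
	return atomic_load(&b->word) & (UINT32_C(1) << (flag + FLAG_SHIFT));
}

static unsigned int partno(struct blockdev_word *b)
{
	return atomic_load(&b->word) & PARTNO_MASK;
}
```

Each update is a single atomic read-modify-write on one word, so
independent flags can be changed concurrently without a lock and without
growing the structure.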

* tag 'pull-bd_flags-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  bdev: move ->bd_make_it_fail to ->__bd_flags
  bdev: move ->bd_ro_warned to ->__bd_flags
  bdev: move ->bd_has_subit_bio to ->__bd_flags
  bdev: move ->bd_write_holder into ->__bd_flags
  bdev: move ->bd_read_only to ->__bd_flags
  bdev: infrastructure for flags
  wrapper for access to ->bd_partno
  Use bdev_is_paritition() instead of open-coding it

io_uring/sqpoll: ensure that normal task_work is also run timely

With the move to private task_work, SQPOLL neglected to also run the
normal task_work, if any is pending. This will eventually get run, but
we should run it with the private task_work to ensure that things like
a final fput() are processed in a timely fashion.

Cc: [email protected]
Link: https://lore.kernel.org/all/[email protected]/
Reported-by: Andrew Udvare <[email protected]>
Fixes: af5d68f8892f ("io_uring/sqpoll: manage task_work privately")
Tested-by: Christian Heusel <[email protected]>
Tested-by: Andrew Udvare <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 's390-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Alexander Gordeev:

- Switch read and write software bits for PUDs

- Add missing hardware bits for PUDs and PMDs

- Generate unwind information for C modules to fix GDB unwind error for
   vDSO functions

- Create .build-id links for unstripped vDSO files to enable vDSO
   debugging with symbols

- Use standard stack frame layout for vDSO generated stack frames to
   manually walk stack frames without DWARF information

- Rework perf_callchain_user() and arch_stack_walk_user() functions to
   reduce code duplication

- Skip first stack frame when walking user stack

- Add basic checks to identify invalid instruction pointers when
   walking stack frames

- Introduce and use struct stack_frame_vdso_wrapper within vDSO user
   wrapper code to automatically generate an asm-offset define. Also use
   STACK_FRAME_USER_OVERHEAD instead of STACK_FRAME_OVERHEAD to document
   that the code works with user space stack

- Clear the backchain of the extra stack frame added by the vDSO user
   wrapper code. This allows the user stack walker to detect and skip
   the non-standard stack frame. Without this an incorrect instruction
   pointer would be added to stack traces.

- Rewrite psw_idle() function in C to ease maintenance and further
   enhancements

- Remove get_vtimer() function and use get_cpu_timer() instead

- Mark psw variable in __load_psw_mask() as __uninitialized to avoid
   superfluous clearing of PSW

- Remove obsolete and superfluous comment about removed TIF_FPU flag

- Replace memzero_explicit() and kfree() with kfree_sensitive() to fix
   warnings reported by Coccinelle

- Wipe sensitive data and all copies of protected- or secure-keys from
   stack when an IOCTL fails

- Both do_airq_interrupt() and do_io_interrupt() functions set
   CIF_NOHZ_DELAY flag. Move it into do_io_irq() to simplify the code

- Provide iucv_alloc_device() and iucv_release_device() helpers, which
   can be used to deduplicate more or less identical IUCV device
   allocation and release code in four different drivers

- Make use of iucv_alloc_device() and iucv_release_device() helpers to
   get rid of quite some code and also remove a cast to an incompatible
   function (clang W=1)

- There is no user of iucv_root outside of the core IUCV code left.
   Therefore remove the EXPORT_SYMBOL

- __apply_alternatives() contains a runtime check which verifies that
   the size of the to be patched code area is even. Convert this to a
   compile time check

- Increase size of buffers for sending z/VM CP DIAGNOSE X'008' commands
   from 128 to 240

- Do not accept z/VM CP DIAGNOSE X'008' commands longer than maximally
   allowed

- Use correct defines IPL_BP_NVME_LEN and IPL_BP0_NVME_LEN instead of
   IPL_BP_FCP_LEN and IPL_BP0_FCP_LEN ones to initialize NVMe reIPL
   block on 'scp_data' sysfs attribute update

- Initialize the correct fields of the NVMe dump block, which were
   confused with FCP fields

- Refactor macros for 'scp_data' (re-)IPL sysfs attribute to reduce
   code duplication

- Introduce 'scp_data' sysfs attribute for dump IPL to allow tools such
   as dumpconf passing additional kernel command line parameters to a
   stand-alone dumper

- Rework the CPACF query functions to use the correct RRE or RRF
   instruction formats and set instruction register fields correctly

- Instead of calling BUG() at runtime, force a link error during compile
   when an unsupported opcode is used with __cpacf_query() or
   __cpacf_check_opcode() functions

- Fix a crash in ap_parse_bitmap_str() function on /sys/bus/ap/apmask
   or /sys/bus/ap/aqmask sysfs file update with a relative mask value

- Fix "bindings complete" udev event which should be sent once all AP
   devices have been bound to device drivers and again when unbind/bind
   actions take place and all AP devices are bound again

- Facility list alt_stfle_fac_list is nowhere used in the decompressor,
   therefore remove it there

- Remove custom kprobes insn slot allocator in favour of the standard
   module_alloc() one, since kernel image and module areas are located
   within 4GB

- Use kvcalloc() instead of kvmalloc_array() in zcrypt driver to avoid
   calling memset() with a large byte count and get rid of the sparse
   warning as a result

* tag 's390-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
  s390/zcrypt: Use kvcalloc() instead of kvmalloc_array()
  s390/kprobes: Remove custom insn slot allocator
  s390/boot: Remove alt_stfle_fac_list from decompressor
  s390/ap: Fix bind complete udev event sent after each AP bus scan
  s390/ap: Fix crash in AP internal function modify_bitmap()
  s390/cpacf: Make use of invalid opcode produce a link error
  s390/cpacf: Split and rework cpacf query functions
  s390/ipl: Introduce sysfs attribute 'scp_data' for dump ipl
  s390/ipl: Introduce macros for (re)ipl sysfs attribute 'scp_data'
  s390/ipl: Fix incorrect initialization of nvme dump block
  s390/ipl: Fix incorrect initialization of len fields in nvme reipl block
  s390/ipl: Do not accept z/VM CP diag X'008' cmds longer than max length
  s390/ipl: Fix size of vmcmd buffers for sending z/VM CP diag X'008' cmds
  s390/alternatives: Convert runtime sanity check into compile time check
  s390/iucv: Unexport iucv_root
  tty: hvc-iucv: Make use of iucv_alloc_device()
  s390/smsgiucv_app: Make use of iucv_alloc_device()
  s390/netiucv: Make use of iucv_alloc_device()
  s390/vmlogrdr: Make use of iucv_alloc_device()
  s390/iucv: Provide iucv_alloc_device() / iucv_release_device()
  ...

Merge tag 'm68knommu-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

Pull m68knommu update from Greg Ungerer:

- remove use of kernel config option from uapi header

* tag 'm68knommu-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: Avoid CONFIG_COLDFIRE switch in uapi header

Merge tag 'efi-fixes-for-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fix from Ard Biesheuvel:

- Followup fix for the EFI boot sequence refactor, which may result in
   physical KASLR putting the kernel in a region which is being used for
   a special purpose via a command line argument.

* tag 'efi-fixes-for-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  x86/efistub: Omit physical KASLR when memory reservations exist

Merge tag 'for-6.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- Fix DM discard regressions due to DM core switching over to using
   queue_limits_set() without DM core and targets first being updated to
   set (and stack) discard limits in terms of max_hw_discard_sectors and
   not max_discard_sectors

- Fix stable@ DM integrity discard support to set device's
   discard_granularity limit to the device's logical block size

* tag 'for-6.10/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: always manage discard support in terms of max_hw_discard_sectors
  dm-integrity: set discard_granularity to logical block size

Merge tag 'pm-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix the amd-pstate driver and the operating performance point
  (OPP) handling related to generic PM domains.

  Specifics:

   - Fix a memory leak in the exit path of amd-pstate (Peng Ma)

   - Fix required_opp_tables handling in the cases when multiple generic
     PM domains share one OPP table (Viresh Kumar)"

* tag 'pm-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  OPP: Fix required_opp_tables for multiple genpds using same table
  cpufreq: amd-pstate: fix memory leak on CPU EPP exit

Merge tag 'acpi-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These make the ACPI EC driver always install the EC address space
  handler at the root of the ACPI namespace which causes it to take care
  of all EC operation regions everywhere.

  This means that the custom EC address space handler in the WMI driver
  is not needed any more and accordingly it gets removed altogether"

* tag 'acpi-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  platform/x86: wmi: Remove custom EC address space handler
  ACPI: EC: Install address space handler at the namespace root

Merge tag 'thermal-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
"These fix the MediaTek lvts_thermal driver and the handling of trip
  points that start as invalid and are adjusted later by user space via
  sysfs.

  Specifics:

   - Fix and clean up the MediaTek lvts_thermal driver (Julien Panis)

   - Prevent invalid trip point handling from triggering spurious trip
     point crossing events and allow passive polling to stop when a
     passive trip point involved in it becomes invalid (Rafael Wysocki)"

* tag 'thermal-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: core: Fix the handling of invalid trip points
  thermal/drivers/mediatek/lvts_thermal: Fix wrong lvts_ctrl index
  thermal/drivers/mediatek/lvts_thermal: Remove unused members from struct lvts_ctrl_data
  thermal/drivers/mediatek/lvts_thermal: Check NULL ptr on lvts_data

Merge tag 'intel-gpio-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-gpio-intel

Pull intel-gpio fixes from Andy Shevchenko:

- NULL pointer dereference fix in GPIO ACPI library

- Restore ACPI handle matching for GPIO devices represented in banks

* tag 'intel-gpio-v6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-gpio-intel:
  gpiolib: acpi: Fix failed in acpi_gpiochip_find() by adding parent node match
  gpiolib: acpi: Move ACPI device NULL check to acpi_can_fallback_to_crs()

Merge tag 'soundwire-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

Pull soundwire updates from Vinod Koul:

- cleanup and conversion for soundwire sysfs groups

- intel support for ace2x bits, auxdevice pm improvements

- qcom multi link device support

* tag 'soundwire-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (33 commits)
  soundwire: intel_ace2.x: add support for DOAISE property
  soundwire: intel_ace2.x: add support for DODSE property
  soundwire: intel_ace2x: use DOAIS and DODS settings from firmware
  soundwire: intel_ace2x: cleanup DOAIS/DODS settings
  soundwire: intel_ace2x: simplify check_wake()
  soundwire: intel_ace2x: fix wakeup handling
  soundwire: intel_init: resume all devices on exit.
  soundwire: intel: export intel_resume_child_device
  soundwire: intel_auxdevice: use pm_runtime_resume() instead of pm_request_resume()
  ASoC: SOF: Intel: hda: disable SoundWire interrupt later
  soundwire: qcom: allow multi-link on newer devices
  soundwire: intel_ace2x: use legacy formula for intel_alh_id
  soundwire: reconcile dp0_prop and dpn_prop
  soundwire: intel_ace2x: set the clock source
  soundwire: intel_ace2.x: power-up first before setting SYNCPRD
  soundwire: intel_ace2x: move and extend clock selection
  soundwire: intel: add support for MeteorLake additional clocks
  soundwire: intel: add more values for SYNCPRD
  soundwire: bus: extend base clock checks to 96 MHz
  soundwire: cadence: show the bus frequency and frame shape
  ...

Merge tag 'phy-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Pull generic phy updates from Vinod Koul:
"New HW Support:
   - Support for Embedded DisplayPort and DisplayPort submodes and
     driver support on Qualcomm X1E80100 edp driver
   - Qualcomm QMP UFS PHY for SM8475, QMP USB phy for QDU1000/QRU1000
     and eusb2-repeater for SMB2360
   - Samsung HDMI PHY for i.MX8MP, gs101 UFS phy
   - Mediatek XFI T-PHY support for mt7988
   - Rockchip usbdp combo phy driver

  Updates:
   - Qualcomm x4 lane EP support for sa8775p, v4 and v6 support for
     X1E80100, SM8650 tables for UFS Gear 4 & 5 and correct voltage
     swing tables
   - Freescale imx8m-pci pcie link-up updates
   - Rockchip rx-common-refclk-mode support
   - More platform remove callback returning void conversions"

* tag 'phy-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (43 commits)
  dt-bindings: phy: qcom,usb-snps-femto-v2: use correct fallback for sc8180x
  dt-bindings: phy: qcom,sc8280xp-qmp-ufs-phy: fix msm899[68] power-domains
  dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: fix x1e80100-gen3x2 schema
  phy: qcpm-qmp-usb: Add support for QDU1000/QRU1000
  dt-bindings: phy: qcom,qmp-usb: Add QDU1000 USB3 PHY
  dt-bindings: phy: qcom,usb-snps-femto-v2: Add bindings for QDU1000
  phy: qcom-qmp-pcie: add x4 lane EP support for sa8775p
  phy: samsung-ufs: ufs: exit on first reported error
  phy: samsung-ufs: ufs: remove superfluous mfd/syscon.h header
  phy: rockchip: fix CONFIG_TYPEC dependency
  phy: rockchip: usbdp: fix uninitialized variable
  phy: rockchip-snps-pcie3: add support for rockchip,rx-common-refclk-mode
  dt-bindings: phy: rockchip,pcie3-phy: add rockchip,rx-common-refclk-mode
  phy: rockchip: add usbdp combo phy driver
  dt-bindings: phy: add rockchip usbdp combo phy document
  phy: add driver for MediaTek XFI T-PHY
  dt-bindings: phy: mediatek,mt7988-xfi-tphy: add new bindings
  phy: freescale: fsl-samsung-hdmi: Convert to platform remove callback returning void
  phy: qcom: qmp-ufs: update SM8650 tables for Gear 4 & 5
  MAINTAINERS: Add phy-gs101-ufs file to Tensor GS101.
  ...

Merge tag 'dmaengine-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine updates from Vinod Koul:
"New HW support:
   - Freescale i.MX8ULP edma support in edma driver
   - StarFive JH8100 DMA support in Synopsys axi-dmac driver

  Updates:
   - Tracing support for freescale edma driver, updates to dpaa2 driver
   - Remove unused QCom hidma DT support
   - Support for i2c dma in imx-sdma
   - Maintainers update for idxd and edma drivers"

* tag 'dmaengine-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (42 commits)
  MAINTAINERS: Update role for IDXD driver
  dmaengine: fsl-edma: use _Generic to handle difference type
  dmaengine: fsl-edma: add trace event support
  dmaengine: idxd: Avoid unnecessary destruction of file_ida
  dmaengine: xilinx: xdma: fix module autoloading
  dt-bindings: dma: fsl-edma: allow 'power-domains' property
  dt-bindings: dma: fsl-edma: remove 'clocks' from required
  dmaengine: fsl-dpaa2-qdma: Fix kernel-doc check warning
  dmaengine: imx-sdma: Add i2c dma support
  dmaengine: imx-sdma: utilize compiler to calculate ADDRS_ARRAY_SIZE_V<n>
  dt-bindings: fsl-imx-sdma: Add I2C peripheral types ID
  dt-bindings: fsl-dma: fsl-edma: clean up unused "fsl,imx8qm-adma" compatible string
  dmaengine: fsl-edma: clean up unused "fsl,imx8qm-adma" compatible string
  dt-bindings: dma: Drop unused QCom hidma binding
  dmaengine: qcom: Drop hidma DT support
  dmaengine: pl08x: Use kcalloc() instead of kzalloc()
  dmaengine: fsl-dpaa2-qdma: Update DPDMAI interfaces to version 3
  dmaengine: fsl-edma: fix miss mutex unlock at an error return path
  dmaengine: pch_dma: remove unused function chan2parent
  dmaengine: fsl-dpaa2-qdma: Add dpdmai_cmd_open
  ...

arm64: asm-bug: Add .align 2 to the end of __BUG_ENTRY

When CONFIG_DEBUG_BUGVERBOSE=n, we fail to add necessary padding bytes
to bug_table entries, and as a result the last entry in a bug table will
be ignored, potentially leading to an unexpected panic(). All prior
entries in the table will be handled correctly.

The arm64 ABI requires that struct fields of up to 8 bytes are
naturally-aligned, with padding added within a struct such that structs
are suitably aligned within arrays.

When CONFIG_DEBUG_BUGVERBOSE=y, the layout of a bug_entry is:

struct bug_entry {
	signed int      bug_addr_disp; // 4 bytes
	signed int      file_disp; // 4 bytes
	unsigned short  line; // 2 bytes
	unsigned short  flags; // 2 bytes
}

... with 12 bytes total, requiring 4-byte alignment.

When CONFIG_DEBUG_BUGVERBOSE=n, the layout of a bug_entry is:

struct bug_entry {
signed int      bug_addr_disp; // 4 bytes
unsigned short  flags; // 2 bytes
< implicit padding > // 2 bytes
}

... with 8 bytes total, with 6 bytes of data and 2 bytes of trailing
padding, requiring 4-byte alignment.

When we create a bug_entry in assembly, we align the start of the entry
to 4 bytes, which implicitly handles padding for any prior entries.
However, we do not align the end of the entry, and so when
CONFIG_DEBUG_BUGVERBOSE=n, the final entry lacks the trailing padding
bytes.

For the main kernel image this is not a problem as find_bug() doesn't
depend on the trailing padding bytes when searching for entries:

for (bug = __start___bug_table; bug < __stop___bug_table; ++bug)
if (bugaddr == bug_addr(bug))
return bug;

However for modules, module_bug_finalize() depends on the trailing
bytes when calculating the number of entries:

mod->num_bugs = sechdrs[i].sh_size / sizeof(struct bug_entry);

... and as the last bug_entry lacks the necessary padding bytes, this entry
will not be counted, e.g. in the case of a single entry:

sechdrs[i].sh_size == 6
sizeof(struct bug_entry) == 8;

sechdrs[i].sh_size / sizeof(struct bug_entry) == 0;

Consequently module_find_bug() will miss the last bug_entry when it does:

for (i = 0; i < mod->num_bugs; ++i, ++bug)
if (bugaddr == bug_addr(bug))
goto out;

... which can lead to a kernel panic due to an unhandled bug.
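
The entry-count arithmetic above can be reproduced in userspace. The
following is only a sketch: struct bug_entry_terse, count_bug_entries()
and unpadded_section_size() are hypothetical names, not kernel code, and
the sizes assume a 4-byte int and a 2-byte short as on arm64.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copy of the CONFIG_DEBUG_BUGVERBOSE=n layout quoted above. */
struct bug_entry_terse {
	signed int     bug_addr_disp;	/* 4 bytes */
	unsigned short flags;		/* 2 bytes, followed by 2 bytes of padding */
};

/* Bytes of real data in one entry when its end is not aligned. */
#define BUG_ENTRY_DATA_BYTES 6

static_assert(sizeof(struct bug_entry_terse) == 8,
	      "sketch assumes 4-byte int and 2-byte short");

/* Same division as the module_bug_finalize() line quoted above. */
size_t count_bug_entries(size_t section_size)
{
	return section_size / sizeof(struct bug_entry_terse);
}

/*
 * Section size for n entries when only the start of each entry is
 * aligned: every entry except the last is padded by its successor.
 */
size_t unpadded_section_size(size_t n)
{
	if (n == 0)
		return 0;
	return (n - 1) * sizeof(struct bug_entry_terse) + BUG_ENTRY_DATA_BYTES;
}
```

With these helpers, a section holding three entries is 22 bytes long and
is counted as two entries, so the last one is never searched.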

This can be demonstrated with the following module:

static int __init buginit(void)
{
WARN(1, "hello\n");
return 0;
}

static void __exit bugexit(void)
{
}

module_init(buginit);
module_exit(bugexit);
MODULE_LICENSE("GPL");

... which will trigger a kernel panic when loaded:

------------[ cut here ]------------
hello
Unexpected kernel BRK exception at EL1
Internal error: BRK handler: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in: hello(O+)
CPU: 0 PID: 50 Comm: insmod Tainted: G           O       6.9.1 #8
Hardware name: linux,dummy-virt (DT)
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : buginit+0x18/0x1000 [hello]
lr : buginit+0x18/0x1000 [hello]
sp : ffff800080533ae0
x29: ffff800080533ae0 x28: 0000000000000000 x27: 0000000000000000
x26: ffffaba8c4e70510 x25: ffff800080533c30 x24: ffffaba8c4a28a58
x23: 0000000000000000 x22: 0000000000000000 x21: ffff3947c0eab3c0
x20: ffffaba8c4e3f000 x19: ffffaba846464000 x18: 0000000000000006
x17: 0000000000000000 x16: ffffaba8c2492834 x15: 0720072007200720
x14: 0720072007200720 x13: ffffaba8c49b27c8 x12: 0000000000000312
x11: 0000000000000106 x10: ffffaba8c4a0a7c8 x9 : ffffaba8c49b27c8
x8 : 00000000ffffefff x7 : ffffaba8c4a0a7c8 x6 : 80000000fffff000
x5 : 0000000000000107 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff3947c0eab3c0
Call trace:
buginit+0x18/0x1000 [hello]
do_one_initcall+0x80/0x1c8
do_init_module+0x60/0x218
load_module+0x1ba4/0x1d70
__do_sys_init_module+0x198/0x1d0
__arm64_sys_init_module+0x1c/0x28
invoke_syscall+0x48/0x114
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x34/0xd8
el0t_64_sync_handler+0x120/0x12c
el0t_64_sync+0x190/0x194
Code: d0ffffe0 910003fd 91000000 9400000b (d4210000)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: BRK handler: Fatal exception

Fix this by always aligning the end of a bug_entry to 4 bytes, which is
correct regardless of CONFIG_DEBUG_BUGVERBOSE.
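
As an illustration of why end alignment is safe for both layouts, the
sketch below rounds an entry size up to a 4-byte boundary, which is what
".align 2" (2^2 bytes) does on arm64. align_up() is a hypothetical
helper written for this example, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (a power of two). */
size_t align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}
```

The 6 data bytes of the terse layout become 8, while the 12 bytes of the
verbose layout are left unchanged.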

Fixes: 9fb7410f955f ("arm64/BUG: Use BRK instruction for generic BUG traps")
Signed-off-by: Yuanbin Xie <[email protected]>
Signed-off-by: Jiangfeng Xiao <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

Merge tag 'mailbox-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox

Pull mailbox updates from Jassi Brar:

- redo the omap driver from legacy to mailbox api

- enable bufferless IPI for zynqmp

- add mhu-v3 driver

- convert from tasklet to BH workqueue

- add qcom MSM8974 APCS compatible IDs

* tag 'mailbox-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox: (24 commits)
  dt-bindings: mailbox: qcom-ipcc: Document the SDX75 IPCC
  dt-bindings: mailbox: qcom: Add MSM8974 APCS compatible
  mailbox: Convert from tasklet to BH workqueue
  mailbox: mtk-cmdq: Fix pm_runtime_get_sync() warning in mbox shutdown
  mailbox: mtk-cmdq-mailbox: fix module autoloading
  mailbox: zynqmp: handle SGI for shared IPI
  mailbox: arm_mhuv3: Add driver
  dt-bindings: mailbox: arm,mhuv3: Add bindings
  mailbox: omap: Remove kernel FIFO message queuing
  mailbox: omap: Reverse FIFO busy check logic
  mailbox: omap: Remove mbox_chan_to_omap_mbox()
  mailbox: omap: Use mbox_controller channel list directly
  mailbox: omap: Use function local struct mbox_controller
  mailbox: omap: Merge mailbox child node setup loops
  mailbox: omap: Use devm_pm_runtime_enable() helper
  mailbox: omap: Remove device class
  mailbox: omap: Remove unneeded header omap-mailbox.h
  mailbox: omap: Move fifo size check to point of use
  mailbox: omap: Move omap_mbox_irq_t into driver
  mailbox: omap: Remove unused omap_mbox_request_channel() function
  ...

Merge tag 'rproc-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux

Pull remoteproc updates from Bjorn Andersson:
"This makes the remoteproc core rproc_class const.

  DeviceTree bindings for a few different Qualcomm remoteprocs are
  updated to remove a range of validation warnings/errors. The Qualcomm
  SMD binding marks qcom,ipc deprecated, in favor of the mailbox
  interface.

  The TI K3 R5 remoteproc driver is updated to ensure that cores are
  powered up in the appropriate order. The driver also sees a couple of
  fixes related to cleanups in error paths during probe.

  The Mediatek remoteproc driver is extended to support the MT8188 SCP
  core 1. Support for varying DRAM and IPI shared buffer sizes is
  introduced, together with a couple of bug fixes and improvements
  to the driver.

  Support for the AMD-Xilinx Versal and Versal-NET platforms is added.
  Coredump support and support for parsing TCM information from
  DeviceTree are added to the Xilinx R5F remoteproc driver"

* tag 'rproc-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: (22 commits)
  dt-bindings: remoteproc: qcom,sdm845-adsp-pil: Fix qcom,halt-regs definition
  dt-bindings: remoteproc: qcom,sc7280-wpss-pil: Fix qcom,halt-regs definition
  dt-bindings: remoteproc: qcom,qcs404-cdsp-pil: Fix qcom,halt-regs definition
  dt-bindings: remoteproc: qcom,msm8996-mss-pil: allow glink-edge on msm8996
  dt-bindings: remoteproc: qcom,smd-edge: Mark qcom,ipc as deprecated
  remoteproc: k3-r5: Jump to error handling labels in start/stop errors
  remoteproc: mediatek: Fix error code in scp_rproc_init()
  remoteproc: k3-r5: Do not allow core1 to power up before core0 via sysfs
  remoteproc: k3-r5: Wait for core0 power-up before powering up core1
  remoteproc: mediatek: Add IMGSYS IPI command
  remoteproc: mediatek: Support setting DRAM and IPI shared buffer sizes
  remoteproc: mediatek: Support MT8188 SCP core 1
  dt-bindings: remoteproc: mediatek: Support MT8188 dual-core SCP
  drivers: remoteproc: xlnx: Fix uninitialized tcm mode
  drivers: remoteproc: xlnx: Fix uninitialized variable use
  drivers: remoteproc: xlnx: Add Versal and Versal-NET support
  remoteproc: zynqmp: parse TCM from device tree
  dt-bindings: remoteproc: Add Tightly Coupled Memory (TCM) bindings
  remoteproc: zynqmp: fix lockstep mode memory region
  remoteproc: zynqmp: Add coredump support
  ...

Merge tag 'rpmsg-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux

Pull rpmsg updates from Bjorn Andersson:
"This makes core rpmsg_class const and ensures that the automatic
  module loading of the Qualcomm glink_ssr driver happens"

* tag 'rpmsg-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
  rpmsg: qcom_glink_ssr: fix module autoloading
  rpmsg: core: Make rpmsg_class constant

Merge tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
"Enumeration:

   - Skip E820 checks for MCFG ECAM regions for new (2016+) machines,
     since there's no requirement to describe them in E820 and some
     platforms require ECAM to work (Bjorn Helgaas)

   - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien
     Le Moal)

   - Remove last user and pci_enable_device_io() (Heiner Kallweit)

   - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen)

   - Skip waiting for devices that have been disconnected while
     suspended (Ilpo Järvinen)

   - Clear Secondary Status errors after enumeration since Master Aborts
     and Unsupported Request errors are an expected part of enumeration
     (Vidya Sagar)

  MSI:

   - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas)

  Error handling:

   - Mask Genesys GL975x SD host controller Replay Timer Timeout
     correctable errors caused by a hardware defect; the errors cause
     interrupts that prevent system suspend (Kai-Heng Feng)

   - Fix EDR-related _DSM support, which previously evaluated revision 5
     but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan)

  ASPM:

   - Simplify link state definitions and mask calculation (Ilpo
     Järvinen)

  Power management:

   - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS
     apparently doesn't know how to put them back in D0 (Mario
     Limonciello)

  CXL:

   - Support resetting CXL devices; special handling required because
     CXL Ports mask Secondary Bus Reset by default (Dave Jiang)

  DOE:

   - Support DOE Discovery Version 2 (Alexey Kardashevskiy)

  Endpoint framework:

   - Set endpoint BAR to be 64-bit if the driver says that's all the
     device supports, in addition to doing so if the size is >2GB
     (Niklas Cassel)

   - Simplify endpoint BAR allocation and setting interfaces (Niklas
     Cassel)

  Cadence PCIe controller driver:

   - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof
     Kozlowski)

  Cadence PCIe endpoint driver:

   - Configure endpoint BARs to be 64-bit based on the BAR type, not the
     BAR value (Niklas Cassel)

  Freescale Layerscape PCIe controller driver:

   - Convert DT binding to YAML (Frank Li)

  MediaTek MT7621 PCIe controller driver:

   - Add DT binding missing 'reg' property for child Root Ports
     (Krzysztof Kozlowski)

   - Fix theoretical string truncation in PHY name (Sergio Paracuellos)

  NVIDIA Tegra194 PCIe controller driver:

   - Return success for endpoint probe instead of falling through to the
     failure path (Vidya Sagar)

  Renesas R-Car PCIe controller driver:

   - Add DT binding missing IOMMU properties (Geert Uytterhoeven)

   - Add DT binding R-Car V4H compatible for host and endpoint mode
     (Yoshihiro Shimoda)

  Rockchip PCIe controller driver:

   - Configure endpoint BARs to be 64-bit based on the BAR type, not the
     BAR value (Niklas Cassel)

   - Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski)

   - Set the Subsystem Vendor ID, which was previously zero because it
     was masked incorrectly (Rick Wertenbroek)

  Synopsys DesignWare PCIe controller driver:

   - Restructure DBI register access to accommodate devices where this
     requires Refclk to be active (Manivannan Sadhasivam)

   - Remove the deinit() callback, which was only needed by the
     pcie-rcar-gen4, and do it directly in that driver (Manivannan
     Sadhasivam)

   - Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean
     up things like eDMA (Manivannan Sadhasivam)

   - Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel
     to dw_pcie_ep_init() (Manivannan Sadhasivam)

   - Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to
     reflect the actual functionality (Manivannan Sadhasivam)

   - Call dw_pcie_ep_init_registers() directly from all the glue
     drivers, not just those that require active Refclk from the host
     (Manivannan Sadhasivam)

   - Remove the "core_init_notifier" flag, which was an obscure way for
     glue drivers to indicate that they depend on Refclk from the host
     (Manivannan Sadhasivam)

  TI J721E PCIe driver:

   - Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli)

   - Add DT binding J722S SoC support (Siddharth Vadapalli)

  TI Keystone PCIe controller driver:

   - Add DT binding missing num-viewport, phys and phy-name properties
     (Jan Kiszka)

  Miscellaneous:

   - Constify and annotate with __ro_after_init (Heiner Kallweit)

   - Convert DT bindings to YAML (Krzysztof Kozlowski)

   - Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming
     Zhou)"

* tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits)
  PCI: Do not wait for disconnected devices when resuming
  x86/pci: Skip early E820 check for ECAM region
  PCI: Remove unused pci_enable_device_io()
  ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io()
  PCI: Update pci_find_capability() stub return types
  PCI: Remove PCI_IRQ_LEGACY
  scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY
  scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY
  dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios
  Revert "genirq/msi: Provide constants for PCI/IMS support"
  Revert "x86/apic/msi: Enable PCI/IMS"
  Revert "iommu/vt-d: Enable PCI/IMS"
  Revert "iommu/amd: Enable PCI/IMS"
  Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support"
  ...

Merge tag 'keys-trusted-next-6.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull trusted keys fixes from Jarkko Sakkinen:
"These are two bugs I found in trusted keys while working on a new
  RSA key type for TPM2. Both originate from v5.13.

  The memory leak is more crucial, but I also don't think it is a good
  idea for the kernel to throw a WARN when the ASN.1 parser fails, even
  if that is related to a programming error, as the code is not that
  mature yet.

  There's at least two WARN's in that code but I picked just the one
  more likely to trigger. Planning to fix the other one too over time"

* tag 'keys-trusted-next-6.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  KEYS: trusted: Do not use WARN when encode fails
  KEYS: trusted: Fix memory leak in tpm2_key_encode()

Merge tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull bdev bd_inode updates from Al Viro:
"Replacement of bdev->bd_inode with sane(r) set of primitives by me and
  Yu Kuai"

* tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  RIP ->bd_inode
  dasd_format(): killing the last remaining user of ->bd_inode
  nilfs_attach_log_writer(): use ->bd_mapping->host instead of ->bd_inode
  block/bdev.c: use the knowledge of inode/bdev coallocation
  gfs2: more obvious initializations of mapping->host
  fs/buffer.c: massage the remaining users of ->bd_inode to ->bd_mapping
  blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here...
  grow_dev_folio(): we only want ->bd_inode->i_mapping there
  use ->bd_mapping instead of ->bd_inode->i_mapping
  block_device: add a pointer to struct address_space (page cache of bdev)
  missing helpers: bdev_unhash(), bdev_drop()
  block: move two helpers into bdev.c
  block2mtd: prevent direct access of bd_inode
  dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode)
  blkdev_write_iter(): saner way to get inode and bdev
  bcachefs: remove dead function bdev_sectors()
  ext4: remove block_device_ejected()
  erofs_buf: store address_space instead of inode
  erofs: switch erofs_bread() to passing offset instead of block number

gpiolib: acpi: Fix failed in acpi_gpiochip_find() by adding parent node match

A previous patch changed the criterion that acpi_gpiochip_find() uses
to match device nodes: it now uses the device node set in
gc->gpiodev->dev instead of gc->parent.

However, there is a situation in gpio-dwapb where the GPIO device
driver will set gc->fwnode for each port corresponding to a child
node under a GPIO device, so gc->gpiodev->dev will be assigned the
value of each child node in gpiochip_add_data().

gpio-dwapb.c:
128,31 static int dwapb_gpio_add_port(struct dwapb_gpio *gpio,
struct dwapb_port_property *pp,
unsigned int offs);
port->gc.fwnode = pp->fwnode;

693,39 static int dwapb_gpio_probe;
err = dwapb_gpio_add_port(gpio, &pdata->properties[i], i);

When other drivers request GPIO pin resources through the GPIO device
node provided by ACPI (corresponding to the parent node), the change
of the matching object to gc->gpiodev->dev in acpi_gpiochip_find()
only allows finding the value of each port (child node), resulting
in a failed request.

Reapply the gc->parent match condition in acpi_gpiochip_find(). This
is compatible with gpio-dwapb and does not affect the two cases
mentioned in the patch:
1. There is no setting for gc->fwnode.
2. The case that depends on using gc->fwnode for match.
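
The matching idea can be modelled in userspace as follows. This is a
simplified sketch and not the kernel implementation: struct model_dev,
struct model_chip and model_chip_matches() are hypothetical stand-ins
for the device behind gc->gpiodev->dev, gc->parent and the match
callback, and the order of the two checks is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with an ACPI handle. */
struct model_dev {
	const void *acpi_handle;
};

/* Hypothetical stand-in for a GPIO chip. */
struct model_chip {
	struct model_dev own_dev;	/* role of gc->gpiodev->dev */
	const struct model_dev *parent;	/* role of gc->parent, may be NULL */
};

/* Match on the chip's own node first, then fall back to the parent. */
bool model_chip_matches(const struct model_chip *gc, const void *handle)
{
	assert(gc != NULL);
	if (gc->own_dev.acpi_handle == handle)
		return true;
	return gc->parent != NULL && gc->parent->acpi_handle == handle;
}
```

In this model a per-port chip, as created by gpio-dwapb, is found both by
its own child node and by the parent node that other drivers reference.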

Fixes: 5062e4c14b75 ("gpiolib: acpi: use the fwnode in acpi_gpiochip_find()")
Fixes: 067dbc1ea5ce ("gpiolib: acpi: Don't use GPIO chip fwnode in acpi_gpiochip_find()")
Signed-off-by: Devyn Liu <[email protected]>
Reviewed-by: Mika Westerberg <[email protected]>
Tested-by: Benjamin Tissoires <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>

Merge tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs blocksize updates from Al Viro:
"This gets rid of bogus set_blocksize() uses, switches it over
  to be based on a 'struct file *' and verifies that the caller
  has the device opened exclusively"

* tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make set_blocksize() fail unless block device is opened exclusive
  set_blocksize(): switch to passing struct file *
  btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens
  swsusp: don't bother with setting block size
  zram: don't bother with reopening - just use O_EXCL for open
  swapon(2): open swap with O_EXCL
  swapon(2)/swapoff(2): don't bother with block size
  pktcdvd: sort set_blocksize() calls out
  bcache_register(): don't bother with set_blocksize()

gpiolib: acpi: Move ACPI device NULL check to acpi_can_fallback_to_crs()

Following the relocation of the function call outside of
__acpi_find_gpio(), move the ACPI device NULL check to
acpi_can_fallback_to_crs().

Signed-off-by: Laura Nao <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Reported-by: kernelci.org bot <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Fixes: 49c02f6e901c ("gpiolib: acpi: Move acpi_can_fallback_to_crs() out of __acpi_find_gpio()")
Signed-off-by: Andy Shevchenko <[email protected]>

fs/pidfs: make 'lsof' happy with our inode changes

pidfs started using much saner inodes in commit b28ddcc32d8f ("pidfs:
convert to path_from_stashed() helper"), but that exposed the fact that
lsof had some knowledge of just how odd our old anon_inode usage was.

For example, legacy anon_inodes hadn't even initialized the inode type
in the inode mode, so everything had a type of zero.

So sane tools like 'stat' would report these files as "weird file", but
'lsof' instead used that (together with the name of the link in proc) to
notice that it's an anonymous inode, and used it to detect pidfd files.

Let's keep our internal new sane inode model, but mask the file type
bits at 'stat()' time in the getattr() function we already have, and by
making the dentry name match what lsof expects too.

This keeps our internal models sane, but should make user space see the
same old odd behavior.

Reported-by: Jiri Slaby <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Link: https://github.com/lsof-org/lsof/issues/317
Cc: Alexander Viro <[email protected]>
Cc: Seth Forshee <[email protected]>
Cc: Tycho Andersen <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

openvswitch: Set the skbuff pkt_type for proper pmtud support.

Open vSwitch was originally intended to switch at layer 2, only dealing with
Ethernet frames.  With the introduction of l3 tunnels support, it crossed
into the realm of needing to care a bit about some routing details when
making forwarding decisions.  If an oversized packet would need to be
fragmented during this forwarding decision, there is a chance for pmtu
to get involved and generate a routing exception.  This is gated by the
skbuff->pkt_type field.

When a flow is already loaded into the openvswitch module this field is
set up and transitioned properly as a packet moves from one port to
another.  In the case that a packet execute is invoked after a flow is
newly installed this field is not properly initialized.  This causes the
pmtud mechanism to omit sending the required exception messages across
the tunnel boundary and a second attempt needs to be made to make sure
that the routing exception is properly setup.  To fix this, we set the
outgoing packet's pkt_type to PACKET_OUTGOING, since it can only get
to the openvswitch module via a port device or packet command.

Even for bridge ports as users, the pkt_type needs to be reset when
doing the transmit as the packet is truly outgoing and routing needs
to get involved post packet transformations, in the case of
VXLAN/GENEVE/udp-tunnel packets.  In general, the pkt_type on output
gets ignored, since we go straight to the driver, but in the case of
tunnel ports they go through IP routing layer.

This issue is periodically encountered in complex setups, such as large
openshift deployments, where multiple sets of tunnel traversal occur.
A way to recreate this is with the ovn-heater project that can set up
a networking environment which mimics such large deployments.  We need
larger environments for this because we need to ensure that flow
misses occur.  In these environments, without this patch, we can see:

  ./ovn_cluster.sh start
  podman exec ovn-chassis-1 ip r a 170.168.0.5/32 dev eth1 mtu 1200
  podman exec ovn-chassis-1 ip netns exec sw01p1 ip r flush cache
  podman exec ovn-chassis-1 ip netns exec sw01p1 \
         ping 21.0.0.3 -M do -s 1300 -c2
  PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
  From 21.0.0.3 icmp_seq=2 Frag needed and DF set (mtu = 1142)

  --- 21.0.0.3 ping statistics ---
  ...

Using tcpdump, we can also see the expected ICMP FRAG_NEEDED message is not
sent into the server.

With this patch, setting the pkt_type, we see the following:

  podman exec ovn-chassis-1 ip netns exec sw01p1 \
         ping 21.0.0.3 -M do -s 1300 -c2
  PING 21.0.0.3 (21.0.0.3) 1300(1328) bytes of data.
  From 21.0.0.3 icmp_seq=1 Frag needed and DF set (mtu = 1222)
  ping: local error: message too long, mtu=1222

  --- 21.0.0.3 ping statistics ---
  ...

In this case, the first ping request receives the FRAG_NEEDED message and
a local routing exception is created.

Tested-by: Jaime Caamano <[email protected]>
Reported-at: https://issues.redhat.com/browse/FDP-164
Fixes: 58264848a5a7 ("openvswitch: Add vxlan tunneling support.")
Signed-off-by: Aaron Conole <[email protected]>
Acked-by: Eelco Chaudron <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

nfs: fix undefined behavior in nfs_block_bits()

Shifting *signed int* typed constant 1 left by 31 bits causes undefined
behavior. Specify the correct *unsigned long* type by using 1UL instead.
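
A minimal sketch of the difference, assuming a 32-bit int:
block_size_from_bits() is a hypothetical helper written for this
example and is not the body of nfs_block_bits().

```c
#include <assert.h>
#include <limits.h>

/*
 * Return 2^nrbits as an unsigned long.
 *
 * Writing "1 << 31" instead would shift a signed int into its sign bit,
 * which is undefined behavior when int is 32 bits wide. "1UL << 31" is
 * well defined because the left operand is unsigned.
 */
unsigned long block_size_from_bits(unsigned char nrbits)
{
	assert(nrbits < sizeof(unsigned long) * CHAR_BIT);
	return 1UL << nrbits;
}
```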

Found by Linux Verification Center (linuxtesting.org) with the Svace static
analysis tool.

Cc: [email protected]
Signed-off-by: Sergey Shtylyov <[email protected]>
Reviewed-by: Benjamin Coddington <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

pNFS: rework pnfs_generic_pg_check_layout to check IO range

All callers of pnfs_generic_pg_check_layout() also want to do a call to
check that the layout's range covers the IO range. Merge the functionality
of the pnfs_generic_pg_check_range() into that of
pnfs_generic_pg_check_layout().

Signed-off-by: Olga Kornievskaia <[email protected]>
Reviewed-by: Benjamin Coddington <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

pNFS/filelayout: check layout segment range

Before doing the IO, check that we have the layout covering the range of
IO.

Signed-off-by: Olga Kornievskaia <[email protected]>
Reviewed-by: Benjamin Coddington <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

pNFS/filelayout: fixup pNfs allocation modes

Change left over allocation flags.

Fixes: a245832aaa99 ("pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod")
Signed-off-by: Olga Kornievskaia <[email protected]>
Reviewed-by: Benjamin Coddington <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

Merge branch 'af_unix-fix-gc-and-improve-selftest'

Michal Luczaj says:

====================
af_unix: Fix GC and improve selftest

Series deals with AF_UNIX garbage collector mishandling some in-flight
graph cycles. Embryos carrying OOB packets with SCM_RIGHTS cause issues.

Patch 1/2 fixes the memory leak.
Patch 2/2 tweaks the selftest for a better OOB coverage.

v3:
  - Patch 1/2: correct the commit message (Kuniyuki)

v2: https://lore.kernel.org/netdev/20240516145457.1206847 [email protected]/
  - Patch 1/2: remove WARN_ON_ONCE() (Kuniyuki)
  - Combine both patches into a series (Kuniyuki)

v1: https://lore.kernel.org/netdev/20240516103049.1132040 [email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

selftest: af_unix: Make SCM_RIGHTS into OOB data.

scm_rights.c covers various test cases for inflight file descriptors
and garbage collector for AF_UNIX sockets.

Currently, SCM_RIGHTS messages are sent with a 3-byte string, which is
not good for MSG_OOB cases, as the SCM_RIGHTS cmsg goes with the first
2 bytes, which are non-OOB data.

Let's send SCM_RIGHTS messages with 1-byte character to pack SCM_RIGHTS
into OOB data.
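
For illustration, sending a file descriptor together with a single
payload byte could look like the sketch below. send_fd_with_one_byte()
is a hypothetical helper, not the code in scm_rights.c. Passing MSG_OOB
in flags makes that single byte the OOB byte, which depends on kernel
support for out-of-band data on AF_UNIX sockets.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send one byte plus fd_to_pass as SCM_RIGHTS ancillary data. */
ssize_t send_fd_with_one_byte(int sock, int fd_to_pass, int flags)
{
	char payload = 'x';
	struct iovec iov;
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} control;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&control, 0, sizeof(control));
	memset(&msg, 0, sizeof(msg));
	iov.iov_base = &payload;
	iov.iov_len = 1;	/* one byte only, so no non-OOB bytes precede it */
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

	return sendmsg(sock, &msg, flags);
}
```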

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Michal Luczaj <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS

GC attempts to explicitly drop oob_skb's reference before purging the hit
list.

The problem is with embryos: kfree_skb(u->oob_skb) is never called on an
embryo socket.

The python script below [0] sends a listener's fd to its embryo as OOB
data.  While GC does collect the embryo's queue, it fails to drop the OOB
skb's refcount.  The skb which was in embryo's receive queue stays as
unix_sk(sk)->oob_skb and keeps the listener's refcount [1].

Tell GC to dispose of the embryo's oob_skb.

[0]:
from array import array
from socket import *

addr = '\x00unix-oob'
lis = socket(AF_UNIX, SOCK_STREAM)
lis.bind(addr)
lis.listen(1)

s = socket(AF_UNIX, SOCK_STREAM)
s.connect(addr)
scm = (SOL_SOCKET, SCM_RIGHTS, array('i', [lis.fileno()]))
s.sendmsg([b'x'], [scm], MSG_OOB)
lis.close()

[1]
$ grep unix-oob /proc/net/unix
$ ./unix-oob.py
$ grep unix-oob /proc/net/unix
0000000000000000: 00000002 00000000 00000000 0001 02     0 @unix-oob
0000000000000000: 00000002 00000000 00010000 0001 01  6072 @unix-oob

Fixes: 4090fa373f0e ("af_unix: Replace garbage collection algorithm.")
Signed-off-by: Michal Luczaj <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

tcp: Fix shift-out-of-bounds in dctcp_update_alpha().

In dctcp_update_alpha(), we use a module parameter dctcp_shift_g
as follows:

  alpha -= min_not_zero(alpha, alpha >> dctcp_shift_g);
  ...
  delivered_ce <<= (10 - dctcp_shift_g);

It seems syzkaller started fuzzing module parameters and triggered
shift-out-of-bounds [0] by setting dctcp_shift_g to 100:

  memcpy((void*)0x20000080,
         "/sys/module/tcp_dctcp/parameters/dctcp_shift_g\000", 47);
  res = syscall(__NR_openat, /*fd=*/0xffffffffffffff9cul, /*file=*/0x20000080ul,
                /*flags=*/2ul, /*mode=*/0ul);
  memcpy((void*)0x20000000, "100\000", 4);
  syscall(__NR_write, /*fd=*/r[0], /*val=*/0x20000000ul, /*len=*/4ul);

Let's limit the max value of dctcp_shift_g by param_set_uint_minmax().

With this patch:

  # echo 10 > /sys/module/tcp_dctcp/parameters/dctcp_shift_g
  # cat /sys/module/tcp_dctcp/parameters/dctcp_shift_g
  10
  # echo 11 > /sys/module/tcp_dctcp/parameters/dctcp_shift_g
  -bash: echo: write error: Invalid argument
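
The effect of such a bounded setter can be modelled in userspace as
below. This is only a sketch: set_shift_g_model() and shift_g_model are
hypothetical names, the upper bound of 10 is taken from the session
above, and accepting 0 as the lower bound is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SHIFT_G_MAX 10UL	/* 10 is accepted above, 11 is rejected */

unsigned int shift_g_model = 4;	/* arbitrary starting value for the model */

/* Parse val and store it only if it lies within [0, SHIFT_G_MAX]. */
int set_shift_g_model(const char *val)
{
	char *end;
	unsigned long parsed;

	assert(val != NULL);
	errno = 0;
	parsed = strtoul(val, &end, 10);
	if (errno || end == val || (*end != '\0' && *end != '\n'))
		return -EINVAL;
	if (parsed > SHIFT_G_MAX)
		return -EINVAL;	/* keeps "10 - shift_g" from wrapping */

	shift_g_model = (unsigned int)parsed;
	return 0;
}
```

A rejected write leaves the previous value in place, so the shifts in
dctcp_update_alpha() never see an out-of-range exponent.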

[0]:
UBSAN: shift-out-of-bounds in net/ipv4/tcp_dctcp.c:143:12
shift exponent 100 is too large for 32-bit type 'u32' (aka 'unsigned int')
CPU: 0 PID: 8083 Comm: syz-executor345 Not tainted 6.9.0-05151-g1b294a1f3561 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x201/0x300 lib/dump_stack.c:114
ubsan_epilogue lib/ubsan.c:231 [inline]
__ubsan_handle_shift_out_of_bounds+0x346/0x3a0 lib/ubsan.c:468
dctcp_update_alpha+0x540/0x570 net/ipv4/tcp_dctcp.c:143
tcp_in_ack_event net/ipv4/tcp_input.c:3802 [inline]
tcp_ack+0x17b1/0x3bc0 net/ipv4/tcp_input.c:3948
tcp_rcv_state_process+0x57a/0x2290 net/ipv4/tcp_input.c:6711
tcp_v4_do_rcv+0x764/0xc40 net/ipv4/tcp_ipv4.c:1937
sk_backlog_rcv include/net/sock.h:1106 [inline]
__release_sock+0x20f/0x350 net/core/sock.c:2983
release_sock+0x61/0x1f0 net/core/sock.c:3549
mptcp_subflow_shutdown+0x3d0/0x620 net/mptcp/protocol.c:2907
mptcp_check_send_data_fin+0x225/0x410 net/mptcp/protocol.c:2976
__mptcp_close+0x238/0xad0 net/mptcp/protocol.c:3072
mptcp_close+0x2a/0x1a0 net/mptcp/protocol.c:3127
inet_release+0x190/0x1f0 net/ipv4/af_inet.c:437
__sock_release net/socket.c:659 [inline]
sock_close+0xc0/0x240 net/socket.c:1421
__fput+0x41b/0x890 fs/file_table.c:422
task_work_run+0x23b/0x300 kernel/task_work.c:180
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0x9c8/0x2540 kernel/exit.c:878
do_group_exit+0x201/0x2b0 kernel/exit.c:1027
__do_sys_exit_group kernel/exit.c:1038 [inline]
__se_sys_exit_group kernel/exit.c:1036 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xe4/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x67/0x6f
RIP: 0033:0x7f6c2b5005b6
Code: Unable to access opcode bytes at 0x7f6c2b50058c.
RSP: 002b:00007ffe883eb948 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f6c2b5862f0 RCX: 00007f6c2b5005b6
RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffffc0
R10: 0000000000000006 R11: 0000000000000246 R12: 00007f6c2b5862f0
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
</TASK>

Reported-by: syzkaller <[email protected]>
Reported-by: Yue Sun <[email protected]>
Reported-by: xingwei lee <[email protected]>
Closes: https://lore.kernel.org/netdev/CAEkJfYNJM=cw-8x7_Vmj1J6uYVCWMbbvD=EFmDPVBGpTsqOxEA@mail.gmail.com/
Fixes: e3118e8359bb ("net: tcp: add DCTCP congestion control algorithm")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

regulator: tps6594-regulator: Correct multi-phase configuration

According to section 8.3.2.1.4 "Multi-Phase BUCK Regulator
Configurations" of the TPS6594 PMIC manual (linked), the PMIC ignores
the regulator registers of every buck except the primary one. The
primary buck is BUCK1 for configurations BUCK12, BUCK123 and BUCK1234,
and BUCK3 for BUCK34. Correct the registers mapped for these
configurations accordingly.
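
As an illustration only, the mapping from multi-phase configuration to
primary buck can be written as a small lookup table. This is a
standalone sketch, not the driver's code; the struct, the table and
primary_buck_for() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration; not taken from tps6594-regulator.c. */
struct multi_phase_map {
	const char *config;	/* multi-phase configuration name */
	int primary_buck;	/* buck whose registers the PMIC honours */
};

static const struct multi_phase_map multi_phase_maps[] = {
	{ "buck12",   1 },
	{ "buck123",  1 },
	{ "buck1234", 1 },
	{ "buck34",   3 },
};

/* Return the primary buck number for a configuration, or -1 if unknown. */
int primary_buck_for(const char *config)
{
	size_t i;
	size_t n = sizeof(multi_phase_maps) / sizeof(multi_phase_maps[0]);

	for (i = 0; i < n; i++)
		if (strcmp(multi_phase_maps[i].config, config) == 0)
			return multi_phase_maps[i].primary_buck;
	return -1;
}
```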

Fixes: f17ccc5deb4d ("regulator: tps6594-regulator: Add driver for TI TPS6594 regulators")
Link: https://www.ti.com/lit/gpn/tps6594-q1
Signed-off-by: Neha Malcom Francis <[email protected]>
Link: https://msgid.link/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

selftests/net: use tc rule to filter the na packet

The arp_ndisc_untracked_subnets test uses tcpdump to filter the
unsolicited and untracked NA messages. It sets -e before calling
tcpdump, but if tcpdump filters 0 packets it returns non-zero and
causes the script to exit.

Instead of using slow tcpdump to capture packets, let's use a tc rule
to filter out the NA messages.

At the same time, fix function setup_v6 which only needs one parameter.
Move all the related helpers from forwarding lib.sh to net lib.sh.

Fixes: 0ea7b0a454ca ("selftests: net: arp_ndisc_untracked_subnets: test for arp_accept and accept_untracked_na")
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

ipv6: sr: fix memleak in seg6_hmac_init_algo

seg6_hmac_init_algo returns without cleaning up the previous allocations
if one fails, so it's going to leak all that memory and the crypto tfms.

Update seg6_hmac_exit to only free the memory when allocated, so we can
reuse the code directly.
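
The pattern can be sketched in plain userspace C as below. This is a
minimal illustration under assumed names (hmac_init(), hmac_exit(),
NR_ALGOS and the fail_at parameter are all hypothetical), with malloc()
standing in for the real allocations; it is not the seg6_hmac code.

```c
#include <stdlib.h>

#define NR_ALGOS 2	/* hypothetical count, for illustration only */

struct algo {
	void *tfm;	/* stands in for the crypto transform */
	void *ring;	/* stands in for the per-algorithm buffer */
};

static struct algo algos[NR_ALGOS];

/* Free only what was allocated; safe on a partially set up table. */
void hmac_exit(void)
{
	int i;

	for (i = 0; i < NR_ALGOS; i++) {
		free(algos[i].ring);	/* free(NULL) is a no-op */
		algos[i].ring = NULL;
		free(algos[i].tfm);
		algos[i].tfm = NULL;
	}
}

/* fail_at injects an allocation failure at that index (-1: none). */
int hmac_init(int fail_at)
{
	int i;

	for (i = 0; i < NR_ALGOS; i++) {
		algos[i].tfm = (i == fail_at) ? NULL : malloc(16);
		if (!algos[i].tfm)
			goto error_out;
		algos[i].ring = malloc(64);
		if (!algos[i].ring)
			goto error_out;
	}
	return 0;

error_out:
	hmac_exit();	/* reuse the exit path to undo earlier allocations */
	return -1;
}

/* Number of buffers currently held, to observe leaks in the sketch. */
int hmac_live_buffers(void)
{
	int i, n = 0;

	for (i = 0; i < NR_ALGOS; i++)
		n += (algos[i].tfm != NULL) + (algos[i].ring != NULL);
	return n;
}
```

Because the exit routine tolerates entries that were never allocated,
the init error path does not need its own unwinding logic.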

Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
Reported-by: Sabrina Dubroca <[email protected]>
Closes: https://lore.kernel.org/netdev/Zj3bh-gE7eT6V6aH@hog/
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Sabrina Dubroca <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.

Billy Jheng Bing-Jhong reported a race between __unix_gc() and
queue_oob().

__unix_gc() tries to garbage-collect close()d inflight sockets,
and then if the socket has MSG_OOB in unix_sk(sk)->oob_skb, GC
will drop the reference and set NULL to it locklessly.

However, the peer socket can still send a MSG_OOB message and
queue_oob() can update unix_sk(sk)->oob_skb concurrently, leading
to a NULL pointer dereference. [0]

To fix the issue, let's update unix_sk(sk)->oob_skb under the
sk_receive_queue's lock and take it everywhere we touch oob_skb.

Note that we defer kfree_skb() in manage_oob() to silence a lockdep
false positive (see [1]).

[0]:
BUG: kernel NULL pointer dereference, address: 0000000000000008
PF: supervisor write access in kernel mode
PF: error_code(0x0002) - not-present page
PGD 8000000009f5e067 P4D 8000000009f5e067 PUD 9f5d067 PMD 0
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc5-00191-gd091e579b864 #110
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: events delayed_fput
RIP: 0010:skb_dequeue (./include/linux/skbuff.h:2386 ./include/linux/skbuff.h:2402 net/core/skbuff.c:3847)
Code: 39 e3 74 3e 8b 43 10 48 89 ef 83 e8 01 89 43 10 49 8b 44 24 08 49 c7 44 24 08 00 00 00 00 49 8b 14 24 49 c7 04 24 00 00 00 00 <48> 89 42 08 48 89 10 e8 e7 c5 42 00 4c 89 e0 5b 5d 41 5c c3 cc cc
RSP: 0018:ffffc900001bfd48 EFLAGS: 00000002
RAX: 0000000000000000 RBX: ffff8880088f5ae8 RCX: 00000000361289f9
RDX: 0000000000000000 RSI: 0000000000000206 RDI: ffff8880088f5b00
RBP: ffff8880088f5b00 R08: 0000000000080000 R09: 0000000000000001
R10: 0000000000000003 R11: 0000000000000001 R12: ffff8880056b6a00
R13: ffff8880088f5280 R14: 0000000000000001 R15: ffff8880088f5a80
FS: 0000000000000000(0000) GS:ffff88807dd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000006314000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<TASK>
unix_release_sock (net/unix/af_unix.c:654)
unix_release (net/unix/af_unix.c:1050)
__sock_release (net/socket.c:660)
sock_close (net/socket.c:1423)
__fput (fs/file_table.c:423)
delayed_fput (fs/file_table.c:444 (discriminator 3))
process_one_work (kernel/workqueue.c:3259)
worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
kthread (kernel/kthread.c:388)
ret_from_fork (arch/x86/kernel/process.c:153)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
</TASK>
Modules linked in:
CR2: 0000000000000008

Link: https://lore.kernel.org/netdev/[email protected]/
Fixes: 1279f9d9dec2 ("af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC.")
Reported-by: Billy Jheng Bing-Jhong <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Revert "r8169: don't try to disable interrupts if NAPI is, scheduled already"

This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9.

Ken reported that RTL8125b can lock up if gro_flush_timeout has the
default value of 20000 and napi_defer_hard_irqs is set to 0.
In this scenario device interrupts aren't disabled, which seems to
trigger some silicon bug under heavy load. I was able to reproduce this
behavior on RTL8168h. Fix this by reverting 7274c4147afb.

Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already")
Cc: [email protected]
Reported-by: Ken Milmore <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'pm-cpufreq'

Merge an amd-pstate driver fix for 6.10-rc1:

- Fix a memory leak in the exit path of amd-pstate (Peng Ma).

* pm-cpufreq:
cpufreq: amd-pstate: fix memory leak on CPU EPP exit

KEYS: trusted: Do not use WARN when encode fails

When asn1_encode_sequence() fails, WARN is not the correct solution.

1. asn1_encode_sequence() is not an internal function (located
in lib/asn1_encode.c).
2. Location is known, which makes the stack trace useless.
3. Results in a crash if panic_on_warn is set.

It is also noteworthy that the use of WARN is undocumented, and it
should be avoided unless there is a carefully considered rationale to
use it.

Replace WARN with pr_err, and print the return value instead, which is
the only useful piece of information.

Cc: [email protected] # v5.13+
Fixes: f2219745250f ("security: keys: trusted: use ASN.1 TPM2 key format for the blobs")
Signed-off-by: Jarkko Sakkinen <[email protected]>

KEYS: trusted: Fix memory leak in tpm2_key_encode()

'scratch' is never freed. Fix this by calling kfree() in both the
success and the error case.
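
A common way to free a temporary buffer on every path is a single exit
label. The userspace sketch below shows the idea; encode_key() and
live_allocs are hypothetical names, free() stands in for kfree(), and
the body is not the tpm2_key_encode() implementation.

```c
#include <stdlib.h>
#include <string.h>

int live_allocs;	/* scratch buffers allocated and not yet freed */

/* Copy src to out via a scratch buffer; returns len or -1 on error. */
int encode_key(const char *src, size_t len, char *out, size_t out_len)
{
	char *scratch;
	int ret;

	scratch = malloc(len ? len : 1);
	if (!scratch)
		return -1;
	live_allocs++;

	if (len > out_len) {
		ret = -1;	/* error case: still falls through to out */
		goto out;
	}

	memcpy(scratch, src, len);
	memcpy(out, scratch, len);
	ret = (int)len;

out:
	free(scratch);	/* reached on both success and error */
	live_allocs--;
	return ret;
}
```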

Cc: [email protected] # +v5.13
Fixes: f2219745250f ("security: keys: trusted: use ASN.1 TPM2 key format for the blobs")
Signed-off-by: Jarkko Sakkinen <[email protected]>

Merge tag 'cocci-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux

Pull coccinelle updates from Julia Lawall:
"One patch slightly improves the text in a comment.

  The other patch (on minmax.cocci) removes a report about ? being used
  in return statements that has been generating not very useful
  suggestions to change idiomatic code"

* tag 'cocci-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux:
  Coccinelle: pm_runtime: Fix grammar in comment
  coccinelle: misc: minmax: Suppress reports for err returns

Merge tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic cleanups from Arnd Bergmann:
"These are a few cross-architecture cleanup patches:

   - separate out fbdev support from the asm/video.h contents that may
     be used by either the old fbdev drivers or the newer drm display
     code (Thomas Zimmermann)

   - cleanups for the generic bitops code and asm-generic/bug.h
     (Thorsten Blum)

   - remove the orphaned include/asm-generic/page.h header that used to
     be included by long-removed mmu-less architectures (me)"

* tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  arch: Fix name collision with ACPI's video.o
  bug: Improve comment
  asm-generic: remove unused asm-generic/page.h
  arch: Rename fbdev header and source files
  arch: Remove struct fb_info from video helpers
  arch: Select fbdev helpers with CONFIG_VIDEO
  bitops: Change function return types from long to int

Merge tag 'soc-dt-late-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull more SoC devicetree updates from Arnd Bergmann:
"This is a follow-up to an earlier pull request for device tree
  changes, as three platform maintainers sent their contents too late to
  be included in the main set, but had not caused any further problems
  since then:

   - The Amlogic platform now contains support for two new SoC types,
     the A4 and A5 chips for audio applications. Both come with a
     reference board, and one more dts file gets added for the
     combination of the MNT Reform Laptop with the BPI-CM4 CPU module

   - The ASpeed platform adds support for six additional server
     platforms that use ast2500 or ast2600 as their BMC, while another
     one gets removed

   - The RISC-V platforms from Microchip, Starfive and T-HEAD get
     additional features for existing hardware, plus the addition of the
     Milk-V Mars based on the StarFive VisionFive v2 board"

* tag 'soc-dt-late-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (76 commits)
  riscv: dts: microchip: add pac1934 power-monitor to icicle
  riscv: dts: thead: Fix node ordering in TH1520 device tree
  ARM: dts: aspeed: Add ASRock E3C256D4I BMC
  dt-bindings: arm: aspeed: document ASRock E3C256D4I
  dt-bindings: trivial-devices: add isil,isl69269
  ARM: dts: aspeed: x4tf: Add dts for asus x4tf project
  dt-bindings: arm: aspeed: add ASUS X4TF board
  ARM: dts: aspeed: Remove Facebook Cloudripper dts
  ARM: dts: aspeed: drop unused ref_voltage ADC property
  ARM: dts: aspeed: harma: correct Mellanox multi-host property
  ARM: dts: aspeed: yosemitev2: correct Mellanox multi-host property
  ARM: dts: aspeed: yosemite4: correct Mellanox multi-host property
  ARM: dts: aspeed: greatlakes: correct Mellanox multi-host property
  ARM: dts: aspeed: Modify I2C bus configuration
  ARM: dts: aspeed: Disable unused ADC channels for Asrock X570D4U BMC
  ARM: dts: aspeed: Modify GPIO table for Asrock X570D4U BMC
  ARM: dts: aspeed: yosemite4: set bus13 frequency to 100k
  ARM: dts: Aspeed: Bonnell: Fix NVMe LED labels
  ARM: dts: aspeed: yosemite4: Enable ipmb device for OCP debug card
  ARM: dts: aspeed: ahe50dc: Update lm25066 regulator name
  ...

Merge tag 'vfio-v6.10-rc1' of https://github.com/awilliam/linux-vfio

Pull vfio updates from Alex Williamson:

- The vfio fsl-mc bus driver has become orphaned. We'll consider
   removing it in future releases if a new maintainer isn't found (Alex
   Williamson)

- Improved usage of opaque data in vfio-pci INTx handling, avoiding
   lookups of the eventfd through the interrupt and irqfd runtime paths
   (Alex Williamson)

- Resolve an error path memory leak introduced in vfio-pci interrupt
   code (Ye Bin)

- Addition of interrupt support for vfio devices exposed on the CDX
   bus, including a new MSI allocation helper and export of existing
   helpers for MSI alloc and free (Nipun Gupta)

- A new vfio-pci variant driver supporting migration of Intel QAT VF
   devices for the GEN4 PFs (Xin Zeng & Yahui Cao)

- Resolve a possibly circular locking dependency in vfio-pci by
   avoiding copy_to_user() from a PCI bus walk callback (Alex
   Williamson)

- Trivial docs update to remove a duplicate semicolon (Foryun Ma)

* tag 'vfio-v6.10-rc1' of https://github.com/awilliam/linux-vfio:
  vfio/pci: Restore zero affected bus reset devices warning
  vfio: remove an extra semicolon
  vfio/pci: Collect hot-reset devices to local buffer
  vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices
  vfio/cdx: add interrupt support
  genirq/msi: Add MSI allocation helper and export MSI functions
  vfio/pci: fix potential memory leak in vfio_intx_enable()
  vfio/pci: Pass eventfd context object through irqfd
  vfio/pci: Pass eventfd context to IRQ handler
  MAINTAINERS: Orphan vfio fsl-mc bus driver

Merge tag 'linux_kselftest-next-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
"Revert framework change to add D_GNU_SOURCE to KHDR_INCLUDES to
  Makefile, lib.mk, and kselftest_harness.h and follow-on changes to
  cgroup and sgx test as they are causing build failures and warnings"

* tag 'linux_kselftest-next-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  Revert "selftests/cgroup: Drop define _GNU_SOURCE"
  Revert "selftests/sgx: Include KHDR_INCLUDES in Makefile"
  Revert "selftests: Compile kselftest headers with -D_GNU_SOURCE"

arch: Fix name collision with ACPI's video.o

Commit 2fd001cd3600 ("arch: Rename fbdev header and source files")
renames the video source files under arch/ such that they do not
refer to fbdev any longer. The new files named video.o conflict with
ACPI's video.ko module. Modprobing the ACPI module can then fail with
warnings about missing symbols, as shown below.

  (i915_selftest:1107) igt_kmod-WARNING: i915: Unknown symbol acpi_video_unregister (err -2)
  (i915_selftest:1107) igt_kmod-WARNING: i915: Unknown symbol acpi_video_register_backlight (err -2)
  (i915_selftest:1107) igt_kmod-WARNING: i915: Unknown symbol __acpi_video_get_backlight_type (err -2)
  (i915_selftest:1107) igt_kmod-WARNING: i915: Unknown symbol acpi_video_register (err -2)

Fix the issue by renaming the architecture's video.o to video-common.o.

Reported-by: Chaitanya Kumar Borah <[email protected]>
Closes: https://lore.kernel.org/intel-gfx/[email protected]/T/#t
Signed-off-by: Thomas Zimmermann <[email protected]>
Fixes: 2fd001cd3600 ("arch: Rename fbdev header and source files")
Reviewed-by: Hans de Goede <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Arnd Bergmann <[email protected]>

Merge tag 'f2fs-for-6.10.rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"In this round, we've tried to address some performance issues on zoned
  storage such as direct IO and write_hints. In addition, we've migrated
  some IO paths using folio. Meanwhile, there are multiple bug fixes in
  the compression paths, sanity check conditions, and error handlers.

  Enhancements:
   - allow direct io of pinned files for zoned storage
   - assign the write hint per stream by default
   - convert read paths and test_writeback to folio
   - avoid allocating WARM_DATA segment for direct IO

  Bug fixes:
   - fix false alarm on invalid block address
   - fix to add missing iput() in gc_data_segment()
   - fix to release node block count in error path of
     f2fs_new_node_page()
   - compress:
       - don't allow unaligned truncation on released compress inode
       - cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
       - fix error path of inc_valid_block_count()
       - fix to update i_compr_blocks correctly
   - fix block migration when section is not aligned to pow2
   - don't trigger OPU on pinfile for direct IO
   - fix to do sanity check on i_xattr_nid in sanity_check_inode()
   - write missing last sum blk of file pinning section
   - clear writeback when compression failed
   - fix to adjust appropriate defragment pg_end

  As usual, there are several minor code clean-ups, and fixes to manage
  missing corner cases in the error paths"

* tag 'f2fs-for-6.10.rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (50 commits)
  f2fs: initialize last_block_in_bio variable
  f2fs: Add inline to f2fs_build_fault_attr() stub
  f2fs: fix some ambiguous comments
  f2fs: fix to add missing iput() in gc_data_segment()
  f2fs: allow dirty sections with zero valid block for checkpoint disabled
  f2fs: compress: don't allow unaligned truncation on released compress inode
  f2fs: fix to release node block count in error path of f2fs_new_node_page()
  f2fs: compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
  f2fs: compress: fix error path of inc_valid_block_count()
  f2fs: compress: fix typo in f2fs_reserve_compress_blocks()
  f2fs: compress: fix to update i_compr_blocks correctly
  f2fs: check validation of fault attrs in f2fs_build_fault_attr()
  f2fs: fix to limit gc_pin_file_threshold
  f2fs: remove unused GC_FAILURE_PIN
  f2fs: use f2fs_{err,info}_ratelimited() for cleanup
  f2fs: fix block migration when section is not aligned to pow2
  f2fs: zone: fix to don't trigger OPU on pinfile for direct IO
  f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode()
  f2fs: fix to avoid allocating WARM_DATA segment for direct IO
  f2fs: remove redundant parameter in is_next_segment_free()
  ...

Merge tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Chandan Babu:
"Online repair feature continues to be expanded. Also, we now support
  delayed allocation for realtime devices which have an extent size that
  is equal to filesystem's block size.

  New code:

   - Introduce Parent Pointer extended attribute for inodes

   - Bring back delalloc support for realtime devices which have an
     extent size that is equal to filesystem's block size

   - Improve performance of log incompat feature handling

  Online Repair:

   - Implement atomic file content exchanges i.e. exchange ranges of
     bytes between two files atomically

   - Create temporary files to repair file-based metadata. This uses
     atomic file content exchange facility to swap file fork mappings
     between the temporary file and the metadata inode

   - Allow callers of directory/xattr code to set an explicit owner
     number to be written into the header fields of any new blocks that
     are created. This is required to avoid walking every block of the
     new structure and modify their ownership during online repair

   - Repair more data structures:
       - Extended attributes
       - Inode unlinked state
       - Directories
       - Symbolic links
       - AGI's unlinked inode list
       - Parent pointers

   - Move Orphan files to lost and found directory

   - Fixes for Inode repair functionality

   - Introduce a new sub-AG FITRIM implementation to reduce the duration
     for which the AGF lock is held

   - Updates for the design documentation

   - Use Parent Pointers to assist in checking directories, parent
     pointers, extended attributes, and link counts

  Fixes:

   - Prevent userspace from reading invalid file data due to an
     incorrect update of the file size when performing a non-atomic
     clone operation

   - Minor fixes to online repair

   - Fix confusing return values from xfs_bmapi_write()

   - Fix an out of bounds access due to incorrect h_size during log
     recovery

   - Defer upgrading the extent counters in xfs_reflink_end_cow_extent()
     until we know we are going to modify the extent mapping

   - Remove racy access to if_bytes check in
     xfs_reflink_end_cow_extent()

   - Fix sparse warnings

  Cleanups:

   - Hold inode locks on all files involved in a rename until the
     completion of the operation. This is in preparation for the parent
     pointers patchset where parent pointers are applied in a separate
     chained update from the actual directory update

   - Compile out v4 support when disabled

   - Cleanup xfs_extent_busy_clear()

   - Remove unused flags and fields from struct xfs_da_args

   - Remove definitions of unused functions

   - Improve extended attribute validation

   - Add higher level directory operations helpers to remove duplication
     of code

   - Cleanup quota (un)reservation interfaces"

* tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (221 commits)
  xfs: simplify iext overflow checking and upgrade
  xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent
  xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later
  xfs: xfs_quota_unreserve_blkres can't fail
  xfs: consolidate the xfs_quota_reserve_blkres definitions
  xfs: clean up buffer allocation in xlog_do_recovery_pass
  xfs: fix log recovery buffer allocation for the legacy h_size fixup
  xfs: widen flags argument to the xfs_iflags_* helpers
  xfs: minor cleanups of xfs_attr3_rmt_blocks
  xfs: create a helper to compute the blockcount of a max sized remote value
  xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function
  xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
  xfs: do not allocate the entire delalloc extent in xfs_bmapi_write
  xfs: fix xfs_bmap_add_extent_delay_real for partial conversions
  xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate
  xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate
  xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write
  xfs: lift a xfs_valid_startblock into xfs_bmapi_allocate
  xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate
  xfs: fix error returns from xfs_bmapi_write
  ...

dm: always manage discard support in terms of max_hw_discard_sectors

Commit 4f563a64732d ("block: add a max_user_discard_sectors queue
limit") changed block core to set max_discard_sectors to:
min(lim->max_hw_discard_sectors, lim->max_user_discard_sectors)

Since commit 1c0e720228ad ("dm: use queue_limits_set") it was reported
dm-thinp was failing in a few fstests (generic/347 and generic/405)
with the first WARN_ON_ONCE in dm_cell_key_has_valid_range() being
reported, e.g.:
WARNING: CPU: 1 PID: 30 at drivers/md/dm-bio-prison-v1.c:128 dm_cell_key_has_valid_range+0x3d/0x50

blk_set_stacking_limits() sets max_user_discard_sectors to UINT_MAX,
so given how block core now sets max_discard_sectors (detailed above)
it follows that blk_stack_limits() stacks up the underlying device's
max_hw_discard_sectors and max_discard_sectors is set to match it. If
max_hw_discard_sectors exceeds dm's BIO_PRISON_MAX_RANGE, then
dm_cell_key_has_valid_range() will trigger the warning with:
WARN_ON_ONCE(key->block_end - key->block_begin > BIO_PRISON_MAX_RANGE)

Aside from this warning, the discard will fail. Fix this and other DM
issues by governing discard support in terms of max_hw_discard_sectors
instead of max_discard_sectors.
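
The interaction can be modelled with a small standalone sketch. The
struct and helpers below are simplified stand-ins rather than the block
layer's definitions, and the BIO_PRISON_MAX_RANGE value is assumed for
illustration only. It shows why a cap has to be applied to the hardware
limit: the effective limit is recomputed from it.

```c
#include <limits.h>

#define BIO_PRISON_MAX_RANGE 1024u	/* assumed value, illustration only */

/* Simplified stand-in for the relevant queue limits. */
struct limits {
	unsigned int max_hw_discard_sectors;
	unsigned int max_user_discard_sectors;
	unsigned int max_discard_sectors;	/* derived, see below */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Effective limit = min(hardware limit, user limit). */
void update_discard_limit(struct limits *lim)
{
	lim->max_discard_sectors = min_uint(lim->max_hw_discard_sectors,
					    lim->max_user_discard_sectors);
}

/* Hypothetical target hook: cap the hardware limit, not the derived one. */
void cap_hw_discard(struct limits *lim, unsigned int cap)
{
	if (lim->max_hw_discard_sectors > cap)
		lim->max_hw_discard_sectors = cap;
}
```

With max_user_discard_sectors at UINT_MAX, the derived value simply
follows max_hw_discard_sectors, so capping only max_discard_sectors
would be overwritten by the next recomputation.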

Reported-by: Theodore Ts'o <[email protected]>
Fixes: 1c0e720228ad ("dm: use queue_limits_set")
Signed-off-by: Mike Snitzer <[email protected]>

Merge tag 'fs_for_v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull isofs, udf, quota, ext2, and reiserfs updates from Jan Kara:

- convert isofs to the new mount API

- cleanup isofs Makefile

- udf conversion to folios

- some other small udf cleanups and fixes

- ext2 cleanups

- removal of reiserfs .writepage method

- update reiserfs README file

* tag 'fs_for_v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  isofs: Use *-y instead of *-objs in Makefile
  ext2: Remove LEGACY_DIRECT_IO dependency
  isofs: Remove calls to set/clear the error flag
  ext2: Remove call to folio_set_error()
  udf: Use a folio in udf_write_end()
  udf: Convert udf_page_mkwrite() to use a folio
  udf: Convert udf_symlink_getattr() to use a folio
  udf: Convert udf_adinicb_readpage() to udf_adinicb_read_folio()
  udf: Convert udf_expand_file_adinicb() to use a folio
  udf: Convert udf_write_begin() to use a folio
  udf: Convert udf_symlink_filler() to use a folio
  reiserfs: Trim some README bits
  quota: fix to propagate error of mark_dquot_dirty() to caller
  reiserfs: Convert to writepages
  udf: udftime: prevent overflow in udf_disk_stamp_to_time()
  ext2: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
  udf: replace deprecated strncpy/strcpy with strscpy
  udf: Remove second semicolon
  isofs: convert isofs to use the new mount API
  fs: quota: use group allocation of per-cpu counters API

dm-integrity: set discard_granularity to logical block size

dm-integrity could set discard_granularity lower than the logical block
size. This could result in failures when sending discard requests to
dm-integrity.

This fix is needed for kernels prior to 6.10.

Signed-off-by: Mikulas Patocka <[email protected]>
Reported-by: Eric Wheeler <[email protected]>
Cc: [email protected] # <= 6.9
Signed-off-by: Mike Snitzer <[email protected]>

Revert "fanotify: remove unneeded sub-zero check for unsigned value"

This reverts commit e6595224464b692ddae193d783402130d1625147.

These kinds of patches are only making the code worse.

Compilers don't care about the unnecessary check, but removing it makes
the code less obvious to a human. The declaration of 'len' is more than
80 lines earlier, so a human won't easily see that 'len' is of an
unsigned type, so to a human the range check that checks against zero is
much more explicit and obvious.

Any tool that complains about a range check like this just because the
variable is unsigned is actively detrimental, and should be ignored.

Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'fsnotify_for_v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:

- reduce overhead of fsnotify infrastructure when no permission events
   are in use

- a few small cleanups

* tag 'fsnotify_for_v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: fix UAF from FS_ERROR event on a shutting down filesystem
  fsnotify: optimize the case of no permission event watchers
  fsnotify: use an enum for group priority constants
  fsnotify: move s_fsnotify_connectors into fsnotify_sb_info
  fsnotify: lazy attach fsnotify_sb_info state to sb
  fsnotify: create helper fsnotify_update_sb_watchers()
  fsnotify: pass object pointer and type to fsnotify mark helpers
  fanotify: merge two checks regarding add of ignore mark
  fsnotify: create a wrapper fsnotify_find_inode_mark()
  fsnotify: create helpers to get sb and connp from object
  fsnotify: rename fsnotify_{get,put}_sb_connectors()
  fsnotify: Avoid -Wflex-array-member-not-at-end warning
  fanotify: remove unneeded sub-zero check for unsigned value

Coccinelle: pm_runtime: Fix grammar in comment

s/does not use unnecessary/do not unnecessarily use/

Signed-off-by: Thorsten Blum <[email protected]>
Signed-off-by: Julia Lawall <[email protected]>

coccinelle: misc: minmax: Suppress reports for err returns

Most people prefer:

return ret < 0 ? ret : 0;

over:

return min(ret, 0);

Let's tweak the cocci file to ignore those lines completely.
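
For reference, the two forms compute the same value, which is why the
report was only a style suggestion. The sketch below uses a simplified
MIN() macro as a stand-in for the kernel's type-checked min(); both
function names are hypothetical.

```c
/* Simplified stand-in for the kernel's min(); no type checking. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Idiomatic error return: pass errors through, map success to 0. */
int err_or_zero(int ret)
{
	return ret < 0 ? ret : 0;
}

/* Equivalent form that the minmax script used to suggest. */
int min_with_zero(int ret)
{
	return MIN(ret, 0);
}
```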

Signed-off-by: Ricardo Ribalda <[email protected]>
Signed-off-by: Julia Lawall <[email protected]>

regulator: tps6287x: Force writing VSEL bit

The data-sheet for TPS6287x-Q1
https://www.ti.com/lit/ds/symlink/tps62873-q1.pdf
states at chapter 9.3.6.1 Output Voltage Range:

"Note that every change to the VRANGE[1:0] bits must be followed by a
write to the VSET register, even if the value of the VSET[7:0] bits does
not change."

The current implementation of the driver uses the
regulator_set_voltage_sel_pickable_regmap() helper, which in turn uses
regmap_update_bits() to write the VSET register. regmap_update_bits()
does not access the hardware if the new register value is the same as
the old one. This is also true when the register is marked volatile,
which I can't say is wrong, because a read-modify-write cycle on a
volatile register is in any case something the user should consider
carefully.

The 'range_applied_by_vsel' flag in the regulator desc was added to
force the vsel register updates by using regmap_write_bits(). This
variant always writes the bits to the hardware unconditionally.
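
The difference between the two calls can be modelled as follows. This
is a toy model under stated assumptions: struct fake_regmap and the
fake_*() helpers are hypothetical and only mimic the "skip the write if
nothing changes" behaviour described above; they are not the regmap
implementation.

```c
#include <stdbool.h>

/* Toy register with a counter for writes that reach the "hardware". */
struct fake_regmap {
	unsigned int value;
	unsigned int hw_writes;
};

static void fake_update_bits_base(struct fake_regmap *map, unsigned int mask,
				  unsigned int val, bool force)
{
	unsigned int new_val = (map->value & ~mask) | (val & mask);

	/* Without force, an unchanged value never reaches the hardware. */
	if (force || new_val != map->value) {
		map->value = new_val;
		map->hw_writes++;
	}
}

/* Models regmap_update_bits(): write only when the value changes. */
void fake_update_bits(struct fake_regmap *map, unsigned int mask,
		      unsigned int val)
{
	fake_update_bits_base(map, mask, val, false);
}

/* Models regmap_write_bits(): always write. */
void fake_write_bits(struct fake_regmap *map, unsigned int mask,
		     unsigned int val)
{
	fake_update_bits_base(map, mask, val, true);
}
```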

It is worth noting that the vsel is now forced to be written to the
hardware, whether the range was changed or not. This may cause a
performance drop if users are writing the same voltage value repeatedly.

It would be possible to read the range register to determine if it was
changed, but this would be a performance issue for users who don't use
reg cache for vsel.

Always write the VSET register to the hardware regardless of the cache.

Signed-off-by: Matti Vaittinen <[email protected]>
Fixes: 7b0518fbf2be ("regulator: Add support for TI TPS6287x regulators")
Link: https://msgid.link/r/ZktD50C5twF1EuKu@fedora
Signed-off-by: Mark Brown <[email protected]>

regulator: pickable ranges: don't always cache vsel

Some PMICs treat the vsel_reg the same as an apply bit. E.g., when the
voltage range is changed, the new voltage setting does not take effect
until the vsel register is written.

Add a flag 'range_applied_by_vsel' to the regulator desc to indicate this
behaviour and to force the vsel value to be written to hardware if the
range was changed, even if the old selector was the same as the new one.
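
The decision described above can be sketched like this. It is a
simplified userspace model, not the regulator core code: only the flag
name comes from this commit, while struct fake_desc, struct fake_hw and
set_voltage_sel() are hypothetical.

```c
#include <stdbool.h>

struct fake_desc {
	bool range_applied_by_vsel;	/* flag introduced by this commit */
};

struct fake_hw {
	unsigned int range;
	unsigned int vsel;
	unsigned int vsel_writes;	/* vsel writes that reached hardware */
};

/* Set range and selector; write vsel when needed to apply the range. */
void set_voltage_sel(const struct fake_desc *desc, struct fake_hw *hw,
		     unsigned int range, unsigned int sel)
{
	bool range_updated = (hw->range != range);

	hw->range = range;

	/* A changed range only takes effect once vsel is written. */
	if (sel != hw->vsel ||
	    (desc->range_applied_by_vsel && range_updated)) {
		hw->vsel = sel;
		hw->vsel_writes++;
	}
}
```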

Signed-off-by: Matti Vaittinen <[email protected]>
Link: https://msgid.link/r/ZktCpcGZdgHWuN_L@fedora
Signed-off-by: Mark Brown <[email protected]>

Merge tag 'dma-mapping-6.10-2024-05-20' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

- optimize DMA sync calls when they are no-ops (Alexander Lobakin)

- fix swiotlb padding for untrusted devices (Michael Kelley)

- add documentation for swiotlb (Michael Kelley)

* tag 'dma-mapping-6.10-2024-05-20' of git://git.infradead.org/users/hch/dma-mapping:
  dma: fix DMA sync for drivers not calling dma_set_mask*()
  xsk: use generic DMA sync shortcut instead of a custom one
  page_pool: check for DMA sync shortcut earlier
  page_pool: don't use driver-set flags field directly
  page_pool: make sure frag API fields don't span between cachelines
  iommu/dma: avoid expensive indirect calls for sync operations
  dma: avoid redundant calls for sync operations
  dma: compile-out DMA sync op calls when not used
  iommu/dma: fix zeroing of bounce buffer padding used by untrusted devices
  swiotlb: remove alloc_size argument to swiotlb_tbl_map_single()
  Documentation/core-api: add swiotlb documentation

Merge tag 'mips_6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS updates from Thomas Bogendoerfer:
"Just cleanups and fixes"

* tag 'mips_6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (24 commits)
  MIPS: Take in account load hazards for HI/LO restoring
  MIPS: SGI-IP27: use WARN_ON() output
  MIPS: SGI-IP27: fix -Wunused-variable in arch_init_irq()
  MIPS: SGI-IP27: micro-optimize arch_init_irq()
  mips: dts: ralink: mt7621: reorder the attributes of the root node
  mips: dts: ralink: mt7621: reorder pci?_phy attributes
  mips: dts: ralink: mt7621: reorder pcie node attributes and children
  mips: dts: ralink: mt7621: reorder ethernet node attributes and kids
  mips: dts: ralink: mt7621: reorder gic node attributes
  mips: dts: ralink: mt7621: reorder mmc node attributes
  mips: dts: ralink: mt7621: move pinctrl and sort its children
  mips: dts: ralink: mt7621: reorder spi0 node attributes
  mips: dts: ralink: mt7621: reorder i2c node attributes
  mips: dts: ralink: mt7621: reorder gpio node attributes
  mips: dts: ralink: mt7621: reorder sysc node attributes
  mips: dts: ralink: mt7621: reorder mmc regulator attributes
  mips: dts: ralink: mt7621: reorder cpuintc node attributes
  mips: dts: ralink: mt7621: reorder cpu node attributes
  MIPS: Add prototypes for plat_post_relocation() and relocate_kernel()
  MIPS: Octeon: Add PCIe link status check
  ...

Merge tag 'dmi-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

Pull dmi updates from Jean Delvare:
"Bug fixes:

   - KCFI violation in dmi-id

   - stop decoding on broken (short) DMI table entry

  New features:

   - print info about populated memory slots at boot"

* tag 'dmi-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  firmware: dmi: Add info message for number of populated and total memory slots
  firmware: dmi: Stop decoding on broken entry
  firmware: dmi-id: add a release callback function

Merge tag 'linux-watchdog-6.10-rc1' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

- Add Lenovo SE10 platform Watchdog Driver

- Other small fixes and improvements

* tag 'linux-watchdog-6.10-rc1' of git://www.linux-watchdog.org/linux-watchdog:
  watchdog: LENOVO_SE10_WDT should depend on X86 && DMI
  watchdog: sa1100: Fix PTR_ERR_OR_ZERO() vs NULL check in sa1100dog_probe()
  watchdog: rti_wdt: Set min_hw_heartbeat_ms to accommodate a safety margin
  watchdog: add HAS_IOPORT dependencies
  watchdog/wdt-main: Use cpumask_of() to avoid cpumask var on stack
  watchdog: bd9576: Drop "always-running" property
  watchdog: mtx-1: drop driver owner assignment
  watchdog: cpu5wdt.c: Fix use-after-free bug caused by cpu5wdt_trigger
  watchdog: lenovo_se10_wdt: Watchdog driver for Lenovo SE10 platform

Merge tag 'i2c-for-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:
"i2c core removes an argument from the i2c_mux_add_adapter() call to
  further deprecate class based I2C device instantiation. All users are
  converted, too.

  Other than that, Andi collected a number of I2C host driver patches.
  Those merges have their own description"

* tag 'i2c-for-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (72 commits)
  power: supply: sbs-manager: Remove class argument from i2c_mux_add_adapter()
  i2c: mux: Remove class argument from i2c_mux_add_adapter()
  i2c: synquacer: Fix an error handling path in synquacer_i2c_probe()
  i2c: acpi: Unbind mux adapters before delete
  i2c: designware: Replace MODULE_ALIAS() with MODULE_DEVICE_TABLE()
  i2c: pxa: use 'time_left' variable with wait_event_timeout()
  i2c: s3c2410: use 'time_left' variable with wait_event_timeout()
  i2c: rk3x: use 'time_left' variable with wait_event_timeout()
  i2c: qcom-geni: use 'time_left' variable with wait_for_completion_timeout()
  i2c: jz4780: use 'time_left' variable with wait_for_completion_timeout()
  i2c: synquacer: use 'time_left' variable with wait_for_completion_timeout()
  i2c: stm32f7: use 'time_left' variable with wait_for_completion_timeout()
  i2c: stm32f4: use 'time_left' variable with wait_for_completion_timeout()
  i2c: st: use 'time_left' variable with wait_for_completion_timeout()
  i2c: omap: use 'time_left' variable with wait_for_completion_timeout()
  i2c: imx-lpi2c: use 'time_left' variable with wait_for_completion_timeout()
  i2c: hix5hd2: use 'time_left' variable with wait_for_completion_timeout()
  i2c: exynos5: use 'time_left' variable with wait_for_completion_timeout()
  i2c: digicolor: use 'time_left' variable with wait_for_completion_timeout()
  i2c: amd-mp2-plat: use 'time_left' variable with wait_for_completion_timeout()
  ...

Merge tag 'pinctrl-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control updates from Linus Walleij:
"Core changes:

   - Use DEFINE_SHOW_STORE_ATTRIBUTE() in debugfs entries

  New drivers:

   - Qualcomm PMIH0108, PMD8028, PMXR2230 and PM6450 pin control support

  Improvements:

   - Serious cleanup of the recently merged aw9523 driver

   - Fix PIN_CONFIG_BIAS_DISABLE handling in pinctrl-single

   - A slew of device tree binding cleanups

   - Support a bus clock in the Samsung driver"

* tag 'pinctrl-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (48 commits)
  pinctrl: bcm2835: Make pin freeing behavior configurable
  dt-bindings: pinctrl: qcom,pmic-gpio: Fix "comptaible" typo for PMIH0108
  pinctrl: qcom: pinctrl-sm7150: Fix sdc1 and ufs special pins regs
  dt-bindings: pinctrl: mediatek: mt7622: add "antsel" function
  dt-bindings: pinctrl: mediatek: mt7622: fix array properties
  pinctrl: samsung: drop redundant drvdata assignment
  pinctrl: samsung: support a bus clock
  dt-bindings: pinctrl: samsung: google,gs101-pinctrl needs a clock
  pinctrl: renesas: rzg2l: Limit 2.5V power supply to Ethernet interfaces
  pinctrl: renesas: r8a779h0: Add INTC-EX pins, groups, and function
  pinctrl: renesas: r8a779h0: Fix IRQ suffixes
  pinctrl: renesas: rzg2l: Remove extra space in function parameter
  dt-bindings: pinctrl: qcom,pmic-mpp: add support for PM8901
  pinctrl: pinconf-generic: print hex value
  pinctrl: realtek: fix module autoloading
  pinctrl: qcom: sm7150: fix module autoloading
  pinctrl: loongson2: fix module autoloading
  pinctrl: mediatek: fix module autoloading
  pinctrl: freescale: imx8ulp: fix module autoloading
  dt-bindings: pinctrl: qcom,pmic-gpio: Allow gpio-hog nodes
  ...

Merge tag 'v6.10-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
"Fix a bug in the new ecc P521 code as well as a buggy fix in qat"

* tag 'v6.10-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ecc - Prevent ecc_digits_from_bytes from reading too many bytes
  crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak

rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL

Under the scenario of IB device bonding, when bringing down one of the
ports, or all ports, we saw xprtrdma entering a non-recoverable state
where it is not even possible to complete the disconnect and shut down
the mount, requiring a reboot. After debugging, we saw that the
transport connect never ended after receiving the
RDMA_CM_EVENT_DEVICE_REMOVAL callback.

The DEVICE_REMOVAL callback is issued irrespective of whether the CM_ID
is connected, and ESTABLISHED may not have happened. So we need to
handle each of these states accordingly.

Fixes: 2acc5cae2923 ('xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed')
Cc: Sagi Grimberg <[email protected]>
Signed-off-by: Dan Aloni <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

NFS: Don't enable NFS v2 by default

This came up during one of the Bake-a-thon discussions. NFS v2 support
was dropped from nfs-utils/mount.nfs in December 2021. Let's turn it
off by default in the kernel too, since this means there isn't a way
to mount and test it.

Signed-off-by: Anna Schumaker <[email protected]>
Reviewed-by: Jeffrey Layton <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS

Olga showed me a case where the client was sending multiple READ_PLUS
calls to the server in parallel, and the server replied
NFS4ERR_OPNOTSUPP to each. The client would fall back to READ for the
first reply, but fail to retry the other calls.

I fix this by removing the test for NFS_CAP_READ_PLUS in
nfs4_read_plus_not_supported(). This allows us to reschedule any
READ_PLUS call that has a NFS4ERR_OPNOTSUPP return value, even after the
capability has been cleared.
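
The retry decision described above can be sketched as a small userspace
model. All names below (server_model, call_model, CAP_READ_PLUS,
read_plus_not_supported) are hypothetical stand-ins rather than the
kernel's structures, and EOPNOTSUPP stands in for the error the kernel
sees. The point is only that the fallback no longer depends on the
capability bit still being set:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; not the real NFS code. */
#define CAP_READ_PLUS 0x1u

enum rpc_op { OP_READ, OP_READ_PLUS };

struct server_model { unsigned int caps; };
struct call_model { enum rpc_op op; int status; };

/*
 * Returns true when the call was switched to a plain READ and should be
 * rescheduled. The capability bit is deliberately not part of the
 * condition, so replies that arrive after the bit was cleared by an
 * earlier reply are still retried.
 */
bool read_plus_not_supported(struct server_model *srv, struct call_model *call)
{
	if (call->op == OP_READ_PLUS && call->status == -EOPNOTSUPP) {
		srv->caps &= ~CAP_READ_PLUS;	/* stop issuing READ_PLUS */
		call->op = OP_READ;		/* retry this call as READ */
		call->status = 0;
		return true;
	}
	return false;
}
```

With two parallel READ_PLUS calls that both fail, the first call clears
the capability and the second is still converted to READ and retried.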

Reported-by: Olga Kornievskaia <[email protected]>
Fixes: c567552612ec ("NFS: Add READ_PLUS data segment support")
Cc: [email protected] # v5.10+
Signed-off-by: Anna Schumaker <[email protected]>
Reviewed-by: Benjamin Coddington <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

sunrpc: fix NFSACL RPC retry on soft mount

Until commit 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()') in
2012, `cl_timeout` was copied in so that all mount parameters
propagated to NFSACL clients. Since that change, however, if the
following mount options are given:

    soft,timeo=50,retrans=16,vers=3

The resultant NFSACL client receives:

    cl_softrtry: 1
    cl_timeout: to_initval=60000, to_maxval=60000, to_increment=0, to_retries=2, to_exponential=0

With these values, NFSACL operations are not retried during transient
network outages on a soft mount. Instead, the getacl call fails after
60 seconds with EIO.

The simple fix is to pass the existing client's `cl_timeout` as the new
client timeout.
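
As a rough illustration of the idea, here is a userspace model. The
struct and function names (rpc_timeout_model, rpc_client_model,
clone_client) and the inherit_timeout switch are hypothetical and exist
only to contrast the two behaviours. The field names and the default
values are taken from the output quoted above:

```c
#include <stdbool.h>

/* Simplified model of the timeout parameters shown in the log above. */
struct rpc_timeout_model {
	unsigned long to_initval;
	unsigned long to_maxval;
	unsigned long to_increment;
	unsigned int to_retries;
	int to_exponential;
};

struct rpc_client_model {
	int cl_softrtry;
	const struct rpc_timeout_model *cl_timeout;
};

/* The values the NFSACL client was observed to receive. */
const struct rpc_timeout_model default_timeout = { 60000, 60000, 0, 2, 0 };

/*
 * Hypothetical clone helper. With inherit_timeout set, the child reuses
 * the parent's timeout, which is what the fix amounts to. Without it,
 * the child falls back to the defaults and loses the mount's settings.
 */
struct rpc_client_model clone_client(const struct rpc_client_model *parent,
				     bool inherit_timeout)
{
	struct rpc_client_model child = {
		.cl_softrtry = parent->cl_softrtry,
		.cl_timeout = inherit_timeout ? parent->cl_timeout
					      : &default_timeout,
	};
	return child;
}
```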

Cc: Chuck Lever <[email protected]>
Cc: Benjamin Coddington <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/T/
Fixes: 1b63a75180c6 ('SUNRPC: Refactor rpc_clone_client()')
Signed-off-by: Dan Aloni <[email protected]>
Reviewed-by: Benjamin Coddington <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

SUNRPC: fix handling expired GSS context

In the case where we have received a successful reply to an RPC request,
but while processing the reply the client in rpc_decode_header() finds
an expired context, the code ends up propagating the error to the caller
instead of getting a new context and retrying the request.

In more detail: rpc_decode_header() calls rpcauth_checkverf(), which
calls into the GSS code and eventually reaches gss_validate(), which
checks whether the current context's lifetime has expired and fails if
so. The reason for the failure gets "scrubbed" and translated to
EACCES, so when we get back to rpc_decode_header() we just go to
"out_verifier", which for that error gets converted to "out_garbage"
(i.e. it is treated as a garbled reply), and the next action is
call_encode. That doesn't re-encode or re-send (not to mention that no
upcall happens, because the context expiry is simply not known as the
reason), and it fails again in the same decoding process. After
retrying 3 times, the error is propagated back to the caller
(i.e. nfs4_write_done_cb() in the case of a failing write).

To fix this, we instead need to look at the case where the server
decides that the context has expired and replies with an RPC auth error.
In that case, rpc_decode_header() goes to "out_msg_denied" and returns
EKEYREJECTED, which call_decode() sends to "call_reserve", which
triggers an upcall and a retry of the operation.

The proposed fix is, in the case of a failed rpc_decode_header(), to
check whether the credentials were marked invalid, use that as a proxy
for deciding that the context has expired, and then treat it the same
way as receiving an auth error.

Signed-off-by: Olga Kornievskaia <[email protected]>
Reviewed-by: Benjamin Coddington <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

nfs: keep server info for remounts

With newer kernels that use fs_context for nfs mounts, remounts fail with
-EINVAL.

$ mount -t nfs -o nolock 10.0.0.1:/tmp/test /mnt/test/
$ mount -t nfs -o remount /mnt/test/
mount: mounting 10.0.0.1:/tmp/test on /mnt/test failed: Invalid argument

For remounts, the nfs server address and port are populated by
nfs_init_fs_context and later overwritten with 0x00 bytes by
nfs23_parse_monolithic. The remount then fails as the server address is
invalid.

Fix this by not overwriting nfs server info in nfs23_parse_monolithic if
we're doing a remount.

Fixes: f2aedb713c28 ("NFS: Add fs_context support.")
Signed-off-by: Martin Kaiser <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

NFSv4: Fixup smatch warning for ambiguous return

Dan Carpenter reports a smatch warning for nfs4_try_migration() when a memory
allocation failure results in a zero return value. In this case, a
transient allocation failure error will likely be retried the next time the
server responds with NFS4ERR_MOVED.

We can fixup the smatch warning with a small refactor: attempt all three
allocations before testing and returning on a failure.
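
The "allocate everything, then test once" shape can be sketched in plain
userspace C. The names below (migration_bufs, model_alloc,
allocs_until_failure) are hypothetical, the buffer sizes are arbitrary,
and the -ENOMEM return is an illustrative choice, not necessarily what
nfs4_try_migration() returns:

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct migration_bufs {
	void *page;
	void *locations;
	void *fattr;
};

/* Fault injection for the sketch: negative means "never fail". */
int allocs_until_failure = -1;

void *model_alloc(size_t size)
{
	if (allocs_until_failure == 0)
		return NULL;
	if (allocs_until_failure > 0)
		allocs_until_failure--;
	return malloc(size);
}

void release_migration_bufs(struct migration_bufs *b)
{
	free(b->page);		/* free(NULL) is a no-op */
	free(b->locations);
	free(b->fattr);
	b->page = b->locations = b->fattr = NULL;
}

/* Attempt all three allocations first, then test them in one place. */
int alloc_migration_bufs(struct migration_bufs *b)
{
	b->page = model_alloc(4096);
	b->locations = model_alloc(256);
	b->fattr = model_alloc(128);

	if (!b->page || !b->locations || !b->fattr) {
		release_migration_bufs(b);
		return -ENOMEM;
	}
	return 0;
}
```

A single failure check means there is one exit path whose return value
has to be reasoned about, instead of one per allocation.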

Reported-by: Dan Carpenter <[email protected]>
Fixes: c3ed222745d9 ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.")
Signed-off-by: Benjamin Coddington <[email protected]>
Reviewed-by: Dan Carpenter <[email protected]>
Reviewed-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

NFS: make sure lock/nolock overriding local_lock mount option

Currently, the lock/nolock and local_lock mount options may override
each other's NFS_MOUNT_LOCAL_FLOCK and NFS_MOUNT_LOCAL_FCNTL flags,
so the result depends on the order in which they are passed:

mount -o vers=3,local_lock=all,lock:
local_lock=none

mount -o vers=3,lock,local_lock=all:
local_lock=all

This patch lets lock/nolock override the local_lock option,
as nfs(5) suggests.
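
One way to get order-independent behaviour is to remember an explicit
lock/nolock and apply it after all options have been parsed. The sketch
below is a userspace model under that assumption. The names (mount_ctx,
parse_option, finish_parse, parse_options) and the flag values are
hypothetical, and this is not necessarily how the patch implements it:

```c
#include <stddef.h>
#include <string.h>

#define LOCAL_FLOCK 0x1u
#define LOCAL_FCNTL 0x2u
#define LOCAL_BOTH  (LOCAL_FLOCK | LOCAL_FCNTL)

enum lock_state { LOCK_UNSET, LOCK_ON, LOCK_OFF };

struct mount_ctx {
	unsigned int flags;	/* local_lock flags */
	enum lock_state lock;	/* explicit lock/nolock, if any */
};

/* Record each option; lock/nolock is only remembered at this stage. */
void parse_option(struct mount_ctx *ctx, const char *opt)
{
	if (strcmp(opt, "lock") == 0)
		ctx->lock = LOCK_ON;
	else if (strcmp(opt, "nolock") == 0)
		ctx->lock = LOCK_OFF;
	else if (strcmp(opt, "local_lock=all") == 0)
		ctx->flags |= LOCAL_BOTH;
	else if (strcmp(opt, "local_lock=none") == 0)
		ctx->flags &= ~LOCAL_BOTH;
}

/* Apply the explicit lock/nolock last so that it always wins. */
void finish_parse(struct mount_ctx *ctx)
{
	if (ctx->lock == LOCK_ON)
		ctx->flags &= ~LOCAL_BOTH;
	else if (ctx->lock == LOCK_OFF)
		ctx->flags |= LOCAL_BOTH;
}

/* Parse a whole option list and return the resulting local_lock flags. */
unsigned int parse_options(const char **opts, size_t n)
{
	struct mount_ctx ctx = { 0, LOCK_UNSET };

	for (size_t i = 0; i < n; i++)
		parse_option(&ctx, opts[i]);
	finish_parse(&ctx);
	return ctx.flags;
}
```

In this model, both "local_lock=all,lock" and "lock,local_lock=all" end
up with no local locking flags set.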

Signed-off-by: Chen Hanxiao <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.

With two clients, each with NFSv3 mounts of the same directory, the sequence:

   client1            client2
  ls -l afile
                      echo hello there > afile
  echo HELLO > afile
  cat afile

will show
   HELLO
   there

because the O_TRUNC requested in the final 'echo' doesn't take effect.
This is because the "Negative dentry, just create a file" section in
lookup_open() assumes that the file *does* get created since the dentry
was negative, so it sets FMODE_CREATED, and this causes do_open() to
clear O_TRUNC and so the file doesn't get truncated.

Even mounting with -o lookupcache=none does not help as
nfs_neg_need_reval() always returns false if LOOKUP_CREATE is set.

This patch fixes the problem by providing an atomic_open inode operation
for NFSv3 (and v2).  The code is largely the code from the branch in
lookup_open() when atomic_open is not provided.  The significant change
is that the O_TRUNC flag is passed to a new nfs_do_create(), which adds
'trunc' handling to nfs_create().

With this change we also optimise away an unnecessary LOOKUP before the
file is created.

Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

pNFS/filelayout: Specify the layout segment range in LAYOUTGET

Move from only requesting full file layout segments to requesting layout
segments that match our I/O size. This means the server is still free to
return a full file layout if it wants, but partial layouts will no
longer cause an error.

Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

pNFS/filelayout: Remove the whole file layout requirement

Layout segments have been supported in pNFS for years, so remove the
requirement that the server always sends whole file layouts.

Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>

Revert "selftests/cgroup: Drop define _GNU_SOURCE"

This reverts commit c1457d9aad5ee2feafcf85aa9a58ab50500159d2.

The framework change that adds -D_GNU_SOURCE to KHDR_INCLUDES
in Makefile, lib.mk, and kselftest_harness.h is reverted
as it is causing build failures and warnings.

Revert this change as well, since it depends on the framework
change.

Reported-by: Mark Brown <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>

Revert "selftests/sgx: Include KHDR_INCLUDES in Makefile"

This reverts commit 2c3b8f8f37c6c0c926d584cf4158db95e62b960c.

The framework change that adds -D_GNU_SOURCE to KHDR_INCLUDES
in Makefile, lib.mk, and kselftest_harness.h is reverted
as it is causing build failures and warnings.

Revert this change as well, since it depends on the framework
change.

Reported-by: Mark Brown <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>

Revert "selftests: Compile kselftest headers with -D_GNU_SOURCE"

This reverts commit daef47b89efd0b745e8478d69a3ad724bd8b4dc6.

This framework change that adds -D_GNU_SOURCE to KHDR_INCLUDES
in Makefile, lib.mk, and kselftest_harness.h is causing build
failures and warnings.

Revert this change.

Reported-by: Mark Brown <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>

block: t10-pi: add MODULE_DESCRIPTION()

Fix the allmodconfig 'make W=1' issue:

WARNING: modpost: missing MODULE_DESCRIPTION() in block/t10-pi.o

Signed-off-by: Jeff Johnson <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

nfc: nci: Fix uninit-value in nci_rx_work

syzbot reported the following uninit-value access issue [1]

nci_rx_work() parses received packets from ndev->rx_q. It should
validate the header size, payload size and total packet size before
processing a packet. If an invalid packet is detected, it should be
silently discarded.
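
A minimal userspace sketch of this kind of size validation follows. It
assumes a simplified framing with a 3-byte header whose last byte holds
the payload length. PKT_HDR_SIZE and packet_is_valid() are hypothetical
names, and the real check in the driver may differ in detail:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PKT_HDR_SIZE 3u	/* assumed header size for this sketch */

/*
 * Returns true only if the buffer holds a complete header and at least
 * as many payload bytes as the header announces. Callers are expected
 * to drop the packet silently when this returns false.
 */
bool packet_is_valid(const uint8_t *data, size_t len)
{
	size_t payload_len;

	if (data == NULL || len < PKT_HDR_SIZE)
		return false;		/* header itself is incomplete */

	payload_len = data[2];		/* only read after the length check */
	if (len < PKT_HDR_SIZE + payload_len)
		return false;		/* payload is truncated */

	return true;
}
```

Checking the header size first ensures the length byte is never read
from memory that the packet did not actually fill.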

Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet")
Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=d7b4dc6cd50410152534 [1]
Signed-off-by: Ryosuke Yasuoka <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: net: kill smcrouted in the cleanup logic in amt.sh

amt.sh requires smcrouted for multicast routing,
so it starts smcrouted before the forwarding tests.
smcrouted must be stopped after all tests, but it isn't.

To fix this issue, kill smcrouted in the cleanup logic.

Fixes: c08e8baea78e ("selftests: add amt interface selftest script")
Signed-off-by: Taehee Yoo <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: sr: fix missing sk_buff release in seg6_input_core

The seg6_input() function is responsible for adding the SRH into a
packet, delegating the operation to seg6_input_core(). This function
uses skb_cow_head() to ensure that there is sufficient headroom in
the sk_buff for accommodating the link-layer header.
In the event that skb_cow_head() fails,
seg6_input_core() catches the error but does not release the sk_buff,
which results in a memory leak.

This issue was introduced in commit af3b5158b89d ("ipv6: sr: fix BUG due
to headroom too small after SRH push") and persists even after commit
7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane"),
where the entire seg6_input() code was refactored to deal with netfilter
hooks.

The proposed patch addresses the identified memory leak by requiring the
seg6_input_core() function to release the sk_buff in the event that
skb_cow_head() fails.
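
The ownership rule behind the fix can be modelled in userspace. Every
name here (buf_model, buf_cow_head, input_core, live_buffers,
MAX_HEADROOM) is a hypothetical stand-in for the sk_buff machinery. The
sketch only shows that a function which takes ownership of a buffer
must release it on its error path too:

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_HEADROOM 256u	/* arbitrary limit for the sketch */

struct buf_model {
	size_t headroom;
};

/* Number of buffers allocated and not yet freed (leak detector). */
int live_buffers;

struct buf_model *buf_alloc(size_t headroom)
{
	struct buf_model *b = malloc(sizeof(*b));

	if (b == NULL)
		return NULL;
	b->headroom = headroom;
	live_buffers++;
	return b;
}

void buf_free(struct buf_model *b)
{
	if (b != NULL) {
		free(b);
		live_buffers--;
	}
}

/* Stand-in for skb_cow_head(): fails if the request is too large. */
int buf_cow_head(struct buf_model *b, size_t needed)
{
	if (needed > MAX_HEADROOM)
		return -ENOMEM;
	if (b->headroom < needed)
		b->headroom = needed;
	return 0;
}

/* Takes ownership of b and releases it on every path. */
int input_core(struct buf_model *b, size_t ll_header_len)
{
	int err = buf_cow_head(b, ll_header_len);

	if (err) {
		buf_free(b);	/* the release that was missing */
		return err;
	}

	/* On success the next stage would consume b; modelled as a free. */
	buf_free(b);
	return 0;
}
```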

Fixes: af3b5158b89d ("ipv6: sr: fix BUG due to headroom too small after SRH push")
Signed-off-by: Andrea Mayer <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Always descend into dsa/ folder with CONFIG_NET_DSA enabled

Stephen reported that he was unable to get the dsa_loop driver to get
probed, and the reason ended up being because he had CONFIG_FIXED_PHY=y
in his kernel configuration. As Masahiro explained it:

  "obj-m += dsa/" means everything under dsa/ must be modular.

  If there is a built-in object under dsa/ with CONFIG_NET_DSA=m,
  you cannot do  "obj-$(CONFIG_NET_DSA) += dsa/".

  You need to change it back to "obj-y += dsa/".

This was the case here whereby CONFIG_NET_DSA=m, and so the
obj-$(CONFIG_FIXED_PHY) += dsa_loop_bdinfo.o rule is not executed and
the DSA loop mdio_board info structure is not registered with the
kernel, and eventually the device is simply not found.

To preserve the intention of the original commit of limiting the amount
of folder descending, conditionally descend into drivers/net/dsa when
CONFIG_NET_DSA is enabled.

Fixes: 227d72063fcc ("dsa: simplify Kconfig symbols and dependencies")
Reported-by: Stephen Langstaff <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Alexander Lobakin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: mailbox: qcom-ipcc: Document the SDX75 IPCC

Document the Inter-Processor Communication Controller on the SDX75 Platform.

Signed-off-by: Rohit Agarwal <[email protected]>
Acked-by: Rob Herring (Arm) <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

dt-bindings: mailbox: qcom: Add MSM8974 APCS compatible

Add compatible for the Qualcomm MSM8974 APCS block.

Signed-off-by: Luca Weiss <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

mailbox: Convert from tasklet to BH workqueue

The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws. To
replace tasklets, BH workqueue support was recently added. A BH workqueue
behaves similarly to regular workqueues except that the queued work items
are executed in the BH context.

Based on the work done by Tejun Heo <[email protected]>
Branch: https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-6.10

Signed-off-by: Allen Pais <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

mailbox: mtk-cmdq: Fix pm_runtime_get_sync() warning in mbox shutdown

pm_runtime_get_sync() in cmdq_mbox_shutdown() returns 1 when the
runtime PM state is already active, and we don't want to get the
warning message in this case.

So only trigger WARN_ON() when the return value is < 0.
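
The difference between the two conditions can be shown with a tiny
userspace model. It assumes the previous check treated any non-zero
return value as a failure. old_check_warns() and new_check_warns() are
hypothetical helpers, not driver functions:

```c
#include <stdbool.h>

/*
 * Modelled return values of pm_runtime_get_sync():
 *   < 0  error
 *     0  device was resumed
 *     1  device was already active
 */

/* Assumed previous behaviour: warn on anything non-zero. */
bool old_check_warns(int ret)
{
	return ret != 0;
}

/* Behaviour after the fix: warn only on real errors. */
bool new_check_warns(int ret)
{
	return ret < 0;
}
```

A return value of 1 trips the first check but not the second, while
negative error codes still trigger the warning in both.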

Fixes: 8afe816b0c99 ("mailbox: mtk-cmdq-mailbox: Implement Runtime PM with autosuspend")
Signed-off-by: Jason-JH.Lin <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

mailbox: mtk-cmdq-mailbox: fix module autoloading

Add MODULE_DEVICE_TABLE(), so this module can be properly autoloaded
based on the alias from the of_device_id table.

Signed-off-by: Krzysztof Kozlowski <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

mailbox: zynqmp: handle SGI for shared IPI

At least one IPI is used in TF-A for communication with PMC firmware.
If this IPI needs to be used by other agents such as the RPU, then the
IPI system interrupt can't be generated in the mailbox driver. In such
a case, TF-A generates an SGI to the mailbox driver for IPI notification.

Signed-off-by: Tanmay Shah <[email protected]>
Signed-off-by: Saeed Nowshadi <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

mailbox: arm_mhuv3: Add driver

Add support for ARM MHUv3 mailbox controller.

Support is limited to the MHUv3 Doorbell extension using only the PBX/MBX
combined interrupts.

Signed-off-by: Cristian Marussi <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

dt-bindings: mailbox: arm,mhuv3: Add bindings

Add bindings for the ARM MHUv3 Mailbox controller.

Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Cristian Marussi <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

mailbox: omap: Remove kernel FIFO message queuing

The kernel FIFO queue has a couple of issues. The biggest issue is that
it causes extra latency in a path that can be used in real-time tasks,
such as communication with real-time remote processors.

The whole FIFO idea itself looks to be a leftover from before the
unified mailbox framework. The current mailbox framework expects
mbox_chan_received_data() to be called with data immediately as it
arrives. Remove the FIFO and pass the messages to the mailbox
framework directly as part of a threaded IRQ handler.

Signed-off-by: Andrew Davis <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>