Git Repo - linux.git/log

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux

Pull ARM fix from Russell King:

- Fix a build issue with older binutils with LD dead code elimination
disabled

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: 9414/1: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION

Merge tag 'parisc-for-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc architecture fix from Helge Deller:

- Fix boot issue where boot memory is marked read-only too early

* tag 'parisc-for-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Delay write-protection until mark_rodata_ro() call

Merge tag 'mm-hotfixes-stable-2024-09-03-20-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"17 hotfixes, 15 of which are cc:stable.

  Mostly MM, no identifiable theme.  And a few nilfs2 fixups"

* tag 'mm-hotfixes-stable-2024-09-03-20-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  alloc_tag: fix allocation tag reporting when CONFIG_MODULES=n
  mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area
  mailmap: update entry for Jan Kuliga
  codetag: debug: mark codetags for poisoned page as empty
  mm/memcontrol: respect zswap.writeback setting from parent cg too
  scripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum
  Revert "mm: skip CMA pages when they are not available"
  maple_tree: remove rcu_read_lock() from mt_validate()
  kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
  mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook
  nilfs2: fix state management in error path of log writing function
  nilfs2: fix missing cleanup on rollforward recovery error
  nilfs2: protect references to superblock parameters exposed in sysfs
  userfaultfd: don't BUG_ON() if khugepaged yanks our page table
  userfaultfd: fix checks for huge PMDs
  mm: vmalloc: ensure vmap_block is initialised before adding to queue
  selftests: mm: fix build errors on armhf

ARM: 9414/1: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION

There is a build issue with LD segmentation fault, while
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is not enabled, as bellow.

scripts/link-vmlinux.sh: line 49: 3796 Segmentation fault
(core dumped) ${ld} ${ldflags} -o ${output} ${wl}--whole-archive
${objs} ${wl}--no-whole-archive ${wl}--start-group
${libs} ${wl}--end-group ${kallsymso} ${btf_vmlinux_bin_o} ${ldlibs}

The error occurs in older versions of the GNU ld with version earlier
than 2.36. It makes most sense to have a minimum LD version as
a dependency for HAVE_LD_DEAD_CODE_DATA_ELIMINATION and eliminate
the impact of ".reloc .text, R_ARM_NONE, ." when
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is not enabled.

Fixes: ed0f94102251 ("ARM: 9404/1: arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION")
Reported-by: Harith George <mail2hgg@gmail.com>
Tested-by: Harith George <mail2hgg@gmail.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com>
Link: https://lore.kernel.org/all/14e9aefb-88d1-4eee-8288-ef15d4a9b059@gmail.com/
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Merge tag 'fuse-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:

- Fix EIO if splice and page stealing are enabled on the fuse device

- Disable problematic combination of passthrough and writeback-cache

- Other bug fixes found by code review

* tag 'fuse-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: disable the combination of passthrough and writeback cache
  fuse: update stats for pages in dropped aux writeback list
  fuse: clear PG_uptodate when using a stolen page
  fuse: fix memory leak in fuse_create_open
  fuse: check aborted connection before adding requests to pending list for resending
  fuse: use unsigned type for getxattr/listxattr size truncation

parisc: Delay write-protection until mark_rodata_ro() call

Do not write-protect the kernel read-only and __ro_after_init sections
earlier than before mark_rodata_ro() is called. This fixes a boot issue on
parisc which is triggered by commit 91a1d97ef482 ("jump_label,module: Don't
alloc static_key_mod for __ro_after_init keys"). That commit may modify
static key contents in the __ro_after_init section at bootup, so this
section needs to be writable at least until mark_rodata_ro() is called.

Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: matoro <matoro_mailinglist_kernel@matoro.tk>
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Link: https://lore.kernel.org/linux-parisc/096cad5aada514255cd7b0b9dbafc768@matoro.tk/#r
Fixes: 91a1d97ef482 ("jump_label,module: Don't alloc static_key_mod for __ro_after_init keys")
Cc: stable@vger.kernel.org # v6.10+

Merge tag 'ata-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fix from Damien Le Moal:

- Fix a potential memory leak in the ata host initialization code (from
Zheng)

* tag 'ata-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata: Fix memory leak for error path in ata_host_alloc()

alloc_tag: fix allocation tag reporting when CONFIG_MODULES=n

codetag_module_init() is used to initialize sections containing allocation
tags.  This function is used to initialize module sections as well as core
kernel sections, in which case the module parameter is set to NULL.  This
function has to be called even when CONFIG_MODULES=n to initialize core
kernel allocation tag sections.  When CONFIG_MODULES=n, this function is a
NOP, which is wrong.  This leads to /proc/allocinfo reported as empty.
Fix this by making it independent of CONFIG_MODULES.

Link: https://lkml.kernel.org/r/20240828231536.1770519-1-surenb@google.com
Fixes: 916cc5167cc6 ("lib: code tagging framework")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [6.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area

When running the vmalloc stress on a 448-core system, observe the average
latency of purge_vmap_node() is about 2 seconds by using the eBPF/bcc
'funclatency.py' tool [1].

  # /your-git-repo/bcc/tools/funclatency.py -u purge_vmap_node & pid1=$! && sleep 8 && modprobe test_vmalloc nr_threads=$(nproc) run_test_mask=0x7; kill -SIGINT $pid1

     usecs             : count    distribution
        0 -> 1         : 0       |                                        |
        2 -> 3         : 29      |                                        |
        4 -> 7         : 19      |                                        |
        8 -> 15        : 56      |                                        |
       16 -> 31        : 483     |****                                    |
       32 -> 63        : 1548    |************                            |
       64 -> 127       : 2634    |*********************                   |
      128 -> 255       : 2535    |*********************                   |
      256 -> 511       : 1776    |**************                          |
      512 -> 1023      : 1015    |********                                |
     1024 -> 2047      : 573     |****                                    |
     2048 -> 4095      : 488     |****                                    |
     4096 -> 8191      : 1091    |*********                               |
     8192 -> 16383     : 3078    |*************************               |
    16384 -> 32767     : 4821    |****************************************|
    32768 -> 65535     : 3318    |***************************             |
    65536 -> 131071    : 1718    |**************                          |
   131072 -> 262143    : 2220    |******************                      |
   262144 -> 524287    : 1147    |*********                               |
   524288 -> 1048575   : 1179    |*********                               |
  1048576 -> 2097151   : 822     |******                                  |
  2097152 -> 4194303   : 906     |*******                                 |
  4194304 -> 8388607   : 2148    |*****************                       |
  8388608 -> 16777215  : 4497    |*************************************   |
16777216 -> 33554431  : 289     |**                                      |

  avg = 2041714 usecs, total: 78381401772 usecs, count: 38390

  The worst case is over 16-33 seconds, so soft lockup is triggered [2].

[Root Cause]
1) Each purge_list has the long list. The following shows the number of
   vmap_area is purged.

   crash> p vmap_nodes
   vmap_nodes = $27 = (struct vmap_node *) 0xff2de5a900100000
   crash> vmap_node 0xff2de5a900100000 128 | grep nr_purged
     nr_purged = 663070
     ...
     nr_purged = 821670
     nr_purged = 692214
     nr_purged = 726808
     ...

2) atomic_long_sub() employs the 'lock' prefix to ensure the atomic
   operation when purging each vmap_area. However, the iteration is over
   600000 vmap_area (See 'nr_purged' above).

   Here is objdump output:

     $ objdump -D vmlinux
     ffffffff813e8c80 <purge_vmap_node>:
     ...
     ffffffff813e8d70:  f0 48 29 2d 68 0c bb  lock sub %rbp,0x2bb0c68(%rip)
     ...

   Quote from "Instruction tables" pdf file [3]:
     Instructions with a LOCK prefix have a long latency that depends on
     cache organization and possibly RAM speed. If there are multiple
     processors or cores or direct memory access (DMA) devices, then all
     locked instructions will lock a cache line for exclusive access,
     which may involve RAM access. A LOCK prefix typically costs more
     than a hundred clock cycles, even on single-processor systems.

   That's why the latency of purge_vmap_node() dramatically increases
   on a many-core system: One core is busy on purging each vmap_area of
   the *long* purge_list and executing atomic_long_sub() for each
   vmap_area, while other cores free vmalloc allocations and execute
   atomic_long_add_return() in free_vmap_area_noflush().

[Solution]
Employ a local variable to record the total purged pages, and execute
atomic_long_sub() after the traversal of the purge_list is done. The
experiment result shows the latency improvement is 99%.

[Experiment Result]
1) System Configuration: Three servers (with HT-enabled) are tested.
     * 72-core server: 3rd Gen Intel Xeon Scalable Processor*1
     * 192-core server: 5th Gen Intel Xeon Scalable Processor*2
     * 448-core server: AMD Zen 4 Processor*2

2) Kernel Config
     * CONFIG_KASAN is disabled

3) The data in column "w/o patch" and "w/ patch"
     * Unit: micro seconds (us)
     * Each data is the average of 3-time measurements

         System        w/o patch (us)   w/ patch (us)    Improvement (%)
     ---------------   --------------   -------------    -------------
     72-core server          2194              14            99.36%
     192-core server       143799            1139            99.21%
     448-core server      1992122            6883            99.65%

[1] https://github.com/iovisor/bcc/blob/master/tools/funclatency.py
[2] https://gist.github.com/AdrianHuang/37c15f67b45407b83c2d32f918656c12
[3] https://www.agner.org/optimize/instruction_tables.pdf

Link: https://lkml.kernel.org/r/20240829130633.2184-1-ahuang12@lenovo.com
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mailmap: update entry for Jan Kuliga

Soon I won't be able to use my current email address.

Link: https://lkml.kernel.org/r/20240830095658.1203198-1-jankul@alatek.krakow.pl
Signed-off-by: Jan Kuliga <jankul@alatek.krakow.pl>
Cc: David S. Miller <davem@davemloft.net>
Cc: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

codetag: debug: mark codetags for poisoned page as empty

When PG_hwpoison pages are freed they are treated differently in
free_pages_prepare() and instead of being released they are isolated.

Page allocation tag counters are decremented at this point since the page
is considered not in use.  Later on when such pages are released by
unpoison_memory(), the allocation tag counters will be decremented again
and the following warning gets reported:

[  113.930443][ T3282] ------------[ cut here ]------------
[  113.931105][ T3282] alloc_tag was not set
[  113.931576][ T3282] WARNING: CPU: 2 PID: 3282 at ./include/linux/alloc_tag.h:130 pgalloc_tag_sub.part.66+0x154/0x164
[  113.932866][ T3282] Modules linked in: hwpoison_inject fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_man4
[  113.941638][ T3282] CPU: 2 UID: 0 PID: 3282 Comm: madvise11 Kdump: loaded Tainted: G        W          6.11.0-rc4-dirty #18
[  113.943003][ T3282] Tainted: [W]=WARN
[  113.943453][ T3282] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[  113.944378][ T3282] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  113.945319][ T3282] pc : pgalloc_tag_sub.part.66+0x154/0x164
[  113.946016][ T3282] lr : pgalloc_tag_sub.part.66+0x154/0x164
[  113.946706][ T3282] sp : ffff800087093a10
[  113.947197][ T3282] x29: ffff800087093a10 x28: ffff0000d7a9d400 x27: ffff80008249f0a0
[  113.948165][ T3282] x26: 0000000000000000 x25: ffff80008249f2b0 x24: 0000000000000000
[  113.949134][ T3282] x23: 0000000000000001 x22: 0000000000000001 x21: 0000000000000000
[  113.950597][ T3282] x20: ffff0000c08fcad8 x19: ffff80008251e000 x18: ffffffffffffffff
[  113.952207][ T3282] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800081746210
[  113.953161][ T3282] x14: 0000000000000000 x13: 205d323832335420 x12: 5b5d353031313339
[  113.954120][ T3282] x11: ffff800087093500 x10: 000000000000005d x9 : 00000000ffffffd0
[  113.955078][ T3282] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008236ba90 x6 : c0000000ffff7fff
[  113.956036][ T3282] x5 : ffff000b34bf4dc8 x4 : ffff8000820aba90 x3 : 0000000000000001
[  113.956994][ T3282] x2 : ffff800ab320f000 x1 : 841d1e35ac932e00 x0 : 0000000000000000
[  113.957962][ T3282] Call trace:
[  113.958350][ T3282]  pgalloc_tag_sub.part.66+0x154/0x164
[  113.959000][ T3282]  pgalloc_tag_sub+0x14/0x1c
[  113.959539][ T3282]  free_unref_page+0xf4/0x4b8
[  113.960096][ T3282]  __folio_put+0xd4/0x120
[  113.960614][ T3282]  folio_put+0x24/0x50
[  113.961103][ T3282]  unpoison_memory+0x4f0/0x5b0
[  113.961678][ T3282]  hwpoison_unpoison+0x30/0x48 [hwpoison_inject]
[  113.962436][ T3282]  simple_attr_write_xsigned.isra.34+0xec/0x1cc
[  113.963183][ T3282]  simple_attr_write+0x38/0x48
[  113.963750][ T3282]  debugfs_attr_write+0x54/0x80
[  113.964330][ T3282]  full_proxy_write+0x68/0x98
[  113.964880][ T3282]  vfs_write+0xdc/0x4d0
[  113.965372][ T3282]  ksys_write+0x78/0x100
[  113.965875][ T3282]  __arm64_sys_write+0x24/0x30
[  113.966440][ T3282]  invoke_syscall+0x7c/0x104
[  113.966984][ T3282]  el0_svc_common.constprop.1+0x88/0x104
[  113.967652][ T3282]  do_el0_svc+0x2c/0x38
[  113.968893][ T3282]  el0_svc+0x3c/0x1b8
[  113.969379][ T3282]  el0t_64_sync_handler+0x98/0xbc
[  113.969980][ T3282]  el0t_64_sync+0x19c/0x1a0
[  113.970511][ T3282] ---[ end trace 0000000000000000 ]---

To fix this, clear the page tag reference after the page got isolated
and accounted for.

Link: https://lkml.kernel.org/r/20240825163649.33294-1-hao.ge@linux.dev
Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hao Ge <gehao@kylinos.cn>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: <stable@vger.kernel.org> [6.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/memcontrol: respect zswap.writeback setting from parent cg too

Currently, the behavior of zswap.writeback wrt.  the cgroup hierarchy
seems a bit odd.  Unlike zswap.max, it doesn't honor the value from parent
cgroups.  This surfaced when people tried to globally disable zswap
writeback, i.e.  reserve physical swap space only for hibernation [1] -
disabling zswap.writeback only for the root cgroup results in subcgroups
with zswap.writeback=1 still performing writeback.

The inconsistency became more noticeable after I introduced the
MemoryZSwapWriteback= systemd unit setting [2] for controlling the knob.
The patch assumed that the kernel would enforce the value of parent
cgroups.  It could probably be workarounded from systemd's side, by going
up the slice unit tree and inheriting the value.  Yet I think it's more
sensible to make it behave consistently with zswap.max and friends.

[1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation
[2] https://github.com/systemd/systemd/pull/31734

Link: https://lkml.kernel.org/r/20240823162506.12117-1-me@yhndnzj.com
Fixes: 501a06fe8e4c ("zswap: memcontrol: implement zswap writeback disabling")
Signed-off-by: Mike Yuan <me@yhndnzj.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

scripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum

Richard reports that since 772dd0342727c ("mm: enumerate all gfp flags"),
gfp-translate is broken, as the bit numbers are implicit, leaving the
shell script unable to extract them.  Even more, some bits are now at a
variable location, making it double extra hard to parse using a simple
shell script.

Use a brute-force approach to the problem by generating a small C stub
that will use the enum to dump the interesting bits.

As an added bonus, we are now able to identify invalid bits for a given
configuration.  As an added drawback, we cannot parse include files that
predate this change anymore.  Tough luck.

Link: https://lkml.kernel.org/r/20240823163850.3791201-1-maz@kernel.org
Fixes: 772dd0342727 ("mm: enumerate all gfp flags")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reported-by: Richard Weinberger <richard@nod.at>
Cc: Petr Tesařík <petr@tesarici.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Revert "mm: skip CMA pages when they are not available"

This reverts commit 5da226dbfce3 ("mm: skip CMA pages when they are not
available") and b7108d66318a ("Multi-gen LRU: skip CMA pages when they are
not eligible").

lruvec->lru_lock is highly contended and is held when calling
isolate_lru_folios.  If the lru has a large number of CMA folios
consecutively, while the allocation type requested is not MIGRATE_MOVABLE,
isolate_lru_folios can hold the lock for a very long time while it skips
those.  For FIO workload, ~150million order=0 folios were skipped to
isolate a few ZONE_DMA folios [1].  This can cause lockups [1] and high
memory pressure for extended periods of time [2].

Remove skipping CMA for MGLRU as well, as it was introduced in sort_folio
for the same resaon as 5da226dbfce3a2f44978c2c7cf88166e69a6788b.

[1] https://lore.kernel.org/all/CAOUHufbkhMZYz20aM_3rHZ3OcK4m2puji2FGpUpn_-DevGk3Kg@mail.gmail.com/
[2] https://lore.kernel.org/all/ZrssOrcJIDy8hacI@gmail.com/

[usamaarif642@gmail.com: also revert b7108d66318a, per Johannes]
Link: https://lkml.kernel.org/r/9060a32d-b2d7-48c0-8626-1db535653c54@gmail.com
Link: https://lkml.kernel.org/r/357ac325-4c61-497a-92a3-bdbd230d5ec9@gmail.com
Link: https://lkml.kernel.org/r/9060a32d-b2d7-48c0-8626-1db535653c54@gmail.com
Fixes: 5da226dbfce3 ("mm: skip CMA pages when they are not available")
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zhaoyang Huang <huangzhaoyang@gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

maple_tree: remove rcu_read_lock() from mt_validate()

The write lock should be held when validating the tree to avoid updates
racing with checks. Holding the rcu read lock during a large tree
validation may also cause a prolonged rcu read window and "rcu_preempt
detected stalls" warnings.

Link: https://lore.kernel.org/all/0000000000001d12d4062005aea1@google.com/
Link: https://lkml.kernel.org/r/20240820175417.2782532-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reported-by: syzbot+036af2f0c7338a33b0cd@syzkaller.appspotmail.com
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y

Fix the condition to exclude the elfcorehdr segment from the SHA digest
calculation.

The j iterator is an index into the output sha_regions[] array, not into
the input image->segment[] array. Once it reaches
image->elfcorehdr_index, all subsequent segments are excluded. Besides,
if the purgatory segment precedes the elfcorehdr segment, the elfcorehdr
may be wrongly included in the calculation.

Link: https://lkml.kernel.org/r/20240805150750.170739-1-petr.tesarik@suse.com
Fixes: f7cc804a9fd4 ("kexec: exclude elfcorehdr from the segment digest")
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Eric DeVolder <eric_devolder@yahoo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook

When enable CONFIG_MEMCG & CONFIG_KFENCE & CONFIG_KMEMLEAK, the following
warning always occurs,This is because the following call stack occurred:
mem_pool_alloc
    kmem_cache_alloc_noprof
        slab_alloc_node
            kfence_alloc

Once the kfence allocation is successful,slab->obj_exts will not be empty,
because it has already been assigned a value in kfence_init_pool.

Since in the prepare_slab_obj_exts_hook function,we perform a check for
s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE),the alloc_tag_add function
will not be called as a result.Therefore,ref->ct remains NULL.

However,when we call mem_pool_free,since obj_ext is not empty, it
eventually leads to the alloc_tag_sub scenario being invoked.  This is
where the warning occurs.

So we should add corresponding checks in the alloc_tagging_slab_free_hook.
For __GFP_NO_OBJ_EXT case,I didn't see the specific case where it's using
kfence,so I won't add the corresponding check in
alloc_tagging_slab_free_hook for now.

[    3.734349] ------------[ cut here ]------------
[    3.734807] alloc_tag was not set
[    3.735129] WARNING: CPU: 4 PID: 40 at ./include/linux/alloc_tag.h:130 kmem_cache_free+0x444/0x574
[    3.735866] Modules linked in: autofs4
[    3.736211] CPU: 4 UID: 0 PID: 40 Comm: ksoftirqd/4 Tainted: G        W          6.11.0-rc3-dirty #1
[    3.736969] Tainted: [W]=WARN
[    3.737258] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[    3.737875] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    3.738501] pc : kmem_cache_free+0x444/0x574
[    3.738951] lr : kmem_cache_free+0x444/0x574
[    3.739361] sp : ffff80008357bb60
[    3.739693] x29: ffff80008357bb70 x28: 0000000000000000 x27: 0000000000000000
[    3.740338] x26: ffff80008207f000 x25: ffff000b2eb2fd60 x24: ffff0000c0005700
[    3.740982] x23: ffff8000804229e4 x22: ffff800082080000 x21: ffff800081756000
[    3.741630] x20: fffffd7ff8253360 x19: 00000000000000a8 x18: ffffffffffffffff
[    3.742274] x17: ffff800ab327f000 x16: ffff800083398000 x15: ffff800081756df0
[    3.742919] x14: 0000000000000000 x13: 205d344320202020 x12: 5b5d373038343337
[    3.743560] x11: ffff80008357b650 x10: 000000000000005d x9 : 00000000ffffffd0
[    3.744231] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008237bad0 x6 : c0000000ffff7fff
[    3.744907] x5 : ffff80008237ba78 x4 : ffff8000820bbad0 x3 : 0000000000000001
[    3.745580] x2 : 68d66547c09f7800 x1 : 68d66547c09f7800 x0 : 0000000000000000
[    3.746255] Call trace:
[    3.746530]  kmem_cache_free+0x444/0x574
[    3.746931]  mem_pool_free+0x44/0xf4
[    3.747306]  free_object_rcu+0xc8/0xdc
[    3.747693]  rcu_do_batch+0x234/0x8a4
[    3.748075]  rcu_core+0x230/0x3e4
[    3.748424]  rcu_core_si+0x14/0x1c
[    3.748780]  handle_softirqs+0x134/0x378
[    3.749189]  run_ksoftirqd+0x70/0x9c
[    3.749560]  smpboot_thread_fn+0x148/0x22c
[    3.749978]  kthread+0x10c/0x118
[    3.750323]  ret_from_fork+0x10/0x20
[    3.750696] ---[ end trace 0000000000000000 ]---

Link: https://lkml.kernel.org/r/20240816013336.17505-1-hao.ge@linux.dev
Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

nilfs2: fix state management in error path of log writing function

After commit a694291a6211 ("nilfs2: separate wait function from
nilfs_segctor_write") was applied, the log writing function
nilfs_segctor_do_construct() was able to issue I/O requests continuously
even if user data blocks were split into multiple logs across segments,
but two potential flaws were introduced in its error handling.

First, if nilfs_segctor_begin_construction() fails while creating the
second or subsequent logs, the log writing function returns without
calling nilfs_segctor_abort_construction(), so the writeback flag set on
pages/folios will remain uncleared. This causes page cache operations to
hang waiting for the writeback flag. For example,
truncate_inode_pages_final(), which is called via nilfs_evict_inode() when
an inode is evicted from memory, will hang.

Second, the NILFS_I_COLLECTED flag set on normal inodes remain uncleared.
As a result, if the next log write involves checkpoint creation, that's
fine, but if a partial log write is performed that does not, inodes with
NILFS_I_COLLECTED set are erroneously removed from the "sc_dirty_files"
list, and their data and b-tree blocks may not be written to the device,
corrupting the block mapping.

Fix these issues by uniformly calling nilfs_segctor_abort_construction()
on failure of each step in the loop in nilfs_segctor_do_construct(),
having it clean up logs and segment usages according to progress, and
correcting the conditions for calling nilfs_redirty_inodes() to ensure
that the NILFS_I_COLLECTED flag is cleared.

Link: https://lkml.kernel.org/r/20240814101119.4070-1-konishi.ryusuke@gmail.com
Fixes: a694291a6211 ("nilfs2: separate wait function from nilfs_segctor_write")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

nilfs2: fix missing cleanup on rollforward recovery error

In an error injection test of a routine for mount-time recovery, KASAN
found a use-after-free bug.

It turned out that if data recovery was performed using partial logs
created by dsync writes, but an error occurred before starting the log
writer to create a recovered checkpoint, the inodes whose data had been
recovered were left in the ns_dirty_files list of the nilfs object and
were not freed.

Fix this issue by cleaning up inodes that have read the recovery data if
the recovery routine fails midway before the log writer starts.

Link: https://lkml.kernel.org/r/20240810065242.3701-1-konishi.ryusuke@gmail.com
Fixes: 0f3e1c7f23f8 ("nilfs2: recovery functions")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

nilfs2: protect references to superblock parameters exposed in sysfs

The superblock buffers of nilfs2 can not only be overwritten at runtime
for modifications/repairs, but they are also regularly swapped, replaced
during resizing, and even abandoned when degrading to one side due to
backing device issues. So, accessing them requires mutual exclusion using
the reader/writer semaphore "nilfs->ns_sem".

Some sysfs attribute show methods read this superblock buffer without the
necessary mutual exclusion, which can cause problems with pointer
dereferencing and memory access, so fix it.

Link: https://lkml.kernel.org/r/20240811100320.9913-1-konishi.ryusuke@gmail.com
Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

userfaultfd: don't BUG_ON() if khugepaged yanks our page table

Since khugepaged was changed to allow retracting page tables in file
mappings without holding the mmap lock, these BUG_ON()s are wrong - get
rid of them.

We could also remove the preceding "if (unlikely(...))" block, but then we
could reach pte_offset_map_lock() with transhuge pages not just for file
mappings but also for anonymous mappings - which would probably be fine
but I think is not necessarily expected.

Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-2-5efa61078a41@google.com
Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

userfaultfd: fix checks for huge PMDs

Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.

The pmd_trans_huge() code in mfill_atomic() is wrong in three different
ways depending on kernel version:

1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
   the right two race windows) - I've tested this in a kernel build with
   some extra mdelay() calls. See the commit message for a description
   of the race scenario.
   On older kernels (before 6.5), I think the same bug can even
   theoretically lead to accessing transhuge page contents as a page table
   if you hit the right 5 narrow race windows (I haven't tested this case).
2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
   detecting PMDs that don't point to page tables.
   On older kernels (before 6.5), you'd just have to win a single fairly
   wide race to hit this.
   I've tested this on 6.1 stable by racing migration (with a mdelay()
   patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86
   VM, that causes a kernel oops in ptlock_ptr().
3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed
   to yank page tables out from under us (though I haven't tested that),
   so I think the BUG_ON() checks in mfill_atomic() are just wrong.

I decided to write two separate fixes for these (one fix for bugs 1+2, one
fix for bug 3), so that the first fix can be backported to kernels
affected by bugs 1+2.

This patch (of 2):

This fixes two issues.

I discovered that the following race can occur:

  mfill_atomic                other thread
  ============                ============
                              <zap PMD>
  pmdp_get_lockless() [reads none pmd]
  <bail if trans_huge>
  <if none:>
                              <pagefault creates transhuge zeropage>
    __pte_alloc [no-op]
                              <zap PMD>
  <bail if pmd_trans_huge(*dst_pmd)>
  BUG_ON(pmd_none(*dst_pmd))

I have experimentally verified this in a kernel with extra mdelay() calls;
the BUG_ON(pmd_none(*dst_pmd)) triggers.

On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow
pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
a BUG_ON(), since the page table access helpers are actually designed to
deal with page tables concurrently disappearing; but on older kernels
(<=6.4), I think we could probably theoretically race past the two
BUG_ON() checks and end up treating a hugepage as a page table.

The second issue is that, as Qi Zheng pointed out, there are other types
of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
(in particular, migration PMDs).

On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
PMD that contains a migration entry (which just requires winning a single,
fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
assumes that the PMD points to a page table.

Breakage follows: First, the kernel tries to take the PTE lock (which will
crash or maybe worse if there is no "struct page" for the address bits in
the migration entry PMD - I think at least on X86 there usually is no
corresponding "struct page" thanks to the PTE inversion mitigation, amd64
looks different).

If that didn't crash, the kernel would next try to write a PTE into what
it wrongly thinks is a page table.

As part of fixing these issues, get rid of the check for pmd_trans_huge()
before __pte_alloc() - that's redundant, we're going to have to check for
that after the __pte_alloc() anyway.

Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.

Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@google.com
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@google.com
Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

mm: vmalloc: ensure vmap_block is initialised before adding to queue

Commit 8c61291fd850 ("mm: fix incorrect vbq reference in
purge_fragmented_block") extended the 'vmap_block' structure to contain a
'cpu' field which is set at allocation time to the id of the initialising
CPU.

When a new 'vmap_block' is being instantiated by new_vmap_block(), the
partially initialised structure is added to the local 'vmap_block_queue'
xarray before the 'cpu' field has been initialised.  If another CPU is
concurrently walking the xarray (e.g.  via vm_unmap_aliases()), then it
may perform an out-of-bounds access to the remote queue thanks to an
uninitialised index.

This has been observed as UBSAN errors in Android:

| Internal error: UBSAN: array index out of bounds: 00000000f2005512 [#1] PREEMPT SMP
|
| Call trace:
|  purge_fragmented_block+0x204/0x21c
|  _vm_unmap_aliases+0x170/0x378
|  vm_unmap_aliases+0x1c/0x28
|  change_memory_common+0x1dc/0x26c
|  set_memory_ro+0x18/0x24
|  module_enable_ro+0x98/0x238
|  do_init_module+0x1b0/0x310

Move the initialisation of 'vb->cpu' in new_vmap_block() ahead of the
addition to the xarray.

Link: https://lkml.kernel.org/r/20240812171606.17486-1-will@kernel.org
Fixes: 8c61291fd850 ("mm: fix incorrect vbq reference in purge_fragmented_block")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Cc: Hailong.Liu <hailong.liu@oppo.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

selftests: mm: fix build errors on armhf

The __NR_mmap isn't found on armhf.  The mmap() is commonly available
system call and its wrapper is present on all architectures.  So it should
be used directly.  It solves problem for armhf and doesn't create problem
for other architectures.

Remove sys_mmap() functions as they aren't doing anything else other than
calling mmap().  There is no need to set errno = 0 manually as glibc
always resets it.

For reference errors are as following:

  CC       seal_elf
seal_elf.c: In function 'sys_mmap':
seal_elf.c:39:33: error: '__NR_mmap' undeclared (first use in this function)
   39 |         sret = (void *) syscall(__NR_mmap, addr, len, prot,
      |                                 ^~~~~~~~~

mseal_test.c: In function 'sys_mmap':
mseal_test.c:90:33: error: '__NR_mmap' undeclared (first use in this function)
   90 |         sret = (void *) syscall(__NR_mmap, addr, len, prot,
      |                                 ^~~~~~~~~

Link: https://lkml.kernel.org/r/20240809082511.497266-1-usama.anjum@collabora.com
Fixes: 4926c7a52de7 ("selftest mm/mseal memory sealing")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Merge tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:

- x2apic_disable() clears x2apic_state and x2apic_mode unconditionally,
   even when the state is X2APIC_ON_LOCKED, which prevents the kernel to
   disable it thereby creating inconsistent state.

   Reorder the logic so it actually works correctly

- The XSTATE logic for handling LBR is incorrect as it assumes that
   XSAVES supports LBR when the CPU supports LBR. In fact both
   conditions need to be true. Otherwise the enablement of LBR in the
   IA32_XSS MSR fails and subsequently the machine crashes on the next
   XRSTORS operation because IA32_XSS is not initialized.

   Cache the XSTATE support bit during init and make the related
   functions use this cached information and the LBR CPU feature bit to
   cure this.

- Cure a long standing bug in KASLR

   KASLR uses the full address space between PAGE_OFFSET and vaddr_end
   to randomize the starting points of the direct map, vmalloc and
   vmemmap regions. It thereby limits the size of the direct map by
   using the installed memory size plus an extra configurable margin for
   hot-plug memory. This limitation is done to gain more randomization
   space because otherwise only the holes between the direct map,
   vmalloc, vmemmap and vaddr_end would be usable for randomizing.

   The limited direct map size is not exposed to the rest of the kernel,
   so the memory hot-plug and resource management related code paths
   still operate under the assumption that the available address space
   can be determined with MAX_PHYSMEM_BITS.

   request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
   downwards. That means the first allocation happens past the end of
   the direct map and if unlucky this address is in the vmalloc space,
   which causes high_memory to become greater than VMALLOC_START and
   consequently causes iounmap() to fail for valid ioremap addresses.

   Cure this by exposing the end of the direct map via PHYSMEM_END and
   use that for the memory hot-plug and resource management related
   places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case
   PHYSMEM_END maps to a variable which is initialized by the KASLR
   initialization and otherwise it is based on MAX_PHYSMEM_BITS as
   before.

- Prevent a data leak in mmio_read(). The TDVMCALL exposes the value of
   an initialized variabled on the stack to the VMM. The variable is
   only required as output value, so it does not have to exposed to the
   VMM in the first place.

- Prevent an array overrun in the resource control code on systems with
   Sub-NUMA Clustering enabled because the code failed to adjust the
   index by the number of SNC nodes per L3 cache.

* tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix arch_mbm_* array overrun on SNC
  x86/tdx: Fix data leak in mmio_read()
  x86/kaslr: Expose and use the end of the physical memory address space
  x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported
  x86/apic: Make x2apic_disable() work correctly

Merge tag 'perf-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Thomas Gleixner:
"A single fix for x86 performance monitoring.

  Haswell PMUs suffer from several errata and require a limit the
  minimal period for counter events, otherwise they suffer from endless
  loops in the PMU interrupt"

* tag 'perf-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Limit the period on Haswell

Merge tag 'locking-urgent-2024-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Thomas Gleixner:
"A single fix for rt_mutex.

  The deadlock detection code drops into an infinite scheduling loop
  while still holding rt_mutex::wait_lock, which rightfully triggers a
  'scheduling in atomic' warning.

  Unlock it before that"

* tag 'locking-urgent-2024-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rtmutex: Drop rt_mutex::wait_lock before scheduling

Merge tag 'irq-urgent-2024-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:

   - Unbreak the PLIC driver for Allwinner D1 systems

     The recent conversion of the PLIC driver to a platform driver broke
     Allwinnder D1 systems due to the deferred probing of platform
     drivers.

     Due to that the only timer available on D1 systems cannot get an
     interrupt, which causes the system to hang at boot. Other RISCV
     platforms are not affected because they provide the architected SBI
     timer which uses the built in core interrupt controller.

     Cure this by probing PLIC early on D1 systems

   - Cure a regression in ARM/GIC-V3 on 32-bit ARM systems caused by the
     recent addition of a initialization function, which accesses system
     registers before they are enabled. On 64-bit ARM they are enabled
     prior to that by sheer luck.

     Ensure they are enabled.

   - Cure a use before check problem in the MSI library. The existing
     NULL pointer check is too late.

   - Cure a lock order inversion in the ARM/GIC-V4 driver

   - Fix a IS_ERR() vs. NULL pointer check issue in the RISCV APLIC
     driver

   - Plug a reference count leak in the ARM/GIC-V2 driver"

* tag 'irq-urgent-2024-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/irq-msi-lib: Check for NULL ops in msi_lib_irq_domain_select()
  irqchip/gic-v3: Init SRE before poking sysregs
  irqchip/gic-v2m: Fix refcount leak in gicv2m_of_init()
  irqchip/riscv-aplic: Fix an IS_ERR() vs NULL bug in probe()
  irqchip/gic-v4: Fix ordering between vmapp and vpe locks
  irqchip/sifive-plic: Probe plic driver early for Allwinner D1 platform

Linux 6.11-rc6

Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- copy_file_range fix

- two read fixes including read past end of file rc fix and read retry
   crediting fix

- falloc zero range fix

* tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region
  cifs: Fix copy offload to flush destination region
  netfs, cifs: Fix handling of short DIO read
  cifs: Fix lack of credit renegotiation on read retry

Merge tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs

Push bcachefs fixes from Kent Overstreet:
"The data corruption in the buffered write path is troubling; inode
  lock should not have been able to cause that...

   - Fix a rare data corruption in the rebalance path, caught as a nonce
     inconsistency on encrypted filesystems

   - Revert lockless buffered write path

   - Mark more errors as autofix"

* tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs:
  bcachefs: Mark more errors as autofix
  bcachefs: Revert lockless buffered IO path
  bcachefs: Fix bch2_extents_match() false positive
  bcachefs: Fix failure to return error in data_update_index_update()

bcachefs: Mark more errors as autofix

errors that are known to always be safe to fix should be autofix: this
should be most errors even at this point, but that will need some
thorough review.

note that errors are still logged in the superblock, so we'll still know
that they happened.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Revert lockless buffered IO path

We had a report of data corruption on nixos when building installer
images.

https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334

It seems that writes are being dropped, but only when issued by QEMU,
and possibly only in snapshot mode. It's undetermined if it's write
calls are being dropped or dirty folios.

Further testing, via minimizing the original patch to just the change
that skips the inode lock on non appends/truncates, reveals that it
really is just not taking the inode lock that causes the corruption: it
has nothing to do with the other logic changes for preserving write
atomicity in corner cases.

It's also kernel config dependent: it doesn't reproduce with the minimal
kernel config that ktest uses, but it does reproduce with nixos's distro
config. Bisection the kernel config initially pointer the finger at page
migration or compaction, but it appears that was erroneous; we haven't
yet determined what kernel config option actually triggers it.

Sadly it appears this will have to be reverted since we're getting too
close to release and my plate is full, but we'd _really_ like to fully
debug it.

My suspicion is that this patch is exposing a preexisting bug - the
inode lock actually covers very little in IO paths, and we have a
different lock (the pagecache add lock) that guards against races with
truncate here.

Fixes: 7e64c86cdc6c ("bcachefs: Buffered write path now can avoid the inode lock")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull misc fixes from Guenter Roeck.

These are fixes for regressions that Guenther has been reporting, and
the maintainers haven't picked up and sent in. With rc6 fairly imminent,
I'm taking them directly from Guenter.

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  apparmor: fix policy_unpack_test on big endian systems
  Revert "MIPS: csrc-r4k: Apply verification clocksource flags"
  microblaze: don't treat zero reserved memory regions as error

Merge tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull power sequencing fix from Bartosz Golaszewski:
"A follow-up fix for the power sequencing subsystem. It turned out the
  previous fix for this driver was incomplete and broke the WLAN support
  on some platforms. This addresses the issue.

   - set the direction of the wlan-enable GPIO to output after
     requesting it as-is"

* tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  power: sequencing: qcom-wcn: set the wlan-enable GPIO to output

power: sequencing: qcom-wcn: set the wlan-enable GPIO to output

Commit a9aaf1ff88a8 ("power: sequencing: request the WLAN enable GPIO
as-is") broke WLAN on boards on which the wlan-enable GPIO enabling the
wifi module isn't in output mode by default. We need to set direction to
output while retaining the value that was already set to keep the ath
module on if it's already started.

Fixes: a9aaf1ff88a8 ("power: sequencing: request the WLAN enable GPIO as-is")
Link: https://lore.kernel.org/r/20240823115500.37280-1-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

Merge tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB fixes for 6.11-rc6.  Included in here are:

   - dwc3 driver fixes for reported issues

   - MAINTAINER file update, marking a driver as unsupported :(

   - cdnsp driver fixes

   - USB gadget driver fix

   - USB sysfs fix

   - other tiny fixes

   - new device ids for usb serial driver

  All of these have been in linux-next this week with no reported
  issues"

* tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: option: add MeiG Smart SRM825L
  usb: cdnsp: fix for Link TRB with TC
  usb: dwc3: st: add missing depopulate in probe error path
  usb: dwc3: st: fix probed platform device ref count on probe error path
  usb: dwc3: ep0: Don't reset resource alloc flag (including ep0)
  usb: core: sysfs: Unmerge @usb3_hardware_lpm_attr_group in remove_power_attributes()
  usb: typec: fsa4480: Relax CHIP_ID check
  usb: dwc3: xilinx: add missing depopulate in probe error path
  usb: dwc3: omap: add missing depopulate in probe error path
  dt-bindings: usb: microchip,usb2514: Fix reference USB device schema
  usb: gadget: uvc: queue pump work in uvcg_video_enable()
  cdc-acm: Add DISABLE_ECHO quirk for GE HealthCare UI Controller
  usb: cdnsp: fix incorrect index in cdnsp_get_hw_deq function
  usb: dwc3: core: Prevent USB core invalid event buffer address access
  MAINTAINERS: Mark UVC gadget driver as orphan

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Minor fixes only.

  The sd.c one ignores a sync cache request if format is in progress
  which can happen if formatting a drive across suspend/resume"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: Ignore command SYNCHRONIZE CACHE error if format in progress
  scsi: aacraid: Fix double-free on probe failure
  scsi: lpfc: Fix overflow build issue

Merge tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

- One more write delegation fix

* tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease

Merge tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Chandan Babu:

- Do not call out v1 inodes with non-zero di_nlink field as being
   corrupt

- Change xfs_finobt_count_blocks() to count "free inode btree" blocks
   rather than "inode btree" blocks

- Don't report the number of trimmed bytes via FITRIM because the
   underlying storage isn't required to do anything and failed discard
   IOs aren't reported to the caller anyway

- Fix incorrect setting of rm_owner field in an rmap query

- Report missing disk offset range in an fsmap query

- Obtain m_growlock when extending realtime section of the filesystem

- Reset rootdir extent size hint after extending realtime section of
   the filesystem

* tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: reset rootdir extent size hint after growfsrt
  xfs: take m_growlock when running growfsrt
  xfs: Fix missing interval for missing_owner in xfs fsmap
  xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
  xfs: Fix the owner setting issue for rmap query in xfs fsmap
  xfs: don't bother reporting blocks trimmed via FITRIM
  xfs: xfs_finobt_count_blocks() walks the wrong btree
  xfs: fix folio dirtying for XFILE_ALLOC callers
  xfs: fix di_onlink checking for V1/V2 inodes

Merge tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"There is a fairly large number of bug fixes for Qualcomm platforms,
  most of them addressing issues with the devicetree files for the newly
  added Snapdragon X1 based laptops to make them more reliable.

  The Qualcomm driver changes address a few build-time issues as well as
  runtime problems in the tzmem and scm firmware, the USB Type-C driver,
  and the cmd-db and pmic_glink soc drivers.

  The NXP i.MX usually gets a bunch of devicetree fixes that is
  proportional to the number of supported machines. This includes both
  warning fixes and correctness for the 64-bit i.MX9, i.MX8 and
  layerscape platforms, as well as a single fix for a 32-bit i.MX6 based
  board.

  The other changes are the usual minor changes, including an update to
  the MAINTAINERS file, an omap3 dts file and a SoC driver for mpfs
  (risc-v)"

* tag 'arm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits)
  firmware: microchip: fix incorrect error report of programming:timeout on success
  soc: qcom: pd-mapper: Fix singleton refcount
  firmware: qcom: tzmem: disable sdm670 platform
  soc: qcom: pmic_glink: Actually communicate when remote goes down
  usb: typec: ucsi: Move unregister out of atomic section
  soc: qcom: pmic_glink: Fix race during initialization
  firmware: qcom: qseecom: remove unused functions
  firmware: qcom: tzmem: fix virtual-to-physical address conversion
  firmware: qcom: scm: Mark get_wq_ctx() as atomic call
  arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt
  arm64: dts: qcom: disable GPU on x1e80100 by default
  arm64: dts: imx8mm-phygate: fix typo pinctrcl-0
  arm64: dts: imx95: correct L3Cache cache-sets
  arm64: dts: imx95: correct a55 power-domains
  arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo
  arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges
  ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design
  arm64: dts: imx93: update default value for snps,clk-csr
  arm64: dts: freescale: tqma9352: Fix watchdog reset
  arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962
  ...

Merge tag 'input-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fix from Dmitry Torokhov:

- a fix for Cypress PS/2 touchpad for regression introduced in 6.11
   merge window where a timeout condition is incorrectly reported for
   all extended Cypress commands

* tag 'input-for-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: cypress_ps2 - fix waiting for command response

Merge tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

- Add Manivannan Sadhasivam as PCI native host bridge and endpoint
   driver reviewer (Manivannan Sadhasivam)

- Disable MHI RAM data parity error interrupt for qcom SA8775P SoC to
   work around hardware erratum that causes a constant stream of
   interrupts (Manivannan Sadhasivam)

- Don't try to fall back to qcom Operating Performance Points (OPP)
   support unless the platform actually supports OPP (Manivannan
   Sadhasivam)

- Add imx@lists.linux.dev mailing list to MAINTAINERS for NXP
   layerscape and imx6 PCI controller drivers (Frank Li)

* tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  MAINTAINERS: PCI: Add NXP PCI controller mailing list imx@lists.linux.dev
  PCI: qcom: Use OPP only if the platform supports it
  PCI: qcom-ep: Disable MHI RAM data parity error interrupt for SA8775P SoC
  MAINTAINERS: Add Manivannan Sadhasivam as Reviewer for PCI native host bridge and endpoint drivers

Merge tag 'block-6.11-20240830' of git://git.kernel.dk/linux

Pull block fix from Jens Axboe:
"Fix for a single regression for WRITE_SAME introduced in the 6.11
merge window"

* tag 'block-6.11-20240830' of git://git.kernel.dk/linux:
block: fix detection of unsupported WRITE SAME in blkdev_issue_write_zeroes

Merge tag 'io_uring-6.11-20240830' of git://git.kernel.dk/linux

Pull io_uring fixes from Jens Axboe:

- A fix for a regression that happened in 6.11 merge window, where the
   copying of iovecs for compat mode applications got broken for certain
   cases.

- Fix for a bug introduced in 6.10, where if using recv/send bundles
   with classic provided buffers, the recv/send would fail to set the
   right iovec count. This caused 0 byte send/recv results. Found via
   code coverage testing and writing a test case to exercise it.

* tag 'io_uring-6.11-20240830' of git://git.kernel.dk/linux:
  io_uring/kbuf: return correct iovec count from classic buffer peek
  io_uring/rsrc: ensure compat iovecs are copied correctly

Merge tag 'at91-fixes-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/fixes

Microchip AT91 fixes for v6.11

It contains:
- DTS directory update to match all entries not only those starting with
at91 or sama

Merge tag 'lsm-pr-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm fix from Paul Moore:
"One small patch to correct a NFS permissions problem with SELinux and
Smack"

* tag 'lsm-pr-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
selinux,smack: don't bypass permissions check in inode_setsecctx hook

Merge tag 'pm-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix three issues in the amd-pstate cpufreq driver.

  Specifics:

   - Remove checks for highest performance match on preferred cores when
     updating preferred core ranking in amd-pstate (Mario Limonciello)

   - Make amd-pstate call topology_logical_package_id() instead of
     logical_die_id() to get a socked ID for a CPU (Gautham Shenoy)

   - Fix uninitialized variable in amd_pstate_cpu_boost_update() (Dan
     Carpenter)"

* tag 'pm-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq/amd-pstate-ut: Don't check for highest perf matching on prefcore
  cpufreq/amd-pstate: Use topology_logical_package_id() instead of logical_die_id()
  cpufreq: amd-pstate: Fix uninitialized variable in amd_pstate_cpu_boost_update()

Merge tag 'dmaengine-fix-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:

- A bunch of dw driver changes to fix the src/dst addr width config

- Omap driver fix for sglen initialization

- stm32-dma3 driver lli_size init fix

- dw edma driver fixes for watermark interrupts and unmasking STOP and
   ABORT interrupts

* tag 'dmaengine-fix-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: dw-edma: Do not enable watermark interrupts for HDMA
  dmaengine: dw-edma: Fix unmasking STOP and ABORT interrupts for HDMA
  dmaengine: stm32-dma3: Set lli_size after allocation
  dmaengine: ti: omap-dma: Initialize sglen after allocation
  dmaengine: dw: Unify ret-val local variables naming
  dmaengine: dw: Simplify max-burst calculation procedure
  dmaengine: dw: Define encode_maxburst() above prepare_ctllo() callbacks
  dmaengine: dw: Simplify prepare CTL_LO methods
  dmaengine: dw: Add memory bus width verification
  dmaengine: dw: Add peripheral bus width verification

Merge tag 'phy-fixes-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Pull phy fixes from Vinod Koul:

- Qualcomm QMP X1E80100 PCIe Gen4 PHY initialisation fix

- Freescale imx8mq tuning parameter name fix

- Samsung exynos5 fir for error code in probe()

- Xilinx Zynqmp SGMII linkup failure fix

* tag 'phy-fixes-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: xilinx: phy-zynqmp: Fix SGMII linkup failure on resume
  phy: exynos5-usbdrd: fix error code in probe()
  phy: fsl-imx8mq-usb: fix tuning parameter name
  phy: qcom: qmp-pcie: Fix X1E80100 PCIe Gen4 PHY initialisation

Merge tag 'soundwire-6.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

Pull soundwire fix from Vinod Koul:

- Single fix for non-continous port map programming

* tag 'soundwire-6.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: stream: fix programming slave ports for non-continous port maps

Merge tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:

- Fix a device-stall problem in bad io-page-fault setups (faults
   received from devices with no supporting domain attached).

- Context flush fix for Intel VT-d.

- Do not allow non-read+non-write mapping through iommufd as most
   implementations can not handle that.

- Fix a possible infinite-loop issue in map_pages() path.

- Add Jean-Philippe as reviewer for SMMUv3 SVA support

* tag 'iommu-fixes-v6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  MAINTAINERS: Add Jean-Philippe as SMMUv3 SVA reviewer
  iommu: Do not return 0 from map_pages if it doesn't do anything
  iommufd: Do not allow creating areas without READ or WRITE
  iommu/vt-d: Fix incorrect domain ID in context flush helper
  iommu: Handle iommu faults for a bad iopf setup

MAINTAINERS: PCI: Add NXP PCI controller mailing list imx@lists.linux.dev

Add imx mailing list imx@lists.linux.dev for PCI controller of NXP chips
(Layerscape and iMX).

Link: https://lore.kernel.org/r/20240826202740.970015-1-Frank.Li@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Richard Zhu <hongxing.zhu@nxp.com>

io_uring/kbuf: return correct iovec count from classic buffer peek

io_provided_buffers_select() returns 0 to indicate success, but it should
be returning 1 to indicate that 1 vec was mapped. This causes peeking
to fail with classic provided buffers, and while that's not a use case
that anyone should use, it should still work correctly.

The end result is that no buffer will be selected, and hence a completion
with '0' as the result will be posted, without a buffer attached.

Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease

It is not safe to dereference fl->c.flc_owner without first confirming
fl->fl_lmops is the expected manager.  nfsd4_deleg_getattr_conflict()
tests fl_lmops but largely ignores the result and assumes that flc_owner
is an nfs4_delegation anyway.  This is wrong.

With this patch we restore the "!= &nfsd_lease_mng_ops" case to behave
as it did before the change mentioned below.  This is the same as the
current code, but without any reference to a possible delegation.

Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

io_uring/rsrc: ensure compat iovecs are copied correctly

For buffer registration (or updates), a userspace iovec is copied in
and updated. If the application is within a compat syscall, then the
iovec type is compat_iovec rather than iovec. However, the type used
in __io_sqe_buffers_update() and io_sqe_buffers_register() is always
struct iovec, and hence the source is incremented by the size of a
non-compat iovec in the loop. This misses every other iovec in the
source, and will run into garbage half way through the copies and
return -EFAULT to the application.

Maintain the source address separately and assign to our user vec
pointer, so that copies always happen from the right source address.

While in there, correct a bad placement of __user which triggered
the following sparse warning prior to this fix:

io_uring/rsrc.c:981:33: warning: cast removes address space '__user' of expression
io_uring/rsrc.c:981:30: warning: incorrect type in assignment (different address spaces)
io_uring/rsrc.c:981:30: expected struct iovec const [noderef] __user *uvec
io_uring/rsrc.c:981:30: got struct iovec *[noderef] __user

Fixes: f4eaf8eda89e ("io_uring/rsrc: Drop io_copy_iov in favor of iovec API")
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Merge tag 'usb-serial-6.11-rc6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB-serial device id for 6.11-rc6

Here's a new modem device id.

This one has been in linux-next with no reported issues.

* tag 'usb-serial-6.11-rc6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add MeiG Smart SRM825L

Merge tag 'drm-fixes-2024-08-30' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Another week, another set of GPU fixes. amdgpu and vmwgfx leading the
  charge, then i915 and xe changes along with v3d and some other bits.
  The TTM revert is due to some stuttering graphical apps probably due
  to longer stalls while prefaulting.

  Seems pretty much where I'd expect things,

  ttm:
   - revert prefault change, caused stutters

  aperture:
   - handle non-VGA devices bettter

  amdgpu:
   - SWSMU gaming stability fix
   - SMU 13.0.7 fix
   - SWSMU documentation alignment fix
   - SMU 14.0.x fixes
   - GC 12.x fix
   - Display fix
   - IP discovery fix
   - SMU 13.0.6 fix

  i915:
   - Fix #11195: The external display connect via USB type-C dock stays
     blank after re-connect the dock
   - Make DSI backlight work for 2G version of Lenovo Yoga Tab 3 X90F
   - Move ARL GuC firmware to correct version

  xe:
   - Invalidate media_gt TLBs
   - Fix HWMON i1 power setup write command

  vmwgfx:
   - prevent unmapping active read buffers
   - fix prime with external buffers
   - disable coherent dumb buffers without 3d

  v3d:
   - disable preemption while updating GPU stats"

* tag 'drm-fixes-2024-08-30' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16
  drm/v3d: Disable preemption while updating GPU stats
  drm/amd/pm: Drop unsupported features on smu v14_0_2
  drm/amd/pm: Add support for new P2S table revision
  drm/amdgpu: support for gc_info table v1.3
  drm/amd/display: avoid using null object of framebuffer
  drm/amdgpu/gfx12: set UNORD_DISPATCH in compute MQDs
  drm/amd/pm: update message interface for smu v14.0.2/3
  drm/amdgpu/swsmu: always force a state reprogram on init
  drm/amdgpu/smu13.0.7: print index for profiles
  drm/amdgpu: align pp_power_profile_mode with kernel docs
  drm/i915/dp_mst: Fix MST state after a sink reset
  drm/xe: Invalidate media_gt TLBs
  drm/i915: ARL requires a newer GSC firmware
  drm/i915/dsi: Make Lenovo Yoga Tab 3 X90F DMI match less strict
  video/aperture: optionally match the device in sysfb_disable()
  drm/vmwgfx: Disable coherent dumb buffers without 3d
  drm/vmwgfx: Fix prime with external buffers
  drm/vmwgfx: Prevent unmapping active read buffers
  Revert "drm/ttm: increase ttm pre-fault value to PMD size"

Merge tag 'drm-misc-fixes-2024-08-29' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

A revert for a previous TTM commit causing stuttering, 3 fixes for
vmwgfx related to buffer operations, a fix for video/aperture with
non-VGA primary devices, and a preemption status fix for v3d

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240829-efficient-swift-from-lemuria-f60c05@houat

Merge tag 'drm-xe-fixes-2024-08-29' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- Invalidate media_gt TLBs (Brost)
- Fix HWMON i1 power setup write command (Karthik)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZtB-t5f4uXMrKgnV@intel.com

Merge tag 'execve-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve fix from Kees Cook:

- binfmt_elf_fdpic: fix AUXV size with ELF_HWCAP2 (Max Filippov)

* tag 'execve-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf_fdpic: fix AUXV size calculation when ELF_HWCAP2 is defined

dcache: keep dentry_hashtable or d_hash_shift even when not used

The runtime constant feature removes all the users of these variables,
allowing the compiler to optimize them away.  It's quite difficult to
extract their values from the kernel text, and the memory saved by
removing them is tiny, and it was never the point of this optimization.

Since the dentry_hashtable is a core data structure, it's valuable for
debugging tools to be able to read it easily.  For instance, scripts
built on drgn, like the dentrycache script[1], rely on it to be able to
perform diagnostics on the contents of the dcache.  Annotate it as used,
so the compiler doesn't discard it.

Link: https://github.com/oracle-samples/drgn-tools/blob/3afc56146f54d09dfd1f6d3c1b7436eda7e638be/drgn_tools/dentry.py#L325-L355
Fixes: e3c92e81711d ("runtime constants: add x86 architecture support")
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'drm-intel-fixes-2024-08-29' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Fix #11195: The external display connect via USB type-C dock stays blank after re-connect the dock
- Make DSI backlight work for 2G version of Lenovo Yoga Tab 3 X90F
. Move ARL GuC firmware to correct version
-

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZtAd8WTw1xiSu_TS@jlahtine-mobl.ger.corp.intel.com

Merge tag 'hwmon-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- pt5161l: Fix invalid temperature reading of bad ADC values

- asus-ec-sensors: Remove unsupported VRM temperature from X570-E
   GAMING

* tag 'hwmon-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (pt5161l) Fix invalid temperature reading
  hwmon: (asus-ec-sensors) remove VRM temp X570-E GAMING

Merge tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, wireless and netfilter.

  No known outstanding regressions.

  Current release - regressions:

   - wifi: iwlwifi: fix hibernation

   - eth: ionic: prevent tx_timeout due to frequent doorbell ringing

  Previous releases - regressions:

   - sched: fix sch_fq incorrect behavior for small weights

   - wifi:
      - iwlwifi: take the mutex before running link selection
      - wfx: repair open network AP mode

   - netfilter: restore IP sanity checks for netdev/egress

   - tcp: fix forever orphan socket caused by tcp_abort

   - mptcp: close subflow when receiving TCP+FIN

   - bluetooth: fix random crash seen while removing btnxpuart driver

  Previous releases - always broken:

   - mptcp: more fixes for the in-kernel PM

   - eth: bonding: change ipsec_lock from spin lock to mutex

   - eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response

  Misc:

   - documentation: drop special comment style for net code"

* tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  nfc: pn533: Add poll mod list filling check
  mailmap: update entry for Sriram Yagnaraman
  selftests: mptcp: join: check re-re-adding ID 0 signal
  mptcp: pm: ADD_ADDR 0 is not a new address
  selftests: mptcp: join: validate event numbers
  mptcp: avoid duplicated SUB_CLOSED events
  selftests: mptcp: join: check re-re-adding ID 0 endp
  mptcp: pm: fix ID 0 endp usage after multiple re-creations
  mptcp: pm: do not remove already closed subflows
  selftests: mptcp: join: no extra msg if no counter
  selftests: mptcp: join: check re-adding init endp with != id
  mptcp: pm: reset MPC endp ID when re-added
  mptcp: pm: skip connecting to already established sf
  mptcp: pm: send ACK on an active subflow
  selftests: mptcp: join: check removing ID 0 endpoint
  mptcp: pm: fix RM_ADDR ID for the initial subflow
  mptcp: pm: reuse ID 0 after delete and re-add
  net: busy-poll: use ktime_get_ns() instead of local_clock()
  sctp: fix association labeling in the duplicate COOKIE-ECHO case
  mptcp: pr_debug: add missing \n at the end
  ...

Input: cypress_ps2 - fix waiting for command response

Commit 8bccf667f62a ("Input: cypress_ps2 - report timeouts when reading
command status") uncovered an existing problem with cypress_ps2 driver:
it tries waiting on a PS/2 device waitqueue without using the rest of
libps2. Unfortunately without it nobody signals wakeup for the
waiting process, and each "extended" command was timing out. But the
rest of the code simply did not notice it.

Fix this by switching from homegrown way of sending request to get
command response and reading it to standard ps2_command() which does
the right thing.

Reported-by: Woody Suwalski <terraluna977@gmail.com>
Tested-by: Woody Suwalski <terraluna977@gmail.com>
Fixes: 8bccf667f62a ("Input: cypress_ps2 - report timeouts when reading command status")
Link: https://lore.kernel.org/r/a8252e0f-dab4-ef5e-2aa1-407a6f4c7204@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

drm/xe/hwmon: Fix WRITE_I1 param from u32 to u16

WRITE_I1 sub-command of the POWER_SETUP pcode command accepts a u16
parameter instead of u32. This change prevents potential illegal
sub-command errors.

v2: Mask uval instead of changing the prototype. (Badal)

v3: Rephrase commit message. (Badal)

Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Fixes: 92d44a422d0d ("drm/xe/hwmon: Expose card reactive critical power")
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240827155301.183383-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit a7f657097e96d8fa745c74bb1a239ebd5a8c971c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

nfc: pn533: Add poll mod list filling check

In case of im_protocols value is 1 and tm_protocols value is 0 this
combination successfully passes the check
'if (!im_protocols && !tm_protocols)' in the nfc_start_poll().
But then after pn533_poll_create_mod_list() call in pn533_start_poll()
poll mod list will remain empty and dev->poll_mod_count will remain 0
which lead to division by zero.

Normally no im protocol has value 1 in the mask, so this combination is
not expected by driver. But these protocol values actually come from
userspace via Netlink interface (NFC_CMD_START_POLL operation). So a
broken or malicious program may pass a message containing a "bad"
combination of protocol parameter values so that dev->poll_mod_count
is not incremented inside pn533_poll_create_mod_list(), thus leading
to division by zero.
Call trace looks like:
nfc_genl_start_poll()
  nfc_start_poll()
    ->start_poll()
    pn533_start_poll()

Add poll mod list filling check.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: dfccd0f58044 ("NFC: pn533: Add some polling entropy")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240827084822.18785-1-amishin@t-argos.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

fuse: disable the combination of passthrough and writeback cache

Current design and handling of passthrough is without fuse
caching and with that FUSE_WRITEBACK_CACHE is conflicting.

Fixes: 7dc4e97a4f9a ("fuse: introduce FUSE_PASSTHROUGH capability")
Cc: stable@kernel.org # v6.9
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Merge tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 sets on NFT_PKTINFO_L4PROTO for UDP packets less than 4 bytes
payload from netdev/egress by subtracting skb_network_offset() when
validating IPv4 packet length, otherwise 'meta l4proto udp' never
matches.

Patch #2 subtracts skb_network_offset() when validating IPv6 packet
length for netdev/egress.

netfilter pull request 24-08-28

* tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables_ipv6: consider network offset in netdev/egress validation
netfilter: nf_tables: restore IP sanity checks for netdev/egress
====================

Link: https://patch.msgid.link/20240828214708.619261-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mailmap: update entry for Sriram Yagnaraman

Link my old est.tech address to my active mail address

Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://patch.msgid.link/20240828072417.4111996-1-sriram.yagnaraman@ericsson.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'mptcp-more-fixes-for-the-in-kernel-pm'

Matthieu Baerts says:

====================
mptcp: more fixes for the in-kernel PM

Here is a new batch of fixes for the MPTCP in-kernel path-manager:

Patch 1 ensures the address ID is set to 0 when the path-manager sends
an ADD_ADDR for the address of the initial subflow. The same fix is
applied when a new subflow is created re-using this special address. A
fix for v6.0.

Patch 2 is similar, but for the case where an endpoint is removed: if
this endpoint was used for the initial address, it is important to send
a RM_ADDR with this ID set to 0, and look for existing subflows with the
ID set to 0. A fix for v6.0 as well.

Patch 3 validates the two previous patches.

Patch 4 makes the PM selecting an "active" path to send an address
notification in an ACK, instead of taking the first path in the list. A
fix for v5.11.

Patch 5 fixes skipping the establishment of a new subflow if a previous
subflow using the same pair of addresses is being closed. A fix for
v5.13.

Patch 6 resets the ID linked to the initial subflow when the linked
endpoint is re-added, possibly with a different ID. A fix for v6.0.

Patch 7 validates the three previous patches.

Patch 8 is a small fix for the MPTCP Join selftest, when being used with
older subflows not supporting all MIB counters. A fix for a commit
introduced in v6.4, but backported up to v5.10.

Patch 9 avoids the PM to try to close the initial subflow multiple
times, and increment counters while nothing happened. A fix for v5.10.

Patch 10 stops incrementing local_addr_used and add_addr_accepted
counters when dealing with the address ID 0, because these counters are
not taking into account the initial subflow, and are then not
decremented when the linked addresses are removed. A fix for v6.0.

Patch 11 validates the previous patch.

Patch 12 avoids the PM to send multiple SUB_CLOSED events for the
initial subflow. A fix for v5.12.

Patch 13 validates the previous patch.

Patch 14 stops treating the ADD_ADDR 0 as a new address, and accepts it
in order to re-create the initial subflow if it has been closed, even if
the limit for *new* addresses -- not taking into account the address of
the initial subflow -- has been reached. A fix for v5.10.

Patch 15 validates the previous patch.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Matthieu Baerts (NGI0) (15):
      mptcp: pm: reuse ID 0 after delete and re-add
      mptcp: pm: fix RM_ADDR ID for the initial subflow
      selftests: mptcp: join: check removing ID 0 endpoint
      mptcp: pm: send ACK on an active subflow
      mptcp: pm: skip connecting to already established sf
      mptcp: pm: reset MPC endp ID when re-added
      selftests: mptcp: join: check re-adding init endp with != id
      selftests: mptcp: join: no extra msg if no counter
      mptcp: pm: do not remove already closed subflows
      mptcp: pm: fix ID 0 endp usage after multiple re-creations
      selftests: mptcp: join: check re-re-adding ID 0 endp
      mptcp: avoid duplicated SUB_CLOSED events
      selftests: mptcp: join: validate event numbers
      mptcp: pm: ADD_ADDR 0 is not a new address
      selftests: mptcp: join: check re-re-adding ID 0 signal

net/mptcp/pm.c                                  |   4 +-
net/mptcp/pm_netlink.c                          |  87 ++++++++++----
net/mptcp/protocol.c                            |   6 +
net/mptcp/protocol.h                            |   5 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 153 ++++++++++++++++++++----
tools/testing/selftests/net/mptcp/mptcp_lib.sh  |   4 +
6 files changed, 209 insertions(+), 50 deletions(-)
---
base-commit: 3a0504d54b3b57f0d7bf3d9184a00c9f8887f6d7
change-id: 20240826-net-mptcp-more-pm-fix-ffa61a36f817

Best regards,
====================

Link: https://patch.msgid.link/20240828-net-mptcp-more-pm-fix-v2-0-7f11b283fff7@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: join: check re-re-adding ID 0 signal

This test extends "delete re-add signal" to validate the previous
commit: when the 'signal' endpoint linked to the initial subflow (ID 0)
is re-added multiple times, it will re-send the ADD_ADDR with id 0. The
client should still be able to re-create this subflow, even if the
add_addr_accepted limit has been reached as this special address is not
considered as a new address.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: ADD_ADDR 0 is not a new address

The ADD_ADDR 0 with the address from the initial subflow should not be
considered as a new address: this is not something new. If the host
receives it, it simply means that the address is available again.

When receiving an ADD_ADDR for the ID 0, the PM already doesn't consider
it as new by not incrementing the 'add_addr_accepted' counter. But the
'accept_addr' might not be set if the limit has already been reached:
this can be bypassed in this case. But before, it is important to check
that this ADD_ADDR for the ID 0 is for the same address as the initial
subflow. If not, it is not something that should happen, and the
ADD_ADDR can be ignored.

Note that if an ADD_ADDR is received while there is already a subflow
opened using the same address, this ADD_ADDR is ignored as well. It
means that if multiple ADD_ADDR for ID 0 are received, there will not be
any duplicated subflows created by the client.

Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: join: validate event numbers

This test extends "delete and re-add" and "delete re-add signal" to
validate the previous commit: the number of MPTCP events are checked to
make sure there are no duplicated or unexpected ones.

A new helper has been introduced to easily check these events. The
missing events have been added to the lib.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: b911c97c7dc7 ("mptcp: add netlink event support")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: avoid duplicated SUB_CLOSED events

The initial subflow might have already been closed, but still in the
connection list. When the worker is instructed to close the subflows
that have been marked as closed, it might then try to close the initial
subflow again.

A consequence of that is that the SUB_CLOSED event can be seen twice:

  # ip mptcp endpoint
  1.1.1.1 id 1 subflow dev eth0
  2.2.2.2 id 2 subflow dev eth1

  # ip mptcp monitor &
  [         CREATED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
  [     ESTABLISHED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
  [  SF_ESTABLISHED] remid=0 locid=2 saddr4=2.2.2.2 daddr4=9.9.9.9

  # ip mptcp endpoint delete id 1
  [       SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
  [       SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9

The first one is coming from mptcp_pm_nl_rm_subflow_received(), and the
second one from __mptcp_close_subflow().

To avoid doing the post-closed processing twice, the subflow is now
marked as closed the first time.

Note that it is not enough to check if we are dealing with the first
subflow and check its sk_state: the subflow might have been reset or
closed before calling mptcp_close_ssk().

Fixes: b911c97c7dc7 ("mptcp: add netlink event support")
Cc: stable@vger.kernel.org
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: join: check re-re-adding ID 0 endp

This test extends "delete and re-add" to validate the previous commit:
when the endpoint linked to the initial subflow (ID 0) is re-added
multiple times, it was no longer being used, because the internal linked
counters are not decremented for this special endpoint: it is not an
additional endpoint.

Here, the "del/add id 0" steps are done 3 times to unsure this case is
validated.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: fix ID 0 endp usage after multiple re-creations

'local_addr_used' and 'add_addr_accepted' are decremented for addresses
not related to the initial subflow (ID0), because the source and
destination addresses of the initial subflows are known from the
beginning: they don't count as "additional local address being used" or
"ADD_ADDR being accepted".

It is then required not to increment them when the entrypoint used by
the initial subflow is removed and re-added during a connection. Without
this modification, this entrypoint cannot be removed and re-added more
than once.

Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/512
Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Reported-by: syzbot+455d38ecd5f655fc45cf@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/00000000000049861306209237f4@google.com
Cc: stable@vger.kernel.org
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: do not remove already closed subflows

It is possible to have in the list already closed subflows, e.g. the
initial subflow has been already closed, but still in the list. No need
to try to close it again, and increments the related counters again.

Fixes: 0ee4261a3681 ("mptcp: implement mptcp_pm_remove_subflow")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: join: no extra msg if no counter

The checksum and fail counters might not be available. Then no need to
display an extra message with missing info.

While at it, fix the indentation around, which is wrong since the same
commit.

Fixes: 47867f0a7e83 ("selftests: mptcp: join: skip check if MIB counter not supported")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: join: check re-adding init endp with != id

The initial subflow has a special local ID: 0. It is specific per
connection.

When a global endpoint is deleted and re-added later, it can have a
different ID, but the kernel should still use the ID 0 if it corresponds
to the initial address.

This test validates this behaviour: the endpoint linked to the initial
subflow is removed, and re-added with a different ID.

Note that removing the initial subflow will not decrement the 'subflows'
counters, which corresponds to the *additional* subflows. On the other
hand, when the same endpoint is re-added, it will increment this
counter, as it will be seen as an additional subflow this time.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: reset MPC endp ID when re-added

The initial subflow has a special local ID: 0. It is specific per
connection.

When a global endpoint is deleted and re-added later, it can have a
different ID -- most services managing the endpoints automatically don't
force the ID to be the same as before. It is then important to track
these modifications to be consistent with the ID being used for the
address used by the initial subflow, not to confuse the other peer or to
send the ID 0 for the wrong address.

Now when removing an endpoint, msk->mpc_endpoint_id is reset if it
corresponds to this endpoint. When adding a new endpoint, the same
variable is updated if the address match the one of the initial subflow.

Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: skip connecting to already established sf

The lookup_subflow_by_daddr() helper checks if there is already a
subflow connected to this address. But there could be a subflow that is
closing, but taking time due to some reasons: latency, losses, data to
process, etc.

If an ADD_ADDR is received while the endpoint is being closed, it is
better to try connecting to it, instead of rejecting it: the peer which
has sent the ADD_ADDR will not be notified that the ADD_ADDR has been
rejected for this reason, and the expected subflow will not be created
at the end.

This helper should then only look for subflows that are established, or
going to be, but not the ones being closed.

Fixes: d84ad04941c3 ("mptcp: skip connecting the connected address")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: send ACK on an active subflow

Taking the first one on the list doesn't work in some cases, e.g. if the
initial subflow is being removed. Pick another one instead of not
sending anything.

Fixes: 84dfe3677a6f ("mptcp: send out dedicated ADD_ADDR packet")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: join: check removing ID 0 endpoint

Removing the endpoint linked to the initial subflow should trigger a
RM_ADDR for the right ID, and the removal of the subflow. That's what is
now being verified in the "delete and re-add" test.

Note that removing the initial subflow will not decrement the 'subflows'
counters, which corresponds to the *additional* subflows. On the other
hand, when the same endpoint is re-added, it will increment this
counter, as it will be seen as an additional subflow this time.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: fix RM_ADDR ID for the initial subflow

The initial subflow has a special local ID: 0. When an endpoint is being
deleted, it is then important to check if its address is not linked to
the initial subflow to send the right ID.

If there was an endpoint linked to the initial subflow, msk's
mpc_endpoint_id field will be set. We can then use this info when an
endpoint is being removed to see if it is linked to the initial subflow.

So now, the correct IDs are passed to mptcp_pm_nl_rm_addr_or_subflow(),
it is no longer needed to use mptcp_local_id_match().

Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: reuse ID 0 after delete and re-add

When the endpoint used by the initial subflow is removed and re-added
later, the PM has to force the ID 0, it is a special case imposed by the
MPTCP specs.

Note that the endpoint should then need to be re-added reusing the same
ID.

Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'amd-drm-fixes-6.11-2024-08-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.11-2024-08-28:

amdgpu:
- SWSMU gaming stability fix
- SMU 13.0.7 fix
- SWSMU documentation alignment fix
- SMU 14.0.x fixes
- GC 12.x fix
- Display fix
- IP discovery fix
- SMU 13.0.6 fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240828184908.125387-1-alexander.deucher@amd.com

Merge tag 'random-6.11-rc6-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random

Pull random number generator fix from Jason Donenfeld:
"Reject invalid flags passed to vgetrandom() in the same way that
  getrandom() does, so that the behavior is the same, from Yann.

  The flags argument to getrandom() only has a behavioral effect on the
  function if the RNG isn't initialized yet, so vgetrandom() falls back
  to the syscall in that case. But if the RNG is initialized, all of the
  flags behave the same way, so vgetrandom() didn't bother checking
  them, and just ignored them entirely.

  But that doesn't account for invalid flags passed in, which need to be
  rejected so we can use them later"

* tag 'random-6.11-rc6-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: vDSO: reject unknown getrandom() flags

net: busy-poll: use ktime_get_ns() instead of local_clock()

Typically, busy-polling durations are below 100 usec.

When/if the busy-poller thread migrates to another cpu,
local_clock() can be off by +/-2msec or more for small
values of HZ, depending on the platform.

Use ktimer_get_ns() to ensure deterministic behavior,
which is the whole point of busy-polling.

Fixes: 060212928670 ("net: add low latency socket poll")
Fixes: 9a3c71aa8024 ("net: convert low latency sockets to sched_clock()")
Fixes: 37089834528b ("sched, net: Fixup busy_loop_us_clock()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240827114916.223377-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-2024-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Regressions:
* wfx: fix for open network connection
* iwlwifi: fix for hibernate (due to fast resume feature)
* iwlwifi: fix for a few warnings that were recently added
   (had previously been messages not warnings)

Previously broken:
* mwifiex: fix static structures used for per-device data
* iwlwifi: some harmless FW related messages were tagged
   too high priority
* iwlwifi: scan buffers weren't checked correctly
* mac80211: SKB leak on beacon error path
* iwlwifi: fix ACPI table interop with certain BIOSes
* iwlwifi: fix locking for link selection
* mac80211: fix SSID comparison in beacon validation

* tag 'wireless-2024-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: clear trans->state earlier upon error
  wifi: wfx: repair open network AP mode
  wifi: mac80211: free skb on error path in ieee80211_beacon_get_ap()
  wifi: iwlwifi: mvm: don't wait for tx queues if firmware is dead
  wifi: iwlwifi: mvm: allow 6 GHz channels in MLO scan
  wifi: iwlwifi: mvm: pause TCM when the firmware is stopped
  wifi: iwlwifi: fw: fix wgds rev 3 exact size
  wifi: iwlwifi: mvm: take the mutex before running link selection
  wifi: iwlwifi: mvm: fix iwl_mvm_max_scan_ie_fw_cmd_room()
  wifi: iwlwifi: mvm: fix iwl_mvm_scan_fits() calculation
  wifi: iwlwifi: lower message level for FW buffer destination
  wifi: iwlwifi: mvm: fix hibernation
  wifi: mac80211: fix beacon SSID mismatch handling
  wifi: mwifiex: duplicate static structs used in driver instances
====================

Link: https://patch.msgid.link/20240828100151.23662-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selinux,smack: don't bypass permissions check in inode_setsecctx hook

Marek Gresko reports that the root user on an NFS client is able to
change the security labels on files on an NFS filesystem that is
exported with root squashing enabled.

The end of the kerneldoc comment for __vfs_setxattr_noperm() states:

*  This function requires the caller to lock the inode's i_mutex before it
*  is executed. It also assumes that the caller will make the appropriate
*  permission checks.

nfsd_setattr() does do permissions checking via fh_verify() and
nfsd_permission(), but those don't do all the same permissions checks
that are done by security_inode_setxattr() and its related LSM hooks do.

Since nfsd_setattr() is the only consumer of security_inode_setsecctx(),
simplest solution appears to be to replace the call to
__vfs_setxattr_noperm() with a call to __vfs_setxattr_locked().  This
fixes the above issue and has the added benefit of causing nfsd to
recall conflicting delegations on a file when a client tries to change
its security label.

Cc: stable@kernel.org
Reported-by: Marek Gresko <marek.gresko@protonmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218809
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region

Under certain conditions, the range to be cleared by FALLOC_FL_ZERO_RANGE
may only be buffered locally and not yet have been flushed to the server.
For example:

xfs_io -f -t -c "pwrite -S 0x41 0 4k" \
     -c "pwrite -S 0x42 4k 4k" \
     -c "fzero 0 4k" \
     -c "pread -v 0 8k" /xfstest.test/foo

will write two 4KiB blocks of data, which get buffered in the pagecache,
and then fallocate() is used to clear the first 4KiB block on the server -
but we don't flush the data first, which means the EOF position on the
server is wrong, and so the FSCTL_SET_ZERO_DATA RPC fails (and xfs_io
ignores the error), but then when we try to read it, we see the old data.

Fix this by preflushing any part of the target region that above the
server's idea of the EOF position to force the server to update its EOF
position.

Note, however, that we don't want to simply expand the file by moving the
EOF before doing the FSCTL_SET_ZERO_DATA[*] because someone else might see
the zeroed region or if the RPC fails we then have to try to clean it up or
risk getting corruption.

[*] And we have to move the EOF first otherwise FSCTL_SET_ZERO_DATA won't
do what we want.

This fixes the generic/008 xfstest.

[!] Note: A better way to do this might be to split the operation into two
parts: we only do FSCTL_SET_ZERO_DATA for the part of the range below the
server's EOF and then, if that worked, invalidate the buffered pages for the
part above the range.

Fixes: 6b69040247e1 ("cifs/smb3: Fix data inconsistent when zero file range")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <stfrench@microsoft.com>
cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
cc: Pavel Shilovsky <pshilov@microsoft.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>

Merge tag 'riscv-soc-fixes-for-v6.11-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into arm/fixes

RISC-V soc fixes for v6.11-rc6

Prevent an erroneous unconditional report of a timeout during firmware
upload in the mpfs-auto-update driver.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-soc-fixes-for-v6.11-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
firmware: microchip: fix incorrect error report of programming:timeout on success

Link: https://lore.kernel.org/r/20240828-fidelity-almighty-18d5434aaef0@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'qcom-arm64-fixes-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes

Qualcomm Arm64 DeviceTree fixes for v6.11

On X1E the GPU node is disabled by default, to be enabled in the
individual devices once the developers install the required firmware.

The generic EDP panel driver used on the X1E CRD is replaced with the
Samsung ATNA45AF01 driver, in order to ensure backlight is brought back
up after being turned off.

The pin configuration for PCIe-related pins are corrected across all the
X1E targets. The PCIe controllers gain a minimum OPP vote, and PCIe
domain numbers are corrected.

WiFi calibration variant information is added to the Lenovo Yoga Slim
7x, to pick the right data from the firmware packages.

The incorrect Adreno SMMU global interrupt is corrected.

For IPQ5332, the IRQ triggers for the USB controller are corrected.

* tag 'qcom-arm64-fixes-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (23 commits)
  arm64: dts: qcom: x1e80100: Fix Adreno SMMU global interrupt
  arm64: dts: qcom: disable GPU on x1e80100 by default
  arm64: dts: qcom: x1e80100-crd: Fix backlight
  arm64: dts: qcom: x1e80100-yoga-slim7x: fix missing PCIe4 gpios
  arm64: dts: qcom: x1e80100-yoga-slim7x: disable PCIe6a perst pull down
  arm64: dts: qcom: x1e80100-yoga-slim7x: fix up PCIe6a pinctrl node
  arm64: dts: qcom: x1e80100-yoga-slim7x: fix PCIe4 PHY supply
  arm64: dts: qcom: x1e80100-vivobook-s15: fix missing PCIe4 gpios
  arm64: dts: qcom: x1e80100-vivobook-s15: disable PCIe6a perst pull down
  arm64: dts: qcom: x1e80100-vivobook-s15: fix up PCIe6a pinctrl node
  arm64: dts: qcom: x1e80100-vivobook-s15: fix PCIe4 PHY supply
  arm64: dts: qcom: x1e80100-qcp: fix missing PCIe4 gpios
  arm64: dts: qcom: x1e80100-qcp: disable PCIe6a perst pull down
  arm64: dts: qcom: x1e80100-qcp: fix up PCIe6a pinctrl node
  arm64: dts: qcom: x1e80100-qcp: fix PCIe4 PHY supply
  arm64: dts: qcom: x1e80100-crd: fix missing PCIe4 gpios
  arm64: dts: qcom: x1e80100-crd: disable PCIe6a perst pull down
  arm64: dts: qcom: x1e80100-crd: fix up PCIe6a pinctrl node
  arm64: dts: qcom: x1e80100: add missing PCIe minimum OPP
  arm64: dts: qcom: x1e80100: fix PCIe domain numbers
  ...

Link: https://lore.kernel.org/r/20240826152426.1648383-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'qcom-arm64-defconfig-fixes-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes

Qualcomm Arm64 defconfig fix for 6.11

Enable the Samsung ATNA33XC20 display panel driver, as we switched from
the generic EDP panel for some of the X1E devices in v6.11.

* tag 'qcom-arm64-defconfig-fixes-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: defconfig: Add CONFIG_DRM_PANEL_SAMSUNG_ATNA33XC20

Link: https://lore.kernel.org/r/20240826145736.1646729-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'qcom-drivers-fixes-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes

Qualcomm driver fixes for v6.11

This corrects the tzmem virt-to-phys conversion, which caused issues for
the uefisecapp implementation of EFI variable access. SDM670 is excluded
from tzmem usage due to reported issues.

The SCM get wait queue context call is corrected to be marked ATOMIC and
some dead code in qseecom, following the tzmem conversion, is removed.

The memory backing command DB is remapped writecombined, to avoid XPU
violations when Linux runs without the Qualcomm hypervisor.

Two compile fixes are added for pd-mapper, and the broken reference
count is corrected, to make pd-mapper deal with remoteprocs going away.

In pmic_glink a race condition where the client callbacks might be
called before we returned the client handle is corrected. The broken conditions
for when to signal that the firmware is going down is also corrected.

In the pmic_glink UCSI driver, the ucsi_unregister() is moved out of the
pdr callback, as this is being invoked in atomic context.

Konrad's email address is updated in MAINTAINERS, and related mailmap
entries are added.

* tag 'qcom-drivers-fixes-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  soc: qcom: pd-mapper: Fix singleton refcount
  firmware: qcom: tzmem: disable sdm670 platform
  soc: qcom: pmic_glink: Actually communicate when remote goes down
  usb: typec: ucsi: Move unregister out of atomic section
  soc: qcom: pmic_glink: Fix race during initialization
  firmware: qcom: qseecom: remove unused functions
  firmware: qcom: tzmem: fix virtual-to-physical address conversion
  firmware: qcom: scm: Mark get_wq_ctx() as atomic call
  MAINTAINERS: Update Konrad Dybcio's email address
  mailmap: Add an entry for Konrad Dybcio
  soc: qcom: pd-mapper: mark qcom_pdm_domains as __maybe_unused
  soc: qcom: cmd-db: Map shared memory as WC, not WB
  soc: qcom: pd-mapper: Depend on ARCH_QCOM || COMPILE_TEST

Link: https://lore.kernel.org/r/20240826145209.1646159-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'imx-fixes-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes

i.MX fixes for 6.11:

- One imx8mp-beacon-kit change from Adam Ford to fix the broken WM8962
  audio support
- One pinctrl property typo fix for imx8mm-phygate
- One layerscape fix from Krzysztof Kozlowski to get thermal nodes
  correct name length
- A couple of imx93-tqma9352 fixes from Markus Niebel, one on CMA
  alloc-ranges and the other on SD-Card cd-gpios typo
- One change from Michal Vokáč to fix imx6dl-yapp43 LED current to match
  the HW design
- A couple of imx95 fixes from Peng Fan, one to correct a55 power
  domains and the other to correct L3Cache cache-sets
- One tqma9352 watchdog reset fix from Sascha Hauer
- One imx93 change from Shenwei Wang to fix the default value for STMMAC
  EQOS snps,clk-csr

* tag 'imx-fixes-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  arm64: dts: imx8mm-phygate: fix typo pinctrcl-0
  arm64: dts: imx95: correct L3Cache cache-sets
  arm64: dts: imx95: correct a55 power-domains
  arm64: dts: freescale: imx93-tqma9352-mba93xxla: fix typo
  arm64: dts: freescale: imx93-tqma9352: fix CMA alloc-ranges
  ARM: dts: imx6dl-yapp43: Increase LED current to match the yapp4 HW design
  arm64: dts: imx93: update default value for snps,clk-csr
  arm64: dts: freescale: tqma9352: Fix watchdog reset
  arm64: dts: imx8mp-beacon-kit: Fix Stereo Audio on WM8962
  arm64: dts: layerscape: fix thermal node names length

Link: https://lore.kernel.org/r/ZrtsTO1+jXhJ6GSM@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'omap-for-v6.11/fixes-signed' of https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap into arm/fixes

OMAP fixes for v6.11-rc

- omap3-n900: fix accelerometer orientation

* tag 'omap-for-v6.11/fixes-signed' of https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap:
ARM: dts: omap3-n900: correct the accelerometer orientation

Link: https://lore.kernel.org/r/7h4j7eyhyh.fsf@baylibre.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'loongarch-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Remove the unused dma-direct.h, and some bug & warning fixes"

* tag 'loongarch-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Invalidate guest steal time address on vCPU reset
  LoongArch: Add ifdefs to fix LSX and LASX related warnings
  LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE
  LoongArch: Remove the unused dma-direct.h