Git Repo - linux.git/log

net: nexthop: Initialize all fields in dumped nexthops

struct nexthop_grp contains two reserved fields that are not initialized by
nla_put_nh_group(), and carry garbage. This can be observed e.g. with
strace (edited for clarity):

    # ip nexthop add id 1 dev lo
    # ip nexthop add id 101 group 1
    # strace -e recvmsg ip nexthop get id 101
    ...
    recvmsg(... [{nla_len=12, nla_type=NHA_GROUP},
                 [{id=1, weight=0, resvd1=0x69, resvd2=0x67}]] ...) = 52

The fields are reserved and therefore not currently used. But as they are, they
leak kernel memory, and the fact they are not just zero complicates repurposing
of the fields for new ends. Initialize the full structure.

Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: Correct byte order of perfect_match

The perfect_match parameter of the update_vlan_hash operation is __le16,
and is correctly converted from host byte-order in the lone caller,
stmmac_vlan_update().

However, the implementations of this caller, dwxgmac2_update_vlan_hash()
and dwxgmac2_update_vlan_hash(), both treat this parameter as host byte
order, using the following pattern:

u32 value = ...
...
writel(value | perfect_match, ...);

This is not correct because both:
1) value is host byte order; and
2) writel expects a host byte order value as it's first argument

I believe that this will break on big endian systems. And I expect it
has gone unnoticed by only being exercised on little endian systems.

The approach taken by this patch is to update the callback, and it's
caller to simply use a host byte order value.

Flagged by Sparse.
Compile tested only.

Fixes: c7ab0b8088d7 ("net: stmmac: Fallback to VLAN Perfect filtering if HASH is not available")
Signed-off-by: Simon Horman <[email protected]>
Reviewed-by: Maxime Chevallier <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

io_uring: align iowq and task request error handling

There is a difference in how io_queue_sqe and io_wq_submit_work treat
error codes they get from io_issue_sqe. The first one fails anything
unknown but latter only fails when the code is negative.

It doesn't make sense to have this discrepancy, align them to the
io_queue_sqe behaviour.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/c550e152bf4a290187f91a4322ddcb5d6d1f2c73.1721819383.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

io_uring: kill REQ_F_CANCEL_SEQ

We removed the reliance on the flag by the cancellation path and now
it's unused.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/e57afe566bbe4fefeb44daffb08900f2a4756577.1721819383.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

io_uring: simplify io_uring_cmd return

We don't have to return error code from an op handler back to core
io_uring, so once io_uring_cmd() sets the results and handles errors we
can juts return IOU_OK and simplify the code.

Note, only valid with e0b23d9953b0c ("io_uring: optimise ltimeout for
inline execution"), there was a problem with iopoll before.

Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/8eae2be5b2a49236cd5f1dadbd1aa5730e9e2d4f.1721819383.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

io_uring: fix io_match_task must_hold

The __must_hold annotation in io_match_task() uses a non existing
parameter "req", fix it.

Fixes: 6af3f48bf6156 ("io_uring: fix link traversal locking")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/3e65ee7709e96507cef3d93291746f2c489f2307.1721819383.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

io_uring: don't allow netpolling with SETUP_IOPOLL

IORING_SETUP_IOPOLL rings don't have any netpoll handling, let's fail
attempts to register netpolling in this case, there might be people who
will mix up IOPOLL and netpoll.

Cc: [email protected]
Fixes: ef1186c1a875b ("io_uring: add register/unregister napi function")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/1e7553aee0a8ae4edec6742cd6dd0c1e6914fba8.1721819383.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

io_uring: tighten task exit cancellations

io_uring_cancel_generic() should retry if any state changes like a
request is completed, however in case of a task exit it only goes for
another loop and avoids schedule() if any tracked (i.e. REQ_F_INFLIGHT)
request got completed.

Let's assume we have a non-tracked request executing in iowq and a
tracked request linked to it. Let's also assume
io_uring_cancel_generic() fails to find and cancel the request, i.e.
via io_run_local_work(), which may happen as io-wq has gaps.
Next, the request logically completes, io-wq still hold a ref but queues
it for completion via tw, which happens in
io_uring_try_cancel_requests(). After, right before prepare_to_wait()
io-wq puts the request, grabs the linked one and tries executes it, e.g.
arms polling. Finally the cancellation loop calls prepare_to_wait(),
there are no tw to run, no tracked request was completed, so the
tctx_inflight() check passes and the task is put to indefinite sleep.

Cc: [email protected]
Fixes: 3f48cf18f886c ("io_uring: unify files and task cancel")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/acac7311f4e02ce3c43293f8f1fda9c705d158f1.1721819383.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

riscv: boot: remove duplicated targets line

The "targets:" is duplicated in another line, remove the one with less
targets.

Signed-off-by: Jisheng Zhang <[email protected]>
Reviewed-by: Emil Renner Berthing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

trace: riscv: Remove deprecated kprobe on ftrace support

Since commit 7caa9765465f60 ("ftrace: riscv: move from REGS to ARGS"),
kprobe on ftrace is not supported by riscv, because riscv's support for
FTRACE_WITH_REGS has been replaced with support for FTRACE_WITH_ARGS, and
KPROBES_ON_FTRACE will be supplanted by FPROBES. So remove the deprecated
kprobe on ftrace support, which is misunderstood.

Signed-off-by: Jinjie Ruan <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>

selftests: forwarding: skip if kernel not support setting bridge fdb learning limit

If the testing kernel doesn't support setting fdb_max_learned or show
fdb_n_learned, just skip it. Or we will get errors like

./bridge_fdb_learning_limit.sh: line 218: [: null: integer expression expected
./bridge_fdb_learning_limit.sh: line 225: [: null: integer expression expected

Fixes: 6f84090333bb ("selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest")
Signed-off-by: Hangbin Liu <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: Johannes Nixdorf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: Return non-zero value from tipc_udp_addr2str() on error

tipc_udp_addr2str() should return non-zero value if the UDP media
address is invalid. Otherwise, a buffer overflow access can occur in
tipc_media_addr_printf(). Fix this by returning 1 on an invalid UDP
media address.

Fixes: d0f91938bede ("tipc: add ip/udp media type")
Signed-off-by: Shigeru Yoshida <[email protected]>
Reviewed-by: Tung Nguyen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

thermal: core: Back off when polling thermal zones on errors

Commit a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone
temperature is invalid") introduced a polling mechanism by which the
thermal core attampts to get a valid temperature value for thermal zones
where the .get_temp() callback returns errors to start with (for
example, due to initialization ordering woes).  However, this polling is
carried out periodically ad infinitum and every iteration of it causes
a message to be printed to the kernel log which means a lot of log noise
on systems where there are thermal zones that never get ready for some
reason.  It is also not really useful to continuously poll thermal zones
that never respond.

To address this, modify the thermal core to increase the delay between
consecutive thermal zone temperature checks after every check that fails
until it reaches a certain maximum value.  At that point, the thermal
zone in question will be disabled, but user space will be able to
reenable it if it believes that the failure is transient.

Also change the code to print messages regarding failed temperature
checks to the kernel log only twice, once when the thermal zone's
.get_temp() callback returns an error for the first time and once when
disabling the given thermal zone.  In addition, a dev_crit() message
will be printed at that point if the given thermal zone contains a
critical trip point to notify the system operator about the situation.

Fixes: a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
Link: https://lore.kernel.org/linux-acpi/CAGnHSE=RyPK++UG0-wAtVKgeJxe0uzFYgLxm+RUOKKoQquW=Ow@mail.gmail.com/
Reported-by: Tom Yan <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Link: https://patch.msgid.link/[email protected]

ASoC: SOF: ipc4-topology: Preserve the DMA Link ID for ChainDMA on unprepare

The DMA Link ID is set to the IPC message's primary during dai_config,
which is only during hw_params.
During xrun handling the hw_params is not called and the DMA Link ID
information will be lost.

All other fields in the message expected to be 0 for re-configuration, only
the DMA Link ID needs to be preserved and the in case of repeated
dai_config, it is correctly updated (masked and then set).

Cc: [email protected]
Fixes: ca5ce0caa67f ("ASoC: SOF: ipc4/intel: Add support for chained DMA")
Link: https://github.com/thesofproject/linux/issues/5116
Signed-off-by: Peter Ujfalusi <[email protected]>
Reviewed-by: Bard Liao <[email protected]>
Reviewed-by: Ranjani Sridharan <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>

ASoC: SOF: ipc4-topology: Only handle dai_config with HW_PARAMS for ChainDMA

The DMA Link ID is only valid in snd_sof_dai_config_data when the
dai_config is called with HW_PARAMS.

The commit that this patch fixes is actually moved a code section without
changing it, the same bug exists in the original code, needing different
patch to kernel prior to 6.9 kernels.

Cc: [email protected]
Fixes: 3858464de57b ("ASoC: SOF: ipc4-topology: change chain_dma handling in dai_config")
Link: https://github.com/thesofproject/linux/issues/5116
Signed-off-by: Peter Ujfalusi <[email protected]>
Reviewed-by: Bard Liao <[email protected]>
Reviewed-by: Ranjani Sridharan <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>

kbuild: rpm-pkg: Fix C locale setup

semicolon separation in LC_ALL is wrong. Either variable needs to be
exported before as a separate commit or set as part of the commit in the
beginning. Used second variant.

This fixes broken build on user's locale setup which makes 'date' binary
to produce invalid characters in rpm changelog (e.g. cs_CZ.UTF-8 'čec'):

$ make binrpm-pkg
  GEN     rpmbuild/SPECS/kernel.spec
rpmbuild -bb rpmbuild/SPECS/kernel.spec --define='_topdirlinux/rpmbuild' \
    --target x86_64-linux --build-in-place --noprep --define='_smp_mflags \
    %{nil}' $(rpm -q rpm >/dev/null 2>&1 || echo --nodeps)
Building target platforms: x86_64-linux
Building for target x86_64-linux
error: bad date in %changelog: St čec 24 2024 user <user@somehost>
make[2]: *** [scripts/Makefile.package:71: binrpm-pkg] Error 1
make[1]: *** [linux/Makefile:1546: binrpm-pkg] Error 2
make: *** [Makefile:224: __sub-make] Error 2

Fixes: 301c10908e42 ("kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec")
Signed-off-by: Petr Vorel <[email protected]>
Reviewed-by: Miguel Ojeda <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

inode: clarify what's locked

In __wait_on_freeing_inode() we warn in case the inode_hash_lock is held
but the inode is unhashed. We then release the inode_lock. So using
"locked" as parameter name is confusing. Use is_inode_hash_locked as
parameter name instead.

Signed-off-by: Christian Brauner <[email protected]>

vfs: Fix potential circular locking through setxattr() and removexattr()

When using cachefiles, lockdep may emit something similar to the circular
locking dependency notice below.  The problem appears to stem from the
following:

(1) Cachefiles manipulates xattrs on the files in its cache when called
     from ->writepages().

(2) The setxattr() and removexattr() system call handlers get the name
     (and value) from userspace after taking the sb_writers lock, putting
     accesses of the vma->vm_lock and mm->mmap_lock inside of that.

(3) The afs filesystem uses a per-inode lock to prevent multiple
     revalidation RPCs and in writeback vs truncate to prevent parallel
     operations from deadlocking against the server on one side and local
     page locks on the other.

Fix this by moving the getting of the name and value in {get,remove}xattr()
outside of the sb_writers lock.  This also has the minor benefits that we
don't need to reget these in the event of a retry and we never try to take
the sb_writers lock in the event we can't pull the name and value into the
kernel.

Alternative approaches that might fix this include moving the dispatch of a
write to the cache off to a workqueue or trying to do without the
validation lock in afs.  Note that this might also affect other filesystems
that use netfslib and/or cachefiles.

======================================================
WARNING: possible circular locking dependency detected
6.10.0-build2+ #956 Not tainted
------------------------------------------------------
fsstress/6050 is trying to acquire lock:
ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0

but task is already holding lock:
ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&vma->vm_lock->lock){++++}-{3:3}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        down_write+0x3b/0x50
        vma_start_write+0x6b/0xa0
        vma_link+0xcc/0x140
        insert_vm_struct+0xb7/0xf0
        alloc_bprm+0x2c1/0x390
        kernel_execve+0x65/0x1a0
        call_usermodehelper_exec_async+0x14d/0x190
        ret_from_fork+0x24/0x40
        ret_from_fork_asm+0x1a/0x30

-> #3 (&mm->mmap_lock){++++}-{3:3}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        __might_fault+0x7c/0xb0
        strncpy_from_user+0x25/0x160
        removexattr+0x7f/0x100
        __do_sys_fremovexattr+0x7e/0xb0
        do_syscall_64+0x9f/0x100
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #2 (sb_writers#14){.+.+}-{0:0}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        percpu_down_read+0x3c/0x90
        vfs_iocb_iter_write+0xe9/0x1d0
        __cachefiles_write+0x367/0x430
        cachefiles_issue_write+0x299/0x2f0
        netfs_advance_write+0x117/0x140
        netfs_write_folio.isra.0+0x5ca/0x6e0
        netfs_writepages+0x230/0x2f0
        afs_writepages+0x4d/0x70
        do_writepages+0x1e8/0x3e0
        filemap_fdatawrite_wbc+0x84/0xa0
        __filemap_fdatawrite_range+0xa8/0xf0
        file_write_and_wait_range+0x59/0x90
        afs_release+0x10f/0x270
        __fput+0x25f/0x3d0
        __do_sys_close+0x43/0x70
        do_syscall_64+0x9f/0x100
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #1 (&vnode->validate_lock){++++}-{3:3}:
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        down_read+0x95/0x200
        afs_writepages+0x37/0x70
        do_writepages+0x1e8/0x3e0
        filemap_fdatawrite_wbc+0x84/0xa0
        filemap_invalidate_inode+0x167/0x1e0
        netfs_unbuffered_write_iter+0x1bd/0x2d0
        vfs_write+0x22e/0x320
        ksys_write+0xbc/0x130
        do_syscall_64+0x9f/0x100
        entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #0 (mapping.invalidate_lock#3){++++}-{3:3}:
        check_noncircular+0x119/0x160
        check_prev_add+0x195/0x430
        __lock_acquire+0xaf0/0xd80
        lock_acquire.part.0+0x103/0x280
        down_read+0x95/0x200
        filemap_fault+0x26e/0x8b0
        __do_fault+0x57/0xd0
        do_pte_missing+0x23b/0x320
        __handle_mm_fault+0x2d4/0x320
        handle_mm_fault+0x14f/0x260
        do_user_addr_fault+0x2a2/0x500
        exc_page_fault+0x71/0x90
        asm_exc_page_fault+0x22/0x30

other info that might help us debug this:

Chain exists of:
   mapping.invalidate_lock#3 --> &mm->mmap_lock --> &vma->vm_lock->lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   rlock(&vma->vm_lock->lock);
                                lock(&mm->mmap_lock);
                                lock(&vma->vm_lock->lock);
   rlock(mapping.invalidate_lock#3);

  *** DEADLOCK ***

1 lock held by fsstress/6050:
  #0: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250

stack backtrace:
CPU: 0 PID: 6050 Comm: fsstress Not tainted 6.10.0-build2+ #956
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x80
  check_noncircular+0x119/0x160
  ? queued_spin_lock_slowpath+0x4be/0x510
  ? __pfx_check_noncircular+0x10/0x10
  ? __pfx_queued_spin_lock_slowpath+0x10/0x10
  ? mark_lock+0x47/0x160
  ? init_chain_block+0x9c/0xc0
  ? add_chain_block+0x84/0xf0
  check_prev_add+0x195/0x430
  __lock_acquire+0xaf0/0xd80
  ? __pfx___lock_acquire+0x10/0x10
  ? __lock_release.isra.0+0x13b/0x230
  lock_acquire.part.0+0x103/0x280
  ? filemap_fault+0x26e/0x8b0
  ? __pfx_lock_acquire.part.0+0x10/0x10
  ? rcu_is_watching+0x34/0x60
  ? lock_acquire+0xd7/0x120
  down_read+0x95/0x200
  ? filemap_fault+0x26e/0x8b0
  ? __pfx_down_read+0x10/0x10
  ? __filemap_get_folio+0x25/0x1a0
  filemap_fault+0x26e/0x8b0
  ? __pfx_filemap_fault+0x10/0x10
  ? find_held_lock+0x7c/0x90
  ? __pfx___lock_release.isra.0+0x10/0x10
  ? __pte_offset_map+0x99/0x110
  __do_fault+0x57/0xd0
  do_pte_missing+0x23b/0x320
  __handle_mm_fault+0x2d4/0x320
  ? __pfx___handle_mm_fault+0x10/0x10
  handle_mm_fault+0x14f/0x260
  do_user_addr_fault+0x2a2/0x500
  exc_page_fault+0x71/0x90
  asm_exc_page_fault+0x22/0x30

Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Alexander Viro <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Jan Kara <[email protected]>
cc: Jeff Layton <[email protected]>
cc: Gao Xiang <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
[brauner: fix minor issues]
Signed-off-by: Christian Brauner <[email protected]>

filelock: Fix fcntl/close race recovery compat path

When I wrote commit 3cad1bc01041 ("filelock: Remove locks reliably when
fcntl/close race is detected"), I missed that there are two copies of the
code I was patching: The normal version, and the version for 64-bit offsets
on 32-bit kernels.
Thanks to Greg KH for stumbling over this while doing the stable
backport...

Apply exactly the same fix to the compat path for 32-bit kernels.

Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling")
Cc: [email protected]
Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563
Signed-off-by: Jann Horn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

fs: use all available ids

The counter is unconditionally incremented for each mount allocation.
If we set it to 1ULL << 32 we're losing 4294967296 as the first valid
non-32 bit mount id.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

cachefiles: Set the max subreq size for cache writes to MAX_RW_COUNT

Set the maximum size of a subrequest that writes to cachefiles to be
MAX_RW_COUNT so that we don't overrun the maximum write we can make to the
backing filesystem.

Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>

netfs: Fix writeback that needs to go to both server and cache

When netfslib is performing writeback (ie. ->writepages), it maintains two
parallel streams of writes, one to the server and one to the cache, but it
doesn't mark either stream of writes as active until it gets some data that
needs to be written to that stream.

This is done because some folios will only be written to the cache
(e.g. copying to the cache on read is done by marking the folios and
letting writeback do the actual work) and sometimes we'll only be writing
to the server (e.g. if there's no cache).

Now, since we don't actually dispatch uploads and cache writes in parallel,
but rather flip between the streams, depending on which has the lowest
so-far-issued offset, and don't wait for the subreqs to finish before
flipping, we can end up in a situation where, say, we issue a write to the
server and this completes before we start the write to the cache.

But because we only activate a stream when we first add a subreq to it, the
result collection code may run before we manage to activate the stream -
resulting in the folio being cleaned and having the writeback-in-progress
mark removed. At this point, the folio no longer belongs to us.

This is only really a problem for folios that need to be written to both
streams - and in that case, the upload to the server is started first,
followed by the write to the cache - and the cache write may see a bad
folio.

Fix this by activating the cache stream up front if there's a cache
available. If there's a cache, then all data is going to be written to it.

Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>

pidfs: add selftests for new namespace ioctls

Add selftests to verify that deriving namespace file descriptors from
pidfd file descriptors works correctly.

Link: https://lore.kernel.org/r/20240722-work-pidfs-69dbea91edab@brauner
Signed-off-by: Christian Brauner <[email protected]>

pidfs: handle kernels without namespaces cleanly

The nsproxy structure contains nearly all of the namespaces associated
with a task. When a given namespace type is not supported by this kernel
the rules whether the corresponding pointer in struct nsproxy is NULL or
always init_<ns_type>_ns differ per namespace. Ideally, that wouldn't be
the case and for all namespace types we'd always set it to
init_<ns_type>_ns when the corresponding namespace type isn't supported.

Make sure we handle all namespaces where the pointer in struct nsproxy
can be NULL when the namespace type isn't supported.

Link: https://lore.kernel.org/r/20240722-work-pidfs-e6a83030f63e@brauner
Fixes: 5b08bd408534 ("pidfs: allow retrieval of namespace file descriptors") # mainline only
Signed-off-by: Christian Brauner <[email protected]>

pidfs: when time ns disabled add check for ioctl

syzbot call pidfd_ioctl() with cmd "PIDFD_GET_TIME_NAMESPACE" and disabled
CONFIG_TIME_NS, since time_ns is NULL, it will make NULL ponter deref in
open_namespace.

Fixes: 5b08bd408534 ("pidfs: allow retrieval of namespace file descriptors") # mainline only
Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=34a0ee986f61f15da35d
Signed-off-by: Edward Adam Davis <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

vfs: correct the comments of vfs_*() helpers

correct the comments of vfs_*() helpers in fs/namei.c, including:
1. vfs_create()
2. vfs_mknod()
3. vfs_mkdir()
4. vfs_rmdir()
5. vfs_symlink()

All of them come from the same commit:
6521f8917082 "namei: prepare for idmapped mounts"

The @dentry is actually the dentry of child directory rather than
base directory(parent directory), and thus the @dir has to be
modified due to the change of @dentry.

Signed-off-by: Congjie Zhou <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

vfs: handle __wait_on_freeing_inode() and evict() race

Lockless hash lookup can find and lock the inode after it gets the
I_FREEING flag set, at which point it blocks waiting for teardown in
evict() to finish.

However, the flag is still set even after evict() wakes up all waiters.

This results in a race where if the inode lock is taken late enough, it
can happen after both hash removal and wakeups, meaning there is nobody
to wake the racing thread up.

This worked prior to RCU-based lookup because the entire ordeal was
synchronized with the inode hash lock.

Since unhashing requires the inode lock, we can safely check whether it
happened after acquiring it.

Link: https://lore.kernel.org/v9fs/[email protected]/
Reported-by: Dominique Martinet <[email protected]>
Fixes: 7180f8d91fcb ("vfs: add rcu-based find_inode variants for iget ops")
Signed-off-by: Mateusz Guzik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>

netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG

CONFIG_FSCACHE_DEBUG should have been renamed to CONFIG_NETFS_DEBUG, so do
that now.

Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
cc: Uwe Kleine-König <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>

netfs: Revert "netfs: Switch debug logging to pr_debug()"

Revert commit 163eae0fb0d4c610c59a8de38040f8e12f89fd43 to get back the
original operation of the debugging macros.

Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
cc: Uwe Kleine-König <[email protected]>
cc: Christian Brauner <[email protected]>
cc: Jeff Layton <[email protected]>
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>

netfilter: nft_set_pipapo_avx2: disable softinterrupts

We need to disable softinterrupts, else we get following problem:

1. pipapo_avx2 called from process context; fpu usable
2. preempt_disable() called, pcpu scratchmap in use
3. softirq handles rx or tx, we re-enter pipapo_avx2
4. fpu busy, fallback to generic non-avx version
5. fallback reuses scratch map and index, which are in use
by the preempted process

Handle this same way as generic version by first disabling
softinterrupts while the scratchmap is in use.

Fixes: f0b3d338064e ("netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version")
Cc: Stefano Brivio <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Reviewed-by: Stefano Brivio <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge tag 'perf-tools-fixes-for-v6.11-2024-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Namhyung Kim:
"Two fixes for building perf and other tools:

   - Fix breakage in tracing tools due to pkg-config for
     libtrace{event,fs}

   - Fix build of perf when libunwind is used"

* tag 'perf-tools-fixes-for-v6.11-2024-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf dso: Fix build when libunwind is enabled
  tools/latency: Use pkg-config in lib_setup of Makefile.config
  tools/rtla: Use pkg-config in lib_setup of Makefile.config
  tools/verification: Use pkg-config in lib_setup of Makefile.config
  tools: Make pkg-config dependency checks usable by other tools
  perf build: Warn if libtracefs is not found

Merge tag 'execve-v6.11-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve fix from Kees Cook:
"This moves the exec and binfmt_elf tests out of your way and into the
  tests/ subdirectory, following the newly ratified KUnit naming
  conventions. :)"

* tag 'execve-v6.11-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  execve: Move KUnit tests to tests/ subdirectory

parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN

Allow users to disable kernel warnings for unaligned memory
accesses from kernel via the /proc/sys/kernel/ignore-unaligned-usertrap
procfs entry.
That way users can disable those warnings in case they happen too
often.

Signed-off-by: Helge Deller <[email protected]>

cifs: mount with "unix" mount option for SMB1 incorrectly handled

Although by default we negotiate CIFS Unix Extensions for SMB1 mounts to
Samba (and they work if the user does not specify "unix" or "posix" or
"linux" on mount), and we do properly handle when a user turns them off
with "nounix" mount parm. But with the changes to the mount API we
broke cases where the user explicitly specifies the "unix" option (or
equivalently "linux" or "posix") on mount with vers=1.0 to Samba or other
servers which support the CIFS Unix Extensions.

"mount error(95): Operation not supported"

and logged:

"CIFS: VFS: Check vers= mount option. SMB3.11 disabled but required for POSIX extensions"

even though CIFS Unix Extensions are supported for vers=1.0 This patch fixes
the case where the user specifies both "unix" (or equivalently "posix" or
"linux") and "vers=1.0" on mount to a server which supports the
CIFS Unix Extensions.

Cc: [email protected]
Reviewed-by: David Howells <[email protected]>
Reviewed-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: fix reconnect with SMB1 UNIX Extensions

When mounting with the SMB1 Unix Extensions (e.g. mounts
to Samba with vers=1.0), reconnects no longer reset the
Unix Extensions (SetFSInfo SET_FILE_UNIX_BASIC) after tcon so most
operations (e.g. stat, ls, open, statfs) will fail continuously
with:
"Operation not supported"
if the connection ever resets (e.g. due to brief network disconnect)

Cc: [email protected]
Reviewed-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Steve French <[email protected]>

ice: Fix recipe read procedure

When ice driver reads recipes from firmware information about
need_pass_l2 and allow_pass_l2 flags is not stored correctly.
Those flags are stored as one bit each in ice_sw_recipe structure.
Because of that, the result of checking a flag has to be casted to bool.
Note that the need_pass_l2 flag currently works correctly, because
it's stored in the first bit.

Fixes: bccd9bce29e0 ("ice: Add guard rule when creating FDB in switchdev")
Reviewed-by: Marcin Szycik <[email protected]>
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Wojciech Drewek <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ice: Add a per-VF limit on number of FDIR filters

While the iavf driver adds a s/w limit (128) on the number of FDIR
filters that the VF can request, a malicious VF driver can request more
than that and exhaust the resources for other VFs.

Add a similar limit in ice.

CC: [email protected]
Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF")
Reviewed-by: Przemek Kitszel <[email protected]>
Suggested-by: Sridhar Samudrala <[email protected]>
Signed-off-by: Ahmed Zaki <[email protected]>
Reviewed-by: Wojciech Drewek <[email protected]>
Tested-by: Rafal Romanowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

Merge tag 'f2fs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"A pretty small update including mostly minor bug fixes in zoned
  storage along with the large section support.

  Enhancements:
   - add support for FS_IOC_GETFSSYSFSPATH
   - enable atgc dynamically if conditions are met
   - use new ioprio Macro to get ckpt thread ioprio level
   - remove unreachable lazytime mount option parsing

  Bug fixes:
   - fix null reference error when checking end of zone
   - fix start segno of large section
   - fix to cover read extent cache access with lock
   - don't dirty inode for readonly filesystem
   - allocate a new section if curseg is not the first seg in its zone
   - only fragment segment in the same section
   - truncate preallocated blocks in f2fs_file_open()
   - fix to avoid use SSR allocate when do defragment
   - fix to force buffered IO on inline_data inode

  And some minor code clean-ups and sanity checks"

* tag 'f2fs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (26 commits)
  f2fs: clean up addrs_per_{inode,block}()
  f2fs: clean up F2FS_I()
  f2fs: use meta inode for GC of COW file
  f2fs: use meta inode for GC of atomic file
  f2fs: only fragment segment in the same section
  f2fs: fix to update user block counts in block_operations()
  f2fs: remove unreachable lazytime mount option parsing
  f2fs: fix null reference error when checking end of zone
  f2fs: fix start segno of large section
  f2fs: remove redundant sanity check in sanity_check_inode()
  f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid
  f2fs: fix to use mnt_{want,drop}_write_file replace file_{start,end}_wrtie
  f2fs: clean up set REQ_RAHEAD given rac
  f2fs: enable atgc dynamically if conditions are met
  f2fs: fix to truncate preallocated blocks in f2fs_file_open()
  f2fs: fix to cover read extent cache access with lock
  f2fs: fix return value of f2fs_convert_inline_inode()
  f2fs: use new ioprio Macro to get ckpt thread ioprio level
  f2fs: fix to don't dirty inode for readonly filesystem
  f2fs: fix to avoid use SSR allocate when do defragment
  ...

Merge tag 'jfs-6.11' of github.com:kleikamp/linux-shaggy

Pull jfs updates from David Kleikamp:
"Folio conversion from Matthew Wilcox and a few various fixes"

* tag 'jfs-6.11' of github.com:kleikamp/linux-shaggy:
  jfs: don't walk off the end of ealist
  jfs: Fix shift-out-of-bounds in dbDiscardAG
  jfs: Fix array-index-out-of-bounds in diFree
  jfs: fix null ptr deref in dtInsertEntry
  jfs: Remove use of folio error flag
  fs: Remove i_blocks_per_page
  jfs: Change metapage->page to metapage->folio
  jfs: Convert force_metapage to use a folio
  jfs: Convert inc_io to take a folio
  jfs: Convert page_to_mp to folio_to_mp
  jfs; Convert __invalidate_metapages to use a folio
  jfs: Convert dec_io to take a folio
  jfs: Convert drop_metapage and remove_metapage to take a folio
  jfs; Convert release_metapage to use a folio
  jfs: Convert insert_metapage() to take a folio
  jfs: Convert __get_metapage to use a folio
  jfs: Convert metapage_writepage to metapage_write_folio
  jfs: Convert metapage_read_folio to use folio APIs

Merge tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

- Remove tristate choice support from Kconfig

- Stop using the PROVIDE() directive in the linker script

- Reduce the number of links for the combination of CONFIG_KALLSYMS and
   CONFIG_DEBUG_INFO_BTF

- Enable the warning for symbol reference to .exit.* sections by
   default

- Fix warnings in RPM package builds

- Improve scripts/make_fit.py to generate a FIT image with separate
   base DTB and overlays

- Improve choice value calculation in Kconfig

- Fix conditional prompt behavior in choice in Kconfig

- Remove support for the uncommon EMAIL environment variable in Debian
   package builds

- Remove support for the uncommon "name <email>" form for the DEBEMAIL
   environment variable

- Raise the minimum supported GNU Make version to 4.0

- Remove stale code for the absolute kallsyms

- Move header files commonly used for host programs to scripts/include/

- Introduce the pacman-pkg target to generate a pacman package used in
   Arch Linux

- Clean up Kconfig

* tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (65 commits)
  kbuild: doc: gcc to CC change
  kallsyms: change sym_entry::percpu_absolute to bool type
  kallsyms: unify seq and start_pos fields of struct sym_entry
  kallsyms: add more original symbol type/name in comment lines
  kallsyms: use \t instead of a tab in printf()
  kallsyms: avoid repeated calculation of array size for markers
  kbuild: add script and target to generate pacman package
  modpost: use generic macros for hash table implementation
  kbuild: move some helper headers from scripts/kconfig/ to scripts/include/
  Makefile: add comment to discourage tools/* addition for kernel builds
  kbuild: clean up scripts/remove-stale-files
  kconfig: recursive checks drop file/lineno
  kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec
  kallsyms: get rid of code for absolute kallsyms
  kbuild: Create INSTALL_PATH directory if it does not exist
  kbuild: Abort make on install failures
  kconfig: remove 'e1' and 'e2' macros from expression deduplication
  kconfig: remove SYMBOL_CHOICEVAL flag
  kconfig: add const qualifiers to several function arguments
  kconfig: call expr_eliminate_yn() at least once in expr_eliminate_dups()
  ...

Merge tag 'rpmsg-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux

Pull rpmsg updates from Bjorn Andersson:

- fix interrupt handling in the stm32 remoteproc driver when being
   attached to an already running remote processor

- fix invalid kernel-doc and add missing MODULE_DESCRIPTION() in the
   rpmsg char driver

* tag 'rpmsg-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
  rpmsg: char: add missing MODULE_DESCRIPTION() macro
  remoteproc: stm32_rproc: Fix mailbox interrupts queuing
  rpmsg: char: Fix rpmsg_eptdev structure documentation

Merge tag 'rproc-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux

Pull remoteproc updates from Bjorn Andersson:

- The maximum amount of DDR memory used by the Mediatek MT8188/MT8195
   SCP is increased to handle new use cases. Handling of optional L1TCM
   memory is made actually optional.

- An optimization is introduced to only clear the unused portion of IPI
   shared buffers, rather than the entire buffer before writing the
   message.

- Detection for IPC-only mode in the TI K3 DSP remoteproc driver is
   corrected. The loglevel of a debug print in the same is lowered from
   error.

- Support for attaching to an running remote processor is added to the
   Xilinx R5F.

- An in-kernel implementation of the Qualcomm "protected domain mapper"
   (aka service registry) service is introduced, to remove the
   dependency on a userspace implementation to detect when the battery
   monitor and USB Type-C port manager becomes available. This is then
   integrated with the Qualcomm remoteproc driver.

- The Qualcomm PAS remoteproc driver gains support for attempting to
   bust hwspinlocks held by the remote processor when it
   crashed/stopped.

- The TI OMAP remoteproc driver is transitioned to use devres helpers
   for various forms of allocations.

- Parsing of memory-regions in the i.MX remoteproc driver is improved
   to avoid a NULL pointer dereference if the phandle reference is
   empty. of_node reference counting is corrected in the same.

* tag 'rproc-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
  remoteproc: mediatek: Increase MT8188/MT8195 SCP core0 DRAM size
  remoteproc: k3-dsp: Fix log levels where appropriate
  remoteproc: xlnx: Add attach detach support
  remoteproc: qcom: select AUXILIARY_BUS
  remoteproc: k3-r5: Fix IPC-only mode detection
  remoteproc: mediatek: Don't attempt to remap l1tcm memory if missing
  remoteproc: qcom: enable in-kernel PD mapper
  dt-bindings: remoteproc: imx_rproc: Add minItems for power-domain
  remoteproc: imx_rproc: Fix refcount mistake in imx_rproc_addr_init
  remoteproc: omap: Use devm_rproc_add() helper
  remoteproc: omap: Use devm action to release reserved memory
  remoteproc: omap: Use devm_rproc_alloc() helper
  remoteproc: imx_rproc: Skip over memory region when node value is NULL
  dt-bindings: remoteproc: k3-dsp: Correct optional sram properties for AM62A SoCs
  remoteproc: qcom_q6v5_pas: Add hwspinlock bust on stop
  soc: qcom: smem: Add qcom_smem_bust_hwspin_lock_by_host()
  remoteproc: mediatek: Zero out only remaining bytes of IPI buffer

Merge tag 'hwlock-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux

Pull hwspinlock updates from Bjorn Andersson:
"This introduces a mechanism in the hardware spinlock framework, and
  the Qualcomm TCSR mutex driver, for allowing clients to bust locks
  held by a remote processor in the event that this enters a faulty
  state while holding the shared lock"

* tag 'hwlock-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
  hwspinlock: qcom: implement bust operation
  hwspinlock: Introduce hwspin_lock_bust()

Merge tag 'sh-for-v6.11-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux

Pull sh updates from John Paul Adrian Glaubitz:
"This is rather small this time and contains just three changes.

  The first change by Oscar Salvador drops support for memory hotplug
  and hotremove for sh as the kernel stopped supporting it on 32-bit
  platforms since 7ec58a2b941e ("mm/memory_hotplug: restrict
  CONFIG_MEMORY_HOTPLUG to 64 bit").

  That then results in a follow-up change to update all affected board
  config files.

  The third change comes from Jeff Johnson which adds the missing
  MODULE_DESCRIPTION() macro to the push-switch driver"

* tag 'sh-for-v6.11-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: push-switch: Add missing MODULE_DESCRIPTION() macro
  sh: config: Drop CONFIG_MEMORY_{HOTPLUG,HOTREMOVE}
  sh: Drop support for memory hotplug and memory hotremove

Merge tag 'modules-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull module update from Luis Chamberlain:
"This is a super boring development cycle this time around for modules,
  there is only one patch in this pull request.

  The patch deals with a corner case set of dependencies which is not
  resolved today to ensure users get the module they need on initramfs.
  Currently only one module is known to exist which needs this, however
  this can grow to capture other corner cases likely escaped and not
  reported before. The kernel change is just a section update, the real
  work is done and merged already on upstream kmod.

  This has been on linux-next for 3 weeks now"

* tag 'modules-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module: create weak dependecies

clk: samsung: fix getting Exynos4 fin_pll rate from external clocks

Commit 0dc83ad8bfc9 ("clk: samsung: Don't register clkdev lookup for the
fixed rate clocks") claimed registering clkdev lookup is not necessary
anymore, but that was not entirely true: Exynos4210/4212/4412 clock code
still relied on it to get the clock rate of xxti or xusbxti external
clocks.

Drop that requirement by accessing already registered clk_hw when
looking up the xxti/xusbxti rate.

Reported-by: Artur Weber <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Fixes: 0dc83ad8bfc9 ("clk: samsung: Don't register clkdev lookup for the fixed rate clocks")
Cc: <[email protected]>
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Tested-by: Artur Weber <[email protected]> # Exynos4212
Signed-off-by: Stephen Boyd <[email protected]>

Merge tag 'livepatching-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching

Pull livepatching update from Petr Mladek:

- show patch->replace flag in sysfs

- add or improve few selftests

* tag 'livepatching-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
  livepatch: Replace snprintf() with sysfs_emit()
  selftests/livepatch: Add selftests for "replace" sysfs attribute
  livepatch: Add "replace" sysfs attribute
  selftests: livepatch: Test atomic replace against multiple modules
  selftests/livepatch: define max test-syscall processes

Merge tag 'i2c-for-6.11-rc1-second-batch' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull more i2c updates from Wolfram Sang:
"The I2C core has two header documentation updates as the dependecies
  are in now.

  The I2C host drivers add some patches which nearly fell through the
  cracks:

   - Added descriptions in the DTS for the Qualcomm SM8650 and SM8550
     Camera Control Interface (CCI).

   - Added support for the "settle-time-us" property, which allows the
     gpio-mux device to switch from one bus to another with a
     configurable delay. The time can be set in the DTS. The latest
     change also includes file sorting.

   - Fixed slot numbering in the SMBus framework to prevent failures
     when more than 8 slots are occupied. It now enforces a a maximum of
     8 slots to be used. This ensures that the Intel PIIX4 device can
     register the SPDs correctly without failure, even if other slots
     are populated but not used"

* tag 'i2c-for-6.11-rc1-second-batch' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: header: improve kdoc for i2c_algorithm
  i2c: header: remove unneeded stuff regarding i2c_algorithm
  i2c: piix4: Register SPDs
  i2c: smbus: remove i801 assumptions from SPD probing
  i2c: mux: gpio: Add support for the 'settle-time-us' property
  i2c: mux: gpio: Re-order #include to match alphabetic order
  dt-bindings: i2c: mux-gpio: Add 'settle-time-us' property
  dt-bindings: i2c: qcom-cci: Document sm8650 compatible
  dt-bindings: i2c: qcom-cci: Document sm8550 compatible

Merge tag 'mailbox-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox

Pull mailbox updates from Jassi Brar:
"broadcom:
   - remove unused pdc_dma_map

  imx:
   - fix TXDB_V2 channel race condition

  mediatek:
   - cleanup and refactor driver
   - add bindings for gce-props

  omap:
   - fix mailbox interrupt sharing

  qcom:
   - add bindings for SA8775p
   - add CPUCP driver

  zynqmp:
   - make polling period configurable"

* tag 'mailbox-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
  mailbox: mtk-cmdq: Move devm_mbox_controller_register() after devm_pm_runtime_enable()
  mailbox: zynqmp-ipi: Make polling period configurable
  mailbox: qcom-cpucp: fix 64BIT dependency
  mailbox: Add support for QTI CPUCP mailbox controller
  dt-bindings: mailbox: qcom: Add CPUCP mailbox controller bindings
  dt-bindings: remoteproc: qcom,sa8775p-pas: Document the SA8775p ADSP, CDSP and GPDSP
  mailbox: mtk-cmdq: add missing MODULE_DESCRIPTION() macro
  mailbox: bcm-pdc: remove unused struct 'pdc_dma_map'
  mailbox: imx: fix TXDB_V2 channel race condition
  mailbox: omap: Fix mailbox interrupt sharing
  mailbox: mtk-cmdq: Dynamically allocate clk_bulk_data structure
  mailbox: mtk-cmdq: Move and partially refactor clocks probe
  mailbox: mtk-cmdq: Stop requiring name for GCE clock
  dt-bindings: mailbox: Add mediatek,gce-props.yaml

Merge tag 'pcmcia-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux

Pull PCMCIA updates from Dominik Brodowski:
"A number of tiny cleanups of the PCMCIA subsystem by Jeff Johnson,
  Jules Irenge, and Krzysztof Kozlowski"

* tag 'pcmcia-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
  pcmcia: add missing MODULE_DESCRIPTION() macros
  pcmcia: Use resource_size function on resource object
  pcmcia: bcm63xx: drop driver owner assignment

Merge tag 'for-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

Pull power supply and reset updates from Sebastian Reichel:
"Power-supply core:
   - new charging_orange_full_green RGB LED trigger
   - simplify and cleanup power-supply LED trigger code
   - expose power information via hwmon compatibility layer

  New hardware support:
   - enable battery support for Qualcomm Snapdragon X Elite
   - new battery driver for Maxim MAX17201/MAX17205
   - new battery driver for Lenovo Yoga C630 laptop (custom EC)

  Cleanups:
   - cleanup 'struct i2c_device_id' initializations
   - misc small battery driver cleanups and fixes"

* tag 'for-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: sysfs: use power_supply_property_is_writeable()
  power: supply: qcom_battmgr: Enable battery support on x1e80100
  power: supply: add support for MAX1720x standalone fuel gauge
  dt-bindings: power: supply: add support for MAX17201/MAX17205 fuel gauge
  power: reset: piix4: add missing MODULE_DESCRIPTION() macro
  power: supply: samsung-sdi-battery: Constify struct power_supply_maintenance_charge_table
  power: supply: samsung-sdi-battery: Constify struct power_supply_vbat_ri_table
  power: supply: lenovo_yoga_c630_battery: add Lenovo C630 driver
  power: supply: ingenic: Fix some error handling paths in ingenic_battery_get_property()
  power: supply: ab8500: Clean some error messages
  power: supply: ab8500: Use iio_read_channel_processed_scale()
  power: supply: ab8500: Fix error handling when calling iio_read_channel_processed()
  power: supply: hwmon: Add support for power sensors
  power: supply: ab8500: remove unused struct 'inst_curr_result_list'
  power: supply: bd99954: remove unused struct 'battery_data'
  power: supply: leds: Add activate() callback to triggers
  power: supply: leds: Share trig pointer for online and charging_full
  power: supply: leds: Add power_supply_[un]register_led_trigger()
  power: supply: Drop explicit initialization of struct i2c_device_id::driver_data to 0

Merge tag 'hsi-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi

Pull HSI update from Sebastian Reichel:

- drop unused gpio.h header from SSI McSAAB protocol driver

* tag 'hsi-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
HSI: ssi_protocol: Remove unused linux/gpio.h

kbuild: doc: gcc to CC change

In this part of the documentation, $(CC) is meant, but gcc is written.

Signed-off-by: Ivan Davydov <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

iommu/amd: Convert comma to semicolon

Replace a comma between expression statements by a semicolon.

Fixes: c9b258c6be09 ("iommu/amd: Prepare for generic IO page table framework")
Signed-off-by: Chen Ni <[email protected]>
Reviewed-by: Suravee Suthikulpanit <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

iommu: sprd: Avoid NULL deref in sprd_iommu_hw_en

In sprd_iommu_cleanup() before calling function sprd_iommu_hw_en()
dom->sdev is equal to NULL, which leads to null dereference.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 9afea57384d4 ("iommu/sprd: Release dma buffer to avoid memory leak")
Signed-off-by: Artem Chernyshev <[email protected]>
Reviewed-by: Chunyan Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

cifs: fix potential null pointer use in destroy_workqueue in init_cifs error path

Dan Carpenter reported a Smack static checker warning:
fs/smb/client/cifsfs.c:1981 init_cifs()
error: we previously assumed 'serverclose_wq' could be null (see line 1895)

The patch which introduced the serverclose workqueue used the wrong
oredering in error paths in init_cifs() for freeing it on errors.

Fixes: 173217bd7336 ("smb3: retrying on failed server close")
Cc: [email protected]
Cc: Ritvik Budhiraja <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Reviewed-by: Dan Carpenter <[email protected]>
Reviewed-by: David Howells <[email protected]>
Signed-off-by: Steve French <[email protected]>

Merge branch 'for-6.11/sysfs-patch-replace' into for-linus

arm64/sysreg: Correct the values for GICv4.1

Currently, sysreg has value as 0b0010 for the presence of GICv4.1 in
ID_PFR1_EL1 and ID_AA64PFR0_EL1, instead of 0b0011 as per ARM ARM.
Hence, correct them to reflect ARM ARM.

Signed-off-by: Raghavendra Rao Ananta <[email protected]>
Reviewed-by: Zenghui Yu <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Reviewed-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

arm64/vdso: Remove --hash-style=sysv

glibc added support for .gnu.hash in 2006 and .hash has been obsoleted
for more than one decade in many Linux distributions. Using
--hash-style=sysv might imply unaddressed issues and confuse readers.

Just drop the option and rely on the linker default, which is likely
"both", or "gnu" when the distribution really wants to eliminate sysv
hash overhead.

Similar to commit 6b7e26547fad ("x86/vdso: Emit a GNU hash").

Signed-off-by: Fangrui Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

kselftest: missing arg in ptrace.c

The string passed to ksft_test_result_skip is missing the `type_name`

Signed-off-by: Remington Brasga <[email protected]>
Reviewed-by: Dev Jain <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

arm64/Kconfig: Remove redundant 'if HAVE_FUNCTION_GRAPH_TRACER'

Since the commit 819e50e25d0c ("arm64: Add ftrace support"),
HAVE_FUNCTION_GRAPH_TRACER has always been enabled. Although a subsequent
commit 364697032246 ("arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL")
redundantly added check on HAVE_FUNCTION_GRAPH_TRACER, while enabling the
config HAVE_FUNCTION_GRAPH_RETVAL. Let's just drop this redundant check.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
CC: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

arm64: remove redundant 'if HAVE_ARCH_KASAN' in Kconfig

Since commit 0383808e4d99 ("arm64: kasan: Reduce minimum shadow
alignment and enable 5 level paging"), HAVE_ARCH_KASAN is always 'y'.

The condition 'if HAVE_ARCH_KASAN' is always met.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

s390: Remove protvirt and kvm config guards for uv code

Removing the CONFIG_PROTECTED_VIRTUALIZATION_GUEST ifdefs and config
option as well as CONFIG_KVM ifdefs in uv files.

Having this configurable has been more of a pain than a help.
It's time to remove the ifdefs and the config option.

Signed-off-by: Janosch Frank <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Acked-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/boot: Add cmdline option to relocate lowcore

Now that everything has been converted, add the option
'relocate_lowcore' to enable relocating the lowcore.

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/kdump: Make kdump ready for lowcore relocation

In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in store_status()
and __do_machine_kdump().

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/entry: Make system_call() ready for lowcore relocation

In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in system_call().

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/entry: Make ret_from_fork() ready for lowcore relocation

In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in ret_from_fork().

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/entry: Make __switch_to() ready for lowcore relocation

In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in __switch_to().

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/entry: Make restart_int_handler() ready for lowcore relocation

In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in restart_int_handler().

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/entry: Make mchk_int_handler() ready for lowcore relocation

In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in mcck_int_handler().

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/entry: Make int handlers ready for lowcore relocation

In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in the ext/io interrupt
handlers.

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/entry: Make pgm_check_handler() ready for lowcore relocation

In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in pgm_check_handler().

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/entry: Add base register to CHECK_VMAP_STACK/CHECK_STACK macro

In preparation of having lowcore at different address than zero,
add the base register to CHECK_VMAP_STACK and CHECK_STACK. No
functional change, because %r0 is passed to the macro.

Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/entry: Add base register to SIEEXIT macro

In preparation of having lowcore at different address than zero,
add the base register to SIEEXIT. No functional change, because
%r0 is passed to the macro.

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/entry: Add base register to MBEAR macro

In preparation of having lowcore at different address than zero,
add the base register to MBEAR. No functional change, because
%r0 is passed to the macro.

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/entry: Make __sie64a() ready for lowcore relocation

In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in __sie64a().

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/head64: Make startup code ready for lowcore relocation

In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in startup_continue().

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390: Add infrastructure to patch lowcore accesses

The s390 architecture defines two special per-CPU data pages
called the "prefix area". In s390-linux terminology this is usually
called "lowcore". This memory area contains system configuration
data like old/new PSW's for system call/interrupt/machine check
handlers and lots of other data. It is normally mapped to logical
address 0. This area can only be accessed when in supervisor mode.

This means that kernel code can dereference NULL pointers, because
accesses to address 0 are allowed. Parts of lowcore can be write
protected, but read accesses and write accesses outside of the write
protected areas are not caught.

To remove this limitation for debugging and testing, remap lowcore to
another address and define a function get_lowcore() which simply
returns the address where lowcore is mapped at. This would normally
introduce a pointer dereference (=memory read). As lowcore is used
for several very often used variables, add code to patch this function
during runtime, so we avoid the memory reads.

For C code get_lowcore() has to be used, for assembly code it is
the GET_LC macro. When using this macro/function a reference is added
to alternative patching. All these locations will be patched to the
actual lowcore location when the kernel is booted or a module is loaded.

To make debugging/bisecting problems easier, this patch adds all the
infrastructure but the lowcore address is still hardwired to 0. This
way the code can be converted on a per function basis, and the
functionality is enabled in a patch after all the functions have
been converted.

Note that this requires at least z16 because the old lpsw instruction
only allowed a 12 bit displacement. z16 introduced lpswey which allows
20 bits (signed), so the lowcore can effectively be mapped from
address 0 - 0x7e000. To use 0x7e000 as address, a 6 byte lgfi
instruction would have to be used in the alternative. To save two
bytes, llilh can be used, but this only allows to set bits 16-31 of
the address. In order to use the llilh instruction, use 0x70000 as
alternative lowcore address. This is still large enough to catch
NULL pointer dereferences into large arrays.

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/atomic_ops: Disable flag outputs constraint for GCC versions below 14.2.0

GCC may die with an ICE if the flag outputs constraint is used in
combination with other inline assemblies. This will be fixed with
GCC 14.2.0.

Therefore disable the use of the constraint for now.

Link: https://gcc.gnu.org/git?p=gcc.git;a=commit;h=cd11413ff7c4353a3e336db415304f788d23a393
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/entry: Move SIE indicator flag to thread info

CIF_SIE indicates if a thread is running in SIE context. This is the
state of a thread and not the CPU. Therefore move this indicator to
thread info.

Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/nmi: Simplify ptregs setup

The low level machine check handler code fills the ptregs structure
partially with the register contents present at machine check handler
entry and partially with contents from the machine check save area.

In case of a machine check the contents of all general purpose
registers are saved by the CPU to the machine check save area.
Therefore simplify the code and fill the ptregs structure by only
using the machine check save area as source.

Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/alternatives: Remove alternative facility list

The alternative and the normal facility list are always identical. Remove
the alternative facility list, which allows to simplify the alternatives
code.

Acked-by: Alexander Gordeev <[email protected]>
Tested-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/nospec: Push down alternative handling

The nospec implementation is deeply integrated into the alternatives
code: only for nospec an alternative facility list is implemented and
used by the alternative code, while it is modified by nospec specific
needs.

Push down the nospec alternative handling into the nospec by
introducing a new alternative type and a specific nospec callback to
decide if alternatives should be applied.

Also introduce a new global nobp variable which together with facility
82 can be used to decide if nobp is enabled or not.

Acked-by: Alexander Gordeev <[email protected]>
Tested-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/alternatives: Allow early alternative patching in decompressor

Add the required code to patch alternatives early in the decompressor.
This is required for the upcoming lowcore relocation changes, where
alternatives for facility 193 need to get patched before lowcore
alternatives.

Reviewed-by: Alexander Gordeev <[email protected]>
Co-developed-by: Heiko Carstens <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/alternatives: Rework to allow for callbacks

Rework alternatives to allow for callbacks. With this every
alternative entry has additional data encoded:

- When (aka context) an alternative is supposed to be applied

- The type of an alternative, which allows for type specific handling
  and callbacks

- Extra type specific payload (patch information), which can be passed
  to callbacks in order to decide if an alternative should be applied
  or not

With this only the "late" context is implemented, which means there is
no change to the previous behaviour. All code is just converted to the
more generic new infrastructure.

Reviewed-by: Alexander Gordeev <[email protected]>
Tested-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/uaccess: Make s390_kernel_write() usable for decompressor

To avoid lots of ifdefs in C code make s390_kernel_write() usable for
the decompressor: simply use memcpy() for this case since there is no
write protection enabled that early.

Reviewed-by: Alexander Gordeev <[email protected]>
Tested-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/alternatives: Move text sync functions

Move all text sync functions from alternative.c to processor.c. This
way there is only minimal code left in alternative.c left, which is a
prerequisite to use the C file within boot code as well.

Reviewed-by: Alexander Gordeev <[email protected]>
Tested-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/alternatives: Merge both alternative header files

The two alternative header files must stay in sync. This is easier to
achieve within one header file. Therefore merge both of them and have
only one file, like most other architectures.

Reviewed-by: Alexander Gordeev <[email protected]>
Tested-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/alternatives: Use consistent naming

The alternative code is using the words facility and feature for the
same. Rename facility to more generic feature everywhere to have
consistent naming.

Reviewed-by: Alexander Gordeev <[email protected]>
Tested-by: Sven Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/alternatives: Remove noaltinstr option

The current Kernel doesn't boot without alternative patching on
z16 machines. To avoid such bugs in the future, remove the option
disable alternative patching.

Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Alexander Gordeev <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390: Move CIF flags to struct pcpu

To allow testing flags for offline CPUs, move the CIF flags
to struct pcpu. To avoid having to calculate the array index
for each access, add a pointer to the pcpu member for the current
cpu to lowcore.

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/smp: Switch pcpu_devices to percpu

In preparation of moving the CIF flags from lowcore to pcpu_devices,
convert the pcpu_devices array to use the percpu infrastructure.
This is required because using the pcpu_devices array as it is would
introduce a performance penalty due to the fact that CPU flags for
multiple CPUs would end up in the same cacheline.

Note that a pointer to the pcpu struct of the IPL CPU is still required.
This is because a restart interrupt can be triggered on an offline CPU.
s390 stores the percpu offset in lowcore, but offline CPUs have no
lowcore area allocated. So percpu data cannot be used from an offline
CPU and we need to get the pcpu pointer for the IPL cpu from somewhere
else.

Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/smp: Handle restart interrupt on ipl cpu

The current smp code allows to trigger a restart interrupt on CPUs
offline in linux. To allow using the percpu infrastructure instead
of the pcpu_devices array, switch to the ipl cpu which is always
online before calling do_restart().

Reviewed-by: Heiko Carstens <[email protected]>
Reviewed-by: Alexander Gordeev <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/boot: Do not assume the decompressor range is reserved

When allocating a random memory range for .amode31 sections
the minimal randomization address is 0. That does not lead
to a possible overlap with the decompressor image (which also
starts from 0) since by that time the image range is already
reserved.

Do not assume the decompressor range is reserved and always
provide the minimal randomization address for .amode31
sections beyond the decompressor. That is a prerequisite
for moving the lowcore memory address from NULL elsewhere.

Signed-off-by: Alexander Gordeev <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/cpum_cf: Fix endless loop in CF_DIAG event stop

Event CF_DIAG reads out complete counter sets using stcctm
instruction. This is done at event start time when the process
starts execution and at event stop time when the process is
removed from the CPU. During removal the difference of each
counter in the counter sets is calculated and saved as raw data
in the ring buffer. This works fine unless the number of counters
in a counter set is zero. This may happen for the extended counter
set. This set is machine specific and the size of the counter
set can be zero even when extended counter set is authorized for
read access.

This case is not handled. cfdiag_diffctr() checks authorization
of the extended counter set. If true the functions assumes
the extended counter set has been saved in a data buffer. However
this is not the case, cfdiag_getctrset() does not save a counter
set with counter set size of zero. This mismatch causes an endless
loop in the counter set readout during event stop handling.

The calculation of the difference of the counters in each counter
now verifies the size of the counter set is non-zero. A counter set
with size zero is skipped.

Fixes: a029a4eab39e ("s390/cpumf: Allow concurrent access for CPU Measurement Counter Facility")
Signed-off-by: Thomas Richter <[email protected]>
Acked-by: Sumanth Korikkar <[email protected]>
Acked-by: Heiko Carstens <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/ptdump: Add KMSAN page markers

Add KMSAN vmalloc metadata areas to
/sys/kernel/debug/kernel_page_tables. Example output:

    0x000003a95fff9000-0x000003a960000000        28K PTE I
    ---[ vmalloc Area End ]---
    ---[ Kmsan vmalloc Shadow Start ]---
    0x000003a960000000-0x000003a960010000        64K PTE RW NX
    [...]
    0x000003d3dfff9000-0x000003d3e0000000        28K PTE I
    ---[ Kmsan vmalloc Shadow End ]---
    ---[ Kmsan vmalloc Origins Start ]---
    0x000003d3e0000000-0x000003d3e0010000        64K PTE RW NX
    [...]
    0x000003fe5fff9000-0x000003fe60000000        28K PTE I
    ---[ Kmsan vmalloc Origins End ]---
    ---[ Kmsan Modules Shadow Start ]---
    0x000003fe60000000-0x000003fe60001000         4K PTE RW NX
    [...]
    0x000003fe60100000-0x000003fee0000000      2047M PMD I
    ---[ Kmsan Modules Shadow End ]---
    ---[ Kmsan Modules Origins Start ]---
    0x000003fee0000000-0x000003fee0001000         4K PTE RW NX
    [...]
    0x000003fee0100000-0x000003ff60000000      2047M PMD I
    ---[ Kmsan Modules Origins End ]---
    ---[ Modules Area Start ]---
    0x000003ff60000000-0x000003ff60001000         4K PTE RO X

Signed-off-by: Ilya Leoshkevich <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vasily Gorbik <[email protected]>

s390/kmsan: Fix merge conflict with get_lowcore() introduction

Resolve the conflict between commit 2a48c8c9cf87 ("s390/kmsan:
implement the architecture-specific functions") and commit 39976f1278a9
("s390: Remove S390_lowcore").

Fixes: 2a48c8c9cf87 ("s390/kmsan: implement the architecture-specific functions")
Signed-off-by: Ilya Leoshkevich <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vasily Gorbik <[email protected]>

s390/setup: Fix __pa/__va for modules under non-GPL licenses

The struct vm_layout contains fields used in __pa/__va calculations. Such
fundamental things have to be exported with EXPORT_SYMBOL to avoid
breakages of out-of-tree modules under non-GPL licenses.

Fixes: 7de0446f0b26 ("s390/boot: Make identity mapping base address explicit")
Acked-by: Heiko Carstens <[email protected]>
Acked-by: Alexander Gordeev <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/pci: Allow allocation of more than 1 MSI interrupt

On a PCI adapter that provides up to 8 MSI interrupt sources the s390
implementation of PCI interrupts rejected to accommodate them, although
the underlying hardware is able to support that.

For MSI-X it is sufficient to allocate a single irq_desc per msi_desc,
but for MSI multiple irq descriptors are attached to and controlled by
a single msi descriptor. Add the appropriate loops to maintain multiple
irq descriptors and tie/untie them to/from the appropriate AIBV bit, if
a device driver allocates more than 1 MSI interrupt.

Common PCI code passes on requests to allocate a number of interrupt
vectors based on the device drivers' demand and the PCI functions'
capabilities. However, the root-complex of s390 systems support just a
limited number of interrupt vectors per PCI function.
Produce a kernel log message to inform about any architecture-specific
capping that might be done.

With this change, we had a PCI adapter successfully raising
interrupts to its device driver via all 8 sources.

Fixes: a384c8924a8b ("s390/PCI: Fix single MSI only check")
Signed-off-by: Gerd Bayer <[email protected]>
Reviewed-by: Niklas Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/pci: Refactor arch_setup_msi_irqs()

Factor out adapter interrupt allocation from arch_setup_msi_irqs() in
preparation for enabling registration of multiple MSIs. Code movement
only, no change of functionality intended.

Signed-off-by: Gerd Bayer <[email protected]>
Reviewed-by: Niklas Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>