Petr Machata [Tue, 23 Jul 2024 16:04:16 +0000 (18:04 +0200)]
net: nexthop: Initialize all fields in dumped nexthops
struct nexthop_grp contains two reserved fields that are not initialized by
nla_put_nh_group(), and carry garbage. This can be observed e.g. with
strace (edited for clarity):
# ip nexthop add id 1 dev lo
# ip nexthop add id 101 group 1
# strace -e recvmsg ip nexthop get id 101
...
recvmsg(... [{nla_len=12, nla_type=NHA_GROUP},
[{id=1, weight=0, resvd1=0x69, resvd2=0x67}]] ...) = 52
The fields are reserved and therefore not currently used. But as they are, they
leak kernel memory, and the fact they are not just zero complicates repurposing
of the fields for new ends. Initialize the full structure.
Simon Horman [Tue, 23 Jul 2024 13:29:27 +0000 (14:29 +0100)]
net: stmmac: Correct byte order of perfect_match
The perfect_match parameter of the update_vlan_hash operation is __le16,
and is correctly converted from host byte-order in the lone caller,
stmmac_vlan_update().
However, the implementations of this caller, dwxgmac2_update_vlan_hash()
and dwxgmac2_update_vlan_hash(), both treat this parameter as host byte
order, using the following pattern:
u32 value = ...
...
writel(value | perfect_match, ...);
This is not correct because both:
1) value is host byte order; and
2) writel expects a host byte order value as it's first argument
I believe that this will break on big endian systems. And I expect it
has gone unnoticed by only being exercised on little endian systems.
The approach taken by this patch is to update the callback, and it's
caller to simply use a host byte order value.
Flagged by Sparse.
Compile tested only.
Fixes: c7ab0b8088d7 ("net: stmmac: Fallback to VLAN Perfect filtering if HASH is not available") Signed-off-by: Simon Horman <[email protected]> Reviewed-by: Maxime Chevallier <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Pavel Begunkov [Wed, 24 Jul 2024 11:16:21 +0000 (12:16 +0100)]
io_uring: align iowq and task request error handling
There is a difference in how io_queue_sqe and io_wq_submit_work treat
error codes they get from io_issue_sqe. The first one fails anything
unknown but latter only fails when the code is negative.
It doesn't make sense to have this discrepancy, align them to the
io_queue_sqe behaviour.
Pavel Begunkov [Wed, 24 Jul 2024 11:16:19 +0000 (12:16 +0100)]
io_uring: simplify io_uring_cmd return
We don't have to return error code from an op handler back to core
io_uring, so once io_uring_cmd() sets the results and handles errors we
can juts return IOU_OK and simplify the code.
Note, only valid with e0b23d9953b0c ("io_uring: optimise ltimeout for
inline execution"), there was a problem with iopoll before.
Pavel Begunkov [Wed, 24 Jul 2024 11:16:17 +0000 (12:16 +0100)]
io_uring: don't allow netpolling with SETUP_IOPOLL
IORING_SETUP_IOPOLL rings don't have any netpoll handling, let's fail
attempts to register netpolling in this case, there might be people who
will mix up IOPOLL and netpoll.
Pavel Begunkov [Wed, 24 Jul 2024 11:16:16 +0000 (12:16 +0100)]
io_uring: tighten task exit cancellations
io_uring_cancel_generic() should retry if any state changes like a
request is completed, however in case of a task exit it only goes for
another loop and avoids schedule() if any tracked (i.e. REQ_F_INFLIGHT)
request got completed.
Let's assume we have a non-tracked request executing in iowq and a
tracked request linked to it. Let's also assume
io_uring_cancel_generic() fails to find and cancel the request, i.e.
via io_run_local_work(), which may happen as io-wq has gaps.
Next, the request logically completes, io-wq still hold a ref but queues
it for completion via tw, which happens in
io_uring_try_cancel_requests(). After, right before prepare_to_wait()
io-wq puts the request, grabs the linked one and tries executes it, e.g.
arms polling. Finally the cancellation loop calls prepare_to_wait(),
there are no tw to run, no tracked request was completed, so the
tctx_inflight() check passes and the task is put to indefinite sleep.
Jinjie Ruan [Thu, 13 Jun 2024 11:13:47 +0000 (19:13 +0800)]
trace: riscv: Remove deprecated kprobe on ftrace support
Since commit 7caa9765465f60 ("ftrace: riscv: move from REGS to ARGS"),
kprobe on ftrace is not supported by riscv, because riscv's support for
FTRACE_WITH_REGS has been replaced with support for FTRACE_WITH_ARGS, and
KPROBES_ON_FTRACE will be supplanted by FPROBES. So remove the deprecated
kprobe on ftrace support, which is misunderstood.
Hangbin Liu [Tue, 23 Jul 2024 08:22:52 +0000 (16:22 +0800)]
selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
If the testing kernel doesn't support setting fdb_max_learned or show
fdb_n_learned, just skip it. Or we will get errors like
./bridge_fdb_learning_limit.sh: line 218: [: null: integer expression expected
./bridge_fdb_learning_limit.sh: line 225: [: null: integer expression expected
Fixes: 6f84090333bb ("selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest") Signed-off-by: Hangbin Liu <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Johannes Nixdorf <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Shigeru Yoshida [Tue, 16 Jul 2024 02:09:05 +0000 (11:09 +0900)]
tipc: Return non-zero value from tipc_udp_addr2str() on error
tipc_udp_addr2str() should return non-zero value if the UDP media
address is invalid. Otherwise, a buffer overflow access can occur in
tipc_media_addr_printf(). Fix this by returning 1 on an invalid UDP
media address.
thermal: core: Back off when polling thermal zones on errors
Commit a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone
temperature is invalid") introduced a polling mechanism by which the
thermal core attampts to get a valid temperature value for thermal zones
where the .get_temp() callback returns errors to start with (for
example, due to initialization ordering woes). However, this polling is
carried out periodically ad infinitum and every iteration of it causes
a message to be printed to the kernel log which means a lot of log noise
on systems where there are thermal zones that never get ready for some
reason. It is also not really useful to continuously poll thermal zones
that never respond.
To address this, modify the thermal core to increase the delay between
consecutive thermal zone temperature checks after every check that fails
until it reaches a certain maximum value. At that point, the thermal
zone in question will be disabled, but user space will be able to
reenable it if it believes that the failure is transient.
Also change the code to print messages regarding failed temperature
checks to the kernel log only twice, once when the thermal zone's
.get_temp() callback returns an error for the first time and once when
disabling the given thermal zone. In addition, a dev_crit() message
will be printed at that point if the given thermal zone contains a
critical trip point to notify the system operator about the situation.
Peter Ujfalusi [Wed, 24 Jul 2024 08:19:32 +0000 (11:19 +0300)]
ASoC: SOF: ipc4-topology: Preserve the DMA Link ID for ChainDMA on unprepare
The DMA Link ID is set to the IPC message's primary during dai_config,
which is only during hw_params.
During xrun handling the hw_params is not called and the DMA Link ID
information will be lost.
All other fields in the message expected to be 0 for re-configuration, only
the DMA Link ID needs to be preserved and the in case of repeated
dai_config, it is correctly updated (masked and then set).
Peter Ujfalusi [Wed, 24 Jul 2024 08:19:31 +0000 (11:19 +0300)]
ASoC: SOF: ipc4-topology: Only handle dai_config with HW_PARAMS for ChainDMA
The DMA Link ID is only valid in snd_sof_dai_config_data when the
dai_config is called with HW_PARAMS.
The commit that this patch fixes is actually moved a code section without
changing it, the same bug exists in the original code, needing different
patch to kernel prior to 6.9 kernels.
Petr Vorel [Wed, 24 Jul 2024 08:46:55 +0000 (10:46 +0200)]
kbuild: rpm-pkg: Fix C locale setup
semicolon separation in LC_ALL is wrong. Either variable needs to be
exported before as a separate commit or set as part of the commit in the
beginning. Used second variant.
This fixes broken build on user's locale setup which makes 'date' binary
to produce invalid characters in rpm changelog (e.g. cs_CZ.UTF-8 'čec'):
$ make binrpm-pkg
GEN rpmbuild/SPECS/kernel.spec
rpmbuild -bb rpmbuild/SPECS/kernel.spec --define='_topdirlinux/rpmbuild' \
--target x86_64-linux --build-in-place --noprep --define='_smp_mflags \
%{nil}' $(rpm -q rpm >/dev/null 2>&1 || echo --nodeps)
Building target platforms: x86_64-linux
Building for target x86_64-linux
error: bad date in %changelog: St čec 24 2024 user <user@somehost>
make[2]: *** [scripts/Makefile.package:71: binrpm-pkg] Error 1
make[1]: *** [linux/Makefile:1546: binrpm-pkg] Error 2
make: *** [Makefile:224: __sub-make] Error 2
Fixes: 301c10908e42 ("kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec") Signed-off-by: Petr Vorel <[email protected]> Reviewed-by: Miguel Ojeda <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
In __wait_on_freeing_inode() we warn in case the inode_hash_lock is held
but the inode is unhashed. We then release the inode_lock. So using
"locked" as parameter name is confusing. Use is_inode_hash_locked as
parameter name instead.
David Howells [Tue, 23 Jul 2024 08:59:54 +0000 (09:59 +0100)]
vfs: Fix potential circular locking through setxattr() and removexattr()
When using cachefiles, lockdep may emit something similar to the circular
locking dependency notice below. The problem appears to stem from the
following:
(1) Cachefiles manipulates xattrs on the files in its cache when called
from ->writepages().
(2) The setxattr() and removexattr() system call handlers get the name
(and value) from userspace after taking the sb_writers lock, putting
accesses of the vma->vm_lock and mm->mmap_lock inside of that.
(3) The afs filesystem uses a per-inode lock to prevent multiple
revalidation RPCs and in writeback vs truncate to prevent parallel
operations from deadlocking against the server on one side and local
page locks on the other.
Fix this by moving the getting of the name and value in {get,remove}xattr()
outside of the sb_writers lock. This also has the minor benefits that we
don't need to reget these in the event of a retry and we never try to take
the sb_writers lock in the event we can't pull the name and value into the
kernel.
Alternative approaches that might fix this include moving the dispatch of a
write to the cache off to a workqueue or trying to do without the
validation lock in afs. Note that this might also affect other filesystems
that use netfslib and/or cachefiles.
======================================================
WARNING: possible circular locking dependency detected
6.10.0-build2+ #956 Not tainted
------------------------------------------------------
fsstress/6050 is trying to acquire lock: ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0
but task is already holding lock: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
When I wrote commit 3cad1bc01041 ("filelock: Remove locks reliably when
fcntl/close race is detected"), I missed that there are two copies of the
code I was patching: The normal version, and the version for 64-bit offsets
on 32-bit kernels.
Thanks to Greg KH for stumbling over this while doing the stable
backport...
Apply exactly the same fix to the compat path for 32-bit kernels.
The counter is unconditionally incremented for each mount allocation.
If we set it to 1ULL << 32 we're losing 4294967296 as the first valid
non-32 bit mount id.
David Howells [Fri, 19 Jul 2024 14:19:02 +0000 (15:19 +0100)]
cachefiles: Set the max subreq size for cache writes to MAX_RW_COUNT
Set the maximum size of a subrequest that writes to cachefiles to be
MAX_RW_COUNT so that we don't overrun the maximum write we can make to the
backing filesystem.
David Howells [Fri, 19 Jul 2024 14:20:18 +0000 (15:20 +0100)]
netfs: Fix writeback that needs to go to both server and cache
When netfslib is performing writeback (ie. ->writepages), it maintains two
parallel streams of writes, one to the server and one to the cache, but it
doesn't mark either stream of writes as active until it gets some data that
needs to be written to that stream.
This is done because some folios will only be written to the cache
(e.g. copying to the cache on read is done by marking the folios and
letting writeback do the actual work) and sometimes we'll only be writing
to the server (e.g. if there's no cache).
Now, since we don't actually dispatch uploads and cache writes in parallel,
but rather flip between the streams, depending on which has the lowest
so-far-issued offset, and don't wait for the subreqs to finish before
flipping, we can end up in a situation where, say, we issue a write to the
server and this completes before we start the write to the cache.
But because we only activate a stream when we first add a subreq to it, the
result collection code may run before we manage to activate the stream -
resulting in the folio being cleaned and having the writeback-in-progress
mark removed. At this point, the folio no longer belongs to us.
This is only really a problem for folios that need to be written to both
streams - and in that case, the upload to the server is started first,
followed by the write to the cache - and the cache write may see a bad
folio.
Fix this by activating the cache stream up front if there's a cache
available. If there's a cache, then all data is going to be written to it.
The nsproxy structure contains nearly all of the namespaces associated
with a task. When a given namespace type is not supported by this kernel
the rules whether the corresponding pointer in struct nsproxy is NULL or
always init_<ns_type>_ns differ per namespace. Ideally, that wouldn't be
the case and for all namespace types we'd always set it to
init_<ns_type>_ns when the corresponding namespace type isn't supported.
Make sure we handle all namespaces where the pointer in struct nsproxy
can be NULL when the namespace type isn't supported.
syzbot call pidfd_ioctl() with cmd "PIDFD_GET_TIME_NAMESPACE" and disabled
CONFIG_TIME_NS, since time_ns is NULL, it will make NULL ponter deref in
open_namespace.
correct the comments of vfs_*() helpers in fs/namei.c, including:
1. vfs_create()
2. vfs_mknod()
3. vfs_mkdir()
4. vfs_rmdir()
5. vfs_symlink()
All of them come from the same commit: 6521f8917082 "namei: prepare for idmapped mounts"
The @dentry is actually the dentry of child directory rather than
base directory(parent directory), and thus the @dir has to be
modified due to the change of @dentry.
vfs: handle __wait_on_freeing_inode() and evict() race
Lockless hash lookup can find and lock the inode after it gets the
I_FREEING flag set, at which point it blocks waiting for teardown in
evict() to finish.
However, the flag is still set even after evict() wakes up all waiters.
This results in a race where if the inode lock is taken late enough, it
can happen after both hash removal and wakeups, meaning there is nobody
to wake the racing thread up.
This worked prior to RCU-based lookup because the entire ordeal was
synchronized with the inode hash lock.
Since unhashing requires the inode lock, we can safely check whether it
happened after acquiring it.
We need to disable softinterrupts, else we get following problem:
1. pipapo_avx2 called from process context; fpu usable
2. preempt_disable() called, pcpu scratchmap in use
3. softirq handles rx or tx, we re-enter pipapo_avx2
4. fpu busy, fallback to generic non-avx version
5. fallback reuses scratch map and index, which are in use
by the preempted process
Handle this same way as generic version by first disabling
softinterrupts while the scratchmap is in use.
Merge tag 'perf-tools-fixes-for-v6.11-2024-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
"Two fixes for building perf and other tools:
- Fix breakage in tracing tools due to pkg-config for
libtrace{event,fs}
- Fix build of perf when libunwind is used"
* tag 'perf-tools-fixes-for-v6.11-2024-07-23' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf dso: Fix build when libunwind is enabled
tools/latency: Use pkg-config in lib_setup of Makefile.config
tools/rtla: Use pkg-config in lib_setup of Makefile.config
tools/verification: Use pkg-config in lib_setup of Makefile.config
tools: Make pkg-config dependency checks usable by other tools
perf build: Warn if libtracefs is not found
Merge tag 'execve-v6.11-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fix from Kees Cook:
"This moves the exec and binfmt_elf tests out of your way and into the
tests/ subdirectory, following the newly ratified KUnit naming
conventions. :)"
* tag 'execve-v6.11-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
execve: Move KUnit tests to tests/ subdirectory
parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN
Allow users to disable kernel warnings for unaligned memory
accesses from kernel via the /proc/sys/kernel/ignore-unaligned-usertrap
procfs entry.
That way users can disable those warnings in case they happen too
often.
Steve French [Tue, 23 Jul 2024 05:44:48 +0000 (00:44 -0500)]
cifs: mount with "unix" mount option for SMB1 incorrectly handled
Although by default we negotiate CIFS Unix Extensions for SMB1 mounts to
Samba (and they work if the user does not specify "unix" or "posix" or
"linux" on mount), and we do properly handle when a user turns them off
with "nounix" mount parm. But with the changes to the mount API we
broke cases where the user explicitly specifies the "unix" option (or
equivalently "linux" or "posix") on mount with vers=1.0 to Samba or other
servers which support the CIFS Unix Extensions.
"mount error(95): Operation not supported"
and logged:
"CIFS: VFS: Check vers= mount option. SMB3.11 disabled but required for POSIX extensions"
even though CIFS Unix Extensions are supported for vers=1.0 This patch fixes
the case where the user specifies both "unix" (or equivalently "posix" or
"linux") and "vers=1.0" on mount to a server which supports the
CIFS Unix Extensions.
Steve French [Tue, 23 Jul 2024 04:40:08 +0000 (23:40 -0500)]
cifs: fix reconnect with SMB1 UNIX Extensions
When mounting with the SMB1 Unix Extensions (e.g. mounts
to Samba with vers=1.0), reconnects no longer reset the
Unix Extensions (SetFSInfo SET_FILE_UNIX_BASIC) after tcon so most
operations (e.g. stat, ls, open, statfs) will fail continuously
with:
"Operation not supported"
if the connection ever resets (e.g. due to brief network disconnect)
Wojciech Drewek [Mon, 1 Jul 2024 09:05:46 +0000 (11:05 +0200)]
ice: Fix recipe read procedure
When ice driver reads recipes from firmware information about
need_pass_l2 and allow_pass_l2 flags is not stored correctly.
Those flags are stored as one bit each in ice_sw_recipe structure.
Because of that, the result of checking a flag has to be casted to bool.
Note that the need_pass_l2 flag currently works correctly, because
it's stored in the first bit.
Ahmed Zaki [Fri, 14 Jun 2024 13:18:42 +0000 (07:18 -0600)]
ice: Add a per-VF limit on number of FDIR filters
While the iavf driver adds a s/w limit (128) on the number of FDIR
filters that the VF can request, a malicious VF driver can request more
than that and exhaust the resources for other VFs.
Merge tag 'f2fs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"A pretty small update including mostly minor bug fixes in zoned
storage along with the large section support.
Enhancements:
- add support for FS_IOC_GETFSSYSFSPATH
- enable atgc dynamically if conditions are met
- use new ioprio Macro to get ckpt thread ioprio level
- remove unreachable lazytime mount option parsing
Bug fixes:
- fix null reference error when checking end of zone
- fix start segno of large section
- fix to cover read extent cache access with lock
- don't dirty inode for readonly filesystem
- allocate a new section if curseg is not the first seg in its zone
- only fragment segment in the same section
- truncate preallocated blocks in f2fs_file_open()
- fix to avoid use SSR allocate when do defragment
- fix to force buffered IO on inline_data inode
And some minor code clean-ups and sanity checks"
* tag 'f2fs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (26 commits)
f2fs: clean up addrs_per_{inode,block}()
f2fs: clean up F2FS_I()
f2fs: use meta inode for GC of COW file
f2fs: use meta inode for GC of atomic file
f2fs: only fragment segment in the same section
f2fs: fix to update user block counts in block_operations()
f2fs: remove unreachable lazytime mount option parsing
f2fs: fix null reference error when checking end of zone
f2fs: fix start segno of large section
f2fs: remove redundant sanity check in sanity_check_inode()
f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid
f2fs: fix to use mnt_{want,drop}_write_file replace file_{start,end}_wrtie
f2fs: clean up set REQ_RAHEAD given rac
f2fs: enable atgc dynamically if conditions are met
f2fs: fix to truncate preallocated blocks in f2fs_file_open()
f2fs: fix to cover read extent cache access with lock
f2fs: fix return value of f2fs_convert_inline_inode()
f2fs: use new ioprio Macro to get ckpt thread ioprio level
f2fs: fix to don't dirty inode for readonly filesystem
f2fs: fix to avoid use SSR allocate when do defragment
...
Merge tag 'jfs-6.11' of github.com:kleikamp/linux-shaggy
Pull jfs updates from David Kleikamp:
"Folio conversion from Matthew Wilcox and a few various fixes"
* tag 'jfs-6.11' of github.com:kleikamp/linux-shaggy:
jfs: don't walk off the end of ealist
jfs: Fix shift-out-of-bounds in dbDiscardAG
jfs: Fix array-index-out-of-bounds in diFree
jfs: fix null ptr deref in dtInsertEntry
jfs: Remove use of folio error flag
fs: Remove i_blocks_per_page
jfs: Change metapage->page to metapage->folio
jfs: Convert force_metapage to use a folio
jfs: Convert inc_io to take a folio
jfs: Convert page_to_mp to folio_to_mp
jfs; Convert __invalidate_metapages to use a folio
jfs: Convert dec_io to take a folio
jfs: Convert drop_metapage and remove_metapage to take a folio
jfs; Convert release_metapage to use a folio
jfs: Convert insert_metapage() to take a folio
jfs: Convert __get_metapage to use a folio
jfs: Convert metapage_writepage to metapage_write_folio
jfs: Convert metapage_read_folio to use folio APIs
Merge tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Remove tristate choice support from Kconfig
- Stop using the PROVIDE() directive in the linker script
- Reduce the number of links for the combination of CONFIG_KALLSYMS and
CONFIG_DEBUG_INFO_BTF
- Enable the warning for symbol reference to .exit.* sections by
default
- Fix warnings in RPM package builds
- Improve scripts/make_fit.py to generate a FIT image with separate
base DTB and overlays
- Improve choice value calculation in Kconfig
- Fix conditional prompt behavior in choice in Kconfig
- Remove support for the uncommon EMAIL environment variable in Debian
package builds
- Remove support for the uncommon "name <email>" form for the DEBEMAIL
environment variable
- Raise the minimum supported GNU Make version to 4.0
- Remove stale code for the absolute kallsyms
- Move header files commonly used for host programs to scripts/include/
- Introduce the pacman-pkg target to generate a pacman package used in
Arch Linux
- Clean up Kconfig
* tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (65 commits)
kbuild: doc: gcc to CC change
kallsyms: change sym_entry::percpu_absolute to bool type
kallsyms: unify seq and start_pos fields of struct sym_entry
kallsyms: add more original symbol type/name in comment lines
kallsyms: use \t instead of a tab in printf()
kallsyms: avoid repeated calculation of array size for markers
kbuild: add script and target to generate pacman package
modpost: use generic macros for hash table implementation
kbuild: move some helper headers from scripts/kconfig/ to scripts/include/
Makefile: add comment to discourage tools/* addition for kernel builds
kbuild: clean up scripts/remove-stale-files
kconfig: recursive checks drop file/lineno
kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec
kallsyms: get rid of code for absolute kallsyms
kbuild: Create INSTALL_PATH directory if it does not exist
kbuild: Abort make on install failures
kconfig: remove 'e1' and 'e2' macros from expression deduplication
kconfig: remove SYMBOL_CHOICEVAL flag
kconfig: add const qualifiers to several function arguments
kconfig: call expr_eliminate_yn() at least once in expr_eliminate_dups()
...
Merge tag 'rproc-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull remoteproc updates from Bjorn Andersson:
- The maximum amount of DDR memory used by the Mediatek MT8188/MT8195
SCP is increased to handle new use cases. Handling of optional L1TCM
memory is made actually optional.
- An optimization is introduced to only clear the unused portion of IPI
shared buffers, rather than the entire buffer before writing the
message.
- Detection for IPC-only mode in the TI K3 DSP remoteproc driver is
corrected. The loglevel of a debug print in the same is lowered from
error.
- Support for attaching to an running remote processor is added to the
Xilinx R5F.
- An in-kernel implementation of the Qualcomm "protected domain mapper"
(aka service registry) service is introduced, to remove the
dependency on a userspace implementation to detect when the battery
monitor and USB Type-C port manager becomes available. This is then
integrated with the Qualcomm remoteproc driver.
- The Qualcomm PAS remoteproc driver gains support for attempting to
bust hwspinlocks held by the remote processor when it
crashed/stopped.
- The TI OMAP remoteproc driver is transitioned to use devres helpers
for various forms of allocations.
- Parsing of memory-regions in the i.MX remoteproc driver is improved
to avoid a NULL pointer dereference if the phandle reference is
empty. of_node reference counting is corrected in the same.
* tag 'rproc-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
remoteproc: mediatek: Increase MT8188/MT8195 SCP core0 DRAM size
remoteproc: k3-dsp: Fix log levels where appropriate
remoteproc: xlnx: Add attach detach support
remoteproc: qcom: select AUXILIARY_BUS
remoteproc: k3-r5: Fix IPC-only mode detection
remoteproc: mediatek: Don't attempt to remap l1tcm memory if missing
remoteproc: qcom: enable in-kernel PD mapper
dt-bindings: remoteproc: imx_rproc: Add minItems for power-domain
remoteproc: imx_rproc: Fix refcount mistake in imx_rproc_addr_init
remoteproc: omap: Use devm_rproc_add() helper
remoteproc: omap: Use devm action to release reserved memory
remoteproc: omap: Use devm_rproc_alloc() helper
remoteproc: imx_rproc: Skip over memory region when node value is NULL
dt-bindings: remoteproc: k3-dsp: Correct optional sram properties for AM62A SoCs
remoteproc: qcom_q6v5_pas: Add hwspinlock bust on stop
soc: qcom: smem: Add qcom_smem_bust_hwspin_lock_by_host()
remoteproc: mediatek: Zero out only remaining bytes of IPI buffer
Merge tag 'hwlock-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull hwspinlock updates from Bjorn Andersson:
"This introduces a mechanism in the hardware spinlock framework, and
the Qualcomm TCSR mutex driver, for allowing clients to bust locks
held by a remote processor in the event that this enters a faulty
state while holding the shared lock"
* tag 'hwlock-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
hwspinlock: qcom: implement bust operation
hwspinlock: Introduce hwspin_lock_bust()
Merge tag 'sh-for-v6.11-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
"This is rather small this time and contains just three changes.
The first change by Oscar Salvador drops support for memory hotplug
and hotremove for sh as the kernel stopped supporting it on 32-bit
platforms since 7ec58a2b941e ("mm/memory_hotplug: restrict
CONFIG_MEMORY_HOTPLUG to 64 bit").
That then results in a follow-up change to update all affected board
config files.
The third change comes from Jeff Johnson which adds the missing
MODULE_DESCRIPTION() macro to the push-switch driver"
* tag 'sh-for-v6.11-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: push-switch: Add missing MODULE_DESCRIPTION() macro
sh: config: Drop CONFIG_MEMORY_{HOTPLUG,HOTREMOVE}
sh: Drop support for memory hotplug and memory hotremove
Merge tag 'modules-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module update from Luis Chamberlain:
"This is a super boring development cycle this time around for modules,
there is only one patch in this pull request.
The patch deals with a corner case set of dependencies which is not
resolved today to ensure users get the module they need on initramfs.
Currently only one module is known to exist which needs this, however
this can grow to capture other corner cases likely escaped and not
reported before. The kernel change is just a section update, the real
work is done and merged already on upstream kmod.
This has been on linux-next for 3 weeks now"
* tag 'modules-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: create weak dependecies
clk: samsung: fix getting Exynos4 fin_pll rate from external clocks
Commit 0dc83ad8bfc9 ("clk: samsung: Don't register clkdev lookup for the
fixed rate clocks") claimed registering clkdev lookup is not necessary
anymore, but that was not entirely true: Exynos4210/4212/4412 clock code
still relied on it to get the clock rate of xxti or xusbxti external
clocks.
Drop that requirement by accessing already registered clk_hw when
looking up the xxti/xusbxti rate.
Merge tag 'livepatching-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching update from Petr Mladek:
- show patch->replace flag in sysfs
- add or improve few selftests
* tag 'livepatching-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
livepatch: Replace snprintf() with sysfs_emit()
selftests/livepatch: Add selftests for "replace" sysfs attribute
livepatch: Add "replace" sysfs attribute
selftests: livepatch: Test atomic replace against multiple modules
selftests/livepatch: define max test-syscall processes
Merge tag 'i2c-for-6.11-rc1-second-batch' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
"The I2C core has two header documentation updates as the dependecies
are in now.
The I2C host drivers add some patches which nearly fell through the
cracks:
- Added descriptions in the DTS for the Qualcomm SM8650 and SM8550
Camera Control Interface (CCI).
- Added support for the "settle-time-us" property, which allows the
gpio-mux device to switch from one bus to another with a
configurable delay. The time can be set in the DTS. The latest
change also includes file sorting.
- Fixed slot numbering in the SMBus framework to prevent failures
when more than 8 slots are occupied. It now enforces a a maximum of
8 slots to be used. This ensures that the Intel PIIX4 device can
register the SPDs correctly without failure, even if other slots
are populated but not used"
* tag 'i2c-for-6.11-rc1-second-batch' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: header: improve kdoc for i2c_algorithm
i2c: header: remove unneeded stuff regarding i2c_algorithm
i2c: piix4: Register SPDs
i2c: smbus: remove i801 assumptions from SPD probing
i2c: mux: gpio: Add support for the 'settle-time-us' property
i2c: mux: gpio: Re-order #include to match alphabetic order
dt-bindings: i2c: mux-gpio: Add 'settle-time-us' property
dt-bindings: i2c: qcom-cci: Document sm8650 compatible
dt-bindings: i2c: qcom-cci: Document sm8550 compatible
Merge tag 'pcmcia-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux
Pull PCMCIA updates from Dominik Brodowski:
"A number of tiny cleanups of the PCMCIA subsystem by Jeff Johnson,
Jules Irenge, and Krzysztof Kozlowski"
* tag 'pcmcia-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
pcmcia: add missing MODULE_DESCRIPTION() macros
pcmcia: Use resource_size function on resource object
pcmcia: bcm63xx: drop driver owner assignment
Merge tag 'for-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
"Power-supply core:
- new charging_orange_full_green RGB LED trigger
- simplify and cleanup power-supply LED trigger code
- expose power information via hwmon compatibility layer
New hardware support:
- enable battery support for Qualcomm Snapdragon X Elite
- new battery driver for Maxim MAX17201/MAX17205
- new battery driver for Lenovo Yoga C630 laptop (custom EC)
Cleanups:
- cleanup 'struct i2c_device_id' initializations
- misc small battery driver cleanups and fixes"
* tag 'for-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: sysfs: use power_supply_property_is_writeable()
power: supply: qcom_battmgr: Enable battery support on x1e80100
power: supply: add support for MAX1720x standalone fuel gauge
dt-bindings: power: supply: add support for MAX17201/MAX17205 fuel gauge
power: reset: piix4: add missing MODULE_DESCRIPTION() macro
power: supply: samsung-sdi-battery: Constify struct power_supply_maintenance_charge_table
power: supply: samsung-sdi-battery: Constify struct power_supply_vbat_ri_table
power: supply: lenovo_yoga_c630_battery: add Lenovo C630 driver
power: supply: ingenic: Fix some error handling paths in ingenic_battery_get_property()
power: supply: ab8500: Clean some error messages
power: supply: ab8500: Use iio_read_channel_processed_scale()
power: supply: ab8500: Fix error handling when calling iio_read_channel_processed()
power: supply: hwmon: Add support for power sensors
power: supply: ab8500: remove unused struct 'inst_curr_result_list'
power: supply: bd99954: remove unused struct 'battery_data'
power: supply: leds: Add activate() callback to triggers
power: supply: leds: Share trig pointer for online and charging_full
power: supply: leds: Add power_supply_[un]register_led_trigger()
power: supply: Drop explicit initialization of struct i2c_device_id::driver_data to 0
Steve French [Sun, 21 Jul 2024 20:45:56 +0000 (15:45 -0500)]
cifs: fix potential null pointer use in destroy_workqueue in init_cifs error path
Dan Carpenter reported a Smack static checker warning:
fs/smb/client/cifsfs.c:1981 init_cifs()
error: we previously assumed 'serverclose_wq' could be null (see line 1895)
The patch which introduced the serverclose workqueue used the wrong
oredering in error paths in init_cifs() for freeing it on errors.
Currently, sysreg has value as 0b0010 for the presence of GICv4.1 in
ID_PFR1_EL1 and ID_AA64PFR0_EL1, instead of 0b0011 as per ARM ARM.
Hence, correct them to reflect ARM ARM.
Fangrui Song [Thu, 18 Jul 2024 17:34:23 +0000 (10:34 -0700)]
arm64/vdso: Remove --hash-style=sysv
glibc added support for .gnu.hash in 2006 and .hash has been obsoleted
for more than one decade in many Linux distributions. Using
--hash-style=sysv might imply unaddressed issues and confuse readers.
Just drop the option and rely on the linker default, which is likely
"both", or "gnu" when the distribution really wants to eliminate sysv
hash overhead.
Similar to commit 6b7e26547fad ("x86/vdso: Emit a GNU hash").
Since the commit 819e50e25d0c ("arm64: Add ftrace support"),
HAVE_FUNCTION_GRAPH_TRACER has always been enabled. Although a subsequent
commit 364697032246 ("arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL")
redundantly added check on HAVE_FUNCTION_GRAPH_TRACER, while enabling the
config HAVE_FUNCTION_GRAPH_RETVAL. Let's just drop this redundant check.
Sven Schnelle [Mon, 22 Jul 2024 13:41:27 +0000 (15:41 +0200)]
s390/kdump: Make kdump ready for lowcore relocation
In preparation of having lowcore at different address than zero,
add the base register to all lowcore accesses in store_status()
and __do_machine_kdump().
Sven Schnelle [Mon, 22 Jul 2024 13:41:19 +0000 (15:41 +0200)]
s390/entry: Add base register to CHECK_VMAP_STACK/CHECK_STACK macro
In preparation of having lowcore at different address than zero,
add the base register to CHECK_VMAP_STACK and CHECK_STACK. No
functional change, because %r0 is passed to the macro.
Sven Schnelle [Mon, 22 Jul 2024 13:41:18 +0000 (15:41 +0200)]
s390/entry: Add base register to SIEEXIT macro
In preparation of having lowcore at different address than zero,
add the base register to SIEEXIT. No functional change, because
%r0 is passed to the macro.
Sven Schnelle [Mon, 22 Jul 2024 13:41:17 +0000 (15:41 +0200)]
s390/entry: Add base register to MBEAR macro
In preparation of having lowcore at different address than zero,
add the base register to MBEAR. No functional change, because
%r0 is passed to the macro.
Sven Schnelle [Mon, 22 Jul 2024 13:41:14 +0000 (15:41 +0200)]
s390: Add infrastructure to patch lowcore accesses
The s390 architecture defines two special per-CPU data pages
called the "prefix area". In s390-linux terminology this is usually
called "lowcore". This memory area contains system configuration
data like old/new PSW's for system call/interrupt/machine check
handlers and lots of other data. It is normally mapped to logical
address 0. This area can only be accessed when in supervisor mode.
This means that kernel code can dereference NULL pointers, because
accesses to address 0 are allowed. Parts of lowcore can be write
protected, but read accesses and write accesses outside of the write
protected areas are not caught.
To remove this limitation for debugging and testing, remap lowcore to
another address and define a function get_lowcore() which simply
returns the address where lowcore is mapped at. This would normally
introduce a pointer dereference (=memory read). As lowcore is used
for several very often used variables, add code to patch this function
during runtime, so we avoid the memory reads.
For C code get_lowcore() has to be used, for assembly code it is
the GET_LC macro. When using this macro/function a reference is added
to alternative patching. All these locations will be patched to the
actual lowcore location when the kernel is booted or a module is loaded.
To make debugging/bisecting problems easier, this patch adds all the
infrastructure but the lowcore address is still hardwired to 0. This
way the code can be converted on a per function basis, and the
functionality is enabled in a patch after all the functions have
been converted.
Note that this requires at least z16 because the old lpsw instruction
only allowed a 12 bit displacement. z16 introduced lpswey which allows
20 bits (signed), so the lowcore can effectively be mapped from
address 0 - 0x7e000. To use 0x7e000 as address, a 6 byte lgfi
instruction would have to be used in the alternative. To save two
bytes, llilh can be used, but this only allows to set bits 16-31 of
the address. In order to use the llilh instruction, use 0x70000 as
alternative lowcore address. This is still large enough to catch
NULL pointer dereferences into large arrays.
The low level machine check handler code fills the ptregs structure
partially with the register contents present at machine check handler
entry and partially with contents from the machine check save area.
In case of a machine check the contents of all general purpose
registers are saved by the CPU to the machine check save area.
Therefore simplify the code and fill the ptregs structure by only
using the machine check save area as source.
s390/alternatives: Remove alternative facility list
The alternative and the normal facility list are always identical. Remove
the alternative facility list, which allows to simplify the alternatives
code.
The nospec implementation is deeply integrated into the alternatives
code: only for nospec an alternative facility list is implemented and
used by the alternative code, while it is modified by nospec specific
needs.
Push down the nospec alternative handling into the nospec by
introducing a new alternative type and a specific nospec callback to
decide if alternatives should be applied.
Also introduce a new global nobp variable which together with facility
82 can be used to decide if nobp is enabled or not.
Sven Schnelle [Tue, 16 Jul 2024 11:50:54 +0000 (13:50 +0200)]
s390/alternatives: Allow early alternative patching in decompressor
Add the required code to patch alternatives early in the decompressor.
This is required for the upcoming lowcore relocation changes, where
alternatives for facility 193 need to get patched before lowcore
alternatives.
Rework alternatives to allow for callbacks. With this every
alternative entry has additional data encoded:
- When (aka context) an alternative is supposed to be applied
- The type of an alternative, which allows for type specific handling
and callbacks
- Extra type specific payload (patch information), which can be passed
to callbacks in order to decide if an alternative should be applied
or not
With this only the "late" context is implemented, which means there is
no change to the previous behaviour. All code is just converted to the
more generic new infrastructure.
s390/uaccess: Make s390_kernel_write() usable for decompressor
To avoid lots of ifdefs in C code make s390_kernel_write() usable for
the decompressor: simply use memcpy() for this case since there is no
write protection enabled that early.
Move all text sync functions from alternative.c to processor.c. This
way there is only minimal code left in alternative.c left, which is a
prerequisite to use the C file within boot code as well.
s390/alternatives: Merge both alternative header files
The two alternative header files must stay in sync. This is easier to
achieve within one header file. Therefore merge both of them and have
only one file, like most other architectures.
The alternative code is using the words facility and feature for the
same. Rename facility to more generic feature everywhere to have
consistent naming.
Sven Schnelle [Tue, 16 Jul 2024 11:50:48 +0000 (13:50 +0200)]
s390/alternatives: Remove noaltinstr option
The current Kernel doesn't boot without alternative patching on
z16 machines. To avoid such bugs in the future, remove the option
disable alternative patching.
Sven Schnelle [Tue, 16 Jul 2024 07:26:15 +0000 (09:26 +0200)]
s390: Move CIF flags to struct pcpu
To allow testing flags for offline CPUs, move the CIF flags
to struct pcpu. To avoid having to calculate the array index
for each access, add a pointer to the pcpu member for the current
cpu to lowcore.
Sven Schnelle [Tue, 16 Jul 2024 07:26:14 +0000 (09:26 +0200)]
s390/smp: Switch pcpu_devices to percpu
In preparation of moving the CIF flags from lowcore to pcpu_devices,
convert the pcpu_devices array to use the percpu infrastructure.
This is required because using the pcpu_devices array as it is would
introduce a performance penalty due to the fact that CPU flags for
multiple CPUs would end up in the same cacheline.
Note that a pointer to the pcpu struct of the IPL CPU is still required.
This is because a restart interrupt can be triggered on an offline CPU.
s390 stores the percpu offset in lowcore, but offline CPUs have no
lowcore area allocated. So percpu data cannot be used from an offline
CPU and we need to get the pcpu pointer for the IPL cpu from somewhere
else.
Sven Schnelle [Tue, 16 Jul 2024 07:26:13 +0000 (09:26 +0200)]
s390/smp: Handle restart interrupt on ipl cpu
The current smp code allows to trigger a restart interrupt on CPUs
offline in linux. To allow using the percpu infrastructure instead
of the pcpu_devices array, switch to the ipl cpu which is always
online before calling do_restart().
s390/boot: Do not assume the decompressor range is reserved
When allocating a random memory range for .amode31 sections
the minimal randomization address is 0. That does not lead
to a possible overlap with the decompressor image (which also
starts from 0) since by that time the image range is already
reserved.
Do not assume the decompressor range is reserved and always
provide the minimal randomization address for .amode31
sections beyond the decompressor. That is a prerequisite
for moving the lowcore memory address from NULL elsewhere.
Thomas Richter [Mon, 15 Jul 2024 10:07:29 +0000 (12:07 +0200)]
s390/cpum_cf: Fix endless loop in CF_DIAG event stop
Event CF_DIAG reads out complete counter sets using stcctm
instruction. This is done at event start time when the process
starts execution and at event stop time when the process is
removed from the CPU. During removal the difference of each
counter in the counter sets is calculated and saved as raw data
in the ring buffer. This works fine unless the number of counters
in a counter set is zero. This may happen for the extended counter
set. This set is machine specific and the size of the counter
set can be zero even when extended counter set is authorized for
read access.
This case is not handled. cfdiag_diffctr() checks authorization
of the extended counter set. If true the functions assumes
the extended counter set has been saved in a data buffer. However
this is not the case, cfdiag_getctrset() does not save a counter
set with counter set size of zero. This mismatch causes an endless
loop in the counter set readout during event stop handling.
The calculation of the difference of the counters in each counter
now verifies the size of the counter set is non-zero. A counter set
with size zero is skipped.
s390/kmsan: Fix merge conflict with get_lowcore() introduction
Resolve the conflict between commit 2a48c8c9cf87 ("s390/kmsan:
implement the architecture-specific functions") and commit 39976f1278a9
("s390: Remove S390_lowcore").
s390/setup: Fix __pa/__va for modules under non-GPL licenses
The struct vm_layout contains fields used in __pa/__va calculations. Such
fundamental things have to be exported with EXPORT_SYMBOL to avoid
breakages of out-of-tree modules under non-GPL licenses.
Fixes: 7de0446f0b26 ("s390/boot: Make identity mapping base address explicit") Acked-by: Heiko Carstens <[email protected]> Acked-by: Alexander Gordeev <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
s390/pci: Allow allocation of more than 1 MSI interrupt
On a PCI adapter that provides up to 8 MSI interrupt sources the s390
implementation of PCI interrupts rejected to accommodate them, although
the underlying hardware is able to support that.
For MSI-X it is sufficient to allocate a single irq_desc per msi_desc,
but for MSI multiple irq descriptors are attached to and controlled by
a single msi descriptor. Add the appropriate loops to maintain multiple
irq descriptors and tie/untie them to/from the appropriate AIBV bit, if
a device driver allocates more than 1 MSI interrupt.
Common PCI code passes on requests to allocate a number of interrupt
vectors based on the device drivers' demand and the PCI functions'
capabilities. However, the root-complex of s390 systems support just a
limited number of interrupt vectors per PCI function.
Produce a kernel log message to inform about any architecture-specific
capping that might be done.
With this change, we had a PCI adapter successfully raising
interrupts to its device driver via all 8 sources.
Factor out adapter interrupt allocation from arch_setup_msi_irqs() in
preparation for enabling registration of multiple MSIs. Code movement
only, no change of functionality intended.