Linus Torvalds [Thu, 24 Oct 2024 23:31:58 +0000 (16:31 -0700)]
Merge tag 'hid-for-linus-20241024' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
"Device-specific functionality quirks for Thinkpad X1 Gen3, Logitech
Bolt and some Goodix touchpads (Bartłomiej Maryńczak, Hans de Goede
and Kenneth Albanowski)"
* tag 'hid-for-linus-20241024' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: lenovo: Add support for Thinkpad X1 Tablet Gen 3 keyboard
HID: multitouch: Add quirk for Logitech Bolt receiver w/ Casa touchpad
HID: i2c-hid: Delayed i2c resume wakeup for 0x0d42 Goodix touchpad
Linus Torvalds [Thu, 24 Oct 2024 21:17:34 +0000 (14:17 -0700)]
Merge tag 'loongarch-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Get correct cores_per_package for SMT systems, enable IRQ if do_ale()
triggered in irq-enabled context, and fix some bugs about vDSO, memory
managenent, hrtimer in KVM, etc"
* tag 'loongarch-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Mark hrtimer to expire in hard interrupt context
LoongArch: Make KASAN usable for variable cpu_vabits
LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
LoongArch: Don't crash in stack_top() for tasks without vDSO
LoongArch: Set correct size for vDSO code mapping
LoongArch: Enable IRQ if do_ale() triggered in irq-enabled context
LoongArch: Get correct cores_per_package for SMT systems
LoongArch: Use "Exception return address" to comment ERA
Linus Torvalds [Thu, 24 Oct 2024 20:51:58 +0000 (13:51 -0700)]
Merge tag 'probes-fixes-v6.12-rc4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- objpool: Fix choosing allocation for percpu slots
Fixes to allocate objpool's percpu slots correctly according to the
GFP flag. It checks whether "any bit" in GFP_ATOMIC is set to choose
the vmalloc source, but it should check "all bits" in GFP_ATOMIC flag
is set, because GFP_ATOMIC is a combined flag.
If more than MAX_TRACE_ARGS are passed for creating a probe event,
the entries over MAX_TRACE_ARG in trace_arg array are not
initialized. Thus if the kernel accesses those entries, it crashes.
This rejects creating event if the number of arguments is over
MAX_TRACE_ARGS.
- tracing: Consider the NUL character when validating the event length
A strlen() is used when parsing the event name, and the original code
does not consider the terminal null byte. Thus it can pass the name
one byte longer than the buffer. This fixes to check it correctly.
* tag 'probes-fixes-v6.12-rc4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Consider the NULL character when validating the event length
tracing/probes: Fix MAX_TRACE_ARGS limit handling
objpool: fix choosing allocation for percpu slots
Linus Torvalds [Thu, 24 Oct 2024 20:04:15 +0000 (13:04 -0700)]
Merge tag 'for-6.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- mount option fixes:
- fix handling of compression mount options on remount
- reject rw remount in case there are options that don't work
in read-write mode (like rescue options)
- fix zone accounting of unusable space
- fix in-memory corruption when merging extent maps
- fix delalloc range locking for sector < page
- use more convenient default value of drop subtree threshold, clean
more subvolumes without the fallback to marking quotas inconsistent
- fix smatch warning about incorrect value passed to ERR_PTR
* tag 'for-6.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix passing 0 to ERR_PTR in btrfs_search_dir_index_item()
btrfs: reject ro->rw reconfiguration if there are hard ro requirements
btrfs: fix read corruption due to race with extent map merging
btrfs: fix the delalloc range locking if sector size < page size
btrfs: qgroup: set a more sane default value for subtree drop threshold
btrfs: clear force-compress on remount when compress mount option is given
btrfs: zoned: fix zone unusable accounting for freed reserved extent
Linus Torvalds [Thu, 24 Oct 2024 19:38:59 +0000 (12:38 -0700)]
Merge tag 'bcachefs-2024-10-22' of https://github.com/koverstreet/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Lots of hotfixes:
- transaction restart injection has been shaking out a few things
- fix a data corruption in the buffered write path on -ENOSPC, found
by xfstests generic/299
- Some small show_options fixes
- Repair mismatches in inode hash type, seed: different snapshot
versions of an inode must have the same hash/type seed, used for
directory entries and xattrs. We were checking the hash seed, but
not the type, and a user contributed a filesystem where the hash
type on one inode had somehow been flipped; these fixes allow his
filesystem to repair.
Additionally, the hash type flip made some directory entries
invisible, which were then recreated by userspace; so the hash
check code now checks for duplicate non dangling dirents, and
renames one of them if necessary.
- Don't use wait_event_interruptible() in recovery: this fixes some
filesystems failing to mount with -ERESTARTSYS
- Workaround for kvmalloc not supporting > INT_MAX allocations,
causing an -ENOMEM when allocating the sorted array of journal
keys: this allows a 75 TB filesystem to mount
- Make sure bch_inode_unpacked.bi_snapshot is set in the old inode
compat path: this alllows Marcin's filesystem (in use since before
6.7) to repair and mount"
* tag 'bcachefs-2024-10-22' of https://github.com/koverstreet/bcachefs: (26 commits)
bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path
bcachefs: Mark more errors as AUTOFIX
bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
bcachefs: Don't use wait_event_interruptible() in recovery
bcachefs: Fix __bch2_fsck_err() warning
bcachefs: fsck: Improve hash_check_key()
bcachefs: bch2_hash_set_or_get_in_snapshot()
bcachefs: Repair mismatches in inode hash seed, type
bcachefs: Add hash seed, type to inode_to_text()
bcachefs: INODE_STR_HASH() for bch_inode_unpacked
bcachefs: Run in-kernel offline fsck without ratelimit errors
bcachefs: skip mount option handle for empty string.
bcachefs: fix incorrect show_options results
bcachefs: Fix data corruption on -ENOSPC in buffered write path
bcachefs: bch2_folio_reservation_get_partial() is now better behaved
bcachefs: fix disk reservation accounting in bch2_folio_reservation_get()
bcachefS: ec: fix data type on stripe deletion
bcachefs: Don't use commit_do() unnecessarily
bcachefs: handle restarts in bch2_bucket_io_time_reset()
bcachefs: fix restart handling in __bch2_resume_logged_op_finsert()
...
Huacai Chen [Wed, 23 Oct 2024 14:15:44 +0000 (22:15 +0800)]
LoongArch: KVM: Mark hrtimer to expire in hard interrupt context
Like commit 2c0d278f3293f ("KVM: LAPIC: Mark hrtimer to expire in hard
interrupt context") and commit 9090825fa9974 ("KVM: arm/arm64: Let the
timer expire in hardirq context on RT"), On PREEMPT_RT enabled kernels
unmarked hrtimers are moved into soft interrupt expiry mode by default.
Then the timers are canceled from an preempt-notifier which is invoked
with disabled preemption which is not allowed on PREEMPT_RT.
The timer callback is short so in could be invoked in hard-IRQ context.
So let the timer expire on hard-IRQ context even on -RT.
This fix a "scheduling while atomic" bug for PREEMPT_RT enabled kernels:
Huacai Chen [Wed, 23 Oct 2024 14:15:30 +0000 (22:15 +0800)]
LoongArch: Make KASAN usable for variable cpu_vabits
Currently, KASAN on LoongArch assume the CPU VA bits is 48, which is
true for Loongson-3 series, but not for Loongson-2 series (only 40 or
lower), this patch fix that issue and make KASAN usable for variable
cpu_vabits.
Solution is very simple: Just define XRANGE_SHADOW_SHIFT which means
valid address length from VA_BITS to min(cpu_vabits, VA_BITS).
Leo Yan [Mon, 7 Oct 2024 14:47:24 +0000 (15:47 +0100)]
tracing: Consider the NULL character when validating the event length
strlen() returns a string length excluding the null byte. If the string
length equals to the maximum buffer length, the buffer will have no
space for the NULL terminating character.
This commit checks this condition and returns failure for it.
Mikel Rychliski [Mon, 30 Sep 2024 20:26:54 +0000 (16:26 -0400)]
tracing/probes: Fix MAX_TRACE_ARGS limit handling
When creating a trace_probe we would set nr_args prior to truncating the
arguments to MAX_TRACE_ARGS. However, we would only initialize arguments
up to the limit.
This caused invalid memory access when attempting to set up probes with
more than 128 fetchargs.
Yue Haibing [Tue, 22 Oct 2024 09:52:08 +0000 (17:52 +0800)]
btrfs: fix passing 0 to ERR_PTR in btrfs_search_dir_index_item()
The ret may be zero in btrfs_search_dir_index_item() and should not
passed to ERR_PTR(). Now btrfs_unlink_subvol() is the only caller to
this, reconstructed it to check ERR_PTR(-ENOENT) while ret >= 0.
This fixes smatch warnings:
fs/btrfs/dir-item.c:353
btrfs_search_dir_index_item() warn: passing zero to 'ERR_PTR'
btrfs: reject ro->rw reconfiguration if there are hard ro requirements
[BUG]
Syzbot reports the following crash:
BTRFS info (device loop0 state MCS): disabling free space tree
BTRFS info (device loop0 state MCS): clearing compat-ro feature flag for FREE_SPACE_TREE (0x1)
BTRFS info (device loop0 state MCS): clearing compat-ro feature flag for FREE_SPACE_TREE_VALID (0x2)
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:backup_super_roots fs/btrfs/disk-io.c:1691 [inline]
RIP: 0010:write_all_supers+0x97a/0x40f0 fs/btrfs/disk-io.c:4041
Call Trace:
<TASK>
btrfs_commit_transaction+0x1eae/0x3740 fs/btrfs/transaction.c:2530
btrfs_delete_free_space_tree+0x383/0x730 fs/btrfs/free-space-tree.c:1312
btrfs_start_pre_rw_mount+0xf28/0x1300 fs/btrfs/disk-io.c:3012
btrfs_remount_rw fs/btrfs/super.c:1309 [inline]
btrfs_reconfigure+0xae6/0x2d40 fs/btrfs/super.c:1534
btrfs_reconfigure_for_mount fs/btrfs/super.c:2020 [inline]
btrfs_get_tree_subvol fs/btrfs/super.c:2079 [inline]
btrfs_get_tree+0x918/0x1920 fs/btrfs/super.c:2115
vfs_get_tree+0x90/0x2b0 fs/super.c:1800
do_new_mount+0x2be/0xb40 fs/namespace.c:3472
do_mount fs/namespace.c:3812 [inline]
__do_sys_mount fs/namespace.c:4020 [inline]
__se_sys_mount+0x2d6/0x3c0 fs/namespace.c:3997
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[CAUSE]
To support mounting different subvolume with different RO/RW flags for
the new mount APIs, btrfs introduced two workaround to support this feature:
- Skip mount option/feature checks if we are mounting a different
subvolume
- Reconfigure the fs to RW if the initial mount is RO
Combining these two, we can have the following sequence:
- Mount the fs ro,rescue=all,clear_cache,space_cache=v1
rescue=all will mark the fs as hard read-only, so no v2 cache clearing
will happen.
- Mount a subvolume rw of the same fs.
We go into btrfs_get_tree_subvol(), but fc_mount() returns EBUSY
because our new fc is RW, different from the original fs.
Now we enter btrfs_reconfigure_for_mount(), which switches the RO flag
first so that we can grab the existing fs_info.
Then we reconfigure the fs to RW.
- During reconfiguration, option/features check is skipped
This means we will restart the v2 cache clearing, and convert back to
v1 cache.
This will trigger fs writes, and since the original fs has "rescue=all"
option, it skips the csum tree read.
And eventually causing NULL pointer dereference in super block
writeback.
[FIX]
For reconfiguration caused by different subvolume RO/RW flags, ensure we
always run btrfs_check_options() to ensure we have proper hard RO
requirements met.
In fact the function btrfs_check_options() doesn't really do many
complex checks, but hard RO requirement and some feature dependency
checks, thus there is no special reason not to do the check for mount
reconfiguration.
Boris Burkov [Fri, 18 Oct 2024 22:44:34 +0000 (15:44 -0700)]
btrfs: fix read corruption due to race with extent map merging
In debugging some corrupt squashfs files, we observed symptoms of
corrupt page cache pages but correct on-disk contents. Further
investigation revealed that the exact symptom was a correct page
followed by an incorrect, duplicate, page. This got us thinking about
extent maps.
commit ac05ca913e9f ("Btrfs: fix race between using extent maps and merging them")
enforces a reference count on the primary `em` extent_map being merged,
as that one gets modified.
However, since,
commit 3d2ac9922465 ("btrfs: introduce new members for extent_map")
both 'em' and 'merge' get modified, which started modifying 'merge'
and thus introduced the same race.
We were able to reproduce this by looping the affected squashfs workload
in parallel on a bunch of separate btrfs-es while also dropping caches.
We are still working on a simple enough reproducer to make into an fstest.
The simplest fix is to stop modifying 'merge', which is not essential,
as it is dropped immediately after the merge. This behavior is simply
a consequence of the order of the two extent maps being important in
computing the new values. Modify merge_ondisk_extents to take prev and
next by const* and also take a third merged parameter that it puts the
results in. Note that this introduces the rather odd behavior of passing
'em' to merge_ondisk_extents as a const * and as a regular ptr.
Qu Wenruo [Tue, 8 Oct 2024 23:07:03 +0000 (09:37 +1030)]
btrfs: fix the delalloc range locking if sector size < page size
Inside lock_delalloc_folios(), there are several problems related to
sector size < page size handling:
- Set the writer locks without checking if the folio is still valid
We call btrfs_folio_start_writer_lock() just like it's folio_lock().
But since the folio may not even be the folio of the current mapping,
we can easily screw up the folio->private.
- The range is not clamped inside the page
This means we can over write other bitmaps if the start/len is not
properly handled, and trigger the btrfs_subpage_assert().
- @processed_end is always rounded up to page end
If the delalloc range is not page aligned, and we need to retry
(returning -EAGAIN), then we will unlock to the page end.
Thankfully this is not a huge problem, as now
btrfs_folio_end_writer_lock() can handle range larger than the locked
range, and only unlock what is already locked.
Fix all these problems by:
- Lock and check the folio first, then call
btrfs_folio_set_writer_lock()
So that if we got a folio not belonging to the inode, we won't
touch folio->private.
btrfs: qgroup: set a more sane default value for subtree drop threshold
Since commit 011b46c30476 ("btrfs: skip subtree scan if it's too high to
avoid low stall in btrfs_commit_transaction()"), btrfs qgroup can
automatically skip large subtree scan at the cost of marking qgroup
inconsistent.
It's designed to address the final performance problem of snapshot drop
with qgroup enabled, but to be safe the default value is
BTRFS_MAX_LEVEL, requiring a user space daemon to set a different value
to make it work.
I'd say it's not a good idea to rely on user space tool to set this
default value, especially when some operations (snapshot dropping) can
be triggered immediately after mount, leaving a very small window to
that that sysfs interface.
So instead of disabling this new feature by default, enable it with a
low threshold (3), so that large subvolume tree drop at mount time won't
cause huge qgroup workload.
Filipe Manana [Mon, 14 Oct 2024 15:14:18 +0000 (16:14 +0100)]
btrfs: clear force-compress on remount when compress mount option is given
After the migration to use fs context for processing mount options we had
a slight change in the semantics for remounting a filesystem that was
mounted with compress-force. Before we could clear compress-force by
passing only "-o compress[=algo]" during a remount, but after that change
that does not work anymore, force-compress is still present and one needs
to pass "-o compress-force=no,compress[=algo]" to the mount command.
Example, when running on a kernel 6.8+:
$ mount -o compress-force=zlib:9 /dev/sdi /mnt/sdi
$ mount | grep sdi
/dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress-force=zlib:9,discard=async,space_cache=v2,subvolid=5,subvol=/)
$ mount -o remount,compress=zlib:5 /mnt/sdi
$ mount | grep sdi
/dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress-force=zlib:5,discard=async,space_cache=v2,subvolid=5,subvol=/)
On a 6.7 kernel (or older):
$ mount -o compress-force=zlib:9 /dev/sdi /mnt/sdi
$ mount | grep sdi
/dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress-force=zlib:9,discard=async,space_cache=v2,subvolid=5,subvol=/)
$ mount -o remount,compress=zlib:5 /mnt/sdi
$ mount | grep sdi
/dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress=zlib:5,discard=async,space_cache=v2,subvolid=5,subvol=/)
So update btrfs_parse_param() to clear "compress-force" when "compress" is
given, providing the same semantics as kernel 6.7 and older.
Viktor Malik [Mon, 26 Aug 2024 06:07:18 +0000 (08:07 +0200)]
objpool: fix choosing allocation for percpu slots
objpool intends to use vmalloc for default (non-atomic) allocations of
percpu slots and objects. However, the condition checking if GFP flags
set any bit of GFP_ATOMIC is wrong b/c GFP_ATOMIC is a combination of bits
(__GFP_HIGH|__GFP_KSWAPD_RECLAIM) and so `pool->gfp & GFP_ATOMIC` will
be true if either bit is set. Since GFP_ATOMIC and GFP_KERNEL share the
___GFP_KSWAPD_RECLAIM bit, kmalloc will be used in cases when GFP_KERNEL
is specified, i.e. in all current usages of objpool.
This may lead to unexpected OOM errors since kmalloc cannot allocate
large amounts of memory.
For instance, objpool is used by fprobe rethook which in turn is used by
BPF kretprobe.multi and kprobe.session probe types. Trying to attach
these to all kernel functions with libbpf using
SEC("kprobe.session/*")
int kprobe(struct pt_regs *ctx)
{
[...]
}
fails on objpool slot allocation with ENOMEM.
Fix the condition to truly use vmalloc by default.
Mark Brown [Mon, 21 Oct 2024 22:11:40 +0000 (23:11 +0100)]
KVM: selftests: Fix build on on non-x86 architectures
Commit 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX
instructions") unconditionally added -march=x86-64-v2 to the CFLAGS used
to build the KVM selftests which does not work on non-x86 architectures:
cc1: error: unknown value ‘x86-64-v2’ for ‘-march’
Fix this by making the addition of this x86 specific command line flag
conditional on building for x86.
Fixes: 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX instructions") Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Linus Torvalds [Mon, 21 Oct 2024 18:57:38 +0000 (11:57 -0700)]
9p: fix slab cache name creation for real
This was attempted by using the dev_name in the slab cache name, but as
Omar Sandoval pointed out, that can be an arbitrary string, eg something
like "/dev/root". Which in turn trips verify_dirent_name(), which fails
if a filename contains a slash.
So just make it use a sequence counter, and make it an atomic_t to avoid
any possible races or locking issues.
Linus Torvalds [Mon, 21 Oct 2024 18:22:04 +0000 (11:22 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Fix the guest view of the ID registers, making the relevant fields
writable from userspace (affecting ID_AA64DFR0_EL1 and
ID_AA64PFR1_EL1)
- Correcly expose S1PIE to guests, fixing a regression introduced in
6.12-rc1 with the S1POE support
- Fix the recycling of stage-2 shadow MMUs by tracking the context
(are we allowed to block or not) as well as the recycling state
- Address a couple of issues with the vgic when userspace
misconfigures the emulation, resulting in various splats. Headaches
courtesy of our Syzkaller friends
- Stop wasting space in the HYP idmap, as we are dangerously close to
the 4kB limit, and this has already exploded in -next
- Fix another race in vgic_init()
- Fix a UBSAN error when faking the cache topology with MTE enabled
RISCV:
- RISCV: KVM: use raw_spinlock for critical section in imsic
x86:
- A bandaid for lack of XCR0 setup in selftests, which causes trouble
if the compiler is configured to have x86-64-v3 (with AVX) as the
default ISA. Proper XCR0 setup will come in the next merge window.
- Fix an issue where KVM would not ignore low bits of the nested CR3
and potentially leak up to 31 bytes out of the guest memory's
bounds
- Fix case in which an out-of-date cached value for the segments
could by returned by KVM_GET_SREGS.
- More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL
- Override MTRR state for KVM confidential guests, making it WB by
default as is already the case for Hyper-V guests.
Generic:
- Remove a couple of unused functions"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
RISCV: KVM: use raw_spinlock for critical section in imsic
KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups
KVM: selftests: x86: Avoid using SSE/AVX instructions
KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL
KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
x86/kvm: Override default caching mode for SEV-SNP and TDX
KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic
KVM: Remove unused kvm_vcpu_gfn_to_pfn
KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration
KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS
KVM: arm64: Fix shift-out-of-bounds bug
KVM: arm64: Shave a few bytes from the EL2 idmap code
KVM: arm64: Don't eagerly teardown the vgic on init error
KVM: arm64: Expose S1PIE to guests
KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule
KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed
...
Linus Torvalds [Mon, 21 Oct 2024 18:08:05 +0000 (11:08 -0700)]
Merge tag 'probes-fixes-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull uprobe fix from Masami Hiramatsu:
- uprobe: avoid out-of-bounds memory access of fetching args
Uprobe trace events can cause out-of-bounds memory access when
fetching user-space data which is bigger than one page, because it
does not check the local CPU buffer size when reading the data. This
checks the read data size and cut it down to the local CPU buffer
size.
* tag 'probes-fixes-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
uprobe: avoid out-of-bounds memory access of fetching args
Linus Torvalds [Mon, 21 Oct 2024 17:48:24 +0000 (10:48 -0700)]
Merge tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"afs:
- Fix a lock recursion in afs_wake_up_async_call() on ->notify_lock
netfs:
- Drop the references to a folio immediately after the folio has been
extracted to prevent races with future I/O collection
- Fix a documenation build error
- Downgrade the i_rwsem for buffered writes to fix a cifs reported
performance regression when switching to netfslib
vfs:
- Explicitly return -E2BIG from openat2() if the specified size is
unexpectedly large. This aligns openat2() with other extensible
struct based system calls
- When copying a mount namespace ensure that we only try to remove
the new copy from the mount namespace rbtree if it has already been
added to it
nilfs:
- Clear the buffer delay flag when clearing the buffer state clags
when a buffer head is discarded to prevent a kernel OOPs
ocfs2:
- Fix an unitialized value warning in ocfs2_setattr()
proc:
- Fix a kernel doc warning"
* tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
proc: Fix W=1 build kernel-doc warning
afs: Fix lock recursion
fs: Fix uninitialized value issue in from_kuid and from_kgid
fs: don't try and remove empty rbtree node
netfs: Downgrade i_rwsem for a buffered write
nilfs2: fix kernel bug due to missing clearing of buffer delay flag
openat2: explicitly return -E2BIG for (usize > PAGE_SIZE)
netfs: fix documentation build error
netfs: In readahead, put the folio refs as soon extracted
Bibo Mao [Mon, 21 Oct 2024 14:11:19 +0000 (22:11 +0800)]
LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
There are two pages in one TLB entry on LoongArch system. For kernel
space, it requires both two pte entries (buddies) with PAGE_GLOBAL bit
set, otherwise HW treats it as non-global tlb, there will be potential
problems if tlb entry for kernel space is not global. Such as fail to
flush kernel tlb with the function local_flush_tlb_kernel_range() which
supposed only flush tlb with global bit.
Kernel address space areas include percpu, vmalloc, vmemmap, fixmap and
kasan areas. For these areas both two consecutive page table entries
should be enabled with PAGE_GLOBAL bit. So with function set_pte() and
pte_clear(), pte buddy entry is checked and set besides its own pte
entry. However it is not atomic operation to set both two pte entries,
there is problem with test_vmalloc test case.
So function kernel_pte_init() is added to init a pte table when it is
created for kernel address space, and the default initial pte value is
PAGE_GLOBAL rather than zero at beginning. Then only its own pte entry
need update with function set_pte() and pte_clear(), nothing to do with
the pte buddy entry.
Thomas Weißschuh [Mon, 21 Oct 2024 14:11:19 +0000 (22:11 +0800)]
LoongArch: Don't crash in stack_top() for tasks without vDSO
Not all tasks have a vDSO mapped, for example kthreads never do. If such
a task ever ends up calling stack_top(), it will derefence the NULL vdso
pointer and crash.
Huacai Chen [Mon, 21 Oct 2024 14:11:19 +0000 (22:11 +0800)]
LoongArch: Set correct size for vDSO code mapping
The current size of vDSO code mapping is hardcoded to PAGE_SIZE. This
cannot work for 4KB page size after commit 18efd0b10e0fd77 ("LoongArch:
vDSO: Wire up getrandom() vDSO implementation") because the code size
increases to 8KB. Thus set the code mapping size to its real size, i.e.
PAGE_ALIGN(vdso_end - vdso_start).
Fixes: 18efd0b10e0fd77 ("LoongArch: vDSO: Wire up getrandom() vDSO implementation") Reviewed-by: Xi Ruoyao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
Huacai Chen [Mon, 21 Oct 2024 14:11:19 +0000 (22:11 +0800)]
LoongArch: Enable IRQ if do_ale() triggered in irq-enabled context
Unaligned access exception can be triggered in irq-enabled context such
as user mode, in this case do_ale() may call get_user() which may cause
sleep. Then we will get:
Huacai Chen [Mon, 21 Oct 2024 14:11:18 +0000 (22:11 +0800)]
LoongArch: Get correct cores_per_package for SMT systems
In loongson_sysconf, The "core" of cores_per_node and cores_per_package
stands for a logical core, which means in a SMT system it stands for a
thread indeed. This information is gotten from SMBIOS Type4 Structure,
so in order to get a correct cores_per_package for both SMT and non-SMT
systems in parse_cpu_table() we should use SMBIOS_THREAD_PACKAGE_OFFSET
instead of SMBIOS_CORE_PACKAGE_OFFSET.
Yanteng Si [Mon, 21 Oct 2024 14:11:18 +0000 (22:11 +0800)]
LoongArch: Use "Exception return address" to comment ERA
The information contained in the comment for LOONGARCH_CSR_ERA is even
less informative than the macro itself, which can cause confusion for
junior developers. Let's use the full English term.
Qiao Ma [Tue, 15 Oct 2024 06:01:48 +0000 (14:01 +0800)]
uprobe: avoid out-of-bounds memory access of fetching args
Uprobe needs to fetch args into a percpu buffer, and then copy to ring
buffer to avoid non-atomic context problem.
Sometimes user-space strings, arrays can be very large, but the size of
percpu buffer is only page size. And store_trace_args() won't check
whether these data exceeds a single page or not, caused out-of-bounds
memory access.
It could be reproduced by following steps:
1. build kernel with CONFIG_KASAN enabled
2. save follow program as test.c
// If string length large than MAX_STRING_SIZE, the fetch_store_strlen()
// will return 0, cause __get_data_size() return shorter size, and
// store_trace_args() will not trigger out-of-bounds access.
// So make string length less than 4096.
\#define STRLEN 4093
void generate_string(char *str, int n)
{
int i;
for (i = 0; i < n; ++i)
{
char c = i % 26 + 'a';
str[i] = c;
}
str[n-1] = '\0';
}
Linus Torvalds [Sun, 20 Oct 2024 21:08:17 +0000 (14:08 -0700)]
Merge tag 'for-net-2024-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Pull bluetooth fixes from Luiz Augusto Von Dentz:
- ISO: Fix multiple init when debugfs is disabled
- Call iso_exit() on module unload
- Remove debugfs directory on module init failure
- btusb: Fix not being able to reconnect after suspend
- btusb: Fix regression with fake CSR controllers 0a12:0001
- bnep: fix wild-memory-access in proto_unregister
Note: normally the bluetooth fixes go through the networking tree, but
this missed the weekly merge, and two of the commits fix regressions
that have caused a fair amount of noise and have now hit stable too:
So I'm pulling it directly just to expedite things and not miss yet
another -rc release. This is not meant to become a new pattern.
* tag 'for-net-2024-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001
Bluetooth: bnep: fix wild-memory-access in proto_unregister
Bluetooth: btusb: Fix not being able to reconnect after suspend
Bluetooth: Remove debugfs directory on module init failure
Bluetooth: Call iso_exit() on module unload
Bluetooth: ISO: Fix multiple init when debugfs is disabled
Linus Torvalds [Sun, 20 Oct 2024 20:55:46 +0000 (13:55 -0700)]
Merge tag 'pinctrl-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Mostly error path fixes, but one pretty serious interrupt problem in
the Ocelot driver as well:
- Fix two error paths and a missing semicolon in the Intel driver
- Add a missing ACPI ID for the Intel Panther Lake
- Check return value of devm_kasprintf() in the Apple and STM32
drivers
- Add a missing mutex_destroy() in the aw9523 driver
- Fix a double free in cv1800_pctrl_dt_node_to_map() in the Sophgo
driver
- Fix a double free in ma35_pinctrl_dt_node_to_map_func() in the
Nuvoton driver
- Fix a bug in the Ocelot interrupt handler making the system hang"
* tag 'pinctrl-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: ocelot: fix system hang on level based interrupts
pinctrl: nuvoton: fix a double free in ma35_pinctrl_dt_node_to_map_func()
pinctrl: sophgo: fix double free in cv1800_pctrl_dt_node_to_map()
pinctrl: intel: platform: Add Panther Lake to the list of supported
pinctrl: aw9523: add missing mutex_destroy
pinctrl: stm32: check devm_kasprintf() returned value
pinctrl: apple: check devm_kasprintf() returned value
pinctrl: intel: platform: use semicolon instead of comma in ncommunities assignment
pinctrl: intel: platform: fix error path in device_for_each_child_node()
Kent Overstreet [Sat, 19 Oct 2024 22:27:09 +0000 (18:27 -0400)]
bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations
kvmalloc() doesn't support allocations > INT_MAX, but vmalloc() does -
the limit should be lifted, but we can work around this for now.
A user with a 75 TB filesystem reported the following journal replay
error:
https://github.com/koverstreet/bcachefs/issues/769
In journal replay we have to sort and dedup all the keys from the
journal, which means we need a large contiguous allocation. Given that
the user has 128GB of ram, the 2GB limit on allocation size has become
far too small.
Linus Torvalds [Sun, 20 Oct 2024 20:10:44 +0000 (13:10 -0700)]
Merge tag 'char-misc-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull misc driver fixes from Greg KH:
"Here are a number of small char/misc/iio driver fixes for 6.12-rc4:
- loads of small iio driver fixes for reported problems
- parport driver out-of-bounds fix
- Kconfig description and MAINTAINERS file updates
All of these, except for the Kconfig and MAINTAINERS file updates have
been in linux-next all week. Those other two are just documentation
changes and will have no runtime issues and were merged on Friday"
* tag 'char-misc-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (39 commits)
misc: rtsx: list supported models in Kconfig help
MAINTAINERS: Remove some entries due to various compliance requirements.
misc: microchip: pci1xxxx: add support for NVMEM_DEVID_AUTO for OTP device
misc: microchip: pci1xxxx: add support for NVMEM_DEVID_AUTO for EEPROM device
parport: Proper fix for array out-of-bounds access
iio: frequency: admv4420: fix missing select REMAP_SPI in Kconfig
iio: frequency: {admv4420,adrf6780}: format Kconfig entries
iio: adc: ad4695: Add missing Kconfig select
iio: adc: ti-ads8688: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig
iio: hid-sensors: Fix an error handling path in _hid_sensor_set_report_latency()
iioc: dac: ltc2664: Fix span variable usage in ltc2664_channel_config()
iio: dac: stm32-dac-core: add missing select REGMAP_MMIO in Kconfig
iio: dac: ltc1660: add missing select REGMAP_SPI in Kconfig
iio: dac: ad5770r: add missing select REGMAP_SPI in Kconfig
iio: amplifiers: ada4250: add missing select REGMAP_SPI in Kconfig
iio: frequency: adf4377: add missing select REMAP_SPI in Kconfig
iio: resolver: ad2s1210: add missing select (TRIGGERED_)BUFFER in Kconfig
iio: resolver: ad2s1210 add missing select REGMAP in Kconfig
iio: proximity: mb1232: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig
iio: pressure: bm1390: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig
...
Linus Torvalds [Sun, 20 Oct 2024 19:57:53 +0000 (12:57 -0700)]
Merge tag 'usb-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes and new device ids for 6.12-rc4:
- xhci driver fixes for a number of reported issues
- new usb-serial driver ids
- dwc3 driver fixes for reported problems.
- usb gadget driver fixes for reported problems
- typec driver fixes
- MAINTAINER file updates
All of these have been in linux-next this week with no reported issues"
* tag 'usb-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: option: add Telit FN920C04 MBIM compositions
USB: serial: option: add support for Quectel EG916Q-GL
xhci: dbc: honor usb transfer size boundaries.
usb: xhci: Fix handling errors mid TD followed by other errors
xhci: Mitigate failed set dequeue pointer commands
xhci: Fix incorrect stream context type macro
USB: gadget: dummy-hcd: Fix "task hung" problem
usb: gadget: f_uac2: fix return value for UAC2_ATTRIBUTE_STRING store
usb: dwc3: core: Fix system suspend on TI AM62 platforms
xhci: tegra: fix checked USB2 port number
usb: dwc3: Wait for EndXfer completion before restoring GUSB2PHYCFG
usb: typec: qcom-pmic-typec: fix sink status being overwritten with RP_DEF
usb: typec: altmode should keep reference to parent
MAINTAINERS: usb: raw-gadget: add bug tracker link
MAINTAINERS: Add an entry for the LJCA drivers
Linus Torvalds [Sun, 20 Oct 2024 19:04:32 +0000 (12:04 -0700)]
Merge tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Explicitly disable the TSC deadline timer when going idle to address
some CPU errata in that area
- Do not apply the Zenbleed fix on anything else except AMD Zen2 on the
late microcode loading path
- Clear CPU buffers later in the NMI exit path on 32-bit to avoid
register clearing while they still contain sensitive data, for the
RDFS mitigation
- Do not clobber EFLAGS.ZF with VERW on the opportunistic SYSRET exit
path on 32-bit
- Fix parsing issues of memory bandwidth specification in sysfs for
resctrl's memory bandwidth allocation feature
- Other small cleanups and improvements
* tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Always explicitly disarm TSC-deadline timer
x86/CPU/AMD: Only apply Zenbleed fix for Zen2 during late microcode load
x86/bugs: Use code segment selector for VERW operand
x86/entry_32: Clear CPU buffers after register restore in NMI return
x86/entry_32: Do not clobber user EFLAGS.ZF
x86/resctrl: Annotate get_mem_config() functions as __init
x86/resctrl: Avoid overflow in MB settings in bw_validate()
x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h
Linus Torvalds [Sun, 20 Oct 2024 18:30:56 +0000 (11:30 -0700)]
Merge tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduling fixes from Borislav Petkov:
- Add PREEMPT_RT maintainers
- Fix another aspect of delayed dequeued tasks wrt determining their
state, i.e., whether they're runnable or blocked
- Handle delayed dequeued tasks and their migration wrt PSI properly
- Fix the situation where a delayed dequeue task gets enqueued into a
new class, which should not happen
- Fix a case where memory allocation would happen while the runqueue
lock is held, which is a no-no
- Do not over-schedule when tasks with shorter slices preempt the
currently running task
- Make sure delayed to deque entities are properly handled before
unthrottling
- Other smaller cleanups and improvements
* tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add an entry for PREEMPT_RT.
sched/fair: Fix external p->on_rq users
sched/psi: Fix mistaken CPU pressure indication after corrupted task state bug
sched/core: Dequeue PSI signals for blocked tasks that are delayed
sched: Fix delayed_dequeue vs switched_from_fair()
sched/core: Disable page allocation in task_tick_mm_cid()
sched/deadline: Use hrtick_enabled_dl() before start_hrtick_dl()
sched/eevdf: Fix wakeup-preempt by checking cfs_rq->nr_running
sched: Fix sched_delayed vs cfs_bandwidth
Paolo Bonzini [Sun, 20 Oct 2024 16:10:56 +0000 (12:10 -0400)]
Merge tag 'kvmarm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.12, take #2
- Fix the guest view of the ID registers, making the relevant fields
writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1)
- Correcly expose S1PIE to guests, fixing a regression introduced
in 6.12-rc1 with the S1POE support
- Fix the recycling of stage-2 shadow MMUs by tracking the context
(are we allowed to block or not) as well as the recycling state
- Address a couple of issues with the vgic when userspace misconfigures
the emulation, resulting in various splats. Headaches courtesy
of our Syzkaller friends
Cyan Yang [Thu, 19 Sep 2024 16:01:26 +0000 (00:01 +0800)]
RISCV: KVM: use raw_spinlock for critical section in imsic
For the external interrupt updating procedure in imsic, there was a
spinlock to protect it already. But since it should not be preempted in
any cases, we should turn to use raw_spinlock to prevent any preemption
in case PREEMPT_RT was enabled.
KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups
When looking for a "mangled", i.e. dynamic, CPUID entry, terminate the
walk based on the number of array _entries_, not the size in bytes of
the array. Iterating based on the total size of the array can result in
false passes, e.g. if the random data beyond the array happens to match
a CPUID entry's function and index.
KVM: selftests: x86: Avoid using SSE/AVX instructions
Some distros switched gcc to '-march=x86-64-v3' by default and while it's
hard to find a CPU which doesn't support it today, many KVM selftests fail
with
==== Test Assertion Failure ====
lib/x86_64/processor.c:570: Unhandled exception in guest
pid=72747 tid=72747 errno=4 - Interrupted system call
Unhandled exception '0x6' at guest RIP '0x4104f7'
The failure is easy to reproduce elsewhere with
$ make clean && CFLAGS='-march=x86-64-v3' make -j && ./x86_64/kvm_pv_test
The root cause of the problem seems to be that with '-march=x86-64-v3' GCC
uses AVX* instructions (VMOVQ in the example above) and without prior
XSETBV() in the guest this results in #UD. It is certainly possible to add
it there, e.g. the following saves the day as well:
KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
Ignore nCR3[4:0] when loading PDPTEs from memory for nested SVM, as bits
4:0 of CR3 are ignored when PAE paging is used, and thus VMRUN doesn't
enforce 32-byte alignment of nCR3.
In the absolute worst case scenario, failure to ignore bits 4:0 can result
in an out-of-bounds read, e.g. if the target page is at the end of a
memslot, and the VMM isn't using guard pages.
Per the APM:
The CR3 register points to the base address of the page-directory-pointer
table. The page-directory-pointer table is aligned on a 32-byte boundary,
with the low 5 address bits 4:0 assumed to be 0.
And the SDM's much more explicit:
4:0 Ignored
Note, KVM gets this right when loading PDPTRs, it's only the nSVM flow
that is broken.
Maxim Levitsky [Wed, 9 Oct 2024 17:50:00 +0000 (10:50 -0700)]
KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
Reset the segment cache after segment initialization in vmx_vcpu_reset()
to harden KVM against caching stale/uninitialized data. Without the
recent fix to bypass the cache in kvm_arch_vcpu_put(), the following
scenario is possible:
- vCPU is just created, and the vCPU thread is preempted before
SS.AR_BYTES is written in vmx_vcpu_reset().
- When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() =>
vmx_get_cpl() reads and caches '0' for SS.AR_BYTES.
- vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't
invoke vmx_segment_cache_clear() to invalidate the cache.
As a result, KVM retains a stale value in the cache, which can be read,
e.g. via KVM_GET_SREGS. Usually this is not a problem because the VMX
segment cache is reset on each VM-Exit, but if the userspace VMM (e.g KVM
selftests) reads and writes system registers just after the vCPU was
created, _without_ modifying SS.AR_BYTES, userspace will write back the
stale '0' value and ultimately will trigger a VM-Entry failure due to
incorrect SS segment type.
Invalidating the cache after writing the VMCS doesn't address the general
issue of cache accesses from IRQ context being unsafe, but it does prevent
KVM from clobbering the VMCS, i.e. mitigates the harm done _if_ KVM has a
bug that results in an unsafe cache access.
KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL
Massage the documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL to call out that
it applies to moved memslots as well as deleted memslots, to avoid KVM's
"fast zap" terminology (which has no meaning for userspace), and to reword
the documented targeted zap behavior to specifically say that KVM _may_
zap a subset of all SPTEs. As evidenced by the fix to zap non-leafs SPTEs
with gPTEs, formally documenting KVM's exact internal behavior is risky
and unnecessary.
KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
Add a lockdep assertion in kvm_unmap_gfn_range() to ensure that either
mmu_invalidate_in_progress is elevated, or that the range is being zapped
due to memslot removal (loosely detected by slots_lock being held).
Zapping SPTEs without mmu_invalidate_{in_progress,seq} protection is unsafe
as KVM's page fault path snapshots state before acquiring mmu_lock, and
thus can create SPTEs with stale information if vCPUs aren't forced to
retry faults (due to seeing an in-progress or past MMU invalidation).
Memslot removal is a special case, as the memslot is retrieved outside of
mmu_invalidate_seq, i.e. doesn't use the "standard" protections, and
instead relies on SRCU synchronization to ensure any in-flight page faults
are fully resolved before zapping SPTEs.
KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
When performing a targeted zap on memslot removal, zap only MMU pages that
shadow guest PTEs, as zapping all SPs that "match" the gfn is inexact and
unnecessary. Furthermore, for_each_gfn_valid_sp() arguably shouldn't
exist, because it doesn't do what most people would it expect it to do.
The "round gfn for level" adjustment that is done for direct SPs (no gPTE)
means that the exact gfn comparison will not get a match, even when a SP
does "cover" a gfn, or was even created specifically for a gfn.
For memslot deletion specifically, KVM's behavior will vary significantly
based on the size and alignment of a memslot, and in weird ways. E.g. for
a 4KiB memslot, KVM will zap more SPs if the slot is 1GiB aligned than if
it's only 4KiB aligned. And as described below, zapping SPs in the
aligned case overzaps for direct MMUs, as odds are good the upper-level
SPs are serving other memslots.
To iterate over all potentially-relevant gfns, KVM would need to make a
pass over the hash table for each level, with the gfn used for lookup
rounded for said level. And then check that the SP is of the correct
level, too, e.g. to avoid over-zapping.
But even then, KVM would massively overzap, as processing every level is
all but guaranteed to zap SPs that serve other memslots, especially if the
memslot being removed is relatively small. KVM could mitigate that issue
by processing only levels that can be possible guest huge pages, i.e. are
less likely to be re-used for other memslot, but while somewhat logical,
that's quite arbitrary and would be a bit of a mess to implement.
So, zap only SPs with gPTEs, as the resulting behavior is easy to describe,
is predictable, and is explicitly minimal, i.e. KVM only zaps SPs that
absolutely must be zapped.
x86/kvm: Override default caching mode for SEV-SNP and TDX
AMD SEV-SNP and Intel TDX have limited access to MTRR: either it is not
advertised in CPUID or it cannot be programmed (on TDX, due to #VE on
CR0.CD clear).
This results in guests using uncached mappings where it shouldn't and
pmd/pud_set_huge() failures due to non-uniform memory type reported by
mtrr_type_lookup().
Override MTRR state, making it WB by default as the kernel does for
Hyper-V guests.
Linus Torvalds [Sun, 20 Oct 2024 00:04:52 +0000 (17:04 -0700)]
Merge tag 'io_uring-6.12-20241019' of git://git.kernel.dk/linux
Pull one more io_uring fix from Jens Axboe:
"Fix for a regression introduced in 6.12-rc2, where a condition check
was negated and hence -EAGAIN would bubble back up up to userspace
rather than trigger a retry condition"
* tag 'io_uring-6.12-20241019' of git://git.kernel.dk/linux:
io_uring/rw: fix wrong NOWAIT check in io_rw_init_file()
Linus Torvalds [Sat, 19 Oct 2024 19:52:19 +0000 (12:52 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Fixes all in drivers. The largest is the mpi3mr which corrects a phy
count limit that should only apply to the controller but was being
incorrectly applied to expander phys"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: core: Fix null-ptr-deref in target_alloc_device()
scsi: mpi3mr: Validate SAS port assignments
scsi: ufs: core: Set SDEV_OFFLINE when UFS is shut down
scsi: ufs: core: Requeue aborted request
scsi: ufs: core: Fix the issue of ICU failure
Linus Torvalds [Sat, 19 Oct 2024 19:42:14 +0000 (12:42 -0700)]
Merge tag 'ftrace-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace fixes from Steven Rostedt:
"A couple of fixes to function graph infrastructure:
- Fix allocation of idle shadow stack allocation during hotplug
If function graph tracing is started when a CPU is offline, if it
were come online during the trace then the idle task that
represents the CPU will not get a shadow stack allocated for it.
This means all function graph hooks that happen while that idle
task is running (including in interrupt mode) will have all its
events dropped.
Switch over to the CPU hotplug mechanism that will have any newly
brought on line CPU get a callback that can allocate the shadow
stack for its idle task.
- Fix allocation size of the ret_stack_list array
When function graph tracing converted over to allowing more than
one user at a time, it had to convert its shadow stack from an
array of ret_stack structures to an array of unsigned longs. The
shadow stacks are allocated in batches of 32 at a time and assigned
to every running task. The batch is held by the ret_stack_list
array.
But when the conversion happened, instead of allocating an array of
32 pointers, it was allocated as a ret_stack itself (PAGE_SIZE).
This ret_stack_list gets passed to a function that iterates over
what it believes is its size defined by the
FTRACE_RETSTACK_ALLOC_SIZE macro (which is 32).
Luckily (PAGE_SIZE) is greater than 32 * sizeof(long), otherwise
this would have been an array overflow. This still should be fixed
and the ret_stack_list should be allocated to the size it is
expected to be as someday it may end up being bigger than
SHADOW_STACK_SIZE"
* tag 'ftrace-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
fgraph: Allocate ret_stack_list with proper size
fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks
Linus Torvalds [Sat, 19 Oct 2024 18:48:14 +0000 (11:48 -0700)]
Merge tag 'ipe-pr-20241018' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull ipe fixes from Fan Wu:
"This addresses several issues identified by Luca when attempting to
enable IPE on Debian and systemd:
- address issues with IPE policy update errors and policy update
version check, improving the clarity of error messages for better
understanding by userspace programs.
- enable IPE policies to be signed by secondary and platform
keyrings, facilitating broader use across general Linux
distributions like Debian.
- updates the IPE entry in the MAINTAINERS file to reflect the new
tree URL and my updated email from kernel.org"
* tag 'ipe-pr-20241018' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
MAINTAINERS: update IPE tree url and Fan Wu's email
ipe: fallback to platform keyring also if key in trusted keyring is rejected
ipe: allow secondary and platform keyrings to install/update policies
ipe: also reject policy updates with the same version
ipe: return -ESTALE instead of -EINVAL on update when new policy has a lower version
Linus Torvalds [Sat, 19 Oct 2024 17:18:03 +0000 (10:18 -0700)]
Merge tag 'input-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a fix for Zinitix driver to not fail probing if the property enabling
touch keys functionality is not defined. Support for touch keys was
added in 6.12 merge window so this issue does not affect users of
released kernels
- a couple new vendor/device IDs in xpad driver to enable support for
more hardware
* tag 'input-for-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: zinitix - don't fail if linux,keycodes prop is absent
Input: xpad - add support for MSI Claw A1M
Input: xpad - add support for 8BitDo Ultimate 2C Wireless Controller
Linus Torvalds [Sat, 19 Oct 2024 15:44:10 +0000 (08:44 -0700)]
Merge tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux
Pull 9p fixes from Dominique Martinet:
"Mashed-up update that I sat on too long:
- fix for multiple slabs created with the same name
- enable multipage folios
- theorical fix to also look for opened fids by inode if none was
found by dentry"
[ Enabling multi-page folios should have been done during the merge
window, but it's a one-liner, and the actual meat of the enablement
is in netfs and already in use for other filesystems... - Linus ]
* tag '9p-for-6.12-rc4' of https://github.com/martinetd/linux:
9p: Avoid creating multiple slab caches with the same name
9p: Enable multipage folios
9p: v9fs_fid_find: also lookup by inode if not found dentry
Linus Torvalds [Sat, 19 Oct 2024 15:32:47 +0000 (08:32 -0700)]
Merge tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux
Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Fix several issues with the 'rustc-option' macro. It includes a
refactor from Masahiro of three '{cc,rust}-*' macros, which is not
a fix but avoids repeating the same commands (which would be
several lines in the case of 'rustc-option').
- Fix conditions for 'CONFIG_HAVE_CFI_ICALL_NORMALIZE_INTEGERS'. It
includes the addition of 'CONFIG_RUSTC_LLVM_VERSION', which is not
a fix but is needed for the actual fix.
And a trivial grammar fix"
* tag 'rust-fixes-6.12-2' of https://github.com/Rust-for-Linux/linux:
cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS
kbuild: rust: add `CONFIG_RUSTC_LLVM_VERSION`
kbuild: fix issues with rustc-option
kbuild: refactor cc-option-yn, cc-disable-warning, rust-option-yn macros
lib/Kconfig.debug: fix grammar in RUST_BUILD_ASSERT_ALLOW
Jens Axboe [Sat, 19 Oct 2024 15:16:51 +0000 (09:16 -0600)]
io_uring/rw: fix wrong NOWAIT check in io_rw_init_file()
A previous commit improved how !FMODE_NOWAIT is dealt with, but
inadvertently negated a check whilst doing so. This caused -EAGAIN to be
returned from reading files with O_NONBLOCK set. Fix up the check for
REQ_F_SUPPORT_NOWAIT.
Steven Rostedt [Sat, 19 Oct 2024 01:52:12 +0000 (21:52 -0400)]
fgraph: Allocate ret_stack_list with proper size
The ret_stack_list is an array of ret_stack shadow stacks for the function
graph usage. When the first function graph is enabled, all tasks in the
system get a shadow stack. The ret_stack_list is a 32 element array of
pointers to these shadow stacks. It allocates the shadow stack in batches
(32 stacks at a time), assigns them to running tasks, and continues until
all tasks are covered.
When the function graph shadow stack changed from an array of
ftrace_ret_stack structures to an array of longs, the allocation of
ret_stack_list went from allocating an array of 32 elements to just a
block defined by SHADOW_STACK_SIZE. Luckily, that's defined as PAGE_SIZE
and is much more than enough to hold 32 pointers. But it is way overkill
for the amount needed to allocate.
Change the allocation of ret_stack_list back to a kcalloc() of
FTRACE_RETSTACK_ALLOC_SIZE pointers.
Steven Rostedt [Sat, 19 Oct 2024 01:43:00 +0000 (21:43 -0400)]
fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks
The function graph infrastructure allocates a shadow stack for every task
when enabled. This includes the idle tasks. The first time the function
graph is invoked, the shadow stacks are created and never freed until the
task exits. This includes the idle tasks.
Only the idle tasks that were for online CPUs had their shadow stacks
created when function graph tracing started. If function graph tracing is
enabled and a CPU comes online, the idle task representing that CPU will
not have its shadow stack created, and all function graph tracing for that
idle task will be silently dropped.
Instead, use the CPU hotplug mechanism to allocate the idle shadow stacks.
This will include idle tasks for CPUs that come online during tracing.
Linus Torvalds [Fri, 18 Oct 2024 23:27:14 +0000 (16:27 -0700)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:
- Fix BPF verifier to not affect subreg_def marks in its range
propagation (Eduard Zingerman)
- Fix a truncation bug in the BPF verifier's handling of
coerce_reg_to_size_sx (Dimitar Kanaliev)
- Fix the BPF verifier's delta propagation between linked registers
under 32-bit addition (Daniel Borkmann)
- Fix a NULL pointer dereference in BPF devmap due to missing rxq
information (Florian Kauer)
- Fix a memory leak in bpf_core_apply (Jiri Olsa)
- Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for
arrays of nested structs (Hou Tao)
- Fix build ID fetching where memory areas backing the file were
created with memfd_secret (Andrii Nakryiko)
- Fix BPF task iterator tid filtering which was incorrectly using pid
instead of tid (Jordan Rome)
- Several fixes for BPF sockmap and BPF sockhash redirection in
combination with vsocks (Michal Luczaj)
- Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri)
- Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility
of an infinite BPF tailcall (Pu Lehui)
- Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot
be resolved (Thomas Weißschuh)
- Fix a bug in kfunc BTF caching for modules where the wrong BTF object
was returned (Toke Høiland-Jørgensen)
- Fix a BPF selftest compilation error in cgroup-related tests with
musl libc (Tony Ambardar)
- Several fixes to BPF link info dumps to fill missing fields (Tyrone
Wu)
- Add BPF selftests for kfuncs from multiple modules, checking that the
correct kfuncs are called (Simon Sundberg)
- Ensure that internal and user-facing bpf_redirect flags don't overlap
(Toke Høiland-Jørgensen)
- Switch to use kvzmalloc to allocate BPF verifier environment (Rik van
Riel)
- Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat
under RT (Wander Lairson Costa)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits)
lib/buildid: Handle memfd_secret() files in build_id_parse()
selftests/bpf: Add test case for delta propagation
bpf: Fix print_reg_state's constant scalar dump
bpf: Fix incorrect delta propagation between linked registers
bpf: Properly test iter/task tid filtering
bpf: Fix iter/task tid filtering
riscv, bpf: Make BPF_CMPXCHG fully ordered
bpf, vsock: Drop static vsock_bpf_prot initialization
vsock: Update msg_count on read_skb()
vsock: Update rx_bytes on read_skb()
bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
selftests/bpf: Add asserts for netfilter link info
bpf: Fix link info netfilter flags to populate defrag flag
selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx()
selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx()
bpf: Fix truncation bug in coerce_reg_to_size_sx()
selftests/bpf: Assert link info uprobe_multi count & path_size if unset
bpf: Fix unpopulated path_size when uprobe_multi fields unset
selftests/bpf: Fix cross-compiling urandom_read
selftests/bpf: Add test for kfunc module order
...
Linus Torvalds [Fri, 18 Oct 2024 23:11:17 +0000 (16:11 -0700)]
Merge tag 'linux_kselftest-fixes-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
- fix test makefile to install tests directory without which the test
fails with errors
* tag 'linux_kselftest-fixes-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftest: hid: add the missing tests directory
Nikita Travkin [Fri, 4 Oct 2024 16:17:30 +0000 (21:17 +0500)]
Input: zinitix - don't fail if linux,keycodes prop is absent
When initially adding the touchkey support, a mistake was made in the
property parsing code. The possible negative errno from
device_property_count_u32() was never checked, which was an oversight
left from converting to it from the of_property as part of the review
fixes.
Re-add the correct handling of the absent property, in which case zero
touchkeys should be assumed, which would disable the feature.
- Cleanup of checking for always-set tag_set (SurajSonawane2415)
- Fix for a crash with CPU hotplug notifiers (Ming)
- Don't allow zero-copy ublk on unprivileged device (Ming)
- Use array_index_nospec() for CDROM (Josh)
- Remove dead code in drbd (David)
- Tweaks to elevator loading (Breno)
* tag 'block-6.12-20241018' of git://git.kernel.dk/linux:
cdrom: Avoid barrier_nospec() in cdrom_ioctl_media_changed()
nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function
nvme: make keep-alive synchronous operation
nvme-loop: flush off pending I/O while shutting down loop controller
nvme-pci: fix race condition between reset and nvme_dev_disable()
ublk: don't allow user copy for unprivileged device
blk-rq-qos: fix crash on rq_qos_wait vs. rq_qos_wake_function race
nvme-multipath: defer partition scanning
blk-mq: setup queue ->tag_set before initializing hctx
elevator: Remove argument from elevator_find_get
elevator: do not request_module if elevator exists
drbd: Remove unused conn_lowest_minor
nvme: disable CC.CRIME (NVME_CC_CRIME)
nvme: delete unnecessary fallthru comment
nvmet-rdma: use sbitmap to replace rsp free list
block: Fix elevator_get_default() checking for NULL q->tag_set
nvme: tcp: avoid race between queue_lock lock and destroy
nvmet-passthru: clear EUID/NGUID/UUID while using loop target
block: fix blk_rq_map_integrity_sg kernel-doc
Linus Torvalds [Fri, 18 Oct 2024 22:38:37 +0000 (15:38 -0700)]
Merge tag 'io_uring-6.12-20241018' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix a regression this merge window where cloning of registered
buffers didn't take into account the dummy_ubuf
- Fix a race with reading how many SQRING entries are available,
causing userspace to need to loop around io_uring_sqring_wait()
rather than being able to rely on SQEs being available when it
returned
- Ensure that the SQPOLL thread is TASK_RUNNING before running
task_work off the cancelation exit path
* tag 'io_uring-6.12-20241018' of git://git.kernel.dk/linux:
io_uring/sqpoll: ensure task state is TASK_RUNNING when running task_work
io_uring/rsrc: ignore dummy_ubuf for buffer cloning
io_uring/sqpoll: close race on waiting for sqring entries
ipe: fallback to platform keyring also if key in trusted keyring is rejected
If enabled, we fallback to the platform keyring if the trusted keyring
doesn't have the key used to sign the ipe policy. But if pkcs7_verify()
rejects the key for other reasons, such as usage restrictions, we do not
fallback. Do so, following the same change in dm-verity.
Signed-off-by: Luca Boccassi <[email protected]> Suggested-by: Serge Hallyn <[email protected]>
[FW: fixed some line length issues and a typo in the commit message] Signed-off-by: Fan Wu <[email protected]>
Linus Torvalds [Fri, 18 Oct 2024 18:37:12 +0000 (11:37 -0700)]
Merge tag 'v6.12-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix possible double free setting xattrs
- Fix slab out of bounds with large ioctl payload
- Remove three unused functions, and an unused variable that could be
confusing
* tag 'v6.12-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Remove unused functions
smb/client: Fix logically dead code
smb: client: fix OOBs when building SMB2_IOCTL request
smb: client: fix possible double free in smb2_set_ea()
Linus Torvalds [Fri, 18 Oct 2024 18:28:39 +0000 (11:28 -0700)]
Merge tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
- Fix integer overflow in xrep_bmap
- Fix stale dealloc punching for COW IO
* tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: punch delalloc extents from the COW fork for COW writes
xfs: set IOMAP_F_SHARED for all COW fork allocations
xfs: share more code in xfs_buffered_write_iomap_begin
xfs: support the COW fork in xfs_bmap_punch_delalloc_range
xfs: IOMAP_ZERO and IOMAP_UNSHARE already hold invalidate_lock
xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof
xfs: factor out a xfs_file_write_zero_eof helper
iomap: move locking out of iomap_write_delalloc_release
iomap: remove iomap_file_buffered_write_punch_delalloc
iomap: factor out a iomap_last_written_block helper
xfs: fix integer overflow in xrep_bmap
Linus Torvalds [Fri, 18 Oct 2024 18:16:01 +0000 (11:16 -0700)]
Merge tag 'pm-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two issues in the amd-pstate cpufreq driver and update the
intel_rapl power capping driver with a new processor ID.
Specifics:
- Enable ACPI CPPC in amd_pstate_register_driver() after disabling it
in amd_pstate_unregister_driver() when switching driver operation
modes (Dhananjay Ugwekar)
- Make amd-pstate use nominal performance as the maximum performance
level when boost is disabled (Mario Limonciello)
- Add ArrowLake-H to the list of processors where PL4 is supported in
the MSR part of the intel_rapl power capping driver (Srinivas
Pandruvada)"
* tag 'pm-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
powercap: intel_rapl_msr: Add PL4 support for ArrowLake-H
cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled
cpufreq/amd-pstate: Fix amd_pstate mode switch on shared memory systems
Linus Torvalds [Fri, 18 Oct 2024 18:13:53 +0000 (11:13 -0700)]
Merge tag 'hwmon-for-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Fix auto-detect regression in jc42 driver"
* tag 'hwmon-for-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
[PATCH} hwmon: (jc42) Properly detect TSE2004-compliant devices again
Linus Torvalds [Fri, 18 Oct 2024 18:03:21 +0000 (11:03 -0700)]
Merge tag 'drm-fixes-2024-10-18' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes, msm and xe are the two main ones, with a bunch of
scattered fixes including a largish revert in mgag200, then amdgpu,
vmwgfx and scattering of other minor ones.
All seems pretty regular.
msm:
- Display:
- move CRTC resource assignment to atomic_check otherwise to make
consecutive calls to atomic_check() consistent
- fix rounding / sign-extension issues with pclk calculation in
case of DSC
- cleanups to drop incorrect null checks in dpu snapshots
- fix to use kvzalloc in dpu snapshot to avoid allocation issues
in heavily loaded system cases
- Fix to not program merge_3d block if dual LM is not being used
- Fix to not flush merge_3d block if its not enabled otherwise
this leads to false timeouts
- GPU:
- a7xx: add a fence wait before SMMU table update
xe:
- New workaround to Xe2 (Aradhya)
- Fix unbalanced rpm put (Matthew Auld)
- Remove fragile lock optimization (Matthew Brost)
- Fix job release, delegating it to the drm scheduler (Matthew Brost)
- Fix timestamp bit width for Xe2 (Lucas)
- Fix external BO's dma-resv usag (Matthew Brost)
- Fix returning success for timeout in wait_token (Nirmoy)
- Initialize fence to avoid it being detected as signaled (Matthew
Auld)
- Improve cache flush for BMG (Matthew Auld)
- Don't allow hflip for tile4 framebuffer on Xe2 (Juha-Pekka)
host1x:
- Fix boot on Tegra186
- Set DMA parameters
mgag200:
- Revert VBLANK support
panel:
- himax-hx83192: Adjust power and gamma
qaic:
- Sgtable loop fixes
vmwgfx:
- Limit display layout allocatino size
- Handle allocation errors in connector checks
- Clean up KMS code for 2d-only setup
- Report surface-check errors correctly
- Remove NULL test around kvfree()"
* tag 'drm-fixes-2024-10-18' of https://gitlab.freedesktop.org/drm/kernel: (45 commits)
drm/ast: vga: Clear EDID if no display is connected
drm/ast: sil164: Clear EDID if no display is connected
Revert "drm/mgag200: Add vblank support"
drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs
drm/i915/display: Don't allow tile4 framebuffer to do hflip on display20 or greater
drm/xe/bmg: improve cache flushing behaviour
drm/xe/xe_sync: initialise ufence.signalled
drm/xe/ufence: ufence can be signaled right after wait_woken
drm/xe: Use bookkeep slots for external BO's in exec IOCTL
drm/xe/query: Increase timestamp width
drm/xe: Don't free job in TDR
drm/xe: Take job list lock in xe_sched_add_pending_job
drm/xe: fix unbalanced rpm put() with declare_wedged()
drm/xe: fix unbalanced rpm put() with fence_fini()
drm/xe/xe2lpg: Extend Wa_15016589081 for xe2lpg
drm/i915/dp_mst: Don't require DSC hblank quirk for a non-DSC compatible mode
drm/i915/dp_mst: Handle error during DSC BW overhead/slice calculation
drm/msm/a6xx+: Insert a fence wait before SMMU table update
drm/msm/dpu: don't always program merge_3d block
drm/msm/dpu: Don't always set merge_3d pending flush
...
Linus Torvalds [Fri, 18 Oct 2024 16:50:05 +0000 (09:50 -0700)]
mm: fix follow_pfnmap API lockdep assert
The lockdep asserts for the new follow_pfnmap() API "knows" that a
pfnmap always has a vma->vm_file, since that's the only way to create
such a mapping.
And that's actually true for all the normal cases. But not for the mmap
failure case, where the incomplete mapping is torn down and we have
cleared vma->vm_file because the failure occured before the file was
linked to the vma.
So this codepath does actually need to check for vm_file being NULL.
- Enable ACPI CPPC in amd_pstate_register_driver() after disabling
it in amd_pstate_unregister_driver() during driver operation mode
switch (Dhananjay Ugwekar).
- Make amd-pstate use nominal performance as the maximum performance
level when boost is disabled (Mario Limonciello).
* pm-cpufreq:
cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled
cpufreq/amd-pstate: Fix amd_pstate mode switch on shared memory systems
Linus Torvalds [Fri, 18 Oct 2024 14:01:59 +0000 (07:01 -0700)]
Merge tag 's390-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Fix PCI error recovery by handling error events correctly
- Fix CCA crypto card behavior within protected execution environment
- Two KVM commits which fix virtual vs physical address handling bugs
in KVM pfault handling
- Fix return code handling in pckmo_key2protkey()
- Deactivate sclp console as late as possible so that outstanding
messages appear on the console instead of being dropped on reboot
- Convert newlines to CRLF instead of LFCR for the sclp vt220 driver,
as required by the vt220 specification
- Initialize also psw mask in perf_arch_fetch_caller_regs() to make
sure that user_mode(regs) will return false
- Update defconfigs
* tag 's390-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: Update defconfigs
s390: Initialize psw mask in perf_arch_fetch_caller_regs()
s390/sclp_vt220: Convert newlines to CRLF instead of LFCR
s390/sclp: Deactivate sclp after all its users
s390/pkey_pckmo: Return with success for valid protected key types
KVM: s390: Change virtual to physical address access in diag 0x258 handler
KVM: s390: gaccess: Check if guest address is in memslot
s390/ap: Fix CCA crypto card behavior within protected execution environment
s390/pci: Handle PCI error codes other than 0x3a
rts5228, rts5261, rts5264 are supported by the rtsx_pci driver, but
they are not mentioned in the Kconfig help when the code was added.
List those models in the Kconfig help accordingly.
HID: multitouch: Add quirk for Logitech Bolt receiver w/ Casa touchpad
The Logitech Casa Touchpad does not reliably send touch release signals
when communicating through the Logitech Bolt wireless-to-USB receiver.
Adjusting the device class to add MT_QUIRK_NOT_SEEN_MEANS_UP to make
sure that no touches become stuck, MT_QUIRK_FORCE_MULTI_INPUT is not
needed, but harmless.
Linux does not have information on which devices are connected to the
Bolt receiver, so we have to enable this for the entire device.
HID: i2c-hid: Delayed i2c resume wakeup for 0x0d42 Goodix touchpad
Patch for Goodix 27c6:0d42 touchpads found in Inspiron 5515 laptops.
After resume from suspend, one can communicate with this device just fine.
We can read data from it or request a reset,
but for some reason the interrupt line will not go up
when new events are available.
(it can correctly respond to a reset with an interrupt tho)
The only way I found to wake this device up
is to send anything to it after ~1.5s mark,
for example a simple read request, or power mode change.
In this patch, I simply delay the resume steps with msleep,
this will cause the set_power request to happen after
the ~1.5s barrier causing the device to resume its event interrupts.
Sleep was used rather than delayed_work
to make this workaround as non-invasive as possible.
Jiqian Chen [Sat, 12 Oct 2024 08:45:37 +0000 (16:45 +0800)]
xen: Remove dependency between pciback and privcmd
Commit 2fae6bb7be32 ("xen/privcmd: Add new syscall to get gsi from dev")
adds a weak reverse dependency to the config XEN_PRIVCMD definition, that
dependency causes xen-privcmd can't be loaded on domU, because dependent
xen-pciback isn't always be loaded successfully on domU.
To solve above problem, remove that dependency, and do not call
pcistub_get_gsi_from_sbdf() directly, instead add a hook in
drivers/xen/apci.c, xen-pciback register the real call function, then in
privcmd_ioctl_pcidev_get_gsi call that hook.
Kent Overstreet [Fri, 18 Oct 2024 04:22:09 +0000 (00:22 -0400)]
bcachefs: fsck: Improve hash_check_key()
hash_check_key() checks and repairs the hash table btrees: dirents and
xattrs are open addressing hash tables.
We recently had a corruption reported where the hash type on an inode
somehow got flipped, which made the existing dirents invisible and
allowed new ones to be created with the same name.
Now, hash_check_key() can repair duplicates: it will delete one of them,
if it has an xattr or dangling dirent, but if it has two valid dirents
one of them gets renamed.
Kent Overstreet [Fri, 18 Oct 2024 03:00:14 +0000 (23:00 -0400)]
bcachefs: Repair mismatches in inode hash seed, type
Different versions of the same inode (same inode number, different
snapshot ID) must have the same hash seed and type - lookups require
this, since they see keys from different snapshots simultaneously.
To repair we only need to make the inodes consistent, hash_check_key()
will do the rest.