Merge tag 'xtensa-20230928' of https://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa fixes from Max Filippov:
- fix build warnings from builds performed with W=1
* tag 'xtensa-20230928' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: boot/lib: fix function prototypes
xtensa: umulsidi3: fix conditional expression
xtensa: boot: don't add include-dirs
xtensa: iss/network: make functions static
xtensa: tlb: include <asm/tlb.h> for missing prototype
xtensa: hw_breakpoint: include header for missing prototype
xtensa: smp: add headers for missing function prototypes
irqchip: irq-xtensa-mx: include header for missing prototype
xtensa: traps: add <linux/cpu.h> for function prototype
xtensa: stacktrace: include <asm/ftrace.h> for prototype
xtensa: signal: include headers for function prototypes
xtensa: processor.h: add init_arch() prototype
xtensa: ptrace: add prototypes to <asm/ptrace.h>
xtensa: irq: include <asm/traps.h>
xtensa: fault: include <asm/traps.h>
xtensa: add default definition for XCHAL_HAVE_DIV32
Merge tag 'spi-fix-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A small set of device specific fixes, the most major one is for the
GXP driver which would probably have been confusing some callers with
returning the length rather than 0 on successful writes"
* tag 'spi-fix-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-gxp: BUG: Correct spi write return value
dt-bindings: spi: fsl-imx-cspi: Document missing entries
spi: cs42l43: Remove spurious pm_runtime_disable
Merge tag 'loongarch-fixes-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Fix high_memory calculation and module loader errors with latest
binutils"
* tag 'loongarch-fixes-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Add support for 64_PCREL relocation type
LoongArch: Add support for 32_PCREL relocation type
LoongArch: Define relocation types for ABI v2.10
LoongArch: numa: Fix high_memory calculation
Merge tag 'mips-fixes_6.6_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fix from Thomas Bogendoerfer:
- fix Alchemy build with MMC support disabled
* tag 'mips-fixes_6.6_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Alchemy: only build mmc support helpers if au1xmmc is enabled
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A single fix for libata: older devices don't support command duration
limits (CDL) and some don't support report opcodes, meaning there's no
way to tell if they support the command or not.
Reduce the problems of incorrectly using CDL commands on older devices
by checking SCSI spec compliance at SPC-5 (the spec which introduced
the command) before turning on CDL"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: core: ata: Do no try to probe for CDL on old drives
Merge tag 'vfio-v6.6-rc4' of https://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
- The new PDS vfio-pci variant driver only supports SR-IOV VF devices
and incorrectly made a direct reference to the physfn field of the
pci_dev. Fix this both by making the Kconfig depend on IOV support
as well as using the correct wrapper for this access (Shixiong Ou)
- Resolve an error path issue where on unwind of the mdev registration
the created kset is not unregistered and the wrong error code is
returned (Jinjie Ruan)
* tag 'vfio-v6.6-rc4' of https://github.com/awilliam/linux-vfio:
vfio/mdev: Fix a null-ptr-deref bug for mdev_unregister_parent()
vfio/pds: Use proper PF device access helper
vfio/pds: Add missing PCI_IOV depends
Tiezhu Yang [Wed, 27 Sep 2023 08:19:13 +0000 (16:19 +0800)]
LoongArch: Add support for 32_PCREL relocation type
When build and update kernel with the latest upstream binutils and
loongson3_defconfig, module loader fails with:
kmod: zsmalloc: Unsupport relocation type 99, please add its support.
kmod: fuse: Unsupport relocation type 99, please add its support.
kmod: ipmi_msghandler: Unsupport relocation type 99, please add its support.
kmod: ipmi_msghandler: Unsupport relocation type 99, please add its support.
kmod: pstore: Unsupport relocation type 99, please add its support.
kmod: drm_display_helper: Unsupport relocation type 99, please add its support.
kmod: drm_display_helper: Unsupport relocation type 99, please add its support.
kmod: drm_display_helper: Unsupport relocation type 99, please add its support.
kmod: fuse: Unsupport relocation type 99, please add its support.
kmod: fat: Unsupport relocation type 99, please add its support.
This is because the latest upstream binutils replaces a pair of ADD32
and SUB32 with 32_PCREL, so add support for 32_PCREL relocation type.
For 64bit kernel without HIGHMEM, high_memory is the virtual address of
the highest physical address in the system. But __va(get_num_physpages()
<< PAGE_SHIFT) is not what we want for high_memory because there may be
holes in the physical address space. On the other hand, max_low_pfn is
calculated from memblock_end_of_DRAM(), which is exactly corresponding
to the highest physical address, so use it for high_memory calculation.
Merge tag 'wq-for-6.6-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
- Remove double allocation of wq_update_pod_attrs_buf
- Fix missing allocation of pwq_release_worker when
wq_cpu_intensive_thresh_us is set to a custom value
* tag 'wq-for-6.6-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Fix missed pwq_release_worker creation in wq_cpu_intensive_thresh_init()
workqueue: Removed double allocation of wq_update_pod_attrs_buf
Merge tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- delayed refs fixes:
- fix race when refilling delayed refs block reserve
- prevent transaction block reserve underflow when starting
transaction
- error message and value adjustments
- fix build warnings with CONFIG_CC_OPTIMIZE_FOR_SIZE and
-Wmaybe-uninitialized
- fix for smatch report where uninitialized data from invalid extent
buffer range could be returned to the caller
- fix numeric overflow in statfs when calculating lower threshold
for a full filesystem
* tag 'for-6.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: initialize start_slot in btrfs_log_prealloc_extents
btrfs: make sure to initialize start and len in find_free_dev_extent
btrfs: reset destination buffer when read_extent_buffer() gets invalid range
btrfs: properly report 0 avail for very full file systems
btrfs: log message if extent item not found when running delayed extent op
btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref()
btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1
btrfs: prevent transaction block reserve underflow when starting transaction
btrfs: fix race when refilling delayed refs block reserve
Merge tag 'linux-kselftest-fixes-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
"One single fix to unmount tracefs when test created mount"
* tag 'linux-kselftest-fixes-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/user_events: Fix to unmount tracefs when test created mount
Merge tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains the usual miscellaneous fixes and cleanups for vfs and
individual fses:
Fixes:
- Revert ki_pos on error from buffered writes for direct io fallback
- Add missing documentation for block device and superblock handling
for changes merged this cycle
- Fix reiserfs flexible array usage
- Ensure that overlayfs sets ctime when setting mtime and atime
- Disable deferred caller completions with overlayfs writes until
proper support exists
Cleanups:
- Remove duplicate initialization in pipe code
- Annotate aio kioctx_table with __counted_by"
* tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
overlayfs: set ctime when setting mtime and atime
ntfs3: put resources during ntfs_fill_super()
ovl: disable IOCB_DIO_CALLER_COMP
porting: document superblock as block device holder
porting: document new block device opening order
fs/pipe: remove duplicate "offset" initializer
fs-writeback: do not requeue a clean inode having skipped pages
aio: Annotate struct kioctx_table with __counted_by
direct_write_fallback(): on error revert the ->ki_pos update from buffered write
reiserfs: Replace 1-element array with C99 style flex-array
Merge tag 'perf-tools-fixes-for-v6.6-1-2023-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
"Build:
- Update header files in the tools/**/include directory to sync with
the kernel sources as usual.
- Remove unused bpf-prologue files. While it's not strictly a fix,
but the functionality was removed in this cycle so better to get
rid of the code together.
- Other minor build fixes.
Misc:
- Fix uninitialized memory access in PMU parsing code
- Fix segfaults on software event"
* tag 'perf-tools-fixes-for-v6.6-1-2023-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf jevent: fix core dump on software events on s390
perf pmu: Ensure all alias variables are initialized
perf jevents metric: Fix type of strcmp_cpuid_str
perf trace: Avoid compile error wrt redefining bool
perf bpf-prologue: Remove unused file
tools headers UAPI: Update tools's copy of drm.h headers
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf bench sched-seccomp-notify: Use the tools copy of seccomp.h UAPI
tools headers UAPI: Copy seccomp.h to be able to build 'perf bench' in older systems
tools headers UAPI: Sync files changed by new fchmodat2 and map_shadow_stack syscalls with the kernel sources
perf tools: Update copy of libbpf's hashmap.c
Jeff Layton [Wed, 13 Sep 2023 13:33:12 +0000 (09:33 -0400)]
overlayfs: set ctime when setting mtime and atime
Nathan reported that he was seeing the new warning in
setattr_copy_mgtime pop when starting podman containers. Overlayfs is
trying to set the atime and mtime via notify_change without also
setting the ctime.
POSIX states that when the atime and mtime are updated via utimes() that
we must also update the ctime to the current time. The situation with
overlayfs copy-up is analogies, so add ATTR_CTIME to the bitmask.
notify_change will fill in the value.
During ntfs_fill_super() some resources are allocated that we need to
cleanup in ->put_super() such as additional inodes. When
ntfs_fill_super() fails these resources need to be cleaned up as well.
Reported-by: [email protected] Fixes: 78a06688a4d4 ("ntfs3: drop inode references in ntfs_put_super()") Signed-off-by: Christian Brauner <[email protected]>
MIPS: Alchemy: only build mmc support helpers if au1xmmc is enabled
While commit d4a5c59a955b ("mmc: au1xmmc: force non-modular build and
remove symbol_get usage") to be built in, it can still build a kernel
without MMC support and thuse no mmc_detect_change symbol at all.
Add ifdefs to build the mmc support code in the alchemy arch code
conditional on mmc support.
Fixes: d4a5c59a955b ("mmc: au1xmmc: force non-modular build and remove symbol_get usage") Reported-by: kernel test robot <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Randy Dunlap <[email protected]> Tested-by: Randy Dunlap <[email protected]> # build-tested Signed-off-by: Thomas Bogendoerfer <[email protected]>
overlayfs copies the kiocb flags when it sets up a new kiocb to handle
a write, but it doesn't properly support dealing with the deferred
caller completions of the kiocb. This means it doesn't get the final
write completion value, and hence will complete the write with '0' as
the result.
We could support the caller completions in overlayfs, but for now let's
just disable them in the generated write kiocb.
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix EL2 Stage-1 MMIO mappings where a random address was used
- Fix SMCCC function number comparison when the SVE hint is set
RISC-V:
- Fix KVM_GET_REG_LIST API for ISA_EXT registers
- Fix reading ISA_EXT register of a missing extension
- Fix ISA_EXT register handling in get-reg-list test
- Fix filtering of AIA registers in get-reg-list test
x86:
- Fixes for TSC_AUX virtualization
- Stop zapping page tables asynchronously, since we don't zap them as
often as before"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX
KVM: SVM: Fix TSC_AUX virtualization setup
KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway
KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously
KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe()
KVM: x86/mmu: Open code leaf invalidation from mmu_notifier
KVM: riscv: selftests: Selectively filter-out AIA registers
KVM: riscv: selftests: Fix ISA_EXT register handling in get-reg-list
RISC-V: KVM: Fix riscv_vcpu_get_isa_ext_single() for missing extensions
RISC-V: KVM: Fix KVM_GET_REG_LIST API for ISA_EXT registers
KVM: selftests: Assert that vasprintf() is successful
KVM: arm64: nvhe: Ignore SVE hint in SMCCC function ID
KVM: arm64: Properly return allocated EL2 VA from hyp_alloc_private_va_range()
Merge tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix the "bytes" output of the per_cpu stat file
The tracefs/per_cpu/cpu*/stats "bytes" was giving bogus values as the
accounting was not accurate. It is suppose to show how many used
bytes are still in the ring buffer, but even when the ring buffer was
empty it would still show there were bytes used.
- Fix a bug in eventfs where reading a dynamic event directory (open)
and then creating a dynamic event that goes into that diretory screws
up the accounting.
On close, the newly created event dentry will get a "dput" without
ever having a "dget" done for it. The fix is to allocate an array on
dir open to save what dentries were actually "dget" on, and what ones
to "dput" on close.
* tag 'trace-v6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Remember what dentries were created on dir open
ring-buffer: Fix bytes info in per_cpu buffer stats
Merge tag 'cxl-fixes-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dan Williams:
"A collection of regression fixes, bug fixes, and some small cleanups
to the Compute Express Link code.
The regressions arrived in the v6.5 dev cycle and missed the v6.6
merge window due to my personal absences this cycle. The most
important fixes are for scenarios where the CXL subsystem fails to
parse valid region configurations established by platform firmware.
This is important because agreement between OS and BIOS on the CXL
configuration is fundamental to implementing "OS native" error
handling, i.e. address translation and component failure
identification.
Other important fixes are a driver load error when the BIOS lets the
Linux PCI core handle AER events, but not CXL memory errors.
The other fixex might have end user impact, but for now are only known
to trigger in our test/emulation environment.
Summary:
- Fix multiple scenarios where platform firmware defined regions fail
to be assembled by the CXL core.
- Fix a spurious driver-load failure on platforms that enable OS
native AER, but not OS native CXL error handling.
- Fix a regression detecting "poison" commands when "security"
commands are also defined.
- Fix a cxl_test regression with the move to centralize CXL port
register enumeration in the CXL core.
- Miscellaneous small fixes and cleanups"
* tag 'cxl-fixes-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/acpi: Annotate struct cxl_cxims_data with __counted_by
cxl/port: Fix cxl_test register enumeration regression
cxl/region: Refactor granularity select in cxl_port_setup_targets()
cxl/region: Match auto-discovered region decoders by HPA range
cxl/mbox: Fix CEL logic for poison and security commands
cxl/pci: Replace host_bridge->native_aer with pcie_aer_is_native()
PCI/AER: Export pcie_aer_is_native()
cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registers
Merge tag 'gpio-fixes-for-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix an invalid usage of __free(kfree) leading to kfreeing an
ERR_PTR()
- fix an irq domain leak in gpio-tb10x
- MAINTAINERS update
* tag 'gpio-fixes-for-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sim: fix an invalid __free() usage
gpio: tb10x: Fix an error handling path in tb10x_gpio_probe()
MAINTAINERS: gpio-regmap: make myself a maintainer of it
Merge tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"13 hotfixes, 10 of which pertain to post-6.5 issues. The other three
are cc:stable"
* tag 'mm-hotfixes-stable-2023-09-23-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
proc: nommu: fix empty /proc/<pid>/maps
filemap: add filemap_map_order0_folio() to handle order0 folio
proc: nommu: /proc/<pid>/maps: release mmap read lock
mm: memcontrol: fix GFP_NOFS recursion in memory.high enforcement
pidfd: prevent a kernel-doc warning
argv_split: fix kernel-doc warnings
scatterlist: add missing function params to kernel-doc
selftests/proc: fixup proc-empty-vm test after KSM changes
revert "scripts/gdb/symbols: add specific ko module load command"
selftests: link libasan statically for tests with -fsanitize=address
task_work: add kerneldoc annotation for 'data' argument
mm: page_alloc: fix CMA and HIGHATOMIC landing on the wrong buddy list
sh: mm: re-add lost __ref to ioremap_prot() to fix modpost warning
Merge tag 'i2c-for-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"A set of I2C driver fixes. Mostly fixing resource leaks or sanity
checks"
* tag 'i2c-for-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: xiic: Correct return value check for xiic_reinit()
i2c: mux: gpio: Add missing fwnode_handle_put()
i2c: mux: demux-pinctrl: check the return value of devm_kstrdup()
i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low
i2c: i801: unregister tco_pdev in i801_probe() error path
Charles Keepax [Tue, 19 Sep 2023 11:03:20 +0000 (13:03 +0200)]
mfd: cs42l43: Use correct macro for new-style PM runtime ops
The code was accidentally mixing new and old style macros, update the
macros used to remove an unused function warning whilst building with
no PM enabled in the config.
Merge tag 'loongarch-fixes-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Fix lockdep, fix a boot failure, fix some build warnings, fix document
links, and some cleanups"
* tag 'loongarch-fixes-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
docs/zh_CN/LoongArch: Update the links of ABI
docs/LoongArch: Update the links of ABI
LoongArch: Don't inline kasan_mem_to_shadow()/kasan_shadow_to_mem()
kasan: Cleanup the __HAVE_ARCH_SHADOW_MAP usage
LoongArch: Set all reserved memblocks on Node#0 at initialization
LoongArch: Remove dead code in relocate_new_kernel
LoongArch: Use _UL() and _ULL()
LoongArch: Fix some build warnings with W=1
LoongArch: Fix lockdep static memory detection
Merge tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:
- Return EIO on bad inputs to iomap_to_bh instead of BUGging, to deal
less poorly with block device io racing with block device resizing
- Fix a stale page data exposure bug introduced in 6.6-rc1 when
unsharing a file range that is not in the page cache
* tag 'iomap-6.6-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: convert iomap_unshare_iter to use large folios
iomap: don't skip reading in !uptodate folios when unsharing a range
iomap: handle error conditions more gracefully in iomap_to_bh
Paolo Bonzini [Sat, 23 Sep 2023 09:35:55 +0000 (05:35 -0400)]
Merge tag 'kvm-riscv-fixes-6.6-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 6.6, take #1
- Fix KVM_GET_REG_LIST API for ISA_EXT registers
- Fix reading ISA_EXT register of a missing extension
- Fix ISA_EXT register handling in get-reg-list test
- Fix filtering of AIA registers in get-reg-list test
Tom Lendacky [Fri, 15 Sep 2023 20:54:32 +0000 (15:54 -0500)]
KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX
When the TSC_AUX MSR is virtualized, the TSC_AUX value is swap type "B"
within the VMSA. This means that the guest value is loaded on VMRUN and
the host value is restored from the host save area on #VMEXIT.
Since the value is restored on #VMEXIT, the KVM user return MSR support
for TSC_AUX can be replaced by populating the host save area with the
current host value of TSC_AUX. And, since TSC_AUX is not changed by Linux
post-boot, the host save area can be set once in svm_hardware_enable().
This eliminates the two WRMSR instructions associated with the user return
MSR support.
Tom Lendacky [Fri, 15 Sep 2023 20:54:30 +0000 (15:54 -0500)]
KVM: SVM: Fix TSC_AUX virtualization setup
The checks for virtualizing TSC_AUX occur during the vCPU reset processing
path. However, at the time of initial vCPU reset processing, when the vCPU
is first created, not all of the guest CPUID information has been set. In
this case the RDTSCP and RDPID feature support for the guest is not in
place and so TSC_AUX virtualization is not established.
This continues for each vCPU created for the guest. On the first boot of
an AP, vCPU reset processing is executed as a result of an APIC INIT
event, this time with all of the guest CPUID information set, resulting
in TSC_AUX virtualization being enabled, but only for the APs. The BSP
always sees a TSC_AUX value of 0 which probably went unnoticed because,
at least for Linux, the BSP TSC_AUX value is 0.
Move the TSC_AUX virtualization enablement out of the init_vmcb() path and
into the vcpu_after_set_cpuid() path to allow for proper initialization of
the support after the guest CPUID information has been set.
With the TSC_AUX virtualization support now in the vcpu_set_after_cpuid()
path, the intercepts must be either cleared or set based on the guest
CPUID input.
Paolo Bonzini [Fri, 22 Sep 2023 21:06:34 +0000 (17:06 -0400)]
KVM: SVM: INTERCEPT_RDTSCP is never intercepted anyway
svm_recalc_instruction_intercepts() is always called at least once
before the vCPU is started, so the setting or clearing of the RDTSCP
intercept can be dropped from the TSC_AUX virtualization support.
Extracted from a patch by Tom Lendacky.
Cc: [email protected] Fixes: 296d5a17e793 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts") Signed-off-by: Paolo Bonzini <[email protected]>
Stop zapping invalidate TDP MMU roots via work queue now that KVM
preserves TDP MMU roots until they are explicitly invalidated. Zapping
roots asynchronously was effectively a workaround to avoid stalling a vCPU
for an extended during if a vCPU unloaded a root, which at the time
happened whenever the guest toggled CR0.WP (a frequent operation for some
guest kernels).
While a clever hack, zapping roots via an unbound worker had subtle,
unintended consequences on host scheduling, especially when zapping
multiple roots, e.g. as part of a memslot. Because the work of zapping a
root is no longer bound to the task that initiated the zap, things like
the CPU affinity and priority of the original task get lost. Losing the
affinity and priority can be especially problematic if unbound workqueues
aren't affined to a small number of CPUs, as zapping multiple roots can
cause KVM to heavily utilize the majority of CPUs in the system, *beyond*
the CPUs KVM is already using to run vCPUs.
When deleting a memslot via KVM_SET_USER_MEMORY_REGION, the async root
zap can result in KVM occupying all logical CPUs for ~8ms, and result in
high priority tasks not being scheduled in in a timely manner. In v5.15,
which doesn't preserve unloaded roots, the issues were even more noticeable
as KVM would zap roots more frequently and could occupy all CPUs for 50ms+.
Consuming all CPUs for an extended duration can lead to significant jitter
throughout the system, e.g. on ChromeOS with virtio-gpu, deleting memslots
is a semi-frequent operation as memslots are deleted and recreated with
different host virtual addresses to react to host GPU drivers allocating
and freeing GPU blobs. On ChromeOS, the jitter manifests as audio blips
during games due to the audio server's tasks not getting scheduled in
promptly, despite the tasks having a high realtime priority.
Deleting memslots isn't exactly a fast path and should be avoided when
possible, and ChromeOS is working towards utilizing MAP_FIXED to avoid the
memslot shenanigans, but KVM is squarely in the wrong. Not to mention
that removing the async zapping eliminates a non-trivial amount of
complexity.
Note, one of the subtle behaviors hidden behind the async zapping is that
KVM would zap invalidated roots only once (ignoring partial zaps from
things like mmu_notifier events). Preserve this behavior by adding a flag
to identify roots that are scheduled to be zapped versus roots that have
already been zapped but not yet freed.
Add a comment calling out why kvm_tdp_mmu_invalidate_all_roots() can
encounter invalid roots, as it's not at all obvious why zapping
invalidated roots shouldn't simply zap all invalid roots.
Paolo Bonzini [Thu, 21 Sep 2023 09:44:56 +0000 (05:44 -0400)]
KVM: x86/mmu: Do not filter address spaces in for_each_tdp_mmu_root_yield_safe()
All callers except the MMU notifier want to process all address spaces.
Remove the address space ID argument of for_each_tdp_mmu_root_yield_safe()
and switch the MMU notifier to use __for_each_tdp_mmu_root_yield_safe().
* tag 'hardening-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
uapi: stddef.h: Fix __DECLARE_FLEX_ARRAY for C++
uapi: stddef.h: Fix header guard location
Merge tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Fix an integer overflow bug when processing an fsmap call
- Fix crash due to CPU hot remove event racing with filesystem mount
operation
- During read-only mount, XFS does not allow the contents of the log to
be recovered when there are one or more unrecognized rcompat features
in the primary superblock, since the log might have intent items
which the kernel does not know how to process
- During recovery of log intent items, XFS now reserves log space
sufficient for one cycle of a permanent transaction to execute.
Otherwise, this could lead to livelocks due to non-availability of
log space
- On an fs which has an ondisk unlinked inode list, trying to delete a
file or allocating an O_TMPFILE file can cause the fs to the shutdown
if the first inode in the ondisk inode list is not present in the
inode cache. The bug is solved by explicitly loading the first inode
in the ondisk unlinked inode list into the inode cache if it is not
already cached
A similar problem arises when the uncached inode is present in the
middle of the ondisk unlinked inode list. This second bug is
triggered when executing operations like quotacheck and bulkstat. In
this case, XFS now reads in the entire ondisk unlinked inode list
- Enable LARP mode only on recent v5 filesystems
- Fix a out of bounds memory access in scrub
- Fix a performance bug when locating the tail of the log during
mounting a filesystem
* tag 'xfs-6.6-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: use roundup_pow_of_two instead of ffs during xlog_find_tail
xfs: only call xchk_stats_merge after validating scrub inputs
xfs: require a relatively recent V5 filesystem for LARP mode
xfs: make inode unlinked bucket recovery work with quotacheck
xfs: load uncached unlinked inodes into memory on demand
xfs: reserve less log space when recovering log intent items
xfs: fix log recovery when unknown rocompat bits are set
xfs: reload entire unlinked bucket lists
xfs: allow inode inactivation during a ro mount log recovery
xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
xfs: remove CPU hotplug infrastructure
xfs: remove the all-mounts list
xfs: use per-mount cpumask to track nonempty percpu inodegc lists
xfs: fix an agbno overflow in __xfs_getfsmap_datadev
xfs: fix per-cpu CIL structure aggregation racing with dying cpus
xfs: fix select in config XFS_ONLINE_SCRUB_STATS
cxl/acpi: Annotate struct cxl_cxims_data with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct cxl_cxims_data.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.
The cxl_test unit test environment models a CXL topology for
sysfs/user-ABI regression testing. It uses interface mocking via the
"--wrap=" linker option to redirect cxl_core routines that parse
hardware registers with versions that just publish objects, like
devm_cxl_enumerate_decoders().
Starting with:
Commit 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port")
...port register enumeration is moved into devm_cxl_add_port(). This
conflicts with the "cxl_test avoids emulating registers stance" so
either the port code needs to be refactored (too violent), or modified
so that register enumeration is skipped on "fake" cxl_test ports
(annoying, but straightforward).
This conflict has happened previously and the "check for platform
device" workaround to avoid instrusive refactoring was deployed in those
scenarios. In general, refactoring should only benefit production code,
test code needs to remain minimally instrusive to the greatest extent
possible.
This was missed previously because it may sometimes just cause warning
messages to be emitted, but it can also cause test failures. The
backport to -stable is only nice to have for clean cxl_test runs.
eventfs: Remember what dentries were created on dir open
Using the following code with libtracefs:
int dfd;
// create the directory events/kprobes/kp1
tracefs_kprobe_raw(NULL, "kp1", "schedule_timeout", "time=$arg1");
// Open the kprobes directory
dfd = tracefs_instance_file_open(NULL, "events/kprobes", O_RDONLY);
// Do a lookup of the kprobes/kp1 directory (by looking at enable)
tracefs_file_exists(NULL, "events/kprobes/kp1/enable");
// Now create a new entry in the kprobes directory
tracefs_kprobe_raw(NULL, "kp2", "schedule_hrtimeout", "expires=$arg1");
// Do another lookup to create the dentries
tracefs_file_exists(NULL, "events/kprobes/kp2/enable"))
// Close the directory
close(dfd);
What happened above, the first open (dfd) will call
dcache_dir_open_wrapper() that will create the dentries and up their ref
counts.
Now the creation of "kp2" will add another dentry within the kprobes
directory.
Upon the close of dfd, eventfs_release() will now do a dput for all the
entries in kprobes. But this is where the problem lies. The open only
upped the dentry of kp1 and not kp2. Now the close is decrementing both
kp1 and kp2, which causes kp2 to get a negative count.
Doing a "trace-cmd reset" which deletes all the kprobes cause the kernel
to crash! (due to the messed up accounting of the ref counts).
To solve this, save all the dentries that are opened in the
dcache_dir_open_wrapper() into an array, and use this array to know what
dentries to do a dput on in eventfs_release().
Since the dcache_dir_open_wrapper() calls dcache_dir_open() which uses the
file->private_data, we need to also add a wrapper around dcache_readdir()
that uses the cursor assigned to the file->private_data. This is because
the dentries need to also be saved in the file->private_data. To do this
create the structure:
Which will hold both the cursor and the dentries. Some shuffling around is
needed to make sure that dcache_dir_open() and dcache_readdir() only see
the cursor.
ring-buffer: Fix bytes info in per_cpu buffer stats
The 'bytes' info in file 'per_cpu/cpu<X>/stats' means the number of
bytes in cpu buffer that have not been consumed. However, currently
after consuming data by reading file 'trace_pipe', the 'bytes' info
was not changed as expected.
The root cause is incorrect stat on cpu_buffer->read_bytes. To fix it:
1. When stat 'read_bytes', account consumed event in rb_advance_reader();
2. When stat 'entries_bytes', exclude the discarded padding event which
is smaller than minimum size because it is invisible to reader. Then
use rb_page_commit() instead of BUF_PAGE_SIZE at where accounting for
page-based read/remove/overrun.
Also correct the comments of ring_buffer_bytes_cpu() in this patch.
Merge tag 'thermal-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Unbreak the trip point update sysfs interface that has been broken
since the 6.3 cycle (Rafael Wysocki)"
* tag 'thermal-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: sysfs: Fix trip_point_hyst_store()
Merge tag 'acpi-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix a general ACPI processor driver regression and an ia64 build
issue, both introduced recently.
Specifics:
- Fix recently introduced uninitialized memory access issue in the
ACPI processor driver (Michal Wilczynski)
- Fix ia64 build inadvertently broken by recent ACPI processor driver
changes, which is prudent to do for 6.6 even though ia64 support is
slated for removal in 6.7 (Ard Biesheuvel)"
* tag 'acpi-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: processor: Fix uninitialized access of buf in acpi_set_pdc_bits()
acpi: Provide ia64 dummy implementation of acpi_proc_quirk_mwait_check()
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Small crop of relatively boring arm64 fixes for -rc3.
That's not to say we don't have any juicy bugs, however, it's just
that fixes for those are likely to come via -mm and -tip for a hugetlb
and an atomics issue respectively. I get left with the
documentation...
- Fix detection of "ClearBHB" and "Hinted Conditional Branch" features
- Fix broken wildcarding for Arm PMU MAINTAINERS entry
- Add missing documentation for userspace-visible ID register fields"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Document missing userspace visible fields in ID_AA64ISAR2_EL1
arm64/hbc: Document HWCAP2_HBC
arm64/sme: Include ID_AA64PFR1_EL1.SME in cpu-feature-registers.rst
arm64: cpufeature: Fix CLRBHB and BC detection
MAINTAINERS: Use wildcard pattern for ARM PMU headers
Merge tag 'x86_urgent_for_v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 rethunk fixes from Borislav Petkov:
"Fix the patching ordering between static calls and return thunks"
* tag 'x86_urgent_for_v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86,static_call: Fix static-call vs return-thunk
x86/alternatives: Remove faulty optimization
Merge tag 'x86-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- Fix a kexec bug
- Fix an UML build bug
- Fix a handful of SRSO related bugs
- Fix a shadow stacks handling bug & robustify related code
* tag 'x86-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/shstk: Add warning for shadow stack double unmap
x86/shstk: Remove useless clone error handling
x86/shstk: Handle vfork clone failure correctly
x86/srso: Fix SBPB enablement for spec_rstack_overflow=off
x86/srso: Don't probe microcode in a guest
x86/srso: Set CPUID feature bits independently of bug or mitigation status
x86/srso: Fix srso_show_state() side effect
x86/asm: Fix build of UML with KASAN
x86/mm, kexec, ima: Use memblock_free_late() from ima_free_kexec_buffer()
Merge tag 'locking-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Fix a include/linux/atomic/atomic-arch-fallback.h breakage that
generated incorrect code, and fix a lockdep reporting race that may
result in lockups"
* tag 'locking-urgent-2023-09-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/seqlock: Do the lockdep annotation before locking in do_write_seqcount_begin_nested()
locking/atomic: scripts: fix fallback ifdeffery
vfio/mdev: Fix a null-ptr-deref bug for mdev_unregister_parent()
Inject fault while probing mdpy.ko, if kstrdup() of create_dir() fails in
kobject_add_internal() in kobject_init_and_add() in mdev_type_add()
in parent_create_sysfs_files(), it will return 0 and probe successfully.
And when rmmod mdpy.ko, the mdpy_dev_exit() will call
mdev_unregister_parent(), the mdev_type_remove() may traverse uninitialized
parent->types[i] in parent_remove_sysfs_files(), and it will cause
below null-ptr-deref.
If mdev_type_add() fails, return the error code and kset_unregister()
to fix the issue.
095b8303f383 ("x86/alternative: Make custom return thunk unconditional")
made '__x86_return_thunk' a placeholder value. All code setting
X86_FEATURE_RETHUNK also changes the value of 'x86_return_thunk'. So
the optimization at the beginning of apply_returns() is dead code.
Also, before the above-mentioned commit, the optimization actually had a
bug It bypassed __static_call_fixup(), causing some raw returns to
remain unpatched in static call trampolines. Thus the 'Fixes' tag.
Merge tag 'efi-fixes-for-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fix from Ard Biesheuvel:
"Follow-up fix for the unaccepted memory fix merged last week as part
of the first EFI fixes batch.
The unaccepted memory table needs to be accessible very early, even in
cases (such as crashkernels) where the direct map does not cover all
of DRAM, and so it is added to memblock explicitly, and subsequently
memblock_reserve()'d as before"
* tag 'efi-fixes-for-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/unaccepted: Make sure unaccepted table is mapped
Merge tag 'drm-fixes-2023-09-22-2' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Ben Skeggs is stepping away from nouveau and Red Hat for personal
reasons, he'll be missed and we intend to fill the gaps in the
upcoming time with Danilo and Lyude stepping in for now.
Otherwise i915, nouveau, amdgpu with a few each and some misc spread
around.
MAINTAINERS:
- drop Ben as he retired from nouveau
core:
- drm_mm test fixes
fbdev:
- Kconfig fixes
ivpu:
- IRQ-handling fixes
meson:
- Fix memory leak in HDMI EDID code
nouveau:
- Correct type casting
- Fix memory leak in scheduler
- u_memcpya() fixes
i915:
- Prevent error pointer dereference
- Fix PMU busyness values when using GuC mode
amdgpu:
- MST fix
- Vbios part number reporting fix
- Fix a possible memory leak in an error case in the RAS code
- Fix low resolution modes on eDP
amdkfd:
- Fix GPU address for user queue wptr when GART is not at 0"
* tag 'drm-fixes-2023-09-22-2' of git://anongit.freedesktop.org/drm/drm:
MAINTAINERS: remove myself as nouveau maintainer
fbdev/sh7760fb: Depend on FB=y
drm/amdkfd: Use gpu_offset for user queue's wptr
drm/amd/display: fix the ability to use lower resolution modes on eDP
drm/amdgpu: fix a memory leak in amdgpu_ras_feature_enable
Revert "drm/amdgpu: Report vbios version instead of PN"
drm/amd/display: Fix MST recognizes connected displays as one
drm/virtio: clean out_fence on complete_submit
i915/pmu: Move execlist stats initialization to execlist specific setup
drm/i915/gt: Prevent error pointer dereference
drm/meson: fix memory leak on ->hpd_notify callback
accel/ivpu/40xx: Fix buttress interrupt handling
nouveau/u_memcpya: fix NULL vs error pointer bug
nouveau/u_memcpya: use vmemdup_user
drm/nouveau: sched: fix leaking memory of timedout job
drm/nouveau: fence: fix type cast warning in nouveau_fence_emit()
drm: fix up fbdev Kconfig defaults
drm/tests: Fix incorrect argument in drm_test_mm_insert_range
Merge tag 'platform-drivers-x86-v6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"The most noteworthy change in here is the addition of Ilpo Järvinen as
co-maintainer of platform-drivers-x86. Ilpo will be helping me with
platform-drivers-x86 maintenance going forward and you can expect
pull-requests from Ilpo in the future.
Other then that there is a set of Intel SCU IPC fixes and a
thinkpad_acpi locking fix"
* tag 'platform-drivers-x86-v6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
MAINTAINERS: Add x86 platform drivers patchwork
MAINTAINERS: Add myself into x86 platform driver maintainers
platform/x86: thinkpad_acpi: Take mutex in hotkey_resume
platform/x86: intel_scu_ipc: Fail IPC send if still busy
platform/x86: intel_scu_ipc: Don't override scu in intel_scu_ipc_dev_simple_command()
platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt()
platform/x86: intel_scu_ipc: Check status after timeout in busy_loop()
Daniel Scally [Wed, 20 Sep 2023 13:41:09 +0000 (14:41 +0100)]
i2c: xiic: Correct return value check for xiic_reinit()
The error paths for xiic_reinit() return negative values on failure
and 0 on success - this error message therefore is triggered on
_success_ rather than failure. Correct the condition so it's only
shown on failure as intended.
gpio_sim_make_line_names() returns NULL or ERR_PTR() so we must not use
__free(kfree) on the returned address. Split this function into two, one
that determines the size of the "gpio-line-names" array to allocate and
one that actually sets the names at correct offsets. The allocation and
assignment of the managed pointer happens in between.
Damien Le Moal [Fri, 15 Sep 2023 02:20:34 +0000 (11:20 +0900)]
scsi: core: ata: Do no try to probe for CDL on old drives
Some old drives (e.g. an Ultra320 SCSI disk as reported by John) do not
seem to execute MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES
commands correctly and hang when a non-zero service action is specified
(one command format with service action case in scsi_report_opcode()).
Currently, CDL probing with scsi_cdl_check_cmd() is the only caller using a
non zero service action for scsi_report_opcode(). To avoid issues with
these old drives, do not attempt CDL probe if the device reports support
for an SPC version lower than 5 (CDL was introduced in SPC-5). To keep
things working with ATA devices which probe for the CDL T2A and T2B pages
introduced with SPC-6, modify ata_scsiop_inq_std() to claim SPC-6 version
compatibility for ATA drives supporting CDL.
SPC-6 standard version number is defined as Dh (= 13) in SPC-6 r09. Fix
scsi_probe_lun() to correctly capture this value by changing the bit mask
for the second byte of the INQUIRY response from 0x7 to 0xf.
include/scsi/scsi.h is modified to add the definition SCSI_SPC_6 with the
value 14 (Dh + 1). The missing definitions for the SCSI_SPC_4 and
SCSI_SPC_5 versions are also added.
Merge tag 'fix-ia64-build-for-v6.6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/ardb/linux
Merge an ia64 ACPI build fix for v6.6 from Ard Biesheuvel:
"Build fix for Itanium/ia64:
- provide dummy implementation of acpi_proc_quirk_mwait_check() which
was moved out of generic code into arch/x86, breaking the ia64 build"
* tag 'fix-ia64-build-for-v6.6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/ardb/linux:
acpi: Provide ia64 dummy implementation of acpi_proc_quirk_mwait_check()
- netfilter:
- fix several GC related issues
- fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
- eth: team: fix null-ptr-deref when team device type is changed
- eth: i40e: fix VF VLAN offloading when port VLAN is configured
- eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB
Previous releases - always broken:
- core: fix ETH_P_1588 flow dissector
- mptcp: fix several connection hang-up conditions
- bpf:
- avoid deadlock when using queue and stack maps from NMI
- add override check to kprobe multi link attach
- hsr: properly parse HSRv1 supervisor frames.
- eth: igc: fix infinite initialization loop with early XDP redirect
- eth: octeon_ep: fix tx dma unmap len values in SG
- eth: hns3: fix GRE checksum offload issue"
* tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast()
igc: Expose tx-usecs coalesce setting to user
octeontx2-pf: Do xdp_do_flush() after redirects.
bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI
net: ena: Flush XDP packets on error.
net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()
net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev'
netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
netfilter: nf_tables: fix memleak when more than 255 elements expired
netfilter: nf_tables: disable toggling dormant table state more than once
vxlan: Add missing entries to vxlan_get_size()
net: rds: Fix possible NULL-pointer dereference
team: fix null-ptr-deref when team device type is changed
net: bridge: use DEV_STATS_INC()
net: hns3: add 5ms delay before clear firmware reset irq source
net: hns3: fix fail to delete tc flower rules during reset issue
net: hns3: only enable unicast promisc when mac table full
net: hns3: fix GRE checksum offload issue
net: hns3: add cmdq check for vf periodic service task
net: stmmac: fix incorrect rxq|txq_stats reference
...
Merge tag 'v6.6-rc3.vfs.ctime.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull finegrained timestamp reverts from Christian Brauner:
"Earlier this week we sent a few minor fixes for the multi-grained
timestamp work in [1]. While we were polishing those up after Linus
realized that there might be a nicer way to fix them we received a
regression report in [2] that fine grained timestamps break gnulib
tests and thus possibly other tools.
The kernel will elide fine-grain timestamp updates when no one is
actively querying for them to avoid performance impacts. So a sequence
like write(f1) stat(f2) write(f2) stat(f2) write(f1) stat(f1) may
result in timestamp f1 to be older than the final f2 timestamp even
though f1 was last written too but the second write didn't update the
timestamp.
Such plotholes can lead to subtle bugs when programs compare
timestamps. For example, the nap() function in [2] will estimate that
it needs to wait one ns on a fine-grain timestamp enabled filesytem
between subsequent calls to observe a timestamp change. But in general
we don't update timestamps with more than one jiffie if we think that
no one is actively querying for fine-grain timestamps to avoid
performance impacts.
While discussing various fixes the decision was to go back to the
drawing board and ultimately to explore a solution that involves only
exposing such fine-grained timestamps to nfs internally and never to
userspace.
As there are multiple solutions discussed the honest thing to do here
is not to fix this up or disable it but to cleanly revert. The general
infrastructure will probably come back but there is no reason to keep
this code in mainline.
The general changes to timestamp handling are valid and a good cleanup
that will stay. The revert is fully bisectable"
Josef Bacik [Tue, 5 Sep 2023 16:15:24 +0000 (12:15 -0400)]
btrfs: initialize start_slot in btrfs_log_prealloc_extents
Jens reported a compiler warning when using
CONFIG_CC_OPTIMIZE_FOR_SIZE=y that looks like this
fs/btrfs/tree-log.c: In function ‘btrfs_log_prealloc_extents’:
fs/btrfs/tree-log.c:4828:23: warning: ‘start_slot’ may be used
uninitialized [-Wmaybe-uninitialized]
4828 | ret = copy_items(trans, inode, dst_path, path,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4829 | start_slot, ins_nr, 1, 0);
| ~~~~~~~~~~~~~~~~~~~~~~~~~
fs/btrfs/tree-log.c:4725:13: note: ‘start_slot’ was declared here
4725 | int start_slot;
| ^~~~~~~~~~
The compiler is incorrect, as we only use this code when ins_len > 0,
and when ins_len > 0 we have start_slot properly initialized. However
we generally find the -Wmaybe-uninitialized warnings valuable, so
initialize start_slot to get rid of the warning.
Josef Bacik [Tue, 5 Sep 2023 16:15:23 +0000 (12:15 -0400)]
btrfs: make sure to initialize start and len in find_free_dev_extent
Jens reported a compiler error when using CONFIG_CC_OPTIMIZE_FOR_SIZE=y
that looks like this
In function ‘gather_device_info’,
inlined from ‘btrfs_create_chunk’ at fs/btrfs/volumes.c:5507:8:
fs/btrfs/volumes.c:5245:48: warning: ‘dev_offset’ may be used uninitialized [-Wmaybe-uninitialized]
5245 | devices_info[ndevs].dev_offset = dev_offset;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
fs/btrfs/volumes.c: In function ‘btrfs_create_chunk’:
fs/btrfs/volumes.c:5196:13: note: ‘dev_offset’ was declared here
5196 | u64 dev_offset;
This occurs because find_free_dev_extent is responsible for setting
dev_offset, however if we get an -ENOMEM at the top of the function
we'll return without setting the value.
This isn't actually a problem because we will see the -ENOMEM in
gather_device_info() and return and not use the uninitialized value,
however we also just don't want the compiler warning so rework the code
slightly in find_free_dev_extent() to make sure it's always setting
*start and *len to avoid the compiler warning.
Merge tag 'powerpc-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- A fix for breakpoint handling which was using get_user() while atomic
- Fix the Power10 HASHCHK handler which was using get_user() while
atomic
- A few build fixes for issues caused by recent changes
Thanks to Benjamin Gray, Christophe Leroy, Kajol Jain, and Naveen N Rao.
* tag 'powerpc-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/dexcr: Move HASHCHK trap handler
powerpc/82xx: Select FSL_SOC
powerpc: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION and FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
powerpc/watchpoints: Annotate atomic context in more places
powerpc/watchpoint: Disable pagefaults when getting user instruction
powerpc/watchpoints: Disable preemption in thread_change_pc()
powerpc/perf/hv-24x7: Update domain value check
KVM: x86/mmu: Open code leaf invalidation from mmu_notifier
The mmu_notifier path is a bit of a special snowflake, e.g. it zaps only a
single address space (because it's per-slot), and can't always yield.
Because of this, it calls kvm_tdp_mmu_zap_leafs() in ways that no one
else does.
Iterate manually over the leafs in response to an mmu_notifier
invalidation, instead of invoking kvm_tdp_mmu_zap_leafs(). Drop the
@can_yield param from kvm_tdp_mmu_zap_leafs() as its sole remaining
caller unconditionally passes "true".
Currently the AIA ONE_REG registers are reported by get-reg-list
as new registers for various vcpu_reg_list configs whenever Ssaia
is available on the host because Ssaia extension can only be
disabled by Smstateen extension which is not always available.
To tackle this, we should filter-out AIA ONE_REG registers only
when Ssaia can't be disabled for a VCPU.
KVM: riscv: selftests: Fix ISA_EXT register handling in get-reg-list
Same set of ISA_EXT registers are not present on all host because
ISA_EXT registers are visible to the KVM user space based on the
ISA extensions available on the host. Also, disabling an ISA
extension using corresponding ISA_EXT register does not affect
the visibility of the ISA_EXT register itself.
Based on the above, we should filter-out all ISA_EXT registers.
RISC-V: KVM: Fix KVM_GET_REG_LIST API for ISA_EXT registers
The ISA_EXT registers to enabled/disable ISA extensions for VCPU
are always available when underlying host has the corresponding
ISA extension. The copy_isa_ext_reg_indices() called by the
KVM_GET_REG_LIST API does not align with this expectation so
let's fix it.
Paolo Abeni [Thu, 21 Sep 2023 09:09:44 +0000 (11:09 +0200)]
Merge tag 'nf-23-09-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter updates for net
The following three patches fix regressions in the netfilter subsystem:
1. Reject attempts to repeatedly toggle the 'dormant' flag in a single
transaction. Doing so makes nf_tables lose track of the real state
vs. the desired state. This ends with an attempt to unregister hooks
that were never registered in the first place, which yields a splat.
2. Fix element counting in the new nftables garbage collection infra
that came with 6.5: More than 255 expired elements wraps a counter
which results in memory leak.
3. Since 6.4 ipset can BUG when a set is renamed while a CREATE command
is in progress, fix from Jozsef Kadlecsik.
* tag 'nf-23-09-20' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
netfilter: nf_tables: fix memleak when more than 255 elements expired
netfilter: nf_tables: disable toggling dormant table state more than once
====================
Liang He [Wed, 22 Mar 2023 04:29:51 +0000 (12:29 +0800)]
i2c: mux: gpio: Add missing fwnode_handle_put()
In i2c_mux_gpio_probe_fw(), we should add fwnode_handle_put()
when break out of the iteration device_for_each_child_node()
as it will automatically increase and decrease the refcounter.
Fixes: 98b2b712bc85 ("i2c: i2c-mux-gpio: Enable this driver in ACPI land") Signed-off-by: Liang He <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
Edward Cree [Tue, 19 Sep 2023 18:39:49 +0000 (19:39 +0100)]
sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast()
Several places in TC offload code assumed that the return from
rhashtable_lookup_get_insert_fast() was always either NULL or a valid
pointer to an existing entry, but in fact that function can return an
error pointer. In that case, perform the usual cleanup of the newly
created entry, then pass up the error, rather than attempting to take a
reference on the old entry.
Fix linker error if FB=m about missing fb_io_read and fb_io_write. The
linker's error message suggests that this config setting has already
been broken for other symbols.
All errors (new ones prefixed by >>):
sh4-linux-ld: drivers/video/fbdev/sh7760fb.o: in function `sh7760fb_probe':
sh7760fb.c:(.text+0x374): undefined reference to `framebuffer_alloc'
sh4-linux-ld: sh7760fb.c:(.text+0x394): undefined reference to `fb_videomode_to_var'
sh4-linux-ld: sh7760fb.c:(.text+0x39c): undefined reference to `fb_alloc_cmap'
sh4-linux-ld: sh7760fb.c:(.text+0x3a4): undefined reference to `register_framebuffer'
sh4-linux-ld: sh7760fb.c:(.text+0x3ac): undefined reference to `fb_dealloc_cmap'
sh4-linux-ld: sh7760fb.c:(.text+0x434): undefined reference to `framebuffer_release'
sh4-linux-ld: drivers/video/fbdev/sh7760fb.o: in function `sh7760fb_remove':
sh7760fb.c:(.text+0x800): undefined reference to `unregister_framebuffer'
sh4-linux-ld: sh7760fb.c:(.text+0x804): undefined reference to `fb_dealloc_cmap'
sh4-linux-ld: sh7760fb.c:(.text+0x814): undefined reference to `framebuffer_release'
>> sh4-linux-ld: drivers/video/fbdev/sh7760fb.o:(.rodata+0xc): undefined reference to `fb_io_read'
>> sh4-linux-ld: drivers/video/fbdev/sh7760fb.o:(.rodata+0x10): undefined reference to `fb_io_write'
sh4-linux-ld: drivers/video/fbdev/sh7760fb.o:(.rodata+0x2c): undefined reference to `cfb_fillrect'
sh4-linux-ld: drivers/video/fbdev/sh7760fb.o:(.rodata+0x30): undefined reference to `cfb_copyarea'
sh4-linux-ld: drivers/video/fbdev/sh7760fb.o:(.rodata+0x34): undefined reference to `cfb_imageblit'
When users attempt to obtain the coalesce setting using the
ethtool command, current code always returns 0 for tx-usecs.
This is because I225/6 always uses a queue pair setting, hence
tx_coalesce_usecs does not return a value during the
igc_ethtool_get_coalesce() callback process. The pair queue
condition checking in igc_ethtool_get_coalesce() is removed by
this patch so that the user gets information of the value of tx-usecs.
Even if i225/6 is using queue pair setting, there is no harm in
notifying the user of the tx-usecs. The implementation of the current
code may have previously been a copy of the legacy code i210.
Since I225 has the queue pair setting enabled, tx-usecs will always adhere
to the user-set rx-usecs value. An error message will appear when the user
attempts to set the tx-usecs value for the input parameters because,
by default, they should only set the rx-usecs value.
This patch also adds the helper function to get the
previous rx coalesce value similar to tx coalesce.
How to test:
User can get the coalesce value using ethtool command.
bnxt_poll_nitroa0() invokes bnxt_rx_pkt() which can run a XDP program
which in turn can return XDP_REDIRECT. bnxt_rx_pkt() is also used by
__bnxt_poll_work() which flushes (xdp_do_flush()) the packets after each
round. bnxt_poll_nitroa0() lacks this feature.
xdp_do_flush() should be invoked before leaving the NAPI callback.
Invoke xdp_do_flush() after a redirect in bnxt_poll_nitroa0() NAPI.
xdp_do_flush() should be invoked before leaving the NAPI poll function
after a XDP-redirect. This is not the case if the driver leaves via
the error path (after having a redirect in one of its previous
iterations).
locking/seqlock: Do the lockdep annotation before locking in do_write_seqcount_begin_nested()
It was brought up by Tetsuo that the following sequence:
write_seqlock_irqsave()
printk_deferred_enter()
could lead to a deadlock if the lockdep annotation within
write_seqlock_irqsave() triggers.
The problem is that the sequence counter is incremented before the lockdep
annotation is performed. The lockdep splat would then attempt to invoke
printk() but the reader side, of the same seqcount, could have a
tty_port::lock acquired waiting for the sequence number to become even again.
The other lockdep annotations come before the actual locking because "we
want to see the locking error before it happens". There is no reason why
seqcount should be different here.
Do the lockdep annotation first then perform the locking operation (the
sequence increment).
Hamza Mahfooz [Wed, 13 Sep 2023 18:48:08 +0000 (14:48 -0400)]
drm/amd/display: fix the ability to use lower resolution modes on eDP
On eDP we can receive invalid modes from dm_update_crtc_state() for
entirely new streams for which drm_mode_set_crtcinfo() shouldn't be
called on. So, instead of calling drm_mode_set_crtcinfo() from within
create_stream_for_sink() we can instead call it from
amdgpu_dm_connector_mode_valid(). Since, we are guaranteed to only call
drm_mode_set_crtcinfo() for valid modes from that function (invalid
modes are rejected by that callback) and that is the only user
of create_validate_stream_for_sink() that we need to call
drm_mode_set_crtcinfo() for (as before commit cb841d27b876
("drm/amd/display: Always pass connector_state to stream validation"),
that is the only place where create_validate_stream_for_sink()'s
dm_state was NULL).
Cong Liu [Thu, 14 Sep 2023 09:45:33 +0000 (17:45 +0800)]
drm/amdgpu: fix a memory leak in amdgpu_ras_feature_enable
This patch fixes a memory leak in the amdgpu_ras_feature_enable() function.
The leak occurs when the function sends a command to the firmware to enable
or disable a RAS feature for a GFX block. If the command fails, the kfree()
function is not called to free the info memory.
Xiaoke Wang [Thu, 3 Mar 2022 12:39:14 +0000 (20:39 +0800)]
i2c: mux: demux-pinctrl: check the return value of devm_kstrdup()
devm_kstrdup() returns pointer to allocated string on success,
NULL on failure. So it is better to check the return value of it.
Fixes: e35478eac030 ("i2c: mux: demux-pinctrl: run properly with multiple instances") Signed-off-by: Xiaoke Wang <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>