rtc: isl12022: use dev_set_drvdata() instead of i2c_set_clientdata()
As another preparation for removing direct references to the
i2c_client in the helper functions, stash a pointer to the private
data via dev_set_drvdata() instead of i2c_set_clientdata().
rtc: isl12022: stop using deprecated devm_rtc_device_register()
The comments say that devm_rtc_device_register() is deprecated and
that one should instead use devm_rtc_allocate_device() and
[devm_]rtc_register_device. So do that.
Linus Torvalds [Wed, 12 Oct 2022 22:01:58 +0000 (15:01 -0700)]
Merge tag 'linux-kselftest-kunit-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more KUnit updates from Shuah Khan:
"Features and fixes:
- simplify resource use
- make kunit_malloc() and kunit_free() allocations and frees
consistent. kunit_free() frees only the memory allocated by
kunit_malloc()
- stop downloading risc-v opensbi binaries using wget
- other fixes and improvements to tool and KUnit framework"
* tag 'linux-kselftest-kunit-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: kunit: Update description of --alltests option
kunit: declare kunit_assert structs as const
kunit: rename base KUNIT_ASSERTION macro to _KUNIT_FAILED
kunit: remove format func from struct kunit_assert, get it to 0 bytes
kunit: tool: Don't download risc-v opensbi firmware with wget
kunit: make kunit_kfree(NULL) a no-op to match kfree()
kunit: make kunit_kfree() not segfault on invalid inputs
kunit: make kunit_kfree() only work on pointers from kunit_malloc() and friends
kunit: drop test pointer in string_stream_fragment
kunit: string-stream: Simplify resource use
Palmer Dabbelt [Wed, 12 Oct 2022 21:59:53 +0000 (14:59 -0700)]
Merge tag 'dt-for-palmer-v6.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into for-next
Microchip RISC-V devicetrees for v6.1
Fixups, reference design changes and new boards:
- The addition of QSPI support for mpfs had a corresponding change to
the devicetree node.
- The v2022.{09,10} reference designs brought with them several memory
map changes which are not backwards compatible. The old devicetrees
from the v2022.08 and earlier releases still work with current
kernels.
- Two new devicetrees for a first-party development kit and for the
Aries Embedded M100FPSEVP kit.
- Corresponding dt-bindings changes for the above.
Signed-off-by: Conor Dooley <[email protected]>
* tag 'dt-for-palmer-v6.1-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: microchip: fix fabric i2c reg size
riscv: dts: microchip: update memory configuration for v2022.10
riscv: dts: microchip: add a devicetree for aries' m100pfsevp
riscv: dts: microchip: add sevkit device tree
riscv: dts: microchip: reduce the fic3 clock rate
riscv: dts: microchip: icicle: re-jig fabric peripheral addresses
riscv: dts: microchip: icicle: update pci address properties
riscv: dts: microchip: move the mpfs' pci node to -fabric.dtsi
riscv: dts: microchip: add pci dma ranges for the icicle kit
dt-bindings: riscv: microchip: document the sev kit
dt-bindings: riscv: microchip: document the aries m100pfsevp
dt-bindings: riscv: microchip: document icicle reference design
riscv: dts: microchip: add qspi compatible fallback
Linus Torvalds [Wed, 12 Oct 2022 21:59:13 +0000 (14:59 -0700)]
Merge tag 'linux-kselftest-next-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more Kselftest updates from Shuah Khan:
"This consists of fixes and improvements to memory-hotplug test and a
minor spelling fix to ftrace test"
* tag 'linux-kselftest-next-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
docs: notifier-error-inject: Correct test's name
selftests/memory-hotplug: Adjust log info for maintainability
selftests/memory-hotplug: Restore memory before exit
selftests/memory-hotplug: Add checking after online or offline
selftests/ftrace: func_event_triggers: fix typo in user message
Linus Torvalds [Wed, 12 Oct 2022 21:46:48 +0000 (14:46 -0700)]
Merge tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Prune private items from vfio_pci_core.h to a new internal header,
fix missed function rename, and refactor vfio-pci interrupt defines
(Jason Gunthorpe)
- Create consistent naming and handling of ioctls with a function per
ioctl for vfio-pci and vfio group handling, use proper type args
where available (Jason Gunthorpe)
- Implement a set of low power device feature ioctls allowing userspace
to make use of power states such as D3cold where supported (Abhishek
Sahu)
- Remove device counter on vfio groups, which had restricted the page
pinning interface to singleton groups to account for limitations in
the type1 IOMMU backend. Document usage as limited to emulated IOMMU
devices, ie. traditional mdev devices where this restriction is
consistent (Jason Gunthorpe)
- Correct function prefix in hisi_acc driver incurred during previous
refactoring (Shameer Kolothum)
- Correct typo and remove redundant warning triggers in vfio-fsl driver
(Christophe JAILLET)
- Introduce device level DMA dirty tracking uAPI and implementation in
the mlx5 variant driver (Yishai Hadas & Joao Martins)
- Move much of the vfio_device life cycle management into vfio core,
simplifying and avoiding duplication across drivers. This also
facilitates adding a struct device to vfio_device which begins the
introduction of device rather than group level user support and fills
a gap allowing userspace identify devices as vfio capable without
implicit knowledge of the driver (Kevin Tian & Yi Liu)
- Split vfio container handling to a separate file, creating a more
well defined API between the core and container code, masking IOMMU
backend implementation from the core, allowing for an easier future
transition to an iommufd based implementation of the same (Jason
Gunthorpe)
- Attempt to resolve race accessing the iommu_group for a device
between vfio releasing DMA ownership and removal of the device from
the IOMMU driver. Follow-up with support to allow vfio_group to exist
with NULL iommu_group pointer to support existing userspace use cases
of holding the group file open (Jason Gunthorpe)
- Fix error code and hi/lo register manipulation issues in the hisi_acc
variant driver, along with various code cleanups (Longfang Liu)
- Fix a prior regression in GVT-g group teardown, resulting in
unreleased resources (Jason Gunthorpe)
- A significant cleanup and simplification of the mdev interface,
consolidating much of the open coded per driver sysfs interface
support into the mdev core (Christoph Hellwig)
- Simplification of tracking and locking around vfio_groups that fall
out from previous refactoring (Jason Gunthorpe)
- Replace trivial open coded f_ops tests with new helper (Alex
Williamson)
* tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfio: (77 commits)
vfio: More vfio_file_is_group() use cases
vfio: Make the group FD disassociate from the iommu_group
vfio: Hold a reference to the iommu_group in kvm for SPAPR
vfio: Add vfio_file_is_group()
vfio: Change vfio_group->group_rwsem to a mutex
vfio: Remove the vfio_group->users and users_comp
vfio/mdev: add mdev available instance checking to the core
vfio/mdev: consolidate all the description sysfs into the core code
vfio/mdev: consolidate all the available_instance sysfs into the core code
vfio/mdev: consolidate all the name sysfs into the core code
vfio/mdev: consolidate all the device_api sysfs into the core code
vfio/mdev: remove mtype_get_parent_dev
vfio/mdev: remove mdev_parent_dev
vfio/mdev: unexport mdev_bus_type
vfio/mdev: remove mdev_from_dev
vfio/mdev: simplify mdev_type handling
vfio/mdev: embedd struct mdev_parent in the parent data structure
vfio/mdev: make mdev.h standalone includable
drm/i915/gvt: simplify vgpu configuration management
drm/i915/gvt: fix a memory leak in intel_gvt_init_vgpu_types
...
Billy Tsai [Mon, 26 Sep 2022 10:51:45 +0000 (18:51 +0800)]
i3c: master: Remove the wrong place of reattach.
The reattach should be used when an I3C device has its address changed.
But the modified place in this patch doesn't have the address changed of
the newdev. This wrong reattach will reserve the same address slot twice
and return unexpected -EBUSY when the bus find the duplicate device with
diffent dynamic address.
Billy Tsai [Mon, 26 Sep 2022 10:51:44 +0000 (18:51 +0800)]
i3c: master: Free the old_dyn_addr when reattach.
This patch is used to free the old_dyn_addr when the caller want to
reattach the device to the different dynamic address. If the
old_dyn_addr is 0 the function will treat it as no old_dyn_addr is
reserved on the bus. Without the patch, when the driver reattach the i3c
device after setnewda the old_dyn_addr will be permanently occupied.
Linus Torvalds [Wed, 12 Oct 2022 18:16:58 +0000 (11:16 -0700)]
Merge tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"Five hotfixes - three for nilfs2, two for MM. For are cc:stable, one
is not"
* tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: fix leak of nilfs_root in case of writer thread creation failure
nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
nilfs2: fix use-after-free bug of struct nilfs_root
mm/damon/core: initialize damon_target->list in damon_new_target()
mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
Linus Torvalds [Wed, 12 Oct 2022 18:00:22 +0000 (11:00 -0700)]
Merge tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- hfs and hfsplus kmap API modernization (Fabio Francesco)
- make crash-kexec work properly when invoked from an NMI-time panic
(Valentin Schneider)
- ntfs bugfixes (Hawkins Jiawei)
- improve IPC msg scalability by replacing atomic_t's with percpu
counters (Jiebin Sun)
- nilfs2 cleanups (Minghao Chi)
- lots of other single patches all over the tree!
* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
proc: test how it holds up with mapping'less process
mailmap: update Frank Rowand email address
ia64: mca: use strscpy() is more robust and safer
init/Kconfig: fix unmet direct dependencies
ia64: update config files
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
fork: remove duplicate included header files
init/main.c: remove unnecessary (void*) conversions
proc: mark more files as permanent
nilfs2: remove the unneeded result variable
nilfs2: delete unnecessary checks before brelse()
checkpatch: warn for non-standard fixes tag style
usr/gen_init_cpio.c: remove unnecessary -1 values from int file
ipc/msg: mitigate the lock contention with percpu counter
percpu: add percpu_counter_add_local and percpu_counter_sub_local
fs/ocfs2: fix repeated words in comments
relay: use kvcalloc to alloc page array in relay_alloc_page_array
proc: make config PROC_CHILDREN depend on PROC_FS
fs: uninline inode_maybe_inc_iversion()
...
The problem is that the synthetic event field "char file[]" will read
the value given to it as a string without any memory checks to make sure
the address is valid. The above example will pass in the user space
address and the sythetic event code will happily call strlen() on it
and then strscpy() where either one will cause an oops when accessing
user space addresses.
Use the helper functions from trace_kprobe and trace_eprobe that can
read strings safely (and actually succeed when the address is from user
space and the memory is mapped in).
tracing: Add "(fault)" name injection to kernel probes
Have the specific functions for kernel probes that read strings to inject
the "(fault)" name directly. trace_probes.c does this too (for uprobes)
but as the code to read strings are going to be used by synthetic events
(and perhaps other utilities), it simplifies the code by making sure those
other uses do not need to implement the "(fault)" name injection as well.
are identical in both trace_kprobe.c and trace_eprobe.c. Move them into
a new header file trace_probe_kernel.h to share it. This code will later
be used by the synthetic events as well.
Marked for stable as a fix for a crash in synthetic events requires it.
Linus Torvalds [Wed, 12 Oct 2022 17:35:20 +0000 (10:35 -0700)]
Merge tag 'loongarch-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Use EXPLICIT_RELOCS (ABIv2.0)
- Use generic BUG() handler
- Refactor TLB/Cache operations
- Add qspinlock support
- Add perf events support
- Add kexec/kdump support
- Add BPF JIT support
- Add ACPI-based laptop driver
- Update the default config file
* tag 'loongarch-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (25 commits)
LoongArch: Update Loongson-3 default config file
LoongArch: Add ACPI-based generic laptop driver
LoongArch: Add BPF JIT support
LoongArch: Add some instruction opcodes and formats
LoongArch: Move {signed,unsigned}_imm_check() to inst.h
LoongArch: Add kdump support
LoongArch: Add kexec support
LoongArch: Use generic BUG() handler
LoongArch: Add SysRq-x (TLB Dump) support
LoongArch: Add perf events support
LoongArch: Add qspinlock support
LoongArch: Use TLB for ioremap()
LoongArch: Support access filter to /dev/mem interface
LoongArch: Refactor cache probe and flush methods
LoongArch: mm: Refactor TLB exception handlers
LoongArch: Support R_LARCH_GOT_PC_{LO12,HI20} in modules
LoongArch: Support PC-relative relocations in modules
LoongArch: Define ELF relocation types added in ABIv2.0
LoongArch: Adjust symbol addressing for AS_HAS_EXPLICIT_RELOCS
LoongArch: Add Kconfig option AS_HAS_EXPLICIT_RELOCS
...
Linus Torvalds [Wed, 12 Oct 2022 17:23:24 +0000 (10:23 -0700)]
Merge tag 'irq-core-2022-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt updates from Thomas Gleixner:
"Core code:
- Provide a generic wrapper which can be utilized in drivers to
handle the problem of force threaded demultiplex interrupts on RT
enabled kernels. This avoids conditionals and horrible quirks in
drivers all over the place
- Fix up affected pinctrl and GPIO drivers to make them cleanly RT
safe
Interrupt drivers:
- A new driver for the FSL MU platform specific MSI implementation
- Make irqchip_init() available for pure ACPI based systems
- Provide a functional DT binding for the Realtek RTL interrupt chip
- The usual DT updates and small code improvements all over the
place"
* tag 'irq-core-2022-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
irqchip: IMX_MU_MSI should depend on ARCH_MXC
irqchip/imx-mu-msi: Fix wrong register offset for 8ulp
irqchip/ls-extirq: Fix invalid wait context by avoiding to use regmap
dt-bindings: irqchip: Describe the IMX MU block as a MSI controller
irqchip: Add IMX MU MSI controller driver
dt-bindings: irqchip: renesas,irqc: Add r8a779g0 support
irqchip/gic-v3: Fix typo in comment
dt-bindings: interrupt-controller: ti,sci-intr: Fix missing reg property in the binding
dt-bindings: irqchip: ti,sci-inta: Fix warning for missing #interrupt-cells
irqchip: Allow extra fields to be passed to IRQCHIP_PLATFORM_DRIVER_END
platform-msi: Export symbol platform_msi_create_irq_domain()
irqchip/realtek-rtl: use parent interrupts
dt-bindings: interrupt-controller: realtek,rtl-intc: require parents
irqchip/realtek-rtl: use irq_domain_add_linear()
irqchip: Make irqchip_init() usable on pure ACPI systems
bcma: gpio: Use generic_handle_irq_safe()
gpio: mlxbf2: Use generic_handle_irq_safe()
platform/x86: intel_int0002_vgpio: Use generic_handle_irq_safe()
ssb: gpio: Use generic_handle_irq_safe()
pinctrl: amd: Use generic_handle_irq_safe()
...
Zack Rusin [Thu, 6 Oct 2022 14:33:19 +0000 (10:33 -0400)]
kbuild: Stop including vmlinux.bz2 in the rpm's
vmlinux.bz2 was added to the rpm packages in 2009 in the fc370ecfdb37 ("kbuild: add vmlinux to kernel rpm") but seemingly hasn't
been used since.
Originally this should have been split up in a seperate debugging
package because it massively increases the size of the generated rpm's
e.g. kernel rpm built using binrpm-pkg on Fedora 36 default 5.19.8 kernel
config and localmodconfig is ~255MB with vmlinux.bz2 and only ~65MB
without it.
Make the kernel built rpms about 4x smaller by not including the unused
vmlinux.bz2 in them.
Masahiro Yamada [Tue, 4 Oct 2022 16:29:04 +0000 (01:29 +0900)]
Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT does not give explicit
-gdwarf-* flag. The actual DWARF version is up to the toolchain.
The combination of GCC and GAS works fine, and Clang with the integrated
assembler is good too.
The combination of Clang and GAS is tricky, but at least, the -g flag
works for Clang <=13, which defaults to DWARF v4.
Clang 14 switched its default to DWARF v5.
Now, CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT has the same issue as
addressed by commit 98cd6f521f10 ("Kconfig: allow explicit opt in to
DWARF v5").
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y for Clang >= 14 and
GAS < 2.35 produces a ton of errors like follows:
/tmp/main-c2741c.s: Assembler messages:
/tmp/main-c2741c.s:109: Error: junk at end of line, first unrecognized character is `"'
/tmp/main-c2741c.s:109: Error: file number less than one
D Scott Phillips [Tue, 11 Oct 2022 02:21:40 +0000 (19:21 -0700)]
arm64: Add AMPERE1 to the Spectre-BHB affected list
Per AmpereOne erratum AC03_CPU_12, "Branch history may allow control of
speculative execution across software contexts," the AMPERE1 core needs the
bhb clearing loop to mitigate Spectre-BHB, with a loop iteration count of
11.
doc: RISC-V: Document that misaligned accesses are supported
The RISC-V ISA manual used to mandate that misaligned accesses were
supported in user mode, but that requirement was removed in 2018 via
riscv-isa-manual commit 61cadb9 ("Provide new description of misaligned
load/store behavior compatible with privileged architecture."). Since
the Linux uABI was already frozen at that point it's just been demoted
to part of the uABI, but that was never written down.
Nicholas Piggin [Wed, 12 Oct 2022 03:53:34 +0000 (13:53 +1000)]
powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
powerpc 32-bit system call (and function) calling convention for 64-bit
arguments requires the next available odd-pair (two sequential registers
with the first being odd-numbered) from the standard register argument
allocation.
The first argument register is r3, so a 64-bit argument that appears at
an even position in the argument list must skip a register (unless there
were preceding 64-bit arguments, which might throw things off). This
requires non-standard compat definitions to deal with the holes in the
argument register allocation.
With pt_regs syscall wrappers which use a standard mapper to map pt_regs
GPRs to function arguments, 32-bit kernels hit the same basic problem,
the standard definitions don't cope with the unused argument registers.
Fix this by having 32-bit kernels share those syscall definitions with
compat.
Thanks to Jason for spending a lot of time finding and bisecting this
and developing a trivial reproducer. The perfect bug report.
Jens Axboe [Wed, 12 Oct 2022 13:15:53 +0000 (07:15 -0600)]
Merge tag 'nvme-6.1-2022-10-12' of git://git.infradead.org/nvme into block-6.1
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.1
- add NVME_QUIRK_BOGUS_NID for Lexar NM760 (Abhijit)
- avoid the deepest sleep state on ZHITAI TiPro5000 SSDs (Xi Ruoyao)
- fix possible hang caused during ctrl deletion (Sagi Grimberg)
- fix possible hang in live ns resize with ANA access (Sagi Grimberg)"
* tag 'nvme-6.1-2022-10-12' of git://git.infradead.org/nvme:
nvme-multipath: fix possible hang in live ns resize with ANA access
nvme-pci: avoid the deepest sleep state on ZHITAI TiPro5000 SSDs
nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM760
nvme-tcp: fix possible hang caused during ctrl deletion
nvme-rdma: fix possible hang caused during ctrl deletion
Jiapeng Chong [Sun, 9 Oct 2022 02:06:42 +0000 (10:06 +0800)]
ring-buffer: Fix kernel-doc
kernel/trace/ring_buffer.c:895: warning: expecting prototype for ring_buffer_nr_pages_dirty(). Prototype was for ring_buffer_nr_dirty_pages() instead.
kernel/trace/ring_buffer.c:5313: warning: expecting prototype for ring_buffer_reset_cpu(). Prototype was for ring_buffer_reset_online_cpus() instead.
kernel/trace/ring_buffer.c:5382: warning: expecting prototype for rind_buffer_empty(). Prototype was for ring_buffer_empty() instead.
Jeremy Kerr [Wed, 12 Oct 2022 02:08:51 +0000 (10:08 +0800)]
mctp: prevent double key removal and unref
Currently, we have a bug where a simultaneous DROPTAG ioctl and socket
close may race, as we attempt to remove a key from lists twice, and
perform an unref for each removal operation. This may result in a uaf
when we attempt the second unref.
This change fixes the race by making __mctp_key_remove tolerant to being
called on a key that has already been removed from the socket/net lists,
and only performs the unref when we do the actual remove. We also need
to hold the list lock on the ioctl cleanup path.
This fix is based on a bug report and comprehensive analysis from
butt3rflyh4ck <[email protected]>, found via syzkaller.
David S. Miller [Wed, 12 Oct 2022 12:29:07 +0000 (13:29 +0100)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter fixes for net
This series from Phil Sutter for the *net* tree fixes a problem with a change
from the 6.1 development phase: the change to nft_fib should have used
the more recent flowic_l3mdev field. Pointed out by Guillaume Nault.
This also makes the older iptables module follow the same pattern.
Also add selftest case and avoid test failure in nft_fib.sh when the
host environment has set rp_filter=1.
====================
Phil Sutter [Wed, 5 Oct 2022 15:34:36 +0000 (17:34 +0200)]
selftests: netfilter: Fix nft_fib.sh for all.rp_filter=1
If net.ipv4.conf.all.rp_filter is set, it overrides the per-interface
setting and thus defeats the fix from bbe4c0896d250 ("selftests:
netfilter: disable rp_filter on router"). Unset it as well to cover that
case.
Fixes: bbe4c0896d250 ("selftests: netfilter: disable rp_filter on router") Signed-off-by: Phil Sutter <[email protected]> Signed-off-by: Florian Westphal <[email protected]>
Phil Sutter [Wed, 5 Oct 2022 16:07:05 +0000 (18:07 +0200)]
netfilter: rpfilter/fib: Populate flowic_l3mdev field
Use the introduced field for correct operation with VRF devices instead
of conditionally overwriting flowic_oif. This is a partial revert of
commit b575b24b8eee3 ("netfilter: Fix rpfilter dropping vrf packets by
mistake"), implementing a simpler solution.
Zheng Yejian [Tue, 11 Oct 2022 12:03:52 +0000 (12:03 +0000)]
ftrace: Fix char print issue in print_ip_ins()
When ftrace bug happened, following log shows every hex data in
problematic ip address:
actual: ffffffe8:6b:ffffffd9:01:21
But so many 'f's seem a little confusing, and that is because format
'%x' being used to print signed chars in array 'ins'. As suggested
by Joe, change to use format "%*phC" to print array 'ins'.
After this patch, the log is like:
actual: e8:6b:d9:01:21
nvme-multipath: fix possible hang in live ns resize with ANA access
When we revalidate paths as part of ns size change (as of commit e7d65803e2bb), it is possible that during the path revalidation, the
only paths that is IO capable (i.e. optimized/non-optimized) are the
ones that ns resize was not yet informed to the host, which will cause
inflight requests to be requeued (as we have available paths but none
are IO capable). These requests on the requeue list are waiting for
someone to resubmit them at some point.
The IO capable paths will eventually notify the ns resize change to the
host, but there is nothing that will kick the requeue list to resubmit
the queued requests.
Fix this by always kicking the requeue list, and if no IO capable path
exists, these requests will be queued again.
A typical log that indicates that IOs are requeued:
--
nvme nvme1: creating 4 I/O queues.
nvme nvme1: new ctrl: "testnqn1"
nvme nvme2: creating 4 I/O queues.
nvme nvme2: mapped 4/0/0 default/read/poll queues.
nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
nvme nvme1: rescanning namespaces.
nvme1n1: detected capacity change from 2097152 to 4194304
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
nvme nvme2: rescanning namespaces.
--
Xi Ruoyao [Wed, 28 Sep 2022 09:39:13 +0000 (17:39 +0800)]
nvme-pci: avoid the deepest sleep state on ZHITAI TiPro5000 SSDs
ZHITAI TiPro5000 SSDs has the same APST sleep problem as its cousin,
TiPro7000. The quirk for TiPro7000 has been added in
commit 6b961bce50e4 ("nvme-pci: avoid the deepest sleep state on
ZHITAI TiPro7000 SSDs"), use the same quirk for TiPro5000.
nvme-tcp: fix possible hang caused during ctrl deletion
When we delete a controller, we execute the following:
1. nvme_stop_ctrl() - stop some work elements that may be
inflight or scheduled (specifically also .stop_ctrl
which cancels ctrl error recovery work)
2. nvme_remove_namespaces() - which first flushes scan_work
to avoid competing ns addition/removal
3. continue to teardown the controller
However, if err_work was scheduled to run in (1), it is designed to
cancel any inflight I/O, particularly I/O that is originating from ns
scan_work in (2), but because it is cancelled in .stop_ctrl(), we can
prevent forward progress of (2) as ns scanning is blocking on I/O
(that will never be cancelled).
The race is:
1. transport layer error observed -> err_work is scheduled
2. scan_work executes, discovers ns, generate I/O to it
3. nvme_ctop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work)
- err_work never executed
4. nvme_remove_namespaces() -> flush_work(scan_work)
--> deadlock, because scan_work is blocked on I/O that was supposed
to be cancelled by err_work, but was cancelled before executing (see
stack trace [1]).
Fix this by flushing err_work instead of cancelling it, to force it
to execute and cancel all inflight I/O.
nvme-rdma: fix possible hang caused during ctrl deletion
When we delete a controller, we execute the following:
1. nvme_stop_ctrl() - stop some work elements that may be
inflight or scheduled (specifically also .stop_ctrl
which cancels ctrl error recovery work)
2. nvme_remove_namespaces() - which first flushes scan_work
to avoid competing ns addition/removal
3. continue to teardown the controller
However, if err_work was scheduled to run in (1), it is designed to
cancel any inflight I/O, particularly I/O that is originating from ns
scan_work in (2), but because it is cancelled in .stop_ctrl(), we can
prevent forward progress of (2) as ns scanning is blocking on I/O
(that will never be cancelled).
The race is:
1. transport layer error observed -> err_work is scheduled
2. scan_work executes, discovers ns, generate I/O to it
3. nvme_ctop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work)
- err_work never executed
4. nvme_remove_namespaces() -> flush_work(scan_work)
--> deadlock, because scan_work is blocked on I/O that was supposed
to be cancelled by err_work, but was cancelled before executing.
Fix this by flushing err_work instead of cancelling it, to force it
to execute and cancel all inflight I/O.
Fixes: b435ecea2a4d ("nvme: Add .stop_ctrl to nvme ctrl ops") Fixes: f6c8e432cb04 ("nvme: flush namespace scanning work just before removing namespaces") Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
Catalin Marinas [Thu, 6 Oct 2022 16:33:54 +0000 (17:33 +0100)]
arm64: mte: Avoid setting PG_mte_tagged if no tags cleared or restored
Prior to commit 69e3b846d8a7 ("arm64: mte: Sync tags for pages where PTE
is untagged"), mte_sync_tags() was only called for pte_tagged() entries
(those mapped with PROT_MTE). Therefore mte_sync_tags() could safely use
test_and_set_bit(PG_mte_tagged, &page->flags) without inadvertently
setting PG_mte_tagged on an untagged page.
The above commit was required as guests may enable MTE without any
control at the stage 2 mapping, nor a PROT_MTE mapping in the VMM.
However, the side-effect was that any page with a PTE that looked like
swap (or migration) was getting PG_mte_tagged set automatically. A
subsequent page copy (e.g. migration) copied the tags to the destination
page even if the tags were owned by KASAN.
This issue was masked by the page_kasan_tag_reset() call introduced in
commit e5b8d9218951 ("arm64: mte: reset the page tag in page->flags").
When this commit was reverted (20794545c146), KASAN started reporting
access faults because the overriding tags in a page did not match the
original page->flags (with CONFIG_KASAN_HW_TAGS=y):
BUG: KASAN: invalid-access in copy_page+0x10/0xd0 arch/arm64/lib/copy_page.S:26
Read at addr f5ff000017f2e000 by task syz-executor.1/2218
Pointer tag: [f5], memory tag: [f2]
Move the PG_mte_tagged bit setting from mte_sync_tags() to the actual
place where tags are cleared (mte_sync_page_tags()) or restored
(mte_restore_tags()).
Huacai Chen [Wed, 12 Oct 2022 08:36:23 +0000 (16:36 +0800)]
LoongArch: Update Loongson-3 default config file
1, Enable ZBOOT, KEXEC and BPF_JIT;
2, Add more patition types;
3, Add some USB Type-C options;
4, Add some common network options;
5, Add some Bluetooth device drivers;
6, Remove obsolete config options (for some detailed information, see
Link).
Tiezhu Yang [Wed, 12 Oct 2022 08:36:20 +0000 (16:36 +0800)]
LoongArch: Add BPF JIT support
BPF programs are normally handled by a BPF interpreter, add BPF JIT
support for LoongArch to allow the kernel to generate native code when
a program is loaded into the kernel. This will significantly speed-up
processing of BPF programs.
Tiezhu Yang [Wed, 12 Oct 2022 08:36:19 +0000 (16:36 +0800)]
LoongArch: Add some instruction opcodes and formats
According to the "Table of Instruction Encoding" in LoongArch Reference
Manual [1], add some instruction opcodes and formats which are used in
the BPF JIT for LoongArch.
Youling Tang [Wed, 12 Oct 2022 08:36:19 +0000 (16:36 +0800)]
LoongArch: Add kdump support
This patch adds support for kdump. In kdump case the normal kernel will
reserve a region for the crash kernel and jump there on panic.
Arch-specific functions are added to allow for implementing a crash dump
file interface, /proc/vmcore, which can be viewed as a ELF file.
A user-space tool, such as kexec-tools, is responsible for allocating a
separate region for the core's ELF header within the crash kdump kernel
memory and filling it in when executing kexec_load().
Then, its location will be advertised to the crash dump kernel via a
command line argument "elfcorehdr=", and the crash dump kernel will
preserve this region for later use with arch_reserve_vmcore() at boot
time.
At the same time, the crash kdump kernel is also limited within the
"crashkernel" area via a command line argument "mem=", so as not to
destroy the original kernel dump data.
In the crash dump kernel environment, /proc/vmcore is used to access the
primary kernel's memory with copy_oldmem_page().
I tested kdump on LoongArch machines (Loongson-3A5000) and it works as
expected (suggested crashkernel parameter is "crashkernel=512M@2560M"),
you may test it by triggering a crash through /proc/sysrq-trigger:
Youling Tang [Wed, 12 Oct 2022 08:36:19 +0000 (16:36 +0800)]
LoongArch: Add kexec support
Add three new files, kexec.h, machine_kexec.c and relocate_kernel.S to
the LoongArch architecture, so as to add support for the kexec re-boot
mechanism (CONFIG_KEXEC) on LoongArch platforms.
Kexec supports loading vmlinux.elf in ELF format and vmlinux.efi in PE
format.
I tested kexec on LoongArch machines (Loongson-3A5000) and it works as
expected:
Youling Tang [Wed, 12 Oct 2022 08:36:19 +0000 (16:36 +0800)]
LoongArch: Use generic BUG() handler
Inspired by commit 9fb7410f955("arm64/BUG: Use BRK instruction for
generic BUG traps"), do similar for LoongArch to use generic BUG()
handler.
This patch uses the BREAK software breakpoint instruction to generate
a trap instead, similarly to most other arches, with the generic BUG
code generating the dmesg boilerplate.
This allows bug metadata to be moved to a separate table and reduces
the amount of inline code at BUG() and WARN() sites. This also avoids
clobbering any registers before they can be dumped.
To mitigate the size of the bug table further, this patch makes use of
the existing infrastructure for encoding addresses within the bug table
as 32-bit relative pointers instead of absolute pointers.
(Note: this limits the max kernel size to 2GB.)
Before patch:
[ 3018.338013] lkdtm: Performing direct entry BUG
[ 3018.342445] Kernel bug detected[#5]:
[ 3018.345992] CPU: 2 PID: 865 Comm: cat Tainted: G D 6.0.0-rc6+ #35
After patch:
[ 125.585985] lkdtm: Performing direct entry BUG
[ 125.590433] ------------[ cut here ]------------
[ 125.595020] kernel BUG at drivers/misc/lkdtm/bugs.c:78!
[ 125.600211] Oops - BUG[#1]:
[ 125.602980] CPU: 3 PID: 410 Comm: cat Not tainted 6.0.0-rc6+ #36
Out-of-line file/line data information obtained compared to before.
Huacai Chen [Wed, 12 Oct 2022 08:36:14 +0000 (16:36 +0800)]
LoongArch: Add qspinlock support
On NUMA system, the performance of qspinlock is better than generic
spinlock. Below is the UnixBench test results on a 8 nodes (4 cores
per node, 32 cores in total) machine.
Huacai Chen [Wed, 12 Oct 2022 08:36:14 +0000 (16:36 +0800)]
LoongArch: Use TLB for ioremap()
We can support more cache attributes (e.g., CC, SUC and WUC) and page
protection when we use TLB for ioremap(). The implementation is based
on GENERIC_IOREMAP.
The existing simple ioremap() implementation has better performance so
we keep it and introduce ARCH_IOREMAP to control the selection.
We move pagetable_init() earlier to make early ioremap() works, and we
modify the PCI ecam mapping because the TLB-based version of ioremap()
will actually take the size into account.
Huacai Chen [Wed, 12 Oct 2022 08:36:14 +0000 (16:36 +0800)]
LoongArch: Support access filter to /dev/mem interface
Accidental access to /dev/mem is obviously disastrous, but specific
access can be used by people debugging the kernel. So select GENERIC_
LIB_DEVMEM_IS_ALLOWED, as well as define ARCH_HAS_VALID_PHYS_ADDR_RANGE
and related helpers, to support access filter to /dev/mem interface.
Huacai Chen [Wed, 12 Oct 2022 08:36:14 +0000 (16:36 +0800)]
LoongArch: Refactor cache probe and flush methods
Current cache probe and flush methods have some drawbacks:
1, Assume there are 3 cache levels and only 3 levels;
2, Assume L1 = I + D, L2 = V, L3 = S, V is exclusive, S is inclusive.
However, the fact is I + D, I + D + V, I + D + S and I + D + V + S are
all valid. So, refactor the cache probe and flush methods to adapt more
types of cache hierarchy.
Rui Wang [Wed, 12 Oct 2022 08:36:14 +0000 (16:36 +0800)]
LoongArch: mm: Refactor TLB exception handlers
This patch simplifies TLB load, store and modify exception handlers:
1. Reduce instructions, such as alu/csr and memory access;
2. Execute tlb search instruction only in the fast path;
3. Return directly from the fast path for both normal and huge pages;
4. Re-tab the assembly for better vertical alignment.
And fixes the concurrent modification issue of fast path for huge pages.
This issue will occur in the following steps:
CPU-1 (In TLB exception) CPU-2 (In THP splitting)
1: Load PMD entry (HUGE=1)
2: Goto huge path
3: Store PMD entry (HUGE=0)
4: Reload PMD entry (HUGE=0)
5: Fill TLB entry (PA is incorrect)
This patch also slightly improves the TLB processing performance:
Xi Ruoyao [Wed, 12 Oct 2022 08:36:14 +0000 (16:36 +0800)]
LoongArch: Support R_LARCH_GOT_PC_{LO12,HI20} in modules
GCC >= 13 and GNU assembler >= 2.40 use these relocations to address
external symbols, so we need to add them.
Let the module loader emit GOT entries for data symbols so we would be
able to handle GOT relocations. The GOT entry is just the data's symbol
address.
In module.lds, emit a stub .got section for a section header entry. The
actual content of the section entry will be filled at runtime by module_
frob_arch_sections().
Xi Ruoyao [Wed, 12 Oct 2022 08:36:08 +0000 (16:36 +0800)]
LoongArch: Adjust symbol addressing for AS_HAS_EXPLICIT_RELOCS
If explicit relocation hints are used by the toolchain, -Wa,-mla-*
options will be useless for the C code. So only use them for the
!CONFIG_AS_HAS_EXPLICIT_RELOCS case.
Replace "la" with "la.pcrel" in head.S to keep the semantic consistent
with new and old toolchains for the low level startup code.
For per-CPU variables, the "address" of the symbol is actually an offset
from $r21. The value is near the loading address of main kernel image,
but far from the loading address of modules. So we use model("extreme")
attibute to tell the compiler that a PC-relative addressing with 32-bit
offset is not sufficient for local per-CPU variables.
The behavior with different assemblers and compilers are summarized in
the following table:
AS has CC has
explicit relocs explicit relocs * Behavior
==============================================================
No No Use la.* macros.
No change from Linux 6.0.
--------------------------------------------------------------
No Yes Disable explicit relocs.
No change from Linux 6.0.
--------------------------------------------------------------
Yes No Not supported.
--------------------------------------------------------------
Yes Yes Enable explicit relocs.
No -Wa,-mla* options used.
==============================================================
*: We assume CC must have model attribute if it has explicit relocs.
Both features are added in GCC 13 development cycle, so any GCC
release >= 13 should be OK. Using early GCC 13 development snapshots
may produce modules with unsupported relocations.
GNU as >= 2.40 and GCC >= 13 will support using explicit relocation
hints in the assembly code, instead of la.* macros. The usage of
explicit relocation hints can improve code generation so it's enabled
by default by GCC >= 13.
Introduce a Kconfig option AS_HAS_EXPLICIT_RELOCS as the switch for
"use explicit relocation hints or not".
Huacai Chen [Wed, 12 Oct 2022 08:36:08 +0000 (16:36 +0800)]
LoongArch: Mark __xchg() and __cmpxchg() as __always_inline
Commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") allows compiler to uninline functions marked as 'inline'.
In case of __xchg()/__cmpxchg() this would cause to reference
BUILD_BUG(), which is an error case for catching bugs and will not
happen for correct code, if __xchg()/__cmpxchg() is inlined.
This bug can be produced with CONFIG_DEBUG_SECTION_MISMATCH enabled,
and the solution is similar to below commits: 46f1619500d0225 ("MIPS: include: Mark __xchg as __always_inline"), 88356d09904bc60 ("MIPS: include: Mark __cmpxchg as __always_inline").
Huacai Chen [Wed, 12 Oct 2022 08:36:08 +0000 (16:36 +0800)]
LoongArch: Flush TLB earlier at initialization
Move local_flush_tlb_all() earlier (just after setup_ptwalker() and
before page allocation). This can avoid stale TLB entries misguiding
the later page allocation. Without this patch the second kernel of
kexec/kdump fails to boot SMP.
BTW, move output_pgtable_bits_defines() into tlb_init() since it has
nothing to do with tlb handler setup.
Tiezhu Yang [Wed, 12 Oct 2022 08:36:08 +0000 (16:36 +0800)]
LoongArch: Do not create sysfs control file for io master CPUs
Now io master CPUs are not hotpluggable on LoongArch, but in the current
code only /sys/devices/system/cpu/cpu0/online is not created. Let us set
the hotpluggable field of all the io master CPUs as 0, then prevent to
create sysfs control file for all the io master CPUs which confuses some
user space tools. This is similar with commit 9cce844abf07 ("MIPS: CPU#0
is not hotpluggable").
Eric Dumazet [Tue, 11 Oct 2022 22:07:48 +0000 (15:07 -0700)]
tcp: cdg: allow tcp_cdg_release() to be called multiple times
Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]
Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.
[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567
David S. Miller [Wed, 12 Oct 2022 08:10:02 +0000 (09:10 +0100)]
Merge branch 'inet-ping-fixes'
Eric Dumazet says:
====================
inet: ping: give ping some care
First patch fixes an ipv6 ping bug that has been there forever,
for large sizes.
Second patch fixes a recent and elusive bug, that can potentially
crash the host. This is what I mentioned privately to Paolo and
Jakub at LPC in Dublin.
====================
Eric Dumazet [Tue, 11 Oct 2022 21:27:29 +0000 (14:27 -0700)]
inet: ping: fix recent breakage
Blamed commit broke the assumption used by ping sendmsg() that
allocated skb would have MAX_HEADER bytes in skb->head.
This patch changes the way ping works, by making sure
the skb head contains space for the icmp header,
and adjusting ping_getfrag() which was desperate
about going past the icmp header :/
This is adopting what UDP does, mostly.
syzbot is able to crash a host using both kfence and following repro in a loop.
When kfence triggers, skb->head only has 64 bytes, immediately followed
by struct skb_shared_info (no extra headroom based on ksize(ptr))
Then icmpv6_push_pending_frames() is overwriting first bytes
of skb_shinfo(skb), making nr_frags bigger than MAX_SKB_FRAGS,
and/or setting shinfo->gso_size to a non zero value.
If nr_frags is mangled, a crash happens in skb_release_data()
If gso_size is mangled, we have the following report:
net: ethernet: ti: am65-cpsw: set correct devlink flavour for unused ports
am65_cpsw_nuss_register_ndevs() skips calling devlink_port_type_eth_set()
for ports without assigned netdev, triggering the following warning when
DEVLINK_PORT_TYPE_WARN_TIMEOUT elapses after 3600s:
Type was not set for devlink port.
WARNING: CPU: 0 PID: 129 at net/core/devlink.c:8095 devlink_port_type_warn+0x18/0x30
Fixes: 0680e20af5fb ("net: ethernet: ti: am65-cpsw: Fix devlink port register sequence") Signed-off-by: Matthias Schiffer <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
The Freescale/NXP i.MX Messaging Unit is only present on Freescale/NXP
i.MX SoCs. Hence add a dependency on ARCH_MXC, to prevent asking the
user about this driver when configuring a kernel without Freescale/NXP
i.MX SoC family support.
While at it, expand "MU" to "Messaging Unit" in the help text.
Stefan Binding [Tue, 11 Oct 2022 14:35:52 +0000 (15:35 +0100)]
ALSA: hda: cs35l41: Support System Suspend
Add support for system suspend into the CS35L41 HDA Driver.
Since S4 suspend may power off the system, it is required
that the driver ensure the part is safe to be shutdown before
system suspend, as well as ensuring that the firmware is
unloaded before shutdown. The part must then be restored
on system resume, including re-downloading the firmware.
The current code uses calls from the HDA Codec driver to
determine when to suspend/resume by calling hooks via the
hda_component binding.
However, this means the cs35l41 driver relies on the HDA
Codec driver to tell it when to suspend or resume,
creating an additional external dependency, and potentially
creating race conditions in the future. It is better for
the cs35l41 hda driver to decide for itself when the part
should be suspended or resumed.
This makes supporting system suspend easier.
ALSA: hda/cs_dsp_ctl: Fix mutex inversion when creating controls
Redesign the creation of ALSA controls so that the cs_dsp
pwr_lock is not held when calling snd_ctl_add(). Instead of
creating the ALSA control from the cs_dsp control_add callback,
do it after cs_dsp_power_up() has completed. The existing
functions are changed to return void instead of passing errors
back - this duplicates the original behaviour, as cs_dsp does
not abort firmware load if creation of a control fails.
It is safe to walk the control list without taking any mutex
provided that the caller is not trying to load a new firmware
or remove the driver in parallel. There is no other situation
that the list can change. So the caller can trigger creation
of ALSA controls after cs_dsp_power_up() has returned. A cs_dsp
control will have a non-NULL priv pointer if we have created
an ALSA control.
With the previous code the ALSA controls were created from
the cs_dsp control_add callback. But this is called with
pwr_lock held (as it is part of the DSP power-up sequence).
The kernel lock checking will show a mutex inversion between
this and the control creation path:
control_add
pwr_lock held, takes controls_rwsem (in snd_ctl_add)
get/put
controls_rwsem held, takes pwr_lock to call cs_dsp.
This is not completely theoretical. Although the time window
is very small, it is possible for these to run in parallel
and deadlock the old implementation.
Stefan Binding [Tue, 11 Oct 2022 14:35:48 +0000 (15:35 +0100)]
ALSA: hda: hda_cs_dsp_ctl: Minor clean and redundant code removal
The cs_dsp core will return an error if passed a NULL cs_dsp struct so
there is no need for the hda_cs_dsp_write|read_ctl functions to manually
check that. The cs_dsp core will also check the data is within bounds of
the control so the additional bounds check is redundant too. Simplify
things a bit by removing said code.
Linus Torvalds [Wed, 12 Oct 2022 03:48:55 +0000 (20:48 -0700)]
Merge tag 'memblock-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
"Test suite improvements:
- Added verification that memblock allocations zero the allocated
memory
- Added more test cases for memblock_add(), memblock_remove(),
memblock_reserve() and memblock_free()
- Added tests for memblock_*_raw() family
- Added tests for NUMA-aware allocations in memblock_alloc_try_nid()
and memblock_alloc_try_nid_raw()"
* tag 'memblock-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock tests: add generic NUMA tests for memblock_alloc_try_nid*
memblock tests: add bottom-up NUMA tests for memblock_alloc_try_nid*
memblock tests: add top-down NUMA tests for memblock_alloc_try_nid*
memblock tests: add simulation of physical memory with multiple NUMA nodes
memblock_tests: move variable declarations to single block
memblock tests: remove 'cleared' from comment blocks
memblock tests: add tests for memblock_trim_memory
memblock tests: add tests for memblock_*bottom_up functions
memblock tests: update alloc_nid_api to test memblock_alloc_try_nid_raw
memblock tests: update alloc_api to test memblock_alloc_raw
memblock tests: add additional tests for basic api and memblock_alloc
memblock tests: add labels to verbose output for generic alloc tests
memblock tests: update zeroed memory check for memblock_alloc_* tests
memblock tests: update tests to check if memblock_alloc zeroed memory
memblock tests: update reference to obsolete build option in comments
memblock tests: add command line help option
Linus Torvalds [Wed, 12 Oct 2022 03:07:44 +0000 (20:07 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
"The main batch of ARM + RISC-V changes, and a few fixes and cleanups
for x86 (PMU virtualization and selftests).
ARM:
- Fixes for single-stepping in the presence of an async exception as
well as the preservation of PSTATE.SS
- Better handling of AArch32 ID registers on AArch64-only systems
- Fixes for the dirty-ring API, allowing it to work on architectures
with relaxed memory ordering
- Advertise the new kvmarm mailing list
- Various minor cleanups and spelling fixes
RISC-V:
- Improved instruction encoding infrastructure for instructions not
yet supported by binutils
- Svinval support for both KVM Host and KVM Guest
- Zihintpause support for KVM Guest
- Zicbom support for KVM Guest
- Record number of signal exits as a VCPU stat
- Use generic guest entry infrastructure
x86:
- Misc PMU fixes and cleanups.
- selftests: fixes for Hyper-V hypercall
- selftests: fix nx_huge_pages_test on TDP-disabled hosts
- selftests: cleanups for fix_hypercall_test"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (57 commits)
riscv: select HAVE_POSIX_CPU_TIMERS_TASK_WORK
RISC-V: KVM: Use generic guest entry infrastructure
RISC-V: KVM: Record number of signal exits as a vCPU stat
RISC-V: KVM: add __init annotation to riscv_kvm_init()
RISC-V: KVM: Expose Zicbom to the guest
RISC-V: KVM: Provide UAPI for Zicbom block size
RISC-V: KVM: Make ISA ext mappings explicit
RISC-V: KVM: Allow Guest use Zihintpause extension
RISC-V: KVM: Allow Guest use Svinval extension
RISC-V: KVM: Use Svinval for local TLB maintenance when available
RISC-V: Probe Svinval extension form ISA string
RISC-V: KVM: Change the SBI specification version to v1.0
riscv: KVM: Apply insn-def to hlv encodings
riscv: KVM: Apply insn-def to hfence encodings
riscv: Introduce support for defining instructions
riscv: Add X register names to gpr-nums
KVM: arm64: Advertise new kvmarm mailing list
kvm: vmx: keep constant definition format consistent
kvm: mmu: fix typos in struct kvm_arch
KVM: selftests: Fix nx_huge_pages_test on TDP-disabled hosts
...
riscv: always honor the CONFIG_CMDLINE_FORCE when parsing dtb
When CONFIG_CMDLINE_FORCE is enabled, cmdline provided by
CONFIG_CMDLINE are always used. This allows CONFIG_CMDLINE to be
used regardless of the result of device tree scanning.
This especially fixes the case where a device tree without the
chosen node is supplied to the kernel. In such cases,
early_init_dt_scan would return true. But inside
early_init_dt_scan_chosen, the cmdline won't be updated as there
is no chosen node in the device tree. As a result, CONFIG_CMDLINE
is not copied into boot_command_line even if CONFIG_CMDLINE_FORCE
is enabled. This commit allows properly update boot_command_line
in this situation.
Ryusuke Konishi [Fri, 7 Oct 2022 08:52:26 +0000 (17:52 +0900)]
nilfs2: fix leak of nilfs_root in case of writer thread creation failure
If nilfs_attach_log_writer() failed to create a log writer thread, it
frees a data structure of the log writer without any cleanup. After
commit e912a5b66837 ("nilfs2: use root object to get ifile"), this causes
a leak of struct nilfs_root, which started to leak an ifile metadata inode
and a kobject on that struct.
In addition, if the kernel is booted with panic_on_warn, the above
ifile metadata inode leak will cause the following panic when the
nilfs2 kernel module is removed:
Ryusuke Konishi [Sun, 2 Oct 2022 03:08:04 +0000 (12:08 +0900)]
nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
If the i_mode field in inode of metadata files is corrupted on disk, it
can cause the initialization of bmap structure, which should have been
called from nilfs_read_inode_common(), not to be called. This causes a
lockdep warning followed by a NULL pointer dereference at
nilfs_bmap_lookup_at_level().
This patch fixes these issues by adding a missing sanitiy check for the
i_mode field of metadata file's inode.
Ryusuke Konishi [Mon, 3 Oct 2022 15:05:19 +0000 (00:05 +0900)]
nilfs2: fix use-after-free bug of struct nilfs_root
If the beginning of the inode bitmap area is corrupted on disk, an inode
with the same inode number as the root inode can be allocated and fail
soon after. In this case, the subsequent call to nilfs_clear_inode() on
that bogus root inode will wrongly decrement the reference counter of
struct nilfs_root, and this will erroneously free struct nilfs_root,
causing kernel oopses.
This fixes the problem by changing nilfs_new_inode() to skip reserved
inode numbers while repairing the inode bitmap.
SeongJae Park [Sun, 2 Oct 2022 19:31:30 +0000 (19:31 +0000)]
mm/damon/core: initialize damon_target->list in damon_new_target()
'struct damon_target' creation function, 'damon_new_target()' is not
initializing its '->list' field, unlike other DAMON structs creator
functions such as 'damon_new_region()'. Normal users of
'damon_new_target()' initializes the field by adding the target to DAMON
context's targets list, but some code could access the uninitialized
field.
This commit avoids the case by initializing the field in
'damon_new_target()'.
Baolin Wang [Thu, 1 Sep 2022 10:41:31 +0000 (18:41 +0800)]
mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and
1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.
So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
hugetlb in follow_page_pte(). However this pte entry lock is incorrect
for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
the correct lock, which is mm->page_table_lock.
That means the pte entry of the CONT-PTE size hugetlb under current pte
lock is unstable in follow_page_pte(), we can continue to migrate or
poison the pte entry of the CONT-PTE size hugetlb, which can cause some
potential race issues, even though they are under the 'pte lock'.
For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
page by move_pages() syscall under the lock, however antoher thread B can
migrate the CONT-PTE hugetlb page at the same time, which will cause
thread A to get an incorrect page, if thread A also wants to do page
migration, then data inconsistency error occurs.
Moreover we have the same issue for CONT-PMD size hugetlb in
follow_huge_pmd().
To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte()
to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to
get the correct pte entry lock to make the pte entry stable.
Mike said:
Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb:
refactor find_num_contig()"). Patch series "Support for contiguous pte
hugepages", v4. However, I do not believe these code paths were
executed until migration support was added with 5480280d3f2d ("arm64/mm:
enable HugeTLB migration for contiguous bit HugeTLB pages") I would go
with 5480280d3f2d for the Fixes: targe.
Jakub Kicinski [Wed, 12 Oct 2022 02:04:22 +0000 (19:04 -0700)]
Merge tag 'wireless-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.1
First set of fixes for v6.1. Quite a lot of fixes in stack but also
for mt76.
cfg80211/mac80211
- fix locking error in mac80211's hw addr change
- fix TX queue stop for internal TXQs
- handling of very small (e.g. STP TCN) packets
- two memcpy() hardening fixes
- fix probe request 6 GHz capability warning
- fix various connection prints
- fix decapsulation offload for AP VLAN
mt76
- fix rate reporting, LLC packets and receive checksum offload on specific chipsets
iwlwifi
- fix crash due to list corruption
ath11k
- fix a compiler warning with GCC 11 and KASAN
* tag 'wireless-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: ath11k: mac: fix reading 16 bytes from a region of size 0 warning
wifi: iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (other cases)
wifi: mt76: fix rx checksum offload on mt7615/mt7915/mt7921
wifi: mt76: fix receiving LLC packets on mt7615/mt7915
wifi: nl80211: Split memcpy() of struct nl80211_wowlan_tcp_data_token flexible array
wifi: wext: use flex array destination for memcpy()
wifi: cfg80211: fix ieee80211_data_to_8023_exthdr handling of small packets
wifi: mac80211: netdev compatible TX stop for iTXQ drivers
wifi: mac80211: fix decap offload for stations on AP_VLAN interfaces
wifi: mac80211: unlock on error in ieee80211_can_powered_addr_change()
wifi: mac80211: remove/avoid misleading prints
wifi: mac80211: fix probe req HE capabilities access
wifi: mac80211: do not drop packets smaller than the LLC-SNAP header on fast-rx
wifi: mt76: fix rate reporting / throughput regression on mt7915 and newer
====================
Tiezhu Yang [Fri, 2 Sep 2022 03:41:46 +0000 (11:41 +0800)]
include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
The argument has_signal of arch_do_signal_or_restart() has been removed in
commit 8ba62d37949e ("task_work: Call tracehook_notify_signal from
get_signal on all architectures"), let us remove the related comment.
They must be empty (excluding vsyscall page) or full of zeroes.
Retroactively this test should've caught embarassing /proc/*/smaps_rollup
oops:
[17752.703567] BUG: kernel NULL pointer dereference, address: 0000000000000000
[17752.703580] #PF: supervisor read access in kernel mode
[17752.703583] #PF: error_code(0x0000) - not-present page
[17752.703587] PGD 0 P4D 0
[17752.703593] Oops: 0000 [#1] PREEMPT SMP PTI
[17752.703598] CPU: 0 PID: 60649 Comm: cat Tainted: G W 5.19.9-100.fc35.x86_64 #1
[17752.703603] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme6/3.1, BIOS P3.30 08/05/2016
[17752.703607] RIP: 0010:show_smaps_rollup+0x159/0x2e0
Note 1:
ProtectionKey field in /proc/*/smaps is optional,
so check most of its contents, not everything.
Note 2:
due to the nature of this test, child process hardly can signal
its readiness (after unmapping everything!) to parent.
I feel like "sleep(1)" is justified.
If you know how to do it without sleep please tell me.
Clean up config files by:
- removing configs that were deleted in the past
- removing configs not in tree and without recently pending patches
- adding new configs that are replacements for old configs in the file
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
If creation or finalization of a checkpoint fails due to anomalies in the
checkpoint metadata on disk, a kernel warning is generated.
This patch replaces the WARN_ONs by nilfs_error, so that a kernel, booted
with panic_on_warn, does not panic. A nilfs_error is appropriate here to
handle the abnormal filesystem condition.
This also replaces the detected error codes with an I/O error so that
neither of the internal error codes is returned to callers.