Thomas Gleixner [Tue, 13 Aug 2024 22:29:36 +0000 (00:29 +0200)]
x86/kaslr: Expose and use the end of the physical memory address space
iounmap() on x86 occasionally fails to unmap because the provided valid
ioremap address is not below high_memory. It turned out that this
happens due to KASLR.
KASLR uses the full address space between PAGE_OFFSET and vaddr_end to
randomize the starting points of the direct map, vmalloc and vmemmap
regions. It thereby limits the size of the direct map by using the
installed memory size plus an extra configurable margin for hot-plug
memory. This limitation is done to gain more randomization space
because otherwise only the holes between the direct map, vmalloc,
vmemmap and vaddr_end would be usable for randomizing.
The limited direct map size is not exposed to the rest of the kernel, so
the memory hot-plug and resource management related code paths still
operate under the assumption that the available address space can be
determined with MAX_PHYSMEM_BITS.
request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
downwards. That means the first allocation happens past the end of the
direct map and if unlucky this address is in the vmalloc space, which
causes high_memory to become greater than VMALLOC_START and consequently
causes iounmap() to fail for valid ioremap addresses.
MAX_PHYSMEM_BITS cannot be changed for that because the randomization
does not align with address bit boundaries and there are other places
which actually require to know the maximum number of address bits. All
remaining usage sites of MAX_PHYSMEM_BITS have been analyzed and found
to be correct.
Cure this by exposing the end of the direct map via PHYSMEM_END and use
that for the memory hot-plug and resource management related places
instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END
maps to a variable which is initialized by the KASLR initialization and
otherwise it is based on MAX_PHYSMEM_BITS as before.
To prevent future hickups add a check into add_pages() to catch callers
trying to add memory above PHYSMEM_END.
Mitchell Levy [Mon, 12 Aug 2024 20:44:12 +0000 (13:44 -0700)]
x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported
There are two distinct CPU features related to the use of XSAVES and LBR:
whether LBR is itself supported and whether XSAVES supports LBR. The LBR
subsystem correctly checks both in intel_pmu_arch_lbr_init(), but the
XSTATE subsystem does not.
The LBR bit is only removed from xfeatures_mask_independent when LBR is not
supported by the CPU, but there is no validation of XSTATE support.
If XSAVES does not support LBR the write to IA32_XSS causes a #GP fault,
leaving the state of IA32_XSS unchanged, i.e. zero. The fault is handled
with a warning and the boot continues.
Consequently the next XRSTORS which tries to restore supervisor state fails
with #GP because the RFBM has zero for all supervisor features, which does
not match the XCOMP_BV field.
As XFEATURE_MASK_FPSTATE includes supervisor features setting up the FPU
causes a #GP, which ends up in fpu_reset_from_exception_fixup(). That fails
due to the same problem resulting in recursive #GPs until the kernel runs
out of stack space and double faults.
Prevent this by storing the supported independent features in
fpu_kernel_cfg during XSTATE initialization and use that cached value for
retrieving the independent feature bits to be written into IA32_XSS.
Yuntao Wang [Tue, 13 Aug 2024 01:48:27 +0000 (09:48 +0800)]
x86/apic: Make x2apic_disable() work correctly
x2apic_disable() clears x2apic_state and x2apic_mode unconditionally, even
when the state is X2APIC_ON_LOCKED, which prevents the kernel to disable
it thereby creating inconsistent state.
Due to the early state check for X2APIC_ON, the code path which warns about
a locked X2APIC cannot be reached.
Test for state < X2APIC_ON instead and move the clearing of the state and
mode variables to the place which actually disables X2APIC.
[ tglx: Massaged change log. Added Fixes tag. Moved clearing so it's at the
right place for back ports ]
Linus Torvalds [Sun, 11 Aug 2024 17:20:29 +0000 (10:20 -0700)]
Merge tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Fix 32-bit PTI for real.
pti_clone_entry_text() is called twice, once before initcalls so that
initcalls can use the user-mode helper and then again after text is
set read only. Setting read only on 32-bit might break up the PMD
mapping, which makes the second invocation of pti_clone_entry_text()
find the mappings out of sync and failing.
Allow the second call to split the existing PMDs in the user mapping
and synchronize with the kernel mapping.
- Don't make acpi_mp_wake_mailbox read-only after init as the mail box
must be writable in the case that CPU hotplug operations happen after
boot. Otherwise the attempt to start a CPU crashes with a write to
read only memory.
- Add a missing sanity check in mtrr_save_state() to ensure that the
fixed MTRR MSRs are supported.
Otherwise mtrr_save_state() ends up in a #GP, which is fixed up, but
the WARN_ON() can bring systems down when panic on warn is set.
* tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Check if fixed MTRRs exist before saving them
x86/paravirt: Fix incorrect virt spinlock setting on bare metal
x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailbox
x86/mm: Fix PTI for i386 some more
Linus Torvalds [Sun, 11 Aug 2024 17:15:34 +0000 (10:15 -0700)]
Merge tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull time keeping fixes from Thomas Gleixner:
- Fix a couple of issues in the NTP code where user supplied values are
neither sanity checked nor clamped to the operating range. This
results in integer overflows and eventualy NTP getting out of sync.
According to the history the sanity checks had been removed in favor
of clamping the values, but the clamping never worked correctly under
all circumstances. The NTP people asked to not bring the sanity
checks back as it might break existing applications.
Make the clamping work correctly and add it where it's missing
- If adjtimex() sets the clock it has to trigger the hrtimer subsystem
so it can adjust and if the clock was set into the future expire
timers if needed. The caller should provide a bitmask to tell
hrtimers which clocks have been adjusted.
adjtimex() uses not the proper constant and uses CLOCK_REALTIME
instead, which is 0. So hrtimers adjusts only the clocks, but does
not check for expired timers, which might make them expire really
late. Use the proper bitmask constant instead.
* tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix bogus clock_was_set() invocation in do_adjtimex()
ntp: Safeguard against time_constant overflow
ntp: Clamp maxerror and esterror to operating range
Linus Torvalds [Sun, 11 Aug 2024 17:07:52 +0000 (10:07 -0700)]
Merge tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Three small fixes for interrupt core and drivers:
- The interrupt core fails to honor caller supplied affinity hints
for non-managed interrupts and uses the system default affinity on
startup instead. Set the missing flag in the descriptor to tell the
core to use the provided affinity.
- Fix a shift out of bounds error in the Xilinx driver
- Handle switching to level trigger correctly in the RISCV APLIC
driver. It failed to retrigger the interrupt which causes it to
become stale"
* tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/riscv-aplic: Retrigger MSI interrupt on source configuration
irqchip/xilinx: Fix shift out of bounds
genirq/irqdesc: Honor caller provided affinity in alloc_desc()
Linus Torvalds [Sun, 11 Aug 2024 16:55:32 +0000 (09:55 -0700)]
Merge tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for reported issues for
6.11-rc3. Included in here are:
- usb serial driver MODULE_DESCRIPTION() updates
- usb serial driver fixes
- typec driver fixes
- usb-ip driver fix
- gadget driver fixes
- dt binding update
All of these have been in linux-next with no reported issues"
* tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: ucsi: Fix a deadlock in ucsi_send_command_common()
usb: typec: tcpm: avoid sink goto SNK_UNATTACHED state if not received source capability message
usb: gadget: f_fs: pull out f->disable() from ffs_func_set_alt()
usb: gadget: f_fs: restore ffs_func_disable() functionality
USB: serial: debug: do not echo input by default
usb: typec: tipd: Delete extra semi-colon
usb: typec: tipd: Fix dereferencing freeing memory in tps6598x_apply_patch()
usb: gadget: u_serial: Set start_delayed during suspend
usb: typec: tcpci: Fix error code in tcpci_check_std_output_cap()
usb: typec: fsa4480: Check if the chip is really there
usb: gadget: core: Check for unset descriptor
usb: vhci-hcd: Do not drop references before new references are gained
usb: gadget: u_audio: Check return codes from usb_ep_enable and config_ep_by_speed.
usb: gadget: midi2: Fix the response for FB info with block 0xff
dt-bindings: usb: microchip,usb2514: Add USB2517 compatible
USB: serial: garmin_gps: use struct_size() to allocate pkt
USB: serial: garmin_gps: annotate struct garmin_packet with __counted_by
USB: serial: add missing MODULE_DESCRIPTION() macros
USB: serial: spcp8x5: remove unused struct 'spcp8x5_usb_ctrl_arg'
Linus Torvalds [Sun, 11 Aug 2024 16:51:29 +0000 (09:51 -0700)]
Merge tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for reported problems
for 6.11-rc3. Included in here are:
- sc16is7xx serial driver fixes
- uartclk bugfix for a divide by zero issue
- conmakehash userspace build issue fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: vt: conmakehash: cope with abs_srctree no longer in env
serial: sc16is7xx: fix invalid FIFO access with special register set
serial: sc16is7xx: fix TX fifo corruption
serial: core: check uartclk for zero to avoid divide by zero
Linus Torvalds [Sun, 11 Aug 2024 16:38:38 +0000 (09:38 -0700)]
Merge tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core / documentation fixes from Greg KH:
"Here are some small fixes, and some documentation updates for
6.11-rc3. Included in here are:
- embargoed hardware documenation updates based on a lot of review by
legal-types in lots of companies to try to make the process a _bit_
easier for us to manage over time.
- rust firmware documentation fix
- driver detach race fix for the fix that went into 6.11-rc1
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Fix uevent_show() vs driver detach race
Documentation: embargoed-hardware-issues.rst: add a section documenting the "early access" process
Documentation: embargoed-hardware-issues.rst: minor cleanups and fixes
rust: firmware: fix invalid rustdoc link
Linus Torvalds [Sun, 11 Aug 2024 16:28:04 +0000 (09:28 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two core fixes: one to prevent discard type changes (seen on iSCSI)
during intermittent errors and the other is fixing a lockdep problem
caused by the queue limits change.
And one driver fix in ufs"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Keep the discard mode stable
scsi: sd: Move sd_read_cpr() out of the q->limits_lock region
scsi: ufs: core: Fix hba->last_dme_cmd_tstamp timestamp updating logic
Linus Torvalds [Sat, 10 Aug 2024 17:44:21 +0000 (10:44 -0700)]
Merge tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Two minor fixes for recent changes
* tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't set SVC_SOCK_ANONYMOUS when creating nfsd sockets
sunrpc: avoid -Wformat-security warning
Linus Torvalds [Sat, 10 Aug 2024 17:28:52 +0000 (10:28 -0700)]
Merge tag 'i2c-for-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- Two fixes for SMBusAlert handling in the I2C core: one to avoid an
endless loop when scanning for handlers and one to make sure handlers
are always called even if HW has broken behaviour
- I2C header build fix for when ACPI is enabled but I2C isn't
- The testunit gets a rename in the code to match the documentation
- Two fixes for the Qualcomm GENI I2C controller are cleaning up the
error exit patch in the runtime_resume() function. The first is
disabling the clock, the second disables the icc on the way out
* tag 'i2c-for-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: testunit: match HostNotify test name with docs
i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
i2c: qcom-geni: Add missing clk_disable_unprepare in geni_i2c_runtime_resume
i2c: Fix conditional for substituting empty ACPI functions
i2c: smbus: Send alert notifications to all devices if source not found
i2c: smbus: Improve handling of stuck alerts
Linus Torvalds [Sat, 10 Aug 2024 17:19:05 +0000 (10:19 -0700)]
Merge tag 'dma-mapping-6.11-2024-08-10' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- avoid a deadlock with dma-debug and netconsole (Rik van Riel)
* tag 'dma-mapping-6.11-2024-08-10' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: avoid deadlock between dma debug vs printk and netconsole
Linus Torvalds [Sat, 10 Aug 2024 17:06:26 +0000 (10:06 -0700)]
Merge tag 'bcachefs-2024-08-10' of git://evilpiepirate.org/bcachefs
Pull more bcachefs fixes from Kent Overstreet:
"A couple last minute fixes for the new disk accounting
- fix a bug that was causing ACLs to seemingly "disappear"
- new on disk format version, bcachefs_metadata_version_disk_accounting_v3
bcachefs_metadata_version_disk_accounting_v2 accidentally included
padding in disk_accounting_key; fortunately, 6.11 isn't out yet so
we can fix this with another version bump"
* tag 'bcachefs-2024-08-10' of git://evilpiepirate.org/bcachefs:
bcachefs: bcachefs_metadata_version_disk_accounting_v3
bcachefs: improve bch2_dev_usage_to_text()
bcachefs: bch2_accounting_invalid()
bcachefs: Switch to .get_inode_acl()
Yong-Xuan Wang [Fri, 9 Aug 2024 07:10:47 +0000 (15:10 +0800)]
irqchip/riscv-aplic: Retrigger MSI interrupt on source configuration
The section 4.5.2 of the RISC-V AIA specification says that "any write
to a sourcecfg register of an APLIC might (or might not) cause the
corresponding interrupt-pending bit to be set to one if the rectified
input value is high (= 1) under the new source mode."
When the interrupt type is changed in the sourcecfg register, the APLIC
device might not set the corresponding pending bit, so the interrupt might
never become pending.
To handle sourcecfg register changes for level-triggered interrupts in MSI
mode, manually set the pending bit for retriggering interrupt so it gets
retriggered if it was already asserted.
The device tree property 'xlnx,kind-of-intr' is sanity checked that the
bitmask contains only set bits which are in the range of the number of
interrupts supported by the controller.
The check is done by shifting the mask right by the number of supported
interrupts and checking the result for zero.
The data type of the mask is u32 and the number of supported interrupts is
up to 32. In case of 32 interrupts the shift is out of bounds, resulting in
a mismatch warning. The out of bounds condition is also reported by UBSAN:
UBSAN: shift-out-of-bounds in irq-xilinx-intc.c:332:22
shift exponent 32 is too large for 32-bit type 'unsigned int'
Linus Torvalds [Sat, 10 Aug 2024 04:33:25 +0000 (21:33 -0700)]
Merge tag '6.11-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- DFS fix
- fix for security flags for requiring encryption
- minor cleanup
* tag '6.11-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: cifs_inval_name_dfs_link_error: correct the check for fullpath
Fix spelling errors in Server Message Block
smb3: fix setting SecurityFlags when encryption is required
Linus Torvalds [Sat, 10 Aug 2024 04:26:50 +0000 (21:26 -0700)]
Merge tag 'spi-fix-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few SPI fixes: clock rate calculation fixes for the Kunpeng and lpsi
drivers and a missing registration of a device ID for spidev (which
had only been updated for DT cases, causing warnings)"
* tag 'spi-fix-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-fsl-lpspi: Fix scldiv calculation
spi: spidev: Add missing spi_device_id for bh2228fv
spi: hisi-kunpeng: Add verification for the max_frequency provided by the firmware
spi: hisi-kunpeng: Add validation for the minimum value of speed_hz
bcachefs_metadata_version_disk_accounting_v2 erroneously had padding
bytes in disk_accounting_key, which is a problem because we have to
guarantee that all unused bytes in disk_accounting_key are zeroed.
Fortunately 6.11 isn't out yet, so it's cheap to fix this by spinning a
new version.
Linus Torvalds [Fri, 9 Aug 2024 21:00:22 +0000 (14:00 -0700)]
Merge tag 'drm-fixes-2024-08-10' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly regular fixes, mostly amdgpu with i915/xe having a few each,
and then some misc bits across the board, seems about right for rc3
time.
xe:
- Fix off-by-one when processing RTP rules
- Use dma_fence_chain_free in chain fence unused as a sync
- Fix PL1 disable flow in xe_hwmon_power_max_write
- Take ref to VM in delayed dump snapshot
i915:
- correct dual pps handling for MTL_PCH+ [display]
- Adjust vma offset for framebuffer mmap offset [gem]
- Fix Virtual Memory mapping boundaries calculation [gem]
- Allow evicting to use the requested placement
- Attempt to get pages without eviction first"
* tag 'drm-fixes-2024-08-10' of https://gitlab.freedesktop.org/drm/kernel: (31 commits)
drm/xe: Take ref to VM in delayed snapshot
drm/xe/hwmon: Fix PL1 disable flow in xe_hwmon_power_max_write
drm/xe: Use dma_fence_chain_free in chain fence unused as a sync
drm/xe/rtp: Fix off-by-one when processing rules
drm/amdgpu: Add DCC GFX12 flag to enable address alignment
drm/amdgpu: correct sdma7 max dw
drm/amdgpu: Add address alignment support to DCC buffers
drm/amd/display: Skip Recompute DSC Params if no Stream on Link
drm/amdgpu: change non-dcc buffer copy configuration
drm/amdgpu: Forward soft recovery errors to userspace
drm/amdgpu: add golden setting for gc v12
drm/buddy: Add start address support to trim function
drm/amd/display: Add missing program DET segment call to pipe init
drm/amd/display: Add missing DCN314 to the DML Makefile
drm/amdgpu: force to use legacy inv in mmhub
drm/amd/pm: update powerplay structure on smu v14.0.2/3
drm/amd/display: Add missing mcache registers
drm/amd/display: Add dcc propagation value
drm/amd/display: Add missing DET segments programming
drm/amd/display: Replace dm_execute_dmub_cmd with dc_wake_and_execute_dmub_cmd
...
Kent Overstreet [Fri, 9 Aug 2024 03:19:59 +0000 (23:19 -0400)]
bcachefs: bch2_accounting_invalid()
Implement bch2_accounting_invalid(); check for junk at the end, and
replicas accounting entries in particular need to be checked or we'll
pop asserts later.
Linus Torvalds [Fri, 9 Aug 2024 17:44:35 +0000 (10:44 -0700)]
Merge tag 'pm-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Change the default EPP (energy-performence preference) value for the
Emerald Rapids processor in the intel_pstate driver.
Thisshould improve both the performance and energy efficiency (Pedro
Henrique Kopper)"
* tag 'pm-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Update Balance performance EPP for Emerald Rapids
Linus Torvalds [Fri, 9 Aug 2024 17:23:18 +0000 (10:23 -0700)]
Merge tag 'asm-generic-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
"There are two more changes to the syscall.tbl conversion: the
'__NR_newfstat' in the previous bugfix was a mistake and gets reverted
now, after triple-checking that the contents are now back to what they
were on all architectures. The __NR_nfsservctl definition is not
really needed but came up in the same discussion as it had previously
been defined in uapi/asm-generic/unistd.h and tested for in user
space.
There are a few more symbols that used to be defined in the old
unistd.h file, but that are never defined on any other architecture
using syscall.tbl format. These used to be needed inside of the
kernel:
Searching for these on https://codesearch.debian.net/ shows a few
packages (rustc, golang, clamav, libseccomp, librsvg, strace) that
duplicate all the macros from asm/unistd.h, but nothing that actually
uses the macros, so I concluded that they are fine to omit after all"
* tag 'asm-generic-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
syscalls: add back legacy __NR_nfsservctl macro
syscalls: fix fstat() entry again
Linus Torvalds [Fri, 9 Aug 2024 16:43:46 +0000 (09:43 -0700)]
Merge tag 'probes-fixes-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull kprobe fixes from Masami Hiramatsu:
- Fix misusing str_has_prefix() parameter order to check symbol prefix
correctly
- bpf: remove unused declaring of bpf_kprobe_override
* tag 'probes-fixes-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
kprobes: Fix to check symbol prefixes correctly
bpf: kprobe: remove unused declaring of bpf_kprobe_override
Linus Torvalds [Fri, 9 Aug 2024 16:35:58 +0000 (09:35 -0700)]
Merge tag 'block-6.11-20240809' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Just a set of cleanups for blk-throttle and nvme structures"
* tag 'block-6.11-20240809' of git://git.kernel.dk/linux:
nvme: reorganize nvme_ns_head fields
nvme: change data type of lba_shift
nvme: remove a field from nvme_ns_head
nvme: remove unused parameter
blk-throttle: remove more latency dead-code
Linus Torvalds [Fri, 9 Aug 2024 16:32:10 +0000 (09:32 -0700)]
Merge tag 'io_uring-6.11-20240809' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"Nothing major in here, just two fixes for ensuring that bundle
recv/send requests always get marked for cleanups, and a single fix to
ensure that sends with provided buffers only pick a single buffer
unless the bundle option has been enabled"
* tag 'io_uring-6.11-20240809' of git://git.kernel.dk/linux:
io_uring/net: don't pick multiple buffers for non-bundle send
io_uring/net: ensure expanded bundle send gets marked for cleanup
io_uring/net: ensure expanded bundle recv gets marked for cleanup
Linus Torvalds [Fri, 9 Aug 2024 16:25:30 +0000 (09:25 -0700)]
Merge tag 'sound-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of lots of small changes, almost all device-specific:
- A series of fixes for ASoC Qualcomm stuff
- Various fixes for Cirrus ASoC and HD-audio codecs
- A few AMD ASoC quirks and usual HD-audio quirks
- Other misc fixes, including a long-time regression in USB-audio"
* tag 'sound-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (39 commits)
ASoC: cs35l56: Patch CS35L56_IRQ1_MASK_18 to the default value
ASoC: meson: axg-fifo: fix irq scheduling issue with PREEMPT_RT
MAINTAINERS: Update Cirrus Logic parts to linux-sound mailing list
ASoC: dt-bindings: qcom,wcd939x: Correct reset GPIO polarity in example
ASoC: dt-bindings: qcom,wcd938x: Correct reset GPIO polarity in example
ASoC: dt-bindings: qcom,wcd934x: Correct reset GPIO polarity in example
ASoC: dt-bindings: qcom,wcd937x: Correct reset GPIO polarity in example
ASoC: amd: yc: Add quirk entry for OMEN by HP Gaming Laptop 16-n0xxx
ASoC: codecs: ES8326: button detect issue
ASoC: amd: yc: Support mic on Lenovo Thinkpad E14 Gen 6
ALSA: usb-audio: Re-add ScratchAmp quirk entries
ALSA: hda/realtek: Add Framework Laptop 13 (Intel Core Ultra) to quirks
ALSA: hda/hdmi: Yet more pin fix for HP EliteDesk 800 G4
ALSA: hda: Add HP MP9 G4 Retail System AMS to force connect list
ASoC: cs35l56: Handle OTP read latency over SoundWire
ASoC: codecs: lpass-macro: fix missing codec version
ALSA: line6: Fix racy access to midibuf
ASoC: cs-amp-lib: Fix NULL pointer crash if efi.get_variable is NULL
ASoC: cs35l56: Stop creating ALSA controls for firmware coefficients
ASoC: wm_adsp: Add control_add callback and export wm_adsp_control_add()
...
Linus Torvalds [Fri, 9 Aug 2024 15:33:28 +0000 (08:33 -0700)]
module: make waiting for a concurrent module loader interruptible
The recursive aes-arm-bs module load situation reported by Russell King
is getting fixed in the crypto layer, but this in the meantime fixes the
"recursive load hangs forever" by just making the waiting for the first
module load be interruptible.
This should now match the old behavior before commit 9b9879fc0327
("modules: catch concurrent module loads, treat them as idempotent"),
which used the different "wait for module to be ready" code in
module_patient_check_exists().
End result: a recursive module load will still block, but now a signal
will interrupt it and fail the second module load, at which point the
first module will successfully complete loading.
Fixes: 9b9879fc0327 ("modules: catch concurrent module loads, treat them as idempotent") Cc: Russell King <[email protected]> Cc: Herbert Xu <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Wolfram Sang [Fri, 9 Aug 2024 13:28:08 +0000 (15:28 +0200)]
Merge tag 'i2c-host-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
Two fixes on the Qualcomm GENI I2C controller are cleaning up the
error exit patch in the runtime_resume() function. The first is
disabling the clock, the second disables the icc on the way out.
Takashi Iwai [Fri, 9 Aug 2024 07:58:07 +0000 (09:58 +0200)]
Merge tag 'asoc-fix-v6.11-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.11
Quite a lot of fixes have come in since the merge window, there's some
repetitive fixes over the Qualcomm drivers increasing the patch count,
along with a large batch of fixes from Cirrus. We also have some quirks
and some individual fixes.
Dave Airlie [Fri, 9 Aug 2024 07:08:55 +0000 (17:08 +1000)]
Merge tag 'drm-xe-fixes-2024-08-08' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
- Fix off-by-one when processing RTP rules (Lucas)
- Use dma_fence_chain_free in chain fence unused as a sync (Brost)
- Fix PL1 disable flow in xe_hwmon_power_max_write (Karthik)
- Take ref to VM in delayed dump snapshot (Brost)
Dave Airlie [Fri, 9 Aug 2024 03:00:59 +0000 (13:00 +1000)]
Merge tag 'drm-misc-fixes-2024-08-08' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A fix for drm/client to prevent a null pointer dereference, a fix for a
double-free in drm/bridge-connector, a fix for a gem shmem test, and a
fix for async flips updates.
cifs: cifs_inval_name_dfs_link_error: correct the check for fullpath
Replace the always-true check tcon->origin_fullpath with
check of server->leaf_fullpath
See https://bugzilla.kernel.org/show_bug.cgi?id=219083
The check of the new @tcon will always be true during mounting,
since @tcon->origin_fullpath will only be set after the tree is
connected to the latest common resource, as well as checking if
the prefix paths from it are fully accessible.
Fixes: 3ae872de4107 ("smb: client: fix shared DFS root mounts with different prefixes") Reviewed-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Gleb Korobeynikov <[email protected]> Signed-off-by: Steve French <[email protected]>
Linus Torvalds [Thu, 8 Aug 2024 20:51:44 +0000 (13:51 -0700)]
Merge tag 'net-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth.
Current release - regressions:
- eth: bnxt_en: fix memory out-of-bounds in bnxt_fill_hw_rss_tbl() on
older chips
Current release - new code bugs:
- ethtool: fix off-by-one error / kdoc contradicting the code for max
RSS context IDs
- Bluetooth: hci_qca:
- QCA6390: fix support on non-DT platforms
- QCA6390: don't call pwrseq_power_off() twice
- fix a NULL-pointer derefence at shutdown
- eth: ice: fix incorrect assigns of FEC counters
Previous releases - regressions:
- mptcp: fix handling endpoints with both 'signal' and 'subflow'
flags set
- virtio-net: fix changing ring count when vq IRQ coalescing not
supported
- eth: gve: fix use of netif_carrier_ok() during reconfig / reset
Previous releases - always broken:
- eth: idpf: fix bugs in queue re-allocation on reconfig / reset
- ethtool: fix context creation with no parameters
Misc:
- linkwatch: use system_unbound_wq to ease RTNL contention"
* tag 'net-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (41 commits)
net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897.
ethtool: Fix context creation with no parameters
net: ethtool: fix off-by-one error in max RSS context IDs
net: pse-pd: tps23881: include missing bitfield.h header
net: fec: Stop PPS on driver remove
net: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilities
l2tp: fix lockdep splat
net: stmmac: dwmac4: fix PCS duplex mode decode
idpf: fix UAFs when destroying the queues
idpf: fix memleak in vport interrupt configuration
idpf: fix memory leaks and crashes while performing a soft reset
bnxt_en : Fix memory out-of-bounds in bnxt_fill_hw_rss_tbl()
net: dsa: bcm_sf2: Fix a possible memory leak in bcm_sf2_mdio_register()
net/smc: add the max value of fallback reason count
Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor
Bluetooth: l2cap: always unlock channel in l2cap_conless_channel()
Bluetooth: hci_qca: fix a NULL-pointer derefence at shutdown
Bluetooth: hci_qca: fix QCA6390 support on non-DT platforms
Bluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390
ice: Fix incorrect assigns of FEC counts
...
Linus Torvalds [Thu, 8 Aug 2024 20:32:59 +0000 (13:32 -0700)]
Merge tag 'trace-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Have reading of event format files test if the metadata still exists.
When a event is freed, a flag (EVENT_FILE_FL_FREED) in the metadata
is set to state that it is to prevent any new references to it from
happening while waiting for existing references to close. When the
last reference closes, the metadata is freed. But the "format" was
missing a check to this flag (along with some other files) that
allowed new references to happen, and a use-after-free bug to occur.
- Have the trace event meta data use the refcount infrastructure
instead of relying on its own atomic counters.
- Have tracefs inodes use alloc_inode_sb() for allocation instead of
using kmem_cache_alloc() directly.
- Have eventfs_create_dir() return an ERR_PTR instead of NULL as the
callers expect a real object or an ERR_PTR.
- Have release_ei() use call_srcu() and not call_rcu() as all the
protection is on SRCU and not RCU.
- Fix ftrace_graph_ret_addr() to use the task passed in and not
current.
- Fix overflow bug in get_free_elt() where the counter can overflow the
integer and cause an infinite loop.
- Remove unused function ring_buffer_nr_pages()
- Have tracefs freeing use the inode RCU infrastructure instead of
creating its own.
When the kernel had randomize structure fields enabled, the rcu field
of the tracefs_inode was overlapping the rcu field of the inode
structure, and corrupting it. Instead, use the destroy_inode()
callback to do the initial cleanup of the code, and then have
free_inode() free it.
* tag 'trace-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracefs: Use generic inode RCU for synchronizing freeing
ring-buffer: Remove unused function ring_buffer_nr_pages()
tracing: Fix overflow in get_free_elt()
function_graph: Fix the ret_stack used by ftrace_graph_ret_addr()
eventfs: Use SRCU for freeing eventfs_inodes
eventfs: Don't return NULL in eventfs_create_dir()
tracefs: Fix inode allocation
tracing: Use refcount for trace_event_file reference counter
tracing: Have format file honor EVENT_FILE_FL_FREED
Linus Torvalds [Thu, 8 Aug 2024 20:27:31 +0000 (13:27 -0700)]
Merge tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Assorted little stuff:
- lockdep fixup for lockdep_set_notrack_class()
- we can now remove a device when using erasure coding without
deadlocking, though we still hit other issues
- the 'allocator stuck' timeout is now configurable, and messages are
ratelimited. The default timeout has been increased from 10 seconds
to 30"
* tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs:
bcachefs: Use bch2_wait_on_allocator() in btree node alloc path
bcachefs: Make allocator stuck timeout configurable, ratelimit messages
bcachefs: Add missing path_traverse() to btree_iter_next_node()
bcachefs: ec should not allocate from ro devs
bcachefs: Improved allocator debugging for ec
bcachefs: Add missing bch2_trans_begin() call
bcachefs: Add a comment for bucket helper types
bcachefs: Don't rely on implicit unsigned -> signed integer conversion
lockdep: Fix lockdep_set_notrack_class() for CONFIG_LOCK_STAT
bcachefs: Fix double free of ca->buckets_nouse
Jerome Brunet [Wed, 7 Aug 2024 16:27:03 +0000 (18:27 +0200)]
ASoC: meson: axg-fifo: fix irq scheduling issue with PREEMPT_RT
With PREEMPT_RT enabled a spinlock_t becomes a sleeping lock.
This is usually not a problem with spinlocks used in IRQ context since
IRQ handlers get threaded. However, if IRQF_ONESHOT is set, the primary
handler won't be force-threaded and runs always in hardirq context. This is
a problem because spinlock_t requires a preemptible context on PREEMPT_RT.
In this particular instance, regmap mmio uses spinlock_t to protect the
register access and IRQF_ONESHOT is set on the IRQ. In this case, it is
actually better to do everything in threaded handler and it solves the
problem with PREEMPT_RT.
ASoC: dt-bindings: qcom,wcd939x: Correct reset GPIO polarity in example
The reset GPIO of WCD9390/WCD9395 is active low and that's how it is
routed on typical boards, so correct the example DTS to use expected
polarity, instead of IRQ flag (which is a logical mistake on its own).
Linus Torvalds [Thu, 8 Aug 2024 19:29:40 +0000 (12:29 -0700)]
module: warn about excessively long module waits
Russell King reported that the arm cbc(aes) crypto module hangs when
loaded, and Herbert Xu bisected it to commit 9b9879fc0327 ("modules:
catch concurrent module loads, treat them as idempotent"), and noted:
"So what's happening here is that the first modprobe tries to load a
fallback CBC implementation, in doing so it triggers a load of the
exact same module due to module aliases.
IOW we're loading aes-arm-bs which provides cbc(aes). However, this
needs a fallback of cbc(aes) to operate, which is made out of the
generic cbc module + any implementation of aes, or ecb(aes). The
latter happens to also be provided by aes-arm-cb so that's why it
tries to load the same module again"
So loading the aes-arm-bs module ends up wanting to recursively load
itself, and the recursive load then ends up waiting for the original
module load to complete.
This is a regression, in that it used to be that we just tried to load
the module multiple times, and then as we went on to install it the
second time we would instead just error out because the module name
already existed.
That is actually also exactly what the original "catch concurrent loads"
patch did in commit 9828ed3f695a ("module: error out early on concurrent
load of the same module file"), but it turns out that it ends up being
racy, in that erroring out before the module has been fully initialized
will cause failures in dependent module loading.
See commit ac2263b588df (which was the revert of that "error out early")
commit for details about why erroring out before the module has been
initialized is actually fundamentally racy.
Now, for the actual recursive module load (as opposed to just
concurrently loading the same module twice), the race is not an issue.
At the same time it's hard for the kernel to see that this is recursion,
because the module load is always done from a usermode helper, so the
recursion is not some simple callchain within the kernel.
End result: this is not the real fix, but this at least adds a warning
for the situation (admittedly much too late for all the debugging pain
that Russell and Herbert went through) and if we can come to a
resolution on how to detect the recursion properly, this re-organizes
the code to make that easier.
Kent Overstreet [Wed, 7 Aug 2024 19:42:23 +0000 (15:42 -0400)]
bcachefs: Switch to .get_inode_acl()
.set_acl() requires a dentry, and if one isn't passed it marks the VFS
inode as not having an ACL.
This has been causing inodes with ACLs to have them "disappear" on
bcachefs filesystem, depending on which path those inodes get pulled
into the cache from.
Switching to .get_inode_acl(), like other local filesystems, fixes this.
Jens Axboe [Thu, 8 Aug 2024 18:27:40 +0000 (12:27 -0600)]
Merge tag 'nvme-6.11-2024-08-08' of git://git.infradead.org/nvme into block-6.11
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.11
- Cleanups and improved struct packing (Kanchan)"
* tag 'nvme-6.11-2024-08-08' of git://git.infradead.org/nvme:
nvme: reorganize nvme_ns_head fields
nvme: change data type of lba_shift
nvme: remove a field from nvme_ns_head
nvme: remove unused parameter
Linus Torvalds [Thu, 8 Aug 2024 18:22:04 +0000 (11:22 -0700)]
Merge tag 'loongarch-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Enable general EFI poweroff method to make poweroff usable on
hardwares which lack ACPI S5, use accessors to page table entries
instead of direct dereference to avoid potential problems, and two
trivial kvm cleanups"
* tag 'loongarch-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Remove undefined a6 argument comment for kvm_hypercall()
LoongArch: KVM: Remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS
LoongArch: Use accessors to page table entries instead of direct dereference
LoongArch: Enable general EFI poweroff method
Matthew Brost [Thu, 1 Aug 2024 15:41:16 +0000 (08:41 -0700)]
drm/xe: Take ref to VM in delayed snapshot
Kernel BO's don't take a ref to the VM, we need the VM for the
delayed snapshot, so take a ref to the VM in delayed snapshot.
v2:
- Check for lrc_bo before taking a VM ref (CI)
- Check lrc_bo->vm before taking / dropping a VM ref (CI)
- Drop VM in xe_lrc_snapshot_free
v5:
- Fix commit message wording (Johnathan)
Karthik Poosa [Thu, 1 Aug 2024 11:24:24 +0000 (16:54 +0530)]
drm/xe/hwmon: Fix PL1 disable flow in xe_hwmon_power_max_write
In xe_hwmon_power_max_write, for PL1 disable supported case, instead of
returning after PL1 disable, PL1 enable path was also being run.
Fixed it by returning after disable.
v2: Correct typo and grammar in commit message. (Jonathan)
Matthew Brost [Sat, 27 Jul 2024 01:22:16 +0000 (18:22 -0700)]
drm/xe: Use dma_fence_chain_free in chain fence unused as a sync
A chain fence is uninitialized if not installed in a drm sync obj. Thus
if xe_sync_entry_cleanup is called and sync->chain_fence is non-NULL the
proper cleanup is dma_fence_chain_free rather than a dma-fence put.
Lucas De Marchi [Fri, 26 Jul 2024 06:43:35 +0000 (23:43 -0700)]
drm/xe/rtp: Fix off-by-one when processing rules
Gustavo noticed an odd "+ 2" in rtp_mark_active() while processing
rtp rules and pointed that it should be "+ 1". In fact, while processing
entries without actions (OOB workarounds), if the WA is activated and
has OR rules, it will also inadvertently activate the very next
workaround.
Test in a LNL B0 platform by moving 18024947630 on top of 16020292621,
makes the latter become active:
Gavin Shan [Thu, 8 Aug 2024 04:08:08 +0000 (14:08 +1000)]
cpumask: Fix crash on updating CPU enabled mask
The CPU enabled mask instead of the CPU possible mask should be used
by set_cpu_enabled(). Otherwise, we run into crash due to write to
the read-only CPU possible mask when vCPU is hot added on ARM64.
Steve French [Thu, 1 Aug 2024 02:38:50 +0000 (21:38 -0500)]
smb3: fix setting SecurityFlags when encryption is required
Setting encryption as required in security flags was broken.
For example (to require all mounts to be encrypted by setting):
"echo 0x400c5 > /proc/fs/cifs/SecurityFlags"
Would return "Invalid argument" and log "Unsupported security flags"
This patch fixes that (e.g. allowing overriding the default for
SecurityFlags 0x00c5, including 0x40000 to require seal, ie
SMB3.1.1 encryption) so now that works and forces encryption
on subsequent mounts.
Martin Whitaker [Wed, 7 Aug 2024 20:52:09 +0000 (21:52 +0100)]
net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897.
As noted in the device errata [1-8], EEE support is not fully operational
in the KSZ8567, KSZ9477, KSZ9567, KSZ9896, and KSZ9897 devices, causing
link drops when connected to another device that supports EEE. The patch
series "net: add EEE support for KSZ9477 switch family" merged in commit 9b0bf4f77162 caused EEE support to be enabled in these devices. A fix for
this regression for the KSZ9477 alone was merged in commit 08c6d8bae48c2.
This patch extends this fix to the other affected devices.
Gal Pressman [Wed, 7 Aug 2024 17:33:52 +0000 (20:33 +0300)]
ethtool: Fix context creation with no parameters
The 'at least one change' requirement is not applicable for context
creation, skip the check in such case.
This allows a command such as 'ethtool -X eth0 context new' to work.
The command works by mistake when using older versions of userspace
ethtool due to an incompatibility issue where rxfh.input_xfrm is passed
as zero (unset) instead of RXH_XFRM_NO_CHANGE as done with recent
userspace. This patch does not try to solve the incompatibility issue.
Edward Cree [Wed, 7 Aug 2024 16:06:12 +0000 (17:06 +0100)]
net: ethtool: fix off-by-one error in max RSS context IDs
Both ethtool_ops.rxfh_max_context_id and the default value used when
it's not specified are supposed to be exclusive maxima (the former
is documented as such; the latter, U32_MAX, cannot be used as an ID
since it equals ETH_RXFH_CONTEXT_ALLOC), but xa_alloc() expects an
inclusive maximum.
Subtract one from 'limit' to produce an inclusive maximum, and pass
that to xa_alloc().
Increase bnxt's max by one to prevent a (very minor) regression, as
BNXT_MAX_ETH_RSS_CTX is an inclusive max. This is safe since bnxt
is not actually hard-limited; BNXT_MAX_ETH_RSS_CTX is just a
leftover from old driver code that managed context IDs itself.
Rename rxfh_max_context_id to rxfh_max_num_contexts to make its
semantics (hopefully) more obvious.
Arnd Bergmann [Wed, 7 Aug 2024 07:54:22 +0000 (09:54 +0200)]
net: pse-pd: tps23881: include missing bitfield.h header
Using FIELD_GET() fails in configurations that don't already include
the header file indirectly:
drivers/net/pse-pd/tps23881.c: In function 'tps23881_i2c_probe':
drivers/net/pse-pd/tps23881.c:755:13: error: implicit declaration of function 'FIELD_GET' [-Wimplicit-function-declaration]
755 | if (FIELD_GET(TPS23881_REG_DEVID_MASK, ret) != TPS23881_DEVICE_ID) {
| ^~~~~~~~~
Csókás, Bence [Wed, 7 Aug 2024 08:09:56 +0000 (10:09 +0200)]
net: fec: Stop PPS on driver remove
PPS was not stopped in `fec_ptp_stop()`, called when
the adapter was removed. Consequentially, you couldn't
safely reload the driver with the PPS signal on.
net: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilities
Some Wake-on-LAN modes such as WAKE_FILTER may only be supported by the MAC,
while others might be only supported by the PHY. Make sure that the .get_wol()
returns the union of both rather than only that of the PHY if the PHY supports
Wake-on-LAN.
James Chapman [Tue, 6 Aug 2024 16:06:26 +0000 (17:06 +0100)]
l2tp: fix lockdep splat
When l2tp tunnels use a socket provided by userspace, we can hit
lockdep splats like the below when data is transmitted through another
(unrelated) userspace socket which then gets routed over l2tp.
This issue was previously discussed here:
https://lore.kernel.org/netdev/[email protected]/
The solution is to have lockdep treat socket locks of l2tp tunnel
sockets separately than those of standard INET sockets. To do so, use
a different lockdep subclass where lock nesting is possible.
============================================
WARNING: possible recursive locking detected
6.10.0+ #34 Not tainted
--------------------------------------------
iperf3/771 is trying to acquire lock: ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0
but task is already holding lock: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10
other info that might help us debug this:
Possible unsafe locking scenario:
dwmac4 was decoding the duplex mode from the GMAC_PHYIF_CONTROL_STATUS
register incorrectly, using GMAC_PHYIF_CTRLSTATUS_LNKMOD_MASK (value 1)
rather than GMAC_PHYIF_CTRLSTATUS_LNKMOD (bit 16). Fix this.
Andi Kleen [Thu, 8 Aug 2024 00:02:44 +0000 (17:02 -0700)]
x86/mtrr: Check if fixed MTRRs exist before saving them
MTRRs have an obsolete fixed variant for fine grained caching control
of the 640K-1MB region that uses separate MSRs. This fixed variant has
a separate capability bit in the MTRR capability MSR.
So far all x86 CPUs which support MTRR have this separate bit set, so it
went unnoticed that mtrr_save_state() does not check the capability bit
before accessing the fixed MTRR MSRs.
Though on a CPU that does not support the fixed MTRR capability this
results in a #GP. The #GP itself is harmless because the RDMSR fault is
handled gracefully, but results in a WARN_ON().
Linus Torvalds [Thu, 8 Aug 2024 14:32:20 +0000 (07:32 -0700)]
Merge tag 'mm-hotfixes-stable-2024-08-07-18-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Nine hotfixes. Five are cc:stable, the others either pertain to
post-6.10 material or aren't considered necessary for earlier kernels.
Five are MM and four are non-MM. No identifiable theme here - please
see the individual changelogs"
* tag 'mm-hotfixes-stable-2024-08-07-18-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
padata: Fix possible divide-by-0 panic in padata_mt_helper()
mailmap: update entry for David Heidelberg
memcg: protect concurrent access to mem_cgroup_idr
mm: shmem: fix incorrect aligned index when checking conflicts
mm: shmem: avoid allocating huge pages larger than MAX_PAGECACHE_ORDER for shmem
mm: list_lru: fix UAF for memory cgroup
kcov: properly check for softirq context
MAINTAINERS: Update LTP members and web
selftests: mm: add s390 to ARCH check
Takashi Iwai [Thu, 8 Aug 2024 08:18:01 +0000 (10:18 +0200)]
ALSA: usb-audio: Re-add ScratchAmp quirk entries
At the code refactoring of USB-audio quirk handling, I assumed that
the quirk entries of Stanton ScratchAmp devices were only about the
device name, and moved them completely into the rename table.
But it seems that the device requires the quirk entry so that it's
probed by the driver itself.
This re-adds back the quirk entries of ScratchAmp, but in a
minimalistic manner.
Jakub Kicinski [Thu, 8 Aug 2024 03:31:42 +0000 (20:31 -0700)]
Merge tag 'for-net-2024-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_sync: avoid dup filtering when passive scanning with adv monitor
- hci_qca: don't call pwrseq_power_off() twice for QCA6390
- hci_qca: fix QCA6390 support on non-DT platforms
- hci_qca: fix a NULL-pointer derefence at shutdown
- l2cap: always unlock channel in l2cap_conless_channel()
* tag 'for-net-2024-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor
Bluetooth: l2cap: always unlock channel in l2cap_conless_channel()
Bluetooth: hci_qca: fix a NULL-pointer derefence at shutdown
Bluetooth: hci_qca: fix QCA6390 support on non-DT platforms
Bluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390
====================
====================
idpf: fix 3 bugs revealed by the Chapter I
Alexander Lobakin says:
The libeth conversion revealed 2 serious issues which lead to sporadic
crashes or WARNs under certain configurations. Additional one was found
while debugging these two with kmemleak.
This one is targeted stable, the rest can be backported manually later
if needed. They can be reproduced only after the conversion is applied
anyway.
====================
The second tagged commit started sometimes (very rarely, but possible)
throwing WARNs from
net/core/page_pool.c:page_pool_disable_direct_recycling().
Turned out idpf frees interrupt vectors with embedded NAPIs *before*
freeing the queues making page_pools' NAPI pointers lead to freed
memory before these pools are destroyed by libeth.
It's not clear whether there are other accesses to the freed vectors
when destroying the queues, but anyway, we usually free queue/interrupt
vectors only when the queues are destroyed and the NAPIs are guaranteed
to not be referenced anywhere.
Invert the allocation and freeing logic making queue/interrupt vectors
be allocated first and freed last. Vectors don't require queues to be
present, so this is safe. Additionally, this change allows to remove
that useless queue->q_vector pointer cleanup, as vectors are still
valid when freeing the queues (+ both are freed within one function,
so it's not clear why nullify the pointers at all).
Michal Kubiak [Tue, 6 Aug 2024 22:09:21 +0000 (15:09 -0700)]
idpf: fix memleak in vport interrupt configuration
The initialization of vport interrupt consists of two functions:
1) idpf_vport_intr_init() where a generic configuration is done
2) idpf_vport_intr_req_irq() where the irq for each q_vector is
requested.
The first function used to create a base name for each interrupt using
"kasprintf()" call. Unfortunately, although that call allocated memory
for a text buffer, that memory was never released.
Fix this by removing creating the interrupt base name in 1).
Instead, always create a full interrupt name in the function 2), because
there is no need to create a base name separately, considering that the
function 2) is never called out of idpf_vport_intr_init() context.
idpf: fix memory leaks and crashes while performing a soft reset
The second tagged commit introduced a UAF, as it removed restoring
q_vector->vport pointers after reinitializating the structures.
This is due to that all queue allocation functions are performed here
with the new temporary vport structure and those functions rewrite
the backpointers to the vport. Then, this new struct is freed and
the pointers start leading to nowhere.
But generally speaking, the current logic is very fragile. It claims
to be more reliable when the system is low on memory, but in fact, it
consumes two times more memory as at the moment of running this
function, there are two vports allocated with their queues and vectors.
Moreover, it claims to prevent the driver from running into "bad state",
but in fact, any error during the rebuild leaves the old vport in the
partially allocated state.
Finally, if the interface is down when the function is called, it always
allocates a new queue set, but when the user decides to enable the
interface later on, vport_open() allocates them once again, IOW there's
a clear memory leak here.
Just don't allocate a new queue set when performing a reset, that solves
crashes and memory leaks. Readd the old queue number and reopen the
interface on rollback - that solves limbo states when the device is left
disabled and/or without HW queues enabled.
Michael Chan [Tue, 6 Aug 2024 05:37:42 +0000 (22:37 -0700)]
bnxt_en : Fix memory out-of-bounds in bnxt_fill_hw_rss_tbl()
A recent commit has modified the code in __bnxt_reserve_rings() to
set the default RSS indirection table to default only when the number
of RX rings is changing. While this works for newer firmware that
requires RX ring reservations, it causes the regression on older
firmware not requiring RX ring resrvations (BNXT_NEW_RM() returns
false).
With older firmware, RX ring reservations are not required and so
hw_resc->resv_rx_rings is not always set to the proper value. The
comparison:
if (old_rx_rings != bp->hw_resc.resv_rx_rings)
in __bnxt_reserve_rings() may be false even when the RX rings are
changing. This will cause __bnxt_reserve_rings() to skip setting
the default RSS indirection table to default to match the current
number of RX rings. This may later cause bnxt_fill_hw_rss_tbl() to
use an out-of-range index.
We already have bnxt_check_rss_tbl_no_rmgr() to handle exactly this
scenario. We just need to move it up in bnxt_need_reserve_rings()
to be called unconditionally when using older firmware. Without the
fix, if the TX rings are changing, we'll skip the
bnxt_check_rss_tbl_no_rmgr() call and __bnxt_reserve_rings() may also
skip the bnxt_set_dflt_rss_indir_tbl() call for the reason explained
in the last paragraph. Without setting the default RSS indirection
table to default, it causes the regression:
BUG: KASAN: slab-out-of-bounds in __bnxt_hwrm_vnic_set_rss+0xb79/0xe40
Read of size 2 at addr ffff8881c5809618 by task ethtool/31525
Call Trace:
__bnxt_hwrm_vnic_set_rss+0xb79/0xe40
bnxt_hwrm_vnic_rss_cfg_p5+0xf7/0x460
__bnxt_setup_vnic_p5+0x12e/0x270
__bnxt_open_nic+0x2262/0x2f30
bnxt_open_nic+0x5d/0xf0
ethnl_set_channels+0x5d4/0xb30
ethnl_default_set_doit+0x2f1/0x620
Joe Hattori [Tue, 6 Aug 2024 01:13:27 +0000 (10:13 +0900)]
net: dsa: bcm_sf2: Fix a possible memory leak in bcm_sf2_mdio_register()
bcm_sf2_mdio_register() calls of_phy_find_device() and then
phy_device_remove() in a loop to remove existing PHY devices.
of_phy_find_device() eventually calls bus_find_device(), which calls
get_device() on the returned struct device * to increment the refcount.
The current implementation does not decrement the refcount, which causes
memory leak.
This commit adds the missing phy_device_free() call to decrement the
refcount via put_device() to balance the refcount.
Zhengchao Shao [Mon, 5 Aug 2024 04:38:56 +0000 (12:38 +0800)]
net/smc: add the max value of fallback reason count
The number of fallback reasons defined in the smc_clc.h file has reached
36. For historical reasons, some are no longer quoted, and there's 33
actually in use. So, add the max value of fallback reason count to 36.
Fixes: 6ac1e6563f59 ("net/smc: support smc v2.x features validate") Fixes: 7f0620b9940b ("net/smc: support max connections per lgr negotiation") Fixes: 69b888e3bb4b ("net/smc: support max links per lgr negotiation in clc handshake") Signed-off-by: Zhengchao Shao <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-by: D. Wythe <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
Looking at the padata_mt_helper() function, the only way a divide-by-0
panic can happen is when ps->chunk_size is 0. The way that chunk_size is
initialized in padata_do_multithreaded(), chunk_size can be 0 when the
min_chunk in the passed-in padata_mt_job structure is 0.
Fix this divide-by-0 panic by making sure that chunk_size will be at least
1 no matter what the input parameters are.
Shakeel Butt [Fri, 2 Aug 2024 23:58:22 +0000 (16:58 -0700)]
memcg: protect concurrent access to mem_cgroup_idr
Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after
many small jobs") decoupled the memcg IDs from the CSS ID space to fix the
cgroup creation failures. It introduced IDR to maintain the memcg ID
space. The IDR depends on external synchronization mechanisms for
modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace()
happen within css callback and thus are protected through cgroup_mutex
from concurrent modifications. However idr_remove() for mem_cgroup_idr
was not protected against concurrency and can be run concurrently for
different memcgs when they hit their refcnt to zero. Fix that.
We have been seeing list_lru based kernel crashes at a low frequency in
our fleet for a long time. These crashes were in different part of
list_lru code including list_lru_add(), list_lru_del() and reparenting
code. Upon further inspection, it looked like for a given object (dentry
and inode), the super_block's list_lru didn't have list_lru_one for the
memcg of that object. The initial suspicions were either the object is
not allocated through kmem_cache_alloc_lru() or somehow
memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but
returned success. No evidence were found for these cases.
Looking more deeply, we started seeing situations where valid memcg's id
is not present in mem_cgroup_idr and in some cases multiple valid memcgs
have same id and mem_cgroup_idr is pointing to one of them. So, the most
reasonable explanation is that these situations can happen due to race
between multiple idr_remove() calls or race between
idr_alloc()/idr_replace() and idr_remove(). These races are causing
multiple memcgs to acquire the same ID and then offlining of one of them
would cleanup list_lrus on the system for all of them. Later access from
other memcgs to the list_lru cause crashes due to missing list_lru_one.
Baolin Wang [Wed, 31 Jul 2024 05:46:20 +0000 (13:46 +0800)]
mm: shmem: fix incorrect aligned index when checking conflicts
In the shmem_suitable_orders() function, xa_find() is used to check for
conflicts in the pagecache to select suitable huge orders. However, when
checking each huge order in every loop, the aligned index is calculated
from the previous iteration, which may cause suitable huge orders to be
missed.
We should use the original index each time in the loop to calculate a new
aligned index for checking conflicts to avoid this issue.
Baolin Wang [Wed, 31 Jul 2024 05:46:19 +0000 (13:46 +0800)]
mm: shmem: avoid allocating huge pages larger than MAX_PAGECACHE_ORDER for shmem
Similar to commit d659b715e94ac ("mm/huge_memory: avoid PMD-size page
cache if needed"), ARM64 can support 512MB PMD-sized THP when the base
page size is 64KB, which is larger than the maximum supported page cache
size MAX_PAGECACHE_ORDER.
This is not expected. To fix this issue, use THP_ORDERS_ALL_FILE_DEFAULT
for shmem to filter allowable huge orders.
Muchun Song [Thu, 18 Jul 2024 08:36:07 +0000 (16:36 +0800)]
mm: list_lru: fix UAF for memory cgroup
The mem_cgroup_from_slab_obj() is supposed to be called under rcu lock or
cgroup_mutex or others which could prevent returned memcg from being
freed. Fix it by adding missing rcu read lock.
When collecting coverage from softirqs, KCOV uses in_serving_softirq() to
check whether the code is running in the softirq context. Unfortunately,
in_serving_softirq() is > 0 even when the code is running in the hardirq
or NMI context for hardirqs and NMIs that happened during a softirq.
As a result, if a softirq handler contains a remote coverage collection
section and a hardirq with another remote coverage collection section
happens during handling the softirq, KCOV incorrectly detects a nested
softirq coverate collection section and prints a WARNING, as reported by
syzbot.
This issue was exposed by commit a7f3813e589f ("usb: gadget: dummy_hcd:
Switch to hrtimer transfer scheduler"), which switched dummy_hcd to using
hrtimer and made the timer's callback be executed in the hardirq context.
Change the related checks in KCOV to account for this behavior of
in_serving_softirq() and make KCOV ignore remote coverage collection
sections in the hardirq and NMI contexts.
This prevents the WARNING printed by syzbot but does not fix the inability
of KCOV to collect coverage from the __usb_hcd_giveback_urb when dummy_hcd
is in use (caused by a7f3813e589f); a separate patch is required for that.
commit 0518dbe97fe6 ("selftests/mm: fix cross compilation with LLVM")
changed the env variable for the architecture from MACHINE to ARCH.
This is preventing 3 required TEST_GEN_FILES from being included when
cross compiling s390x and errors when trying to run the test suite. This
is due to the ARCH variable already being set and the arch folder name
being s390.
Add "s390" to the filtered list to cover this case and have the 3 files
included in the build.
Steven Rostedt [Wed, 7 Aug 2024 22:54:02 +0000 (18:54 -0400)]
tracefs: Use generic inode RCU for synchronizing freeing
With structure layout randomization enabled for 'struct inode' we need to
avoid overlapping any of the RCU-used / initialized-only-once members,
e.g. i_lru or i_sb_list to not corrupt related list traversals when making
use of the rcu_head.
For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:
The list debug message as well as RBX's symbolic value point out that the
object in question was allocated from 'tracefs_inode_cache' and that the
list's '->next' member is at offset 0. Dumping the layout of the relevant
parts of 'struct tracefs_inode' gives the following:
Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.
For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:
@@ -690,7 +691,6 @@ struct inode {
u16 i_wb_frn_avg_time;
u16 i_wb_frn_history;
#endif
- struct list_head i_lru; /* inode LRU list */
struct list_head i_sb_list;
struct list_head i_wb_list; /* backing dev writeback list */
union {
The tracefs inode does not need to supply its own RCU delayed destruction
of its inode. The inode code itself offers both a "destroy_inode()"
callback that gets called when the last reference of the inode is
released, and the "free_inode()" which is called after a RCU
synchronization period from the "destroy_inode()".
The tracefs code can unlink the inode from its list in the destroy_inode()
callback, and the simply free it from the free_inode() callback. This
should provide the same protection.
Jianhui Zhou [Mon, 5 Aug 2024 11:36:31 +0000 (19:36 +0800)]
ring-buffer: Remove unused function ring_buffer_nr_pages()
Because ring_buffer_nr_pages() is not an inline function and user accesses
buffer->buffers[cpu]->nr_pages directly, the function ring_buffer_nr_pages
is removed.
Tze-nan Wu [Mon, 5 Aug 2024 05:59:22 +0000 (13:59 +0800)]
tracing: Fix overflow in get_free_elt()
"tracing_map->next_elt" in get_free_elt() is at risk of overflowing.
Once it overflows, new elements can still be inserted into the tracing_map
even though the maximum number of elements (`max_elts`) has been reached.
Continuing to insert elements after the overflow could result in the
tracing_map containing "tracing_map->max_size" elements, leaving no empty
entries.
If any attempt is made to insert an element into a full tracing_map using
`__tracing_map_insert()`, it will cause an infinite loop with preemption
disabled, leading to a CPU hang problem.
Fix this by preventing any further increments to "tracing_map->next_elt"
once it reaches "tracing_map->max_elt".