Qu Wenruo [Sun, 11 Aug 2024 05:30:22 +0000 (15:00 +0930)]
btrfs: tree-checker: add dev extent item checks
[REPORT]
There is a corruption report that btrfs refused to mount a fs that has
overlapping dev extents:
BTRFS error (device sdc): dev extent devid 4 physical offset 14263979671552 overlap with previous dev extent end 14263980982272
BTRFS error (device sdc): failed to verify dev extents against chunks: -117
BTRFS error (device sdc): open_ctree failed
[CAUSE]
The direct cause is very obvious, there is a bad dev extent item with
incorrect length.
With btrfs check reporting two overlapping extents, the second one shows
some clue on the cause:
ERROR: dev extent devid 4 offset 14263979671552 len 6488064 overlap with previous dev extent end 14263980982272
ERROR: dev extent devid 13 offset 2257707008000 len 6488064 overlap with previous dev extent end 2257707270144
ERROR: errors found in extent allocation tree or chunk allocation
The second one looks like a bitflip happened during new chunk
allocation:
hex(2257707008000) = 0x20da9d30000
hex(2257707270144) = 0x20da9d70000
diff = 0x00000040000
So it looks like a bitflip happened during new dev extent allocation,
resulting the second overlap.
Currently we only do the dev-extent verification at mount time, but if the
corruption is caused by memory bitflip, we really want to catch it before
writing the corruption to the storage.
Furthermore the dev extent items has the following key definition:
(<device id> DEV_EXTENT <physical offset>)
Thus we can not just rely on the generic key order check to make sure
there is no overlapping.
[ENHANCEMENT]
Introduce dedicated dev extent checks, including:
- Fixed member checks
* chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3)
* chunk_objectid should always be
BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
- Alignment checks
* chunk_offset should be aligned to sectorsize
* length should be aligned to sectorsize
* key.offset should be aligned to sectorsize
- Overlap checks
If the previous key is also a dev-extent item, with the same
device id, make sure we do not overlap with the previous dev extent.
Jeff Layton [Mon, 12 Aug 2024 16:30:52 +0000 (12:30 -0400)]
btrfs: update target inode's ctime on unlink
Unlink changes the link count on the target inode. POSIX mandates that
the ctime must also change when this occurs.
According to https://pubs.opengroup.org/onlinepubs/9699919799/functions/unlink.html:
"Upon successful completion, unlink() shall mark for update the last data
modification and last file status change timestamps of the parent
directory. Also, if the file's link count is not 0, the last file status
change timestamp of the file shall be marked for update."
Thorsten Blum [Tue, 13 Aug 2024 10:53:15 +0000 (12:53 +0200)]
btrfs: send: annotate struct name_cache_entry with __counted_by()
Add the __counted_by compiler attribute to the flexible array member
name to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Naohiro Aota [Fri, 9 Aug 2024 07:54:22 +0000 (16:54 +0900)]
btrfs: fix invalid mapping of extent xarray state
In __extent_writepage_io(), we call btrfs_set_range_writeback() ->
folio_start_writeback(), which clears PAGECACHE_TAG_DIRTY mark from the
mapping xarray if the folio is not dirty. This worked fine before commit 97713b1a2ced ("btrfs: do not clear page dirty inside
extent_write_locked_range()").
After the commit, however, the folio is still dirty at this point, so the
mapping DIRTY tag is not cleared anymore. Then, __extent_writepage_io()
calls btrfs_folio_clear_dirty() to clear the folio's dirty flag. That
results in the page being unlocked with a "strange" state. The page is not
PageDirty, but the mapping tag is set as PAGECACHE_TAG_DIRTY.
This strange state looks like causing a hang with a call trace below when
running fstests generic/091 on a null_blk device. It is waiting for a folio
lock.
While I don't have an exact relation between this hang and the strange
state, fixing the state also fixes the hang. And, that state is worth
fixing anyway.
This commit reorders btrfs_folio_clear_dirty() and
btrfs_set_range_writeback() in __extent_writepage_io(), so that the
PAGECACHE_TAG_DIRTY tag is properly removed from the xarray.
Filipe Manana [Mon, 12 Aug 2024 13:18:06 +0000 (14:18 +0100)]
btrfs: send: allow cloning non-aligned extent if it ends at i_size
If we a find that an extent is shared but its end offset is not sector
size aligned, then we don't clone it and issue write operations instead.
This is because the reflink (remap_file_range) operation does not allow
to clone unaligned ranges, except if the end offset of the range matches
the i_size of the source and destination files (and the start offset is
sector size aligned).
While this is not incorrect because send can only guarantee that a file
has the same data in the source and destination snapshots, it's not
optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Use a file size not aligned to any possible sector size.
file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes
dd if=/dev/random of=$MNT/foo bs=$file_size count=1
cp --reflink=always $MNT/foo $MNT/bar
Filipe Manana [Sun, 11 Aug 2024 10:53:42 +0000 (11:53 +0100)]
btrfs: only run the extent map shrinker from kswapd tasks
Currently the extent map shrinker can be run by any task when attempting
to allocate memory and there's enough memory pressure to trigger it.
To avoid too much latency we stop iterating over extent maps and removing
them once the task needs to reschedule. This logic was introduced in commit b3ebb9b7e92a ("btrfs: stop extent map shrinker if reschedule is needed").
While that solved high latency problems for some use cases, it's still
not enough because with a too high number of tasks entering the extent map
shrinker code, either due to memory allocations or because they are a
kswapd task, we end up having a very high level of contention on some
spin locks, namely:
1) The fs_info->fs_roots_radix_lock spin lock, which we need to find
roots to iterate over their inodes;
2) The spin lock of the xarray used to track open inodes for a root
(struct btrfs_root::inodes) - on 6.10 kernels and below, it used to
be a red black tree and the spin lock was root->inode_lock;
3) The fs_info->delayed_iput_lock spin lock since the shrinker adds
delayed iputs (calls btrfs_add_delayed_iput()).
Instead of allowing the extent map shrinker to be run by any task, make
it run only by kswapd tasks. This still solves the problem of running
into OOM situations due to an unbounded extent map creation, which is
simple to trigger by direct IO writes, as described in the changelog
of commit 956a17d9d050 ("btrfs: add a shrinker for extent maps"), and
by a similar case when doing buffered IO on files with a very large
number of holes (keeping the file open and creating many holes, whose
extent maps are only released when the file is closed).
Qu Wenruo [Sun, 11 Aug 2024 23:22:44 +0000 (08:52 +0930)]
btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
[REPORT]
There is a bug report that kernel is rejecting a mismatching inode mode
and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with
dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE]
It looks like the inode mode is correct, while the dir item type
0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT]
Although tree-checker is not able to do any cross-leaf verification, for
this particular case we can at least reject any dir type with
BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX), to
(0, BTRFS_FT_MAX).
Although the existing corruption can not be fixed just by such enhanced
checking, it should prevent the same 0x2->0x0 bitflip for dir type to
reach disk in the future.
Josef Bacik [Thu, 11 Apr 2024 20:41:20 +0000 (16:41 -0400)]
btrfs: check delayed refs when we're checking if a ref exists
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete
resume") I added some code to handle file systems that had been
corrupted by a bug that incorrectly skipped updating the drop progress
key while dropping a snapshot. This code would check to see if we had
already deleted our reference for a child block, and skip the deletion
if we had already.
Unfortunately there is a bug, as the check would only check the on-disk
references. I made an incorrect assumption that blocks in an already
deleted snapshot that was having the deletion resume on mount wouldn't
be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily
modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They
share references to block 1. Block 1 will have 2 full references, one
for subvolume a and one for subvolume b, and it belongs to subvolume a
(btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1,
and because we are the owner we will drop our full reference for all of
block 1's children, convert block 1 to FULL BACKREF, and add a shared
reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the
extent info for block 1, which checks delayed refs and tells us that
FULL BACKREF is set, so sets parent to the bytenr of block 1. However
because this is a resumed snapshot deletion, we call into
check_ref_exists(). Because check_ref_exists() only looks at the disk,
it doesn't find the shared backref for the child of block 1, and thus
returns 0 and we skip deleting the reference for the child of block 1
and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in
btrfs_lookup_extent_info(). However we only care about whether the
reference exists or not. If we fail to find our reference on disk, go
look up the bytenr in the delayed refs, and if it exists look for an
existing ref in the delayed ref head. If that exists then we know we
can delete the reference safely and carry on. If it doesn't exist we
know we have to skip over this block.
This bug has existed since I introduced this fix, however requires
having multiple deleted snapshots pending when we unmount. We noticed
this in production because our shutdown path stops the container on the
system, which deletes a bunch of subvolumes, and then reboots the box.
This gives us plenty of opportunities to hit this issue. Looking at the
history we've seen this occasionally in production, but we had a big
spike recently thanks to faster machines getting jobs with multiple
subvolumes in the job.
Chris Mason wrote a reproducer which does the following
On the second loop this would fail consistently, with my patch it has
been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could
debug the problem. Using the existing failure case to test my patch
validated that it fixes the problem.
Linus Torvalds [Sun, 11 Aug 2024 17:20:29 +0000 (10:20 -0700)]
Merge tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Fix 32-bit PTI for real.
pti_clone_entry_text() is called twice, once before initcalls so that
initcalls can use the user-mode helper and then again after text is
set read only. Setting read only on 32-bit might break up the PMD
mapping, which makes the second invocation of pti_clone_entry_text()
find the mappings out of sync and failing.
Allow the second call to split the existing PMDs in the user mapping
and synchronize with the kernel mapping.
- Don't make acpi_mp_wake_mailbox read-only after init as the mail box
must be writable in the case that CPU hotplug operations happen after
boot. Otherwise the attempt to start a CPU crashes with a write to
read only memory.
- Add a missing sanity check in mtrr_save_state() to ensure that the
fixed MTRR MSRs are supported.
Otherwise mtrr_save_state() ends up in a #GP, which is fixed up, but
the WARN_ON() can bring systems down when panic on warn is set.
* tag 'x86-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Check if fixed MTRRs exist before saving them
x86/paravirt: Fix incorrect virt spinlock setting on bare metal
x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailbox
x86/mm: Fix PTI for i386 some more
Linus Torvalds [Sun, 11 Aug 2024 17:15:34 +0000 (10:15 -0700)]
Merge tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull time keeping fixes from Thomas Gleixner:
- Fix a couple of issues in the NTP code where user supplied values are
neither sanity checked nor clamped to the operating range. This
results in integer overflows and eventualy NTP getting out of sync.
According to the history the sanity checks had been removed in favor
of clamping the values, but the clamping never worked correctly under
all circumstances. The NTP people asked to not bring the sanity
checks back as it might break existing applications.
Make the clamping work correctly and add it where it's missing
- If adjtimex() sets the clock it has to trigger the hrtimer subsystem
so it can adjust and if the clock was set into the future expire
timers if needed. The caller should provide a bitmask to tell
hrtimers which clocks have been adjusted.
adjtimex() uses not the proper constant and uses CLOCK_REALTIME
instead, which is 0. So hrtimers adjusts only the clocks, but does
not check for expired timers, which might make them expire really
late. Use the proper bitmask constant instead.
* tag 'timers-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix bogus clock_was_set() invocation in do_adjtimex()
ntp: Safeguard against time_constant overflow
ntp: Clamp maxerror and esterror to operating range
Linus Torvalds [Sun, 11 Aug 2024 17:07:52 +0000 (10:07 -0700)]
Merge tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Three small fixes for interrupt core and drivers:
- The interrupt core fails to honor caller supplied affinity hints
for non-managed interrupts and uses the system default affinity on
startup instead. Set the missing flag in the descriptor to tell the
core to use the provided affinity.
- Fix a shift out of bounds error in the Xilinx driver
- Handle switching to level trigger correctly in the RISCV APLIC
driver. It failed to retrigger the interrupt which causes it to
become stale"
* tag 'irq-urgent-2024-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/riscv-aplic: Retrigger MSI interrupt on source configuration
irqchip/xilinx: Fix shift out of bounds
genirq/irqdesc: Honor caller provided affinity in alloc_desc()
Linus Torvalds [Sun, 11 Aug 2024 16:55:32 +0000 (09:55 -0700)]
Merge tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for reported issues for
6.11-rc3. Included in here are:
- usb serial driver MODULE_DESCRIPTION() updates
- usb serial driver fixes
- typec driver fixes
- usb-ip driver fix
- gadget driver fixes
- dt binding update
All of these have been in linux-next with no reported issues"
* tag 'usb-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: ucsi: Fix a deadlock in ucsi_send_command_common()
usb: typec: tcpm: avoid sink goto SNK_UNATTACHED state if not received source capability message
usb: gadget: f_fs: pull out f->disable() from ffs_func_set_alt()
usb: gadget: f_fs: restore ffs_func_disable() functionality
USB: serial: debug: do not echo input by default
usb: typec: tipd: Delete extra semi-colon
usb: typec: tipd: Fix dereferencing freeing memory in tps6598x_apply_patch()
usb: gadget: u_serial: Set start_delayed during suspend
usb: typec: tcpci: Fix error code in tcpci_check_std_output_cap()
usb: typec: fsa4480: Check if the chip is really there
usb: gadget: core: Check for unset descriptor
usb: vhci-hcd: Do not drop references before new references are gained
usb: gadget: u_audio: Check return codes from usb_ep_enable and config_ep_by_speed.
usb: gadget: midi2: Fix the response for FB info with block 0xff
dt-bindings: usb: microchip,usb2514: Add USB2517 compatible
USB: serial: garmin_gps: use struct_size() to allocate pkt
USB: serial: garmin_gps: annotate struct garmin_packet with __counted_by
USB: serial: add missing MODULE_DESCRIPTION() macros
USB: serial: spcp8x5: remove unused struct 'spcp8x5_usb_ctrl_arg'
Linus Torvalds [Sun, 11 Aug 2024 16:51:29 +0000 (09:51 -0700)]
Merge tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for reported problems
for 6.11-rc3. Included in here are:
- sc16is7xx serial driver fixes
- uartclk bugfix for a divide by zero issue
- conmakehash userspace build issue fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: vt: conmakehash: cope with abs_srctree no longer in env
serial: sc16is7xx: fix invalid FIFO access with special register set
serial: sc16is7xx: fix TX fifo corruption
serial: core: check uartclk for zero to avoid divide by zero
Linus Torvalds [Sun, 11 Aug 2024 16:38:38 +0000 (09:38 -0700)]
Merge tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core / documentation fixes from Greg KH:
"Here are some small fixes, and some documentation updates for
6.11-rc3. Included in here are:
- embargoed hardware documenation updates based on a lot of review by
legal-types in lots of companies to try to make the process a _bit_
easier for us to manage over time.
- rust firmware documentation fix
- driver detach race fix for the fix that went into 6.11-rc1
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Fix uevent_show() vs driver detach race
Documentation: embargoed-hardware-issues.rst: add a section documenting the "early access" process
Documentation: embargoed-hardware-issues.rst: minor cleanups and fixes
rust: firmware: fix invalid rustdoc link
Linus Torvalds [Sun, 11 Aug 2024 16:28:04 +0000 (09:28 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two core fixes: one to prevent discard type changes (seen on iSCSI)
during intermittent errors and the other is fixing a lockdep problem
caused by the queue limits change.
And one driver fix in ufs"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Keep the discard mode stable
scsi: sd: Move sd_read_cpr() out of the q->limits_lock region
scsi: ufs: core: Fix hba->last_dme_cmd_tstamp timestamp updating logic
Linus Torvalds [Sat, 10 Aug 2024 17:44:21 +0000 (10:44 -0700)]
Merge tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Two minor fixes for recent changes
* tag 'nfsd-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't set SVC_SOCK_ANONYMOUS when creating nfsd sockets
sunrpc: avoid -Wformat-security warning
Linus Torvalds [Sat, 10 Aug 2024 17:28:52 +0000 (10:28 -0700)]
Merge tag 'i2c-for-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- Two fixes for SMBusAlert handling in the I2C core: one to avoid an
endless loop when scanning for handlers and one to make sure handlers
are always called even if HW has broken behaviour
- I2C header build fix for when ACPI is enabled but I2C isn't
- The testunit gets a rename in the code to match the documentation
- Two fixes for the Qualcomm GENI I2C controller are cleaning up the
error exit patch in the runtime_resume() function. The first is
disabling the clock, the second disables the icc on the way out
* tag 'i2c-for-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: testunit: match HostNotify test name with docs
i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
i2c: qcom-geni: Add missing clk_disable_unprepare in geni_i2c_runtime_resume
i2c: Fix conditional for substituting empty ACPI functions
i2c: smbus: Send alert notifications to all devices if source not found
i2c: smbus: Improve handling of stuck alerts
Linus Torvalds [Sat, 10 Aug 2024 17:19:05 +0000 (10:19 -0700)]
Merge tag 'dma-mapping-6.11-2024-08-10' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- avoid a deadlock with dma-debug and netconsole (Rik van Riel)
* tag 'dma-mapping-6.11-2024-08-10' of git://git.infradead.org/users/hch/dma-mapping:
dma-debug: avoid deadlock between dma debug vs printk and netconsole
Linus Torvalds [Sat, 10 Aug 2024 17:06:26 +0000 (10:06 -0700)]
Merge tag 'bcachefs-2024-08-10' of git://evilpiepirate.org/bcachefs
Pull more bcachefs fixes from Kent Overstreet:
"A couple last minute fixes for the new disk accounting
- fix a bug that was causing ACLs to seemingly "disappear"
- new on disk format version, bcachefs_metadata_version_disk_accounting_v3
bcachefs_metadata_version_disk_accounting_v2 accidentally included
padding in disk_accounting_key; fortunately, 6.11 isn't out yet so
we can fix this with another version bump"
* tag 'bcachefs-2024-08-10' of git://evilpiepirate.org/bcachefs:
bcachefs: bcachefs_metadata_version_disk_accounting_v3
bcachefs: improve bch2_dev_usage_to_text()
bcachefs: bch2_accounting_invalid()
bcachefs: Switch to .get_inode_acl()
Yong-Xuan Wang [Fri, 9 Aug 2024 07:10:47 +0000 (15:10 +0800)]
irqchip/riscv-aplic: Retrigger MSI interrupt on source configuration
The section 4.5.2 of the RISC-V AIA specification says that "any write
to a sourcecfg register of an APLIC might (or might not) cause the
corresponding interrupt-pending bit to be set to one if the rectified
input value is high (= 1) under the new source mode."
When the interrupt type is changed in the sourcecfg register, the APLIC
device might not set the corresponding pending bit, so the interrupt might
never become pending.
To handle sourcecfg register changes for level-triggered interrupts in MSI
mode, manually set the pending bit for retriggering interrupt so it gets
retriggered if it was already asserted.
The device tree property 'xlnx,kind-of-intr' is sanity checked that the
bitmask contains only set bits which are in the range of the number of
interrupts supported by the controller.
The check is done by shifting the mask right by the number of supported
interrupts and checking the result for zero.
The data type of the mask is u32 and the number of supported interrupts is
up to 32. In case of 32 interrupts the shift is out of bounds, resulting in
a mismatch warning. The out of bounds condition is also reported by UBSAN:
UBSAN: shift-out-of-bounds in irq-xilinx-intc.c:332:22
shift exponent 32 is too large for 32-bit type 'unsigned int'
Linus Torvalds [Sat, 10 Aug 2024 04:33:25 +0000 (21:33 -0700)]
Merge tag '6.11-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- DFS fix
- fix for security flags for requiring encryption
- minor cleanup
* tag '6.11-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: cifs_inval_name_dfs_link_error: correct the check for fullpath
Fix spelling errors in Server Message Block
smb3: fix setting SecurityFlags when encryption is required
Linus Torvalds [Sat, 10 Aug 2024 04:26:50 +0000 (21:26 -0700)]
Merge tag 'spi-fix-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few SPI fixes: clock rate calculation fixes for the Kunpeng and lpsi
drivers and a missing registration of a device ID for spidev (which
had only been updated for DT cases, causing warnings)"
* tag 'spi-fix-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-fsl-lpspi: Fix scldiv calculation
spi: spidev: Add missing spi_device_id for bh2228fv
spi: hisi-kunpeng: Add verification for the max_frequency provided by the firmware
spi: hisi-kunpeng: Add validation for the minimum value of speed_hz
bcachefs_metadata_version_disk_accounting_v2 erroneously had padding
bytes in disk_accounting_key, which is a problem because we have to
guarantee that all unused bytes in disk_accounting_key are zeroed.
Fortunately 6.11 isn't out yet, so it's cheap to fix this by spinning a
new version.
Linus Torvalds [Fri, 9 Aug 2024 21:00:22 +0000 (14:00 -0700)]
Merge tag 'drm-fixes-2024-08-10' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly regular fixes, mostly amdgpu with i915/xe having a few each,
and then some misc bits across the board, seems about right for rc3
time.
xe:
- Fix off-by-one when processing RTP rules
- Use dma_fence_chain_free in chain fence unused as a sync
- Fix PL1 disable flow in xe_hwmon_power_max_write
- Take ref to VM in delayed dump snapshot
i915:
- correct dual pps handling for MTL_PCH+ [display]
- Adjust vma offset for framebuffer mmap offset [gem]
- Fix Virtual Memory mapping boundaries calculation [gem]
- Allow evicting to use the requested placement
- Attempt to get pages without eviction first"
* tag 'drm-fixes-2024-08-10' of https://gitlab.freedesktop.org/drm/kernel: (31 commits)
drm/xe: Take ref to VM in delayed snapshot
drm/xe/hwmon: Fix PL1 disable flow in xe_hwmon_power_max_write
drm/xe: Use dma_fence_chain_free in chain fence unused as a sync
drm/xe/rtp: Fix off-by-one when processing rules
drm/amdgpu: Add DCC GFX12 flag to enable address alignment
drm/amdgpu: correct sdma7 max dw
drm/amdgpu: Add address alignment support to DCC buffers
drm/amd/display: Skip Recompute DSC Params if no Stream on Link
drm/amdgpu: change non-dcc buffer copy configuration
drm/amdgpu: Forward soft recovery errors to userspace
drm/amdgpu: add golden setting for gc v12
drm/buddy: Add start address support to trim function
drm/amd/display: Add missing program DET segment call to pipe init
drm/amd/display: Add missing DCN314 to the DML Makefile
drm/amdgpu: force to use legacy inv in mmhub
drm/amd/pm: update powerplay structure on smu v14.0.2/3
drm/amd/display: Add missing mcache registers
drm/amd/display: Add dcc propagation value
drm/amd/display: Add missing DET segments programming
drm/amd/display: Replace dm_execute_dmub_cmd with dc_wake_and_execute_dmub_cmd
...
Kent Overstreet [Fri, 9 Aug 2024 03:19:59 +0000 (23:19 -0400)]
bcachefs: bch2_accounting_invalid()
Implement bch2_accounting_invalid(); check for junk at the end, and
replicas accounting entries in particular need to be checked or we'll
pop asserts later.
Linus Torvalds [Fri, 9 Aug 2024 17:44:35 +0000 (10:44 -0700)]
Merge tag 'pm-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Change the default EPP (energy-performence preference) value for the
Emerald Rapids processor in the intel_pstate driver.
Thisshould improve both the performance and energy efficiency (Pedro
Henrique Kopper)"
* tag 'pm-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Update Balance performance EPP for Emerald Rapids
Linus Torvalds [Fri, 9 Aug 2024 17:23:18 +0000 (10:23 -0700)]
Merge tag 'asm-generic-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
"There are two more changes to the syscall.tbl conversion: the
'__NR_newfstat' in the previous bugfix was a mistake and gets reverted
now, after triple-checking that the contents are now back to what they
were on all architectures. The __NR_nfsservctl definition is not
really needed but came up in the same discussion as it had previously
been defined in uapi/asm-generic/unistd.h and tested for in user
space.
There are a few more symbols that used to be defined in the old
unistd.h file, but that are never defined on any other architecture
using syscall.tbl format. These used to be needed inside of the
kernel:
Searching for these on https://codesearch.debian.net/ shows a few
packages (rustc, golang, clamav, libseccomp, librsvg, strace) that
duplicate all the macros from asm/unistd.h, but nothing that actually
uses the macros, so I concluded that they are fine to omit after all"
* tag 'asm-generic-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
syscalls: add back legacy __NR_nfsservctl macro
syscalls: fix fstat() entry again
Linus Torvalds [Fri, 9 Aug 2024 16:43:46 +0000 (09:43 -0700)]
Merge tag 'probes-fixes-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull kprobe fixes from Masami Hiramatsu:
- Fix misusing str_has_prefix() parameter order to check symbol prefix
correctly
- bpf: remove unused declaring of bpf_kprobe_override
* tag 'probes-fixes-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
kprobes: Fix to check symbol prefixes correctly
bpf: kprobe: remove unused declaring of bpf_kprobe_override
Linus Torvalds [Fri, 9 Aug 2024 16:35:58 +0000 (09:35 -0700)]
Merge tag 'block-6.11-20240809' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Just a set of cleanups for blk-throttle and nvme structures"
* tag 'block-6.11-20240809' of git://git.kernel.dk/linux:
nvme: reorganize nvme_ns_head fields
nvme: change data type of lba_shift
nvme: remove a field from nvme_ns_head
nvme: remove unused parameter
blk-throttle: remove more latency dead-code
Linus Torvalds [Fri, 9 Aug 2024 16:32:10 +0000 (09:32 -0700)]
Merge tag 'io_uring-6.11-20240809' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"Nothing major in here, just two fixes for ensuring that bundle
recv/send requests always get marked for cleanups, and a single fix to
ensure that sends with provided buffers only pick a single buffer
unless the bundle option has been enabled"
* tag 'io_uring-6.11-20240809' of git://git.kernel.dk/linux:
io_uring/net: don't pick multiple buffers for non-bundle send
io_uring/net: ensure expanded bundle send gets marked for cleanup
io_uring/net: ensure expanded bundle recv gets marked for cleanup
Linus Torvalds [Fri, 9 Aug 2024 16:25:30 +0000 (09:25 -0700)]
Merge tag 'sound-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of lots of small changes, almost all device-specific:
- A series of fixes for ASoC Qualcomm stuff
- Various fixes for Cirrus ASoC and HD-audio codecs
- A few AMD ASoC quirks and usual HD-audio quirks
- Other misc fixes, including a long-time regression in USB-audio"
* tag 'sound-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (39 commits)
ASoC: cs35l56: Patch CS35L56_IRQ1_MASK_18 to the default value
ASoC: meson: axg-fifo: fix irq scheduling issue with PREEMPT_RT
MAINTAINERS: Update Cirrus Logic parts to linux-sound mailing list
ASoC: dt-bindings: qcom,wcd939x: Correct reset GPIO polarity in example
ASoC: dt-bindings: qcom,wcd938x: Correct reset GPIO polarity in example
ASoC: dt-bindings: qcom,wcd934x: Correct reset GPIO polarity in example
ASoC: dt-bindings: qcom,wcd937x: Correct reset GPIO polarity in example
ASoC: amd: yc: Add quirk entry for OMEN by HP Gaming Laptop 16-n0xxx
ASoC: codecs: ES8326: button detect issue
ASoC: amd: yc: Support mic on Lenovo Thinkpad E14 Gen 6
ALSA: usb-audio: Re-add ScratchAmp quirk entries
ALSA: hda/realtek: Add Framework Laptop 13 (Intel Core Ultra) to quirks
ALSA: hda/hdmi: Yet more pin fix for HP EliteDesk 800 G4
ALSA: hda: Add HP MP9 G4 Retail System AMS to force connect list
ASoC: cs35l56: Handle OTP read latency over SoundWire
ASoC: codecs: lpass-macro: fix missing codec version
ALSA: line6: Fix racy access to midibuf
ASoC: cs-amp-lib: Fix NULL pointer crash if efi.get_variable is NULL
ASoC: cs35l56: Stop creating ALSA controls for firmware coefficients
ASoC: wm_adsp: Add control_add callback and export wm_adsp_control_add()
...
Linus Torvalds [Fri, 9 Aug 2024 15:33:28 +0000 (08:33 -0700)]
module: make waiting for a concurrent module loader interruptible
The recursive aes-arm-bs module load situation reported by Russell King
is getting fixed in the crypto layer, but this in the meantime fixes the
"recursive load hangs forever" by just making the waiting for the first
module load be interruptible.
This should now match the old behavior before commit 9b9879fc0327
("modules: catch concurrent module loads, treat them as idempotent"),
which used the different "wait for module to be ready" code in
module_patient_check_exists().
End result: a recursive module load will still block, but now a signal
will interrupt it and fail the second module load, at which point the
first module will successfully complete loading.
Fixes: 9b9879fc0327 ("modules: catch concurrent module loads, treat them as idempotent") Cc: Russell King <[email protected]> Cc: Herbert Xu <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Wolfram Sang [Fri, 9 Aug 2024 13:28:08 +0000 (15:28 +0200)]
Merge tag 'i2c-host-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
Two fixes on the Qualcomm GENI I2C controller are cleaning up the
error exit patch in the runtime_resume() function. The first is
disabling the clock, the second disables the icc on the way out.
Takashi Iwai [Fri, 9 Aug 2024 07:58:07 +0000 (09:58 +0200)]
Merge tag 'asoc-fix-v6.11-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.11
Quite a lot of fixes have come in since the merge window, there's some
repetitive fixes over the Qualcomm drivers increasing the patch count,
along with a large batch of fixes from Cirrus. We also have some quirks
and some individual fixes.
Dave Airlie [Fri, 9 Aug 2024 07:08:55 +0000 (17:08 +1000)]
Merge tag 'drm-xe-fixes-2024-08-08' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
- Fix off-by-one when processing RTP rules (Lucas)
- Use dma_fence_chain_free in chain fence unused as a sync (Brost)
- Fix PL1 disable flow in xe_hwmon_power_max_write (Karthik)
- Take ref to VM in delayed dump snapshot (Brost)
Dave Airlie [Fri, 9 Aug 2024 03:00:59 +0000 (13:00 +1000)]
Merge tag 'drm-misc-fixes-2024-08-08' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A fix for drm/client to prevent a null pointer dereference, a fix for a
double-free in drm/bridge-connector, a fix for a gem shmem test, and a
fix for async flips updates.
cifs: cifs_inval_name_dfs_link_error: correct the check for fullpath
Replace the always-true check tcon->origin_fullpath with
check of server->leaf_fullpath
See https://bugzilla.kernel.org/show_bug.cgi?id=219083
The check of the new @tcon will always be true during mounting,
since @tcon->origin_fullpath will only be set after the tree is
connected to the latest common resource, as well as checking if
the prefix paths from it are fully accessible.
Fixes: 3ae872de4107 ("smb: client: fix shared DFS root mounts with different prefixes") Reviewed-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Gleb Korobeynikov <[email protected]> Signed-off-by: Steve French <[email protected]>
Linus Torvalds [Thu, 8 Aug 2024 20:51:44 +0000 (13:51 -0700)]
Merge tag 'net-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth.
Current release - regressions:
- eth: bnxt_en: fix memory out-of-bounds in bnxt_fill_hw_rss_tbl() on
older chips
Current release - new code bugs:
- ethtool: fix off-by-one error / kdoc contradicting the code for max
RSS context IDs
- Bluetooth: hci_qca:
- QCA6390: fix support on non-DT platforms
- QCA6390: don't call pwrseq_power_off() twice
- fix a NULL-pointer derefence at shutdown
- eth: ice: fix incorrect assigns of FEC counters
Previous releases - regressions:
- mptcp: fix handling endpoints with both 'signal' and 'subflow'
flags set
- virtio-net: fix changing ring count when vq IRQ coalescing not
supported
- eth: gve: fix use of netif_carrier_ok() during reconfig / reset
Previous releases - always broken:
- eth: idpf: fix bugs in queue re-allocation on reconfig / reset
- ethtool: fix context creation with no parameters
Misc:
- linkwatch: use system_unbound_wq to ease RTNL contention"
* tag 'net-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (41 commits)
net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897.
ethtool: Fix context creation with no parameters
net: ethtool: fix off-by-one error in max RSS context IDs
net: pse-pd: tps23881: include missing bitfield.h header
net: fec: Stop PPS on driver remove
net: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilities
l2tp: fix lockdep splat
net: stmmac: dwmac4: fix PCS duplex mode decode
idpf: fix UAFs when destroying the queues
idpf: fix memleak in vport interrupt configuration
idpf: fix memory leaks and crashes while performing a soft reset
bnxt_en : Fix memory out-of-bounds in bnxt_fill_hw_rss_tbl()
net: dsa: bcm_sf2: Fix a possible memory leak in bcm_sf2_mdio_register()
net/smc: add the max value of fallback reason count
Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor
Bluetooth: l2cap: always unlock channel in l2cap_conless_channel()
Bluetooth: hci_qca: fix a NULL-pointer derefence at shutdown
Bluetooth: hci_qca: fix QCA6390 support on non-DT platforms
Bluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390
ice: Fix incorrect assigns of FEC counts
...
Linus Torvalds [Thu, 8 Aug 2024 20:32:59 +0000 (13:32 -0700)]
Merge tag 'trace-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Have reading of event format files test if the metadata still exists.
When a event is freed, a flag (EVENT_FILE_FL_FREED) in the metadata
is set to state that it is to prevent any new references to it from
happening while waiting for existing references to close. When the
last reference closes, the metadata is freed. But the "format" was
missing a check to this flag (along with some other files) that
allowed new references to happen, and a use-after-free bug to occur.
- Have the trace event meta data use the refcount infrastructure
instead of relying on its own atomic counters.
- Have tracefs inodes use alloc_inode_sb() for allocation instead of
using kmem_cache_alloc() directly.
- Have eventfs_create_dir() return an ERR_PTR instead of NULL as the
callers expect a real object or an ERR_PTR.
- Have release_ei() use call_srcu() and not call_rcu() as all the
protection is on SRCU and not RCU.
- Fix ftrace_graph_ret_addr() to use the task passed in and not
current.
- Fix overflow bug in get_free_elt() where the counter can overflow the
integer and cause an infinite loop.
- Remove unused function ring_buffer_nr_pages()
- Have tracefs freeing use the inode RCU infrastructure instead of
creating its own.
When the kernel had randomize structure fields enabled, the rcu field
of the tracefs_inode was overlapping the rcu field of the inode
structure, and corrupting it. Instead, use the destroy_inode()
callback to do the initial cleanup of the code, and then have
free_inode() free it.
* tag 'trace-v6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracefs: Use generic inode RCU for synchronizing freeing
ring-buffer: Remove unused function ring_buffer_nr_pages()
tracing: Fix overflow in get_free_elt()
function_graph: Fix the ret_stack used by ftrace_graph_ret_addr()
eventfs: Use SRCU for freeing eventfs_inodes
eventfs: Don't return NULL in eventfs_create_dir()
tracefs: Fix inode allocation
tracing: Use refcount for trace_event_file reference counter
tracing: Have format file honor EVENT_FILE_FL_FREED
Linus Torvalds [Thu, 8 Aug 2024 20:27:31 +0000 (13:27 -0700)]
Merge tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Assorted little stuff:
- lockdep fixup for lockdep_set_notrack_class()
- we can now remove a device when using erasure coding without
deadlocking, though we still hit other issues
- the 'allocator stuck' timeout is now configurable, and messages are
ratelimited. The default timeout has been increased from 10 seconds
to 30"
* tag 'bcachefs-2024-08-08' of git://evilpiepirate.org/bcachefs:
bcachefs: Use bch2_wait_on_allocator() in btree node alloc path
bcachefs: Make allocator stuck timeout configurable, ratelimit messages
bcachefs: Add missing path_traverse() to btree_iter_next_node()
bcachefs: ec should not allocate from ro devs
bcachefs: Improved allocator debugging for ec
bcachefs: Add missing bch2_trans_begin() call
bcachefs: Add a comment for bucket helper types
bcachefs: Don't rely on implicit unsigned -> signed integer conversion
lockdep: Fix lockdep_set_notrack_class() for CONFIG_LOCK_STAT
bcachefs: Fix double free of ca->buckets_nouse
Jerome Brunet [Wed, 7 Aug 2024 16:27:03 +0000 (18:27 +0200)]
ASoC: meson: axg-fifo: fix irq scheduling issue with PREEMPT_RT
With PREEMPT_RT enabled a spinlock_t becomes a sleeping lock.
This is usually not a problem with spinlocks used in IRQ context since
IRQ handlers get threaded. However, if IRQF_ONESHOT is set, the primary
handler won't be force-threaded and runs always in hardirq context. This is
a problem because spinlock_t requires a preemptible context on PREEMPT_RT.
In this particular instance, regmap mmio uses spinlock_t to protect the
register access and IRQF_ONESHOT is set on the IRQ. In this case, it is
actually better to do everything in threaded handler and it solves the
problem with PREEMPT_RT.
ASoC: dt-bindings: qcom,wcd939x: Correct reset GPIO polarity in example
The reset GPIO of WCD9390/WCD9395 is active low and that's how it is
routed on typical boards, so correct the example DTS to use expected
polarity, instead of IRQ flag (which is a logical mistake on its own).
Linus Torvalds [Thu, 8 Aug 2024 19:29:40 +0000 (12:29 -0700)]
module: warn about excessively long module waits
Russell King reported that the arm cbc(aes) crypto module hangs when
loaded, and Herbert Xu bisected it to commit 9b9879fc0327 ("modules:
catch concurrent module loads, treat them as idempotent"), and noted:
"So what's happening here is that the first modprobe tries to load a
fallback CBC implementation, in doing so it triggers a load of the
exact same module due to module aliases.
IOW we're loading aes-arm-bs which provides cbc(aes). However, this
needs a fallback of cbc(aes) to operate, which is made out of the
generic cbc module + any implementation of aes, or ecb(aes). The
latter happens to also be provided by aes-arm-cb so that's why it
tries to load the same module again"
So loading the aes-arm-bs module ends up wanting to recursively load
itself, and the recursive load then ends up waiting for the original
module load to complete.
This is a regression, in that it used to be that we just tried to load
the module multiple times, and then as we went on to install it the
second time we would instead just error out because the module name
already existed.
That is actually also exactly what the original "catch concurrent loads"
patch did in commit 9828ed3f695a ("module: error out early on concurrent
load of the same module file"), but it turns out that it ends up being
racy, in that erroring out before the module has been fully initialized
will cause failures in dependent module loading.
See commit ac2263b588df (which was the revert of that "error out early")
commit for details about why erroring out before the module has been
initialized is actually fundamentally racy.
Now, for the actual recursive module load (as opposed to just
concurrently loading the same module twice), the race is not an issue.
At the same time it's hard for the kernel to see that this is recursion,
because the module load is always done from a usermode helper, so the
recursion is not some simple callchain within the kernel.
End result: this is not the real fix, but this at least adds a warning
for the situation (admittedly much too late for all the debugging pain
that Russell and Herbert went through) and if we can come to a
resolution on how to detect the recursion properly, this re-organizes
the code to make that easier.
Kent Overstreet [Wed, 7 Aug 2024 19:42:23 +0000 (15:42 -0400)]
bcachefs: Switch to .get_inode_acl()
.set_acl() requires a dentry, and if one isn't passed it marks the VFS
inode as not having an ACL.
This has been causing inodes with ACLs to have them "disappear" on
bcachefs filesystem, depending on which path those inodes get pulled
into the cache from.
Switching to .get_inode_acl(), like other local filesystems, fixes this.
Jens Axboe [Thu, 8 Aug 2024 18:27:40 +0000 (12:27 -0600)]
Merge tag 'nvme-6.11-2024-08-08' of git://git.infradead.org/nvme into block-6.11
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.11
- Cleanups and improved struct packing (Kanchan)"
* tag 'nvme-6.11-2024-08-08' of git://git.infradead.org/nvme:
nvme: reorganize nvme_ns_head fields
nvme: change data type of lba_shift
nvme: remove a field from nvme_ns_head
nvme: remove unused parameter
Linus Torvalds [Thu, 8 Aug 2024 18:22:04 +0000 (11:22 -0700)]
Merge tag 'loongarch-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Enable general EFI poweroff method to make poweroff usable on
hardwares which lack ACPI S5, use accessors to page table entries
instead of direct dereference to avoid potential problems, and two
trivial kvm cleanups"
* tag 'loongarch-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Remove undefined a6 argument comment for kvm_hypercall()
LoongArch: KVM: Remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS
LoongArch: Use accessors to page table entries instead of direct dereference
LoongArch: Enable general EFI poweroff method
Matthew Brost [Thu, 1 Aug 2024 15:41:16 +0000 (08:41 -0700)]
drm/xe: Take ref to VM in delayed snapshot
Kernel BO's don't take a ref to the VM, we need the VM for the
delayed snapshot, so take a ref to the VM in delayed snapshot.
v2:
- Check for lrc_bo before taking a VM ref (CI)
- Check lrc_bo->vm before taking / dropping a VM ref (CI)
- Drop VM in xe_lrc_snapshot_free
v5:
- Fix commit message wording (Johnathan)
Karthik Poosa [Thu, 1 Aug 2024 11:24:24 +0000 (16:54 +0530)]
drm/xe/hwmon: Fix PL1 disable flow in xe_hwmon_power_max_write
In xe_hwmon_power_max_write, for PL1 disable supported case, instead of
returning after PL1 disable, PL1 enable path was also being run.
Fixed it by returning after disable.
v2: Correct typo and grammar in commit message. (Jonathan)
Matthew Brost [Sat, 27 Jul 2024 01:22:16 +0000 (18:22 -0700)]
drm/xe: Use dma_fence_chain_free in chain fence unused as a sync
A chain fence is uninitialized if not installed in a drm sync obj. Thus
if xe_sync_entry_cleanup is called and sync->chain_fence is non-NULL the
proper cleanup is dma_fence_chain_free rather than a dma-fence put.
Lucas De Marchi [Fri, 26 Jul 2024 06:43:35 +0000 (23:43 -0700)]
drm/xe/rtp: Fix off-by-one when processing rules
Gustavo noticed an odd "+ 2" in rtp_mark_active() while processing
rtp rules and pointed that it should be "+ 1". In fact, while processing
entries without actions (OOB workarounds), if the WA is activated and
has OR rules, it will also inadvertently activate the very next
workaround.
Test in a LNL B0 platform by moving 18024947630 on top of 16020292621,
makes the latter become active:
Gavin Shan [Thu, 8 Aug 2024 04:08:08 +0000 (14:08 +1000)]
cpumask: Fix crash on updating CPU enabled mask
The CPU enabled mask instead of the CPU possible mask should be used
by set_cpu_enabled(). Otherwise, we run into crash due to write to
the read-only CPU possible mask when vCPU is hot added on ARM64.
Steve French [Thu, 1 Aug 2024 02:38:50 +0000 (21:38 -0500)]
smb3: fix setting SecurityFlags when encryption is required
Setting encryption as required in security flags was broken.
For example (to require all mounts to be encrypted by setting):
"echo 0x400c5 > /proc/fs/cifs/SecurityFlags"
Would return "Invalid argument" and log "Unsupported security flags"
This patch fixes that (e.g. allowing overriding the default for
SecurityFlags 0x00c5, including 0x40000 to require seal, ie
SMB3.1.1 encryption) so now that works and forces encryption
on subsequent mounts.
Martin Whitaker [Wed, 7 Aug 2024 20:52:09 +0000 (21:52 +0100)]
net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897.
As noted in the device errata [1-8], EEE support is not fully operational
in the KSZ8567, KSZ9477, KSZ9567, KSZ9896, and KSZ9897 devices, causing
link drops when connected to another device that supports EEE. The patch
series "net: add EEE support for KSZ9477 switch family" merged in commit 9b0bf4f77162 caused EEE support to be enabled in these devices. A fix for
this regression for the KSZ9477 alone was merged in commit 08c6d8bae48c2.
This patch extends this fix to the other affected devices.
Gal Pressman [Wed, 7 Aug 2024 17:33:52 +0000 (20:33 +0300)]
ethtool: Fix context creation with no parameters
The 'at least one change' requirement is not applicable for context
creation, skip the check in such case.
This allows a command such as 'ethtool -X eth0 context new' to work.
The command works by mistake when using older versions of userspace
ethtool due to an incompatibility issue where rxfh.input_xfrm is passed
as zero (unset) instead of RXH_XFRM_NO_CHANGE as done with recent
userspace. This patch does not try to solve the incompatibility issue.
Edward Cree [Wed, 7 Aug 2024 16:06:12 +0000 (17:06 +0100)]
net: ethtool: fix off-by-one error in max RSS context IDs
Both ethtool_ops.rxfh_max_context_id and the default value used when
it's not specified are supposed to be exclusive maxima (the former
is documented as such; the latter, U32_MAX, cannot be used as an ID
since it equals ETH_RXFH_CONTEXT_ALLOC), but xa_alloc() expects an
inclusive maximum.
Subtract one from 'limit' to produce an inclusive maximum, and pass
that to xa_alloc().
Increase bnxt's max by one to prevent a (very minor) regression, as
BNXT_MAX_ETH_RSS_CTX is an inclusive max. This is safe since bnxt
is not actually hard-limited; BNXT_MAX_ETH_RSS_CTX is just a
leftover from old driver code that managed context IDs itself.
Rename rxfh_max_context_id to rxfh_max_num_contexts to make its
semantics (hopefully) more obvious.
Arnd Bergmann [Wed, 7 Aug 2024 07:54:22 +0000 (09:54 +0200)]
net: pse-pd: tps23881: include missing bitfield.h header
Using FIELD_GET() fails in configurations that don't already include
the header file indirectly:
drivers/net/pse-pd/tps23881.c: In function 'tps23881_i2c_probe':
drivers/net/pse-pd/tps23881.c:755:13: error: implicit declaration of function 'FIELD_GET' [-Wimplicit-function-declaration]
755 | if (FIELD_GET(TPS23881_REG_DEVID_MASK, ret) != TPS23881_DEVICE_ID) {
| ^~~~~~~~~
Csókás, Bence [Wed, 7 Aug 2024 08:09:56 +0000 (10:09 +0200)]
net: fec: Stop PPS on driver remove
PPS was not stopped in `fec_ptp_stop()`, called when
the adapter was removed. Consequentially, you couldn't
safely reload the driver with the PPS signal on.
net: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilities
Some Wake-on-LAN modes such as WAKE_FILTER may only be supported by the MAC,
while others might be only supported by the PHY. Make sure that the .get_wol()
returns the union of both rather than only that of the PHY if the PHY supports
Wake-on-LAN.
James Chapman [Tue, 6 Aug 2024 16:06:26 +0000 (17:06 +0100)]
l2tp: fix lockdep splat
When l2tp tunnels use a socket provided by userspace, we can hit
lockdep splats like the below when data is transmitted through another
(unrelated) userspace socket which then gets routed over l2tp.
This issue was previously discussed here:
https://lore.kernel.org/netdev/[email protected]/
The solution is to have lockdep treat socket locks of l2tp tunnel
sockets separately than those of standard INET sockets. To do so, use
a different lockdep subclass where lock nesting is possible.
============================================
WARNING: possible recursive locking detected
6.10.0+ #34 Not tainted
--------------------------------------------
iperf3/771 is trying to acquire lock: ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0
but task is already holding lock: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10
other info that might help us debug this:
Possible unsafe locking scenario:
dwmac4 was decoding the duplex mode from the GMAC_PHYIF_CONTROL_STATUS
register incorrectly, using GMAC_PHYIF_CTRLSTATUS_LNKMOD_MASK (value 1)
rather than GMAC_PHYIF_CTRLSTATUS_LNKMOD (bit 16). Fix this.
Andi Kleen [Thu, 8 Aug 2024 00:02:44 +0000 (17:02 -0700)]
x86/mtrr: Check if fixed MTRRs exist before saving them
MTRRs have an obsolete fixed variant for fine grained caching control
of the 640K-1MB region that uses separate MSRs. This fixed variant has
a separate capability bit in the MTRR capability MSR.
So far all x86 CPUs which support MTRR have this separate bit set, so it
went unnoticed that mtrr_save_state() does not check the capability bit
before accessing the fixed MTRR MSRs.
Though on a CPU that does not support the fixed MTRR capability this
results in a #GP. The #GP itself is harmless because the RDMSR fault is
handled gracefully, but results in a WARN_ON().
Linus Torvalds [Thu, 8 Aug 2024 14:32:20 +0000 (07:32 -0700)]
Merge tag 'mm-hotfixes-stable-2024-08-07-18-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Nine hotfixes. Five are cc:stable, the others either pertain to
post-6.10 material or aren't considered necessary for earlier kernels.
Five are MM and four are non-MM. No identifiable theme here - please
see the individual changelogs"
* tag 'mm-hotfixes-stable-2024-08-07-18-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
padata: Fix possible divide-by-0 panic in padata_mt_helper()
mailmap: update entry for David Heidelberg
memcg: protect concurrent access to mem_cgroup_idr
mm: shmem: fix incorrect aligned index when checking conflicts
mm: shmem: avoid allocating huge pages larger than MAX_PAGECACHE_ORDER for shmem
mm: list_lru: fix UAF for memory cgroup
kcov: properly check for softirq context
MAINTAINERS: Update LTP members and web
selftests: mm: add s390 to ARCH check
Takashi Iwai [Thu, 8 Aug 2024 08:18:01 +0000 (10:18 +0200)]
ALSA: usb-audio: Re-add ScratchAmp quirk entries
At the code refactoring of USB-audio quirk handling, I assumed that
the quirk entries of Stanton ScratchAmp devices were only about the
device name, and moved them completely into the rename table.
But it seems that the device requires the quirk entry so that it's
probed by the driver itself.
This re-adds back the quirk entries of ScratchAmp, but in a
minimalistic manner.
Jakub Kicinski [Thu, 8 Aug 2024 03:31:42 +0000 (20:31 -0700)]
Merge tag 'for-net-2024-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_sync: avoid dup filtering when passive scanning with adv monitor
- hci_qca: don't call pwrseq_power_off() twice for QCA6390
- hci_qca: fix QCA6390 support on non-DT platforms
- hci_qca: fix a NULL-pointer derefence at shutdown
- l2cap: always unlock channel in l2cap_conless_channel()
* tag 'for-net-2024-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor
Bluetooth: l2cap: always unlock channel in l2cap_conless_channel()
Bluetooth: hci_qca: fix a NULL-pointer derefence at shutdown
Bluetooth: hci_qca: fix QCA6390 support on non-DT platforms
Bluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390
====================
====================
idpf: fix 3 bugs revealed by the Chapter I
Alexander Lobakin says:
The libeth conversion revealed 2 serious issues which lead to sporadic
crashes or WARNs under certain configurations. Additional one was found
while debugging these two with kmemleak.
This one is targeted stable, the rest can be backported manually later
if needed. They can be reproduced only after the conversion is applied
anyway.
====================
The second tagged commit started sometimes (very rarely, but possible)
throwing WARNs from
net/core/page_pool.c:page_pool_disable_direct_recycling().
Turned out idpf frees interrupt vectors with embedded NAPIs *before*
freeing the queues making page_pools' NAPI pointers lead to freed
memory before these pools are destroyed by libeth.
It's not clear whether there are other accesses to the freed vectors
when destroying the queues, but anyway, we usually free queue/interrupt
vectors only when the queues are destroyed and the NAPIs are guaranteed
to not be referenced anywhere.
Invert the allocation and freeing logic making queue/interrupt vectors
be allocated first and freed last. Vectors don't require queues to be
present, so this is safe. Additionally, this change allows to remove
that useless queue->q_vector pointer cleanup, as vectors are still
valid when freeing the queues (+ both are freed within one function,
so it's not clear why nullify the pointers at all).
Michal Kubiak [Tue, 6 Aug 2024 22:09:21 +0000 (15:09 -0700)]
idpf: fix memleak in vport interrupt configuration
The initialization of vport interrupt consists of two functions:
1) idpf_vport_intr_init() where a generic configuration is done
2) idpf_vport_intr_req_irq() where the irq for each q_vector is
requested.
The first function used to create a base name for each interrupt using
"kasprintf()" call. Unfortunately, although that call allocated memory
for a text buffer, that memory was never released.
Fix this by removing creating the interrupt base name in 1).
Instead, always create a full interrupt name in the function 2), because
there is no need to create a base name separately, considering that the
function 2) is never called out of idpf_vport_intr_init() context.
idpf: fix memory leaks and crashes while performing a soft reset
The second tagged commit introduced a UAF, as it removed restoring
q_vector->vport pointers after reinitializating the structures.
This is due to that all queue allocation functions are performed here
with the new temporary vport structure and those functions rewrite
the backpointers to the vport. Then, this new struct is freed and
the pointers start leading to nowhere.
But generally speaking, the current logic is very fragile. It claims
to be more reliable when the system is low on memory, but in fact, it
consumes two times more memory as at the moment of running this
function, there are two vports allocated with their queues and vectors.
Moreover, it claims to prevent the driver from running into "bad state",
but in fact, any error during the rebuild leaves the old vport in the
partially allocated state.
Finally, if the interface is down when the function is called, it always
allocates a new queue set, but when the user decides to enable the
interface later on, vport_open() allocates them once again, IOW there's
a clear memory leak here.
Just don't allocate a new queue set when performing a reset, that solves
crashes and memory leaks. Readd the old queue number and reopen the
interface on rollback - that solves limbo states when the device is left
disabled and/or without HW queues enabled.
Michael Chan [Tue, 6 Aug 2024 05:37:42 +0000 (22:37 -0700)]
bnxt_en : Fix memory out-of-bounds in bnxt_fill_hw_rss_tbl()
A recent commit has modified the code in __bnxt_reserve_rings() to
set the default RSS indirection table to default only when the number
of RX rings is changing. While this works for newer firmware that
requires RX ring reservations, it causes the regression on older
firmware not requiring RX ring resrvations (BNXT_NEW_RM() returns
false).
With older firmware, RX ring reservations are not required and so
hw_resc->resv_rx_rings is not always set to the proper value. The
comparison:
if (old_rx_rings != bp->hw_resc.resv_rx_rings)
in __bnxt_reserve_rings() may be false even when the RX rings are
changing. This will cause __bnxt_reserve_rings() to skip setting
the default RSS indirection table to default to match the current
number of RX rings. This may later cause bnxt_fill_hw_rss_tbl() to
use an out-of-range index.
We already have bnxt_check_rss_tbl_no_rmgr() to handle exactly this
scenario. We just need to move it up in bnxt_need_reserve_rings()
to be called unconditionally when using older firmware. Without the
fix, if the TX rings are changing, we'll skip the
bnxt_check_rss_tbl_no_rmgr() call and __bnxt_reserve_rings() may also
skip the bnxt_set_dflt_rss_indir_tbl() call for the reason explained
in the last paragraph. Without setting the default RSS indirection
table to default, it causes the regression:
BUG: KASAN: slab-out-of-bounds in __bnxt_hwrm_vnic_set_rss+0xb79/0xe40
Read of size 2 at addr ffff8881c5809618 by task ethtool/31525
Call Trace:
__bnxt_hwrm_vnic_set_rss+0xb79/0xe40
bnxt_hwrm_vnic_rss_cfg_p5+0xf7/0x460
__bnxt_setup_vnic_p5+0x12e/0x270
__bnxt_open_nic+0x2262/0x2f30
bnxt_open_nic+0x5d/0xf0
ethnl_set_channels+0x5d4/0xb30
ethnl_default_set_doit+0x2f1/0x620
Joe Hattori [Tue, 6 Aug 2024 01:13:27 +0000 (10:13 +0900)]
net: dsa: bcm_sf2: Fix a possible memory leak in bcm_sf2_mdio_register()
bcm_sf2_mdio_register() calls of_phy_find_device() and then
phy_device_remove() in a loop to remove existing PHY devices.
of_phy_find_device() eventually calls bus_find_device(), which calls
get_device() on the returned struct device * to increment the refcount.
The current implementation does not decrement the refcount, which causes
memory leak.
This commit adds the missing phy_device_free() call to decrement the
refcount via put_device() to balance the refcount.
Zhengchao Shao [Mon, 5 Aug 2024 04:38:56 +0000 (12:38 +0800)]
net/smc: add the max value of fallback reason count
The number of fallback reasons defined in the smc_clc.h file has reached
36. For historical reasons, some are no longer quoted, and there's 33
actually in use. So, add the max value of fallback reason count to 36.
Fixes: 6ac1e6563f59 ("net/smc: support smc v2.x features validate") Fixes: 7f0620b9940b ("net/smc: support max connections per lgr negotiation") Fixes: 69b888e3bb4b ("net/smc: support max links per lgr negotiation in clc handshake") Signed-off-by: Zhengchao Shao <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-by: D. Wythe <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
Looking at the padata_mt_helper() function, the only way a divide-by-0
panic can happen is when ps->chunk_size is 0. The way that chunk_size is
initialized in padata_do_multithreaded(), chunk_size can be 0 when the
min_chunk in the passed-in padata_mt_job structure is 0.
Fix this divide-by-0 panic by making sure that chunk_size will be at least
1 no matter what the input parameters are.
Shakeel Butt [Fri, 2 Aug 2024 23:58:22 +0000 (16:58 -0700)]
memcg: protect concurrent access to mem_cgroup_idr
Commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after
many small jobs") decoupled the memcg IDs from the CSS ID space to fix the
cgroup creation failures. It introduced IDR to maintain the memcg ID
space. The IDR depends on external synchronization mechanisms for
modifications. For the mem_cgroup_idr, the idr_alloc() and idr_replace()
happen within css callback and thus are protected through cgroup_mutex
from concurrent modifications. However idr_remove() for mem_cgroup_idr
was not protected against concurrency and can be run concurrently for
different memcgs when they hit their refcnt to zero. Fix that.
We have been seeing list_lru based kernel crashes at a low frequency in
our fleet for a long time. These crashes were in different part of
list_lru code including list_lru_add(), list_lru_del() and reparenting
code. Upon further inspection, it looked like for a given object (dentry
and inode), the super_block's list_lru didn't have list_lru_one for the
memcg of that object. The initial suspicions were either the object is
not allocated through kmem_cache_alloc_lru() or somehow
memcg_list_lru_alloc() failed to allocate list_lru_one() for a memcg but
returned success. No evidence were found for these cases.
Looking more deeply, we started seeing situations where valid memcg's id
is not present in mem_cgroup_idr and in some cases multiple valid memcgs
have same id and mem_cgroup_idr is pointing to one of them. So, the most
reasonable explanation is that these situations can happen due to race
between multiple idr_remove() calls or race between
idr_alloc()/idr_replace() and idr_remove(). These races are causing
multiple memcgs to acquire the same ID and then offlining of one of them
would cleanup list_lrus on the system for all of them. Later access from
other memcgs to the list_lru cause crashes due to missing list_lru_one.
Baolin Wang [Wed, 31 Jul 2024 05:46:20 +0000 (13:46 +0800)]
mm: shmem: fix incorrect aligned index when checking conflicts
In the shmem_suitable_orders() function, xa_find() is used to check for
conflicts in the pagecache to select suitable huge orders. However, when
checking each huge order in every loop, the aligned index is calculated
from the previous iteration, which may cause suitable huge orders to be
missed.
We should use the original index each time in the loop to calculate a new
aligned index for checking conflicts to avoid this issue.
Baolin Wang [Wed, 31 Jul 2024 05:46:19 +0000 (13:46 +0800)]
mm: shmem: avoid allocating huge pages larger than MAX_PAGECACHE_ORDER for shmem
Similar to commit d659b715e94ac ("mm/huge_memory: avoid PMD-size page
cache if needed"), ARM64 can support 512MB PMD-sized THP when the base
page size is 64KB, which is larger than the maximum supported page cache
size MAX_PAGECACHE_ORDER.
This is not expected. To fix this issue, use THP_ORDERS_ALL_FILE_DEFAULT
for shmem to filter allowable huge orders.
Muchun Song [Thu, 18 Jul 2024 08:36:07 +0000 (16:36 +0800)]
mm: list_lru: fix UAF for memory cgroup
The mem_cgroup_from_slab_obj() is supposed to be called under rcu lock or
cgroup_mutex or others which could prevent returned memcg from being
freed. Fix it by adding missing rcu read lock.
When collecting coverage from softirqs, KCOV uses in_serving_softirq() to
check whether the code is running in the softirq context. Unfortunately,
in_serving_softirq() is > 0 even when the code is running in the hardirq
or NMI context for hardirqs and NMIs that happened during a softirq.
As a result, if a softirq handler contains a remote coverage collection
section and a hardirq with another remote coverage collection section
happens during handling the softirq, KCOV incorrectly detects a nested
softirq coverate collection section and prints a WARNING, as reported by
syzbot.
This issue was exposed by commit a7f3813e589f ("usb: gadget: dummy_hcd:
Switch to hrtimer transfer scheduler"), which switched dummy_hcd to using
hrtimer and made the timer's callback be executed in the hardirq context.
Change the related checks in KCOV to account for this behavior of
in_serving_softirq() and make KCOV ignore remote coverage collection
sections in the hardirq and NMI contexts.
This prevents the WARNING printed by syzbot but does not fix the inability
of KCOV to collect coverage from the __usb_hcd_giveback_urb when dummy_hcd
is in use (caused by a7f3813e589f); a separate patch is required for that.
commit 0518dbe97fe6 ("selftests/mm: fix cross compilation with LLVM")
changed the env variable for the architecture from MACHINE to ARCH.
This is preventing 3 required TEST_GEN_FILES from being included when
cross compiling s390x and errors when trying to run the test suite. This
is due to the ARCH variable already being set and the arch folder name
being s390.
Add "s390" to the filtered list to cover this case and have the 3 files
included in the build.