Keith Busch [Wed, 23 Oct 2024 15:40:26 +0000 (08:40 -0700)]
nvme: module parameter to disable pi with offsets
A recent commit enables integrity checks for formats the previous kernel
versions registered with the "nop" integrity profile. This means
namespaces using that format become unreadable when upgrading the kernel
past that commit.
Introduce a module parameter to restore the "nop" integrity profile so
that storage can be readable once again. This could be a boot device, so
the setting needs to happen at module load time.
Keith Busch [Thu, 17 Oct 2024 17:45:24 +0000 (10:45 -0700)]
nvme: enhance cns version checking
The number of CNS bits in the command is specific to the nvme spec
version compliance. The existing check is not sufficient for possible
CNS values the driver uses that may create confusion between host and
device, so enhance the check to consider the version and desired CNS
value.
Jens Axboe [Fri, 18 Oct 2024 16:58:24 +0000 (10:58 -0600)]
Merge tag 'md-6.12-20241018' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.12
Pull MD fixes from Song.
* tag 'md-6.12-20241018' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md/raid10: fix null ptr dereference in raid10_size()
md: ensure child flush IO does not affect origin bio->bi_status
* tag 'nvme-6.12-2024-10-18' of git://git.infradead.org/nvme:
nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function
nvme: make keep-alive synchronous operation
nvme-loop: flush off pending I/O while shutting down loop controller
nvme-pci: fix race condition between reset and nvme_dev_disable()
nvme-multipath: defer partition scanning
nvme: disable CC.CRIME (NVME_CC_CRIME)
nvme: delete unnecessary fallthru comment
nvmet-rdma: use sbitmap to replace rsp free list
nvme: tcp: avoid race between queue_lock lock and destroy
nvmet-passthru: clear EUID/NGUID/UUID while using loop target
block: fix blk_rq_map_integrity_sg kernel-doc
Yu Kuai [Wed, 9 Oct 2024 01:49:14 +0000 (09:49 +0800)]
md/raid10: fix null ptr dereference in raid10_size()
In raid10_run() if raid10_set_queue_limits() succeed, the return value
is set to zero, and if following procedures failed raid10_run() will
return zero while mddev->private is still NULL, causing null ptr
dereference in raid10_size().
Fix the problem by only overwrite the return value if
raid10_set_queue_limits() failed.
Li Nan [Thu, 19 Sep 2024 06:30:48 +0000 (14:30 +0800)]
md: ensure child flush IO does not affect origin bio->bi_status
When a flush is issued to an RAID array, a child flush IO is created and
issued for each member disk in the RAID array. Since commit b75197e86e6d
("md: Remove flush handling"), each child flush IO has been chained with
the original bio. As a result, the failure of any child IO could modify
the bi_status of the original bio, potentially impacting the upper-layer
filesystem.
Fix the issue by preventing child flush IO from altering the original
bio->bi_status as before. However, this design introduces a known
issue: in the event of a power failure, if a flush IO on a member
disk fails, the upper layers may not be informed. This issue is not easy
to fix and will not be addressed for the time being in this issue.
Nilay Shroff [Wed, 16 Oct 2024 03:03:16 +0000 (08:33 +0530)]
nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function
We no more need acquiring ctrl->lock before accessing the
NVMe controller state and instead we can now use the helper
nvme_ctrl_state. So replace the use of ctrl->lock from
nvme_keep_alive_finish function with nvme_ctrl_state call.
Nilay Shroff [Wed, 16 Oct 2024 03:03:15 +0000 (08:33 +0530)]
nvme: make keep-alive synchronous operation
The nvme keep-alive operation, which executes at a periodic interval,
could potentially sneak in while shutting down a fabric controller.
This may lead to a race between the fabric controller admin queue
destroy code path (invoked while shutting down controller) and hw/hctx
queue dispatcher called from the nvme keep-alive async request queuing
operation. This race could lead to the kernel crash shown below:
While shutting down fabric controller, if nvme keep-alive request sneaks
in then it would be flushed off. The nvme_keep_alive_end_io function is
then invoked to handle the end of the keep-alive operation which
decrements the admin->q_usage_counter and assuming this is the last/only
request in the admin queue then the admin->q_usage_counter becomes zero.
If that happens then blk-mq destroy queue operation (blk_mq_destroy_
queue()) which could be potentially running simultaneously on another
cpu (as this is the controller shutdown code path) would forward
progress and deletes the admin queue. So, now from this point onward
we are not supposed to access the admin queue resources. However the
issue here's that the nvme keep-alive thread running hw/hctx queue
dispatch operation hasn't yet finished its work and so it could still
potentially access the admin queue resource while the admin queue had
been already deleted and that causes the above crash.
This fix helps avoid the observed crash by implementing keep-alive as a
synchronous operation so that we decrement admin->q_usage_counter only
after keep-alive command finished its execution and returns the command
status back up to its caller (blk_execute_rq()). This would ensure that
fabric shutdown code path doesn't destroy the fabric admin queue until
keep-alive request finished execution and also keep-alive thread is not
running hw/hctx queue dispatch operation.
Nilay Shroff [Wed, 16 Oct 2024 03:03:14 +0000 (08:33 +0530)]
nvme-loop: flush off pending I/O while shutting down loop controller
While shutting down loop controller, we first quiesce the admin/IO queue,
delete the admin/IO tag-set and then at last destroy the admin/IO queue.
However it's quite possible that during the window between quiescing and
destroying of the admin/IO queue, some admin/IO request might sneak in
and if that happens then we could potentially encounter a hung task
because shutdown operation can't forward progress until any pending I/O
is flushed off.
This commit helps ensure that before destroying the admin/IO queue, we
unquiesce the admin/IO queue so that any outstanding requests, which are
added after the admin/IO queue is quiesced, are now flushed to its
completion.
nvme-pci: fix race condition between reset and nvme_dev_disable()
nvme_dev_disable() modifies the dev->online_queues field, therefore
nvme_pci_update_nr_queues() should avoid racing against it, otherwise
we could end up passing invalid values to blk_mq_update_nr_hw_queues().
Fix the bug by locking the shutdown_lock mutex before using
dev->online_queues. Give up if nvme_dev_disable() is running or if
it has been executed already.
So rq_qos_wake_function() calls wake_up_process(data->task), which calls
try_to_wake_up(), which faults in raw_spin_lock_irqsave(&p->pi_lock).
p comes from data->task, and data comes from the waitqueue entry, which
is stored on the waiter's stack in rq_qos_wait(). Analyzing the core
dump with drgn, I found that the waiter had already woken up and moved
on to a completely unrelated code path, clobbering what was previously
data->task. Meanwhile, the waker was passing the clobbered garbage in
data->task to wake_up_process(), leading to the crash.
What's happening is that in between rq_qos_wake_function() deleting the
waitqueue entry and calling wake_up_process(), rq_qos_wait() is finding
that it already got a token and returning. The race looks like this:
rq_qos_wait() rq_qos_wake_function()
==============================================================
prepare_to_wait_exclusive()
data->got_token = true;
list_del_init(&curr->entry);
if (data.got_token)
break;
finish_wait(&rqw->wait, &data.wq);
^- returns immediately because
list_empty_careful(&wq_entry->entry)
is true
... return, go do something else ...
wake_up_process(data->task)
(NO LONGER VALID!)-^
Normally, finish_wait() is supposed to synchronize against the waker.
But, as noted above, it is returning immediately because the waitqueue
entry has already been removed from the waitqueue.
The bug is that rq_qos_wake_function() is accessing the waitqueue entry
AFTER deleting it. Note that autoremove_wake_function() wakes the waiter
and THEN deletes the waitqueue entry, which is the proper order.
Fix it by swapping the order. We also need to use
list_del_init_careful() to match the list_empty_careful() in
finish_wait().
Keith Busch [Tue, 15 Oct 2024 14:30:17 +0000 (07:30 -0700)]
nvme-multipath: defer partition scanning
We need to suppress the partition scan from occuring within the
controller's scan_work context. If a path error occurs here, the IO will
wait until a path becomes available or all paths are torn down, but that
action also occurs within scan_work, so it would deadlock. Defer the
partion scan to a different context that does not block scan_work.
Ming Lei [Mon, 14 Oct 2024 00:51:15 +0000 (08:51 +0800)]
blk-mq: setup queue ->tag_set before initializing hctx
Commit 7b815817aa58 ("blk-mq: add helper for checking if one CPU is mapped to specified hctx")
needs to check queue mapping via tag set in hctx's cpuhp handler.
However, q->tag_set may not be setup yet when the cpuhp handler is
enabled, then kernel oops is triggered.
Fix the issue by setup queue tag_set before initializing hctx.
Breno Leitao [Fri, 11 Oct 2024 15:56:15 +0000 (08:56 -0700)]
elevator: Remove argument from elevator_find_get
Commit e4eb37cc0f3ed ("block: Remove elevator required features")
removed the usage of `struct request_queue` from elevator_find_get(),
but didn't removed the argument.
Remove the "struct request_queue *q" argument from elevator_find_get()
given it is useless.
Breno Leitao [Fri, 11 Oct 2024 17:01:21 +0000 (10:01 -0700)]
elevator: do not request_module if elevator exists
Whenever an I/O elevator is changed, the system attempts to load a
module for the new elevator. This occurs regardless of whether the
elevator is already loaded or built directly into the kernel. This
behavior introduces unnecessary overhead and potential issues.
This makes the operation slower, and more error-prone. For instance,
making the problem fixed by [1] visible for users that doesn't even rely
on modules being available through modules.
Do not try to load the ioscheduler if it is already visible.
This change brings two main benefits: it improves the performance of
elevator changes, and it reduces the likelihood of errors occurring
during this process.
[1] Commit e3accac1a976 ("block: Fix elv_iosched_local_module handling of "none" scheduler")
Greg Joyce [Mon, 7 Oct 2024 19:33:24 +0000 (14:33 -0500)]
nvme: disable CC.CRIME (NVME_CC_CRIME)
Disable NVME_CC_CRIME so that CSTS.RDY indicates that the media
is ready and able to handle commands without returning
NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY.
Guixin Liu [Tue, 8 Oct 2024 09:37:08 +0000 (17:37 +0800)]
nvmet-rdma: use sbitmap to replace rsp free list
We can use sbitmap to manage all the nvmet_rdma_rsp instead of using
free lists and spinlock, and we can use an additional tag to
determine whether the nvmet_rdma_rsp is extra allocated.
In addition, performance has improved:
1. testing environment is local rxe rdma devie and mem-based
backstore device.
2. fio command, test the average 5 times:
fio -filename=/dev/nvme0n1 --ioengine=libaio -direct=1
-size=1G -name=1 -thread -runtime=60 -time_based -rw=read -numjobs=16
-iodepth=128 -bs=4k -group_reporting
3. Before: 241k IOPS, After: 256k IOPS, an increase of about 5%.
block: Fix elevator_get_default() checking for NULL q->tag_set
elevator_get_default() and elv_support_iosched() both check for whether
or not q->tag_set is non-NULL, however it's not possible for them to be
NULL. This messes up some static checkers, as the checking of tag_set
isn't consistent.
Remove the checks, which both simplifies the logic and avoids checker
errors.
Hannes Reinecke [Wed, 2 Oct 2024 04:51:41 +0000 (13:51 +0900)]
nvme: tcp: avoid race between queue_lock lock and destroy
Commit 76d54bf20cdc ("nvme-tcp: don't access released socket during
error recovery") added a mutex_lock() call for the queue->queue_lock
in nvme_tcp_get_address(). However, the mutex_lock() races with
mutex_destroy() in nvme_tcp_free_queue(), and causes the WARN below.
The WARN is observed when the blktests test case nvme/014 is repeated
with tcp transport. It is rare, and 200 times repeat is required to
recreate in some test environments.
To avoid the WARN, check the NVME_TCP_Q_LIVE flag before locking
queue->queue_lock. The flag is cleared long time before the lock gets
destroyed.
Chun-Yi Lee [Wed, 2 Oct 2024 03:54:58 +0000 (11:54 +0800)]
aoe: fix the potential use-after-free problem in more places
For fixing CVE-2023-6270, f98364e92662 ("aoe: fix the potential
use-after-free problem in aoecmd_cfg_pkts") makes tx() calling dev_put()
instead of doing in aoecmd_cfg_pkts(). It avoids that the tx() runs
into use-after-free.
Then Nicolai Stange found more places in aoe have potential use-after-free
problem with tx(). e.g. revalidate(), aoecmd_ata_rw(), resend(), probe()
and aoecmd_cfg_rsp(). Those functions also use aoenet_xmit() to push
packet to tx queue. So they should also use dev_hold() to increase the
refcnt of skb->dev.
On the other hand, moving dev_put() to tx() causes that the refcnt of
skb->dev be reduced to a negative value, because corresponding
dev_hold() are not called in revalidate(), aoecmd_ata_rw(), resend(),
probe(), and aoecmd_cfg_rsp(). This patch fixed this issue.
Dan Carpenter [Wed, 2 Oct 2024 10:47:21 +0000 (13:47 +0300)]
blk_iocost: remove some duplicate irq disable/enables
These are called from blkcg_print_blkgs() which already disables IRQs so
disabling it again is wrong. It means that IRQs will be enabled slightly
earlier than intended, however, so far as I can see, this bug is harmless.
nvmet-passthru: clear EUID/NGUID/UUID while using loop target
When nvme passthru is configured using loop target, the clear_ids
attribute is, by default, set to true. This attribute would ensure that
EUID/NGUID/UUID is cleared for the loop passthru target.
The newer NVMe disk supporting the NVMe spec 1.3 or higher, typically,
implements the support for "Namespace Identification Descriptor list"
command. This command when issued from host returns EUID/NGUID/UUID
assigned to the inquired namespace. Not clearing these values, while
using nvme passthru using loop target, would result in NVMe host driver
rejecting the namespace. This check was implemented in the commit 2079f41ec6ff ("nvme: check that EUI/GUID/UUID are globally unique").
The fix implemented in this commit ensure that when host issues ns-id
descriptor list command, the EUID/NGUID/UUID are cleared by passthru
target. In fact, the function nvmet_passthru_override_id_descs() which
clears those unique ids already exits, so we just need to ensure that
ns-id descriptor list command falls through the corretc code path. And
while we're at it, we also combines the three passthru admin command
cases together which shares the same code.
leading to a build error when neither KVM_INTEL nor KVM_AMD support is
enabled:
arch/x86/kvm/x86.c: In function ‘kvm_arch_enable_virtualization’:
arch/x86/kvm/x86.c:12517:9: error: implicit declaration of function ‘cpu_emergency_register_virt_callback’ [-Wimplicit-function-declaration]
12517 | cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/kvm/x86.c: In function ‘kvm_arch_disable_virtualization’:
arch/x86/kvm/x86.c:12522:9: error: implicit declaration of function ‘cpu_emergency_unregister_virt_callback’ [-Wimplicit-function-declaration]
12522 | cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix the build by defining empty helper functions the same way the old
cpu_emergency_disable_virtualization() function was dealt with for the
same situation.
Maybe we could instead have made the call sites conditional, since the
callers (kvm_arch_{en,dis}able_virtualization()) have an empty weak
fallback. I'll leave that to the kvm people to argue about, this at
least gets the build going for that particular config.
Merge tag 'mailbox-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox
Pull mailbox updates from Jassi Brar:
- fix kconfig dependencies (mhu-v3, omap2+)
- use devie name instead of genereic imx_mu_chan as interrupt name
(imx)
- enable sa8255p and qcs8300 ipc controllers (qcom)
- Fix timeout during suspend mode (bcm2835)
- convert to use use of_property_match_string (mailbox)
- enable mt8188 (mediatek)
- use devm_clk_get_enabled helpers (spreadtrum)
- fix device-id typo (rockchip)
* tag 'mailbox-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
mailbox, remoteproc: omap2+: fix compile testing
dt-bindings: mailbox: qcom-ipcc: Document QCS8300 IPCC
dt-bindings: mailbox: qcom-ipcc: document the support for SA8255p
dt-bindings: mailbox: mtk,adsp-mbox: Add compatible for MT8188
mailbox: Use of_property_match_string() instead of open-coding
mailbox: bcm2835: Fix timeout during suspend mode
mailbox: sprd: Use devm_clk_get_enabled() helpers
mailbox: rockchip: fix a typo in module autoloading
mailbox: imx: use device name in interrupt name
mailbox: ARM_MHU_V3 should depend on ARM64
Merge tag 'i2c-for-6.12-rc1-additional_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- fix DesignWare driver ENABLE-ABORT sequence, ensuring ABORT can
always be sent when needed
- check for PCLK in the SynQuacer controller as an optional clock,
allowing ACPI to directly provide the clock rate
- KEBA driver Kconfig dependency fix
- fix XIIC driver power suspend sequence
* tag 'i2c-for-6.12-rc1-additional_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: xiic: Fix pm_runtime_set_suspended() with runtime pm enabled
i2c: keba: I2C_KEBA should depend on KEBA_CP500
i2c: synquacer: Deal with optional PCLK correctly
i2c: designware: fix controller is holding SCL low while ENABLE bit is disabled
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"These are mostly minor updates.
There are two drivers (lpfc and mpi3mr) which missed the initial
pull and a core change to retry a start/stop unit which affect
suspend/resume"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
scsi: lpfc: Update lpfc version to 14.4.0.5
scsi: lpfc: Support loopback tests with VMID enabled
scsi: lpfc: Revise TRACE_EVENT log flag severities from KERN_ERR to KERN_WARNING
scsi: lpfc: Ensure DA_ID handling completion before deleting an NPIV instance
scsi: lpfc: Fix kref imbalance on fabric ndlps from dev_loss_tmo handler
scsi: lpfc: Restrict support for 32 byte CDBs to specific HBAs
scsi: lpfc: Update phba link state conditional before sending CMF_SYNC_WQE
scsi: lpfc: Add ELS_RSP cmd to the list of WQEs to flush in lpfc_els_flush_cmd()
scsi: mpi3mr: Update driver version to 8.12.0.0.50
scsi: mpi3mr: Improve wait logic while controller transitions to READY state
scsi: mpi3mr: Update MPI Headers to revision 34
scsi: mpi3mr: Use firmware-provided timestamp update interval
scsi: mpi3mr: Enhance the Enable Controller retry logic
scsi: sd: Fix off-by-one error in sd_read_block_characteristics()
scsi: pm8001: Do not overwrite PCI queue mapping
scsi: scsi_debug: Remove a useless memset()
scsi: pmcraid: Convert comma to semicolon
scsi: sd: Retry START STOP UNIT commands
scsi: mpi3mr: A performance fix
scsi: ufs: qcom: Update MODE_MAX cfg_bw value
...
Merge tag 'bcachefs-2024-09-28' of git://evilpiepirate.org/bcachefs
Pull more bcachefs updates from Kent Overstreet:
"Assorted minor syzbot fixes, and for bigger stuff:
Fix two disk accounting rewrite bugs:
- Disk accounting keys use the version field of bkey so that journal
replay can tell which updates have been applied to the btree.
This is set in the transaction commit path, after we've gotten our
journal reservation (and our time ordering), but the
BCH_TRANS_COMMIT_skip_accounting_apply flag that journal replay
uses was incorrectly skipping this for new updates generated prior
to journal replay.
This fixes the underlying cause of an assertion pop in
disk_accounting_read.
- A couple of fixes for disk accounting + device removal.
Checking if acocunting replicas entries were marked in the
superblock was being done at the wrong point, when deltas in the
journal could still zero them out, and then additionally we'd try
to add a missing replicas entry to the superblock without checking
if it referred to an invalid (removed) device.
A whole slew of repair fixes:
- fix infinite loop in propagate_key_to_snapshot_leaves(), this fixes
an infinite loop when repairing a filesystem with many snapshots
- fix incorrect transaction restart handling leading to occasional
"fsck counted ..." warnings
- fix warning in __bch2_fsck_err() for bkey fsck errors
- check_inode() in fsck now correctly checks if the filesystem was
clean
- there shouldn't be pending logged ops if the fs was clean, we now
check for this
- remove_backpointer() doesn't remove a dirent that doesn't actually
point to the inode
- many more fsck errors are AUTOFIX"
* tag 'bcachefs-2024-09-28' of git://evilpiepirate.org/bcachefs: (35 commits)
bcachefs: check_subvol_path() now prints subvol root inode
bcachefs: remove_backpointer() now checks if dirent points to inode
bcachefs: dirent_points_to_inode() now warns on mismatch
bcachefs: Fix lost wake up
bcachefs: Check for logged ops when clean
bcachefs: BCH_FS_clean_recovery
bcachefs: Convert disk accounting BUG_ON() to WARN_ON()
bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_apply
bcachefs: Check for accounting keys with bversion=0
bcachefs: rename version -> bversion
bcachefs: Don't delete unlinked inodes before logged op resume
bcachefs: Fix BCH_SB_ERRS() so we can reorder
bcachefs: Fix fsck warnings from bkey validation
bcachefs: Move transaction commit path validation to as late as possible
bcachefs: Fix disk accounting attempting to mark invalid replicas entry
bcachefs: Fix unlocked access to c->disk_sb.sb in bch2_replicas_entry_validate()
bcachefs: Fix accounting read + device removal
bcachefs: bch_accounting_mode
bcachefs: fix transaction restart handling in check_extents(), check_dirents()
bcachefs: kill inode_walker_entry.seen_this_pos
...
Merge tag 'x86-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix TDX MMIO #VE fault handling, and add two new Intel model numbers
for 'Pantherlake' and 'Diamond Rapids'"
* tag 'x86-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add two Intel CPU model numbers
x86/tdx: Fix "in-kernel MMIO" check
Merge tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"lockdep:
- Fix potential deadlock between lockdep and RCU (Zhiguo Niu)
- Use str_plural() to address Coccinelle warning (Thorsten Blum)
- Add debuggability enhancement (Luis Claudio R. Goncalves)
static keys & calls:
- Fix static_key_slow_dec() yet again (Peter Zijlstra)
- Handle module init failure correctly in static_call_del_module()
(Thomas Gleixner)
- Replace pointless WARN_ON() in static_call_module_notify() (Thomas
Gleixner)
<linux/cleanup.h>:
- Add usage and style documentation (Dan Williams)
rwsems:
- Move is_rwsem_reader_owned() and rwsem_owner() under
CONFIG_DEBUG_RWSEMS (Waiman Long)
atomic ops, x86:
- Redeclare x86_32 arch_atomic64_{add,sub}() as void (Uros Bizjak)
- Introduce the read64_nonatomic macro to x86_32 with cx8 (Uros
Bizjak)"
Signed-off-by: Ingo Molnar <[email protected]>
* tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS
jump_label: Fix static_key_slow_dec() yet again
static_call: Replace pointless WARN_ON() in static_call_module_notify()
static_call: Handle module init failure correctly in static_call_del_module()
locking/lockdep: Simplify character output in seq_line()
lockdep: fix deadlock issue between lockdep and rcu
lockdep: Use str_plural() to fix Coccinelle warning
cleanup: Add usage and style documentation
lockdep: suggest the fix for "lockdep bfs error:-1" on print_bfs_bug
locking/atomic/x86: Redeclare x86_32 arch_atomic64_{add,sub}() as void
locking/atomic/x86: Introduce the read64_nonatomic macro to x86_32 with cx8
Merge tag 'linux_kselftest-next-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
"One urgent fix to vDSO as automated testing is failing due to this
bug"
* tag 'linux_kselftest-next-6.12-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: vDSO: align stack for O2-optimized memcpy
Julia Lawall [Sat, 28 Sep 2024 19:26:22 +0000 (21:26 +0200)]
Reduce Coccinelle choices in string_choices.cocci
The isomorphism neg_if_exp negates the test of a ?: conditional,
making it unnecessary to have an explicit case for a negated test
with the branches inverted.
At the same time, we can disable neg_if_exp in cases where a
different API function may be more suitable for a negated test.
Finally, in the non-patch cases, E matches an expression with
parentheses around it, so there is no need to mention ()
explicitly in the pattern. The () are still needed in the patch
cases, because we want to drop them, if they are present.
Hongbo Li [Wed, 11 Sep 2024 01:09:27 +0000 (09:09 +0800)]
coccinelle: Remove unnecessary parentheses for only one possible change.
The parentheses are only needed if there is a disjunction, ie a
set of possible changes. If there is only one pattern, we can
remove these parentheses. Just like the format:
Hongbo Li [Wed, 11 Sep 2024 01:09:18 +0000 (09:09 +0800)]
coccinelle: Add rules to find str_true_false() replacements
After str_true_false() has been introduced in the tree,
we can add rules for finding places where str_true_false()
can be used. A simple test can find over 10 locations.
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm updates from Paolo Bonzini:
"x86:
- KVM currently invalidates the entirety of the page tables, not just
those for the memslot being touched, when a memslot is moved or
deleted.
This does not traditionally have particularly noticeable overhead,
but Intel's TDX will require the guest to re-accept private pages
if they are dropped from the secure EPT, which is a non starter.
Actually, the only reason why this is not already being done is a
bug which was never fully investigated and caused VM instability
with assigned GeForce GPUs, so allow userspace to opt into the new
behavior.
- Advertise AVX10.1 to userspace (effectively prep work for the
"real" AVX10 functionality that is on the horizon)
- Rework common MSR handling code to suppress errors on userspace
accesses to unsupported-but-advertised MSRs
This will allow removing (almost?) all of KVM's exemptions for
userspace access to MSRs that shouldn't exist based on the vCPU
model (the actual cleanup is non-trivial future work)
- Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
splits the 64-bit value into the legacy ICR and ICR2 storage,
whereas Intel (APICv) stores the entire 64-bit value at the ICR
offset
- Fix a bug where KVM would fail to exit to userspace if one was
triggered by a fastpath exit handler
- Add fastpath handling of HLT VM-Exit to expedite re-entering the
guest when there's already a pending wake event at the time of the
exit
- Fix a WARN caused by RSM entering a nested guest from SMM with
invalid guest state, by forcing the vCPU out of guest mode prior to
signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
nested guest)
- Overhaul the "unprotect and retry" logic to more precisely identify
cases where retrying is actually helpful, and to harden all retry
paths against putting the guest into an infinite retry loop
- Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
rmaps in the shadow MMU
- Refactor pieces of the shadow MMU related to aging SPTEs in
prepartion for adding multi generation LRU support in KVM
- Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
enabled, i.e. when the CPU has already flushed the RSB
- Trace the per-CPU host save area as a VMCB pointer to improve
readability and cleanup the retrieval of the SEV-ES host save area
- Remove unnecessary accounting of temporary nested VMCB related
allocations
- Set FINAL/PAGE in the page fault error code for EPT violations if
and only if the GVA is valid. If the GVA is NOT valid, there is no
guest-side page table walk and so stuffing paging related metadata
is nonsensical
- Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
instead of emulating posted interrupt delivery to L2
- Add a lockdep assertion to detect unsafe accesses of vmcs12
structures
- Harden eVMCS loading against an impossible NULL pointer deref
(really truly should be impossible)
- Minor SGX fix and a cleanup
- Misc cleanups
Generic:
- Register KVM's cpuhp and syscore callbacks when enabling
virtualization in hardware, as the sole purpose of said callbacks
is to disable and re-enable virtualization as needed
- Enable virtualization when KVM is loaded, not right before the
first VM is created
Together with the previous change, this simplifies a lot the logic
of the callbacks, because their very existence implies
virtualization is enabled
- Fix a bug that results in KVM prematurely exiting to userspace for
coalesced MMIO/PIO in many cases, clean up the related code, and
add a testcase
- Fix a bug in kvm_clear_guest() where it would trigger a buffer
overflow _if_ the gpa+len crosses a page boundary, which thankfully
is guaranteed to not happen in the current code base. Add WARNs in
more helpers that read/write guest memory to detect similar bugs
Selftests:
- Fix a goof that caused some Hyper-V tests to be skipped when run on
bare metal, i.e. NOT in a VM
- Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
guest
- Explicitly include one-off assets in .gitignore. Past Sean was
completely wrong about not being able to detect missing .gitignore
entries
- Verify userspace single-stepping works when KVM happens to handle a
VM-Exit in its fastpath
- Misc cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
Documentation: KVM: fix warning in "make htmldocs"
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
selftests: kvm: s390: Add VM run test case
KVM: SVM: let alternatives handle the cases when RSB filling is required
KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
KVM: x86: Update retry protection fields when forcing retry on emulation failure
KVM: x86: Apply retry protection to "unprotect on failure" path
KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
...
Merge tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Clean up and improve vdso code: use SYM_* macros for function and
data annotations, add CFI annotations to fix GDB unwinding, optimize
the chacha20 implementation
- Add vfio-ap driver feature advertisement for use by libvirt and
mdevctl
* tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vfio-ap: Driver feature advertisement
s390/vdso: Use one large alternative instead of an alternative branch
s390/vdso: Use SYM_DATA_START_LOCAL()/SYM_DATA_END() for data objects
tools: Add additional SYM_*() stubs to linkage.h
s390/vdso: Use macros for annotation of asm functions
s390/vdso: Add CFI annotations to __arch_chacha20_blocks_nostack()
s390/vdso: Fix comment within __arch_chacha20_blocks_nostack()
s390/vdso: Get rid of permutation constants
Merge tag 'modules-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
"There are a few fixes / cleanups from Vincent, Chunhui, and Petr, but
the most important part of this pull request is the Rust community
stepping up to help maintain both C / Rust code for future Rust module
support. We grow the set of modules maintainers by three now, and with
this hope to scale to help address what's needed to properly support
future Rust module support.
A lot of exciting stuff coming in future kernel releases.
This has been on linux-next for ~ 3 weeks now with no issues"
* tag 'modules-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: Refine kmemleak scanned areas
module: abort module loading when sysfs setup suffer errors
MAINTAINERS: scale modules with more reviewers
module: Clean up the description of MODULE_SIG_<type>
module: Split modules_install compression and in-kernel decompression
Merge tag 'fbdev-for-6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
- crash fix in fbcon_putcs
- avoid a possible string memory overflow in sisfb
- minor code optimizations in omapfb and fbcon
* tag 'fbdev-for-6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: sisfb: Fix strbuf array overflow
fbcon: break earlier in search_fb_in_map and search_for_mapped_con
fbdev: omapfb: Call of_node_put(ep) only once in omapdss_of_find_source_for_first_ep()
fbcon: Fix a NULL pointer dereference issue in fbcon_putcs
Merge tag 'drm-next-2024-09-28' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Regular fixes for the week to end the merge window, i915 and xe have a
few each, amdgpu makes up most of it with a bunch of SR-IOV related
fixes amongst others.
i915:
- Fix BMG support to UHBR13.5
- Two PSR fixes
- Fix colorimetry detection for DP
xe:
- Fix macro for checking minimum GuC version
- Fix CCS offset calculation for some BMG SKUs
- Fix locking on memory usage reporting via fdinfo and BO destroy
- Fix GPU page fault handler on a closed VM
- Fix overflow in oa batch buffer
amdgpu:
- MES 12 fix
- KFD fence sync fix
- SR-IOV fixes
- VCN 4.0.6 fix
- SDMA 7.x fix
- Bump driver version to note cleared VRAM support
- SWSMU fix
- CU occupancy logic fix
- SDMA queue fix"
* tag 'drm-next-2024-09-28' of https://gitlab.freedesktop.org/drm/kernel: (79 commits)
drm/amd/pm: update workload mask after the setting
drm/amdgpu: bump driver version for cleared VRAM
drm/amdgpu: fix vbios fetching for SR-IOV
drm/amdgpu: fix PTE copy corruption for sdma 7
drm/amdkfd: Add SDMA queue quantum support for GFX12
drm/amdgpu/vcn: enable AV1 on both instances
drm/amdkfd: Fix CU occupancy for GFX 9.4.3
drm/amdkfd: Update logic for CU occupancy calculations
drm/amdgpu: skip coredump after job timeout in SRIOV
drm/amdgpu: sync to KFD fences before clearing PTEs
drm/amdgpu/mes12: set enable_level_process_quantum_check
drm/i915/dp: Fix colorimetry detection
drm/amdgpu/mes12: reduce timeout
drm/amdgpu/mes11: reduce timeout
drm/amdgpu: use GEM references instead of TTMs v2
drm/amd/display: Allow backlight to go below `AMDGPU_DM_DEFAULT_MIN_BACKLIGHT`
drm/amd/display: Fix kdoc entry for 'tps' in 'dc_process_dmub_dpia_set_tps_notification'
drm/amdgpu: update golden regs for gfx12
drm/amdgpu: clean up vbios fetching code
drm/amd/display: handle nulled pipe context in DCE110's set_drr()
...
Merge tag 'ceph-for-6.12-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"Three CephFS fixes from Xiubo and Luis and a bunch of assorted
cleanups"
* tag 'ceph-for-6.12-rc1' of https://github.com/ceph/ceph-client:
ceph: remove the incorrect Fw reference check when dirtying pages
ceph: Remove empty definition in header file
ceph: Fix typo in the comment
ceph: fix a memory leak on cap_auths in MDS client
ceph: flush all caps releases when syncing the whole filesystem
ceph: rename ceph_flush_cap_releases() to ceph_flush_session_cap_releases()
libceph: use min() to simplify code in ceph_dns_resolve_name()
ceph: Convert to use jiffies macro
ceph: Remove unused declarations
Merge tag 'v6.12-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- fix querying dentry for char/block special files
- small cleanup patches
* tag 'v6.12-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: Correct typos in multiple comments across various files
ksmbd: fix open failure from block and char device file
ksmbd: remove unsafe_memcpy use in session setup
ksmbd: Replace one-element arrays with flexible-array members
ksmbd: fix warning: comparison of distinct pointer types lacks a cast
Merge tag '6.12rc-more-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull xmb client fixes from Steve French:
- Noisy log message cleanup
- Important netfs fix for cifs crash in generic/074
- Three minor improvements to use of hashing (multichannel and mount
improvements)
- Fix decryption crash for large read with small esize
* tag '6.12rc-more-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: make SHA-512 TFM ephemeral
smb: client: make HMAC-MD5 TFM ephemeral
smb: client: stop flooding dmesg in smb2_calc_signature()
smb: client: allocate crypto only for primary server
smb: client: fix UAF in async decryption
netfs: Fix write oops in generic/346 (9p) and generic/074 (cifs)
Alan Huang [Tue, 27 Aug 2024 15:14:48 +0000 (23:14 +0800)]
bcachefs: Fix lost wake up
If the reader acquires the read lock and then the writer enters the slow
path, while the reader proceeds to the unlock path, the following scenario
can occur without the change:
writer: pcpu_read_count(lock) return 1 (so __do_six_trylock will return 0)
reader: this_cpu_dec(*lock->readers)
reader: smp_mb()
reader: state = atomic_read(&lock->state) (there is no waiting flag set)
writer: six_set_bitmask()
Kent Overstreet [Thu, 26 Sep 2024 20:19:58 +0000 (16:19 -0400)]
bcachefs: BCH_FS_clean_recovery
Add a filesystem flag to indicate whether we did a clean recovery -
using c->sb.clean after we've got rw is incorrect, since c->sb is
updated whenever we write the superblock.
Kent Overstreet [Sat, 28 Sep 2024 01:05:59 +0000 (21:05 -0400)]
bcachefs: Convert disk accounting BUG_ON() to WARN_ON()
We had a bug where disk accounting keys didn't always have their version
field set in journal replay; change the BUG_ON() to a WARN(), and
exclude this case since it's now checked for elsewhere (in the bkey
validate function).
This was added to avoid double-counting accounting keys in journal
replay. But applied incorrectly (easily done since it applies to the
transaction commit, not a particular update), it leads to skipping
in-mem accounting for real accounting updates, and failure to give them
a version number - which leads to journal replay becoming very confused
the next time around.
Kent Overstreet [Thu, 26 Sep 2024 19:19:17 +0000 (15:19 -0400)]
bcachefs: Don't delete unlinked inodes before logged op resume
Previously, check_inode() would delete unlinked inodes if they weren't
on the deleted list - this code dating from before there was a deleted
list.
But, if we crash during a logged op (truncate or finsert/fcollapse) of
an unlinked file, logged op resume will get confused if the inode has
already been deleted - instead, just add it to the deleted list if it
needs to be there; delete_dead_inodes runs after logged op resume.
Kent Overstreet [Thu, 26 Sep 2024 20:51:19 +0000 (16:51 -0400)]
bcachefs: Fix fsck warnings from bkey validation
__bch2_fsck_err() warns if the current task has a btree_trans object and
it wasn't passed in, because if it has to prompt for user input it has
to be able to unlock it.
But plumbing the btree_trans through bkey_validate(), as well as
transaction restarts, is problematic - so instead make bkey fsck errors
FSCK_AUTOFIX, which doesn't need to warn.
Kent Overstreet [Wed, 25 Sep 2024 20:46:06 +0000 (16:46 -0400)]
bcachefs: Fix accounting read + device removal
accounting read was checking if accounting replicas entries were marked
in the superblock prior to applying accounting from the journal,
which meant that a recently removed device could spuriously trigger a
"not marked in superblocked" error (when journal entries zero out the
offending counter).
Kent Overstreet [Tue, 24 Sep 2024 02:32:58 +0000 (22:32 -0400)]
bcachefs: fix transaction restart handling in check_extents(), check_dirents()
Dealing with outside state within a btree transaction is always tricky.
check_extents() and check_dirents() have to accumulate counters for
i_sectors and i_nlink (for subdirectories). There were two bugs:
- transaction commit may return a restart; therefore we have to commit
before accumulating to those counters
- get_inode_all_snapshots() may return a transaction restart, before
updating w->last_pos; then, on the restart,
check_i_sectors()/check_subdir_count() would see inodes that were not
for w->last_pos
Hongbo Li [Tue, 24 Sep 2024 01:41:46 +0000 (09:41 +0800)]
bcachefs: fix the memory leak in exception case
The pointer clean points the memory allocated by kmemdup, when the
return value of bch2_sb_clean_validate_late is not zero. The memory
pointed by clean is leaked. So we should free it in this case.
Fixes: a37ad1a3aba9 ("bcachefs: sb-clean.c") Signed-off-by: Hongbo Li <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
bcachefs: assign return error when iterating through layout
syzbot reported a null ptr deref in __copy_user [0]
In __bch2_read_super, when a corrupt backup superblock matches the
default opts offset, no error is assigned to ret and the freed superblock
gets through, possibly being assigned as the best sb in bch2_fs_open and
being later dereferenced, causing a fault. Assign EINVALID to ret when
iterating through layout.
Piotr Zalewski [Sun, 22 Sep 2024 15:18:01 +0000 (15:18 +0000)]
bcachefs: memset bounce buffer portion to 0 after key_sort_fix_overlapping
Zero-initialize part of allocated bounce buffer which wasn't touched by
subsequent bch2_key_sort_fix_overlapping to mitigate later uinit-value
use KMSAN bug[1].
After applying the patch reproducer still triggers stack overflow[2] but
it seems unrelated to the uninit-value use warning. After further
investigation it was found that stack overflow occurs because KMSAN adds
too many function calls[3]. Backtrace of where the stack magic number gets
smashed was added as a reply to syzkaller thread[3].
It was confirmed that task's stack magic number gets smashed after the code
path where KSMAN detects uninit-value use is executed, so it can be assumed
that it doesn't contribute in any way to uninit-value use detection.
Kent Overstreet [Mon, 23 Sep 2024 21:30:59 +0000 (17:30 -0400)]
bcachefs: Add extra padding in bkey_make_mut_noupdate()
This fixes a kasan splat in propagate_key_to_snapshot_leaves() -
varint_decode_fast() does reads (that it never uses) up to 7 bytes past
the end of the integer.
The values of the variables xres and yres are placed in strbuf.
These variables are obtained from strbuf1.
The strbuf1 array contains digit characters
and a space if the array contains non-digit characters.
Then, when executing sprintf(strbuf, "%ux%ux8", xres, yres);
more than 16 bytes will be written to strbuf.
It is suggested to increase the size of the strbuf array to 24.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Merge tag 'pm-6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix idle states enumeration in the intel_idle driver on platforms
supporting multiple flavors of the C6 idle state (Artem Bityutskiy)"
* tag 'pm-6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: fix ACPI _CST matching for newer Xeon platforms
Merge tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
- Christophe realized that the LoongArch64 instructions could be
scheduled more similar to how GCC generates code, which Ruoyao
implemented, for a 5% speedup from basically some rearrangements
- An update to MAINTAINERS to match the right files
* tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
LoongArch: vDSO: Tune chacha implementation
MAINTAINERS: make vDSO getrandom matches more generic
Merge tag 'bitmap-for-6.12' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- switch all bitmamp APIs from inline to __always_inline (Brian Norris)
The __always_inline series improves on code generation, and now with
the latest compiler versions is required to avoid compilation
warnings. It spent enough in my backlog, and I'm thankful to Brian
Norris for taking over and moving it forward.
GENMASK_U128() is a prerequisite needed for arm64 development
* tag 'bitmap-for-6.12' of https://github.com/norov/linux:
lib/test_bits.c: Add tests for GENMASK_U128()
uapi: Define GENMASK_U128
nodemask: Switch from inline to __always_inline
cpumask: Switch from inline to __always_inline
bitmap: Switch from inline to __always_inline
find: Switch from inline to __always_inline
Merge tag 'tomoyo-pr-20240927' of git://git.code.sf.net/p/tomoyo/tomoyo
Pull tomoyo updates from Tetsuo Handa:
"One bugfix patch, one preparation patch, and one conversion patch.
TOMOYO is useful as an analysis tool for learning how a Linux system
works. My boss was hoping that SELinux's policy is generated from what
TOMOYO has observed. A translated paper describing it is available at
Although that attempt failed due to mapping problem between inode and
pathname, TOMOYO remains as an access restriction tool due to ability
to write custom policy by individuals.
I was delivering pure LKM version of TOMOYO (named AKARI) to users who
cannot afford rebuilding their distro kernels with TOMOYO enabled. But
since the LSM framework was converted to static calls, it became more
difficult to deliver AKARI to such users. Therefore, I decided to
update TOMOYO so that people can use mostly LKM version of TOMOYO with
minimal burden for both distributors and users"
* tag 'tomoyo-pr-20240927' of git://git.code.sf.net/p/tomoyo/tomoyo:
tomoyo: fallback to realpath if symlink's pathname does not exist
tomoyo: allow building as a loadable LSM module
tomoyo: preparation step for building as a loadable LSM module
Merge tag 'cxl-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link (cxl) updates from Dave Jiang:
"Major changes address HDM decoder initialization from DVSEC ranges,
refactoring the code related to cxl mailboxes to be independent of the
memory devices, and adding support for shared upstream link
access_coordinate calculation, as well as a change to remove locking
from memory notifier callback.
In addition, a number of misc cleanups and refactoring of the code are
also included.
Address HDM decoder initialization from DVSEC ranges:
- Only register non-zero DVSEC ranges
- Remove duplicate implementation of waiting for memory_info_valid
- Simplify the checking of mem_enabled in cxl_hdm_decode_init()
Refactor the code related to cxl mailboxes to be independent of the memory devices:
- Move cxl headers in include/linux/ to include/cxl
- Move all mailbox related data to 'struct cxl_mailbox'
- Refactor mailbox APIs with 'struct cxl_mailbox' as input instead of
memory device state
Add support for shared upstream link access_coordinate calculation for
configurations that have multiple targets under a switch or a root
port where the aggregated bandwidth can be greater than the upstream
link of the switch/RP upstream link:
- Preserve the CDAT access_coordinate from an endpoint
- Add the support for shared upstream link access_coordinate calculation
- Add documentation to explain how the calculations are done
Remove locking from memory notifier callback.
Misc cleanups:
- Convert devm_cxl_add_root() to return using ERR_CAST()
- cxl_test use dev_is_platform() instead of open coding
- Remove duplicate include of header core.h in core/cdat.c
- use scoped resource management to drop put_device() for cxl_port
- Use scoped_guard to drop device_lock() for cxl_port
- Refactor __devm_cxl_add_port() to drop gotos
- Rename cxl_setup_parent_dport to cxl_dport_init_aer and
cxl_dport_map_regs() to cxl_dport_map_ras()
- Refactor cxl_dport_init_aer() to be more concise
- Remove duplicate host_bridge->native_aer checking in
cxl_dport_init_ras_reporting()
- Fix comment for cxl_query_cmd()"
* tag 'cxl-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits)
cxl: Add documentation to explain the shared link bandwidth calculation
cxl: Calculate region bandwidth of targets with shared upstream link
cxl: Preserve the CDAT access_coordinate for an endpoint
cxl: Fix comment regarding cxl_query_cmd() return data
cxl: Convert cxl_internal_send_cmd() to use 'struct cxl_mailbox' as input
cxl: Move mailbox related bits to the same context
cxl: move cxl headers to new include/cxl/ directory
cxl/region: Remove lock from memory notifier callback
cxl/pci: simplify the check of mem_enabled in cxl_hdm_decode_init()
cxl/pci: Check Mem_info_valid bit for each applicable DVSEC
cxl/pci: Remove duplicated implementation of waiting for memory_info_valid
cxl/pci: Fix to record only non-zero ranges
cxl/pci: Remove duplicate host_bridge->native_aer checking
cxl/pci: cxl_dport_map_rch_aer() cleanup
cxl/pci: Rename cxl_setup_parent_dport() and cxl_dport_map_regs()
cxl/port: Refactor __devm_cxl_add_port() to drop goto pattern
cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port
cxl/port: Use __free() to drop put_device() for cxl_port
cxl: Remove duplicate included header file core.h
tools/testing/cxl: Use dev_is_platform()
...
selftests: vDSO: align stack for O2-optimized memcpy
When switching on -O2, gcc generates SSE2 instructions that assume a
16-byte aligned stack, which the standalone test's start point wasn't
aligning. Fix this with the usual alignment sequence.
Fixes: ecb8bd70d51 ("selftests: vDSO: build tests with O2 optimization") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Shuah Khan <[email protected]>