Bjorn Helgaas [Tue, 12 Mar 2024 17:14:24 +0000 (12:14 -0500)]
Merge branch 'pci/virtualization'
- Avoid Secondary Bus Reset on the LSI / Agere FW643, which allows it to be
assigned to VMs with VFIO, at the cost of leaking FW643 state between VMs
(Edmund Raile)
* pci/virtualization:
PCI: Mark LSI FW643 to avoid bus reset
- Rework pci_dev_resource_resize_attr(n) macros to call a function instead
of duplicating most of the body, which saves about 2.5KB of text (Ilpo
Järvinen)
* pci/sysfs:
PCI/sysfs: Demacrofy pci_dev_resource_resize_attr(n) functions
PCI: Remove obsolete pci_cleanup_rom() declaration
PCI/sysfs: Compile pci-sysfs.c only if CONFIG_SYSFS=y
Bjorn Helgaas [Tue, 12 Mar 2024 17:14:22 +0000 (12:14 -0500)]
Merge branch 'pci/pm'
- Disable use of D3cold on Asus B1400 PCI-NVMe bridges because some BIOSes
can't power them back on, replacing a more general ACPI sleep quirk
(Daniel Drake)
- Allow runtime PM when the driver enables it but doesn't need any runtime
PM callbacks (Raag Jadav)
- Drain runtime-idle callbacks before driver removal to avoid races between
.remove() and .runtime_idle(), which caused intermittent page faults when
the rtsx .runtime_idle() accessed registers that its .remove() had
already unmapped (Rafael J. Wysocki)
* pci/pm:
PCI/PM: Drain runtime-idle callbacks before driver removal
PCI/PM: Allow runtime PM with no PM callbacks at all
Revert "ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default"
PCI: Disable D3cold on Asus B1400 PCI-NVMe bridge
Bjorn Helgaas [Tue, 12 Mar 2024 17:14:20 +0000 (12:14 -0500)]
Merge branch 'pci/dpc'
- After a DPC event, print all logged TLP Prefixes instead of printing the
first prefix several times (Ilpo Järvinen)
- Ignore the expected Surprise Down error that may cause a DPC event when
hot-removing a device (Smita Koralahalli)
- Add an RP PIO log size quirk for Intel Raptor Lake Root Ports, which
still don't advertise the correct log size, which prevented logging of RP
PIO Log registers when DPC is triggered (Paul Menzel)
* pci/dpc:
PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports
PCI/DPC: Ignore Surprise Down error on hot removal
PCI/DPC: Print all TLP Prefixes, not just the first
Bjorn Helgaas [Tue, 12 Mar 2024 17:14:19 +0000 (12:14 -0500)]
Merge branch 'pci/aspm'
- Collect ASPM-related code into aspm.c (David E. Box)
- Save and restore ASPM L1 PM Substates configuration so these states
continue working after suspend/resume (David E. Box)
- Move the ASPM L1.2-related LTR save/restore next to the ASPM save/restore
(David E. Box)
- Move the required L1 disable before L1 Substate configuration into
pci_restore_aspm_l1ss_state() (Bjorn Helgaas)
- Update save_save when ASPM config is changed, so a .slot_reset() during
error recovery restores the changed config, not the .probe()-time config
(Vidya Sagar)
* pci/aspm:
PCI/ASPM: Update save_state when configuration changes
PCI/ASPM: Disable L1 before configuring L1 Substates
PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state()
PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
PCI/ASPM: Move pci_save_ltr_state() to aspm.c
PCI/ASPM: Always build aspm.c
PCI/ASPM: Move pci_configure_ltr() to aspm.c
Bjorn Helgaas [Tue, 12 Mar 2024 17:14:19 +0000 (12:14 -0500)]
Merge branch 'pci/aer'
- Fix sysfs paths in aer_rootport_total_err_* documentation (Johan Hovold)
- Block runtime suspend while handling AER errors (Stanislaw Gruszka)
- Add a generic Header Log structure and reader shared by AER and DPC (Ilpo
Järvinen)
* pci/aer:
PCI/AER: Generalize TLP Header Log reading
PCI/AER: Use explicit register size for PCI_ERR_CAP
PCI/AER: Block runtime suspend when handling errors
PCI/AER: Clean up version indentation in ABI docs
PCI/AER: Fix rootport attribute paths in ABI docs
Vidya Sagar [Fri, 23 Feb 2024 19:36:24 +0000 (13:36 -0600)]
PCI/ASPM: Update save_state when configuration changes
Many PCIe device drivers save the configuration state of their device
during probe and restore it when their .slot_reset() hook is called during
PCIe error recovery.
If the ASPM configuration is changed after the driver's probe is called and
before an error event occurs, .slot_reset() restores the ASPM configuration
to what it was at the time of probe, not to what it was just before the
occurrence of the error event. This leads to a mismatch in ASPM
configuration between the device and its upstream device.
Update the saved configuration of the device when the ASPM configuration
changes.
Bjorn Helgaas [Tue, 5 Mar 2024 21:15:25 +0000 (15:15 -0600)]
PCI/ASPM: Disable L1 before configuring L1 Substates
Per PCIe r6.1, sec 5.5.4, L1 must be disabled while setting ASPM L1 PM
Substates enable bits. Previously this was enforced by clearing
PCI_EXP_LNKCTL_ASPMC before calling pci_restore_aspm_l1ss_state().
Move the L1 (and L0s, although that doesn't seem required) disable into
pci_restore_aspm_l1ss_state() itself so it's closer to the code that
depends on it.
David E. Box [Fri, 23 Feb 2024 20:58:51 +0000 (14:58 -0600)]
PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state()
ASPM state is saved and restored from pci_save/restore_pcie_state(). Since
the LTR Capability is linked with ASPM, move the LTR save and restore calls
there as well. No functional change intended.
David E. Box [Fri, 23 Feb 2024 20:58:50 +0000 (14:58 -0600)]
PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") restored the L1 PM Substates Capability after resume,
which reduced power consumption by making the ASPM L1.x states work after
resume.
a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume"") reverted 4ff116d0d5fd because resume failed on some
systems, so power consumption after resume increased again.
a7152be79b62 mentioned that we restore L1 PM substate configuration even
though ASPM L1 may already be enabled. This is due the fact that the
pci_restore_aspm_l1ss_state() was called before pci_restore_pcie_state().
Save and restore the L1 PM Substates Capability, following PCIe r6.1, sec
5.5.4 more closely by:
1) Do not restore ASPM configuration in pci_restore_pcie_state() but
do that after PCIe capability is restored in pci_restore_aspm_state()
following PCIe r6.1, sec 5.5.4.
2) If BIOS reenables L1SS, particularly L1.2, we need to clear the
enables in the right order, downstream before upstream. Defer
restoring the L1SS config until we are at the downstream component.
Then update the config for both ends of the link in the prescribed
order.
3) Program ASPM L1 PM substate configuration before L1 enables.
4) Program ASPM L1 PM substate enables last, after rest of the fields
in the capability are programmed.
[bhelgaas: commit log, squash L1SS-related patches, do both LNKCTL restores
in pci_restore_pcie_state()]
Ilpo Järvinen [Tue, 6 Feb 2024 13:57:15 +0000 (15:57 +0200)]
PCI/AER: Generalize TLP Header Log reading
Both AER and DPC RP PIO provide TLP Header Log registers (PCIe r6.1 secs
7.8.4 & 7.9.14) to convey error diagnostics but the struct is named after
AER as the struct aer_header_log_regs. Also, not all places that handle TLP
Header Log use the struct and the struct members are named individually.
Generalize the struct name and members, and use it consistently where TLP
Header Log is being handled so that a pcie_read_tlp_log() helper can be
easily added.
Ilpo Järvinen [Tue, 6 Feb 2024 13:57:14 +0000 (15:57 +0200)]
PCI/AER: Use explicit register size for PCI_ERR_CAP
Use u32 for PCIe AER Capability register variable and name it "aercc"
(Advanced Error Capabilities and Control register, PCIe r6.1 sec 7.8.4.7)
instead of "temp".
PCI/AER: Block runtime suspend when handling errors
PM runtime can be done simultaneously with AER error handling. Avoid that
by using pm_runtime_get_sync() before and pm_runtime_put() after reset in
pcie_do_recovery() for all recovering devices.
pm_runtime_get_sync() will increase dev->power.usage_count counter to
prevent any possible future request to runtime suspend a device. It will
also resume a device, if it was previously in D3hot state.
I tested with igc device by doing simultaneous aer_inject and rpm
suspend/resume via /sys/bus/pci/devices/PCI_ID/power/control and can
reproduce:
igc 0000:02:00.0: not ready 65535ms after bus reset; giving up
pcieport 0000:00:1c.2: AER: Root Port link has been reset (-25)
pcieport 0000:00:1c.2: AER: subordinate device reset failed
pcieport 0000:00:1c.2: AER: device recovery failed
igc 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible
The problem disappears when this patch is applied.
David E. Box [Fri, 23 Feb 2024 20:58:49 +0000 (14:58 -0600)]
PCI/ASPM: Move pci_save_ltr_state() to aspm.c
Even when CONFIG_PCIEASPM is not set, we save and restore the LTR
Capability so that if ASPM L1.2 and LTR were configured by the platform,
ASPM L1.2 will still work after suspend/resume, when that platform
configuration may be lost. See dbbfadf23190 ("PCI/ASPM: Save LTR Capability
for suspend/resume").
Since ASPM L1.2 depends on the LTR Capability, move the save/restore code
to the part of aspm.c that is always compiled regardless of
CONFIG_PCIEASPM. No functional change intended.
David E. Box [Fri, 23 Feb 2024 20:58:48 +0000 (14:58 -0600)]
PCI/ASPM: Always build aspm.c
Some ASPM-related tasks, such as save and restore of LTR and L1SS
capabilities, still need to be performed when CONFIG_PCIEASPM is not
enabled. To prepare for these changes, wrap the current code in aspm.c
with an #ifdef and always build the file.
David E. Box [Fri, 23 Feb 2024 20:58:47 +0000 (14:58 -0600)]
PCI/ASPM: Move pci_configure_ltr() to aspm.c
The Latency Tolerance Reporting (LTR) mechanism supports the ASPM L1.2
state and is only configured when CONFIG_PCIEASPM is set.
Move pci_configure_ltr() and pci_bridge_reconfigure_ltr() into aspm.c since
they only build when CONFIG_PCIEASPM is set. No functional change
intended.
Commit d9c8bea179a6 ("PCI: Remove unused IORESOURCE_ROM_COPY and
IORESOURCE_ROM_BIOS_COPY") removed pci_cleanup_rom(), but retained
its declaration in pci.h.
Lukas Wunner [Mon, 30 Oct 2023 12:32:12 +0000 (13:32 +0100)]
PCI/sysfs: Compile pci-sysfs.c only if CONFIG_SYSFS=y
It is possible to enable CONFIG_PCI but disable CONFIG_SYSFS and for
space-constrained devices such as routers, such a configuration may
actually make sense.
However pci-sysfs.c is compiled even if CONFIG_SYSFS is disabled,
unnecessarily increasing the kernel's size.
To rectify that:
* Move pci_mmap_fits() to mmap.c. It is not only needed by
pci-sysfs.c, but also proc.c.
* Move pci_dev_type to probe.c and make it private. It references
pci_dev_attr_groups in pci-sysfs.c. Make that public instead for
consistency with pci_dev_groups, pcibus_groups and pci_bus_groups,
which are likewise public and referenced by struct definitions in
pci-driver.c and probe.c.
* Define pci_dev_groups, pci_dev_attr_groups, pcibus_groups and
pci_bus_groups to NULL if CONFIG_SYSFS is disabled. Provide empty
static inlines for pci_{create,remove}_legacy_files() and
pci_{create,remove}_sysfs_dev_files().
Result:
vmlinux size is reduced by 122996 bytes in my arm 32-bit test build.
Paul Menzel [Tue, 5 Mar 2024 11:30:56 +0000 (12:30 +0100)]
PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports
Commit 5459c0b70467 ("PCI/DPC: Quirk PIO log size for certain Intel Root
Ports") and commit 3b8803494a06 ("PCI/DPC: Quirk PIO log size for Intel Ice
Lake Root Ports") add quirks for Ice, Tiger and Alder Lake Root Ports.
System firmware for Raptor Lake still has the bug, so Linux logs the
warning below on several Raptor Lake systems like Dell Precision 3581 with
Intel Raptor Lake processor (0W18NX) system firmware/BIOS version 1.10.1.
pci 0000:00:07.0: [8086:a76e] type 01 class 0x060400
pci 0000:00:07.0: DPC: RP PIO log size 0 is invalid
pci 0000:00:07.1: [8086:a73f] type 01 class 0x060400
pci 0000:00:07.1: DPC: RP PIO log size 0 is invalid
Apply the quirk for Raptor Lake Root Ports as well.
This also enables the DPC driver to dump the RP PIO Log registers when DPC
is triggered.
PCI/PM: Drain runtime-idle callbacks before driver removal
A race condition between the .runtime_idle() callback and the .remove()
callback in the rtsx_pcr PCI driver leads to a kernel crash due to an
unhandled page fault [1].
The problem is that rtsx_pci_runtime_idle() is not expected to be running
after pm_runtime_get_sync() has been called, but the latter doesn't really
guarantee that. It only guarantees that the suspend and resume callbacks
will not be running when it returns.
However, if a .runtime_idle() callback is already running when
pm_runtime_get_sync() is called, the latter will notice that the runtime PM
status of the device is RPM_ACTIVE and it will return right away without
waiting for the former to complete. In fact, it cannot wait for
.runtime_idle() to complete because it may be called from that callback (it
arguably does not make much sense to do that, but it is not strictly
prohibited).
Thus in general, whoever is providing a .runtime_idle() callback needs
to protect it from running in parallel with whatever code runs after
pm_runtime_get_sync(). [Note that .runtime_idle() will not start after
pm_runtime_get_sync() has returned, but it may continue running then if it
has started earlier.]
One way to address that race condition is to call pm_runtime_barrier()
after pm_runtime_get_sync() (not before it, because a nonzero value of the
runtime PM usage counter is necessary to prevent runtime PM callbacks from
being invoked) to wait for the .runtime_idle() callback to complete should
it be running at that point. A suitable place for doing that is in
pci_device_remove() which calls pm_runtime_get_sync() before removing the
driver, so it may as well call pm_runtime_barrier() subsequently, which
will prevent the race in question from occurring, not just in the rtsx_pcr
driver, but in any PCI drivers providing .runtime_idle() callbacks.
Edmund Raile [Tue, 27 Feb 2024 13:14:18 +0000 (13:14 +0000)]
PCI: Mark LSI FW643 to avoid bus reset
Apparently the LSI / Agere FW643 can't recover after a Secondary Bus Reset
and requires a power-off or suspend/resume and rescan.
VFIO resets a device before assigning it to a VM, and the FW643 doesn't
support any other reset methods, so this problem prevented assignment of
FW643 to VMs.
Prevent use of Secondary Bus Reset for this device.
With this change, the FW643 can be assigned to VMs with VFIO. Note that it
will not be reset, resulting in leaking state between VMs and host.
Raag Jadav [Tue, 27 Feb 2024 06:26:48 +0000 (11:56 +0530)]
PCI/PM: Allow runtime PM with no PM callbacks at all
Commit c5eb1190074c ("PCI / PM: Allow runtime PM without callback
functions") eliminated the need for PM callbacks in
pci_pm_runtime_suspend() and pci_pm_runtime_resume(), but
didn't do the same for pci_pm_runtime_idle().
Therefore, runtime suspend worked as long as the driver implemented at
least one PM callback. But if the driver doesn't implement any PM
callbacks at all (driver->pm is NULL), pci_pm_runtime_idle() returned
-ENOSYS, which prevented runtime suspend.
Modify pci_pm_runtime_idle() to allow PCI device power state transitions
without runtime PM callbacks and complete the original intention of commit c5eb1190074c ("PCI / PM: Allow runtime PM without callback functions").
Daniel Drake [Wed, 28 Feb 2024 07:53:16 +0000 (08:53 +0100)]
Revert "ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default"
This reverts commit d52848620de00cde4a3a5df908e231b8c8868250, which was
originally put in place to work around a s2idle failure on this platform
where the NVMe device was inaccessible upon resume.
After extended testing, we found that the firmware's implementation of S3
is buggy and intermittently fails to wake up the system. We need to revert
to s2idle mode.
The NVMe issue has now been solved more precisely in the commit titled
"PCI: Disable D3cold on Asus B1400 PCI-NVMe bridge"
Daniel Drake [Wed, 28 Feb 2024 07:53:15 +0000 (08:53 +0100)]
PCI: Disable D3cold on Asus B1400 PCI-NVMe bridge
The Asus B1400 with original shipped firmware versions and VMD disabled
cannot resume from suspend: the NVMe device becomes unresponsive and
inaccessible.
This appears to be an untested D3cold transition by the vendor; Intel
socwatch shows that Windows leaves the NVMe device and parent bridge in D0
during suspend, even though these firmware versions have StorageD3Enable=1.
The NVMe device and parent PCI bridge both share the same "PXP" ACPI power
resource, which gets turned off as both devices are put into D3cold during
suspend. The _OFF() method calls DL23() which sets a L23E bit at offset
0xe2 into the PCI configuration space for this root port. This is the
specific write that the _ON() routine is unable to recover from. This
register is not documented in the public chipset datasheet.
Disallow D3cold on the PCI bridge to enable successful suspend/resume.
PCI/DPC: Ignore Surprise Down error on hot removal
According to PCIe r6.0 sec 6.7.6 [1], async removal with DPC may result in
surprise down error. This error is expected and is just a side-effect of
async remove.
Ignore surprise down error generated as a side-effect of async remove.
Typically, this error is benign as the pciehp handler invoked by PDC
or/and DLLSC alongside DPC, de-enumerates and brings down the device
appropriately, but the error messages might confuse users. Get rid of
these irritating log messages with a 1s delay while pciehp waits for
DPC recovery.
The implementation is as follows: On an async remove a DPC is triggered
along with a Presence Detect State change and/or DLL State Change.
Determine it's an async remove by checking for DPC Trigger Status in DPC
Status Register and Surprise Down Error Status in AER Uncorrected Error
Status to be non-zero. If true, treat the DPC event as a side-effect of
async remove, clear the error status registers and continue with hot-plug
tear down routines. If not, follow the existing routine to handle AER and
DPC errors.
Masking Surprise Down Errors was explored as an alternative approach, but
discarded due to the odd behavior that masking only avoids the interrupt,
but still records an error per PCIe r6.0, sec 6.2.3.2.2. That stale error
would be reported the next time some error other than Surprise Down is
handled.
Dmesg before:
pcieport 0000:00:01.4: DPC: containment event, status:0x1f01 source:0x0000
pcieport 0000:00:01.4: DPC: unmasked uncorrectable error detected
pcieport 0000:00:01.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
pcieport 0000:00:01.4: device [1022:14ab] error status/mask=00000020/04004000
pcieport 0000:00:01.4: [ 5] SDES (First)
nvme nvme2: frozen state error detected, reset controller
pcieport 0000:00:01.4: DPC: Data Link Layer Link Active not set in 1000 msec
pcieport 0000:00:01.4: AER: subordinate device reset failed
pcieport 0000:00:01.4: AER: device recovery failed
pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
nvme2n1: detected capacity change from 1953525168 to 0
pci 0000:04:00.0: Removing from iommu group 49
Dmesg after:
pcieport 0000:00:01.4: pciehp: Slot(16): Link Down
nvme1n1: detected capacity change from 1953525168 to 0
pci 0000:04:00.0: Removing from iommu group 37
[1] PCI Express Base Specification Revision 6.0, Dec 16 2021.
https://members.pcisig.com/wg/PCI-SIG/document/16609
Jörg Wedekind [Mon, 19 Feb 2024 13:28:11 +0000 (14:28 +0100)]
PCI: Mark 3ware-9650SE Root Port Extended Tags as broken
Per PCIe r6.1, sec 2.2.6.2 and 7.5.3.4, a Requester may not use 8-bit Tags
unless its Extended Tag Field Enable is set, but all Receivers/Completers
must handle 8-bit Tags correctly regardless of their Extended Tag Field
Enable.
Some devices do not handle 8-bit Tags as Completers, so add a quirk for
them. If we find such a device, we disable Extended Tags for the entire
hierarchy to make peer-to-peer DMA possible.
The 3ware 9650SE seems to have issues with handling 8-bit tags. Mark it as
broken.
Philipp Stanner [Wed, 31 Jan 2024 09:00:23 +0000 (10:00 +0100)]
PCI: Move devres code from pci.c to devres.c
The file pci.c is very large and contains a number of devres functions.
These functions should now reside in devres.c.
Move as much devres-specific code from pci.c to devres.c as possible.
There are a few callers left in pci.c that do devres operations. These
should be ported in the future. Add corresponding TODOs.
The reason they are not moved right now in this commit is that PCI's devres
currently implements a sort of "hybrid-mode": pci_request_region(), for
instance, does not have a corresponding pcim_ equivalent, yet. Instead, the
function can be made managed by previously calling pcim_enable_device()
(instead of pci_enable_device()). This makes it unreasonable to move
pci_request_region() to devres.c. Moving the functions would require
changes to PCI's API and is, therefore, left for future work.
In summary, this commit serves as a preparation step for a following
patch series that will cleanly separate the PCI's managed and unmanaged
API.
Philipp Stanner [Wed, 31 Jan 2024 09:00:22 +0000 (10:00 +0100)]
PCI: Move PCI-specific devres code to drivers/pci/
The pcim_*() functions in lib/devres.c are guarded by an #ifdef CONFIG_PCI
and, thus, don't belong to this file. They are only ever used for PCI and
are not generic infrastructure.
Move all pcim_*() functions in lib/devres.c to drivers/pci/devres.c.
Adjust the Makefile.
PCI: switchtec: Fix an error handling path in switchtec_pci_probe()
The commit in Fixes changed the logic on how resources are released and
introduced a new switchtec_exit_pci() that need to be called explicitly in
order to undo a corresponding switchtec_init_pci().
This was done in the remove function, but not in the probe.
Ilpo Järvinen [Thu, 18 Jan 2024 11:08:15 +0000 (13:08 +0200)]
PCI/DPC: Print all TLP Prefixes, not just the first
The TLP Prefix Log Register consists of multiple DWORDs (PCIe r6.1 sec
7.9.14.13) but the loop in dpc_process_rp_pio_error() keeps reading from
the first DWORD, so we print only the first PIO TLP Prefix (duplicated
several times), and we never print the second, third, etc., Prefixes.
Add the iteration count based offset calculation into the config read.
Linus Torvalds [Sun, 21 Jan 2024 22:01:12 +0000 (14:01 -0800)]
Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs
Pull more bcachefs updates from Kent Overstreet:
"Some fixes, Some refactoring, some minor features:
- Assorted prep work for disk space accounting rewrite
- BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this
makes our trigger context more explicit
- A few fixes to avoid excessive transaction restarts on
multithreaded workloads: fstests (in addition to ktest tests) are
now checking slowpath counters, and that's shaking out a few bugs
- Assorted tracepoint improvements
- Starting to break up bcachefs_format.h and move on disk types so
they're with the code they belong to; this will make room to start
documenting the on disk format better.
- A few minor fixes"
* tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits)
bcachefs: Improve inode_to_text()
bcachefs: logged_ops_format.h
bcachefs: reflink_format.h
bcachefs; extents_format.h
bcachefs: ec_format.h
bcachefs: subvolume_format.h
bcachefs: snapshot_format.h
bcachefs: alloc_background_format.h
bcachefs: xattr_format.h
bcachefs: dirent_format.h
bcachefs: inode_format.h
bcachefs; quota_format.h
bcachefs: sb-counters_format.h
bcachefs: counters.c -> sb-counters.c
bcachefs: comment bch_subvolume
bcachefs: bch_snapshot::btime
bcachefs: add missing __GFP_NOWARN
bcachefs: opts->compression can now also be applied in the background
bcachefs: Prep work for variable size btree node buffers
bcachefs: grab s_umount only if snapshotting
...
Linus Torvalds [Sun, 21 Jan 2024 19:14:40 +0000 (11:14 -0800)]
Merge tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Updates for time and clocksources:
- A fix for the idle and iowait time accounting vs CPU hotplug.
The time is reset on CPU hotplug which makes the accumulated
systemwide time jump backwards.
- Assorted fixes and improvements for clocksource/event drivers"
* tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug
clocksource/drivers/ep93xx: Fix error handling during probe
clocksource/drivers/cadence-ttc: Fix some kernel-doc warnings
clocksource/drivers/timer-ti-dm: Fix make W=n kerneldoc warnings
clocksource/timer-riscv: Add riscv_clock_shutdown callback
dt-bindings: timer: Add StarFive JH8100 clint
dt-bindings: timer: thead,c900-aclint-mtimer: separate mtime and mtimecmp regs
Kent Overstreet [Tue, 16 Jan 2024 21:20:21 +0000 (16:20 -0500)]
bcachefs: opts->compression can now also be applied in the background
The "apply this compression method in the background" paths now use the
compression option if background_compression is not set; this means that
setting or changing the compression option will cause existing data to
be compressed accordingly in the background.
Kent Overstreet [Tue, 16 Jan 2024 18:29:59 +0000 (13:29 -0500)]
bcachefs: Prep work for variable size btree node buffers
bcachefs btree nodes are big - typically 256k - and btree roots are
pinned in memory. As we're now up to 18 btrees, we now have significant
memory overhead in mostly empty btree roots.
And in the future we're going to start enforcing that certain btree node
boundaries exist, to solve lock contention issues - analagous to XFS's
AGIs.
Thus, we need to start allocating smaller btree node buffers when we
can. This patch changes code that refers to the filesystem constant
c->opts.btree_node_size to refer to the btree node buffer size -
btree_buf_bytes() - where appropriate.
In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment
"why do we need this lock?" about the lock coming from
commit 42d237320e98 ("bcachefs: Snapshot creation, deletion")
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb() which enforce locked s_umount to writeback all dirty
nodes before doing snapshot works.
Fix it by read locking s_umount for snapshotting only and unlocking
s_umount after sync_inodes_sb().
Colin Ian King [Tue, 16 Jan 2024 11:07:23 +0000 (11:07 +0000)]
bcachefs: remove redundant variable tmp
The variable tmp is being assigned a value but it isn't being
read afterwards. The assignment is redundant and so tmp can be
removed.
Cleans up clang scan build warning:
warning: Although the value stored to 'ret' is used in the enclosing
expression, the value is never actually read from 'ret'
[deadcode.DeadStores]
Kent Overstreet [Tue, 16 Jan 2024 01:37:23 +0000 (20:37 -0500)]
bcachefs: Fix excess transaction restarts in __bchfs_fallocate()
drop_locks_do() should not be used in a fastpath without first trying
the do in nonblocking mode - the unlock and relock will cause excessive
transaction restarts and potentially livelocking with other threads that
are contending for the same locks.
Kent Overstreet [Mon, 15 Jan 2024 22:59:51 +0000 (17:59 -0500)]
bcachefs: Better journal tracepoints
Factor out bch2_journal_bufs_to_text(), and use it in the
journal_entry_full() tracepoint; when we can't get a journal reservation
we need to know the outstanding journal entry sizes to know if the
problem is due to excessive flushing.
Kent Overstreet [Mon, 15 Jan 2024 22:56:22 +0000 (17:56 -0500)]
bcachefs: Avoid flushing the journal in the discard path
When issuing discards, we may need to flush the journal if there's too
many buckets that can't be discarded until a journal flush.
But the heuristic was bad; we should be comparing the number of buckets
that need to flushes against the number of free buckets, not the number
of buckets we saw.
Kent Overstreet [Mon, 15 Jan 2024 19:15:26 +0000 (14:15 -0500)]
bcachefs: bch2_kthread_io_clock_wait() no longer sleeps until full amount
Drop t he loop in bch2_kthread_io_clock_wait(): this allows the code
that uses it to be woken up for other reasons, and fixes a bug where
rebalance wouldn't wake up when a scan was requested.
This raises the possibility of spurious wakeups, but callers should
always be able to handle that reasonably well.
Kent Overstreet [Thu, 11 Jan 2024 04:47:04 +0000 (23:47 -0500)]
bcachefs: Reduce would_deadlock restarts
We don't have to take locks in any particular ordering - we'll make
forward progress just fine - but if we try to stick to an ordering, it
can help to avoid excessive would_deadlock transaction restarts.
This tweaks the reflink path to take extents btree locks in the right
order.
Kent Overstreet [Thu, 11 Jan 2024 04:08:30 +0000 (23:08 -0500)]
bcachefs: Don't log errors if BCH_WRITE_ALLOC_NOWAIT
Previously, we added logging in the write path to ensure that any
unexpected errors getting reported to userspace have a log message; but
BCH_WRITE_ALLOC_NOWAIT is a special case, it's used for promotes where
errors are expected and not reported out to userspace - so we need to
silence those.
- retry (reconnect) improvement including new retrans mount parm, and
handling of two additional return codes that need to be retried on
- two minor cleanup patches and another to remove duplicate query
info code
- two documentation cleanup, and one reviewer email correction"
* tag 'v6.8-rc-part2-smb-client' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update iface_last_update on each query-and-update
cifs: handle servers that still advertise multichannel after disabling
cifs: new mount option called retrans
cifs: reschedule periodic query for server interfaces
smb: client: don't clobber ->i_rdev from cached reparse points
smb: client: get rid of smb311_posix_query_path_info()
smb: client: parse owner/group when creating reparse points
smb: client: fix parsing of SMB3.1.1 POSIX create context
cifs: update known bugs mentioned in kernel docs for cifs
cifs: new nt status codes from MS-SMB2
cifs: pick channel for tcon and tdis
cifs: open_cached_dir should not rely on primary channel
smb3: minor documentation updates
Update MAINTAINERS email address
cifs: minor comment cleanup
smb3: show beginning time for per share stats
cifs: remove redundant variable tcon_exist
Linus Torvalds [Sat, 20 Jan 2024 23:03:25 +0000 (15:03 -0800)]
Merge tag 'dmaengine-fix-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"New support:
- Loongson LS2X APB DMA controller
- sf-pdma: mpfs-pdma support
- Qualcomm X1E80100 GPI dma controller support
Updates:
- Xilinx XDMA updates to support interleaved DMA transfers
- TI PSIL threads for AM62P and J722S and cfg register regions
description
- axi-dmac Improving the cyclic DMA transfers
- Tegra Support dma-channel-mask property
- Remaining platform remove callback returning void conversions
Driver fixes for:
- Xilinx xdma driver operator precedence and initialization fix
- Excess kernel-doc warning fix in imx-sdma xilinx xdma drivers
- format-overflow warning fix for rz-dmac, sh usb dmac drivers
- 'output may be truncated' fix for shdma, fsl-qdma and dw-edma
drivers"
* tag 'dmaengine-fix-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (58 commits)
dmaengine: dw-edma: increase size of 'name' in debugfs code
dmaengine: fsl-qdma: increase size of 'irq_name'
dmaengine: shdma: increase size of 'dev_id'
dmaengine: xilinx: xdma: Fix kernel-doc warnings
dmaengine: usb-dmac: Avoid format-overflow warning
dmaengine: sh: rz-dmac: Avoid format-overflow warning
dmaengine: imx-sdma: fix Excess kernel-doc warnings
dmaengine: xilinx: xdma: Fix initialization location of desc in xdma_channel_isr()
dmaengine: xilinx: xdma: Fix operator precedence in xdma_prep_interleaved_dma()
dmaengine: xilinx: xdma: statify xdma_prep_interleaved_dma
dmaengine: xilinx: xdma: Workaround truncation compilation error
dmaengine: pl330: issue_pending waits until WFP state
dmaengine: xilinx: xdma: Implement interleaved DMA transfers
dmaengine: xilinx: xdma: Prepare the introduction of interleaved DMA transfers
dmaengine: xilinx: xdma: Add transfer error reporting
dmaengine: xilinx: xdma: Add error checking in xdma_channel_isr()
dmaengine: xilinx: xdma: Rework xdma_terminate_all()
dmaengine: xilinx: xdma: Ease dma_pool alignment requirements
dmaengine: xilinx: xdma: Add necessary macro definitions
dmaengine: xilinx: xdma: Get rid of unused code
...
Linus Torvalds [Sat, 20 Jan 2024 22:20:34 +0000 (14:20 -0800)]
Merge tag 'coccinelle-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux
Pull coccinelle updates from Julia Lawall:
"Updates to the device_attr_show semantic patch to reflect the new
guidelines of the Linux kernel documentation.
The problem was identified by Li Zhijian <[email protected]>, who
proposed an initial fix"
* tag 'coccinelle-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux:
coccinelle: device_attr_show: simplify patch case
coccinelle: device_attr_show: Adapt to the latest Documentation/filesystems/sysfs.rst
Aurelien Jarno [Sat, 13 Jan 2024 18:33:31 +0000 (19:33 +0100)]
media: solo6x10: replace max(a, min(b, c)) by clamp(b, a, c)
This patch replaces max(a, min(b, c)) by clamp(b, a, c) in the solo6x10
driver. This improves the readability and more importantly, for the
solo6x10-p2m.c file, this reduces on my system (x86-64, gcc 13):
- the preprocessed size from 121 MiB to 4.5 MiB;
- the build CPU time from 46.8 s to 1.6 s;
- the build memory from 2786 MiB to 98MiB.
In fine, this allows this relatively simple C file to be built on a
32-bit system.
Linus Torvalds [Tue, 9 Jan 2024 00:43:04 +0000 (16:43 -0800)]
execve: open the executable file before doing anything else
No point in allocating a new mm, counting arguments and environment
variables etc if we're just going to return ENOENT.
This patch does expose the fact that 'do_filp_open()' that execve() uses
is still unnecessarily expensive in the failure case, because it
allocates the 'struct file *' early, even if the path lookup (which is
heavily optimized) fails.
So that remains an unnecessary cost in the "no such executable" case,
but it's a separate issue. Regardless, I do not want to do _both_ a
filename_lookup() and a later do_filp_open() like the origin patch by
Josh Triplett did in [1].
Linus Torvalds [Sat, 20 Jan 2024 19:06:04 +0000 (11:06 -0800)]
Merge tag 'riscv-for-linus-6.8-mw4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- Support for tuning for systems with fast misaligned accesses.
- Support for SBI-based suspend.
- Support for the new SBI debug console extension.
- The T-Head CMOs now use PA-based flushes.
- Support for enabling the V extension in kernel code.
- Optimized IP checksum routines.
- Various ftrace improvements.
- Support for archrandom, which depends on the Zkr extension.
- The build is no longer broken under NET=n, KUNIT=y for ports that
don't define their own ipv6 checksum.
* tag 'riscv-for-linus-6.8-mw4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (56 commits)
lib: checksum: Fix build with CONFIG_NET=n
riscv: lib: Check if output in asm goto supported
riscv: Fix build error on rv32 + XIP
riscv: optimize ELF relocation function in riscv
RISC-V: Implement archrandom when Zkr is available
riscv: Optimize hweight API with Zbb extension
riscv: add dependency among Image(.gz), loader(.bin), and vmlinuz.efi
samples: ftrace: Add RISC-V support for SAMPLE_FTRACE_DIRECT[_MULTI]
riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support
riscv: ftrace: Make function graph use ftrace directly
riscv: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
lib/Kconfig.debug: Update AS_HAS_NON_CONST_LEB128 comment and name
riscv: Restrict DWARF5 when building with LLVM to known working versions
riscv: Hoist linker relaxation disabling logic into Kconfig
kunit: Add tests for csum_ipv6_magic and ip_fast_csum
riscv: Add checksum library
riscv: Add checksum header
riscv: Add static key for misaligned accesses
asm-generic: Improve csum_fold
RISC-V: selftests: cbo: Ensure asm operands match constraints
...
Linus Torvalds [Sat, 20 Jan 2024 17:42:32 +0000 (09:42 -0800)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"Final round of fixes that came in too late to send in the first
request.
It's nine bug fixes and one version update (because of a bug fix) and
one set of PCI ID additions. There's one bug fix in the core which is
really a one liner (except that an additional sdev pointer was added
for convenience) and the rest are in drivers"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: core: Add TMF to tmr_list handling
scsi: core: Kick the requeue list after inserting when flushing
scsi: fnic: unlock on error path in fnic_queuecommand()
scsi: fcoe: Fix unsigned comparison with zero in store_ctlr_mode()
scsi: mpi3mr: Fix mpi3mr_fw.c kernel-doc warnings
scsi: smartpqi: Bump driver version to 2.1.26-030
scsi: smartpqi: Fix logical volume rescan race condition
scsi: smartpqi: Add new controller PCI IDs
scsi: ufs: qcom: Remove unnecessary goto statement from ufs_qcom_config_esi()
scsi: ufs: core: Remove the ufshcd_hba_exit() call from ufshcd_async_scan()
scsi: ufs: core: Simplify power management during async scan