Git Repo - linux.git/log

x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it

Implement the validation function which tells the core code whether
parallel bringup is possible.

The only condition for now is that the kernel does not run in an encrypted
guest as these will trap the RDMSR via #VC, which cannot be handled at that
point in early startup.

There was an earlier variant for AMD-SEV which used the GHBC protocol for
retrieving the APIC ID via CPUID, but there is no guarantee that the
initial APIC ID in CPUID is the same as the real APIC ID. There is no
enforcement from the secure firmware and the hypervisor can assign APIC IDs
as it sees fit as long as the ACPI/MADT table is consistent with that
assignment.

Unfortunately there is no RDMSR GHCB protocol at the moment, so enabling
AMD-SEV guests for parallel startup needs some more thought.

Intel-TDX provides a secure RDMSR hypercall, but supporting that is outside
the scope of this change.

Fixup announce_cpu() as e.g. on Hyper-V CPU1 is the secondary sibling of
CPU0, which makes the @cpu == 1 logic in announce_cpu() fall apart.

[ mikelley: Reported the announce_cpu() fallout

Originally-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Support parallel startup of secondary CPUs

In parallel startup mode the APs are kicked alive by the control CPU
quickly after each other and run through the early startup code in
parallel. The real-mode startup code is already serialized with a
bit-spinlock to protect the real-mode stack.

In parallel startup mode the smpboot_control variable obviously cannot
contain the Linux CPU number so the APs have to determine their Linux CPU
number on their own. This is required to find the CPUs per CPU offset in
order to find the idle task stack and other per CPU data.

To achieve this, export the cpuid_to_apicid[] array so that each AP can
find its own CPU number by searching therein based on its APIC ID.

Introduce a flag in the top bits of smpboot_control which indicates that
the AP should find its CPU number by reading the APIC ID from the APIC.

This is required because CPUID based APIC ID retrieval can only provide the
initial APIC ID, which might have been overruled by the firmware. Some AMD
APUs come up with APIC ID = initial APIC ID + 0x10, so the APIC ID to CPU
number lookup would fail miserably if based on CPUID. Also virtualization
can make its own APIC ID assignements. The only requirement is that the
APIC IDs are consistent with the APCI/MADT table.

For the boot CPU or in case parallel bringup is disabled the control bits
are empty and the CPU number is directly available in bit 0-23 of
smpboot_control.

[ tglx: Initial proof of concept patch with bitlock and APIC ID lookup ]
[ dwmw2: Rework and testing, commit message, CPUID 0x1 and CPU0 support ]
[ seanc: Fix stray override of initial_gs in common_cpu_up() ]
[ Oleksandr Natalenko: reported suspend/resume issue fixed in
x86_acpi_suspend_lowlevel ]
[ tglx: Make it read the APIC ID from the APIC instead of using CPUID,
split the bitlock part out ]

Co-developed-by: Thomas Gleixner <[email protected]>
Co-developed-by: Brian Gerst <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Brian Gerst <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Implement a bit spinlock to protect the realmode stack

Parallel AP bringup requires that the APs can run fully parallel through
the early startup code including the real mode trampoline.

To prepare for this implement a bit-spinlock to serialize access to the
real mode stack so that parallel upcoming APs are not going to corrupt each
others stack while going through the real mode startup code.

Co-developed-by: David Woodhouse <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/apic: Save the APIC virtual base address

For parallel CPU brinugp it's required to read the APIC ID in the low level
startup code. The virtual APIC base address is a constant because its a
fix-mapped address. Exposing that constant which is composed via macros to
assembly code is non-trivial due to header inclusion hell.

Aside of that it's constant only because of the vsyscall ABI
requirement. Once vsyscall is out of the picture the fixmap can be placed
at runtime.

Avoid header hell, stay flexible and store the address in a variable which
can be exposed to the low level startup code.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE

There is often significant latency in the early stages of CPU bringup, and
time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and
then waiting for it to respond before moving on to the next.

Allow a platform to enable parallel setup which brings all to be onlined
CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the
control CPU (BP) is single-threaded the important part is the last state
CPUHP_BP_KICK_AP which wakes the to be onlined CPUs up.

This allows the CPUs to run up to the first sychronization point
cpuhp_ap_sync_alive() where they wait for the control CPU to release them
one by one for the full onlining procedure.

This parallelism depends on the CPU hotplug core sync mechanism which
ensures that the parallel brought up CPUs wait for release before touching
any state which would make the CPU visible to anything outside the hotplug
control mechanism.

To handle the SMT constraints of X86 correctly the bringup happens in two
iterations when CONFIG_HOTPLUG_SMT is enabled. The control CPU brings up
the primary SMT threads of each core first, which can load the microcode
without the need to rendevouz with the thread siblings. Once that's
completed it brings up the secondary SMT threads.

Co-developed-by: David Woodhouse <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/apic: Provide cpu_primary_thread mask

Make the primary thread tracking CPU mask based in preparation for simpler
handling of parallel bootup.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Enable split CPU startup

The x86 CPU bringup state currently does AP wake-up, wait for AP to
respond and then release it for full bringup.

It is safe to be split into a wake-up and and a separate wait+release
state.

Provide the required functions and enable the split CPU bringup, which
prepares for parallel bringup, where the bringup of the non-boot CPUs takes
two iterations: One to prepare and wake all APs and the second to wait and
release them. Depending on timing this can eliminate the wait time
completely.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism

The bring up logic of a to be onlined CPU consists of several parts, which
are considered to be a single hotplug state:

  1) Control CPU issues the wake-up

  2) To be onlined CPU starts up, does the minimal initialization,
     reports to be alive and waits for release into the complete bring-up.

  3) Control CPU waits for the alive report and releases the upcoming CPU
     for the complete bring-up.

Allow to split this into two states:

  1) Control CPU issues the wake-up

     After that the to be onlined CPU starts up, does the minimal
     initialization, reports to be alive and waits for release into the
     full bring-up. As this can run after the control CPU dropped the
     hotplug locks the code which is executed on the AP before it reports
     alive has to be carefully audited to not violate any of the hotplug
     constraints, especially not modifying any of the various cpumasks.

     This is really only meant to avoid waiting for the AP to react on the
     wake-up. Of course an architecture can move strict CPU related setup
     functionality, e.g. microcode loading, with care before the
     synchronization point to save further pointless waiting time.

  2) Control CPU waits for the alive report and releases the upcoming CPU
     for the complete bring-up.

This allows that the two states can be split up to run all to be onlined
CPUs up to state #1 on the control CPU and then at a later point run state
#2. This spares some of the latencies of the full serialized per CPU
bringup by avoiding the per CPU wakeup/wait serialization. The assumption
is that the first AP already waits when the last AP has been woken up. This
obvioulsy depends on the hardware latencies and depending on the timings
this might still not completely eliminate all wait scenarios.

This split is just a preparatory step for enabling the parallel bringup
later. The boot time bringup is still fully serialized. It has a separate
config switch so that architectures which want to support parallel bringup
can test the split of the CPUHP_BRINGUG step separately.

To enable this the architecture must support the CPU hotplug core sync
mechanism and has to be audited that there are no implicit hotplug state
dependencies which require a fully serialized bringup.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

cpu/hotplug: Reset task stack state in _cpu_up()

Commit dce1ca0525bf ("sched/scs: Reset task stack state in bringup_cpu()")
ensured that the shadow call stack and KASAN poisoning were removed from
a CPU's stack each time that CPU is brought up, not just once.

This is not incorrect. However, with parallel bringup the idle thread setup
will happen at a different step. As a consequence the cleanup in
bringup_cpu() would be too late.

Move the SCS/KASAN cleanup to the generic _cpu_up() function instead,
which already ensures that the new CPU's stack is available, purely to
allow for early failure. This occurs when the CPU to be brought up is
in the CPUHP_OFFLINE state, which should correctly do the cleanup any
time the CPU has been taken down to the point where such is needed.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Mark Rutland <[email protected]>
Tested-by: Mark Rutland <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

cpu/hotplug: Remove unused state functions

All users converted to the hotplug core mechanism.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

riscv: Switch to hotplug core state synchronization

Switch to the CPU hotplug core state tracking and synchronization
mechanim. No functional change intended.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

parisc: Switch to hotplug core state synchronization

Switch to the CPU hotplug core state tracking and synchronization
mechanim. No functional change intended.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

MIPS: SMP_CPS: Switch to hotplug core state synchronization

Switch to the CPU hotplug core state tracking and synchronization
mechanim. This unfortunately requires to add dead reporting to the non CPS
platforms as CPS is the only user, but it allows an overall consolidation
of this functionality.

No functional change intended.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

csky/smp: Switch to hotplug core state synchronization

Switch to the CPU hotplug core state tracking and synchronization
mechanim. No functional change intended.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

arm64: smp: Switch to hotplug core state synchronization

Switch to the CPU hotplug core state tracking and synchronization
mechanim. No functional change intended.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Mark Rutland <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

ARM: smp: Switch to hotplug core state synchronization

Switch to the CPU hotplug core state tracking and synchronization
mechanim. No functional change intended.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

cpu/hotplug: Remove cpu_report_state() and related unused cruft

No more users.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Switch to hotplug core state synchronization

The new AP state tracking and synchronization mechanism in the CPU hotplug
core code allows to remove quite some x86 specific code:

1) The AP alive synchronization based on cpumasks

2) The decision whether an AP can be brought up again

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

cpu/hotplug: Add CPU state tracking and synchronization

The CPU state tracking and synchronization mechanism in smpboot.c is
completely independent of the hotplug code and all logic around it is
implemented in architecture specific code.

Except for the state reporting of the AP there is absolutely nothing
architecture specific and the sychronization and decision functions can be
moved into the generic hotplug core code.

Provide an integrated variant and add the core synchronization and decision
points. This comes in two flavours:

  1) DEAD state synchronization

     Updated by the architecture code once the AP reaches the point where
     it is ready to be torn down by the control CPU, e.g. by removing power
     or clocks or tear down via the hypervisor.

     The control CPU waits for this state to be reached with a timeout. If
     the state is reached an architecture specific cleanup function is
     invoked.

  2) Full state synchronization

     This extends #1 with AP alive synchronization. This is new
     functionality, which allows to replace architecture specific wait
     mechanims, e.g. cpumasks, completely.

     It also prevents that an AP which is in a limbo state can be brought
     up again. This can happen when an AP failed to report dead state
     during a previous off-line operation.

The dead synchronization is what most architectures use. Only x86 makes a
bringup decision based on that state at the moment.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/xen/hvm: Get rid of DEAD_FROZEN handling

No point in this conditional voodoo. Un-initializing the lock mechanism is
safe to be called unconditionally even if it was already invoked when the
CPU died.

Remove the invocation of xen_smp_intr_free() as that has been already
cleaned up in xen_cpu_dead_hvm().

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/xen/smp_pv: Remove wait for CPU online

Now that the core code drops sparse_irq_lock after the idle thread
synchronized, it's pointless to wait for the AP to mark itself online.

Whether the control CPU runs in a wait loop or sleeps in the core code
waiting for the online operation to complete makes no difference.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Remove wait for cpu_online()

Now that the core code drops sparse_irq_lock after the idle thread
synchronized, it's pointless to wait for the AP to mark itself online.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

cpu/hotplug: Rework sparse_irq locking in bringup_cpu()

There is no harm to hold sparse_irq lock until the upcoming CPU completes
in cpuhp_online_idle(). This allows to remove cpu_online() synchronization
from architecture code.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Remove cpu_callin_mask

Now that TSC synchronization is SMP function call based there is no reason
to wait for the AP to be set in smp_callin_mask. The control CPU waits for
the AP to set itself in the online mask anyway.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Make TSC synchronization function call based

Spin-waiting on the control CPU until the AP reaches the TSC
synchronization is just a waste especially in the case that there is no
synchronization required.

As the synchronization has to run with interrupts disabled the control CPU
part can just be done from a SMP function call. The upcoming AP issues that
call async only in the case that synchronization is required.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Move synchronization masks to SMP boot code

The usage is in smpboot.c and not in the CPU initialization code.

The XEN_PV usage of cpu_callout_mask is obsolete as cpu_init() not longer
waits and cacheinfo has its own CPU mask now, so cpu_callout_mask can be
made static too.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/cpu/cacheinfo: Remove cpu_callout_mask dependency

cpu_callout_mask is used for the stop machine based MTRR/PAT init.

In preparation of moving the BP/AP synchronization to the core hotplug
code, use a private CPU mask for cacheinfo and manage it in the
starting/dying hotplug state.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Get rid of cpu_init_secondary()

The synchronization of the AP with the control CPU is a SMP boot problem
and has nothing to do with cpu_init().

Open code cpu_init_secondary() in start_secondary() and move
wait_for_master_cpu() into the SMP boot code.

No functional change.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Split up native_cpu_up() into separate phases and document them

There are four logical parts to what native_cpu_up() does on the BSP (or
on the controlling CPU for a later hotplug):

1) Wake the AP by sending the INIT/SIPI/SIPI sequence.

2) Wait for the AP to make it as far as wait_for_master_cpu() which
    sets that CPU's bit in cpu_initialized_mask, then sets the bit in
    cpu_callout_mask to let the AP proceed through cpu_init().

3) Wait for the AP to finish cpu_init() and get as far as the
    smp_callin() call, which sets that CPU's bit in cpu_callin_mask.

4) Perform the TSC synchronization and wait for the AP to actually
    mark itself online in cpu_online_mask.

In preparation to allow these phases to operate in parallel on multiple
APs, split them out into separate functions and document the interactions
a little more clearly in both the BP and AP code paths.

No functional change intended.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Remove unnecessary barrier()

Peter stumbled over the barrier() after the invocation of smp_callin() in
start_secondary():

  "...this barrier() and it's comment seem weird vs smp_callin(). That
   function ends with an atomic bitop (it has to, at the very least it must
   not be weaker than store-release) but also has an explicit wmb() to order
   setup vs CPU_STARTING.

   There is no way the smp_processor_id() referred to in this comment can land
   before cpu_init() even without the barrier()."

The barrier() along with the comment was added in 2003 with commit
d8f19f2cac70 ("[PATCH] x86-64 merge") in the history tree. One of those
well documented combo patches of that time which changes world and some
more. The context back then was:

/*
* Dont put anything before smp_callin(), SMP
* booting is too fragile that we want to limit the
* things done here to the most necessary things.
*/
cpu_init();
smp_callin();

+ /* otherwise gcc will move up smp_processor_id before the cpu_init */
+ barrier();

Dprintk("cpu %d: waiting for commence\n", smp_processor_id());

Even back in 2003 the compiler was not allowed to reorder that
smp_processor_id() invocation before the cpu_init() function call.
Especially not as smp_processor_id() resolved to:

  asm volatile("movl %%gs:%c1,%0":"=r" (ret__):"i"(pda_offset(field)):"memory");

There is no trace of this change in any mailing list archive including the
back then official x86_64 list [email protected], which would explain the
problem this change solved.

The debug prints are gone by now and the the only smp_processor_id()
invocation today is farther down in start_secondary() after locking
vector_lock which itself prevents reordering.

Even if the compiler would be allowed to reorder this, the code would still
be correct as GSBASE is set up early in the assembly code and is valid when
the CPU reaches start_secondary(), while the code at the time when this
barrier was added did the GSBASE setup in cpu_init().

As the barrier has zero value, remove it.

Reported-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Restrict soft_restart_cpu() to SEV

Now that the CPU0 hotplug cruft is gone, the only user is AMD SEV.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Remove the CPU0 hotplug kludge

This was introduced with commit e1c467e69040 ("x86, hotplug: Wake up CPU0
via NMI instead of INIT, SIPI, SIPI") to eventually support physical
hotplug of CPU0:

"We'll change this code in the future to wake up hard offlined CPU0 if
real platform and request are available."

11 years later this has not happened and physical hotplug is not officially
supported. Remove the cruft.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/topology: Remove CPU0 hotplug option

This was introduced together with commit e1c467e69040 ("x86, hotplug: Wake
up CPU0 via NMI instead of INIT, SIPI, SIPI") to eventually support
physical hotplug of CPU0:

"We'll change this code in the future to wake up hard offlined CPU0 if
real platform and request are available."

11 years later this has not happened and physical hotplug is not officially
supported. Remove the cruft.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Rename start_cpu0() to soft_restart_cpu()

This is used in the SEV play_dead() implementation to re-online CPUs. But
that has nothing to do with CPU0.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Avoid pointless delay calibration if TSC is synchronized

When TSC is synchronized across sockets then there is no reason to
calibrate the delay for the first CPU which comes up on a socket.

Just reuse the existing calibration value.

This removes 100ms pointlessly wasted time from CPU hotplug per socket.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

cpu/hotplug: Mark arch_disable_smp_support() and bringup_nonboot_cpus() __init

No point in keeping them around.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

x86/smpboot: Cleanup topology_phys_to_logical_pkg()/die()

Make topology_phys_to_logical_pkg_die() static as it's only used in
smpboot.c and fixup the kernel-doc warnings for both functions.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Michael Kelley <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Helge Deller <[email protected]> # parisc
Tested-by: Guilherme G. Piccoli <[email protected]> # Steam Deck
Link: https://lore.kernel.org/r/[email protected]

Linux 6.4-rc2

Merge tag 'cxl-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull compute express link fixes from Dan Williams:

- Fix a compilation issue with DEFINE_STATIC_SRCU() in the unit tests

- Fix leaking kernel memory to a root-only sysfs attribute

* tag 'cxl-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Add missing return to cdat read error path
tools/testing/cxl: Use DEFINE_STATIC_SRCU()

Merge tag 'parisc-for-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc architecture fixes from Helge Deller:

- Fix encoding of swp_entry due to added SWP_EXCLUSIVE flag

- Include reboot.h to avoid gcc-12 compiler warning

* tag 'parisc-for-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix encoding of swp_entry due to added SWP_EXCLUSIVE flag
parisc: kexec: include reboot.h

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:

- fix unwinder for uleb128 case

- fix kernel-doc warnings for HP Jornada 7xx

- fix unbalanced stack on vfp success path

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9297/1: vfp: avoid unbalanced stack on 'success' return path
  ARM: 9296/1: HP Jornada 7XX: fix kernel-doc warnings
  ARM: 9295/1: unwind:fix unwind abort for uleb128 case

Merge tag 'locking_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Borislav Petkov:

- Make sure __down_read_common() is always inlined so that the callers'
   names land in traceevents output and thus the blocked function can be
   identified

* tag 'locking_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rwsem: Add __always_inline annotation to __down_read_common() and inlined callers

Merge tag 'perf_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fixes from Borislav Petkov:

- Make sure the PEBS buffer is flushed before reprogramming the
   hardware so that the correct record sizes are used

- Update the sample size for AMD BRS events

- Fix a confusion with using the same on-stack struct with different
   events in the event processing path

* tag 'perf_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG
  perf/x86: Fix missing sample size update on AMD BRS
  perf/core: Fix perf_sample_data not properly initialized for different swevents in perf_tp_event()

Merge tag 'sched_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:

- Fix a couple of kernel-doc warnings

* tag 'sched_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: fix cid_lock kernel-doc warnings

Merge tag 'x86_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:

- Add the required PCI IDs so that the generic SMN accesses provided by
   amd_nb.c work for drivers which switch to them. Add a PCI device ID
   to k10temp's table so that latter is loaded on such systems too

* tag 'x86_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hwmon: (k10temp) Add PCI ID for family 19, model 78h
  x86/amd_nb: Add PCI ID for family 19h model 78h

Merge tag 'timers_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Borislav Petkov:

- Prevent CPU state corruption when an active clockevent broadcast
device is replaced while the system is already in oneshot mode

* tag 'timers_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/broadcast: Make broadcast device replacement work correctly

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"Some ext4 bug fixes (mostly to address Syzbot reports)"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: bail out of ext4_xattr_ibody_get() fails for any reason
  ext4: add bounds checking in get_max_inline_xattr_value_size()
  ext4: add indication of ro vs r/w mounts in the mount message
  ext4: fix deadlock when converting an inline directory in nojournal mode
  ext4: improve error recovery code paths in __ext4_remount()
  ext4: improve error handling from ext4_dirhash()
  ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled
  ext4: check iomap type only if ext4_iomap_begin() does not fail
  ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum
  ext4: fix data races when using cached status extents
  ext4: avoid deadlock in fs reclaim with page writeback
  ext4: fix invalid free tracking in ext4_xattr_move_to_block()
  ext4: remove a BUG_ON in ext4_mb_release_group_pa()
  ext4: allow ext4_get_group_info() to fail
  ext4: fix lockdep warning when enabling MMP
  ext4: fix WARNING in mb_find_extent

Merge tag 'fbdev-for-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev fixes from Helge Deller:

- use after free fix in imsttfb (Zheng Wang)

- fix error handling in arcfb (Zongjie Li)

- lots of whitespace cleanups (Thomas Zimmermann)

- add 1920x1080 modedb entry (me)

* tag 'fbdev-for-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: stifb: Fix info entry in sti_struct on error path
  fbdev: modedb: Add 1920x1080 at 60 Hz video mode
  fbdev: imsttfb: Fix use after free bug in imsttfb_probe
  fbdev: vfb: Remove trailing whitespaces
  fbdev: valkyriefb: Remove trailing whitespaces
  fbdev: stifb: Remove trailing whitespaces
  fbdev: sa1100fb: Remove trailing whitespaces
  fbdev: platinumfb: Remove trailing whitespaces
  fbdev: p9100: Remove trailing whitespaces
  fbdev: maxinefb: Remove trailing whitespaces
  fbdev: macfb: Remove trailing whitespaces
  fbdev: hpfb: Remove trailing whitespaces
  fbdev: hgafb: Remove trailing whitespaces
  fbdev: g364fb: Remove trailing whitespaces
  fbdev: controlfb: Remove trailing whitespaces
  fbdev: cg14: Remove trailing whitespaces
  fbdev: atmel_lcdfb: Remove trailing whitespaces
  fbdev: 68328fb: Remove trailing whitespaces
  fbdev: arcfb: Fix error handling in arcfb_probe()

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
"A single small fix for the UFS driver to fix a power management
failure"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix I/O hang that occurs when BKOPS fails in W-LUN suspend

parisc: Fix encoding of swp_entry due to added SWP_EXCLUSIVE flag

Fix the __swp_offset() and __swp_entry() macros due to commit 6d239fc78c0b
("parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE") which introduced the
SWP_EXCLUSIVE flag by reusing the _PAGE_ACCESSED flag.

Reported-by: Christoph Biedl <[email protected]>
Tested-by: Christoph Biedl <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
Fixes: 6d239fc78c0b ("parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE")
Cc: <[email protected]> # v6.3+

ext4: bail out of ext4_xattr_ibody_get() fails for any reason

In ext4_update_inline_data(), if ext4_xattr_ibody_get() fails for any
reason, it's best if we just fail as opposed to stumbling on,
especially if the failure is EFSCORRUPTED.

Cc: [email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: add bounds checking in get_max_inline_xattr_value_size()

Normally the extended attributes in the inode body would have been
checked when the inode is first opened, but if someone is writing to
the block device while the file system is mounted, it's possible for
the inode table to get corrupted. Add bounds checking to avoid
reading beyond the end of allocated memory if this happens.

Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?extid=1966db24521e5f6e23f7
Cc: [email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: add indication of ro vs r/w mounts in the mount message

Whether the file system is mounted read-only or read/write is more
important than the quota mode, which we are already printing. Add the
ro vs r/w indication since this can be helpful in debugging problems
from the console log.

Signed-off-by: Theodore Ts'o <[email protected]>

ext4: fix deadlock when converting an inline directory in nojournal mode

In no journal mode, ext4_finish_convert_inline_dir() can self-deadlock
by calling ext4_handle_dirty_dirblock() when it already has taken the
directory lock.  There is a similar self-deadlock in
ext4_incvert_inline_data_nolock() for data files which we'll fix at
the same time.

A simple reproducer demonstrating the problem:

    mke2fs -Fq -t ext2 -O inline_data -b 4k /dev/vdc 64
    mount -t ext4 -o dirsync /dev/vdc /vdc
    cd /vdc
    mkdir file0
    cd file0
    touch file0
    touch file1
    attr -s BurnSpaceInEA -V abcde .
    touch supercalifragilisticexpialidocious

Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?id=ba84cc80a9491d65416bc7877e1650c87530fe8a
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: improve error recovery code paths in __ext4_remount()

If there are failures while changing the mount options in
__ext4_remount(), we need to restore the old mount options.

This commit fixes two problem. The first is there is a chance that we
will free the old quota file names before a potential failure leading
to a use-after-free. The second problem addressed in this commit is
if there is a failed read/write to read-only transition, if the quota
has already been suspended, we need to renable quota handling.

Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: improve error handling from ext4_dirhash()

The ext4_dirhash() will *almost* never fail, especially when the hash
tree feature was first introduced. However, with the addition of
support of encrypted, casefolded file names, that function can most
certainly fail today.

So make sure the callers of ext4_dirhash() properly check for
failures, and reflect the errors back up to their callers.

Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?id=db56459ea4ac4a676ae4b4678f633e55da005a9b
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled

When a file system currently mounted read/only is remounted
read/write, if we clear the SB_RDONLY flag too early, before the quota
is initialized, and there is another process/thread constantly
attempting to create a directory, it's possible to trigger the

WARN_ON_ONCE(dquot_initialize_needed(inode));

in ext4_xattr_block_set(), with the following stack trace:

   WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680
   RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141
   Call Trace:
    ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458
    ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44
    security_inode_init_security+0x2df/0x3f0 security/security.c:1147
    __ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324
    ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992
    vfs_mkdir+0x29d/0x450 fs/namei.c:4038
    do_mkdirat+0x264/0x520 fs/namei.c:4061
    __do_sys_mkdirat fs/namei.c:4076 [inline]
    __se_sys_mkdirat fs/namei.c:4074 [inline]
    __x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074

Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c34c
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: check iomap type only if ext4_iomap_begin() does not fail

When ext4_iomap_overwrite_begin() calls ext4_iomap_begin() map blocks may
fail for some reason (e.g. memory allocation failure, bare disk write), and
later because "iomap->type ! = IOMAP_MAPPED" triggers WARN_ON(). When ext4
iomap_begin() returns an error, it is normal that the type of iomap->type
may not match the expectation. Therefore, we only determine if iomap->type
is as expected when ext4_iomap_begin() is executed successfully.

Cc: [email protected]
Reported-by: [email protected]
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum

When modifying the block device while it is mounted by the filesystem,
syzbot reported the following:

BUG: KASAN: slab-out-of-bounds in crc16+0x206/0x280 lib/crc16.c:58
Read of size 1 at addr ffff888075f5c0a8 by task syz-executor.2/15586

CPU: 1 PID: 15586 Comm: syz-executor.2 Not tainted 6.2.0-rc5-syzkaller-00205-gc96618275234 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
print_address_description+0x74/0x340 mm/kasan/report.c:306
print_report+0x107/0x1f0 mm/kasan/report.c:417
kasan_report+0xcd/0x100 mm/kasan/report.c:517
crc16+0x206/0x280 lib/crc16.c:58
ext4_group_desc_csum+0x81b/0xb20 fs/ext4/super.c:3187
ext4_group_desc_csum_set+0x195/0x230 fs/ext4/super.c:3210
ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
ext4_free_blocks+0x191a/0x2810 fs/ext4/mballoc.c:6173
ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
ext4_ext_remove_space+0x24ef/0x46a0 fs/ext4/extents.c:2958
ext4_ext_truncate+0x177/0x220 fs/ext4/extents.c:4416
ext4_truncate+0xa6a/0xea0 fs/ext4/inode.c:4342
ext4_setattr+0x10c8/0x1930 fs/ext4/inode.c:5622
notify_change+0xe50/0x1100 fs/attr.c:482
do_truncate+0x200/0x2f0 fs/open.c:65
handle_truncate fs/namei.c:3216 [inline]
do_open fs/namei.c:3561 [inline]
path_openat+0x272b/0x2dd0 fs/namei.c:3714
do_filp_open+0x264/0x4f0 fs/namei.c:3741
do_sys_openat2+0x124/0x4e0 fs/open.c:1310
do_sys_open fs/open.c:1326 [inline]
__do_sys_creat fs/open.c:1402 [inline]
__se_sys_creat fs/open.c:1396 [inline]
__x64_sys_creat+0x11f/0x160 fs/open.c:1396
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f72f8a8c0c9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f72f97e3168 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 00007f72f8bac050 RCX: 00007f72f8a8c0c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 00007f72f8ae7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd165348bf R14: 00007f72f97e3300 R15: 0000000000022000

Replace
le16_to_cpu(sbi->s_es->s_desc_size)
with
sbi->s_desc_size

It reduces ext4's compiled text size, and makes the code more efficient
(we remove an extra indirect reference and a potential byte
swap on big endian systems), and there is no downside. It also avoids the
potential KASAN / syzkaller failure, as a bonus.

Reported-by: [email protected]
Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?id=70d28d11ab14bd7938f3e088365252aa923cff42
Link: https://syzkaller.appspot.com/bug?id=b85721b38583ecc6b5e72ff524c67302abbc30f3
Link: https://lore.kernel.org/all/[email protected]/
Fixes: 717d50e4971b ("Ext4: Uninitialized Block Groups")
Cc: [email protected]
Signed-off-by: Tudor Ambarus <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: fix data races when using cached status extents

When using cached extent stored in extent status tree in tree->cache_es
another process holding ei->i_es_lock for reading can be racing with us
setting new value of tree->cache_es. If the compiler would decide to
refetch tree->cache_es at an unfortunate moment, it could result in a
bogus in_range() check. Fix the possible race by using READ_ONCE() when
using tree->cache_es only under ei->i_es_lock for reading.

Cc: [email protected]
Reported-by: [email protected]
Link: https://lore.kernel.org/all/[email protected]
Suggested-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: avoid deadlock in fs reclaim with page writeback

Ext4 has a filesystem wide lock protecting ext4_writepages() calls to
avoid races with switching of journalled data flag or inode format. This
lock can however cause a deadlock like:

CPU0                            CPU1

ext4_writepages()
  percpu_down_read(sbi->s_writepages_rwsem);
                                ext4_change_inode_journal_flag()
                                  percpu_down_write(sbi->s_writepages_rwsem);
                                    - blocks, all readers block from now on
  ext4_do_writepages()
    ext4_init_io_end()
      kmem_cache_zalloc(io_end_cachep, GFP_KERNEL)
        fs_reclaim frees dentry...
          dentry_unlink_inode()
            iput() - last ref =>
              iput_final() - inode dirty =>
                write_inode_now()...
                  ext4_writepages() tries to acquire sbi->s_writepages_rwsem
                    and blocks forever

Make sure we cannot recurse into filesystem reclaim from writeback code
to avoid the deadlock.

Reported-by: [email protected]
Link: https://lore.kernel.org/all/[email protected]
Fixes: c8585c6fcaf2 ("ext4: fix races between changing inode journal mode and ext4_writepages")
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: fix invalid free tracking in ext4_xattr_move_to_block()

In ext4_xattr_move_to_block(), the value of the extended attribute
which we need to move to an external block may be allocated by
kvmalloc() if the value is stored in an external inode.  So at the end
of the function the code tried to check if this was the case by
testing entry->e_value_inum.

However, at this point, the pointer to the xattr entry is no longer
valid, because it was removed from the original location where it had
been stored.  So we could end up calling kvfree() on a pointer which
was not allocated by kvmalloc(); or we could also potentially leak
memory by not freeing the buffer when it should be freed.  Fix this by
storing whether it should be freed in a separate variable.

Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://syzkaller.appspot.com/bug?id=5c2aee8256e30b55ccf57312c16d88417adbd5e1
Link: https://syzkaller.appspot.com/bug?id=41a6b5d4917c0412eb3b3c3c604965bed7d7420b
Reported-by: [email protected]
Reported-by: [email protected]
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: remove a BUG_ON in ext4_mb_release_group_pa()

If a malicious fuzzer overwrites the ext4 superblock while it is
mounted such that the s_first_data_block is set to a very large
number, the calculation of the block group can underflow, and trigger
a BUG_ON check. Change this to be an ext4_warning so that we don't
crash the kernel.

Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?id=69b28112e098b070f639efb356393af3ffec4220
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: allow ext4_get_group_info() to fail

Previously, ext4_get_group_info() would treat an invalid group number
as BUG(), since in theory it should never happen.  However, if a
malicious attaker (or fuzzer) modifies the superblock via the block
device while it is the file system is mounted, it is possible for
s_first_data_block to get set to a very large number.  In that case,
when calculating the block group of some block number (such as the
starting block of a preallocation region), could result in an
underflow and very large block group number.  Then the BUG_ON check in
ext4_get_group_info() would fire, resutling in a denial of service
attack that can be triggered by root or someone with write access to
the block device.

For a quality of implementation perspective, it's best that even if
the system administrator does something that they shouldn't, that it
will not trigger a BUG.  So instead of BUG'ing, ext4_get_group_info()
will call ext4_error and return NULL.  We also add fallback code in
all of the callers of ext4_get_group_info() that it might NULL.

Also, since ext4_get_group_info() was already borderline to be an
inline function, un-inline it.  The results in a next reduction of the
compiled text size of ext4 by roughly 2k.

Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?id=69b28112e098b070f639efb356393af3ffec4220
Signed-off-by: Theodore Ts'o <[email protected]>
Reviewed-by: Jan Kara <[email protected]>

Merge tag 'block-6.4-2023-05-13' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
"Just a few minor fixes for drivers, and a deletion of a file that is
  woefully out-of-date these days"

* tag 'block-6.4-2023-05-13' of git://git.kernel.dk/linux:
  Documentation/block: drop the request.rst file
  ublk: fix command op code check
  block/rnbd: replace REQ_OP_FLUSH with REQ_OP_WRITE
  nbd: Fix debugfs_create_dir error checking

cxl: Add missing return to cdat read error path

Add a return to the error path when cxl_cdat_read_table() fails. Current
code continues with the table pointer points to freed memory.

Fixes: 7a877c923995 ("cxl/pci: Simplify CDAT retrieval error path")
Signed-off-by: Dave Jiang <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Link: https://lore.kernel.org/r/168382793506.3510737.4792518576623749076.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <[email protected]>

tools/testing/cxl: Use DEFINE_STATIC_SRCU()

Starting with commit:

95433f726301 ("srcu: Begin offloading srcu_struct fields to srcu_update")

...it is no longer possible to do:

static DEFINE_SRCU(x)

Switch to DEFINE_STATIC_SRCU(x) to fix:

tools/testing/cxl/test/mock.c:22:1: error: duplicate ‘static’
22 | static DEFINE_SRCU(cxl_mock_srcu);
| ^~~~~~

Reviewed-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/168392709546.1135523.10424917245934547117.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <[email protected]>

x86/retbleed: Fix return thunk alignment

SYM_FUNC_START_LOCAL_NOALIGN() adds an endbr leading to this layout
(leaving only the last 2 bytes of the address):

  3bff <zen_untrain_ret>:
  3bff:       f3 0f 1e fa             endbr64
  3c03:       f6                      test   $0xcc,%bl

  3c04 <__x86_return_thunk>:
  3c04:       c3                      ret
  3c05:       cc                      int3
  3c06:       0f ae e8                lfence

However, "the RET at __x86_return_thunk must be on a 64 byte boundary,
for alignment within the BTB."

Use SYM_START instead.

Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull more btrfs fixes from David Sterba:

- fix incorrect number of bitmap entries for space cache if loading is
   interrupted by some error

- fix backref walking, this breaks a mode of LOGICAL_INO_V2 ioctl that
   is used in deduplication tools

- zoned mode fixes:
      - properly finish zone reserved for relocation
      - correctly calculate super block zone end on ZNS
      - properly initialize new extent buffer for redirty

- make mount option clear_cache work with block-group-tree, to rebuild
   free-space-tree instead of temporarily disabling it that would lead
   to a forced read-only mount

- fix alignment check for offset when printing extent item

* tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: make clear_cache mount option to rebuild FST without disabling it
  btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add
  btrfs: zoned: fix full zone super block reading on ZNS
  btrfs: zoned: zone finish data relocation BG with last IO
  btrfs: fix backref walking not returning all inode refs
  btrfs: fix space cache inconsistency after error loading it from disk
  btrfs: print-tree: parent bytenr must be aligned to sector size

Merge tag '6.4-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client fixes from Steve French:

- fix for copy_file_range bug for very large files that are multiples
   of rsize

- do not ignore "isolated transport" flag if set on share

- set rasize default better

- three fixes related to shutdown and freezing (fixes 4 xfstests, and
   closes deferred handles faster in some places that were missed)

* tag '6.4-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: release leases for deferred close handles when freezing
  smb3: fix problem remounting a share after shutdown
  SMB3: force unmount was failing to close deferred close files
  smb3: improve parallel reads of large files
  do not reuse connection if share marked as isolated
  cifs: fix pcchunk length type in smb2_copychunk_range

Merge tag 'vfs/v6.4-rc1/pipe' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fix from Christian Brauner:
"During the pipe nonblock rework the check for both O_NONBLOCK and
  IOCB_NOWAIT was dropped. Both checks need to be performed to ensure
  that files without O_NONBLOCK but IOCB_NOWAIT don't block when writing
  to or reading from a pipe.

  This just contains the fix adding the check for IOCB_NOWAIT back in"

* tag 'vfs/v6.4-rc1/pipe' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  pipe: check for IOCB_NOWAIT alongside O_NONBLOCK

Merge tag 'io_uring-6.4-2023-05-12' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
"Just a single fix making io_uring_sqe_cmd() available regardless of
  CONFIG_IO_URING, fixing a regression introduced during the merge
  window if nvme was selected but io_uring was not"

* tag 'io_uring-6.4-2023-05-12' of git://git.kernel.dk/linux:
  io_uring: make io_uring_sqe_cmd() unconditionally available

Merge tag 'riscv-for-linus-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fix from Palmer Dabbelt:
"Just a single fix this week for a build issue. That'd usually be a
  good sign, but we've started to get some reports of boot failures on
  some hardware/bootloader configurations. Nothing concrete yet, but
  I've got a funny feeling that's where much of the bug hunting is going
  right now.

  Nothing's reproducing on my end, though, and this fixes some pretty
  concrete issues so I figured there's no reason to delay it:

   - a fix to the linker script to avoid orpahaned sections in
     kernel/pi"

* tag 'riscv-for-linus-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix orphan section warnings caused by kernel/pi

Documentation/block: drop the request.rst file

Documentation/block/request.rst is outdated and should be removed.
Also delete its entry in the block/index.rst file.

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected]
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

pipe: check for IOCB_NOWAIT alongside O_NONBLOCK

Pipe reads or writes need to enable nonblocking attempts, if either
O_NONBLOCK is set on the file, or IOCB_NOWAIT is set in the iocb being
passed in. The latter isn't currently true, ensure we check for both
before waiting on data or space.

Fixes: afed6271f5b0 ("pipe: set FMODE_NOWAIT on pipes")
Signed-off-by: Jens Axboe <[email protected]>
Message-Id: <e5946d67-4e5e-b056-ba80-656bab12d9f6@kernel.dk>
Signed-off-by: Christian Brauner <[email protected]>

ublk: fix command op code check

In case of CONFIG_BLKDEV_UBLK_LEGACY_OPCODES, type of cmd opcode could
be 0 or 'u'; and type can only be 'u' if CONFIG_BLKDEV_UBLK_LEGACY_OPCODES
isn't set.

So fix the wrong check.

Fixes: 2d786e66c966 ("block: ublk: switch to ioctl command encoding")
Signed-off-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

block/rnbd: replace REQ_OP_FLUSH with REQ_OP_WRITE

Since flush bios are implemented as writes with no data and
the preflush flag per Christoph's comment [1].

And we need to change it in rnbd accordingly. Otherwise, I
got splatting when create fs from rnbd client.

[  464.028545] ------------[ cut here ]------------
[  464.028553] WARNING: CPU: 0 PID: 65 at block/blk-core.c:751 submit_bio_noacct+0x32c/0x5d0
[ ... ]
[  464.028668] CPU: 0 PID: 65 Comm: kworker/0:1H Tainted: G           OE      6.4.0-rc1 #9
[  464.028671] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[  464.028673] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[  464.028717] RIP: 0010:submit_bio_noacct+0x32c/0x5d0
[  464.028720] Code: 03 0f 85 51 fe ff ff 48 8b 43 18 8b 88 04 03 00 00 85 c9 0f 85 3f fe ff ff e9 be fd ff ff 0f b6 d0 3c 0d 74 26 83 fa 01 74 21 <0f> 0b b8 0a 00 00 00 e9 56 fd ff ff 4c 89 e7 e8 70 a1 03 00 84 c0
[  464.028722] RSP: 0018:ffffaf3680b57c68 EFLAGS: 00010202
[  464.028724] RAX: 0000000000060802 RBX: ffffa09dcc18bf00 RCX: 0000000000000000
[  464.028726] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffffa09dde081d00
[  464.028727] RBP: ffffaf3680b57c98 R08: ffffa09dde081d00 R09: ffffa09e38327200
[  464.028729] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa09dde081d00
[  464.028730] R13: ffffa09dcb06e1e8 R14: 0000000000000000 R15: 0000000000200000
[  464.028733] FS:  0000000000000000(0000) GS:ffffa09e3bc00000(0000) knlGS:0000000000000000
[  464.028735] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  464.028736] CR2: 000055a4e8206c40 CR3: 0000000119f06000 CR4: 00000000003506f0
[  464.028738] Call Trace:
[  464.028740]  <TASK>
[  464.028746]  submit_bio+0x1b/0x80
[  464.028748]  rnbd_srv_rdma_ev+0x50d/0x10c0 [rnbd_server]
[  464.028754]  ? percpu_ref_get_many.constprop.0+0x55/0x140 [rtrs_server]
[  464.028760]  ? __this_cpu_preempt_check+0x13/0x20
[  464.028769]  process_io_req+0x1dc/0x450 [rtrs_server]
[  464.028775]  rtrs_srv_inv_rkey_done+0x67/0xb0 [rtrs_server]
[  464.028780]  __ib_process_cq+0xbc/0x1f0 [ib_core]
[  464.028793]  ib_cq_poll_work+0x2b/0xa0 [ib_core]
[  464.028804]  process_one_work+0x2a9/0x580

[1]. https://lore.kernel.org/all/[email protected]/

Signed-off-by: Guoqing Jiang <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

nbd: Fix debugfs_create_dir error checking

The debugfs_create_dir function returns ERR_PTR in case of error, and the
only correct way to check if an error occurred is 'IS_ERR' inline function.
This patch will replace the null-comparison with IS_ERR.

Signed-off-by: Ivan Orlov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'firewire-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:

- fix early release of request packet

* tag 'firewire-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: net: fix unexpected release of object for asynchronous request packet

fbdev: stifb: Fix info entry in sti_struct on error path

Minor fix to reset the info field to NULL in case of error.

Signed-off-by: Helge Deller <[email protected]>

fbdev: modedb: Add 1920x1080 at 60 Hz video mode

Add typical resolution for Full-HD monitors.

Signed-off-by: Helge Deller <[email protected]>

Merge tag 'drm-fixes-2023-05-12' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"About the usual for this stage, bunch of amdgpu, a few i915 and a
  scattering of fixes across the board"

  dsc:
   - macro fixes

  simplefb:
   - fix VESA format

  scheduler:
   - timeout handling fix

  fbdev:
   - avoid potential out-of-bounds access in generic fbdev emulation

  ast:
   - improve AST2500+ compat on ARM

  mipi-dsi:
   - small mipi-dsi fix

  amdgpu:
   - VCN3 fixes
   - APUs always support PCI atomics
   - legacy power management fixes
   - DCN 3.1.4 fix
   - DCFCLK fix
   - fix several RAS irq refcount mismatches
   - GPU Reset fix
   - GFX 11.0.4 fix

  i915:
   - taint kernel when force_probe is used
   - NULL deref and div-by-zero fixes for display
   - GuC error capture fix for Xe devices"

* tag 'drm-fixes-2023-05-12' of git://anongit.freedesktop.org/drm/drm: (24 commits)
  drm/amdgpu: change gfx 11.0.4 external_id range
  drm/amdgpu/jpeg: Remove harvest checking for JPEG3
  drm/amdgpu/gfx: disable gfx9 cp_ecc_error_irq only when enabling legacy gfx ras
  drm/amd/pm: avoid potential UBSAN issue on legacy asics
  drm/i915: taint kernel when force probing unsupported devices
  drm/i915/dp: prevent potential div-by-zero
  drm/i915: Fix NULL ptr deref by checking new_crtc_state
  drm/i915/guc: Don't capture Gen8 regs on Xe devices
  drm/amdgpu: disable sdma ecc irq only when sdma RAS is enabled in suspend
  drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)
  drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcs
  drm/amd/display: Enforce 60us prefetch for 200Mhz DCFCLK modes
  drm/amd/display: Add symclk workaround during disable link output
  drm/amd/pm: parse pp_handle under appropriate conditions
  drm/amdgpu: set gfx9 onwards APU atomics support to be true
  drm/amdgpu/nv: update VCN 3 max HEVC encoding resolution
  drm/sched: Check scheduler work queue before calling timeout handling
  drm/mipi-dsi: Set the fwnode for mipi_dsi_device
  drm/nouveau/disp: More DP_RECEIVER_CAP_SIZE array fixes
  drm/dsc: fix DP_DSC_MAX_BPP_DELTA_* macro values
  ...

Merge tag 'xfs-6.4-rc1-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs bug fixes from Dave Chinner:
"Largely minor bug fixes and cleanups, th emost important of which are
  probably the fixes for regressions in the extent allocation code:

   - fixes for inode garbage collection shutdown racing with work queue
     updates

   - ensure inodegc workers run on the CPU they are supposed to

   - disable counter scrubbing until we can exclusively freeze the
     filesystem from the kernel

   - regression fixes for new allocation related bugs

   - a couple of minor cleanups"

* tag 'xfs-6.4-rc1-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix xfs_inodegc_stop racing with mod_delayed_work
  xfs: disable reaping in fscounters scrub
  xfs: check that per-cpu inodegc workers actually run on that cpu
  xfs: explicitly specify cpu when forcing inodegc delayed work to run immediately
  xfs: fix negative array access in xfs_getbmap
  xfs: don't allocate into the data fork for an unshare request
  xfs: flush dirty data and drain directios before scrubbing cow fork
  xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
  xfs: don't unconditionally null args->pag in xfs_bmap_btalloc_at_eof

fbdev: imsttfb: Fix use after free bug in imsttfb_probe

A use-after-free bug may occur if init_imstt invokes framebuffer_release
and free the info ptr. The caller, imsttfb_probe didn't notice that and
still keep the ptr as private data in pdev.

If we remove the driver which will call imsttfb_remove to make cleanup,
UAF happens.

Fix it by return error code if bad case happens in init_imstt.

Signed-off-by: Zheng Wang <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

Merge tag 'amd-drm-fixes-6.4-2023-05-11' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amdgpu:
- VCN3 fixes
- APUs always support PCI atomics
- Legacy power management fixes
- DCN 3.1.4 fix
- DCFCLK fix
- Fix several RAS irq refcount mismatches
- GPU Reset fix
- GFX 11.0.4 fix

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-intel-fixes-2023-05-11-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Fix to taint kernel when force_probe is used
- Null deref and div-by-zero fixes for display
- GuC error capture fix for Xe devices

Signed-off-by: Dave Airlie <[email protected]>
From: Joonas Lahtinen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-misc-fixes-2023-05-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for v6.4-rc2:
- More DSC macro fixes.
- Small mipi-dsi fix.
- Scheduler timeout handling fix.

---

drm-misc-fixes for v6.4-rc1:
- Fix DSC macros.
- Fix VESA format for simplefb.
- Prohibit potential out-of-bounds access in generic fbdev emulation.
- Improve AST2500+ compat on ARM.

Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'dt-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt

Pull devicetree binding fixes from Krzysztof Kozlowski:
"A few fixes for Devicetree bindings and related docs, all for issues
  introduced in v6.4-rc1 commits:

   - media/ov2685: fix number of possible data lanes, as old binding
     explicitly mentioned one data lane. This fixes dt_binding_check
     warnings like:

       Documentation/devicetree/bindings/media/rockchip-isp1.example.dtb: camera@3c: port:endpoint:data-lanes: [[1]] is too short
       From schema: Documentation/devicetree/bindings/media/i2c/ovti,ov2685.yaml

   - PCI/fsl,imx6q: correct parsing of assigned-clocks and related
     properties and make the clocks more specific per PCI device (host
     or endpoint). This fixes dtschema limitation and dt_binding_check
     warnings like:

       Documentation/devicetree/bindings/pci/fsl,imx6q-pcie-ep.example.dtb: pcie-ep@33800000: Unevaluated properties are not allowed
       From schema: Documentation/devicetree/bindings/pci/fsl,imx6q-pcie-ep.yaml

   - Maintainers: correct path of Apple PWM binding. This fixes
     refcheckdocs warning"

* tag 'dt-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt:
  dt-bindings: PCI: fsl,imx6q: fix assigned-clocks warning
  MAINTAINERS: adjust file entry for ARM/APPLE MACHINE SUPPORT
  media: dt-bindings: ov2685: Correct data-lanes attribute

Merge tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.

  Current release - regressions:

   - mtk_eth_soc: fix NULL pointer dereference

  Previous releases - regressions:

   - core:
      - skb_partial_csum_set() fix against transport header magic value
      - fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
      - annotate sk->sk_err write from do_recvmmsg()
      - add vlan_get_protocol_and_depth() helper

   - netlink: annotate accesses to nlk->cb_running

   - netfilter: always release netdev hooks from notifier

  Previous releases - always broken:

   - core: deal with most data-races in sk_wait_event()

   - netfilter: fix possible bug_on with enable_hooks=1

   - eth: bonding: fix send_peer_notif overflow

   - eth: xpcs: fix incorrect number of interfaces

   - eth: ipvlan: fix out-of-bounds caused by unclear skb->cb

   - eth: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register"

* tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
  af_unix: Fix data races around sk->sk_shutdown.
  af_unix: Fix a data race of sk->sk_receive_queue->qlen.
  net: datagram: fix data-races in datagram_poll()
  net: mscc: ocelot: fix stat counter register values
  ipvlan:Fix out-of-bounds caused by unclear skb->cb
  docs: networking: fix x25-iface.rst heading & index order
  gve: Remove the code of clearing PBA bit
  tcp: add annotations around sk->sk_shutdown accesses
  net: add vlan_get_protocol_and_depth() helper
  net: pcs: xpcs: fix incorrect number of interfaces
  net: deal with most data-races in sk_wait_event()
  net: annotate sk->sk_err write from do_recvmmsg()
  netlink: annotate accesses to nlk->cb_running
  kselftest: bonding: add num_grat_arp test
  selftests: forwarding: lib: add netns support for tc rule handle stats get
  Documentation: bonding: fix the doc of peer_notif_delay
  bonding: fix send_peer_notif overflow
  net: ethernet: mtk_eth_soc: fix NULL pointer dereference
  selftests: nft_flowtable.sh: check ingress/egress chain too
  selftests: nft_flowtable.sh: monitor result file sizes
  ...

Merge tag 'media/v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

- fix some unused-variable warning in mtk-mdp3

- ignore unused suspend operations in nxp

- some driver fixes in rcar-vin

* tag 'media/v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: platform: mtk-mdp3: work around unused-variable warning
  media: nxp: ignore unused suspend operations
  media: rcar-vin: Select correct interrupt mode for V4L2_FIELD_ALTERNATE
  media: rcar-vin: Fix NV12 size alignment
  media: rcar-vin: Gen3 can not scale NV12

fbdev: vfb: Remove trailing whitespaces

Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann <[email protected]>
Acked-by: Helge Deller <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: valkyriefb: Remove trailing whitespaces

Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann <[email protected]>
Acked-by: Helge Deller <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: stifb: Remove trailing whitespaces

Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann <[email protected]>
Acked-by: Helge Deller <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: sa1100fb: Remove trailing whitespaces

Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: platinumfb: Remove trailing whitespaces

Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann <[email protected]>
Acked-by: Helge Deller <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: p9100: Remove trailing whitespaces

Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: maxinefb: Remove trailing whitespaces

Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: macfb: Remove trailing whitespaces

Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann <[email protected]>
Acked-by: Helge Deller <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: hpfb: Remove trailing whitespaces

Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

fbdev: hgafb: Remove trailing whitespaces

Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann <[email protected]>
Signed-off-by: Helge Deller <[email protected]>