Git Repo - linux.git/log

net: add missing READ_ONCE(sk->sk_rcvlowat) annotation

In a prior commit, I forgot to change sk_getsockopt()
when reading sk->sk_rcvlowat locklessly.

Fixes: eac66402d1c3 ("net: annotate sk->sk_rcvlowat lockless reads")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: annotate data-races around sk->sk_max_pacing_rate

sk_getsockopt() runs locklessly. This means sk->sk_max_pacing_rate
can be read while other threads are changing its value.

Fixes: 62748f32d501 ("net: introduce SO_MAX_PACING_RATE")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: annotate data-race around sk->sk_txrehash

sk_getsockopt() runs locklessly. This means sk->sk_txrehash
can be read while other threads are changing its value.

Other locations were handled in commit cb6cd2cec799
("tcp: Change SYN ACK retransmit behaviour to account for rehash")

Fixes: 26859240e4ee ("txhash: Add socket option to control TX hash rethink behavior")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Akhmat Karakotov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: annotate data-races around sk->sk_reserved_mem

sk_getsockopt() runs locklessly. This means sk->sk_reserved_mem
can be read while other threads are changing its value.

Add missing annotations where they are needed.

Fixes: 2bb2f5fb21b0 ("net: add new socket option SO_RESERVE_MEM")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Wei Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: gro: fix misuse of CB in udp socket lookup

This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to
`udp6_lib_lookup2` when handling udp tunnels. `udp6_lib_lookup2` fetch the
device from CB. The fix changes it to fetch the device from `skb->dev`.
l3mdev case requires special attention since it has a master and a slave
device.

Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket")
Reported-by: Gal Pressman <[email protected]>
Signed-off-by: Richard Gobert <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: Fix scheduling in a tasklet while getting stats

Here we've got to a situation when tasklet called usleep_range() in PTT
acquire logic, thus welcome to the "scheduling while atomic" BUG().

  BUG: scheduling while atomic: swapper/24/0/0x00000100

   [<ffffffffb41c6199>] schedule+0x29/0x70
   [<ffffffffb41c5512>] schedule_hrtimeout_range_clock+0xb2/0x150
   [<ffffffffb41c55c3>] schedule_hrtimeout_range+0x13/0x20
   [<ffffffffb41c3bcf>] usleep_range+0x4f/0x70
   [<ffffffffc08d3e58>] qed_ptt_acquire+0x38/0x100 [qed]
   [<ffffffffc08eac48>] _qed_get_vport_stats+0x458/0x580 [qed]
   [<ffffffffc08ead8c>] qed_get_vport_stats+0x1c/0xd0 [qed]
   [<ffffffffc08dffd3>] qed_get_protocol_stats+0x93/0x100 [qed]
                        qed_mcp_send_protocol_stats
            case MFW_DRV_MSG_GET_LAN_STATS:
            case MFW_DRV_MSG_GET_FCOE_STATS:
            case MFW_DRV_MSG_GET_ISCSI_STATS:
            case MFW_DRV_MSG_GET_RDMA_STATS:
   [<ffffffffc08e36d8>] qed_mcp_handle_events+0x2d8/0x890 [qed]
                        qed_int_assertion
                        qed_int_attentions
   [<ffffffffc08d9490>] qed_int_sp_dpc+0xa50/0xdc0 [qed]
   [<ffffffffb3aa7623>] tasklet_action+0x83/0x140
   [<ffffffffb41d9125>] __do_softirq+0x125/0x2bb
   [<ffffffffb41d560c>] call_softirq+0x1c/0x30
   [<ffffffffb3a30645>] do_softirq+0x65/0xa0
   [<ffffffffb3aa78d5>] irq_exit+0x105/0x110
   [<ffffffffb41d8996>] do_IRQ+0x56/0xf0

Fix this by making caller to provide the context whether it could be in
atomic context flow or not when getting stats from QED driver.
QED driver based on the context provided decide to schedule out or not
when acquiring the PTT BAR window.

We faced the BUG_ON() while getting vport stats, but according to the
code same issue could happen for fcoe and iscsi statistics as well, so
fixing them too.

Fixes: 6c75424612a7 ("qed: Add support for NCSI statistics.")
Fixes: 1e128c81290a ("qed: Add support for hardware offloaded FCoE.")
Fixes: 2f2b2614e893 ("qed: Provide iSCSI statistics to management")
Cc: Sudarsana Kalluru <[email protected]>
Cc: David Miller <[email protected]>
Cc: Manish Chopra <[email protected]>
Signed-off-by: Konstantin Khorenko <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: microchip: KSZ9477 register regmap alignment to 32 bit boundaries

The commit (SHA1: 5c844d57aa7894154e49cf2fc648bfe2f1aefc1c) provided code
to apply "Module 6: Certain PHY registers must be written as pairs instead
of singly" errata for KSZ9477 as this chip for certain PHY registers
(0xN120 to 0xN13F, N=1,2,3,4,5) must be accesses as 32 bit words instead
of 16 or 8 bit access.
Otherwise, adjacent registers (no matter if reserved or not) are
overwritten with 0x0.

Without this patch some registers (e.g. 0x113c or 0x1134) required for 32
bit access are out of valid regmap ranges.

As a result, following error is observed and KSZ9477 is not properly
configured:

ksz-switch spi1.0: can't rmw 32bit reg 0x113c: -EIO
ksz-switch spi1.0: can't rmw 32bit reg 0x1134: -EIO
ksz-switch spi1.0 lan1 (uninitialized): failed to connect to PHY: -EIO
ksz-switch spi1.0 lan1 (uninitialized): error -5 setting up PHY for tree 0, switch 0, port 0

The solution is to modify regmap_reg_range to allow accesses with 4 bytes
boundaries.

Signed-off-by: Lukasz Majewski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: tegra: Properly allocate clock bulk data

The clock data is an array of struct clk_bulk_data, so make sure to
allocate enough memory.

Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support")
Signed-off-by: Thierry Reding <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'loongarch-fixes-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Some bug fixes for build system, builtin cmdline handling, bpf and
  {copy, clear}_user, together with a trivial cleanup"

* tag 'loongarch-fixes-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: Cleanup __builtin_constant_p() checking for cpu_has_*
  LoongArch: BPF: Fix check condition to call lu32id in move_imm()
  LoongArch: BPF: Enable bpf_probe_read{, str}() on LoongArch
  LoongArch: Fix return value underflow in exception path
  LoongArch: Fix CMDLINE_EXTEND and CMDLINE_BOOTLOADER handling
  LoongArch: Fix module relocation error with binutils 2.41
  LoongArch: Only fiddle with CHECKFLAGS if `need-compiler'

KVM: selftests: Expand x86's sregs test to cover illegal CR0 values

Add coverage to x86's set_sregs_test to verify KVM rejects vendor-agnostic
illegal CR0 values, i.e. CR0 values whose legality doesn't depend on the
current VMX mode. KVM historically has neglected to reject bad CR0s from
userspace, i.e. would happily accept a completely bogus CR0 via
KVM_SET_SREGS{2}.

Punt VMX specific subtests to future work, as they would require quite a
bit more effort, and KVM gets coverage for CR0 checks in general through
other means, e.g. KVM-Unit-Tests.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230613203037.1968489 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guest

Stuff CR0 and/or CR4 to be compliant with a restricted guest if and only
if KVM itself is not configured to utilize unrestricted guests, i.e. don't
stuff CR0/CR4 for a restricted L2 that is running as the guest of an
unrestricted L1. Any attempt to VM-Enter a restricted guest with invalid
CR0/CR4 values should fail, i.e. in a nested scenario, KVM (as L0) should
never observe a restricted L2 with incompatible CR0/CR4, since nested
VM-Enter from L1 should have failed.

And if KVM does observe an active, restricted L2 with incompatible state,
e.g. due to a KVM bug, fudging CR0/CR4 instead of letting VM-Enter fail
does more harm than good, as KVM will often neglect to undo the side
effects, e.g. won't clear rmode.vm86_active on nested VM-Exit, and thus
the damage can easily spill over to L1. On the other hand, letting
VM-Enter fail due to bad guest state is more likely to contain the damage
to L2 as KVM relies on hardware to perform most guest state consistency
checks, i.e. KVM needs to be able to reflect a failed nested VM-Enter into
L1 irrespective of (un)restricted guest behavior.

Cc: Jim Mattson <[email protected]>
Cc: [email protected]
Fixes: bddd82d19e2e ("KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it")
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230613203037.1968489 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid

Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid,
e.g. due to setting bits 63:32, illegal combinations, or to a value that
isn't allowed in VMX (non-)root mode. The VMX checks in particular are
"fun" as failure to disallow Real Mode for an L2 that is configured with
unrestricted guest disabled, when KVM itself has unrestricted guest
enabled, will result in KVM forcing VM86 mode to virtual Real Mode for
L2, but then fail to unwind the related metadata when synthesizing a
nested VM-Exit back to L1 (which has unrestricted guest enabled).

Opportunistically fix a benign typo in the prototype for is_valid_cr4().

Cc: [email protected]
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230613203037.1968489 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Revert "debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE() usage"

Remove coccinelle's recommendation to use DEFINE_DEBUGFS_ATTRIBUTE()
instead of DEFINE_SIMPLE_ATTRIBUTE(). Regardless of whether or not the
"significant overhead" incurred by debugfs_create_file() is actually
meaningful, warnings from the script have led to a rash of low-quality
patches that have sowed confusion and consumed maintainer time for little
to no benefit. There have been no less than four attempts to "fix" KVM,
and a quick search on lore shows that KVM is not alone.

This reverts commit 5103068eaca290f890a30aae70085fac44cecaf6.

Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/[email protected]
Link: https://lkml.kernel.org/r/20230706072954.4881-1-duminjie%40vivo.com
Link: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/all/Y2ENJJ1YiSg5oHiy@orome
Link: https://lore.kernel.org/all/[email protected]
Suggested-by: Paolo Bonzini <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230726202920 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Verify stats fd is usable after VM fd has been closed

Verify that VM and vCPU binary stats files are usable even after userspace
has put its last direct reference to the VM. This is a regression test
for a UAF bug where KVM didn't gift the stats files a reference to the VM.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230711230131 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Verify stats fd can be dup()'d and read

Expand the binary stats test to verify that a stats fd can be dup()'d
and read, to (very) roughly simulate userspace passing around the file.
Adding the dup() test is primarily an intermediate step towards verifying
that userspace can read VM/vCPU stats before _and_ after userspace closes
its copy of the VM fd; the dup() test itself is only mildly interesting.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230711230131 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Verify userspace can create "redundant" binary stats files

Verify that KVM doesn't artificially limit KVM_GET_STATS_FD to a single
file per VM/vCPU. There's no known use case for getting multiple stats
fds, but it should work, and more importantly creating multiple files will
make it easier to test that KVM correct manages VM refcounts for stats
files.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230711230131 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Explicitly free vcpus array in binary stats test

Explicitly free the all-encompassing vcpus array in the binary stats test
so that the test is consistent with respect to freeing all dynamically
allocated resources (versus letting them be freed on exit).

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230711230131 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Clean up stats fd in common stats_test() helper

Move the stats fd cleanup code into stats_test() and drop the
superfluous vm_stats_test() and vcpu_stats_test() helpers in order to
decouple creation of the stats file from consuming/testing the file
(deduping code is a bonus). This will make it easier to test various
edge cases related to stats, e.g. that userspace can dup() a stats fd,
that userspace can have multiple stats files for a singleVM/vCPU, etc.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230711230131 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: selftests: Use pread() to read binary stats header

Use pread() with an explicit offset when reading the header and the header
name for a binary stats fd so that the common helper and the binary stats
test don't subtly rely on the file effectively being untouched, e.g. to
allow multiple reads of the header, name, etc.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230711230131 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Grab a reference to KVM for VM and vCPU stats file descriptors

Grab a reference to KVM prior to installing VM and vCPU stats file
descriptors to ensure the underlying VM and vCPU objects are not freed
until the last reference to any and all stats fds are dropped.

Note, the stats paths manually invoke fd_install() and so don't need to
grab a reference before creating the file.

Fixes: ce55c049459c ("KVM: stats: Support binary stats retrieval for a VCPU")
Fixes: fcfe1baeddbf ("KVM: stats: Support binary stats retrieval for a VM")
Reported-by: Zheng Zhang <[email protected]>
Closes: https://lore.kernel.org/all/CAC_GQSr3xzZaeZt85k_RCBd5kfiOve8qXo7a81Cq53LuVQ5r=Q@mail.gmail.com
Cc: [email protected]
Cc: Kees Cook <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Message-Id: <20230711230131 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

selftests/rseq: Play nice with binaries statically linked against glibc 2.35+

To allow running rseq and KVM's rseq selftests as statically linked
binaries, initialize the various "trampoline" pointers to point directly
at the expect glibc symbols, and skip the dlysm() lookups if the rseq
size is non-zero, i.e. the binary is statically linked *and* the libc
registered its own rseq.

Define weak versions of the symbols so as not to break linking against
libc versions that don't support rseq in any capacity.

The KVM selftests in particular are often statically linked so that they
can be run on targets with very limited runtime environments, i.e. test
machines.

Fixes: 233e667e1ae3 ("selftests/rseq: Uplift rseq selftests for compatibility with glibc-2.35")
Cc: Aaron Lewis <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230721223352.2333911 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"

Now that handle_fastpath_set_msr_irqoff() acquires kvm->srcu, i.e. allows
dereferencing memslots during WRMSR emulation, drop the requirement that
"next RIP" is valid. In hindsight, acquiring kvm->srcu would have been a
better fix than avoiding the pastpath, but at the time it was thought that
accessing SRCU-protected data in the fastpath was a one-off edge case.

This reverts commit 5c30e8101e8d5d020b1d7119117889756a6ed713.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230721224337.2335137 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Acquire SRCU read lock when handling fastpath MSR writes

Temporarily acquire kvm->srcu for read when potentially emulating WRMSR in
the VM-Exit fastpath handler, as several of the common helpers used during
emulation expect the caller to provide SRCU protection.  E.g. if the guest
is counting instructions retired, KVM will query the PMU event filter when
stepping over the WRMSR.

  dump_stack+0x85/0xdf
  lockdep_rcu_suspicious+0x109/0x120
  pmc_event_is_allowed+0x165/0x170
  kvm_pmu_trigger_event+0xa5/0x190
  handle_fastpath_set_msr_irqoff+0xca/0x1e0
  svm_vcpu_run+0x5c3/0x7b0 [kvm_amd]
  vcpu_enter_guest+0x2108/0x2580

Alternatively, check_pmu_event_filter() could acquire kvm->srcu, but this
isn't the first bug of this nature, e.g. see commit 5c30e8101e8d ("KVM:
SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid").  Providing
protection for the entirety of WRMSR emulation will allow reverting the
aforementioned commit, and will avoid having to play whack-a-mole when new
uses of SRCU-protected structures are inevitably added in common emulation
helpers.

Fixes: dfdeda67ea2d ("KVM: x86/pmu: Prevent the PMU from counting disallowed events")
Reported-by: Greg Thelen <[email protected]>
Reported-by: Aaron Lewis <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230721224337.2335137 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: VMX: Use vmread_error() to report VM-Fail in "goto" path

Use vmread_error() to report VM-Fail on VMREAD for the "asm goto" case,
now that trampoline case has yet another wrapper around vmread_error() to
play nice with instrumentation.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230721235637.2345403 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: VMX: Make VMREAD error path play nice with noinstr

Mark vmread_error_trampoline() as noinstr, and add a second trampoline
for the CONFIG_CC_HAS_ASM_GOTO_OUTPUT=n case to enable instrumentation
when handling VM-Fail on VMREAD.  VMREAD is used in various noinstr
flows, e.g. immediately after VM-Exit, and objtool rightly complains that
the call to the error trampoline leaves a no-instrumentation section
without annotating that it's safe to do so.

  vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0xc9:
  call to vmread_error_trampoline() leaves .noinstr.text section

Note, strictly speaking, enabling instrumentation in the VM-Fail path
isn't exactly safe, but if VMREAD fails the kernel/system is likely hosed
anyways, and logging that there is a fatal error is more important than
*maybe* encountering slightly unsafe instrumentation.

Reported-by: Su Hui <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20230721235637.2345403 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86/irq: Conditionally register IRQ bypass consumer again

As was attempted commit 14717e203186 ("kvm: Conditionally register IRQ
bypass consumer"): "if we don't support a mechanism for bypassing IRQs,
don't register as a consumer. Initially this applied to AMD processors,
but when AVIC support was implemented for assigned devices,
kvm_arch_has_irq_bypass() was always returning true.

We can still skip registering the consumer where enable_apicv
or posted-interrupts capability is unsupported or globally disabled.
This eliminates meaningless dev_info()s when the connect fails
between producer and consumer", such as on Linux hosts where enable_apicv
or posted-interrupts capability is unsupported or globally disabled.

Cc: Alex Williamson <[email protected]>
Reported-by: Yong He <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217379
Signed-off-by: Like Xu <[email protected]>
Message-Id: <20230724111236 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: X86: Use GFP_KERNEL_ACCOUNT for pid_table in ipiv

The pid_table of ipiv is the persistent memory allocated by
per-vcpu, which should be counted into the memory cgroup.

Signed-off-by: Peng Hao <[email protected]>
Message-Id: <CAPm50aLxCQ3TQP2Lhc0PX3y00iTRg+mniLBqNDOC=t9CLxMwwA@mail.gmail.com>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: check the kvm_cpu_get_interrupt result before using it

The code was blindly assuming that kvm_cpu_get_interrupt never returns -1
when there is a pending interrupt.

While this should be true, a bug in KVM can still cause this.

If -1 is returned, the code before this patch was converting it to 0xFF,
and 0xFF interrupt was injected to the guest, which results in an issue
which was hard to debug.

Add WARN_ON_ONCE to catch this case and skip the injection
if this happens again.

Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <20230726135945 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: VMX: set irr_pending in kvm_apic_update_irr

When the APICv is inhibited, the irr_pending optimization is used.

Therefore, when kvm_apic_update_irr sets bits in the IRR,
it must set irr_pending to true as well.

Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <20230726135945 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: VMX: __kvm_apic_update_irr must update the IRR atomically

If APICv is inhibited, then IPIs from peer vCPUs are done by
atomically setting bits in IRR.

This means, that when __kvm_apic_update_irr copies PIR to IRR,
it has to modify IRR atomically as well.

Signed-off-by: Maxim Levitsky <[email protected]>
Message-Id: <20230726135945 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kprobes: Prohibit probing on CFI preamble symbol

Do not allow to probe on "__cfi_" or "__pfx_" started symbol, because those
are used for CFI and not executed. Probing it will break the CFI.

Link: https://lore.kernel.org/all/168904024679.116016.18089228029322008512.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>

x86/srso: Add a forgotten NOENDBR annotation

Fix:

vmlinux.o: warning: objtool: .export_symbol+0x29e40: data relocation to !ENDBR: srso_untrain_ret_alias+0x0

Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>

KVM: s390: fix sthyi error handling

Commit 9fb6c9b3fea1 ("s390/sthyi: add cache to store hypervisor info")
added cache handling for store hypervisor info. This also changed the
possible return code for sthyi_fill().

Instead of only returning a condition code like the sthyi instruction would
do, it can now also return a negative error value (-ENOMEM). handle_styhi()
was not changed accordingly. In case of an error, the negative error value
would incorrectly injected into the guest PSW.

Add proper error handling to prevent this, and update the comment which
describes the possible return values of sthyi_fill().

Fixes: 9fb6c9b3fea1 ("s390/sthyi: add cache to store hypervisor info")
Reviewed-by: Christian Borntraeger <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Heiko Carstens <[email protected]>

x86/srso: Fix return thunks in generated code

Set X86_FEATURE_RETHUNK when enabling the SRSO mitigation so that
generated code (e.g., ftrace, static call, eBPF) generates "jmp
__x86_return_thunk" instead of RET.

[ bp: Add a comment. ]

Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>

mISDN: hfcpci: Fix potential deadlock on &hc->lock

As &hc->lock is acquired by both timer _hfcpci_softirq() and hardirq
hfcpci_int(), the timer should disable irq before lock acquisition
otherwise deadlock could happen if the timmer is preemtped by the hadr irq.

Possible deadlock scenario:
hfcpci_softirq() (timer)
    -> _hfcpci_softirq()
    -> spin_lock(&hc->lock);
        <irq interruption>
        -> hfcpci_int()
        -> spin_lock(&hc->lock); (deadlock here)

This flaw was found by an experimental static analysis tool I am developing
for irq-related deadlock.

The tentative patch fixes the potential deadlock by spin_lock_irq()
in timer.

Fixes: b36b654a7e82 ("mISDN: Create /sys/class/mISDN")
Signed-off-by: Chengfeng Ye <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'ata-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ata fixes from Damien Le Moal:

- Fix error message output in the pata_arasan_cf driver (Minjie)

- Fix invalid error return in the pata_octeon_cf driver initialization
   (Yingliang)

- Fix a compilation warning due to a missing static function
   declaration in the pata_ns87415 driver (Arnd)

- Fix the condition evaluating when to fetch sense data for successful
   completions, which should be done only when command duration limits
   are being used (Niklas)

* tag 'ata-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-core: fix when to fetch sense data for successful commands
  ata: pata_ns87415: mark ns87560_tf_read static
  ata: pata_octeon_cf: fix error return code in octeon_cf_probe()
  ata: pata_arasan_cf: Use dev_err_probe() instead dev_err() in data_xfer()

net: sched: cls_u32: Fix match key mis-addressing

A match entry is uniquely identified with an "address" or "path" in the
form of: hashtable ID(12b):bucketid(8b):nodeid(12b).

When creating table match entries all of hash table id, bucket id and
node (match entry id) are needed to be either specified by the user or
reasonable in-kernel defaults are used. The in-kernel default for a table id is
0x800(omnipresent root table); for bucketid it is 0x0. Prior to this fix there
was none for a nodeid i.e. the code assumed that the user passed the correct
nodeid and if the user passes a nodeid of 0 (as Mingi Cho did) then that is what
was used. But nodeid of 0 is reserved for identifying the table. This is not
a problem until we dump. The dump code notices that the nodeid is zero and
assumes it is referencing a table and therefore references table struct
tc_u_hnode instead of what was created i.e match entry struct tc_u_knode.

Ming does an equivalent of:
tc filter add dev dummy0 parent 10: prio 1 handle 0x1000 \
protocol ip u32 match ip src 10.0.0.1/32 classid 10:1 action ok

Essentially specifying a table id 0, bucketid 1 and nodeid of zero
Tableid 0 is remapped to the default of 0x800.
Bucketid 1 is ignored and defaults to 0x00.
Nodeid was assumed to be what Ming passed - 0x000

dumping before fix shows:
~$ tc filter ls dev dummy0 parent 10:
filter protocol ip pref 1 u32 chain 0
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor -30591

Note that the last line reports a table instead of a match entry
(you can tell this because it says "ht divisor...").
As a result of reporting the wrong data type (misinterpretting of struct
tc_u_knode as being struct tc_u_hnode) the divisor is reported with value
of -30591. Ming identified this as part of the heap address
(physmap_base is 0xffff8880 (-30591 - 1)).

The fix is to ensure that when table entry matches are added and no
nodeid is specified (i.e nodeid == 0) then we get the next available
nodeid from the table's pool.

After the fix, this is what the dump shows:
$ tc filter ls dev dummy0 parent 10:
filter protocol ip pref 1 u32 chain 0
filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 10:1 not_in_hw
match 0a000001/ffffffff at 12
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1

Reported-by: Mingi Cho <[email protected]>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tracing: Fix warning in trace_buffered_event_disable()

Warning happened in trace_buffered_event_disable() at
  WARN_ON_ONCE(!trace_buffered_event_ref)

  Call Trace:
   ? __warn+0xa5/0x1b0
   ? trace_buffered_event_disable+0x189/0x1b0
   __ftrace_event_enable_disable+0x19e/0x3e0
   free_probe_data+0x3b/0xa0
   unregister_ftrace_function_probe_func+0x6b8/0x800
   event_enable_func+0x2f0/0x3d0
   ftrace_process_regex.isra.0+0x12d/0x1b0
   ftrace_filter_write+0xe6/0x140
   vfs_write+0x1c9/0x6f0
   [...]

The cause of the warning is in __ftrace_event_enable_disable(),
trace_buffered_event_enable() was called once while
trace_buffered_event_disable() was called twice.
Reproduction script show as below, for analysis, see the comments:
```
#!/bin/bash

cd /sys/kernel/tracing/

# 1. Register a 'disable_event' command, then:
#    1) SOFT_DISABLED_BIT was set;
#    2) trace_buffered_event_enable() was called first time;
echo 'cmdline_proc_show:disable_event:initcall:initcall_finish' > \
     set_ftrace_filter

# 2. Enable the event registered, then:
#    1) SOFT_DISABLED_BIT was cleared;
#    2) trace_buffered_event_disable() was called first time;
echo 1 > events/initcall/initcall_finish/enable

# 3. Try to call into cmdline_proc_show(), then SOFT_DISABLED_BIT was
#    set again!!!
cat /proc/cmdline

# 4. Unregister the 'disable_event' command, then:
#    1) SOFT_DISABLED_BIT was cleared again;
#    2) trace_buffered_event_disable() was called second time!!!
echo '!cmdline_proc_show:disable_event:initcall:initcall_finish' > \
     set_ftrace_filter
```

To fix it, IIUC, we can change to call trace_buffered_event_enable() at
fist time soft-mode enabled, and call trace_buffered_event_disable() at
last time soft-mode disabled.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: <[email protected]>
Fixes: 0fc1b09ff1ff ("tracing: Use temp buffer when filtering events")
Signed-off-by: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

Merge tag 'mm-hotfixes-stable-2023-07-28-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
"11 hotfixes. Five are cc:stable and the remainder address post-6.4
  issues or aren't considered serious enough to justify backporting"

* tag 'mm-hotfixes-stable-2023-07-28-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/memory-failure: fix hardware poison check in unpoison_memory()
  proc/vmcore: fix signedness bug in read_from_oldmem()
  mailmap: update remaining active codeaurora.org email addresses
  mm: lock VMA in dup_anon_vma() before setting ->anon_vma
  mm: fix memory ordering for mm_lock_seq and vm_lock_seq
  scripts/spelling.txt: remove 'thead' as a typo
  mm/pagewalk: fix EFI_PGT_DUMP of espfix area
  shmem: minor fixes to splice-read implementation
  tmpfs: fix Documentation of noswap and huge mount options
  Revert "um: Use swap() to make code cleaner"
  mm/damon/core-test: initialise context before test in damon_test_set_attrs()

Merge tag 'thermal-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
"Constify thermal_zone_device_register() parameters, which was omitted
  by mistake, and fix a double free on thermal zone unregistration in
  the generic DT thermal driver (Ahmad Fatoum)"

* tag 'thermal-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: of: fix double-free on unregistration
  thermal: core: constify params in thermal_zone_device_register

Merge tag 'pm-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"Fix the arming of wakeup IRQs in the generic wakeup IRQ code
  (wakeirq), drop unused functions from it and fix up a driver using it
  and trying to work around the IRQ arming issue in a questionable way
  (Johan Hovold)"

* tag 'pm-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  serial: qcom-geni: drop bogus runtime pm state update
  PM: sleep: wakeirq: drop unused enable helpers
  PM: sleep: wakeirq: fix wake irq arming

Merge tag 'hwmon-for-v6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- k10temp: Display negative temperatures for industrial processors

- pmbus core: Fix deadlock, NULL pointer dereference, and chip enable
   detection

- nct7802: Do not display PECI1 temperature if disabled

- nct6775: Fix IN scaling factors and feature detection for
   NCT6798/6799

- oxp-sensors: Fix race condition during device attribute creation

- aquacomputer_d5next: Fix incorrect PWM value readout

* tag 'hwmon-for-v6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (k10temp) Enable AMD3255 Proc to show negative temperature
  hwmon: (pmbus_core) Fix Deadlock in pmbus_regulator_get_status
  hwmon: (pmbus_core) Fix NULL pointer dereference
  hwmon: (pmbus_core) Fix pmbus_is_enabled()
  hwmon: (nct7802) Fix for temp6 (PECI1) processed even if PECI1 disabled
  hwmon: (nct6775) Fix IN scaling factors for 6798/6799
  hwmon: (oxp-sensors) Move tt_toggle attribute to dev_groups
  hwmon: (aquacomputer_d5next) Fix incorrect PWM value readout
  hwmon: (nct6775) Fix register for nct6799

ftrace: Remove unused extern declarations

commit 6a9c981b1e96 ("ftrace: Remove unused function ftrace_arch_read_dyn_info()")
left ftrace_arch_read_dyn_info() extern declaration.
And commit 1d74f2a0f64b ("ftrace: remove ftrace_ip_converted()")
leave ftrace_ip_converted() declaration.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: <[email protected]>
Cc: <[email protected]>
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing: Fix kernel-doc warnings in trace_seq.c

Fix kernel-doc warning:

kernel/trace/trace_seq.c:142: warning: Function parameter or member
'args' not described in 'trace_seq_vprintf'

Link: https://lkml.kernel.org/r/[email protected]
Cc: <[email protected]>
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing: Fix kernel-doc warnings in trace_events_trigger.c

Fix kernel-doc warnings:

kernel/trace/trace_events_trigger.c:59: warning: Function parameter
or member 'buffer' not described in 'event_triggers_call'
kernel/trace/trace_events_trigger.c:59: warning: Function parameter
or member 'event' not described in 'event_triggers_call'

Link: https://lkml.kernel.org/r/[email protected]
Cc: <[email protected]>
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing/synthetic: Fix kernel-doc warnings in trace_events_synth.c

Fix kernel-doc warning:

kernel/trace/trace_events_synth.c:1257: warning: Function parameter
or member 'mod' not described in 'synth_event_gen_cmd_array_start'

Link: https://lkml.kernel.org/r/[email protected]
Cc: <[email protected]>
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

ring-buffer: Fix kernel-doc warnings in ring_buffer.c

Fix kernel-doc warnings:

kernel/trace/ring_buffer.c:954: warning: Function parameter or
member 'cpu' not described in 'ring_buffer_wake_waiters'
kernel/trace/ring_buffer.c:3383: warning: Excess function parameter
'event' description in 'ring_buffer_unlock_commit'
kernel/trace/ring_buffer.c:5359: warning: Excess function parameter
'cpu' description in 'ring_buffer_reset_online_cpus'

Link: https://lkml.kernel.org/r/[email protected]
Cc: <[email protected]>
Signed-off-by: Gaosheng Cui <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"Several smaller driver fixes and a core RDMA CM regression fix:

   - Fix improperly accepting flags from userspace in mlx4

   - Add missing DMA barriers for irdma

   - Fix two kcsan warnings in irdma

   - Report the correct CQ op code to userspace in irdma

   - Report the correct MW bind error code for irdma

   - Load the destination address in RDMA CM to resolve a recent
     regression

   - Fix a QP regression in mthca

   - Remove a race processing completions in bnxt_re resulting in a
     crash

   - Fix driver unloading races with interrupts and tasklets in bnxt_re

   - Fix missing error unwind in rxe"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/irdma: Report correct WC error
  RDMA/irdma: Fix op_type reporting in CQEs
  RDMA/rxe: Fix an error handling path in rxe_bind_mw()
  RDMA/bnxt_re: Fix hang during driver unload
  RDMA/bnxt_re: Prevent handling any completions after qp destroy
  RDMA/mthca: Fix crash when polling CQ for shared QPs
  RDMA/core: Update CMA destination address on rdma_resolve_addr
  RDMA/irdma: Fix data race on CQP request done
  RDMA/irdma: Fix data race on CQP completion stats
  RDMA/irdma: Add missing read barriers
  RDMA/mlx4: Make check for invalid flags stricter

Merge tag 'tpmdd-v6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm fixes from Jarkko Sakkinen:
"I picked up three small scale updates that I think would improve the
  quality of the release"

* tag 'tpmdd-v6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm_tis: Explicitly check for error code
  tpm: Switch i2c drivers back to use .probe()
  security: keys: perform capable check only on privileged operations

ring-buffer: Fix wrong stat of cpu_buffer->read

When pages are removed in rb_remove_pages(), 'cpu_buffer->read' is set
to 0 in order to make sure any read iterators reset themselves. However,
this will mess 'entries' stating, see following steps:

  # cd /sys/kernel/tracing/
  # 1. Enlarge ring buffer prepare for later reducing:
  # echo 20 > per_cpu/cpu0/buffer_size_kb
  # 2. Write a log into ring buffer of cpu0:
  # taskset -c 0 echo "hello1" > trace_marker
  # 3. Read the log:
  # cat per_cpu/cpu0/trace_pipe
       <...>-332     [000] .....    62.406844: tracing_mark_write: hello1
  # 4. Stop reading and see the stats, now 0 entries, and 1 event readed:
  # cat per_cpu/cpu0/stats
   entries: 0
   [...]
   read events: 1
  # 5. Reduce the ring buffer
  # echo 7 > per_cpu/cpu0/buffer_size_kb
  # 6. Now entries became unexpected 1 because actually no entries!!!
  # cat per_cpu/cpu0/stats
   entries: 1
   [...]
   read events: 0

To fix it, introduce 'page_removed' field to count total removed pages
since last reset, then use it to let read iterators reset themselves
instead of changing the 'read' pointer.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: <[email protected]>
Cc: <[email protected]>
Fixes: 83f40318dab0 ("ring-buffer: Make removal of ring buffer pages atomic")
Signed-off-by: Zheng Yejian <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

cxl/memdev: Only show sanitize sysfs files when supported

If the device does not support Sanitize or Secure Erase commands,
hide the respective sysfs interfaces such that the operation can
never be attempted.

In order to be generic, keep track of the enabled security commands
found in the CEL - the driver does not support Security Passthrough.

Signed-off-by: Davidlohr Bueso <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Dave Jiang <[email protected]>
Signed-off-by: Vishal Verma <[email protected]>

cxl/memdev: Document security state in kern-doc

... as is the case with all members of struct cxl_memdev_state.

Signed-off-by: Davidlohr Bueso <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Dave Jiang <[email protected]>
Signed-off-by: Vishal Verma <[email protected]>

cxl/memdev: Improve sanitize ABI descriptions

Be more detailed about the CPU cache management situation. The same
goes for both sanitize and secure erase.

Signed-off-by: Davidlohr Bueso <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Dave Jiang <[email protected]>
Signed-off-by: Vishal Verma <[email protected]>

perf test uprobe_from_different_cu: Skip if there is no gcc

Without gcc, the test will fail.

On cleanup, ignore probe removal errors. Otherwise, in case of an error
adding the probe, the temporary directory is not removed.

Fixes: 56cbeacf14353057 ("perf probe: Add test for regression introduced by switch to die_get_decl_file()")
Signed-off-by: Georg Müller <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Georg Müller <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/CAP-5=fUP6UuLgRty3t2=fQsQi3k4hDMz415vWdp1x88QMvZ8ug@mail.gmail.com/
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

- A couple of SME updates for recent fixes (one of which went to
   stable): reverting the flushing of the SME hardware state along with
   the thread flushing and making sure we have the correct vector length
   before reallocating.

- An ACPI/IORT fix to avoid skipping ID mappings whose "number of IDs"
   is 0 (the spec reports the number of IDs in the mapping range minus
   1).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  ACPI/IORT: Remove erroneous id_count check in iort_node_get_rmr_info()
  arm64/sme: Set new vector length before reallocating
  arm64/fpsimd: Don't flush SME register hardware state along with thread

Merge tag 'for-linus-6.5a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

- A fix for a performance problem in QubesOS, adding a way to drain the
   queue of grants experiencing delayed unmaps faster

- A patch enabling the use of static event channels from user mode,
   which was omitted when introducing supporting static event channels

- A fix for a problem where Xen related code didn't check properly for
   running in a Xen environment, resulting in a WARN splat

* tag 'for-linus-6.5a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: speed up grant-table reclaim
  xen/evtchn: Introduce new IOCTL to bind static evtchn
  xenbus: check xen_domain in xenbus_probe_initcall

tpm_tis: Explicitly check for error code

recv_data either returns the number of received bytes, or a negative value
representing an error code. Adding the return value directly to the total
number of received bytes therefore looks a little weird, since it might add
a negative error code to a sum of bytes.

The following check for size < expected usually makes the function return
ETIME in that case, so it does not cause too many problems in practice. But
to make the code look cleaner and because the caller might still be
interested in the original error code, explicitly check for the presence of
an error code and pass that through.

Cc: [email protected]
Fixes: cb5354253af2 ("[PATCH] tpm: spacing cleanups 2")
Signed-off-by: Alexander Steffen <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>

tpm: Switch i2c drivers back to use .probe()

After commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new()
call-back type"), all drivers being converted to .probe_new() and then
03c835f498b5 ("i2c: Switch .probe() to not take an id parameter")
convert back to (the new) .probe() to be able to eventually drop
.probe_new() from struct i2c_driver.

Signed-off-by: Uwe Kleine-König <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>

security: keys: perform capable check only on privileged operations

If the current task fails the check for the queried capability via
`capable(CAP_SYS_ADMIN)` LSMs like SELinux generate a denial message.
Issuing such denial messages unnecessarily can lead to a policy author
granting more privileges to a subject than needed to silence them.

Reorder CAP_SYS_ADMIN checks after the check whether the operation is
actually privileged.

Signed-off-by: Christian Göttsche <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>

Merge tag 'ceph-for-6.5-rc4' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
"A patch to reduce the potential for erroneous RBD exclusive lock
  blocklisting (fencing) with a couple of prerequisites and a fixup to
  prevent metrics from being sent to the MDS even just once after that
  has been disabled by the user. All marked for stable"

* tag 'ceph-for-6.5-rc4' of https://github.com/ceph/ceph-client:
  rbd: retrieve and check lock owner twice before blocklisting
  rbd: harden get_lock_owner_info() a bit
  rbd: make get_lock_owner_info() return a single locker or NULL
  ceph: never send metrics if disable_send_metrics is set

Merge tag '9p-fixes-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9p fixes from Eric Van Hensbergen:
"Misc set of fixes for 9p.

  Most of these clean up warnings we've gotten out of compilation tools,
  but several of them were from inspection while hunting down a couple
  of regressions.

  The most important one is 75b396821cb7 ("fs/9p: remove unnecessary and
  overrestrictive check") which caused a regression for some folks by
  restricting mmap in any case where writeback caches weren't enabled.

  Most of the other bugs caught via inspection were type mismatches"

* tag '9p-fixes-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: Remove unused extern declaration
  9p: remove dead stores (variable set again without being read)
  9p: virtio: skip incrementing unused variable
  9p: virtio: make sure 'offs' is initialized in zc_request
  9p: virtio: fix unlikely null pointer deref in handle_rerror
  9p: fix ignored return value in v9fs_dir_release
  fs/9p: remove unnecessary invalidate_inode_pages2
  fs/9p: fix type mismatch in file cache mode helper
  fs/9p: fix typo in comparison logic for cache mode
  fs/9p: remove unnecessary and overrestrictive check
  fs/9p: Fix a datatype used with V9FS_DIRECT_IO

Merge tag 'block-6.5-2023-07-28' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:
"A few fixes that should go into the current kernel release, mainly:

   - Set of fixes for dasd (Stefan)

   - Handle interruptible waits returning because of a signal for ublk
     (Ming)"

* tag 'block-6.5-2023-07-28' of git://git.kernel.dk/linux:
  ublk: return -EINTR if breaking from waiting for existed users in DEL_DEV
  ublk: fail to recover device if queue setup is interrupted
  ublk: fail to start device if queue setup is interrupted
  block: Fix a source code comment in include/uapi/linux/blkzoned.h
  s390/dasd: print copy pair message only for the correct error
  s390/dasd: fix hanging device after request requeue
  s390/dasd: use correct number of retries for ERP requests
  s390/dasd: fix hanging device after quiesce/resume

Merge tag 'io_uring-6.5-2023-07-28' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
"Just a single tweak to a patch from last week, to avoid having idle
cqring waits be attributed as iowait"

* tag 'io_uring-6.5-2023-07-28' of git://git.kernel.dk/linux:
io_uring: gate iowait schedule on having pending requests

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd fixes from Jason Gunthorpe:
"Two user triggerable problems:

   - Syzkaller found a way to trigger a WARN_ON and leak memory by
     racing destroy with other actions

   - There is still a bug in the "batch carry" stuff that gets invoked
     for complex cases with accesses and unmapping of huge pages. The
     test suite found this (triggers rarely)"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd: Set end correctly when doing batch carry
  iommufd: IOMMUFD_DESTROY should not increase the refcount

KVM: arm64: Skip instruction after emulating write to TCR_EL1

Whelp, this is embarrassing. Since commit 082fdfd13841 ("KVM: arm64:
Prevent guests from enabling HA/HD on Ampere1") KVM traps writes to
TCR_EL1 on AmpereOne to work around an erratum in the unadvertised
HAFDBS implementation, preventing the guest from enabling the feature.
Unfortunately, I failed virtualization 101 when working on that change,
and forgot to advance PC after instruction emulation.

Do the right thing and skip the MSR instruction after emulating the
write.

Fixes: 082fdfd13841 ("KVM: arm64: Prevent guests from enabling HA/HD on Ampere1")
Acked-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Oliver Upton <[email protected]>

Merge tag 'for-6.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

- Fix double free on memory allocation failure in DM integrity target's
   integrity_recalc()

- Fix locking in DM raid target's raid_ctr() and around call to
   md_stop()

- Fix DM cache target's cleaner policy to always allow work to be
   queued for writeback; even if cache isn't idle.

* tag 'for-6.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache policy smq: ensure IO doesn't prevent cleaner policy progress
  dm raid: protect md_stop() with 'reconfig_mutex'
  dm raid: clean up four equivalent goto tags in raid_ctr()
  dm raid: fix missing reconfig_mutex unlock in raid_ctr() error paths
  dm integrity: fix double free on memory allocation failure

Merge tag 'sound-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of device-specific small fixes such as ASoC Realtek codec
  fixes for PM issues, ASoC nau8821 quirk additions, and usual HD- and
  USB-audio quirks"

* tag 'sound-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek: Support ASUS G713PV laptop
  ALSA: usb-audio: Update for native DSD support quirks
  ALSA: usb-audio: Add quirk for Microsoft Modern Wireless Headset
  ALSA: hda/relatek: Enable Mute LED on HP 250 G8
  ASoC: atmel: Fix the 8K sample parameter in I2SC master
  ASoC: rt711-sdca: fix for JD event handling in ClockStop Mode0
  ASoC: rt711: fix for JD event handling in ClockStop Mode0
  ASoC: rt722-sdca: fix for JD event handling in ClockStop Mode0
  ASoC: rt712-sdca: fix for JD event handling in ClockStop Mode0
  ASoc: codecs: ES8316: Fix DMIC config
  ASoC: rt5682-sdw: fix for JD event handling in ClockStop Mode0
  ASoC: wm8904: Fill the cache for WM8904_ADC_TEST_0 register
  ASoC: nau8821: Add DMI quirk mechanism for active-high jack-detect
  ASoC: da7219: Check for failure reading AAD IRQ events
  ASoC: da7219: Flush pending AAD IRQ when suspending
  ALSA: seq: remove redundant unsigned comparison to zero
  ASoC: fsl_spdif: Silence output on stop

Merge tag 'drm-fixes-2023-07-28' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Regular scheduled fixes, msm and amdgpu leading the way, with some
  i915 and a single misc fbdev, all seems fine.

  fbdev:
   - remove unused function

  amdgpu:
   - gfxhub partition fix
   - Fix error handling in psp_sw_init()
   - SMU13 fix
   - DCN 3.1 fix
   - DCN 3.2 fix
   - Fix for display PHY programming sequence
   - DP MST error handling fix
   - GFX 9.4.3 fix

  amdkfd:
   - GFX11 trap handling fix

  i915:
   - Use shmem for dpt objects
   - Fix an error handling path in igt_write_huge()

  msm:
   - display:
      - Fix to correct the UBWC programming for decoder version 4.3 seen
        on SM8550
      - Add the missing flush and fetch bits for DMA4 and DMA5 SSPPs.
      - Fix to drop the unused dpu_core_perf_data_bus_id enum from the
        code
      - Drop the unused dsi_phy_14nm_17mA_regulators from QCM 2290 DSI
        cfg.
   - gpu:
      - Fix warn splat for newer devices without revn
      - Remove name/revn for a690.. we shouldn't be populating these for
        newer devices, for consistency, but it slipped through review
      - Fix a6xx gpu snapshot BINDLESS_DATA size (was listed in bytes
        instead of dwords, causing AHB faults on a6xx gen4/a660-family)
      - Disallow submit with fence id 0"

* tag 'drm-fixes-2023-07-28' of git://anongit.freedesktop.org/drm/drm: (22 commits)
  drm/msm: Disallow submit with fence id 0
  drm/amdgpu: Restore HQD persistent state register
  drm/amd/display: Unlock on error path in dm_handle_mst_sideband_msg_ready_event()
  drm/amd/display: Exit idle optimizations before attempt to access PHY
  drm/amd/display: Don't apply FIFO resync W/A if rdivider = 0
  drm/amd/display: Guard DCN31 PHYD32CLK logic against chip family
  drm/amd/smu: use AverageGfxclkFrequency* to replace previous GFX Curr Clock
  drm/amd: Fix an error handling mistake in psp_sw_init()
  drm/amdgpu: Fix infinite loop in gfxhub_v1_2_xcc_gart_enable (v2)
  drm/amdkfd: fix trap handling work around for debugging
  drm/fb-helper: Remove unused inline function drm_fb_helper_defio_init()
  drm/i915: Fix an error handling path in igt_write_huge()
  drm/i915/dpt: Use shmem for dpt objects
  drm/msm: Fix hw_fence error path cleanup
  drm/msm: Fix IS_ERR_OR_NULL() vs NULL check in a5xx_submit_in_rb()
  drm/msm/adreno: Fix snapshot BINDLESS_DATA size
  drm/msm/a690: Remove revn and name
  drm/msm/adreno: Fix warn splat for devices without revn
  drm/msm/dsi: Drop unused regulators from QCM2290 14nm DSI PHY config
  drm/msm/dpu: drop enum dpu_core_perf_data_bus_id
  ...

Merge tag 'cxl-fixes-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull cxl fixes from Vishal Verma:

- Update MAINTAINERS for cxl

- A few static analysis fixes

- Fix a Kconfig dependency for CONFIG_FW_LOADER

* tag 'cxl-fixes-6.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  tools/testing/cxl: Remove unused SZ_512G macro
  cxl/acpi: Return 'rc' instead of '0' in cxl_parse_cfmws()
  cxl/acpi: Fix a use-after-free in cxl_parse_cfmws()
  cxl: Update MAINTAINERS
  cxl/mem: Fix a double shift bug
  cxl: fix CONFIG_FW_LOADER dependency

Revert "mm,memblock: reset memblock.reserved to system init state to prevent UAF"

This reverts commit 9e46e4dcd9d6cd88342b028dbfa5f4fb7483d39c.

kbuild reports a warning in memblock_remove_region() because of a false
positive caused by partial reset of the memblock state.

Doing the full reset will remove the false positives, but will allow
late use of memblock_free() to go unnoticed, so it is better to revert
the offending commit.

   WARNING: CPU: 0 PID: 1 at mm/memblock.c:352 memblock_remove_region (kbuild/src/x86_64/mm/memblock.c:352 (discriminator 1))
   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc3-00001-g9e46e4dcd9d6 #2
   RIP: 0010:memblock_remove_region (kbuild/src/x86_64/mm/memblock.c:352 (discriminator 1))
   Call Trace:
     memblock_discard (kbuild/src/x86_64/mm/memblock.c:383)
     page_alloc_init_late (kbuild/src/x86_64/include/linux/find.h:208 kbuild/src/x86_64/include/linux/nodemask.h:266 kbuild/src/x86_64/mm/mm_init.c:2405)
     kernel_init_freeable (kbuild/src/x86_64/init/main.c:1325 kbuild/src/x86_64/init/main.c:1546)
     kernel_init (kbuild/src/x86_64/init/main.c:1439)
     ret_from_fork (kbuild/src/x86_64/arch/x86/kernel/process.c:145)
     ret_from_fork_asm (kbuild/src/x86_64/arch/x86/entry/entry_64.S:298)

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: "Mike Rapoport (IBM)" <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/mempolicy: Take VMA lock before replacing policy

mbind() calls down into vma_replace_policy() without taking the per-VMA
locks, replaces the VMA's vma->vm_policy pointer, and frees the old
policy. That's bad; a concurrent page fault might still be using the
old policy (in vma_alloc_folio()), resulting in use-after-free.

Normally this will manifest as a use-after-free read first, but it can
result in memory corruption, including because vma_alloc_folio() can
call mpol_cond_put() on the freed policy, which conditionally changes
the policy's refcount member.

This bug is specific to CONFIG_NUMA, but it does also affect non-NUMA
systems as long as the kernel was built with CONFIG_NUMA.

Signed-off-by: Jann Horn <[email protected]>
Reviewed-by: Suren Baghdasaryan <[email protected]>
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

drm/imx/ipuv3: Fix front porch adjustment upon hactive aligning

When hactive is not aligned to 8 pixels, it is aligned accordingly and
hfront porch needs to be reduced the same amount. Unfortunately the front
porch is set to the difference rather than reducing it. There are some
Samsung TVs which can't cope with a front porch of instead of 70.

Fixes: 94dfec48fca7 ("drm/imx: Add 8 pixel alignment fix")
Signed-off-by: Alexander Stein <[email protected]>
Reviewed-by: Philipp Zabel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[[email protected]: Fixed subject]
Signed-off-by: Philipp Zabel <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

ACPI/IORT: Remove erroneous id_count check in iort_node_get_rmr_info()

According to the ARM IORT specifications DEN 0049 issue E,
the "Number of IDs" field in the ID mapping format reports
the number of IDs in the mapping range minus one.

In iort_node_get_rmr_info(), we erroneously skip ID mappings
whose "Number of IDs" equal to 0, resulting in valid mapping
nodes with a single ID to map being skipped, which is wrong.

Fix iort_node_get_rmr_info() by removing the bogus id_count
check.

Fixes: 491cf4a6735a ("ACPI/IORT: Add support to retrieve IORT RMR reserved regions")
Signed-off-by: Guanghui Feng <[email protected]>
Cc: <[email protected]> # 6.0.x
Acked-by: Lorenzo Pieralisi <[email protected]>
Tested-by: Hanjun Guo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>

powerpc/ftrace: Create a dummy stackframe to fix stack unwind

With ppc64 -mprofile-kernel and ppc32 -pg, profiling instructions to
call into ftrace are emitted right at function entry. The instruction
sequence used is minimal to reduce overhead. Crucially, a stackframe is
not created for the function being traced. This breaks stack unwinding
since the function being traced does not have a stackframe for itself.
As such, it never shows up in the backtrace:

/sys/kernel/debug/tracing # echo 1 > /proc/sys/kernel/stack_tracer_enabled
/sys/kernel/debug/tracing # cat stack_trace
        Depth    Size   Location    (17 entries)
        -----    ----   --------
  0)     4144      32   ftrace_call+0x4/0x44
  1)     4112     432   get_page_from_freelist+0x26c/0x1ad0
  2)     3680     496   __alloc_pages+0x290/0x1280
  3)     3184     336   __folio_alloc+0x34/0x90
  4)     2848     176   vma_alloc_folio+0xd8/0x540
  5)     2672     272   __handle_mm_fault+0x700/0x1cc0
  6)     2400     208   handle_mm_fault+0xf0/0x3f0
  7)     2192      80   ___do_page_fault+0x3e4/0xbe0
  8)     2112     160   do_page_fault+0x30/0xc0
  9)     1952     256   data_access_common_virt+0x210/0x220
10)     1696     400   0xc00000000f16b100
11)     1296     384   load_elf_binary+0x804/0x1b80
12)      912     208   bprm_execve+0x2d8/0x7e0
13)      704      64   do_execveat_common+0x1d0/0x2f0
14)      640     160   sys_execve+0x54/0x70
15)      480      64   system_call_exception+0x138/0x350
16)      416     416   system_call_common+0x160/0x2c4

Fix this by having ftrace create a dummy stackframe for the function
being traced. With this, backtraces now capture the function being
traced:

/sys/kernel/debug/tracing # cat stack_trace
        Depth    Size   Location    (17 entries)
        -----    ----   --------
  0)     3888      32   _raw_spin_trylock+0x8/0x70
  1)     3856     576   get_page_from_freelist+0x26c/0x1ad0
  2)     3280      64   __alloc_pages+0x290/0x1280
  3)     3216     336   __folio_alloc+0x34/0x90
  4)     2880     176   vma_alloc_folio+0xd8/0x540
  5)     2704     416   __handle_mm_fault+0x700/0x1cc0
  6)     2288      96   handle_mm_fault+0xf0/0x3f0
  7)     2192      48   ___do_page_fault+0x3e4/0xbe0
  8)     2144     192   do_page_fault+0x30/0xc0
  9)     1952     608   data_access_common_virt+0x210/0x220
10)     1344      16   0xc0000000334bbb50
11)     1328     416   load_elf_binary+0x804/0x1b80
12)      912      64   bprm_execve+0x2d8/0x7e0
13)      848     176   do_execveat_common+0x1d0/0x2f0
14)      672     192   sys_execve+0x54/0x70
15)      480      64   system_call_exception+0x138/0x350
16)      416     416   system_call_common+0x160/0x2c4

This results in two additional stores in the ftrace entry code, but
produces reliable backtraces.

Fixes: 153086644fd1 ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
Cc: [email protected]
Signed-off-by: Naveen N Rao <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]

dt-bindings: net: rockchip-dwmac: fix {tx|rx}-delay defaults/range in schema

The range and the defaults are specified in the description instead of
being specified in the schema.
Fix it by adding the default value in the `default` field and specifying
the range as `minimum` and `maximum`.

Fixes: b331b8ef86f0 ("dt-bindings: net: convert rockchip-dwmac to json-schema")
Signed-off-by: Eugen Hristev <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

powerpc/mm/altmap: Fix altmap boundary check

altmap->free includes the entire free space from which altmap blocks
can be allocated. So when checking whether the kernel is doing altmap
block free, compute the boundary correctly, otherwise memory hotunplug
can fail.

Fixes: 9ef34630a461 ("powerpc/mm: Fallback to RAM if the altmap is unusable")
Signed-off-by: "Aneesh Kumar K.V" <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://msgid.link/[email protected]

Merge tag 'mlx5-fixes-2023-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2023-07-26

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2023-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Unregister devlink params in case interface is down
  net/mlx5: DR, Fix peer domain namespace setting
  net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported
  net/mlx5e: kTLS, Fix protection domain in use syndrome when devlink reload
  net/mlx5: Bridge, set debugfs access right to root-only
  net/mlx5e: xsk: Fix crash on regular rq reactivation
  net/mlx5e: xsk: Fix invalid buffer access for legacy rq
  net/mlx5e: Move representor neigh cleanup to profile cleanup_tx
  net/mlx5e: Fix crash moving to switchdev mode when ntuple offload is set
  net/mlx5e: Don't hold encap tbl lock if there is no encap action
  net/mlx5: Honor user input for migratable port fn attr
  net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer()
  net/mlx5: fix potential memory leak in mlx5e_init_rep_rx
  net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx
  net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: fix value check in bcm_sf2_sw_probe()

in bcm_sf2_sw_probe(), check the return value of clk_prepare_enable()
and return the error code if clk_prepare_enable() returns an
unexpected value.

Fixes: e9ec5c3bd238 ("net: dsa: bcm_sf2: request and handle clocks")
Signed-off-by: Yuanjun Gong <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: flower: fix stack-out-of-bounds in fl_set_key_cfm()

Typical misuse of

nla_parse_nested(array, XXX_MAX, ...);

array must be declared as

struct nlattr *array[XXX_MAX + 1];

v2: Based on feedbacks from Ido Schimmel and Zahari Doychev,
I also changed TCA_FLOWER_KEY_CFM_OPT_MAX and cfm_opt_policy
definitions.

syzbot reported:

BUG: KASAN: stack-out-of-bounds in __nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588
Write of size 32 at addr ffffc90003a0ee20 by task syz-executor296/5014

CPU: 0 PID: 5014 Comm: syz-executor296 Not tainted 6.5.0-rc2-syzkaller-00307-gd192f5382581 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:364 [inline]
print_report+0x163/0x540 mm/kasan/report.c:475
kasan_report+0x175/0x1b0 mm/kasan/report.c:588
kasan_check_range+0x27e/0x290 mm/kasan/generic.c:187
__asan_memset+0x23/0x40 mm/kasan/shadow.c:84
__nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588
__nla_parse+0x40/0x50 lib/nlattr.c:700
nla_parse_nested include/net/netlink.h:1262 [inline]
fl_set_key_cfm+0x1e3/0x440 net/sched/cls_flower.c:1718
fl_set_key+0x2168/0x6620 net/sched/cls_flower.c:1884
fl_tmplt_create+0x1fe/0x510 net/sched/cls_flower.c:2666
tc_chain_tmplt_add net/sched/cls_api.c:2959 [inline]
tc_ctl_chain+0x131d/0x1ac0 net/sched/cls_api.c:3068
rtnetlink_rcv_msg+0x82b/0xf50 net/core/rtnetlink.c:6424
netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2549
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x7c3/0x990 net/netlink/af_netlink.c:1365
netlink_sendmsg+0xa2a/0xd60 net/netlink/af_netlink.c:1914
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg net/socket.c:748 [inline]
____sys_sendmsg+0x592/0x890 net/socket.c:2494
___sys_sendmsg net/socket.c:2548 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2577
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f54c6150759
Code: 48 83 c4 28 c3 e8 d7 19 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe06c30578 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f54c619902d RCX: 00007f54c6150759
RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
RBP: 00007ffe06c30590 R08: 0000000000000000 R09: 00007ffe06c305f0
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54c61c35f0
R13: 00007ffe06c30778 R14: 0000000000000001 R15: 0000000000000001
</TASK>

The buggy address belongs to stack of task syz-executor296/5014
and is located at offset 32 in frame:
fl_set_key_cfm+0x0/0x440 net/sched/cls_flower.c:374

This frame has 1 object:
[32, 56) 'nla_cfm_opt'

The buggy address belongs to the virtual mapping at
[ffffc90003a08000, ffffc90003a11000) created by:
copy_process+0x5c8/0x4290 kernel/fork.c:2330

Fixes: 7cfffd5fed3e ("net: flower: add support for matching cfm fields")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Simon Horman <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Zahari Doychev <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

LoongArch: Cleanup __builtin_constant_p() checking for cpu_has_*

In the current configuration, cpu_has_lsx and cpu_has_lasx cannot be
constants. So cleanup the __builtin_constant_p() checking to reduce the
complexity.

Reviewed-by: WANG Xuerui <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: BPF: Fix check condition to call lu32id in move_imm()

As the code comment says, the initial aim is to reduce one instruction
in some corner cases, if bit[51:31] is all 0 or all 1, no need to call
lu32id. That is to say, it should call lu32id only if bit[51:31] is not
all 0 and not all 1. The current code always call lu32id, the result is
right but the logic is unexpected and wrong, fix it.

Cc: [email protected] # 6.1
Fixes: 5dc615520c4d ("LoongArch: Add BPF JIT support")
Reported-by: Colin King (gmail) <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: BPF: Enable bpf_probe_read{, str}() on LoongArch

Currently nettrace does not work on LoongArch due to missing
bpf_probe_read{,str}() support, with the error message:

ERROR: failed to load kprobe-based eBPF
ERROR: failed to load kprobe-based bpf

According to commit 0ebeea8ca8a4d1d ("bpf: Restrict bpf_probe_read{,
str}() only to archs where they work"), we only need to select
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE to add said support,
because LoongArch does have non-overlapping address ranges for kernel
and userspace.

Cc: [email protected] # 6.1
Signed-off-by: Chenguang Zhao <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Fix return value underflow in exception path

This patch fixes an underflow issue in the return value within the
exception path, specifically at .Llt8 when the remaining length is less
than 8 bytes.

Cc: [email protected]
Fixes: 8941e93ca590 ("LoongArch: Optimize memory ops (memset/memcpy/memmove)")
Reported-by: Weihao Li <[email protected]>
Signed-off-by: WANG Rui <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Fix CMDLINE_EXTEND and CMDLINE_BOOTLOADER handling

On FDT systems these command line processing are already taken care of
by early_init_dt_scan_chosen(). Add similar handling to the ACPI (non-
FDT) code path to allow these config options to work for ACPI (non-FDT)
systems too.

Signed-off-by: Zhihong Dong <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Fix module relocation error with binutils 2.41

Binutils 2.41 enables linker relaxation by default, but the kernel
module loader doesn't support that, so just disable it. Otherwise we
get such an error when loading modules:

"Unknown relocation type 102"

As an alternative, we could add linker relaxation support in the kernel
module loader. But it is relatively large complexity that may or may not
bring a similar gain, and we don't really want to include this linker
pass in the kernel.

Reviewed-by: WANG Xuerui <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>

LoongArch: Only fiddle with CHECKFLAGS if `need-compiler'

This is a port of commit 4fe4a6374c4db9ae2b ("MIPS: Only fiddle with
CHECKFLAGS if `need-compiler'") to LoongArch.

We have originally guarded fiddling with CHECKFLAGS in our arch Makefile
by checking for the CONFIG_LOONGARCH variable, not set for targets such
as `distclean', etc. that neither include `.config' nor use the compiler.

Starting from commit 805b2e1d427aab4 ("kbuild: include Makefile.compiler
only when compiler is needed") we have had a generic `need-compiler'
variable explicitly telling us if the compiler will be used and thus its
capabilities need to be checked and expressed in the form of compilation
flags. If this variable is not set, then `make' functions such as
`cc-option' are undefined, causing all kinds of weirdness to happen if
we expect specific results to be returned.

It doesn't cause problems on LoongArch now. But as a guard we replace
the check for CONFIG_LOONGARCH with one for `need-compiler' instead, so
as to prevent the compiler from being ever called for CHECKFLAGS when
not needed.

Signed-off-by: Huacai Chen <[email protected]>

ata: libata-core: fix when to fetch sense data for successful commands

The condition to fetch sense data was supposed to be:
ATA_SENSE set AND either
1) Command was NCQ and ATA_DFLAG_CDL_ENABLED flag set (flag
ATA_DFLAG_CDL_ENABLED will only be set if the Successful NCQ command
sense data supported bit is set); or
2) Command was non-NCQ and regular sense data reporting is enabled.

However the check in 2) accidentally had the negation at the wrong place,
causing it to try to fetch sense data if it was a non-NCQ command _or_
if regular sense data reporting was _not_ enabled.

Fix this by removing the extra parentheses that should not be there,
such that only the correct return (ata_is_ncq()) is negated.

Fixes: 18bd7718b5c4 ("scsi: ata: libata: Handle completion of CDL commands using policy 0xD")
Reported-by: Borislav Petkov <[email protected]>
Closes: https://lore.kernel.org/linux-ide/20230722155621.GIZLv8JbURKzHtKvQE@fat_crate.local/
Signed-off-by: Niklas Cassel <[email protected]>
Tested-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Jason Yan <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>

Merge tag 'drm-msm-fixes-2023-07-27' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

Fixes for v6.5-rc4

Display:
+ Fix to correct the UBWC programming for decoder version 4.3 seen
  on SM8550
+ Add the missing flush and fetch bits for DMA4 and DMA5 SSPPs.
+ Fix to drop the unused dpu_core_perf_data_bus_id enum from the code
+ Drop the unused dsi_phy_14nm_17mA_regulators from QCM 2290 DSI cfg.

GPU:
+ Fix warn splat for newer devices without revn
+ Remove name/revn for a690.. we shouldn't be populating these for
  newer devices, for consistency, but it slipped through review
+ Fix a6xx gpu snapshot BINDLESS_DATA size (was listed in bytes
  instead of dwords, causing AHB faults on a6xx gen4/a660-family)
+ Disallow submit with fence id 0

Signed-off-by: Dave Airlie <[email protected]>
From: Rob Clark <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs9MwCSfiyv8i7yWAsJKYEzCDyzaTx=ujX80Y23rZd9RA@mail.gmail.com

Merge tag 'amd-drm-fixes-6.5-2023-07-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.5-2023-07-26:

amdgpu:
- gfxhub partition fix
- Fix error handling in psp_sw_init()
- SMU13 fix
- DCN 3.1 fix
- DCN 3.2 fix
- Fix for display PHY programming sequence
- DP MST error handling fix
- GFX 9.4.3 fix

amdkfd:
- GFX11 trap handling fix

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-intel-fixes-2023-07-27' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Use shmem for dpt objects [dpt] (Radhakrishna Sripada)
- Fix an error handling path in igt_write_huge() (Christophe JAILLET)

Signed-off-by: Dave Airlie <[email protected]>
From: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/ZMI4Mtom7pDhLB7M@tursulin-desk

Merge tag 'drm-misc-fixes-2023-07-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

A single patch to remove an unused function.

Signed-off-by: Dave Airlie <[email protected]>
From: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/dqvxednqyab5t7gvwvcq72x6yu7ug5gusmhpgs3kq6z7pf3co6@ofr6s7547gbe

MAINTAINERS: stmmac: retire Giuseppe Cavallaro

I tried to get stmmac maintainers to be more active by agreeing with
them off-list on a review rotation. I pinged Peppe 3 times over 2 weeks
during his "shift month", no reviews are flowing.

All the contributions are much appreciated! But stmmac is quite
active, we need participating maintainers :(

Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: fix older DSA drivers using phylink

Older DSA drivers that do not provide an dsa_ops adjust_link method end
up using phylink. Unfortunately, a recent phylink change that requires
its supported_interfaces bitmap to be filled breaks these drivers
because the bitmap remains empty.

Rather than fixing each driver individually, fix it in the core code so
we have a sensible set of defaults.

Reported-by: Sergei Antonov <[email protected]>
Fixes: de5c9bf40c45 ("net: phylink: require supported_interfaces to be filled")
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Tested-by: Vladimir Oltean <[email protected]> # dsa_loop
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length

There are totally 9 ndo_bridge_setlink handlers in the current kernel,
which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink 3)
i40e_ndo_bridge_setlink 4) ice_bridge_setlink 5)
ixgbe_ndo_bridge_setlink 6) mlx5e_bridge_setlink 7)
nfp_net_bridge_setlink 8) qeth_l2_bridge_setlink 9) br_setlink.

By investigating the code, we find that 1-7 parse and use nlattr
IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can
lead to an out-of-attribute read and allow a malformed nlattr (e.g.,
length 0) to be viewed as a 2 byte integer.

To avoid such issues, also for other ndo_bridge_setlink handlers in the
future. This patch adds the nla_len check in rtnl_bridge_setlink and
does an early error return if length mismatches. To make it works, the
break is removed from the parsing for IFLA_BRIDGE_FLAGS to make sure
this nla_for_each_nested iterates every attribute.

Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops")
Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Lin Ma <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: Hangbin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ata: pata_ns87415: mark ns87560_tf_read static

The global function triggers a warning because of the missing prototype

drivers/ata/pata_ns87415.c:263:6: warning: no previous prototype for 'ns87560_tf_read' [-Wmissing-prototypes]
263 | void ns87560_tf_read(struct ata_port *ap, struct ata_taskfile *tf)

There are no other references to this, so just make it static.

Fixes: c4b5b7b6c4423 ("pata_ns87415: Initial cut at 87415/87560 IDE support")
Reviewed-by: Sergey Shtylyov <[email protected]>
Reviewed-by: Serge Semin <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>

mm/memory-failure: fix hardware poison check in unpoison_memory()

It was pointed out[1] that using folio_test_hwpoison() is wrong as we need
to check the indiviual page that has poison. folio_test_hwpoison() only
checks the head page so go back to using PageHWPoison().

User-visible effects include existing hwpoison-inject tests possibly
failing as unpoisoning a single subpage could lead to unpoisoning an
entire folio. Memory unpoisoning could also not work as expected as
the function will break early due to only checking the head page and
not the actually poisoned subpage.

[1]: https://lore.kernel.org/lkml/[email protected]/

Link: https://lkml.kernel.org/r/[email protected]
Fixes: a6fddef49eef ("mm/memory-failure: convert unpoison_memory() to folios")
Signed-off-by: Sidhartha Kumar <[email protected]>
Reported-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

proc/vmcore: fix signedness bug in read_from_oldmem()

The bug is the error handling:

if (tmp < nr_bytes) {

"tmp" can hold negative error codes but because "nr_bytes" is type size_t
the negative error codes are treated as very high positive values
(success). Fix this by changing "nr_bytes" to type ssize_t. The
"nr_bytes" variable is used to store values between 1 and PAGE_SIZE and
they can fit in ssize_t without any issue.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5d8de293c224 ("vmcore: convert copy_oldmem_page() to take an iov_iter")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mailmap: update remaining active codeaurora.org email addresses

The lack of mailmap updates for @codeaurora.org addresses reduces the
usefulness of tools such as get_maintainer.pl. Some recent (and welcome!)
additions has been made to improve the situation, this concludes the
effort.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bjorn Andersson <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Konrad Dybcio <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: lock VMA in dup_anon_vma() before setting ->anon_vma

When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the
VMA that is being expanded to cover the area previously occupied by
another VMA.  This currently happens while `dst` is not write-locked.

This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as
the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent
page faults can happen on `dst` under the per-VMA lock.  This is already
icky in itself, since such page faults can now install pages into `dst`
that are attached to an `anon_vma` that is not yet tied back to the
`anon_vma` with an `anon_vma_chain`.  But if `anon_vma_clone()` fails due
to an out-of-memory error, things get much worse: `anon_vma_clone()` then
reverts `dst->anon_vma` back to NULL, and `dst` remains completely
unconnected to the `anon_vma`, even though we can have pages in the area
covered by `dst` that point to the `anon_vma`.

This means the `anon_vma` of such pages can be freed while the pages are
still mapped into userspace, which leads to UAF when a helper like
folio_lock_anon_vma_read() tries to look up the anon_vma of such a page.

This theoretically is a security bug, but I believe it is really hard to
actually trigger as an unprivileged user because it requires that you can
make an order-0 GFP_KERNEL allocation fail, and the page allocator tries
pretty hard to prevent that.

I think doing the vma_start_write() call inside dup_anon_vma() is the most
straightforward fix for now.

For a kernel-assisted reproducer, see the notes section of the patch mail.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <[email protected]>
Reviewed-by: Suren Baghdasaryan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: fix memory ordering for mm_lock_seq and vm_lock_seq

mm->mm_lock_seq effectively functions as a read/write lock; therefore it
must be used with acquire/release semantics.

A specific example is the interaction between userfaultfd_register() and
lock_vma_under_rcu().

userfaultfd_register() does the following from the point where it changes
a VMA's flags to the point where concurrent readers are permitted again
(in a simple scenario where only a single private VMA is accessed and no
merging/splitting is involved):

userfaultfd_register
  userfaultfd_set_vm_flags
    vm_flags_reset
      vma_start_write
        down_write(&vma->vm_lock->lock)
        vma->vm_lock_seq = mm_lock_seq [marks VMA as busy]
        up_write(&vma->vm_lock->lock)
      vm_flags_init
        [sets VM_UFFD_* in __vm_flags]
  vma->vm_userfaultfd_ctx.ctx = ctx
  mmap_write_unlock
    vma_end_write_all
      WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA]

There are no memory barriers in between the __vm_flags update and the
mm->mm_lock_seq update that unlocks the VMA, so the unlock can be
reordered to above the `vm_flags_init()` call, which means from the
perspective of a concurrent reader, a VMA can be marked as a userfaultfd
VMA while it is not VMA-locked.  That's bad, we definitely need a
store-release for the unlock operation.

The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly
fine because all accesses to vma->vm_lock_seq that matter are always
protected by the VMA lock.  There is a racy read in vma_start_read()
though that can tolerate false-positives, so we should be using
WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN).

On the other side, lock_vma_under_rcu() works as follows in the relevant
region for locking and userfaultfd check:

lock_vma_under_rcu
  vma_start_read
    vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout]
    down_read_trylock(&vma->vm_lock->lock)
    vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check]
  userfaultfd_armed
    checks vma->vm_flags & __VM_UFFD_FLAGS

Here, the interesting aspect is how far down the mm->mm_lock_seq read can
be reordered - if this read is reordered down below the vma->vm_flags
access, this could cause lock_vma_under_rcu() to partly operate on
information that was read while the VMA was supposed to be locked.  To
prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we
need to read it with a load-acquire.

Some of the comment wording is based on suggestions by Suren.

BACKPORT WARNING: One of the functions changed by this patch (which I've
written against Linus' tree) is vma_try_start_write(), but this function
no longer exists in mm/mm-everything.  I don't know whether the merged
version of this patch will be ordered before or after the patch that
removes vma_try_start_write().  If you're backporting this patch to a tree
with vma_try_start_write(), make sure this patch changes that function.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <[email protected]>
Reviewed-by: Suren Baghdasaryan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>