Git Repo - linux.git/log

xfs: aborting inodes on shutdown may need buffer lock

Most buffer io list operations are run with the bp->b_lock held, but
xfs_iflush_abort() can be called without the buffer lock being held
resulting in inodes being removed from the buffer list while other
list operations are occurring. This causes problems with corrupted
bp->b_io_list inode lists during filesystem shutdown, leading to
traversals that never end, double removals from the AIL, etc.

Fix this by passing the buffer to xfs_iflush_abort() if we have
it locked. If the inode is attached to the buffer, we're going to
have to remove it from the buffer list and we'd have to get the
buffer off the inode log item to do that anyway.

If we don't have a buffer passed in (e.g. from xfs_reclaim_inode())
then we can determine if the inode has a log item and if it is
attached to a buffer before we do anything else. If it does have an
attached buffer, we can lock it safely (because the inode has a
reference to it) and then perform the inode abort.

Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

Merge tag 'jfs-5.18' of https://github.com/kleikamp/linux-shaggy

Pull jfs updates from Dave Kleikamp:
"A couple bug fixes"

* tag 'jfs-5.18' of https://github.com/kleikamp/linux-shaggy:
jfs: prevent NULL deref in diFree
jfs: fix divide error in dbNextAG

dt-bindings: net: qcom,ethqos: Document SM8150 SoC compatible

SM8150 has an ethernet controller and it needs a different
configuration, so add a new compatible for this.

Acked-by: Rob Herring <[email protected]>
Signed-off-by: Vinod Koul <[email protected]>
[bhsharma: Massage the commit log]
Signed-off-by: Bhupesh Sharma <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

lib/test: use after free in register_test_dev_kmod()

The "test_dev" pointer is freed but then returned to the caller.

Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>

fs: fd tables have to be multiples of BITS_PER_LONG

This has always been the rule: fdtables have several bitmaps in them,
and as a result they have to be sized properly for bitmaps.  We walk
those bitmaps in chunks of 'unsigned long' in serveral cases, but even
when we don't, we use the regular kernel bitops that are defined to work
on arrays of 'unsigned long', not on some byte array.

Now, the distinction between arrays of bytes and 'unsigned long'
normally only really ends up being noticeable on big-endian systems, but
Fedor Pchelkin and Alexey Khoroshilov reported that copy_fd_bitmaps()
could be called with an argument that wasn't even a multiple of
BITS_PER_BYTE.  And then it fails to do the proper copy even on
little-endian machines.

The bug wasn't in copy_fd_bitmap(), but in sane_fdtable_size(), which
didn't actually sanitize the fdtable size sufficiently, and never made
sure it had the proper BITS_PER_LONG alignment.

That's partly because the alignment historically came not from having to
explicitly align things, but simply from previous fdtable sizes, and
from count_open_files(), which counts the file descriptors by walking
them one 'unsigned long' word at a time and thus naturally ends up doing
sizing in the proper 'chunks of unsigned long'.

But with the introduction of close_range(), we now have an external
source of "this is how many files we want to have", and so
sane_fdtable_size() needs to do a better job.

This also adds that explicit alignment to alloc_fdtable(), although
there it is mainly just for documentation at a source code level.  The
arithmetic we do there to pick a reasonable fdtable size already aligns
the result sufficiently.

In fact,clang notices that the added ALIGN() in that function doesn't
actually do anything, and does not generate any extra code for it.

It turns out that gcc ends up confusing itself by combining a previous
constant-sized shift operation with the variable-sized shift operations
in roundup_pow_of_two().  And probably due to that doesn't notice that
the ALIGN() is a no-op.  But that's a (tiny) gcc misfeature that doesn't
matter.  Having the explicit alignment makes sense, and would actually
matter on a 128-bit architecture if we ever go there.

This also adds big comments above both functions about how fdtable sizes
have to have that BITS_PER_LONG alignment.

Fixes: 60997c3d45d9 ("close_range: add CLOSE_RANGE_UNSHARE")
Reported-by: Fedor Pchelkin <[email protected]>
Reported-by: Alexey Khoroshilov <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Tested-and-acked-by: Christian Brauner <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

riscv module: remove (NOLOAD)

On ELF, (NOLOAD) sets the section type to SHT_NOBITS[1]. It is conceptually
inappropriate for .plt, .got, and .got.plt sections which are always
SHT_PROGBITS.

In GNU ld, if PLT entries are needed, .plt will be SHT_PROGBITS anyway
and (NOLOAD) will be essentially ignored. In ld.lld, since
https://reviews.llvm.org/D118840 ("[ELF] Support (TYPE=<value>) to
customize the output section type"), ld.lld will report a `section type
mismatch` error (later changed to a warning). Just remove (NOLOAD) to
fix the warning.

[1] https://lld.llvm.org/ELF/linker_script.html As of today, "The
section should be marked as not loadable" on
https://sourceware.org/binutils/docs/ld/Output-Section-Type.html is
outdated for ELF.

Link: https://github.com/ClangBuiltLinux/linux/issues/1597
Fixes: ab1ef68e5401 ("RISC-V: Add sections of PLT and GOT for kernel module")
Reported-by: Nathan Chancellor <[email protected]>
Signed-off-by: Fangrui Song <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

rtc: check if __rtc_read_time was successful

Clang static analysis reports this issue
interface.c:810:8: warning: Passed-by-value struct
  argument contains uninitialized data
  now = rtc_tm_to_ktime(tm);
      ^~~~~~~~~~~~~~~~~~~

tm is set by a successful call to __rtc_read_time()
but its return status is not checked.  Check if
it was successful before setting the enabled flag.
Move the decl of err to function scope.

Fixes: 2b2f5ff00f63 ("rtc: interface: ignore expired timers when enqueuing new timers")
Signed-off-by: Tom Rix <[email protected]>
Signed-off-by: Alexandre Belloni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

rtc: gamecube: Fix refcount leak in gamecube_rtc_read_offset_from_sram

The of_find_compatible_node() function returns a node pointer with
refcount incremented, We should use of_node_put() on it when done
Add the missing of_node_put() to release the refcount.

Fixes: 86559400b3ef ("rtc: gamecube: Add a RTC driver for the GameCube, Wii and Wii U")
Signed-off-by: Miaoqian Lin <[email protected]>
Signed-off-by: Alexandre Belloni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

rtc: mc146818-lib: Fix the AltCentury for AMD platforms

Setting the century forward has been failing on AMD platforms.
There was a previous attempt at fixing this for family 0x17 as part of
commit 7ad295d5196a ("rtc: Fix the AltCentury value on AMD/Hygon
platform") but this was later reverted due to some problems reported
that appeared to stem from an FW bug on a family 0x17 desktop system.

The same comments mentioned in the previous commit continue to apply
to the newer platforms as well.

```
MC146818 driver use function mc146818_set_time() to set register
RTC_FREQ_SELECT(RTC_REG_A)'s bit4-bit6 field which means divider stage
reset value on Intel platform to 0x7.

While AMD/Hygon RTC_REG_A(0Ah)'s bit4 is defined as DV0 [Reference]:
DV0 = 0 selects Bank 0, DV0 = 1 selects Bank 1. Bit5-bit6 is defined
as reserved.

DV0 is set to 1, it will select Bank 1, which will disable AltCentury
register(0x32) access. As UEFI pass acpi_gbl_FADT.century 0x32
(AltCentury), the CMOS write will be failed on code:
CMOS_WRITE(century, acpi_gbl_FADT.century).

Correct RTC_REG_A bank select bit(DV0) to 0 on AMD/Hygon CPUs, it will
enable AltCentury(0x32) register writing and finally setup century as
expected.
```

However in closer examination the change previously submitted was also
modifying bits 5 & 6 which are declared reserved in the AMD documentation.
So instead modify just the DV0 bank selection bit.

Being cognizant that there was a failure reported before, split the code
change out to a static function that can also be used for exclusions if
any regressions such as Mikhail's pop up again.

Cc: Jinke Fan <[email protected]>
Cc: Mikhail Gavrilov <[email protected]>
Link: https://lore.kernel.org/all/CABXGCsMLob0DC25JS8wwAYydnDoHBSoMh2_YLPfqm3TTvDE-Zw@mail.gmail.com/
Link: https://www.amd.com/system/files/TechDocs/51192_Bolton_FCH_RRG.pdf
Signed-off-by: Raul E Rangel <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alexandre Belloni <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

io_uring: defer msg-ring file validity check until command issue

In preparation for not using the file at prep time, defer checking if this
file refers to a valid io_uring instance until issue time.

Signed-off-by: Jens Axboe <[email protected]>

parisc: Fix patch code locking and flushing

This change fixes the following:

1) The flags variable is not initialized. Always use raw_spin_lock_irqsave
and raw_spin_unlock_irqrestore to serialize patching.

2) flush_kernel_vmap_range is primarily intended for DMA flushes. Since
__patch_text_multiple is often called with interrupts disabled, it is
better to directly call flush_kernel_dcache_range_asm and
flush_kernel_icache_range_asm. This avoids an extra call.

3) The final call to flush_icache_range is unnecessary.

Signed-off-by: John David Anglin <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

parisc: Find a new timesync master if current CPU is removed

When CPU hotplugging is enabled, the user may want to remove the
current CPU which is providing the timer ticks. If this happens
we need to find a new timesync master.

Signed-off-by: Helge Deller <[email protected]>

parisc: Move common_stext into .text section when CONFIG_HOTPLUG_CPU=y

Move the common_stext function into the non-init text section if hotplug
is enabled. This function is called from the firmware when hotplugged
CPUs are brought up.

Signed-off-by: Helge Deller <[email protected]>

parisc: Rewrite arch_cpu_idle_dead() for CPU hotplugging

Let the PDC firmware put the CPU into firmware idle loop with the
pdc_cpu_rendezvous() function.

Signed-off-by: Helge Deller <[email protected]>

parisc: Implement __cpu_die() and __cpu_disable() for CPU hotplugging

Add relevant code to __cpu_die() and __cpu_disable() to finally enable
the CPU hotplugging features. Reset the irq count values in smp_callin()
to zero before bringing up the CPU.

It seems that the firmware may need up to 8 seconds to fully stop a CPU
in which no other PDC calls are allowed to be made. Use a timeout
__cpu_die() to accommodate for this.

Use "chcpu -d 1" to bring CPU1 down, and "chcpu -e 1" to bring it up.

Signed-off-by: Helge Deller <[email protected]>

parisc: Add PDC locking functions for rendezvous code

Add pdc_cpu_rendezvous_lock() and pdc_cpu_rendezvous_unlock()
to lock PDC while CPU is transitioning into rendezvous state.
This is needed, because the transition phase may take up to 8 seconds.

Add pdc_pat_get_PDC_entrypoint() to get PDC entry point for current CPU.

Signed-off-by: Helge Deller <[email protected]>

parisc: Move disable_sr_hashing_asm() into .text section

Signed-off-by: Helge Deller <[email protected]>

parisc: Move CPU startup-related functions into .text section

If CONFIG_HOTPLUG_CPU is enabled, those functions will be run again
after bootup. So they need to reside in the .text section.

Signed-off-by: Helge Deller <[email protected]>

parisc: Move store_cpu_topology() into text section

Signed-off-by: Helge Deller <[email protected]>

parisc: Switch from GENERIC_CPU_DEVICES to GENERIC_ARCH_TOPOLOGY

Switch away from the own cpu topology code to common code which is used
by ARM64 and RISCV. That will allow us to enable CPU hotplug later on.

Signed-off-by: Helge Deller <[email protected]>

parisc: Ensure set_firmware_width() is called only once

Call set_firmware_width() only once at runtime.
This prevents that hotplugged CPUs will get stuck in spinlocks later on.

Signed-off-by: Helge Deller <[email protected]>

parisc: Add constants for control registers and clean up mfctl()

Clean up the code for the mfctl() and mtctl() functions and add often
used constants.

Signed-off-by: Helge Deller <[email protected]>

parisc: Detect hppa-suse-linux-gcc compiler for cross-building

Allow the system to find the SUSE hppa compiler and linker to set
CROSS32_COMPILE and CROSS_COMPILE.

Suggested-by: Jiri Slaby <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

parisc: Clean up cpu_check_affinity() and drop cpu_set_affinity_irq()

The cpu_set_affinity_irq() isn't needed. Not the CPU irqs need to
change, but the slave irq chips simply need to be reprogrammed to
a new CPU irq with the txn_* functions.

Signed-off-by: Helge Deller <[email protected]>

parisc: Fix CPU affinity for Lasi, WAX and Dino chips

Add the missing logic to allow Lasi, WAX and Dino to set the
CPU affinity. This fixes IRQ migration to other CPUs when a
CPU is shutdown which currently holds the IRQs for one of those
chips.

Signed-off-by: Helge Deller <[email protected]>

x86/fpu: Remove redundant XCOMP_BV initialization

fpu_copy_uabi_to_guest_fpstate() initializes the XCOMP_BV field in the
XSAVE header. That's a leftover from the old KVM FPU buffer handling code.

Since

d69c1382e1b7 ("x86/kvm: Convert FPU handling to a single swap buffer")

KVM uses the FPU core allocation code, which initializes the XCOMP_BV
field already.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'devprop-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull device properties code update from Rafael Wysocki:
"This is based on new i2c material for 5.18-rc1 and simply reorganizes
  the code on top of it so as to group similar functions together (Andy
  Shevchenko)"

* tag 'devprop-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  device property: Don't split fwnode_get_irq*() APIs in the code

Merge tag 'pm-5.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
"These update ARM cpufreq drivers, the OPP (Operating Performance
  Points) library and the power management documentation.

  Specifics:

   - Add per core DVFS support for QCom SoC (Bjorn Andersson), convert
     to yaml binding (Manivannan Sadhasivam) and various other fixes to
     the QCom drivers (Luca Weiss).

   - Add OPP table for imx7s SoC (Denys Drozdov) and minor fixes (Stefan
     Agner).

   - Fix CPPC driver's freq/performance conversions (Pierre Gondois).

   - Minor generic cleanups (Yury Norov).

   - Introduce opp-microwatt property to the OPP core, bindings, etc
     (Lukasz Luba).

   - Convert DT bindings to schema format and various related fixes
     (Yassine Oudjana).

   - Expose OPP's OF node in debugfs (Viresh Kumar).

   - Add Intel uncore frequency scaling documentation file to its
     MAINTAINERS entry (Srinivas Pandruvada).

   - Clean up the AMD P-state driver documentation (Jan Engelhardt)"

* tag 'pm-5.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits)
  Documentation: amd-pstate: grammar and sentence structure updates
  dt-bindings: cpufreq: cpufreq-qcom-hw: Convert to YAML bindings
  dt-bindings: dvfs: Use MediaTek CPUFREQ HW as an example
  Documentation: EM: Describe new registration method using DT
  OPP: Add support of "opp-microwatt" for EM registration
  PM: EM: add macro to set .active_power() callback conditionally
  OPP: Add "opp-microwatt" supporting code
  dt-bindings: opp: Add "opp-microwatt" entry in the OPP
  MAINTAINERS: Add additional file to uncore frequency control
  cpufreq: blocklist Qualcomm sc8280xp and sa8540p in cpufreq-dt-platdev
  cpufreq: qcom-hw: Add support for per-core-dcvs
  dt-bindings: power: avs: qcom,cpr: Convert to DT schema
  arm64: dts: qcom: qcs404: Rename CPU and CPR OPP tables
  arm64: dts: qcom: msm8996: Rename cluster OPP tables
  dt-bindings: opp: Convert qcom-nvmem-cpufreq to DT schema
  dt-bindings: opp: qcom-opp: Convert to DT schema
  arm64: dts: qcom: msm8996-mtp: Add msm8996 compatible
  dt-bindings: arm: qcom: Add msm8996 and apq8096 compatibles
  opp: Expose of-node's name in debugfs
  cpufreq: CPPC: Fix performance/frequency conversion
  ...

KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated

Setting non-zero values to SYNIC/STIMER MSRs activates certain features,
this should not happen when KVM_CAP_HYPERV_SYNIC{,2} was not activated.

Note, it would've been better to forbid writing anything to SYNIC/STIMER
MSRs, including zeroes, however, at least QEMU tries clearing
HV_X64_MSR_STIMER0_CONFIG without SynIC. HV_X64_MSR_EOM MSR is somewhat
'special' as writing zero there triggers an action, this also should not
happen when SynIC wasn't activated.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <20220325132140 [email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Avoid theoretical NULL pointer dereference in kvm_irq_delivery_to_apic_fast()

When kvm_irq_delivery_to_apic_fast() is called with APIC_DEST_SELF
shorthand, 'src' must not be NULL. Crash the VM with KVM_BUG_ON()
instead of crashing the host.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <20220325132140 [email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Check lapic_in_kernel() before attempting to set a SynIC irq

When KVM_CAP_HYPERV_SYNIC{,2} is activated, KVM already checks for
irqchip_in_kernel() so normally SynIC irqs should never be set. It is,
however, possible for a misbehaving VMM to write to SYNIC/STIMER MSRs
causing erroneous behavior.

The immediate issue being fixed is that kvm_irq_delivery_to_apic()
(kvm_irq_delivery_to_apic_fast()) crashes when called with
'irq.shorthand = APIC_DEST_SELF' and 'src == NULL'.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <20220325132140 [email protected]>
Cc: [email protected]
Signed-off-by: Paolo Bonzini <[email protected]>

Documentation: KVM: add API issues section

Add a section to document all the different ways in which the KVM API sucks.

I am sure there are way more, give people a place to vent so that userspace
authors are aware.

Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <20220322110712 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Documentation: KVM: add virtual CPU errata documentation

Add a file to document all the different ways in which the virtual CPU
emulation is imperfect. Include an example to show how to document
such errata.

Signed-off-by: Paolo Bonzini <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Reviewed-by: Oliver Upton <[email protected]>
Message-Id: <20220322110712 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Documentation: KVM: add separate directories for architecture-specific documentation

ARM already has an arm/ subdirectory, but s390 and x86 do not even though
they have a relatively large number of files specific to them. Create
new directories in Documentation/virt/kvm for these two architectures
as well.

While at it, group the API documentation and the developer documentation
in the table of contents.

Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <20220322110712 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Documentation: kvm: include new locks

kvm->mn_invalidate_lock and kvm->slots_arch_lock were not included in the
documentation, add them.

Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <20220322110720 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Documentation: kvm: fixes for locking.rst

Separate the various locks clearly, and include the new names of blocked_vcpu_on_cpu_lock
and blocked_vcpu_on_cpu.

Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <20220322110720 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: x86: Fix clang -Wimplicit-fallthrough in do_host_cpuid()

Clang warns:

  arch/x86/kvm/cpuid.c:739:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
          default:
          ^
  arch/x86/kvm/cpuid.c:739:2: note: insert 'break;' to avoid fall-through
          default:
          ^
          break;
  1 error generated.

Clang is a little more pedantic than GCC, which does not warn when
falling through to a case that is just break or return. Clang's version
is more in line with the kernel's own stance in deprecated.rst, which
states that all switch/case blocks must end in either break,
fallthrough, continue, goto, or return. Add the missing break to silence
the warning.

Fixes: f144c49e8c39 ("KVM: x86: synthesize CPUID leaf 0x80000021h if useful")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Message-Id: <20220322152906 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge branches 'clk-sifive' and 'clk-visconti' into clk-next

* clk-sifive:
  clk: sifive: Move all stuff into SoCs header files from C files
  clk: sifive: Add SoCs prefix in each SoCs-dependent data
  riscv: dts: Change the macro name of prci in each device node
  dt-bindings: change the macro name of prci in header files and example
  clk: sifive: duplicate the macro definitions for the time being

* clk-visconti:
  clk: visconti: prevent array overflow in visconti_clk_register_gates()

Merge branches 'clk-range', 'clk-uniphier', 'clk-apple' and 'clk-qcom' into clk-next

- Make clk_set_rate_range() re-evaluate the limits each time
- Introduce various clk_set_rate_range() tests
- Add clk_drop_range() to drop a previously set range
- Support for NCO blocks on Apple SoCs

* clk-range:
  clk: Drop the rate range on clk_put()
  clk: test: Test clk_set_rate_range on orphan mux
  clk: Initialize orphan req_rate
  clk: bcm: rpi: Run some clocks at the minimum rate allowed
  clk: bcm: rpi: Set a default minimum rate
  clk: bcm: rpi: Add variant structure
  clk: Add clk_drop_range
  clk: Always set the rate on clk_set_range_rate
  clk: Use clamp instead of open-coding our own
  clk: Always clamp the rounded rate
  clk: Enforce that disjoints limits are invalid
  clk: Introduce Kunit Tests for the framework
  clk: Fix clk_hw_get_clk() when dev is NULL

* clk-uniphier:
  clk: uniphier: Fix fixed-rate initialization

* clk-apple:
  clk: clk-apple-nco: Allow and fix module building
  MAINTAINERS: Add clk-apple-nco under ARM/APPLE MACHINE
  clk: clk-apple-nco: Add driver for Apple NCO
  dt-bindings: clock: Add Apple NCO

* clk-qcom: (61 commits)
  clk: qcom: gcc-msm8994: Fix gpll4 width
  dt-bindings: clock: fix dt_binding_check error for qcom,gcc-other.yaml
  clk: qcom: Add display clock controller driver for SM6125
  dt-bindings: clock: add QCOM SM6125 display clock bindings
  clk: qcom: Fix sorting of SDX_GCC_65 in Makefile and Kconfig
  clk: qcom: gcc: Add emac GDSC support for SM8150
  clk: qcom: gcc: sm8150: Fix some identation issues
  clk: qcom: gcc: Add UFS_CARD and UFS_PHY GDSCs for SM8150
  clk: qcom: gcc: Add PCIe0 and PCIe1 GDSC for SM8150
  clk: qcom: clk-rcg2: Update the frac table for pixel clock
  clk: qcom: clk-rcg2: Update logic to calculate D value for RCG
  clk: qcom: smd: Add missing MSM8998 RPM clocks
  clk: qcom: smd: Add missing RPM clocks for msm8992/4
  dt-bindings: clock: qcom: rpmcc: Add RPM Modem SubSystem (MSS) clocks
  clk: qcom: gcc-ipq806x: add CryptoEngine resets
  dt-bindings: reset: add ipq8064 ce5 resets
  clk: qcom: gcc-ipq806x: add CryptoEngine clocks
  dt-bindings: clock: add ipq8064 ce5 clk define
  clk: qcom: gcc-ipq806x: add additional freq for sdc table
  clk: qcom: clk-rcg: add clk_rcg_floor_ops ops
  ...

Merge branches 'clk-starfive', 'clk-ti', 'clk-terminate' and 'clk-cleanup' into clk-next

- Audio clks on StarFive JH7100 RISC-V SoC
- Terminate arrays with sentinels and make that clearer
- Cleanup SPDX tags
- Fix typos in comments

* clk-starfive:
  clk: starfive: Add JH7100 audio clock driver
  clk: starfive: jh7100: Support more clock types
  clk: starfive: jh7100: Make hw clock implementation reusable
  dt-bindings: clock: Add starfive,jh7100-audclk bindings
  dt-bindings: clock: Add JH7100 audio clock definitions
  clk: starfive: jh7100: Handle audio_div clock properly
  clk: starfive: jh7100: Don't round divisor up twice

* clk-ti:
  clk: ti: Drop legacy compatibility clocks for dra7
  clk: ti: Drop legacy compatibility clocks for am4
  clk: ti: Drop legacy compatibility clocks for am3
  clk: ti: Update component clocks to use ti_dt_clk_name()
  clk: ti: Update pll and clockdomain clocks to use ti_dt_clk_name()
  clk: ti: Add ti_dt_clk_name() helper to use clock-output-names
  clk: ti: Use clock-output-names for clkctrl
  clk: ti: Add ti_find_clock_provider() to use clock-output-names
  clk: ti: Optionally parse IO address from parent clock node
  clk: ti: Preserve node in ti_dt_clocks_register()
  clk: ti: Constify clkctrl_name

* clk-terminate:
  clk: actions: Make sentinel elements more obvious
  clk: clps711x: Terminate clk_div_table with sentinel element
  clk: hisilicon: Terminate clk_div_table with sentinel element
  clk: loongson1: Terminate clk_div_table with sentinel element
  clk: actions: Terminate clk_div_table with sentinel element

* clk-cleanup:
  clk: zynq: Update the parameters to zynq_clk_register_periph_clk
  clk: zynq: trivial warning fix
  clk: qcom: sm6125-gcc: fix typos in comments
  clk: ti: clkctrl: fix typos in comments
  clk: COMMON_CLK_LAN966X should depend on SOC_LAN966
  clk: Use of_device_get_match_data()
  clk: bcm2835: Remove unused variable
  clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver
  clk: cleanup comments
  clk: socfpga: cleanup spdx tags

Merge branches 'clk-mvebu', 'clk-const', 'clk-imx' and 'clk-rockchip' into clk-next

- Mark mux table as const in clk-mux
- Make the all_lists array const

* clk-mvebu:
  clk: mvebu: use time_is_before_eq_jiffies() instead of open coding it

* clk-const:
  clk: Mark clk_core_evict_parent_cache_subtree() 'target' const
  clk: Mark 'all_lists' as const
  clk: pistachio: Declare mux table as const u32[]
  clk: qcom: Declare mux table as const u32[]
  clk: mmp: Declare mux tables as const u32[]
  clk: hisilicon: Remove unnecessary cast of mux table to u32 *
  clk: mux: Declare u32 *table parameter as const
  clk: nxp: Declare mux table parameter as const u32 *
  clk: nxp: Remove unused variable

* clk-imx: (28 commits)
  dt-bindings: clock: drop useless consumer example
  clk: imx: Select MXC_CLK for i.MX93 clock driver
  clk: imx: remove redundant re-assignment of pll->base
  MAINTAINERS: clk: imx: add git tree and dt-bindings files
  clk: imx: pll14xx: Support dynamic rates
  clk: imx: pll14xx: Add pr_fmt
  clk: imx: pll14xx: explicitly return lowest rate
  clk: imx: pll14xx: name variables after usage
  clk: imx: pll14xx: consolidate rate calculation
  clk: imx: pll14xx: Use FIELD_GET/FIELD_PREP
  clk: imx: pll14xx: Drop wrong shifting
  clk: imx: pll14xx: Use register defines consistently
  clk: imx8mp: remove SYS PLL 1/2 clock gates
  clk: imx8mn: remove SYS PLL 1/2 clock gates
  clk: imx8mm: remove SYS PLL 1/2 clock gates
  clk: imx: add i.MX93 clk
  clk: imx: support fracn gppll
  clk: imx: add i.MX93 composite clk
  dt-bindings: clock: add i.MX93 clock definition
  dt-bindings: clock: Add imx93 clock support
  ...

* clk-rockchip:
  clk: rockchip: re-add rational best approximation algorithm to the fractional divider
  clk/rockchip: Use of_device_get_match_data()
  clk: rockchip: Add CLK_SET_RATE_PARENT to the HDMI reference clock on rk3568
  clk: rockchip: drop CLK_SET_RATE_PARENT from dclk_vop* on rk3568
  clk: rockchip: Add more PLL rates for rk3568

Merge branches 'clk-xilinx', 'clk-kunit', 'clk-cs2000' and 'clk-renesas' into clk-next

- Kunit tests for clk-gate implementation
- Convert Cirrus Logic CS2000P driver to regmap, yamlify DT binding and add
   support for dynamic mode

* clk-xilinx:
  clk: zynqmp: replace warn_once with pr_debug for failed clock ops

* clk-kunit:
  clk: gate: Add some kunit test suites

* clk-cs2000:
  clk: cs2000-cp: convert driver to regmap
  clk: cs2000-cp: freeze config during register fiddling
  clk: cs2000-cp: make clock skip setting configurable
  clk: cs2000-cp: add support for dynamic mode
  clk: cs2000-cp: Make aux output function controllable
  dt-bindings: clock: cs2000-cp: document cirrus,dynamic-mode
  dt-bindings: clock: cs2000-cp: document cirrus,clock-skip flag
  dt-bindings: clock: cs2000-cp: document aux-output-source
  dt-bindings: clock: convert cs2000-cp bindings to yaml

* clk-renesas:
  dt-bindings: clock: renesas: Make example 'clocks' parsable
  clk: rs9: Add Renesas 9-series PCIe clock generator driver
  clk: fixed-factor: Introduce devm_clk_hw_register_fixed_factor_index()
  dt-bindings: clk: rs9: Add Renesas 9-series I2C PCIe clock generator
  clk: renesas: r8a779f0: Add PFC clock
  clk: renesas: r8a779f0: Add I2C clocks
  clk: renesas: r8a779f0: Add WDT clock
  clk: renesas: r8a779f0: Fix RSW2 clock divider
  clk: renesas: rzg2l-cpg: Add support for RZ/V2L SoC
  dt-bindings: clock: renesas: Document RZ/V2L SoC
  dt-bindings: clock: Add R9A07G054 CPG Clock and Reset Definitions
  clk: renesas: r8a779a0: Add CANFD module clock
  clk: renesas: r9a07g044: Update multiplier and divider values for PLL2/3
  clk: renesas: r8a7799[05]: Add MLP clocks
  clk: renesas: r8a779f0: Add SYS-DMAC clocks

Merge branches 'clk-microchip', 'clk-si', 'clk-mtk', 'clk-at91' and 'clk-st' into clk-next

- Clock configuration on Microchip PolarFire SoCs
- Free allocations on probe error in Mediatek clk driver
- Modernize Mediatek clk driver by consolidating code

* clk-microchip:
  clk: microchip: Add driver for Microchip PolarFire SoC
  dt-bindings: clk: microchip: Add Microchip PolarFire host binding

* clk-si:
  clk-si5341: replace snprintf in show functions with sysfs_emit
  clk: si5341: fix reported clk_rate when output divider is 2

* clk-mtk: (32 commits)
  clk: mediatek: Warn if clk IDs are duplicated
  clk: mediatek: mt8195: Implement remove functions
  clk: mediatek: mt8195: Implement error handling in probe functions
  clk: mediatek: mt8195: Hook up mtk_clk_simple_remove()
  clk: mediatek: Unregister clks in mtk_clk_simple_probe() error path
  clk: mediatek: mtk: Implement error handling in register APIs
  clk: mediatek: pll: Implement error handling in register API
  clk: mediatek: mux: Implement error handling in register API
  clk: mediatek: mux: Reverse check for existing clk to reduce nesting level
  clk: mediatek: gate: Implement error handling in register API
  clk: mediatek: cpumux: Implement error handling in register API
  clk: mediatek: mtk: Clean up included headers
  clk: mediatek: Add mtk_clk_simple_remove()
  clk: mediatek: Implement mtk_clk_unregister_composites() API
  clk: mediatek: Implement mtk_clk_unregister_divider_clks() API
  clk: mediatek: Implement mtk_clk_unregister_factors() API
  clk: mediatek: Implement mtk_clk_unregister_fixed_clks() API
  clk: mediatek: pll: Clean up included headers
  clk: mediatek: pll: Implement unregister API
  clk: mediatek: pll: Split definitions into separate header file
  ...

* clk-at91:
  clk: at91: clk-master: remove dead code
  clk: at91: sama7g5: fix parents of PDMCs' GCLK
  clk: at91: sama7g5: Allow MCK1 to be exported and referenced in DT
  clk: at91: allow setting PMC_AUDIOPINCK clock parents via DT

* clk-st:
  clk: stm32mp1: Add parent_data to ETHRX clock
  clk: stm32mp1: Split ETHCK_K into separate MUX and GATE clock

clk: zynq: Update the parameters to zynq_clk_register_periph_clk

In case there are only one gate or the two_gate is 0 the clk1 clock
passed is not used. We are passing 0 which is arm_pll.
Pass a invalid clock instead.

Signed-off-by: Shubhrajyoti Datta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stephen Boyd <[email protected]>

clk: zynq: trivial warning fix

Fix the below warning

WARNING: Missing a blank line after declarations
+ int enable = !!(fclk_enable & BIT(i - fclk0));
+ zynq_clk_register_fclk(i, clk_output_name[i],

Signed-off-by: Shubhrajyoti Datta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stephen Boyd <[email protected]>

Revert "KVM: set owner of cpu and vm file operations"

This reverts commit 3d3aab1b973b01bd2a1aa46307e94a1380b1d802.

Now that the KVM module's lifetime is tied to kvm.users_count, there is
no need to also tie it's lifetime to the lifetime of the VM and vCPU
file descriptors.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Message-Id: <20220303183328.1499189 [email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: Prevent module exit until all VMs are freed

Tie the lifetime the KVM module to the lifetime of each VM via
kvm.users_count. This way anything that grabs a reference to the VM via
kvm_get_kvm() cannot accidentally outlive the KVM module.

Prior to this commit, the lifetime of the KVM module was tied to the
lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
file descriptors by their respective file_operations "owner" field.
This approach is insufficient because references grabbed via
kvm_get_kvm() do not prevent closing any of the aforementioned file
descriptors.

This fixes a long standing theoretical bug in KVM that at least affects
async page faults. kvm_setup_async_pf() grabs a reference via
kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
prevents the VM file descriptor from being closed and the KVM module
from being unloaded before this callback runs.

Fixes: af585b921e5d ("KVM: Halt vcpu if page it tries to access is swapped out")
Fixes: 3d3aab1b973b ("KVM: set owner of cpu and vm file operations")
Cc: [email protected]
Suggested-by: Ben Gardon <[email protected]>
[ Based on a patch from Ben implemented for Google's kernel. ]
Signed-off-by: David Matlack <[email protected]>
Message-Id: <20220303183328.1499189 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge branch 'pm-docs'

Merge additional power management documentation udates for 5.18-rc1:

- Add Intel uncore frequency scaling documentation file to its
   MAINTAINERS entry (Srinivas Pandruvada).

- Clean up the AMD P-state driver documentation (Jan Engelhardt).

* pm-docs:
  Documentation: amd-pstate: grammar and sentence structure updates
  MAINTAINERS: Add additional file to uncore frequency control

Merge branch 'pm-opp'

Merge OPP (Operating Performance Points) changes for 5.18-rc1.

* pm-opp:
  Documentation: EM: Describe new registration method using DT
  OPP: Add support of "opp-microwatt" for EM registration
  PM: EM: add macro to set .active_power() callback conditionally
  OPP: Add "opp-microwatt" supporting code
  dt-bindings: opp: Add "opp-microwatt" entry in the OPP
  dt-bindings: power: avs: qcom,cpr: Convert to DT schema
  arm64: dts: qcom: qcs404: Rename CPU and CPR OPP tables
  arm64: dts: qcom: msm8996: Rename cluster OPP tables
  dt-bindings: opp: Convert qcom-nvmem-cpufreq to DT schema
  dt-bindings: opp: qcom-opp: Convert to DT schema
  arm64: dts: qcom: msm8996-mtp: Add msm8996 compatible
  dt-bindings: arm: qcom: Add msm8996 and apq8096 compatibles
  opp: Expose of-node's name in debugfs

io_uring: fail links if msg-ring doesn't succeeed

We must always call req_set_fail() if the request is failed, otherwise
we won't sever links for dependent chains correctly.

Fixes: 4f57f06ce218 ("io_uring: add support for IORING_OP_MSG_RING command")
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'devicetree-fixes-for-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Clean-up missing '/schemas' in $ref paths

- Fix MediaTek Vcodec decoder example 'dma-ranges' errors

- Expand available values of PBL for snps,dwmac to fix warnings in
   mediatek-dwmac.yaml example

- Fix warnings in MediaTek display bindings

* tag 'devicetree-fixes-for-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: Fix missing '/schemas' in $ref paths
  dt-bindings: media: mediatek,vcodec: Fix addressing cell sizes
  dt-bindings: net: snps,dwmac: modify available values of PBL
  dt-bindings: display: mediatek: Fix examples on new bindings
  dt-bindings: display: mediatek, ovl: Fix 'iommu' required property typo
  dt-bindings: display: mediatek, mutex: Fix mediatek, gce-events type
  Revert "dt-bindings: display: mediatek: add ethdr definition for mt8195"

Merge tag 'dma-mapping-5.18' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

- do not zero buffer in set_memory_decrypted (Kirill A. Shutemov)

- fix return value of dma-debug __setup handlers (Randy Dunlap)

- swiotlb cleanups (Robin Murphy)

- remove most remaining users of the pci-dma-compat.h API
   (Christophe JAILLET)

- share the ABI header for the DMA map_benchmark with userspace
   (Tian Tao)

- update the maintainer for DMA MAPPING BENCHMARK (Xiang Chen)

- remove CONFIG_DMA_REMAP (me)

* tag 'dma-mapping-5.18' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: benchmark: extract a common header file for map_benchmark definition
  dma-debug: fix return value of __setup handlers
  dma-mapping: remove CONFIG_DMA_REMAP
  media: v4l2-pci-skeleton: Remove usage of the deprecated "pci-dma-compat.h" API
  rapidio/tsi721: Remove usage of the deprecated "pci-dma-compat.h" API
  sparc: Remove usage of the deprecated "pci-dma-compat.h" API
  agp/intel: Remove usage of the deprecated "pci-dma-compat.h" API
  alpha: Remove usage of the deprecated "pci-dma-compat.h" API
  MAINTAINERS: update maintainer list of DMA MAPPING BENCHMARK
  swiotlb: simplify array allocation
  swiotlb: tidy up includes
  swiotlb: simplify debugfs setup
  swiotlb: do not zero buffer in set_memory_decrypted()

phy: PHY_FSL_LYNX_28G should depend on ARCH_LAYERSCAPE

Freescale Layerscape Lynx 28G SerDes PHYs are only present on
Freescale/NXP Layerscape SoCs.

Move PHY_FSL_LYNX_28G outside the block for ARCH_MXC, as the latter
is meant for i.MX8 SoCs, which is a different family than Layerscape.
Add a dependency on ARCH_LAYERSCAPE, to prevent asking the user about
this driver when configuring a kernel without Layerscape SoC support.

Fixes: 02e2af20f4f9f2aa ("Merge tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc")
Fixes: 8f73b37cf3fbda67 ("phy: add support for the Layerscape SerDes 28G")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Revert "parisc: Fix invalidate/flush vmap routines"

This reverts commit 53d862fac4a09b9c56cca0433fa9de5732fd05a1.

It turned out that flush_kernel_vmap_range() is being called with
interrupts disabled. There's no way to flush entire cache with
interrupts disabled.

Signed-off-by: Helge Deller <[email protected]>

x86/sev: Unroll string mmio with CC_ATTR_GUEST_UNROLL_STRING_IO

The io-specific memcpy/memset functions use string mmio accesses to do
their work. Under SEV, the hypervisor can't emulate these instructions
because they read/write directly from/to encrypted memory.

KVM will inject a page fault exception into the guest when it is asked
to emulate string mmio instructions for an SEV guest:

  BUG: unable to handle page fault for address: ffffc90000065068
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 8000100000067 P4D 8000100000067 PUD 80001000fb067 PMD 80001000fc067 PTE 80000000fed40173
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc7 #3

As string mmio for an SEV guest can not be supported by the
hypervisor, unroll the instructions for CC_ATTR_GUEST_UNROLL_STRING_IO
enabled kernels.

This issue appears when kernels are launched in recent libvirt-managed
SEV virtual machines, because virt-install started to add a tpm-crb
device to the guest by default and proactively because, raisins:

  https://github.com/virt-manager/virt-manager/commit/eb58c09f488b0633ed1eea012cd311e48864401e

and as that commit says, the default adding of a TPM can be disabled
with "virt-install ... --tpm none".

The kernel driver for tpm-crb uses memcpy_to/from_io() functions to
access MMIO memory, resulting in a page-fault injected by KVM and
crashing the kernel at boot.

  [ bp: Massage and extend commit message. ]

Fixes: d8aa7eea78a1 ('x86/mm: Add Secure Encrypted Virtualization (SEV) support')
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'nvme-5.18-2022-03-29' of git://git.infradead.org/nvme into for-5.18/drivers

Pull NVMe fixes from Christoph:

"- fix multipath hang when disk goes live over reconnect (Anton Eidelman)
- fix RCU hole that allowed for endless looping in multipath round robin
   (Chris Leech)
- remove redundant assignment after left shift (Colin Ian King)
- add quirks for Samsung X5 SSDs (Monish Kumar R)
- fix the read-only state for zoned namespaces with unsupposed features
   (Pankaj Raghav)
- use a private workqueue instead of the system workqueue in nvmet
   (Sagi Grimberg)
- allow duplicate NSIDs for private namespaces (Sungup Moon)
- expose use_threaded_interrupts read-only in sysfs (Xin Hao)"

* tag 'nvme-5.18-2022-03-29' of git://git.infradead.org/nvme:
  nvme-multipath: fix hang when disk goes live over reconnect
  nvme: fix RCU hole that allowed for endless looping in multipath round robin
  nvme: allow duplicate NSIDs for private namespaces
  nvmet: remove redundant assignment after left shift
  nvmet: use a private workqueue instead of the system workqueue
  nvme-pci: add quirks for Samsung X5 SSDs
  nvme-pci: expose use_threaded_interrupts read-only in sysfs
  nvme: fix the read-only state for zoned namespaces with unsupposed features

net: lan966x: fix kernel oops on ioctl when I/F is down

ioctls handled by phy_mii_ioctl() will cause a kernel oops when the
interface is down. Fix it by making sure there is a PHY attached.

Fixes: 735fec995b21 ("net: lan966x: Implement SIOCSHWTSTAMP and SIOCGHWTSTAMP")
Signed-off-by: Michael Walle <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'fix-uaf-bugs-caused-by-ax25_release'

Duoming Zhou says:

====================
Fix UAF bugs caused by ax25_release()

The first patch fixes UAF bugs in ax25_send_control, and
the second patch fixes UAF bugs in ax25 timers.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

ax25: Fix UAF bugs in ax25 timers

There are race conditions that may lead to UAF bugs in
ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(),
ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we call
ax25_release() to deallocate ax25_dev.

One of the UAF bugs caused by ax25_release() is shown below:

      (Thread 1)                    |      (Thread 2)
ax25_dev_device_up() //(1)          |
...                                 | ax25_kill_by_device()
ax25_bind()          //(2)          |
ax25_connect()                      | ...
ax25_std_establish_data_link()     |
  ax25_start_t1timer()              | ax25_dev_device_down() //(3)
   mod_timer(&ax25->t1timer,..)     |
                                    | ax25_release()
   (wait a time)                    |  ...
                                    |  ax25_dev_put(ax25_dev) //(4)FREE
   ax25_t1timer_expiry()            |
    ax25->ax25_dev->values[..] //USE|  ...
     ...                            |

We increase the refcount of ax25_dev in position (1) and (2), and
decrease the refcount of ax25_dev in position (3) and (4).
The ax25_dev will be freed in position (4) and be used in
ax25_t1timer_expiry().

The fail log is shown below:
==============================================================

[  106.116942] BUG: KASAN: use-after-free in ax25_t1timer_expiry+0x1c/0x60
[  106.116942] Read of size 8 at addr ffff88800bda9028 by task swapper/0/0
[  106.116942] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-06123-g0905eec574
[  106.116942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-14
[  106.116942] Call Trace:
...
[  106.116942]  ax25_t1timer_expiry+0x1c/0x60
[  106.116942]  call_timer_fn+0x122/0x3d0
[  106.116942]  __run_timers.part.0+0x3f6/0x520
[  106.116942]  run_timer_softirq+0x4f/0xb0
[  106.116942]  __do_softirq+0x1c2/0x651
...

This patch adds del_timer_sync() in ax25_release(), which could ensure
that all timers stop before we deallocate ax25_dev.

Signed-off-by: Duoming Zhou <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

ax25: fix UAF bug in ax25_send_control()

There are UAF bugs in ax25_send_control(), when we call ax25_release()
to deallocate ax25_dev. The possible race condition is shown below:

      (Thread 1)              |     (Thread 2)
ax25_dev_device_up() //(1)    |
                              | ax25_kill_by_device()
ax25_bind()          //(2)    |
ax25_connect()                | ...
ax25->state = AX25_STATE_1   |
...                          | ax25_dev_device_down() //(3)

      (Thread 3)
ax25_release()                |
ax25_dev_put()  //(4) FREE   |
case AX25_STATE_1:           |
  ax25_send_control()         |
   alloc_skb()       //USE    |

The refcount of ax25_dev increases in position (1) and (2), and
decreases in position (3) and (4). The ax25_dev will be freed
before dereference sites in ax25_send_control().

The following is part of the report:

[  102.297448] BUG: KASAN: use-after-free in ax25_send_control+0x33/0x210
[  102.297448] Read of size 8 at addr ffff888009e6e408 by task ax25_close/602
[  102.297448] Call Trace:
[  102.303751]  ax25_send_control+0x33/0x210
[  102.303751]  ax25_release+0x356/0x450
[  102.305431]  __sock_release+0x6d/0x120
[  102.305431]  sock_close+0xf/0x20
[  102.305431]  __fput+0x11f/0x420
[  102.305431]  task_work_run+0x86/0xd0
[  102.307130]  get_signal+0x1075/0x1220
[  102.308253]  arch_do_signal_or_restart+0x1df/0xc00
[  102.308253]  exit_to_user_mode_prepare+0x150/0x1e0
[  102.308253]  syscall_exit_to_user_mode+0x19/0x50
[  102.308253]  do_syscall_64+0x48/0x90
[  102.308253]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  102.308253] RIP: 0033:0x405ae7

This patch defers the free operation of ax25_dev and net_device after
all corresponding dereference sites in ax25_release() to avoid UAF.

Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()")
Signed-off-by: Duoming Zhou <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

openvswitch: Fixed nd target mask field in the flow dump.

IPv6 nd target mask was not getting populated in flow dump.

In the function __ovs_nla_put_key the icmp code mask field was checked
instead of icmp code key field to classify the flow as neighbour discovery.

ufid:bdfbe3e5-60c2-43b0-a5ff-dfcac1c37328, recirc_id(0),dp_hash(0/0),
skb_priority(0/0),in_port(ovs-nm1),skb_mark(0/0),ct_state(0/0),
ct_zone(0/0),ct_mark(0/0),ct_label(0/0),
eth(src=00:00:00:00:00:00/00:00:00:00:00:00,
dst=00:00:00:00:00:00/00:00:00:00:00:00),
eth_type(0x86dd),
ipv6(src=::/::,dst=::/::,label=0/0,proto=58,tclass=0/0,hlimit=0/0,frag=no),
icmpv6(type=135,code=0),
nd(target=2001::2/::,
sll=00:00:00:00:00:00/00:00:00:00:00:00,
tll=00:00:00:00:00:00/00:00:00:00:00:00),
packets:10, bytes:860, used:0.504s, dp:ovs, actions:ovs-nm2

Fixes: e64457191a25 (openvswitch: Restructure datapath.c and flow.c)
Signed-off-by: Martin Varghese <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

nvme-multipath: fix hang when disk goes live over reconnect

nvme_mpath_init_identify() invoked from nvme_init_identify() fetches a
fresh ANA log from the ctrl.  This is essential to have an up to date
path states for both existing namespaces and for those scan_work may
discover once the ctrl is up.

This happens in the following cases:
  1) A new ctrl is being connected.
  2) An existing ctrl is successfully reconnected.
  3) An existing ctrl is being reset.

While in (1) ctrl->namespaces is empty, (2 & 3) may have namespaces, and
nvme_read_ana_log() may call nvme_update_ns_ana_state().

This result in a hang when the ANA state of an existing namespace changes
and makes the disk live: nvme_mpath_set_live() issues IO to the namespace
through the ctrl, which does NOT have IO queues yet.

See sample hang below.

Solution:
- nvme_update_ns_ana_state() to call set_live only if ctrl is live
- nvme_read_ana_log() call from nvme_mpath_init_identify()
  therefore only fetches and parses the ANA log;
  any erros in this process will fail the ctrl setup as appropriate;
- a separate function nvme_mpath_update()
  is called in nvme_start_ctrl();
  this parses the ANA log without fetching it.
  At this point the ctrl is live,
  therefore, disks can be set live normally.

Sample failure:
    nvme nvme0: starting error recovery
    nvme nvme0: Reconnecting in 10 seconds...
    block nvme0n6: no usable path - requeuing I/O
    INFO: task kworker/u8:3:312 blocked for more than 122 seconds.
          Tainted: G            E     5.14.5-1.el7.elrepo.x86_64 #1
    Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
    Call Trace:
     __schedule+0x2a2/0x7e0
     schedule+0x4e/0xb0
     io_schedule+0x16/0x40
     wait_on_page_bit_common+0x15c/0x3e0
     do_read_cache_page+0x1e0/0x410
     read_cache_page+0x12/0x20
     read_part_sector+0x46/0x100
     read_lba+0x121/0x240
     efi_partition+0x1d2/0x6a0
     bdev_disk_changed.part.0+0x1df/0x430
     bdev_disk_changed+0x18/0x20
     blkdev_get_whole+0x77/0xe0
     blkdev_get_by_dev+0xd2/0x3a0
     __device_add_disk+0x1ed/0x310
     device_add_disk+0x13/0x20
     nvme_mpath_set_live+0x138/0x1b0 [nvme_core]
     nvme_update_ns_ana_state+0x2b/0x30 [nvme_core]
     nvme_update_ana_state+0xca/0xe0 [nvme_core]
     nvme_parse_ana_log+0xac/0x170 [nvme_core]
     nvme_read_ana_log+0x7d/0xe0 [nvme_core]
     nvme_mpath_init_identify+0x105/0x150 [nvme_core]
     nvme_init_identify+0x2df/0x4d0 [nvme_core]
     nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core]
     nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp]
     nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
     process_one_work+0x1bd/0x360
     worker_thread+0x50/0x3d0

Signed-off-by: Anton Eidelman <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme: fix RCU hole that allowed for endless looping in multipath round robin

Make nvme_ns_remove match the assumptions elsewhere.

1) !NVME_NS_READY needs to be srcu synchronized to make sure nothing is
   running in __nvme_find_path or nvme_round_robin_path that will
   re-assign this ns to current_path.

2) Any matching current_path entries need to be cleared before removing
   from the siblings list, to prevent calling nvme_round_robin_path with
   an "old" ns that's off list.

3) Finally the list_del_rcu can happen, and then synchronize again
   before releasing any reference counts.

Signed-off-by: Christoph Hellwig <[email protected]>

nvme: allow duplicate NSIDs for private namespaces

A NVMe subsystem with multiple controller can have private namespaces
that use the same NSID under some conditions:

"If Namespace Management, ANA Reporting, or NVM Sets are supported, the
  NSIDs shall be unique within the NVM subsystem. If the Namespace
  Management, ANA Reporting, and NVM Sets are not supported, then NSIDs:
   a) for shared namespace shall be unique; and
   b) for private namespace are not required to be unique."

Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec.

Make sure this specific setup is supported in Linux.

Fixes: 9ad1927a3bc2 ("nvme: always search for namespace head")
Signed-off-by: Sungup Moon <[email protected]>
[hch: refactored and fixed the controller vs subsystem based naming
      conflict]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>

nvmet: remove redundant assignment after left shift

The left shift is followed by a re-assignment back to cc_css, the
assignment is redundant. Fix this by replacing the "<<=" operator with
"<<" instead.

This cleans up the clang scan build warning:

drivers/nvme/target/core.c:1124:10: warning: Although the value stored to 'cc_css' is used in the enclosing expression, the value is never actually read from 'cc_css' [deadcode.DeadStores]

Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvmet: use a private workqueue instead of the system workqueue

Any attempt to flush kernel-global WQs has possibility of deadlock
so we should simply stop using them, instead introduce nvmet_wq
which is the generic nvmet workqueue for work elements that
don't explicitly require a dedicated workqueue (by the mere fact
that they are using the system_wq).

Changes were done using the following replaces:

- s/schedule_work(/queue_work(nvmet_wq, /g
- s/schedule_delayed_work(/queue_delayed_work(nvmet_wq, /g
- s/flush_scheduled_work()/flush_workqueue(nvmet_wq)/g

Reported-by: Tetsuo Handa <[email protected]>
Signed-off-by: Sagi Grimberg <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

selftests/bpf: Fix clang compilation errors

llvm upstream patch ([1]) added to issue warning for code like
  void test() {
    int j = 0;
    for (int i = 0; i < 1000; i++)
            j++;
    return;
  }

This triggered several errors in selftests/bpf build since
compilation flag -Werror is used.
  ...
  test_lpm_map.c:212:15: error: variable 'n_matches' set but not used [-Werror,-Wunused-but-set-variable]
        size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups;
                     ^
  test_lpm_map.c:212:26: error: variable 'n_matches_after_delete' set but not used [-Werror,-Wunused-but-set-variable]
        size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups;
                                ^
  ...
  prog_tests/get_stack_raw_tp.c:32:15: error: variable 'cnt' set but not used [-Werror,-Wunused-but-set-variable]
        static __u64 cnt;
                     ^
  ...

  For test_lpm_map.c, 'n_matches'/'n_matches_after_delete' are changed to be volatile
  in order to silent the warning. I didn't remove these two declarations since
  they are referenced in a commented code which might be used by people in certain
  cases. For get_stack_raw_tp.c, the variable 'cnt' is removed.

  [1] https://reviews.llvm.org/D122271

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Merge branch 'xsk: another round of fixes'

Maciej Fijalkowski says:

====================

Hello,

yet another fixes for XSK from Magnus and me.

Magnus addresses the fact that xp_alloc() can return NULL, so this needs
to be handled to avoid clearing entries in the SW ring on driver side.
Then he addresses the off-by-one problem in Tx desc cleaning routine for
ice ZC driver.

From my side, I am adding protection to ZC Rx processing loop so that
cleaning of descriptors wouldn't go over already processed entries.
Then I also fix an issue with assigning XSK pool to Tx queues.

This is directed to bpf tree.

Thanks!

Maciej Fijalkowski (2):
ice: xsk: stop Rx processing when ntc catches ntu
ice: xsk: fix indexing in ice_tx_xsk_pool()
====================

Acked-by: Alexander Lobakin <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

ice: xsk: Fix indexing in ice_tx_xsk_pool()

Ice driver tries to always create XDP rings array to be
num_possible_cpus() sized, regardless of user's queue count setting that
can be changed via ethtool -L for example.

Currently, ice_tx_xsk_pool() calculates the qid by decrementing the
ring->q_index by the count of XDP queues, but ring->q_index is set to 'i
+ vsi->alloc_txq'.

When user did ethtool -L $IFACE combined 1, alloc_txq is 1, but
vsi->num_xdp_txq is still num_possible_cpus(). Then, ice_tx_xsk_pool()
will do OOB access and in the final result ring would not get xsk_pool
pointer assigned. Then, each ice_xsk_wakeup() call will fail with error
and it will not be possible to get into NAPI and do the processing from
driver side.

Fix this by decrementing vsi->alloc_txq instead of vsi->num_xdp_txq from
ring-q_index in ice_tx_xsk_pool() so the calculation is reflected to the
setting of ring->q_index.

Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

ice: xsk: Stop Rx processing when ntc catches ntu

This can happen with big budget values and some breakage of re-filling
descriptors as we do not clear the entry that ntu is pointing at the end
of ice_alloc_rx_bufs_zc. So if ntc is at ntu then it might be the case
that status_error0 has an old, uncleared value and ntc would go over
with processing which would result in false results.

Break Rx loop when ntc == ntu to avoid broken behavior.

Fixes: 3876ff525de7 ("ice: xsk: Handle SW XDP ring wrap and bump tail more often")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

ice: xsk: Eliminate unnecessary loop iteration

The NIC Tx ring completion routine cleans entries from the ring in
batches. However, it processes one more batch than it is supposed
to. Note that this does not matter from a functionality point of view
since it will not find a set DD bit for the next batch and just exit
the loop. But from a performance perspective, it is faster to
terminate the loop before and not issue an expensive read over PCIe to
get the DD bit.

Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API")
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

xsk: Do not write NULL in SW ring at allocation failure

For the case when xp_alloc_batch() is used but the batched allocation
cannot be used, there is a slow path that uses the non-batched
xp_alloc(). When it fails to allocate an entry, it returns NULL. The
current code wrote this NULL into the entry of the provided results
array (pointer to the driver SW ring usually) and returned. This might
not be what the driver expects and to make things simpler, just write
successfully allocated xdp_buffs into the SW ring,. The driver might
have information in there that is still important after an allocation
failure.

Note that at this point in time, there are no drivers using
xp_alloc_batch() that could trigger this slow path. But one might get
added.

Fixes: 47e4075df300 ("xsk: Batched buffer allocation for the pool")
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Merge branch 'kprobes: rethook: x86: Replace kretprobe trampoline with rethook'

Masami Hiramatsu says:

====================
Here are the 3rd version for generic kretprobe and kretprobe on x86 for
replacing the kretprobe trampoline with rethook. The previous version
is here[1]

[1] https://lore.kernel.org/all/164821817332.2373735.12048266953420821089.stgit@devnote2/T/#u

This version fixed typo and build issues for bpf-next and CONFIG_RETHOOK=y
error. I also add temporary mitigation lines for ANNOTATE_NOENDBR macro
issue for bpf-next tree [2/4].

This will be removed after merging kernel IBT series.

Background:

This rethook came from Jiri's request of multiple kprobe for bpf[2].
He tried to solve an issue that starting bpf with multiple kprobe will
take a long time because bpf-kprobe will wait for RCU grace period for
sync rcu events.

Jiri wanted to attach a single bpf handler to multiple kprobes and
he tried to introduce multiple-probe interface to kprobe. So I asked
him to use ftrace and kretprobe-like hook if it is only for the
function entry and exit, instead of adding ad-hoc interface
to kprobes.
For this purpose, I introduced the fprobe (kprobe like interface for
ftrace) with the rethook (this is a generic return hook feature for
fprobe exit handler)[3].

[2] https://lore.kernel.org/all/20220104080943 [email protected]/T/#u
[3] https://lore.kernel.org/all/164191321766.806991.7930388561276940676.stgit@devnote2/T/#u

The rethook is basically same as the kretprobe trampoline. I just made
it decoupled from kprobes. Eventually, the all arch dependent kretprobe
trampolines will be replaced with the rethook trampoline instead of
cloning and set HAVE_RETHOOK=y.
When I port the rethook for all arch which supports kretprobe, the
legacy kretprobe specific code (which is for CONFIG_KRETPROBE_ON_RETHOOK=n)
will be removed eventually.
====================

Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

x86,kprobes: Fix optprobe trampoline to generate complete pt_regs

Currently the optprobe trampoline template code ganerate an
almost complete pt_regs on-stack, everything except regs->ss.
The 'regs->ss' points to the top of stack, which is not a
valid segment decriptor.

As same as the rethook does, complete the job by also pushing ss.

Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/164826166027.2455864.14759128090648961900.stgit@devnote2

x86,rethook: Fix arch_rethook_trampoline() to generate a complete pt_regs

Currently arch_rethook_trampoline() generates an almost complete
pt_regs on-stack, everything except regs->ss that is, that currently
points to the fake return address, which is not a valid segment
descriptor.

Since interpretation of regs->[sb]p should be done in the context of
regs->ss, and we have code actually doing that (see
arch/x86/lib/insn-eval.c for instance), complete the job by also
pushing ss.

This ensures that anybody who does do look at regs->ss doesn't
mysteriously malfunction, avoiding much future pain.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Link: https://lore.kernel.org/bpf/164826164851.2455864.17272661073069737350.stgit@devnote2

x86,rethook,kprobes: Replace kretprobe with rethook on x86

Replaces the kretprobe code with rethook on x86. With this patch,
kretprobe on x86 uses the rethook instead of kretprobe specific
trampoline code.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/bpf/164826163692.2455864.13745421016848209527.stgit@devnote2

kprobes: Use rethook for kretprobe if possible

Use rethook for kretprobe function return hooking if the arch sets
CONFIG_HAVE_RETHOOK=y. In this case, CONFIG_KRETPROBE_ON_RETHOOK is
set to 'y' automatically, and the kretprobe internal data fields
switches to use rethook. If not, it continues to use kretprobe
specific function return hooks.

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/164826162556.2455864.12255833167233452047.stgit@devnote2

bpftool: Fix generated code in codegen_asserts

Arnaldo reported perf compilation fail with:

  $ make -k BUILD_BPF_SKEL=1 CORESIGHT=1 PYTHON=python3
  ...
  In file included from util/bpf_counter.c:28:
  /tmp/build/perf//util/bpf_skel/bperf_leader.skel.h: In function ‘bperf_leader_bpf__assert’:
  /tmp/build/perf//util/bpf_skel/bperf_leader.skel.h:351:51: error: unused parameter ‘s’ [-Werror=unused-parameter]
    351 | bperf_leader_bpf__assert(struct bperf_leader_bpf *s)
        |                          ~~~~~~~~~~~~~~~~~~~~~~~~~^
  cc1: all warnings being treated as errors

If there's nothing to generate in the new assert function,
we will get unused 's' warn/error, adding 'unused' attribute to it.

Fixes: 08d4dba6ae77 ("bpftool: Bpf skeletons assert type sizes")
Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

selftests/bpf: fix selftest after random: Urandom_read tracepoint removal

14c174633f34 ("random: remove unused tracepoints") removed all the
tracepoints from drivers/char/random.c, one of which,
random:urandom_read, was used by stacktrace_build_id selftest to trigger
stack trace capture.

Fix breakage by switching to kprobing urandom_read() function.

Suggested-by: Yonghong Song <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Fix maximum permitted number of arguments check

Since the m->arg_size array can hold up to MAX_BPF_FUNC_ARGS argument
sizes, it's ok that nargs is equal to MAX_BPF_FUNC_ARGS.

Signed-off-by: Yuntao Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Sync comments for bpf_get_stack

Commit ee2a098851bf missed updating the comments for helper bpf_get_stack
in tools/include/uapi/linux/bpf.h. Sync it.

Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0")
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/ce54617746b7ed5e9ba3b844e55e74cb8a60e0b5.1648110794.git.geliang.tang@suse.com

Merge branch 'fprobe: Fixes for Sparse and Smatch warnings'

Masami Hiramatsu says:

====================

Hi,

These fprobe patches are for fixing the warnings by Smatch and sparse.
This is arch independent part of the fixes.

Thank you,
---
====================

Signed-off-by: Alexei Starovoitov <[email protected]>

fprobe: Fix sparse warning for acccessing __rcu ftrace_hash

Since ftrace_ops::local_hash::filter_hash field is an __rcu pointer,
we have to use rcu_access_pointer() to access it.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/164802093635.1732982.4938094876018890866.stgit@devnote2

fprobe: Fix smatch type mismatch warning

Fix the type mismatching warning of 'rethook_node vs fprobe_rethook_node'
found by Smatch.

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/164802092611.1732982.12268174743437084619.stgit@devnote2

bpf/bpftool: Add unprivileged_bpf_disabled check against value of 2

In [1], we added a kconfig knob that can set
/proc/sys/kernel/unprivileged_bpf_disabled to 2

We now check against this value in bpftool feature probe

[1] https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074 [email protected]

Signed-off-by: Milan Landaverde <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Quentin Monnet <[email protected]>
Acked-by: KP Singh <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

dt-bindings: Fix missing '/schemas' in $ref paths

Absolute paths in $ref should always begin with '/schemas'. The tools
mostly work with it omitted, but for correctness the path should be
everything except the hostname as that is taken from the schema's $id
value. This scheme is defined in the json-schema spec.

Cc: Hector Martin <[email protected]>
Cc: Sven Peter <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Vivien Didelot <[email protected]>
Cc: Florian Fainelli <[email protected]>
Cc: Vladimir Oltean <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Chunfeng Yun <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Mukesh Savaliya <[email protected]>
Cc: Akash Asthana <[email protected]>
Cc: Bayi Cheng <[email protected]>
Cc: Chuanhong Guo <[email protected]>
Cc: Min Guo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Rob Herring <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Acked-by: Mark Brown <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

dt-bindings: media: mediatek,vcodec: Fix addressing cell sizes

'dma-ranges' in the example is written for cell sizes of 2 cells, but
the schema and example specify sizes of 1 cell. As the h/w has a bus
address of >32-bits, cell sizes of 2 is correct. Update the schema's
'#address-cells' and '#size-cells' to be 2 and adjust the example
throughout.

There's no error currently because dtc only checks 'dma-ranges' is a
correct multiple number of cells (3) and the schema checking is based on
bracketing of entries.

Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

dt-bindings: net: snps,dwmac: modify available values of PBL

PBL can be any of the following values: 1, 2, 4, 8, 16 or 32
according to the datasheet, so modify available values of PBL in
snps,dwmac.yaml.

Signed-off-by: Biao Huang <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

dt-bindings: display: mediatek: Fix examples on new bindings

To avoid failure of dt_binding_check perform a slight refactoring
of the examples: the main block is kept, but that required fixing
the address and size cells, plus the inclusion of missing dt-bindings
headers, required to parse some of the values assigned to various
properties.

Fixes: 4ed545e7d100 ("dt-bindings: display: mediatek: disp: split each block to individual yaml")
Signed-off-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: jason-jh.lin <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Acked-by: Chun-Kuang Hu <[email protected]>
Tested-by: jason-jh.lin <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

dt-bindings: display: mediatek, ovl: Fix 'iommu' required property typo

The property is called 'iommus' and not 'iommu'. Fix this typo.

Fixes: 4ed545e7d100 ("dt-bindings: display: mediatek: disp: split each block to individual yaml")
Signed-off-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: jason-jh.lin <[email protected]>
Acked-by: Rob Herring <[email protected]>
Acked-by: Chun-Kuang Hu <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

dt-bindings: display: mediatek, mutex: Fix mediatek, gce-events type

The mediatek,gce-events property needs as value an array of uint32
corresponding to the CMDQ events to listen to, and not any phandle.

Fixes: 4ed545e7d100 ("dt-bindings: display: mediatek: disp: split each block to individual yaml")
Signed-off-by: AngeloGioacchino Del Regno <[email protected]>
Signed-off-by: jason-jh.lin <[email protected]>
Acked-by: Rob Herring <[email protected]>
Acked-by: Chun-Kuang Hu <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Revert "dt-bindings: display: mediatek: add ethdr definition for mt8195"

This reverts commit e7dcfe64204a5cd9a74a9ca7d9c7a22434dc7fe5.

Because examples property of mediatek,ethdr.yaml should base on [1][2].
Reverting it until [1][2] are applied.

[1] dt-bindings: mediatek: mt8195: Add binding for MM IOMMU
https://patchwork.kernel.org/project/linux-mediatek/patch/20220217113453 [email protected]/
[2] dt-bindings: reset: mt8195: add vdosys1 reset control bit
https://patchwork.kernel.org/project/linux-mediatek/patch/20220222100741 [email protected]/

Signed-off-by: jason-jh.lin <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull ptrace cleanups from Eric Biederman:
"This set of changes removes tracehook.h, moves modification of all of
  the ptrace fields inside of siglock to remove races, adds a missing
  permission check to ptrace.c

  The removal of tracehook.h is quite significant as it has been a major
  source of confusion in recent years. Much of that confusion was around
  task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
  semantics clearer).

  For people who don't know tracehook.h is a vestiage of an attempt to
  implement uprobes like functionality that was never fully merged, and
  was later superseeded by uprobes when uprobes was merged. For many
  years now we have been removing what tracehook functionaly a little
  bit at a time. To the point where anything left in tracehook.h was
  some weird strange thing that was difficult to understand"

* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ptrace: Remove duplicated include in ptrace.c
  ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
  ptrace: Return the signal to continue with from ptrace_stop
  ptrace: Move setting/clearing ptrace_message into ptrace_stop
  tracehook: Remove tracehook.h
  resume_user_mode: Move to resume_user_mode.h
  resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
  signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
  task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
  task_work: Call tracehook_notify_signal from get_signal on all architectures
  task_work: Introduce task_work_pending
  task_work: Remove unnecessary include from posix_timers.h
  ptrace: Remove tracehook_signal_handler
  ptrace: Remove arch_syscall_{enter,exit}_tracehook
  ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
  ptrace/arm: Rename tracehook_report_syscall report_syscall
  ptrace: Move ptrace_report_syscall into ptrace.h

Merge tag 'ucount-rlimit-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull shm ucounts fix from Eric Biederman:
"The introduction of a new failure mode when the code was converted to
  ucounts resulted in user_shm_lock misbehaving.

  The change simplifies the code to make the code easier to follow and
  removes the known misbehaviors"

* tag 'ucount-rlimit-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  mm/mlock: fix two bugs in user_shm_lock()

Merge tag 'net-5.18-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter.

  Current release - regressions:

   - llc: only change llc->dev when bind() succeeds, fix null-deref

  Current release - new code bugs:

   - smc: fix a memory leak in smc_sysctl_net_exit()

   - dsa: realtek: make interface drivers depend on OF

  Previous releases - regressions:

   - sched: act_ct: fix ref leak when switching zones

  Previous releases - always broken:

   - netfilter: egress: report interface as outgoing

   - vsock/virtio: enable VQs early on probe and finish the setup before
     using them

  Misc:

   - memcg: enable accounting for nft objects"

* tag 'net-5.18-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (39 commits)
  Revert "selftests: net: Add tls config dependency for tls selftests"
  net/smc: Send out the remaining data in sndbuf before close
  net: move net_unlink_todo() out of the header
  net: dsa: bcm_sf2_cfp: fix an incorrect NULL check on list iterator
  net: bnxt_ptp: fix compilation error
  selftests: net: Add tls config dependency for tls selftests
  memcg: enable accounting for nft objects
  net/sched: act_ct: fix ref leak when switching zones
  net/smc: fix a memory leak in smc_sysctl_net_exit()
  selftests: tls: skip cmsg_to_pipe tests with TLS=n
  octeontx2-af: initialize action variable
  net: sparx5: switchdev: fix possible NULL pointer dereference
  net/x25: Fix null-ptr-deref caused by x25_disconnect
  qlcnic: dcb: default to returning -EOPNOTSUPP
  net: sparx5: depends on PTP_1588_CLOCK_OPTIONAL
  net: hns3: fix phy can not link up when autoneg off and reset
  net: hns3: add NULL pointer check for hns3_set/get_ringparam()
  net: hns3: add netdev reset check for hns3_set_tunable()
  net: hns3: clean residual vf config after disable sriov
  net: hns3: add max order judgement for tx spare buffer
  ...

XArray: Fix xas_create_range() when multi-order entry present

If there is already an entry present that is of order >= XA_CHUNK_SHIFT
when we call xas_create_range(), xas_create_range() will misinterpret
that entry as a node and dereference xa_node->parent, generally leading
to a crash that looks something like this:

general protection fault, probably for non-canonical address 0xdffffc0000000001:
0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 32 Comm: khugepaged Not tainted 5.17.0-rc8-syzkaller-00003-g56e337f2cf13 #0
RIP: 0010:xa_parent_locked include/linux/xarray.h:1207 [inline]
RIP: 0010:xas_create_range+0x2d9/0x6e0 lib/xarray.c:725

It's deterministically reproducable once you know what the problem is,
but producing it in a live kernel requires khugepaged to hit a race.
While the problem has been present since xas_create_range() was
introduced, I'm not aware of a way to hit it before the page cache was
converted to use multi-index entries.

Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache")
Reported-by: [email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

Revert "selftests: net: Add tls config dependency for tls selftests"

This reverts commit d9142e1cf3bbdaf21337767114ecab26fe702d47.

The test is supposed to run cleanly with TLS is disabled,
to test compatibility with TCP behavior. I can't repro
the failure [1], the problem should be debugged rather
than papered over.

Link: https://lore.kernel.org/all/20220325161203.7000698c@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
Fixes: d9142e1cf3bb ("selftests: net: Add tls config dependency for tls selftests")
Signed-off-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/smc: Send out the remaining data in sndbuf before close

The current autocork algorithms will delay the data transmission
in BH context to smc_release_cb() when sock_lock is hold by user.

So there is a possibility that when connection is being actively
closed (sock_lock is hold by user now), some corked data still
remains in sndbuf, waiting to be sent by smc_release_cb(). This
will cause:

- smc_close_stream_wait(), which is called under the sock_lock,
  has a high probability of timeout because data transmission is
  delayed until sock_lock is released.

- Unexpected data sends may happen after connction closed and use
  the rtoken which has been deleted by remote peer through
  LLC_DELETE_RKEY messages.

So this patch will try to send out the remaining corked data in
sndbuf before active close process, to ensure data integrity and
avoid unexpected data transmission after close.

Reported-by: Guangguan Wang <[email protected]>
Fixes: 6b88af839d20 ("net/smc: don't send in the BH context if sock_owned_by_user")
Signed-off-by: Wen Gu <[email protected]>
Acked-by: Karsten Graul <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

smb3: cleanup and clarify status of tree connections

Currently the way the tid (tree connection) status is tracked
is confusing.  The same enum is used for structs cifs_tcon
and cifs_ses and TCP_Server_info, but each of these three has
different states that they transition among.  The current
code also unnecessarily uses camelCase.

Convert from use of statusEnum to a new tid_status_enum for
tree connections.  The valid states for a tid are:

        TID_NEW = 0,
        TID_GOOD,
        TID_EXITING,
        TID_NEED_RECON,
        TID_NEED_TCON,
        TID_IN_TCON,
        TID_NEED_FILES_INVALIDATE, /* unused, considering removing in future */
        TID_IN_FILES_INVALIDATE

It also removes CifsNeedTcon, CifsInTcon, CifsNeedFilesInvalidate and
CifsInFilesInvalidate from the statusEnum used for session and
TCP_Server_Info since they are not relevant for those.

A follow on patch will fix the places where we use the
tcon->need_reconnect flag to be more consistent with the tid->status.

Also fixes a bug that was:
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Shyam Prasad N <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>

Merge tag 'kgdb-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux

Pull kgdb update from Daniel Thompson:
"Only a single patch this cycle. Fix an obvious mistake with the kdb
  memory accessors.

  It was a stupid mistake (to/from backwards) but it has been there for
  a long time since many architectures tolerated it with surprisingly
  good grace"

* tag 'kgdb-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kdb: Fix the putarea helper function