Git Repo - linux.git/log

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
"ARM:
   - Page ownership tracking between host EL1 and EL2
   - Rely on userspace page tables to create large stage-2 mappings
   - Fix incompatibility between pKVM and kmemleak
   - Fix the PMU reset state, and improve the performance of the virtual
     PMU
   - Move over to the generic KVM entry code
   - Address PSCI reset issues w.r.t. save/restore
   - Preliminary rework for the upcoming pKVM fixed feature
   - A bunch of MM cleanups
   - a vGIC fix for timer spurious interrupts
   - Various cleanups

  s390:
   - enable interpretation of specification exceptions
   - fix a vcpu_idx vs vcpu_id mixup

  x86:
   - fast (lockless) page fault support for the new MMU
   - new MMU now the default
   - increased maximum allowed VCPU count
   - allow inhibit IRQs on KVM_RUN while debugging guests
   - let Hyper-V-enabled guests run with virtualized LAPIC as long as
     they do not enable the Hyper-V "AutoEOI" feature
   - fixes and optimizations for the toggling of AMD AVIC (virtualized
     LAPIC)
   - tuning for the case when two-dimensional paging (EPT/NPT) is
     disabled
   - bugfixes and cleanups, especially with respect to vCPU reset and
     choosing a paging mode based on CR0/CR4/EFER
   - support for 5-level page table on AMD processors

  Generic:
   - MMU notifier invalidation callbacks do not take mmu_lock unless
     necessary
   - improved caching of LRU kvm_memory_slot
   - support for histogram statistics
   - add statistics for halt polling and remote TLB flush requests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits)
  KVM: Drop unused kvm_dirty_gfn_invalid()
  KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
  KVM: MMU: mark role_regs and role accessors as maybe unused
  KVM: MIPS: Remove a "set but not used" variable
  x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
  KVM: stats: Add VM stat for remote tlb flush requests
  KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()
  KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page
  KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality
  Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
  KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page
  kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
  kvm: x86: Increase MAX_VCPUS to 1024
  kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
  KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host
  KVM: s390: index kvm->arch.idle_mask by vcpu_idx
  KVM: s390: Enable specification exception interpretation
  KVM: arm64: Trim guest debug exception handling
  KVM: SVM: Add 5-level page table support for SVM
  ...

putname(): IS_ERR_OR_NULL() is wrong here

Mixing NULL and ERR_PTR() just in case is a Bad Idea(tm). For
struct filename the former is wrong - failures are reported
as ERR_PTR(...), not as NULL.

Signed-off-by: Al Viro <[email protected]>

namei: Standardize callers of filename_create()

filename_create() has two variants, one which drops the caller's
reference to filename (filename_create) and one which does
not (__filename_create). This can be confusing as it's unusual to drop a
caller's reference. Remove filename_create, rename __filename_create
to filename_create, and convert all callers.

Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
Cc: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

Pull dmi fix from Jean Delvare.

Unbreak some existing udev/hwdb modalias matches due to misplaced
product_sku field.

* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
firmware: dmi: Move product_sku info to the end of the modalias

namei: Standardize callers of filename_lookup()

filename_lookup() has two variants, one which drops the caller's
reference to filename (filename_lookup), and one which does
not (__filename_lookup). This can be confusing as it's unusual to drop a
caller's reference. Remove filename_lookup, rename __filename_lookup
to filename_lookup, and convert all callers. The cost is a few slightly
longer functions, but the clarity is greater.

[AV: consuming a reference is not at all unusual, actually; look at
e.g. do_mkdirat(), for example. It's more that we want non-consuming
variant for close relative of that function...]

Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
Cc: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Stephen Brennan <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Merge tag 'ntb-5.15' of git://github.com/jonmason/ntb

Pull NTB updates from Jon Mason:
"Bug fixes and clean-ups for Linux v5.15"

* tag 'ntb-5.15' of git://github.com/jonmason/ntb:
  NTB: switch from 'pci_' to 'dma_' API
  ntb: ntb_pingpong: remove redundant initialization of variables msg_data and spad_data
  NTB: perf: Fix an error code in perf_setup_inbuf()
  NTB: Fix an error code in ntb_msit_probe()
  ntb: intel: remove invalid email address in header comment

rename __filename_parentat() to filename_parentat()

... in separate commit, to avoid noise in previous one

Signed-off-by: Al Viro <[email protected]>

Merge tag 'rproc-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc

Pull remoteproc updates from Bjorn Andersson:

- move the crash recovery worker to the freezable work queue to avoid
   interaction with other drivers during suspend & resume

- fix a couple of typos in comments

- add support for handling the audio DSP on SDM660

- fix a race between the Qualcomm wireless subsystem driver and the
   associated driver for the RF chip

* tag 'rproc-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
  remoteproc: q6v5_pas: Add sdm660 ADSP PIL compatible
  dt-bindings: remoteproc: qcom: adsp: Add SDM660 ADSP
  remoteproc: use freezable workqueue for crash notifications
  remoteproc: fix kernel doc for struct rproc_ops
  remoteproc: fix an typo in fw_elf_get_class code comments
  remoteproc: qcom: wcnss: Fix race with iris probe

namei: Fix use after free in kern_path_locked

In 0ee50b47532a ("namei: change filename_parentat() calling
conventions"), filename_parentat() was made to always call putname() on
the  filename before returning, and kern_path_locked() was migrated to
this calling convention.  However, kern_path_locked() uses the "last"
parameter to lookup and potentially create a new dentry. The last
parameter contains the last component of the path and points within the
filename, which was recently freed at the end of filename_parentat().
Thus, when kern_path_locked() calls __lookup_hash(), it is using the
filename after it has already been freed.

In other words, these calling conventions had been wrong for the
only remaining caller of filename_parentat().  Everything else
is using __filename_parentat(), which does not drop the reference;
so should kern_path_locked().

Switch kern_path_locked() to use of __filename_parentat() and move
getting/dropping struct filename into wrapper.  Remove filename_parentat(),
now that we have no remaining callers.

Fixes: 0ee50b47532a ("namei: change filename_parentat() calling conventions")
Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
Cc: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Reported-by: [email protected]
Signed-off-by: Stephen Brennan <[email protected]>
Co-authored-by: Dmitry Kadashev <[email protected]>
Signed-off-by: Al Viro <[email protected]>

Merge tag 'backlight-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight

Pull backlight updates from Lee Jones:
"Fix-ups:
   - Improve bootloader/kernel device handover

  Bug Fixes:
   - Stabilise backlight in ktd253 driver"

* tag 'backlight-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  backlight: pwm_bl: Improve bootloader/kernel device handover
  backlight: ktd253: Stabilize backlight

Merge tag 'mfd-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MFD updates from Lee Jones:
"Core Frameworks:
   - Add support for registering devices via MFD cells to Simple MFD (I2C)

  New Drivers:
   - Add support for Renesas Synchronization Management Unit (SMU)

  New Device Support:
   - Add support for N5010 to Intel M10 BMC
   - Add support for Cannon Lake to Intel LPSS ACPI
   - Add support for Samsung SSG{1,2} to ST-Ericsson's U8500 family
   - Add support for TQMx110EB and TQMxE40x to TQ-Systems PLD TQMx86

  New Functionality:
   - Add support for GPIO to Intel LPC ICH
   - Add support for Reset to Texas Instruments TPS65086

  Fix-ups:
   - Trivial, sorting, whitespace, renaming, etc; mt6360-core, db8500-prcmu-regs, tqmx86
   - Device Tree fiddling; syscon, axp20x, qcom,pm8008, ti,tps65086, brcm,cru
   - Use proper APIs for IRQ map resolution; ab8500-core, stmpe, tc3589x, wm8994-irq
   - Pass 'supplied-from' property through axp288_fuel_gauge via swnode
   - Remove unused file entry; MAINTAINERS
   - Make interrupt line optional; tps65086
   - Rename db8500-cpuidle driver symbol; db8500-prcmu
   - Remove support for unused hardware; tqmx86
   - Provide a standard LPC clock frequency for unknown boards; tqmx86
   - Remove unused code; ti_am335x_tscadc
   - Use of_iomap() instead of ioremap(); syscon

  Bug Fixes:
   - Clear GPIO IRQ resource flags when no IRQ is set; tqmx86
   - Fix incorrect/misleading frequencies; db8500-prcmu
   - Mitigate namespace clash with other GPIOBASE users"

* tag 'mfd-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (31 commits)
  mfd: lpc_sch: Rename GPIOBASE to prevent build error
  mfd: syscon: Use of_iomap() instead of ioremap()
  dt-bindings: mfd: Add Broadcom CRU
  mfd: ti_am335x_tscadc: Delete superfluous error message
  mfd: tqmx86: Assume 24MHz LPC clock for unknown boards
  mfd: tqmx86: Add support for TQ-Systems DMI IDs
  mfd: tqmx86: Add support for TQMx110EB and TQMxE40x
  mfd: tqmx86: Fix typo in "platform"
  mfd: tqmx86: Remove incorrect TQMx90UC board ID
  mfd: tqmx86: Clear GPIO IRQ resource when no IRQ is set
  mfd: simple-mfd-i2c: Add support for registering devices via MFD cells
  mfd/cpuidle: ux500: Rename driver symbol
  mfd: tps65086: Add cell entry for reset driver
  mfd: tps65086: Make interrupt line optional
  dt-bindings: mfd: Convert tps65086.txt to YAML
  MAINTAINERS: Adjust ARM/NOMADIK/Ux500 ARCHITECTURES to file renaming
  mfd: db8500-prcmu: Handle missing FW variant
  mfd: db8500-prcmu: Rename register header
  mfd: axp20x: Add supplied-from property to axp288_fuel_gauge cell
  mfd: Don't use irq_create_mapping() to resolve a mapping
  ...

Merge tag 'gpio-updates-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio updates from Bartosz Golaszewski:
"We mostly have various improvements and refactoring all over the place
  but also some interesting new features - like the virtio GPIO driver
  that allows guest VMs to use host's GPIOs. We also have a new/old GPIO
  driver for rockchip - this one has been split out of the pinctrl
  driver.

  Summary:

   - new driver: gpio-virtio allowing a guest VM running linux to access
     GPIO lines provided by the host

   - split the GPIO driver out of the rockchip pin control driver

   - add support for a new model to gpio-aspeed-sgpio, refactor the
     driver and use generic device property interfaces, improve property
     sanitization

   - add ACPI support to gpio-tegra186

   - improve the code setting the line names to support multiple GPIO
     banks per device

   - constify a bunch of OF functions in the core GPIO code and make the
     declaration for one of the core OF functions we use consistent
     within its header

   - use software nodes in intel_quark_i2c_gpio

   - add support for the gpio-line-names property in gpio-mt7621

   - use the standard GPIO function for setting the GPIO names in
     gpio-brcmstb

   - fix a bunch of leaks and other bugs in gpio-mpc8xxx

   - use generic pm callbacks in gpio-ml-ioh

   - improve resource management and PM handling in gpio-mlxbf2

   - modernize and improve the gpio-dwapb driver

   - coding style improvements in gpio-rcar

   - documentation fixes and improvements

   - update the MAINTAINERS entry for gpio-zynq

   - minor tweaks in several drivers"

* tag 'gpio-updates-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (35 commits)
  gpio: mpc8xxx: Use 'devm_gpiochip_add_data()' to simplify the code and avoid a leak
  gpio: mpc8xxx: Fix a potential double iounmap call in 'mpc8xxx_probe()'
  gpio: mpc8xxx: Fix a resources leak in the error handling path of 'mpc8xxx_probe()'
  gpio: viperboard: remove platform_set_drvdata() call in probe
  gpio: virtio: Add missing mailings lists in MAINTAINERS entry
  gpio: virtio: Fix sparse warnings
  gpio: remove the obsolete MX35 3DS BOARD MC9S08DZ60 GPIO functions
  gpio: max730x: Use the right include
  gpio: Add virtio-gpio driver
  gpio: mlxbf2: Use DEFINE_RES_MEM_NAMED() helper macro
  gpio: mlxbf2: Use devm_platform_ioremap_resource()
  gpio: mlxbf2: Drop wrong use of ACPI_PTR()
  gpio: mlxbf2: Convert to device PM ops
  gpio: dwapb: Get rid of legacy platform data
  mfd: intel_quark_i2c_gpio: Convert GPIO to use software nodes
  gpio: dwapb: Read GPIO base from gpio-base property
  gpio: dwapb: Unify ACPI enumeration checks in get_irq() and configure_irqs()
  gpiolib: Deduplicate forward declaration in the consumer.h header
  MAINTAINERS: update gpio-zynq.yaml reference
  gpio: tegra186: Add ACPI support
  ...

PM: sleep: core: Avoid setting power.must_resume to false

There are variables(power.may_skip_resume and dev->power.must_resume)
and DPM_FLAG_MAY_SKIP_RESUME flags to control the resume of devices after
a system wide suspend transition.

Setting the DPM_FLAG_MAY_SKIP_RESUME flag means that the driver allows
its "noirq" and "early" resume callbacks to be skipped if the device
can be left in suspend after a system-wide transition into the working
state. PM core determines that the driver's "noirq" and "early" resume
callbacks should be skipped or not with dev_pm_skip_resume() function by
checking power.may_skip_resume variable.

power.must_resume variable is getting set to false in __device_suspend()
function without checking device's DPM_FLAG_MAY_SKIP_RESUME settings.
In problematic scenario, where all the devices in the suspend_late
stage are successful and some device can fail to suspend in
suspend_noirq phase. So some devices successfully suspended in suspend_late
stage are not getting chance to execute __device_suspend_noirq()
to set dev->power.must_resume variable to true and not getting
resumed in early_resume phase.

Add a check for device's DPM_FLAG_MAY_SKIP_RESUME flag before
setting power.must_resume variable in __device_suspend function.

Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
Signed-off-by: Prasad Sodagudi <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Merge tag 'fuse-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

- Allow mounting an active fuse device. Previously the fuse device
   would always be mounted during initialization, and sharing a fuse
   superblock was only possible through mount or namespace cloning

- Fix data flushing in syncfs (virtiofs only)

- Fix data flushing in copy_file_range()

- Fix a possible deadlock in atomic O_TRUNC

- Misc fixes and cleanups

* tag 'fuse-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: remove unused arg in fuse_write_file_get()
  fuse: wait for writepages in syncfs
  fuse: flush extending writes
  fuse: truncate pagecache on atomic_o_trunc
  fuse: allow sharing existing sb
  fuse: move fget() to fuse_get_tree()
  fuse: move option checking into fuse_fill_super()
  fuse: name fs_context consistently
  fuse: fix use after free in fuse_read_interrupt()

Documentation: power: include kernel-doc in Energy Model doc

Improve the existing documentation of the Energy Model and add kernel-doc
comments. This extends an API description and helps better understanding
the usage.

Signed-off-by: Lukasz Luba <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

PM: EM: fix kernel-doc comments

Fix the kernel-doc comments for the improved Energy Model documentation.

Signed-off-by: Lukasz Luba <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

cpufreq: intel_pstate: hybrid: Rework HWP calibration

The current HWP calibration for hybrid processors in intel_pstate is
fragile, because it depends too much on the information provided by
the platform firmware via CPPC which may not be reliable enough. It
also need not be so complicated.

In order to improve that mechanism and make it more resistant to
platform firmware issues, make it only use the CPPC nominal_perf
values to compute the HWP-to-frequency scaling factors for all
CPUs and possibly use the HWP_CAP highest_perf values to recompute
them if the ones derived from the CPPC nominal_perf values alone
appear to be too high.

Namely, fetch CPC.nominal_perf for all CPUs present in the system,
find the minimum one and use it as a reference for computing all of
the CPUs' scaling factors (using the observation that for the CPUs
having the minimum CPC.nominal_perf the HWP range of available
performance levels should be the same as the range of available
"legacy" P-states and so the HWP-to-frequency scaling factor for
them should be the same as the corresponding scaling factor used
for representing the P-state values in kHz).

Signed-off-by: Rafael J. Wysocki <[email protected]>
Tested-by: Zhang Rui <[email protected]>

ACPI: CPPC: Introduce cppc_get_nominal_perf()

On some systems the nominal_perf value retrieved via CPPC is just
a constant and fetching it doesn't require accessing any registers,
so if it is the only CPPC capability that's needed, it is wasteful
to run cppc_get_perf_caps() in order to get just that value alone,
especially when this is done for CPUs other than the one running
the code.

For this reason, introduce cppc_get_nominal_perf() allowing
nominal_perf to be obtained individually, by generalizing the
existing cppc_get_desired_perf() (and renaming it) so it can be
used to retrieve any specific CPPC capability value.

While at it, clean up the cppc_get_desired_perf() kerneldoc comment
a bit.

Signed-off-by: Rafael J. Wysocki <[email protected]>

ACPI: scan: Remove unneeded header linux/nls.h

Code that use linux/nls.h was moved to device_sysfs.c by commit
c2efefb33abf ("ACPI / scan: Move sysfs-related device code to a separate file")

Remove this include so that complier has easier times and it would be
easier to grep where nls code is used.

Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq()

This function has the 'irq' parameter which isn't ever used, so drop it.

Signed-off-by: Sergey Shtylyov <[email protected]>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

Merge tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux

Pull kgdb updates from Daniel Thompson:
"Changes for kgdb/kdb this cycle are dominated by a change from Sumit
  that removes as small (256K) private heap from kdb. This is change
  I've hoped for ever since I discovered how few users of this heap
  remained in the kernel, so many thanks to Sumit for hunting these
  down.

  The other change is an incremental step towards SPDX headers"

* tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kernel: debug: Convert to SPDX identifier
  kdb: Rename members of struct kdbtab_t
  kdb: Simplify kdb_defcmd macro logic
  kdb: Get rid of redundant kdb_register_flags()
  kdb: Rename struct defcmd_set to struct kdb_macro
  kdb: Get rid of custom debug heap allocator

cxl/registers: Fix Documentation warning

Commit 0f06157e0135 ("cxl/core: Move register mapping infrastructure")
neglected to add a DOC header for the new drivers/core/regs.c file.

Reported-by: Ben Widawsky <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/163072206675.2250120.3527179192933919995.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>

cxl/pmem: Fix Documentation warning

Commit 06737cd0d216 ("cxl/core: Move pmem functionality") neglected to
add a DOC header for the new drivers/cxl/core/pmem.c file.

Reported-by: Ben Widawsky <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/163072206163.2250120.11486436976516079516.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>

cxl/uapi: Fix defined but not used warnings

Fix unused-const-variable warnings emitted by gcc when cxlmem.h is used
by pretty much all files except pci.c

Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/163072205652.2250120.16833548560832424468.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>

cxl/pci: Fix debug message in cxl_probe_regs()

Indicator string for mbox and memdev register set to status
incorrectly in error message.

Cc: <[email protected]>
Fixes: 30af97296f48 ("cxl/pci: Map registers based on capabilities")
Signed-off-by: Li Qiang (Johnny Li) <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/163072205089.2250120.8103605864156687395.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>

cxl/pci: Fix lockdown level

A proposed rework of security_locked_down() users identified that the
cxl_pci driver was passing the wrong lockdown_reason. Update
cxl_mem_raw_command_allowed() to fail raw command access when raw pci
access is also disabled.

Fixes: 13237183c735 ("cxl/mem: Add a "RAW" send command")
Cc: Ben Widawsky <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: <[email protected]>
Cc: Ondrej Mosnacek <[email protected]>
Cc: Paul Moore <[email protected]>
Link: https://lore.kernel.org/r/163072204525.2250120.16615792476976546735.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>

cxl/acpi: Do not add DSDT disabled ACPI0016 host bridge ports

During CXL ACPI probe, host bridge ports are discovered by scanning
the ACPI0017 root port for ACPI0016 host bridge devices. The scan
matches on the hardware id of "ACPI0016". An issue occurs when an
ACPI0016 device is defined in the DSDT yet disabled on the platform.
Attempts by the cxl_acpi driver to add host bridge ports using a
disabled device fails, and the entire cxl_acpi probe fails.

The DSDT table includes an _STA method that sets the status and the
ACPI subsystem has checks available to examine it. One such check is
in the acpi_pci_find_root() path. Move the call to acpi_pci_find_root()
to the matching function to prevent this issue when adding either
upstream or downstream ports.

Suggested-by: Dan Williams <[email protected]>
Signed-off-by: Alison Schofield <[email protected]>
Fixes: 7d4b5ca2e2cb ("cxl/acpi: Add downstream port data to cxl_port instances")
Cc: <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/163072203957.2250120.2178685721061002124.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>

Revert "memcg: enable accounting for pollfd and select bits arrays"

This reverts commit b655843444152c0a14b749308e4cb35d91cbcf0b.

Just like with the memcg lock accounting, the kernel test robot reports
a sizeable performance regression for this commit, and while it clearly
does the rigth thing in theory, we'll need to look at just how to avoid
or minimize the performance overhead of the memcg accounting.

People already have suggestions on how to do that, but it's "future
work".

So revert it for now.

[ Note: the first link below is for this same commit but a different
commit ID, because it's the kernel test robot ended up noticing it in
Andrew Morton's patch queue ]

Link: https://lore.kernel.org/lkml/20210905132732.GC15026@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
Acked-by: Jens Axboe <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Revert "memcg: enable accounting for file lock caches"

This reverts commit 0f12156dff2862ac54235fc72703f18770769042.

The kernel test robot reports a sizeable performance regression for this
commit, and while it clearly does the rigth thing in theory, we'll need
to look at just how to avoid or minimize the performance overhead of the
memcg accounting.

People already have suggestions on how to do that, but it's "future
work".

So revert it for now.

Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
Acked-by: Jens Axboe <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly"

This reverts commit 9857a17f206ff374aea78bccfb687f145368be2e.

That commit was completely broken, and I should have caught on to it
earlier.  But happily, the kernel test robot noticed the breakage fairly
quickly.

The breakage is because "try_get_page()" is about avoiding the page
reference count overflow case, but is otherwise the exact same as a
plain "get_page()".

In contrast, "try_get_compound_head()" is an entirely different beast,
and uses __page_cache_add_speculative() because it's not just about the
page reference count, but also about possibly racing with the underlying
page going away.

So all the commentary about how

"try_get_page() has fallen a little behind in terms of maintenance,
  try_get_compound_head() handles speculative page references more
  thoroughly"

was just completely wrong: yes, try_get_compound_head() handles
speculative page references, but the point is that try_get_page() does
not, and must not.

So there's no lack of maintainance - there are fundamentally different
semantics.

A speculative page reference would be entirely wrong in "get_page()",
and it's entirely wrong in "try_get_page()".  It's not about
speculation, it's purely about "uhhuh, you can't get this page because
you've tried to increment the reference count too much already".

The reason the kernel test robot noticed this bug was that it hit the
VM_BUG_ON() in __page_cache_add_speculative(), which is all about
verifying that the context of any speculative page access is correct.
But since that isn't what try_get_page() is all about, the VM_BUG_ON()
tests things that are not correct to test for try_get_page().

Reported-by: kernel test robot <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

block: move fs/block_dev.c to block/bdev.c

Move it together with the rest of the block layer.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

block: split out operations on block special files

Add a new block/fops.c for all the file and address_space operations
that provide the block special file support.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[axboe: correct trailing whitespace while at it]
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'nvme-5.15-2021-09-07' of git://git.infradead.org/nvme into block-5.15

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.15

- fix nvmet command set reporting for passthrough controllers
   (Adam Manzanares)
- updated
- update a MAINTAINERS email address (Chaitanya Kulkarni)
- set QUEUE_FLAG_NOWAIT for nvme-multipth (me)
- handle errors from add_disk() (Luis Chamberlain)
- update the keep alive interval when kato is modified (Tatsuya Sasaki)
- fix a buffer overrun in nvmet_subsys_attr_serial (Hannes Reinecke)
- do not reset transport on data digest errors in nvme-tcp (Daniel Wagner)
- only call synchronize_srcu when clearing current path (Daniel Wagner)
- revalidate paths during rescan (Hannes Reinecke)"

* tag 'nvme-5.15-2021-09-07' of git://git.infradead.org/nvme:
  nvme: update MAINTAINERS email address
  nvme: add error handling support for add_disk()
  nvme: only call synchronize_srcu when clearing current path
  nvme: update keep alive interval when kato is modified
  nvme-tcp: Do not reset transport on data digest errors
  nvmet: fixup buffer overrun in nvmet_subsys_attr_serial()
  nvmet: return bool from nvmet_passthru_ctrl and nvmet_is_passthru_req
  nvmet: looks at the passthrough controller when initializing CAP
  nvme: move nvme_multi_css into nvme.h
  nvme-multipath: revalidate paths during rescan
  nvme-multipath: set QUEUE_FLAG_NOWAIT

blk-throttle: fix UAF by deleteing timer in blk_throtl_exit()

The pending timer has been set up in blk_throtl_init(). However, the
timer is not deleted in blk_throtl_exit(). This means that the timer
handler may still be running after freeing the timer, which would
result in a use-after-free.

Fix by calling del_timer_sync() to delete the timer in blk_throtl_exit().

Signed-off-by: Li Jinlin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

block: genhd: don't call blkdev_show() with major_names_lock held

If CONFIG_BLK_DEV_LOOP && CONFIG_MTD (at least; there might be other
combinations), lockdep complains circular locking dependency at
__loop_clr_fd(), for major_names_lock serves as a locking dependency
aggregating hub across multiple block modules.

======================================================
WARNING: possible circular locking dependency detected
5.14.0+ #757 Tainted: G            E
------------------------------------------------------
systemd-udevd/7568 is trying to acquire lock:
ffff88800f334d48 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x70/0x560

but task is already holding lock:
ffff888014a7d4a0 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x4d/0x400 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&lo->lo_mutex){+.+.}-{3:3}:
        lock_acquire+0xbe/0x1f0
        __mutex_lock_common+0xb6/0xe10
        mutex_lock_killable_nested+0x17/0x20
        lo_open+0x23/0x50 [loop]
        blkdev_get_by_dev+0x199/0x540
        blkdev_open+0x58/0x90
        do_dentry_open+0x144/0x3a0
        path_openat+0xa57/0xda0
        do_filp_open+0x9f/0x140
        do_sys_openat2+0x71/0x150
        __x64_sys_openat+0x78/0xa0
        do_syscall_64+0x3d/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #5 (&disk->open_mutex){+.+.}-{3:3}:
        lock_acquire+0xbe/0x1f0
        __mutex_lock_common+0xb6/0xe10
        mutex_lock_nested+0x17/0x20
        bd_register_pending_holders+0x20/0x100
        device_add_disk+0x1ae/0x390
        loop_add+0x29c/0x2d0 [loop]
        blk_request_module+0x5a/0xb0
        blkdev_get_no_open+0x27/0xa0
        blkdev_get_by_dev+0x5f/0x540
        blkdev_open+0x58/0x90
        do_dentry_open+0x144/0x3a0
        path_openat+0xa57/0xda0
        do_filp_open+0x9f/0x140
        do_sys_openat2+0x71/0x150
        __x64_sys_openat+0x78/0xa0
        do_syscall_64+0x3d/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #4 (major_names_lock){+.+.}-{3:3}:
        lock_acquire+0xbe/0x1f0
        __mutex_lock_common+0xb6/0xe10
        mutex_lock_nested+0x17/0x20
        blkdev_show+0x19/0x80
        devinfo_show+0x52/0x60
        seq_read_iter+0x2d5/0x3e0
        proc_reg_read_iter+0x41/0x80
        vfs_read+0x2ac/0x330
        ksys_read+0x6b/0xd0
        do_syscall_64+0x3d/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&p->lock){+.+.}-{3:3}:
        lock_acquire+0xbe/0x1f0
        __mutex_lock_common+0xb6/0xe10
        mutex_lock_nested+0x17/0x20
        seq_read_iter+0x37/0x3e0
        generic_file_splice_read+0xf3/0x170
        splice_direct_to_actor+0x14e/0x350
        do_splice_direct+0x84/0xd0
        do_sendfile+0x263/0x430
        __se_sys_sendfile64+0x96/0xc0
        do_syscall_64+0x3d/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#3){.+.+}-{0:0}:
        lock_acquire+0xbe/0x1f0
        lo_write_bvec+0x96/0x280 [loop]
        loop_process_work+0xa68/0xc10 [loop]
        process_one_work+0x293/0x480
        worker_thread+0x23d/0x4b0
        kthread+0x163/0x180
        ret_from_fork+0x1f/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
        lock_acquire+0xbe/0x1f0
        process_one_work+0x280/0x480
        worker_thread+0x23d/0x4b0
        kthread+0x163/0x180
        ret_from_fork+0x1f/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
        validate_chain+0x1f0d/0x33e0
        __lock_acquire+0x92d/0x1030
        lock_acquire+0xbe/0x1f0
        flush_workqueue+0x8c/0x560
        drain_workqueue+0x80/0x140
        destroy_workqueue+0x47/0x4f0
        __loop_clr_fd+0xb4/0x400 [loop]
        blkdev_put+0x14a/0x1d0
        blkdev_close+0x1c/0x20
        __fput+0xfd/0x220
        task_work_run+0x69/0xc0
        exit_to_user_mode_prepare+0x1ce/0x1f0
        syscall_exit_to_user_mode+0x26/0x60
        do_syscall_64+0x4c/0xb0
        entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
   (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&lo->lo_mutex);
                                lock(&disk->open_mutex);
                                lock(&lo->lo_mutex);
   lock((wq_completion)loop0);

  *** DEADLOCK ***

2 locks held by systemd-udevd/7568:
  #0: ffff888012554128 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_put+0x4c/0x1d0
  #1: ffff888014a7d4a0 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x4d/0x400 [loop]

stack backtrace:
CPU: 0 PID: 7568 Comm: systemd-udevd Tainted: G            E     5.14.0+ #757
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
Call Trace:
  dump_stack_lvl+0x79/0xbf
  print_circular_bug+0x5d6/0x5e0
  ? stack_trace_save+0x42/0x60
  ? save_trace+0x3d/0x2d0
  check_noncircular+0x10b/0x120
  validate_chain+0x1f0d/0x33e0
  ? __lock_acquire+0x953/0x1030
  ? __lock_acquire+0x953/0x1030
  __lock_acquire+0x92d/0x1030
  ? flush_workqueue+0x70/0x560
  lock_acquire+0xbe/0x1f0
  ? flush_workqueue+0x70/0x560
  flush_workqueue+0x8c/0x560
  ? flush_workqueue+0x70/0x560
  ? sched_clock_cpu+0xe/0x1a0
  ? drain_workqueue+0x41/0x140
  drain_workqueue+0x80/0x140
  destroy_workqueue+0x47/0x4f0
  ? blk_mq_freeze_queue_wait+0xac/0xd0
  __loop_clr_fd+0xb4/0x400 [loop]
  ? __mutex_unlock_slowpath+0x35/0x230
  blkdev_put+0x14a/0x1d0
  blkdev_close+0x1c/0x20
  __fput+0xfd/0x220
  task_work_run+0x69/0xc0
  exit_to_user_mode_prepare+0x1ce/0x1f0
  syscall_exit_to_user_mode+0x26/0x60
  do_syscall_64+0x4c/0xb0
  entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f0fd4c661f7
Code: 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 13 fc ff ff
RSP: 002b:00007ffd1c9e9fd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 00007f0fd46be6c8 RCX: 00007f0fd4c661f7
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
RBP: 0000000000000006 R08: 000055fff1eaf400 R09: 0000000000000000
R10: 00007f0fd46be6c8 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000002f08 R15: 00007ffd1c9ea050

Commit 1c500ad706383f1a ("loop: reduce the loop_ctl_mutex scope") is for
breaking "loop_ctl_mutex => &lo->lo_mutex" dependency chain. But enabling
a different block module results in forming circular locking dependency
due to shared major_names_lock mutex.

The simplest fix is to call probe function without holding
major_names_lock [1], but Christoph Hellwig does not like such idea.
Therefore, instead of holding major_names_lock in blkdev_show(),
introduce a different lock for blkdev_show() in order to break
"sb_writers#$N => &p->lock => major_names_lock" dependency chain.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tetsuo Handa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull more ARM cpufreq changes for v5.15-rc1 from Viresh Kumar:

"This adds a new cpufreq driver for Mediatek, which had been going
through reviews since last one year."

* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: mediatek-hw: Add support for CPUFREQ HW
  cpufreq: Add of_perf_domain_get_sharing_cpumask
  dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW

Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"

Revert commit d0e936adbd22 ("cpufreq: intel_pstate: Process HWP
Guaranteed change notification"), because it causes a NULL pointer
dereference to occur on Lenovo X1 gen9 laptops due to an HWP
guaranteed performance change interrupt arriving prematurely.

This feature will be revisited in the next cycle.

Reported-by: Jens Axboe <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

ieee802154: Remove redundant initialization of variable ret

The variable ret is being initialized with a value that is never read, it
is being updated later on. The assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'stmmac-wol-fix'

Joakim Zhang says:

====================
net: stmmac: fix WoL issue

This patch set fixes stmmac not working after system resume back with WoL
active. Thanks a lot for Russell King keeps looking into this issue.
====================

Signed-off-by: David S. Miller <[email protected]>

net: stmmac: fix MAC not working when system resume back with WoL active

We can reproduce this issue with below steps:
1) enable WoL on the host
2) host system suspended
3) remote client send out wakeup packets
We can see that host system resume back, but can't work, such as ping failed.

After a bit digging, this issue is introduced by the commit 46f69ded988d
("net: stmmac: Use resolved link config in mac_link_up()"), which use
the finalised link parameters in mac_link_up() rather than the
parameters in mac_config().

There are two scenarios for MAC suspend/resume in STMMAC driver:

1) MAC suspend with WoL inactive, stmmac_suspend() call
phylink_mac_change() to notify phylink machine that a change in MAC
state, then .mac_link_down callback would be invoked. Further, it will
call phylink_stop() to stop the phylink instance. When MAC resume back,
firstly phylink_start() is called to start the phylink instance, then
call phylink_mac_change() which will finally trigger phylink machine to
invoke .mac_config and .mac_link_up callback. All is fine since
configuration in these two callbacks will be initialized, that means MAC
can restore the state.

2) MAC suspend with WoL active, phylink_mac_change() will put link
down, but there is no phylink_stop() to stop the phylink instance, so it
will link up again, that means .mac_config and .mac_link_up would be
invoked before system suspended. After system resume back, it will do
DMA initialization and SW reset which let MAC lost the hardware setting
(i.e MAC_Configuration register(offset 0x0) is reset). Since link is up
before system suspended, so .mac_link_up would not be invoked after
system resume back, lead to there is no chance to initialize the
configuration in .mac_link_up callback, as a result, MAC can't work any
longer.

After discussed with Russell King [1], we confirm that phylink framework
have not take WoL into consideration yet. This patch calls
phylink_suspend()/phylink_resume() functions which is newly introduced
by Russell King to fix this issue.

[1] https://lore.kernel.org/netdev/20210901090228 [email protected]/

Fixes: 46f69ded988d ("net: stmmac: Use resolved link config in mac_link_up()")
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phylink: add suspend/resume support

Joakim Zhang reports that Wake-on-Lan with the stmmac ethernet driver broke
when moving the incorrect handling of mac link state out of mac_config().
This reason this breaks is because the stmmac's WoL is handled by the MAC
rather than the PHY, and phylink doesn't cater for that scenario.

This patch adds the necessary phylink code to handle suspend/resume events
according to whether the MAC still needs a valid link or not. This is the
barest minimum for this support.

Reported-by: Joakim Zhang <[email protected]>
Tested-by: Joakim Zhang <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Joakim Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: renesas: sh_eth: Fix freeing wrong tx descriptor

The cur_tx counter must be incremented after TACT bit of
txdesc->status was set. However, a CPU is possible to reorder
instructions and/or memory accesses between cur_tx and
txdesc->status. And then, if TX interrupt happened at such a
timing, the sh_eth_tx_free() may free the descriptor wrongly.
So, add wmb() before cur_tx++.
Otherwise NETDEV WATCHDOG timeout is possible to happen.

Fixes: 86a74ff21a7a ("net: sh_eth: add support for Renesas SuperH Ethernet")
Signed-off-by: Yoshihiro Shimoda <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

btrfs: zoned: fix double counting of split ordered extent

btrfs_add_ordered_extent_*() add num_bytes to fs_info->ordered_bytes.
Then, splitting an ordered extent will call btrfs_add_ordered_extent_*()
again for split extents, leading to double counting of the region of
a split extent. These leaked bytes are finally reported at unmount time
as follow:

BTRFS info (device dm-1): at unmount dio bytes count 364544

Fix the double counting by subtracting split extent's size from
fs_info->ordered_bytes.

Fixes: d22002fd37bd ("btrfs: zoned: split ordered extent when bio is sent")
CC: [email protected] # 5.12+
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Naohiro Aota <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: fix lockdep warning while mounting sprout fs

Following test case reproduces lockdep warning.

  Test case:

  $ mkfs.btrfs -f <dev1>
  $ btrfstune -S 1 <dev1>
  $ mount <dev1> <mnt>
  $ btrfs device add <dev2> <mnt> -f
  $ umount <mnt>
  $ mount <dev2> <mnt>
  $ umount <mnt>

The warning claims a possible ABBA deadlock between the threads
initiated by [#1] btrfs device add and [#0] the mount.

  [ 540.743122] WARNING: possible circular locking dependency detected
  [ 540.743129] 5.11.0-rc7+ #5 Not tainted
  [ 540.743135] ------------------------------------------------------
  [ 540.743142] mount/2515 is trying to acquire lock:
  [ 540.743149] ffffa0c5544c2ce0 (&fs_devs->device_list_mutex){+.+.}-{4:4}, at: clone_fs_devices+0x6d/0x210 [btrfs]
  [ 540.743458] but task is already holding lock:
  [ 540.743461] ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
  [ 540.743541] which lock already depends on the new lock.
  [ 540.743543] the existing dependency chain (in reverse order) is:

  [ 540.743546] -> #1 (btrfs-chunk-00){++++}-{4:4}:
  [ 540.743566] down_read_nested+0x48/0x2b0
  [ 540.743585] __btrfs_tree_read_lock+0x32/0x200 [btrfs]
  [ 540.743650] btrfs_read_lock_root_node+0x70/0x200 [btrfs]
  [ 540.743733] btrfs_search_slot+0x6c6/0xe00 [btrfs]
  [ 540.743785] btrfs_update_device+0x83/0x260 [btrfs]
  [ 540.743849] btrfs_finish_chunk_alloc+0x13f/0x660 [btrfs] <--- device_list_mutex
  [ 540.743911] btrfs_create_pending_block_groups+0x18d/0x3f0 [btrfs]
  [ 540.743982] btrfs_commit_transaction+0x86/0x1260 [btrfs]
  [ 540.744037] btrfs_init_new_device+0x1600/0x1dd0 [btrfs]
  [ 540.744101] btrfs_ioctl+0x1c77/0x24c0 [btrfs]
  [ 540.744166] __x64_sys_ioctl+0xe4/0x140
  [ 540.744170] do_syscall_64+0x4b/0x80
  [ 540.744174] entry_SYSCALL_64_after_hwframe+0x44/0xa9

  [ 540.744180] -> #0 (&fs_devs->device_list_mutex){+.+.}-{4:4}:
  [ 540.744184] __lock_acquire+0x155f/0x2360
  [ 540.744188] lock_acquire+0x10b/0x5c0
  [ 540.744190] __mutex_lock+0xb1/0xf80
  [ 540.744193] mutex_lock_nested+0x27/0x30
  [ 540.744196] clone_fs_devices+0x6d/0x210 [btrfs]
  [ 540.744270] btrfs_read_chunk_tree+0x3c7/0xbb0 [btrfs]
  [ 540.744336] open_ctree+0xf6e/0x2074 [btrfs]
  [ 540.744406] btrfs_mount_root.cold.72+0x16/0x127 [btrfs]
  [ 540.744472] legacy_get_tree+0x38/0x90
  [ 540.744475] vfs_get_tree+0x30/0x140
  [ 540.744478] fc_mount+0x16/0x60
  [ 540.744482] vfs_kern_mount+0x91/0x100
  [ 540.744484] btrfs_mount+0x1e6/0x670 [btrfs]
  [ 540.744536] legacy_get_tree+0x38/0x90
  [ 540.744537] vfs_get_tree+0x30/0x140
  [ 540.744539] path_mount+0x8d8/0x1070
  [ 540.744541] do_mount+0x8d/0xc0
  [ 540.744543] __x64_sys_mount+0x125/0x160
  [ 540.744545] do_syscall_64+0x4b/0x80
  [ 540.744547] entry_SYSCALL_64_after_hwframe+0x44/0xa9

  [ 540.744551] other info that might help us debug this:
  [ 540.744552] Possible unsafe locking scenario:

  [ 540.744553] CPU0 CPU1
  [ 540.744554] ---- ----
  [ 540.744555] lock(btrfs-chunk-00);
  [ 540.744557] lock(&fs_devs->device_list_mutex);
  [ 540.744560] lock(btrfs-chunk-00);
  [ 540.744562] lock(&fs_devs->device_list_mutex);
  [ 540.744564]
   *** DEADLOCK ***

  [ 540.744565] 3 locks held by mount/2515:
  [ 540.744567] #0: ffffa0c56bf7a0e0 (&type->s_umount_key#42/1){+.+.}-{4:4}, at: alloc_super.isra.16+0xdf/0x450
  [ 540.744574] #1: ffffffffc05a9628 (uuid_mutex){+.+.}-{4:4}, at: btrfs_read_chunk_tree+0x63/0xbb0 [btrfs]
  [ 540.744640] #2: ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
  [ 540.744708]
   stack backtrace:
  [ 540.744712] CPU: 2 PID: 2515 Comm: mount Not tainted 5.11.0-rc7+ #5

But the device_list_mutex in clone_fs_devices() is redundant, as
explained below.  Two threads [1]  and [2] (below) could lead to
clone_fs_device().

  [1]
  open_ctree <== mount sprout fs
   btrfs_read_chunk_tree()
    mutex_lock(&uuid_mutex) <== global lock
    read_one_dev()
     open_seed_devices()
      clone_fs_devices() <== seed fs_devices
       mutex_lock(&orig->device_list_mutex) <== seed fs_devices

  [2]
  btrfs_init_new_device() <== sprouting
   mutex_lock(&uuid_mutex); <== global lock
   btrfs_prepare_sprout()
     lockdep_assert_held(&uuid_mutex)
     clone_fs_devices(seed_fs_device) <== seed fs_devices

Both of these threads hold uuid_mutex which is sufficient to protect
getting the seed device(s) freed while we are trying to clone it for
sprouting [2] or mounting a sprout [1] (as above). A mounted seed device
can not free/write/replace because it is read-only. An unmounted seed
device can be freed by btrfs_free_stale_devices(), but it needs
uuid_mutex.  So this patch removes the unnecessary device_list_mutex in
clone_fs_devices().  And adds a lockdep_assert_held(&uuid_mutex) in
clone_fs_devices().

Reported-by: Su Yue <[email protected]>
Tested-by: Su Yue <[email protected]>
Signed-off-by: Anand Jain <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: delay blkdev_put until after the device remove

When removing the device we call blkdev_put() on the device once we've
removed it, and because we have an EXCL open we need to take the
->open_mutex on the block device to clean it up.  Unfortunately during
device remove we are holding the sb writers lock, which results in the
following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #407 Not tainted
------------------------------------------------------
losetup/11595 is trying to acquire lock:
ffff973ac35dd138 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0

but task is already holding lock:
ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&lo->lo_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       lo_open+0x28/0x60 [loop]
       blkdev_get_whole+0x25/0xf0
       blkdev_get_by_dev.part.0+0x168/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       do_sys_openat2+0x7b/0x130
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&disk->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       blkdev_put+0x3a/0x220
       btrfs_rm_device.cold+0x62/0xe5
       btrfs_ioctl+0x2a31/0x2e70
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#12){.+.+}-{0:0}:
       lo_write_bvec+0xc2/0x240 [loop]
       loop_process_work+0x238/0xd00 [loop]
       process_one_work+0x26b/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
       process_one_work+0x245/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       flush_workqueue+0x91/0x5e0
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x660 [loop]
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&lo->lo_mutex);
                               lock(&disk->open_mutex);
                               lock(&lo->lo_mutex);
  lock((wq_completion)loop0);

*** DEADLOCK ***

1 lock held by losetup/11595:
#0: ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

stack backtrace:
CPU: 0 PID: 11595 Comm: losetup Not tainted 5.14.0-rc2+ #407
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
dump_stack_lvl+0x57/0x72
check_noncircular+0xcf/0xf0
? stack_trace_save+0x3b/0x50
__lock_acquire+0x10ea/0x1d90
lock_acquire+0xb5/0x2b0
? flush_workqueue+0x67/0x5e0
? lockdep_init_map_type+0x47/0x220
flush_workqueue+0x91/0x5e0
? flush_workqueue+0x67/0x5e0
? verify_cpu+0xf0/0x100
drain_workqueue+0xa0/0x110
destroy_workqueue+0x36/0x250
__loop_clr_fd+0x9a/0x660 [loop]
? blkdev_ioctl+0x8d/0x2a0
block_ioctl+0x3f/0x50
__x64_sys_ioctl+0x80/0xb0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc21255d4cb

So instead save the bdev and do the put once we've dropped the sb
writers lock in order to avoid the lockdep recursion.

Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: update the bdev time directly when closing

We update the ctime/mtime of a block device when we remove it so that
blkid knows the device changed.  However we do this by re-opening the
block device and calling filp_update_time.  This is more correct because
it'll call the inode->i_op->update_time if it exists, but the block dev
inodes do not do this.  Instead call generic_update_time() on the
bd_inode in order to avoid the blkdev_open path and get rid of the
following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #406 Not tainted
------------------------------------------------------
losetup/11596 is trying to acquire lock:
ffff939640d2f538 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0

but task is already holding lock:
ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&lo->lo_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       lo_open+0x28/0x60 [loop]
       blkdev_get_whole+0x25/0xf0
       blkdev_get_by_dev.part.0+0x168/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       do_sys_openat2+0x7b/0x130
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&disk->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       blkdev_get_by_dev.part.0+0x56/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       file_open_name+0xc7/0x170
       filp_open+0x2c/0x50
       btrfs_scratch_superblocks.part.0+0x10f/0x170
       btrfs_rm_device.cold+0xe8/0xed
       btrfs_ioctl+0x2a31/0x2e70
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#12){.+.+}-{0:0}:
       lo_write_bvec+0xc2/0x240 [loop]
       loop_process_work+0x238/0xd00 [loop]
       process_one_work+0x26b/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
       process_one_work+0x245/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       flush_workqueue+0x91/0x5e0
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x660 [loop]
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&lo->lo_mutex);
                               lock(&disk->open_mutex);
                               lock(&lo->lo_mutex);
  lock((wq_completion)loop0);

*** DEADLOCK ***

1 lock held by losetup/11596:
#0: ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

stack backtrace:
CPU: 1 PID: 11596 Comm: losetup Not tainted 5.14.0-rc2+ #406
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
dump_stack_lvl+0x57/0x72
check_noncircular+0xcf/0xf0
? stack_trace_save+0x3b/0x50
__lock_acquire+0x10ea/0x1d90
lock_acquire+0xb5/0x2b0
? flush_workqueue+0x67/0x5e0
? lockdep_init_map_type+0x47/0x220
flush_workqueue+0x91/0x5e0
? flush_workqueue+0x67/0x5e0
? verify_cpu+0xf0/0x100
drain_workqueue+0xa0/0x110
destroy_workqueue+0x36/0x250
__loop_clr_fd+0x9a/0x660 [loop]
? blkdev_ioctl+0x8d/0x2a0
block_ioctl+0x3f/0x50
__x64_sys_ioctl+0x80/0xb0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: use correct header for div_u64 in misc.h

asm/do_div.h is for div_u64, but it is found in math64.h. This change
will make compiler job easier and prevent compiler errors in situation
where compiler will not find math64.h from another paths.

Signed-off-by: Kari Argillander <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: fix upper limit for max_inline for page size 64K

The mount option max_inline ranges from 0 to the sectorsize (which is
now equal to page size). But we parse the mount options too early and
before the actual sectorsize is read from the superblock. So the upper
limit of max_inline is unaware of the actual sectorsize and is limited
by the temporary sectorsize 4096, even on a system where the default
sectorsize is 64K.

Fix this by reading the superblock sectorsize before the mount option
parse.

Reported-by: Alexander Tsvetkov <[email protected]>
CC: [email protected] # 5.4+
Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

s390/hmcdrv_ftp: fix kernel doc comment

Signed-off-by: Heiko Carstens <[email protected]>

s390: remove xpram device driver

Support for expanded storage was only available until z13 and z/VM 6.3
respectively. However there haven't been any use cases a long time
before for this device driver.
Therefore remove it.

Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>

s390/pci: read clp_list_pci_req only once

We do not have to reset the fh_list in the loop.

Signed-off-by: Pierre Morel <[email protected]>
Reviewed-by: Niklas Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>

s390/pci: fix clp_get_state() handling of -ENODEV

With commit cc049eecfb7a ("s390/pci: simplify CLP List PCI handling")
clp_get_state() was changed to make use of the new clp_find_pci() helper
function to query a specific function. This however returns -ENODEV when
the device is not found at all and this error was passed to the caller.
It was missed however that the callers actually expect a success return
from clp_get_state() if the device is gone.

Fix this by handling the -ENODEV return of clp_find_pci() explicitly in
clp_get_state() returning success and setting the state parameter to
ZPCI_FN_STATE_RESERVED matching the design concept that a PCI function
that disappeared must have been resverved elsewhere. For all other error
returns continue to just pass them on to the caller.

Reviewed-by: Matthew Rosato <[email protected]>
Fixes: cc049eecfb7a ("s390/pci: simplify CLP List PCI handling")
Signed-off-by: Niklas Schnelle <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>

s390/cio: fix kernel doc comment

Signed-off-by: Heiko Carstens <[email protected]>

s390/ctrlchar: fix kernel doc comment

Signed-off-by: Heiko Carstens <[email protected]>

s390/con3270: use proper type for tasklet function

Get rid of this warning:

drivers/s390/char/con3270.c:629:22: warning: cast between incompatible function types from ‘void (*)(struct raw3270_request *)’ to ‘void (*)(long unsigned int)’ [-Wcast-function-type]
629 | (void (*)(unsigned long)) con3270_read_tasklet,
| ^

Signed-off-by: Heiko Carstens <[email protected]>

s390/cpum_cf: move array from header to C file

Move array from header to C file to avoid that it gets defined in
every C file where the header is included:

In file included from arch/s390/kernel/perf_cpum_cf_common.c:19:
./arch/s390/include/asm/cpu_mcf.h:27:18: warning: ‘cpumf_ctr_ctl’ defined but not used [-Wunused-const-variable=]
27 | static const u64 cpumf_ctr_ctl[CPUMF_CTR_SET_MAX] = {
| ^~~~~~~~~~~~~

Signed-off-by: Heiko Carstens <[email protected]>

s390/mm: fix kernel doc comments

Signed-off-by: Heiko Carstens <[email protected]>

s390/topology: fix topology information when calling cpu hotplug notifiers

The cpu hotplug notifiers are called without updating the core/thread
masks when a new CPU is added. This causes problems with code setting
up data structures in a cpu hotplug notifier, and relying on that later
in normal code.

This caused a crash in the new core scheduling code (SCHED_CORE),
where rq->core was set up in a notifier depending on cpu masks.

To fix this, add a cpu_setup_mask which is used in update_cpu_masks()
instead of the cpu_online_mask to determine whether the cpu masks should
be set for a certain cpu. Also move update_cpu_masks() to update the
masks before calling notify_cpu_starting() so that the notifiers are
seeing the updated masks.

Signed-off-by: Sven Schnelle <[email protected]>
Cc: <[email protected]>
[[email protected]: get rid of cpu_online_mask handling]
Signed-off-by: Heiko Carstens <[email protected]>

s390/unwind: use current_frame_address() to unwind current task

current_stack_pointer() simply returns current value of %r15. If
current_stack_pointer() caller allocates stack (which is the case in
unwind code) %r15 points to a stack frame allocated for callees, meaning
current_stack_pointer() caller (e.g. stack_trace_save) will end up in
the stacktrace. This is not expected by stack_trace_save*() callers and
causes problems.

current_frame_address() on the other hand returns function stack frame
address, which matches %r15 upon function invocation. Using it in
get_stack_pointer() makes it more aligned with x86 implementation
(according to BACKTRACE_SELF_TEST output) and meets stack_trace_save*()
caller's expectations, notably KCSAN.

Also make sure unwind_start is always inlined.

Reported-by: Nathan Chancellor <[email protected]>
Suggested-by: Marco Elver <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Tested-by: Marco Elver <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Link: https://lore.kernel.org/r/patch.git-04dd26be3043.your-ad-here.call-01630504868-ext-6188@work.hours
Signed-off-by: Heiko Carstens <[email protected]>

ALSA: gus: Fix repeated probe for ISA interwave card

The legacy ISA probe tries to probe the card repeatedly, and this
would conflict with the refactoring using devres. Put the card
creation out of the loop and only probe GUS object repeatedly.

Fixes: 5b88da3c800f ("ALSA: gus: Allocate resources with device-managed APIs")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: gus: Fix repeated probes of snd_gus_create()

GUS card object may be repeatedly probed for the legacy ISA devices,
and the behavior doesn't fit with the devres resource management.

Revert partially back to the classical way for the snd_gus_card
object, so that the repeated calls of snd_gus_create() are allowed.

Fixes: 5b88da3c800f ("ALSA: gus: Allocate resources with device-managed APIs")
Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

bonding: 3ad: pass parameter bond_params by reference

The parameter bond_params is a relatively large 192 byte sized
struct so pass it by reference rather than by value to reduce
copying.

Addresses-Coverity: ("Big parameter passed by value")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'linux-can-fixes-for-5.15-20210907' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

linux-can-fixes-for-5.15-20210907

Signed-off-by: David S. Miller <[email protected]>

Merge tag 'wireless-drivers-2021-09-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for v5.15

First set of fixes for v5.15 and only iwlwifi patches this time. Most
important being support for new hardware and new firmware API.

I had already earlier applied a fix which also Linus applied to this
tree as commit 1476ff21abb4 ("iwl: fix debug printf format strings"),
but this doesn't seem to cause any conflicts so I left it there.

iwlwifi

* add support for firmware API 66

* add support for Samsung Galaxy Book Flex2 Alpha

* fix a leak happening every time module is loaded

* fix a printk compiler warning
====================

Signed-off-by: David S. Miller <[email protected]>

cxgb3: fix oops on module removal

When removing the driver module w/o bringing an interface up before
the error below occurs. Reason seems to be that cancel_work_sync() is
called in t3_sge_stop() for a queue that hasn't been initialized yet.

[10085.941785] ------------[ cut here ]------------
[10085.941799] WARNING: CPU: 1 PID: 5850 at kernel/workqueue.c:3074 __flush_work+0x3ff/0x480
[10085.941819] Modules linked in: vfat snd_hda_codec_hdmi fat snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio led_class ee1004 iTCO_
wdt intel_tcc_cooling x86_pkg_temp_thermal coretemp aesni_intel crypto_simd cryptd snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core r
8169 snd_pcm realtek mdio_devres snd_timer snd i2c_i801 i2c_smbus libphy i915 i2c_algo_bit cxgb3(-) intel_gtt ttm mdio drm_kms_helper mei_me s
yscopyarea sysfillrect sysimgblt mei fb_sys_fops acpi_pad sch_fq_codel crypto_user drm efivarfs ext4 mbcache jbd2 crc32c_intel
[10085.941944] CPU: 1 PID: 5850 Comm: rmmod Not tainted 5.14.0-rc7-next-20210826+ #6
[10085.941974] Hardware name: System manufacturer System Product Name/PRIME H310I-PLUS, BIOS 2603 10/21/2019
[10085.941992] RIP: 0010:__flush_work+0x3ff/0x480
[10085.942003] Code: c0 74 6b 65 ff 0d d1 bd 78 75 e8 bc 2f 06 00 48 c7 c6 68 b1 88 8a 48 c7 c7 e0 5f b4 8b 45 31 ff e8 e6 66 04 00 e9 4b fe ff ff <0f> 0b 45 31 ff e9 41 fe ff ff e8 72 c1 79 00 85 c0 74 87 80 3d 22
[10085.942036] RSP: 0018:ffffa1744383fc08 EFLAGS: 00010246
[10085.942048] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000923
[10085.942062] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff91c901710a88
[10085.942076] RBP: ffffa1744383fce8 R08: 0000000000000001 R09: 0000000000000001
[10085.942090] R10: 00000000000000c2 R11: 0000000000000000 R12: ffff91c901710a88
[10085.942104] R13: 0000000000000000 R14: ffff91c909a96100 R15: 0000000000000001
[10085.942118] FS:  00007fe417837740(0000) GS:ffff91c969d00000(0000) knlGS:0000000000000000
[10085.942134] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10085.942146] CR2: 000055a8d567ecd8 CR3: 0000000121690003 CR4: 00000000003706e0
[10085.942160] Call Trace:
[10085.942166]  ? __lock_acquire+0x3af/0x22e0
[10085.942177]  ? cancel_work_sync+0xb/0x10
[10085.942187]  __cancel_work_timer+0x128/0x1b0
[10085.942197]  ? __pm_runtime_resume+0x5b/0x90
[10085.942208]  cancel_work_sync+0xb/0x10
[10085.942217]  t3_sge_stop+0x2f/0x50 [cxgb3]
[10085.942234]  remove_one+0x26/0x190 [cxgb3]
[10085.942248]  pci_device_remove+0x39/0xa0
[10085.942258]  __device_release_driver+0x15e/0x240
[10085.942269]  driver_detach+0xd9/0x120
[10085.942278]  bus_remove_driver+0x53/0xd0
[10085.942288]  driver_unregister+0x2c/0x50
[10085.942298]  pci_unregister_driver+0x31/0x90
[10085.942307]  cxgb3_cleanup_module+0x10/0x18c [cxgb3]
[10085.942324]  __do_sys_delete_module+0x191/0x250
[10085.942336]  ? syscall_enter_from_user_mode+0x21/0x60
[10085.942347]  ? trace_hardirqs_on+0x2a/0xe0
[10085.942357]  __x64_sys_delete_module+0x13/0x20
[10085.942368]  do_syscall_64+0x40/0x90
[10085.942377]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[10085.942389] RIP: 0033:0x7fe41796323b

Fixes: 5e0b8928927f ("net:cxgb3: replace tasklets with works")
Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mfd: lpc_sch: Rename GPIOBASE to prevent build error

One MIPS platform (mach-rc32434) defines GPIOBASE. This macro
conflicts with one of the same name in lpc_sch.c. Rename the latter one
to prevent the build error.

../drivers/mfd/lpc_sch.c:25: error: "GPIOBASE" redefined [-Werror]
25 | #define GPIOBASE 0x44
../arch/mips/include/asm/mach-rc32434/rb.h:32: note: this is the location of the previous definition
32 | #define GPIOBASE 0x050000

Cc: Denis Turischev <[email protected]>
Fixes: e82c60ae7d3a ("mfd: Introduce lpc_sch for Intel SCH LPC bridge")
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Lee Jones <[email protected]>

mfd: syscon: Use of_iomap() instead of ioremap()

This automatically selects between ioremap() and ioremap_np() on
platforms that require it, such as Apple SoCs.

Signed-off-by: Hector Martin <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: Lee Jones <[email protected]>

dma-buf: DMABUF_SYSFS_STATS should depend on DMA_SHARED_BUFFER

DMA-BUF sysfs statistics are an option of DMA-BUF. It does not make
much sense to bother the user with a question about DMA-BUF sysfs
statistics if DMA-BUF itself is not enabled. Worse, enabling the
statistics enables the feature.

Fixes: bdb8d06dfefd666d ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Sumit Semwal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

dma-buf: DMABUF_DEBUG should depend on DMA_SHARED_BUFFER

DMA-BUF debug checks are an option of DMA-BUF. Enabling DMABUF_DEBUG
without DMA_SHARED_BUFFER does not have any impact, as drivers/dma-buf/
is not entered during the build when DMA_SHARED_BUFFER is disabled.

Fixes: 84335675f2223cbd ("dma-buf: Add debug option")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Sumit Semwal <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

can: c_can: fix null-ptr-deref on ioctl()

The pdev maybe not a platform device, e.g. c_can_pci device, in this
case, calling to_platform_device() would not make sense. Also, per the
comment in drivers/net/can/c_can/c_can_ethtool.c, @bus_info should
match dev_name() string, so I am replacing this with dev_name() to fix
this issue.

[    1.458583] BUG: unable to handle page fault for address: 0000000100000000
[    1.460921] RIP: 0010:strnlen+0x1a/0x30
[    1.466336]  ? c_can_get_drvinfo+0x65/0xb0 [c_can]
[    1.466597]  ethtool_get_drvinfo+0xae/0x360
[    1.466826]  dev_ethtool+0x10f8/0x2970
[    1.467880]  sock_ioctl+0xef/0x300

Fixes: 2722ac986e93 ("can: c_can: add ethtool support")
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected] # 5.14+
Signed-off-by: Tong Zhang <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: rcar_canfd: add __maybe_unused annotation to silence warning

Since commit

| dd3bd23eb438 ("can: rcar_canfd: Add Renesas R-Car CAN FD driver")

the rcar_canfd driver can be compile tested on all architectures. On
non OF enabled archs, or archs where OF is optional (and disabled in
the .config) the compilation throws the following warning:

| drivers/net/can/rcar/rcar_canfd.c:2020:34: warning: unused variable 'rcar_canfd_of_table' [-Wunused-const-variable]
| static const struct of_device_id rcar_canfd_of_table[] = {
| ^

This patch fixes the warning by marking the variable
rcar_canfd_of_table as __maybe_unused.

Fixes: ac4224087312 ("can: rcar: Kconfig: Add helper dependency on COMPILE_TEST")
Fixes: dd3bd23eb438 ("can: rcar_canfd: Add Renesas R-Car CAN FD driver")
Link: https://lore.kernel.org/all/[email protected]
Cc: [email protected]
Cc: Cai Huoqing <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

Input: mms114 - support MMS134S

The MMS134S like the MMS136 has an event size of 6 bytes.

After this patch, the touchscreen on the Samsung SGH-I407
works fine with PostmarketOS.

Signed-off-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>

Input: elan_i2c - reduce the resume time for controller in Whitebox

Similar to controllers found Voxel, Delbin, Magpie and Bobba, the one found
in Whitebox does not need to be reset after issuing power-on command, and
skipping reset saves resume time.

Signed-off-by: Jingle Wu <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>

ALSA: vx222: fix null-ptr-deref

a recent refactor created a null pointer vx in snd_vx222_probe().
The vx pointer should have been populated in snd_vx222_create() as
suggested in earlier version, otherwise vx->core.ibl.size will throw an
error.

[ 1.298398] BUG: kernel NULL pointer dereference, address: 00000000000001d8
[ 1.316799] RIP: 0010:snd_vx222_probe+0x155/0x290 [snd_vx222]

Fixes: 3bde3359aa16 ("ALSA: vx222: Allocate resources with device-managed APIs")
Signed-off-by: Tong Zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

libata: pass over maintainership to Damien Le Moal

Damien has agreed to take over maintainership of libata, update the
MAINTAINERS file to reflect that.

Acked-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

docs: pdfdocs: Fix typo in CJK-language specific font settings

There were typos in the fallback definitions of dummy LaTeX macros
for systems without CJK fonts.
They cause build errors in "make pdfdocs" on such systems.
Fix them.

Fixes: e291ff6f5a03 ("docs: pdfdocs: Add CJK-language-specific font settings")
Signed-off-by: Akira Yokosawa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>

thunderbolt: test: split up test cases in tb_test_credit_alloc_all

The tb_test_credit_alloc_all() function had a huge number of
KUNIT_ASSERT() statements, all of which (though the magic of many many
layers of inscrutable macros) ended up allocating and initializing
various test assertion structures on the stack.

Don't do that.  The kernel stack isn't infinite, and we have compiler
warnings (now errors) for the case where a stack frame grows too large.

Like it did here, by not an inconsiderable margin:

   drivers/thunderbolt/test.c: In function ‘tb_test_credit_alloc_all’:
   drivers/thunderbolt/test.c:2367:1: error: the frame size of 4500 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
    2367 | }
         | ^

Solve this similarly to the lib/test_scanf case: split out the tests
into several smaller functions, each just testing one particular tunnel
credit allocation.

This makes the i386 allyesconfig build work for me again.

Signed-off-by: Linus Torvalds <[email protected]>

lib/test_scanf: split up number parsing test routines

It turns out that gcc has real trouble merging all the temporary
on-stack buffer allocation.  So despite the fact that their lifetimes do
not overlap, gcc will allocate stack for all of them when they have
different types.  Which they do in the number scanning test routines.

This is unfortunate in general, but with lots of test-cases in one
function, it becomes a real problem.  gcc will allocate a huge stack
frame for no actual good reason.

We have tried to counteract this tendency of gcc not merging stack slots
(see "-fconserve-stack"), but that has limited effect (and should be on
by default these days, iirc).

So with all the debug options enabled on an i386 allmodconfig build, we
end up with overly big stack frames, and the resulting stack frame size
warnings (now errors):

   lib/test_scanf.c: In function ‘numbers_list_field_width_val_width’:
   lib/test_scanf.c:530:1: error: the frame size of 2088 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
     530 | }
         | ^
   lib/test_scanf.c: In function ‘numbers_list_field_width_typemax’:
   lib/test_scanf.c:488:1: error: the frame size of 2568 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
     488 | }
         | ^
   lib/test_scanf.c: In function ‘numbers_list’:
   lib/test_scanf.c:437:1: error: the frame size of 2088 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
     437 | }
         | ^

In this particular case, the reasonably straightforward solution is to
just split out the test routines into multiple more targeted versions.
That way we don't have one huge stack, but several smaller ones, and
they aren't active all at the same time.

Signed-off-by: Linus Torvalds <[email protected]>

iwl: fix debug printf format strings

The variable 'package_size' is an unsigned long, and should be printed
out using '%lu', not '%zd' (that would be for a size_t).

Yes, on many architectures (including x86-64), 'size_t' is in fact the
same type as 'long', but that's a fairly random architecture definition,
and on some platforms 'size_t' is in fact 'int' rather than 'long'.

That is the case on traditional 32-bit x86. Yes, both types are the
exact same 32-bit size, and it would all print out perfectly correctly,
but '%zd' ends up still being wrong.

And we can't make 'package_size' be a 'size_t', because we get the
actual value using efivar_entry_get() that takes a pointer to an
'unsigned long'. So '%lu' it is.

This fixes two of the i386 allmodconfig build warnings (that is now an
error due to -Werror).

Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'block-5.15-2021-09-05' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Was going to send this one in later this week, but given that -Werror
  is now enabled (or at least available), the mq-deadline fix really
  should go in for the folks hitting that.

   - Ensure dd_queued() is only there if needed (Geert)

   - Fix a kerneldoc warning for bio_alloc_kiocb()

   - BFQ fix for queue merging

   - loop locking fix (Tetsuo)"

* tag 'block-5.15-2021-09-05' of git://git.kernel.dk/linux-block:
  loop: reduce the loop_ctl_mutex scope
  bio: fix kerneldoc documentation for bio_alloc_kiocb()
  block, bfq: honor already-setup queue merges
  block/mq-deadline: Move dd_queued() to fix defined but not used warning

Merge tag 'misc-5.15-2021-09-05' of git://git.kernel.dk/linux-block

Pull CDROM maintainer update from Jens Axboe:
"It's been about 22 years since I originally started maintaining the
  CDROM code, and I just haven't been able to even get reviews done in a
  timely fashion the last handful of years.

  Time to pass it on, and Phillip has volunteered take over these
  duties. I'll be helping as needed for the foreseeable future"

* tag 'misc-5.15-2021-09-05' of git://git.kernel.dk/linux-block:
  cdrom: update uniform CD-ROM maintainership in MAINTAINERS file

Merge tag 'libata-5.15-2021-09-05' of git://git.kernel.dk/linux-block

Pull libata fixes from Jens Axboe:
"Fixes for queued trim on certain Samsung SSDs, in conjunction with
  certain ATI controllers"

* tag 'libata-5.15-2021-09-05' of git://git.kernel.dk/linux-block:
  libata: Add ATA_HORKAGE_NO_NCQ_ON_ATI for Samsung 860 and 870 SSD.
  libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs

Merge tag 'for-5.15/io_uring-2021-09-04' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"As sometimes happens, two reports came in around the merge window open
  that led to some fixes. Hence this one is a bit bigger than usual
  followup fixes, but most of it will be going towards stable, outside
  of the fixes that are addressing regressions from this merge window.

  In detail:

   - postgres is a heavy user of signals between tasks, and if we're
     unlucky this can interfere with io-wq worker creation. Make sure
     we're resilient against unrelated signal handling. This set of
     changes also includes hardening against allocation failures, which
     could previously had led to stalls.

   - Some use cases that end up having a mix of bounded and unbounded
     work would have starvation issues related to that. Split the
     pending work lists to handle that better.

   - Completion trace int -> unsigned -> long fix

   - Fix issue with REGISTER_IOWQ_MAX_WORKERS and SQPOLL

   - Fix regression with hash wait lock in this merge window

   - Fix retry issued on block devices (Ming)

   - Fix regression with links in this merge window (Pavel)

   - Fix race with multi-shot poll and completions (Xiaoguang)

   - Ensure regular file IO doesn't inadvertently skip completion
     batching (Pavel)

   - Ensure submissions are flushed after running task_work (Pavel)"

* tag 'for-5.15/io_uring-2021-09-04' of git://git.kernel.dk/linux-block:
  io_uring: io_uring_complete() trace should take an integer
  io_uring: fix possible poll event lost in multi shot mode
  io_uring: prolong tctx_task_work() with flushing
  io_uring: don't disable kiocb_done() CQE batching
  io_uring: ensure IORING_REGISTER_IOWQ_MAX_WORKERS works with SQPOLL
  io-wq: make worker creation resilient against signals
  io-wq: get rid of FIXED worker flag
  io-wq: only exit on fatal signals
  io-wq: split bounded and unbounded work into separate lists
  io-wq: fix queue stalling race
  io_uring: don't submit half-prepared drain request
  io_uring: fix queueing half-created requests
  io-wq: ensure that hash wait lock is IRQ disabling
  io_uring: retry in case of short read on block device
  io_uring: IORING_OP_WRITE needs hash_reg_file set
  io-wq: fix race between adding work and activating a free worker

don't make the syscall checking produce errors from warnings

Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

net: wwan: iosm: Unify IO accessors used in the driver

Currently we have readl()/writel()/ioread*()/iowrite*() APIs in use.
Let's unify to use only ioread*()/iowrite*() variants.

Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: wwan: iosm: Replace io.*64_lo_hi() with regular accessors

The io.*_lo_hi() variants are not strictly needed on the x86 hardware
and especially the PCI bus. Replace them with regular accessors, but
leave headers in place in case of 32-bit build.

Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: qcom/emac: Replace strlcpy with strscpy

The strlcpy should not be used because it doesn't limit the source
length. As linus says, it's a completely useless function if you
can't implicitly trust the source string - but that is almost always
why people think they should use it! All in all the BSD function
will lead some potential bugs.

But the strscpy doesn't require reading memory from the src string
beyond the specified "count" bytes, and since the return value is
easier to error-check than strlcpy()'s. In addition, the implementation
is robust to the string changing out from underneath it, unlike the
current strlcpy() implementation.

Thus, We prefer using strscpy instead of strlcpy.

Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ip6_gre: Revert "ip6_gre: add validation for csum_start"

This reverts commit 9cf448c200ba9935baa94e7a0964598ce947db9d.

This commit was added for equivalence with a similar fix to ip_gre.
That fix proved to have a bug. Upon closer inspection, ip6_gre is not
susceptible to the original bug.

So revert the unnecessary extra check.

In short, ipgre_xmit calls skb_pull to remove ipv4 headers previously
inserted by dev_hard_header. ip6gre_tunnel_xmit does not.

Link: https://lore.kernel.org/netdev/CA+FuTSe+vJgTVLc9SojGuN-f9YQ+xWLPKE_S4f=f+w+_P2hgUg@mail.gmail.com/#t
Fixes: 9cf448c200ba ("ip6_gre: add validation for csum_start")
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

kernel: debug: Convert to SPDX identifier

use SPDX-License-Identifier instead of a verbose license text

Signed-off-by: Cai Huoqing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Daniel Thompson <[email protected]>

KVM: Drop unused kvm_dirty_gfn_invalid()

Drop the unused function as reported by test bot.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <20210901230904 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

net: hns3: make hclgevf_cmd_caps_bit_map0 and hclge_cmd_caps_bit_map0 static

This symbols is not used outside of hclge_cmd.c and hclgevf_cmd.c, so marks
it static.

Fix the following sparse warning:

drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c:345:35:
warning: symbol 'hclgevf_cmd_caps_bit_map0' was not declared. Should it
be static?

drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c:365:33: warning:
symbol 'hclge_cmd_caps_bit_map0' was not declared. Should it be static?

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: chongjiapeng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'bonding-fix'

Jussi Maki says:

====================
bonding: Fix negative jump count reported by syzbot

This patch set fixes a negative jump count warning encountered by
syzbot [1] and extends the tests to cover nested bonding devices.

[1]: https://lore.kernel.org/lkml/0000000000000a9f3605cb1d2455@google.com/
====================

Signed-off-by: David S. Miller <[email protected]>

selftests/bpf: Test XDP bonding nest and unwind

Modify the test to check that enslaving a bond slave with a XDP program
is now allowed.

Extend attach test to exercise the program unwinding in bond_xdp_set and
add a new test for loading XDP program on doubly nested bond device to
verify that static key incr/decr is correct.

Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bonding: Fix negative jump label count on nested bonding

With nested bonding devices the nested bond device's ndo_bpf was
called without a program causing it to decrement the static key
without a prior increment leading to negative count.

Fix the issue by 1) only calling slave's ndo_bpf when there's a
program to be loaded and 2) only decrement the count when a program
is unloaded.

Fixes: 9e2ee5c7e7c3 ("net, bonding: Add XDP support to the bonding driver")
Reported-by: [email protected]
Signed-off-by: Jussi Maki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

MAINTAINERS: add VM SOCKETS (AF_VSOCK) entry

Add a new entry for VM Sockets (AF_VSOCK) that covers vsock core,
tests, and headers. Move some general vsock stuff from virtio-vsock
entry into this new more general vsock entry.

I've been reviewing and contributing for the last few years,
so I'm available to help maintain this code.

Cc: Dexuan Cui <[email protected]>
Cc: Jorgen Hansen <[email protected]>
Cc: Stefan Hajnoczi <[email protected]>
Suggested-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Stefano Garzarella <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

stmmac: dwmac-loongson:Fix missing return value

Add the return value when phy_mode < 0.

Signed-off-by: zhaoxiao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fuse: remove unused arg in fuse_write_file_get()

The struct fuse_conn argument is not used and can be removed.

Signed-off-by: Miklos Szeredi <[email protected]>

fuse: wait for writepages in syncfs

In case of fuse the MM subsystem doesn't guarantee that page writeback
completes by the time ->sync_fs() is called.  This is because fuse
completes page writeback immediately to prevent DoS of memory reclaim by
the userspace file server.

This means that fuse itself must ensure that writes are synced before
sending the SYNCFS request to the server.

Introduce sync buckets, that hold a counter for the number of outstanding
write requests.  On syncfs replace the current bucket with a new one and
wait until the old bucket's counter goes down to zero.

It is possible to have multiple syncfs calls in parallel, in which case
there could be more than one waited-on buckets.  Descendant buckets must
not complete until the parent completes.  Add a count to the child (new)
bucket until the (parent) old bucket completes.

Use RCU protection to dereference the current bucket and to wake up an
emptied bucket.  Use fc->lock to protect against parallel assignments to
the current bucket.

This leaves just the counter to be a possible scalability issue.  The
fc->num_waiting counter has a similar issue, so both should be addressed at
the same time.

Reported-by: Amir Goldstein <[email protected]>
Fixes: 2d82ab251ef0 ("virtiofs: propagate sync() to file server")
Cc: <[email protected]> # v5.14
Signed-off-by: Miklos Szeredi <[email protected]>

Documentation: Add documentation for VDUSE

VDUSE (vDPA Device in Userspace) is a framework to support
implementing software-emulated vDPA devices in userspace. This
document is intended to clarify the VDUSE design and usage.

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>

vduse: Introduce VDUSE - vDPA Device in Userspace

This VDUSE driver enables implementing software-emulated vDPA
devices in userspace. The vDPA device is created by
ioctl(VDUSE_CREATE_DEV) on /dev/vduse/control. Then a char device
interface (/dev/vduse/$NAME) is exported to userspace for device
emulation.

In order to make the device emulation more secure, the device's
control path is handled in kernel. A message mechnism is introduced
to forward some dataplane related control messages to userspace.

And in the data path, the DMA buffer will be mapped into userspace
address space through different ways depending on the vDPA bus to
which the vDPA device is attached. In virtio-vdpa case, the MMU-based
software IOTLB is used to achieve that. And in vhost-vdpa case, the
DMA buffer is reside in a userspace memory region which can be shared
to the VDUSE userspace processs via transferring the shmfd.

For more details on VDUSE design and usage, please see the follow-on
Documentation commit.

NB(mst): when merging this with
b542e383d8c0 ("eventfd: Make signal recursion protection a task bit")
replace eventfd_signal_count with eventfd_signal_allowed,
and drop the previous
("eventfd: Export eventfd_wake_count to modules").

Signed-off-by: Xie Yongji <[email protected]>
Acked-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>