parisc: Ensure userspace privilege for ptraced processes in regset functions
On parisc the privilege level of a process is stored in the lowest two bits of
the instruction pointers (IAOQ0 and IAOQ1). On Linux we use privilege level 0
for the kernel and privilege level 3 for user-space. So userspace should not be
allowed to modify IAOQ0 or IAOQ1 of a ptraced process to change its privilege
level to e.g. 0 to try to gain kernel privileges.
This patch prevents such modifications in the regset support functions by
always setting the two lowest bits to one (which relates to privilege level 3
for user-space) if IAOQ0 or IAOQ1 are modified via ptrace regset calls.
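
As a rough user-space illustration of the masking described above (this is not the kernel's code; the helper name force_user_privilege and the constant PRIV_USER_SKETCH are made up for this example), forcing the two lowest bits to one could look like this:

```c
#include <stdint.h>

/* Hypothetical constant for this sketch: privilege level 3 (user space)
 * occupies the two lowest bits of an instruction address queue value. */
#define PRIV_USER_SKETCH 3u

/* Return the IAOQ value with both low bits set, so that whatever value a
 * tracer wrote, the traced process stays at privilege level 3. */
static inline uint64_t force_user_privilege(uint64_t iaoq)
{
    return iaoq | PRIV_USER_SKETCH;
}
```

A tracer that writes 0x1000 (privilege level 0) would thus end up storing 0x1003.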
parisc: Fix kernel panic due invalid values in IAOQ0 or IAOQ1
On parisc the privilege level of a process is stored in the lowest two bits of
the instruction pointers (IAOQ0 and IAOQ1). On Linux we use privilege level 0
for the kernel and privilege level 3 for user-space. So userspace should not be
allowed to modify IAOQ0 or IAOQ1 of a ptraced process to change its privilege
level to e.g. 0 to try to gain kernel privileges.
This patch prevents such modifications by always setting the two lowest bits to
one (which relates to privilege level 3 for user-space) if IAOQ0 or IAOQ1 are
modified via ptrace calls in the native and compat ptrace paths.
Cong Wang [Tue, 16 Jul 2019 20:57:30 +0000 (13:57 -0700)]
net_sched: unset TCQ_F_CAN_BYPASS when adding filters
For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
notably fq_codel, it makes no sense to let packets bypass the TC
filters we set up in any scenario, otherwise our packet steering
policy could not be enforced.
This can be reproduced easily with the following script:
ip li add dev dummy0 type dummy
ifconfig dummy0 up
tc qd add dev dummy0 root fq_codel
tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo
tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo
ping -I dummy0 192.168.112.1
Without this patch, packets are sent directly to dummy0 without
hitting any of the filters. With this patch, packets are redirected
to loopback as expected.
This fix is not perfect, it only unsets the flag but does not set it back
because we have to save the information somewhere in the qdisc if we
really want that. Note, both fq_codel and sfq clear this flag in their
->bind_tcf() but this is clearly not sufficient when we don't use any
class ID.
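
The behaviour described above can be modelled in a few lines of user-space C. This is only a sketch under assumptions: the struct, the function names and the flag value are invented for illustration and do not mirror the kernel's qdisc structures.

```c
#include <stdbool.h>

/* Illustrative flag bit; the value is an assumption of this sketch. */
#define SKETCH_F_CAN_BYPASS 0x4u

/* Toy stand-in for a qdisc: a flags word and a filter count. */
struct sketch_qdisc {
    unsigned int flags;
    unsigned int nr_filters;
};

/* Adding a filter clears the bypass flag, so later packets must go
 * through classification instead of being sent directly. */
static void sketch_add_filter(struct sketch_qdisc *q)
{
    q->nr_filters++;
    q->flags &= ~SKETCH_F_CAN_BYPASS;
}

/* As the commit message notes, the flag is not set back on removal. */
static void sketch_del_filter(struct sketch_qdisc *q)
{
    if (q->nr_filters > 0)
        q->nr_filters--;
}

static bool sketch_can_bypass(const struct sketch_qdisc *q)
{
    return (q->flags & SKETCH_F_CAN_BYPASS) != 0;
}
```

The one-way nature of the change is visible here: once sketch_add_filter() has run, sketch_can_bypass() stays false even after the last filter is removed.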
Merge tag 'gpio-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
- Revert a SPI GPIO fix that didn't fix anything but instead created
new problems.
- Remove the EM GPIO irqdomain in a safe manner.
- Fix a memory leak in the gpio quirks.
- Make the DaVinci error path silent on probe deferral.
* tag 'gpio-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
Revert "gpio/spi: Fix spi-gpio regression on active high CS"
gpio: em: remove the gpiochip before removing the irq domain
gpiolib: of: fix a memory leak in of_gpio_flags_quirks()
gpio: davinci: silence error prints in case of EPROBE_DEFER
net/rds: Initialize ic->i_fastreg_wrs upon allocation
Otherwise, if an IB connection is torn down before "rds_ib_setup_qp"
is called, the value of "ic->i_fastreg_wrs" is still at zero
(as it wasn't initialized by "rds_ib_setup_qp").
Consequently "rds_ib_conn_path_shutdown" will spin forever,
waiting for it to go back to "RDS_IB_DEFAULT_FR_WR",
which of course will never happen as there are no
outstanding work requests.
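
A simplified model of why initializing the counter at allocation time matters is sketched below. The struct, the helper names and the value of SKETCH_DEFAULT_FR_WR are placeholders chosen for this example; they are not taken from the RDS sources.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Placeholder for the default number of fast-registration work requests. */
#define SKETCH_DEFAULT_FR_WR 256

struct sketch_conn {
    atomic_int fastreg_wrs;
};

/* Allocate a connection and set the counter right away, rather than
 * leaving it at zero until a later setup step that may never run. */
static struct sketch_conn *sketch_conn_alloc(void)
{
    struct sketch_conn *ic = calloc(1, sizeof(*ic));

    if (!ic)
        return NULL;
    atomic_init(&ic->fastreg_wrs, SKETCH_DEFAULT_FR_WR);
    return ic;
}

/* Shutdown may only proceed once every work request has been returned,
 * i.e. the counter is back at its default value. */
static bool sketch_shutdown_may_proceed(struct sketch_conn *ic)
{
    return atomic_load(&ic->fastreg_wrs) == SKETCH_DEFAULT_FR_WR;
}
```

With the counter left at zero after calloc(), sketch_shutdown_may_proceed() would never become true for a connection torn down before setup, which is the endless wait the commit message describes.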
net/rds: Keep track of and wait for FRWR segments in use upon shutdown
Since "rds_ib_free_frmr" and "rds_ib_free_frmr_list" simply put
the FRMR memory segments on the "drop_list" or "free_list",
and it is the job of "rds_ib_flush_mr_pool" to reap those entries
by ultimately issuing a "IB_WR_LOCAL_INV" work-request,
we need to trigger and then wait for all those memory segments
attached to a particular connection to be fully released before
we can move on to release the QP, CQ, etc.
So we make "rds_ib_conn_path_shutdown" wait for one more
atomic_t called "i_fastreg_inuse_count" that keeps track of how
many FRWR memory segments are out there marked "FRMR_IS_INUSE"
(and also wake_up rds_ib_ring_empty_wait, as they go away).
net/rds: Set fr_state only to FRMR_IS_FREE if IB_WR_LOCAL_INV had been successful
Fix a bug where fr_state first goes to FRMR_IS_STALE, because of a failure
of operation IB_WR_LOCAL_INV, but then gets set back to "FRMR_IS_FREE"
unconditionally, even though the operation failed.
Make function "rds_ib_try_reuse_ibmr" return NULL in case
memory region could not be allocated, since callers
simply check if the return value is not NULL.
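
The state rule from the first paragraph can be written down as a tiny sketch. The enum and function names below are invented for illustration and only model the intended transition, not the actual completion handler.

```c
/* Hypothetical mirror of the three FRMR states named in the text. */
enum sketch_fr_state {
    SKETCH_FRMR_IS_FREE,
    SKETCH_FRMR_IS_INUSE,
    SKETCH_FRMR_IS_STALE
};

/* State after a local-invalidate request completes: the region becomes
 * free only if the invalidation succeeded, otherwise it is stale. */
static enum sketch_fr_state sketch_state_after_inv(int inv_succeeded)
{
    return inv_succeeded ? SKETCH_FRMR_IS_FREE : SKETCH_FRMR_IS_STALE;
}
```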
net/rds: Wait for the FRMR_IS_FREE (or FRMR_IS_STALE) transition after posting IB_WR_LOCAL_INV
In order to:
1) avoid a silly bouncing between "clean_list" and "drop_list"
triggered by function "rds_ib_reg_frmr" as it releases frmr
regions whose state is not "FRMR_IS_FREE" right away.
2) prevent an invalid access error in a race from a pending
"IB_WR_LOCAL_INV" operation with a teardown ("dma_unmap_sg", "put_page")
and de-registration ("ib_dereg_mr") of the corresponding
memory region.
net/rds: Get rid of "wait_clean_list_grace" and add locking
Waiting for activity on the "clean_list" to quiesce is no substitute
for proper locking.
We can have multiple threads competing for "llist_del_first"
via "rds_ib_reuse_mr", and a single thread competing
for "llist_del_all" and "llist_del_first" via "rds_ib_flush_mr_pool".
Since "llist_del_first" depends on "list->first->next" not to change
in the midst of the operation, simply waiting for all current calls
to "rds_ib_reuse_mr" to quiesce across all CPUs is woefully inadequate:
By the time "wait_clean_list_grace" is done iterating over all CPUs to see
that there is no concurrent caller to "rds_ib_reuse_mr", a new caller may
have just shown up on the first CPU.
Furthermore, <linux/llist.h> explicitly calls out the need for locking:
* Cases where locking is needed:
* If we have multiple consumers with llist_del_first used in one consumer,
* and llist_del_first or llist_del_all used in other consumers,
* then a lock is needed.
Also, while at it, drop the unused "pool" parameter
from "list_to_llist_nodes".
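
To make the locking requirement concrete, here is a deliberately simplified user-space sketch. It is an assumption-laden model, not the kernel's llist: every operation (including add, which is lock-free in the kernel) takes a pthread mutex, and all names are invented for this example.

```c
#include <pthread.h>
#include <stddef.h>

struct sketch_node {
    struct sketch_node *next;
};

struct sketch_list {
    pthread_mutex_t lock;
    struct sketch_node *first;
};

static void sketch_list_init(struct sketch_list *l)
{
    pthread_mutex_init(&l->lock, NULL);
    l->first = NULL;
}

/* Push a node onto the front of the list. */
static void sketch_list_add(struct sketch_list *l, struct sketch_node *n)
{
    pthread_mutex_lock(&l->lock);
    n->next = l->first;
    l->first = n;
    pthread_mutex_unlock(&l->lock);
}

/* Pop the first node. Reading n->next is safe only because no other
 * consumer can remove or re-add nodes while the lock is held. */
static struct sketch_node *sketch_list_del_first(struct sketch_list *l)
{
    struct sketch_node *n;

    pthread_mutex_lock(&l->lock);
    n = l->first;
    if (n)
        l->first = n->next;
    pthread_mutex_unlock(&l->lock);
    return n;
}

/* Detach the whole chain and hand it to the caller. */
static struct sketch_node *sketch_list_del_all(struct sketch_list *l)
{
    struct sketch_node *n;

    pthread_mutex_lock(&l->lock);
    n = l->first;
    l->first = NULL;
    pthread_mutex_unlock(&l->lock);
    return n;
}
```

The point of the sketch is that the two consumers exclude each other explicitly, instead of one of them waiting for the other to appear idle.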
net/rds: Give fr_state a chance to transition to FRMR_IS_FREE
In the context of FRMR (ib_frmr.c):
Memory regions make it onto the "clean_list" via "rds_ib_flush_mr_pool",
after the memory region has been posted for invalidation via
"rds_ib_post_inv".
At that point in time, "fr_state" may still be in state "FRMR_IS_INUSE",
since the only place where "fr_state" transitions to "FRMR_IS_FREE"
is in "rds_ib_mr_cqe_handler", which is triggered by a tasklet.
So in case we notice that "fr_state != FRMR_IS_FREE" (see below),
we wait for "fr_inv_done" to trigger with a maximum of 10msec.
Then we check again, and only put the memory region onto the drop_list
(via "rds_ib_free_frmr") in case the situation remains unchanged.
This avoids the problem of memory-regions bouncing between "clean_list"
and "drop_list" before they even have a chance to be properly invalidated.
net/sched/act_ct.o: In function `tcf_ct_act':
act_ct.c:(.text+0x21ac): undefined reference to `nf_ct_nat_ext_add'
act_ct.c:(.text+0x229a): undefined reference to `nf_nat_icmp_reply_translation'
act_ct.c:(.text+0x233a): undefined reference to `nf_nat_setup_info'
act_ct.c:(.text+0x234a): undefined reference to `nf_nat_alloc_null_binding'
act_ct.c:(.text+0x237c): undefined reference to `nf_nat_packet'
net: sctp: fix warning "NULL check before some freeing functions is not needed"
This patch removes NULL checks before calling kfree.
Fixes the issues below, reported by coccicheck:
net/sctp/sm_make_chunk.c:2586:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2652:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2667:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2684:3-8: WARNING: NULL check before some
freeing functions is not needed.
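
The pattern behind this warning can be shown with standard C, where free(NULL) is defined to do nothing, just as kfree(NULL) is a no-op in the kernel. The struct and function below are hypothetical and unrelated to the SCTP code.

```c
#include <stdlib.h>

/* Hypothetical container with one optionally allocated member. */
struct sketch_chunk {
    char *cookie;
};

/* No "if (c->cookie)" guard is needed: freeing a NULL pointer is a no-op. */
static void sketch_chunk_release(struct sketch_chunk *c)
{
    free(c->cookie);
    c->cookie = NULL;
}
```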
Merge tag 'hwlock-v5.3' of git://github.com/andersson/remoteproc
Pull hwspinlock updates from Bjorn Andersson:
"This contains support for hardware spinlock TI K3 AM65x and J721E
family of SoCs, support for using hwspinlocks from atomic context and
better error reporting when dealing with hardware disabled in
DeviceTree"
* tag 'hwlock-v5.3' of git://github.com/andersson/remoteproc:
hwspinlock: add the 'in_atomic' API
hwspinlock: document the hwspinlock 'raw' API
hwspinlock: stm32: implement the relax() ops
hwspinlock: ignore disabled device
hwspinlock/omap: Add a trace during probe
hwspinlock/omap: Add support for TI K3 SoCs
dt-bindings: hwlock: Update OMAP binding for TI K3 SoCs
Merge tag 'rproc-v5.3' of git://github.com/andersson/remoteproc
Pull remoteproc updates from Bjorn Andersson:
"This adds support for the STM32 remoteproc, additional i.MX platforms
with Cortex M4 remoteprocs and Qualcomm's QCS404 Compute DSP.
Also initial support for vendor specific resource table entries and
support for unprocessed Qualcomm firmware files"
* tag 'rproc-v5.3' of git://github.com/andersson/remoteproc:
remoteproc: stm32: fix building without ARM SMCC
remoteproc: qcom: q6v5-mss: Fix build error without QCOM_MDT_LOADER
remoteproc: copy parent dma_pfn_offset for vdev
remoteproc: qcom: q6v5-mss: Support loading non-split images
soc: qcom: mdt_loader: Support loading non-split images
remoteproc: stm32: add an ST stm32_rproc driver
dt-bindings: remoteproc: add bindings for stm32 remote processor driver
dt-bindings: stm32: add bindings for ML-AHB interconnect
remoteproc: Use struct_size() helper
remoteproc: add vendor resources handling
remoteproc: imx: Fix typo in "failed"
remoteproc: imx: Broaden the Kconfig selection logic
remoteproc,rpmsg: add missing MAINTAINERS file entries
remoteproc: qcom: qdsp6-adsp: Add support for QCS404 CDSP
dt-bindings: remoteproc: Rename and amend Hexagon v56 binding
If we shut down a process without having destroyed its GWS-using
queues, it is possible that GWS BO will still be in the process
BO list during the gpuvm destruction. This list should be empty
at that time, so we should remove the GWS allocation at the
process uninit point if it is still around.
Evan Quan [Thu, 11 Jul 2019 07:13:17 +0000 (15:13 +0800)]
drm/amd/powerplay: correct smu_update_table usage
The interface was used in a confusing way. In profile mode scenario,
the 2nd parameter of the interface was used in a different way from
other scenarios.
Felix Kuehling [Sat, 13 Jul 2019 06:27:34 +0000 (02:27 -0400)]
drm/amdgpu: Fix silent amdgpu_bo_move failures
Under memory pressure, buffer moves between RAM and VRAM can
fail when there is no GTT space available. In those cases
amdgpu_bo_move falls back to ttm_bo_move_memcpy, which seems to
succeed, although it doesn't really support non-contiguous or
invisible VRAM. This manifests as VM faults with corrupted page
table entries in KFD eviction stress tests.
Print some helpful messages when lack of GTT space is causing buffer
moves to fail. Check that source and destination memory regions are
supported by ttm_bo_move_memcpy before taking that fallback.
Merge tag 'rpmsg-v5.3' of git://github.com/andersson/remoteproc
Pull rpmsg updates from Bjorn Andersson:
"This contains a DT binding update and a change to make the remove
function of rpmsg_devices optional"
* tag 'rpmsg-v5.3' of git://github.com/andersson/remoteproc:
rpmsg: core: Make remove handler for rpmsg driver optional.
dt-bindings: soc: qcom: Add remote-pid binding for GLINK SMEM
Merge tag 'vfio-v5.3-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Static symbol cleanup in mdev samples (Kefeng Wang)
- Use vma helper in nvlink code (Peng Hao)
- Remove unused code in mbochs sample (YueHaibing)
- Send uevents around mdev registration (Alex Williamson)
* tag 'vfio-v5.3-rc1' of git://github.com/awilliam/linux-vfio:
mdev: Send uevents around parent device registration
sample/mdev/mbochs: remove set but not used variable 'mdev_state'
vfio: vfio_pci_nvlink2: use a vma helper function
vfio-mdev/samples: make some symbols static
line 1: directory path to the .ko file
line 2: a list of objects linked into this module
line 3: unresolved symbols (only when CONFIG_TRIM_UNUSED_KSYMS=y)
Now that *.mod and *.ko are created in the same directory, the line 1
provides no valuable information. It can be derived by replacing the
extension .mod with .ko. In fact, nobody uses the first line any more.
kbuild: create *.mod with full directory path and remove MODVERDIR
While descending directories, Kbuild produces objects for modules,
but does not link final *.ko files; it is done in the modpost.
To keep track of modules, Kbuild creates a *.mod file in $(MODVERDIR)
for every module it is building. Some post-processing steps read the
necessary information from *.mod files. This avoids descending into
directories again. This mechanism was introduced in 2003 or so.
Later, commit 551559e13af1 ("kbuild: implement modules.order") added
modules.order. So, we can simply read it out to know all the modules
with directory paths. This is easier than parsing the first line of
*.mod files.
$(MODVERDIR) has a flat directory structure, that is, *.mod files
are named only with base names. This is based on the assumption that
the module name is unique across the tree. This assumption is really
fragile.
Stephen Rothwell reported a race condition caused by a module name
conflict:
https://lkml.org/lkml/2019/5/13/991
In parallel building, two different threads could write to the same
$(MODVERDIR)/*.mod simultaneously.
Non-unique module names are the source of all kinds of trouble, hence
commit 3a48a91901c5 ("kbuild: check uniqueness of module names")
introduced a new checker script.
However, it is still fragile from the build system's point of view because
this race happens before scripts/modules-check.sh is invoked. If it
happens again, the modpost will emit unclear error messages.
To fix this issue completely, create *.mod with full directory path
so that two threads never attempt to write to the same file.
$(MODVERDIR) is no longer needed.
Since modules with directory paths are listed in modules.order, Kbuild
is still able to find *.mod files without additional descending.
I also killed cmd_secanalysis; scripts/mod/sumversion.c computes MD4 hash
for modules with MODULE_VERSION(). When CONFIG_DEBUG_SECTION_MISMATCH=y,
it occurs not only in the modpost stage, but also during directory
descending, where sumversion.c may parse stale *.mod files. It would emit
a 'No such file or directory' warning when an object that is part of a module is
renamed, or when a single-obj module is turned into a multi-obj module or
vice versa.
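
Since each modules.order entry carries the directory path, the matching *.mod file can be found by swapping the extension. The helper below is a stand-alone sketch of that string manipulation; the function name and the example path are hypothetical, and Kbuild itself does this in Makefile rules rather than in C.

```c
#include <stddef.h>
#include <string.h>

/* Write the .mod path for a modules.order style entry such as
 * "drivers/foo/bar.ko" into buf. Returns 0 on success, or -1 if the
 * entry does not end in ".ko" or buf is too small. */
static int sketch_mod_path(const char *ko, char *buf, size_t len)
{
    size_t n = strlen(ko);

    if (n < 3 || strcmp(ko + n - 3, ".ko") != 0)
        return -1;
    /* stem (n - 3) + ".mod" (4) + terminating NUL (1) = n + 2 bytes */
    if (n + 2 > len)
        return -1;
    memcpy(buf, ko, n - 3);
    strcpy(buf + n - 3, ".mod");
    return 0;
}
```

Because the directory part is preserved, two modules with the same base name in different directories map to different *.mod files, which is what removes the write race.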
kbuild: export_report: read modules.order instead of .tmp_versions/*.mod
Towards the goal of removing MODVERDIR aka .tmp_versions, read out
modules.order to get the list of modules to be processed. This is
simpler than parsing *.mod files in .tmp_versions.
kbuild: modpost: read modules.order instead of $(MODVERDIR)/*.mod
Towards the goal of removing MODVERDIR, read out modules.order to get
the list of modules to be processed. This is simpler than parsing *.mod
files in $(MODVERDIR).
For external modules, $(KBUILD_EXTMOD)/modules.order should be read.
I removed the single target %.ko from the top Makefile. To make sure
modpost works correctly, vmlinux and the other modules must be built.
You cannot build a particular .ko file alone.
Mike Snitzer [Wed, 17 Jul 2019 16:57:06 +0000 (12:57 -0400)]
dm: use printk ratelimiting functions
DM provided its own ratelimiting printk wrapper but given printk
advances this is no longer needed.
Also, switching DMDEBUG_LIMIT to using pr_debug_ratelimited() fixes the
reported issue where DMDEBUG_LIMIT() still caused a flood of "callbacks
suppressed" messages.
Reported-by: Milan Broz <[email protected]>
Depends-on: 29fc2bc7539386 ("printk: pr_debug_ratelimited: check state first to reduce "callbacks suppressed" messages")
Signed-off-by: Mike Snitzer <[email protected]>
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"This round of clk driver and framework updates is heavy on the driver
update side. The two main highlights in the core framework are the
addition of a bulk clk_get API that handles optional clks and an
extra debugfs file that tells the developer about the current parent
of a clk.
The driver updates are dominated by i.MX in the diffstat, but that is
mostly because that SoC has started converting to the clk_hw style of
clk registration. The next big update is in the Amlogic meson clk
driver that gained some support for audio, cpu, and temperature clks
while fixing some PLL issues. Finally, the biggest thing that stands
out is the conversion of a large part of the Allwinner sunxi-ng driver
to the new clk parent scheme that uses fewer strings and more pointer
comparisons to match clk parents and children up.
In general, it looks like we have a lot of little fixes and tweaks
here and there to clk data along with the normal addition of a handful
of new drivers and a couple new core framework features.
Core:
- Add a 'clk_parent' file in clk debugfs
- Add a clk_bulk_get_optional() API (with devm too)
New Drivers:
- Support gated clk controller on MIPS based BCM63XX SoCs
- Support SiLabs Si5341 and Si5340 chips
- Support for CPU clks on Raspberry Pi devices
- Audsys clock driver for MediaTek MT8516 SoCs
Updates:
- Convert a large portion of the Allwinner sunxi-ng driver to new clk parent scheme
- Small frequency support for SiLabs Si544 chips
- Slow clk support for AT91 SAM9X60 SoCs
- Remove dead code in various clk drivers (-Wunused)
- Support for Marvell 98DX1135 SoCs
- Get duty cycle of generic pwm clks
- Improvement in mmc phase calculation and cleanup of some rate definitions
- Switch i.MX6 and i.MX7 clock drivers to clk_hw based APIs
- Add GPIO, SNVS and GIC clocks for i.MX8 drivers
- Mark imx6sx/ul/ull/sll MMDC_P1_IPG and imx8mm DRAM_APB as critical clock
- Correct imx7ulp nic1_bus_clk and imx8mm audio_pll2_clk clock setting
- Add clks for new Exynos5422 Dynamic Memory Controller driver
- Clock definition for Exynos4412 Mali
- Add CMM (Color Management Module) clocks on Renesas R-Car H3, M3-N, E3, and D3
- Add TPU (Timer Pulse Unit / PWM) clocks on Renesas RZ/G2M
- Support for 32 bit clock IDs in TI's sci-clks for J721e SoCs
- TI clock probing done from DT by default instead of firmware
- Fix Amlogic Meson mpll fractional part and spread spectrum issues
- Add Amlogic meson8 audio clocks
- Add Amlogic g12a temperature sensors clocks
- Add Amlogic g12a and g12b cpu clocks
- Add TPU (Timer Pulse Unit / PWM) clocks on Renesas R-Car H3, M3-W, and M3-N
- Add CMM (Color Management Module) clocks on Renesas R-Car M3-W
- Add Clock Domain support on Renesas RZ/N1"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (190 commits)
clk: consoldiate the __clk_get_hw() declarations
clk: sprd: Add check for return value of sprd_clk_regmap_init()
clk: lochnagar: Update DT binding doc to include the primary SPDIF MCLK
clk: Add Si5341/Si5340 driver
dt-bindings: clock: Add silabs,si5341
clk: clk-si544: Implement small frequency change support
clk: add BCM63XX gated clock controller driver
devicetree: document the BCM63XX gated clock bindings
clk: at91: sckc: use dedicated functions to unregister clock
clk: at91: sckc: improve error path for sama5d4 sck registration
clk: at91: sckc: remove unnecessary line
clk: at91: sckc: improve error path for sam9x5 sck register
clk: at91: sckc: add support to free slow clock osclillator
clk: at91: sckc: add support to free slow rc oscillator
clk: at91: sckc: add support to free slow oscillator
clk: rockchip: export HDMIPHY clock on rk3228
clk: rockchip: add watchdog pclk on rk3328
clk: rockchip: add clock id for hdmi_phy special clock on rk3228
clk: rockchip: add clock id for watchdog pclk on rk3328
clk: at91: sckc: add support for SAM9X60
...
* tag 'rtc-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (37 commits)
rtc: wm831x: Add IRQF_ONESHOT flag
rtc: stm32: remove one condition check in stm32_rtc_set_alarm()
rtc: pcf2123: Fix build error
rtc: interface: Change type of 'count' from int to u64
rtc: pcf8563: Clear event flags and disable interrupts before requesting irq
rtc: pcf8563: Fix interrupt trigger method
rtc: pcf2123: fix negative offset rounding
rtc: pcf2123: add alarm support
rtc: pcf2123: use %ptR
rtc: pcf2123: port to regmap
rtc: pcf2123: remove sysfs register view
rtc: rx8025: simplify getting the adapter of a client
rtc: rx8010: simplify getting the adapter of a client
rtc: rv8803: simplify getting the adapter of a client
rtc: m41t80: simplify getting the adapter of a client
rtc: fm3130: simplify getting the adapter of a client
rtc: tegra: Drop MODULE_ALIAS
rtc: sun6i: Add R40 compatible
dt-bindings: rtc: sun6i: Add the R40 RTC compatible
dt-bindings: rtc: Convert Allwinner A31 RTC to a schema
...
Merge tag 'dmaengine-5.3-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
- Add support in dmaengine core to do device node checks for DT devices
and update a bunch of drivers to use that and remove open coding from
drivers
- New driver/driver support for new hardware, namely:
- MediaTek UART APDMA
- Freescale i.mx7ulp edma2
- Synopsys eDMA IP core version 0
- Allwinner H6 DMA
- Updates to axi-dma and support for interleaved cyclic transfers
- Greg's debugfs return value check removals on drivers
- Updates to stm32-dma, hsu, dw, pl330, tegra drivers
* tag 'dmaengine-5.3-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (68 commits)
dmaengine: Revert "dmaengine: fsl-edma: add i.mx7ulp edma2 version support"
dmaengine: at_xdmac: check for non-empty xfers_list before invoking callback
Documentation: dmaengine: clean up description of dmatest usage
dmaengine: tegra210-adma: remove PM_CLK dependency
dmaengine: fsl-edma: add i.mx7ulp edma2 version support
dt-bindings: dma: fsl-edma: add new i.mx7ulp-edma
dmaengine: fsl-edma-common: version check for v2 instead
dmaengine: fsl-edma-common: move dmamux register to another single function
dmaengine: fsl-edma: add drvdata for fsl-edma
dmaengine: Revert "dmaengine: fsl-edma: support little endian for edma driver"
dmaengine: rcar-dmac: Reject zero-length slave DMA requests
dmaengine: dw: Enable iDMA 32-bit on Intel Elkhart Lake
dmaengine: dw-edma: fix semicolon.cocci warnings
dmaengine: sh: usb-dmac: Use [] to denote a flexible array member
dmaengine: dmatest: timeout value of -1 should specify infinite wait
dmaengine: dw: Distinguish ->remove() between DW and iDMA 32-bit
dmaengine: fsl-edma: support little endian for edma driver
dmaengine: hsu: Revert "set HSU_CH_MTSR to memory width"
dmagengine: pl330: add code to get reset property
dt-bindings: pl330: document the optional resets property
...
Merge tag 'mips_5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Paul Burton:
"A light batch this time around but significant improvements for
certain systems:
- Removal of readq & writeq for MIPS32 kernels where they would
simply BUG() anyway, allowing drivers or other code that #ifdefs on
their presence to work properly.
- Improvements for Ingenic JZ4740 systems, including support for the
external memory controller & pinmuxing fixes for qi_lb60/NanoNote
systems.
- Improvements for Lantiq systems, in particular around SMP & IPIs.
- DT updates for ralink/MediaTek MT7628a systems to probe & configure
a bunch more devices.
- Miscellaneous cleanups & build fixes"
* tag 'mips_5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (30 commits)
MIPS: fix some more fall through errors in arch/mips
MIPS: perf events: handle switch statement falling through warnings
mips/kprobes: Export kprobe_fault_handler()
MAINTAINERS: Add myself as Ingenic SoCs maintainer
MIPS: ralink: mt7628a.dtsi: Add watchdog controller DT node
MIPS: ralink: mt7628a.dtsi: Add SPI controller DT node
MIPS: ralink: mt7628a.dtsi: Add GPIO controller DT node
MIPS: ralink: mt7628a.dtsi: Add pinctrl DT properties to the UART nodes
MIPS: ralink: mt7628a.dtsi: Add pinmux DT node
MIPS: ralink: mt7628a.dtsi: Add SPDX GPL-2.0 license identifier
MIPS: lantiq: Add SMP support for lantiq interrupt controller
MIPS: lantiq: Shorten register names, remove unused macros
MIPS: lantiq: Fix bitfield masking
MIPS: lantiq: Remove unused macros
MIPS: lantiq: Fix attributes of of_device_id structure
MIPS: lantiq: Change variables to the same type as the source
MIPS: lantiq: Move macro directly to iomem function
mips: Remove q-accessors from non-64bit platforms
FDDI: defza: Include linux/io-64-nonatomic-lo-hi.h
MIPS: configs: Remove useless UEVENT_HELPER_PATH
...
Merge tag 'for-linus-20190617' of git://git.sourceforge.jp/gitroot/uclinux-h8/linux
Pull SH updates from Yoshinori Sato.
kprobe fix, defconfig updates and a SH Kconfig fix.
* tag 'for-linus-20190617' of git://git.sourceforge.jp/gitroot/uclinux-h8/linux:
arch/sh: Check for kprobe trap number before trying to handle a kprobe trap
sh: configs: Remove useless UEVENT_HELPER_PATH
Fix allyesconfig output.
Wanpeng Li [Sat, 6 Jul 2019 01:26:50 +0000 (09:26 +0800)]
KVM: LAPIC: Make lapic timer unpinned
Commit 61abdbe0bcc2 ("kvm: x86: make lapic hrtimer pinned") pinned the
lapic timer to avoid waiting until the next kvm exit for the guest to
see KVM_REQ_PENDING_TIMER set. Another solution is to give a kick
after setting the KVM_REQ_PENDING_TIMER bit; making the lapic timer
unpinned will be used in follow-up patches.
Daniel Drake [Wed, 17 Jul 2019 05:10:58 +0000 (13:10 +0800)]
platform/x86: asus: Rename "fan mode" to "fan boost mode"
The Asus WMI spec indicates that the function being controlled here
is called "Fan Boost Mode". The user-facing documentation also calls it
this.
The spec uses the term "fan mode" to refer to other things,
including functionality expected to appear on future products.
We missed this before as we are not dealing with the most readable of
specs, and didn't foresee any confusion around shortening the name.
Rename "fan mode" to "fan boost mode" to improve consistency with the
spec and to avoid a future naming conflict.
There is no interface breakage here since this has yet to be included
in an official kernel release. I also updated the kernel version listed
under ABI accordingly.
Merge more updates from Andrew Morton:
"VM:
- z3fold fixes and enhancements by Henry Burns and Vitaly Wool
- more accurate reclaimed slab caches calculations by Yafang Shao
- fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by
Christoph Hellwig
- !CONFIG_MMU fixes by Christoph Hellwig
- new novmcoredd parameter to omit device dumps from vmcore, by
Kairui Song
- new test_meminit module for testing heap and pagealloc
initialization, by Alexander Potapenko
- ioremap improvements for huge mappings, by Anshuman Khandual
- generalize kprobe page fault handling, by Anshuman Khandual
- device-dax hotplug fixes and improvements, by Pavel Tatashin
- enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V
- add pte_devmap() support for arm64, by Robin Murphy
- unify locked_vm accounting with a helper, by Daniel Jordan
- several misc fixes
core/lib:
- new typeof_member() macro including some users, by Alexey Dobriyan
- make BIT() and GENMASK() available in asm, by Masahiro Yamada
- changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better
code generation, by Alexey Dobriyan
- rbtree code size optimizations, by Michel Lespinasse
- convert struct pid count to refcount_t, by Joel Fernandes
get_maintainer.pl:
- add --no-moderated switch to skip moderated ML's, by Joe Perches
misc:
- ptrace PTRACE_GET_SYSCALL_INFO interface
- coda updates
- gdb scripts, various"
[ Using merge message suggestion from Vlastimil Babka, with some editing - Linus ]
* emailed patches from Andrew Morton <[email protected]>: (100 commits)
fs/select.c: use struct_size() in kmalloc()
mm: add account_locked_vm utility function
arm64: mm: implement pte_devmap support
mm: introduce ARCH_HAS_PTE_DEVMAP
mm: clean up is_device_*_page() definitions
mm/mmap: move common defines to mman-common.h
mm: move MAP_SYNC to asm-generic/mman-common.h
device-dax: "Hotremove" persistent memory that is used like normal RAM
mm/hotplug: make remove_memory() interface usable
device-dax: fix memory and resource leak if hotplug fails
include/linux/lz4.h: fix spelling and copy-paste errors in documentation
ipc/mqueue.c: only perform resource calculation if user valid
include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
scripts/gdb: add helpers to find and list devices
scripts/gdb: add lx-genpd-summary command
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
kernel/pid.c: convert struct pid count to refcount_t
drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
...
Currently, kcopyd has a sub-job size of 64KB and a maximum number of 8
sub-jobs. As a result, for any kcopyd job, we have a maximum of 512KB of
I/O in flight.
This upper limit to the amount of in-flight I/O under-utilizes fast
devices and results in decreased throughput, e.g., when writing to a
snapshotted thin LV with I/O size less than the pool's block size (so
COW is performed using kcopyd).
Increase kcopyd's default sub-job size to 512KB, so we have a maximum of
4MB of I/O in flight for each kcopyd job. This results in an up to 96%
improvement of bandwidth when writing to a snapshotted thin LV, with I/O
sizes less than the pool's block size.
Also, add dm_mod.kcopyd_subjob_size_kb module parameter to allow users
to fine tune the sub-job size of kcopyd. The default value of this
parameter is 512KB and the maximum allowed value is 1024KB.
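
The arithmetic behind these figures is simply the number of sub-jobs times the sub-job size. The sketch below spells it out; the constant and function names are invented here, and clamping an oversized request to the maximum is an assumption about how the limit is enforced, not something the text states.

```c
/* Values taken from the description above; names are hypothetical. */
#define SKETCH_MAX_SUBJOBS       8u
#define SKETCH_DEFAULT_SUBJOB_KB 512u
#define SKETCH_MAX_SUBJOB_KB     1024u

/* Assumed policy: a request above the maximum is clamped to it. */
static unsigned int sketch_subjob_size_kb(unsigned int requested_kb)
{
    return requested_kb > SKETCH_MAX_SUBJOB_KB ? SKETCH_MAX_SUBJOB_KB
                                               : requested_kb;
}

/* Upper bound on in-flight I/O for one job, in KB. */
static unsigned int sketch_max_inflight_kb(unsigned int subjob_kb)
{
    return SKETCH_MAX_SUBJOBS * subjob_kb;
}
```

With the old 64KB sub-job size this gives 8 * 64 = 512KB in flight, and with the new 512KB default it gives 8 * 512 = 4096KB, i.e. 4MB.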
We evaluate the performance impact of the change by running the
snap_breaking_throughput benchmark, from the device mapper test suite
[1].
The benchmark:
1. Creates a 1G thin LV
2. Provisions the thin LV
3. Takes a snapshot of the thin LV
4. Writes to the thin LV with:
Running this benchmark with various thin pool block sizes and dd I/O
sizes (all combinations triggering the use of kcopyd) we get the
following results:
Mike Snitzer [Wed, 17 Jul 2019 15:12:30 +0000 (11:12 -0400)]
dm snapshot: fix oversights in optional discard support
__find_snapshots_sharing_cow() should always be used with _origins_lock
held so fix snapshot_io_hints() accordingly. Also, once a snapshot is
being merged discards must not be allowed -- otherwise incorrect or
duplicate work will be performed.
Fixes: 2e6023850e177d ("dm snapshot: add optional discard support features")
Reported-by: Nikos Tsironis <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Damien Le Moal [Tue, 16 Jul 2019 05:39:34 +0000 (14:39 +0900)]
dm zoned: fix zone state management race
dm-zoned uses the zone flag DMZ_ACTIVE to indicate that a zone of the
backend device is being actively read or written and so cannot be
reclaimed. This flag is set as long as the zone atomic reference
counter is not 0. When this atomic is decremented and reaches 0 (e.g.
on BIO completion), the active flag is cleared and set again whenever
the zone is reused and BIO issued with the atomic counter incremented.
These 2 operations (atomic inc/dec and flag set/clear) are however not
always executed atomically under the target metadata mutex lock and
this causes the warning:
WARN_ON(!test_bit(DMZ_ACTIVE, &zone->flags));
in dmz_deactivate_zone() to be displayed. This problem is regularly
triggered with xfstests generic/209, generic/300, generic/451 and
xfs/077 with XFS being used as the file system on the dm-zoned target
device. Similarly, xfstests ext4/303, ext4/304, generic/209 and
generic/300 trigger the warning when ext4 is used.
This problem can be easily fixed by simply removing the DMZ_ACTIVE flag
and managing the "ACTIVE" state by directly looking at the reference
counter value. To do so, the functions dmz_activate_zone() and
dmz_deactivate_zone() are changed to inline functions respectively
calling atomic_inc() and atomic_dec(), while the dmz_is_active() macro
is changed to an inline function calling atomic_read().
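The refcount-only scheme described above can be sketched in userspace
with C11 atomics standing in for the kernel's atomic_t. The struct
layout and the zone_* names are illustrative assumptions, not the
actual dm-zoned definitions:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the dm-zoned zone descriptor. */
struct zone {
	atomic_int refcount;	/* number of in-flight users of the zone */
};

/* Take a reference: the zone becomes (or stays) active. */
static inline void zone_activate(struct zone *z)
{
	atomic_fetch_add(&z->refcount, 1);
}

/* Drop a reference: the zone is inactive once the count reaches 0. */
static inline void zone_deactivate(struct zone *z)
{
	int old = atomic_fetch_sub(&z->refcount, 1);

	assert(old > 0);	/* catches an unbalanced deactivate */
}

/* "Active" is derived from the counter, so no separate flag can go stale. */
static inline bool zone_is_active(struct zone *z)
{
	return atomic_load(&z->refcount) > 0;
}
```

Because the state is read straight from the counter, there is no second
piece of state that has to be updated in step with it.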
Darrick J. Wong [Mon, 15 Jul 2019 15:51:00 +0000 (08:51 -0700)]
iomap: move the main iteration code into a separate file
Move the main iteration code into a separate file so that we can group
related functions in a single file instead of having a single enormous
source file.
Darrick J. Wong [Mon, 15 Jul 2019 15:50:59 +0000 (08:50 -0700)]
iomap: move the buffered IO code into a separate file
Move the buffered IO code into a separate file so that we can group
related functions in a single file instead of having a single enormous
source file.
Darrick J. Wong [Mon, 15 Jul 2019 15:50:58 +0000 (08:50 -0700)]
iomap: move the SEEK_HOLE code into a separate file
Move the SEEK_HOLE/SEEK_DATA code into a separate file so that we can
group related functions in a single file instead of having a single
enormous source file.
Darrick J. Wong [Mon, 15 Jul 2019 15:50:58 +0000 (08:50 -0700)]
iomap: move the file mapping reporting code into a separate file
Move the file mapping reporting code (FIEMAP/FIBMAP) into a separate
file so that we can group related functions in a single file instead of
having a single enormous source file.
Darrick J. Wong [Mon, 15 Jul 2019 15:50:57 +0000 (08:50 -0700)]
iomap: move the swapfile code into a separate file
Move the swapfile activation code into a separate file so that we can
group related functions in a single file instead of having a single
enormous source file.
kbuild: modsign: read modules.order instead of $(MODVERDIR)/*.mod
Towards the goal of removing MODVERDIR, read out modules.order to get
the list of modules to be signed. This is simpler than parsing *.mod
files in $(MODVERDIR).
The modules_sign target is only supported for in-kernel modules.
So, this commit does not take care of external modules.
kbuild: modinst: read modules.order instead of $(MODVERDIR)/*.mod
Towards the goal of removing MODVERDIR, read out modules.order to get
the list of modules to be installed. This is simpler than parsing *.mod
files in $(MODVERDIR).
For external modules, $(KBUILD_EXTMOD)/modules.order should be read.
kbuild: remove duplication from modules.order in sub-directories
Currently, only the top-level modules.order drops duplicated entries.
The modules.order files in sub-directories potentially contain
duplication. To list out the paths of all modules, I want to use
modules.order instead of parsing *.mod files in $(MODVERDIR).
To achieve this, I want to rip off duplication from modules.order
of external modules too.
kbuild: get rid of kernel/ prefix from in-tree modules.{order,builtin}
Removing the 'kernel/' prefix will make our life easier because we can
simply do 'cat modules.order' to get all built modules with full paths.
Currently, we parse the first line of '*.mod' files in $(MODVERDIR).
Since we have duplicated functionality here, I plan to remove MODVERDIR
entirely.
In fact, modules.order is generated also for external modules in a
broken format. It adds the 'kernel/' prefix to the absolute path of
the module, like this:
kernel//path/to/your/external/module/foo.ko
This is fine for now since modules.order is not used for external
modules. However, I want to sanitize the format everywhere towards
the goal of removing MODVERDIR.
We cannot change the format of installed module.{order,builtin}.
So, 'make modules_install' will add the 'kernel/' prefix while copying
them to $(MODLIB)/.
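As a rough illustration of the install-time rewrite, the C sketch below
turns an in-tree modules.order entry into its installed form. This only
shows the transformation; Kbuild itself does this in its makefiles, and
the function name installed_module_path() is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char prefix[] = "kernel/";

/*
 * Build the installed form of one modules.order entry by prepending
 * "kernel/". Returns 0 on success, -1 if dst is too small.
 */
static int installed_module_path(const char *path, char *dst, size_t len)
{
	assert(path && dst);
	if (strlen(prefix) + strlen(path) + 1 > len)
		return -1;
	snprintf(dst, len, "%s%s", prefix, path);
	return 0;
}
```

With this split, the in-tree file keeps plain paths such as
drivers/net/dummy.ko, and only the installed copy carries the prefix.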
kbuild: do not create empty modules.order in the prepare stage
Currently, $(objtree)/modules.order is touched in two places.
In the 'prepare0' rule, scripts/Makefile.build creates an empty
modules.order while processing 'obj=.'
In the 'modules' rule, the top-level Makefile overwrites it with
the correct list of modules.
While this might be a good side-effect that modules.order is made
empty every time (probably this is not intended functionality),
I personally do not like this behavior.
Create modules.order only when it is sensible to do so.
This avoids creating the following pointless files:
Use recently introduced devm_platform_ioremap_resource
helper which wraps platform_get_resource() and
devm_ioremap_resource() together. This helps produce much
cleaner code and remove local `struct resource` declaration.
kbuild: add --hash-style= and --build-id unconditionally
As commit 1e0221374e30 ("mips: vdso: drop unnecessary cc-ldoption")
explained, these flags are supported by the minimal required version
of binutils. They are supported by ld.lld too.
kbuild: get rid of misleading $(AS) from documents
The assembler files in the kernel are *.S instead of *.s, so they must
be preprocessed. Since 'as' of GNU binutils is not able to preprocess,
we always use $(CC) as an assembler driver.
$(AS) is almost unused in Kbuild. As of v5.2, there is just one place
that directly invokes $(AS).
Since commit 00c864f8903d ("kconfig: allow all config targets to write
auto.conf if missing"), Kconfig creates include/config/auto.conf in the
defconfig stage when it is missing.
Joonas Kylmälä reported incorrect auto.conf generation under some
circumstances.
You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the
.config, but not in the auto.conf.
Please note drivers/usb/gadget/legacy/Kconfig is included from a choice
block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value.
This is probably similar to the situation described in commit beaaddb62540
("kconfig: tests: test defconfig when two choices interact").
When sym_calc_choice() is called, the choice symbol forgets the
SYMBOL_DEF_USER unless all of its choice values are explicitly set by
the user.
The choice symbol is given just one chance to recall it because
set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES
is set.
When sym_calc_choice() is called again, the choice symbol forgets it
forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid.
Hence, we cannot call sym_clear_all_valid() again and again.
It is crazy to repeat set and unset of internal flags. However, we
cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;"
Doing so would re-introduce the problem solved by commit 5d09598d488f
("kconfig: fix new choices being skipped upon config update").
To work around the issue, conf_write_autoconf() stopped calling
sym_clear_all_valid().
conf_write() must be changed accordingly. Currently, it clears
SYMBOL_WRITE after the symbol is written into the .config file. This
is needed to prevent it from writing the same symbol multiple times in
case the symbol is declared in two or more locations. I added the new
flag SYMBOL_WRITTEN, to track the symbols that have been written.
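The idea of a separate "already written" flag can be sketched as
follows. This is a simplified model, not the Kconfig code: the struct,
the SYM_* flag names and write_symbols() are assumptions made for the
example, and every writable symbol is printed as "=y":

```c
#include <assert.h>
#include <stdio.h>

#define SYM_WRITE	0x1	/* symbol should appear in the output */
#define SYM_WRITTEN	0x2	/* symbol has already been emitted */

struct sym {
	const char *name;
	unsigned int flags;
};

/*
 * Walk a list of menu entries (a symbol may appear more than once) and
 * emit each writable symbol exactly once. SYM_WRITE is left untouched,
 * so later code can still tell which symbols are writable.
 * Returns the number of lines written.
 */
static int write_symbols(struct sym **entries, int n, FILE *out)
{
	int written = 0;

	assert(entries && out);
	for (int i = 0; i < n; i++) {
		struct sym *s = entries[i];

		if (!(s->flags & SYM_WRITE) || (s->flags & SYM_WRITTEN))
			continue;
		fprintf(out, "CONFIG_%s=y\n", s->name);
		s->flags |= SYM_WRITTEN;
		written++;
	}
	return written;
}
```

A caller that wants to write the file again would have to clear
SYM_WRITTEN on all symbols first.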
Anyway, this is a cheesy workaround in order to suppress the issue
as far as defconfig is concerned.
Handling of choices is totally broken. sym_clear_all_valid() is called
every time a user touches a symbol from the GUI interface. To reproduce
it, just add a new symbol to drivers/usb/gadget/legacy/Kconfig, then touch
around unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear
from the .config file.
I added the Fixes tag since it is more fatal than before. However, this
has been broken for a long time, and it still is.
We should take a closer look to fix this correctly somehow.
xen/pv: Fix a boot up hang revealed by int3 self test
Commit 7457c0da024b ("x86/alternatives: Add int3_emulate_call()
selftest") checks that the int3 exception stack has a gap which can be
used for inserting a call return address.
This gap is missing in the Xen PV int3 exception entry path, which
triggers the panic below:
For 64bit PV guests, Xen's ABI enters the kernel using SYSRET, with
%rcx/%r11 on the stack. To convert back to "normal" looking exceptions,
the xen thunks do 'xen_*: pop %rcx; pop %r11; jmp *'.
E.g. Extracting 'xen_pv_trap xenint3' we have:
xen_xenint3:
pop %rcx;
pop %r11;
jmp xenint3
As the xenint3 and int3 entry code are the same except that xenint3
doesn't generate a gap, we can fix it by using int3 and dropping the
useless xenint3.
A PVH guest needs PV extensions to work, so the "nopv" parameter should be
ignored for PVH but not for HVM guest.
If PVH guest boots up via the Xen-PVH boot entry, xen_pvh is set early,
we know it's PVH guest and ignore "nopv" parameter directly.
If PVH guest boots up via the normal boot entry same as HVM guest, it's
hard to distinguish PVH and HVM guest at that time. In this case, we
have to panic early if PVH is detected and nopv is enabled to avoid a
worse situation later.
Remove static from bool_x86_init_noop/x86_op_int_noop so they could be
used globally. Move xen_platform_hvm() after xen_hvm_guest_late_init()
to avoid compile error.
x86: Add "nopv" parameter to disable PV extensions
In a virtualization environment, PV extensions (drivers, interrupts,
timers, etc) are enabled in the majority of use cases which is the
best option.
However, in some cases (kexec not fully working, benchmarking)
we want to disable PV extensions. We have "xen_nopv" for that purpose
but only for XEN. For a consistent admin experience a common command
line parameter "nopv" set across all PV guest implementations is a
better choice.
There are guest types which just won't work without PV extensions,
like Xen PV, Xen PVH and jailhouse. Add an "ignore_nopv" member to
struct hypervisor_x86 set to true for those guest types and call
the detect functions only if nopv is false or ignore_nopv is true.
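The detection rule above ("nopv is false or ignore_nopv is true") can be
sketched as below. This is a reduced model: struct hv, the detect stubs
and pick_hypervisor() are hypothetical, and it returns the first match
rather than reproducing how the kernel chooses between several detected
hypervisors:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced hypervisor descriptor. */
struct hv {
	const char *name;
	bool (*detect)(void);
	bool ignore_nopv;	/* guest type cannot work without PV extensions */
};

/* Stub detect callbacks for the example. */
static bool detect_yes(void) { return true; }
static bool detect_no(void) { return false; }

/*
 * Return the first hypervisor whose detect() succeeds. With nopv set,
 * only entries that declare ignore_nopv are probed at all.
 */
static const struct hv *pick_hypervisor(const struct hv *table, size_t n,
					bool nopv)
{
	assert(table || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct hv *h = &table[i];

		if (nopv && !h->ignore_nopv)
			continue;	/* PV extensions disabled by the admin */
		if (h->detect())
			return h;
	}
	return NULL;
}
```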
The Xen tmem (transcendent memory) driver can be removed, as the
related Xen hypervisor feature never made it past the "experimental"
state and will be removed in future Xen versions (>= 4.13).
The xen-selfballoon driver depends on tmem, so it can be removed, too.
Commit 8990cac6e5ea ("x86/jump_label: Initialize static branching
early") adds jump_label_init() call in setup_arch() to make static
keys initialized early, so we could use the original simpler code
again.
Juergen Gross [Fri, 21 Jun 2019 18:47:03 +0000 (20:47 +0200)]
xen/events: fix binding user event channels to cpus
When binding an interdomain event channel to a vcpu via
IOCTL_EVTCHN_BIND_INTERDOMAIN not only the event channel needs to be
bound, but the affinity of the associated IRQ must be changed, too.
Otherwise the IRQ and the event channel won't be moved to another vcpu
in case the original vcpu they were bound to is going offline.
scsi: megaraid_sas: set an unlimited max_segment_size
When using a virt_boundary_mask, as done for NVMe devices attached to
megaraid_sas controllers, we require an unlimited max_segment_size as the
virt boundary merging code assumes that. But we also need to propagate
that to the DMA mapping layer to make dma-debug happy. The SCSI layer
takes care of that when using the per-host virt_boundary setting, but
given that megaraid_sas only wants to set the virt_boundary for actual
NVMe devices, we can't rely on that. The DMA layer maximum segment is
global to the HBA however, so we have to set it explicitly. This patch
assumes that megaraid_sas does not have a segment size limitation, which
seems true based on the SGL format, but will need to be verified.
scsi: mpt3sas: set an unlimited max_segment_size for SAS 3.0 HBAs
When using a virt_boundary_mask, as done for NVMe devices attached to
mpt3sas controllers, we require an unlimited max_segment_size as the virt
boundary merging code assumes that. But we also need to propagate that to
the DMA mapping layer to make dma-debug happy. The SCSI layer takes care
of that when using the per-host virt_boundary setting, but given that
mpt3sas only wants to set the virt_boundary for actual NVMe devices, we
can't rely on that. The DMA layer maximum segment is global to the HBA
however, so we have to set it explicitly. This patch assumes that mpt3sas
does not have a segment size limitation, which seems true based on the SGL
format, but will need to be verified.
scsi: core: add a host / host template field for the virt boundary
This allows drivers setting it up easily instead of branching out to block
layer calls in slave_alloc, and ensures the upgraded max_segment_size
setting gets picked up by the DMA layer.
Al Viro [Thu, 4 Jul 2019 20:57:51 +0000 (16:57 -0400)]
switch the remnants of releasing the mountpoint away from fs_pin
We used to need rather convoluted ordering trickery to guarantee
that dput() of ex-mountpoints happens before the final mntput()
of the same. Since we don't need that anymore, there's no point
playing with fs_pin for that.
Al Viro [Sun, 30 Jun 2019 23:18:53 +0000 (19:18 -0400)]
get rid of detach_mnt()
Lift getting the original mount (dentry is actually not needed at all)
of the mountpoint into the callers - to do_move_mount() and pivot_root()
level. That simplifies the cleanup in those and allows to get saner
arguments for attach_mnt_recursive().
This patch fixes below sparse warning related to __virtio
type in virtio pmem driver. This is reported by Intel test
bot on linux-next tree.
nd_virtio.c:56:28: warning: incorrect type in assignment
(different base types)
nd_virtio.c:56:28: expected unsigned int [unsigned] [usertype] type
nd_virtio.c:56:28: got restricted __virtio32
nd_virtio.c:93:59: warning: incorrect type in argument 2
(different base types)
nd_virtio.c:93:59: expected restricted __virtio32 [usertype] val
nd_virtio.c:93:59: got unsigned int [unsigned] [usertype] ret
Al Viro [Sun, 30 Jun 2019 14:39:08 +0000 (10:39 -0400)]
make struct mountpoint bear the dentry reference to mountpoint, not struct mount
Using dput_to_list() to shift the contributing reference from ->mnt_mountpoint
to ->mnt_mp->m_dentry. Dentries are dropped (with dput_to_list()) as soon
as struct mountpoint is destroyed; in cases where we are under namespace_sem
we use the global list, shrinking it in namespace_unlock(). In case of
detaching stuck MNT_LOCKed children at final mntput_no_expire() we use a local
list and shrink it ourselves. ->mnt_ex_mountpoint crap is gone.
Ming Lei [Fri, 12 Jul 2019 02:08:19 +0000 (10:08 +0800)]
scsi: core: Fix race on creating sense cache
When scsi_init_sense_cache(host) is called concurrently from different
hosts, each code path may find that no cache has been created and
allocate a new one. The lack of locking can lead to potentially
overriding a cache allocated by a different host.
Fix the issue by moving 'mutex_lock(&scsi_sense_cache_mutex)' before
scsi_select_sense_cache().
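The shape of the fix, taking the lock before the "does a cache exist
yet?" check, can be sketched in userspace with a pthread mutex. The
variable and function names here are illustrative assumptions, not the
SCSI core's:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t cache_mutex = PTHREAD_MUTEX_INITIALIZER;
static void *shared_cache;	/* stand-in for the shared sense cache */
static int allocations;		/* how many times a cache was allocated */

/*
 * Create the shared cache on first use. The lock is taken before the
 * lookup, so lookup and allocation form one critical section and two
 * callers cannot both observe "no cache yet".
 * Returns 0 on success, -1 on allocation failure.
 */
static int init_shared_cache(void)
{
	int ret = 0;

	pthread_mutex_lock(&cache_mutex);
	if (!shared_cache) {
		shared_cache = malloc(96);	/* arbitrary size for the sketch */
		if (shared_cache)
			allocations++;
		else
			ret = -1;
	}
	assert(allocations <= 1);	/* invariant the lock is protecting */
	pthread_mutex_unlock(&cache_mutex);
	return ret;
}
```

If the lookup sat outside the critical section, two callers could both
see a NULL cache and the later allocation would overwrite the earlier
one.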
Damien Le Moal [Wed, 17 Jul 2019 01:51:49 +0000 (10:51 +0900)]
scsi: sd_zbc: Fix compilation warning
kbuild test robot gets the following compilation warning using gcc 7.4
cross compilation for c6x (GCC_VERSION=7.4.0 make.cross ARCH=c6x).
In file included from include/asm-generic/bug.h:18:0,
from arch/c6x/include/asm/bug.h:12,
from include/linux/bug.h:5,
from include/linux/thread_info.h:12,
from include/asm-generic/current.h:5,
from ./arch/c6x/include/generated/asm/current.h:1,
from include/linux/sched.h:12,
from include/linux/blkdev.h:5,
from drivers//scsi/sd_zbc.c:11:
drivers//scsi/sd_zbc.c: In function 'sd_zbc_read_zones':
>> include/linux/kernel.h:62:48: warning: 'zone_blocks' may be used
uninitialized in this function [-Wmaybe-uninitialized]
#define __round_mask(x, y) ((__typeof__(x))((y)-1))
^
drivers//scsi/sd_zbc.c:464:6: note: 'zone_blocks' was declared here
u32 zone_blocks;
^~~~~~~~~~~
This is a false-positive report. The variable zone_blocks is always
initialized in sd_zbc_check_zones() before use. It is not initialized
only if sd_zbc_check_zones() fails.
Avoid this warning by initializing the zone_blocks variable to 0.
Fixes: 5f832a395859 ("scsi: sd_zbc: Fix sd_zbc_check_zones() error checks")
Cc: Stable <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Colin Ian King [Tue, 2 Jul 2019 09:18:35 +0000 (10:18 +0100)]
scsi: libfc: fix null pointer dereference on a null lport
Currently if lport is null then the null lport pointer is dereferenced when
printing out debug via the FC_LPORT_DBG macro. Fix this by using the more
generic FC_LIBFC_DBG debug macro instead that does not use lport.
Addresses-Coverity: ("Dereference after null check")
Fixes: 7414705ea4ae ("libfc: Add runtime debugging with debug_logging module parameter")
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
RocksDB can hang indefinitely when using a DAX file. This is due to
a bug in the XArray conversion when handling a PMD fault and finding a
PTE entry. We use the wrong index in the hash and end up waiting on
the wrong waitqueue.
There's actually no need to wait; if we find a PTE entry while looking
for a PMD entry, we can return immediately as we know we should fall
back to a PTE fault (which may not conflict with the lock held).
We reuse the XA_RETRY_ENTRY to signal a conflicting entry was found.
This value can never be found in an XArray while holding its lock, so
it does not create an ambiguity.