Git Repo - linux.git/log

arm64: ftrace: emit ftrace-mod.o contents through code

When building the arm64 kernel with both CONFIG_ARM64_MODULE_PLTS and
CONFIG_DYNAMIC_FTRACE enabled, the ftrace-mod.o object file is built
with the kernel and contains a trampoline that is linked into each
module, so that modules can be loaded far away from the kernel and
still reach the ftrace entry point in the core kernel with an ordinary
relative branch, as is emitted by the compiler instrumentation code
dynamic ftrace relies on.

In order to be able to build out of tree modules, this object file
needs to be included into the linux-headers or linux-devel packages,
which is undesirable, as it makes arm64 a special case (although a
precedent does exist for 32-bit PPC).

Given that the trampoline essentially consists of a PLT entry, let's
not bother with a source or object file for it, and simply patch it
in whenever the trampoline is being populated, using the existing
PLT support routines.

Cc: <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

arm64: module-plts: factor out PLT generation code for ftrace

To allow the ftrace trampoline code to reuse the PLT entry routines,
factor it out and move it into asm/module.h.

Cc: <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

afs: Properly reset afs_vnode (inode) fields

When an AFS inode is allocated by afs_alloc_inode(), the allocated
afs_vnode struct isn't necessarily reset from the last time it was used as
an inode because the slab constructor is only invoked once when the memory
is obtained from the page allocator.

This means that information can leak from one inode to the next because
we're not calling kmem_cache_zalloc(). Some of the information isn't
reset, in particular the permit cache pointer.

Bring the clearances up to date.

Signed-off-by: David Howells <[email protected]>
Tested-by: Marc Dionne <[email protected]>

afs: Fix permit refcounting

Fix four refcount bugs in afs_cache_permit():

(1) When checking the result of the kzalloc(), we can't just return, but
     must put 'permits'.

(2) We shouldn't put permits immediately after hashing a new permit as we
     need to keep the pointer stable so that we can check to see if
     vnode->permit_cache has changed before we decide whether to assign to
     it.

(3) 'permits' is being put twice.

(4) We need to put either the replacement or the thing replaced after the
     assignment to vnode->permit_cache.

Without this, lots of the following are seen:

  Kernel BUG at ffffffffa039857b [verbose debug info unavailable]
  ------------[ cut here ]------------
  Kernel BUG at ffffffffa039858a [verbose debug info unavailable]
  ------------[ cut here ]------------

The addresses are in the .text..refcount section of the kafs.ko module.
Following the relocation records for the __ex_table section shows one to be
due to the decrement in afs_put_permits() and the other to be key_get() in
afs_cache_permit().

Occasionally, the following is seen:

  refcount_t overflow at afs_cache_permit+0x57d/0x5c0 [kafs] in cc1[562], uid/euid: 0/0
  WARNING: CPU: 0 PID: 562 at kernel/panic.c:657 refcount_error_report+0x9c/0xac
  ...

Reported-by: Marc Dionne <[email protected]>
Signed-off-by: David Howells <[email protected]>
Tested-by: Marc Dionne <[email protected]>

can: mcba_usb: fix device disconnect bug

Currently, when you disconnect the device, the driver infinitely
resubmits all URBs, so you see:

Rx URB aborted (-32)

in an infinite loop.

Fix this by catching -EPIPE (what we get in urb->status when the device
disconnects) and not resubmitting.

With this patch, I can plug and unplug many times and the driver
recovers correctly.

Signed-off-by: Martin Kelly <[email protected]>
Cc: linux-stable <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: mcba_usb: fix typo

Fix typo "analizer" --> "Analyzer".

Signed-off-by: Martin Kelly <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: flexcan: fix VF610 state transition issue

Enable FLEXCAN_QUIRK_BROKEN_PERR_STATE for VF610 to report correct state
transitions.

Tested-by: Mirza Krak <[email protected]>
Cc: linux-stable <[email protected]> # >= v4.11
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: flexcan: Update IRQ Err Passive information

The flexcan IP cores used on MX25 and MX35 do not generate Error Passive
IRQs. Update the IP core overview table in the driver accordingly.

Suggested-by: ZHU Yi (ST-FIR/ENG1-Zhu) <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: peak/pci: fix potential bug when probe() fails

PCI/PCIe drivers for PEAK-System CAN/CAN-FD interfaces do some access to the
PCI config during probing. In case one of these accesses fails, a POSITIVE
PCIBIOS_xxx error code is returned back. This POSITIVE error code MUST be
converted into a NEGATIVE errno for the probe() function to indicate it
failed. Using the pcibios_err_to_errno() function, we make sure that the
return code will always be negative.

Signed-off-by: Stephane Grosjean <[email protected]>
Cc: linux-stable <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: ti_hecc: Fix napi poll return value for repoll

After commit d75b1ade567f ("net: less interrupt masking in NAPI") napi
repoll is done only when work_done == budget.
So we need to return budget if there are still packets to receive.

Signed-off-by: Oliver Stäbler <[email protected]>
Cc: linux-stable <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: kvaser_usb: ratelimit errors if incomplete messages are received

Avoid flooding the kernel log with "Formate error", if incomplete message
are received.

Signed-off-by: Jimmy Assarsson <[email protected]>
Cc: linux-stable <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: kvaser_usb: Fix comparison bug in kvaser_usb_read_bulk_callback()

The conditon in the while-loop becomes true when actual_length is less than
2 (MSG_HEADER_LEN). In best case we end up with a former, already
dispatched msg, that got msg->len greater than actual_length. This will
result in a "Format error" error printout.

Problem seen when unplugging a Kvaser USB device connected to a vbox guest.

warning: comparison between signed and unsigned integer expressions
[-Wsign-compare]

Signed-off-by: Jimmy Assarsson <[email protected]>
Cc: linux-stable <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: kvaser_usb: free buf in error paths

The allocated buffer was not freed if usb_submit_urb() failed.

Signed-off-by: Jimmy Assarsson <[email protected]>
Cc: linux-stable <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

samples/bpf: add error checking for perf ioctl calls in bpf loader

load_bpf_file() should fail if ioctl with command
PERF_EVENT_IOC_ENABLE and PERF_EVENT_IOC_SET_BPF fails.
When they do fail, proper error messages are printed.

With this change, the below "syscall_tp" run shows that
the maximum number of bpf progs attaching to the same
perf tracepoint is indeed enforced.
  $ ./syscall_tp -i 64
  prog #0: map ids 4 5
  ...
  prog #63: map ids 382 383
  $ ./syscall_tp -i 65
  prog #0: map ids 4 5
  ...
  prog #64: map ids 388 389
  ioctl PERF_EVENT_IOC_SET_BPF failed err Argument list too long

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: set maximum number of attached progs to 64 for a single perf tp

cgropu+bpf prog array has a maximum number of 64 programs.
Let us apply the same limit here.

Fixes: e87c6bc3852b ("bpf: permit multiple bpf attachments for a single perf event")
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

Merge tag 'apparmor-pr-2017-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor bugfix from John Johansen:
"Fix oops in audit_signal_cb hook marked for stable"

* tag 'apparmor-pr-2017-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: fix oops in audit_signal_cb hook

Merge tag 'acpi-4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These fix a regression related to the ACPI EC handling during system
  suspend/resume on some platforms and prevent modalias from being
  exposed to user space for ACPI device object with "not functional and
  not present" status.

  Specifics:

   - Fix an ACPI EC driver regression (from the 4.9 cycle) causing the
     driver's power management operations to be omitted during system
     suspend/resume on platforms where the EC instance from the ECDT
     table is used instead of the one from the DSDT (Lv Zheng).

   - Prevent modalias from being exposed to user space for ACPI device
     objects with _STA returning 0 (not present and not functional) to
     prevent driver modules from being loaded automatically for hardware
     that is not actually present on some platforms (Hans de Goede)"

* tag 'acpi-4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / EC: Fix regression related to PM ops support in ECDT device
  ACPI / bus: Leave modalias empty for devices which are not present

Merge tag 'pm-4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:

- add missing module information to the Mediatek cpufreq driver module
   (Jesse Chan)

- fix config dependencies for the Loongson cpufreq driver (James Hogan)

- fix two issues related to CPU offline in the cpupower utility
   (Abhishek Goel).

* tag 'pm-4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: mediatek: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
  cpufreq: Add Loongson machine dependencies
  cpupower : Fix cpupower working when cpu0 is offline
  cpupowerutils: bench - Fix cpu online check

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull quota & reiserfs changes from Jan Kara:

- two error checking improvements for quota

- remove bogus i_version increase for reiserfs

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: Check for register_shrinker() failure.
  quota: propagate error from __dquot_initialize
  reiserfs: remove unneeded i_version bump

Merge branch 'drm-fixes-4.15' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

Fixes for 4.15.  Highlights:
- DC fixes for S3, gamma, audio, pageflipping, etc.
- fix a regression in radeon from kfd removal
- fix a ttm regression with swiotlb disabled
- misc other fixes

* 'drm-fixes-4.15' of git://people.freedesktop.org/~agd5f/linux: (36 commits)
  drm/radeon: remove init of CIK VMIDs 8-16 for amdkfd
  drm/ttm: fix populate_and_map() functions once more
  drm/amd/display: USB-C / thunderbolt dock specific workaround
  drm/amd/display: Switch to drm_atomic_helper_wait_for_flip_done
  drm/amd/display: fix gamma setting
  drm/amd/display: Do not put drm_atomic_state on resume
  drm/amd/display: Fix couple more inconsistent NULL checks in dc_resource
  drm/amd/display: Fix potential NULL and mem leak in create_links
  drm/amd/display: Fix hubp check in set_cursor_position
  drm/amd/display: Fix use before NULL check in validate_timing
  drm/amd/display: Bunch of smatch error and warning fixes in DC
  drm/amd/display: Fix amdgpu_dm bugs found by smatch
  drm/amd/display: try to find matching audio inst for enc inst first
  drm/amd/display: fix seq issue: turn on clock before programming afmt.
  drm/amd/display: fix memory leaks on error exit return
  drm/amd/display: check plane state before validating fbc
  drm/amd/display: Do DC mode-change check when adding CRTCs
  drm/amd/display: Revert noisy assert messages
  drm/amd/display: fix split viewport rounding error
  drm/amd/display: Check aux channel before MST resume
  ...

Merge branch 'for-upstream/mali-dp' of git://linux-arm.org/linux-ld into drm-fixes

mali-dp interface cleanups.

* 'for-upstream/mali-dp' of git://linux-arm.org/linux-ld:
  drm: mali-dp: Disable planes when their CRTC gets disabled.
  drm: mali-dp: Separate static internal data into a read-only structure.
  drm/arm: Replace instances of drm_dev_unref with drm_dev_put.
  drm: mali-dp: switch to drm_*_get(), drm_*_put() helpers

Merge tag 'drm-amdkfd-fixes-2017-11-26' of git://people.freedesktop.org/~gabbayo/linux into drm-fixes

This is amdkfd pull request for -rc2. It contains three small fixes to the
CIK SDMA code, compilation error fix in kfd_ioctl.h and fix to accessing
a pointer after it was released.

* tag 'drm-amdkfd-fixes-2017-11-26' of git://people.freedesktop.org/~gabbayo/linux:
  uapi: fix linux/kfd_ioctl.h userspace compilation errors
  drm/amdkfd: fix amdkfd use-after-free GP fault
  drm/amdkfd: Fix SDMA oversubsription handling
  drm/amdkfd: Fix SDMA ring buffer size calculation
  drm/amdgpu: Fix SDMA load/unload sequence on HWS disabled mode

Merge branch 'for-upstream/hdlcd' of git://linux-arm.org/linux-ld into drm-fixes

3 hdlcd fixes/cleanups

* 'for-upstream/hdlcd' of git://linux-arm.org/linux-ld:
  drm/arm: Replace instances of drm_dev_unref with drm_dev_put.
  drm: Fix checkpatch issue: "WARNING: braces {} are not necessary for single statement blocks."
  drm: hdlcd: Update PM code to save/restore console.

Merge tag 'imx-drm-fixes-2017-11-30' of git://git.pengutronix.de/git/pza/linux into drm-fixes

drm/imx: fix commit_tail for new drm_atomic_helper_setup_commit

Since commit 080de2e5be2d ("drm/atomic: Check for busy planes/connectors before
setting the commit"), drm_atomic_helper_setup_commit expects that blocking
commits have completed flipping before the commit_tail returns. Add the missing
wait_for_flip_done to commit_tail to ensure this.

* tag 'imx-drm-fixes-2017-11-30' of git://git.pengutronix.de/git/pza/linux:
drm/imx: always call wait_for_flip_done in commit_tail

Merge tag 'drm-intel-fixes-2017-11-30' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Disable transparent huge pages for now until we have a W/A
- Building fix when CONFIG_BACKLIGHT_CLASS_DEVICE is not selected
- GMBUS communication robustness
- Fbdev hotplug handling fix

gvt-fixes-2017-11-28

- regression fix for sane request alloc (Fred)
- locking fix (Changbin)
- fix invalid addr mask (Xiong)
- compression regression fix (Weinan)
- fix default pipe enable for virtual display (Xiaolin)

* tag 'drm-intel-fixes-2017-11-30' of git://anongit.freedesktop.org/drm/drm-intel:
  drm/i915: Disable THP until we have a GPU read BW W/A
  drm/i915/gvt: Correct ADDR_4K/2M/1G_MASK definition
  drm/i915/gvt: enabled pipe A default on creating vgpu
  drm/i915/gvt: Move request alloc to dispatch_workload path only
  drm/i915/gvt: remove skl_misc_ctl_write handler
  drm/i915/gvt: Fix unsafe locking caused by spin_unlock_bh
  drm/i915: fix intel_backlight_device_register declaration
  drm/i915/fbdev: Serialise early hotplug events with async fbdev config
  drm/i915: Prevent zero length "index" write
  drm/i915: Don't try indexed reads to alternate slave addresses

Merge tag 'omapdrm-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux into drm-fixes

omapdrm fixes for 4.15

* Fix platform detection issue causing OMAP3 DPI output to have missing color bits
* Fix platform detection issue causing OMAP4 HDMI audio not to work

* tag 'omapdrm-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
  omapdrm: hdmi4_cec: signedness bug in hdmi4_cec_init()
  drm: omapdrm: Fix DPI on platforms using the DSI VDDS
  omapdrm: hdmi4: Correct the SoC revision matching
  drm/omap: displays: panel-dpi: add backlight dependency
  drm/omap: Fix error handling path in 'omap_dmm_probe()'

Merge tag 'drm-misc-fixes-2017-11-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

drm-misc-fixes for -rc2

- big pile of bridge driver (mostly tc358767), all handled by Archit
  and Andrez
- rockchip dsi fix
- atomic helper regression fix for spurious -EBUSY (Maarten)
- fix deferred fbdev fallout (Maarten)

* tag 'drm-misc-fixes-2017-11-30' of git://anongit.freedesktop.org/drm/drm-misc:
  drm/bridge: tc358767: fix 1-lane behavior
  drm/bridge: tc358767: fix AUXDATAn registers access
  drm/bridge: tc358767: fix timing calculations
  drm/bridge: tc358767: fix DP0_MISC register set
  drm/bridge: tc358767: filter out too high modes
  drm/bridge: tc358767: do no fail on hi-res displays
  drm/bridge: Fix lvds-encoder since the panel_bridge rework.
  drm/bridge: synopsys/dw-hdmi: Enable cec clock
  drm/bridge: adv7511/33: Fix adv7511_cec_init() failure handling
  drm/fb_helper: Disable all crtc's when initial setup fails.
  drm/atomic: make drm_atomic_helper_wait_for_vblanks more agressive
  drm/rockchip: dw-mipi-dsi: fix possible un-balanced runtime PM enable

hwmon: (jc42) optionally try to disable the SMBUS timeout

With a nxp,se97 chip on an atmel sama5d31 board, the I2C adapter driver
is not always capable of avoiding the 25-35 ms timeout as specified by
the SMBUS protocol. This may cause silent corruption of the last bit of
any transfer, e.g. a one is read instead of a zero if the sensor chip
times out. This also affects the eeprom half of the nxp-se97 chip, where
this silent corruption was originally noticed. Other I2C adapters probably
suffer similar issues, e.g. bit-banging comes to mind as risky...

The SMBUS register in the nxp chip is not a standard Jedec register, but
it is not special to the nxp chips either, at least the atmel chips
have the same mechanism. Therefore, do not special case this on the
manufacturer, it is opt-in via the device property anyway.

Cc: [email protected] # 4.9+
Signed-off-by: Peter Rosin <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>

RISC-V: Clean up an unused include

We used to have some cmpxchg syscalls. They're no longer there, so we
no longer need the include.

CC: Christoph Hellwig <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Allow userspace to flush the instruction cache

Despite RISC-V having a direct 'fence.i' instruction available to
userspace (which we can't trap!), that's not actually viable when
running on Linux because the kernel might schedule a process on another
hart. There is no way for userspace to handle this without invoking the
kernel (as it doesn't know the thread->hart mappings), so we've defined
a RISC-V specific system call to flush the instruction cache.

This patch adds both a system call and a VDSO entry. If possible, we'd
like to avoid having the system call be considered part of the
user-facing ABI and instead restrict that to the VDSO entry -- both just
in general to avoid having additional user-visible ABI to maintain, and
because we'd prefer that users just call the VDSO entry because there
might be a better way to do this in the future (ie, one that doesn't
require entering the kernel).

Signed-off-by: Andrew Waterman <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Flush I$ when making a dirty page executable

The RISC-V ISA allows for instruction caches that are not coherent WRT
stores, even on a single hart.  As a result, we need to explicitly flush
the instruction cache whenever marking a dirty page as executable in
order to preserve the correct system behavior.

Local instruction caches aren't that scary (our implementations actually
flush the cache, but RISC-V is defined to allow higher-performance
implementations to exist), but RISC-V defines no way to perform an
instruction cache shootdown.  When explicitly asked to do so we can
shoot down remote instruction caches via an IPI, but this is a bit on
the slow side.

Instead of requiring an IPI to all harts whenever marking a page as
executable, we simply flush the currently running harts.  In order to
maintain correct behavior, we additionally mark every other hart as
needing a deferred instruction cache which will be taken before anything
runs on it.

Signed-off-by: Andrew Waterman <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

net: dsa: bcm_sf2: Set correct CHAIN_ID and slice number mask

When configuring an IPv6 address mask, we should use SLICE_NUM_MASK as
the mask in order to make sure all bits are masked by the hardware.
Also, we want matching entries to have a CHAIN_ID value set to the same
value as the rule index we return to user-space for convenience, so fix
that too.

Fixes: ba0696c22e7c ("net: dsa: bcm_sf2: Add support for IPv6 CFP rules")
Fixes: dd8eff68343d ("net: dsa: bcm_sf2: Allow matching arbitrary IPv6 masks/lengths")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tools/bpf: adjust rlimit RLIMIT_MEMLOCK for test_verifier_log

The default rlimit RLIMIT_MEMLOCK is 64KB. In certain cases,
e.g. in a test machine mimicking our production system, this test may
fail due to unable to charge the required memory for prog load:
  # ./test_verifier_log
  Test log_level 0...
  ERROR: Program load returned: ret:-1/errno:1, expected ret:-1/errno:22

Changing the default rlimit RLIMIT_MEMLOCK to unlimited makes
the test always pass.

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

RISC-V: Add missing include

Fixes:

include/asm-generic/mm_hooks.h:20:11: warning: 'struct vm_area_struct' declared inside parameter list will not be visible outside of this definition or declaration
include/asm-generic/mm_hooks.h:19:38: warning: 'struct mm_struct' declared inside parameter list will not be visible outside of this definition or declaration

Signed-off-by: Olof Johansson <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Use define for get_cycles like other architectures

Signed-off-by: Olof Johansson <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Provide stub of setup_profiling_timer()

Fixes the following on allmodconfig build:

profile.c:(.text+0x3e4): undefined reference to `setup_profiling_timer'

Signed-off-by: Olof Johansson <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: Export some expected symbols for modules

These are the ones needed by current allmodconfig, so add them instead
of everything other architectures are exporting -- the rest can be
added on demand later if needed.

Signed-off-by: Olof Johansson <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: move empty_zero_page definition to C and export it

Needed by some modules (exported by other architectures).

Signed-off-by: Olof Johansson <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: io.h: type fixes for warnings

include <linux/types.h> for __iomem definition. Also, add volatile to
iounmap() like other architectures have it to avoid "discarding
volatile" warnings from some drivers.

Finally, explicitly promote the base address for INB/OUTB functions to
avoid some old legacy drivers complaining about int-to-ptr promotions.
The drivers are unlikely to work but they're included in allmodconfig
so the warnings are noisy.

Fixes, among other warnings, these with allmodconfig:

../arch/riscv/include/asm/io.h:24:21: error: expected '=', ',', ';', 'asm' or '__attribute__' before '*' token
extern void __iomem *ioremap(phys_addr_t offset, unsigned long size);

sound/pci/echoaudio/echoaudio.c: In function 'snd_echo_free':
sound/pci/echoaudio/echoaudio.c:1879:10: warning: passing argument 1 of 'iounmap' discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]

Signed-off-by: Olof Johansson <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros

INT and SHORT are used by some drivers that pull in the include files,
so prefixing helps avoid namespace conflicts. Other constructs in the
same file already uses this.

Fixes, among others, these warnings with allmodconfig:

../sound/core/pcm_misc.c:43:0: warning: "INT" redefined
#define INT __force int

Signed-off-by: Olof Johansson <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

RISC-V: use generic serial.h

Fixes this from allmodconfig:

drivers/tty/serial/earlycon.c:27:10: fatal error: asm/serial.h: No such file or directory

Signed-off-by: Olof Johansson <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

SUNRPC: Handle ENETDOWN errors

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

xfs: Properly retry failed dquot items in case of error during buffer writeback

Once the inode item writeback errors is already fixed, it's time to fix the same
problem in dquot code.

Although there were no reports of users hitting this bug in dquot code (at least
none I've seen), the bug is there and I was already planning to fix it when the
correct approach to fix the inodes part was decided.

This patch aims to fix the same problem in dquot code, regarding failed buffers
being unable to be resubmitted once they are flush locked.

Tested with the recently test-case sent to fstests list by Hou Tao.

Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

xfs: scrub inode mode properly

Since we've used up all the bits in i_mode, the existing mode check
doesn't actually do anything useful. However, we've not used all the
bit values in the format portion of i_mode, so we /do/ need to test
that for bad values.

Fixes: 80e4e1268 ("xfs: scrub inodes")
Fixes-coverity-id: 1423992
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

xfs: remove unused parameter from xfs_writepage_map

The first thing that xfs_writepage_map does is clobber the offset
parameter. Since we never use the passed-in value, turn the parameter
into a local variable. This gets rid of an UBSAN warning in generic/466.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

xfs: ubsan fixes

Fix some complaints from the UBSAN about signed integer addition overflows.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:

- x86 bugfixes: APIC, nested virtualization, IOAPIC

- PPC bugfix: HPT guests on a POWER9 radix host

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits)
  KVM: Let KVM_SET_SIGNAL_MASK work as advertised
  KVM: VMX: Fix vmx->nested freeing when no SMI handler
  KVM: VMX: Fix rflags cache during vCPU reset
  KVM: X86: Fix softlockup when get the current kvmclock
  KVM: lapic: Fixup LDR on load in x2apic
  KVM: lapic: Split out x2apic ldr calculation
  KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts
  KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP
  KVM: x86: Fix CPUID function for word 6 (80000001_ECX)
  KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2
  KVM: x86: ioapic: Preserve read-only values in the redirection table
  KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
  KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq
  KVM: x86: ioapic: Don't fire level irq when Remote IRR set
  KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  KVM: x86: inject exceptions produced by x86_decode_insn
  KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs
  KVM: x86: fix em_fxstor() sleeping while in atomic
  KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
  KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
  ...

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:

- SPDX identifiers are added to more of the s390 specific files.

- The ELF_ET_DYN_BASE base patch from Kees is reverted, with the change
   some old 31-bit programs crash.

- Bug fixes and cleanups.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (29 commits)
  s390/gs: add compat regset for the guarded storage broadcast control block
  s390: revert ELF_ET_DYN_BASE base changes
  s390: Remove redundant license text
  s390: crypto: Remove redundant license text
  s390: include: Remove redundant license text
  s390: kernel: Remove redundant license text
  s390: add SPDX identifiers to the remaining files
  s390: appldata: add SPDX identifiers to the remaining files
  s390: pci: add SPDX identifiers to the remaining files
  s390: mm: add SPDX identifiers to the remaining files
  s390: crypto: add SPDX identifiers to the remaining files
  s390: kernel: add SPDX identifiers to the remaining files
  s390: sthyi: add SPDX identifiers to the remaining files
  s390: drivers: Remove redundant license text
  s390: crypto: Remove redundant license text
  s390: virtio: add SPDX identifiers to the remaining files
  s390: scsi: zfcp_aux: add SPDX identifier
  s390: net: add SPDX identifiers to the remaining files
  s390: char: add SPDX identifiers to the remaining files
  s390: cio: add SPDX identifiers to the remaining files
  ...

sit: update frag_off info

After parsing the sit netlink change info, we forget to update frag_off in
ipip6_tunnel_update(). Fix it by assigning frag_off with new value.

Reported-by: Jianlin Shi <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: remove buggy call to tcp_v6_restore_cb()

tcp_v6_send_reset() expects to receive an skb with skb->cb[] layout as
used in TCP stack.
MD5 lookup uses tcp_v6_iif() and tcp_v6_sdif() and thus
TCP_SKB_CB(skb)->header.h6

This patch probably fixes RST packets sent on behalf of a timewait md5
ipv6 socket.

Before Florian patch, tcp_v6_restore_cb() was needed before jumping to
no_tcp_socket label.

Fixes: 271c3b9b7bda ("tcp: honour SO_BINDTODEVICE for TW_RST case too")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Florian Westphal <[email protected]>
Acked-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

act_sample: get rid of tcf_sample_cleanup_rcu()

Similar to commit d7fb60b9cafb ("net_sched: get rid of tcfa_rcu"),
TC actions don't need to respect RCU grace period, because it
is either just detached from tc filter (standalone case) or
it is removed together with tc filter (bound case) in which case
RCU grace period is already respected at filter layer.

Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
Reported-by: Eric Dumazet <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: Jiri Pirko <[email protected]>
Cc: Yotam Gigi <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'rxrpc-fixes-20171129' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Fixes

Here are three patches for AF_RXRPC. One removes some whitespace, one
fixes terminal ACK generation and the third makes a couple of places
actually use the timeout value just determined rather than ignoring it.
====================

Signed-off-by: David S. Miller <[email protected]>

drm/imx: always call wait_for_flip_done in commit_tail

drm_atomic_helper_wait_for_vblanks will go away in the future.

The new drm_atomic_helper_setup_commit in 4.15 expects that blocking commits
have completed flipping before the commit_tail returns. This must be ensured
by calling wait_for_vblanks or wait_for_flip_done, where flip_done might do
a less agressive wait, which is fine for imx-drm.

Fixes: 080de2e5be2d (drm/atomic: Check for busy planes/connectors before
setting the commit)
Signed-off-by: Lucas Stach <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Signed-off-by: Philipp Zabel <[email protected]>

net: stmmac: dwmac-sun8i: fix allwinner,leds-active-low handling

The driver expect "allwinner,leds-active-low" to be in PHY node, but
the binding doc expect it to be in MAC node.

Since all board DT use it also in MAC node, the driver need to search
allwinner,leds-active-low in MAC node.

Signed-off-by: Corentin Labbe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

skbuff: Grammar s/are can/can/, s/change/changes/

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: allocate zeroed tx descriptors

Reserved and unused fields in the Tx descriptors should be 0. The PPv2
driver doesn't clear them at run-time (for performance reasons) but
these descriptors aren't zeroed when allocated, which can lead to
unpredictable behaviors. This patch fixes this by using
dma_zalloc_coherent instead of dma_alloc_coherent.

Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Yan Markman <[email protected]>
[Antoine: commit message]
Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'acpi-ec' into acpi

* acpi-ec:
ACPI / EC: Fix regression related to PM ops support in ECDT device

Merge branch 'pm-tools'

* pm-tools:
cpupower : Fix cpupower working when cpu0 is offline
cpupowerutils: bench - Fix cpu online check

omapdrm: hdmi4_cec: signedness bug in hdmi4_cec_init()

"ret" needs to be signed for the error handling to work.

Fixes: 8d7f934df8d8 ("omapdrm: hdmi4_cec: add OMAP4 HDMI CEC support")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Sebastian Reichel <[email protected]>
Acked-by: Hans Verkuil <[email protected]>
Signed-off-by: Tomi Valkeinen <[email protected]>

drm: omapdrm: Fix DPI on platforms using the DSI VDDS

Commit d178e034d565 ("drm: omapdrm: Move FEAT_DPI_USES_VDDS_DSI feature
to dpi code") replaced usage of platform data version with SoC matching
to configure DPI VDDS. The SoC match entries were incorrect, they should
have matched on the machine name instead of the SoC family. Fix it.

The result was observed on OpenPandora with OMAP3530 where the panel only
had the Blue channel and Red&Green were missing. It was not observed on
GTA04 with DM3730.

Fixes: d178e034d565 ("drm: omapdrm: Move FEAT_DPI_USES_VDDS_DSI feature to dpi code")
Signed-off-by: Laurent Pinchart <[email protected]>
Reported-by: H. Nikolaus Schaller <[email protected]>
Tested-by: H. Nikolaus Schaller <[email protected]>
Cc: [email protected] # 4.14
Signed-off-by: Tomi Valkeinen <[email protected]>

omapdrm: hdmi4: Correct the SoC revision matching

I believe the intention of the commit 2c9fc9bf45f8
("drm: omapdrm: Move FEAT_HDMI_* features to hdmi4 driver")
was to identify omap4430 ES1.x, omap4430 ES2.x and other OMAP4 revisions,
like omap4460.

By using family=OMAP4 in the match the code will treat omap4460 ES1.x in a
same way as it would treat omap4430 ES1.x

This breaks HDMI audio on OMAP4460 devices (PandaES for example).

Correct the match rule so we are not going to get false positive match.

Fixes: 2c9fc9bf45f8 ("drm: omapdrm: Move FEAT_HDMI_* features to hdmi4 driver")
Cc: [email protected] # 4.14
Signed-off-by: Peter Ujfalusi <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
Signed-off-by: Tomi Valkeinen <[email protected]>

drm/omap: displays: panel-dpi: add backlight dependency

The new backlight code causes a link failure when backlight
support itself is disabled:

drivers/gpu/drm/omapdrm/displays/panel-dpi.o: In function `panel_dpi_probe_of':
panel-dpi.c:(.text+0x35c): undefined reference to `of_find_backlight_by_node'

This adds a Kconfig dependency like we have for the other OMAP
display targets.

Fixes: 39135a305a0f ("drm/omap: displays: panel-dpi: Support for handling backlight devices")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Tomi Valkeinen <[email protected]>

drm/omap: Fix error handling path in 'omap_dmm_probe()'

If we don't find a matching device node, we must free the memory allocated
in 'omap_dmm' a few lines above.

Fixes: 7cb0d6c17b96 ("drm/omap: fix TILER on OMAP5")
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: Tomi Valkeinen <[email protected]>

drm/i915: Disable THP until we have a GPU read BW W/A

We seem to be missing some W/A for 2M pages and are getting
a hit on raw GPU read bandwidths (even 30%) even though the
GPU write bandwidths improve (even 10%).

For now, disable THP, which is our only practical source of
2M pages until we have a W/A for the issue.

v2:
- Be explicit that we talk about GPU bandwidths (Eero)
- s/deny/never/ because that's why (Chris)

Reported-by: Valtteri Rantala <[email protected]>
Fixes: b901bb89324a ("drm/i915/gemfs: enable THP")
Signed-off-by: Joonas Lahtinen <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Valtteri Rantala <[email protected]>
Cc: Eero Tamminen <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Tested-by: Valtteri Rantala <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 9987da4b5dcfc8b94b702d4bb94b30955eb73c75)
Signed-off-by: Joonas Lahtinen <[email protected]>

drm/bridge: tc358767: fix 1-lane behavior

Use drm_dp_channel_eq_ok helper

Acked-by: Philipp Zabel <[email protected]>
Signed-off-by: Andrey Gusakov <[email protected]>
Signed-off-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/1510073785-16108-7-git-send-email-andrey.gusakov@cogentembedded.com

drm/bridge: tc358767: fix AUXDATAn registers access

First four bytes should go to DP0_AUXWDATA0. Due to bug if
len > 4 first four bytes was writen to DP0_AUXWDATA1 and all
data get shifted by 4 bytes. Fix it.

Acked-by: Philipp Zabel <[email protected]>
Signed-off-by: Andrey Gusakov <[email protected]>
Signed-off-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/1510073785-16108-6-git-send-email-andrey.gusakov@cogentembedded.com

drm/bridge: tc358767: fix timing calculations

Fields in HTIM01 and HTIM02 regs should be even.
Recomended thresh_dly value is max_tu_symbol.
Remove set of VPCTRL0.VSDELAY as it is related to DSI input
interface. Currently driver supports only DPI.

Acked-by: Philipp Zabel <[email protected]>
Signed-off-by: Andrey Gusakov <[email protected]>
Signed-off-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/1510073785-16108-5-git-send-email-andrey.gusakov@cogentembedded.com

drm/bridge: tc358767: fix DP0_MISC register set

Remove shift from TU_SIZE_RECOMMENDED define as it used to
calculate max_tu_symbols.

Acked-by: Philipp Zabel <[email protected]>
Signed-off-by: Andrey Gusakov <[email protected]>
Signed-off-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/1510073785-16108-4-git-send-email-andrey.gusakov@cogentembedded.com

drm/bridge: tc358767: filter out too high modes

Pixel clock limitation for DPI is 154 MHz. Do not accept modes
with higher pixel clock rate.

Reviewed-by: Andrzej Hajda <[email protected]>
Signed-off-by: Andrey Gusakov <[email protected]>
Signed-off-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/1510073785-16108-3-git-send-email-andrey.gusakov@cogentembedded.com

drm/bridge: tc358767: do no fail on hi-res displays

Do not fail data rates higher than 2.7 and more than 2 lanes.
Try to fall back to 2.7Gbps and 2 lanes.

Acked-by: Philipp Zabel <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Signed-off-by: Andrey Gusakov <[email protected]>
Signed-off-by: Andrzej Hajda <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/1510073785-16108-2-git-send-email-andrey.gusakov@cogentembedded.com

drm/bridge: Fix lvds-encoder since the panel_bridge rework.

The panel_bridge bridge attaches to the panel's OF node, not the
lvds-encoder's node. Put in a little no-op bridge of our own so that
our consumers can still find a bridge where they expect.

This also fixes an unintended unregistration and leak of the
panel-bridge on module remove.

Signed-off-by: Eric Anholt <[email protected]>
Fixes: 13dfc0540a57 ("drm/bridge: Refactor out the panel wrapper from the lvds-encoder bri
dge.")
Tested-by: Lothar Waßmann <[email protected]>
Signed-off-by: Archit Taneja <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/bridge: synopsys/dw-hdmi: Enable cec clock

Support the "cec" optional clock. The documentation already mentions "cec"
optional clock and it is used by several boards, but currently the driver
doesn't enable it, thus preventing cec from working on those boards.

And even worse: a /dev/cecX device will appear for those boards, but it
won't be functioning without configuring this clock.

Changes:
v4:
- Change commit message to stress the importance of this patch

v3:
- Drop useless braces

v2:
- Separate ENOENT errors from others
- Propagate other errors (especially -EPROBE_DEFER)

Signed-off-by: Pierre-Hugues Husson <[email protected]>
Signed-off-by: Archit Taneja <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/bridge: adv7511/33: Fix adv7511_cec_init() failure handling

If the device tree for a board did not specify a cec clock, then
adv7511_cec_init would return an error, which would cause adv7511_probe()
to fail and thus there is no HDMI output.

There is no need to have adv7511_probe() fail if the CEC initialization
fails, so just change adv7511_cec_init() to a void function. In addition,
adv7511_cec_init() should just return silently if the cec clock isn't
found and show a message for any other errors.

An otherwise correct cleanup patch from Dan Carpenter turned this broken
failure handling into a kernel Oops, so bisection points to commit
7af35b0addbc ("drm/kirin: Checking for IS_ERR() instead of NULL") rather
than 3b1b975003e4 ("drm: adv7511/33: add HDMI CEC support").

Based on earlier patches from Arnd and John.

Reported-by: Naresh Kamboju <[email protected]>
Cc: Xinliang Liu <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Sean Paul <[email protected]>
Cc: Archit Taneja <[email protected]>
Cc: John Stultz <[email protected]>
Link: https://bugs.linaro.org/show_bug.cgi?id=3345
Link: https://lkft.validation.linaro.org/scheduler/job/48017#L3551
Fixes: 7af35b0addbc ("drm/kirin: Checking for IS_ERR() instead of NULL")
Fixes: 3b1b975003e4 ("drm: adv7511/33: add HDMI CEC support")
Signed-off-by: Hans Verkuil <[email protected]>
Tested-by: Hans Verkuil <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
Tested-by: John Stultz <[email protected]>
Signed-off-by: Archit Taneja <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge branch 'akpm' (patches from Andrew)

Mergr misc fixes from Andrew Morton:
"28 fixes"

* emailed patches from Andrew Morton <[email protected]>: (28 commits)
  fs/hugetlbfs/inode.c: change put_page/unlock_page order in hugetlbfs_fallocate()
  mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
  autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
  autofs: revert "autofs: take more care to not update last_used on path walk"
  fs/fat/inode.c: fix sb_rdonly() change
  mm, memcg: fix mem_cgroup_swapout() for THPs
  mm: migrate: fix an incorrect call of prep_transhuge_page()
  kmemleak: add scheduling point to kmemleak_scan()
  scripts/bloat-o-meter: don't fail with division by 0
  fs/mbcache.c: make count_objects() more robust
  Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical"
  mm/madvise.c: fix madvise() infinite loop under special circumstances
  exec: avoid RLIMIT_STACK races with prlimit()
  IB/core: disable memory registration of filesystem-dax vmas
  v4l2: disable filesystem-dax mapping support
  mm: fail get_vaddr_frames() for filesystem-dax mappings
  mm: introduce get_user_pages_longterm
  device-dax: implement ->split() to catch invalid munmap attempts
  mm, hugetlbfs: introduce ->split() to vm_operations_struct
  scripts/faddr2line: extend usage on generic arch
  ...

fs/hugetlbfs/inode.c: change put_page/unlock_page order in hugetlbfs_fallocate()

hugetlfs_fallocate() currently performs put_page() before unlock_page().
This scenario opens a small time window, from the time the page is added
to the page cache, until it is unlocked, in which the page might be
removed from the page-cache by another core.  If the page is removed
during this time windows, it might cause a memory corruption, as the
wrong page will be unlocked.

It is arguable whether this scenario can happen in a real system, and
there are several mitigating factors.  The issue was found by code
inspection (actually grep), and not by actually triggering the flow.
Yet, since putting the page before unlocking is incorrect it should be
fixed, if only to prevent future breakage or someone copy-pasting this
code.

Mike said:
"I am of the opinion that this does not need to be sent to stable.
  Although the ordering is current code is incorrect, there is no way
  for this to be a problem with current locking. In addition, I verified
  that the perhaps bigger issue with sys_fadvise64(POSIX_FADV_DONTNEED)
  for hugetlbfs and other filesystems is addressed in 3a77d214807c ("mm:
  fadvise: avoid fadvise for fs without backing device")"

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()")
Signed-off-by: Nadav Amit <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine

I made a mistake during converting hugetlb code to 5-level paging: in
huge_pte_alloc() we have to use p4d_alloc(), not p4d_offset().

Otherwise it leads to crash -- NULL-pointer dereference in pud_alloc()
if p4d table is not yet allocated.

It only can happen in 5-level paging mode. In 4-level paging mode
p4d_offset() always returns pgd, so we are fine.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: c2febafc6773 ("mm: convert generic code to 5-level paging")
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: <[email protected]> [4.11+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"

Commit 42f461482178 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
allowed the fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT
flag but introduced a semantic change.

In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
negative dentry case for stat family system calls in follow_automount().

This changed the unconditional triggering of an automount in this case
to no longer be done and an error returned instead.

This has caused more problems than I expected so reverting the change is
needed.

In a discussion with Neil Brown it was concluded that the automount(8)
daemon can implement this change without kernel modifications. So that
will be done instead and the autofs module documentation updated with a
description of the problem and what needs to be done by module users for
this specific case.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
Signed-off-by: Ian Kent <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Al Viro <[email protected]>
Cc: David Howells <[email protected]>
Cc: Colin Walters <[email protected]>
Cc: Ondrej Holy <[email protected]>
Cc: <[email protected]> [4.11+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

autofs: revert "autofs: take more care to not update last_used on path walk"

While commit 092a53452bb7 ("autofs: take more care to not update
last_used on path walk") helped (partially) resolve a problem where
automounts were not expiring due to aggressive accesses from user space
it has a side effect for very large environments.

This change helps with the expire problem by making the expire more
aggressive but, for very large environments, that means more mount
requests from clients. When there are a lot of clients that can mean
fairly significant server load increases.

It turns out I put the last_used in this position to solve this very
problem and failed to update my own thinking of the autofs expire
policy. So the patch being reverted introduces a regression which
should be fixed.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 092a53452b ("autofs: take more care to not update last_used on path walk")
Signed-off-by: Ian Kent <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Cc: Al Viro <[email protected]>
Cc: <[email protected]> [4.11+]
Cc: Colin Walters <[email protected]>
Cc: David Howells <[email protected]>
Cc: Ondrej Holy <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/fat/inode.c: fix sb_rdonly() change

Commit bc98a42c1f7d ("VFS: Convert sb->s_flags & MS_RDONLY to
sb_rdonly(sb)") converted fat_remount():new_rdonly from a bool to an
int.

However fat_remount() depends upon the compiler's conversion of a
non-zero integer into boolean `true'.

Fix it by switching `new_rdonly' back into a bool.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: bc98a42c1f7d0f8 ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)")
Signed-off-by: OGAWA Hirofumi <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: David Howells <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, memcg: fix mem_cgroup_swapout() for THPs

Commit d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout()
support THP") changed mem_cgroup_swapout() to support transparent huge
page (THP).

However the patch missed one location which should be changed for
correctly handling THPs. The resulting bug will cause the memory
cgroups whose THPs were swapped out to become zombies on deletion.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP")
Signed-off-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: migrate: fix an incorrect call of prep_transhuge_page()

In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during
memory hotplug/hot remove prep_transhuge_page() is called incorrectly on
non-THP pages for migration, when THP is on but THP migration is not
enabled.  This leads to a bad state of target pages for migration.

By inspecting the code, if called on a non-THP, prep_transhuge_page()
will

1) change the value of the mapping of (page + 2), since it is used for
    THP deferred list;

2) change the lru value of (page + 1), since it is used for THP's dtor.

Both can lead to data corruption of these two pages.

Andrea said:
"Pragmatically and from the point of view of the memory_hotplug subsys,
  the effect is a kernel crash when pages are being migrated during a
  memory hot remove offline and migration target pages are found in a
  bad state"

This patch fixes it by only calling prep_transhuge_page() when we are
certain that the target page is THP.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
Signed-off-by: Zi Yan <[email protected]>
Reported-by: Andrea Reale <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: <[email protected]> [4.14]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kmemleak: add scheduling point to kmemleak_scan()

kmemleak_scan() will scan struct page for each node and it can be really
large and resulting in a soft lockup.  We have seen a soft lockup when
do scan while compile kernel:

  watchdog: BUG: soft lockup - CPU#53 stuck for 22s! [bash:10287]
[...]
  Call Trace:
   kmemleak_scan+0x21a/0x4c0
   kmemleak_write+0x312/0x350
   full_proxy_write+0x5a/0xa0
   __vfs_write+0x33/0x150
   vfs_write+0xad/0x1a0
   SyS_write+0x52/0xc0
   do_syscall_64+0x61/0x1a0
   entry_SYSCALL64_slow_path+0x25/0x25

Fix this by adding cond_resched every MAX_SCAN_SIZE.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yisheng Xie <[email protected]>
Suggested-by: Catalin Marinas <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

scripts/bloat-o-meter: don't fail with division by 0

Under some circumstances it's possible to get a divider 0 which crashes
the script.

  Traceback (most recent call last):
    File "linux/scripts/bloat-o-meter", line 98, in <module>
      print_result("Function", "tTdDbBrR", 2)
    File "linux/scripts/bloat-o-meter", line 87, in print_result
      (otot, ntot, (ntot - otot)*100.0/otot))
  ZeroDivisionError: float division by zero

Hide this by checking the divider first.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Vaneet Narang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs/mbcache.c: make count_objects() more robust

When running ltp stress test for 7*24 hours, vmscan occasionally emits
the following warning continuously:

  mb_cache_scan+0x0/0x3f0 negative objects to delete
  nr=-9232265467809300450
  ...

Tracing shows the freeable(mb_cache_count returns) is -1, which causes
the continuous accumulation and overflow of total_scan.

This patch makes sure that mb_cache_count() cannot return a negative
value, which makes the mbcache shrinker more robust.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jiang Biao <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Revert "mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical"

This reverts commit 0f6d24f87856 ("mm/page-writeback.c: print a warning
if the vm dirtiness settings are illogical") because it causes false
positive warnings during OOM situations as noticed by Tetsuo Handa:

  Node 0 active_anon:3525940kB inactive_anon:8372kB active_file:216kB inactive_file:1872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2504kB dirty:52kB writeback:0kB shmem:8660kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 636928kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
  Node 0 DMA free:14848kB min:284kB low:352kB high:420kB active_anon:992kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 2687 3645 3645
  Node 0 DMA32 free:53004kB min:49608kB low:62008kB high:74408kB active_anon:2712648kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2773132kB mlocked:0kB kernel_stack:96kB pagetables:5096kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 0 958 958
  Node 0 Normal free:17140kB min:17684kB low:22104kB high:26524kB active_anon:812300kB inactive_anon:8372kB active_file:1228kB inactive_file:1868kB unevictable:0kB writepending:52kB present:1048576kB managed:981224kB mlocked:0kB kernel_stack:3520kB pagetables:8552kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  [...]
  Out of memory: Kill process 8459 (a.out) score 999 or sacrifice child
  Killed process 8459 (a.out) total-vm:4180kB, anon-rss:88kB, file-rss:0kB, shmem-rss:0kB
  oom_reaper: reaped process 8459 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
  vm direct limit must be set greater than background limit.

The problem is that both thresh and bg_thresh will be 0 if
available_memory is less than 4 pages when evaluating
global_dirtyable_memory.

While this might be worked around the whole point of the warning is
dubious at best.  We do rely on admins to do sensible things when
changing tunable knobs.  Dirty memory writeback knobs are not any
special in that regards so revert the warning rather than adding more
hacks to work this around.

Debugged by Yafang Shao.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 0f6d24f87856 ("mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical")
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: Tetsuo Handa <[email protected]>
Cc: Yafang Shao <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/madvise.c: fix madvise() infinite loop under special circumstances

MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation.  The calling
convention is quite subtle there.  madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never
advance to the next vma and it will keep looping for ever without a way
to get out of the kernel.

It seems this has been broken since introduction.  Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.

[[email protected]: rewrite changelog]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <[email protected]>
Signed-off-by: guoxuenan <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: zhangyi (F) <[email protected]>
Cc: Miao Xie <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Carsten Otte <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

exec: avoid RLIMIT_STACK races with prlimit()

While the defense-in-depth RLIMIT_STACK limit on setuid processes was
protected against races from other threads calling setrlimit(), I missed
protecting it against races from external processes calling prlimit().
This adds locking around the change and makes sure that rlim_max is set
too.

Link: http://lkml.kernel.org/r/20171127193457.GA11348@beast
Fixes: 64701dee4178e ("exec: Use sane stack rlimit under secureexec")
Signed-off-by: Kees Cook <[email protected]>
Reported-by: Ben Hutchings <[email protected]>
Reported-by: Brad Spengler <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Cc: James Morris <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

IB/core: disable memory registration of filesystem-dax vmas

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow RDMA to create long standing memory registrations
against filesytem-dax vmas.

Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Christoph Hellwig <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Jason Gunthorpe <[email protected]>
Acked-by: Doug Ledford <[email protected]>
Cc: Sean Hefty <[email protected]>
Cc: Hal Rosenstock <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

v4l2: disable filesystem-dax mapping support

V4L2 memory registrations are incompatible with filesystem-dax that
needs the ability to revoke dma access to a mapping at will, or
otherwise allow the kernel to wait for completion of DMA. The
filesystem-dax implementation breaks the traditional solution of
truncate of active file backed mappings since there is no page-cache
page we can orphan to sustain ongoing DMA.

If v4l2 wants to support long lived DMA mappings it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.

Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Jan Kara <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Hal Rosenstock <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Sean Hefty <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fail get_vaddr_frames() for filesystem-dax mappings

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow V4L2, Exynos, and other frame vector users to create
long standing / irrevocable memory registrations against filesytem-dax
vmas.

[[email protected]: add comment for vma_is_fsdax() check in get_vaddr_frames(), per Jan]
Link: http://lkml.kernel.org/r/151197874035.26211.4061781453123083667.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Hal Rosenstock <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Sean Hefty <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: introduce get_user_pages_longterm

Patch series "introduce get_user_pages_longterm()", v2.

Here is a new get_user_pages api for cases where a driver intends to
keep an elevated page count indefinitely.  This is distinct from usages
like iov_iter_get_pages where the elevated page counts are transient.
The iov_iter_get_pages cases immediately turn around and submit the
pages to a device driver which will put_page when the i/o operation
completes (under kernel control).

In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future.  This is untenable for
filesystem-dax case where the filesystem is in control of the lifetime
of the block / page and needs reasonable limits on how long it can wait
for pages in a mapping to become idle.

Fixing filesystems to actually wait for dax pages to be idle before
blocks from a truncate/hole-punch operation are repurposed is saved for
a later patch series.

Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.

I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings
were supported by the kernel.  The behavior regression this policy
change implies is one of the reasons we maintain the "dax enabled.
Warning: EXPERIMENTAL, use at your own risk" notification when mounting
a filesystem in dax mode.

It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.

This patch (of 4):

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow long standing memory registrations against
filesytem-dax vmas.  Device-dax vmas do not have this problem and are
explicitly allowed.

This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and
V4L2).

[[email protected]: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <[email protected]>
Suggested-by: Christoph Hellwig <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Hal Rosenstock <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Sean Hefty <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

device-dax: implement ->split() to catch invalid munmap attempts

Similar to how device-dax enforces that the 'address', 'offset', and
'len' parameters to mmap() be aligned to the device's fundamental
alignment, the same constraints apply to munmap().  Implement ->split()
to fail munmap calls that violate the alignment constraint.

Otherwise, we later fail VM_BUG_ON checks in the unmap_page_range() path
with crash signatures of the form:

    vma ffff8800b60c8a88 start 00007f88c0000000 end 00007f88c0e00000
    next           (null) prev           (null) mm ffff8800b61150c0
    prot 8000000000000027 anon_vma           (null) vm_ops ffffffffa0091240
    pgoff 0 file ffff8800b638ef80 private_data           (null)
    flags: 0x380000fb(read|write|shared|mayread|maywrite|mayexec|mayshare|softdirty|mixedmap|hugepage)
    ------------[ cut here ]------------
    kernel BUG at mm/huge_memory.c:2014!
    [..]
    RIP: 0010:__split_huge_pud+0x12a/0x180
    [..]
    Call Trace:
     unmap_page_range+0x245/0xa40
     ? __vma_adjust+0x301/0x990
     unmap_vmas+0x4c/0xa0
     unmap_region+0xae/0x120
     ? __vma_rb_erase+0x11a/0x230
     do_munmap+0x276/0x410
     vm_munmap+0x6a/0xa0
     SyS_munmap+0x1d/0x30

Link: http://lkml.kernel.org/r/151130418681.4029.7118245855057952010.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Jeff Moyer <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, hugetlbfs: introduce ->split() to vm_operations_struct

Patch series "device-dax: fix unaligned munmap handling"

When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges.  It
would be messy to teach the munmap path about device-dax alignment
constraints in the same (hstate) way that hugetlbfs communicates this
constraint.  Instead, these patches introduce a new ->split() vm
operation.

This patch (of 2):

The device-dax interface has similar constraints as hugetlbfs in that it
requires the munmap path to unmap in huge page aligned units.  Rather
than add more custom vma handling code in __split_vma() introduce a new
vm operation to perform this vma specific check.

Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

scripts/faddr2line: extend usage on generic arch

When cross-compiling, fadd2line should use the binary tool used for the
target system, rather than that of the host.

Link: http://lkml.kernel.org/r/20171121092911.GA150711@sofia
Signed-off-by: Liu Changcheng <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: replace pte_write with pte_access_permitted in fault + gup paths

The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:

- validate that the mapping is writable from a protection keys
standpoint

- validate that the pte has _PAGE_USER set since all fault paths where
pte_write is must be referencing user-memory.

Link: http://lkml.kernel.org/r/151043111604.2842.8051684481794973100.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: replace pmd_write with pmd_access_permitted in fault + gup paths

The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:

- validate that the mapping is writable from a protection keys
standpoint

- validate that the pte has _PAGE_USER set since all fault paths where
pmd_write is must be referencing user-memory.

Link: http://lkml.kernel.org/r/151043111049.2842.15241454964150083466.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: replace pud_write with pud_access_permitted in fault + gup paths

The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:

- validate that the mapping is writable from a protection keys
standpoint

- validate that the pte has _PAGE_USER set since all fault paths where
pud_write is must be referencing user-memory.

[[email protected]: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129127237.37405.16073414520854722485.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151043110453.2842.2166049702068628177.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE

In response to compile breakage introduced by a series that added the
pud_write helper to x86, Stephen notes:

    did you consider using the other paradigm:

    In arch include files:
    #define pud_write       pud_write
    static inline int pud_write(pud_t pud)
     .....

    Then in include/asm-generic/pgtable.h:

    #ifndef pud_write
    tatic inline int pud_write(pud_t pud)
    {
            ....
    }
    #endif

    If you had, then the powerpc code would have worked ... ;-) and many
    of the other interfaces in include/asm-generic/pgtable.h are
    protected that way ...

Given that some architecture already define pmd_write() as a macro, it's
a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE.

Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Suggested-by: Stephen Rothwell <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Oliver OHalloran <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix device-dax pud write-faults triggered by get_user_pages()

Currently only get_user_pages_fast() can safely handle the writable gup
case due to its use of pud_access_permitted() to check whether the pud
entry is writable.  In the gup slow path pud_write() is used instead of
pud_access_permitted() and to date it has been unimplemented, just calls
BUG_ON().

    kernel BUG at ./include/linux/hugetlb.h:244!
    [..]
    RIP: 0010:follow_devmap_pud+0x482/0x490
    [..]
    Call Trace:
     follow_page_mask+0x28c/0x6e0
     __get_user_pages+0xe4/0x6c0
     get_user_pages_unlocked+0x130/0x1b0
     get_user_pages_fast+0x89/0xb0
     iov_iter_get_pages_alloc+0x114/0x4a0
     nfs_direct_read_schedule_iovec+0xd2/0x350
     ? nfs_start_io_direct+0x63/0x70
     nfs_file_direct_read+0x1e0/0x250
     nfs_file_read+0x90/0xc0

For now this just implements a simple check for the _PAGE_RW bit similar
to pmd_write.  However, this implies that the gup-slow-path check is
missing the extra checks that the gup-fast-path performs with
pud_access_permitted.  Later patches will align all checks to use the
'access_permitted' helper if the architecture provides it.

Note that the generic 'access_permitted' helper fallback is the simple
_PAGE_RW check on architectures that do not define the
'access_permitted' helper(s).

[[email protected]: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Stephen Rothwell <[email protected]>
Acked-by: Thomas Gleixner <[email protected]> [x86]
Cc: Kirill A. Shutemov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/cma: fix alloc_contig_range ret code/potential leak

If the call __alloc_contig_migrate_range() in alloc_contig_range returns
-EBUSY, processing continues so that test_pages_isolated() is called
where there is a tracepoint to identify the busy pages.  However, it is
possible for busy pages to become available between the calls to these
two routines.  In this case, the range of pages may be allocated.
Unfortunately, the original return code (ret == -EBUSY) is still set and
returned to the caller.  Therefore, the caller believes the pages were
not allocated and they are leaked.

Update the comment to indicate that allocation is still possible even if
__alloc_contig_migrate_range returns -EBUSY.  Also, clear return code in
this case so that it is not accidentally used or returned to caller.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Signed-off-by: Mike Kravetz <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Joonsoo Kim <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>