Git Repo - linux.git/log

virtio_net: fix support for small rings

When ring size is small (<32 entries) making buffers smaller means a
full ring might not be able to hold enough buffers to fit a single large
packet.

Make sure a ring full of buffers is large enough to allow at least one
packet of max size.

Fixes: 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag allocators")
Signed-off-by: Michael S. Tsirkin <[email protected]>
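
The constraint in the commit above can be illustrated with a little arithmetic. This is only a sketch, not the driver's code: the function name and the packet and ring sizes are hypothetical, chosen to show why a small ring forces a larger minimum buffer length.

```python
def min_buffer_len(max_packet_len, ring_size):
    """Smallest per-buffer length (hypothetical helper) such that a ring
    full of buffers can hold one packet of max_packet_len bytes."""
    if ring_size <= 0:
        raise ValueError("ring_size must be positive")
    # Ceiling division: round up so ring_size * result >= max_packet_len.
    return -(-max_packet_len // ring_size)


# Illustrative numbers only: a 16-entry ring and a 65550-byte packet.
needed = min_buffer_len(65550, 16)
print(needed, needed * 16 >= 65550)
```

With a larger ring the same packet fits in much smaller buffers, which is why the problem only shows up for small rings.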

virtio_net: reduce alignment for buffers

We don't need to align length to any particular
value anymore. Aligning to L1 cache size probably
still makes sense to reduce false sharing.

Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio_net: rework mergeable buffer handling

Use the new _ctx virtio API to maintain true length for each buffer.

Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio_net: allow specifying context for rx

With mergeable buffers we never use s/g for rx,
so allow specifying context in that case.

Signed-off-by: Michael S. Tsirkin <[email protected]>

powerpc/64s: Support new device tree binding for discovering CPU features

The ibm,powerpc-cpu-features device tree binding describes CPU features with
ASCII names and extensible compatibility, privilege, and enablement metadata
that allows improved flexibility and compatibility with new hardware.

The interface is described in detail in ibm,powerpc-cpu-features.txt in this
patch.

Currently this code is not enabled by default, and there are no released
firmwares that provide the binding.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison

Use time_after() for time comparison with the new fix.

Signed-off-by: Karim Eshapa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
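
As background for the commit above, the reason to prefer time_after() over a plain comparison is that tick counters wrap around. The sketch below models the idea in Python with a 32-bit counter; the width and the helper name are assumptions for illustration (the kernel macro operates on unsigned long, whose width depends on the architecture).

```python
BITS = 32                 # assumed counter width for this sketch
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def time_after(a, b):
    """True if counter value a is later than b, tolerating wraparound.

    Models the signed-difference idiom: (b - a), taken modulo 2**BITS and
    read as a signed number, is negative exactly when a is after b.
    """
    return ((b - a) & MASK) >= SIGN


before_wrap = 0xFFFFFFF0  # sampled just before the counter wrapped
after_wrap = 0x00000005   # sampled just after

print(after_wrap > before_wrap)             # naive comparison gets it wrong
print(time_after(after_wrap, before_wrap))  # wrap-safe comparison
```

The comparison is only meaningful when the two values are less than half the counter range apart.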

DECnet: Use container_of() for embedded struct

Instead of a direct cross-type cast, use container_of() to locate
the embedded structure, even in the face of future struct layout
randomization.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

powerpc: Don't print cpu_spec->cpu_name if it's NULL

Currently we assume that if the cpu_spec has a pvr_mask then it must also have a
cpu_name. But that will change in a subsequent commit when we do CPU feature
discovery via the device tree, so check explicitly if cpu_name is NULL.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

of/fdt: introduce of_scan_flat_dt_subnodes and of_get_flat_dt_phandle

Introduce primitives for FDT parsing. These will be used for powerpc
cpufeatures node scanning, which has quite complex structure but should
be processed early.

Cc: [email protected]
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next

Freescale updates from Scott:

"Includes a fix for a powerpc/next mm regression on 64e, a fix for a
kernel hang on 64e when using a debugger inside a relocated kernel, a
qman fix, and misc qe improvements."

Merge tag 'kvm-arm-for-v4.12-round2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

Second round of KVM/ARM Changes for v4.12.

Changes include:
- A fix related to the 32-bit idmap stub
- A fix to the bitmask used to decode the operands of an AArch32 CP
   instruction
- We have moved the files shared between arch/arm/kvm and
   arch/arm64/kvm to virt/kvm/arm
- We add support for saving/restoring the virtual ITS state to
   userspace

KVM: arm/arm64: vgic-its: Cleanup after failed ITT restore

When failing to restore the ITT for a DTE, we should remove the failed
device entry from the list and free the object.

We slightly refactor vgic_its_destroy to be able to reuse the now
separate vgic_its_free_dte() function.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Don't call map_resources when restoring ITS tables

The only reason we called kvm_vgic_map_resources() when restoring the
ITS tables was because we wanted to have the KVM iodevs registered in
the KVM IO bus framework at the time when the ITS was restored such that
a restored and active device can inject MSIs prior to otherwise calling
kvm_vgic_map_resources() from the first run of a VCPU.

Since we now register the KVM iodevs for the redistributors and ITS as
soon as possible (when setting the base addresses), we no longer need
this call and kvm_vgic_map_resources() is again called only when first
running a VCPU.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Register ITS iodev when setting base address

We have to register the ITS iodevice before running the VM, because in
migration scenarios, we may be restoring a live device that wishes to
inject MSIs before the VCPUs have started.

All we need to register the ITS io device is the base address of the
ITS, so we can simply register that when the base address of the ITS is
set.

[ Code to fix concurrency issues when setting the ITS base address and
to fix the undef base address check written by Marc Zyngier ]

Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Get rid of its->initialized field

The its->initialized doesn't bring much to the table, and creates
unnecessary ordering between setting the address and initializing it
(which amounts to exactly nothing).

Let's kill it altogether, making KVM_DEV_ARM_VGIC_CTRL_INIT the no-op
it deserves to be.

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs

Instead of waiting with registering KVM iodevs until the first VCPU is
run, we can actually create the iodevs when the redist base address is
set. The only downside is that we must now also check if we need to do
this for VCPUs which are created after creating the VGIC, because there
is no enforced ordering between creating the VGIC (and setting its base
addresses) and creating the VCPUs.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Slightly rework kvm_vgic_addr

As we are about to handle setting the address for the redistributor base
region separately from some of the other base addresses, let's rework
this function to leave a little more room for being flexible in what
each type of base address does.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Make vgic_v3_check_base more broadly usable

As we are about to fiddle with the IO device registration mechanism,
let's be a little more careful when setting base addresses as early as
possible. When setting a base address, we can check that there's
address space enough for its scope and when the last of the two
base addresses (dist and redist) get set, we can also check if the
regions overlap at that time.

This allows us to provide error messages to the user at the time they try
to set the base address, as opposed to later when trying to run the VM.

To do this, we make vgic_v3_check_base available in the core vgic-v3
code as well as in the other parts of the GICv3 code, namely the MMIO
config code.

We also return true for undefined base addresses so that the function
can be used before all base addresses are set; all callers already check
for uninitialized addresses before calling this function.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Refactor vgic_register_redist_iodevs

Split out the function to register all the redistributor iodevs into a
function that handles a single redistributor at a time in preparation
for being able to call this per VCPU as these get created.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpus

There are occasional needs to use the index of vcpu in the kvm->vcpus
array to map something related to a VCPU. For example, unlike the
vcpu->vcpu_id, the vcpu index is guaranteed to not be sparse across all
vcpus which is useful when allocating a memory area for each vcpu.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
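
To illustrate the distinction drawn in the commit above between a sparse vcpu_id and a dense array index, here is a small model in Python. The Vcpu class, the ID values and the helper are hypothetical; the sketch only shows why the index is the better key for per-vcpu allocations.

```python
class Vcpu:
    """Hypothetical stand-in for a VCPU with a userspace-chosen ID."""

    def __init__(self, vcpu_id):
        self.vcpu_id = vcpu_id


def vcpu_index(vcpus, vcpu):
    """Return the position of vcpu in the vcpus list (dense, 0..n-1)."""
    for idx, candidate in enumerate(vcpus):
        if candidate is vcpu:
            return idx
    raise ValueError("vcpu is not in this VM")


# IDs may be sparse; indices are not.
vcpus = [Vcpu(0), Vcpu(8), Vcpu(16)]
per_vcpu_area = [None] * len(vcpus)      # sized by count, not by max ID
per_vcpu_area[vcpu_index(vcpus, vcpus[2])] = "state for vcpu_id 16"
print(per_vcpu_area)
```

Indexing the same area by vcpu_id would need 17 slots here instead of 3.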

nVMX: Advertise PML to L1 hypervisor

Advertise the PML bit in vmcs12 but don't try to enable
it in hardware when running L2 since L0 is emulating it. Also,
preserve L0's settings for PML since it may still
want to log writes.

Signed-off-by: Bandan Das <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nVMX: Implement emulated Page Modification Logging

With EPT A/D enabled, processor access to L2 guest
paging structures will result in a write violation.
When this happens, write the GUEST_PHYSICAL_ADDRESS
to the pml buffer provided by L1 if the access is a
write and the dirty bit is being set.

This patch also adds necessary checks during VMEntry if L1
has enabled PML. If the PML index overflows, we change the
exit reason and run L1 to simulate a PML full event.

Signed-off-by: Bandan Das <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: x86: Add a hook for arch specific dirty logging emulation

When KVM updates accessed/dirty bits, this hook can be used
to invoke an arch specific function that implements/emulates
dirty logging such as PML.

Signed-off-by: Bandan Das <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: nVMX: Validate CR3 target count on nested VM-entry

According to the SDM, the CR3-target count must not be greater than
4. Future processors may support a different number of CR3-target
values. Software should read the VMX capability MSR IA32_VMX_MISC to
determine the number of values supported.

Signed-off-by: Jim Mattson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
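
The check described in the commit above can be sketched as follows. This is a model, not the KVM code: the function names are hypothetical, and the bit position (bits 24:16 of IA32_VMX_MISC holding the number of supported CR3-target values) is stated here from the SDM's description of that MSR and should be confirmed there.

```python
def cr3_target_limit(vmx_misc):
    """Extract the supported CR3-target count from an IA32_VMX_MISC value.

    Assumes the count lives in bits 24:16 (a 9-bit field).
    """
    return (vmx_misc >> 16) & 0x1FF


def cr3_target_count_valid(count, vmx_misc):
    """A nested VM-entry should be rejected if count exceeds the limit."""
    return count <= cr3_target_limit(vmx_misc)


# Hypothetical MSR value advertising 4 CR3-target values.
misc = 4 << 16
print(cr3_target_limit(misc))
print(cr3_target_count_valid(4, misc), cr3_target_count_valid(5, misc))
```

Reading the limit from the MSR value rather than hard-coding 4 keeps the check correct if a future processor reports a different number.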

Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

The main thing here is a new implementation of the in-kernel
XICS interrupt controller emulation for POWER9 machines, from Ben
Herrenschmidt.

POWER9 has a new interrupt controller called XIVE (eXternal Interrupt
Virtualization Engine) which is able to deliver interrupts directly
to guest virtual CPUs in hardware without hypervisor intervention.
With this new code, the guest still sees the old XICS interface but
performance is better because the XICS emulation in the host uses the
XIVE directly rather than going through a XICS emulation in firmware.

Conflicts:
arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix]
arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]

KVM: set no_llseek in stat_fops_per_vm

In vm_stat_get_per_vm_fops and vcpu_stat_get_per_vm_fops, since we
use nonseekable_open() to open, we should use no_llseek() to seek,
not generic_file_llseek().

Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

powerpc/64s: Fix unnecessary machine check handler relocation branch

Similarly to commit 2563a70c3b ("powerpc/64s: Remove unnecessary relocation
branch from idle handler"), the machine check handler has a BRANCH_TO from
relocated to relocated code, which is unnecessary.

It has also caused build errors with some toolchains:

  arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
  arch/powerpc/kernel/exceptions-64s.S:395: Error: operand out of range
  (0xffffffffffff8280 is not between 0x0000000000000000 and
  0x000000000000ffff)

Fixes: 1945bc4549e5 ("powerpc/64s: Fix POWER9 machine check handler from stop state")
Signed-off-by: Nicholas Piggin <[email protected]>
Reported-and-tested-by: Abdul Haleem <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/book3s/64: Rework page table geometry for lower memory usage

Recently in commit f6eedbba7a26 ("powerpc/mm/hash: Increase VA range to 128TB")
we increased the virtual address space for user processes to 128TB by default,
and up to 512TB if user space opts in.

This obviously required expanding the range of the Linux page tables. For Book3s
64-bit using hash and with PAGE_SIZE=64K, we increased the PGD to 2^15 entries.
This meant we could cover the full address range, while still being able to
insert a 16G hugepage at the PGD level and a 16M hugepage in the PMD.

The downside of that geometry is that it uses a lot of memory for the PGD, and
in particular makes the PGD a 4-page allocation, which means it's much more
likely to fail under memory pressure.

Instead we can make the PMD larger, so that a single PUD entry maps 16G,
allowing the 16G hugepages to sit at that level in the tree. We're then able to
split the remaining bits between the PUD and PGD. We make the PGD slightly
larger as that results in lower memory usage for typical programs.

When THP is enabled the PMD actually doubles in size, to 2^11 entries, or 2^14
bytes, which is large but still < PAGE_SIZE.

Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
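
The sizes quoted in the commit above can be checked with a few lines of arithmetic. The sketch assumes 8-byte page table entries and PAGE_SIZE=64K; the helper names are made up for illustration.

```python
ENTRY_BYTES = 8            # assumed size of one page table entry
PAGE_SIZE = 64 * 1024      # PAGE_SIZE=64K, as in the commit message


def table_bytes(index_bits):
    """Bytes needed for a table with 2**index_bits entries."""
    return (1 << index_bits) * ENTRY_BYTES


def pages_needed(index_bits):
    """Whole pages needed to hold that table (rounded up)."""
    return -(-table_bytes(index_bits) // PAGE_SIZE)


# Old geometry: PGD with 2^15 entries.
print(table_bytes(15), pages_needed(15))
# New geometry with THP: PMD with 2^11 entries.
print(table_bytes(11), pages_needed(11))
```

Under these assumptions the 2^15-entry PGD comes to 256K, a 4-page allocation, while the 2^11-entry PMD is 2^14 bytes and fits in a single page.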

powerpc: Fix distclean with Makefile.postlink

Makefile.postlink always includes include/config/auto.conf, however
this file is not present in a clean kernel tree, causing make to fail:

  $ git clone linuxppc.git
  $ cd linuxppc.git
  $ make distclean
  arch/powerpc/Makefile.postlink:10: include/config/auto.conf: No such file or directory
  make[1]: *** No rule to make target `include/config/auto.conf'.  Stop.
  make: *** [vmlinuxclean] Error 2

Equally running 'make distclean; make distclean' will trip the error case.

Change the inclusion such that the file not being found does not trigger an error.

Fixes: f188d0524d7e ("powerpc: Use the new post-link pass to check relocations")
Reported-by: Mircea Pop <[email protected]>
Signed-off-by: Horia Geantă <[email protected]>
Tested-by: Justin M. Forbes <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

KVM: arm/arm64: vgic: Rename kvm_vgic_vcpu_init to kvm_vgic_vcpu_enable

This function really doesn't init anything, it enables the CPU
interface, so name it as such, which gives us the name to use for actual
init work later on.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Clarification and relaxation to ITS save/restore ABI

Clarify what is meant by the save/restore ABI only supporting virtual
physical interrupts.

Relax the requirement of the order that the collection entries are
written in and be clear that there is no particular ordering enforced.

Some cosmetic changes in the capitalization of ID names to align with
the GICv3 manual and remove the empty line in the bottom of the patch.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

x86/intel_rdt: Fix a typo in Documentation

Example 3 contains a typo:

"C0" in "# echo C0 > p0/cpus" is wrong because it specifies cores
6-7 instead of the wanted cores 4-7.

Correct this typo to avoid confusion.

Signed-off-by: Xiaochen Shen <[email protected]>
Acked-by: Fenghua Yu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
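
The mistake described in the commit above is easy to verify by expanding the hexadecimal CPU mask into bit positions. The helper below is a hypothetical illustration, not part of the resctrl interface.

```python
def cpus_in_mask(hex_mask):
    """List the CPU numbers whose bits are set in a hexadecimal mask."""
    value = int(hex_mask, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(cpus_in_mask("C0"))  # the mask from the typo
print(cpus_in_mask("F0"))  # a mask that covers cores 4-7
```

C0 is binary 1100 0000, so only bits 6 and 7 are set; covering cores 4-7 needs 1111 0000, which is F0.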

clocksource/arm_arch_timer: Fix arch_timer_mem_find_best_frame()

arch_timer_mem_find_best_frame() looks through ARCH_TIMER_MEM_MAX_FRAMES
frames even after finding matches to ensure the best frame is chosen,
which means the variable frame will point to the last valid frame but
not necessarily the best frame.

On Juno, we get the following error as the wrong frame is returned as the
best frame from arch_timer_mem_find_best_frame():

  arch_timer: Unable to map frame @ 0x0000000000000000
  arch_timer: Frame missing phys irq.
  Failed to initialize '/timer@2a810000': -22

Fix the issue by correctly returning the best frame from
arch_timer_mem_find_best_frame().

Fixes: c389d701dfb7 ("clocksource: arm_arch_timer: split MMIO timer probing.")
Signed-off-by: Sudeep Holla <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
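
The bug pattern described in the commit above, returning the loop variable instead of the tracked best candidate, can be sketched generically. The frame records and the scoring function here are hypothetical; the actual driver's selection criteria are not shown in the commit message.

```python
def find_best_frame(frames, score):
    """Scan every frame, remembering the best valid one seen so far."""
    best = None
    for frame in frames:
        if not frame["valid"]:
            continue
        if best is None or score(frame) > score(best):
            best = frame
    # Return the tracked best, not whatever `frame` happened to be last.
    return best


frames = [
    {"id": 0, "valid": True, "rank": 2},
    {"id": 1, "valid": True, "rank": 1},
    {"id": 2, "valid": False, "rank": 9},
]
print(find_best_frame(frames, lambda f: f["rank"])["id"])
```

Because the loop keeps scanning after a match, the loop variable ends on the last frame examined, which is why it cannot double as the result.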

x86/build: Don't add -maccumulate-outgoing-args w/o compiler support

Clang does not support this machine dependent option.

Older versions of GCC (pre 3.0) may not support this option, added in
2000, but it's unlikely they can still compile a working kernel.

Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/boot/32: Fix UP boot on Quark and possibly other platforms

This partially reverts commit:

  23b2a4ddebdd17f ("x86/boot/32: Defer resyncing initial_page_table until per-cpu is set up")

That commit had one definite bug and one potential bug.  The
definite bug is that setup_per_cpu_areas() uses a different generic
implementation on UP kernels, so initial_page_table never got
resynced.  This was fine for access to percpu data (it's in the
identity map on UP), but it breaks other users of
initial_page_table.  The potential bug is that helpers like
efi_init() would be called before the tables were synced.

Avoid both problems by just syncing the page tables in setup_arch()
*and* setup_per_cpu_areas().

Reported-by: Jan Kiszka <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()

'__vmalloc_start_set' currently only gets set in initmem_init() when
!CONFIG_NEED_MULTIPLE_NODES. This breaks detection of vmalloc address
with virt_addr_valid() with CONFIG_NEED_MULTIPLE_NODES=y, causing
a kernel crash:

[mm/usercopy] 517e1fbeb6: kernel BUG at arch/x86/mm/physaddr.c:78!

Set '__vmalloc_start_set' appropriately for that case as well.

Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Laura Abbott <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: dc16ecf7fd1f ("x86-32: use specific __vmalloc_start_set flag in __virt_addr_valid")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

Merge tag 'linux-kselftest-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest updates from Shuah Khan:
"This update consists of:

   - important fixes for build failures and clean target related
     warnings to address regressions introduced in commit 88baa78d1f31
     ("selftests: remove duplicated all and clean target")

   - several minor spelling fixes in log messages and comment
     blocks.

   - Enabling configs for better test coverage in ftrace, vm, and
     cpufreq tests.

   - .gitignore changes"

* tag 'linux-kselftest-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (26 commits)
  selftests: x86: add missing executables to .gitignore
  selftests: watchdog: accept multiple params on command line
  selftests: create cpufreq kconfig fragments
  selftests: x86: override clean in lib.mk to fix warnings
  selftests: sync: override clean in lib.mk to fix warnings
  selftests: splice: override clean in lib.mk to fix warnings
  selftests: gpio: fix clean target to remove all generated files and dirs
  selftests: add gpio generated files to .gitignore
  selftests: powerpc: override clean in lib.mk to fix warnings
  selftests: gpio: override clean in lib.mk to fix warnings
  selftests: futex: override clean in lib.mk to fix warnings
  selftests: lib.mk: define CLEAN macro to allow Makefiles to override clean
  selftests: splice: fix clean target to not remove default_file_splice_read.sh
  selftests: gpio: add config fragment for gpio-mockup
  selftests: breakpoints: allow to cross-compile for aarch64/arm64
  selftests/Makefile: Add missed PHONY targets
  selftests/vm/run_vmtests: Fix wrong comment
  selftests/Makefile: Add missed closing `"` in comment
  selftests/vm/run_vmtests: Polish output text
  selftests/timers: fix spelling mistake: "Asynchronous"
  ...

Merge tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing updates from Steven Rostedt:
"These are three simple changes.

  The first one is just a switch from using strcpy() to strlcpy().
  Someone thought that it may cause an overflow bug, but since it only
  copies comms into a pre-allocated array of TASK_COMM_LEN, and no comm
  should ever be bigger than that, nor not end with a nul character,
  this change is more of a safety precaution than fixing anything that
  is actually broken.

  The other two changes are simply cleaning and optimizing some code"

* tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Simplify ftrace_match_record() even more
  ftrace: Remove an unneeded condition
  tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

Merge tags 'for-linus' and 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull more rdma updates from Doug Ledford:
"As mentioned in my first pull request, this is the subsequent pull
  requests I had. This is all I have, and in fact this cleans out the
  RDMA subsystem's entire patchworks queue of kernel changes that are
  ready to go (well, it did for the weekend anyway, a few new patches
  are in, but they'll be coming during the -rc cycle).

  The first tag contains a single patch that would have conflicted if
  taken from my tree or DaveM's tree as it needed our trees merged to
  come cleanly.

  The second tag contains the patch series from Intel plus three other
  stragglers that came in late last week. I took them because it allowed
  me to legitimately claim that the RDMA patchworks queue was, for a
  short time, 100% cleared of all waiting kernel patches, woohoo! :-).

  I have it under my for-next tag, so it did get 0day and linux-next
  over the end of last week, and linux-next did show one minor conflict.

  Summary:

  'for-linus' tag:
   - mlx5/IPoIB fixup patch

  'for-next' tag:
   - the hfi1 15 patch set that landed late
   - IPoIB get_link_ksettings which landed late because I asked for a
     respin
   - one late rxe change
   - one -rc worthy fix that's in early"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/mlx5: Enable IPoIB acceleration

* tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  rxe: expose num_possible_cpus() cnum_comp_vectors
  IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type
  IB/hfi1: Clean up on context initialization failure
  IB/hfi1: Fix an assign/ordering issue with shared context IDs
  IB/hfi1: Clean up context initialization
  IB/hfi1: Correctly clear the pkey
  IB/hfi1: Search shared contexts on the opened device, not all devices
  IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit
  IB/hfi1: Use filedata rather than filepointer
  IB/hfi1: Name function prototype parameters
  IB/hfi1: Fix a subcontext memory leak
  IB/hfi1: Return an error on memory allocation failure
  IB/hfi1: Adjust default eager_buffer_size to 8MB
  IB/hfi1: Get rid of divide when setting the tx request header
  IB/hfi1: Fix yield logic in send engine
  IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line
  IB/hfi1: Fix checks for Offline transient state
  IB/ipoib: add get_link_ksettings in ethtool

Revert "ipv4: restore rt->fi for reference counting"

This reverts commit 82486aa6f1b9bc8145e6d0fa2bc0b44307f3b875.

As implemented, this causes dangling netdevice refs.

Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

- add framework for supporting PCIe devices in Endpoint mode (Kishon
   Vijay Abraham I)

- use non-postable PCI config space mappings when possible (Lorenzo
   Pieralisi)

- clean up and unify mmap of PCI BARs (David Woodhouse)

- export and unify Function Level Reset support (Christoph Hellwig)

- avoid FLR for Intel 82579 NICs (Sasha Neftin)

- add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)

- short-circuit config access failures for disconnected devices (Keith
   Busch)

- remove D3 sleep delay when possible (Adrian Hunter)

- freeze PME scan before suspending devices (Lukas Wunner)

- stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)

- disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)

- add arch-specific alignment control to improve device passthrough by
   avoiding multiple BARs in a page (Yongji Xie)

- add sysfs sriov_drivers_autoprobe to control VF driver binding
   (Bodong Wang)

- allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)

- fix crashes when unbinding host controllers that don't support
   removal (Brian Norris)

- add driver for MicroSemi Switchtec management interface (Logan
   Gunthorpe)

- add driver for Faraday Technology FTPCI100 host bridge (Linus
   Walleij)

- add i.MX7D support (Andrey Smirnov)

- use generic MSI support for Aardvark (Thomas Petazzoni)

- make Rockchip driver modular (Brian Norris)

- advertise 128-byte Read Completion Boundary support for Rockchip
   (Shawn Lin)

- advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)

- convert atomic_t to refcount_t in HV driver (Elena Reshetova)

- add CPU IRQ affinity in HV driver (K. Y. Srinivasan)

- fix PCI bus removal in HV driver (Long Li)

- add support for ThunderX2 DMA alias topology (Jayachandran C)

- add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)

- add ITE 8893 bridge DMA alias quirk (Jarod Wilson)

- restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
   (Manish Jaggi)

* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
  PCI: Don't allow unbinding host controllers that aren't prepared
  ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
  MAINTAINERS: Add PCI Endpoint maintainer
  Documentation: PCI: Add userguide for PCI endpoint test function
  tools: PCI: Add sample test script to invoke pcitest
  tools: PCI: Add a userspace tool to test PCI endpoint
  Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
  misc: Add host side PCI driver for PCI test function device
  PCI: Add device IDs for DRA74x and DRA72x
  dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
  PCI: dwc: dra7xx: Workaround for errata id i870
  dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
  PCI: dwc: dra7xx: Add EP mode support
  PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
  dt-bindings: PCI: Add DT bindings for PCI designware EP mode
  PCI: dwc: designware: Add EP mode support
  Documentation: PCI: Add binding documentation for pci-test endpoint function
  ixgbe: Use pcie_flr() instead of duplicating it
  IB/hfi1: Use pcie_flr() instead of duplicating it
  PCI: imx6: Fix spelling mistake: "contol" -> "control"
  ...

Merge tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial updates from Greg KH:
"Here is the "big" TTY/Serial patch updates for 4.12-rc1

  Not a lot of new things here, the normal number of serial driver
  updates and additions, tiny bugs fixed, and some core files split up
  to make future changes a bit easier for Nicolas's "tiny-tty" work.

  All of these have been in linux-next for a while"

* tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (62 commits)
  serial: small Makefile reordering
  tty: split job control support into a file of its own
  tty: move baudrate handling code to a file of its own
  console: move console_init() out of tty_io.c
  serial: 8250_early: Add earlycon support for Palmchip UART
  tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44
  vt: make mouse selection of non-ASCII consistent
  vt: set mouse selection word-chars to gpm's default
  imx-serial: Reduce RX DMA startup latency when opening for reading
  serial: omap: suspend device on probe errors
  serial: omap: fix runtime-pm handling on unbind
  tty: serial: omap: add UPF_BOOT_AUTOCONF flag for DT init
  serial: samsung: Remove useless spinlock
  serial: samsung: Add missing checks for dma_map_single failure
  serial: samsung: Use right device for DMA-mapping calls
  serial: imx: setup DCEDTE early and ensure DCD and RI irqs to be off
  tty: fix comment typo s/repsonsible/responsible/
  tty: amba-pl011: Fix spurious TX interrupts
  serial: xuartps: Enable clocks in the pm disable case also
  serial: core: Re-use struct uart_port {name} field
  ...

tracing: Use cpumask_available() to check if cpumask variable may be used

This fixes the following clang warning:

kernel/trace/trace.c:3231:12: warning: address of array 'iter->started'
will always evaluate to 'true' [-Wpointer-bool-conversion]
if (iter->started)

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthias Kaehlcke <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

- the rest of MM

- various misc things

- procfs updates

- lib/ updates

- checkpatch updates

- kdump/kexec updates

- add kvmalloc helpers, use them

- time helper updates for Y2038 issues. We're almost ready to remove
   current_fs_time() but that awaits a btrfs merge.

- add tracepoints to DAX

* emailed patches from Andrew Morton <[email protected]>: (114 commits)
  drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
  selftests/vm: add a test for virtual address range mapping
  dax: add tracepoint to dax_insert_mapping()
  dax: add tracepoint to dax_writeback_one()
  dax: add tracepoints to dax_writeback_mapping_range()
  dax: add tracepoints to dax_load_hole()
  dax: add tracepoints to dax_pfn_mkwrite()
  dax: add tracepoints to dax_iomap_pte_fault()
  mtd: nand: nandsim: convert to memalloc_noreclaim_*()
  treewide: convert PF_MEMALLOC manipulations to new helpers
  mm: introduce memalloc_noreclaim_{save,restore}
  mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
  mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
  mm/huge_memory.c: use zap_deposited_table() more
  time: delete CURRENT_TIME_SEC and CURRENT_TIME
  gfs2: replace CURRENT_TIME with current_time
  apparmorfs: replace CURRENT_TIME with current_time()
  lustre: replace CURRENT_TIME macro
  fs: ubifs: replace CURRENT_TIME_SEC with current_time
  fs: ufs: use ktime_get_real_ts64() for birthtime
  ...

drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4

  drivers/staging/ccree/ssi_hash.c:1990: error: unknown field 'template_ahash' specified in initializer
  drivers/staging/ccree/ssi_hash.c:1991: error: unknown field 'init' specified in initializer
  drivers/staging/ccree/ssi_hash.c:1991: warning: missing braces around initializer
  drivers/staging/ccree/ssi_hash.c:1991: warning: (near initialization for 'driver_hash[0].<anonymous>.template_ahash')
  drivers/staging/ccree/ssi_hash.c:1992: error: unknown field 'update' specified in initializer
  drivers/staging/ccree/ssi_hash.c:1992: warning: excess elements in union initializer
  drivers/staging/ccree/ssi_hash.c:1992: warning: (near initialization for 'driver_hash[0].<anonymous>')
  drivers/staging/ccree/ssi_hash.c:1993: error: unknown field 'final' specified in initializer
  drivers/staging/ccree/ssi_hash.c:1993: warning: excess elements in union initializer
  drivers/staging/ccree/ssi_hash.c:1993: warning: (near initialization for 'driver_hash[0].<anonymous>')
  ...

gcc-4.4.4 has issues with anon union initializers.  Work around this.
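
For illustration, the kind of initializer that produces errors like those above can be sketched as follows. The types are hypothetical and only borrow the member names from the error output; the braced form is one commonly used workaround and is an assumption here, since the message does not say which change this commit makes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, loosely modelled on the names in the error output. */
struct ahash_tmpl {
	int (*init)(void);
	int (*update)(void);
};

struct hash_def {
	const char *name;
	union {                         /* anonymous union (C11) */
		struct ahash_tmpl template_ahash;
		int placeholder;
	};
};

static int demo_init(void)   { return 1; }
static int demo_update(void) { return 2; }

/* Direct form: names the anonymous union's member with a designator.
 * This is the form that old compilers may reject as an unknown field. */
static const struct hash_def direct_form = {
	.name = "demo",
	.template_ahash = { .init = demo_init, .update = demo_update },
};

/* Braced form: the union is initialized positionally inside its own
 * braces, and the designator is applied within those braces. */
static const struct hash_def braced_form = {
	.name = "demo",
	{
		.template_ahash = { .init = demo_init, .update = demo_update },
	},
};

/* Both forms should produce the same object on a modern compiler. */
static void anon_union_demo(void)
{
	assert(direct_form.template_ahash.init == braced_form.template_ahash.init);
	assert(braced_form.template_ahash.update() == 2);
}
```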

Cc: Gilad Ben-Yossef <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftests/vm: add a test for virtual address range mapping

This verifies virtual address mapping below and above the 128TB range
and makes sure that the addresses returned are within the expected range
depending upon the hint passed from user space.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Cc: Michal Suchanek <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: add tracepoint to dax_insert_mapping()

Add a tracepoint to dax_insert_mapping(), following the same logging
conventions as the rest of DAX.  This tracepoint, along with the one in
dax_load_hole(), lets us know how a DAX PTE fault was serviced.

Here is an example DAX fault that inserts a PTE mapping:

  small-1126  [007] ....   145.451604: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220

  small-1126  [007] ....   145.452317: dax_insert_mapping: dev 259:0 ino 0x1003 shared write address 0x10420000 radix_entry 0x100006

  small-1126  [007] ....   145.452399: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: add tracepoint to dax_writeback_one()

Add a tracepoint to dax_writeback_one(), following the same logging
conventions as the rest of DAX.

Here is an example range writeback which ends up flushing one PMD and
one PTE:

  test-1265  [003] ....   496.615250: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff

  test-1265  [003] ....   496.616263: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x0 pglen 0x200

  test-1265  [003] ....   496.616270: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x305 pglen 0x1

  test-1265  [003] ....   496.616272: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff

[[email protected]: struct blk_dax_ctl has disappeared]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: add tracepoints to dax_writeback_mapping_range()

Add tracepoints to dax_writeback_mapping_range(), following the same
logging conventions as the rest of DAX.

Here is an example writeback call:

  msync-1085  [006] ....   200.902565: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff

  msync-1085  [006] ....   200.902579: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff

[[email protected]: fix regression in dax_writeback_mapping_range()]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: add tracepoints to dax_load_hole()

Add tracepoints to dax_load_hole(), following the same logging conventions
as the rest of DAX.

Here is the logging generated by a PTE read from a hole:

  read-1075  [002] ....    62.362108: dax_pte_fault: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280

  read-1075  [002] ....    62.362140: dax_load_hole: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE

  read-1075  [002] ....    62.362141: dax_pte_fault_done: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: add tracepoints to dax_pfn_mkwrite()

Add tracepoints to dax_pfn_mkwrite(), following the same logging
conventions as the rest of DAX.

Here is an example PTE fault followed by a pfn_mkwrite:

  small_aligned-1094  [002] ....   374.084998: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200

  small_aligned-1094  [002] ....   374.085145: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 MAJOR|NOPAGE

  small_aligned-1094  [002] ....   374.085165: dax_pfn_mkwrite: dev 259:0 ino 0x1003 shared WRITE|MKWRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 NOPAGE

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: add tracepoints to dax_iomap_pte_fault()

Patch series "second round of tracepoints for DAX".

This second round of DAX tracepoint patches adds tracing to the PTE
fault path (dax_iomap_pte_fault(), dax_pfn_mkwrite(), dax_load_hole(),
dax_insert_mapping()) and to the writeback path
(dax_writeback_mapping_range(), dax_writeback_one()).

The purpose of this tracing is to give us a high level view of what DAX
is doing, whether faults are being serviced by PMDs or PTEs, and by real
storage or by zero pages covering holes.

I do have some patches nearly ready which also add tracing to
grab_mapping_entry() and dax_insert_mapping_entry().  These are more
targeted at logging how we are interacting with the radix tree, how we
use empty entries for locking, whether we "downgrade" huge zero pages to
4k PTE sized allocations, etc.  In the end it seemed to me that this
might be too detailed to have as constantly present tracepoints, but if
anyone sees value in having tracepoints like this in the DAX code
permanently (Jan?), please let me know and I'll add those last two
patches.

All these tracepoints were done to be consistent with the style of the
XFS tracepoints and with the existing DAX PMD tracepoints.

This patch (of 6):

Add tracepoints to dax_iomap_pte_fault(), following the same logging
conventions as the rest of DAX.

Here is an example fault that initially tries to be serviced by the PMD
fault handler but which falls back to PTEs because the VMA isn't large
enough to hold a PMD:

  small-1086  [005] ....    71.140014: xfs_filemap_huge_fault: dev 259:0 ino 0x1003

  small-1086  [005] ....    71.140027: dax_pmd_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400

  small-1086  [005] ....    71.140028: dax_pmd_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 FALLBACK

  small-1086  [005] ....    71.140035: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220

  small-1086  [005] ....    71.140396: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mtd: nand: nandsim: convert to memalloc_noreclaim_*()

Nandsim has its own functions set_memalloc() and clear_memalloc() for robust
setting and clearing of PF_MEMALLOC. Replace them by the new generic
helpers. No functional change.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Chris Leech <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Lee Duncan <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Adrian Hunter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

treewide: convert PF_MEMALLOC manipulations to new helpers

We now have memalloc_noreclaim_{save,restore} helpers for robust setting
and clearing of PF_MEMALLOC. Let's convert the code which was using the
generic tsk_restore_flags(). No functional change.

[[email protected]: in net/core/sock.c the hunk is missing]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Lee Duncan <[email protected]>
Cc: Chris Leech <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Wouter Verhelst <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: introduce memalloc_noreclaim_{save,restore}

The previous patch ("mm: prevent potential recursive reclaim due to
clearing PF_MEMALLOC") has shown that simply setting and clearing
PF_MEMALLOC in current->flags can result in wrongly clearing a
pre-existing PF_MEMALLOC flag and potentially lead to recursive reclaim.
Let's introduce helpers that support proper nesting by saving the
previous state of the flag, similar to the existing memalloc_noio_* and
memalloc_nofs_* helpers. Convert existing setting/clearing of
PF_MEMALLOC within mm to the new helpers.

There are no known issues with the converted code, but the change makes
it more robust.
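
The nesting behaviour described above can be sketched in userspace. This is a minimal model, assuming a global task structure in place of the kernel's current and an illustrative bit value for PF_MEMALLOC; the helper names are shortened stand-ins, not the kernel's definitions:

```c
#include <assert.h>

#define PF_MEMALLOC 0x0800u          /* illustrative bit value for this sketch */

struct task { unsigned int flags; };
static struct task current_task;     /* stands in for the kernel's "current" */

/* Set the flag and return whether it was already set. */
static unsigned int noreclaim_save(void)
{
	unsigned int old = current_task.flags & PF_MEMALLOC;

	current_task.flags |= PF_MEMALLOC;
	return old;
}

/* Put the flag back to the state recorded by the matching save. */
static void noreclaim_restore(unsigned int old)
{
	current_task.flags = (current_task.flags & ~PF_MEMALLOC) | old;
}

/* Nested sections: the inner restore must not clear the outer flag. */
static void noreclaim_nesting_demo(void)
{
	unsigned int outer, inner;

	current_task.flags = 0;
	outer = noreclaim_save();
	inner = noreclaim_save();
	noreclaim_restore(inner);
	assert(current_task.flags & PF_MEMALLOC);    /* still set for the outer section */
	noreclaim_restore(outer);
	assert(!(current_task.flags & PF_MEMALLOC)); /* cleared only at the outermost level */
}
```

Because each restore reinstates exactly what its own save observed, sections can nest to any depth without a counter.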

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Chris Leech <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Lee Duncan <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Richard Weinberger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC

Patch series "more robust PF_MEMALLOC handling"

This series aims to unify the setting and clearing of PF_MEMALLOC, which
prevents recursive reclaim.  There are some places that clear the flag
unconditionally from current->flags, which may result in clearing a
pre-existing flag.  This already resulted in a bug report that Patch 1
fixes (without the new helpers, to make backporting easier).  Patch 2
introduces the new helpers, modelled after existing memalloc_noio_* and
memalloc_nofs_* helpers, and converts mm core to use them.  Patches 3
and 4 convert non-mm code.

This patch (of 4):

__alloc_pages_direct_compact() sets PF_MEMALLOC to prevent deadlock
during page migration by lock_page() (see the comment in
__unmap_and_move()).  Then it unconditionally clears the flag, which can
clear a pre-existing PF_MEMALLOC flag and result in recursive reclaim.
This was not a problem until commit a8161d1ed609 ("mm, page_alloc:
restructure direct compaction handling in slowpath"), because direct
compaction was called only after direct reclaim, which was skipped when
PF_MEMALLOC flag was set.

Even now it's only a theoretical issue, as the new callsite of
__alloc_pages_direct_compact() is reached only for costly orders and
when gfp_pfmemalloc_allowed() is true, which means either
__GFP_NOMEMALLOC is in gfp_flags or in_interrupt() is true.  There is no
such known context, but let's play it safe and make
__alloc_pages_direct_compact() robust for cases where PF_MEMALLOC is
already set.
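
The difference between clearing the flag unconditionally and preserving a pre-existing flag can be shown with a small userspace sketch. The structure, the bit value, and both function names are hypothetical and only model the pattern; they are not the code of __alloc_pages_direct_compact():

```c
#include <assert.h>

#define PF_MEMALLOC 0x0800u   /* illustrative bit value for this sketch */

struct task { unsigned int flags; };

/* Pattern described as fragile: set, do the work, clear unconditionally. */
static void compact_unconditional_clear(struct task *t)
{
	t->flags |= PF_MEMALLOC;
	/* ... work that must not recurse into reclaim ... */
	t->flags &= ~PF_MEMALLOC;
}

/* Robust pattern: remember whether the flag was already set and put
 * that state back afterwards. */
static void compact_preserving(struct task *t)
{
	unsigned int was_set = t->flags & PF_MEMALLOC;

	t->flags |= PF_MEMALLOC;
	/* ... work that must not recurse into reclaim ... */
	t->flags = (t->flags & ~PF_MEMALLOC) | was_set;
}

static void pf_memalloc_clear_demo(void)
{
	struct task t = { PF_MEMALLOC };   /* caller already holds the flag */

	compact_unconditional_clear(&t);
	assert(!(t.flags & PF_MEMALLOC));  /* the caller's flag was lost */

	t.flags = PF_MEMALLOC;
	compact_preserving(&t);
	assert(t.flags & PF_MEMALLOC);     /* the caller's flag survives */
}
```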

Fixes: a8161d1ed609 ("mm, page_alloc: restructure direct compaction handling in slowpath")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Andrey Ryabinin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Chris Leech <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Lee Duncan <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required

Although all architectures use a deposited page table for THP on
anonymous VMAs, some architectures (s390 and powerpc) require the
deposited storage even for file backed VMAs due to quirks of their MMUs.

This patch adds support for depositing a table in the DAX PMD fault
handling path for architectures that require it. Other architectures
should see no functional changes.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Oliver O'Halloran <[email protected]>
Cc: Reza Arbab <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: [email protected]
Cc: Oliver O'Halloran <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/huge_memory.c: use zap_deposited_table() more

Depending on the flags of the PMD being zapped there may or may not be a
deposited pgtable to be freed. In two of the three cases this is open
coded while the third uses the zap_deposited_table() helper. This patch
converts the others to use the helper to clean things up a bit.

Link: http://lkml.kernel.org/r/[email protected]
Cc: Reza Arbab <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: [email protected]
Cc: Oliver O'Halloran <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

time: delete CURRENT_TIME_SEC and CURRENT_TIME

All uses of the CURRENT_TIME_SEC and CURRENT_TIME macros have been
replaced by other time functions. These macros are also not y2038 safe,
and all their use cases can be fulfilled by y2038 safe ktime_get_* variants.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Acked-by: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

gfs2: replace CURRENT_TIME with current_time

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

apparmorfs: replace CURRENT_TIME with current_time()

CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time().

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Acked-by: John Johansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lustre: replace CURRENT_TIME macro

CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time() for
filesystem times, and ktime_get_* functions for others.

struct timespec is also not y2038 safe. Retain timespec for timestamp
representation here as lustre uses it internally everywhere. These
references will be changed to use struct timespec64 in a separate patch.

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Cc: Oleg Drokin <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: James Simmons <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: ubifs: replace CURRENT_TIME_SEC with current_time

CURRENT_TIME_SEC is not y2038 safe.  current_time() will be transitioned
to use 64 bit time along with vfs in a separate patch.  There is no plan
to transition CURRENT_TIME_SEC to use y2038 safe time interfaces.

current_time() returns timestamps according to the granularities set in
the inode's super_block.  The granularity check to call
current_fs_time() or CURRENT_TIME_SEC is not required.

Use current_time() directly to update inode timestamp.  Use
timespec_trunc during file system creation, before the first inode is
created.
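
Truncating a timestamp to a filesystem's granularity can be sketched as below. This is a simplified userspace model of the idea behind timespec_trunc(), not the kernel function itself; the function name and the handling of out-of-range granularities are assumptions made for this example:

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Round the nanosecond part of t down to a multiple of gran (in ns).
 * A granularity of one second or more drops the nanoseconds entirely;
 * a granularity of 1 ns (or less) leaves the timestamp untouched. */
static struct timespec trunc_to_granularity(struct timespec t, long gran)
{
	if (gran >= NSEC_PER_SEC)
		t.tv_nsec = 0;
	else if (gran > 1)
		t.tv_nsec -= t.tv_nsec % gran;
	return t;
}

static void trunc_demo(void)
{
	struct timespec t = { .tv_sec = 100, .tv_nsec = 123456789 };

	assert(trunc_to_granularity(t, 1).tv_nsec == 123456789);
	assert(trunc_to_granularity(t, 1000).tv_nsec == 123456000);
	assert(trunc_to_granularity(t, NSEC_PER_SEC).tv_nsec == 0);
	assert(trunc_to_granularity(t, NSEC_PER_SEC).tv_sec == 100);
}
```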

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Artem Bityutskiy <[email protected]>
Cc: Adrian Hunter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: ufs: use ktime_get_real_ts64() for birthtime

CURRENT_TIME is not y2038 safe. Replace it with ktime_get_real_ts64().
Inode time formats are already 64 bits long and accommodate time64_t.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Cc: Evgeniy Dushistov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: ceph: replace CURRENT_TIME with ktime_get_real_ts()

CURRENT_TIME is not y2038 safe.  The macro will be deleted and all the
references to it will be replaced by ktime_get_* apis.

struct timespec is also not y2038 safe.  Retain timespec for timestamp
representation here as ceph uses it internally everywhere.  These
references will be changed to use struct timespec64 in a separate patch.

The current_fs_time() api is being changed to use vfs struct inode* as
an argument instead of struct super_block*.

Set the new mds client request r_stamp field using ktime_get_real_ts()
instead of using current_fs_time().

Also, since r_stamp is used as mtime on the server, use timespec_trunc()
to truncate the timestamp, using the right granularity from the
superblock.

This api will be transitioned to be y2038 safe along with vfs.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
M: Ilya Dryomov <[email protected]>
M: "Yan, Zheng" <[email protected]>
M: Sage Weil <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: cifs: replace CURRENT_TIME by other appropriate apis

CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time() for
filesystem times, and ktime_get_* functions for authentication
timestamps and timezone calculations.

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

The inode timestamps read from the server are assumed to have correct
granularity and range.

The patch also assumes that the difference between server and client
times lies in the range INT_MIN..INT_MAX. This is valid because this is
the difference between the current times of the server and the client,
and the largest timezone difference is in the range of one day.

All cifs timestamps currently use timespec representation internally.
Authentication and timezone timestamps can also be transitioned to
timespec64 when all other cifs timestamps are transitioned to
timespec64.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Cc: Steve French <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

trace: make trace_hwlat timestamp y2038 safe

struct timespec is not y2038 safe on 32 bit machines and needs to be
replaced by struct timespec64 in order to represent times beyond year
2038 on such machines.

Fix all the timestamp representation in struct trace_hwlat and all the
corresponding implementations.
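
The limit in question can be made concrete with a short sketch. A signed 32-bit seconds counter tops out at 2147483647, which corresponds to 2038-01-19 03:14:07 UTC; one second later no longer fits. The struct and function names below are hypothetical and only model the idea of a 64-bit seconds field:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit timestamp, modelling the idea of a wider seconds field. */
struct ts64 {
	int64_t tv_sec;
	long    tv_nsec;
};

/* True if a seconds value can be stored in a signed 32-bit counter. */
static bool fits_in_32bit_seconds(int64_t secs)
{
	return secs >= INT32_MIN && secs <= INT32_MAX;
}

static void y2038_demo(void)
{
	struct ts64 last = { .tv_sec = INT32_MAX, .tv_nsec = 0 };
	struct ts64 next = { .tv_sec = (int64_t)INT32_MAX + 1, .tv_nsec = 0 };

	assert(fits_in_32bit_seconds(last.tv_sec));   /* last representable second */
	assert(!fits_in_32bit_seconds(next.tv_sec));  /* one second later does not fit */
	assert(next.tv_sec == 2147483648LL);          /* the 64-bit field holds it */
}
```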

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: f2fs: use ktime_get_real_seconds for sit_info times

CURRENT_TIME_SEC is not y2038 safe.

Replace use of CURRENT_TIME_SEC with ktime_get_real_seconds in segment
timestamps used by GC algorithm including the segment mtime timestamps.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Cc: Jaegeuk Kim <[email protected]>
Cc: Chao Yu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

format-security: move static strings to const

While examining output from trial builds with -Wformat-security enabled,
many strings were found that should be defined as "const", or as a char
array instead of char pointer. This makes some static analysis easier,
by producing fewer false positives.

As these are all trivial changes, it seemed best to put them all in a
single patch rather than chopping them up per maintainer.
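
As an illustration of the kind of change described, a format string can be held in a const char array rather than behind a writable char pointer. This is a minimal sketch with hypothetical names; it is not taken from any of the files touched by the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A const char array: the contents are fixed at compile time and there
 * is no separate pointer variable that could be redirected at run time,
 * which gives static analysis less to worry about. */
static const char greeting_fmt[] = "hello, %s";

/* Format a greeting into buf; returns what snprintf() returns. */
static int format_greeting(char *buf, size_t len, const char *who)
{
	return snprintf(buf, len, greeting_fmt, who);
}

static void const_string_demo(void)
{
	char buf[32];
	int n = format_greeting(buf, sizeof(buf), "world");

	assert(n == 12);
	assert(strcmp(buf, "hello, world") == 0);
}
```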

Link: http://lkml.kernel.org/r/20170405214711.GA5711@beast
Signed-off-by: Kees Cook <[email protected]>
Acked-by: Jes Sorensen <[email protected]> [runner.c]
Cc: Tony Lindgren <[email protected]>
Cc: Russell King <[email protected]>
Cc: "Maciej W. Rozycki" <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Sean Paul <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Yisen Zhuang <[email protected]>
Cc: Salil Mehta <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Patrice Chotard <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Matt Redfearn <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Mugunthan V N <[email protected]>
Cc: Felipe Balbi <[email protected]>
Cc: Jarod Wilson <[email protected]>
Cc: Florian Westphal <[email protected]>
Cc: Antonio Quartulli <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Kejian Yan <[email protected]>
Cc: Daode Huang <[email protected]>
Cc: Qianqian Xie <[email protected]>
Cc: Philippe Reynes <[email protected]>
Cc: Colin Ian King <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Christian Gromm <[email protected]>
Cc: Andrey Shvetsov <[email protected]>
Cc: Jason Litzinger <[email protected]>
Cc: WANG Cong <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Documentation/vm/transhuge.txt: fix trivial typos

[[email protected]: fixes per Randy]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: remove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag

Commit afddba49d18f ("fs: introduce write_begin, write_end, and
perform_write aops") introduced AOP_FLAG_UNINTERRUPTIBLE flag which was
checked in pagecache_write_begin(), but that check was removed by
4e02ed4b4a2f ("fs: remove prepare_write/commit_write").

Between these two commits, commit d9414774dc0c ("cifs: Convert cifs to
new aops.") added a check in cifs_write_begin(), but that check was soon
removed by commit a98ee8c1c707 ("[CIFS] fix regression in
cifs_write_begin/cifs_write_end").

Therefore, AOP_FLAG_UNINTERRUPTIBLE flag is checked nowhere. Let's
remove this flag. This patch has no functionality changes.

Link: http://lkml.kernel.org/r/1489294781-53494-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

include/linux/uaccess.h: remove expensive WARN_ON in pagefault_disabled_dec

pagefault_disabled_dec is frequently used inline, and it has a WARN_ON
for underflow that expands to about 6.5k of extra code.  The warning
does not seem useful enough to justify that much code, so remove it.

If it is needed, it could be made to depend on some debug kernel option.

Saves ~6.5k in my kernel

     text    data      bss      dec     hex filename
  9039417 5367568 11116544 25523529 1857549 vmlinux-before-pf
  9032805 5367568 11116544 25516917 1855b75 vmlinux-pf

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/scsi/megaraid: remove expensive inline from megasas_return_cmd

Remove an inline from a fairly big function that is used often.  It's
unlikely that inlining or not inlining it makes a lot of difference.

Saves around 8k text in my kernel.

     text    data      bss      dec     hex filename
  9047801 5367568 11116544 25531913 1859609 vmlinux-before-megasas
  9039417 5367568 11116544 25523529 1857549 vmlinux-megasas

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andi Kleen <[email protected]>
Cc: Kashyap Desai <[email protected]>
Cc: Sumit Saxena <[email protected]>
Cc: James Bottomley <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kref: remove WARN_ON for NULL release functions

The kref functions check for NULL release functions.  This WARN_ON seems
rather pointless.  We will eventually release and then just crash
nicely.  It is also somewhat expensive because these functions are
inlined in a lot of places.  Removing the WARN_ONs saves around 2.3k in
this kernel (likely more in others with more drivers)

     text    data      bss      dec     hex filename
  9083992 5367600 11116544 25568136 1862388 vmlinux-before-load-avg
  9070166 5367600 11116544 25554310 185ed86 vmlinux-load-avg

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Andi Kleen <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

treewide: decouple cacheflush.h and set_memory.h

Now that all call sites have been converted, completely decouple
cacheflush.h and set_memory.h.

[[email protected]: kprobes/x86: merge fix for set_memory.h decoupling]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/staging/media/atomisp/pci/atomisp2: use set_memory.h

Cc: Laura Abbott <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/video/fbdev/vermilion/vermilion.c: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Bartlomiej Zolnierkiewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/misc/sram-exec.c: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

alsa: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Takashi Iwai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/power/snapshot.c: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/module.c: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Jessica Yu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

include/linux/filter.h: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/watchdog/hpwdt.c: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/hwtracing/intel_th/msu.c: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Alexander Shishkin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drm: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

[[email protected]: track drivers/gpu/drm/i915/i915_gem_gtt.c linux-next changes]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

agp: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

s390: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Heiko Carstens <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arm64: use set_memory.h header

The set_memory_* functions have moved to set_memory.h. Use that header
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arm: use set_memory.h header

set_memory_* functions have moved to set_memory.h. Switch to this
explicitly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Russell King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

treewide: move set_memory_* functions away from cacheflush.h

Patch series "set_memory_* functions header refactor", v3.

The set_memory_* APIs came out of a desire to have a better way to
change memory attributes.  Many of these attributes were linked to cache
functionality so the prototypes were put in cacheflush.h.  These days,
the APIs have grown and have a much wider use than just cache APIs.  To
support this growth, split off set_memory_* and friends into a separate
header file to avoid growing cacheflush.h for APIs that have nothing to
do with caches.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Laura Abbott <[email protected]>
Acked-by: Russell King <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

treewide: spelling: correct diffrent[iate] and banlance typos

Add these misspellings to scripts/spelling.txt too.

Link: http://lkml.kernel.org/r/962aace119675e5fe87be2a88ddac1a5486f8e60.1490931810.git.joe@perches.com
Signed-off-by: Joe Perches <[email protected]>
Acked-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

scripts/spelling.txt: add "intialise(d)" pattern and fix typo instances

Fix typos and add the following to the scripts/spelling.txt:

  intialisation||initialisation
  intialised||initialised
  intialise||initialise

This commit does not intend to change the British spelling itself.
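
The entries listed above use a "misspelling||correction" layout.  A
minimal sketch of splitting one such entry is shown below; the function
name sk_split_spelling is hypothetical and this is not how checkpatch
itself reads the file, only an illustration of the entry format.

```c
#include <string.h>

/*
 * Split a "wrong||right" entry in place.
 * On success, *wrong and *right point into the caller's buffer and 0 is
 * returned.  Returns -1 if the line has no "||" separator.
 */
static int sk_split_spelling(char *line, char **wrong, char **right)
{
	char *sep = strstr(line, "||");

	if (!sep)
		return -1;

	*sep = '\0';        /* terminate the misspelling */
	*wrong = line;
	*right = sep + 2;   /* skip over the two-character separator */
	return 0;
}
```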

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

scripts/spelling.txt: add regsiter -> register spelling mistake

This typo is quite common. Fix it and add it to the spelling file so
that checkpatch catches it earlier.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Stephen Boyd <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

scripts/spelling.txt: add "memory" pattern and fix typos

Fix typos and add the following to the scripts/spelling.txt:

  momery||memory

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Stephen Boyd <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, vmalloc: use __GFP_HIGHMEM implicitly

__vmalloc* allows users to provide gfp flags for the underlying
allocation.  This API is quite popular:

  $ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
  77

The only problem is that many people are not aware that they really want
to pass __GFP_HIGHMEM along with the other flags, because there is no
reason to consume precious low memory on CONFIG_HIGHMEM systems for pages
which are mapped into the kernel vmalloc space.  About half of the users
don't use this flag, though.  This suggests that the API is unnecessarily
complex.

This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
be mapped to the vmalloc space.  Current users which add __GFP_HIGHMEM
are simplified and drop the flag.
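
The idea can be sketched in userspace as follows.  This is a mock under
stated assumptions rather than the kernel implementation: the SK_* flag
values and sk_* functions are hypothetical, and malloc() stands in for
the page allocator.  The point is only that the vmalloc-style entry
point adds the highmem flag itself, so callers pass their other flags
and nothing more.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag bits for illustration; not the kernel's real values. */
#define SK_GFP_KERNEL   0x01u
#define SK_GFP_HIGHMEM  0x02u

/* Records the flags the mock page allocator last saw, for inspection. */
static unsigned int sk_last_page_flags;

/* Stand-in for the page allocation done on behalf of vmalloc. */
static void *sk_alloc_pages(size_t size, unsigned int flags)
{
	sk_last_page_flags = flags;
	return malloc(size);
}

/*
 * The vmalloc-style entry point ORs in the highmem flag implicitly,
 * so a caller that forgets it still gets highmem-eligible pages.
 */
static void *sk_vmalloc(size_t size, unsigned int gfp_mask)
{
	return sk_alloc_pages(size, gfp_mask | SK_GFP_HIGHMEM);
}
```

A caller would then write sk_vmalloc(size, SK_GFP_KERNEL) where it
previously had to spell out SK_GFP_KERNEL | SK_GFP_HIGHMEM.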

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reviewed-by: Matthew Wilcox <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, swap: use kvzalloc to allocate some swap data structures

Currently vzalloc() is used in swap code to allocate various data
structures, such as the swap cache, swap slots cache, cluster info, etc.,
because the size may be too large on some systems for a normal kzalloc()
to succeed.  But using kzalloc() has some advantages, for example, less
memory fragmentation and less TLB pressure.  So change the data structure
allocation in swap code to use kvzalloc(), which tries kzalloc() first
and falls back to vzalloc() if kzalloc() fails.
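
The try-then-fall-back behaviour described above can be sketched in
userspace like this.  It is a mock, not the kernel helper: the sk_*
names and the SK_CONTIG_LIMIT threshold are hypothetical, and calloc()
stands in for both allocators.  The "contiguous" allocator is made to
fail for large sizes so that the fallback path is exercised.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limit above which the mock contiguous allocator fails. */
#define SK_CONTIG_LIMIT 4096u

struct sk_buf {
	void *ptr;
	bool  from_fallback;   /* true if the vzalloc-like path was used */
};

/* Mock of a physically contiguous, zeroing allocator (kzalloc-like). */
static void *sk_kzalloc(size_t size)
{
	if (size > SK_CONTIG_LIMIT)
		return NULL;   /* pretend no contiguous range is available */
	return calloc(1, size);
}

/* Mock of a virtually contiguous, zeroing allocator (vzalloc-like). */
static void *sk_vzalloc(size_t size)
{
	return calloc(1, size);
}

/* Try the contiguous allocator first, then fall back. */
static struct sk_buf sk_kvzalloc(size_t size)
{
	struct sk_buf b = { sk_kzalloc(size), false };

	if (!b.ptr) {
		b.ptr = sk_vzalloc(size);
		b.from_fallback = true;
	}
	return b;
}
```

The from_fallback tag exists only so the sketch can show which path was
taken.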

In general, although kmalloc() will reduce the number of high-order
pages in the short term, vmalloc() will cause more pain for memory
fragmentation in the long term.  And the swap data structures whose
allocation is changed in this patch are expected to be long-term
allocations.

From Dave Hansen:
"for example, we have a two-page data structure. vmalloc() takes two
  effectively random order-0 pages, probably from two different 2M pages
  and pins them. That "kills" two 2M pages. kmalloc(), allocating two
  *contiguous* pages, will not cross a 2M boundary. That means it will
  only "kill" the possibility of a single 2M page. More 2M pages == less
  fragmentation."

The allocation in this patch occurs at swapon time, which usually
happens during system boot, so there is usually a good chance of
allocating the contiguous pages successfully.

The allocation for swap_map[] in struct swap_info_struct is not changed,
because that is usually quite large and vmalloc_to_page() is used for
it.  That makes it a little harder to change.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Huang Ying <[email protected]>
Acked-by: Tim Chen <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/md/bcache/super.c: use kvmalloc

bcache_device_init uses kmalloc for small requests and vmalloc for those
which are larger than 64 pages.  This alone is a strange criterion.
Moreover kmalloc can fall back to vmalloc on failure.  Let's simply
use kvmalloc instead, as it knows how to handle the fallback properly.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/md/dm-ioctl.c: use kvmalloc rather than opencoded variant

copy_params uses kmalloc with a vmalloc fallback.  We already have a
helper for that - kvmalloc.  This caller requires GFP_NOIO semantics, so
it wasn't converted along with many others by previous patches.  All we
need to achieve these semantics is to use the scope API
memalloc_noio_{save,restore} around kvmalloc.
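
The scoped approach can be sketched in userspace as follows.  This is a
mock under stated assumptions, not the kernel code: the SK_* bits and
sk_* functions are hypothetical, a plain global stands in for the
per-task flags, and malloc() stands in for the allocator.  While the
scope is active, the allocator strips the I/O-related bits from whatever
flags the caller passed, so the caller can keep using its usual flags.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bits for illustration; not the kernel's real values. */
#define SK_GFP_IO      0x1u
#define SK_GFP_FS      0x2u
#define SK_GFP_KERNEL  (SK_GFP_IO | SK_GFP_FS)
#define SK_TASK_NOIO   0x1u

static unsigned int sk_task_flags;     /* stands in for per-task flags */
static unsigned int sk_effective_gfp;  /* flags the mock allocator last used */

/* Enter the no-I/O scope; returns the previous state for restore. */
static unsigned int sk_noio_save(void)
{
	unsigned int old = sk_task_flags & SK_TASK_NOIO;

	sk_task_flags |= SK_TASK_NOIO;
	return old;
}

/* Leave the scope, putting back whatever state was saved. */
static void sk_noio_restore(unsigned int old)
{
	sk_task_flags = (sk_task_flags & ~SK_TASK_NOIO) | old;
}

/* Mock allocator: inside the scope, I/O and FS bits are masked out. */
static void *sk_kvmalloc(size_t size, unsigned int gfp)
{
	if (sk_task_flags & SK_TASK_NOIO)
		gfp &= ~(SK_GFP_IO | SK_GFP_FS);
	sk_effective_gfp = gfp;
	return malloc(size);
}

/* Caller in the style described above: wrap the allocation in the scope. */
static void *sk_copy_params_alloc(size_t size)
{
	unsigned int noio = sk_noio_save();
	void *p = sk_kvmalloc(size, SK_GFP_KERNEL);

	sk_noio_restore(noio);
	return p;
}
```

Saving and restoring the previous state, rather than unconditionally
clearing it, lets such scopes nest.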

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Mikulas Patocka <[email protected]>
Cc: Mike Snitzer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

net: use kvmalloc with __GFP_REPEAT rather than open coded variant

fq_alloc_node, alloc_netdev_mqs and netif_alloc* open code kmalloc with
vmalloc fallback.  Use the kvmalloc variant instead.  Keep the
__GFP_REPEAT flag based on explanation from Eric:

"At the time, tests on the hardware I had in my labs showed that
  vmalloc() could deliver pages spread all over the memory and that was
  a small penalty (once memory is fragmented enough, not at boot time)"

The way the code is constructed means, however, that we prefer to go
and hit the OOM killer before we fall back to vmalloc for requests
<=32kB (with 4kB pages) in the current code.  This is rather disruptive
for something that can be achieved with the fallback.  On the other hand
__GFP_REPEAT doesn't have any useful semantics for these requests.  So
the effect of this patch is that requests which fit into 32kB will fall
back to vmalloc more easily now.
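
The 32kB figure can be reproduced with a little arithmetic, assuming the
threshold in question is an order-3 allocation (8 contiguous pages) with
4kB pages.  Both constants below are assumptions for the sketch, and the
sk_* names are hypothetical.

```c
#include <stddef.h>

/* Assumed values: 4kB pages and an order-3 threshold. */
#define SK_PAGE_SIZE     4096u
#define SK_COSTLY_ORDER  3u

/* Largest request, in bytes, at or below the assumed threshold. */
static size_t sk_costly_limit(void)
{
	return (size_t)SK_PAGE_SIZE << SK_COSTLY_ORDER;
}

/* Nonzero if a request of this size fits within the threshold. */
static int sk_fits_noncostly(size_t size)
{
	return size <= sk_costly_limit();
}
```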

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: David Miller <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>