Git Repo - linux.git/log

arm64: atomic_lse: match asm register sizes

The LSE atomic code uses asm register variables to ensure that
parameters are allocated in specific registers. In the majority of cases
we specifically ask for an x register when using 64-bit values, but in a
couple of cases we use a w regsiter for a 64-bit value.

For asm register variables, the compiler only cares about the register
index, with wN and xN having the same meaning. The compiler determines
the register size to use based on the type of the variable. Thus, this
inconsistency is merely confusing, and not harmful to code generation.

For consistency, this patch updates those cases to use the x register
alias. There should be no functional change as a result of this patch.

Acked-by: Will Deacon <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

arm64: armv8_deprecated: ensure extension of addr

Our compat swp emulation holds the compat user address in an unsigned
int, which it passes to __user_swpX_asm(). When a 32-bit value is passed
in a register, the upper 32 bits of the register are unknown, and we
must extend the value to 64 bits before we can use it as a base address.

This patch casts the address to unsigned long to ensure it has been
suitably extended, avoiding the potential issue, and silencing a related
warning from clang.

Fixes: bd35a4adc413 ("arm64: Port SWP/SWPB emulation support from arm")
Cc: <[email protected]> # 3.19.x-
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

arm64: uaccess: ensure extension of access_ok() addr

Our access_ok() simply hands its arguments over to __range_ok(), which
implicitly assummes that the addr parameter is 64 bits wide. This isn't
necessarily true for compat code, which might pass down a 32-bit address
parameter.

In these cases, we don't have a guarantee that the address has been zero
extended to 64 bits, and the upper bits of the register may contain
unknown values, potentially resulting in a suprious failure.

Avoid this by explicitly casting the addr parameter to an unsigned long
(as is done on other architectures), ensuring that the parameter is
widened appropriately.

Fixes: 0aea86a2176c ("arm64: User access library functions")
Cc: <[email protected]> # 3.7.x-
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

arm64: ensure extension of smp_store_release value

When an inline assembly operand's type is narrower than the register it
is allocated to, the least significant bits of the register (up to the
operand type's width) are valid, and any other bits are permitted to
contain any arbitrary value. This aligns with the AAPCS64 parameter
passing rules.

Our __smp_store_release() implementation does not account for this, and
implicitly assumes that operands have been zero-extended to the width of
the type being stored to. Thus, we may store unknown values to memory
when the value type is narrower than the pointer type (e.g. when storing
a char to a long).

This patch fixes the issue by casting the value operand to the same
width as the pointer operand in all cases, which ensures that the value
is zero-extended as we expect. We use the same union trickery as
__smp_load_acquire and {READ,WRITE}_ONCE() to avoid GCC complaining that
pointers are potentially cast to narrower width integers in unreachable
paths.

A whitespace issue at the top of __smp_store_release() is also
corrected.

No changes are necessary for __smp_load_acquire(). Load instructions
implicitly clear any upper bits of the register, and the compiler will
only consider the least significant bits of the register as valid
regardless.

Fixes: 47933ad41a86 ("arch: Introduce smp_load_acquire(), smp_store_release()")
Fixes: 878a84d5a8a1 ("arm64: add missing data types in smp_load_acquire/smp_store_release")
Cc: <[email protected]> # 3.14.x-
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

arm64: xchg: hazard against entire exchange variable

The inline assembly in __XCHG_CASE() uses a +Q constraint to hazard
against other accesses to the memory location being exchanged. However,
the pointer passed to the constraint is a u8 pointer, and thus the
hazard only applies to the first byte of the location.

GCC can take advantage of this, assuming that other portions of the
location are unchanged, as demonstrated with the following test case:

union u {
unsigned long l;
unsigned int i[2];
};

unsigned long update_char_hazard(union u *u)
{
unsigned int a, b;

a = u->i[1];
asm ("str %1, %0" : "+Q" (*(char *)&u->l) : "r" (0UL));
b = u->i[1];

return a ^ b;
}

unsigned long update_long_hazard(union u *u)
{
unsigned int a, b;

a = u->i[1];
asm ("str %1, %0" : "+Q" (*(long *)&u->l) : "r" (0UL));
b = u->i[1];

return a ^ b;
}

The linaro 15.08 GCC 5.1.1 toolchain compiles the above as follows when
using -O2 or above:

0000000000000000 <update_char_hazard>:
   0: d2800001 mov x1, #0x0                    // #0
   4: f9000001 str x1, [x0]
   8: d2800000 mov x0, #0x0                    // #0
   c: d65f03c0 ret

0000000000000010 <update_long_hazard>:
  10: b9400401 ldr w1, [x0,#4]
  14: d2800002 mov x2, #0x0                    // #0
  18: f9000002 str x2, [x0]
  1c: b9400400 ldr w0, [x0,#4]
  20: 4a000020 eor w0, w1, w0
  24: d65f03c0 ret

This patch fixes the issue by passing an unsigned long pointer into the
+Q constraint, as we do for our cmpxchg code. This may hazard against
more than is necessary, but this is better than missing a necessary
hazard.

Fixes: 305d454aaa29 ("arm64: atomics: implement native {relaxed, acquire, release} atomics")
Cc: <[email protected]> # 4.4.x-
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

arm64: documentation: document tagged pointer stack constraints

Some kernel features don't currently work if a task puts a non-zero
address tag in its stack pointer, frame pointer, or frame record entries
(FP, LR).

For example, with a tagged stack pointer, the kernel can't deliver
signals to the process, and the task is killed instead. As another
example, with a tagged frame pointer or frame records, perf fails to
generate call graphs or resolve symbols.

For now, just document these limitations, instead of finding and fixing
everything that doesn't work, as it's not known if anyone needs to use
tags in these places anyway.

In addition, as requested by Dave Martin, generalize the limitations
into a general kernel address tag policy, and refactor
tagged-pointers.txt to include it.

Fixes: d50240a5f6ce ("arm64: mm: permit use of tagged pointers at EL0")
Cc: <[email protected]> # 3.12.x-
Reviewed-by: Dave Martin <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Kristina Martsenko <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

arm64: entry: improve data abort handling of tagged pointers

When handling a data abort from EL0, we currently zero the top byte of
the faulting address, as we assume the address is a TTBR0 address, which
may contain a non-zero address tag. However, the address may be a TTBR1
address, in which case we should not zero the top byte. This patch fixes
that. The effect is that the full TTBR1 address is passed to the task's
signal handler (or printed out in the kernel log).

When handling a data abort from EL1, we leave the faulting address
intact, as we assume it's either a TTBR1 address or a TTBR0 address with
tag 0x00. This is true as far as I'm aware, we don't seem to access a
tagged TTBR0 address anywhere in the kernel. Regardless, it's easy to
forget about address tags, and code added in the future may not always
remember to remove tags from addresses before accessing them. So add tag
handling to the EL1 data abort handler as well. This also makes it
consistent with the EL0 data abort handler.

Fixes: d50240a5f6ce ("arm64: mm: permit use of tagged pointers at EL0")
Cc: <[email protected]> # 3.12.x-
Reviewed-by: Dave Martin <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Kristina Martsenko <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

arm64: hw_breakpoint: fix watchpoint matching for tagged pointers

When we take a watchpoint exception, the address that triggered the
watchpoint is found in FAR_EL1. We compare it to the address of each
configured watchpoint to see which one was hit.

The configured watchpoint addresses are untagged, while the address in
FAR_EL1 will have an address tag if the data access was done using a
tagged address. The tag needs to be removed to compare the address to
the watchpoints.

Currently we don't remove it, and as a result can report the wrong
watchpoint as being hit (specifically, always either the highest TTBR0
watchpoint or lowest TTBR1 watchpoint). This patch removes the tag.

Fixes: d50240a5f6ce ("arm64: mm: permit use of tagged pointers at EL0")
Cc: <[email protected]> # 3.12.x-
Acked-by: Mark Rutland <[email protected]>
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Kristina Martsenko <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

arm64: traps: fix userspace cache maintenance emulation on a tagged pointer

When we emulate userspace cache maintenance in the kernel, we can
currently send the task a SIGSEGV even though the maintenance was done
on a valid address. This happens if the address has a non-zero address
tag, and happens to not be mapped in.

When we get the address from a user register, we don't currently remove
the address tag before performing cache maintenance on it. If the
maintenance faults, we end up in either __do_page_fault, where find_vma
can't find the VMA if the address has a tag, or in do_translation_fault,
where the tagged address will appear to be above TASK_SIZE. In both
cases, the address is not mapped in, and the task is sent a SIGSEGV.

This patch removes the tag from the address before using it. With this
patch, the fault is handled correctly, the address gets mapped in, and
the cache maintenance succeeds.

As a second bug, if cache maintenance (correctly) fails on an invalid
tagged address, the address gets passed into arm64_notify_segfault,
where find_vma fails to find the VMA due to the tag, and the wrong
si_code may be sent as part of the siginfo_t of the segfault. With this
patch, the correct si_code is sent.

Fixes: 7dd01aef0557 ("arm64: trap userspace "dc cvau" cache operation on errata-affected core")
Cc: <[email protected]> # 4.8.x-
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Kristina Martsenko <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

Merge tag 'armsoc-fixes-nc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull misc ARM SoC fixes from Olof Johansson:
"ARM SoC non-urgent fixes for merge window

  Smaller patches that didn't seem to find a home in other branches, and
  low-priority fixes from late in the merge window. A number of these
  are MAINTAINER updates, it seems.

  Highlights:

   * Maintainers:
     - Remove Alexandre Courbot and Stephen Warren from Tegra
       maintainership, add Jon Hunter
     - Remove Stephen Warren and add Stefan Wahren to bcm2835
     - Tweaks for file flagging for Marvell Dove

   * Fixes:
     - For two non-common-clk platform, handle clk_disable with NULL arg
     - Remove redundant Kconfig select for Oxnas"

* tag 'armsoc-fixes-nc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: mmp: let clk_disable() return immediately if clk is NULL
  ARM: w90x900: let clk_disable() return immediately if clk is NULL
  MAINTAINERS: Add file patterns for dove device tree bindings
  ARM: oxnas: remove redundant select CPU_V6K
  MAINTAINERS: tegra: Remove self as maintainer
  MAINTAINERS: tegra: Replace Stephen with Jon
  MAINTAINERS: Add Stefan Wahren to bcm2835.
  MAINTAINERS: remove swarren from bcm2835
  MAINTAINERS: Add Jon Mason to BCM5301X maintainers

Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc vfs updates from Al Viro:
"Assorted bits and pieces from various people. No common topic in this
  pile, sorry"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs/affs: add rename exchange
  fs/affs: add rename2 to prepare multiple methods
  Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()
  fs: don't set *REFERENCED on single use objects
  fs: compat: Remove warning from COMPATIBLE_IOCTL
  remove pointless extern of atime_need_update_rcu()
  fs: completely ignore unknown open flags
  fs: add a VALID_OPEN_FLAGS
  fs: remove _submit_bh()
  fs: constify tree_descr arrays passed to simple_fill_super()
  fs: drop duplicate header percpu-rwsem.h
  fs/affs: bugfix: Write files greater than page size on OFS
  fs/affs: bugfix: enable writes on OFS disks
  fs/affs: remove node generation check
  fs/affs: import amigaffs.h
  fs/affs: bugfix: make symbolic links work again

device-dax: kill NR_DEV_DAX

There is no point to ask how many device-dax instances the kernel should
support. Since we are already using a dynamic major number, just allow
the max number of minors by default and be done. This also fixes the
fact that the proposed max for the NR_DEV_DAX range was larger than what
could be supported by alloc_chrdev_region().

Fixes: ba09c01d2fa8 ("dax: convert to the cdev api")
Reported-by: Geert Uytterhoeven <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
"Braino fix for iov_iter_revert() misuse"

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix braino in generic_file_read_iter()

proc: try to remove use of FOLL_FORCE entirely

We fixed the bugs in it, but it's still an ugly interface, so let's see
if anybody actually depends on it. It's entirely possible that nothing
actually requires the whole "punch through read-only mappings"
semantics.

For example, gdb definitely uses the /proc/<pid>/mem interface, but it
looks like it mainly does it for regular reads of the target (that don't
need FOLL_FORCE), and looking at the gdb source code seems to fall back
on the traditional ptrace(PTRACE_POKEDATA) interface if it needs to.

If this breaks something, I do have a (more complex) version that only
enables FOLL_FORCE when somebody has PTRACE_ATTACH'ed to the target,
like the comment here used to say ("Maybe we should limit FOLL_FORCE to
actual ptrace users?").

Cc: Kees Cook <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Eric Biederman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'qed-general-fixes'

Yuval Mintz says:

====================
qed*: General fixes

This series contain several fixes for qed and qede.

- #1 [and ~#5] relate to XDP cleanups
- #2 and #5 correct VF behavior
- #3 and #4 fix and add missing configurations needed for RoCE & storage
====================

Signed-off-by: David S. Miller <[email protected]>

qede: Split PF/VF ndos.

PFs and VFs share the same structure of NDOs today,
and the VFs explicitly fails the ndo_xdp() callback stating
it doesn't support XDP.

This results in lots of:

  [qede_xdp:1032(enp131s2)]VFs don't support XDP
  ------------[ cut here ]------------
  WARNING: CPU: 4 PID: 1426 at net/core/rtnetlink.c:1637 rtnl_dump_ifinfo+0x354/0x3c0
  ...
  Call Trace:
    ? __alloc_skb+0x9b/0x1d0
    netlink_dump+0x122/0x290
    netlink_recvmsg+0x27d/0x430
    sock_recvmsg+0x3d/0x50
  ...

As every dump request for the VF interface info would fail due to
rtnl_xdp_fill() returning an error code.

To resolve this, introduce a subset of the NDOs meant for the VF
in a seperate structure and register that one instead for VFs,
and omit the ndo_xdp initialization.

Fixes: 40b8c45492ef ("qede: Prevent VFs from using XDP")
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: Correct doorbell configuration for !4Kb pages

When configuring the doorbell DPI address, driver aligns the start
address to 4KB [HW-pages] instead of host PAGE_SIZE.
As a result, RoCE applications might receive addresses which are
unaligned to pages [when PAGE_SIZE > 4KB], which is a security risk.

Fixes: 51ff17251c9c ("qed: Add support for RoCE hw init")
Signed-off-by: Ram Amrani <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: Tell QM the number of tasks

Driver doesn't pass the number of tasks to the QM init logic
which would cause back-pressure in scenarios requiring many tasks
[E.g., using max MRs] and thus reduced performance.

Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: Fix VF removal sequence

After previos changes in HW-stop scheme, VFs stopped sending CLOSE
messages to their PFs when they unload.

Fixes: 1226337ad98f ("qed: Correct HW stop flow")
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qede: Fix XDP memory leak on unload

When (re|un)loading, Tx-queues belonging to XDP would not get freed.

Fixes: cb6aeb079294 ("qede: Add support for XDP_TX")
Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mlx4-misc-fixes'

Tariq Toukan says:

====================
mlx4 misc fixes

This patchset contains misc bug fixes from the team
to the mlx4 Core and Eth drivers.

Series generated against net commit:
32f1bc0f3d26 Revert "ipv4: restore rt->fi for reference counting"
====================

Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Reduce harmless SRIOV error message to debug level

Under SRIOV resource management, extra counters are allocated to VFs
from a free pool. If that pool is empty, the ALLOC_RES command for
a counter resource fails -- and this generates a misleading error
message in the message log.

Under SRIOV, each VF is allocated (i.e., guaranteed) 2 counters --
one counter per port. For ETH ports, the RoCE driver requests an
additional counter (above the guaranteed counters). If that request
fails, the VF RoCE driver simply uses the default (i.e., guaranteed)
counter for that port.

Thus, failing to allocate an additional counter does not constitute
a problem, and the error message on the PF when this occurs should
be reduced to debug level.

Finally, to identify the situation that the reason for the failure is
that no resources are available to grant to the VF, we modified the
error returned by mlx4_grant_resource to -EDQUOT (Quota exceeded),
which more accurately describes the error.

Fixes: c3abb51bdb0e ("IB/mlx4: Add RoCE/IB dedicated counters")
Signed-off-by: Jack Morgenstein <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_en: Avoid adding steering rules with invalid ring

Inserting steering rules with illegal ring is an invalid operation,
block it.

Fixes: 820672812f82 ('net/mlx4_en: Manage flow steering rules with ethtool')
Signed-off-by: Talat Batheesh <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_en: Change the error print to debug print

The error print within mlx4_en_calc_rx_buf() should be a debug print.

Fixes: 51151a16a60f ('mlx4: allow order-0 memory allocations in RX path')
Signed-off-by: Kamal Heib <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

s390/virtio: change maintainership

Halil is doing a lot more work in the virtio area on s390 than I
do. Let's reflect the reality in the maintainers file.

Signed-off-by: Christian Borntraeger <[email protected]>
Acked-by: Halil Pasic <[email protected]>
Acked-by: Cornelia Huck <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

tools/virtio: fix spelling mistake: "wakeus" -> "wakeups"

trivial fix to spelling mistake in an error message.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>

virtio_net: tidy a couple debug statements

We are printing a decimal value for truesize so we shouldn't use an "0x"
prefix.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

ptr_ring: support testing different batching sizes

Use the param flag for that.

Signed-off-by: Michael S. Tsirkin <[email protected]>

ringtest: support test specific parameters

Add a new flag for passing test-specific parameters.

Signed-off-by: Michael S. Tsirkin <[email protected]>

ptr_ring: batch ring zeroing

A known weakness in ptr_ring design is that it does not handle well the
situation when ring is almost full: as entries are consumed they are
immediately used again by the producer, so consumer and producer are
writing to a shared cache line.

To fix this, add batching to consume calls: as entries are
consumed do not write NULL into the ring until we get
a multiple (in current implementation 2x) of cache lines
away from the producer. At that point, write them all out.

We do the write out in the reverse order to keep
producer from sharing cache with consumer for as long
as possible.

Writeout also triggers when ring wraps around - there's
no special reason to do this but it helps keep the code
a bit simpler.

What should we do if getting away from producer by 2 cache lines
would mean we are keeping the ring moe than half empty?
Maybe we should reduce the batching in this case,
current patch simply reduces the batching.

Notes:
- it is no longer true that a call to consume guarantees
  that the following call to produce will succeed.
  No users seem to assume that.
- batching can also in theory reduce the signalling rate:
  users that would previously send interrups to the producer
  to wake it up after consuming each entry would now only
  need to do this once in a batch.
  Doing this would be easy by returning a flag to the caller.
  No users seem to do signalling on consume yet so this was not
  implemented yet.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Jason Wang <[email protected]>

virtio: virtio_driver doc

Add comments for the virtio_driver members that were not documented.

Signed-off-by: Cornelia Huck <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio_net: don't reset twice on XDP on/off

We already do a reset once in remove_vq_common -
there appears to be no point in doing another one
when we add/remove XDP.

Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio_net: fix support for small rings

When ring size is small (<32 entries) making buffers smaller means a
full ring might not be able to hold enough buffers to fit a single large
packet.

Make sure a ring full of buffers is large enough to allow at least one
packet of max size.

Fixes: 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag allocators")
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio_net: reduce alignment for buffers

We don't need to align length to any particular
value anymore. Aligning to L1 cache size probably
sill makes sense to reduce false sharing.

Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio_net: rework mergeable buffer handling

Use the new _ctx virtio API to maintain true length for each buffer.

Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio_net: allow specifying context for rx

With mergeable buffers we never use s/g for rx,
so allow specifying context in that case.

Signed-off-by: Michael S. Tsirkin <[email protected]>

powerpc/64s: Support new device tree binding for discovering CPU features

The ibm,powerpc-cpu-features device tree binding describes CPU features with
ASCII names and extensible compatibility, privilege, and enablement metadata
that allows improved flexibility and compatibility with new hardware.

The interface is described in detail in ibm,powerpc-cpu-features.txt in this
patch.

Currently this code is not enabled by default, and there are no released
firmwares that provide the binding.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison

Use time_after() for time comparison with the new fix.

Signed-off-by: Karim Eshapa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

DECnet: Use container_of() for embedded struct

Instead of a direct cross-type cast, use conatiner_of() to locate
the embedded structure, even in the face of future struct layout
randomization.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

powerpc: Don't print cpu_spec->cpu_name if it's NULL

Currently we assume that if the cpu_spec has a pvr_mask then it must also have a
cpu_name. But that will change in a subsequent commit when we do CPU feature
discovery via the device tree, so check explicitly if cpu_name is NULL.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

of/fdt: introduce of_scan_flat_dt_subnodes and of_get_flat_dt_phandle

Introduce primitives for FDT parsing. These will be used for powerpc
cpufeatures node scanning, which has quite complex structure but should
be processed early.

Cc: [email protected]
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next

Freescale updates from Scott:

"Includes a fix for a powerpc/next mm regression on 64e, a fix for a
kernel hang on 64e when using a debugger inside a relocated kernel, a
qman fix, and misc qe improvements."

Merge tag 'kvm-arm-for-v4.12-round2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

Second round of KVM/ARM Changes for v4.12.

Changes include:
- A fix related to the 32-bit idmap stub
- A fix to the bitmask used to deode the operands of an AArch32 CP
   instruction
- We have moved the files shared between arch/arm/kvm and
   arch/arm64/kvm to virt/kvm/arm
- We add support for saving/restoring the virtual ITS state to
   userspace

KVM: arm/arm64: vgic-its: Cleanup after failed ITT restore

When failing to restore the ITT for a DTE, we should remove the failed
device entry from the list and free the object.

We slightly refactor vgic_its_destroy to be able to reuse the now
separate vgic_its_free_dte() function.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Don't call map_resources when restoring ITS tables

The only reason we called kvm_vgic_map_resources() when restoring the
ITS tables was because we wanted to have the KVM iodevs registered in
the KVM IO bus framework at the time when the ITS was restored such that
a restored and active device can inject MSIs prior to otherwise calling
kvm_vgic_map_resources() from the first run of a VCPU.

Since we now register the KVM iodevs for the redestributors and ITS as
soon as possible (when setting the base addresses), we no longer need
this call and kvm_vgic_map_resources() is again called only when first
running a VCPU.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Register ITS iodev when setting base address

We have to register the ITS iodevice before running the VM, because in
migration scenarios, we may be restoring a live device that wishes to
inject MSIs before the VCPUs have started.

All we need to register the ITS io device is the base address of the
ITS, so we can simply register that when the base address of the ITS is
set.

[ Code to fix concurrency issues when setting the ITS base address and
to fix the undef base address check written by Marc Zyngier ]

Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Get rid of its->initialized field

The its->initialized doesn't bring much to the table, and creates
unnecessary ordering between setting the address and initializing it
(which amounts to exactly nothing).

Let's kill it altogether, making KVM_DEV_ARM_VGIC_CTRL_INIT the no-op
it deserves to be.

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs

Instead of waiting with registering KVM iodevs until the first VCPU is
run, we can actually create the iodevs when the redist base address is
set. The only downside is that we must now also check if we need to do
this for VCPUs which are created after creating the VGIC, because there
is no enforced ordering between creating the VGIC (and setting its base
addresses) and creating the VCPUs.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Slightly rework kvm_vgic_addr

As we are about to handle setting the address for the redistributor base
region separately from some of the other base addresses, let's rework
this function to leave a little more room for being flexible in what
each type of base address does.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Make vgic_v3_check_base more broadly usable

As we are about to fiddle with the IO device registration mechanism,
let's be a little more careful when setting base addresses as early as
possible. When setting a base address, we can check that there's
address space enough for its scope and when the last of the two
base addresses (dist and redist) get set, we can also check if the
regions overlap at that time.

This allows us to provide error messages to the user at time when trying
to set the base address, as opposed to later when trying to run the VM.

To do this, we make vgic_v3_check_base available in the core vgic-v3
code as well as in the other parts of the GICv3 code, namely the MMIO
config code.

We also return true for undefined base addresses so that the function
can be used before all base addresses are set; all callers already check
for uninitialized addresses before calling this function.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Refactor vgic_register_redist_iodevs

Split out the function to register all the redistributor iodevs into a
function that handles a single redistributor at a time in preparation
for being able to call this per VCPU as these get created.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpus

There are occasional needs to use the index of vcpu in the kvm->vcpus
array to map something related to a VCPU. For example, unlike the
vcpu->vcpu_id, the vcpu index is guaranteed to not be sparse across all
vcpus which is useful when allocating a memory area for each vcpu.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

nVMX: Advertise PML to L1 hypervisor

Advertise the PML bit in vmcs12 but don't try to enable
it in hardware when running L2 since L0 is emulating it. Also,
preserve L0's settings for PML since it may still
want to log writes.

Signed-off-by: Bandan Das <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nVMX: Implement emulated Page Modification Logging

With EPT A/D enabled, processor access to L2 guest
paging structures will result in a write violation.
When this happens, write the GUEST_PHYSICAL_ADDRESS
to the pml buffer provided by L1 if the access is
write and the dirty bit is being set.

This patch also adds necessary checks during VMEntry if L1
has enabled PML. If the PML index overflows, we change the
exit reason and run L1 to simulate a PML full event.

Signed-off-by: Bandan Das <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: x86: Add a hook for arch specific dirty logging emulation

When KVM updates accessed/dirty bits, this hook can be used
to invoke an arch specific function that implements/emulates
dirty logging such as PML.

Signed-off-by: Bandan Das <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

kvm: nVMX: Validate CR3 target count on nested VM-entry

According to the SDM, the CR3-target count must not be greater than
4. Future processors may support a different number of CR3-target
values. Software should read the VMX capability MSR IA32_VMX_MISC to
determine the number of values supported.

Signed-off-by: Jim Mattson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

The main thing here is a new implementation of the in-kernel
XICS interrupt controller emulation for POWER9 machines, from Ben
Herrenschmidt.

POWER9 has a new interrupt controller called XIVE (eXternal Interrupt
Virtualization Engine) which is able to deliver interrupts directly
to guest virtual CPUs in hardware without hypervisor intervention.
With this new code, the guest still sees the old XICS interface but
performance is better because the XICS emulation in the host uses the
XIVE directly rather than going through a XICS emulation in firmware.

Conflicts:
arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix]
arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]

KVM: set no_llseek in stat_fops_per_vm

In vm_stat_get_per_vm_fops and vcpu_stat_get_per_vm_fops, since we
use nonseekable_open() to open, we should use no_llseek() to seek,
not generic_file_llseek().

Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

powerpc/64s: Fix unnecessary machine check handler relocation branch

Similarly to commit 2563a70c3b ("powerpc/64s: Remove unnecessary relocation
branch from idle handler"), the machine check handler has a BRANCH_TO from
relocated to relocated code, which is unnecessary.

It has also caused build errors with some toolchains:

  arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
  arch/powerpc/kernel/exceptions-64s.S:395: Error: operand out of range
  (0xffffffffffff8280 is not between 0x0000000000000000 and
  0x000000000000ffff)

Fixes: 1945bc4549e5 ("powerpc/64s: Fix POWER9 machine check handler from stop state")
Signed-off-by: Nicholas Piggin <[email protected]>
Reported-and-tested-by : Abdul Haleem <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/mm/book3s/64: Rework page table geometry for lower memory usage

Recently in commit f6eedbba7a26 ("powerpc/mm/hash: Increase VA range to 128TB")
we increased the virtual address space for user processes to 128TB by default,
and up to 512TB if user space opts in.

This obviously required expanding the range of the Linux page tables. For Book3s
64-bit using hash and with PAGE_SIZE=64K, we increased the PGD to 2^15 entries.
This meant we could cover the full address range, while still being able to
insert a 16G hugepage at the PGD level and a 16M hugepage in the PMD.

The downside of that geometry is that it uses a lot of memory for the PGD, and
in particular makes the PGD a 4-page allocation, which means it's much more
likely to fail under memory pressure.

Instead we can make the PMD larger, so that a single PUD entry maps 16G,
allowing the 16G hugepages to sit at that level in the tree. We're then able to
split the remaining bits between the PUG and PGD. We make the PGD slightly
larger as that results in lower memory usage for typical programs.

When THP is enabled the PMD actually doubles in size, to 2^11 entries, or 2^14
bytes, which is large but still < PAGE_SIZE.

Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>

powerpc: Fix distclean with Makefile.postlink

Makefile.postlink always includes include/config/auto.conf, however
this file is not present in a clean kernel tree, causing make to fail:

  $ git clone linuxppc.git
  $ cd linuxppc.git
  $ make distclean
  arch/powerpc/Makefile.postlink:10: include/config/auto.conf: No such file or directory
  make[1]: *** No rule to make target `include/config/auto.conf'.  Stop.
  make: *** [vmlinuxclean] Error 2

Equally running 'make distclean; make distclean' will trip the error case.

Change the inclusion such that file not being found does not trigger an error.

Fixes: f188d0524d7e ("powerpc: Use the new post-link pass to check relocations")
Reported-by: Mircea Pop <[email protected]>
Signed-off-by: Horia Geantă <[email protected]>
Tested-by: Justin M. Forbes <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

KVM: arm/arm64: vgic: Rename kvm_vgic_vcpu_init to kvm_vgic_vcpu_enable

This function really doesn't init anything, it enables the CPU
interface, so name it as such, which gives us the name to use for actual
init work later on.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

KVM: arm/arm64: Clarification and relaxation to ITS save/restore ABI

Clarify what is meant by the save/restore ABI only supporting virtual
physical interrupts.

Relax the requirement of the order that the collection entries are
written in and be clear that there is no particular ordering enforced.

Some cosmetic changes in the capitalization of ID names to align with
the GICv3 manual and remove the empty line in the bottom of the patch.

Signed-off-by: Christoffer Dall <[email protected]>
Reviewed-by: Eric Auger <[email protected]>

x86/intel_rdt: Fix a typo in Documentation

Example 3 contains a typo:

"C0" in "# echo C0 > p0/cpus" is wrong because it specifies core
6-7 instead of wanted core 4-7.

Correct this typo to avoid confusion.

Signed-off-by: Xiaochen Shen <[email protected]>
Acked-by: Fenghua Yu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

clocksource/arm_arch_timer: Fix arch_timer_mem_find_best_frame()

arch_timer_mem_find_best_frame() looks through ARCH_TIMER_MEM_MAX_FRAMES
frames even after finding matches to ensure the best frame is chosen,
which means the variable frame will point to the last valid frame but
not necessarily the best frame.

On Juno, we get the following error as the wrong frame is returned as the
best frame from arch_timer_mem_find_best_frame():

  arch_timer: Unable to map frame @ 0x0000000000000000
  arch_timer: Frame missing phys irq.
  Failed to initialize '/timer@2a810000': -22

Fix the issue by correctly returning the best frame from
arch_timer_mem_find_best_frame().

Fixes: c389d701dfb7 ("clocksource: arm_arch_timer: split MMIO timer probing.")
Signed-off-by: Sudeep Holla <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Daniel Lezcano <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

x86/build: Don't add -maccumulate-outgoing-args w/o compiler support

Clang does not support this machine dependent option.

Older versions of GCC (pre 3.0) may not support this option, added in
2000, but it's unlikely they can still compile a working kernel.

Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/boot/32: Fix UP boot on Quark and possibly other platforms

This partially reverts commit:

  23b2a4ddebdd17f ("x86/boot/32: Defer resyncing initial_page_table until per-cpu is set up")

That commit had one definite bug and one potential bug.  The
definite bug is that setup_per_cpu_areas() uses a differnet generic
implementation on UP kernels, so initial_page_table never got
resynced.  This was fine for access to percpu data (it's in the
identity map on UP), but it breaks other users of
initial_page_table.  The potential bug is that helpers like
efi_init() would be called before the tables were synced.

Avoid both problems by just syncing the page tables in setup_arch()
*and* setup_per_cpu_areas().

Reported-by: Jan Kiszka <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Garnier <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()

'__vmalloc_start_set' currently only gets set in initmem_init() when
!CONFIG_NEED_MULTIPLE_NODES. This breaks detection of vmalloc address
with virt_addr_valid() with CONFIG_NEED_MULTIPLE_NODES=y, causing
a kernel crash:

[mm/usercopy] 517e1fbeb6: kernel BUG at arch/x86/mm/physaddr.c:78!

Set '__vmalloc_start_set' appropriately for that case as well.

Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Laura Abbott <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: dc16ecf7fd1f ("x86-32: use specific __vmalloc_start_set flag in __virt_addr_valid")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

Merge tag 'linux-kselftest-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest updates from Shuah Khan:
"This update consists of:

   - important fixes for build failures and clean target related
     warnings to address regressions introduced in commit 88baa78d1f31
     ("selftests: remove duplicated all and clean target")

   - several minor spelling fixes in and log messages and comment
     blocks.

   - Enabling configs for better test coverage in ftrace, vm, and
     cpufreq tests.

   - .gitignore changes"

* tag 'linux-kselftest-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (26 commits)
  selftests: x86: add missing executables to .gitignore
  selftests: watchdog: accept multiple params on command line
  selftests: create cpufreq kconfig fragments
  selftests: x86: override clean in lib.mk to fix warnings
  selftests: sync: override clean in lib.mk to fix warnings
  selftests: splice: override clean in lib.mk to fix warnings
  selftests: gpio: fix clean target to remove all generated files and dirs
  selftests: add gpio generated files to .gitignore
  selftests: powerpc: override clean in lib.mk to fix warnings
  selftests: gpio: override clean in lib.mk to fix warnings
  selftests: futex: override clean in lib.mk to fix warnings
  selftests: lib.mk: define CLEAN macro to allow Makefiles to override clean
  selftests: splice: fix clean target to not remove default_file_splice_read.sh
  selftests: gpio: add config fragment for gpio-mockup
  selftests: breakpoints: allow to cross-compile for aarch64/arm64
  selftests/Makefile: Add missed PHONY targets
  selftests/vm/run_vmtests: Fix wrong comment
  selftests/Makefile: Add missed closing `"` in comment
  selftests/vm/run_vmtests: Polish output text
  selftests/timers: fix spelling mistake: "Asynchronous"
  ...

Merge tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing updates from Steven Rostedt:
"These are three simple changes.

  The first one is just a switch from using strcpy() to strlcpy().
  Someone thought that it may cause an overflow bug, but since it only
  copies comms into a pre-allocated array of TASK_COMM_LEN, and no comm
  should ever be bigger than that, nor not end with a nul character,
  this change is more of a safety precaution than fixing anything that
  is actually broken.

  The other two changes are simply cleaning and optimizing some code"

* tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Simplify ftrace_match_record() even more
  ftrace: Remove an unneeded condition
  tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

Merge tags 'for-linus' and 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull more rdma updates from Doug Ledford:
"As mentioned in my first pull request, this is the subsequent pull
  requests I had. This is all I have, and in fact this cleans out the
  RDMA subsystem's entire patchworks queue of kernel changes that are
  ready to go (well, it did for the weekend anyway, a few new patches
  are in, but they'll be coming during the -rc cycle).

  The first tag contains a single patch that would have conflicted if
  taken from my tree or DaveM's tree as it needed our trees merged to
  come cleanly.

  The second tag contains the patch series from Intel plus three other
  stragllers that came in late last week. I took them because it allowed
  me to legitimately claim that the RDMA patchworks queue was, for a
  short time, 100% cleared of all waiting kernel patches, woohoo! :-).

  I have it under my for-next tag, so it did get 0day and linux- next
  over the end of last week, and linux-next did show one minor conflict.

  Summary:

  'for-linus' tag:
   - mlx5/IPoIB fixup patch

  'for-next' tag:
   - the hfi1 15 patch set that landed late
   - IPoIB get_link_ksettings which landed late because I asked for a
     respin
   - one late rxe change
   - one -rc worthy fix that's in early"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/mlx5: Enable IPoIB acceleration

* tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  rxe: expose num_possible_cpus() cnum_comp_vectors
  IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type
  IB/hfi1: Clean up on context initialization failure
  IB/hfi1: Fix an assign/ordering issue with shared context IDs
  IB/hfi1: Clean up context initialization
  IB/hfi1: Correctly clear the pkey
  IB/hfi1: Search shared contexts on the opened device, not all devices
  IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit
  IB/hfi1: Use filedata rather than filepointer
  IB/hfi1: Name function prototype parameters
  IB/hfi1: Fix a subcontext memory leak
  IB/hfi1: Return an error on memory allocation failure
  IB/hfi1: Adjust default eager_buffer_size to 8MB
  IB/hfi1: Get rid of divide when setting the tx request header
  IB/hfi1: Fix yield logic in send engine
  IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line
  IB/hfi1: Fix checks for Offline transient state
  IB/ipoib: add get_link_ksettings in ethtool

Revert "ipv4: restore rt->fi for reference counting"

This reverts commit 82486aa6f1b9bc8145e6d0fa2bc0b44307f3b875.

As implemented, this causes dangling netdevice refs.

Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

- add framework for supporting PCIe devices in Endpoint mode (Kishon
   Vijay Abraham I)

- use non-postable PCI config space mappings when possible (Lorenzo
   Pieralisi)

- clean up and unify mmap of PCI BARs (David Woodhouse)

- export and unify Function Level Reset support (Christoph Hellwig)

- avoid FLR for Intel 82579 NICs (Sasha Neftin)

- add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)

- short-circuit config access failures for disconnected devices (Keith
   Busch)

- remove D3 sleep delay when possible (Adrian Hunter)

- freeze PME scan before suspending devices (Lukas Wunner)

- stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)

- disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)

- add arch-specific alignment control to improve device passthrough by
   avoiding multiple BARs in a page (Yongji Xie)

- add sysfs sriov_drivers_autoprobe to control VF driver binding
   (Bodong Wang)

- allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)

- fix crashes when unbinding host controllers that don't support
   removal (Brian Norris)

- add driver for MicroSemi Switchtec management interface (Logan
   Gunthorpe)

- add driver for Faraday Technology FTPCI100 host bridge (Linus
   Walleij)

- add i.MX7D support (Andrey Smirnov)

- use generic MSI support for Aardvark (Thomas Petazzoni)

- make Rockchip driver modular (Brian Norris)

- advertise 128-byte Read Completion Boundary support for Rockchip
   (Shawn Lin)

- advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)

- convert atomic_t to refcount_t in HV driver (Elena Reshetova)

- add CPU IRQ affinity in HV driver (K. Y. Srinivasan)

- fix PCI bus removal in HV driver (Long Li)

- add support for ThunderX2 DMA alias topology (Jayachandran C)

- add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)

- add ITE 8893 bridge DMA alias quirk (Jarod Wilson)

- restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
   (Manish Jaggi)

* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
  PCI: Don't allow unbinding host controllers that aren't prepared
  ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
  MAINTAINERS: Add PCI Endpoint maintainer
  Documentation: PCI: Add userguide for PCI endpoint test function
  tools: PCI: Add sample test script to invoke pcitest
  tools: PCI: Add a userspace tool to test PCI endpoint
  Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
  misc: Add host side PCI driver for PCI test function device
  PCI: Add device IDs for DRA74x and DRA72x
  dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
  PCI: dwc: dra7xx: Workaround for errata id i870
  dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
  PCI: dwc: dra7xx: Add EP mode support
  PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
  dt-bindings: PCI: Add DT bindings for PCI designware EP mode
  PCI: dwc: designware: Add EP mode support
  Documentation: PCI: Add binding documentation for pci-test endpoint function
  ixgbe: Use pcie_flr() instead of duplicating it
  IB/hfi1: Use pcie_flr() instead of duplicating it
  PCI: imx6: Fix spelling mistake: "contol" -> "control"
  ...

Merge tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial updates from Greg KH:
"Here is the "big" TTY/Serial patch updates for 4.12-rc1

  Not a lot of new things here, the normal number of serial driver
  updates and additions, tiny bugs fixed, and some core files split up
  to make future changes a bit easier for Nicolas's "tiny-tty" work.

  All of these have been in linux-next for a while"

* tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (62 commits)
  serial: small Makefile reordering
  tty: split job control support into a file of its own
  tty: move baudrate handling code to a file of its own
  console: move console_init() out of tty_io.c
  serial: 8250_early: Add earlycon support for Palmchip UART
  tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44
  vt: make mouse selection of non-ASCII consistent
  vt: set mouse selection word-chars to gpm's default
  imx-serial: Reduce RX DMA startup latency when opening for reading
  serial: omap: suspend device on probe errors
  serial: omap: fix runtime-pm handling on unbind
  tty: serial: omap: add UPF_BOOT_AUTOCONF flag for DT init
  serial: samsung: Remove useless spinlock
  serial: samsung: Add missing checks for dma_map_single failure
  serial: samsung: Use right device for DMA-mapping calls
  serial: imx: setup DCEDTE early and ensure DCD and RI irqs to be off
  tty: fix comment typo s/repsonsible/responsible/
  tty: amba-pl011: Fix spurious TX interrupts
  serial: xuartps: Enable clocks in the pm disable case also
  serial: core: Re-use struct uart_port {name} field
  ...

tracing: Use cpumask_available() to check if cpumask variable may be used

This fixes the following clang warning:

kernel/trace/trace.c:3231:12: warning: address of array 'iter->started'
will always evaluate to 'true' [-Wpointer-bool-conversion]
if (iter->started)

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthias Kaehlcke <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

- the rest of MM

- various misc things

- procfs updates

- lib/ updates

- checkpatch updates

- kdump/kexec updates

- add kvmalloc helpers, use them

- time helper updates for Y2038 issues. We're almost ready to remove
   current_fs_time() but that awaits a btrfs merge.

- add tracepoints to DAX

* emailed patches from Andrew Morton <[email protected]>: (114 commits)
  drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
  selftests/vm: add a test for virtual address range mapping
  dax: add tracepoint to dax_insert_mapping()
  dax: add tracepoint to dax_writeback_one()
  dax: add tracepoints to dax_writeback_mapping_range()
  dax: add tracepoints to dax_load_hole()
  dax: add tracepoints to dax_pfn_mkwrite()
  dax: add tracepoints to dax_iomap_pte_fault()
  mtd: nand: nandsim: convert to memalloc_noreclaim_*()
  treewide: convert PF_MEMALLOC manipulations to new helpers
  mm: introduce memalloc_noreclaim_{save,restore}
  mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
  mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
  mm/huge_memory.c: use zap_deposited_table() more
  time: delete CURRENT_TIME_SEC and CURRENT_TIME
  gfs2: replace CURRENT_TIME with current_time
  apparmorfs: replace CURRENT_TIME with current_time()
  lustre: replace CURRENT_TIME macro
  fs: ubifs: replace CURRENT_TIME_SEC with current_time
  fs: ufs: use ktime_get_real_ts64() for birthtime
  ...

drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4

  drivers/staging/ccree/ssi_hash.c:1990: error: unknown field 'template_ahash' specified in initializer
  drivers/staging/ccree/ssi_hash.c:1991: error: unknown field 'init' specified in initializer
  drivers/staging/ccree/ssi_hash.c:1991: warning: missing braces around initializer
  drivers/staging/ccree/ssi_hash.c:1991: warning: (near initialization for 'driver_hash[0].<anonymous>.template_ahash')
  drivers/staging/ccree/ssi_hash.c:1992: error: unknown field 'update' specified in initializer
  drivers/staging/ccree/ssi_hash.c:1992: warning: excess elements in union initializer
  drivers/staging/ccree/ssi_hash.c:1992: warning: (near initialization for 'driver_hash[0].<anonymous>')
  drivers/staging/ccree/ssi_hash.c:1993: error: unknown field 'final' specified in initializer
  drivers/staging/ccree/ssi_hash.c:1993: warning: excess elements in union initializer
  drivers/staging/ccree/ssi_hash.c:1993: warning: (near initialization for 'driver_hash[0].<anonymous>')
  ...

gcc-4.4.4 has issues with anon union initializers.  Work around this.

Cc: Gilad Ben-Yossef <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

selftests/vm: add a test for virtual address range mapping

This verifies virtual address mapping below and above the 128TB range
and makes sure that address returned are within the expected range
depending upon the hint passed from the user space.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Cc: Michal Suchanek <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: add tracepoint to dax_insert_mapping()

Add a tracepoint to dax_insert_mapping(), following the same logging
conventions as the rest of DAX.  This tracepoint, along with the one in
dax_load_hole(), lets us know how a DAX PTE fault was serviced.

Here is an example DAX fault that inserts a PTE mapping:

  small-1126  [007] ....
   145.451604: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220

  small-1126  [007] ....
   145.452317: dax_insert_mapping: dev 259:0 ino 0x1003 shared write address 0x10420000 radix_entry 0x100006

  small-1126  [007] ....
   145.452399: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: add tracepoint to dax_writeback_one()

Add a tracepoint to dax_writeback_one(), following the same logging
conventions as the rest of DAX.

Here is an example range writeback which ends up flushing one PMD and
one PTE:

  test-1265  [003] ....
   496.615250: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff

  test-1265  [003] ....
   496.616263: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x0 pglen 0x200

  test-1265  [003] ....
   496.616270: dax_writeback_one: dev 259:0 ino 0x1003 pgoff 0x305 pglen 0x1

  test-1265  [003] ....
   496.616272: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x0-0x7ffffffffffff

[[email protected]: struct blk_dax_ctl has disappeared]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: add tracepoints to dax_writeback_mapping_range()

Add tracepoints to dax_writeback_mapping_range(), following the same
logging conventions as the rest of DAX.

Here is an example writeback call:

  msync-1085  [006] ....
   200.902565: dax_writeback_range: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff

  msync-1085  [006] ....
   200.902579: dax_writeback_range_done: dev 259:0 ino 0x1003 pgoff 0x200-0x2ff

[[email protected]: fix regression in dax_writeback_mapping_range()]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: add tracepoints to dax_load_hole()

Add tracepoints to dax_load_hole(), following the same logging conventions
as the rest of DAX.

Here is the logging generated by a PTE read from a hole:

  read-1075  [002] ....
    62.362108: dax_pte_fault: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280

  read-1075  [002] ....
    62.362140: dax_load_hole: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE

  read-1075  [002] ....
    62.362141: dax_pte_fault_done: dev 259:0 ino 0x1003 shared ALLOW_RETRY|KILLABLE|USER address 0x10480000 pgoff 0x280 NOPAGE

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: add tracepoints to dax_pfn_mkwrite()

Add tracepoints to dax_pfn_mkwrite(), following the same logging
conventions as the rest of DAX.

Here is an example PTE fault followed by a pfn_mkwrite:

  small_aligned-1094  [002] ....
   374.084998: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200

  small_aligned-1094  [002] ....
   374.085145: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 MAJOR|NOPAGE

  small_aligned-1094  [002] ....
   374.085165: dax_pfn_mkwrite: dev 259:0 ino 0x1003 shared WRITE|MKWRITE|ALLOW_RETRY|KILLABLE|USER address 0x10400000 pgoff 0x200 NOPAGE

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dax: add tracepoints to dax_iomap_pte_fault()

Patch series "second round of tracepoints for DAX".

This second round of DAX tracepoint patches adds tracing to the PTE
fault path (dax_iomap_pte_fault(), dax_pfn_mkwrite(), dax_load_hole(),
dax_insert_mapping()) and to the writeback path
(dax_writeback_mapping_range(), dax_writeback_one()).

The purpose of this tracing is to give us a high level view of what DAX
is doing, whether faults are being serviced by PMDs or PTEs, and by real
storage or by zero pages covering holes.

I do have some patches nearly ready which also add tracing to
grab_mapping_entry() and dax_insert_mapping_entry().  These are more
targeted at logging how we are interacting with the radix tree, how we
use empty entries for locking, whether we "downgrade" huge zero pages to
4k PTE sized allocations, etc.  In the end it seemed to me that this
might be too detailed to have as constantly present tracepoints, but if
anyone sees value in having tracepoints like this in the DAX code
permanently (Jan?), please let me know and I'll add those last two
patches.

All these tracepoints were done to be consistent with the style of the
XFS tracepoints and with the existing DAX PMD tracepoints.

This patch (of 6):

Add tracepoints to dax_iomap_pte_fault(), following the same logging
conventions as the rest of DAX.

Here is an example fault that initially tries to be serviced by the PMD
fault handler but which falls back to PTEs because the VMA isn't large
enough to hold a PMD:

  small-1086  [005] ....
   71.140014: xfs_filemap_huge_fault: dev 259:0 ino 0x1003

  small-1086  [005] ....
    71.140027: dax_pmd_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400

  small-1086  [005] ....
    71.140028: dax_pmd_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 vm_start 0x10200000 vm_end 0x10500000 pgoff 0x220 max_pgoff 0x1400 FALLBACK

  small-1086  [005] ....
    71.140035: dax_pte_fault: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220

  small-1086  [005] ....
    71.140396: dax_pte_fault_done: dev 259:0 ino 0x1003 shared WRITE|ALLOW_RETRY|KILLABLE|USER address 0x10420000 pgoff 0x220 MAJOR|NOPAGE

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mtd: nand: nandsim: convert to memalloc_noreclaim_*()

Nandsim has own functions set_memalloc() and clear_memalloc() for robust
setting and clearing of PF_MEMALLOC. Replace them by the new generic
helpers. No functional change.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Chris Leech <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Lee Duncan <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Adrian Hunter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

treewide: convert PF_MEMALLOC manipulations to new helpers

We now have memalloc_noreclaim_{save,restore} helpers for robust setting
and clearing of PF_MEMALLOC. Let's convert the code which was using the
generic tsk_restore_flags(). No functional change.

[[email protected]: in net/core/sock.c the hunk is missing]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Lee Duncan <[email protected]>
Cc: Chris Leech <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Wouter Verhelst <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: introduce memalloc_noreclaim_{save,restore}

The previous patch ("mm: prevent potential recursive reclaim due to
clearing PF_MEMALLOC") has shown that simply setting and clearing
PF_MEMALLOC in current->flags can result in wrongly clearing a
pre-existing PF_MEMALLOC flag and potentially lead to recursive reclaim.
Let's introduce helpers that support proper nesting by saving the
previous stat of the flag, similar to the existing memalloc_noio_* and
memalloc_nofs_* helpers. Convert existing setting/clearing of
PF_MEMALLOC within mm to the new helpers.

There are no known issues with the converted code, but the change makes
it more robust.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Chris Leech <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Lee Duncan <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Richard Weinberger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC

Patch series "more robust PF_MEMALLOC handling"

This series aims to unify the setting and clearing of PF_MEMALLOC, which
prevents recursive reclaim.  There are some places that clear the flag
unconditionally from current->flags, which may result in clearing a
pre-existing flag.  This already resulted in a bug report that Patch 1
fixes (without the new helpers, to make backporting easier).  Patch 2
introduces the new helpers, modelled after existing memalloc_noio_* and
memalloc_nofs_* helpers, and converts mm core to use them.  Patches 3
and 4 convert non-mm code.

This patch (of 4):

__alloc_pages_direct_compact() sets PF_MEMALLOC to prevent deadlock
during page migration by lock_page() (see the comment in
__unmap_and_move()).  Then it unconditionally clears the flag, which can
clear a pre-existing PF_MEMALLOC flag and result in recursive reclaim.
This was not a problem until commit a8161d1ed609 ("mm, page_alloc:
restructure direct compaction handling in slowpath"), because direct
compation was called only after direct reclaim, which was skipped when
PF_MEMALLOC flag was set.

Even now it's only a theoretical issue, as the new callsite of
__alloc_pages_direct_compact() is reached only for costly orders and
when gfp_pfmemalloc_allowed() is true, which means either
__GFP_NOMEMALLOC is in gfp_flags or in_interrupt() is true.  There is no
such known context, but let's play it safe and make
__alloc_pages_direct_compact() robust for cases where PF_MEMALLOC is
already set.

Fixes: a8161d1ed609 ("mm, page_alloc: restructure direct compaction handling in slowpath")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Andrey Ryabinin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Boris Brezillon <[email protected]>
Cc: Chris Leech <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Lee Duncan <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required

Although all architectures use a deposited page table for THP on
anonymous VMAs, some architectures (s390 and powerpc) require the
deposited storage even for file backed VMAs due to quirks of their MMUs.

This patch adds support for depositing a table in DAX PMD fault handling
path for archs that require it. Other architectures should see no
functional changes.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Oliver O'Halloran <[email protected]>
Cc: Reza Arbab <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: [email protected]
Cc: Oliver O'Halloran <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/huge_memory.c: use zap_deposited_table() more

Depending on the flags of the PMD being zapped there may or may not be a
deposited pgtable to be freed. In two of the three cases this is open
coded while the third uses the zap_deposited_table() helper. This patch
converts the others to use the helper to clean things up a bit.

Link: http://lkml.kernel.org/r/[email protected]
Cc: Reza Arbab <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: [email protected]
Cc: Oliver O'Halloran <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

time: delete CURRENT_TIME_SEC and CURRENT_TIME

All uses of CURRENT_TIME_SEC and CURRENT_TIME macros have been replaced
by other time functions. These macros are also not y2038 safe. And,
all their use cases can be fulfilled by y2038 safe ktime_get_* variants.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Acked-by: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

gfs2: replace CURRENT_TIME with current_time

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

apparmorfs: replace CURRENT_TIME with current_time()

CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time().

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Acked-by: John Johansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lustre: replace CURRENT_TIME macro

CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time() for
filesystem times, and ktime_get_* functions for others.

struct timespec is also not y2038 safe. Retain timespec for timestamp
representation here as lustre uses it internally everywhere. These
references will be changed to use struct timespec64 in a separate patch.

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Cc: Oleg Drokin <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: James Simmons <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: ubifs: replace CURRENT_TIME_SEC with current_time

CURRENT_TIME_SEC is not y2038 safe.  current_time() will be transitioned
to use 64 bit time along with vfs in a separate patch.  There is no plan
to transition CURRENT_TIME_SEC to use y2038 safe time interfaces.

current_time() returns timestamps according to the granularities set in
the inode's super_block.  The granularity check to call
current_fs_time() or CURRENT_TIME_SEC is not required.

Use current_time() directly to update inode timestamp.  Use
timespec_trunc during file system creation, before the first inode is
created.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Artem Bityutskiy <[email protected]>
Cc: Adrian Hunter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: ufs: use ktime_get_real_ts64() for birthtime

CURRENT_TIME is not y2038 safe. Replace it with ktime_get_real_ts64().
Inode time formats are already 64 bit long and accommodates time64_t.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Cc: Evgeniy Dushistov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: ceph: CURRENT_TIME with ktime_get_real_ts()

CURRENT_TIME is not y2038 safe.  The macro will be deleted and all the
references to it will be replaced by ktime_get_* apis.

struct timespec is also not y2038 safe.  Retain timespec for timestamp
representation here as ceph uses it internally everywhere.  These
references will be changed to use struct timespec64 in a separate patch.

The current_fs_time() api is being changed to use vfs struct inode* as
an argument instead of struct super_block*.

Set the new mds client request r_stamp field using ktime_get_real_ts()
instead of using current_fs_time().

Also, since r_stamp is used as mtime on the server, use timespec_trunc()
to truncate the timestamp, using the right granularity from the
superblock.

This api will be transitioned to be y2038 safe along with vfs.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
M: Ilya Dryomov <[email protected]>
M: "Yan, Zheng" <[email protected]>
M: Sage Weil <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: cifs: replace CURRENT_TIME by other appropriate apis

CURRENT_TIME macro is not y2038 safe on 32 bit systems.

The patch replaces all the uses of CURRENT_TIME by current_time() for
filesystem times, and ktime_get_* functions for authentication
timestamps and timezone calculations.

This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.

CURRENT_TIME macro will be deleted before merging the aforementioned
change.

The inode timestamps read from the server are assumed to have correct
granularity and range.

The patch also assumes that the difference between server and client
times lie in the range INT_MIN..INT_MAX. This is valid because this is
the difference between current times between server and client, and the
largest timezone difference is in the range of one day.

All cifs timestamps currently use timespec representation internally.
Authentication and timezone timestamps can also be transitioned into
using timespec64 when all other timestamps for cifs is transitioned to
use timespec64.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Cc: Steve French <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

trace: make trace_hwlat timestamp y2038 safe

struct timespec is not y2038 safe on 32 bit machines and needs to be
replaced by struct timespec64 in order to represent times beyond year
2038 on such machines.

Fix all the timestamp representation in struct trace_hwlat and all the
corresponding implementations.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fs: f2fs: use ktime_get_real_seconds for sit_info times

CURRENT_TIME_SEC is not y2038 safe.

Replace use of CURRENT_TIME_SEC with ktime_get_real_seconds in segment
timestamps used by GC algorithm including the segment mtime timestamps.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepa Dinamani <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Cc: Jaegeuk Kim <[email protected]>
Cc: Chao Yu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>