Mark Rutland [Wed, 3 May 2017 15:09:37 +0000 (16:09 +0100)]
arm64: atomic_lse: match asm register sizes
The LSE atomic code uses asm register variables to ensure that
parameters are allocated in specific registers. In the majority of cases
we specifically ask for an x register when using 64-bit values, but in a
couple of cases we use a w regsiter for a 64-bit value.
For asm register variables, the compiler only cares about the register
index, with wN and xN having the same meaning. The compiler determines
the register size to use based on the type of the variable. Thus, this
inconsistency is merely confusing, and not harmful to code generation.
For consistency, this patch updates those cases to use the x register
alias. There should be no functional change as a result of this patch.
Mark Rutland [Wed, 3 May 2017 15:09:36 +0000 (16:09 +0100)]
arm64: armv8_deprecated: ensure extension of addr
Our compat swp emulation holds the compat user address in an unsigned
int, which it passes to __user_swpX_asm(). When a 32-bit value is passed
in a register, the upper 32 bits of the register are unknown, and we
must extend the value to 64 bits before we can use it as a base address.
This patch casts the address to unsigned long to ensure it has been
suitably extended, avoiding the potential issue, and silencing a related
warning from clang.
Mark Rutland [Wed, 3 May 2017 15:09:35 +0000 (16:09 +0100)]
arm64: uaccess: ensure extension of access_ok() addr
Our access_ok() simply hands its arguments over to __range_ok(), which
implicitly assummes that the addr parameter is 64 bits wide. This isn't
necessarily true for compat code, which might pass down a 32-bit address
parameter.
In these cases, we don't have a guarantee that the address has been zero
extended to 64 bits, and the upper bits of the register may contain
unknown values, potentially resulting in a suprious failure.
Avoid this by explicitly casting the addr parameter to an unsigned long
(as is done on other architectures), ensuring that the parameter is
widened appropriately.
Mark Rutland [Wed, 3 May 2017 15:09:34 +0000 (16:09 +0100)]
arm64: ensure extension of smp_store_release value
When an inline assembly operand's type is narrower than the register it
is allocated to, the least significant bits of the register (up to the
operand type's width) are valid, and any other bits are permitted to
contain any arbitrary value. This aligns with the AAPCS64 parameter
passing rules.
Our __smp_store_release() implementation does not account for this, and
implicitly assumes that operands have been zero-extended to the width of
the type being stored to. Thus, we may store unknown values to memory
when the value type is narrower than the pointer type (e.g. when storing
a char to a long).
This patch fixes the issue by casting the value operand to the same
width as the pointer operand in all cases, which ensures that the value
is zero-extended as we expect. We use the same union trickery as
__smp_load_acquire and {READ,WRITE}_ONCE() to avoid GCC complaining that
pointers are potentially cast to narrower width integers in unreachable
paths.
A whitespace issue at the top of __smp_store_release() is also
corrected.
No changes are necessary for __smp_load_acquire(). Load instructions
implicitly clear any upper bits of the register, and the compiler will
only consider the least significant bits of the register as valid
regardless.
Fixes: 47933ad41a86 ("arch: Introduce smp_load_acquire(), smp_store_release()") Fixes: 878a84d5a8a1 ("arm64: add missing data types in smp_load_acquire/smp_store_release") Cc: <[email protected]> # 3.14.x- Acked-by: Will Deacon <[email protected]> Signed-off-by: Mark Rutland <[email protected]> Cc: Matthias Kaehlcke <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
Mark Rutland [Wed, 3 May 2017 15:09:33 +0000 (16:09 +0100)]
arm64: xchg: hazard against entire exchange variable
The inline assembly in __XCHG_CASE() uses a +Q constraint to hazard
against other accesses to the memory location being exchanged. However,
the pointer passed to the constraint is a u8 pointer, and thus the
hazard only applies to the first byte of the location.
GCC can take advantage of this, assuming that other portions of the
location are unchanged, as demonstrated with the following test case:
union u {
unsigned long l;
unsigned int i[2];
};
unsigned long update_char_hazard(union u *u)
{
unsigned int a, b;
a = u->i[1];
asm ("str %1, %0" : "+Q" (*(char *)&u->l) : "r" (0UL));
b = u->i[1];
return a ^ b;
}
unsigned long update_long_hazard(union u *u)
{
unsigned int a, b;
a = u->i[1];
asm ("str %1, %0" : "+Q" (*(long *)&u->l) : "r" (0UL));
b = u->i[1];
return a ^ b;
}
The linaro 15.08 GCC 5.1.1 toolchain compiles the above as follows when
using -O2 or above:
This patch fixes the issue by passing an unsigned long pointer into the
+Q constraint, as we do for our cmpxchg code. This may hazard against
more than is necessary, but this is better than missing a necessary
hazard.
Some kernel features don't currently work if a task puts a non-zero
address tag in its stack pointer, frame pointer, or frame record entries
(FP, LR).
For example, with a tagged stack pointer, the kernel can't deliver
signals to the process, and the task is killed instead. As another
example, with a tagged frame pointer or frame records, perf fails to
generate call graphs or resolve symbols.
For now, just document these limitations, instead of finding and fixing
everything that doesn't work, as it's not known if anyone needs to use
tags in these places anyway.
In addition, as requested by Dave Martin, generalize the limitations
into a general kernel address tag policy, and refactor
tagged-pointers.txt to include it.
arm64: entry: improve data abort handling of tagged pointers
When handling a data abort from EL0, we currently zero the top byte of
the faulting address, as we assume the address is a TTBR0 address, which
may contain a non-zero address tag. However, the address may be a TTBR1
address, in which case we should not zero the top byte. This patch fixes
that. The effect is that the full TTBR1 address is passed to the task's
signal handler (or printed out in the kernel log).
When handling a data abort from EL1, we leave the faulting address
intact, as we assume it's either a TTBR1 address or a TTBR0 address with
tag 0x00. This is true as far as I'm aware, we don't seem to access a
tagged TTBR0 address anywhere in the kernel. Regardless, it's easy to
forget about address tags, and code added in the future may not always
remember to remove tags from addresses before accessing them. So add tag
handling to the EL1 data abort handler as well. This also makes it
consistent with the EL0 data abort handler.
arm64: hw_breakpoint: fix watchpoint matching for tagged pointers
When we take a watchpoint exception, the address that triggered the
watchpoint is found in FAR_EL1. We compare it to the address of each
configured watchpoint to see which one was hit.
The configured watchpoint addresses are untagged, while the address in
FAR_EL1 will have an address tag if the data access was done using a
tagged address. The tag needs to be removed to compare the address to
the watchpoints.
Currently we don't remove it, and as a result can report the wrong
watchpoint as being hit (specifically, always either the highest TTBR0
watchpoint or lowest TTBR1 watchpoint). This patch removes the tag.
arm64: traps: fix userspace cache maintenance emulation on a tagged pointer
When we emulate userspace cache maintenance in the kernel, we can
currently send the task a SIGSEGV even though the maintenance was done
on a valid address. This happens if the address has a non-zero address
tag, and happens to not be mapped in.
When we get the address from a user register, we don't currently remove
the address tag before performing cache maintenance on it. If the
maintenance faults, we end up in either __do_page_fault, where find_vma
can't find the VMA if the address has a tag, or in do_translation_fault,
where the tagged address will appear to be above TASK_SIZE. In both
cases, the address is not mapped in, and the task is sent a SIGSEGV.
This patch removes the tag from the address before using it. With this
patch, the fault is handled correctly, the address gets mapped in, and
the cache maintenance succeeds.
As a second bug, if cache maintenance (correctly) fails on an invalid
tagged address, the address gets passed into arm64_notify_segfault,
where find_vma fails to find the VMA due to the tag, and the wrong
si_code may be sent as part of the siginfo_t of the segfault. With this
patch, the correct si_code is sent.
Linus Torvalds [Tue, 9 May 2017 16:20:16 +0000 (09:20 -0700)]
Merge tag 'armsoc-fixes-nc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull misc ARM SoC fixes from Olof Johansson:
"ARM SoC non-urgent fixes for merge window
Smaller patches that didn't seem to find a home in other branches, and
low-priority fixes from late in the merge window. A number of these
are MAINTAINER updates, it seems.
Highlights:
* Maintainers:
- Remove Alexandre Courbot and Stephen Warren from Tegra
maintainership, add Jon Hunter
- Remove Stephen Warren and add Stefan Wahren to bcm2835
- Tweaks for file flagging for Marvell Dove
* Fixes:
- For two non-common-clk platform, handle clk_disable with NULL arg
- Remove redundant Kconfig select for Oxnas"
* tag 'armsoc-fixes-nc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: mmp: let clk_disable() return immediately if clk is NULL
ARM: w90x900: let clk_disable() return immediately if clk is NULL
MAINTAINERS: Add file patterns for dove device tree bindings
ARM: oxnas: remove redundant select CPU_V6K
MAINTAINERS: tegra: Remove self as maintainer
MAINTAINERS: tegra: Replace Stephen with Jon
MAINTAINERS: Add Stefan Wahren to bcm2835.
MAINTAINERS: remove swarren from bcm2835
MAINTAINERS: Add Jon Mason to BCM5301X maintainers
Linus Torvalds [Tue, 9 May 2017 16:12:53 +0000 (09:12 -0700)]
Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
"Assorted bits and pieces from various people. No common topic in this
pile, sorry"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/affs: add rename exchange
fs/affs: add rename2 to prepare multiple methods
Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()
fs: don't set *REFERENCED on single use objects
fs: compat: Remove warning from COMPATIBLE_IOCTL
remove pointless extern of atime_need_update_rcu()
fs: completely ignore unknown open flags
fs: add a VALID_OPEN_FLAGS
fs: remove _submit_bh()
fs: constify tree_descr arrays passed to simple_fill_super()
fs: drop duplicate header percpu-rwsem.h
fs/affs: bugfix: Write files greater than page size on OFS
fs/affs: bugfix: enable writes on OFS disks
fs/affs: remove node generation check
fs/affs: import amigaffs.h
fs/affs: bugfix: make symbolic links work again
Dan Williams [Mon, 8 May 2017 19:33:53 +0000 (12:33 -0700)]
device-dax: kill NR_DEV_DAX
There is no point to ask how many device-dax instances the kernel should
support. Since we are already using a dynamic major number, just allow
the max number of minors by default and be done. This also fixes the
fact that the proposed max for the NR_DEV_DAX range was larger than what
could be supported by alloc_chrdev_region().
Linus Torvalds [Tue, 9 May 2017 15:45:16 +0000 (08:45 -0700)]
proc: try to remove use of FOLL_FORCE entirely
We fixed the bugs in it, but it's still an ugly interface, so let's see
if anybody actually depends on it. It's entirely possible that nothing
actually requires the whole "punch through read-only mappings"
semantics.
For example, gdb definitely uses the /proc/<pid>/mem interface, but it
looks like it mainly does it for regular reads of the target (that don't
need FOLL_FORCE), and looking at the gdb source code seems to fall back
on the traditional ptrace(PTRACE_POKEDATA) interface if it needs to.
If this breaks something, I do have a (more complex) version that only
enables FOLL_FORCE when somebody has PTRACE_ATTACH'ed to the target,
like the comment here used to say ("Maybe we should limit FOLL_FORCE to
actual ptrace users?").
David S. Miller [Tue, 9 May 2017 15:24:24 +0000 (11:24 -0400)]
Merge branch 'qed-general-fixes'
Yuval Mintz says:
====================
qed*: General fixes
This series contain several fixes for qed and qede.
- #1 [and ~#5] relate to XDP cleanups
- #2 and #5 correct VF behavior
- #3 and #4 fix and add missing configurations needed for RoCE & storage
====================
Mintz, Yuval [Tue, 9 May 2017 12:07:51 +0000 (15:07 +0300)]
qede: Split PF/VF ndos.
PFs and VFs share the same structure of NDOs today,
and the VFs explicitly fails the ndo_xdp() callback stating
it doesn't support XDP.
This results in lots of:
[qede_xdp:1032(enp131s2)]VFs don't support XDP
------------[ cut here ]------------
WARNING: CPU: 4 PID: 1426 at net/core/rtnetlink.c:1637 rtnl_dump_ifinfo+0x354/0x3c0
...
Call Trace:
? __alloc_skb+0x9b/0x1d0
netlink_dump+0x122/0x290
netlink_recvmsg+0x27d/0x430
sock_recvmsg+0x3d/0x50
...
As every dump request for the VF interface info would fail due to
rtnl_xdp_fill() returning an error code.
To resolve this, introduce a subset of the NDOs meant for the VF
in a seperate structure and register that one instead for VFs,
and omit the ndo_xdp initialization.
Fixes: 40b8c45492ef ("qede: Prevent VFs from using XDP") Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Ram Amrani [Tue, 9 May 2017 12:07:50 +0000 (15:07 +0300)]
qed: Correct doorbell configuration for !4Kb pages
When configuring the doorbell DPI address, driver aligns the start
address to 4KB [HW-pages] instead of host PAGE_SIZE.
As a result, RoCE applications might receive addresses which are
unaligned to pages [when PAGE_SIZE > 4KB], which is a security risk.
Mintz, Yuval [Tue, 9 May 2017 12:07:49 +0000 (15:07 +0300)]
qed: Tell QM the number of tasks
Driver doesn't pass the number of tasks to the QM init logic
which would cause back-pressure in scenarios requiring many tasks
[E.g., using max MRs] and thus reduced performance.
When (re|un)loading, Tx-queues belonging to XDP would not get freed.
Fixes: cb6aeb079294 ("qede: Add support for XDP_TX") Signed-off-by: Sudarsana Reddy Kalluru <[email protected]> Signed-off-by: Yuval Mintz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net/mlx4_core: Reduce harmless SRIOV error message to debug level
Under SRIOV resource management, extra counters are allocated to VFs
from a free pool. If that pool is empty, the ALLOC_RES command for
a counter resource fails -- and this generates a misleading error
message in the message log.
Under SRIOV, each VF is allocated (i.e., guaranteed) 2 counters --
one counter per port. For ETH ports, the RoCE driver requests an
additional counter (above the guaranteed counters). If that request
fails, the VF RoCE driver simply uses the default (i.e., guaranteed)
counter for that port.
Thus, failing to allocate an additional counter does not constitute
a problem, and the error message on the PF when this occurs should
be reduced to debug level.
Finally, to identify the situation that the reason for the failure is
that no resources are available to grant to the VF, we modified the
error returned by mlx4_grant_resource to -EDQUOT (Quota exceeded),
which more accurately describes the error.
Fixes: c3abb51bdb0e ("IB/mlx4: Add RoCE/IB dedicated counters") Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
A known weakness in ptr_ring design is that it does not handle well the
situation when ring is almost full: as entries are consumed they are
immediately used again by the producer, so consumer and producer are
writing to a shared cache line.
To fix this, add batching to consume calls: as entries are
consumed do not write NULL into the ring until we get
a multiple (in current implementation 2x) of cache lines
away from the producer. At that point, write them all out.
We do the write out in the reverse order to keep
producer from sharing cache with consumer for as long
as possible.
Writeout also triggers when ring wraps around - there's
no special reason to do this but it helps keep the code
a bit simpler.
What should we do if getting away from producer by 2 cache lines
would mean we are keeping the ring moe than half empty?
Maybe we should reduce the batching in this case,
current patch simply reduces the batching.
Notes:
- it is no longer true that a call to consume guarantees
that the following call to produce will succeed.
No users seem to assume that.
- batching can also in theory reduce the signalling rate:
users that would previously send interrups to the producer
to wake it up after consuming each entry would now only
need to do this once in a batch.
Doing this would be easy by returning a flag to the caller.
No users seem to do signalling on consume yet so this was not
implemented yet.
Nicholas Piggin [Tue, 9 May 2017 03:16:52 +0000 (13:16 +1000)]
powerpc/64s: Support new device tree binding for discovering CPU features
The ibm,powerpc-cpu-features device tree binding describes CPU features with
ASCII names and extensible compatibility, privilege, and enablement metadata
that allows improved flexibility and compatibility with new hardware.
The interface is described in detail in ibm,powerpc-cpu-features.txt in this
patch.
Currently this code is not enabled by default, and there are no released
firmwares that provide the binding.
Nicholas Piggin [Tue, 9 May 2017 03:17:08 +0000 (13:17 +1000)]
powerpc: Don't print cpu_spec->cpu_name if it's NULL
Currently we assume that if the cpu_spec has a pvr_mask then it must also have a
cpu_name. But that will change in a subsequent commit when we do CPU feature
discovery via the device tree, so check explicitly if cpu_name is NULL.
Nicholas Piggin [Tue, 18 Apr 2017 19:12:18 +0000 (05:12 +1000)]
of/fdt: introduce of_scan_flat_dt_subnodes and of_get_flat_dt_phandle
Introduce primitives for FDT parsing. These will be used for powerpc
cpufeatures node scanning, which has quite complex structure but should
be processed early.
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:
"Includes a fix for a powerpc/next mm regression on 64e, a fix for a
kernel hang on 64e when using a debugger inside a relocated kernel, a
qman fix, and misc qe improvements."
Paolo Bonzini [Tue, 9 May 2017 10:51:49 +0000 (12:51 +0200)]
Merge tag 'kvm-arm-for-v4.12-round2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
Second round of KVM/ARM Changes for v4.12.
Changes include:
- A fix related to the 32-bit idmap stub
- A fix to the bitmask used to deode the operands of an AArch32 CP
instruction
- We have moved the files shared between arch/arm/kvm and
arch/arm64/kvm to virt/kvm/arm
- We add support for saving/restoring the virtual ITS state to
userspace
KVM: arm/arm64: Don't call map_resources when restoring ITS tables
The only reason we called kvm_vgic_map_resources() when restoring the
ITS tables was because we wanted to have the KVM iodevs registered in
the KVM IO bus framework at the time when the ITS was restored such that
a restored and active device can inject MSIs prior to otherwise calling
kvm_vgic_map_resources() from the first run of a VCPU.
Since we now register the KVM iodevs for the redestributors and ITS as
soon as possible (when setting the base addresses), we no longer need
this call and kvm_vgic_map_resources() is again called only when first
running a VCPU.
KVM: arm/arm64: Register ITS iodev when setting base address
We have to register the ITS iodevice before running the VM, because in
migration scenarios, we may be restoring a live device that wishes to
inject MSIs before the VCPUs have started.
All we need to register the ITS io device is the base address of the
ITS, so we can simply register that when the base address of the ITS is
set.
[ Code to fix concurrency issues when setting the ITS base address and
to fix the undef base address check written by Marc Zyngier ]
Marc Zyngier [Mon, 8 May 2017 17:15:40 +0000 (18:15 +0100)]
KVM: arm/arm64: Get rid of its->initialized field
The its->initialized doesn't bring much to the table, and creates
unnecessary ordering between setting the address and initializing it
(which amounts to exactly nothing).
Let's kill it altogether, making KVM_DEV_ARM_VGIC_CTRL_INIT the no-op
it deserves to be.
KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUs
Instead of waiting with registering KVM iodevs until the first VCPU is
run, we can actually create the iodevs when the redist base address is
set. The only downside is that we must now also check if we need to do
this for VCPUs which are created after creating the VGIC, because there
is no enforced ordering between creating the VGIC (and setting its base
addresses) and creating the VCPUs.
As we are about to handle setting the address for the redistributor base
region separately from some of the other base addresses, let's rework
this function to leave a little more room for being flexible in what
each type of base address does.
KVM: arm/arm64: Make vgic_v3_check_base more broadly usable
As we are about to fiddle with the IO device registration mechanism,
let's be a little more careful when setting base addresses as early as
possible. When setting a base address, we can check that there's
address space enough for its scope and when the last of the two
base addresses (dist and redist) get set, we can also check if the
regions overlap at that time.
This allows us to provide error messages to the user at time when trying
to set the base address, as opposed to later when trying to run the VM.
To do this, we make vgic_v3_check_base available in the core vgic-v3
code as well as in the other parts of the GICv3 code, namely the MMIO
config code.
We also return true for undefined base addresses so that the function
can be used before all base addresses are set; all callers already check
for uninitialized addresses before calling this function.
Split out the function to register all the redistributor iodevs into a
function that handles a single redistributor at a time in preparation
for being able to call this per VCPU as these get created.
KVM: Add kvm_vcpu_get_idx to get vcpu index in kvm->vcpus
There are occasional needs to use the index of vcpu in the kvm->vcpus
array to map something related to a VCPU. For example, unlike the
vcpu->vcpu_id, the vcpu index is guaranteed to not be sparse across all
vcpus which is useful when allocating a memory area for each vcpu.
Bandan Das [Fri, 5 May 2017 19:25:15 +0000 (15:25 -0400)]
nVMX: Advertise PML to L1 hypervisor
Advertise the PML bit in vmcs12 but don't try to enable
it in hardware when running L2 since L0 is emulating it. Also,
preserve L0's settings for PML since it may still
want to log writes.
With EPT A/D enabled, processor access to L2 guest
paging structures will result in a write violation.
When this happens, write the GUEST_PHYSICAL_ADDRESS
to the pml buffer provided by L1 if the access is
write and the dirty bit is being set.
This patch also adds necessary checks during VMEntry if L1
has enabled PML. If the PML index overflows, we change the
exit reason and run L1 to simulate a PML full event.
Jim Mattson [Fri, 5 May 2017 18:28:09 +0000 (11:28 -0700)]
kvm: nVMX: Validate CR3 target count on nested VM-entry
According to the SDM, the CR3-target count must not be greater than
4. Future processors may support a different number of CR3-target
values. Software should read the VMX capability MSR IA32_VMX_MISC to
determine the number of values supported.
Paolo Bonzini [Tue, 9 May 2017 09:50:01 +0000 (11:50 +0200)]
Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
The main thing here is a new implementation of the in-kernel
XICS interrupt controller emulation for POWER9 machines, from Ben
Herrenschmidt.
POWER9 has a new interrupt controller called XIVE (eXternal Interrupt
Virtualization Engine) which is able to deliver interrupts directly
to guest virtual CPUs in hardware without hypervisor intervention.
With this new code, the guest still sees the old XICS interface but
performance is better because the XICS emulation in the host uses the
XIVE directly rather than going through a XICS emulation in firmware.
Geliang Tang [Sat, 6 May 2017 15:37:19 +0000 (23:37 +0800)]
KVM: set no_llseek in stat_fops_per_vm
In vm_stat_get_per_vm_fops and vcpu_stat_get_per_vm_fops, since we
use nonseekable_open() to open, we should use no_llseek() to seek,
not generic_file_llseek().
Similarly to commit 2563a70c3b ("powerpc/64s: Remove unnecessary relocation
branch from idle handler"), the machine check handler has a BRANCH_TO from
relocated to relocated code, which is unnecessary.
It has also caused build errors with some toolchains:
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:395: Error: operand out of range
(0xffffffffffff8280 is not between 0x0000000000000000 and
0x000000000000ffff)
Fixes: 1945bc4549e5 ("powerpc/64s: Fix POWER9 machine check handler from stop state") Signed-off-by: Nicholas Piggin <[email protected]>
Reported-and-tested-by : Abdul Haleem <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
powerpc/mm/book3s/64: Rework page table geometry for lower memory usage
Recently in commit f6eedbba7a26 ("powerpc/mm/hash: Increase VA range to 128TB")
we increased the virtual address space for user processes to 128TB by default,
and up to 512TB if user space opts in.
This obviously required expanding the range of the Linux page tables. For Book3s
64-bit using hash and with PAGE_SIZE=64K, we increased the PGD to 2^15 entries.
This meant we could cover the full address range, while still being able to
insert a 16G hugepage at the PGD level and a 16M hugepage in the PMD.
The downside of that geometry is that it uses a lot of memory for the PGD, and
in particular makes the PGD a 4-page allocation, which means it's much more
likely to fail under memory pressure.
Instead we can make the PMD larger, so that a single PUD entry maps 16G,
allowing the 16G hugepages to sit at that level in the tree. We're then able to
split the remaining bits between the PUG and PGD. We make the PGD slightly
larger as that results in lower memory usage for typical programs.
When THP is enabled the PMD actually doubles in size, to 2^11 entries, or 2^14
bytes, which is large but still < PAGE_SIZE.
Horia Geantă [Mon, 8 May 2017 08:50:16 +0000 (11:50 +0300)]
powerpc: Fix distclean with Makefile.postlink
Makefile.postlink always includes include/config/auto.conf, however
this file is not present in a clean kernel tree, causing make to fail:
$ git clone linuxppc.git
$ cd linuxppc.git
$ make distclean
arch/powerpc/Makefile.postlink:10: include/config/auto.conf: No such file or directory
make[1]: *** No rule to make target `include/config/auto.conf'. Stop.
make: *** [vmlinuxclean] Error 2
Equally running 'make distclean; make distclean' will trip the error case.
Change the inclusion such that file not being found does not trigger an error.
Fixes: f188d0524d7e ("powerpc: Use the new post-link pass to check relocations") Reported-by: Mircea Pop <[email protected]> Signed-off-by: Horia Geantă <[email protected]> Tested-by: Justin M. Forbes <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
KVM: arm/arm64: vgic: Rename kvm_vgic_vcpu_init to kvm_vgic_vcpu_enable
This function really doesn't init anything, it enables the CPU
interface, so name it as such, which gives us the name to use for actual
init work later on.
arch_timer_mem_find_best_frame() looks through ARCH_TIMER_MEM_MAX_FRAMES
frames even after finding matches to ensure the best frame is chosen,
which means the variable frame will point to the last valid frame but
not necessarily the best frame.
On Juno, we get the following error as the wrong frame is returned as the
best frame from arch_timer_mem_find_best_frame():
arch_timer: Unable to map frame @ 0x0000000000000000
arch_timer: Frame missing phys irq.
Failed to initialize '/timer@2a810000': -22
Fix the issue by correctly returning the best frame from
arch_timer_mem_find_best_frame().
Andy Lutomirski [Tue, 9 May 2017 00:09:10 +0000 (17:09 -0700)]
x86/boot/32: Fix UP boot on Quark and possibly other platforms
This partially reverts commit:
23b2a4ddebdd17f ("x86/boot/32: Defer resyncing initial_page_table until per-cpu is set up")
That commit had one definite bug and one potential bug. The
definite bug is that setup_per_cpu_areas() uses a differnet generic
implementation on UP kernels, so initial_page_table never got
resynced. This was fine for access to percpu data (it's in the
identity map on UP), but it breaks other users of
initial_page_table. The potential bug is that helpers like
efi_init() would be called before the tables were synced.
Avoid both problems by just syncing the page tables in setup_arch()
*and* setup_per_cpu_areas().
Laura Abbott [Mon, 8 May 2017 21:23:16 +0000 (14:23 -0700)]
x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
'__vmalloc_start_set' currently only gets set in initmem_init() when
!CONFIG_NEED_MULTIPLE_NODES. This breaks detection of vmalloc address
with virt_addr_valid() with CONFIG_NEED_MULTIPLE_NODES=y, causing
a kernel crash:
[mm/usercopy] 517e1fbeb6: kernel BUG at arch/x86/mm/physaddr.c:78!
Set '__vmalloc_start_set' appropriately for that case as well.
Linus Torvalds [Tue, 9 May 2017 03:43:30 +0000 (20:43 -0700)]
Merge tag 'linux-kselftest-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
"This update consists of:
- important fixes for build failures and clean target related
warnings to address regressions introduced in commit 88baa78d1f31
("selftests: remove duplicated all and clean target")
- several minor spelling fixes in and log messages and comment
blocks.
- Enabling configs for better test coverage in ftrace, vm, and
cpufreq tests.
- .gitignore changes"
* tag 'linux-kselftest-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (26 commits)
selftests: x86: add missing executables to .gitignore
selftests: watchdog: accept multiple params on command line
selftests: create cpufreq kconfig fragments
selftests: x86: override clean in lib.mk to fix warnings
selftests: sync: override clean in lib.mk to fix warnings
selftests: splice: override clean in lib.mk to fix warnings
selftests: gpio: fix clean target to remove all generated files and dirs
selftests: add gpio generated files to .gitignore
selftests: powerpc: override clean in lib.mk to fix warnings
selftests: gpio: override clean in lib.mk to fix warnings
selftests: futex: override clean in lib.mk to fix warnings
selftests: lib.mk: define CLEAN macro to allow Makefiles to override clean
selftests: splice: fix clean target to not remove default_file_splice_read.sh
selftests: gpio: add config fragment for gpio-mockup
selftests: breakpoints: allow to cross-compile for aarch64/arm64
selftests/Makefile: Add missed PHONY targets
selftests/vm/run_vmtests: Fix wrong comment
selftests/Makefile: Add missed closing `"` in comment
selftests/vm/run_vmtests: Polish output text
selftests/timers: fix spelling mistake: "Asynchronous"
...
Linus Torvalds [Tue, 9 May 2017 03:36:38 +0000 (20:36 -0700)]
Merge tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull more tracing updates from Steven Rostedt:
"These are three simple changes.
The first one is just a switch from using strcpy() to strlcpy().
Someone thought that it may cause an overflow bug, but since it only
copies comms into a pre-allocated array of TASK_COMM_LEN, and no comm
should ever be bigger than that, nor not end with a nul character,
this change is more of a safety precaution than fixing anything that
is actually broken.
The other two changes are simply cleaning and optimizing some code"
* tag 'trace-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Simplify ftrace_match_record() even more
ftrace: Remove an unneeded condition
tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()
Linus Torvalds [Tue, 9 May 2017 03:07:29 +0000 (20:07 -0700)]
Merge tags 'for-linus' and 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull more rdma updates from Doug Ledford:
"As mentioned in my first pull request, this is the subsequent pull
requests I had. This is all I have, and in fact this cleans out the
RDMA subsystem's entire patchworks queue of kernel changes that are
ready to go (well, it did for the weekend anyway, a few new patches
are in, but they'll be coming during the -rc cycle).
The first tag contains a single patch that would have conflicted if
taken from my tree or DaveM's tree as it needed our trees merged to
come cleanly.
The second tag contains the patch series from Intel plus three other
stragllers that came in late last week. I took them because it allowed
me to legitimately claim that the RDMA patchworks queue was, for a
short time, 100% cleared of all waiting kernel patches, woohoo! :-).
I have it under my for-next tag, so it did get 0day and linux- next
over the end of last week, and linux-next did show one minor conflict.
Summary:
'for-linus' tag:
- mlx5/IPoIB fixup patch
'for-next' tag:
- the hfi1 15 patch set that landed late
- IPoIB get_link_ksettings which landed late because I asked for a
respin
- one late rxe change
- one -rc worthy fix that's in early"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/mlx5: Enable IPoIB acceleration
* tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
rxe: expose num_possible_cpus() cnum_comp_vectors
IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type
IB/hfi1: Clean up on context initialization failure
IB/hfi1: Fix an assign/ordering issue with shared context IDs
IB/hfi1: Clean up context initialization
IB/hfi1: Correctly clear the pkey
IB/hfi1: Search shared contexts on the opened device, not all devices
IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit
IB/hfi1: Use filedata rather than filepointer
IB/hfi1: Name function prototype parameters
IB/hfi1: Fix a subcontext memory leak
IB/hfi1: Return an error on memory allocation failure
IB/hfi1: Adjust default eager_buffer_size to 8MB
IB/hfi1: Get rid of divide when setting the tx request header
IB/hfi1: Fix yield logic in send engine
IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line
IB/hfi1: Fix checks for Offline transient state
IB/ipoib: add get_link_ksettings in ethtool
- add ITE 8893 bridge DMA alias quirk (Jarod Wilson)
- restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
(Manish Jaggi)
* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
PCI: Don't allow unbinding host controllers that aren't prepared
ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
MAINTAINERS: Add PCI Endpoint maintainer
Documentation: PCI: Add userguide for PCI endpoint test function
tools: PCI: Add sample test script to invoke pcitest
tools: PCI: Add a userspace tool to test PCI endpoint
Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
misc: Add host side PCI driver for PCI test function device
PCI: Add device IDs for DRA74x and DRA72x
dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
PCI: dwc: dra7xx: Workaround for errata id i870
dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
PCI: dwc: dra7xx: Add EP mode support
PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
dt-bindings: PCI: Add DT bindings for PCI designware EP mode
PCI: dwc: designware: Add EP mode support
Documentation: PCI: Add binding documentation for pci-test endpoint function
ixgbe: Use pcie_flr() instead of duplicating it
IB/hfi1: Use pcie_flr() instead of duplicating it
PCI: imx6: Fix spelling mistake: "contol" -> "control"
...
Linus Torvalds [Tue, 9 May 2017 01:49:23 +0000 (18:49 -0700)]
Merge tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here is the "big" TTY/Serial patch updates for 4.12-rc1
Not a lot of new things here, the normal number of serial driver
updates and additions, tiny bugs fixed, and some core files split up
to make future changes a bit easier for Nicolas's "tiny-tty" work.
All of these have been in linux-next for a while"
* tag 'tty-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (62 commits)
serial: small Makefile reordering
tty: split job control support into a file of its own
tty: move baudrate handling code to a file of its own
console: move console_init() out of tty_io.c
serial: 8250_early: Add earlycon support for Palmchip UART
tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44
vt: make mouse selection of non-ASCII consistent
vt: set mouse selection word-chars to gpm's default
imx-serial: Reduce RX DMA startup latency when opening for reading
serial: omap: suspend device on probe errors
serial: omap: fix runtime-pm handling on unbind
tty: serial: omap: add UPF_BOOT_AUTOCONF flag for DT init
serial: samsung: Remove useless spinlock
serial: samsung: Add missing checks for dma_map_single failure
serial: samsung: Use right device for DMA-mapping calls
serial: imx: setup DCEDTE early and ensure DCD and RI irqs to be off
tty: fix comment typo s/repsonsible/responsible/
tty: amba-pl011: Fix spurious TX interrupts
serial: xuartps: Enable clocks in the pm disable case also
serial: core: Re-use struct uart_port {name} field
...
Linus Torvalds [Tue, 9 May 2017 01:17:56 +0000 (18:17 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
- the rest of MM
- various misc things
- procfs updates
- lib/ updates
- checkpatch updates
- kdump/kexec updates
- add kvmalloc helpers, use them
- time helper updates for Y2038 issues. We're almost ready to remove
current_fs_time() but that awaits a btrfs merge.
- add tracepoints to DAX
* emailed patches from Andrew Morton <[email protected]>: (114 commits)
drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
selftests/vm: add a test for virtual address range mapping
dax: add tracepoint to dax_insert_mapping()
dax: add tracepoint to dax_writeback_one()
dax: add tracepoints to dax_writeback_mapping_range()
dax: add tracepoints to dax_load_hole()
dax: add tracepoints to dax_pfn_mkwrite()
dax: add tracepoints to dax_iomap_pte_fault()
mtd: nand: nandsim: convert to memalloc_noreclaim_*()
treewide: convert PF_MEMALLOC manipulations to new helpers
mm: introduce memalloc_noreclaim_{save,restore}
mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
mm/huge_memory.c: use zap_deposited_table() more
time: delete CURRENT_TIME_SEC and CURRENT_TIME
gfs2: replace CURRENT_TIME with current_time
apparmorfs: replace CURRENT_TIME with current_time()
lustre: replace CURRENT_TIME macro
fs: ubifs: replace CURRENT_TIME_SEC with current_time
fs: ufs: use ktime_get_real_ts64() for birthtime
...
Andrew Morton [Mon, 8 May 2017 23:00:22 +0000 (16:00 -0700)]
drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
drivers/staging/ccree/ssi_hash.c:1990: error: unknown field 'template_ahash' specified in initializer
drivers/staging/ccree/ssi_hash.c:1991: error: unknown field 'init' specified in initializer
drivers/staging/ccree/ssi_hash.c:1991: warning: missing braces around initializer
drivers/staging/ccree/ssi_hash.c:1991: warning: (near initialization for 'driver_hash[0].<anonymous>.template_ahash')
drivers/staging/ccree/ssi_hash.c:1992: error: unknown field 'update' specified in initializer
drivers/staging/ccree/ssi_hash.c:1992: warning: excess elements in union initializer
drivers/staging/ccree/ssi_hash.c:1992: warning: (near initialization for 'driver_hash[0].<anonymous>')
drivers/staging/ccree/ssi_hash.c:1993: error: unknown field 'final' specified in initializer
drivers/staging/ccree/ssi_hash.c:1993: warning: excess elements in union initializer
drivers/staging/ccree/ssi_hash.c:1993: warning: (near initialization for 'driver_hash[0].<anonymous>')
...
gcc-4.4.4 has issues with anon union initializers. Work around this.
selftests/vm: add a test for virtual address range mapping
This verifies virtual address mapping below and above the 128TB range
and makes sure that address returned are within the expected range
depending upon the hint passed from the user space.
Ross Zwisler [Mon, 8 May 2017 23:00:16 +0000 (16:00 -0700)]
dax: add tracepoint to dax_insert_mapping()
Add a tracepoint to dax_insert_mapping(), following the same logging
conventions as the rest of DAX. This tracepoint, along with the one in
dax_load_hole(), lets us know how a DAX PTE fault was serviced.
Here is an example DAX fault that inserts a PTE mapping:
Ross Zwisler [Mon, 8 May 2017 23:00:00 +0000 (16:00 -0700)]
dax: add tracepoints to dax_iomap_pte_fault()
Patch series "second round of tracepoints for DAX".
This second round of DAX tracepoint patches adds tracing to the PTE
fault path (dax_iomap_pte_fault(), dax_pfn_mkwrite(), dax_load_hole(),
dax_insert_mapping()) and to the writeback path
(dax_writeback_mapping_range(), dax_writeback_one()).
The purpose of this tracing is to give us a high level view of what DAX
is doing, whether faults are being serviced by PMDs or PTEs, and by real
storage or by zero pages covering holes.
I do have some patches nearly ready which also add tracing to
grab_mapping_entry() and dax_insert_mapping_entry(). These are more
targeted at logging how we are interacting with the radix tree, how we
use empty entries for locking, whether we "downgrade" huge zero pages to
4k PTE sized allocations, etc. In the end it seemed to me that this
might be too detailed to have as constantly present tracepoints, but if
anyone sees value in having tracepoints like this in the DAX code
permanently (Jan?), please let me know and I'll add those last two
patches.
All these tracepoints were done to be consistent with the style of the
XFS tracepoints and with the existing DAX PMD tracepoints.
This patch (of 6):
Add tracepoints to dax_iomap_pte_fault(), following the same logging
conventions as the rest of DAX.
Here is an example fault that initially tries to be serviced by the PMD
fault handler but which falls back to PTEs because the VMA isn't large
enough to hold a PMD:
small-1086 [005] ....
71.140014: xfs_filemap_huge_fault: dev 259:0 ino 0x1003
Vlastimil Babka [Mon, 8 May 2017 22:59:57 +0000 (15:59 -0700)]
mtd: nand: nandsim: convert to memalloc_noreclaim_*()
Nandsim has own functions set_memalloc() and clear_memalloc() for robust
setting and clearing of PF_MEMALLOC. Replace them by the new generic
helpers. No functional change.
Vlastimil Babka [Mon, 8 May 2017 22:59:53 +0000 (15:59 -0700)]
treewide: convert PF_MEMALLOC manipulations to new helpers
We now have memalloc_noreclaim_{save,restore} helpers for robust setting
and clearing of PF_MEMALLOC. Let's convert the code which was using the
generic tsk_restore_flags(). No functional change.
Vlastimil Babka [Mon, 8 May 2017 22:59:50 +0000 (15:59 -0700)]
mm: introduce memalloc_noreclaim_{save,restore}
The previous patch ("mm: prevent potential recursive reclaim due to
clearing PF_MEMALLOC") has shown that simply setting and clearing
PF_MEMALLOC in current->flags can result in wrongly clearing a
pre-existing PF_MEMALLOC flag and potentially lead to recursive reclaim.
Let's introduce helpers that support proper nesting by saving the
previous stat of the flag, similar to the existing memalloc_noio_* and
memalloc_nofs_* helpers. Convert existing setting/clearing of
PF_MEMALLOC within mm to the new helpers.
There are no known issues with the converted code, but the change makes
it more robust.
Vlastimil Babka [Mon, 8 May 2017 22:59:46 +0000 (15:59 -0700)]
mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
Patch series "more robust PF_MEMALLOC handling"
This series aims to unify the setting and clearing of PF_MEMALLOC, which
prevents recursive reclaim. There are some places that clear the flag
unconditionally from current->flags, which may result in clearing a
pre-existing flag. This already resulted in a bug report that Patch 1
fixes (without the new helpers, to make backporting easier). Patch 2
introduces the new helpers, modelled after existing memalloc_noio_* and
memalloc_nofs_* helpers, and converts mm core to use them. Patches 3
and 4 convert non-mm code.
This patch (of 4):
__alloc_pages_direct_compact() sets PF_MEMALLOC to prevent deadlock
during page migration by lock_page() (see the comment in
__unmap_and_move()). Then it unconditionally clears the flag, which can
clear a pre-existing PF_MEMALLOC flag and result in recursive reclaim.
This was not a problem until commit a8161d1ed609 ("mm, page_alloc:
restructure direct compaction handling in slowpath"), because direct
compation was called only after direct reclaim, which was skipped when
PF_MEMALLOC flag was set.
Even now it's only a theoretical issue, as the new callsite of
__alloc_pages_direct_compact() is reached only for costly orders and
when gfp_pfmemalloc_allowed() is true, which means either
__GFP_NOMEMALLOC is in gfp_flags or in_interrupt() is true. There is no
such known context, but let's play it safe and make
__alloc_pages_direct_compact() robust for cases where PF_MEMALLOC is
already set.
mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
Although all architectures use a deposited page table for THP on
anonymous VMAs, some architectures (s390 and powerpc) require the
deposited storage even for file backed VMAs due to quirks of their MMUs.
This patch adds support for depositing a table in DAX PMD fault handling
path for archs that require it. Other architectures should see no
functional changes.
Depending on the flags of the PMD being zapped there may or may not be a
deposited pgtable to be freed. In two of the three cases this is open
coded while the third uses the zap_deposited_table() helper. This patch
converts the others to use the helper to clean things up a bit.
Deepa Dinamani [Mon, 8 May 2017 22:59:37 +0000 (15:59 -0700)]
time: delete CURRENT_TIME_SEC and CURRENT_TIME
All uses of CURRENT_TIME_SEC and CURRENT_TIME macros have been replaced
by other time functions. These macros are also not y2038 safe. And,
all their use cases can be fulfilled by y2038 safe ktime_get_* variants.
Deepa Dinamani [Mon, 8 May 2017 22:59:31 +0000 (15:59 -0700)]
apparmorfs: replace CURRENT_TIME with current_time()
CURRENT_TIME macro is not y2038 safe on 32 bit systems.
The patch replaces all the uses of CURRENT_TIME by current_time().
This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.
CURRENT_TIME macro will be deleted before merging the aforementioned
change.
Deepa Dinamani [Mon, 8 May 2017 22:59:28 +0000 (15:59 -0700)]
lustre: replace CURRENT_TIME macro
CURRENT_TIME macro is not y2038 safe on 32 bit systems.
The patch replaces all the uses of CURRENT_TIME by current_time() for
filesystem times, and ktime_get_* functions for others.
struct timespec is also not y2038 safe. Retain timespec for timestamp
representation here as lustre uses it internally everywhere. These
references will be changed to use struct timespec64 in a separate patch.
This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
current_time() is also planned to be transitioned to y2038 safe behavior
along with this change.
CURRENT_TIME macro will be deleted before merging the aforementioned
change.
Deepa Dinamani [Mon, 8 May 2017 22:59:25 +0000 (15:59 -0700)]
fs: ubifs: replace CURRENT_TIME_SEC with current_time
CURRENT_TIME_SEC is not y2038 safe. current_time() will be transitioned
to use 64 bit time along with vfs in a separate patch. There is no plan
to transition CURRENT_TIME_SEC to use y2038 safe time interfaces.
current_time() returns timestamps according to the granularities set in
the inode's super_block. The granularity check to call
current_fs_time() or CURRENT_TIME_SEC is not required.
Use current_time() directly to update inode timestamp. Use
timespec_trunc during file system creation, before the first inode is
created.
Deepa Dinamani [Mon, 8 May 2017 22:59:19 +0000 (15:59 -0700)]
fs: ceph: CURRENT_TIME with ktime_get_real_ts()
CURRENT_TIME is not y2038 safe. The macro will be deleted and all the
references to it will be replaced by ktime_get_* apis.
struct timespec is also not y2038 safe. Retain timespec for timestamp
representation here as ceph uses it internally everywhere. These
references will be changed to use struct timespec64 in a separate patch.
The current_fs_time() api is being changed to use vfs struct inode* as
an argument instead of struct super_block*.
Set the new mds client request r_stamp field using ktime_get_real_ts()
instead of using current_fs_time().
Also, since r_stamp is used as mtime on the server, use timespec_trunc()
to truncate the timestamp, using the right granularity from the
superblock.
This api will be transitioned to be y2038 safe along with vfs.
Deepa Dinamani [Mon, 8 May 2017 22:59:16 +0000 (15:59 -0700)]
fs: cifs: replace CURRENT_TIME by other appropriate apis
CURRENT_TIME macro is not y2038 safe on 32 bit systems.
The patch replaces all the uses of CURRENT_TIME by current_time() for
filesystem times, and ktime_get_* functions for authentication
timestamps and timezone calculations.
This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
CURRENT_TIME macro will be deleted before merging the aforementioned
change.
The inode timestamps read from the server are assumed to have correct
granularity and range.
The patch also assumes that the difference between server and client
times lie in the range INT_MIN..INT_MAX. This is valid because this is
the difference between current times between server and client, and the
largest timezone difference is in the range of one day.
All cifs timestamps currently use timespec representation internally.
Authentication and timezone timestamps can also be transitioned into
using timespec64 when all other timestamps for cifs is transitioned to
use timespec64.
Deepa Dinamani [Mon, 8 May 2017 22:59:13 +0000 (15:59 -0700)]
trace: make trace_hwlat timestamp y2038 safe
struct timespec is not y2038 safe on 32 bit machines and needs to be
replaced by struct timespec64 in order to represent times beyond year
2038 on such machines.
Fix all the timestamp representation in struct trace_hwlat and all the
corresponding implementations.