====================
This set fixes an out-of-bounds case under speculative execution
by implementing masking of pointer ALU in the verifier. For
details, please see the individual patches.
Thanks!
v2 -> v3:
- 8/9: change states_equal condition into old->speculative &&
!cur->speculative, thanks Jakub!
- 8/9: remove incorrect speculative state test in
propagate_liveness(), thanks Jakub!
v1 -> v2:
- Typo fixes in commit msg and a comment, thanks David!
====================
Daniel Borkmann [Wed, 2 Jan 2019 23:58:35 +0000 (00:58 +0100)]
bpf: add various test cases to selftests
Add various map value pointer related test cases to test_verifier
kselftest to reflect recent changes and improve test coverage. The
tests include basic masking functionality, unprivileged behavior
on pointer arithmetic which goes oob, mixed bounds tests, negative
unknown scalar but resulting positive offset for access and helper
range, handling of arithmetic from multiple maps, various masking
scenarios with subsequent map value access and others including two
test cases from Jann Horn for prior fixes.
Daniel Borkmann [Wed, 2 Jan 2019 23:58:34 +0000 (00:58 +0100)]
bpf: prevent out of bounds speculation on pointer arithmetic
Jann reported that the original commit back in b2157399cc98
("bpf: prevent out-of-bounds speculation") was not sufficient
to stop a CPU from speculating out of bounds memory access:
b2157399cc98 only focused on masking array map access for
unprivileged users for tail calls and data access, such that
the user-provided index gets sanitized from the BPF program
and syscall side. However, there is still a more generic form
affecting BPF programs that applies to most maps holding user
data: dynamic map access where an unknown scalar or a "slow"
known scalar serves as the access offset, for example:
- Load a map value pointer into R6
- Load an index into R7
- Do a slow computation (e.g. with a memory dependency) that
loads a limit into R8 (e.g. load the limit from a map for
high latency, then mask it to make the verifier happy)
- Exit if R7 >= R8 (mispredicted branch)
- Load R0 = R6[R7]
- Load R0 = R6[R0]
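The same gadget, rendered as a plain C sketch rather than BPF
registers (all names are hypothetical; illustrative only):
  #include <stdint.h>

  /* Plain C sketch of the gadget above: map_value plays R6, idx
   * plays R7, limit plays R8. */
  uint8_t gadget(uint8_t *map_value, uint64_t idx,
                 volatile uint64_t *slow_limit)
  {
          uint64_t limit = *slow_limit & 0xffff; /* slow load, masked to
                                                  * keep the verifier happy */
          if (idx >= limit)                      /* mispredicted branch */
                  return 0;

          uint64_t off = map_value[idx];         /* R0 = R6[R7]: speculative
                                                  * out of bounds load */
          return map_value[off];                 /* R0 = R6[R0]: transmits
                                                  * 'off' via the cache */
  }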
For unknown scalars there are two options in the BPF verifier
where we could derive knowledge from in order to guarantee
safe access to the memory: i) While the </>/<=/>= variants do
not allow deriving any lower or upper bounds from the unknown
scalar that would make it safe to add to the map value
pointer, this is possible through an ==/!= test. ii) Another
option is to transform the unknown scalar into a known scalar,
for example, through ALU ops combination such as R &= <imm>
followed by R |= <imm> or any similar combination where the
original information from the unknown scalar would be destroyed
entirely leaving R with a constant. The initial slow load still
precedes the latter ALU ops on that register, so the CPU
executes speculatively from that point. Once we have the known
scalar, any compare operation would work then. A third option
involving only registers with known scalars could be crafted
as described in [0], where a CPU port (e.g. the Slow Int unit)
is filled with many dependent computations such that a
subsequent condition depending on their outcome has to wait
for evaluation on that execution port, and the code behind the
condition executes speculatively meanwhile if it can be
scheduled on a different execution port; any other form of
mistraining as described in [1] works as well. Given this is
not limited to unknown scalars only, not just map but also
stack access is affected since both are accessible to
unprivileged users and could potentially be used for out of
bounds access under speculation.
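For instance, option ii) above could be realized like this (a
minimal C sketch; slow_ptr is a stand-in for the high-latency
source):
  #include <stdint.h>

  /* Option ii) sketch: destroy the unknown scalar's information so
   * the verifier sees a known constant, while the CPU's data
   * dependency on the slow load remains intact. */
  uint64_t make_known_scalar(volatile uint64_t *slow_ptr)
  {
          uint64_t r = *slow_ptr; /* unknown scalar, high-latency load */

          r &= 0x0;               /* verifier now tracks r == 0 */
          r |= 0x8;               /* ... and here r == 8, a known scalar */
          return r;               /* any compare on r passes verification */
  }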
In order to prevent any of these cases, the verifier is now
sanitizing pointer arithmetic on the offset such that any
out of bounds speculation would be masked in a way that the
pointer arithmetic result in the destination register stays
unchanged, meaning the offset is masked to zero, similar to
the array_index_nospec() case. With regards to implementation,
there are three options that were considered: i) new insn
for sanitation, ii) push/pop insn and sanitation as inlined
BPF, iii) reuse of ax register and sanitation as inlined BPF.
Option i) has the downside that we end up consuming reserved
bits in the opcode space, but also that we would require
each JIT to emit the masking as native arch opcodes, meaning
the mitigation would see slow adoption until everyone
implements it eventually, which is counter-productive. Options ii) and iii)
have both in common that a temporary register is needed in
order to implement the sanitation as inlined BPF since we
are not allowed to modify the source register. While a push /
pop insn in ii) would be useful to have in any case, it
once again requires that every JIT implement it first.
While possible, the amount of changes needed would also
be unsuitable for a -stable patch. Therefore, the path which
has fewer changes, less BPF instructions for the mitigation
and does not require anything to be changed in the JITs is
option iii) which this work is pursuing. The ax register is
already mapped to a register in all JITs (modulo arm32, where
it's mapped to the stack, as are various other BPF registers
there) and has so far been used only for constant blinding in
JITs. It can
be reused for verifier rewrites under certain constraints.
The interpreter's tmp "register" has therefore been remapped
into extending the register set with hidden ax register and
reusing that for a number of instructions that needed the
prior temporary variable internally (e.g. div, mod). This
allows for zero increase in stack space usage in the interpreter,
and enables (restricted) generic use in rewrites otherwise as
long as such a patchlet does not make use of these instructions.
The sanitation mask is dynamic and relative to the offset the
map value or stack pointer currently holds.
There are various cases that need to be taken under consideration
for the masking, e.g. such operation could look as follows:
ptr += val or val += ptr or ptr -= val. Thus, the value to be
sanitized could reside either in source or in destination
register, and the limit is different depending on whether
the ALU op is addition or subtraction and depending on the
current known and bounded offset. For addition, the limit is
derived as limit := max_value_size - (smin_value + off); for
subtraction: limit := umax_value + off. This holds because
we do not allow any pointer arithmetic that would
temporarily go out of bounds or would have an unknown
value with mixed signed bounds where it is unclear at
verification time whether the actual runtime value would
be either negative or positive. For example, with a derived
map pointer value that has both a constant and a bounded
offset, the limit based on smin_value works because the
verifier requires that statically analyzed arithmetic on the
pointer must be in bounds, and thus checks whether the
resulting smin_value + off and umax_value + off are still
within map value bounds at the time of the arithmetic in
addition to the time of access. Similarly, for the case of
stack access we derive
the limit as follows: MAX_BPF_STACK + off for subtraction
and -off for the case of addition where off := ptr_reg->off +
ptr_reg->var_off.value. Subtraction is a special case for
the masking, which can come in the form of ptr += -val,
ptr -= -val, or ptr -= val. In the first two cases, where we
know the value is negative, we need to temporarily negate the
value in order to do the sanitation on a positive value and
later swap the ALU op, restoring the original source
register if the value was in the source.
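Condensed into code, the limit derivation above reads roughly
as follows (a sketch modeled on the patch, assuming verifier
context; helper name illustrative, fields as in struct
bpf_reg_state):
  /* Sketch of the alu limit derivation. mask_to_left is true when
   * masking has to bound movement towards lower addresses, i.e.
   * ptr + negative value or ptr - positive value. */
  static int derive_ptr_limit(const struct bpf_reg_state *ptr_reg,
                              u32 *limit, u8 opcode, bool off_is_neg)
  {
          bool mask_to_left = (opcode == BPF_ADD &&  off_is_neg) ||
                              (opcode == BPF_SUB && !off_is_neg);
          int off;

          switch (ptr_reg->type) {
          case PTR_TO_STACK:
                  off = ptr_reg->off + ptr_reg->var_off.value;
                  *limit = mask_to_left ? MAX_BPF_STACK + off : -off;
                  return 0;
          case PTR_TO_MAP_VALUE:
                  if (mask_to_left) {
                          *limit = ptr_reg->umax_value + ptr_reg->off;
                  } else {
                          off = ptr_reg->smin_value + ptr_reg->off;
                          *limit = ptr_reg->map_ptr->value_size - off;
                  }
                  return 0;
          default:
                  return -EINVAL; /* no masking for other pointer types */
          }
  }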
The sanitation of pointer arithmetic alone is still not fully
sufficient as is, since a scenario like the following could
happen ...
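(The concrete sequence was elided here; a plausible sketch,
assuming K-based increments interleaved with masked variable
subtractions:)
  ptr += 0x1000;        /* K-based immediate, not subject to masking */
  ptr -= attacker_val;  /* masked: becomes -= 0 under speculation    */
  ptr += 0x1000;
  ptr -= attacker_val;  /* truncated to 0 again                      */
  /* ... repeated, so the pointer can still walk out of bounds ...   */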
... and therefore still access out of bounds. To prevent such
a case, the verifier also analyzes safety for potential out
of bounds access under speculative execution; that is, it
also simulates pointer access under truncation. We therefore
"branch off" and push the current verification state after the
ALU operation with known 0 to the verification stack for later
analysis. Given that analysis of the current path succeeded,
it is likely that the one under speculation can be pruned. In
any case, it is also subject to existing complexity limits and
therefore anything beyond this point will be rejected. In
terms of pruning, it needs to be ensured that a verification
state from the speculative execution simulation never prunes
a non-speculative execution path; therefore, we mark the
verifier state accordingly at the time of push_stack(). If the
verifier detects out of bounds access under speculative
execution from one of the possible paths that includes a
truncation, it will reject such a program.
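The pruning rule from the v3 changelog above, as a sketch
inside states_equal() (remaining checks elided):
  /* Sketch: a state explored under speculation must never prune a
   * non-speculative path; the reverse direction is fine. */
  static bool states_equal(struct bpf_verifier_env *env,
                           struct bpf_verifier_state *old,
                           struct bpf_verifier_state *cur)
  {
          if (old->speculative && !cur->speculative)
                  return false;
          /* ... remaining equivalence checks unchanged ... */
          return true;
  }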
Given we mask every reg-based pointer arithmetic for
unprivileged programs, we've been looking into how it could
affect real-world programs in terms of size increase. As the
majority of programs are targeted for privileged-only use
cases, we've unconditionally enabled masking (with its ALU
restrictions on top) for privileged programs for the sake
of testing in order to check i) whether they get rejected
in their current form, and ii) by how much the number of
instructions and size will increase. We've tested this by
using Katran, Cilium and test_l4lb from the kernel selftests.
For Katran we've evaluated balancer_kern.o, for Cilium
bpf_lxc.o and an older test object bpf_lxc_opt_-DUNKNOWN.o,
and for l4lb we've used test_l4lb.o as well as
test_l4lb_noinline.o. We
found that none of the programs got rejected by the verifier
with this change, and that impact is rather minimal to none.
balancer_kern.o had 13,904 bytes (1,738 insns) xlated and
7,797 bytes JITed before and after the change. The most
complex program in bpf_lxc.o had 30,544 bytes (3,817 insns)
xlated and 18,538 bytes JITed before and after, and none of
the other tail call programs in bpf_lxc.o had any changes
either. For
the older bpf_lxc_opt_-DUNKNOWN.o object we found a small
increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed
before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed
after the change. Other programs from that object file had
a similar small increase. test_l4lb.o had no change and
remained at 6,544 bytes (817 insns) xlated and 3,401 bytes
JITed, and test_l4lb_noinline.o stayed constant at 5,080 bytes
(634 insns) xlated and 3,313 bytes JITed. This can be explained
by LLVM typically optimizing stack-based pointer arithmetic
into K-based operations, and by dynamic map access not being
overly frequent. However, in future we may decide to
optimize the algorithm further under known guarantees from
branch and value speculation. The latter also seems unclear
in terms of the prediction heuristics that today's CPUs apply,
as well as whether there could be collisions in e.g. the
predictor's Value History/Pattern Table triggering out of
bounds access; thus, masking is performed unconditionally at
this point, but could be subject to relaxation later on. We
were generally also
brainstorming various other approaches for mitigation, but the
blocker was always the lack of available registers at runtime
and/or the overhead of runtime tracking of limits belonging to
a specific pointer. Thus, we found this approach to be
minimally intrusive under the given constraints.
With that in place, a simple example with sanitized access on
unprivileged load at post-verification time looks as follows:
JIT blinding example with non-conflicting use of r10:
[...]
d5: je 0x0000000000000106 _
d7: mov 0x0(%rax),%edi |
da: mov $0xf153246,%r10d | Index load from map value and
e0: xor $0xf153259,%r10 | (const blinded) mask with 0x1f.
e7: and %r10,%rdi |_
ea: mov $0x2f,%r10d |
f0: sub %rdi,%r10 | Sanitized addition. Both use r10
f3: or %rdi,%r10 | but do not interfere with each
f6: neg %r10 | other. (Neither do these instructions
f9: sar $0x3f,%r10 | interfere with the use of ax as temp
fd: and %r10,%rdi | in interpreter.)
100: add %rax,%rdi |_
103: mov 0x0(%rdi),%eax
[...]
Tested that it fixes Jann's reproducer, and also checked that
the test_verifier and test_progs suites run successfully with
the interpreter, JIT, and JIT with hardening enabled on x86-64
and arm64.
[0] Speculose: Analyzing the Security Implications of Speculative
Execution in CPUs, Giorgi Maisuradze and Christian Rossow,
https://arxiv.org/pdf/1801.04084.pdf
[1] A Systematic Evaluation of Transient Execution Attacks and
Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz,
Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens,
Dmitry Evtyushkin, Daniel Gruss,
https://arxiv.org/pdf/1811.05441.pdf
Daniel Borkmann [Wed, 2 Jan 2019 23:58:33 +0000 (00:58 +0100)]
bpf: fix check_map_access smin_value test when pointer contains offset
In check_map_access() we probe actual bounds through __check_map_access()
with an offset of reg->smin_value + off for the lower bound
and reg->umax_value + off for the upper bound. However, even
though reg->smin_value could be negative, the final result of
the sum with off could be positive when pointer arithmetic
with known and unknown scalars is combined. In this case we
reject the program with
an error such as "R<x> min value is negative, either use unsigned index
or do a if (index >=0) check." even though the access itself would be
fine. Therefore extend the check to probe whether the actual resulting
reg->smin_value + off is less than zero.
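The extended probe, in simplified form (a sketch of the
described semantics, not the literal diff):
  /* Simplified sketch: reject only when the *resulting* lower bound
   * is negative, not when smin_value alone is negative. */
  if (reg->smin_value + off < 0) {
          verbose(env, "R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n",
                  regno);
          return -EACCES;
  }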
Daniel Borkmann [Wed, 2 Jan 2019 23:58:32 +0000 (00:58 +0100)]
bpf: restrict unknown scalars of mixed signed bounds for unprivileged
For unknown scalars of mixed signed bounds, meaning their smin_value is
negative and their smax_value is positive, we need to reject arithmetic
with a pointer to a map value. For unprivileged programs the
goal is to mask every map pointer arithmetic operation, and
this cannot reliably be done when it is unknown at
verification time whether the scalar value is negative or
positive. Given this is a corner case, the likelihood of
breaking should
be very small.
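The rejection rule, schematically (a hedged sketch with an
abbreviated error string, not the literal diff):
  /* Sketch: an unknown scalar whose signed bounds straddle zero
   * cannot be masked reliably, so reject it for unprivileged. */
  if (!env->allow_ptr_leaks &&
      off_reg->smin_value < 0 && off_reg->smax_value > 0) {
          verbose(env, "R%d pointer arithmetic with mixed signed bounds\n", dst);
          return -EACCES;
  }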
Daniel Borkmann [Wed, 2 Jan 2019 23:58:31 +0000 (00:58 +0100)]
bpf: restrict stack pointer arithmetic for unprivileged
Restrict stack pointer arithmetic for unprivileged users so
that the arithmetic itself must not go out of bounds, as
opposed to only the actual access later on. Therefore, after
each adjust_ptr_min_max_vals() with a stack pointer as the
destination, we simulate a check_stack_access() of 1 byte on
the destination, and if that fails the program is rejected
for unprivileged program loads. This is analogous to map
value pointer arithmetic and needed for the masking later on;
a combined sketch follows below.
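Condensed, this restriction and the map value variant below
amount to a probe at the end of adjust_ptr_min_max_vals() (a
sketch; dst is the destination register number, error strings
abbreviated):
  /* Sketch: probe a 1-byte access on the result of the pointer ALU
   * op for unprivileged loads. */
  switch (dst_reg->type) {
  case PTR_TO_MAP_VALUE:
          if (!env->allow_ptr_leaks &&
              check_map_access(env, dst, dst_reg->off, 1, false)) {
                  verbose(env, "R%d map value pointer arithmetic goes out of range\n", dst);
                  return -EACCES;
          }
          break;
  case PTR_TO_STACK:
          if (!env->allow_ptr_leaks &&
              check_stack_access(env, dst_reg,
                                 dst_reg->off + dst_reg->var_off.value, 1)) {
                  verbose(env, "R%d stack pointer arithmetic goes out of range\n", dst);
                  return -EACCES;
          }
          break;
  default:
          break;
  }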
Daniel Borkmann [Wed, 2 Jan 2019 23:58:30 +0000 (00:58 +0100)]
bpf: restrict map value pointer arithmetic for unprivileged
Restrict map value pointer arithmetic for unprivileged users
so that the arithmetic itself must not go out of bounds, as
opposed to only the actual access later on. Therefore, after
each adjust_ptr_min_max_vals() with a map value pointer as the
destination, it will simulate a check_map_access() of 1 byte
on the destination, and if that fails the program is rejected
for unprivileged program loads. We use this later on for
masking any pointer arithmetic with the remainder of the map
value space. The likelihood of breaking any existing
real-world unprivileged eBPF program is very small for this
corner case.
Daniel Borkmann [Wed, 2 Jan 2019 23:58:29 +0000 (00:58 +0100)]
bpf: enable access to ax register also from verifier rewrite
Right now we are using the BPF ax register in the JIT for
constant blinding as well as in the interpreter as a temporary
variable. The verifier is not able to use it simply because
its use would get overridden by the former in
bpf_jit_blind_insn(). However, it can be made to work in that
blinding is skipped if there is a prior use of ax in either
the source or destination register of the instruction. Taking
the constraints of ax into account, the verifier is then free
to use it in rewrites under some constraints. Note, the ax
register already has mappings in every eBPF JIT.
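The blinding skip in bpf_jit_blind_insn(), as a condensed
sketch (return handling elided):
  /* Sketch: leave insns that already use ax in src or dst alone,
   * since a verifier rewrite owns ax there and blinding must not
   * clobber it. */
  if (insn->dst_reg == BPF_REG_AX || insn->src_reg == BPF_REG_AX)
          return 0; /* emit the insn unchanged, no blinding */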
Daniel Borkmann [Wed, 2 Jan 2019 23:58:28 +0000 (00:58 +0100)]
bpf: move tmp variable into ax register in interpreter
This change moves the on-stack 64-bit tmp variable in
___bpf_prog_run() into the hidden ax register. The latter is
currently only used in JITs for constant blinding as a
temporary scratch register, meaning the BPF interpreter will
never see the use of ax. Therefore it is safe to use it for
the cases where tmp was used before. This is needed to later
allow restricted, hidden use of ax in both the interpreter
and the JITs.
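A condensed sketch of the remapping (macro names assumed from
the description above):
  /* Sketch: ax sits one past the last user-visible BPF register, so
   * the interpreter needs no extra stack slot for scratch state. */
  #define BPF_REG_AX      MAX_BPF_REG
  #define MAX_BPF_EXT_REG (MAX_BPF_REG + 1)

  /* inside ___bpf_prog_run(): the former on-stack tmp becomes */
  #define AX      regs[BPF_REG_AX]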
Daniel Borkmann [Wed, 2 Jan 2019 23:58:27 +0000 (00:58 +0100)]
bpf: move {prev_,}insn_idx into verifier env
Move prev_insn_idx and insn_idx from the do_check() function into
the verifier environment, so they can be read inside the various
helper functions for handling the instructions. It's easier
to put this into the environment rather than changing all
call sites only to pass it along. insn_idx is particularly
useful since it later allows holding state in
env->insn_aux_data[env->insn_idx].
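Schematically, the move amounts to (a sketch of the struct
addition):
  /* Sketch: indices live in the env so helpers can reach them. */
  struct bpf_verifier_env {
          /* ... */
          u32 insn_idx;       /* instruction currently being verified */
          u32 prev_insn_idx;  /* its predecessor, for state bookkeeping */
          /* ... */
  };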
Julia Lawall [Sun, 23 Dec 2018 08:57:01 +0000 (09:57 +0100)]
IB/ipoib: drop useless LIST_HEAD
Drop LIST_HEAD where the variable it declares is never used.
Commit 31c02e215700 ("IPoIB: Avoid using stale last_send counter
when reaping AHs") removed the uses, but not the declaration.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
... when != x
// </smpl>
Fixes: 31c02e215700 ("IPoIB: Avoid using stale last_send counter when reaping AHs")
Signed-off-by: Julia Lawall <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Aaro Koskinen [Wed, 2 Jan 2019 18:43:01 +0000 (20:43 +0200)]
MIPS: OCTEON: mark RGMII interface disabled on OCTEON III
Commit 885872b722b7 ("MIPS: Octeon: Add Octeon III CN7xxx
interface detection") added RGMII interface detection for OCTEON III,
but it results in the following logs:
[ 7.165984] ERROR: Unsupported Octeon model in __cvmx_helper_rgmii_probe
[ 7.173017] ERROR: Unsupported Octeon model in __cvmx_helper_rgmii_probe
The current RGMII routines are valid only for older OCTEONs
that use the GMX/ASX hardware blocks. On later chips AGL
should be used, but support for that is missing in mainline.
Until that is added, mark the interface as disabled.
Bjorn Helgaas [Wed, 2 Jan 2019 21:31:09 +0000 (15:31 -0600)]
Merge branch 'remotes/lorenzo/pci/dwc-msi'
- Mask DesignWare interrupts instead of disabling them to avoid lost
interrupts (Marc Zyngier)
- Add locking when acking DesignWare interrupts (Marc Zyngier)
- Ack DesignWare interrupts in the proper callbacks (Marc Zyngier)
* remotes/lorenzo/pci/dwc-msi:
PCI: dwc: Move interrupt acking into the proper callback
PCI: dwc: Take lock when ACKing an interrupt
PCI: dwc: Use interrupt masking instead of disabling
Bjorn Helgaas [Wed, 2 Jan 2019 21:31:03 +0000 (15:31 -0600)]
Merge branch 'pci/misc'
- Expand Kconfig "PF" acronyms (Randy Dunlap)
- Update MAINTAINERS for arch/x86/kernel/early-quirks.c (Bjorn Helgaas)
- Add missing include to drivers/pci.h (Alexandru Gagniuc)
- Override Synopsys USB 3.x HAPS device class so dwc3-haps can claim it
instead of xhci (Thinh Nguyen)
* pci/misc:
PCI: Override Synopsys USB 3.x HAPS device class
PCI: Move Synopsys HAPS platform device IDs
PCI: Add missing include to drivers/pci.h
PCI: Remove unnecessary space before function pointer arguments
MAINTAINERS: Add x86 early-quirks.c file pattern to PCI subsystem
PCI: Expand the "PF" acronym in Kconfig help text
Linus Torvalds [Wed, 2 Jan 2019 20:11:01 +0000 (12:11 -0800)]
Merge tag '9p-for-4.21' of git://github.com/martinetd/linux
Pull 9p updates from Dominique Martinet:
"Missing prototype warning fix and a syzkaller fix when a 9p server
advertises a too small msize"
* tag '9p-for-4.21' of git://github.com/martinetd/linux:
9p/net: put a lower bound on msize
net/9p: include trans_common.h to fix missing prototype warning.
Linus Torvalds [Wed, 2 Jan 2019 20:08:29 +0000 (12:08 -0800)]
Merge tag '4.21-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
- four fixes for stable
- improvements to DFS including allowing failover to alternate targets
- some small performance improvements
* tag '4.21-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: (39 commits)
cifs: update internal module version number
cifs: we can not use small padding iovs together with encryption
cifs: Minor Kconfig clarification
cifs: Always resolve hostname before reconnecting
cifs: Add support for failover in cifs_reconnect_tcon()
cifs: Add support for failover in smb2_reconnect()
cifs: Only free DFS target list if we actually got one
cifs: start DFS cache refresher in cifs_mount()
cifs: Use GFP_ATOMIC when a lock is held in cifs_mount()
cifs: Add support for failover in cifs_reconnect()
cifs: Add support for failover in cifs_mount()
cifs: remove set but not used variable 'sep'
cifs: Make use of DFS cache to get new DFS referrals
cifs: minor updates to documentation
cifs: check kzalloc return
cifs: remove set but not used variable 'server'
cifs: Use kzfree() to free password
cifs: Fix to use kmem_cache_free() instead of kfree()
cifs: update for current_kernel_time64() removal
cifs: Add DFS cache routines
...
Linus Torvalds [Wed, 2 Jan 2019 19:05:43 +0000 (11:05 -0800)]
Merge branch 'next-tpm' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull TPM updates from James Morris:
- Support for partial reads of /dev/tpm0.
- Clean up for TPM 1.x code: move the commands to tpm1-cmd.c and make
everything to use the same data structure for building TPM commands
i.e. struct tpm_buf.
* 'next-tpm' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (25 commits)
tpm: add support for partial reads
tpm: tpm_ibmvtpm: fix kdoc warnings
tpm: fix kdoc for tpm2_flush_context_cmd()
tpm: tpm_try_transmit() refactor error flow.
tpm: use u32 instead of int for PCR index
tpm1: reimplement tpm1_continue_selftest() using tpm_buf
tpm1: reimplement SAVESTATE using tpm_buf
tpm1: rename tpm1_pcr_read_dev to tpm1_pcr_read()
tpm1: implement tpm1_pcr_read_dev() using tpm_buf structure
tpm: tpm1: rewrite tpm1_get_random() using tpm_buf structure
tpm: tpm-space.c remove unneeded semicolon
tpm: tpm-interface.c drop unused macros
tpm: add tpm_auto_startup() into tpm-interface.c
tpm: factor out tpm_startup function
tpm: factor out tpm 1.x pm suspend flow into tpm1-cmd.c
tpm: move tpm 1.x selftest code from tpm-interface.c tpm1-cmd.c
tpm: factor out tpm1_get_random into tpm1-cmd.c
tpm: move tpm_getcap to tpm1-cmd.c
tpm: move tpm1_pcr_extend to tpm1-cmd.c
tpm: factor out tpm_get_timeouts()
...
Linus Torvalds [Wed, 2 Jan 2019 18:46:03 +0000 (10:46 -0800)]
block: don't use un-ordered __set_current_state(TASK_UNINTERRUPTIBLE)
This mostly reverts commit 849a370016a5 ("block: avoid ordered task
state change for polled IO"). It was wrongly claiming that the ordering
wasn't necessary. The memory barrier _is_ necessary.
If something is truly polling and not going to sleep, it's the whole
state setting that is unnecessary, not the memory barrier. Whenever you
set your state to a sleeping state, you absolutely need the memory
barrier.
Note that sometimes the memory barrier can be elsewhere. For example,
the ordering might be provided by an external lock, or by setting the
process state to sleeping before adding yourself to the wait queue list
that is used for waking up (where the wait queue lock itself will
guarantee that any wakeup will correctly see the sleeping state).
But none of those cases were true here.
NOTE! Some of the polling paths may indeed be able to drop the state
setting entirely, at which point the memory barrier also goes away.
(Also note that this doesn't revert the TASK_RUNNING cases: there is no
race between a wakeup and setting the process state to TASK_RUNNING,
since the end result doesn't depend on ordering).
Local variable description: ----data.i@capi_unlocked_ioctl
Variable was created at:
capi_ioctl drivers/isdn/capi/capi.c:747 [inline]
capi_unlocked_ioctl+0x82/0x1bf0 drivers/isdn/capi/capi.c:939
do_vfs_ioctl+0xebd/0x2bf0 fs/ioctl.c:46
Bytes 12-63 of 64 are uninitialized
Memory access of size 64 starts at ffff88807ac5fce8
Data copied to user address 0000000020000080
Stefano Brivio [Wed, 2 Jan 2019 12:29:27 +0000 (13:29 +0100)]
ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error
In ip6_neigh_lookup(), we must not return errors coming from
neigh_create(): if creation of a neighbour entry fails, the lookup should
return NULL, in the same way as it's done in __neigh_lookup().
Otherwise, callers legitimately checking for a non-NULL return value of
the lookup function might dereference an invalid pointer.
For instance, on neighbour table overflow, ndisc_router_discovery()
crashes ndisc_update() by passing ERR_PTR(-ENOBUFS) as 'neigh' argument.
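The essence of the fix, as a sketch (mirroring
__neigh_lookup()'s behavior):
  /* Sketch: a failed neigh_create() must yield NULL, not an ERR_PTR(),
   * so callers checking for non-NULL don't dereference an error value. */
  struct neighbour *n = __ipv6_neigh_lookup(dev, daddr);
  if (n)
          return n;
  n = neigh_create(&nd_tbl, daddr, dev);
  return IS_ERR(n) ? NULL : n;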
Linus Torvalds [Wed, 2 Jan 2019 17:48:13 +0000 (09:48 -0800)]
Merge branch 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull seccomp updates from James Morris:
- Add SECCOMP_RET_USER_NOTIF
- seccomp fixes for sparse warnings and s390 build (Tycho)
* 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
seccomp, s390: fix build for syscall type change
seccomp: fix poor type promotion
samples: add an example of seccomp user trap
seccomp: add a return code to trap to userspace
seccomp: switch system call argument type to void *
seccomp: hoist struct seccomp_data recalculation higher
Linus Torvalds [Wed, 2 Jan 2019 17:43:14 +0000 (09:43 -0800)]
Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull integrity updates from James Morris:
"In Linux 4.19, a new LSM hook named security_kernel_load_data was
upstreamed, allowing LSMs and IMA to prevent the kexec_load syscall.
Different signature verification methods exist for verifying the
kexec'ed kernel image. This adds additional support in IMA to prevent
loading unsigned kernel images via the kexec_load syscall,
independently of the IMA policy rules, based on the runtime "secure
boot" flag. An initial IMA kselftest is included.
In addition, this pull request defines a new, separate keyring named
".platform" for storing the preboot/firmware keys needed for verifying
the kexec'ed kernel image's signature and includes the associated IMA
kexec usage of the ".platform" keyring.
(David Howells' and Josh Boyer's patches for reading the
preboot/firmware keys, which were previously posted for a
different use case scenario, are included here)"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
integrity: Remove references to module keyring
ima: Use inode_is_open_for_write
ima: Support platform keyring for kernel appraisal
efi: Allow the "db" UEFI variable to be suppressed
efi: Import certificates from UEFI Secure Boot
efi: Add an EFI signature blob parser
efi: Add EFI signature data types
integrity: Load certs to the platform keyring
integrity: Define a trusted platform keyring
selftests/ima: kexec_load syscall test
ima: don't measure/appraise files on efivarfs
x86/ima: retry detecting secure boot mode
docs: Extend trusted keys documentation for TPM 2.0
x86/ima: define arch_get_ima_policy() for x86
ima: add support for arch specific policies
ima: refactor ima_init_policy()
ima: prevent kexec_load syscall based on runtime secureboot flag
x86/ima: define arch_ima_get_secureboot
integrity: support new struct public_key_signature encoding field
Multipathing: In the case of NFSv3, rpc_clnt_test_and_add_xprt()
adds the xprt to the xprt switch (i.e. xps) if
rpc_call_null_helper() returns success. But in the case of
NFSv4.1, it needs to do an EXCHANGE_ID to verify the path
along with a check for session trunking.
Add the xprt in nfs4_test_session_trunk() only when
nfs4_detect_session_trunking() returns success. Also release
the refcount held by rpc_clnt_setup_test_and_add_xprt().
J. Bruce Fields [Thu, 20 Dec 2018 15:42:36 +0000 (10:42 -0500)]
sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS
It's OK to sleep here, we just don't want to recurse into the filesystem
as a writeout could be waiting on this.
Future work: the documentation for GFP_NOFS says "Please try to avoid
using this flag directly and instead use memalloc_nofs_{save,restore} to
mark the whole scope which cannot/shouldn't recurse into the FS layer
with a short explanation why. All allocation requests will inherit
GFP_NOFS implicitly."
But I'm not sure where to do this. Should the workqueue be arranging
that for us in the case of workqueues created with WQ_MEM_RECLAIM?
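For reference, the scoped API the documentation refers to is
used like this (a generic sketch, not a proposed patch):
  /* Sketch: allocations within the save/restore scope implicitly
   * behave as GFP_NOFS, without passing the flag directly. */
  unsigned int nofs_flags = memalloc_nofs_save();

  ptr = kmalloc(size, GFP_KERNEL);  /* effectively GFP_NOFS here */
  memalloc_nofs_restore(nofs_flags);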
Chuck Lever [Fri, 7 Dec 2018 16:11:44 +0000 (11:11 -0500)]
xprtrdma: Prevent leak of rpcrdma_rep objects
If a reply has been processed but the RPC is later retransmitted
anyway, the req->rl_reply field still contains the only pointer to
the old rpcrdma rep. When the next reply comes in, the reply handler
will stomp on the rl_reply field, leaking the old rep.
A trace event is added to capture such leaks.
This problem seems to be worsened by the restructuring of the RPC
Call path in v4.20. Fully addressing this issue will require at
least a re-architecture of the disconnect logic, which is not
appropriate during -rc.
The original commit (e4648aa4f98a "NFS recover from destination
server reboot for copies") used memcmp(), and then it was
changed to use nfs4_stateid_match_other(), but that function
returns the opposite of memcmp(). As a result, recovery can't
find the copy, leading to the copy hanging.
Fixes: 80f42368868e ("NFSv4: Split out NFS v4.2 copy completion functions")
Fixes: cb7a8384dc02 ("NFS: Split out the body of nfs4_reclaim_open_state")
Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
Chuck Lever [Wed, 19 Dec 2018 16:00:32 +0000 (11:00 -0500)]
xprtrdma: Replace outdated comment for rpcrdma_ep_post
Since commit 7c8d9e7c8863 ("xprtrdma: Move Receive posting to
Receive handler"), rpcrdma_ep_post is no longer responsible for
posting Receive buffers. Update the documenting comment to reflect
this change.
Chuck Lever [Wed, 19 Dec 2018 16:00:27 +0000 (11:00 -0500)]
xprtrdma: Update comments in frwr_op_send
Commit f2877623082b ("xprtrdma: Chain Send to FastReg WRs") was
written before commit ce5b37178283 ("xprtrdma: Replace all usage of
"frmr" with "frwr""), but was merged afterwards. Thus it still
refers to FRMR and MWs.
Chuck Lever [Wed, 19 Dec 2018 16:00:11 +0000 (11:00 -0500)]
NFS: Fix NFSv4 symbolic trace point output
These symbolic values were not being displayed in string form.
TRACE_DEFINE_ENUM was missing in many cases. It also turns out that
__print_symbolic wants an unsigned long in the first field...
Chuck Lever [Wed, 19 Dec 2018 15:59:33 +0000 (10:59 -0500)]
xprtrdma: Simplify locking that protects the rl_allreqs list
Clean up: There's little chance of contention between the use of
rb_lock and rb_reqslock, so merge the two. This avoids having to
take both in some (possibly future) cases.
Transport tear-down is already serialized, thus there is no need for
locking at all when destroying rpcrdma_reqs.
Chuck Lever [Wed, 19 Dec 2018 15:59:23 +0000 (10:59 -0500)]
xprtrdma: Remove request_module from backchannel
Since commit ffe1f0df5862 ("rpcrdma: Merge svcrdma and xprtrdma
modules into one"), the forward and backchannel components are part
of the same kernel module. A separate request_module() call in the
backchannel code is no longer necessary.
Chuck Lever [Wed, 19 Dec 2018 15:59:17 +0000 (10:59 -0500)]
xprtrdma: Recognize XDRBUF_SPARSE_PAGES
Commit 431f6eb3570f ("SUNRPC: Add a label for RPC calls that require
allocation on receive") didn't update similar logic in rpc_rdma.c.
I don't think this is a bug, per se; the commit just adds more
careful checking for broken upper layer behavior.
Chuck Lever [Wed, 19 Dec 2018 15:59:12 +0000 (10:59 -0500)]
NFS: Make "port=" mount option optional for RDMA mounts
Having to specify "proto=rdma,port=20049" is cumbersome.
RFC 8267 Section 6.3 requires NFSv4 clients to use "the alternative
well-known port number", which is 20049. Make the use of the well-
known port number automatic, just as it is for NFS/TCP and port
2049.
For NFSv2/3, Section 4.2 allows clients to simply choose 20049 as
the default or use rpcbind. I don't know of an NFS/RDMA server
implementation that registers its NFS/RDMA service with rpcbind,
so automatically choosing 20049 seems like the better choice. The
other widely-deployed NFS/RDMA client, Solaris, also uses 20049
as the default port.
Chuck Lever [Wed, 19 Dec 2018 15:59:07 +0000 (10:59 -0500)]
xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)
Place the associated RPC transaction's XID in the upper 32 bits of
each RDMA segment's rdma_offset field. There are two reasons to do
this:
- The R_key only has 8 bits that are different from registration to
registration. The XID adds more uniqueness to each RDMA segment to
reduce the likelihood of a software bug on the server reading from
or writing into memory it's not supposed to.
- On-the-wire RDMA Read and Write requests do not otherwise carry
any identifier that matches them up to an RPC. The XID in the
upper 32 bits will act as an eye-catcher in network captures.
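Schematically, the encoding could look like this (field names
hypothetical; a sketch, not the literal patch):
  /* Upper 32 bits: the RPC XID; lower 32 bits: the segment's offset. */
  seg->rs_offset = ((u64)be32_to_cpu(rqst->rq_xid) << 32) |
                   (seg->rs_offset & 0xffffffff);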
Chuck Lever [Wed, 19 Dec 2018 15:59:01 +0000 (10:59 -0500)]
xprtrdma: Remove rpcrdma_memreg_ops
Clean up: Now that there is only FRWR, there is no need for a memory
registration switch. The indirect calls to the memreg operations can
be replaced with faster direct calls.
Chuck Lever [Wed, 19 Dec 2018 15:58:56 +0000 (10:58 -0500)]
xprtrdma: Remove support for FMR memory registration
FMR is not supported on most recent RDMA devices. It is also less
secure than FRWR because an FMR memory registration can expose
adjacent bytes to remote reading or writing. As discussed during the
RDMA BoF at LPC 2018, it is time to remove support for FMR in the
NFS/RDMA client stack.
Note that NFS/RDMA server-side uses either local memory registration
or FRWR. FMR is not used.
There are a few Infiniband/RoCE devices in the kernel tree that do
not appear to support MEM_MGT_EXTENSIONS (FRWR), and therefore will
not support client-side NFS/RDMA after this patch. These are:
- mthca
- qib
- hns (RoCE)
Users of these devices can use NFS/TCP on IPoIB instead.
Chuck Lever [Wed, 19 Dec 2018 15:58:51 +0000 (10:58 -0500)]
xprtrdma: Reduce max_frwr_depth
Some devices advertise a large max_fast_reg_page_list_len
capability, but perform optimally when MRs are significantly smaller
than that depth -- probably when the MR itself is no larger than a
page.
By default, the RDMA R/W core API uses max_sge_rd as the maximum
page depth for MRs. For some devices, the value of max_sge_rd is
1, which is also not optimal. Thus, when max_sge_rd is larger than
1, use that value. Otherwise use the value of the
max_fast_reg_page_list_len attribute.
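The selection logic, roughly (a sketch; attrs being the
device's ib_device_attr):
  /* Prefer max_sge_rd when the device reports a usable value. */
  if (attrs->max_sge_rd > 1)
          ia->ri_max_frwr_depth = attrs->max_sge_rd;
  else
          ia->ri_max_frwr_depth = attrs->max_fast_reg_page_list_len;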
I've tested this with CX-3 Pro, FastLinq, and CX-5 devices. It
reproducibly improves the throughput of large I/Os by several
percent.
Chuck Lever [Wed, 19 Dec 2018 15:58:45 +0000 (10:58 -0500)]
xprtrdma: Fix ri_max_segs and the result of ro_maxpages
With certain combinations of krb5i/p, MR size, and r/wsize, I/O can
fail with EMSGSIZE. This is because the calculated value of
ri_max_segs (the max number of MRs per RPC) exceeded
RPCRDMA_MAX_HDR_SEGS, which caused Read or Write list encoding to
walk off the end of the transport header.
Once that was addressed, the ro_maxpages result has to be corrected
to account for the number of MRs needed for Reply chunks, which is
2 MRs smaller than a normal Read or Write chunk.
Chuck Lever [Wed, 19 Dec 2018 15:58:40 +0000 (10:58 -0500)]
xprtrdma: Don't wake pending tasks until disconnect is done
Transport disconnect processing does a "wake pending tasks" at
various points.
Suppose an RPC Reply is being processed. The RPC task that Reply
goes with is waiting on the pending queue. If a disconnect wake-up
happens before reply processing is done, that reply, even if it is
good, is thrown away, and the RPC has to be sent again.
This window apparently does not exist for socket transports because
there is a lock held while a reply is being received which prevents
the wake-up call until after reply processing is done.
To resolve this, all RPC replies being processed on an RPC-over-RDMA
transport have to complete before pending tasks are awoken due to a
transport disconnect.
Callers that already hold the transport write lock may invoke
->ops->close directly. Others use a generic helper that schedules
a close when the write lock can be taken safely.
Chuck Lever [Wed, 19 Dec 2018 15:58:35 +0000 (10:58 -0500)]
xprtrdma: No qp_event disconnect
After thinking about this more, and auditing other kernel ULP
implementations, I believe that a DISCONNECT cm_event will
occur after a fatal QP event. If that's the case, there's no
need for an explicit disconnect in the QP event handler.
Chuck Lever [Wed, 19 Dec 2018 15:58:29 +0000 (10:58 -0500)]
xprtrdma: Replace rpcrdma_receive_wq with a per-xprt workqueue
To address a connection-close ordering problem, we need the ability
to drain the RPC completions running on rpcrdma_receive_wq for just
one transport. Give each transport its own RPC completion workqueue,
and drain that workqueue when disconnecting the transport.
Chuck Lever [Wed, 19 Dec 2018 15:58:24 +0000 (10:58 -0500)]
xprtrdma: Refactor Receive accounting
Clean up: Divide the work cleanly:
- rpcrdma_wc_receive is responsible only for RDMA Receives
- rpcrdma_reply_handler is responsible only for RPC Replies
- the posted send and receive counts both belong in rpcrdma_ep
Chuck Lever [Wed, 19 Dec 2018 15:58:13 +0000 (10:58 -0500)]
xprtrdma: Yet another double DMA-unmap
While chasing yet another set of DMAR fault reports, I noticed that
the frwr recycler conflates whether or not an MR has been DMA
unmapped with frwr->fr_state. Actually the two have only an indirect
relationship. It's in fact impossible to guess reliably whether the
MR has been DMA unmapped based on its fr_state field, especially as
the surrounding code and its assumptions have changed over time.
A better approach is to track the DMA mapping status explicitly so
that the recycler is less brittle to unexpected situations, and
attempts to DMA-unmap a second time are prevented.
Moni Shoua [Wed, 26 Dec 2018 19:42:12 +0000 (21:42 +0200)]
IB/core: Add advise_mr to the list of known ops
We need to add advise_mr to the list of operation setters on
the ib_device, otherwise callers of ib_set_device_ops() for
the advise_mr operation will not have their callback
registered.
When the advise_mr series was merged with the device ops
series, the SET_DEVICE_OPS() entry was missed.
Adrian Hunter [Fri, 21 Dec 2018 12:06:20 +0000 (14:06 +0200)]
perf session: Add comment for perf_session__register_idle_thread()
Add a comment to perf_session__register_idle_thread() to bring attention to
a pitfall with the idle task thread structure. The pitfall is that there
should really be a 'struct thread' for the idle task of each cpu, but there
is only one that can have pid == tid == 0.
Adrian Hunter [Fri, 21 Dec 2018 12:06:19 +0000 (14:06 +0200)]
perf thread-stack: Fix thread stack processing for the idle task
perf creates a single 'struct thread' to represent the idle task. That
is because threads are identified by PID and TID, and the idle task
always has PID == TID == 0.
However, there are actually separate idle tasks for each CPU. That
creates a problem for thread stack processing which assumes that each
thread has a single stack, not one stack per CPU.
Fix that by passing through the CPU number, and in the case of the idle
"thread", pick the thread stack from an array based on the CPU number.
Adrian Hunter [Fri, 21 Dec 2018 12:06:15 +0000 (14:06 +0200)]
perf thread-stack: Avoid direct reference to the thread's stack
In preparation for fixing thread stack processing for the idle task,
avoid direct reference to the thread's stack. The thread stack will
change to an array of thread stacks, at which point the meaning of the
direct reference will change.
In preparation for fixing thread stack processing for the idle task,
tidy thread_stack__bottom() usage. Specifically, the parameter 'thread'
is not needed.
Bjorn Andersson [Mon, 24 Dec 2018 07:26:44 +0000 (23:26 -0800)]
thermal: generic-adc: Fix adc to temp interpolation
First correct the edge case to return the last element if we're
outside the range, rather than at the last element, so that
interpolation is not omitted for points between the two last entries in
the table.
Then correct the formula to perform linear interpolation based
on the two points surrounding the read ADC value. The indices
for temp are kept as "hi" and "lo" to pair with the adc
indices, but there's no requirement that the temperature is
provided in descending order. mult_frac() is used to prevent
issues with overflowing the int.
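With hypothetical table field names, the corrected
interpolation amounts to:
  /* Linear interpolation between the surrounding entries lo and hi;
   * mult_frac() avoids overflowing the intermediate product. */
  temp = tab->temp[lo] + mult_frac(tab->temp[hi] - tab->temp[lo],
                                   val - tab->adc[lo],
                                   tab->adc[hi] - tab->adc[lo]);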
Miquel Raynal [Wed, 12 Dec 2018 09:36:40 +0000 (10:36 +0100)]
thermal: armada: add overheat interrupt support
The IP can trigger interrupts on overheat situations from all
the sensors.
However, the interrupt source changes along with the last selected
source (ie. the last read sensor), which is an inconsistent behavior.
Avoid possible glitches by always selecting back only one channel which
will then be referenced as the "overheat_sensor" (arbitrarily: the first
in the DT which has a critical trip point filled in).
It is possible that the scan of all thermal zone nodes did not bring a
critical trip point from which the overheat interrupt could be
configured. In this case just complain but do not fail the probe.
Also disable sensor switch during overheat situations because changing
the channel while the system is too hot could clear the overheat state
by changing the source while the temperature is still very high.
Even if the overheat state is not declared, the overheat
interrupt must be cleared by reading the DFX interrupt cause
_after_ the temperature has fallen down to the low threshold,
otherwise future possible interrupts would not be served. In
this case, a work polls the corresponding register until the
overheat flag gets cleared.