]> Git Repo - linux.git/log
linux.git
6 years agonet: ipv6: netconf: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:22 +0000 (10:46 -0800)]
net: ipv6: netconf: perform strict checks also for doit handlers

Make RTM_GETNETCONF's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: ipv6: addr: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:21 +0000 (10:46 -0800)]
net: ipv6: addr: perform strict checks also for doit handlers

Make RTM_GETADDR's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: ipv4: ipmr: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:20 +0000 (10:46 -0800)]
net: ipv4: ipmr: perform strict checks also for doit handlers

Make RTM_GETROUTE's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

v2: - improve extack messages (DaveA).

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: ipv4: route: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:19 +0000 (10:46 -0800)]
net: ipv4: route: perform strict checks also for doit handlers

Make RTM_GETROUTE's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

v2: - new patch (DaveA).

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: ipv4: netconf: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:18 +0000 (10:46 -0800)]
net: ipv4: netconf: perform strict checks also for doit handlers

Make RTM_GETNETCONF's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: namespace: perform strict checks also for doit handlers
Jakub Kicinski [Fri, 18 Jan 2019 18:46:17 +0000 (10:46 -0800)]
net: namespace: perform strict checks also for doit handlers

Make RTM_GETNSID's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

v2: - don't check size >= sizeof(struct rtgenmsg) (Nicolas).

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agortnetlink: ifinfo: perform strict checks also for doit handler
Jakub Kicinski [Fri, 18 Jan 2019 18:46:16 +0000 (10:46 -0800)]
rtnetlink: ifinfo: perform strict checks also for doit handler

Make RTM_GETLINK's doit handler use strict checks when
NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agortnetlink: stats: reject requests for unknown stats
Jakub Kicinski [Fri, 18 Jan 2019 18:46:15 +0000 (10:46 -0800)]
rtnetlink: stats: reject requests for unknown stats

In the spirit of strict checks reject requests of stats the kernel
does not support when NETLINK_F_STRICT_CHK is set.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agortnetlink: stats: validate attributes in get as well as dumps
Jakub Kicinski [Fri, 18 Jan 2019 18:46:14 +0000 (10:46 -0800)]
rtnetlink: stats: validate attributes in get as well as dumps

Make sure NETLINK_GET_STRICT_CHK influences both GETSTATS doit
as well as the dump.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: netlink: add helper to retrieve NETLINK_F_STRICT_CHK
Jakub Kicinski [Fri, 18 Jan 2019 18:46:13 +0000 (10:46 -0800)]
net: netlink: add helper to retrieve NETLINK_F_STRICT_CHK

Dumps can read state of the NETLINK_F_STRICT_CHK flag from
a field in the callback structure.  For non-dump GET requests
we need a way to access the state of that flag from a socket.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agovirtio-net: per-queue RPS config
Willem de Bruijn [Fri, 18 Jan 2019 01:08:53 +0000 (20:08 -0500)]
virtio-net: per-queue RPS config

On multiqueue network devices, RPS maps are configured independently
for each receive queue through /sys/class/net/$DEV/queues/rx-*.

On virtio-net currently all packets use the map from rx-0, because the
real rx queue is not known at time of map lookup by get_rps_cpu.

Call skb_record_rx_queue in the driver rx path to make lookup work.

Recording the receive queue has ramifications beyond RPS, such as in
sticky load balancing decisions for sockets (skb_tx_hash) and XPS.

Reported-by: Mark Hlady <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: phy: phy driver features are mandatory
Camelia Groza [Thu, 17 Jan 2019 12:22:36 +0000 (14:22 +0200)]
net: phy: phy driver features are mandatory

Since phy driver features became a link_mode bitmap, phy drivers that
don't have a list of features configured will cause the kernel to crash
when probed.

Prevent the phy driver from registering if the features field is missing.

Fixes: 719655a14971 ("net: phy: Replace phy driver features u32 with link_mode bitmap")
Reported-by: Scott Wood <[email protected]>
Signed-off-by: Camelia Groza <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoisdn: avm: Fix string plus integer warning from Clang
Nathan Chancellor [Thu, 10 Jan 2019 05:41:08 +0000 (22:41 -0700)]
isdn: avm: Fix string plus integer warning from Clang

A recent commit in Clang expanded the -Wstring-plus-int warning, showing
some odd behavior in this file.

drivers/isdn/hardware/avm/b1.c:426:30: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
                cinfo->version[j] = "\0\0" + 1;
                                    ~~~~~~~^~~
drivers/isdn/hardware/avm/b1.c:426:30: note: use array indexing to silence this warning
                cinfo->version[j] = "\0\0" + 1;
                                           ^
                                    &      [  ]
1 warning generated.

This is equivalent to just "\0". Nick pointed out that it is smarter to
use "" instead of "\0" because "" is used elsewhere in the kernel and
can be deduplicated at the linking stage.

Link: https://github.com/ClangBuiltLinux/linux/issues/309
Suggested-by: Nick Desaulniers <[email protected]>
Signed-off-by: Nathan Chancellor <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agosch_api: Change signature of qdisc_tree_reduce_backlog() to use ints
Toke Høiland-Jørgensen [Wed, 9 Jan 2019 16:10:57 +0000 (17:10 +0100)]
sch_api: Change signature of qdisc_tree_reduce_backlog() to use ints

There are now several places where qdisc_tree_reduce_backlog() is called
with a negative number of packets (to signal an increase in number of
packets in the queue). Rather than rely on overflow behaviour, change the
function signature to use signed integers to communicate this usage to
people reading the code.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agopowerpc: chrp: Use of_node_is_type to access device_type
Rob Herring [Fri, 18 Jan 2019 14:12:10 +0000 (08:12 -0600)]
powerpc: chrp: Use of_node_is_type to access device_type

Commit 8ce5f8415753 ("of: Remove struct device_node.type pointer")
removed struct device_node.type pointer, but the conversion to use
of_node_is_type() accessor was missed in chrp_init_IRQ().

Fixes: 8ce5f8415753 ("of: Remove struct device_node.type pointer")
Reported-by: kbuild test robot <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: [email protected]
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
6 years agoMerge tag 'mlx5-fixes-2019-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Sat, 19 Jan 2019 02:23:23 +0000 (18:23 -0800)]
Merge tag 'mlx5-fixes-2019-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-01-18

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

For -stable v4.18
('net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames')

The patch doesn't apply cleanly to 4.18.y, but it is very simple to
resolve, what should be the procedure here ?
====================

Signed-off-by: David S. Miller <[email protected]>
6 years agonet/mlx5e: Fix cb_ident duplicate in indirect block register
Eli Britstein [Wed, 19 Dec 2018 05:36:51 +0000 (07:36 +0200)]
net/mlx5e: Fix cb_ident duplicate in indirect block register

Previously the identifier used for indirect block callback registry
and for block rule cb registry (when done via indirect blocks) was the
pointer to the tunnel netdev we were interested in receiving updates on.
This worked fine if a single PF existed that registered one callback for
the tunnel netdev of interest. However, if multiple PFs are in place then
the 2nd PF tries to register with the same tunnel netdev identifier. This
leads to EEXIST errors and/or incorrect cb deletions.

Prevent this conflict by using the rpriv pointer as the identifier for
netdev indirect block cb registry, allowing each PF to register a unique
callback per tunnel netdev. For block cb registry, the same PF may
register multiple cbs to the same block if using TC shared blocks.
Instead of the rpriv, use the pointer to the allocated indr_priv data as
the identifier here. This means that there can be a unique block callback
for each PF/tunnel netdev combo.

Fixes: f5bc2c5de101 ("net/mlx5e: Support TC indirect block notifications
for eswitch uplink reprs")
Signed-off-by: Eli Britstein <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
6 years agonet/mlx5e: Fix wrong (zero) TX drop counter indication for representor
Tariq Toukan [Thu, 8 Nov 2018 10:06:53 +0000 (12:06 +0200)]
net/mlx5e: Fix wrong (zero) TX drop counter indication for representor

For representors, the TX dropped counter is not folded from the
per-ring counters. Fix it.

Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
6 years agonet/mlx5e: Fix wrong error code return on FEC query failure
Shay Agroskin [Sun, 9 Dec 2018 10:00:13 +0000 (12:00 +0200)]
net/mlx5e: Fix wrong error code return on FEC query failure

Advertised and configured FEC query failure resulted in printing
wrong error code.

Fixes: 6cfa94605091 ("net/mlx5e: Ethtool driver callback for query/set FEC policy")
Signed-off-by: Shay Agroskin <[email protected]>
Reported-by: Or Gerlitz <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
6 years agonet/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
Cong Wang [Tue, 4 Dec 2018 06:14:04 +0000 (22:14 -0800)]
net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames

When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, we have a switch which pads non-zero octets, this
causes kernel hardware checksum fault repeatedly.

Prior to:
commit '88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE ...")'
skb checksum was forced to be CHECKSUM_NONE when padding is detected.
After it, we need to keep skb->csum updated, like what we do for RXFCS.
However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP
headers, it is not worthy the effort as the packets are so small that
CHECKSUM_COMPLETE can't save anything.

Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
Cc: Eric Dumazet <[email protected]>
Cc: Tariq Toukan <[email protected]>
Cc: Nikola Ciprich <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
6 years agotools: bpftool: Cleanup license mess
Thomas Gleixner [Thu, 17 Jan 2019 23:14:24 +0000 (00:14 +0100)]
tools: bpftool: Cleanup license mess

Precise and non-ambiguous license information is important. The recent
relicensing of the bpftools introduced a license conflict.

The files have now:

     SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause

and

     * This program is free software; you can redistribute it and/or
     * modify it under the terms of the GNU General Public License
     * as published by the Free Software Foundation; either version
     * 2 of the License, or (at your option) any later version

Amazingly about 20 people acked that change and neither they nor the
committer noticed. Oh well.

Digging deeper: The files were imported from the iproute2 repository with
the GPL V2 or later boiler plate text in commit b66e907cfee2 ("tools:
bpftool: copy JSON writer from iproute2 repository")

Looking at the iproute2 repository at

  git://git.kernel.org/pub/scm/network/iproute2/iproute2.git

the following commit is the equivivalent:

  commit d9d8c839 ("json_writer: add SPDX Identifier (GPL-2/BSD-2)")

That commit explicitly removes the boiler plate and relicenses the code
uner GPL-2.0-only and BSD-2-Clause. As Steven wrote the original code and
also the relicensing commit, it's assumed that the relicensing was intended
to do exaclty that. Just the kernel side update failed to remove the boiler
plate. Do so now.

Fixes: 907b22365115 ("tools: bpftool: dual license all files")
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: YueHaibing <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Sean Young <[email protected]>
Cc: Jiri Benc <[email protected]>
Cc: David Calavera <[email protected]>
Cc: Andrey Ignatov <[email protected]>
Cc: Joe Stringer <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Petar Penkov <[email protected]>
Cc: Sandipan Das <[email protected]>
Cc: Prashant Bhole <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Taeung Song <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Quentin Monnet <[email protected]>
CC: [email protected]
Cc: [email protected]
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
6 years agobpf: fix inner map masking to prevent oob under speculation
Daniel Borkmann [Thu, 17 Jan 2019 15:34:45 +0000 (16:34 +0100)]
bpf: fix inner map masking to prevent oob under speculation

During review I noticed that inner meta map setup for map in
map is buggy in that it does not propagate all needed data
from the reference map which the verifier is later accessing.

In particular one such case is index masking to prevent out of
bounds access under speculative execution due to missing the
map's unpriv_array/index_mask field propagation. Fix this such
that the verifier is generating the correct code for inlined
lookups in case of unpriviledged use.

Before patch (test_verifier's 'map in map access' dump):

  # bpftool prog dump xla id 3
     0: (62) *(u32 *)(r10 -4) = 0
     1: (bf) r2 = r10
     2: (07) r2 += -4
     3: (18) r1 = map[id:4]
     5: (07) r1 += 272                |
     6: (61) r0 = *(u32 *)(r2 +0)     |
     7: (35) if r0 >= 0x1 goto pc+6   | Inlined map in map lookup
     8: (54) (u32) r0 &= (u32) 0      | with index masking for
     9: (67) r0 <<= 3                 | map->unpriv_array.
    10: (0f) r0 += r1                 |
    11: (79) r0 = *(u64 *)(r0 +0)     |
    12: (15) if r0 == 0x0 goto pc+1   |
    13: (05) goto pc+1                |
    14: (b7) r0 = 0                   |
    15: (15) if r0 == 0x0 goto pc+11
    16: (62) *(u32 *)(r10 -4) = 0
    17: (bf) r2 = r10
    18: (07) r2 += -4
    19: (bf) r1 = r0
    20: (07) r1 += 272                |
    21: (61) r0 = *(u32 *)(r2 +0)     | Index masking missing (!)
    22: (35) if r0 >= 0x1 goto pc+3   | for inner map despite
    23: (67) r0 <<= 3                 | map->unpriv_array set.
    24: (0f) r0 += r1                 |
    25: (05) goto pc+1                |
    26: (b7) r0 = 0                   |
    27: (b7) r0 = 0
    28: (95) exit

After patch:

  # bpftool prog dump xla id 1
     0: (62) *(u32 *)(r10 -4) = 0
     1: (bf) r2 = r10
     2: (07) r2 += -4
     3: (18) r1 = map[id:2]
     5: (07) r1 += 272                |
     6: (61) r0 = *(u32 *)(r2 +0)     |
     7: (35) if r0 >= 0x1 goto pc+6   | Same inlined map in map lookup
     8: (54) (u32) r0 &= (u32) 0      | with index masking due to
     9: (67) r0 <<= 3                 | map->unpriv_array.
    10: (0f) r0 += r1                 |
    11: (79) r0 = *(u64 *)(r0 +0)     |
    12: (15) if r0 == 0x0 goto pc+1   |
    13: (05) goto pc+1                |
    14: (b7) r0 = 0                   |
    15: (15) if r0 == 0x0 goto pc+12
    16: (62) *(u32 *)(r10 -4) = 0
    17: (bf) r2 = r10
    18: (07) r2 += -4
    19: (bf) r1 = r0
    20: (07) r1 += 272                |
    21: (61) r0 = *(u32 *)(r2 +0)     |
    22: (35) if r0 >= 0x1 goto pc+4   | Now fixed inlined inner map
    23: (54) (u32) r0 &= (u32) 0      | lookup with proper index masking
    24: (67) r0 <<= 3                 | for map->unpriv_array.
    25: (0f) r0 += r1                 |
    26: (05) goto pc+1                |
    27: (b7) r0 = 0                   |
    28: (b7) r0 = 0
    29: (95) exit

Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
6 years agobpf: pull in pkt_sched.h header for tooling to fix bpftool build
Daniel Borkmann [Thu, 17 Jan 2019 15:15:09 +0000 (16:15 +0100)]
bpf: pull in pkt_sched.h header for tooling to fix bpftool build

Dan reported that bpftool does not compile for him:

  $ make tools/bpf
    DESCEND  bpf

  Auto-detecting system features:
  ..                        libbfd: [ on  ]
  ..        disassembler-four-args: [ OFF ]

    DESCEND  bpftool

  Auto-detecting system features:
  ..                        libbfd: [ on  ]
  ..        disassembler-four-args: [ OFF ]

    CC       /opt/linux.git/tools/bpf/bpftool/net.o
  In file included from /opt/linux.git/tools/include/uapi/linux/pkt_cls.h:6:0,
                 from /opt/linux.git/tools/include/uapi/linux/tc_act/tc_bpf.h:14,
                 from net.c:13:
  net.c: In function 'show_dev_tc_bpf':
  net.c:164:21: error: 'TC_H_CLSACT' undeclared (first use in this function)
    handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
  [...]

Fix it by importing pkt_sched.h header copy into tooling
infrastructure.

Fixes: 49a249c38726 ("tools/bpftool: copy a few net uapi headers to tools directory")
Fixes: f6f3bac08ff9 ("tools/bpf: bpftool: add net support")
Reported-by: Dan Gilson <[email protected]>
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=202315
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
6 years agoMerge branch 'mlxsw-fixes'
David S. Miller [Fri, 18 Jan 2019 23:12:16 +0000 (15:12 -0800)]
Merge branch 'mlxsw-fixes'

Ido Schimmel says:

====================
mlxsw: Various fixes

This patchset contains small fixes in mlxsw and one fix in the bridge
driver.

Patches #1-#4 perform small adjustments in PCI and FID code following
recent tests that were performed on the Spectrum-2 ASIC.

Patch #5 fixes the bridge driver to mark FDB entries that were added by
user as such. Otherwise, these entries will be ignored by underlying
switch drivers.

Patch #6 fixes a long standing issue in mlxsw where the driver
incorrectly programmed static FDB entries as both static and sticky.

Patches #7-#8 add test cases for above mentioned bugs.

Please consider patches #1, #2 and #4 for stable.
====================

Signed-off-by: David S. Miller <[email protected]>
6 years agoselftests: forwarding: Add a test case for externally learned FDB entries
Ido Schimmel [Fri, 18 Jan 2019 15:58:03 +0000 (15:58 +0000)]
selftests: forwarding: Add a test case for externally learned FDB entries

Test that externally learned FDB entries can roam, but not age out.

Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoselftests: mlxsw: Test FDB offload indication
Ido Schimmel [Fri, 18 Jan 2019 15:58:02 +0000 (15:58 +0000)]
selftests: mlxsw: Test FDB offload indication

Test that externally learned FDB entries added from user space are
marked as offloaded.

Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agomlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
Ido Schimmel [Fri, 18 Jan 2019 15:58:01 +0000 (15:58 +0000)]
mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky

The driver currently treats static FDB entries as both static and
sticky. This is incorrect and prevents such entries from being roamed to
a different port via learning.

Fix this by configuring static entries with ageing disabled and roaming
enabled.

In net-next we can add proper support for the newly introduced 'sticky'
flag.

Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-by: Alexander Petrovskiy <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: bridge: Mark FDB entries that were added by user as such
Ido Schimmel [Fri, 18 Jan 2019 15:58:00 +0000 (15:58 +0000)]
net: bridge: Mark FDB entries that were added by user as such

Externally learned entries can be added by a user or by a switch driver
that is notifying the bridge driver about entries that were learned in
hardware.

In the first case, the entries are not marked with the 'added_by_user'
flag, which causes switch drivers to ignore them and not offload them.

The 'added_by_user' flag can be set on externally learned FDB entries
based on the 'swdev_notify' parameter in br_fdb_external_learn_add(),
which effectively means if the created / updated FDB entry was added by
a user or not.

Fixes: 816a3bed9549 ("switchdev: Add fdb.added_by_user to switchdev notifications")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-by: Alexander Petrovskiy <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Cc: Roopa Prabhu <[email protected]>
Cc: Nikolay Aleksandrov <[email protected]>
Cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>
6 years agomlxsw: spectrum_fid: Update dummy FID index
Nir Dotan [Fri, 18 Jan 2019 15:57:59 +0000 (15:57 +0000)]
mlxsw: spectrum_fid: Update dummy FID index

When using a tc flower action of egress mirred redirect, the driver adds
an implicit FID setting action. This implicit action sets a dummy FID to
the packet and is used as part of a design for trapping unmatched flows
in OVS.  While this implicit FID setting action is supposed to be a NOP
when a redirect action is added, in Spectrum-2 the FID record is
consulted as the dummy FID index is an 802.1D FID index and the packet
is dropped instead of being redirected.

Set the dummy FID index value to be within 802.1Q range. This satisfies
both Spectrum-1 which ignores the FID and Spectrum-2 which identifies it
as an 802.1Q FID and will then follow the redirect action.

Fixes: c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Nir Dotan <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agomlxsw: pci: Return error on PCI reset timeout
Nir Dotan [Fri, 18 Jan 2019 15:57:57 +0000 (15:57 +0000)]
mlxsw: pci: Return error on PCI reset timeout

Return an appropriate error in the case when the driver timeouts on waiting
for firmware to go out of PCI reset.

Fixes: 233fa44bd67a ("mlxsw: pci: Implement reset done check")
Signed-off-by: Nir Dotan <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agomlxsw: pci: Increase PCI SW reset timeout
Nir Dotan [Fri, 18 Jan 2019 15:57:56 +0000 (15:57 +0000)]
mlxsw: pci: Increase PCI SW reset timeout

Spectrum-2 PHY layer introduces a calibration period which is a part of the
Spectrum-2 firmware boot process. Hence increase the SW timeout waiting for
the firmware to come out of boot. This does not increase system boot time
in cases where the firmware PHY calibration process is done quickly.

Fixes: c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Nir Dotan <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agomlxsw: pci: Ring CQ's doorbell before RDQ's
Ido Schimmel [Fri, 18 Jan 2019 15:57:55 +0000 (15:57 +0000)]
mlxsw: pci: Ring CQ's doorbell before RDQ's

When a packet should be trapped to the CPU the device consumes a WQE
(work queue element) from an RDQ (receive descriptor queue) and copies
the packet to the address specified in the WQE. The device then tries to
post a CQE (completion queue element) that contains various metadata
(e.g., ingress port) about the packet to a CQ (completion queue).

In case the device managed to consume a WQE, but did not manage to post
the corresponding CQE, it will get stuck. This unlikely situation can be
triggered due to the scheme the driver is currently using to process
CQEs.

The driver will consume up to 512 CQEs at a time and after processing
each corresponding WQE it will ring the RDQ's doorbell, letting the
device know that a new WQE was posted for it to consume. Only after
processing all the CQEs (up to 512), the driver will ring the CQ's
doorbell, letting the device know that new ones can be posted.

Fix this by having the driver ring the CQ's doorbell for every processed
CQE, but before ringing the RDQ's doorbell. This guarantees that
whenever we post a new WQE, there is a corresponding CQE available. Copy
the currently processed CQE to prevent the device from overwriting it
with a new CQE after ringing the doorbell.

Note that the driver still arms the CQ only after processing all the
pending CQEs, so that interrupts for this CQ will only be delivered
after the driver finished its processing.

Before commit 8404f6f2e8ed ("mlxsw: pci: Allow to use CQEs of version 1
and version 2") the issue was virtually impossible to trigger since the
number of CQEs was twice the number of WQEs and the number of CQEs
processed at a time was equal to the number of available WQEs.

Fixes: 8404f6f2e8ed ("mlxsw: pci: Allow to use CQEs of version 1 and version 2")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-by: Semion Lisyansky <[email protected]>
Tested-by: Semion Lisyansky <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoMerge branch 'hns3-fixes'
David S. Miller [Fri, 18 Jan 2019 23:10:22 +0000 (15:10 -0800)]
Merge branch 'hns3-fixes'

Huazhong Tan says:

====================
net: hns3: code optimizations & bugfixes for HNS3 driver

This patchset includes bugfixes and code optimizations for the HNS3
ethernet controller driver
====================

Signed-off-by: David S. Miller <[email protected]>
6 years agonet: hns3: add HNAE3_RESTORE_CLIENT interface in enet module
Yunsheng Lin [Fri, 18 Jan 2019 08:13:14 +0000 (16:13 +0800)]
net: hns3: add HNAE3_RESTORE_CLIENT interface in enet module

The HNAE3_INIT_CLIENT interface is also used when changing tc
configuration, vlan/mac hardware table does not need to be restored
when tc configuration changes.

This patch adds a HNAE3_RESTORE_CLIENT interface to restore the
vlan/mac hardware table when resetting.

Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: hns3: do reinitialization while ETS configuration changed
Huazhong Tan [Fri, 18 Jan 2019 08:13:13 +0000 (16:13 +0800)]
net: hns3: do reinitialization while ETS configuration changed

When the ETS information is changed, the network device needs to be
re-initialized, otherwise the information such as the receiving queue
will be incorrect.

Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: hns3: fix wrong combined count returned by ethtool -l
Huazhong Tan [Fri, 18 Jan 2019 08:13:12 +0000 (16:13 +0800)]
net: hns3: fix wrong combined count returned by ethtool -l

The current code returns the number of all queues that can be used and
the number of queues that have been allocated, which is incorrect.
What should be returned is the number of queues allocated for each enabled
TC and the number of queues that can be allocated.

This patch fixes it.

Fixes: 482d2e9c1cc7 ("net: hns3: add support to query tqps number")
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: hns3: adjust the use of alloc_tqps and num_tqps
Huazhong Tan [Fri, 18 Jan 2019 08:13:11 +0000 (16:13 +0800)]
net: hns3: adjust the use of alloc_tqps and num_tqps

The alloc_tqps field of struct hclge_vport represents the total number
of tqps allocated to the vport. The num_tqps of struct
hnae3_knic_private_info indicates the total number of all enabled tqps,
which needs to be distinguished during use.

Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: hns3: fix user configuration loss for ethtool -L
Huazhong Tan [Fri, 18 Jan 2019 08:13:10 +0000 (16:13 +0800)]
net: hns3: fix user configuration loss for ethtool -L

Ethtool -L option with the combined parameter is for changing the number of
multi-purpose channels of the specified network device. Under the current
scheme, the user configuration information will be lost after the reset or
TC information changed.

This patch fixes this issue. By default, this configuration is set to the
minimum between the number of queues for each enabled TCs and the maximum
number support available in the hardware. When there is a user
configuration, regardless of the reset or TC information change, it should
keep the user's configuration while it is under the hardware limits,
otherwise set to the maximum number support available in the hardware.

Fixes: 09f2af6405b8 ("net: hns3: add support to modify tqps number")
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: hns3: remove redundant codes in hclge_knic_setup
Huazhong Tan [Fri, 18 Jan 2019 08:13:09 +0000 (16:13 +0800)]
net: hns3: remove redundant codes in hclge_knic_setup

The TC info will be updated in hclge_tm_vport_tc_info_update(),
so hclge_knic_setup() no need to do it again.

Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: hns3: modify parameter checks in the hns3_set_channels
Huazhong Tan [Fri, 18 Jan 2019 08:13:08 +0000 (16:13 +0800)]
net: hns3: modify parameter checks in the hns3_set_channels

The number of queues for each enabled TC should range from 1 to
the maximum available value, and return directly if the value
is same as the current one.

Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: hns3: add interface hclge_tm_bp_setup
Huazhong Tan [Fri, 18 Jan 2019 08:13:07 +0000 (16:13 +0800)]
net: hns3: add interface hclge_tm_bp_setup

Provide a common interface to complete the back pressure settings
of all enabled TCs. So other functions directly call this interface
to complete the corresponding operation.

Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: hns3: reuse reinitialization interface in the hns3_set_channels
Huazhong Tan [Fri, 18 Jan 2019 08:13:06 +0000 (16:13 +0800)]
net: hns3: reuse reinitialization interface in the hns3_set_channels

There is already common interface for network device reinitialization,
so hns3_set_channels() should just call them.

Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: hns3: remove unnecessary hns3_adjust_tqps_num
Huazhong Tan [Fri, 18 Jan 2019 08:13:05 +0000 (16:13 +0800)]
net: hns3: remove unnecessary hns3_adjust_tqps_num

The parameter passed to hns3_set_channels() are already the number of
queues per channel of the enabled TC, so it is not need to divide
the number of enabled TCs.

Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: hns3: remove unused member in struct hns3_enet_ring
Huazhong Tan [Fri, 18 Jan 2019 08:13:04 +0000 (16:13 +0800)]
net: hns3: remove unused member in struct hns3_enet_ring

The irq_init_flag field in struct hns3_enet_ring is unnecessary.
This patch removes it.

Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: hns3: modify enet reinitialization interface
Huazhong Tan [Fri, 18 Jan 2019 08:13:03 +0000 (16:13 +0800)]
net: hns3: modify enet reinitialization interface

hns3_reset_notify_init_enet and hns3_reset_notify_uninit_enet are the
reinitialization interface that will be called when the device reset,
the number of TC changed, or the queue length changed. So these two
function should call hns3_get_ring_config() and hns3_put_ring_config()
to allocate and free memory for the ring with the correct number.

Also this patch fixes a double free problem when
hns3_reset_notify_uninit_enet calling hns3_nic_dealloc_vector_data

Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoMerge branch 'Devlink-health-reporting-and-recovery-system'
David S. Miller [Fri, 18 Jan 2019 22:51:23 +0000 (14:51 -0800)]
Merge branch 'Devlink-health-reporting-and-recovery-system'

Eran Ben Elisha says:

====================
Devlink health reporting and recovery system

The health mechanism is targeted for Real Time Alerting, in order to know when
something bad had happened to a PCI device
- Provide alert debug information
- Self healing
- If problem needs vendor support, provide a way to gather all needed debugging
  information.

The main idea is to unify and centralize driver health reports in the
generic devlink instance and allow the user to set different
attributes of the health reporting and recovery procedures.

The devlink health reporter:
Device driver creates a "health reporter" per each error/health type.
Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
or unknown (driver specific).
For each registered health reporter a driver can issue error/health reports
asynchronously. All health reports handling is done by devlink.
Device driver can provide specific callbacks for each "health reporter", e.g.
 - Recovery procedures
 - Diagnostics and object dump procedures
 - OOB initial attributes
Different parts of the driver can register different types of health reporters
with different handlers.

Once an error is reported, devlink health will do the following actions:
  * A log is being send to the kernel trace events buffer
  * Health status and statistics are being updated for the reporter instance
  * Object dump is being taken and saved at the reporter instance (as long as
    there is no other dump which is already stored)
  * Auto recovery attempt is being done. Depends on:
    - Auto-recovery configuration
    - Grace period vs. time passed since last recover

The user interface:
User can access/change each reporter attributes and driver specific callbacks
via devlink, e.g per error type (per health reporter)
 - Configure reporter's generic attributes (like: Disable/enable auto recovery)
 - Invoke recovery procedure
 - Run diagnostics
 - Object dump

The devlink health interface (via netlink):
DEVLINK_CMD_HEALTH_REPORTER_GET
  Retrieves status and configuration info per DEV and reporter.
DEVLINK_CMD_HEALTH_REPORTER_SET
  Allows reporter-related configuration setting.
DEVLINK_CMD_HEALTH_REPORTER_RECOVER
  Triggers a reporter's recovery procedure.
DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE
  Retrieves diagnostics data from a reporter on a device.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET
  Retrieves the last stored dump. Devlink health
  saves a single dump. If an dump is not already stored by the devlink
  for this reporter, devlink generates a new dump.
  dump output is defined by the reporter.
DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR
  Clears the last saved dump file for the specified reporter.

                                               netlink
                                      +--------------------------+
                                      |                          |
                                      |            +             |
                                      |            |             |
                                      +--------------------------+
                                                   |request for ops
                                                   |(diagnose,
 mlx5_core                             devlink     |recover,
                                                   |dump)
+--------+                            +--------------------------+
|        |                            |    reporter|             |
|        |                            |  +---------v----------+  |
|        |   ops execution            |  |                    |  |
|     <----------------------------------+                    |  |
|        |                            |  |                    |  |
|        |                            |  + ^------------------+  |
|        |                            |    | request for ops     |
|        |                            |    | (recover, dump)     |
|        |                            |    |                     |
|        |                            |  +-+------------------+  |
|        |     health report          |  | health handler     |  |
|        +------------------------------->                    |  |
|        |                            |  +--------------------+  |
|        |     health reporter create |                          |
|        +---------------------------->                          |
+--------+                            +--------------------------+

In this patchset, mlx5e TX reporter is implemented.

v2:
- Remove FW* reporters to decrease the amount of patches in the patchset
====================

Signed-off-by: David S. Miller <[email protected]>
6 years agodevlink: Add Documentation/networking/devlink-health.txt
Aya Levin [Thu, 17 Jan 2019 21:59:20 +0000 (23:59 +0200)]
devlink: Add Documentation/networking/devlink-health.txt

This patch adds a new file to add information about devlink health
mechanism.

Signed-off-by: Aya Levin <[email protected]>
Signed-off-by: Eran Ben Elisha <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet/mlx5e: Add TX timeout support for mlx5e TX reporter
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:19 +0000 (23:59 +0200)]
net/mlx5e: Add TX timeout support for mlx5e TX reporter

With this patch, ndo_tx_timeout callback will be redirected to the TX
reporter in order to detect a TX timeout error and report it to the
devlink health. (The watchdog detects TX timeouts, but the driver verify
the issue still exists before launching any recover method).

In addition, recover from TX timeout in case of lost interrupt was added
to the TX reporter recover method. The TX timeout recover from lost
interrupt is not a new feature in the driver, this patch re-organize the
functionality and move it to the TX reporter recovery flow.

TX timeout example:
(with auto_recover set to false, if set to true, the manual recover and
diagnose sections are irrelevant)

$cat /sys/kernel/debug/tracing/trace
...
devlink_health_report: bus_name=pci dev_name=0000:00:09.0
driver_name=mlx5_core reporter_name=TX: TX timeout on queue: 0, SQ: 0xd8a, CQ:
0x406, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 13972000

$devlink health diagnose pci/0000:00:09 reporter TX
SQ 0xd8a: HW state: 1, stopped: 1
SQ 0xe44: HW state: 1, stopped: 0
SQ 0xeb4: HW state: 1, stopped: 0
SQ 0xf1f: HW state: 1, stopped: 0
SQ 0xf80: HW state: 1, stopped: 0
SQ 0xfe5: HW state: 1, stopped: 0

$devlink health recover pci/0000:00:09 reporter TX
$devlink health show
pci/0000:00:09.0:
  name TX state healthy #err 1 #recover 1 last_dump_ts N/A dump_available false
    attributes:
        grace_period 500 auto_recover false

Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet/mlx5e: Add TX reporter support
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:18 +0000 (23:59 +0200)]
net/mlx5e: Add TX reporter support

Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of TX errors.
This patch declares the TX reporter operations and allocate it using the
devlink health API. Currently, this reporter supports reporting and
recovering from send error CQE only. In addition, it adds diagnose
information for the open SQs.

For a local SQ recover (due to driver error report), in case of SQ recover
failure, the recover operation will be considered as a failure.
For a full TX recover, an attempt to close and open the channels will be
done. If this one passed successfully, it will be considered as a
successful recover.

The SQ recover from error CQE flow is not a new feature in the driver,
this patch re-organize the functions and adapt them for the devlink
health API. For this purpose, move code from en_main.c to a new file
named reporter_tx.c.

Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agodevlink: Add health dump {get,clear} commands
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:17 +0000 (23:59 +0200)]
devlink: Add health dump {get,clear} commands

Add devlink health dump commands, in order to run an dump operation
over a specific reporter.

The supported operations are dump_get in order to get last saved
dump (if not exist, dump now) and dump_clear to clear last saved
dump.

It is expected from driver's callback for diagnose command to fill it
via the buffer descriptors API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.

Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agodevlink: Add health diagnose command
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:16 +0000 (23:59 +0200)]
devlink: Add health diagnose command

Add devlink health diagnose command, in order to run a diagnose
operation over a specific reporter.

It is expected from driver's callback for diagnose command to fill it
via the buffer descriptors API. Devlink will parse it and convert it to
netlink nla API in order to pass it to the user.

Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agodevlink: Add health recover command
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:15 +0000 (23:59 +0200)]
devlink: Add health recover command

Add devlink health recover command to the uapi, in order to allow the user
to execute a recover operation over a specific reporter.

Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agodevlink: Add health set command
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:14 +0000 (23:59 +0200)]
devlink: Add health set command

Add devlink health set command, in order to set configuration parameters
for a specific reporter.
Supported parameters are:
- graceful_period: Time interval between auto recoveries (in msec)
- auto_recover: Determines if the devlink shall execute recover upon
receiving error for the reporter

Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agodevlink: Add health get command
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:13 +0000 (23:59 +0200)]
devlink: Add health get command

Add devlink health get command to provide reporter/s data for user space.
Add the ability to get data per reporter or dump data from all available
reporters.

Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agodevlink: Add health report functionality
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:12 +0000 (23:59 +0200)]
devlink: Add health report functionality

Upon error discover, every driver can report it to the devlink health
mechanism via devlink_health_report function, using the appropriate
reporter registered to it. Driver can pass error specific context which
will be delivered to it as part of the dump / recovery callbacks.

Once an error is reported, devlink health will do the following actions:
* A log is being send to the kernel trace events buffer
* Health status and statistics are being updated for the reporter instance
* Object dump is being taken and stored at the reporter instance (as long
  as there is no other dump which is already stored)
* Auto recovery attempt is being done. depends on:
  - Auto Recovery configuration
  - Grace period vs. time since last recover

Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agodevlink: Add health reporter create/destroy functionality
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:11 +0000 (23:59 +0200)]
devlink: Add health reporter create/destroy functionality

Devlink health reporter is an instance for reporting, diagnosing and
recovering from run time errors discovered by the reporters.
Define it's data structure and supported operations.
In addition, expose devlink API to create and destroy a reporter.
Each devlink instance will hold it's own reporters list.

As part of the allocation, driver shall provide a set of callbacks which
will be used the devlink in order to handle health reports and user
commands related to this reporter. In addition, driver is entitled to
provide some priv pointer, which can be fetched from the reporter by
devlink_health_reporter_priv function.

For each reporter, devlink will hold a metadata of statistics,
buffers and status.

Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agodevlink: Add health buffer support
Eran Ben Elisha [Thu, 17 Jan 2019 21:59:10 +0000 (23:59 +0200)]
devlink: Add health buffer support

Devlink health buffer is a mechanism to pass descriptors between drivers
and devlink. The API allows the driver to add objects, object pair,
value array (nested attributes), value and name.

Driver can use this API to fill the buffers in a format which can be
translated by the devlink to the netlink message.

In order to fulfill it, an internal buffer descriptor is defined. This
will hold the data and metadata per each attribute and by used to pass
actual commands to the netlink.

This mechanism will be later used in devlink health for dump and diagnose
data store by the drivers.

Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet_sched: add hit counter for matchall
Cong Wang [Thu, 17 Jan 2019 20:44:25 +0000 (12:44 -0800)]
net_sched: add hit counter for matchall

Although matchall always matches packets, however, it still
relies on a protocol match first. So it is still useful to have
such a counter for matchall. Of course, unlike u32, every time
we hit a matchall filter, it is always a success, so we don't
have to distinguish them.

Sample output:

filter protocol 802.1Q pref 100 matchall chain 0
filter protocol 802.1Q pref 100 matchall chain 0 handle 0x1
  not_in_hw (rule hit 10)
action order 1: vlan  pop continue
 index 1 ref 1 bind 1 installed 40 sec used 1 sec
Action statistics:
Sent 836 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

Reported-by: Martin Olsson <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: Jiri Pirko <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoMerge branch 'phy-improve-stopping-PHY'
David S. Miller [Fri, 18 Jan 2019 22:12:25 +0000 (14:12 -0800)]
Merge branch 'phy-improve-stopping-PHY'

Heiner Kallweit says:

====================
net: phy: improve stopping PHY

This patchset improves and simplifies stopping the PHY.

Heiner Kallweit (3):
  net: phy: stop PHY if needed when entering phy_disconnect
  net: phy: ensure phylib state machine is stopped after calling phy_stop
  net: phy: remove phy_stop_interrupts

v2:
- break down the patch to a patchset
v3:
- don't warn if driver didn't call phy_stop before phy_disconnect
====================

Signed-off-by: David S. Miller <[email protected]>
6 years agonet: phy: remove phy_stop_interrupts
Heiner Kallweit [Thu, 17 Jan 2019 19:09:21 +0000 (20:09 +0100)]
net: phy: remove phy_stop_interrupts

Interrupts have been disabled in phy_stop() already. So we can remove
phy_stop_interrupts() and free the interrupt in phy_disconnect()
directly.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: phy: ensure phylib state machine is stopped after calling phy_stop
Heiner Kallweit [Thu, 17 Jan 2019 19:08:39 +0000 (20:08 +0100)]
net: phy: ensure phylib state machine is stopped after calling phy_stop

The call to the phylib state machine in phy_stop() just ensures that
the state machine isn't re-triggered, but a state machine call may
be scheduled already. So lets's call phy_stop_machine().
This also allows to get rid of the call to phy_stop_machine() in
phy_disconnect().

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: phy: stop PHY if needed when entering phy_disconnect
Heiner Kallweit [Thu, 17 Jan 2019 19:07:54 +0000 (20:07 +0100)]
net: phy: stop PHY if needed when entering phy_disconnect

Stop PHY if needed when entering phy_disconnect. This allows drivers
that don't need a separate call to phy_stop() to omit this call.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoMAINTAINERS: update email addresses of liquidio driver maintainers
Felix Manlunas [Thu, 17 Jan 2019 18:07:45 +0000 (18:07 +0000)]
MAINTAINERS: update email addresses of liquidio driver maintainers

Update email addresses of liquidio driver maintainers.  Also remove a
former maintainer.

Signed-off-by: Felix Manlunas <[email protected]>
Acked-by: Derek Chickles <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: Fix typo in NET_FAILOVER help text
Jonathan Neuschäfer [Thu, 17 Jan 2019 17:02:18 +0000 (18:02 +0100)]
net: Fix typo in NET_FAILOVER help text

"also enables" should not be spelled as one word.

Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")
Signed-off-by: Jonathan Neuschäfer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: Fix usage of pskb_trim_rcsum
Ross Lagerwall [Thu, 17 Jan 2019 15:34:38 +0000 (15:34 +0000)]
net: Fix usage of pskb_trim_rcsum

In certain cases, pskb_trim_rcsum() may change skb pointers.
Reinitialize header pointers afterwards to avoid potential
use-after-frees. Add a note in the documentation of
pskb_trim_rcsum(). Found by KASAN.

Signed-off-by: Ross Lagerwall <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agotcp: declare tcp_mmap() only when CONFIG_MMU is set
Yafang Shao [Thu, 17 Jan 2019 10:03:14 +0000 (18:03 +0800)]
tcp: declare tcp_mmap() only when CONFIG_MMU is set

Since tcp_mmap() is defined when CONFIG_MMU is set.

Signed-off-by: Yafang Shao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: jme: fix indentation issues
Colin Ian King [Thu, 17 Jan 2019 00:03:26 +0000 (00:03 +0000)]
net: jme: fix indentation issues

There are two lines that have indentation issues, fix these. Also remove
an empty line.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: vxge: fix indentation issue
Colin Ian King [Wed, 16 Jan 2019 23:59:10 +0000 (23:59 +0000)]
net: vxge: fix indentation issue

There is a goto statement that indented too deeply, fix it.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: phy: improve get_phy_id
Heiner Kallweit [Wed, 16 Jan 2019 18:52:51 +0000 (19:52 +0100)]
net: phy: improve get_phy_id

Only caller of get_phy_id() is get_phy_device(). There a PHY ID of
0xffffffff is translated back to -ENODEV. So we can avoid some
overhead by returning -ENODEV directly.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: phy: remove state PHY_CHANGELINK
Heiner Kallweit [Wed, 16 Jan 2019 18:47:57 +0000 (19:47 +0100)]
net: phy: remove state PHY_CHANGELINK

Since recent changes to the phylib state machine state PHY_CHANGELINK
isn't used any longer. Therefore let's remove it.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: ip6_gre: remove gre_hdr_len from ip6erspan_rcv
Lorenzo Bianconi [Wed, 16 Jan 2019 18:38:05 +0000 (19:38 +0100)]
net: ip6_gre: remove gre_hdr_len from ip6erspan_rcv

Remove gre_hdr_len from ip6erspan_rcv routine signature since
it is not longer used

Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoMerge tag 'media/v5.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Fri, 18 Jan 2019 19:34:10 +0000 (07:34 +1200)]
Merge tag 'media/v5.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - a regression fix at v4l2 core, with affects multi-plane streams

 - a fix at vim2m driver

* tag 'media/v5.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: vim2m: only cancel work if it is for right context
  media: v4l: ioctl: Validate num_planes for debug messages
  media: v4l: ioctl: Validate num_planes before using it
  media: v4l2-ioctl: Clear only per-plane reserved fields

6 years agoMerge tag 'pci-v5.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Linus Torvalds [Fri, 18 Jan 2019 19:26:16 +0000 (07:26 +1200)]
Merge tag 'pci-v5.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas::

 - Fix PCI kconfig menu organization (Rob Herring)

 - Fix pci_alloc_irq_vectors_affinity() error return to allow "reduce
   and retry" for drivers using IRQ sets (Ming Lei)

 - Fix "pci=disable_acs_redir" initdata use-after-free problem (Logan
   Gunthorpe)

* tag 'pci-v5.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Fix __initdata issue with "pci=disable_acs_redir" parameter
  PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()
  PCI: Fix PCI kconfig menu organization

6 years agoMerge tag 'i3c/fixes-for-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Jan 2019 19:23:25 +0000 (07:23 +1200)]
Merge tag 'i3c/fixes-for-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux

Pull i3c fixes from Boris Brezillon:

 - Fix the error check on master->sysclk val in the Cadence driver

 - Fix reattach implementation in the Designware driver

* tag 'i3c/fixes-for-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
  i3c: master: dw-i3c-master: fix i3c_attach/reattach
  i3c: master: Fix an error checking typo in 'cdns_i3c_master_probe()'

6 years agoMerge tag 'mtd/fixes-for-5.0-rc3' of git://git.infradead.org/linux-mtd
Linus Torvalds [Fri, 18 Jan 2019 19:21:43 +0000 (07:21 +1200)]
Merge tag 'mtd/fixes-for-5.0-rc3' of git://git.infradead.org/linux-mtd

Pull mtd fixes from Boris Brezillon:
 "Raw NAND changes:

   - jz4740: fix a compilation warning

   - fsmc: fix a regression introduced by ->select_chip() deprecation

   - denali: fix a regression introduced by NAND_KEEP_TIMINGS addition"

* tag 'mtd/fixes-for-5.0-rc3' of git://git.infradead.org/linux-mtd:
  mtd: rawnand: denali: get ->setup_data_interface() working again
  mtd: nand: jz4740: fix '__iomem *' vs. '* __iomem'
  mtd: rawnand: fsmc: Keep bank enable bit set

6 years agoMerge tag 'regmap-fix-v5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Jan 2019 19:17:19 +0000 (07:17 +1200)]
Merge tag 'regmap-fix-v5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
 "The cleanups for the way we handle type information introduced during
  the merge window revealed that we'd been abusing the irq APIs for a
  long time, causing breakage for systems.

  This has a couple of minimal fixes for that which restore the previous
  behaviour for the time being, we'll fix it properly for v5.1 but
  that'd be a bit much to do as a bug fix"

* tag 'regmap-fix-v5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap-irq: do not write mask register if mask_base is zero
  regmap: regmap-irq: silently ignore unsupported type settings

6 years agonet: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling
Thomas Petazzoni [Wed, 16 Jan 2019 09:53:58 +0000 (10:53 +0100)]
net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling

The current code in __mdiobus_register() doesn't properly handle
failures returned by the devm_gpiod_get_optional() call: it returns
immediately, without unregistering the device that was added by the
call to device_register() earlier in the function.

This leaves a stale device, which then causes a NULL pointer
dereference in the code that handles deferred probing:

[    1.489982] Unable to handle kernel NULL pointer dereference at virtual address 00000074
[    1.498110] pgd = (ptrval)
[    1.500838] [00000074] *pgd=00000000
[    1.504432] Internal error: Oops: 17 [#1] SMP ARM
[    1.509133] Modules linked in:
[    1.512192] CPU: 1 PID: 51 Comm: kworker/1:3 Not tainted 4.20.0-00039-g3b73a4cc8b3e-dirty #99
[    1.520708] Hardware name: Xilinx Zynq Platform
[    1.525261] Workqueue: events deferred_probe_work_func
[    1.530403] PC is at klist_next+0x10/0xfc
[    1.534403] LR is at device_for_each_child+0x40/0x94
[    1.539361] pc : [<c0683fbc>]    lr : [<c0455d90>]    psr: 200e0013
[    1.545628] sp : ceeefe68  ip : 00000001  fp : ffffe000
[    1.550863] r10: 00000000  r9 : c0c66790  r8 : 00000000
[    1.556079] r7 : c0457d44  r6 : 00000000  r5 : ceeefe8c  r4 : cfa2ec78
[    1.562604] r3 : 00000064  r2 : c0457d44  r1 : ceeefe8c  r0 : 00000064
[    1.569129] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    1.576263] Control: 18c5387d  Table: 0ed7804a  DAC: 00000051
[    1.582013] Process kworker/1:3 (pid: 51, stack limit = 0x(ptrval))
[    1.588280] Stack: (0xceeefe68 to 0xceef0000)
[    1.592630] fe60:                   cfa2ec78 c0c03c08 00000000 c0457d44 00000000 c0c66790
[    1.600814] fe80: 00000000 c0455d90 ceeefeac 00000064 00000000 0d7a542e cee9d494 cfa2ec78
[    1.608998] fea0: cfa2ec78 00000000 c0457d44 c0457d7c cee9d494 c0c03c08 00000000 c0455dac
[    1.617182] fec0: cf98ba44 cf926a00 cee9d494 0d7a542e 00000000 cf935a10 cf935a10 cf935a10
[    1.625366] fee0: c0c4e9b8 c0457d7c c0c4e80c 00000001 cf935a10 c0457df4 cf935a10 c0c4e99c
[    1.633550] ff00: c0c4e99c c045a27c c0c4e9c4 ced63f80 cfde8a80 cfdebc00 00000000 c013893c
[    1.641734] ff20: cfde8a80 cfde8a80 c07bd354 ced63f80 ced63f94 cfde8a80 00000008 c0c02d00
[    1.649936] ff40: cfde8a98 cfde8a80 ffffe000 c0139a30 ffffe000 c0c6624a c07bd354 00000000
[    1.658120] ff60: ffffe000 cee9e780 ceebfe00 00000000 ceeee000 ced63f80 c0139788 cf8cdea4
[    1.666304] ff80: cee9e79c c013e598 00000001 ceebfe00 c013e44c 00000000 00000000 00000000
[    1.674488] ffa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
[    1.682671] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.690855] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    1.699058] [<c0683fbc>] (klist_next) from [<c0455d90>] (device_for_each_child+0x40/0x94)
[    1.707241] [<c0455d90>] (device_for_each_child) from [<c0457d7c>] (device_reorder_to_tail+0x38/0x88)
[    1.716476] [<c0457d7c>] (device_reorder_to_tail) from [<c0455dac>] (device_for_each_child+0x5c/0x94)
[    1.725692] [<c0455dac>] (device_for_each_child) from [<c0457d7c>] (device_reorder_to_tail+0x38/0x88)
[    1.734927] [<c0457d7c>] (device_reorder_to_tail) from [<c0457df4>] (device_pm_move_to_tail+0x28/0x40)
[    1.744235] [<c0457df4>] (device_pm_move_to_tail) from [<c045a27c>] (deferred_probe_work_func+0x58/0x8c)
[    1.753746] [<c045a27c>] (deferred_probe_work_func) from [<c013893c>] (process_one_work+0x210/0x4fc)
[    1.762888] [<c013893c>] (process_one_work) from [<c0139a30>] (worker_thread+0x2a8/0x5c0)
[    1.771072] [<c0139a30>] (worker_thread) from [<c013e598>] (kthread+0x14c/0x154)
[    1.778482] [<c013e598>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[    1.785689] Exception stack(0xceeeffb0 to 0xceeefff8)
[    1.790739] ffa0:                                     00000000 00000000 00000000 00000000
[    1.798923] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.807107] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    1.813724] Code: e92d47f0 e1a05000 e8900048 e1a00003 (e5937010)
[    1.819844] ---[ end trace 3c2c0c8b65399ec9 ]---

The actual error that we had from devm_gpiod_get_optional() was
-EPROBE_DEFER, due to the GPIO being provided by a driver that is
probed later than the Ethernet controller driver.

To fix this, we simply add the missing device_del() invocation in the
error path.

Fixes: 69226896ad636 ("mdio_bus: Issue GPIO RESET to PHYs")
Signed-off-by: Thomas Petazzoni <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agodoc: net: fix bad references to network drivers
Otto Sabart [Mon, 14 Jan 2019 11:56:36 +0000 (12:56 +0100)]
doc: net: fix bad references to network drivers

Fix "reference to nonexisting document" warnings.

Fixes: b255e500c8dc ("net: documentation: build a directory structure for drivers")
Signed-off-by: Otto Sabart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoMerge tag 'powerpc-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Fri, 18 Jan 2019 17:55:42 +0000 (05:55 +1200)]
Merge tag 'powerpc-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "A couple of weeks of fixes.

  There's one fix for an oops on Power9 machines with Open CAPI
  adapters.

  And a fix for probable memory corruption in some of the new NPU code,
  caught by smatch though and not seen in the wild.

  Plus a few other minor fixes.

  There's one non-fix which is the perf_regs change. That was sent
  during the merge window but I accidentally only merged the first of
  two patches in the series. It's been in linux-next so hopefully
  doesn't conflict with anything in acme's tree.

  Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Breno Leitao,
  Christian Lamparter, Christophe Leroy, Dan Carpenter, Frederic Barrat,
  Greg Kurz, Jason A. Donenfeld, Madhavan Srinivasan"

* tag 'powerpc-5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/syscalls: Fix syscall tracing
  powerpc/pseries: Fix build break due to pnv_npu2_init()
  powerpc/4xx/ocm: Fix fix for phys_addr_t printf warnings
  powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
  powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM
  powerpc/8xx: fix setting of pagetable for Abatron BDI debug tool.
  powerpc/powernv/npu: Allocate enough memory in pnv_try_setup_npu_table_group()
  powerpc/perf: Update perf_regs structure to include MMCRA

6 years agoMerge tag 'for-linus-5.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 18 Jan 2019 17:53:41 +0000 (05:53 +1200)]
Merge tag 'for-linus-5.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - Several fixes for the Xen pvcalls drivers (1 fix for the backend and
   8 for the frontend).

 - A fix for a rather longstanding bug in the Xen sched_clock()
   interface which led to weird time jumps when migrating the system.

 - A fix for avoiding accesses to x2apic MSRs in Xen PV guests.

* tag 'for-linus-5.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Fix x86 sched_clock() interface for xen
  pvcalls-front: fix potential null dereference
  always clear the X2APIC_ENABLE bit for PV guest
  pvcalls-front: Avoid get_free_pages(GFP_KERNEL) under spinlock
  xen/pvcalls: remove set but not used variable 'intf'
  pvcalls-back: set -ENOTCONN in pvcalls_conn_back_read
  pvcalls-front: don't return error when the ring is full
  pvcalls-front: properly allocate sk
  pvcalls-front: don't try to free unallocated rings
  pvcalls-front: read all data before closing the connection

6 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Fri, 18 Jan 2019 17:48:43 +0000 (05:48 +1200)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes the following issues:
   - Zero-length DMA mapping in caam
   - Invalidly mapping stack memory for DMA in talitos
   - Use after free in cavium/nitrox
   - Key parsing in authenc
   - Undefined shift in sm3
   - Bogus completion call in authencesn
   - SHA support detection in caam"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: sm3 - fix undefined shift by >= width of value
  crypto: talitos - fix ablkcipher for CONFIG_VMAP_STACK
  crypto: talitos - reorder code in talitos_edesc_alloc()
  crypto: adiantum - initialize crypto_spawn::inst
  crypto: cavium/nitrox - Use after free in process_response_list()
  crypto: authencesn - Avoid twice completion call in decrypt path
  crypto: caam - fix SHA support detection
  crypto: caam - fix zero-length buffer DMA mapping
  crypto: ccree - convert to use crypto_authenc_extractkeys()
  crypto: bcm - convert to use crypto_authenc_extractkeys()
  crypto: authenc - fix parsing key with misaligned rta_len

6 years agoMerge tag 'acpi-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 18 Jan 2019 17:46:00 +0000 (05:46 +1200)]
Merge tag 'acpi-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix an ACPI initialization ordering issue introduced in the 4.17
  time frame and causing functional problems to appear on multiple
  systems and fix some fallout of the recent change to enable building
  kernels with ACPI support and without PCI.

  Specifics:

   - Restore the ACPI initialization ordering changed implicitly by the
     module-level AML handling rework during the 4.17 development cycle
     that caused the EC address space handler based on information from
     ECDT to be set up before loading AML definition blocks, making it
     effectively not accessible by AML on some systems that don't work
     as expected any more (Rafael Wysocki).

   - Add direct dependencies on PCI to Kconfig in multiple places for
     code that depends on both ACPI and PCI, but the PCI dependency was
     implicitly satisfied by the ACPI dependency before, to prevent
     invalid configurations from being created, for example by
     randconfig (Sinan Kaya)"

* tag 'acpi-5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: EC: Look for ECDT EC after calling acpi_load_tables()
  drivers: thermal: int340x_thermal: Make PCI dependency explicit
  x86/intel/lpss: Make PCI dependency explicit
  platform/x86: apple-gmux: Make PCI dependency explicit
  platform/x86: intel_pmc: Make PCI dependency explicit
  platform/x86: intel_ips: make PCI dependency explicit
  vga-switcheroo: make PCI dependency explicit
  ata: pata_acpi: Make PCI dependency explicit
  ACPI / LPSS: Make PCI dependency explicit

6 years agoMerge tag 'fbdev-v5.0-rc3' of git://github.com/bzolnier/linux
Linus Torvalds [Fri, 18 Jan 2019 17:43:05 +0000 (05:43 +1200)]
Merge tag 'fbdev-v5.0-rc3' of git://github.com/bzolnier/linux

Pull fbdev fixes from Bartlomiej Zolnierkiewicz:

 - fix stack memory leak in omap2fb driver (Vlad Tsyrklevich)

 - fix OF node name handling v4.20 regression in offb driver (Rob
   Herring)

 - convert CONFIG_FB_LOGO_CENTER config option added in v5.0-rc1 into a
   kernel parameter (Peter Rosin)

* tag 'fbdev-v5.0-rc3' of git://github.com/bzolnier/linux:
  fbdev: fbmem: convert CONFIG_FB_LOGO_CENTER into a cmd line option
  fbdev: offb: Fix OF node name handling
  omap2fb: Fix stack memory disclosure

6 years agoMerge tag 'drm-fixes-2019-01-18-1' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 18 Jan 2019 17:41:38 +0000 (05:41 +1200)]
Merge tag 'drm-fixes-2019-01-18-1' of git://anongit.freedesktop.org/drm/drm

Pull drm update from Dave Airlie:
 "Add nouveau TU102 (RTX 2080 Ti) support"

* tag 'drm-fixes-2019-01-18-1' of git://anongit.freedesktop.org/drm/drm:
  drm/nouveau/core: recognise TU102

6 years agobtrfs: wakeup cleaner thread when adding delayed iput
Josef Bacik [Fri, 11 Jan 2019 15:21:02 +0000 (10:21 -0500)]
btrfs: wakeup cleaner thread when adding delayed iput

The cleaner thread usually takes care of delayed iputs, with the
exception of the btrfs_end_transaction_throttle path.  Delaying iputs
means we are potentially delaying the eviction of an inode and it's
respective space.  The cleaner thread only gets woken up every 30
seconds, or when we require space.  If there are a lot of inodes that
need to be deleted we could induce a serious amount of latency while we
wait for these inodes to be evicted.  So instead wakeup the cleaner if
it's not already awake to process any new delayed iputs we add to the
list.  If we suddenly need space we will less likely be backed up
behind a bunch of inodes that are waiting to be deleted, and we could
possibly free space before we need to get into the flushing logic which
will save us some latency.

Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>
6 years agobtrfs: run delayed iputs before committing
Josef Bacik [Fri, 11 Jan 2019 15:21:01 +0000 (10:21 -0500)]
btrfs: run delayed iputs before committing

Delayed iputs means we can have final iputs of deleted inodes in the
queue, which could potentially generate a lot of pinned space that could
be free'd.  So before we decide to commit the transaction for ENOPSC
reasons, run the delayed iputs so that any potential space is free'd up.
If there is and we freed enough we can then commit the transaction and
potentially be able to make our reservation.

Reviewed-by: Omar Sandoval <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>
6 years agobtrfs: wait on ordered extents on abort cleanup
Josef Bacik [Wed, 21 Nov 2018 19:05:45 +0000 (14:05 -0500)]
btrfs: wait on ordered extents on abort cleanup

If we flip read-only before we initiate writeback on all dirty pages for
ordered extents we've created then we'll have ordered extents left over
on umount, which results in all sorts of bad things happening.  Fix this
by making sure we wait on ordered extents if we have to do the aborted
transaction cleanup stuff.

generic/475 can produce this warning:

 [ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs]
 [ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G        W 5.0.0-rc1-default+ #394
 [ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
 [ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs]
 [ 8531.193082] RSP: 0018:ffffb1ab86163d98 EFLAGS: 00010286
 [ 8531.194198] RAX: ffff9f3449494d18 RBX: ffff9f34a2695000 RCX:0000000000000000
 [ 8531.195629] RDX: 0000000000000002 RSI: 0000000000000001 RDI:0000000000000000
 [ 8531.197315] RBP: ffff9f344e930000 R08: 0000000000000001 R09:0000000000000000
 [ 8531.199095] R10: 0000000000000000 R11: ffff9f34494d4ff8 R12:ffffb1ab86163dc0
 [ 8531.200870] R13: ffff9f344e9300b0 R14: ffffb1ab86163db8 R15:0000000000000000
 [ 8531.202707] FS:  00007fc68e949fc0(0000) GS:ffff9f34bd800000(0000)knlGS:0000000000000000
 [ 8531.204851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [ 8531.205942] CR2: 00007ffde8114dd8 CR3: 000000002dfbd000 CR4:00000000000006e0
 [ 8531.207516] Call Trace:
 [ 8531.208175]  btrfs_free_fs_roots+0xdb/0x170 [btrfs]
 [ 8531.210209]  ? wait_for_completion+0x5b/0x190
 [ 8531.211303]  close_ctree+0x157/0x350 [btrfs]
 [ 8531.212412]  generic_shutdown_super+0x64/0x100
 [ 8531.213485]  kill_anon_super+0x14/0x30
 [ 8531.214430]  btrfs_kill_super+0x12/0xa0 [btrfs]
 [ 8531.215539]  deactivate_locked_super+0x29/0x60
 [ 8531.216633]  cleanup_mnt+0x3b/0x70
 [ 8531.217497]  task_work_run+0x98/0xc0
 [ 8531.218397]  exit_to_usermode_loop+0x83/0x90
 [ 8531.219324]  do_syscall_64+0x15b/0x180
 [ 8531.220192]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 [ 8531.221286] RIP: 0033:0x7fc68e5e4d07
 [ 8531.225621] RSP: 002b:00007ffde8116608 EFLAGS: 00000246 ORIG_RAX:00000000000000a6
 [ 8531.227512] RAX: 0000000000000000 RBX: 00005580c2175970 RCX:00007fc68e5e4d07
 [ 8531.229098] RDX: 0000000000000001 RSI: 0000000000000000 RDI:00005580c2175b80
 [ 8531.230730] RBP: 0000000000000000 R08: 00005580c2175ba0 R09:00007ffde8114e80
 [ 8531.232269] R10: 0000000000000000 R11: 0000000000000246 R12:00005580c2175b80
 [ 8531.233839] R13: 00007fc68eac61c4 R14: 00005580c2175a68 R15:0000000000000000

Leaving a tree in the rb-tree:

3853 void btrfs_free_fs_root(struct btrfs_root *root)
3854 {
3855         iput(root->ino_cache_inode);
3856         WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));

CC: [email protected]
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
[ add stacktrace ]
Signed-off-by: David Sterba <[email protected]>
6 years agobtrfs: handle delayed ref head accounting cleanup in abort
Josef Bacik [Wed, 21 Nov 2018 19:05:41 +0000 (14:05 -0500)]
btrfs: handle delayed ref head accounting cleanup in abort

We weren't doing any of the accounting cleanup when we aborted
transactions.  Fix this by making cleanup_ref_head_accounting global and
calling it from the abort code, this fixes the issue where our
accounting was all wrong after the fs aborts.

The test generic/475 on a 2G VM can trigger the problems eg.:

  [ 8502.136957] WARNING: CPU: 0 PID: 11064 at fs/btrfs/extent-tree.c:5986 btrfs_free_block_grou +ps+0x3dc/0x410 [btrfs]
  [ 8502.148372] CPU: 0 PID: 11064 Comm: umount Not tainted 5.0.0-rc1-default+ #394
  [ 8502.150807] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626 +cc-prebuilt.qemu-project.org 04/01/2014
  [ 8502.154317] RIP: 0010:btrfs_free_block_groups+0x3dc/0x410 [btrfs]
  [ 8502.160623] RSP: 0018:ffffb1ab84b93de8 EFLAGS: 00010206
  [ 8502.161906] RAX: 0000000001000000 RBX: ffff9f34b1756400 RCX: 0000000000000000
  [ 8502.163448] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff9f34b1755400
  [ 8502.164906] RBP: ffff9f34b7e8c000 R08: 0000000000000001 R09: 0000000000000000
  [ 8502.166716] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f34b7e8c108
  [ 8502.168498] R13: ffff9f34b7e8c158 R14: 0000000000000000 R15: dead000000000100
  [ 8502.170296] FS:  00007fb1cf15ffc0(0000) GS:ffff9f34bd400000(0000) knlGS:0000000000000000
  [ 8502.172439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 8502.173669] CR2: 00007fb1ced507b0 CR3: 000000002f7a6000 CR4: 00000000000006f0
  [ 8502.175094] Call Trace:
  [ 8502.175759]  close_ctree+0x17f/0x350 [btrfs]
  [ 8502.176721]  generic_shutdown_super+0x64/0x100
  [ 8502.177702]  kill_anon_super+0x14/0x30
  [ 8502.178607]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [ 8502.179602]  deactivate_locked_super+0x29/0x60
  [ 8502.180595]  cleanup_mnt+0x3b/0x70
  [ 8502.181406]  task_work_run+0x98/0xc0
  [ 8502.182255]  exit_to_usermode_loop+0x83/0x90
  [ 8502.183113]  do_syscall_64+0x15b/0x180
  [ 8502.183919]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Corresponding to

  release_global_block_rsv() {
  ...
  WARN_ON(fs_info->delayed_refs_rsv.reserved > 0);

CC: [email protected]
Signed-off-by: Josef Bacik <[email protected]>
[ add log dump ]
Signed-off-by: David Sterba <[email protected]>
6 years agoRevert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"
David Sterba [Wed, 9 Jan 2019 14:02:23 +0000 (15:02 +0100)]
Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io"

This reverts commit e73e81b6d0114d4a303205a952ab2e87c44bd279.

This patch causes a few problems:

- adds latency to btrfs_finish_ordered_io
- as btrfs_finish_ordered_io is used for free space cache, generating
  more work from btrfs_btree_balance_dirty_nodelay could end up in the
  same workque, effectively deadlocking

12260 kworker/u96:16+btrfs-freespace-write D
[<0>] balance_dirty_pages+0x6e6/0x7ad
[<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90
[<0>] btrfs_finish_ordered_io+0x3da/0x770
[<0>] normal_work_helper+0x1c5/0x5a0
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x46/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

Transaction commit will wait on the freespace cache:

838 btrfs-transacti D
[<0>] btrfs_start_ordered_extent+0x154/0x1e0
[<0>] btrfs_wait_ordered_range+0xbd/0x110
[<0>] __btrfs_wait_cache_io+0x49/0x1a0
[<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0
[<0>] commit_cowonly_roots+0x215/0x2b0
[<0>] btrfs_commit_transaction+0x37e/0x910
[<0>] transaction_kthread+0x14d/0x180
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

And then writepages ends up waiting on transaction commit:

9520 kworker/u96:13+flush-btrfs-1 D
[<0>] wait_current_trans+0xac/0xe0
[<0>] start_transaction+0x21b/0x4b0
[<0>] cow_file_range_inline+0x10b/0x6b0
[<0>] cow_file_range.isra.69+0x329/0x4a0
[<0>] run_delalloc_range+0x105/0x3c0
[<0>] writepage_delalloc+0x119/0x180
[<0>] __extent_writepage+0x10c/0x390
[<0>] extent_write_cache_pages+0x26f/0x3d0
[<0>] extent_writepages+0x4f/0x80
[<0>] do_writepages+0x17/0x60
[<0>] __writeback_single_inode+0x59/0x690
[<0>] writeback_sb_inodes+0x291/0x4e0
[<0>] __writeback_inodes_wb+0x87/0xb0
[<0>] wb_writeback+0x3bb/0x500
[<0>] wb_workfn+0x40d/0x610
[<0>] process_one_work+0x1ee/0x5a0
[<0>] worker_thread+0x1e0/0x3d0
[<0>] kthread+0xf5/0x130
[<0>] ret_from_fork+0x24/0x30
[<0>] 0xffffffffffffffff

Eventually, we have every process in the system waiting on
balance_dirty_pages(), and nobody is able to make progress on page
writeback.

The original patch tried to fix an OOM condition, that happened on 4.4 but no
success reproducing that on later kernels (4.19 and 4.20). This is more likely
a problem in OOM itself.

Link: https://lore.kernel.org/linux-btrfs/[email protected]/
Reported-by: Chris Mason <[email protected]>
CC: [email protected] # 4.18+
CC: ethanlien <[email protected]>
Signed-off-by: David Sterba <[email protected]>
6 years agoMerge branch 'acpi-pci'
Rafael J. Wysocki [Fri, 18 Jan 2019 10:17:16 +0000 (11:17 +0100)]
Merge branch 'acpi-pci'

* acpi-pci:
  drivers: thermal: int340x_thermal: Make PCI dependency explicit
  x86/intel/lpss: Make PCI dependency explicit
  platform/x86: apple-gmux: Make PCI dependency explicit
  platform/x86: intel_pmc: Make PCI dependency explicit
  platform/x86: intel_ips: make PCI dependency explicit
  vga-switcheroo: make PCI dependency explicit
  ata: pata_acpi: Make PCI dependency explicit
  ACPI / LPSS: Make PCI dependency explicit

6 years agomtd: rawnand: denali: get ->setup_data_interface() working again
Masahiro Yamada [Fri, 18 Jan 2019 05:30:38 +0000 (14:30 +0900)]
mtd: rawnand: denali: get ->setup_data_interface() working again

Commit 7a08dbaedd36 ("mtd: rawnand: Move ->setup_data_interface() to
nand_controller_ops") missed to invert the if-conditonal for denali.
Since then, the Denali NAND driver cannnot invoke setup_data_interface.

Fixes: 7a08dbaedd36 ("mtd: rawnand: Move ->setup_data_interface() to nand_controller_ops")
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Miquel Raynal <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
6 years agomtd: nand: jz4740: fix '__iomem *' vs. '* __iomem'
Luc Van Oostenryck [Thu, 17 Jan 2019 17:39:07 +0000 (18:39 +0100)]
mtd: nand: jz4740: fix '__iomem *' vs. '* __iomem'

The function jz_nand_ioremap_resource() needs a pointer to an __iomem
pointer as its last argument but this argument is declared as:
void * __iomem *base

Fix this by using the correct declaration:
void __iomem **base
which then also removes the following Sparse's warnings:
  282:15: warning: incorrect type in assignment (different address spaces)
  282:15:    expected void *[noderef] <asn:2>
  282:15:    got void [noderef] <asn:2> *
  322:57: warning: incorrect type in argument 4 (different address spaces)
  322:57:    expected void *[noderef] <asn:2> *base
  322:57:    got void [noderef] <asn:2> **
  402:67: warning: incorrect type in argument 4 (different address spaces)
  402:67:    expected void *[noderef] <asn:2> *base
  402:67:    got void [noderef] <asn:2> **

Signed-off-by: Luc Van Oostenryck <[email protected]>
Acked-by: Miquel Raynal <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
6 years agoMerge branch 'tcp_openreq_child'
David S. Miller [Fri, 18 Jan 2019 06:19:05 +0000 (22:19 -0800)]
Merge branch 'tcp_openreq_child'

Eric Dumazet says:

====================
tcp: remove code from tcp_create_openreq_child()

tcp_create_openreq_child() is essentially cloning a listener, then
must initialize some fields that can not be inherited.

Listeners are either fresh sockets, or sockets that came through
tcp_disconnect() after a session that dirtied many fields.

By moving code to tcp_disconnect(), we can shorten time taken
to create a clone, since tcp_disconnect() operation is very
unlikely.
====================

Acked-by: Yuchung Cheng <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agotcp: move rx_opt & syn_data_acked init to tcp_disconnect()
Eric Dumazet [Thu, 17 Jan 2019 19:23:42 +0000 (11:23 -0800)]
tcp: move rx_opt & syn_data_acked init to tcp_disconnect()

If we make sure all listeners have these fields cleared, then a clone
will also inherit zero values.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agotcp: move tp->rack init to tcp_disconnect()
Eric Dumazet [Thu, 17 Jan 2019 19:23:41 +0000 (11:23 -0800)]
tcp: move tp->rack init to tcp_disconnect()

If we make sure all listeners have proper tp->rack value,
then a clone will also inherit proper initial value.

Note that fresh sockets init tp->rack from tcp_init_sock()

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agotcp: move app_limited init to tcp_disconnect()
Eric Dumazet [Thu, 17 Jan 2019 19:23:40 +0000 (11:23 -0800)]
tcp: move app_limited init to tcp_disconnect()

If we make sure all listeners have app_limited set to ~0U,
then a clone will also inherit proper initial value.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agotcp: move retrans_out, sacked_out, tlp_high_seq, last_oow_ack_time init to tcp_discon...
Eric Dumazet [Thu, 17 Jan 2019 19:23:39 +0000 (11:23 -0800)]
tcp: move retrans_out, sacked_out, tlp_high_seq, last_oow_ack_time init to tcp_disconnect()

If we make sure all listeners have these fields cleared, then a clone
will also inherit zero values.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agotcp: do not clear urg_data in tcp_create_openreq_child
Eric Dumazet [Thu, 17 Jan 2019 19:23:38 +0000 (11:23 -0800)]
tcp: do not clear urg_data in tcp_create_openreq_child

All listeners have this field cleared already, since tcp_disconnect()
clears it and newly created sockets have also a zero value here.

So a clone will inherit a zero value here.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agotcp: move snd_cwnd & snd_cwnd_cnt init to tcp_disconnect()
Eric Dumazet [Thu, 17 Jan 2019 19:23:37 +0000 (11:23 -0800)]
tcp: move snd_cwnd & snd_cwnd_cnt init to tcp_disconnect()

Passive connections can inherit proper value by cloning,
if we make sure all listeners have the proper values there.

tcp_disconnect() was setting snd_cwnd to 2, which seems
quite obsolete since IW10 adoption.

Also remove an obsolete comment.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agotcp: move mdev_us init to tcp_disconnect()
Eric Dumazet [Thu, 17 Jan 2019 19:23:36 +0000 (11:23 -0800)]
tcp: move mdev_us init to tcp_disconnect()

If we make sure a listener always has its mdev_us
field set to TCP_TIMEOUT_INIT, we do not need to rewrite
this field after a new clone is created.

tcp_disconnect() is very seldom used in real applications.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
This page took 0.140858 seconds and 4 git commands to generate.