Git Repo - linux.git/log

bnxt_en: Add basic ethtool -t selftest support.

Add the basic infrastructure and only firmware tests initially.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Add suspend/resume callbacks.

Add suspend/resume callbacks using the newer dev_pm_ops method.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Add ethtool set_wol method.

And add functions to set and free magic packet filter.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Add ethtool get_wol method.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Add pci shutdown method.

Add pci shutdown method to put device in the proper WoL and power state.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Add basic WoL infrastructure.

Add code to driver probe function to check if the device is WoL capable
and if Magic packet WoL filter is currently set.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Update firmware interface spec to 1.7.6.2.

Features added include WoL and selftest.

Signed-off-by: Deepak Khungar <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mfd: cros-ec: Fix host command buffer size

For SPI, we can get up to 32 additional bytes for response preamble.
The current overhead (2 bytes) may cause problems when we try to receive
a big response. Update it to 32 bytes.

Without this fix we could see a kernel BUG when we receive a big response
from the Chrome EC when is connected via SPI.

Signed-off-by: Vic Yang <[email protected]>
Tested-by: Enric Balletbo i Serra <enric.balletbo.collabora.com>
Signed-off-by: Lee Jones <[email protected]>

Merge tag 'gpio-v4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull late GPIO fixes from Linus Walleij:
"Some late coming ACPI fixes for GPIO.

  We're dealing with ACPI issues here. The first is related to wake IRQs
  on Bay Trail/Cherry Trail CPUs which are common in laptops. The second
  is about proper probe deferral when reading _CRS properties.

  For my untrained eye it seems there was some quarrel between the BIOS
  and the kernel about who is supposed to deal with wakeups from GPIO
  lines"

* tag 'gpio-v4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  ACPI / gpio: do not fall back to parsing _CRS when we get a deferral
  gpio: acpi: Call enable_irq_wake for _IAE GpioInts with Wake set

Merge tag 'wireless-drivers-for-davem-2017-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.11

iwlwifi

* an RCU fix
* a fix for a potential out-of-bounds access crash
* a fix for IBSS which has been broken since DQA was enabled

rtlwifi

* fix scheduling while atomic regression

brcmfmac

* fix use-after-free bug found by KASAN
====================

Signed-off-by: David S. Miller <[email protected]>

Merge tag 'nios2-v4.11-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2

Pull nios2 fix from Ley Foon Tan:

- nios2: reserve boot memory for device tree

* tag 'nios2-v4.11-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
nios2: reserve boot memory for device tree

net: ethernet: ti: cpsw: fix race condition during open()

TI's cpsw driver handles both OF and non-OF case for phy
connect. Unfortunately of_phy_connect() returns NULL on
error while phy_connect() returns ERR_PTR().

To handle this, cpsw_slave_open() overrides the return value
from phy_connect() to make it NULL or error.

This leaves a small window, where cpsw_adjust_link() may be
invoked for a slave while slave->phy pointer is temporarily
set to -ENODEV (or some other error) before it is finally set
to NULL.

_cpsw_adjust_link() only handles the NULL case, and an oops
results when ERR_PTR() is seen by it.

Note that cpsw_adjust_link() checks PHY status for each
slave whenever it is invoked. It can so happen that even
though phy_connect() for a given slave returns error,
_cpsw_adjust_link() is still called for that slave because
the link status of another slave changed.

Fix this by using a temporary pointer to store return value
of {of_}phy_connect() and do a one-time write to slave->phy.

Reviewed-by: Grygorii Strashko <[email protected]>
Reported-by: Yan Liu <[email protected]>
Signed-off-by: Sekhar Nori <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'wireless-drivers-for-davem-2017-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.11

iwlwifi

* an RCU fix
* a fix for a potential out-of-bounds access crash
* a fix for IBSS which has been broken since DQA was enabled

rtlwifi

* fix scheduling while atomic regression

brcmfmac

* fix use-after-free bug found by KASAN
====================

Signed-off-by: David S. Miller <[email protected]>

Merge tag 'drm-fixes-for-v4.11-rc6' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"This is just mostly stuff that missed rc5, from vmwgfx and msm
  drivers"

* tag 'drm-fixes-for-v4.11-rc6' of git://people.freedesktop.org/~airlied/linux:
  drm/msm: Make sure to detach the MMU during GPU cleanup
  drm/msm/hdmi: redefinitions of macros not required
  drm/msm/mdp5: Update SSPP_MAX value
  drm/msm/dsi: Fix bug in dsi_mgr_phy_enable
  drm/msm: Don't allow zero sized buffer objects
  drm/msm: Fix wrong pointer check in a5xx_destroy
  drm/msm: adreno: fix build error without debugfs
  drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl()
  drm/vmwgfx: Remove getparam error message
  drm/ttm: Avoid calling drm_ht_remove from atomic context
  drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces
  drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl()
  drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl()
  drm/vmwgfx: Type-check lookups of fence objects

l2tp: fix PPP pseudo-wire auto-loading

PPP pseudo-wire type is 7 (11 is L2TP_PWTYPE_IP).

Fixes: f1f39f911027 ("l2tp: auto load type modules")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_*

Trival fix, rename HW_INTERRUT_ASSERT_SET_* to HW_INTERRUPT_ASSERT_SET_*

Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

l2tp: take reference on sessions being dumped

Take a reference on the sessions returned by l2tp_session_find_nth()
(and rename it l2tp_session_get_nth() to reflect this change), so that
caller is assured that the session isn't going to disappear while
processing it.

For procfs and debugfs handlers, the session is held in the .start()
callback and dropped in .show(). Given that pppol2tp_seq_session_show()
dereferences the associated PPPoL2TP socket and that
l2tp_dfs_seq_session_show() might call pppol2tp_show(), we also need to
call the session's .ref() callback to prevent the socket from going
away from under us.

Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Fixes: 0ad6614048cf ("l2tp: Add debugfs files for dumping l2tp debug info")
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

can: hi311x: Add Holt HI-311x CAN driver

This patch adds support for the Holt HI-311x CAN controller. The HI311x
CAN controller is capable of transmitting and receiving standard data
frames, extended data frames and remote frames. The HI311x interfaces
with the host over SPI.

Datasheet: www.holtic.com/documents/371-hi-3110_v-rev-jpdf.do

Signed-off-by: Akshay Bhat <[email protected]>
Acked-by: Wolfgang Grandegger <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: holt_hi311x: document device tree bindings

Document the HOLT HI-311x CAN device tree bindings.

Signed-off-by: Akshay Bhat <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: initial support for network namespaces

This patch adds initial support for network namespaces. The changes only
enable support in the CAN raw, proc and af_can code. GW and BCM still
have their checks that ensure that they are used only from the main
namespace.

The patch boils down to moving the global structures, i.e. the global
filter list and their /proc stats, into a per-namespace structure and passing
around the corresponding "struct net" in a lot of different places.

Changes since v1:
- rebased on current HEAD (2bfe01e)
- fixed overlong line

Signed-off-by: Mario Kicherer <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: ti_hecc: Convert TI HECC driver to DT only driver

This patch converts TI HECC driver to DT only driver. This results in
removing ti_hecc.h containing now obsolete platform data.

Former transceiver_switch callback function will be now modelled via
regulator API.

Signed-off-by: Anton Glukhov <[email protected]>
Signed-off-by: Yegor Yefremov <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: ti_hecc: Add TI HECC DT binding documentation

DT binding documentation for TI High End CAN Controller

Signed-off-by: Anton Glukhov <[email protected]>
Signed-off-by: Yegor Yefremov <[email protected]>
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>

Merge branch 'qed-QM-ILT-changes'

Yuval Mintz says:

====================
qed: QM & ILT changes

This series introduces several changes and improvements to existing
queue manager and ILT configurations done during initialization.
Notice some of the patches are actually future fixes, I.e., bugs that
can't be triggered with exisiting driver but are needed for some future
functionality.

Patch #1 refactors the configuration of the hardware's queue manager,
which is quite messy today. This contains most of the bulk [code-wise]
in the series.

Patch #2, #3 fix Timers related ILT configurations that are yet to
affect qed in existing scenarios.

Patch #4 reduces needless ILT lines wasted for RoCE configurations.

Patch #5 allows RoCE partitions to manage with less memory regions
[important, e.g., for Multi-function parititions with RoCE support].
====================

Signed-off-by: David S. Miller <[email protected]>

qed: Manage with less memory regions for RoCE

It's possible some configurations would prevent driver from utilizing
all the Memory Regions due to a lack of ILT lines.
In such a case, calculate how many memory regions would have to be
dropped due to limit, and manage without those.

Signed-off-by: Ram Amrani <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: RoCE doesn't need to use SRC

As RoCE doesn't need to use the SRC, allocating ILT memory
on behalf of RoCE is wasting available ILT lines.

Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: Correct TM ILT lines in presence of VFs

As of today there's no protocol supported that requires
support from the TM hardware block and enables SRIOV,
but we should still correct the calculation to reflect
the lines required for such future VFs instead of changing
the PF's own lines.

Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: Fix TM block ILT allocation

When configuring the HW timers block we should set the number of CIDs
up until the last CID that require timers, instead of only those CIDs
whose protocol needs timers support.

Today, the protocols that require HW timers' support have their CIDs
before any other protocol, but that would change in future [when we
add iWARP support].

Signed-off-by: Michal Kalderon <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: Revise QM cofiguration

Refactor and clean up the queue manager initialization logic.
Also, this adds support for RoC low latency queues, which later
would be used for improving RoCE latency in high throughput scenarios.

Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: usbnet: support 64bit stats

Add support for the net stats64 counters to the usbnet core. With that
in place put the hooks into every usbnet driver to use it.

This is a strait forward addition of 64bit counters for RX and TX packet
and byte counts. It is done in the same style as for the other net drivers
that support stats64. Note that the other stats fields remain as 32bit
sized values (error counts, etc).

The motivation to add this is that it is not particularly difficult to
get the RX and TX byte counts to wrap on 32bit platforms.

Signed-off-by: Greg Ungerer <[email protected]>
Acked-by: Bjørn Mork <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

soreuseport: use "unsigned int" in __reuseport_alloc()

Number of sockets is limited by 16-bit, so 64-bit allocation will never
happen.

16-bit ops are the worst code density-wise on x86_64 because of
additional prefix (66).

Space savings:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-3 (-3)
function old new delta
reuseport_add_sock 539 536 -3

Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

flowcache: more "unsigned int"

Make ->hash_count, ->low_watermark and ->high_watermark unsigned int
and propagate unsignedness to other variables.

This change doesn't change code generation because these fields aren't
used in 64-bit contexts but make it anyway: these fields can't be
negative numbers.

Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

flowcache: make flow_cache_hash_size() return "unsigned int"

Hash size can't negative so "unsigned int" is logically correct.

Propagate "unsigned int" to loop counters.

Space savings:

add/remove: 0/0 grow/shrink: 2/2 up/down: 6/-18 (-12)
function                                     old     new   delta
flow_cache_flush_tasklet                     362     365      +3
__flow_cache_shrink                          333     336      +3
flow_cache_cpu_up_prep                       178     171      -7
flow_cache_lookup                           1159    1148     -11

Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

flowcache: make flow_key_size() return "unsigned int"

Flow keys aren't 4GB+ numbers so 64-bit arithmetic is excessive.

Space savings (I'm not sure what CSWTCH is):

add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-48 (-48)
function                                     old     new   delta
flow_cache_lookup                           1163    1159      -4
CSWTCH                                     75997   75953     -44

Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/faraday: Add missing include of of.h

Breaking the include loop netdevice.h, dsa.h, devlink.h broke this
driver, it depends on includes brought in by these headers. Adding
linux/of.h fixes it.

Fixes: ed0e39e97d34 ("net: break include loop netdevice.h, dsa.h, devlink.h")
Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vxlan: fix ND proxy when skb doesn't have transport header offset

When an incoming frame is tagged or when GRO is disabled, the skb
handled to vxlan_xmit() doesn't contain a valid transport header
offset. This makes ND proxying fail.

We combine two changes: replace use of skb_transport_offset() and ensure
the necessary amount of skb is linear just before using it:

- In vxlan_xmit(), when determining if we have an ICMPv6 neighbor
   discovery packet, just check if it is an ICMPv6 packet and rely on
   neigh_reduce() to do more checks if this is the case. The use of
   pskb_may_pull() is replaced by skb_header_pointer() for just the IPv6
   header.

- In neigh_reduce(), add pskb_may_pull() for IPv6 header and neighbor
   discovery message since this was removed from vxlan_xmit(). Replace
   skb_transport_header() with ipv6_hdr() + 1.

- In vxlan_na_create(), replace first skb_transport_offset() with
   ipv6_hdr() + 1 and second with skb_network_offset() + sizeof(struct
   ipv6hdr). Additionally, ensure we pskb_may_pull() the whole skb as we
   need it to iterate over the options.

Signed-off-by: Vincent Bernat <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: minimize false-positives on TCP/GRO check

Markus Trippelsdorf reported that after commit dcb17d22e1c2 ("tcp: warn
on bogus MSS and try to amend it") the kernel started logging the
warning for a NIC driver that doesn't even support GRO.

It was diagnosed that it was possibly caused on connections that were
using TCP Timestamps but some packets lacked the Timestamps option. As
we reduce rcv_mss when timestamps are used, the lack of them would cause
the packets to be bigger than expected, although this is a valid case.

As this warning is more as a hint, getting a clean-cut on the
threshold is probably not worth the execution time spent on it. This
patch thus alleviates the false-positives with 2 quick checks: by
accounting for the entire TCP option space and also checking against the
interface MTU if it's available.

These changes, specially the MTU one, might mask some real positives,
though if they are really happening, it's possible that sooner or later
it will be triggered anyway.

Reported-by: Markus Trippelsdorf <[email protected]>
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'xtensa-20170403' of git://github.com/jcmvbkbc/linux-xtensa

Pull Xtensa fixes from Max Filippov:

- make __pa work with uncached KSEG addresses, it fixes DMA memory
   mmapping and DMA debug

- fix torn stack dump output

- wire up statx syscall

* tag 'xtensa-20170403' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: wire up statx system call
  xtensa: fix stack dump output
  xtensa: make __pa work with uncached KSEG addresses

Merge branch 'msm-fixes-4.11-rc6' of git://people.freedesktop.org/~robclark/linux into drm-fixes

misc msm fixes.

* 'msm-fixes-4.11-rc6' of git://people.freedesktop.org/~robclark/linux:
  drm/msm: Make sure to detach the MMU during GPU cleanup
  drm/msm/hdmi: redefinitions of macros not required
  drm/msm/mdp5: Update SSPP_MAX value
  drm/msm/dsi: Fix bug in dsi_mgr_phy_enable
  drm/msm: Don't allow zero sized buffer objects
  drm/msm: Fix wrong pointer check in a5xx_destroy
  drm/msm: adreno: fix build error without debugfs

sctp: check for dst and pathmtu update in sctp_packet_config

This patch is to move sctp_transport_dst_check into sctp_packet_config
from sctp_packet_transmit and add pathmtu check in sctp_packet_config.

With this fix, sctp can update dst or pathmtu before appending chunks,
which can void dropping packets in sctp_packet_transmit when dst is
obsolete or dst's mtu is changed.

This patch is also to improve some other codes in sctp_packet_config.
It updates packet max_size with gso_max_size, checks for dst and
pathmtu, and appends ecne chunk only when packet is empty and asoc
is not NULL.

It makes sctp flush work better, as we only need to set up them once
for one flush schedule. It's also safe, since asoc is NULL only when
the packet is created by sctp_ootb_pkt_new in which it just gets the
new dst, no need to do more things for it other than set packet with
transport's pathmtu.

Signed-off-by: Xin Long <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: add SCTP_PR_STREAM_STATUS sockopt for prsctp

Before when implementing sctp prsctp, SCTP_PR_STREAM_STATUS wasn't
added, as it needs to save abandoned_(un)sent for every stream.

After sctp stream reconf is added in sctp, assoc has structure
sctp_stream_out to save per stream info.

This patch is to add SCTP_PR_STREAM_STATUS by putting the prsctp
per stream statistics into sctp_stream_out.

v1->v2:
fix an indent issue.

Signed-off-by: Xin Long <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'hns-misc-fixes'

Salil Mehta says:

====================
net: hns: Misc. HNS Bug Fixes & Code Improvements

This patch set introduces various HNS bug fixes, optimizations and code
improvements.
====================

Signed-off-by: David S. Miller <[email protected]>

net: hns: Some checkpatch.pl script & warning fixes

This patch fixes some checkpatch.pl script caught errors and
warnings during the compilation time.

Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Avoid Hip06 chip TX packet line bug

There is a bug on Hip06 that tx ring interrupts packets count will be
clear when drivers send data to tx ring, so that the tx packets count
will never upgrade to packets line, and cause the interrupts engendered
was delayed.
Sometimes, it will cause sending performance lower than expected.

To fix this bug, we set tx ring interrupts packets line to 1 forever,
to avoid count clear. And set the gap time to 20us, to solve the problem
that too many interrupts engendered when packets line is 1.

This patch could advance the send performance on ARM from 6.6G to 9.37G
when an iperf send thread on ARM and an iperf send thread on X86 for XGE.

Signed-off-by: lipeng <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Adjust the SBM module buffer threshold

HNS needs SMB Buffers to store at least two packets after sending
pause frame because of the link delay. The MTU of HNS is 9728. As
the processor user manual described, the SBM buffer threshold should
be modified.

Reported-by: Ping Zhang <[email protected]>
Signed-off-by: Kejian Yan <[email protected]>
Reviewed-by: Salil Mehta <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Simplify the exception sequence in hns_ppe_init()

We need to free all ppe submodule if it fails to initialize ppe by
any fault, so this patch will free all ppe resource before
hns_ppe_init() returns exception situation

Reported-by: JinchuanTian <[email protected]>
Signed-off-by: Kejian Yan <[email protected]>
Reviewed-by: Salil Mehta <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Optimise the code in hns_mdio_wait_ready()

This patch fixes the code to clear pclint warning/info.

Reported-by: Ping Zhang <[email protected]>
Signed-off-by: Kejian Yan <[email protected]>
Reviewed-by: Salil Mehta <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Clean redundant code from hns_mdio.c file

This patch cleans the redundant code from hns_mdio.c.

Reported-by: Ping Zhang <[email protected]>
Signed-off-by: Kejian Yan <[email protected]>
Reviewed-by: Salil Mehta <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Remove redundant mac table operations

This patch removes redundant functions used only for debugging
purposes.

Reported-by: Weiwei Deng <[email protected]>
Signed-off-by: Kejian Yan <[email protected]>
Reviewed-by: Salil Mehta <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Remove redundant mac_get_id()

There is a mac_id in mac control block structure, so the callback
function mac_get_id() is useless. Here we remove this function.

Reported-by: Weiwei Deng <[email protected]>
Signed-off-by: Kejian Yan <[email protected]>
Reviewed-by: Salil Mehta <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Remove the redundant adding and deleting mac function

The functions (hns_dsaf_set_mac_mc_entry() and hns_mac_del_mac()) are
not called by any functions. They are dead code in hns. And the same
features are implemented by the patch (the id is 66355f5).

Reported-by: Weiwei Deng <[email protected]>
Signed-off-by: Kejian Yan <[email protected]>
Reviewed-by: Salil Mehta <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Correct HNS RSS key set function

This patch fixes below ethtool configuration error:

localhost:~ # ethtool -X eth0 hkey XX:XX:XX...
Cannot set Rx flow hash configuration: Operation not supported

Signed-off-by: lipeng <[email protected]>
Reviewed-by: Yisen Zhuang <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Replace netif_tx_lock to ring spin lock

netif_tx_lock is a global spin lock, it will take affect
in all rings in the netdevice. In tx_poll_one process, it can
only lock the current ring, in this case, we define a spin lock
in hnae_ring struct for it.

Signed-off-by: lipeng <[email protected]>
reviewed-by: Yisen Zhuang <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Fix to adjust buf_size of ring according to mtu

Because buf_size of ring set to 2048, the process of rx_poll_one
can reuse the page, therefore the performance of XGE can improve.
But the chip only supports three bds in one package, so the max mtu
is 6K when it sets to 2048. For better performane in litter mtu, we
need change buf_size according to mtu.

When user change mtu, hns is only change the desc in memory. There
are some desc has been fetched by the chip, these desc can not be
changed by the code. So it needs set the port loopback and send
some packages to let the chip consumes the wrong desc and fetch new
desc.
Because the Pv660 do not support rss indirection, we need add version
check in mtu change process.

Signed-off-by: lipeng <[email protected]>
reviewed-by: Yisen Zhuang <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Optimize hns_nic_common_poll for better performance

After polling less than buget packages, we need check again. If
there are still some packages, we call napi_schedule add softirq
queue, this is not better way. So we return buget value instead
of napi_schedule.

Signed-off-by: lipeng <[email protected]>
reviewed-by: Yisen Zhuang <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: bug fix of ethtool show the speed

When run ethtool ethX on hns driver, the speed will show
as "Unknown". The base.speed is not correct assigned,
this patch fix this bug.

Signed-off-by: Daode Huang <[email protected]>
Reviewed-by: Yisen Zhuang <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Remove redundant memset during buffer release

Because all members of desc_cb is assigned when xmit one package, so it
can delete in hnae_free_buffer, as follows:
- "dma, priv, length, type" are assigned in fill_v2_desc.
- "page_offset, reuse_flag, buf" are not used in tx direction.

Signed-off-by: lipeng <[email protected]>
Signed-off-by: Weiwei Deng <[email protected]>
Reviewed-by: Yisen Zhuang <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Optimize the code for GMAC pad and crc Config

This patch optimises the init configuration code leg
for gmac pad and crc set interface.

Signed-off-by: lipeng <[email protected]>
Signed-off-by: JinchuanTian <[email protected]>
Reviewed-by: Yisen Zhuang <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Modify GMAC init TX threshold value

This patch reduces GMAC TX threshold value to avoid gmac
hang-up with speed 100M/duplex half.

Signed-off-by: lipeng <[email protected]>
Signed-off-by: JinchuanTian <[email protected]>
Reviewed-by: Yisen Zhuang <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns: Fix the implementation of irq affinity function

This patch fixes the implementation of the IRQ affinity
function. This function is used to create the cpu mask
which eventually is used to initialize the cpu<->queue
association for XPS(Transmit Packet Steering).

Signed-off-by: lipeng <[email protected]>
Signed-off-by: Kejian Yan <[email protected]>
Reviewed-by: Yisen Zhuang <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

flow dissector: correct size of storage for ARP

The last argument to __skb_header_pointer() should be a buffer large
enough to store struct arphdr. This can be a pointer to a struct arphdr
structure. The code was previously using a pointer to a pointer to
struct arphdr.

By my counting the storage available both before and after is 8 bytes on
x86_64.

Fixes: 55733350e5e8 ("flow disector: ARP support")
Reported-by: Nicolas Iooss <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm/msm: Make sure to detach the MMU during GPU cleanup

We should be detaching the MMU before destroying the address
space. To do this cleanly, the detach has to happen in
adreno_gpu_cleanup() because it needs access to structs
in adreno_gpu.c. Plus it is better symmetry to have
the attach and detach at the same code level.

Signed-off-by: Jordan Crouse <[email protected]>
Signed-off-by: Rob Clark <[email protected]>

drm/msm/hdmi: redefinitions of macros not required

4 macros already defined in hdmi.h,
which is not required to redefine in hdmi_audio.c

Signed-off-by: Vinay Simha BN <[email protected]>
Signed-off-by: Rob Clark <[email protected]>

drm/msm/mdp5: Update SSPP_MAX value

'SSPP_MAX + 1' is the max number of hwpipes that can be present on a
MDP5 platform. Recently, 2 new cursor hwpipes were added, which
caused overflows in arrays that used SSPP_MAX to represent the number
of elements. Update the SSPP_MAX value to incorporate the extra
hwpipes.

Signed-off-by: Archit Taneja <[email protected]>
Signed-off-by: Rob Clark <[email protected]>

drm/msm/dsi: Fix bug in dsi_mgr_phy_enable

A recent commit introduces a bug in dsi_mgr_phy_enable. In the non
dual DSI mode, we reset the mdsi (master DSI) PHY. This isn't right
since master and slave DSI exist only in dual DSI mode. For the normal
mode of operation, we should simply reset the PHY of the DSI device
(i.e. msm_dsi) corresponding to the current bridge.

Usage of the wrong DSI pointer also resulted in a static checker
warning. That too is resolved with this fix.

Fixes: b62aa70a98c5 (drm/msm/dsi: Move PHY operations out of host)
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Archit Taneja <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>

drm/msm: Don't allow zero sized buffer objects

Zero sized buffer objects tend to make various bits of the GEM
infrastructure complain:

WARNING: CPU: 1 PID: 2323 at drivers/gpu/drm/drm_mm.c:389 drm_mm_insert_node_generic+0x258/0x2f0
Modules linked in:

CPU: 1 PID: 2323 Comm: drm-api-test Tainted: G W 4.9.0-rc4-00906-g693af44 #213
Hardware name: Qualcomm Technologies, Inc. DB820c (DT)
task: ffff8000d7353400 task.stack: ffff8000d7720000
PC is at drm_mm_insert_node_generic+0x258/0x2f0
LR is at drm_vma_offset_add+0x4c/0x70

Zero sized buffers serve no appreciable value to the user so disallow
them at create time.

Signed-off-by: Jordan Crouse <[email protected]>
Signed-off-by: Rob Clark <[email protected]>

drm/msm: Fix wrong pointer check in a5xx_destroy

Instead of checking for a5xx_gpu->gpmu_iova during destroy we
accidently check a5xx_gpu->gpmu_bo.

Signed-off-by: Jordan Crouse <[email protected]>
Signed-off-by: Rob Clark <[email protected]>

drm/msm: adreno: fix build error without debugfs

The newly added a5xx support fails to build when debugfs is diabled:

drivers/gpu/drm/msm/adreno/a5xx_gpu.c:849:4: error: 'struct msm_gpu_funcs' has no member named 'show'
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:849:11: error: 'a5xx_show' undeclared here (not in a function); did you mean 'a5xx_irq'?

This adds a missing #ifdef.

Fixes: b5f103ab98c7 ("drm/msm: gpu: Add A5XX target support")
Cc: [email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Rob Clark <[email protected]>

Merge branch 'vmwgfx-fixes-4.11' of git://people.freedesktop.org/~thomash/linux into drm-fixes

Set of vmwgfx fixes
* 'vmwgfx-fixes-4.11' of git://people.freedesktop.org/~thomash/linux:
  drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl()
  drm/vmwgfx: Remove getparam error message
  drm/ttm: Avoid calling drm_ht_remove from atomic context
  drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces
  drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl()
  drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl()
  drm/vmwgfx: Type-check lookups of fence objects

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
"Four bug fixes, two of them for stable:

   - avoid initrd corruptions in the kernel decompressor

   - prevent inconsistent dumps if the boot CPU does not have address
     zero

   - fix the new pkey interface added with the merge window for 4.11

   - a fix for a fix, another issue with user copy zero padding"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/uaccess: get_user() should zero on failure (again)
  s390/pkey: Fix wrong handling of secure key with old MKVP
  s390/smp: fix ipl from cpu with non-zero address
  s390/decompressor: fix initrd corruption caused by bss clear

Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS fix from Thomas Gleixner:
"Prevent dmesg from being spammed when MCE logging is active"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Don't print MCEs when mcelog is active

nios2: reserve boot memory for device tree

Make sure to reserve the boot memory for the flattened device tree.
Otherwise it might get overwritten, e.g. when initial_boot_params is
copied, leading to a corrupted FDT and a boot hang/crash:

  bootconsole [early0] enabled
  Early console on uart16650 initialized at 0xf8001600
  OF: fdt: Error -11 processing FDT
  Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!

  ---[ end Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!

Guenter Roeck says:

> I think I found the problem. In unflatten_and_copy_device_tree(), with added
> debug information:
>
> OF: fdt: initial_boot_params=c861e400, dt=c861f000 size=28874 (0x70ca)
>
> ... and then initial_boot_params is copied to dt, which results in corrupted
> fdt since the memory overlaps. Looks like the initial_boot_params memory
> is not reserved and (re-)allocated by early_init_dt_alloc_memory_arch().

Cc: [email protected]
Reported-by: Guenter Roeck <[email protected]>
Reference: http://lkml.kernel.org/r/20170226210338 [email protected]
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>
Acked-by: Ley Foon Tan <[email protected]>

net: ethernet: ti: cpsw: wake tx queues on ndo_tx_timeout

In case, if TX watchdog is fired some or all netdev TX queues will be
stopped and as part of recovery it is required not only to drain and
reinitailize CPSW TX channeles, but also wake up stoppted TX queues what
doesn't happen now and netdevice will stop transmiting data until
reopenned.

Hence, add netif_tx_wake_all_queues() call in .ndo_tx_timeout() to complete
recovery and restore TX path.

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'rds-minor-bug-fixes'

Sowmini Varadhan says:

====================
rds: tcp: couple of minor bug fixes

A couple of minor bugfixes that showed up during testing
====================

Signed-off-by: David S. Miller <[email protected]>

rds: tcp: canonical connection order for all paths with index > 0

The rds_connect_worker() has a bug in the check that enforces the
canonical connection order described in the comments of
rds_tcp_state_change(). The intention is to make sure that all
the multipath connections are always initiated by the smaller IP
address via rds_start_mprds. To achieve this, rds_connection_worker
should check that cp_index > 0.

Signed-off-by: Sowmini Varadhan <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rds: tcp: allow progress of rds_conn_shutdown if the rds_connection is marked ERROR by an intervening FIN

rds_conn_shutdown() runs in workq context, and marks the rds_connection
as DISCONNECTING before quiescing Tx/Rx paths. However, after all I/O
has quiesced, we may still find the rds_connection state to be
RDS_CONN_ERROR if an intervening FIN was processed in softirq context.

This is not a fatal error: rds_conn_shutdown() should continue the
shutdown, and there is no need to log noisy messages about this event.

Signed-off-by: Sowmini Varadhan <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sock: correctly test SOCK_TIMESTAMP in sock_recv_ts_and_drops()

It seems the code does not match the intent.

This broke packetdrill, and probably other programs.

Fixes: 6c7c98bad488 ("sock: avoid dirtying sk_stamp, if possible")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Paolo Abeni <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: fix build with gcc-4.4.4

drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: In function 'mlx5e_set_rxfh':
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: error: unknown field 'rss' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: warning: missing braces around initializer
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: warning: (near initialization for 'rrp.<anonymous>')
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1068: error: unknown field 'rss' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1069: warning: excess elements in struct initializer
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1069: warning: (near initialization for 'rrp')

gcc-4.4.4 has issues with anonymous union initializers. Work around this.

Cc: Saeed Mahameed <[email protected]>
Cc: Tariq Toukan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drivers/net/ethernet/mellanox/mlx5/core/en_main.c: fix build with gcc-4.4.4

drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_redirect_rqts':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2210: error: unknown field 'rqn' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2211: warning: missing braces around initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2211: warning: (near initialization for 'direct_rrp.<anonymous>')
drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_redirect_rqts_to_channels':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: error: unknown field 'rss' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: missing braces around initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: (near initialization for 'rrp.<anonymous>')
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: initialization makes integer from pointer without a cast
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2228: error: unknown field 'rss' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2229: warning: excess elements in struct initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2229: warning: (near initialization for 'rrp')
drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_redirect_rqts_to_drop':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2238: error: unknown field 'rqn' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2239: warning: missing braces around initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2239: warning: (near initialization for 'drop_rrp.<anonymous>')

gcc-4.4.4 has issues with anonymous union initializers. Work around this.

Cc: Saeed Mahameed <[email protected]>
Cc: Tariq Toukan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: fix cbs configuration

Sending again, because forgot to include net-dev.

The QoS IP does not accept AVB capabilities to default/queue 0, this way we
guarantee 75% bandwidth for AVB. This patch assures that only queues >= 1
gets CBS confgured. Additional info was also added to stmmac.txt.

Reported-by: Niklas Cassel <[email protected]>
Signed-off-by: Joao Pinto <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Linux 4.11-rc5

Merge tag 'dmaengine-fix-4.11-rc5' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
"A couple of minor fixes for 4.11:

   - array bound fix for __get_unmap_pool()

   - cyclic period splitting for bcm2835"

* tag 'dmaengine-fix-4.11-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: Fix array index out of bounds warning in __get_unmap_pool()
  dmaengine: bcm2835: Fix cyclic DMA period splitting

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"This update provides:

   - prevent KASLR from randomizing EFI regions

   - restrict the usage of -maccumulate-outgoing-args and document when
     and why it is required.

   - make the Global Physical Address calculation for UV4 systems work
     correctly.

   - address a copy->paste->forgot-edit problem in the MCE exception
     table entries.

   - assign a name to AMD MCA bank 3, so the sysfs file registration
     works.

   - add a missing include in the boot code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Include missing header file
  x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs
  x86/build: Mostly disable '-maccumulate-outgoing-args'
  x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization
  x86/mce: Fix copy/paste error in exception table entries
  x86/platform/uv: Fix calculation of Global Physical Address

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:
"This update provides:

   - make the scheduler clock switch to unstable mode smooth so the
     timestamps stay at microseconds granularity instead of switching to
     tick granularity.

   - unbreak perf test tsc by taking the new offset into account which
     was added in order to proveide better sched clock continuity

   - switching sched clock to unstable mode runs all clock related
     computations which affect the sched clock output itself from a work
     queue. In case of preemption sched clock uses half updated data and
     provides wrong timestamps. Keep the math in the protected context
     and delegate only the static key switch to workqueue context.

   - remove a duplicate header include"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/headers: Remove duplicate #include <linux/sched/debug.h> line
  sched/clock: Fix broken stable to unstable transfer
  sched/clock, x86/perf: Fix "perf test tsc"
  sched/clock: Fix clear_sched_clock_stable() preempt wobbly

Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fix from Thomas Gleixner:
"Downgrade the missing ESRT header printk to warning level and remove a
useless error printk which just generates noise for no value"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/esrt: Cleanup bad memory map log messages

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
"Two small fixes for the new CLKEVT_OF infrastructure"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
vmlinux.lds: Add __clkevt_of_table to kernel
clockevents: Fix syntax error in clkevt-of macro

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
"Two small fixlets:

   - select a required Kconfig to make the MVEBU driver compile

   - add the missing MIPS local GIC interrupts which prevent drivers to
     probe successfully"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mips-gic: Fix Local compare interrupt
  irqchip/mvebu-odmi: Select GENERIC_MSI_IRQ_DOMAIN

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core fix from Thomas Gleixner:
"Prevent leaking kernel memory via /proc/$pid/syscall when the queried
task is not in a syscall"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/syscall: Clear return values when no stack

Merge branch 'mpls-more-labels'

David Ahern says:

====================
net: mpls: Allow users to configure more labels per route

Increase the maximum number of new labels for MPLS routes from 2 to 30.

To keep memory consumption in check, the labels array is moved to the end
of mpls_nh and mpls_iptunnel_encap structs as a 0-sized array. Allocations
use the maximum number of labels across all nexthops in a route for LSR
and the number of labels configured for LWT.

The mpls_route layout is changed to:

   +----------------------+
   | mpls_route           |
   +----------------------+
   | mpls_nh 0            |
   +----------------------+
   | alignment padding    |   4 bytes for odd number of labels; 0 for even
   +----------------------+
   | via[rt_max_alen] 0   |
   +----------------------+
   | alignment padding    |   via's aligned on sizeof(unsigned long)
   +----------------------+
   | ...                  |

Meaning the via follows its mpls_nh providing better locality as the
number of labels increases. UDP_RR tests with namespaces shows no impact
to a modest performance increase with this layout for 1 or 2 labels and
1 or 2 nexthops.

mpls_route allocation size is limited to 4096 bytes allowing on the
order of 30 nexthops with 30 labels (or more nexthops with fewer
labels). LWT encap shares same maximum number of labels as mpls routing.

v3
- initialize n_labels to 0 in case RTA_NEWDST is not defined; detected
  by the kbuild test robot

v2
- updates per Eric's comments
  + added patch to ensure all reads of rt_nhn_alive and nh_flags in
    the packet path use READ_ONCE and all writes via event handlers
    use WRITE_ONCE

  + limit mpls_route size to 4096 (PAGE_SIZE for most arch)

  + mostly killed use of MAX_NEW_LABELS; it exists only for common
    limit between lwt and routing paths
====================

Signed-off-by: David S. Miller <[email protected]>

net: mpls: Increase max number of labels for lwt encap

Alow users to push down more labels per MPLS encap. Similar to LSR case,
move label array to the end of mpls_iptunnel_encap and allocate based on
the number of labels for the route.

For consistency with the LSR case, re-use the same maximum number of
labels.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mpls: bump maximum number of labels

Allow users to push down more labels per MPLS route. With the previous
patches, no memory allocations are based on MAX_NEW_LABELS; the limit
is only used to keep userspace in check.

At this point MAX_NEW_LABELS is only used for mpls_route_config (copying
route data from userspace) and processing nexthops looking for the max
number of labels across the route spec.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mpls: Limit memory allocation for mpls_route

Limit memory allocation size for mpls_route to 4096.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mpls: change mpls_route layout

Move labels to the end of mpls_nh as a 0-sized array and within mpls_route
move the via for a nexthop after the mpls_nh. The new layout becomes:

   +----------------------+
   | mpls_route           |
   +----------------------+
   | mpls_nh 0            |
   +----------------------+
   | alignment padding    |   4 bytes for odd number of labels; 0 for even
   +----------------------+
   | via[rt_max_alen] 0   |
   +----------------------+
   | alignment padding    |   via's aligned on sizeof(unsigned long)
   +----------------------+
   | ...                  |
   +----------------------+
   | mpls_nh n-1          |
   +----------------------+
   | via[rt_max_alen] n-1 |
   +----------------------+

Memory allocated for nexthop + via is constant across all nexthops and
their via. It is based on the maximum number of labels across all nexthops
and the maximum via length. The size is saved in the mpls_route as
rt_nh_size. Accessing a nexthop becomes rt->rt_nh + index * rt->rt_nh_size.

The offset of the via address from a nexthop is saved as rt_via_offset
so that given an mpls_nh pointer the via for that hop is simply
nh + rt->rt_via_offset.

With prior code, memory allocated per mpls_route with 1 nexthop:
     via is an ethernet address - 64 bytes
     via is an ipv4 address     - 64
     via is an ipv6 address     - 72

With this patch set, memory allocated per mpls_route with 1 nexthop and
1 or 2 labels:
     via is an ethernet address - 56 bytes
     via is an ipv4 address     - 56
     via is an ipv6 address     - 64

The 8-byte reduction is due to the previous patch; the change introduced
by this patch has no impact on the size of allocations for 1 or 2 labels.

Performance impact of this change was examined using network namespaces
with veth pairs connecting namespaces. ns0 inserts the packet to the
label-switched path using an lwt route with encap mpls. ns1 adds 1 or 2
labels depending on test, ns2 (and ns3 for 2-label test) pops the label
and forwards. ns3 (or ns4) for a 2-label is the destination. Similar
series of namespaces used for 2-nexthop test.

Intent is to measure changes to latency (overhead in manipulating the
packet) in the forwarding path. Tests used netperf with UDP_RR.

IPv4:                     current   patches
   1 label, 1 nexthop      29908     30115
   2 label, 1 nexthop      29071     29612
   1 label, 2 nexthop      29582     29776
   2 label, 2 nexthop      29086     29149

IPv6:                     current   patches
   1 label, 1 nexthop      24502     24960
   2 label, 1 nexthop      24041     24407
   1 label, 2 nexthop      23795     23899
   2 label, 2 nexthop      23074     22959

In short, the change has no effect to a modest increase in performance.
This is expected since this patch does not really have an impact on routes
with 1 or 2 labels (the current limit) and 1 or 2 nexthops.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mpls: Convert number of nexthops to u8

Number of nexthops and number of alive nexthops are tracked using an
unsigned int. A route should never have more than 255 nexthops so
convert both to u8. Update all references and intermediate variables
to consistently use u8 as well.

Shrinks the size of mpls_route from 32 bytes to 24 bytes with a 2-byte
hole before the nexthops.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mpls: rt_nhn_alive and nh_flags should be accessed using READ_ONCE

The number of alive nexthops for a route (rt->rt_nhn_alive) and the
flags for a next hop (nh->nh_flags) are modified by netdev event
handlers. The event handlers run with rtnl_lock held so updates are
always done with the lock held. The packet path accesses the fields
under the rcu lock. Since those fields can change at any moment in
the packet path, both fields should be accessed using READ_ONCE. Updates
to both fields should use WRITE_ONCE.

Update mpls_select_multipath (packet path) and mpls_ifdown and mpls_ifup
(event handlers) accordingly.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'l2tp_session_find-fixes'

Guillaume Nault says:

====================
l2tp: fix usage of l2tp_session_find()

l2tp_session_find() doesn't take a reference on the session returned to
its caller. Virtually all l2tp_session_find() users are racy, either
because the session can disappear from under them or because they take
a reference too late. This leads to bugs like 'use after free' or
failure to notice duplicate session creations.

In some cases, taking a reference on the session is not enough. The
special callbacks .ref() and .deref() also have to be called in cases
where the PPP pseudo-wire uses the socket associated with the session.
Therefore, when looking up a session, we also have to pass a flag
indicating if the .ref() callback has to be called.

In the future, we probably could drop the .ref() and .deref() callbacks
entirely by protecting the .sock field of struct pppol2tp_session with
RCU, thus allowing it to be freed and set to NULL even if the L2TP
session is still alive.
====================

Signed-off-by: David S. Miller <[email protected]>

l2tp: take a reference on sessions used in genetlink handlers

Callers of l2tp_nl_session_find() need to hold a reference on the
returned session since there's no guarantee that it isn't going to
disappear from under them.

Relying on the fact that no l2tp netlink message may be processed
concurrently isn't enough: sessions can be deleted by other means
(e.g. by closing the PPPOL2TP socket of a ppp pseudowire).

l2tp_nl_cmd_session_delete() is a bit special: it runs a callback
function that may require a previous call to session->ref(). In
particular, for ppp pseudowires, the callback is l2tp_session_delete(),
which then calls pppol2tp_session_close() and dereferences the PPPOL2TP
socket. The socket might already be gone at the moment
l2tp_session_delete() calls session->ref(), so we need to take a
reference during the session lookup. So we need to pass the do_ref
variable down to l2tp_session_get() and l2tp_session_get_by_ifname().

Since all callers have to be updated, l2tp_session_find_by_ifname() and
l2tp_nl_session_find() are renamed to reflect their new behaviour.

Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

l2tp: hold session while sending creation notifications

l2tp_session_find() doesn't take any reference on the returned session.
Therefore, the session may disappear while sending the notification.

Use l2tp_session_get() instead and decrement session's refcount once
the notification is sent.

Fixes: 33f72e6f0c67 ("l2tp : multicast notification to the registered listeners")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

l2tp: fix duplicate session creation

l2tp_session_create() relies on its caller for checking for duplicate
sessions. This is racy since a session can be concurrently inserted
after the caller's verification.

Fix this by letting l2tp_session_create() verify sessions uniqueness
upon insertion. Callers need to be adapted to check for
l2tp_session_create()'s return code instead of calling
l2tp_session_find().

pppol2tp_connect() is a bit special because it has to work on existing
sessions (if they're not connected) or to create a new session if none
is found. When acting on a preexisting session, a reference must be
held or it could go away on us. So we have to use l2tp_session_get()
instead of l2tp_session_find() and drop the reference before exiting.

Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

l2tp: ensure session can't get removed during pppol2tp_session_ioctl()

Holding a reference on session is required before calling
pppol2tp_session_ioctl(). The session could get freed while processing the
ioctl otherwise. Since pppol2tp_session_ioctl() uses the session's socket,
we also need to take a reference on it in l2tp_session_get().

Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

l2tp: fix race in l2tp_recv_common()

Taking a reference on sessions in l2tp_recv_common() is racy; this
has to be done by the callers.

To this end, a new function is required (l2tp_session_get()) to
atomically lookup a session and take a reference on it. Callers then
have to manually drop this reference.

Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>