Git Repo - linux.git/log

bnxt_en: Fix CNP CoS queue regression.

Recent changes to support the 57500 devices have created this
regression. The bnxt_hwrm_queue_qportcfg() call was moved to be
called earlier before the RDMA support was determined, causing
the CoS queues configuration to be set before knowing whether RDMA
was supported or not. Fix it by moving it to the right place right
after RDMA support is determined.

Fixes: 98f04cf0f1fc ("bnxt_en: Check context memory requirements from firmware.")
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'char-misc-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are some small driver fixes for 4.20-rc6.

  There is a hyperv fix that for some reaon took forever to get into a
  shape that could be applied to the tree properly, but resolves a much
  reported issue. The others are some gnss patches, one a bugfix and the
  two others updates to the MAINTAINERS file to properly match the gnss
  files in the tree.

  All have been in linux-next for a while with no reported issues"

* tag 'char-misc-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  MAINTAINERS: exclude gnss from SIRFPRIMA2 regex matching
  MAINTAINERS: add gnss scm tree
  gnss: sirf: fix activation retry handling
  Drivers: hv: vmbus: Offload the handling of channels to two workqueues

Merge tag 'staging-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging fixes from Greg KH:
"Here are two staging driver bugfixes for 4.20-rc6.

  One is a revert of a previously incorrect patch that was merged a
  while ago, and the other resolves a possible buffer overrun that was
  found by code inspection.

  Both of these have been in the linux-next tree with no reported
  issues"

* tag 'staging-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  Revert commit ef9209b642f "staging: rtl8723bs: Fix indenting errors and an off-by-one mistake in core/rtw_mlme_ext.c"
  staging: rtl8712: Fix possible buffer overrun

Merge tag 'tty-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty driver fixes from Greg KH:
"Here are three small tty driver fixes for 4.20-rc6

  Nothing major, just some bug fixes for reported issues. Full details
  are in the shortlog.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  kgdboc: fix KASAN global-out-of-bounds bug in param_set_kgdboc_var()
  tty: serial: 8250_mtk: always resume the device in probe.
  tty: do not set TTY_IO_ERROR flag if console port

Merge tag 'usb-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB fixes for 4.20-rc6

  The "largest" here are some xhci fixes for reported issues. Also here
  is a USB core fix, some quirk additions, and a usb-serial fix which
  required the export of one of the tty layer's functions to prevent
  code duplication. The tty maintainer agreed with this change.

  All of these have been in linux-next with no reported issues"

* tag 'usb-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  xhci: Prevent U1/U2 link pm states if exit latency is too long
  xhci: workaround CSS timeout on AMD SNPS 3.0 xHC
  USB: check usb_get_extra_descriptor for proper size
  USB: serial: console: fix reported terminal settings
  usb: quirk: add no-LPM quirk on SanDisk Ultra Flair device
  USB: Fix invalid-free bug in port_over_current_notify()
  usb: appledisplay: Add 27" Apple Cinema Display

Merge tag '4.20-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"Three small fixes: a fix for smb3 direct i/o, a fix for CIFS DFS for
  stable and a minor cifs Kconfig fix"

* tag '4.20-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Avoid returning EBUSY to upper layer VFS
  cifs: Fix separator when building path from dentry
  cifs: In Kconfig CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)

Merge tag 'dax-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull dax fixes from Dan Williams:
"The last of the known regression fixes and fallout from the Xarray
  conversion of the filesystem-dax implementation.

  On the path to debugging why the dax memory-failure injection test
  started failing after the Xarray conversion a couple more fixes for
  the dax_lock_mapping_entry(), now called dax_lock_page(), surfaced.
  Those plus the bug that started the hunt are now addressed. These
  patches have appeared in a -next release with no issues reported.

  Note the touches to mm/memory-failure.c are just the conversion to the
  new function signature for dax_lock_page().

  Summary:

   - Fix the Xarray conversion of fsdax to properly handle
     dax_lock_mapping_entry() in the presense of pmd entries

   - Fix inode destruction racing a new lock request"

* tag 'dax-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: Fix unlock mismatch with updated API
  dax: Don't access a freed inode
  dax: Check page->mapping isn't NULL

Merge tag 'libnvdimm-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
"A regression fix for the Address Range Scrub implementation, yes
  another one, and support for platforms that misalign persistent memory
  relative to the Linux memory hotplug section constraint. Longer term,
  support for sub-section memory hotplug would alleviate alignment
  waste, but until then this hack allows a 'struct page' memmap to be
  established for these misaligned memory regions.

  These have all appeared in a -next release, and thanks to Patrick for
  reporting and testing the alignment padding fix.

  Summary:

   - Unless and until the core mm handles memory hotplug units smaller
     than a section (128M), persistent memory namespaces must be padded
     to section alignment.

     The libnvdimm core already handled section collision with "System
     RAM", but some configurations overlap independent "Persistent
     Memory" ranges within a section, so additional padding injection is
     added for that case.

   - The recent reworks of the ARS (address range scrub) state machine
     to reduce the number of state flags inadvertantly missed a
     conversion of acpi_nfit_ars_rescan() call sites. Fix the regression
     whereby user-requested ARS results in a "short" scrub rather than a
     "long" scrub.

   - Fixup the unit tests to handle / test the 128M section alignment of
     mocked test resources.

* tag 'libnvdimm-fixes-4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  acpi/nfit: Fix user-initiated ARS to be "ARS-long" rather than "ARS-short"
  libnvdimm, pfn: Pad pfn namespaces relative to other regions
  tools/testing/nvdimm: Align test resources to 128M

net: dsa: Make dsa_master_set_mtu() static

Add the missing static keyword.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: Restore MTU on master device on unload

A previous change tries to set the MTU on the master device to take
into account the DSA overheads. This patch tries to reset the master
device back to the default MTU.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'platform-data-controls-for-mdio-gpio'

Andrew Lunn says:

====================
platform data controls for mdio-gpio

Soon to be mainlined is an x86 platform with a Marvell switch, and a
bit-banging MDIO bus. In order to make this work, the phy_mask of the
MDIO bus needs to be set to prevent scanning for PHYs, and the
phy_ignore_ta_mask needs to be set because the switch has broken
turnaround.

Add a platform_data structure with these parameters.
====================

Signed-off-by: David S. Miller <[email protected]>

net: phy: mdio-gpio: Add phy_ignore_ta_mask to platform data

The Marvell 6390 Ethernet switch family does not perform MDIO
turnaround correctly. Many hardware MDIO bus masters don't care about
this, but the bitbangging implementation in Linux does by default. Add
phy_ignore_ta_mask to the platform data so that the bitbangging code
can be told which devices are known to get TA wrong.

v2
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: mdio-gpio: Add platform_data support for phy_mask

It is sometimes necessary to instantiate a bit-banging MDIO bus as a
platform device, without the aid of device tree.

When device tree is being used, the bus is not scanned for devices,
only those devices which are in device tree are probed. Without device
tree, by default, all addresses on the bus are scanned. This may then
find a device which is not a PHY, e.g. a switch. And the switch may
have registers containing values which look like a PHY. So during the
scan, a PHY device is wrongly created.

After the bus has been registered, a search is made for
mdio_board_info structures which indicates devices on the bus, and the
driver which should be used for them. This is typically used to
instantiate Ethernet switches from platform drivers. However, if the
scanning of the bus has created a PHY device at the same location as
indicated into the board info for a switch, the switch device is not
created, since the address is already busy.

This can be avoided by setting the phy_mask of the mdio bus. This mask
prevents addresses on the bus being scanned.

v2
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_core: Correctly set PFC param if global pause is turned off.

rx_ppp and tx_ppp can be set between 0 and 255, so don't clamp to 1.

Fixes: 6e8814ceb7e8 ("net/mlx4_en: Fix mixed PFC and Global pause user control requests")
Signed-off-by: Tarick Bedeir <[email protected]>
Reviewed-by: Eran Ben Elisha <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal

Pull thermal SoC fixes from Eduardo Valentin:
"Fixes for armada and broadcom thermal drivers"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
  thermal: broadcom: constify thermal_zone_of_device_ops structure
  thermal: armada: constify thermal_zone_of_device_ops structure
  thermal: bcm2835: Switch to SPDX identifier
  thermal: armada: fix legacy resource fixup
  thermal: armada: fix legacy validity test sense

ip: silence udp zerocopy smatch false positive

extra_uref is used in __ip(6)_append_data only if uarg is set.

Smatch sees that the variable is passed to sock_zerocopy_put_abort.
This function accesses it only when uarg is set, but smatch cannot
infer this.

Make this dependency explicit.

Fixes: 52900d22288e ("udp: elide zerocopy operation in hot path")
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'asm-generic-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic fix from Arnd Bergmann:
"Multiple people reported a bug I introduced in asm-generic/unistd.h in
  4.20, this is the obvious bugfix to get glibc and others to correctly
  build again on new architectures that no longer provide the old
  fstatat64() family of system calls"

* tag 'asm-generic-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: unistd.h: fixup broken macro include.

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
"A few clk driver fixes this time:

   - Introduce protected-clock DT binding to fix breakage on qcom
     sdm845-mtp boards where the qspi clks introduced this merge window
     cause the firmware on those boards to take down the system if we
     try to read the clk registers

   - Fix a couple off-by-one errors found by Dan Carpenter

   - Handle failure in zynq fixed factor clk driver to avoid using
     uninitialized data"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: zynqmp: Off by one in zynqmp_is_valid_clock()
  clk: mmp: Off by one in mmp_clk_add()
  clk: mvebu: Off by one bugs in cp110_of_clk_get()
  arm64: dts: qcom: sdm845-mtp: Mark protected gcc clocks
  clk: qcom: Support 'protected-clocks' property
  dt-bindings: clk: Introduce 'protected-clocks' property
  clk: zynqmp: handle fixed factor param query error

Merge tag 'xfs-4.20-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
"Here are hopefully the last set of fixes for 4.20.

  There's a fix for a longstanding statfs reporting problem with project
  quotas, a correction for page cache invalidation behaviors when
  fallocating near EOF, and a fix for a broken metadata verifier return
  code.

  Finally, the most important fix is to the pipe splicing code (aka the
  generic copy_file_range fallback) to avoid pointless short directio
  reads by only asking the filesystem for as much data as there are
  available pages in the pipe buffer. Our previous fix (simulated short
  directio reads because the number of pages didn't match the length of
  the read requested) caused subtle problems on overlayfs, so that part
  is reverted.

  Anyhow, this series passes fstests -g all on xfs and overlay+xfs, and
  has passed 17 billion fsx operations problem-free since I started
  testing

  Summary:

   - Fix broken project quota inode counts

   - Fix incorrect PAGE_MASK/PAGE_SIZE usage

   - Fix incorrect return value in btree verifier

   - Fix WARN_ON remap flags false positive

   - Fix splice read overflows"

* tag 'xfs-4.20-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: partially revert 4721a601099 (simulated directio short read on EFAULT)
  splice: don't read more than available pipe space
  vfs: allow some remap flags to be passed to vfs_clone_file_range
  xfs: fix inverted return from xfs_btree_sblock_verify_crc
  xfs: fix PAGE_MASK usage in xfs_free_file_space
  fs/xfs: fix f_ffree value for statfs when project quota is set

Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"

This reverts commit 89c83fb539f95491be80cdd5158e6f0ce329e317.

This should have been done as part of 2f0799a0ffc0 ("mm, thp: restore
node-local hugepage allocations").  The movement of the thp allocation
policy from alloc_pages_vma() to alloc_hugepage_direct_gfpmask() was
intended to only set __GFP_THISNODE for mempolicies that are not
MPOL_BIND whereas the revert could set this regardless of mempolicy.

While the check for MPOL_BIND between alloc_hugepage_direct_gfpmask()
and alloc_pages_vma() was racy, that has since been removed since the
revert.  What is left is the possibility to use __GFP_THISNODE in
policy_node() when it is unexpected because the special handling for
hugepages in alloc_pages_vma()  was removed as part of the consolidation.

Secondly, prior to 89c83fb539f9, alloc_pages_vma() implemented a somewhat
different policy for hugepage allocations, which were allocated through
alloc_hugepage_vma().  For hugepage allocations, if the allocating
process's node is in the set of allowed nodes, allocate with
__GFP_THISNODE for that node (for MPOL_PREFERRED, use that node with
__GFP_THISNODE instead).  This was changed for shmem_alloc_hugepage() to
allow fallback to other nodes in 89c83fb539f9 as it did for new_page() in
mm/mempolicy.c which is functionally different behavior and removes the
requirement to only allocate hugepages locally.

So this commit does a full revert of 89c83fb539f9 instead of the partial
revert that was done in 2f0799a0ffc0.  The result is the same thp
allocation policy for 4.20 that was in 4.19.

Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations")
Signed-off-by: David Rientjes <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Revert "net/ibm/emac: wrong bit is used for STA control"

This reverts commit 624ca9c33c8a853a4a589836e310d776620f4ab9.

This commit is completely bogus. The STACR register has two formats, old
and new, depending on the version of the IP block used. There's a pair of
device-tree properties that can be used to specify the format used:

has-inverted-stacr-oc
has-new-stacr-staopc

What this commit did was to change the bit definition used with the old
parts to match the new parts. This of course breaks the driver on all
the old ones.

Instead, the author should have set the appropriate properties in the
device-tree for the variant used on his board.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'tc-testing-next'

Lucas Bates says:

====================
tc-testing: implement command timeouts and better results tracking

Patch 1 adds a timeout feature for any command tdc launches in a subshell.
This prevents tdc from hanging indefinitely.

Patches 2-4 introduce a new method for tracking and generating test case
results, and implements it across the core script and all applicable
plugins.
====================

Signed-off-by: David S. Miller <[email protected]>

tc-testing: gitignore, ignore generated test results

Ignore any .tap or .xml test result files generated by tdc.

Additionally, ignore plugin symlinks.

Signed-off-by: Lucas Bates <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tc-testing: Implement the TdcResults module in tdc

In tdc and the valgrind plugin, begin using the TdcResults module
to track executed test cases.

Signed-off-by: Lucas Bates <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tc-testing: Add new TdcResults module

This module includes new classes for tdc to use in keeping track
of test case results, instead of generating and tracking a lengthy
string.

The new module can be extended to support multiple formal
test result formats to be friendlier to automation.

Signed-off-by: Lucas Bates <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tc-testing: Add command timeout feature to tdc

Using an attribute set in the tdc_config.py file, limit the
amount of time tdc will wait for an executed command to
complete and prevent the script from hanging entirely.

This timeout will be applied to all executed commands.

Signed-off-by: Lucas Bates <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'skb-headroom-slab-out-of-bounds'

Stefano Brivio says:

====================
Fix slab out-of-bounds on insufficient headroom for IPv6 packets

Patch 1/2 fixes a slab out-of-bounds occurring with short SCTP packets over
IPv4 over L2TP over IPv6 on a configuration with relatively low HEADER_MAX.

Patch 2/2 makes sure we avoid writing before the allocated buffer in
neigh_hh_output() in case the headroom is enough for the unaligned hardware
header size, but not enough for the aligned one, and that we warn if we hit
this condition.
====================

Signed-off-by: David S. Miller <[email protected]>

neighbour: Avoid writing before skb->head in neigh_hh_output()

While skb_push() makes the kernel panic if the skb headroom is less than
the unaligned hardware header size, it will proceed normally in case we
copy more than that because of alignment, and we'll silently corrupt
adjacent slabs.

In the case fixed by the previous patch,
"ipv6: Check available headroom in ip6_xmit() even without options", we
end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware
header and write 16 bytes, starting 2 bytes before the allocated buffer.

Always check we're not writing before skb->head and, if the headroom is
not enough, warn and drop the packet.

v2:
- instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet
   (Eric Dumazet)
- if we avoid the panic, though, we need to explicitly check the headroom
   before the memcpy(), otherwise we'll have corrupted slabs on a running
   kernel, after we warn
- use __skb_push() instead of skb_push(), as the headroom check is
   already implemented here explicitly (Eric Dumazet)

Signed-off-by: Stefano Brivio <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: Check available headroom in ip6_xmit() even without options

Even if we send an IPv6 packet without options, MAX_HEADER might not be
enough to account for the additional headroom required by alignment of
hardware headers.

On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL,
sending short SCTP packets over IPv4 over L2TP over IPv6, we start with
100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54
bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2().

Those would be enough to append our 14 bytes header, but we're going to
align that to 16 bytes, and write 2 bytes out of the allocated slab in
neigh_hh_output().

KASan says:

[  264.967848] ==================================================================
[  264.967861] BUG: KASAN: slab-out-of-bounds in ip6_finish_output2+0x1aec/0x1c70
[  264.967866] Write of size 16 at addr 000000006af1c7fe by task netperf/6201
[  264.967870]
[  264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1
[  264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[  264.967887] Call Trace:
[  264.967896] ([<00000000001347d6>] show_stack+0x56/0xa0)
[  264.967903]  [<00000000017e379c>] dump_stack+0x23c/0x290
[  264.967912]  [<00000000007bc594>] print_address_description+0xf4/0x290
[  264.967919]  [<00000000007bc8fc>] kasan_report+0x13c/0x240
[  264.967927]  [<000000000162f5e4>] ip6_finish_output2+0x1aec/0x1c70
[  264.967935]  [<000000000163f890>] ip6_finish_output+0x430/0x7f0
[  264.967943]  [<000000000163fe44>] ip6_output+0x1f4/0x580
[  264.967953]  [<000000000163882a>] ip6_xmit+0xfea/0x1ce8
[  264.967963]  [<00000000017396e2>] inet6_csk_xmit+0x282/0x3f8
[  264.968033]  [<000003ff805fb0ba>] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core]
[  264.968037]  [<000003ff80631192>] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth]
[  264.968041]  [<0000000001220020>] dev_hard_start_xmit+0x268/0x928
[  264.968069]  [<0000000001330e8e>] sch_direct_xmit+0x7ae/0x1350
[  264.968071]  [<000000000122359c>] __dev_queue_xmit+0x2b7c/0x3478
[  264.968075]  [<00000000013d2862>] ip_finish_output2+0xce2/0x11a0
[  264.968078]  [<00000000013d9b14>] ip_finish_output+0x56c/0x8c8
[  264.968081]  [<00000000013ddd1e>] ip_output+0x226/0x4c0
[  264.968083]  [<00000000013dbd6c>] __ip_queue_xmit+0x894/0x1938
[  264.968100]  [<000003ff80bc3a5c>] sctp_packet_transmit+0x29d4/0x3648 [sctp]
[  264.968116]  [<000003ff80b7bf68>] sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp]
[  264.968131]  [<000003ff80b7c716>] sctp_outq_flush+0x22e/0x7d8 [sctp]
[  264.968146]  [<000003ff80b35c68>] sctp_cmd_interpreter.isra.16+0x530/0x6800 [sctp]
[  264.968161]  [<000003ff80b3410a>] sctp_do_sm+0x222/0x648 [sctp]
[  264.968177]  [<000003ff80bbddac>] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp]
[  264.968192]  [<000003ff80b93328>] __sctp_connect+0x830/0xc20 [sctp]
[  264.968208]  [<000003ff80bb11ce>] sctp_inet_connect+0x2e6/0x378 [sctp]
[  264.968212]  [<0000000001197942>] __sys_connect+0x21a/0x450
[  264.968215]  [<000000000119aff8>] sys_socketcall+0x3d0/0xb08
[  264.968218]  [<000000000184ea7a>] system_call+0x2a2/0x2c0

[...]

Just like ip_finish_output2() does for IPv4, check that we have enough
headroom in ip6_xmit(), and reallocate it if we don't.

This issue is older than git history.

Reported-by: Jianlin Shi <[email protected]>
Signed-off-by: Stefano Brivio <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: lack of available data can also cause TSO defer

tcp_tso_should_defer() can return true in three different cases :

1) We are cwnd-limited
2) We are rwnd-limited
3) We are application limited.

Neal pointed out that my recent fix went too far, since
it assumed that if we were not in 1) case, we must be rwnd-limited

Fix this by properly populating the is_cwnd_limited and
is_rwnd_limited booleans.

After this change, we can finally move the silly check for FIN
flag only for the application-limited case.

The same move for EOR bit will be handled in net-next,
since commit 1c09f7d073b1 ("tcp: do not try to defer skbs
with eor mark (MSG_EOR)") is scheduled for linux-4.21

Tested by running 200 concurrent netperf -t TCP_RR -- -r 60000,100
and checking none of them was rwnd_limited in the chrono_stat
output from "ss -ti" command.

Fixes: 41727549de3e ("tcp: Do not underestimate rwnd_limited")
Signed-off-by: Eric Dumazet <[email protected]>
Suggested-by: Neal Cardwell <[email protected]>
Reviewed-by: Neal Cardwell <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Yuchung Cheng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: call sk_dst_reset when set SO_DONTROUTE

after set SO_DONTROUTE to 1, the IP layer should not route packets if
the dest IP address is not in link scope. But if the socket has cached
the dst_entry, such packets would be routed until the sk_dst_cache
expires. So we should clean the sk_dst_cache when a user set
SO_DONTROUTE option. Below are server/client python scripts which
could reprodue this issue:

server side code:

==========================================================================
import socket
import struct
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('0.0.0.0', 9000))
s.listen(1)
sock, addr = s.accept()
sock.setsockopt(socket.SOL_SOCKET, socket.SO_DONTROUTE, struct.pack('i', 1))
while True:
    sock.send(b'foo')
    time.sleep(1)
==========================================================================

client side code:
==========================================================================
import socket
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('server_address', 9000))
while True:
    data = s.recv(1024)
    print(data)
==========================================================================

Signed-off-by: yupeng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

neighbor: Improve garbage collection

The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago gc kicks in
   walks the entire hash table evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

        -----|---------x|----------|-----
            t-5         t         t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries
   when gc_thresh2 is exceeded is over kill and contributes to trashing
   especially during startup.

This patch addresses these problems as follows:

1. Use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'hns3-error-handling'

Salil Mehta says:

====================
net: hns3: Additions/optimizations related to HNS3 H/W err handling

This patch set primarily does following addtions and optimizations
related to error handling in HNS3 Ethernet driver:

1. Name changes for enable and process functions and minor loop
    optimizations. [PATCH 1-6]
2. Modify query and clearing of RAS errors using new set of commands
    because modules specific commands for clearing RCB PPP PF, SSU are
    obselete. [PATCH 7]
3. Deletes logging 1-bit errors for RAS in HNS3 driver as these never
    get reported to the driver. [PATCH 8]
4. Add handling of NIC hw errors reported through MSIx rather than
    PCIe AER channel. [PATCH 9]
5. Add handling for the HW RAS and MSIx errors in the modules MAC, PPP
    PF, MSIx SRAM, RCB and SSU. [PATCH 10-13]
6. Add handling of RoCEE RAS errors. [PATCH 14]
====================

Signed-off-by: David S. Miller <[email protected]>

net: hns3: add handling of RDMA RAS errors

This patch handles the RDMA RAS errors.
1. Enable RAS interrupt, print error detail info and clear error status.
2. Do CORE reset to recovery when these non-fatal errors happened.

Signed-off-by: Xiaofei Tan <[email protected]>
Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: handle hw errors of SSU

This patch enables and handles hw errors of the Storage Switch Unit(SSU).

Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: handle hw errors of PPU(RCB)

This patch enables and handles hw RAS and MSIx errors of PPU(RCB).

Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: handle hw errors of PPP PF

This patch handles PF hw errors of PPP(Programmable Packet Processor).

Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: add handling of hw errors of MAC

This patch adds enable and handling of hw errors of
the MAC block.

Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: add handling of hw errors reported through MSIX

This patch adds handling for HNS3 hardware errors(non-standard)
which are reported through MSIX interrupts and not through
PCIe AER channel.

These MSIX reported hardware errors are handled using common
misc. interrupt handler. Hardware error related registers
cannot be cleared in context to the interrupt received as
they require *heavy* access to hardware using IMP(Integrated
Mangement Processor) commands. Hence, we defer the clearing
of such error events till later time.

Since, we have defered exact identification of errors we
will have to defer the level of receovery/reset which
might be required. Hence, a new reset type UNKNOWN reset
has been introduced which effectively defers the assertion
of the reset till we get hold of kind of errors at later
time.

Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: deleted logging 1 bit errors

This patch deletes logging 1 bit errors for the following reasons.
1. AER does not notify 1 bit errors to the device drivers.
However AER reports 1 bit errors to the userspace through the
trace_aer_event for logging in the rasdaemon.
2. Firmware clears the status of 1 bit errors in the hw registers.

Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: add handling of hw ras errors using new set of commands

1. This patch adds handling of hw ras errors using new set of
common commands.
2. Updated the error message tables to match the register's name and
error status returned by the commands.

Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: add optimization in the hclge_hw_error_set_state

1. This patch adds minor loop optimization in the
hclge_hw_error_set_state function.
2. Adds logging module's name if it fails to configure the
error interrupts.

Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: rename process_hw_error function

This patch renames process_hw_error function to
handle_hw_ras_error function to match the purpose
of the function. This is because hw errors reported through
ras and msix interrupts will be handled separately.

Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: deletes unnecessary settings of the descriptor data

This patch deletes unnecessary setting of the descriptor data
to 0 for disabling error interrupts because
it is already done by the hclge_cmd_setup_basic_desc function.

Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: re-enable error interrupts on hw reset

This patch adds calling hclge_hw_error_set_state function
to re-enable the error interrupts those will be disabled on
the hw reset.

Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: rename enable error interrupt functions

This patch
- renames the enable error interrupt functions.
The reason is that these functions
are used for both enable and disable error interrupts.

- removes redundant logs from the enable error interrupt functions.

Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: remove existing process error functions and reorder hw_blk table

1.The command interface for queryng and clearing hw errors is
  changed, which requires the new process error functions to be added.
  This patch removes all the current process error functions and
  associated definitions. The new functions to handle ras errors
  would be added in this patch set.

2. Fixed order issue of the hw_blk table.

Signed-off-by: Shiju Jose <[email protected]>
Signed-off-by: Salil Mehta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull vhost/virtio fixes from Michael Tsirkin:
"A couple of last-minute fixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost/vsock: fix use-after-free in network stack callers
  virtio/s390: fix race in ccw_io_helper()
  virtio/s390: avoid race on vcdev->config
  vhost/vsock: fix reset orphans race with close timeout

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
"Avoid sending IPIs with interrupts disabled"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: hibernate: Avoid sending cross-calling with interrupts disabled

Merge branch 'support-alu32_arsh'

Jiong Wang says:

====================
BPF_ALU | BPF_ARSH | BPF_* were rejected by commit: 7891a87efc71
("bpf: arsh is not supported in 32 bit alu thus reject it"). As explained
in the commit message, this is due to there is no complete support for them
on interpreter and various JIT compilation back-ends.

This patch set is a follow-up which completes the missing bits. This also
pave the way for running bpf program compiled with ALU32 instruction
enabled by specifing -mattr=+alu32 to LLVM for which case there is likely
to have more BPF_ALU | BPF_ARSH insns that will trigger the rejection code.

test_verifier.c is updated accordingly.

I have tested this patch set on x86-64 and NFP, I need help of review and
test on the arch changes (mips/ppc/s390).

Note, there might be merge confict on mips change which is better to be
applied on top of:

commit: 20b880a05f06 ("mips: bpf: fix encoding bug for mm_srlv32_op"),

which is on mips-fixes branch at the moment.

Thanks.

v1->v2:
- Fix ppc implementation bug. Should zero high bits explicitly.
====================

Signed-off-by: Alexei Starovoitov <[email protected]>

selftests: bpf: update testcases for BPF_ALU | BPF_ARSH

"arsh32 on imm" and "arsh32 on reg" now are accepted. Also added two new
testcases to make sure arsh32 won't be treated as arsh64 during
interpretation or JIT code-gen for which case the high bits will be moved
into low halve that the testcases could catch them.

Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: verifier remove the rejection on BPF_ALU | BPF_ARSH

This patch remove the rejection on BPF_ALU | BPF_ARSH as we have supported
them on interpreter and all JIT back-ends

Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: interpreter support BPF_ALU | BPF_ARSH

This patch implements interpreting BPF_ALU | BPF_ARSH. Do arithmetic right
shift on low 32-bit sub-register, and zero the high 32 bits.

Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

nfp: bpf: implement jitting of BPF_ALU | BPF_ARSH | BPF_*

BPF_X support needs indirect shift mode, please see code comments for
details.

Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

s390: bpf: implement jitting of BPF_ALU | BPF_ARSH | BPF_*

This patch implements code-gen for BPF_ALU | BPF_ARSH | BPF_*.

Cc: Martin Schwidefsky <[email protected]>
Cc: Heiko Carstens <[email protected]>
Signed-off-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

ppc: bpf: implement jitting of BPF_ALU | BPF_ARSH | BPF_*

This patch implements code-gen for BPF_ALU | BPF_ARSH | BPF_*.

Cc: Naveen N. Rao <[email protected]>
Cc: Sandipan Das <[email protected]>
Signed-off-by: Jiong Wang <[email protected]>
Reviewed-by: Sandipan Das <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

mips: bpf: implement jitting of BPF_ALU | BPF_ARSH | BPF_X

Jitting of BPF_K is supported already, but not BPF_X. This patch complete
the support for the latter on both MIPS and microMIPS.

Cc: Paul Burton <[email protected]>
Cc: [email protected]
Acked-by: Paul Burton <[email protected]>
Signed-off-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

mips: bpf: fix encoding bug for mm_srlv32_op

For micro-mips, srlv inside POOL32A encoding space should use 0x50
sub-opcode, NOT 0x90.

Some early version ISA doc describes the encoding as 0x90 for both srlv and
srav, this looks to me was a typo. I checked Binutils libopcode
implementation which is using 0x50 for srlv and 0x90 for srav.

v1->v2:
- Keep mm_srlv32_op sorted by value.

Fixes: f31318fdf324 ("MIPS: uasm: Add srlv uasm instruction")
Cc: Markos Chandras <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: [email protected]
Acked-by: Jakub Kicinski <[email protected]>
Acked-by: Song Liu <[email protected]>
Signed-off-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

Merge tag 'gcc-plugins-v4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc stackleak plugin fixes from Kees Cook:

- Remove tracing for inserted stack depth marking function (Anders
   Roxell)

- Move gcc-plugin pass location to avoid objtool warnings (Alexander
   Popov)

* tag 'gcc-plugins-v4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  stackleak: Register the 'stackleak_cleanup' pass before the '*free_cfg' pass
  stackleak: Mark stackleak_track_stack() as notrace

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

- Disable the new crypto stats interface as it's still being changed

- Fix potential uses-after-free in cbc/cfb/pcbc.

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: user - Disable statistics interface
crypto: do not free algorithm before using

Merge branch 'mlxsw-Un-offload-FDB-on-NVE-detach-attach'

Ido Schimmel says:

====================
mlxsw: Un/offload FDB on NVE detach/attach

Petr says:

When a VXLAN device is attached to a bridge of a driver capable of
offloading such, or upped, the FDB entries already present at the device
need to be offloaded. Similarly when an offloaded VXLAN device ceases
being interesting (it is downed, or detached, or a front-panel port
netdevice is detached from the bridge that the VXLAN device is attached
to), any offloaded FDB entries need to be unoffloaded and unmarked. This
attach / detach processing is implemented in this patchset.

In patch #1, a code pattern is extracted into a named function for
easier reuse.

In patch #2, vxlan_fdb_replay() is added to send
SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE for each FDB entry with a given VNI.
The intention is that the offloading driver will interpret these events
like any other and thus offload the FDB entries that existed prior to
VXLAN attach.

In patches #3 and #4, the functions vxlan_fdb_clear_offload() resp.
br_fdb_clear_offload() are added. These clear the offloaded flag at
matching FDB entries.

In patches #5-#9, we introduce FID-type-specific and NVE-type-specific
ops necessary to properly abstract invocations of the replay/clear
functions.

Finally patch #10 implements the FDB management.

In patch #11, the mlxsw-specific test case is extended to check that the
management of offload marks under the newly-supported situations is
correct. Patch #12, from Ido, exercises the new code paths in actual
functional test.

v2:
- Patch #1:
    - Modify vxlan_fdb_switchdev_notifier_info() to initialize the
      structure through a passed-in pointer argument, instead of returning
      it as a value.
- Patch #2:
    - Adapt to API change in vxlan_fdb_switchdev_notifier_info()
====================

Signed-off-by: David S. Miller <[email protected]>

selftests: forwarding: Add PVID test case for VXLAN with VLAN-aware bridges

When using VLAN-aware bridges with VXLAN, the VLAN that is mapped to the
VNI of the VXLAN device is that which is configured as "pvid untagged"
on the corresponding bridge port.

When these flags are toggled or when the VLAN is deleted entirely,
remote hosts should not be able to receive packets from the VTEP.

Add a test case for above mentioned scenarios.

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: mlxsw: vxlan: Test FDB un/marking on VXLAN join/leave

When a VXLAN device is attached to an offloaded bridge, or when a
front-panel port is attached to a bridge that already has a VXLAN
device, mlxsw should offload the existing offloadable FDB entries.
Similarly when VXLAN device is downed, the FDB entries are unoffloaded,
and the marks thus need to be cleared. Similarly when a front-panel port
device is attached to a bridge with a VXLAN device, or when VLAN flags
are tweaked on a VXLAN port attached to a VLAN-aware bridge.

Test that the replaying / clearing logic works by observing transitions
in presence of offload marks under different scenarios.

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_nve: Un/offload FDB on nve_fid_disable/enable

Any existing NVE FDB entries need to be offloaded when NVE is enabled
for a given FID. Recent patches have added fdb_replay op for this, so
just invoke it from mlxsw_sp_nve_fid_enable().

When NVE is disabled on a FID, any existing FDB offloaded marks need to
be cleared on NVE device as well as on its bridge master. An op to
handle this, fdb_clear_offload, has been added to FID ops and NVE ops in
previous patches. Add code to resolve the NVE device, NVE type, and
dispatch to both fdb_clear_offload ops.

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum: Add mlxsw_sp_fid_ops.fdb_clear_offload

If there are any offloaded FDB entries at bridge master of an NVE device
at the time that it's un-offloaded, their offloaded marks need to be
cleared. How that is done depends on whether the bridge in question is
vlan aware. Therefore add a per-FID-type operation.

Implement the operation for the 802.1q and 802.1d bridges.

Add and publish a function mlxsw_sp_fid_fdb_clear_offload() to dispatch
to the new operation according to FID type.

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_nve: Add mlxsw_sp_nve_ops.fdb_clear_offload

If there are any offloaded FDB entries at an NVE device at the time that
it's un-offloaded, their offloaded marks need to be cleared. How that is
done depends on NVE device type, and therefore add a per-NVE-type
operation.

Implement the operation for the sole NVE device type currently supported
by mlxsw, VXLAN.

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_nve: Add mlxsw_sp_nve_ops.fdb_replay

A replay of FDB needs to be performed so that the FDB entries existing
at the NVE device are offloaded. How the replay is done depends on NVE
device type, and therefore add a per-NVE-type operation.

Implement the operation for the sole NVE device type currently supported
by mlxsw, VXLAN.

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_switchdev: Publish mlxsw_sp_switchdev_notifier

The notifier block will need to be passed to vxlan_fdb_replay() in a
follow-up patch.

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum: Track NVE type at FIDs

A follow-up patch will add support for replay and for clearing of
offload marks. These are NVE type-sensitive operations, and to be able
to dispatch them properly, a FID needs to know what NVE type is attached
to it.

Therefore, track the NVE type at struct mlxsw_sp_fid. Extend
mlxsw_sp_fid_vni_set() to take it as an argument, and add
mlxsw_sp_fid_nve_type().

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bridge: Add br_fdb_clear_offload()

When a driver unoffloads all FDB entries en bloc, it's inefficient to
send the switchdev notification one by one. Add a helper that unsets the
offload flag on FDB entries on a given bridge port and VLAN.

Signed-off-by: Petr Machata <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vxlan: Add vxlan_fdb_clear_offload()

When a driver unoffloads all FDB entries en bloc, it's inefficient to
send the switchdev notification one by one. Add a helper that walks the
FDB table, unsetting the offload flag on RDST with a given VNI.

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vxlan: Add vxlan_fdb_replay()

When a VXLAN device becomes relevant to a driver (such as when it is
attached to an offloaded bridge), the driver will generally need to walk
the existing FDB entries and offload them.

Add a function vxlan_fdb_replay() to call a given notifier block for
each FDB entry with a given VNI.

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vxlan: Add a function to init switchdev_notifier_vxlan_fdb_info

There are currently two places that need to initialize the notifier info
structure, and one more is coming next when vxlan_fdb_replay() is
introduced. These three instances have / will have very similar code
that is easy to abstract away into a named function.

Add such function, vxlan_fdb_switchdev_notifier_info(), and call it from
vxlan_fdb_switchdev_call_notifiers() and vxlan_fdb_find_uc().

Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'pci-v4.20-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
"Revert ASPM change that caused a regression"

* tag 'pci-v4.20-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI/ASPM: Do not initialize link state when aspm_disabled is set"

Merge branch 'net-aquantia-add-RSS-configuration'

Igor Russkikh says:

====================
net: aquantia: add RSS configuration

In this patchset few bugs related to RSS are fixed and RSS table and
hash key configuration is added.

We also do increase max number of HW rings upto 8.

v2: removed extra arg check
====================

Signed-off-by: David S. Miller <[email protected]>

net: aquantia: add support of RSS configuration

Add support of configuration of RSS hash key and RSS indirection table.

Signed-off-by: Dmitry Bogdanov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: aquantia: fix initialization of RSS table

Now RSS indirection table is initialized before setting up the number of
hw queues, consequently the table may be filled by non existing queues.
This patch moves the initialization when the number of hw queues is
known.

Signed-off-by: Dmitry Bogdanov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: aquantia: increase max number of hw queues

Increase the upper limit of the hw queues up to 8.
This makes RSS better on multiheaded cpus.

This is a maximum AQC hardware supports in one traffic class.

The actual value is still limited by a number of available cpu cores.

Signed-off-by: Dmitry Bogdanov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: aquantia: fix RSS table and key sizes

Set RSS indirection table and RSS hash key sizes to their real size.

Signed-off-by: Dmitry Bogdanov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output

In 'seg6_output', stack variable 'struct flowi6 fl6' was missing
initialization.

Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: Shmulik Ladkani <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'for-linus-20181207' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Let's try this again...

  We're finally happy with the DM livelock issue, and it's also passed
  overnight testing and the corruption regression test. The end result
  is much nicer now too, which is great.

  Outside of that fix, there's a pull request for NVMe with two small
  fixes, and a regression fix for BFQ from this merge window. The BFQ
  fix looks bigger than it is, it's 90% comment updates"

* tag 'for-linus-20181207' of git://git.kernel.dk/linux-block:
  blk-mq: punt failed direct issue to dispatch list
  nvmet-rdma: fix response use after free
  nvme: validate controller state before rescheduling keep alive
  block, bfq: fix decrement of num_active_groups

Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"A set of driver bugfixes for the I2C subsystem"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: uniphier-f: fix violation of tLOW requirement for Fast-mode
  i2c: uniphier: fix violation of tLOW requirement for Fast-mode
  i2c: uniphier-f: fill TX-FIFO only in IRQ handler for repeated START
  i2c: uniphier-f: fix timeout error after reading 8 bytes
  i2c: scmi: Fix probe error on devices with an empty SMB0001 ACPI device node
  i2c: axxia: properly handle master timeout
  i2c: rcar: check bus state before reinitializing
  i2c: nvidia-gpu: limit reads also for combined messages
  i2c: nvidia-gpu: adhere to I2C fault codes

Merge tag 'dmaengine-fix-4.20-rc6' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
"Another pull request for dmaengine. We got bunch of fixes early this
  week and all are tagged to stable. Hope this is last fix for this
  cycle:

   - Fix imx-sdma handling of channel terminations, this involves
     reverting two commits and implement async termination

   - Fix cppi dma channel deletion from pending list on stop

   - Fix FIFO size for dw controller in Intel Merrifield"

* tag 'dmaengine-fix-4.20-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: dw: Fix FIFO size for Intel Merrifield
  dmaengine: cppi41: delete channel from pending list when stop channel
  dmaengine: imx-sdma: use GFP_NOWAIT for dma descriptor allocations
  dmaengine: imx-sdma: implement channel termination via worker
  Revert "dmaengine: imx-sdma: alloclate bd memory from dma pool"
  Revert "dmaengine: imx-sdma: Use GFP_NOWAIT for dma allocations"

x86/vdso: Drop implicit common-page-size linker flag

GNU linker's -z common-page-size's default value is based on the target
architecture. arch/x86/entry/vdso/Makefile sets it to the architecture
default, which is implicit and redundant. Drop it.

Fixes: 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
Reported-by: Dmitry Golovin <[email protected]>
Reported-by: Bill Wendling <[email protected]>
Suggested-by: Dmitry Golovin <[email protected]>
Suggested-by: Rui Ueyama <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Fangrui Song <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: x86-ml <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://bugs.llvm.org/show_bug.cgi?id=38774
Link: https://github.com/ClangBuiltLinux/linux/issues/31

arm64: hibernate: Avoid sending cross-calling with interrupts disabled

Since commit 3b8c9f1cdfc50 ("arm64: IPI each CPU after invalidating the
I-cache for kernel mappings"), a call to flush_icache_range() will use
an IPI to cross-call other online CPUs so that any stale instructions
are flushed from their pipelines. This triggers a WARN during the
hibernation resume path, where flush_icache_range() is called with
interrupts disabled and is therefore prone to deadlock:

  | Disabling non-boot CPUs ...
  | CPU1: shutdown
  | psci: CPU1 killed.
  | CPU2: shutdown
  | psci: CPU2 killed.
  | CPU3: shutdown
  | psci: CPU3 killed.
  | WARNING: CPU: 0 PID: 1 at ../kernel/smp.c:416 smp_call_function_many+0xd4/0x350
  | Modules linked in:
  | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc4 #1

Since all secondary CPUs have been taken offline prior to invalidating
the I-cache, there's actually no need for an IPI and we can simply call
__flush_icache_range() instead.

Cc: <[email protected]>
Fixes: 3b8c9f1cdfc50 ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
Reported-by: Kunihiko Hayashi <[email protected]>
Tested-by: Kunihiko Hayashi <[email protected]>
Tested-by: James Morse <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

Merge branch 'nvme-4.20' of git://git.infradead.org/nvme into for-linus

Pull NVMe fixes from Christoph.

* 'nvme-4.20' of git://git.infradead.org/nvme:
nvmet-rdma: fix response use after free
nvme: validate controller state before rescheduling keep alive

blk-mq: punt failed direct issue to dispatch list

After the direct dispatch corruption fix, we permanently disallow direct
dispatch of non read/write requests. This works fine off the normal IO
path, as they will be retried like any other failed direct dispatch
request. But for the blk_insert_cloned_request() that only DM uses to
bypass the bottom level scheduler, we always first attempt direct
dispatch. For some types of requests, that's now a permanent failure,
and no amount of retrying will make that succeed. This results in a
livelock.

Instead of making special cases for what we can direct issue, and now
having to deal with DM solving the livelock while still retaining a BUSY
condition feedback loop, always just add a request that has been through
->queue_rq() to the hardware queue dispatch list. These are safe to use
as no merging can take place there. Additionally, if requests do have
prepped data from drivers, we aren't dependent on them not sharing space
in the request structure to safely add them to the IO scheduler lists.

This basically reverts ffe81d45322c and is based on a patch from Ming,
but with the list insert case covered as well.

Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
Cc: [email protected]
Suggested-by: Ming Lei <[email protected]>
Reported-by: Bart Van Assche <[email protected]>
Tested-by: Ming Lei <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

nvmet-rdma: fix response use after free

nvmet_rdma_release_rsp() may free the response before using it at error
flow.

Fixes: 8407879 ("nvmet-rdma: fix possible bogus dereference under heavy load")
Signed-off-by: Israel Rukshin <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Max Gurtovoy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme: validate controller state before rescheduling keep alive

Delete operations are seeing NULL pointer references in call_timer_fn.
Tracking these back, the timer appears to be the keep alive timer.

nvme_keep_alive_work() which is tied to the timer that is cancelled
by nvme_stop_keep_alive(), simply starts the keep alive io but doesn't
wait for it's completion. So nvme_stop_keep_alive() only stops a timer
when it's pending. When a keep alive is in flight, there is no timer
running and the nvme_stop_keep_alive() will have no affect on the keep
alive io. Thus, if the io completes successfully, the keep alive timer
will be rescheduled. In the failure case, delete is called, the
controller state is changed, the nvme_stop_keep_alive() is called while
the io is outstanding, and the delete path continues on. The keep
alive happens to successfully complete before the delete paths mark it
as aborted as part of the queue termination, so the timer is restarted.
The delete paths then tear down the controller, and later on the timer
code fires and the timer entry is now corrupt.

Fix by validating the controller state before rescheduling the keep
alive. Testing with the fix has confirmed the condition above was hit.

Signed-off-by: James Smart <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

block, bfq: fix decrement of num_active_groups

Since commit '2d29c9f89fcd ("block, bfq: improve asymmetric scenarios
detection")', if there are process groups with I/O requests waiting for
completion, then BFQ tags the scenario as 'asymmetric'. This detection
is needed for preserving service guarantees (for details, see comments
on the computation * of the variable asymmetric_scenario in the
function bfq_better_to_idle).

Unfortunately, commit '2d29c9f89fcd ("block, bfq: improve asymmetric
scenarios detection")' contains an error exactly in the updating of
the number of groups with I/O requests waiting for completion: if a
group has more than one descendant process, then the above number of
groups, which is renamed from num_active_groups to a more appropriate
num_groups_with_pending_reqs by this commit, may happen to be wrongly
decremented multiple times, namely every time one of the descendant
processes gets all its pending I/O requests completed.

A correct, complete solution should work as follows. Consider a group
that is inactive, i.e., that has no descendant process with pending
I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs
is still accounting for this group, because the group still has some
descendant process with some I/O request still in
flight. num_groups_with_pending_reqs should be decremented when the
in-flight request of the last descendant process is finally completed
(assuming that nothing else has changed for the group in the meantime,
in terms of composition of the group and active/inactive state of
child groups and processes). To accomplish this, an additional
pending-request counter must be added to entities, and must be
updated correctly.

To avoid this additional field and operations, this commit resorts to
the following tradeoff between simplicity and accuracy: for an
inactive group that is still counted in num_groups_with_pending_reqs,
this commit decrements num_groups_with_pending_reqs when the first
descendant process of the group remains with no request waiting for
completion.

This simplified scheme provides a fix to the unbalanced decrements
introduced by 2d29c9f89fcd. Since this error was also caused by lack
of comments on this non-trivial issue, this commit also adds related
comments.

Fixes: 2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection")
Reported-by: Steven Barrett <[email protected]>
Tested-by: Steven Barrett <[email protected]>
Tested-by: Lucjan Lucjanov <[email protected]>
Reviewed-by: Federico Motta <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'gnss-4.20-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/gnss into char-misc-linus

Johan writes:

GNSS fixes for 4.20-rc6

Here's a fix for a broken activation retry loop in the sirf driver.

Included are also two MAINTAINERS updates.

All have been in linux-next with no reported issues.

Signed-off-by: Johan Hovold <[email protected]>
* tag 'gnss-4.20-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/gnss:
  MAINTAINERS: exclude gnss from SIRFPRIMA2 regex matching
  MAINTAINERS: add gnss scm tree
  gnss: sirf: fix activation retry handling

CIFS: Avoid returning EBUSY to upper layer VFS

EBUSY is not handled by VFS, and will be passed to user-mode. This is not
correct as we need to wait for more credits.

This patch also fixes a bug where rsize or wsize is used uninitialized when
the call to server->ops->wait_mtu_credits() fails.

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Long Li <[email protected]>
Signed-off-by: Steve French <[email protected]>
Reviewed-by: Pavel Shilovsky <[email protected]>

net/mlx5: Expose packet based credit mode

Packet based credit mode bit determines whether the credit mode
is done per message or packet. Expose the QP creation flag and
the HCA capability.

Signed-off-by: Danit Goldberg <[email protected]>
Reviewed-by: Majd Dibbiny <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>

crypto: user - Disable statistics interface

Since this user-space API is still undergoing significant changes,
this patch disables it for the current merge window.

Signed-off-by: Herbert Xu <[email protected]>

Merge tag 'drm-fixes-2018-12-07' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"There's a bit more in here than I'd like, and I'm hoping things calm
  down when I'm out.

  msm:
   - a bunch of display fixes for the new DPU
   - a couple of command submission fixes

  omap:
   - some DSI fixes

  ast:
   - driver unload crash fix

  core:
   - fix the lease uevent so userspace can distinguish it

  amd:
   - fix a bpc regression
   - fix lru handling regression
   - fixed firmware support for new GPUs
   - power management fixes for vega20"

* tag 'drm-fixes-2018-12-07' of git://anongit.freedesktop.org/drm/drm: (37 commits)
  drm/ast: Fix connector leak during driver unload
  drm/amdgpu/vcn: Update vcn.cur_state during suspend
  drm/amd/display: Fix overflow/truncation from strncpy.
  drm/amd/powerplay: improve OD code robustness
  drm/amdgpu: enlarge maximum waiting time of KIQ
  drm/fb-helper: Fix typo in parameter description
  drm/amd/powerplay: support SoftMin/Max setting for some specific DPM
  drm/amd/powerplay: issue pre-display settings for display change event
  drm/amd/powerplay: support new pptable upload on Vega20
  drm/amdgpu/gmc8: always load MC firmware in the driver
  drm/amdgpu/gmc8: update MC firmware for polaris
  drm/amdgpu: update mc firmware image for polaris12 variants
  drm/msm: Fix error return checking
  drm/msm/dpu: Ignore alpha for XBGR8888 format
  drm/msm: dpu: Fix "WARNING: invalid free of devm_ allocated data"
  drm/msm/hdmi: Drop pointless static qualifier in msm_hdmi_bind()
  drm/msm: Move fence put to where failure occurs
  drm/msm: dpu: Don't set legacy plane->crtc pointer
  drm/msm/gpu: Don't map command buffers with nr_relocs equal to 0
  drm/msm/hdmi: Enable HPD after HDMI IRQ is set up
  ...

Merge tag 'nfs-for-4.20-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
"This is mainly fallout from the updates to the SUNRPC code that is
  being triggered from less common combinations of NFS mount options.

  Highlights include:

  Stable fixes:
   - Fix a page leak when using RPCSEC_GSS/krb5p to encrypt data.

  Bugfixes:
   - Fix a regression that causes the RPC receive code to hang
   - Fix call_connect_status() so that it handles tasks that got
     transmitted while queued waiting for the socket lock.
   - Fix a memory leak in call_encode()
   - Fix several other connect races.
   - Fix receive code error handling.
   - Use the discard iterator rather than MSG_TRUNC for compatibility
     with AF_UNIX/AF_LOCAL sockets.
   - nfs: don't dirty kernel pages read by direct-io
   - pnfs/Flexfiles fix to enforce per-mirror stateid only for NFSv4
     data servers"

* tag 'nfs-for-4.20-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Don't force a redundant disconnection in xs_read_stream()
  SUNRPC: Fix up socket polling
  SUNRPC: Use the discard iterator rather than MSG_TRUNC
  SUNRPC: Treat EFAULT as a truncated message in xs_read_stream_request()
  SUNRPC: Fix up handling of the XDRBUF_SPARSE_PAGES flag
  SUNRPC: Fix RPC receive hangs
  SUNRPC: Fix a potential race in xprt_connect()
  SUNRPC: Fix a memory leak in call_encode()
  SUNRPC: Fix leak of krb5p encode pages
  SUNRPC: call_connect_status() must handle tasks that got transmitted
  nfs: don't dirty kernel pages read by direct-io
  flexfiles: enforce per-mirror stateid only for v4 DSes

Merge branch 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM spectre fix from Russell King:
"Exynos folk noticed that CPU hotplug wasn't working with their kernel
configuration, and have tested this as fixing the problem"

* 'spectre' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: ensure that processor vtables is not lost after boot

Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
"Some small fixes that have been accumulated:

   - Chris Cole noticed that in a SMP environment, the DMA cache
     coherence handling can produce undesirable results in a corner
     case

   - Propagate that fix for ARMv7M as well

   - Fix a false positive with source fortification

   - Fix an uninitialised return that Nathan Jones spotted"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8816/1: dma-mapping: fix potential uninitialized return
  ARM: 8815/1: V7M: align v7m_dma_inv_range() with v7 counterpart
  ARM: 8814/1: mm: improve/fix ARM v7_dma_inv_range() unaligned address handling
  ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE

i2c: uniphier-f: fix violation of tLOW requirement for Fast-mode

Currently, the clock duty is set as tLOW/tHIGH = 1/1. For Fast-mode,
tLOW is set to 1.25 us while the I2C spec requires tLOW >= 1.3 us.

tLOW/tHIGH = 5/4 would meet both Standard-mode and Fast-mode:
Standard-mode: tLOW = 5.56 us, tHIGH = 4.44 us
Fast-mode: tLOW = 1.39 us, tHIGH = 1.11 us

Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>

i2c: uniphier: fix violation of tLOW requirement for Fast-mode

Currently, the clock duty is set as tLOW/tHIGH = 1/1. For Fast-mode,
tLOW is set to 1.25 us while the I2C spec requires tLOW >= 1.3 us.

tLOW/tHIGH = 5/4 would meet both Standard-mode and Fast-mode:
Standard-mode: tLOW = 5.56 us, tHIGH = 4.44 us
Fast-mode: tLOW = 1.39 us, tHIGH = 1.11 us

Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>