Git Repo - linux.git/log

Merge tag 'pci-v4.11-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

- fix iProc memory corruption

- fix ThunderX usage of unregistered PNP/ACPI ID

- fix ThunderX resource reservation on early firmware

* tag 'pci-v4.11-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: thunder-pem: Add legacy firmware support for Cavium ThunderX host controller
  PCI: thunder-pem: Use Cavium assigned hardware ID for ThunderX host controller
  PCI: iproc: Save host bridge window resource in struct iproc_pcie

xtensa: make __pa work with uncached KSEG addresses

When __pa is applied to virtual address in uncached KSEG region the
result is incorrect. Fix it by checking if the original address is in
the uncached KSEG and adjusting the result. It looks better than masking
off bits because pfn_valid would correctly work with new __pa results
and it may be made working in noMMU case, once we get definition for
uncached memory view.

This is required for the dma_common_mmap and DMA debug code to work
correctly: they both indirectly use __pa with coherent DMA addresses.
In case of DMA debug the visible effect is false reports that an address
mapped for DMA is accessed by CPU.

Cc: [email protected]
Tested-by: Boris Brezillon <[email protected]>
Reviewed-by: Boris Brezillon <[email protected]>
Signed-off-by: Max Filippov <[email protected]>

ibmvnic: Remove debugfs support

The debugfs support in the ibmvnic driver is not, and never has been,
supported. Just remove it.

The work done in the debugfs code for the driver was part of the original
spec for the ibmvnic driver. The corresponding support for this from the
server side was never supported and has been dropped.

Signed-off-by: Nathan Fontenot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bonding: refine bond_fold_stats() wrap detection

Some device drivers reset their stats at down/up events, possibly
fooling bonding stats, since they operate with relative deltas.

It is nearly not possible to fix drivers, since some of them compute the
tx/rx counters based on per rx/tx queue stats, and the queues can be
reconfigured (ethtool -L) between the down/up sequence.

Lets avoid accumulating 'negative' values that render bonding stats
useless.

It is better to lose small deltas, assuming the bonding stats are
fetched at a reasonable frequency.

Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mmc: sdhci-of-at91: fix MMC_DDR_52 timing selection

The controller has different timings for MMC_TIMING_UHS_DDR50 and
MMC_TIMING_MMC_DDR52. Configuring the controller with SDHCI_CTRL_UHS_DDR50,
when MMC_TIMING_MMC_DDR52 timings are requested, is not correct and can
lead to unexpected behavior.

Signed-off-by: Ludovic Desroches <[email protected]>
Fixes: bb5f8ea4d514 ("mmc: sdhci-of-at91: introduce driver for the Atmel SDMMC")
Cc: <[email protected]> # 4.4+
Signed-off-by: Ulf Hansson <[email protected]>

mmc: sdhci: Disable runtime pm when the sdio_irq is enabled

SDIO cards may need clock to send the card interrupt to the host.

On a cherrytrail tablet with a RTL8723BS wifi chip, without this patch
pinging the tablet results in:

PING 192.168.1.14 (192.168.1.14) 56(84) bytes of data.
64 bytes from 192.168.1.14: icmp_seq=1 ttl=64 time=78.6 ms
64 bytes from 192.168.1.14: icmp_seq=2 ttl=64 time=1760 ms
64 bytes from 192.168.1.14: icmp_seq=3 ttl=64 time=753 ms
64 bytes from 192.168.1.14: icmp_seq=4 ttl=64 time=3.88 ms
64 bytes from 192.168.1.14: icmp_seq=5 ttl=64 time=795 ms
64 bytes from 192.168.1.14: icmp_seq=6 ttl=64 time=1841 ms
64 bytes from 192.168.1.14: icmp_seq=7 ttl=64 time=810 ms
64 bytes from 192.168.1.14: icmp_seq=8 ttl=64 time=1860 ms
64 bytes from 192.168.1.14: icmp_seq=9 ttl=64 time=812 ms
64 bytes from 192.168.1.14: icmp_seq=10 ttl=64 time=48.6 ms

Where as with this patch I get:

PING 192.168.1.14 (192.168.1.14) 56(84) bytes of data.
64 bytes from 192.168.1.14: icmp_seq=1 ttl=64 time=3.96 ms
64 bytes from 192.168.1.14: icmp_seq=2 ttl=64 time=1.97 ms
64 bytes from 192.168.1.14: icmp_seq=3 ttl=64 time=17.2 ms
64 bytes from 192.168.1.14: icmp_seq=4 ttl=64 time=2.46 ms
64 bytes from 192.168.1.14: icmp_seq=5 ttl=64 time=2.83 ms
64 bytes from 192.168.1.14: icmp_seq=6 ttl=64 time=1.40 ms
64 bytes from 192.168.1.14: icmp_seq=7 ttl=64 time=2.10 ms
64 bytes from 192.168.1.14: icmp_seq=8 ttl=64 time=1.40 ms
64 bytes from 192.168.1.14: icmp_seq=9 ttl=64 time=2.04 ms
64 bytes from 192.168.1.14: icmp_seq=10 ttl=64 time=1.40 ms

Cc: Dong Aisheng <[email protected]>
Cc: Ian W MORRISON <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Acked-by: Dong Aisheng <[email protected]>
Cc: [email protected]
Signed-off-by: Ulf Hansson <[email protected]>

arm64: drop non-existing vdso-offsets.h from .gitignore

Since commit a66649dab350 ("arm64: fix vdso-offsets.h dependency"),
include/generated/vdso-offsets.h is directly generated without
arch/arm64/kernel/vdso/vdso-offsets.h.

Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

arm64: remove redundant header file in current.h

Commint 9d84fb27fa1 ("arm64: restore get_current() optimisation") has
removed read_sysreg() and asm/sysreg.h is redundant.

This patch removes asm/sysreg.h header file.

Acked-by: Mark Rutland <[email protected]>
Signed-off-by: Shaokun Zhang <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

sctp: alloc stream info when initializing asoc

When sending a msg without asoc established, sctp will send INIT packet
first and then enqueue chunks.

Before receiving INIT_ACK, stream info is not yet alloced. But enqueuing
chunks needs to access stream info, like out stream state and out stream
cnt.

This patch is to fix it by allocing out stream info when initializing an
asoc, allocing in stream and re-allocing out stream when processing init.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

VSOCK: remove unnecessary ternary operator on return value

Rather than assign the positive errno values to ret and then
checking if it is positive and flip the sign, just return the
errno value.

Detected by CoverityScan, CID#986649 ("Logically Dead Code")

Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Reviewed-by: Jorgen Hansen <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drivers: add explicit interrupt.h includes

These files all use functions declared in interrupt.h, but currently rely
on implicit inclusion of this file (via netns/xfrm.h).

That won't work anymore when the flow cache is removed so include that
header where needed.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/packet: fix overflow in check for tp_reserve

When calculating po->tp_hdrlen + po->tp_reserve the result can overflow.

Fix by checking that tp_reserve <= INT_MAX on assign.

Signed-off-by: Andrey Konovalov <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/packet: fix overflow in check for tp_frame_nr

When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow.

Add a check that tp_block_size * tp_block_nr <= UINT_MAX.

Since frames_per_block <= tp_block_size, the expression would
never overflow.

Signed-off-by: Andrey Konovalov <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/packet: fix overflow in check for priv area size

Subtracting tp_sizeof_priv from tp_block_size and casting to int
to check whether one is less then the other doesn't always work
(both of them are unsigned ints).

Compare them as is instead.

Also cast tp_sizeof_priv to u64 before using BLK_PLUS_PRIV, as
it can overflow inside BLK_PLUS_PRIV otherwise.

Signed-off-by: Andrey Konovalov <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'asoc-fix-v4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v4.11

A relatively large pile of fixes for mainline, the first since the merge
window. The biggest block of changes here by volume is the sun8i-codec
set, the driver was newly added in the merge window but it was realized
that renaming some of the user visible controls was required so these
are being pushed for v4.11 to avoid the original code appearing in a
release. Otherwise it's all fairly standard bugfix stuff.

net: stmmac: dwmac-rk: Add handling for RGMII_ID/RXID/TXID

ATM dwmac-rk will always set and enable it's internal delay lines.
Using PHY internal delays in combination with the phy-mode
rgmii-id/rxid/txid was not possible. Only rgmii was supported.

Now we can disable rockchip's gmac delay lines and also use
rgmii-id/rxid/txid.

Tested only with a RK3288 based board.

Signed-off-by: Wadim Egorov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "net: stmmac: enable multiple buffers"

The commit aff3d9eff843 ("net: stmmac: enable multiple buffers") breaks
numerous boards. while some patch exists for fixing some of it,
dwmac-sunxi is still broken with it.
Since this patch is very huge, it will be better to split it in smaller
part.

Signed-off-by: Corentin Labbe <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

PNFS fix fallback to MDS if got error on commit to DS

Upong receiving some errors (EACCES) on commit to the DS the code
doesn't fallback to MDS and intead retrieds to the same DS again.

Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>

brcmfmac: use local iftype avoiding use-after-free of virtual interface

A use-after-free was found using KASAN. In brcmf_p2p_del_if() the virtual
interface is removed using call to brcmf_remove_interface(). After that
the virtual interface instance has been freed and should not be referenced.
Solve this by storing the nl80211 iftype in local variable, which is used
in a couple of places anyway.

Cc: [email protected] # 4.10.x, 4.9.x
Reported-by: Daniel J Blueman <[email protected]>
Reviewed-by: Hante Meuleman <[email protected]>
Reviewed-by: Pieter-Paul Giesberts <[email protected]>
Reviewed-by: Franky Lin <[email protected]>
Signed-off-by: Arend van Spriel <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

rtlwifi: Fix scheduling while atomic splat

Following commit cceb0a597320 ("rtlwifi: Add work queue for c2h cmd."),
the following BUG is reported when rtl8723be is used:

BUG: sleeping function called from invalid context at mm/slab.h:432
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/0
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W O 4.11.0-rc3-wl+ #276
Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.50 09/29/2014
Call Trace:
<IRQ>
dump_stack+0x63/0x89
___might_sleep+0xe9/0x130
__might_sleep+0x4a/0x90
kmem_cache_alloc_trace+0x19f/0x200
? rtl_c2hcmd_enqueue+0x3e/0x110 [rtlwifi]
rtl_c2hcmd_enqueue+0x3e/0x110 [rtlwifi]
rtl8723be_c2h_packet_handler+0xac/0xc0 [rtl8723be]
rtl8723be_rx_command_packet+0x37/0x5c [rtl8723be]
_rtl_pci_rx_interrupt+0x200/0x6b0 [rtl_pci]
_rtl_pci_interrupt+0x20c/0x5d0 [rtl_pci]
__handle_irq_event_percpu+0x3f/0x1d0
handle_irq_event_percpu+0x23/0x60
handle_irq_event+0x3c/0x60
handle_fasteoi_irq+0xa2/0x170
handle_irq+0x20/0x30
do_IRQ+0x48/0xd0
common_interrupt+0x89/0x89
...

Although commit cceb0a597320 converted most c2h commands to use a work
queue, the Bluetooth coexistence routines can be in atomic mode when
they execute such a call.

Fixes: cceb0a597320 ("rtlwifi: Add work queue for c2h cmd.")
Signed-off-by: Larry Finger <[email protected]>
Cc: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

Merge tag 'iwlwifi-for-kalle-2017-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes

iwlwifi fixes for 4.11

Here are three patches intended for 4.11.  The first one is an RCU fix
by Sari.  The second one is a fix for a potential out-of-bounds access
crash by Dan.  And finally, the third and bigger one, is a fix for
IBSS, which has been broken since DQA was enabled in the driver.

arm64: fix NULL dereference in have_cpu_die()

Commit 5c492c3f5255 ("arm64: smp: Add function to determine if cpus are
stuck in the kernel") added a helper function to determine if die() is
supported in cpu_ops. This function assumes a cpu will have a valid
cpu_ops entry, but that may not be the case for cpu0 is spin-table or
parking protocol is used to boot secondary cpus. In that case, there
is a NULL dereference if have_cpu_die() is called by cpu0. So add a
check for a valid cpu_ops before dereferencing it.

Fixes: 5c492c3f5255 ("arm64: smp: Add function to determine if cpus are stuck in the kernel")
Signed-off-by: Mark Salter <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

drm/vc4: Allocate the right amount of space for boot-time CRTC state.

Without this, the first modeset would dereference past the allocation
when trying to free the mm node.

Signed-off-by: Eric Anholt <[email protected]>
Tested-by: Stefan Wahren <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
Fixes: d8dbf44f13b9 ("drm/vc4: Make the CRTCs cooperate on allocating display lists.")
Cc: <[email protected]> # v4.6+
Reviewed-by: Daniel Vetter <[email protected]>

Merge branch 'i2c-mux/for-current' of https://github.com/peda-r/i2c-mux into i2c/for-current

Pull in changes for 4.11 from the i2c-mux subsubsystem:

"Here are some changes for the pca954x driver. It's a revert of the
un-endorsed ACPI support and a fixup in the handling of PCA9546."

x86/build: Mostly disable '-maccumulate-outgoing-args'

The GCC '-maccumulate-outgoing-args' flag is enabled for most configs,
mostly because of issues which are no longer relevant.  For most
configs, and with most recent versions of GCC, it's no longer needed.

Clarify which cases need it, and only enable it for those cases.  Also
produce a compile-time error for the ftrace graph + mcount + '-Os' case,
which will otherwise cause runtime failures.

The main benefit of '-maccumulate-outgoing-args' is that it prevents an
ugly prologue for functions which have aligned stacks.  But removing the
option also has some benefits: more readable argument saves, smaller
text size, and (presumably) slightly improved performance.

Here are the object size savings for 32-bit and 64-bit defconfig
kernels:

      text    data     bss      dec     hex filename
  10006710 3543328 1773568 15323606 e9d1d6 vmlinux.x86-32.before
   9706358 3547424 1773568 15027350 e54c96 vmlinux.x86-32.after

      text    data     bss      dec     hex filename
  10652105 4537576 843776 16033457 f4a6b1 vmlinux.x86-64.before
  10639629 4537576 843776 16020981 f475f5 vmlinux.x86-64.after

That comes out to a 3% text size improvement on x86-32 and a 0.1% text
size improvement on x86-64.

Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Andrew Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/20170316193133.zrj6gug53766m6nn@treble
Signed-off-by: Ingo Molnar <[email protected]>

drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl()

In vmw_surface_define_ioctl(), the 'num_sizes' is the sum of the
'req->mip_levels' array. This array can be assigned any value from
the user space. As both the 'num_sizes' and the array is uint32_t,
it is easy to make 'num_sizes' overflow. The later 'mip_levels' is
used as the loop count. This can lead an oob write. Add the check of
'req->mip_levels' to avoid this.

Cc: <[email protected]>
Signed-off-by: Li Qiang <[email protected]>
Reviewed-by: Thomas Hellstrom <[email protected]>

drm/vmwgfx: Remove getparam error message

The mesa winsys sometimes uses unimplemented parameter requests to
check for features. Remove the error message to avoid bloating the
kernel log.

Cc: <[email protected]>
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>

drm/ttm: Avoid calling drm_ht_remove from atomic context

On recent kernels, calling drm_ht_remove triggers a might_sleep() warning
from within vfree(). So avoid calling it from atomic context. The use-cases
we fix here are both from destructors so there should be no concurrent
use of the hash tables.

Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>

drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces

Previously, when a surface was opened using a legacy (non prime) handle,
it was verified to have been created by a client in the same master realm.
Relax this so that opening is also allowed recursively if the client
already has the surface open.

This works around a regression in svga mesa where opening of a shared
surface is used recursively to obtain surface information.

Cc: <[email protected]>
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>

drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl()

In vmw_get_cap_3d_ioctl(), a user can supply 0 for a size that is
used in vzalloc(). This eventually calls dump_stack() (in warn_alloc()),
which can leak useful addresses to dmesg.

Add check to avoid a size of 0.

Cc: <[email protected]>
Signed-off-by: Murray McAllister <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>

drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl()

Before memory allocations vmw_surface_define_ioctl() checks the
upper-bounds of a user-supplied size, but does not check if the
supplied size is 0.

Add check to avoid NULL pointer dereferences.

Cc: <[email protected]>
Signed-off-by: Murray McAllister <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>

drm/vmwgfx: Type-check lookups of fence objects

A malicious caller could otherwise hand over handles to other objects
causing all sorts of interesting problems.

Testing done: Ran a Fedora 25 desktop using both Xorg and
gnome-shell/Wayland.

Cc: <[email protected]>
Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>

HID: wacom: call _query_tablet_data() for BAMBOO_TOUCH

Commit a544c619a54b ("HID: wacom: do not attempt to switch mode
while in probe") introduces delayed work for querying (setting the
mode) on all tablets. Bamboo Touch (056a:00d0) has a ghost
interface which claims to be a pen device. Though this device can
be removed, we have to set the mode on the ghost pen interface
before we remove it. After the aforementioned delay was introduced
the device was being removed before the mode setting could be
executed.

Signed-off-by: Aaron Armstrong Skomra <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>

HID: wacom: Don't add ghost interface as shared data

A previous commit (below) adds a check for already probed interfaces to
Wacom's matching heuristic. Unfortunately this causes the Bamboo Pen
(CTL-460) to match itself to its 'ghost' touch interface. After
subsequent changes to the driver this match to the ghost causes the
kernel to crash. This patch avoids calling wacom_add_shared_data()
for the BAMBOO_PEN's ghost touch interface.

Fixes: 41372d5d40e7 ("HID: wacom: Augment 'oVid' and 'oPid' with heuristics for HID_GENERIC")
Cc: stable <[email protected]> # 4.9
Signed-off-by: Aaron Armstrong Skomra <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>

ACPI / gpio: do not fall back to parsing _CRS when we get a deferral

If, while locating GPIOs by name, we get probe deferral, we should
immediately report it to caller rather than trying to fall back to parsing
unnamed GPIOs from _CRS block.

Cc: [email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
Acked-and-Tested-by: Hans de Goede <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>

gpio: acpi: Call enable_irq_wake for _IAE GpioInts with Wake set

On Bay Trail / Cherry Trail systems with a LID switch, the LID switch is
often connect to a gpioint handled by an _IAE event handler.
Before this commit such systems would not wake up when opening the lid,
requiring the powerbutton to be pressed after opening the lid to wakeup.

Note that Bay Trail / Cherry Trail systems use suspend-to-idle, so
the interrupts are generated anyway on those lines on lid switch changes,
but they are treated by the IRQ subsystem as spurious while suspended if
not marked as wakeup IRQs.

This commit calls enable_irq_wake() for _IAE GpioInts with a valid
event handler which have their Wake flag set. This fixes such systems
not waking up when opening the lid.

Signed-off-by: Hans de Goede <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>

s390/uaccess: get_user() should zero on failure (again)

Commit fd2d2b191fe7 ("s390: get_user() should zero on failure")
intended to fix s390's get_user() implementation which did not zero
the target operand if the read from user space faulted. Unfortunately
the patch has no effect: the corresponding inline assembly specifies
that the operand is only written to ("=") and the previous value is
discarded.

Therefore the compiler is free to and actually does omit the zero
initialization.

To fix this simply change the contraint modifier to "+", so the
compiler cannot omit the initialization anymore.

Fixes: c9ca78415ac1 ("s390/uaccess: provide inline variants of get_user/put_user")
Fixes: fd2d2b191fe7 ("s390: get_user() should zero on failure")
Cc: [email protected]
Cc: Al Viro <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux

Pull thermal management fixes from Zhang Rui:

- Fix a potential deadlock in cpu_cooling driver, which was introduced
   in 4.11-rc1. (Matthew Wilcox)

- Fix the cpu_cooling and devfreq_cooling code to handle possible error
   return value from OPP calls, together with three minor fixes in the
   same patch series. (Viresh Kumar)

* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  thermal: cpu_cooling: Check OPP for errors
  thermal: cpu_cooling: Replace dev_warn with dev_err
  thermal: devfreq: Check OPP for errors
  thermal: devfreq_cooling: Replace dev_warn with dev_err
  thermal: devfreq: Simplify expression
  thermal: Fix potential deadlock in cpu_cooling

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains a rather large update with Netfilter
fixes, specifically targeted to incorrect RCU usage in several spots and
the userspace conntrack helper infrastructure (nfnetlink_cthelper),
more specifically they are:

1) expect_class_max is incorrect set via cthelper, as in kernel semantics
   mandate that this represents the array of expectation classes minus 1.
   Patch from Liping Zhang.

2) Expectation policy updates via cthelper are currently broken for several
   reasons: This code allows illegal changes in the policy such as changing
   the number of expeciation classes, it is leaking the updated policy and
   such update occurs with no RCU protection at all. Fix this by adding a
   new nfnl_cthelper_update_policy() that describes what is really legal on
   the update path.

3) Fix several memory leaks in cthelper, from Jeffy Chen.

4) synchronize_rcu() is missing in the removal path of several modules,
   this may lead to races since CPU may still be running on code that has
   just gone. Also from Liping Zhang.

5) Don't use the helper hashtable from cthelper, it is not safe to walk
   over those bits without the helper mutex. Fix this by introducing a
   new independent list for userspace helpers. From Liping Zhang.

6) nf_ct_extend_unregister() needs synchronize_rcu() to make sure no
   packets are walking on any conntrack extension that is gone after
   module removal, again from Liping.

7) nf_nat_snmp may crash if we fail to unregister the helper due to
   accidental leftover code, from Gao Feng.

8) Fix leak in nfnetlink_queue with secctx support, from Liping Zhang.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Five fixes for this series:

   - a fix from me to ensure that blk-mq drivers that terminate IO in
     their ->queue_rq() handler by returning QUEUE_ERROR don't stall
     with a scheduler enabled.

   - four nbd fixes from Josef and Ratna, fixing various problems that
     are critical enough to go in for this cycle. They have been well
     tested"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nbd: replace kill_bdev() with __invalidate_device()
  nbd: set queue timeout properly
  nbd: set rq->errors to actual error code
  nbd: handle ERESTARTSYS properly
  blk-mq: include errors in did_work calculation

ezchip: nps_enet: check if napi has been completed

After a new NAPI_STATE_MISSED state was added to NAPI we can get into
this state and in such case we have to reschedule NAPI as some work is
still pending and we have to process it. napi_complete_done() function
returns false if we have to reschedule something (e.g. in case we were
in MISSED state) as current polling have not been completed yet.

nps_enet driver hasn't been verifying the return value of
napi_complete_done() and has been forcibly enabling interrupts. That is
not correct as we should not enable interrupts before we have processed
all scheduled work. As a result we were getting trapped in interrupt
hanlder chain as we had never been able to disabale ethernet
interrupts again.

So this patch makes nps_enet_poll() func verify return value of
napi_complete_done() and enable interrupts only in case all scheduled
work has been completed.

Signed-off-by: Vlad Zakharov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-03-29

This series contains updates to i40e and i40evf only.

Preethi changes the default driver mode of operation to descriptor
write-back for VF.

Alex cleans up and addresses several issues in the way that i40e handles
private flags.  Modifies the driver to use the length of the packet
instead of the DD status bit to determine if a new descriptor is ready
to be processed.  Refactors the driver by pulling the code responsible
for fetching the receive buffer and synchronizing DMA into a single
function.  Also pulled the code responsible for handling buffer
recycling and page counting and distributed it through several functions,
so we can commonize the bits that handle either freeing or recycling the
buffers.  Cleans up the code in preparation for us adding support for
build_skb().  Changed the way we handle the maximum frame size for the
receive path so it is more consistent with other drivers.

Paul enables XL722 to use the direct read/write method since it does not
support the AQ command to read/write the control register.

Christopher fixes a case where we miss an arq element if a new one is
added before we enable interrupts and exit the loop.

Jake cleans up a pointless goto statement.  Also cleaned up a flag that
was not being used.

Carolyn does round 2 for adding a delay to the receive queue to
accommodate the hardware needs.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'tipc-socketpair'

Parthasarathy Bhuvaragan says:

====================
tipc: add socketpair support

We add socketpair support for connection oriented sockets in
the first patch and for connection less in the second.
====================

Signed-off-by: David S. Miller <[email protected]>

tipc: allow rdm/dgram socketpairs

for socketpairs using connectionless transport, we cache
the respective node local TIPC portid to use in subsequent
calls to send() in the socket's private data.

Signed-off-by: Erik Hugne <[email protected]>
Signed-off-by: Parthasarathy Bhuvaragan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: add support for stream/seqpacket socketpairs

sockets A and B are connected back-to-back, similar to what
AF_UNIX does.

Signed-off-by: Erik Hugne <[email protected]>
Signed-off-by: Parthasarathy Bhuvaragan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvneta: set rx mode during resume if interface is running

I found a bug by:

0. boot and start dhcp client
1. echo mem > /sys/power/state
2. resume back immediately
3. don't touch dhcp client to renew the lease
4. ping the gateway. No acks

Usually, after step2, the DHCP lease isn't expired, so in theory we
should resume all back. But in fact, it doesn't. It turns out
the rx mode isn't resumed correctly. This patch fixes it by adding
mvneta_set_rx_mode(dev) in the resume hook if interface is running.

Signed-off-by: Jisheng Zhang <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvneta: add RGMII_RXID and RGMII_TXID support

RGMII_RXID and RGMII_TX_ID share the same GMAC CTRL setting as RGMII
or RGMII_ID.

Signed-off-by: Jisheng Zhang <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: veth: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <[email protected]>
Reviewed-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'bnxt_en-fixes'

Michael Chan says:

====================
bnxt_en: Small misc. fixes.

Fix a NULL pointer crash in open failure path, wrong arguments when
printing error messages, and a DMA unmap bug in XDP shutdown path.
====================

Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Fix DMA unmapping of the RX buffers in XDP mode during shutdown.

In bnxt_free_rx_skbs(), which is called to free up all RX buffers during
shutdown, we need to unmap the page if we are running in XDP mode.

Fixes: c61fb99cae51 ("bnxt_en: Add RX page mode support.")
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Correct the order of arguments to netdev_err() in bnxt_set_tpa()

Signed-off-by: Sankar Patchineelam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnxt_en: Fix NULL pointer dereference in reopen failure path

Net device reset can fail when the h/w or f/w is in a bad state.
Subsequent netdevice open fails in bnxt_hwrm_stat_ctx_alloc().
The cleanup invokes bnxt_hwrm_resource_free() which inturn
calls bnxt_disable_int(). In this routine, the code segment

if (ring->fw_ring_id != INVALID_HW_RING_ID)
BNXT_CP_DB(cpr->cp_doorbell, cpr->cp_raw_cons);

results in NULL pointer dereference as cpr->cp_doorbell is not yet
initialized, and fw_ring_id is zero.

The fix is to initialize cpr fw_ring_id to INVALID_HW_RING_ID before
bnxt_init_chip() is invoked.

Signed-off-by: Sankar Patchineelam <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cpuidle: powernv: Pass correct drv->cpumask for registration

drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
On PowerNV platform cpu_present could be less than cpu_possible in cases
where firmware detects the cpu, but it is not available to the OS.  When
CONFIG_HOTPLUG_CPU=n, such cpus are not hotplugable at runtime and hence
we skip creating cpu_device.

This breaks cpuidle on powernv where register_cpu() is not called for
cpus in cpu_possible_mask that cannot be hot-added at runtime.

Trying cpuidle_register_device() on cpu without cpu_device will cause
crash like this:

cpu 0xf: Vector: 380 (Data SLB Access) at [c000000ff1503490]
    pc: c00000000022c8bc: string+0x34/0x60
    lr: c00000000022ed78: vsnprintf+0x284/0x42c
    sp: c000000ff1503710
   msr: 9000000000009033
   dar: 6000000060000000
  current = 0xc000000ff1480000
  paca    = 0xc00000000fe82d00   softe: 0        irq_happened: 0x01
    pid   = 1, comm = swapper/8
Linux version 4.11.0-rc2 (sv@sagarika) (gcc version 4.9.4
(Buildroot 2017.02-00004-gc28573e) ) #15 SMP Fri Mar 17 19:32:02 IST 2017
enter ? for help
[link register   ] c00000000022ed78 vsnprintf+0x284/0x42c
[c000000ff1503710] c00000000022ebb8 vsnprintf+0xc4/0x42c (unreliable)
[c000000ff1503800] c00000000022ef40 vscnprintf+0x20/0x44
[c000000ff1503830] c0000000000ab61c vprintk_emit+0x94/0x2cc
[c000000ff15038a0] c0000000000acc9c vprintk_func+0x60/0x74
[c000000ff15038c0] c000000000619694 printk+0x38/0x4c
[c000000ff15038e0] c000000000224950 kobject_get+0x40/0x60
[c000000ff1503950] c00000000022507c kobject_add_internal+0x60/0x2c4
[c000000ff15039e0] c000000000225350 kobject_init_and_add+0x70/0x78
[c000000ff1503a60] c00000000053c288 cpuidle_add_sysfs+0x9c/0xe0
[c000000ff1503ae0] c00000000053aeac cpuidle_register_device+0xd4/0x12c
[c000000ff1503b30] c00000000053b108 cpuidle_register+0x98/0xcc
[c000000ff1503bc0] c00000000085eaf0 powernv_processor_idle_init+0x140/0x1e0
[c000000ff1503c60] c00000000000cd60 do_one_initcall+0xc0/0x15c
[c000000ff1503d20] c000000000833e84 kernel_init_freeable+0x1a0/0x25c
[c000000ff1503dc0] c00000000000d478 kernel_init+0x24/0x12c
[c000000ff1503e30] c00000000000b564 ret_from_kernel_thread+0x5c/0x78

This patch fixes the bug by passing correct cpumask from
powernv-cpuidle driver.

Signed-off-by: Vaidyanathan Srinivasan <[email protected]>
Reviewed-by: Gautham R. Shenoy <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
[ rjw: Comment massage ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

Merge branch 'apw' (xfrm_user fixes)

Merge xfrm_user validation fixes from Andy Whitcroft:
"Two patches we are applying to Ubuntu for XFRM_MSG_NEWAE validation
  issue reported by ZDI.

  The first of these is the primary fix, and the second is for a more
  theoretical issue that Kees pointed out when reviewing the first"

* emailed patches from Andy Whitcroft <[email protected]>:
  xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
  xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window

Merge branch 'for-chris-4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.11

parisc: Avoid stalled CPU warnings after system shutdown

Commit 73580dac7618 ("parisc: Fix system shutdown halt") introduced an endless
loop for systems which don't provide a software power off function. But the
soft lockup detector will detect this and report stalled CPUs after some time.
Avoid those unwanted warnings by disabling the soft lockup detector.

Fixes: 73580dac7618 ("parisc: Fix system shutdown halt")
Signed-off-by: Helge Deller <[email protected]>
Cc: [email protected] # 4.9+

parisc: Clean up fixup routines for get_user()/put_user()

Al Viro noticed that userspace accesses via get_user()/put_user() can be
simplified a lot with regard to usage of the exception handling.

This patch implements a fixup routine for get_user() and put_user() in such
that the exception handler will automatically load -EFAULT into the register
%r8 (the error value) in case on a fault on userspace.  Additionally the fixup
routine will zero the target register on fault in case of a get_user() call.
The target register is extracted out of the faulting assembly instruction.

This patch brings a few benefits over the old implementation:
1. Exception handling gets much cleaner, easier and smaller in size.
2. Helper functions like fixup_get_user_skip_1 (all of fixup.S) can be dropped.
3. No need to hardcode %r9 as target register for get_user() any longer. This
   helps the compiler register allocator and thus creates less assembler
   statements.
4. No dependency on the exception_data contents any longer.
5. Nested faults will be handled cleanly.

Reported-by: Al Viro <[email protected]>
Cc: <[email protected]> # v4.9+
Signed-off-by: Helge Deller <[email protected]>

parisc: Fix access fault handling in pa_memcpy()

pa_memcpy() is the major memcpy implementation in the parisc kernel which is
used to do any kind of userspace/kernel memory copies.

Al Viro noticed various bugs in the implementation of pa_mempcy(), most notably
that in case of faults it may report back to have copied more bytes than it
actually did.

Fixing those bugs is quite hard in the C-implementation, because the compiler
is messing around with the registers and we are not guaranteed that specific
variables are always in the same processor registers. This makes proper fault
handling complicated.

This patch implements pa_memcpy() in assembler. That way we have correct fault
handling and adding a 64-bit copy routine was quite easy.

Runtime tested with 32- and 64bit kernels.

Reported-by: Al Viro <[email protected]>
Cc: <[email protected]> # v4.9+
Signed-off-by: John David Anglin <[email protected]>
Signed-off-by: Helge Deller <[email protected]>

Merge tag 'mlx5e-pedit' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Or Gerlitz says:

====================
mlx5e-pedit 2017-03-28

This series adds support for offloading modifications of packet headers using
ConnectX-5 HW header re-write as an action applied during packet steering.

The offloaded SW mechanism is TC's pedit action. The offloading is
supported for E-Switch steering of VF traffic in the SRIOV
switchdev mode and for NIC (non eswitch) RX.

One use-case for this offload on virtual networks, is when the hypervisor
implements flow based router such as Open-Stack's DVR, where L2 headers
of guest packets re-written with routers' MAC addresses and the IP TTL
is decremented.

Another use case (which can be applied in parallel with routing) is
stateless NAT where guest L3/L4 headers are re-written.

The series is built as follows: the 1st six patches are preperations which
don't yet add new functionality, patches 7-8 add the FW APIs (data-structures
and commands) for header re-write, and patch nine allows offloading driver
to access pedit keys.

The 10th patch is somehow the core of the series, where we translate from
the pedit way to represent set of header modification elements to the FW
API for that same matter.

Once a set of HW modification is established, we register it with the FW
and get a modify header ID. When this ID is used with an action during
packet steering, the HW applies the header modification on the packet.

Patches 11 and 12 implement the above logic as an offload for pedit action
for the NIC and E-Switch use-cases.

I'd like to thanks Elijah Shakkour <[email protected]> for implementing
and helping me testing this functionality on HW simulator, before it could
be done with FW.
====================

Signed-off-by: David S. Miller <[email protected]>

net: mpls: Update lfib_nlmsg_size to skip deleted nexthops

A recent commit skips nexthops in a route if the device has been
deleted. Update lfib_nlmsg_size accordingly.

Reported-by: Roopa Prabhu <[email protected]>
Signed-off-by: David Ahern <[email protected]>
Acked-by: Roopa Prabhu <[email protected]>
Acked-by: Robert Shearman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: Allow building mdio-boardinfo into the kernel

mdio-boardinfo contains code that is helpful for platforms to register
specific MDIO bus devices independent of how CONFIG_MDIO_DEVICE or
CONFIG_PHYLIB will be selected (modular or built-in). In order to make
that possible, let's do the following:

- descend into drivers/net/phy/ unconditionally

- make mdiobus_setup_mdiodev_from_board_info() take a callback argument
  which allows us not to expose the internal MDIO board info list and
  mutex, yet maintain the logic within the same file

- relocate the code that creates a MDIO device into
  drivers/net/phy/mdio_bus.c

- build mdio-boardinfo.o into the kernel as soon as MDIO_DEVICE is
  defined (y or m)

Fixes: 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from PHYs")
Fixes: 648ea0134069 ("net: phy: Allow pre-declaration of MDIO devices")
Signed-off-by: Florian Fainelli <[email protected]>
Tested-by: Arnd Bergmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

l2tp: purge socket queues in the .destruct() callback

The Rx path may grab the socket right before pppol2tp_release(), but
nothing guarantees that it will enqueue packets before
skb_queue_purge(). Therefore, the socket can be destroyed without its
queues fully purged.

Fix this by purging queues in pppol2tp_session_destruct() where we're
guaranteed nothing is still referencing the socket.

Fixes: 9e9cb6221aa7 ("l2tp: fix userspace reception on plain L2TP sockets")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

l2tp: hold tunnel socket when handling control frames in l2tp_ip and l2tp_ip6

The code following l2tp_tunnel_find() expects that a new reference is
held on sk. Either sk_receive_skb() or the discard_put error path will
drop a reference from the tunnel's socket.

This issue exists in both l2tp_ip and l2tp_ip6.

Fixes: a3c18422a4b4 ("l2tp: hold socket before dropping lock in l2tp_ip{, 6}_recv()")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'regset' (PTRACE_SETREGSET data leakage)

Merge PTRACE_SETREGSET leakage fixes from Dave Martin:
"This series is the collection of fixes I proposed on this topic, that
  have not yet appeared upstream or in the stable branches,

  The issue can leak kernel stack, but doesn't appear to allow userspace
  to attack the kernel directly.  The affected architectures are c6x,
  h8300, metag, mips and sparc.

  [ Mark Salter points out that c6x has no MMU or other mechanism to
    prevent userspace access to kernel code or data on c6x, but it
    doesn't hurt to clean that case up too. ]

  The bugs arise from use of user_regset_copyin(). Users of
  user_regset_copyin() can work in one of two ways:

   1) Copy directly to thread_struct or equivalent. (This seems to be
      the design assumption of the regset API, and is the most common
      approach.)

   2) Copy to a local variable and then transfer to thread_struct. (A
      significant minority of cases.)

  Buggy code typically involves approach 2"

* emailed patches from Dave Martin <[email protected]>:
  sparc/ptrace: Preserve previous registers for short regset write
  mips/ptrace: Preserve previous registers for short regset write
  metag/ptrace: Reject partial NT_METAG_RPIPE writes
  metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
  metag/ptrace: Preserve previous registers for short regset write
  h8300/ptrace: Fix incorrect register transfer count
  c6x/ptrace: Remove useless PTRACE_SETREGSET implementation

sparc/ptrace: Preserve previous registers for short regset write

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.

Signed-off-by: Dave Martin <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mips/ptrace: Preserve previous registers for short regset write

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.

Signed-off-by: Dave Martin <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

metag/ptrace: Reject partial NT_METAG_RPIPE writes

It's not clear what behaviour is sensible when doing partial write of
NT_METAG_RPIPE, so just don't bother.

This patch assumes that userspace will never rely on a partial SETREGSET
in this case, since it's not clear what should happen anyway.

Signed-off-by: Dave Martin <[email protected]>
Acked-by: James Hogan <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill TXSTATUS, a well-defined default value is used, based on the
task's current value.

Suggested-by: James Hogan <[email protected]>
Signed-off-by: Dave Martin <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

metag/ptrace: Preserve previous registers for short regset write

Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.

Signed-off-by: Dave Martin <[email protected]>
Acked-by: James Hogan <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

h8300/ptrace: Fix incorrect register transfer count

regs_set() and regs_get() are vulnerable to an off-by-1 buffer overrun
if CONFIG_CPU_H8S is set, since this adds an extra entry to
register_offset[] but not to user_regs_struct.

So, iterate over user_regs_struct based on its actual size, not based on
the length of register_offset[].

Signed-off-by: Dave Martin <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

c6x/ptrace: Remove useless PTRACE_SETREGSET implementation

gpr_set won't work correctly and can never have been tested, and the
correct behaviour is not clear due to the endianness-dependent task
layout.

So, just remove it. The core code will now return -EOPNOTSUPPORT when
trying to set NT_PRSTATUS on this architecture until/unless a correct
implementation is supplied.

Signed-off-by: Dave Martin <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder

Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
wrapping issues. To ensure we are correctly ensuring that the two ESN
structures are the same size compare both the overall size as reported
by xfrm_replay_state_esn_len() and the internal length are the same.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <[email protected]>
Acked-by: Steffen Klassert <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window

When a new xfrm state is created during an XFRM_MSG_NEWSA call we
validate the user supplied replay_esn to ensure that the size is valid
and to ensure that the replay_window size is within the allocated
buffer.  However later it is possible to update this replay_esn via a
XFRM_MSG_NEWAE call.  There we again validate the size of the supplied
buffer matches the existing state and if so inject the contents.  We do
not at this point check that the replay_window is within the allocated
memory.  This leads to out-of-bounds reads and writes triggered by
netlink packets.  This leads to memory corruption and the potential for
priviledge escalation.

We already attempt to validate the incoming replay information in
xfrm_new_ae() via xfrm_replay_verify_len().  This confirms that the user
is not trying to change the size of the replay state buffer which
includes the replay_esn.  It however does not check the replay_window
remains within that buffer.  Add validation of the contained
replay_window.

CVE-2017-7184
Signed-off-by: Andy Whitcroft <[email protected]>
Acked-by: Steffen Klassert <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge remote-tracking branch 'mkp-scsi/4.11/scsi-fixes' into fixes

drm/etnaviv: (re-)protect fence allocation with GPU mutex

The fence allocation needs to be protected by the GPU mutex, otherwise
the fence seqnos of concurrent submits might not match the insertion order
of the jobs in the kernel ring. This breaks the assumption that jobs
complete with monotonically increasing fence seqnos.

Fixes: d9853490176c (drm/etnaviv: take GPU lock later in the submit process)
CC: [email protected] #4.9+
Signed-off-by: Lucas Stach <[email protected]>

Btrfs: fix an integer overflow check

This isn't super serious because you need CAP_ADMIN to run this code.

I added this integer overflow check last year but apparently I am
rubbish at writing integer overflow checks... There are two issues.
First, access_ok() works on unsigned long type and not u64 so on 32 bit
systems the access_ok() could be checking a truncated size. The other
issue is that we should be using a stricter limit so we don't overflow
the kzalloc() setting ctx->clone_roots later in the function after the
access_ok():

alloc_size = sizeof(struct clone_root) * (arg->clone_sources_count + 1);
sctx->clone_roots = kzalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN);

Fixes: f5ecec3ce21f ("btrfs: send: silence an integer overflow warning")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ added comment ]
Signed-off-by: David Sterba <[email protected]>

btrfs: Change qgroup_meta_rsv to 64bit

Using an int value is causing qg->reserved to become negative and
exclusive -EDQUOT to be reached prematurely.

This affects exclusive qgroups only.

TEST CASE:

DEVICE=/dev/vdb
MOUNTPOINT=/mnt
SUBVOL=$MOUNTPOINT/tmp

umount $SUBVOL
umount $MOUNTPOINT

mkfs.btrfs -f $DEVICE
mount /dev/vdb $MOUNTPOINT
btrfs quota enable $MOUNTPOINT
btrfs subvol create $SUBVOL
umount $MOUNTPOINT
mount /dev/vdb $MOUNTPOINT
mount -o subvol=tmp $DEVICE $SUBVOL
btrfs qgroup limit -e 3G $SUBVOL

btrfs quota rescan /mnt -w

for i in `seq 1 44000`; do
  dd if=/dev/zero of=/mnt/tmp/test_$i bs=10k count=1
  if [[ $? > 0 ]]; then
     btrfs qgroup show -pcref $SUBVOL
     exit 1
  fi
done

Signed-off-by: Goldwyn Rodrigues <[email protected]>
[ add reproducer to changelog ]
Signed-off-by: David Sterba <[email protected]>

Btrfs: bring back repair during read

Commit 20a7db8ab3f2 ("btrfs: add dummy callback for readpage_io_failed
and drop checks") made a cleanup around readpage_io_failed_hook, and
it was supposed to keep the original sematics, but it also
unexpectedly disabled repair during read for dup, raid1 and raid10.

This fixes the problem by letting data's inode call the generic
readpage_io_failed callback by returning -EAGAIN from its
readpage_io_failed_hook in order to notify end_bio_extent_readpage to
do the rest. We don't call it directly because the generic one takes
an offset from end_bio_extent_readpage() to calculate the index in the
checksum array and inode's readpage_io_failed_hook doesn't offer that
offset.

Cc: David Sterba <[email protected]>
Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ keep the const function attribute ]
Signed-off-by: David Sterba <[email protected]>

mac80211: unconditionally start new netdev queues with iTXQ support

When internal mac80211 TXQs aren't supported, netdev queues must
always started out started even when driver queues are stopped
while the interface is added. This is necessary because with the
internal TXQ support netdev queues are never stopped and packet
scheduling/dropping is done in mac80211.

Cc: [email protected] # 4.9+
Fixes: 80a83cfc434b1 ("mac80211: skip netdev queue control with software queuing")
Reported-and-tested-by: Sven Eckelmann <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

Merge remote-tracking branches 'asoc/fix/rt5665', 'asoc/fix/simple', 'asoc/fix/sti' and 'asoc/fix/sun8i' into asoc-linus

Merge remote-tracking branches 'asoc/fix/adsp', 'asoc/fix/atmel', 'asoc/fix/hdac-hdmi' and 'asoc/fix/mtk' into asoc-linus

Merge remote-tracking branch 'asoc/fix/rcar' into asoc-linus

Merge remote-tracking branch 'asoc/fix/intel' into asoc-linus

netfilter: nfnetlink_queue: fix secctx memory leak

We must call security_release_secctx to free the memory returned by
security_secid_to_secctx, otherwise memory may be leaked forever.

Fixes: ef493bd930ae ("netfilter: nfnetlink_queue: add security context information")
Signed-off-by: Liping Zhang <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

usb: phy: isp1301: Fix build warning when CONFIG_OF is disabled

Commit fd567653bdb9 ("usb: phy: isp1301: Add OF device ID table")
added an OF device ID table, but used the of_match_ptr() macro
that will lead to a build warning if CONFIG_OF symbol is disabled:

drivers/usb/phy//phy-isp1301.c:36:34: warning: ‘isp1301_of_match’ defined but not used [-Wunused-const-variable=]
static const struct of_device_id isp1301_of_match[] = {
^~~~~~~~~~~~~~~~

Fixes: fd567653bdb9 ("usb: phy: isp1301: Add OF device ID table")
Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Javier Martinez Canillas <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

xhci: Manually give back cancelled URB if we can't queue it for cancel

xhci needs to take care of four scenarios when asked to cancel a URB.

1 URB is not queued or already given back.
  usb_hcd_check_unlink_urb() will return an error, we pass the error on

2 We fail to find xhci internal structures from urb private data such as
  virtual device and endpoint ring.
  Give back URB immediately, can't do anything about internal structures.

3 URB private data has valid pointers to xhci internal data, but host is
  not  responding.
  give back URB immedately and remove the URB from the endpoint lists.

4 Everyting is working
  add URB to cancel list, queue a command to stop the endpoint, after
  which the URB can be turned to no-op or skipped, removed from lists,
  and given back.

We failed to give back the urb in case 2 where the correct device and
endpoint pointers could not be retrieved from URB private data.

This caused a hang on Dell Inspiron 5558/0VNM2T at resume from suspend
as urb was never returned.

[  245.270505] INFO: task rtsx_usb_ms_1:254 blocked for more than 120 seconds.
[  245.272244]       Tainted: G        W       4.11.0-rc3-ARCH #2
[  245.273983] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  245.275737] rtsx_usb_ms_1   D    0   254      2 0x00000000
[  245.277524] Call Trace:
[  245.279278]  __schedule+0x2d3/0x8a0
[  245.281077]  schedule+0x3d/0x90
[  245.281961]  usb_kill_urb.part.3+0x6c/0xa0 [usbcore]
[  245.282861]  ? wake_atomic_t_function+0x60/0x60
[  245.283760]  usb_kill_urb+0x21/0x30 [usbcore]
[  245.284649]  usb_start_wait_urb+0xe5/0x170 [usbcore]
[  245.285541]  ? try_to_del_timer_sync+0x53/0x80
[  245.286434]  usb_bulk_msg+0xbd/0x160 [usbcore]
[  245.287326]  rtsx_usb_send_cmd+0x63/0x90 [rtsx_usb]

Reported-by: [email protected]
Tested-by: [email protected]
Cc: [email protected]
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

xhci: Set URB actual length for stopped control transfers

A control transfer that stopped at the status stage incorrectly
warned about a "unexpected TRB Type 4", and did not set the
transferred actual_length for the URB.

The URB actual_length for control transfers should contain the
bytes transferred in the data stage.

Bytes of a partially sent setup stage and missing bytes from
status stage should be left out.

Cc: <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

xhci: plat: Register shutdown for xhci_plat

Shutdown should be called for xhci_plat devices especially for
situations where kexec might be used by stopping DMA
transactions.

Signed-off-by: Adam Wallis <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

i40e: fix for queue timing delays

This patch adds a delay to Rx queue disables to accommodate HW needs.

v2: Added missing check for disable only, additional details on the
need for the ugly delay and fixed spacing on comment.

Change-ID: I2864ca667ce5dcc2cc44f8718113b719742a46a1
Signed-off-by: Carolyn Wyborny <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e/i40evf: Change the way we limit the maximum frame size for Rx

This patch changes the way we handle the maximum frame size for the Rx
path.  Previously we were rounding up to 2K for a 1500 MTU and then brining
the max frame size down to MTU plus a fixed amount.  With this patch
applied what we now do is limit the maximum frame to 1.5K minus the value
for NET_IP_ALIGN for standard MTU, and for any MTU greater than 1500 we
allow up to the maximum frame size.  This makes the behavior more
consistent with the other drivers such as igb which had similar logic.  In
addition it reduces the test matrix for MTU since we only have two max
frame sizes that are handled for Rx now.

Change-ID: I23a9d3c857e7df04b0ef28c64df63e659c013f3f
Signed-off-by: Alexander Duyck <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e/i40evf: Add legacy-rx private flag to allow fallback to old Rx flow

This patch adds a control which will allow us to toggle into and out of the
legacy Rx mode. The legacy Rx mode is what we currently do when performing
Rx. As I make further changes what should happen is that the driver will
fall back to the behavior for Rx as of this patch should the "legacy-rx"
flag be set to on.

Change-ID: I0342998849bbb31351cce05f6e182c99174e7751
Signed-off-by: Alexander Duyck <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e/i40evf: Break i40e_fetch_rx_buffer up to allow for reuse of frag code

This patch is meant to clean up the code in preparation for us adding
support for build_skb. Specifically we deconstruct i40e_fetch_buffer into
several functions so that those functions can later be reused when we add a
path for build_skb.

Specifically with this change we split out the code for adding a page to an
exiting skb.

Change-ID: Iab1efbab6b8b97cb60ab9fdd0be1d37a056a154d
Signed-off-by: Alexander Duyck <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e/i40evf: Pull out code for cleaning up Rx buffers

This patch pulls out the code responsible for handling buffer recycling and
page counting and distributes it through several functions. This allows us
to commonize the bits that handle either freeing or recycling the buffers.

As far as the page count tracking one change to the logic is that
pagecnt_bias is decremented as soon as we call i40e_get_rx_buffer. It is
then the responsibility of the function that pulls the data to either
increment the pagecnt_bias if the buffer can be recycled as-is, or to
update page_offset so that we are pointing at the correct location for
placement of the next buffer.

Change-ID: Ibac576360cb7f0b1627f2a993d13c1a8a2bf60af
Signed-off-by: Alexander Duyck <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e/i40evf: Pull code for grabbing and syncing rx_buffer from fetch_buffer

This patch pulls the code responsible for fetching the Rx buffer and
synchronizing DMA into a function, specifically called i40e_get_rx_buffer.

The general idea is to allow for better code reuse by pulling this out of
i40e_fetch_rx_buffer. We dropped a couple of prefetches since the time
between the prefetch being called and the data being accessed was too small
to be useful.

Change-ID: I4885fce4b2637dbedc8e16431169d23d3d7e79b9
Signed-off-by: Alexander Duyck <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e/i40evf: Use length to determine if descriptor is done

This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads as we don't really even
need the DD bit if going from a 0 to a non-zero value on size is enough to
inform us that the packet has been completed.

Change-ID: Iebdf9cdb36c454ef092df27199b92ad09c374231
Signed-off-by: Alexander Duyck <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: remove FDIR_REQUIRES_REINIT driver flag

This flag hasn't been used since commit 1e1be8f622ee ("i40e: ATR policy
change to flush the table to clean stale ATR rules").

Lets simplify things and just remove it.

Change-ID: I76279d84db8a2fd96f445b96aa413059f9256879
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: remove a useless goto statement

The goto found here for when in MFP mode is pointless. It jumps to the
end of a series of if blocks. However, right after this statement is
a closing '}' for this if block, which will result in the program flow
going to the exact same location as the goto statement indicates. Thus,
regardless of whether we are in MFP mode, the program flow will resume
from the same location.

This arose due to various refactoring which did not notice that this
goto became essentially a no-op.

To properly understand this diff you will need to view a larger context
than is given by default.

Change-ID: I088f73c3831aa5c4e2281380c7a3ce605594300c
Signed-off-by: Jacob Keller <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Check for new arq elements before leaving the adminq subtask loop

Fix a case where we miss an arq element if a new one is added before we
enable interrupts and exit the arq subtask loop. This occurs frequently
with RDMA running on Windows VF and causes long delays that prevent SMB
from establishing connections.

Change-ID: I3e1c8b2b960c12857d9b8275bea2c1563674392e
Signed-off-by: Christopher N Bednarz <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: use register for XL722 control register read/write

The XL722 doesn't support the AQ command to read/write the control
register so enable it to bypass the check and use the direct read/write
method.

Change-ID: Iefecc737b57207485c90845af5989d5af518bf16
Signed-off-by: Paul M Stillwell Jr <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Clean up handling of private flags

This patch cleans up and addresses several issues in the way that i40e
handles private flags. Previously the code was choosing fixed bits and
trying to match them up with strings in a somewhat haphazard way. This
resulted in the possibility for adding a new bit and causing a mismatch as
the private flags are linear bits starting at 0, and the private flags in
the driver were split up over a group specific to the PF and a group that
was global.

What this change does is define an array of structs used to represent the
private flags. Contained within the structs are the bits necessary to know
which flags to set and/or clear depending on the state of the bit. By
doing this we can add new bits in the future with minimal overhead and
avoid creating possible mis-matches should we need to remove a flag based
on compile options.

Change-ID: Ia3214ab04f0ab2f70354ac0997a135f1d01b0acd
Signed-off-by: Alexander Duyck <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>