Linus Torvalds [Thu, 30 Mar 2017 22:08:38 +0000 (15:08 -0700)]
Merge tag 'pci-v4.11-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- fix iProc memory corruption
- fix ThunderX usage of unregistered PNP/ACPI ID
- fix ThunderX resource reservation on early firmware
* tag 'pci-v4.11-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: thunder-pem: Add legacy firmware support for Cavium ThunderX host controller
PCI: thunder-pem: Use Cavium assigned hardware ID for ThunderX host controller
PCI: iproc: Save host bridge window resource in struct iproc_pcie
Max Filippov [Wed, 29 Mar 2017 22:44:47 +0000 (15:44 -0700)]
xtensa: make __pa work with uncached KSEG addresses
When __pa is applied to virtual address in uncached KSEG region the
result is incorrect. Fix it by checking if the original address is in
the uncached KSEG and adjusting the result. It looks better than masking
off bits because pfn_valid would correctly work with new __pa results
and it may be made working in noMMU case, once we get definition for
uncached memory view.
This is required for the dma_common_mmap and DMA debug code to work
correctly: they both indirectly use __pa with coherent DMA addresses.
In case of DMA debug the visible effect is false reports that an address
mapped for DMA is accessed by CPU.
Nathan Fontenot [Wed, 29 Mar 2017 19:14:55 +0000 (15:14 -0400)]
ibmvnic: Remove debugfs support
The debugfs support in the ibmvnic driver is not, and never has been,
supported. Just remove it.
The work done in the debugfs code for the driver was part of the original
spec for the ibmvnic driver. The corresponding support for this from the
server side was never supported and has been dropped.
Eric Dumazet [Wed, 29 Mar 2017 17:45:44 +0000 (10:45 -0700)]
bonding: refine bond_fold_stats() wrap detection
Some device drivers reset their stats at down/up events, possibly
fooling bonding stats, since they operate with relative deltas.
It is nearly not possible to fix drivers, since some of them compute the
tx/rx counters based on per rx/tx queue stats, and the queues can be
reconfigured (ethtool -L) between the down/up sequence.
Lets avoid accumulating 'negative' values that render bonding stats
useless.
It is better to lose small deltas, assuming the bonding stats are
fetched at a reasonable frequency.
Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
The controller has different timings for MMC_TIMING_UHS_DDR50 and
MMC_TIMING_MMC_DDR52. Configuring the controller with SDHCI_CTRL_UHS_DDR50,
when MMC_TIMING_MMC_DDR52 timings are requested, is not correct and can
lead to unexpected behavior.
Hans de Goede [Sun, 26 Mar 2017 11:14:45 +0000 (13:14 +0200)]
mmc: sdhci: Disable runtime pm when the sdio_irq is enabled
SDIO cards may need clock to send the card interrupt to the host.
On a cherrytrail tablet with a RTL8723BS wifi chip, without this patch
pinging the tablet results in:
PING 192.168.1.14 (192.168.1.14) 56(84) bytes of data.
64 bytes from 192.168.1.14: icmp_seq=1 ttl=64 time=78.6 ms
64 bytes from 192.168.1.14: icmp_seq=2 ttl=64 time=1760 ms
64 bytes from 192.168.1.14: icmp_seq=3 ttl=64 time=753 ms
64 bytes from 192.168.1.14: icmp_seq=4 ttl=64 time=3.88 ms
64 bytes from 192.168.1.14: icmp_seq=5 ttl=64 time=795 ms
64 bytes from 192.168.1.14: icmp_seq=6 ttl=64 time=1841 ms
64 bytes from 192.168.1.14: icmp_seq=7 ttl=64 time=810 ms
64 bytes from 192.168.1.14: icmp_seq=8 ttl=64 time=1860 ms
64 bytes from 192.168.1.14: icmp_seq=9 ttl=64 time=812 ms
64 bytes from 192.168.1.14: icmp_seq=10 ttl=64 time=48.6 ms
Where as with this patch I get:
PING 192.168.1.14 (192.168.1.14) 56(84) bytes of data.
64 bytes from 192.168.1.14: icmp_seq=1 ttl=64 time=3.96 ms
64 bytes from 192.168.1.14: icmp_seq=2 ttl=64 time=1.97 ms
64 bytes from 192.168.1.14: icmp_seq=3 ttl=64 time=17.2 ms
64 bytes from 192.168.1.14: icmp_seq=4 ttl=64 time=2.46 ms
64 bytes from 192.168.1.14: icmp_seq=5 ttl=64 time=2.83 ms
64 bytes from 192.168.1.14: icmp_seq=6 ttl=64 time=1.40 ms
64 bytes from 192.168.1.14: icmp_seq=7 ttl=64 time=2.10 ms
64 bytes from 192.168.1.14: icmp_seq=8 ttl=64 time=1.40 ms
64 bytes from 192.168.1.14: icmp_seq=9 ttl=64 time=2.04 ms
64 bytes from 192.168.1.14: icmp_seq=10 ttl=64 time=1.40 ms
Masahiro Yamada [Thu, 30 Mar 2017 17:44:17 +0000 (02:44 +0900)]
arm64: drop non-existing vdso-offsets.h from .gitignore
Since commit a66649dab350 ("arm64: fix vdso-offsets.h dependency"),
include/generated/vdso-offsets.h is directly generated without
arch/arm64/kernel/vdso/vdso-offsets.h.
Andrey Konovalov [Wed, 29 Mar 2017 14:11:20 +0000 (16:11 +0200)]
net/packet: fix overflow in check for priv area size
Subtracting tp_sizeof_priv from tp_block_size and casting to int
to check whether one is less then the other doesn't always work
(both of them are unsigned ints).
Compare them as is instead.
Also cast tp_sizeof_priv to u64 before using BLK_PLUS_PRIV, as
it can overflow inside BLK_PLUS_PRIV otherwise.
Takashi Iwai [Thu, 30 Mar 2017 18:03:25 +0000 (20:03 +0200)]
Merge tag 'asoc-fix-v4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v4.11
A relatively large pile of fixes for mainline, the first since the merge
window. The biggest block of changes here by volume is the sun8i-codec
set, the driver was newly added in the merge window but it was realized
that renaming some of the user visible controls was required so these
are being pushed for v4.11 to avoid the original code appearing in a
release. Otherwise it's all fairly standard bugfix stuff.
Wadim Egorov [Wed, 29 Mar 2017 12:12:19 +0000 (14:12 +0200)]
net: stmmac: dwmac-rk: Add handling for RGMII_ID/RXID/TXID
ATM dwmac-rk will always set and enable it's internal delay lines.
Using PHY internal delays in combination with the phy-mode
rgmii-id/rxid/txid was not possible. Only rgmii was supported.
Now we can disable rockchip's gmac delay lines and also use
rgmii-id/rxid/txid.
LABBE Corentin [Wed, 29 Mar 2017 05:05:40 +0000 (07:05 +0200)]
Revert "net: stmmac: enable multiple buffers"
The commit aff3d9eff843 ("net: stmmac: enable multiple buffers") breaks
numerous boards. while some patch exists for fixing some of it,
dwmac-sunxi is still broken with it.
Since this patch is very huge, it will be better to split it in smaller
part.
Arend Van Spriel [Tue, 28 Mar 2017 08:11:30 +0000 (09:11 +0100)]
brcmfmac: use local iftype avoiding use-after-free of virtual interface
A use-after-free was found using KASAN. In brcmf_p2p_del_if() the virtual
interface is removed using call to brcmf_remove_interface(). After that
the virtual interface instance has been freed and should not be referenced.
Solve this by storing the nl80211 iftype in local variable, which is used
in a couple of places anyway.
Larry Finger [Tue, 21 Mar 2017 14:24:11 +0000 (09:24 -0500)]
rtlwifi: Fix scheduling while atomic splat
Following commit cceb0a597320 ("rtlwifi: Add work queue for c2h cmd."),
the following BUG is reported when rtl8723be is used:
BUG: sleeping function called from invalid context at mm/slab.h:432
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/0
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W O 4.11.0-rc3-wl+ #276
Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.50 09/29/2014
Call Trace:
<IRQ>
dump_stack+0x63/0x89
___might_sleep+0xe9/0x130
__might_sleep+0x4a/0x90
kmem_cache_alloc_trace+0x19f/0x200
? rtl_c2hcmd_enqueue+0x3e/0x110 [rtlwifi]
rtl_c2hcmd_enqueue+0x3e/0x110 [rtlwifi]
rtl8723be_c2h_packet_handler+0xac/0xc0 [rtl8723be]
rtl8723be_rx_command_packet+0x37/0x5c [rtl8723be]
_rtl_pci_rx_interrupt+0x200/0x6b0 [rtl_pci]
_rtl_pci_interrupt+0x20c/0x5d0 [rtl_pci]
__handle_irq_event_percpu+0x3f/0x1d0
handle_irq_event_percpu+0x23/0x60
handle_irq_event+0x3c/0x60
handle_fasteoi_irq+0xa2/0x170
handle_irq+0x20/0x30
do_IRQ+0x48/0xd0
common_interrupt+0x89/0x89
...
Although commit cceb0a597320 converted most c2h commands to use a work
queue, the Bluetooth coexistence routines can be in atomic mode when
they execute such a call.
Kalle Valo [Thu, 30 Mar 2017 16:38:15 +0000 (19:38 +0300)]
Merge tag 'iwlwifi-for-kalle-2017-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes
iwlwifi fixes for 4.11
Here are three patches intended for 4.11. The first one is an RCU fix
by Sari. The second one is a fix for a potential out-of-bounds access
crash by Dan. And finally, the third and bigger one, is a fix for
IBSS, which has been broken since DQA was enabled in the driver.
Mark Salter [Fri, 24 Mar 2017 13:53:56 +0000 (09:53 -0400)]
arm64: fix NULL dereference in have_cpu_die()
Commit 5c492c3f5255 ("arm64: smp: Add function to determine if cpus are
stuck in the kernel") added a helper function to determine if die() is
supported in cpu_ops. This function assumes a cpu will have a valid
cpu_ops entry, but that may not be the case for cpu0 is spin-table or
parking protocol is used to boot secondary cpus. In that case, there
is a NULL dereference if have_cpu_die() is called by cpu0. So add a
check for a valid cpu_ops before dereferencing it.
Fixes: 5c492c3f5255 ("arm64: smp: Add function to determine if cpus are stuck in the kernel") Signed-off-by: Mark Salter <[email protected]> Signed-off-by: Will Deacon <[email protected]>
The GCC '-maccumulate-outgoing-args' flag is enabled for most configs,
mostly because of issues which are no longer relevant. For most
configs, and with most recent versions of GCC, it's no longer needed.
Clarify which cases need it, and only enable it for those cases. Also
produce a compile-time error for the ftrace graph + mcount + '-Os' case,
which will otherwise cause runtime failures.
The main benefit of '-maccumulate-outgoing-args' is that it prevents an
ugly prologue for functions which have aligned stacks. But removing the
option also has some benefits: more readable argument saves, smaller
text size, and (presumably) slightly improved performance.
Here are the object size savings for 32-bit and 64-bit defconfig
kernels:
Li Qiang [Tue, 28 Mar 2017 03:10:53 +0000 (20:10 -0700)]
drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl()
In vmw_surface_define_ioctl(), the 'num_sizes' is the sum of the
'req->mip_levels' array. This array can be assigned any value from
the user space. As both the 'num_sizes' and the array is uint32_t,
it is easy to make 'num_sizes' overflow. The later 'mip_levels' is
used as the loop count. This can lead an oob write. Add the check of
'req->mip_levels' to avoid this.
Thomas Hellstrom [Mon, 27 Mar 2017 10:38:25 +0000 (12:38 +0200)]
drm/ttm: Avoid calling drm_ht_remove from atomic context
On recent kernels, calling drm_ht_remove triggers a might_sleep() warning
from within vfree(). So avoid calling it from atomic context. The use-cases
we fix here are both from destructors so there should be no concurrent
use of the hash tables.
Thomas Hellstrom [Mon, 27 Mar 2017 09:21:25 +0000 (11:21 +0200)]
drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces
Previously, when a surface was opened using a legacy (non prime) handle,
it was verified to have been created by a client in the same master realm.
Relax this so that opening is also allowed recursively if the client
already has the surface open.
This works around a regression in svga mesa where opening of a shared
surface is used recursively to obtain surface information.
drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl()
In vmw_get_cap_3d_ioctl(), a user can supply 0 for a size that is
used in vzalloc(). This eventually calls dump_stack() (in warn_alloc()),
which can leak useful addresses to dmesg.
HID: wacom: call _query_tablet_data() for BAMBOO_TOUCH
Commit a544c619a54b ("HID: wacom: do not attempt to switch mode
while in probe") introduces delayed work for querying (setting the
mode) on all tablets. Bamboo Touch (056a:00d0) has a ghost
interface which claims to be a pen device. Though this device can
be removed, we have to set the mode on the ghost pen interface
before we remove it. After the aforementioned delay was introduced
the device was being removed before the mode setting could be
executed.
HID: wacom: Don't add ghost interface as shared data
A previous commit (below) adds a check for already probed interfaces to
Wacom's matching heuristic. Unfortunately this causes the Bamboo Pen
(CTL-460) to match itself to its 'ghost' touch interface. After
subsequent changes to the driver this match to the ghost causes the
kernel to crash. This patch avoids calling wacom_add_shared_data()
for the BAMBOO_PEN's ghost touch interface.
Fixes: 41372d5d40e7 ("HID: wacom: Augment 'oVid' and 'oPid' with heuristics for HID_GENERIC") Cc: stable <[email protected]> # 4.9 Signed-off-by: Aaron Armstrong Skomra <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
Dmitry Torokhov [Thu, 23 Mar 2017 20:21:38 +0000 (13:21 -0700)]
ACPI / gpio: do not fall back to parsing _CRS when we get a deferral
If, while locating GPIOs by name, we get probe deferral, we should
immediately report it to caller rather than trying to fall back to parsing
unnamed GPIOs from _CRS block.
Hans de Goede [Fri, 24 Mar 2017 10:08:47 +0000 (11:08 +0100)]
gpio: acpi: Call enable_irq_wake for _IAE GpioInts with Wake set
On Bay Trail / Cherry Trail systems with a LID switch, the LID switch is
often connect to a gpioint handled by an _IAE event handler.
Before this commit such systems would not wake up when opening the lid,
requiring the powerbutton to be pressed after opening the lid to wakeup.
Note that Bay Trail / Cherry Trail systems use suspend-to-idle, so
the interrupts are generated anyway on those lines on lid switch changes,
but they are treated by the IRQ subsystem as spurious while suspended if
not marked as wakeup IRQs.
This commit calls enable_irq_wake() for _IAE GpioInts with a valid
event handler which have their Wake flag set. This fixes such systems
not waking up when opening the lid.
Heiko Carstens [Mon, 27 Mar 2017 07:48:04 +0000 (09:48 +0200)]
s390/uaccess: get_user() should zero on failure (again)
Commit fd2d2b191fe7 ("s390: get_user() should zero on failure")
intended to fix s390's get_user() implementation which did not zero
the target operand if the read from user space faulted. Unfortunately
the patch has no effect: the corresponding inline assembly specifies
that the operand is only written to ("=") and the previous value is
discarded.
Therefore the compiler is free to and actually does omit the zero
initialization.
To fix this simply change the contraint modifier to "+", so the
compiler cannot omit the initialization anymore.
Fixes: c9ca78415ac1 ("s390/uaccess: provide inline variants of get_user/put_user") Fixes: fd2d2b191fe7 ("s390: get_user() should zero on failure") Cc: [email protected] Cc: Al Viro <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
Linus Torvalds [Thu, 30 Mar 2017 02:59:49 +0000 (19:59 -0700)]
Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management fixes from Zhang Rui:
- Fix a potential deadlock in cpu_cooling driver, which was introduced
in 4.11-rc1. (Matthew Wilcox)
- Fix the cpu_cooling and devfreq_cooling code to handle possible error
return value from OPP calls, together with three minor fixes in the
same patch series. (Viresh Kumar)
* 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: cpu_cooling: Check OPP for errors
thermal: cpu_cooling: Replace dev_warn with dev_err
thermal: devfreq: Check OPP for errors
thermal: devfreq_cooling: Replace dev_warn with dev_err
thermal: devfreq: Simplify expression
thermal: Fix potential deadlock in cpu_cooling
The following patchset contains a rather large update with Netfilter
fixes, specifically targeted to incorrect RCU usage in several spots and
the userspace conntrack helper infrastructure (nfnetlink_cthelper),
more specifically they are:
1) expect_class_max is incorrect set via cthelper, as in kernel semantics
mandate that this represents the array of expectation classes minus 1.
Patch from Liping Zhang.
2) Expectation policy updates via cthelper are currently broken for several
reasons: This code allows illegal changes in the policy such as changing
the number of expeciation classes, it is leaking the updated policy and
such update occurs with no RCU protection at all. Fix this by adding a
new nfnl_cthelper_update_policy() that describes what is really legal on
the update path.
3) Fix several memory leaks in cthelper, from Jeffy Chen.
4) synchronize_rcu() is missing in the removal path of several modules,
this may lead to races since CPU may still be running on code that has
just gone. Also from Liping Zhang.
5) Don't use the helper hashtable from cthelper, it is not safe to walk
over those bits without the helper mutex. Fix this by introducing a
new independent list for userspace helpers. From Liping Zhang.
6) nf_ct_extend_unregister() needs synchronize_rcu() to make sure no
packets are walking on any conntrack extension that is gone after
module removal, again from Liping.
7) nf_nat_snmp may crash if we fail to unregister the helper due to
accidental leftover code, from Gao Feng.
8) Fix leak in nfnetlink_queue with secctx support, from Liping Zhang.
====================
Linus Torvalds [Wed, 29 Mar 2017 21:30:19 +0000 (14:30 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Five fixes for this series:
- a fix from me to ensure that blk-mq drivers that terminate IO in
their ->queue_rq() handler by returning QUEUE_ERROR don't stall
with a scheduler enabled.
- four nbd fixes from Josef and Ratna, fixing various problems that
are critical enough to go in for this cycle. They have been well
tested"
* 'for-linus' of git://git.kernel.dk/linux-block:
nbd: replace kill_bdev() with __invalidate_device()
nbd: set queue timeout properly
nbd: set rq->errors to actual error code
nbd: handle ERESTARTSYS properly
blk-mq: include errors in did_work calculation
Zakharov Vlad [Wed, 29 Mar 2017 10:41:46 +0000 (13:41 +0300)]
ezchip: nps_enet: check if napi has been completed
After a new NAPI_STATE_MISSED state was added to NAPI we can get into
this state and in such case we have to reschedule NAPI as some work is
still pending and we have to process it. napi_complete_done() function
returns false if we have to reschedule something (e.g. in case we were
in MISSED state) as current polling have not been completed yet.
nps_enet driver hasn't been verifying the return value of
napi_complete_done() and has been forcibly enabling interrupts. That is
not correct as we should not enable interrupts before we have processed
all scheduled work. As a result we were getting trapped in interrupt
hanlder chain as we had never been able to disabale ethernet
interrupts again.
So this patch makes nps_enet_poll() func verify return value of
napi_complete_done() and enable interrupts only in case all scheduled
work has been completed.
David S. Miller [Wed, 29 Mar 2017 21:13:09 +0000 (14:13 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-03-29
This series contains updates to i40e and i40evf only.
Preethi changes the default driver mode of operation to descriptor
write-back for VF.
Alex cleans up and addresses several issues in the way that i40e handles
private flags. Modifies the driver to use the length of the packet
instead of the DD status bit to determine if a new descriptor is ready
to be processed. Refactors the driver by pulling the code responsible
for fetching the receive buffer and synchronizing DMA into a single
function. Also pulled the code responsible for handling buffer
recycling and page counting and distributed it through several functions,
so we can commonize the bits that handle either freeing or recycling the
buffers. Cleans up the code in preparation for us adding support for
build_skb(). Changed the way we handle the maximum frame size for the
receive path so it is more consistent with other drivers.
Paul enables XL722 to use the direct read/write method since it does not
support the AQ command to read/write the control register.
Christopher fixes a case where we miss an arq element if a new one is
added before we enable interrupts and exit the loop.
Jake cleans up a pointless goto statement. Also cleaned up a flag that
was not being used.
Carolyn does round 2 for adding a delay to the receive queue to
accommodate the hardware needs.
====================
Erik Hugne [Wed, 29 Mar 2017 09:22:17 +0000 (11:22 +0200)]
tipc: allow rdm/dgram socketpairs
for socketpairs using connectionless transport, we cache
the respective node local TIPC portid to use in subsequent
calls to send() in the socket's private data.
Jisheng Zhang [Wed, 29 Mar 2017 08:47:19 +0000 (16:47 +0800)]
net: mvneta: set rx mode during resume if interface is running
I found a bug by:
0. boot and start dhcp client
1. echo mem > /sys/power/state
2. resume back immediately
3. don't touch dhcp client to renew the lease
4. ping the gateway. No acks
Usually, after step2, the DHCP lease isn't expired, so in theory we
should resume all back. But in fact, it doesn't. It turns out
the rx mode isn't resumed correctly. This patch fixes it by adding
mvneta_set_rx_mode(dev) in the resume hook if interface is running.
David S. Miller [Wed, 29 Mar 2017 21:05:34 +0000 (14:05 -0700)]
Merge branch 'bnxt_en-fixes'
Michael Chan says:
====================
bnxt_en: Small misc. fixes.
Fix a NULL pointer crash in open failure path, wrong arguments when
printing error messages, and a DMA unmap bug in XDP shutdown path.
====================
Michael Chan [Tue, 28 Mar 2017 23:47:31 +0000 (19:47 -0400)]
bnxt_en: Fix DMA unmapping of the RX buffers in XDP mode during shutdown.
In bnxt_free_rx_skbs(), which is called to free up all RX buffers during
shutdown, we need to unmap the page if we are running in XDP mode.
Fixes: c61fb99cae51 ("bnxt_en: Add RX page mode support.") Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
bnxt_en: Fix NULL pointer dereference in reopen failure path
Net device reset can fail when the h/w or f/w is in a bad state.
Subsequent netdevice open fails in bnxt_hwrm_stat_ctx_alloc().
The cleanup invokes bnxt_hwrm_resource_free() which inturn
calls bnxt_disable_int(). In this routine, the code segment
if (ring->fw_ring_id != INVALID_HW_RING_ID)
BNXT_CP_DB(cpr->cp_doorbell, cpr->cp_raw_cons);
results in NULL pointer dereference as cpr->cp_doorbell is not yet
initialized, and fw_ring_id is zero.
The fix is to initialize cpr fw_ring_id to INVALID_HW_RING_ID before
bnxt_init_chip() is invoked.
cpuidle: powernv: Pass correct drv->cpumask for registration
drv->cpumask defaults to cpu_possible_mask in __cpuidle_driver_init().
On PowerNV platform cpu_present could be less than cpu_possible in cases
where firmware detects the cpu, but it is not available to the OS. When
CONFIG_HOTPLUG_CPU=n, such cpus are not hotplugable at runtime and hence
we skip creating cpu_device.
This breaks cpuidle on powernv where register_cpu() is not called for
cpus in cpu_possible_mask that cannot be hot-added at runtime.
Trying cpuidle_register_device() on cpu without cpu_device will cause
crash like this:
Helge Deller [Wed, 29 Mar 2017 06:25:30 +0000 (08:25 +0200)]
parisc: Avoid stalled CPU warnings after system shutdown
Commit 73580dac7618 ("parisc: Fix system shutdown halt") introduced an endless
loop for systems which don't provide a software power off function. But the
soft lockup detector will detect this and report stalled CPUs after some time.
Avoid those unwanted warnings by disabling the soft lockup detector.
Helge Deller [Sat, 25 Mar 2017 10:59:15 +0000 (11:59 +0100)]
parisc: Clean up fixup routines for get_user()/put_user()
Al Viro noticed that userspace accesses via get_user()/put_user() can be
simplified a lot with regard to usage of the exception handling.
This patch implements a fixup routine for get_user() and put_user() in such
that the exception handler will automatically load -EFAULT into the register
%r8 (the error value) in case on a fault on userspace. Additionally the fixup
routine will zero the target register on fault in case of a get_user() call.
The target register is extracted out of the faulting assembly instruction.
This patch brings a few benefits over the old implementation:
1. Exception handling gets much cleaner, easier and smaller in size.
2. Helper functions like fixup_get_user_skip_1 (all of fixup.S) can be dropped.
3. No need to hardcode %r9 as target register for get_user() any longer. This
helps the compiler register allocator and thus creates less assembler
statements.
4. No dependency on the exception_data contents any longer.
5. Nested faults will be handled cleanly.
Helge Deller [Wed, 29 Mar 2017 19:41:05 +0000 (21:41 +0200)]
parisc: Fix access fault handling in pa_memcpy()
pa_memcpy() is the major memcpy implementation in the parisc kernel which is
used to do any kind of userspace/kernel memory copies.
Al Viro noticed various bugs in the implementation of pa_mempcy(), most notably
that in case of faults it may report back to have copied more bytes than it
actually did.
Fixing those bugs is quite hard in the C-implementation, because the compiler
is messing around with the registers and we are not guaranteed that specific
variables are always in the same processor registers. This makes proper fault
handling complicated.
This patch implements pa_memcpy() in assembler. That way we have correct fault
handling and adding a 64-bit copy routine was quite easy.
David S. Miller [Wed, 29 Mar 2017 18:24:21 +0000 (11:24 -0700)]
Merge tag 'mlx5e-pedit' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Or Gerlitz says:
====================
mlx5e-pedit 2017-03-28
This series adds support for offloading modifications of packet headers using
ConnectX-5 HW header re-write as an action applied during packet steering.
The offloaded SW mechanism is TC's pedit action. The offloading is
supported for E-Switch steering of VF traffic in the SRIOV
switchdev mode and for NIC (non eswitch) RX.
One use-case for this offload on virtual networks, is when the hypervisor
implements flow based router such as Open-Stack's DVR, where L2 headers
of guest packets re-written with routers' MAC addresses and the IP TTL
is decremented.
Another use case (which can be applied in parallel with routing) is
stateless NAT where guest L3/L4 headers are re-written.
The series is built as follows: the 1st six patches are preperations which
don't yet add new functionality, patches 7-8 add the FW APIs (data-structures
and commands) for header re-write, and patch nine allows offloading driver
to access pedit keys.
The 10th patch is somehow the core of the series, where we translate from
the pedit way to represent set of header modification elements to the FW
API for that same matter.
Once a set of HW modification is established, we register it with the FW
and get a modify header ID. When this ID is used with an action during
packet steering, the HW applies the header modification on the packet.
Patches 11 and 12 implement the above logic as an offload for pedit action
for the NIC and E-Switch use-cases.
I'd like to thanks Elijah Shakkour <[email protected]> for implementing
and helping me testing this functionality on HW simulator, before it could
be done with FW.
====================
Florian Fainelli [Tue, 28 Mar 2017 19:57:09 +0000 (12:57 -0700)]
net: phy: Allow building mdio-boardinfo into the kernel
mdio-boardinfo contains code that is helpful for platforms to register
specific MDIO bus devices independent of how CONFIG_MDIO_DEVICE or
CONFIG_PHYLIB will be selected (modular or built-in). In order to make
that possible, let's do the following:
- descend into drivers/net/phy/ unconditionally
- make mdiobus_setup_mdiodev_from_board_info() take a callback argument
which allows us not to expose the internal MDIO board info list and
mutex, yet maintain the logic within the same file
- relocate the code that creates a MDIO device into
drivers/net/phy/mdio_bus.c
- build mdio-boardinfo.o into the kernel as soon as MDIO_DEVICE is
defined (y or m)
Fixes: 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from PHYs") Fixes: 648ea0134069 ("net: phy: Allow pre-declaration of MDIO devices") Signed-off-by: Florian Fainelli <[email protected]> Tested-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Guillaume Nault [Wed, 29 Mar 2017 06:45:29 +0000 (08:45 +0200)]
l2tp: purge socket queues in the .destruct() callback
The Rx path may grab the socket right before pppol2tp_release(), but
nothing guarantees that it will enqueue packets before
skb_queue_purge(). Therefore, the socket can be destroyed without its
queues fully purged.
Fix this by purging queues in pppol2tp_session_destruct() where we're
guaranteed nothing is still referencing the socket.
Fixes: 9e9cb6221aa7 ("l2tp: fix userspace reception on plain L2TP sockets") Signed-off-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Guillaume Nault [Wed, 29 Mar 2017 06:44:59 +0000 (08:44 +0200)]
l2tp: hold tunnel socket when handling control frames in l2tp_ip and l2tp_ip6
The code following l2tp_tunnel_find() expects that a new reference is
held on sk. Either sk_receive_skb() or the discard_put error path will
drop a reference from the tunnel's socket.
This issue exists in both l2tp_ip and l2tp_ip6.
Fixes: a3c18422a4b4 ("l2tp: hold socket before dropping lock in l2tp_ip{, 6}_recv()") Signed-off-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Wed, 29 Mar 2017 15:55:25 +0000 (08:55 -0700)]
Merge branch 'regset' (PTRACE_SETREGSET data leakage)
Merge PTRACE_SETREGSET leakage fixes from Dave Martin:
"This series is the collection of fixes I proposed on this topic, that
have not yet appeared upstream or in the stable branches,
The issue can leak kernel stack, but doesn't appear to allow userspace
to attack the kernel directly. The affected architectures are c6x,
h8300, metag, mips and sparc.
[ Mark Salter points out that c6x has no MMU or other mechanism to
prevent userspace access to kernel code or data on c6x, but it
doesn't hurt to clean that case up too. ]
The bugs arise from use of user_regset_copyin(). Users of
user_regset_copyin() can work in one of two ways:
1) Copy directly to thread_struct or equivalent. (This seems to be
the design assumption of the regset API, and is the most common
approach.)
2) Copy to a local variable and then transfer to thread_struct. (A
significant minority of cases.)
Buggy code typically involves approach 2"
* emailed patches from Dave Martin <[email protected]>:
sparc/ptrace: Preserve previous registers for short regset write
mips/ptrace: Preserve previous registers for short regset write
metag/ptrace: Reject partial NT_METAG_RPIPE writes
metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
metag/ptrace: Preserve previous registers for short regset write
h8300/ptrace: Fix incorrect register transfer count
c6x/ptrace: Remove useless PTRACE_SETREGSET implementation
Dave Martin [Mon, 27 Mar 2017 14:10:56 +0000 (15:10 +0100)]
metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill TXSTATUS, a well-defined default value is used, based on the
task's current value.
Dave Martin [Mon, 27 Mar 2017 14:10:54 +0000 (15:10 +0100)]
h8300/ptrace: Fix incorrect register transfer count
regs_set() and regs_get() are vulnerable to an off-by-1 buffer overrun
if CONFIG_CPU_H8S is set, since this adds an extra entry to
register_offset[] but not to user_regs_struct.
So, iterate over user_regs_struct based on its actual size, not based on
the length of register_offset[].
gpr_set won't work correctly and can never have been tested, and the
correct behaviour is not clear due to the endianness-dependent task
layout.
So, just remove it. The core code will now return -EOPNOTSUPPORT when
trying to set NT_PRSTATUS on this architecture until/unless a correct
implementation is supplied.
Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
wrapping issues. To ensure we are correctly ensuring that the two ESN
structures are the same size compare both the overall size as reported
by xfrm_replay_state_esn_len() and the internal length are the same.
When a new xfrm state is created during an XFRM_MSG_NEWSA call we
validate the user supplied replay_esn to ensure that the size is valid
and to ensure that the replay_window size is within the allocated
buffer. However later it is possible to update this replay_esn via a
XFRM_MSG_NEWAE call. There we again validate the size of the supplied
buffer matches the existing state and if so inject the contents. We do
not at this point check that the replay_window is within the allocated
memory. This leads to out-of-bounds reads and writes triggered by
netlink packets. This leads to memory corruption and the potential for
priviledge escalation.
We already attempt to validate the incoming replay information in
xfrm_new_ae() via xfrm_replay_verify_len(). This confirms that the user
is not trying to change the size of the replay state buffer which
includes the replay_esn. It however does not check the replay_window
remains within that buffer. Add validation of the contained
replay_window.
Lucas Stach [Wed, 22 Mar 2017 11:07:23 +0000 (12:07 +0100)]
drm/etnaviv: (re-)protect fence allocation with GPU mutex
The fence allocation needs to be protected by the GPU mutex, otherwise
the fence seqnos of concurrent submits might not match the insertion order
of the jobs in the kernel ring. This breaks the assumption that jobs
complete with monotonically increasing fence seqnos.
Fixes: d9853490176c (drm/etnaviv: take GPU lock later in the submit process) CC: [email protected] #4.9+ Signed-off-by: Lucas Stach <[email protected]>
Dan Carpenter [Fri, 17 Mar 2017 20:51:20 +0000 (23:51 +0300)]
Btrfs: fix an integer overflow check
This isn't super serious because you need CAP_ADMIN to run this code.
I added this integer overflow check last year but apparently I am
rubbish at writing integer overflow checks... There are two issues.
First, access_ok() works on unsigned long type and not u64 so on 32 bit
systems the access_ok() could be checking a truncated size. The other
issue is that we should be using a stricter limit so we don't overflow
the kzalloc() setting ctx->clone_roots later in the function after the
access_ok():
Liu Bo [Fri, 24 Mar 2017 22:04:50 +0000 (15:04 -0700)]
Btrfs: bring back repair during read
Commit 20a7db8ab3f2 ("btrfs: add dummy callback for readpage_io_failed
and drop checks") made a cleanup around readpage_io_failed_hook, and
it was supposed to keep the original sematics, but it also
unexpectedly disabled repair during read for dup, raid1 and raid10.
This fixes the problem by letting data's inode call the generic
readpage_io_failed callback by returning -EAGAIN from its
readpage_io_failed_hook in order to notify end_bio_extent_readpage to
do the rest. We don't call it directly because the generic one takes
an offset from end_bio_extent_readpage() to calculate the index in the
checksum array and inode's readpage_io_failed_hook doesn't offer that
offset.
Johannes Berg [Wed, 29 Mar 2017 12:15:24 +0000 (14:15 +0200)]
mac80211: unconditionally start new netdev queues with iTXQ support
When internal mac80211 TXQs aren't supported, netdev queues must
always started out started even when driver queues are stopped
while the interface is added. This is necessary because with the
internal TXQ support netdev queues are never stopped and packet
scheduling/dropping is done in mac80211.
Cc: [email protected] # 4.9+ Fixes: 80a83cfc434b1 ("mac80211: skip netdev queue control with software queuing") Reported-and-tested-by: Sven Eckelmann <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
usb: phy: isp1301: Fix build warning when CONFIG_OF is disabled
Commit fd567653bdb9 ("usb: phy: isp1301: Add OF device ID table")
added an OF device ID table, but used the of_match_ptr() macro
that will lead to a build warning if CONFIG_OF symbol is disabled:
drivers/usb/phy//phy-isp1301.c:36:34: warning: ‘isp1301_of_match’ defined but not used [-Wunused-const-variable=]
static const struct of_device_id isp1301_of_match[] = {
^~~~~~~~~~~~~~~~
Fixes: fd567653bdb9 ("usb: phy: isp1301: Add OF device ID table") Reported-by: Arnd Bergmann <[email protected]> Signed-off-by: Javier Martinez Canillas <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Mathias Nyman [Tue, 28 Mar 2017 12:55:30 +0000 (15:55 +0300)]
xhci: Manually give back cancelled URB if we can't queue it for cancel
xhci needs to take care of four scenarios when asked to cancel a URB.
1 URB is not queued or already given back.
usb_hcd_check_unlink_urb() will return an error, we pass the error on
2 We fail to find xhci internal structures from urb private data such as
virtual device and endpoint ring.
Give back URB immediately, can't do anything about internal structures.
3 URB private data has valid pointers to xhci internal data, but host is
not responding.
give back URB immedately and remove the URB from the endpoint lists.
4 Everyting is working
add URB to cancel list, queue a command to stop the endpoint, after
which the URB can be turned to no-op or skipped, removed from lists,
and given back.
We failed to give back the urb in case 2 where the correct device and
endpoint pointers could not be retrieved from URB private data.
This caused a hang on Dell Inspiron 5558/0VNM2T at resume from suspend
as urb was never returned.
Mathias Nyman [Tue, 28 Mar 2017 12:55:29 +0000 (15:55 +0300)]
xhci: Set URB actual length for stopped control transfers
A control transfer that stopped at the status stage incorrectly
warned about a "unexpected TRB Type 4", and did not set the
transferred actual_length for the URB.
The URB actual_length for control transfers should contain the
bytes transferred in the data stage.
Bytes of a partially sent setup stage and missing bytes from
status stage should be left out.
Alexander Duyck [Tue, 14 Mar 2017 17:15:27 +0000 (10:15 -0700)]
i40e/i40evf: Change the way we limit the maximum frame size for Rx
This patch changes the way we handle the maximum frame size for the Rx
path. Previously we were rounding up to 2K for a 1500 MTU and then brining
the max frame size down to MTU plus a fixed amount. With this patch
applied what we now do is limit the maximum frame to 1.5K minus the value
for NET_IP_ALIGN for standard MTU, and for any MTU greater than 1500 we
allow up to the maximum frame size. This makes the behavior more
consistent with the other drivers such as igb which had similar logic. In
addition it reduces the test matrix for MTU since we only have two max
frame sizes that are handled for Rx now.
Alexander Duyck [Tue, 14 Mar 2017 17:15:26 +0000 (10:15 -0700)]
i40e/i40evf: Add legacy-rx private flag to allow fallback to old Rx flow
This patch adds a control which will allow us to toggle into and out of the
legacy Rx mode. The legacy Rx mode is what we currently do when performing
Rx. As I make further changes what should happen is that the driver will
fall back to the behavior for Rx as of this patch should the "legacy-rx"
flag be set to on.
Alexander Duyck [Tue, 14 Mar 2017 17:15:25 +0000 (10:15 -0700)]
i40e/i40evf: Break i40e_fetch_rx_buffer up to allow for reuse of frag code
This patch is meant to clean up the code in preparation for us adding
support for build_skb. Specifically we deconstruct i40e_fetch_buffer into
several functions so that those functions can later be reused when we add a
path for build_skb.
Specifically with this change we split out the code for adding a page to an
exiting skb.
Alexander Duyck [Tue, 14 Mar 2017 17:15:24 +0000 (10:15 -0700)]
i40e/i40evf: Pull out code for cleaning up Rx buffers
This patch pulls out the code responsible for handling buffer recycling and
page counting and distributes it through several functions. This allows us
to commonize the bits that handle either freeing or recycling the buffers.
As far as the page count tracking one change to the logic is that
pagecnt_bias is decremented as soon as we call i40e_get_rx_buffer. It is
then the responsibility of the function that pulls the data to either
increment the pagecnt_bias if the buffer can be recycled as-is, or to
update page_offset so that we are pointing at the correct location for
placement of the next buffer.
Alexander Duyck [Tue, 14 Mar 2017 17:15:23 +0000 (10:15 -0700)]
i40e/i40evf: Pull code for grabbing and syncing rx_buffer from fetch_buffer
This patch pulls the code responsible for fetching the Rx buffer and
synchronizing DMA into a function, specifically called i40e_get_rx_buffer.
The general idea is to allow for better code reuse by pulling this out of
i40e_fetch_rx_buffer. We dropped a couple of prefetches since the time
between the prefetch being called and the data being accessed was too small
to be useful.
Alexander Duyck [Tue, 14 Mar 2017 17:15:22 +0000 (10:15 -0700)]
i40e/i40evf: Use length to determine if descriptor is done
This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads as we don't really even
need the DD bit if going from a 0 to a non-zero value on size is enough to
inform us that the packet has been completed.
Jacob Keller [Fri, 10 Mar 2017 20:22:04 +0000 (12:22 -0800)]
i40e: remove a useless goto statement
The goto found here for when in MFP mode is pointless. It jumps to the
end of a series of if blocks. However, right after this statement is
a closing '}' for this if block, which will result in the program flow
going to the exact same location as the goto statement indicates. Thus,
regardless of whether we are in MFP mode, the program flow will resume
from the same location.
This arose due to various refactoring which did not notice that this
goto became essentially a no-op.
To properly understand this diff you will need to view a larger context
than is given by default.
i40e: Check for new arq elements before leaving the adminq subtask loop
Fix a case where we miss an arq element if a new one is added before we
enable interrupts and exit the arq subtask loop. This occurs frequently
with RDMA running on Windows VF and causes long delays that prevent SMB
from establishing connections.
Alexander Duyck [Fri, 10 Mar 2017 20:22:01 +0000 (12:22 -0800)]
i40e: Clean up handling of private flags
This patch cleans up and addresses several issues in the way that i40e
handles private flags. Previously the code was choosing fixed bits and
trying to match them up with strings in a somewhat haphazard way. This
resulted in the possibility for adding a new bit and causing a mismatch as
the private flags are linear bits starting at 0, and the private flags in
the driver were split up over a group specific to the PF and a group that
was global.
What this change does is define an array of structs used to represent the
private flags. Contained within the structs are the bits necessary to know
which flags to set and/or clear depending on the state of the bit. By
doing this we can add new bits in the future with minimal overhead and
avoid creating possible mis-matches should we need to remove a flag based
on compile options.