tap: reference to KVA of an unloaded module causes kernel panic
The commit 9a393b5d5988 ("tap: tap as an independent module") created a
separate tap module that implements tap functionality and exports
interfaces that will be used by macvtap and ipvtap modules to create
create respective tap devices.
However, that patch introduced a regression wherein the modules macvtap
and ipvtap can be removed (through modprobe -r) while there are
applications using the respective /dev/tapX devices. These applications
cause kernel to hold reference to /dev/tapX through 'struct cdev
macvtap_cdev' and 'struct cdev ipvtap_dev' defined in macvtap and ipvtap
modules respectively. So, when the application is later closed the
kernel panics because we are referencing KVA that is present in the
unloaded modules.
----------8<------- Example ----------8<----------
$ sudo ip li add name mv0 link enp7s0 type macvtap
$ sudo ip li show mv0 |grep mv0| awk -e '{print $1 $2}'
14:mv0@enp7s0:
$ cat /dev/tap14 &
$ lsmod |egrep -i 'tap|vlan'
macvtap 16384 0
macvlan 24576 1 macvtap
tap 24576 3 macvtap
$ sudo modprobe -r macvtap
$ fg
cat /dev/tap14
^C
<...system panics...>
BUG: unable to handle kernel paging request at ffffffffa038c500
IP: cdev_put+0xf/0x30
----------8<-----------------8<----------
The fix is to set cdev.owner to the module that creates the tap device
(either macvtap or ipvtap). With this set, the operations (in
fs/char_dev.c) on char device holds and releases the module through
cdev_get() and cdev_put() and will not allow the module to unload
prematurely.
Fixes: 9a393b5d5988ea4e (tap: tap as an independent module) Signed-off-by: Girish Moodalbail <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Eric Dumazet [Fri, 27 Oct 2017 04:21:40 +0000 (21:21 -0700)]
tcp: refresh tp timestamp before tcp_mtu_probe()
In the unlikely event tcp_mtu_probe() is sending a packet, we
want tp->tcp_mstamp being as accurate as possible.
This means we need to call tcp_mstamp_refresh() a bit earlier in
tcp_write_xmit().
Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jason Wang [Fri, 27 Oct 2017 03:05:44 +0000 (11:05 +0800)]
tuntap: properly align skb->head before building skb
An unaligned alloc_frag->offset caused by previous allocation will
result an unaligned skb->head. This will lead unaligned
skb_shared_info and then unaligned dataref which requires to be
aligned for accessing on some architecture. Fix this by aligning
alloc_frag->offset before the frag refilling.
Linus Torvalds [Sat, 28 Oct 2017 03:41:05 +0000 (20:41 -0700)]
Merge tag 'for-linus-4.14c-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- a fix for the Xen gntdev device repairing an issue in case of partial
failure of mapping multiple pages of another domain
- a fix of a regression in the Xen balloon driver introduced in 4.13
- a build fix for Xen on ARM which will trigger e.g. for Linux RT
- a maintainers update for pvops (not really Xen, but carrying through
this tree just for convenience)
* tag 'for-linus-4.14c-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
maintainers: drop Chris Wright from pvops
arm/xen: don't inclide rwlock.h directly.
xen: fix booting ballooned down hvm guest
xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap()
Linus Torvalds [Sat, 28 Oct 2017 03:38:47 +0000 (20:38 -0700)]
Merge tag 'arc-4.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- Fixes for HSDK platform
- module build error for !LLSC config
* tag 'arc-4.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: unbork module link errors with !CONFIG_ARC_HAS_LLSC
ARC: [plat-hsdk] Increase SDIO CIU frequency to 50000000Hz
ARC: [plat-hsdk] select CONFIG_RESET_HSDK from Kconfig
Linus Torvalds [Sat, 28 Oct 2017 03:35:31 +0000 (20:35 -0700)]
Fix tracing sample code warning.
Commit 6575257c60e1 ("tracing/samples: Fix creation and deletion of
simple_thread_fn creation") introduced a new warning due to using a
boolean as a counter.
Just make it "int".
Fixes: 6575257c60e1 ("tracing/samples: Fix creation and deletion of simple_thread_fn creation") Cc: Steven Rostedt <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Linus Torvalds [Sat, 28 Oct 2017 00:19:39 +0000 (17:19 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- revert a /dev/mem restriction change that crashes with certain boot
parameters
- an AMD erratum fix for cases where the BIOS doesn't apply it
- fix unwinder debuginfo
- improve ORC unwinder warning printouts"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "x86/mm: Limit mmap() of /dev/mem to valid physical addresses"
x86/unwind: Show function name+offset in ORC error messages
x86/entry: Fix idtentry unwind hint
x86/cpu/AMD: Apply the Erratum 688 fix when the BIOS doesn't
Linus Torvalds [Sat, 28 Oct 2017 00:17:25 +0000 (17:17 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Update the <linux/swait.h> documentation to discourage their use"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/swait: Document it clearly that the swait facilities are special and shouldn't be used
Linus Torvalds [Sat, 28 Oct 2017 00:15:49 +0000 (17:15 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Ingo Molnar:
"A fix for a misplaced permission check that can leave perf PT or LBR
disabled (on Intel CPUs) permanently until the next reboot"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/bts: Fix exclusive event reference leak
Linus Torvalds [Sat, 28 Oct 2017 00:14:32 +0000 (17:14 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Two fixes: an ARM fix for KASLR interaction with hibernation, plus an
efi_test crash fix"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub/arm: Don't randomize runtime regions when CONFIG_HIBERNATION=y
efi/efi_test: Prevent an Oops in efi_runtime_query_capsulecaps()
Andrew Duggan [Wed, 25 Oct 2017 16:30:16 +0000 (09:30 -0700)]
Input: synaptics-rmi4 - limit the range of what GPIOs are buttons
By convention the first 6 bits of F30 Ctrl 2 and 3 are used to signify
GPIOs which are connected to buttons. Additional GPIOs may be used as
input GPIOs to signal the touch controller of some event
(ie disable touchpad). These additional GPIOs may meet the criteria of
a button in rmi_f30_is_valid_button() but should not be considered
buttons. This patch limits the GPIOs which are mapped to buttons to just
the first 6.
Dmitry Torokhov [Mon, 23 Oct 2017 23:46:00 +0000 (16:46 -0700)]
Input: gtco - fix potential out-of-bound access
parse_hid_report_descriptor() has a while (i < length) loop, which
only guarantees that there's at least 1 byte in the buffer, but the
loop body can read multiple bytes which causes out-of-bounds access.
David S. Miller [Fri, 27 Oct 2017 15:05:34 +0000 (00:05 +0900)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2017-10-26
This series contains fixes to e1000, igb, ixgbe and i40e.
Vincenzo Maffione fixes a potential race condition which would result in
the interface being up but transmits are disabled in the hardware.
Colin Ian King fixes a possible NULL pointer dereference in e1000, which
was found by Coverity.
Jean-Philippe Brucker fixes a possible kernel panic when a driver cannot
map a transmit buffer, which is caused by an erroneous test.
Alex provides a fix for ixgbe, which is a partial revert of the commit ffed21bcee7a ("ixgbe: Don't bother clearing buffer memory for descriptor rings")
because the previous commit messed up the exception handling path by
adding the count back in when we did not need to. Also fixed a typo,
where the transmit ITR setting was being used to determine if we were
using adaptive receive interrupt moderation or not. Lastly, fixed a
memory leak by including programming descriptors in the cleaned count.
====================
Xin Long [Thu, 26 Oct 2017 11:27:17 +0000 (19:27 +0800)]
ip6_gre: update dst pmtu if dev mtu has been updated by toobig in __gre6_xmit
When receiving a Toobig icmpv6 packet, ip6gre_err would just set
tunnel dev's mtu, that's not enough. For skb_dst(skb)'s pmtu may
still be using the old value, it has no chance to be updated with
tunnel dev's mtu.
Jianlin found this issue by reducing route's mtu while running
netperf, the performance went to 0.
ip6ip6 and ip4ip6 tunnel can work well with this, as they lookup
the upper dst and update_pmtu it's pmtu or icmpv6_send a Toobig
to upper socket after setting tunnel dev's mtu.
We couldn't do that for ip6_gre, as gre's inner packet could be
any protocol, it's difficult to handle them (like lookup upper
dst) in a good way.
So this patch is to fix it by updating skb_dst(skb)'s pmtu when
dev->mtu < skb_dst(skb)'s pmtu in tx path. It's safe to do this
update there, as usually dev->mtu <= skb_dst(skb)'s pmtu and no
performance regression can be caused by this.
Xin Long [Thu, 26 Oct 2017 11:19:56 +0000 (19:19 +0800)]
ipip: only increase err_count for some certain type icmp in ipip_err
t->err_count is used to count the link failure on tunnel and an err
will be reported to user socket in tx path if t->err_count is not 0.
udp socket could even return EHOSTUNREACH to users.
Since commit fd58156e456d ("IPIP: Use ip-tunneling code.") removed
the 'switch check' for icmp type in ipip_err(), err_count would be
increased by the icmp packet with ICMP_EXC_FRAGTIME code. an link
failure would be reported out due to this.
In Jianlin's case, when receiving ICMP_EXC_FRAGTIME a icmp packet,
udp netperf failed with the err:
send_data: data send error: No route to host (errno 113)
We expect this error reported from tunnel to socket when receiving
some certain type icmp, but not ICMP_EXC_FRAGTIME, ICMP_SR_FAILED
or ICMP_PARAMETERPROB ones.
This patch is to bring 'switch check' for icmp type back to ipip_err
so that it only reports link failure for the right type icmp, just as
in ipgre_err() and ipip6_err().
Jose Abreu [Thu, 26 Oct 2017 09:07:12 +0000 (10:07 +0100)]
net: stmmac: First Queue must always be in DCB mode
According to DWMAC databook the first queue operating mode
must always be in DCB.
As MTL_QUEUE_DCB = 1, we need to always set the first queue
operating mode to DCB otherwise driver will think that queue
is in AVB mode (because MTL_QUEUE_AVB = 0).
Jose Abreu [Thu, 26 Oct 2017 08:51:33 +0000 (09:51 +0100)]
net: stmmac: dwc-qos-eth: Fix typo in DT bindings parsing
According to DT bindings documentation we are expecting a
property called "snps,read-requests" but we are parsing
instead a property called "read,read-requests".
The series includes some misc fixes for mlx5 core and etherent driver.
Please pull and let me know if there's any problem.
For -Stable:
net/mlx5e: Properly deal with encap flows add/del under neigh update (kernels >= 4.12)
net/mlx5: Fix health work queue spin lock to IRQ safe (kernels >= 4.13)
====================
There's unanticipated interaction with some boot parameters like 'mem=',
which now cause the new checks via valid_mmap_phys_addr_range() to be too
restrictive, crashing a Qemu bootup in fact, as reported by Fengguang Wu.
So while the motivation of the change is still entirely valid, we
need a few more rounds of testing to get it right - it's way too late
after -rc6, so revert it for now.
Here are:
* follow-up fixes for the WoWLAN security issue, to fix a
partial TKIP key material problem and to use crypto_memneq()
* a change for better enforcement of FQ's memory limit
* a disconnect/connect handling fix, and
* a user rate mask validation fix
====================
Xiong Zhang [Fri, 13 Oct 2017 22:34:47 +0000 (06:34 +0800)]
drm/i915/gvt: Adding ACTHD mmio read handler
When a workload is too heavy to finish it in gpu hang check timer
intervals(1.5), gpu hang check function will check ACTHD register
value to decide whether gpu is real dead or not. On real hw,
ACTHD is updated by HW when workload is running, then host kernel
won't think it is gpu hang. while guest kernel always read a constant
ACTHD value as GVT doesn't supply ACTHD emulate handler, then
guest kernel detects a fake gpu hang.
To remove such guest fake gpu hang, this patch supply ACTHD
mmio read handler which read real HW ACTHD register directly.
Zhenyu Wang [Thu, 19 Oct 2017 05:54:06 +0000 (13:54 +0800)]
drm/i915/gvt: properly check per_ctx bb valid state
Need to check valid state for per_ctx bb and bypass batch buffer
combine for scan if necessary. Otherwise adding invalid MI batch
buffer start cmd for per_ctx bb will cause scan failure, which is
taken as -EFAULT now so vGPU would be put in failsafe. This trys
to fix that by checking per_ctx bb valid state. Also remove old
invalid WARNING that indirect ctx bb shouldn't depend on valid
per_ctx bb.
This caused a regression:
"The specific problem is that dnsmasq refuses to start on openSUSE Leap
42.2. The specific cause is that and attempt to open a PF_LOCAL socket
gets EACCES. This means that networking doesn't function on a system
with a 4.14-rc2 system."
Sadly, the developers involved seemed to be in denial for several weeks
about this, delaying the revert. This has not been a good release for
the security subsystem, and this area needs to change development
practices.
Linus Torvalds [Thu, 26 Oct 2017 17:10:39 +0000 (19:10 +0200)]
Merge tag 'pm-4.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"This fixes a device power management quality of service (PM QoS)
framework implementation issue causing 'no restriction' requests for
device resume latency, including 'no restriction' set by user space,
to effectively override requests with specific device resume latency
requirements.
It is late in the cycle, but the bug in question is in the 'user space
can trigger unexpected behavior' category and the fix is
stable-candidate, so here it goes"
* tag 'pm-4.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / QoS: Fix device resume latency PM QoS
alpha/PCI: Move pci_map_irq()/pci_swizzle() out of initdata
The introduction of {map/swizzle}_irq() hooks in the struct pci_host_bridge
allowed to replace the pci_fixup_irqs() PCI IRQ allocation in alpha arch
PCI code with per-bridge map/swizzle functions with commit 0e4c2eeb758a
("alpha/PCI: Replace pci_fixup_irqs() call with host bridge IRQ mapping
hooks").
As a side effect of converting PCI IRQ allocation to the struct
pci_host_bridge {map/swizzle}_irq() hooks mechanism, the actual PCI IRQ
allocation function (ie pci_assign_irq()) is carried out per-device in
pci_device_probe() that is called when a PCI device driver is about to be
probed.
This means that, for drivers compiled as loadable modules, the actual PCI
device IRQ allocation can now happen after the system has booted so the
struct pci_host_bridge {map/swizzle}_irq() hooks pci_assign_irq() relies on
must stay valid after the system has booted so that PCI core can carry out
PCI IRQ allocation correctly.
Most of the alpha board structures pci_map_irq() and pci_swizzle() hooks
(that are used to initialize their struct pci_host_bridge equivalent
through the alpha_mv global variable - that represents the struct
alpha_machine_vector of the running kernel) are marked as
__init/__initdata; this causes freed memory dereferences when PCI IRQ
allocation is carried out after the kernel has booted (ie when loading PCI
drivers as loadable module) because when the kernel tries to bind the PCI
device to its (module) driver, the function pci_assign_irq() is called,
that in turn retrieves the struct pci_host_bridge {map/swizzle}_irq() hooks
to carry out PCI IRQ allocation; if those hooks are marked as __init
code/__initdata they point at freed/invalid memory.
Fix the issue by removing the __init/__initdata markers from all subarch
struct alpha_machine_vector.pci_map_irq()/pci_swizzle() functions (and
data).
Linus Torvalds [Thu, 26 Oct 2017 15:08:48 +0000 (17:08 +0200)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few select fixes that should go into this series. Mainly for NVMe,
but also a single stable fix for nbd from Josef"
* 'for-linus' of git://git.kernel.dk/linux-block:
nbd: handle interrupted sendmsg with a sndtimeo set
nvme-rdma: Fix error status return in tagset allocation failure
nvme-rdma: Fix possible double free in reconnect flow
nvmet: synchronize sqhd update
nvme-fc: retry initial controller connections 3 times
nvme-fc: fix iowait hang
Linus Torvalds [Thu, 26 Oct 2017 15:06:35 +0000 (17:06 +0200)]
Merge tag 'spi-fix-v4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There are a bunch of device specific fixes (more than I'd like, I've
been lax sending these) plus one important core fix for the conversion
to use an IDR for bus number allocation which avoids issues with
collisions when some but not all of the buses in the system have a
fixed bus number specified.
The Armada changes are rather large, specificially "spi: armada-3700:
Fix padding when sending not 4-byte aligned data", but it's a storage
corruption issue and there's things like indentation changes which
make it look bigger than it really is. It's been cooking in -next for
quite a while now and is part of the reason for the delay"
* tag 'spi-fix-v4.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fix IDR collision on systems with both fixed and dynamic SPI bus numbers
spi: bcm-qspi: Fix use after free in bcm_qspi_probe() in error path
spi: a3700: Return correct value on timeout detection
spi: uapi: spidev: add missing ioctl header
spi: stm32: Fix logical error in stm32_spi_prepare_mbr()
spi: armada-3700: Fix padding when sending not 4-byte aligned data
spi: armada-3700: Fix failing commands with quad-SPI
Alexander Duyck [Sun, 22 Oct 2017 01:12:29 +0000 (18:12 -0700)]
i40e: Add programming descriptors to cleaned_count
This patch updates the i40e driver to include programming descriptors in
the cleaned_count. Without this change it becomes possible for us to leak
memory as we don't trigger a large enough allocation when the time comes to
allocate new buffers and we end up overwriting a number of rx_buffers equal
to the number of programming descriptors we encountered.
Fixes: 0e626ff7ccbf ("i40e: Fix support for flow director programming status") Signed-off-by: Alexander Duyck <[email protected]> Tested-by: Anders K. Pedersen <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
Alexander Duyck [Fri, 20 Oct 2017 20:59:20 +0000 (13:59 -0700)]
i40e: Fix incorrect use of tx_itr_setting when checking for Rx ITR setup
It looks like there was either a copy/paste error or just a typo that
resulted in the Tx ITR setting being used to determine if we were using
adaptive Rx interrupt moderation or not.
This patch fixes the typo.
Fixes: 65e87c0398f5 ("i40evf: support queue-specific settings for interrupt moderation") Signed-off-by: Alexander Duyck <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
Alexander Duyck [Thu, 19 Oct 2017 21:07:13 +0000 (17:07 -0400)]
ixgbe: Fix Tx map failure path
This patch is a partial revert of "ixgbe: Don't bother clearing buffer
memory for descriptor rings". Specifically I messed up the exception
handling path a bit and this resulted in us incorrectly adding the count
back in when we didn't need to.
In order to make this simpler I am reverting most of the exception handling
path change and instead just replacing the bit that was handled by the
unmap_and_free call.
Fixes: ffed21bcee7a ("ixgbe: Don't bother clearing buffer memory for descriptor rings") Signed-off-by: Alexander Duyck <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
Colin Ian King [Fri, 22 Sep 2017 17:13:48 +0000 (18:13 +0100)]
e1000: avoid null pointer dereference on invalid stat type
Currently if the stat type is invalid then data[i] is being set
either by dereferencing a null pointer p, or it is reading from
an incorrect previous location if we had a valid stat type
previously. Fix this by skipping over the read of p on an invalid
stat type.
Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")
e1000: fix race condition between e1000_down() and e1000_watchdog
This patch fixes a race condition that can result into the interface being
up and carrier on, but with transmits disabled in the hardware.
The bug may show up by repeatedly IFF_DOWN+IFF_UP the interface, which
allows e1000_watchdog() interleave with e1000_down().
CPU x CPU y
--------------------------------------------------------------------
e1000_down():
netif_carrier_off()
e1000_watchdog():
if (carrier == off) {
netif_carrier_on();
enable_hw_transmit();
}
disable_hw_transmit();
e1000_watchdog():
/* carrier on, do nothing */
Juergen Gross [Thu, 26 Oct 2017 09:50:56 +0000 (11:50 +0200)]
xen: fix booting ballooned down hvm guest
Commit 96edd61dcf44362d3ef0bed1a5361e0ac7886a63 ("xen/balloon: don't
online new memory initially") introduced a regression when booting a
HVM domain with memory less than mem-max: instead of ballooning down
immediately the system would try to use the memory up to mem-max
resulting in Xen crashing the domain.
For HVM domains the current size will be reflected in Xenstore node
memory/static-max instead of memory/target.
Additionally we have to trigger the ballooning process at once.
Double free of skb_array in tap module is causing kernel panic. When
tap_set_queue() fails we free skb_array right away by calling
skb_array_cleanup(). However, later on skb_array_cleanup() is called
again by tap_sock_destruct through sock_put(). This patch fixes that
issue.
Yousuk Seung [Tue, 24 Oct 2017 23:44:42 +0000 (16:44 -0700)]
tcp: call tcp_rate_skb_sent() when retransmit with unaligned skb->data
Current implementation calls tcp_rate_skb_sent() when tcp_transmit_skb()
is called when it clones skb only. Not calling tcp_rate_skb_sent() is OK
for all such code paths except from __tcp_retransmit_skb() which happens
when skb->data address is not aligned. This may rarely happen e.g. when
small amount of data is sent initially and the receiver partially acks
odd number of bytes for some reason, possibly malicious.
Eric Dumazet [Tue, 24 Oct 2017 15:20:31 +0000 (08:20 -0700)]
tcp/dccp: fix other lockdep splats accessing ireq_opt
In my first attempt to fix the lockdep splat, I forgot we could
enter inet_csk_route_req() with a freshly allocated request socket,
for which refcount has not yet been elevated, due to complex
SLAB_TYPESAFE_BY_RCU rules.
We either are in rcu_read_lock() section _or_ we own a refcount on the
request.
Correct RCU verb to use here is rcu_dereference_check(), although it is
not possible to prove we actually own a reference on a shared
refcount :/
In v2, I added ireq_opt_deref() helper and use in three places, to fix other
possible splats.
HĂ¥kon Bugge [Tue, 24 Oct 2017 14:16:28 +0000 (16:16 +0200)]
rds: Fix inaccurate accounting of unsignaled wrs
The number of unsignaled work-requests posted to the IB send queue is
tracked by a counter in the rds_ib_connection struct. When it reaches
zero, or the caller explicitly asks for it, the send-signaled bit is
set in send_flags and the counter is reset. This is performed by the
rds_ib_set_wr_signal_state() function.
However, this function is not always used which yields inaccurate
accounting. This commit fixes this, re-factors a code bloat related to
the matter, and makes the actual parameter type to the function
consistent.
David S. Miller [Thu, 26 Oct 2017 08:17:45 +0000 (17:17 +0900)]
Merge tag 'linux-can-fixes-for-4.14-20171024' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2017-10-24
here's another pull request for net/master.
The patch by Gerhard Bertelsmann fixes the CAN_CTRLMODE_LOOPBACK in the
sun4i driver. Two patches by Jimmy Assarsson for the kvaser_usb driver
fix a print in the error path of the kvaser_usb_close() and remove a
wrong warning message with the Leaf v2 firmware version v4.1.844.
====================
Antoine Tenart [Tue, 24 Oct 2017 09:41:28 +0000 (11:41 +0200)]
net: mvpp2: do not sleep in set_rx_mode
This patch replaces GFP_KERNEL by GFP_ATOMIC to avoid sleeping in the
ndo_set_rx_mode() call which is called with BH disabled.
Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Antoine Tenart [Tue, 24 Oct 2017 09:41:27 +0000 (11:41 +0200)]
net: mvpp2: fix invalid parameters order when calling the tcam init
When calling mvpp2_prs_mac_multi_set() from mvpp2_prs_mac_init(), two
parameters (the port index and the table index) are inverted. Fixes
this.
Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Antoine Tenart [Tue, 24 Oct 2017 09:41:26 +0000 (11:41 +0200)]
net: mvpp2: fix typo in the tcam setup
This patch fixes a typo in the mvpp2_prs_tcam_data_cmp() function, as
the shift value is inverted with the data.
Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net/mlx5e: DCBNL, Implement tc with ets type and zero bandwidth
Previously, tc with ets type and zero bandwidth is not accepted
by driver. This behavior does not follow the IEEE802.1qaz spec.
If there are tcs with ets type and zero bandwidth, these tcs are
assigned to the lowest priority tc_group #0. We equally distribute
100% bw of the tc_group #0 to these zero bandwidth ets tcs.
Also, the non zero bandwidth ets tcs are assigned to tc_group #1.
If there is no zero bandwidth ets tc, the non zero bandwidth ets tcs
are assigned to tc_group #0.
Or Gerlitz [Tue, 17 Oct 2017 10:33:43 +0000 (12:33 +0200)]
net/mlx5e: Properly deal with encap flows add/del under neigh update
Currently, the encap action offload is handled in the actions parse
function and not in mlx5e_tc_add_fdb_flow() where we deal with all
the other aspects of offloading actions (vlan, modify header) and
the rule itself.
When the neigh update code (mlx5e_tc_encap_flows_add()) recreates the
encap entry and offloads the related flows, we wrongly call again into
mlx5e_tc_add_fdb_flow(), this for itself would cause us to handle
again the offloading of vlans and header re-write which puts things
in non consistent state and step on freed memory (e.g the modify
header parse buffer which is already freed).
Since on error, mlx5e_tc_add_fdb_flow() detaches and may release the
encap entry, it causes a corruption at the neigh update code which goes
over the list of flows associated with this encap entry, or double free
when the tc flow is later deleted by user-space.
When neigh update (mlx5e_tc_encap_flows_del()) unoffloads the flows related
to an encap entry which is now invalid, we do a partial repeat of the eswitch
flow removal code which is wrong too.
To fix things up we do the following:
(1) handle the encap action offload in the eswitch flow add function
mlx5e_tc_add_fdb_flow() as done for the other actions and the rule itself.
(2) modify the neigh update code (mlx5e_tc_encap_flows_add/del) to only
deal with the encap entry and rules delete/add and not with any of
the other offloaded actions.
Fixes: 232c001398ae ('net/mlx5e: Add support to neighbour update flow') Signed-off-by: Or Gerlitz <[email protected]> Reviewed-by: Paul Blakey <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Huy Nguyen [Wed, 4 Oct 2017 22:58:21 +0000 (17:58 -0500)]
net/mlx5: Delay events till mlx5 interface's add complete for pci resume
mlx5_ib_add is called during mlx5_pci_resume after a pci error.
Before mlx5_ib_add completes, there are multiple events which trigger
function mlx5_ib_event. This cause kernel panic because mlx5_ib_event
accesses unitialized resources.
The fix is to extend Erez Shitrit's patch <97834eba7c19>
("net/mlx5: Delay events till ib registration ends") to cover
the pci resume code path.
Trace:
mlx5_core 0001:01:00.6: mlx5_pci_resume was called
mlx5_core 0001:01:00.6: firmware version: 16.20.1011
mlx5_core 0001:01:00.6: mlx5_attach_interface:164:(pid 779):
mlx5_ib_event:2996:(pid 34777): warning: event on port 1
mlx5_ib_event:2996:(pid 34782): warning: event on port 1
Unable to handle kernel paging request for data at address 0x0001c104
Faulting instruction address: 0xd000000008f411fc
Oops: Kernel access of bad area, sig: 11 [#1]
...
...
Call Trace:
[c000000fff77bb70] [d000000008f4119c] mlx5_ib_event+0x64/0x470 [mlx5_ib] (unreliable)
[c000000fff77bc60] [d000000008e67130] mlx5_core_event+0xb8/0x210 [mlx5_core]
[c000000fff77bd10] [d000000008e4bd00] mlx5_eq_int+0x528/0x860[mlx5_core]
Moshe Shemesh [Thu, 19 Oct 2017 11:14:29 +0000 (14:14 +0300)]
net/mlx5: Fix health work queue spin lock to IRQ safe
spin_lock/unlock of health->wq_lock should be IRQ safe.
It was changed to spin_lock_irqsave since adding commit 0179720d6be2
("net/mlx5: Introduce trigger_health_work function") which uses
spin_lock from asynchronous event (IRQ) context.
Thus, all spin_lock/unlock of health->wq_lock should have been moved
to IRQ safe mode.
However, one occurrence on new code using this lock missed that
change, resulting in possible deadlock:
kernel: Possible unsafe locking scenario:
kernel: CPU0
kernel: ----
kernel: lock(&(&health->wq_lock)->rlock);
kernel: <Interrupt>
kernel: lock(&(&health->wq_lock)->rlock);
kernel: #012 *** DEADLOCK ***
Fixes: 2a0165a034ac ("net/mlx5: Cancel delayed recovery work when unloading the driver") Signed-off-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Linus Torvalds [Thu, 26 Oct 2017 06:11:44 +0000 (08:11 +0200)]
Merge tag 'hwmon-for-linus-v4.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix initial temperature readings for TMP102
- Fix timeouts in DA9052 driver by increasing its sampling rate
* tag 'hwmon-for-linus-v4.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (tmp102) Fix first temperature reading
hwmon: (da9052) Increase sample rate when using TSI
Linus Torvalds [Thu, 26 Oct 2017 06:02:42 +0000 (08:02 +0200)]
Merge tag 'sound-4.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just two HD-audio fixups for a recent Realtek codec model. It's pretty
safe to apply (and unsurprisingly boring)"
* tag 'sound-4.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - fix headset mic problem for Dell machines with alc236
ALSA: hda/realtek - Add support for ALC236/ALC3204
Dave Airlie [Thu, 26 Oct 2017 04:49:44 +0000 (14:49 +1000)]
Merge branch 'drm-next-4.15' of git://people.freedesktop.org/~agd5f/linux into drm-next
Just a few fixes for 4.15.
* 'drm-next-4.15' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/amdgpu: Remove workaround for suspend/resume in uvd7
drm/amdgpu: don't flush the TLB before initializing GART
drm/amdgpu: minor cleanup for amdgpu_ttm_bind
drm/amdgpu/psp: prevent page fault by checking write_frame address(v4)
drm/amd/powerplay: retrieve the real-time coreClock values
drm/amd/powerplay: fix performance drop on Vega10
drm/amd/powerplay: add one smc message for Vega10
drm/amd/powerplay: fix amd_powerplay_reset()
amdgpu: add padding to the fence to handle ioctl.
drm/amdgpu:fix wb_clear
drm/amdgpu:fix vf_error_put
drm/amdgpu/sriov:now must reinit psp
drm/amdgpu: merge bios post checking functions
Evan Quan [Mon, 16 Oct 2017 08:51:28 +0000 (16:51 +0800)]
drm/amdgpu/psp: prevent page fault by checking write_frame address(v4)
- Prevent a possible buffer overflow when updating the ring buffer by
bounds checking the command frame against the available space in the
ring buffer.
v2: update the ring_buffer_end address
v3: update the commit log
v4: squash in print fix (Michel)
Evan Quan [Fri, 20 Oct 2017 07:42:34 +0000 (15:42 +0800)]
drm/amd/powerplay: retrieve the real-time coreClock values
- Currently, the coreClock value for min/max performance level on raven
is hard-coded. Use the real-time value retrieved by GetGfxMinFreqLimit
and GetGfxMaxFreqLimit PPSMC messages
Julien Gomes [Wed, 25 Oct 2017 18:50:50 +0000 (11:50 -0700)]
tun: allow positive return values on dev_get_valid_name() call
If the name argument of dev_get_valid_name() contains "%d", it will try
to assign it a unit number in __dev__alloc_name() and return either the
unit number (>= 0) or an error code (< 0).
Considering positive values as error values prevent tun device creations
relying this mechanism, therefor we should only consider negative values
as errors here.
nfp: refuse offloading filters that redirects to upper devices
Previously we did not ensure that a netdev is a representative netdev
before dereferencing its private data. This can occur when an upper netdev
is created on a representative netdev. This patch corrects this by first
ensuring that the netdev is a representative netdev before using it.
Checking only switchdev_port_same_parent_id is not sufficient to ensure
that we can safely use the netdev. Failing to check that the netdev is also
a representative netdev would result in incorrect dereferencing.
Fixes: 1a1e586f54bf ("nfp: add basic action capabilities to flower offloads") Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Pieter Jansen van Vuuren <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Andrei Vagin [Wed, 25 Oct 2017 17:16:42 +0000 (10:16 -0700)]
net/unix: don't show information about sockets from other namespaces
socket_diag shows information only about sockets from a namespace where
a diag socket lives.
But if we request information about one unix socket, the kernel don't
check that its netns is matched with a diag socket namespace, so any
user can get information about any unix socket in a system. This looks
like a bug.
v2: add a Fixes tag
Fixes: 51d7cccf0723 ("net: make sock diag per-namespace") Signed-off-by: Andrei Vagin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Dan Carpenter [Tue, 24 Oct 2017 09:44:18 +0000 (12:44 +0300)]
drm/amd/powerplay: fix amd_powerplay_reset()
We accidentally inverted an if statement and turned amd_powerplay_reset()
into a no-op.
Fixes: ae97988fc89e ("drm/amd/powerplay: tidy up ret checks in amd_powerplay.c (v3)") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Michael J. Ruhl [Tue, 24 Oct 2017 12:41:01 +0000 (08:41 -0400)]
RDMA/netlink: OOPs in rdma_nl_rcv_msg() from misinterpreted flag
rdma_nl_rcv_msg() checks to see if it should use the .dump() callback
or the .doit() callback. The check is done with this check:
if (flags & NLM_F_DUMP) ...
The NLM_F_DUMP flag is two bits (NLM_F_ROOT | NLM_F_MATCH).
When an RDMA_NL_LS message (response) is received, the bit used for
indicating an error is the same bit as NLM_F_ROOT.
NLM_F_ROOT == (0x100) == RDMA_NL_LS_F_ERR.
ibacm sends a response with the RDMA_NL_LS_F_ERR bit set if an error
occurs in the service. The current code then misinterprets the
NLM_F_DUMP bit and trys to call the .dump() callback.
If the .dump() callback for the specified request is not available
(which is true for the RDMA_NL_LS messages) the following Oops occurs:
David Disseldorp [Fri, 20 Oct 2017 12:49:38 +0000 (14:49 +0200)]
SMB: fix validate negotiate info uninitialised memory use
An undersize validate negotiate info server response causes the client
to use uninitialised memory for struct validate_negotiate_info_rsp
comparisons of Dialect, SecurityMode and/or Capabilities members.
David Disseldorp [Fri, 20 Oct 2017 12:49:37 +0000 (14:49 +0200)]
SMB: fix leak of validate negotiate info response buffer
Fixes: ff1c038addc4 ("Check SMB3 dialects against downgrade attacks") Signed-off-by: David Disseldorp <[email protected]> Signed-off-by: Steve French <[email protected]>
Aurelien Aptel [Tue, 17 Oct 2017 12:47:17 +0000 (14:47 +0200)]
CIFS: do not send invalid input buffer on QUERY_INFO requests
query_info() doesn't use the InputBuffer field of the QUERY_INFO
request, therefore according to [MS-SMB2] it must:
a) set the InputBufferOffset to 0
b) send a zero-length InputBuffer
Doing a) is trivial but b) is a bit more tricky.
The packet is allocated according to it's StructureSize, which takes
into account an extra 1 byte buffer which we don't need
here. StructureSize fields must have constant values no matter the
actual length of the whole packet so we can't just edit that constant.
Both the NetBIOS-over-TCP message length ("rfc1002 length") L and the
iovec length L' have to be updated. Since L' is computed from L we
just update L by decrementing it by one.
Juergen Gross [Wed, 25 Oct 2017 15:08:07 +0000 (17:08 +0200)]
xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap()
In case gntdev_mmap() succeeds only partially in mapping grant pages
it will leave some vital information uninitialized needed later for
cleanup. This will lead to an out of bounds array access when unmapping
the already mapped pages.
So just initialize the data needed for unmapping the pages a little bit
earlier.
drm/i915/perf: fix perf enable/disable ioctls with 32bits userspace
The compat callback was missing and triggered failures in 32bits
userspace when enabling/disable the perf stream. We don't require any
particular processing here as these ioctls don't take any argument.
Ard Biesheuvel [Wed, 25 Oct 2017 10:04:48 +0000 (11:04 +0100)]
efi/libstub/arm: Don't randomize runtime regions when CONFIG_HIBERNATION=y
Commit:
e69176d68d26 ("ef/libstub/arm/arm64: Randomize the base of the UEFI rt services region")
implemented randomization of the virtual mapping that the OS chooses for
the UEFI runtime services. This was motivated by the fact that UEFI usually
does not bother to specify any permission restrictions for those regions,
making them prime real estate for exploitation now that the OS is getting
more and more careful not to leave any R+W+X mapped regions lying around.
However, this randomization breaks assumptions in the resume from
hibernation code, which expects all memory regions populated by UEFI to
remain in the same place, including their virtual mapping into the OS
memory space. While this assumption may not be entirely reasonable in the
first place, breaking it deliberately does not make a lot of sense either.
So let's refrain from this randomization pass if CONFIG_HIBERNATION=y.
Dan Carpenter [Wed, 25 Oct 2017 10:04:47 +0000 (11:04 +0100)]
efi/efi_test: Prevent an Oops in efi_runtime_query_capsulecaps()
If "qcaps.capsule_count" is ULONG_MAX then "qcaps.capsule_count + 1"
will overflow to zero and kcalloc() will return the ZERO_SIZE_PTR. We
try to dereference it inside the loop and crash.
Johannes Berg [Tue, 24 Oct 2017 19:12:13 +0000 (21:12 +0200)]
mac80211: don't compare TKIP TX MIC key in reinstall prevention
For the reinstall prevention, the code I had added compares the
whole key. It turns out though that iwlwifi firmware doesn't
provide the TKIP TX MIC key as it's not needed in client mode,
and thus the comparison will always return false.
For client mode, thus always zero out the TX MIC key part before
doing the comparison in order to avoid accepting the reinstall
of the key with identical encryption and RX MIC key, but not the
same TX MIC key (since the supplicant provides the real one.)
Fixes: fdf7cb4185b6 ("mac80211: accept key reinstall without changing anything") Signed-off-by: Johannes Berg <[email protected]>
* For non-universal planes add primary/cursor planes to lease
If we aren't exposing universal planes to this userspace client,
and it requests a lease on a crtc, we should implicitly export the
primary and cursor planes for the crtc.
If the lessee doesn't request universal planes, it will just see
the crtc, but if it does request them it will then see the plane
objects as well.
This also moves the object look ups earlier as a side effect, so
we'd exit the ioctl quicker for non-existant objects.
* Restrict leases to crtc/connector/planes.
This only allows leasing for objects we wish to allow.
* Check pad args are 0
* Check create flags and object count are valid.
* Check return from fd allocation
* Refactor lease idr setup and add some simple validation
* Use idr_mutex uniformly (Keith)
Keith Packard [Mon, 10 Apr 2017 04:35:34 +0000 (22:35 -0600)]
drm: Check mode object lease status in all master ioctl paths [v4]
Attempts to modify un-leased objects are rejected with an error.
Information returned about unleased objects is modified to make them
appear unusable and/or disconnected.
* With the change in the __drm_mode_object_find API to pass the
file_priv along, we can now centralize most of the lease-based
access checks in that function.
* A few places skip that API and require in-line checks.
Keith Packard [Wed, 15 Mar 2017 05:26:41 +0000 (22:26 -0700)]
drm: Add drm_object lease infrastructure [v5]
This provides new data structures to hold "lease" information about
drm mode setting objects, and provides for creating new drm_masters
which have access to a subset of the available drm resources.
An 'owner' is a drm_master which is not leasing the objects from
another drm_master, and hence 'owns' them.
A 'lessee' is a drm_master which is leasing objects from some other
drm_master. Each lessee holds the set of objects which it is leasing
from the lessor.
A 'lessor' is a drm_master which is leasing objects to another
drm_master. This is the same as the owner in the current code.
The set of objects any drm_master 'controls' is limited to the set of
objects it leases (for lessees) or all objects (for owners).
Objects not controlled by a drm_master cannot be modified through the
various state manipulating ioctls, and any state reported back to user
space will be edited to make them appear idle and/or unusable. For
instance, connectors always report 'disconnected', while encoders
report no possible crtcs or clones.
The full list of lessees leasing objects from an owner (either
directly, or indirectly through another lessee), can be searched from
an idr in the drm_master of the owner.
* Add revocation. This allows leases to be effectively revoked by
removing all of the objects they have access to. The lease itself
hangs around as it's hanging off a file.
* Free the leases IDR when the master is destroyed
* _drm_lease_held should look at lessees, not lessor
* Allow non-master files to check for lease status
* check DRIVER_MODESET before lease destroy call
* check DRIVER_MODESET for lease revoke (Chris)
* Use idr_mutex uniformly for all lease elements of struct drm_master. (Keith)
The new detection code for guest machine checks added a check based
on %r11 to .Lcleanup_sie to distinguish between normal asynchronous
interrupts and machine checks. But the funtion is called from the
program check handler as well with an undefined value in %r11.
The effect is that all program exceptions pointing to the SIE instruction
will set the CIF_MCCK_GUEST bit. The bit stays set for the CPU until the
next machine check comes in which will incorrectly be interpreted as a
guest machine check.
The simplest fix is to stop using .Lcleanup_sie in the program check
handler and duplicate a few instructions.
Fixes: c929500d7a5a ("s390/nmi: s390: New low level handling for machine check happening in guest") Cc: <[email protected]> # v4.13+ Reviewed-by: Christian Borntraeger <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
Linus Torvalds [Wed, 25 Oct 2017 04:46:43 +0000 (06:46 +0200)]
Merge tag 'nfs-for-4.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Fix a list corruption in xprt_release()
- Fix a workqueue lockdep warning due to unsafe use of
cancel_work_sync()
* tag 'nfs-for-4.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Destroy transport from the system workqueue
SUNRPC: fix a list corruption issue in xprt_release()
Vivien Didelot [Tue, 24 Oct 2017 20:37:19 +0000 (16:37 -0400)]
net: dsa: check master device before put
In the case of pdata, the dsa_cpu_parse function calls dev_put() before
making sure it isn't NULL. Fix this.
Fixes: 71e0bbde0d88 ("net: dsa: Add support for platform data") Signed-off-by: Vivien Didelot <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Josef Bacik [Tue, 24 Oct 2017 19:57:18 +0000 (15:57 -0400)]
nbd: handle interrupted sendmsg with a sndtimeo set
If you do not set sk_sndtimeo you will get -ERESTARTSYS if there is a
pending signal when you enter sendmsg, which we handle properly.
However if you set a timeout for your commands we'll set sk_sndtimeo to
that timeout, which means that sendmsg will start returning -EINTR
instead of -ERESTARTSYS. Fix this by checking either cases and doing
the correct thing.