]> Git Repo - linux.git/log
linux.git
6 years agoMerge branch 'tls-fixes'
David S. Miller [Fri, 15 Jun 2018 16:14:31 +0000 (09:14 -0700)]
Merge branch 'tls-fixes'

Daniel Borkmann says:

====================
Two tls fixes

First one is syzkaller trigered uaf and second one noticed
while writing test code with tls ulp. For details please see
individual patches.
====================

Signed-off-by: David S. Miller <[email protected]>
6 years agotls: fix waitall behavior in tls_sw_recvmsg
Daniel Borkmann [Fri, 15 Jun 2018 01:07:46 +0000 (03:07 +0200)]
tls: fix waitall behavior in tls_sw_recvmsg

Current behavior in tls_sw_recvmsg() is to wait for incoming tls
messages and copy up to exactly len bytes of data that the user
provided. This is problematic in the sense that i) if no packet
is currently queued in strparser we keep waiting until one has been
processed and pushed into tls receive layer for tls_wait_data() to
wake up and push the decrypted bits to user space. Given after
tls decryption, we're back at streaming data, use sock_rcvlowat()
hint from tcp socket instead. Retain current behavior with MSG_WAITALL
flag and otherwise use the hint target for breaking the loop and
returning to application. This is done if currently no ctx->recv_pkt
is ready, otherwise continue to process it from our strparser
backlog.

Fixes: c46234ebb4d1 ("tls: RX path for ktls")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Dave Watson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agotls: fix use-after-free in tls_push_record
Daniel Borkmann [Fri, 15 Jun 2018 01:07:45 +0000 (03:07 +0200)]
tls: fix use-after-free in tls_push_record

syzkaller managed to trigger a use-after-free in tls like the
following:

  BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
  Write of size 1 at addr ffff88037aa08000 by task a.out/2317

  CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
  Call Trace:
   dump_stack+0x71/0xab
   print_address_description+0x6a/0x280
   kasan_report+0x258/0x380
   ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
   tls_push_record.constprop.15+0x6a2/0x810 [tls]
   tls_sw_push_pending_record+0x2e/0x40 [tls]
   tls_sk_proto_close+0x3fe/0x710 [tls]
   ? tcp_check_oom+0x4c0/0x4c0
   ? tls_write_space+0x260/0x260 [tls]
   ? kmem_cache_free+0x88/0x1f0
   inet_release+0xd6/0x1b0
   __sock_release+0xc0/0x240
   sock_close+0x11/0x20
   __fput+0x22d/0x660
   task_work_run+0x114/0x1a0
   do_exit+0x71a/0x2780
   ? mm_update_next_owner+0x650/0x650
   ? handle_mm_fault+0x2f5/0x5f0
   ? __do_page_fault+0x44f/0xa50
   ? mm_fault_error+0x2d0/0x2d0
   do_group_exit+0xde/0x300
   __x64_sys_exit_group+0x3a/0x50
   do_syscall_64+0x9a/0x300
   ? page_fault+0x8/0x30
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happened through fault injection where aead_req allocation in
tls_do_encryption() eventually failed and we returned -ENOMEM from
the function. Turns out that the use-after-free is triggered from
tls_sw_sendmsg() in the second tls_push_record(). The error then
triggers a jump to waiting for memory in sk_stream_wait_memory()
resp. returning immediately in case of MSG_DONTWAIT. What follows is
the trim_both_sgl(sk, orig_size), which drops elements from the sg
list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
when the socket is being closed, where tls_sk_proto_close() callback
is invoked. The tls_complete_pending_work() will figure that there's
a pending closed tls record to be flushed and thus calls into the
tls_push_pending_closed_record() from there. ctx->push_pending_record()
is called from the latter, which is the tls_sw_push_pending_record()
from sw path. This again calls into tls_push_record(). And here the
tls_fill_prepend() will panic since the buffer address has been freed
earlier via trim_both_sgl(). One way to fix it is to move the aead
request allocation out of tls_do_encryption() early into tls_push_record().
This means we don't prep the tls header and advance state to the
TLS_PENDING_CLOSED_RECORD before allocation which could potentially
fail happened. That fixes the issue on my side.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Reported-by: [email protected]
Reported-by: [email protected]
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Dave Watson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoMerge branch 'l2tp-l2tp_ppp-must-ignore-non-PPP-sessions'
David S. Miller [Fri, 15 Jun 2018 16:12:37 +0000 (09:12 -0700)]
Merge branch 'l2tp-l2tp_ppp-must-ignore-non-PPP-sessions'

Guillaume Nault says:

====================
l2tp: l2tp_ppp must ignore non-PPP sessions

The original L2TP code was written for version 2 of the protocol, which
could only carry PPP sessions. Then L2TPv3 generalised the protocol so that
it could transport different kinds of pseudo-wires. But parts of the
l2tp_ppp module still break in presence of non-PPP sessions.

Assuming L2TPv2 tunnels can only transport PPP sessions is right, but
l2tp_netlink failed to ensure that (fixed in patch 1).
When retrieving a session from an arbitrary tunnel, l2tp_ppp needs to
filter out non-PPP sessions (last occurrence fixed in patch 2).
====================

Signed-off-by: David S. Miller <[email protected]>
6 years agol2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
Guillaume Nault [Fri, 15 Jun 2018 13:39:19 +0000 (15:39 +0200)]
l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()

pppol2tp_tunnel_ioctl() can act on an L2TPv3 tunnel, in which case
'session' may be an Ethernet pseudo-wire.

However, pppol2tp_session_ioctl() expects a PPP pseudo-wire, as it
assumes l2tp_session_priv() points to a pppol2tp_session structure. For
an Ethernet pseudo-wire l2tp_session_priv() points to an l2tp_eth_sess
structure instead, making pppol2tp_session_ioctl() access invalid
memory.

Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agol2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
Guillaume Nault [Fri, 15 Jun 2018 13:39:17 +0000 (15:39 +0200)]
l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels

The /proc/net/pppol2tp handlers (pppol2tp_seq_*()) iterate over all
L2TPv2 tunnels, and rightfully expect that only PPP sessions can be
found there. However, l2tp_netlink accepts creating Ethernet sessions
regardless of the underlying tunnel version.

This confuses pppol2tp_seq_session_show(), which expects that
l2tp_session_priv() returns a pppol2tp_session structure. When the
session is an Ethernet pseudo-wire, a struct l2tp_eth_sess is returned
instead. This leads to invalid memory access when
pppol2tp_session_get_sock() later tries to dereference ps->sk.

Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoMerge branch 'mlxsw-IPv6-and-reference-counting-fixes'
David S. Miller [Fri, 15 Jun 2018 16:11:17 +0000 (09:11 -0700)]
Merge branch 'mlxsw-IPv6-and-reference-counting-fixes'

Ido Schimmel says:

====================
mlxsw: IPv6 and reference counting fixes

The first three patches fix a mismatch between the new IPv6 behavior
introduced in commit f34436a43092 ("net/ipv6: Simplify route replace and
appending into multipath route") and mlxsw. The patches allow the driver
to support multipathing in IPv6 overlays with GRE tunnel devices. A
selftest will be submitted when net-next opens.

The last patch fixes a reference count problem of the port_vlan struct.
I plan to simplify the code in net-next, so that reference counting is
not necessary anymore.
====================

Signed-off-by: David S. Miller <[email protected]>
6 years agomlxsw: spectrum_switchdev: Fix port_vlan refcounting
Petr Machata [Fri, 15 Jun 2018 13:23:38 +0000 (16:23 +0300)]
mlxsw: spectrum_switchdev: Fix port_vlan refcounting

Switchdev notifications for addition of SWITCHDEV_OBJ_ID_PORT_VLAN are
distributed not only on clean addition, but also when flags on an
existing VLAN are changed. mlxsw_sp_bridge_port_vlan_add() calls
mlxsw_sp_port_vlan_get() to get at the port_vlan in question, which
implicitly references the object. This then leads to discrepancies in
reference counting when the VLAN is removed. spectrum.c warns about the
problem when the module is removed:

[13578.493090] WARNING: CPU: 0 PID: 2454 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2973 mlxsw_sp_port_remove+0xfd/0x110 [mlxsw_spectrum]
[...]
[13578.627106] Call Trace:
[13578.629617]  mlxsw_sp_fini+0x2a/0xe0 [mlxsw_spectrum]
[13578.634748]  mlxsw_core_bus_device_unregister+0x3e/0x130 [mlxsw_core]
[13578.641290]  mlxsw_pci_remove+0x13/0x40 [mlxsw_pci]
[13578.646238]  pci_device_remove+0x31/0xb0
[13578.650244]  device_release_driver_internal+0x14f/0x220
[13578.655562]  driver_detach+0x32/0x70
[13578.659183]  bus_remove_driver+0x47/0xa0
[13578.663134]  pci_unregister_driver+0x1e/0x80
[13578.667486]  mlxsw_sp_module_exit+0xc/0x3fa [mlxsw_spectrum]
[13578.673207]  __x64_sys_delete_module+0x13b/0x1e0
[13578.677888]  ? exit_to_usermode_loop+0x78/0x80
[13578.682374]  do_syscall_64+0x39/0xe0
[13578.685976]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix by putting the port_vlan when mlxsw_sp_port_vlan_bridge_join()
determines it's a flag-only change.

Fixes: b3529af6bb0d ("spectrum: Reference count VLAN entries")
Signed-off-by: Petr Machata <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agomlxsw: spectrum_router: Align with new route replace logic
Ido Schimmel [Fri, 15 Jun 2018 13:23:37 +0000 (16:23 +0300)]
mlxsw: spectrum_router: Align with new route replace logic

Commit f34436a43092 ("net/ipv6: Simplify route replace and appending
into multipath route") changed the IPv6 route replace logic so that the
first matching route (i.e., same metric) is replaced.

Have mlxsw replace the first matching route as well.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agomlxsw: spectrum_router: Allow appending to dev-only routes
Ido Schimmel [Fri, 15 Jun 2018 13:23:36 +0000 (16:23 +0300)]
mlxsw: spectrum_router: Allow appending to dev-only routes

Commit f34436a43092 ("net/ipv6: Simplify route replace and appending
into multipath route") changed the IPv6 route append logic so that
dev-only routes can be appended and not only gatewayed routes.

Align mlxsw with the new behaviour.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoipv6: Only emit append events for appended routes
Ido Schimmel [Fri, 15 Jun 2018 13:23:35 +0000 (16:23 +0300)]
ipv6: Only emit append events for appended routes

Current code will emit an append event in the FIB notification chain for
any route added with NLM_F_APPEND set, even if the route was not
appended to any existing route.

This is inconsistent with IPv4 where such an event is only emitted when
the new route is appended after an existing one.

Align IPv6 behavior with IPv4, thereby allowing listeners to more easily
handle these events.

Fixes: f34436a43092 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoMerge tag 'mac80211-for-davem-2018-06-15' of git://git.kernel.org/pub/scm/linux/kerne...
David S. Miller [Fri, 15 Jun 2018 16:08:26 +0000 (09:08 -0700)]
Merge tag 'mac80211-for-davem-2018-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A handful of fixes:
 * missing RCU grace period enforcement led to drivers freeing
   data structures before; fix from Dedy Lansky.
 * hwsim module init error paths were messed up; fixed it myself
   after a report from Colin King (who had sent a partial patch)
 * kernel-doc tag errors; fix from Luca Coelho
 * initialize the on-stack sinfo data structure when getting
   station information; fix from Sven Eckelmann
 * TXQ state dumping is now done from init, and when TXQs aren't
   initialized yet at that point, bad things happen, move the
   initialization; fix from Toke Høiland-Jørgensen.
====================

Signed-off-by: David S. Miller <[email protected]>
6 years agostmmac: added support for 802.1ad vlan stripping
Elad Nachman [Fri, 15 Jun 2018 06:57:39 +0000 (09:57 +0300)]
stmmac: added support for 802.1ad vlan stripping

stmmac reception handler calls stmmac_rx_vlan() to strip the vlan before
calling napi_gro_receive().

The function assumes VLAN tagged frames are always tagged with
802.1Q protocol, and assigns ETH_P_8021Q to the skb by hard-coding
the parameter on call to __vlan_hwaccel_put_tag() .

This causes packets not to be passed to the VLAN slave if it was created
with 802.1AD protocol
(ip link add link eth0 eth0.100 type vlan proto 802.1ad id 100).

This fix passes the protocol from the VLAN header into
__vlan_hwaccel_put_tag() instead of using the hard-coded value of
ETH_P_8021Q.

NETIF_F_HW_VLAN_STAG_RX check was added and the strip action is now
dependent on the correct combination of features and the detected vlan tag.

NETIF_F_HW_VLAN_STAG_RX feature was added to be in line with the driver
actual abilities.

Signed-off-by: Elad Nachman <[email protected]>
Reviewed-by: Toshiaki Makita <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoarch/*: Kconfig: fix documentation for NMI watchdog
Mauro Carvalho Chehab [Wed, 9 May 2018 02:44:08 +0000 (23:44 -0300)]
arch/*: Kconfig: fix documentation for NMI watchdog

Changeset 9919cba7ff71 ("watchdog: Update documentation") updated
the documentation, removing the old nmi_watchdog.txt and adding
a file with a new content.

Update Kconfig files accordingly.

Fixes: 9919cba7ff71 ("watchdog: Update documentation")
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Acked-by: Jonathan Corbet <[email protected]>
6 years agodocs: crypto_engine.rst: Fix two parse warnings
Mauro Carvalho Chehab [Sun, 6 May 2018 17:30:09 +0000 (14:30 -0300)]
docs: crypto_engine.rst: Fix two parse warnings

./Documentation/crypto/crypto_engine.rst:13: WARNING: Unexpected indentation.
./Documentation/crypto/crypto_engine.rst:15: WARNING: Block quote ends without a blank line; unexpected unindent.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Acked-by: Jonathan Corbet <[email protected]>
6 years agodocs: can.rst: fix a footnote reference
Mauro Carvalho Chehab [Sun, 6 May 2018 15:00:11 +0000 (12:00 -0300)]
docs: can.rst: fix a footnote reference

As stated at:
http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#footnotes

A footnote should contain either a number, a reference or
an auto number, e. g.:
        [1], [#f1] or [#].

While using [*] accidentaly works for html, it fails for other
document outputs. In particular, it causes an error with LaTeX
output, causing all books after networking to not be built.

So, replace it by a valid syntax.

Acked-by: Oliver Hartkopp <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Acked-by: Jonathan Corbet <[email protected]>
6 years agoafs: Optimise callback breaking by not repeating volume lookup
David Howells [Fri, 15 Jun 2018 14:24:50 +0000 (15:24 +0100)]
afs: Optimise callback breaking by not repeating volume lookup

At the moment, afs_break_callbacks calls afs_break_one_callback() for each
separate FID it was given, and the latter looks up the volume individually
for each one.

However, this is inefficient if two or more FIDs have the same vid as we
could reuse the volume.  This is complicated by cell aliasing whereby we
may have multiple cells sharing a volume and can therefore have multiple
callback interests for any particular volume ID.

At the moment afs_break_one_callback() scans the entire list of volumes
we're getting from a server and breaks the appropriate callback in every
matching volume, regardless of cell.  This scan is done for every FID.

Optimise callback breaking by the following means:

 (1) Sort the FID list by vid so that all FIDs belonging to the same volume
     are clumped together.

     This is done through the use of an indirection table as we cannot do
     an insertion sort on the afs_callback_break array as we decode FIDs
     into it as we subsequently also have to decode callback info into it
     that corresponds by array index only.

     We also don't really want to bubblesort afterwards if we can avoid it.

 (2) Sort the server->cb_interests array by vid so that all the matching
     volumes are grouped together.  This permits the scan to stop after
     finding a record that has a higher vid.

 (3) When breaking FIDs, we try to keep server->cb_break_lock as long as
     possible, caching the start point in the array for that volume group
     as long as possible.

     It might make sense to add another layer in that list and have a
     refcounted volume ID anchor that has the matching interests attached
     to it rather than being in the list.  This would allow the lock to be
     dropped without losing the cursor.

Signed-off-by: David Howells <[email protected]>
6 years agoafs: Display manually added cells in dynamic root mount
David Howells [Fri, 15 Jun 2018 14:19:22 +0000 (15:19 +0100)]
afs: Display manually added cells in dynamic root mount

Alter the dynroot mount so that cells created by manipulation of
/proc/fs/afs/cells and /proc/fs/afs/rootcell and by specification of a root
cell as a module parameter will cause directories for those cells to be
created in the dynamic root superblock for the network namespace[*].

To this end:

 (1) Only one dynamic root superblock is now created per network namespace
     and this is shared between all attempts to mount it.  This makes it
     easier to find the superblock to modify.

 (2) When a dynamic root superblock is created, the list of cells is walked
     and directories created for each cell already defined.

 (3) When a new cell is added, if a dynamic root superblock exists, a
     directory is created for it.

 (4) When a cell is destroyed, the directory is removed.

 (5) These directories are created by calling lookup_one_len() on the root
     dir which automatically creates them if they don't exist.

[*] Inasmuch as network namespaces are currently supported here.

Signed-off-by: David Howells <[email protected]>
6 years agoafs: Enable IPv6 DNS lookups
David Howells [Fri, 15 Jun 2018 14:19:10 +0000 (15:19 +0100)]
afs: Enable IPv6 DNS lookups

Remove the restriction on DNS lookup upcalls that prevents ipv6 addresses
from being looked up.

Signed-off-by: David Howells <[email protected]>
6 years agobsg: fix race of bsg_open and bsg_unregister
Anatoliy Glagolev [Wed, 13 Jun 2018 21:38:51 +0000 (15:38 -0600)]
bsg: fix race of bsg_open and bsg_unregister

The existing implementation allows races between bsg_unregister and
bsg_open paths. bsg_unregister and request_queue cleanup and deletion
may start and complete right after bsg_get_device (in bsg_open path)
retrieves bsg_class_device and releases the mutex. Then bsg_open path
touches freed memory of bsg_class_device and request_queue.

One possible fix is to hold the mutex all the way through bsg_get_device
instead of releasing it after bsg_class_device retrieval.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-Off-By: Anatoliy Glagolev <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
6 years agoblock: remov blk_queue_invalidate_tags
Christoph Hellwig [Fri, 15 Jun 2018 11:55:07 +0000 (13:55 +0200)]
block: remov blk_queue_invalidate_tags

This function is entirely unused, so remove it and the tag_queue_busy
member of struct request_queue.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
6 years agoMerge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-linus
Jens Axboe [Fri, 15 Jun 2018 14:11:05 +0000 (08:11 -0600)]
Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-linus

Pull NVMe fixes from Christoph:

"Fix various little regressions introduced in this merge window, plus
 a rework of the fibre channel connect and reconnect path to share the
 code instead of having separate sets of bugs. Last but not least a
 trivial trace point addition from Hannes."

* 'nvme-4.18' of git://git.infradead.org/nvme:
  nvme-fabrics: fix and refine state checks in __nvmf_check_ready
  nvme-fabrics: handle the admin-only case properly in nvmf_check_ready
  nvme-fabrics: refactor queue ready check
  blk-mq: remove blk_mq_tagset_iter
  nvme: remove nvme_reinit_tagset
  nvme-fc: fix nulling of queue data on reconnect
  nvme-fc: remove reinit_request routine
  nvme-fc: change controllers first connect to use reconnect path
  nvme: don't rely on the changed namespace list log
  nvmet: free smart-log buffer after use
  nvme-rdma: fix error flow during mapping request data
  nvme: add bio remapping tracepoint
  nvme: fix NULL pointer dereference in nvme_init_subsystem

6 years agocfg80211: fix rcu in cfg80211_unregister_wdev
Dedy Lansky [Fri, 15 Jun 2018 11:05:01 +0000 (13:05 +0200)]
cfg80211: fix rcu in cfg80211_unregister_wdev

Callers of cfg80211_unregister_wdev can free the wdev object
immediately after this function returns. This may crash the kernel
because this wdev object is still in use by other threads.
Add synchronize_rcu() after list_del_rcu to make sure wdev object can
be safely freed.

Signed-off-by: Dedy Lansky <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
6 years agomac80211: Move up init of TXQs
Toke Høiland-Jørgensen [Fri, 25 May 2018 12:29:21 +0000 (14:29 +0200)]
mac80211: Move up init of TXQs

On init, ieee80211_if_add() dumps the interface. Since that now includes a
dump of the TXQ state, we need to initialise that before the dump happens.
So move up the TXQ initialisation to to before the call to
ieee80211_if_add().

Fixes: 52539ca89f36 ("cfg80211: Expose TXQ stats and parameters to userspace")
Reported-by: Niklas Cassel <[email protected]>
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Tested-by: Niklas Cassel <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
6 years agomac80211_hwsim: fix module init error paths
Johannes Berg [Tue, 29 May 2018 10:04:51 +0000 (12:04 +0200)]
mac80211_hwsim: fix module init error paths

We didn't free the workqueue on any errors, nor did we
correctly check for rhashtable allocation errors, nor
did we free the hashtable on error.

Reported-by: Colin King <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
6 years agocfg80211: initialize sinfo in cfg80211_get_station
Sven Eckelmann [Wed, 6 Jun 2018 08:53:55 +0000 (10:53 +0200)]
cfg80211: initialize sinfo in cfg80211_get_station

Most of the implementations behind cfg80211_get_station will not initialize
sinfo to zero before manipulating it. For example, the member "filled",
which indicates the filled in parts of this struct, is often only modified
by enabling certain bits in the bitfield while keeping the remaining bits
in their original state. A caller without a preinitialized sinfo.filled can
then no longer decide which parts of sinfo were filled in by
cfg80211_get_station (or actually the underlying implementations).

cfg80211_get_station must therefore take care that sinfo is initialized to
zero. Otherwise, the caller may tries to read information which was not
filled in and which must therefore also be considered uninitialized. In
batadv_v_elp_get_throughput's case, an invalid "random" expected throughput
may be stored for this neighbor and thus the B.A.T.M.A.N V algorithm may
switch to non-optimal neighbors for certain destinations.

Fixes: 7406353d43c8 ("cfg80211: implement cfg80211_get_station cfg80211 API")
Reported-by: Thomas Lauer <[email protected]>
Reported-by: Marcel Schmidt <[email protected]>
Cc: [email protected]
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
6 years agonl80211: fix some kernel doc tag mistakes
Luca Coelho [Fri, 8 Jun 2018 07:04:47 +0000 (10:04 +0300)]
nl80211: fix some kernel doc tag mistakes

There is a bunch of tags marking constants with &, which means struct
or enum name.  Replace them with %, which is the correct tag for
constants.

Signed-off-by: Luca Coelho <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
6 years agonvme-fabrics: fix and refine state checks in __nvmf_check_ready
Christoph Hellwig [Mon, 11 Jun 2018 15:41:11 +0000 (17:41 +0200)]
nvme-fabrics: fix and refine state checks in __nvmf_check_ready

 - make sure we only allow internally generates commands in any non-live
   state
 - only allow connect commands on non-live queues when actually in the
   new or connecting states
 - treat all other non-live, non-dead states the same as a default
   cach-all

This fixes a regression where we could not shutdown a controller
orderly as we didn't allow the internal generated Property Set
command, and also ensures we don't accidentally let a Connect command
through in the wrong state.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: James Smart <[email protected]>
6 years agonvme-fabrics: handle the admin-only case properly in nvmf_check_ready
Christoph Hellwig [Mon, 11 Jun 2018 15:37:23 +0000 (17:37 +0200)]
nvme-fabrics: handle the admin-only case properly in nvmf_check_ready

In the ADMIN_ONLY state we don't have any I/O queues, but we should accept
all admin commands without further checks.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: James Smart <[email protected]>

6 years agonvme-fabrics: refactor queue ready check
Christoph Hellwig [Mon, 11 Jun 2018 15:34:06 +0000 (17:34 +0200)]
nvme-fabrics: refactor queue ready check

Move the is_connected check to the fibre channel transport, as it has no
meaning for other transports.  To facilitate this split out a new
nvmf_fail_nonready_command helper that is called by the transport when
it is asked to handle a command on a queue that is not ready.

Also avoid a function call for the queue live fast path by inlining
the check.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: James Smart <[email protected]>
6 years agoMerge tag 'linux-kselftest-4.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 15 Jun 2018 08:26:29 +0000 (17:26 +0900)]
Merge tag 'linux-kselftest-4.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull more Kselftest updates from Shuah Khan:

 - fix a signedness bug in cgroups test

 - add ppc support for kprobe args tests

* tag 'linux-kselftest-4.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kselftest/cgroup: fix a signedness bug
  selftests/ftrace: Add ppc support for kprobe args tests

6 years agoMerge tag 'sound-fix-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 15 Jun 2018 08:24:40 +0000 (17:24 +0900)]
Merge tag 'sound-fix-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Here is a collection of small fixes on top of the previous update.

  All small and obvious fixes. Mostly for usual suspects, USB-audio and
  HD-audio, but a few trivial error handling fixes for misc drivers as
  well"

* tag 'sound-fix-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Always create the interrupt pipe for the mixer
  ALSA: usb-audio: Add insertion control for UAC3 BADD
  ALSA: usb-audio: Change in connectors control creation interface
  ALSA: usb-audio: Add bi-directional terminal types
  ALSA: lx6464es: add error handling for pci_ioremap_bar
  ALSA: sonicvibes: add error handling for snd_ctl_add
  ALSA: usb-audio: Remove explicitly listed Mytek devices
  ALSA: usb-audio: Generic DSD detection for XMOS-based implementations
  ALSA: usb-audio: Add native DSD support for Mytek DACs
  ALSA: hda/realtek - Add shutup hint
  ALSA: usb-audio: Disable the quirk for Nura headset
  ALSA: hda: add dock and led support for HP ProBook 640 G4
  ALSA: hda: add dock and led support for HP EliteBook 830 G5
  ALSA: emu10k1: add error handling for snd_ctl_add
  ALSA: fm801: add error handling for snd_ctl_add

6 years agoMerge tag 'drm-next-2018-06-15' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 15 Jun 2018 08:20:53 +0000 (17:20 +0900)]
Merge tag 'drm-next-2018-06-15' of git://anongit.freedesktop.org/drm/drm

Pull amd drm fixes from Dave Airlie:
 "Just a single set of AMD fixes for stuff in -next for -rc1"

* tag 'drm-next-2018-06-15' of git://anongit.freedesktop.org/drm/drm: (47 commits)
  drm/amd/powerplay: Set higher SCLK&MCLK frequency than dpm7 in OD (v2)
  drm/amd/powerplay: remove uncessary extra gfxoff control call
  drm/amdgpu: fix parsing indirect register list v2
  drm/amd/include: Update df 3.6 mask and shift definition
  drm/amd/pp: Fix OD feature enable failed on Vega10 workstation cards
  drm/amd/display: Fix stale buffer object (bo) use
  drm/amd/pp: initialize result to before or'ing in data
  drm/amd/powerplay: fix wrong clock adjust sequence
  drm/amdgpu: Grab/put runtime PM references in atomic_commit_tail()
  drm/amd/powerplay: fix missed hwmgr check warning before call gfx_off_control handler
  drm/amdgpu: fix CG enabling hang with gfxoff enabled
  drm/amdgpu: fix clear_all and replace handling in the VM (v2)
  drm/amdgpu: add checking for sos version
  drm/amdgpu: fix the missed vcn fw version report
  Revert "drm/amdgpu: Add an ATPX quirk for hybrid laptop"
  drm/amdgpu/df: fix potential array out-of-bounds read
  drm/amdgpu: Fix NULL pointer when load kfd driver with PP block is disabled
  drm/gfx9: Update gc goldensetting for vega20.
  drm/amd/pp: Allow underclocking when od table is empty in vbios
  drm/amdgpu/display: check if ppfuncs exists before using it
  ...

6 years agosmb3: fix corrupt path in subdirs on smb311 with posix
Steve French [Fri, 15 Jun 2018 03:30:56 +0000 (22:30 -0500)]
smb3: fix corrupt path in subdirs on smb311 with posix

Signed-off-by: Steve French <[email protected]>
6 years agosmb3: do not display empty interface list
Steve French [Fri, 15 Jun 2018 02:59:31 +0000 (21:59 -0500)]
smb3: do not display empty interface list

If server does not support listing interfaces then do not
display empty "Server interfaces" line to avoid confusing users.

Signed-off-by: Steve French <[email protected]>
CC: Aurelien Aptel <[email protected]>
6 years agosmb3: Fix mode on mkdir on smb311 mounts
Steve French [Fri, 15 Jun 2018 02:56:32 +0000 (21:56 -0500)]
smb3: Fix mode on mkdir on smb311 mounts

mkdir was not passing the mode on smb3.11 mounts with posix extensions

Signed-off-by: Steve French <[email protected]>
6 years agocifs: Fix kernel oops when traceSMB is enabled
Paulo Alcantara [Thu, 14 Jun 2018 20:34:08 +0000 (17:34 -0300)]
cifs: Fix kernel oops when traceSMB is enabled

When traceSMB is enabled through 'echo 1 > /proc/fs/cifs/traceSMB', after a
mount, the following oops is triggered:

[   27.137943] BUG: unable to handle kernel paging request at
ffff8800f80c268b
[   27.143396] PGD 2c6b067 P4D 2c6b067 PUD 0
[   27.145386] Oops: 0000 [#1] SMP PTI
[   27.146186] CPU: 2 PID: 2655 Comm: mount.cifs Not tainted 4.17.0+ #39
[   27.147174] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.0.0-prebuilt.qemu-project.org 04/01/2014
[   27.148969] RIP: 0010:hex_dump_to_buffer+0x413/0x4b0
[   27.149738] Code: 48 8b 44 24 08 31 db 45 31 d2 48 89 6c 24 18 44 89
6c 24 24 48 c7 c1 78 b5 23 82 4c 89 64 24 10 44 89 d5 41 89 dc 4c 8d 58
02 <44> 0f b7 00 4d 89 dd eb 1f 83 c5 01 41 01 c4 41 39 ef 0f 84 48 fe
[   27.152396] RSP: 0018:ffffc9000058f8c0 EFLAGS: 00010246
[   27.153129] RAX: ffff8800f80c268b RBX: 0000000000000000 RCX:
ffffffff8223b578
[   27.153867] RDX: 0000000000000000 RSI: ffffffff81a55496 RDI:
0000000000000008
[   27.154612] RBP: 0000000000000000 R08: 0000000000000020 R09:
0000000000000083
[   27.155355] R10: 0000000000000000 R11: ffff8800f80c268d R12:
0000000000000000
[   27.156101] R13: 0000000000000002 R14: ffffc9000058f94d R15:
0000000000000008
[   27.156838] FS:  00007f1693a6b740(0000) GS:ffff88007fd00000(0000)
knlGS:0000000000000000
[   27.158354] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   27.159093] CR2: ffff8800f80c268b CR3: 00000000798fa001 CR4:
0000000000360ee0
[   27.159892] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[   27.160661] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[   27.161464] Call Trace:
[   27.162123]  print_hex_dump+0xd3/0x160
[   27.162814] journal-offline (2658) used greatest stack depth: 13144
bytes left
[   27.162824]  ? __release_sock+0x60/0xd0
[   27.165344]  ? tcp_sendmsg+0x31/0x40
[   27.166177]  dump_smb+0x39/0x40
[   27.166972]  ? vsnprintf+0x236/0x490
[   27.167807]  __smb_send_rqst.constprop.12+0x103/0x430
[   27.168554]  ? apic_timer_interrupt+0xa/0x20
[   27.169306]  smb_send_rqst+0x48/0xc0
[   27.169984]  cifs_send_recv+0xda/0x420
[   27.170639]  SMB2_negotiate+0x23d/0xfa0
[   27.171301]  ? vsnprintf+0x236/0x490
[   27.171961]  ? smb2_negotiate+0x19/0x30
[   27.172586]  smb2_negotiate+0x19/0x30
[   27.173257]  cifs_negotiate_protocol+0x70/0xd0
[   27.173935]  ? kstrdup+0x43/0x60
[   27.174551]  cifs_get_smb_ses+0x295/0xbe0
[   27.175260]  ? lock_timer_base+0x67/0x80
[   27.175936]  ? __internal_add_timer+0x1a/0x50
[   27.176575]  ? add_timer+0x10f/0x230
[   27.177267]  cifs_mount+0x101/0x1190
[   27.177940]  ? cifs_smb3_do_mount+0x144/0x5c0
[   27.178575]  cifs_smb3_do_mount+0x144/0x5c0
[   27.179270]  mount_fs+0x35/0x150
[   27.179930]  vfs_kern_mount.part.28+0x54/0xf0
[   27.180567]  do_mount+0x5ad/0xc40
[   27.181234]  ? kmem_cache_alloc_trace+0xed/0x1a0
[   27.181916]  ksys_mount+0x80/0xd0
[   27.182535]  __x64_sys_mount+0x21/0x30
[   27.183220]  do_syscall_64+0x4e/0x100
[   27.183882]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   27.184535] RIP: 0033:0x7f169339055a
[   27.185192] Code: 48 8b 0d 41 d9 2b 00 f7 d8 64 89 01 48 83 c8 ff c3
66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0e d9 2b 00 f7 d8 64 89 01 48
[   27.187268] RSP: 002b:00007fff7b44eb58 EFLAGS: 00000202 ORIG_RAX:
00000000000000a5
[   27.188515] RAX: ffffffffffffffda RBX: 00007f1693a7e70e RCX:
00007f169339055a
[   27.189244] RDX: 000055b9f97f64e5 RSI: 000055b9f97f652c RDI:
00007fff7b45074f
[   27.189974] RBP: 000055b9fb8c9260 R08: 000055b9fb8ca8f0 R09:
0000000000000000
[   27.190721] R10: 0000000000000000 R11: 0000000000000202 R12:
000055b9fb8ca8f0
[   27.191429] R13: 0000000000000000 R14: 00007f1693a7c000 R15:
00007f1693a7e91d
[   27.192167] Modules linked in:
[   27.192797] CR2: ffff8800f80c268b
[   27.193435] ---[ end trace 67404c618badf323 ]---

The problem was that dump_smb() had been called with an invalid pointer,
that is, in __smb_send_rqst(), iov[1] doesn't exist (n_vec == 1).

This patch fixes it by relying on the n_vec value to dump out the smb
packets.

Signed-off-by: Paulo Alcantara <[email protected]>
Signed-off-by: Steve French <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
6 years agoCIFS: dump every session iface info
Aurelien Aptel [Thu, 14 Jun 2018 13:43:20 +0000 (15:43 +0200)]
CIFS: dump every session iface info

Signed-off-by: Aurelien Aptel <[email protected]>
Signed-off-by: Steve French <[email protected]>
6 years agoCIFS: parse and store info on iface queries
Aurelien Aptel [Thu, 14 Jun 2018 15:04:51 +0000 (17:04 +0200)]
CIFS: parse and store info on iface queries

Signed-off-by: Aurelien Aptel <[email protected]>
Signed-off-by: Steve French <[email protected]>
6 years agoCIFS: add iface info to struct cifs_ses
Aurelien Aptel [Thu, 14 Jun 2018 13:43:18 +0000 (15:43 +0200)]
CIFS: add iface info to struct cifs_ses

Signed-off-by: Aurelien Aptel <[email protected]>
Signed-off-by: Steve French <[email protected]>
6 years agoCIFS: complete PDU definitions for interface queries
Aurelien Aptel [Thu, 14 Jun 2018 13:43:17 +0000 (15:43 +0200)]
CIFS: complete PDU definitions for interface queries

Signed-off-by: Aurelien Aptel <[email protected]>
Signed-off-by: Steve French <[email protected]>
6 years agoCIFS: move default port definitions to cifsglob.h
Aurelien Aptel [Thu, 14 Jun 2018 13:43:16 +0000 (15:43 +0200)]
CIFS: move default port definitions to cifsglob.h

Signed-off-by: Aurelien Aptel <[email protected]>
Signed-off-by: Steve French <[email protected]>
6 years agocifs: Fix encryption/signing
Paulo Alcantara [Wed, 13 Jun 2018 16:54:14 +0000 (13:54 -0300)]
cifs: Fix encryption/signing

Since the rfc1002 generation was moved down to __smb_send_rqst(),
the transform header is now in rqst->rq_iov[0].

Correctly assign the transform header pointer in crypt_message().

Signed-off-by: Paulo Alcantara <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
6 years agocifs: update __smb_send_rqst() to take an array of requests
Ronnie Sahlberg [Mon, 11 Jun 2018 22:01:00 +0000 (08:01 +1000)]
cifs: update __smb_send_rqst() to take an array of requests

Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
6 years agocifs: remove smb2_send_recv()
Ronnie Sahlberg [Mon, 11 Jun 2018 22:00:59 +0000 (08:00 +1000)]
cifs: remove smb2_send_recv()

Now that we have the plumbing to pass request without an rfc1002
header all the way down to the point we write to the socket we no
longer need the smb2_send_recv() function.

Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
6 years agocifs: push rfc1002 generation down the stack
Ronnie Sahlberg [Mon, 11 Jun 2018 22:00:58 +0000 (08:00 +1000)]
cifs: push rfc1002 generation down the stack

Move the generation of the 4 byte length field down the stack and
generate it immediately before we start writing the data to the socket.

Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Aurelien Aptel <[email protected]>
Signed-off-by: Steve French <[email protected]>
6 years agosmb3: increase initial number of credits requested to allow write
Steve French [Wed, 13 Jun 2018 22:05:58 +0000 (17:05 -0500)]
smb3: increase initial number of credits requested to allow write

Compared to other clients the Linux smb3 client ramps up
credits very slowly, taking more than 128 operations before a
maximum size write could be sent (since the number of credits
requested is only 2 per small operation, causing the credit
limit to grow very slowly).

This lack of credits initially would impact large i/o performance,
when large i/o is tried early before enough credits are built up.

Signed-off-by: Steve French <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
6 years agocifs: minor documentation updates
Steve French [Wed, 13 Jun 2018 21:46:56 +0000 (16:46 -0500)]
cifs: minor documentation updates

Various minor cifs/smb3 documentation updates

Signed-off-by: Steve French <[email protected]>
Reviewed-by: Ronnie Sahlberg <[email protected]>
6 years agocifs: add lease tracking to the cached root fid
Ronnie Sahlberg [Wed, 13 Jun 2018 20:48:35 +0000 (06:48 +1000)]
cifs: add lease tracking to the cached root fid

Use a read lease for the cached root fid so that we can detect
when the content of the directory changes (via a break) at which time
we close the handle. On next access to the root the handle will be reopened
and cached again.

Signed-off-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
6 years agosmb3: note that smb3.11 posix extensions mount option is experimental
Steve French [Tue, 12 Jun 2018 17:11:31 +0000 (12:11 -0500)]
smb3: note that smb3.11 posix extensions mount option is experimental

Signed-off-by: Steve French <[email protected]>
6 years agoafs: Show all of a server's addresses in /proc/fs/afs/servers
David Howells [Sat, 2 Jun 2018 21:20:31 +0000 (22:20 +0100)]
afs: Show all of a server's addresses in /proc/fs/afs/servers

Show all of a server's addresses in /proc/fs/afs/servers, placing the
second plus addresses on padded lines of their own.  The current address is
marked with a star.

Signed-off-by: David Howells <[email protected]>
6 years agoafs: Handle CONFIG_PROC_FS=n
David Howells [Tue, 5 Jun 2018 09:54:24 +0000 (10:54 +0100)]
afs: Handle CONFIG_PROC_FS=n

The AFS filesystem depends at the moment on /proc for configuration and
also presents information that way - however, this causes a compilation
failure if procfs is disabled.

Fix it so that the procfs bits aren't compiled in if procfs is disabled.

This means that you can't configure the AFS filesystem directly, but it is
still usable provided that an up-to-date keyutils is installed to look up
cells by SRV or AFSDB DNS records.

Reported-by: Al Viro <[email protected]>
Signed-off-by: David Howells <[email protected]>
6 years agoproc: Make inline name size calculation automatic
David Howells [Wed, 13 Jun 2018 18:43:19 +0000 (19:43 +0100)]
proc: Make inline name size calculation automatic

Make calculation of the size of the inline name in struct proc_dir_entry
automatic, rather than having to manually encode the numbers and failing to
allow for lockdep.

Require a minimum inline name size of 33+1 to allow for names that look
like two hex numbers with a dash between.

Reported-by: Al Viro <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Al Viro <[email protected]>
6 years agoorangefs: simplify compat ioctl handling
Al Viro [Sun, 27 May 2018 12:52:48 +0000 (08:52 -0400)]
orangefs: simplify compat ioctl handling

no need to mess with copy_in_user(), etc...

Signed-off-by: Al Viro <[email protected]>
6 years agosignalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
Al Viro [Sun, 27 May 2018 12:35:50 +0000 (08:35 -0400)]
signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()

Signed-off-by: Al Viro <[email protected]>
6 years agohv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
Haiyang Zhang [Fri, 15 Jun 2018 01:29:09 +0000 (18:29 -0700)]
hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload

These fields in struct ndis_ipsecv2_offload and struct ndis_rsc_offload
are one byte according to the specs. This patch defines them with the
right size. These structs are not in use right now, but will be used soon.

Signed-off-by: Haiyang Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agords: avoid unenecessary cong_update in loop transport
Santosh Shilimkar [Thu, 14 Jun 2018 18:52:34 +0000 (11:52 -0700)]
rds: avoid unenecessary cong_update in loop transport

Loop transport which is self loopback, remote port congestion
update isn't relevant. Infact the xmit path already ignores it.
Receive path needs to do the same.

Reported-by: [email protected]
Reviewed-by: Sowmini Varadhan <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoMerge branch 'drm-next-4.18' of git://people.freedesktop.org/~agd5f/linux into drm...
Dave Airlie [Fri, 15 Jun 2018 01:32:23 +0000 (11:32 +1000)]
Merge branch 'drm-next-4.18' of git://people.freedesktop.org/~agd5f/linux into drm-next

Fixes for 4.18. Highlights:
- Fixes for gfxoff on Raven
- Remove an ATPX quirk now that the root cause is fixed
- Runtime PM fixes
- Vega20 register header update
- Wattman fixes
- Misc bug fixes

Signed-off-by: Dave Airlie <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
6 years agobpf, selftests: delete xfrm tunnel when test exits.
William Tu [Thu, 14 Jun 2018 12:01:06 +0000 (05:01 -0700)]
bpf, selftests: delete xfrm tunnel when test exits.

Make the printting of bpf xfrm tunnel better and
cleanup xfrm state and policy when xfrm test finishes.

Signed-off-by: William Tu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
6 years agobpf, selftest: check tunnel type more accurately
Jian Wang [Fri, 15 Jun 2018 01:22:17 +0000 (03:22 +0200)]
bpf, selftest: check tunnel type more accurately

Grep tunnel type directly to make sure 'ip' command supports it.

Signed-off-by: Jian Wang <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
6 years agoMerge branch 'bpf-misc-fixes'
Daniel Borkmann [Fri, 15 Jun 2018 01:13:18 +0000 (03:13 +0200)]
Merge branch 'bpf-misc-fixes'

Jakub Kicinski says:

====================
This small series allows test_offload.py selftest to run on modern
distributions which may create BPF programs for cgroups at boot,
like Ubuntu 18.04.  We still expect the program list to not be
altered by any other agent while the test is running, but no longer
depend on there being no BPF programs at all at the start.

Fixing the test revealed a small problem with bpftool, which doesn't
report the program load time very accurately.  Because nanoseconds
were not taken into account reported load time would fluctuate by
1 second.  First patch of the series takes care of fixing that.
====================

Signed-off-by: Daniel Borkmann <[email protected]>
6 years agoselftests/bpf: test offloads even with BPF programs present
Jakub Kicinski [Thu, 14 Jun 2018 18:06:56 +0000 (11:06 -0700)]
selftests/bpf: test offloads even with BPF programs present

Modern distroes increasingly make use of BPF programs.  Default
Ubuntu 18.04 installation boots with a number of cgroup_skb
programs loaded.

test_offloads.py tries to check if programs and maps are not
leaked on error paths by confirming the list of programs on the
system is empty between tests.

Since we can no longer expect the system to have no BPF objects
at boot try to remember the programs and maps present at the start,
and skip those when scanning the system.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
6 years agotools: bpftool: improve accuracy of load time
Jakub Kicinski [Thu, 14 Jun 2018 18:06:55 +0000 (11:06 -0700)]
tools: bpftool: improve accuracy of load time

BPF program load time is reported from the kernel relative to boot time.
If conversion to wall clock does not take nanosecond parts into account,
the load time reported by bpftool may differ by one second from run to
run.  This means JSON object reported by bpftool for a program will
randomly change.

Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool")
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
6 years agoMerge branch 'l2tp-fixes'
David S. Miller [Fri, 15 Jun 2018 00:10:19 +0000 (17:10 -0700)]
Merge branch 'l2tp-fixes'

Guillaume Nault says:

====================
l2tp: pppol2tp_connect() fixes

This series fixes a few remaining issues with pppol2tp_connect().

It doesn't try to prevent invalid configurations that have no effect on
kernel's reliability. That would be work for a future patch set.

Patch 2 is the most important as it avoids an invalid pointer
dereference crashing the kernel. It depends on patch 1 for correctly
identifying L2TP session types.

Patches 3 and 4 avoid creating stale tunnels and sessions.
====================

Signed-off-by: David S. Miller <[email protected]>
6 years agol2tp: clean up stale tunnel or session in pppol2tp_connect's error path
Guillaume Nault [Wed, 13 Jun 2018 13:09:21 +0000 (15:09 +0200)]
l2tp: clean up stale tunnel or session in pppol2tp_connect's error path

pppol2tp_connect() may create a tunnel or a session. Remove them in
case of error.

Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agol2tp: prevent pppol2tp_connect() from creating kernel sockets
Guillaume Nault [Wed, 13 Jun 2018 13:09:20 +0000 (15:09 +0200)]
l2tp: prevent pppol2tp_connect() from creating kernel sockets

If 'fd' is negative, l2tp_tunnel_create() creates a tunnel socket using
the configuration passed in 'tcfg'. Currently, pppol2tp_connect() sets
the relevant fields to zero, tricking l2tp_tunnel_create() into setting
up an unusable kernel socket.

We can't set 'tcfg' with the required fields because there's no way to
get them from the current connect() parameters. So let's restrict
kernel sockets creation to the netlink API, which is the original use
case.

Fixes: 789a4a2c61d8 ("l2tp: Add support for static unmanaged L2TPv3 tunnels")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agol2tp: only accept PPP sessions in pppol2tp_connect()
Guillaume Nault [Wed, 13 Jun 2018 13:09:19 +0000 (15:09 +0200)]
l2tp: only accept PPP sessions in pppol2tp_connect()

l2tp_session_priv() returns a struct pppol2tp_session pointer only for
PPPoL2TP sessions. In particular, if the session is an L2TP_PWTYPE_ETH
pseudo-wire, l2tp_session_priv() returns a pointer to an l2tp_eth_sess
structure, which is much smaller than struct pppol2tp_session. This
leads to invalid memory dereference when trying to lock ps->sk_lock.

Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agol2tp: fix pseudo-wire type for sessions created by pppol2tp_connect()
Guillaume Nault [Wed, 13 Jun 2018 13:09:18 +0000 (15:09 +0200)]
l2tp: fix pseudo-wire type for sessions created by pppol2tp_connect()

Define cfg.pw_type so that the new session is created with its .pwtype
field properly set (L2TP_PWTYPE_PPP).

Not setting the pseudo-wire type had several annoying effects:

  * Invalid value returned in the L2TP_ATTR_PW_TYPE attribute when
    dumping sessions with the netlink API.

  * Impossibility to delete the session using the netlink API (because
    l2tp_nl_cmd_session_delete() gets the deletion callback function
    from an array indexed by the session's pseudo-wire type).

Also, there are several cases where we should check a session's
pseudo-wire type. For example, pppol2tp_connect() should refuse to
connect a session that is not PPPoL2TP, but that requires the session's
.pwtype field to be properly set.

Fixes: f7faffa3ff8e ("l2tp: Add L2TPv3 protocol support")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoeventpoll: switch to ->poll_mask
Ben Noordhuis [Thu, 14 Jun 2018 22:32:07 +0000 (00:32 +0200)]
eventpoll: switch to ->poll_mask

Signed-off-by: Ben Noordhuis <[email protected]>
Signed-off-by: Al Viro <[email protected]>
6 years agoaio: only return events requested in poll_mask() for IOCB_CMD_POLL
Christoph Hellwig [Mon, 11 Jun 2018 06:50:10 +0000 (08:50 +0200)]
aio: only return events requested in poll_mask() for IOCB_CMD_POLL

The ->poll_mask() operation has a mask of events that the caller
is interested in, but not all implementations might take it into
account.  Mask the return value to only the requested events,
similar to what the poll and epoll code does.

Reported-by: Avi Kivity <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Al Viro <[email protected]>
6 years agoMerge branch 'emaclite-fixes'
David S. Miller [Fri, 15 Jun 2018 00:08:04 +0000 (17:08 -0700)]
Merge branch 'emaclite-fixes'

Radhey Shyam Pandey says:

====================
emaclite bug fixes and code cleanup

This patch series fixes bug in emaclite remove and mdio_setup routines.
It does minor code cleanup.
====================

Signed-off-by: David S. Miller <[email protected]>
6 years agonet: emaclite: Remove xemaclite_mdio_setup return check
Radhey Shyam Pandey [Wed, 13 Jun 2018 06:35:19 +0000 (12:05 +0530)]
net: emaclite: Remove xemaclite_mdio_setup return check

Errors are already reported in xemaclite_mdio_setup so avoid
reporting it again.

Signed-off-by: Radhey Shyam Pandey <[email protected]>
Signed-off-by: Michal Simek <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: emaclite: Remove unused 'has_mdio' flag.
Radhey Shyam Pandey [Wed, 13 Jun 2018 06:35:18 +0000 (12:05 +0530)]
net: emaclite: Remove unused 'has_mdio' flag.

Remove unused 'has_mdio' flag.

Signed-off-by: Radhey Shyam Pandey <[email protected]>
Signed-off-by: Michal Simek <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: emaclite: Fix MDIO bus unregister bug
Radhey Shyam Pandey [Wed, 13 Jun 2018 06:35:17 +0000 (12:05 +0530)]
net: emaclite: Fix MDIO bus unregister bug

Since 'has_mdio' flag is not used,sequence insmod->rmmod-> insmod
leads to failure as MDIO unregister doesn't happen in .remove().
Fix it by checking MII bus pointer instead.

Signed-off-by: Radhey Shyam Pandey <[email protected]>
Signed-off-by: Michal Simek <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: emaclite: Fix position of lp->mii_bus assignment
Radhey Shyam Pandey [Wed, 13 Jun 2018 06:35:16 +0000 (12:05 +0530)]
net: emaclite: Fix position of lp->mii_bus assignment

To ensure MDIO bus is not double freed in remove() path
assign lp->mii_bus after MDIO bus registration.

Signed-off-by: Radhey Shyam Pandey <[email protected]>
Signed-off-by: Michal Simek <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoeventfd: only return events requested in poll_mask()
Avi Kivity [Fri, 8 Jun 2018 19:12:32 +0000 (22:12 +0300)]
eventfd: only return events requested in poll_mask()

The ->poll_mask() operation has a mask of events that the caller
is interested in, but we're returning all events regardless.

Change to return only the events the caller is interested in. This
fixes aio IO_CMD_POLL returning immediately when called with POLLIN
on an eventfd, since an eventfd is almost always ready for a write.

Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Al Viro <[email protected]>
6 years agoaio: mark __aio_sigset::sigmask const
Avi Kivity [Fri, 8 Jun 2018 14:55:05 +0000 (17:55 +0300)]
aio: mark __aio_sigset::sigmask const

io_pgetevents() will not change the signal mask. Mark it const
to make it clear and to reduce the need for casts in user code.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Al Viro <[email protected]>
6 years agotcp: verify the checksum of the first data segment in a new connection
Frank van der Linden [Tue, 12 Jun 2018 23:09:37 +0000 (23:09 +0000)]
tcp: verify the checksum of the first data segment in a new connection

commit 079096f103fa ("tcp/dccp: install syn_recv requests into ehash
table") introduced an optimization for the handling of child sockets
created for a new TCP connection.

But this optimization passes any data associated with the last ACK of the
connection handshake up the stack without verifying its checksum, because it
calls tcp_child_process(), which in turn calls tcp_rcv_state_process()
directly.  These lower-level processing functions do not do any checksum
verification.

Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to
fix this.

Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Frank van der Linden <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Tested-by: Balbir Singh <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agonet: qcom/emac: Add missing of_node_put()
YueHaibing [Mon, 11 Jun 2018 13:03:45 +0000 (21:03 +0800)]
net: qcom/emac: Add missing of_node_put()

Add missing of_node_put() call for device node returned by
of_parse_phandle().

Signed-off-by: YueHaibing <[email protected]>
Acked-by: Timur Tabi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
6 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Thu, 14 Jun 2018 23:51:42 +0000 (08:51 +0900)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

 - MM remainders

 - various misc things

 - kcov updates

* emailed patches from Andrew Morton <[email protected]>: (27 commits)
  lib/test_printf.c: call wait_for_random_bytes() before plain %p tests
  hexagon: drop the unused variable zero_page_mask
  hexagon: fix printk format warning in setup.c
  mm: fix oom_kill event handling
  treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX
  mm: use octal not symbolic permissions
  ipc: use new return type vm_fault_t
  sysvipc/sem: mitigate semnum index against spectre v1
  fault-injection: reorder config entries
  arm: port KCOV to arm
  sched/core / kcov: avoid kcov_area during task switch
  kcov: prefault the kcov_area
  kcov: ensure irq code sees a valid area
  kernel/relay.c: change return type to vm_fault_t
  exofs: avoid VLA in structures
  coredump: fix spam with zero VMA process
  fat: use fat_fs_error() instead of BUG_ON() in __fat_get_block()
  proc: skip branch in /proc/*/* lookup
  mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns
  mm/memblock: add missing include <linux/bootmem.h>
  ...

6 years agolib/test_printf.c: call wait_for_random_bytes() before plain %p tests
Thierry Escande [Thu, 14 Jun 2018 22:28:15 +0000 (15:28 -0700)]
lib/test_printf.c: call wait_for_random_bytes() before plain %p tests

If the test_printf module is loaded before the crng is initialized, the
plain 'p' tests will fail because the printed address will not be hashed
and the buffer will contain '(ptrval)' instead.

This patch adds a call to wait_for_random_bytes() before plain 'p' tests
to make sure the crng is initialized.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thierry Escande <[email protected]>
Acked-by: Tobin C. Harding <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: David Miller <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agohexagon: drop the unused variable zero_page_mask
Anshuman Khandual [Thu, 14 Jun 2018 22:28:12 +0000 (15:28 -0700)]
hexagon: drop the unused variable zero_page_mask

Hexagon arch does not seem to have subscribed to _HAVE_COLOR_ZERO_PAGE
framework.  Hence zero_page_mask variable is not needed.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agohexagon: fix printk format warning in setup.c
Randy Dunlap [Thu, 14 Jun 2018 22:28:09 +0000 (15:28 -0700)]
hexagon: fix printk format warning in setup.c

Fix printk format warning in hexagon/kernel/setup.c:

../arch/hexagon/kernel/setup.c: In function 'setup_arch':
../arch/hexagon/kernel/setup.c:69:2: warning: format '%x' expects argument of type 'unsigned int', but argument 2 has type 'long unsigned int' [-Wformat]

where:
extern unsigned long __phys_offset;
#define PHYS_OFFSET __phys_offset

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agomm: fix oom_kill event handling
Roman Gushchin [Thu, 14 Jun 2018 22:28:05 +0000 (15:28 -0700)]
mm: fix oom_kill event handling

Commit e27be240df53 ("mm: memcg: make sure memory.events is uptodate
when waking pollers") converted most of memcg event counters to
per-memcg atomics, which made them less confusing for a user.  The
"oom_kill" counter remained untouched, so now it behaves differently
than other counters (including "oom").  This adds nothing but confusion.

Let's fix this by adding the MEMCG_OOM_KILL event, and follow the
MEMCG_OOM approach.

This also removes a hack from count_memcg_event_mm(), introduced earlier
specially for the OOM_KILL counter.

[[email protected]: fix for droppage of memcg-replace-mm-owner-with-mm-memcg.patch]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Konstantin Khlebnikov <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agotreewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX
Stefan Agner [Thu, 14 Jun 2018 22:28:02 +0000 (15:28 -0700)]
treewide: use PHYS_ADDR_MAX to avoid type casting ULLONG_MAX

With PHYS_ADDR_MAX there is now a type safe variant for all bits set.
Make use of it.

Patch created using a semantic patch as follows:

// <smpl>
@@
typedef phys_addr_t;
@@
-(phys_addr_t)ULLONG_MAX
+PHYS_ADDR_MAX
// </smpl>

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Stefan Agner <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Acked-by: Catalin Marinas <[email protected]> [arm64]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agomm: use octal not symbolic permissions
Joe Perches [Thu, 14 Jun 2018 22:27:58 +0000 (15:27 -0700)]
mm: use octal not symbolic permissions

mm/*.c files use symbolic and octal styles for permissions.

Using octal and not symbolic permissions is preferred by many as more
readable.

https://lkml.org/lkml/2016/8/2/1945

Prefer the direct use of octal for permissions.

Done using
$ scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace mm/*.c
and some typing.

Before:  $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
44
After:  $ git grep -P -w "0[0-7]{3,3}" mm | wc -l
86

Miscellanea:

o Whitespace neatening around these conversions.

Link: http://lkml.kernel.org/r/2e032ef111eebcd4c5952bae86763b541d373469.1522102887.git.joe@perches.com
Signed-off-by: Joe Perches <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agoipc: use new return type vm_fault_t
Souptick Joarder [Thu, 14 Jun 2018 22:27:55 +0000 (15:27 -0700)]
ipc: use new return type vm_fault_t

Use new return type vm_fault_t for fault handler.  For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno.  Once all instances are converted, vm_fault_t will become a
distinct type.

Commit 1c8f422059ae ("mm: change return type to vm_fault_t")

Link: http://lkml.kernel.org/r/20180425043413.GA21467@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <[email protected]>
Reviewed-by: Matthew Wilcox <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Cc: Manfred Spraul <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agosysvipc/sem: mitigate semnum index against spectre v1
Davidlohr Bueso [Thu, 14 Jun 2018 22:27:51 +0000 (15:27 -0700)]
sysvipc/sem: mitigate semnum index against spectre v1

Both smatch and coverity are reporting potential issues with spectre
variant 1 with the 'semnum' index within the sma->sems array, ie:

  ipc/sem.c:388 sem_lock() warn: potential spectre issue 'sma->sems'
  ipc/sem.c:641 perform_atomic_semop_slow() warn: potential spectre issue 'sma->sems'
  ipc/sem.c:721 perform_atomic_semop() warn: potential spectre issue 'sma->sems'

Avoid any possible speculation by using array_index_nospec() thus
ensuring the semnum value is bounded to [0, sma->sem_nsems).  With the
exception of sem_lock() all of these are slowpaths.

Link: http://lkml.kernel.org/r/20180423171131.njs4rfm2yzyeg6do@linux-n805
Signed-off-by: Davidlohr Bueso <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Gustavo A. R. Silva" <[email protected]>
Cc: Manfred Spraul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agofault-injection: reorder config entries
Mikulas Patocka [Thu, 14 Jun 2018 22:27:48 +0000 (15:27 -0700)]
fault-injection: reorder config entries

Reorder Kconfig entries, so that menuconfig displays proper indentation.

Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1804251601160.30569@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Mikulas Patocka <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agoarm: port KCOV to arm
Dmitry Vyukov [Thu, 14 Jun 2018 22:27:44 +0000 (15:27 -0700)]
arm: port KCOV to arm

KCOV is code coverage collection facility used, in particular, by
syzkaller system call fuzzer.  There is some interest in using syzkaller
on arm devices.  So port KCOV to arm.

On implementation level this merely declares that KCOV is supported and
disables instrumentation of 3 special cases.  Reasons for disabling are
commented in code.

Tested with qemu-system-arm/vexpress-a15.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dmitry Vyukov <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Cc: Russell King <[email protected]>
Cc: Abbott Liu <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Koguchi Takuo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agosched/core / kcov: avoid kcov_area during task switch
Mark Rutland [Thu, 14 Jun 2018 22:27:41 +0000 (15:27 -0700)]
sched/core / kcov: avoid kcov_area during task switch

During a context switch, we first switch_mm() to the next task's mm,
then switch_to() that new task.  This means that vmalloc'd regions which
had previously been faulted in can transiently disappear in the context
of the prev task.

Functions instrumented by KCOV may try to access a vmalloc'd kcov_area
during this window, and as the fault handling code is instrumented, this
results in a recursive fault.

We must avoid accessing any kcov_area during this window.  We can do so
with a new flag in kcov_mode, set prior to switching the mm, and cleared
once the new task is live.  Since task_struct::kcov_mode isn't always a
specific enum kcov_mode value, this is made an unsigned int.

The manipulation is hidden behind kcov_{prepare,finish}_switch() helpers,
which are empty for !CONFIG_KCOV kernels.

The code uses macros because I can't use static inline functions without a
circular include dependency between <linux/sched.h> and <linux/kcov.h>,
since the definition of task_struct uses things defined in <linux/kcov.h>

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mark Rutland <[email protected]>
Acked-by: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agokcov: prefault the kcov_area
Mark Rutland [Thu, 14 Jun 2018 22:27:37 +0000 (15:27 -0700)]
kcov: prefault the kcov_area

On many architectures the vmalloc area is lazily faulted in upon first
access.  This is problematic for KCOV, as __sanitizer_cov_trace_pc
accesses the (vmalloc'd) kcov_area, and fault handling code may be
instrumented.  If an access to kcov_area faults, this will result in
mutual recursion through the fault handling code and
__sanitizer_cov_trace_pc(), eventually leading to stack corruption
and/or overflow.

We can avoid this by faulting in the kcov_area before
__sanitizer_cov_trace_pc() is permitted to access it.  Once it has been
faulted in, it will remain present in the process page tables, and will
not fault again.

[[email protected]: code cleanup]
[[email protected]: add comment explaining kcov_fault_in_area()]
[[email protected]: fancier code comment from Mark]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mark Rutland <[email protected]>
Acked-by: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agokcov: ensure irq code sees a valid area
Mark Rutland [Thu, 14 Jun 2018 22:27:34 +0000 (15:27 -0700)]
kcov: ensure irq code sees a valid area

Patch series "kcov: fix unexpected faults".

These patches fix a few issues where KCOV code could trigger recursive
faults, discovered while debugging a patch enabling KCOV for arch/arm:

* On CONFIG_PREEMPT kernels, there's a small race window where
  __sanitizer_cov_trace_pc() can see a bogus kcov_area.

* Lazy faulting of the vmalloc area can cause mutual recursion between
  fault handling code and __sanitizer_cov_trace_pc().

* During the context switch, switching the mm can cause the kcov_area to
  be transiently unmapped.

These are prerequisites for enabling KCOV on arm, but the issues
themsevles are generic -- we just happen to avoid them by chance rather
than design on x86-64 and arm64.

This patch (of 3):

For kernels built with CONFIG_PREEMPT, some C code may execute before or
after the interrupt handler, while the hardirq count is zero.  In these
cases, in_task() can return true.

A task can be interrupted in the middle of a KCOV_DISABLE ioctl while it
resets the task's kcov data via kcov_task_init().  Instrumented code
executed during this period will call __sanitizer_cov_trace_pc(), and as
in_task() returns true, will inspect t->kcov_mode before trying to write
to t->kcov_area.

In kcov_init_task() we update t->kcov_{mode,area,size} with plain stores,
which may be re-ordered, torn, etc.  Thus __sanitizer_cov_trace_pc() may
see bogus values for any of these fields, and may attempt to write to
memory which is not mapped.

Let's avoid this by using WRITE_ONCE() to set t->kcov_mode, with a
barrier() to ensure this is ordered before we clear t->kov_{area,size}.
This ensures that any code execute while kcov_init_task() is preempted
will either see valid values for t->kcov_{area,size}, or will see that
t->kcov_mode is KCOV_MODE_DISABLED, and bail out without touching
t->kcov_area.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mark Rutland <[email protected]>
Acked-by: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agokernel/relay.c: change return type to vm_fault_t
Souptick Joarder [Thu, 14 Jun 2018 22:27:31 +0000 (15:27 -0700)]
kernel/relay.c: change return type to vm_fault_t

Use new return type vm_fault_t for fault handler.  For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno.  Once all instances are converted, vm_fault_t will become a
distinct type.

commit 1c8f422059ae ("mm: change return type to vm_fault_t")

Link: http://lkml.kernel.org/r/20180510140335.GA25363@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <[email protected]>
Reviewed-by: Matthew Wilcox <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Eric Biggers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agoexofs: avoid VLA in structures
Kees Cook [Thu, 14 Jun 2018 22:27:27 +0000 (15:27 -0700)]
exofs: avoid VLA in structures

On the quest to remove all VLAs from the kernel[1] this adjusts several
cases where allocation is made after an array of structures that points
back into the allocation.  The allocations are changed to perform
explicit calculations instead of using a Variable Length Array in a
structure.

Additionally, this lets Clang compile this code now, since Clang does
not support VLAIS[2].

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
[2] https://lkml.kernel.org/r/CA+55aFy6h1c3_rP_bXFedsTXzwW+9Q9MfJaW7GUmMBrAp-fJ9A@mail.gmail.com

[[email protected]: v2]
Link: http://lkml.kernel.org/r/20180418163546.GA45794@beast
Link: http://lkml.kernel.org/r/20180327203904.GA1151@beast
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Cc: Boaz Harrosh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agocoredump: fix spam with zero VMA process
Alexey Dobriyan [Thu, 14 Jun 2018 22:27:24 +0000 (15:27 -0700)]
coredump: fix spam with zero VMA process

Nobody ever tried to self destruct by unmapping whole address space at
once:

munmap((void *)0, (1ULL << 47) - 4096);

Doing this produces 2 warnings for zero-length vmalloc allocations:

  a.out[1353]: segfault at 7f80bcc4b757 ip 00007f80bcc4b757 sp 00007fff683939b8 error 14
  a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null)
...
  a.out: vmalloc: allocation failure: 0 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null)
...

Fix is to switch to kvmalloc().

Steps to reproduce:

// vsyscall=none
#include <sys/mman.h>
#include <sys/resource.h>
int main(void)
{
setrlimit(RLIMIT_CORE, &(struct rlimit){RLIM_INFINITY, RLIM_INFINITY});
munmap((void *)0, (1ULL << 47) - 4096);
return 0;
}

Link: http://lkml.kernel.org/r/20180410180353.GA2515@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agofat: use fat_fs_error() instead of BUG_ON() in __fat_get_block()
OGAWA Hirofumi [Thu, 14 Jun 2018 22:27:21 +0000 (15:27 -0700)]
fat: use fat_fs_error() instead of BUG_ON() in __fat_get_block()

If file size and FAT cluster chain is not matched (corrupted image), we
can hit BUG_ON(!phys) in __fat_get_block().

So, use fat_fs_error() instead.

[[email protected]: fix printk warning]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: OGAWA Hirofumi <[email protected]>
Reported-by: Anatoly Trosinenko <[email protected]>
Tested-by: Anatoly Trosinenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agoproc: skip branch in /proc/*/* lookup
Alexey Dobriyan [Thu, 14 Jun 2018 22:27:17 +0000 (15:27 -0700)]
proc: skip branch in /proc/*/* lookup

Code is structured like this:

for ( ... p < last; p++) {
if (memcmp == 0)
break;
}
if (p >= last)
ERROR
OK

gcc doesn't see that if if lookup succeeds than post loop branch will
never be taken and skip it.

[[email protected]: proc_pident_instantiate() no longer takes an inode*]
Link: http://lkml.kernel.org/r/20180423213954.GD9043@avx2
Signed-off-by: Alexey Dobriyan <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agomremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns
Mel Gorman [Thu, 14 Jun 2018 22:26:41 +0000 (15:26 -0700)]
mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns

Commit 5d1904204c99 ("mremap: fix race between mremap() and page
cleanning") fixed races between mremap and other operations for both
file-backed and anonymous mappings.  The file-backed was the most
critical as it allowed the possibility that data could be changed on a
physical page after page_mkclean returned which could trigger data loss
or data integrity issues.

A customer reported that the cost of the TLBs for anonymous regressions
was excessive and resulting in a 30-50% drop in performance overall
since this commit on a microbenchmark.  Unfortunately I neither have
access to the test-case nor can I describe what it does other than
saying that mremap operations dominate heavily.

This patch removes the LATENCY_LIMIT to handle TLB flushes on a PMD
boundary instead of every 64 pages to reduce the number of TLB
shootdowns by a factor of 8 in the ideal case.  LATENCY_LIMIT was almost
certainly used originally to limit the PTL hold times but the latency
savings are likely offset by the cost of IPIs in many cases.  This patch
is not reported to completely restore performance but gets it within an
acceptable percentage.  The given metric here is simply described as
"higher is better".

Baseline that was known good
002:  Metric:       91.05
004:  Metric:      109.45
008:  Metric:       73.08
016:  Metric:       58.14
032:  Metric:       61.09
064:  Metric:       57.76
128:  Metric:       55.43

Current
001:  Metric:       54.98
002:  Metric:       56.56
004:  Metric:       41.22
008:  Metric:       35.96
016:  Metric:       36.45
032:  Metric:       35.71
064:  Metric:       35.73
128:  Metric:       34.96

With patch
001:  Metric:       61.43
002:  Metric:       81.64
004:  Metric:       67.92
008:  Metric:       51.67
016:  Metric:       50.47
032:  Metric:       52.29
064:  Metric:       50.01
128:  Metric:       49.04

So for low threads, it's not restored but for larger number of threads,
it's closer to the "known good" baseline.

Using a different mremap-intensive workload that is not representative
of the real workload there is little difference observed outside of
noise in the headline metrics However, the TLB shootdowns are reduced by
11% on average and at the peak, TLB shootdowns were reduced by 21%.
Interrupts were sampled every second while the workload ran to get those
figures.  It's known that the figures will vary as the
non-representative load is non-deterministic.

An alternative patch was posted that should have significantly reduced
the TLB flushes but unfortunately it does not perform as well as this
version on the customer test case.  If revisited, the two patches can
stack on top of each other.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Aaron Lu <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
6 years agomm/memblock: add missing include <linux/bootmem.h>
Mathieu Malaterre [Thu, 14 Jun 2018 22:26:38 +0000 (15:26 -0700)]
mm/memblock: add missing include <linux/bootmem.h>

Commit 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis")
introduced two new function definitions:

  memblock_virt_alloc_try_nid_nopanic()
  memblock_virt_alloc_try_nid()

Commit ea1f5f3712af ("mm: define memblock_virt_alloc_try_nid_raw")
introduced the following function definition:

  memblock_virt_alloc_try_nid_raw()

This commit adds an includeof header file <linux/bootmem.h> to provide
the missing function prototypes.  Silence the following gcc warning
(W=1):

  mm/memblock.c:1334:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_raw' [-Wmissing-prototypes]
  mm/memblock.c:1371:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_nopanic' [-Wmissing-prototypes]
  mm/memblock.c:1407:15: warning: no previous prototype for `memblock_virt_alloc_try_nid' [-Wmissing-prototypes]

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mathieu Malaterre <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
This page took 0.123092 seconds and 4 git commands to generate.