Git Repo - linux.git/log

phy: realtek: use new helpers for paged register access

Make use of the new helpers for paged register access.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'phy-add-helpers-for-setting-clearing-bits-in-PHY-registers'

Heiner Kallweit says:

====================
phy: add helpers for setting/clearing bits in PHY registers

Based on the recent introduction of phy_modify add helpers for setting
and clearing bits in PHY registers. First user is phylib.
====================

Signed-off-by: David S. Miller <[email protected]>

phy: use new helpers phy_set_bits/phy_clear_bits in phylib

Use new helpers phy_set_bits / phy_clear_bits in phylib.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

phy: add helpers for setting/clearing bits in PHY registers

Based on the recent introduction of phy_modify add helpers for setting
and clearing bits in PHY registers.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Fix pending MAC address changes

Due to architecture limitations, the IBM VNIC client driver is unable
to perform MAC address changes unless the device has "logged in" to
its backing device. Currently, pending MAC changes are handled before
login, resulting in an error and failure to change the MAC address.
Moving that chunk to the end of the ibmvnic_login function, when we are
sure that it was successful, fixes that.

The MAC address can be changed when the device is up or down, so
only check if the device is in a "PROBED" state before setting the
MAC address.

Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters")
Signed-off-by: Thomas Falcon <[email protected]>
Reviewed-by: John Allen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bpftool: recognize BPF_PROG_TYPE_CGROUP_DEVICE programs

Bpftool doesn't recognize BPF_PROG_TYPE_CGROUP_DEVICE programs,
so the prog show command prints the numeric type value:

$ bpftool prog show
1: type 15  name bpf_prog1  tag ac9f93dbfd6d9b74
loaded_at Jan 15/07:58  uid 0
xlated 96B  jited 105B  memlock 4096B

This patch defines the corresponding textual representation:

$ bpftool prog show
1: cgroup_device  name bpf_prog1  tag ac9f93dbfd6d9b74
loaded_at Jan 15/07:58  uid 0
xlated 96B  jited 105B  memlock 4096B

Signed-off-by: Roman Gushchin <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Quentin Monnet <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

RDMA/mlx5: Fix out-of-bound access while querying AH

The rdma_ah_find_type() accesses the port array based on an index
controlled by userspace. The existing bounds check is after the first use
of the index, so userspace can generate an out of bounds access, as shown
by the KASN report below.

==================================================================
BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0xa8/0x3b0
Read of size 4 at addr ffff880019ae2268 by task ibv_rc_pingpong/409

CPU: 0 PID: 409 Comm: ibv_rc_pingpong Not tainted 4.15.0-rc2-00031-gb60a3faf5b83-dirty #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
dump_stack+0xe9/0x18f
print_address_description+0xa2/0x350
kasan_report+0x3a5/0x400
to_rdma_ah_attr+0xa8/0x3b0
mlx5_ib_query_qp+0xd35/0x1330
ib_query_qp+0x8a/0xb0
ib_uverbs_query_qp+0x237/0x7f0
ib_uverbs_write+0x617/0xd80
__vfs_write+0xf7/0x500
vfs_write+0x149/0x310
SyS_write+0xca/0x190
entry_SYSCALL_64_fastpath+0x18/0x85
RIP: 0033:0x7fe9c7a275a0
RSP: 002b:00007ffee5498738 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fe9c7ce4b00 RCX: 00007fe9c7a275a0
RDX: 0000000000000018 RSI: 00007ffee5498800 RDI: 0000000000000003
RBP: 000055d0c8d3f010 R08: 00007ffee5498800 R09: 0000000000000018
R10: 00000000000000ba R11: 0000000000000246 R12: 0000000000008000
R13: 0000000000004fb0 R14: 000055d0c8d3f050 R15: 00007ffee5498560

Allocated by task 1:
__kmalloc+0x3f9/0x430
alloc_mad_private+0x25/0x50
ib_mad_post_receive_mads+0x204/0xa60
ib_mad_init_device+0xa59/0x1020
ib_register_device+0x83a/0xbc0
mlx5_ib_add+0x50e/0x5c0
mlx5_add_device+0x142/0x410
mlx5_register_interface+0x18f/0x210
mlx5_ib_init+0x56/0x63
do_one_initcall+0x15b/0x270
kernel_init_freeable+0x2d8/0x3d0
kernel_init+0x14/0x190
ret_from_fork+0x24/0x30

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff880019ae2000
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 104 bytes to the right of
512-byte region [ffff880019ae2000, ffff880019ae2200)
The buggy address belongs to the page:
page:000000005d674e18 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 0000000000000000 0000000000000000 00000001000c000c
raw: dead000000000100 dead000000000200 ffff88001a402000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff880019ae2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880019ae2180: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>ffff880019ae2200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff880019ae2280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880019ae2300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint

Cc: <[email protected]>
Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

Merge tag 'linux-can-next-for-4.16-20180105' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2017-12-01,Re: pull-request: can-next

this is a pull request of 7 patches for net-next/master.

All patches are by me. Patch 6 is for the "can_raw" protocol and add
error checking to the bind() function. All other patches clean up the
coding style and remove unused parameters in various CAN drivers and
infrastructure.
====================

Signed-off-by: David S. Miller <[email protected]>

netlink: extack: avoid parenthesized string constant warning

NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning
in newer versions of gcc:
warning: array initialized from parenthesized string constant

Just remove the parentheses, they're not needed in this context since
anyway since there can be no operator precendence issues or similar.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'sh_eth-simplify-TSU-initialization'

Sergei Shtylyov says:

====================
sh_eth: simplify TSU initialization

Here's a set of 2 patches against DaveM's 'net-next.git' repo. With those,
I'm somewhat simplifying the TSU init code in the driver probe() method...
====================

Signed-off-by: David S. Miller <[email protected]>

sh_eth: get Ether port # only when needed

The dual-port Ether configurations always have a shared TSU to e.g. pass
the packets between those ports. With the TSU init. code gathered under
the single *if*, we now can only get the port # from 'platform_device::id'
only when we actually need it (and not recalculate it each time)...

Signed-off-by: Sergei Shtylyov <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sh_eth: gather all TSU init code in one place

The  sh_eth_cpu_data::chip_reset() method  always resets using ARSTR and
this register is always located at the start of the  TSU register region.
Therefore, we can  only call  this method if we know TSU is there and thus
simplify  the probing code a  bit...

Signed-off-by: Sergei Shtylyov <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ipv4-Make-neigh-lookup-keys-for-loopback-point-to-point-devices-be-INADDR_ANY'

Jim Westfall says:

====================
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY

This used to be the previous behavior in older kernels but became broken in
a263b3093641f (ipv4: Make neigh lookups directly in output packet path)
and then later removed because it was broken in 0bb4087cbec0 (ipv4: Fix neigh
lookup keying over loopback/point-to-point devices)

Not having this results in there being an arp entry for every remote ip
address that the device talks to.  Given a fairly active device it can
cause the arp table to become huge and/or having to add/purge large number
of entires to keep within table size thresholds.

$ ip -4 neigh show nud noarp | grep tun | wc -l
55850

$ lnstat -k arp_cache:entries,arp_cache:allocs,arp_cache:destroys -c 10
arp_cach|arp_cach|arp_cach|
entries|  allocs|destroys|
   81493|620166816|620126069|
  101867|   10186|       0|
  113854|    5993|       0|
  118773|    2459|       0|
   27937|   18579|   63998|
   39256|    5659|       0|
   56231|    8487|       0|
   65602|    4685|       0|
   79697|    7047|       0|
   90733|    5517|       0|

v2:
- fixes coding style issues
====================

Signed-off-by: David S. Miller <[email protected]>

ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY

Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.

This used the be the old behavior but became broken in a263b3093641f
(ipv4: Make neigh lookups directly in output packet path) and later removed
in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point
devices) because it was broken.

Signed-off-by: Jim Westfall <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Allow neigh contructor functions ability to modify the primary_key

Use n->primary_key instead of pkey to account for the possibility that a neigh
constructor function may have modified the primary_key value.

Signed-off-by: Jim Westfall <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sh_eth: fix dumping ARSTR

ARSTR  is always located at the start of the TSU register region, thus
using add_reg()  instead of add_tsu_reg() in __sh_eth_get_regs() to dump it
causes EDMR or EDSR (depending on the register layout) to be dumped instead
of ARSTR.  Use the correct condition/macro there...

Fixes: 6b4b4fead342 ("sh_eth: Implement ethtool register dump operations")
Signed-off-by: Sergei Shtylyov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'wireless-drivers-next-for-davem-2018-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.16

Here are patches which have been accumulating over the holidays and
after the New Year. Business as usual and nothing special really
standing out.

But what's noteworthy here is that Larry Finger is stepping down as
the rtlwifi maintainer. He has been maintaining rtlwifi since it was
applied back in 2010 in commit 0c8173385e54 ("rtl8192ce: Add new
driver") and it has been no easy role trying to juggle between the
vendor, demanding upstream community and users. So big thank you to
Larry for all his efforts!

ath10k

* more preparation work for wcn3990 support

* add memory dump to firmware coredump files

wil6210

* support scheduled scan

* support 40-bit DMA addresses

qtnfmac

* support MAC address based access control

* support for radar detection and Channel Availibility Check (CAC)

mwifiex

* firmware coredump for usb devices

rtlwifi

* Larry Finger steps down as the maintainer and Ping-Ke Shih becomes
the new maintainer

* add debugfs interfaces to dump register and btcoex status, and also
write registers and h2c
====================

Signed-off-by: David S. Miller <[email protected]>

net: ethernet: Add a driver for Gemini gigabit ethernet

The Gemini ethernet has been around for years as an out-of-tree
patch used with the NAS boxen and routers built on StorLink
SL3512 and SL3516, later Storm Semiconductor, later Cortina
Systems. These ASICs are still being deployed and brand new
off-the-shelf systems using it can easily be acquired.

The full name of the IP block is "Net Engine and Gigabit
Ethernet MAC" commonly just called "GMAC".

The hardware block contains a common TCP Offload Enginer (TOE)
that can be used by both MACs. The current driver does not use
it.

Cc: Tobias Waldvogel <[email protected]>
Signed-off-by: Michał Mirosław <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: Add DT bindings for the Gemini ethernet

This adds the device tree bindings for the Gemini ethernet
controller. It is pretty straight-forward, using standard
bindings and modelling the two child ports as child devices
under the parent ethernet controller device.

Cc: [email protected]
Cc: Tobias Waldvogel <[email protected]>
Cc: Michał Mirosław <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "openvswitch: Add erspan tunnel support."

This reverts commit ceaa001a170e43608854d5290a48064f57b565ed.

The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed
as a nested attribute to support all ERSPAN v1 and v2's fields.
The current attr is a be32 supporting only one field. Thus, this
patch reverts it and later patch will redo it using nested attr.

Signed-off-by: William Tu <[email protected]>
Cc: Jiri Benc <[email protected]>
Cc: Pravin Shelar <[email protected]>
Acked-by: Jiri Benc <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: Fix build with gcc-4.4.5

Emil reported the following compiler errors:

net/ipv6/route.c: In function `rt6_sync_up`:
net/ipv6/route.c:3586: error: unknown field `nh_flags` specified in initializer
net/ipv6/route.c:3586: warning: missing braces around initializer
net/ipv6/route.c:3586: warning: (near initialization for `arg.<anonymous>`)
net/ipv6/route.c: In function `rt6_sync_down_dev`:
net/ipv6/route.c:3695: error: unknown field `event` specified in initializer
net/ipv6/route.c:3695: warning: missing braces around initializer
net/ipv6/route.c:3695: warning: (near initialization for `arg.<anonymous>`)

Problem is with the named initializers for the anonymous union members.
Fix this by adding curly braces around the initialization.

Fixes: 4c981e28d373 ("ipv6: Prepare to handle multiple netdev events")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-by: Emil S Tantilov <[email protected]>
Tested-by: Emil S Tantilov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: fix bug during lookup of multicast destination nodes

In commit 232d07b74a33 ("tipc: improve groupcast scope handling") we
inadvertently broke non-group multicast transmission when changing the
parameter 'domain' to 'scope' in the function
tipc_nametbl_lookup_dst_nodes(). We missed to make the corresponding
change in the calling function, with the result that the lookup always
fails.

A closer anaysis reveals that this parameter is not needed at all.
Non-group multicast is hard coded to use CLUSTER_SCOPE, and in the
current implementation this will be delivered to all matching
destinations except those which are published with NODE_SCOPE on other
nodes. Since such publications never will be visible on the sending node
anyway, it makes no sense to discriminate by scope at all.

We now remove this parameter altogether.

Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Convert atomic_t net::count to refcount_t

Since net could be obtained from RCU lists,
and there is a race with net destruction,
the patch converts net::count to refcount_t.

This provides sanity checks for the cases of
incrementing counter of already dead net,
when maybe_get_net() has to used instead
of get_net().

Drivers: allyesconfig and allmodconfig are OK.

Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Kirill Tkhai <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/tls: Fix inverted error codes to avoid endless loop

sendfile() calls can hang endless with using Kernel TLS if a socket error occurs.
Socket error codes must be inverted by Kernel TLS before returning because
they are stored with positive sign. If returned non-inverted they are
interpreted as number of bytes sent, causing endless looping of the
splice mechanic behind sendfile().

Signed-off-by: Robert Hering <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: ip6_make_skb() needs to clear cork.base.dst

In my last patch, I missed fact that cork.base.dst was not initialized
in ip6_make_skb() :

If ip6_setup_cork() returns an error, we might attempt a dst_release()
on some random pointer.

Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y

I regularly get 50 MB - 60 MB files during kernel randconfig builds.
These large files mostly contain (many repeats of; e.g., 124,594):

In file included from ../include/linux/string.h:6:0,
                 from ../include/linux/uuid.h:20,
                 from ../include/linux/mod_devicetable.h:13,
                 from ../scripts/mod/devicetable-offsets.c:3:
../include/linux/compiler.h:64:4: warning: '______f' is static but declared in inline function 'strcpy' which is not static [enabled by default]
    ______f = {     \
    ^
../include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
                       ^
../include/linux/string.h:425:2: note: in expansion of macro 'if'
  if (p_size == (size_t)-1 && q_size == (size_t)-1)
  ^

This only happens when CONFIG_FORTIFY_SOURCE=y and
CONFIG_PROFILE_ALL_BRANCHES=y, so prevent PROFILE_ALL_BRANCHES if
FORTIFY_SOURCE=y.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

sctp: removed unused var from sctp_make_auth

Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: avoid compiler warning on implicit fallthru

These fall-through are expected.

Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Neil Horman <[email protected]>
Reviewed-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ipv4: Make "ip route get" match iif lo rules again.

Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu
versions of route lookup") broke "ip route get" in the presence
of rules that specify iif lo.

Host-originated traffic always has iif lo, because
ip_route_output_key_hash and ip6_route_output_flags set the flow
iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a
convenient way to select only originated traffic and not forwarded
traffic.

inet_rtm_getroute used to match these rules correctly because
even though it sets the flow iif to 0, it called
ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX.
But now that it calls ip_route_output_key_hash_rcu, the ifindex
will remain 0 and not match the iif lo in the rule. As a result,
"ip route get" will return ENETUNREACH.

Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again
Signed-off-by: Lorenzo Colitti <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netlink: extack needs to be reset each time through loop

syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value.
The problem is that netlink_rcv_skb loops over the skb repeatedly invoking
the callback and without resetting the extack leaving potentially stale
data. Initializing each time through avoids the WARN_ON.

Fixes: 2d4bc93368f5a ("netlink: extended ACK reporting")
Reported-by: [email protected]
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: fix a memory leak in tipc_nl_node_get_link()

When tipc_node_find_by_name() fails, the nlmsg is not
freed.

While on it, switch to a goto label to properly
free it.

Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node")
Reported-by: Dmitry Vyukov <[email protected]>
Cc: Jon Maloy <[email protected]>
Cc: Ying Xue <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Acked-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: remove parameter new_link from phy_mac_interrupt()

I see two issues with parameter new_link:

1. It's not needed. See also phy_interrupt(), works w/o this parameter.
   phy_mac_interrupt sets the state to PHY_CHANGELINK and triggers the
   state machine which then calls phy_read_status. And phy_read_status
   updates the link state.

2. phy_mac_interrupt is used in interrupt context and getting the link
   state may sleep (at least when having to access the PHY registers
   via MDIO bus).

So let's remove it.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tipc: fix a potental access after delete in tipc_sk_join()

In commit d12d2e12cec2 "tipc: send out join messages as soon as new
member is discovered") we added a call to the function tipc_group_join()
without considering the case that the preceding tipc_sk_publish() might
have failed, and the group item already deleted.

We fix this by returning from tipc_sk_join() directly after the
failed tipc_sk_publish.

Reported-by: [email protected]
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: fix udpv6 sendmsg crash caused by too small MTU

The logic in __ip6_append_data() assumes that the MTU is at least large
enough for the headers.  A device's MTU may be adjusted after being
added while sendmsg() is processing data, resulting in
__ip6_append_data() seeing any MTU.  For an mtu smaller than the size of
the fragmentation header, the math results in a negative 'maxfraglen',
which causes problems when refragmenting any previous skb in the
skb_write_queue, leaving it possibly malformed.

Instead sendmsg returns EINVAL when the mtu is calculated to be less
than IPV6_MIN_MTU.

Found by syzkaller:
kernel BUG at ./include/linux/skbuff.h:2064!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801d0b68580 task.stack: ffff8801ac6b8000
RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline]
RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617
RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216
RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000
RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0
RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000
R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8
R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000
FS:  00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
ip6_finish_skb include/net/ipv6.h:911 [inline]
udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093
udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
SYSC_sendto+0x352/0x5a0 net/socket.c:1750
SyS_sendto+0x40/0x50 net/socket.c:1718
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9
RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005
RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c
R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69
R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000
Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba
RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570
RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570

Reported-by: syzbot <[email protected]>
Signed-off-by: Mike Maloney <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: cs89x0: add MODULE_LICENSE

This driver lacks a MODULE_LICENSE tag, leading to a Kbuild warning:

WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/cirrus/cs89x0.o

This adds license, author, and description according to the
comment block at the start of the file.

Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ppp: unlock all_ppp_mutex before registering device

ppp_dev_uninit(), which is the .ndo_uninit() handler of PPP devices,
needs to lock pn->all_ppp_mutex. Therefore we mustn't call
register_netdevice() with pn->all_ppp_mutex already locked, or we'd
deadlock in case register_netdevice() fails and calls .ndo_uninit().

Fortunately, we can unlock pn->all_ppp_mutex before calling
register_netdevice(). This lock protects pn->units_idr, which isn't
used in the device registration process.

However, keeping pn->all_ppp_mutex locked during device registration
did ensure that no device in transient state would be published in
pn->units_idr. In practice, unlocking it before calling
register_netdevice() doesn't change this property: ppp_unit_register()
is called with 'ppp_mutex' locked and all searches done in
pn->units_idr hold this lock too.

Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion")
Reported-and-tested-by: [email protected]
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ptr_ring: document usage around __ptr_ring_peek

This explains why is the net usage of __ptr_ring_peek
actually ok without locks.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'dsa-lan9303-check-error-value-from-devm_gpiod_get_optional'

Phil Reid says:

====================
net: dsa: lan9303: check error value from devm_gpiod_get_optional()

Errors need to be prograted back from probe.

Note: I have only compile tested the code as I don't have the hardware.
Egil Hjelmeland <[email protected]> has tested it but I haven't
added at Test-by: wasn't in the standard form. Not sure if that's ok or
not.

Changes from v1:
- rebased on net-next
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: lan9303: check error value from devm_gpiod_get_optional()

devm_gpiod_get_optional() can return an error in addition to a NULL ptr.
Check for error and propagate that to the probe function. Check return
value in probe. This will now handle EPROBE_DEFER for the reset gpio.

Signed-off-by: Phil Reid <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: lan9303: make lan9303_handle_reset() a void function

lan9303_handle_reset never returns anything other than success.
So there's not need for it to return an error code.

Signed-off-by: Phil Reid <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

9p: add missing module license for xen transport

The 9P of Xen module is missing required license and module information.
See https://bugzilla.kernel.org/show_bug.cgi?id=198109

Reported-by: Alan Bartlett <[email protected]>
Fixes: 868eb122739a ("xen/9pfs: introduce Xen 9pfs transport driver")
Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: Have __phy_modify return 0 on success

__phy_modify would return the old value of the register before it was
modified. Thus on success, it does not return 0, but a positive value.
Thus functions using phy_modify, which is a wrapper around
__phy_modify, can start returning > 0 on success, rather than 0. As a
result, breakage has been noticed in various places, where 0 was
assumed.

Code inspection does not find any current location where the return of
the old value is currently used. So have __phy_modify return 0 on
success. When there is a real need for the old value, either a new
accessor can be added, or an additional parameter passed.

Fixes: fea23fb591cc ("net: phy: convert read-modify-write to phy_modify()")
Fixes: 2b74e5be17d2 ("net: phy: add phy_modify() accessor")
Reported-by: Geert Uytterhoeven <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Tested-by: Niklas Cassel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ring-buffer: Bring back context level recursive checks

Commit 1a149d7d3f45 ("ring-buffer: Rewrite trace_recursive_(un)lock() to be
simpler") replaced the context level recursion checks with a simple counter.
This would prevent the ring buffer code from recursively calling itself more
than the max number of contexts that exist (Normal, softirq, irq, nmi). But
this change caused a lockup in a specific case, which was during suspend and
resume using a global clock. Adding a stack dump to see where this occurred,
the issue was in the trace global clock itself:

  trace_buffer_lock_reserve+0x1c/0x50
  __trace_graph_entry+0x2d/0x90
  trace_graph_entry+0xe8/0x200
  prepare_ftrace_return+0x69/0xc0
  ftrace_graph_caller+0x78/0xa8
  queued_spin_lock_slowpath+0x5/0x1d0
  trace_clock_global+0xb0/0xc0
  ring_buffer_lock_reserve+0xf9/0x390

The function graph tracer traced queued_spin_lock_slowpath that was called
by trace_clock_global. This pointed out that the trace_clock_global() is not
reentrant, as it takes a spin lock. It depended on the ring buffer recursive
lock from letting that happen.

By removing the context detection and adding just a max number of allowable
recursions, it allowed the trace_clock_global() to be entered again and try
to retake the spinlock it already held, causing a deadlock.

Fixes: 1a149d7d3f45 ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler")
Reported-by: David Weinehall <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

cfg80211: check dev_set_name() return value

syzbot reported a warning from rfkill_alloc(), and after a while
I think that the reason is that it was doing fault injection and
the dev_set_name() failed, leaving the name NULL, and we didn't
check the return value and got to rfkill_alloc() with a NULL name.
Since we really don't want a NULL name, we ought to check the
return value.

Fixes: fb28ad35906a ("net: struct device - replace bus_id with dev_name(), dev_set_name()")
Reported-by: [email protected]
Signed-off-by: Johannes Berg <[email protected]>

mac80211_hwsim: validate number of different channels

When creating a new radio on the fly, hwsim allows this
to be done with an arbitrary number of channels, but
cfg80211 only supports a limited number of simultaneous
channels, leading to a warning.

Fix this by validating the number - this requires moving
the define for the maximum out to a visible header file.

Reported-by: [email protected]
Fixes: b59ec8dd4394 ("mac80211_hwsim: fix number of channels in interface combinations")
Signed-off-by: Johannes Berg <[email protected]>

mac80211_hwsim: add workqueue to wait for deferred radio deletion on mod unload

When closing multiple wmediumd instances with many radios and try to
unload the mac80211_hwsim module, it may happen that the work items live
longer than the module. To wait especially for this deletion work items,
add a work queue, otherwise flush_scheduled_work would be necessary.

Signed-off-by: Benjamin Beichler <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

nl80211: take RCU read lock when calling ieee80211_bss_get_ie()

As ieee80211_bss_get_ie() derefences an RCU to return ssid_ie, both
the call to this function and any operation on this variable need
protection by the RCU read lock.

Fixes: 44905265bc15 ("nl80211: don't expose wdev->ssid for most interfaces")
Signed-off-by: Dominik Brodowski <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

cfg80211: fully initialize old channel for event

Paul reported that he got a report about undefined behaviour
that seems to me to originate in using uninitialized memory
when the channel structure here is used in the event code in
nl80211 later.

He never reported whether this fixed it, and I wasn't able
to trigger this so far, but we should do the right thing and
fully initialize the on-stack structure anyway.

Reported-by: Paul Menzel <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

Linux 4.15-rc8

Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixlet from Thomas Gleixner.

Remove a warning about lack of compiler support for retpoline that most
people can't do anything about, so it just annoys them needlessly.

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Remove compile time warning

Merge tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
"One fix for an oops at boot if we take a hotplug interrupt before we
  are ready to handle it.

  The bulk is patches to implement mitigation for Meltdown, see the
  change logs for more details.

  Thanks to: Nicholas Piggin, Michael Neuling, Oliver O'Halloran, Jon
  Masters, Jose Ricardo Ziviani, David Gibson"

* tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Check device-tree for RFI flush settings
  powerpc/pseries: Query hypervisor for RFI flush settings
  powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti
  powerpc/64s: Add support for RFI flush of L1-D cache
  powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL
  powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL
  powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL
  powerpc/64s: Simple RFI macro conversions
  powerpc/64: Add macros for annotating the destination of rfid/hrfid
  powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper
  powerpc/pseries: Make RAS IRQ explicitly dependent on DLPAR WQ

Merge branch 'bpf-nfp-map-offload'

Jakub Kicinski says:

====================
This set adds support for creating maps on networking devices.  BPF is
programs+maps, the pure program offload has been around for quite some
time, this patchset adds the map part of the equation.

Maps are allocated on the target device from the start.  There is no
host copy when map is created on the device.  Device maps are represented
by struct bpf_offloaded_map, regardless of type.  Host programs can't
access such maps, access is only possible from a program also loaded
to the same device and/or via the BPF syscall.

Offloaded programs are currently only allowed to perform lookups,
control plane is responsible for populating the maps.

For brevity only infrastructure and basic NFP patches are included.
Target device reporting, netdevsim and tests will follow up as well as
some further optimizations to the NFP code.

v2:
- leave out the array maps, we will add them trivially later to avoid
   merge conflicts with ongoing spectere&meltdown mitigations.
====================

Signed-off-by: Daniel Borkmann <[email protected]>

nfp: bpf: implement bpf map offload

Plug in to the stack's map offload callbacks for BPF map offload.
Get next call needs some special handling on the FW side, since
we can't send a NULL pointer to the FW there is a get first entry
FW command.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

nfp: bpf: add support for reading map memory

Map memory needs to use 40 bit addressing. Add handling of such
accesses. Since 40 bit addresses are formed by using both 32 bit
operands we need to pre-calculate the actual address instead of
adding in the offset inside the instruction, like we did in 32 bit
mode.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

nfp: bpf: add verification and codegen for map lookups

Verify our current constraints on the location of the key are
met and generate the code for calling map lookup on the datapath.

New relocation types have to be added - for helpers and return
addresses.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

nfp: bpf: add helpers for updating immediate instructions

Immediate loads are used to load the return address of a helper.
We need to be able to update those loads for relocations.
Immediate loads can be slightly more complex and spread over
two instructions in general, but here we only care about simple
loads of small (< 65k) constants, so complex cases are not handled.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

nfp: bpf: parse function call and map capabilities

Parse helper function and supported map FW TLV capabilities.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

nfp: bpf: implement helpers for FW map ops

Implement calls for FW map communication.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

nfp: bpf: add basic control channel communication

For map support we will need to send and receive control messages.
Add basic support for sending a message to FW, and waiting for a
reply.

Control messages are tagged with a 16 bit ID. Add a simple ID
allocator and make sure we don't allow too many messages in flight,
to avoid request <> reply mismatches.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

nfp: bpf: add map data structure

To be able to split code into reasonable chunks we need to add
the map data structures already. Later patches will add code
piece by piece.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: offload: add map offload infrastructure

BPF map offload follow similar path to program offload.  At creation
time users may specify ifindex of the device on which they want to
create the map.  Map will be validated by the kernel's
.map_alloc_check callback and device driver will be called for the
actual allocation.  Map will have an empty set of operations
associated with it (save for alloc and free callbacks).  The real
device callbacks are kept in map->offload->dev_ops because they
have slightly different signatures.  Map operations are called in
process context so the driver may communicate with HW freely,
msleep(), wait() etc.

Map alloc and free callbacks are muxed via existing .ndo_bpf, and
are always called with rtnl lock held.  Maps and programs are
guaranteed to be destroyed before .ndo_uninit (i.e. before
unregister_netdev() returns).  Map callbacks are invoked with
bpf_devs_lock *read* locked, drivers must take care of exclusive
locking if necessary.

All offload-specific branches are marked with unlikely() (through
bpf_map_is_dev_bound()), given that branch penalty will be
negligible compared to IO anyway, and we don't want to penalize
SW path unnecessarily.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: offload: factor out netdev checking at allocation time

Add a helper to check if netdev could be found and whether it
has .ndo_bpf callback. There is no need to check the callback
every time it's invoked, ndos can't reasonably be swapped for
a set without .ndp_bpf while program is loaded.

bpf_dev_offload_check() will also be used by map offload.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: rename bpf_dev_offload -> bpf_prog_offload

With map offload coming, we need to call program offload structure
something less ambiguous. Pure rename, no functional changes.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: add helper for copying attrs to struct bpf_map

All map types reimplement the field-by-field copy of union bpf_attr
members into struct bpf_map. Add a helper to perform this operation.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: hashtab: move checks out of alloc function

Use the new callback to perform allocation checks for hash maps.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: hashtab: move attribute validation before allocation

Number of attribute checks are currently performed after hashtab
is already allocated.  Move them to be able to split them out to
the check function later on.  Checks have to now be performed on
the attr union directly instead of the members of bpf_map, since
bpf_map will be allocated later.  No functional changes.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

bpf: add map_alloc_check callback

.map_alloc callbacks contain a number of checks validating user-
-provided map attributes against constraints of a particular map
type. For offloaded maps we will need to check map attributes
without actually allocating any memory on the host. Add a new
callback for validating attributes before any memory is allocated.
This callback can be selectively implemented by map types for
sharing code with offloads, or simply to separate the logical
steps of validation and allocation.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

x86/retpoline: Remove compile time warning

Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler
does not have retpoline support. Linus rationale for this is:

  It's wrong because it will just make people turn off RETPOLINE, and the
  asm updates - and return stack clearing - that are independent of the
  compiler are likely the most important parts because they are likely the
  ones easiest to target.

  And it's annoying because most people won't be able to do anything about
  it. The number of people building their own compiler? Very small. So if
  their distro hasn't got a compiler yet (and pretty much nobody does), the
  warning is just annoying crap.

  It is already properly reported as part of the sysfs interface. The
  compile-time warning only encourages bad things.

Fixes: 76b043848fd2 ("x86/retpoline: Add initial retpoline support")
Requested-by: Linus Torvalds <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Cc: Rik van Riel <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull NVMe fix from Jens Axboe:
"Just a single fix for nvme over fabrics that should go into 4.15"

* 'for-linus' of git://git.kernel.dk/linux-block:
nvme-fabrics: initialize default host->id in nvmf_host_default()

Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 pti updates from Thomas Gleixner:
"This contains:

   - a PTI bugfix to avoid setting reserved CR3 bits when PCID is
     disabled. This seems to cause issues on a virtual machine at least
     and is incorrect according to the AMD manual.

   - a PTI bugfix which disables the perf BTS facility if PTI is
     enabled. The BTS AUX buffer is not globally visible and causes the
     CPU to fault when the mapping disappears on switching CR3 to user
     space. A full fix which restores BTS on PTI is non trivial and will
     be worked on.

   - PTI bugfixes for EFI and trusted boot which make sure that the user
     space visible page table entries have the NX bit cleared

   - removal of dead code in the PTI pagetable setup functions

   - add PTI documentation

   - add a selftest for vsyscall to verify that the kernel actually
     implements what it advertises.

   - a sysfs interface to expose vulnerability and mitigation
     information so there is a coherent way for users to retrieve the
     status.

   - the initial spectre_v2 mitigations, aka retpoline:

      + The necessary ASM thunk and compiler support

      + The ASM variants of retpoline and the conversion of affected ASM
        code

      + Make LFENCE serializing on AMD so it can be used as speculation
        trap

      + The RSB fill after vmexit

   - initial objtool support for retpoline

  As I said in the status mail this is the most of the set of patches
  which should go into 4.15 except two straight forward patches still on
  hold:

   - the retpoline add on of LFENCE which waits for ACKs

   - the RSB fill after context switch

  Both should be ready to go early next week and with that we'll have
  covered the major holes of spectre_v2 and go back to normality"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits)
  x86,perf: Disable intel_bts when PTI
  security/Kconfig: Correct the Documentation reference for PTI
  x86/pti: Fix !PCID and sanitize defines
  selftests/x86: Add test_vsyscall
  x86/retpoline: Fill return stack buffer on vmexit
  x86/retpoline/irq32: Convert assembler indirect jumps
  x86/retpoline/checksum32: Convert assembler indirect jumps
  x86/retpoline/xen: Convert Xen hypercall indirect jumps
  x86/retpoline/hyperv: Convert assembler indirect jumps
  x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
  x86/retpoline/entry: Convert entry assembler indirect jumps
  x86/retpoline/crypto: Convert crypto assembler indirect jumps
  x86/spectre: Add boot time option to select Spectre v2 mitigation
  x86/retpoline: Add initial retpoline support
  objtool: Allow alternatives to be ignored
  objtool: Detect jumps to retpoline thunks
  x86/pti: Make unpoison of pgd for trusted boot work for real
  x86/alternatives: Fix optimize_nops() checking
  sysfs/cpu: Fix typos in vulnerability documentation
  x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
  ...

Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2018-01-12

This series contains updates to ixgbe, fm10k and net core.

Alex updates the driver to remove a duplicate MAC address check and
verifies that we have not run out of resources to configure a MAC rule
in our filter table.  Also do not assume that dev->num_tc was populated
and configured with the driver, since it can be configured via mqprio
without any hardware coordination.  Fixed the recording of stats for
MACVLAN in ixgbe and fm10k instead of recording the receive queue on
MACVLAN offloaded frames.  When handling a MACVLAN offload, we should
be stopping/starting traffic on our own queues instead of the upper
devices transmit queues.  Fixed possible race conditions with the
MACVLAN cleanup with the interface cleanup on shutdown.  With the
recent fixes to ixgbe, we can cap the number of queues regardless of
accel_priv being in use or not, since the actual number of queues are
being reported via real_num_tx_queues.

Tony fixes up the kernel documentation for ixgbe and ixgbevf to resolve
warnings when W=1 is used.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mlxsw-Offload-PRIO-qdisc'

Jiri Pirko says:

====================
mlxsw: Offload PRIO qdisc

Nogah says:

Add an offload support for PRIO qdisc for mlxsw driver.
PRIO qdisc is being offloaded by using ndo_setup_tc. It has three
commands, to set or tune the qdisc, to remove it and to get its stats.

Like RED offloading, offloading this qdisc is not enforced on the driver
and determining its offload state is done in the dump action, when the
stats are being updated.
In the driver, offloading of PRIO is supported as root qdisc only. It
supports only priorities 0-7 (the range that is used by the current static
mapping of DSCP to skb prio and by 1:1 PCP values mapping) and up to 8
bands.

Patches 1-2 offload DSCP to priority mapping in the mlxsw_sp driver.
Patch 3 adds offload support for PRIO qdisc.
Patches 4-5 Add PRIO offload support in the mlxsw_sp driver.

---
v1->v2:
- Patch 1/5:
- Rewrite patch msg
- Patch 3/5:
- Send all the qstats in the replace command (and not just backlog)
- Patch 5/5:
- Align with the changes from 3/5
- Move backlog to the generic qdisc stats struct
- Delete extra newline
====================

Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum: qdiscs: Support stats for PRIO qdisc

Support basic stats for PRIO qdisc, which includes tx packets and bytes
count, drops count and backlog size. The rest of the stats are irrelevant
for this qdisc offload.
Since backlog is not only incremental but reflecting momentary value, in
case of a qdisc that stops being offloaded but is not destroyed, backlog
value needs to be updated about the un-offloading.
For that reason an unoffload function is being added to the ops struct.

Signed-off-by: Nogah Frankel <[email protected]>
Reviewed-by: Yuval Mintz <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum: qdiscs: Support PRIO qdisc offload

Add support for offloading PRIO qdisc as root qdisc.
The support is for up to 8 bands.
Routed packets priority is determined by the DSCP field with the default
translations. Bridged packets priority is determined by the PCP field, if
exist, otherwise it is set to 0.
Since both options have only priorities 0-7, higher priorities mapping are
being ignored.

Signed-off-by: Nogah Frankel <[email protected]>
Reviewed-by: Yuval Mintz <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sch: prio: Add offload ability to PRIO qdisc

Add the ability to offload PRIO qdisc by using ndo_setup_tc.
There are three commands for PRIO offloading:
* TC_PRIO_REPLACE: handles set and tune
* TC_PRIO_DESTROY: handles qdisc destroy
* TC_PRIO_STATS: updates the qdiscs counters (given as reference)

Like RED qdisc, the indication of whether PRIO is being offloaded is being
set and updated as part of the dump function. It is so because the driver
could decide to offload or not based on the qdisc parent, which could
change without notifying the qdisc.

Signed-off-by: Nogah Frankel <[email protected]>
Reviewed-by: Yuval Mintz <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_router: Configure default routing priority

When routing ip packets, the kernel is setting the SKB's priority
based on the tos field of the packet.
Imitate this behavior in the mlxsw router, having the internal
switch priority of a routed packet determined according to its DS
field.

Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: Nogah Frankel <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: reg: add rdpm register

Add rdpm definition - router DSCP to priority mapping register.
This register will be utilized later to align the default mapping between
packet DSCP and switch-priority to the kernel's mapping between
packet priority and skb priority.

This is the first non-bit indexed register where the entries are arranged
in descending order, i.e., entry at offset 0 matches configuration for
dscp[63]. As a result, the item's step is converted into a signed variable
to support descending arrays [where step would be negative].

Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: Nogah Frankel <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'dsa-mv88e6xxx-ATU-VTU-irq'

Andrew Lunn says:

====================
mv88e6xxx: ATU and VTU interrupts

Both the ATU and VTU of Mavell switches can generate interrupts when
violations occur. Trap this interrupts and print what violation
occurred.
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Decode VTU problem interrupt

When there is a problem with the VTU, an interrupt can be
generated. Trap this interrupt and decode the registers to determine
what the problem was, then log the error.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Decode ATU problem interrupt

When there is a problem with the ATU, an interrupt can be
generated. Trap this interrupt and decode the registers to determine
what the problem was, then log the error.

Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_router: Add support for IPv6 non-equal-cost multipath

Since commit eb789980d0aa ("mlxsw: spectrum_router: Populate adjacency
entries according to weights") the driver includes support for
non-equal-cost multipath, but IPv4 nexthops were the only user.

Now that the kernel supports weighted IPv6 nexthops, we can extend the
driver to support it as well.

This is done by assigning each nexthop its configured weight, so that it
will be populated accordingly in the device's adjacency table. The
`weight` parameter is also taken into account when comparing nexthop
groups in order not to consolidate non-identical groups.

Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: netsec: use dma_addr_t for storing dma address

On targets that have different sizes for phys_addr_t and dma_addr_t,
we get a type mismatch error:

drivers/net/ethernet/socionext/netsec.c: In function 'netsec_alloc_dring':
drivers/net/ethernet/socionext/netsec.c:970:9: error: passing argument 3 of 'dma_zalloc_coherent' from incompatible pointer type [-Werror=incompatible-pointer-types]

The code is otherwise correct, as the address is never actually used as a
physical address but only passed into a DMA register. For consistently,
I'm changing the variable name as well, to clarify that this is a DMA
address.

Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2018-01-13

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Follow-up fix to the recent BPF out-of-bounds speculation
   fix that prevents max_entries overflows and an undefined
   behavior on 32 bit archs on index_mask calculation, from
   Daniel.

2) Reject unsupported BPF_ARSH opcode in 32 bit ALU mode that
   was otherwise throwing an unknown opcode warning in the
   interpreter, from Daniel.

3) Typo fix in one of the user facing verbose() messages that
   was added during the BPF out-of-bounds speculation fix,
   from Colin.
====================

Signed-off-by: David S. Miller <[email protected]>

x86,perf: Disable intel_bts when PTI

The intel_bts driver does not use the 'normal' BTS buffer which is exposed
through the cpu_entry_area but instead uses the memory allocated for the
perf AUX buffer.

This obviously comes apart when using PTI because then the kernel mapping;
which includes that AUX buffer memory; disappears. Fixing this requires to
expose a mapping which is visible in all context and that's not trivial.

As a quick fix disable this driver when PTI is enabled to prevent
malfunction.

Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig")
Reported-by: Vince Weaver <[email protected]>
Reported-by: Robert Święcki <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

security/Kconfig: Correct the Documentation reference for PTI

When the config option for PTI was added a reference to documentation was
added as well. But the documentation did not exist at that point. The final
documentation has a different file name.

Fix it up to point to the proper file.

Fixes: 385ce0ea ("x86/mm/pti: Add Kconfig")
Signed-off-by: W. Trevor King <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: James Morris <[email protected]>
Cc: "Serge E. Hallyn" <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/3009cc8ccbddcd897ec1e0cb6dda524929de0d14.1515799398.git.wking@tremily.us

x86/pti: Fix !PCID and sanitize defines

The switch to the user space page tables in the low level ASM code sets
unconditionally bit 12 and bit 11 of CR3. Bit 12 is switching the base
address of the page directory to the user part, bit 11 is switching the
PCID to the PCID associated with the user page tables.

This fails on a machine which lacks PCID support because bit 11 is set in
CR3. Bit 11 is reserved when PCID is inactive.

While the Intel SDM claims that the reserved bits are ignored when PCID is
disabled, the AMD APM states that they should be cleared.

This went unnoticed as the AMD APM was not checked when the code was
developed and reviewed and test systems with Intel CPUs never failed to
boot. The report is against a Centos 6 host where the guest fails to boot,
so it's not yet clear whether this is a virt issue or can happen on real
hardware too, but thats irrelevant as the AMD APM clearly ask for clearing
the reserved bits.

Make sure that on non PCID machines bit 11 is not set by the page table
switching code.

Andy suggested to rename the related bits and masks so they are clearly
describing what they should be used for, which is done as well for clarity.

That split could have been done with alternatives but the macro hell is
horrible and ugly. This can be done on top if someone cares to remove the
extra orq. For now it's a straight forward fix.

Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
Reported-by: Laura Abbott <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: stable <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Willy Tarreau <[email protected]>
Cc: David Woodhouse <[email protected]>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos

Merge tag 'usb-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB fixes and device ids for 4.15-rc8

  Nothing major, small fixes for various devices, some resolutions for
  bugs found by fuzzers, and the usual handful of new device ids.

  All of these have been in linux-next with no reported issues"

* tag 'usb-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  Documentation: usb: fix typo in UVC gadgetfs config command
  usb: misc: usb3503: make sure reset is low for at least 100us
  uas: ignore UAS for Norelsys NS1068(X) chips
  USB: UDC core: fix double-free in usb_add_gadget_udc_release
  USB: fix usbmon BUG trigger
  usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
  usbip: remove kernel addresses from usb device and urb debug msgs
  usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
  USB: serial: cp210x: add new device ID ELV ALC 8xxx
  USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ

Merge tag 'staging-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fix from Greg KH:
"Here is a single android ashmem bugfix that resolves a reported issue
  in that interface. It's been in linux-next this week with no reported
  issues"

* tag 'staging-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl

Merge tag 'char-misc-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
"Here are two bugfixes for some driver bugs for 4.15-rc8

  The first is a bluetooth security bug that has been ignored by the
  Bluetooth developers for months for no obvious reason at all, so I've
  taken it through my tree.

  The second is a simple double-free bug in the mux subsystem.

  Both have been in linux-next for a while with no reported issues"

* tag 'char-misc-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  mux: core: fix double get_device()
  Bluetooth: Prevent stack info leak from the EFS element.

Merge tag 'kbuild-fixes-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- fix cross-compilation for architectures that setup CROSS_COMPILE in
   their arch Makefile

- fix Kconfig rational operators for bool / tristate

- drop a gperf-generated file from .gitignore

* tag 'kbuild-fixes-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  genksyms: drop *.hash.c from .gitignore
  kconfig: fix relational operators for bool and tristate symbols
  kbuild: move cc-option and cc-disable-warning after incl. arch Makefile

Merge tag 'apparmor-pr-2018-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor regression fixes from John Johansen:
"This fixes a couple bugs I have been working with Matthew Garrett on
  this week. Specifically a regression in the handling of a conflicting
  profile attachment and label match restrictions for ptrace when
  profiles are stacked.

  Summary:

   - fix ptrace label match when matching stacked labels

   - fix regression in profile conflict logic"

* tag 'apparmor-pr-2018-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
  apparmor: Fix regression in profile conflict logic
  apparmor: fix ptrace label match when matching stacked labels

Merge tag 'pci-v4.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
"Fix AMD boot regression due to 64-bit window conflicting with system
  memory (Christian König)"

* tag 'pci-v4.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  x86/PCI: Move and shrink AMD 64-bit window to avoid conflict
  x86/PCI: Add "pci=big_root_window" option for AMD 64-bit windows

Merge branch 'akpm' (patches from Andrew)

Merge misc fixlets from Andrew Morton:
"4 fixes"

* emailed patches from Andrew Morton <[email protected]>:
  tools/objtool/Makefile: don't assume sync-check.sh is executable
  kdump: write correct address of mem_section into vmcoreinfo
  kmemleak: allow to coexist with fault injection
  MAINTAINERS, nilfs2: change project home URLs

tools/objtool/Makefile: don't assume sync-check.sh is executable

patch(1) loses the x bit. So if a user follows our patching
instructions in Documentation/admin-guide/README.rst, their kernel will
not compile.

Fixes: 3bd51c5a371de ("objtool: Move kernel headers/code sync check to a script")
Reported-by: Nicolas Bock <[email protected]>
Reported-by Joakim Tjernlund <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kdump: write correct address of mem_section into vmcoreinfo

Depending on configuration mem_section can now be an array or a pointer
to an array allocated dynamically. In most cases, we can continue to
refer to it as 'mem_section' regardless of what it is.

But there's one exception: '&mem_section' means "address of the array"
if mem_section is an array, but if mem_section is a pointer, it would
mean "address of the pointer".

We've stepped onto this in kdump code. VMCOREINFO_SYMBOL(mem_section)
writes down address of pointer into vmcoreinfo, not array as we wanted.

Let's introduce VMCOREINFO_SYMBOL_ARRAY() that would handle the
situation correctly for both cases.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y")
Acked-by: Baoquan He <[email protected]>
Acked-by: Dave Young <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kmemleak: allow to coexist with fault injection

kmemleak does one slab allocation per user allocation.  So if slab fault
injection is enabled to any degree, kmemleak instantly fails to allocate
and turns itself off.  However, it's useful to use kmemleak with fault
injection to find leaks on error paths.  On the other hand, checking
kmemleak itself is not so useful because (1) it's a debugging tool and
(2) it has a very regular allocation pattern (basically a single
allocation site, so it either works or not).

Turn off fault injection for kmemleak allocations.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dmitry Vyukov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS, nilfs2: change project home URLs

The domain of NILFS project home was changed to "nilfs.sourceforge.io"
to enable https access (the previous domain "nilfs.sourceforge.net" is
redirected to the new one). Modify URLs of the project home to reflect
this change and to replace their protocol from http to https.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

genksyms: drop *.hash.c from .gitignore

This is a left-over of commit bb3290d91695 ("Remove gperf usage from
toolchain").

We do not generate a hash function any more.

Signed-off-by: Masahiro Yamada <[email protected]>

selftests/x86: Add test_vsyscall

This tests that the vsyscall entries do what they're expected to do.
It also confirms that attempts to read the vsyscall page behave as
expected.

If changes are made to the vsyscall code or its memory map handling,
running this test in all three of vsyscall=none, vsyscall=emulate,
and vsyscall=native are helpful.

(Because it's easy, this also compares the vsyscall results to their
vDSO equivalents.)

Note to KAISER backporters: please test this under all three
vsyscall modes.  Also, in the emulate and native modes, make sure
that test_vsyscall_64 agrees with the command line or config
option as to which mode you're in.  It's quite easy to mess up
the kernel such that native mode accidentally emulates
or vice versa.

Greg, etc: please backport this to all your Meltdown-patched
kernels.  It'll help make sure the patches didn't regress
vsyscalls.

CSigned-off-by: Andy Lutomirski <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/2b9c5a174c1d60fd7774461d518aa75598b1d8fd.1515719552.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>

Merge branch 'error-injection'

Masami Hiramatsu says:

====================
Here are the 5th version of patches to moving error injection
table from kprobes. This version fixes a bug and update
fail-function to support multiple function error injection.

Here is the previous version:

https://patchwork.ozlabs.org/cover/858663/

Changes in v5:
- [3/5] Fix a bug that within_error_injection returns false always.
- [5/5] Update to support multiple function error injection.

Thank you,
====================

Signed-off-by: Alexei Starovoitov <[email protected]>