Git Repo - linux.git/log

Merge branch 'GSO_BY_FRAGS-correctness-improvements'

Daniel Axtens says:

====================
GSO_BY_FRAGS correctness improvements

As requested [1], I went through and had a look at users of gso_size to
see if there were things that need to be fixed to consider
GSO_BY_FRAGS, and I have tried to improve our helper functions to deal
with this case.

I found a few. This fixes bugs relating to the use of
skb_gso_*_seglen() where GSO_BY_FRAGS is not considered.

Patch 1 renames skb_gso_validate_mtu to skb_gso_validate_network_len.
This is follow-up to my earlier patch 2b16f048729b ("net: create
skb_gso_validate_mac_len()"), and just makes everything a bit clearer.

Patches 2 and 3 replace the final users of skb_gso_network_seglen() -
which doesn't consider GSO_BY_FRAGS - with
skb_gso_validate_network_len(), which does. This allows me to make the
skb_gso_*_seglen functions private in patch 4 - now future users won't
accidentally do the wrong comparison.

Two things remain. One is qdisc_pkt_len_init, which is discussed at
[2] - it's caught up in the GSO_DODGY mess. I don't have any expertise
in GSO_DODGY, and it looks like a good clean fix will involve
unpicking the whole validation mess, so I have left it for now.

Secondly, there are 3 eBPF opcodes that change the gso_size of an SKB
and don't consider GSO_BY_FRAGS. This is going through the bpf tree.

Regards,
Daniel

[1] https://patchwork.ozlabs.org/comment/1852414/
[2] https://www.spinics.net/lists/netdev/msg482397.html

PS: This is all in the core networking stack. For a driver to be
affected by this it would need to support NETIF_F_GSO_SCTP /
NETIF_F_GSO_SOFTWARE and then either use gso_size or not be a purely
virtual device. (Many drivers look at gso_size, but do not support
SCTP segmentation, so the core network will segment an SCTP gso before
it hits them.) Based on that, the only driver that may be affected is
sunvnet, but I have no way of testing it, so I haven't looked at it.

v2: split out bpf stuff
fix review comments from Dave Miller
====================

Signed-off-by: David S. Miller <[email protected]>

net: make skb_gso_*_seglen functions private

They're very hard to use properly as they do not consider the
GSO_BY_FRAGS case. Code should use skb_gso_validate_network_len
and skb_gso_validate_mac_len as they do consider this case.

Make the seglen functions static, which stops people using them
outside of skbuff.c

Signed-off-by: Daniel Axtens <[email protected]>
Reviewed-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: xfrm: use skb_gso_validate_network_len() to check gso sizes

Replace skb_gso_network_seglen() with
skb_gso_validate_network_len(), as it considers the GSO_BY_FRAGS
case.

Signed-off-by: Daniel Axtens <[email protected]>
Reviewed-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sched: tbf: handle GSO_BY_FRAGS case in enqueue

tbf_enqueue() checks the size of a packet before enqueuing it.
However, the GSO size check does not consider the GSO_BY_FRAGS
case, and so will drop GSO SCTP packets, causing a massive drop
in throughput.

Use skb_gso_validate_mac_len() instead, as it does consider that
case.

Signed-off-by: Daniel Axtens <[email protected]>
Reviewed-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len

If you take a GSO skb, and split it into packets, will the network
length (L3 headers + L4 headers + payload) of those packets be small
enough to fit within a given MTU?

skb_gso_validate_mtu gives you the answer to that question. However,
we recently added to add a way to validate the MAC length of a split GSO
skb (L2+L3+L4+payload), and the names get confusing, so rename
skb_gso_validate_mtu to skb_gso_validate_network_len

Signed-off-by: Daniel Axtens <[email protected]>
Reviewed-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"A small set of fixes for x86:

   - Add missing instruction suffixes to assembly code so it can be
     compiled by newer GAS versions without warnings.

   - Switch refcount WARN exceptions to UD2 as we did in general

   - Make the reboot on Intel Edison platforms work

   - A small documentation update so text and sample command match"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation, x86, resctrl: Make text and sample command match
  x86/platform/intel-mid: Handle Intel Edison reboot correctly
  x86/asm: Add instruction suffixes to bitops
  x86/entry/64: Add instruction suffix
  x86/refcounts: Switch to UD2 for exceptions

Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86/pti fixes from Thomas Gleixner:
"Three fixes related to melted spectrum:

   - Sync the cpu_entry_area page table to initial_page_table on 32 bit.

     Otherwise suspend/resume fails because resume uses
     initial_page_table and triggers a triple fault when accessing the
     cpu entry area.

   - Zero the SPEC_CTL MRS on XEN before suspend to address a
     shortcoming in the hypervisor.

   - Fix another switch table detection issue in objtool"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
  objtool: Fix another switch table detection issue
  x86/xen: Zero MSR_IA32_SPEC_CTRL before suspend

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
"A small set of fixes from the timer departement:

   - Add a missing timer wheel clock forward when migrating timers off a
     unplugged CPU to prevent operating on a stale clock base and
     missing timer deadlines.

   - Use the proper shift count to extract data from a register value to
     prevent evaluating unrelated bits

   - Make the error return check in the FSL timer driver work correctly.
     Checking an unsigned variable for less than zero does not really
     work well.

   - Clarify the confusing comments in the ARC timer code"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Forward timer base before migrating timers
  clocksource/drivers/arc_timer: Update some comments
  clocksource/drivers/mips-gic-timer: Use correct shift count to extract data
  clocksource/drivers/fsl_ftm_timer: Fix error return checking

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixlet from Thomas Gleixner:
"Just a documentation update for the missing device tree property of
the R-Car M3N interrupt controller"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
dt-bindings/irqchip/renesas-irqc: Document R-Car M3-N support

Merge tag 'for-4.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- when NR_CPUS is large, a SRCU structure can significantly inflate
   size of the main filesystem structure that would not be possible to
   allocate by kmalloc, so the kvalloc fallback is used

- improved error handling

- fix endiannes when printing some filesystem attributes via sysfs,
   this is could happen when a filesystem is moved between different
   endianity hosts

- send fixes: the NO_HOLE mode should not send a write operation for a
   file hole

- fix log replay for for special files followed by file hardlinks

- fix log replay failure after unlink and link combination

- fix max chunk size calculation for DUP allocation

* tag 'for-4.16-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix log replay failure after unlink and link combination
  Btrfs: fix log replay failure after linking special file and fsync
  Btrfs: send, fix issuing write op when processing hole in no data mode
  btrfs: use proper endianness accessors for super_copy
  btrfs: alloc_chunk: fix DUP stripe size handling
  btrfs: Handle btrfs_set_extent_delalloc failure in relocate_file_extent_cluster
  btrfs: handle failure of add_pending_csums
  btrfs: use kvzalloc to allocate btrfs_fs_info

Merge branch 'dsa-serdes-stats'

Andrew Lunn says:

====================
Export SERDES stats via ethtool -S

The mv88e6352 family has a SERDES interface which can be used for
example to connect to SFF/SFP modules. This interface has a couple of
statistics counters. Add support for including these counters in the
output of ethtool -S.
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Get mv88e6352 SERDES statistics

Add support for reading the SERDES statistics of the mv88e8352, using
the standard ethtool -S option. The SERDES interface can be mapped to
either port 4 or 5, so only return statistics on those ports, if the
SERDES interface is in use.

The counters are reset on read, so need to be accumulated. Add a per
port structure to hold the stats counters. The 6352 only has a single
SERDES interface and so only one port will using the newly added
array. However the 6390 family has as many SERDES interfaces as ports,
each with statistics counters. Also, PTP has a number of counters per
port which will also need accumulating.

Signed-off-by: Andrew Lunn <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Add helper to determining if port has SERDES

Refactor the existing code. This helper will be used for SERDES
statistics.

Signed-off-by: Andrew Lunn <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Allow the SERDES interfaces to have statistics

When gettting the number of statistics, the strings and the actual
statistics, call the SERDES ops if implemented. This means the stats
code needs to return the number of strings/stats they have placed into
the data, so that the SERDES strings/stats can follow on.

Signed-off-by: Andrew Lunn <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Hold mutex while doing stats operations

Until now, there has been no need to hold the reg mutex while getting
the count of statistics, or the strings, because the hardware was not
accessed. When adding support for SERDES statistics, it is necessary
to access the hardware, to determine if a port is using the SERDES
interface. So add mutex lock/unlocks.

Signed-off-by: Andrew Lunn <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dsa: Pass the port to get_sset_count()

By passing the port, we allow different ports to have different
statistics. This is useful since some ports have SERDES interfaces
with their own statistic counters.

Signed-off-by: Andrew Lunn <[email protected]>
Tested-by: Florian Fainelli <[email protected]>
Reviewed-by: Vivien Didelot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tools: tc-testing: Add notap option

Add a command line arg to suppress tap output. Handy in case
all the tap output is being supplied by the plugins.

Signed-off-by: Brenda J. Butler <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-ipv6-Add-support-for-path-selection-using-hash-of-5-tuple'

David Ahern says:

====================
net/ipv6: Add support for path selection using hash of 5-tuple

Hardware supports multipath selection using the standard L4 5-tuple
instead of just L3 and the flow label. In addition, some network
operators prefer IPv6 path selection to use the 5-tuple. To that end,
add support to IPv6 for multipath hash policy similar to
bf4e0a3db97eb ("net: ipv4: add support for ECMP hash policy choice").
The default is still L3 which covers source and destination addresses
along with flow label and IPv6 protocol. This gives users a choice in
hash algorithms if they believe L3 only and the IPv6 flow label are not
sufficient for their use case.

A separate sysctl is added for IPv6, allowing IPv4 and IPv6 to use
different algorithms if desired.

The first 3 patches modify the IPv4 variant so that at the end of the
patch set the ipv4 and ipv6 implementations are direct parallels.

Patch 4 refactors the existing rt6_multipath_hash in preparation for
adding the policy option.

Patch 5 renames the existing netevent to have IPv4 in the name so ipv4
changes can be distinguished from IPv6 if the netevent handler cares.

Patch 6 adds the skb as an argument through the FIB lookup functions
to the multipath selection. Needed for the forwarding case.

Patch 7 adds the L4 hash support.

Patch 8 adds the hook for the netevent to the spectrum driver to update
the ASIC.

Patch 9 removes no longer used code.

Patch 10 adds a testcase for IPv6 multipath with L4 hash.

v3
- comments from Ido:
  - removed fib_info arg in patch 1; left by mistake on rebase to net-next
  - removed __get_hash_from_flowi4 declaration
  - line wrap change to spectrum_router.c to maintain 80 chars

v2
- rebased to top of tree
- added refactor of fib_multipath_hash following recent change
- plumb skb through lookup functions to multipath selection
- fix sysctl setting; was missing the data set in ipv6_sysctl_net_init
- added test case

RFC to v1:
- rebase to top of net-next
- fix addr_type in hash_keys and removed flow label as noticed by Ido
- added a comment to cover letter about choice in algorithms based on
  use case per Or's comments
====================

Signed-off-by: David S. Miller <[email protected]>

selftests: forwarding: Add multipath test for L4 hashing

Add IPv6 multipath test using L4 hashing. Created with inputs from
Ido Schimmel.

Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Remove unused get_hash_from_flow functions

__get_hash_from_flowi6 is still used for flowlabels, but the IPv4
variant and the wrappers to both are not used. Remove them.

Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_router: Add support for ipv6 hash policy update

Similar to 28678f07f127d ("mlxsw: spectrum_router: Update multipath hash
parameters upon netevents") for IPv4, make sure the kernel and asic are
using the same hash algorithm for path selection.

Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv6: Add support for path selection using hash of 5-tuple

Some operators prefer IPv6 path selection to use a standard 5-tuple
hash rather than just an L3 hash with the flow the label. To that end
add support to IPv6 for multipath hash policy similar to bf4e0a3db97eb
("net: ipv4: add support for ECMP hash policy choice"). The default
is still L3 which covers source and destination addresses along with
flow label and IPv6 protocol.

Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv6: Pass skb to route lookup

IPv6 does path selection for multipath routes deep in the lookup
functions. The next patch adds L4 hash option and needs the skb
for the forward path. To get the skb to the relevant FIB lookup
functions it needs to go through the fib rules layer, so add a
lookup_data argument to the fib_lookup_arg struct.

Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Rename NETEVENT_MULTIPATH_HASH_UPDATE

Rename NETEVENT_MULTIPATH_HASH_UPDATE to
NETEVENT_IPV4_MPATH_HASH_UPDATE to denote it relates to a change
in the IPv4 hash policy.

Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv6: Make rt6_multipath_hash similar to fib_multipath_hash

Make rt6_multipath_hash more of a direct parallel to fib_multipath_hash
and reduce stack and overhead in the process: get_hash_from_flowi6 is
just a wrapper around __get_hash_from_flowi6 with another stack
allocation for flow_keys. Move setting the addresses, protocol and
label into rt6_multipath_hash and allow it to make the call to
flow_hash_from_keys.

Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv4: Simplify fib_multipath_hash with optional flow keys

As of commit e37b1e978bec5 ("ipv6: route: dissect flow in input path if
fib rules need it") fib_multipath_hash takes an optional flow keys. If
non-NULL it means the skb has already been dissected. If not set, then
fib_multipath_hash needs to call skb_flow_dissect_flow_keys.

Simplify the logic by setting flkeys to the local stack variable keys.
Simplifies fib_multipath_hash by only have 1 set of instructions
setting hash_keys.

Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Align ip_multipath_l3_keys and ip6_multipath_l3_keys

Symmetry is good and allows easy comparison that ipv4 and ipv6 are
doing the same thing. To that end, change ip_multipath_l3_keys to
set addresses at the end after the icmp compares, and move the
initialization of ipv6 flow keys to rt6_multipath_hash.

Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/ipv4: Pass net to fib_multipath_hash instead of fib_info

fib_multipath_hash only needs net struct to check a sysctl. Make it
clear by passing net instead of fib_info. In the end this allows
alignment between the ipv4 and ipv6 versions.

Signed-off-by: David Ahern <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'sctp-clean-up-sctp_sendmsg'

Xin Long says:

====================
sctp: clean up sctp_sendmsg

This cleanup mostly does three things:

- extract some codes into functions to make sendmsg more readable.

- tidy up some codes to avoid the unnecessary checks.

- adjust some logic so that it will be easier to add the send flags
   and cmsgs features that I will post after this.

To make it easy to review and to check if the code is compatible with
before, this patchset is to do it step by step in 9 patches.

NOTE:
There will be a conflict when merging
Commit 2277c7cd75e3 ("sctp: Add LSM hooks") from selinux tree,
the solution is to:

1. remove all the lines in [B]:

    <<<<<<< HEAD
    [A]
    =======
    [B]
    >>>>>>> 2277c7c... sctp: Add LSM hooks

2. and apply the following diff-output:

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 980621e..d6803c8 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1686,6 +1686,7 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,
struct net *net = sock_net(sk);
struct sctp_association *asoc;
enum sctp_scope scope;
+ struct sctp_af *af;
int err = -EINVAL;

*tp = NULL;
@@ -1711,6 +1712,22 @@ static int sctp_sendmsg_new_asoc(struct sock *sk, __u16 sflags,

scope = sctp_scope(daddr);

+ /* Label connection socket for first association 1-to-many
+ * style for client sequence socket()->sendmsg(). This
+ * needs to be done before sctp_assoc_add_peer() as that will
+ * set up the initial packet that needs to account for any
+ * security ip options (CIPSO/CALIPSO) added to the packet.
+ */
+ af = sctp_get_af_specific(daddr->sa.sa_family);
+ if (!af)
+ return -EINVAL;
+
+ err = security_sctp_bind_connect(sk, SCTP_SENDMSG_CONNECT,
+ (struct sockaddr *)daddr,
+ af->sockaddr_len);
+ if (err < 0)
+ return err;
+
asoc = sctp_association_new(ep, sk, scope, GFP_KERNEL);
if (!asoc)
return -ENOMEM;
====================

Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: adjust some codes in a better order in sctp_sendmsg

sctp_sendmsg_new_asoc and SCTP_ADDR_OVER check is only necessary
when daddr is set, so move them up to if (daddr) statement.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: improve some variables in sctp_sendmsg

This patch mostly is to:

  - rename sinfo_flags as sflags, to make the indents look better, and
    also keep consistent with other sctp_sendmsg_xx functions.

  - replace new_asoc with bool new, no need to define a pointer here,
    as if new_asoc is set, it must be asoc.

  - rename the 'out_nounlock:' as 'out', shorter and nicer.

  - remove associd, only one place is using it now, just use
    sinfo->sinfo_assoc_id directly.

  - remove 'cmsgs' initialization in sctp_sendmsg, as it will be done
    in sctp_sendmsg_parse.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: remove the unnecessary transport looking up from sctp_sendmsg

Now sctp_assoc_lookup_paddr can only be called only if daddr has
been set. But if daddr has been set, sctp_endpoint_lookup_assoc
would be done, where it could already have the transport.

So this unnecessary transport looking up should be removed, but
only reset transport as NULL when SCTP_ADDR_OVER is not set for
UDP type socket.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: factor out sctp_sendmsg_update_sinfo from sctp_sendmsg

This patch is to move the codes for trying to get sinfo from
asoc into sctp_sendmsg_update_sinfo.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: factor out sctp_sendmsg_parse from sctp_sendmsg

This patch is to move the codes for parsing msghdr and checking
sk into sctp_sendmsg_parse.

Note that different from before, 'sinfo' in sctp_sendmsg won't
be NULL any more. It gets the value either from cmsgs->srinfo,
cmsgs->sinfo or asoc. With it, the 'sinfo' and 'fill_sinfo_ttl'
check can be removed from sctp_sendmsg.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: factor out sctp_sendmsg_get_daddr from sctp_sendmsg

This patch is to move the codes for trying to get daddr from
msg->msg_name into sctp_sendmsg_get_daddr.

Note that after adding 'daddr', 'to' and 'msg_name' can be
deleted.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: factor out sctp_sendmsg_check_sflags from sctp_sendmsg

This patch is to move the codes for checking sinfo_flags on one asoc
after this asoc has been found into sctp_sendmsg_check_sflags.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: factor out sctp_sendmsg_new_asoc from sctp_sendmsg

This patch is to move the codes for creating a new asoc if
no asoc was found into sctp_sendmsg_new_asoc.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sctp: factor out sctp_sendmsg_to_asoc from sctp_sendmsg

This patch is to move the codes for checking and sending on
one asoc after this asoc has been found or created into
sctp_sendmsg_to_asoc.

Note that 'err != -ESRCH' check is for the case that asoc is
freed when waiting for tx buffer in sctp_sendmsg_to_asoc.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'batadv-net-for-davem-20180302' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes:

- fix skb checksum issues, by Matthias Schiffer (2 patches)

- fix exception handling when dumping data objects through netlink,
by Sven Eckelmann (4 patches)

- fix handling of interface indices, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"A driver fix and a documentation fix (which makes dependency handling
  for the next cycle easier)"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: octeon: Prevent error message on bus error
  dt-bindings: at24: sort manufacturers alphabetically

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
"A 4.16 regression fix, three fixes for -stable, and a cleanup fix:

   - During the merge window support for the new ACPI NVDIMM Platform
     Capabilities structure disabled support for "deep flush", a
     force-unit- access like mechanism for persistent memory. Restore
     that mechanism.

   - VFIO like RDMA is yet one more memory registration / pinning
     interface that is incompatible with Filesystem-DAX. Disable long
     term pins of Filesystem-DAX mappings via VFIO.

   - The Filesystem-DAX detection to prevent long terms pins mistakenly
     also disabled Device-DAX pins which are not subject to the same
     block- map collision concerns.

   - Similar to the setup path, softlockup warnings can trigger in the
     shutdown path for large persistent memory namespaces. Teach
     for_each_device_pfn() to perform cond_resched() in all cases.

   - Boaz noticed that the might_sleep() in dax_direct_access() is stale
     as of the v4.15 kernel.

  These have received a build success notification from the 0day robot,
  and the longterm pin fixes have appeared in -next. However, I recently
  rebased the tree to remove some other fixes that need to be reworked
  after review feedback.

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  memremap: fix softlockup reports at teardown
  libnvdimm: re-enable deep flush for pmem devices via fsync()
  vfio: disable filesystem-dax page pinning
  dax: fix vma_is_fsdax() helper
  dax: ->direct_access does not sleep anymore

Merge tag 'kbuild-fixes-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

- suppress sparse warnings about unknown attributes

- fix typos and stale comments

- fix build error of arch/sh

- fix wrong use of ld-option vs cc-ldoption

- remove redundant GCC_PLUGINS_CFLAGS assignment

- fix another memory leak of Kconfig

- fix line number in error messages of Kconfig

- do not write confusing CONFIG_DEFCONFIG_LIST out to .config

- add xstrdup() to Kconfig to handle memory shortage errors

- show also a Debian package name if ncurses is missing

* tag 'kbuild-fixes-v4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  MAINTAINERS: take over Kconfig maintainership
  kconfig: fix line number in recursive inclusion error message
  Coccinelle: memdup: Fix typo in warning messages
  kconfig: Update ncurses package names for menuconfig
  kbuild/kallsyms: trivial typo fix
  kbuild: test --build-id linker flag by ld-option instead of cc-ldoption
  kbuild: drop superfluous GCC_PLUGINS_CFLAGS assignment
  kconfig: Don't leak choice names during parsing
  sh: fix build error for empty CONFIG_BUILTIN_DTB_SOURCE
  kconfig: set SYMBOL_AUTO to the symbol marked with defconfig_list
  kconfig: add xstrdup() helper
  kbuild: disable sparse warnings about unknown attributes
  Makefile: Fix lying comment re. silentoldconfig

Merge tag 'media/v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

  - some build fixes with randconfigs

  - an m88ds3103 fix to prevent an OOPS if the chip doesn't provide the
    right version during probe (with can happen if the hardware hangs)

  - a potential out of array bounds reference in tvp5150

  - some fixes and improvements in the DVB memory mapped API (added for
    kernel 4.16)

* tag 'media/v4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: vb2: Makefile: place vb2-trace together with vb2-core
  media: Don't let tvp5150_get_vbi() go out of vbi_ram_default array
  media: dvb: update buffer mmaped flags and frame counter
  media: dvb: add continuity error indicators for memory mapped buffers
  media: dmxdev: Fix the logic that enables DMA mmap support
  media: dmxdev: fix error code for invalid ioctls
  media: m88ds3103: don't call a non-initalized function
  media: au0828: add VIDEO_V4L2 dependency
  media: dvb: fix DVB_MMAP dependency
  media: dvb: fix DVB_MMAP symbol name
  media: videobuf2: fix build issues with vb2-trace
  media: videobuf2: Add VIDEOBUF2_V4L2 Kconfig option for VB2 V4L2 part

Merge tag 'linux-watchdog-4.16-fixes-1' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog fixes from Wim Van Sebroeck:

- rave-sp: add NVMEM dependency

- build fixes for i6300esb_wdt, xen_wdt and sp5100_tco

* tag 'linux-watchdog-4.16-fixes-1' of git://www.linux-watchdog.org/linux-watchdog:
  watchdog: sp5100_tco.c: fix potential build failure
  watchdog: xen_wdt: fix potential build failure
  watchdog: i6300esb: fix build failure
  watchdog: rave-sp: add NVMEM dependency

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
"x86:

   - fix NULL dereference when using userspace lapic

   - optimize spectre v1 mitigations by allowing guests to use LFENCE

   - make microcode revision configurable to prevent guests from
     unnecessarily blacklisting spectre v2 mitigation feature"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: fix vcpu initialization with userspace lapic
  KVM: X86: Allow userspace to define the microcode version
  KVM: X86: Introduce kvm_get_msr_feature()
  KVM: SVM: Add MSR-based feature support for serializing LFENCE
  KVM: x86: Add a framework for supporting MSR-based features

memremap: fix softlockup reports at teardown

The cond_resched() currently in the setup path needs to be duplicated in
the teardown path. Rather than require each instance of
for_each_device_pfn() to open code the same sequence, embed it in the
helper.

Link: https://github.com/intel/ixpdimm_sw/issues/11
Cc: "Jérôme Glisse" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: <[email protected]>
Fixes: 71389703839e ("mm, zone_device: Replace {get, put}_zone_device_page()...")
Signed-off-by: Dan Williams <[email protected]>

libnvdimm: re-enable deep flush for pmem devices via fsync()

Re-enable deep flush so that users always have a way to be sure that a
write makes it all the way out to media. Writes from the PMEM driver
always arrive at the NVDIMM since movnt is used to bypass the cache, and
the driver relies on the ADR (Asynchronous DRAM Refresh) mechanism to
flush write buffers on power failure. The Deep Flush mechanism is there
to explicitly write buffers to protect against (rare) ADR failure. This
change prevents a regression in deep flush behavior so that applications
can continue to depend on fsync() as a mechanism to trigger deep flush
in the filesystem-DAX case.

Fixes: 06e8ccdab15f4 ("acpi: nfit: Add support for detect platform CPU cache...")
Reviewed-by: Jeff Moyer <[email protected]>
Signed-off-by: Dave Jiang <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

MAINTAINERS: take over Kconfig maintainership

I have recently picked up Kconfig patches to my tree without any
declaration. Making it official now.

Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Linus Torvalds <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-03-03

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Extend bpftool to build up CFG information of eBPF programs and add an
   option to dump this in DOT format such that this can later be used with
   DOT graphic tools (xdot, graphviz, etc) to visualize it. Part of the
   analysis performed is sub-program detection and basic-block partitioning,
   from Jiong.

2) Multiple enhancements for bpftool's batch mode, more specifically the
   parser now understands comments (#), continuation lines (\), and arguments
   enclosed between quotes. Also, allow to read from stdin via '-' as input
   file, all from Quentin.

3) Improve BPF kselftests by i) unifying the rlimit handling into a helper
   that is then used by all tests, and ii) add support for testing tail calls
   to test_verifier plus add tests covering all corner cases. The latter is
   especially useful for testing JITs, from Daniel.

4) Remove x64 JIT's bpf_flush_icache() since flush_icache_range() is a noop
   on x64, from Daniel.

5) Fix one more occasion in BPF samples where we do not detach the BPF program
   from the cgroup after completion, from Prashant.
====================

Signed-off-by: David S. Miller <[email protected]>

vfio: disable filesystem-dax page pinning

Filesystem-DAX is incompatible with 'longterm' page pinning. Without
page cache indirection a DAX mapping maps filesystem blocks directly.
This means that the filesystem must not modify a file's block map while
any page in a mapping is pinned. In order to prevent the situation of
userspace holding of filesystem operations indefinitely, disallow
'longterm' Filesystem-DAX mappings.

RDMA has the same conflict and the plan there is to add a 'with lease'
mechanism to allow the kernel to notify userspace that the mapping is
being torn down for block-map maintenance. Perhaps something similar can
be put in place for vfio.

Note that xfs and ext4 still report:

"DAX enabled. Warning: EXPERIMENTAL, use at your own risk"

...at mount time, and resolving the dax-dma-vs-truncate problem is one
of the last hurdles to remove that designation.

Acked-by: Alex Williamson <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: [email protected]
Cc: <[email protected]>
Reported-by: Haozhong Zhang <[email protected]>
Tested-by: Haozhong Zhang <[email protected]>
Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

Merge tag 'pci-v4.16-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

- Update pci.ids location (documentation only) (Randy Dunlap)

- Fix a crash when BIOS didn't assign a BAR and we try to enlarge it
   (Christian König)

* tag 'pci-v4.16-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Allow release of resources that were never assigned
  PCI: Update location of pci.ids file

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Put back reference on CLUSTERIP configuration structure from the
   error path, patch from Florian Westphal.

2) Put reference on CLUSTERIP configuration instead of freeing it,
   another cpu may still be walking over it, also from Florian.

3) Refetch pointer to IPv6 header from nf_nat_ipv6_manip_pkt() given
   packet manipulation may reallocation the skbuff header, from Florian.

4) Missing match size sanity checks in ebt_among, from Florian.

5) Convert BUG_ON to WARN_ON in ebtables, from Florian.

6) Sanity check userspace offsets from ebtables kernel, from Florian.

7) Missing checksum replace call in flowtable IPv4 DNAT, from Felix
   Fietkau.

8) Bump the right stats on checksum error from bridge netfilter,
   from Taehee Yoo.

9) Unset interface flag in IPv6 fib lookups otherwise we get
   misleading routing lookup results, from Florian.

10) Missing sk_to_full_sk() in ip6_route_me_harder() from Eric Dumazet.

11) Don't allow devices to be part of multiple flowtables at the same
    time, this may break setups.

12) Missing netlink attribute validation in flowtable deletion.

13) Wrong array index in nf_unregister_net_hook() call from error path
    in flowtable addition path.

14) Fix FTP IPVS helper when NAT mangling is in place, patch from
    Julian Anastasov.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'parisc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:

- a patch to change the ordering of cache and TLB flushes to hopefully
   fix the random segfaults we very rarely face (by Dave Anglin).

- a patch to hide the virtual kernel memory layout due to security
   reasons.

- two small patches to make the kernel run more smoothly under qemu.

* 'parisc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Reduce irq overhead when run in qemu
  parisc: Use cr16 interval timers unconditionally on qemu
  parisc: Check if secondary CPUs want own PDC calls
  parisc: Hide virtual kernel memory layout
  parisc: Fix ordering of cache and TLB flushes

Merge tag 'for-linus-4.16a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"Five minor fixes for Xen-specific drivers"

* tag 'for-linus-4.16a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  pvcalls-front: 64-bit align flags
  x86/xen: add tty0 and hvc0 as preferred consoles for dom0
  xen-netfront: Fix hang on device removal
  xen/pirq: fix error path cleanup when binding MSIs
  xen/pvcalls: fix null pointer dereference on map->sock

Merge tag 'ceph-for-4.16-rc4' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
"A cap handling fix from Zhi that ensures that metadata writeback isn't
  delayed and three error path memory leak fixups from Chengguang"

* tag 'ceph-for-4.16-rc4' of git://github.com/ceph/ceph-client:
  ceph: fix potential memory leak in init_caches()
  ceph: fix dentry leak when failing to init debugfs
  libceph, ceph: avoid memory leak when specifying same option several times
  ceph: flush dirty caps of unlinked inode ASAP

Merge tag 'for-linus-20180302' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"A collection of fixes for this series. This is a little larger than
  usual at this time, but that's mainly because I was out on vacation
  last week. Nothing in here is major in any way, it's just two weeks of
  fixes. This contains:

   - NVMe pull from Keith, with a set of fixes from the usual suspects.

   - mq-deadline zone unlock fix from Damien, fixing an issue with the
     SMR zone locking added for 4.16.

   - two bcache fixes sent in by Michael, with changes from Coly and
     Tang.

   - comment typo fix from Eric for blktrace.

   - return-value error handling fix for nbd, from Gustavo.

   - fix a direct-io case where we don't defer to a completion handler,
     making us sleep from IRQ device completion. From Jan.

   - a small series from Jan fixing up holes around handling of bdev
     references.

   - small set of regression fixes from Jiufei, mostly fixing problems
     around the gendisk pointer -> partition index change.

   - regression fix from Ming, fixing a boundary issue with the discard
     page cache invalidation.

   - two-patch series from Ming, fixing both a core blk-mq-sched and
     kyber issue around token freeing on a requeue condition"

* tag 'for-linus-20180302' of git://git.kernel.dk/linux-block: (24 commits)
  block: fix a typo
  block: display the correct diskname for bio
  block: fix the count of PGPGOUT for WRITE_SAME
  mq-deadline: Make sure to always unlock zones
  nvmet: fix PSDT field check in command format
  nvme-multipath: fix sysfs dangerously created links
  nbd: fix return value in error handling path
  bcache: fix kcrashes with fio in RAID5 backend dev
  bcache: correct flash only vols (check all uuids)
  blktrace_api.h: fix comment for struct blk_user_trace_setup
  blockdev: Avoid two active bdev inodes for one device
  genhd: Fix BUG in blkdev_open()
  genhd: Fix use after free in __blkdev_get()
  genhd: Add helper put_disk_and_module()
  genhd: Rename get_disk() to get_disk_and_module()
  genhd: Fix leaked module reference for NVME devices
  direct-io: Fix sleep in atomic due to sync AIO
  nvme-pci: Fix nvme queue cleanup if IRQ setup fails
  block: kyber: fix domain token leak during requeue
  blk-mq: don't call io sched's .requeue_request when requeueing rq to ->dispatch
  ...

selftests: memory-hotplug: fix emit_tests regression

Commit 16c513b13477
("selftests: memory-hotplug: silence test command echo")

introduced regression in emit_tests and results in the following
failure when selftests are installed and run. Fix it.

Running tests in memory-hotplug
========================================
./run_kselftest.sh: line 121: @./mem-on-off-test.sh: No such file or
directory
selftests: memory-hotplug [FAIL]

Fixes: 16c513b13477 (selftests: memory-hotplug: silence test command echo")
Reported-by: Naresh Kamboju <[email protected]>
Tested-by: Anders Roxell <[email protected]>
Signed-off-by: Shuah Khan <[email protected]>

Merge tag 'mmc-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"MMC core:
   - mmc: core: Avoid hang when claiming host

  MMC host:
   - dw_mmc: Avoid hang when accessing registers
   - dw_mmc: Fix out-of-bounds access for slot's caps
   - dw_mmc-k3: Fix out-of-bounds access through DT alias
   - sdhci-pci: Fix S0i3 for Intel BYT-based controllers"

* tag 'mmc-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: core: Avoid hanging to claim host for mmc via some nested calls
  mmc: dw_mmc: Avoid accessing registers in runtime suspended state
  mmc: dw_mmc: Fix out-of-bounds access for slot's caps
  mmc: dw_mmc: Factor out dw_mci_init_slot_caps
  mmc: dw_mmc-k3: Fix out-of-bounds access through DT alias
  mmc: sdhci-pci: Fix S0i3 for Intel BYT-based controllers

Merge tag 'pm-4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix three issues in cpufreq drivers: one recent regression, one
  leftover Kconfig dependency and one old but "stable" material.

  Specifics:

   - Make the task scheduler load and utilization signals be
     frequency-invariant again after recent changes in the SCPI cpufreq
     driver (Dietmar Eggemann).

   - Drop an unnecessary leftover Kconfig dependency from the SCPI
     cpufreq driver (Sudeep Holla).

   - Fix the initialization of the s3c24xx cpufreq driver (Viresh
     Kumar)"

* tag 'pm-4.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: s3c24xx: Fix broken s3c_cpufreq_init()
  cpufreq: scpi: Fix incorrect arm_big_little config dependency
  cpufreq: scpi: invoke frequency-invariance setter function

kconfig: fix line number in recursive inclusion error message

When recursive inclusion is detected, the line number of the last
'included from:' is wrong.

[Test Case]

Kconfig:
  -------->8--------
  source "Kconfig2"
  -------->8--------

Kconfig2:
  -------->8--------
  source "Kconfig3"
  -------->8--------

Kconfig3:
  -------->8--------
  source "Kconfig"
  -------->8--------

[Result]

  $ make allyesconfig
  scripts/kconfig/conf  --allyesconfig Kconfig
  Kconfig:1: recursive inclusion detected. Inclusion path:
    current file : 'Kconfig'
    included from: 'Kconfig3:1'
    included from: 'Kconfig2:1'
    included from: 'Kconfig:3'
  scripts/kconfig/Makefile:89: recipe for target 'allyesconfig' failed
  make[1]: *** [allyesconfig] Error 1
  Makefile:512: recipe for target 'allyesconfig' failed
  make: *** [allyesconfig] Error 2

where we expect

    current file : 'Kconfig'
    included from: 'Kconfig3:1'
    included from: 'Kconfig2:1'
    included from: 'Kconfig:1'

The 'iter->lineno+1' in the second fpinrtf() should be 'iter->lineno-1'.
I refactored the code to merge the two fprintf() calls.

Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Ulf Magnusson <[email protected]>

Coccinelle: memdup: Fix typo in warning messages

Replace 'kmemdep' with 'kmemdup' in warning messages.

Signed-off-by: Dafna Hirschfeld <[email protected]>
Acked-by: Julia Lawall <[email protected]>
Acked-by: Nicolas Palix <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

net/usb/kalmia: use ARRAY_SIZE for various array sizing calculations

Use the ARRAY_SIZE macro on a couple of arrays to determine
size of the arrays. Also fix up alignment to clean up a checkpatch
warning. Improvement suggested by Coccinelle.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4: Add TP Congestion map entry for single-port

Add TP Congestion Map entry for single-port T6 cards.

Signed-off-by: Casey Leedom <[email protected]>
Signed-off-by: Ganesh Goudar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'mac80211-next-for-davem-2018-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Only a few new things:
* hwsim net namespace stuff from Kirill Tkhai
* A-MSDU support in fast-RX
* 4-addr mode support in fast-RX
* support for a spec quirk in Add-BA negotiation
====================

Signed-off-by: David S. Miller <[email protected]>

cxgb4: remove dead code when allocating filter

Error code is already returned earlier if filter exists
at specified location. So, remove dead code trying to
free existing filter.

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Ganesh Goudar <[email protected]>
Signed-off-by: Rahul Lakkireddy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'mac80211-for-davem-2018-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Three more patches:
* fix for a regression in 4-addr mode with fast-RX
* fix for a Kconfig problem with the new regdb
* fix for the long-standing TCP performance issue in
wifi using the new sk_pacing_shift_update()
====================

Signed-off-by: David S. Miller <[email protected]>

rds: Incorrect reference counting in TCP socket creation

Commit 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the
accept socket") has a reference counting issue in TCP socket creation
when accepting a new connection.  The code uses sock_create_lite() to
create a kernel socket.  But it does not do __module_get() on the
socket owner.  When the connection is shutdown and sock_release() is
called to free the socket, the owner's reference count is decremented
and becomes incorrect.  Note that this bug only shows up when the socket
owner is configured as a kernel module.

v2: Update comments

Fixes: 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the accept socket")
Signed-off-by: Ka-Cheong Poon <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Acked-by: Sowmini Varadhan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

i2c: octeon: Prevent error message on bus error

The error message:

[Fri Feb 16 13:42:13 2018] i2c-thunderx 0000:01:09.4: unhandled state: 0

is mis-leading as state 0 (bus error) is not an unknown state.

Return -EIO as before but avoid printing the message. Also rename
STAT_ERROR to STATE_BUS_ERROR.

Signed-off-by: Jan Glauber <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>

Merge tag 'at24-4.16-rc4-for-wolfram' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current

Pull in this fixup to get rid of a dependency for the next cycle:

"- sort the manufacturers in DT bindings alphabetically"

Merge branch 'cpufreq-scpi'

* cpufreq-scpi:
cpufreq: scpi: Fix incorrect arm_big_little config dependency
cpufreq: scpi: invoke frequency-invariance setter function

parisc: Reduce irq overhead when run in qemu

When run under QEMU, calling mfctl(16) creates some overhead because the
qemu timer has to be scaled and moved into the register. This patch
reduces the number of calls to mfctl(16) by moving the calls out of the
loops.

Additionally, increase the minimal time interval to 8000 cycles instead
of 500 to compensate possible QEMU delays when delivering interrupts.

Signed-off-by: Helge Deller <[email protected]>
Cc: [email protected] # 4.14+

parisc: Use cr16 interval timers unconditionally on qemu

When running on qemu we know that the (emulated) cr16 cpu-internal
clocks are syncronized. So let's use them unconditionally on qemu.

Signed-off-by: Helge Deller <[email protected]>
Cc: [email protected] # 4.14+

parisc: Check if secondary CPUs want own PDC calls

The architecture specification says (for 64-bit systems): PDC is a per
processor resource, and operating system software must be prepared to
manage separate pointers to PDCE_PROC for each processor. The address
of PDCE_PROC for the monarch processor is stored in the Page Zero
location MEM_PDC. The address of PDCE_PROC for each non-monarch
processor is passed in gr26 when PDCE_RESET invokes OS_RENDEZ.

Currently we still use one PDC for all CPUs, but in case we face a
machine which is following the specification let's warn about it.

Signed-off-by: Helge Deller <[email protected]>

parisc: Hide virtual kernel memory layout

For security reasons do not expose the virtual kernel memory layout to
userspace.

Signed-off-by: Helge Deller <[email protected]>
Suggested-by: Kees Cook <[email protected]>
Cc: [email protected] # 4.15
Reviewed-by: Kees Cook <[email protected]>

parisc: Fix ordering of cache and TLB flushes

The change to flush_kernel_vmap_range() wasn't sufficient to avoid the
SMP stalls.  The problem is some drivers call these routines with
interrupts disabled.  Interrupts need to be enabled for flush_tlb_all()
and flush_cache_all() to work.  This version adds checks to ensure
interrupts are not disabled before calling routines that need IPI
interrupts.  When interrupts are disabled, we now drop into slower code.

The attached change fixes the ordering of cache and TLB flushes in
several cases.  When we flush the cache using the existing PTE/TLB
entries, we need to flush the TLB after doing the cache flush.  We don't
need to do this when we flush the entire instruction and data caches as
these flushes don't use the existing TLB entries.  The same is true for
tmpalias region flushes.

The flush_kernel_vmap_range() and invalidate_kernel_vmap_range()
routines have been updated.

Secondly, we added a new purge_kernel_dcache_range_asm() routine to
pacache.S and use it in invalidate_kernel_vmap_range().  Nominally,
purges are faster than flushes as the cache lines don't have to be
written back to memory.

Hopefully, this is sufficient to resolve the remaining problems due to
cache speculation.  So far, testing indicates that this is the case.  I
did work up a patch using tmpalias flushes, but there is a performance
hit because we need the physical address for each page, and we also need
to sequence access to the tmpalias flush code.  This increases the
probability of stalls.

Signed-off-by: John David Anglin <[email protected]>
Cc: [email protected] # 4.9+
Signed-off-by: Helge Deller <[email protected]>

net: Convert hwsim_net_ops

These pernet_operations allocate and destroy IDA identifier,
and these actions are synchronized by IDA subsystem locks.
Exit method removes mac80211_hwsim_data enteries from the lists,
and this is synchronized by hwsim_radio_lock with the rest
parallel pernet_operations. Also it queues destroy_radio()
work, and these work already may be executed in parallel
with any pernet_operations (as it's a work :). So, we may
mark these pernet_operations as async.

Signed-off-by: Kirill Tkhai <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

mac80211_hwsim: Make hwsim_netgroup IDA

hwsim_netgroup counter is declarated as int, and it is incremented
every time a new net is created. After sizeof(int) net are created,
it will overflow, and different net namespaces will have the same
identifier. This patch fixes the problem by introducing IDA instead
of int counter. IDA guarantees, all the net namespaces have the uniq
identifier.

Note, that after we do ida_simple_remove() in hwsim_exit_net(),
and we destroy the ID, later there may be executed destroy_radio()
from the workqueue. But destroy_radio() does not use the ID, so it's OK.

Out of bounds of this patch, just as a report to wireless subsystem
maintainer, destroy_radio() increaments hwsim_radios_generation
without hwsim_radio_lock, so this may need one more patch to fix.

Signed-off-by: Kirill Tkhai <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

Merge branch 'bpf-bpftool-batch-improvements'

Quentin Monnet says:

====================
Several enhancements for bpftool batch mode are introduced in this series.

More specifically, input files for batch mode gain support for:
  * comments (starting with '#'),
  * continuation lines (after a line ending with '\'),
  * arguments enclosed between quotes.

Also, make bpftool able to read from standard input when "-" is provided as
input file name.
====================

Signed-off-by: Daniel Borkmann <[email protected]>

tools: bpftool: add support for quotations in batch files

Improve argument parsing from batch input files in order to support
arguments enclosed between single (') or double quotes ("). For example,
this command can now be parsed in batch mode:

bpftool prog dump xlated id 1337 file "/tmp/my file with spaces"

The function responsible for parsing command arguments is copied from
its counterpart in lib/utils.c in iproute2 package.

Signed-off-by: Quentin Monnet <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

tools: bpftool: read from stdin when batch file name is "-"

Make bpftool read its command list from standard input when the name if
the input file is a single dash.

Signed-off-by: Quentin Monnet <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

tools: bpftool: support continuation lines in batch files

Add support for continuation lines, such as in the following example:

    prog show
    prog dump xlated \
        id 1337 opcodes

This patch is based after the code for support for continuation lines
from file lib/utils.c from package iproute2.

"Lines" in error messages are renamed as "commands", as we count the
number of commands (but we ignore empty lines, comments, and do not add
continuation lines to the count).

Signed-off-by: Quentin Monnet <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

tools: bpftool: support comments in batch files

Replace '#' by '\0' in commands read from batch files in order to avoid
processing the remaining part of the line, thus allowing users to use
comments in the files.

Signed-off-by: Quentin Monnet <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

Merge branch 'tcp_bbr-more-GSO-work'

Eric Dumazet says:

====================
tcp_bbr: more GSO work

Playing with r8152 USB 1Gbit NIC, on both USB2 and USB3 slots, I found
that BBR was performing poorly, because of TSO being limited to 16KB

This patch series makes sure BBR is not under estimating number of
packets that are needed to fill the pipe when a device has suboptimal
TSO limits.
====================

Signed-off-by: David S. Miller <[email protected]>

tcp_bbr: remove bbr->tso_segs_goal

Its value is computed then immediately used,
there is no need to store it.

Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp_bbr: better deal with suboptimal GSO (II)

This is second part of dealing with suboptimal device gso parameters.
In first patch (350c9f484bde "tcp_bbr: better deal with suboptimal GSO")
we dealt with devices having low gso_max_segs

Some devices lower gso_max_size from 64KB to 16 KB (r8152 is an example)

In order to probe an optimal cwnd, we want BBR being not sensitive
to whatever GSO constraint a device can have.

This patch removes tso_segs_goal() CC callback in favor of
min_tso_segs() for CC wanting to override sysctl_tcp_min_tso_segs

Next patch will remove bbr->tso_segs_goal since it does not have
to be persistent.

Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2018-02-28

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Add schedule points and reduce the number of loop iterations
   the test_bpf kernel module is performing in order to not hog
   the CPU for too long, from Eric.

2) Fix an out of bounds access in tail calls in the ppc64 BPF
   JIT compiler, from Daniel.

3) Fix a crash on arm64 on unaligned BPF xadd operations that
   could be triggered via interpreter and JIT, from Daniel.

Please not that once you merge net into net-next at some point, there
is a minor merge conflict in test_verifier.c since test cases had
been added at the end in both trees. Resolution is trivial: keep all
the test cases from both trees.
====================

Signed-off-by: David S. Miller <[email protected]>

net: ethtool: don't ignore return from driver get_fecparam method

If ethtool_ops->get_fecparam returns an error, pass that error on to the
user, rather than ignoring it.

Fixes: 1a5f3da20bd9 ("net: ethtool: add support for forward error correction modes")
Signed-off-by: Edward Cree <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vrf: check forwarding on the original netdevice when generating ICMP dest unreachable

When ip_error() is called the device is the l3mdev master instead of the
original device. So the forwarding check should be on the original one.

Changes from v2:
- Handle the original device disappearing (per David Ahern)
- Minimize the change in code order

Changes from v1:
- Only need to reset the device on which __in_dev_get_rcu() is done (per
David Ahern).

Signed-off-by: Stephen Suryaputra <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'bpftool-visualization'

Jakub Kicinski says:

====================
Jiong says:

This patch set is an application of CFG information on eBPF program
visualization. It presents some initial code for building CFG information
from eBPF instruction sequences.

After we get eBPF program bytecode, we do sub-program detection and
basic-block partition. These information then are visualized into DOT
graph.

The user could use any DOT graphic tools (xdot, graphviz etc) to view it.

For example:

  bpftool prog dump xlated id 2 visual &>output.dot

  [xdot | dotty] output.dot
  dot -Tpng -o output.png

This initial patch set hasn't tuned much on the dot description layout
nor decoration, we could improve them later once the direction of the patch
set is agreed on. We could also visualize some static analysis performance
data.

v2 (Jakub):
- update license headers and add SPDX tags.
====================

Acked-by: David S. Miller <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

tools: bpftool: add bash completion for CFG dump

Add bash completion for the "visual" keyword used for dumping the CFG of
eBPF programs with bpftool. Make sure we only complete with this keyword
when we dump "xlated" (and not "jited") instructions.

Acked-by: Jiong Wang <[email protected]>
Signed-off-by: Quentin Monnet <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

tools: bpftool: new command-line option and documentation for 'visual'

This patch adds new command-line option for visualizing the xlated eBPF
sequence.

Documentations are updated accordingly.

Usage:

bpftool prog dump xlated id 2 visual

Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Jiong Wang <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

tools: bpftool: generate .dot graph from CFG information

This patch let bpftool print .dot graph file into stdout.

This graph is generated by the following steps:

  - iterate through the function list.
  - generate basic-block(BB) definition for each BB in the function.
  - draw out edges to connect BBs.

This patch is the initial support, the layout and decoration of the .dot
graph could be improved.

Also, it will be useful if we could visualize some performance data from
static analysis.

Signed-off-by: Jiong Wang <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

tools: bpftool: add out edges for each basic-block

This patch adds out edges for each basic-block. We will need these out
edges to finish the .dot graph drawing.

Signed-off-by: Jiong Wang <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

tools: bpftool: partition basic-block for each function in the CFG

This patch partition basic-block for each function in the CFG. The
algorithm is simple, we identify basic-block head in a first traversal,
then second traversal to identify the tail.

We could build extended basic-block (EBB) in next steps. EBB could make the
graph more readable when the eBPF sequence is big.

Signed-off-by: Jiong Wang <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

tools: bpftool: detect sub-programs from the eBPF sequence

This patch detect all sub-programs from the eBPF sequence and keep the
information in the new CFG data structure.

The detection algorithm is basically the same as the one in verifier except
we need to use insn->off instead of insn->imm to get the pc-relative call
offset. Because verifier has modified insn->off/insn->imm during finishing
the verification.

Also, we don't need to do some sanity checks as verifier has done them.

Signed-off-by: Jiong Wang <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

tools: bpftool: factor out xlated dump related code into separate file

This patch factors out those code of dumping xlated eBPF instructions into
xlated_dumper.[h|c].

They are quite independent dumper functions, so better to be kept
separately.

New dumper support will be added in later patches in this set.

Signed-off-by: Jiong Wang <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

tools: bpftool: remove unnecessary 'if' to reduce indentation

It is obvious we could use 'else if' instead of start a new 'if' in the
touched code.

Signed-off-by: Jiong Wang <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

socket: skip checking sk_err for recvmmsg(MSG_ERRQUEUE)

recvmmsg does not call ___sys_recvmsg when sk_err is set.
That is fine for normal reads but, for MSG_ERRQUEUE, recvmmsg
should always call ___sys_recvmsg regardless of sk->sk_err to
be able to clear error queue. Otherwise, users are not able to
drain the error queue using recvmmsg.

Signed-off-by: Soheil Hassas Yeganeh <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: allow interface to be set into VRF if VLAN interface in same VRF

Setting an interface into a VRF fails with 'RTNETLINK answers: File
exists' if one of its VLAN interfaces is already in the same VRF.
As the VRF is an upper device of the VLAN interface, it is also showing
up as an upper device of the interface itself. The solution is to
restrict this check to devices other than master. As only one master
device can be linked to a device, the check in this case is that the
upper device (VRF) being linked to is not the same as the master device
instead of it not being any one of the upper devices.

The following example shows an interface ens12 (with a VLAN interface
ens12.10) being set into VRF green, which behaves as expected:

  # ip link add link ens12 ens12.10 type vlan id 10
  # ip link set dev ens12 master vrfgreen
  # ip link show dev ens12
    3: ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
       master vrfgreen state UP mode DEFAULT group default qlen 1000
       link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff

But if the VLAN interface has previously been set into the same VRF,
then setting the interface into the VRF fails:

  # ip link set dev ens12 nomaster
  # ip link set dev ens12.10 master vrfgreen
  # ip link show dev ens12.10
    39: ens12.10@ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    qdisc noqueue master vrfgreen state UP mode DEFAULT group default
    qlen 1000 link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff
  # ip link set dev ens12 master vrfgreen
    RTNETLINK answers: File exists

The workaround is to move the VLAN interface back into the default VRF
beforehand, but it has to be shut first so as to avoid the risk of
traffic leaking from the VRF. This fix avoids needing this workaround.

Signed-off-by: Mike Manning <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-phy-Reduce-duplication'

Florian Fainelli says:

====================
net: phy: Reduce duplication

This patch series reduces the duplication among 10G PHY drivers that just
essentially stub most functions, but do that while replicating what the existing
generic functions do.

Changes in v3:

- removed unused "reg" variable in teranetics.c
- fixed subject for patch 5 since we actually use gen10g_no_soft_reset()

Changes in v2:

- rename gen10g_soft_reset() to gen10g_no_soft_reset() to better illustrate
what it does (or does not)
- removed stray comment in marvell10g.c
====================

Signed-off-by: David S. Miller <[email protected]>