Git Repo - linux.git/log

virtio-net: avoid unnecessary sg initialzation

Usually an skb does not have up to MAX_SKB_FRAGS frags. So no need to
initialize the unuse part of sg. This patch initialize the sg based on
the real number it will used:

- during xmit, it could be inferred from nr_frags and can_push.
- for small receive buffer, it will also be 2.

Cc: Michael S. Tsirkin <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'geneve-consolidation'

Pravin B Shelar says:

====================
Geneve: Add support for tunnel metadata mode

Following patches adds support for Geneve tunnel metadata
mode. OVS can make use of Geneve net-device with tunnel
metadata API from kernel.

This also allows us to consolidate Geneve implementation
from two kernel modules geneve_core and geneve to single
geneve module. geneve_core module was targeted to share
Geneve encap and decap code between Geneve netdevice and
OVS Geneve tunnel implementation, Since OVS no longer
needs these API, Geneve code can be consolidated into
single geneve module.

v3-v4:
- Drop NETIF_F_NETNS_LOCAL feature.
- Fix geneve device newlink check

v2-v3:
- make tunnel medata device and regular device mutually exclusive.
- Fix Kconfig dependency for Geneve.
- Fix dst-port netlink encoding.
- drop changelink patch.

v1-v2:
- Replaced per hash table tunnel pointer (metadata enabled) with flag.
- Added support for changelink.
- Improve geneve device route lookup with more parameters.
====================

Signed-off-by: David S. Miller <[email protected]>

geneve: Move device hash table to geneve socket.

This change simplifies Geneve Tunnel hash table management.

Signed-off-by: Pravin B Shelar <[email protected]>
Reviewed-by: Jesse Gross <[email protected]>
Reviewed-by: John W. Linville <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

geneve: Consolidate Geneve functionality in single module.

geneve_core module handles send and receive functionality.
This way OVS could use the Geneve API. Now with use of
tunnel meatadata mode OVS can directly use Geneve netdevice.
So there is no need for separate module for Geneve. Following
patch consolidates Geneve protocol processing in single module.

Signed-off-by: Pravin B Shelar <[email protected]>
Reviewed-by: Jesse Gross <[email protected]>
Acked-by: John W. Linville <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

openvswitch: Use Geneve device.

With help of tunnel metadata mode OVS can directly use
Geneve devices to implement Geneve tunnels.
This patch removes all of the OVS specific Geneve code
and make OVS use a Geneve net_device. Basic geneve vport
is still there to handle compatibility with current
userspace application.

Signed-off-by: Pravin B Shelar <[email protected]>
Reviewed-by: Jesse Gross <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

geneve: Add support to collect tunnel metadata.

Following patch create new tunnel flag which enable
tunnel metadata collection on given device. These devices
can be used by tunnel metadata based routing or by OVS.
Geneve Consolidation patch get rid of collect_md_tun to
simplify tunnel lookup further.

Signed-off-by: Pravin B Shelar <[email protected]>
Reviewed-by: Jesse Gross <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

geneve: Make dst-port configurable.

Add netlink interface to configure Geneve UDP port number.
So that user can configure it for a Gevene device.

Signed-off-by: Pravin B Shelar <[email protected]>
Reviewed-by: Jesse Gross <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Acked-by: John W. Linville <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tunnel: introduce udp_tun_rx_dst()

Introduce function udp_tun_rx_dst() to initialize tunnel dst on
receive path.

Signed-off-by: Pravin B Shelar <[email protected]>
Reviewed-by: Jesse Gross <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

geneve: Use skb mark and protocol to lookup route.

On packet transmit path geneve need to lookup route. Following
patch improves route lookup using more parameters.

Signed-off-by: Pravin B Shelar <[email protected]>
Reviewed-by: Jesse Gross <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Acked-by: John W. Linville <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

geneve: Initialize ethernet address in device setup.

Signed-off-by: Pravin B Shelar <[email protected]>
Reviewed-by: Jesse Gross <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Acked-by: John W. Linville <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bridge: Add netlink support for vlan_protocol attribute

This enables bridge vlan_protocol to be configured through netlink.

When CONFIG_BRIDGE_VLAN_FILTERING is disabled, kernel behaves the
same way as this feature is not implemented.

Signed-off-by: Toshiaki Makita <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bnx2x: Add new device ids under the Qlogic vendor

This adds support for 3 new PCI device combinations -
1077:16a1, 1077:16a4 and 1077:16ad.

Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

smsc911x: Ignore error return from device_get_phy_mode()

Commit 62ee783bf1f8 ("smsc911x: Fix crash seen if neither ACPI nor OF is
configured or used") introduces an error check for the return value from
device_get_phy_mode() and bails out if there is an error. Unfortunately,
there are configurations where no phy is configured. Those configurations
now fail.

To fix the problem, accept error returns from device_get_phy_mode(),
and use the return value from device_property_read_u32() to determine
if there is a suitable firmware interface to read the configuration.

Fixes: 62ee783bf1f8 ("smsc911x: Fix crash seen if neither ACPI nor OF is configured or used")
Tested-by: Tony Lindgren <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

device property: Return -ENXIO if there is no suitable FW interface

Return -ENXIO if device property array access functions don't find
a suitable firmware interface.

This lets drivers decide if they should use available platform data
instead.

Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Tested-by: Jeremy Linton <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sched: consolidate tc_classify{,_compat}

For classifiers getting invoked via tc_classify(), we always need an
extra function call into tc_classify_compat(), as both are being
exported as symbols and tc_classify() itself doesn't do much except
handling of reclassifications when tp->classify() returned with
TC_ACT_RECLASSIFY.

CBQ and ATM are the only qdiscs that directly call into tc_classify_compat(),
all others use tc_classify(). When tc actions are being configured
out in the kernel, tc_classify() effectively does nothing besides
delegating.

We could spare this layer and consolidate both functions. pktgen on
single CPU constantly pushing skbs directly into the netif_receive_skb()
path with a dummy classifier on ingress qdisc attached, improves
slightly from 22.3Mpps to 23.1Mpps.

Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-08-26

This series contains updates to i40e and i40evf only.

Anjali provides a fix for i40e where the part is not receiving multicast
or VLAN tagged packets when in promiscuous mode.  This can occur when a
software bridge is created on top of the device.  Fixed the legacy and MSI
interrupt mode in the driver, which was non-existent before since we
were assuming MSIX was the only mode that the driver ran in.  Fixed the
i40evf driver, where the wrong defines were getting used in the VF
driver.

Mitch fixes a sparse warning about comparing __le16 to u16 so use
le16_to_cpu() to resolve the warning.  Also fixed a dyslexic spelling
of invalid.

Shannon adds port.crc_errors to receive CRC error counter, since it
is a receive counter.

Catherine provides a fix to move the stopping of the service task and
flow director to i40e_shutdown() instead of i40e_suspend().

Greg fixes the ethtool offline diagnostic with netqueues, which just need
to be treated the same as virtual functions when someone wants to run the
ethtool offline diagnostic test.  Also fixed up code comments for the
i40e ethtool diagnostic test function.  Cleans up redundant and unneeded
messages, since the kernel notifies all VXLAN capable registered drivers,
so no need to log this.

Neerav adds the ability to update statistics per VEB per traffic class
and dump it via ethtool.

Jingjing adds support for virtual channel offload to support receive
polling mode in the VF driver.

v2: dropped patch which added helper functions into a header, feedback from
    David Miller was to make the functions constant to reduce the driver
    footprint, so remove the patch while Anjali works on making the requested
    changes.
====================

Signed-off-by: David S. Miller <[email protected]>

sctp: asconf's process should verify address parameter is in the beginning

in sctp_process_asconf(), we get address parameter from the beginning of
the addip params. but we never check if it's really there. if the addr
param is not there, it still can pass sctp_verify_asconf(), then to be
handled by sctp_process_asconf(), it will not be safe.

so add a code in sctp_verify_asconf() to check the address parameter is in
the beginning, or return false to send abort.

note that this can also detect multiple address parameters, and reject it.

Signed-off-by: Xin Long <[email protected]>
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

smsc9194: Remove uncompilable #if 0'd use of pr_dbg

No pr_dbg method exists.

While this code is #if 0'd, it'd be nicer to
use the generic hex_dump, so use it instead.

Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'xgene-tso'

Iyappan Subramanian says:

====================
drivers: net: xgene: Add TSO support

Adding TSO support for 10GbE

iperf Tx data rate without TSO: 3.42 Gbps
with TSO: 9.41 Gbps

v2: Address review comments from v1
- skb_linearize() if headers doesn't fit in 3 hardware buffers

v1:
* Initial version
====================

Signed-off-by: Iyappan Subramanian <[email protected]>

drivers: net: xgene: Adding support for TSO

Signed-off-by: Iyappan Subramanian <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drivers: net: xgene: Preparatory patch for TSO support

- Rearranged descriptor writes
- Moved increment command write to xgene_enet_setup_tx_desc

Signed-off-by: Iyappan Subramanian <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'ovs-conntrack'

Joe Stringer says:

====================
OVS conntrack support

The goal of this series is to allow OVS to send packets through the Linux
kernel connection tracker, and subsequently match on fields populated by
conntrack. This functionality is enabled through a new
CONFIG_OPENVSWITCH_CONNTRACK option.

This version addresses the feedback from v5, primarily checking the behaviour
is correct with different configurations such as disabling
CONFIG_OPENVSWITCH_CONNTRACK or disabling individual conntrack features like
connlabels.

The branch below has been updated with the corresponding userspace pieces:
https://github.com/joestringer/ovs dev/ct_20150818
====================

Signed-off-by: David S. Miller <[email protected]>

openvswitch: Allow attaching helpers to ct action

Add support for using conntrack helpers to assist protocol detection.
The new OVS_CT_ATTR_HELPER attribute of the CT action specifies a helper
to be used for this connection. If no helper is specified, then helpers
will be automatically applied as per the sysctl configuration of
net.netfilter.nf_conntrack_helper.

The helper may be specified as part of the conntrack action, eg:
ct(helper=ftp). Initial packets for related connections should be
committed to allow later packets for the flow to be considered
established.

Example ovs-ofctl flows allowing FTP connections from ports 1->2:
in_port=1,tcp,action=ct(helper=ftp,commit),2
in_port=2,tcp,ct_state=-trk,action=ct(recirc)
in_port=2,tcp,ct_state=+trk-new+est,action=1
in_port=2,tcp,ct_state=+trk+rel,action=1

Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

openvswitch: Allow matching on conntrack label

Allow matching and setting the ct_label field. As with ct_mark, this is
populated by executing the CT action. The label field may be modified by
specifying a label and mask nested under the CT action. It is stored as
metadata attached to the connection. Label modification occurs after
lookup, and will only persist when the conntrack entry is committed by
providing the COMMIT flag to the CT action. Labels are currently fixed
to 128 bits in size.

Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netfilter: connlabels: Export setting connlabel length

Add functions to change connlabel length into nf_conntrack_labels.c so
they may be reused by other modules like OVS and nftables without
needing to jump through xt_match_check() hoops.

Suggested-by: Florian Westphal <[email protected]>
Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Florian Westphal <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netfilter: Always export nf_connlabels_replace()

The following patches will reuse this code from OVS.

Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

openvswitch: Allow matching on conntrack mark

Allow matching and setting the ct_mark field. As with ct_state and
ct_zone, these fields are populated when the CT action is executed. To
write to this field, a value and mask can be specified as a nested
attribute under the CT action. This data is stored with the conntrack
entry, and is executed after the lookup occurs for the CT action. The
conntrack entry itself must be committed using the COMMIT flag in the CT
action flags for this change to persist.

Signed-off-by: Justin Pettit <[email protected]>
Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

openvswitch: Add conntrack action

Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.

Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.

When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.

When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.

IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.

IP frag handling contributed by Andy Zhou.

Based on original design by Justin Pettit.

Signed-off-by: Joe Stringer <[email protected]>
Signed-off-by: Justin Pettit <[email protected]>
Signed-off-by: Andy Zhou <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dst: Add __skb_dst_copy() variation

This variation on skb_dst_copy() doesn't require two skbs.

Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6: Export nf_ct_frag6_gather()

Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

openvswitch: Move MASKED* macros to datapath.h

This will allow the ovs-conntrack code to reuse these macros.

Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

openvswitch: Serialize acts with original netlink len

Previously, we used the kernel-internal netlink actions length to
calculate the size of messages to serialize back to userspace.
However,the sw_flow_actions may not be formatted exactly the same as the
actions on the wire, so store the original actions length when
de-serializing and re-use the original length when serializing.

Signed-off-by: Joe Stringer <[email protected]>
Acked-by: Pravin B Shelar <[email protected]>
Acked-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nfit: Clarify memory device state flags strings

ACPI 6.0 NFIT Memory Device State Flags in Table 5-129 defines
NVDIMM status as follows.  These bits indicate multiple info,
such as failures, pending event, and capability.

  Bit [0] set to 1 to indicate that the previous SAVE to the
  Memory Device failed.
  Bit [1] set to 1 to indicate that the last RESTORE from the
  Memory Device failed.
  Bit [2] set to 1 to indicate that platform flush of data to
  Memory Device failed. As a result, the restored data content
  may be inconsistent even if SAVE and RESTORE do not indicate
  failure.
  Bit [3] set to 1 to indicate that the Memory Device is observed
  to be not armed prior to OSPM hand off. A Memory Device is
  considered armed if it is able to accept persistent writes.
  Bit [4] set to 1 to indicate that the Memory Device observed
  SMART and health events prior to OSPM handoff.

/sys/bus/nd/devices/nmemX/nfit/flags shows this flags info.
The output strings associated with the bits are "save", "restore",
"smart", etc., which can be confusing as they may be interpreted
as positive status, i.e. save succeeded.

Change also the dev_info() message in acpi_nfit_register_dimms()
to be consistent with the sysfs flags strings.

Reported-by: Robert Elliott <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
[ross: rename 'not_arm' to 'not_armed']
Cc: Ross Zwisler <[email protected]>
[djbw: defer adding bit5, HEALTH_ENABLED, for now]
Signed-off-by: Dan Williams <[email protected]>

bgmac: support up to 3 cores (devices) on a bus

Broadcom buses may have more than 1 Ethernet device. This is used e.g.
to have few interfaces connected to different switch ports. So far we
saw chipsets with only 2 devices (e.g. BCM4706) but recent ones have
up to 3 (e.g. Netgear R8000 uses 3rd interface for most of switch
traffic, lower interfaces are for some kind of offloading).

Signed-off-by: Rafał Miłecki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sfc: only use vadaptor stats if firmware is capable

Some of the stats handling code differs based on SR-IOV support,
and SRIOV support is only available if full-featured firmware is
used.
Do not use vadaptor stats if firmware mode is not set to
full-featured.

Signed-off-by: Shradha Shah <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'wireless-drivers-next-for-davem-2015-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
Major changes:

iwlwifi:

* new Tx power firmware API
* bump max firmware API to 17
* fix bug in debug prints
* static checker fix
* fix unused defines
* fix command list on newest firmware

brcmfmac:

* support NVRAM loading for bcm47xx platform
* new debugfs entry for msgbuf protocol layer used with PCIe devices

ath10k:

* add spectral scan support for qca99x0
* add qca6164 support
====================

Signed-off-by: David S. Miller <[email protected]>

net: phy: fixed: propagate fixed link values to struct

The fixed link values parsed from the device tree are stored in
the struct fixed_phy member status. The struct phy_device members
speed, duplex were not updated.

Signed-off-by: Madalin Bucur <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

batman-adv: turn batadv_neigh_node_get() into local function

commit c214ebe1eb29 ("batman-adv: move neigh_node list add into
batadv_neigh_node_new()") removed external calls to
batadv_neigh_node_get().

Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

batman-adv: Add lower layer needed_(head|tail)room to own ones

The maximum of hard_header_len and maximum of all needed_(head|tail)room of
all slave interfaces of a batman-adv device must be used to define the
batman-adv device needed_(head|tail)room. This is required to avoid too
small buffer problems when these slave devices try to send the encapsulated
packet in a tx path without the possibility to resize the skbuff.

Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

batman-adv: don't access unregistered net_device object

In batadv_hardif_disable_interface() there is a call to
batadv_softif_destroy_sysfs() which in turns invokes
unregister_netdevice() on the soft_iface.
After this point we cannot rely on the soft_iface object
anymore because it might get free'd by the netdev periodic
routine at any time.

For this reason the netdev_upper_dev_unlink(.., soft_iface) call
is moved before the invocation of batadv_softif_destroy_sysfs() so
that we can be sure that the soft_iface object is still valid.

Signed-off-by: Antonio Quartulli <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>

batman-adv: Start new development cycle

Signed-off-by: Simon Wunderlich <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

batman-adv: fix gateway client style issues

commit 0511575c4d03 ("batman-adv: remove obsolete deleted attribute for
gateway node") incorrectly added an empy line and forgot to remove an
include.

Signed-off-by: Simon Wunderlich <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

batman-adv: rearrange batadv_neigh_node_new() arguments to follow convention

Signed-off-by: Marek Lindner <[email protected]>
Acked-by: Simon Wunderlich <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

batman-adv: remove obsolete deleted attribute for gateway node

With rcu, the gateway node deleted attribute is not needed anymore. In
fact, it may delay the free of the gateway node and its referenced
structures. Therefore remove it altogether and simplify purging as well.

Signed-off-by: Simon Wunderlich <[email protected]>
Signed-off-by: Marek Lindner <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

batman-adv: move neigh_node list add into batadv_neigh_node_new()

All batadv_neigh_node_* functions expect the neigh_node list item to be part
of the orig_node->neigh_list, therefore the constructor of said list item
should be adding the newly created neigh_node to the respective list.

Signed-off-by: Marek Lindner <[email protected]>
Acked-by: Simon Wunderlich <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

batman-adv: remove redundant hard_iface assignment

The batadv_neigh_node_new() function already sets the hard_iface pointer.

Signed-off-by: Marek Lindner <[email protected]>
Acked-by: Simon Wunderlich <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

batman-adv: move hardif refcount inc to batadv_neigh_node_new()

The batadv_neigh_node cleanup function 'batadv_neigh_node_free_rcu()'
takes care of reducing the hardif refcounter, hence it's only logical
to assume the creating function of that same object
'batadv_neigh_node_new()' takes care of increasing the same refcounter.

Signed-off-by: Marek Lindner <[email protected]>
Acked-by: Simon Wunderlich <[email protected]>
Signed-off-by: Antonio Quartulli <[email protected]>

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull amr64 kvm fix from Will Deacon:
"We've uncovered a nasty bug in the arm64 KVM code which allows a badly
  behaved 32-bit guest to bring down the host.  The fix is simple (it's
  what I believe we call a "brown paper bag" bug) and I don't think it
  makes sense to sit on this, particularly as Russell ended up
  triggering this rather than just somebody noticing a potential problem
  by inspection.

  Usually arm64 KVM changes would go via Paolo's tree, but he's on
  holiday at the moment and the deal is that anything urgent gets
  shuffled via the arch trees, so here it is.

  Summary:

  Fix arm64 KVM issue when injecting an abort into a 32-bit guest, which
  would lead to an illegal exception return at EL2 and a subsequent host
  crash"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: KVM: Fix host crash when injecting a fault into a 32bit guest

arm64: KVM: Fix host crash when injecting a fault into a 32bit guest

When injecting a fault into a misbehaving 32bit guest, it seems
rather idiotic to also inject a 64bit fault that is only going
to corrupt the guest state. This leads to a situation where we
perform an illegal exception return at EL2 causing the host
to crash instead of killing the guest.

Just fix the stupid bug that has been there from day 1.

Cc: <[email protected]>
Reported-by: Russell King <[email protected]>
Tested-by: Russell King <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

bpf: fix bpf_skb_set_tunnel_key() helper

Make sure to indicate to tunnel driver that key.tun_id is set,
otherwise gre won't recognize the metadata.

Fixes: d3aa45ce6b94 ("bpf: add helpers to access tunnel metadata")
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

i40e/i40evf: Bump i40e to 1.3.9 and i40evf to 1.3.5

Bump version and update the copyright year for i40evf.

Change-ID: Iddb81b9dba09f0dc57ab54937b5821ecdd721ff6
Signed-off-by: Catherine Sullivan <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e/i40evf: Cache the CEE TLV status returned from firmware

Store the CEE TLV status returned by firmware to allow drivers to dump that
for debug purposes.

Change-ID: Ie3c4cf8cebabee4f15e1e3fdc4fc8a68bbca40ee
Signed-off-by: Neerav Parikh <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e/i40evf: add VIRTCHNL_VF_OFFLOAD flag

Add virtual channel offload capability to support RX polling mode in the
VF.

Change-ID: Ib643ae2a7506dfc75fc489fc207493fabefa4832
Signed-off-by: Jingjing Wu <[email protected]>
Signed-off-by: Anjali Singhai Jain <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Remove redundant and unneeded messages

The kernel notifies all VXLAN capable registered drivers, i.e. any
driver that implements ndo_add_vxlan_port(), of the addition of a
port so that the driver can track which ports are in use. There's
no need to log this - it just fills the system log with useless and
irksome noise.

Also, when failing to init SR-IOV interfaces the driver was printing the
same message twice. Just remove the inner printk and let the outer message
catch enable as well as the other failures.

Change-ID: Id5ecb1d425c2a357ee2bc1635dab24553831dade
Signed-off-by: Greg Rose <[email protected]>
Signed-off-by: Jesse Brandeburg <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40evf: Remove PF specific register definitions from the VF

There were quite a few issues when the wrong defines were getting used
in the VF driver. This patch fixes the code where PF driver registers
were getting used for VF driver, and also removes the registers that are
not being used from the VF register file.

Change-ID: If116a9730112950d006eb8ec763998fc914cc839
Signed-off-by: Anjali Singhai Jain <[email protected]>
Acked-by: Mitch Williams <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40evf: Use the correct defines to match the VF registers

Use CTLN1 instead of CTLN for the VF relative register space.

Change-ID: Iefba63faf0307af55fec8dbb64f26059f7d91318
Signed-off-by: Anjali Singhai Jain <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: correct spelling error

Turns out that 'inavlid' is an inavlid spelling for 'invalid'.

Change-ID: Ie1fe2d0f8d1ba75ab880594875ec2e4152a76f61
Signed-off-by: Mitch Williams <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Fix comment for ethtool diagnostic link test

The existing comment is incorrect. Add new comment to point out that the
PF reset does not affect link but if the reset is changed to a different
type that does affect link then the link test would need to be moved to
before the reset.

Change-ID: I28d786f46e9465860babdee61c1dba51016464df
Reported-by: Jeremiah Kyle <[email protected]>
Signed-off-by: Greg Rose <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e/i40evf: Add capability to gather VEB per TC stats

This patch adds capability to update per VEB per TC statistics and dump
it via ethtool. It also adds a structure to hold VEB per TC statistics.
The fields can be filled by reading the GLVEBTC_* counters.

Change-ID: I28b4759b9ab6ad5a61f046a1bc9ef6b16fe31538
Signed-off-by: Neerav Parikh <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Fix ethtool offline diagnostic with netqueues

Treat netqueues the same way we do virtual functions when someone wants to
run the ethtool offline diagnostic test.

Change-ID: Id48d2b933f1fd0db7be06305a93c6ebe3dc821f5
Signed-off-by: Greg Rose <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Fix legacy interrupt mode in the driver

This patch fixes the driver flow to take into account legacy interrupts.
Over time we added code that assumes MSIX is the only mode that the
driver runs in. It also enables a legacy workaround to trigger SWINT
when the TX ring has non-cache aligned descriptors pending and interrupts
are disabled.

We work with a single vector in MSI mode too, so apply the same
restrictions as Legacy.

Change-ID: I826ddff1f9bd45d2dbe11f56a3ddcef0dbf42563
Signed-off-by: Anjali Singhai Jain <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: Move function calls to i40e_shutdown instead of i40e_suspend

We should be stopping the service task and flow director on
shutdown not on suspension.

Signed-off-by: Catherine Sullivan <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: add RX to port CRC errors label

The port.crc_errors is really an RX counter, so let's mark it as such.

Change-ID: I179afd3f8a95d45229bb4163a6aeb01f0d2d250b
Signed-off-by: Shannon Nelson <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

i40e: don't degrade __le16

Sparse cries when we compare an __le16 to a u16, almost like it cares
about architectures other than x86. Weird. Use the le16_to_cpu macro to
make it stop crying.

Change-ID: Id068f4d7868a2d3df234a791a76d15938f37db35
Signed-off-by: Mitch Williams <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

Merge tag 'ipvs2-for-v4.3' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next

Simon Horman says:

====================
Second Round of IPVS Updates for v4.3

I realise these are a little late in the cycle, so if you would prefer
me to repost them for v4.4 then just let me know.

The updates include:
* A new scheduler from Raducu Deaconu
* Enhanced configurability of the sync daemon from Julian Anastasov
====================

Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: ip6t_REJECT: added missing icmpv6 codes

RFC 4443 added two new codes values for ICMPv6 type 1:

5 - Source address failed ingress/egress policy
6 - Reject route to destination

And RFC 7084 states in L-14 that IPv6 Router MUST send ICMPv6 Destination
Unreachable with code 5 for packets forwarded to it that use an address
from a prefix that has been invalidated.

Codes 5 and 6 are more informative subsets of code 1.

Signed-off-by: Andreas Herz <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Two fixes in this pull request:

   - The writeback regression fix from Tejun, which has been weeks in
     the making.  This fixes a case where we would sometimes not issue
     writeback when we should have.

   - An older fix for a memory corruption issue in mtip32xx.  It was
     deferred since we wanted a better fix for this (driver should not
     have to handle that case), but given the timing, it's better to put
     the simple fix in for 4.2 release"

* 'for-linus' of git://git.kernel.dk/linux-block:
  mtip32x: fix regression introduced by blk-mq per-hctx flush
  writeback: sync_inodes_sb() must write out I_DIRTY_TIME inodes and always call wait_sb_inodes()

Merge branch 'act_bpf_lockless'

Alexei Starovoitov says:

====================
act_bpf: remove spinlock in fast path

v1 version had a race condition in cleanup path of bpf_prog.
I tried to fix it by adding new callback 'cleanup_rcu' to 'struct tcf_common'
and call it out of act_api cleanup path, but Daniel noticed
(thanks for the idea!) that most of the classifiers already do action cleanup
out of rcu callback.
So instead this set of patches converts tcindex and rsvp classifiers to call
tcf_exts_destroy() after rcu grace period and since action cleanup logic
in __tcf_hash_release() is only called when bind and refcnt goes to zero,
it's guaranteed that cleanup() callback is called from rcu callback.
More specifically:
patches 1 and 2 - simple fixes
patches 2 and 3 - convert tcf_exts_destroy in tcindex and rsvp to call_rcu
patch 5 - removes spin_lock from act_bpf

The cleanup of actions is now universally done after rcu grace period
and in the future we can drop (now unnecessary) call_rcu from tcf_hash_destroy()
patch 5 is using synchronize_rcu() in act_bpf replacement path, since it's
very rare and alternative of dynamically allocating 'struct tcf_bpf_cfg' just
to pass it to call_rcu looks even less appealing.
====================

Signed-off-by: David S. Miller <[email protected]>

net_sched: act_bpf: remove spinlock in fast path

Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing
with extra care taken to free bpf programs after rcu grace period.
Replacement of existing act_bpf (very rare) is done with synchronize_rcu()
and final destruction is done from tc_action_ops->cleanup() callback that is
called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when
bind and refcnt reach zero which is only possible when classifier is destroyed.
Previous two patches fixed the last two classifiers (tcindex and rsvp) to
call tcf_exts_destroy() from rcu callback.

Similar to gact/mirred there is a race between prog->filter and
prog->tcf_action. Meaning that the program being replaced may use
previous default action if it happened to return TC_ACT_UNSPEC.
act_mirred race betwen tcf_action and tcfm_dev is similar.
In all cases the race is harmless.
Long term we may want to improve the situation by replacing the whole
tc_action->priv as single pointer instead of updating inner fields one by one.

Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net_sched: convert rsvp to call tcf_exts_destroy from rcu callback

Adjust destroy path of cls_rsvp to call tcf_exts_destroy() after
rcu grace period.

Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net_sched: convert tcindex to call tcf_exts_destroy from rcu callback

Adjust destroy path of cls_tcindex to call tcf_exts_destroy() after
rcu grace period.

Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net_sched: act_bpf: remove unnecessary copy

Fix harmless typo and avoid unnecessary copy of empty 'prog' into
unused 'strcut tcf_bpf_cfg old'.

Fixes: f4eaed28c783 ("act_bpf: fix memory leaks when replacing bpf programs")
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net_sched: make tcf_hash_destroy() static

tcf_hash_destroy() used once. Make it static.

Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

lib/Makefile: remove CONFIG_AVERAGE build rule

The Kconfig option AVERAGE and its implementation has been removed by
commit f4e774f55fe0 ("average: remove out-of-line implementation").
Remove the dead build rule in lib/Makefile.

Signed-off-by: Valentin Rothberg <[email protected]>
Reviewed-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

powerpc/PCI: Disable MSI/MSI-X interrupts at PCI probe time in OF case

Since commit 1851617cd2da ("PCI/MSI: Disable MSI at enumeration even if
kernel doesn't support MSI"), the setup of dev->msi_cap/msix_cap and the
disable of MSI/MSI-X interrupts isn't being done at PCI probe time, as
the logic responsible for this was moved in the aforementioned commit
from pci_device_add() to pci_setup_device(). The latter function is not
reachable on PowerPC pseries platform during Open Firmware PCI probing
time.

This exhibits as drivers not being able to enable MSI, eg:

bnx2x 0000:01:00.0: no msix capability found

This patch calls pci_msi_setup_pci_dev() explicitly to disable MSI/MSI-X
during PCI probe time on pSeries platform.

Fixes: 1851617cd2da ("PCI/MSI: Disable MSI at enumeration even if kernel doesn't support MSI")
[mpe: Flesh out change log and clarify comment]
Signed-off-by: Guilherme G. Piccoli <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

PCI: Make pci_msi_setup_pci_dev() non-static for use by arch code

Commit 1851617cd2da ("PCI/MSI: Disable MSI at enumeration even if kernel
doesn't support MSI") changed the location of the code that initialises
dev->msi_cap/msix_cap and then disables MSI/MSI-X interrupts at PCI
probe time in devices that have this flag set. It moved the code from
pci_msi_init_pci_dev() to a new function named pci_msi_setup_pci_dev(),
called by pci_setup_device().

The pseries PCI probing code does not call pci_setup_device(), so since
the aforementioned commit the function pci_msi_setup_pci_dev() is not
called and MSI/MSI-X interrupts are left enabled. Additionally because
dev->msi_cap/msix_cap are not initialised no driver can ever enable
MSI/MSI-X.

To fix this, the pseries PCI probe should manually call
pci_msi_setup_pci_dev(), so this patch makes it non-static.

Fixes: 1851617cd2da ("PCI/MSI: Disable MSI at enumeration even if kernel doesn't support MSI")
[mpe: Update change log to mention dev->msi_cap/msix_cap]
Signed-off-by: Guilherme G. Piccoli <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

Merge ath-next from ath.git

Major changes in ath10k:

* add spectral scan support for qca99x0
* add qca6164 support

ath10k: fix compilation warnings in wmi phyerr pull function

Below compilation warnings are observed in gcc version 4.8.2.
Even though it's not seen in bit older gcc versions (for ex, 4.7.3),
It's good to fix it by changing format specifier from %d to
%zd in wmi pull phyerr functions.

wmi.c: In function 'ath10k_wmi_op_pull_phyerr_ev':
wmi.c:3567:8: warning: format '%d' expects argument of type 'int',
              but argument 4 has type 'long unsigned int' [-Wformat=]
              left_len, sizeof(*phyerr));
                        ^
wmi.c: In function 'ath10k_wmi_10_4_op_pull_phyerr_ev':
wmi.c:3612:8: warning: format '%d' expects argument of type 'int',
      but argument 4 has type 'long unsigned int' [-Wformat=]
              left_len, sizeof(*phyerr));
                        ^
Fixes: 991adf71a6cd ("ath10k: refactor phyerr event handlers")
Fixes: 2b0a2e0d7c2f ("ath10k: handle 10.4 firmware phyerr event")
Signed-off-by: Raja Mani <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

ath10k: add qca6164 support

This adds additional 0x0041 PCI Device ID
definition to ath10k for QCA6164 which is a 1
spatial stream sibling of the QCA6174 (which is 2
spatial stream chip).

The QCA6164 needs a dedicated board.bin file which
is different than the one used for QCA6174. If the
board.bin is wrong the device will crash early
while trying to boot firmware. The register dump
will look like this:

ath10k_pci 0000:02:00.0: firmware register dump:
ath10k_pci 0000:02:00.0: [00]: 0x05010000 0x000015B3 0x000A012D 0x00955B31
...

Note the value 0x000A012D.

Special credit goes to Alan Liu
<[email protected]> for providing support
help which enabled me to come up with this patch.

Signed-off-by: Michal Kazior <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

ath10k: add spectral scan support for 10.4 fw

To enable/configure spectral scan parameters in 10.4 firmware, existing
wmi spectral related functions can be reused. Link those functions in
10.4 wmi ops table.

In addition, adjust bin size (only when size is 68 bytes) before reporting
bin samples to user space. The background for this adjustment is that
qca99x0 reports bin size as 68 bytes (64 bytes + 4 bytes) in report
mode 2. First 64 bytes carries in-band tones (-32 to +31) and last 4 byte
carries band edge detection data (+32) mainly used in radar detection
purpose. Additional last 4 bytes are stripped to make bin size valid one.

This bin size adjustment will happen only for qca99x0, all other chipsets
will report proper bin sizes (64/128) without extra 4 bytes being added
at the end. The changes are validated in qca99x0 using 10.4 firmware.

Signed-off-by: Raja Mani <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

ath10k: fix dma_mapping_error() handling

The function returns 1 when DMA mapping fails. The
driver would return bogus values and could
possibly confuse itself if DMA failed.

Fixes: 767d34fc67af ("ath10k: remove DMA mapping wrappers")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Michal Kazior <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

ath10k: add missing mutex unlock on failpath

Kernel would complain about leaving a held lock
after going back to userspace and would
subsequently deadlock.

Fixes: e04cafbc38c7 ("ath10k: fix peer limit enforcement")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Michal Kazior <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared

It is needed to check EVENT_NO_RUNTIME_PM bit of dev->flags in
usbnet_stop(), but its value should be read before it is cleared
when dev->flags is set to 0.

The problem was spotted and the fix was provided by
Oliver Neukum <[email protected]>.

Signed-off-by: Eugene Shatokhin <[email protected]>
Acked-by: Oliver Neukum <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull LSM regression fix from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
LSM: restore certain default error codes

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull nvdimm fix from Dan Williams:
"A single fix for status register read size in the nd_blk driver.

  The effect of getting the width of this register read wrong is that
  all I/O fails when the read returns non-zero.  Given the availability
  of ACPI 6 NFIT enabled platforms, this could reasonably wait to come
  in during the 4.3 merge window with a tag for 4.2-stable.  Otherwise,
  this makes the 4.2 kernel fully functional with devices that conform
  to the mmio-block-apertures defined in the ACPI 6 NFIT (NVDIMM
  Firmware Interface Table)"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nfit, nd_blk: BLK status register is only 32 bits

drivers: net: xgene: fix: Oops in linkwatch_fire_event

[ 1065.801569] Internal error: Oops: 96000006 [#1] SMP
...
[ 1065.866655] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Apr 22 2015
[ 1065.873937] Workqueue: events_power_efficient phy_state_machine
[ 1065.879837] task: fffffe01de105e80 ti: fffffe00bcf18000 task.ti: fffffe00bcf18000
[ 1065.887288] PC is at linkwatch_fire_event+0xac/0xc0
[ 1065.892141] LR is at linkwatch_fire_event+0xa0/0xc0
[ 1065.896995] pc : [<fffffe000060284c>] lr : [<fffffe0000602840>] pstate: 200001c5
[ 1065.904356] sp : fffffe00bcf1bd00
...
[ 1066.196813] Call Trace:
[ 1066.199248] [<fffffe000060284c>] linkwatch_fire_event+0xac/0xc0
[ 1066.205140] [<fffffe000061167c>] netif_carrier_off+0x54/0x64
[ 1066.210773] [<fffffe00004f1654>] phy_state_machine+0x120/0x3bc
[ 1066.216578] [<fffffe00000d8d10>] process_one_work+0x15c/0x3a8
[ 1066.222296] [<fffffe00000d9090>] worker_thread+0x134/0x470
[ 1066.227757] [<fffffe00000df014>] kthread+0xe0/0xf8
[ 1066.232525] Code: 97f65ee9 f9420660 d538d082 8b000042 (885f7c40)

The fix is to call phy_disconnect() from xgene_enet_mdio_remove,
which in turn call cancel_delayed_work_sync().

Signed-off-by: Iyappan Subramanian <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cls_u32: complete the check for non-forced case in u32_destroy()

In commit 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
I added a check in u32_destroy() to see if all real filters are gone
for each tp, however, that is only done for root_ht, same is needed
for others.

This can be reproduced by the following tc commands:

tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256
tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32
ht 15:2: match ip src 10.0.0.2 flowid 1:10
tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32
ht 15:2: match ip src 10.0.0.3 flowid 1:10

Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
Reported-by: Akshat Kakkar <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'dsa-docs'

Florian Fainelli says:

====================
Documentation: dsa

This patch series adds some documentation about DSA as a subsystem as well
as the SF2 driver since it slightly diverges from your average DSA driver ;)
====================

Signed-off-by: David S. Miller <[email protected]>

Documentation: networking: dsa: Add Broadcom SF2 document

Add a document describing the Broadcom Starfigther 2 switch hardware,
its specifics, and how the driver is implemented and its specifics.

Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Vivien Didelot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Documentation: networking: add a DSA document

Describe how the DSA subsystem works, its design principles,
limitations, and describe in details how to implement a DSA switch
driver.

Acked-by: Andrew Lunn <[email protected]>
Acked-by: Scott Feldman <[email protected]>
Reviewed-by: Vivien Didelot <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

LSM: restore certain default error codes

While in most cases commit b1d9e6b064 ("LSM: Switch to lists of hooks")
retained previous error returns, in three cases it altered them without
any explanation in the commit message. Restore all of them - in the
security_old_inode_init_security() case this led to reiserfs using
uninitialized data, sooner or later crashing the system (the only other
user of this function - ocfs2 - was unaffected afaict, since it passes
pre-initialized structures).

Signed-off-by: Jan Beulich <[email protected]>
Signed-off-by: Casey Schaufler <[email protected]>
Signed-off-by: James Morris <[email protected]>

nfit, nd_blk: BLK status register is only 32 bits

Only read 32 bits for the BLK status register in read_blk_stat().

The format and size of this register is defined in the
"NVDIMM Driver Writer's guide":

http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

Signed-off-by: Ross Zwisler <[email protected]>
Reported-by: Nicholas Moulin <[email protected]>
Tested-by: Nicholas Moulin <[email protected]>
Reviewed-by: Jeff Moyer <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

net: fec: use reinit_completion() in mdio accessor functions

Rather than re-initialising the entire completion on every mdio access,
use reinit_completion() which only resets the completion count. This
avoids possible reinitialisation of the contained spinlock and waitqueue
while they may be in use (eg, mid-completion.)

Such an event could occur if there's a long delay in interrupt handling
causing the mdio accessor to time out, then a second access comes in
while the interrupt handler on a different CPU has called complete().
Another scenario where this has been observed is while locking has
been missing at the phy layer, allowing concurrent attempts to access
the MDIO bus.

Signed-off-by: Russell King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: add locking to phy_read_mmd_indirect()/phy_write_mmd_indirect()

The phy layer is missing locking for the above two functions - it
has been observed that two threads (userspace and the phy worker
thread) can race, entering the bus ->write or ->read functions
simultaneously.

This causes the FEC driver to initialise a completion while another
thread is waiting on it or while the interrupt is calling complete()
on it, which causes spinlock unlock-without-lock, spinlock lockups,
and completion timeouts.

Fixes: a59a4d192 ("phy: add the EEE support and the way to access to the MMD registers.")
Fixes: 0c1d77dfb ("net: libphy: Add phy specific function to access mmd phy registers")
Signed-off-by: Russell King <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'rds-more-fixes'

Santosh Shilimkar says:

====================
RDS: Few more fixes

As indicated in the earlier series [1], this is a follow-up series which
addresses few issues around the RDS FMR code. With [1] and the subject
series, now I can run many parallel threads with multiple sockets with
N x N traffic. The stress tests has survived overnight runs.

[1] https://lkml.org/lkml/2015/8/22/127
====================

Signed-off-by: David S. Miller <[email protected]>

RDS: remove superfluous from rds_ib_alloc_fmr()

Memory allocated for 'ibmr' uses kzalloc_node() which already
initialises the memory to zero. There is no need to do
memset() 0 on that memory.

Signed-off-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

RDS: flush the FMR pool less often

FMR flush is an expensive and time consuming operation. Reduce the
frequency of FMR pool flush by 50% so that more FMR work gets accumulated
for more efficient flushing.

Signed-off-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

RDS: push FMR pool flush work to its own worker

RDS FMR flush operation and also it races with connect/reconect
which happes a lot with RDS. FMR flush being on common rds_wq aggrevates
the problem. Lets push RDS FMR pool flush work to its own worker.

Signed-off-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

RDS: fix fmr pool dirty_count

In rds_ib_flush_mr_pool(), dirty_count accounts the clean ones
which is wrong. This can lead to a negative dirty count value.

Lets fix it.

Signed-off-by: Wengang Wang <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

RDS: Fix rds MR reference count in rds_rdma_unuse()

rds_rdma_unuse() drops the mr reference count which it hasn't
taken. Correct way of removing mr is to remove mr from the tree
and then rdma_destroy_mr() it first, then rds_mr_put() to decrement
its reference count. Whichever thread holds last reference will free
the mr via rds_mr_put()

This bug was triggering weird null pointer crashes. One if the trace
for it is captured below.

BUG: unable to handle kernel NULL pointer dereference at
0000000000000104
IP: [<ffffffffa0899471>] rds_ib_free_mr+0x31/0x130 [rds_rdma]
PGD 4366fa067 PUD 4366f9067 PMD 0
Oops: 0000 [#1] SMP

[...]

task: ffff88046da6a000 ti: ffff88046da6c000 task.ti: ffff88046da6c000
RIP: 0010:[<ffffffffa0899471>]  [<ffffffffa0899471>]
rds_ib_free_mr+0x31/0x130 [rds_rdma]
RSP: 0018:ffff88046fa43bd8  EFLAGS: 00010286
RAX: 0000000071d38b80 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880079e7ff40
RBP: ffff88046fa43bf8 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88046fa43ca8 R11: ffff88046a802ed8 R12: ffff880079e7fa40
R13: 0000000000000000 R14: ffff880079e7ff40 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88046fa40000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000104 CR3: 00000004366fb000 CR4: 00000000000006e0
Stack:
ffff880079e7fa40 ffff880671d38f08 ffff880079e7ff40 0000000000000296
ffff88046fa43c28 ffffffffa087a38b ffff880079e7fa40 ffff880671d38f10
0000000000000000 0000000000000292 ffff88046fa43c48 ffffffffa087a3b6
Call Trace:
<IRQ>
[<ffffffffa087a38b>] rds_destroy_mr+0x8b/0xa0 [rds]
[<ffffffffa087a3b6>] __rds_put_mr_final+0x16/0x30 [rds]
[<ffffffffa087a492>] rds_rdma_unuse+0xc2/0x120 [rds]
[<ffffffffa08766d3>] rds_recv_incoming_exthdrs+0x83/0xa0 [rds]
[<ffffffffa0876782>] rds_recv_incoming+0x92/0x200 [rds]
[<ffffffffa0895269>] rds_ib_process_recv+0x259/0x320 [rds_rdma]
[<ffffffffa08962a8>] rds_ib_recv_tasklet_fn+0x1a8/0x490 [rds_rdma]
[<ffffffff810dcd78>] ? __remove_hrtimer+0x58/0x90
[<ffffffff810799e1>] tasklet_action+0xb1/0xc0
[<ffffffff81079b52>] __do_softirq+0xe2/0x290
[<ffffffff81079df6>] irq_exit+0xa6/0xb0
[<ffffffff81613915>] do_IRQ+0x65/0xf0
[<ffffffff816118ab>] common_interrupt+0x6b/0x6b

Signed-off-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>