David S. Miller [Wed, 22 Mar 2017 19:06:49 +0000 (12:06 -0700)]
Merge tag 'wireless-drivers-for-davem-2017-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.11
iwlwifi
* fix a user reported warning in DQA
mwifiex
* fix a potential double free
* fix lost early debug logs
* fix init wakeup warning message from device framework
* add Ganapathi and Xinming as maintainers
ath10k
* fix regression with QCA6174 during resume and firmware crash
====================
Reshetova, Elena [Tue, 21 Mar 2017 11:59:19 +0000 (13:59 +0200)]
net: convert sk_filter.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Tobias Klauser [Tue, 21 Mar 2017 10:24:38 +0000 (11:24 +0100)]
net: greth: Utilize of_get_mac_address()
Do not open code getting the MAC address exclusively from the
"local-mac-address" property, but instead use of_get_mac_address() which
looks up the MAC address using the 3 typical property names.
Ying Xue [Tue, 21 Mar 2017 09:47:49 +0000 (10:47 +0100)]
tipc: fix nametbl deadlock at tipc_nametbl_unsubscribe
Until now, tipc_nametbl_unsubscribe() is called at subscriptions
reference count cleanup. Usually the subscriptions cleanup is
called at subscription timeout or at subscription cancel or at
subscriber delete.
We have ignored the possibility of this being called from other
locations, which causes deadlock as we try to grab the
tn->nametbl_lock while holding it already.
Gao Feng [Tue, 21 Mar 2017 01:28:03 +0000 (09:28 +0800)]
net: tcp: Permit user set TCP_MAXSEG to default value
When user_mss is zero, it means use the default value. But the current
codes don't permit user set TCP_MAXSEG to the default value.
It would return the -EINVAL when val is zero.
The sample action can be used for translating Openflow 'clone' action.
However its implementation has not been sufficiently optimized for this
use case. This series attempts to close the gap.
Patch 3 commit message has more details on the specific optimizations
implemented.
---
v3->v4: Enhance patch 4.
Fix two bugs pointed out by Pravin,
Remove 'is_sample' variable.
v2->v3: Enhance patch 4, Rafctor to move more common logic to clone_execute().
v1->v2: Address Pravin's comment, Refactor recirc and sample
to share more common code
====================
andy zhou [Mon, 20 Mar 2017 23:32:29 +0000 (16:32 -0700)]
openvswitch: Optimize sample action for the clone use cases
With the introduction of open flow 'clone' action, the OVS user space
can now translate the 'clone' action into kernel datapath 'sample'
action, with 100% probability, to ensure that the clone semantics,
which is that the packet seen by the clone action is the same as the
packet seen by the action after clone, is faithfully carried out
in the datapath.
While the sample action in the datpath has the matching semantics,
its implementation is only optimized for its original use.
Specifically, there are two limitation: First, there is a 3 level of
nesting restriction, enforced at the flow downloading time. This
limit turns out to be too restrictive for the 'clone' use case.
Second, the implementation avoid recursive call only if the sample
action list has a single userspace action.
The main optimization implemented in this series removes the static
nesting limit check, instead, implement the run time recursion limit
check, and recursion avoidance similar to that of the 'recirc' action.
This optimization solve both #1 and #2 issues above.
One related optimization attempts to avoid copying flow key as
long as the actions enclosed does not change the flow key. The
detection is performed only once at the flow downloading time.
Another related optimization is to rewrite the action list
at flow downloading time in order to save the fast path from parsing
the sample action list in its original form repeatedly.
andy zhou [Mon, 20 Mar 2017 23:32:28 +0000 (16:32 -0700)]
openvswitch: Refactor recirc key allocation.
The logic of allocating and copy key for each 'exec_actions_level'
was specific to execute_recirc(). However, future patches will reuse
as well. Refactor the logic into its own function clone_key().
andy zhou [Mon, 20 Mar 2017 23:32:27 +0000 (16:32 -0700)]
openvswitch: Deferred fifo API change.
add_deferred_actions() API currently requires actions to be passed in
as a fully encoded netlink message. So far both 'sample' and 'recirc'
actions happens to carry actions as fully encoded netlink messages.
However, this requirement is more restrictive than necessary, future
patch will need to pass in action lists that are not fully encoded
by themselves.
Device based features for VRF such as qdisc, netfilter and packet
captures are implemented by switching the dst on skbuffs to its per-VRF
dst. This has the effect of controlling the output function which points
a function in the VRF driver. [1] The skb proceeds down the stack with
dst->dev pointing to the VRF device. Netfilter, qdisc and tc rules and
network taps are evaluated based on this device. Finally, the skb makes
it to the vrf_xmit function which resets the dst based on a FIB lookup.
The feature comes at cost - between 5 and 10% depending on test (TCP vs
UDP, stream vs RR and IPv4 vs IPv6). The main cost is requiring a FIB
lookup in the VRF driver for each packet sent through it. The FIB lookup
is required because the real dst gets dropped so that the skb can
traverse the stack with dst->dev set to the VRF device.
All of that is really driven by the qdisc and not replicating the
processing of __dev_queue_xmit if a qdisc is set up on the device. But,
VRF devices by default do not have a qdisc and really have no need for
multiple Tx queues. This means the performance overhead is inflicted upon
all users for the potential use case of a qdisc being configured.
The overhead can be avoided by checking if the default configuration
applies to a specific VRF device before switching the dst. If a device
does not have a qdisc, the pass through netfilter hooks and packet taps
can be done inline without dropping the dst and thus avoiding the
performance penalty. With this change performance overhead of VRF drops
to neglible (difference with run-over-run variance) to 3% depending on
test type.
netperf performance comparison for 3 cases:
1. L3_MASTER_DEVICE compiled out
2. VRF with this patch set
3. current VRF code
* Although higher in the final run used for submitting this patch set, I
think what this really represents is a neglible performance overhead for
VRF with this change (i.e, within the +-1% variance of runs). Most
notably the FIB lookups in the Tx path are avoided for TCP_RR.
* UDP is consistently better with VRF for two reasons:
1. Source address selection with L3 domains is considering fewer
addresses since only addresses on interfaces in the domain are
considered for the selection. Specifically, perf-top shows
shows ipv6_get_saddr_eval, ipv6_dev_get_saddr and __ipv6_dev_get_saddr
running much lower with vrf than without.
2. The VRF table contains all routes (i.e, there are no separate local
and main tables per VRF). That means ip6_pol_route_output only has 1
lookup for VRF where it does 2 without it (1 in the local table and 1
in the main table).
David Ahern [Mon, 20 Mar 2017 18:19:45 +0000 (11:19 -0700)]
net: vrf: performance improvements for IPv6
The VRF driver allows users to implement device based features for an
entire domain. For example, a qdisc or netfilter rules can be attached
to a VRF device or tcpdump can be used to view packets for all devices
in the L3 domain.
The device-based features come with a performance penalty, most
notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
to switch the dst on an skb to its private dst. This allows the skb
to traverse the xmit stack with the device set to the VRF device
which in turn enables the netfilter and qdisc features. The VRF
driver then performs the FIB lookup again and reinserts the packet.
This patch avoids the redirect for IPv6 packets if a qdisc has not
been attached to a VRF device which is the default config. In this
case the netfilter hooks and network taps are directly traversed in
the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
then the redirect using the vrf dst is done.
Additional overhead is removed by only checking packet taps if a
socket is open on the device (vrf_dev->ptype_all list is not empty).
Packet sockets bound to any device will still get a copy of the
packet via the real ingress or egress interface.
The end result of this change is a decrease in the overhead of VRF
for the default, baseline case (ie., no netfilter rules, no packet
sockets, no qdisc) from a +3% improvement for UDP which has a lookup
per packet (VRF being better than no l3mdev) to ~2% loss for TCP_CRR
which connects a socket for each request-response.
David Ahern [Mon, 20 Mar 2017 18:19:44 +0000 (11:19 -0700)]
net: vrf: performance improvements for IPv4
The VRF driver allows users to implement device based features for an
entire domain. For example, a qdisc or netfilter rules can be attached
to a VRF device or tcpdump can be used to view packets for all devices
in the L3 domain.
The device-based features come with a performance penalty, most
notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
to switch the dst on an skb to its private dst. This allows the skb
to traverse the xmit stack with the device set to the VRF device
which in turn enables the netfilter and qdisc features. The VRF
driver then performs the FIB lookup again and reinserts the packet.
This patch avoids the redirect for IPv4 packets if a qdisc has not
been attached to a VRF device which is the default config. In this
case the netfilter hooks and network taps are directly traversed in
the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
then the redirect using the vrf dst is done.
Additional overhead is removed by only checking packet taps if a
socket is open on the device (vrf_dev->ptype_all list is not empty).
Packet sockets bound to any device will still get a copy of the
packet via the real ingress or egress interface.
The end result of this change is a decrease in the overhead of VRF
for the default, baseline case (ie., no netfilter rules, no packet
sockets, no qdisc) to ~3% for UDP which has a lookup per packet and
< 1% overhead for connected sockets that leverage early demux and
avoid FIB lookups.
Josh Hunt [Mon, 20 Mar 2017 19:22:03 +0000 (15:22 -0400)]
sock: introduce SO_MEMINFO getsockopt
Allows reading of SK_MEMINFO_VARS via socket option. This way an
application can get all meminfo related information in single socket
option call instead of multiple calls.
Adds helper function, sk_get_meminfo(), and uses that for both
getsockopt and sock_diag_put_meminfo().
Colin Ian King [Mon, 20 Mar 2017 11:37:22 +0000 (11:37 +0000)]
mlxsw: spectrum: fix swapped order of arguments packets and bytes
The arguments packets and bytes to call mlxsw_sp_acl_rule_get_stats are
in the wrong order. Fix this by swapping them.
Detected by CoverityScan, CID#1419705 ("Arguments in wrong order")
Fixes: 7c1b8eb175b69add8ea ("mlxsw: spectrum: Add support for TC flower offload statistics") Signed-off-by: Colin Ian King <[email protected]> Acked-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Xin Long [Mon, 20 Mar 2017 09:46:27 +0000 (17:46 +0800)]
sctp: declare struct sctp_stream before using it
sctp_stream_free uses struct sctp_stream as a param, but struct sctp_stream
is defined after it's declaration.
This patch is to declare struct sctp_stream before sctp_stream_free.
Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf") Signed-off-by: Xin Long <[email protected]> Acked-by: Neil Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Arnd Bergmann [Mon, 20 Mar 2017 08:58:33 +0000 (09:58 +0100)]
cpsw/netcp: cpts depends on posix_timers
With posix timers having become optional, we get a build error with
the cpts time sync option of the CPSW driver:
drivers/net/ethernet/ti/cpts.c: In function 'cpts_find_ts':
drivers/net/ethernet/ti/cpts.c:291:23: error: implicit declaration of function 'ptp_classify_raw';did you mean 'ptp_classifier_init'? [-Werror=implicit-function-declaration]
This adds a hard dependency on PTP_CLOCK to avoid the problem, as
building it without PTP support makes no sense anyway.
Arnd Bergmann [Mon, 20 Mar 2017 08:52:50 +0000 (09:52 +0100)]
cpsw/netcp: work around reverse cpts dependency
The dependency is reversed: cpsw and netcp call into cpts,
but cpts depends on the other two in Kconfig. This can lead
to cpts being a loadable module and its callers built-in:
drivers/net/ethernet/ti/cpsw.o: In function `cpsw_remove':
cpsw.c:(.text.cpsw_remove+0xd0): undefined reference to `cpts_release'
drivers/net/ethernet/ti/cpsw.o: In function `cpsw_rx_handler':
cpsw.c:(.text.cpsw_rx_handler+0x2dc): undefined reference to `cpts_rx_timestamp'
drivers/net/ethernet/ti/cpsw.o: In function `cpsw_tx_handler':
cpsw.c:(.text.cpsw_tx_handler+0x7c): undefined reference to `cpts_tx_timestamp'
drivers/net/ethernet/ti/cpsw.o: In function `cpsw_ndo_stop':
As a workaround, I'm introducing another Kconfig symbol to
control the compilation of cpts, while making the actual
module controlled by a silent symbol that is =y when necessary.
Fixes: 6246168b4a38 ("net: ethernet: ti: netcp: add support of cpts") Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Grygorii Strashko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Arjun Vynipadath [Mon, 20 Mar 2017 08:52:38 +0000 (14:22 +0530)]
cxgb4: Update IngPad and IngPack values
We are using the smallest padding boundary (8 bytes), which isn't
smaller than the Memory Controller Read/Write Size
We get best performance in 100G when the Packing Boundary is a multiple
of the Maximum Payload Size. Its related to inefficient chopping of DMA
packets by PCIe, that causes more overhead on bus. So driver is helping
by making the starting address alignment to be MPS size.
We will try to determine PCIE MaxPayloadSize capabiltiy and set
IngPackBoundary based on this value. If cache line size is greater than
MPS or determinig MPS fails, we will use cache line size to determine
IngPackBoundary(as before).
Arnd Bergmann [Mon, 20 Mar 2017 08:51:13 +0000 (09:51 +0100)]
net: dwc-xlgmac: add module license
When building the driver as a module, we get a warning about the
lack of a license:
WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/synopsys/dwc-xlgmac.o
see include/linux/module.h for more information
Curiously the text in the .c files only mentions GPLv2+, while the license
tag in the PCI driver contains both GPL and BSD. I picked the license text
as the more definite reference here and put a GPL tag in there.
Fixes: 65e0ace2c5cd ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Arnd Bergmann [Mon, 20 Mar 2017 08:51:12 +0000 (09:51 +0100)]
net: dwc-xlgmac: include dcbnl.h
Without this header, we can run into a build error:
drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c: In function 'xlgmac_config_queue_mapping':
drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c:1548:36: error: 'IEEE_8021QAZ_MAX_TCS' undeclared (first use in this function)
prio_queues = min_t(unsigned int, IEEE_8021QAZ_MAX_TCS,
Fixes: 65e0ace2c5cd ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet") Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Jie Deng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Wed, 22 Mar 2017 17:50:37 +0000 (10:50 -0700)]
Merge branch 'r8152-rx-settings'
Hayes Wang says:
====================
r8152: fix the rx settings of RTL8153
The RMS and the rx early size should base on the same rx size. However,
the RMS is set to 9K bytes now and the rx early depends on mtu. For using
the rx buffer effectively, sync the two settings according to the mtu.
====================
hayeswang [Mon, 20 Mar 2017 08:13:44 +0000 (16:13 +0800)]
r8152: set the RMS of RTL8153 according to the mtu
Set the received maximum size (RMS) according to the mtu size. It is
unnecessary to receive a packet which is more than the size we could
transmit. Besides, this could let the rx buffer be used effectively.
Roopa Prabhu [Mon, 20 Mar 2017 05:01:28 +0000 (22:01 -0700)]
neighbour: fix nlmsg_pid in notifications
neigh notifications today carry pid 0 for nlmsg_pid
in all cases. This patch fixes it to carry calling process
pid when available. Applications (eg. quagga) rely on
nlmsg_pid to ignore notifications generated by their own
netlink operations. This patch follows the routing subsystem
which already sets this correctly.
Tejun Heo [Tue, 14 Mar 2017 23:25:56 +0000 (19:25 -0400)]
cgroup, net_cls: iterate the fds of only the tasks which are being migrated
The net_cls controller controls the classid field of each socket which
is associated with the cgroup. Because the classid is per-socket
attribute, when a task migrates to another cgroup or the configured
classid of the cgroup changes, the controller needs to walk all
sockets and update the classid value, which was implemented by 3b13758f51de ("cgroups: Allow dynamically changing net_classid").
While the approach is not scalable, migrating tasks which have a lot
of fds attached to them is rare and the cost is born by the ones
initiating the operations. However, for simplicity, both the
migration and classid config change paths call update_classid() which
scans all fds of all tasks in the target css. This is an overkill for
the migration path which only needs to cover a much smaller subset of
tasks which are actually getting migrated in.
On cgroup v1, this can lead to unexpected scalability issues when one
tries to migrate a task or process into a net_cls cgroup which already
contains a lot of fds. Even if the migration traget doesn't have many
to get scanned, update_classid() ends up scanning all fds in the
target cgroup which can be extremely numerous.
Unfortunately, on cgroup v2 which doesn't use net_cls, the problem is
even worse. Before bfc2cf6f61fc ("cgroup: call subsys->*attach() only
for subsystems which are actually affected by migration"), cgroup core
would call the ->css_attach callback even for controllers which don't
see actual migration to a different css.
As net_cls is always disabled but still mounted on cgroup v2, whenever
a process is migrated on the cgroup v2 hierarchy, net_cls sees
identity migration from root to root and cgroup core used to call
->css_attach callback for those. The net_cls ->css_attach ends up
calling update_classid() on the root net_cls css to which all
processes on the system belong to as the controller isn't used. This
makes any cgroup v2 migration O(total_number_of_fds_on_the_system)
which is horrible and easily leads to noticeable stalls triggering RCU
stall warnings and so on.
The worst symptom is already fixed in upstream by bfc2cf6f61fc
("cgroup: call subsys->*attach() only for subsystems which are
actually affected by migration"); however, backporting that commit is
too invasive and we want to avoid other cases too.
This patch updates net_cls's cgrp_attach() to iterate fds of only the
processes which are actually getting migrated. This removes the
surprising migration cost which is dependent on the total number of
fds in the target cgroup. As this leaves write_classid() the only
user of update_classid(), open-code the helper into write_classid().
David S. Miller [Wed, 22 Mar 2017 02:02:38 +0000 (19:02 -0700)]
Merge branch 'qed-IOV-cleanups'
Yuval Mintz says:
====================
qed: IOV related clenaups
This patch series targets IOV functionality [on both PF and VF].
Patches #2, #3 and #5 fix flows relating to malicious VFs, either by
upgrading and aligning current safe-guards or by correcing racy flows.
Patches #1 and #8 make some malicious/dysnfunctional VFs logging appear
by default in logs.
The rest of the patches either cleanup the existing code or else correct
some possible [yet fairly insignicant] issues in VF behavior.
====================
Mintz, Yuval [Sun, 19 Mar 2017 11:08:20 +0000 (13:08 +0200)]
qed: Always publish VF link from leading hwfn
The link information exists only on the leading hwfn,
but some of its derivatives [e.g., min/max rate] need to
be configured for each hwfn.
When re-basing the VF link view, use the leading hwfn
information as basis for all existing hwfns to allow
said configurations to stick.
Mintz, Yuval [Sun, 19 Mar 2017 11:08:19 +0000 (13:08 +0200)]
qed: Raise verbosity of Malicious VF indications
Malicious VF existance should be interesting enough for the
hyperuser. Change the PF indication that one of its child VF
became malicious to appear by default.
Mintz, Yuval [Sun, 19 Mar 2017 11:08:17 +0000 (13:08 +0200)]
qed: Deprecate VF multiple queue-stop
The PF<->VF interface allows for the VF to request
multiple queues closure via a single message, but this has
never been used by any official driver.
We now deprecate this option, forcing each queue close
to arrive via a different command; This would be required
for future TLVs that are going to extend the queue TLVs with
additional information on a per-queue basis.
Mintz, Yuval [Sun, 19 Mar 2017 11:08:16 +0000 (13:08 +0200)]
qed: Uniform IOV queue validation
PF needs to validate the status of VF queues before asking firmware
to configure anything for them, but that validation is done in various
different forms - sometimes inadequate.
Add auxillary functions that can be used for testing of the queue
state and convert the various flows to use those instead of current
existing flows; Also, add missing validations where needed.
Mintz, Yuval [Sun, 19 Mar 2017 11:08:15 +0000 (13:08 +0200)]
qed: Correct default VF coalescing configuration
When starting the VF's vport, the PF would first configure
the status blocks of the VF and then reset them.
That would cause some of the configured information to be lost -
specifically it would mean that all the VFs queues would use
the Rx coalescing state-machine of the status block.
Mintz, Yuval [Sun, 19 Mar 2017 11:08:14 +0000 (13:08 +0200)]
qed: Set HW-channel to ready before ACKing VF
When PF responds to the VF requests it also cleans the HW-channel
indication in firmware to allow further VF messages to arrive,
but the order currently applied is wrong -
The PF is copying by DMAE the response the VF is polling on for
completion, and only afterwards sets the HW-channel to ready state.
This creates a race condition where the VF would be able to send
an additional message to the PF before the channel would get ready
again, causing the firmware to consider the VF as malicious.
Mintz, Yuval [Sun, 19 Mar 2017 11:08:13 +0000 (13:08 +0200)]
qed: Clean VF malicious indication when disabling IOV
When a VF is considered malicious, driver handling of the VF
FLR flow would clean said indication - but not if the FLR is
part of an sriov-disable flow.
That leads to further issues, as PF wouldn't re-enable the
previously malicious VF when sriov is re-enabled.
No reason for that - simply clean malicious indications in
the sriov-disable flow as well.
Mintz, Yuval [Sun, 19 Mar 2017 11:08:12 +0000 (13:08 +0200)]
qed: Increase verbosity of VF -> PF errors
VFs are currently logging errors when communicating
with their PFs in a too-low verbosity that wouldn't
be shown by default. As timeouts and failed commands
are crucial for VF operability, make them appear by
default.
Zi Shen Lim [Mon, 20 Mar 2017 06:03:14 +0000 (23:03 -0700)]
selftests/bpf: fix broken build, take 2
Merge of 'linux-kselftest-4.11-rc1':
1. Partially removed use of 'test_objs' target, breaking force rebuild of
BPFOBJ, introduced in commit d498f8719a09 ("bpf: Rebuild bpf.o for any
dependency update").
Update target so dependency on BPFOBJ is restored.
2. Introduced commit 2047f1d8ba28 ("selftests: Fix the .c linking rule")
which fixes order of LDLIBS.
Commit d02d8986a768 ("bpf: Always test unprivileged programs") added
libcap dependency into CFLAGS. Use LDLIBS instead to fix linking of
test_verifier.
3. Introduced commit d83c3ba0b926 ("selftests: Fix selftests build to
just build, not run tests").
Reordering the Makefile allows us to remove the 'all' target.
Tested both:
selftests/bpf$ make
and
selftests$ make TARGETS=bpf
on Ubuntu 16.04.2.
SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled
while packets are collected on the error queue.
So, checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags
is not enough to safely assume that the skb contains
OPT_STATS data.
Add a bit in sock_exterr_skb to indicate whether the
skb contains opt_stats data.
tcp: fix SCM_TIMESTAMPING_OPT_STATS for normal skbs
__sock_recv_timestamp can be called for both normal skbs (for
receive timestamps) and for skbs on the error queue (for transmit
timestamps).
Commit 1c885808e456
(tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING)
assumes any skb passed to __sock_recv_timestamp are from
the error queue, containing OPT_STATS in the content of the skb.
This results in accessing invalid memory or generating junk
data.
To fix this, set skb->pkt_type to PACKET_OUTGOING for packets
on the error queue. This is safe because on the receive path
on local sockets skb->pkt_type is never set to PACKET_OUTGOING.
With that, copy OPT_STATS from a packet, only if its pkt_type
is PACKET_OUTGOING.
Xin Long [Sat, 18 Mar 2017 11:12:22 +0000 (19:12 +0800)]
sctp: remove temporary variable confirm from sctp_packet_transmit
Commit c86a773c7802 ("sctp: add dst_pending_confirm flag") introduced
a temporary variable "confirm" in sctp_packet_transmit.
But it broke the rule that longer lines should be above shorter ones.
Besides, this variable is not necessary, so this patch is to just
remove it and use tp->dst_pending_confirm directly.
David Ahern [Fri, 17 Mar 2017 23:07:11 +0000 (16:07 -0700)]
net: vrf: Reset rt6i_idev in local dst after put
The VRF driver takes a reference to the inet6_dev on the VRF device for
its rt6_local dst when handling local traffic through the VRF device as
a loopback. When the device is deleted the driver does a put on the idev
but does not reset rt6i_idev in the rt6_info struct. When the dst is
destroyed, dst_destroy calls ip6_dst_destroy which does a second put for
what is essentially the same reference causing it to be prematurely freed.
Reset rt6i_idev after the put in the vrf driver.
Fixes: b4869aa2f881e ("net: vrf: ipv6 support for local traffic to
local addresses") Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Rick Farrington [Fri, 17 Mar 2017 22:43:26 +0000 (15:43 -0700)]
liquidio: fix for vf mac addr command sent to nic firmware
Change to support host<->firmware command return value.
Fix for vf mac addr state command.
1. Added support for firmware commands to return a value:
- previously, the returned code overlapped with host codes, thus
commands were only returning 0 (success) or -1 (interpreted as
timeout)
- per 'response_manager.h', the error codes are split into two fields
(major/minor) now, firmware commands are grouped into their own
'major' group, separate from the host's 'major' group, which allow f/w
commands to return any 16-bit value
2. The command to set vf mac addr was logging a success message even if
command failed. Now command uses a callback function to log the status
message.
3. The command to set vf mac addr was not logging a message when set via
the host 'ip' command. Now, the callback function will log an
appropriate message.
John Allen [Fri, 17 Mar 2017 22:13:43 +0000 (17:13 -0500)]
ibmvnic: Correct ibmvnic handling of device open/close
When closing the ibmvnic device we need to release the resources used
in communicating to the virtual I/O server. These need to be
re-negotiated with the server at open time.
This patch moves the releasing of resources a separate routine
and updates the open and close handlers to release all resources at
close and re-negotiate and allocate these resources at open.
John Allen [Fri, 17 Mar 2017 22:13:42 +0000 (17:13 -0500)]
ibmvnic: Move ibmvnic adapter intialization to its own routine
The intialization of the ibmvnic driver with respect to the virtual
server it connects to should be moved to its own routine. This will
alolow the driver to initiate this process from places outside of
the drivers probe routine.
John Allen [Fri, 17 Mar 2017 22:13:40 +0000 (17:13 -0500)]
ibmvnic: Move login and queue negotiation into ibmvnic_open
VNIC server expects LINK_STATE_UP to be sent within 30s of the login. If we
exceed the timeout, VNIC server will attempt to fail over. Since time
between probe and open of the device is indeterminate, move login and queue
negotiation into ibmvnic open so we can guarantee that login and sending
LINK_STATE_UP occur within the 30s window.
David S. Miller [Wed, 22 Mar 2017 00:24:02 +0000 (17:24 -0700)]
Merge branch 'stmmac-mq-part3'
Joao Pinto says:
====================
net: stmmac: adding multiple buffers and routing
As agreed with David Miller, this patch-set is the third and last to enable
multiple queues in stmmac.
This third one focuses on:
a) Enable multiple buffering to the driver and queue independent data
b) Configuration of RX and TX queues' priority
c) Configuration of RX queues' routing
====================
Joao Pinto [Fri, 17 Mar 2017 16:11:05 +0000 (16:11 +0000)]
net: stmmac: enable multiple buffers
This patch creates 2 new structures (stmmac_tx_queue and stmmac_rx_queue)
in include/linux/stmmac.h, enabling that each RX and TX queue has its
own buffers and data.
Eric Dumazet [Fri, 17 Mar 2017 15:05:28 +0000 (08:05 -0700)]
sch_dsmark: fix invalid skb_cow() usage
skb_cow(skb, sizeof(ip header)) is not very helpful in this context.
First we need to use pskb_may_pull() to make sure the ip header
is in skb linear part, then use skb_try_make_writable() to
address clones issues.
Fixes: 4c30719f4f55 ("[PKT_SCHED] dsmark: handle cloned and non-linear skb's") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Currently the struct representing router interface "mlxsw_sp_rif"
is reffered as "r" in various places in the driver. Furthermore it
contains a member which specify the index which is called "rif".
This patch change "r" to "rif" and "rif" to "rif_index".
David S. Miller [Tue, 21 Mar 2017 23:04:01 +0000 (16:04 -0700)]
Merge branch 'usbnet-ksettings'
Philippe Reynes says:
====================
net: usbnet: move to new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated. On usbnet, it
was often implemented with usbnet_{get|set}_settings.
In this series, I add usbnet_{get|set}_link_ksettings
in the first patch, then I update all the driver to
use this new api, and in the last patch I remove the
old api usbnet_{get|set}_settings.
====================
Florian Fainelli [Thu, 16 Mar 2017 17:27:08 +0000 (10:27 -0700)]
net: bcmgenet: Track per TX/RX rings statistics
__bcmgenet_tx_reclaim() is currently summing TX bytes/packets in a way
that is not SMP friendly, mutliples CPUs could run
__bcmgenet_tx_reclaim() independently and still update stats->tx_bytes
and stats->tx_packets, cloberring the other CPUs statistics.
Fix this by tracking per RX and TX rings the number of bytes, packets,
dropped and errors statistics, and provide a bcmgenet_get_stats()
function which aggregates everything and returns a consistent output.
net: ipv4: add support for ECMP hash policy choice
This patch adds support for ECMP hash policy choice via a new sysctl
called fib_multipath_hash_policy and also adds support for L4 hashes.
The current values for fib_multipath_hash_policy are:
0 - layer 3 (default)
1 - layer 4
If there's an skb hash already set and it matches the chosen policy then it
will be used instead of being calculated (currently only for L4).
In L3 mode we always calculate the hash due to the ICMP error special
case, the flow dissector's field consistentification should handle the
address order thus we can remove the address reversals.
If the skb is provided we always use it for the hash calculation,
otherwise we fallback to fl4, that is if skb is NULL fl4 has to be set.
Andrey Vagin [Thu, 16 Mar 2017 00:41:14 +0000 (17:41 -0700)]
net/8021q: create device with all possible features in wanted_features
wanted_features is a set of features which have to be enabled if a
hardware allows that.
Currently when a vlan device is created, its wanted_features is set to
current features of its base device.
The problem is that the base device can get new features and they are
not propagated to vlan-s of this device.
If we look at bonding devices, they doesn't have this problem and this
patch suggests to fix this issue by the same way how it works for bonding
devices.
We meet this problem, when we try to create a vlan device over a bonding
device. When a system are booting, real devices require time to be
initialized, so bonding devices created without slaves, then vlan
devices are created and only then ethernet devices are added to the
bonding device. As a result we have vlan devices with disabled
scatter-gather.
* create a bonding device
$ ip link add bond0 type bond
$ ethtool -k bond0 | grep scatter
scatter-gather: off
tx-scatter-gather: off [requested on]
tx-scatter-gather-fraglist: off [requested on]
* create a vlan device
$ ip link add link bond0 name bond0.10 type vlan id 10
$ ethtool -k bond0.10 | grep scatter
scatter-gather: off
tx-scatter-gather: off
tx-scatter-gather-fraglist: off
* Add a slave device to bond0
$ ip link set dev eth0 master bond0
And now we can see that the bond0 device has got the scatter-gather
feature, but the bond0.10 hasn't got it.
[root@laptop linux-task-diag]# ethtool -k bond0 | grep scatter
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
[root@laptop linux-task-diag]# ethtool -k bond0.10 | grep scatter
scatter-gather: off
tx-scatter-gather: off
tx-scatter-gather-fraglist: off
With this patch the vlan device will get all new features from the
bonding device.
Here is a call trace how features which are set in this patch reach
dev->wanted_features.
Andrey Ulanov [Wed, 15 Mar 2017 03:16:42 +0000 (20:16 -0700)]
net: unix: properly re-increment inflight counter of GC discarded candidates
Dmitry has reported that a BUG_ON() condition in unix_notinflight()
may be triggered by a simple code that forwards unix socket in an
SCM_RIGHTS message.
That is caused by incorrect unix socket GC implementation in unix_gc().
The GC first collects list of candidates, then (a) decrements their
"children's" inflight counter, (b) checks which inflight counters are
now 0, and then (c) increments all inflight counters back.
(a) and (c) are done by calling scan_children() with inc_inflight or
dec_inflight as the second argument.
Commit 6209344f5a37 ("net: unix: fix inflight counting bug in garbage
collector") changed scan_children() such that it no longer considers
sockets that do not have UNIX_GC_CANDIDATE flag. It also added a block
of code that that unsets this flag _before_ invoking
scan_children(, dec_iflight, ). This may lead to incorrect inflight
counters for some sockets.
This change fixes this bug by changing order of operations:
UNIX_GC_CANDIDATE is now unset only after all inflight counters are
restored to the original state.
David S. Miller [Tue, 21 Mar 2017 21:41:47 +0000 (14:41 -0700)]
Merge branch 'vsock-pkt-cancel'
Peng Tao says:
====================
vsock: cancel connect packets when failing to connect
Currently, if a connect call fails on a signal or timeout (e.g., guest is still
in the process of starting up), we'll just return to caller and leave the connect
packet queued and they are sent even though the connection is considered a failure,
which can confuse applications with unwanted false connect attempt.
The patchset enables vsock (both host and guest) to cancel queued packets when
a connect attempt is considered to fail.
v5 changelog:
- change virtio_vsock_pkt->cancel_token back to virtio_vsock_pkt->vsk
v4 changelog:
- drop two unnecessary void * cast
- update new callback comment
v3 changelog:
- define cancel_pkt callback in struct vsock_transport rather than struct virtio_transport
- rename virtio_vsock_pkt->vsk to virtio_vsock_pkt->cancel_token
v2 changelog:
- fix queued_replies counting and resume tx/rx when necessary
====================
Peng Tao [Wed, 15 Mar 2017 01:32:17 +0000 (09:32 +0800)]
vsock: cancel packets when failing to connect
Otherwise we'll leave the packets queued until releasing vsock device.
E.g., if guest is slow to start up, resulting ETIMEDOUT on connect, guest
will get the connect requests from failed host sockets.
This enables developing code that uses SOF_TIMESTAMPING_TX_SOFTWARE
by using localhost addresses (without needing to send packets outside),
as well as enabling unit and functional testing of TX timestamping code
without needing hardware support or network access.
It also fulfills the expectation of software network devices supporting
software-based timestamping.
Tested on qemu using txtimestamping.c from the kernel selftests, and
ethtool -T.