Git Repo - linux.git/log
8 years ago drm/amdgpu: fix missing free wb for cond_exec
Monk Liu [Mon, 30 May 2016 06:17:42 +0000 (14:17 +0800)]
drm/amdgpu: fix missing free wb for cond_exec

Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: fix memleak in pptable_init
Monk Liu [Mon, 30 May 2016 05:43:45 +0000 (13:43 +0800)]
drm/amdgpu: fix memleak in pptable_init

Signed-off-by: Monk Liu <[email protected]>
Acked-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: fix mem leak in atombios
Monk Liu [Fri, 27 May 2016 11:34:11 +0000 (19:34 +0800)]
drm/amdgpu: fix mem leak in atombios

Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: fix mem leak in pplib/hwmgr
Monk Liu [Fri, 27 May 2016 11:09:06 +0000 (19:09 +0800)]
drm/amdgpu: fix mem leak in pplib/hwmgr

Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: fix mem leak in smumgr
Monk Liu [Fri, 27 May 2016 09:52:58 +0000 (17:52 +0800)]
drm/amdgpu: fix mem leak in smumgr

Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: add pipeline sync while vmid switch in same ctx
Chunming Zhou [Wed, 27 Apr 2016 10:07:41 +0000 (18:07 +0800)]
drm/amdgpu: add pipeline sync while vmid switch in same ctx

Because vmid-mgr supports vmid sharing within one vm, the same ctx could
get different vmids for two emits without a vm flush, as the vm_flush
could be done in another ring.

Signed-off-by: Chunming Zhou <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: vBIOS post only call when mem_size zero
Monk Liu [Tue, 24 May 2016 05:23:46 +0000 (13:23 +0800)]
drm/amdgpu: vBIOS post only call when mem_size zero

Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: modify sdma start sequence
Monk Liu [Wed, 25 May 2016 08:57:14 +0000 (16:57 +0800)]
drm/amdgpu: modify sdma start sequence

We should first halt the engine, then do the register
programming, then unhalt the engine, and finally run
ring_test.

This helps fix the driver reload hang issue of the SDMA
ring.

The original sequence is wrong because it programs the engine
after unhalting it, which leads to faulty behavior when
reloading the driver after it has been unloaded.

Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: init more register for sdma
Monk Liu [Wed, 25 May 2016 08:55:50 +0000 (16:55 +0800)]
drm/amdgpu: init more register for sdma

This helps fix the driver reload hang issue of the SDMA
ring.

Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: clear SA bo when created
Monk Liu [Wed, 25 May 2016 08:55:07 +0000 (16:55 +0800)]
drm/amdgpu: clear SA bo when created

This helps fix the driver reload hang issue of the SDMA
ring.

Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: fix fw leak in non-powerplay dpm code
Alex Deucher [Wed, 1 Jun 2016 15:09:01 +0000 (11:09 -0400)]
drm/amdgpu: fix fw leak in non-powerplay dpm code

We need to release the firmware on driver tear down.

Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: fix pplib finish bug
Monk Liu [Thu, 19 May 2016 06:36:34 +0000 (14:36 +0800)]
drm/amdgpu: fix pplib finish bug

1. Use late_fini to kfree all resources; otherwise the
released pointers may be accessed in the IRQ IP fini routine.

2. hwmgr should not be kfree'd by pem_fini, which is invoked
by the hw fini path.

Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: impl late_fini for amdgpu_pp_ip
Monk Liu [Thu, 19 May 2016 06:36:01 +0000 (14:36 +0800)]
drm/amdgpu: impl late_fini for amdgpu_pp_ip

This implements late_fini support for powerplay.

Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu: add late_fini for ip_funcs
Monk Liu [Thu, 19 May 2016 06:35:17 +0000 (14:35 +0800)]
drm/amdgpu: add late_fini for ip_funcs

This gives IP modules an optional late cleanup
function.  This is needed to handle tricky inter-module
dependencies during tear down.

Signed-off-by: Monk Liu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/admgpu/powerplay/polaris: fix powertune table upload
Rex Zhu [Mon, 23 May 2016 10:24:41 +0000 (18:24 +0800)]
drm/admgpu/powerplay/polaris: fix powertune table upload

Exclude AVFS-related fields when uploading the powertune table to hw.
The driver shouldn't set them directly.

Signed-off-by: Rex Zhu <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago drm/amdgpu/iceland: Set SC_PA_RASTER_CONFIG according to different RB enabled
Ken Wang [Tue, 24 May 2016 01:26:27 +0000 (09:26 +0800)]
drm/amdgpu/iceland: Set SC_PA_RASTER_CONFIG according to different RB enabled

Fix the raster config setting for different iceland configs.

Signed-off-by: Ken Wang <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
8 years ago wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel
Prasun Maiti [Mon, 6 Jun 2016 14:34:19 +0000 (20:04 +0530)]
wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel

The iwpriv app uses the iw_point structure to send data to the kernel. The
iw_point structure holds a pointer. For compatibility, the kernel converts
the pointer as required for WEXT IOCTLs (SIOCIWFIRST to SIOCIWLAST). Some
drivers may use iw_handler_def.private_args to populate iwpriv commands
instead of iw_handler_def.private. In those cases, the IOCTLs from
SIOCIWFIRSTPRIV to SIOCIWLASTPRIV follow the ndo_do_ioctl() path.
Accordingly, when a filled-in iw_point structure comes from a 32 bit
iwpriv to a 64 bit kernel, the kernel does not convert the pointer and
sends it to the driver unchanged, so the driver may get invalid data.

The pointer conversion for the IOCTLs (SIOCIWFIRSTPRIV to
SIOCIWLASTPRIV), which follow the path ndo_do_ioctl(), is mandatory.
This patch adds pointer conversion from 32 bit to 64 bit and vice versa,
if the ioctl comes from 32 bit iwpriv to 64 bit Kernel.
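
Why the unconverted structure yields invalid data can be illustrated with
a small Python model of the two layouts. This is a sketch under stated
assumptions: little-endian byte order, a 4-byte user pointer followed by
16-bit length and flags on the 32 bit side, and an 8-byte pointer on the
64 bit side. The helper compat_to_native() and the example address are
hypothetical.

```python
import struct

# Assumed little-endian layouts of iw_point:
#   32 bit userspace: u32 pointer, u16 length, u16 flags          (8 bytes)
#   64 bit kernel:    u64 pointer, u16 length, u16 flags, padding (16 bytes)
COMPAT_IW_POINT = struct.Struct("<IHH")
NATIVE_IW_POINT = struct.Struct("<QHH4x")


def compat_to_native(raw: bytes) -> bytes:
    """Widen a 32 bit iw_point into the 64 bit layout."""
    pointer, length, flags = COMPAT_IW_POINT.unpack(raw)
    return NATIVE_IW_POINT.pack(pointer, length, flags)


# A 32 bit iwpriv passes a 16-byte buffer at a made-up user address.
user_request = COMPAT_IW_POINT.pack(0x08049F00, 16, 0)

# Without conversion, the first 8 bytes are read as one 64 bit pointer,
# so the length and flags fields leak into the upper pointer bits.
(misread_pointer,) = struct.unpack_from("<Q", user_request)

# With conversion, every field lands where a 64 bit driver expects it.
pointer, length, flags = NATIVE_IW_POINT.unpack(compat_to_native(user_request))
print(hex(misread_pointer), hex(pointer), length, flags)
```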

Cc: [email protected]
Signed-off-by: Prasun Maiti <[email protected]>
Signed-off-by: Ujjal Roy <[email protected]>
Tested-by: Dibyajyoti Ghosh <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
8 years ago cfg80211: remove get/set antenna and tx power warnings
Johannes Berg [Thu, 9 Jun 2016 07:40:55 +0000 (09:40 +0200)]
cfg80211: remove get/set antenna and tx power warnings

Since set_tx_power and set_antenna are frequently implemented
without the matching get_tx_power/get_antenna, we shouldn't
have added warnings for those. Remove them.

The remaining ones are correct and need to be implemented
symmetrically for correct operation.

Cc: [email protected]
Fixes: de3bb771f471 ("cfg80211: add more warnings for inconsistent ops")
Signed-off-by: Johannes Berg <[email protected]>
8 years ago Merge branch 'cbq-kill-drop'
David S. Miller [Thu, 9 Jun 2016 06:58:52 +0000 (23:58 -0700)]
Merge branch 'cbq-kill-drop'

Florian Westphal says:

====================
sched, cbq: remove OVL_STRATEGY/POLICE support

iproute2 does not implement any options that result in the
TCA_CBQ_OVL_STRATEGY/TCA_CBQ_POLICE attributes being set/used.

This series removes these two attributes from cbq and makes the kernel
reject them via EOPNOTSUPP in case they are present.

The two followup changes then remove several features from qdisc
infrastructure that are then no longer used/needed.  These are:
 - The 'drop' method provided by most qdiscs
 - the 'reshape_fail' function used by some qdiscs
 - the __parent member in struct Qdisc

I tested this with allmod and allyesconfig builds and also with
a brief cbq script:

  tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 10Mbit avpkt 1000 cell 8
  tc class add dev eth0 parent 1:0 classid 1:1 est 1sec 8sec cbq bandwidth 10Mbit rate 5Mbit prio 1 allot 1514 maxburst 20 cell 8 avpkt 1000 bounded split 1:0 defmap 3f
  tc class add dev eth0 parent 1:0 classid 1:2 est 1sec 8sec cbq bandwidth 10Mbit rate 5Mbit prio 1 allot 1514 maxburst 20 cell 8 avpkt 1000 bounded split 1:0 defmap 3f
  tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip tos 0x10 0xff classid 1:1 police rate 2Mbit burst 10K reclassify
  tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip tos 0x0c 0xff classid 1:2
  tc filter add dev eth0 parent 1:0 protocol ip prio 2 u32 match ip tos 0x10 0xff classid 1:2
  tc filter add dev eth0 parent 1:0 protocol ip prio 3 u32 match ip tos 0x0 0x0 classid 1:2

No changes since v1 except patch #5 to fix up struct Qdisc layout.
====================

Signed-off-by: David S. Miller <[email protected]>
8 years ago sched: place state, next_sched and gso_skb in same cacheline again
Florian Westphal [Wed, 8 Jun 2016 22:27:43 +0000 (00:27 +0200)]
sched: place state, next_sched and gso_skb in same cacheline again

Earlier commits removed two members from struct Qdisc, which placed
next_sched/gso_skb into a different cacheline than ->state.

This restores the struct layout to what it was before the removal.
Move the two members, then add an annotation so they all reside in the
same cacheline.

This adds a 16 byte hole after cpu_qstats.

The hole could be closed, but as that doesn't decrease the total struct
size, just do it this way.

Reported-by: Eric Dumazet <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago sched: remove qdisc->drop
Florian Westphal [Wed, 8 Jun 2016 22:27:42 +0000 (00:27 +0200)]
sched: remove qdisc->drop

After the removal of TCA_CBQ_OVL_STRATEGY from the cbq scheduler, there
are no more callers of ->drop() outside of other ->drop functions, i.e.
nothing calls them.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago sched: remove qdisc_reshape_fail
Florian Westphal [Wed, 8 Jun 2016 22:27:41 +0000 (00:27 +0200)]
sched: remove qdisc_reshape_fail

After the removal of TCA_CBQ_POLICE in the cbq scheduler, qdisc->reshape_fail
is always NULL, i.e. qdisc_reshape_fail is now the same as qdisc_drop.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago cbq: remove TCA_CBQ_POLICE support
Florian Westphal [Wed, 8 Jun 2016 22:27:40 +0000 (00:27 +0200)]
cbq: remove TCA_CBQ_POLICE support

iproute2 doesn't implement any cbq option that results in this attribute
being sent to kernel.

To make use of it, user would have to

- patch iproute2
- add a class
- attach a qdisc to the class (default pfifo doesn't work as
  q->handle is 0 and cbq_set_police() is a no-op in this case)
- re-'add' the same class (tc class change ...) again
- user must also specify a defmap (e.g. 'split 1:0 defmap 3f'), since
  this 'police' feature relies on its presence
- the added qdisc must be one of bfifo, pfifo or netem

If all of these conditions are met and _some_ leaf qdiscs, namely
p/bfifo, netem, plug or tbf would drop a packet, kernel calls back into
cbq, which will attempt to re-queue the skb into a different class
as indicated by the parents' defmap entry for TC_PRIO_BESTEFFORT.

[ i.e. we behave as if tc_classify returned TC_ACT_RECLASSIFY ].

This feature, which isn't documented or implemented in iproute2,
and isn't implemented consistently (most qdiscs like sfq, codel, etc
drop right away instead of attempting this reclassification) is the
sole reason for the reshape_fail and __parent member in Qdisc struct.

So remove TCA_CBQ_POLICE support from the kernel, reject it via EOPNOTSUPP
so userspace knows we don't support it, and then remove no-longer needed
infrastructure in followup commit.
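
The "reject via EOPNOTSUPP" behaviour can be modelled in a few lines of
Python. This is a hedged sketch, not the kernel's netlink parsing: the
function cbq_check_attrs() and the dictionary representation of the
attributes are hypothetical, and only the attribute names come from this
series.

```python
import errno

# Attributes removed by this series; anything else is passed through.
REMOVED_ATTRS = {"TCA_CBQ_POLICE", "TCA_CBQ_OVL_STRATEGY"}


def cbq_check_attrs(attrs: dict) -> int:
    """Return 0 if the request is acceptable, else a negative errno."""
    for name in attrs:
        if name in REMOVED_ATTRS:
            # Fail loudly instead of silently ignoring the attribute, so
            # userspace learns that the feature is gone.
            return -errno.EOPNOTSUPP
    return 0


accepted = cbq_check_attrs({"TCA_CBQ_RATE": 5_000_000})
rejected = cbq_check_attrs({"TCA_CBQ_POLICE": 1})
print(accepted, rejected)
```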

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago cbq: remove TCA_CBQ_OVL_STRATEGY support
Florian Westphal [Wed, 8 Jun 2016 22:27:39 +0000 (00:27 +0200)]
cbq: remove TCA_CBQ_OVL_STRATEGY support

Since the initial revision of cbq in 2004, iproute2 has never implemented
support for TCA_CBQ_OVL_STRATEGY, which is what needs to be set to
activate the class->drop() call (the TC_CBQ_OVL_DROP strategy must be
set by userspace).

David Miller says:
   It seems really safe to kill this thing off, flag an error if someone
   tries to set the attribute, and therefore kill off all of the
   non-default cbq_ovl_*() functions.

A followup commit can then remove all .drop qdisc methods since this
removed the only caller.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago ALSA: hda - Add PCI ID for Kabylake
Vinod Koul [Thu, 9 Jun 2016 06:02:14 +0000 (11:32 +0530)]
ALSA: hda - Add PCI ID for Kabylake

Kabylake shows up as PCI ID 0xa171, and Kabylake-LP as 0x9d71.
Since these are similar to Skylake, add them to the SKL_PLUS macro.

Signed-off-by: Vinod Koul <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
8 years ago qfq: don't leak skb if kzalloc fails
Florian Westphal [Wed, 8 Jun 2016 21:23:01 +0000 (23:23 +0200)]
qfq: don't leak skb if kzalloc fails

When we need to create a new aggregate to enqueue the skb we call kzalloc.
If that fails we returned ENOBUFS without freeing the skb.

Spotted during code review.

Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago ip6gre: Allow live link address change
Shweta Choudaha [Wed, 8 Jun 2016 19:15:43 +0000 (20:15 +0100)]
ip6gre: Allow live link address change

The ip6 GRE tap device should not be forced to down state to change
the mac address and should allow live address change for tap device
similar to ipv4 gre.

Signed-off-by: Shweta Choudaha <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago Merge branch 'cls_u32-hwoffload-fixes'
David S. Miller [Thu, 9 Jun 2016 04:43:15 +0000 (21:43 -0700)]
Merge branch 'cls_u32-hwoffload-fixes'

Jakub Kicinski says:

====================
incremental cls_u32 hardware offload fixes

These are incremental changes from v1 of cls_u32 fixes.
First patch is reposted in its entirety, patch 2 is an
incremental change from patch 2 of the original series.
====================

Signed-off-by: David S. Miller <[email protected]>
8 years ago net: cls_u32: be more strict about skip-sw flag for knodes
Jakub Kicinski [Wed, 8 Jun 2016 19:11:04 +0000 (20:11 +0100)]
net: cls_u32: be more strict about skip-sw flag for knodes

Return an error if the user requested skip-sw and the underlying
hardware cannot handle tc offloads (or offloads are disabled).
This patch fixes the knode handling.
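
The stricter rule can be written down as a tiny decision function. This
is a sketch only: replace_hw_knode_model() is a hypothetical name, and
the concrete error code is an assumption, since the message does not say
which error is returned.

```python
import errno


def replace_hw_knode_model(skip_sw: bool, hw_can_offload: bool) -> int:
    """Model of adding a knode; returns 0 or a negative errno."""
    if hw_can_offload:
        return 0  # the rule is installed in hardware
    if skip_sw:
        # The user asked for hardware only, so falling back to software
        # would silently violate the request: report an error instead.
        return -errno.EINVAL  # assumed error code, for illustration only
    return 0  # no skip-sw: the software path still handles the rule


for skip_sw in (False, True):
    for hw_ok in (False, True):
        print(skip_sw, hw_ok, replace_hw_knode_model(skip_sw, hw_ok))
```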

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago net: cls_u32: catch all hardware offload errors
Jakub Kicinski [Wed, 8 Jun 2016 19:11:03 +0000 (20:11 +0100)]
net: cls_u32: catch all hardware offload errors

Errors reported by u32_replace_hw_hnode() were not propagated.

Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Sridhar Samudrala <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago Merge tag 'drm-vc4-fixes-2016-06-06' of github.com:anholt/linux into drm-fixes
Dave Airlie [Thu, 9 Jun 2016 02:32:09 +0000 (12:32 +1000)]
Merge tag 'drm-vc4-fixes-2016-06-06' of github.com:anholt/linux into drm-fixes

This pull request brings in vblank/pageflip fixes I had hoped to see
merged before 4.7rc1, plus two new fixes that have come in since then.

* tag 'drm-vc4-fixes-2016-06-06' of github.com:anholt/linux:
  drm/vc4: Make pageflip completion handling more robust.
  drm/vc4: Fix ioctl permissions for render nodes.
  drm/vc4: Return -EBUSY if there's already a pending flip event.
  drm/vc4: Fix drm_vblank_put/get imbalance in page flip path.
  drm/vc4: Fix get_vblank_counter with proper no-op for Linux 4.4+

8 years ago drm/omap: fix unused variable warning in dsi & hdmi
Tomi Valkeinen [Fri, 3 Jun 2016 11:27:03 +0000 (14:27 +0300)]
drm/omap: fix unused variable warning in dsi & hdmi

Signed-off-by: Tomi Valkeinen <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
8 years ago Merge branch 'linux-4.7' of git://github.com/skeggsb/linux into drm-fixes
Dave Airlie [Thu, 9 Jun 2016 02:30:29 +0000 (12:30 +1000)]
Merge branch 'linux-4.7' of git://github.com/skeggsb/linux into drm-fixes

Fixes for two issues reported by KASAN, a display engine hang due to
incorrect BIOS table parsing, and incorrect LTC interrupt handling on
Maxwell which could lead to a never-ending interrupt storm.

* 'linux-4.7' of git://github.com/skeggsb/linux:
  drm/nouveau/disp/sor/gm107: training pattern registers are like gm200
  drm/nouveau/disp/sor/gf119: both links use the same training register
  drm/nouveau/core: swap the order of imem/fb
  drm/nouveau/fbcon: fix out-of-bounds memory accesses
  drm/nouveau/gr/gf100-: update sm error decoding from gk20a nvgpu headers
  drm/nouveau/ltc/gm107-: fix typo in the address of NV_PLTCG_LTC0_LTS0_INTR
  drm/nouveau/bios/disp: fix handling of "match any protocol" entries

8 years ago drm/fsl-dcu: use flat regmap cache
Stefan Agner [Fri, 3 Jun 2016 21:21:34 +0000 (14:21 -0700)]
drm/fsl-dcu: use flat regmap cache

Using flat regmap cache instead of RB-tree to avoid the following
lockdep warning on driver load:
WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:2755 lockdep_trace_alloc+0x15c/0x160()
DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))

The RB-tree regmap cache needs to allocate new space on first
writes. However, allocations in an atomic context (e.g. when a
spinlock is held) are not allowed. The function regmap_write
calls map->lock, which acquires a spinlock in the fast_io case.
Since the FSL DCU driver uses MMIO, the regmap bus of type
regmap_mmio is being used which has fast_io set to true.

Use flat regmap cache and specify max register to be large
enough to cover all registers available in LS1021a and Vybrids
register space.
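
The difference between the two cache types can be sketched with a toy
Python model. It is not the regmap implementation: the classes, the
"atomic" flag that stands in for a held spinlock, the register stride and
the max_register value are all assumptions made for illustration.

```python
class AllocInAtomicContext(Exception):
    """Raised when the model allocates while the simulated lock is held."""


class LazyCache:
    """Allocates a slot on the first write to each register (tree-like)."""

    def __init__(self):
        self.slots = {}

    def write(self, reg: int, val: int, atomic: bool) -> None:
        if reg not in self.slots and atomic:
            raise AllocInAtomicContext(f"register {reg:#x}")
        self.slots[reg] = val


class FlatCache:
    """Preallocates one slot per register up to max_register."""

    def __init__(self, max_register: int, stride: int = 4):
        self.stride = stride
        self.slots = [0] * (max_register // stride + 1)

    def write(self, reg: int, val: int, atomic: bool) -> None:
        # Nothing is allocated here, so this is safe with the lock held.
        self.slots[reg // self.stride] = val


flat = FlatCache(max_register=0x100)
flat.write(0x10, 0xABCD, atomic=True)

lazy = LazyCache()
try:
    lazy.write(0x10, 0xABCD, atomic=True)
    lazy_failed = False
except AllocInAtomicContext:
    lazy_failed = True
print(lazy_failed)
```

The flat variant trades memory for predictability, which is why the max
register has to be known up front.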

Signed-off-by: Stefan Agner <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: [email protected]
8 years ago Merge branch 'vrf-fib-rule-improve'
David S. Miller [Wed, 8 Jun 2016 18:36:02 +0000 (11:36 -0700)]
Merge branch 'vrf-fib-rule-improve'

David Ahern says:

====================
net: vrf: Improve use of FIB rules

Currently, VRFs require 1 oif and 1 iif rule per address family per
VRF. As the number of VRF devices increases it brings scalability
issues with the increasing rule list. All of the VRF rules have the
same format with the exception of the specific table id to direct the
lookup. Since the table id is available from the oif or iif in the
lookup, the VRF rules can be consolidated to a single rule that pulls
the table from the VRF device.

This solution still allows a user to insert their own rules for VRFs,
including rules with additional attributes. Accordingly, it is backwards
compatible with existing setups and allows other policy routing as
desired.

Hopefully v5 is the charm; my e-waste can is getting full.
====================

Signed-off-by: David S. Miller <[email protected]>
8 years ago net: vrf: Add l3mdev rules on first device create
David Ahern [Wed, 8 Jun 2016 17:55:40 +0000 (10:55 -0700)]
net: vrf: Add l3mdev rules on first device create

Add l3mdev rule per address family when the first VRF device is
created. The rules are installed with a default preference of 1000.
Users can replace the default rule as desired.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago net: Add l3mdev rule
David Ahern [Wed, 8 Jun 2016 17:55:39 +0000 (10:55 -0700)]
net: Add l3mdev rule

Currently, VRFs require 1 oif and 1 iif rule per address family per
VRF. As the number of VRF devices increases it brings scalability
issues with the increasing rule list. All of the VRF rules have the
same format with the exception of the specific table id to direct the
lookup. Since the table id is available from the oif or iif in the
lookup, the VRF rules can be consolidated to a single rule that pulls
the table from the VRF device.

This patch introduces a new rule attribute l3mdev. The l3mdev rule
means the table id used for the lookup is pulled from the L3 master
device (e.g., VRF) rather than being statically defined. With the
l3mdev rule all of the basic VRF FIB rules are reduced to 1 l3mdev
rule per address family (IPv4 and IPv6).

If an admin wishes to insert higher priority rules for specific VRFs
those rules will co-exist with the l3mdev rule. This capability means
current VRF scripts will co-exist with this new simpler implementation.

Currently, the rules list for both ipv4 and ipv6 look like this:
    $ ip  ru ls
    1000:       from all oif vrf1 lookup 1001
    1000:       from all iif vrf1 lookup 1001
    1000:       from all oif vrf2 lookup 1002
    1000:       from all iif vrf2 lookup 1002
    1000:       from all oif vrf3 lookup 1003
    1000:       from all iif vrf3 lookup 1003
    1000:       from all oif vrf4 lookup 1004
    1000:       from all iif vrf4 lookup 1004
    1000:       from all oif vrf5 lookup 1005
    1000:       from all iif vrf5 lookup 1005
    1000:       from all oif vrf6 lookup 1006
    1000:       from all iif vrf6 lookup 1006
    1000:       from all oif vrf7 lookup 1007
    1000:       from all iif vrf7 lookup 1007
    1000:       from all oif vrf8 lookup 1008
    1000:       from all iif vrf8 lookup 1008
    ...
    32765:      from all lookup local
    32766:      from all lookup main
    32767:      from all lookup default

With the l3mdev rule the list is just the following regardless of the
number of VRFs:
    $ ip ru ls
    1000:       from all lookup [l3mdev table]
    32765:      from all lookup local
    32766:      from all lookup main
    32767:      from all lookup default

(Note: the above pretty print of the rule is based on an iproute2
       prototype. Actual verbiage may change)
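
The consolidation shown in the two listings above can be modelled in
Python. This is a sketch under assumptions, not the FIB rules code: the
Rule tuple, the VRF_TABLE mapping and lookup_table() are hypothetical,
and oif/iif rules are modelled together as a plain device match.

```python
from typing import Dict, List, Optional, Tuple

# Each rule is (priority, match_device, table). table=None marks the l3mdev
# rule, whose table is read from the device instead of stored in the rule.
Rule = Tuple[int, Optional[str], Optional[int]]

# Hypothetical VRF devices and table ids, mirroring the listing above.
VRF_TABLE: Dict[str, int] = {f"vrf{i}": 1000 + i for i in range(1, 9)}


def per_vrf_rules() -> List[Rule]:
    # Two rules per VRF: one stands for the oif rule, one for the iif rule.
    return [(1000, dev, table) for dev, table in VRF_TABLE.items() for _ in range(2)]


def l3mdev_rules() -> List[Rule]:
    return [(1000, None, None)]


def lookup_table(rules: List[Rule], dev: str) -> Optional[int]:
    for _prio, match_dev, table in sorted(rules, key=lambda r: r[0]):
        if table is None:
            # l3mdev rule: pull the table id from the L3 master device.
            if dev in VRF_TABLE:
                return VRF_TABLE[dev]
            continue
        if match_dev == dev:
            return table
    return None  # no VRF rule matched; later rules (local, main) would apply


print(len(per_vrf_rules()), len(l3mdev_rules()))
print(lookup_table(per_vrf_rules(), "vrf3"), lookup_table(l3mdev_rules(), "vrf3"))
```

Both rule sets resolve a VRF device to the same table; only the number of
rules that must be walked differs.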

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago Merge branch 'tipc-small-fixes'
David S. Miller [Wed, 8 Jun 2016 18:27:02 +0000 (11:27 -0700)]
Merge branch 'tipc-small-fixes'

Jon Maloy says:

====================
tipc: two small fixes

We fix a couple of rarely seen anomalies discovered during testing.
====================

Signed-off-by: David S. Miller <[email protected]>
8 years ago tipc: change node timer unit from jiffies to ms
Jon Paul Maloy [Wed, 8 Jun 2016 16:00:05 +0000 (12:00 -0400)]
tipc: change node timer unit from jiffies to ms

The node keepalive interval is recalculated at each timer expiration
to catch any changes in the link tolerance, and stored in a field in
struct tipc_node. We use jiffies as unit for the stored value.

This is suboptimal, because it makes the calculation unnecessarily
complex, including two unit conversions. The conversions also lead to
a rounding error that causes the link "abort limit" to be 3 in the
normal case, instead of 4, as intended. This again leads to unnecessary
link resets when the network is pushed close to its limit, e.g., in an
environment with hundreds of nodes or namespaces.

In this commit, we instead let the keepalive value be calculated and
stored in milliseconds, so that there is only one conversion and the
rounding error is eliminated.

We also remove a redundant "keepalive" field in struct tipc_link. This
is a remnant from the previous implementation.
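
How a round trip through jiffies can turn an intended abort limit of 4
into 3 can be shown with a short worked example. This is a simplified
model, not the TIPC code: it assumes a keepalive interval of one quarter
of the link tolerance, a tolerance of 1500 ms, and a ms-to-jiffies
conversion that rounds up to a whole tick. The helper names mirror kernel
helpers but are plain Python stand-ins.

```python
def msecs_to_jiffies(ms: int, hz: int) -> int:
    # Assumed to round up to a whole tick (ceiling division).
    return -(-ms * hz // 1000)


def jiffies_to_msecs(j: int, hz: int) -> int:
    return j * 1000 // hz


def abort_limit_via_jiffies(tolerance_ms: int, hz: int) -> int:
    # Old scheme: store the interval in jiffies, convert back when needed.
    keepalive_jiffies = msecs_to_jiffies(tolerance_ms // 4, hz)
    return tolerance_ms // jiffies_to_msecs(keepalive_jiffies, hz)


def abort_limit_via_ms(tolerance_ms: int) -> int:
    # New scheme: keep the interval in milliseconds, no round trip.
    keepalive_ms = tolerance_ms // 4
    return tolerance_ms // keepalive_ms


for hz in (100, 250, 1000):
    print(hz, abort_limit_via_jiffies(1500, hz), abort_limit_via_ms(1500))
```

With a tick longer than 1 ms the 375 ms interval is rounded up to the
next tick boundary, so the tolerance no longer divides into four full
intervals.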

Acked-by: Ying Xue <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago tipc: correct error in node fsm
Jon Paul Maloy [Wed, 8 Jun 2016 16:00:04 +0000 (12:00 -0400)]
tipc: correct error in node fsm

commit 88e8ac7000dc ("tipc: reduce transmission rate of reset messages
when link is down") revealed a flaw in the node FSM, as defined in
the log of commit 66996b6c47ed ("tipc: extend node FSM").

We see the following scenario:
1: Node B receives a RESET message from node A before its link endpoint
   is fully up, i.e., the node FSM is in state SELF_UP_PEER_COMING. This
   event will not change the node FSM state, but the (distinct) link FSM
   will move to state RESETTING.
2: As an effect of the previous event, the local endpoint on B will
   declare node A lost, and post the event SELF_DOWN to its node
   FSM. This moves the FSM state to SELF_DOWN_PEER_LEAVING, meaning
   that no messages will be accepted from A until it receives another
   RESET message that confirms that A's endpoint has been reset. This
   is wasteful, since we know this as a fact already from the first
   received RESET, but worse is that the link instance's FSM has not
   wasted this information, but instead moved on to state ESTABLISHING,
   meaning that it repeatedly sends out ACTIVATE messages to the reset
   peer A.
3: Node A will receive one of the ACTIVATE messages, move its link FSM
   to state ESTABLISHED, and start repeatedly sending out STATE messages
   to node B.
4: Node B will consistently drop these messages, since it can only
   accept a RESET according to its node FSM.
5: After four lost STATE messages node A will reset its link and start
   repeatedly sending out RESET messages to B.
6: Because of the reduced send rate for RESET messages, it is very
   likely that A will receive an ACTIVATE (which is sent out at a much
   higher frequency) before it gets the chance to send a RESET, and A
   may hence quickly move back to state ESTABLISHED and continue sending
   out STATE messages, which will again be dropped by B.
7: GOTO 5.
8: After having repeated the cycle 5-7 a number of times, node A will
   by chance get in between with sending a RESET, and the situation is
   resolved.

Unfortunately, we have seen that it may take a substantial amount of
time before this vicious loop is broken, sometimes in the order of
minutes.

We correct this by making a small correction to the node FSM: When a
node in state SELF_UP_PEER_COMING receives a SELF_DOWN event, it now
moves directly back to state SELF_DOWN_PEER_DOWN, instead of to
SELF_DOWN_PEER_LEAVING as it did until now. This is logically
consistent, since we don't need to wait for RESET confirmation from an
endpoint that we already know has been reset. It also means that node B
in the scenario above will not be dropping incoming STATE messages, and
the link can come up immediately.

Finally, a symmetry comparison reveals that the FSM has a similar
error when receiving the event PEER_DOWN in state PEER_UP_SELF_COMING.
Instead of moving to PEER_DOWN_SELF_LEAVING, it should move directly
to SELF_DOWN_PEER_DOWN. Although we have never seen any negative effect
of this logical error, we choose to fix this one, too.

The node FSM looks as follows after those changes:

                           +----------------------------------------+
                           |                           PEER_DOWN_EVT|
                           |                                        |
  +------------------------+----------------+                       |
  |SELF_DOWN_EVT           |                |                       |
  |                        |                |                       |
  |              +-----------+          +-----------+               |
  |              |NODE_      |          |NODE_      |               |
  |   +----------|FAILINGOVER|<---------|SYNCHING   |-----------+   |
  |   |SELF_     +-----------+ FAILOVER_+-----------+   PEER_   |   |
  |   |DOWN_EVT   |          A BEGIN_EVT  A         |   DOWN_EVT|   |
  |   |           |          |            |         |           |   |
  |   |           |          |            |         |           |   |
  |   |           |FAILOVER_ |FAILOVER_   |SYNCH_   |SYNCH_     |   |
  |   |           |END_EVT   |BEGIN_EVT   |BEGIN_EVT|END_EVT    |   |
  |   |           |          |            |         |           |   |
  |   |           |          |            |         |           |   |
  |   |           |         +--------------+        |           |   |
  |   |           +-------->|   SELF_UP_   |<-------+           |   |
  |   |   +-----------------|   PEER_UP    |----------------+   |   |
  |   |   |SELF_DOWN_EVT    +--------------+   PEER_DOWN_EVT|   |   |
  |   |   |                    A        A                   |   |   |
  |   |   |                    |        |                   |   |   |
  |   |   |         PEER_UP_EVT|        |SELF_UP_EVT        |   |   |
  |   |   |                    |        |                   |   |   |
  V   V   V                    |        |                   V   V   V
+------------+       +-----------+    +-----------+       +------------+
|SELF_DOWN_  |       |SELF_UP_   |    |PEER_UP_   |       |PEER_DOWN   |
|PEER_LEAVING|       |PEER_COMING|    |SELF_COMING|       |SELF_LEAVING|
+------------+       +-----------+    +-----------+       +------------+
       |               |       A        A       |                |
       |               |       |        |       |                |
       |       SELF_   |       |SELF_   |PEER_  |PEER_           |
       |       DOWN_EVT|       |UP_EVT  |UP_EVT |DOWN_EVT        |
       |               |       |        |       |                |
       |               |       |        |       |                |
       |               |    +--------------+    |                |
       |PEER_DOWN_EVT  +--->|  SELF_DOWN_  |<---+   SELF_DOWN_EVT|
       +------------------->|  PEER_DOWN   |<--------------------+
                            +--------------+
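
The core of the diagram above can be expressed as a transition table.
This is a sketch read off the diagram, not the TIPC source: the failover
and synch states are omitted for brevity, and treating unlisted events as
"stay in the current state" is an assumption of this model.

```python
# (state, event) -> next state, for the six up/down states in the diagram.
TRANSITIONS = {
    ("SELF_DOWN_PEER_DOWN", "SELF_UP_EVT"): "SELF_UP_PEER_COMING",
    ("SELF_DOWN_PEER_DOWN", "PEER_UP_EVT"): "PEER_UP_SELF_COMING",
    ("SELF_UP_PEER_COMING", "PEER_UP_EVT"): "SELF_UP_PEER_UP",
    # Corrected: previously went to SELF_DOWN_PEER_LEAVING.
    ("SELF_UP_PEER_COMING", "SELF_DOWN_EVT"): "SELF_DOWN_PEER_DOWN",
    ("PEER_UP_SELF_COMING", "SELF_UP_EVT"): "SELF_UP_PEER_UP",
    # Corrected: previously went to PEER_DOWN_SELF_LEAVING.
    ("PEER_UP_SELF_COMING", "PEER_DOWN_EVT"): "SELF_DOWN_PEER_DOWN",
    ("SELF_UP_PEER_UP", "SELF_DOWN_EVT"): "SELF_DOWN_PEER_LEAVING",
    ("SELF_UP_PEER_UP", "PEER_DOWN_EVT"): "PEER_DOWN_SELF_LEAVING",
    ("SELF_DOWN_PEER_LEAVING", "PEER_DOWN_EVT"): "SELF_DOWN_PEER_DOWN",
    ("PEER_DOWN_SELF_LEAVING", "SELF_DOWN_EVT"): "SELF_DOWN_PEER_DOWN",
}


def step(state: str, event: str) -> str:
    # Events without a listed transition leave the state unchanged
    # (an assumption of this sketch).
    return TRANSITIONS.get((state, event), state)


def run(state: str, events: list) -> str:
    for event in events:
        state = step(state, event)
    return state


# Node B in the scenario: comes up, gets SELF_DOWN before the peer is up,
# and can then re-establish without waiting for another RESET.
final = run(
    "SELF_DOWN_PEER_DOWN",
    ["SELF_UP_EVT", "SELF_DOWN_EVT", "SELF_UP_EVT", "PEER_UP_EVT"],
)
print(final)
```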

Acked-by: Ying Xue <[email protected]>
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years ago Merge branch 'dsa-misc-improvements'
David S. Miller [Wed, 8 Jun 2016 18:23:42 +0000 (11:23 -0700)]
Merge branch 'dsa-misc-improvements'

Florian Fainelli says:

====================
net: dsa: misc improvements

This patch series builds on top of Andrew's "New DSA bind, switches as devices"
patch set and does the following:

- add a few helper functions/goodies for net/dsa/dsa2.c to be as close as possible
  to net/dsa/dsa.c in terms of what drivers can expect, in particular the slave
  MDIO bus and the enabled_port_mask and phy_mii_mask

- fix the CPU port ethtool ops to work in a multiple tree setup since we can
  no longer assume a single tree is supported

- make the bcm_sf2 driver register its own MDIO bus, yet assign it to
  ds->slave_mii_bus for everything to work in net/dsa/slave.c wrt. PHY probing,
  this is a tad cleaner than what we have now

Changes in v2:

Most of the previous patches have been dropped to just keep the relevant ones
now.

Changes in v3:
- split the addition of the slave MII bus as a separate patch
- properly unwind all operations at the right place and right time (ethtool ops,
  slave MDIO bus)
- fixed a few typos here and there

Changes in v4:
- removed superfluous dst argument to dsa_cpu_port_ethtool_{setup,restore}
====================

Signed-off-by: David S. Miller <[email protected]>
8 years agonet: dsa: bcm_sf2: Register our slave MDIO bus
Florian Fainelli [Tue, 7 Jun 2016 23:32:43 +0000 (16:32 -0700)]
net: dsa: bcm_sf2: Register our slave MDIO bus

Register a slave MDIO bus which allows us to divert problematic
read/writes towards the conflicting pseudo-PHY address (30). No longer
rely on DSA's slave_mii_bus, but instead provide our own implementation
which offers more flexibility as to what to do, and when to register it.
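
The diversion described above can be sketched as follows. This is a
minimal Python model under stated assumptions, not the driver code: the
names make_slave_mdio_read, indirect_read and master_read are
hypothetical, and only the pseudo-PHY address (30) comes from the
message.

```python
PSEUDO_PHY_ADDR = 30  # address named in the commit message


def make_slave_mdio_read(indirect_read, master_read):
    """Build the read callback of a hypothetical slave MDIO bus.

    indirect_read and master_read are caller-supplied stand-ins for
    "access through the switch registers" and "access through the
    parent MDIO bus"; both take (addr, regnum).
    """
    def read(addr, regnum):
        if addr == PSEUDO_PHY_ADDR:
            # Conflicting address: divert away from the parent bus.
            return indirect_read(addr, regnum)
        # Any other address is forwarded unchanged.
        return master_read(addr, regnum)

    return read
```

Owning the callback is what lets the driver decide per address where an
access goes, which a bus provided by the DSA core would not allow.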

We need to register it by the time we are able to get access to our
memory mapped registers, which is not until drv->setup() time. In order
to avoid forward declarations, we need to re-order the function bodies a
bit.

Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Vivien Didelot <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: dsa: Initialize CPU port ethtool ops per tree
Florian Fainelli [Tue, 7 Jun 2016 23:32:42 +0000 (16:32 -0700)]
net: dsa: Initialize CPU port ethtool ops per tree

Now that we can properly support multiple distinct trees in the system,
using a global variable: dsa_cpu_port_ethtool_ops is getting clobbered
as soon as the second switch tree gets probed, and we don't want that.

We need to move this to be dynamically allocated, and since we can't
really be comparing addresses anymore to determine first time
initialization versus any other times, just move this to dsa.c and
dsa2.c where the remainder of the dst/ds initialization happens.

The operations teardown restores the master netdev's ethtool_ops to its
original ethtool_ops pointer (typically within the Ethernet driver)
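
A rough Python model of the per-tree setup and teardown described
above. The class and attribute names are hypothetical and the ops
table is modelled as a plain dict; the point is only that each tree
keeps its own copy and its own saved pointer instead of sharing one
global.

```python
class Master:
    """Stand-in for the master netdev; only ethtool_ops matters here."""

    def __init__(self, ethtool_ops):
        self.ethtool_ops = ethtool_ops


class SwitchTree:
    """Stand-in for one switch tree with its own ethtool_ops state."""

    def __init__(self, master):
        self.master = master
        self.master_orig_ethtool_ops = None
        self.cpu_port_ethtool_ops = None

    def cpu_port_ethtool_setup(self, dsa_overrides):
        # Save the driver's ops, then install a per-tree copy that has
        # the DSA-specific operations layered on top.
        self.master_orig_ethtool_ops = self.master.ethtool_ops
        ops = dict(self.master.ethtool_ops)
        ops.update(dsa_overrides)
        self.cpu_port_ethtool_ops = ops
        self.master.ethtool_ops = ops

    def cpu_port_ethtool_restore(self):
        # Teardown: hand the original pointer back to the master netdev.
        self.master.ethtool_ops = self.master_orig_ethtool_ops
```

Because nothing is shared between SwitchTree instances, probing a
second tree cannot clobber the ops installed for the first one.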

Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: dsa: Add initialization helper for CPU port ethtool_ops
Florian Fainelli [Tue, 7 Jun 2016 23:32:41 +0000 (16:32 -0700)]
net: dsa: Add initialization helper for CPU port ethtool_ops

Add a helper function: dsa_cpu_port_ethtool_init() which initializes a
custom ethtool_ops structure with custom DSA ethtool operations for CPU
ports. This is a preliminary change to move the initialization outside
of net/dsa/slave.c.

Reviewed-by: Vivien Didelot <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: dsa: Provide a slave MII bus if needed
Florian Fainelli [Tue, 7 Jun 2016 23:32:40 +0000 (16:32 -0700)]
net: dsa: Provide a slave MII bus if needed

Mimic what net/dsa/dsa.c does and provide a slave MII bus by default
which will be created if the driver implements a phy_read method.

Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Vivien Didelot <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: dsa: Initialize ds->enabled_port_mask and ds->phys_mii_mask
Florian Fainelli [Tue, 7 Jun 2016 23:32:39 +0000 (16:32 -0700)]
net: dsa: Initialize ds->enabled_port_mask and ds->phys_mii_mask

Some drivers rely on these two bitmasks to contain the correct values
for them to successfully probe and initialize at drv->setup() time,
calculate correct values to put in both masks as early as possible in
dsa_get_ports_dn().
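
As an illustration of the mask calculation, here is a small Python
sketch. It assumes that only user-facing ports are enabled and that
phys_mii_mask mirrors enabled_port_mask; the message does not spell
out either rule, and the port_masks helper and role strings are
hypothetical.

```python
def port_masks(ports):
    """Compute (enabled_port_mask, phys_mii_mask) from a port table.

    ports maps a port number to a role string: "user", "cpu" or "dsa".
    """
    enabled_port_mask = 0
    for number, role in ports.items():
        if role == "user":
            # One bit per user-facing port.
            enabled_port_mask |= 1 << number
    # Assumption: every enabled port has a PHY on the slave MII bus.
    phys_mii_mask = enabled_port_mask
    return enabled_port_mask, phys_mii_mask
```

Computing both values while the port nodes are parsed means they are
already in place when the driver's setup callback runs.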

Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: dsa: Provide unique DSA slave MII bus names
Florian Fainelli [Tue, 7 Jun 2016 23:32:38 +0000 (16:32 -0700)]
net: dsa: Provide unique DSA slave MII bus names

In case we have multiple trees and switches with the same index, we
need to add another discriminating id: the switch tree.
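
A one-line Python sketch of the naming idea. The exact format string
is an assumption made for illustration; what matters is that the tree
number is part of the identifier.

```python
def slave_mii_bus_id(tree, index):
    """Return a bus id that is unique per (tree, switch index) pair."""
    # Hypothetical format: the tree number disambiguates equal indexes.
    return f"dsa-{tree}.{index}"
```

Two switches with index 0 in trees 0 and 1 then get "dsa-0.0" and
"dsa-1.0" rather than colliding on the same name.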

Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Vivien Didelot <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: sched: fix missing doc annotations
Eric Dumazet [Wed, 8 Jun 2016 14:22:49 +0000 (07:22 -0700)]
net: sched: fix missing doc annotations

"make htmldocs" complains otherwise:

.//net/core/gen_stats.c:168: warning: No description found for parameter 'running'
.//include/linux/netdevice.h:1867: warning: No description found for parameter 'qdisc_running_key'

Fixes: f9eb8aea2a1e ("net_sched: transform qdisc running bit into a seqcount")
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agosfc: report supported link speeds on SFP connections
Bert Kenward [Mon, 6 Jun 2016 16:29:30 +0000 (17:29 +0100)]
sfc: report supported link speeds on SFP connections

7000-series SFC NICs connected with an SFP+ module currently fail to
report any supported link speeds.

Reported-by: Jarod Wilson <[email protected]>
Signed-off-by: Bert Kenward <[email protected]>
Reviewed-by: Jarod Wilson <[email protected]>
Tested-by: Jarod Wilson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet_sched: add missing paddattr description
Eric Dumazet [Wed, 8 Jun 2016 13:19:45 +0000 (06:19 -0700)]
net_sched: add missing paddattr description

"make htmldocs" complains otherwise:

.//net/core/gen_stats.c:65: warning: No description found for parameter 'padattr'
.//net/core/gen_stats.c:101: warning: No description found for parameter 'padattr'

Fixes: 9854518ea04d ("sched: align nlattr properly when needed")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoipv6: Skip XFRM lookup if dst_entry in socket cache is valid
Jakub Sitnicki [Wed, 8 Jun 2016 13:13:34 +0000 (15:13 +0200)]
ipv6: Skip XFRM lookup if dst_entry in socket cache is valid

At present we perform an xfrm_lookup() for each UDPv6 message we
send. The lookup involves querying the flow cache (flow_cache_lookup)
and, in case of a cache miss, creating an XFRM bundle.

If we miss the flow cache, we can end up creating a new bundle and
deriving the path MTU (xfrm_init_pmtu) from an already transformed
dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
to xfrm_lookup(). This can happen only if we're caching the dst_entry
in the socket, that is when we're using a connected UDP socket.

To put it another way, the path MTU shrinks each time we miss the flow
cache, which later on leads to incorrectly fragmented payload. It can
be observed with ESPv6 in transport mode:

  1) Set up a transformation and lower the MTU to trigger fragmentation
    # ip xfrm policy add dir out src ::1 dst ::1 \
      tmpl src ::1 dst ::1 proto esp spi 1
    # ip xfrm state add src ::1 dst ::1 \
      proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
    # ip link set dev lo mtu 1500

  2) Monitor the packet flow and set up a UDP sink
    # tcpdump -ni lo -ttt &
    # socat udp6-listen:12345,fork /dev/null &

  3) Send a datagram that needs fragmentation with a connected socket
    # perl -e 'print "@" x 1470' | socat - udp6:[::1]:12345
    2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
    00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
    00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
    00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
    (^ ICMPv6 Parameter Problem)
    00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136

  4) Compare it to a non-connected socket
    # perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
    00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
    00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)

What happens in step (3) is:

  1) when connecting the socket in __ip6_datagram_connect(), we
     perform an XFRM lookup, miss the flow cache, create an XFRM
     bundle, and cache the destination,

  2) afterwards, when sending the datagram, we perform an XFRM lookup,
     again, miss the flow cache (due to mismatch of flowi6_iif and
     flowi6_oif, which is an issue of its own), and recreate an XFRM
     bundle based on the cached (and already transformed) destination.

To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
altogether whenever we already have a destination entry cached in the
socket. This prevents the path MTU shrinkage and brings us on par with
UDPv4.
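
The shrinking MTU and the effect of skipping the lookup can be modelled
in a few lines of Python. This is a toy model, not kernel code: Dst,
Sock, connect and dst_for_send are hypothetical names, and OVERHEAD is
an arbitrary illustrative value rather than the real ESP overhead.

```python
OVERHEAD = 40  # illustrative per-transform overhead in bytes


class Dst:
    def __init__(self, mtu, transformed=False):
        self.mtu = mtu
        self.transformed = transformed


def xfrm_lookup(dst):
    """Model a flow-cache miss: build a bundle on top of dst."""
    return Dst(dst.mtu - OVERHEAD, transformed=True)


class Sock:
    def __init__(self):
        self.dst_cache = None


def connect(sk, route_mtu):
    # Connecting performs a lookup and caches the transformed dst.
    sk.dst_cache = xfrm_lookup(Dst(route_mtu))


def dst_for_send(sk, route_mtu, skip_lookup_if_cached):
    cached = sk.dst_cache
    if cached is not None and skip_lookup_if_cached:
        # Fixed behaviour: reuse the cached entry as it is.
        return cached
    # Old behaviour: transform again, even if cached was transformed.
    base = cached if cached is not None else Dst(route_mtu)
    return xfrm_lookup(base)
```

With a connected socket and a route MTU of 1500, the old path subtracts
the overhead twice while the fixed path subtracts it once; an
unconnected socket sees a single subtraction either way.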

The fix also benefits connected PINGv6 sockets, another user of
ip6_sk_dst_lookup_flow(), which also suffer from messages being transformed
twice.

Joint work with Hannes Frederic Sowa.

Reported-by: Jan Tluka <[email protected]>
Signed-off-by: Jakub Sitnicki <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: Reduce queue allocation to one in kdump kernel
Hariprasad Shenai [Wed, 8 Jun 2016 12:39:08 +0000 (18:09 +0530)]
net: Reduce queue allocation to one in kdump kernel

When in kdump kernel, reduce memory usage by only using a single Queue
Set for multiqueue devices. So make netif_get_num_default_rss_queues()
return one, when in kdump kernel.
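
In Python terms the change amounts to the following sketch. The cap of
8 for the normal case is an assumption about the existing default, and
the kdump state is passed in as a plain boolean for illustration.

```python
DEFAULT_MAX_NUM_RSS_QUEUES = 8  # assumed cap for the normal case


def get_num_default_rss_queues(num_online_cpus, in_kdump_kernel):
    """Model of the default RSS queue count with the kdump special case."""
    if in_kdump_kernel:
        # A kdump kernel runs with little memory: use one queue set.
        return 1
    return min(DEFAULT_MAX_NUM_RSS_QUEUES, num_online_cpus)
```

Drivers that size their queues from this helper shrink automatically in
the kdump kernel without any per-driver change.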

Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agol2tp: fix configuration passed to setup_udp_tunnel_sock()
Guillaume Nault [Wed, 8 Jun 2016 10:59:17 +0000 (12:59 +0200)]
l2tp: fix configuration passed to setup_udp_tunnel_sock()

Unused fields of udp_cfg must be all zeros. Otherwise
setup_udp_tunnel_sock() fills ->gro_receive and ->gro_complete
callbacks with garbage, eventually resulting in panic when used by
udp_gro_receive().

[   72.694123] BUG: unable to handle kernel paging request at ffff880033f87d78
[   72.695518] IP: [<ffff880033f87d78>] 0xffff880033f87d78
[   72.696530] PGD 26e2067 PUD 26e3067 PMD 342ed063 PTE 8000000033f87163
[   72.696530] Oops: 0011 [#1] SMP KASAN
[   72.696530] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pptp gre pppox ppp_generic slhc crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel evdev aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper serio_raw acpi_cpufreq button processor ext4 crc16 jbd2 mbcache virtio_blk virtio_net virtio_pci virtio_ring virtio
[   72.696530] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.7.0-rc1 #1
[   72.696530] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[   72.696530] task: ffff880035b59700 ti: ffff880035b70000 task.ti: ffff880035b70000
[   72.696530] RIP: 0010:[<ffff880033f87d78>]  [<ffff880033f87d78>] 0xffff880033f87d78
[   72.696530] RSP: 0018:ffff880035f87bc0  EFLAGS: 00010246
[   72.696530] RAX: ffffed000698f996 RBX: ffff88003326b840 RCX: ffffffff814cc823
[   72.696530] RDX: ffff88003326b840 RSI: ffff880033e48038 RDI: ffff880034c7c780
[   72.696530] RBP: ffff880035f87c18 R08: 000000000000a506 R09: 0000000000000000
[   72.696530] R10: ffff880035f87b38 R11: ffff880034b9344d R12: 00000000ebfea715
[   72.696530] R13: 0000000000000000 R14: ffff880034c7c780 R15: 0000000000000000
[   72.696530] FS:  0000000000000000(0000) GS:ffff880035f80000(0000) knlGS:0000000000000000
[   72.696530] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   72.696530] CR2: ffff880033f87d78 CR3: 0000000033c98000 CR4: 00000000000406a0
[   72.696530] Stack:
[   72.696530]  ffffffff814cc834 ffff880034b93468 0000001481416818 ffff88003326b874
[   72.696530]  ffff880034c7ccb0 ffff880033e48038 ffff88003326b840 ffff880034b93462
[   72.696530]  ffff88003326b88a ffff88003326b88c ffff880034b93468 ffff880035f87c70
[   72.696530] Call Trace:
[   72.696530]  <IRQ>
[   72.696530]  [<ffffffff814cc834>] ? udp_gro_receive+0x1c6/0x1f9
[   72.696530]  [<ffffffff814ccb1c>] udp4_gro_receive+0x2b5/0x310
[   72.696530]  [<ffffffff814d989b>] inet_gro_receive+0x4a3/0x4cd
[   72.696530]  [<ffffffff81431b32>] dev_gro_receive+0x584/0x7a3
[   72.696530]  [<ffffffff810adf7a>] ? __lock_is_held+0x29/0x64
[   72.696530]  [<ffffffff814321f7>] napi_gro_receive+0x124/0x21d
[   72.696530]  [<ffffffffa000b145>] virtnet_receive+0x8df/0x8f6 [virtio_net]
[   72.696530]  [<ffffffffa000b27e>] virtnet_poll+0x1d/0x8d [virtio_net]
[   72.696530]  [<ffffffff81431350>] net_rx_action+0x15b/0x3b9
[   72.696530]  [<ffffffff815893d6>] __do_softirq+0x216/0x546
[   72.696530]  [<ffffffff81062392>] irq_exit+0x49/0xb6
[   72.696530]  [<ffffffff81588e9a>] do_IRQ+0xe2/0xfa
[   72.696530]  [<ffffffff81587a49>] common_interrupt+0x89/0x89
[   72.696530]  <EOI>
[   72.696530]  [<ffffffff810b05df>] ? trace_hardirqs_on_caller+0x229/0x270
[   72.696530]  [<ffffffff8102b3c7>] ? default_idle+0x1c/0x2d
[   72.696530]  [<ffffffff8102b3c5>] ? default_idle+0x1a/0x2d
[   72.696530]  [<ffffffff8102bb8c>] arch_cpu_idle+0xa/0xc
[   72.696530]  [<ffffffff810a6c39>] default_idle_call+0x1a/0x1c
[   72.696530]  [<ffffffff810a6d96>] cpu_startup_entry+0x15b/0x20f
[   72.696530]  [<ffffffff81039a81>] start_secondary+0x12c/0x133
[   72.696530] Code: ff ff ff ff ff ff ff ff ff ff 7f ff ff ff ff ff ff ff 7f 00 7e f8 33 00 88 ff ff 6d 61 58 81 ff ff ff ff 5e de 0a 81 ff ff ff ff <00> 5c e2 34 00 88 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00
[   72.696530] RIP  [<ffff880033f87d78>] 0xffff880033f87d78
[   72.696530]  RSP <ffff880035f87bc0>
[   72.696530] CR2: ffff880033f87d78
[   72.696530] ---[ end trace ad7758b9a1dccf99 ]---
[   72.696530] Kernel panic - not syncing: Fatal exception in interrupt
[   72.696530] Kernel Offset: disabled
[   72.696530] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

v2: use empty initialiser instead of "{ NULL }" to avoid relying on
    first field's type.

Fixes: 38fd2af24fcf ("udp: Add socket based GRO and config")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoMerge branch 'qed-dcbnl'
David S. Miller [Wed, 8 Jun 2016 18:11:00 +0000 (11:11 -0700)]
Merge branch 'qed-dcbnl'

Sudarsana Reddy Kalluru says:

====================
qed/qede support for dcbnl.

This series adds the dcbnl functionality to the driver. Patch (1) adds
the qed infrastructure for querying/configuring the dcbx parameters.
Patch (2) adds the qed infrastructure for dcbnl APIs. And patch (3)
adds the qede support for dcbnl.
====================

Signed-off-by: David S. Miller <[email protected]>
8 years agoqede: Add dcbnl support.
Sudarsana Reddy Kalluru [Wed, 8 Jun 2016 10:22:12 +0000 (06:22 -0400)]
qede: Add dcbnl support.

This patch adds the interfaces for ieee/cee dcbnl callbacks and registers
them with the kernel.

Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoqed: Add dcbnl support.
Sudarsana Reddy Kalluru [Wed, 8 Jun 2016 10:22:11 +0000 (06:22 -0400)]
qed: Add dcbnl support.

This patch adds the implementation for both cee/ieee dcbnl callbacks by
using the qed query/config APIs.

Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoqed: Add support for query/config dcbx.
Sudarsana Reddy Kalluru [Wed, 8 Jun 2016 10:22:10 +0000 (06:22 -0400)]
qed: Add support for query/config dcbx.

Query API reads the dcbx data from the device shared memory and returns it
to the caller. The config API configures the user provided dcbx values on
the device, and initiates the dcbx negotiation with the peer.

Signed-off-by: Sudarsana Reddy Kalluru <[email protected]>
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agofsl/qe: Do not prefix header guard with CONFIG_
Andreas Ziegler [Wed, 8 Jun 2016 09:36:56 +0000 (11:36 +0200)]
fsl/qe: Do not prefix header guard with CONFIG_

The CONFIG_ prefix should only be used for options which
can be configured through Kconfig and not for guarding headers.

Signed-off-by: Andreas Ziegler <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agodrivers/net/fsl_ucc: Do not prefix header guard with CONFIG_
Andreas Ziegler [Wed, 8 Jun 2016 09:40:28 +0000 (11:40 +0200)]
drivers/net/fsl_ucc: Do not prefix header guard with CONFIG_

The CONFIG_ prefix should only be used for options which
can be configured through Kconfig and not for guarding headers.

Signed-off-by: Andreas Ziegler <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agocxgb4: Add device id of T540-BT adapter
Hariprasad Shenai [Wed, 8 Jun 2016 09:27:28 +0000 (14:57 +0530)]
cxgb4: Add device id of T540-BT adapter

Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoila: Perform only one translation in forwarding path
Tom Herbert [Tue, 7 Jun 2016 23:09:44 +0000 (16:09 -0700)]
ila: Perform only one translation in forwarding path

When setting up ILA in a router we noticed that the encapsulation
is invoked twice: once in the route input path and again upon route
output. To resolve this we add a flag set_csum_neutral for the
ila_update_ipv6_locator. If this flag is set and the checksum
neutral bit is also set we assume that checksum-neutral translation
has already been performed and take no further action. The
flag is set only in ila_output path. The flag is not set for ila_input and
ila_xlat.
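
The flag check can be illustrated with a short Python model. The bit
value, the update_locator name and the rule that a translation leaves
the checksum-neutral bit set are assumptions made for the sketch; only
the "flag set and bit set means do nothing" rule is taken from the
text.

```python
CSUM_NEUTRAL_BIT = 0x10  # hypothetical position of the neutral bit


def update_locator(ident_flags, set_csum_neutral):
    """Return (new_ident_flags, translated).

    translated is False when the call decides that checksum-neutral
    translation was already performed and takes no action.
    """
    if set_csum_neutral and (ident_flags & CSUM_NEUTRAL_BIT):
        # Output path sees an already translated address: skip.
        return ident_flags, False
    # Assumption: translating marks the identifier as checksum-neutral.
    return ident_flags | CSUM_NEUTRAL_BIT, True
```

In a router the input path (flag clear) translates once, and the
output path (flag set) then leaves the packet alone.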

Tested:

Used 3 netns to emulate a router and two hosts. The router
translates SIR addresses between the two destinations in the other two netns.
Verified ping and netperf are functional.

Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet-sysfs: fix missing <linux/of_net.h>
Ben Dooks [Tue, 7 Jun 2016 18:27:51 +0000 (19:27 +0100)]
net-sysfs: fix missing <linux/of_net.h>

The of_find_net_device_by_node() function is defined in
<linux/of_net.h> but not included in the .c file that
implements it. Fix the following warning by including the
header:

net/core/net-sysfs.c:1494:19: warning: symbol 'of_find_net_device_by_node' was not declared. Should it be static?

Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agotcp: accept RST if SEQ matches right edge of right-most SACK block
Pau Espin Pedrol [Tue, 7 Jun 2016 14:30:34 +0000 (16:30 +0200)]
tcp: accept RST if SEQ matches right edge of right-most SACK block

RFC 5961 advises to only accept RST packets containing a seq number
matching the next expected seq number instead of the whole receive
window in order to avoid spoofing attacks.

However, this situation is not optimal in the case SACK is in use at the
time the RST is sent. I recently ran into a scenario in which packet
losses were high while uploading data to a server, and userspace was
willing to frequently terminate connections by sending a RST. In
this case, the ACK sent on the receiver side (rcv_nxt) is frozen waiting
for a lost packet retransmission and SACK blocks are used to let the
client continue uploading data. At some point later on, the client sends
the RST (snd_nxt), which matches the next expected seq number of the
right-most SACK block on the receiver side which is going forward
receiving data.

In this scenario, as RFC 5961 defines, the RST SEQ doesn't match the
frozen main ACK at receiver side and thus gets dropped and a challenge
ACK is sent, which usually gets lost due to network conditions. The main
consequence is that the connection stays alive for a while even if it
made sense to accept the RST. This can get really bad if lots of
connections like this one are created in a few seconds, allocating all the
resources of the server easily.

For security reasons, not all SACK blocks are checked (there could be a
big amount of SACK blocks => acceptable SEQ numbers). Furthermore, it
wouldn't make sense to check for RST in blocks other than the right-most
received one because the sender is not expected to be sending new data
after the RST. For simplicity, only up to the 4 most recently updated
SACK blocks (selective_acks[4] field) are compared to find the
right-most block, as usually those are the ones with bigger probability
to contain it.
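
The acceptance rule described above can be sketched in Python as
follows. It is a model under assumptions, not the kernel function:
SACK blocks are given as (start_seq, end_seq) tuples, at most the
first four are inspected, and the helper names are chosen for the
example.

```python
def before(seq1, seq2):
    """True if seq1 precedes seq2 in wrapping 32-bit sequence space."""
    return ((seq1 - seq2) & 0xFFFFFFFF) >= 0x80000000


def after(seq1, seq2):
    return before(seq2, seq1)


def rst_seq_acceptable(rst_seq, rcv_nxt, sack_blocks):
    """Accept a RST that matches rcv_nxt or the right-most SACK edge."""
    if rst_seq == rcv_nxt:
        return True
    if not sack_blocks:
        return False
    # Find the right-most right edge among up to four SACK blocks.
    max_sack = sack_blocks[0][1]
    for _start, end in sack_blocks[1:4]:
        if after(end, max_sack):
            max_sack = end
    return rst_seq == max_sack
```

Only a single extra sequence number becomes acceptable, so the window
an off-path attacker has to hit grows from one value to at most two.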

This patch was tested in a 3.18 kernel and proved to improve the
situation in the scenario described above.

Signed-off-by: Pau Espin Pedrol <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Tested-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoqed: potential overflow in qed_cxt_src_t2_alloc()
Dan Carpenter [Tue, 7 Jun 2016 12:04:16 +0000 (15:04 +0300)]
qed: potential overflow in qed_cxt_src_t2_alloc()

In the current code "ent_per_page" could be more than "conn_num" making
"conn_num" negative after the subtraction.  In the next iteration
through the loop then the negative is treated as a very high positive
meaning we don't put a limit on "ent_num".  It could lead to memory
corruption.
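
A small Python model of the wrap-around, with u32 arithmetic emulated
by masking. The loop shape is reconstructed from the description and
is an assumption, as are the function names; it is meant only to show
why subtracting the clamped value removes the problem.

```python
U32_MASK = 0xFFFFFFFF


def entries_per_page_buggy(conn_num, ent_per_page, num_pages):
    """Subtract ent_per_page unconditionally, as unsigned 32-bit."""
    counts = []
    for _ in range(num_pages):
        ent_num = min(ent_per_page, conn_num)
        counts.append(ent_num)
        # Wraps to a huge value once ent_per_page exceeds conn_num.
        conn_num = (conn_num - ent_per_page) & U32_MASK
    return counts


def entries_per_page_fixed(conn_num, ent_per_page, num_pages):
    """Subtract only what was actually consumed on this page."""
    counts = []
    for _ in range(num_pages):
        ent_num = min(ent_per_page, conn_num)
        counts.append(ent_num)
        conn_num -= ent_num  # can never go below zero
    return counts
```

With 10 connections, 8 entries per page and 3 pages, the first variant
hands out 8 entries again on the last page although none are left; the
second hands out 0.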

Fixes: dbb799c39717 ('qed: Initialize hardware for new protocols')
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Yuval Mintz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agobridge: Don't insert unnecessary local fdb entry on changing mac address
Toshiaki Makita [Tue, 7 Jun 2016 10:14:17 +0000 (19:14 +0900)]
bridge: Don't insert unnecessary local fdb entry on changing mac address

The missing br_vlan_should_use() test caused creation of an unneeded
local fdb entry on changing mac address of a bridge device when there is
a vlan which is configured on a bridge port but not on the bridge
device.

Fixes: 2594e9064a57 ("bridge: vlan: add per-vlan struct and move to rhashtables")
Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoMerge branch 'vrf-local'
David S. Miller [Wed, 8 Jun 2016 07:25:38 +0000 (00:25 -0700)]
Merge branch 'vrf-local'

David Ahern says:

====================
net: vrf: Add support for local traffic to local addresses

Add support for locally originated traffic to VRF-local addresses,
be it addresses on enslaved devices or addresses on the VRF device:

$ ip addr show dev red
33: red: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc pfifo_fast state UP group default qlen 1000
    link/ether be:00:53:b5:e4:25 brd ff:ff:ff:ff:ff:ff
    inet 1.1.1.1/32 scope global red
       valid_lft forever preferred_lft forever
    inet6 1111:1::1/128 scope global
       valid_lft forever preferred_lft forever

$ ip addr show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
    link/ether 02:e0:f9:79:34:bd brd ff:ff:ff:ff:ff:ff
    inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 2100:1::1/120 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
       valid_lft forever preferred_lft forever

$ ping -c1 -I red 10.100.1.1
    ping: Warning: source address might be selected on device other than red.
    PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
    64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms

$ ping -c1 -I red 1.1.1.1
PING 1.1.1.1 (1.1.1.1) from 1.1.1.1 red: 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.136 ms

--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.136/0.136/0.136/0.000 ms

$ ping6 -c1 -I red  2100:1::1
ping6: Warning: source address might be selected on device other than red.
PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.167 ms

--- 2100:1::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.167/0.167/0.167/0.000 ms

$ ping6 -c1 -I red 1111::1
PING 1111::1(1111::1) from 1111:1::1 red: 56 data bytes
64 bytes from 1111::1: icmp_seq=1 ttl=64 time=0.187 ms

--- 1111::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.187/0.187/0.187/0.000 ms

This change also enables use of loopback address on the VRF device:
$ ip addr add dev red 127.0.0.1/8

$ ping -c1 -I red 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms
====================

Signed-off-by: David S. Miller <[email protected]>
8 years agonet: vrf: ipv6 support for local traffic to local addresses
David Ahern [Tue, 7 Jun 2016 03:50:40 +0000 (20:50 -0700)]
net: vrf: ipv6 support for local traffic to local addresses

Add support for locally originated traffic to VRF-local IPv6 addresses.
Similar to IPv4 a local dst is set on the skb and the packet is
reinserted with a call to netif_rx. With this patch, ping, tcp and udp
packets to a local IPv6 address are successfully routed:

    $ ip addr show dev eth1
    4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
        link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
        inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
           valid_lft forever preferred_lft forever
        inet6 2100:1::1/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
           valid_lft forever preferred_lft forever

    $ ping6 -c1 -I red 2100:1::1
    ping6: Warning: source address might be selected on device other than red.
    PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
    64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.098 ms

ip6_input is exported so the VRF driver can use it for the dst input
function. The dst_alloc function for IPv4 defaults to setting the input and
output functions; IPv6's does not. VRF does not need to duplicate the Rx path
so just export the ipv6 input function.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: vrf: ipv4 support for local traffic to local addresses
David Ahern [Tue, 7 Jun 2016 03:50:39 +0000 (20:50 -0700)]
net: vrf: ipv4 support for local traffic to local addresses

Add support for locally originated traffic to VRF-local addresses. If
destination device for an skb is the loopback or VRF device then set
its dst to a local version of the VRF cached dst_entry and call netif_rx
to insert the packet onto the rx queue - similar to what is done for
loopback. This patch handles IPv4 support; follow on patch handles IPv6.

With this patch, ping, tcp and udp packets to a local IPv4 address are
successfully routed:

    $ ip addr show dev eth1
    4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
        link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
        inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
           valid_lft forever preferred_lft forever
        inet6 2100:1::1/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
           valid_lft forever preferred_lft forever

    $ ping -c1 -I red 10.100.1.1
    ping: Warning: source address might be selected on device other than red.
    PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
    64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms

This patch also enables use of IPv4 loopback address on the VRF device:
    $ ip addr add dev red 127.0.0.1/8

    $ ping -c1 -I red 127.0.0.1
    PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: vrf: Minor refactoring for local address patches
David Ahern [Tue, 7 Jun 2016 03:50:38 +0000 (20:50 -0700)]
net: vrf: Minor refactoring for local address patches

Move the stripping of the ethernet header from is_ip_tx_frame into the
ipv4 and ipv6 outbound functions and collapse vrf_send_v4_prep into
vrf_process_v4_outbound.

Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agogue: Implement direction IP encapsulation
Tom Herbert [Mon, 6 Jun 2016 23:06:02 +0000 (16:06 -0700)]
gue: Implement direction IP encapsulation

This patch implements direct encapsulation of IPv4 and IPv6 packets
in UDP. This is done as version "1" of GUE and as explained in I-D
draft-ietf-nvo3-gue-03.

Changes here are only in the receive path, fou with IPxIPx already
supports the transmit side. Both the normal receive path and
GRO path are modified to check for GUE version and check for
IP version in the case that GUE version is "1".
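
The receive-side check can be sketched in Python. The sketch assumes
that the GUE version occupies the top two bits of the first payload
octet, so that the first nibble of an IPv4 (0100) or IPv6 (0110)
header reads as GUE version 1; the function name is hypothetical.

```python
IPPROTO_IPIP = 4   # IPv4 carried as payload
IPPROTO_IPV6 = 41  # IPv6 carried as payload


def gue_direct_encap_proto(payload):
    """Classify a UDP payload received on a GUE port.

    Returns None for GUE version 0 (a regular GUE header follows),
    the inner protocol number for version 1, and raises ValueError
    for anything else.
    """
    first = payload[0]
    gue_version = first >> 6
    if gue_version == 0:
        return None
    if gue_version == 1:
        ip_version = first >> 4
        if ip_version == 4:
            return IPPROTO_IPIP
        if ip_version == 6:
            return IPPROTO_IPV6
    raise ValueError("unsupported GUE or IP version")
```

The same classification has to be applied in both the normal receive
path and the GRO path so that the two agree on the inner protocol.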

Tested:

IPIP with direct GUE encap
  1 TCP_STREAM
    4530 Mbps
  200 TCP_RR
    1297625 tps
    135/232/444 90/95/99% latencies

IP4IP6 with direct GUE encap
  1 TCP_STREAM
    4903 Mbps
  200 TCP_RR
    1184481 tps
    149/253/473 90/95/99% latencies

IP6IP6 direct GUE encap
  1 TCP_STREAM
   5146 Mbps
  200 TCP_RR
    1202879 tps
    146/251/472 90/95/99% latencies

SIT with direct GUE encap
  1 TCP_STREAM
    6111 Mbps
  200 TCP_RR
    1250337 tps
    139/241/467 90/95/99% latencies

Signed-off-by: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Wed, 8 Jun 2016 03:41:36 +0000 (20:41 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
 "Fixes for crap of assorted ages: EOPENSTALE one is 4.2+, autofs one is
  4.6, d_walk - 3.2+.

  The atomic_open() and coredump ones are regressions from this window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  coredump: fix dumping through pipes
  fix a regression in atomic_open()
  fix d_walk()/non-delayed __d_free() race
  autofs braino fix for do_last()
  fix EOPENSTALE bug in do_last()

8 years agocoredump: fix dumping through pipes
Mateusz Guzik [Sun, 5 Jun 2016 21:14:14 +0000 (23:14 +0200)]
coredump: fix dumping through pipes

The offset in the core file used to be tracked with ->written field of
the coredump_params structure. The field was retired in favour of
file->f_pos.

However, ->f_pos is not maintained for pipes which leads to breakage.

Restore explicit tracking of the offset in coredump_params. Introduce
->pos field for this purpose since ->written was already reused.
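
Explicit offset tracking can be illustrated with a short Python model.
The class and function names echo the message but are hypothetical
here, and the sink is any object with a write() method, standing in
for a pipe that has no usable file position.

```python
class CoredumpParams:
    def __init__(self, sink):
        self.sink = sink  # anything with write(bytes), e.g. a pipe
        self.pos = 0      # offset tracked by the dumper itself


def dump_emit(cprm, data):
    """Write data and advance the tracked offset."""
    cprm.sink.write(data)
    cprm.pos += len(data)


def dump_align(cprm, align):
    """Pad with zero bytes until pos is a multiple of align."""
    pad = -cprm.pos % align
    if pad:
        dump_emit(cprm, b"\0" * pad)
```

Since alignment is computed from cprm.pos rather than from the file
position, it gives the same result for regular files and for pipes.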

Fixes: a00839395103 ("get rid of coredump_params->written").
Reported-by: Zbigniew Jędrzejewski-Szmek <[email protected]>
Signed-off-by: Mateusz Guzik <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Signed-off-by: Al Viro <[email protected]>
8 years agofix a regression in atomic_open()
Al Viro [Wed, 8 Jun 2016 01:53:51 +0000 (21:53 -0400)]
fix a regression in atomic_open()

open("/foo/no_such_file", O_RDONLY | O_CREAT) should fail with
EACCES when /foo is not writable; failing with ENOENT is obviously
wrong.  That got broken by a braino introduced when moving the
creat_error logics from atomic_open() to lookup_open().  Easy to
fix, fortunately.

Spotted-by: "Yan, Zheng" <[email protected]>
Tested-by: "Yan, Zheng" <[email protected]>
Signed-off-by: Al Viro <[email protected]>
8 years agofix d_walk()/non-delayed __d_free() race
Al Viro [Wed, 8 Jun 2016 01:26:55 +0000 (21:26 -0400)]
fix d_walk()/non-delayed __d_free() race

Ascend-to-parent logics in d_walk() depends on all encountered child
dentries not getting freed without an RCU delay.  Unfortunately, in
quite a few cases it is not true, with hard-to-hit oopsable race as
the result.

Fortunately, the fix is simple; right now the rule is "if it has ever
been hashed, freeing must be delayed" and changing it to "if it
ever had a parent, freeing must be delayed" closes that hole and
covers all cases the old rule used to cover.  Moreover, pipes and
sockets remain _not_ covered, so we do not introduce RCU delay in
the cases which are the reason for having that delay conditional
in the first place.

Cc: [email protected] # v3.2+ (and watch out for __d_materialise_dentry())
Signed-off-by: Al Viro <[email protected]>
8 years agocpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
Srinivas Pandruvada [Wed, 8 Jun 2016 00:38:53 +0000 (17:38 -0700)]
cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo

When turbo is disabled, the ->set_policy() interface is broken.

For example, when turbo is disabled and cpuinfo.max = 2900000 (full
max turbo frequency), setting the limits results in frequency less
than the requested one:
Set 1000000 KHz results in 0700000 KHz
Set 1500000 KHz results in 1100000 KHz
Set 2000000 KHz results in 1500000 KHz

This is because the limits->max_perf fraction is calculated using
the max turbo frequency as the reference, but when the max P-State is
capped in intel_pstate_get_min_max(), the reference is not the max
turbo P-State. This results in reducing max P-State.

One option is to always use max turbo as reference for calculating
limits. But this will not be correct. By definition, the intel_pstate
sysfs limits show the percentage of available performance. So when
BIOS has disabled turbo, the available performance is max non turbo.
So the max_perf_pct should still show 100%.
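
The effect of the mismatched reference can be sketched with plain arithmetic. This is only an illustration: the helper effective_khz() is hypothetical, the non-turbo maximum of 2000000 kHz used in the usage note is an assumed value that the changelog does not state, and the real driver works in P-state units with rounding, so the numbers will not match the table above exactly.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Frequency (kHz) obtained when a requested limit is first turned into
 * a fraction of reference_khz and that fraction is then applied to
 * capped_max_khz (the maximum that is really available).
 */
static uint64_t effective_khz(uint64_t requested_khz, uint64_t reference_khz,
			      uint64_t capped_max_khz)
{
	assert(reference_khz > 0);
	return requested_khz * capped_max_khz / reference_khz;
}
```

With the assumed values, effective_khz(1000000, 2900000, 2000000) yields 689655 kHz, well below the request, whereas using the capped maximum as the reference, effective_khz(1000000, 2000000, 2000000), returns the requested 1000000 kHz.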

Signed-off-by: Srinivas Pandruvada <[email protected]>
[ rjw : Subject & changelog, rewrite in fewer lines of code ]
Cc: All applicable <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
8 years agocpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()
Srinivas Pandruvada [Wed, 8 Jun 2016 00:38:52 +0000 (17:38 -0700)]
cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()

The limits->max_perf is rounded_up but immediately overwritten by
another assignment to limits->max_perf.

Move that operation to the correct location.

While here, also add a pr_debug() call in ->set_policy to aid in
debugging.

Fixes: 785ee2788141 (cpufreq: intel_pstate: Fix limits->max_perf rounding error)
Signed-off-by: Srinivas Pandruvada <[email protected]>
[ rjw : Subject & changelog ]
Cc: 4.4+ <[email protected]> # 4.4+
Signed-off-by: Rafael J. Wysocki <[email protected]>
8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
David S. Miller [Wed, 8 Jun 2016 00:14:10 +0000 (17:14 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains two Netfilter/IPVS fixes for your net
tree, they are:

1) Fix missing alignment in next offset calculation for standard
   targets, introduced in the previous merge window, patch from
   Florian Westphal.

2) Fix to correct the handling of outgoing connections which use the
   SIP-pe such that the binding of a real-server is updated when needed.
   This was an omission from changes introduced by Marco Angaroni in
   the previous merge window too, to allow handling of outgoing
   connections by the SIP-pe. Patch and report came via Simon Horman.
====================

Signed-off-by: David S. Miller <[email protected]>
8 years agotcp: record TLP and ER timer stats in v6 stats
Yuchung Cheng [Mon, 6 Jun 2016 22:07:18 +0000 (15:07 -0700)]
tcp: record TLP and ER timer stats in v6 stats

The v6 tcp stats scan does not provide TLP and ER timer information
correctly like the v4 version. This patch fixes that.

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Fixes: eed530b6c676 ("tcp: early retransmit")
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: sched: fix tc_should_offload for specific clsact classes
Daniel Borkmann [Mon, 6 Jun 2016 20:50:39 +0000 (22:50 +0200)]
net: sched: fix tc_should_offload for specific clsact classes

When offloading classifiers such as u32 or flower to hardware, and the
qdisc is clsact (TC_H_CLSACT), then we need to differentiate its classes,
since not all of them handle ingress, therefore we must leave those in
software path. Add a .tcf_cl_offload() callback, so we can generically
handle them, tested on ixgbe.

Fixes: 10cbc6843446 ("net/sched: cls_flower: Hardware offloaded filters statistics support")
Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support")
Fixes: a1b7c5fd7fe9 ("net: sched: add cls_u32 offload hooks for netdevs")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoact_police: fix a crash during removal
WANG Cong [Mon, 6 Jun 2016 16:54:30 +0000 (09:54 -0700)]
act_police: fix a crash during removal

The police action is using its own code to initialize tcf hash
info, which makes us forget to initialize a->hinfo correctly.
Fix this by calling the helper function tcf_hash_create() directly.

This patch fixed the following crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
 IP: [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
 PGD d3c34067 PUD d3e18067 PMD 0
 Oops: 0000 [#1] SMP
 CPU: 2 PID: 853 Comm: tc Not tainted 4.6.0+ #87
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 task: ffff8800d3e28040 ti: ffff8800d3f6c000 task.ti: ffff8800d3f6c000
 RIP: 0010:[<ffffffff810c099f>]  [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
 RSP: 0000:ffff88011b203c80  EFLAGS: 00010002
 RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000028
 RBP: ffff88011b203d40 R08: 0000000000000001 R09: 0000000000000000
 R10: ffff88011b203d58 R11: ffff88011b208000 R12: 0000000000000001
 R13: ffff8800d3e28040 R14: 0000000000000028 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000028 CR3: 00000000d4be1000 CR4: 00000000000006e0
 Stack:
  ffff8800d3e289c0 0000000000000046 000000001b203d60 ffffffff00000000
  0000000000000000 ffff880000000000 0000000000000000 ffffffff00000000
  ffffffff8187142c ffff88011b203ce8 ffff88011b203ce8 ffffffff8101dbfc
 Call Trace:
  <IRQ>
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
  [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
  [<ffffffff810a9604>] ? sched_clock_local+0x11/0x78
  [<ffffffff810bf6a1>] ? mark_lock+0x24/0x201
  [<ffffffff810c1dbd>] lock_acquire+0x120/0x1b4
  [<ffffffff810c1dbd>] ? lock_acquire+0x120/0x1b4
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff81aad89f>] _raw_spin_lock_bh+0x3c/0x72
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff8187142c>] __tcf_hash_release+0x77/0xd1
  [<ffffffff81871a27>] tcf_action_destroy+0x49/0x7c
  [<ffffffff81870b1c>] tcf_exts_destroy+0x20/0x2d
  [<ffffffff8189273b>] u32_destroy_key+0x1b/0x4d
  [<ffffffff81892788>] u32_delete_key_freepf_rcu+0x1b/0x1d
  [<ffffffff810de3b8>] rcu_process_callbacks+0x610/0x82e
  [<ffffffff8189276d>] ? u32_destroy_key+0x4d/0x4d
  [<ffffffff81ab0bc1>] __do_softirq+0x191/0x3f4

Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Cc: Jamal Hadi Salim <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoMerge branch 'net-sched-fast-stats'
David S. Miller [Tue, 7 Jun 2016 23:37:14 +0000 (16:37 -0700)]
Merge branch 'net-sched-fast-stats'

Eric Dumazet says:

====================
net: sched: faster stats gathering

A while back, I sent one RFC patch using lockless stats gathering
on 64bit arches.

This patch series does it more cleanly, using a seqcount.

Since qdisc/class stats are written at dequeue() time,
we can ask the dequeue to change the seqcount, so that
stats readers can avoid taking the root qdisc lock,
and instead use the typical read_seqcount_{begin|retry} guarded
loop.

This does not change fast path costs, as the seqcount
increments are not more expensive than the bit manipulation,
and allows readers to not freeze the fast path anymore.
====================
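
The reader/writer pattern described in the cover letter can be sketched in userspace with C11 atomics. This is an analogue for illustration only, not the kernel's seqcount_t API: struct stats_sketch, stats_init(), stats_update() and stats_read() are hypothetical names, and a single writer is assumed (as with the qdisc dequeue path).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stats block guarded by a sequence counter. */
struct stats_sketch {
	atomic_uint seq;         /* odd while the writer is mid-update */
	_Atomic uint64_t bytes;
	_Atomic uint64_t packets;
};

static void stats_init(struct stats_sketch *s)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->bytes, 0);
	atomic_init(&s->packets, 0);
}

/* Writer side: bump the counter before and after touching the stats. */
static void stats_update(struct stats_sketch *s, uint64_t len)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	assert((seq & 1u) == 0); /* single writer, no nesting */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&s->bytes,
		atomic_load_explicit(&s->bytes, memory_order_relaxed) + len,
		memory_order_relaxed);
	atomic_store_explicit(&s->packets,
		atomic_load_explicit(&s->packets, memory_order_relaxed) + 1,
		memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: takes no lock, retries until the snapshot is consistent. */
static void stats_read(struct stats_sketch *s, uint64_t *bytes,
		       uint64_t *packets)
{
	unsigned int start;

	do {
		do {
			start = atomic_load_explicit(&s->seq,
						     memory_order_acquire);
		} while (start & 1u); /* writer in progress, wait */

		*bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
		*packets = atomic_load_explicit(&s->packets,
						memory_order_relaxed);

		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&s->seq, memory_order_relaxed) != start);
}
```

The writer pays two counter stores per update, and a reader never blocks the writer; it only repeats its own loop if an update happened while it was copying the counters.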

Signed-off-by: David S. Miller <[email protected]>
8 years agonet: sched: do not acquire qdisc spinlock in qdisc/class stats dump
Eric Dumazet [Mon, 6 Jun 2016 16:37:16 +0000 (09:37 -0700)]
net: sched: do not acquire qdisc spinlock in qdisc/class stats dump

Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale :

For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure. Not only do the dumps take time, they also slow
down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.

An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
that might need the qdisc lock in fq_codel_dump_stats() and
fq_codel_dump_class_stats().

In v2 of this patch, I now use the Qdisc running seqcount to provide
consistent reads of packets/bytes counters, regardless of 32/64 bit arches.

I also changed rate estimators to use the same infrastructure
so that they no longer need to lock root qdisc lock.

[1]
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Kevin Athey <[email protected]>
Cc: Xiaotian Pei <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet_sched: transform qdisc running bit into a seqcount
Eric Dumazet [Mon, 6 Jun 2016 16:37:15 +0000 (09:37 -0700)]
net_sched: transform qdisc running bit into a seqcount

Instead of using a single bit (__QDISC___STATE_RUNNING)
in sch->__state, use a seqcount.

This adds lockdep support, but more importantly it will allow us
to sample qdisc/class statistics without having to grab qdisc root lock.

Signed-off-by: Eric Dumazet <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agofq_codel: return non zero qlen in class dumps
Eric Dumazet [Mon, 6 Jun 2016 16:12:39 +0000 (09:12 -0700)]
fq_codel: return non zero qlen in class dumps

We properly scan the flow list to count number of packets,
but John passed 0 to gnet_stats_copy_queue() so we report
a zero value to user space instead of the result.

Fixes: 640158536632 ("net: sched: restrict use of qstats qlen")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: John Fastabend <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoMerge branch 'u32-hwoffload-fixes'
David S. Miller [Tue, 7 Jun 2016 23:27:15 +0000 (16:27 -0700)]
Merge branch 'u32-hwoffload-fixes'

Jakub Kicinski says:

====================
cls_u32 hardware offload fixes

This set fixes two small issues with error codes I noticed
in cls_u32.  Second patch could be viewed as user space API
change but that portion of API is not part of any release,
yet.

Compile tested only.
====================

Signed-off-by: David S. Miller <[email protected]>
8 years agonet: cls_u32: be more strict about skip-sw flag
Jakub Kicinski [Mon, 6 Jun 2016 15:16:48 +0000 (16:16 +0100)]
net: cls_u32: be more strict about skip-sw flag

Return an error if user requested skip-sw and the underlying
hardware cannot handle tc offloads (or offloads are disabled).

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: cls_u32: fix error code for invalid flags
Jakub Kicinski [Mon, 6 Jun 2016 15:16:47 +0000 (16:16 +0100)]
net: cls_u32: fix error code for invalid flags

'err' variable is not set in this test, we would return whatever
previous test set 'err' to.
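
The bug class can be sketched as follows. This is a generic illustration, not the cls_u32 code: check_request_sketch() and the SKETCH_* flag bits are hypothetical. The point is that every failing test must assign its own error code before jumping to the common exit.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FLAG_SKIP_HW 0x1u /* hypothetical flag bits */
#define SKETCH_FLAG_SKIP_SW 0x2u
#define SKETCH_FLAGS_VALID  (SKETCH_FLAG_SKIP_HW | SKETCH_FLAG_SKIP_SW)

/* Returns 0 on success or a negative errno value on failure. */
static int check_request_sketch(const void *opt, uint32_t flags)
{
	int err = 0;

	if (opt == NULL) {
		err = -EINVAL;
		goto out;
	}

	if (flags & ~SKETCH_FLAGS_VALID) {
		/*
		 * Without this assignment the function would return
		 * whatever an earlier test left in err (here 0, i.e.
		 * success) even though the flags are invalid.
		 */
		err = -EINVAL;
		goto out;
	}
out:
	return err;
}
```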

Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Sridhar Samudrala <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agogtp: #define _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__
Colin Ian King [Mon, 6 Jun 2016 15:08:41 +0000 (16:08 +0100)]
gtp: #define _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__

Fix clang build warning:

./include/uapi/linux/gtp.h:1:9: warning: '_UAPI_LINUX_GTP_H_' is
used as a header guard here, followed by #define of a different
macro [-Wheader-guard]

Fix by defining _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__

Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 7 Jun 2016 23:24:44 +0000 (16:24 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "This finally removes the CLK_IS_ROOT flag by picking up the last few
  stragglers that didn't get merged by anyone this time around.

  Better to do it now than wait for another one to pop up.  There's also
  a minor maintainers update and a Kconfig fix"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: nxp: Select MFD_SYSCON for creg driver
  MAINTAINERS: Add file patterns for clock device tree bindings
  clk: Remove CLK_IS_ROOT flag
  clk: microchip: Remove CLK_IS_ROOT
  powerpc/512x: clk: Remove CLK_IS_ROOT
  vexpress/spc: Remove CLK_IS_ROOT

8 years agoMerge branch 'be2net-noncrit-fixes'
David S. Miller [Tue, 7 Jun 2016 23:18:20 +0000 (16:18 -0700)]
Merge branch 'be2net-noncrit-fixes'

Sathya Perla says:

====================
be2net: patch set

Hi David, the following patch set contains three non-critical fixes that
can go into the net-next tree.

Patch 1 fixes the logic for provisioning queue pairs on VFs to take into
account the limit on number of TXQs too as in some profiles the number
of TXQs is less than that of RXQs.

Patch 2 enables WoL support from shutdown on Skyhawk.

Patch 3 enhances the logic for provisioning queue pairs on VFs on
SR-IOV over multi-partition configs. Each PF (partition) on a port has to
compute the number of RSS tables its VFs can use.
====================

Signed-off-by: David S. Miller <[email protected]>
8 years agobe2net: Fix provisioning of RSS for VFs in multi-partition configurations
Somnath Kotur [Mon, 6 Jun 2016 11:22:10 +0000 (07:22 -0400)]
be2net: Fix provisioning of RSS for VFs in multi-partition configurations

Currently, we do not distribute queue resources to enable RSS for VFs
in multi-channel/partition configurations.
Fix this by having each PF (SRIOV capable) calculate its share of the
15 RSS Policy Tables available per port before provisioning resources for
all the VFs.
This proportional share calculation is done based on division of the
PF's MAX VFs with the Total MAX VFs on that port. It also needs to
learn about the number of NIC PFs on the port and subtract that from
the 15 RSS Policy Tables on the port.
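
One possible reading of this calculation, as a sketch only: pf_vf_rss_tables() is a hypothetical helper, not the driver routine, and it assumes that each NIC PF consumes one of the 15 tables and that the remainder is split in proportion to MAX VFs using integer division.

```c
#include <assert.h>

#define RSS_TABLES_PER_PORT 15u /* figure quoted in the changelog */

/*
 * nic_pfs:       NIC PFs on the port (assumed to use one table each)
 * pf_max_vfs:    MAX VFs of this PF
 * total_max_vfs: sum of MAX VFs over all PFs on the port
 * Returns the number of RSS policy tables this PF may give to its VFs.
 */
static unsigned int pf_vf_rss_tables(unsigned int nic_pfs,
				     unsigned int pf_max_vfs,
				     unsigned int total_max_vfs)
{
	assert(nic_pfs <= RSS_TABLES_PER_PORT);
	if (total_max_vfs == 0)
		return 0; /* no VFs on the port, nothing to share */
	assert(pf_max_vfs <= total_max_vfs);

	return (RSS_TABLES_PER_PORT - nic_pfs) * pf_max_vfs / total_max_vfs;
}
```

For example, with 2 NIC PFs on the port and a PF owning 32 of 64 total MAX VFs, the sketch gives (15 - 2) * 32 / 64 = 6 tables.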

Signed-off-by: Somnath Kotur <[email protected]>
Signed-off-by: Sathya Perla <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agobe2net: Enable Wake-On-LAN from shutdown for Skyhawk
Sriharsha Basavapatna [Mon, 6 Jun 2016 11:22:09 +0000 (07:22 -0400)]
be2net: Enable Wake-On-LAN from shutdown for Skyhawk

Skyhawk does support wake-up from ACPI shutdown state - S5, provided the
platform supports it (like Auxiliary power source etc). The changes listed
below are done to fix this.

1) There's no need to defer the HW configuration of WOL to be_suspend().
Remove this in be_suspend() and move it to be_set_wol() ethtool function
so it is configured directly in the context of ethtool. This automatically
takes care of the shutdown case.

2) The driver incorrectly uses WOL_CAP field in the FW response to
get_acpi_wol_cap() command, to determine if WOL is enabled. Instead the
driver must rely on the macaddr field in the response to infer WOL state.

3) In be_get_config() during init, if we find that WOL is enabled in FW,
call pci_enable_wake() to enable pmcsr.pme_en bit. This is needed to
support persistent WOL configuration provided by the FW in some platforms.

4) Remove code in be_set_wol() that writes to PCICFG_PM_CONTROL_OFFSET
to set pme_en bit; pci_enable_wake() sets that.

Fixes: 028991e49 ("Enabling Wake-on-LAN is not supported in S5 state")
Signed-off-by: Sriharsha Basavapatna <[email protected]>
Signed-off-by: Sathya Perla <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agobe2net: use max-TXQs limit too while provisioning VF queue pairs
Suresh Reddy [Mon, 6 Jun 2016 11:22:08 +0000 (07:22 -0400)]
be2net: use max-TXQs limit too while provisioning VF queue pairs

When the PF driver provisions resources for VFs, it currently only looks
at max RSS queues available to calculate the number of VF queue pairs.
This logic breaks when there are fewer TX-queues than RSS-queues.
This patch fixes this problem by using the max-TXQs available in the
PF-pool in the calculations. As a part of this change the
be_calculate_vf_qs() routine is renamed as be_calculate_vf_res() and the
code that calculates limits on other related resources is moved here to
contain all resource calculation code inside one routine.

Signed-off-by: Suresh Reddy <[email protected]>
Signed-off-by: Sathya Perla <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agonet: fec: fix spelling mistakes and add missing newline
Colin Ian King [Mon, 6 Jun 2016 08:21:44 +0000 (09:21 +0100)]
net: fec: fix spelling mistakes and add missing newline

trivial fix to spelling mistakes and add missing newline in pr_err
messages

Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Fugang Duan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agoMerge branch 'bnxt_en-fixes'
David S. Miller [Tue, 7 Jun 2016 23:02:04 +0000 (16:02 -0700)]
Merge branch 'bnxt_en-fixes'

Michael Chan says:

====================
bnxt_en: Bug fixes.

Fix a race condition and VLAN rx acceleration logic.
====================

Signed-off-by: David S. Miller <[email protected]>
8 years agobnxt_en: Simplify VLAN receive logic.
Michael Chan [Mon, 6 Jun 2016 06:37:16 +0000 (02:37 -0400)]
bnxt_en: Simplify VLAN receive logic.

Since both CTAG and STAG rx acceleration must be enabled together, we
only need to check one feature flag (NETIF_F_HW_VLAN_CTAG_RX) before
calling __vlan_hwaccel_put_tag().

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agobnxt_en: Enable and disable RX CTAG and RX STAG VLAN acceleration together.
Michael Chan [Mon, 6 Jun 2016 06:37:15 +0000 (02:37 -0400)]
bnxt_en: Enable and disable RX CTAG and RX STAG VLAN acceleration together.

The hardware can only be set to strip or not strip both the VLAN CTAG and
STAG.  It cannot strip one and not strip the other.  Add logic to
bnxt_fix_features() to toggle both feature flags when the user is toggling
one of them.
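
The toggling rule can be sketched in userspace as below. This is an illustration under assumptions, not the body of bnxt_fix_features(): the SK_VLAN_* bits are hypothetical stand-ins for the NETIF_F_HW_VLAN_CTAG_RX and NETIF_F_HW_VLAN_STAG_RX flags, and fix_vlan_rx_features() is an invented name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for the CTAG/STAG RX feature flags. */
#define SK_VLAN_CTAG_RX 0x1u
#define SK_VLAN_STAG_RX 0x2u
#define SK_VLAN_BOTH_RX (SK_VLAN_CTAG_RX | SK_VLAN_STAG_RX)

/*
 * current:   features enabled now (both VLAN RX bits set or both clear)
 * requested: features the user asked for
 * Returns requested with the two VLAN RX bits forced to the same state.
 */
static uint32_t fix_vlan_rx_features(uint32_t current, uint32_t requested)
{
	uint32_t cur = current & SK_VLAN_BOTH_RX;
	uint32_t req = requested & SK_VLAN_BOTH_RX;

	assert(cur == 0 || cur == SK_VLAN_BOTH_RX);

	if (req == 0 || req == SK_VLAN_BOTH_RX)
		return requested; /* already consistent */

	/* Exactly one bit differs from the current state: follow the toggle. */
	if (cur == SK_VLAN_BOTH_RX)
		return requested & ~SK_VLAN_BOTH_RX; /* one cleared: clear both */
	return requested | SK_VLAN_BOTH_RX;          /* one set: set both */
}
```

Unrelated feature bits pass through unchanged; only the two VLAN RX bits are coupled.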

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agobnxt_en: Fix tx push race condition.
Michael Chan [Mon, 6 Jun 2016 06:37:14 +0000 (02:37 -0400)]
bnxt_en: Fix tx push race condition.

Set the is_push flag in the software BD before the tx data is pushed to
the chip.  It is possible to get the tx interrupt as soon as the tx data
is pushed.  The tx handler will not handle the event properly if the
is_push flag is not set and it will crash.

Signed-off-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
8 years agodrivers/net: support hdlc function for QE-UCC
Zhao Qiang [Mon, 6 Jun 2016 06:30:02 +0000 (14:30 +0800)]
drivers/net: support hdlc function for QE-UCC

The driver adds HDLC support for Freescale QUICC Engine.
It supports NMSI and TSA modes.

Signed-off-by: Zhao Qiang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>