Previously there was no way of updating flow rule stats after they
have been offloaded to hardware. This is solved by keeping track of
stats received from hardware and providing this to the TC handler
on request.
Adds metadata describing the mask id of each flow and keeps track of
flows installed in hardware. Previously a flow could not be removed
from hardware as there was no way of knowing if that a specific flow
was installed. This is solved by storing the offloaded flows in a
hash table.
Extends matching capabilities for flower offloads to include vlan,
layer 2, layer 3 and layer 4 type matches. This includes both exact
and wildcard matching.
Extends the flower flow add function by calculating which match
fields are present in the flower offload structure and allocating
the appropriate space to describe these.
nfp: provide infrastructure for offloading flower based TC filters
Adds a flower based TC offload handler for representor devices, this
is in addition to the bpf based offload handler. The changes in this
patch will be used in a follow-up patch to add tc flower offload to
the NFP.
The flower app enables tc offloads on representors by default.
Simon Horman [Thu, 29 Jun 2017 20:08:12 +0000 (22:08 +0200)]
nfp: add phys_switch_id support
Add phys_switch_id support by allowing lookup of
SWITCHDEV_ATTR_ID_PORT_PARENT_ID via the nfp_repr_port_attr_get
switchdev operation.
This is visible to user-space in the phys_switch_id attribute
of a netdev.
e.g.
cd /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
find . -name phys_switch_id | xargs grep .
./net/eth3/phys_switch_id:00154d1300bd
./net/eth4/phys_switch_id:00154d1300bd
./net/eth2/phys_switch_id:00154d1300bd
grep: ./net/eth5/phys_switch_id: Operation not supported
In the above eth2 and eth3 and representor netdevs for the first and second
physical port. eth4 is the representor for the PF. And eth5 is the PF netdev.
Simon Horman [Thu, 29 Jun 2017 20:08:11 +0000 (22:08 +0200)]
net: switchdev: add SET_SWITCHDEV_OPS helper
Add a helper to allow switchdev ops to be set if NET_SWITCHDEV is configured
and do nothing otherwise. This allows for slightly cleaner code which
uses switchdev but does not select NET_SWITCHDEV.
David S. Miller [Sat, 1 Jul 2017 14:39:09 +0000 (07:39 -0700)]
Merge branch 'net-refcount_t'
Elena Reshetova says:
====================
v3 net generic subsystem refcount conversions
Changes in v3:
Rebased on top of the net-next tree.
Changes in v2:
No changes in patches apart from rebases, but now by
default refcount_t = atomic_t (*) and uses all atomic standard operations
unless CONFIG_REFCOUNT_FULL is enabled. This is a compromise for the
systems that are critical on performance (such as net) and cannot accept even
slight delay on the refcounter operations.
This series, for core network subsystem components, replaces atomic_t reference
counters with the new refcount_t type and API (see include/linux/refcount.h).
By doing this we prevent intentional or accidental
underflows or overflows that can led to use-after-free vulnerabilities.
These patches contain only generic net pieces. Other changes will be sent separately.
The patches are fully independent and can be cherry-picked separately.
The big patches, such as conversions for sock structure, need a very detailed
look from maintainers: refcount managing is quite complex in them and while
it seems that they would benefit from the change, extra checking is needed.
The biggest corner issue is the fact that refcount_inc() does not increment
from zero.
If there are no objections to the patches, please merge them via respective trees.
* The respective change is currently merged into -next as
"locking/refcount: Create unchecked atomic_t implementation".
====================
Reshetova, Elena [Fri, 30 Jun 2017 10:08:10 +0000 (13:08 +0300)]
net: convert packet_fanout.sk_ref from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:08:09 +0000 (13:08 +0300)]
net: convert netlbl_lsm_cache.refcount from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:08:08 +0000 (13:08 +0300)]
net: convert net.passive from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:08:07 +0000 (13:08 +0300)]
net: convert inet_frag_queue.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:08:06 +0000 (13:08 +0300)]
net: convert fib_rule.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:08:05 +0000 (13:08 +0300)]
net: convert unix_address.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:08:04 +0000 (13:08 +0300)]
net: convert netpoll_info.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:08:03 +0000 (13:08 +0300)]
net: convert in_device.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:08:02 +0000 (13:08 +0300)]
net: convert ip_mc_list.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:08:01 +0000 (13:08 +0300)]
net: convert sock.sk_refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
This patch uses refcount_inc_not_zero() instead of
atomic_inc_not_zero_hint() due to absense of a _hint()
version of refcount API. If the hint() version must
be used, we might need to revisit API.
Reshetova, Elena [Fri, 30 Jun 2017 10:08:00 +0000 (13:08 +0300)]
net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:07:59 +0000 (13:07 +0300)]
net: convert sk_buff_fclones.fclone_ref from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:07:58 +0000 (13:07 +0300)]
net: convert sk_buff.users from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:07:57 +0000 (13:07 +0300)]
net: convert nf_bridge_info.use from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:07:56 +0000 (13:07 +0300)]
net: convert neigh_params.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:07:55 +0000 (13:07 +0300)]
net: convert neighbour.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Reshetova, Elena [Fri, 30 Jun 2017 10:07:54 +0000 (13:07 +0300)]
net: convert inet_peer.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
This conversion requires overall +1 on the whole
refcounting scheme.
David S. Miller [Fri, 30 Jun 2017 17:11:43 +0000 (13:11 -0400)]
Merge branch 'PTP-support-for-macb-driver'
Rafal Ozieblo says:
====================
PTP support for macb driver
This patch series adds support for PTP synchronization protocol
in Cadence GEM driver based on PHC.
v2 changes:
* removed alarm's support
* removed external time stamp support
* removed PTP event interrupt handling
* removed ptp_hw_support flag
* removed all extra sanity checks
* removed unnecessary #ifdef
* fixed coding style and alligment issues
* renamed macb.c to macb_main.c
v3 changes:
* added checking NULL ptr from ptp_clock_register()
* fixed error codes return
* locals list in "upside down Christmas tree" style
* fixed some other issues from review
v4 changes:
* respin to the newest next-next (28 Jun 2017)
====================
Rafal Ozieblo [Thu, 29 Jun 2017 06:14:16 +0000 (07:14 +0100)]
net: macb: Add hardware PTP support
This patch is based on original Harini's patch and Andrei's patch,
implemented in a separate file to ease the review/maintanance
and integration with other platforms.
This driver supports GEM-GXL:
- Register ptp clock framework
- Initialize PTP related registers
- HW time stamp on the PTP Ethernet packets are received using the
SO_TIMESTAMPING API. Time stamps are obtained from the dma buffer
descriptors
- add macb_ptp to compilation chain
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree. This batch contains connection tracking updates for the cleanup
iteration path, patches from Florian Westphal:
X) Skip unconfirmed conntracks in nf_ct_iterate_cleanup_net(), just set
dying bit to let the CPU release them.
X) Add nf_ct_iterate_destroy() to be used on module removal, to kill
conntrack from all namespace.
X) Restart iteration on hashtable resizing, since both may occur at
the same time.
X) Use the new nf_ct_iterate_destroy() to remove conntrack with NAT
mapping on module removal.
X) Use nf_ct_iterate_destroy() to remove conntrack entries helper
module removal, from Liping Zhang.
X) Use nf_ct_iterate_cleanup_net() to remove the timeout extension
if user requests this, also from Liping.
X) Add net_ns_barrier() and use it from FTP helper, so make sure
no concurrent namespace removal happens at the same time while
the helper module is being removed.
X) Use NFPROTO_MAX in layer 3 conntrack protocol array, to reduce
module size. Same thing in nf_tables.
Updates for the nf_tables infrastructure:
X) Prepare usage of the extended ACK reporting infrastructure for
nf_tables.
X) Remove unnecessary forward declaration in nf_tables hash set.
X) Skip set size estimation if number of element is not specified.
X) Changes to accomodate a (faster) unresizable hash set implementation,
for anonymous sets and dynamic size fixed sets with no timeouts.
X) Faster lookup function for unresizable hash table for 2 and 4
bytes key.
And, finally, a bunch of asorted small updates and cleanups:
X) Do not hold reference to netdev from ipt_CLUSTER, instead subscribe
to device events and look up for index from the packet path, this
is fixing an issue that is present since the very beginning, patch
from Xin Long.
X) Use nf_register_net_hook() in ipt_CLUSTER, from Florian Westphal.
X) Use ebt_invalid_target() whenever possible in the ebtables tree,
from Gao Feng.
X) Calm down compilation warning in nf_dup infrastructure, patch from
stephen hemminger.
X) Statify functions in nftables rt expression, also from stephen.
X) Update Makefile to use canonical method to specify nf_tables-objs.
From Jike Song.
X) Use nf_conntrack_helpers_register() in amanda and H323.
X) Space cleanup for ctnetlink, from linzhang.
====================
Kalle Valo [Fri, 30 Jun 2017 10:48:19 +0000 (13:48 +0300)]
Merge tag 'iwlwifi-next-for-kalle-2017-06-30' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
Final set of patches for 4.13
* Some important fixes for 9000 HW;
* FW API changes for the upcoming -30 ucode release;
* A few new PCI IDs for 9000 series;
* Reorganization of common files;
* Some more fixes and improvements here and there
* Initialization and other important fixes for 9000 series;
* Support for version 30 of the FW API for 8000 and 9000 series;
Ganapathi Bhat [Wed, 28 Jun 2017 06:56:58 +0000 (12:26 +0530)]
mwifiex: do not update MCS set from hostapd
We should not copy the MCS set from hostapd RX-STBC. We
have to just use the MCS set supported by the hardware.
This fixes an issue, where mwifiex is advertising wrong
MCS sets in beacons.
Fixes: 474a41e94dfc ("mwifiex: update MCS set as per RX-STBC bit from hostapd") Signed-off-by: Ganapathi Bhat <[email protected]> Signed-off-by: Kalle Valo <[email protected]>
nl80211: Don't verify owner_nlportid on NAN commands
If NAN interface is created with NL80211_ATTR_SOCKET_OWNER, the socket
that is used to create the interface is used for all NAN operations and
reporting NAN events.
However, it turns out that sending commands and receiving events on
the same socket is not possible in a completely race-free way:
If the socket buffer is overflowed by the events, the command response
will not be sent. In that case the caller will block forever on recv.
Using non-blocking socket for commands is more complicated and still
the command response or ack may not be received.
So, keep unicasting NAN events to the interface creator, but allow
using a different socket for commands.
The driver used cfg80211_connect_result() which is basically a wrapper
around cfg80211_connect_done() passing a subset of the information that
can be passed. For upcoming functionality this is not sufficient so
switching to use cfg80211_connect_done().
brcmfmac: support 4-way handshake offloading for WPA/WPA2-PSK
The firmware may have supplicant code built-in. This is detected
by the driver and indicated in the wiphy features flags. User-space
can use this flag to determine whether or not to provide the
pre-shared key material in the nl80211 CONNECT command.
1) Need to access netdev->num_rx_queues behind an accessor in netvsc
driver otherwise the build breaks with some configs, from Arnd
Bergmann.
2) Add dummy xfrm_dev_event() so that build doesn't fail when
CONFIG_XFRM_OFFLOAD is not set. From Hangbin Liu.
3) Don't OOPS when pfkey_msg2xfrm_state() signals an erros, from Dan
Carpenter.
4) Fix MCDI command size for filter operations in sfc driver, from
Martin Habets.
5) Fix UFO segmenting so that we don't calculate incorrect checksums,
from Michal Kubecek.
6) When ipv6 datagram connects fail, reset destination address and
port. From Wei Wang.
7) TCP disconnect must reset the cached receive DST, from WANG Cong.
8) Fix sign extension bug on 32-bit in dev_get_stats(), from Eric
Dumazet.
9) fman driver has to depend on HAS_DMA, from Madalin Bucur.
10) Fix bpf pointer leak with xadd in verifier, from Daniel Borkmann.
11) Fix negative page counts with GFO, from Michal Kubecek.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
sfc: fix attempt to translate invalid filter ID
net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
bpf: prevent leaking pointer via xadd on unpriviledged
arcnet: com20020-pci: add missing pdev setup in netdev structure
arcnet: com20020-pci: fix dev_id calculation
arcnet: com20020: remove needless base_addr assignment
Trivial fix to spelling mistake in arc_printk message
arcnet: change irq handler to lock irqsave
rocker: move dereference before free
mlxsw: spectrum_router: Fix NULL pointer dereference
net: sched: Fix one possible panic when no destroy callback
virtio-net: serialize tx routine during reset
net: usb: asix88179_178a: Add support for the Belkin B2B128
fsl/fman: add dependency on HAS_DMA
net: prevent sign extension in dev_get_stats()
tcp: reset sk_rx_dst in tcp_disconnect()
net: ipv6: reset daddr and dport in sk if connect() fails
bnx2x: Don't log mc removal needlessly
bnxt_en: Fix netpoll handling.
bnxt_en: Add missing logic to handle TPA end error conditions.
...
Linus Torvalds [Thu, 29 Jun 2017 21:23:02 +0000 (14:23 -0700)]
Merge tag 'for-4.12/dm-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- dm thinp fix for crash that will occur when metadata device failure
races with discard passdown to the underlying data device.
- dm raid fix to not access the superblock's >= 1.9.0 'sectors' member
unconditionally.
* tag 'for-4.12/dm-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm thin: do not queue freed thin mapping for next stage processing
dm raid: fix oops on upgrading to extended superblock format
Linus Torvalds [Thu, 29 Jun 2017 21:10:37 +0000 (14:10 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Two fixes that should go into this release.
One is an nvme regression fix from Keith, fixing a missing queue
freeze if the controller is being reset. This causes the reset to
hang.
The other is a fix for a leak of the bio protection info, if smaller
sized O_DIRECT is used. This fix should be more involved as we have
other problematic paths in the kernel, but given as this isn't a
regression in this series, we'll tackle those for 4.13"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: provide bio_uninit() free freeing integrity/task associations
nvme/pci: Fix stuck nvme reset
Edward Cree [Thu, 29 Jun 2017 15:50:06 +0000 (16:50 +0100)]
sfc: fix attempt to translate invalid filter ID
When filter insertion fails with no rollback, we were trying to convert
EFX_EF10_FILTER_ID_INVALID to an id to store in 'ids' (which is either
vlan->uc or vlan->mc). This would WARN_ON_ONCE and then record a bogus
filter ID of 0x1fff, neither of which is a good thing.
Fixes: 0ccb998bf46d ("sfc: fix filter_id misinterpretation in edge case") Signed-off-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Inbar Karmy [Thu, 29 Jun 2017 11:07:57 +0000 (14:07 +0300)]
net/mlx4_en: Do not allocate redundant TX queues when TC is disabled
Currently the number of TX queues that are allocated doesn't depend
on the number of TCs, the module always loads with max num of UP
per channel.
In order to prevent the allocation of unnecessary memory, the
module will load with minimum number of UPs per channel, and the
user will be able to control the number of TX queues per channel
by changing the number of TC to 8 using the tc command.
The variable num_up will hold the information about the current
number of UPs.
Due to the change, needed to remove the lines that set the value of
UP to be different than zero in the func "mlx4_en_select_queue",
since now the num of TX queues that are allocated is only one per channel
in default.
In order not to force the UP to be zero in case of only one TC, added
a condition before forcing it in the func "mlx4_en_fill_qp_context".
Tested:
After the module is loaded with minimum number of UP per channel, to
increase num of TCs to 8, use:
tc qdisc add dev ens8 root mqprio num_tc 8
In order to decrease the number of TCs to minimum number of UP per channel,
use:
tc qdisc del dev ens8 root
Inbar Karmy [Thu, 29 Jun 2017 11:07:56 +0000 (14:07 +0300)]
net/mlx4_en: Add dynamic variable to hold the number of user priorities (UP)
Until this patch, the number of UPs was hard coded for eight.
Replace this with a variable in struct "mlx4_en_port_profile".
Currently, the variable will hold the maximum number of UP,
as before.
The patch creates an infrastructure to add an option for dynamic
change of the actual number of TCs.
Michal Kubeček [Thu, 29 Jun 2017 09:13:36 +0000 (11:13 +0200)]
net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
Recently I started seeing warnings about pages with refcount -1. The
problem was traced to packets being reused after their head was merged into
a GRO packet by skb_gro_receive(). While bisecting the issue pointed to
commit c21b48cc1bbf ("net: adjust skb->truesize in ___pskb_trim()") and
I have never seen it on a kernel with it reverted, I believe the real
problem appeared earlier when the option to merge head frag in GRO was
implemented.
Handling NAPI_GRO_FREE_STOLEN_HEAD state was only added to GRO_MERGED_FREE
branch of napi_skb_finish() so that if the driver uses napi_gro_frags()
and head is merged (which in my case happens after the skb_condense()
call added by the commit mentioned above), the skb is reused including the
head that has been merged. As a result, we release the page reference
twice and eventually end up with negative page refcount.
To fix the problem, handle NAPI_GRO_FREE_STOLEN_HEAD in napi_frags_finish()
the same way it's done in napi_skb_finish().
Fixes: d7e8883cfcf4 ("net: make GRO aware of skb->head_frag") Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Arvind Yadav [Thu, 29 Jun 2017 11:09:38 +0000 (16:39 +0530)]
net: bridge: constify attribute_group structures.
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
2645 896 0 3541 dd5 net/bridge/br_sysfs_br.o
File size After adding 'const':
text data bss dec hex filename
2701 832 0 3533 dcd net/bridge/br_sysfs_br.o
Arvind Yadav [Thu, 29 Jun 2017 11:01:26 +0000 (16:31 +0530)]
net: constify attribute_group structures.
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/device.h> work with const
attribute_group. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
9968 3168 16 13152 3360 net/core/net-sysfs.o
File size After adding 'const':
text data bss dec hex filename
10160 2976 16 13152 3360 net/core/net-sysfs.o
dev_pm_ops are not supposed to change at runtime. All functions
working with dev_pm_ops provided by <linux/device.h> work with const
dev_pm_ops. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
19057 392 0 19449 4bf9 drivers/net/ethernet/freescale/gianfar.o
File size After adding 'const':
text data bss dec hex filename
19249 192 0 19441 4bf1 drivers/net/ethernet/freescale/gianfar.o
Arvind Yadav [Thu, 29 Jun 2017 05:51:00 +0000 (11:21 +0530)]
net: smc91x: constify dev_pm_ops structures.
dev_pm_ops are not supposed to change at runtime. All functions
working with dev_pm_ops provided by <linux/device.h> work with const
dev_pm_ops. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
18709 401 0 19110 4aa6 drivers/net/ethernet/smsc/smc91x.o
File size After adding 'const':
text data bss dec hex filename
18901 201 0 19102 4a9e drivers/net/ethernet/smsc/smc91x.o
dev_pm_ops are not supposed to change at runtime. All functions
working with dev_pm_ops provided by <linux/device.h> work with const
dev_pm_ops. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
15426 1256 0 16682 412a drivers/net/ethernet/ibm/ibmveth.o
File size After adding 'const':
text data bss dec hex filename
15618 1064 0 16682 412a drivers/net/ethernet/ibm/ibmveth.o
Doing pointer arithmetic on them is also forbidden, so that they
don't turn into unknown value and then get leaked out. However,
there's xadd as a special case, where we don't check the src reg
for being a pointer register, e.g. the following will pass:
We could store the pointer into skb->cb, loose the type context,
and then read it out from there again to leak it eventually out
of a map value. Or more easily in a different variant, too:
Thomas Falcon [Thu, 29 Jun 2017 00:55:54 +0000 (19:55 -0500)]
ibmvnic: Fix assignment of RX/TX IRQ's
The driver currently creates RX/TX queues during device probe, but
assigns IRQ's to them during device open. On reset, however,
IRQ's are assigned when resetting the queues. If there is a reset
while the device is closed and the device is later opened, the driver will
request IRQ's twice, causing the open to fail. This patch assigns
the IRQ's in the ibmvnic_init function after the queues are reset or
initialized, ensuring IRQ's are only requested once.
Donald Sharp [Wed, 28 Jun 2017 17:58:57 +0000 (13:58 -0400)]
net: ipmr: Add ipmr_rtm_getroute
Add to RTNL_FAMILY_IPMR, RTM_GETROUTE the ability
to retrieve one S,G mroute from a specified table.
*,G will return mroute information for just that
particular mroute if it exists. This is because
it is entirely possible to have more S's then
can fit in one skb to return to the requesting
process.
Martin KaFai Lau [Wed, 28 Jun 2017 17:41:24 +0000 (10:41 -0700)]
bpf: Fix out-of-bound access on interpreters[]
The index is off-by-one when fp->aux->stack_depth
has already been rounded up to 32. In particular,
if stack_depth is 512, the index will be 16.
The fix is to round_up and then takes -1 instead of round_down.
[ 22.318680] ==================================================================
[ 22.319745] BUG: KASAN: global-out-of-bounds in bpf_prog_select_runtime+0x48a/0x670
[ 22.320737] Read of size 8 at addr ffffffff82aadae0 by task sockex3/1946
[ 22.321646]
[ 22.321858] CPU: 1 PID: 1946 Comm: sockex3 Tainted: G W 4.12.0-rc6-01680-g2ee87db3a287 #22
[ 22.323061] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
[ 22.324260] Call Trace:
[ 22.324612] dump_stack+0x67/0x99
[ 22.325081] print_address_description+0x1e8/0x290
[ 22.325734] ? bpf_prog_select_runtime+0x48a/0x670
[ 22.326360] kasan_report+0x265/0x350
[ 22.326860] __asan_report_load8_noabort+0x19/0x20
[ 22.327484] bpf_prog_select_runtime+0x48a/0x670
[ 22.328109] bpf_prog_load+0x626/0xd40
[ 22.328637] ? __bpf_prog_charge+0xc0/0xc0
[ 22.329222] ? check_nnp_nosuid.isra.61+0x100/0x100
[ 22.329890] ? __might_fault+0xf6/0x1b0
[ 22.330446] ? lock_acquire+0x360/0x360
[ 22.331013] SyS_bpf+0x67c/0x24d0
[ 22.331491] ? trace_hardirqs_on+0xd/0x10
[ 22.332049] ? __getnstimeofday64+0xaf/0x1c0
[ 22.332635] ? bpf_prog_get+0x20/0x20
[ 22.333135] ? __audit_syscall_entry+0x300/0x600
[ 22.333770] ? syscall_trace_enter+0x540/0xdd0
[ 22.334339] ? exit_to_usermode_loop+0xe0/0xe0
[ 22.334950] ? do_syscall_64+0x48/0x410
[ 22.335446] ? bpf_prog_get+0x20/0x20
[ 22.335954] do_syscall_64+0x181/0x410
[ 22.336454] entry_SYSCALL64_slow_path+0x25/0x25
[ 22.337121] RIP: 0033:0x7f263fe81f19
[ 22.337618] RSP: 002b:00007ffd9a3440c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ 22.338619] RAX: ffffffffffffffda RBX: 0000000000aac5fb RCX: 00007f263fe81f19
[ 22.339600] RDX: 0000000000000030 RSI: 00007ffd9a3440d0 RDI: 0000000000000005
[ 22.340470] RBP: 0000000000a9a1e0 R08: 0000000000a9a1e0 R09: 0000009d00000001
[ 22.341430] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000010000
[ 22.342411] R13: 0000000000a9a023 R14: 0000000000000001 R15: 0000000000000003
[ 22.343369]
[ 22.343593] The buggy address belongs to the variable:
[ 22.344241] interpreters+0x80/0x980
[ 22.344708]
[ 22.344908] Memory state around the buggy address:
[ 22.345556] ffffffff82aad980: 00 00 00 04 fa fa fa fa 04 fa fa fa fa fa fa fa
[ 22.346449] ffffffff82aada00: 00 00 00 00 00 fa fa fa fa fa fa fa 00 00 00 00
[ 22.347361] >ffffffff82aada80: 00 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa
[ 22.348301] ^
[ 22.349142] ffffffff82aadb00: 00 01 fa fa fa fa fa fa 00 00 00 00 00 00 00 00
[ 22.350058] ffffffff82aadb80: 00 00 07 fa fa fa fa fa 00 00 05 fa fa fa fa fa
[ 22.350984] ==================================================================
Fixes: b870aa901f4b ("bpf: use different interpreter depending on required stack size") Signed-off-by: Martin KaFai Lau <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Thu, 29 Jun 2017 19:26:14 +0000 (15:26 -0400)]
Merge branch 'arcnet-features'
Michael Grzeschik says:
====================
arcnet: Collection of latest features
Here we sum up the latest features to improve the arcnet framework. One
patch is used to get feedback from the transfer queue about failed xfers
by adding the err_skb message queue. Beside that we improve the
backplane status that can be read by the PCI-based cards and offer that
status via an extra sysfs attribute. In the last patch we add another
card type PCIFB2.
====================
arcnet: add err_skb package for package status feedback
We need to track the status of our queued packages. This way the driving
process knows if failed packages need to be retransmitted. For this
purpose we queue the transferred/failed packages back into the err_skb
message queue added with some status information.
David S. Miller [Thu, 29 Jun 2017 19:18:38 +0000 (15:18 -0400)]
Merge branch 'arcnet-fixes'
Michael Grzeschik says:
====================
arcnet: Collection of latest fixes
Here we sum up the recent fixes I collected on the way to use and
stabilise the framework. Part of it is an possible deadlock that we
prevent as well to fix the calculation of the dev_id that can be setup
by an rotary encoder. Beside that we added an trivial spelling patch and
fix some wrong and missing assignments that improves the code footprint.
====================
Signed-off-by: Michael Grzeschik <[email protected]>
---
v1 -> v2: removed unneeded zero assignment of flags Signed-off-by: David S. Miller <[email protected]>
The following updates and fixes are included in this driver update series:
- Simplify mailbox interface code
- Fix SFP supported and advertising settings
- Fix PTP initialization register usage
- Insure there is timestamp skb present before using it
- Add a timeout to timestamp register updates
- Handle return code from software reset function
- Some fixes for handling 2.5Gbps rates
- Limit I2C error messages
- Fix non-DMA interrupt handling through tasklet usage
- Add NUMA affinity support for memory allocations
- Add NUMA affinity support for interrupts
- Prepare for more fine-grained cache coherency controls
- Simplify setting the DMA burst length programming
- Performance improvements
This patch series is based on net-next.
====================
Lendacky, Thomas [Wed, 28 Jun 2017 18:43:18 +0000 (13:43 -0500)]
amd-xgbe: Simplify the burst length settings
Currently the driver hardcodes the PBLx8 setting. Remove the need for
specifying the PBLx8 setting and automatically calculate based on the
specified PBL value. Since the PBLx8 setting applies to both Tx and Rx
use the same PBL value for both of them.
Also, the driver currently uses a bit field to set the AXI master burst
len setting. Change to the full bit field range and set the burst length
based on the specified value.
Lendacky, Thomas [Wed, 28 Jun 2017 18:42:51 +0000 (13:42 -0500)]
amd-xgbe: Add NUMA affinity support for memory allocations
Add support to perform memory allocations on the node of the device. The
original allocation or the ring structure and Tx/Rx queues allocated all
of the memory at once and then carved it up for each channel and queue.
To best ensure that we get as much memory from the NUMA node as we can,
break the channel and ring allocations into individual allocations.
Lendacky, Thomas [Wed, 28 Jun 2017 18:42:42 +0000 (13:42 -0500)]
amd-xgbe: Re-issue interrupt if interrupt status not cleared
Some of the device interrupts should function as level interrupts. For
some hardware configurations this requires setting some control bits
so that if the interrupt status has not been cleared the interrupt
should be reissued.
Additionally, when using MSI or MSI-X interrupts, run the interrupt
service routine as a tasklet so that the re-issuance of the interrupt
is handled properly.
Lendacky, Thomas [Wed, 28 Jun 2017 18:42:35 +0000 (13:42 -0500)]
amd-xgbe: Limit the I2C error messages that are output
When I2C communication fails, it tends to always fail. Rather than
continuously issue an error message (once per second in most cases),
change the message to be issued just once.
Lendacky, Thomas [Wed, 28 Jun 2017 18:42:16 +0000 (13:42 -0500)]
amd-xgbe: Handle return code from software reset function
Currently the function that performs a software reset of the hardware
provides a return code. During driver probe check this return code and
exit with an error if the software reset fails.
Lendacky, Thomas [Wed, 28 Jun 2017 18:42:07 +0000 (13:42 -0500)]
amd-xgbe: Prevent looping forever if timestamp update fails
Just to be on the safe side, should the update of the timestamp registers
not complete, issue a warning rather than looping forever waiting for the
update to complete.
Lendacky, Thomas [Wed, 28 Jun 2017 18:41:58 +0000 (13:41 -0500)]
amd-xgbe: Add a check for an skb in the timestamp path
Spurious Tx timestamp interrupts can cause an oops in the Tx timestamp
processing function if a Tx timestamp skb is NULL. Add a check to insure
a Tx timestamp skb is present before attempting to use it.
Lendacky, Thomas [Wed, 28 Jun 2017 18:41:49 +0000 (13:41 -0500)]
amd-xgbe: Use the proper register during PTP initialization
During PTP initialization, the Timestamp Control register should be
cleared and not the Tx Configuration register. While this typo causes
the wrong register to be cleared, the default value of each register and
and the fact that the Tx Configuration register is programmed afterwards
doesn't result in a bug, hence only fixing in net-next.
When using SFPs, the supported and advertised settings should be initially
based on the SFP that has been detected. The code currently indicates the
overall support of the device as opposed to what the SFP is capable of.
Update the code to change the supported link modes, auto-negotiation, etc.
to be based on the installed SFP.
Simplify and centralize the mailbox command rate change interface by
having a single function perform the writes to the mailbox registers
to issue the request.
Dan Carpenter [Wed, 28 Jun 2017 11:44:21 +0000 (14:44 +0300)]
rocker: move dereference before free
My static checker complains that ofdpa_neigh_del() can sometimes free
"found". It just makes sense to use it first before deleting it.
Fixes: ecf244f753e0 ("rocker: fix maybe-uninitialized warning") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
The newest devices need a longer time to reset because of
their more complex hardware. Wait 5ms after device reset.
Consolidate all the places that reset the device in the
PCIe transport to avoid future bugs.
While at it, unify the flow to use set_bit instead of full
write as requested by the hardware designers.
iwlwifi: pcie: propagate iwl_pcie_apm_init's status
iwl_pcie_apm_init can fail so make sure that the caller
takes the status into account.
Also, ensure that the error that iwl_pcie_apm_init can emit
will appear in the kernel log by default.
When a station that's not associated sends a data frame (e.g. an NDP)
hostapd will respond with a disassoc frame, telling it that it's not
associated. The station might also not be authenticated, in which case
there will not be a station entry for it, and as a result we need to
accept such frames without a station.
Fixes: 3ee0f0e23e4f ("iwlwifi: mvm: fix DQA AP mode station assumption") Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Luca Coelho <[email protected]>
Johannes Berg [Thu, 22 Jun 2017 11:06:21 +0000 (13:06 +0200)]
iwlwifi: mvm: remove DQA non-STA client mode special case
When we get a non-STA frame to transmit in client mode, we try to use
the IWL_MVM_DQA_BSS_CLIENT_QUEUE queue (queue #4). However, at this
point, the queue might not be allocated at all, causing warnings. The
scenario on which this happened was a race condition between mac80211
and our queue allocation work:
* mac80211 sends auth
* we stop mac80211 queues to allocate a hw queue
* authentication is aborted
* we allocate HW queue and start mac80211 queues
* mac80211 removes station
* mac80211 hands us the auth frame from the pending queue
At this point, since mac80211 has already removed the station, we try
to transmit the frame through this special non-station case on queue
4 anyway.
In order to really use it properly, we'd have to again go through the
hw queue allocation work, and attach it to a station, etc. In this
case that isn't possible (there's no station anymore), but if this
special case were needed, then we'd have to do it this way.
However, the special case is documented to exist for TDLS, but can't
trigger there because the TDLS setup frames etc. are normal to-DS
frames going to the peer through the AP. Testing also confirms that
this code path isn't triggered in TDLS.
Therefore, remove the code path to avoid using an unused queue. The
erroneous frame described above will still be transmitted on the AUX
queue, but arguably that's a mac80211 problem, which will eventually
be fixed by moving everything there to TXQs.
Fixes: e3118ad74d7e ("iwlwifi: mvm: support tdls in dqa mode") Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Luca Coelho <[email protected]>
Johannes Berg [Tue, 20 Jun 2017 10:51:30 +0000 (12:51 +0200)]
iwlwifi: pcie: reconfigure MSI-X HW on resume
When going into suspend, the HW configuration for MSI-X will
likely be lost. As a consequence, after waking up, all IRQ
causes will be mapped to interrupt 0, and as a consequence we
don't notice the interrupt because in most cases this is an
interrupt for a queue, and getting it doesn't read the other
cause registers.
Fixes: 2e5d4a8f61dc ("iwlwifi: pcie: Add new configuration to enable MSIX") Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Luca Coelho <[email protected]>
iwlwifi: mvm: don't send fetch the TID from a non-QoS packet in TSO
Getting the TID of a packet before we know it is a QoS data
packet isn't a good idea. Delay the TID retrieval until
we know the packet is a QoS data packet.
Johannes Berg [Mon, 19 Jun 2017 21:26:55 +0000 (23:26 +0200)]
iwlwifi: mvm: fix mac80211's hw_queue in DQA mode
When in non-DQA mode, mac80211 actually gets a pretty much perfect
idea (in vif->hw_queue/cab_queue) of which queues we're using. But
in DQA mode, this isn't true - nonetheless, we were adding all the
queues, even the ones stations are using, to the queue allocation
bitmap.
Fix this, we should only add the queues we really are using in DQA
mode:
* IWL_MVM_OFFCHANNEL_QUEUE, as we use this in both modes
* mvm->aux_queue, as we use this in both modes - mac80211
never really knows about it but we use it as a cookie
internally, so can't reuse it
* possibly the GCAST queue (cab_queue)
* all the "queues" we told mac80211 about we were using on each
interface (vif->hw_queue), these are entirely virtual in this
mode
Also add back the failure now when we can't allocate any more of
these - now virtual - queues; this was skipped in DQA mode and
would lead to having multiple ACs or even interfaces use the same
queue number in mac80211 (10, since that's the limit), which would
stop far too many queues if stopped.
Johannes Berg [Mon, 19 Jun 2017 20:48:32 +0000 (22:48 +0200)]
iwlwifi: mvm: map cab_queue to real one earlier
There may be a difference between the mac80211 vif->cab_queue and
mvmvif->cab_queue, particularly with TVQM. Make the code map this
earlier, instead of first returning the mac80211 one again from
iwl_mvm_get_ctrl_vif_queue().
Johannes Berg [Mon, 19 Jun 2017 20:31:56 +0000 (22:31 +0200)]
iwlwifi: mvm: fix mac80211 queue tracking
In the driver, we track which hardware queue is associated with
which mac80211 "hw_queue", in order to be able to stop and wake
it. When moving these bitmaps out of the queue_info structures,
the type of the bitmap was erroneously changed from u32 to u8,
presumably in order to save memory.
Turns out that u32 isn't needed, because the highest queue we
can ever tell mac80211 is always < 16, but a u16 definitely is
needed, queues >=8 do happen.
While at it, throw a BUILD_BUG_ON() into the place where we set
the limit (mvm->first_agg_queue) and a warning when it actually
gets put into the bitmap.
The consequence of this bug is that full HW queues associated
with such a too-high mac80211 number never stop higher layer
queues when full, and thus would simply drop all packets that
couldn't be enqueued to the hardware queue.
Fixes: 34e10860ae8d ("iwlwifi: mvm: remove references to queue_info in new TX path") Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Luca Coelho <[email protected]>
Johannes Berg [Mon, 19 Jun 2017 11:24:49 +0000 (13:24 +0200)]
iwlwifi: mvm: properly enable IP header checksumming
The code was intended to enable IP header checksumming on AMSDUs, but
failed to really do so because the A-MSDU bit was set after all the
checksumming bits, and thus checking for A-MSDU could never be true.
Fix this by setting the A-MSDU bit before the offload bits.
Fixes: 5e6a98dc4863 ("iwlwifi: mvm: enable TCP/UDP checksum support for 9000 family") Reported-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Luca Coelho <[email protected]>