Once we remove flow cache, we don't have a flow cache limit anymore.
We must not allow (virtually) unlimited allocations of xfrm dst entries.
Revert back to the old xfrm dst gc limits.
these drivers use tasklets or irq apis, but don't include interrupt.h.
Once flow cache is removed the implicit interrupt.h inclusion goes away
which will break the build.
This patch series removes the remaining capabilities as well as the
flags bitmap in the info structures. Most of them are turned into ops,
or new info members.
There is no mv88e6xxx_cap enum or bitmap flags anymore, only
mv88e6xxx_info and mv88e6xxx_ops structures.
While reviewing and documenting the related G2 registers, fix a few
inconsistencies: 88E6185 has no interrupt in G2 and 88E6390 has a POT.
Except these two adjustments, there is no functional changes.
====================
Vivien Didelot [Mon, 17 Jul 2017 17:03:46 +0000 (13:03 -0400)]
net: dsa: mv88e6xxx: add a multi_chip info flag
Instead of relying on a bitmap flag, add a new multi_chip info flag to
describe the presence of the indirect SMI access though the two device
registers 0x0 and 0x1.
All remaining capabilities and flags are now unused. Remove the
mv88e6xxx_cap enum and the info flags bitmaps.
Vivien Didelot [Mon, 17 Jul 2017 17:03:45 +0000 (13:03 -0400)]
net: dsa: mv88e6xxx: add Energy Detect ops
The 88E6352 family supports Energy Detect and has one bit for Sense and
one bit for periodically transmit NLP (Energy Detect+TM). The 88E6390
family adds another bit to distinguish Auto or SW wake-up. Chips
supporting EEE all have an EEE Enabled bit in the Port Status Register.
This patch adds new ops for the PHY Energy Detect accesses.
This also allows us to get rid of the MV88E6XXX_FLAG_EEE flag.
Vivien Didelot [Mon, 17 Jul 2017 17:03:41 +0000 (13:03 -0400)]
net: dsa: mv88e6xxx: distinguish Global 2 Rsvd2CPU
The 88E6185 family only has one 16-bit register to mark the 16 802.1D
reserved multicast addresses in the range of 01:80:C2:00:00:0x as MGMT.
The 88E6352 family also has one 16-bit register to mark the 16 GARP
reserved multicast addresses in the range of 01:80:C2:00:00:2x as MGMT.
Split the existing mv88e6095 prefixed mgmt_rsvd2cpu operation into two
distinct mv88e6185 and mv88e6352 prefixed operations, and wrap its call
into a mv88e6xxx_rsvd2cpu_setup helper.
This allows us to also get rid of the MV88E6XXX_CAP_G2_MGMT_EN_* flags.
John Fastabend [Tue, 18 Jul 2017 04:56:48 +0000 (21:56 -0700)]
net: fix build error in devmap helper calls
Initial patches missed case with CONFIG_BPF_SYSCALL not set.
Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map") Fixes: 97f91a7cf04f ("bpf: add bpf_redirect_map helper routine") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
5113 384 0 5497 1579 drivers/net/ethernet/ec_bhf.o
File size After adding 'const':
text data bss dec hex filename
5177 320 0 5497 1579 drivers/net/ethernet/ec_bhf.o
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
791 336 0 1127 467 net/ethernet/cadence/macb_pci.o
File size After adding 'const':
text data bss dec hex filename
855 272 0 1127 467 net/ethernet/cadence/macb_pci.o
net: Revert "net: add function to allocate sk_buff head without data area"
It was added for netlink mmap tx, there are no callers in the tree.
The commit also added a check for skb->head != NULL in kfree_skb path,
remove that too -- all skbs ought to have skb->head set.
David S. Miller [Mon, 17 Jul 2017 16:48:07 +0000 (09:48 -0700)]
Merge branch 'xdp-redirect'
John Fastabend says:
====================
Implement XDP bpf_redirect
This series adds two new XDP helper routines bpf_redirect() and
bpf_redirect_map(). The first variant bpf_redirect() is meant
to be used the same way it is currently being used by the cls_bpf
classifier. An xdp packet will be redirected immediately when this
is called.
The other variant bpf_redirect_map(map, key, flags) uses a new
map type called devmap. A devmap uses integers as keys and
net_devices as values. The user provies key/ifindex pairs to
update the map with new net_devices. This provides two benefits
over the normal variant 'bpf_redirect()'. First the datapath
bpf program is abstracted away from using hard-coded ifindex
values. Allowing a single bpf program to be run any many different
environments. Second, and perhaps more important, the map enables
batching packet transmits. The map plus small driver changes
allows for batching all send requests across a NAPI poll loop.
This allows driver writers to optimize the driver xmit path
and only call expensive operations once for a batch of xdp_buffs.
The devmap was designed to support possible future work for
multicast and broadcast as follow-up patches.
To see, in more detail, how to leverage the new helpers and
map from the userspace side please review these two patches,
xdp: sample program for new bpf_redirect helper
xdp: bpf redirect with map sample program
Performance numbers provided by Jesper are the following, tested
using the ixgbe driver with CPU E5-1650 v4 @ 3.60GHz:
13,939,674 pkt/s = XDP_DROP without touching memory
14,290,650 pkt/s = xdp1: XDP_DROP with reading packet data
13,221,812 pkt/s = xdp2: XDP_TX with swap mac (writes into pkt)
7,596,576 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (like XDP_TX)
13,058,435 pkt/s = xdp_redirect_map:XDP_REDIRECT with swap mac + devmap
A big thanks to everyone who helped with this series. Jesper
provided fixes, debugging, code review, performance benchmarks!
Daniel provided lots of useful feedback and code review. And last
but not least Andy provided useful feedback related to supporting
additional drivers, generic xdp implementation, testing, etc. Any
other feedback is welcome but I believe at this point these are
ready to be merged!
Whats left... get the rest of the drivers developers to implement
this in all the drivers.
====================
John Fastabend [Mon, 17 Jul 2017 16:30:02 +0000 (09:30 -0700)]
net: add notifier hooks for devmap bpf map
The BPF map devmap holds a refcnt on the net_device structure when
it is in the map. We need to do this to ensure on driver unload we
don't lose a dev reference.
However, its not very convenient to have to manually unload the map
when destroying a net device so add notifier handlers to do the cleanup
automatically. But this creates a race between update/destroy BPF
syscall and programs and the unregister netdev hook.
Unfortunately, the best I could come up with is either to live with
requiring manual removal of net devices from the map before removing
the net device OR to add a mutex in devmap to ensure the map is not
modified while we are removing a device. The fallout also requires
that BPF programs no longer update/delete the map from the BPF program
side because the mutex may sleep and this can not be done from inside
an rcu critical section. This is not a real problem though because I
have not come up with any use cases where this is actually useful in
practice. If/when we come up with a compelling user for this we may
need to revisit this.
John Fastabend [Mon, 17 Jul 2017 16:29:40 +0000 (09:29 -0700)]
xdp: Add batching support to redirect map
For performance reasons we want to avoid updating the tail pointer in
the driver tx ring as much as possible. To accomplish this we add
batching support to the redirect path in XDP.
This adds another ndo op "xdp_flush" that is used to inform the driver
that it should bump the tail pointer on the TX ring.
John Fastabend [Mon, 17 Jul 2017 16:28:56 +0000 (09:28 -0700)]
bpf: add devmap, a map for storing net device references
Device map (devmap) is a BPF map, primarily useful for networking
applications, that uses a key to lookup a reference to a netdevice.
The map provides a clean way for BPF programs to build virtual port
to physical port maps. Additionally, it provides a scoping function
for the redirect action itself allowing multiple optimizations. Future
patches will leverage the map to provide batching at the XDP layer.
Another optimization/feature, that is not yet implemented, would be
to support multiple netdevices per key to support efficient multicast
and broadcast support.
John Fastabend [Mon, 17 Jul 2017 16:27:50 +0000 (09:27 -0700)]
net: implement XDP_REDIRECT for xdp generic
Add support for redirect to xdp generic creating a fall back for
devices that do not yet have support and allowing test infrastructure
using veth pairs to be built.
John Fastabend [Mon, 17 Jul 2017 16:27:28 +0000 (09:27 -0700)]
xdp: sample program for new bpf_redirect helper
This implements a sample program for testing bpf_redirect. It reports
the number of packets redirected per second and as input takes the
ifindex of the device to run the xdp program on and the ifindex of the
interface to redirect packets to.
John Fastabend [Mon, 17 Jul 2017 16:27:07 +0000 (09:27 -0700)]
xdp: add bpf_redirect helper function
This adds support for a bpf_redirect helper function to the XDP
infrastructure. For now this only supports redirecting to the egress
path of a port.
In order to support drivers handling a xdp_buff natively this patches
uses a new ndo operation ndo_xdp_xmit() that takes pushes a xdp_buff
to the specified device.
If the program specifies either (a) an unknown device or (b) a device
that does not support the operation a BPF warning is thrown and the
XDP_ABORTED error code is returned.
John Fastabend [Mon, 17 Jul 2017 16:26:45 +0000 (09:26 -0700)]
net: xdp: support xdp generic on virtual devices
XDP generic allows users to test XDP programs and/or run them with
degraded performance on devices that do not yet support XDP. For
testing I typically test eBPF programs using a set of veth devices.
This allows testing topologies that would otherwise be difficult to
setup especially in the early stages of development.
This patch adds a xdp generic hook to the netif_rx_internal()
function which is called from dev_forward_skb(). With this addition
attaching XDP programs to veth devices works as expected! Also I
noticed multiple drivers using netif_rx(). These devices will also
benefit and generic XDP will work for them as well.
John Fastabend [Mon, 17 Jul 2017 16:26:24 +0000 (09:26 -0700)]
ixgbe: NULL xdp_tx rings on resource cleanup
tx_rings and rx_rings are cleaned up on close paths in ixgbe driver
however, xdp_rings are not. Set the xdp_rings to NULL here so that
we can use the pointer to indicate if the XDP rings are initialized.
David S. Miller [Mon, 17 Jul 2017 16:19:40 +0000 (09:19 -0700)]
Merge branch 'mlxsw-traps'
Jiri Pirko says:
====================
mlxsw: Traps enhancements
Ido says:
The first patch makes sure the driver marks packets that were trapped
in the router and might have already been flooded by the bridge, so that
the bridge driver won't flood them again. This isn't critical at this time
point, but will be when Neighbour Discovery traps are introduced as these
are multicast packets that are trapped in the router.
The second and third patches add new traps - for MLD and Router Alert
packets. The last patch takes advantage of that and floods IPv6
unregistered multicast packets only to mrouter ports instead of all ports.
====================
Up until now IPv6 unregistered multicast traffic would be flooded like
broadcast, even when MLD snooping was enabled on the bridge. This was
intentional as MLD packet traps were missing, preventing the bridge
driver from programming MDB entries to the device.
Previous patch added these traps, so we can now finally flood IPv6
unregistered multicast packets to specific ports via the multicast table
instead of flooding them to all ports via the broadcast table.
In commit 1c6c6d221e2b ("mlxsw: spectrum: Mirror certain packets to
CPU") we marked packets that were mirrored to the CPU, so that they
won't be flooded again by the bridge driver.
However, certain packets are trapped in the device's router block, after
passing through the bridge block where they were potentially flooded.
Mark all packets coming from L3 traps, so that they won't be potentially
flooded again by the bridge driver.
unix: correctly track in-flight fds in sending process user_struct
Hence, regardless of the stacking depth of FDs, the total number of
inflight FDs is limited, and accounted. There is no known way for a
local user to exceed those limits or exploit the accounting.
Furthermore, the GC logic is independent of the recursion/stacking depth
as well. It solely depends on the total number of inflight FDs,
regardless of their layout.
Lastly, the current `recursion_level' suffers a TOCTOU race, since it
checks and inherits depths only at queue time. If we consider `A<-B' to
mean `queue-B-on-A', the following sequence circumvents the recursion
level easily:
A<-B
B<-C
C<-D
...
Y<-Z
resulting in:
A<-B<-C<-...<-Z
With all of this in mind, lets drop the recursion limit. It has no
additional security value, anymore. On the contrary, it randomly
confuses message brokers that try to forward file-descriptors, since
any sendmsg(2) call can fail spuriously with ETOOMANYREFS if a client
maliciously modifies the FD while inflight.
Xin Long [Mon, 17 Jul 2017 03:29:56 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_supported_ext_param_t
This patch is to remove the typedef sctp_supported_ext_param_t, and
replace with struct sctp_supported_ext_param in the places where it's
using this typedef.
It is also to use sizeof(variable) instead of sizeof(type).
Xin Long [Mon, 17 Jul 2017 03:29:55 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_adaptation_ind_param_t
This patch is to remove the typedef sctp_adaptation_ind_param_t, and
replace with struct sctp_adaptation_ind_param in the places where it's
using this typedef.
Xin Long [Mon, 17 Jul 2017 03:29:53 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_supported_addrs_param_t
This patch is to remove the typedef sctp_supported_addrs_param_t, and
replace with struct sctp_supported_addrs_param in the places where it's
using this typedef.
Xin Long [Mon, 17 Jul 2017 03:29:51 +0000 (11:29 +0800)]
sctp: remove the typedef sctp_cookie_preserve_param_t
This patch is to remove the typedef sctp_cookie_preserve_param_t, and
replace with struct sctp_cookie_preserve_param in the places where it's
using this typedef.
It is also to fix some indents in sctp_sf_do_5_2_6_stale().
rds: cancel send/recv work before queuing connection shutdown
We could end up executing rds_conn_shutdown before the rds_recv_worker
thread, then rds_conn_shutdown -> rds_tcp_conn_shutdown can do a
sock_release and set sock->sk to null, which may interleave in bad
ways with rds_recv_worker, e.g., it could result in:
Also, do not enqueue any new shutdown workq items when the connection is
shutting down (this may happen for rds-tcp in softirq mode, if a FIN
or CLOSE is received while the modules is in the middle of an unload)
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
====================
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
27702 468 16 28186 6e1a drivers/atm/idt77252.o
File size After adding 'const':
text data bss dec hex filename
27766 404 16 28186 6e1a drivers/atm/idt77252.o
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
21565 352 56 21973 55d5 drivers/atm/eni.o
File size After adding 'const':
text data bss dec hex filename
21661 256 56 21973 55d5 drivers/atm/eni.o
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
16884 444 28 17356 43cc drivers/atm/firestream.o
File size After adding 'const':
text data bss dec hex filename
16980 348 28 17356 43cc drivers/atm/firestream.o
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
14350 352 40 14742 3996 drivers/atm/zatm.o
File size After adding 'const':
text data bss dec hex filename
14446 256 40 14742 3996 drivers/atm/zatm.o
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
18074 352 0 18426 47fa drivers/atm/lanai.o
File size After adding 'const':
text data bss dec hex filename
18170 256 0 18426 47fa drivers/atm/lanai.o
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
16138 4592 24 20754 5112 drivers/atm/solos-pci.o
File size After adding 'const':
text data bss dec hex filename
16218 4528 24 20754 5122 drivers/atm/solos-pci.o
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
9859 328 6 10193 27d1 drivers/atm/horizon.o
File size After adding 'const':
text data bss dec hex filename
9923 264 6 10193 27d1 drivers/atm/horizon.o
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
26514 440 48 27002 697a drivers/atm/he.o
File size After adding 'const':
text data bss dec hex filename
26578 376 48 27002 697a drivers/atm/he.o
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
22781 464 128 23373 5b4d drivers/atm/nicstar.o
File size After adding 'const':
text data bss dec hex filename
22845 400 128 23373 5b4d drivers/atm/nicstar.o
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
20025 320 16 20361 4f89 drivers/atm/fore200e.o
File size After adding 'const':
text data bss dec hex filename
20089 256 16 20361 4f89 drivers/atm/fore200e.o
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
13372 408 4 13784 35d8 drivers/atm/ambassador.o
File size After adding 'const':
text data bss dec hex filename
13484 296 4 13784 35d8 drivers/atm/ambassador.o
pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by <linux/pci.h> work with
const pci_device_id. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
23536 432 160 24128 5e40 drivers/atm/iphase.o
File size After adding 'const':
text data bss dec hex filename
23632 336 160 24128 5e40 drivers/atm/iphase.o
Vincent Bernat [Sat, 15 Jul 2017 17:40:20 +0000 (19:40 +0200)]
ip6: fix PMTU discovery when using /127 subnets
The definition of an "anycast destination address" has been tweaked as a
side-effect of commit 2647a9b07032 ("ipv6: Remove external dependency on
rt6i_gateway and RTF_ANYCAST"). The first address of a point-to-point
/127 subnet is now considered as an anycast address. This prevents
ICMPv6 errors to be returned to a sender of such a subnet and breaks
PMTU discovery.
This can be reproduced with:
ip link add name out6 type veth peer name in6
ip link add name out7 type veth peer name in7
ip link set mtu 1400 dev out7
ip link set mtu 1400 dev in7
ip netns add next-hop
ip netns add next-next-hop
ip link set netns next-hop dev in6
ip link set netns next-hop dev out7
ip link set netns next-next-hop dev in7
ip link set up dev out6
ip addr add 2001:db8:1::12/127 dev out6
ip netns exec next-hop ip link set up dev in6
ip netns exec next-hop ip link set up dev out7
ip netns exec next-hop ip addr add 2001:db8:1::13/127 dev in6
ip netns exec next-hop ip addr add 2001:db8:1::14/127 dev out7
ip netns exec next-hop ip route add default via 2001:db8:1::15
ip netns exec next-hop sysctl -qw net.ipv6.conf.all.forwarding=1
ip netns exec next-next-hop ip link set up dev in7
ip netns exec next-next-hop ip addr add 2001:db8:1::15/127 dev in7
ip netns exec next-next-hop ip addr add 2001:db8:1::50/128 dev in7
ip netns exec next-next-hop ip route add default via 2001:db8:1::14
ip netns exec next-next-hop sysctl -qw net.ipv6.conf.all.forwarding=1
ip route add 2001:db8:1::48/123 via 2001:db8:1::13
sleep 4
ping -M do -s 1452 -c 3 2001:db8:1::50 || true
ip route get 2001:db8:1::50
Before the patch, we get:
2001:db8:1::50 from :: via 2001:db8:1::13 dev out6 src 2001:db8:1::12 metric 1024 pref medium
After the patch, we get:
2001:db8:1::50 via 2001:db8:1::13 dev out6 src 2001:db8:1::12 metric 0
cache expires 578sec mtu 1400 pref medium
Fixes: 2647a9b07032 ("ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST") Signed-off-by: Vincent Bernat <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Shannon Nelson [Thu, 6 Jul 2017 23:57:10 +0000 (16:57 -0700)]
sunvnet: add support for IPv6 checksum offloads
The original code didn't handle non-IPv4 packets very well, so the
offload advertising had to be scaled back down to just IP. Here we
add the bits needed to support TCP and UDP packets over IPv6 and
turn the offload advertising back on.
I made the mistake of upgrading my desktop to the new Fedora 26 that
comes with gcc-7.1.1.
There's nothing wrong per se that I've noticed, but I now have 1500
lines of warnings, mostly from the new format-truncation warning
triggering all over the tree.
We use 'snprintf()' and friends in a lot of places, and often know that
the numbers are fairly small (ie a controller index or similar), but gcc
doesn't know that, and sees an 'int', and thinks that it could be some
huge number. And then complains when our buffers are not able to fit
the name for the ten millionth controller.
These warnings aren't necessarily bad per se, and we probably want to
look through them subsystem by subsystem, but at least during the merge
window they just mean that I can't even see if somebody is introducing
any *real* problems when I pull.
Merge tag 'modules-for-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
"Summary of modules changes for the 4.13 merge window:
- Minor code cleanups
- Avoid accessing mod struct prior to checking module struct version,
from Kees
- Fix racy atomic inc/dec logic of kmod_concurrent_max in kmod, from
Luis"
* tag 'modules-for-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: make the modinfo name const
kmod: reduce atomic operations on kmod_concurrent and simplify
module: use list_for_each_entry_rcu() on find_module_all()
kernel/module.c: suppress warning about unused nowarn variable
module: Add module name to modinfo
module: Pass struct load_info into symbol checks
net: stmmac: revert "support future possible different internal phy mode"
Since internal phy-mode is reserved for non-xMII protocol we cannot use
it with dwmac-sun8i.
Furthermore, all DT patchs which comes with this patch were cleaned, so
the current state is broken.
This reverts commit 1c2fa5f84683 ("net: stmmac: support future possible different internal phy mode")
Fixes: 1c2fa5f84683 ("net: stmmac: support future possible different internal phy mode") Signed-off-by: Corentin Labbe <[email protected]> Signed-off-by: David S. Miller <[email protected]>
If we have more than 32 unicast MAC addresses assigned to an interface
we will read beyond the end of the address table in the driver when
adding filters. The next 256 entries store multicast addresses, so we
will end up attempting to insert duplicate filters, which is mostly
harmless. If we add more than 288 unicast addresses we will then read
past the multicast address table, which is likely to be more exciting.
Fixes: 12fb0da45c9a ("sfc: clean fallbacks between promisc/normal in efx_ef10_filter_sync_rx_mode") Signed-off-by: Bert Kenward <[email protected]> Signed-off-by: David S. Miller <[email protected]>