====================
net: dsa: microchip: add KSZ9893 switch support
This series of patches is to modify the KSZ9477 DSA driver to support
running KSZ9893 switch.
The KSZ9893 switch is similar to KSZ9477 except the ingress tail tag has
1 byte instead of 2 bytes. The XMII register that governs the MAC
communication also has different register definitions.
v1
- Put KSZ9893 tagging in separate patch
- Remove other switch support
====================
Tristram Ha [Fri, 1 Mar 2019 03:57:24 +0000 (19:57 -0800)]
net: dsa: microchip: add KSZ9893 switch support
Add KSZ9893 switch support in KSZ9477 driver. This switch is similar to
KSZ9477 except the ingress tail tag has 1 byte instead of 2 bytes, so
KSZ9893 tagging will be used.
The XMII register that governs how the host port communicates with the
MAC also has different register definitions.
Tristram Ha [Fri, 1 Mar 2019 03:57:23 +0000 (19:57 -0800)]
net: dsa: add KSZ9893 switch tagging support
KSZ9893 switch is similar to KSZ9477 switch except the ingress tail tag
has 1 byte instead of 2 bytes. The size of the portmap is smaller and
so the override and lookup bits are also moved.
David S. Miller [Sun, 3 Mar 2019 21:01:49 +0000 (13:01 -0800)]
Merge branch 'appletalk-small-cleanup-and-bugfix'
Yue Haibing says:
====================
appletalk: small cleanup and bugfix
v2:
- Add cover letter log
This patch series mainly fix a use-after-free bug in atalk_proc_exit.
patch 1 use remove_proc_subtree helper to simplify atalk_proc fs code,
also some other cleanup.
patch 2 add proper error cleanup path in atalk_init to fix the issue, which
based on the patch 1 because of the change of atalk_proc_exit context.
====================
Memory state around the buggy address: ffff8881f41fe480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8881f41fe500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8881f41fe580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8881f41fe600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff8881f41fe680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
It should check the return value of atalk_proc_init fails,
otherwise atalk_exit will trgger use-after-free in pde_subdir_find
while unload the module.This patch fix error cleanup path of atalk_init
Eric Dumazet [Thu, 28 Feb 2019 20:55:43 +0000 (12:55 -0800)]
net: sched: put back q.qlen into a single location
In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'")
John made the assumption that the data path had no need to read
the qdisc qlen (number of packets in the qdisc).
It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO
children.
But pfifo_fast can be used as leaf in class full qdiscs, and existing
logic needs to access the child qlen in an efficient way.
HTB breaks badly, since it uses cl->leaf.q->q.qlen in :
htb_activate() -> WARN_ON()
htb_dequeue_tree() to decide if a class can be htb_deactivated
when it has no more packets.
HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
qdisc_tree_reduce_backlog() also read q.qlen directly.
Using qdisc_qlen_sum() (which iterates over all possible cpus)
in the data path is a non starter.
It seems we have to put back qlen in a central location,
at least for stable kernels.
For all qdisc but pfifo_fast, qlen is guarded by the qdisc lock,
so the existing q.qlen{++|--} are correct.
For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}()
because the spinlock might be not held (for example from
pfifo_fast_enqueue() and pfifo_fast_dequeue())
This patch adds atomic_qlen (in the same location than qlen)
and renames the following helpers, since we want to express
they can be used without qdisc lock, and that qlen is no longer percpu.
Later (net-next) we might revert this patch by tracking all these
qlen uses and replace them by a more efficient method (not having
to access a precise qlen, but an empty/non_empty status that might
be less expensive to maintain/track).
Another possibility is to have a legacy pfifo_fast version that would
be used when used a a child qdisc, since the parent qdisc needs
a spinlock anyway. But then, future lockless qdiscs would also
have the same problem.
David S. Miller [Sat, 2 Mar 2019 22:04:20 +0000 (14:04 -0800)]
Merge tag 'mlx5-updates-2019-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-03-01
This series adds multipath offload support and contains some small updates
to mlx5 driver.
Multipath offload support from Roi Dayan:
We are going to track SW multipath route and related nexthops and reflect
that as port affinity to the HW.
1) Some patches are preparation.
2) add the multipath mode and fib events handling.
3) add support to handle offload failure for net error, i.e.
port down.
4) Small updates to match the behavior of multipath
Two small updates from Eran Ben Elisha,
5) Make a function static
6) Update PCIe supported devices list.
====================
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
1) Add .release_ops to properly unroll .select_ops, use it from nft_compat.
After this change, we can remove list of extensions too to simplify this
codebase.
2) Update amanda conntrack helper to support v3.4, from Florian Tham.
3) Get rid of the obsolete BUGPRINT macro in ebtables, from
Florian Westphal.
4) Merge IPv4 and IPv6 masquerading infrastructure into one single module.
From Florian Westphal.
5) Patchset to remove nf_nat_l3proto structure to get rid of
indirections, from Florian Westphal.
6) Skip unnecessary conntrack timeout updates in case the value is
still the same, also from Florian Westphal.
7) Remove unnecessary 'fall through' comments in empty switch cases,
from Li RongQing.
8) Fix lookup to fixed size hashtable sets on big endian with 32-bit keys.
9) Incorrect logic to deactivate path of fixed size hashtable sets,
element was being tested to self.
10) Remove nft_hash_key(), the bitmap set is always selected for 16-bit
keys.
11) Use boolean whenever possible in IPVS codebase, from Andrea Claudi.
12) Enter close state in conntrack if RST matches exact sequence number,
from Florian Westphal.
13) Initialize dst_cache in tunnel extension, from wenxu.
14) Pass protocol as u16 to xt_check_match and xt_check_target, from
Li RongQing.
15) SCTP header is granted to be in a linear area from IPVS NAT handler,
from Xin Long.
16) Don't steal packets coming from slave VRF device from the
ip_sabotage_in() path, from David Ahern.
17) Fix unsafe update of basechain stats, from Li RongQing.
18) Make sure CONNTRACK_LOCKS is power of 2 to let compiler optimize
modulo operation as bitwise AND, from Li RongQing.
19) Use device_attribute instead of internal definition in the IDLETIMER
target, from Sami Tolvanen.
20) Merge redir, masq and IPv4/IPv6 NAT chain types, from Florian Westphal.
====================
Here's one more bluetooth-next pull request for the 5.1 kernel:
- Added support for MediaTek MT7663U and MT7668U UART devices
- Cleanups & fixes to the hci_qca driver
- Fixed wakeup pin behavior for QCA6174A controller
Please let me know if there are any issues pulling. Thanks.
====================
Linus Torvalds [Sat, 2 Mar 2019 19:47:29 +0000 (11:47 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two last minute fixes:
- Prevent value evaluation via functions happening in the user access
enabled region of __put_user() (put another way: make sure to
evaluate the value to be stored in user space _before_ enabling
user space accesses)
- Correct the definition of a Hyper-V hypercall constant"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyper-v: Fix definition of HV_MAX_FLUSH_REP_COUNT
x86/uaccess: Don't leak the AC flag into __put_user() value evaluation
Linus Torvalds [Sat, 2 Mar 2019 19:39:54 +0000 (11:39 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Nine small fixes.
The resume fix is a cosmetic removal of a warning with an incorrect
condition causing it to alarm people wrongly.
The other eight patches correct a thinko in Christoph Hellwig's DMA
conversion series. Without it all these drivers end up with 32 bit DMA
masks meaning they bounce any page over 4GB before sending it to the
controller.
Nowadays, even laptops mostly have memory above 4GB, so this can lead
to significant performance degradation with all the bouncing"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: core: Avoid that system resume triggers a kernel warning
scsi: hptiop: fix calls to dma_set_mask()
scsi: hisi_sas: fix calls to dma_set_mask_and_coherent()
scsi: csiostor: fix calls to dma_set_mask_and_coherent()
scsi: bfa: fix calls to dma_set_mask_and_coherent()
scsi: aic94xx: fix calls to dma_set_mask_and_coherent()
scsi: 3w-sas: fix calls to dma_set_mask_and_coherent()
scsi: 3w-9xxx: fix calls to dma_set_mask_and_coherent()
scsi: lpfc: fix calls to dma_set_mask_and_coherent()
====================
Recently we had linux-next bpf/bpf-next conflict when we added new
functionality to the test_progs.c at the same location. Let's split
test_progs.c the same way we recently split test_verifier.c.
I follow the same patten we did in commit 2dfb40121ee8 ("selftests: bpf:
prepare for break up of verifier tests") for verifier: create
scaffolding to support dedicated files and slowly move the tests into
separate files.
The first patch adds scaffolding, subsequent patches move progs into
separate files.
In theory, many of the standalone tests can be migrated to this new
framework as well. They get the benefit of common CHECK macro and
bpf_find_map function which a lot of standalone tests need to redefine.
v3 changes:
* respin on top of commit ebace0e981b2 ("selftests/bpf: use
__bpf_constant_htons in test_prog.c for flow dissector")
* put bpf_rlimit.h into test_progs.c instead of test_progs.h
v2 changes:
* added cover letter, added more description about file structure
====================
selftests: bpf: break up test_progs - preparations
Add new prog_tests directory where tests are supposed to land.
Each prog_tests/<filename>.c is expected to have a global function
with signature 'void test_<filename>(void)'. Makefile automatically
generates prog_tests/tests.h file with entry for each prog_tests file:
prog_tests/tests.h is included in test_progs.c in two places with
appropriate defines. This scheme allows us to move each function with
a separate patch without breaking anything.
Compared to the recent verifier split, each separate file here is
a compilation unit and test_progs.[ch] is now used as a place to put
some common routines that might be used by multiple tests.
Sean Wang [Sat, 2 Mar 2019 18:44:09 +0000 (02:44 +0800)]
Bluetooth: mediatek: add support for MediaTek MT7663U and MT7668U UART devices
This adds the support of enabling MT7663U and MT7668U Bluetooth function
running on the top of btmtkuart driver.
There are a few differences between MT766[3,8]U and MT7622 where
MT766[3,8]U are standalone devices based on UART transport while MT7622
bluetooth is a built-in device on MediaTek SoC communicating with the host
through BTIF serial transport. Thus, extra setup sequence is necessary
for these standalone devices such as remote regulator and reset control via
GPIO, baud rate changing handshake between the host and device and so on.
====================
Host Bandwidth Manager is a framework for limiting the bandwidth used
by v2 cgroups. It consists of 1 BPF helper, a sample BPF program to
limit egress bandwdith as well as a sample user program and script to
simplify HBM testing.
The sample HBM BPF program is not meant to be production quality, it is
provided as proof of concept. A lot more information, including sample
runs in some cases, are provided in the commit messages of the individual
patches.
A future patch will add support for reducing TCP's cwnd (we are evaluating
alternatives). Another patch will add support for fair queueing's Earliest
Departure Time. Until then, HBM is better suited for flows supporitng ECN.
In addition, A BPF program to limit ingress bandwidth will be provided in
an upcomming patchset.
Changes from v1 to v2:
* bpf_tcp_enter_cwr can only be called from a cgroup skb egress BPF
program (otherwise load or attach will fail) where we already hold
the sk lock. Also only applies for ESTABLISHED state.
* bpf_skb_ecn_set_ce uses INET_ECN_set_ce()
* bpf_tcp_check_probe_timer now uses tcp_reset_xmit_timer. Can only be
used by egress cgroup skb programs.
* removed load_cg_skb user program.
* nrm bpf egress program checks packet header in skb to determine
ECN value. Now also works for ECN enabled UDP packets.
Using ECN_ defines instead of integers.
* NRM script test program now uses bpftool instead of load_cg_skb
Changes from v2 to v3:
* Changed name from NRM (Network Resource Manager) to HBM (Host
Bandwdith Manager)
* The bpf helper to set ECN ce now checks that the header is writeable
* Removed helper bpf functions that modified TCP state due to a concern
about whether the socket is locked by the current thread.
====================
brakmo [Fri, 1 Mar 2019 20:38:50 +0000 (12:38 -0800)]
bpf: HBM test script
Script for testing HBM (Host Bandwidth Manager) framework.
It creates a cgroup to use for testing and load a BPF program to limit
egress bandwidht. It then uses iperf3 or netperf to create
loads. The output is the goodput in Mbps (unless -D is used).
It can work on a single host using loopback or among two hosts (with netperf).
When using loopback, it is recommended to also introduce a delay of at least
1ms (-d=1), otherwise the assigned bandwidth is likely to be underutilized.
USAGE: $name [out] [-b=<prog>|--bpf=<prog>] [-c=<cc>|--cc=<cc>] [-D]
[-d=<delay>|--delay=<delay>] [--debug] [-E]
[-f=<#flows>|--flows=<#flows>] [-h] [-i=<id>|--id=<id >] [-l]
[-N] [-p=<port>|--port=<port>] [-P] [-q=<qdisc>]
[-R] [-s=<server>|--server=<server] [--stats]
[-t=<time>|--time=<time>] [-w] [cubic|dctcp]
Where:
out Egress (default egress)
-b or --bpf BPF program filename to load and attach.
Default is nrm_out_kern.o for egress,
-c or -cc TCP congestion control (cubic or dctcp)
-d or --delay Add a delay in ms using netem
-D In addition to the goodput in Mbps, it also outputs
other detailed information. This information is
test dependent (i.e. iperf3 or netperf).
--debug Print BPF trace buffer
-E Enable ECN (not required for dctcp)
-f or --flows Number of concurrent flows (default=1)
-i or --id cgroup id (an integer, default is 1)
-l Do not limit flows using loopback
-N Use netperf instead of iperf3
-h Help
-p or --port iperf3 port (default is 5201)
-P Use an iperf3 instance for each flow
-q Use the specified qdisc.
-r or --rate Rate in Mbps (default 1s 1Gbps)
-R Use TCP_RR for netperf. 1st flow has req
size of 10KB, rest of 1MB. Reply in all
cases is 1 byte.
More detailed output for each flow can be found
in the files netperf.<cg>.<flow>, where <cg> is the
cgroup id as specified with the -i flag, and <flow>
is the flow id starting at 1 and increasing by 1 for
flow (as specified by -f).
-s or --server hostname of netperf server. Used to create netperf
test traffic between to hosts (default is within host)
netserver must be running on the host.
--stats Get HBM stats (marked, dropped, etc.)
-t or --time duration of iperf3 in seconds (default=5)
-w Work conserving flag. cgroup can increase its
bandwidth beyond the rate limit specified
while there is available bandwidth. Current
implementation assumes there is only one NIC
(eth0), but can be extended to support multiple
NICs. This is just a proof of concept.
cubic or dctcp specify TCP CC to use
Examples:
./do_hbm_test.sh -l -d=1 -D --stats
Runs a 5 second test, using a single iperf3 flow and with the default
rate limit of 1Gbps and a delay of 1ms (using netem) using the default
TCP congestion control on the loopback device (hence we use "-l" to
enforce bandwidth limit on loopback device). Since no direction is
specified, it defaults to egress. Since no TCP CC algorithm is
specified it uses the system default (Cubic for this test).
With no -D flag, only the value of the AGGREGATE OUTPUT would show.
id refers to the cgroup id and is useful when running multi cgroup
tests (supported by a future patch).
This patchset does not support calling TCP's congesion window
reduction, even when packets are dropped by the BPF program, resulting
in a large number of packets dropped. It is recommended that the current
HBM implemenation only be used with ECN enabled flows. A future patch
will add support for reducing TCP's cwnd and will increase the
performance of non-ECN enabled flows.
Output:
Details for HBM in cgroup 1
id:1
rate_mbps:493
duration:4.8 secs
packets:11355
bytes_MB:590
pkts_dropped:4497
bytes_dropped_MB:292
pkts_marked_percent: 39.60
bytes_marked_percent: 49.49
pkts_dropped_percent: 39.60
bytes_dropped_percent: 49.49
PING AVG DELAY:2.075
AGGREGATE_GOODPUT:505
./do_nrm_test.sh -l -d=1 -D --stats dctcp
Same as above but using dctcp. Note that fewer bytes are dropped
(0.01% vs. 49%).
Output:
Details for HBM in cgroup 1
id:1
rate_mbps:945
duration:4.9 secs
packets:16859
bytes_MB:578
pkts_dropped:1
bytes_dropped_MB:0
pkts_marked_percent: 28.74
bytes_marked_percent: 45.15
pkts_dropped_percent: 0.01
bytes_dropped_percent: 0.01
PING AVG DELAY:2.083
AGGREGATE_GOODPUT:965
./do_nrm_test.sh -d=1 -D --stats
As first example, but without limiting loopback device (i.e. no
"-l" flag). Since there is no bandwidth limiting, no details for
HBM are printed out.
Output:
Details for HBM in cgroup 1
PING AVG DELAY:2.019
AGGREGATE_GOODPUT:42655
./do_hbm.sh -l -d=1 -D --stats -f=2
Uses iper3 and does 2 flows
./do_hbm.sh -l -d=1 -D --stats -f=4 -P
Uses iperf3 and does 4 flows, each flow as a separate process.
./do_hbm.sh -l -d=1 -D --stats -f=4 -N
Uses netperf, 4 flows
./do_hbm.sh -f=1 -r=2000 -t=5 -N -D --stats dctcp -s=<server-name>
Uses netperf between two hosts. The remote host name is specified
with -s= and you need to start the program netserver manually on
the remote host. It will use 1 flow, a rate limit of 2Gbps and dctcp.
./do_hbm.sh -f=1 -r=2000 -t=5 -N -D --stats -w dctcp \
-s=<server-name>
As previous, but allows use of extra bandwidth. For this test the
rate is 8Gbps vs. 1Gbps of the previous test.
brakmo [Fri, 1 Mar 2019 20:38:49 +0000 (12:38 -0800)]
bpf: User program for testing HBM
The program nrm creates a cgroup and attaches a BPF program to the
cgroup for testing HBM (Host Bandwidth Manager) for egress traffic.
One still needs to create network traffic. This can be done through
netesto, netperf or iperf3.
A follow-up patch contains a script to create traffic.
USAGE: hbm [-d] [-l] [-n <id>] [-r <rate>] [-s] [-t <secs>]
[-w] [-h] [prog]
Where:
-d Print BPF trace debug buffer
-l Also limit flows doing loopback
-n <#> To create cgroup "/hbm#" and attach prog. Default is /nrm1
This is convenient when testing HBM in more than 1 cgroup
-r <rate> Rate limit in Mbps
-s Get HBM stats (marked, dropped, etc.)
-t <time> Exit after specified seconds (deault is 0)
-w Work conserving flag. cgroup can increase its bandwidth
beyond the rate limit specified while there is available
bandwidth. Current implementation assumes there is only
NIC (eth0), but can be extended to support multiple NICs.
Currrently only supported for egress. Note, this is just
a proof of concept.
-h Print this info
prog BPF program file name. Name defaults to hbm_out_kern.o
More information about HBM can be found in the paper "BPF Host Resource
Management" presented at the 2018 Linux Plumbers Conference, Networking Track
(http://vger.kernel.org/lpc_net2018_talks/LPC%20BPF%20Network%20Resource%20Paper.pdf)
brakmo [Fri, 1 Mar 2019 20:38:48 +0000 (12:38 -0800)]
bpf: Sample HBM BPF program to limit egress bw
A cgroup skb BPF program to limit cgroup output bandwidth.
It uses a modified virtual token bucket queue to limit average
egress bandwidth. The implementation uses credits instead of tokens.
Negative credits imply that queueing would have happened (this is
a virtual queue, so no queueing is done by it. However, queueing may
occur at the actual qdisc (which is not used for rate limiting).
This implementation uses 3 thresholds, one to start marking packets and
the other two to drop packets:
CREDIT
- <--------------------------|------------------------> +
| | | 0
| Large pkt |
| drop thresh |
Small pkt drop Mark threshold
thresh
The effect of marking depends on the type of packet:
a) If the packet is ECN enabled, then the packet is ECN ce marked.
The current mark threshold is tuned for DCTCP.
c) Else, it is dropped if it is a large packet.
If the credit is below the drop threshold, the packet is dropped.
Note that dropping a packet through the BPF program does not trigger CWR
(Congestion Window Reduction) in TCP packets. A future patch will add
support for triggering CWR.
This BPF program actually uses 2 drop thresholds, one threshold
for larger packets (>= 120 bytes) and another for smaller packets. This
protects smaller packets such as SYNs, ACKs, etc.
The default bandwidth limit is set at 1Gbps but this can be changed by
a user program through a shared BPF map. In addition, by default this BPF
program does not limit connections using loopback. This behavior can be
overwritten by the user program. There is also an option to calculate
some statistics, such as percent of packets marked or dropped, which
the user program can access.
brakmo [Fri, 1 Mar 2019 20:38:46 +0000 (12:38 -0800)]
bpf: add bpf helper bpf_skb_ecn_set_ce
This patch adds a new bpf helper BPF_FUNC_skb_ecn_set_ce
"int bpf_skb_ecn_set_ce(struct sk_buff *skb)". It is added to
BPF_PROG_TYPE_CGROUP_SKB typed bpf_prog which currently can
be attached to the ingress and egress path. The helper is needed
because his type of bpf_prog cannot modify the skb directly.
This helper is used to set the ECN field of ECN capable IP packets to ce
(congestion encountered) in the IPv6 or IPv4 header of the skb. It can be
used by a bpf_prog to manage egress or ingress network bandwdith limit
per cgroupv2 by inducing an ECN response in the TCP sender.
This works best when using DCTCP.
1) Fix refcount leak in act_ipt during replace, from Davide Caratti.
2) Set task state properly in tun during blocking reads, from Timur
Celik.
3) Leaked reference in DSA, from Wen Yang.
4) NULL deref in act_tunnel_key, from Vlad Buslov.
5) cipso_v4_erro can reference the skb IPCB in inappropriate contexts
thus referencing garbage, from Nazarov Sergey.
6) Don't accept RTA_VIA and RTA_GATEWAY in contexts where those
attributes make no sense.
7) Fix hung sendto in tipc, from Tung Nguyen.
8) Out-of-bounds access in netlabel, from Paul Moore.
9) Grant reference leak in xen-netback, from Igor Druzhinin.
10) Fix tx stalls with lan743x, from Bryan Whitehead.
11) Fix interrupt storm with mv88e6xxx, from Hein Kallweit.
12) Memory leak in sit on device registry failure, from Mao Wenan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
net: sit: fix memory leak in sit_init_net()
net: dsa: mv88e6xxx: Fix statistics on mv88e6161
geneve: correctly handle ipv6.disable module parameter
net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode
bpf: fix sanitation rewrite in case of non-pointers
ipv4: Add ICMPv6 support when parse route ipproto
MIPS: eBPF: Fix icache flush end address
lan743x: Fix TX Stall Issue
net: phy: phylink: fix uninitialized variable in phylink_get_mac_state
net: aquantia: regression on cpus with high cores: set mode with 8 queues
selftests: fixes for UDP GRO
bpf: drop refcount if bpf_map_new_fd() fails in map_create()
net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X
net: dsa: mv88e6xxx: Fix u64 statistics
xen-netback: don't populate the hash cache on XenBus disconnect
xen-netback: fix occasional leak of grant ref mappings under memory pressure
sctp: chunk.c: correct format string for size_t in printk
net: netem: fix skb length BUG_ON in __skb_to_sgvec
netlabel: fix out-of-bounds memory accesses
ipv4: Pass original device to ip_rcv_finish_core
...
Bluetooth: hci_qca: Reduce delay after sending baudrate request for WCN3990
The current 300ms delay after a baudrate change is extremely long.
For WCN3990 it is sufficient to wait 10ms after the baudrate change
request has been sent over the wire.
Linus Torvalds [Sat, 2 Mar 2019 16:32:02 +0000 (08:32 -0800)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull more crypto fixes from Herbert Xu:
"This fixes a couple of issues in arm64/chacha that was introduced in
5.0"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm64/chacha - fix hchacha_block_neon() for big endian
crypto: arm64/chacha - fix chacha_4block_xor_neon() for big endian
David S. Miller [Sat, 2 Mar 2019 07:23:35 +0000 (23:23 -0800)]
Merge branch 'net-mvpp2-fixes-and-improvements'
Antoine Tenart says:
====================
net: mvpp2: fixes and improvements
This series aims to improve the Marvell PPv2 driver and to fix various
issues we encountered while testing the ports in many different
configurations. The series is based on top of Russell PPv2 phylink
rework and improvement.
I'm not sending a v2 of the previous fixes series as half the patches
are not the same and lots of development happened in between.
While this series contains fixes, it's sent to net-next as it is based
on top of Russell patches that were merged into net-next. I'm also
aiming at net-next as the series reworks critical paths of the PPv2
driver, such as the reset handling of various blocks, to let more weeks
for users to tests and for possible fixes to be sent before it lands
into a stable kernel version.
The series is divided into three parts:
- Patches 1 to 3 are cosmetic changes, sent alongside the series, as I
saw these small issues while working on this.
- Patches 5 to 8 are fixing (or improving) individual issues that we
found while testing PPv2.1 and PPv2.2 ports while using various
interfaces.
Notable fixes are we support back RGMII interfaces (on both PPv2.1 and
PPv2.2), as their support was broken by previous patches. We also
reworked the RXQ computation as the RXQ assignment was not checking
the maximum number of RXQ available, and was broken for PPv2.1.
- As discussed in a previous fixes series, patches 9 to 15 rework the
way blocks are set in reset in the PPv2 engine (plus related changes).
There are four blocks we want to control the reset status: two MAC
(GMAC and XLG MAC) and two PCS (MPCS and XPCS). The XLG MAC is used
for 10G connexions and uses the MPCS or the XPCS depending on the mode
used (10GKR / XAUI / RXAUI) and the GMAC is used for the other modes.
The idea is to set all blocks in reset by default, and when not used,
and to de-assert the reset only when a block is used. There are four
cases to take in account:
1. Boot time: all four blocks should be put in reset, as we do not
know their initial state (configured by the firmware/bootloader).
2. Link up: only the blocks used by a given mode should be put out of
reset (eg. 10GKR uses the XLG MAC and the MPCS).
3. Mode reconfiguration: some ports may support mode reconfiguration,
and switching between the GMAC and the XLG MAC (or between the two
PCS). All blocks should be put in reset, and only the one used
should be put out of reset.
4. Link down: all four blocks are put in reset.
====================
Antoine Tenart [Fri, 1 Mar 2019 10:52:17 +0000 (11:52 +0100)]
net: mvpp2: set the GMAC, XLG MAC, XPCS and MPCS in reset when a port is down
This patch adds calls in the stop() helper to ensure both MACs and
both PCS blocks are set in reset when the user manually sets a port
down. This is done so that we have the exact same block reset states at
boot time and when a port is set down.
Antoine Tenart [Fri, 1 Mar 2019 10:52:16 +0000 (11:52 +0100)]
net: mvpp2: set the XPCS and MPCS in reset when not used
This patch sets both the XPCS and MPCS blocks in reset when they aren't
used. This is done both at boot time and when reconfiguring a port mode.
The advantage now is that only the PCS used is set out of reset when the
port is configured (10GKR uses the MCPS while RXAUI uses the XPCS).
Antoine Tenart [Fri, 1 Mar 2019 10:52:15 +0000 (11:52 +0100)]
net: mvpp2: reset the MACs when reconfiguring a port
This patch makes sure both PPv2 MACs (GMAC + XLG MAC) are set in reset
while a port is reconfigured. This is done so that we make sure a MAC is
in a reset state when not used, as only one of the two will be set out
of reset after the port is configured properly.
Antoine Tenart [Fri, 1 Mar 2019 10:52:14 +0000 (11:52 +0100)]
net: mvpp2: rework the XLG MAC reset handling
This patch reworks the way the XLG MAC is set in reset: the XLG MAC is
set in reset at probe time and taken out of this state only when used.
The idea is to move forward a situation where only the blocks used are
taken out of reset. This also has the effect to handle the GMAC and the
XLG MAC in a similar way (the GMAC already is set in reset at boot
time).
Antoine Tenart [Fri, 1 Mar 2019 10:52:13 +0000 (11:52 +0100)]
net: mvpp2: force the XLG MAC link up or down when not using in-band
This patch force the XLG MAC link state in the phylink link_up() and
link_down() helpers when not using in-band auto-negotiation. This mimics
what's already done for the GMAC and follows what's advised in the
phylink documentation.
Antoine Tenart [Fri, 1 Mar 2019 10:52:12 +0000 (11:52 +0100)]
net: mvpp2: only update the XLG configuration when needed
This patch improves the XLG configuration function, to only update the
XLG configuration register when a change is needed. This helps not
writing over and over the same XLG configuration each time phylink
request the MAC to be configured. This mimics the GMAC configuration
function.
Antoine Tenart [Fri, 1 Mar 2019 10:52:11 +0000 (11:52 +0100)]
net: mvpp2: always disable both MACs when disabling a port
This patch modifies the port_disable() helper to always disable both the
GMAC and the XLG MAC when called. At boot time we do not know of a port
was enabled in the firmware/bootloader, and if so what mode was used
(hence which of the two MACs was used).
This also help in implementing a logic where all blocks are disabled
when not used, and only enabled regarding the current mode used on a
given port.
Antoine Tenart [Fri, 1 Mar 2019 10:52:10 +0000 (11:52 +0100)]
net: mvpp2: some AN fields require the link to be down when updated
The GMAC configuration helper modifies values in the auto-negotiation
register. Some of its values require the port to be forced down when
modifying their values. This patches fixes the check made on the bit to
be updated in this register, so that the port is forced down when
needed. This fix cases where some of those parameters were updated, but
not taken into account, such as when using RGMII interfaces.
Fixes: d14e078f23cc ("net: marvell: mvpp2: only reprogram what is necessary on mac_config") Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Antoine Tenart [Fri, 1 Mar 2019 10:52:09 +0000 (11:52 +0100)]
net: mvpp2: fix the computation of the RXQs
The patch fixes the computation of RXQs being used by the PPv2 driver,
which is set depending on the PPv2 engine version and the queue mode
used. There are three cases:
- PPv2.1: 1 RXQ per CPU.
- PPV2.2 with MVPP2_QDIST_MULTI_MODE: 1 RXQ per CPU.
- PPv2.2 with MVPP2_QDIST_SINGLE_MODE: 1 RXQ is shared between the CPUs.
The PPv2 engine supports a maximum of 32 queues per port. This patch
adds a check so that we do not overstep this maximum.
It appeared the calculation was broken for PPv2.1 engines since f8c6ba8424b0, as PPv2.1 ports ended up with a single RXQ while they
needed 4. This patch fixes it.
Fixes: f8c6ba8424b0 ("net: mvpp2: use only one rx queue per port per CPU") Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Antoine Tenart [Fri, 1 Mar 2019 10:52:08 +0000 (11:52 +0100)]
net: mvpp2: fix validate for PPv2.1
The Phylink validate function is the Marvell PPv2 driver makes a check
on the GoP id. This is valid an has to be done when using PPv2.2 engines
but makes no sense when using PPv2.1. The check done when using an RGMII
interface makes sure the GoP id is not 0, but this breaks PPv2.1. Fixes
it.
Fixes: 0fb628f0f250 ("net: mvpp2: fix phylink handling of invalid PHY modes") Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Antoine Tenart [Fri, 1 Mar 2019 10:52:07 +0000 (11:52 +0100)]
net: mvpp2: reconfiguring the port interface is PPv2.2 specific
This patch adds a check on the PPv2 version in-use not to reconfigure
the port mode when an interface is updated when using PPv2.1 as the
functions called are PPv2.2 specific.
Antoine Tenart [Fri, 1 Mar 2019 10:52:06 +0000 (11:52 +0100)]
net: mvpp2: a port can be disabled even if we use the link IRQ
We had a check in the mvpp2_mac_link_down() function (called by phylink)
to avoid disabling the port when link interrupts are used. It turned out
the interrupt can still be used with the port disabled. We can thus
remove this check.
Antoine Tenart [Fri, 1 Mar 2019 10:52:04 +0000 (11:52 +0100)]
net: mvpp2: update the port documentation regarding the GoP
The Marvell PPv2 port structure stores the GoP id of a given port. This
information is specific to PPv2.2, but cannot be used by PPv2.1. Update
its comment to denote this specificity.
Eric Dumazet [Thu, 28 Feb 2019 23:17:28 +0000 (15:17 -0800)]
net: support 64bit rates for getsockopt(SO_MAX_PACING_RATE)
For legacy applications using 32bit variable, SO_MAX_PACING_RATE
has to cap the returned value to 0xFFFFFFFF, meaning that
rates above 34.35 Gbit are capped.
This patch allows applications to read socket pacing rate
at full resolution, if they provide a 64bit variable to store it,
and the kernel is 64bit.
Lucas Bates [Thu, 28 Feb 2019 22:38:40 +0000 (17:38 -0500)]
tc-testing: Allow test cases to be skipped
By adding a check for an optional key/value pair to the test case
data, individual test cases may be skipped to prevent tdc from
aborting a test run due to setup or teardown failure.
If a test case is skipped, it will still appear in the results
output to allow for a consistent number of executed tests in each
run. However, the test will be marked as skipped.
This support for skipping extends to any plugins that may generate
additional results for each executed test.
When IPv6 is compiled but disabled at runtime, geneve_sock_add returns
-EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
operation of bringing up the tunnel.
Ignore failure of IPv6 socket creation for metadata based tunnels caused by
IPv6 not being available.
This is the same fix as what commit d074bf960044 ("vxlan: correctly handle
ipv6.disable module parameter") is doing for vxlan.
Note there's also commit c0a47e44c098 ("geneve: should not call rt6_lookup()
when ipv6 was disabled") which fixes a similar issue but for regular
tunnels, while this patch is needed for metadata based tunnels.
David S. Miller [Sat, 2 Mar 2019 05:44:11 +0000 (21:44 -0800)]
Merge branch 'mlxsw-rehash-split'
Ido Schimmel says:
====================
mlxsw: spectrum_acl: Split rehash work into chunks
Jiri says:
When rehash happens on a vregion with many rules and they are being
migrated, it might take significant time to finish the job. During that
time vregion->lock is taken which prevents rules from being
added/deleted from the vregion.
Aim of this patchset is to allow to interrupt migration of rules during
rehash, reschedule and give chance for rules to be added/deleted. Then
continue migration in another execution of scheduled work.
====================
Jiri Pirko [Thu, 28 Feb 2019 06:59:26 +0000 (06:59 +0000)]
mlxsw: spectrum_acl: Remember where to continue rehash migration
Store pointer to vchunk where the migration was interrupted, as well as
ventry pointer to start from and to stop at (during rollback). This
saved pointers need to be forgotten in case of ventries list or vchunk
list changes, which is done by couple of "changed" helpers.
Jiri Pirko [Thu, 28 Feb 2019 06:59:25 +0000 (06:59 +0000)]
mlxsw: spectrum_acl: Allow to interrupt/continue rehash work
Currently, migration of vregions with many entries may take long time
during which insertions and removals of the rules are blocked
due to wait to acquire vregion->lock.
To overcome this, allow to interrupt and continue rehash work according
to the set credits - number of rules to migrate.
Jiri Pirko [Thu, 28 Feb 2019 06:59:24 +0000 (06:59 +0000)]
mlxsw: spectrum_acl: Do rollback as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all()
In order to simplify the code and to prepare it for
interrupted/continued migration process, do the rollback in case of
migration error as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all().
It can be understood as "migrate all back".
Jiri Pirko [Thu, 28 Feb 2019 06:59:24 +0000 (06:59 +0000)]
mlxsw: spectrum_acl: Put vchunk migrate start/end code into separate functions
In preparations of interrupt/continue of rehash work, put the code that
is done at the beginning/end of vchunk migrate function into separate
functions.
Jiri Pirko [Thu, 28 Feb 2019 06:59:19 +0000 (06:59 +0000)]
mlxsw: spectrum_acl: Push code start/end from mlxsw_sp_acl_tcam_vregion_migrate()
Push code from the beginning and end of function
mlxsw_sp_acl_tcam_vregion_migrate() into rehash_start()/end() functions.
Then all the things needed to be done before and after the actual
migration process will be grouped together.
Check if the entry is already in a chunk where we want it to be. In that
case, skip migration. This is preparation for "per parts" migration
where this situation may occur.
Heiner Kallweit [Thu, 28 Feb 2019 06:39:15 +0000 (07:39 +0100)]
net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode
When debugging another issue I faced an interrupt storm in this
driver (88E6390, port 9 in SGMII mode), consisting of alternating
link-up / link-down interrupts. Analysis showed that the driver
wanted to set a cmode that was set already. But so far
mv88e6390x_port_set_cmode() doesn't check this and powers down
SERDES, what causes the link to break, and eventually results in
the described interrupt storm.
Fix this by checking whether the cmode actually changes. We want
that the very first call to mv88e6390x_port_set_cmode() always
configures the registers, therefore initialize port.cmode with
a value that is different from any supported cmode value.
We have to take care that we only init the ports cmode once
chip->info->num_ports is set.
v2:
- add small helper and init the number of actual ports only
Fixes: 364e9d7776a3 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change") Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Florian Fainelli [Thu, 28 Feb 2019 00:29:16 +0000 (16:29 -0800)]
switchdev: Remove unused transaction item queue
There are no more in tree users of the
switchdev_trans_item_{dequeue,enqueue} or switchdev_trans_item structure
in the kernel since commit 00fc0c51e35b ("rocker: Change world_ops API
and implementation to be switchdev independant").
Remove this unused code and update the documentation accordingly since.
Daniel Borkmann [Fri, 1 Mar 2019 21:05:29 +0000 (22:05 +0100)]
bpf: fix sanitation rewrite in case of non-pointers
Marek reported that he saw an issue with the below snippet in that
timing measurements where off when loaded as unpriv while results
were reasonable when loaded as privileged:
[...]
uint64_t a = bpf_ktime_get_ns();
uint64_t b = bpf_ktime_get_ns();
uint64_t delta = b - a;
if ((int64_t)delta > 0) {
[...]
Turns out there is a bug where a corner case is missing in the fix d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar
type from different paths"), namely fixup_bpf_calls() only checks
whether aux has a non-zero alu_state, but it also needs to test for
the case of BPF_ALU_NON_POINTER since in both occasions we need to
skip the masking rewrite (as there is nothing to mask).
====================
doc: net: ieee802154: move from plain text to rst
The ieee802154 subsystem doc was still in plain text. With the networking book
taking shape I thought it was time to do the first step and move it over to rst.
This really is only the minimal conversion. I need to take some time to update
and extend the docs.
The patches are based on net-next, but they only touch the networking book so I
would not expect and trouble. From what I have seen they would go through
Jonathan's tree after being acked by Dave? If you want this patches against a
different tree let me know.
====================
Moving the ieee802154 docs from a plain text file into the new rst
style. This commit only does the minimal needed change to bring the
documentation over. Follow up patches will improve and extend on this.
Jakub Kicinski [Wed, 27 Feb 2019 19:36:36 +0000 (11:36 -0800)]
devlink: fix kdoc
devlink suffers from a few kdoc warnings:
net/core/devlink.c:5292: warning: Function parameter or member 'dev' not described in 'devlink_register'
net/core/devlink.c:5351: warning: Function parameter or member 'port_index' not described in 'devlink_port_register'
net/core/devlink.c:5753: warning: Function parameter or member 'parent_resource_id' not described in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Function parameter or member 'size_params' not described in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'top_hierarchy' description in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'reload_required' description in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'parent_reosurce_id' description in 'devlink_resource_register'
net/core/devlink.c:6451: warning: Function parameter or member 'region' not described in 'devlink_region_snapshot_create'
net/core/devlink.c:6451: warning: Excess function parameter 'devlink_region' description in 'devlink_region_snapshot_create'
Igor Russkikh [Wed, 27 Feb 2019 12:10:09 +0000 (12:10 +0000)]
net: aquantia: fixed instack structure overflow
This is a real stack undercorruption found by kasan build.
The issue did no harm normally because it only overflowed
2 bytes after `bitary` array which on most architectures
were mapped into `err` local.
Fixes: bab6de8fd180 ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions.") Signed-off-by: Nikita Danilov <[email protected]> Signed-off-by: Igor Russkikh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Hangbin Liu [Wed, 27 Feb 2019 08:15:29 +0000 (16:15 +0800)]
ipv4: Add ICMPv6 support when parse route ipproto
For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers.
But for ip -6 route, currently we only support tcp, udp and icmp.
Add ICMPv6 support so we can match ipv6-icmp rules for route lookup.
v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to
rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family.
Reported-by: Jianlin Shi <[email protected]> Fixes: eacb9384a3fe ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Yonghong Song [Fri, 1 Mar 2019 06:19:41 +0000 (22:19 -0800)]
samples/bpf: silence compiler warning for xdpsock_user.c
Compiling xdpsock_user.c with 4.8.5, I hit the following
compilation warning:
HOSTCC samples/bpf/xdpsock_user.o
/data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c: In function ‘main’:
/data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:449:6: warning: ‘idx_cq’ may be used unini
tialized in this function [-Wmaybe-uninitialized]
u32 idx_cq, idx_fq;
^
/data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:606:7: warning: ‘idx_rx’ may be used unini
tialized in this function [-Wmaybe-uninitialized]
u32 idx_rx, idx_tx = 0;
^
/data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:506:6: warning: ‘idx_rx’ may be used unini
tialized in this function [-Wmaybe-uninitialized]
u32 idx_rx, idx_fq = 0;
As an example, the code pattern looks like:
u32 idx_cq;
...
ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
if (ret) {
...
}
... idx_fq ...
The compiler warns since it does not know whether &idx_fq is assigned
or not inside the library function xsk_ring_prod__reserve().
Let us assign an initial value 0 to such auto variables to silence
compiler warning.
Yonghong Song [Fri, 1 Mar 2019 06:18:16 +0000 (22:18 -0800)]
selftests/bpf: set unlimited RLIMIT_MEMLOCK for test_sock_fields
This is to avoid permission denied error. A lot of systems
may have a much lower number, e.g., 64KB, for RLIMIT_MEMLOCK,
which may not be sufficient for the test to run successfully.
Fixes: e0b27b3f97b8 ("bpf: Add test_sock_fields for skb->sk and bpf_tcp_sock") Signed-off-by: Yonghong Song <[email protected]> Acked-by: Song Liu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>