Git Repo - linux.git/log

Merge branch 'net-dsa-microchip-add-KSZ9893-switch-support'

Tristram Ha says:

====================
net: dsa: microchip: add KSZ9893 switch support

This series of patches is to modify the KSZ9477 DSA driver to support
running KSZ9893 switch.

The KSZ9893 switch is similar to KSZ9477 except the ingress tail tag has
1 byte instead of 2 bytes. The XMII register that governs the MAC
communication also has different register definitions.

v1
- Put KSZ9893 tagging in separate patch
- Remove other switch support
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: microchip: add KSZ9893 switch support

Add KSZ9893 switch support in KSZ9477 driver. This switch is similar to
KSZ9477 except the ingress tail tag has 1 byte instead of 2 bytes, so
KSZ9893 tagging will be used.

The XMII register that governs how the host port communicates with the
MAC also has different register definitions.

Signed-off-by: Tristram Ha <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: add KSZ9893 switch tagging support

KSZ9893 switch is similar to KSZ9477 switch except the ingress tail tag
has 1 byte instead of 2 bytes. The size of the portmap is smaller and
so the override and lookup bits are also moved.

Signed-off-by: Tristram Ha <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: net: dsa: document additional Microchip KSZ9477 family switches

Document additional Microchip KSZ9477 family switches.

Show how KSZ8565 switch should be configured as the host port is port 7
instead of port 5.

Signed-off-by: Tristram Ha <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'appletalk-small-cleanup-and-bugfix'

Yue Haibing says:

====================
appletalk: small cleanup and bugfix

v2:
- Add cover letter log

This patch series mainly fix a use-after-free bug in atalk_proc_exit.
patch 1 use remove_proc_subtree helper to simplify atalk_proc fs code,
also some other cleanup.
patch 2 add proper error cleanup path in atalk_init to fix the issue, which
based on the patch 1 because of the change of atalk_proc_exit context.
====================

Signed-off-by: David S. Miller <[email protected]>

appletalk: Fix use-after-free in atalk_proc_exit

KASAN report this:

BUG: KASAN: use-after-free in pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71
Read of size 8 at addr ffff8881f41fe5b0 by task syz-executor.0/2806

CPU: 0 PID: 2806 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xfa/0x1ce lib/dump_stack.c:113
print_address_description+0x65/0x270 mm/kasan/report.c:187
kasan_report+0x149/0x18d mm/kasan/report.c:317
pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71
remove_proc_entry+0xe8/0x420 fs/proc/generic.c:667
atalk_proc_exit+0x18/0x820 [appletalk]
atalk_exit+0xf/0x5a [appletalk]
__do_sys_delete_module kernel/module.c:1018 [inline]
__se_sys_delete_module kernel/module.c:961 [inline]
__x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961
do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fb2de6b9c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200001c0
RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb2de6ba6bc
R13: 00000000004bccaa R14: 00000000006f6bc8 R15: 00000000ffffffff

Allocated by task 2806:
set_track mm/kasan/common.c:85 [inline]
__kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496
slab_post_alloc_hook mm/slab.h:444 [inline]
slab_alloc_node mm/slub.c:2739 [inline]
slab_alloc mm/slub.c:2747 [inline]
kmem_cache_alloc+0xcf/0x250 mm/slub.c:2752
kmem_cache_zalloc include/linux/slab.h:730 [inline]
__proc_create+0x30f/0xa20 fs/proc/generic.c:408
proc_mkdir_data+0x47/0x190 fs/proc/generic.c:469
0xffffffffc10c01bb
0xffffffffc10c0166
do_one_initcall+0xfa/0x5ca init/main.c:887
do_init_module+0x204/0x5f6 kernel/module.c:3460
load_module+0x66b2/0x8570 kernel/module.c:3808
__do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2806:
set_track mm/kasan/common.c:85 [inline]
__kasan_slab_free+0x130/0x180 mm/kasan/common.c:458
slab_free_hook mm/slub.c:1409 [inline]
slab_free_freelist_hook mm/slub.c:1436 [inline]
slab_free mm/slub.c:2986 [inline]
kmem_cache_free+0xa6/0x2a0 mm/slub.c:3002
pde_put+0x6e/0x80 fs/proc/generic.c:647
remove_proc_entry+0x1d3/0x420 fs/proc/generic.c:684
0xffffffffc10c031c
0xffffffffc10c0166
do_one_initcall+0xfa/0x5ca init/main.c:887
do_init_module+0x204/0x5f6 kernel/module.c:3460
load_module+0x66b2/0x8570 kernel/module.c:3808
__do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8881f41fe500
which belongs to the cache proc_dir_entry of size 256
The buggy address is located 176 bytes inside of
256-byte region [ffff8881f41fe500, ffff8881f41fe600)
The buggy address belongs to the page:
page:ffffea0007d07f80 count:1 mapcount:0 mapping:ffff8881f6e69a00 index:0x0
flags: 0x2fffc0000000200(slab)
raw: 02fffc0000000200 dead000000000100 dead000000000200 ffff8881f6e69a00
raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8881f41fe480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff8881f41fe500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8881f41fe580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8881f41fe600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ffff8881f41fe680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

It should check the return value of atalk_proc_init fails,
otherwise atalk_exit will trgger use-after-free in pde_subdir_find
while unload the module.This patch fix error cleanup path of atalk_init

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

appletalk: use remove_proc_subtree to simplify procfs code

Use remove_proc_subtree to remove the whole subtree
on cleanup.Also do some cleanup.

Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mscc: Enable all ports in QSGMII

When Ocelot phy-mode is QSGMII, all 4 ports involved in
QSGMII shall be kept out of reset and
Tx lanes shall be enabled to pass the data.

Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Kavya Sree Kotagiri <[email protected]>
Signed-off-by: Steen Hegelund <[email protected]>
Co-developed-by: Steen Hegelund <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"One more set of simple ARM platform fixes:

   - A boot regression on qualcomm msm8998

   - Gemini display controllers got turned off by accident

   - incorrect reference counting in optee"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  tee: optee: add missing of_node_put after of_device_is_available
  arm64: dts: qcom: msm8998: Extend TZ reserved memory area
  ARM: dts: gemini: Re-enable display controller

net: sched: put back q.qlen into a single location

In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'")
John made the assumption that the data path had no need to read
the qdisc qlen (number of packets in the qdisc).

It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO
children.

But pfifo_fast can be used as leaf in class full qdiscs, and existing
logic needs to access the child qlen in an efficient way.

HTB breaks badly, since it uses cl->leaf.q->q.qlen in :
  htb_activate() -> WARN_ON()
  htb_dequeue_tree() to decide if a class can be htb_deactivated
  when it has no more packets.

HFSC, DRR, CBQ, QFQ have similar issues, and some calls to
qdisc_tree_reduce_backlog() also read q.qlen directly.

Using qdisc_qlen_sum() (which iterates over all possible cpus)
in the data path is a non starter.

It seems we have to put back qlen in a central location,
at least for stable kernels.

For all qdisc but pfifo_fast, qlen is guarded by the qdisc lock,
so the existing q.qlen{++|--} are correct.

For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}()
because the spinlock might be not held (for example from
pfifo_fast_enqueue() and pfifo_fast_dequeue())

This patch adds atomic_qlen (in the same location than qlen)
and renames the following helpers, since we want to express
they can be used without qdisc lock, and that qlen is no longer percpu.

- qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec()
- qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc()

Later (net-next) we might revert this patch by tracking all these
qlen uses and replace them by a more efficient method (not having
to access a precise qlen, but an empty/non_empty status that might
be less expensive to maintain/track).

Another possibility is to have a legacy pfifo_fast version that would
be used when used a a child qdisc, since the parent qdisc needs
a spinlock anyway. But then, future lockless qdiscs would also
have the same problem.

Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Jamal Hadi Salim <[email protected]>
Cc: Cong Wang <[email protected]>
Cc: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'mlx5-updates-2019-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-03-01

This series adds multipath offload support and contains some small updates
to mlx5 driver.

Multipath offload support from Roi Dayan:

We are going to track SW multipath route and related nexthops and reflect
that as port affinity to the HW.

1) Some patches are preparation.
2) add the multipath mode and fib events handling.
3) add support to handle offload failure for net error, i.e.
port down.
4) Small updates to match the behavior of multipath

Two small updates from Eran Ben Elisha,
5) Make a function static
6) Update PCIe supported devices list.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Add .release_ops to properly unroll .select_ops, use it from nft_compat.
   After this change, we can remove list of extensions too to simplify this
   codebase.

2) Update amanda conntrack helper to support v3.4, from Florian Tham.

3) Get rid of the obsolete BUGPRINT macro in ebtables, from
   Florian Westphal.

4) Merge IPv4 and IPv6 masquerading infrastructure into one single module.
   From Florian Westphal.

5) Patchset to remove nf_nat_l3proto structure to get rid of
   indirections, from Florian Westphal.

6) Skip unnecessary conntrack timeout updates in case the value is
   still the same, also from Florian Westphal.

7) Remove unnecessary 'fall through' comments in empty switch cases,
   from Li RongQing.

8) Fix lookup to fixed size hashtable sets on big endian with 32-bit keys.

9) Incorrect logic to deactivate path of fixed size hashtable sets,
   element was being tested to self.

10) Remove nft_hash_key(), the bitmap set is always selected for 16-bit
    keys.

11) Use boolean whenever possible in IPVS codebase, from Andrea Claudi.

12) Enter close state in conntrack if RST matches exact sequence number,
    from Florian Westphal.

13) Initialize dst_cache in tunnel extension, from wenxu.

14) Pass protocol as u16 to xt_check_match and xt_check_target, from
    Li RongQing.

15) SCTP header is granted to be in a linear area from IPVS NAT handler,
    from Xin Long.

16) Don't steal packets coming from slave VRF device from the
    ip_sabotage_in() path, from David Ahern.

17) Fix unsafe update of basechain stats, from Li RongQing.

18) Make sure CONNTRACK_LOCKS is power of 2 to let compiler optimize
    modulo operation as bitwise AND, from Li RongQing.

19) Use device_attribute instead of internal definition in the IDLETIMER
    target, from Sami Tolvanen.

20) Merge redir, masq and IPv4/IPv6 NAT chain types, from Florian Westphal.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2019-03-02

Here's one more bluetooth-next pull request for the 5.1 kernel:

- Added support for MediaTek MT7663U and MT7668U UART devices
- Cleanups & fixes to the hci_qca driver
- Fixed wakeup pin behavior for QCA6174A controller

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
"Two last minute fixes:

   - Prevent value evaluation via functions happening in the user access
     enabled region of __put_user() (put another way: make sure to
     evaluate the value to be stored in user space _before_ enabling
     user space accesses)

   - Correct the definition of a Hyper-V hypercall constant"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hyper-v: Fix definition of HV_MAX_FLUSH_REP_COUNT
  x86/uaccess: Don't leak the AC flag into __put_user() value evaluation

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Nine small fixes.

  The resume fix is a cosmetic removal of a warning with an incorrect
  condition causing it to alarm people wrongly.

  The other eight patches correct a thinko in Christoph Hellwig's DMA
  conversion series. Without it all these drivers end up with 32 bit DMA
  masks meaning they bounce any page over 4GB before sending it to the
  controller.

  Nowadays, even laptops mostly have memory above 4GB, so this can lead
  to significant performance degradation with all the bouncing"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: core: Avoid that system resume triggers a kernel warning
  scsi: hptiop: fix calls to dma_set_mask()
  scsi: hisi_sas: fix calls to dma_set_mask_and_coherent()
  scsi: csiostor: fix calls to dma_set_mask_and_coherent()
  scsi: bfa: fix calls to dma_set_mask_and_coherent()
  scsi: aic94xx: fix calls to dma_set_mask_and_coherent()
  scsi: 3w-sas: fix calls to dma_set_mask_and_coherent()
  scsi: 3w-9xxx: fix calls to dma_set_mask_and_coherent()
  scsi: lpfc: fix calls to dma_set_mask_and_coherent()

Merge branch 'split-test_progs'

Stanislav Fomichev says:

====================
Recently we had linux-next bpf/bpf-next conflict when we added new
functionality to the test_progs.c at the same location. Let's split
test_progs.c the same way we recently split test_verifier.c.

I follow the same patten we did in commit 2dfb40121ee8 ("selftests: bpf:
prepare for break up of verifier tests") for verifier: create
scaffolding to support dedicated files and slowly move the tests into
separate files.

The first patch adds scaffolding, subsequent patches move progs into
separate files.

In theory, many of the standalone tests can be migrated to this new
framework as well. They get the benefit of common CHECK macro and
bpf_find_map function which a lot of standalone tests need to redefine.

v3 changes:
* respin on top of commit ebace0e981b2 ("selftests/bpf: use
__bpf_constant_htons in test_prog.c for flow dissector")
* put bpf_rlimit.h into test_progs.c instead of test_progs.h

v2 changes:
* added cover letter, added more description about file structure
====================

Signed-off-by: Alexei Starovoitov <[email protected]>

selftests: bpf: break up test_progs - misc

Move the rest of prog tests into separate files.

Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests: bpf: break up test_progs - spinlock

Move spinlock prog tests into separate files.

Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests: bpf: break up test_progs - tracepoint

Move tracepoint prog tests into separate files.

Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests: bpf: break up test_progs - stackmap

Move stackmap prog tests into separate files.

Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests: bpf: break up test_progs - xdp

Move xdp prog tests into separate files.

Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests: bpf: break up test_progs - pkt access

Move pkt access prog tests into separate files.

Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests: bpf: break up test_progs - preparations

Add new prog_tests directory where tests are supposed to land.
Each prog_tests/<filename>.c is expected to have a global function
with signature 'void test_<filename>(void)'. Makefile automatically
generates prog_tests/tests.h file with entry for each prog_tests file:

#ifdef DECLARE
extern void test_<filename>(void);
...
#endif

#ifdef CALL
test_<filename>();
...
#endif

prog_tests/tests.h is included in test_progs.c in two places with
appropriate defines. This scheme allows us to move each function with
a separate patch without breaking anything.

Compared to the recent verifier split, each separate file here is
a compilation unit and test_progs.[ch] is now used as a place to put
some common routines that might be used by multiple tests.

Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

Bluetooth: mediatek: add support for MediaTek MT7663U and MT7668U UART devices

This adds the support of enabling MT7663U and MT7668U Bluetooth function
running on the top of btmtkuart driver.

There are a few differences between MT766[3,8]U and MT7622 where
MT766[3,8]U are standalone devices based on UART transport while MT7622
bluetooth is a built-in device on MediaTek SoC communicating with the host
through BTIF serial transport. Thus, extra setup sequence is necessary
for these standalone devices such as remote regulator and reset control via
GPIO, baud rate changing handshake between the host and device and so on.

Signed-off-by: Sean Wang <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Merge branch 'bpf_skb_ecn_set_ce'

Lawrence Brakmo says:

====================
Host Bandwidth Manager is a framework for limiting the bandwidth used
by v2 cgroups. It consists of 1 BPF helper, a sample BPF program to
limit egress bandwdith as well as a sample user program and script to
simplify HBM testing.

The sample HBM BPF program is not meant to be production quality, it is
provided as proof of concept. A lot more information, including sample
runs in some cases, are provided in the commit messages of the individual
patches.

A future patch will add support for reducing TCP's cwnd (we are evaluating
alternatives). Another patch will add support for fair queueing's Earliest
Departure Time. Until then, HBM is better suited for flows supporitng ECN.

In addition, A BPF program to limit ingress bandwidth will be provided in
an upcomming patchset.

Changes from v1 to v2:
  * bpf_tcp_enter_cwr can only be called from a cgroup skb egress BPF
    program (otherwise load or attach will fail) where we already hold
    the sk lock. Also only applies for ESTABLISHED state.
  * bpf_skb_ecn_set_ce uses INET_ECN_set_ce()
  * bpf_tcp_check_probe_timer now uses tcp_reset_xmit_timer. Can only be
    used by egress cgroup skb programs.
  * removed load_cg_skb user program.
  * nrm bpf egress program checks packet header in skb to determine
    ECN value. Now also works for ECN enabled UDP packets.
    Using ECN_ defines instead of integers.
  * NRM script test program now uses bpftool instead of load_cg_skb

Changes from v2 to v3:
  * Changed name from NRM (Network Resource Manager) to HBM (Host
    Bandwdith Manager)
  * The bpf helper to set ECN ce now checks that the header is writeable
  * Removed helper bpf functions that modified TCP state due to a concern
    about whether the socket is locked by the current thread.
====================

Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: HBM test script

Script for testing HBM (Host Bandwidth Manager) framework.
It creates a cgroup to use for testing and load a BPF program to limit
egress bandwidht. It then uses iperf3 or netperf to create
loads. The output is the goodput in Mbps (unless -D is used).

It can work on a single host using loopback or among two hosts (with netperf).
When using loopback, it is recommended to also introduce a delay of at least
1ms (-d=1), otherwise the assigned bandwidth is likely to be underutilized.

USAGE: $name [out] [-b=<prog>|--bpf=<prog>] [-c=<cc>|--cc=<cc>] [-D]
             [-d=<delay>|--delay=<delay>] [--debug] [-E]
             [-f=<#flows>|--flows=<#flows>] [-h] [-i=<id>|--id=<id >] [-l]
     [-N] [-p=<port>|--port=<port>] [-P] [-q=<qdisc>]
             [-R] [-s=<server>|--server=<server] [--stats]
     [-t=<time>|--time=<time>] [-w] [cubic|dctcp]
  Where:
    out               Egress (default egress)
    -b or --bpf       BPF program filename to load and attach.
                      Default is nrm_out_kern.o for egress,
    -c or -cc         TCP congestion control (cubic or dctcp)
    -d or --delay     Add a delay in ms using netem
    -D                In addition to the goodput in Mbps, it also outputs
                      other detailed information. This information is
                      test dependent (i.e. iperf3 or netperf).
    --debug           Print BPF trace buffer
    -E                Enable ECN (not required for dctcp)
    -f or --flows     Number of concurrent flows (default=1)
    -i or --id        cgroup id (an integer, default is 1)
    -l                Do not limit flows using loopback
    -N                Use netperf instead of iperf3
    -h                Help
    -p or --port      iperf3 port (default is 5201)
    -P                Use an iperf3 instance for each flow
    -q                Use the specified qdisc.
    -r or --rate      Rate in Mbps (default 1s 1Gbps)
    -R                Use TCP_RR for netperf. 1st flow has req
                      size of 10KB, rest of 1MB. Reply in all
                      cases is 1 byte.
                      More detailed output for each flow can be found
                      in the files netperf.<cg>.<flow>, where <cg> is the
                      cgroup id as specified with the -i flag, and <flow>
                      is the flow id starting at 1 and increasing by 1 for
                      flow (as specified by -f).
    -s or --server    hostname of netperf server. Used to create netperf
                      test traffic between to hosts (default is within host)
                      netserver must be running on the host.
    --stats           Get HBM stats (marked, dropped, etc.)
    -t or --time      duration of iperf3 in seconds (default=5)
    -w                Work conserving flag. cgroup can increase its
                      bandwidth beyond the rate limit specified
                      while there is available bandwidth. Current
                      implementation assumes there is only one NIC
                      (eth0), but can be extended to support multiple
                      NICs. This is just a proof of concept.
    cubic or dctcp    specify TCP CC to use

Examples:
./do_hbm_test.sh -l -d=1 -D --stats
     Runs a 5 second test, using a single iperf3 flow and with the default
     rate limit of 1Gbps and a delay of 1ms (using netem) using the default
     TCP congestion control on the loopback device (hence we use "-l" to
     enforce bandwidth limit on loopback device). Since no direction is
     specified, it defaults to egress. Since no TCP CC algorithm is
     specified it uses the system default (Cubic for this test).
     With no -D flag, only the value of the AGGREGATE OUTPUT would show.
     id refers to the cgroup id and is useful when running multi cgroup
     tests (supported by a future patch).
     This patchset does not support calling TCP's congesion window
     reduction, even when packets are dropped by the BPF program, resulting
     in a large number of packets dropped. It is recommended that the  current
     HBM implemenation only be used with ECN enabled flows. A future patch
     will add support for reducing TCP's cwnd and will increase the
     performance of non-ECN enabled flows.
   Output:
     Details for HBM in cgroup 1
     id:1
     rate_mbps:493
     duration:4.8 secs
     packets:11355
     bytes_MB:590
     pkts_dropped:4497
     bytes_dropped_MB:292
     pkts_marked_percent: 39.60
     bytes_marked_percent: 49.49
     pkts_dropped_percent: 39.60
     bytes_dropped_percent: 49.49
     PING AVG DELAY:2.075
     AGGREGATE_GOODPUT:505

./do_nrm_test.sh -l -d=1 -D --stats dctcp
     Same as above but using dctcp. Note that fewer bytes are dropped
     (0.01% vs. 49%).
   Output:
     Details for HBM in cgroup 1
     id:1
     rate_mbps:945
     duration:4.9 secs
     packets:16859
     bytes_MB:578
     pkts_dropped:1
     bytes_dropped_MB:0
     pkts_marked_percent: 28.74
     bytes_marked_percent: 45.15
     pkts_dropped_percent:  0.01
     bytes_dropped_percent:  0.01
     PING AVG DELAY:2.083
     AGGREGATE_GOODPUT:965

./do_nrm_test.sh -d=1 -D --stats
     As first example, but without limiting loopback device (i.e. no
     "-l" flag). Since there is no bandwidth limiting, no details for
     HBM are printed out.
   Output:
     Details for HBM in cgroup 1
     PING AVG DELAY:2.019
     AGGREGATE_GOODPUT:42655

./do_hbm.sh -l -d=1 -D --stats -f=2
     Uses iper3 and does 2 flows
./do_hbm.sh -l -d=1 -D --stats -f=4 -P
     Uses iperf3 and does 4 flows, each flow as a separate process.
./do_hbm.sh -l -d=1 -D --stats -f=4 -N
     Uses netperf, 4 flows
./do_hbm.sh -f=1 -r=2000 -t=5 -N -D --stats dctcp -s=<server-name>
     Uses netperf between two hosts. The remote host name is specified
     with -s= and you need to start the program netserver manually on
     the remote host. It will use 1 flow, a rate limit of 2Gbps and dctcp.
./do_hbm.sh -f=1 -r=2000 -t=5 -N -D --stats -w dctcp \
     -s=<server-name>
     As previous, but allows use of extra bandwidth. For this test the
     rate is 8Gbps vs. 1Gbps of the previous test.

Signed-off-by: Lawrence Brakmo <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: User program for testing HBM

The program nrm creates a cgroup and attaches a BPF program to the
cgroup for testing HBM (Host Bandwidth Manager) for egress traffic.
One still needs to create network traffic. This can be done through
netesto, netperf or iperf3.
A follow-up patch contains a script to create traffic.

USAGE: hbm [-d] [-l] [-n <id>] [-r <rate>] [-s] [-t <secs>]
           [-w] [-h] [prog]
  Where:
   -d        Print BPF trace debug buffer
   -l        Also limit flows doing loopback
   -n <#>    To create cgroup "/hbm#" and attach prog. Default is /nrm1
             This is convenient when testing HBM in more than 1 cgroup
   -r <rate> Rate limit in Mbps
   -s        Get HBM stats (marked, dropped, etc.)
   -t <time> Exit after specified seconds (deault is 0)
   -w        Work conserving flag. cgroup can increase its bandwidth
             beyond the rate limit specified while there is available
             bandwidth. Current implementation assumes there is only
             NIC (eth0), but can be extended to support multiple NICs.
             Currrently only supported for egress. Note, this is just
     a proof of concept.
   -h        Print this info
   prog      BPF program file name. Name defaults to hbm_out_kern.o

More information about HBM can be found in the paper "BPF Host Resource
Management" presented at the 2018 Linux Plumbers Conference, Networking Track
(http://vger.kernel.org/lpc_net2018_talks/LPC%20BPF%20Network%20Resource%20Paper.pdf)

Signed-off-by: Lawrence Brakmo <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: Sample HBM BPF program to limit egress bw

A cgroup skb BPF program to limit cgroup output bandwidth.
It uses a modified virtual token bucket queue to limit average
egress bandwidth. The implementation uses credits instead of tokens.
Negative credits imply that queueing would have happened (this is
a virtual queue, so no queueing is done by it. However, queueing may
occur at the actual qdisc (which is not used for rate limiting).

This implementation uses 3 thresholds, one to start marking packets and
the other two to drop packets:
                                 CREDIT
       - <--------------------------|------------------------> +
             |    |          |      0
             |  Large pkt    |
             |  drop thresh  |
  Small pkt drop             Mark threshold
      thresh

The effect of marking depends on the type of packet:
a) If the packet is ECN enabled, then the packet is ECN ce marked.
   The current mark threshold is tuned for DCTCP.
c) Else, it is dropped if it is a large packet.

If the credit is below the drop threshold, the packet is dropped.
Note that dropping a packet through the BPF program does not trigger CWR
(Congestion Window Reduction) in TCP packets. A future patch will add
support for triggering CWR.

This BPF program actually uses 2 drop thresholds, one threshold
for larger packets (>= 120 bytes) and another for smaller packets. This
protects smaller packets such as SYNs, ACKs, etc.

The default bandwidth limit is set at 1Gbps but this can be changed by
a user program through a shared BPF map. In addition, by default this BPF
program does not limit connections using loopback. This behavior can be
overwritten by the user program. There is also an option to calculate
some statistics, such as percent of packets marked or dropped, which
the user program can access.

A latter patch provides such a program (hbm.c)

Signed-off-by: Lawrence Brakmo <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: sync bpf.h to tools and update bpf_helpers.h

This patch syncs the uapi bpf.h to tools/ and also updates
bpf_herlpers.h in tools/

Signed-off-by: Lawrence Brakmo <[email protected]>
Acked-by: Song Liu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: add bpf helper bpf_skb_ecn_set_ce

This patch adds a new bpf helper BPF_FUNC_skb_ecn_set_ce
"int bpf_skb_ecn_set_ce(struct sk_buff *skb)". It is added to
BPF_PROG_TYPE_CGROUP_SKB typed bpf_prog which currently can
be attached to the ingress and egress path. The helper is needed
because his type of bpf_prog cannot modify the skb directly.

This helper is used to set the ECN field of ECN capable IP packets to ce
(congestion encountered) in the IPv6 or IPv4 header of the skb. It can be
used by a bpf_prog to manage egress or ingress network bandwdith limit
per cgroupv2 by inducing an ECN response in the TCP sender.
This works best when using DCTCP.

Signed-off-by: Lawrence Brakmo <[email protected]>
Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Song Liu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

dt-bindings: net: bluetooth: add support for MediaTek MT7663U and MT7668U UART devices

Update binding document with adding support of MT7663U and MT7668U UART
devices to mediatek-bluetooth.

Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Sean Wang <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix refcount leak in act_ipt during replace, from Davide Caratti.

2) Set task state properly in tun during blocking reads, from Timur
    Celik.

3) Leaked reference in DSA, from Wen Yang.

4) NULL deref in act_tunnel_key, from Vlad Buslov.

5) cipso_v4_erro can reference the skb IPCB in inappropriate contexts
    thus referencing garbage, from Nazarov Sergey.

6) Don't accept RTA_VIA and RTA_GATEWAY in contexts where those
    attributes make no sense.

7) Fix hung sendto in tipc, from Tung Nguyen.

8) Out-of-bounds access in netlabel, from Paul Moore.

9) Grant reference leak in xen-netback, from Igor Druzhinin.

10) Fix tx stalls with lan743x, from Bryan Whitehead.

11) Fix interrupt storm with mv88e6xxx, from Hein Kallweit.

12) Memory leak in sit on device registry failure, from Mao Wenan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  net: sit: fix memory leak in sit_init_net()
  net: dsa: mv88e6xxx: Fix statistics on mv88e6161
  geneve: correctly handle ipv6.disable module parameter
  net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode
  bpf: fix sanitation rewrite in case of non-pointers
  ipv4: Add ICMPv6 support when parse route ipproto
  MIPS: eBPF: Fix icache flush end address
  lan743x: Fix TX Stall Issue
  net: phy: phylink: fix uninitialized variable in phylink_get_mac_state
  net: aquantia: regression on cpus with high cores: set mode with 8 queues
  selftests: fixes for UDP GRO
  bpf: drop refcount if bpf_map_new_fd() fails in map_create()
  net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X
  net: dsa: mv88e6xxx: Fix u64 statistics
  xen-netback: don't populate the hash cache on XenBus disconnect
  xen-netback: fix occasional leak of grant ref mappings under memory pressure
  sctp: chunk.c: correct format string for size_t in printk
  net: netem: fix skb length BUG_ON in __skb_to_sgvec
  netlabel: fix out-of-bounds memory accesses
  ipv4: Pass original device to ip_rcv_finish_core
  ...

Bluetooth: hci_qca: Reduce delay after sending baudrate request for WCN3990

The current 300ms delay after a baudrate change is extremely long.
For WCN3990 it is sufficient to wait 10ms after the baudrate change
request has been sent over the wire.

Signed-off-by: Matthias Kaehlcke <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull more crypto fixes from Herbert Xu:
"This fixes a couple of issues in arm64/chacha that was introduced in
  5.0"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: arm64/chacha - fix hchacha_block_neon() for big endian
  crypto: arm64/chacha - fix chacha_4block_xor_neon() for big endian

Merge tag 'wireless-drivers-next-for-davem-2019-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 5.1

Last set of patches. A new hardware support for mt76 otherwise quite
normal.

Major changes:

mt76

* add driver for MT7603E/MT7628

ath10k

* more preparation for SDIO support

wil6210

* support up to 20 stations in AP mode
====================

Signed-off-by: David S. Miller <[email protected]>

net: sit: fix memory leak in sit_init_net()

If register_netdev() is failed to register sitn->fb_tunnel_dev,
it will go to err_reg_dev and forget to free netdev(sitn->fb_tunnel_dev).

BUG: memory leak
unreferenced object 0xffff888378daad00 (size 512):
  comm "syz-executor.1", pid 4006, jiffies 4295121142 (age 16.115s)
  hex dump (first 32 bytes):
    00 e6 ed c0 83 88 ff ff 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
backtrace:
    [<00000000d6dcb63e>] kvmalloc include/linux/mm.h:577 [inline]
    [<00000000d6dcb63e>] kvzalloc include/linux/mm.h:585 [inline]
    [<00000000d6dcb63e>] netif_alloc_netdev_queues net/core/dev.c:8380 [inline]
    [<00000000d6dcb63e>] alloc_netdev_mqs+0x600/0xcc0 net/core/dev.c:8970
    [<00000000867e172f>] sit_init_net+0x295/0xa40 net/ipv6/sit.c:1848
    [<00000000871019fa>] ops_init+0xad/0x3e0 net/core/net_namespace.c:129
    [<00000000319507f6>] setup_net+0x2ba/0x690 net/core/net_namespace.c:314
    [<0000000087db4f96>] copy_net_ns+0x1dc/0x330 net/core/net_namespace.c:437
    [<0000000057efc651>] create_new_namespaces+0x382/0x730 kernel/nsproxy.c:107
    [<00000000676f83de>] copy_namespaces+0x2ed/0x3d0 kernel/nsproxy.c:165
    [<0000000030b74bac>] copy_process.part.27+0x231e/0x6db0 kernel/fork.c:1919
    [<00000000fff78746>] copy_process kernel/fork.c:1713 [inline]
    [<00000000fff78746>] _do_fork+0x1bc/0xe90 kernel/fork.c:2224
    [<000000001c2e0d1c>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
    [<00000000ec48bd44>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<0000000039acff8a>] 0xffffffffffffffff

Signed-off-by: Mao Wenan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Fix statistics on mv88e6161

Despite what the datesheet says, the silicon implements the older way
of snapshoting the statistics. Change the op.

Reported-by: [email protected]
Tested-by: [email protected]
Fixes: 0ac64c394900 ("net: dsa: mv88e6xxx: mv88e6161 uses mv88e6320 stats snapshot")
Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ipv4: Fix NULL pointer dereference in route lookup

When calculating the multipath hash for input routes the flow info is
not available and therefore should not be used.

Fixes: 24ba14406c5c ("route: Add multipath_hash in flowi_common to make user-define hash")
Signed-off-by: Ido Schimmel <[email protected]>
Cc: wenxu <[email protected]>
Acked-by: wenxu <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-mvpp2-fixes-and-improvements'

Antoine Tenart says:

====================
net: mvpp2: fixes and improvements

This series aims to improve the Marvell PPv2 driver and to fix various
issues we encountered while testing the ports in many different
configurations. The series is based on top of Russell PPv2 phylink
rework and improvement.

I'm not sending a v2 of the previous fixes series as half the patches
are not the same and lots of development happened in between.

While this series contains fixes, it's sent to net-next as it is based
on top of Russell patches that were merged into net-next. I'm also
aiming at net-next as the series reworks critical paths of the PPv2
driver, such as the reset handling of various blocks, to let more weeks
for users to tests and for possible fixes to be sent before it lands
into a stable kernel version.

The series is divided into three parts:

- Patches 1 to 3 are cosmetic changes, sent alongside the series, as I
  saw these small issues while working on this.

- Patches 5 to 8 are fixing (or improving) individual issues that we
  found while testing PPv2.1 and PPv2.2 ports while using various
  interfaces.

  Notable fixes are we support back RGMII interfaces (on both PPv2.1 and
  PPv2.2), as their support was broken by previous patches. We also
  reworked the RXQ computation as the RXQ assignment was not checking
  the maximum number of RXQ available, and was broken for PPv2.1.

- As discussed in a previous fixes series, patches 9 to 15 rework the
  way blocks are set in reset in the PPv2 engine (plus related changes).

  There are four blocks we want to control the reset status: two MAC
  (GMAC and XLG MAC) and two PCS (MPCS and XPCS). The XLG MAC is used
  for 10G connexions and uses the MPCS or the XPCS depending on the mode
  used (10GKR / XAUI / RXAUI) and the GMAC is used for the other modes.

  The idea is to set all blocks in reset by default, and when not used,
  and to de-assert the reset only when a block is used. There are four
  cases to take in account:

  1. Boot time: all four blocks should be put in reset, as we do not
     know their initial state (configured by the firmware/bootloader).

  2. Link up: only the blocks used by a given mode should be put out of
     reset (eg. 10GKR uses the XLG MAC and the MPCS).

  3. Mode reconfiguration: some ports may support mode reconfiguration,
     and switching between the GMAC and the XLG MAC (or between the two
     PCS). All blocks should be put in reset, and only the one used
     should be put out of reset.

  4. Link down: all four blocks are put in reset.
====================

Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: set the GMAC, XLG MAC, XPCS and MPCS in reset when a port is down

This patch adds calls in the stop() helper to ensure both MACs and
both PCS blocks are set in reset when the user manually sets a port
down. This is done so that we have the exact same block reset states at
boot time and when a port is set down.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: set the XPCS and MPCS in reset when not used

This patch sets both the XPCS and MPCS blocks in reset when they aren't
used. This is done both at boot time and when reconfiguring a port mode.
The advantage now is that only the PCS used is set out of reset when the
port is configured (10GKR uses the MCPS while RXAUI uses the XPCS).

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: reset the MACs when reconfiguring a port

This patch makes sure both PPv2 MACs (GMAC + XLG MAC) are set in reset
while a port is reconfigured. This is done so that we make sure a MAC is
in a reset state when not used, as only one of the two will be set out
of reset after the port is configured properly.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: rework the XLG MAC reset handling

This patch reworks the way the XLG MAC is set in reset: the XLG MAC is
set in reset at probe time and taken out of this state only when used.
The idea is to move forward a situation where only the blocks used are
taken out of reset. This also has the effect to handle the GMAC and the
XLG MAC in a similar way (the GMAC already is set in reset at boot
time).

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: force the XLG MAC link up or down when not using in-band

This patch force the XLG MAC link state in the phylink link_up() and
link_down() helpers when not using in-band auto-negotiation. This mimics
what's already done for the GMAC and follows what's advised in the
phylink documentation.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: only update the XLG configuration when needed

This patch improves the XLG configuration function, to only update the
XLG configuration register when a change is needed. This helps not
writing over and over the same XLG configuration each time phylink
request the MAC to be configured. This mimics the GMAC configuration
function.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: always disable both MACs when disabling a port

This patch modifies the port_disable() helper to always disable both the
GMAC and the XLG MAC when called. At boot time we do not know of a port
was enabled in the firmware/bootloader, and if so what mode was used
(hence which of the two MACs was used).

This also help in implementing a logic where all blocks are disabled
when not used, and only enabled regarding the current mode used on a
given port.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: some AN fields require the link to be down when updated

The GMAC configuration helper modifies values in the auto-negotiation
register. Some of its values require the port to be forced down when
modifying their values. This patches fixes the check made on the bit to
be updated in this register, so that the port is forced down when
needed. This fix cases where some of those parameters were updated, but
not taken into account, such as when using RGMII interfaces.

Fixes: d14e078f23cc ("net: marvell: mvpp2: only reprogram what is necessary on mac_config")
Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: fix the computation of the RXQs

The patch fixes the computation of RXQs being used by the PPv2 driver,
which is set depending on the PPv2 engine version and the queue mode
used. There are three cases:

- PPv2.1: 1 RXQ per CPU.
- PPV2.2 with MVPP2_QDIST_MULTI_MODE: 1 RXQ per CPU.
- PPv2.2 with MVPP2_QDIST_SINGLE_MODE: 1 RXQ is shared between the CPUs.

The PPv2 engine supports a maximum of 32 queues per port. This patch
adds a check so that we do not overstep this maximum.

It appeared the calculation was broken for PPv2.1 engines since
f8c6ba8424b0, as PPv2.1 ports ended up with a single RXQ while they
needed 4. This patch fixes it.

Fixes: f8c6ba8424b0 ("net: mvpp2: use only one rx queue per port per CPU")
Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: fix validate for PPv2.1

The Phylink validate function is the Marvell PPv2 driver makes a check
on the GoP id. This is valid an has to be done when using PPv2.2 engines
but makes no sense when using PPv2.1. The check done when using an RGMII
interface makes sure the GoP id is not 0, but this breaks PPv2.1. Fixes
it.

Fixes: 0fb628f0f250 ("net: mvpp2: fix phylink handling of invalid PHY modes")
Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: reconfiguring the port interface is PPv2.2 specific

This patch adds a check on the PPv2 version in-use not to reconfigure
the port mode when an interface is updated when using PPv2.1 as the
functions called are PPv2.2 specific.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: a port can be disabled even if we use the link IRQ

We had a check in the mvpp2_mac_link_down() function (called by phylink)
to avoid disabling the port when link interrupts are used. It turned out
the interrupt can still be used with the port disabled. We can thus
remove this check.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: fix alignment of MVPP2_GMAC_CONFIG_MII_SPEED definition

Cosmetic patch fix the alignment of the MVPP2_GMAC_CONFIG_MII_SPEED
macro definition.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: update the port documentation regarding the GoP

The Marvell PPv2 port structure stores the GoP id of a given port. This
information is specific to PPv2.2, but cannot be used by PPv2.1. Update
its comment to denote this specificity.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mvpp2: fix a typo in the header

This cosmetic patch fixes a typo made in a comment in the Marvell PPv2
Ethernet driver header.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4vf: Call netif_carrier_off properly in pci_probe

netif_carrier_off() should be called only after register_netdev().

Signed-off-by: Arjun Vynipadath <[email protected]>
Signed-off-by: Vishal Kulkarni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'cxgb4-vf-link-state'

Arjun Vynipadath says:

====================
cxgb4/cxgb4vf: VF link state support

This series of patches adds support for ndo_set_vf_link_state in
cxgb4 driver.

Patch 1 implements ndo_set_vf_link_state
Patch 2 reverts the existing force_link_up behaviour for cxgb4vf driver.

v2:
- Using reverse christmas tree for variable declaration in Patch 1
- Patch 2 has no change
====================

Signed-off-by: David S. Miller <[email protected]>

cxgb4vf: Revert force link up behaviour

Reverting force link up changes since this behaviour can be
achieved using VF link state feature.

Reverts:
commit 0913667ab3ad ("cxgb4vf: Forcefully link up virtual interfaces")

Signed-off-by: Arjun Vynipadath <[email protected]>
Signed-off-by: Vishal Kulkarni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4: Add VF Link state support

Use ndo_set_vf_link_state to control the link states associated
with the virtual interfaces.

Signed-off-by: Arjun Vynipadath <[email protected]>
Signed-off-by: Vishal Kulkarni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4vf: Prefix adapter flags with CXGB4VF

Some of these macros were conflicting with global namespace,
hence prefixing them with CXGB4VF.

Signed-off-by: Arjun Vynipadath <[email protected]>
Signed-off-by: Vishal Kulkarni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drivers: net: Remove unnecessary semicolon

drivers/net/dsa/mt7530.c:649:3-4: Unneeded semicolon
drivers/net/ethernet/cisco/enic/enic_clsf.c:35:2-3: Unneeded semicolon
drivers/net/ethernet/faraday/ftgmac100.c:1640:2-3: Unneeded semicolon
drivers/net/ethernet/mediatek/mtk_eth_soc.c:229:2-3: Unneeded semicolon
drivers/net/usb/sr9700.c:437:2-3: Unneeded semicolon

Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Signed-off-by: YueHaibing <[email protected]>
Acked-by: Sean Wang <[email protected]> for mt7530 and mtk_eth_soc
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'SO_MAX_PACING_RATE-64-bit'

Eric Dumazet says:

====================
net: 64bit support for SO_MAX_PACING_RATE

64bit kernels adopted 64bit type for sk_max_pacing_rate in linux-4.20

We can change how we implement SO_MAX_PACING_RATE socket option
to support 64bit values to/from user space as well.
====================

Signed-off-by: David S. Miller <[email protected]>

net: support 64bit rates for getsockopt(SO_MAX_PACING_RATE)

For legacy applications using 32bit variable, SO_MAX_PACING_RATE
has to cap the returned value to 0xFFFFFFFF, meaning that
rates above 34.35 Gbit are capped.

This patch allows applications to read socket pacing rate
at full resolution, if they provide a 64bit variable to store it,
and the kernel is 64bit.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: support 64bit values for setsockopt(SO_MAX_PACING_RATE)

64bit kernels now support 64bit pacing rates.

This commit changes setsockopt() to accept 64bit
values provided by applications.

Old applications providing 32bit value are still supported,
but limited to the old 34Gbit limitation.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tc-testing: Allow test cases to be skipped

By adding a check for an optional key/value pair to the test case
data, individual test cases may be skipped to prevent tdc from
aborting a test run due to setup or teardown failure.

If a test case is skipped, it will still appear in the results
output to allow for a consistent number of executed tests in each
run. However, the test will be marked as skipped.

This support for skipping extends to any plugins that may generate
additional results for each executed test.

Signed-off-by: Lucas Bates <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

geneve: correctly handle ipv6.disable module parameter

When IPv6 is compiled but disabled at runtime, geneve_sock_add returns
-EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
operation of bringing up the tunnel.

Ignore failure of IPv6 socket creation for metadata based tunnels caused by
IPv6 not being available.

This is the same fix as what commit d074bf960044 ("vxlan: correctly handle
ipv6.disable module parameter") is doing for vxlan.

Note there's also commit c0a47e44c098 ("geneve: should not call rt6_lookup()
when ipv6 was disabled") which fixes a similar issue but for regular
tunnels, while this patch is needed for metadata based tunnels.

Signed-off-by: Jiri Benc <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2019-03-01

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) fix sanitation rewrite, from Daniel.

2) fix error path on map_new_fd, from Peng.

3) fix icache flush address, from Paul.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mlxsw-rehash-split'

Ido Schimmel says:

====================
mlxsw: spectrum_acl: Split rehash work into chunks

Jiri says:

When rehash happens on a vregion with many rules and they are being
migrated, it might take significant time to finish the job. During that
time vregion->lock is taken which prevents rules from being
added/deleted from the vregion.

Aim of this patchset is to allow to interrupt migration of rules during
rehash, reschedule and give chance for rules to be added/deleted. Then
continue migration in another execution of scheduled work.
====================

Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Make mlxsw_sp_acl_tcam_vregion_rehash() return void

The return value is ignored anyway, so just return void.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Remember where to continue rehash migration

Store pointer to vchunk where the migration was interrupted, as well as
ventry pointer to start from and to stop at (during rollback). This
saved pointers need to be forgotten in case of ventries list or vchunk
list changes, which is done by couple of "changed" helpers.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Allow to interrupt/continue rehash work

Currently, migration of vregions with many entries may take long time
during which insertions and removals of the rules are blocked
due to wait to acquire vregion->lock.

To overcome this, allow to interrupt and continue rehash work according
to the set credits - number of rules to migrate.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Do rollback as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all()

In order to simplify the code and to prepare it for
interrupted/continued migration process, do the rollback in case of
migration error as another call to mlxsw_sp_acl_tcam_vchunk_migrate_all().
It can be understood as "migrate all back".

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Put vchunk migrate start/end code into separate functions

In preparations of interrupt/continue of rehash work, put the code that
is done at the beginning/end of vchunk migrate function into separate
functions.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Put this_is_rollback to rehash context struct

Put the this_is_rollback flag into rehash context struct in preparations
for interrupt/continue of rehash work.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Rename variables in mlxsw_sp_acl_tcam_ventry_migrate()

Remove some of variables in function mlxsw_sp_acl_tcam_ventry_migrate()
so the names are aligned with the rest of the code.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: assign vchunk->chunk by the newly created chunk

Make the vchunk->chunk contain pointer of a new chunk we migrate to.
In case of a rollback, it contains the original chunk.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: assign vregion->region by the newly created region

Make the vregion->region contain pointer of a new region we migrate to.
In case of a rollback, it contains the original region.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Push code start/end from mlxsw_sp_acl_tcam_vregion_migrate()

Push code from the beginning and end of function
mlxsw_sp_acl_tcam_vregion_migrate() into rehash_start()/end() functions.
Then all the things needed to be done before and after the actual
migration process will be grouped together.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Push rehash start/end code into separate functions

In preparations for interrupt/continue of rehash work, put the code at
the beginning/end of the rehash function into separate functions.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Introduce new rehash context struct and save hint_priv there

Prepare for continued migration. Introduce a new structure to track
rehash context and save hint_priv into it.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Don't migrate already migrated entry

Check if the entry is already in a chunk where we want it to be. In that
case, skip migration. This is preparation for "per parts" migration
where this situation may occur.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_acl: Push rehash dw struct into rehash sub-struct

More rehash related fields are going to come. Push "dw" into sub-struct
that will accommodate the others as well.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode

When debugging another issue I faced an interrupt storm in this
driver (88E6390, port 9 in SGMII mode), consisting of alternating
link-up / link-down interrupts. Analysis showed that the driver
wanted to set a cmode that was set already. But so far
mv88e6390x_port_set_cmode() doesn't check this and powers down
SERDES, what causes the link to break, and eventually results in
the described interrupt storm.

Fix this by checking whether the cmode actually changes. We want
that the very first call to mv88e6390x_port_set_cmode() always
configures the registers, therefore initialize port.cmode with
a value that is different from any supported cmode value.
We have to take care that we only init the ports cmode once
chip->info->num_ports is set.

v2:
- add small helper and init the number of actual ports only

Fixes: 364e9d7776a3 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change")
Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

switchdev: Remove unused transaction item queue

There are no more in tree users of the
switchdev_trans_item_{dequeue,enqueue} or switchdev_trans_item structure
in the kernel since commit 00fc0c51e35b ("rocker: Change world_ops API
and implementation to be switchdev independant").

Remove this unused code and update the documentation accordingly since.

Signed-off-by: Florian Fainelli <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bpf: fix sanitation rewrite in case of non-pointers

Marek reported that he saw an issue with the below snippet in that
timing measurements where off when loaded as unpriv while results
were reasonable when loaded as privileged:

    [...]
    uint64_t a = bpf_ktime_get_ns();
    uint64_t b = bpf_ktime_get_ns();
    uint64_t delta = b - a;
    if ((int64_t)delta > 0) {
    [...]

Turns out there is a bug where a corner case is missing in the fix
d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar
type from different paths"), namely fixup_bpf_calls() only checks
whether aux has a non-zero alu_state, but it also needs to test for
the case of BPF_ALU_NON_POINTER since in both occasions we need to
skip the masking rewrite (as there is nothing to mask).

Fixes: d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar type from different paths")
Reported-by: Marek Majkowski <[email protected]>
Reported-by: Arthur Fabre <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/netdev/CAJPywTJqP34cK20iLM5YmUMz9KXQOdu1-+BZrGMAGgLuBWz7fg@mail.gmail.com/T/
Acked-by: Song Liu <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

Merge branch 'doc-net-ieee802154-move-from-plain-text-to-rst'

Stefan Schmidt says:

====================
doc: net: ieee802154: move from plain text to rst

The ieee802154 subsystem doc was still in plain text. With the networking book
taking shape I thought it was time to do the first step and move it over to rst.
This really is only the minimal conversion. I need to take some time to update
and extend the docs.

The patches are based on net-next, but they only touch the networking book so I
would not expect and trouble. From what I have seen they would go through
Jonathan's tree after being acked by Dave? If you want this patches against a
different tree let me know.
====================

Signed-off-by: David S. Miller <[email protected]>

doc: net: ieee802154: remove old plain text docs after switching to rst

The plain text docs are converted to rst now, which allows us to remove
the old text file from the tree.

Signed-off-by: Stefan Schmidt <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

doc: net: ieee802154: introduce IEEE 802.15.4 subsystem doc in rst style

Moving the ieee802154 docs from a plain text file into the new rst
style. This commit only does the minimal needed change to bring the
documentation over. Follow up patches will improve and extend on this.

Signed-off-by: Stefan Schmidt <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

devlink: fix kdoc

devlink suffers from a few kdoc warnings:

net/core/devlink.c:5292: warning: Function parameter or member 'dev' not described in 'devlink_register'
net/core/devlink.c:5351: warning: Function parameter or member 'port_index' not described in 'devlink_port_register'
net/core/devlink.c:5753: warning: Function parameter or member 'parent_resource_id' not described in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Function parameter or member 'size_params' not described in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'top_hierarchy' description in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'reload_required' description in 'devlink_resource_register'
net/core/devlink.c:5753: warning: Excess function parameter 'parent_reosurce_id' description in 'devlink_resource_register'
net/core/devlink.c:6451: warning: Function parameter or member 'region' not described in 'devlink_region_snapshot_create'
net/core/devlink.c:6451: warning: Excess function parameter 'devlink_region' description in 'devlink_region_snapshot_create'

Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-aquantia-minor-bug-fixes-after-static-analysis'

Igor Russkikh says:

====================
net: aquantia: minor bug fixes after static analysis

This patchset fixes minor errors and warnings found by smatch and kasan.

Extra patch is to replace AQ_HW_WAIT_FOR with readx_poll_timeout
to improve readability.

V2:
use readx_poll
resubmitted to net-next since the changeset became quite big.
====================

Signed-off-by: David S. Miller <[email protected]>

net: aquantia: use better wrappers for state registers

Replace some direct registers reads with better
online functions.

Signed-off-by: Nikita Danilov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: aquantia: replace AQ_HW_WAIT_FOR with readx_poll_timeout_atomic

David noticed the original define was hiding 'err' variable
reference. Thats confusing and counterintuitive.

Andrew noted the whole macro could be replaced with standard readx_poll
kernel macro. This makes code more readable.

Signed-off-by: Nikita Danilov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: aquantia: fixed instack structure overflow

This is a real stack undercorruption found by kasan build.

The issue did no harm normally because it only overflowed
2 bytes after `bitary` array which on most architectures
were mapped into `err` local.

Fixes: bab6de8fd180 ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions.")
Signed-off-by: Nikita Danilov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: aquantia: fixed buffer overflow

The overflow is detected by smatch:

drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c: 175
aq_pci_func_free_irqs() error: buffer overflow 'self->aq_vec' 8 <= 31

In reality msix_entry_mask always restricts number of iterations.
Adding extra condition to make logic clear and smatch happy.

Signed-off-by: Nikita Danilov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: aquantia: added newline at end of file

drivers/net/ethernet/aquantia/atlantic/aq_nic.c: 991:1:
warning: no newline at end of file

Signed-off-by: Nikita Danilov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: aquantia: fixed memcpy size

Not careful array dereference caused analysis tools
to think there could be memory overflow.

There was actually no corruption because the array is
two dimensional.

drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c: 140
aq_ethtool_get_strings() error:
memcpy() '*aq_ethtool_stat_names' too small (32 vs 704)

Signed-off-by: Nikita Danilov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv4: Add ICMPv6 support when parse route ipproto

For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers.
But for ip -6 route, currently we only support tcp, udp and icmp.

Add ICMPv6 support so we can match ipv6-icmp rules for route lookup.

v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to
rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family.

Reported-by: Jianlin Shi <[email protected]>
Fixes: eacb9384a3fe ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Hangbin Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

samples/bpf: silence compiler warning for xdpsock_user.c

Compiling xdpsock_user.c with 4.8.5, I hit the following
compilation warning:
    HOSTCC  samples/bpf/xdpsock_user.o
  /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c: In function ‘main’:
  /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:449:6: warning: ‘idx_cq’ may be used unini
  tialized in this function [-Wmaybe-uninitialized]
    u32 idx_cq, idx_fq;
        ^
  /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:606:7: warning: ‘idx_rx’ may be used unini
  tialized in this function [-Wmaybe-uninitialized]
     u32 idx_rx, idx_tx = 0;
         ^
  /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:506:6: warning: ‘idx_rx’ may be used unini
  tialized in this function [-Wmaybe-uninitialized]
    u32 idx_rx, idx_fq = 0;

As an example, the code pattern looks like:
    u32 idx_cq;
    ...
    ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
    if (ret) {
      ...
    }
    ... idx_fq ...
The compiler warns since it does not know whether &idx_fq is assigned
or not inside the library function xsk_ring_prod__reserve().

Let us assign an initial value 0 to such auto variables to silence
compiler warning.

Fixes: 248c7f9c0e21 ("samples/bpf: convert xdpsock to use libbpf for AF_XDP access")
Signed-off-by: Yonghong Song <[email protected]>
Acked-by: Jonathan Lemon <[email protected]>
Acked-by: Song Liu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

selftests/bpf: set unlimited RLIMIT_MEMLOCK for test_sock_fields

This is to avoid permission denied error. A lot of systems
may have a much lower number, e.g., 64KB, for RLIMIT_MEMLOCK,
which may not be sufficient for the test to run successfully.

Fixes: e0b27b3f97b8 ("bpf: Add test_sock_fields for skb->sk and bpf_tcp_sock")
Signed-off-by: Yonghong Song <[email protected]>
Acked-by: Song Liu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>

Merge branch 'bpf-doc-improvements'

Andrii Nakryiko says:

====================
A bunch of BPF-related docs typo, wording and formatting fixes.

v1->v2:
- split off non-documentation changes into separate patchset
====================

Acked-by: Song Liu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>