linux.git log
Conor Dooley [Wed, 6 Jul 2022 09:51:25 +0000 (10:51 +0100)]
dt-bindings: net: cdns,macb: document polarfire soc's macb

Until now, the PolarFire SoC (MPFS) has used the generic "cdns,macb"
compatible, even though it has optional reset support. Add a specific
compatible which falls back to the currently used generic binding.

Acked-by: Rob Herring <[email protected]>
Reviewed-by: Claudiu Beznea <[email protected]>
Signed-off-by: Conor Dooley <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Justin Stitt [Wed, 6 Jul 2022 23:08:33 +0000 (16:08 -0700)]
net: l2tp: fix clang -Wformat warning

When building with clang we encounter this warning:
| net/l2tp/l2tp_ppp.c:1557:6: error: format specifies type 'unsigned
| short' but the argument has type 'u32' (aka 'unsigned int')
| [-Werror,-Wformat] session->nr, session->ns,

Both session->nr and session->ns are of type u32. The format specifier
previously used is `%hu`, which would truncate our unsigned integers from
32 to 16 bits. This does not seem like intended behavior; if it is, then
perhaps we need to consider suppressing the warning with pragma clauses.
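A minimal userspace sketch of the truncation described above: `%hu` prints the value converted to `unsigned short`, so a u32 sequence number above 65535 is shown modulo 65536, while `%u` prints it in full. The helper names are hypothetical, and `uint32_t` stands in for the kernel's `u32`.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 32-bit sequence number with the full-width specifier (%u). */
int format_seq_full(char *buf, size_t len, uint32_t seq)
{
	return snprintf(buf, len, "%u", (unsigned int)seq);
}

/*
 * Reproduce what the old "%hu" specifier effectively printed. The explicit
 * cast makes the 16-bit truncation well-defined instead of relying on a
 * mismatched printf argument type.
 */
int format_seq_truncated(char *buf, size_t len, uint32_t seq)
{
	return snprintf(buf, len, "%hu", (unsigned short)seq);
}
```

For example, a sequence number of 70000 is printed as "70000" with `%u` but as "4464" (70000 - 65536) with `%hu`.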

This patch should get us closer to the goal of enabling the -Wformat
flag for Clang builds.

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Justin Stitt <[email protected]>
Acked-by: Guillaume Nault <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Jie Wang [Tue, 5 Jul 2022 11:35:15 +0000 (19:35 +0800)]
net: page_pool: optimize page pool page allocation in NUMA scenario

Currently, NIC packet receive performance based on page pool occasionally
deteriorates. To analyze the cause of this problem, page allocation stats
were collected. Here are the stats when NIC rx performance deteriorates:

bandwidth(Gbits/s)          16.8      6.91
rx_pp_alloc_fast            13794308  21141869
rx_pp_alloc_slow            108625    166481
rx_pp_alloc_slow_h          0         0
rx_pp_alloc_empty           8192      8192
rx_pp_alloc_refill          0         0
rx_pp_alloc_waive           100433    158289
rx_pp_recycle_cached        0         0
rx_pp_recycle_cache_full    0         0
rx_pp_recycle_ring          362400    420281
rx_pp_recycle_ring_full     6064893   9709724
rx_pp_recycle_released_ref  0         0

The rx_pp_alloc_waive count indicates that the NUMA node of a large number
of pages does not match the NIC device's NUMA node, so these pages can't be
reused by the page pool. As a result, many new pages have to be allocated
by __page_pool_alloc_pages_slow, which is time consuming. This causes the
NIC rx performance fluctuations.

The main reason for the large number of NUMA-mismatched pages in the page
pool is that the page pool uses alloc_pages_bulk_array to allocate its
original pages. This function is not suitable for page allocation in a NUMA
scenario. So this patch uses alloc_pages_bulk_array_node, which takes a NUMA
id input parameter, to ensure NUMA consistency between the NIC device and
the allocated pages.
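To make the rx_pp_alloc_waive counter concrete, here is a toy userspace model (not kernel code). A page is only kept for reuse if it sits on the pool's preferred node and is waived otherwise, which is roughly what the page pool does when refilling its cache from the recycle ring. `struct toy_page`, `toy_alloc_and_reuse()` and the two allocator callbacks are hypothetical; they only mimic the difference between a node-agnostic bulk allocator and one that takes a node id.

```c
#include <stddef.h>

/* Toy page: only the NUMA node it was allocated on matters here. */
struct toy_page {
	int nid;
};

struct toy_stats {
	unsigned long reused;   /* pages the pool could keep */
	unsigned long waived;   /* pages released because of a node mismatch */
};

/* Allocator callback: sets page->nid for the i-th page of a bulk request. */
typedef void (*toy_alloc_fn)(struct toy_page *page, size_t i, int pref_nid);

/* Node-agnostic allocator: pages land on alternating nodes 0 and 1. */
void toy_alloc_any_node(struct toy_page *page, size_t i, int pref_nid)
{
	(void)pref_nid;                 /* the requested node is ignored */
	page->nid = (int)(i % 2);
}

/* Node-aware allocator: every page is placed on the requested node. */
void toy_alloc_on_node(struct toy_page *page, size_t i, int pref_nid)
{
	(void)i;
	page->nid = pref_nid;
}

/*
 * Allocate 'n' pages with 'alloc', then apply the reuse check: a page is
 * kept only if it sits on the pool's preferred node, otherwise it is waived.
 */
void toy_alloc_and_reuse(struct toy_stats *st, int pool_nid, size_t n,
			 toy_alloc_fn alloc)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_page page;

		alloc(&page, i, pool_nid);
		if (page.nid == pool_nid)
			st->reused++;
		else
			st->waived++;
	}
}
```

With 8 pages and a pool on node 0, the node-agnostic allocator leaves half the pages waived, while the node-aware one waives none.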

The NIC rx performance test was repeated 40 times. NIC rx bandwidth is
higher and more stable compared to the data above. Here are the stats from
three of the runs: the rx_pp_alloc_waive count is zero, and rx_pp_alloc_slow,
which counts pages allocated from the slow path, is relatively low.

bandwidth(Gbits/s)          93        93.9      93.8
rx_pp_alloc_fast            60066264  61266386  60938254
rx_pp_alloc_slow            16512     16517     16539
rx_pp_alloc_slow_ho         0         0         0
rx_pp_alloc_empty           16512     16517     16539
rx_pp_alloc_refill          473841    481910    481585
rx_pp_alloc_waive           0         0         0
rx_pp_recycle_cached        0         0         0
rx_pp_recycle_cache_full    0         0         0
rx_pp_recycle_ring          29754145  30358243  30194023
rx_pp_recycle_ring_full     0         0         0
rx_pp_recycle_released_ref  0         0         0

Signed-off-by: Jie Wang <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Ilias Apalodimas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Jakub Kicinski [Thu, 7 Jul 2022 19:07:37 +0000 (12:07 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <[email protected]>
Linus Torvalds [Thu, 7 Jul 2022 17:08:20 +0000 (10:08 -0700)]
Merge tag 'net-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf, netfilter, can, and bluetooth.

  Current release - regressions:

   - bluetooth: fix deadlock on hci_power_on_sync

  Previous releases - regressions:

   - sched: act_police: allow 'continue' action offload

   - eth: usbnet: fix memory leak in error case

   - eth: ibmvnic: properly dispose of all skbs during a failover

  Previous releases - always broken:

   - bpf:
       - fix insufficient bounds propagation from
         adjust_scalar_min_max_vals
       - clear page contiguity bit when unmapping pool

   - netfilter: nft_set_pipapo: release elements in clone from
     abort path

   - mptcp: netlink: issue MP_PRIO signals from userspace PMs

   - can:
       - rcar_canfd: fix data transmission failed on R-Car V3U
       - gs_usb: gs_usb_open/close(): fix memory leak

  Misc:

   - add Wenjia as SMC maintainer"

* tag 'net-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  wireguard: Kconfig: select CRYPTO_CHACHA_S390
  crypto: s390 - do not depend on CRYPTO_HW for SIMD implementations
  wireguard: selftests: use microvm on x86
  wireguard: selftests: always call kernel makefile
  wireguard: selftests: use virt machine on m68k
  wireguard: selftests: set fake real time in init
  r8169: fix accessing unset transport header
  net: rose: fix UAF bug caused by rose_t0timer_expiry
  usbnet: fix memory leak in error case
  Revert "tls: rx: move counting TlsDecryptErrors for sync"
  mptcp: update MIB_RMSUBFLOW in cmd_sf_destroy
  mptcp: fix local endpoint accounting
  selftests: mptcp: userspace PM support for MP_PRIO signals
  mptcp: netlink: issue MP_PRIO signals from userspace PMs
  mptcp: Acquire the subflow socket lock before modifying MP_PRIO flags
  mptcp: Avoid acquiring PM lock for subflow priority changes
  mptcp: fix locking in mptcp_nl_cmd_sf_destroy()
  net/mlx5e: Fix matchall police parameters validation
  net/sched: act_police: allow 'continue' action offload
  net: lan966x: hardcode the number of external ports
  ...

Linus Torvalds [Thu, 7 Jul 2022 17:02:38 +0000 (10:02 -0700)]
Merge tag 'pinctrl-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:

 - Tag Intel pin control as supported in MAINTAINERS

 - Fix a NULL pointer exception in the Aspeed driver

 - Correct some NAND functions in the Sunxi A83T driver

 - Use the right offset for some Sunxi pins

 - Fix a zero base offset in the Freescale (NXP) i.MX93

 - Fix the IRQ support in the STM32 driver

* tag 'pinctrl-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: stm32: fix optional IRQ support to gpios
  pinctrl: imx: Add the zero base flag for imx93
  pinctrl: sunxi: sunxi_pconf_set: use correct offset
  pinctrl: sunxi: a83t: Fix NAND function name for some pins
  pinctrl: aspeed: Fix potential NULL dereference in aspeed_pinmux_set_mux()
  MAINTAINERS: Update Intel pin control to Supported

Linus Torvalds [Wed, 6 Jul 2022 19:20:59 +0000 (12:20 -0700)]
signal handling: don't use BUG_ON() for debugging

These are indeed "should not happen" situations, but it turns out recent
changes made the 'task_is_stopped_or_traced()' case trigger (a fix for
that exists and is pending more testing), and the BUG_ON() makes it
unnecessarily hard to actually debug for no good reason.

It's been that way for a long time, but let's make it clear: BUG_ON() is
not good for debugging, and should never be used in situations where you
could just say "this shouldn't happen, but we can continue".

Use WARN_ON_ONCE() instead to make sure it gets logged, and then just
continue running, instead of making the system basically unusable
because you crashed the machine while potentially holding some very core
locks (e.g. this function is commonly called while holding 'tasklist_lock'
for writing).
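The difference can be sketched in userspace: instead of aborting (the moral equivalent of BUG_ON()), the unexpected state is logged once and the caller carries on. `demo_warn_on_once()`, `demo_signal_step()` and the state values are hypothetical stand-ins. The kernel's real WARN_ON_ONCE() is a macro that keeps its own per-call-site flag and returns the condition's value; this sketch passes the flag explicitly instead.

```c
#include <stdbool.h>
#include <stdio.h>

enum demo_task_state { DEMO_RUNNING, DEMO_STOPPED_OR_TRACED };

static unsigned int demo_warnings;   /* how many warnings were actually printed */

/* Log 'msg' the first time 'cond' is true for this flag; always return 'cond'. */
static bool demo_warn_on_once(bool cond, bool *warned, const char *msg)
{
	if (cond && !*warned) {
		*warned = true;
		demo_warnings++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/*
 * Hypothetical signal-handling step: an unexpected state is reported and
 * skipped rather than crashing the process (or, in the kernel, the machine
 * while core locks may be held).
 */
int demo_signal_step(enum demo_task_state state)
{
	static bool warned;   /* per-call-site "once" flag */

	if (demo_warn_on_once(state == DEMO_STOPPED_OR_TRACED, &warned,
			      "unexpected stopped/traced task"))
		return -1;    /* keep running; the caller can recover */
	return 0;
}
```

Calling `demo_signal_step(DEMO_STOPPED_OR_TRACED)` repeatedly keeps returning -1 but prints only one warning. Whether to skip the step or simply carry on after the warning is a per-site design choice; this sketch skips it.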

Signed-off-by: Linus Torvalds <[email protected]>
Dave Marchevsky [Tue, 5 Jul 2022 19:00:18 +0000 (12:00 -0700)]
selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage

This benchmark measures grace period latency and kthread cpu usage of
RCU Tasks Trace when many processes are creating/deleting BPF
local_storage. The intent here is to quantify the improvement in these
metrics after Paul's recent RCU Tasks patches [0].

Specifically, fork 15k tasks which call a bpf prog that creates/destroys
task local_storage and sleep in a loop, resulting in many
call_rcu_tasks_trace calls.

To determine grace period latency, trace time elapsed between
rcu_tasks_trace_pregp_step and rcu_tasks_trace_postgp; for cpu usage
look at rcu_task_trace_kthread's stime in /proc/PID/stat.
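Reading a kthread's `stime` can be sketched as below. Per proc(5), `stime` is field 15 of /proc/PID/stat and is measured in clock ticks (divide by `sysconf(_SC_CLK_TCK)` for seconds). Because field 2 (`comm`) is wrapped in parentheses and may itself contain spaces or ')', the parser starts after the last ')'. The function name is hypothetical, and this is not the benchmark's own code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Extract stime (field 15) from one line of /proc/PID/stat.
 * Returns 0 on success, -1 if the line is malformed.
 */
int parse_stat_stime(const char *line, unsigned long long *stime)
{
	const char *p = strrchr(line, ')');   /* end of the comm field */
	char *end;

	if (!p)
		return -1;
	p++;

	/* Skip fields 3 (state) through 14 (utime). */
	for (int field = 3; field <= 14; field++) {
		while (*p == ' ')
			p++;
		if (*p == '\0')
			return -1;
		while (*p != '\0' && *p != ' ')
			p++;
	}

	/* strtoull() skips the separating space before field 15. */
	*stime = strtoull(p, &end, 10);
	if (end == p)
		return -1;
	return 0;
}
```

In practice the line would be read with `fgets()` from the stat file of the RCU Tasks Trace kthread; sampling it before and after a run gives the system CPU time the kthread consumed.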

On my virtualized test environment (Skylake, 8 CPUs), benchmark results
demonstrate a significant improvement: average grace period latency drops
by about 24% and ticks per grace period roughly halve:

BEFORE Paul's patches:

  SUMMARY tasks_trace grace period latency        avg 22298.551 us stddev 1302.165 us
  SUMMARY ticks per tasks_trace grace period      avg 2.291 stddev 0.324

AFTER Paul's patches:

  SUMMARY tasks_trace grace period latency        avg 16969.197 us  stddev 2525.053 us
  SUMMARY ticks per tasks_trace grace period      avg 1.146 stddev 0.178

Note that since these patches are not in bpf-next, benchmarking was done
by cherry-picking this patch onto the rcu tree.

  [0] https://lore.kernel.org/rcu/20220620225402.GA3842369@paulmck-ThinkPad-P17-Gen-1/

Signed-off-by: Dave Marchevsky <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Yixun Lan [Wed, 6 Jul 2022 14:02:04 +0000 (22:02 +0800)]
libbpf, riscv: Use a0 for RC register

According to the RISC-V calling convention register usage here [0], a0
is used as the return value register, so use a0 for the RC register to
be consistent with the spec.

  [0] section 18.2, table 18.2
      https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf

Fixes: 589fed479ba1 ("riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h")
Signed-off-by: Yixun Lan <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Acked-by: Amjad OULED-AMEUR <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Kuniyuki Iwashima [Tue, 5 Jul 2022 23:37:15 +0000 (16:37 -0700)]
af_unix: Optimise hash table layout.

Commit 6dd4142fb5a9 ("Merge branch 'af_unix-per-netns-socket-hash'") and
commit 51bae889fe11 ("af_unix: Put pathname sockets in the global hash
table.") changed a hash table layout.

  Before:
    unix_socket_table [0   - 255] : abstract & pathname sockets
                      [256 - 511] : unnamed sockets

  After:
    per-netns table   [0   - 255] : abstract & pathname sockets
                      [256 - 511] : unnamed sockets
    bsd_socket_table  [0   - 255] : pathname sockets (sk_bind_node)

Now, while looking up sockets, we traverse the global table for the
pathname sockets and the first half of each per-netns hash table for
abstract sockets, where pathname sockets are also linked.  Thus, the
more pathname sockets we have, the longer we take to look up abstract
sockets.  This characteristic existed before the layout change, but we
can improve it now.

This patch changes the per-netns hash table's layout so that sockets not
requiring lookup reside in the first half and do not impact the lookup of
abstract sockets.

    per-netns table   [0   - 255] : pathname & unnamed sockets
                      [256 - 511] : abstract sockets
    bsd_socket_table  [0   - 255] : pathname sockets (sk_bind_node)
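A simplified sketch of the index arithmetic implied by the new layout: with 256 buckets per half, pathname and unnamed sockets land in [0, 255] and abstract sockets in [256, 511], so an abstract-name lookup only walks the second half. The sizes follow the table above; the enum and function names are hypothetical, and the kernel's actual hash functions are not reproduced here.

```c
#include <stdbool.h>

#define DEMO_HASH_HALF 256u                 /* buckets per half */
#define DEMO_HASH_MOD  (DEMO_HASH_HALF - 1) /* mask selecting a bucket within one half */
#define DEMO_HASH_SIZE (DEMO_HASH_HALF * 2) /* whole per-netns table */

enum demo_sock_kind { DEMO_UNNAMED, DEMO_PATHNAME, DEMO_ABSTRACT };

/* Map a raw hash value to a bucket in the per-netns table (new layout). */
unsigned int demo_bucket(enum demo_sock_kind kind, unsigned int hash)
{
	if (kind == DEMO_ABSTRACT)
		return DEMO_HASH_HALF + (hash & DEMO_HASH_MOD);  /* second half */
	return hash & DEMO_HASH_MOD;  /* pathname & unnamed: first half */
}

/* Abstract-name lookup only ever needs buckets in the second half. */
bool demo_bucket_is_lookup_half(unsigned int bucket)
{
	return bucket >= DEMO_HASH_HALF && bucket < DEMO_HASH_SIZE;
}
```

Pathname sockets are still found through the separate bsd_socket_table, so keeping them out of the abstract half does not hide them from lookup.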

We ran a test that bind()s 100,000 abstract sockets and 100,000 pathname
sockets, then bind()s an abstract socket 100,000 times and measures the
time spent in __unix_find_socket_byname().  The result shows that the
patch makes each lookup faster.

  Without this patch:
    $ sudo ./funclatency -p 2278 --microseconds __unix_find_socket_byname.isra.44
     usec                : count    distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 126      |                                        |
        16 -> 31         : 1438     |*                                       |
        32 -> 63         : 4150     |***                                     |
        64 -> 127        : 9049     |*******                                 |
       128 -> 255        : 37704    |*******************************         |
       256 -> 511        : 47533    |****************************************|

  With this patch:
    $ sudo ./funclatency -p 3648 --microseconds __unix_find_socket_byname.isra.46
     usec                : count    distribution
         0 -> 1          : 109      |                                        |
         2 -> 3          : 318      |                                        |
         4 -> 7          : 725      |                                        |
         8 -> 15         : 2501     |*                                       |
        16 -> 31         : 3061     |**                                      |
        32 -> 63         : 4028     |***                                     |
        64 -> 127        : 9312     |*******                                 |
       128 -> 255        : 51372    |****************************************|
       256 -> 511        : 28574    |**********************                  |

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Jakub Kicinski [Thu, 7 Jul 2022 03:04:09 +0000 (20:04 -0700)]
Merge branch 'wireguard-patches-for-5-19-rc6'

Jason A. Donenfeld says:

====================
wireguard patches for 5.19-rc6

1) A few small fixups to the selftests, per usual. Of particular note is
   a fix for a test flake that occurred on especially fast systems that
   boot in less than a second.

2) Some s390 crypto code added during this cycle interacted with the
   way wireguard selects dependencies, resulting in linker errors
   reported by the kernel test robot. So Vladis sent in a patch for
   that, which also required a small preparatory fix moving some Kconfig
   symbols around.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Vladis Dronov [Thu, 7 Jul 2022 00:31:57 +0000 (02:31 +0200)]
wireguard: Kconfig: select CRYPTO_CHACHA_S390

Select the new implementation of CHACHA20 for S390 when available.
It is faster than the generic software implementation, and it also
prevents some linker errors in certain situations.

Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/linux-kernel/[email protected]/
Signed-off-by: Vladis Dronov <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Jason A. Donenfeld [Thu, 7 Jul 2022 00:31:56 +0000 (02:31 +0200)]
crypto: s390 - do not depend on CRYPTO_HW for SIMD implementations

Various accelerated software implementation Kconfig values for S390 were
mistakenly placed into drivers/crypto/Kconfig, even though they're
mainly just SIMD code and live in arch/s390/crypto/ like usual. This
gives them the very unusual dependency on CRYPTO_HW, which leads to
problems elsewhere.

This patch fixes the issue by moving the Kconfig values for non-hardware
drivers into the usual place in crypto/Kconfig.

Acked-by: Herbert Xu <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Jason A. Donenfeld [Thu, 7 Jul 2022 00:31:55 +0000 (02:31 +0200)]
wireguard: selftests: use microvm on x86

This makes for faster tests, faster compile time, and allows us to ditch
ACPI finally.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Jason A. Donenfeld [Thu, 7 Jul 2022 00:31:54 +0000 (02:31 +0200)]
wireguard: selftests: always call kernel makefile

These selftests are used for much more extensive changes than just the
wireguard source files. So always call the kernel's build file, which
will do something or nothing after checking the whole tree, per usual.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Jason A. Donenfeld [Thu, 7 Jul 2022 00:31:53 +0000 (02:31 +0200)]
wireguard: selftests: use virt machine on m68k

This should be a bit more stable hopefully.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Jason A. Donenfeld [Thu, 7 Jul 2022 00:31:52 +0000 (02:31 +0200)]
wireguard: selftests: set fake real time in init

Not all platforms have an RTC, and rather than trying to force one into
each, it's much easier to just set a fixed time. This is necessary
because WireGuard's latest handshakes parameter is returned in wallclock
time, and if the system time isn't set, and the system is really fast,
then this returns 0, which trips the test.

Turning this on requires setting CONFIG_COMPAT_32BIT_TIME=y, as musl
doesn't support settimeofday without it.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Christophe JAILLET [Tue, 5 Jul 2022 20:36:26 +0000 (22:36 +0200)]
qed: Use bitmap_empty()

Use bitmap_empty() instead of hand-writing it.

It is less verbose and makes the intent clearer.

Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/78713a72414b99f673c3a9ec0519bb41c080935a.1657053343.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
Christophe JAILLET [Tue, 5 Jul 2022 20:36:16 +0000 (22:36 +0200)]
qed: Use the bitmap API to allocate bitmaps

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and makes the intent clearer.
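For readers outside the kernel, here is a userspace analogue of what these helpers replace: a zeroed allocation sized in longs, and an emptiness test that ignores bits past `nbits`. The `demo_` names are hypothetical; the real bitmap_zalloc(), bitmap_free() and bitmap_empty() are part of the kernel's bitmap API, and bitmap_zalloc() additionally takes GFP flags.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) (((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Zeroed bitmap large enough for 'nbits' bits (cf. bitmap_zalloc()). */
unsigned long *demo_bitmap_zalloc(size_t nbits)
{
	return calloc(DEMO_BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

/* Counterpart of the allocation above (cf. bitmap_free()). */
void demo_bitmap_free(unsigned long *map)
{
	free(map);
}

void demo_set_bit(size_t nr, unsigned long *map)
{
	map[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

/* True if none of the first 'nbits' bits is set (cf. bitmap_empty()). */
bool demo_bitmap_empty(const unsigned long *map, size_t nbits)
{
	size_t full = nbits / DEMO_BITS_PER_LONG;
	size_t rem = nbits % DEMO_BITS_PER_LONG;

	for (size_t i = 0; i < full; i++)
		if (map[i])
			return false;
	/* Only look at the valid low bits of a trailing partial word. */
	if (rem && (map[full] & ((1UL << rem) - 1)))
		return false;
	return true;
}
```

The hand-written versions these patches remove typically open-code the size calculation and the word loop; the named helpers make both the intent and the partial-word handling explicit.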

Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/d61ec77ce0b92f7539c6a144106139f8d737ec29.1657053343.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
Christophe JAILLET [Tue, 5 Jul 2022 20:25:58 +0000 (22:25 +0200)]
cnic: Use the bitmap API to allocate bitmaps

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and makes the intent clearer.

Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/521bd2a49be5d88e493bcfb63505d3df91a1c2d2.1657052743.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
Christophe JAILLET [Tue, 5 Jul 2022 20:22:59 +0000 (22:22 +0200)]
bnxt: Use the bitmap API to allocate bitmaps

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and makes the intent clearer.

Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/d508f3adf7e2804f4d3793271b82b196a2ccb940.1657052562.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
Christophe JAILLET [Tue, 5 Jul 2022 19:36:51 +0000 (21:36 +0200)]
sfc: falcon: Use the bitmap API to allocate bitmaps

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and makes the intent clearer.

Signed-off-by: Christophe JAILLET <[email protected]>
Acked-by: Martin Habets <[email protected]>
Link: https://lore.kernel.org/r/c62c1774e6a34bc64323ce526b385aa87c1ca575.1657049799.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
Christophe JAILLET [Tue, 5 Jul 2022 19:34:08 +0000 (21:34 +0200)]
sfc/siena: Use the bitmap API to allocate bitmaps

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and makes the intent clearer.

Signed-off-by: Christophe JAILLET <[email protected]>
Acked-by: Martin Habets <[email protected]>
Link: https://lore.kernel.org/r/717ba530215f4d7ce9fedcc73d98dba1f70d7f71.1657049636.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
Heiner Kallweit [Tue, 5 Jul 2022 19:15:22 +0000 (21:15 +0200)]
r8169: fix accessing unset transport header

66e4c8d95008 ("net: warn if transport header was not set") added
a check that triggers a warning in r8169, see [0].

The commit referenced in the Fixes tag refers to the change from
which the patch applies cleanly, there's nothing wrong with this
commit. It seems the actual issue (not a bug, because the warning
is harmless here) was introduced with bdfa4ed68187
("r8169: use Giant Send").
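In the kernel, an all-ones transport header offset means "not set" (this is what skb_transport_header_was_set() tests), and the check added by 66e4c8d95008 warns when such an unset header is read anyway. The toy model below uses a hypothetical `demo_skb` type and helpers, not the real `struct sk_buff`, to show the general pattern: test whether the header was set before deriving an offset from it.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_HDR_UNSET ((uint16_t)~0U)   /* sentinel meaning "offset not set" */

struct demo_skb {
	uint16_t transport_header;          /* offset of the transport header */
};

void demo_reset_transport_header(struct demo_skb *skb)
{
	skb->transport_header = DEMO_HDR_UNSET;
}

bool demo_transport_header_was_set(const struct demo_skb *skb)
{
	return skb->transport_header != DEMO_HDR_UNSET;
}

/*
 * Driver-side helper: only derive a transport offset when one was recorded;
 * otherwise return -1 so the caller can skip any transport-layer workaround.
 */
int demo_transport_offset(const struct demo_skb *skb)
{
	if (!demo_transport_header_was_set(skb))
		return -1;
	return skb->transport_header;
}
```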

[0] https://bugzilla.kernel.org/show_bug.cgi?id=216157

Fixes: 8d520b4de3ed ("r8169: work around RTL8125 UDP hw bug")
Reported-by: Erhard F. <[email protected]>
Tested-by: Erhard F. <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Yang Yingliang [Tue, 5 Jul 2022 13:17:33 +0000 (21:17 +0800)]
net: dsa: b53: remove unnecessary spi_set_drvdata()

Remove the unnecessary spi_set_drvdata() in b53_spi_remove(); the
driver_data will be set to NULL in device_unbind_cleanup() after
->remove() is called.

Signed-off-by: Yang Yingliang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Duoming Zhou [Tue, 5 Jul 2022 12:56:10 +0000 (20:56 +0800)]
net: rose: fix UAF bug caused by rose_t0timer_expiry

There are UAF bugs caused by rose_t0timer_expiry(). The
root cause is that del_timer() could not stop the timer
handler that is running and there is no synchronization.
One of the race conditions is shown below:

    (thread 1)             |        (thread 2)
                           | rose_device_event
                           |   rose_rt_device_down
                           |     rose_remove_neigh
rose_t0timer_expiry        |       rose_stop_t0timer(rose_neigh)
  ...                      |         del_timer(&neigh->t0timer)
                           |         kfree(rose_neigh) //[1]FREE
  neigh->dce_mode //[2]USE |

The rose_neigh is deallocated in position [1] and used in
position [2].

The crash trace triggered by the PoC is shown below:

BUG: KASAN: use-after-free in expire_timers+0x144/0x320
Write of size 8 at addr ffff888009b19658 by task swapper/0/0
...
Call Trace:
 <IRQ>
 dump_stack_lvl+0xbf/0xee
 print_address_description+0x7b/0x440
 print_report+0x101/0x230
 ? expire_timers+0x144/0x320
 kasan_report+0xed/0x120
 ? expire_timers+0x144/0x320
 expire_timers+0x144/0x320
 __run_timers+0x3ff/0x4d0
 run_timer_softirq+0x41/0x80
 __do_softirq+0x233/0x544
 ...

This patch replaces the rose_stop_ftimer() and rose_stop_t0timer()
calls in rose_remove_neigh() with del_timer_sync(), so that a running
timer handler finishes before resources such as rose_neigh are
deallocated. As a result, the UAF bugs could be mitigated.
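The ordering the fix relies on can be illustrated with a userspace analogue that uses a pthread in place of the timer softirq. `demo_timer_stop_sync()` plays the role of del_timer_sync() by waiting for a possibly running handler before the caller frees the neighbour; a plain del_timer() would correspond to setting the flag without the join, which leaves the window for the use-after-free. All `demo_` names are hypothetical. Build with `-pthread`.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

struct demo_neigh {
	int dce_mode;
};

struct demo_timer {
	pthread_t thread;
	atomic_bool cancelled;
	struct demo_neigh *neigh;
	int observed_mode;                 /* what the handler read; 0 if it did not run */
};

/* Timer handler stand-in: fires after ~1 ms unless cancelled, then uses neigh. */
static void *demo_t0timer_expiry(void *arg)
{
	struct demo_timer *t = arg;
	struct timespec delay = { 0, 1000000L };

	nanosleep(&delay, NULL);
	if (!atomic_load(&t->cancelled))
		t->observed_mode = t->neigh->dce_mode;   /* the "[2] USE" */
	return NULL;
}

int demo_timer_start(struct demo_timer *t, struct demo_neigh *neigh)
{
	atomic_init(&t->cancelled, false);
	t->neigh = neigh;
	t->observed_mode = 0;
	return pthread_create(&t->thread, NULL, demo_t0timer_expiry, t);
}

/*
 * Analogue of del_timer_sync(): request cancellation, then wait until a
 * handler that may already be running has returned. Even if the handler
 * passed the flag check just before cancellation, the join guarantees it
 * is done with t->neigh, so only now is the "[1] FREE" safe.
 */
void demo_timer_stop_sync(struct demo_timer *t)
{
	atomic_store(&t->cancelled, true);
	pthread_join(t->thread, NULL);
}
```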

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Oliver Neukum [Tue, 5 Jul 2022 12:53:51 +0000 (14:53 +0200)]
usbnet: fix memory leak in error case

usbnet_write_cmd_async() mixed up which buffers
need to be freed in which error case.
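The usual shape of a correct error path for a function that allocates several buffers is the kernel's goto unwind: each label frees exactly what was allocated before the failing step, and a pointer that may be freed on an error path is initialised first (compare the v3 note below). The sketch is hypothetical userspace code with injected allocation failures, not the actual usbnet_write_cmd_async() fix.

```c
#include <stdlib.h>
#include <string.h>

static int demo_live;   /* outstanding allocations, for leak checking */

/* malloc() wrapper that fails on purpose at allocation number 'fail_at'. */
static void *demo_alloc(size_t n, int *step, int fail_at)
{
	if ((*step)++ == fail_at)
		return NULL;            /* injected allocation failure */
	void *p = malloc(n ? n : 1);
	if (p)
		demo_live++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		demo_live--;
		free(p);
	}
}

/* Hypothetical async command: allocate an urb, an optional data copy, a request. */
int demo_write_cmd_async(const void *data, size_t size, int fail_at)
{
	int step = 0;
	void *buf = NULL;   /* initialised so the error path can free it safely */
	void *req, *urb;

	urb = demo_alloc(64, &step, fail_at);
	if (!urb)
		return -1;

	if (data) {
		buf = demo_alloc(size, &step, fail_at);
		if (!buf)
			goto fail_free_urb;
		memcpy(buf, data, size);
	}

	req = demo_alloc(32, &step, fail_at);
	if (!req)
		goto fail_free_buf;

	/* Submission would go here; this sketch's "completion" frees at once. */
	demo_free(req);
	demo_free(buf);
	demo_free(urb);
	return 0;

fail_free_buf:
	demo_free(buf);
fail_free_urb:
	demo_free(urb);
	return -1;
}
```

Passing `fail_at = -1` disables failure injection; any other value makes that allocation fail, and the leak counter should return to zero in every case.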

v2: add Fixes tag
v3: fix uninitialized buf pointer

Fixes: 877bd862f32b8 ("usbnet: introduce usbnet 3 command helpers")
Signed-off-by: Oliver Neukum <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Jakub Kicinski [Thu, 7 Jul 2022 01:32:01 +0000 (18:32 -0700)]
Revert "Merge branch 'octeontx2-af-next'"

This reverts commit 2ef8e39f58f08589ab035223c2687830c0eba30f, reversing
changes made to e7ce9fc9ad38773b660ef663ae98df4f93cb6a37.

There are build warnings here which break the normal
build due to -Werror. Ratheesh was nice enough to quickly
follow up with fixes but didn't hit all the warnings I
see on GCC 12, so to unblock net-next from taking patches,
let's take this series out for now.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Linus Torvalds [Wed, 6 Jul 2022 17:10:26 +0000 (10:10 -0700)]
Merge tag 'for-linus' of https://github.com/openrisc/linux

Pull OpenRISC fixes from Stafford Horne:
 "Fixups for OpenRISC found during recent testing:

   - An OpenRISC irqchip fix to stop acking level interrupts which was
     causing issues on SMP platforms

   - A comment typo fix in our unwinder code"

* tag 'for-linus' of https://github.com/openrisc/linux:
  openrisc: unwinder: Fix grammar issue in comment
  irqchip: or1k-pic: Undefine mask_ack for level triggered hardware

Linus Torvalds [Wed, 6 Jul 2022 17:01:00 +0000 (10:01 -0700)]
Merge tag 'sound-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "This became largish as it includes the pending ASoC fixes.

  Almost all changes are device-specific small fixes, while many of them
  are coverage for mixer issues that were detected by selftest. In
  addition, usual suspects for HD/USB-audio are there"

* tag 'sound-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (43 commits)
  ALSA: cs46xx: Fix missing snd_card_free() call at probe error
  ALSA: usb-audio: Add quirk for Fiero SC-01 (fw v1.0.0)
  ALSA: usb-audio: Add quirk for Fiero SC-01
  ALSA: hda/realtek: Add quirk for Clevo L140PU
  ALSA: usb-audio: Add quirks for MacroSilicon MS2100/MS2106 devices
  ASoC: madera: Fix event generation for rate controls
  ASoC: madera: Fix event generation for OUT1 demux
  ASoC: cs47l15: Fix event generation for low power mux control
  ASoC: cs35l41: Add ASP TX3/4 source to register patch
  ASoC: dapm: Initialise kcontrol data for mux/demux controls
  ASoC: rt711-sdca: fix kernel NULL pointer dereference when IO error
  ASoC: cs35l41: Correct some control names
  ASoC: wm5110: Fix DRE control
  ASoC: wm_adsp: Fix event for preloader
  MAINTAINERS: update ASoC Qualcomm maintainer email-id
  ASoC: rockchip: i2s: switch BCLK to GPIO
  ASoC: SOF: Intel: disable IMR boot when resuming from ACPI S4 and S5 states
  ASoC: SOF: pm: add definitions for S4 and S5 states
  ASoC: SOF: pm: add explicit behavior for ACPI S1 and S2
  ASoC: SOF: Intel: hda: Fix compressed stream position tracking
  ...

Andrii Nakryiko [Tue, 5 Jul 2022 22:48:18 +0000 (15:48 -0700)]
libbpf: Remove unnecessary usdt_rel_ip assignments

Coverity detected that usdt_rel_ip is unconditionally overwritten
anyway, so there is no need to initialize it with an unused
value. Clean this up.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Andrii Nakryiko [Tue, 5 Jul 2022 22:48:17 +0000 (15:48 -0700)]
selftests/bpf: Fix few more compiler warnings

When compiling with -O2, GCC detects a few problems with selftests/bpf, so
fix all of them. Two are real issues (uninitialized err and nums
out-of-bounds access), but the other two uninitialized variable warnings
are due to GCC not being able to prove that the variables are indeed
initialized under the conditions in which they are used.

Fix all 4 cases, though.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Andrii Nakryiko [Tue, 5 Jul 2022 22:48:16 +0000 (15:48 -0700)]
selftests/bpf: Fix bogus uninitialized variable warning

When compiling selftests/bpf in optimized mode (-O2), GCC erroneously
complains about the uninitialized token variable:

  In file included from network_helpers.c:22:
  network_helpers.c: In function ‘open_netns’:
  test_progs.h:355:22: error: ‘token’ may be used uninitialized [-Werror=maybe-uninitialized]
    355 |         int ___err = libbpf_get_error(___res);                          \
        |                      ^~~~~~~~~~~~~~~~~~~~~~~~
  network_helpers.c:440:14: note: in expansion of macro ‘ASSERT_OK_PTR’
    440 |         if (!ASSERT_OK_PTR(token, "malloc token"))
        |              ^~~~~~~~~~~~~
  In file included from /data/users/andriin/linux/tools/testing/selftests/bpf/tools/include/bpf/libbpf.h:21,
                   from bpf_util.h:9,
                   from network_helpers.c:20:
  /data/users/andriin/linux/tools/testing/selftests/bpf/tools/include/bpf/libbpf_legacy.h:113:17: note: by argument 1 of type ‘const void *’ to ‘libbpf_get_error’ declared here
    113 | LIBBPF_API long libbpf_get_error(const void *ptr);
        |                 ^~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors
  make: *** [Makefile:522: /data/users/andriin/linux/tools/testing/selftests/bpf/network_helpers.o] Error 1

This is completely bogus because libbpf_get_error() doesn't dereference
the pointer, but the only easy way to silence this is to allocate
initialized memory with calloc().

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Gal Pressman [Tue, 5 Jul 2022 11:08:37 +0000 (14:08 +0300)]
Revert "tls: rx: move counting TlsDecryptErrors for sync"

This reverts commit 284b4d93daee56dff3e10029ddf2e03227f50dbf.
When using TLS device offload and coming from the tls_device_reencrypt()
flow, an -EBADMSG error in tls_do_decryption() should not be counted
towards the TlsDecryptError counter.

Move the counter increase back to the decrypt_internal() call site in
decrypt_skb_update().
This also fixes an issue where errors returned directly from
decrypt_internal(), such as:

  if (n_sgin < 1)
          return -EBADMSG;

were not counted after the cited patch.

Fixes: 284b4d93daee ("tls: rx: move counting TlsDecryptErrors for sync")
Cc: Jakub Kicinski <[email protected]>
Reviewed-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Gal Pressman <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoMerge branch 'hinic-dev_get_stats-fixes'
David S. Miller [Wed, 6 Jul 2022 12:09:28 +0000 (13:09 +0100)]
Merge branch 'hinic-dev_get_stats-fixes'

Qiao Ma says:

====================
net: hinic: fix bugs about dev_get_stats

These patches fix 2 bugs in the hinic driver:
- fix a bug where ethtool gets wrong stats because hinic_{txq|rxq}_clean_stats() is called
- avoid a kernel hang in hinic_get_stats64()

See each patch for more information.

Changes in v4:
- removed meaningless u64_stats_sync protection in hinic_{txq|rxq}_get_stats
- merged the third patch in v2 into the first one

Changes in v3:
- fixed a compile warning reported by kernel test robot <[email protected]>

Changes in v2:
- fixed 2 more bugs (v1 is a single patch, see: https://lore.kernel.org/all/07736c2b7019b6883076a06129e06e8f7c5f7154.1656487154[email protected]/).
- to fix the extra bugs, hinic_dev.tx_stats/rx_stats are removed, so a spinlock or semaphore is no longer needed.
====================

Signed-off-by: David S. Miller <[email protected]>
2 years agonet: hinic: avoid kernel hung in hinic_get_stats64()
Qiao Ma [Tue, 5 Jul 2022 11:22:23 +0000 (19:22 +0800)]
net: hinic: avoid kernel hung in hinic_get_stats64()

When a hinic device is used as a bond slave device and the device stats
of the bond master device are read, the kernel may hang.

The kernel panic call trace is as follows:
Kernel panic - not syncing: softlockup: hung tasks
Call trace:
  native_queued_spin_lock_slowpath+0x1ec/0x31c
  dev_get_stats+0x60/0xcc
  dev_seq_printf_stats+0x40/0x120
  dev_seq_show+0x1c/0x40
  seq_read_iter+0x3c8/0x4dc
  seq_read+0xe0/0x130
  proc_reg_read+0xa8/0xe0
  vfs_read+0xb0/0x1d4
  ksys_read+0x70/0xfc
  __arm64_sys_read+0x20/0x30
  el0_svc_common+0x88/0x234
  do_el0_svc+0x2c/0x90
  el0_svc+0x1c/0x30
  el0_sync_handler+0xa8/0xb0
  el0_sync+0x148/0x180

The call trace of the task that actually caused the kernel hang is as follows:
  __switch_to+124
  __schedule+548
  schedule+72
  schedule_timeout+348
  __down_common+188
  __down+24
  down+104
  hinic_get_stats64+44 [hinic]
  dev_get_stats+92
  bond_get_stats+172 [bonding]
  dev_get_stats+92
  dev_seq_printf_stats+60
  dev_seq_show+24
  seq_read_iter+964
  seq_read+220
  proc_reg_read+164
  vfs_read+172
  ksys_read+108
  __arm64_sys_read+28
  el0_svc_common+132
  do_el0_svc+40
  el0_svc+24
  el0_sync_handler+164
  el0_sync+324

When reading device stats from a bond, the kernel calls bond_get_stats().
It first takes the spinlock bond->stats_lock and then calls
hinic_get_stats64() to collect the hinic device's stats.
However, hinic_get_stats64() calls `down(&nic_dev->mgmt_lock)` to
protect its critical section, which may schedule the current task out
while bond->stats_lock is still held; sleeping while holding a spinlock
is not allowed. If the system is under high load, the task cannot be
woken up immediately, which eventually triggers the kernel hang panic.

Since the previous patch replaced hinic_dev.tx_stats/rx_stats with local
variables in hinic_get_stats64(), nothing there needs to be protected
by the lock any more, so simply removing down()/up() is safe.

Fixes: edd384f682cc ("net-next/hinic: Add ethtool and stats")
Signed-off-by: Qiao Ma <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agonet: hinic: fix bug that ethtool get wrong stats
Qiao Ma [Tue, 5 Jul 2022 11:22:22 +0000 (19:22 +0800)]
net: hinic: fix bug that ethtool get wrong stats

hinic_get_stats64() performs two operations:
1. reads the stats from every hinic_rxq/txq and accumulates them
2. calls hinic_rxq/txq_clean_stats() to clear every rxq/txq's stats

hinic_get_stats64() itself returns correct data, because it accumulates
everything into nic_dev->rx_stats/tx_stats. get_drv_queue_stats(),
however, reads the per-queue hinic_rxq stats, which may already have
been cleared to zero by hinic_get_stats64(), so the values it reports
are wrong.

I observed hinic clearing the stats with the following command:
> watch -n 1 "ethtool -S eth4 | tail -40"

Result before:
     ...
     rxq7_pkts: 1
     rxq7_bytes: 90
     rxq7_errors: 0
     rxq7_csum_errors: 0
     rxq7_other_errors: 0
     ...
     rxq9_pkts: 11
     rxq9_bytes: 726
     rxq9_errors: 0
     rxq9_csum_errors: 0
     rxq9_other_errors: 0
     ...
     rxq11_pkts: 0
     rxq11_bytes: 0
     rxq11_errors: 0
     rxq11_csum_errors: 0
     rxq11_other_errors: 0

Result after a few seconds:
     ...
     rxq7_pkts: 0
     rxq7_bytes: 0
     rxq7_errors: 0
     rxq7_csum_errors: 0
     rxq7_other_errors: 0
     ...
     rxq9_pkts: 2
     rxq9_bytes: 132
     rxq9_errors: 0
     rxq9_csum_errors: 0
     rxq9_other_errors: 0
     ...
     rxq11_pkts: 1
     rxq11_bytes: 170
     rxq11_errors: 0
     rxq11_csum_errors: 0
     rxq11_other_errors: 0

To solve this problem, keep each queue's running totals in the queue
itself (i.e. hinic_{rxq|txq}) and simply sum all per-queue stats every
time hinic_get_stats64() is called.
With this approach there is no longer any need to clear the per-queue
stats, nor to maintain the global hinic_dev.{tx|rx}_stats.
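The per-queue approach can be sketched as follows. The names (`struct queue_stats`, `struct rtnl_stats`, `get_stats64()`) are hypothetical, only Rx counters are shown, and any synchronization around the counters is omitted; this illustrates the idea rather than the driver code.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified per-queue counters; the real hinic_rxq/txq hold more fields. */
struct queue_stats {
	uint64_t pkts;
	uint64_t bytes;
};

/* Hypothetical aggregate result, loosely modeled on rtnl_link_stats64. */
struct rtnl_stats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

/*
 * Sum the running totals of every queue into local variables. The
 * per-queue counters are only read, never cleared, so per-queue readers
 * (such as ethtool -S) keep seeing the full totals.
 */
static void get_stats64(const struct queue_stats *rxqs, size_t nq,
			struct rtnl_stats *out)
{
	uint64_t pkts = 0, bytes = 0;

	for (size_t i = 0; i < nq; i++) {
		pkts += rxqs[i].pkts;
		bytes += rxqs[i].bytes;
	}
	out->rx_packets = pkts;
	out->rx_bytes = bytes;
}
```

Calling get_stats64() repeatedly returns the same totals, because nothing is reset between calls.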

Fixes: edd384f682cc ("net-next/hinic: Add ethtool and stats")
Signed-off-by: Qiao Ma <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoMerge branch 'tls-rx-nopad-and-backlog-flushing'
David S. Miller [Wed, 6 Jul 2022 11:56:35 +0000 (12:56 +0100)]
Merge branch 'tls-rx-nopad-and-backlog-flushing'

Jakub Kicinski says:

====================
tls: rx: nopad and backlog flushing

This small series contains the two changes I've been working
towards in the previous ~50 patches a couple of months ago.

The first major change is the optional "nopad" optimization.
Currently TLS 1.3 Rx performs quite poorly because it does
not support the "zero-copy" or rather direct decrypt to a user
space buffer. Because of TLS 1.3 record padding we don't
know if a record contains data or a control message until
we decrypt it. Most records will contain data, though, so the
optimization is to try the decryption hoping it's data and
retry if it wasn't.

The performance gain from doing that is significant (~40%)
but if I'm completely honest the major reason is that we
call skb_cow_data() on the non-"zc" path. The next series
will remove the CoW, dropping the gain to only ~10%.

The second change is to flush the backlog every 128kB.
====================

Signed-off-by: David S. Miller <[email protected]>
2 years agotls: rx: periodically flush socket backlog
Jakub Kicinski [Tue, 5 Jul 2022 23:59:26 +0000 (16:59 -0700)]
tls: rx: periodically flush socket backlog

We continuously hold the socket lock during large reads and writes.
This may inflate RTT and negatively impact TCP performance.
Flush the backlog periodically. I tried to pick a flush period (128kB)
which gives a significant benefit while the max Bps rate is not yet
visibly impacted.
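A rough sketch of the periodic flush, assuming a hypothetical `process_with_flush()` loop and a callback standing in for the kernel-side backlog flush; only the 128kB period comes from the patch.

```c
#include <stddef.h>

#define FLUSH_PERIOD (128 * 1024) /* 128kB, the period chosen in the patch */

/* Hypothetical callback standing in for flushing the socket backlog. */
typedef void (*flush_fn)(void *ctx);

/*
 * Process 'total' bytes in chunks of at most 'chunk' bytes, flushing the
 * backlog whenever at least FLUSH_PERIOD bytes have been handled since
 * the previous flush. Returns the number of flushes performed.
 */
static size_t process_with_flush(size_t total, size_t chunk,
				 flush_fn flush, void *ctx)
{
	size_t since_flush = 0, flushes = 0;

	while (total > 0) {
		size_t n = total < chunk ? total : chunk;

		/* ... decrypt and copy n bytes to the user here ... */
		total -= n;
		since_flush += n;
		if (since_flush >= FLUSH_PERIOD) {
			flush(ctx);
			flushes++;
			since_flush = 0;
		}
	}
	return flushes;
}

/* Example callback: count how many times the backlog was flushed. */
static void count_flush(void *ctx)
{
	++*(size_t *)ctx;
}
```

Reading 512kB in 16kB chunks, for example, flushes 4 times, while a 100kB read never reaches the period and does not flush at all.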

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoselftests: tls: add selftest variant for pad
Jakub Kicinski [Tue, 5 Jul 2022 23:59:25 +0000 (16:59 -0700)]
selftests: tls: add selftest variant for pad

Add a self-test variant with TLS 1.3 nopad set.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agotls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3
Jakub Kicinski [Tue, 5 Jul 2022 23:59:24 +0000 (16:59 -0700)]
tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3

Since optimistic decrypt may add extra load in case of retries,
require the socket owner to explicitly opt in.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agotls: rx: support optimistic decrypt to user buffer with TLS 1.3
Jakub Kicinski [Tue, 5 Jul 2022 23:59:23 +0000 (16:59 -0700)]
tls: rx: support optimistic decrypt to user buffer with TLS 1.3

We currently don't support decryption directly to the user buffer with
TLS 1.3 because we don't know the record type or how much padding the
record contains before decryption. In practice data records
are by far the most common and padding is rarely used, so
we can assume a data record with no padding and, if we find out
that wasn't the case, retry the crypto in place (decrypt
to skb).

To safeguard against the user overwriting the content type and padding
before we can check them, attach a 1B sg entry where the last byte
of the record will land.
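Per RFC 8446, a decrypted TLS 1.3 record (TLSInnerPlaintext) is the content, followed by a 1-byte content type, followed by zero padding, so the real type is the last non-zero byte. The sketch below (hypothetical function names, and a flat buffer instead of the separate 1B sg entry) shows both the full parse and the cheap check that tells whether the "data record, no padding" guess held.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS_CT_APPLICATION_DATA 23 /* RFC 8446 ContentType application_data */

/*
 * Parse a decrypted TLSInnerPlaintext (content || type || zeros): the
 * content type is the last non-zero byte. Returns the content type and
 * stores the content length in *data_len, or returns -1 if the record is
 * all zeros (which RFC 8446 treats as a protocol error).
 */
static int tls13_parse_inner(const uint8_t *buf, size_t len, size_t *data_len)
{
	while (len > 0 && buf[len - 1] == 0)
		len--; /* strip the zero padding */
	if (len == 0)
		return -1;
	*data_len = len - 1;
	return buf[len - 1];
}

/*
 * Optimistic check: decrypting straight into the user buffer was valid
 * only if the record is application data with no padding, i.e. the very
 * last byte is the application_data content type. Otherwise retry.
 */
static int tls13_optimistic_ok(const uint8_t *buf, size_t len)
{
	return len > 0 && buf[len - 1] == TLS_CT_APPLICATION_DATA;
}
```

Note that a padded data record also fails the optimistic check, even though its type is application data; that is the "retry the crypto in place" case.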

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agotls: rx: don't include tail size in data_len
Jakub Kicinski [Tue, 5 Jul 2022 23:59:22 +0000 (16:59 -0700)]
tls: rx: don't include tail size in data_len

To make future patches easier to review, make data_len
contain the length of the data without the tail.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoMerge branch 'mptcp-path-manager-fixes'
David S. Miller [Wed, 6 Jul 2022 11:50:27 +0000 (12:50 +0100)]
Merge branch 'mptcp-path-manager-fixes'

Mat Martineau says:

====================
mptcp: Path manager fixes for 5.19

The MPTCP userspace path manager is new in 5.19, and these patches fix
some issues in that new code.

Patches 1-3 fix path manager locking issues.

Patches 4 and 5 allow userspace path managers to change priority of
established subflows using the existing MPTCP_PM_CMD_SET_FLAGS generic
netlink command. This includes the corresponding selftest update.

Patches 6 and 7 fix accounting of available endpoint IDs and the
MPTCP_MIB_RMSUBFLOW counter.
====================

Signed-off-by: David S. Miller <[email protected]>
2 years agomptcp: update MIB_RMSUBFLOW in cmd_sf_destroy
Geliang Tang [Tue, 5 Jul 2022 21:32:17 +0000 (14:32 -0700)]
mptcp: update MIB_RMSUBFLOW in cmd_sf_destroy

This patch increments the MPTCP_MIB_RMSUBFLOW MIB counter in the
userspace PM subflow destroy function mptcp_nl_cmd_sf_destroy() when
a subflow is removed.

Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agomptcp: fix local endpoint accounting
Paolo Abeni [Tue, 5 Jul 2022 21:32:16 +0000 (14:32 -0700)]
mptcp: fix local endpoint accounting

In mptcp_pm_nl_rm_addr_or_subflow() we always mark the ID
corresponding to the just-removed address as available.

The used bitmap actually tracks only the local IDs, so we must
restrict the operation to the case where a (local) subflow is removed.
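A toy model of the accounting rule, assuming IDs fit in a 64-bit bitmap; `struct pm_state`, `mark_id_used()` and `pm_rm_addr_or_subflow()` are hypothetical simplifications, not the in-kernel structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified bitmap of used local endpoint IDs (0..63). */
struct pm_state {
	uint64_t id_used;
};

static void mark_id_used(struct pm_state *pm, unsigned int id)
{
	pm->id_used |= UINT64_C(1) << id;
}

/*
 * Called when an address or a subflow is removed. The bitmap tracks only
 * local IDs, so an ID is released only when a local subflow is removed;
 * removing a remote address must leave the bitmap untouched.
 */
static void pm_rm_addr_or_subflow(struct pm_state *pm, unsigned int id,
				  bool local_subflow)
{
	if (local_subflow)
		pm->id_used &= ~(UINT64_C(1) << id);
}
```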

Fixes: a88c9e496937 ("mptcp: do not block subflows creation on errors")
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoselftests: mptcp: userspace PM support for MP_PRIO signals
Kishen Maloor [Tue, 5 Jul 2022 21:32:15 +0000 (14:32 -0700)]
selftests: mptcp: userspace PM support for MP_PRIO signals

This change updates the testing sample (pm_nl_ctl) to exercise
the updated MPTCP_PM_CMD_SET_FLAGS command for userspace PMs to
issue MP_PRIO signals over the selected subflow.

E.g.:

  ./pm_nl_ctl set 10.0.1.2 port 47234 flags backup token 823274047 rip 10.0.1.1 rport 50003

userspace_pm.sh has a new selftest that invokes this command.

Fixes: 259a834fadda ("selftests: mptcp: functional tests for the userspace PM type")
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Kishen Maloor <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agomptcp: netlink: issue MP_PRIO signals from userspace PMs
Kishen Maloor [Tue, 5 Jul 2022 21:32:14 +0000 (14:32 -0700)]
mptcp: netlink: issue MP_PRIO signals from userspace PMs

This change updates MPTCP_PM_CMD_SET_FLAGS to allow userspace PMs
to issue MP_PRIO signals over a specific subflow selected by
the connection token, local and remote address+port.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/286
Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Kishen Maloor <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agomptcp: Acquire the subflow socket lock before modifying MP_PRIO flags
Mat Martineau [Tue, 5 Jul 2022 21:32:13 +0000 (14:32 -0700)]
mptcp: Acquire the subflow socket lock before modifying MP_PRIO flags

When setting up a subflow's flags for sending MP_PRIO MPTCP options, the
subflow socket lock was not held while reading and modifying several
struct members that are also read and modified in mptcp_write_options().

Acquire the subflow socket lock earlier and send the MP_PRIO ACK with
that lock already acquired. Add a new variant of the
mptcp_subflow_send_ack() helper to use with the subflow lock held.
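The "lock-held variant" pattern can be sketched in userspace with a pthread mutex standing in for the subflow socket lock. `__subflow_send_ack()` and the other names are hypothetical (the commit does not name the new helper); the leading underscores follow the common kernel convention for "caller holds the lock".

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified subflow state. */
struct subflow {
	pthread_mutex_t lock;
	bool backup;       /* also read by the option writer */
	bool send_mp_prio; /* also read by the option writer */
	int acks_sent;
};

/* Caller must hold sf->lock. */
static void __subflow_send_ack(struct subflow *sf)
{
	sf->acks_sent++; /* stands in for building and sending the ACK */
}

/* Convenience variant that takes the lock itself. */
static void subflow_send_ack(struct subflow *sf)
{
	pthread_mutex_lock(&sf->lock);
	__subflow_send_ack(sf);
	pthread_mutex_unlock(&sf->lock);
}

/*
 * Update the MP_PRIO-related fields and send the ACK under a single lock
 * acquisition, so a concurrent option writer never observes a
 * half-updated state and the lock is not dropped in between.
 */
static void subflow_set_backup(struct subflow *sf, bool backup)
{
	pthread_mutex_lock(&sf->lock);
	sf->backup = backup;
	sf->send_mp_prio = true;
	__subflow_send_ack(sf);
	pthread_mutex_unlock(&sf->lock);
}
```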

Fixes: 067065422fcd ("mptcp: add the outgoing MP_PRIO support")
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agomptcp: Avoid acquiring PM lock for subflow priority changes
Mat Martineau [Tue, 5 Jul 2022 21:32:12 +0000 (14:32 -0700)]
mptcp: Avoid acquiring PM lock for subflow priority changes

The in-kernel path manager code for changing subflow flags acquired both
the msk socket lock and the PM lock when possibly changing the "backup"
and "fullmesh" flags. mptcp_pm_nl_mp_prio_send_ack() does not access
anything protected by the PM lock, and it must release and reacquire
the PM lock.

By pushing the PM lock to where it is needed in mptcp_pm_nl_fullmesh(),
the lock is only acquired when the fullmesh flag is changed and the
backup flag code no longer has to release and reacquire the PM lock. The
change in locking context requires the MIB update to be modified - move
that to a better location instead.

This change also makes it possible to call
mptcp_pm_nl_mp_prio_send_ack() for the userspace PM commands without
manipulating the in-kernel PM lock.

Fixes: 0f9f696a502e ("mptcp: add set_flags command in PM netlink")
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agomptcp: fix locking in mptcp_nl_cmd_sf_destroy()
Paolo Abeni [Tue, 5 Jul 2022 21:32:11 +0000 (14:32 -0700)]
mptcp: fix locking in mptcp_nl_cmd_sf_destroy()

The user-space PM subflow removal path uses a couple of helpers
that must be called under the msk socket lock, but the current
code does not satisfy that requirement.

Change the existing lock scope so that the relevant code is under
its protection.

Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/287
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoMerge branch 'act_police-continue-offload-fix'
David S. Miller [Wed, 6 Jul 2022 11:44:39 +0000 (12:44 +0100)]
Merge branch 'act_police-continue-offload-fix'

Vlad Buslov says:

====================
net: Fix police 'continue' action offload

TC act_police with 'continue' action had been supported by the mlx5 matchall
classifier offload implementation for some time. However, 'continue' was
assumed implicitly and recently got broken in multiple places. Fix it in
both TC hardware offload validation code and mlx5 driver.
====================

Signed-off-by: David S. Miller <[email protected]>
2 years agonet/mlx5e: Fix matchall police parameters validation
Vlad Buslov [Mon, 4 Jul 2022 20:44:05 +0000 (22:44 +0200)]
net/mlx5e: Fix matchall police parameters validation

The referenced commit prepared the code for an upcoming extension that
allows mlx5 to offload a police action attached to a flower classifier.
However, for the existing matchall classifier offload, the validation
should be reversed, since FLOW_ACTION_CONTINUE is the only supported
notexceed police action type. Fix the problem by allowing
FLOW_ACTION_CONTINUE for the police action and extending
scan_tc_matchall_fdb_actions() to only allow such actions with the
matchall classifier.
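A simplified sketch of the matchall rule described above: only a single police action whose notexceed action is 'continue' is accepted. The enum and struct are hypothetical local stand-ins for the flow_offload types, and only the single-police-action case is modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical local copy of a few action IDs. */
enum act_id { ACT_ACCEPT, ACT_DROP, ACT_PIPE, ACT_CONTINUE, ACT_JUMP };

struct action {
	bool is_police;
	enum act_id notexceed; /* meaningful only for police actions */
};

/*
 * Matchall offload: accept exactly one police action, and only if its
 * notexceed action is 'continue'.
 */
static bool matchall_actions_ok(const struct action *acts, size_t n)
{
	if (n != 1)
		return false;
	return acts[0].is_police && acts[0].notexceed == ACT_CONTINUE;
}
```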

Fixes: d97b4b105ce7 ("flow_offload: reject offload for all drivers with invalid police parameters")
Signed-off-by: Vlad Buslov <[email protected]>
Acked-by: Saeed Mahameed <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agonet/sched: act_police: allow 'continue' action offload
Vlad Buslov [Mon, 4 Jul 2022 20:44:04 +0000 (22:44 +0200)]
net/sched: act_police: allow 'continue' action offload

Offloading police with action TC_ACT_UNSPEC was erroneously disabled even
though it was supported by the mlx5 matchall offload implementation, which
didn't verify the action type but instead assumed that any single police
action attached to a matchall classifier is a 'continue' action. The lack
of an action type check made it non-obvious what the mlx5 matchall
implementation actually supports, and caused implementers and reviewers of
the referenced commits to disallow it as part of the improved validation
code.

Fixes: b8cd5831c61c ("net: flow_offload: add tc police action parameters")
Fixes: b50e462bc22d ("net/sched: act_police: Add extack messages for offload failure")
Signed-off-by: Vlad Buslov <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoMerge branch 'octeontx2-af-next'
David S. Miller [Wed, 6 Jul 2022 07:16:48 +0000 (08:16 +0100)]
Merge branch 'octeontx2-af-next'

Ratheesh Kannoth says:

====================
octeontx2: *** Exact Match Table and Field hash ***

*** Exact match table and Field hash support for CN10KB silicon ***

Ratheesh Kannoth (11):

This patch series enables the exact match table in CN10KB silicon. Legacy
silicon used NPC MCAM to do packet field/channel matching for NPC rules.
NPC MCAM resources were exhausted as customer use cases increased.
Supporting many DMAC filters becomes a challenge, as the RPM-based filter
count is small. The exact match table consists of a 4-way 2K-entry table
and a 32-entry fully associative CAM table. The second table handles hash
collision overflows from the 4-way 2K-entry table. Enabling the exact
match table appends the Hit/Miss status to the KEX key. This can be used
to match a more generic rule in NPC MCAM that drops those packets, rather
than having a DMAC drop rule for each DMAC entry in NPC MCAM.
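A software model of the two-level lookup may help. It assumes that "4-way 2K-entry" means 512 sets of 4 ways, uses an arbitrary mixing hash (the hardware hash is not described here), and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sizes: 512 sets x 4 ways = 2K entries, plus a 32-entry CAM. */
#define EM_WAYS  4
#define EM_SETS  512
#define CAM_SIZE 32

struct em_table {
	uint64_t way[EM_SETS][EM_WAYS];
	bool     way_valid[EM_SETS][EM_WAYS];
	uint64_t cam[CAM_SIZE];
	bool     cam_valid[CAM_SIZE];
};

/* Arbitrary mixing hash, standing in for the unspecified hardware hash. */
static unsigned int em_hash(uint64_t key)
{
	key ^= key >> 33;
	key *= UINT64_C(0xff51afd7ed558ccd);
	key ^= key >> 33;
	return (unsigned int)(key % EM_SETS);
}

/* Insert into the hashed set first; on a full set, fall back to the CAM. */
static bool em_insert(struct em_table *t, uint64_t key)
{
	unsigned int set = em_hash(key);

	for (int w = 0; w < EM_WAYS; w++) {
		if (!t->way_valid[set][w]) {
			t->way[set][w] = key;
			t->way_valid[set][w] = true;
			return true;
		}
	}
	for (int i = 0; i < CAM_SIZE; i++) {
		if (!t->cam_valid[i]) {
			t->cam[i] = key;
			t->cam_valid[i] = true;
			return true;
		}
	}
	return false; /* both tables are full */
}

/* The result plays the role of the hit/miss bit appended to the KEX key. */
static bool em_lookup(const struct em_table *t, uint64_t key)
{
	unsigned int set = em_hash(key);

	for (int w = 0; w < EM_WAYS; w++)
		if (t->way_valid[set][w] && t->way[set][w] == key)
			return true;
	for (int i = 0; i < CAM_SIZE; i++)
		if (t->cam_valid[i] && t->cam[i] == key)
			return true;
	return false;
}
```

A fifth key that hashes to an already full set ends up in the CAM but is still found by em_lookup(), which is the collision-overflow role described above.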

  octeontx2-af: Exact match support
  octeontx2-af: Exact match scan from kex profile
  octeontx2-af: devlink configuration support
  octeontx2-af: FLR handler for exact match table.
  octeontx2-af: Drop rules for NPC MCAM
  octeontx2-af: Debugsfs support for exact match.
  octeontx2: Modify mbox request and response structures
  octeontx2-af: Wrapper functions for mac addr add/del/update/reset
  octeontx2-af: Invoke exact match functions if supported
  octeontx2-pf: Add support for exact match table.
  octeontx2-af: Enable Exact match flag in kex profile

Suman Ghosh (1):

The CN10KB variant of the CN10K series of silicon supports
a new feature wherein a large protocol field
(e.g. a 128-bit IPv6 DIP) can be condensed into a small
32-bit hash. This saves a lot of space in the MCAM key
and allows the user to add more protocol fields to the filter.
A maximum of two such protocol fields can be hashed.
This patch adds support for hashing IPv6 SIP and/or DIP.
====================

Signed-off-by: David S. Miller <[email protected]>
2 years agoocteontx2-af: Enable Exact match flag in kex profile
Ratheesh Kannoth [Wed, 6 Jul 2022 03:44:42 +0000 (09:14 +0530)]
octeontx2-af: Enable Exact match flag in kex profile

Enable the EXACT match flag in the default KEX profile. Since
there is no space left in the key, NPC_PARSE_NIBBLE_ERRCODE
is removed.

Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoocteontx2-pf: Add support for exact match table.
Ratheesh Kannoth [Wed, 6 Jul 2022 03:44:41 +0000 (09:14 +0530)]
octeontx2-pf: Add support for exact match table.

The NPC exact match table can support more entries than the RPM
DMAC filters. This requires the field sizes of the DMAC filter count
and index to be increased.

Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoocteontx2-af: Invoke exact match functions if supported
Ratheesh Kannoth [Wed, 6 Jul 2022 03:44:40 +0000 (09:14 +0530)]
octeontx2-af: Invoke exact match functions if supported

If the exact match table is supported, call functions to add/del/update
entries in the exact match table instead of the RPM DMAC filters.

Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoocteontx2-af: Wrapper functions for MAC addr add/del/update/reset
Ratheesh Kannoth [Wed, 6 Jul 2022 03:44:39 +0000 (09:14 +0530)]
octeontx2-af: Wrapper functions for MAC addr add/del/update/reset

These functions are wrappers for MAC address add/del/update/reset in the
exact match table. They are invoked from the mbox handler routines
if the exact match table is supported and enabled.

Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoocteontx2: Modify mbox request and response structures
Ratheesh Kannoth [Wed, 6 Jul 2022 03:44:38 +0000 (09:14 +0530)]
octeontx2: Modify mbox request and response structures

Exact match table modification requires wider fields, as the table has
more slots to fill in. Modifying an entry in the exact match table may
cause a hash collision, in which case the entry may need to be deleted
from the 4-way 2K-entry table and added to the fully associative
32-entry CAM table.

Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoocteontx2-af: Debugsfs support for exact match.
Ratheesh Kannoth [Wed, 6 Jul 2022 03:44:37 +0000 (09:14 +0530)]
octeontx2-af: Debugsfs support for exact match.

Three debugfs files are created:
1. General information on exact match table
2. Exact match table entries.
3. NPC mcam drop on hit count stats.

Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoocteontx2-af: Drop rules for NPC MCAM
Ratheesh Kannoth [Wed, 6 Jul 2022 03:44:36 +0000 (09:14 +0530)]
octeontx2-af: Drop rules for NPC MCAM

The NPC exact match table installs drop-on-hit rules in
NPC MCAM for each channel. Each rule has the broadcast and multicast
bits cleared, the exact match bit cleared and the channel bits
set. If the exact match table hit bit is 0, the corresponding NPC MCAM
drop rule is hit and the packet is dropped.
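The drop-on-miss rule is an ordinary masked (TCAM-style) match. The sketch below uses a made-up 32-bit key layout and hypothetical names; the real KEX/MCAM key layout is not described in this commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout of a tiny MCAM key, for illustration only. */
#define KEY_BCAST      (UINT32_C(1) << 0)
#define KEY_MCAST      (UINT32_C(1) << 1)
#define KEY_EM_HIT     (UINT32_C(1) << 2)
#define KEY_CHAN_SHIFT 8
#define KEY_CHAN_MASK  (UINT32_C(0xfff) << KEY_CHAN_SHIFT)

struct mcam_rule {
	uint32_t key;  /* required bit values */
	uint32_t mask; /* which bits are compared */
};

/* TCAM-style match: every masked bit must equal the rule's key bit. */
static bool mcam_match(const struct mcam_rule *r, uint32_t pkt_key)
{
	return ((pkt_key ^ r->key) & r->mask) == 0;
}

/*
 * Per-channel drop rule: channel bits equal to 'chan', and the broadcast,
 * multicast and exact-match-hit bits all required to be 0. A unicast
 * packet whose DMAC missed in the exact match table hits this rule.
 */
static struct mcam_rule make_drop_rule(uint32_t chan)
{
	struct mcam_rule r = {
		.key  = (chan << KEY_CHAN_SHIFT) & KEY_CHAN_MASK,
		.mask = KEY_CHAN_MASK | KEY_BCAST | KEY_MCAST | KEY_EM_HIT,
	};
	return r;
}
```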

Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoocteontx2-af: FLR handler for exact match table.
Ratheesh Kannoth [Wed, 6 Jul 2022 03:44:35 +0000 (09:14 +0530)]
octeontx2-af: FLR handler for exact match table.

The FLR handler should remove/free all exact match table resources
corresponding to each interface.

Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoocteontx2-af: devlink configuration support
Ratheesh Kannoth [Wed, 6 Jul 2022 03:44:34 +0000 (09:14 +0530)]
octeontx2-af: devlink configuration support

CN10KB silicon supports the exact match feature. This feature can be
disabled through devlink configuration. The devlink command fails if
DMAC filter rules are already present. Once the feature is disabled,
legacy RPM-based DMAC filters are configured.

Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoocteontx2-af: Exact match scan from kex profile
Ratheesh Kannoth [Wed, 6 Jul 2022 03:44:33 +0000 (09:14 +0530)]
octeontx2-af: Exact match scan from kex profile

CN10KB silicon supports the exact match table. When scanning the KEX
profile, check whether the exact match feature is enabled and then set
the profile masks accordingly.

These KEX profile masks are required to configure the NPC
MCAM drop rules. If there is a miss in the exact match table,
these drop rules drop the packet.

Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoocteontx2-af: Exact match support
Ratheesh Kannoth [Wed, 6 Jul 2022 03:44:32 +0000 (09:14 +0530)]
octeontx2-af: Exact match support

CN10KB silicon supports an exact match table. This table can be
used to match a value of up to 64 bits from the KPU parsed output.
A hit/miss in the exact match table can be used as a KEX key to
NPC MCAM.

This patch uses the exact match table to increase the number of
DMAC filters supported. An NPC MCAM entry is no longer needed for each
of these DMAC entries, as they are populated in the exact match table.

This patch implements the following:

1. Initialization of exact match table only for CN10KB.
2. Add/del/update interface function for exact match table.

Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoocteontx2-af: Use hashed field in MCAM key
Ratheesh Kannoth [Wed, 6 Jul 2022 03:44:31 +0000 (09:14 +0530)]
octeontx2-af: Use hashed field in MCAM key

The CN10KB variant of the CN10K series of silicon supports
a new feature wherein a large protocol field
(e.g. a 128-bit IPv6 DIP) can be condensed into a small
32-bit hash. This saves a lot of space in the MCAM key
and allows the user to add more protocol fields to the filter.
A maximum of two such protocol fields can be hashed.
This patch adds support for hashing IPv6 SIP and/or DIP.
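The commit does not say which hash function the hardware uses. Purely as an assumption, the sketch below uses a Toeplitz-style hash (the scheme commonly used for RSS) with an arbitrary key to show how a 128-bit IPv6 address can be condensed into a single 32-bit value; the function names and key bytes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz-style hash: for every set input bit (MSB first), XOR in the
 * 32-bit window of the key that starts at that bit position. The key
 * must be at least data_len + 4 bytes long.
 */
static uint32_t toeplitz32(const uint8_t *key, const uint8_t *data,
			   size_t data_len)
{
	uint32_t result = 0;
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | key[3];

	for (size_t i = 0; i < data_len; i++) {
		for (int b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				result ^= window;
			/* slide the window by one bit, pulling in the next key bit */
			window = (window << 1) | ((key[i + 4] >> b) & 1u);
		}
	}
	return result;
}

/* Arbitrary 20-byte example key: 16 input bytes need 16 + 4 key bytes. */
static const uint8_t example_key[20] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, 0x41, 0x67,
	0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, 0xd0, 0xca, 0x2b, 0xcb,
};

/* Condense one 128-bit IPv6 address (SIP or DIP) into a 32-bit value. */
static uint32_t hash_ipv6_field(const uint8_t addr[16])
{
	return toeplitz32(example_key, addr, 16);
}
```

Hashing SIP and DIP separately this way yields two 32-bit values, matching the "maximum of two" hashed fields mentioned above.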

Signed-off-by: Suman Ghosh <[email protected]>
Signed-off-by: Ratheesh Kannoth <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agoMerge branch 'nfp-tso'
David S. Miller [Wed, 6 Jul 2022 07:15:51 +0000 (08:15 +0100)]
Merge branch 'nfp-tso'

Simon Horman says:

====================
nfp: enable TSO by default

This short series enables TSO by default on all NICs supported by the NFP
driver.
====================

Signed-off-by: David S. Miller <[email protected]>
2 years agonfp: enable TSO by default for nfp netdev
Simon Horman [Tue, 5 Jul 2022 07:36:04 +0000 (08:36 +0100)]
nfp: enable TSO by default for nfp netdev

We can benefit from TSO when the host CPU is not powerful enough,
so enable it by default now.

Signed-off-by: Yinjun Zhang <[email protected]>
Reviewed-by: Louis Peens <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agonfp: allow TSO packets with metadata prepended in NFDK path
Yinjun Zhang [Tue, 5 Jul 2022 07:36:03 +0000 (08:36 +0100)]
nfp: allow TSO packets with metadata prepended in NFDK path

Packets with prepended metadata can be correctly handled by the
firmware when TSO is enabled, so remove the error path and
related comments. Since no existing firmware uses prepended
metadata, there is no need to add a compatibility check
here.

Signed-off-by: Yinjun Zhang <[email protected]>
Reviewed-by: Louis Peens <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2 years agobpftool: Remove zlib feature test from Makefile
Quentin Monnet [Tue, 5 Jul 2022 20:04:56 +0000 (21:04 +0100)]
bpftool: Remove zlib feature test from Makefile

The feature test to detect the availability of zlib in bpftool's
Makefile does not bring much. The library is not optional: it may or may
not be required alongside libbfd for disassembling instructions, but in
any case it is necessary to build feature.o or even libbpf, on which
bpftool depends.

If we remove the feature test, we lose the nicely formatted error
message, but we get a compiler error about "zlib.h: No such file or
directory", which is equally informative. Let's get rid of the test.

Suggested-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Quentin Monnet <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years agoMerge branch 'cleanup the legacy probe_event on failed scenario'
Andrii Nakryiko [Wed, 6 Jul 2022 04:20:42 +0000 (21:20 -0700)]
Merge branch 'cleanup the legacy probe_event on failed scenario'

Chuang Wang says:

====================
In one potential scenario, when an error is returned after
add_uprobe_event_legacy() in perf_event_uprobe_open_legacy(), or when
bpf_program__attach_perf_event_opts() in
bpf_program__attach_uprobe_opts() returns an error, the uprobe_event
that was previously created is not cleaned up.

The legacy kprobe_event has a similar problem.

With these patches, whenever an error is returned, the created
kprobe_event/uprobe_event is cleaned up.

V1 -> v3:

- add detail commits
- call remove_kprobe_event_legacy() on failed bpf_program__attach_perf_event_opts()

v3 -> v4:

- cleanup the legacy kprobe_event on failed add/attach_event
====================

Signed-off-by: Andrii Nakryiko <[email protected]>
2 years agolibbpf: Cleanup the legacy uprobe_event on failed add/attach_event()
Chuang Wang [Wed, 29 Jun 2022 15:18:47 +0000 (23:18 +0800)]
libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()

In one potential scenario, when an error is returned after
add_uprobe_event_legacy() in perf_event_uprobe_open_legacy(), or when
bpf_program__attach_perf_event_opts() in
bpf_program__attach_uprobe_opts() returns an error, the uprobe_event
that was previously created is not cleaned up.

Fix this by calling remove_uprobe_event_legacy() on these error
paths.
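The cleanup pattern can be sketched with hypothetical stand-ins for add_uprobe_event_legacy(), remove_uprobe_event_legacy() and the attach step; the goto-based unwind is the usual C idiom for this, not necessarily the exact structure of the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the tracefs event and the perf attach step. */
struct probe_state {
	bool event_exists; /* set by add_event(), cleared by remove_event() */
	bool attach_fails; /* test knob: make the attach step fail */
};

static int add_event(struct probe_state *s)
{
	s->event_exists = true;
	return 0;
}

static void remove_event(struct probe_state *s)
{
	s->event_exists = false;
}

static int attach_event(struct probe_state *s)
{
	return s->attach_fails ? -EBUSY : 0;
}

/*
 * Every failure after add_event() succeeded must undo it; the goto label
 * keeps that cleanup in one place.
 */
static int attach_probe(struct probe_state *s)
{
	int err;

	err = add_event(s);
	if (err)
		return err;

	err = attach_event(s);
	if (err)
		goto err_clean_legacy;

	return 0;

err_clean_legacy:
	remove_event(s);
	return err;
}
```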

Signed-off-by: Chuang Wang <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years agolibbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
Chuang Wang [Wed, 29 Jun 2022 15:18:46 +0000 (23:18 +0800)]
libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()

Use "type" as opposed to "err" in pr_warn() after
determine_uprobe_perf_type_legacy() returns an error.

Signed-off-by: Chuang Wang <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years agolibbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
Chuang Wang [Wed, 29 Jun 2022 15:18:45 +0000 (23:18 +0800)]
libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()

Before commit 0bc11ed5ab60 ("kprobes: Allow kprobes coexist with
livepatch"), if livepatch and a kprobe coexist on the same function
entry, creating a kprobe_event with add_kprobe_event_legacy() succeeds
and a trace event (e.g. /debugfs/tracing/events/kprobe/XXX) exists,
but perf_event_open() returns an error because both livepatch and
kprobe use FTRACE_OPS_FL_IPMODIFY. For example:

1) add a livepatch

$ insmod livepatch-XXX.ko

2) add a kprobe using tracefs API (i.e. add_kprobe_event_legacy)

$ echo 'p:mykprobe XXX' > /sys/kernel/debug/tracing/kprobe_events

3) enable this kprobe (i.e. sys_perf_event_open)

This will return an error, -EBUSY.

As Andrii Nakryiko pointed out, a few error paths in
bpf_program__attach_kprobe_opts() also need to call
remove_kprobe_event_legacy().

With this patch, whenever an error is returned after
add_kprobe_event_legacy() or bpf_program__attach_perf_event_opts(),
the created kprobe_event is cleaned up.

Signed-off-by: Chuang Wang <[email protected]>
Signed-off-by: Jingren Zhou <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years ago | Merge branch 'Introduce type match support'
Andrii Nakryiko [Wed, 6 Jul 2022 03:24:13 +0000 (20:24 -0700)]
Merge branch 'Introduce type match support'

Daniel Müller says:

====================

This patch set proposes the addition of a new way of performing type queries to
BPF. It introduces the "type matches" relation, similar to what is already
present with "type exists" (in the form of bpf_core_type_exists).

"type exists" performs fairly superficial checking, mostly concerned with
whether a type exists in the kernel and is of the same kind (enum/struct/...).
Notably, compatibility checks for members of composite types are lacking.

The newly introduced "type matches" (bpf_core_type_matches) fills this gap in
that it performs stricter checks: compatibility of members and existence of
similarly named enum variants is checked as well. E.g., given these definitions:

struct task_struct___og { int pid; int tgid; };

struct task_struct___foo { int foo; };

'task_struct___og' would "match" the kernel type 'task_struct', because the
members match up, while 'task_struct___foo' would not match, because the
kernel's 'task_struct' has no member named 'foo'.

More precisely, the "type match" relation is defined as follows (copied from
source):
- modifiers and typedefs are stripped (and, hence, effectively ignored)
- generally speaking types need to be of the same kind (struct vs. struct,
  union vs. union, etc.)
  - exceptions are struct/union behind a pointer which could also match a
    forward declaration of a struct or union, respectively, and enum vs.
    enum64 (see below)
Then, depending on type:
- integers:
  - match if size and signedness match
- arrays & pointers:
  - target types are recursively matched
- structs & unions:
  - local members need to exist in target with the same name
  - for each member we recursively check match unless it is already behind a
    pointer, in which case we only check matching names and compatible kind
- enums:
  - local variants have to have a match in target by symbolic name (but not
    numeric value)
  - size has to match (but enum may match enum64 and vice versa)
- function pointers:
  - number and position of arguments in local type has to match target
  - for each argument and the return value we recursively check match
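The core of the relation can be sketched over a toy type representation. This is a simplified restatement of the rules above, not the relo_core.c implementation: struct toy_type and toy_matches() are hypothetical, and typedef/modifier stripping, forward declarations, enum64, function pointers and "___flavor" suffix handling are deliberately left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum toy_kind { TOY_INT, TOY_PTR, TOY_STRUCT, TOY_ENUM };

struct toy_type;
struct toy_member { const char *name; const struct toy_type *type; };

struct toy_type {
	enum toy_kind kind;
	const char *name;                 /* struct name, compared behind pointers */
	int size;                         /* TOY_INT, TOY_ENUM */
	bool is_signed;                   /* TOY_INT */
	const struct toy_type *target;    /* TOY_PTR */
	const struct toy_member *members; /* TOY_STRUCT */
	const char *const *variants;      /* TOY_ENUM variant names */
	int n;                            /* number of members or variants */
};

static const struct toy_member *find_member(const struct toy_type *t, const char *name)
{
	for (int i = 0; i < t->n; i++)
		if (strcmp(t->members[i].name, name) == 0)
			return &t->members[i];
	return NULL;
}

static bool has_variant(const struct toy_type *t, const char *name)
{
	for (int i = 0; i < t->n; i++)
		if (strcmp(t->variants[i], name) == 0)
			return true;
	return false;
}

static bool toy_matches(const struct toy_type *l, const struct toy_type *t, bool behind_ptr)
{
	if (l->kind != t->kind)
		return false;                     /* same kind required */
	switch (l->kind) {
	case TOY_INT:                             /* size and signedness */
		return l->size == t->size && l->is_signed == t->is_signed;
	case TOY_PTR:                             /* recurse into target */
		return toy_matches(l->target, t->target, true);
	case TOY_STRUCT:
		if (behind_ptr)                   /* only name and kind */
			return strcmp(l->name, t->name) == 0;
		for (int i = 0; i < l->n; i++) {  /* every local member by name */
			const struct toy_member *m = find_member(t, l->members[i].name);
			if (!m || !toy_matches(l->members[i].type, m->type, false))
				return false;
		}
		return true;
	case TOY_ENUM:                            /* size + variant names only */
		if (l->size != t->size)
			return false;
		for (int i = 0; i < l->n; i++)
			if (!has_variant(t, l->variants[i]))
				return false;
		return true;
	}
	return false;
}
```

With this model, a local struct containing `int pid; int tgid;` matches a target struct that has those members plus others, while a local struct with a member `foo` the target lacks does not.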

Enabling this feature requires a new relocation to be made known to the
compiler. This is being taken care of for LLVM as part of
https://reviews.llvm.org/D126838.

If applied, among other things, usage of this functionality could have helped
flag issues such as the one discussed here
https://lore.kernel.org/all/93a20759600c05b6d9e4359a1517c88e06b44834[email protected]/
earlier.

Suggested-by: Andrii Nakryiko <[email protected]>
---
Changelog:
v2 -> v3:
- renamed btfgen_mark_types_match
- covered BTF_KIND_RESTRICT in type match marking logic
- used bpf_core_names_match in more places
- reworked "behind pointer" logic
- added test using live task_struct

v1 -> v2:
- deduplicated and moved core algorithm into relo_core.c
- adjusted bpf_core_names_match to get btf_type passed in
- removed some length equality checks before strncmp usage
- correctly use kflag from targ_t instead of local_t
- added comment for meaning of kflag w/ FWD kind
- __u32 -> u32
- handle BTF_KIND_FWD properly in bpftool marking logic
- rebased
====================

Signed-off-by: Andrii Nakryiko <[email protected]>
2 years ago | selftests/bpf: Add type match test against kernel's task_struct
Daniel Müller [Tue, 28 Jun 2022 16:01:27 +0000 (16:01 +0000)]
selftests/bpf: Add type match test against kernel's task_struct

This change extends the existing core_reloc/kernel test to include a
type match check of a local task_struct against the kernel's definition
-- which we assume to succeed.

Signed-off-by: Daniel Müller <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years ago | selftests/bpf: Add nested type to type based tests
Daniel Müller [Tue, 28 Jun 2022 16:01:26 +0000 (16:01 +0000)]
selftests/bpf: Add nested type to type based tests

This change extends the type based tests with another struct type (in
addition to a_struct) to check relocations against: a_complex_struct.
This type is nested more deeply to provide additional coverage of
certain paths in the type match logic.

Signed-off-by: Daniel Müller <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years ago | selftests/bpf: Add test checking more characteristics
Daniel Müller [Tue, 28 Jun 2022 16:01:25 +0000 (16:01 +0000)]
selftests/bpf: Add test checking more characteristics

This change adds another type-based self-test that specifically aims to
test some more characteristics of the TYPE_MATCH logic. Specifically, it
covers a few more potential differences between types, such as different
orders, enum variant values, and integer signedness.

Signed-off-by: Daniel Müller <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years ago | selftests/bpf: Add type-match checks to type-based tests
Daniel Müller [Tue, 28 Jun 2022 16:01:24 +0000 (16:01 +0000)]
selftests/bpf: Add type-match checks to type-based tests

Now that we have type-match logic in both libbpf and the kernel, this
change adjusts the existing BPF self tests to check this functionality.
Specifically, we extend the existing type-based tests to check the
previously introduced bpf_core_type_matches macro.

Signed-off-by: Daniel Müller <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years ago | libbpf: add bpf_core_type_matches() helper macro
Andrii Nakryiko [Wed, 6 Jul 2022 03:56:48 +0000 (20:56 -0700)]
libbpf: add bpf_core_type_matches() helper macro

This patch finalizes support for the proposed type match relation in libbpf by
adding bpf_core_type_matches() macro which emits TYPE_MATCH relocation.

Clang support for this relocation was added in [0].

  [0] https://reviews.llvm.org/D126838

Signed-off-by: Daniel Müller <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years ago | bpf, libbpf: Add type match support
Daniel Müller [Tue, 28 Jun 2022 16:01:21 +0000 (16:01 +0000)]
bpf, libbpf: Add type match support

This patch adds support for the proposed type match relation to
relo_core where it is shared between userspace and kernel. It plumbs
through both kernel-side and libbpf-side support.

The matching relation is defined as follows (copy from source):
- modifiers and typedefs are stripped (and, hence, effectively ignored)
- generally speaking types need to be of the same kind (struct vs. struct,
  union vs. union, etc.)
  - exceptions are struct/union behind a pointer which could also match a
    forward declaration of a struct or union, respectively, and enum vs.
    enum64 (see below)
Then, depending on type:
- integers:
  - match if size and signedness match
- arrays & pointers:
  - target types are recursively matched
- structs & unions:
  - local members need to exist in target with the same name
  - for each member we recursively check match unless it is already behind a
    pointer, in which case we only check matching names and compatible kind
- enums:
  - local variants have to have a match in target by symbolic name (but not
    numeric value)
  - size has to match (but enum may match enum64 and vice versa)
- function pointers:
  - number and position of arguments in local type has to match target
  - for each argument and the return value we recursively check match

Signed-off-by: Daniel Müller <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years ago | bpftool: Honor BPF_CORE_TYPE_MATCHES relocation
Daniel Müller [Tue, 28 Jun 2022 16:01:19 +0000 (16:01 +0000)]
bpftool: Honor BPF_CORE_TYPE_MATCHES relocation

bpftool needs to know about the newly introduced BPF_CORE_TYPE_MATCHES
relocation for its 'gen min_core_btf' command to work properly in the
presence of this relocation.
Specifically, we need to make sure to mark types and fields so that they
are present in the minimized BTF for "type match" checks to work out.
However, contrary to the existing btfgen_record_field_relo, we need to
rely on the BTF -- and not the spec -- to find fields. With this change
we handle this new variant correctly. The functionality will be tested
with follow on changes to BPF selftests, which already run against a
minimized BTF created with bpftool.

Signed-off-by: Daniel Müller <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years ago | bpf: Introduce TYPE_MATCH related constants/macros
Daniel Müller [Tue, 28 Jun 2022 16:01:18 +0000 (16:01 +0000)]
bpf: Introduce TYPE_MATCH related constants/macros

In order to provide type match support we require a new type of
relocation which, in turn, requires toolchain support. Recent LLVM/Clang
versions support a new value for the last argument to the
__builtin_preserve_type_info builtin, for example.
With this change we introduce the necessary constants into relevant
header files, mirroring what the compiler may support.

Signed-off-by: Daniel Müller <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years ago | net: asix: change the type of asix_set_sw/hw_mii to static
Zhengchao Shao [Mon, 4 Jul 2022 12:34:48 +0000 (20:34 +0800)]
net: asix: change the type of asix_set_sw/hw_mii to static

asix_set_sw_mii() and asix_set_hw_mii() are not called from other files,
so make them static.

Signed-off-by: Zhengchao Shao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years ago | net: lan966x: hardcode the number of external ports
Michael Walle [Mon, 4 Jul 2022 15:36:54 +0000 (17:36 +0200)]
net: lan966x: hardcode the number of external ports

Instead of counting the child nodes in the device tree, hardcode the
number of ports in the driver itself.  The counting won't work at all
if an ethernet port is marked as disabled, e.g. because it is not
connected on the board at all.

It turns out that the LAN9662 and LAN9668 use the same switching IP
with the same synthesis parameters. The only difference is that the
output ports are not connected. Thus, we can just hardcode the
number of physical ports to 8.
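One way counting can go wrong is sketched below, assuming the count only includes enabled child nodes while ports are still indexed by their `reg` value. struct toy_port_node, toy_count_ports() and toy_port_fits() are hypothetical; only the value 8 comes from the text above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hardcoded, as described: both SoCs use the same switch IP with 8 ports. */
#define TOY_NUM_PHYS_PORTS 8

/* Hypothetical device-tree child: port index ('reg') and enabled status. */
struct toy_port_node { int reg; bool enabled; };

/* Old approach (sketch): size the port table by counting enabled children. */
static int toy_count_ports(const struct toy_port_node *nodes, int n)
{
	int count = 0;

	for (int i = 0; i < n; i++)
		if (nodes[i].enabled)
			count++;
	return count;
}

/* A port can be stored only if its index fits in a table of num_ports. */
static bool toy_port_fits(int reg, int num_ports)
{
	return reg >= 0 && reg < num_ports;
}
```

With ports 0 and 2 enabled and port 1 disabled, the counted size is 2, so port 2 no longer fits; with the hardcoded size of 8 it does.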

Fixes: db8bcaad5393 ("net: lan966x: add the basic lan966x driver")
Signed-off-by: Michael Walle <[email protected]>
Reviewed-by: Horatiu Vultur <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years ago | net: dsa: felix: build as module when tc-taprio is module
Vladimir Oltean [Mon, 4 Jul 2022 19:02:41 +0000 (22:02 +0300)]
net: dsa: felix: build as module when tc-taprio is module

felix_vsc9959.c calls taprio_offload_get() and taprio_offload_free(),
symbols exported by net/sched/sch_taprio.c. As such, we must disallow
building the Felix driver as built-in when the symbol exported by
tc-taprio isn't present in the kernel image.

Fixes: 1c9017e44af2 ("net: dsa: felix: keep reference on entire tc-taprio config")
Signed-off-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years ago | net: sched: provide shim definitions for taprio_offload_{get,free}
Vladimir Oltean [Mon, 4 Jul 2022 19:02:40 +0000 (22:02 +0300)]
net: sched: provide shim definitions for taprio_offload_{get,free}

All callers of taprio_offload_get() and taprio_offload_free() prior to
the blamed commit are conditionally compiled based on CONFIG_NET_SCH_TAPRIO.

felix_vsc9959.c is different; it provides vsc9959_qos_port_tas_set()
even when taprio is compiled out.

Provide shim definitions for the functions exported by taprio so that
felix_vsc9959.c is able to compile. vsc9959_qos_port_tas_set() in that
case is dead code anyway, and ocelot_port->taprio remains NULL, which is
fine for the rest of the logic.
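The shim pattern can be sketched in userspace as follows. Assumptions: a plain `#ifdef CONFIG_NET_SCH_TAPRIO` stands in for the kernel's IS_ENABLED() check, the struct is a reduced stand-in rather than the real tc_taprio_qopt_offload layout, and toy_tas_set() is a hypothetical caller modelled on vsc9959_qos_port_tas_set():

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in; the real structure has many more fields. */
struct tc_taprio_qopt_offload { int refcount; };

#ifdef CONFIG_NET_SCH_TAPRIO
/* taprio built: real implementations are exported by sch_taprio.c. */
struct tc_taprio_qopt_offload *taprio_offload_get(struct tc_taprio_qopt_offload *offload);
void taprio_offload_free(struct tc_taprio_qopt_offload *offload);
#else
/* taprio compiled out: callers still build, and get NULL back. */
static inline struct tc_taprio_qopt_offload *
taprio_offload_get(struct tc_taprio_qopt_offload *offload)
{
	(void)offload;
	return NULL;
}

static inline void taprio_offload_free(struct tc_taprio_qopt_offload *offload)
{
	(void)offload;
}
#endif

/* Hypothetical caller: keeps a reference to the offload config, which
 * stays NULL when the shims are in use. */
struct toy_port { struct tc_taprio_qopt_offload *taprio; };

static void toy_tas_set(struct toy_port *port, struct tc_taprio_qopt_offload *qopt)
{
	taprio_offload_free(port->taprio);
	port->taprio = taprio_offload_get(qopt);
}
```

Because the stubs are `static inline` in a header, the caller compiles unchanged in both configurations and the dead path costs nothing.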

Fixes: 1c9017e44af2 ("net: dsa: felix: keep reference on entire tc-taprio config")
Reported-by: Colin Foster <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Tested-by: Colin Foster <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years ago | eth: remove neterion/vxge
Jakub Kicinski [Tue, 5 Jul 2022 22:22:28 +0000 (15:22 -0700)]
eth: remove neterion/vxge

The last meaningful change to this driver was made by Jon in 2011.
As much as we'd like to believe that this is because the code is
perfect, the chances are that nobody is using this hardware.

Because of the size of this driver there is a nontrivial maintenance
cost to keeping this code around; in the last 2 years we've averaged
more than 1 change a month, some of which required nontrivial review
effort, see commit 877fe9d49b74 ("Revert "drivers/net/ethernet/neterion/vxge:
Fix a use-after-free bug in vxge-main.c"") for example.

Let's try to remove this driver. In general, IMHO, we need to
establish a clear path for shedding dead code. It will be hard
to do so unless we have some experience trying to delete stuff.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years ago | dt-bindings: net: dsa: mediatek,mt7530: Add missing 'reg' property
Rob Herring [Fri, 1 Jul 2022 22:22:40 +0000 (16:22 -0600)]
dt-bindings: net: dsa: mediatek,mt7530: Add missing 'reg' property

The 'reg' property is missing from the mediatek,mt7530 schema which
results in the following warning once 'unevaluatedProperties' is fixed:

Documentation/devicetree/bindings/net/dsa/mediatek,mt7530.example.dtb: switch@0: Unevaluated properties are not allowed ('reg' was unexpected)

Fixes: e0dda3119741 ("dt-bindings: net: dsa: convert binding for mediatek switches")
Signed-off-by: Rob Herring <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years ago | Merge tag 'for-net-2022-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Tue, 5 Jul 2022 21:42:09 +0000 (14:42 -0700)]
Merge tag 'for-net-2022-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix deadlock when powering on.

* tag 'for-net-2022-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: core: Fix deadlock on hci_power_on_sync.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 years ago | Bluetooth: core: Fix deadlock on hci_power_on_sync.
Vasyl Vavrychuk [Tue, 5 Jul 2022 12:59:31 +0000 (15:59 +0300)]
Bluetooth: core: Fix deadlock on hci_power_on_sync.

`cancel_work_sync(&hdev->power_on)` was moved to hci_dev_close_sync in
commit [1] to ensure that power_on work is canceled after the HCI
interface goes down.

However, in certain cases the power_on work function may call
hci_dev_close_sync itself: hci_power_on -> hci_dev_do_close ->
hci_dev_close_sync -> cancel_work_sync(&hdev->power_on), causing a
deadlock. In particular, this happens when the device is rfkilled on
boot. To avoid the deadlock, move power_on work canceling out of
hci_dev_do_close/hci_dev_close_sync.
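The self-wait can be sketched with a toy work item. This is a single-threaded model, not the kernel workqueue API: toy_cancel_work_sync() reports -EDEADLK where the real cancel_work_sync() would block forever waiting for the work that is calling it, and all function names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy work item: records whether its handler is currently running. */
struct toy_work { bool running; };

static struct toy_work power_on;

/* Waiting for a running work item from inside that same item never
 * finishes; the toy returns -EDEADLK instead of hanging. */
static int toy_cancel_work_sync(struct toy_work *w, const struct toy_work *caller)
{
	if (w->running && w == caller)
		return -EDEADLK;
	w->running = false;
	return 0;
}

/* Before the fix: the close path itself cancels power_on. */
static int close_sync_before(const struct toy_work *caller)
{
	return toy_cancel_work_sync(&power_on, caller);
}

/* After the fix: the close path no longer touches power_on ... */
static int close_sync_after(const struct toy_work *caller)
{
	(void)caller;
	return 0;
}

/* ... and the cancel is done by an hci_dev_close()-like path that never
 * runs from inside the power_on work. */
static int dev_close_after(void)
{
	int err = toy_cancel_work_sync(&power_on, NULL);

	return err ? err : close_sync_after(NULL);
}

/* power_on handler: when rfkilled, it closes the device from within itself. */
static int power_on_work(int (*close_fn)(const struct toy_work *))
{
	int ret;

	power_on.running = true;
	ret = close_fn(&power_on);
	power_on.running = false;
	return ret;
}
```

Running the handler with close_sync_before() hits the self-wait, while close_sync_after() plus dev_close_after() avoids it.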

The deadlock introduced by commit [1] was reported in [2,3] as broken
suspend. Suspend did not work because `hdev->req_lock` was held as a
result of the `power_on` work deadlock. In fact, other BT features were
not working either. It was not observed when testing [1] since it was
verified without rfkill in place.

NOTE: It is not necessary to cancel power_on work from the other places
where hci_dev_do_close/hci_dev_close_sync is called, because:
* Requests are serialized via `hdev->req_workqueue`, and the power_on
work is first in that workqueue.
* hci_rfkill_set_block won't close the device anyway while HCI_SETUP
is set.
* hci_sock_release runs after hci_sock_bind, which ensures HCI_SETUP
was cleared.

As a result, the behaviour is the same as before commit dd06ed7, except
that a power_on work cancel is added to hci_dev_close.

[1]: commit ff7f2926114d ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
[2]: https://lore.kernel.org/lkml/20220614181706[email protected]/
[3]: https://lore.kernel.org/lkml/1236061d-95dd-c3ad-a38f-2dae7aae51ef@o2.pl/

Fixes: ff7f2926114d ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
Signed-off-by: Vasyl Vavrychuk <[email protected]>
Reported-by: Max Krummenacher <[email protected]>
Reported-by: Mateusz Jonczyk <[email protected]>
Tested-by: Max Krummenacher <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2 years ago | Merge tag 'xsa-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Linus Torvalds [Tue, 5 Jul 2022 16:18:32 +0000 (09:18 -0700)]
Merge tag 'xsa-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen security fixes from Juergen Gross:

 - XSA-403 (4 patches for blkfront and netfront drivers):

   Linux Block and Network PV device frontends don't zero memory regions
   before sharing them with the backend (CVE-2022-26365,
   CVE-2022-33740). Additionally, the granularity of the grant table
   doesn't allow sharing less than a 4K page, so unrelated data residing
   in the same 4K page as data shared with a backend is accessible to
   that backend (CVE-2022-33741, CVE-2022-33742).

 - XSA-405 (1 patch for netfront driver, only 5.10 and newer):

   While adding logic to support XDP (eXpress Data Path), a code label
   was moved in a way that allowed SKBs whose references (pointers) were
   retained for further processing to be freed nevertheless.

 - XSA-406 (1 patch for Arm specific dom0 code):

   When mapping pages of guests on Arm, dom0 is using an rbtree to keep
   track of the foreign mappings.

   Updating of that rbtree is not always done completely with the
   related lock held, resulting in a small race window, which can be
   used by unprivileged guests via PV devices to cause inconsistencies
   of the rbtree. These inconsistencies can lead to Denial of Service
   (DoS) of dom0, e.g. by causing crashes or the inability to perform
   further mappings of other guests' memory pages.
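The idea behind the XSA-403 "force data bouncing" fixes can be sketched as copying the payload into a freshly zeroed, page-sized buffer and sharing only that, so neither stale contents nor unrelated neighbouring data reach the backend. This is a userspace model: calloc() stands in for allocating a zeroed page, and toy_bounce() and TOY_PAGE_SIZE are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Copy len bytes of data into a new zeroed page-sized buffer that would be
 * granted to the backend instead of the original page. Returns NULL if the
 * payload does not fit or allocation fails; caller frees the result. */
static unsigned char *toy_bounce(const void *data, size_t len)
{
	unsigned char *page;

	if (len > TOY_PAGE_SIZE)
		return NULL;
	page = calloc(1, TOY_PAGE_SIZE);  /* zeroed: nothing stale to leak */
	if (!page)
		return NULL;
	memcpy(page, data, len);
	return page;
}
```

The cost is an extra copy per request, which is why the fixes apply it when the backend is untrusted.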

* tag 'xsa-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/arm: Fix race in RB-tree based P2M accounting
  xen-netfront: restore __skb_queue_tail() positioning in xennet_get_responses()
  xen/blkfront: force data bouncing when backend is untrusted
  xen/netfront: force data bouncing when backend is untrusted
  xen/netfront: fix leaking data in shared pages
  xen/blkfront: fix leaking data in shared pages

2 years ago | ALSA: cs46xx: Fix missing snd_card_free() call at probe error
Takashi Iwai [Tue, 5 Jul 2022 15:23:36 +0000 (17:23 +0200)]
ALSA: cs46xx: Fix missing snd_card_free() call at probe error

The previous cleanup with devres may lead to an incorrect release
order in the probe error path due to the nature of devres. Until the
card is registered, snd_card_free() has to be called first so that
resources are released properly when the driver manages and releases
them via card->private_free().

This patch fixes it by calling snd_card_free() manually on errors
from the probe callback.

Fixes: 5bff69b3645d ("ALSA: cs46xx: Allocate resources with device-managed APIs")
Cc: <[email protected]>
Reported-and-tested-by: Jan Engelhardt <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
2 years ago | cxgb4: Use the bitmap API to allocate bitmaps
Christophe JAILLET [Sun, 3 Jul 2022 16:46:36 +0000 (18:46 +0200)]
cxgb4: Use the bitmap API to allocate bitmaps

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and it improves the semantics.

While at it, remove a useless bitmap_zero(). The bitmap is already zeroed
when allocated.
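In the kernel, bitmap_zalloc(nbits, flags) sizes the allocation in longs and returns zeroed memory, which is why a following bitmap_zero() is redundant. The userspace stand-ins below (toy_bitmap_zalloc(), TOY_BITS_TO_LONGS and friends are hypothetical, built on calloc()) sketch what that single call replaces:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_BITS_TO_LONGS(n) (((n) + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* Round nbits up to whole longs and return zeroed storage. */
static unsigned long *toy_bitmap_zalloc(unsigned int nbits)
{
	return calloc(TOY_BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

static void toy_bitmap_free(unsigned long *map)
{
	free(map);
}

static int toy_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / TOY_BITS_PER_LONG] >> (nr % TOY_BITS_PER_LONG)) & 1UL;
}
```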

Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/8a2168ef9871bd9c4f1cf19b8d5f7530662a5d15.1656866770.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Paolo Abeni <[email protected]>
2 years ago | net/mlx5: fix 32bit build
Paolo Abeni [Tue, 5 Jul 2022 07:17:04 +0000 (09:17 +0200)]
net/mlx5: fix 32bit build

We can't use the division operator on 64-bit integers, as that breaks
the 32-bit build. Use the relevant helper instead.
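On 32-bit targets the compiler turns a 64-bit `/` into a call to a libgcc routine (e.g. `__udivdi3`) that the kernel does not provide, so kernel code uses helpers such as div_u64(u64 dividend, u32 divisor) from `<linux/math64.h>`. The sketch below only mirrors that helper's shape in userspace, where plain `/` is fine; toy_div_u64() and toy_bps_to_kbps() are hypothetical, and whether the mlx5 fix used this particular helper is not stated above:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in with the same shape as the kernel's div_u64(). */
static inline uint64_t toy_div_u64(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Hypothetical caller: convert a 64-bit rate in bit/s to kbit/s. */
static uint64_t toy_bps_to_kbps(uint64_t bps)
{
	return toy_div_u64(bps, 1000);
}
```

Restricting the divisor to 32 bits is what lets the kernel implement the helper cheaply on 32-bit machines.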

Fixes: 6ddac26cf763 ("net/mlx5e: Add support to modify hardware flow meter parameters")
Acked-by: Saeed Mahameed <[email protected]>
Link: https://lore.kernel.org/r/ecb00ddd1197b4f8a4882090206bd2eee1eb8b5b.1657005206.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <[email protected]>
2 years ago | bpf, samples: Remove AF_XDP samples
Magnus Karlsson [Thu, 30 Jun 2022 09:37:17 +0000 (11:37 +0200)]
bpf, samples: Remove AF_XDP samples

Remove the AF_XDP samples from samples/bpf/ as they are dependent on
the AF_XDP support in libbpf. This support has now been removed in the
1.0 release, so these samples cannot be compiled anymore. Please start
to use libxdp instead. It is backwards compatible with the AF_XDP
support that was offered in libbpf. New samples can be found in the
various xdp-project repositories connected to libxdp and by googling.

Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: Maciej Fijalkowski <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years ago | bpftool: Rename "bpftool feature list" into "... feature list_builtins"
Quentin Monnet [Fri, 1 Jul 2022 09:38:05 +0000 (10:38 +0100)]
bpftool: Rename "bpftool feature list" into "... feature list_builtins"

To make it more explicit that the features listed with "bpftool feature
list" are known to bpftool, but not necessarily available on the system
(as opposed to the probed features), rename the "feature list" command
into "feature list_builtins".

Note that "bpftool feature list" still works as before given that we
recognise arguments from their prefixes; but the real name of the
subcommand, in particular as displayed in the man page or the
interactive help, will now include "_builtins".
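Prefix-based recognition explains why the old spelling keeps working: "list" is a prefix of "list_builtins". bpftool has its own prefix helper; the code below is a simplified, hypothetical re-implementation (toy_is_prefix(), toy_select()) rather than bpftool's source, and it simply takes the first matching command:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if pfx is a non-empty prefix of str. */
static bool toy_is_prefix(const char *pfx, const char *str)
{
	size_t n = strlen(pfx);

	return n > 0 && strncmp(str, pfx, n) == 0;
}

/* Return the index of the first command whose name starts with arg,
 * or -1 if none does. */
static int toy_select(const char *const *cmds, int ncmds, const char *arg)
{
	for (int i = 0; i < ncmds; i++)
		if (toy_is_prefix(arg, cmds[i]))
			return i;
	return -1;
}
```

With commands { "probe", "list_builtins", "help" }, the arguments "list", "l" and "list_builtins" all select the same subcommand.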

Since we update the bash completion accordingly, let's also take this
chance to redirect error output to /dev/null in the completion script,
to avoid displaying unexpected error messages when users attempt to
tab-complete.

Suggested-by: Daniel Borkmann <[email protected]>
Signed-off-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2 years ago | Merge branch 'fix-bridge_vlan_aware-sh-and-bridge_vlan_unaware-sh-with-iff_unicast_flt'
Paolo Abeni [Tue, 5 Jul 2022 09:52:35 +0000 (11:52 +0200)]
Merge branch 'fix-bridge_vlan_aware-sh-and-bridge_vlan_unaware-sh-with-iff_unicast_flt'

Vladimir Oltean says:

====================
Fix bridge_vlan_aware.sh and bridge_vlan_unaware.sh with IFF_UNICAST_FLT

Make sure that h1 and h2 don't drop packets with a random MAC DA, which
otherwise confuses these selftests. Also, fix an incorrect error message
found during those failures.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
2 years ago | selftests: forwarding: fix error message in learning_test
Vladimir Oltean [Sun, 3 Jul 2022 07:36:26 +0000 (10:36 +0300)]
selftests: forwarding: fix error message in learning_test

When packets are not received, they aren't received on $host1_if, so the
message talking about the second host not receiving them is incorrect.
Fix it.

Fixes: d4deb01467ec ("selftests: forwarding: Add a test for FDB learning")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>