bpf: selftests: Add selftests for module kfunc support
This adds selftests that tests the success and failure path for modules
kfuncs (in presence of invalid kfunc calls) for both libbpf and
gen_loader. It also adds a prog_test kfunc_btf_id_list so that we can
add module BTF ID set from bpf_testmod.
This also introduces a couple of test cases to verifier selftests for
validating whether we get an error or not depending on if invalid kfunc
call remains after elimination of unreachable instructions.
libbpf: Update gen_loader to emit BTF_KIND_FUNC relocations
This change updates the BPF syscall loader to relocate BTF_KIND_FUNC
relocations, with support for weak kfunc relocations. The general idea
is to move map_fds to loader map, and also use the data for storing
kfunc BTF fds. Since both reuse the fd_array parameter, they need to be
kept together.
For map_fds, we reserve MAX_USED_MAPS slots in a region, and for kfunc,
we reserve MAX_KFUNC_DESCS. This is done so that insn->off has more
chances of being <= INT16_MAX than treating data map as a sparse array
and adding fd as needed.
When the MAX_KFUNC_DESCS limit is reached, we fall back to the sparse
array model, so that as long as it does remain <= INT16_MAX, we pass an
index relative to the start of fd_array.
We store all ksyms in an array where we try to avoid calling the
bpf_btf_find_by_name_kind helper, and also reuse the BTF fd that was
already stored. This also speeds up the loading process compared to
emitting calls in all cases, in later tests.
libbpf: Resolve invalid weak kfunc calls with imm = 0, off = 0
Preserve these calls as it allows verifier to succeed in loading the
program if they are determined to be unreachable after dead code
elimination during program load. If not, the verifier will fail at
runtime. This is done for ext->is_weak symbols similar to the case for
variable ksyms.
This patch adds libbpf support for kernel module function call support.
The fd_array parameter is used during BPF program load to pass module
BTFs referenced by the program. insn->off is set to index into this
array, but starts from 1, because insn->off as 0 is reserved for
btf_vmlinux.
We try to use existing insn->off for a module, since the kernel limits
the maximum distinct module BTFs for kfuncs to 256, and also because
index must never exceed the maximum allowed value that can fit in
insn->off (INT16_MAX). In the future, if kernel interprets signed offset
as unsigned for kfunc calls, this limit can be increased to UINT16_MAX.
Also introduce a btf__find_by_name_kind_own helper to start searching
from module BTF's start id when we know that the BTF ID is not present
in vmlinux BTF (in find_ksym_btf_id).
bpf: Enable TCP congestion control kfunc from modules
This commit moves BTF ID lookup into the newly added registration
helper, in a way that the bbr, cubic, and dctcp implementation set up
their sets in the bpf_tcp_ca kfunc_btf_set list, while the ones not
dependent on modules are looked up from the wrapper function.
This lifts the restriction for them to be compiled as built in objects,
and can be loaded as modules if required. Also modify Makefile.modfinal
to call resolve_btfids for each module.
Note that since kernel kfunc_ids never overlap with module kfunc_ids, we
only match the owner for module btf id sets.
See following commits for background on use of:
CONFIG_X86 ifdef: 569c484f9995 (bpf: Limit static tcp-cc functions in the .BTF_ids list to x86)
CONFIG_DYNAMIC_FTRACE ifdef: 7aae231ac93b (bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE)
tools: Allow specifying base BTF file in resolve_btfids
This commit allows specifying the base BTF for resolving btf id
lists/sets during link time in the resolve_btfids tool. The base BTF is
set to NULL if no path is passed. This allows resolving BTF ids for
module kernel objects.
Also, drop the --no-fail option, as it is only used in case .BTF_ids
section is not present, instead make no-fail the default mode. The long
option name is same as that of pahole.
bpf: btf: Introduce helpers for dynamic BTF set registration
This adds helpers for registering btf_id_set from modules and the
bpf_check_mod_kfunc_call callback that can be used to look them up.
With in kernel sets, the way this is supposed to work is, in kernel
callback looks up within the in-kernel kfunc whitelist, and then defers
to the dynamic BTF set lookup if it doesn't find the BTF id. If there is
no in-kernel BTF id set, this callback can be used directly.
Also fix includes for btf.h and bpfptr.h so that they can included in
isolation. This is in preparation for their usage in tcp_bbr, tcp_cubic
and tcp_dctcp modules in the next patch.
bpf: Be conservative while processing invalid kfunc calls
This patch also modifies the BPF verifier to only return error for
invalid kfunc calls specially marked by userspace (with insn->imm == 0,
insn->off == 0) after the verifier has eliminated dead instructions.
This can be handled in the fixup stage, and skip processing during add
and check stages.
If such an invalid call is dropped, the fixup stage will not encounter
insn->imm as 0, otherwise it bails out and returns an error.
This will be exposed as weak ksym support in libbpf in later patches.
bpf: Introduce BPF support for kernel module function calls
This change adds support on the kernel side to allow for BPF programs to
call kernel module functions. Userspace will prepare an array of module
BTF fds that is passed in during BPF_PROG_LOAD using fd_array parameter.
In the kernel, the module BTFs are placed in the auxilliary struct for
bpf_prog, and loaded as needed.
The verifier then uses insn->off to index into the fd_array. insn->off
0 is reserved for vmlinux BTF (for backwards compat), so userspace must
use an fd_array index > 0 for module kfunc support. kfunc_btf_tab is
sorted based on offset in an array, and each offset corresponds to one
descriptor, with a max limit up to 256 such module BTFs.
We also change existing kfunc_tab to distinguish each element based on
imm, off pair as each such call will now be distinct.
Another change is to check_kfunc_call callback, which now include a
struct module * pointer, this is to be used in later patch such that the
kfunc_id and module pointer are matched for dynamically registered BTF
sets from loadable modules, so that same kfunc_id in two modules doesn't
lead to check_kfunc_call succeeding. For the duration of the
check_kfunc_call, the reference to struct module exists, as it returns
the pointer stored in kfunc_btf_tab.
Jakub Kicinski [Tue, 5 Oct 2021 14:41:14 +0000 (07:41 -0700)]
Merge tag 'for-net-next-2021-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- Add support for MediaTek MT7922 and MT7921
- Enable support for AOSP extention in Qualcomm WCN399x and Realtek
8822C/8852A.
- Add initial support for link quality and audio/codec offload.
- Rework of sockets sendmsg to avoid locking issues.
- Add vhci suspend/resume emulation.
In theory addr_len may not be ETH_ALEN, but we don't expect
non-Ethernet devices to live under this directory, and only
the following cases of setting addr_len exist:
- cxgb4 for mgmt device,
and the drivers which set it to ETH_ALEN: s2io, mlx4, vxge.
David S. Miller [Tue, 5 Oct 2021 12:15:35 +0000 (13:15 +0100)]
Merge branch 'mlx4-const-dev_addr'
Jakub Kicinski says:
====================
mlx4: prep for constant dev->dev_addr
This patch converts mlx4 for dev->dev_addr being const. It converts
to use of common helpers but also removes some seemingly unnecessary
idiosyncrasies.
Jakub Kicinski [Mon, 4 Oct 2021 19:14:45 +0000 (12:14 -0700)]
mlx4: remove custom dev_addr clearing
mlx4_en_u64_to_mac() takes the dev->dev_addr pointer and writes
to it byte by byte. It also clears the two bytes _after_ ETH_ALEN
which seems unnecessary. dev->addr_len is set to ETH_ALEN just
before the call.
Jakub Kicinski [Mon, 4 Oct 2021 19:14:43 +0000 (12:14 -0700)]
mlx4: replace mlx4_mac_to_u64() with ether_addr_to_u64()
mlx4_mac_to_u64() predates and opencodes ether_addr_to_u64().
It doesn't make the argument constant so it'll be problematic
when dev->dev_addr becomes a const. Convert to the generic helper.
David S. Miller [Tue, 5 Oct 2021 10:42:38 +0000 (11:42 +0100)]
Merge tag 'mlx5-updates-2021-10-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-10-04
Misc updates for mlx5 driver
1) Add TX max rate support for MQPRIO channel mode
2) Trivial TC action and modify header refactoring
3) TC support for accept action in fdb offloads
4) Allow single IRQ for PCI functions
5) Bridge offload: Pop PVID VLAN header on egress miss
Vlad Buslov says:
=================
With current architecture of mlx5 bridge offload it is possible for a
packet to match in ingress table by source MAC (resulting VLAN header push
in case of port with configured PVID) and then miss in egress table when
destination MAC is not in FDB. Due to the lack of hardware learning in
NICs, this, in turn, results packet going to software data path with PVID
VLAN already added by hardware. This doesn't break software bridge since it
accepts either untagged packets or packets with any provisioned VLAN on
ports with PVID, but can break ingress TC, if affected part of Ethernet
header is matched by classifier.
Improve compatibility with software TC by restoring the packet header on
egress miss. Effectively, this change implements atomicity of mlx5 bridge
offload implementation - packet is either modified and redirected to
destination port or appears unmodified in software.
Rafał Miłecki [Sat, 2 Oct 2021 17:58:12 +0000 (19:58 +0200)]
net: bgmac: support MDIO described in DT
Check ethernet controller DT node for "mdio" subnode and use it with
of_mdiobus_register() when present. That allows specifying MDIO and its
PHY devices in a standard DT based way.
This is required for BCM53573 SoC support. That family is sometimes
called Northstar (by marketing?) but is quite different from it. It uses
different CPU(s) and many different hw blocks.
One of shared blocks in BCM53573 is Ethernet controller. Switch however
is not SRAB accessible (as it Northstar) but is MDIO attached.
Jakub Kicinski [Tue, 5 Oct 2021 01:11:14 +0000 (18:11 -0700)]
ethernet: ehea: add missing cast
We need to cast the pointer, unlike memcpy() eth_hw_addr_set()
does not take void *. The driver already casts &port->mac_addr
to u8 * in other places.
Reported-by: Stephen Rothwell <[email protected]> Fixes: a96d317fb1a3 ("ethernet: use eth_hw_addr_set()") Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Shay Drory [Thu, 19 Aug 2021 13:18:57 +0000 (16:18 +0300)]
net/mlx5: Shift control IRQ to the last index
Control IRQ is the first IRQ vector. This complicates handling of
completion irqs as we need to offset them by one.
in the next patch, there are scenarios where completion and control EQs
will share the same irq. for example: functions with single IRQ. To ease
such scenarios, we shift control IRQ to the end of the irq array.
Create lowest priority flow group in egress table with single rule that
matches on special reg_c1 value that is set on ingress VLAN push with
single action that pops VLAN. The flow destination is skip table that is
used to skip any further processing of packet in FDB bridge priority.
On ingress VLAN push also assign value 0x7FE to reg_c1 tunnel id+opts
bits (tunnel id 0, which is not a valid tunnel id, and option 0x7FE which
was reserved by one of previous patches in the series). In following patch
the reg value is matched on egress miss to restore the packet to its
original state by removing the VLAN before passing it to the software data
path.
net/mlx5: Bridge, extract VLAN pop code to dedicated functions
Following patches in series need to pop VLAN when packet misses on egress.
To reuse existing bridge VLAN pop handling code, extract it to dedicated
helpers mlx5_esw_bridge_pkt_reformat_vlan_pop_supported() and
mlx5_esw_bridge_pkt_reformat_vlan_pop_create().
Several functions in bridge.c excessively obtain pointer to parent eswitch
instance by dereferencing br_offloads->esw on every usage and following
patches in this series add even more usages of eswitch. Introduce local
variable 'esw' and use it instead.
Support TC generic 'accept' action in mlx5 by introducing
MLX5_ESW_ATTR_FLAG_ACCEPT attribute flag. Flag has similar semantics to
existing MLX5_ESW_ATTR_FLAG_SLOW_PATH flag, however, dedicated flag is
required because existing 'slow path' flag can be flipped by tunneling
subsystem when neighbor changes state.
Introduce new helper function mlx5_esw_attr_flags_skip() to check whether
attribute flags for 'slow path' or 'accept' action are set and use it in
eswitch code instead of direct bit manipulation.
Chris Mi [Sun, 26 Sep 2021 09:17:49 +0000 (17:17 +0800)]
net/mlx5e: Specify out ifindex when looking up encap route
There is a use case that the local and remote VTEPs are in the same
host. Currently, the out ifindex is not specified when looking up the
encap route for offloads. So in this case, a local route is returned
and the route dev is lo.
Actual tunnel interface can be created with a parameter "dev" [1],
which specifies the physical device to use for tunnel endpoint
communication. Pass this parameter to driver when looking up encap
route for offloads. So that a unicast route will be returned.
[1] ip link add name vxlan1 type vxlan id 100 dev enp4s0f0 remote 1.1.1.1 dstport 4789
Roi Dayan [Sun, 15 Aug 2021 10:36:01 +0000 (13:36 +0300)]
net/mlx5e: Move parse fdb check into actions_match_supported_fdb()
The parse fdb/nic actions funcs parse the actions and then call
actions_match_supported() for final check.
Move related check in parse_tc_fdb_actions() into
actions_match_supported_fdb() for more organized code.
Roi Dayan [Sun, 15 Aug 2021 09:53:13 +0000 (12:53 +0300)]
net/mlx5e: Split actions_match_supported() into a sub function
There will probably be more checks, some for nic flows, some for fdb
flows and some are shared checks. Split it for fdb and nic to avoid
the function getting too big.
Roi Dayan [Tue, 14 Sep 2021 08:32:51 +0000 (11:32 +0300)]
net/mlx5e: TC, Refactor sample offload error flow
Refactor sample unoffload to be symmetric to sample offload.
Use the existing del_post_rule() to release the post rule.
Also mlx5e_tc_sample_unoffload() should not return post_rule
which is NULL when post actions are supported.
Sample offload works with this NULL because many places of the
code use IS_ERR() instead of IS_ERR_OR_NULL() to check rule is valid
and when rule is detected as sample offload the code is not using the
rule. Let's be persistent and avoid returning NULL anyway and return the
pre rule, like in CT case, which is not NULL.
Tariq Toukan [Wed, 18 Aug 2021 11:17:33 +0000 (14:17 +0300)]
net/mlx5e: Specify SQ stats struct for mlx5e_open_txqsq()
Let the caller of mlx5e_open_txqsq() directly pass the SQ stats
structure pointer.
This replaces logic involving the qos_queue_group_id parameter,
and helps generalizing its role in the next patch.
David S. Miller [Mon, 4 Oct 2021 12:50:05 +0000 (13:50 +0100)]
Merge branch 'phy-10g-mode-helper'
Russell King says:
====================
Add phylink helper for 10G modes
During the last cycle, there was discussion about adding a helper
to set the 10G link modes for phylink, which resulted in these two
patches introduce such a helper.
====================
MichelleJin [Sat, 2 Oct 2021 22:33:32 +0000 (22:33 +0000)]
net: ipv6: fix use after free of struct seg6_pernet_data
sdata->tun_src should be freed before sdata is freed
because sdata->tun_src is allocated after sdata allocation.
So, kfree(sdata) and kfree(rcu_dereference_raw(sdata->tun_src)) are
changed code order.
Fixes: f04ed7d277e8 ("net: ipv6: check return value of rhashtable_init") Signed-off-by: MichelleJin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Mon, 4 Oct 2021 11:55:49 +0000 (12:55 +0100)]
Merge branch 'qed-new-fw'
Prabhakar Kushwaha says:
====================
qed: new firmware version 8.59.1.0 support
This series integrate new firmware version 8.59.1.0, along with updated
HSI (hardware software interface) to use the FW, into the family of
qed drivers (fastlinq devices). This FW does not reside in the NVRAM.
It needs to be programmed to device during driver load as the part of
initialization sequence.
Similar to previous FW support series, this FW is tightly linked to
software and pf function driver. This means FW release is not backward
compatible, and driver should always run with the FW it was designed
against.
FW binary blob is already submitted & accepted in linux-firmware repo.
Patches in the series include:
patch 1 - qed: Fix kernel-doc warnings
patch 2 - qed: Remove e4_ and _e4 from FW HSI
patch 3 - qed: split huge qed_hsi.h header file
patch 4-8 - HSI (hardware software interface) changes
patch 9 - qed: Add '_GTT' suffix to the IRO RAM macros
patch 10 - qed: Update debug related changes
patch 11 - qed: rdma: Update TCP silly-window-syndrome timeout
patch 12 - qed: Update the TCP active termination 2 MSL timer
patch 13 - qed: fix ll2 establishment during load of RDMA driver
In addition, this patch series also fixes existing checkpatch warnings
and checks which are missing.
Changes for v2:
- Incorporated Jakub's comments.
- New patch introduced to fix all kernel-doc issue in qed driver.
- Fixed warning: ‘qed_mfw_ext_20g’ defined but not used.
- Fixed warning related to kernel-doc wrt to this series.
- Removed inline function declaration.
====================
Nikolay Assa [Mon, 4 Oct 2021 06:58:49 +0000 (09:58 +0300)]
qed: Update TCP silly-window-syndrome timeout for iwarp, scsi
Update TCP silly-window-syndrome timeout, for the cases where
initiator's small TCP window size prevents FW from transmitting
packets on the connection. Timeout causes FW to retransmit
window probes if needed, preventing I/O stall if initiator ignores
first window probe.
qed_debug features are updated to support FW version 8.59.1.0 along
with few enhancements.
- Removal of _BB_K2 from register defines.
- Add new condition cond14.
- Add dump of new area sw-platform, epoch, iscsi_task_pages,
fcoe_task_pages, roce_task_pages and eth_task_pages.
- Introduced new functions qed_dbg_phy_size().
- Update in qed_mcp_nvm_rd_cmd() declaration.
- Allow QED to control init/exit at pf level.
- Dump partial "ILT-dump" if buffer size is not sufficient.
This patch also fixes the existing checkpatch warnings and few important
checks.
GTT (Global translation table) is a fast-access window in the BAR into
the register space, which only maps certain register addresses.
This change helps enforce that only those addresses which are indeed
mapped by the GTT are being accessed through it.
Adding the '_GTT' suffix to the IRO FW memory (“RAM”) macros that
access GTT-able region in FW memories (“RAM”) and use GTT macros
to access RAM BAR from drivers.
Omkar Kulkarni [Mon, 4 Oct 2021 06:58:46 +0000 (09:58 +0300)]
qed: Update FW init functions to support FW 8.59.1.0
The qed_init_fw_func.c and qed_init_ops.c updated to support FW
version 8.59.1.0.
- Support 16-bit VPORT WFQ (weighted fair queueing) weights.
- Support WFQ (weighted fair queueing) weight per VPORT + TC.
- Support allocation of Tx PQs(physical queues) per PF,VF.
- Modify Global RL (rate limiter) upper bound configuration.
- Update FW operation functions.
- Update iro_arr[] array.
This patch also fixes the existing checkpatch warnings and few important
checks.
qed_iro_hsi.h contains HSI changes related to storm memories access.
Existing code is based on hard-coded index.
Use enum as defined for FW HSI 8.59.1.0, instead of hard-coded index.
This patch also removes unnecessary header file inclusion.
The qed_hsi.h has been updated to support new FW version 8.59.1.0 with
changes.
- Updates FW HSI (Hardware Software interface) structures.
- Addition/update in function declaration and defines as per HSI.
- Add generic infrastructure for FW error reporting as part of
common event queue handling.
- Move malicious VF error reporting to FW error reporting
infrastructure.
- Move consolidation queue initialization from FW context to ramrod
message.
qed_hsi.h header file changes lead to change in many files to ensure
compilation.
This patch also fixes the existing checkpatch warnings and few important
checks.
The qed_mfw_hsi.h contains HSI (Hardware Software Interface) changes
related to management firmware. It has been updated to support new FW
version 8.59.1.0 with below changes.
- New defines for VF bitmap.
- fec_mode and extended_speed defines updated in struct eth_phy_cfg.
- Updated structutres lldp_system_tlvs_buffer_s, public_global,
public_port, public_func, drv_union_data, public_drv_mb
with all dependent new structures.
- Updates in NVM related structures and defines.
- Msg defines are added in enum drv_msg_code and fw_msg_code.
- Updated/added new defines.
This patch also fixes the existing checkpatch warnings and few important
checks.
The common_hsi.h has been updated for FW version 8.59.1.0 with below
changes.
- FW and Tools version.
- New structures related to search table, packet duplication.
- Structure for doorbell address for legacy mode without DEM.
- Enhanced union rdma_eqe_data for RoCE Suspend Event Data.
- New defines.
This patch also fixes the existing checkpatch warnings and few important
checks.
Omkar Kulkarni [Mon, 4 Oct 2021 06:58:41 +0000 (09:58 +0300)]
qed: Split huge qed_hsi.h header file
The qed_hsi.h is a huge header file containing HSI (Hardware Software
Interface) definitions of storm memory access, debug related, general
and management firmware specific. In order to have a better
code-organization HSI definition, this patch split the code across
multiple files, i.e.
- storm memory access HSI : qed_iro_hsi.h
- debug related HSI : qed_dbg_hsi.h
- Management firmware HSI : qed_mfg_hsi.h
- General HSI : qed_hsi.h
In addition, this patch also fixes existing checkpatch warnings and
few important checks.
Shai Malin [Mon, 4 Oct 2021 06:58:40 +0000 (09:58 +0300)]
qed: Remove e4_ and _e4 from FW HSI
The existing qed/qede/qedr/qedi/qedf code uses chip-specific naming in
structures, functions, variables and defines in FW HSI (Hardware
Software Interface).
The new FW version introduced a generic naming convention in HSI
in-which the same code will be used across different versions
for simpler maintainability. It also eases in providing support for
new features.
With this patch every "_e4" or "e4_" prefix or suffix is not needed
anymore and it will be removed.
David S. Miller [Mon, 4 Oct 2021 11:53:36 +0000 (12:53 +0100)]
Merge branch 'ipv6-ioam-encap'
Justin Iurman says:
====================
Support for the ip6ip6 encapsulation of IOAM
v2:
- add prerequisite patches
- keep uapi backwards compatible by adding two new attributes
- add more comments to document the ioam6_iptunnel uapi
In the current implementation, IOAM can only be inserted directly (i.e., only
inside packets generated locally) by default, to be compliant with RFC8200.
This patch adds support for in-transit packets and provides the ip6ip6
encapsulation of IOAM (RFC8200 compliant). Therefore, three ioam6 encap modes
are defined:
Justin Iurman [Sun, 3 Oct 2021 18:45:37 +0000 (20:45 +0200)]
ipv6: ioam: Prerequisite patch for ioam6_iptunnel
This prerequisite patch provides some minor edits (alignments, renames) and a
minor modification inside a function to facilitate the next patch by using
existing nla_* functions.
Justin Iurman [Sun, 3 Oct 2021 18:45:36 +0000 (20:45 +0200)]
ipv6: ioam: Distinguish input and output for hop-limit
This patch anticipates the support for the IOAM insertion inside in-transit
packets, by making a difference between input and output in order to determine
the right value for its hop-limit (inherited from the IPv6 hop-limit).
Input case: happens before ip6_forward, the IPv6 hop-limit is not decremented
yet -> decrement the IOAM hop-limit to reflect the new hop inside the trace.
Output case: happens after ip6_forward, the IPv6 hop-limit has already been
decremented -> keep the same value for the IOAM hop-limit.
Eric Dumazet [Fri, 1 Oct 2021 00:52:49 +0000 (17:52 -0700)]
net/mlx4_en: avoid one cache line miss to ring doorbell
This patch caches doorbell address directly in struct mlx4_en_tx_ring.
This removes the need to bring in cpu caches whole struct mlx4_uar
in fast path.
Note that mlx4_uar is not guaranteed to be on a local node,
because mlx4_bf_alloc() uses a single free list (priv->bf_list)
regardless of its node parameter.
This kind of change does matter in presence of light/moderate traffic.
In high stress, this read-only line would be kept hot in caches.
David S. Miller [Sun, 3 Oct 2021 13:35:42 +0000 (14:35 +0100)]
Merge branch 'mctp-kunit-tests'
Jeremy Kerr says:
====================
MCTP kunit tests
This change adds some initial kunit tests for the MCTP core. We'll
expand the coverage in a future series, and augment with a few
selftests, but this establishes a baseline set of tests for now.
Jeremy Kerr [Sun, 3 Oct 2021 03:17:06 +0000 (11:17 +0800)]
mctp: Add packet rx tests
Add a few tests for the initial packet ingress through
mctp_pkttype_receive function; mainly packet header sanity checks. Full
input routing checks will be added as a separate change.
M Chetan Kumar [Sat, 2 Oct 2021 14:32:12 +0000 (20:02 +0530)]
net: wwan: iosm: correct devlink extra params
1. Removed driver specific extra params like download_region,
address & region_count. The required information is passed
as part of flash API.
2. IOSM Devlink documentation updated to reflect the same.
David S. Miller [Sat, 2 Oct 2021 13:18:26 +0000 (14:18 +0100)]
Merge branch 'hw_addr_set'
Jakub Kicinski says:
====================
Use netdev->dev_addr write helpers (part 1)
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
This is the first installment of predictably tedious conversion.
It tackles:
memcpy(netdev->dev_addr, something, ETH_ADDR)
and
ether_addr_copy(netdev->dev_addr, something)
replacing both with eth_hw_addr_set().
The first 7 patches are done entirely by sparse.
Next 4 were semi-manual because the sparse conversion
resulted in errors.
====================
Jakub Kicinski [Fri, 1 Oct 2021 21:32:28 +0000 (14:32 -0700)]
ethernet: use eth_hw_addr_set() - casts
eth_hw_addr_set() takes a u8 pointer, like other
etherdevice helpers. Convert the few drivers which
require casts because they memcpy from "endian marked"
types.
Jakub Kicinski [Fri, 1 Oct 2021 21:32:24 +0000 (14:32 -0700)]
net: usb: use eth_hw_addr_set() instead of ether_addr_copy()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Convert net/usb from ether_addr_copy() to eth_hw_addr_set():
Jakub Kicinski [Fri, 1 Oct 2021 21:32:21 +0000 (14:32 -0700)]
net: usb: use eth_hw_addr_set()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Convert usb drivers from memcpy(... ETH_ADDR) to eth_hw_addr_set():
Jakub Kicinski [Fri, 1 Oct 2021 21:32:18 +0000 (14:32 -0700)]
arch: use eth_hw_addr_set()
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Convert misc arch drivers from memcpy(... ETH_ADDR) to eth_hw_addr_set():
David S. Miller [Sat, 2 Oct 2021 13:15:57 +0000 (14:15 +0100)]
Merge branch 'ocelot-vlan'
Vladimir Oltean says:
====================
Egress VLAN modification using VCAP ES0 on Ocelot switches
This patch set adds support for modifying a VLAN ID at the egress stage
of Ocelot/Felix switch ports. It is useful for replicating a packet on
multiple ports, and each egress port sends it using a different VLAN ID.
Tested by rewriting the VLAN ID of both
(a) packets injected from the CPU port
(b) packets received from an external station on a front-facing port
Adding a selftest to make sure it doesn't bit-rot, and if it does, that
it can be traced back easily.
====================
Vladimir Oltean [Fri, 1 Oct 2021 15:15:31 +0000 (18:15 +0300)]
selftests: net: mscc: ocelot: add a test for egress VLAN modification
For this test we are exercising the VCAP ES0 block's ability to match on
a packet with a given VLAN ID, and push an ES0 TAG A with a VID derived
from VID_A_VAL plus the classified VLAN.
$eth3.200 is the generator port
$eth0 is the bridged DUT port that receives
$eth1 is the bridged DUT port that forwards and rewrites VID 200 to 300
on egress via VCAP ES0
$eth2 is the port that receives from the DUT port $eth1
Since the egress rewriting happens outside the bridging service, VID 300
does not need to be in the bridge VLAN table of $eth1.
Vladimir Oltean [Fri, 1 Oct 2021 15:15:29 +0000 (18:15 +0300)]
selftests: net: mscc: ocelot: bring up the ports automatically
Looks like when I wrote the selftests I was using a network manager that
brought up the ports automatically. In order to not rely on that, let
the script open them up.
Vladimir Oltean [Fri, 1 Oct 2021 15:15:28 +0000 (18:15 +0300)]
net: dsa: tag_ocelot: set the classified VLAN during xmit
Currently, all packets injected into Ocelot switches are classified to
VLAN 0, regardless of whether they are VLAN-tagged or not. This is
because the switch only looks at the VLAN TCI from the DSA tag.
VLAN 0 is then stripped on egress due to REW_TAG_CFG_TAG_CFG. There are
2 cases really, below is the explanation for ocelot_port_set_native_vlan:
- Port is VLAN-aware, we set REW_TAG_CFG_TAG_CFG to 1 (egress-tag all
frames except VID 0 and the native VLAN) if a native VLAN exists, or
to 3 otherwise (tag all frames, including VID 0).
- Port is VLAN-unaware, we set REW_TAG_CFG_TAG_CFG to 0 (port tagging
disabled, classified VLAN never appears in the packet).
One can already see an inconsistency: when a native VLAN exists, VID 0
is egress-untagged, but when it doesn't, VID 0 is egress-tagged.
So when we do this:
ip link add br0 type bridge vlan_filtering 1
ip link set swp0 master br0
bridge vlan del dev swp0 vid 1
bridge vlan add dev swp0 vid 1 pvid # but not untagged
and we ping through swp0, packets will look like this:
MAC > 33:33:00:00:00:02, ethertype 802.1Q (0x8100): vlan 0, p 0,
ethertype 802.1Q (0x8100), vlan 1, p 0, ethertype IPv6 (0x86dd),
ICMP6, router solicitation, length 16
So VID 1 frames (sent that way by the Linux bridge) are encapsulated in
a VID 0 header - the classified VLAN of the packets as far as the hw is
concerned. To avoid that, what we really need to do is stop injecting
packets using the classified VLAN of 0.
This patch strips the VLAN header from the skb payload, if that VLAN
exists and if the port is under a VLAN-aware bridge. Then it copies that
VLAN header into the DSA injection frame header.
A positive side effect is that VCAP ES0 VLAN rewriting rules now work
for packets injected from the CPU into a port that's under a VLAN-aware
bridge, and we are able to match those packets by the VLAN ID that was
sent by the network stack, and not by VLAN ID 0.
Vladimir Oltean [Fri, 1 Oct 2021 15:15:26 +0000 (18:15 +0300)]
net: mscc: ocelot: support egress VLAN rewriting via VCAP ES0
Currently the ocelot driver does support the 'vlan modify' action, but
in the ingress chain, and it is offloaded to VCAP IS1. This action
changes the classified VLAN before the packet enters the bridging
service, and the bridging works with the classified VLAN modified by
VCAP IS1.
That is good for some use cases, but there are others where the VLAN
must be modified at the stage of the egress port, after the packet has
exited the bridging service. One example is simulating IEEE 802.1CB
active stream identification filters ("active" means that not only the
rule matches on a packet flow, but it is also able to change some
headers). For example, a stream is replicated on two egress ports, but
they must have different VLAN IDs on egress ports A and B.
This seems like a task for the VCAP ES0, but that currently only
supports pushing the ES0 tag A, which is specified in the rule. Pushing
another VLAN header is not what we want, but rather overwriting the
existing one.
It looks like when we push the ES0 tag A, it is actually possible to not
only take the ES0 tag A's value from the rule itself (VID_A_VAL), but
derive it from the following formula:
ES0_TAG_A = Classified VID + VID_A_VAL
Otherwise said, ES0_TAG_A can be used to increment with a given value
the VLAN ID that the packet was already classified to, and the packet
will have this value as an outer VLAN tag. This new VLAN ID value then
gets stripped on egress (or not) according to the value of the native
VLAN from the bridging service.
While the hardware will happily increment the classified VLAN ID for all
packets that match the ES0 rule, in practice this would be rather
insane, so we only allow this kind of ES0 action if the ES0 filter
contains a VLAN ID too, so as to restrict the matching on a known
classified VLAN. If we program VID_A_VAL with the delta between the
desired final VLAN (ES0_TAG_A) and the classified VLAN, we obtain the
desired behavior.
It doesn't look like it is possible with the tc-vlan action to modify
the VLAN ID but not the PCP. In hardware it is possible to leave the PCP
to the classified value, but we unconditionally program it to overwrite
it with the PCP value from the rule.
Shannon Nelson [Fri, 1 Oct 2021 18:05:56 +0000 (11:05 -0700)]
ionic: have ionic_qcq_disable decide on sending to hardware
Simplify the code a little by keeping the send_to_hw decision
inside of ionic_qcq_disable rather than in the callers. Also,
add ENXIO to the decision expression.
Shannon Nelson [Fri, 1 Oct 2021 18:05:55 +0000 (11:05 -0700)]
ionic: add polling to adminq wait
Split the adminq wait into smaller polling periods in order
to watch for broken firmware and not have to wait for the full
adminq devcmd_timeout.
Generally, adminq commands take fewer than 2 msecs. If the
FW is busy they can take longer, but usually still under 100
msecs. We set the polling period to 100 msecs in order to
start snooping on FW status when a command is taking longer
than usual.
Shannon Nelson [Fri, 1 Oct 2021 18:05:54 +0000 (11:05 -0700)]
ionic: widen queue_lock use around lif init and deinit
Widen the coverage of the queue_lock to be sure the lif init
and lif deinit actions are protected. This addresses a hang
seen when a Tx Timeout action was attempted at the same time
as a FW Reset was started.
Shannon Nelson [Fri, 1 Oct 2021 18:05:53 +0000 (11:05 -0700)]
ionic: move lif mutex setup and delete
Move creation and deletion of lif mutex a level out to
lif creation and delete, rather than in init and deinit.
This assures that nothing will get hung if anything is waiting
on the mutex while the driver is clearing the lif while handling
the fw_down/fw_up cycle.
Shannon Nelson [Fri, 1 Oct 2021 18:05:52 +0000 (11:05 -0700)]
ionic: check for binary values in FW ver string
If the PCI connection is broken, reading the FW version string
will only get 0xff bytes, which shouldn't get printed. This
checks the first byte and prints only the first 4 bytes
if non-ASCII.
Also, add a limit to the string length printed when a valid
string is found, just in case it is not properly terminated.
Shannon Nelson [Fri, 1 Oct 2021 18:05:51 +0000 (11:05 -0700)]
ionic: remove debug stats
These debug stats are not really useful, their collection is
likely detrimental to performance, and they suck up a lot
of memory which never gets used if no one ever enables the
priv-flag to print them, so just remove these bits.
David S. Miller [Sat, 2 Oct 2021 12:52:46 +0000 (13:52 +0100)]
Merge branch 'ravb-gigabit'
Biju Das says:
====================
Fillup stubs for Gigabit Ethernet driver support
The DMAC and EMAC blocks of Gigabit Ethernet IP found on RZ/G2L SoC are
similar to the R-Car Ethernet AVB IP.
The Gigabit Ethernet IP consists of Ethernet controller (E-MAC), Internal
TCP/IP Offload Engine (TOE) and Dedicated Direct memory access controller
(DMAC).
With a few changes in the driver we can support both IPs.
This patch series is for adding Gigabit ethernet driver support to RZ/G2L SoC.
The number of patches after incorporatng RFC review comments is 18.
So split the patches into 2 patchsets (10 + 8).
This series is the second patchset, aims to fillup all the stubs for the
Gigabit Ethernet driver.
RFC->V1:
* used rx_max_buf_size instead of rx_2k_buffers feature bit.
* renamed "rgeth" to "gbeth".
* renamed ravb_rx_ring_free to ravb_rx_ring_free_rcar
* renamed ravb_rx_ring_format to ravb_rx_ring_format_rcar
* renamed ravb_alloc_rx_desc to ravb_alloc_rx_desc_rcar
* renamed ravb_rcar_rx to ravb_rx_rcar
* Added Sergey's Rb tag for patch #6.
* Moved CSR0 initialization to patch #8.
====================