Git Repo - linux.git/log

cfi: Flip headers

Normal include order is that linux/foo.h should include asm/foo.h, CFI has it
the wrong way around.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Sami Tolvanen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment

If an abnormally huge cnt is used for multi-kprobes attachment, the
following warning will be reported:

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 392 at mm/util.c:632 kvmalloc_node+0xd9/0xe0
  Modules linked in: bpf_testmod(O)
  CPU: 1 PID: 392 Comm: test_progs Tainted: G ...... 6.7.0-rc3+ #32
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  ......
  RIP: 0010:kvmalloc_node+0xd9/0xe0
   ? __warn+0x89/0x150
   ? kvmalloc_node+0xd9/0xe0
   bpf_kprobe_multi_link_attach+0x87/0x670
   __sys_bpf+0x2a28/0x2bc0
   __x64_sys_bpf+0x1a/0x30
   do_syscall_64+0x36/0xb0
   entry_SYSCALL_64_after_hwframe+0x6e/0x76
  RIP: 0033:0x7fbe067f0e0d
  ......
   </TASK>
  ---[ end trace 0000000000000000 ]---

So add a test to ensure the warning is fixed.

Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test

Since libbpf v1.0, libbpf doesn't return error code embedded into the
pointer iteself, libbpf_get_error() is deprecated and it is basically
the same as using -errno directly.

So replace the invocations of libbpf_get_error() by -errno in
kprobe_multi_test. For libbpf_get_error() in test_attach_api_fails(),
saving -errno before invoking ASSERT_xx() macros just in case that
errno is overwritten by these macros. However, the invocation of
libbpf_get_error() in get_syms() should be kept intact, because
hashmap__new() still returns a pointer with embedded error code.

Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment

If an abnormally huge cnt is used for multi-uprobes attachment, the
following warning will be reported:

  ------------[ cut here ]------------
  WARNING: CPU: 7 PID: 406 at mm/util.c:632 kvmalloc_node+0xd9/0xe0
  Modules linked in: bpf_testmod(O)
  CPU: 7 PID: 406 Comm: test_progs Tainted: G ...... 6.7.0-rc3+ #32
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
  RIP: 0010:kvmalloc_node+0xd9/0xe0
  ......
  Call Trace:
   <TASK>
   ? __warn+0x89/0x150
   ? kvmalloc_node+0xd9/0xe0
   bpf_uprobe_multi_link_attach+0x14a/0x480
   __sys_bpf+0x14a9/0x2bc0
   do_syscall_64+0x36/0xb0
   entry_SYSCALL_64_after_hwframe+0x6e/0x76
   ......
   </TASK>
  ---[ end trace 0000000000000000 ]---

So add a test to ensure the warning is fixed.

Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Limit the number of kprobes when attaching program to multiple kprobes

An abnormally big cnt may also be assigned to kprobe_multi.cnt when
attaching multiple kprobes. It will trigger the following warning in
kvmalloc_node():

if (unlikely(size > INT_MAX)) {
WARN_ON_ONCE(!(flags & __GFP_NOWARN));
return NULL;
}

Fix the warning by limiting the maximal number of kprobes in
bpf_kprobe_multi_link_attach(). If the number of kprobes is greater than
MAX_KPROBE_MULTI_CNT, the attachment will fail and return -E2BIG.

Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Limit the number of uprobes when attaching program to multiple uprobes

An abnormally big cnt may be passed to link_create.uprobe_multi.cnt,
and it will trigger the following warning in kvmalloc_node():

if (unlikely(size > INT_MAX)) {
WARN_ON_ONCE(!(flags & __GFP_NOWARN));
return NULL;
}

Fix the warning by limiting the maximal number of uprobes in
bpf_uprobe_multi_link_attach(). If the number of uprobes is greater than
MAX_UPROBE_MULTI_CNT, the attachment will return -E2BIG.

Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
Reported-by: Xingwei Lee <[email protected]>
Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Closes: https://lore.kernel.org/bpf/CABOYnLwwJY=yFAGie59LFsUsBAgHfroVqbzZ5edAXbFE3YiNVA@mail.gmail.com
Link: https://lore.kernel.org/bpf/[email protected]

wifi: ath11k: workaround too long expansion sparse warnings

In v6.7-rc1 sparse warns:

drivers/net/wireless/ath/ath11k/mac.c:4702:15: error: too long token expansion
drivers/net/wireless/ath/ath11k/mac.c:4702:15: error: too long token expansion
drivers/net/wireless/ath/ath11k/mac.c:8393:23: error: too long token expansion
drivers/net/wireless/ath/ath11k/mac.c:8393:23: error: too long token expansion

Workaround the warnings by refactoring the code to a new function, which also
reduces code duplication. And in the new function use max3() to make the code
more readable.

No functional changes, compile tested only.

Acked-by: Jeff Johnson <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

Revert "wifi: ath12k: use ATH12K_PCI_IRQ_DP_OFFSET for DP IRQ"

This reverts commit 1f1f7d548a00ebe50808cb1f580df9693e194a7c. The commit
caused bootup failure on QCN9274 hw2.0 platform. Incorrect hardcode DP
irq offset overwrite the CE irq, which caused the driver to miss the
mandatory bootup message from the firmware through the CE interrupt. This
occurs because the CE count differs between platforms. The revert has no
impact since the original change was based on an incorrect assumption.

Log:

ath12k_pci 0000:06:00.0: fw_version 0x1011001d fw_build_timestamp 2022-12-02 01:16 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
ath12k_pci 0000:06:00.0: failed to receive control response completion, polling..
ath12k_pci 0000:06:00.0: Service connect timeout
ath12k_pci 0000:06:00.0: failed to connect to HTT: -110
ath12k_pci 0000:06:00.0: failed to start core: -110

Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4

Signed-off-by: Karthikeyan Periyasamy <[email protected]>
Acked-by: Jeff Johnson <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rt2x00: remove useless code in rt2x00queue_create_tx_descriptor()

In 'rt2x00queue_create_tx_descriptor()', there is no need to call
'ieee80211_get_rts_cts_rate()' while checking for RTS/CTS frame
since this function returns NULL or pointer to internal bitrate
table entry, and the return value is not actually used. Compile
tested only.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Dmitry Antipov <[email protected]>
Acked-by: Stanislaw Gruszka <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rtw89: only reset BB/RF for existing WiFi 6 chips while starting up

The new WiFi 7 chips change the design, so no need to disable/enable
BB/RF when core_start(). Keep the same logic for existing chips.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rtw89: add DBCC H2C to notify firmware the status

To support MLO of WiFi 7, we should configure hardware as DBCC mode, and
notify this status to firmware.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rtw89: mac: add suffix _ax to MAC functions

Many existing MAC access functions are used by WiFi 6 chips only, so add
suffix _ax to be clearer. Some are common and can be used by WiFi 7, so
export this kind of functions. This patch doesn't change logic at all.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rtw89: mac: add flags to check if CMAC and DMAC are enabled

Before accessing CMAC and DMAC registers, we should ensure they have been
powered on, so add flag to determine the state. For old chips, we read
registers and check corresponding bit, but it takes extra cost to read.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rtw89: 8922a: add power on/off functions

The power on/off functions are to turn on hardware function blocks and
to turn off them if we are going to stay in idle state.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rtw89: add XTAL SI for WiFi 7 chips

The XTAL SI is a serial interface to indirectly access registers of
analog hardware circuit. Since WiFi 7 chips use different registers, add
a ops to access them via common functions. This patch doesn't change logic
for existing chips.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rtw89: phy: print out RFK log with formatted string

With formatted string loaded from firmware file, we can use the formatted
string ID and get corresponding string, and then use regular rtw89_debug()
to show the message if debug mask of RFK is enabled.

If the string ID doesn't present, fallback to print plain hexadecimal.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rtw89: parse and print out RFK log from C2H events

RFK log events contains two types. One called RUN log is to reflect state
during RFK is running, and it replies on formatted string loaded from
firmware file, but print this type as plain hexadecimal only in this patch.
The other is REPORT log that reflects the final result of a RFK, and
each calibration has its own struct to carry many specific information.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rtw89: add C2H event handlers of RFK log and report

Trigger a RFK (RF calibration) in firmware by a H2C command, and in
progress it reports log and a result finally by C2H events. Firstly, add
prototype of the C2H event handlers to have a simple picture of framework.

The callers who trigger H2C will wait until a C2H event is received,
so we must process these C2H events in receiving process. Thus, mark this
kind of C2H events as atomic. Also, timestamp is also useful for
debugging, mark C2H events carrying RFK log as atomic as well.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rtw89: load RFK log format string from firmware file

To debug RFK (RF calibration) in firmware, it sends log via firmware C2H
events to driver with string format ID and four arguments. Load formatted
string from firmware file, and the string ID can get back its string. Then,
use regular print format to show the message.

This firmware element layout looks like

    +============================================+
    |  elm ID  | elm size | version  |           |
    +----------+----------+----------+-----------+
    |                     | nr |rsvd |rfk_id|rsvd|
    +--------------------------------------------+
    | offset[] (__le16 * nr)                     |
    | ...                                        |
    +--------------------------------------------+
    | formatted string with null termintor (*nr) |
    | ...                                        |
    +============================================+

* a firmware file can contains more than one elements with this element ID
   named RTW89_FW_ELEMENT_ID_RFKLOG_FMT (19), because many RFK needs its
   own formatted strings, so add 'rfk_id' to know it belongs to which RFK.
* the 'formatted string' just follow 'offset[]' without padding to align
   32bits.

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rtw89: fw: add version field to BB MCU firmware element

8922AE has more than one hardware version, and they use different BB MCU
firmware, so occupy a byte from element priv[] to annotate version. Since
there are more than one firmware and only matched version is adopted,
return 1 to ignore not matched firmware.

     +===========================================+
     |  elm ID  | elm size | version  |          |
     +----------+----------+----------+----------+
     |                     |  element_priv[]     |
     +-------------------------------------------+

                change to  |
                           v

     +===========================================+
     |  elm ID  | elm size | version  |          |
     +----------+----------+----------+----------+
     |                     | cv | element_rsvd[] |
     +-------------------------------------------+

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: rtw89: fw: load TX power track tables from fw_element

The TX power track tables are used to define compensation power reflected
to thermal value. Currently, we have 16 (2 * 4 * 2) tables made by
combinations of
{negative/positive thermal value, 2GHz/2GHz-CCK/5GHz/6GHz, path A/B}

Signed-off-by: Ping-Ke Shih <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: mwifiex: configure BSSID consistently when starting AP

AP BSSID configuration is missing at AP start.  Without this fix, FW returns
STA interface MAC address after first init.  When hostapd restarts, it gets MAC
address from netdev before driver sets STA MAC to netdev again. Now MAC address
between hostapd and net interface are different causes STA cannot connect to
AP.  After that MAC address of uap0 mlan0 become the same. And issue disappears
after following hostapd restart (another issue is AP/STA MAC address become the
same).

This patch fixes the issue cleanly.

Signed-off-by: David Lin <[email protected]>
Fixes: 12190c5d80bd ("mwifiex: add cfg80211 start_ap and stop_ap handlers")
Cc: [email protected]
Reviewed-by: Francesco Dolcini <[email protected]>
Tested-by: Rafael Beims <[email protected]> # Verdin iMX8MP/SD8997 SD
Acked-by: Brian Norris <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

wifi: mwifiex: add extra delay for firmware ready

For SDIO IW416, due to a bug, FW may return ready before complete full
initialization. Command timeout may occur at driver load after reboot.
Workaround by adding 100ms delay at checking FW status.

Signed-off-by: David Lin <[email protected]>
Cc: [email protected]
Reviewed-by: Francesco Dolcini <[email protected]>
Acked-by: Brian Norris <[email protected]>
Tested-by: Marcel Ziswiler <[email protected]> # Verdin AM62 (IW416)
Signed-off-by: Kalle Valo <[email protected]>
Link: https://msgid.link/[email protected]

hv_netvsc: remove duplicated including of slab.h

rm the second include <linux/slab.h>

Signed-off-by: Wang Jinchao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'netlink-specs-legacy'

Jakub Kicinski says:

====================
netlink: specs: prep legacy specs for C code gen

Minor adjustments to some specs to make them ready for C code gen.

v2:
- fix MAINATINERS and subject of patch 3
====================

Signed-off-by: David S. Miller <[email protected]>

netlink: specs: mptcp: rename the MPTCP path management spec

We assume in handful of places that the name of the spec is
the same as the name of the family. We could fix that but
it seems like a fair assumption to make. Rename the MPTCP
spec instead.

Reviewed-by: Mat Martineau <[email protected]>
Reviewed-by: Donald Hunter <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netlink: specs: ovs: correct enum names in specs

Align the enum-names of OVS with what's actually in the uAPI.
Either correct the names, or mark the enum as empty because
the values are in fact #defines.

Reviewed-by: Donald Hunter <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netlink: specs: ovs: remove fixed header fields from attrs

Op's "attributes" list is a workaround for families with a single
attr set. We don't want to render a single huge request structure,
the same for each op since we know that most ops accept only a small
set of attributes. "Attributes" list lets us narrow down the attributes
to what op acctually pays attention to.

It doesn't make sense to put names of fixed headers in there.
They are not "attributes" and we can't really narrow down the struct
members.

Remove the fixed header fields from attrs for ovs families
in preparation for C codegen support.

Reviewed-by: Donald Hunter <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
add v2 FW logging for ice driver

Paul Stillwell says:

Firmware (FW) log support was added to the ice driver, but that version is
no longer supported. There is a newer version of FW logging (v2) that
adds more control knobs to get the exact data out of the FW
for debugging.

The interface for FW logging is debugfs. This was chosen based on
discussions here:
https://lore.kernel.org/netdev/20230214180712.53fc8ba2@kernel.org/ and
https://lore.kernel.org/netdev/20231012164033.1069fb4b@kernel.org/

We talked about using devlink in a variety of ways, but none of those
options made any sense for the way the FW reports data. We briefly talked
about using ethtool, but that seemed to go by the wayside. Ultimately it
seems like using debugfs is the way to go so re-implement the code to use
that.

FW logging is across all the PFs on the device so restrict the commands to
only PF0.

If the device supports FW logging then a directory named 'fwlog' will be
created under '/sys/kernel/debug/ice/<pci_dev>'. A variety of files will be
created to manage the behavior of logging. The following files will be
created:
- modules/<module>
- nr_messages
- enable
- log_size
- data

where
modules/<module> is used to read/write the log level for a specific module

nr_messages is used to determine how many events should be in each message
sent to the driver

enable is used to start/stop FW logging. This is a boolean value so only 1
or 0 are permissible values

log_size is used to configure the amount of memory the driver uses for log
data

data is used to read/clear the log data

Generally there is a lot of data and dumping that data to syslog will
result in a loss of data. This causes problems when decoding the data and
the user doesn't know that data is missing until later. Instead of dumping
the FW log output to syslog use debugfs. This ensures that all the data the
driver has gets retrieved correctly.

The FW log data is binary data that the FW team decodes to determine what
happened in firmware. The binary blob is sent to Intel for decoding.
---
v6:
- use seq_printf() for outputting module info when reading from 'module' file
- replace code that created argc and argv for handling command line input
- removed checks in all the _read() and _write() functions to see if FW logging
  is supported because the files will not exist if it is not supported
- removed warnings on allocation failures on debugfs file creation failures
- removed a newline between memory allocation and checking if the memory was
  allocated
- fixed cases where we could just return the value from a function call
  instead of saving the value in a variable
- moved the check for PFO in ice_fwlog_init() to an earlier patch
- reworked all of argument scanning in the _write() functions in ice_debugfs.c
  to remove adding characters past the end of the buffer

v5: https://lore.kernel.org/netdev/20231205211251.2122874 [email protected]/
- changed the log level configuration from a single file for all modules to a
  file per module.
- changed 'nr_buffs' to 'log_size' because users understand memory sizes
  better than a number of buffers
- changed 'resolution' to 'nr_messages' to better reflect what it represents
- updated documentation to reflect these changes
- updated documentation to indicate that FW logging must be disabled to
  clear the data. also clarified that any value written to the 'data' file will
  clear the data

v4: https://lore.kernel.org/netdev/20231005170110.3221306 [email protected]/
- removed CONFIG_DEBUG_FS wrapper around code because the debugfs calls handle
  this case already
- moved ice_debugfs_exit() call to remove unreachable code issue
- minor changes to documentation based on feedback

v3: https://lore.kernel.org/netdev/20230815165750.2789609 [email protected]/
- Adjust error path cleanup in ice_module_init() for unreachable code.

v2: https://lore.kernel.org/netdev/20230810170109.1963832 [email protected]/
- Rewrote code to use debugfs instead of devlink

v1: https://lore.kernel.org/netdev/20230209190702.3638688 [email protected]/
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mv88e6xxx-counters'

Tobias Waldekranz says:

====================
net: dsa: mv88e6xxx: Add "eth-mac" and "rmon" counter group support

The majority of the changes (2/8) are about refactoring the existing
ethtool statistics support to make it possible to read individual
counters, rather than the whole set.

4/8 tries to collect all information about a stat in a single place
using a mapper macro, which is then used to generate the original list
of stats, along with a matching enum. checkpatch is less than amused
with this construct, but prior art exists (__BPF_FUNC_MAPPER in
include/uapi/linux/bpf.h, for example).

To support the histogram counters from the "rmon" group, we have to
change mv88e6xxx's configuration of them. Instead of counting rx and
tx, we restrict them to rx-only. 6/8 has the details.

With that in place, adding the actual counter groups is pretty
straight forward (5,7/8).

Tie it all together with a selftest (8/8).

v3 -> v4:
- Return size_t from mv88e6xxx_stats_get_stats
- Spelling errors in commit message of 6/8
- Improve selftest:
  - Report progress per-bucket
  - Test both ports in the pair
  - Increase MTU, if required

v2 -> v3:
- Added 6/8
- Added 8/8

v1 -> v2:
- Added 1/6
- Added 3/6
- Changed prototype of stats operation to reflect the fact that the
  number of read stats are returned, no errors
- Moved comma into MV88E6XXX_HW_STAT_MAPPER definition
- Avoid the construction of mapping table iteration which relied on
  struct layouts outside of mv88e6xxx's control
====================

Signed-off-by: David S. Miller <[email protected]>

selftests: forwarding: ethtool_rmon: Add histogram counter test

Validate the operation of rx and tx histogram counters, if supported
by the interface, by sending batches of packets targeted for each
bucket.

Signed-off-by: Tobias Waldekranz <[email protected]>
Tested-by: Vladimir Oltean <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Add "rmon" counter group support

Report the applicable subset of an mv88e6xxx port's counters using
ethtool's standardized "rmon" counter group.

Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Limit histogram counters to ingress traffic

Chips in this family only have one set of histogram counters, which
can be used to count ingressing and/or egressing traffic. mv88e6xxx
has, up until this point, kept the hardware default of counting both
directions.

In the mean time, standard counter group support has been added to
ethtool. Via that interface, drivers may report ingress-only and
egress-only histograms separately - but not combined.

In order for mv88e6xxx to maximize amount of diagnostic information
that can be exported via standard interfaces, we opt to limit the
histogram counters to ingress traffic only. Which will allow us to
export them via the standard "rmon" group in an upcoming commit.

The reason for choosing ingress-only over egress-only, is to be
compatible with RFC2819 (RMON MIB).

Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Add "eth-mac" counter group support

Report the applicable subset of an mv88e6xxx port's counters using
ethtool's standardized "eth-mac" counter group.

Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Give each hw stat an ID

With the upcoming standard counter group support, we are no longer
reading out the whole set of counters, but rather mapping a subset to
the requested group.

Therefore, create an enum with an ID for each stat, such that
mv88e6xxx_hw_stats[] can be subscripted with a human-readable ID
corresponding to the counter's name.

Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Fix mv88e6352_serdes_get_stats error path

mv88e6xxx_get_stats, which collects stats from various sources,
expects all callees to return the number of stats read. If an error
occurs, 0 should be returned.

Prevent future mishaps of this kind by updating the return type to
reflect this contract.

Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Create API to read a single stat counter

This change contains no functional change. We simply push the hardware
specific stats logic to a function reading a single counter, rather
than the whole set.

This is a preparatory change for the upcoming standard ethtool
statistics support (i.e. "eth-mac", "eth-ctrl" etc.).

Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Push locking into stats snapshotting

This is more consistent with the driver's general structure.

Reviewed-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-optmem_max-changes'

Eric Dumazet says:

====================
net: optmem_max changes

optmem_max default value is too small for tx zerocopy workloads.

First patch increases default from 20KB to 128 KB,
which is the value we have used for seven years.

Second patch makes optmem_max sysctl per netns.

Last patch tweaks two tests accordingly.
====================

Signed-off-by: David S. Miller <[email protected]>

selftests/net: optmem_max became per netns

/proc/sys/net/core/optmem_max is now per netns, change two tests
that were saving/changing/restoring its value on the parent netns.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: Namespace-ify sysctl_optmem_max

optmem_max being used in tx zerocopy,
we want to be able to control it on a netns basis.

Following patch changes two tests.

Tested:

oqq130:~# cat /proc/sys/net/core/optmem_max
131072
oqq130:~# echo 1000000 >/proc/sys/net/core/optmem_max
oqq130:~# cat /proc/sys/net/core/optmem_max
1000000
oqq130:~# unshare -n
oqq130:~# cat /proc/sys/net/core/optmem_max
131072
oqq130:~# exit
logout
oqq130:~# cat /proc/sys/net/core/optmem_max
1000000

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: increase optmem_max default value

For many years, /proc/sys/net/core/optmem_max default value
on a 64bit kernel has been 20 KB.

Regular usage of TCP tx zerocopy needs a bit more.

Google has used 128KB as the default value for 7 years without
any problem.

Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mlxsw-CFF-flood-mode'

Petr Machata says:

====================
mlxsw: CFF flood mode: NVE underlay configuration

Recently, support for CFF flood mode (for Compressed FID Flooding) was
added to the mlxsw driver. The most recent patchset has a detailed coverage
of what CFF is and what has changed and how:

https://lore.kernel.org/netdev/cover.1701183891 [email protected]/

In CFF flood mode, each FID allocates a handful (in our implementation two
or three) consecutive PGT entries. One entry holds the flood vector for
unknown-UC traffic, one for MC, one for BC.

To determine how to look up flood vectors, the CFF flood mode uses a
concept of flood profiles, which are IDs that reference mappings from
traffic types to offsets. In the case of CFF flood mode, the offset in
question is applied to the PGT address configured at a FID. The same
mechanism is used by NVE underlay for flooding. Again the profile ID and
the traffic type determine the offset to apply, this time to KVD address
used to look up flooding entries. Since mlxsw configures NVE underlay flood
the same regardless of traffic type, only one offset was ever needed: the
zero, which is the default, and thus no explicit configuration was needed.

Now that CFF uses profiles as well, it would be better to configure the
profile used by NVE explicitly, to make the configuration visible in the
source code.

In this patchset, add the register support (in patch #1), add a new traffic
type to refer to "any traffic at all" (in patch #2) and finally configure
the NVE profile explicitly for FIDs (in patch #3).

So far, the implicitly configured flood profile was the ID 0. With this
patchset, it changes to 3, leaving the 0 free to allow us to spot missed
configuration.
====================

Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_fid: Set NVE flood profile as part of FID configuration

The NVE flood profile is used for determining of offset applied to KVD
address for NVE flood. We currently do not set it, leaving it at the
default value of 0. That is not an issue: all the traffic-type-to-offset
mappings (as configured by SFFP) default to offset of 0. This is what we
need anyway, as mlxsw only allocates a single KVD entry for NVE underlay.

The field is only relevant on Spectrum-2 and above. So to be fully
consistent, we should split the existing controlled ops to Spectrum-1 and
Spectrum>1 variants, with only the latter setting the field. But that seems
like a lot of overhead for a single field whose meaning is "everything is
the default". So instead pretend that the NVE flood profile does not exist
in the controlled flood mode, like we have so far, and only set it when
flood mode is CFF.

Setting this at all serves dual purpose. First, it is now clear which
profile belongs to NVE, because in the CFF mode, we have multiple users.
This should prevent bugs in flood profile management. Second, using
specifically non-zero value means there will be no valid uses of the
profile 0, which we can therefore use as a sentinel.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: spectrum_fid: Add an "any" packet type

Flood profiles have been used prior to CFF support for NVE underlay. Like
is the case with FID flooding, an NVE profile describes at which offset a
datum is located given traffic type. mlxsw currently only ever uses one KVD
entry for NVE lookup, i.e. regardless of traffic type, the offset is always
zero. To be able to describe this, add a traffic type enumerator describing
"any traffic type".

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mlxsw: reg: Add nve_flood_prf_id field to SFMR

The field is used for setting a flood profile for lookup of KVD entry for
NVE underlay. As the other uses of flood profile, this references a traffic
type-to-offset mapping, except here it is not applied to PGT offsets, but
KVD offsets.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-at803x-cleanups'

Christian Marangi says:

====================
net: phy: at803x: additional cleanup for qca808x

This small series is a preparation for the big code split. While the
qca808x code is waiting to be reviwed and merged, we can further cleanup
and generalize shared functions between at803x and qca808x.

With these last 2 patch everything is ready to move the driver to a
dedicated directory and split the code by creating a library module
for the few shared functions between the 2 driver.

Eventually at803x can be further cleaned and generalized but everything
will be already self contained and related only to at803x family of PHYs.
====================

Signed-off-by: David S. Miller <[email protected]>

net: phy: at803x: make read specific status function more generic

Rework read specific status function to be more generic. The function
apply different speed mask based on the PHY ID. Make it more generic by
adding an additional arg to pass the specific speed (ss) mask and use
the provided mask to parse the speed value.

This is needed to permit an easier deatch of qca808x code from the
at803x driver.

Signed-off-by: Christian Marangi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: at803x: move specific qca808x config_aneg to dedicated function

Move specific qca808x config_aneg to dedicated function to permit easier
split of qca808x portion from at803x driver.

Signed-off-by: Christian Marangi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'vsock-credit-update'

Arseniy Krasnov says:

====================
send credit update during setting SO_RCVLOWAT

                               DESCRIPTION

This patchset fixes old problem with hungup of both rx/tx sides and adds
test for it. This happens due to non-default SO_RCVLOWAT value and
deferred credit update in virtio/vsock. Link to previous old patchset:
https://lore.kernel.org/netdev/39b2e9fd-601b-189d-39a9-914e5574524c@sberdevices.ru/

Here is what happens step by step:

                                  TEST

                            INITIAL CONDITIONS

1) Vsock buffer size is 128KB.
2) Maximum packet size is also 64KB as defined in header (yes it is
   hardcoded, just to remind about that value).
3) SO_RCVLOWAT is default, e.g. 1 byte.

                                 STEPS

            SENDER                              RECEIVER
1) sends 128KB + 1 byte in a
   single buffer. 128KB will
   be sent, but for 1 byte
   sender will wait for free
   space at peer. Sender goes
   to sleep.

2)                                     reads 64KB, credit update not sent
3)                                     sets SO_RCVLOWAT to 64KB + 1
4)                                     poll() -> wait forever, there is
                                       only 64KB available to read.

So in step 4) receiver also goes to sleep, waiting for enough data or
connection shutdown message from the sender. Idea to fix it is that rx
kicks tx side to continue transmission (and may be close connection)
when rx changes number of bytes to be woken up (e.g. SO_RCVLOWAT) and
this value is bigger than number of available bytes to read.

I've added small test for this, but not sure as it uses hardcoded value
for maximum packet length, this value is defined in kernel header and
used to control deferred credit update. And as this is not available to
userspace, I can't control test parameters correctly (if one day this
define will be changed - test may become useless).

Head for this patchset is:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=9bab51bd662be4c3ebb18a28879981d69f3ef15a

Link to v1:
https://lore.kernel.org/netdev/20231108072004.1045669 [email protected]/
Link to v2:
https://lore.kernel.org/netdev/20231119204922.2251912 [email protected]/
Link to v3:
https://lore.kernel.org/netdev/20231122180510.2297075 [email protected]/
Link to v4:
https://lore.kernel.org/netdev/20231129212519.2938875 [email protected]/
Link to v5:
https://lore.kernel.org/netdev/20231130130840 [email protected]/
Link to v6:
https://lore.kernel.org/netdev/20231205064806.2851305 [email protected]/
Link to v7:
https://lore.kernel.org/netdev/20231206211849.2707151 [email protected]/
Link to v8:
https://lore.kernel.org/netdev/20231211211658.2904268 [email protected]/
Link to v9:
https://lore.kernel.org/netdev/20231214091947 [email protected]/

Changelog:
v1 -> v2:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* New patch is added as 0001 - it removes return from SO_RCVLOWAT set
   callback in 'af_vsock.c' when transport callback is set - with that
   we can set 'sk_rcvlowat' only once in 'af_vsock.c' and in future do
   not copy-paste it to every transport. It was discussed in v1.
* See per-patch changelog after ---.
v2 -> v3:
* See changelog after --- in 0003 only (0001 and 0002 still same).
v3 -> v4:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
v4 -> v5:
* Change patchset tag 'RFC' -> 'net-next'.
* See per-patch changelog after ---.
v5 -> v6:
* New patch 0003 which sends credit update during reading bytes from
   socket.
* See per-patch changelog after ---.
v6 -> v7:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
v7 -> v8:
* See per-patch changelog after ---.
v8 -> v9:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* Add 'Fixes' tag for the current 0002.
* Reorder patches by moving two fixes first.
v9 -> v10:
* Squash 0002 and 0003 and update commit message in result.
====================

Signed-off-by: David S. Miller <[email protected]>

vsock/test: two tests to check credit update logic

Both tests are almost same, only differs in two 'if' conditions, so
implemented in a single function. Tests check, that credit update
message is sent:

1) During setting SO_RCVLOWAT value of the socket.
2) When number of 'rx_bytes' become smaller than SO_RCVLOWAT value.

Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

virtio/vsock: send credit update during setting SO_RCVLOWAT

Send credit update message when SO_RCVLOWAT is updated and it is bigger
than number of bytes in rx queue. It is needed, because 'poll()' will
wait until number of bytes in rx queue will be not smaller than
O_RCVLOWAT, so kick sender to send more data. Otherwise mutual hungup
for tx/rx is possible: sender waits for free space and receiver is
waiting data in 'poll()'.

Rename 'set_rcvlowat' callback to 'notify_set_rcvlowat' and set
'sk->sk_rcvlowat' only in one place (i.e. 'vsock_set_rcvlowat'), so the
transport doesn't need to do it.

Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages")
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

virtio/vsock: fix logic which reduces credit update messages

Add one more condition for sending credit update during dequeue from
stream socket: when number of bytes in the rx queue is smaller than
SO_RCVLOWAT value of the socket. This is actual for non-default value
of SO_RCVLOWAT (e.g. not 1) - idea is to "kick" peer to continue data
transmission, because we need at least SO_RCVLOWAT bytes in our rx
queue to wake up user for reading data (in corner case it is also
possible to stuck both tx and rx sides, this is why 'Fixes' is used).

Fixes: b89d882dc9fc ("vsock/virtio: reduce credit update messages")
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipmr: support IP_PKTINFO on cache report IGMP msg

In order to support IP_PKTINFO on those packets, we need to call
ipv4_pktinfo_prepare.

When sending mrouted/pimd daemons a cache report IGMP msg, it is
unnecessary to set dst on the newly created skb.
It used to be necessary on older versions until
commit d826eb14ecef ("ipv4: PKTINFO doesnt need dst reference") which
changed the way IP_PKTINFO struct is been retrieved.

Changes from v1:
1. Undo changes in ipv4_pktinfo_prepare function. use it directly
and copy the control block.

Fixes: d826eb14ecef ("ipv4: PKTINFO doesnt need dst reference")
Signed-off-by: Leone Fernando <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mana: add msix index sharing between EQs

This patch allows to assign and poll more than one EQ on the same
msix index.
It is achieved by introducing a list of attached EQs in each IRQ context.
It also removes the existing msix_index map that tried to ensure that there
is only one EQ at each msix_index.
This patch exports symbols for creating EQs from other MANA kernel modules.

Signed-off-by: Konstantin Taranov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

octeontx2-af: Fix multicast/mirror group lock/unlock issue

As per the existing implementation, there exists a race between finding
a multicast/mirror group entry and deleting that entry. The group lock
was taken and released independently by rvu_nix_mcast_find_grp_elem()
function. Which is incorrect and group lock should be taken during the
entire operation of group updation/deletion. This patch fixes the same.

Fixes: 51b2804c19cd ("octeontx2-af: Add new mbox to support multicast/mirror offload")
Signed-off-by: Suman Ghosh <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'mlx5-updates-2023-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-12-13

Preparation for mlx5e socket direct feature.

Socket direct will allow multiple PF devices attached to different
NUMA nodes but sharing the same physical port.

The following series is a small refactoring series in preparation
to support socket direct in the following submission.

Highlights:
- Define required device registers and bits related to socket direct
- Flow steering re-arrangements
- Generalize TX objects (TISs) and store them in a common object, will
be useful in the next series for per function object management.
- Decouple raw CQ objects from their parent netdev priv
- Prepare devcom for Socket Direct device group discovery.

Please see the individual patches for more information.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-phy-rust'

FUJITA Tomonori says:

====================
Rust abstractions for network PHY drivers

No functional change since v10; only comment and commit log updates.

This patchset adds Rust abstractions for phylib. It doesn't fully
cover the C APIs yet but I think that it's already useful. I implement
two PHY drivers (Asix AX88772A PHYs and Realtek Generic FE-GE). Seems
they work well with real hardware.

The first patch introduces Rust bindings for phylib.

The second patch adds a macro to declare a kernel module for PHYs
drivers.

The third adds the Rust ETHERNET PHY LIBRARY entry to MAINTAINERS
file; adds the binding file and me as a maintainer (as Andrew Lunn
suggested) with Trevor Gross as a reviewer.

The last patch introduces the Rust version of Asix PHY driver,
drivers/net/phy/ax88796b.c. The features are equivalent to the C
version. You can choose C (by default) or Rust version on kernel
configuration.

v11:
  - adds Andrew, Alice, and Trevor's Reviewed-by
  - comment update
v10: https://lore.kernel.org/netdev/20231210234924.1453917 [email protected]/T/
  - adds Trevor's SoB to the third patch
  - adds Benno's Reviewed-by to the second patch
v9: https://lore.kernel.org/netdev/20231205.124531.842372711631366729 [email protected]/T/
  - adds a workaround to access to a bit field in phy_device
  - fixes a comment typo
v8: https://lore.kernel.org/netdev/20231123050412.1012252 [email protected]/
  - updates the safety comments on Device and its related code
  - uses _phy_start_aneg instead of phy_start_aneg
  - drops the patch for enum synchronization
  - moves Sync From Registration to DriverVTable
  - fixes doctest errors
  - minor cleanups
v7: https://lore.kernel.org/netdev/20231026001050.1720612 [email protected]/T/
  - renames get_link() to is_link_up()
  - improves the macro format
  - improves the commit log in the third patch
  - improves comments
v6: https://lore.kernel.org/netdev/20231025.090243.1437967503809186729 [email protected]/T/
  - improves comments
  - makes the requirement of phy_drivers_register clear
  - fixes Makefile of the third patch
v5: https://lore.kernel.org/all/20231019.094147.1808345526469629486 [email protected]/T/
  - drops the rustified-enum option, writes match by hand; no *risk* of UB
  - adds Miguel's patch for enum checking
  - moves CONFIG_RUST_PHYLIB_ABSTRACTIONS to drivers/net/phy/Kconfig
  - adds a new entry for this abstractions in MAINTAINERS
  - changes some of Device's methods to take &mut self
  - comment improvment
v4: https://lore.kernel.org/netdev/20231012125349.2702474 [email protected]/T/
  - split the core patch
  - making Device::from_raw() private
  - comment improvement with code update
  - commit message improvement
  - avoiding using bindings::phy_driver in public functions
  - using an anonymous constant in module_phy_driver macro
v3: https://lore.kernel.org/netdev/20231011.231607.1747074555988728415 [email protected]/T/
  - changes the base tree to net-next from rust-next
  - makes this feature optional; only enabled with CONFIG_RUST_PHYLIB_BINDINGS=y
  - cosmetic code and comment improvement
  - adds copyright
v2: https://lore.kernel.org/netdev/20231006094911.3305152 [email protected]/T/
  - build failure fix
  - function renaming
v1: https://lore.kernel.org/netdev/20231002085302.2274260 [email protected]/T/
====================

Signed-off-by: David S. Miller <[email protected]>

net: phy: add Rust Asix PHY driver

This is the Rust implementation of drivers/net/phy/ax88796b.c. The
features are equivalent. You can choose C or Rust version kernel
configuration.

Signed-off-by: FUJITA Tomonori <[email protected]>
Reviewed-by: Trevor Gross <[email protected]>
Reviewed-by: Benno Lossin <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Alice Ryhl <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

MAINTAINERS: add Rust PHY abstractions for ETHERNET PHY LIBRARY

Adds me as a maintainer and Trevor as a reviewer.

The files are placed at rust/kernel/ directory for now but the files
are likely to be moved to net/ directory once a new Rust build system
is implemented.

Signed-off-by: FUJITA Tomonori <[email protected]>
Signed-off-by: Trevor Gross <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rust: net::phy add module_phy_driver macro

This macro creates an array of kernel's `struct phy_driver` and
registers it. This also corresponds to the kernel's
`MODULE_DEVICE_TABLE` macro, which embeds the information for module
loading into the module binary file.

A PHY driver should use this macro.

Signed-off-by: FUJITA Tomonori <[email protected]>
Reviewed-by: Alice Ryhl <[email protected]>
Reviewed-by: Benno Lossin <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Trevor Gross <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rust: core abstractions for network PHY drivers

This patch adds abstractions to implement network PHY drivers; the
driver registration and bindings for some of callback functions in
struct phy_driver and many genphy_ functions.

This feature is enabled with CONFIG_RUST_PHYLIB_ABSTRACTIONS=y.

This patch enables unstable const_maybe_uninit_zeroed feature for
kernel crate to enable unsafe code to handle a constant value with
uninitialized data. With the feature, the abstractions can initialize
a phy_driver structure with zero easily; instead of initializing all
the members by hand. It's supposed to be stable in the not so distant
future.

Link: https://github.com/rust-lang/rust/pull/116218
Signed-off-by: FUJITA Tomonori <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Alice Ryhl <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: don't create a MDIO bus if unnecessary

Currently a MDIO bus is created if the devicetree description is either:

    1. Not fixed-link
    2. fixed-link but contains a MDIO bus as well

The "1" case above isn't always accurate. If there's a phy-handle,
it could be referencing a phy on another MDIO controller's bus[1]. In
this case, where the MDIO bus is not described at all, currently
stmmac will make a MDIO bus and scan its address space to discover
phys (of which there are none). This process takes time scanning a bus
that is known to be empty, delaying time to complete probe.

There are also a lot of upstream devicetrees[2] that expect a MDIO bus
to be created, scanned for phys, and the first one found connected
to the MAC. This case can be inferred from the platform description by
not having a phy-handle && not being fixed-link. This hits case "1" in
the current driver's logic, and must be handled in any logic change here
since it is a valid legacy dt-binding.

Let's improve the logic to create a MDIO bus if either:

    - Devicetree contains a MDIO bus
    - !fixed-link && !phy-handle (legacy handling)

This way the case where no MDIO bus should be made is handled, as well
as retaining backwards compatibility with the valid cases.

Below devicetree snippets can be found that explain some of
the cases above more concretely.

Here's[0] a devicetree example where the MAC is both fixed-link and
driving a switch on MDIO (case "2" above). This needs a MDIO bus to
be created:

    &fec1 {
            phy-mode = "rmii";

            fixed-link {
                    speed = <100>;
                    full-duplex;
            };

            mdio1: mdio {
                    switch0: switch0@0 {
                            compatible = "marvell,mv88e6190";
                            pinctrl-0 = <&pinctrl_gpio_switch0>;
                    };
            };
    };

Here's[1] an example where there is no MDIO bus or fixed-link for
the ethernet1 MAC, so no MDIO bus should be created since ethernet0
is the MDIO master for ethernet1's phy:

    &ethernet0 {
            phy-mode = "sgmii";
            phy-handle = <&sgmii_phy0>;

            mdio {
                    compatible = "snps,dwmac-mdio";
                    sgmii_phy0: phy@8 {
                            compatible = "ethernet-phy-id0141.0dd4";
                            reg = <0x8>;
                            device_type = "ethernet-phy";
                    };

                    sgmii_phy1: phy@a {
                            compatible = "ethernet-phy-id0141.0dd4";
                            reg = <0xa>;
                            device_type = "ethernet-phy";
                    };
            };
    };

    &ethernet1 {
            phy-mode = "sgmii";
            phy-handle = <&sgmii_phy1>;
    };

Finally there's descriptions like this[2] which don't describe the
MDIO bus but expect it to be created and the whole address space
scanned for a phy since there's no phy-handle or fixed-link described:

    &gmac {
            phy-supply = <&vcc_lan>;
            phy-mode = "rmii";
            snps,reset-gpio = <&gpio3 RK_PB4 GPIO_ACTIVE_HIGH>;
            snps,reset-active-low;
            snps,reset-delays-us = <0 10000 1000000>;
    };

[0] https://elixir.bootlin.com/linux/v6.5-rc5/source/arch/arm/boot/dts/nxp/vf/vf610-zii-ssmb-dtu.dts
[1] https://elixir.bootlin.com/linux/v6.6-rc5/source/arch/arm64/boot/dts/qcom/sa8775p-ride.dts
[2] https://elixir.bootlin.com/linux/v6.6-rc5/source/arch/arm64/boot/dts/rockchip/rk3368-r88.dts#L164

Reviewed-by: Serge Semin <[email protected]>
Co-developed-by: Bartosz Golaszewski <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
Signed-off-by: Andrew Halaney <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

bpf: xdp: Register generic_kfunc_set with XDP programs

Registering generic_kfunc_set with XDP programs enables some of the
newer BPF features inside XDP -- namely tree based data structures and
BPF exceptions.

The current motivation for this commit is to enable assertions inside
XDP bpf progs. Assertions are a standard and useful tool to encode
intent.

Signed-off-by: Daniel Xu <[email protected]>
Link: https://lore.kernel.org/r/d07d4614b81ca6aada44fcb89bb6b618fb66e4ca.1702594357.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <[email protected]>

i40e: remove fake support of rx-frames-irq

Since we never support this feature for I40E driver, we don't have to
display the value when using 'ethtool -c eth0'.

Before this patch applied, the rx-frames-irq is 256 which is consistent
with tx-frames-irq. Apparently it could mislead users.

Signed-off-by: Jason Xing <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'mdio-mux-cleanup'

Vladimir Oltean says:

====================
MDIO mux cleanup

This small patch set resolves some technical debt in the MDIO mux driver
which was discovered during the investigation for commit 1f9f2143f24e
("net: mdio-mux: fix C45 access returning -EIO after API change").

The patches have been sitting for 2 months in the NXP SDK kernel and
haven't caused issues.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: mdio-mux: be compatible with parent buses which only support C45

After the mii_bus API conversion to a split read() / read_c45(), there
might be MDIO parent buses which only populate the read_c45() and
write_c45() function pointers but not the C22 variants.

We haven't seen these in the wild paired with MDIO multiplexers, but
Andrew points out we should treat the corner case.

Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: mdio-mux: show errors on probe failure

Showing the precise error symbols can help debugging probe issues, such
as the recent -EIO error in of_mdiobus_register() caused by the lack of
bus->read_c45() and bus->write_c45() methods.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: atlantic: eliminate double free in error handling logic

Driver has a logic leak in ring data allocation/free,
where aq_ring_free could be called multiple times on same ring,
if system is under stress and got memory allocation error.

Ring pointer was used as an indicator of failure, but this is
not correct since only ring data is allocated/deallocated.
Ring itself is an array member.

Changing ring allocation functions to return error code directly.
This simplifies error handling and eliminates aq_ring_free
on higher layer.

Signed-off-by: Igor Russkikh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'convert-net-selftests-to-run-in-unique-namespace-part-3'

Hangbin Liu says:

====================
Convert net selftests to run in unique namespace (Part 3)

Here is the 3rd part of converting net selftests to run in unique namespace.
This part converts all srv6 and fib tests.

Note that patch 06 is a fix for testing fib_nexthop_multiprefix.

Here is the part 1 link:
https://lore.kernel.org/netdev/20231202020110 [email protected]
And part 2 link:
https://lore.kernel.org/netdev/20231206070801.1691247 [email protected]
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: convert fdb_flush.sh to run it in unique namespace

Here is the test result after conversion.
# ./fdb_flush.sh
TEST: vx10: Expected 5 FDB entries, got 5                           [ OK ]
TEST: vx20: Expected 5 FDB entries, got 5                           [ OK ]
...
TEST: vx10: Expected 5 FDB entries, got 5                           [ OK ]
TEST: Test entries with dst 192.0.2.1                               [ OK ]

Acked-by: David Ahern <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: convert fib_tests.sh to run it in unique namespace

Here is the test result after conversion.

# ./fib_tests.sh

Single path route test
     Start point
     TEST: IPv4 fibmatch                                                 [ OK ]

...

Fib6 garbage collection test
     TEST: ipv6 route garbage collection                                 [ OK ]

IPv4 multipath list receive tests
     TEST: Multipath route hit ratio (1.00)                              [ OK ]

IPv6 multipath list receive tests
     TEST: Multipath route hit ratio (1.00)                              [ OK ]

Tests passed: 225
Tests failed:   0

Acked-by: David Ahern <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: convert fib_rule_tests.sh to run it in unique namespace

Here is the test result after conversion.

]# ./fib_rule_tests.sh

     TEST: rule6 check: oif redirect to table                  [ OK ]

     ...

     TEST: rule4 dsfield tcp connect (dsfield 0x07)            [ OK ]

Tests passed:  66
Tests failed:   0

Acked-by: David Ahern <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: convert fib-onlink-tests.sh to run it in unique namespace

Remove PEER_CMD, which is not used in this test

Here is the test result after conversion.

]# ./fib-onlink-tests.sh
Error: ipv4: FIB table does not exist.
Flush terminated
Error: ipv6: FIB table does not exist.
Flush terminated

########################################
Configuring interfaces

   ...

     TEST: Gateway resolves to wrong nexthop device - VRF      [ OK ]

Tests passed:  38
Tests failed:   0

Acked-by: David Ahern <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: convert fib_nexthops.sh to run it in unique namespace

Here is the test result after conversion.

]# ./fib_nexthops.sh

Basic functional tests
----------------------
TEST: List with nothing defined                                     [ OK ]
TEST: Nexthop get on non-existent id                                [ OK ]

...

TEST: IPv6 resilient nexthop group torture test                     [ OK ]

Tests passed: 234
Tests failed:   0

Acked-by: David Ahern <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: convert fib_nexthop_nongw.sh to run it in unique namespace

Here is the test result after conversion.

]# ./fib_nexthop_nongw.sh
TEST: nexthop: get route with nexthop without gw [ OK ]
TEST: nexthop: ping through nexthop without gw [ OK ]

Acked-by: David Ahern <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: convert fib_nexthop_multiprefix to run it in unique namespace

Here is the test result after conversion.

]# ./fib_nexthop_multiprefix.sh
TEST: IPv4: host 0 to host 1, mtu 1300                              [ OK ]
TEST: IPv6: host 0 to host 1, mtu 1300                              [ OK ]

TEST: IPv4: host 0 to host 2, mtu 1350                              [ OK ]
TEST: IPv6: host 0 to host 2, mtu 1350                              [ OK ]

TEST: IPv4: host 0 to host 3, mtu 1400                              [ OK ]
TEST: IPv6: host 0 to host 3, mtu 1400                              [ OK ]

TEST: IPv4: host 0 to host 1, mtu 1300                              [ OK ]
TEST: IPv6: host 0 to host 1, mtu 1300                              [ OK ]

TEST: IPv4: host 0 to host 2, mtu 1350                              [ OK ]
TEST: IPv6: host 0 to host 2, mtu 1350                              [ OK ]

TEST: IPv4: host 0 to host 3, mtu 1400                              [ OK ]
TEST: IPv6: host 0 to host 3, mtu 1400                              [ OK ]

Acked-by: David Ahern <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: fix grep checking for fib_nexthop_multiprefix

When running fib_nexthop_multiprefix test I saw all IPv6 test failed.
e.g.

]# ./fib_nexthop_multiprefix.sh
TEST: IPv4: host 0 to host 1, mtu 1300                              [ OK ]
TEST: IPv6: host 0 to host 1, mtu 1300                              [FAIL]

With -v it shows

COMMAND: ip netns exec h0 /usr/sbin/ping6 -s 1350 -c5 -w5 2001:db8:101::1
PING 2001:db8:101::1(2001:db8:101::1) 1350 data bytes
From 2001:db8:100::64 icmp_seq=1 Packet too big: mtu=1300

--- 2001:db8:101::1 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

Route get
2001:db8:101::1 via 2001:db8:100::64 dev eth0 src 2001:db8:100::1 metric 1024 expires 599sec mtu 1300 pref medium
Searching for:
     2001:db8:101::1 from :: via 2001:db8:100::64 dev eth0 src 2001:db8:100::1 .* mtu 1300

The reason is when CONFIG_IPV6_SUBTREES is not enabled, rt6_fill_node() will
not put RTA_SRC info. After fix:

]# ./fib_nexthop_multiprefix.sh
TEST: IPv4: host 0 to host 1, mtu 1300                              [ OK ]
TEST: IPv6: host 0 to host 1, mtu 1300                              [ OK ]

Fixes: 735ab2f65dce ("selftests: Add test with multiple prefixes using single nexthop")
Signed-off-by: Hangbin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: convert fcnal-test.sh to run it in unique namespace

Here is the test result after conversion. There are some failures, but it
also exists on my system without this patch. So it's not affectec by
this patch and I will check the reason later.

  ]# time ./fcnal-test.sh
  /usr/bin/which: no nettest in (/root/.local/bin:/root/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin)

  ###########################################################################
  IPv4 ping
  ###########################################################################

  #################################################################
  No VRF

  SYSCTL: net.ipv4.raw_l3mdev_accept=0

  TEST: ping out - ns-B IP                                                      [ OK ]
  TEST: ping out, device bind - ns-B IP                                         [ OK ]
  TEST: ping out, address bind - ns-B IP                                        [ OK ]
  ...

  #################################################################
  SNAT on VRF

  TEST: IPv4 TCP connection over VRF with SNAT                                  [ OK ]
  TEST: IPv6 TCP connection over VRF with SNAT                                  [ OK ]

  Tests passed: 893
  Tests failed:  21

  real    52m48.178s
  user    0m34.158s
  sys     1m42.976s

BTW, this test needs a really long time. So expand the timeout to 1h.

Acked-by: David Ahern <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: convert srv6_end_dt6_l3vpn_test.sh to run it in unique namespace

As the name \${rt-${rt}} may make reader confuse, convert the variable
hs/rt in setup_rt/hs to hid, rid. Here is the test result after conversion.

]# ./srv6_end_dt6_l3vpn_test.sh

################################################################################
TEST SECTION: IPv6 routers connectivity test
################################################################################

     TEST: Routers connectivity: rt-1 -> rt-2                            [ OK ]

     TEST: Routers connectivity: rt-2 -> rt-1                            [ OK ]
...

     TEST: Hosts isolation: hs-t200-4 -X-> hs-t100-2                     [ OK ]

Tests passed:  18
Tests failed:   0

Signed-off-by: Hangbin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: convert srv6_end_dt4_l3vpn_test.sh to run it in unique namespace

As the name \${rt-${rt}} may make reader confuse, convert the variable
hs/rt in setup_rt/hs to hid, rid. Here is the test result after conversion.

]# ./srv6_end_dt4_l3vpn_test.sh

################################################################################
TEST SECTION: IPv6 routers connectivity test
################################################################################

     TEST: Routers connectivity: rt-1 -> rt-2                            [ OK ]

     TEST: Routers connectivity: rt-2 -> rt-1                            [ OK ]
...
     TEST: Hosts isolation: hs-t200-4 -X-> hs-t100-2                     [ OK ]

Tests passed:  18
Tests failed:   0

Signed-off-by: Hangbin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: convert srv6_end_dt46_l3vpn_test.sh to run it in unique namespace

As the name \${rt-${rt}} may make reader confuse, convert the variable
hs/rt in setup_rt/hs to hid, rid. Here is the test result after conversion.

]# ./srv6_end_dt46_l3vpn_test.sh

################################################################################
TEST SECTION: IPv6 routers connectivity test
################################################################################

     TEST: Routers connectivity: rt-1 -> rt-2                            [ OK ]

     TEST: Routers connectivity: rt-2 -> rt-1                            [ OK ]

...

     TEST: IPv4 Hosts isolation: hs-t200-4 -X-> hs-t100-2                [ OK ]

Tests passed:  34
Tests failed:   0

Signed-off-by: Hangbin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/net: add variable NS_LIST for lib.sh

Add a global variable NS_LIST to store all the namespaces that setup_ns
created, so the caller could call cleanup_all_ns() instead of remember
all the netns names when using cleanup_ns().

Signed-off-by: Hangbin Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

page_pool: fix typos and punctuation

Correct spelling (s/and/any) and a run-on sentence.
Spell out "multi".

Signed-off-by: Randy Dunlap <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Ilias Apalodimas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: skbuff: fix spelling errors

Correct spelling as reported by codespell.

Signed-off-by: Randy Dunlap <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'net-mdio-mdio-bcm-unimac-optimizations-and-clean-up'

Justin Chen says:

====================
net: mdio: mdio-bcm-unimac: optimizations and clean up

Clean up mdio poll to use read_poll_timeout() and reduce the potential
poll time.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: mdio: mdio-bcm-unimac: Use read_poll_timeout

Simplify the code by using read_poll_timeout().

Signed-off-by: Justin Chen <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: mdio: mdio-bcm-unimac: Delay before first poll

With a clock interval of 400 nsec and a 64 bit transactions (32 bit
preamble & 16 bit control & 16 bit data), it is reasonable to assume
the mdio transaction will take 25.6 usec. Add a 30 usec delay before
the first poll to reduce the chance of a 1000-2000 usec sleep.

Reduce the timeout from 1000ms to 100ms as it is unlikely for the bus
to take this long.

Signed-off-by: Justin Chen <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'tools-ynl-gen-fill-in-the-gaps-in-support-of-legacy-families'

Jakub Kicinski says:

====================
tools: ynl-gen: fill in the gaps in support of legacy families

Fill in the gaps in YNL C code gen so that we can generate user
space code for all genetlink families for which we have specs.

The two major changes we need are support for fixed headers and
support for recursive nests.

For fixed header support - place the struct for the fixed header
directly in the request struct (and don't bother generating access
helpers). The member of a fixed header can't be too complex, and
also are by definition not optional so the user has to fill them in.
The YNL core needs a bit of a tweak to understand that the attrs
may now start at a fixed offset, which is not necessarily equal
to sizeof(struct genlmsghdr).

Dealing with nested sets is much harder. Previously we'd gen
the nested structs as:

struct outer {
struct inner inner;
};

If structs are recursive (e.g. inner contains outer again)
we must break this chain and allocate one of the structs
dynamically (store a pointer rather than full struct).
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl-gen: print prototypes for recursive stuff

We avoid printing forward declarations and prototypes for most
types by sorting things topologically. But if structs nest we
do need the forward declarations, there's no other way.

Reviewed-by: Donald Hunter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl-gen: store recursive nests by a pointer

To avoid infinite nesting store recursive structs by pointer.
If recursive struct is placed in the op directly - the first
instance can be stored by value. That makes the code much
less of a pain for majority of practical uses.

Reviewed-by: Donald Hunter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl-gen: re-sort ignoring recursive nests

We try to keep the structures and helpers "topologically sorted",
to avoid forward declarations. When recursive nests are at play
we need to sort twice, because structs which end up being marked
as recursive will get a full set of forward declarations, so we
should ignore them for the purpose of sorting.

Reviewed-by: Donald Hunter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl-gen: record information about recursive nests

Track which nests are recursive. Non-recursive nesting gets
rendered in C as directly nested structs. For recursive
ones we need to put a pointer in, rather than full struct.

Track this information, no change to generated code, yet.

Reviewed-by: Donald Hunter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl-gen: fill in implementations for TypeUnused

Fill in more empty handlers for TypeUnused. When 'unused'
attr gets specified in a nested set we have to cleanly
skip it during code generation.

Reviewed-by: Donald Hunter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl-gen: support fixed headers in genetlink

Support genetlink families using simple fixed headers.
Assume fixed header is identical for all ops of the family for now.

Fixed headers are added to the request and reply structs as a _hdr
member, and copied to/from netlink messages appropriately.

Reviewed-by: Donald Hunter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl-gen: use enum user type for members and args

Commit 30c902001534 ("tools: ynl-gen: use enum name from the spec")
added pre-cooked user type for enums. Use it to fix ignoring
enum-name provided in the spec.

This changes a type in struct ethtool_tunnel_udp_entry but is
generally inconsequential for current families.

Reviewed-by: Donald Hunter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tools: ynl-gen: add missing request free helpers for dumps

The code gen generates a prototype for dump request free
in the header, but no implementation in the source.

Reviewed-by: Donald Hunter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'bpf-fs-mount-options-parsing-follow-ups'

Andrii Nakryiko says:

====================
BPF FS mount options parsing follow ups

Original BPF token patch set ([0]) added delegate_xxx mount options which
supported only special "any" value and hexadecimal bitmask. This patch set
attempts to make specifying and inspecting these mount options more
human-friendly by supporting string constants matching corresponding bpf_cmd,
bpf_map_type, bpf_prog_type, and bpf_attach_type enumerators.

This implementation relies on BTF information to find all supported symbolic
names. If kernel wasn't built with BTF, BPF FS will still support "any" and
hex-based mask.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=805707&state=*

v1->v2:
  - strip BPF_, BPF_MAP_TYPE_, and BPF_PROG_TYPE_ prefixes,
    do case-insensitive comparison, normalize to lower case (Alexei).
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

selftests/bpf: utilize string values for delegate_xxx mount options

Use both hex-based and string-based way to specify delegate mount
options for BPF FS.

Acked-by: John Fastabend <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

bpf: support symbolic BPF FS delegation mount options

Besides already supported special "any" value and hex bit mask, support
string-based parsing of delegation masks based on exact enumerator
names. Utilize BTF information of `enum bpf_cmd`, `enum bpf_map_type`,
`enum bpf_prog_type`, and `enum bpf_attach_type` types to find supported
symbolic names (ignoring __MAX_xxx guard values and stripping repetitive
prefixes like BPF_ for cmd and attach types, BPF_MAP_TYPE_ for maps, and
BPF_PROG_TYPE_ for prog types). The case doesn't matter, but it is
normalized to lower case in mount option output. So "PROG_LOAD",
"prog_load", and "MAP_create" are all valid values to specify for
delegate_cmds options, "array" is among supported for map types, etc.

Besides supporting string values, we also support multiple values
specified at the same time, using colon (':') separator.

There are corresponding changes on bpf_show_options side to use known
values to print them in human-readable format, falling back to hex mask
printing, if there are any unrecognized bits. This shouldn't be
necessary when enum BTF information is present, but in general we should
always be able to fall back to this even if kernel was built without BTF.
As mentioned, emitted symbolic names are normalized to be all lower case.

Example below shows various ways to specify delegate_cmds options
through mount command and how mount options are printed back:

12/14 14:39:07.604
vmuser@archvm:~/local/linux/tools/testing/selftests/bpf
$ mount | rg token

  $ sudo mkdir -p /sys/fs/bpf/token
  $ sudo mount -t bpf bpffs /sys/fs/bpf/token \
               -o delegate_cmds=prog_load:MAP_CREATE \
               -o delegate_progs=kprobe \
               -o delegate_attachs=xdp
  $ mount | grep token
  bpffs on /sys/fs/bpf/token type bpf (rw,relatime,delegate_cmds=map_create:prog_load,delegate_progs=kprobe,delegate_attachs=xdp)

Acked-by: John Fastabend <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>