Vladimir Oltean [Tue, 18 Apr 2023 11:14:54 +0000 (14:14 +0300)]
net: enetc: include MAC Merge / FP registers in register dump
These have been useful in debugging various problems related to frame
preemption, so make them available through ethtool --register-dump for
later too.
Vladimir Oltean [Tue, 18 Apr 2023 11:14:53 +0000 (14:14 +0300)]
net: enetc: only commit preemptible TCs to hardware when MM TX is active
This was left as TODO in commit 01e23b2b3bad ("net: enetc: add support
for preemptible traffic classes") since it's relatively complicated.
Where this makes a difference is with a configuration as follows:
ethtool --set-mm eno0 pmac-enabled on tx-enabled on verify-enabled on
Preemptible packets should only be sent when the MAC Merge TX direction
becomes active (i.o.w. when the verification process succeeds, aka when
the link partner confirms it can process preemptible traffic). But the
tc qdisc with the preemptible traffic classes is offloaded completely
asynchronously w.r.t. the MM becoming active.
The ENETC manual does suggest that this should be handled in the driver:
"On startup, software should wait for the verification process to
complete (MMCSR[VSTS]=011) before initiating traffic".
Adding the necessary logic allows future selftests to uphold the claim
that an inactive or disabled MAC Merge layer should never send data
packets through the pMAC.
This change moves enetc_set_ptcfpr() from enetc.c to enetc_ethtool.c,
where its only caller is now - enetc_mm_commit_preemptible_tcs().
Vladimir Oltean [Tue, 18 Apr 2023 11:14:52 +0000 (14:14 +0300)]
net: enetc: report mm tx-active based on tx-enabled and verify-status
The MMCSR register contains 2 fields with overlapping meaning:
- LPA (Local preemption active):
This read-only status bit indicates whether preemption is active for
this port. This bit will be set if preemption is both enabled and has
completed the verification process.
- TXSTS (Merge status):
This read-only status field provides the state of the MAC Merge sublayer
transmit status as defined in IEEE Std 802.3-2018 Clause 99.
00 Transmit preemption is inactive
01 Transmit preemption is active
10 Reserved
11 Reserved
However none of these 2 fields offer reliable reporting to software.
When connecting ENETC to a link partner which is not capable of Frame
Preemption, the expectation is that ENETC's verification should fail
(VSTS=4) and its MM TX direction should be inactive (LPA=0, TXSTS=00)
even though the MM TX is enabled (ME=1). But surprise, the LPA bit of
MMCSR stays set even if VSTS=4 and ME=1.
OTOH, the TXSTS field has the opposite problem. I cannot get its value
to change from 0, even when connecting to a link partner capable of
frame preemption, which does respond to its verification frames (ME=1
and VSTS=3, "SUCCEEDED").
The only option with such buggy hardware seems to be to reimplement the
formula for calculating tx-active in software, which is for tx-enabled
to be true, and for the verify-status to be either SUCCEEDED, or
DISABLED.
Without reliable tx-active reporting, we have no good indication when
to commit the preemptible traffic classes to hardware, which makes it
possible (but not desirable) to send preemptible traffic to a link
partner incapable of receiving it. However, currently we do not have the
logic to wait for TX to be active yet, so the impact is limited.
Vladimir Oltean [Tue, 18 Apr 2023 11:14:51 +0000 (14:14 +0300)]
net: enetc: fix MAC Merge layer remaining enabled until a link down event
Current enetc_set_mm() is designed to set the priv->active_offloads bit
ENETC_F_QBU for enetc_mm_link_state_update() to act on, but if the link
is already up, it modifies the ENETC_MMCSR_ME ("Merge Enable") bit
directly.
The problem is that it only *sets* ENETC_MMCSR_ME if the link is up, it
doesn't *clear* it if needed. So subsequent enetc_get_mm() calls still
see tx-enabled as true, up until a link down event, which is when
enetc_mm_link_state_update() will get called.
This is not a functional issue as far as I can assess. It has only come
up because I'd like to uphold a simple API rule in core ethtool code:
the pMAC cannot be disabled if TX is going to be enabled. Currently,
the fact that TX remains enabled for longer than expected (after the
enetc_set_mm() call that disables it) is going to violate that rule,
which is how it was caught.
wwan: core: add print for wwan port attach/disconnect
Refer to USB serial device or net device, there is a notice to
let end user know the status of device, like attached or
disconnected. Add attach/disconnect print for wwan device as
well.
Jakub Kicinski [Thu, 20 Apr 2023 02:00:05 +0000 (19:00 -0700)]
net: skbuff: update and rename __kfree_skb_defer()
__kfree_skb_defer() uses the old naming where "defer" meant
slab bulk free/alloc APIs. In the meantime we also made
__kfree_skb_defer() feed the per-NAPI skb cache, which
implies bulk APIs. So take away the 'defer' and add 'napi'.
While at it add a drop reason. This only matters on the
tx_action path, if the skb has a frag_list. But getting
rid of a SKB_DROP_REASON_NOT_SPECIFIED seems like a net
benefit so why not.
Simon Horman [Wed, 19 Apr 2023 11:34:35 +0000 (13:34 +0200)]
flow_dissector: Address kdoc warnings
Address a number of warnings flagged by
./scripts/kernel-doc -none include/net/flow_dissector.h
include/net/flow_dissector.h:23: warning: Function parameter or member 'addr_type' not described in 'flow_dissector_key_control'
include/net/flow_dissector.h:23: warning: Function parameter or member 'flags' not described in 'flow_dissector_key_control'
include/net/flow_dissector.h:46: warning: Function parameter or member 'padding' not described in 'flow_dissector_key_basic'
include/net/flow_dissector.h:145: warning: Function parameter or member 'tipckey' not described in 'flow_dissector_key_addrs'
include/net/flow_dissector.h:157: warning: cannot understand function prototype: 'struct flow_dissector_key_arp '
include/net/flow_dissector.h:171: warning: cannot understand function prototype: 'struct flow_dissector_key_ports '
include/net/flow_dissector.h:203: warning: cannot understand function prototype: 'struct flow_dissector_key_icmp '
Also improve indentation on adjacent lines to those changed
to address the above.
Jakub Kicinski [Wed, 19 Apr 2023 18:20:06 +0000 (11:20 -0700)]
page_pool: unlink from napi during destroy
Jesper points out that we must prevent recycling into cache
after page_pool_destroy() is called, because page_pool_destroy()
is not synchronized with recycling (some pages may still be
outstanding when destroy() gets called).
I assumed this will not happen because NAPI can't be scheduled
if its page pool is being destroyed. But I missed the fact that
NAPI may get reused. For instance when user changes ring configuration
driver may allocate a new page pool, stop NAPI, swap, start NAPI,
and then destroy the old pool. The NAPI is running so old page
pool will think it can recycle to the cache, but the consumer
at that point is the destroy() path, not NAPI.
To avoid extra synchronization let the drivers do "unlinking"
during the "swap" stage while NAPI is indeed disabled.
The CONFIG_PHYLIB symbol is selected by a number of device drivers that
need PHY support, but it now has a dependency on CONFIG_LEDS_CLASS,
which may not be enabled, causing build failures.
Avoid the risk of missing and circular dependencies by guarding the
phylib LED support itself in another Kconfig symbol that can only be
enabled if the dependency is met.
This could be made a hidden symbol and always enabled when both CONFIG_OF
and CONFIG_LEDS_CLASS are reachable from the phylib, but there may be an
advantage in having users see this option when they have a misconfigured
kernel without built-in LED support.
Linking the NAPI to the rq page_pool to improve page_pool cache
usage during skb recycling.
Here are the observed improvements for a iperf single stream
test case:
- For 1500 MTU and legacy rq, seeing a 20% improvement of cache usage.
- For 9K MTU, seeing 33-40 % page_pool cache usage improvements for
both striding and legacy rq (depending if the application is running on
the same core as the rq or not).
net/mlx5e: RX, Fix XDP_TX page release for legacy rq nonlinear case
When the XDP handler marks the data for transmission (XDP_TX),
it is incorrect to release the page fragment. Instead, the
fragments should be marked as MLX5E_WQE_FRAG_SKIP_RELEASE
because XDP will release the page directly to the page_pool
(page_pool_put_defragged_page) after TX completion.
The linear case already does this. This patch fixes the
nonlinear part as well.
Also, the looping over the fragments was incorrect: When handling
pages after XDP_TX in the legacy rq nonlinear case, the loop was
skipping the first wqe fragment.
Dragos Tatulea [Thu, 30 Mar 2023 16:37:00 +0000 (19:37 +0300)]
net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQ
mlx5e_free_rx_descs is responsible for calling the dealloc_wqe op which
returns pages to the page_pool. This can happen during flush or close.
For XSK, the regular RQ is flushed (when replaced by the XSK RQ) and
also closed later. This is normally not a problem as the wqe list is
empty on a second call to mlx5e_free_rx_descs. However, for striding RQ,
the previously released wqes from the list will appear as missing
and will be released a second time by mlx5e_free_rx_missing_descs.
This patch sets the no release bits on the striding RQ wqes in the
dealloc_wqe op to prevent releasing the pages a second time.
Please note that the bits are set only in the control path during
close and not in the data path.
Maher Sanalla [Mon, 20 Mar 2023 22:10:16 +0000 (00:10 +0200)]
net/mlx5: Add vnic devlink health reporter to PFs/VFs
Create a vnic devlink health reporter for PFs/VFs interfaces.
The reporter's diagnose callback displays the values of vNIC/vport
transport debug counters of PFs/VFs, as follows:
Maher Sanalla [Mon, 20 Mar 2023 17:43:47 +0000 (19:43 +0200)]
Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports"
This reverts commit 606e6a72e29dff9e3341c4cc9b554420e4793f401 which exposes
the vnic diagnostic counters via debugfs. Instead, The upcoming series will
expose the same counters through devlink health reporter.
This reverts commit 4fe1b3a5f8fe2fdcedcaba9561e5b0ae5cb1d15b, which
exposes the steering dropped packets counter via debugfs. The upcoming
series will expose the counter via devlink health reporter instead
of debugfs.
net/mlx5: DR, Calculate sync threshold of each pool according to its type
When certain ICM chunk is no longer needed, it needs to be freed.
Fully freeing ICM memory involves issuing FW SYNC_STEERING command.
This is very time consuming, and it is impractical to do it for every
freed chunk.
Instead, we manage these 'freed' chunks in hot list (list of chunks
that are not required by SW any more, but HW might still access them).
When size of the hot list reaches certain threshold, we purge it and
issue SYNC_STEERING FW command.
There is one threshold for all the different ICM types, which is not
optimal, as different ICM types require different approach: STEs pool
is very large, and it is very 'dynamic' in its nature, so letting hot
list to become too large will result in a significant perf hiccup when
purging the hot list. Modify action is much smaller and less dynamic,
so we can let the hot list to grow to almost the size of the whole pool.
This patch fixes this problem: instead of having same hot memory
threshold for all the pools, sync operation will be triggered in
accordance with the ICM type.
net/mlx5: DR, Fix dumping of legacy modify_hdr in debug dump
The steering dump parser expects to see 0 as rewrite num of actions
in case pattern/args aren't supported - parsing of legacy modify header
is based on this assumption.
Fix this to align to parser's expectation.
Merge branch 'fix __retval() being always ignored'
Eduard Zingerman says:
====================
Florian Westphal found a bug in test_loader.c processing of __retval tag.
Because of this bug the function test_loader.c:do_prog_test_run()
never executed and all __retval test tags were ignored. See [1].
Fix for this bug uncovers two additional bugs:
- During test_verifier tests migration to inline assembly (see [2])
I missed the fact that some tests require maps to contain mock values;
- Some issue with a new refcounted_kptr test, which causes kernel to
produce dead lock and refcount saturation warnings when subject to
libbpf's bpf_test_run_opts().
This series fixes the bug in __retval() processing, and address the
issue with test maps not being populated. The issue in refcounted_kptr
is not addressed, __retval tags in those tests are commented out.
I found that the following tests depend on test maps being populated:
- progs/verifier_array_access.c
- verifier/value_ptr_arith.c (planned for migration to inline assembly)
Given the small amount of these tests I decided to opt for simple
non-generic solution (see patch #4).
Eduard Zingerman [Thu, 20 Apr 2023 23:23:16 +0000 (02:23 +0300)]
selftests/bpf: add pre bpf_prog_test_run_opts() callback for test_loader
When a test case is annotated with __retval tag the test_loader engine
would use libbpf's bpf_prog_test_run_opts() to do a test run of the
program and compare retvals.
This commit allows to perform arbitrary actions on bpf object right
before test loader invokes bpf_prog_test_run_opts(). This could be
used to setup some state for program execution, e.g. fill some maps.
Eduard Zingerman [Thu, 20 Apr 2023 23:23:15 +0000 (02:23 +0300)]
selftests/bpf: fix __retval() being always ignored
Florian Westphal found a bug in and suggested a fix for test_loader.c
processing of __retval tag. Because of this bug the function
test_loader.c:do_prog_test_run() never executed and all __retval test
tags were ignored.
If this bug is fixed a number of test cases from
progs/verifier_array_access.c fail with retval not matching the
expected value. This test was recently converted to use test_loader.c
and inline assembly in [1]. When doing the conversion I missed the
important detail of test_verifier.c operation: when it creates
fixup_map_array_ro, fixup_map_array_wo and fixup_map_array_small it
populates these maps with a dummy record.
Disabling the __retval checks for the affected verifier_array_access
in this commit to avoid false-postivies in any potential bisects.
The issue is addressed in the next patch.
I verified that the __retval tags are now respected by changing
expected return values for all tests annotated with __retval, and
checking that these tests started to fail.
Eduard Zingerman [Thu, 20 Apr 2023 23:23:14 +0000 (02:23 +0300)]
selftests/bpf: disable program test run for progs/refcounted_kptr.c
Florian Westphal found a bug in test_loader.c processing of __retval
tag. Because of this bug the function test_loader.c:do_prog_test_run()
never executed and all __retval test tags were ignored. This hid an
issue with progs/refcounted_kptr.c tests.
When __retval tag bug is fixed and refcounted_kptr.c tests are run
kernel reports various issues and eventually hangs. Shortest reproducer
is the following command run a few times:
$ for i in $(seq 1 4); do (./test_progs --allow=refcounted_kptr &); done
Commenting out __retval tags for these tests until this issue is resolved.
bpftool: Replace "__fallthrough" by a comment to address merge conflict
The recent support for inline annotations in control flow graphs
generated by bpftool introduced the usage of the "__fallthrough" macro
in a switch/case block in btf_dumper.c. This change went through the
bpf-next tree, but resulted in a merge conflict in linux-next, because
this macro has been renamed "fallthrough" (no underscores) in the
meantime.
To address the conflict, we temporarily switch to a simple comment
instead of a macro.
Related: commit f7a858bffcdd ("tools: Rename __fallthrough to fallthrough")
Michal Schmidt [Wed, 12 Apr 2023 08:19:29 +0000 (10:19 +0200)]
ice: sleep, don't busy-wait, in the SQ send retry loop
10 ms is a lot of time to spend busy-waiting. Sleeping is clearly
allowed here, because we have just returned from ice_sq_send_cmd(),
which takes a mutex.
On kernels with HZ=100, this msleep may be twice as long, but I don't
think it matters.
I did not actually observe any retries happening here.
Michal Schmidt [Wed, 12 Apr 2023 08:19:27 +0000 (10:19 +0200)]
ice: sleep, don't busy-wait, for ICE_CTL_Q_SQ_CMD_TIMEOUT
The driver polls for ice_sq_done() with a 100 µs period for up to 1 s
and it uses udelay to do that.
Let's use usleep_range instead. We know sleeping is allowed here,
because we're holding a mutex (cq->sq_lock). To preserve the total
max waiting time, measure the timeout in jiffies.
ICE_CTL_Q_SQ_CMD_TIMEOUT is used also in ice_release_res(), but there
the polling period is 1 ms (i.e. 10 times longer). Since the timeout was
expressed in terms of the number of loops, the total timeout in this
function is 10 s. I do not know if this is intentional. This patch keeps
it.
The patch lowers the CPU usage of the ice-gnss-<dev_name> kernel thread
on my system from ~8 % to less than 1 %.
I received a report of high CPU usage with ptp4l where the busy-waiting
in ice_sq_send_cmd dominated the profile. This patch has been tested in
that usecase too and it made a huge improvement there.
Michal Schmidt [Wed, 12 Apr 2023 08:19:25 +0000 (10:19 +0200)]
ice: increase the GNSS data polling interval to 20 ms
Double the GNSS data polling interval from 10 ms to 20 ms.
According to Karol Kolacinski from the Intel team, they have been
planning to make this change.
Michal Schmidt [Wed, 12 Apr 2023 08:19:24 +0000 (10:19 +0200)]
ice: do not busy-wait to read GNSS data
The ice-gnss-<dev_name> kernel thread, which reads data from the u-blox
GNSS module, keep a CPU core almost 100% busy. The main reason is that
it busy-waits for data to become available.
A simple improvement would be to replace the "mdelay(10);" in
ice_gnss_read() with sleeping. A better fix is to not do any waiting
directly in the function and just requeue this delayed work as needed.
The advantage is that canceling the work from ice_gnss_exit() becomes
immediate, rather than taking up to ~2.5 seconds (ICE_MAX_UBX_READ_TRIES
* 10 ms).
This lowers the CPU usage of the ice-gnss-<dev_name> thread on my system
from ~90 % to ~8 %.
I am not sure if the larger 0.1 s pause after inserting data into the
gnss subsystem is really necessary, but I'm keeping that as it was.
Of course, ideally the driver would not have to poll at all, but I don't
know if the E810 can watch for GNSS data availability over the i2c bus
by itself and notify the driver.
Turns out the channelmap variable is not actually read-only, it's modified
through the MCI_GPM_CLR_CHANNEL_BIT() macro further down in the function,
so making it read-only causes page faults when that code is hit.
* tag 'rust-fixes-6.3' of https://github.com/Rust-for-Linux/linux:
rust: allow to use INIT_STACK_ALL_ZERO
rust: fix regexp in scripts/is_rust_module.sh
rust: build: Fix grep warning
scripts: generate_rust_analyzer: Handle sub-modules with no Makefile
rust: kernel: Mark rust_fmt_argument as extern "C"
rust: sort uml documentation arch support table
rust: str: fix requierments->requirements typo
- eth: cxgb4: fix use after free bugs caused by circular dependency
problem
- eth: mlxsw: pci: fix possible crash during initialization
Previous releases - always broken:
- sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg
- netfilter: validate catch-all set elements
- bridge: don't notify FDB entries with "master dynamic"
- eth: bonding: fix memory leak when changing bond type to ethernet
- eth: i40e: fix accessing vsi->active_filters without holding lock
Misc:
- Mat is back as MPTCP co-maintainer"
* tag 'net-6.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
net: bridge: switchdev: don't notify FDB entries with "master dynamic"
Revert "net/mlx5: Enable management PF initialization"
MAINTAINERS: Resume MPTCP co-maintainer role
mailmap: add entries for Mat Martineau
e1000e: Disable TSO on i219-LM card to increase speed
bnxt_en: fix free-runnig PHC mode
net: dsa: microchip: ksz8795: Correctly handle huge frame configuration
bpf: Fix incorrect verifier pruning due to missing register precision taints
hamradio: drop ISA_DMA_API dependency
mlxsw: pci: Fix possible crash during initialization
mptcp: fix accept vs worker race
mptcp: stops worker on unaccepted sockets at listener close
net: rpl: fix rpl header size calculation
net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()
bonding: Fix memory leak when changing bond type to Ethernet
veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag
mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next()
bnxt_en: Fix a possible NULL pointer dereference in unload path
bnxt_en: Do not initialize PTP on older P3/P4 chips
netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements
...
net: libwx: fix memory leak in wx_setup_rx_resources
When wx_alloc_page_pool() failed in wx_setup_rx_resources(), it doesn't
release DMA buffer. Add dma_free_coherent() in the error path to release
the DMA buffer.
Bitterblue Smith [Mon, 17 Apr 2023 17:08:20 +0000 (20:08 +0300)]
wifi: rtl8xxxu: Simplify setting the initial gain
The goal of writing 0x6954341e / 0x6955341e to REG_OFDM0_XA_AGC_CORE1
appears to be setting the initial gain, which is stored in bits 0..6.
Bits 7..31 are the same as what the phy init tables write.
Modify only bits 0..6 so that we don't have to care about the values
of the others. This way we don't have to add another "else if" for the
RTL8192FU.
Why we need to change the initial gain from the default 0x20 to 0x1e?
Not sure. Some of the vendor drivers change it to 0x1e before scanning
and then restore it to the original value after.
Also add rtl8xxxu_write32_mask, rtl8xxxu_write_rfreg_mask.
These helper functions make it easier to modify only parts of a register
by eliminating the call to the register reading function and the bit
manipulations.
Bitterblue Smith [Mon, 17 Apr 2023 17:05:43 +0000 (20:05 +0300)]
wifi: rtl8xxxu: Don't print the vendor/product/serial
Most devices have a vendor name, product name, and serial number in the
efuse, but it's pretty useless. It duplicates the information already
printed by the USB subsystem:
usb 1-4: New USB device found, idVendor=0bda, idProduct=8178, bcdDevice= 2.00
usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-4: Product: 802.11n WLAN Adapter
usb 1-4: Manufacturer: Realtek
usb 1-4: SerialNumber: 00e04c000001
-> usb 1-4: Vendor: Realtek
-> usb 1-4: Product: 802.11n WLAN Adapter
usb 1-4: New USB device found, idVendor=0bda, idProduct=818b, bcdDevice= 2.00
usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-4: Product: 802.11n NIC
usb 1-4: Manufacturer: Realtek
usb 1-4: SerialNumber: 00e04c000001
-> usb 1-4: Vendor: Realtek
-> usb 1-4: Product: 802.11n NIC
-> usb 1-4: Serial not available.
usb 1-4: New USB device found, idVendor=0bda, idProduct=f179, bcdDevice= 0.00
usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-4: Product: 802.11n
usb 1-4: Manufacturer: Realtek
usb 1-4: SerialNumber: 002E2DC0041F
-> usb 1-4: Vendor: Realtek
-> usb 1-4: Product: 802.11n
usb 1-4: New USB device found, idVendor=0bda, idProduct=8179, bcdDevice= 0.00
usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-4: Product: 802.11n NIC
usb 1-4: Manufacturer: Realtek
usb 1-4: SerialNumber: 00E04C0001
-> usb 1-4: Vendor: Realtek
-> usb 1-4: Product: 802.11n NIC
-> usb 1-4: Serial: 00E04C0001
Also, that data is not interpreted correctly in all cases:
usb 3-1.1.2: New USB device found, idVendor=0bda, idProduct=8179, bcdDevice= 0.00
usb 3-1.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 3-1.1.2: Product: 802.11n NIC
usb 3-1.1.2: Manufacturer: Realtek
usb 3-1.1.2: Vendor: Realtek
usb 3-1.1.2: Product: \x03802.11n NI
usb 3-1.1.2: Serial: \xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff
wifi: rtw88: call rtw8821c_switch_rf_set() according to chip variant
We have to call rtw8821c_switch_rf_set() with SWITCH_TO_WLG or
SWITCH_TO_BTG according to the chip variant as denoted in rfe_option.
The information which argument to use for which variant has been
taken from the vendor driver.
wifi: rtw88: set pkg_type correctly for specific rtw8821c variants
According to the vendor driver the pkg_type has to be set to '1'
for some rtw8821c variants. As the pkg_type has been hardcoded to
'0', add a field for it in struct rtw_hal and set this correctly
in the rtw8821c part.
With this parsing of a rtw_table is influenced and check_positive()
in phy.c returns true for some cases here. The same is done in the
vendor driver. However, this has no visible effect on the driver
here.
On my RTW8821CU chipset rfe_option reads as 0x22. Looking at the
vendor driver suggests that the field width of rfe_option is 5 bit,
so rfe_option should be masked with 0x1f.
Without this the rfe_option comparisons with 2 further down the
driver evaluate as false when they should really evaluate as true.
The effect is that 2G channels do not work.
rfe_option is also used as an array index into rtw8821c_rfe_defs[].
rtw8821c_rfe_defs[34] (0x22) was added as part of adding USB support,
likely because rfe_option reads as 0x22. As this now becomes 0x2,
rtw8821c_rfe_defs[34] is no longer used and can be removed.
Note that this might not be the whole truth. In the vendor driver
there are indeed places where the unmasked rfe_option value is used.
However, the driver has several places where rfe_option is tested
with the pattern if (rfe_option == 2 || rfe_option == 0x22) or
if (rfe_option == 4 || rfe_option == 0x24), so that rfe_option BIT(5)
has no influence on the code path taken. We therefore mask BIT(5)
out from rfe_option entirely until this assumption is proved wrong
by some chip variant we do not know yet.
wifi: rtw88: usb: fix priority queue to endpoint mapping
The RTW88 chipsets have four different priority queues in hardware. For
the USB type chipsets the packets destined for a specific priority queue
must be sent through the endpoint corresponding to the queue. This was
not fully understood when porting from the RTW88 USB out of tree driver
and thus violated.
This patch implements the qsel to endpoint mapping as in
get_usb_bulkout_id_88xx() in the downstream driver.
Without this the driver often issues "timed out to flush queue 3"
warnings and often TX stalls completely.
Allow 8822c to operate two interface concurrently, only 1 AP mode plus
1 station mode under same frequency is supported. Combination of other
types will not be added until requested.
wifi: rtw88: handle station mode concurrent scan with AP mode
This patch allows vifs sharing same hardware with the AP mode vif to
do scan, do note that this could lead to packet loss or disconnection
of the AP's clients. Since we don't have chanctx, update scan info
upon set channel so bandwidth changes won't go unnoticed and get
misconfigured after scan. Download beacon just before scan starts to
allow hardware to get proper content to do beaconing. Last, beacons
should only be transmitted in AP's operating channel. Turn related
beacon functions off while we're in other channels so the receiving
stations won't get confused.
Only abort scan with current scanning VIF. If we have more than one
interface, we could call rtw_hw_scan_abort() with the wrong VIF as
input. This avoids potential null pointer being accessed when actually
the other VIF is scanning.
wifi: rtw88: refine reserved page flow for AP mode
Only gather reserved pages from AP interface after it has started. Or
else ieee80211_beacon_get_*() returns NULL and causes other VIFs'
reserved pages fail to download. Update location of current reserved page
after beacon renews so offsets changed by beacon can be recognized.
Extend 8822c's reserved page number to accommodate additional required
pages. Reserved page is an area of memory in the FIFO dedicated for
special purposes. Previously only one interface is supported so 8 pages
should suffice, extend it so we can support 2 interfaces concurrently.
Switch port settings if AP mode does not start on port 0 because of
hardware limitation. For some ICs, beacons on ports other than zero
could misbehave and do not issue properly, to fix this we change AP
VIFs to port zero when multiple interfaces is active.
In order to support multiple interfaces, multiple port settings will
be required. Current code always uses port 0 and should be changed.
Declare a bitmap with size equal to hardware port number to record
the current usage.
wifi: rtw89: mac: use regular int as return type of DLE buffer request
The function to request DLE (data link engine) buffer uses 'u16' as return
value that mixes error code, so change it to 'int' as regular error code.
Also, treat invalid register value (0xfff) as an error.
wifi: iwlwifi: mvm: fix RFKILL report when driver is going down
When CSME takes ownership, the driver sets RFKILL on, and this
triggers driver unload and sending the confirmation SAP message.
However, when IWL_MVM_MEI_REPORT_RFKILL is set, RFKILL was not
reported and as a result, the driver did not confirm the ownership
transition. Fix it.
wifi: iwlwifi: mei: re-ask for ownership after it was taken by CSME
When the host disconnects from the AP CSME takes ownership right away.
Since the driver never asks for ownership again wifi is left in rfkill
until CSME releases the NIC, although in many cases the host could
re-connect shortly after the disconnection. To allow the host to
recover from occasional disconnection, re-ask for ownership to let
the host connect again.
Allow one minute before re-asking for ownership to avoid too frequent
ownership transitions.
wifi: iwlwifi: mei: make mei filtered scan more aggressive
When mei filtered scan is performed, it must find the AP on the first
scan, otherwise CSME will take the ownership of the NIC.
Make this scan more aggressive by scanning the channel the AP is
supposed to be on (as reported by CSME) several times.
wifi: iwlwifi: modify scan request and results when in link protection
When CSME is connected and has link protection set, the driver must
connect to the same AP CSME is connected to.
When in link protection, modify scan request parameters to include
only the channel of the AP CSME is connected to and scan for the
same SSID. In addition, filter the scan results to include only
results from the same AP. This will make sure the driver will connect
to the same AP and will do it fast enough to keep the session alive.
Johannes Berg [Tue, 18 Apr 2023 09:28:08 +0000 (12:28 +0300)]
wifi: iwlwifi: mvm: fix potential memory leak
If we do get multiple notifications from firmware, then
we might have allocated 'notif', but don't free it. Fix
that by checking for duplicates before allocation.
Johannes Berg [Tue, 18 Apr 2023 09:28:06 +0000 (12:28 +0300)]
wifi: iwlwifi: mvm: fix MIC removal confusion
The RADA/firmware collaborate on MIC stripping in the following
way:
- the firmware fills the IWL_RX_MPDU_MFLG1_MIC_CRC_LEN_MASK
value for how many words need to be removed at the end of
the frame, CRC and, if decryption was done, MIC
- if the RADA is active, it will
- remove that much from the end of the frame
- zero the value in IWL_RX_MPDU_MFLG1_MIC_CRC_LEN_MASK
As a consequence, the only thing the driver should need to do
is to
- unconditionally tell mac80211 that the MIC was removed
if decryption was already done
- remove as much as IWL_RX_MPDU_MFLG1_MIC_CRC_LEN_MASK says
at the end of the frame, since either RADA did it and then
the value is 0, or RADA was disabled and then the value is
whatever should be removed to strip both CRC & MIC
However, all this code was historically grown and getting a
bit confused. Originally, we were indicating that the MIC was
not stripped, which is the version of the code upstreamed in
commit 780e87c29e77 ("iwlwifi: mvm: add 9000 series RX processing")
which indicated RX_FLAG_DECRYPTED in iwl_mvm_rx_crypto().
We later had a commit to change that to also indicate that the
MIC was stripped, adding RX_FLAG_MIC_STRIPPED. However, this was
then "fixed" later to only do that conditionally on RADA being
enabled, since otherwise RADA didn't strip the MIC bytes yet.
At the time, we were also always including the FCS if the RADA
was not enabled, so that was still broken wrt. the FCS if the
RADA isn't enabled - but that's a pretty rare case. Notably
though, it does happen for management frames, where we do need
to remove the MIC and CRC but the RADA is disabled.
Later, in commit 40a0b38d7a7f ("iwlwifi: mvm: Fix calculation of
frame length"), we changed this again, upstream this was just a
single commit, but internally it was split into first the correct
commit and then an additional fix that reduced the number of bytes
that are removed by crypt_len. Note that this is clearly wrong
since crypt_len indicates the length of the PN header (always 8),
not the length of the MIC (8 or 16 depending on algorithm).
However, this additional fix mostly canceled the other bugs,
apart from the confusion about the size of the MIC.
To fix this correctly, remove all those additional workarounds.
We really should always indicate to mac80211 the MIC was stripped
(it cannot use it anyway if decryption was already done), and also
always actually remove it and the CRC regardless of the RADA being
enabled or not. That's simple though, the value indicated in the
metadata is zeroed by the RADA if it's enabled and used the value,
so there's no need to check if it's enabled or not.
Notably then, this fixes the MIC size confusion, letting us receive
GCMP-256 encrypted management frames correctly that would otherwise
be reported to mac80211 8 bytes too short since the RADA is turned
off for them, crypt_len is 8, but the MIC size is 16, so when we do
the adjustment based on IWL_RX_MPDU_MFLG1_MIC_CRC_LEN_MASK (which
indicates 20 bytes to remove) we remove 12 bytes but indicate then
to mac80211 the MIC is still present, so mac80211 again removes the
MIC of 16 bytes, for an overall removal of 28 rather than 20 bytes.
Johannes Berg [Tue, 18 Apr 2023 09:28:05 +0000 (12:28 +0300)]
wifi: iwlwifi: fw: fix memory leak in debugfs
Fix a memory leak that occurs when reading the fw_info
file all the way, since we return NULL indicating no
more data, but don't free the status tracking object.
Vladimir Oltean [Tue, 18 Apr 2023 15:59:02 +0000 (18:59 +0300)]
net: bridge: switchdev: don't notify FDB entries with "master dynamic"
There is a structural problem in switchdev, where the flag bits in
struct switchdev_notifier_fdb_info (added_by_user, is_local etc) only
represent a simplified / denatured view of what's in struct
net_bridge_fdb_entry :: flags (BR_FDB_ADDED_BY_USER, BR_FDB_LOCAL etc).
Each time we want to pass more information about struct
net_bridge_fdb_entry :: flags to struct switchdev_notifier_fdb_info
(here, BR_FDB_STATIC), we find that FDB entries were already notified to
switchdev with no regard to this flag, and thus, switchdev drivers had
no indication whether the notified entries were static or not.
For example, this command:
ip link add br0 type bridge && ip link set swp0 master br0
bridge fdb add dev swp0 00:01:02:03:04:05 master dynamic
has never worked as intended with switchdev. It causes a struct
net_bridge_fdb_entry to be passed to br_switchdev_fdb_notify() which has
a single flag set: BR_FDB_ADDED_BY_USER.
This is further passed to the switchdev notifier chain, where interested
drivers have no choice but to assume this is a static (does not age) and
sticky (does not migrate) FDB entry. So currently, all drivers offload
it to hardware as such, as can be seen below ("offload" is set).
bridge fdb get 00:01:02:03:04:05 dev swp0 master
00:01:02:03:04:05 dev swp0 offload master br0
The software FDB entry expires $ageing_time centiseconds after the
kernel last sees a packet with this MAC SA, and the bridge notifies its
deletion as well, so it eventually disappears from hardware too.
This is a problem, because it is actually desirable to start offloading
"master dynamic" FDB entries correctly - they should expire $ageing_time
centiseconds after the *hardware* port last sees a packet with this
MAC SA - and this is how the current incorrect behavior was discovered.
With an offloaded data plane, it can be expected that software only sees
exception path packets, so an otherwise active dynamic FDB entry would
be aged out by software sooner than it should.
With the change in place, these FDB entries are no longer offloaded:
bridge fdb get 00:01:02:03:04:05 dev swp0 master
00:01:02:03:04:05 dev swp0 master br0
and this also constitutes a better way (assuming a backport to stable
kernels) for user space to determine whether the kernel has the
capability of doing something sane with these or not.
As opposed to "master dynamic" FDB entries, on the current behavior of
which no one currently depends on (which can be deduced from the lack of
kselftests), Ido Schimmel explains that entries with the "extern_learn"
flag (BR_FDB_ADDED_BY_EXT_LEARN) should still be notified to switchdev,
since the spectrum driver listens to them (and this is kind of okay,
because although they are treated identically to "static", they are
expected to not age, and to roam).
selftests/bpf: Add test to access integer type of variable array
Add prog test for accessing integer type of variable array in tracing
program.
In addition, hook load_balance function to access sd->span[0], only
to confirm whether the load is successful. Because there is no direct
way to trigger load_balance call.
bpf: support access variable length array of integer type
After this commit:
bpf: Support variable length array in tracing programs (9c5f8a1008a1)
Trace programs can access variable length array, but for structure
type. This patch adds support for integer type.
Paul reports that it causes a regression with IB on CX4
and FW 12.18.1000. In addition I think that the concept
of "management PF" is not fully accepted and requires
a discussion.
====================
Another crack at a handshake upcall mechanism
Here is v10 of a series to add generic support for transport layer
security handshake on behalf of kernel socket consumers (user space
consumers use a security library directly, of course). A summary of
the purpose of these patches is archived here:
The first patch in the series applies to the top-level .gitignore
file to address the build warnings reported a few days ago. I intend
to submit that separately. I'd like you to consider taking the rest
of this series for v6.4.
The full patch set to support SunRPC with TLSv1.3 is available in
the topic-rpc-with-tls-upcall branch here, based on net-next/main:
Chuck Lever [Mon, 17 Apr 2023 14:32:26 +0000 (10:32 -0400)]
net/handshake: Create a NETLINK service for handling handshake requests
When a kernel consumer needs a transport layer security session, it
first needs a handshake to negotiate and establish a session. This
negotiation can be done in user space via one of the several
existing library implementations, or it can be done in the kernel.
No in-kernel handshake implementations yet exist. In their absence,
we add a netlink service that can:
a. Notify a user space daemon that a handshake is needed.
b. Once notified, the daemon calls the kernel back via this
netlink service to get the handshake parameters, including an
open socket on which to establish the session.
c. Once the handshake is complete, the daemon reports the
session status and other information via a second netlink
operation. This operation marks that it is safe for the
kernel to use the open socket and the security session
established there.
The notification service uses a multicast group. Each handshake
mechanism (eg, tlshd) adopts its own group number so that the
handshake services are completely independent of one another. The
kernel can then tell via netlink_has_listeners() whether a handshake
service is active and prepared to handle a handshake request.
A new netlink operation, ACCEPT, acts like accept(2) in that it
instantiates a file descriptor in the user space daemon's fd table.
If this operation is successful, the reply carries the fd number,
which can be treated as an open and ready file descriptor.
While user space is performing the handshake, the kernel keeps its
muddy paws off the open socket. A second new netlink operation,
DONE, indicates that the user space daemon is finished with the
socket and it is safe for the kernel to use again. The operation
also indicates whether a session was established successfully.
Chuck Lever [Mon, 17 Apr 2023 14:32:19 +0000 (10:32 -0400)]
.gitignore: Do not ignore .kunitconfig files
Circumvent the .gitignore wildcard to avoid warnings about ignored
.kunitconfig files. As far as I can tell, the warnings are harmless
and these files are not actually ignored.
Jakub Kicinski [Thu, 20 Apr 2023 01:46:17 +0000 (18:46 -0700)]
Merge tag 'ipsec-next-2023-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
ipsec-next 2023-04-19
1) Remove inner/outer modes from input/output path. These are
not needed anymore. From Herbert Xu.
* tag 'ipsec-next-2023-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: Remove inner/outer modes from output path
xfrm: Remove inner/outer modes from input path
====================
A JSON pointer reference (the part after the "#") must start with a "/".
Conversely, references to the entire document must not have a trailing "/"
and should be just a "#". The existing jsonschema package allows these,
but coming changes make allowed "$ref" URIs stricter and throw errors on
these references.
At the beginning of the file micrel.c there is list of supported PHYs.
Extend this list with the following PHYs lan8841, lan8814 and lan8804,
as these PHYs were added but the list was not updated.
Simon Horman [Tue, 18 Apr 2023 11:07:33 +0000 (13:07 +0200)]
net: stmmac: dwmac-meson8b: Avoid cast to incompatible function type
Rather than casting clk_disable_unprepare to an incompatible function
type provide a trivial wrapper with the correct signature for the
use-case.
Reported by clang-16 with W=1:
drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:276:6: error: cast from 'void (*)(struct clk *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
(void(*)(void *))clk_disable_unprepare,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
No functional change intended.
Compile tested only.
Jakub Kicinski [Thu, 20 Apr 2023 01:22:18 +0000 (18:22 -0700)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
bpf 2023-04-19
We've added 3 non-merge commits during the last 6 day(s) which contain
a total of 3 files changed, 34 insertions(+), 9 deletions(-).
The main changes are:
1) Fix a crash on s390's bpf_arch_text_poke() under a NULL new_addr,
from Ilya Leoshkevich.
2) Fix a bug in BPF verifier's precision tracker, from Daniel Borkmann
and Andrii Nakryiko.
3) Fix a regression in veth's xdp_features which led to a broken BPF CI
selftest, from Lorenzo Bianconi.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix incorrect verifier pruning due to missing register precision taints
veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag
s390/bpf: Fix bpf_arch_text_poke() with new_addr == NULL
====================
Merge tag 'mm-hotfixes-stable-2023-04-19-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"22 hotfixes.
19 are cc:stable and the remainder address issues which were
introduced during this merge cycle, or aren't considered suitable for
-stable backporting.
19 are for MM and the remainder are for other subsystems"
* tag 'mm-hotfixes-stable-2023-04-19-16-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
nilfs2: initialize unused bytes in segment summary blocks
mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages
mm/mmap: regression fix for unmapped_area{_topdown}
maple_tree: fix mas_empty_area() search
maple_tree: make maple state reusable after mas_empty_area_rev()
mm: kmsan: handle alloc failures in kmsan_ioremap_page_range()
mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush()
tools/Makefile: do missed s/vm/mm/
mm: fix memory leak on mm_init error handling
mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()
Revert "userfaultfd: don't fail on unrecognized features"
writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs
maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug
tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used
mailmap: update jtoppins' entry to reference correct email
mm/mempolicy: fix use-after-free of VMA iterator
mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO
mm/mprotect: fix do_mprotect_pkey() return on error
mm/khugepaged: check again on anon uffd-wp during isolation
...
e1000e: Disable TSO on i219-LM card to increase speed
While using i219-LM card currently it was only possible to achieve
about 60% of maximum speed due to regression introduced in Linux 5.8.
This was caused by TSO not being disabled by default despite commit f29801030ac6 ("e1000e: Disable TSO for buffer overrun workaround").
Fix that by disabling TSO during driver probe.
The patch in fixes changed the way real-time mode is chosen for PHC on
the NIC. Apparently there is one more use case of the check outside of
ptp part of the driver which was not converted to the new macro and is
making a lot of noise in free-running mode.
Daniel Golle [Sun, 16 Apr 2023 12:08:14 +0000 (13:08 +0100)]
net: dsa: mt7530: fix support for MT7531BE
There are two variants of the MT7531 switch IC which got different
features (and pins) regarding port 5:
* MT7531AE: SGMII/1000Base-X/2500Base-X SerDes PCS
* MT7531BE: RGMII
Moving the creation of the SerDes PCS from mt753x_setup to mt7530_probe
with commit 6de285229773 ("net: dsa: mt7530: move SGMII PCS creation
to mt7530_probe function") works fine for MT7531AE which got two
instances of mtk-pcs-lynxi, however, MT7531BE requires mt7531_pll_setup
to setup clocks before the single PCS on port 6 (usually used as CPU
port) starts to work and hence the PCS creation failed on MT7531BE.
Fix this by introducing a pointer to mt7531_create_sgmii function in
struct mt7530_priv and call it again at the end of mt753x_setup like it
was before commit 6de285229773 ("net: dsa: mt7530: move SGMII PCS
creation to mt7530_probe function").
Merge tag 'spi-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"A small fix in the error handling for the rockchip driver, ensuring we
don't leak clock enables if we fail to request the interrupt for the
device"
* tag 'spi-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-rockchip: Fix missing unwind goto in rockchip_sfc_probe()
Merge tag 'regulator-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A few driver specific fixes, one build coverage issue and a couple of
'someone typed in the wrong number' style errors in describing devices
to the subsystem"
* tag 'regulator-fix-v6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: sm5703: Fix missing n_voltages for fixed regulators
regulator: fan53555: Fix wrong TCS_SLEW_MASK
regulator: fan53555: Explicitly include bits header