Git Repo - linux.git/log

wifi: iwlwifi: remove retry loops in start

There's either the pldr_sync case, in which case we didn't want
or do the retry loops anyway, or things will just continue to
fail. Remove the retry loop that was added in a previous attempt
to address the issue that was later (though still a bit broken)
addressed by the pldr_sync case.

Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240123200528.f80a88a18799.I48f21eda090f4cc675f40e99eef69a986d21b500@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: mvm: limit EHT 320 MHz MCS for STEP URM

If the STEP (the interface between MAC and PHY) is in URM
(a lower speed mode) then we cannot use 320 MHz MCS > 9.
Therefore, limit the MCS in our capabilities in this case.
Note that this also limits the TX/rate scaling since that
takes both TX and RX capabilities into account.

Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240123200528.02bae683b7fc.Id5efbb71d45da02c8c4e211d20396637ddd44da8@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: disable 160 MHz based on subsystem device ID

The driver should not send 160 MHz BW support for 5 GHz
band in HE if PCI subsystem device ID indicates no 160 MHz support.

Signed-off-by: Mukesh Sisodiya <[email protected]>
Reviewed-by: Mordechay Goodstein <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240126085924.77c248ce6986.I558e8d0cf19dc862b1c4124df78a4cb690095bb2@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: make TB reallocation a debug message

There's no need to print this, it's a known issue and
the workaround works just fine. Make the reallocation
message just a debug message.

Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240123200528.329d5f2ee7f7.I0bfc6dde17fe2c738129f3aba746c6cba57589f9@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: Add support for new 802.11be device

Add support for the new 802.11be device with limites capabilities:
- 320 MHz isn't supported
- MCSs 12 and 13 are not supported

Signed-off-by: Mukesh Sisodiya <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240123200528.8529bd2acedf.I25dccb7bbeb21b8df2123fad51dde7fcf137a508@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: iwlwifi: add kunit test for devinfo ordering

We used to have a test built into the code for this internally,
but now we can put that into kunit and let everyone run it, to
verify the devinfo table ordering if it's changed.

Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Benjamin Berg <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240123200528.a4a8af7c091f.I0fb09083317b331168b99b8db39656a126a5cc4d@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: apply duration for SW scan

This patch makes duration in scan request be applicable when using
SW scan, but only accepts durations greater than the default value for
the following reasons:
1. Most APs have a beacoon interval of 100ms.
2. Sending and receiving probe require some delay.
3. Setting channel to HW also requires some delays

Signed-off-by: Michael-CY Lee <[email protected]>
Link: https://msgid.link/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: use deflink and fix typo in link ID check

This does not change anything effectively, but it is closer to what the
code is trying to achieve here. i.e. select the link data if it is an
MLD and fall back to using the deflink otherwise.

Fixes: 0f99f0878350 ("wifi: mac80211: Print local link address during authentication")
Signed-off-by: Benjamin Berg <[email protected]>
Reviewed-by: Ilan Peer <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240111181514.4c4b1c40eb3c.I2771621dee328c618536596b7e56232df42a79c8@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: don't set bss_conf in parsing

When parsing 6 GHz operation, don't set the bss_conf
values. We only commit to that later in association,
so move the code there. Also clear it later.

While at it, handle IEEE80211_6GHZ_CTRL_REG_VLP_AP.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240111181514.c2da4bc515e8.I219ca40e15c0fbaff0e7c3e83ca4b92ecbc1f8ae@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: disallow drivers with HT wider than HE

To simplify the code in the next patch, disallow drivers
supporting 40 MHz in HT but not HE, since we'd otherwise
have to track local maximum bandwidth per mode there.

Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240111181514.da15fe3214d2.I4df51ad2f4c844615c168bf9bdb498925b3c77d4@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: simplify HE capability access

For verifying the required HE capabilities are supported
locally, we access the HE capability element of the AP.
Simplify that access, we've already parsed and validated
it when parsing elements.

Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240111181514.2ef62b43caeb.I8baa604dd3f3399e08b86c99395a2c6a1185d35d@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: remove extra element parsing

We already parse all the BSS elements into elems, there's
really no need to separately find EHT/ML again. Remove the
extra code.

Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240111181514.c4a55da9f778.I112b1ef00904c4183ac7644800f8daa8a4449875@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: simplify ieee80211_config_bw() prototype

The only user of this function passes a lot of pointers
directly from the parsed elements, so it's simpler to
just pass the entire elements parsing struct. This also
shows that the ht_cap is actually unused.

Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240111181514.f0653cd5e7dd.I8bd5ee848074029a9f0495c95e4339546ad8fe15@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211_hwsim: advertise 15 simultaneous links

Advertise MLD capabilities and operations in AP mode that
say that up to 15 links are supported simultaneously.

Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Reviewed-by: Ilan Peer <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240111181514.52a1d48b67e6.Ie459df742944d24d6401683d54d2f3ac44834803@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: cfg80211: validate MLO connections better

When going into an MLO connection, validate that the link IDs
match what userspace indicated, and that the AP MLD addresses
and capabilities are all matching between the links.

Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240102213313.ff83c034cb9a.I9962db0bfa8c73b37b8d5b59a3fad7f02f2129ae@changeid
[roll in extra fix from Miri to actually check the return value]
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: take EML/MLD capa from assoc response

The association response is more likely to be correct
than a random scan result, which really also should be
correct, but we generally prefer to take data from the
association response, so do that here as well.

Also reset the data so it doesn't hang around from an
old connection to a non-MLO connection, drivers would
hopefully not look at it, but less surprise this way.

Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240102213313.1d10f1d1dbab.I545e955675e2269a52496a22ae7822d95b40235e@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211_hwsim: advertise AP-side EMLSR/EMLMR capa

Advertise EMLSR and EMLMR capability on the AP side to be
a better compliant AP MLD.

Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Ilan Peer <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240102213313.dc8786efa787.Ic460c13a91d770c208ac16d0b3e94941bab9b8eb@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: add support for SPP A-MSDUs

If software crypto is used, simply add support for SPP A-MSDUs
(and use it whenever enabled as required by the cfg80211 API).

If hardware crypto is used, leave it up to the driver to set
the NL80211_EXT_FEATURE_SPP_AMSDU_SUPPORT flag and then check
sta->spp_amsdu or the IEEE80211_KEY_FLAG_SPP_AMSDU key flag.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Daniel Gabay <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240102213313.b8ada4514e2b.I1ac25d5f158165b5a88062a5a5e4c4fbeecf9a5d@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: cfg80211: add support for SPP A-MSDUs

Add SPP (signaling and payload protected) AMSDU support.

Since userspace has to build the RSNX element, add an extended
feature flag to indicate that this is supported.

In order to avoid downgrade/mismatch attacks, add a flag to the assoc
command on the station side, so that we can be sure that the value of
the flag comes from the same RSNX element that will be validated by
the supplicant against the 4-way-handshake. If we just pulled the
data out of a beacon/probe response, we could theoretically look an
RSNX element from a different frame, with a different value for this
flag, than the supplicant is using to validate in the
4-way-handshake.

Note that this patch is only geared towards software crypto
implementations or hardware ones that can perfectly implement SPP
A-MSDUs, i.e. are able to switch the AAD construction on the fly for
each TX/RX frame.

For more limited hardware implementations, more capability
advertisement would be required, e.g. if the hardware has no way
to switch this on the fly but has only a global configuration that
must apply to all stations.

The driver could of course *reject* mismatches, but the supplicant
must know so it can do things like not negotiating SPP A-MSDUs on
a T-DLS link when connected to an AP that doesn't support it, or
similar.

Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Daniel Gabay <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240102213313.fadac8df7030.I9240aebcba1be49636a73c647ed0af862713fc6f@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211_hwsim: Declare support for negotiated TTLM

Advertise support for negotiated TTLM in AP mode for testing
purposes. In addition, declare support for some extended
capabilities that are globally advertised by mac80211.

Signed-off-by: Ilan Peer <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Reviewed-by: Johannes Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240102213313.3f54382f8449.I42b2f7c52f7574448cc8da3ad3db45075e4e0baa@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: add support for negotiated TTLM request

Update neg_ttlm and active_links according to the new mapping,
and send a negotiated TID-to-link map request with the new mapping.

Signed-off-by: Ayala Beker <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Reviewed-by: Johannes Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240102213313.eeb385d771df.I2a5441c14421de884dbd93d1624ce7bb2c944833@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211_hwsim: handle BSS_CHANGED_MLD_TTLM

Same as in BSS_CHANGED_VALID_LINKS, set the active
links to all the usable links.

Signed-off-by: Ayala Beker <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Reviewed-by: Johannes Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240102213313.3c28da3534e9.I76846c5dd693f930d4828e411c734639708b5a1a@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211_hwsim: handle TID to link mapping neg request

Accept the request if all TIDs are mapped to the same link set,
otherwise reject it.

Signed-off-by: Ayala Beker <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Reviewed-by: Johannes Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240102213313.dfa8e132d0cd.I5fbec1fef933980819ea39c1227f37d307ab1145@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: mac80211: process and save negotiated TID to Link mapping request

An MLD may send TID-to-Link mapping request frame to negotiate
TID to link mapping with a peer MLD.
Support handling negotiated TID-to-Link mapping request frame
by parsing the frame, asking the driver whether it supports the
received mapping or not, and sending a TID-to-Link mapping response
to the AP MLD.
Theoretically, links that became inactive due to the received TID-to-Link
mapping request, can be selected to be activated but this would require
tearing down the negotiated TID-to-Link mapping, which is still not
supported.

Signed-off-by: Ayala Beker <[email protected]>
Reviewed-by: Johannes Berg <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240102213313.0bc1a24fcc9d.Ie72e47dc6f8c77d4a2f0947b775ef6367fe0edac@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: ieee80211: add definitions for negotiated TID to Link map

Add the relevant definitions and structures for TID to Link mapping
negotiation request/response/teardown according to P802.11be_D4.0.

Signed-off-by: Ayala Beker <[email protected]>
Reviewed-by: Gregory Greenman <[email protected]>
Reviewed-by: Johannes Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240102213313.9ef2b866c8c7.Ieaf7dadea9961e0edc55d19c99f0f9fbae591de6@changeid
Signed-off-by: Johannes Berg <[email protected]>

wifi: cfg80211: add RNR with reporting AP information

If the reporting AP is part of the same MLD, then an entry in the RNR is
required in order to discover it again from the BSS generated from the
per-STA profile in the Multi-Link Probe Response.

We need this because we do not have a direct concept of an MLD AP and
just do the lookup from one to the other on the fly if needed. As such,
we need to ensure that this lookup will work both ways.

Fixes: 2481b5da9c6b ("wifi: cfg80211: handle BSS data contained in ML probe responses")
Signed-off-by: Benjamin Berg <[email protected]>
Signed-off-by: Miri Korenblit <[email protected]>
Link: https://msgid.link/20240102213313.4cb3dbb1d84f.I7c74edec83c5d7598cdd578929fd0876d67aef7f@changeid
[roll in off-by-one fix and test updates from Benjamin]
Signed-off-by: Johannes Berg <[email protected]>

wifi: wireless: avoid strlen() in cfg80211_michael_mic_failure()

In 'cfg80211_michael_mic_failure()', avoid extra call to 'strlen()'
by using the value returned by 'sprintf()'. Compile tested only.

Signed-off-by: Dmitry Antipov <[email protected]>
Link: https://msgid.link/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

tsnep: Add link down PHY loopback support

PHY loopback turns off link state change signalling. Therefore, the
loopback only works if the link is already up before the PHY loopback is
activated.

Ensure that PHY loopback works even if the link is not already up during
activation by calling netif_carrier_on() explicitly.

Signed-off-by: Gerhard Engleder <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

gve: Modify rx_buf_alloc_fail counter centrally and closer to failure

Previously, each caller of gve_rx_alloc_buffer had to increase counter
and as a result one caller was not tracking those failure. Increasing
counters at a common location now so callers don't have to duplicate
code or miss counter management.

Signed-off-by: Ankit Garg <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'selftests-updates-to-fcnal-test-for-autoamted-environment'

David Ahern says:

====================
selftests: Updates to fcnal-test for autoamted environment

The first patch updates the PATH for fcnal-test.sh to find the nettest
binary when invoked at the top-level directory via
make -C tools/testing/selftests TARGETS=net run_tests

Second patch fixes a bug setting the ping_group; it has a compound value
and that value is not traversing the various helper functions in tact.
Fix by creating a helper specific to setting it.

Third patch adds more output when a test fails - e.g., to catch a change
in the return code of some test.

With these 3 patches, the entire suite completes successfully when
run on Ubuntu 23.10 with 6.5 kernel - 914 tests pass, 0 fail.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftest: Show expected and actual return codes for test failures in fcnal-test

Capture expected and actual return codes for a test that fails in
the fcnal-test suite.

Signed-off-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftest: Fix set of ping_group_range in fcnal-test

ping_group_range sysctl has a compound value which does not go
through the various function layers in tact. Create a helper
function to bypass the layers and correctly set the value.

Signed-off-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftest: Update PATH for nettest in fcnal-test

Allow fcnal-test.sh to be run from top level directory in the
kernel repo as well as from tools/testing/selftests/net by
setting the PATH to find the in-tree nettest.

Signed-off-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'wireless-next-2024-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.9

The first "new features" pull request for v6.9. We have only driver
changes this time and most of them are for Realtek drivers. Really
nice to see activity in Broadcom drivers again.

Major changes:

rtwl8xxxu
* RTL8188F: concurrent interface support
* Channel Switch Announcement (CSA) support in AP mode

brcmfmac
* per-vendor feature support
* per-vendor SAE password setup

rtlwifi
* speed up USB firmware initialisation

* tag 'wireless-next-2024-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (105 commits)
  wifi: iwlegacy: Use kcalloc() instead of kzalloc()
  wifi: rtw89: fix disabling concurrent mode TX hang issue
  wifi: rtw89: fix HW scan timeout due to TSF sync issue
  wifi: rtw89: add wait/completion for abort scan
  wifi: rtw89: fix null pointer access when abort scan
  wifi: rtw89: disable RTS when broadcast/multicast
  wifi: rtw89: Set default CQM config if not present
  wifi: rtw89: refine hardware scan C2H events
  wifi: rtw89: refine add_chan H2C command to encode_bits
  wifi: rtw89: 8922a: add BTG functions to assist BT coexistence to control TX/RX
  wifi: rtw89: 8922a: add TX power related ops
  wifi: rtw89: 8922a: add register definitions of H2C, C2H, page, RRSR and EDCCA
  wifi: rtw89: 8922a: add chip_ops related to BB init
  wifi: rtw89: 8922a: add chip_ops::{enable,disable}_bb_rf
  wifi: rtw89: add mlo_dbcc_mode for WiFi 7 chips
  wifi: rtlwifi: Speed up firmware loading for USB
  wifi: rtl8xxxu: add missing number of sec cam entries for all variants
  wifi: brcmfmac: allow per-vendor event handling
  wifi: brcmfmac: avoid invalid list operation when vendor attach fails
  wifi: brcmfmac: Demote vendor-specific attach/detach messages to info
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

vsock/test: print type for SOCK_SEQPACKET

SOCK_SEQPACKET is supported for virtio transport, so do not interpret
such type of socket as unknown.

Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'selftests-tc-testing-misc-changes-for-tdc'

Pedro Tammela says:

====================
selftests: tc-testing: misc changes for tdc

Patches 1 and 3 are fixes for tdc that were discovered when running it
using defconfig + tc-testing config and against the latest iproute2.

Patch 2 improves the taprio tests.

Patch 4 enables all tdc tests.

Patch 5 fixes the return code of tdc for when a test fails
setup/teardown.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: tc-testing: return fail if a test fails in setup/teardown

As of today tests throwing exceptions in setup/teardown phase are
treated as skipped but they should really be failures.

Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: tc-testing: enable all tdc tests

For the longest time tdc ran only actions and qdiscs tests.
It's time to enable all the remaining tests so every user visible
piece of TC is tested by the downstream CIs.

Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: tc-testing: adjust fq test to latest iproute2

Adjust the fq verify regex to the latest iproute2

Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: tc-testing: check if 'jq' is available in taprio tests

If 'jq' is not available the taprio tests might enter an infinite loop,
use the "dependsOn" feature from tdc to check if jq is present. If it's
not the test is skipped.

Suggested-by: Davide Caratti <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: tc-testing: add missing netfilter config

On a default config + tc-testing config build, tdc will miss
all the netfilter related tests because it's missing:
CONFIG_NETFILTER=y

Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Reviewed-by: Davide Caratti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

No conflicts or adjacent changes.

Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'net-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bpf, netfilter and WiFi.

  Jakub is doing a lot of work to include the self-tests in our CI, as a
  result a significant amount of self-tests related fixes is flowing in
  (and will likely continue in the next few weeks).

  Current release - regressions:

   - bpf: fix a kernel crash for the riscv 64 JIT

   - bnxt_en: fix memory leak in bnxt_hwrm_get_rings()

   - revert "net: macsec: use skb_ensure_writable_head_tail to expand
     the skb"

  Previous releases - regressions:

   - core: fix removing a namespace with conflicting altnames

   - tc/flower: fix chain template offload memory leak

   - tcp:
      - make sure init the accept_queue's spinlocks once
      - fix autocork on CPUs with weak memory model

   - udp: fix busy polling

   - mlx5e:
      - fix out-of-bound read in port timestamping
      - fix peer flow lists corruption

   - iwlwifi: fix a memory corruption

  Previous releases - always broken:

   - netfilter:
      - nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress
        basechain
      - nft_limit: reject configurations that cause integer overflow

   - bpf: fix bpf_xdp_adjust_tail() with XSK zero-copy mbuf, avoiding a
     NULL pointer dereference upon shrinking

   - llc: make llc_ui_sendmsg() more robust against bonding changes

   - smc: fix illegal rmb_desc access in SMC-D connection dump

   - dpll: fix pin dump crash for rebound module

   - bnxt_en: fix possible crash after creating sw mqprio TCs

   - hv_netvsc: calculate correct ring size when PAGE_SIZE is not 4kB

  Misc:

   - several self-tests fixes for better integration with the netdev CI

   - added several missing modules descriptions"

* tag 'net-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ring
  tsnep: Remove FCS for XDP data path
  net: fec: fix the unhandled context fault from smmu
  selftests: bonding: do not test arp/ns target with mode balance-alb/tlb
  fjes: fix memleaks in fjes_hw_setup
  i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
  i40e: set xdp_rxq_info::frag_size
  xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
  ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
  intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
  ice: remove redundant xdp_rxq_info registration
  i40e: handle multi-buffer packets that are shrunk by xdp prog
  ice: work on pre-XDP prog frag count
  xsk: fix usage of multi-buffer BPF helpers for ZC XDP
  xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
  xsk: recycle buffer in case Rx queue was full
  net: fill in MODULE_DESCRIPTION()s for rvu_mbox
  net: fill in MODULE_DESCRIPTION()s for litex
  net: fill in MODULE_DESCRIPTION()s for fsl_pq_mdio
  net: fill in MODULE_DESCRIPTION()s for fec
  ...

Merge tag 'ovl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs fix from Amir Goldstein:
"Change the on-disk format for the new "xwhiteouts" feature introduced
  in v6.7

  The change reduces unneeded overhead of an extra getxattr per readdir.
  The only user of the "xwhiteout" feature is the external composefs
  tool, which has been updated to support the new on-disk format.

  This change is also designated for 6.7.y"

* tag 'ovl-fixes-6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: mark xwhiteouts directory with overlay.opaque='x'

Merge tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull netfs fixes from Christian Brauner:
"This contains various fixes for the netfs work merged earlier this
  cycle:

  afs:
   - Fix locking imbalance in afs_proc_addr_prefs_show()
   - Remove afs_dynroot_d_revalidate() which is redundant
   - Fix error handling during lookup
   - Hide sillyrenames from userspace. This fixes a race between
     silly-rename files being created/removed and userspace iterating
     over directory entries
   - Don't use unnecessary folio_*() functions

  cifs:
   - Don't use unnecessary folio_*() functions

  cachefiles:
   - erofs: Fix Null dereference when cachefiles are not doing
     ondemand-mode
   - Update mailing list

  netfs library:
   - Add Jeff Layton as reviewer
   - Update mailing list
   - Fix a error checking in netfs_perform_write()
   - fscache: Check error before dereferencing
   - Don't use unnecessary folio_*() functions"

* tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  afs: Fix missing/incorrect unlocking of RCU read lock
  afs: Remove afs_dynroot_d_revalidate() as it is redundant
  afs: Fix error handling with lookup via FS.InlineBulkStatus
  afs: Hide silly-rename files from userspace
  cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode
  netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write()
  netfs, fscache: Prevent Oops in fscache_put_cache()
  cifs: Don't use certain unnecessary folio_*() functions
  afs: Don't use certain unnecessary folio_*() functions
  netfs: Don't use certain unnecessary folio_*() functions
  netfs: Add Jeff Layton as reviewer
  netfs, cachefiles: Change mailing list

Merge tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- Fix in-kernel RPC UDP transport

- Fix NFSv4.0 RELEASE_LOCKOWNER

* tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix RELEASE_LOCKOWNER
SUNRPC: use request size to initialize bio_vec in svc_udp_sendto()

Merge tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux

Pull RCU fix from Neeraj Upadhyay:
"This fixes RCU grace period stalls, which are observed when an
  outgoing CPU's quiescent state reporting results in wakeup of one of
  the grace period kthreads, to complete the grace period.

  If those kthreads have SCHED_FIFO policy, the wake up can indirectly
  arm the RT bandwith timer to the local offline CPU.

  Earlier migration of the hrtimers from the CPU introduced in commit
  5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU
  earlier") results in this timer getting ignored.

  If the RCU grace period kthreads are waiting for RT bandwidth to be
  available, they may never be actually scheduled, resulting in RCU
  stall warnings"

* tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux:
  rcu: Defer RCU kthreads wakeup when CPU is dying

net: dsa: mt7530: support OF-based registration of switch MDIO bus

Currently the MDIO bus of the switches the MT7530 DSA subdriver controls
can only be registered as non-OF-based. Bring support for registering the
bus OF-based.

The subdrivers that control switches [with MDIO bus] probed on OF must
follow this logic to support all cases properly:

No switch MDIO bus defined: Populate ds->user_mii_bus, register the MDIO
bus, set the interrupts for PHYs if "interrupt-controller" is defined at
the switch node. This case should only be covered for the switches which
their dt-bindings documentation didn't document the MDIO bus from the
start. This is to keep supporting the device trees that do not describe the
MDIO bus on the device tree but the MDIO bus is being used nonetheless.

Switch MDIO bus defined: Don't populate ds->user_mii_bus, register the MDIO
bus, set the interrupts for PHYs if ["interrupt-controller" is defined at
the switch node and "interrupts" is defined at the PHY nodes under the
switch MDIO bus node].

Switch MDIO bus defined but explicitly disabled: If the device tree says
status = "disabled" for the MDIO bus, we shouldn't need an MDIO bus at all.
Instead, just exit as early as possible and do not call any MDIO API.

The use of ds->user_mii_bus is inappropriate when the MDIO bus of the
switch is described on the device tree [1], which is why we don't populate
ds->user_mii_bus in that case.

Link: https://lore.kernel.org/netdev/20231213120656.x46fyad6ls7sqyzv@skbuf/
Suggested-by: David Bauer <[email protected]>
Signed-off-by: Arınç ÜNAL <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'tsnep-xdp-fixes'

Gerhard Engleder says:

====================
tsnep: XDP fixes

Found two driver specific problems during XDP and XSK testing.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ring

The fill ring of the XDP socket may contain not enough buffers to
completey fill the RX queue during socket creation. In this case the
flag XDP_RING_NEED_WAKEUP is not set as this flag is only set if the RX
queue is not completely filled during polling.

Set XDP_RING_NEED_WAKEUP flag also if RX queue is not completely filled
during XDP socket creation.

Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support")
Signed-off-by: Gerhard Engleder <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

tsnep: Remove FCS for XDP data path

The RX data buffer includes the FCS. The FCS is already stripped for the
normal data path. But for the XDP data path the FCS is included and
acts like additional/useless data.

Remove the FCS from the RX data buffer also for XDP.

Fixes: 65b28c810035 ("tsnep: Add XDP RX support")
Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support")
Signed-off-by: Gerhard Engleder <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'mlx5-fixes-2024-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2024-01-24

This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.

* tag 'mlx5-fixes-2024-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: fix a potential double-free in fs_any_create_groups
  net/mlx5e: fix a double-free in arfs_create_groups
  net/mlx5e: Ignore IPsec replay window values on sender side
  net/mlx5e: Allow software parsing when IPsec crypto is enabled
  net/mlx5: Use mlx5 device constant for selecting CQ period mode for ASO
  net/mlx5: DR, Can't go to uplink vport on RX rule
  net/mlx5: DR, Use the right GVMI number for drop action
  net/mlx5: Bridge, fix multicast packets sent to uplink
  net/mlx5: Fix a WARN upon a callback command failure
  net/mlx5e: Fix peer flow lists handling
  net/mlx5e: Fix inconsistent hairpin RQT sizes
  net/mlx5e: Fix operation precedence bug in port timestamping napi_poll context
  net/mlx5: Fix query of sd_group field
  net/mlx5e: Use the correct lag ports number when creating TISes
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-01-25

The following pull-request contains BPF updates for your *net* tree.

We've added 12 non-merge commits during the last 2 day(s) which contain
a total of 13 files changed, 190 insertions(+), 91 deletions(-).

The main changes are:

1) Fix bpf_xdp_adjust_tail() in context of XSK zero-copy drivers which
   support XDP multi-buffer. The former triggered a NULL pointer
   dereference upon shrinking, from Maciej Fijalkowski & Tirthendu Sarkar.

2) Fix a bug in riscv64 BPF JIT which emitted a wrong prologue and
   epilogue for struct_ops programs, from Pu Lehui.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
  i40e: set xdp_rxq_info::frag_size
  xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
  ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
  intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
  ice: remove redundant xdp_rxq_info registration
  i40e: handle multi-buffer packets that are shrunk by xdp prog
  ice: work on pre-XDP prog frag count
  xsk: fix usage of multi-buffer BPF helpers for ZC XDP
  xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
  xsk: recycle buffer in case Rx queue was full
  riscv, bpf: Fix unpredictable kernel crash about RV64 struct_ops
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: fec: fix the unhandled context fault from smmu

When repeatedly changing the interface link speed using the command below:

ethtool -s eth0 speed 100 duplex full
ethtool -s eth0 speed 1000 duplex full

The following errors may sometimes be reported by the ARM SMMU driver:

[ 5395.035364] fec 5b040000.ethernet eth0: Link is Down
[ 5395.039255] arm-smmu 51400000.iommu: Unhandled context fault:
fsr=0x402, iova=0x00000000, fsynr=0x100001, cbfrsynra=0x852, cb=2
[ 5398.108460] fec 5b040000.ethernet eth0: Link is Up - 100Mbps/Full -
flow control off

It is identified that the FEC driver does not properly stop the TX queue
during the link speed transitions, and this results in the invalid virtual
I/O address translations from the SMMU and causes the context faults.

Fixes: dbc64a8ea231 ("net: fec: move calls to quiesce/resume packet processing out of fec_restart()")
Signed-off-by: Shenwei Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

selftests: bonding: do not test arp/ns target with mode balance-alb/tlb

The prio_arp/ns tests hard code the mode to active-backup. At the same
time, The balance-alb/tlb modes do not support arp/ns target. So remove
the prio_arp/ns tests from the loop and only test active-backup mode.

Fixes: 481b56e0391e ("selftests: bonding: re-format bond option tests")
Reported-by: Jay Vosburgh <[email protected]>
Closes: https://lore.kernel.org/netdev/17415.1705965957@famine/
Signed-off-by: Hangbin Liu <[email protected]>
Acked-by: Jay Vosburgh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'nf-24-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Update nf_tables kdoc to keep it in sync with the code, from George Guo.

2) Handle NETDEV_UNREGISTER event for inet/ingress basechain.

3) Reject configuration that cause nft_limit to overflow,
   from Florian Westphal.

4) Restrict anonymous set/map names to 16 bytes, from Florian Westphal.

5) Disallow to encode queue number and error in verdicts. This reverts
   a patch which seems to have introduced an early attempt to support for
   nfqueue maps, which is these days supported via nft_queue expression.

6) Sanitize family via .validate for expressions that explicitly refer
   to NF_INET_* hooks.

* tag 'nf-24-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: validate NFPROTO_* family
  netfilter: nf_tables: reject QUEUE/DROP verdict parameters
  netfilter: nf_tables: restrict anonymous set and map names to 16 bytes
  netfilter: nft_limit: reject configurations that cause integer overflow
  netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain
  netfilter: nf_tables: cleanup documentation
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

fjes: fix memleaks in fjes_hw_setup

In fjes_hw_setup, it allocates several memory and delay the deallocation
to the fjes_hw_exit in fjes_probe through the following call chain:

fjes_probe
  |-> fjes_hw_init
        |-> fjes_hw_setup
  |-> fjes_hw_exit

However, when fjes_hw_setup fails, fjes_hw_exit won't be called and thus
all the resources allocated in fjes_hw_setup will be leaked. In this
patch, we free those resources in fjes_hw_setup and prevents such leaks.

Fixes: 2fcbca687702 ("fjes: platform_driver's .probe and .remove routine")
Signed-off-by: Zhipeng Lu <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tipc: node: remove Excess struct member kernel-doc warnings

Remove 2 kernel-doc descriptions to squelch warnings:

node.c:150: warning: Excess struct member 'inputq' description in 'tipc_node'
node.c:150: warning: Excess struct member 'namedq' description in 'tipc_node'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Jon Maloy <[email protected]>
Cc: Ying Xue <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

tipc: socket: remove Excess struct member kernel-doc warning

Remove a kernel-doc description to squelch a warning:

socket.c:143: warning: Excess struct member 'blocking_link' description in 'tipc_sock'

Signed-off-by: Randy Dunlap <[email protected]>
Cc: Jon Maloy <[email protected]>
Cc: Ying Xue <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

vsock/test: add '--peer-port' input argument

Implement port for given CID as input argument instead of using
hardcoded value '1234'. This allows to run different test instances
on a single CID. Port argument is not required parameter and if it is
not set, then default value will be '1234' - thus we preserve previous
behaviour.

Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'ceph-for-6.8-rc2' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
"A fix to avoid triggering an assert in some cases where RBD exclusive
  mappings are involved and a deprecated API cleanup"

* tag 'ceph-for-6.8-rc2' of https://github.com/ceph/ceph-client:
  rbd: don't move requests to the running list on errors
  rbd: remove usage of the deprecated ida_simple_*() API

Merge tag 'integrity-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

Pull integrity fix from Mimi Zohar:
"Revert patch that required user-provided key data, since keys can be
created from kernel-generated random numbers"

* tag 'integrity-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
Revert "KEYS: encrypted: Add check for strsep"

Merge branch 'net-bpf_xdp_adjust_tail-and-intel-mbuf-fixes'

Maciej Fijalkowski says:

====================
net: bpf_xdp_adjust_tail() and Intel mbuf fixes

Hey,

after a break followed by dealing with sickness, here is a v6 that makes
bpf_xdp_adjust_tail() actually usable for ZC drivers that support XDP
multi-buffer. Since v4 I tried also using bpf_xdp_adjust_tail() with
positive offset which exposed yet another issues, which can be observed
by increased commit count when compared to v3.

John, in the end I think we should remove handling
MEM_TYPE_XSK_BUFF_POOL from __xdp_return(), but it is out of the scope
for fixes set, IMHO.

Thanks,
Maciej

v6:
- add acks [Magnus]
- fix spelling mistakes [Magnus]
- avoid touching xdp_buff in xp_alloc_{reused,new_from_fq}() [Magnus]
- s/shrink_data/bpf_xdp_shrink_data [Jakub]
- remove __shrink_data() [Jakub]
- check retvals from __xdp_rxq_info_reg() [Magnus]

v5:
- pick correct version of patch 5 [Simon]
- elaborate a bit more on what patch 2 fixes

v4:
- do not clear frags flag when deleting tail; xsk_buff_pool now does
that
- skip some NULL tests for xsk_buff_get_tail [Martin, John]
- address problems around registering xdp_rxq_info
- fix bpf_xdp_frags_increase_tail() for ZC mbuf

v3:
- add acks
- s/xsk_buff_tail_del/xsk_buff_del_tail
- address i40e as well (thanks Tirthendu)

v2:
- fix !CONFIG_XDP_SOCKETS builds
- add reviewed-by tag to patch 3
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue

Now that i40e driver correctly sets up frag_size in xdp_rxq_info, let us
make it work for ZC multi-buffer as well. i40e_ring::rx_buf_len for ZC
is being set via xsk_pool_get_rx_frame_size() and this needs to be
propagated up to xdp_rxq_info.

Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
Acked-by: Magnus Karlsson <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

i40e: set xdp_rxq_info::frag_size

i40e support XDP multi-buffer so it is supposed to use
__xdp_rxq_info_reg() instead of xdp_rxq_info_reg() and set the
frag_size. It can not be simply converted at existing callsite because
rx_buf_len could be un-initialized, so let us register xdp_rxq_info
within i40e_configure_rx_ring(), which happen to be called with already
initialized rx_buf_len value.

Commit 5180ff1364bc ("i40e: use int for i40e_status") converted 'err' to
int, so two variables to deal with return codes are not needed within
i40e_configure_rx_ring(). Remove 'ret' and use 'err' to handle status
from xdp_rxq_info registration.

Fixes: e213ced19bef ("i40e: add support for XDP multi-buffer Rx")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL

XSK ZC Rx path calculates the size of data that will be posted to XSK Rx
queue via subtracting xdp_buff::data_end from xdp_buff::data.

In bpf_xdp_frags_increase_tail(), when underlying memory type of
xdp_rxq_info is MEM_TYPE_XSK_BUFF_POOL, add offset to data_end in tail
fragment, so that later on user space will be able to take into account
the amount of bytes added by XDP program.

Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue

Now that ice driver correctly sets up frag_size in xdp_rxq_info, let us
make it work for ZC multi-buffer as well. ice_rx_ring::rx_buf_len for ZC
is being set via xsk_pool_get_rx_frame_size() and this needs to be
propagated up to xdp_rxq_info.

Use a bigger hammer and instead of unregistering only xdp_rxq_info's
memory model, unregister it altogether and register it again and have
xdp_rxq_info with correct frag_size value.

Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers

Ice and i40e ZC drivers currently set offset of a frag within
skb_shared_info to 0, which is incorrect. xdp_buffs that come from
xsk_buff_pool always have 256 bytes of a headroom, so they need to be
taken into account to retrieve xdp_buff::data via skb_frag_address().
Otherwise, bpf_xdp_frags_increase_tail() would be starting its job from
xdp_buff::data_hard_start which would result in overwriting existing
payload.

Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support")
Acked-by: Magnus Karlsson <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

ice: remove redundant xdp_rxq_info registration

xdp_rxq_info struct can be registered by drivers via two functions -
xdp_rxq_info_reg() and __xdp_rxq_info_reg(). The latter one allows
drivers that support XDP multi-buffer to set up xdp_rxq_info::frag_size
which in turn will make it possible to grow the packet via
bpf_xdp_adjust_tail() BPF helper.

Currently, ice registers xdp_rxq_info in two spots:
1) ice_setup_rx_ring() // via xdp_rxq_info_reg(), BUG
2) ice_vsi_cfg_rxq() // via __xdp_rxq_info_reg(), OK

Cited commit under fixes tag took care of setting up frag_size and
updated registration scheme in 2) but it did not help as
1) is called before 2) and as shown above it uses old registration
function. This means that 2) sees that xdp_rxq_info is already
registered and never calls __xdp_rxq_info_reg() which leaves us with
xdp_rxq_info::frag_size being set to 0.

To fix this misbehavior, simply remove xdp_rxq_info_reg() call from
ice_setup_rx_ring().

Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Acked-by: Magnus Karlsson <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

i40e: handle multi-buffer packets that are shrunk by xdp prog

XDP programs can shrink packets by calling the bpf_xdp_adjust_tail()
helper function. For multi-buffer packets this may lead to reduction of
frag count stored in skb_shared_info area of the xdp_buff struct. This
results in issues with the current handling of XDP_PASS and XDP_DROP
cases.

For XDP_PASS, currently skb is being built using frag count of
xdp_buffer before it was processed by XDP prog and thus will result in
an inconsistent skb when frag count gets reduced by XDP prog. To fix
this, get correct frag count while building the skb instead of using
pre-obtained frag count.

For XDP_DROP, current page recycling logic will not reuse the page but
instead will adjust the pagecnt_bias so that the page can be freed. This
again results in inconsistent behavior as the page refcnt has already
been changed by the helper while freeing the frag(s) as part of
shrinking the packet. To fix this, only adjust pagecnt_bias for buffers
that are stillpart of the packet post-xdp prog run.

Fixes: e213ced19bef ("i40e: add support for XDP multi-buffer Rx")
Reported-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Tirthendu Sarkar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

ice: work on pre-XDP prog frag count

Fix an OOM panic in XDP_DRV mode when a XDP program shrinks a
multi-buffer packet by 4k bytes and then redirects it to an AF_XDP
socket.

Since support for handling multi-buffer frames was added to XDP, usage
of bpf_xdp_adjust_tail() helper within XDP program can free the page
that given fragment occupies and in turn decrease the fragment count
within skb_shared_info that is embedded in xdp_buff struct. In current
ice driver codebase, it can become problematic when page recycling logic
decides not to reuse the page. In such case, __page_frag_cache_drain()
is used with ice_rx_buf::pagecnt_bias that was not adjusted after
refcount of page was changed by XDP prog which in turn does not drain
the refcount to 0 and page is never freed.

To address this, let us store the count of frags before the XDP program
was executed on Rx ring struct. This will be used to compare with
current frag count from skb_shared_info embedded in xdp_buff. A smaller
value in the latter indicates that XDP prog freed frag(s). Then, for
given delta decrement pagecnt_bias for XDP_DROP verdict.

While at it, let us also handle the EOP frag within
ice_set_rx_bufs_act() to make our life easier, so all of the adjustments
needed to be applied against freed frags are performed in the single
place.

Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Acked-by: Magnus Karlsson <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

xsk: fix usage of multi-buffer BPF helpers for ZC XDP

Currently when packet is shrunk via bpf_xdp_adjust_tail() and memory
type is set to MEM_TYPE_XSK_BUFF_POOL, null ptr dereference happens:

[1136314.192256] BUG: kernel NULL pointer dereference, address:
0000000000000034
[1136314.203943] #PF: supervisor read access in kernel mode
[1136314.213768] #PF: error_code(0x0000) - not-present page
[1136314.223550] PGD 0 P4D 0
[1136314.230684] Oops: 0000 [#1] PREEMPT SMP NOPTI
[1136314.239621] CPU: 8 PID: 54203 Comm: xdpsock Not tainted 6.6.0+ #257
[1136314.250469] Hardware name: Intel Corporation S2600WFT/S2600WFT,
BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[1136314.265615] RIP: 0010:__xdp_return+0x6c/0x210
[1136314.274653] Code: ad 00 48 8b 47 08 49 89 f8 a8 01 0f 85 9b 01 00 00 0f 1f 44 00 00 f0 41 ff 48 34 75 32 4c 89 c7 e9 79 cd 80 ff 83 fe 03 75 17 <f6> 41 34 01 0f 85 02 01 00 00 48 89 cf e9 22 cc 1e 00 e9 3d d2 86
[1136314.302907] RSP: 0018:ffffc900089f8db0 EFLAGS: 00010246
[1136314.312967] RAX: ffffc9003168aed0 RBX: ffff8881c3300000 RCX:
0000000000000000
[1136314.324953] RDX: 0000000000000000 RSI: 0000000000000003 RDI:
ffffc9003168c000
[1136314.336929] RBP: 0000000000000ae0 R08: 0000000000000002 R09:
0000000000010000
[1136314.348844] R10: ffffc9000e495000 R11: 0000000000000040 R12:
0000000000000001
[1136314.360706] R13: 0000000000000524 R14: ffffc9003168aec0 R15:
0000000000000001
[1136314.373298] FS:  00007f8df8bbcb80(0000) GS:ffff8897e0e00000(0000)
knlGS:0000000000000000
[1136314.386105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1136314.396532] CR2: 0000000000000034 CR3: 00000001aa912002 CR4:
00000000007706f0
[1136314.408377] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[1136314.420173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[1136314.431890] PKRU: 55555554
[1136314.439143] Call Trace:
[1136314.446058]  <IRQ>
[1136314.452465]  ? __die+0x20/0x70
[1136314.459881]  ? page_fault_oops+0x15b/0x440
[1136314.468305]  ? exc_page_fault+0x6a/0x150
[1136314.476491]  ? asm_exc_page_fault+0x22/0x30
[1136314.484927]  ? __xdp_return+0x6c/0x210
[1136314.492863]  bpf_xdp_adjust_tail+0x155/0x1d0
[1136314.501269]  bpf_prog_ccc47ae29d3b6570_xdp_sock_prog+0x15/0x60
[1136314.511263]  ice_clean_rx_irq_zc+0x206/0xc60 [ice]
[1136314.520222]  ? ice_xmit_zc+0x6e/0x150 [ice]
[1136314.528506]  ice_napi_poll+0x467/0x670 [ice]
[1136314.536858]  ? ttwu_do_activate.constprop.0+0x8f/0x1a0
[1136314.546010]  __napi_poll+0x29/0x1b0
[1136314.553462]  net_rx_action+0x133/0x270
[1136314.561619]  __do_softirq+0xbe/0x28e
[1136314.569303]  do_softirq+0x3f/0x60

This comes from __xdp_return() call with xdp_buff argument passed as
NULL which is supposed to be consumed by xsk_buff_free() call.

To address this properly, in ZC case, a node that represents the frag
being removed has to be pulled out of xskb_list. Introduce
appropriate xsk helpers to do such node operation and use them
accordingly within bpf_xdp_adjust_tail().

Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Acked-by: Magnus Karlsson <[email protected]> # For the xsk header part
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags

XDP multi-buffer support introduced XDP_FLAGS_HAS_FRAGS flag that is
used by drivers to notify data path whether xdp_buff contains fragments
or not. Data path looks up mentioned flag on first buffer that occupies
the linear part of xdp_buff, so drivers only modify it there. This is
sufficient for SKB and XDP_DRV modes as usually xdp_buff is allocated on
stack or it resides within struct representing driver's queue and
fragments are carried via skb_frag_t structs. IOW, we are dealing with
only one xdp_buff.

ZC mode though relies on list of xdp_buff structs that is carried via
xsk_buff_pool::xskb_list, so ZC data path has to make sure that
fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise,
xsk_buff_free() could misbehave if it would be executed against xdp_buff
that carries a frag with XDP_FLAGS_HAS_FRAGS flag set. Such scenario can
take place when within supplied XDP program bpf_xdp_adjust_tail() is
used with negative offset that would in turn release the tail fragment
from multi-buffer frame.

Calling xsk_buff_free() on tail fragment with XDP_FLAGS_HAS_FRAGS would
result in releasing all the nodes from xskb_list that were produced by
driver before XDP program execution, which is not what is intended -
only tail fragment should be deleted from xskb_list and then it should
be put onto xsk_buff_pool::free_list. Such multi-buffer frame will never
make it up to user space, so from AF_XDP application POV there would be
no traffic running, however due to free_list getting constantly new
nodes, driver will be able to feed HW Rx queue with recycled buffers.
Bottom line is that instead of traffic being redirected to user space,
it would be continuously dropped.

To fix this, let us clear the mentioned flag on xsk_buff_pool side
during xdp_buff initialization, which is what should have been done
right from the start of XSK multi-buffer support.

Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support")
Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

xsk: recycle buffer in case Rx queue was full

Add missing xsk_buff_free() call when __xsk_rcv_zc() failed to produce
descriptor to XSK Rx queue.

Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Acked-by: Magnus Karlsson <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

Merge branch 'fix-module_description-for-net-p2'

Breno Leitao says:

====================
Fix MODULE_DESCRIPTION() for net (p2)

There are hundreds of network modules that misses MODULE_DESCRIPTION(),
causing a warnning when compiling with W=1. Example:

        WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com90io.o
        WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/arc-rimi.o
        WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com20020.o

This part2 of the patchset focus on the drivers/net/ethernet drivers.
There are still some missing warnings in drivers/net/ethernet that will
be fixed in an upcoming patchset.

v1: https://lore.kernel.org/all/20240122184543.2501493 [email protected]/
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fill in MODULE_DESCRIPTION()s for rvu_mbox

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Marvel RVU mbox driver.

Signed-off-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fill in MODULE_DESCRIPTION()s for litex

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the LiteX Liteeth Ethernet device.

Signed-off-by: Breno Leitao <[email protected]>
Acked-by: Gabriel Somlo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fill in MODULE_DESCRIPTION()s for fsl_pq_mdio

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Freescale PQ MDIO driver.

Signed-off-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fill in MODULE_DESCRIPTION()s for fec

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the FEC (MPC8xx) Ethernet controller.

Signed-off-by: Breno Leitao <[email protected]>
Reviewed-by: Wei Fang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fill in MODULE_DESCRIPTION()s for enetc

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the NXP ENETC Ethernet driver.

Signed-off-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fill in MODULE_DESCRIPTION()s for nps_enet

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the EZchip NPS ethernet driver.

Signed-off-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fill in MODULE_DESCRIPTION()s for ep93xxx_eth

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Cirrus EP93xx ethernet driver.

Signed-off-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fill in MODULE_DESCRIPTION()s for liquidio

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Cavium Liquidio.

Signed-off-by: Breno Leitao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fill in MODULE_DESCRIPTION()s for Broadcom bgmac

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Broadcom iProc GBit driver.

Signed-off-by: Breno Leitao <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: fill in MODULE_DESCRIPTION()s for 8390

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to all the good old 8390 modules and drivers.

Signed-off-by: Breno Leitao <[email protected]>
CC: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: netdevsim: fix the udp_tunnel_nic test

This test is missing a whole bunch of checks for interface
renaming and one ifup. Presumably it was only used on a system
with renaming disabled and NetworkManager running.

Fixes: 91f430b2c49d ("selftests: net: add a test for UDP tunnel info infra")
Acked-by: Paolo Abeni <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: net: fix rps_default_mask with >32 CPUs

If there is more than 32 cpus the bitmask will start to contain
commas, leading to:

./rps_default_mask.sh: line 36: [: 00000000,00000000: integer expression expected

Remove the commas, bash doesn't interpret leading zeroes as oct
so that should be good enough. Switch to bash, Simon reports that
not all shells support this type of substitution.

Fixes: c12e0d5f267d ("self-tests: introduce self-tests for RPS default mask")
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'execve-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve fixes from Kees Cook:

- Fix error handling in begin_new_exec() (Bernd Edlinger)

- MAINTAINERS: specifically mention ELF (Alexey Dobriyan)

- Various cleanups related to earlier open() (Askar Safin, Kees Cook)

* tag 'execve-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  exec: Distinguish in_execve from in_exec
  exec: Fix error handling in begin_new_exec()
  exec: Add do_close_execat() helper
  exec: remove useless comment
  ELF, MAINTAINERS: specifically mention ELF

uselib: remove use of __FMODE_EXEC

Jann Horn points out that uselib() really shouldn't trigger the new
FMODE_EXEC logic introduced by commit 4759ff71f23e ("exec: __FMODE_EXEC
instead of in_execve for LSMs").

In fact, it shouldn't even have ever triggered the old pre-existing
logic for __FMODE_EXEC (like the NFS code that makes executables not
need read permissions). Unlike a real execve(), that can work even with
files that are purely executable by the user (not readable), uselib()
has that MAY_READ requirement becasue it's really just a convenience
wrapper around mmap() for legacy shared libraries.

The whole FMODE_EXEC bit was originally introduced by commit
b500531e6f5f ("[PATCH] Introduce FMODE_EXEC file flag"), primarily to
give ETXTBUSY error returns for distributed filesystems.

It has since grown a few other warts (like that NFS thing), but there
really isn't any reason to use it for uselib(), and now that we are
trying to use it to replace the horrid 'tsk->in_execve' flag, it's
actively wrong.

Of course, as Jann Horn also points out, nobody should be enabling
CONFIG_USELIB in the first place in this day and age, but that's a
different discussion entirely.

Reported-by: Jann Horn <[email protected]>
Fixes: 4759ff71f23e ("exec: __FMODE_EXEC instead of in_execve for LSMs")
Cc: Kees Cook <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Revert "KEYS: encrypted: Add check for strsep"

This reverts commit b4af096b5df5dd131ab796c79cedc7069d8f4882.

New encrypted keys are created either from kernel-generated random
numbers or user-provided decrypted data. Revert the change requiring
user-provided decrypted data.

Reported-by: Vishal Verma <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>

net: mvpp2: clear BM pool before initialization

Register value persist after booting the kernel using
kexec which results in kernel panic. Thus clear the
BM pool registers before initialisation to fix the issue.

Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Jenishkumar Maheshbhai Patel <[email protected]>
Reviewed-by: Maxime Chevallier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: stmmac: Wait a bit for the reset to take effect

otherwise the synopsys_id value may be read out wrong,
because the GMAC_VERSION register might still be in reset
state, for at least 1 us after the reset is de-asserted.

Add a wait for 10 us before continuing to be on the safe side.

> From what have you got that delay value?

Just try and error, with very old linux versions and old gcc versions
the synopsys_id was read out correctly most of the time (but not always),
with recent linux versions and recnet gcc versions it was read out
wrongly most of the time, but again not always.
I don't have access to the VHDL code in question, so I cannot
tell why it takes so long to get the correct values, I also do not
have more than a few hardware samples, so I cannot tell how long
this timeout must be in worst case.
Experimentally I can tell that the register is read several times
as zero immediately after the reset is de-asserted, also adding several
no-ops is not enough, adding a printk is enough, also udelay(1) seems to
be enough but I tried that not very often, and I have not access to many
hardware samples to be 100% sure about the necessary delay.
And since the udelay here is only executed once per device instance,
it seems acceptable to delay the boot for 10 us.

BTW: my hardware's synopsys id is 0x37.

Fixes: c5e4ddbdfa11 ("net: stmmac: Add support for optional reset control")
Signed-off-by: Bernd Edlinger <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: Serge Semin <[email protected]>
Link: https://lore.kernel.org/r/AS8P193MB1285A810BD78C111E7F6AA34E4752@AS8P193MB1285.EURP193.PROD.OUTLOOK.COM
Signed-off-by: Jakub Kicinski <[email protected]>

samples/cgroup: add .gitignore file for generated samples

Make 'git status' quietly happy again after a full allmodconfig build.

Fixes: 60433a9d038d ("samples: introduce new samples subdir for cgroup")
Signed-off-by: Linus Torvalds <[email protected]>

exec: Distinguish in_execve from in_exec

Just to help distinguish the fs->in_exec flag from the current->in_execve
flag, add comments in check_unsafe_exec() and copy_fs() for more
context. Also note that in_execve is only used by TOMOYO now.

Cc: Kentaro Takeda <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>

exec: Check __FMODE_EXEC instead of in_execve for LSMs

After commit 978ffcbf00d8 ("execve: open the executable file before
doing anything else"), current->in_execve was no longer in sync with the
open(). This broke AppArmor and TOMOYO which depend on this flag to
distinguish "open" operations from being "exec" operations.

Instead of moving around in_execve, switch to using __FMODE_EXEC, which
is where the "is this an exec?" intent is stored. Note that TOMOYO still
uses in_execve around cred handling.

Reported-by: Kevin Locke <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]
Suggested-by: Linus Torvalds <[email protected]>
Fixes: 978ffcbf00d8 ("execve: open the executable file before doing anything else")
Cc: Josh Triplett <[email protected]>
Cc: John Johansen <[email protected]>
Cc: Paul Moore <[email protected]>
Cc: James Morris <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Cc: Kentaro Takeda <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

netfilter: nf_tables: validate NFPROTO_* family

Several expressions explicitly refer to NF_INET_* hook definitions
from expr->ops->validate, however, family is not validated.

Bail out with EOPNOTSUPP in case they are used from unsupported
families.

Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression")
Fixes: 2fa841938c64 ("netfilter: nf_tables: introduce routing expression")
Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching")
Fixes: ad49d86e07a4 ("netfilter: nf_tables: Add synproxy support")
Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support")
Fixes: 6c47260250fc ("netfilter: nf_tables: add xfrm expression")
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_tables: reject QUEUE/DROP verdict parameters

This reverts commit e0abdadcc6e1.

core.c:nf_hook_slow assumes that the upper 16 bits of NF_DROP
verdicts contain a valid errno, i.e. -EPERM, -EHOSTUNREACH or similar,
or 0.

Due to the reverted commit, its possible to provide a positive
value, e.g. NF_ACCEPT (1), which results in use-after-free.

Its not clear to me why this commit was made.

NF_QUEUE is not used by nftables; "queue" rules in nftables
will result in use of "nft_queue" expression.

If we later need to allow specifiying errno values from userspace
(do not know why), this has to call NF_DROP_GETERR and check that
"err <= 0" holds true.

Fixes: e0abdadcc6e1 ("netfilter: nf_tables: accept QUEUE/DROP verdict parameters")
Cc: [email protected]
Reported-by: Notselwyn <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_tables: restrict anonymous set and map names to 16 bytes

nftables has two types of sets/maps, one where userspace defines the
name, and anonymous sets/maps, where userspace defines a template name.

For the latter, kernel requires presence of exactly one "%d".
nftables uses "__set%d" and "__map%d" for this. The kernel will
expand the format specifier and replaces it with the smallest unused
number.

As-is, userspace could define a template name that allows to move
the set name past the 256 bytes upperlimit (post-expansion).

I don't see how this could be a problem, but I would prefer if userspace
cannot do this, so add a limit of 16 bytes for the '%d' template name.

16 bytes is the old total upper limit for set names that existed when
nf_tables was merged initially.

Fixes: 387454901bd6 ("netfilter: nf_tables: Allow set names of up to 255 chars")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nft_limit: reject configurations that cause integer overflow

Reject bogus configs where internal token counter wraps around.
This only occurs with very very large requests, such as 17gbyte/s.

Its better to reject this rather than having incorrect ratelimit.

Fixes: d2168e849ebf ("netfilter: nft_limit: add per-byte limiting")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain

Remove netdevice from inet/ingress basechain in case NETDEV_UNREGISTER
event is reported, otherwise a stale reference to netdevice remains in
the hook list.

Fixes: 60a3815da702 ("netfilter: add inet ingress support")
Cc: [email protected]
Signed-off-by: Pablo Neira Ayuso <[email protected]>