We've added 9 non-merge commits during the last 4 day(s) which contain
a total of 3 files changed, 226 insertions(+), 84 deletions(-).
The main changes are:
1) Fixes to bpf_msg_push/pop_data and test_sockmap. The changes depend
on the other changes in the bpf-next/net branch,
from Zijian Zhang.
2) Drop netns code from the mptcp test and reuse the common helpers in
test_progs, from Geliang Tang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
bpf, sockmap: Fix sk_msg_reset_curr
bpf, sockmap: Several fixes to bpf_msg_pop_data
bpf, sockmap: Several fixes to bpf_msg_push_data
selftests/bpf: Add more tests for test_txmsg_push_pop in test_sockmap
selftests/bpf: Add push/pop checking for msg_verify_data in test_sockmap
selftests/bpf: Fix total_bytes in msg_loop_rx in test_sockmap
selftests/bpf: Fix SENDPAGE data logic in test_sockmap
selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap
selftests/bpf: Drop netns helpers in mptcp
====================
====================
ipv4: Prepare bpf helpers to .flowi4_tos conversion.
Continue the process of making a dscp_t variable available when setting
.flowi4_tos. This series focuses on the BPF helpers that initialise a
struct flowi4 manually.
The objective is to eventually convert .flowi4_tos to dscp_t (to get
type annotation and prevent ECN bits from interfering with DSCP).
====================
Revert patch 1bf70e6c3a53 which modified check_ntf() and instead add a
new poll_ntf() with async notification semantics. See patch 2 for a
detailed description.
====================
Donald Hunter [Wed, 13 Nov 2024 09:08:43 +0000 (09:08 +0000)]
tools/net/ynl: add async notification handling
The notification handling in ynl is currently very simple, using sleep()
to wait a period of time and then handling all the buffered messages in
a single batch.
This patch adds async notification handling so that messages can be
processed as they are received. This makes it possible to use ynl as a
library that supplies notifications in a timely manner.
- Add poll_ntf() to be a generator that yields 1 notification at a
time and blocks until a notification is available.
- Add a --duration parameter to the CLI, with --sleep as an alias.
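The generator behaviour described in the list above can be sketched as follows. This is a minimal illustration, not the ynl implementation: the `queue.Queue` stands in for the netlink socket, and treating `duration` as a bound on the total wait is an assumption made for the example.

```python
import queue
import time
from typing import Iterator, Optional


def poll_ntf(source: "queue.Queue", duration: Optional[float] = None) -> Iterator[dict]:
    """Yield notifications one at a time, blocking until one is available.

    `source` is a hypothetical stand-in for the netlink socket.
    `duration` (seconds) bounds the total wait; None means wait forever.
    """
    deadline = None if duration is None else time.monotonic() + duration
    while True:
        if deadline is None:
            timeout = None  # block indefinitely
        else:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                return  # duration elapsed
        try:
            # Blocks until a message arrives or the remaining time runs out.
            yield source.get(timeout=timeout)
        except queue.Empty:
            return


if __name__ == "__main__":
    q = queue.Queue()
    for seq in range(3):
        q.put({"name": "demo-ntf", "seq": seq})
    for ntf in poll_ntf(q, duration=0.2):
        print(ntf)
```

Because the consumer pulls one message per iteration, each notification can be handled as soon as it arrives rather than after a fixed sleep.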
This modification to check_ntf() is being reverted so that its behaviour
remains equivalent to ynl_ntf_check() in the C YNL. Instead a new
poll_ntf() will be added in a separate patch.
====================
net: phy: switch eee_broken_modes to linkmode bitmap and add accessor
eee_broken_modes currently has an eee_cap1 register layout. This doesn't
allow flagging e.g. 2.5Gbps or 5Gbps BaseT EEE as broken. To overcome
this limitation, switch eee_broken_modes to a linkmode bitmap.
Add an accessor for the bitmap and use it in r8169.
====================
Vendor driver r8125 doesn't advertise 2.5G EEE on RTL8125A, and r8126
doesn't advertise 5G EEE. Likely there are compatibility issues,
therefore do the same in r8169.
With this change we don't have to disable 2.5G EEE advertisement in
rtl8125a_config_eee_phy() any longer.
We use new phylib accessor phy_set_eee_broken() to mark the respective
EEE modes as broken.
Heiner Kallweit [Fri, 8 Nov 2024 06:54:47 +0000 (07:54 +0100)]
net: phy: convert eee_broken_modes to a linkmode bitmap
eee_broken_modes currently has an eee_cap1 register layout. This doesn't
allow flagging e.g. 2.5Gbps or 5Gbps BaseT EEE as broken. To overcome
this limitation, switch eee_broken_modes to a linkmode bitmap.
Linus Torvalds [Thu, 14 Nov 2024 18:00:23 +0000 (10:00 -0800)]
Merge tag 'bcachefs-2024-11-13' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"This fixes one minor regression from the btree cache fixes (in the
scan_for_btree_nodes repair path) - and the shutdown path fix is the
big one here, in terms of bugs closed:
The shutdown path wasn't flushing the btree write buffer, leading
to shutting down while we still had operations in flight. This
fixes a whole slew of syzbot bugs, and undoubtedly other strange
heisenbugs.
* tag 'bcachefs-2024-11-13' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix assertion pop in bch2_ptr_swab()
bcachefs: Fix journal_entry_dev_usage_to_text() overrun
bcachefs: Allow for unknown key types in backpointers fsck
bcachefs: Fix assertion pop in topology repair
bcachefs: Fix hidden btree errors when reading roots
bcachefs: Fix validate_bset() repair path
bcachefs: Fix missing validation for bch_backpointer.level
bcachefs: Fix bch_member.btree_bitmap_shift validation
bcachefs: bch2_btree_write_buffer_flush_going_ro()
Mohsin Bashir [Tue, 12 Nov 2024 22:26:05 +0000 (14:26 -0800)]
eth: fbnic: Add support to dump registers
Add support for the 'ethtool -d <dev>' command to retrieve and print
a register dump for fbnic. The dump defaults to version 1 and consists
of two parts: all the register sections that can be dumped linearly, and
an RPC RAM section that is structured in an interleaved fashion and
requires special handling. For each register section, the dump also
contains the start and end boundary information which can simplify parsing.
net: sched: u32: Add test case for systematic hnode IDR leaks
Add a tdc test case to exercise the just-fixed systematic leak of
IDR entries in u32 hnode disposal. Given the IDR in question is
confined to the range [1..0x7FF], it is sufficient to create/delete
the same filter 2048 times to fill it up and get a nonzero exit
status from "tc filter add".
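The arithmetic behind the 2048 iterations can be checked with a toy model. The allocator below is a hypothetical stand-in for the kernel IDR, not its implementation; it only shows that a range of [1..0x7FF] holds 2047 IDs, so a leak of one ID per create/delete cycle makes the 2048th creation fail.

```python
class LeakyIdAllocator:
    """Toy ID allocator over [lo, hi] that can 'forget' to release IDs."""

    def __init__(self, lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi
        self.used = set()

    def alloc(self):
        """Return the lowest free ID, or None when the range is exhausted."""
        for ident in range(self.lo, self.hi + 1):
            if ident not in self.used:
                self.used.add(ident)
                return ident
        return None

    def free(self, ident: int, leak: bool = False) -> None:
        """Release an ID; with leak=True the ID stays allocated (the bug)."""
        if not leak:
            self.used.discard(ident)


def first_failing_cycle(leak: bool, cycles: int = 2048):
    """Create/delete `cycles` times; return the 1-based cycle that fails."""
    ids = LeakyIdAllocator(1, 0x7FF)
    for n in range(1, cycles + 1):
        ident = ids.alloc()
        if ident is None:
            return n
        ids.free(ident, leak=leak)
    return None


if __name__ == "__main__":
    print(first_failing_cycle(leak=True))
    print(first_failing_cycle(leak=False))
```

With the leak present, the run stops at the cycle where allocation fails; without it, all cycles complete and the function returns None.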
====================
bonding: fix ns targets not work on hardware NIC
The first patch fixes ns targets not working on hardware NICs when bonding
sets arp_validate.
The second patch adds a related selftest for bonding.
v4: Thanks Nikolay for the comments:
use bond_slave_ns_maddrs_{add/del} with clear name
fix comments typos
remove _slave_set_ns_maddrs underscore directly
update bond_option_arp_validate_set() change logic
v3: use ndisc_mc_map to convert the mcast mac address (Jay Vosburgh)
v2: only add/del mcast group on backup slaves when arp_validate is set (Jay Vosburgh)
arp_validate doesn't support 3ad, tlb, alb. So let's only do it on ab mode.
====================
Hangbin Liu [Mon, 11 Nov 2024 10:16:49 +0000 (10:16 +0000)]
bonding: add ns target multicast address to slave device
Commit 4598380f9c54 ("bonding: fix ns validation on backup slaves")
tried to resolve the issue where backup slaves couldn't be brought up when
receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only
worked for drivers that receive all multicast messages, such as the veth
interface.
For standard drivers, the NS multicast message is silently dropped because
the slave device is not a member of the NS target multicast group.
To address this, we need to make the slave device join the NS target
multicast group, ensuring it can receive these IPv6 NS messages to validate
the slave’s status properly.
There are three policies before joining the multicast group:
1. All settings must be under active-backup mode (alb and tlb do not support
arp_validate), with backup slaves and slaves supporting multicast.
2. We can add or remove multicast groups when arp_validate changes.
3. Other operations, such as enslaving, releasing, or setting NS targets,
need to be guarded by arp_validate.
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Meghana Malladi [Mon, 11 Nov 2024 09:58:42 +0000 (15:28 +0530)]
net: ti: icssg-prueth: Fix 1 PPS sync
The first PPS latch time needs to be calculated by the driver
(in rounded off seconds) and configured as the start time
offset for the cycle. After synchronizing two PTP clocks
running as master/slave, missing this would cause master
and slave to start immediately with some milliseconds
drift which causes the PPS signal to never synchronize with
the PTP master.
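The idea of a start time "in rounded off seconds" can be illustrated with a small calculation. This is only a sketch: rounding up to the next whole second is an assumption made for the example, and the function name is hypothetical, not taken from the driver.

```python
NSEC_PER_SEC = 1_000_000_000


def first_pps_latch_ns(now_ns: int) -> int:
    """Return the next whole-second boundary strictly after `now_ns`.

    Assumption for illustration: the start time is rounded up to the next
    full second of the PTP clock.
    """
    return (now_ns // NSEC_PER_SEC + 1) * NSEC_PER_SEC


if __name__ == "__main__":
    # Two synchronized clocks sampled a few milliseconds apart still
    # agree on the same whole-second start time.
    print(first_pps_latch_ns(5_000_300_000))
    print(first_pps_latch_ns(5_004_100_000))
```

Starting the cycle on a second boundary means that the moment at which each side happens to enable PPS no longer shifts the pulse edge.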
Tristram Ha [Sat, 9 Nov 2024 01:57:05 +0000 (17:57 -0800)]
net: dsa: microchip: Add LAN9646 switch support to KSZ DSA driver
The LAN9646 switch is a 6-port switch with functions like KSZ9897. It has
4 internal PHYs and 1 SGMII port. The chip id read from hardware is the
same as KSZ9477, so the software driver needs to create a new chip id and
group allowable functions under its chip data structure to
differentiate the product.
net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol
The mediatek,mac-wol property is being handled backwards to what is
described in the binding: it currently enables PHY WOL when the property
is present and vice versa. Invert the driver logic so it matches the
binding description.
Breno Leitao [Fri, 8 Nov 2024 14:08:36 +0000 (06:08 -0800)]
ipmr: Fix access to mfc_cache_list without lock held
Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the
following code flow, the RCU read lock is not held, causing the
following error when `RCU_PROVE` is enabled. The same problem might
show up in the IPv6 code path.
6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G E N
-----------------------------
net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!
This is not a problem per se, since the RTNL lock is held here, so it
is safe to iterate over the list without the RCU read lock, as suggested
by Eric.
To alleviate the concern, modify the code to use
list_for_each_entry_rcu() with the RTNL-held argument.
The annotation will raise an error only if neither the RTNL nor the RCU
read lock is held during iteration, signaling a legitimate problem;
otherwise it will avoid this false positive.
This will solve the IPv6 case as well, since ip6mr_rtm_dumproute() calls
this function as well.
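The rule that the annotated iterator enforces can be written as a small truth table. The function below is a hypothetical model of the check, not kernel code: the traversal is only flagged when the RCU read lock is not held and the extra condition (here, "RTNL is held") is also false.

```python
def traversal_is_flagged(rcu_read_lock_held: bool, extra_cond: bool) -> bool:
    """Return True when the simulated checker would complain.

    `extra_cond` models the optional lock-held argument passed to the
    RCU list iterator (in this commit, that RTNL is held).
    """
    return not (rcu_read_lock_held or extra_cond)


if __name__ == "__main__":
    for rcu in (False, True):
        for rtnl in (False, True):
            print(f"rcu={rcu} rtnl={rtnl} flagged={traversal_is_flagged(rcu, rtnl)}")
```

Only the case where neither lock is held is reported, which is the legitimate problem the message refers to.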
Wei Fang [Tue, 12 Nov 2024 03:03:47 +0000 (11:03 +0800)]
samples: pktgen: correct dev to DEV
In the pktgen_sample01_simple.sh script, the device variable is uppercase
'DEV' instead of lowercase 'dev'. Because of this typo, the script cannot
enable UDP tx checksum.
net: phylink: ensure PHY momentary link-fails are handled
Normally, phylib won't notify changes in quick succession. However, as
a result of commit 3e43b903da04 ("net: phy: Immediately call
adjust_link if only tx_lpi_enabled changes") this is no longer true -
it is now possible that phy_link_down() and phy_link_up() will both
complete before phylink's resolver has run, which means it'll miss that
pl->phy_state.link momentarily became false.
Rename "mac_link_dropped" to be more generic "link_failed" since it will
cover more than the MAC/PCS end of the link failing, and arrange to set
this in phylink_phy_change() if we notice that the PHY reports that the
link is down.
This will ensure that we capture an EEE reconfiguration event.
====================
Support external snapshots on dwmac1000
The main change since v3 is the move of the fifo flush wait in the
ptp_clock_info enable() function within the mutex that protects the ptp
registers. Thanks Jakub and Paolo for spotting this.
This series also aggregates Daniel's reviews, except for patch 4,
which was modified since then.
This series is another take on the previous work [1] done by
Alexis Lothoré, that fixes the support for external snapshots
timestamping in GMAC3-based devices.
Details on why this is needed are mentioned in the cover letter [2] from V1.
net: stmmac: dwmac_socfpga: This platform has GMAC
Indicate that dwmac_socfpga has a gmac. This will make sure that
gmac-specific interrupt processing is done, including timestamp
interrupt handling. Without this, the external snapshot interrupt is
never ack'd and we have an interrupt storm on external snapshot event.
net: stmmac: Configure only the relevant bits for timestamping setup
The PTP_TCR (Timestamp Control Register) is used to configure several
features related to packet timestamping.
On one hand, it configures the 1588 packet processing, to indicate what
types of frames should be timestamped (all, only 1588v1 or 1588v2, using
L2 or L4 timestamping, on IPv4 or IPv6, etc.). This is usually
configured through the ioctl / ndo dedicated to such setup. This
configuration is done by setting some fields in that register, that seem
to behave the same way on all dwmac variants, including DWMAC1000.
On the other hand, and only on DWMAC1000 apparently, some fields in that
register are used to configure external snapshots (bits 24/25).
On DWMAC4 and others, these fields are reserved and external
snapshots are configured through a dedicated register that simply
doesn't seem to exist on DWMAC1000.
This configuration is done in the dwmac1000-specific ptp_clock_info ops
(cf dwmac1000_ptp_enable()).
So to avoid the timestamping configuration interfering with the external
snapshots, this commit makes sure that the config_hw_tstamping only
configures the relevant bits in PTP_TCR, so that the DWMAC1000
timestamping can correctly rely on these otherwise reserved fields.
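Touching only the relevant bits amounts to a masked read-modify-write. The sketch below assumes, purely for illustration, that every bit of a 32-bit register other than bits 24/25 belongs to the timestamping configuration; the mask and function names are hypothetical and not taken from the driver.

```python
REG_MASK = 0xFFFFFFFF

# Bits 24/25 carry the external snapshot configuration on DWMAC1000 (per the text).
EXT_SNAPSHOT_MASK = (1 << 24) | (1 << 25)

# Hypothetical: treat all remaining bits as timestamping configuration.
TSTAMP_CFG_MASK = REG_MASK & ~EXT_SNAPSHOT_MASK


def update_tstamp_bits(old_reg: int, new_cfg: int) -> int:
    """Replace only the timestamping bits, preserving everything else."""
    preserved = old_reg & ~TSTAMP_CFG_MASK & REG_MASK
    return preserved | (new_cfg & TSTAMP_CFG_MASK)


if __name__ == "__main__":
    old = (1 << 24) | 0x3          # snapshot enabled plus some old config
    new = update_tstamp_bits(old, 0x100)
    print(hex(new))                # bit 24 survives, low bits are replaced
```

A plain register write of `new_cfg` would instead clear bits 24/25 and silently disable the external snapshot.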
net: stmmac: Enable timestamping interrupt on dwmac1000
The default configuration for the interrupts on dwmac1000 has the
timestamping interrupt masked. Now that the timestamping has been
adapted to dwmac1000, enable the timestamping interrupt on these
platforms.
On dwmac1000, the external snapshot interrupt is configured through a
dedicated bit, that is set as reserved on other dwmac variants. The
timestamping interrupt is acknowledged by reading the
GMAC3_X_TIMESTAMP_STATUS register.
Make sure that this interrupt is enabled when snapshot is enabled, and
masked when disabled.
In GMAC3_X, the timestamping configuration differs from GMAC4 in the
layout of the registers accessed to grab the number of snapshots in FIFO
as well as the register offset to grab the aux snapshot timestamp.
Introduce dedicated ops to configure timestamping on dwmac100 and
dwmac1000. The latency correction doesn't seem to exist on GMAC3, so its
corresponding operation isn't populated.
net: stmmac: Introduce dwmac1000 ptp_clock_info and operations
The PTP configuration for GMAC3_X differs from the other implementations
in several ways:
- There's only one external snapshot trigger
- The snapshot configuration is done through the PTP_TCR register,
whereas the other dwmac variants have a dedicated ACR (auxiliary
control reg) for that purpose
- The layout for the PTP_TCR register also differs, as bits 24/25 are
used for the snapshot configuration. These bits are reserved on other
variants.
On GMAC3_X, we also can't discover the number of snapshot triggers
automatically.
The GMAC3_X has one PPS output; however, its configuration isn't
supported yet, so report 0 n_per_out for now.
Introduce a dedicated set of ptp_clock_info ops and configuration
parameters to reflect these differences specific to GMAC3_X.
net: stmmac: Only update the auto-discovered PTP clock features
Some DWMAC variants such as dwmac1000 don't support discovering the
number of output pps and auxiliary snapshots. Allow these parameters to
be defined in default ptp_clock_info, and let them be updated only when
the feature discovery yielded a result.
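The "update only when discovery yielded a result" rule can be sketched as a merge of defaults with discovered values. Treating a zero (or missing) value as "nothing discovered" is an assumption for this example, and the dictionary keys merely mirror the parameters named above.

```python
def merge_discovered(defaults: dict, discovered: dict) -> dict:
    """Start from the defaults and override only with non-empty results."""
    merged = dict(defaults)  # do not mutate the caller's defaults
    for key, value in discovered.items():
        if value:  # assumption: 0 / None means "feature discovery found nothing"
            merged[key] = value
    return merged


if __name__ == "__main__":
    defaults = {"n_ext_ts": 1, "n_per_out": 0}
    # A variant without auto-discovery keeps its defaults.
    print(merge_discovered(defaults, {"n_ext_ts": 0, "n_per_out": 0}))
    # A variant with auto-discovery overrides them.
    print(merge_discovered(defaults, {"n_ext_ts": 4, "n_per_out": 2}))
```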
The auxiliary snapshot configuration was found to differ depending on
the dwmac version. To prepare supporting this, allow specifying the
ptp_clock_info ops in the hwif array.
net: stmmac: Don't modify the global ptp ops directly
The stmmac_ptp_clock_ops are copied into the stmmac_priv structure
before being registered to the PTP core. Some adjustments are made prior
to that, such as the number of snapshots or max adjustment parameters.
Instead of modifying the global definition, then copying into the local
private data, let's first copy then modify the local parameters.
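The copy-then-modify ordering can be illustrated with a shared template and a per-device copy. The class and field names below are hypothetical stand-ins chosen for the example, not the kernel structures.

```python
import copy
from dataclasses import dataclass


@dataclass
class ClockOps:
    """Hypothetical stand-in for a table of PTP clock parameters."""
    max_adj: int = 0
    n_ext_ts: int = 0


GLOBAL_OPS = ClockOps()  # shared template, should stay untouched


def setup_private_ops(template: ClockOps, max_adj: int, n_ext_ts: int) -> ClockOps:
    """Copy the template first, then adjust only the private copy."""
    ops = copy.copy(template)
    ops.max_adj = max_adj
    ops.n_ext_ts = n_ext_ts
    return ops


if __name__ == "__main__":
    dev_a = setup_private_ops(GLOBAL_OPS, max_adj=1000, n_ext_ts=1)
    dev_b = setup_private_ops(GLOBAL_OPS, max_adj=2000, n_ext_ts=4)
    print(dev_a, dev_b, GLOBAL_OPS)
```

Each device ends up with its own values, and the template is the same for every later caller.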
Jakub Kicinski [Thu, 14 Nov 2024 02:51:09 +0000 (18:51 -0800)]
Merge branch 'mptcp-pm-a-few-more-fixes'
Matthieu Baerts says:
====================
mptcp: pm: a few more fixes
Three small fixes related to the MPTCP path-manager:
- Patch 1: correctly reflect the backup flag to the corresponding local
address entry of the userspace path-manager. A fix for v5.19.
- Patch 2: hold the PM lock when deleting an entry from the local
addresses of the userspace path-manager to avoid corrupting this
list. A fix for v5.19.
- Patch 3: use _rcu variant to iterate the in-kernel path-manager's
local addresses list, when under rcu_read_lock(). A fix for v5.17.
====================
In mptcp_pm_create_subflow_or_signal_addr(), rcu_read_(un)lock() are
used as expected to iterate over the list of local addresses, but
list_for_each_entry() was used instead of list_for_each_entry_rcu() in
__lookup_addr(). It is important to use this variant which adds the
required READ_ONCE() (and diagnostic checks if enabled).
Because __lookup_addr() is also used in mptcp_pm_nl_set_flags() where it
is called under the pernet->lock and not rcu_read_lock(), an extra
condition is then passed to help the diagnostic checks making sure
either the associated spin lock or the RCU lock is held.
Geliang Tang [Tue, 12 Nov 2024 19:18:33 +0000 (20:18 +0100)]
mptcp: update local address flags when setting it
Just like the in-kernel pm, when the userspace pm does set_flags, it needs
to send out an MP_PRIO signal, and also modify the flags of the
corresponding address entry in the local address list. This patch
implements the missing logic.
Traverse all address entries on userspace_pm_local_addr_list to find the
local address entry. If bkup is true, set FLAG_BACKUP in the flags of this
entry; otherwise, clear FLAG_BACKUP.
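The traversal described above can be sketched as a simple list walk. The entry type, the address strings and the bit value of `FLAG_BACKUP` are hypothetical, chosen only to make the example self-contained.

```python
from dataclasses import dataclass
from typing import List

FLAG_BACKUP = 1 << 0  # hypothetical bit value, for illustration only


@dataclass
class AddrEntry:
    addr: str
    flags: int = 0


def set_backup_flag(entries: List[AddrEntry], addr: str, bkup: bool) -> bool:
    """Find the entry for `addr` and set or clear its backup flag.

    Returns True if a matching entry was found.
    """
    for entry in entries:
        if entry.addr == addr:
            if bkup:
                entry.flags |= FLAG_BACKUP
            else:
                entry.flags &= ~FLAG_BACKUP
            return True
    return False


if __name__ == "__main__":
    local_addrs = [AddrEntry("10.0.1.1"), AddrEntry("10.0.2.1")]
    set_backup_flag(local_addrs, "10.0.2.1", bkup=True)
    print(local_addrs)
```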
Luo Yifan [Wed, 13 Nov 2024 01:11:42 +0000 (09:11 +0800)]
ynl: samples: Fix the wrong format specifier
Make a minor change to eliminate a static checker warning. The type
of s->ifc is unsigned int, so the correct format specifier should be
%u instead of %d.
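Why the specifier matters can be shown by reinterpreting the same 32 bits as signed. The helper below is a sketch that assumes a 32-bit `unsigned int`; it models what a signed conversion would display, not the sample's actual output.

```python
import struct


def as_signed_32(value: int) -> int:
    """Reinterpret the bits of a 32-bit unsigned value as a signed int."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


if __name__ == "__main__":
    print(as_signed_32(7))              # small values look identical
    print(as_signed_32(3_000_000_000))  # large values turn negative
```

For small values the two conversions agree, so the mismatch mainly shows up as a static checker warning.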
====================
tools: ynl: two patches to ease building with rpmbuild
I'm looking to build and package ynl for Fedora and CentOS Stream users.
Default rpmbuild has a couple of hardening options enabled by default [1][2],
which currently prevent ynl from building.
This series contains 2 small patches to address it.
Jan Stancek [Tue, 12 Nov 2024 08:21:33 +0000 (09:21 +0100)]
tools: ynl: extend CFLAGS to keep options from environment
Package build environments like Fedora rpmbuild introduced hardening
options (e.g. -pie -Wl,-z,now) by passing a -spec option to CFLAGS
and LDFLAGS.
ynl Makefiles currently override CFLAGS but not LDFLAGS, which leads
to a mismatch and build failure:
CC sample devlink
/usr/bin/ld: devlink.o: relocation R_X86_64_32 against symbol `ynl_devlink_family' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
Extend CFLAGS to support hardening options set by build environment.
Jakub Kicinski [Thu, 14 Nov 2024 02:35:18 +0000 (18:35 -0800)]
Merge tag 'wireless-next-2024-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.13
Most likely the last -next pull request for v6.13. Most changes are in
Realtek and Qualcomm drivers, otherwise not really anything
noteworthy.
Major changes:
mac80211
* EHT 1024 aggregation size for transmissions
ath12k
* switch to using wiphy_lock() and remove ar->conf_mutex
* firmware coredump collection support
* add debugfs support for a multitude of statistics
ath11k
* dt: document WCN6855 hardware inputs
ath9k
* remove include/linux/ath9k_platform.h
ath5k
* Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
* tag 'wireless-next-2024-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (154 commits)
Revert "wifi: iwlegacy: do not skip frames with bad FCS"
wifi: mac80211: pass MBSSID config by reference
wifi: mac80211: Support EHT 1024 aggregation size in TX
net: rfkill: gpio: Add check for clk_enable()
wifi: brcmfmac: Fix oops due to NULL pointer dereference in brcmf_sdiod_sglist_rw()
wifi: Switch back to struct platform_driver::remove()
wifi: ipw2x00: libipw_rx_any(): fix bad alignment
wifi: brcmfmac: release 'root' node in all execution paths
wifi: iwlwifi: mvm: don't call power_update_mac in fast suspend
wifi: iwlwifi: s/IWL_MVM_INVALID_STA/IWL_INVALID_STA
wifi: iwlwifi: bump minimum API version in BZ/SC to 92
wifi: iwlwifi: move IWL_LMAC_*_INDEX to fw/api/context.h
wifi: iwlwifi: be less noisy if the NIC is dead in S3
wifi: iwlwifi: mvm: tell iwlmei when we finished suspending
wifi: iwlwifi: allow fast resume on ax200
wifi: iwlwifi: mvm: support new initiator and responder command version
wifi: iwlwifi: mvm: use wiphy locked debugfs for low-latency
wifi: iwlwifi: mvm: MLO scan upon channel condition degradation
wifi: iwlwifi: mvm: support new versions of the wowlan APIs
wifi: iwlwifi: mvm: allow always calling iwl_mvm_get_bss_vif()
...
====================
Linus Torvalds [Wed, 13 Nov 2024 21:32:51 +0000 (13:32 -0800)]
Merge tag 'pm-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a locking issue in the asymmetric CPU capacity setup code in the
intel_pstate driver that may lead to a deadlock if CPU online/offline
runs in parallel with the code in question, which is unlikely but not
impossible (Rafael Wysocki)"
* tag 'pm-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Rearrange locking in hybrid_init_cpu_capacity_scaling()
Linus Torvalds [Wed, 13 Nov 2024 21:28:58 +0000 (13:28 -0800)]
Merge tag 'tpmdd-next-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
"Two bug fixes for TPM bus encryption (the remaining reported issues in
the feature)"
* tag 'tpmdd-next-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: Disable TPM on tpm2_create_primary() failure
tpm: Opt-in in disable PCR integrity protection
Jarkko Sakkinen [Wed, 13 Nov 2024 05:54:12 +0000 (07:54 +0200)]
tpm: Opt-in in disable PCR integrity protection
The initial HMAC session feature added TPM bus encryption and/or integrity
protection to various in-kernel TPM operations. This can cause performance
bottlenecks with IMA, as it heavily utilizes PCR extend operations.
In order to mitigate this performance issue, introduce a kernel
command-line parameter to the TPM driver for disabling the integrity
protection for PCR extend operations (i.e. TPM2_PCR_Extend).
Linus Torvalds [Wed, 13 Nov 2024 17:14:19 +0000 (09:14 -0800)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:
- Fix a mismatching RCU unlock flavor in bpf_out_neigh_v6 (Jiawei Ye)
- Fix BPF sockmap with kTLS to reject vsock and unix sockets upon kTLS
context retrieval (Zijian Zhang)
- Fix BPF bits iterator selftest for s390x (Hou Tao)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6
bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rx
selftests/bpf: Use -4095 as the bad address for bits iterator
Linus Torvalds [Wed, 13 Nov 2024 17:09:00 +0000 (09:09 -0800)]
Merge tag 'loongarch-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
- set up the logical-physical CPU mapping for all possible CPUs, in order
to avoid a CPU hotplug issue
- fix some KASAN bugs
- fix AP booting issue in VM mode
- some trivial cleanups
* tag 'loongarch-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Fix AP booting issue in VM mode
LoongArch: Add WriteCombine shadow mapping in KASAN
LoongArch: Disable KASAN if PGDIR_SIZE is too large for cpu_vabits
LoongArch: Make KASAN work with 5-level page-tables
LoongArch: Define a default value for VM_DATA_DEFAULT_FLAGS
LoongArch: Fix early_numa_add_cpu() usage for FDT systems
LoongArch: For all possible CPUs setup logical-physical CPU mapping
Linus Torvalds [Wed, 13 Nov 2024 16:58:11 +0000 (08:58 -0800)]
Merge tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"10 hotfixes, 7 of which are cc:stable. 7 are MM, 3 are not. All
singletons"
* tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: swapfile: fix cluster reclaim work crash on rotational devices
selftests: hugetlb_dio: fixup check for initial conditions to skip in the start
mm/thp: fix deferred split queue not partially_mapped: fix
mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
nommu: pass NULL argument to vma_iter_prealloc()
ocfs2: fix UBSAN warning in ocfs2_verify_volume()
nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint
nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint
mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
mm: count zeromap read and set for swapout and swapin
David S. Miller [Wed, 13 Nov 2024 13:06:04 +0000 (13:06 +0000)]
Merge branch 'phy-mediatek-reorg'
Sky Huang says:
====================
Re-organize MediaTek ethernet phy drivers and propose mtk-phy-lib
This patchset comes from patch 1/9, 3/9, 4/9, 5/9 and 7/9 of:
https://lore.kernel.org/netdev/20241004102413[email protected]/
This patchset changes MediaTek's ethernet phy's folder structure and
integrates helper functions, including LED & token ring manipulation,
into mtk-phy-lib.
---
Change in v2:
- Add correct Reviewed-by tag in each patch.
Change in v3:
[patch 4/5]
- Fix kernel test robot error by adding missing MTK_NET_PHYLIB.
====================
SkyLake.Huang [Fri, 8 Nov 2024 16:34:52 +0000 (00:34 +0800)]
net: phy: mediatek: Move LED helper functions into mtk phy lib
This patch creates mtk-phy-lib.c & mtk-phy.h and integrates mtk-ge-soc.c's
LED helper functions so that we can use those helper functions in other
MTK's ethernet phy driver.
Re-organize MediaTek ethernet phy driver files and get ready to integrate
some common functions and add new 2.5G phy driver.
mtk-ge.c: MT7530 Gphy on MT7621 & MT7531 Gphy
mtk-ge-soc.c: Built-in Gphy on MT7981 & Built-in switch Gphy on MT7988
mtk-2p5ge.c: Planned for built-in 2.5G phy on MT7988
David S. Miller [Wed, 13 Nov 2024 11:57:12 +0000 (11:57 +0000)]
Merge branch 'octeontx2-rvu-rep'
Geetha sowjanya says:
====================
Introduce RVU representors
This series adds representor support for each rvu device.
When switchdev mode is enabled, representor netdev is registered
for each rvu device. In implementation of representor model,
one NIX HW LF with multiple SQ and RQ is reserved, where each
RQ and SQ of the LF are mapped to a representor. A loopback channel
is reserved to support packet path between representors and VFs.
CN10K silicon supports 2 types of MACs, RPM and SDP. This
patch set adds representor support for both RPM and SDP MAC
interfaces.
- Patch 1: Implements basic representor driver.
- Patch 2: Add devlink support to create representor netdevs that
can be used to manage VFs.
- Patch 3: Implements basic netdev_ndo_ops.
- Patch 4: Installs tcam rules to route packets between representor and
VFs.
- Patch 5: Enables fetching VF stats via representor interface
- Patch 6: Adds support to sync link state between representors and VFs.
- Patch 7: Enables configuring VF MTU via representor netdevs.
- Patch 8: Adds representors for sdp MAC.
- Patch 9: Adds devlink port support.
- Patch 10: Implements offload stats.
- Patch 11: Implements tc offload support.
- Patch 12: Adds documentation for rvu port representor.
pci/0002:1c:00.0
Command to create PF/VF representor
Rpf1vf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether f6:43:83:ee:26:21 brd ff:ff:ff:ff:ff:ff
Rpf1vf1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 12:b2:54:0e:24:54 brd ff:ff:ff:ff:ff:ff
Rpf1vf2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 4a:12:c4:4c:32:62 brd ff:ff:ff:ff:ff:ff
Rpf1vf3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether ca:cb:68:0e:e2:6e brd ff:ff:ff:ff:ff:ff
Rpf2vf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 06:cc:ad:b4:f0:93 brd ff:ff:ff:ff:ff:ff
~# devlink port
pci/0002:1c:00.0/0: type eth netdev Rpf1vf0 flavour physical port 0 splittable false
pci/0002:1c:00.0/1: type eth netdev Rpf1vf1 flavour pcivf controller 0 pfnum 1 vfnum 1 external false splittable false
pci/0002:1c:00.0/2: type eth netdev Rpf1vf2 flavour pcivf controller 0 pfnum 1 vfnum 2 external false splittable false
pci/0002:1c:00.0/3: type eth netdev Rpf1vf3 flavour pcivf controller 0 pfnum 1 vfnum 3 external false splittable false
-----------
v11:v1:
- Submitted refactoring changes as a separate patch set.
https://lore.kernel.org/netdev/20241023161843[email protected]/T/
- Moved documentation to a separate patch.
- patch 9: Added code changes to forward updated mac address to VF.
- Implemented TC offload support.
v10-v11:
- As suggested by "Jiri Pirko" adjusted the documentation.
- Added more commit description to patch1.
Geetha sowjanya [Thu, 7 Nov 2024 16:08:35 +0000 (21:38 +0530)]
octeontx2-pf: Add representors for sdp MAC
Hardware supports different types of MACs, e.g. RPM, SDP, LBK.
LBK is for the internal Tx->Rx HW loopback path. RPM and SDP MACs support
ingress/egress pkt IO on interfaces with different sets of capabilities
like interface modes. At the time of netdev driver registration, the PF
will seek MAC related information from the Admin function driver
'drivers/net/ethernet/marvell/octeontx2/af' and set up ingress/egress
queues etc. such that pkt IO on the channels of these different MACs is
possible. This patch adds representors for the SDP MAC.
Geetha sowjanya [Thu, 7 Nov 2024 16:08:34 +0000 (21:38 +0530)]
octeontx2-pf: Configure VF mtu via representor
Adds support to manage the mtu configuration for a VF through its
representor. On update of the representor mtu, an mbox notification is
sent to the VF to update its mtu.
This feature is implemented based on the "Network Function Representors"
kernel documentation.
"
Setting an MTU on the representor should cause that same MTU
to be reported to the representee.
"
Geetha sowjanya [Thu, 7 Nov 2024 16:08:33 +0000 (21:38 +0530)]
octeontx2-pf: Add support to sync link state between representor and VFs
Implements the below requirement mentioned
in the representors documentation.
"
The representee's link state is controlled through the
representor. Setting the representor administratively UP
or DOWN should cause carrier ON or OFF at the representee.
"
This patch enables
- Reflecting the link state of representor based on the VF state and
link state of VF based on representor.
- On VF interface up/down a notification is sent via mbox to representor
to update the link state.
e.g. ip link set eth0 up/down will set carrier on/off
of the corresponding representor(r0p1) interface.
- Representor interface up/down will cause a link state update of the VF.
e.g. ip link set r0p1 up/down will set carrier on/off
of the corresponding representee(eth0) interface.
Geetha sowjanya [Thu, 7 Nov 2024 16:08:31 +0000 (21:38 +0530)]
octeontx2-af: Add packet path between representor and VF
Current HW does not support an in-built switch that forwards pkts
between representee and representor. When a representor is put under
a bridge and pkts need to be sent to the representee, pkts from the
representor are sent on a HW internal loopback channel, which again
will be punted to the ingress pkt parser. The rules that this patch
installs are the MCAM filters/rules which will match against these
pkts and forward them to the representee.
The rules that this patch installs are for basic
representor <=> representee path similar to Tun/TAP between VM and
Host.
Geetha sowjanya [Thu, 7 Nov 2024 16:08:29 +0000 (21:38 +0530)]
octeontx2-pf: Create representor netdev
Adds initial devlink support to set/get the switchdev mode.
Representor netdevs are created for each rvu device when
the switch mode is set to 'switchdev'. These netdevs are
used to control and configure VFs.
Geetha sowjanya [Thu, 7 Nov 2024 16:08:28 +0000 (21:38 +0530)]
octeontx2-pf: RVU representor driver
Adds basic driver for the RVU representor.
On probe, the driver does PCI specific initialization and
HW resource configuration. Introduces the RVU_ESWITCH
kernel config to enable/disable the driver. The representor
and NIC share the code, but representor netdevs support only
a subset of the NIC functionality. Hence the "otx2_rep_dev" API
helps to skip the initialization of features that are not
supported by the representors.
Jakub Kicinski [Sat, 9 Nov 2024 02:33:03 +0000 (18:33 -0800)]
net: page_pool: do not count normal frag allocation in stats
Commit 0f6deac3a079 ("net: page_pool: add page allocation stats for
two fast page allocate path") added increments for "fast path"
allocation to page frag alloc. It mentions performance degradation
analysis but the details are unclear. Could be that the author
was simply surprised by the alloc stats not matching packet count.
In my experience the key metric for page pool is the recycling rate.
Page return stats, however, count returned _pages_ not frags.
This makes it impossible to calculate recycling rate for drivers
using the frag API. Here is example output of the page-pool
YNL sample for a driver allocating 1200B frags (4k pages)
with nearly perfect recycling:
The recycling rate is reported as 33.3% because we give out
4096 // 1200 = 3 frags for every recycled page.
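The arithmetic behind that 33.3% figure can be sketched as follows. This is only an illustrative model, not page pool code: the function name and its parameters are hypothetical, and it assumes perfect recycling (every page handed out is eventually recycled).

```python
def reported_recycling_rate(page_size, frag_size, pages_recycled, count_frag_allocs):
    """Recycling rate as a stats consumer would compute it (recycled / allocated).

    Hypothetical model: assumes every allocated page is recycled, so the
    true rate is always 100%.
    """
    frags_per_page = page_size // frag_size
    if count_frag_allocs:
        # Each frag handed out bumps the alloc counter, but the return
        # counter only counts whole pages.
        allocs = pages_recycled * frags_per_page
    else:
        # Only whole-page allocations are counted, matching the return stats.
        allocs = pages_recycled
    return pages_recycled / allocs


if __name__ == "__main__":
    print(reported_recycling_rate(4096, 1200, 1000, count_frag_allocs=True))
    print(reported_recycling_rate(4096, 1200, 1000, count_frag_allocs=False))
```

With frag allocations counted, the rate drops by a factor of the frags-per-page count even though no page is ever lost.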
Effectively revert the aforementioned commit. This also aligns
with the stats we would see for drivers which do the fragmentation
themselves, although that's not a strong reason in itself.
On the (very unlikely) path where we can reuse the current page
let's bump the "cached" stat. The fact that we don't put the page
in the cache is just an optimization.
Jakub Kicinski [Sat, 9 Nov 2024 03:51:19 +0000 (19:51 -0800)]
eth: bnxt: use page pool for head frags
Testing small size RPCs (300B-400B) on a large AMD system suggests
that page pool recycling is very useful even for just the head frags.
With this patch (and copy break disabled) I see a 30% performance
improvement (82Gbps -> 106Gbps).
Convert bnxt from normal page frags to page pool frags for head buffers.
On systems with small page size we can use the same pool as for TPA
pages. On systems with large pages the frag allocation logic of the
page pool is already used to split a large page into TPA chunks.
TPA chunks are much larger than heads (8k or 64k, AFAICT vs 1kB)
and we always allocate the same sized chunks. Mixing allocation
of TPA and head pages would lead to sub-optimal memory use.
Plus Taehee's work on zero-copy / devmem will need to differentiate
between TPA and non-TPA page pool, anyway. Conditionally allocate
a new page pool for heads.
net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.
To generate hnode handles (in gen_new_htid()), u32 uses IDR and
encodes the returned small integer into a structured 32-bit
word. Unfortunately, at disposal time, the needed decoding
is not done. As a result, idr_remove() fails, and the IDR
fills up. Since its size is 2048, the following script ends up
with "Filter already exists":
tc filter add dev myve $FILTER1
tc filter add dev myve $FILTER2
for i in {1..2048}
do
echo $i
tc filter del dev myve $FILTER2
tc filter add dev myve $FILTER2
done
This patch adds the missing decoding logic for handles that
deserve it.
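The leak can be illustrated with a toy model. This is a sketch only: the bit layout (marker bit 0x800, shift by 20) and the helper names are assumptions made for illustration, and the IDR is modelled as a plain Python set with a small capacity rather than the kernel's structure.

```python
HTID_FLAG = 0x800    # assumed marker bit OR-ed into the small id
HTID_SHIFT = 20      # assumed position of the id inside the 32-bit handle
ID_MASK = 0x7FF


def id_to_handle(small_id):
    """Encode a small allocator id into a structured 32-bit handle."""
    return ((small_id | HTID_FLAG) << HTID_SHIFT) & 0xFFFFFFFF


def handle_to_id(handle):
    """Decode a handle back to the allocator id; plain ids pass through."""
    if handle & 0x80000000:
        return (handle >> HTID_SHIFT) & ID_MASK
    return handle


class ToyIdr:
    """Minimal stand-in for an id allocator with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = set()

    def alloc(self):
        for i in range(1, self.capacity + 1):
            if i not in self.used:
                self.used.add(i)
                return i
        return None  # allocator is full

    def remove(self, i):
        self.used.discard(i)  # silently ignores unknown ids


def churn(decode, rounds, capacity=8):
    """Add and delete one hnode per round; return how many rounds succeed."""
    idr = ToyIdr(capacity)
    for n in range(rounds):
        small_id = idr.alloc()
        if small_id is None:
            return n
        handle = id_to_handle(small_id)
        # Without decoding, the encoded handle is passed to remove() and
        # never matches an allocated id, so the entry leaks.
        idr.remove(decode(handle) if decode else handle)
    return rounds
```

In this model, `churn(None, 100)` stops once the allocator is full, while `churn(handle_to_id, 100)` completes every round because each entry is actually released.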
Andrew Lunn [Sun, 10 Nov 2024 17:59:55 +0000 (18:59 +0100)]
dsa: qca8k: Use nested lock to avoid splat
qca8k_phy_eth_command() is used to probe the child MDIO bus while the
parent MDIO is locked. This causes lockdep splat, reporting a possible
deadlock. It is not an actual deadlock, because different locks are
used. By making use of mutex_lock_nested() we can avoid this false
positive.
Removing full driver sections also removed mailing list entries, causing
submitters of future patches to forget CCing these mailing lists.
Hence re-add the sections for the Renesas Ethernet AVB, R-Car SATA, and
SuperH Ethernet drivers. Add people who volunteered to maintain these
drivers (thanks a lot!), and mark all of them as supported.
Jakub Kicinski [Wed, 13 Nov 2024 01:30:41 +0000 (17:30 -0800)]
Merge tag 'for-net-2024-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- btintel: Direct exception event to bluetooth stack
- hci_core: Fix calling mgmt_device_connected
* tag 'for-net-2024-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: btintel: Direct exception event to bluetooth stack
Bluetooth: hci_core: Fix calling mgmt_device_connected
====================
The syzbot console output indicates a virtual environment where swapfile
is on a rotational device. In this case, clusters aren't actually used,
and si->full_clusters is not initialized. Daan's report is from qemu, so
likely rotational too.
Make sure to only schedule the cluster reclaim work when clusters are
actually in use.
Si-Wei Liu [Mon, 21 Oct 2024 13:40:39 +0000 (16:40 +0300)]
vdpa/mlx5: Fix PA offset with unaligned starting iotlb map
When calculating the physical address range based on the iotlb and mr
[start,end) ranges, the offset of mr->start relative to map->start
is not taken into account. This leads to some incorrect and duplicate
mappings.
For the case when mr->start < map->start the code is already correct:
the range in [mr->start, map->start) was handled by a different
iteration.
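The offset handling described above can be sketched in isolation. This is a simplified model under stated assumptions, not the driver's code: the function name and its parameters are hypothetical, and ranges are treated as half-open [start, end) intervals with a single physical base address per iotlb map.

```python
def clipped_pa_range(map_start, map_end, map_pa, mr_start, mr_end):
    """Physical range backing the overlap of an iotlb map and an mr range.

    map_pa is the physical address corresponding to map_start (assumption).
    Returns None when the two ranges do not overlap.
    """
    start = max(map_start, mr_start)
    end = min(map_end, mr_end)
    if start >= end:
        return None
    # When mr_start > map_start, the physical address must be advanced by
    # the same amount; skipping this yields incorrect, duplicate mappings.
    offset = start - map_start
    return (map_pa + offset, map_pa + offset + (end - start))


if __name__ == "__main__":
    # mr starts 0x2000 bytes into the map, so the PA starts 0x2000 in too.
    print([hex(x) for x in clipped_pa_range(0x1000, 0x5000, 0x100000, 0x3000, 0x4000)])
```

When `mr_start` is below `map_start`, the computed offset is zero, which matches the case the commit message describes as already correct.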
Linus Torvalds [Tue, 12 Nov 2024 21:35:13 +0000 (13:35 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86 and selftests fixes.
x86:
- When emulating a guest TLB flush for a nested guest, flush vpid01,
not vpid02, if L2 is active but VPID is disabled in vmcs12, i.e. if
L2 and L1 are sharing VPID '0' (from L1's perspective).
- Fix a bug in the SNP initialization flow where KVM would return '0'
to userspace instead of -errno on failure.
- Move the Intel PT virtualization (i.e. outputting host trace to
host buffer and guest trace to guest buffer) behind CONFIG_BROKEN.
- Fix memory leak on failure of KVM_SEV_SNP_LAUNCH_START
- Fix a bug where KVM fails to inject an interrupt from the IRR after
KVM_SET_LAPIC.
Selftests:
- Increase the timeout for the memslot performance selftest to avoid
false failures on arm64 and nested x86 platforms.
- Fix a goof in the guest_memfd selftest where a for-loop initialized
a bit mask to zero instead of BIT(0).
- Disable strict aliasing when building KVM selftests to prevent the
compiler from treating things like "u64 *" to "uint64_t *" cases as
undefined behavior, which can lead to nasty, hard to debug
failures.
- Force -march=x86-64-v2 for KVM x86 selftests if and only if the
uarch is supported by the compiler.
- Fix broken compilation of kvm selftests after a header sync in
tools/"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: Bury Intel PT virtualization (guest/host mode) behind CONFIG_BROKEN
KVM: x86: Unconditionally set irr_pending when updating APICv state
kvm: svm: Fix gctx page leak on invalid inputs
KVM: selftests: use X86_MEMTYPE_WB instead of VMX_BASIC_MEM_TYPE_WB
KVM: SVM: Propagate error from snp_guest_req_init() to userspace
KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabled
KVM: selftests: Don't force -march=x86-64-v2 if it's unsupported
KVM: selftests: Disable strict aliasing
KVM: selftests: fix unintentional noop test in guest_memfd_test.c
KVM: selftests: memslot_perf_test: increase guest sync timeout
Linus Torvalds [Tue, 12 Nov 2024 21:21:07 +0000 (13:21 -0800)]
Merge tag 'for-6.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mikulas Patocka:
- fix warnings about duplicate slab cache names
* tag 'for-6.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-cache: fix warnings about duplicate slab caches
dm-bufio: fix warnings about duplicate slab caches
Linus Torvalds [Tue, 12 Nov 2024 21:06:31 +0000 (13:06 -0800)]
Merge tag 'integrity-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity fixes from Mimi Zohar:
"One bug fix, one performance improvement, and the use of
static_assert:
- The bug fix addresses "only a cosmetic change" commit, which didn't
take into account the original 'ima' template definition.
- The performance improvement limits the atomic_read()"
* tag 'integrity-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
integrity: Use static_assert() to check struct sizes
evm: stop avoidably reading i_writecount in evm_file_release
ima: fix buffer overrun in ima_eventdigest_init_common
Linus Torvalds [Tue, 12 Nov 2024 21:01:09 +0000 (13:01 -0800)]
Merge tag 'landlock-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fixes from Mickaël Salaün:
"This fixes issues in Landlock's sandboxer sample and
documentation, slightly refactors helpers (required for ongoing patch
series), and improves/fixes a feature merged in v6.12 (signal and
abstract UNIX socket scoping)"
* tag 'landlock-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Optimize scope enforcement
landlock: Refactor network access mask management
landlock: Refactor filesystem access mask management
samples/landlock: Clarify option parsing behaviour
samples/landlock: Refactor help message
samples/landlock: Fix port parsing in sandboxer
landlock: Fix grammar issues in documentation
landlock: Improve documentation of previous limitations
Alf reports that this commit causes the connection to eventually die on
iwl4965. The reason is that rx_status.flag is zeroed after
RX_FLAG_FAILED_FCS_CRC is set and mac80211 doesn't know the received frame is
corrupted.
Donet Tom [Sun, 10 Nov 2024 06:49:03 +0000 (00:49 -0600)]
selftests: hugetlb_dio: fixup check for initial conditions to skip in the start
This test verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping. To test this, we read the
count of free hugepages before and after the mmap, DIO, and munmap
operations, then check if the free hugepage count is the same.
Reading free hugepages before the test was removed by commit 0268d4579901
('selftests: hugetlb_dio: check for initial conditions to skip at the
start'), causing the test to always fail.
This patch adds back reading the free hugepages before starting the test.
With this patch, the tests are now passing.
Test results without this patch:
./tools/testing/selftests/mm/hugetlb_dio
TAP version 13
1..4
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 1 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 2 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 3 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 4 : Huge pages not freed!
# Totals: pass:0 fail:4 xfail:0 xpass:0 skip:0 error:0
Test results with this patch:
./tools/testing/selftests/mm/hugetlb_dio
TAP version 13
1..4
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 1 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 2 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 3 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 4 : Huge pages freed successfully !
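The before/after comparison that the test relies on can be sketched as below. This is not the selftest's code: the function names are hypothetical, and the sketch assumes the free count is taken from the `HugePages_Free:` line of `/proc/meminfo`.

```python
def free_hugepages(meminfo_text):
    """Return the HugePages_Free value from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("HugePages_Free:"):
            return int(line.split()[1])
    raise ValueError("HugePages_Free not found")


def pages_freed(before, after):
    """The check passes only if the free count returns to its initial value."""
    return before == after


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        before = free_hugepages(f.read())
    # ... mmap a hugepage buffer, run the DIO operation, munmap ...
    with open("/proc/meminfo") as f:
        after = free_hugepages(f.read())
    print("ok" if pages_freed(before, after) else "not ok")
```

If the initial read is skipped and `before` is left at 0, the comparison fails whenever any hugepages are free afterwards, which matches the failing output shown above.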
Hugh Dickins [Sun, 10 Nov 2024 21:11:21 +0000 (13:11 -0800)]
mm/thp: fix deferred split queue not partially_mapped: fix
Though even more elusive than before, list_del corruption has still been
seen on THP's deferred split queue.
The idea in commit e66f3185fa04 was right, but its implementation wrong.
The context omitted an important comment just before the critical test:
"split_folio() removes folio from list on success." In ignoring that
comment, when a THP split succeeded, the code went on to release the
preceding safe folio, preserving instead an irrelevant (formerly head)
folio: which gives no safety because it's not on the list. Fix the logic.
John Hubbard [Tue, 5 Nov 2024 03:29:44 +0000 (19:29 -0800)]
mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
commit 53ba78de064b ("mm/gup: introduce
check_and_migrate_movable_folios()") created a new constraint on the
pin_user_pages*() API family: a potentially large internal allocation must
now occur, for FOLL_LONGTERM cases.
A user-visible consequence has now appeared: user space can no longer pin
more than 2GB of memory on x86_64. That's because, on a 4KB
PAGE_SIZE system, when user space tries to (indirectly, via a device
driver that calls pin_user_pages()) pin 2GB, this requires an allocation
of a folio pointers array of MAX_PAGE_ORDER size, which is the limit for
kmalloc().
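The 2GB threshold follows from simple arithmetic, sketched here. The function names are hypothetical; the sketch assumes 8-byte pointers and MAX_PAGE_ORDER = 10 (believed to be the x86_64 default), so treat the constants as assumptions rather than values taken from this commit.

```python
def folio_array_bytes(pin_bytes, page_size=4096, pointer_size=8):
    """Size of an array holding one pointer per pinned page."""
    return (pin_bytes // page_size) * pointer_size


def max_order_bytes(max_page_order=10, page_size=4096):
    """Largest physically contiguous allocation: 2**order pages (assumed order)."""
    return (1 << max_page_order) * page_size


if __name__ == "__main__":
    two_gb = 2 << 30
    print(folio_array_bytes(two_gb), max_order_bytes())
```

Under these assumptions, pinning exactly 2GB needs an array that just fits the limit, and anything larger exceeds it.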
In addition to the directly visible effect described above, there is also
the problem of adding an unnecessary allocation. The **pages array
argument has already been allocated, and there is no need for a redundant
**folios array allocation in this case.
Fix this by avoiding the new allocation entirely. This is done by
referring to either the original page[i] within **pages, or to the
associated folio. Thanks to David Hildenbrand for suggesting this
approach and for providing the initial implementation (which I've tested
and adjusted slightly) as well.
Kiran K [Tue, 22 Oct 2024 09:11:34 +0000 (14:41 +0530)]
Bluetooth: btintel: Direct exception event to bluetooth stack
Make exception events part of the HCI traces, which helps with debugging.
snoop traces:
> HCI Event: Vendor (0xff) plen 79
Vendor Prefix (0x8780)
Intel Extended Telemetry (0x03)
Unknown extended telemetry event type (0xde)
01 01 de
Unknown extended subevent 0x07
01 01 de 07 01 de 06 1c ef be ad de ef be ad de
ef be ad de ef be ad de ef be ad de ef be ad de
ef be ad de 05 14 ef be ad de ef be ad de ef be
ad de ef be ad de ef be ad de 43 10 ef be ad de
ef be ad de ef be ad de ef be ad de
Fixes: af395330abed ("Bluetooth: btintel: Add Intel devcoredump support")
Signed-off-by: Kiran K <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Since 61a939c68ee0 ("Bluetooth: Queue incoming ACL data until
BT_CONNECTED state is reached") there is no longer a need to call
mgmt_device_connected as ACL data will be queued until BT_CONNECTED
state.
MeiChia Chiu [Tue, 12 Nov 2024 08:38:46 +0000 (16:38 +0800)]
wifi: mac80211: Support EHT 1024 aggregation size in TX
The 1024 agg size for RX is supported but not for TX.
This patch adds this support and refactors the common parsing logic for
addbaext in both process_addba_resp and process_addba_req into a
function.