Git Repo - linux.git/log
Heiner Kallweit [Thu, 17 Oct 2024 20:27:44 +0000 (22:27 +0200)]
r8169: enable EEE at 2.5G per default on RTL8125B

Register a6d/12 shadows register MDIO_AN_EEE_ADV2, so the line removed
by this patch disables advertisement of EEE at 2.5G. The latest vendor
driver, r8125, doesn't do this (any longer?), so this mode seems to be safe.
EEE saves a considerable amount of energy, therefore enable this mode by default.
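As a rough illustration of what that removed line did, the sketch below treats MDIO_AN_EEE_ADV2 as a plain 16-bit value. It assumes the 2.5GBASE-T EEE advertisement bit is bit 0 (0x0001, as MDIO_EEE_2_5GT in Linux's uapi mdio.h); the helper names are hypothetical, and the real driver reaches the register through the a6d/12 shadow rather than through functions like these.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match MDIO_EEE_2_5GT from include/uapi/linux/mdio.h. */
#define EX_EEE_2_5GT 0x0001u

/* Hypothetical helper: advertise, or stop advertising, EEE at 2.5G. */
static uint16_t ex_eee_adv2_set_2_5g(uint16_t adv2, bool enable)
{
	if (enable)
		return (uint16_t)(adv2 | EX_EEE_2_5GT);
	return (uint16_t)(adv2 & ~EX_EEE_2_5GT);
}

/* True if the ADV2 value advertises EEE at 2.5G. */
static bool ex_eee_adv2_has_2_5g(uint16_t adv2)
{
	return (adv2 & EX_EEE_2_5GT) != 0;
}
```

Clearing the bit (what the removed line effectively did) leaves other ADV2 bits, such as the 5G one, untouched.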

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Message-ID: <95dd5a0c-09ea-4847-94d9-b7aa3063e8ff@gmail.com>
Signed-off-by: Andrew Lunn <[email protected]>
Heiner Kallweit [Thu, 17 Oct 2024 16:01:13 +0000 (18:01 +0200)]
net: phy: realtek: add RTL8125D-internal PHY

The first boards with Realtek's RTL8125D are showing up. This MAC/PHY chip
comes with an integrated 2.5Gbps PHY with ID 0x001cc841. It's not yet
clear whether there's an external version of this PHY and what Realtek
calls it, so use the numeric ID for now.

Link: https://lore.kernel.org/netdev/[email protected]/T/
Signed-off-by: Heiner Kallweit <[email protected]>
Message-ID: <7d2924de-053b-44d2-a479-870dc3878170@gmail.com>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Lorenzo Bianconi [Thu, 17 Oct 2024 14:01:41 +0000 (16:01 +0200)]
net: airoha: Reset BQL stopping the netdevice

Run airoha_qdma_cleanup_tx_queue() in the ndo_stop callback in order to
unmap pending skbs. Moreover, reset the BQL txq state when stopping the netdevice.
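A toy model of why the BQL reset matters, assuming only that BQL tracks bytes handed to the hardware minus bytes reported as completed. The struct and functions are hypothetical stand-ins, not the kernel's dql code; the kernel provides helpers such as netdev_tx_reset_queue() for the real reset.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue byte accounting, loosely modelled on BQL. */
struct ex_bql {
	size_t queued;    /* bytes handed to the hardware */
	size_t completed; /* bytes reported back as transmitted */
	size_t limit;     /* max bytes allowed in flight */
};

static size_t ex_bql_inflight(const struct ex_bql *q)
{
	return q->queued - q->completed;
}

static void ex_bql_sent(struct ex_bql *q, size_t bytes)
{
	q->queued += bytes;
}

static void ex_bql_completed(struct ex_bql *q, size_t bytes)
{
	q->completed += bytes;
}

/* The queue is throttled once in-flight bytes reach the limit. */
static bool ex_bql_stopped(const struct ex_bql *q)
{
	return ex_bql_inflight(q) >= q->limit;
}

/*
 * On ndo_stop, pending buffers are unmapped and freed without a TX
 * completion, so their bytes would stay "in flight" forever unless the
 * accounting is reset as well.
 */
static void ex_bql_reset(struct ex_bql *q)
{
	q->queued = 0;
	q->completed = 0;
}
```

Without the reset, a queue stopped with 3000 bytes outstanding would still report them in flight after the interface comes back up, and would remain throttled.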

Signed-off-by: Lorenzo Bianconi <[email protected]>
Reviewed-by: Hariprasad Kelam <[email protected]>
Message-ID: <20241017-airoha-en7581-reset-bql-v1-1-08c0c9888de5@kernel.org>
Signed-off-by: Andrew Lunn <[email protected]>
SkyLake.Huang [Thu, 17 Oct 2024 03:22:13 +0000 (11:22 +0800)]
net: phy: mediatek-ge-soc: Propagate error code correctly in cal_cycle()

This patch propagates the error code correctly in cal_cycle()
and improves the code with FIELD_GET().
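The pattern can be sketched in user space as follows: a negative return from a register read is passed straight back to the caller instead of being decoded as data, and only a valid value goes through a FIELD_GET()-style extraction. EX_FIELD_GET is a stand-in for the kernel macro (it relies on the GCC/Clang __builtin_ctz()), and the mask and function name are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for the kernel's FIELD_GET(): mask, then shift right by the
 * position of the mask's lowest set bit. */
#define EX_FIELD_GET(mask, reg) \
	(((reg) & (mask)) >> __builtin_ctz(mask))

#define EX_CAL_MASK 0x01f0u /* hypothetical 5-bit field, bits 8..4 */

/* Takes the return value of a register read: a value or -errno. */
static int ex_decode_cal(int ret)
{
	/* Propagate the error instead of decoding it as data. */
	if (ret < 0)
		return ret;

	return (int)EX_FIELD_GET(EX_CAL_MASK, (unsigned int)ret);
}
```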

Signed-off-by: SkyLake.Huang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
SkyLake.Huang [Thu, 17 Oct 2024 03:22:12 +0000 (11:22 +0800)]
net: phy: mediatek-ge-soc: Shrink line wrapping to 80 characters

This patch wraps lines at 80 characters. Also, in
tx_amp_fill_result(), use FIELD_PREP() to tidy up the code.
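FIELD_PREP() is the inverse of FIELD_GET(): it shifts a value into the position described by a mask. A user-space stand-in, with a hypothetical register layout of two 6-bit amplitude fields, might look like this; the real kernel macro additionally rejects, at compile time, constant values that do not fit the mask.

```c
#include <stdint.h>

/* Stand-in for the kernel's FIELD_PREP(): shift the value into position,
 * then mask off anything that does not fit. */
#define EX_FIELD_PREP(mask, val) \
	(((uint32_t)(val) << __builtin_ctz(mask)) & (mask))

/* Hypothetical layout: two 6-bit amplitude fields in one register. */
#define EX_AMP_A_MASK 0x003fu
#define EX_AMP_B_MASK 0x0fc0u

static uint32_t ex_pack_amp(uint32_t a, uint32_t b)
{
	return EX_FIELD_PREP(EX_AMP_A_MASK, a) |
	       EX_FIELD_PREP(EX_AMP_B_MASK, b);
}
```

Compared with open-coded shifts, the field position lives only in the mask definition, so the shift count cannot drift out of sync with it.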

Signed-off-by: SkyLake.Huang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
SkyLake.Huang [Thu, 17 Oct 2024 03:22:11 +0000 (11:22 +0800)]
net: phy: mediatek-ge-soc: Fix coding style

This patch fixes spelling errors, rearranges variable declarations in
reverse Xmas tree order and removes unnecessary parentheses in
mediatek-ge-soc.c.

Signed-off-by: SkyLake.Huang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Heiner Kallweit [Wed, 16 Oct 2024 20:31:10 +0000 (22:31 +0200)]
r8169: remove rtl_dash_loop_wait_high/low

Remove rtl_dash_loop_wait_high/low to simplify the code.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Message-ID: <fb8c490c-2d92-48f5-8bbf-1fc1f2ee1649@gmail.com>
Signed-off-by: Andrew Lunn <[email protected]>
Heiner Kallweit [Wed, 16 Oct 2024 20:29:39 +0000 (22:29 +0200)]
r8169: avoid duplicated messages if loading firmware fails and switch to warn level

If loading the firmware fails, we report it at the driver level, and the
firmware loading code itself issues warnings as well. Therefore switch to
firmware_request_nowarn() to avoid duplicated error messages.
In addition, switch to warn level because the firmware is optional and
typically just fixes compatibility issues.

Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Message-ID: <d9c5094c-89a6-40e2-b5fe-8df7df4624ef@gmail.com>
Signed-off-by: Andrew Lunn <[email protected]>
Heiner Kallweit [Wed, 16 Oct 2024 20:06:53 +0000 (22:06 +0200)]
r8169: replace custom flag with disable_work() et al

So far we have used a custom flag to define when a task can be scheduled
and when not. Let's use the standard mechanism with disable_work() et al
instead.
Note that in rtl8169_close() we can remove the call to cancel_work(),
because rtl8169_down() already calls disable_work_sync().
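A toy model of the semantics this relies on, as we understand them: disabling a work item increments a disable count and cancels it, queueing attempts are refused while the count is non-zero, and each enable undoes one disable. All names are hypothetical; the real disable_work()/enable_work() also deal with concurrency and, in the _sync variant, wait for a running handler, none of which this sketch models.

```c
#include <stdbool.h>

struct ex_work {
	unsigned int disable_depth; /* > 0 means queueing is refused */
	bool pending;
};

/* Returns true if the work was newly queued. */
static bool ex_queue_work(struct ex_work *w)
{
	if (w->disable_depth > 0 || w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Takes the place of a driver-private "may schedule" flag. */
static void ex_disable_work(struct ex_work *w)
{
	w->disable_depth++;
	w->pending = false; /* disabling also cancels pending work */
}

static void ex_enable_work(struct ex_work *w)
{
	if (w->disable_depth > 0)
		w->disable_depth--;
}
```

Because the count nests, independent paths can disable the work without coordinating through a shared boolean.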

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Heiner Kallweit [Wed, 16 Oct 2024 20:05:57 +0000 (22:05 +0200)]
r8169: don't take RTNL lock in rtl_task()

There's not really a benefit here in taking the RTNL lock. The task
handler does exception handling only, so we're in trouble anyway when
we come here, and there's no need to protect against e.g. a parallel
ethtool call.
A benefit of removing the RTNL lock here is that we can now
synchronously cancel the work item from a context holding the RTNL mutex.

Signed-off-by: Heiner Kallweit <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Arnd Bergmann [Wed, 16 Oct 2024 06:22:58 +0000 (06:22 +0000)]
eth: fbnic: add CONFIG_PTP_1588_CLOCK_OPTIONAL dependency

fbnic fails to link as built-in when PTP support is in a loadable
module:

aarch64-linux-ld: drivers/net/ethernet/meta/fbnic/fbnic_ethtool.o: in function `fbnic_get_ts_info':
fbnic_ethtool.c:(.text+0x428): undefined reference to `ptp_clock_index'
aarch64-linux-ld: drivers/net/ethernet/meta/fbnic/fbnic_time.o: in function `fbnic_time_start':
fbnic_time.c:(.text+0x820): undefined reference to `ptp_schedule_worker'
aarch64-linux-ld: drivers/net/ethernet/meta/fbnic/fbnic_time.o: in function `fbnic_ptp_setup':
fbnic_time.c:(.text+0xa68): undefined reference to `ptp_clock_register'

Add the appropriate dependency to enforce this.
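The dependency works through Kconfig's tristate arithmetic. The sketch models n/m/y as 0/1/2 and assumes, as we understand the Kconfig in drivers/ptp, that PTP_1588_CLOCK_OPTIONAL evaluates to y when PTP_1588_CLOCK is n and to PTP_1588_CLOCK's own value otherwise; a `depends on` then caps the dependent symbol at that value. Function names are hypothetical.

```c
/* Kconfig tristate values modelled as ordered integers. */
enum ex_tristate { EX_N = 0, EX_M = 1, EX_Y = 2 };

/* Assumed value of PTP_1588_CLOCK_OPTIONAL for a given PTP setting. */
static enum ex_tristate ex_ptp_optional(enum ex_tristate ptp)
{
	return ptp == EX_N ? EX_Y : ptp;
}

/* Upper bound on the driver from "depends on PTP_1588_CLOCK_OPTIONAL". */
static enum ex_tristate ex_fbnic_limit(enum ex_tristate ptp)
{
	return ex_ptp_optional(ptp);
}

/* The failure above: built-in code calling into a loadable module. */
static int ex_links(enum ex_tristate fbnic, enum ex_tristate ptp)
{
	return !(fbnic == EX_Y && ptp == EX_M);
}
```

With PTP built as a module the limit drops to m, which is exactly the combination that links; with PTP disabled the driver may still be built in, relying on the PTP stubs.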

Fixes: 6a2b3ede9543 ("eth: fbnic: add RX packets timestamping support")
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Vadim Fedorenko <[email protected]>
Message-ID: <20241016062303.2551686[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Menglong Dong [Tue, 15 Oct 2024 09:02:44 +0000 (17:02 +0800)]
net: vxlan: update the document for vxlan_snoop()

The function vxlan_snoop() now returns drop reasons, so update its
documentation too.

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Menglong Dong [Tue, 15 Oct 2024 08:28:30 +0000 (16:28 +0800)]
net: vxlan: replace VXLAN_INVALID_HDR with VNI_NOT_FOUND

Replace the drop reason "SKB_DROP_REASON_VXLAN_INVALID_HDR" with
"SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND" in encap_bypass_if_local(), as the
latter is more accurate.

Fixes: 790961d88b0e ("net: vxlan: use kfree_skb_reason() in encap_bypass_if_local()")
Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Lorenzo Bianconi [Tue, 15 Oct 2024 07:58:09 +0000 (09:58 +0200)]
net: airoha: Fix typo in REG_CDM2_FWD_CFG configuration

Fix a typo in the airoha_fe_init routine when configuring the
CDM2_OAM_QSEL_MASK field of the REG_CDM2_FWD_CFG register.
This bug does not introduce any user-visible problem, since the Frame
Engine CDM2 port is used only by the second QDMA block, and we currently
enable only the QDMA1 block, which is connected to the MT7530 DSA switch
via the CDM1 port.

Introduced by commit 23020f049327 ("net: airoha: Introduce ethernet
support for EN7581 SoC")

Reported-by: ChihWei Cheng <[email protected]>
Signed-off-by: Lorenzo Bianconi <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Message-ID: <20241015-airoha-eth-cdm2-fixes-v1-1-9dc6993286c3@kernel.org>
Signed-off-by: Andrew Lunn <[email protected]>
Paul Barker [Tue, 15 Oct 2024 13:36:34 +0000 (14:36 +0100)]
net: ravb: Add VLAN checksum support

The GbEth IP supports offloading checksum calculation for VLAN-tagged
packets, provided that the EtherType is 0x8100 and only one VLAN tag is
present.
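The stated condition can be sketched as a predicate over a raw Ethernet frame. The sketch assumes the frame starts at the destination MAC, so the TPID sits at offset 12 and the inner EtherType at offset 16; the constant values match the kernel's ETH_P_8021Q and ETH_P_8021AD, but the function itself is hypothetical and not how ravb expresses the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_ETH_P_8021Q  0x8100u /* 802.1Q C-tag TPID */
#define EX_ETH_P_8021AD 0x88a8u /* 802.1ad S-tag TPID */

static uint16_t ex_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Exactly one VLAN tag, with TPID 0x8100. */
static bool ex_vlan_csum_offloadable(const uint8_t *frame, size_t len)
{
	uint16_t inner;

	if (len < 18 || ex_be16(frame + 12) != EX_ETH_P_8021Q)
		return false;

	inner = ex_be16(frame + 16);
	/* A second tag means stacked VLANs, which are not supported. */
	return inner != EX_ETH_P_8021Q && inner != EX_ETH_P_8021AD;
}
```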

Signed-off-by: Paul Barker <[email protected]>
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Paul Barker [Tue, 15 Oct 2024 13:36:33 +0000 (14:36 +0100)]
net: ravb: Enable IPv6 TX checksum offload for GbEth

The GbEth IP supports offloading IPv6 TCP, UDP & ICMPv6 checksums in the
TX path.
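Which IPv6 packets qualify can be sketched by looking at the Next Header byte (offset 6) of the fixed 40-byte IPv6 header. This is a simplification under stated assumptions: it ignores extension headers, and whether the GbEth IP handles those is not covered by this commit. The function name is hypothetical; the protocol numbers are the IANA values.

```c
#include <stdbool.h>
#include <stdint.h>

/* IANA protocol numbers carried in the IPv6 Next Header field. */
#define EX_IPPROTO_TCP    6
#define EX_IPPROTO_UDP    17
#define EX_IPPROTO_ICMPV6 58

/* ip6 points at the start of the fixed IPv6 header. */
static bool ex_ipv6_l4_csum_offloadable(const uint8_t *ip6)
{
	switch (ip6[6]) {
	case EX_IPPROTO_TCP:
	case EX_IPPROTO_UDP:
	case EX_IPPROTO_ICMPV6:
		return true;
	default:
		return false;
	}
}
```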

Signed-off-by: Paul Barker <[email protected]>
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Paul Barker [Tue, 15 Oct 2024 13:36:32 +0000 (14:36 +0100)]
net: ravb: Enable IPv6 RX checksum offloading for GbEth

The GbEth IP supports offloading IPv6 TCP, UDP & ICMPv6 checksums in the
RX path.

Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Paul Barker <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Paul Barker [Tue, 15 Oct 2024 13:36:31 +0000 (14:36 +0100)]
net: ravb: Simplify UDP TX checksum offload

The GbEth IP will pass through a zero UDP checksum without asserting any
error flags so we do not need to resort to software checksum calculation
in this case.

Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Paul Barker <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Paul Barker [Tue, 15 Oct 2024 13:36:30 +0000 (14:36 +0100)]
net: ravb: Disable IP header TX checksum offloading

For IPv4 packets, the header checksum will always be calculated in software
in the TX path (Documentation/networking/checksum-offloads.rst says "No
offloading of the IP header checksum is performed; it is always done in
software.") so there is no advantage in asking the hardware to also
calculate this checksum.
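For reference, the software calculation the stack performs is the RFC 1071 Internet checksum over the IPv4 header. A plain, unoptimized C version (the kernel uses its own optimized routines such as ip_fast_csum()) might look like this; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 checksum over an IPv4 header: ones' complement sum of 16-bit
 * big-endian words, carries folded back in, result inverted. hdr_len is
 * IHL * 4 bytes. With the checksum field zeroed this yields the value to
 * store; over a header that already carries a valid checksum it yields 0.
 */
static uint16_t ex_ipv4_hdr_csum(const uint8_t *hdr, size_t hdr_len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < hdr_len; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```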

Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Paul Barker <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Paul Barker [Tue, 15 Oct 2024 13:36:29 +0000 (14:36 +0100)]
net: ravb: Simplify types in RX csum validation

The hardware checksum value is used as a 16-bit flag: it is zero when
the checksum has been validated and non-zero otherwise. Therefore we
don't need to treat it as an actual __wsum type or call csum_unfold();
we can just use a u16 pointer.
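A user-space sketch of treating the value purely as a flag. It assumes, as the neighbouring commits in this log suggest, that the hardware result sits in the trailing bytes of the received packet; the function name is hypothetical. memcpy() avoids an unaligned load, and since only zero versus non-zero matters, byte order is irrelevant.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* csum_bytes points at the 2-byte hardware result after the packet. */
static bool ex_hw_csum_ok(const uint8_t *csum_bytes)
{
	uint16_t flag;

	memcpy(&flag, csum_bytes, sizeof(flag));
	return flag == 0; /* zero: validated; anything else: not validated */
}
```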

Signed-off-by: Paul Barker <[email protected]>
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Paul Barker [Tue, 15 Oct 2024 13:36:28 +0000 (14:36 +0100)]
net: ravb: Combine if conditions in RX csum validation

We can merge the two if conditions on skb_is_nonlinear(). Since
skb_frag_size_sub() and skb_trim() do not free memory, it is still safe
to access the trimmed bytes at the end of the packet after these calls.

Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Paul Barker <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Paul Barker [Tue, 15 Oct 2024 13:36:27 +0000 (14:36 +0100)]
net: ravb: Drop IP protocol check from RX csum verification

We do not need to confirm that the protocol is IPv4. If the hardware
encounters an unsupported protocol, it will set the checksum value to
0xFFFF.

Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Paul Barker <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Paul Barker [Tue, 15 Oct 2024 13:36:26 +0000 (14:36 +0100)]
net: ravb: Disable IP header RX checksum offloading

For IPv4 packets, the header checksum will always be checked in software
in the RX path (inet_gro_receive() calls ip_fast_csum() unconditionally)
so there is no advantage in asking the hardware to also calculate this
checksum.

Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Paul Barker <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Paul Barker [Tue, 15 Oct 2024 13:36:25 +0000 (14:36 +0100)]
net: ravb: Factor out checksum offload enable bits

Introduce new constants for the CSR1 (TX) and CSR2 (RX) checksum enable
bits, removing the risk of inconsistency when we change which flags we
enable.

Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Paul Barker <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Andy Shevchenko [Wed, 16 Oct 2024 09:05:54 +0000 (12:05 +0300)]
tg3: Increase buffer size for IRQ label

GCC is not happy with the current code, e.g.:

.../tg3.c:11313:37: error: ‘-txrx-’ directive output may be truncated writing 6 bytes into a region of size between 1 and 16 [-Werror=format-truncation=]
11313 |                                  "%s-txrx-%d", tp->dev->name, irq_num);
      |                                     ^~~~~~
.../tg3.c:11313:34: note: using the range [-2147483648, 2147483647] for directive argument
11313 |                                  "%s-txrx-%d", tp->dev->name, irq_num);

When `make W=1` is supplied, this breaks the kernel build. Fix it by
increasing the buffer size for the IRQ label and using sizeof() instead
of hard-coded constants.
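The sizing argument can be sketched in user space. EX_IFNAMSIZ mirrors Linux's IFNAMSIZ (16, including the NUL); the struct, macro and function names are hypothetical, and the exact size chosen in tg3 may differ. Deriving the size from the worst case and passing sizeof() to snprintf() keeps the bound and the buffer in one place.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define EX_IFNAMSIZ 16 /* mirrors Linux IFNAMSIZ, including the NUL */

/*
 * Worst case for "%s-txrx-%d": a 15-character name, 6 bytes of "-txrx-",
 * 11 characters for INT_MIN ("-2147483648") and the NUL. EX_IFNAMSIZ
 * already counts the NUL, so 16 + 6 + 11 = 33 bytes.
 */
#define EX_IRQ_LABEL_LEN (EX_IFNAMSIZ + 6 + 11)

struct ex_irq {
	char label[EX_IRQ_LABEL_LEN];
};

/* Returns the length snprintf() wanted; >= sizeof(label) means truncation. */
static int ex_set_label(struct ex_irq *irq, const char *name, int num)
{
	return snprintf(irq->label, sizeof(irq->label), "%s-txrx-%d",
			name, num);
}
```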

Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Michael Chan <[email protected]>
Message-ID: <20241016090647[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Russell King (Oracle) [Wed, 16 Oct 2024 09:58:44 +0000 (10:58 +0100)]
net: phylink: remove "using_mac_select_pcs"

With DSA's implementation of the mac_select_pcs() method removed, we
can now remove the detection of mac_select_pcs() implementation.

Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Russell King (Oracle) [Wed, 16 Oct 2024 09:58:39 +0000 (10:58 +0100)]
net: phylink: remove use of pl->pcs in phylink_validate_mac_and_pcs()

When the mac_select_pcs() method is not implemented, there is no way
for pl->pcs to be set to a non-NULL value. This code was there to support
the old phylink_set_pcs() method, which was removed a few years
ago. Simplify the code in phylink_validate_mac_and_pcs().

Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Russell King (Oracle) [Wed, 16 Oct 2024 09:58:34 +0000 (10:58 +0100)]
net: phylink: allow mac_select_pcs() to remove a PCS

phylink has historically not permitted a PCS to be removed. An attempt
to permit this with phylink_set_pcs() resulted in comments indicating
that there was no need for this. This behaviour has been propagated
forward to the mac_select_pcs() approach as it was believed from these
comments that changing this would be NAK'd.

However, with mac_select_pcs(), maintaining this behaviour takes more
code and thus more complexity, which can, and in this case did, result
in a bug. If mac_select_pcs() returns NULL for a particular interface
type but a PCS is already in use, then we skip the pcs_validate()
method but continue using the old PCS. Also, this is not the behaviour
implementers of mac_select_pcs() would expect.

Allow this by removing this old unnecessary restriction.
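The behavioural difference can be reduced to a toy model. The structs and functions are hypothetical and only capture the one decision described above: whether a NULL selection clears the current PCS or silently keeps the old one.

```c
#include <stddef.h>

struct ex_pcs {
	int id;
};

/* Hypothetical link state holding the currently attached PCS. */
struct ex_link {
	struct ex_pcs *pcs;
};

/* Old behaviour: a NULL selection silently kept the previous PCS. */
static void ex_select_pcs_old(struct ex_link *l, struct ex_pcs *sel)
{
	if (sel)
		l->pcs = sel;
}

/* New behaviour: NULL means "no PCS for this interface mode". */
static void ex_select_pcs_new(struct ex_link *l, struct ex_pcs *sel)
{
	l->pcs = sel;
}
```

With the old logic, switching to an interface mode that needs no PCS leaves the previous one attached, which is the bug described above.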

Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Russell King (Oracle) [Wed, 16 Oct 2024 09:58:29 +0000 (10:58 +0100)]
net: dsa: mv88e6xxx: return NULL when no PCS is present

Rather than returning an EOPNOTSUPP error pointer when the switch
has no support for PCS, return NULL to indicate that no PCS is
required.
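The distinction between an error pointer and NULL can be shown with a user-space re-creation of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention, in which the top MAX_ERRNO (4095) addresses encode -errno values. The two select functions are hypothetical stand-ins for a switch port without a PCS.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_ERRNO 4095

static void *ex_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long ex_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static bool ex_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-EX_MAX_ERRNO;
}

struct ex_pcs {
	int id;
};

/* Before: "no PCS" reported as an error the caller must special-case. */
static struct ex_pcs *ex_select_pcs_old(void)
{
	return ex_err_ptr(-EOPNOTSUPP);
}

/* After: NULL simply means no PCS is required. */
static struct ex_pcs *ex_select_pcs_new(void)
{
	return NULL;
}
```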

Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Russell King (Oracle) [Wed, 16 Oct 2024 09:58:24 +0000 (10:58 +0100)]
net: dsa: remove dsa_port_phylink_mac_select_pcs()

There is no longer any reason to implement the mac_select_pcs()
callback in DSA. Returning ERR_PTR(-EOPNOTSUPP) is functionally
equivalent to not providing the function.

Signed-off-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Andy Shevchenko [Wed, 16 Oct 2024 13:25:26 +0000 (16:25 +0300)]
net: ks8851: use %*ph to print small buffer

Use %*ph format to print small buffer as hex string. It will change
the output format from 32-bit words to byte hexdump, but this is not
critical as it's only a debug message.

Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Message-ID: <20241016132615[email protected]>
Signed-off-by: Andrew Lunn <[email protected]>
Simon Horman [Wed, 16 Oct 2024 14:31:14 +0000 (15:31 +0100)]
net: usb: sr9700: only store little-endian values in __le16 variable

In sr_mdio_read() the local variable res is used to store both
little-endian and host byte order values. This prevents Sparse
from helping us by flagging endian mismatches, since its
detection hinges on the types of variables matching the
byte order of the values stored in them.

Address this by adding a new local variable, word, to store little-endian
values; change the type of res to int, and use it to store host-byte
order values.
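The principle of keeping raw little-endian data and host-order results in separate variables can be sketched in user space. In kernel code the raw value would be an __le16 converted with le16_to_cpu(); here the two bytes are assembled explicitly so the result is correct on any host. The function name is hypothetical.

```c
#include <stdint.h>

/* word holds the two raw bytes exactly as the device returned them. */
static int ex_mdio_result(const uint8_t word[2])
{
	int res = word[0] | (word[1] << 8); /* host byte order from here on */

	return res;
}
```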

Flagged by Sparse as:

.../sr9700.c:205:21: warning: incorrect type in assignment (different base types)
.../sr9700.c:205:21:    expected restricted __le16 [addressable] [usertype] res
.../sr9700.c:205:21:    got int
.../sr9700.c:207:21: warning: incorrect type in assignment (different base types)
.../sr9700.c:207:21:    expected restricted __le16 [addressable] [usertype] res
.../sr9700.c:207:21:    got int
.../sr9700.c:212:16: warning: incorrect type in return expression (different base types)
.../sr9700.c:212:16:    expected int
.../sr9700.c:212:16:    got restricted __le16 [addressable] [usertype] res

Compile tested only.
No functional change intended.

Signed-off-by: Simon Horman <[email protected]>
Message-ID: <20241016-blackbird-le16-v1-1-97ba8de6b38f@kernel.org>
Signed-off-by: Andrew Lunn <[email protected]>
Dan Carpenter [Wed, 16 Oct 2024 14:41:44 +0000 (17:41 +0300)]
net: ethernet: ti: am65-cpsw: Fix uninitialized variable

The *ndev pointer needs to be set or it leads to an uninitialized variable
bug in the caller.

Fixes: 4a7b2ba94a59 ("net: ethernet: ti: am65-cpsw: Use tstats instead of open coded version")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Roger Quadros <[email protected]>
Message-ID: <b168d5c7-704b-4452-84f9-1c1762b1f4ce@stanley.mountain>
Signed-off-by: Andrew Lunn <[email protected]>
Linus Torvalds [Thu, 17 Oct 2024 16:31:18 +0000 (09:31 -0700)]
Merge tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Current release - new code bugs:

   - eth: mlx5: HWS, don't destroy more bwc queue locks than allocated

  Previous releases - regressions:

   - ipv4: give an IPv4 dev to blackhole_netdev

   - udp: compute L4 checksum as usual when not segmenting the skb

   - tcp/dccp: don't use timer_pending() in reqsk_queue_unlink().

   - eth: mlx5e: don't call cleanup on profile rollback failure

   - eth: microchip: vcap api: fix memory leaks in
     vcap_api_encode_rule_test()

   - eth: enetc: disable Tx BD rings after they are empty

   - eth: macb: avoid 20s boot delay by skipping MDIO bus registration
     for fixed-link PHY

  Previous releases - always broken:

   - posix-clock: fix missing timespec64 check in pc_clock_settime()

   - genetlink: hold RCU in genlmsg_mcast()

   - mptcp: prevent MPC handshake on port-based signal endpoints

   - eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame

   - eth: stmmac: dwmac-tegra: fix link bring-up sequence

   - eth: bcmasp: fix potential memory leak in bcmasp_xmit()

  Misc:

   - add Andrew Lunn as a co-maintainer of all networking drivers"

* tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  net/mlx5e: Don't call cleanup on profile rollback failure
  net/mlx5: Unregister notifier on eswitch init failure
  net/mlx5: Fix command bitmask initialization
  net/mlx5: Check for invalid vector index on EQ creation
  net/mlx5: HWS, use lock classes for bwc locks
  net/mlx5: HWS, don't destroy more bwc queue locks than allocated
  net/mlx5: HWS, fixed double free in error flow of definer layout
  net/mlx5: HWS, removed wrong access to a number of rules variable
  mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
  net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
  vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
  net: dsa: vsc73xx: fix reception from VLAN-unaware bridges
  net: ravb: Only advertise Rx/Tx timestamps if hardware supports it
  net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
  net: phy: mdio-bcm-unimac: Add BCM6846 support
  dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio
  udp: Compute L4 checksum as usual when not segmenting the skb
  genetlink: hold RCU in genlmsg_mcast()
  net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361
  tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
  ...

Heiner Kallweit [Tue, 15 Oct 2024 05:47:14 +0000 (07:47 +0200)]
net: phy: realtek: merge the drivers for internal NBase-T PHY's

The Realtek RTL8125/RTL8126 NBase-T MAC/PHY chips have internal PHYs
which are register-compatible, at least for the registers we use here.
So let's use just one PHY driver to support all of them.
These internal PHYs also exist as external C45 PHYs, but on the
internal PHYs no access to MMD registers is possible. This can be
used to differentiate between the internal and external versions.

As a side effect, the drivers for the two now external-only PHYs no
longer require read_mmd/write_mmd hooks.

Signed-off-by: Heiner Kallweit <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Sanman Pradhan [Mon, 14 Oct 2024 15:27:09 +0000 (08:27 -0700)]
eth: fbnic: Add hardware monitoring support via HWMON interface

This patch adds support for hardware monitoring to the fbnic driver,
allowing for temperature and voltage sensor data to be exposed to
userspace via the HWMON interface. The driver registers a HWMON device
and provides callbacks for reading sensor data, enabling system
admins to monitor the health and operating conditions of fbnic.

Signed-off-by: Sanman Pradhan <[email protected]>
Reviewed-by: Kalesh AP <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Paolo Abeni [Thu, 17 Oct 2024 10:14:10 +0000 (12:14 +0200)]
Merge branch 'mlx5-misc-fixes-2024-10-15'

Tariq Toukan says:

====================
mlx5 misc fixes 2024-10-15

This patchset provides misc bug fixes from the team to the mlx5 core and
Eth drivers.

Series generated against:
commit 174714f0e505 ("selftests: drivers: net: fix name not defined")
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
Cosmin Ratiu [Tue, 15 Oct 2024 09:32:08 +0000 (12:32 +0300)]
net/mlx5e: Don't call cleanup on profile rollback failure

When profile rollback fails in mlx5e_netdev_change_profile(), the netdev
profile variable is left set to NULL. Avoid a crash when unloading the
driver by not calling profile->cleanup in such a case.

This was encountered during testing. The original trigger was that
creation of the wq rescuer thread got interrupted (presumably due to
Ctrl+C-ing modprobe), which mlx5e_priv_init converts to ENOMEM (-12).
The profile rollback then also fails for the same reason (signal still
active), so the profile is left as NULL, leading to a crash later in
_mlx5e_remove.

 [  732.473932] mlx5_core 0000:08:00.1: E-Switch: Unload vfs: mode(OFFLOADS), nvfs(2), necvfs(0), active vports(2)
 [  734.525513] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
 [  734.557372] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
 [  734.559187] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: new profile init failed, -12
 [  734.560153] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
 [  734.589378] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
 [  734.591136] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12
 [  745.537492] BUG: kernel NULL pointer dereference, address: 0000000000000008
 [  745.538222] #PF: supervisor read access in kernel mode
<snipped>
 [  745.551290] Call Trace:
 [  745.551590]  <TASK>
 [  745.551866]  ? __die+0x20/0x60
 [  745.552218]  ? page_fault_oops+0x150/0x400
 [  745.555307]  ? exc_page_fault+0x79/0x240
 [  745.555729]  ? asm_exc_page_fault+0x22/0x30
 [  745.556166]  ? mlx5e_remove+0x6b/0xb0 [mlx5_core]
 [  745.556698]  auxiliary_bus_remove+0x18/0x30
 [  745.557134]  device_release_driver_internal+0x1df/0x240
 [  745.557654]  bus_remove_device+0xd7/0x140
 [  745.558075]  device_del+0x15b/0x3c0
 [  745.558456]  mlx5_rescan_drivers_locked.part.0+0xb1/0x2f0 [mlx5_core]
 [  745.559112]  mlx5_unregister_device+0x34/0x50 [mlx5_core]
 [  745.559686]  mlx5_uninit_one+0x46/0xf0 [mlx5_core]
 [  745.560203]  remove_one+0x4e/0xd0 [mlx5_core]
 [  745.560694]  pci_device_remove+0x39/0xa0
 [  745.561112]  device_release_driver_internal+0x1df/0x240
 [  745.561631]  driver_detach+0x47/0x90
 [  745.562022]  bus_remove_driver+0x84/0x100
 [  745.562444]  pci_unregister_driver+0x3b/0x90
 [  745.562890]  mlx5_cleanup+0xc/0x1b [mlx5_core]
 [  745.563415]  __x64_sys_delete_module+0x14d/0x2f0
 [  745.563886]  ? kmem_cache_free+0x1b0/0x460
 [  745.564313]  ? lockdep_hardirqs_on_prepare+0xe2/0x190
 [  745.564825]  do_syscall_64+0x6d/0x140
 [  745.565223]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 [  745.565725] RIP: 0033:0x7f1579b1288b

Fixes: 3ef14e463f6e ("net/mlx5e: Separate between netdev objects and mlx5e profiles initialization")
Signed-off-by: Cosmin Ratiu <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Cosmin Ratiu [Tue, 15 Oct 2024 09:32:07 +0000 (12:32 +0300)]
net/mlx5: Unregister notifier on eswitch init failure

Otherwise it remains registered, and a subsequent attempt to enable the
eswitch might trigger warnings such as:

[  682.589148] ------------[ cut here ]------------
[  682.590204] notifier callback eswitch_vport_event [mlx5_core] already registered
[  682.590256] WARNING: CPU: 13 PID: 2660 at kernel/notifier.c:31 notifier_chain_register+0x3e/0x90
[...snipped]
[  682.610052] Call Trace:
[  682.610369]  <TASK>
[  682.610663]  ? __warn+0x7c/0x110
[  682.611050]  ? notifier_chain_register+0x3e/0x90
[  682.611556]  ? report_bug+0x148/0x170
[  682.611977]  ? handle_bug+0x36/0x70
[  682.612384]  ? exc_invalid_op+0x13/0x60
[  682.612817]  ? asm_exc_invalid_op+0x16/0x20
[  682.613284]  ? notifier_chain_register+0x3e/0x90
[  682.613789]  atomic_notifier_chain_register+0x25/0x40
[  682.614322]  mlx5_eswitch_enable_locked+0x1d4/0x3b0 [mlx5_core]
[  682.614965]  mlx5_eswitch_enable+0xc9/0x100 [mlx5_core]
[  682.615551]  mlx5_device_enable_sriov+0x25/0x340 [mlx5_core]
[  682.616170]  mlx5_core_sriov_configure+0x50/0x170 [mlx5_core]
[  682.616789]  sriov_numvfs_store+0xb0/0x1b0
[  682.617248]  kernfs_fop_write_iter+0x117/0x1a0
[  682.617734]  vfs_write+0x231/0x3f0
[  682.618138]  ksys_write+0x63/0xe0
[  682.618536]  do_syscall_64+0x4c/0x100
[  682.618958]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
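The fix follows the usual kernel error-unwinding idiom: once a step has succeeded, every later failure path undoes it before returning. A toy model with hypothetical names, where a second registration is refused with -EEXIST to stand in for the warning above:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the eswitch and its event notifier. */
struct ex_esw {
	bool notifier_registered;
};

static int ex_register_notifier(struct ex_esw *esw)
{
	if (esw->notifier_registered)
		return -EEXIST; /* models the "already registered" warning */
	esw->notifier_registered = true;
	return 0;
}

static void ex_unregister_notifier(struct ex_esw *esw)
{
	esw->notifier_registered = false;
}

/* later_step_err simulates a failing init step after registration. */
static int ex_esw_enable(struct ex_esw *esw, int later_step_err)
{
	int err = ex_register_notifier(esw);

	if (err)
		return err;

	err = later_step_err;
	if (err)
		goto err_unregister;

	return 0;

err_unregister:
	ex_unregister_notifier(esw);
	return err;
}
```

Without the err_unregister path, the first failed enable would leave the notifier registered, and the retry would fail in ex_register_notifier().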

Fixes: 7624e58a8b3a ("net/mlx5: E-switch, register event handler before arming the event")
Signed-off-by: Cosmin Ratiu <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Shay Drory [Tue, 15 Oct 2024 09:32:06 +0000 (12:32 +0300)]
net/mlx5: Fix command bitmask initialization

The command bitmask has a dedicated bit for the MANAGE_PAGES command. This
bit isn't initialized during command bitmask initialization, only during
MANAGE_PAGES.

In addition, mlx5_cmd_trigger_completions() tries to trigger
completion for the MANAGE_PAGES command as well.

Hence, if a health error occurs before any MANAGE_PAGES command
has been invoked (for example, during mlx5_enable_hca()),
mlx5_cmd_trigger_completions() will try to trigger completion for the
MANAGE_PAGES command, which results in a null-ptr-deref error.[1]

Fix it by initializing the command bitmask correctly.

While at it, rewrite the code for readability.

[1]
BUG: KASAN: null-ptr-deref in mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core]
Write of size 4 at addr 0000000000000214 by task kworker/u96:2/12078
CPU: 10 PID: 12078 Comm: kworker/u96:2 Not tainted 6.9.0-rc2_for_upstream_debug_2024_04_07_19_01 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_health0000:08:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]
Call Trace:
 <TASK>
 dump_stack_lvl+0x7e/0xc0
 kasan_report+0xb9/0xf0
 kasan_check_range+0xec/0x190
 mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core]
 mlx5_cmd_flush+0x94/0x240 [mlx5_core]
 enter_error_state+0x6c/0xd0 [mlx5_core]
 mlx5_fw_fatal_reporter_err_work+0xf3/0x480 [mlx5_core]
 process_one_work+0x787/0x1490
 ? lockdep_hardirqs_on_prepare+0x400/0x400
 ? pwq_dec_nr_in_flight+0xda0/0xda0
 ? assign_work+0x168/0x240
 worker_thread+0x586/0xd30
 ? rescuer_thread+0xae0/0xae0
 kthread+0x2df/0x3b0
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x2d/0x70
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork_asm+0x11/0x20
 </TASK>

Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
5 months agonet/mlx5: Check for invalid vector index on EQ creation
Maher Sanalla [Tue, 15 Oct 2024 09:32:05 +0000 (12:32 +0300)]
net/mlx5: Check for invalid vector index on EQ creation

Currently, the mlx5 driver does not enforce that the vector index is
lower than the maximum number of supported completion vectors when a new
completion EQ is requested. Thus, mlx5_comp_eqn_get() fails when trying
to acquire an IRQ with an improper vector index.

To prevent this, enforce in mlx5_comp_eqn_get() that the vector index is
valid and lower than the maximum before handling the request.

Fixes: f14c1a14e632 ("net/mlx5: Allocate completion EQs dynamically")
Signed-off-by: Maher Sanalla <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
5 months agonet/mlx5: HWS, use lock classes for bwc locks
Cosmin Ratiu [Tue, 15 Oct 2024 09:32:04 +0000 (12:32 +0300)]
net/mlx5: HWS, use lock classes for bwc locks

The HWS BWC API uses one lock per queue and usually acquires one of
them, except when doing changes which require locking all queues in
order. Naturally, lockdep isn't too happy about acquiring the same lock
class multiple times, so inform it that each queue lock is a different
class to avoid false positives.

Fixes: 2ca62599aa0b ("net/mlx5: HWS, added send engine and context handling")
Signed-off-by: Cosmin Ratiu <[email protected]>
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
5 months agonet/mlx5: HWS, don't destroy more bwc queue locks than allocated
Cosmin Ratiu [Tue, 15 Oct 2024 09:32:03 +0000 (12:32 +0300)]
net/mlx5: HWS, don't destroy more bwc queue locks than allocated

hws_send_queues_bwc_locks_destroy() destroyed more queue locks than were
allocated, leading to occasional memory corruption and to warnings such
as DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock)) in __mutex_destroy, because
sometimes the 'mutex' being destroyed was random memory.
The severity of this problem is proportional to the number of queues
configured, because the code reaches beyond the end of the
bwc_send_queue_locks array by 2x its length.

Fix that by using the correct number of bwc queues.
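The general rule is that teardown iterates over the same count used for allocation. A userspace sketch with a hypothetical lock type whose 'live' flag tracks init/destroy pairing; the struct and function names are illustrative, not the HWS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical lock object: 'live' tracks init/destroy pairing. */
struct qlock {
	int live;
};

struct bwc_ctx {
	size_t num_bwc_queues; /* size of 'locks', not the total queue count */
	struct qlock *locks;
};

static int bwc_locks_init(struct bwc_ctx *ctx, size_t num_bwc_queues)
{
	ctx->locks = calloc(num_bwc_queues, sizeof(*ctx->locks));
	if (!ctx->locks)
		return -1;
	ctx->num_bwc_queues = num_bwc_queues;
	for (size_t i = 0; i < num_bwc_queues; i++)
		ctx->locks[i].live = 1;
	return 0;
}

/* Destroy exactly as many locks as were allocated; looping over the
 * total queue count instead would walk past the end of 'locks'. */
static size_t bwc_locks_destroy(struct bwc_ctx *ctx)
{
	size_t destroyed = 0;

	for (size_t i = 0; i < ctx->num_bwc_queues; i++) {
		assert(ctx->locks[i].live);
		ctx->locks[i].live = 0;
		destroyed++;
	}
	free(ctx->locks);
	ctx->locks = NULL;
	return destroyed;
}
```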

Fixes: 2ca62599aa0b ("net/mlx5: HWS, added send engine and context handling")
Signed-off-by: Cosmin Ratiu <[email protected]>
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
5 months agonet/mlx5: HWS, fixed double free in error flow of definer layout
Yevgeny Kliteynik [Tue, 15 Oct 2024 09:32:02 +0000 (12:32 +0300)]
net/mlx5: HWS, fixed double free in error flow of definer layout

Fix an error-flow bug that could lead to a double free of a buffer
when calculating a suitable definer layout fails.
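A common way to avoid this class of bug is to give the buffer a single owner that frees it exactly once on every path. The sketch below is hypothetical (calc_layout, build_definer and the free counter are illustrative only) and does not reproduce the actual HWS definer code; it only shows the helper leaving ownership with the caller on failure.

```c
#include <assert.h>
#include <stdlib.h>

static int frees; /* counts frees so single ownership can be checked */

static void buf_free(void *p)
{
	if (p)
		frees++;
	free(p);
}

/* Hypothetical layout calculation: on failure it does NOT free 'buf',
 * so ownership stays with the caller. */
static int calc_layout(int *buf, int fail)
{
	assert(buf != NULL);
	if (fail)
		return -1;
	buf[0] = 42;
	return 0;
}

static int build_definer(int fail)
{
	int *buf = malloc(16 * sizeof(*buf));
	int err;

	if (!buf)
		return -1;
	err = calc_layout(buf, fail);
	buf_free(buf); /* exactly one free on both success and error paths */
	return err;
}
```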

Fixes: 74a778b4a63f ("net/mlx5: HWS, added definers handling")
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Reviewed-by: Itamar Gozlan <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
5 months agonet/mlx5: HWS, removed wrong access to a number of rules variable
Yevgeny Kliteynik [Tue, 15 Oct 2024 09:32:01 +0000 (12:32 +0300)]
net/mlx5: HWS, removed wrong access to a number of rules variable

Removed a wrong access to the num_of_rules field of the matcher.
This is a plain u32 variable, but it was accessed as if it were atomic.

This fixes the following CI warnings:
  mlx5hws_bwc.c:708:17: warning: large atomic operation may incur significant performance penalty;
  the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Watomic-alignment]

Fixes: 510f9f61a112 ("net/mlx5: HWS, added API and enabled HWS support")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Yevgeny Kliteynik <[email protected]>
Reviewed-by: Itamar Gozlan <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
5 months agomptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
Matthieu Baerts (NGI0) [Tue, 15 Oct 2024 08:38:47 +0000 (10:38 +0200)]
mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow

Syzkaller reported this splat:

  ==================================================================
  BUG: KASAN: slab-use-after-free in mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
  Read of size 4 at addr ffff8880569ac858 by task syz.1.2799/14662

  CPU: 0 UID: 0 PID: 14662 Comm: syz.1.2799 Not tainted 6.12.0-rc2-syzkaller-00307-g36c254515dc6 #0
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
  Call Trace:
   <TASK>
   __dump_stack lib/dump_stack.c:94 [inline]
   dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
   print_address_description mm/kasan/report.c:377 [inline]
   print_report+0xc3/0x620 mm/kasan/report.c:488
   kasan_report+0xd9/0x110 mm/kasan/report.c:601
   mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
   mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
   mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
   mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
   genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
   genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
   genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
   netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
   genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
   netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
   netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
   netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
   sock_sendmsg_nosec net/socket.c:729 [inline]
   __sock_sendmsg net/socket.c:744 [inline]
   ____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
   ___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
   __sys_sendmsg+0x117/0x1f0 net/socket.c:2690
   do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
   __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
   do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
   entry_SYSENTER_compat_after_hwframe+0x84/0x8e
  RIP: 0023:0xf7fe4579
  Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
  RSP: 002b:00000000f574556c EFLAGS: 00000296 ORIG_RAX: 0000000000000172
  RAX: ffffffffffffffda RBX: 000000000000000b RCX: 0000000020000140
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000
  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
   </TASK>

  Allocated by task 5387:
   kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
   kasan_save_track+0x14/0x30 mm/kasan/common.c:68
   poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
   __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:394
   kmalloc_noprof include/linux/slab.h:878 [inline]
   kzalloc_noprof include/linux/slab.h:1014 [inline]
   subflow_create_ctx+0x87/0x2a0 net/mptcp/subflow.c:1803
   subflow_ulp_init+0xc3/0x4d0 net/mptcp/subflow.c:1956
   __tcp_set_ulp net/ipv4/tcp_ulp.c:146 [inline]
   tcp_set_ulp+0x326/0x7f0 net/ipv4/tcp_ulp.c:167
   mptcp_subflow_create_socket+0x4ae/0x10a0 net/mptcp/subflow.c:1764
   __mptcp_subflow_connect+0x3cc/0x1490 net/mptcp/subflow.c:1592
   mptcp_pm_create_subflow_or_signal_addr+0xbda/0x23a0 net/mptcp/pm_netlink.c:642
   mptcp_pm_nl_fully_established net/mptcp/pm_netlink.c:650 [inline]
   mptcp_pm_nl_work+0x3a1/0x4f0 net/mptcp/pm_netlink.c:943
   mptcp_worker+0x15a/0x1240 net/mptcp/protocol.c:2777
   process_one_work+0x958/0x1b30 kernel/workqueue.c:3229
   process_scheduled_works kernel/workqueue.c:3310 [inline]
   worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391
   kthread+0x2c1/0x3a0 kernel/kthread.c:389
   ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

  Freed by task 113:
   kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
   kasan_save_track+0x14/0x30 mm/kasan/common.c:68
   kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:579
   poison_slab_object mm/kasan/common.c:247 [inline]
   __kasan_slab_free+0x51/0x70 mm/kasan/common.c:264
   kasan_slab_free include/linux/kasan.h:230 [inline]
   slab_free_hook mm/slub.c:2342 [inline]
   slab_free mm/slub.c:4579 [inline]
   kfree+0x14f/0x4b0 mm/slub.c:4727
   kvfree+0x47/0x50 mm/util.c:701
   kvfree_rcu_list+0xf5/0x2c0 kernel/rcu/tree.c:3423
   kvfree_rcu_drain_ready kernel/rcu/tree.c:3563 [inline]
   kfree_rcu_monitor+0x503/0x8b0 kernel/rcu/tree.c:3632
   kfree_rcu_shrink_scan+0x245/0x3a0 kernel/rcu/tree.c:3966
   do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435
   shrink_slab+0x32b/0x12a0 mm/shrinker.c:662
   shrink_one+0x47e/0x7b0 mm/vmscan.c:4818
   shrink_many mm/vmscan.c:4879 [inline]
   lru_gen_shrink_node mm/vmscan.c:4957 [inline]
   shrink_node+0x2452/0x39d0 mm/vmscan.c:5937
   kswapd_shrink_node mm/vmscan.c:6765 [inline]
   balance_pgdat+0xc19/0x18f0 mm/vmscan.c:6957
   kswapd+0x5ea/0xbf0 mm/vmscan.c:7226
   kthread+0x2c1/0x3a0 kernel/kthread.c:389
   ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

  Last potentially related work creation:
   kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
   __kasan_record_aux_stack+0xba/0xd0 mm/kasan/generic.c:541
   kvfree_call_rcu+0x74/0xbe0 kernel/rcu/tree.c:3810
   subflow_ulp_release+0x2ae/0x350 net/mptcp/subflow.c:2009
   tcp_cleanup_ulp+0x7c/0x130 net/ipv4/tcp_ulp.c:124
   tcp_v4_destroy_sock+0x1c5/0x6a0 net/ipv4/tcp_ipv4.c:2541
   inet_csk_destroy_sock+0x1a3/0x440 net/ipv4/inet_connection_sock.c:1293
   tcp_done+0x252/0x350 net/ipv4/tcp.c:4870
   tcp_rcv_state_process+0x379b/0x4f30 net/ipv4/tcp_input.c:6933
   tcp_v4_do_rcv+0x1ad/0xa90 net/ipv4/tcp_ipv4.c:1938
   sk_backlog_rcv include/net/sock.h:1115 [inline]
   __release_sock+0x31b/0x400 net/core/sock.c:3072
   __tcp_close+0x4f3/0xff0 net/ipv4/tcp.c:3142
   __mptcp_close_ssk+0x331/0x14d0 net/mptcp/protocol.c:2489
   mptcp_close_ssk net/mptcp/protocol.c:2543 [inline]
   mptcp_close_ssk+0x150/0x220 net/mptcp/protocol.c:2526
   mptcp_pm_nl_rm_addr_or_subflow+0x2be/0xcc0 net/mptcp/pm_netlink.c:878
   mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
   mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
   mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
   genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
   genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
   genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
   netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
   genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
   netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
   netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
   netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
   sock_sendmsg_nosec net/socket.c:729 [inline]
   __sock_sendmsg net/socket.c:744 [inline]
   ____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
   ___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
   __sys_sendmsg+0x117/0x1f0 net/socket.c:2690
   do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
   __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
   do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
   entry_SYSENTER_compat_after_hwframe+0x84/0x8e

  The buggy address belongs to the object at ffff8880569ac800
   which belongs to the cache kmalloc-512 of size 512
  The buggy address is located 88 bytes inside of
   freed 512-byte region [ffff8880569ac800, ffff8880569aca00)

  The buggy address belongs to the physical page:
  page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x569ac
  head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
  flags: 0x4fff00000000040(head|node=1|zone=1|lastcpupid=0x7ff)
  page_type: f5(slab)
  raw: 04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
  raw: 0000000000000000 0000000080100010 00000001f5000000 0000000000000000
  head: 04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
  head: 0000000000000000 0000000080100010 00000001f5000000 0000000000000000
  head: 04fff00000000002 ffffea00015a6b01 ffffffffffffffff 0000000000000000
  head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: kasan: bad access detected
  page_owner tracks the page as allocated
  page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 10238, tgid 10238 (kworker/u32:6), ts 597403252405, free_ts 597177952947
   set_page_owner include/linux/page_owner.h:32 [inline]
   post_alloc_hook+0x2d1/0x350 mm/page_alloc.c:1537
   prep_new_page mm/page_alloc.c:1545 [inline]
   get_page_from_freelist+0x101e/0x3070 mm/page_alloc.c:3457
   __alloc_pages_noprof+0x223/0x25a0 mm/page_alloc.c:4733
   alloc_pages_mpol_noprof+0x2c9/0x610 mm/mempolicy.c:2265
   alloc_slab_page mm/slub.c:2412 [inline]
   allocate_slab mm/slub.c:2578 [inline]
   new_slab+0x2ba/0x3f0 mm/slub.c:2631
   ___slab_alloc+0xd1d/0x16f0 mm/slub.c:3818
   __slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3908
   __slab_alloc_node mm/slub.c:3961 [inline]
   slab_alloc_node mm/slub.c:4122 [inline]
   __kmalloc_cache_noprof+0x2c5/0x310 mm/slub.c:4290
   kmalloc_noprof include/linux/slab.h:878 [inline]
   kzalloc_noprof include/linux/slab.h:1014 [inline]
   mld_add_delrec net/ipv6/mcast.c:743 [inline]
   igmp6_leave_group net/ipv6/mcast.c:2625 [inline]
   igmp6_group_dropped+0x4ab/0xe40 net/ipv6/mcast.c:723
   __ipv6_dev_mc_dec+0x281/0x360 net/ipv6/mcast.c:979
   addrconf_leave_solict net/ipv6/addrconf.c:2253 [inline]
   __ipv6_ifa_notify+0x3f6/0xc30 net/ipv6/addrconf.c:6283
   addrconf_ifdown.isra.0+0xef9/0x1a20 net/ipv6/addrconf.c:3982
   addrconf_notify+0x220/0x19c0 net/ipv6/addrconf.c:3781
   notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
   call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1996
   call_netdevice_notifiers_extack net/core/dev.c:2034 [inline]
   call_netdevice_notifiers net/core/dev.c:2048 [inline]
   dev_close_many+0x333/0x6a0 net/core/dev.c:1589
  page last free pid 13136 tgid 13136 stack trace:
   reset_page_owner include/linux/page_owner.h:25 [inline]
   free_pages_prepare mm/page_alloc.c:1108 [inline]
   free_unref_page+0x5f4/0xdc0 mm/page_alloc.c:2638
   stack_depot_save_flags+0x2da/0x900 lib/stackdepot.c:666
   kasan_save_stack+0x42/0x60 mm/kasan/common.c:48
   kasan_save_track+0x14/0x30 mm/kasan/common.c:68
   unpoison_slab_object mm/kasan/common.c:319 [inline]
   __kasan_slab_alloc+0x89/0x90 mm/kasan/common.c:345
   kasan_slab_alloc include/linux/kasan.h:247 [inline]
   slab_post_alloc_hook mm/slub.c:4085 [inline]
   slab_alloc_node mm/slub.c:4134 [inline]
   kmem_cache_alloc_noprof+0x121/0x2f0 mm/slub.c:4141
   skb_clone+0x190/0x3f0 net/core/skbuff.c:2084
   do_one_broadcast net/netlink/af_netlink.c:1462 [inline]
   netlink_broadcast_filtered+0xb11/0xef0 net/netlink/af_netlink.c:1540
   netlink_broadcast+0x39/0x50 net/netlink/af_netlink.c:1564
   uevent_net_broadcast_untagged lib/kobject_uevent.c:331 [inline]
   kobject_uevent_net_broadcast lib/kobject_uevent.c:410 [inline]
   kobject_uevent_env+0xacd/0x1670 lib/kobject_uevent.c:608
   device_del+0x623/0x9f0 drivers/base/core.c:3882
   snd_card_disconnect.part.0+0x58a/0x7c0 sound/core/init.c:546
   snd_card_disconnect+0x1f/0x30 sound/core/init.c:495
   snd_usx2y_disconnect+0xe9/0x1f0 sound/usb/usx2y/usbusx2y.c:417
   usb_unbind_interface+0x1e8/0x970 drivers/usb/core/driver.c:461
   device_remove drivers/base/dd.c:569 [inline]
   device_remove+0x122/0x170 drivers/base/dd.c:561

That's because 'subflow' is used just after 'mptcp_close_ssk(subflow)',
which initiates the release of its memory. Even if the release and the
reuse are very likely to happen later on, it is of course better to
avoid any issue and read the content of 'subflow' before closing it.
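The pattern is to copy whatever is needed out of the object before the call that may release it. A userspace sketch, assuming a hypothetical struct subflow with a local id and a request_join flag, and using free() to stand in for mptcp_close_ssk() starting the release; none of these names are the real MPTCP code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical subflow; freeing it models the close starting its release. */
struct subflow {
	int local_id;
	bool request_join;
};

static void close_subflow(struct subflow *sf)
{
	free(sf);
}

/* Read everything needed from 'sf' first, then close it; 'sf' is never
 * dereferenced after close_subflow(). Returns the local id. */
static int rm_subflow(struct subflow *sf, int *add_addr_accepted)
{
	assert(sf != NULL);
	int id = sf->local_id;
	bool mpj = sf->request_join;

	close_subflow(sf);
	if (mpj)
		(*add_addr_accepted)--;
	return id;
}
```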

Fixes: 1c1f72137598 ("mptcp: pm: only decrement add_addr_accepted for MPJ req")
Cc: [email protected]
Reported-by: [email protected]
Closes: https://lore.kernel.org/[email protected]
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
5 months agonet: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
Felix Fietkau [Tue, 15 Oct 2024 08:17:55 +0000 (10:17 +0200)]
net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init

The loop responsible for allocating up to MTK_FQ_DMA_LENGTH buffers must
only touch that many descriptors; otherwise it ends up corrupting
unrelated memory. Fix the loop iteration count accordingly.
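In general terms, the descriptor loop must be bounded by the number of descriptors actually allocated. A hypothetical sketch (struct fq_desc and fq_init are illustrative, not mtk_eth_soc code) that links a ring of descriptors and never writes past 'ndesc'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-queue descriptor. */
struct fq_desc {
	size_t buf_addr;
	size_t next;
};

/* Initialise the 'ndesc' descriptors allocated for this ring. A larger
 * bound (e.g. a SoC-wide maximum) would write past the end of 'ring'. */
static size_t fq_init(struct fq_desc *ring, size_t ndesc,
		      size_t buf_size, size_t base)
{
	size_t i;

	assert(ring != NULL || ndesc == 0);
	for (i = 0; i < ndesc; i++) {
		ring[i].buf_addr = base + i * buf_size;
		ring[i].next = (i + 1 < ndesc) ? i + 1 : 0; /* last wraps to 0 */
	}
	return i;
}
```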

Fixes: c57e55819443 ("net: ethernet: mtk_eth_soc: handle dma buffer size soc specific")
Signed-off-by: Felix Fietkau <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
5 months agovmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
Daniel Borkmann [Mon, 14 Oct 2024 19:03:11 +0000 (21:03 +0200)]
vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame

Andrew and Nikolay reported connectivity issues with Cilium's service
load-balancing on vmxnet3.

If a BPF program for native XDP adds an encapsulation header such as
IPIP and transmits the packet out the same interface, then with vmxnet3
a corrupted packet is sent and subsequently dropped on the path.

vmxnet3_xdp_xmit_frame(), which is called e.g. via vmxnet3_run_xdp()
through vmxnet3_xdp_xmit_back(), calculates an incorrect DMA address:

  page = virt_to_page(xdpf->data);
  tbi->dma_addr = page_pool_get_dma_addr(page) +
                  VMXNET3_XDP_HEADROOM;
  dma_sync_single_for_device(&adapter->pdev->dev,
                             tbi->dma_addr, buf_size,
                             DMA_TO_DEVICE);

The above assumes a fixed offset (VMXNET3_XDP_HEADROOM), but the XDP
BPF program could have moved xdp->data. While the passed buf_size is
correct (xdpf->len), the dma_addr needs to have a dynamic offset which
can be calculated as xdpf->data - (void *)xdpf, that is, xdp->data -
xdp->data_hard_start.

Fixes: 54f00cce1178 ("vmxnet3: Add XDP support.")
Reported-by: Andrew Sauber <[email protected]>
Reported-by: Nikolay Nikolaev <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Tested-by: Nikolay Nikolaev <[email protected]>
Acked-by: Anton Protopopov <[email protected]>
Cc: William Tu <[email protected]>
Cc: Ronak Doshi <[email protected]>
Link: https://patch.msgid.link/a0888656d7f09028f9984498cc698bb5364d89fc.1728931137.git.daniel@iogearbox.net
Signed-off-by: Paolo Abeni <[email protected]>
5 months agoMerge branch 'ethtool-rss-track-rss-ctx-busy-from-core'
Paolo Abeni [Thu, 17 Oct 2024 08:22:03 +0000 (10:22 +0200)]
Merge branch 'ethtool-rss-track-rss-ctx-busy-from-core'

Daniel Zahka says:

====================
ethtool: rss: track rss ctx busy from core

This series prevents deletion of rss contexts that are
in use by ntuple filters from ethtool core.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
5 months agoselftests: drv-net: rss_ctx: add rss ctx busy testcase
Daniel Zahka [Fri, 11 Oct 2024 18:35:48 +0000 (11:35 -0700)]
selftests: drv-net: rss_ctx: add rss ctx busy testcase

It should be invalid to delete an rss context while it is being
referenced from an ntuple filter. ethtool core should prevent this
from happening. This patch adds a testcase to verify this behavior.

Signed-off-by: Daniel Zahka <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
5 months agoethtool: rss: prevent rss ctx deletion when in use
Daniel Zahka [Fri, 11 Oct 2024 18:35:47 +0000 (11:35 -0700)]
ethtool: rss: prevent rss ctx deletion when in use

ntuple filters can specify an rss context to use for packet hashing
and queue selection. When a filter is referencing an rss context, it
should be invalid for that context to be deleted. A list of active
ntuple filters and their associated rss contexts can be compiled by
querying a device's ethtool_ops.get_rxnfc. This patch checks to see if
any ntuple filters are referencing an rss context during context
deletion, and prevents the deletion if the requested context is still
in use.
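The core of the check is a scan of the active rules for the context being deleted. A minimal sketch, assuming the rules have already been collected (for example from the driver's get_rxnfc dump) into a hypothetical array of struct ntuple_rule; this is not the ethtool core code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical summary of one ntuple filter and the RSS context it uses. */
struct ntuple_rule {
	unsigned int id;
	unsigned int rss_context;
};

/* Return -EBUSY if any rule still references 'ctx', else 0. */
static int rss_ctx_delete_check(const struct ntuple_rule *rules, size_t n,
				unsigned int ctx)
{
	assert(rules != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (rules[i].rss_context == ctx)
			return -EBUSY;
	return 0;
}
```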

Signed-off-by: Daniel Zahka <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
5 months agonet: phy: realtek: clear 1000Base-T link partner advertisement
Daniel Golle [Thu, 10 Oct 2024 13:07:39 +0000 (14:07 +0100)]
net: phy: realtek: clear 1000Base-T link partner advertisement

Clear the 1000Base-T link-partner advertisement bits in the Clause-45
read_status() function in case auto-negotiation is disabled or has not
been completed.
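A sketch of the rule on a plain bitmask: the partner's 1000Base-T bits are only trusted when auto-negotiation is enabled and complete. The LPA_* bit values below are hypothetical, not the kernel's linkmode bits, and other advertisement bits are assumed to be handled elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical link-partner advertisement bits. */
#define LPA_1000T_FULL (1u << 0)
#define LPA_1000T_HALF (1u << 1)
#define LPA_2500T      (1u << 2)

/* Drop possibly stale 1000Base-T partner bits unless autoneg is
 * enabled and complete; other bits are left untouched here. */
static uint32_t filter_lpa(uint32_t lpa, bool aneg_enabled, bool aneg_complete)
{
	if (!aneg_enabled || !aneg_complete)
		lpa &= ~(LPA_1000T_FULL | LPA_1000T_HALF);
	return lpa;
}
```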

Signed-off-by: Daniel Golle <[email protected]>
Link: https://patch.msgid.link/9dc9b47b2d675708afef3ad366bfd78eb584d958.1728565530.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <[email protected]>
5 months agonet: phy: realtek: change order of calls in C22 read_status()
Daniel Golle [Thu, 10 Oct 2024 13:07:26 +0000 (14:07 +0100)]
net: phy: realtek: change order of calls in C22 read_status()

Always call rtlgen_read_status() first, so that genphy_read_status(),
which it calls, clears the bits in case auto-negotiation has not
completed. Also clear the 10GBT link-partner advertisement bits in case
auto-negotiation is disabled or has not completed.

Suggested-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Daniel Golle <[email protected]>
Link: https://patch.msgid.link/b15929a41621d215c6b2b57393368086589569ec.1728565530.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <[email protected]>
5 months agonet: phy: realtek: read duplex and gbit master from PHYSR register
Daniel Golle [Thu, 10 Oct 2024 13:07:16 +0000 (14:07 +0100)]
net: phy: realtek: read duplex and gbit master from PHYSR register

The PHYSR MMD register is present and defined identically for all
RTL82xx Ethernet PHYs.
Read the duplex and Gbit master bits in rtlgen_decode_speed() as well,
and rename it to rtlgen_decode_physr().
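Decoding several fields from one register read looks roughly like this. The bit positions (duplex at bit 3, master at bit 11) are our assumption and should be checked against the driver or datasheet; struct link_state and decode_physr are hypothetical. The master bit is only meaningful at gigabit speeds and above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed PHYSR bit positions; verify against the driver/datasheet. */
#define PHYSR_DUPLEX (1u << 3)
#define PHYSR_MASTER (1u << 11)

struct link_state {
	bool full_duplex;
	bool master; /* only meaningful at Gbit speeds and above */
};

/* Decode duplex and master/slave role from a single PHYSR value. */
static struct link_state decode_physr(uint16_t physr)
{
	struct link_state s = {
		.full_duplex = (physr & PHYSR_DUPLEX) != 0,
		.master = (physr & PHYSR_MASTER) != 0,
	};
	return s;
}
```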

Signed-off-by: Daniel Golle <[email protected]>
Link: https://patch.msgid.link/b9a76341da851a18c985bc4774fa295babec79bb.1728565530.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <[email protected]>
5 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Wed, 16 Oct 2024 20:37:59 +0000 (13:37 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Several miscellaneous fixes. A lot of bnxt_re activity, there will be
  more rc patches there coming.

   - Many bnxt_re bug fixes - Memory leaks, KASAN, NULL pointer deref,
     soft lockups, error unwinding and some small functional issues

   - Error unwind bug in rdma netlink

   - Two issues with incorrect VLAN detection for iWarp

   - skb_splice_from_iter() splat in siw

   - Give SRP slab caches unique names to resolve the merge window
     WARN_ON regression"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/bnxt_re: Fix the GID table length
  RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages
  RDMA/bnxt_re: Change the sequence of updating the CQ toggle value
  RDMA/bnxt_re: Fix an error path in bnxt_re_add_device
  RDMA/bnxt_re: Avoid CPU lockups due fifo occupancy check loop
  RDMA/bnxt_re: Fix a possible NULL pointer dereference
  RDMA/bnxt_re: Return more meaningful error
  RDMA/bnxt_re: Fix incorrect dereference of srq in async event
  RDMA/bnxt_re: Fix out of bound check
  RDMA/bnxt_re: Fix the max CQ WQEs for older adapters
  RDMA/srpt: Make slab cache names unique
  RDMA/irdma: Fix misspelling of "accept*"
  RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP
  RDMA/siw: Add sendpage_ok() check to disable MSG_SPLICE_PAGES
  RDMA/core: Fix ENODEV error for iWARP test over vlan
  RDMA/nldev: Fix NULL pointer dereferences issue in rdma_nl_notify_event
  RDMA/bnxt_re: Fix the max WQEs used in Static WQE mode
  RDMA/bnxt_re: Add a check for memory allocation
  RDMA/bnxt_re: Fix incorrect AVID type in WQE structure
  RDMA/bnxt_re: Fix a possible memory leak

5 months agoMerge tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Wed, 16 Oct 2024 16:30:20 +0000 (09:30 -0700)]
Merge tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - regression fix: dirty extents tracked in xarray for qgroups must be
   adjusted for 32bit platforms

 - fix potentially freeing uninitialized name in fscrypt structure

 - fix warning about unneeded variable in a send callback

* tag 'for-6.12-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix uninitialized pointer free on read_alloc_one_name() error
  btrfs: send: cleanup unneeded return variable in changed_verity()
  btrfs: fix uninitialized pointer free in add_inode_ref()
  btrfs: use sector numbers as keys for the dirty extents xarray
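Why sector numbers help on 32-bit platforms: an xarray index is an unsigned long, which is 32 bits wide there, so a raw 64-bit byte number would be truncated. Shifting by the sector-size bits shrinks the key; with 4 KiB sectors (an assumption here) byte offsets up to 16 TiB fit in 32 bits. dirty_extent_key is a hypothetical helper, not the btrfs function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a byte number to a sector-number key. */
static unsigned long dirty_extent_key(uint64_t bytenr, unsigned int sectorsize_bits)
{
	assert(sectorsize_bits < 64);
	return (unsigned long)(bytenr >> sectorsize_bits);
}
```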

5 months agoMerge tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Wed, 16 Oct 2024 16:15:43 +0000 (09:15 -0700)]
Merge tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - fix race between session setup and session logoff

 - add supplementary group support

* tag 'v6.12-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: add support for supplementary groups
  ksmbd: fix user-after-free from session log off

5 months agoMerge tag 'v6.12-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Wed, 16 Oct 2024 15:42:54 +0000 (08:42 -0700)]
Merge tag 'v6.12-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

 - Remove bogus testmgr ENOENT error messages

 - Ensure algorithm is still alive before marking it as tested

 - Disable buggy hash algorithms in marvell/cesa

* tag 'v6.12-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: marvell/cesa - Disable hash algorithms
  crypto: testmgr - Hide ENOENT errors better
  crypto: api - Fix liveliness check in crypto_alg_tested

5 months agoMerge tag 'sched_ext-for-6.12-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 16 Oct 2024 02:47:19 +0000 (19:47 -0700)]
Merge tag 'sched_ext-for-6.12-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

 - More issues reported in the enable/disable paths on large machines
   with many tasks due to scx_tasks_lock being held too long. Break up
   the task iterations.

 - Remove ops.select_cpu() dependency in bypass mode so that a
   misbehaving implementation can't live-lock the machine by pushing all
   tasks to a few CPUs in bypass mode

 - Other misc fixes

* tag 'sched_ext-for-6.12-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Remove unnecessary cpu_relax()
  sched_ext: Don't hold scx_tasks_lock for too long
  sched_ext: Move scx_tasks_lock handling into scx_task_iter helpers
  sched_ext: bypass mode shouldn't depend on ops.select_cpu()
  sched_ext: Move scx_buildin_idle_enabled check to scx_bpf_select_cpu_dfl()
  sched_ext: Start schedulers with consistent p->scx.slice values
  Revert "sched_ext: Use shorter slice while bypassing"
  sched_ext: use correct function name in pick_task_scx() warning message
  selftests: sched_ext: Add sched_ext as proper selftest target

5 months agoMerge branch 'rtnetlink-use-rtnl_register_many'
Jakub Kicinski [Wed, 16 Oct 2024 01:52:28 +0000 (18:52 -0700)]
Merge branch 'rtnetlink-use-rtnl_register_many'

Kuniyuki Iwashima says:

====================
rtnetlink: Use rtnl_register_many().

This series converts all rtnl_register() and rtnl_register_module()
to rtnl_register_many() and finally removes them.

Once this series is applied, I'll start converting doit() to per-netns
RTNL.

v1: https://lore.kernel.org/20241011220550[email protected]/
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agortnetlink: Remove rtnl_register() and rtnl_register_module().
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:28 +0000 (13:18 -0700)]
rtnetlink: Remove rtnl_register() and rtnl_register_module().

No one uses rtnl_register() and rtnl_register_module().

Let's remove them.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agocan: gw: Use rtnl_register_many().
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:27 +0000 (13:18 -0700)]
can: gw: Use rtnl_register_many().

We will remove rtnl_register_module() in favour of rtnl_register_many().

rtnl_register_many() will unwind the previous successful registrations
on failure and simplify module error handling.

Let's use rtnl_register_many() instead.
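The all-or-nothing behaviour described above can be modelled with a small handler table: register each entry in order and, on the first failure, unregister the ones already added in reverse order. This is a hypothetical userspace sketch (struct msg_handler, register_many), not the rtnetlink implementation or its signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message-handler table entry. */
struct msg_handler {
	int msgtype;
	int (*doit)(void);
};

#define MAX_TYPES 16
static const struct msg_handler *table[MAX_TYPES];

static int register_one(const struct msg_handler *h)
{
	if (h->msgtype < 0 || h->msgtype >= MAX_TYPES || table[h->msgtype])
		return -1;
	table[h->msgtype] = h;
	return 0;
}

static void unregister_one(const struct msg_handler *h)
{
	assert(table[h->msgtype] == h);
	table[h->msgtype] = NULL;
}

/* Register all handlers or none: on failure, undo the ones already
 * registered, in reverse order. */
static int register_many(const struct msg_handler *hs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (register_one(&hs[i])) {
			while (i-- > 0)
				unregister_one(&hs[i]);
			return -1;
		}
	}
	return 0;
}
```

A module's init path then needs only one error check instead of unwinding each registration by hand.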

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Marc Kleine-Budde <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agodcb: Use rtnl_register_many().
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:26 +0000 (13:18 -0700)]
dcb: Use rtnl_register_many().

We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees that all rtnetlink
types in the passed array are supported, and there is no chance that
only some of the message types end up supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoipmr: Use rtnl_register_many().
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:25 +0000 (13:18 -0700)]
ipmr: Use rtnl_register_many().

We will remove rtnl_register() and rtnl_register_module() in favour
of rtnl_register_many().

When it succeeds for built-in callers, rtnl_register_many() guarantees
that all rtnetlink types in the passed array are supported, and there is
no chance that only some of the message types end up supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoipv6: Use rtnl_register_many().
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:24 +0000 (13:18 -0700)]
ipv6: Use rtnl_register_many().

We will remove rtnl_register_module() in favour of rtnl_register_many().

rtnl_register_many() will unwind the previous successful registrations
on failure and simplify module error handling.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoipv4: Use rtnl_register_many().
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:23 +0000 (13:18 -0700)]
ipv4: Use rtnl_register_many().

We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees that all rtnetlink types
in the passed array are supported, so there is no chance that only some
of the message types end up supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: Use rtnl_register_many().
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:22 +0000 (13:18 -0700)]
net: Use rtnl_register_many().

We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees that all rtnetlink types
in the passed array are supported, so there is no chance that only some
of the message types end up supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: sched: Use rtnl_register_many().
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:21 +0000 (13:18 -0700)]
net: sched: Use rtnl_register_many().

We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees that all rtnetlink types
in the passed array are supported, so there is no chance that only some
of the message types end up supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoneighbour: Use rtnl_register_many().
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:20 +0000 (13:18 -0700)]
neighbour: Use rtnl_register_many().

We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees that all rtnetlink types
in the passed array are supported, so there is no chance that only some
of the message types end up supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agortnetlink: Use rtnl_register_many().
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:19 +0000 (13:18 -0700)]
rtnetlink: Use rtnl_register_many().

We will remove rtnl_register() in favour of rtnl_register_many().

When it succeeds, rtnl_register_many() guarantees that all rtnetlink types
in the passed array are supported, so there is no chance that only some
of the message types end up supported.

Let's use rtnl_register_many() instead.

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agortnetlink: Panic when __rtnl_register_many() fails for builtin callers.
Kuniyuki Iwashima [Mon, 14 Oct 2024 20:18:18 +0000 (13:18 -0700)]
rtnetlink: Panic when __rtnl_register_many() fails for builtin callers.

We will replace all rtnl_register() and rtnl_register_module() with
rtnl_register_many().

Currently, rtnl_register() returns nothing and prints an error message
when it fails to register a rtnetlink message type and handlers.

The failure happens only when rtnl_register_internal() fails to allocate
rtnl_msg_handlers[protocol][msgtype], which is unlikely for built-in
callers at boot time.

rtnl_register_many() unwinds the previous successful registrations on
failure and returns an error, but this is of little use to built-in callers,
especially subsystems that do not have the legacy ioctl() interface
and do not work without rtnetlink.

Instead of booting up without rtnetlink functionality, let's panic on
failure for built-in rtnl_register_many() callers.
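
The all-or-nothing registration with unwinding, plus the panic-for-built-in
policy, can be sketched in userspace C. Every name here (struct msg_handler,
register_one(), register_many(), the builtin flag) is hypothetical, abort()
stands in for panic(), and this is not the rtnetlink implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for one message-type registration. */
struct msg_handler {
	int msgtype;
	bool fail;	/* simulate an allocation failure for this entry */
};

#define MAX_MSGTYPES 64
static bool registered[MAX_MSGTYPES];

/* Register one entry; returns 0 on success, -1 on failure. */
static int register_one(const struct msg_handler *h)
{
	if (h->fail || h->msgtype < 0 || h->msgtype >= MAX_MSGTYPES)
		return -1;
	registered[h->msgtype] = true;
	return 0;
}

static void unregister_one(const struct msg_handler *h)
{
	registered[h->msgtype] = false;
}

/*
 * Either every entry is registered or none is: on failure, undo the
 * entries registered so far. Built-in callers cannot recover, so abort.
 */
static int register_many(const struct msg_handler *hs, size_t n, bool builtin)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (register_one(&hs[i]) < 0)
			goto unwind;
	}
	return 0;

unwind:
	/* hs[i] failed; unregister hs[i - 1] down to hs[0]. */
	while (i-- > 0)
		unregister_one(&hs[i]);
	if (builtin) {
		fprintf(stderr, "failed to register message handlers\n");
		abort();	/* userspace analogue of panic() */
	}
	return -1;
}
```

A module caller gets -1 back with no partial state left behind; a built-in
caller never continues with only some handlers in place.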

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMerge branch 'gve-adopt-page-pool'
Jakub Kicinski [Wed, 16 Oct 2024 01:50:14 +0000 (18:50 -0700)]
Merge branch 'gve-adopt-page-pool'

Harshitha Ramamurthy says:

====================
gve: adopt page pool

This patchset implements page pool support for gve.
The first patch moves code around to make page pool
adoption easier in the next patch. The second patch
adopts the page pool API. The third patch adds basic
per-queue stats, which include page pool allocation
failures as well.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agogve: add support for basic queue stats
Harshitha Ramamurthy [Mon, 14 Oct 2024 20:21:08 +0000 (13:21 -0700)]
gve: add support for basic queue stats

Implement netdev_stats_ops to export basic per-queue stats.

With page pool support for DQO added in the previous patches,
rx-alloc-fail captures failures in page pool allocations as
well, since the rx_buf_alloc_fail stat tracked in the driver
is incremented when gve_alloc_buffer() returns an error.

Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agogve: adopt page pool for DQ RDA mode
Harshitha Ramamurthy [Mon, 14 Oct 2024 20:21:07 +0000 (13:21 -0700)]
gve: adopt page pool for DQ RDA mode

For DQ queue format in raw DMA addressing (RDA) mode,
implement page pool recycling of buffers by leveraging
a few helper functions.

DQ QPL mode will continue to use the existing recycling
logic. This is because in QPL mode, the pages come from a
constant set of pages that the driver pre-allocates and
registers with the device.

Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Shailend Chand <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agogve: move DQO rx buffer management related code to a new file
Harshitha Ramamurthy [Mon, 14 Oct 2024 20:21:06 +0000 (13:21 -0700)]
gve: move DQO rx buffer management related code to a new file

In preparation for the upcoming page pool adoption for DQO
raw addressing mode, move RX buffer management code to a new
file. In the follow-on patches, page pool code will be added
to this file.

No functional change, just movement of code.

Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Shailend Chand <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMerge branch 'do-not-leave-dangling-sk-pointers-in-pf-create-functions'
Jakub Kicinski [Wed, 16 Oct 2024 01:43:11 +0000 (18:43 -0700)]
Merge branch 'do-not-leave-dangling-sk-pointers-in-pf-create-functions'

Ignat Korchagin says:

====================
do not leave dangling sk pointers in pf->create functions

Some protocol family create() implementations have an error path after
allocating the sk object and calling sock_init_data(). sock_init_data()
attaches the allocated sk object to the sock object, provided by the
caller.

If the create() implementation errors out after calling sock_init_data(),
it releases the allocated sk object, but the caller ends up having a
dangling sk pointer in its sock object on return. Subsequent manipulations
on this sock object may try to access the sk pointer, because it is not
NULL, thus creating a use-after-free scenario.

We have implemented a stable hotfix in commit 631083143315
("net: explicitly clear the sk pointer, when pf->create fails"), but this
series aims to fix it properly by going through each of the pf->create()
implementations and making sure none of them returns a sock object with
a dangling pointer on error.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoRevert "net: do not leave a dangling sk pointer, when socket creation fails"
Ignat Korchagin [Mon, 14 Oct 2024 15:38:08 +0000 (16:38 +0100)]
Revert "net: do not leave a dangling sk pointer, when socket creation fails"

This reverts commit 6cd4a78d962bebbaf8beb7d2ead3f34120e3f7b2.

inet/inet6->create() implementations have been fixed to explicitly clear
the sk pointer to the allocated sk object on error.

A warning was put in place to make sure any future changes will not leave
a dangling pointer in pf->create() implementations.

So this code is now redundant.

Suggested-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Ignat Korchagin <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: warn, if pf->create does not clear sock->sk on error
Ignat Korchagin [Mon, 14 Oct 2024 15:38:07 +0000 (16:38 +0100)]
net: warn, if pf->create does not clear sock->sk on error

All pf->create implementations have been fixed now to clear sock->sk on
error, when they deallocate the allocated sk object.

Put a warning in place to make sure we don't break this promise in the
future.
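
The contract and the caller-side check can be sketched in userspace C. The
structs and functions below are hypothetical simplifications of struct
socket, struct sock, sock_init_data() and a pf->create() implementation; the
kernel's actual warning uses its own debug macro, which is not reproduced
here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for struct socket / struct sock. */
struct sk_obj { int protocol; };
struct socket_obj { struct sk_obj *sk; };

/* Analogue of sock_init_data(): attach the new sk object to the socket. */
static void attach_sk(struct socket_obj *sock, struct sk_obj *sk)
{
	sock->sk = sk;
}

/* A create() implementation that can fail after attaching sk. */
static int example_create(struct socket_obj *sock, int protocol)
{
	struct sk_obj *sk = malloc(sizeof(*sk));

	if (!sk)
		return -1;
	attach_sk(sock, sk);
	sk->protocol = protocol;

	if (protocol < 0) {		/* some late validation fails */
		free(sk);
		sock->sk = NULL;	/* do not leave a dangling pointer */
		return -1;
	}
	return 0;
}

/* Caller side: warn if create() failed but left sock->sk set. */
static int create_socket(struct socket_obj *sock, int protocol)
{
	int err = example_create(sock, protocol);

	if (err < 0 && sock->sk)
		fprintf(stderr, "warning: create() must clear sock->sk on failure\n");
	return err;
}
```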

Suggested-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: Ignat Korchagin <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: inet6: do not leave a dangling sk pointer in inet6_create()
Ignat Korchagin [Mon, 14 Oct 2024 15:38:06 +0000 (16:38 +0100)]
net: inet6: do not leave a dangling sk pointer in inet6_create()

sock_init_data() attaches the allocated sk pointer to the provided sock
object. If inet6_create() fails later, the sk object is released, but the
sock object retains the dangling sk pointer, which may cause use-after-free
later.

Clear the sock sk pointer on error.

Signed-off-by: Ignat Korchagin <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: inet: do not leave a dangling sk pointer in inet_create()
Ignat Korchagin [Mon, 14 Oct 2024 15:38:05 +0000 (16:38 +0100)]
net: inet: do not leave a dangling sk pointer in inet_create()

sock_init_data() attaches the allocated sk object to the provided sock
object. If inet_create() fails later, the sk object is freed, but the
sock object retains the dangling pointer, which may lead to a use-after-free
later.

Clear the sk pointer in the sock object on error.

Signed-off-by: Ignat Korchagin <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: ieee802154: do not leave a dangling sk pointer in ieee802154_create()
Ignat Korchagin [Mon, 14 Oct 2024 15:38:04 +0000 (16:38 +0100)]
net: ieee802154: do not leave a dangling sk pointer in ieee802154_create()

sock_init_data() attaches the allocated sk object to the provided sock
object. If ieee802154_create() fails later, the allocated sk object is
freed, but the dangling pointer remains in the provided sock object, which
may allow use-after-free.

Clear the sk pointer in the sock object on error.

Signed-off-by: Ignat Korchagin <[email protected]>
Reviewed-by: Miquel Raynal <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: af_can: do not leave a dangling sk pointer in can_create()
Ignat Korchagin [Mon, 14 Oct 2024 15:38:03 +0000 (16:38 +0100)]
net: af_can: do not leave a dangling sk pointer in can_create()

On error can_create() frees the allocated sk object, but sock_init_data()
has already attached it to the provided sock object. This will leave a
dangling sk pointer in the sock object and may cause use-after-free later.

Signed-off-by: Ignat Korchagin <[email protected]>
Reviewed-by: Vincent Mailhol <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Marc Kleine-Budde <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoBluetooth: RFCOMM: avoid leaving dangling sk pointer in rfcomm_sock_alloc()
Ignat Korchagin [Mon, 14 Oct 2024 15:38:02 +0000 (16:38 +0100)]
Bluetooth: RFCOMM: avoid leaving dangling sk pointer in rfcomm_sock_alloc()

bt_sock_alloc() attaches the allocated sk object to the provided sock object.
If rfcomm_dlc_alloc() fails, we release the sk object, but leave the
dangling pointer in the sock object, which may cause use-after-free.

Fix this by swapping calls to bt_sock_alloc() and rfcomm_dlc_alloc().
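
The reordering idea can be sketched as follows: perform the allocation that
can fail before the sk object is attached to the sock, so no error path has
to undo the attachment. All names are hypothetical stand-ins for
bt_sock_alloc() and rfcomm_dlc_alloc(), not the Bluetooth code itself.

```c
#include <stdlib.h>

struct dlc_obj { int state; };		/* hypothetical rfcomm_dlc analogue */
struct sk_obj { struct dlc_obj *dlc; };
struct socket_obj { struct sk_obj *sk; };

/* Set to nonzero to simulate a dlc allocation failure. */
static int fail_dlc_alloc;

static struct dlc_obj *dlc_alloc(void)
{
	return fail_dlc_alloc ? NULL : calloc(1, sizeof(struct dlc_obj));
}

/*
 * Allocate the fallible dlc first; attach sk to sock only once nothing
 * else can fail, so sock->sk is never left pointing at freed memory.
 */
static struct sk_obj *sock_alloc_ordered(struct socket_obj *sock)
{
	struct dlc_obj *d = dlc_alloc();
	struct sk_obj *sk;

	if (!d)
		return NULL;		/* sock->sk was never touched */

	sk = calloc(1, sizeof(*sk));
	if (!sk) {
		free(d);
		return NULL;
	}
	sk->dlc = d;
	sock->sk = sk;			/* attach last */
	return sk;
}
```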

Signed-off-by: Ignat Korchagin <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoBluetooth: L2CAP: do not leave dangling sk pointer on error in l2cap_sock_create()
Ignat Korchagin [Mon, 14 Oct 2024 15:38:01 +0000 (16:38 +0100)]
Bluetooth: L2CAP: do not leave dangling sk pointer on error in l2cap_sock_create()

bt_sock_alloc() allocates the sk object and attaches it to the provided
sock object. On error l2cap_sock_alloc() frees the sk object, but the
dangling pointer is still attached to the sock object, which may lead to
a use-after-free in other code.

Signed-off-by: Ignat Korchagin <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoaf_packet: avoid erroring out after sock_init_data() in packet_create()
Ignat Korchagin [Mon, 14 Oct 2024 15:38:00 +0000 (16:38 +0100)]
af_packet: avoid erroring out after sock_init_data() in packet_create()

After sock_init_data() the allocated sk object is attached to the provided
sock object. On error, packet_create() frees the sk object leaving the
dangling pointer in the sock object on return. Some other code may try
to use this pointer and cause use-after-free.

Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Ignat Korchagin <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: dsa: vsc73xx: fix reception from VLAN-unaware bridges
Vladimir Oltean [Mon, 14 Oct 2024 15:30:41 +0000 (18:30 +0300)]
net: dsa: vsc73xx: fix reception from VLAN-unaware bridges

Similar to the situation described for sja1105 in commit 1f9fc48fd302
("net: dsa: sja1105: fix reception from VLAN-unaware bridges"), the
vsc73xx driver uses tag_8021q and doesn't need the ds->untag_bridge_pvid
request. In fact, this option breaks packet reception.

The ds->untag_bridge_pvid option strips VLANs from packets received on
VLAN-unaware bridge ports. But those VLANs should already be stripped
by tag_vsc73xx_8021q.c as part of vsc73xx_rcv() - they are not VLANs in
VLAN-unaware mode, but DSA tags. Thus, dsa_software_vlan_untag() tries
to untag a VLAN that doesn't exist, corrupting the packet.
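
Why untagging must only happen when a tag is really present can be shown
with a generic byte-level 802.1Q strip. This is a hypothetical sketch on a
raw Ethernet frame, not the kernel's dsa_software_vlan_untag(), which works
on skb metadata.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDRS_LEN 12	/* destination + source MAC addresses */
#define TPID_8021Q 0x8100
#define VLAN_TAG_LEN 4

/*
 * Remove an 802.1Q tag in place. Returns the new frame length, or the
 * original length unchanged if the frame carries no 802.1Q tag.
 */
static size_t vlan_untag(uint8_t *frame, size_t len)
{
	uint16_t tpid;

	if (len < MAC_ADDRS_LEN + VLAN_TAG_LEN + 2)
		return len;
	tpid = (uint16_t)(frame[12] << 8 | frame[13]);
	if (tpid != TPID_8021Q)
		return len;	/* no tag: stripping 4 bytes would corrupt the frame */

	/* Shift the inner EtherType and payload left over the tag. */
	memmove(frame + MAC_ADDRS_LEN, frame + MAC_ADDRS_LEN + VLAN_TAG_LEN,
		len - MAC_ADDRS_LEN - VLAN_TAG_LEN);
	return len - VLAN_TAG_LEN;
}
```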

Fixes: 93e4649efa96 ("net: dsa: provide a software untagging function on RX for VLAN-aware bridges")
Tested-by: Pawel Dembicki <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: ravb: Only advertise Rx/Tx timestamps if hardware supports it
Niklas Söderlund [Mon, 14 Oct 2024 12:43:43 +0000 (14:43 +0200)]
net: ravb: Only advertise Rx/Tx timestamps if hardware supports it

Recent work moving the reporting of Rx software timestamps to the core
[1] highlighted an issue where hardware timestamping was advertised
on platforms where it is not supported.

Fix this by advertising support for hardware timestamps only if
the hardware supports it. Due to the Tx implementation in RAVB, software
Tx timestamping is also only considered if the hardware supports
hardware timestamps. This should be addressed in the future, but this fix
only reflects what the driver currently implements.
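
Gating the advertised capabilities on a hardware feature can be sketched as
below. The capability bits are hypothetical, loosely modelled on the
SOF_TIMESTAMPING_* flags, and the gptp field is an assumed per-SoC feature
flag; this is not the ravb code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits. */
enum {
	TS_TX_SOFTWARE = 1 << 0,
	TS_TX_HARDWARE = 1 << 1,
	TS_RX_HARDWARE = 1 << 2,
	TS_RAW_HARDWARE = 1 << 3,
};

struct hw_info {
	bool gptp;	/* assumed: does this SoC implement hardware timestamping? */
};

/* Advertise only what the hardware (and the driver's Tx path) can deliver. */
static uint32_t timestamping_caps(const struct hw_info *info)
{
	uint32_t caps = 0;

	if (info->gptp) {
		caps |= TS_TX_HARDWARE | TS_RX_HARDWARE | TS_RAW_HARDWARE;
		/* In this model, Tx software stamps depend on the hardware Tx path. */
		caps |= TS_TX_SOFTWARE;
	}
	return caps;
}
```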

1. Commit 277901ee3a26 ("ravb: Remove setting of RX software timestamp")

Fixes: 7e09a052dc4e ("ravb: Exclude gPTP feature support for RZ/G2L")
Signed-off-by: Niklas Söderlund <[email protected]>
Reviewed-by: Paul Barker <[email protected]>
Tested-by: Paul Barker <[email protected]>
Reviewed-by: Sergey Shtylyov <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
Jinjie Ruan [Mon, 14 Oct 2024 12:19:22 +0000 (20:19 +0800)]
net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()

Commit a3c1e45156ad ("net: microchip: vcap: Fix use-after-free error in
kunit test") fixed the use-after-free error, but introduced the memory
leaks below by removing a necessary vcap_free_rule() call. Add it back to fix them.

unreferenced object 0xffffff80ca58b700 (size 192):
  comm "kunit_try_catch", pid 1215, jiffies 4294898264
  hex dump (first 32 bytes):
    00 12 7a 00 05 00 00 00 0a 00 00 00 64 00 00 00  ..z.........d...
    00 00 00 00 00 00 00 00 00 04 0b cc 80 ff ff ff  ................
  backtrace (crc 9c09c3fe):
    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<0000000040a01b8d>] vcap_alloc_rule+0x3cc/0x9c4
    [<000000003fe86110>] vcap_api_encode_rule_test+0x1ac/0x16b0
    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<00000000c5d82c9a>] kthread+0x2e8/0x374
    [<00000000f4287308>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80cc0b0400 (size 64):
  comm "kunit_try_catch", pid 1215, jiffies 4294898265
  hex dump (first 32 bytes):
    80 04 0b cc 80 ff ff ff 18 b7 58 ca 80 ff ff ff  ..........X.....
    39 00 00 00 02 00 00 00 06 05 04 03 02 01 ff ff  9...............
  backtrace (crc daf014e9):
    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<000000000ff63fd4>] vcap_rule_add_key+0x2cc/0x528
    [<00000000dfdb1e81>] vcap_api_encode_rule_test+0x224/0x16b0
    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<00000000c5d82c9a>] kthread+0x2e8/0x374
    [<00000000f4287308>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80cc0b0700 (size 64):
  comm "kunit_try_catch", pid 1215, jiffies 4294898265
  hex dump (first 32 bytes):
    80 07 0b cc 80 ff ff ff 28 b7 58 ca 80 ff ff ff  ........(.X.....
    3c 00 00 00 00 00 00 00 01 2f 03 b3 ec ff ff ff  <......../......
  backtrace (crc 8d877792):
    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<000000006eadfab7>] vcap_rule_add_action+0x2d0/0x52c
    [<00000000323475d1>] vcap_api_encode_rule_test+0x4d4/0x16b0
    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<00000000c5d82c9a>] kthread+0x2e8/0x374
    [<00000000f4287308>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80cc0b0900 (size 64):
  comm "kunit_try_catch", pid 1215, jiffies 4294898266
  hex dump (first 32 bytes):
    80 09 0b cc 80 ff ff ff 80 06 0b cc 80 ff ff ff  ................
    7d 00 00 00 01 00 00 00 00 00 00 00 ff 00 00 00  }...............
  backtrace (crc 34181e56):
    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<000000000ff63fd4>] vcap_rule_add_key+0x2cc/0x528
    [<00000000991e3564>] vcap_val_rule+0xcf0/0x13e8
    [<00000000fc9868e5>] vcap_api_encode_rule_test+0x678/0x16b0
    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<00000000c5d82c9a>] kthread+0x2e8/0x374
    [<00000000f4287308>] ret_from_fork+0x10/0x20
unreferenced object 0xffffff80cc0b0980 (size 64):
  comm "kunit_try_catch", pid 1215, jiffies 4294898266
  hex dump (first 32 bytes):
    18 b7 58 ca 80 ff ff ff 00 09 0b cc 80 ff ff ff  ..X.............
    67 00 00 00 00 00 00 00 01 01 74 88 c0 ff ff ff  g.........t.....
  backtrace (crc 275fd9be):
    [<0000000052a0be73>] kmemleak_alloc+0x34/0x40
    [<0000000043605459>] __kmalloc_cache_noprof+0x26c/0x2f4
    [<000000000ff63fd4>] vcap_rule_add_key+0x2cc/0x528
    [<000000001396a1a2>] test_add_def_fields+0xb0/0x100
    [<000000006e7621f0>] vcap_val_rule+0xa98/0x13e8
    [<00000000fc9868e5>] vcap_api_encode_rule_test+0x678/0x16b0
    [<00000000b3595fc4>] kunit_try_run_case+0x13c/0x3ac
    [<0000000010f5d2bf>] kunit_generic_run_threadfn_adapter+0x80/0xec
    [<00000000c5d82c9a>] kthread+0x2e8/0x374
    [<00000000f4287308>] ret_from_fork+0x10/0x20
......

Cc: [email protected]
Fixes: a3c1e45156ad ("net: microchip: vcap: Fix use-after-free error in kunit test")
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Jens Emil Schulz Østergaard <[email protected]>
Signed-off-by: Jinjie Ruan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet/sched: cbs: Fix integer overflow in cbs_set_port_rate()
Elena Salomatkina [Sun, 13 Oct 2024 12:45:29 +0000 (15:45 +0300)]
net/sched: cbs: Fix integer overflow in cbs_set_port_rate()

The subsequent calculation of port_rate = speed * 1000 * BYTES_PER_KBIT,
where BYTES_PER_KBIT is of type LL, may cause an overflow.
At least when speed = SPEED_20000, the value of the expression assigned
to port_rate will be greater than INT_MAX.
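
For scale: 20000 * 1000 * 125 = 2,500,000,000, which exceeds INT_MAX
(2,147,483,647). A minimal sketch that widens the speed to 64 bits before the
first multiplication, assuming 125 bytes per kilobit (1000 / 8); the macro
name is hypothetical and this is not necessarily how the kernel fix is
written.

```c
#include <limits.h>
#include <stdint.h>

#define BYTES_PER_KBIT_EX (1000LL / 8)	/* hypothetical: 125 bytes per kbit */

/* Link speed in Mbit/s to bytes per second, computed entirely in 64 bits. */
static int64_t port_rate_bytes(int speed_mbps)
{
	return (int64_t)speed_mbps * 1000 * BYTES_PER_KBIT_EX;
}
```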

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Elena Salomatkina <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoMerge branch 'net-phy-mdio-bcm-unimac-add-bcm6846-variant'
Jakub Kicinski [Wed, 16 Oct 2024 01:23:55 +0000 (18:23 -0700)]
Merge branch 'net-phy-mdio-bcm-unimac-add-bcm6846-variant'

Linus Walleij says:

====================
net: phy: mdio-bcm-unimac: Add BCM6846 variant

As pointed out by Florian:
https://lore.kernel.org/linux-devicetree/b542b2e8-115c-4234-a464-e73aa6bece5c@broadcom.com/

The BCM6846 has a few extra registers and cannot reuse the
compatible string from other variants of the Unimac
MDIO block: we need to be able to tell them apart.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: phy: mdio-bcm-unimac: Add BCM6846 support
Linus Walleij [Sat, 12 Oct 2024 20:35:23 +0000 (22:35 +0200)]
net: phy: mdio-bcm-unimac: Add BCM6846 support

Add a Unimac MDIO compatible string for the special BCM6846
variant.

This variant has a few extra registers compared to other
versions.

Suggested-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/linux-devicetree/[email protected]/
Signed-off-by: Linus Walleij <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agodt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio
Linus Walleij [Sat, 12 Oct 2024 20:35:22 +0000 (22:35 +0200)]
dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio

The MDIO block in the BCM6846 is not identical to any of the
previous versions, but has extended registers not present in
the other variants. For this reason we need to use a new
compatible specifically for this SoC.

Suggested-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/linux-devicetree/[email protected]/
Signed-off-by: Linus Walleij <[email protected]>
Acked-by: Rob Herring (Arm) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agoudp: Compute L4 checksum as usual when not segmenting the skb
Jakub Sitnicki [Fri, 11 Oct 2024 12:17:30 +0000 (14:17 +0200)]
udp: Compute L4 checksum as usual when not segmenting the skb

If:

  1) the user requested USO, but
  2) there is not enough payload for GSO to kick in, and
  3) the egress device doesn't offer checksum offload, then

we want to compute the L4 checksum in software early on.

In the case when we are not taking the GSO path, but it has been requested,
the software checksum fallback in skb_segment doesn't get a chance to
compute the full checksum, if the egress device can't do it. As a result we
end up sending UDP datagrams with only a partial checksum filled in, which
the peer will discard.
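
The UDP checksum (RFC 768) is a one's-complement sum over a pseudo-header,
the UDP header and the payload. With checksum offload, the stack leaves only
a partial value (the pseudo-header sum) in the field for the device to
complete; if nothing completes it, the peer sees a wrong checksum. The core
RFC 1071 sum that a software fallback must compute can be sketched as below
(pseudo-header handling omitted; uint32_t accumulation assumes buffers well
under 128 KiB).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum: one's-complement sum of 16-bit big-endian
 * words, carries folded back into the low 16 bits, result inverted.
 */
static uint16_t inet_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(data[i] << 8 | data[i + 1]);
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* pad the odd byte with zero */

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

With the RFC 1071 sample bytes 00 01 f2 03 f4 f5 f6 f7 this returns 0x220d,
and re-summing the data together with that checksum yields 0.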

Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
Reported-by: Ivan Babrou <[email protected]>
Signed-off-by: Jakub Sitnicki <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Cc: [email protected]
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agogenetlink: hold RCU in genlmsg_mcast()
Eric Dumazet [Fri, 11 Oct 2024 17:12:17 +0000 (17:12 +0000)]
genetlink: hold RCU in genlmsg_mcast()

While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw
one lockdep splat [1].

genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU.

Instead of letting all callers guard genlmsg_multicast_allns()
with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast().

This also means the @flags parameter is useless: we always need to use
GFP_ATOMIC.
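
Moving the read-side lock into the traversal helper can be sketched with a
mock lock-depth counter standing in for rcu_read_lock()/rcu_read_unlock().
All names are hypothetical. Because the real helper then runs inside an RCU
read-side section, it cannot sleep, which is why a caller-chosen gfp flag no
longer makes sense.

```c
#include <stddef.h>

/* Hypothetical read-side lock: just a nesting counter. */
static int read_lock_depth;
static void read_lock(void)   { read_lock_depth++; }
static void read_unlock(void) { read_lock_depth--; }

struct ns { struct ns *next; int id; };

/* Per-namespace work that is only valid under the read-side lock. */
static int send_to_ns(const struct ns *n)
{
	return read_lock_depth > 0 ? n->id : -1;	/* -1: "suspicious usage" */
}

/* The helper takes the lock itself, so callers cannot forget it. */
static int mcast_all(const struct ns *head)
{
	int sent = 0;

	read_lock();
	for (const struct ns *n = head; n; n = n->next)
		if (send_to_ns(n) >= 0)
			sent++;
	read_unlock();
	return sent;
}
```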

[1]
[10882.424136] =============================
[10882.424166] WARNING: suspicious RCU usage
[10882.424309] 6.12.0-rc2-virtme #1156 Not tainted
[10882.424400] -----------------------------
[10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!!
[10882.424469]
other info that might help us debug this:

[10882.424500]
rcu_scheduler_active = 2, debug_locks = 1
[10882.424744] 2 locks held by ip/15677:
[10882.424791] #0: ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219)
[10882.426334] #1: ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209)
[10882.426465]
stack backtrace:
[10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156
[10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[10882.427046] Call Trace:
[10882.427131]  <TASK>
[10882.427244] dump_stack_lvl (lib/dump_stack.c:123)
[10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
[10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7))
[10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink
[10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink
[10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115)
[10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210)
[10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink
[10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201)
[10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551)
[10882.428069] genl_rcv (net/netlink/genetlink.c:1220)
[10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357)
[10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901)
[10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1))

Fixes: 33f72e6f0c67 ("l2tp : multicast notification to the registered listeners")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: James Chapman <[email protected]>
Cc: Tom Parkin <[email protected]>
Cc: Johannes Berg <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agonet: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361
Peter Rashleigh [Mon, 14 Oct 2024 20:43:42 +0000 (13:43 -0700)]
net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361

According to the Marvell datasheet, the 88E6361 has two VTU pages
(4k VIDs per page) so the max_vid should be 8191, not 4095.

In the current implementation mv88e6xxx_vtu_walk() gives unexpected
results because of this error. I verified that mv88e6xxx_vtu_walk()
works correctly on the MV88E6361 with this patch in place.
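
The corrected value follows from 2 pages * 4096 VIDs - 1 = 8191. As a
trivial sketch (the helper name is hypothetical):

```c
/* VIDs addressable per VTU page, per the datasheet figure quoted above. */
#define VIDS_PER_PAGE 4096

/* Highest usable VID for a switch with the given number of VTU pages. */
static unsigned int max_vid_for_pages(unsigned int pages)
{
	return pages * VIDS_PER_PAGE - 1;
}
```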

Fixes: 12899f299803 ("net: dsa: mv88e6xxx: enable support for 88E6361 switch")
Signed-off-by: Peter Rashleigh <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months agotcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
Kuniyuki Iwashima [Mon, 14 Oct 2024 22:33:12 +0000 (15:33 -0700)]
tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().

Martin KaFai Lau reported use-after-free [0] in reqsk_timer_handler().

  """
  We are seeing a use-after-free from a bpf prog attached to
  trace_tcp_retransmit_synack. The program passes the req->sk to the
  bpf_sk_storage_get_tracing kernel helper which does check for null
  before using it.
  """

Commit 83fccfc3940c ("inet: fix potential deadlock in
reqsk_queue_unlink()") added timer_pending() in reqsk_queue_unlink() to avoid
calling del_timer_sync() from reqsk_timer_handler(), but it introduced a
small race window.

Before the timer callback runs, expire_timers() calls detach_timer(timer, true)
to clear timer->entry.pprev and marks it as not pending.

If reqsk_queue_unlink() checks timer_pending() just after expire_timers()
calls detach_timer(), TCP will miss del_timer_sync(); the reqsk timer will
continue running and send multiple SYN+ACKs until it expires.

The reported UAF could happen if req->sk is close()d earlier than the timer
expiration, which is 63s by default.

The scenario would be

  1. inet_csk_complete_hashdance() calls inet_csk_reqsk_queue_drop(),
     but del_timer_sync() is missed

  2. reqsk timer is executed and scheduled again

  3. req->sk is accept()ed and reqsk_put() decrements rsk_refcnt, but
     reqsk timer still has another one, and inet_csk_accept() does not
     clear req->sk for non-TFO sockets

  4. sk is close()d

  5. reqsk timer is executed again, and BPF touches req->sk

Instead of relying on timer_pending(), pass the caller context to
__inet_csk_reqsk_queue_drop().
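
Passing the caller context instead of probing timer state can be sketched as
below. timer_pending() is racy here because the timer is detached (marked not
pending) just before its handler runs, so a concurrent check can wrongly skip
the synchronous cancel. The struct and functions are hypothetical; only the
idea of an explicit "called from the timer" argument comes from the message
above.

```c
#include <stdbool.h>

/* Hypothetical, simplified request-socket state. */
struct req {
	bool timer_armed;	/* the retransmit timer may still fire */
	bool in_queue;
	int  timer_syncs;	/* counts synchronous cancels, for illustration */
};

/* Stand-in for del_timer_sync(): cancel and wait for a running handler. */
static void cancel_timer_sync(struct req *r)
{
	r->timer_armed = false;
	r->timer_syncs++;
}

/*
 * The caller says whether it is the timer handler itself. Everyone else
 * always cancels synchronously; the handler must not wait for itself.
 */
static bool queue_drop(struct req *r, bool from_timer)
{
	bool was_queued = r->in_queue;

	r->in_queue = false;
	if (!from_timer)
		cancel_timer_sync(r);
	return was_queued;
}
```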

Note that the reqsk timer is pinned, so the issue does not happen in most
use cases. [1]

[0]
BUG: KFENCE: use-after-free read in bpf_sk_storage_get_tracing+0x2e/0x1b0

Use-after-free read at 0x00000000a891fb3a (in kfence-#1):
bpf_sk_storage_get_tracing+0x2e/0x1b0
bpf_prog_5ea3e95db6da0438_tcp_retransmit_synack+0x1d20/0x1dda
bpf_trace_run2+0x4c/0xc0
tcp_rtx_synack+0xf9/0x100
reqsk_timer_handler+0xda/0x3d0
run_timer_softirq+0x292/0x8a0
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
intel_idle_irq+0x5a/0xa0
cpuidle_enter_state+0x94/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb

kfence-#1: 0x00000000a72cc7b6-0x00000000d97616d9, size=2376, cache=TCPv6

allocated by task 0 on cpu 9 at 260507.901592s:
sk_prot_alloc+0x35/0x140
sk_clone_lock+0x1f/0x3f0
inet_csk_clone_lock+0x15/0x160
tcp_create_openreq_child+0x1f/0x410
tcp_v6_syn_recv_sock+0x1da/0x700
tcp_check_req+0x1fb/0x510
tcp_v6_rcv+0x98b/0x1420
ipv6_list_rcv+0x2258/0x26e0
napi_complete_done+0x5b1/0x2990
mlx5e_napi_poll+0x2ae/0x8d0
net_rx_action+0x13e/0x590
irq_exit_rcu+0xf5/0x320
common_interrupt+0x80/0x90
asm_common_interrupt+0x22/0x40
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb

freed by task 0 on cpu 9 at 260507.927527s:
rcu_core_si+0x4ff/0xf10
irq_exit_rcu+0xf5/0x320
sysvec_apic_timer_interrupt+0x6d/0x80
asm_sysvec_apic_timer_interrupt+0x16/0x20
cpuidle_enter_state+0xfb/0x273
cpu_startup_entry+0x15e/0x260
start_secondary+0x8a/0x90
secondary_startup_64_no_verify+0xfa/0xfb

Fixes: 83fccfc3940c ("inet: fix potential deadlock in reqsk_queue_unlink()")
Reported-by: Martin KaFai Lau <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/
Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Martin KaFai Lau <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months ago: neighbour: Remove NEIGH_DN_TABLE.
Kuniyuki Iwashima [Mon, 14 Oct 2024 23:52:16 +0000 (16:52 -0700)]
neighbour: Remove NEIGH_DN_TABLE.

Since commit 1202cdd66531 ("Remove DECnet support from kernel"),
NEIGH_DN_TABLE is no longer used.

MPLS has an implicit dependency on it in nla_put_via(), but nla_get_via()
does not support DECnet.

Let's remove NEIGH_DN_TABLE.

Now neigh_tables[] has only 2 elements, and many places no longer
iterate over an extra DECnet entry.
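This small sketch shows why shrinking the table enum matters for code that indexes by table. NeighTable, TABLE_TO_FAMILY and families() are hypothetical illustrations, not the kernel's definitions. The sketch only assumes that the two remaining tables are ARP (IPv4) and ND (IPv6) and that the table count follows them.

```python
from enum import IntEnum


class NeighTable(IntEnum):
    """Hypothetical mirror of the neighbour-table indices after the change."""
    ARP = 0        # IPv4
    ND = 1         # IPv6
    NR_TABLES = 2  # count of real tables; no DECnet slot any more


# Illustrative index -> family map. It must have exactly NR_TABLES entries;
# otherwise every lookup after a removed slot would be shifted.
TABLE_TO_FAMILY = ["AF_INET", "AF_INET6"]
assert len(TABLE_TO_FAMILY) == NeighTable.NR_TABLES


def families():
    # Visits real tables only: two iterations instead of three.
    return [TABLE_TO_FAMILY[t] for t in range(NeighTable.NR_TABLES)]


print(families())  # ['AF_INET', 'AF_INET6']
```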

Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months ago: net: bcmasp: fix potential memory leak in bcmasp_xmit()
Wang Hai [Mon, 14 Oct 2024 14:59:01 +0000 (22:59 +0800)]
net: bcmasp: fix potential memory leak in bcmasp_xmit()

bcmasp_xmit() returns NETDEV_TX_OK without freeing the skb when
mapping fails. Add dev_kfree_skb() to fix it.
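The ownership rule behind this fix fits in a few lines. In this Python sketch, Skb and xmit() are hypothetical stand-ins, not the bcmasp code. Returning NETDEV_TX_OK from a transmit routine tells the stack that the driver has taken the skb. An early-return error path therefore has to free the skb itself.

```python
NETDEV_TX_OK = 0  # the kernel defines NETDEV_TX_OK as 0x00


class Skb:
    """Toy socket buffer that tracks how many are still allocated."""
    live = 0

    def __init__(self):
        Skb.live += 1
        self.freed = False

    def free(self):
        # Stands in for dev_kfree_skb(): release the buffer exactly once.
        if self.freed:
            raise RuntimeError("double free")
        self.freed = True
        Skb.live -= 1


def xmit(skb, mapping_ok):
    """Hypothetical transmit routine. Returning NETDEV_TX_OK means the driver
    now owns the skb, so every path must either queue it or free it."""
    if not mapping_ok:
        skb.free()  # the fix: without this call the skb would leak
        return NETDEV_TX_OK
    # On success the skb would be freed later, on TX completion (not modelled).
    return NETDEV_TX_OK


xmit(Skb(), mapping_ok=False)
print(Skb.live)  # 0: the failed skb was released
```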

Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Wang Hai <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months ago: net: cxgb3: Remove stid deadcode
Dr. David Alan Gilbert [Sun, 13 Oct 2024 01:29:46 +0000 (02:29 +0100)]
net: cxgb3: Remove stid deadcode

cxgb3_alloc_stid() and cxgb3_free_stid() have been unused since
commit 30e0f6cf5acb ("RDMA/iw_cxgb3: Remove the iw_cxgb3 module
from kernel").

Remove them.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
5 months ago: Merge branch 'cxgb4-deadcode-removal'
Jakub Kicinski [Tue, 15 Oct 2024 23:47:45 +0000 (16:47 -0700)]
Merge branch 'cxgb4-deadcode-removal'

Dr. David Alan Gilbert says:

====================
cxgb4: Deadcode removal

This is a bunch of deadcode removal in cxgb4.

It's all complete function removal rather than any actual change to
logic.

Build and boot tested, but I don't have the hardware to test
the actual card.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>