]> Git Repo - linux.git/log
linux.git
4 months agoMerge branch 'bonding-fix-ns-targets-not-work-on-hardware-nic'
Paolo Abeni [Thu, 14 Nov 2024 10:16:30 +0000 (11:16 +0100)]
Merge branch 'bonding-fix-ns-targets-not-work-on-hardware-nic'

Hangbin Liu says:

====================
bonding: fix ns targets not work on hardware NIC

The first patch fixed ns targets not work on hardware NIC when bonding
set arp_validate.

The second patch add a related selftest for bonding.

v4: Thanks Nikolay for the comments:
    use bond_slave_ns_maddrs_{add/del} with clear name
    fix comments typos
    remove _slave_set_ns_maddrs underscore directly
    update bond_option_arp_validate_set() change logic
v3: use ndisc_mc_map to convert the mcast mac address (Jay Vosburgh)
v2: only add/del mcast group on backup slaves when arp_validate is set (Jay Vosburgh)
    arp_validate doesn't support 3ad, tlb, alb. So let's only do it on ab mode.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
4 months agoselftests: bonding: add ns multicast group testing
Hangbin Liu [Mon, 11 Nov 2024 10:16:50 +0000 (10:16 +0000)]
selftests: bonding: add ns multicast group testing

Add a test to make sure the backup slaves join correct multicast group
when arp_validate enabled and ns_ip6_target is set. Here is the result:

TEST: arp_validate (active-backup ns_ip6_target arp_validate 0)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 1)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 2)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 3)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 4)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 5)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 6)     [ OK ]
TEST: arp_validate (join mcast group)                               [ OK ]

Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agobonding: add ns target multicast address to slave device
Hangbin Liu [Mon, 11 Nov 2024 10:16:49 +0000 (10:16 +0000)]
bonding: add ns target multicast address to slave device

Commit 4598380f9c54 ("bonding: fix ns validation on backup slaves")
tried to resolve the issue where backup slaves couldn't be brought up when
receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only
worked for drivers that receive all multicast messages, such as the veth
interface.

For standard drivers, the NS multicast message is silently dropped because
the slave device is not a member of the NS target multicast group.

To address this, we need to make the slave device join the NS target
multicast group, ensuring it can receive these IPv6 NS messages to validate
the slave’s status properly.

There are three policies before joining the multicast group:
1. All settings must be under active-backup mode (alb and tlb do not support
   arp_validate), with backup slaves and slaves supporting multicast.
2. We can add or remove multicast groups when arp_validate changes.
3. Other operations, such as enslaving, releasing, or setting NS targets,
   need to be guarded by arp_validate.

Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agonet: ti: icssg-prueth: Fix 1 PPS sync
Meghana Malladi [Mon, 11 Nov 2024 09:58:42 +0000 (15:28 +0530)]
net: ti: icssg-prueth: Fix 1 PPS sync

The first PPS latch time needs to be calculated by the driver
(in rounded off seconds) and configured as the start time
offset for the cycle. After synchronizing two PTP clocks
running as master/slave, missing this would cause master
and slave to start immediately with some milliseconds
drift which causes the PPS signal to never synchronize with
the PTP master.

Fixes: 186734c15886 ("net: ti: icssg-prueth: add packet timestamping and ptp support")
Signed-off-by: Meghana Malladi <[email protected]>
Reviewed-by: Vadim Fedorenko <[email protected]>
Reviewed-by: MD Danish Anwar <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
4 months agostmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines
Vitalii Mordan [Fri, 8 Nov 2024 17:33:34 +0000 (20:33 +0300)]
stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines

If the clock dwmac->tx_clk was not enabled in intel_eth_plat_probe,
it should not be disabled in any path.

Conversely, if it was enabled in intel_eth_plat_probe, it must be disabled
in all error paths to ensure proper cleanup.

Found by Linux Verification Center (linuxtesting.org) with Klever.

Fixes: 9efc9b2b04c7 ("net: stmmac: Add dwmac-intel-plat for GBE driver")
Signed-off-by: Vitalii Mordan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet: Make copy_safe_from_sockptr() match documentation
Michal Luczaj [Sun, 10 Nov 2024 23:17:34 +0000 (00:17 +0100)]
net: Make copy_safe_from_sockptr() match documentation

copy_safe_from_sockptr()
  return copy_from_sockptr()
    return copy_from_sockptr_offset()
      return copy_from_user()

copy_from_user() does not return an error on fault. Instead, it returns a
number of bytes that were not copied. Have it handled.

Patch has a side effect: it un-breaks garbage input handling of
nfc_llcp_setsockopt() and mISDN's data_sock_setsockopt().

Fixes: 6309863b31dd ("net: add copy_safe_from_sockptr() helper")
Signed-off-by: Michal Luczaj <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol
Nícolas F. R. A. Prado [Sat, 9 Nov 2024 15:16:32 +0000 (10:16 -0500)]
net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol

The mediatek,mac-wol property is being handled backwards to what is
described in the binding: it currently enables PHY WOL when the property
is present and vice versa. Invert the driver logic so it matches the
binding description.

Fixes: fd1d62d80ebc ("net: stmmac: replace the use_phy_wol field with a flag")
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
Link: https://patch.msgid.link/20241109-mediatek-mac-wol-noninverted-v2-1-0e264e213878@collabora.com
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoipmr: Fix access to mfc_cache_list without lock held
Breno Leitao [Fri, 8 Nov 2024 14:08:36 +0000 (06:08 -0800)]
ipmr: Fix access to mfc_cache_list without lock held

Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the
following code flow, the RCU read lock is not held, causing the
following error when `RCU_PROVE` is not held. The same problem might
show up in the IPv6 code path.

6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G            E    N
-----------------------------
net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!

rcu_scheduler_active = 2, debug_locks = 1
   2 locks held by RetransmitAggre/3519:
    #0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290
    #1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90

stack backtrace:
    lockdep_rcu_suspicious
    mr_table_dump
    ipmr_rtm_dumproute
    rtnl_dump_all
    rtnl_dumpit
    netlink_dump
    __netlink_dump_start
    rtnetlink_rcv_msg
    netlink_rcv_skb
    netlink_unicast
    netlink_sendmsg

This is not a problem per see, since the RTNL lock is held here, so, it
is safe to iterate in the list without the RCU read lock, as suggested
by Eric.

To alleviate the concern, modify the code to use
list_for_each_entry_rcu() with the RTNL-held argument.

The annotation will raise an error only if RTNL or RCU read lock are
missing during iteration, signaling a legitimate problem, otherwise it
will avoid this false positive.

This will solve the IPv6 case as well, since ip6mr_rtm_dumproute() calls
this function as well.

Signed-off-by: Breno Leitao <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agosamples: pktgen: correct dev to DEV
Wei Fang [Tue, 12 Nov 2024 03:03:47 +0000 (11:03 +0800)]
samples: pktgen: correct dev to DEV

In the pktgen_sample01_simple.sh script, the device variable is uppercase
'DEV' instead of lowercase 'dev'. Because of this typo, the script cannot
enable UDP tx checksum.

Fixes: 460a9aa23de6 ("samples: pktgen: add UDP tx checksum support")
Signed-off-by: Wei Fang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet: phylink: ensure PHY momentary link-fails are handled
Russell King (Oracle) [Tue, 12 Nov 2024 16:20:00 +0000 (16:20 +0000)]
net: phylink: ensure PHY momentary link-fails are handled

Normally, phylib won't notify changes in quick succession. However, as
a result of commit 3e43b903da04 ("net: phy: Immediately call
adjust_link if only tx_lpi_enabled changes") this is no longer true -
it is now possible that phy_link_down() and phy_link_up() will both
complete before phylink's resolver has run, which means it'll miss that
pl->phy_state.link momentarily became false.

Rename "mac_link_dropped" to be more generic "link_failed" since it will
cover more than the MAC/PCS end of the link failing, and arrange to set
this in phylink_phy_change() if we notice that the PHY reports that the
link is down.

This will ensure that we capture an EEE reconfiguration event.

Fixes: 3e43b903da04 ("net: phy: Immediately call adjust_link if only tx_lpi_enabled changes")
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Oleksij Rempel <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge branch 'mptcp-pm-a-few-more-fixes'
Jakub Kicinski [Thu, 14 Nov 2024 02:51:09 +0000 (18:51 -0800)]
Merge branch 'mptcp-pm-a-few-more-fixes'

Matthieu Baerts says:

====================
mptcp: pm: a few more fixes

Three small fixes related to the MPTCP path-manager:

- Patch 1: correctly reflect the backup flag to the corresponding local
  address entry of the userspace path-manager. A fix for v5.19.

- Patch 2: hold the PM lock when deleting an entry from the local
  addresses of the userspace path-manager to avoid messing up with this
  list. A fix for v5.19.

- Patch 3: use _rcu variant to iterate the in-kernel path-manager's
  local addresses list, when under rcu_read_lock(). A fix for v5.17.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agomptcp: pm: use _rcu variant under rcu_read_lock
Matthieu Baerts (NGI0) [Tue, 12 Nov 2024 19:18:35 +0000 (20:18 +0100)]
mptcp: pm: use _rcu variant under rcu_read_lock

In mptcp_pm_create_subflow_or_signal_addr(), rcu_read_(un)lock() are
used as expected to iterate over the list of local addresses, but
list_for_each_entry() was used instead of list_for_each_entry_rcu() in
__lookup_addr(). It is important to use this variant which adds the
required READ_ONCE() (and diagnostic checks if enabled).

Because __lookup_addr() is also used in mptcp_pm_nl_set_flags() where it
is called under the pernet->lock and not rcu_read_lock(), an extra
condition is then passed to help the diagnostic checks making sure
either the associated spin lock or the RCU lock is held.

Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk")
Cc: [email protected]
Reviewed-by: Geliang Tang <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agomptcp: hold pm lock when deleting entry
Geliang Tang [Tue, 12 Nov 2024 19:18:34 +0000 (20:18 +0100)]
mptcp: hold pm lock when deleting entry

When traversing userspace_pm_local_addr_list and deleting an entry from
it in mptcp_pm_nl_remove_doit(), msk->pm.lock should be held.

This patch holds this lock before mptcp_userspace_pm_lookup_addr_by_id()
and releases it after list_move() in mptcp_pm_nl_remove_doit().

Fixes: d9a4594edabf ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE")
Cc: [email protected]
Signed-off-by: Geliang Tang <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agomptcp: update local address flags when setting it
Geliang Tang [Tue, 12 Nov 2024 19:18:33 +0000 (20:18 +0100)]
mptcp: update local address flags when setting it

Just like in-kernel pm, when userspace pm does set_flags, it needs to send
out MP_PRIO signal, and also modify the flags of the corresponding address
entry in the local address list. This patch implements the missing logic.

Traverse all address entries on userspace_pm_local_addr_list to find the
local address entry, if bkup is true, set the flags of this entry with
FLAG_BACKUP, otherwise, clear FLAG_BACKUP.

Fixes: 892f396c8e68 ("mptcp: netlink: issue MP_PRIO signals from userspace PMs")
Cc: [email protected]
Signed-off-by: Geliang Tang <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.
Alexandre Ferrieux [Sun, 10 Nov 2024 17:28:36 +0000 (18:28 +0100)]
net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.

To generate hnode handles (in gen_new_htid()), u32 uses IDR and
encodes the returned small integer into a structured 32-bit
word. Unfortunately, at disposal time, the needed decoding
is not done. As a result, idr_remove() fails, and the IDR
fills up. Since its size is 2048, the following script ends up
with "Filter already exists":

  tc filter add dev myve $FILTER1
  tc filter add dev myve $FILTER2
  for i in {1..2048}
  do
    echo $i
    tc filter del dev myve $FILTER2
    tc filter add dev myve $FILTER2
  done

This patch adds the missing decoding logic for handles that
deserve it.

Fixes: e7614370d6f0 ("net_sched: use idr to allocate u32 filter handles")
Reviewed-by: Eric Dumazet <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Alexandre Ferrieux <[email protected]>
Tested-by: Victor Nogueira <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMAINTAINERS: Re-add cancelled Renesas driver sections
Geert Uytterhoeven [Mon, 11 Nov 2024 10:03:21 +0000 (11:03 +0100)]
MAINTAINERS: Re-add cancelled Renesas driver sections

Removing full driver sections also removed mailing list entries, causing
submitters of future patches to forget CCing these mailing lists.

Hence re-add the sections for the Renesas Ethernet AVB, R-Car SATA, and
SuperH Ethernet drivers.  Add people who volunteered to maintain these
drivers (thanks a lot!), and mark all of them as supported.

Fixes: 6e90b675cf942e50 ("MAINTAINERS: Remove some entries due to various compliance requirements.")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Acked-by: Niklas Cassel <[email protected]>
Acked-by: Niklas Söderlund <[email protected]>
Reviewed-by: Paul Barker <[email protected]>
Link: https://patch.msgid.link/4b2105332edca277f07ffa195796975e9ddce994.1731319098.git.geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoRevert "igb: Disable threaded IRQ for igb_msix_other"
Wander Lairson Costa [Wed, 6 Nov 2024 11:14:26 +0000 (08:14 -0300)]
Revert "igb: Disable threaded IRQ for igb_msix_other"

This reverts commit 338c4d3902feb5be49bfda530a72c7ab860e2c9f.

Sebastian noticed the ISR indirectly acquires spin_locks, which are
sleeping locks under PREEMPT_RT, which leads to kernel splats.

Fixes: 338c4d3902feb ("igb: Disable threaded IRQ for igb_msix_other")
Reported-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Wander Lairson Costa <[email protected]>
Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Przemek Kitszel <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge tag 'for-net-2024-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Wed, 13 Nov 2024 01:30:41 +0000 (17:30 -0800)]
Merge tag 'for-net-2024-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - btintel: Direct exception event to bluetooth stack
 - hci_core: Fix calling mgmt_device_connected

* tag 'for-net-2024-11-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: btintel: Direct exception event to bluetooth stack
  Bluetooth: hci_core: Fix calling mgmt_device_connected
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoBluetooth: btintel: Direct exception event to bluetooth stack
Kiran K [Tue, 22 Oct 2024 09:11:34 +0000 (14:41 +0530)]
Bluetooth: btintel: Direct exception event to bluetooth stack

Have exception event part of HCI traces which helps for debug.

snoop traces:
> HCI Event: Vendor (0xff) plen 79
        Vendor Prefix (0x8780)
      Intel Extended Telemetry (0x03)
        Unknown extended telemetry event type (0xde)
        01 01 de
        Unknown extended subevent 0x07
        01 01 de 07 01 de 06 1c ef be ad de ef be ad de
        ef be ad de ef be ad de ef be ad de ef be ad de
        ef be ad de 05 14 ef be ad de ef be ad de ef be
        ad de ef be ad de ef be ad de 43 10 ef be ad de
        ef be ad de ef be ad de ef be ad de

Fixes: af395330abed ("Bluetooth: btintel: Add Intel devcoredump support")
Signed-off-by: Kiran K <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
4 months agoBluetooth: hci_core: Fix calling mgmt_device_connected
Luiz Augusto von Dentz [Fri, 8 Nov 2024 16:19:54 +0000 (11:19 -0500)]
Bluetooth: hci_core: Fix calling mgmt_device_connected

Since 61a939c68ee0 ("Bluetooth: Queue incoming ACL data until
BT_CONNECTED state is reached") there is no long the need to call
mgmt_device_connected as ACL data will be queued until BT_CONNECTED
state.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219458
Link: https://github.com/bluez/bluez/issues/1014
Fixes: 333b4fd11e89 ("Bluetooth: L2CAP: Fix uaf in l2cap_connect")
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
4 months agoMerge branch 'virtio-vsock-fix-memory-leaks'
Paolo Abeni [Tue, 12 Nov 2024 11:16:54 +0000 (12:16 +0100)]
Merge branch 'virtio-vsock-fix-memory-leaks'

Michal Luczaj says:

====================
virtio/vsock: Fix memory leaks

Short series fixing some memory leaks that I've stumbled upon while toying
with the selftests.

Signed-off-by: Michal Luczaj <[email protected]>
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
4 months agovirtio/vsock: Improve MSG_ZEROCOPY error handling
Michal Luczaj [Thu, 7 Nov 2024 20:46:14 +0000 (21:46 +0100)]
virtio/vsock: Improve MSG_ZEROCOPY error handling

Add a missing kfree_skb() to prevent memory leaks.

Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Michal Luczaj <[email protected]>
Acked-by: Arseniy Krasnov <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agovsock: Fix sk_error_queue memory leak
Michal Luczaj [Thu, 7 Nov 2024 20:46:13 +0000 (21:46 +0100)]
vsock: Fix sk_error_queue memory leak

Kernel queues MSG_ZEROCOPY completion notifications on the error queue.
Where they remain, until explicitly recv()ed. To prevent memory leaks,
clean up the queue when the socket is destroyed.

unreferenced object 0xffff8881028beb00 (size 224):
  comm "vsock_test", pid 1218, jiffies 4294694897
  hex dump (first 32 bytes):
    90 b0 21 17 81 88 ff ff 90 b0 21 17 81 88 ff ff  ..!.......!.....
    00 00 00 00 00 00 00 00 00 b0 21 17 81 88 ff ff  ..........!.....
  backtrace (crc 6c7031ca):
    [<ffffffff81418ef7>] kmem_cache_alloc_node_noprof+0x2f7/0x370
    [<ffffffff81d35882>] __alloc_skb+0x132/0x180
    [<ffffffff81d2d32b>] sock_omalloc+0x4b/0x80
    [<ffffffff81d3a8ae>] msg_zerocopy_realloc+0x9e/0x240
    [<ffffffff81fe5cb2>] virtio_transport_send_pkt_info+0x412/0x4c0
    [<ffffffff81fe6183>] virtio_transport_stream_enqueue+0x43/0x50
    [<ffffffff81fe0813>] vsock_connectible_sendmsg+0x373/0x450
    [<ffffffff81d233d5>] ____sys_sendmsg+0x365/0x3a0
    [<ffffffff81d246f4>] ___sys_sendmsg+0x84/0xd0
    [<ffffffff81d26f47>] __sys_sendmsg+0x47/0x80
    [<ffffffff820d3df3>] do_syscall_64+0x93/0x180
    [<ffffffff8220012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
Signed-off-by: Michal Luczaj <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Acked-by: Arseniy Krasnov <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agovirtio/vsock: Fix accept_queue memory leak
Michal Luczaj [Thu, 7 Nov 2024 20:46:12 +0000 (21:46 +0100)]
virtio/vsock: Fix accept_queue memory leak

As the final stages of socket destruction may be delayed, it is possible
that virtio_transport_recv_listen() will be called after the accept_queue
has been flushed, but before the SOCK_DONE flag has been set. As a result,
sockets enqueued after the flush would remain unremoved, leading to a
memory leak.

vsock_release
  __vsock_release
    lock
    virtio_transport_release
      virtio_transport_close
        schedule_delayed_work(close_work)
    sk_shutdown = SHUTDOWN_MASK
(!) flush accept_queue
    release
                                        virtio_transport_recv_pkt
                                          vsock_find_bound_socket
                                          lock
                                          if flag(SOCK_DONE) return
                                          virtio_transport_recv_listen
                                            child = vsock_create_connected
                                      (!)   vsock_enqueue_accept(child)
                                          release
close_work
  lock
  virtio_transport_do_close
    set_flag(SOCK_DONE)
    virtio_transport_remove_sock
      vsock_remove_sock
        vsock_remove_bound
  release

Introduce a sk_shutdown check to disallow vsock_enqueue_accept() during
socket destruction.

unreferenced object 0xffff888109e3f800 (size 2040):
  comm "kworker/5:2", pid 371, jiffies 4294940105
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    28 00 0b 40 00 00 00 00 00 00 00 00 00 00 00 00  (..@............
  backtrace (crc 9e5f4e84):
    [<ffffffff81418ff1>] kmem_cache_alloc_noprof+0x2c1/0x360
    [<ffffffff81d27aa0>] sk_prot_alloc+0x30/0x120
    [<ffffffff81d2b54c>] sk_alloc+0x2c/0x4b0
    [<ffffffff81fe049a>] __vsock_create.constprop.0+0x2a/0x310
    [<ffffffff81fe6d6c>] virtio_transport_recv_pkt+0x4dc/0x9a0
    [<ffffffff81fe745d>] vsock_loopback_work+0xfd/0x140
    [<ffffffff810fc6ac>] process_one_work+0x20c/0x570
    [<ffffffff810fce3f>] worker_thread+0x1bf/0x3a0
    [<ffffffff811070dd>] kthread+0xdd/0x110
    [<ffffffff81044fdd>] ret_from_fork+0x2d/0x50
    [<ffffffff8100785a>] ret_from_fork_asm+0x1a/0x30

Fixes: 3fe356d58efa ("vsock/virtio: discard packets only when socket is really closed")
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Michal Luczaj <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agoMerge branch 'mlx5-misc-fixes-2024-11-07'
Jakub Kicinski [Tue, 12 Nov 2024 03:23:40 +0000 (19:23 -0800)]
Merge branch 'mlx5-misc-fixes-2024-11-07'

Tariq Toukan says:

====================
mlx5 misc fixes 2024-11-07

This patchset provides misc bug fixes from the team to the mlx5 core and
Eth drivers.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet/mlx5e: Disable loopback self-test on multi-PF netdev
Carolina Jubran [Thu, 7 Nov 2024 18:35:27 +0000 (20:35 +0200)]
net/mlx5e: Disable loopback self-test on multi-PF netdev

In Multi-PF (Socket Direct) configurations, when a loopback packet is
sent through one of the secondary devices, it will always be received
on the primary device. This causes the loopback layer to fail in
identifying the loopback packet as the devices are different.

To avoid false test failures, disable the loopback self-test in
Multi-PF configurations.

Fixes: ed29705e4ed1 ("net/mlx5: Enable SD feature")
Signed-off-by: Carolina Jubran <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet/mlx5e: CT: Fix null-ptr-deref in add rule err flow
Moshe Shemesh [Thu, 7 Nov 2024 18:35:26 +0000 (20:35 +0200)]
net/mlx5e: CT: Fix null-ptr-deref in add rule err flow

In error flow of mlx5_tc_ct_entry_add_rule(), in case ct_rule_add()
callback returns error, zone_rule->attr is used uninitiated. Fix it to
use attr which has the needed pointer value.

Kernel log:
 BUG: kernel NULL pointer dereference, address: 0000000000000110
 RIP: 0010:mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core]

 Call Trace:
  <TASK>
  ? __die+0x20/0x70
  ? page_fault_oops+0x150/0x3e0
  ? exc_page_fault+0x74/0x140
  ? asm_exc_page_fault+0x22/0x30
  ? mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core]
  ? mlx5_tc_ct_entry_add_rule+0x1d5/0x2f0 [mlx5_core]
  mlx5_tc_ct_block_flow_offload+0xc6a/0xf90 [mlx5_core]
  ? nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table]
  nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table]
  flow_offload_work_handler+0x142/0x320 [nf_flow_table]
  ? finish_task_switch.isra.0+0x15b/0x2b0
  process_one_work+0x16c/0x320
  worker_thread+0x28c/0x3a0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xb8/0xf0
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x2d/0x50
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>

Fixes: 7fac5c2eced3 ("net/mlx5: CT: Avoid reusing modify header context for natted entries")
Signed-off-by: Moshe Shemesh <[email protected]>
Reviewed-by: Cosmin Ratiu <[email protected]>
Reviewed-by: Yevgeny Kliteynik <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet/mlx5e: clear xdp features on non-uplink representors
William Tu [Thu, 7 Nov 2024 18:35:25 +0000 (20:35 +0200)]
net/mlx5e: clear xdp features on non-uplink representors

Non-uplink representor port does not support XDP. The patch clears
the xdp feature by checking the net_device_ops.ndo_bpf is set or not.

Verify using the netlink tool:
$ tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml --dump dev-get

Representor netdev before the patch:
{'ifindex': 8,
  'xdp-features': {'basic',
                   'ndo-xmit',
                   'ndo-xmit-sg',
                   'redirect',
                   'rx-sg',
                   'xsk-zerocopy'},
  'xdp-rx-metadata-features': set(),
  'xdp-zc-max-segs': 1,
  'xsk-features': set()},
With the patch:
 {'ifindex': 8,
  'xdp-features': set(),
  'xdp-rx-metadata-features': set(),
  'xsk-features': set()},

Fixes: 4d5ab0ad964d ("net/mlx5e: take into account device reconfiguration for xdp_features flag")
Signed-off-by: William Tu <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet/mlx5e: kTLS, Fix incorrect page refcounting
Dragos Tatulea [Thu, 7 Nov 2024 18:35:24 +0000 (20:35 +0200)]
net/mlx5e: kTLS, Fix incorrect page refcounting

The kTLS tx handling code is using a mix of get_page() and
page_ref_inc() APIs to increment the page reference. But on the release
path (mlx5e_ktls_tx_handle_resync_dump_comp()), only put_page() is used.

This is an issue when using pages from large folios: the get_page()
references are stored on the folio page while the page_ref_inc()
references are stored directly in the given page. On release the folio
page will be dereferenced too many times.

This was found while doing kTLS testing with sendfile() + ZC when the
served file was read from NFS on a kernel with NFS large folios support
(commit 49b29a573da8 ("nfs: add support for large folios")).

Fixes: 84d1bb2b139e ("net/mlx5e: kTLS, Limit DUMP wqe size")
Signed-off-by: Dragos Tatulea <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet/mlx5: fs, lock FTE when checking if active
Mark Bloch [Thu, 7 Nov 2024 18:35:23 +0000 (20:35 +0200)]
net/mlx5: fs, lock FTE when checking if active

The referenced commits introduced a two-step process for deleting FTEs:

- Lock the FTE, delete it from hardware, set the hardware deletion function
  to NULL and unlock the FTE.
- Lock the parent flow group, delete the software copy of the FTE, and
  remove it from the xarray.

However, this approach encounters a race condition if a rule with the same
match value is added simultaneously. In this scenario, fs_core may set the
hardware deletion function to NULL prematurely, causing a panic during
subsequent rule deletions.

To prevent this, ensure the active flag of the FTE is checked under a lock,
which will prevent the fs_core layer from attaching a new steering rule to
an FTE that is in the process of deletion.

[  438.967589] MOSHE: 2496 mlx5_del_flow_rules del_hw_func
[  438.968205] ------------[ cut here ]------------
[  438.968654] refcount_t: decrement hit 0; leaking memory.
[  438.969249] WARNING: CPU: 0 PID: 8957 at lib/refcount.c:31 refcount_warn_saturate+0xfb/0x110
[  438.970054] Modules linked in: act_mirred cls_flower act_gact sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core zram zsmalloc fuse [last unloaded: cls_flower]
[  438.973288] CPU: 0 UID: 0 PID: 8957 Comm: tc Not tainted 6.12.0-rc1+ #8
[  438.973888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  438.974874] RIP: 0010:refcount_warn_saturate+0xfb/0x110
[  438.975363] Code: 40 66 3b 82 c6 05 16 e9 4d 01 01 e8 1f 7c a0 ff 0f 0b c3 cc cc cc cc 48 c7 c7 10 66 3b 82 c6 05 fd e8 4d 01 01 e8 05 7c a0 ff <0f> 0b c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90
[  438.976947] RSP: 0018:ffff888124a53610 EFLAGS: 00010286
[  438.977446] RAX: 0000000000000000 RBX: ffff888119d56de0 RCX: 0000000000000000
[  438.978090] RDX: ffff88852c828700 RSI: ffff88852c81b3c0 RDI: ffff88852c81b3c0
[  438.978721] RBP: ffff888120fa0e88 R08: 0000000000000000 R09: ffff888124a534b0
[  438.979353] R10: 0000000000000001 R11: 0000000000000001 R12: ffff888119d56de0
[  438.979979] R13: ffff888120fa0ec0 R14: ffff888120fa0ee8 R15: ffff888119d56de0
[  438.980607] FS:  00007fe6dcc0f800(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000
[  438.983984] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  438.984544] CR2: 00000000004275e0 CR3: 0000000186982001 CR4: 0000000000372eb0
[  438.985205] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  438.985842] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  438.986507] Call Trace:
[  438.986799]  <TASK>
[  438.987070]  ? __warn+0x7d/0x110
[  438.987426]  ? refcount_warn_saturate+0xfb/0x110
[  438.987877]  ? report_bug+0x17d/0x190
[  438.988261]  ? prb_read_valid+0x17/0x20
[  438.988659]  ? handle_bug+0x53/0x90
[  438.989054]  ? exc_invalid_op+0x14/0x70
[  438.989458]  ? asm_exc_invalid_op+0x16/0x20
[  438.989883]  ? refcount_warn_saturate+0xfb/0x110
[  438.990348]  mlx5_del_flow_rules+0x2f7/0x340 [mlx5_core]
[  438.990932]  __mlx5_eswitch_del_rule+0x49/0x170 [mlx5_core]
[  438.991519]  ? mlx5_lag_is_sriov+0x3c/0x50 [mlx5_core]
[  438.992054]  ? xas_load+0x9/0xb0
[  438.992407]  mlx5e_tc_rule_unoffload+0x45/0xe0 [mlx5_core]
[  438.993037]  mlx5e_tc_del_fdb_flow+0x2a6/0x2e0 [mlx5_core]
[  438.993623]  mlx5e_flow_put+0x29/0x60 [mlx5_core]
[  438.994161]  mlx5e_delete_flower+0x261/0x390 [mlx5_core]
[  438.994728]  tc_setup_cb_destroy+0xb9/0x190
[  438.995150]  fl_hw_destroy_filter+0x94/0xc0 [cls_flower]
[  438.995650]  fl_change+0x11a4/0x13c0 [cls_flower]
[  438.996105]  tc_new_tfilter+0x347/0xbc0
[  438.996503]  ? ___slab_alloc+0x70/0x8c0
[  438.996929]  rtnetlink_rcv_msg+0xf9/0x3e0
[  438.997339]  ? __netlink_sendskb+0x4c/0x70
[  438.997751]  ? netlink_unicast+0x286/0x2d0
[  438.998171]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[  438.998625]  netlink_rcv_skb+0x54/0x100
[  438.999020]  netlink_unicast+0x203/0x2d0
[  438.999421]  netlink_sendmsg+0x1e4/0x420
[  438.999820]  __sock_sendmsg+0xa1/0xb0
[  439.000203]  ____sys_sendmsg+0x207/0x2a0
[  439.000600]  ? copy_msghdr_from_user+0x6d/0xa0
[  439.001072]  ___sys_sendmsg+0x80/0xc0
[  439.001459]  ? ___sys_recvmsg+0x8b/0xc0
[  439.001848]  ? generic_update_time+0x4d/0x60
[  439.002282]  __sys_sendmsg+0x51/0x90
[  439.002658]  do_syscall_64+0x50/0x110
[  439.003040]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 718ce4d601db ("net/mlx5: Consolidate update FTE for all removal changes")
Fixes: cefc23554fc2 ("net/mlx5: Fix FTE cleanup")
Signed-off-by: Mark Bloch <[email protected]>
Reviewed-by: Maor Gottlieb <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet/mlx5: Fix msix vectors to respect platform limit
Parav Pandit [Thu, 7 Nov 2024 18:35:22 +0000 (20:35 +0200)]
net/mlx5: Fix msix vectors to respect platform limit

The number of PCI vectors allocated by the platform (which may be fewer
than requested) is currently not honored when creating the SF pool;
only the PCI MSI-X capability is considered.

As a result, when a platform allocates fewer vectors
(in non-dynamic mode) than requested, the PF and SF pools end up
with an invalid vector range.

This causes incorrect SF vector accounting, which leads to the
following call trace when an invalid IRQ vector is allocated.

This issue is resolved by ensuring that the platform's vector
limit is respected for both the SF and PF pools.

Workqueue: mlx5_vhca_event0 mlx5_sf_dev_add_active_work [mlx5_core]
RIP: 0010:pci_irq_vector+0x23/0x80
RSP: 0018:ffffabd5cebd7248 EFLAGS: 00010246
RAX: ffff980880e7f308 RBX: ffff9808932fb880 RCX: 0000000000000001
RDX: 00000000000001ff RSI: 0000000000000200 RDI: ffff980880e7f308
RBP: 0000000000000200 R08: 0000000000000010 R09: ffff97a9116f0860
R10: 0000000000000002 R11: 0000000000000228 R12: ffff980897cd0160
R13: 0000000000000000 R14: ffff97a920fec0c0 R15: ffffabd5cebd72d0
FS:  0000000000000000(0000) GS:ffff97c7ff9c0000(0000) knlGS:0000000000000000
 ? rescuer_thread+0x350/0x350
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22
mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22
mlx5_core.sf mlx5_core.sf.6: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced)
mlx5_core.sf mlx5_core.sf.7: firmware version: 32.43.356
mlx5_core.sf mlx5_core.sf.6 enpa1s0f0s4: renamed from eth0
mlx5_core.sf mlx5_core.sf.7: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps
mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22
mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22
mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22

Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Amir Tzin <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet/mlx5: E-switch, unload IB representors when unloading ETH representors
Chiara Meiohas [Thu, 7 Nov 2024 18:35:21 +0000 (20:35 +0200)]
net/mlx5: E-switch, unload IB representors when unloading ETH representors

IB representors depend on ETH representors, so the IB representors
should not exist without the ETH ones. When unloading the ETH
representors, the corresponding IB representors should be also
unloaded.

The commit 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions")
introduced the use of the ib_device_set_netdev API in IB
repsresentors. ib_device_set_netdev() increments the refcount of
the representor's netdev when loading an IB representor and
decrements it when unloading.
Without the unloading of the IB representor, the refcount of the
representor's netdev remains greater than 0, preventing it from
being unregistered.
The patch uncovered an underlying bug where the eth representor is
unloaded, without unloading the IB representor.

This issue happened when using multiport E-switch and rebooting,
causing the shutdown to hang when unloading the ETH representor
because the refcount of the representor's netdevice was greater than 0.

Call trace:
unregister_netdevice: waiting for eth3 to become free. Usage count = 2
ref_tracker: eth%d@00000000661d60f7 has 1/1 users at
ib_device_set_netdev+0x160/0x2d0 [ib_core]
mlx5_ib_vport_rep_load+0x104/0x3f0 [mlx5_ib]
mlx5_eswitch_reload_ib_reps+0xfc/0x110 [mlx5_core]
mlx5_mpesw_work+0x236/0x330 [mlx5_core]
process_one_work+0x169/0x320
worker_thread+0x288/0x3a0
kthread+0xb8/0xe0
ret_from_fork+0x2d/0x50
ret_from_fork_asm+0x11/0x20

Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions")
Signed-off-by: Chiara Meiohas <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge branch 'mptcp-fix-a-couple-of-races'
Jakub Kicinski [Tue, 12 Nov 2024 03:06:36 +0000 (19:06 -0800)]
Merge branch 'mptcp-fix-a-couple-of-races'

Paolo Abeni says:

====================
mptcp: fix a couple of races

The first patch addresses a division by zero issue reported by Eric,
the second one solves a similar issue found by code inspection while
investigating the former.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agomptcp: cope racing subflow creation in mptcp_rcv_space_adjust
Paolo Abeni [Fri, 8 Nov 2024 10:58:17 +0000 (11:58 +0100)]
mptcp: cope racing subflow creation in mptcp_rcv_space_adjust

Additional active subflows - i.e. created by the in kernel path
manager - are included into the subflow list before starting the
3whs.

A racing recvmsg() spooling data received on an already established
subflow would unconditionally call tcp_cleanup_rbuf() on all the
current subflows, potentially hitting a divide by zero error on
the newly created ones.

Explicitly check that the subflow is in a suitable state before
invoking tcp_cleanup_rbuf().

Fixes: c76c6956566f ("mptcp: call tcp_cleanup_rbuf on subflows")
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/02374660836e1b52afc91966b7535c8c5f7bafb0.1731060874.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agomptcp: error out earlier on disconnect
Paolo Abeni [Fri, 8 Nov 2024 10:58:16 +0000 (11:58 +0100)]
mptcp: error out earlier on disconnect

Eric reported a division by zero splat in the MPTCP protocol:

Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 6094 Comm: syz-executor317 Not tainted
6.12.0-rc5-syzkaller-00291-g05b92660cdfe #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 09/13/2024
RIP: 0010:__tcp_select_window+0x5b4/0x1310 net/ipv4/tcp_output.c:3163
Code: f6 44 01 e3 89 df e8 9b 75 09 f8 44 39 f3 0f 8d 11 ff ff ff e8
0d 74 09 f8 45 89 f4 e9 04 ff ff ff e8 00 74 09 f8 44 89 f0 99 <f7> 7c
24 14 41 29 d6 45 89 f4 e9 ec fe ff ff e8 e8 73 09 f8 48 89
RSP: 0018:ffffc900041f7930 EFLAGS: 00010293
RAX: 0000000000017e67 RBX: 0000000000017e67 RCX: ffffffff8983314b
RDX: 0000000000000000 RSI: ffffffff898331b0 RDI: 0000000000000004
RBP: 00000000005d6000 R08: 0000000000000004 R09: 0000000000017e67
R10: 0000000000003e80 R11: 0000000000000000 R12: 0000000000003e80
R13: ffff888031d9b440 R14: 0000000000017e67 R15: 00000000002eb000
FS: 00007feb5d7f16c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feb5d8adbb8 CR3: 0000000074e4c000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__tcp_cleanup_rbuf+0x3e7/0x4b0 net/ipv4/tcp.c:1493
mptcp_rcv_space_adjust net/mptcp/protocol.c:2085 [inline]
mptcp_recvmsg+0x2156/0x2600 net/mptcp/protocol.c:2289
inet_recvmsg+0x469/0x6a0 net/ipv4/af_inet.c:885
sock_recvmsg_nosec net/socket.c:1051 [inline]
sock_recvmsg+0x1b2/0x250 net/socket.c:1073
__sys_recvfrom+0x1a5/0x2e0 net/socket.c:2265
__do_sys_recvfrom net/socket.c:2283 [inline]
__se_sys_recvfrom net/socket.c:2279 [inline]
__x64_sys_recvfrom+0xe0/0x1c0 net/socket.c:2279
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7feb5d857559
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 18 00 00 90 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007feb5d7f1208 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
RAX: ffffffffffffffda RBX: 00007feb5d8e1318 RCX: 00007feb5d857559
RDX: 000000800000000e RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007feb5d8e1310 R08: 0000000000000000 R09: ffffffff81000000
R10: 0000000000000100 R11: 0000000000000246 R12: 00007feb5d8e131c
R13: 00007feb5d8ae074 R14: 000000800000000e R15: 00000000fffffdef

and provided a nice reproducer.

The root cause is the current bad handling of racing disconnect.
After the blamed commit below, sk_wait_data() can return (with
error) with the underlying socket disconnected and a zero rcv_mss.

Catch the error and return without performing any additional
operations on the current socket.

Reported-by: Eric Dumazet <[email protected]>
Fixes: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting")
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/8c82ecf71662ecbc47bf390f9905de70884c9f2d.1731060874.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet: clarify SO_DEVMEM_DONTNEED behavior in documentation
Mina Almasry [Thu, 7 Nov 2024 21:03:31 +0000 (21:03 +0000)]
net: clarify SO_DEVMEM_DONTNEED behavior in documentation

Document new behavior when the number of frags passed is too big.

Signed-off-by: Mina Almasry <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet: fix SO_DEVMEM_DONTNEED looping too long
Mina Almasry [Thu, 7 Nov 2024 21:03:30 +0000 (21:03 +0000)]
net: fix SO_DEVMEM_DONTNEED looping too long

Exit early if we're freeing more than 1024 frags, to prevent
looping too long.

Also minor code cleanups:
- Flip checks to reduce indentation.
- Use sizeof(*tokens) everywhere for consistentcy.

Cc: Yi Lai <[email protected]>
Signed-off-by: Mina Almasry <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet: fix data-races around sk->sk_forward_alloc
Wang Liang [Thu, 7 Nov 2024 02:34:05 +0000 (10:34 +0800)]
net: fix data-races around sk->sk_forward_alloc

Syzkaller reported this warning:
 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 16 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x1c5/0x1e0
 Modules linked in:
 CPU: 0 UID: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.12.0-rc5 #26
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
 RIP: 0010:inet_sock_destruct+0x1c5/0x1e0
 Code: 24 12 4c 89 e2 5b 48 c7 c7 98 ec bb 82 41 5c e9 d1 18 17 ff 4c 89 e6 5b 48 c7 c7 d0 ec bb 82 41 5c e9 bf 18 17 ff 0f 0b eb 83 <0f> 0b eb 97 0f 0b eb 87 0f 0b e9 68 ff ff ff 66 66 2e 0f 1f 84 00
 RSP: 0018:ffffc9000008bd90 EFLAGS: 00010206
 RAX: 0000000000000300 RBX: ffff88810b172a90 RCX: 0000000000000007
 RDX: 0000000000000002 RSI: 0000000000000300 RDI: ffff88810b172a00
 RBP: ffff88810b172a00 R08: ffff888104273c00 R09: 0000000000100007
 R10: 0000000000020000 R11: 0000000000000006 R12: ffff88810b172a00
 R13: 0000000000000004 R14: 0000000000000000 R15: ffff888237c31f78
 FS:  0000000000000000(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007ffc63fecac8 CR3: 000000000342e000 CR4: 00000000000006f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <TASK>
  ? __warn+0x88/0x130
  ? inet_sock_destruct+0x1c5/0x1e0
  ? report_bug+0x18e/0x1a0
  ? handle_bug+0x53/0x90
  ? exc_invalid_op+0x18/0x70
  ? asm_exc_invalid_op+0x1a/0x20
  ? inet_sock_destruct+0x1c5/0x1e0
  __sk_destruct+0x2a/0x200
  rcu_do_batch+0x1aa/0x530
  ? rcu_do_batch+0x13b/0x530
  rcu_core+0x159/0x2f0
  handle_softirqs+0xd3/0x2b0
  ? __pfx_smpboot_thread_fn+0x10/0x10
  run_ksoftirqd+0x25/0x30
  smpboot_thread_fn+0xdd/0x1d0
  kthread+0xd3/0x100
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x34/0x50
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>
 ---[ end trace 0000000000000000 ]---

Its possible that two threads call tcp_v6_do_rcv()/sk_forward_alloc_add()
concurrently when sk->sk_state == TCP_LISTEN with sk->sk_lock unlocked,
which triggers a data-race around sk->sk_forward_alloc:
tcp_v6_rcv
    tcp_v6_do_rcv
        skb_clone_and_charge_r
            sk_rmem_schedule
                __sk_mem_schedule
                    sk_forward_alloc_add()
            skb_set_owner_r
                sk_mem_charge
                    sk_forward_alloc_add()
        __kfree_skb
            skb_release_all
                skb_release_head_state
                    sock_rfree
                        sk_mem_uncharge
                            sk_forward_alloc_add()
                            sk_mem_reclaim
                                // set local var reclaimable
                                __sk_mem_reclaim
                                    sk_forward_alloc_add()

In this syzkaller testcase, two threads call
tcp_v6_do_rcv() with skb->truesize=768, the sk_forward_alloc changes like
this:
 (cpu 1)             | (cpu 2)             | sk_forward_alloc
 ...                 | ...                 | 0
 __sk_mem_schedule() |                     | +4096 = 4096
                     | __sk_mem_schedule() | +4096 = 8192
 sk_mem_charge()     |                     | -768  = 7424
                     | sk_mem_charge()     | -768  = 6656
 ...                 |    ...              |
 sk_mem_uncharge()   |                     | +768  = 7424
 reclaimable=7424    |                     |
                     | sk_mem_uncharge()   | +768  = 8192
                     | reclaimable=8192    |
 __sk_mem_reclaim()  |                     | -4096 = 4096
                     | __sk_mem_reclaim()  | -8192 = -4096 != 0

The skb_clone_and_charge_r() should not be called in tcp_v6_do_rcv() when
sk->sk_state is TCP_LISTEN, it happens later in tcp_v6_syn_recv_sock().
Fix the same issue in dccp_v6_do_rcv().

Suggested-by: Eric Dumazet <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Signed-off-by: Wang Liang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoselftests: net: add netlink-dumps to .gitignore
Jakub Kicinski [Fri, 8 Nov 2024 00:47:31 +0000 (16:47 -0800)]
selftests: net: add netlink-dumps to .gitignore

Commit 55d42a0c3f9c ("selftests: net: add a test for closing
a netlink socket ith dump in progress") added a new test
but did not add it to gitignore.

Reviewed-by: Joe Damato <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet: vertexcom: mse102x: Fix tx_bytes calculation
Stefan Wahren [Fri, 8 Nov 2024 11:43:43 +0000 (12:43 +0100)]
net: vertexcom: mse102x: Fix tx_bytes calculation

The tx_bytes should consider the actual size of the Ethernet frames
without the SPI encapsulation. But we still need to take care of
Ethernet padding.

Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agosctp: fix possible UAF in sctp_v6_available()
Eric Dumazet [Thu, 7 Nov 2024 19:20:21 +0000 (19:20 +0000)]
sctp: fix possible UAF in sctp_v6_available()

A lockdep report [1] with CONFIG_PROVE_RCU_LIST=y hints
that sctp_v6_available() is calling dev_get_by_index_rcu()
and ipv6_chk_addr() without holding rcu.

[1]
 =============================
 WARNING: suspicious RCU usage
 6.12.0-rc5-virtme #1216 Tainted: G        W
 -----------------------------
 net/core/dev.c:876 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by sctp_hello/31495:
 #0: ffff9f1ebbdb7418 (sk_lock-AF_INET6){+.+.}-{0:0}, at: sctp_bind (./arch/x86/include/asm/jump_label.h:27 net/sctp/socket.c:315) sctp

stack backtrace:
 CPU: 7 UID: 0 PID: 31495 Comm: sctp_hello Tainted: G        W          6.12.0-rc5-virtme #1216
 Tainted: [W]=WARN
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
 Call Trace:
  <TASK>
 dump_stack_lvl (lib/dump_stack.c:123)
 lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822)
 dev_get_by_index_rcu (net/core/dev.c:876 (discriminator 7))
 sctp_v6_available (net/sctp/ipv6.c:701) sctp
 sctp_do_bind (net/sctp/socket.c:400 (discriminator 1)) sctp
 sctp_bind (net/sctp/socket.c:320) sctp
 inet6_bind_sk (net/ipv6/af_inet6.c:465)
 ? security_socket_bind (security/security.c:4581 (discriminator 1))
 __sys_bind (net/socket.c:1848 net/socket.c:1869)
 ? do_user_addr_fault (./include/linux/rcupdate.h:347 ./include/linux/rcupdate.h:880 ./include/linux/mm.h:729 arch/x86/mm/fault.c:1340)
 ? do_user_addr_fault (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:98 (discriminator 13) ./include/linux/rcupdate.h:882 (discriminator 13) ./include/linux/mm.h:729 (discriminator 13) arch/x86/mm/fault.c:1340 (discriminator 13))
 __x64_sys_bind (net/socket.c:1877 (discriminator 1) net/socket.c:1875 (discriminator 1) net/socket.c:1875 (discriminator 1))
 do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
 RIP: 0033:0x7f59b934a1e7
 Code: 44 00 00 48 8b 15 39 8c 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 09 8c 0c 00 f7 d8 64 89 01 48
All code
========
   0: 44 00 00              add    %r8b,(%rax)
   3: 48 8b 15 39 8c 0c 00  mov    0xc8c39(%rip),%rdx        # 0xc8c43
   a: f7 d8                 neg    %eax
   c: 64 89 02              mov    %eax,%fs:(%rdx)
   f: b8 ff ff ff ff        mov    $0xffffffff,%eax
  14: eb bd                 jmp    0xffffffffffffffd3
  16: 66 2e 0f 1f 84 00 00  cs nopw 0x0(%rax,%rax,1)
  1d: 00 00 00
  20: 0f 1f 00              nopl   (%rax)
  23: b8 31 00 00 00        mov    $0x31,%eax
  28: 0f 05                 syscall
  2a:* 48 3d 01 f0 ff ff     cmp    $0xfffffffffffff001,%rax <-- trapping instruction
  30: 73 01                 jae    0x33
  32: c3                    ret
  33: 48 8b 0d 09 8c 0c 00  mov    0xc8c09(%rip),%rcx        # 0xc8c43
  3a: f7 d8                 neg    %eax
  3c: 64 89 01              mov    %eax,%fs:(%rcx)
  3f: 48                    rex.W

Code starting with the faulting instruction
===========================================
   0: 48 3d 01 f0 ff ff     cmp    $0xfffffffffffff001,%rax
   6: 73 01                 jae    0x9
   8: c3                    ret
   9: 48 8b 0d 09 8c 0c 00  mov    0xc8c09(%rip),%rcx        # 0xc8c19
  10: f7 d8                 neg    %eax
  12: 64 89 01              mov    %eax,%fs:(%rcx)
  15: 48                    rex.W
 RSP: 002b:00007ffe2d0ad398 EFLAGS: 00000202 ORIG_RAX: 0000000000000031
 RAX: ffffffffffffffda RBX: 00007ffe2d0ad3d0 RCX: 00007f59b934a1e7
 RDX: 000000000000001c RSI: 00007ffe2d0ad3d0 RDI: 0000000000000005
 RBP: 0000000000000005 R08: 1999999999999999 R09: 0000000000000000
 R10: 00007f59b9253298 R11: 0000000000000202 R12: 00007ffe2d0ada61
 R13: 0000000000000000 R14: 0000562926516dd8 R15: 00007f59b9479000
  </TASK>

Fixes: 6fe1e52490a9 ("sctp: check ipv6 addr with sk_bound_dev if set")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Xin Long <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoselftests: net: add a test for closing a netlink socket ith dump in progress
Jakub Kicinski [Wed, 6 Nov 2024 01:52:35 +0000 (17:52 -0800)]
selftests: net: add a test for closing a netlink socket ith dump in progress

Close a socket with dump in progress. We need a dump which generates
enough info not to fit into a single skb. Policy dump fits the bill.

Use the trick discovered by syzbot for keeping a ref on the socket
longer than just close, with mqueue.

  TAP version 13
  1..3
  # Starting 3 tests from 1 test cases.
  #  RUN           global.test_sanity ...
  #            OK  global.test_sanity
  ok 1 global.test_sanity
  #  RUN           global.close_in_progress ...
  #            OK  global.close_in_progress
  ok 2 global.close_in_progress
  #  RUN           global.close_with_ref ...
  #            OK  global.close_with_ref
  ok 3 global.close_with_ref
  # PASSED: 3 / 3 tests passed.
  # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0

Note that this test is not expected to fail but rather crash
the kernel if we get the cleanup wrong.

Reviewed-by: Kuniyuki Iwashima <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonetlink: terminate outstanding dump on socket close
Jakub Kicinski [Wed, 6 Nov 2024 01:52:34 +0000 (17:52 -0800)]
netlink: terminate outstanding dump on socket close

Netlink supports iterative dumping of data. It provides the families
the following ops:
 - start - (optional) kicks off the dumping process
 - dump  - actual dump helper, keeps getting called until it returns 0
 - done  - (optional) pairs with .start, can be used for cleanup
The whole process is asynchronous and the repeated calls to .dump
don't actually happen in a tight loop, but rather are triggered
in response to recvmsg() on the socket.

This gives the user full control over the dump, but also means that
the user can close the socket without getting to the end of the dump.
To make sure .start is always paired with .done we check if there
is an ongoing dump before freeing the socket, and if so call .done.

The complication is that sockets can get freed from BH and .done
is allowed to sleep. So we use a workqueue to defer the call, when
needed.

Unfortunately this does not work correctly. What we defer is not
the cleanup but rather releasing a reference on the socket.
We have no guarantee that we own the last reference, if someone
else holds the socket they may release it in BH and we're back
to square one.

The whole dance, however, appears to be unnecessary. Only the user
can interact with dumps, so we can clean up when socket is closed.
And close always happens in process context. Some async code may
still access the socket after close, queue notification skbs to it etc.
but no dumps can start, end or otherwise make progress.

Delete the workqueue and flush the dump state directly from the release
handler. Note that further cleanup is possible in -next, for instance
we now always call .done before releasing the main module reference,
so dump doesn't have to take a reference of its own.

Reported-by: syzkaller <[email protected]>
Fixes: ed5d7788a934 ("netlink: Do not schedule work from sk_destruct")
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge tag 'net-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 7 Nov 2024 21:07:57 +0000 (11:07 -1000)]
Merge tag 'net-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from can and netfilter.

  Things are slowing down quite a bit, mostly driver fixes here. No
  known ongoing investigations.

  Current release - new code bugs:

   - eth: ti: am65-cpsw:
      - fix multi queue Rx on J7
      - fix warning in am65_cpsw_nuss_remove_rx_chns()

  Previous releases - regressions:

   - mptcp: do not require admin perm to list endpoints, got missed in a
     refactoring

   - mptcp: use sock_kfree_s instead of kfree

  Previous releases - always broken:

   - sctp: properly validate chunk size in sctp_sf_ootb() fix OOB access

   - virtio_net: make RSS interact properly with queue number

   - can: mcp251xfd: mcp251xfd_get_tef_len(): fix length calculation

   - can: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing
     configuration when switching CAN modes

  Misc:

   - revert earlier hns3 fixes, they were ignoring IOMMU abstractions
     and need to be reworked

   - can: {cc770,sja1000}_isa: allow building on x86_64"

* tag 'net-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits)
  drivers: net: ionic: add missed debugfs cleanup to ionic_probe() error path
  net/smc: do not leave a dangling sk pointer in __smc_create()
  rxrpc: Fix missing locking causing hanging calls
  net/smc: Fix lookup of netdev by using ib_device_get_netdev()
  net: arc: rockchip: fix emac mdio node support
  net: arc: fix the device for dma_map_single/dma_unmap_single
  virtio_net: Update rss when set queue
  virtio_net: Sync rss config to device when virtnet_probe
  virtio_net: Add hash_key_length check
  virtio_net: Support dynamic rss indirection table size
  netfilter: nf_tables: wait for rcu grace period on net_device removal
  net: stmmac: Fix unbalanced IRQ wake disable warning on single irq case
  net: vertexcom: mse102x: Fix possible double free of TX skb
  mptcp: use sock_kfree_s instead of kfree
  mptcp: no admin perm to list endpoints
  net: phy: ti: add PHY_RST_AFTER_CLK_EN flag
  net: ethernet: ti: am65-cpsw: fix warning in am65_cpsw_nuss_remove_rx_chns()
  net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7
  net: hns3: fix kernel crash when uninstalling driver
  Revert "Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'"
  ...

4 months agodrivers: net: ionic: add missed debugfs cleanup to ionic_probe() error path
Wentao Liang [Thu, 7 Nov 2024 02:17:56 +0000 (10:17 +0800)]
drivers: net: ionic: add missed debugfs cleanup to ionic_probe() error path

The ionic_setup_one() creates a debugfs entry for ionic upon
successful execution. However, the ionic_probe() does not
release the dentry before returning, resulting in a memory
leak.

To fix this bug, we add the ionic_debugfs_del_dev() to release
the resources in a timely manner before returning.

Fixes: 0de38d9f1dba ("ionic: extract common bits from ionic_probe")
Signed-off-by: Wentao Liang <[email protected]>
Acked-by: Shannon Nelson <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet/smc: do not leave a dangling sk pointer in __smc_create()
Eric Dumazet [Wed, 6 Nov 2024 22:19:22 +0000 (22:19 +0000)]
net/smc: do not leave a dangling sk pointer in __smc_create()

Thanks to commit 4bbd360a5084 ("socket: Print pf->create() when
it does not clear sock->sk on failure."), syzbot found an issue with AF_SMC:

smc_create must clear sock->sk on failure, family: 43, type: 1, protocol: 0
 WARNING: CPU: 0 PID: 5827 at net/socket.c:1565 __sock_create+0x96f/0xa30 net/socket.c:1563
Modules linked in:
CPU: 0 UID: 0 PID: 5827 Comm: syz-executor259 Not tainted 6.12.0-rc6-next-20241106-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
 RIP: 0010:__sock_create+0x96f/0xa30 net/socket.c:1563
Code: 03 00 74 08 4c 89 e7 e8 4f 3b 85 f8 49 8b 34 24 48 c7 c7 40 89 0c 8d 8b 54 24 04 8b 4c 24 0c 44 8b 44 24 08 e8 32 78 db f7 90 <0f> 0b 90 90 e9 d3 fd ff ff 89 e9 80 e1 07 fe c1 38 c1 0f 8c ee f7
RSP: 0018:ffffc90003e4fda0 EFLAGS: 00010246
RAX: 099c6f938c7f4700 RBX: 1ffffffff1a595fd RCX: ffff888034823c00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00000000ffffffe9 R08: ffffffff81567052 R09: 1ffff920007c9f50
R10: dffffc0000000000 R11: fffff520007c9f51 R12: ffffffff8d2cafe8
R13: 1ffffffff1a595fe R14: ffffffff9a789c40 R15: ffff8880764298c0
FS:  000055557b518380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa62ff43225 CR3: 0000000031628000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  sock_create net/socket.c:1616 [inline]
  __sys_socket_create net/socket.c:1653 [inline]
  __sys_socket+0x150/0x3c0 net/socket.c:1700
  __do_sys_socket net/socket.c:1714 [inline]
  __se_sys_socket net/socket.c:1712 [inline]

For reference, see commit 2d859aff775d ("Merge branch
'do-not-leave-dangling-sk-pointers-in-pf-create-functions'")

Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Ignat Korchagin <[email protected]>
Cc: D. Wythe <[email protected]>
Cc: Dust Li <[email protected]>
Reviewed-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Wenjia Zhang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agorxrpc: Fix missing locking causing hanging calls
David Howells [Wed, 6 Nov 2024 13:03:22 +0000 (13:03 +0000)]
rxrpc: Fix missing locking causing hanging calls

If a call gets aborted (e.g. because kafs saw a signal) between it being
queued for connection and the I/O thread picking up the call, the abort
will be prioritised over the connection and it will be removed from
local->new_client_calls by rxrpc_disconnect_client_call() without a lock
being held.  This may cause other calls on the list to disappear if a race
occurs.

Fix this by taking the client_call_lock when removing a call from whatever
list its ->wait_link happens to be on.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Reported-by: Marc Dionne <[email protected]>
Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread")
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet/smc: Fix lookup of netdev by using ib_device_get_netdev()
Wenjia Zhang [Wed, 6 Nov 2024 08:26:12 +0000 (09:26 +0100)]
net/smc: Fix lookup of netdev by using ib_device_get_netdev()

The SMC-R variant of the SMC protocol used direct call to function
ib_device_ops.get_netdev() to lookup netdev. As we used mlx5 device
driver to run SMC-R, it failed to find a device, because in mlx5_ib the
internal net device management for retrieving net devices was replaced
by a common interface ib_device_get_netdev() in commit 8d159eb2117b
("RDMA/mlx5: Use IB set_netdev and get_netdev functions").

Since such direct accesses to the internal net device management is not
recommended at all, update the SMC-R code to use proper API
ib_device_get_netdev().

Fixes: 54903572c23c ("net/smc: allow pnetid-less configuration")
Reported-by: Aswin K <[email protected]>
Reviewed-by: Gerd Bayer <[email protected]>
Reviewed-by: Halil Pasic <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Dust Li <[email protected]>
Reviewed-by: Wen Gu <[email protected]>
Reviewed-by: Zhu Yanjun <[email protected]>
Reviewed-by: D. Wythe <[email protected]>
Signed-off-by: Wenjia Zhang <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge tag 'pwm/for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 7 Nov 2024 17:41:34 +0000 (07:41 -1000)]
Merge tag 'pwm/for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux

Pull pwm fix from Uwe Kleine-König:
 "Fix period setting in imx-tpm driver and a maintainer update

  Erik Schumacher found and fixed a problem in the calculation of the
  PWM period setting yielding too long periods. Trevor Gamblin - who
  already cared about mainlining the pwm-axi-pwmgen driver - stepped
  forward as an additional reviewer.

  Thanks to Erik and Trevor"

* tag 'pwm/for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
  MAINTAINERS: add self as reviewer for AXI PWM GENERATOR
  pwm: imx-tpm: Use correct MODULO value for EPWM mode

4 months agoproc/softirqs: replace seq_printf with seq_put_decimal_ull_width
David Wang [Wed, 6 Nov 2024 02:12:28 +0000 (10:12 +0800)]
proc/softirqs: replace seq_printf with seq_put_decimal_ull_width

seq_printf is costy, on a system with n CPUs, reading /proc/softirqs
would yield 10*n decimal values, and the extra cost parsing format string
grows linearly with number of cpus. Replace seq_printf with
seq_put_decimal_ull_width have significant performance improvement.
On an 8CPUs system, reading /proc/softirqs show ~40% performance
gain with this patch.

Signed-off-by: David Wang <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
4 months agoMerge tag 'nf-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Thu, 7 Nov 2024 16:16:42 +0000 (08:16 -0800)]
Merge tag 'nf-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fix for net

The following series contains a Netfilter fix:

1) Wait for rcu grace period after netdevice removal is reported via event.

* tag 'nf-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: wait for rcu grace period on net_device removal
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge branch 'fix-the-arc-emac-driver'
Paolo Abeni [Thu, 7 Nov 2024 12:39:43 +0000 (13:39 +0100)]
Merge branch 'fix-the-arc-emac-driver'

Andy Yan says:

====================
Fix the arc emac driver

The arc emac driver was broken for a long time,
The first broken happens when a dma releated fix introduced in Linux 5.10.
The second broken happens when a emac device tree node restyle introduced
in Linux 6.1.

These two patches are try to make the arc emac work again.

Changes in v2:
- Add cover letter.
- Add fix tag.
- Add more detail explaination.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
4 months agonet: arc: rockchip: fix emac mdio node support
Johan Jonker [Mon, 4 Nov 2024 13:01:39 +0000 (21:01 +0800)]
net: arc: rockchip: fix emac mdio node support

The binding emac_rockchip.txt is converted to YAML.
Changed against the original binding is an added MDIO subnode.
This make the driver failed to find the PHY, and given the 'mdio
has invalid PHY address' it is probably looking in the wrong node.
Fix emac_mdio.c so that it can handle both old and new
device trees.

Fixes: 1dabb74971b3 ("ARM: dts: rockchip: restyle emac nodes")
Signed-off-by: Johan Jonker <[email protected]>
Tested-by: Andy Yan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Andy Yan <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agonet: arc: fix the device for dma_map_single/dma_unmap_single
Johan Jonker [Mon, 4 Nov 2024 13:01:38 +0000 (21:01 +0800)]
net: arc: fix the device for dma_map_single/dma_unmap_single

The ndev->dev and pdev->dev aren't the same device, use ndev->dev.parent
which has dma_mask, ndev->dev.parent is just pdev->dev.
Or it would cause the following issue:

[   39.933526] ------------[ cut here ]------------
[   39.938414] WARNING: CPU: 1 PID: 501 at kernel/dma/mapping.c:149 dma_map_page_attrs+0x90/0x1f8

Fixes: f959dcd6ddfd ("dma-direct: Fix potential NULL pointer dereference")
Signed-off-by: David Wu <[email protected]>
Signed-off-by: Johan Jonker <[email protected]>
Signed-off-by: Andy Yan <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agoMerge branch 'virtio_net-make-rss-interact-properly-with-queue-number'
Paolo Abeni [Thu, 7 Nov 2024 11:40:20 +0000 (12:40 +0100)]
Merge branch 'virtio_net-make-rss-interact-properly-with-queue-number'

Philo Lu says:

====================
virtio_net: Make RSS interact properly with queue number

With this patch set, RSS updates with queue_pairs changing:
- When virtnet_probe, init default rss and commit
- When queue_pairs changes _without_ user rss configuration, update rss
  with the new queue number
- When queue_pairs changes _with_ user rss configuration, keep rss as user
  configured

Patch 1 and 2 fix possible out of bound errors for indir_table and key.
Patch 3 and 4 add RSS update in probe() and set_queues().
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
4 months agovirtio_net: Update rss when set queue
Philo Lu [Mon, 4 Nov 2024 08:57:06 +0000 (16:57 +0800)]
virtio_net: Update rss when set queue

RSS configuration should be updated with queue number. In particular, it
should be updated when (1) rss enabled and (2) default rss configuration
is used without user modification.

During rss command processing, device updates queue_pairs using
rss.max_tx_vq. That is, the device updates queue_pairs together with
rss, so we can skip the sperate queue_pairs update
(VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET below) and return directly.

Also remove the `vi->has_rss ?` check when setting vi->rss.max_tx_vq,
because this is not used in the other hash_report case.

Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Philo Lu <[email protected]>
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agovirtio_net: Sync rss config to device when virtnet_probe
Philo Lu [Mon, 4 Nov 2024 08:57:05 +0000 (16:57 +0800)]
virtio_net: Sync rss config to device when virtnet_probe

During virtnet_probe, default rss configuration is initialized, but was
not committed to the device. This patch fix this by sending rss command
after device ready in virtnet_probe. Otherwise, the actual rss
configuration used by device can be different with that read by user
from driver, which may confuse the user.

If the command committing fails, driver rss will be disabled.

Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Philo Lu <[email protected]>
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Joe Damato <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agovirtio_net: Add hash_key_length check
Philo Lu [Mon, 4 Nov 2024 08:57:04 +0000 (16:57 +0800)]
virtio_net: Add hash_key_length check

Add hash_key_length check in virtnet_probe() to avoid possible out of
bound errors when setting/reading the hash key.

Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Philo Lu <[email protected]>
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Joe Damato <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agovirtio_net: Support dynamic rss indirection table size
Philo Lu [Mon, 4 Nov 2024 08:57:03 +0000 (16:57 +0800)]
virtio_net: Support dynamic rss indirection table size

When reading/writing virtio_net_ctrl_rss, we get the indirection table
size from vi->rss_indir_table_size, which is initialized in
virtnet_probe(). However, the actual size of indirection_table was set
as VIRTIO_NET_RSS_MAX_TABLE_LEN=128. This collision may cause issues if
the vi->rss_indir_table_size exceeds 128.

This patch instead uses dynamic indirection table, allocated with
vi->rss after vi->rss_indir_table_size initialized. And free it in
virtnet_remove().

In virtnet_commit_rss_command(), sgs for rss is initialized differently
with hash_report. So indirection_table is not used if !vi->has_rss, and
then we don't need to alloc indirection_table for hash_report only uses.

Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Signed-off-by: Philo Lu <[email protected]>
Signed-off-by: Xuan Zhuo <[email protected]>
Acked-by: Joe Damato <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agonetfilter: nf_tables: wait for rcu grace period on net_device removal
Pablo Neira Ayuso [Tue, 5 Nov 2024 11:07:22 +0000 (12:07 +0100)]
netfilter: nf_tables: wait for rcu grace period on net_device removal

8c873e219970 ("netfilter: core: free hooks with call_rcu") removed
synchronize_net() call when unregistering basechain hook, however,
net_device removal event handler for the NFPROTO_NETDEV was not updated
to wait for RCU grace period.

Note that 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks
on net_device removal") does not remove basechain rules on device
removal, I was hinted to remove rules on net_device removal later, see
5ebe0b0eec9d ("netfilter: nf_tables: destroy basechain and rules on
netdevice removal").

Although NETDEV_UNREGISTER event is guaranteed to be handled after
synchronize_net() call, this path needs to wait for rcu grace period via
rcu callback to release basechain hooks if netns is alive because an
ongoing netlink dump could be in progress (sockets hold a reference on
the netns).

Note that nf_tables_pre_exit_net() unregisters and releases basechain
hooks but it is possible to see NETDEV_UNREGISTER at a later stage in
the netns exit path, eg. veth peer device in another netns:

 cleanup_net()
  default_device_exit_batch()
   unregister_netdevice_many_notify()
    notifier_call_chain()
     nf_tables_netdev_event()
      __nft_release_basechain()

In this particular case, same rule of thumb applies: if netns is alive,
then wait for rcu grace period because netlink dump in the other netns
could be in progress. Otherwise, if the other netns is going away then
no netlink dump can be in progress and basechain hooks can be released
inmediately.

While at it, turn WARN_ON() into WARN_ON_ONCE() for the basechain
validation, which should not ever happen.

Fixes: 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
4 months agonet: stmmac: Fix unbalanced IRQ wake disable warning on single irq case
Nícolas F. R. A. Prado [Fri, 1 Nov 2024 21:17:29 +0000 (17:17 -0400)]
net: stmmac: Fix unbalanced IRQ wake disable warning on single irq case

Commit a23aa0404218 ("net: stmmac: ethtool: Fixed calltrace caused by
unbalanced disable_irq_wake calls") introduced checks to prevent
unbalanced enable and disable IRQ wake calls. However it only
initialized the auxiliary variable on one of the paths,
stmmac_request_irq_multi_msi(), missing the other,
stmmac_request_irq_single().

Add the same initialization on stmmac_request_irq_single() to prevent
"Unbalanced IRQ <x> wake disable" warnings from being printed the first
time disable_irq_wake() is called on platforms that run on that code
path.

Fixes: a23aa0404218 ("net: stmmac: ethtool: Fixed calltrace caused by unbalanced disable_irq_wake calls")
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/20241101-stmmac-unbalanced-wake-single-fix-v1-1-5952524c97f0@collabora.com
Signed-off-by: Paolo Abeni <[email protected]>
4 months agonet: vertexcom: mse102x: Fix possible double free of TX skb
Stefan Wahren [Tue, 5 Nov 2024 16:31:01 +0000 (17:31 +0100)]
net: vertexcom: mse102x: Fix possible double free of TX skb

The scope of the TX skb is wider than just mse102x_tx_frame_spi(),
so in case the TX skb room needs to be expanded, we should free the
the temporary skb instead of the original skb. Otherwise the original
TX skb pointer would be freed again in mse102x_tx_work(), which leads
to crashes:

  Internal error: Oops: 0000000096000004 [#2] PREEMPT SMP
  CPU: 0 PID: 712 Comm: kworker/0:1 Tainted: G      D            6.6.23
  Hardware name: chargebyte Charge SOM DC-ONE (DT)
  Workqueue: events mse102x_tx_work [mse102x]
  pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : skb_release_data+0xb8/0x1d8
  lr : skb_release_data+0x1ac/0x1d8
  sp : ffff8000819a3cc0
  x29: ffff8000819a3cc0 x28: ffff0000046daa60 x27: ffff0000057f2dc0
  x26: ffff000005386c00 x25: 0000000000000002 x24: 00000000ffffffff
  x23: 0000000000000000 x22: 0000000000000001 x21: ffff0000057f2e50
  x20: 0000000000000006 x19: 0000000000000000 x18: ffff00003fdacfcc
  x17: e69ad452d0c49def x16: 84a005feff870102 x15: 0000000000000000
  x14: 000000000000024a x13: 0000000000000002 x12: 0000000000000000
  x11: 0000000000000400 x10: 0000000000000930 x9 : ffff00003fd913e8
  x8 : fffffc00001bc008
  x7 : 0000000000000000 x6 : 0000000000000008
  x5 : ffff00003fd91340 x4 : 0000000000000000 x3 : 0000000000000009
  x2 : 00000000fffffffe x1 : 0000000000000000 x0 : 0000000000000000
  Call trace:
   skb_release_data+0xb8/0x1d8
   kfree_skb_reason+0x48/0xb0
   mse102x_tx_work+0x164/0x35c [mse102x]
   process_one_work+0x138/0x260
   worker_thread+0x32c/0x438
   kthread+0x118/0x11c
   ret_from_fork+0x10/0x20
  Code: aa1303e0 97fffab6 72001c1f 54000141 (f9400660)

Cc: [email protected]
Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
Signed-off-by: Stefan Wahren <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Linus Torvalds [Wed, 6 Nov 2024 23:09:22 +0000 (13:09 -1000)]
Merge tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "These are mostly fixes that came up during the nfs bakeathon the other
  week.

  Stable Fixes:
   - Fix KMSAN warning in decode_getfattr_attrs()

  Other Bugfixes:
   - Handle -ENOTCONN in xs_tcp_setup_socked()
   - NFSv3: only use NFS timeout for MOUNT when protocols are compatible
   - Fix attribute delegation behavior on exclusive create and a/mtime
     changes
   - Fix localio to cope with racing nfs_local_probe()
   - Avoid i_lock contention in fs_clear_invalid_mapping()"

* tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  nfs: avoid i_lock contention in nfs_clear_invalid_mapping
  nfs_common: fix localio to cope with racing nfs_local_probe()
  NFS: Further fixes to attribute delegation a/mtime changes
  NFS: Fix attribute delegation behaviour on exclusive create
  nfs: Fix KMSAN warning in decode_getfattr_attrs()
  NFSv3: only use NFS timeout for MOUNT when protocols are compatible
  sunrpc: handle -ENOTCONN in xs_tcp_setup_socket()

4 months agoMerge tag 'keys-next-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkk...
Linus Torvalds [Wed, 6 Nov 2024 19:29:15 +0000 (09:29 -1000)]
Merge tag 'keys-next-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull keys fixes from Jarkko Sakkinen:
 "A couple of fixes for keys and trusted keys"

* tag 'keys-next-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  KEYS: trusted: dcp: fix NULL dereference in AEAD crypto operation
  security/keys: fix slab-out-of-bounds in key_task_permission

4 months agoMerge tag 'tracefs-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Wed, 6 Nov 2024 18:08:39 +0000 (08:08 -1000)]
Merge tag 'tracefs-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracefs fixes from Steven Rostedt:
 "Fix tracefs mount options.

  Commit 78ff64081949 ("vfs: Convert tracefs to use the new mount API")
  broke the gid setting when set by fstab or other mount utility. It is
  ignored when it is set. Fix the code so that it recognises the option
  again and will honor the settings on mount at boot up.

  Update the internal documentation and create a selftest to make sure
  it doesn't break again in the future"

* tag 'tracefs-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/selftests: Add tracefs mount options test
  tracing: Document tracefs gid mount option
  tracing: Fix tracefs mount options

4 months agoMerge tag 'platform-drivers-x86-v6.12-4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 6 Nov 2024 18:03:19 +0000 (08:03 -1000)]
Merge tag 'platform-drivers-x86-v6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Hans de Goede:

 - AMD PMF: Add new hardware id

 - AMD PMC: Fix crash when loaded with enable_stb=1 on devices without STB

 - Dell: Add Alienware hwid for Alienware systems with Dell WMI interface

 - thinkpad_acpi: Quirk to fix wrong fan speed readings on L480

 - New hotkey mappings for Dell and Lenovo laptops

* tag 'platform-drivers-x86-v6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: thinkpad_acpi: Fix for ThinkPad's with ECFW showing incorrect fan speed
  platform/x86: ideapad-laptop: add missing Ideapad Pro 5 fn keys
  platform/x86: dell-wmi-base: Handle META key Lock/Unlock events
  platform/x86: dell-smbios-base: Extends support to Alienware products
  platform/x86/amd/pmc: Detect when STB is not available
  platform/x86/amd/pmf: Add SMU metrics table support for 1Ah family 60h model

4 months agoMerge tag 'for-6.12/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 6 Nov 2024 17:56:47 +0000 (07:56 -1000)]
Merge tag 'for-6.12/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mikulas Patocka:

 - fix memory safety bugs in dm-cache

 - fix restart/panic logic in dm-verity

 - fix 32-bit unsigned integer overflow in dm-unstriped

 - fix a device mapper crash if blk_alloc_disk fails

* tag 'for-6.12/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache: fix potential out-of-bounds access on the first resume
  dm cache: optimize dirty bit checking with find_next_bit when resizing
  dm cache: fix out-of-bounds access to the dirty bitset when resizing
  dm cache: fix flushing uninitialized delayed_work on cache_ctr error
  dm cache: correct the number of origin blocks to match the target length
  dm-verity: don't crash if panic_on_corruption is not selected
  dm-unstriped: cast an operand to sector_t to prevent potential uint32_t overflow
  dm: fix a crash if blk_alloc_disk fails

4 months agoMerge tag 'hid-for-linus-20241105' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 6 Nov 2024 17:49:54 +0000 (07:49 -1000)]
Merge tag 'hid-for-linus-20241105' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fix from Jiri Kosina:

 - report buffer sanitization fix for HID core (Jiri Kosina)

* tag 'hid-for-linus-20241105' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: core: zero-initialize the report buffer

4 months agoplatform/x86: thinkpad_acpi: Fix for ThinkPad's with ECFW showing incorrect fan speed
Vishnu Sankar [Tue, 5 Nov 2024 23:55:05 +0000 (08:55 +0900)]
platform/x86: thinkpad_acpi: Fix for ThinkPad's with ECFW showing incorrect fan speed

Fix for Thinkpad's with ECFW showing incorrect fan speed. Some models use
decimal instead of hexadecimal for the speed stored in the EC registers.
For example the rpm register will have 0x4200 instead of 0x1068, here
the actual RPM is "4200" in decimal.

Add a quirk to handle this.

Signed-off-by: Vishnu Sankar <[email protected]>
Suggested-by: Mark Pearson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
4 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Wed, 6 Nov 2024 02:05:50 +0000 (18:05 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-11-04 (ice, idpf, i40e, e1000e)

For ice:

Marcin adjusts ordering of calls in ice_eswitch_detach() to resolve a
use after free issue.

Mateusz corrects variable type for Flow Director queue to fix issues
related to drop actions.

For idpf:

Pavan resolves issues related to reset on idpf; avoiding use of freed
vport and correctly unrolling the mailbox task.

For i40e:

Aleksandr fixes a race condition involving addition and deletion of VF
MAC filters.

For e1000e:

Vitaly reverts workaround for Meteor Lake causing regressions in power
management flows.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  e1000e: Remove Meteor Lake SMBUS workarounds
  i40e: fix race condition by adding filter's intermediate sync state
  idpf: fix idpf_vc_core_init error path
  idpf: avoid vport access in idpf_get_link_ksettings
  ice: change q_index variable type to s16 to store -1 value
  ice: Fix use after free during unload with ports in bridge
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge branch 'mptcp-pm-fix-wrong-perm-and-sock-kfree'
Jakub Kicinski [Wed, 6 Nov 2024 01:51:10 +0000 (17:51 -0800)]
Merge branch 'mptcp-pm-fix-wrong-perm-and-sock-kfree'

Matthieu Baerts says:

====================
mptcp: pm: fix wrong perm and sock kfree

Two small fixes related to the MPTCP path-manager:

- Patch 1: remove an accidental restriction to admin users to list MPTCP
  endpoints. A regression from v6.7.

- Patch 2: correctly use sock_kfree_s() instead of kfree() in the
  userspace PM. A fix for another fix introduced in v6.4 and
  backportable up to v5.19.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agomptcp: use sock_kfree_s instead of kfree
Geliang Tang [Mon, 4 Nov 2024 12:31:42 +0000 (13:31 +0100)]
mptcp: use sock_kfree_s instead of kfree

The local address entries on userspace_pm_local_addr_list are allocated
by sock_kmalloc().

It's then required to use sock_kfree_s() instead of kfree() to free
these entries in order to adjust the allocated size on the sk side.

Fixes: 24430f8bf516 ("mptcp: add address into userspace pm list")
Cc: [email protected]
Signed-off-by: Geliang Tang <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agomptcp: no admin perm to list endpoints
Matthieu Baerts (NGI0) [Mon, 4 Nov 2024 12:31:41 +0000 (13:31 +0100)]
mptcp: no admin perm to list endpoints

During the switch to YNL, the command to list all endpoints has been
accidentally restricted to users with admin permissions.

It looks like there are no reasons to have this restriction which makes
it harder for a user to quickly check if the endpoint list has been
correctly populated by an automated tool. Best to go back to the
previous behaviour then.

mptcp_pm_gen.c has been modified using ynl-gen-c.py:

   $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \
     --spec Documentation/netlink/specs/mptcp_pm.yaml --source \
     -o net/mptcp/mptcp_pm_gen.c

The header file doesn't need to be regenerated.

Fixes: 1d0507f46843 ("net: mptcp: convert netlink from small_ops to ops")
Cc: [email protected]
Reviewed-by: Davide Caratti <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agonet: phy: ti: add PHY_RST_AFTER_CLK_EN flag
Diogo Silva [Sat, 2 Nov 2024 15:15:05 +0000 (16:15 +0100)]
net: phy: ti: add PHY_RST_AFTER_CLK_EN flag

DP83848 datasheet (section 4.7.2) indicates that the reset pin should be
toggled after the clocks are running. Add the PHY_RST_AFTER_CLK_EN to
make sure that this indication is respected.

In my experience not having this flag enabled would lead to, on some
boots, the wrong MII mode being selected if the PHY was initialized on
the bootloader and was receiving data during Linux boot.

Signed-off-by: Diogo Silva <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Fixes: 34e45ad9378c ("net: phy: dp83848: Add TI DP83848 Ethernet PHY")
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge branch 'net-ethernet-ti-am65-cpsw-fixes-to-multi-queue-rx-feature'
Paolo Abeni [Tue, 5 Nov 2024 14:56:48 +0000 (15:56 +0100)]
Merge branch 'net-ethernet-ti-am65-cpsw-fixes-to-multi-queue-rx-feature'

Roger Quadros says:

====================
net: ethernet: ti: am65-cpsw: Fixes to multi queue RX feature

On J7 platforms, setting up multiple RX flows was failing
as the RX free descriptor ring 0 is shared among all flows
and we did not allocate enough elements in the RX free descriptor
ring 0 to accommodate for all RX flows. Patch 1 fixes this.

The second patch fixes a warning if there was any error in
am65_cpsw_nuss_init_rx_chns() and am65_cpsw_nuss_cleanup_rx_chns()
was called after that.

Signed-off-by: Roger Quadros <[email protected]>
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
4 months agonet: ethernet: ti: am65-cpsw: fix warning in am65_cpsw_nuss_remove_rx_chns()
Roger Quadros [Fri, 1 Nov 2024 10:18:51 +0000 (12:18 +0200)]
net: ethernet: ti: am65-cpsw: fix warning in am65_cpsw_nuss_remove_rx_chns()

flow->irq is initialized to 0 which is a valid IRQ. Set it to -EINVAL
in error path of am65_cpsw_nuss_init_rx_chns() so we do not try
to free an unallocated IRQ in am65_cpsw_nuss_remove_rx_chns().

If user tried to change number of RX queues and am65_cpsw_nuss_init_rx_chns()
failed due to any reason, the warning will happen if user tries to change
the number of RX queues after the error condition.

root@am62xx-evm:~# ethtool -L eth0 rx 3
[   40.385293] am65-cpsw-nuss 8000000.ethernet: set new flow-id-base 19
[   40.393211] am65-cpsw-nuss 8000000.ethernet: Failed to init rx flow2
netlink error: Invalid argument
root@am62xx-evm:~# ethtool -L eth0 rx 2
[   82.306427] ------------[ cut here ]------------
[   82.311075] WARNING: CPU: 0 PID: 378 at kernel/irq/devres.c:144 devm_free_irq+0x84/0x90
[   82.469770] Call trace:
[   82.472208]  devm_free_irq+0x84/0x90
[   82.475777]  am65_cpsw_nuss_remove_rx_chns+0x6c/0xac [ti_am65_cpsw_nuss]
[   82.482487]  am65_cpsw_nuss_update_tx_rx_chns+0x2c/0x9c [ti_am65_cpsw_nuss]
[   82.489442]  am65_cpsw_set_channels+0x30/0x4c [ti_am65_cpsw_nuss]
[   82.495531]  ethnl_set_channels+0x224/0x2dc
[   82.499713]  ethnl_default_set_doit+0xb8/0x1b8
[   82.504149]  genl_family_rcv_msg_doit+0xc0/0x124
[   82.508757]  genl_rcv_msg+0x1f0/0x284
[   82.512409]  netlink_rcv_skb+0x58/0x130
[   82.516239]  genl_rcv+0x38/0x50
[   82.519374]  netlink_unicast+0x1d0/0x2b0
[   82.523289]  netlink_sendmsg+0x180/0x3c4
[   82.527205]  __sys_sendto+0xe4/0x158
[   82.530779]  __arm64_sys_sendto+0x28/0x38
[   82.534782]  invoke_syscall+0x44/0x100
[   82.538526]  el0_svc_common.constprop.0+0xc0/0xe0
[   82.543221]  do_el0_svc+0x1c/0x28
[   82.546528]  el0_svc+0x28/0x98
[   82.549578]  el0t_64_sync_handler+0xc0/0xc4
[   82.553752]  el0t_64_sync+0x190/0x194
[   82.557407] ---[ end trace 0000000000000000 ]---

Fixes: da70d184a8c3 ("net: ethernet: ti: am65-cpsw: Introduce multi queue Rx")
Signed-off-by: Roger Quadros <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agonet: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7
Roger Quadros [Fri, 1 Nov 2024 10:18:50 +0000 (12:18 +0200)]
net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7

On J7 platforms, setting up multiple RX flows was failing
as the RX free descriptor ring 0 is shared among all flows
and we did not allocate enough elements in the RX free descriptor
ring 0 to accommodate for all RX flows.

This issue is not present on AM62 as separate pair of
rings are used for free and completion rings for each flow.

Fix this by allocating enough elements for RX free descriptor
ring 0.

However, we can no longer rely on desc_idx (descriptor based
offsets) to identify the pages in the respective flows as
free descriptor ring includes elements for all flows.
To solve this, introduce a new swdata data structure to store
flow_id and page. This can be used to identify which flow (page_pool)
and page the descriptor belonged to when popped out of the
RX rings.

Fixes: da70d184a8c3 ("net: ethernet: ti: am65-cpsw: Introduce multi queue Rx")
Signed-off-by: Roger Quadros <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
4 months agonet: hns3: fix kernel crash when uninstalling driver
Peiyang Wang [Fri, 1 Nov 2024 09:15:07 +0000 (17:15 +0800)]
net: hns3: fix kernel crash when uninstalling driver

When the driver is uninstalled and the VF is disabled concurrently, a
kernel crash occurs. The reason is that the two actions call function
pci_disable_sriov(). The num_VFs is checked to determine whether to
release the corresponding resources. During the second calling, num_VFs
is not 0 and the resource release function is called. However, the
corresponding resource has been released during the first invoking.
Therefore, the problem occurs:

[15277.839633][T50670] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
...
[15278.131557][T50670] Call trace:
[15278.134686][T50670]  klist_put+0x28/0x12c
[15278.138682][T50670]  klist_del+0x14/0x20
[15278.142592][T50670]  device_del+0xbc/0x3c0
[15278.146676][T50670]  pci_remove_bus_device+0x84/0x120
[15278.151714][T50670]  pci_stop_and_remove_bus_device+0x6c/0x80
[15278.157447][T50670]  pci_iov_remove_virtfn+0xb4/0x12c
[15278.162485][T50670]  sriov_disable+0x50/0x11c
[15278.166829][T50670]  pci_disable_sriov+0x24/0x30
[15278.171433][T50670]  hnae3_unregister_ae_algo_prepare+0x60/0x90 [hnae3]
[15278.178039][T50670]  hclge_exit+0x28/0xd0 [hclge]
[15278.182730][T50670]  __se_sys_delete_module.isra.0+0x164/0x230
[15278.188550][T50670]  __arm64_sys_delete_module+0x1c/0x30
[15278.193848][T50670]  invoke_syscall+0x50/0x11c
[15278.198278][T50670]  el0_svc_common.constprop.0+0x158/0x164
[15278.203837][T50670]  do_el0_svc+0x34/0xcc
[15278.207834][T50670]  el0_svc+0x20/0x30

For details, see the following figure.

     rmmod hclge              disable VFs
----------------------------------------------------
hclge_exit()            sriov_numvfs_store()
  ...                     device_lock()
  pci_disable_sriov()     hns3_pci_sriov_configure()
                            pci_disable_sriov()
                              sriov_disable()
    sriov_disable()             if !num_VFs :
      if !num_VFs :               return;
        return;                 sriov_del_vfs()
      sriov_del_vfs()             ...
        ...                       klist_put()
        klist_put()               ...
        ...                     num_VFs = 0;
      num_VFs = 0;        device_unlock();

In this patch, when driver is removing, we get the device_lock()
to protect num_VFs, just like sriov_numvfs_store().

Fixes: 0dd8a25f355b ("net: hns3: disable sriov before unload hclge layer")
Signed-off-by: Peiyang Wang <[email protected]>
Signed-off-by: Jijie Shao <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
4 months agoRevert "Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'"
Jakub Kicinski [Tue, 5 Nov 2024 02:03:52 +0000 (18:03 -0800)]
Revert "Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'"

This reverts commit d80a3091308491455b6501b1c4b68698c4a7cd24, reversing
changes made to 637f41476384c76d3cd7dcf5947caf2c8b8d7a9b:

2cf246143519 ("net: hns3: fix kernel crash when 1588 is sent on HIP08 devices")
3e22b7de34cb ("net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue")
d1c2e2961ab4 ("net: hns3: initialize reset_timer before hclgevf_misc_irq_init()")
5f62009ff108 ("net: hns3: don't auto enable misc vector")
2758f18a83ef ("net: hns3: Resolved the issue that the debugfs query result is inconsistent.")
662ecfc46690 ("net: hns3: fix missing features due to dev->features configuration too early")
3e0f7cc887b7 ("net: hns3: fixed reset failure issues caused by the incorrect reset type")
f2c14899caba ("net: hns3: add sync command to sync io-pgtable")
e6ab19443b36 ("net: hns3: default enable tx bounce buffer when smmu enabled")

The series is making the driver poke into IOMMU internals instead of
implementing appropriate IOMMU workarounds.

Link: https://lore.kernel.org/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge tag 'linux-can-fixes-for-6.12-20241104' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Tue, 5 Nov 2024 01:48:52 +0000 (17:48 -0800)]
Merge tag 'linux-can-fixes-for-6.12-20241104' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2024-11-04

Alexander Hölzl contributes a patch to fix an error in the CAN j1939
documentation.

Thomas Mühlbacher's patch allows building of the {cc770,sja1000}_isa
drivers on x86_64 again.

A patch by me targets the m_can driver and limits the call to
free_irq() to devices with IRQs.

Dario Binacchi's patch fixes the RX and TX error counters in the c_can
driver.

The next 2 patches target the rockchip_canfd driver. Geert
Uytterhoeven's patch lets the driver depend on ARCH_ROCKCHIP. Jean
Delvare's patch drops the obsolete dependency on COMPILE_TEST.

The last 2 patches are by me and fix 2 regressions in the mcp251xfd
driver: fix broken coalescing configuration when switching CAN modes
and fix the length calculation of the Transmit Event FIFO (TEF) on
full TEF.

* tag 'linux-can-fixes-for-6.12-20241104' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: mcp251xfd: mcp251xfd_get_tef_len(): fix length calculation
  can: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing configuration when switching CAN modes
  can: rockchip_canfd: Drop obsolete dependency on COMPILE_TEST
  can: rockchip_canfd: CAN_ROCKCHIP_CANFD should depend on ARCH_ROCKCHIP
  can: c_can: fix {rx,tx}_errors statistics
  can: m_can: m_can_close(): don't call free_irq() for IRQ-less devices
  can: {cc770,sja1000}_isa: allow building on x86_64
  can: j1939: fix error in J1939 documentation.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
4 months agoMerge tag 'arm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Tue, 5 Nov 2024 01:23:26 +0000 (15:23 -1000)]
Merge tag 'arm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull SoC fixes from Arnd Bergmann:
 "Where the last set of fixes was mostly drivers, this time the
  devicetree changes all come at once, targeting mostly the Rockchips,
  Qualcomm and NXP platforms.

  The Qualcomm bugfixes target the Snapdragon X Elite laptops,
  specifically problems with PCIe and NVMe support to improve
  reliability, and a boot regresion on msm8939.

  Also for Snapdragon platforms, there are a number of correctness
  changes in the several platform specific device drivers, but none of
  these are as impactful.

  On the NXP i.MX platform, the fixes are all for 64-bit i.MX8 variants,
  correcting individual entries in the devicetree that were incorrect
  and causing the media, video, mmc and spi drivers to misbehave in
  minor ways.

  The Arm SCMI firmware driver gets fixes for a use-after-free bug and
  for correctly parsing firmware information.

  On the RISC-V side, there are three minor devicetree fixes for
  starfive and sophgo, again addressing only minor mistakes. One device
  driver patch fixes a problem with spurious interrupt handling"

* tag 'arm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (63 commits)
  firmware: arm_scmi: Use vendor string in max-rx-timeout-ms
  dt-bindings: firmware: arm,scmi: Add missing vendor string
  riscv: dts: Replace deprecated snps,nr-gpios property for snps,dw-apb-gpio-port devices
  arm64: dts: rockchip: Correct GPIO polarity on brcm BT nodes
  arm64: dts: rockchip: Drop invalid clock-names from es8388 codec nodes
  ARM: dts: rockchip: Fix the realtek audio codec on rk3036-kylin
  ARM: dts: rockchip: Fix the spi controller on rk3036
  ARM: dts: rockchip: drop grf reference from rk3036 hdmi
  ARM: dts: rockchip: fix rk3036 acodec node
  arm64: dts: rockchip: remove orphaned pinctrl-names from pinephone pro
  soc: qcom: pmic_glink: Handle GLINK intent allocation rejections
  rpmsg: glink: Handle rejected intent request better
  arm64: dts: qcom: x1e80100: fix PCIe5 interconnect
  arm64: dts: qcom: x1e80100: fix PCIe4 interconnect
  arm64: dts: qcom: x1e80100: Fix up BAR spaces
  MAINTAINERS: invert Misc RISC-V SoC Support's pattern
  soc: qcom: socinfo: fix revision check in qcom_socinfo_probe()
  arm64: dts: qcom: x1e80100-qcp: fix nvme regulator boot glitch
  arm64: dts: qcom: x1e80100-microsoft-romulus: fix nvme regulator boot glitch
  arm64: dts: qcom: x1e80100-yoga-slim7x: fix nvme regulator boot glitch
  ...

4 months agoe1000e: Remove Meteor Lake SMBUS workarounds
Vitaly Lifshits [Tue, 1 Oct 2024 17:08:48 +0000 (20:08 +0300)]
e1000e: Remove Meteor Lake SMBUS workarounds

This is a partial revert to commit 76a0a3f9cc2f ("e1000e: fix force smbus
during suspend flow"). That commit fixed a sporadic PHY access issue but
introduced a regression in runtime suspend flows.
The original issue on Meteor Lake systems was rare in terms of the
reproduction rate and the number of the systems affected.

After the integration of commit 0a6ad4d9e169 ("e1000e: avoid failing the
system during pm_suspend"), PHY access loss can no longer cause a
system-level suspend failure. As it only occurs when the LAN cable is
disconnected, and is recovered during system resume flow. Therefore, its
functional impact is low, and the priority is given to stabilizing
runtime suspend.

Fixes: 76a0a3f9cc2f ("e1000e: fix force smbus during suspend flow")
Signed-off-by: Vitaly Lifshits <[email protected]>
Tested-by: Avigail Dahan <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
4 months agoi40e: fix race condition by adding filter's intermediate sync state
Aleksandr Loktionov [Wed, 16 Oct 2024 09:30:11 +0000 (11:30 +0200)]
i40e: fix race condition by adding filter's intermediate sync state

Fix a race condition in the i40e driver that leads to MAC/VLAN filters
becoming corrupted and leaking. Address the issue that occurs under
heavy load when multiple threads are concurrently modifying MAC/VLAN
filters by setting mac and port VLAN.

1. Thread T0 allocates a filter in i40e_add_filter() within
        i40e_ndo_set_vf_port_vlan().
2. Thread T1 concurrently frees the filter in __i40e_del_filter() within
        i40e_ndo_set_vf_mac().
3. Subsequently, i40e_service_task() calls i40e_sync_vsi_filters(), which
        refers to the already freed filter memory, causing corruption.

Reproduction steps:
1. Spawn multiple VFs.
2. Apply a concurrent heavy load by running parallel operations to change
        MAC addresses on the VFs and change port VLANs on the host.
3. Observe errors in dmesg:
"Error I40E_AQ_RC_ENOSPC adding RX filters on VF XX,
please set promiscuous on manually for VF XX".

Exact code for stable reproduction Intel can't open-source now.

The fix involves implementing a new intermediate filter state,
I40E_FILTER_NEW_SYNC, for the time when a filter is on a tmp_add_list.
These filters cannot be deleted from the hash list directly but
must be removed using the full process.

Fixes: 278e7d0b9d68 ("i40e: store MAC/VLAN filters in a hash with the MAC Address as key")
Signed-off-by: Aleksandr Loktionov <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Reviewed-by: Michal Schmidt <[email protected]>
Tested-by: Michal Schmidt <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
4 months agoidpf: fix idpf_vc_core_init error path
Pavan Kumar Linga [Fri, 25 Oct 2024 18:38:43 +0000 (11:38 -0700)]
idpf: fix idpf_vc_core_init error path

In an event where the platform running the device control plane
is rebooted, reset is detected on the driver. It releases
all the resources and waits for the reset to complete. Once the
reset is done, it tries to build the resources back. At this
time if the device control plane is not yet started, then
the driver timeouts on the virtchnl message and retries to
establish the mailbox again.

In the retry flow, mailbox is deinitialized but the mailbox
workqueue is still alive and polling for the mailbox message.
This results in accessing the released control queue leading to
null-ptr-deref. Fix it by unrolling the work queue cancellation
and mailbox deinitialization in the reverse order which they got
initialized.

Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request")
Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager")
Cc: [email protected] # 6.9+
Reviewed-by: Tarun K Singh <[email protected]>
Signed-off-by: Pavan Kumar Linga <[email protected]>
Tested-by: Krishneil Singh <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
4 months agoidpf: avoid vport access in idpf_get_link_ksettings
Pavan Kumar Linga [Fri, 25 Oct 2024 18:38:42 +0000 (11:38 -0700)]
idpf: avoid vport access in idpf_get_link_ksettings

When the device control plane is removed or the platform
running device control plane is rebooted, a reset is detected
on the driver. On driver reset, it releases the resources and
waits for the reset to complete. If the reset fails, it takes
the error path and releases the vport lock. At this time if the
monitoring tools tries to access link settings, it call traces
for accessing released vport pointer.

To avoid it, move link_speed_mbps to netdev_priv structure
which removes the dependency on vport pointer and the vport lock
in idpf_get_link_ksettings. Also use netif_carrier_ok()
to check the link status and adjust the offsetof to use link_up
instead of link_speed_mbps.

Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks")
Cc: [email protected] # 6.7+
Reviewed-by: Tarun K Singh <[email protected]>
Signed-off-by: Pavan Kumar Linga <[email protected]>
Tested-by: Krishneil Singh <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
4 months agoice: change q_index variable type to s16 to store -1 value
Mateusz Polchlopek [Mon, 28 Oct 2024 16:59:22 +0000 (12:59 -0400)]
ice: change q_index variable type to s16 to store -1 value

Fix Flow Director not allowing to re-map traffic to 0th queue when action
is configured to drop (and vice versa).

The current implementation of ethtool callback in the ice driver forbids
change Flow Director action from 0 to -1 and from -1 to 0 with an error,
e.g:

 # ethtool -U eth2 flow-type tcp4 src-ip 1.1.1.1 loc 1 action 0
 # ethtool -U eth2 flow-type tcp4 src-ip 1.1.1.1 loc 1 action -1
 rmgr: Cannot insert RX class rule: Invalid argument

We set the value of `u16 q_index = 0` at the beginning of the function
ice_set_fdir_input_set(). In case of "drop traffic" action (which is
equal to -1 in ethtool) we store the 0 value. Later, when want to change
traffic rule to redirect to queue with index 0 it returns an error
caused by duplicate found.

Fix this behaviour by change of the type of field `q_index` from u16 to s16
in `struct ice_fdir_fltr`. This allows to store -1 in the field in case
of "drop traffic" action. What is more, change the variable type in the
function ice_set_fdir_input_set() and assign at the beginning the new
`#define ICE_FDIR_NO_QUEUE_IDX` which is -1. Later, if the action is set
to another value (point specific queue index) the variable value is
overwritten in the function.

Fixes: cac2a27cd9ab ("ice: Support IPv4 Flow Director filters")
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Mateusz Polchlopek <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
4 months agoice: Fix use after free during unload with ports in bridge
Marcin Szycik [Wed, 9 Oct 2024 15:18:35 +0000 (17:18 +0200)]
ice: Fix use after free during unload with ports in bridge

Unloading the ice driver while switchdev port representors are added to
a bridge can lead to kernel panic. Reproducer:

  modprobe ice

  devlink dev eswitch set $PF1_PCI mode switchdev

  ip link add $BR type bridge
  ip link set $BR up

  echo 2 > /sys/class/net/$PF1/device/sriov_numvfs
  sleep 2

  ip link set $PF1 master $BR
  ip link set $VF1_PR master $BR
  ip link set $VF2_PR master $BR
  ip link set $PF1 up
  ip link set $VF1_PR up
  ip link set $VF2_PR up
  ip link set $VF1 up

  rmmod irdma ice

When unloading the driver, ice_eswitch_detach() is eventually called as
part of VF freeing. First, it removes a port representor from xarray,
then unregister_netdev() is called (via repr->ops.rem()), finally
representor is deallocated. The problem comes from the bridge doing its
own deinit at the same time. unregister_netdev() triggers a notifier
chain, resulting in ice_eswitch_br_port_deinit() being called. It should
set repr->br_port = NULL, but this does not happen since repr has
already been removed from xarray and is not found. Regardless, it
finishes up deallocating br_port. At this point, repr is still not freed
and an fdb event can happen, in which ice_eswitch_br_fdb_event_work()
takes repr->br_port and tries to use it, which causes a panic (use after
free).

Note that this only happens with 2 or more port representors added to
the bridge, since with only one representor port, the bridge deinit is
slightly different (ice_eswitch_br_port_deinit() is called via
ice_eswitch_br_ports_flush(), not ice_eswitch_br_port_unlink()).

Trace:
  Oops: general protection fault, probably for non-canonical address 0xf129010fd1a93284: 0000 [#1] PREEMPT SMP KASAN NOPTI
  KASAN: maybe wild-memory-access in range [0x8948287e8d499420-0x8948287e8d499427]
  (...)
  Workqueue: ice_bridge_wq ice_eswitch_br_fdb_event_work [ice]
  RIP: 0010:__rht_bucket_nested+0xb4/0x180
  (...)
  Call Trace:
   (...)
   ice_eswitch_br_fdb_find+0x3fa/0x550 [ice]
   ? __pfx_ice_eswitch_br_fdb_find+0x10/0x10 [ice]
   ice_eswitch_br_fdb_event_work+0x2de/0x1e60 [ice]
   ? __schedule+0xf60/0x5210
   ? mutex_lock+0x91/0xe0
   ? __pfx_ice_eswitch_br_fdb_event_work+0x10/0x10 [ice]
   ? ice_eswitch_br_update_work+0x1f4/0x310 [ice]
   (...)

A workaround is available: brctl setageing $BR 0, which stops the bridge
from adding fdb entries altogether.

Change the order of operations in ice_eswitch_detach(): move the call to
unregister_netdev() before removing repr from xarray. This way
repr->br_port will be correctly set to NULL in
ice_eswitch_br_port_deinit(), preventing a panic.

Fixes: fff292b47ac1 ("ice: add VF representors one by one")
Reviewed-by: Michal Swiatkowski <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Signed-off-by: Marcin Szycik <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
4 months agoKEYS: trusted: dcp: fix NULL dereference in AEAD crypto operation
David Gstir [Tue, 29 Oct 2024 11:34:01 +0000 (12:34 +0100)]
KEYS: trusted: dcp: fix NULL dereference in AEAD crypto operation

When sealing or unsealing a key blob we currently do not wait for
the AEAD cipher operation to finish and simply return after submitting
the request. If there is some load on the system we can exit before
the cipher operation is done and the buffer we read from/write to
is already removed from the stack. This will e.g. result in NULL
pointer dereference errors in the DCP driver during blob creation.

Fix this by waiting for the AEAD cipher operation to finish before
resuming the seal and unseal calls.

Cc: [email protected] # v6.10+
Fixes: 0e28bf61a5f9 ("KEYS: trusted: dcp: fix leak of blob encryption key")
Reported-by: Parthiban N <[email protected]>
Closes: https://lore.kernel.org/keyrings/[email protected]/
Signed-off-by: David Gstir <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
4 months agosecurity/keys: fix slab-out-of-bounds in key_task_permission
Chen Ridong [Tue, 8 Oct 2024 12:46:39 +0000 (12:46 +0000)]
security/keys: fix slab-out-of-bounds in key_task_permission

KASAN reports an out of bounds read:
BUG: KASAN: slab-out-of-bounds in __kuid_val include/linux/uidgid.h:36
BUG: KASAN: slab-out-of-bounds in uid_eq include/linux/uidgid.h:63 [inline]
BUG: KASAN: slab-out-of-bounds in key_task_permission+0x394/0x410
security/keys/permission.c:54
Read of size 4 at addr ffff88813c3ab618 by task stress-ng/4362

CPU: 2 PID: 4362 Comm: stress-ng Not tainted 5.10.0-14930-gafbffd6c3ede #15
Call Trace:
 __dump_stack lib/dump_stack.c:82 [inline]
 dump_stack+0x107/0x167 lib/dump_stack.c:123
 print_address_description.constprop.0+0x19/0x170 mm/kasan/report.c:400
 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560
 kasan_report+0x3a/0x50 mm/kasan/report.c:585
 __kuid_val include/linux/uidgid.h:36 [inline]
 uid_eq include/linux/uidgid.h:63 [inline]
 key_task_permission+0x394/0x410 security/keys/permission.c:54
 search_nested_keyrings+0x90e/0xe90 security/keys/keyring.c:793

This issue was also reported by syzbot.

It can be reproduced by following these steps(more details [1]):
1. Obtain more than 32 inputs that have similar hashes, which ends with the
   pattern '0xxxxxxxe6'.
2. Reboot and add the keys obtained in step 1.

The reproducer demonstrates how this issue happened:
1. In the search_nested_keyrings function, when it iterates through the
   slots in a node(below tag ascend_to_node), if the slot pointer is meta
   and node->back_pointer != NULL(it means a root), it will proceed to
   descend_to_node. However, there is an exception. If node is the root,
   and one of the slots points to a shortcut, it will be treated as a
   keyring.
2. Whether the ptr is keyring decided by keyring_ptr_is_keyring function.
   However, KEYRING_PTR_SUBTYPE is 0x2UL, the same as
   ASSOC_ARRAY_PTR_SUBTYPE_MASK.
3. When 32 keys with the similar hashes are added to the tree, the ROOT
   has keys with hashes that are not similar (e.g. slot 0) and it splits
   NODE A without using a shortcut. When NODE A is filled with keys that
   all hashes are xxe6, the keys are similar, NODE A will split with a
   shortcut. Finally, it forms the tree as shown below, where slot 6 points
   to a shortcut.

                      NODE A
              +------>+---+
      ROOT    |       | 0 | xxe6
      +---+   |       +---+
 xxxx | 0 | shortcut  :   : xxe6
      +---+   |       +---+
 xxe6 :   :   |       |   | xxe6
      +---+   |       +---+
      | 6 |---+       :   : xxe6
      +---+           +---+
 xxe6 :   :           | f | xxe6
      +---+           +---+
 xxe6 | f |
      +---+

4. As mentioned above, If a slot(slot 6) of the root points to a shortcut,
   it may be mistakenly transferred to a key*, leading to a read
   out-of-bounds read.

To fix this issue, one should jump to descend_to_node if the ptr is a
shortcut, regardless of whether the node is root or not.

[1] https://lore.kernel.org/linux-kernel/1cfa878e-8c7b-4570-8606-21daf5e13ce7@huaweicloud.com/

[jarkko: tweaked the commit message a bit to have an appropriate closes
 tag.]
Fixes: b2a4df200d57 ("KEYS: Expand the capacity of a keyring")
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]/T/
Signed-off-by: Chen Ridong <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>
4 months agoMerge tag 'mmc-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Mon, 4 Nov 2024 18:07:22 +0000 (08:07 -1000)]
Merge tag 'mmc-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull mmc fixes from Ulf Hansson:

 - sdhci-pci-gli: A couple of fixes for low power mode on GL9767

* tag 'mmc-v6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-pci-gli: GL9767: Fix low power mode in the SD Express process
  mmc: sdhci-pci-gli: GL9767: Fix low power mode on the set clock function

4 months agoMerge tag 'tpmdd-next-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 4 Nov 2024 18:00:14 +0000 (08:00 -1000)]
Merge tag 'tpmdd-next-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm fix from Jarkko Sakkinen:
 "Fix a race condition between tpm_pm_suspend() and tpm_hwrng_read() (I
  think for good now)"

* tag 'tpmdd-next-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: Lock TPM chip in tpm_pm_suspend() first

4 months agocan: mcp251xfd: mcp251xfd_get_tef_len(): fix length calculation
Marc Kleine-Budde [Tue, 1 Oct 2024 14:56:22 +0000 (16:56 +0200)]
can: mcp251xfd: mcp251xfd_get_tef_len(): fix length calculation

Commit b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround
broken TEF FIFO tail index erratum") introduced
mcp251xfd_get_tef_len() to get the number of unhandled transmit events
from the Transmit Event FIFO (TEF).

As the TEF has no head pointer, the driver uses the TX FIFO's tail
pointer instead, assuming that send frames are completed. However the
check for the TEF being full was not correct. This leads to the driver
stop working if the TEF is full.

Fix the TEF full check by assuming that if, from the driver's point of
view, there are no free TX buffers in the chip and the TX FIFO is
empty, all messages must have been sent and the TEF must therefore be
full.

Reported-by: Sven Schuchmann <[email protected]>
Closes: https://patch.msgid.link/FR3P281MB155216711EFF900AD9791B7ED9692@FR3P281MB1552.DEUP281.PROD.OUTLOOK.COM
Fixes: b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum")
Tested-by: Sven Schuchmann <[email protected]>
Cc: [email protected]
Link: https://patch.msgid.link/20241104-mcp251xfd-fix-length-calculation-v3-1-608b6e7e2197@pengutronix.de
Signed-off-by: Marc Kleine-Budde <[email protected]>
4 months agocan: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing configuration when switching...
Marc Kleine-Budde [Fri, 25 Oct 2024 12:34:40 +0000 (14:34 +0200)]
can: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing configuration when switching CAN modes

Since commit 50ea5449c563 ("can: mcp251xfd: fix ring configuration
when switching from CAN-CC to CAN-FD mode"), the current ring and
coalescing configuration is passed to can_ram_get_layout(). That fixed
the issue when switching between CAN-CC and CAN-FD mode with
configured ring (rx, tx) and/or coalescing parameters (rx-frames-irq,
tx-frames-irq).

However 50ea5449c563 ("can: mcp251xfd: fix ring configuration when
switching from CAN-CC to CAN-FD mode"), introduced a regression when
switching CAN modes with disabled coalescing configuration: Even if
the previous CAN mode has no coalescing configured, the new mode is
configured with active coalescing. This leads to delayed receiving of
CAN-FD frames.

This comes from the fact, that ethtool uses usecs = 0 and max_frames =
1 to disable coalescing, however the driver uses internally
priv->{rx,tx}_obj_num_coalesce_irq = 0 to indicate disabled
coalescing.

Fix the regression by assigning struct ethtool_coalesce
ec->{rx,tx}_max_coalesced_frames_irq = 1 if coalescing is disabled in
the driver as can_ram_get_layout() expects this.

Reported-by: https://github.com/vdh-robothania
Closes: https://github.com/raspberrypi/linux/issues/6407
Fixes: 50ea5449c563 ("can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode")
Cc: [email protected]
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/20241025-mcp251xfd-fix-coalesing-v1-1-9d11416de1df@pengutronix.de
Signed-off-by: Marc Kleine-Budde <[email protected]>
4 months agocan: rockchip_canfd: Drop obsolete dependency on COMPILE_TEST
Jean Delvare [Tue, 22 Oct 2024 11:04:39 +0000 (13:04 +0200)]
can: rockchip_canfd: Drop obsolete dependency on COMPILE_TEST

Since commit 0166dc11be91 ("of: make CONFIG_OF user selectable"), OF
can be enabled on all architectures. Therefore depending on
COMPILE_TEST as an alternative is no longer needed.

Signed-off-by: Jean Delvare <[email protected]>
Reviewed-by: Vincent Mailhol <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
4 months agocan: rockchip_canfd: CAN_ROCKCHIP_CANFD should depend on ARCH_ROCKCHIP
Geert Uytterhoeven [Tue, 24 Sep 2024 09:15:31 +0000 (11:15 +0200)]
can: rockchip_canfd: CAN_ROCKCHIP_CANFD should depend on ARCH_ROCKCHIP

The Rockchip CAN-FD controller is only present on Rockchip SoCs. Hence
add a dependency on ARCH_ROCKCHIP, to prevent asking the user about
this driver when configuring a kernel without Rockchip platform
support.

Fixes: ff60bfbaf67f219c ("can: rockchip_canfd: add driver for Rockchip CAN-FD controller")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Heiko Stuebner <[email protected]>
Link: https://patch.msgid.link/a4b3c8c1cca9515e67adac83af5ba1b1fab2fcbc.1727169288.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <[email protected]>
4 months agocan: c_can: fix {rx,tx}_errors statistics
Dario Binacchi [Mon, 14 Oct 2024 13:53:13 +0000 (15:53 +0200)]
can: c_can: fix {rx,tx}_errors statistics

The c_can_handle_bus_err() function was incorrectly incrementing only the
receive error counter, even in cases of bit or acknowledgment errors that
occur during transmission. The patch fixes the issue by incrementing the
appropriate counter based on the type of error.

Fixes: 881ff67ad450 ("can: c_can: Added support for Bosch C_CAN controller")
Signed-off-by: Dario Binacchi <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
4 months agocan: m_can: m_can_close(): don't call free_irq() for IRQ-less devices
Marc Kleine-Budde [Mon, 30 Sep 2024 17:02:30 +0000 (19:02 +0200)]
can: m_can: m_can_close(): don't call free_irq() for IRQ-less devices

In commit b382380c0d2d ("can: m_can: Add hrtimer to generate software
interrupt") support for IRQ-less devices was added. Instead of an
interrupt, the interrupt routine is called by a hrtimer-based polling
loop.

That patch forgot to change free_irq() to be only called for devices
with IRQs. Fix this, by calling free_irq() conditionally only if an
IRQ is available for the device (and thus has been requested
previously).

Fixes: b382380c0d2d ("can: m_can: Add hrtimer to generate software interrupt")
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Markus Schneider-Pargmann <[email protected]>
Link: https://patch.msgid.link/[email protected]
Cc: <[email protected]> # v6.6+
Signed-off-by: Marc Kleine-Budde <[email protected]>
4 months agocan: {cc770,sja1000}_isa: allow building on x86_64
Thomas Mühlbacher [Thu, 19 Sep 2024 17:35:22 +0000 (17:35 +0000)]
can: {cc770,sja1000}_isa: allow building on x86_64

The ISA variable is only defined if X86_32 is also defined. However,
these drivers are still useful and in use on at least some modern 64-bit
x86 industrial systems as well. With the correct module parameters, they
work as long as IO port communication is possible, despite their name
having ISA in them.

Fixes: a29689e60ed3 ("net: handle HAS_IOPORT dependencies")
Signed-off-by: Thomas Mühlbacher <[email protected]>
Link: https://patch.msgid.link/[email protected]
Cc: [email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
4 months agocan: j1939: fix error in J1939 documentation.
Alexander Hölzl [Wed, 23 Oct 2024 14:52:57 +0000 (16:52 +0200)]
can: j1939: fix error in J1939 documentation.

The description of PDU1 format usage mistakenly referred to PDU2 format.

Signed-off-by: Alexander Hölzl <[email protected]>
Acked-by: Oleksij Rempel <[email protected]>
Acked-by: Vincent Mailhol <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
4 months agodm cache: fix potential out-of-bounds access on the first resume
Ming-Hung Tsai [Tue, 22 Oct 2024 07:13:54 +0000 (15:13 +0800)]
dm cache: fix potential out-of-bounds access on the first resume

Out-of-bounds access occurs if the fast device is expanded unexpectedly
before the first-time resume of the cache table. This happens because
expanding the fast device requires reloading the cache table for
cache_create to allocate new in-core data structures that fit the new
size, and the check in cache_preresume is not performed during the
first resume, leading to the issue.

Reproduce steps:

1. prepare component devices:

dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct

2. load a cache table of 512 cache blocks, and deliberately expand the
   fast device before resuming the cache, making the in-core data
   structures inadequate.

dmsetup create cache --notable
dmsetup reload cache --table "0 524288 cache /dev/mapper/cmeta \
/dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"
dmsetup reload cdata --table "0 131072 linear /dev/sdc 8192"
dmsetup resume cdata
dmsetup resume cache

3. suspend the cache to write out the in-core dirty bitset and hint
   array, leading to out-of-bounds access to the dirty bitset at offset
   0x40:

dmsetup suspend cache

KASAN reports:

  BUG: KASAN: vmalloc-out-of-bounds in is_dirty_callback+0x2b/0x80
  Read of size 8 at addr ffffc90000085040 by task dmsetup/90

  (...snip...)
  The buggy address belongs to the virtual mapping at
   [ffffc90000085000ffffc90000087000) created by:
   cache_ctr+0x176a/0x35f0

  (...snip...)
  Memory state around the buggy address:
   ffffc90000084f00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
   ffffc90000084f80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
  >ffffc90000085000: 00 00 00 00 00 00 00 00 f8 f8 f8 f8 f8 f8 f8 f8
                                             ^
   ffffc90000085080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
   ffffc90000085100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8

Fix by checking the size change on the first resume.

Signed-off-by: Ming-Hung Tsai <[email protected]>
Fixes: f494a9c6b1b6 ("dm cache: cache shrinking support")
Cc: [email protected]
Signed-off-by: Mikulas Patocka <[email protected]>
Acked-by: Joe Thornber <[email protected]>
This page took 0.17009 seconds and 4 git commands to generate.