Git Repo - linux.git/log

Bluetooth: hci_sync: use hci_skb_event() helper

This instance is the only opencoded version of the macro, so have it
follow suit.

Signed-off-by: Ahmad Fatoum <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: eir: Add helpers for managing service data

This adds helpers for accessing and appending service data (0x16) ad
type.

Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: hci_sync: Fix attempting to suspend with unfiltered passive scan

When suspending the passive scanning _must_ have its filter_policy set
to 0x01 to use the accept list otherwise _any_ advertise report would
end up waking up the system.

In order to fix the filter_policy the code now checks for
hdev->suspended && HCI_CONN_FLAG_REMOTE_WAKEUP
first, since the MGMT_OP_SET_DEVICE_FLAGS will reject any attempt to
set HCI_CONN_FLAG_REMOTE_WAKEUP when it cannot be programmed in the
acceptlist, so it can return success causing the proper filter_policy
to be used.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215768
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: MGMT: Add conditions for setting HCI_CONN_FLAG_REMOTE_WAKEUP

HCI_CONN_FLAG_REMOTE_WAKEUP can only be set if device can be programmed
in the allowlist which in case of device using RPA requires LL Privacy
support to be enabled.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215768
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: btmtksdio: fix the reset takes too long

Sending WMT command during the reset in progress is invalid and would get
no response from firmware until the reset is complete, so we ignore the WMT
command here to resolve the issue which causes the whole reset process
taking too long.

Fixes: 8fafe702253d ("Bluetooth: mt7921s: support bluetooth reset mechanism")
Co-developed-by: Yake Yang <[email protected]>
Signed-off-by: Yake Yang <[email protected]>
Signed-off-by: Sean Wang <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: btmtksdio: fix possible FW initialization failure

According to FW advised sequence, mt7921s need to re-acquire privilege
immediately after the firmware download is complete before normal running.
Otherwise, it is still possible the bus may be stuck in an abnormal status
that causes FW initialization failure in the current driver.

Fixes: 752aea58489f ("Bluetooth: mt7921s: fix bus hang with wrong privilege")
Co-developed-by: Yake Yang <[email protected]>
Signed-off-by: Yake Yang <[email protected]>
Signed-off-by: Sean Wang <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: btmtksdio: fix use-after-free at btmtksdio_recv_event

We should not access skb buffer data anymore after hci_recv_frame was
called.

[   39.634809] BUG: KASAN: use-after-free in btmtksdio_recv_event+0x1b0
[   39.634855] Read of size 1 at addr ffffff80cf28a60d by task kworker
[   39.634962] Call trace:
[   39.634974]  dump_backtrace+0x0/0x3b8
[   39.634999]  show_stack+0x20/0x2c
[   39.635016]  dump_stack_lvl+0x60/0x78
[   39.635040]  print_address_description+0x70/0x2f0
[   39.635062]  kasan_report+0x154/0x194
[   39.635079]  __asan_report_load1_noabort+0x44/0x50
[   39.635099]  btmtksdio_recv_event+0x1b0/0x1c4
[   39.635129]  btmtksdio_txrx_work+0x6cc/0xac4
[   39.635157]  process_one_work+0x560/0xc5c
[   39.635177]  worker_thread+0x7ec/0xcc0
[   39.635195]  kthread+0x2d0/0x3d0
[   39.635215]  ret_from_fork+0x10/0x20
[   39.635247] Allocated by task 0:
[   39.635260] (stack is not available)
[   39.635281] Freed by task 2392:
[   39.635295]  kasan_save_stack+0x38/0x68
[   39.635319]  kasan_set_track+0x28/0x3c
[   39.635338]  kasan_set_free_info+0x28/0x4c
[   39.635357]  ____kasan_slab_free+0x104/0x150
[   39.635374]  __kasan_slab_free+0x18/0x28
[   39.635391]  slab_free_freelist_hook+0x114/0x248
[   39.635410]  kfree+0xf8/0x2b4
[   39.635427]  skb_free_head+0x58/0x98
[   39.635447]  skb_release_data+0x2f4/0x410
[   39.635464]  skb_release_all+0x50/0x60
[   39.635481]  kfree_skb+0xc8/0x25c
[   39.635498]  hci_event_packet+0x894/0xca4 [bluetooth]
[   39.635721]  hci_rx_work+0x1c8/0x68c [bluetooth]
[   39.635925]  process_one_work+0x560/0xc5c
[   39.635951]  worker_thread+0x7ec/0xcc0
[   39.635970]  kthread+0x2d0/0x3d0
[   39.635990]  ret_from_fork+0x10/0x20
[   39.636021] The buggy address belongs to the object at ffffff80cf28a600
                which belongs to the cache kmalloc-512 of size 512
[   39.636039] The buggy address is located 13 bytes inside of
                512-byte region [ffffff80cf28a600, ffffff80cf28a800)

Fixes: 9aebfd4a2200 ("Bluetooth: mediatek: add support for MediaTek MT7663S and MT7668S SDIO devices")
Co-developed-by: Yake Yang <[email protected]>
Signed-off-by: Yake Yang <[email protected]>
Signed-off-by: Sean Wang <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: btbcm: Add entry for BCM4373A0 UART Bluetooth

This patch adds the device ID for the BCM4373A0 module, found e.g. in
the Infineon (Cypress) CYW4373E chip. The required firmware file is
named 'BCM4373A0.hcd'.

Signed-off-by: Tim Harvey <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: btusb: Add a new PID/VID 0489/e0c8 for MT7921

Add VID 0489 & PID e0c8 for MediaTek MT7921 USB Bluetooth chip.

The information in /sys/kernel/debug/usb/devices about the Bluetooth
device is listed as the below.

T:  Bus=01 Lev=01 Prnt=01 Port=13 Cnt=03 Dev#=  4 Spd=480  MxCh= 0
D:  Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0489 ProdID=e0c8 Rev= 1.00
S:  Manufacturer=MediaTek Inc.
S:  Product=Wireless_Device
S:  SerialNumber=000000000
C:* #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=100mA
A:  FirstIf#= 0 IfCount= 3 Cls=e0(wlcon) Sub=01 Prot=01
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=125us
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
I:  If#= 1 Alt= 6 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  63 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  63 Ivl=1ms
I:* If#= 2 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none)
E:  Ad=8a(I) Atr=03(Int.) MxPS=  64 Ivl=125us
E:  Ad=0a(O) Atr=03(Int.) MxPS=  64 Ivl=125us
I:  If#= 2 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none)
E:  Ad=8a(I) Atr=03(Int.) MxPS= 512 Ivl=125us
E:  Ad=0a(O) Atr=03(Int.) MxPS= 512 Ivl=125us

Signed-off-by: Sean Wang <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: btusb: Add 0x0bda:0x8771 Realtek 8761BUV devices

Identifies as just "Realtek Semiconductor Corp. Bluetooth Radio"; it's
used in many adapters, e.g.:

- UGREEN CM390
- C-TECH BTD-01
- Orico BTA-508
- KS-is KS-457

Device description at /sys/kernel/debug/usb/devices:

T:  Bus=03 Lev=01 Prnt=01 Port=05 Cnt=02 Dev#=  3 Spd=12   MxCh= 0
D:  Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0bda ProdID=8771 Rev= 2.00
S:  Manufacturer=Realtek
S:  Product=Bluetooth Radio
S:  SerialNumber=00E04C239987
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms

Signed-off-by: Ismael Luceno <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: btusb: Set HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for QCA

Set HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for QCA controllers since
they answer HCI_OP_READ_DEF_ERR_DATA_REPORTING with error code
"UNKNOWN HCI COMMAND" as shown below:

[  580.517552] Bluetooth: hci0: unexpected cc 0x0c5a length: 1 < 2
[  580.517660] Bluetooth: hci0: Opcode 0x c5a failed: -38

hcitool -i hci0 cmd 0x03 0x5a
< HCI Command: ogf 0x03, ocf 0x005a, plen 0
> HCI Event: 0x0e plen 4
  01 5A 0C 01

btmon log:
< HCI Command: Read Default Erroneous Data Reporting (0x03|0x005a) plen 0
> HCI Event: Command Complete (0x0e) plen 4
      Read Default Erroneous Data Reporting (0x03|0x005a) ncmd 1
        Status: Unknown HCI Command (0x01)

Signed-off-by: Zijun Hu <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: core: Fix missing power_on work cancel on HCI close

Move power_on work cancel to hci_dev_close_sync to ensure that power_on
work is canceled after HCI interface down, power off, rfkill, etc.

For example, if

hciconfig hci0 down

is done early enough during boot, it may run before power_on work.
Then, power_on work will actually bring up interface despite above
hciconfig command.

Signed-off-by: Vasyl Vavrychuk <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: btusb: add support for Qualcomm WCN785x

Qualcomm WCN785x has PID/VID 0cf3/e700 as shown by
/sys/kernel/debug/usb/devices:

T:  Bus=02 Lev=02 Prnt=02 Port=01 Cnt=02 Dev#=  8 Spd=12   MxCh= 0
D:  Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0cf3 ProdID=e700 Rev= 0.01
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
I:  If#= 1 Alt= 6 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  63 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  63 Ivl=1ms
I:  If#= 1 Alt= 7 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  65 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  65 Ivl=1ms

Signed-off-by: Zijun Hu <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: protect le accept and resolv lists with hdev->lock

Concurrent operations from events on le_{accept,resolv}_list are
currently unprotected by hdev->lock.
Most existing code do already protect the lists with that lock.
This can be observed in hci_debugfs and hci_sync.
Add the protection for these events too.

Fixes: b950aa88638c ("Bluetooth: Add definitions and track LE resolve list modification")
Fixes: 0f36b589e4ee ("Bluetooth: Track LE white list modification via HCI commands")
Signed-off-by: Niels Dossche <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: use hdev lock for accept_list and reject_list in conn req

All accesses (both reads and modifications) to
hdev->{accept,reject}_list are protected by hdev lock,
except the ones in hci_conn_request_evt. This can cause a race
condition in the form of a list corruption.
The solution is to protect these lists in hci_conn_request_evt as well.

I was unable to find the exact commit that introduced the issue for the
reject list, I was only able to find it for the accept list.

Fixes: a55bd29d5227 ("Bluetooth: Add white list lookup for incoming connection requests")
Signed-off-by: Niels Dossche <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: use hdev lock in activate_scan for hci_is_adv_monitoring

hci_is_adv_monitoring's function documentation states that it must be
called under the hdev lock. Paths that leads to an unlocked call are:
discov_update => start_discovery => interleaved_discov => active_scan
and: discov_update => start_discovery => active_scan

The solution is to take the lock in active_scan during the duration of
the call to hci_is_adv_monitoring.

Fixes: c32d624640fd ("Bluetooth: disable filter dup when scan for adv monitor")
Signed-off-by: Niels Dossche <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: btrtl: Add support for RTL8852C

Add the support for RTL8852C BT controller on USB interface.
The necessary firmware file will be submitted to linux-firmware.

Signed-off-by: Max Chou <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: btusb: Set HCI_QUIRK_BROKEN_ENHANCED_SETUP_SYNC_CONN for QCA

This sets HCI_QUIRK_BROKEN_ENHANCED_SETUP_SYNC_CONN for QCA controllers
since SCO appear to not work when using HCI_OP_ENHANCED_SETUP_SYNC_CONN.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215576
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: Print broken quirks

This prints warnings for controllers setting broken quirks to increase
their visibility and warn about broken controllers firmware that
probably needs updates to behave properly.

Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: HCI: Add HCI_QUIRK_BROKEN_ENHANCED_SETUP_SYNC_CONN quirk

This adds HCI_QUIRK_BROKEN_ENHANCED_SETUP_SYNC_CONN quirk which can be
used to mark HCI_Enhanced_Setup_Synchronous_Connection as broken even
if its support command bit are set since some controller report it as
supported but the command don't work properly with some configurations
(e.g. BT_VOICE_TRANSPARENT/mSBC).

Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: hci_qca: Use del_timer_sync() before freeing

While looking at a crash report on a timer list being corrupted, which
usually happens when a timer is freed while still active. This is
commonly triggered by code calling del_timer() instead of
del_timer_sync() just before freeing.

One possible culprit is the hci_qca driver, which does exactly that.

Eric mentioned that wake_retrans_timer could be rearmed via the work
queue, so also move the destruction of the work queue before
del_timer_sync().

Cc: Eric Dumazet <[email protected]>
Cc: [email protected]
Fixes: 0ff252c1976da ("Bluetooth: hciuart: Add support QCA chipset for UART")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: btintel: Constify static struct regmap_bus

The only usage of regmap_ibt is to (after the regmap_init() macro is
expanded), pass its address to __regmap_init(), which takes a pointer to
const struct regmap_bus as input. Make it const to allow the compiler to
put it in read-only memory.

Signed-off-by: Rikard Falkeborn <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: Keep MGMT pending queue ordered FIFO

Small change to add new commands to tail of the list, and find/remove them
from the head of the list.

Signed-off-by: Brian Gix <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: fix dangling sco_conn and use-after-free in sco_sock_timeout

Connecting the same socket twice consecutively in sco_sock_connect()
could lead to a race condition where two sco_conn objects are created
but only one is associated with the socket. If the socket is closed
before the SCO connection is established, the timer associated with the
dangling sco_conn object won't be canceled. As the sock object is being
freed, the use-after-free problem happens when the timer callback
function sco_sock_timeout() accesses the socket. Here's the call trace:

dump_stack+0x107/0x163
? refcount_inc+0x1c/
print_address_description.constprop.0+0x1c/0x47e
? refcount_inc+0x1c/0x7b
kasan_report+0x13a/0x173
? refcount_inc+0x1c/0x7b
check_memory_region+0x132/0x139
refcount_inc+0x1c/0x7b
sco_sock_timeout+0xb2/0x1ba
process_one_work+0x739/0xbd1
? cancel_delayed_work+0x13f/0x13f
? __raw_spin_lock_init+0xf0/0xf0
? to_kthread+0x59/0x85
worker_thread+0x593/0x70e
kthread+0x346/0x35a
? drain_workqueue+0x31a/0x31a
? kthread_bind+0x4b/0x4b
ret_from_fork+0x1f/0x30

Link: https://syzkaller.appspot.com/bug?extid=2bef95d3ab4daa10155b
Reported-by: [email protected]
Fixes: e1dee2c1de2b ("Bluetooth: fix repeated calls to sco_sock_kill")
Signed-off-by: Ying Hsu <[email protected]>
Reviewed-by: Joseph Hwang <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: mt7921s: Fix the incorrect pointer check

Fix the incorrect pointer check on ven_data.

Fixes: f41b91fa1783 ("Bluetooth: mt7921s: Add .btmtk_get_codec_config_data")
Co-developed-by: Yake Yang <[email protected]>
Signed-off-by: Yake Yang <[email protected]>
Signed-off-by: Sean Wang <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

Bluetooth: btbcm: Support per-board firmware variants

There are provedly different firmware variants for the different
phones using some of these chips. These were extracted from a few
Samsung phones:

37446 BCM4334B0.samsung,codina-tmo.hcd
37366 BCM4334B0.samsung,golden.hcd
37403 BCM4334B0.samsung,kyle.hcd
37366 BCM4334B0.samsung,skomer.hcd

This patch supports the above naming schedule with inserting
[.board_name] between the firmware name and ".hcd". This scheme
is the same as used by the companion BRCM wireless chips
as can be seen in
drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
or just by looking at the firmwares in linux-firmware/brcm.

Currently we only support board variants using the device
tree compatible string as board type, but other schemes are
possible.

This makes it possible to successfully load a few unique
firmware variants for some Samsung phones.

Cc: [email protected]
Cc: Markuss Broks <[email protected]>
Cc: Stephan Gerhold <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>

selftests: fib_nexthops: Make the test more robust

Rarely some of the test cases fail. Make the test more robust by increasing
the timeout of ping commands to 5 seconds.

Signed-off-by: Amit Cohen <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'lan95xx-no-polling'

Lukas Wunner says:

====================
Polling be gone on LAN95xx

Do away with link status polling on LAN95xx USB Ethernet
and rely on interrupts instead, thereby reducing bus traffic,
CPU overhead and improving interface bringup latency.

Link to v2:
https://lore.kernel.org/netdev/cover.1651574194 [email protected]/

Only change since v2:

* Patch [5/7]:
  * Drop call to __irq_enter_raw() which worked around a warning in
    generic_handle_domain_irq().  That warning is gone since
    792ea6a074ae (queued on tip.git/irq/urgent).
    (Marc Zyngier, Thomas Gleixner)
====================

net: phy: smsc: Cope with hot-removal in interrupt handler

If reading the Interrupt Source Flag register fails with -ENODEV, then
the PHY has been hot-removed and the correct response is to bail out
instead of throwing a WARN splat and attempting to suspend the PHY.
The PHY should be stopped in due course anyway as the kernel
asynchronously tears down the device.

Tested-by: Oleksij Rempel <[email protected]> # LAN9514/9512/9500
Tested-by: Ferry Toth <[email protected]> # LAN9514
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: smsc: Cache interrupt mask

Cache the interrupt mask to avoid re-reading it from the PHY upon every
interrupt.

This will simplify a subsequent commit which detects hot-removal in the
interrupt handler and bails out.

Analyzing and debugging PHY transactions also becomes simpler if such
redundant reads are avoided.

Last not least, interrupt overhead and latency is slightly improved.

Tested-by: Oleksij Rempel <[email protected]> # LAN9514/9512/9500
Tested-by: Ferry Toth <[email protected]> # LAN9514
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling

Link status of SMSC LAN95xx chips is polled once per second, even though
they're capable of signaling PHY interrupts through the MAC layer.

Forward those interrupts to the PHY driver to avoid polling.  Benefits
are reduced bus traffic, reduced CPU overhead and quicker interface
bringup.

Polling was introduced in 2016 by commit d69d16949346 ("usbnet:
smsc95xx: fix link detection for disabled autonegotiation").
Back then, the LAN95xx driver neglected to enable the ENERGYON interrupt,
hence couldn't detect link-up events when auto-negotiation was disabled.
The proper solution would have been to enable the ENERGYON interrupt
instead of polling.

Since then, PHY handling was moved from the LAN95xx driver to the SMSC
PHY driver with commit 05b35e7eb9a1 ("smsc95xx: add phylib support").
That PHY driver is capable of link detection with auto-negotiation
disabled because it enables the ENERGYON interrupt.

Note that signaling interrupts through the MAC layer not only works with
the integrated PHY, but also with an external PHY, provided its
interrupt pin is attached to LAN95xx's nPHY_INT pin.

In the unlikely event that the interrupt pin of an external PHY is
attached to a GPIO of the SoC (or not connected at all), the driver can
be amended to retrieve the irq from the PHY's of_node.

To forward PHY interrupts to phylib, it is not sufficient to call
phy_mac_interrupt().  Instead, the PHY's interrupt handler needs to run
so that PHY interrupts are cleared.  That's because according to page
119 of the LAN950x datasheet, "The source of this interrupt is a level.
The interrupt persists until it is cleared in the PHY."

https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf

Therefore, create an IRQ domain with a single IRQ for the PHY.  In the
future, the IRQ domain may be extended to support the 11 GPIOs on the
LAN95xx.

Normally the PHY interrupt should be masked until the PHY driver has
cleared it.  However masking requires a (sleeping) USB transaction and
interrupts are received in (non-sleepable) softirq context.  I decided
not to mask the interrupt at all (by using the dummy_irq_chip's noop
->irq_mask() callback):  The USB interrupt endpoint is polled in 1 msec
intervals and normally that's sufficient to wake the PHY driver's IRQ
thread and have it clear the interrupt.  If it does take longer, worst
thing that can happen is the IRQ thread is woken again.  No big deal.

Because PHY interrupts are now perpetually enabled, there's no need to
selectively enable them on suspend.  So remove all invocations of
smsc95xx_enable_phy_wakeup_interrupts().

In smsc95xx_resume(), move the call of phy_init_hw() before
usbnet_resume() (which restarts the status URB) to ensure that the PHY
is fully initialized when an interrupt is handled.

Tested-by: Oleksij Rempel <[email protected]> # LAN9514/9512/9500
Tested-by: Ferry Toth <[email protected]> # LAN9514
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]> # from a PHY perspective
Cc: Andre Edich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

usbnet: smsc95xx: Avoid link settings race on interrupt reception

When a PHY interrupt is signaled, the SMSC LAN95xx driver updates the
MAC full duplex mode and PHY flow control registers based on cached data
in struct phy_device:

  smsc95xx_status()                 # raises EVENT_LINK_RESET
    usbnet_deferred_kevent()
      smsc95xx_link_reset()         # uses cached data in phydev

Simultaneously, phylib polls link status once per second and updates
that cached data:

  phy_state_machine()
    phy_check_link_status()
      phy_read_status()
        lan87xx_read_status()
          genphy_read_status()      # updates cached data in phydev

If smsc95xx_link_reset() wins the race against genphy_read_status(),
the registers may be updated based on stale data.

E.g. if the link was previously down, phydev->duplex is set to
DUPLEX_UNKNOWN and that's what smsc95xx_link_reset() will use, even
though genphy_read_status() may update it to DUPLEX_FULL afterwards.

PHY interrupts are currently only enabled on suspend to trigger wakeup,
so the impact of the race is limited, but we're about to enable them
perpetually.

Avoid the race by delaying execution of smsc95xx_link_reset() until
phy_state_machine() has done its job and calls back via
smsc95xx_handle_link_change().

Signaling EVENT_LINK_RESET on wakeup is not necessary because phylib
picks up link status changes through polling.  So drop the declaration
of a ->link_reset() callback.

Note that the semicolon on a line by itself added in smsc95xx_status()
is a placeholder for a function call which will be added in a subsequent
commit.  That function call will actually handle the INT_ENP_PHY_INT_
interrupt.

Tested-by: Oleksij Rempel <[email protected]> # LAN9514/9512/9500
Tested-by: Ferry Toth <[email protected]> # LAN9514
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

usbnet: smsc95xx: Don't reset PHY behind PHY driver's back

smsc95xx_reset() resets the PHY behind the PHY driver's back, which
seems like a bad idea generally.  Remove that portion of the function.

We're about to use PHY interrupts instead of polling to detect link
changes on SMSC LAN95xx chips.  Because smsc95xx_reset() is called from
usbnet_open(), PHY interrupt settings are lost whenever the net_device
is brought up.

There are two other callers of smsc95xx_reset(), namely smsc95xx_bind()
and smsc95xx_reset_resume(), and both may indeed benefit from a PHY
reset.  However they already perform one through their calls to
phy_connect_direct() and phy_init_hw().

Tested-by: Oleksij Rempel <[email protected]> # LAN9514/9512/9500
Tested-by: Ferry Toth <[email protected]> # LAN9514
Signed-off-by: Lukas Wunner <[email protected]>
Cc: Martyn Welch <[email protected]>
Cc: Gabriel Hojda <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

usbnet: smsc95xx: Don't clear read-only PHY interrupt

Upon receiving data from the Interrupt Endpoint, the SMSC LAN95xx driver
attempts to clear the signaled interrupts by writing "all ones" to the
Interrupt Status Register.

However the driver only ever enables a single type of interrupt, namely
the PHY Interrupt. And according to page 119 of the LAN950x datasheet,
its bit in the Interrupt Status Register is read-only. There's no other
way to clear it than in a separate PHY register:

https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf

Consequently, writing "all ones" to the Interrupt Status Register is
pointless and can be dropped.

Tested-by: Oleksij Rempel <[email protected]> # LAN9514/9512/9500
Tested-by: Ferry Toth <[email protected]> # LAN9514
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

usbnet: Run unregister_netdev() before unbind() again

Commit 2c9d6c2b871d ("usbnet: run unbind() before unregister_netdev()")
sought to fix a use-after-free on disconnect of USB Ethernet adapters.

It turns out that a different fix is necessary to address the issue:
https://lore.kernel.org/netdev/18b3541e5372bc9b9fc733d422f4e698c089077c.1650177997 [email protected]/

So the commit was not necessary.

The commit made binding and unbinding of USB Ethernet asymmetrical:
Before, usbnet_probe() first invoked the ->bind() callback and then
register_netdev(). usbnet_disconnect() mirrored that by first invoking
unregister_netdev() and then ->unbind().

Since the commit, the order in usbnet_disconnect() is reversed and no
longer mirrors usbnet_probe().

One consequence is that a PHY disconnected (and stopped) in ->unbind()
is afterwards stopped once more by unregister_netdev() as it closes the
netdev before unregistering. That necessitates a contortion in ->stop()
because the PHY may only be stopped if it hasn't already been
disconnected.

Reverting the commit allows making the call to phy_stop() unconditional
in ->stop().

Tested-by: Oleksij Rempel <[email protected]> # LAN9514/9512/9500
Tested-by: Ferry Toth <[email protected]> # LAN9514
Signed-off-by: Lukas Wunner <[email protected]>
Acked-by: Oliver Neukum <[email protected]>
Cc: Martyn Welch <[email protected]>
Cc: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: fix platform_no_drv_owner.cocci warning

Remove .owner field if calls are used which set it automatically.
./drivers/net/ethernet/sunplus/spl2sw_driver.c:569:3-8: No need to set
.owner here. The core will do it.

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: page_pool: add page allocation stats for two fast page allocate path

Currently If use page pool allocation stats to analysis a RX performance
degradation problem. These stats only count for pages allocate from
page_pool_alloc_pages. But nic drivers such as hns3 use
page_pool_dev_alloc_frag to allocate pages, so page stats in this API
should also be counted.

Signed-off-by: Jie Wang <[email protected]>
Signed-off-by: Guangbin Huang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: ethernet: Use swap() instead of open coding it

Clean the following coccicheck warning:

./drivers/net/ethernet/sunplus/spl2sw_driver.c:217:27-28: WARNING
opportunity for swap().

./drivers/net/ethernet/sunplus/spl2sw_driver.c:222:27-28: WARNING
opportunity for swap().

Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Jiapeng Chong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'net-inet-retire-port-only-listening_hash'

Martin KaFai Lau says:

====================
net: inet: Retire port only listening_hash

This series is to retire the port only listening_hash.

The listen sk is currently stored in two hash tables,
listening_hash (hashed by port) and lhash2 (hashed by port and address).

After commit 0ee58dad5b06 ("net: tcp6: prefer listeners bound to an address")
and commit d9fbc7f6431f ("net: tcp: prefer listeners bound to an address"),
the TCP-SYN lookup fast path does not use listening_hash.

The commit 05c0b35709c5 ("tcp: seq_file: Replace listening_hash with lhash2")
also moved the seq_file (/proc/net/tcp) iteration usage from
listening_hash to lhash2.

There are still a few listening_hash usages left.
One of them is inet_reuseport_add_sock() which uses the listening_hash
to search a listen sk during the listen() system call. This turns
out to be very slow on use cases that listen on many different
VIPs at a popular port (e.g. 443). [ On top of the slowness in
adding to the tail in the IPv6 case ]. A latter patch has a
selftest to demonstrate this case.

This series takes this chance to move all remaining listening_hash
usages to lhash2 and then retire listening_hash.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: selftests: Stress reuseport listen

This patch adds a test that has 300 VIPs listening on port 443.
Each VIP:443 will have 80 listening socks by using SO_REUSEPORT.
Thus, it will have 24000 listening socks.

Before removing the port only listening_hash, all socks will be in the
same port 443 bucket and inet_reuseport_add_sock() spends much time to
walk through the bucket. After removing the port only listening_hash
and move all usage to the port+addr lhash2, each bucket in the
ideal case has 80 sk which is much smaller than before.

Here is the test result from a qemu:
Before: listen 24000 socks took 210.210485362 (~210s)
After: listen 24000 socks took 0.207173 (~210ms)

Signed-off-by: Martin KaFai Lau <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: inet: Retire port only listening_hash

The listen sk is currently stored in two hash tables,
listening_hash (hashed by port) and lhash2 (hashed by port and address).

After commit 0ee58dad5b06 ("net: tcp6: prefer listeners bound to an address")
and commit d9fbc7f6431f ("net: tcp: prefer listeners bound to an address"),
the TCP-SYN lookup fast path does not use listening_hash.

The commit 05c0b35709c5 ("tcp: seq_file: Replace listening_hash with lhash2")
also moved the seq_file (/proc/net/tcp) iteration usage from
listening_hash to lhash2.

There are still a few listening_hash usages left.
One of them is inet_reuseport_add_sock() which uses the listening_hash
to search a listen sk during the listen() system call.  This turns
out to be very slow on use cases that listen on many different
VIPs at a popular port (e.g. 443).  [ On top of the slowness in
adding to the tail in the IPv6 case ].  The latter patch has a
selftest to demonstrate this case.

This patch takes this chance to move all remaining listening_hash
usages to lhash2 and then retire listening_hash.

Since most changes need to be done together, it is hard to cut
the listening_hash to lhash2 switch into small patches.  The
changes in this patch is highlighted here for the review
purpose.

1. Because of the listening_hash removal, lhash2 can use the
   sk->sk_nulls_node instead of the icsk->icsk_listen_portaddr_node.
   This will also keep the sk_unhashed() check to work as is
   after stop adding sk to listening_hash.

   The union is removed from inet_listen_hashbucket because
   only nulls_head is needed.

2. icsk->icsk_listen_portaddr_node and its helpers are removed.

3. The current lhash2 users needs to iterate with sk_nulls_node
   instead of icsk_listen_portaddr_node.

   One case is in the inet[6]_lhash2_lookup().

   Another case is the seq_file iterator in tcp_ipv4.c.
   One thing to note is sk_nulls_next() is needed
   because the old inet_lhash2_for_each_icsk_continue()
   does a "next" first before iterating.

4. Move the remaining listening_hash usage to lhash2

   inet_reuseport_add_sock() which this series is
   trying to improve.

   inet_diag.c and mptcp_diag.c are the final two
   remaining use cases and is moved to lhash2 now also.

Signed-off-by: Martin KaFai Lau <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: inet: Open code inet_hash2 and inet_unhash2

This patch folds lhash2 related functions into __inet_hash and
inet_unhash. This will make the removal of the listening_hash
in a latter patch easier to review.

First, this patch folds inet_hash2 into __inet_hash.

For unhash, the current call sequence is like
inet_unhash() => __inet_unhash() => inet_unhash2().
The specific testing cases in __inet_unhash() are mostly related
to TCP_LISTEN sk and its caller inet_unhash() already has
the TCP_LISTEN test, so this patch folds both __inet_unhash() and
inet_unhash2() into inet_unhash().

Note that all listening_hash users also have lhash2 initialized,
so the !h->lhash2 check is no longer needed.

Signed-off-by: Martin KaFai Lau <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: inet: Remove count from inet_listen_hashbucket

After commit 0ee58dad5b06 ("net: tcp6: prefer listeners bound to an address")
and commit d9fbc7f6431f ("net: tcp: prefer listeners bound to an address"),
the count is no longer used. This patch removes it.

Signed-off-by: Martin KaFai Lau <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'make-sfc-siena-ko-specific-to-siena'

Martin Habets says:

====================
Make sfc-siena.ko specific to Siena

This series is a follow-up to the one titled "Move Siena into
a separate subdirectory".
It enhances the new sfc-siena.ko module to differentiate it from sfc.ko.

Patches

Patches 1-5 create separate Kconfig options for Siena, and adjusts the
various names used for work items and directories.
Patch 6 reinstates SRIOV functionality in sfc-siena.ko.

Testing

Various build tests were done such as allyesconfig, W=1 and sparse.
The new sfc-siena.ko and sfc.ko modules were tested on a machine with NICs
for both modules in them.
Inserting the updated sfc.ko and the new sfc-siena.ko modules at the same
time works, and no work items and directories exist with the same name.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

sfc/siena: Reinstate SRIOV init/fini function calls

They were removed in the first series since they were not used for EF10.
Put that code back for Siena, with the prototypes in siena_sriov.h
since that file is a more applicable place for it.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Martin Habets <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

sfc/siena: Make PTP and reset support specific for Siena

Change the clock name and work queue names to differentiate them from
the names used in sfc.ko.

Signed-off-by: Martin Habets <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

sfc/siena: Make MCDI logging support specific for Siena

Add a Siena Kconfig option and use it in stead of the sfc one.
Rename the internal variable for the 'mcdi_logging_default' module
parameter to avoid a naming conflict with the one in sfc.ko.

Signed-off-by: Martin Habets <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

siena: Make HWMON support specific for Siena

Add a Siena Kconfig option and use it in stead of the sfc one.

Signed-off-by: Martin Habets <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

siena: Make SRIOV support specific for Siena

Add a Siena Kconfig option and use it in stead of the sfc one.

Signed-off-by: Martin Habets <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

siena: Make MTD support specific for Siena

Add a Siena Kconfig option and use it in stead of the sfc one.

Signed-off-by: Martin Habets <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'restructure-struct-ocelot_port'

Vladimir Oltean says:

====================
Restructure struct ocelot_port

This patch set represents preparation for further work. It adds an
"index" field to struct ocelot_port, and populates it from the Felix DSA
driver and Ocelot switchdev driver.

The users of struct ocelot_port :: index are the same users as those of
struct ocelot_port_private :: chip_port.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: move ocelot_port_private :: chip_port to ocelot_port :: index

Currently the ocelot switch lib is unaware of the index of a struct
ocelot_port, since that is kept in the encapsulating structures of outer
drivers (struct dsa_port :: index, struct ocelot_port_private :: chip_port).

With the upcoming increase in complexity associated with assigning DSA
tag_8021q CPU ports to certain user ports, it becomes necessary for the
switch lib to be able to retrieve the index of a certain ocelot_port.

Therefore, introduce a new u8 to ocelot_port (same size as the chip_port
used by the ocelot switchdev driver) and rework the existing code to
populate and use it.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: minimize holes in struct ocelot_port

Reorder members of struct ocelot_port to eliminate holes and reduce
structure size. Pahole says:

Before:

struct ocelot_port {
        struct ocelot *            ocelot;               /*     0     8 */
        struct regmap *            target;               /*     8     8 */
        bool                       vlan_aware;           /*    16     1 */

        /* XXX 7 bytes hole, try to pack */

        const struct ocelot_bridge_vlan  * pvid_vlan;    /*    24     8 */
        unsigned int               ptp_skbs_in_flight;   /*    32     4 */
        u8                         ptp_cmd;              /*    36     1 */

        /* XXX 3 bytes hole, try to pack */

        struct sk_buff_head        tx_skbs;              /*    40    96 */
        /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
        u8                         ts_id;                /*   136     1 */

        /* XXX 3 bytes hole, try to pack */

        phy_interface_t            phy_mode;             /*   140     4 */
        bool                       is_dsa_8021q_cpu;     /*   144     1 */
        bool                       learn_ena;            /*   145     1 */

        /* XXX 6 bytes hole, try to pack */

        struct net_device *        bond;                 /*   152     8 */
        bool                       lag_tx_active;        /*   160     1 */

        /* XXX 1 byte hole, try to pack */

        u16                        mrp_ring_id;          /*   162     2 */

        /* XXX 4 bytes hole, try to pack */

        struct net_device *        bridge;               /*   168     8 */
        int                        bridge_num;           /*   176     4 */
        u8                         stp_state;            /*   180     1 */

        /* XXX 3 bytes hole, try to pack */

        int                        speed;                /*   184     4 */

        /* size: 192, cachelines: 3, members: 18 */
        /* sum members: 161, holes: 7, sum holes: 27 */
        /* padding: 4 */
};

After:

struct ocelot_port {
        struct ocelot *            ocelot;               /*     0     8 */
        struct regmap *            target;               /*     8     8 */
        struct net_device *        bond;                 /*    16     8 */
        struct net_device *        bridge;               /*    24     8 */
        const struct ocelot_bridge_vlan  * pvid_vlan;    /*    32     8 */
        phy_interface_t            phy_mode;             /*    40     4 */
        unsigned int               ptp_skbs_in_flight;   /*    44     4 */
        struct sk_buff_head        tx_skbs;              /*    48    96 */
        /* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
        u16                        mrp_ring_id;          /*   144     2 */
        u8                         ptp_cmd;              /*   146     1 */
        u8                         ts_id;                /*   147     1 */
        u8                         stp_state;            /*   148     1 */
        bool                       vlan_aware;           /*   149     1 */
        bool                       is_dsa_8021q_cpu;     /*   150     1 */
        bool                       learn_ena;            /*   151     1 */
        bool                       lag_tx_active;        /*   152     1 */

        /* XXX 3 bytes hole, try to pack */

        int                        bridge_num;           /*   156     4 */
        int                        speed;                /*   160     4 */

        /* size: 168, cachelines: 3, members: 18 */
        /* sum members: 161, holes: 1, sum holes: 3 */
        /* padding: 4 */
        /* last cacheline: 40 bytes */
};

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: mscc: ocelot: delete ocelot_port :: xmit_template

This is no longer used since commit 7c4bb540e917 ("net: dsa: tag_ocelot:
create separate tagger for Seville").

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'dsa-changes-for-multiple-cpu-ports-part-1'

Vladimir Oltean says:

====================
DSA changes for multiple CPU ports (part 1)

I am trying to enable the second internal port pair from the NXP LS1028A
Felix switch for DSA-tagged traffic via "ocelot-8021q". This series
represents part 1 (of an unknown number) of that effort.

It does some preparation work, like managing host flooding in DSA via a
dedicated method, and removing the CPU port as argument from the tagging
protocol change procedure.

In terms of driver-specific changes, it reworks the 2 tag protocol
implementations in the Felix driver to have a structured data format.
It enables host flooding towards all tag_8021q CPU ports. It dynamically
updates the tag_8021q CPU port used for traps. It also fixes a bug
introduced by a previous refactoring/oversimplification commit in
net-next.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: felix: reimplement tagging protocol change with function pointers

The error handling for the current tagging protocol change procedure is
a bit brittle (we dismantle the previous tagging protocol entirely
before setting up the new one). By identifying which parts of a tagging
protocol are unique to itself and which parts are shared with the other,
we can implement a protocol change procedure where error handling is a
bit more robust, because we start setting up the new protocol first, and
tear down the old one only after the setup of the specific and shared
parts succeeded.

The protocol change is a bit too open-coded too, in the area of
migrating host flood settings and MDBs. By identifying what differs
between tagging protocols (the forwarding masks for host flooding) we
can implement a more straightforward migration procedure which is
handled in the shared portion of the protocol change, rather than
individually by each protocol.

Therefore, a more structured approach calls for the introduction of a
structure of function pointers per tagging protocol. This covers setup,
teardown and the host forwarding mask. In the future it will also cover
how to prepare for a new DSA master.

The initial tagging protocol setup (at driver probe time) and the final
teardown (at driver removal time) are also adapted to call into the
structured methods of the specific protocol in current use. This is
especially relevant for teardown, where we previously called
felix_del_tag_protocol() only for the first CPU port. But by not
specifying which CPU port this is for, we gain more flexibility to
support multiple CPU ports in the future.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: felix: dynamically determine tag_8021q CPU port for traps

Ocelot switches support a single active CPU port at a time (at least as
a trapping destination, i.e. for control traffic). This is true
regardless of whether we are using the native copy-to-CPU-port-module
functionality, or a redirect action towards the software-defined
tag_8021q CPU port.

Currently we assume that the trapping destination in tag_8021q mode is
the first CPU port, yet in the future we may want to migrate the user
ports to the second CPU port.

For that to work, we need to make sure that the tag_8021q trapping
destination is a CPU port that is active, i.e. is used by at least some
user port on which the trap was added. Otherwise, we may end up
redirecting the traffic to a CPU port which isn't even up.

Note that due to the current design where we simply choose the CPU port
of the first port from the trap's ingress port mask, it may be that a
CPU port absorbes control traffic from user ports which aren't affine to
it as per user space's request. This isn't ideal, but is the lesser of
two evils. Following the user-configured affinity for traps would mean
that we can no longer reuse a single TCAM entry for multiple traps,
which is what we actually do for e.g. PTP. Either we duplicate and
deduplicate TCAM entries on the fly when user-to-CPU-port mappings
change (which is unnecessarily complicated), or we redirect trapped
traffic to all tag_8021q CPU ports if multiple such ports are in use.
The latter would have actually been nice, if it actually worked, but it
doesn't, since a OCELOT_MASK_MODE_REDIRECT action towards multiple ports
would not take PGID_SRC into consideration, and it would just duplicate
the packet towards each (CPU) port, leading to duplicates in software.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: remove port argument from ->change_tag_protocol()

DSA has not supported (and probably will not support in the future
either) independent tagging protocols per CPU port.

Different switch drivers have different requirements, some may need to
replicate some settings for each CPU port, some may need to apply some
settings on a single CPU port, while some may have to configure some
global settings and then some per-CPU-port settings.

In any case, the current model where DSA calls ->change_tag_protocol for
each CPU port turns out to be impractical for drivers where there are
global things to be done. For example, felix calls dsa_tag_8021q_register(),
which makes no sense per CPU port, so it suppresses the second call.

Let drivers deal with replication towards all CPU ports, and remove the
CPU port argument from the function prototype.

Signed-off-by: Vladimir Oltean <[email protected]>
Acked-by: Luiz Angelo Daros de Luca <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: felix: manage host flooding using a specific driver callback

At the time - commit 7569459a52c9 ("net: dsa: manage flooding on the CPU
ports") - not introducing a dedicated switch callback for host flooding
made sense, because for the only user, the felix driver, there was
nothing different to do for the CPU port than set the flood flags on the
CPU port just like on any other bridge port.

There are 2 reasons why this approach is not good enough, however.

(1) Other drivers, like sja1105, support configuring flooding as a
    function of {ingress port, egress port}, whereas the DSA
    ->port_bridge_flags() function only operates on an egress port.
    So with that driver we'd have useless host flooding from user ports
    which don't need it.

(2) Even with the felix driver, support for multiple CPU ports makes it
    difficult to piggyback on ->port_bridge_flags(). The way in which
    the felix driver is going to support host-filtered addresses with
    multiple CPU ports is that it will direct these addresses towards
    both CPU ports (in a sort of multicast fashion), then restrict the
    forwarding to only one of the two using the forwarding masks.
    Consequently, flooding will also be enabled towards both CPU ports.
    However, ->port_bridge_flags() gets passed the index of a single CPU
    port, and that leaves the flood settings out of sync between the 2
    CPU ports.

This is to say, it's better to have a specific driver method for host
flooding, which takes the user port as argument. This solves problem (1)
by allowing the driver to do different things for different user ports,
and problem (2) by abstracting the operation and letting the driver do
whatever, rather than explicitly making the DSA core point to the CPU
port it thinks needs to be touched.

This new method also creates a problem, which is that cross-chip setups
are not handled. However I don't have hardware right now where I can
test what is the proper thing to do, and there isn't hardware compatible
with multi-switch trees that supports host flooding. So it remains a
problem to be tackled in the future.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: introduce the dsa_cpu_ports() helper

Similar to dsa_user_ports() which retrieves a port mask of all user
ports, introduce dsa_cpu_ports() which retrieves the mask of all CPU
ports of a switch.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: felix: bring the NPI port indirection for host flooding to surface

For symmetry with host FDBs and MDBs where the indirection is now
handled outside the ocelot switch lib, do the same for bridge port
flags (unicast/multicast/broadcast flooding).

The only caller of the ocelot switch lib which uses the NPI port is the
Felix DSA driver.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: felix: bring the NPI port indirection for host MDBs to surface

For symmetry with host FDBs where the indirection is now handled outside
the ocelot switch lib, do the same for host MDB entries. The only caller
of the ocelot switch lib which uses the NPI port is the Felix DSA driver.

Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: felix: program host FDB entries towards PGID_CPU for tag_8021q too

I remembered why we had the host FDB migration procedure in place.

It is true that host FDB entry migration can be done by changing the
value of PGID_CPU, but the problem is that only host FDB entries learned
while operating in NPI mode go to PGID_CPU. When the CPU port operates
in tag_8021q mode, the FDB entries are learned towards the unicast PGID
equal to the physical port number of this CPU port, bypassing the
PGID_CPU indirection.

So host FDB entries learned in tag_8021q mode are not migrated any
longer towards the NPI port.

Fix this by extracting the NPI port -> PGID_CPU redirection from the
ocelot switch lib, moving it to the Felix DSA driver, and applying it
for any CPU port regardless of its kind (NPI or tag_8021q).

Fixes: a51c1c3f3218 ("net: dsa: felix: stop migrating FDBs back and forth on tag proto change")
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: lan966x: Fix use of pointer after being freed

The smatch found the following warning:

drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c:736 lan966x_fdma_reload()
warn: 'rx_dcbs' was already freed.

This issue can happen when changing the MTU on one of the ports and once
the RX buffers are allocated and then the TX buffer allocation fails.
In that case the RX buffers should not be restore. This fix this issue
such that the RX buffers will not be restored if the TX buffers failed
to be allocated.

Fixes: 2ea1cbac267e2a ("net: lan966x: Update FDMA to change MTU.")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Horatiu Vultur <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: update the register_netdevice() kdoc

The BUGS section looks quite dated, the registration
is under rtnl lock. Remove some obvious information
while at it.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

skbuff: replace a BUG_ON() with the new DEBUG_NET_WARN_ON_ONCE()

Very few drivers actually have Kconfig knobs for adding
-DDEBUG. 8 according to a quick grep, while there are
93 users of skb_checksum_none_assert(). Switch to the
new DEBUG_NET_WARN_ON_ONCE() to catch bad skbs.

Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mlxbf_gige: remove driver-managed interrupt counts

The driver currently has three interrupt counters,
which are incremented every time each interrupt handler
executes. These driver-managed counters are not
necessary as the kernel already has logic that manages
interrupt counts and exposes them via /proc/interrupts.
This patch removes the driver-managed counters.

Signed-off-by: David Thompson <[email protected]>
Signed-off-by: Asmaa Mnebhi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

No conflicts.

Build issue in drivers/net/ethernet/sfc/ptp.c
54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static")
49e6123c65da ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, and bluetooth.

  No outstanding fires.

  Current release - regressions:

   - eth: atlantic: always deep reset on pm op, fix null-deref

  Current release - new code bugs:

   - rds: use maybe_get_net() when acquiring refcount on TCP sockets
     [refinement of a previous fix]

   - eth: ocelot: mark traps with a bool instead of guessing type based
     on list membership

  Previous releases - regressions:

   - net: fix skipping features in for_each_netdev_feature()

   - phy: micrel: fix null-derefs on suspend/resume and probe

   - bcmgenet: check for Wake-on-LAN interrupt probe deferral

  Previous releases - always broken:

   - ipv4: drop dst in multicast routing path, prevent leaks

   - ping: fix address binding wrt vrf

   - net: fix wrong network header length when BPF protocol translation
     is used on skbs with a fraglist

   - bluetooth: fix the creation of hdev->name

   - rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition

   - wifi: iwlwifi: iwl-dbg: use del_timer_sync() before freeing

   - wifi: ath11k: reduce the wait time of 11d scan and hw scan while
     adding an interface

   - mac80211: fix rx reordering with non explicit / psmp ack policy

   - mac80211: reset MBSSID parameters upon connection

   - nl80211: fix races in nl80211_set_tx_bitrate_mask()

   - tls: fix context leak on tls_device_down

   - sched: act_pedit: really ensure the skb is writable

   - batman-adv: don't skb_split skbuffs with frag_list

   - eth: ocelot: fix various issues with TC actions (null-deref; bad
     stats; ineffective drops; ineffective filter removal)"

* tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
  tls: Fix context leak on tls_device_down
  net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()
  net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending
  net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down()
  mlxsw: Avoid warning during ip6gre device removal
  net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral
  net: ethernet: mediatek: ppe: fix wrong size passed to memset()
  Bluetooth: Fix the creation of hdev->name
  i40e: i40e_main: fix a missing check on list iterator
  net/sched: act_pedit: really ensure the skb is writable
  s390/lcs: fix variable dereferenced before check
  s390/ctcm: fix potential memory leak
  s390/ctcm: fix variable dereferenced before check
  net: atlantic: verify hw_head_ lies within TX buffer ring
  net: atlantic: add check for MAX_SKB_FRAGS
  net: atlantic: reduce scope of is_rsc_complete
  net: atlantic: fix "frag[0] not initialized"
  net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()
  net: phy: micrel: Fix incorrect variable type in micrel
  decnet: Use container_of() for struct dn_neigh casts
  ...

Merge branch 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fix from Tejun Heo:
"Waiman's fix for a cgroup2 cpuset bug where it could miss nodes which
were hot-added"

* 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()

Merge tag 'fixes_for_v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fs fixes from Jan Kara:
"Three fixes that I'd still like to get to 5.18:

   - add a missing sanity check in the fanotify FAN_RENAME feature
     (added in 5.17, let's fix it before it gets wider usage in
     userspace)

   - udf fix for recently introduced filesystem corruption issue

   - writeback fix for a race in inode list handling that can lead to
     delayed writeback and possible dirty throttling stalls"

* tag 'fixes_for_v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Avoid using stale lengthOfImpUse
  writeback: Avoid skipping inode writeback
  fanotify: do not allow setting dirent events in mask of non-dir

tls: Fix context leak on tls_device_down

The commit cited below claims to fix a use-after-free condition after
tls_device_down. Apparently, the description wasn't fully accurate. The
context stayed alive, but ctx->netdev became NULL, and the offload was
torn down without a proper fallback, so a bug was present, but a
different kind of bug.

Due to misunderstanding of the issue, the original patch dropped the
refcount_dec_and_test line for the context to avoid the alleged
premature deallocation. That line has to be restored, because it matches
the refcount_inc_not_zero from the same function, otherwise the contexts
that survived tls_device_down are leaked.

This patch fixes the described issue by restoring refcount_dec_and_test.
After this change, there is no leak anymore, and the fallback to
software kTLS still works.

Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()

In the NIC ->probe() callback, ->mtd_probe() callback is called.
If NIC has 2 ports, ->probe() is called twice and ->mtd_probe() too.
In the ->mtd_probe(), which is efx_ef10_mtd_probe() it allocates and
initializes mtd partiion.
But mtd partition for sfc is shared data.
So that allocated mtd partition data from last called
efx_ef10_mtd_probe() will not be used.
Therefore it must be freed.
But it doesn't free a not used mtd partition data in efx_ef10_mtd_probe().

kmemleak reports:
unreferenced object 0xffff88811ddb0000 (size 63168):
  comm "systemd-udevd", pid 265, jiffies 4294681048 (age 348.586s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffffa3767749>] kmalloc_order_trace+0x19/0x120
    [<ffffffffa3873f0e>] __kmalloc+0x20e/0x250
    [<ffffffffc041389f>] efx_ef10_mtd_probe+0x11f/0x270 [sfc]
    [<ffffffffc0484c8a>] efx_pci_probe.cold.17+0x3df/0x53d [sfc]
    [<ffffffffa414192c>] local_pci_probe+0xdc/0x170
    [<ffffffffa4145df5>] pci_device_probe+0x235/0x680
    [<ffffffffa443dd52>] really_probe+0x1c2/0x8f0
    [<ffffffffa443e72b>] __driver_probe_device+0x2ab/0x460
    [<ffffffffa443e92a>] driver_probe_device+0x4a/0x120
    [<ffffffffa443f2ae>] __driver_attach+0x16e/0x320
    [<ffffffffa4437a90>] bus_for_each_dev+0x110/0x190
    [<ffffffffa443b75e>] bus_add_driver+0x39e/0x560
    [<ffffffffa4440b1e>] driver_register+0x18e/0x310
    [<ffffffffc02e2055>] 0xffffffffc02e2055
    [<ffffffffa3001af3>] do_one_initcall+0xc3/0x450
    [<ffffffffa33ca574>] do_init_module+0x1b4/0x700

Acked-by: Martin Habets <[email protected]>
Fixes: 8127d661e77f ("sfc: Add support for Solarflare SFC9100 family")
Signed-off-by: Taehee Yoo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending

Non blocking sendmsg will return -EAGAIN when any signal pending
and no send space left, while non blocking recvmsg return -EINTR
when signal pending and no data received. This may makes confused.
As TCP returns -EAGAIN in the conditions described above. Align the
behavior of smc with TCP.

Fixes: 846e344eb722 ("net/smc: add receive timeout check")
Signed-off-by: Guangguan Wang <[email protected]>
Reviewed-by: Tony Lu <[email protected]>
Acked-by: Karsten Graul <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down()

After commit 2d1f90f9ba83 ("net: dsa/bcm_sf2: fix incorrect usage of
state->link") the interface suspend path would call our mac_link_down()
call back which would forcibly set the link down, thus preventing
Wake-on-LAN packets from reaching our management port.

Fix this by looking at whether the port is enabled for Wake-on-LAN and
not clearing the link status in that case to let packets go through.

Fixes: 2d1f90f9ba83 ("net: dsa/bcm_sf2: fix incorrect usage of state->link")
Signed-off-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mlxsw: Avoid warning during ip6gre device removal

IPv6 addresses which are used for tunnels are stored in a hash table
with reference counting. When a new GRE tunnel is configured, the driver
is notified and configures it in hardware.

Currently, any change in the tunnel is not applied in the driver. It
means that if the remote address is changed, the driver is not aware of
this change and the first address will be used.

This behavior results in a warning [1] in scenarios such as the
following:

# ip link add name gre1 type ip6gre local 2000::3 remote 2000::fffe tos inherit ttl inherit
# ip link set name gre1 type ip6gre local 2000::3 remote 2000::ffff ttl inherit
# ip link delete gre1

The change of the address is not applied in the driver. Currently, the
driver uses the remote address which is stored in the 'parms' of the
overlay device. When the tunnel is removed, the new IPv6 address is
used, the driver tries to release it, but as it is not aware of the
change, this address is not configured and it warns about releasing non
existing IPv6 address.

Fix it by using the IPv6 address which is cached in the IPIP entry, this
address is the last one that the driver used, so even in cases such the
above, the first address will be released, without any warning.

[1]:

WARNING: CPU: 1 PID: 2197 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2920 mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum]
...
CPU: 1 PID: 2197 Comm: ip Not tainted 5.17.0-rc8-custom-95062-gc1e5ded51a9a #84
Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021
RIP: 0010:mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum]
...
Call Trace:
<TASK>
mlxsw_sp2_ipip_rem_addr_unset_gre6+0xf1/0x120 [mlxsw_spectrum]
mlxsw_sp_netdevice_ipip_ol_event+0xdb/0x640 [mlxsw_spectrum]
mlxsw_sp_netdevice_event+0xc4/0x850 [mlxsw_spectrum]
raw_notifier_call_chain+0x3c/0x50
call_netdevice_notifiers_info+0x2f/0x80
unregister_netdevice_many+0x311/0x6d0
rtnl_dellink+0x136/0x360
rtnetlink_rcv_msg+0x12f/0x380
netlink_rcv_skb+0x49/0xf0
netlink_unicast+0x233/0x340
netlink_sendmsg+0x202/0x440
____sys_sendmsg+0x1f3/0x220
___sys_sendmsg+0x70/0xb0
__sys_sendmsg+0x54/0xa0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: e846efe2737b ("mlxsw: spectrum: Add hash table for IPv6 address mapping")
Reported-by: Maksym Yaremchuk <[email protected]>
Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'nfp-vf-rate-limit-support'

Simon Horman says:

====================
*nfp: VF rate limit support

this short series adds VF rate limiting to the NFP driver.

The first patch, as suggested by Jakub Kicinski, adds a helper
to check that ndo_set_vf_rate() rate parameters are sane.
It also provides a place for further parameter checking to live,
if needed in future.

The second patch adds VF rate limit support to the NFP driver.
It addresses several comments made on v1, including removing
the parameter check that is now provided by the helper added
in the first patch.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

nfp: VF rate limit support

Add VF rate limit feature

This patch enhances the NFP driver to supports assignment of
both max_tx_rate and min_tx_rate to VFs

The template of configurations below is all supported.
e.g.
# ip link set $DEV vf $VF_NUM max_tx_rate $RATE_VALUE
# ip link set $DEV vf $VF_NUM min_tx_rate $RATE_VALUE
# ip link set $DEV vf $VF_NUM max_tx_rate $RATE_VALUE \
min_tx_rate $RATE_VALUE
# ip link set $DEV vf $VF_NUM min_tx_rate $RATE_VALUE \
max_tx_rate $RATE_VALUE

The max RATE_VALUE is limited to 0xFFFF which is about
63Gbps (using 1024 for 1G)

Signed-off-by: Bin Chen <[email protected]>
Signed-off-by: Louis Peens <[email protected]>
Signed-off-by: Baowen Zheng <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

rtnetlink: verify rate parameters for calls to ndo_set_vf_rate

When calling ndo_set_vf_rate() the max_tx_rate parameter may be zero,
in which case the setting is cleared, or it must be greater or equal to
min_tx_rate.

Enforce this requirement on all calls to ndo_set_vf_rate via a wrapper
which also only calls ndo_set_vf_rate() if defined by the driver.

Based on work by Jakub Kicinski <[email protected]>

Signed-off-by: Bin Chen <[email protected]>
Signed-off-by: Baowen Zheng <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

net: ethernet: SP7021: Fix spelling mistake "Interrput" -> "Interrupt"

There is a spelling mistake in a dev_dbg message. Fix it.

Signed-off-by: Colin Ian King <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: enetc: kill PHY-less mode for PFs

Right now, a PHY-less port (no phy-mode, no fixed-link, no phy-handle)
doesn't register with phylink, but calls netif_carrier_on() from
enetc_start().

This makes sense for a VF, but for a PF, this is braindead, because we
never call enetc_mac_enable() so the MAC is left inoperational.
Furthermore, commit 71b77a7a27a3 ("enetc: Migrate to PHYLINK and
PCS_LYNX") put the nail in the coffin because it removed the initial
netif_carrier_off() call done right after register_netdev().

Without that call, netif_carrier_on() does not call
linkwatch_fire_event(), so the operstate remains IF_OPER_UNKNOWN.

Just deny the broken configuration by requiring that a phy-mode is
present, and always register a PF with phylink.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Claudiu Manoil <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

fortify: Provide a memcpy trap door for sharp corners

As we continue to narrow the scope of what the FORTIFY memcpy() will
accept and build alternative APIs that give the compiler appropriate
visibility into more complex memcpy scenarios, there is a need for
"unfortified" memcpy use in rare cases where combinations of compiler
behaviors, source code layout, etc, result in cases where the stricter
memcpy checks need to be bypassed until appropriate solutions can be
developed (i.e. fix compiler bugs, code refactoring, new API, etc). The
intention is for this to be used only if there's no other reasonable
solution, for its use to include a justification that can be used
to assess future solutions, and for it to be temporary.

Example usage included, based on analysis and discussion from:
https://lore.kernel.org/netdev/CANn89iLS_2cshtuXPyNUGDPaic=sJiYfvTb_wNLgWrZRyBxZ_g@mail.gmail.com

Cc: Jakub Kicinski <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Coco Li <[email protected]>
Cc: Tariq Toukan <[email protected]>
Cc: Saeed Mahameed <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral

The interrupt controller supplying the Wake-on-LAN interrupt line maybe
modular on some platforms (irq-bcm7038-l1.c) and might be probed at a
later time than the GENET driver. We need to specifically check for
-EPROBE_DEFER and propagate that error to ensure that we eventually
fetch the interrupt descriptor.

Fixes: 9deb48b53e7f ("bcmgenet: add WOL IRQ check")
Fixes: 5b1f0e62941b ("net: bcmgenet: Avoid touching non-existent interrupt")
Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Stefan Wahren <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

net: ethernet: mediatek: ppe: fix wrong size passed to memset()

'foe_table' is a pointer, the real size of struct mtk_foe_entry
should be pass to memset().

Fixes: ba37b7caf1ed ("net: ethernet: mtk_eth_soc: add support for initializing the PPE")
Signed-off-by: Yang Yingliang <[email protected]>
Acked-by: Felix Fietkau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'for-net-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

- Fix the creation of hdev->name when index is greater than 9999

* tag 'for-net-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: Fix the creation of hdev->name
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'wireless-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v5.18

Second set of fixes for v5.18 and hopefully the last one. We have a
new iwlwifi maintainer, a fix to rfkill ioctl interface and important
fixes to both stack and two drivers.

* tag 'wireless-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
  nl80211: fix locking in nl80211_set_tx_bitrate_mask()
  mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protection
  mac80211_hwsim: fix RCU protected chanctx access
  mailmap: update Kalle Valo's email
  mac80211: Reset MBSSID parameters upon connection
  cfg80211: retrieve S1G operating channel number
  nl80211: validate S1G channel width
  mac80211: fix rx reordering with non explicit / psmp ack policy
  ath11k: reduce the wait time of 11d scan and hw scan while add interface
  MAINTAINERS: update iwlwifi driver maintainer
  iwlwifi: iwl-dbg: Use del_timer_sync() before freeing
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Bluetooth: Fix the creation of hdev->name

Set a size limit of 8 bytes of the written buffer to "hdev->name"
including the terminating null byte, as the size of "hdev->name" is 8
bytes. If an id value which is greater than 9999 is allocated,
then the "snprintf(hdev->name, sizeof(hdev->name), "hci%d", id)"
function call would lead to a truncation of the id value in decimal
notation.

Set an explicit maximum id parameter in the id allocation function call.
The id allocation function defines the maximum allocated id value as the
maximum id parameter value minus one. Therefore, HCI_MAX_ID is defined
as 10000.

Signed-off-by: Itay Iellin <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>

Merge branch 'count-tc-taprio-window-drops-in-enetc-driver'

Vladimir Oltean says:

====================
Count tc-taprio window drops in enetc driver

This series includes a patch from Po Liu (no longer with NXP) which
counts frames dropped by the tc-taprio offload in ethtool -S and in
ndo_get_stats64. It also contains a preparation patch from myself.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: enetc: count the tc-taprio window drops

The enetc scheduler for IEEE 802.1Qbv has 2 options (depending on
PTGCR[TG_DROP_DISABLE]) when we attempt to send an oversized packet
which will never fit in its allotted time slot for its traffic class:
either block the entire port due to head-of-line blocking, or drop the
packet and set a bit in the writeback format of the transmit buffer
descriptor, allowing other packets to be sent.

We obviously choose the second option in the driver, but we do not
detect the drop condition, so from the perspective of the network stack,
the packet is sent and no error counter is incremented.

This change checks the writeback of the TX BD when tc-taprio is enabled,
and increments a specific ethtool statistics counter and a generic
"tx_dropped" counter in ndo_get_stats64.

Signed-off-by: Po Liu <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Claudiu Manoil <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: enetc: manage ENETC_F_QBV in priv->active_offloads only when enabled

Future work in this driver would like to look at priv->active_offloads &
ENETC_F_QBV to determine whether a tc-taprio qdisc offload was
installed, but this does not produce the intended effect.

All the other flags in priv->active_offloads are managed dynamically,
except ENETC_F_QBV which is set statically based on the probed SI capability.

This change makes priv->active_offloads & ENETC_F_QBV really track the
presence of a tc-taprio schedule on the port.

Some existing users, like the enetc_sched_speed_set() call from
phylink_mac_link_up(), are best kept using the old logic: the tc-taprio
offload does not re-trigger another link mode resolve, so the scheduler
needs to be functional from the get go, as long as Qbv is supported at
all on the port. So to preserve functionality there, look at the static
station interface capability from pf->si->hw_features instead.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Claudiu Manoil <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'macb-napi-improvements'

Robert Hancock says:

====================
MACB NAPI improvements

Simplify the logic in the Cadence MACB/GEM driver for determining
when to reschedule NAPI processing, and update it to use NAPI for the
TX path as well as the RX path.

Changes since v1: Changed to use separate TX and RX NAPI instances and
poll functions to avoid unnecessary checks of the other ring (TX/RX)
states during polling and to use budget handling for both RX and TX.
Fixed locking to protect against concurrent access to TX ring on
TX transmit and TX poll paths.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: macb: use NAPI for TX completion path

This driver was using the TX IRQ handler to perform all TX completion
tasks. Under heavy TX network load, this can cause significant irqs-off
latencies (found to be in the hundreds of microseconds using ftrace).
This can cause other issues, such as overrunning serial UART FIFOs when
using high baud rates with limited UART FIFO sizes.

Switch to using a NAPI poll handler to perform the TX completion work
to get this out of hard IRQ context and avoid the IRQ latency impact. A
separate NAPI instance is used for TX and RX to avoid checking the other
ring's state unnecessarily when doing the poll, and so that the NAPI
budget handling can work for both TX and RX packets.

A new per-queue tx_ptr_lock spinlock has been added to avoid using the
main device lock (with IRQs needing to be disabled) across the entire TX
mapping operation, and also to protect the TX queue pointers from
concurrent access between the TX start and TX poll operations.

The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved into
the TX NAPI poll handler to maintain the proper order of operations. A flag
is used to notify the poll handler that a UBR condition needs to be
handled. The macb_tx_restart handler has had some locking added for global
register access, since this could now potentially happen concurrently on
different queues.

Signed-off-by: Robert Hancock <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: macb: simplify/cleanup NAPI reschedule checking

Previously the macb_poll method was checking the RSR register after
completing its RX receive work to see if additional packets had been
received since IRQs were disabled, since this controller does not
maintain the pending IRQ status across IRQ disable. It also had to
double-check the register after re-enabling IRQs to detect if packets
were received after the first check but before IRQs were enabled.

Using the RSR register for this purpose is problematic since it reflects
the global device state rather than the per-queue state, so if packets
are being received on multiple queues it may end up retriggering receive
on a queue where the packets did not actually arrive and not on the one
where they did arrive. This will also cause problems with an upcoming
change to use NAPI for the TX path where use of multiple queues is more
likely.

Add a macb_rx_pending function to check the RX ring to see if more
packets have arrived in the queue, and use that to check if NAPI should
be rescheduled rather than the RSR register. By doing this, we can just
ignore the global RSR register entirely, and thus save some extra device
register accesses at the same time.

This also makes the previous first check for pending packets rather
redundant, since it would be checking the RX ring state which was just
checked in the receive work function. Therefore we can get rid of it and
just check after enabling interrupts whether packets are already
pending.

Signed-off-by: Robert Hancock <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

net: dsa: ocelot: accept 1000base-X for VSC9959 and VSC9953

Switches using the Lynx PCS driver support 1000base-X optical SFP
modules. Accept this interface type on a port.

Signed-off-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

i40e: i40e_main: fix a missing check on list iterator

The bug is here:
ret = i40e_add_macvlan_filter(hw, ch->seid, vdev->dev_addr, &aq_err);

The list iterator 'ch' will point to a bogus position containing
HEAD if the list is empty or no element is found. This case must
be checked before any use of the iterator, otherwise it will
lead to a invalid memory access.

To fix this bug, use a new variable 'iter' as the list iterator,
while use the origin variable 'ch' as a dedicated pointer to
point to the found element.

Cc: [email protected]
Fixes: 1d8d80b4e4ff6 ("i40e: Add macvlan support on i40e")
Signed-off-by: Xiaomeng Tong <[email protected]>
Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests: forwarding: tc_actions: allow mirred egress test to run on non-offloaded h2

The host interfaces $h1 and $h2 don't have to be switchdev interfaces,
but due to the fact that we pass $tcflags which may have the value of
"skip_sw", we force $h2 to offload a drop rule for dst_ip, something
which it may not be able to do.

The selftest only wants to verify the hit count of this rule as a means
of figuring out whether the packet was received, so remove the $tcflags
for it and let it be done in software.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2022-05-10

This series contains updates to igc driver only.

Sasha cleans up the code by removing an unused function and removing an
enum for PHY type as there is only one PHY. The return type for
igc_check_downshift() is changed to void as it always returns success.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  igc: Change type of the 'igc_check_downshift' method
  igc: Remove unused phy_type enum
  igc: Remove igc_set_spd_dplx method
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/sched: act_pedit: really ensure the skb is writable

Currently pedit tries to ensure that the accessed skb offset
is writable via skb_unclone(). The action potentially allows
touching any skb bytes, so it may end-up modifying shared data.

The above causes some sporadic MPTCP self-test failures, due to
this code:

tc -n $ns2 filter add dev ns2eth$i egress \
protocol ip prio 1000 \
handle 42 fw \
action pedit munge offset 148 u8 invert \
pipe csum tcp \
index 100

The above modifies a data byte outside the skb head and the skb is
a cloned one, carrying a TCP output packet.

This change addresses the issue by keeping track of a rough
over-estimate highest skb offset accessed by the action and ensuring
such offset is really writable.

Note that this may cause performance regressions in some scenarios,
but hopefully pedit is not in the critical path.

Fixes: db2c24175d14 ("act_pedit: access skb->data safely")
Acked-by: Mat Martineau <[email protected]>
Tested-by: Geliang Tang <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/1fcf78e6679d0a287dd61bb0f04730ce33b3255d.1652194627.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>

eth: amd: remove NI6510 support (ni65)

Looks like all the changes to this driver had been tree-wide
refactoring since git era begun. The driver is using virt_to_bus()
we should make it use more modern DMA APIs but since it's unlikely
to be getting any use these days delete it instead. We can always
revert to bring it back.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: appletalk: remove Apple/Farallon LocalTalk PC support

Looks like all the changes to this driver had been tree-wide
refactoring since git era begun. The driver is using virt_to_bus()
we should make it use more modern DMA APIs but since it's unlikely
to be getting any use these days delete it instead. We can always
revert to bring it back.

Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>