Frédéric Danis [Thu, 7 Mar 2024 16:42:05 +0000 (17:42 +0100)]
Bluetooth: Fix eir name length
According to Section 1.2 of Core Specification Supplement Part A the
complete or short name strings are defined as utf8s, which should not
include the trailing NULL for variable length array as defined in Core
Specification Vol1 Part E Section 2.9.3.
Removing the trailing NULL allows PTS to retrieve the random address based
on device name, e.g. for SM/PER/KDU/BV-02-C, SM/PER/KDU/BV-08-C or
GAP/BROB/BCST/BV-03-C.
Fixes: f61851f64b17 ("Bluetooth: Fix append max 11 bytes of name to scan rsp data") Signed-off-by: Frédéric Danis <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Vinicius Peixoto [Tue, 27 Feb 2024 01:43:26 +0000 (22:43 -0300)]
Bluetooth: Add new quirk for broken read key length on ATS2851
The ATS2851 controller erroneously reports support for the "Read
Encryption Key Length" HCI command. This makes it unable to connect
to any devices, since this command is issued by the kernel during the
connection process in response to an "Encryption Change" HCI event.
Add a new quirk (HCI_QUIRK_BROKEN_ENC_KEY_SIZE) to hint that the command
is unsupported, preventing it from interrupting the connection process.
This is the error log from btmon before this patch:
Roman Smirnov [Fri, 1 Mar 2024 13:39:16 +0000 (13:39 +0000)]
Bluetooth: mgmt: remove NULL check in add_ext_adv_params_complete()
Remove the cmd pointer NULL check in add_ext_adv_params_complete()
because it occurs earlier in add_ext_adv_params(). This check is
also unnecessary because the pointer is dereferenced just before it.
Found by Linux Verification Center (linuxtesting.org) with Svace.
Roman Smirnov [Fri, 1 Mar 2024 13:39:15 +0000 (13:39 +0000)]
Bluetooth: mgmt: remove NULL check in mgmt_set_connectable_complete()
Remove the cmd pointer NULL check in mgmt_set_connectable_complete()
because it occurs earlier in set_connectable(). This check is also
unnecessary because the pointer is dereferenced just before it.
Found by Linux Verification Center (linuxtesting.org) with Svace.
Takashi Iwai [Tue, 27 Feb 2024 10:29:14 +0000 (11:29 +0100)]
Bluetooth: btmtk: Add MODULE_FIRMWARE() for MT7922
Since dracut refers to the module info for defining the required
firmware files and btmtk driver doesn't provide the firmware info for
MT7922, the generate initrd misses the firmware, resulting in the
broken Bluetooth.
This patch simply adds the MODULE_FIRMWARE() for the missing entry
for covering that.
Dan Carpenter [Sat, 2 Mar 2024 08:30:43 +0000 (11:30 +0300)]
Bluetooth: ISO: Clean up returns values in iso_connect_ind()
This function either returns 0 or HCI_LM_ACCEPT. Make it clearer which
returns are which and delete the "lm" variable because it is no longer
required.
Fixes: 2615fd9a7c25 ("Bluetooth: hci_sync: Fix overwriting request callback") Signed-off-by: Pauli Virtanen <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Attemting to do sock_lock on .recvmsg may cause a deadlock as shown
bellow, so instead of using sock_sock this uses sk_receive_queue.lock
on bt_sock_ioctl to avoid the UAF:
This fixes attempting to access past ethhdr.h_source, although it seems
intentional to copy also the contents of h_proto this triggers
out-of-bound access problems with the likes of static analyzer, so this
instead just copy ETH_ALEN and then proceed to use put_unaligned to copy
h_proto separetely.
This checks if CONFIG_DEV_COREDUMP is enabled before attempting to clone
the skb and also make sure btmtk_process_coredump frees the skb passed
following the same logic.
Fixes: 0b7015132878 ("Bluetooth: btusb: mediatek: add MediaTek devcoredump support") Signed-off-by: Luiz Augusto von Dentz <[email protected]>
struct hci_dev_info has a fixed size name[8] field so in the event that
hdev->name is bigger than that strcpy would attempt to write past its
size, so this fixes this problem by switching to use strscpy.
Fixes: dcda165706b9 ("Bluetooth: hci_core: Fix build warnings") Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Andrey Skvortsov [Fri, 23 Feb 2024 21:37:04 +0000 (00:37 +0300)]
Bluetooth: btrtl: fix out of bounds memory access
The problem is detected by KASAN.
btrtl driver uses private hci data to store 'struct btrealtek_data'.
If btrtl driver is used with btusb, then memory for private hci data
is allocated in btusb. But no private data is allocated after hci_dev,
when btrtl is used with hci_h5.
This commit adds memory allocation for hci_h5 case.
==================================================================
BUG: KASAN: slab-out-of-bounds in btrtl_initialize+0x6cc/0x958 [btrtl]
Write of size 8 at addr ffff00000f5a5748 by task kworker/u9:0/76
In a few cases the stack may generate commands as responses to events
which would happen to overwrite the sent_cmd, so this attempts to store
the request in req_skb so even if sent_cmd is replaced with a new
command the pending request will remain in stored in req_skb.
Fixes: 6a98e3836fa2 ("Bluetooth: Add helper for serialized HCI command execution") Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Bluetooth: hci_sync: Use address filtering when HCI_PA_SYNC is set
If HCI_PA_SYNC flag is set it means there is a Periodic Advertising
Synchronization pending, so this attempts to locate the address passed
to HCI_OP_LE_PA_CREATE_SYNC and program it in the accept list so only
reports with that address are processed.
Iulia Tanasescu [Fri, 23 Feb 2024 13:14:42 +0000 (15:14 +0200)]
Bluetooth: ISO: Reassemble PA data for bcast sink
This adds support to reassemble PA data for a Broadcast Sink
listening socket. This is needed in case the BASE is received
fragmented in multiple PA reports.
PA data is first reassembled inside the hcon, before the BASE
is extracted and stored inside the socket. The length of the
le_per_adv_data hcon array has been raised to 1650, to accommodate
the maximum PA data length that can come fragmented, according to
spec.
Bluetooth: hci_qca: don't use IS_ERR_OR_NULL() with gpiod_get_optional()
The optional variants for the gpiod_get() family of functions return NULL
if the GPIO in question is not associated with this device. They return
ERR_PTR() on any other error. NULL descriptors are graciously handled by
GPIOLIB and can be safely passed to any of the GPIO consumer interfaces
as they will return 0 and act as if the function succeeded. If one is
using the optional variant, then there's no point in checking for NULL.
Fixes: 6845667146a2 ("Bluetooth: hci_qca: Fix NULL vs IS_ERR_OR_NULL check in qca_serdev_probe") Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Kiran K [Wed, 21 Feb 2024 13:33:46 +0000 (19:03 +0530)]
Bluetooth: btintel: Print Firmware Sequencer information
Firmware sequencer (FSEQ) is a common code shared across Bluetooth
and Wifi. Printing FSEQ will help to debug if there is any mismatch
between Bluetooth and Wifi FSEQ.
Since commit aed65af1cc2f ("drivers: make device_type const"), the driver
core can properly handle constant struct device_type. Move the bt_type and
bnep_type variables to be constant structures as well, placing it into
read-only memory which can not be modified at runtime.
Bluetooth: hci_sync: Attempt to dequeue connection attempt
If connection is still queued/pending in the cmd_sync queue it means no
command has been generated and it should be safe to just dequeue the
callback when it is being aborted.
Bluetooth: hci_conn: Fix UAF Write in __hci_acl_create_connection_sync
This fixes the UAF on __hci_acl_create_connection_sync caused by
connection abortion, it uses the same logic as to LE_LINK which uses
hci_cmd_sync_cancel to prevent the callback to run if the connection is
abort prematurely.
Reported-by: [email protected] Fixes: 45340097ce6e ("Bluetooth: hci_conn: Only do ACL connections sequentially") Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Bluetooth: hci_conn: Always use sk_timeo as conn_timeout
This aligns the use socket sk_timeo as conn_timeout when initiating a
connection and then use it when scheduling the resulting HCI command,
that way the command is actually aborted synchronously thus not
blocking commands generated by hci_abort_conn_sync to inform the
controller the connection is to be aborted.
Jonas Dreßler [Tue, 6 Feb 2024 11:08:14 +0000 (12:08 +0100)]
Bluetooth: Remove pending ACL connection attempts
With the last commit we moved to using the hci_sync queue for "Create
Connection" requests, removing the need for retrying the paging after
finished/failed "Create Connection" requests and after the end of
inquiries.
hci_conn_check_pending() was used to trigger this retry, we can remove it
now.
Note that we can also remove the special handling for COMMAND_DISALLOWED
errors in the completion handler of "Create Connection", because "Create
Connection" requests are now always serialized.
This is somewhat reverting commit 4c67bc74f016 ("[Bluetooth] Support
concurrent connect requests").
With this, the BT_CONNECT2 state of ACL hci_conn objects should now be
back to meaning only one thing: That we received a "Connection Request"
from another device (see hci_conn_request_evt), but the response to that
is going to be deferred.
Jonas Dreßler [Tue, 6 Feb 2024 11:08:13 +0000 (12:08 +0100)]
Bluetooth: hci_conn: Only do ACL connections sequentially
Pretty much all bluetooth chipsets only support paging a single device at
a time, and if they don't reject a secondary "Create Connection" request
while another is still ongoing, they'll most likely serialize those
requests in the firware.
With commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect
requests") we started adding some serialization of our own in case the
adapter returns "Command Disallowed" HCI error.
This commit was using the BT_CONNECT2 state for the serialization, this
state is also used for a few more things (most notably to indicate we're
waiting for an inquiry to cancel) and therefore a bit unreliable. Also
not all BT firwares would respond with "Command Disallowed" on too many
connection requests, some will also respond with "Hardware Failure"
(BCM4378), and others will error out later and send a "Connect Complete"
event with error "Rejected Limited Resources" (Marvell 88W8897).
We can clean things up a bit and also make the serialization more reliable
by using our hci_sync machinery to always do "Create Connection" requests
in a sequential manner.
This is very similar to what we're already doing for establishing LE
connections, and it works well there.
Note that this causes a test failure in mgmt-tester (test "Pair Device
- Power off 1") because the hci_abort_conn_sync() changes the error we
return on timeout of the "Create Connection". We'll fix this on the
mgmt-tester side by adjusting the expected error for the test.
Bluetooth: hci_event: Fix not indicating new connection for BIG Sync
BIG Sync (aka. Broadcast sink) requires to inform that the device is
connected when a data path is active otherwise userspace could attempt
to free resources allocated to the device object while scanning.
Fixes: 1d11d70d1f6b ("Bluetooth: ISO: Pass BIG encryption info through QoS") Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Bluetooth: hci_core: Cancel request on command timeout
If command has timed out call __hci_cmd_sync_cancel to notify the
hci_req since it will inevitably cause a timeout.
This also rework the code around __hci_cmd_sync_cancel since it was
wrongly assuming it needs to cancel timer as well, but sometimes the
timers have not been started or in fact they already had timed out in
which case they don't need to be cancel yet again.
Jonas Dreßler [Mon, 8 Jan 2024 22:46:06 +0000 (23:46 +0100)]
Bluetooth: Remove superfluous call to hci_conn_check_pending()
The "pending connections" feature was originally introduced with commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect requests") and 6bd57416127e ("[Bluetooth] Handling pending connect attempts after
inquiry") to handle controllers supporting only a single connection request
at a time. Later things were extended to also cancel ongoing inquiries on
connect() with commit 89e65975fea5 ("Bluetooth: Cancel Inquiry before
Create Connection").
With commit a9de9248064b ("[Bluetooth] Switch from OGF+OCF to using only
opcodes"), hci_conn_check_pending() was introduced as a helper to
consolidate a few places where we check for pending connections (indicated
by the BT_CONNECT2 flag) and then try to connect.
This refactoring commit also snuck in two more calls to
hci_conn_check_pending():
- One is in the failure callback of hci_cs_inquiry(), this one probably
makes sense: If we send an "HCI Inquiry" command and then immediately
after a "Create Connection" command, the "Create Connection" command might
fail before the "HCI Inquiry" command, and then we want to retry the
"Create Connection" on failure of the "HCI Inquiry".
- The other added call to hci_conn_check_pending() is in the event handler
for the "Remote Name" event, this seems unrelated and is possibly a
copy-paste error, so remove that one.
Fixes: a9de9248064b ("[Bluetooth] Switch from OGF+OCF to using only opcodes") Signed-off-by: Jonas Dreßler <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Jonas Dreßler [Sun, 7 Jan 2024 18:02:50 +0000 (19:02 +0100)]
Bluetooth: Disconnect connected devices before rfkilling adapter
On a lot of platforms (at least the MS Surface devices, M1 macbooks, and
a few ThinkPads) firmware doesn't do its job when rfkilling a device
and the bluetooth adapter is not actually shut down properly on rfkill.
This leads to connected devices remaining in connected state and the
bluetooth connection eventually timing out after rfkilling an adapter.
Use the rfkill hook in the HCI driver to go through the full power-off
sequence (including stopping scans and disconnecting devices) before
rfkilling it, just like MGMT_OP_SET_POWERED would do.
In case anything during the larger power-off sequence fails, make sure
the device is still closed and the rfkill ends up being effective in
the end.
Jonas Dreßler [Sun, 7 Jan 2024 18:02:48 +0000 (19:02 +0100)]
Bluetooth: mgmt: Remove leftover queuing of power_off work
Queuing of power_off work was introduced in these functions with commits 8b064a3ad377 ("Bluetooth: Clean up HCI state when doing power off") and c9910d0fb4fc ("Bluetooth: Fix disconnecting connections in non-connected
states") in an effort to clean up state and do things like disconnecting
devices before actually powering off the device.
After that, commit a3172b7eb4a2 ("Bluetooth: Add timer to force power off")
introduced a timeout to ensure that the device actually got powered off,
even if some of the cleanup work would never complete.
This code later got refactored with commit cf75ad8b41d2 ("Bluetooth:
hci_sync: Convert MGMT_SET_POWERED"), which made powering off the device
synchronous and removed the need for initiating the power_off work from
other places. The timeout mentioned above got removed too, because we now
also made use of the command timeout during power on/off.
These days the power_off work still exists, but it only seems to only be
used for HCI_AUTO_OFF functionality, which is why we never noticed
those two leftover places where we queue power_off work. So let's remove
that code.
Fixes: cf75ad8b41d2 ("Bluetooth: hci_sync: Convert MGMT_SET_POWERED") Signed-off-by: Jonas Dreßler <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Jonas Dreßler [Sun, 7 Jan 2024 18:02:47 +0000 (19:02 +0100)]
Bluetooth: Remove HCI_POWER_OFF_TIMEOUT
With commit cf75ad8b41d2 ("Bluetooth: hci_sync: Convert MGMT_SET_POWERED"),
the power off sequence got refactored so that this timeout was no longer
necessary, let's remove the leftover define from the header too.
Fixes: cf75ad8b41d2 ("Bluetooth: hci_sync: Convert MGMT_SET_POWERED") Signed-off-by: Jonas Dreßler <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Bluetooth: btnxpuart: Resolve TX timeout error in power save stress test
This fixes the tx timeout issue seen while running a stress test on
btnxpuart for couple of hours, such that the interval between two HCI
commands coincide with the power save timeout value of 2 seconds.
Test procedure using bash script:
<load btnxpuart.ko>
hciconfig hci0 up
//Enable Power Save feature
hcitool -i hci0 cmd 3f 23 02 00 00
while (true)
do
hciconfig hci0 leadv
sleep 2
hciconfig hci0 noleadv
sleep 2
done
Error log, after adding few more debug prints:
Bluetooth: btnxpuart_queue_skb(): 01 0A 20 01 00
Bluetooth: hci0: Set UART break: on, status=0
Bluetooth: hci0: btnxpuart_tx_wakeup() tx_work scheduled
Bluetooth: hci0: btnxpuart_tx_work() dequeue: 01 0A 20 01 00
Can't set advertise mode on hci0: Connection timed out (110)
Bluetooth: hci0: command 0x200a tx timeout
When the power save mechanism turns on UART break, and btnxpuart_tx_work()
is scheduled simultaneously, psdata->ps_state is read as PS_STATE_AWAKE,
which prevents the psdata->work from being scheduled, which is responsible
to turn OFF UART break.
This issue is fixed by adding a ps_lock mutex around UART break on/off as
well as around ps_state read/write.
btnxpuart_tx_wakeup() will now read updated ps_state value. If ps_state is
PS_STATE_SLEEP, it will first schedule psdata->work, and then it will
reschedule itself once UART break has been turned off and ps_state is
PS_STATE_AWAKE.
Tested above script for 50,000 iterations and TX timeout error was not
observed anymore.
Juntong Deng [Mon, 4 Mar 2024 11:32:08 +0000 (11:32 +0000)]
inet: Add getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERT
Currently getsockopt does not support IP_ROUTER_ALERT and
IPV6_ROUTER_ALERT, and we are unable to get the values of these two
socket options through getsockopt.
This patch adds getsockopt support for IP_ROUTER_ALERT and
IPV6_ROUTER_ALERT.
David S. Miller [Wed, 6 Mar 2024 12:07:44 +0000 (12:07 +0000)]
Merge branch 'ynl-small-recv'
Jakub Kicinski says:
====================
tools: ynl: add --dbg-small-recv for easier kernel testing
When testing netlink dumps I usually hack some user space up
to constrain its user space buffer size (iproute2, ethtool or ynl).
Netlink will try to fill the messages up, so since these apps use
large buffers by default, the dumps are rarely fragmented.
I was hoping to figure out a way to create a selftest for dump
testing, but so far I have no idea how to do that in a useful
and generic way.
Until someone does that, make manual dump testing easier with YNL.
Create a special option for limiting the buffer size, so I don't
have to make the same edits each time, and maybe others will benefit,
too :)
A real test would also have to check the messages are complete
and not duplicated. That part has to be done manually right now.
Note that the first message is always conservatively sized by the kernel.
Still, I think this is good enough to be useful.
v2:
- patch 2:
- move the recv_size setting up
- change the default to 0 so that cli.py doesn't have to worry
what the "unset" value is
v1: https://lore.kernel.org/all/20240301230542[email protected]/
====================
Jakub Kicinski [Tue, 5 Mar 2024 05:33:10 +0000 (21:33 -0800)]
tools: ynl: add --dbg-small-recv for easier kernel testing
Most "production" netlink clients use large buffers to
make dump efficient, which means that handling of dump
continuation in the kernel is not very well tested.
Add an option for debugging / testing handling of dumps.
It enables printing of extra netlink-level debug and
lowers the recv() buffer size in one go. When used
without any argument (--dbg-small-recv) it picks
a very small default (4000), explicit size can be set,
too (--dbg-small-recv 5000).
Jakub Kicinski [Tue, 5 Mar 2024 05:33:08 +0000 (21:33 -0800)]
tools: ynl: allow setting recv() size
Make the size of the buffer we use for recv() configurable.
The details of the buffer sizing in netlink are somewhat
arcane, we could spend a lot of time polishing this API.
Let's just leave some hopefully helpful comments for now.
This is a for-developers-only feature, anyway.
David S. Miller [Wed, 6 Mar 2024 12:05:11 +0000 (12:05 +0000)]
Merge branch 'tools-ynl-make-clean'
Jakub Kicinski says:
====================
tools: ynl: clean up make clean
First change renames the clean target which removes build results,
to a more common name. Second one add missing .PHONY targets.
Third one ensures that clean deletes __pycache__.
Jakub Kicinski [Tue, 5 Mar 2024 05:13:26 +0000 (21:13 -0800)]
tools: ynl: rename make hardclean -> distclean
The make target to remove all generated files used to be called
"hardclean" because it deleted files which were tracked by git.
We no longer track generated user space files, so use the more
common "distclean" name.
David S. Miller [Wed, 6 Mar 2024 11:29:19 +0000 (11:29 +0000)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-03-04 (ice)
This series contains updates to ice driver only.
Jake changes the driver to use relative VSI index for VF VSIs as the VF
driver has no direct use of the VSI number on ice hardware. He also
reworks some Tx/Rx functions to clarify their uses, cleans up some style
issues, and utilizes kernel helper functions.
Maciej removes a redundant call to disable Tx queues on ifdown and
removes some unnecessary devm usages.
====================
David S. Miller [Wed, 6 Mar 2024 11:23:21 +0000 (11:23 +0000)]
Merge branch 'ravb-cleanups'
Niklas Söderlund says:
====================
ravb: Align Rx descriptor setup and maintenance
When RZ/G2L support was added the Rx code path was split in two, one to
support R-Car and one to support RZ/G2L. One reason for this is that
R-Car uses the extended Rx descriptor format, while RZ/G2L uses the
normal descriptor format.
In many aspects this is not needed as the extended descriptor format is
just a normal descriptor with extra metadata (timestamsp) appended. And
the R-Car SoCs can also use normal descriptors if hardware timestamps
were not desired. This split has led to RZ/G2L gaining support for
split descriptors in the Rx path while R-Car still lacks this.
This series is the first step in trying to merge the R-Car and RZ/G2L Rx
paths so features and bugs corrected in one will benefit the other.
The first patch in the series clarifies that the driver now supports
either normal or extended descriptors, not both at the same time by
grouping them in a union. This is the foundation that later patches will
build on the aligning the two Rx paths.
Patches 2-5 deals with correcting small issues in the Rx frame and
descriptor sizes that either were incorrect at the time they were added
in 2017 (my bad) or concepts built on-top of this initial incorrect
design.
While finally patch 6 merges the R-Car and RZ/G2L for Rx descriptor
setup and maintenance.
When this work has landed I plan to follow up with more work aligning
the rest of the Rx code paths and hopefully bring split descriptor
support to the R-Car SoCs.
====================
The R-Car and RZ/G2L Rx code paths were split in two separate
implementations when support for RZ/G2L was added due to the fact that
R-Car uses the extended descriptor format while RZ/G2L uses normal
descriptors. This has led to a duplication of Rx logic with the only
difference being the different Rx descriptors types used. The
implementation however neglects to take into account that extended
descriptors are normal descriptors with additional metadata at the end
to carry hardware timestamp information.
The hardware timestamp information is only consumed in the R-Car Rx
loop and all the maintenance code around the Rx ring can be shared
between the two implementations if the difference in descriptor length
is carefully considered.
This change merges the two implementations for Rx ring maintenance by
adding a method to access both types of descriptors as normal
descriptors, as this part covers all the fields needed for Rx ring
maintenance the only difference between using normal or extended
descriptor is the size of the memory region to allocate/free and the
step size between each descriptor in the ring.
ravb: Move maximum Rx descriptor data usage to info struct
To make it possible to merge the R-Car and RZ/G2L code paths move the
maximum usable size of a single Rx descriptor data slice into the
hardware information instead of using two different defines in the two
different code paths.
ravb: Use the max frame size from hardware info for RZ/G2L
Remove the define describing the RZ/G2L maximum frame size and only use
the information in the hardware information struct. This will make it
easier to merge the R-Car and RZ/G2L code paths.
There is no functional change as both the define and the maximum frame
length in the hardware information is set to 8K.
The EtherAVB device requires the SKB data to be aligned to 128 bytes.
The alignment is done by allocating an skb 128 bytes larger than the
maximum frame size supported by the device and adjusting the headroom to
fit the requirement.
This code has been refactored a few times and small issues have been
added along the way. The issues are not harmful but prevent merging
parts of the Rx code which have been split in two implementations with
the addition of RZ/G2L support, a device that supports larger frame
sizes.
This change removes the need for duplicated and somewhat inaccurate
hardware alignment constrains stored in the hardware information struct
by creating a helper to handle the allocation of an skb and alignment of
an skb data.
For the R-Car class of devices the maximum frame size is 4K and each
descriptor is limited to 2K of data. The current implementation does not
support split descriptors, this limits the frame size to 2K. The
current hardware information however records the descriptor size just
under 2K due to bad understanding of the device when larger MTUs where
added.
For the RZ/G2L device the maximum frame size is 8K and each descriptor
is limited to 4K of data. The current hardware information records this
correctly, but it gets the alignment constrains wrong as just aligns it
by 128, it does not extend it by 128 bytes to allow the full frame to be
stored. This works because the RZ/G2L device supports split descriptors
and allocates each skb to 8K and aligns each 4K descriptor in this
space.
ravb: Make it clear the information relates to maximum frame size
The struct member rx_max_buf_size was added before split descriptor
support was added. It is unclear if the value describes the full skb
frame buffer or the data descriptor buffer which can be combined into a
single skb.
Rename it to make it clear it referees to the maximum frame size and can
cover multiple descriptors.
The Rx ring can either be made up of normal or extended descriptors, not
a mix of the two at the same time. Make this explicit by grouping the
two variables in a rx_ring union.
The extension of the storage for more than one queue of normal
descriptors from a single to NUM_RX_QUEUE queues have no practical
effect. But aids in making the code readable as the code that uses it
already piggyback on other members of struct ravb_private that are
arrays of max length NUM_RX_QUEUE, e.g. rx_desc_dma. This will also make
further refactoring easier.
While at it, rename the normal descriptor Rx ring to make it clear it's
not strictly related to the GbEthernet E-MAC IP found in RZ/G2L, normal
descriptors could be used on R-Car SoCs too.
The motivation for this series has two primary goals. We want to enable
support of multiple simultaneous messages and make the channel more
robust. The way it works right now, the driver can only send and receive
a single message at a time and if something goes really wrong, it can
lead to data corruption and strange bugs.
To start the series, we introduce an idpf_virtchnl.h file. This reduces
the burden on idpf.h which is overloaded with struct and function
declarations.
The conversion works by conceptualizing a send and receive as a
"virtchnl transaction" (idpf_vc_xn) and introducing a "transaction
manager" (idpf_vc_xn_manager). The vcxn_mngr will init a ring of
transactions from which the driver will pop from a bitmap of free
transactions to track in-flight messages. Instead of needing to handle a
complicated send/recv for every a message, the driver now just needs to
fill out a xn_params struct and hand it over to idpf_vc_xn_exec which
will take care of all the messy bits. Once a message is sent and
receives a reply, we leverage the completion API to signal the received
buffer is ready to be used (assuming success, or an error code
otherwise).
At a low-level, this implements the "sw cookie" field of the virtchnl
message descriptor to enable this. We have 16 bits we can put whatever
we want and the recipient is required to apply the same cookie to the
reply for that message. We use the first 8 bits as an index into the
array of transactions to enable fast lookups and we use the second 8
bits as a salt to make sure each cookie is unique for that message. As
transactions are received in arbitrary order, it's possible to reuse a
transaction index and the salt guards against index conflicts to make
certain the lookup is correct. As a primitive example, say index 1 is
used with salt 1. The message times out without receiving a reply so
index 1 is renewed to be ready for a new transaction, we report the
timeout, and send the message again. Since index 1 is free to be used
again now, index 1 is again sent but now salt is 2. This time we do get
a reply, however it could be that the reply is _actually_ for the
previous send index 1 with salt 1. Without the salt we would have no
way of knowing for sure if it's the correct reply, but with we will know
for certain.
Through this conversion we also get several other benefits. We can now
more appropriately handle asynchronously sent messages by providing
space for a callback to be defined. This notably allows us to handle MAC
filter failures better; previously we could potentially have stale,
failed filters in our list, which shouldn't really have a major impact
but is obviously not correct. I also managed to remove fairly
significant more lines than I added which is a win in my book.
Additionally, this converts some variables to use auto-variables where
appropriate. This makes the alloc paths much cleaner and less prone to
memory leaks. We also fix a few virtchnl related bugs while we're here.
David S. Miller [Wed, 6 Mar 2024 08:07:45 +0000 (08:07 +0000)]
Merge branch 'netlink-emsgsize'
Jakub Kicinski says:
====================
netlink: handle EMSGSIZE errors in the core
Ido discovered some time back that we usually force NLMSG_DONE
to be delivered in a separate recv() syscall, even if it would
fit into the same skb as data messages. He made nexthop try
to fit DONE with data in commit 8743aeff5bc4 ("nexthop: Fix
infinite nexthop bucket dump when using maximum nexthop ID"),
and nobody has complained so far.
We have since also tried to follow the same pattern in new
genetlink families, but explaining to people, or even remembering
the correct handling ourselves is tedious.
Let the netlink socket layer consume -EMSGSIZE errors.
Practically speaking most families use this error code
as "dump needs more space", anyway.
Jakub Kicinski [Sun, 3 Mar 2024 05:24:08 +0000 (21:24 -0800)]
genetlink: fit NLMSG_DONE into same read() as families
Make sure ctrl_fill_info() returns sensible error codes and
propagate them out to netlink core. Let netlink core decide
when to return skb->len and when to treat the exit as an
error. Netlink core does better job at it, if we always
return skb->len the core doesn't know when we're done
dumping and NLMSG_DONE ends up in a separate read().
Jakub Kicinski [Sun, 3 Mar 2024 05:24:06 +0000 (21:24 -0800)]
netlink: handle EMSGSIZE errors in the core
Eric points out that our current suggested way of handling
EMSGSIZE errors ((err == -EMSGSIZE) ? skb->len : err) will
break if we didn't fit even a single object into the buffer
provided by the user. This should not happen for well behaved
applications, but we can fix that, and free netlink families
from dealing with that completely by moving error handling
into the core.
Let's assume from now on that all EMSGSIZE errors in dumps are
because we run out of skb space. Families can now propagate
the error nla_put_*() etc generated and not worry about any
return value magic. If some family really wants to send EMSGSIZE
to user space, assuming it generates the same error on the next
dump iteration the skb->len should be 0, and user space should
still see the EMSGSIZE.
This should simplify families and prevent mistakes in return
values which lead to DONE being forced into a separate recv()
call as discovered by Ido some time ago.
Jakub Kicinski [Mon, 4 Mar 2024 23:36:20 +0000 (15:36 -0800)]
selftests: avoid using SKIP(exit()) in harness fixure setup
selftest harness uses various exit codes to signal test
results. Avoid calling exit() directly, otherwise tests
may get broken by harness refactoring (like the commit
under Fixes). SKIP() will instruct the harness that the
test shouldn't run, it used to not be the case, but that
has been fixed. So just return, no need to exit.
Note that for hmm-tests this actually changes the result
from pass to skip. Which seems fair, the test is skipped,
after all.
Jakub Kicinski [Wed, 6 Mar 2024 03:21:19 +0000 (19:21 -0800)]
Merge branch 'net-ethernet-rework-eee'
Oleksij Rempel says:
====================
net: ethernet: Rework EEE
with Andrew's permission I'll continue mainlining this patches:
==============================================================
Most MAC drivers get EEE wrong. The API to the PHY is not very
obvious, which is probably why. Rework the API, pushing most of the
EEE handling into phylib core, leaving the MAC drivers to just
enable/disable support for EEE in there change_link call back.
MAC drivers are now expect to indicate to phylib if they support
EEE. This will allow future patches to configure the PHY to advertise
no EEE link modes when EEE is not supported. The information could
also be used to enable SmartEEE if the PHY supports it.
With these changes, the uAPI configuration eee_enable becomes a global
on/off. tx-lpi must also be enabled before EEE is enabled. This fits
the discussion here:
This patchset puts in place all the infrastructure, and converts one
MAC driver to the new API. Following patchsets will convert other MAC
drivers, extend support into phylink, and when all MAC drivers are
converted to the new scheme, clean up some unneeded code.
====================
Andrew Lunn [Sat, 2 Mar 2024 19:53:06 +0000 (20:53 +0100)]
net: fec: Fixup EEE
The enabling/disabling of EEE in the MAC should happen as a result of
auto negotiation. So move the enable/disable into
fec_enet_adjust_link() which gets called by phylib when there is a
change in link status.
fec_enet_set_eee() now just stores away the LPI timer value.
Everything else is passed to phylib, so it can correctly setup the
PHY.
fec_enet_get_eee() relies on phylib doing most of the work,
the MAC driver just adds the LPI timer value.
Call phy_support_eee() if the quirk is present to indicate the MAC
actually supports EEE.
Andrew Lunn [Sat, 2 Mar 2024 19:53:04 +0000 (20:53 +0100)]
net: phy: Add phy_support_eee() indicating MAC support EEE
In order for EEE to operate, both the MAC and the PHY need to support
it, similar to how pause works. With some exception - a number of PHYs
have SmartEEE or AutoGrEEEn support in order to provide some EEE-like
power savings with non-EEE capable MACs.
Copy the pause concept and add the call phy_support_eee() which the MAC
makes after connecting the PHY to indicate it supports EEE. phylib will
then advertise EEE when auto-neg is performed.
Andrew Lunn [Sat, 2 Mar 2024 19:53:03 +0000 (20:53 +0100)]
net: phy: Immediately call adjust_link if only tx_lpi_enabled changes
The MAC driver changes its EEE hardware configuration in its
adjust_link callback. This is called when auto-neg
completes. Disabling EEE via eee_enabled false will trigger an
autoneg, and as a result the adjust_link callback will be called with
phydev->enable_tx_lpi set to false. Similarly, eee_enabled set to true
and with a change of advertised link modes will result in a new
autoneg, and a call the adjust_link call.
If set_eee is called with only a change to tx_lpi_enabled which does
not trigger an auto-neg, it is necessary to call the adjust_link
callback so that the MAC is reconfigured to take this change into
account.
When setting phydev->enable_tx_lpi, take both eee_enabled and
tx_lpi_enabled into account, so the MAC drivers just needs to act on
phydev->enable_tx_lpi and not the whole EEE configuration.
The same check should be done for tx_lpi_timer too.
Andrew Lunn [Sat, 2 Mar 2024 19:53:01 +0000 (20:53 +0100)]
net: phy: Add phydev->enable_tx_lpi to simplify adjust link callbacks
MAC drivers which support EEE need to know the results of the EEE
auto-neg in order to program the hardware to perform EEE or not. The
oddly named phy_init_eee() can be used to determine this, it returns 0
if EEE should be used, or a negative error code,
e.g. -EOPPROTONOTSUPPORT if the PHY does not support EEE or negotiate
resulted in it not being used.
However, many MAC drivers get this wrong. Add phydev->enable_tx_lpi
which indicates the result of the autoneg for EEE, including if EEE is
administratively disabled with ethtool. The MAC driver can then access
this in the same way as link speed and duplex in the adjust link
callback. If enable_tx_lpi is true, the MAC should send low power
indications and does not need to consider anything else with respect
to EEE.
Russell King [Sat, 2 Mar 2024 19:53:00 +0000 (20:53 +0100)]
net: add helpers for EEE configuration
Add helpers that phylib and phylink can use to manage EEE configuration
and determine whether the MAC should be permitted to use LPI based on
that configuration.
Heiner Kallweit [Sat, 2 Mar 2024 14:18:27 +0000 (15:18 +0100)]
ethtool: ignore unused/unreliable fields in set_eee op
This function is used with the set_eee() ethtool operation. Certain
fields of struct ethtool_keee() are relevant only for the get_eee()
operation. In addition, in case of the ioctl interface, we have no
guarantee that userspace sends sane values in struct ethtool_eee.
Therefore explicitly ignore all fields not needed for set_eee().
This protects from drivers trying to use unchecked and unreliable
data, relying on specific userspace behavior.
Note: Such unsafe driver behavior has been found and fixed in the
tg3 driver.
Kees Cook [Mon, 4 Mar 2024 21:29:31 +0000 (13:29 -0800)]
sock: Use unsafe_memcpy() for sock_copy()
While testing for places where zero-sized destinations were still showing
up in the kernel, sock_copy() and inet_reqsk_clone() were found, which
are using very specific memcpy() offsets for both avoiding a portion of
struct sock, and copying beyond the end of it (since struct sock is really
just a common header before the protocol-specific allocation). Instead
of trying to unravel this historical lack of container_of(), just switch
to unsafe_memcpy(), since that's effectively what was happening already
(memcpy() wasn't checking 0-sized destinations while the code base was
being converted away from fake flexible arrays).
Avoid the following false positive warning with future changes to
CONFIG_FORTIFY_SOURCE:
memcpy: detected field-spanning write (size 3068) of destination "&nsk->__sk_common.skc_dontcopy_end" at net/core/sock.c:2057 (size 0)
Breno Leitao [Mon, 4 Mar 2024 18:38:08 +0000 (10:38 -0800)]
net: tap: Remove generic .ndo_get_stats64
Commit 3e2f544dd8a33 ("net: get stats64 if device if driver is
configured") moved the callback to dev_get_tstats64() to net core, so,
unless the driver is doing some custom stats collection, it does not
need to set .ndo_get_stats64.
Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it
doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64
function pointer.
Breno Leitao [Mon, 4 Mar 2024 18:38:07 +0000 (10:38 -0800)]
net: tuntap: Leverage core stats allocator
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of in this driver.
With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.
Remove the allocation in the tun/tap driver and leverage the network
core allocation instead.
ptp: fc3: Convert to platform remove callback returning void
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Since commit 43a7206b0963 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the nfc_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Since commit 43a7206b0963 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the wwan_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Since commit 43a7206b0963 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the wwan_hwsim_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Since commit 43a7206b0963 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the ppp_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Since commit 43a7206b0963 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the framer_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Since commit 43a7206b0963 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the hnae_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.
Add two erratas for lan8814. First one fix the led which might
stay on even that there is no link. The second one improves increases
length of the cable that can be used when used in 1000Base-T.
====================
When the length of the cable is more than 100m and the lan8814 is
configured to run in 1000Base-T Slave then the register of the device
needs to be optimized.
Workaround this by setting the measure time to a value of 0xb. This
value can be set regardless of the configuration.
This issue is described in 'LAN8814 Silicon Errata and Data Sheet
Clarification' and according to that, this will not be corrected in a
future silicon revision.
Horatiu Vultur [Mon, 4 Mar 2024 09:15:47 +0000 (10:15 +0100)]
net: phy: micrel: lan8814 led errata
Lan8814 phy led behavior is not correct. It was noticed that the led
still remains ON when the cable is unplugged while there was traffic
passing at that time.
The fix consists in clearing bit 10 of register 0x38, in this way the
led behaviour is correct and gets OFF when there is no link.
====================
selftests: forwarding: Various improvements
This patchset speeds up the multipath tests (patches #1-#2) and makes
other tests more stable (patches #3-#6) so that they will not randomly
fail in the netdev CI.
On my system, after applying the first two patches, the run time of
gre_multipath_nh_res.sh is reduced by over 90%.
====================
Ido Schimmel [Mon, 4 Mar 2024 09:56:12 +0000 (11:56 +0200)]
selftests: forwarding: Make {, ip6}gre-inner-v6-multipath tests more robust
These tests generate various IPv6 flows, encapsulate them in GRE packets
and check that the encapsulated packets are distributed between the
available nexthops according to the configured weights.
Unlike the corresponding IPv4 tests, these tests sometimes fail in the
netdev CI because of large discrepancies between the expected and
measured ratios [1]. This can be explained by the fact that the IPv4
tests generate about 3,600 different flows whereas the IPv6 tests only
generate about 784 different flows (potentially by mistake).
Fix by aligning the IPv6 tests to the IPv4 ones and increase the number
of generated flows.
[1]
[...]
# TEST: ping [ OK ]
# INFO: Running IPv6 over GRE over IPv4 multipath tests
# TEST: ECMP [FAIL]
# Too large discrepancy between expected and measured ratios
# INFO: Expected ratio 1.00 Measured ratio 1.18
[...]