Ahmed Zaki [Wed, 13 Dec 2023 00:33:16 +0000 (17:33 -0700)]
net: ethtool: add support for symmetric-xor RSS hash
Symmetric RSS hash functions are beneficial in applications that monitor
both Tx and Rx packets of the same flow (IDS, software firewalls, ..etc).
Getting all traffic of the same flow on the same RX queue results in
higher CPU cache efficiency.
A NIC that supports "symmetric-xor" can achieve this RSS hash symmetry
by XORing the source and destination fields and pass the values to the
RSS hash algorithm.
The user may request RSS hash symmetry for a specific algorithm, via:
# ethtool -X eth0 hfunc <hash_alg> symmetric-xor
or turn symmetry off (asymmetric) by:
# ethtool -X eth0 hfunc <hash_alg>
The specific fields for each flow type should then be specified as usual
via:
# ethtool -N|-U eth0 rx-flow-hash <flow_type> s|d|f|n
Ahmed Zaki [Wed, 13 Dec 2023 00:33:15 +0000 (17:33 -0700)]
net: ethtool: get rid of get/set_rxfh_context functions
Add the RSS context parameters to struct ethtool_rxfh_param and use the
get/set_rxfh to handle the RSS contexts as well.
This is part 2/2 of the fix suggested in [1]:
- Add a rss_context member to the argument struct and a capability
like cap_link_lanes_supported to indicate whether driver supports
rss contexts, then you can remove *et_rxfh_context functions,
and instead call *et_rxfh() with a non-zero rss_context.
Ahmed Zaki [Wed, 13 Dec 2023 00:33:14 +0000 (17:33 -0700)]
net: ethtool: pass a pointer to parameters to get/set_rxfh ethtool ops
The get/set_rxfh ethtool ops currently takes the rxfh (RSS) parameters
as direct function arguments. This will force us to change the API (and
all drivers' functions) every time some new parameters are added.
This is part 1/2 of the fix, as suggested in [1]:
- First simplify the code by always providing a pointer to all params
(indir, key and func); the fact that some of them may be NULL seems
like a weird historic thing or a premature optimization.
It will simplify the drivers if all pointers are always present.
- Then make the functions take a dev pointer, and a pointer to a
single struct wrapping all arguments. The set_* should also take
an extack.
Liang Chen [Tue, 12 Dec 2023 04:46:11 +0000 (12:46 +0800)]
page_pool: transition to reference count management after page draining
To support multiple users referencing the same fragment,
'pp_frag_count' is renamed to 'pp_ref_count', transitioning pp pages
from fragment management to reference count management after draining
based on the suggestion from [1].
The idea is that the concept of fragmenting exists before the page is
drained, and all related functions retain their current names.
However, once the page is drained, its management shifts to being
governed by 'pp_ref_count'. Therefore, all functions associated with
that lifecycle stage of a pp page are renamed.
Kees Cook [Tue, 12 Dec 2023 22:09:57 +0000 (14:09 -0800)]
cxgb3: Avoid potential string truncation in desc
Builds with W=1 were warning about potential string truncations:
drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c: In function 'cxgb_up':
drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:394:38: warning: '%d' directive output may be truncated writing between 1 and 11 bytes into a region of size between 5 and 20 [-Wformat-truncation=]
394 | "%s-%d", d->name, pi->first_qset + i);
| ^~
In function 'name_msix_vecs',
inlined from 'cxgb_up' at drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:1264:3: drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:394:34: note: directive argument in the range [-2147483641, 509]
394 | "%s-%d", d->name, pi->first_qset + i);
| ^~~~~~~
drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:393:25: note: 'snprintf' output between 3 and 28 bytes into a destination of size 21
393 | snprintf(adap->msix_info[msi_idx].desc, n,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
394 | "%s-%d", d->name, pi->first_qset + i);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avoid open-coded %NUL-termination (this code was assuming snprintf
wasn't %NUL terminating when it does -- likely thinking of strncpy),
and grow the size of the string to handle a maximal value.
Kees Cook [Tue, 12 Dec 2023 22:13:12 +0000 (14:13 -0800)]
amd-xgbe: Avoid potential string truncation in name
Build with W=1 were warning about a potential string truncation:
drivers/net/ethernet/amd/xgbe/xgbe-drv.c: In function 'xgbe_alloc_channels':
drivers/net/ethernet/amd/xgbe/xgbe-drv.c:211:73: warning: '%u' directive output may be truncated writing between 1 and 10 bytes into a region of size 8 [-Wformat-truncation=]
211 | snprintf(channel->name, sizeof(channel->name), "channel-%u", i);
| ^~
drivers/net/ethernet/amd/xgbe/xgbe-drv.c:211:64: note: directive argument in the range [0, 4294967294]
211 | snprintf(channel->name, sizeof(channel->name), "channel-%u", i);
| ^~~~~~~~~~~~
drivers/net/ethernet/amd/xgbe/xgbe-drv.c:211:17: note: 'snprintf' output between 10 and 19 bytes into a destination of size 16
211 | snprintf(channel->name, sizeof(channel->name), "channel-%u", i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Increase the size of the "name" buffer to handle the full format range.
====================
idpf: add get/set for Ethtool's header split ringparam
Currently, the header split feature (putting headers in one smaller
buffer and then the data in a separate bigger one) is always enabled
in idpf when supported.
One may want to not have fragmented frames per each packet, for example,
to avoid XDP frags. To better optimize setups for particular workloads,
add ability to switch the header split state on and off via Ethtool's
ringparams, as well as to query the current status.
There's currently only GET in the Ethtool Netlink interface for now,
so add SET first. I suspect idpf is not the only one supporting this.
====================
Michal Kubiak [Tue, 12 Dec 2023 14:27:52 +0000 (15:27 +0100)]
idpf: add get/set for Ethtool's header split ringparam
idpf supports the header split feature and that feature is always
enabled by default.
However, for flexibility reasons and to simplify some scenarios, it
would be useful to have the support for switching the header split
off (and on) from the userspace.
Address that need by adding the user config parameter, the functions
for disabling (or enabling) the header split feature, and calls to
them from the Ethtool ringparam callbacks.
It still is enabled by default if supported by the hardware.
Follow up commit 9690ae604290 ("ethtool: add header/data split
indication") and add the set part of Ethtool's header split, i.e.
ability to enable/disable header split via the Ethtool Netlink
interface. This might be helpful to optimize the setup for particular
workloads, for example, to avoid XDP frags, and so on.
A driver should advertise ``ETHTOOL_RING_USE_TCP_DATA_SPLIT`` in its
ops->supported_ring_params to allow doing that. "Unknown" passed from
the userspace when the header split is supported means the driver is
free to choose the preferred state.
Oleksij Rempel [Tue, 12 Dec 2023 05:41:43 +0000 (06:41 +0100)]
net: phy: c45: add genphy_c45_pma_read_ext_abilities() function
Move part of the genphy_c45_pma_read_abilities() code to a separate
function.
Some PHYs do not implement PMA/PMD status 2 register (Register 1.8) but
do implement PMA/PMD extended ability register (Register 1.11). To make
use of it, we need to be able to access this part of code separately.
====================
net/sched: optimizations around action binding and init
Scaling optimizations for action binding in rtnl-less filters.
We saw a noticeable lock contention around idrinfo->lock when
testing in a 56 core system, which disappeared after the patches.
====================
Pedro Tammela [Mon, 11 Dec 2023 18:18:07 +0000 (15:18 -0300)]
net/sched: act_api: skip idr replace on bound actions
tcf_idr_insert_many will replace the allocated -EBUSY pointer in
tcf_idr_check_alloc with the real action pointer, exposing it
to all operations. This operation is only needed when the action pointer
is created (ACT_P_CREATED). For actions which are bound to (returned 0),
the pointer already resides in the idr making such operation a nop.
Even though it's a nop, it's still not a cheap operation as internally
the idr code walks the idr and then does a replace on the appropriate slot.
So if the action was bound, better skip the idr replace entirely.
Pedro Tammela [Mon, 11 Dec 2023 18:18:06 +0000 (15:18 -0300)]
net/sched: act_api: rely on rcu in tcf_idr_check_alloc
Instead of relying only on the idrinfo->lock mutex for
bind/alloc logic, rely on a combination of rcu + mutex + atomics
to better scale the case where multiple rtnl-less filters are
binding to the same action object.
Action binding happens when an action index is specified explicitly and
an action exists which such index exists. Example:
tc actions add action drop index 1
tc filter add ... matchall action drop index 1
tc filter add ... matchall action drop index 1
tc filter add ... matchall action drop index 1
tc filter ls ...
filter protocol all pref 49150 matchall chain 0 filter protocol all pref 49150 matchall chain 0 handle 0x1
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 4 bind 3
filter protocol all pref 49151 matchall chain 0 filter protocol all pref 49151 matchall chain 0 handle 0x1
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 4 bind 3
filter protocol all pref 49152 matchall chain 0 filter protocol all pref 49152 matchall chain 0 handle 0x1
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 4 bind 3
When no index is specified, as before, grab the mutex and allocate
in the idr the next available id. In this version, as opposed to before,
it's simplified to store the -EBUSY pointer instead of the previous
alloc + replace combination.
When an index is specified, rely on rcu to find if there's an object in
such index. If there's none, fallback to the above, serializing on the
mutex and reserving the specified id. If there's one, it can be an -EBUSY
pointer, in which case we just try again until it's an action, or an action.
Given the rcu guarantees, the action found could be dead and therefore
we need to bump the refcount if it's not 0, handling the case it's
in fact 0.
As bind and the action refcount are already atomics, these increments can
happen without the mutex protection while many tcf_idr_check_alloc race
to bind to the same action instance.
In case binding encounters a parallel delete or add, it will return
-EAGAIN in order to try again. Both filter and action apis already
have the retry machinery in-place. In case it's an unlocked filter it
retries under the rtnl lock.
====================
virtio-net: support dynamic coalescing moderation
Now, virtio-net already supports per-queue moderation parameter
setting. Based on this, we use the linux dimlib to support
dynamic coalescing moderation for virtio-net.
Due to some scheduling issues, we only support and test the rx dim.
Some test results:
I. Sockperf UDP
=================================================
1. Env
rxq_0 with affinity to cpu_0.
3. Result
dim off: 1143277.00 rxpps, throughput 17.844 MBps, cpu is 100%.
dim on: 1124161.00 rxpps, throughput 17.610 MBps, cpu is 83.5%.
=================================================
II. Redis
=================================================
1. Env
There are 8 rxqs, and rxq_i with affinity to cpu_i.
2. Result
When all cpus are 100%, ops/sec of memtier_benchmark client is
dim off: 978437.23
dim on: 1143638.28
=================================================
III. Nginx
=================================================
1. Env
There are 8 rxqs and rxq_i with affinity to cpu_i.
2. Result
When all cpus are 100%, requests/sec of wrk client is
dim off: 877931.67
dim on: 1019160.31
=================================================
IV. Latency of sockperf udp
=================================================
1. Rx cmd
taskset -c 0 sockperf sr -p 8989
After running this cmd 5 times and averaging the results,
3. Result
dim off: 17.7735 usec
dim on: 18.0110 usec
=================================================
Changelog:
v7->v8:
- Add select DIMLIB.
v6->v7:
- Drop the patch titled "spin lock for ctrl cmd access"
- Use rtnl_trylock to avoid the deadlock.
v5->v6:
- Add patch(4/5): spin lock for ctrl cmd access
- Patch(5/5):
- Use spin lock and cancel_work_sync to synchronize
v4->v5:
- Patch(4/4):
- Fix possible synchronization issues with cancel_work_sync.
- Reduce if/else nesting levels
v3->v4:
- Patch(5/5): drop.
v2->v3:
- Patch(4/5): some minor modifications.
v1->v2:
- Patch(2/5): a minor fix.
- Patch(4/5):
- improve the judgment of dim switch conditions.
- Cancel the work when vq reset.
- Patch(5/5): drop the tx dim implementation.
====================
Heng Qi [Mon, 11 Dec 2023 10:36:07 +0000 (18:36 +0800)]
virtio-net: support rx netdim
By comparing the traffic information in the complete napi processes,
let the virtio-net driver automatically adjust the coalescing
moderation parameters of each receive queue.
Heng Qi [Mon, 11 Dec 2023 10:36:04 +0000 (18:36 +0800)]
virtio-net: returns whether napi is complete
rx netdim needs to count the traffic during a complete napi process,
and start updating and comparing samples to make decisions after
the napi ends. Let virtqueue_napi_complete() return true if napi is done,
otherwise vice versa.
Shannon Nelson [Mon, 11 Dec 2023 18:58:04 +0000 (10:58 -0800)]
ionic: fill out pci error handlers
Set up the pci_error_handlers error_detected and resume to be useful in
handling AER events. If the error detected is pci_channel_io_frozen we
set up to do an FLR at the end of the AER handling - this tends to clear
things up well enough that traffic can continue. Else, let the AER/PCI
machinery do what is needed for the less serious errors seen.
Shannon Nelson [Mon, 11 Dec 2023 18:58:03 +0000 (10:58 -0800)]
ionic: lif debugfs refresh on reset
Remove and restore the lif's debugfs pointers on a reset,
and make sure to check for the dentry before removing it
in case an earlier reset failed to rebuild the lif.
Shannon Nelson [Mon, 11 Dec 2023 18:58:01 +0000 (10:58 -0800)]
ionic: no fw read when PCI reset failed
If there was a failed attempt to reset the PCI connection,
don't later try to read from PCI as the space is unmapped
and will cause a paging request crash. When clearing the PCI
setup we can clear the dev_info register pointer, and check
it before using it in the fw_running test.
Shannon Nelson [Mon, 11 Dec 2023 18:58:00 +0000 (10:58 -0800)]
ionic: prevent pci disable of already disabled device
If a reset fails, the PCI device is left in a disabled
state, so don't try to disable it again on driver remove.
This prevents a scary looking WARN trace in the kernel log.
Shannon Nelson [Mon, 11 Dec 2023 18:57:59 +0000 (10:57 -0800)]
ionic: bypass firmware cmds when stuck in reset
If the driver or firmware is stuck in reset state, don't bother
trying to use adminq commands. This speeds up shutdown and
prevents unnecessary timeouts and error messages.
This includes a bit of rework on ionic_adminq_post_wait()
and ionic_adminq_post_wait_nomsg() to both use
__ionic_adminq_post_wait() which can do the checks needed in
both cases.
Shannon Nelson [Mon, 11 Dec 2023 18:57:58 +0000 (10:57 -0800)]
ionic: keep filters across FLR
Make sure we keep and replay the filters and RSS config across
an FLR by using our FW_RESET flag. This gets checked on the
way down and on the way back up to help determine how much LIF
state to keep and restore across a reset action.
Shannon Nelson [Mon, 11 Dec 2023 18:57:57 +0000 (10:57 -0800)]
ionic: pass opcode to devcmd_wait
Don't rely on the PCI memory for the devcmd opcode because we
read a 0xff value if the PCI bus is broken, which can cause us
to report a bogus dev_cmd opcode later.
David S. Miller [Wed, 13 Dec 2023 10:34:28 +0000 (10:34 +0000)]
Merge branch 'net-at803x-cleanups'
Christian Marangi says:
====================
net: phy: at803x: cleanup
The intention of this big series is to try to cleanup the big
at803x PHY driver.
It currently have 3 different family of PHY in it. at803x, qca83xx
and qca808x.
The current codebase required lots of cleanup and reworking to
make the split possible as currently there is a greater use of
adding special function matching the phy_id.
This has been reworked to make the function actually generic
and make the change only in more specific one. The result
is the addition of micro additional function but that is for good
as it massively simplify splitting the driver later.
Consider that this is all in preparation for the addition of
qca807x PHY driver that will also uso some of the functions of
at803x.
Subsequent series will come with the actual PHY split and other
required cleanup. This is only to start the process with minor
changes.
Changes v4:
- Improve at8031_probe function
Changes v3:
- Add Reviewed-by tag from Andrew
- Split patch 10 (at8031 rename) to rename and move
Changes v2:
- Drop split part due to series too big
- Split changes even more
- Fix problem pointed out by Russell (flawed reworked function logic)
- Add Reviewed-by tag from Andrew
- Minor rework to prevent further code duplication for cdt
====================
net: phy: at803x: drop specific PHY ID check from cable test functions
Drop specific PHY ID check for cable test functions for at803x. This is
done to make functions more generic. While at it better describe what
the functions does by using more symbolic function names.
PHYs that requires to set additional reg are moved to specific function
calling the more generic one.
cdt_start and cdt_wait_for_completion are changed to take an additional
arg to pass specific values specific to the PHY.
net: phy: at803x: move specific at8031 WOL bits to dedicated function
Move specific at8031 WOL enable/disable to dedicated function to make
at803x_set_wol more generic.
This is needed in preparation for PHY driver split as qca8081 share the
same function to toggle WOL settings.
In this new implementation WOL module in at8031 is enabled after the
generic interrupt is setup. This should not cause any problem as the
WOL_INT has a separate implementation and only relay on MAC bits.
net: phy: at803x: raname hw_stats functions to qca83xx specific name
The function and the struct related to hw_stats were specific to qca83xx
PHY but were called following the convention in the driver of calling
everything with at803x prefix.
To better organize the code, rename these function a more specific name
to better describe that they are specific to 83xx PHY family.
Jiri Pirko [Thu, 7 Dec 2023 15:12:04 +0000 (16:12 +0100)]
dpll: remove leftover mode_supported() op and use mode_get() instead
Mode supported is currently reported to the user exactly the same, as
the current mode. That's because mode changing is not implemented.
Remove the leftover mode_supported() op and use mode_get() to fill up
the supported mode exposed to user.
One, if even, mode changing is going to be introduced, this could be
very easily taken back. In the meantime, prevent drivers form
implementing this in wrong way (as for example recent netdevsim
implementation attempt intended to do).
Jakub Kicinski [Wed, 13 Dec 2023 00:06:00 +0000 (16:06 -0800)]
Merge branch 'bnxt_en-update-for-net-next'
Michael Chan says:
====================
bnxt_en: Update for net-next
The first 4 patches in the series fix issues in the net-next tree
introduced in the last 4 weeks. The first 3 patches fix ring accounting
and indexing logic. The 4th patch fix TX timeout when the TX ring is
very small.
The next 7 patches add new features on the P7 chips, including TX
coalesced completions, VXLAN GPE and UDP GSO stateless offload, a
new rx_filter_miss counters, and more QP backing store memory for
RoCE.
The last 2 patches are PTP improvements.
====================
Pavan Chebbi [Tue, 12 Dec 2023 00:51:22 +0000 (16:51 -0800)]
bnxt_en: Make PTP TX timestamp HWRM query silent
In a busy network, especially with flow control enabled, we may
experience timestamp query failures fairly regularly. After a while,
dmesg may be flooded with timestamp query failure error messages.
Silence the error message from the low level hwrm function that
sends the firmware message. Change netdev_err() to netdev_WARN_ONCE()
if this FW call ever fails.
Pavan Chebbi [Tue, 12 Dec 2023 00:51:21 +0000 (16:51 -0800)]
bnxt_en: Skip nic close/open when configuring tstamp filters
We don't have to close and open the nic to make sure we have
valid rx timestamps. Once we have the timestamp filter applied to
the HW and the timestamp_fld_format bit is cleared in the rx
completion and the timestamp is non-zero, we can be sure that rx
timestamp is valid data.
rx_filter_miss counter is newly added to the rx_port_stats_ext
stats structure for newer chips. Newer firmware will return the
structure size that includes this counter. Add this entry to
the bnxt_port_stats_ext_arr array and the ethtool -S code will
pick up this counter if it is supported.
Michael Chan [Tue, 12 Dec 2023 00:51:18 +0000 (16:51 -0800)]
bnxt_en: Configure UDP tunnel TPA
On the new P7 chips, TPA for tunnel packets can be independently
enabled for each VNIC. The default TPA configuration should not
include UDP tunnels because the UDP ports for these tunnels are not
known yet. The chip should not aggregate these UDP tunneled packets
using default UDP ports until the ports are known.
Add a new function bnxt_hwrm_vnic_update_tunl_tpa() to enable VXLAN
and Geneve TPA if the corresponding UDP ports are known.
Michael Chan [Tue, 12 Dec 2023 00:51:17 +0000 (16:51 -0800)]
bnxt_en: Add support for VXLAN GPE
Add a new bnxt_udp_tunnels_p7 struct to support the new P7 chips that
can parse VXLAN GPE packets. Add VXLAN GPE tunnel type handling to
the .set_port() and .unset_port() functions. .ndo_features_check()
is also enhanced to support VXLAN GPE which may encapsulate inner
IP packets instead of ethernet packets.
Michael Chan [Tue, 12 Dec 2023 00:51:16 +0000 (16:51 -0800)]
bnxt_en: Use proper TUNNEL_DST_PORT_ALLOC* commands
In bnxt_udp_tunnel_set_port(), use the proper ALLOC commands instead
of the FREE commands for correctness. The ALLOC and FREE commands
happen to be identical so this is just a cosmetic fix for correctness.
Selvin Xavier [Tue, 12 Dec 2023 00:51:15 +0000 (16:51 -0800)]
bnxt_en: Allocate extra QP backing store memory when RoCE FW reports it
The Fast QP modify destroy RoCE feature requires additional QP entries
in QP context backing store. FW reports the extra count to be
allocated during backing store query. Use this value and allocate extra
memory. Note that this works for both the V1 and V1 backing store
FW APIs.
Michael Chan [Tue, 12 Dec 2023 00:51:14 +0000 (16:51 -0800)]
bnxt_en: Support TX coalesced completion on 5760X chips
TX coalesced completions are supported on newer chips to provide
one TX completion record for multiple TX packets up to the
sq_cons_idx in the completion record. This method saves PCIe
bandwidth by reducing the number of TX completions.
Only very minor changes are now required to support this mode
with the new framework that handles TX completions based on
the consumer indices.
Michael Chan [Tue, 12 Dec 2023 00:51:13 +0000 (16:51 -0800)]
bnxt_en: Prevent TX timeout with a very small TX ring
If xmit_more condition is true, the driver may set the
TX_BD_FLAGS_NO_CMPL flag. If after this packet, the TX ring can no
longer hold a packet with maximum fragments, we will stop the TX
queue. When this happens, we must clear the TX_BD_FLAGS_NO_CMPL flag
on the last packet or there will be no completion and cause TX
timeout.
Michael Chan [Tue, 12 Dec 2023 00:51:12 +0000 (16:51 -0800)]
bnxt_en: Fix TX ring indexing logic
Two spots were missed when modifying the TX ring indexing logic.
The use of unmasked TX index in bnxt_tx_int() will cause unnecessary
__bnxt_tx_int() calls. The same issue in bnxt_tx_int_xdp() can
result in illegal array index.
Somnath Kotur [Tue, 12 Dec 2023 00:51:11 +0000 (16:51 -0800)]
bnxt_en: Fix AGG ring check logic in bnxt_check_rings()
_bnxt_get_max_rings() that is invoked in bnxt_check_rings() already
accounts for the AGG ring(s) and gives a max value based on that.
Increasing for AGG rings before calling _bnxt_get_max_rings() will
result in checking for twice the number of rings than required and
it can fail. Fix it by adjusting for AGG rings after calling
_bnxt_get_max_rings().
Michael Chan [Tue, 12 Dec 2023 00:51:10 +0000 (16:51 -0800)]
bnxt_en: Fix trimming of P5 RX and TX rings
The recent commit to trim the RX and TX rings on P5 chips by assigning
each with max CP rings divided by 2 is not correct. Max CP rings
divided by 2 may be bigger than the original RX or TX and would
lead to failure. In other words, we may be checking for increased
RX/TX rings than required and it may fail.
Fix it by calling __bnxt_trim_rings() instead that would properly
trim RX and TX without the possibility of increasing their values.
Justin Stitt [Mon, 11 Dec 2023 19:10:00 +0000 (19:10 +0000)]
net: mdio-gpio: replace deprecated strncpy with strscpy
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
We expect new_bus->id to be NUL-terminated but not NUL-padded based on
its prior assignment through snprintf:
| snprintf(new_bus->id, MII_BUS_ID_SIZE, "gpio-%x", bus_id);
Due to this, a suitable replacement is `strscpy` [2] due to the fact
that it guarantees NUL-termination on the destination buffer without
unnecessarily NUL-padding.
We can also use sizeof() instead of a length macro as this more closely
ties the maximum buffer size to the destination buffer. Do this for two
instances.
Jakub Kicinski [Tue, 12 Dec 2023 23:14:16 +0000 (15:14 -0800)]
Merge tag 'pef2256-framer' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Linus Walleij says:
====================
Immutable tag for the PEF2256 framer
* tag 'pef2256-framer' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
MAINTAINERS: Add the Lantiq PEF2256 driver entry
pinctrl: Add support for the Lantic PEF2256 pinmux
net: wan: framer: Add support for the Lantiq PEF2256 framer
dt-bindings: net: Add the Lantiq PEF2256 E1/T1/J1 framer
net: wan: Add framer framework support
====================
Herve Codina [Tue, 28 Nov 2023 13:25:33 +0000 (14:25 +0100)]
pinctrl: Add support for the Lantic PEF2256 pinmux
The Lantiq PEF2256 is a framer and line interface component designed to
fulfill all required interfacing between an analog E1/T1/J1 line and the
digital PCM system highway/H.100 bus.
This kind of component can be found in old telecommunication system.
It was used to digital transmission of many simultaneous telephone calls
by time-division multiplexing. Also using HDLC protocol, WAN networks
can be reached through the framer.
This pinmux support handles the pin muxing part (pins RP(A..D) and pins
XP(A..D)) of the PEF2256.
Herve Codina [Tue, 28 Nov 2023 13:25:32 +0000 (14:25 +0100)]
net: wan: framer: Add support for the Lantiq PEF2256 framer
The Lantiq PEF2256 is a framer and line interface component designed to
fulfill all required interfacing between an analog E1/T1/J1 line and the
digital PCM system highway/H.100 bus.
Herve Codina [Tue, 28 Nov 2023 13:25:31 +0000 (14:25 +0100)]
dt-bindings: net: Add the Lantiq PEF2256 E1/T1/J1 framer
The Lantiq PEF2256 is a framer and line interface component designed to
fulfill all required interfacing between an analog E1/T1/J1 line and the
digital PCM system highway/H.100 bus.
Herve Codina [Tue, 28 Nov 2023 13:25:30 +0000 (14:25 +0100)]
net: wan: Add framer framework support
A framer is a component in charge of an E1/T1 line interface.
Connected usually to a TDM bus, it converts TDM frames to/from E1/T1
frames. It also provides information related to the E1/T1 line.
The framer framework provides a set of APIs for the framer drivers
(framer provider) to create/destroy a framer and APIs for the framer
users (framer consumer) to obtain a reference to the framer, and
use the framer.
This basic implementation provides a framer abstraction for:
- power on/off the framer
- get the framer status (line state)
- be notified on framer status changes
- get/set the framer configuration
Dmitry Antipov [Mon, 11 Dec 2023 09:05:32 +0000 (12:05 +0300)]
net: asix: fix fortify warning
When compiling with gcc version 14.0.0 20231129 (experimental) and
CONFIG_FORTIFY_SOURCE=y, I've noticed the following warning:
...
In function 'fortify_memcpy_chk',
inlined from 'ax88796c_tx_fixup' at drivers/net/ethernet/asix/ax88796c_main.c:287:2:
./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field'
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Wattribute-warning]
588 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
This call to 'memcpy()' is interpreted as an attempt to copy TX_OVERHEAD
(which is 8) bytes from 4-byte 'sop' field of 'struct tx_pkt_info' and
thus overread warning is issued. Since we actually want to copy both
'sop' and 'seg' fields at once, use the convenient 'struct_group()' here.
Linus Walleij [Sat, 9 Dec 2023 22:37:35 +0000 (23:37 +0100)]
net: dsa: realtek: Rewrite RTL8366RB MTU handling
The MTU callbacks are in layer 1 size, so for example 1500
bytes is a normal setting. Cache this size, and only add
the layer 2 framing right before choosing the setting. On
the CPU port this will however include the DSA tag since
this is transmitted from the parent ethernet interface!
Add the layer 2 overhead such as ethernet and VLAN framing
and FCS before selecting the size in the register.
This will make the code easier to understand.
The rtl8366rb_max_mtu() callback returns a bogus MTU
just subtracting the CPU tag, which is the only thing
we should NOT subtract. Return the correct layer 1
max MTU after removing headers and checksum.
Andy Shevchenko [Fri, 8 Dec 2023 15:33:27 +0000 (17:33 +0200)]
net: dl2k: Use proper conversion of dev_addr before IO to device
The driver is using iowriteXX()/ioreadXX() APIs which are LE IO
accessors simplified as
1. Convert given value _from_ CPU _to_ LE
2. Write it to the device as is
The dev_addr is a byte stream, but because the driver uses 16-bit
IO accessors, it wants to perform double conversion on BE CPUs,
but it took it wrong, as it effectivelly does two times _from_ CPU
_to_ LE. What it has to do is to consider dev_addr as an array of
LE16 and hence do _from_ LE _to_ CPU conversion, followed by implied
_from_ CPU _to_ LE in the iowrite16().
To achieve that, use get_unaligned_le16(). This will make it correct
and allows to avoid sparse warning as reported by LKP.
====================
net/sched: conditional notification of events for cls and act
This is an optimization we have been leveraging on P4TC but we believe
it will benefit rtnl users in general.
It's common to allocate an skb, build a notification message and then
broadcast an event. In the absence of any user space listeners, these
resources (cpu and memory operations) are wasted. In cases where the subsystem
is lockless (such as in tc-flower) this waste is more prominent. For the
scenarios where the rtnl_lock is held it is not as prominent.
The idea is simple. Build and send the notification iif:
- The user requests via NLM_F_ECHO or
- Someone is listening to the rtnl group (tc mon)
On a simple test with tc-flower adding 1M entries, using just a single core,
there's already a noticeable difference in the cycles spent in tc_new_tfilter
with this patchset.
Note, the above test is using iproute2:tc which execs a shell.
We expect people using netlink directly to observe even greater
reductions.
The qdisc side needs some refactoring of the notification routines to fit in
this new model, so they will be sent in a later patchset.
====================
Pedro Tammela [Fri, 8 Dec 2023 19:28:47 +0000 (16:28 -0300)]
net/sched: cls_api: conditional notification of events
As of today tc-filter/chain events are unconditionally built and sent to
RTNLGRP_TC. As with the introduction of rtnl_notify_needed we can check
before-hand if they are really needed. This will help to alleviate
system pressure when filters are concurrently added without the rtnl
lock as in tc-flower.
Pedro Tammela [Fri, 8 Dec 2023 19:28:45 +0000 (16:28 -0300)]
net/sched: act_api: conditional notification of events
As of today tc-action events are unconditionally built and sent to
RTNLGRP_TC. As with the introduction of rtnl_notify_needed we can check
before-hand if they are really needed.
Pedro Tammela [Fri, 8 Dec 2023 19:28:43 +0000 (16:28 -0300)]
rtnl: add helper to send if skb is not null
This is a convenience helper for routines handling conditional rtnl
events, that is code that might send a notification depending on
rtnl_has_listeners/rtnl_notify_needed.
rtnl: add helper to check if rtnl group has listeners
As of today, rtnl code creates a new skb and unconditionally fills and
broadcasts it to the relevant group. For most operations this is okay
and doesn't waste resources in general.
When operations are done without the rtnl_lock, as in tc-flower, such
skb allocation, message fill and no-op broadcasting can happen in all
cores of the system, which contributes to system pressure and wastes
precious cpu cycles when no one will receive the built message.
Introduce this helper so rtnetlink operations can simply check if someone
is listening and then proceed if necessary.
Johannes Berg [Fri, 8 Dec 2023 09:52:15 +0000 (10:52 +0100)]
Revert "net: rtnetlink: remove local list in __linkwatch_run_queue()"
This reverts commit b8dbbbc535a9 ("net: rtnetlink: remove local list
in __linkwatch_run_queue()"). It's evidently broken when there's a
non-urgent work that gets added back, and then the loop can never
finish.
While reverting, add a note about that.
Reported-by: Marek Szyprowski <[email protected]> Fixes: b8dbbbc535a9 ("net: rtnetlink: remove local list in __linkwatch_run_queue()") Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Fei Qin [Fri, 8 Dec 2023 06:03:01 +0000 (08:03 +0200)]
nfp: support UDP segmentation offload
The device supports UDP hardware segmentation offload, which helps
improving the performance. Thus, this patch adds support for UDP
segmentation offload from the driver side.
David S. Miller [Sun, 10 Dec 2023 19:31:42 +0000 (19:31 +0000)]
Merge branch 'rswitch-jumbo-frames'
Yoshihiro Shimoda says:
====================
net: rswitch: Add jumbo frames support
This patch series is based on the latest net-next.git / main branch.
Changes from v3:
https://lore.kernel.org/all/20231204012058.3876078[email protected]/
- Based on the latest net-next.git / main branch.
- Modify for code consistancy in the patch 3/9.
- Add a condition in the patch 3/9.
- Fix usage of dma_addr in the patch 8/9.
Changes from v2:
https://lore.kernel.org/all/20231201054655.3731772[email protected]/
- Based on the latest net-next.git / main branch.
- Fix using a variable in the patch 8/9.
- Add Reviewed-by tag in the patch 1/9.
Changes from v1:
https://lore.kernel.org/all/20231127115334.3670790[email protected]/
- Based on the latest net-next.git / main branch.
- Fix commit descriptions (s/near the future/the near future/).
====================
If the driver would like to transmit a jumbo frame like 2KiB or more,
it should be split into multiple queues. In the near future, to support
this, add handling specific descriptor types F{START,MID,END}. However,
such jumbo frames will not happen yet because the maximum MTU size is
still default for now.
If this hardware receives a jumbo frame like 2KiB or more, it will be
split into multiple queues. In the near future, to support this,
add handling specific descriptor types F{START,MID,END}. However, such
jumbo frames will not happen yet because the maximum MTU size is still
default for now.
net: rswitch: Add a setting ext descriptor function
If the driver would like to transmit a jumbo frame like 2KiB or more,
it should be split into multiple queues. In the near future, to support
this, add a setting ext descriptor function to improve code readability.
net: rswitch: Add unmap_addrs instead of dma address in each desc
If the driver would like to transmit a jumbo frame like 2KiB or more,
it should be split into multiple queues. In the near future, to support
this, add unmap_addrs array to unmap dma mapping address instead of dma
address in each TX descriptor because the descriptors may not have
the top dma address.
If this hardware receives a jumbo frame like 2KiB or more, it will be
split into multiple queues. In the near future, to support this, use
build_skb() instead of netdev_alloc_skb_ip_align().