Add a new optional firmware-name property to the IPA DT node. It
is used only if the modem is not doing early initialization (i.e.,
if the modem-init property is not present). Its value is the name
of the firmware file to use; if it's not specified, a default name
("ipa_fws.mdt") is used.
Xuan Zhuo [Fri, 16 Apr 2021 09:16:15 +0000 (17:16 +0800)]
virtio-net: page_to_skb() use build_skb when there's sufficient tailroom
In page_to_skb(), if we have enough tailroom to save skb_shared_info, we
can use build_skb to create skb directly. No need to alloc for
additional space. And it can save a 'frags slot', which is very friendly
to GRO.
Here, if the payload of the received package is too small (less than
GOOD_COPY_LEN), we still choose to copy it directly to the space got by
napi_alloc_skb. So we can reuse these pages.
Testing Machine:
The four queues of the network card are bound to the cpu1.
Test command:
for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& done
The size of the udp package is 1000, so in the case of this patch, there
will always be enough tailroom to use build_skb. The sent udp packet
will be discarded because there is no port to receive it. The irqsoftd
of the machine is 100%, we observe the received quantity displayed by
sar -n DEV 1:
no build_skb: 956864.00 rxpck/s
build_skb: 1158465.00 rxpck/s
The MHI WWWAN control driver allows MHI QCOM-based modems to expose
different modem control protocols/ports via the WWAN framework, so that
userspace modem tools or daemon (e.g. ModemManager) can control WWAN
config and state (APN config, SMS, provider selection...). A QCOM-based
modem can expose one or several of the following protocols:
- AT: Well known AT commands interactive protocol (microcom, minicom...)
- MBIM: Mobile Broadband Interface Model (libmbim, mbimcli)
- QMI: QCOM MSM/Modem Interface (libqmi, qmicli)
- QCDM: QCOM Modem diagnostic interface (libqcdm)
- FIREHOSE: XML-based protocol for Modem firmware management
(qmi-firmware-update)
Note that this patch is mostly a rework of the earlier MHI UCI
tentative that was a generic interface for accessing MHI bus from
userspace. As suggested, this new version is WWAN specific and is
dedicated to only expose channels used for controlling a modem, and
for which related opensource userpace support exist.
This change introduces initial support for a WWAN framework. Given the
complexity and heterogeneity of existing WWAN hardwares and interfaces,
there is no strict definition of what a WWAN device is and how it should
be represented. It's often a collection of multiple devices that perform
the global WWAN feature (netdev, tty, chardev, etc).
One usual way to expose modem controls and configuration is via high
level protocols such as the well known AT command protocol, MBIM or
QMI. The USB modems started to expose them as character devices, and
user daemons such as ModemManager learnt to use them.
This initial version adds the concept of WWAN port, which is a logical
pipe to a modem control protocol. The protocols are rawly exposed to
user via character device, allowing straigthforward support in existing
tools (ModemManager, ofono...). The WWAN core takes care of the generic
part, including character device management, and relies on port driver
operations to receive/submit protocol data.
Since the different devices exposing protocols for a same WWAN hardware
do not necessarily know about each others (e.g. two different USB
interfaces, PCI/MHI channel devices...) and can be created/removed in
different orders, the WWAN core ensures that all WAN ports contributing
to the 'whole' WWAN feature are grouped under the same virtual WWAN
device, relying on the provided parent device (e.g. mhi controller,
USB device). It's a 'trick' I copied from Johannes's earlier WWAN
subsystem proposal.
This initial version is purposely minimalist, it's essentially moving
the generic part of the previously proposed mhi_wwan_ctrl driver inside
a common WWAN framework, but the implementation is open and flexible
enough to allow extension for further drivers.
Stefan Chulski [Fri, 16 Apr 2021 08:15:17 +0000 (11:15 +0300)]
net: mvpp2: Add parsing support for different IPv4 IHL values
Add parser entries for different IPv4 IHL values.
Each entry will set the L4 header offset according to the IPv4 IHL field.
L3 header offset will set during the parsing of the IPv4 protocol.
Because of missed parser support for IP header length > 20, RX IPv4 checksum HW offload fails
and skb->ip_summed set to CHECKSUM_NONE(checksum done by Network stack).
This patch adds RX IPv4 checksum HW offload capability for frames with IP header length > 20.
Hayes Wang [Fri, 16 Apr 2021 08:04:34 +0000 (16:04 +0800)]
r8152: add help function to change mtu
The different chips may have different requests when changing mtu.
Therefore, add a new help function of rtl_ops to change mtu. Besides,
reset the tx/rx after changing mtu.
Additionally, add mtu_to_size() and size_to_mtu() macros to simplify
the code.
Hayes Wang [Fri, 16 Apr 2021 08:04:32 +0000 (16:04 +0800)]
r8152: set inter fram gap time depending on speed
Set the maximum inter frame gap time (144ns) for speed 10M/half and
100M/half. It improves the performance for those speeds. And, there
is no effect for the other speeds.
For 10M/half and 100M/half, the fast inter frame gap time let the
device couldn't use the feature of the aggregation effectively,
because the transfer would be completed fastly. Therefore, use the
maximum value to improve the effect of the aggregation. However, you
may not feel the improvement for fast CPUs, because they compensate
for the effect of the aggregation.
The intention is for the loop to timeout if the body does not succeed.
The current logic calls time_is_before_jiffies(timeout) which is false
until after the timeout, so the loop body never executes.
Fix by using readl_poll_timeout as a more standard and less error-prone
solution.
Fixes: ba37b7caf1ed ("net: ethernet: mtk_eth_soc: add support for initializing the PPE") Signed-off-by: Ilya Lipnitskiy <[email protected]> Cc: Felix Fietkau <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
MPTCP sockets have previously had limited socket option support. The
architecture of MPTCP sockets (one userspace-facing MPTCP socket that
manages one or more in-kernel TCP subflow sockets) adds complexity for
passing options through to lower levels. This patch set adds MPTCP
support for socket options commonly used with TCP.
Patch 1 reverts an interim socket option fix (a socket option blocklist)
that was merged in the net tree for v5.12.
Patch 2 moves the socket option code to a separate file, with no
functional changes.
Patch 3 adds an allowlist for socket options that are known to function
with MPTCP. Later patches in this set add more allowed options.
Patches 4 and 5 add infrastructure for syncing MPTCP-level options with
the TCP subflows.
Patches 6-12 add support for specific socket options.
Patch 13 adds a socket option self test.
====================
TCP_CONGESTION is set for all subflows.
The mptcp socket gains icsk_ca_ops too so it can be used to keep the
authoritative state that should be set on new/future subflows.
TCP_INFO will return first subflow only.
The out-of-tree kernel has a MPTCP_INFO getsockopt, this could be added
later on.
Paolo Abeni suggested to avoid re-syncing new subflows because
they inherit options from listener. In case options were set on
listener but are not set on mptcp-socket there is no need to
do any synchronisation for new subflows.
This change sets sockopt_seq of new mptcp sockets to the seq of
the mptcp listener sock.
Subflow sequence is set to the embedded tcp listener sk.
Add a comment explaing why sk_state is involved in sockopt_seq
generation.
mptcp: add skeleton to sync msk socket options to subflows
Handle following cases:
1. setsockopt is called with multiple subflows.
Change might have to be mirrored to all of them.
This is done directly in process context/setsockopt call.
2. Outgoing subflow is created after one or several setsockopt()
calls have been made. Old setsockopt changes should be
synced to the new socket.
3. Incoming subflow, after setsockopt call(s).
Cases 2 and 3 are handled right after the join list is spliced to the conn
list.
Not all sockopt values can be just be copied by value, some require
helper calls. Those can acquire socket lock (which can sleep).
If the join->conn list splicing is done from preemptible context,
synchronization can be done right away, otherwise its deferred to work
queue.
mptcp: revert "mptcp: forbit mcast-related sockopt on MPTCP sockets"
This change reverts commit 86581852d771 ("mptcp: forbit mcast-related sockopt on MPTCP sockets").
As announced in the cover letter of the mentioned patch above, the
following commits introduce a larger MPTCP sockopt implementation
refactor.
This time, we switch from a blocklist to an allowlist. This is safer for
the future where new sockoptions could be added while not being fully
supported with MPTCP sockets and thus causing unstabilities.
atl1c: move tx cleanup processing out of interrupt
Tx queue cleanup happens in interrupt handler on same core as rx queue
processing. Both can take considerable amount of processing in high
packet-per-second scenarios.
Sending big amounts of packets can stall the rx processing which is
unfair and also can lead to out-of-memory condition since
__dev_kfree_skb_irq queues the skbs for later kfree in softirq which
is not allowed to happen with heavy load in interrupt handler.
This puts tx cleanup in its own napi and enables threaded napi to
allow the rx/tx queue processing to happen on different cores.
The ability to sustain equal amounts of tx/rx traffic increased:
from 280Kpps to 1130Kpps on Threadripper 3960X with upcoming
Mikrotik 10/25G NIC,
from 520Kpps to 850Kpps on Intel i3-3320 with Mikrotik RB44Ge adapter.
David S. Miller [Fri, 16 Apr 2021 22:15:45 +0000 (15:15 -0700)]
Merge branch 'BR_FDB_LOCAL'
Vladimir Oltean says:
====================
Pass the BR_FDB_LOCAL information to switchdev drivers
Bridge FDB entries with the is_local flag are entries which are
terminated locally and not forwarded. Switchdev drivers might want to be
notified of these addresses so they can trap them. If they don't program
these entries to hardware, there is no guarantee that they will do the
right thing with these entries, and they won't be, let's say, flooded.
Ideally none of the switchdev drivers should ignore these entries, but
having access to the is_local bit is the bare minimum change that should
be done in the bridge layer, before this is even possible.
These 2 changes are extracted from the larger "RX filtering in DSA"
series:
https://patchwork.kernel.org/project/netdevbpf/patch/20210224114350.2791260[email protected]/
https://patchwork.kernel.org/project/netdevbpf/patch/20210224114350.2791260[email protected]/
and submitted separately, because they touch all switchdev drivers,
while the rest is mostly specific to DSA.
This change is not a functional one, in the sense that everybody still
ignores the local FDB entries, but this will be changed by further
patches at least for DSA.
====================
Vladimir Oltean [Wed, 14 Apr 2021 16:52:56 +0000 (19:52 +0300)]
net: bridge: switchdev: include local flag in FDB notifications
As explained in bugfix commit 6ab4c3117aec ("net: bridge: don't notify
switchdev for local FDB addresses") as well as in this discussion:
https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/
the switchdev notifiers for FDB entries managed to have a zero-day bug,
which was that drivers would not know what to do with local FDB entries,
because they were not told that they are local. The bug fix was to
simply not notify them of those addresses.
Let us now add the 'is_local' bit to bridge FDB entries, and make all
drivers ignore these entries by their own choice.
Instead of having to add more and more arguments to
br_switchdev_fdb_call_notifiers, get rid of it and build the info
struct directly in br_switchdev_fdb_notify.
David S. Miller [Fri, 16 Apr 2021 00:08:30 +0000 (17:08 -0700)]
Merge branch 'ehtool-fec-stats'
Jakub Kicinski says:
====================
ethtool: add standard FEC statistics
This set adds uAPI for reporting standard FEC statistics, and
implements it in a handful of drivers.
The statistics are taken from the IEEE standard, with one
extra seemingly popular but not standard statistics added.
The implementation is similar to that of the pause frame
statistics, user requests the stats by setting a bit
(ETHTOOL_FLAG_STATS) in the common ethtool header of
ETHTOOL_MSG_FEC_GET.
Since standard defines the statistics per lane what's
reported is both total and per-lane counters:
# ethtool -I --show-fec eth0
FEC parameters for eth0:
Configured FEC encodings: None
Active FEC encoding: None
Statistics:
corrected_blocks: 256
Lane 0: 255
Lane 1: 1
uncorrectable_blocks: 145
Lane 0: 128
Lane 1: 17
v2: check for errors in mlx5 register access
====================
Jakub Kicinski [Thu, 15 Apr 2021 22:53:17 +0000 (15:53 -0700)]
sfc: ef10: implement ethtool::get_fec_stats
Report what appears to be the standard block counts:
- 30.5.1.1.17 aFECCorrectedBlocks
- 30.5.1.1.18 aFECUncorrectableBlocks
Don't report the per-lane symbol counts, if those really
count symbols they are not what the standard calls for
(even if symbols seem like the most useful thing to count.)
Fingers crossed that fec_corrected_errors is not in symbols.
Jakub Kicinski [Thu, 15 Apr 2021 22:53:15 +0000 (15:53 -0700)]
ethtool: add FEC statistics
Similarly to pause statistics add stats for FEC.
The IEEE standard mandates two sets of counters:
- 30.5.1.1.17 aFECCorrectedBlocks
- 30.5.1.1.18 aFECUncorrectableBlocks
where block is a block of bits FEC operates on.
Each of these counters is defined per lane (PCS instance).
Multiple vendors provide number of corrected _bits_ rather
than/as well as blocks.
This set adds the 2 standard-based block counters and a extra
one for corrected bits.
Counters are exposed to user space via netlink in new attributes.
Each attribute carries an array of u64s, first element is
the total count, and the following ones are a per-lane break down.
Much like with pause stats the operation will not fail when driver
does not implement the get_fec_stats callback (nor can the driver
fail the operation by returning an error). If stats can't be
reported the relevant attributes will be empty.
Jakub Kicinski [Thu, 15 Apr 2021 22:53:14 +0000 (15:53 -0700)]
ethtool: fec_prepare_data() - jump to error handling
Refactor fec_prepare_data() a little bit to skip the body
of the function and exit on error. Currently the code
depends on the fact that we only have one call which
may fail between ethnl_ops_begin() and ethnl_ops_complete()
and simply saves the error code. This will get hairy with
the stats also being queried.
Yangbo Lu [Thu, 15 Apr 2021 05:34:55 +0000 (13:34 +0800)]
enetc: convert to schedule_work()
Convert system_wq queue_work() to schedule_work() which is
a wrapper around it, since the former is a rare construct.
Fixes: 7294380c5211 ("enetc: support PTP Sync packet one-step timestamping") Signed-off-by: Yangbo Lu <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
net: hns3: VF not request link status when PF support push link status feature
To reduce the processing of unnecessary mailbox command when PF supports
actively push its link status to VFs, VFs stop sending request link
status command in periodic service task in this case.
net: hns3: PF add support for pushing link status to VFs
Previously, VF updates its link status every second by send query command
to PF in periodic service task. If link stats of PF is changed, VF may
need at most one second to update its link status.
To reduce delay of link status between PF and VFs, PF actively push its
link status to VFs when its link status is updated. And to let VF know
PF supports this new feature, the link status changed mailbox command
adds one bit to indicate it.
David Bauer [Thu, 15 Apr 2021 01:26:50 +0000 (03:26 +0200)]
net: phy: at803x: select correct page on config init
The Atheros AR8031 and AR8033 expose different registers for SGMII/Fiber
as well as the copper side of the PHY depending on the BT_BX_REG_SEL bit
in the chip configure register.
The driver assumes the copper side is selected on probe, but this might
not be the case depending which page was last selected by the
bootloader. Notably, Ubiquiti UniFi bootloaders show this behavior.
Select the copper page when probing to circumvent this.
David S. Miller [Thu, 15 Apr 2021 23:45:14 +0000 (16:45 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-04-14
This series contains updates to ice driver only.
Bruce changes and removes open coded values to instead use existing
kernel defines and suppresses false cppcheck issues.
Ani adds new VSI states to track netdev allocation and registration. He
also removes leading underscores in the ice_pf_state enum.
Jesse refactors ITR by introducing helpers to reduce duplicated code and
structures to simplify checking of ITR mode. He also triggers a software
interrupt when exiting napi poll or busy-poll to ensure all work is
processed. Modifies /proc/iomem to display driver name instead of PCI
address. He also changes the checks of vsi->type to use a local variable
in ice_vsi_rebuild() and removes an unneeded struct member.
Jake replaces the driver's adaptive interrupt moderation algorithm to
use the kernel's DIM library implementation.
Scott reworks module reads to reduce the number of reads needed and
remove excessive increment of QSFP page.
Brett sets the vsi->vf_id to invalid for non-VF VSIs.
Paul removes the return value from ice_vsi_manage_rss_lut() as it's not
communicating anything critical. He also reduces the scope of a
variable.
====================
We were saving the return value from ice_vsi_manage_rss_lut(), but
the errors from that function are not critical so change it to
return void and remove the code that saved the value.
Brett Creeley [Wed, 31 Mar 2021 21:17:04 +0000 (14:17 -0700)]
ice: Set vsi->vf_id as ICE_INVAL_VFID for non VF VSI types
Currently the vsi->vf_id is set only for ICE_VSI_VF and it's left as 0
for all other VSI types. This is confusing and could be problematic
since 0 is a valid vf_id. Fix this by always setting non VF VSI types to
ICE_INVAL_VFID.
Jesse Brandeburg [Wed, 31 Mar 2021 21:17:03 +0000 (14:17 -0700)]
ice: remove unused struct member
The only time you can ever have a rq_last_status is if
a firmware event was somehow reporting a status on the receive
queue, which are generally firmware initiated events or
mailbox messages from a VF. Mostly this struct member was unused.
Fix this problem by still printing the value of the field in a debug
print, but don't store the value forever in a struct, potentially
creating opportunities for callers to use the wrong struct member.
Jesse Brandeburg [Wed, 31 Mar 2021 21:17:01 +0000 (14:17 -0700)]
ice: print name in /proc/iomem
The driver previously printed it's PCI address in
the name field for the pci resource, which when displayed
via /proc/iomem, would print the same thing twice.
It's more useful for debugging to see the driver name, as
most other modules do.
Scott W Taylor [Wed, 31 Mar 2021 21:17:00 +0000 (14:17 -0700)]
ice: Reimplement module reads used by ethtool
There was an excessive increment of the QSFP page, which is
now fixed. Additionally, this new update now reads 8 bytes
at a time and will retry each request if the module/bus is
busy.
Also, prevent reading from upper pages if module does not
support those pages.
Jesse Brandeburg [Wed, 31 Mar 2021 21:16:59 +0000 (14:16 -0700)]
ice: refactor ITR data structures
Use a dedicated bitfield in order to both increase
the amount of checking around the length of ITR writes
as well as simplify the checks of dynamic mode.
Basically unpack the "high bit means dynamic" logic
into bitfields.
Jesse Brandeburg [Wed, 31 Mar 2021 21:16:58 +0000 (14:16 -0700)]
ice: manage interrupts during poll exit
The driver would occasionally miss that there were outstanding
descriptors to clean when exiting busy/napi poll. This issue has
been in the code since the introduction of the ice driver.
Attempt to "catch" any remaining work by triggering a software
interrupt when exiting napi poll or busy-poll. This will not
cause extra interrupts in the case of normal execution.
This issue was found when running sfnt-pingpong, with busy
poll enabled, and typically with larger I/O sizes like > 8192,
the program would occasionally report > 1 second maximums
to complete a ping pong.
Jacob Keller [Wed, 31 Mar 2021 21:16:57 +0000 (14:16 -0700)]
ice: replace custom AIM algorithm with kernel's DIM library
The ice driver has support for adaptive interrupt moderation, an
algorithm for tuning the interrupt rate dynamically. This algorithm
is based on various assumptions about ring size, socket buffer size,
link speed, SKB overhead, ethernet frame overhead and more.
The Linux kernel has support for a dynamic interrupt moderation
algorithm known as "dimlib". Replace the custom driver-specific
implementation of dynamic interrupt moderation with the kernel's
algorithm.
The Intel hardware has a different hardware implementation than the
originators of the dimlib code had to work with, which requires the
driver to use a slightly different set of inputs for the actual
moderation values, while getting all the advice from dimlib of
better/worse, shift left or right.
The change made for this implementation is to use a pair of values
for each of the 5 "slots" that the dimlib moderation expects, and
the driver will program those pairs when dimlib recommends a slot to
use. The currently implementation uses two tables, one for receive
and one for transmit, and the pairs of values in each slot set the
maximum delay of an interrupt and a maximum number of interrupts per
second (both expressed in microseconds).
There are two separate kinds of bugs fixed by using DIMLIB, one is
UDP single stream send was too slow, and the other is that 8K
ping-pong was going to the most aggressive moderation and has much
too high latency.
The overall result of using DIMLIB is that we meet or exceed our
performance expectations set based on the old algorithm.
Jesse Brandeburg [Wed, 31 Mar 2021 21:16:56 +0000 (14:16 -0700)]
ice: refactor interrupt moderation writes
Introduce several new helpers for writing ITR and GLINT_RATE
registers, and refactor the code calling them. This resulted
in removal of several duplicate functions and rolled a bunch
of simple code back into the calling routines.
In particular this removes some code that was doing both
a store and a set in a helper function, which seems better
done as separate tasks in the caller (and generally takes
less lines of code even with a tiny bit of repetition).
ice: Add new VSI states to track netdev alloc/registration
Add two new VSI states, one to track if a netdev for the VSI has been
allocated and the other to track if the netdev has been registered.
Call unregister_netdev/free_netdev only when the corresponding state
bits are set.
Bruce Allan [Tue, 2 Mar 2021 18:12:06 +0000 (10:12 -0800)]
ice: use kernel definitions for IANA protocol ports and ether-types
The well-known IANA protocol port 3260 (iscsi-target 0x0cbc) and the
ether-types 0x8906 (ETH_P_FCOE) and 0x8914 (ETH_P_FIP) are already defined
in kernel header files. Use those definitions instead of open-coding the
same.
this is a pull request of a single patch for net-next/master.
Vincent Mailhol's patch fixes a NULL pointer dereference when handling
error frames in the etas_es58x driver, which has been added in the
previous PR.
====================
net: bridge: propagate error code and extack from br_mc_disabled_update
Some Ethernet switches might only be able to support disabling multicast
snooping globally, which is an issue for example when several bridges
span the same physical device and request contradictory settings.
Propagate the return value of br_mc_disabled_update() such that this
limitation is transmitted correctly to user-space.
David S. Miller [Wed, 14 Apr 2021 21:14:03 +0000 (14:14 -0700)]
Merge tag 'mlx5-updates-2021-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-04-13
mlx5 core and netdev driver updates
1) E-Switch updates from Parav,
1.1) Devlink parameter to control mlx5 metadata enablement for E-Switch
1.2) Trivial cleanups for E-Switch code
1.3) Dynamically allocate vport steering namespaces only when required
2) From Jianbo, Use variably sized data structures for Software steering
Michael Walle [Wed, 14 Apr 2021 14:48:14 +0000 (16:48 +0200)]
net: enetc: fetch MAC address from device tree
Normally, the bootloader will already initialize the MAC address
registers of the ENETC and the driver will just use them or generate a
random one, if it is not initialized.
Add a new way to provide the MAC address: via device tree. Besides the
usual 'mac-address' property, there is also the possibility to fetch it
via a NVMEM provider. The sl28 board stores the MAC address in the SPI
NOR flash OTP region. Having this will allow linux to fetch the MAC
address from there without being dependent on the bootloader.
No in-tree boards have the device tree properties set, thus for these,
this is a no-op.
Paolo Abeni [Wed, 14 Apr 2021 10:48:48 +0000 (12:48 +0200)]
skbuff: revert "skbuff: remove some unnecessary operation in skb_segment_list()"
the commit 1ddc3229ad3c ("skbuff: remove some unnecessary operation
in skb_segment_list()") introduces an issue very similar to the
one already fixed by commit 53475c5dd856 ("net: fix use-after-free when
UDP GRO with shared fraglist").
If the GSO skb goes though skb_clone() and pskb_expand_head() before
entering skb_segment_list(), the latter will unshare the frag_list
skbs and will release the old list. With the reverted commit in place,
when skb_segment_list() completes, skb->next points to the just
released list, and later on the kernel will hit UaF.
Note that since commit e0e3070a9bc9 ("udp: properly complete L4 GRO
over UDP tunnel packet") the critical scenario can be reproduced also
receiving UDP over vxlan traffic with:
NIC (NETIF_F_GRO_FRAGLIST enabled) -> vxlan -> UDP sink
Attaching a packet socket to the NIC will cause skb_clone() and the
tunnel decapsulation will call pskb_expand_head().
Fixes: 1ddc3229ad3c ("skbuff: remove some unnecessary operation in skb_segment_list()") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Tan Tee Min [Wed, 14 Apr 2021 00:16:17 +0000 (08:16 +0800)]
net: stmmac: Add support for external trigger timestamping
The Synopsis MAC controller supports auxiliary snapshot feature that
allows user to store a snapshot of the system time based on an external
event.
This patch add supports to the above mentioned feature. Users will be
able to triggered capturing the time snapshot from user-space using
application such as testptp or any other applications that uses the
PTP_EXTTS_REQUEST ioctl request.
David S. Miller [Wed, 14 Apr 2021 19:56:44 +0000 (12:56 -0700)]
Merge branch 'marvell-88x2222-improvements'
Ivan Bornyakov says:
====================
net: phy: marvell-88x2222: a couple of improvements
First, there are some SFP modules that only uses RX_LOS for link
indication. Add check that link is operational before actual read of
line-side status.
Second, it is invalid to set 10G speed without autonegotiation,
according to phy_ethtool_ksettings_set(). Implement switching between
10GBase-R and 1000Base-X/SGMII if autonegotiation can't complete but
there is signal in line.
Changelog:
v1 -> v2:
* make checking that link is operational more friendly for
trancievers without SFP cages.
* split swapping 1G/10G modes into non-functional and functional
commits for the sake of easier review.
====================
Ivan Bornyakov [Tue, 13 Apr 2021 20:54:52 +0000 (23:54 +0300)]
net: phy: marvell-88x2222: swap 1G/10G modes on autoneg
Setting 10G without autonegotiation is invalid according to
phy_ethtool_ksettings_set(). Thus, we need to set it during
autonegotiation.
If 1G autonegotiation can't complete for quite a time, but there is
signal in line, switch line interface type to 10GBase-R, if supported,
in hope for link to be established.
And vice versa. If 10GBase-R link can't be established for quite a time,
and autonegotiation is enabled, and there is signal in line, switch line
interface type to appropriate 1G mode, i.e. 1000Base-X or SGMII, if
supported.
Ivan Bornyakov [Tue, 13 Apr 2021 20:54:50 +0000 (23:54 +0300)]
net: phy: marvell-88x2222: check that link is operational
Some SFP modules uses RX_LOS for link indication. In such cases link
will be always up, even without cable connected. RX_LOS changes will
trigger link_up()/link_down() upstream operations. Thus, check that SFP
link is operational before actual read link status.
If there is no SFP cage connected to the tranciever, check only PMD
Recieve Signal Detect register.
Colin Ian King [Tue, 6 Apr 2021 16:53:46 +0000 (17:53 +0100)]
net/mlx5: Fix bit-wise and with zero
The bit-wise and of the action field with MLX5_ACCEL_ESP_ACTION_DECRYPT
is incorrect as MLX5_ACCEL_ESP_ACTION_DECRYPT is zero and not intended
to be a bit-flag. Fix this by using the == operator as was originally
intended.
Addresses-Coverity: ("Logically dead code") Fixes: 7dfee4b1d79e ("net/mlx5: IPsec, Refactor SA handle creation and destruction") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Jianbo Liu [Thu, 12 Nov 2020 01:32:52 +0000 (01:32 +0000)]
net/mlx5: DR, Use variably sized data structures for different actions
mlx5dr_action is a generally used data structure, and there is an
union for different types of actions in it. The size of mlx5dr_action
is about 72 bytes, but for those actions with fewer fields, most of
the allocated memory is wasted.
Remove this union, and mlx5dr_action becomes a generic action header.
Then actions are dynamically allocated with needed memory, the data
for each action is stored right after the header.
Parav Pandit [Mon, 1 Mar 2021 12:12:13 +0000 (14:12 +0200)]
net/mlx5: E-Switch, Initialize eswitch acls ns when eswitch is enabled
Currently eswitch flow steering (FS) namespace of vport's ingress and
egress ACL are enabled when FS layer is initialized. This is done even
when eswitch is diabled. This demands that total eswitch ports to be
known to FS layer without eswitch in use.
Given the FS core is not dependent on eswitch, make namespace init and
cleanup routines as helper routines to be invoked only when eswitch is
needed.
With this change, ingress and egress ACL namespaces are created only
when eswitch legacy/offloads mode is enabled.
Parav Pandit [Fri, 30 Oct 2020 20:44:16 +0000 (22:44 +0200)]
net/mlx5: E-Switch, let user to enable disable metadata
Currently each packet inserted in eswitch is tagged with a internal
metadata to indicate source vport. Metadata tagging is not always
needed. Metadata insertion is needed for multi-port RoCE, failover
between representors and stacked devices. In many other cases,
metadata enablement is not needed.
Metadata insertion slows down the packet processing rate of the E-switch
when it is in switchdev mode.
Below table show performance gain with metadata disabled for VXLAN
offload rules in both SMFS and DMFS steering mode on ConnectX-5 device.
Hence, allow user to disable metadata using driver specific devlink
parameter. Metadata setting of the eswitch is applicable only for the
switchdev mode.
Example to show and disable metadata before changing eswitch mode:
$ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
pci/0000:06:00.0:
name esw_port_metadata type driver-specific
values:
cmode runtime value true
$ devlink dev param set pci/0000:06:00.0 \
name esw_port_metadata value false cmode runtime
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Reviewed-by: Vu Pham <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
---
changelog:
v1->v2:
- added performance numbers in commit log
- updated commit log and documentation for switchdev mode
- added explicit note on when user can disable metadata in
documentation
Vincent Mailhol [Tue, 13 Apr 2021 11:42:42 +0000 (20:42 +0900)]
can: etas_es58x: fix null pointer dereference when handling error frames
During the handling of CAN bus errors, a CAN error SKB is allocated
using alloc_can_err_skb(). Even if the allocation of the SKB fails,
the function continues in order to do the stats handling.
All access to the can_frame pointer (cf) should be guarded by an if
statement:
if (cf)
However, the increment of the rx_bytes stats:
netdev->stats.rx_bytes += cf->can_dlc;
dereferences the cf pointer and was not guarded by an if condition
leading to a NULL pointer dereference if the can_err_skb() function
failed.
Replacing the cf->can_dlc by the macro CAN_ERR_DLC (which is the
length of any CAN error frames) solves this NULL pointer dereference.
David S. Miller [Tue, 13 Apr 2021 22:12:19 +0000 (15:12 -0700)]
Merge branch 'dpaa2-switch-tc-hw-offload'
Ioana Ciornei says:
====================
dpaa2-switch: add tc hardware offload on ingress traffic
This patch set adds tc hardware offload on ingress traffic in
dpaa2-switch. The cls flower and matchall classifiers are supported
using the same ACL infrastructure supported by the dpaa2-switch.
The first patch creates a new structure to hold all the necessary
information related to an ACL table. This structure is used in the next
patches to create a link between each switch port and the table used.
Multiple ports can share the same ACL table when they also share the
ingress tc block. Also, some small changes in the priority of the
default STP trap is done in the second patch.
The support for cls flower is added in the 3rd patch, while the 4th
one builds on top of the infrastructure put in place and adds cls
matchall support.
The following flow keys are supported:
- Ethernet: dst_mac/src_mac
- IPv4: dst_ip/src_ip/ip_proto/tos
- VLAN: vlan_id/vlan_prio/vlan_tpid/vlan_dei
- L4: dst_port/src_port
Each filter can support only one action from the following list:
- drop
- mirred egress redirect
- trap
With the last patch, we reuse the dpaa2_switch_acl_entry_add() function
added previously instead of open-coding the install of a new ACL entry
into the table.
====================
dpaa2-switch: reuse dpaa2_switch_acl_entry_add() for STP frames trap
Since we added the dpaa2_switch_acl_entry_add() function in the previous
patches to hide all the details of actually adding the ACL entry by
issuing a firmware command, let's use it also for adding a CPU trap for
the STP frames.