net: dsa: mv88e6xxx: Only allow LAG offload on supported hardware
There are chips that do not have Global 2 registers, and therefore trunk
mapping/mask tables are not available. Refuse the offload as early as
possible on those devices.
Fixes: 57e661aae6a8 ("net: dsa: mv88e6xxx: Link aggregation support")
Signed-off-by: Tobias Waldekranz <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
net: dsa: mv88e6xxx: Provide dummy implementations for trunk setters
Support for Global 2 registers is build-time optional. In the case
where it was not enabled the build would fail as no "dummy"
implementation of these functions was available.
This series adds a DSA driver for the Arrow SpeedChips XRS 7000 series
of HSR/PRP gigabit switch chips.
The chips use Flexibilis IP.
More information can be found here:
https://www.flexibilis.com/products/speedchips-xrs7000/
The switches have up to three RGMII ports and one MII port and are
managed via mdio or i2c. They use a one byte trailing tag to identify
the switch port when in managed mode so I've added a tag driver which
implements this.
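The trailing-tag scheme can be pictured with a small userspace sketch. It is not the tag driver itself: the function names are hypothetical, and the choice of one tag bit per port is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRAILER_LEN 1	/* one-byte trailing tag, as described above */
#define MAX_PORTS   8	/* assumption: one tag bit per port, so at most 8 */

/*
 * Append the tag to a frame of `len` bytes stored in a buffer of `cap` bytes.
 * Returns the new frame length, or 0 if the port is invalid or the tag
 * does not fit.
 */
static size_t tag_xmit(uint8_t *frame, size_t len, size_t cap, unsigned int port)
{
	assert(frame != NULL);
	if (port >= MAX_PORTS || len >= cap)
		return 0;
	frame[len] = (uint8_t)(1u << port);
	return len + TRAILER_LEN;
}

/*
 * Read the tag of a received frame. On success stores the source port and
 * the payload length (frame without tag) and returns 0. Returns -1 if the
 * tag is missing or does not name exactly one port.
 */
static int tag_rcv(const uint8_t *frame, size_t len,
		   size_t *payload_len, unsigned int *port)
{
	uint8_t tag;
	unsigned int p = 0;

	assert(frame != NULL && payload_len != NULL && port != NULL);
	if (len < TRAILER_LEN)
		return -1;
	tag = frame[len - TRAILER_LEN];
	if (tag == 0 || (tag & (tag - 1)) != 0)
		return -1;
	while (!(tag & 1u)) {
		tag >>= 1;
		p++;
	}
	*port = p;
	*payload_len = len - TRAILER_LEN;
	return 0;
}
```

A trailer keeps the Ethernet header untouched, so the tagger only has to grow or trim the tail of the frame.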
This series contains minimal DSA functionality which may be built upon
in future patches. The ultimate goal is to add HSR and PRP
(IEC 62439-3 Clause 5 & 4) offloading with integration into net/hsr.
====================
Add a driver with initial support for the Arrow SpeedChips XRS7000
series of gigabit Ethernet switch chips which are typically used in
critical networking applications.
The switches have up to three RGMII ports and one RMII port.
Management to the switches can be performed over i2c or mdio.
Support for advanced features such as PTP and
HSR/PRP (IEC 62439-3 Clause 5 & 4) is not included in this patch and
may be added at a later date.
====================
Add further DT configuration for AT803x PHYs
This patch series adds the ability to configure the SmartEEE feature
in AT803x PHYs. SmartEEE defaults to enabled on these PHYs, and has
a history of causing random sporadic link drops at Gigabit speeds.
There appear to be two solutions to this. There is the approach that
Freescale adopted early on, which is to disable the SmartEEE feature.
However, this loses the power saving provided by EEE. Another solution,
found by Jon Nettleton, is to increase the Tw parameter for Gigabit
links.
This patch series adds support for both approaches, by adding a boolean:
qca,disable-smarteee
if one wishes to disable SmartEEE, and two properties to configure the
SmartEEE Tw parameters:
qca,smarteee-tw-us-100m
qca,smarteee-tw-us-1g
Sadly, the PHY quirk I merged a while back for AT8035 on iMX6 is broken
- rather than disabling SmartEEE mode, it enables it.
The addition of these properties will be sent to the appropriate
platform maintainers - although for SolidRun platforms, we only make use
of "qca,smarteee-tw-us-1g".
====================
Russell King [Thu, 14 Jan 2021 10:45:49 +0000 (10:45 +0000)]
net: phy: at803x: add support for configuring SmartEEE
SmartEEE for the atheros phy was deemed buggy by Freescale and commits
were added to disable it for their boards.
In initial testing, SolidRun found that the default settings were
causing disconnects but by increasing the Tw buffer time we could allow
enough time for all parts of the link to come out of a low power state
and function properly without causing a disconnect. This allows us to
have functional power savings of between 300 and 400mW, rather than
disabling the feature altogether.
This commit adds support for disabling SmartEEE and configuring the Tw
parameters for 1G and 100M speeds.
Russell King [Thu, 14 Jan 2021 10:45:44 +0000 (10:45 +0000)]
dt: ar803x: document SmartEEE properties
The SmartEEE feature of Atheros AR803x PHYs can cause the link to
bounce. Add DT properties to allow SmartEEE to be disabled, and to
allow the Tw parameters for 100M and 1G links to be configured.
hi,
adding the support to have buildid stored in mmap2 event,
so we can bypass the final perf record hunt on build ids.
This patchset allows perf to record build ID in mmap2 event,
and adds perf tooling to store/download binaries to .debug
cache based on these build IDs.
Note that the build id retrieval code is stolen from bpf
code, where it's been used (together with file offsets)
to replace IPs in user space stack traces. It's now added
under lib directory.
v7 changes:
- included only missing kernel patches, cc-ed bpf@vger and
rebased on bpf-next/master [Alexei]
v6 changes:
- last 4 patches rebased on Arnaldo's perf/core
v5 changes:
- rebased on latest perf/core
- several patches already pulled in
- fixed trace+probe_vfs_getname.sh output redirection
- fixed changelogs [Arnaldo]
- renamed BUILD_ID_SIZE to BUILD_ID_SIZE_MAX [Song]
v4 changes:
- fixed typo in changelog [Namhyung]
- removed force_download bool from struct dso_store_data,
because it's not used [Namhyung]
v3 changes:
- added acks
- removed forgotten debug code [Arnaldo]
- fixed readlink termination [Ian]
- fixed doc for --debuginfod=URLs [Ian]
- adopted kernel's memchr_inv function and used
it in build_id__is_defined function [Arnaldo]
On recording server:
- on the recording server we can run record with --buildid-mmap
option to store build ids in mmap2 events:
# perf record --buildid-mmap
^C[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.836 MB perf.data ]
- it stores nothing to ~/.debug cache:
# find ~/.debug
find: ‘/root/.debug’: No such file or directory
Jiri Olsa [Thu, 14 Jan 2021 13:40:44 +0000 (14:40 +0100)]
perf: Add build id data in mmap2 event
Adding support to carry build id data in mmap2 event.
The build id data replaces maj/min/ino/ino_generation
fields, which are also used to identify map's binary,
so it's ok to replace them with build id data:
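The overlap can be sketched as a C union. This is an illustration only, not the uapi definition: the type name, the member names `dev` and `bid`, and the reserved fields are assumptions, and BUILD_ID_SIZE_MAX is taken to be 20 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUILD_ID_SIZE_MAX 20	/* assumed value */

/* The bytes of an mmap2 record that identify the mapped binary. */
union mmap2_ident {
	struct {			/* classic identification */
		uint32_t maj;
		uint32_t min;
		uint64_t ino;
		uint64_t ino_generation;
	} dev;
	struct {			/* build ID stored in the same bytes */
		uint8_t  build_id_size;
		uint8_t  reserved_1;
		uint16_t reserved_2;
		uint8_t  build_id[BUILD_ID_SIZE_MAX];
	} bid;
};

/* Both views must cover the same 24 bytes for the replacement to work. */
_Static_assert(sizeof(((union mmap2_ident *)0)->dev) == 24, "dev view size");
_Static_assert(sizeof(((union mmap2_ident *)0)->bid) == 24, "bid view size");

/* Store a build ID in the record; returns 0 on success, -1 on a bad size. */
static int ident_set_build_id(union mmap2_ident *id,
			      const uint8_t *data, size_t size)
{
	assert(id != NULL && data != NULL);
	if (size == 0 || size > BUILD_ID_SIZE_MAX)
		return -1;
	memset(id, 0, sizeof(*id));
	id->bid.build_id_size = (uint8_t)size;
	memcpy(id->bid.build_id, data, size);
	return 0;
}
```

Because the two views have the same size, the record layout before and after the union does not move, which is what lets existing parsers keep working.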
There's still one unresolved review comment from John[3] which I
will resolve with a followup patch.
Differences from v6->v7 [1]:
* Fixed riscv build error detected by 0-day robot.
Differences from v5->v6 [1]:
* Carried Björn Töpel's ack for RISC-V code, plus a couple more acks from
Yonghong.
* Doc fixups.
* Trivial cleanups.
Differences from v4->v5 [1]:
* Fixed bogus type casts in interpreter that led to warnings from
the 0day robot.
* Dropped feature-detection for Clang per Andrii's suggestion in [4].
The selftests will now fail to build unless you have llvm-project
commit 286daafd6512. The ENABLE_ATOMICS_TEST macro is still needed
to support the no_alu32 tests.
* Carried some Acks from John and Yonghong.
* Dropped confusing usage of __atomic_exchange from prog_test in
favour of __sync_lock_test_and_set.
* [Really] got rid of all the forest of instruction macros
(BPF_ATOMIC_FETCH_ADD and friends); now there's just BPF_ATOMIC_OP
to define all the instructions as we use them in the verifier
tests. This makes the atomic ops less special in that API, and I
don't think the resulting usage is actually any harder to read.
Differences from v3->v4 [1]:
* Added one Ack from Yonghong. He acked some other patches but those
have now changed non-trivially so I didn't add those acks.
* Fixups to commit messages.
* Fixed disassembly and comments: first arg to atomic_fetch_* is a
pointer.
* Improved prog_test efficiency. BPF progs are now all loaded in a
single call, then the skeleton is re-used for each subtest.
* Dropped use of tools/build/feature in favour of a one-liner in the
Makefile.
* Dropped the commit that created an emit_neg helper in the x86
JIT. It's not used any more (it wasn't used in v3 either).
* Combined all the different filter.h macros (used to be
BPF_ATOMIC_ADD, BPF_ATOMIC_FETCH_ADD, BPF_ATOMIC_AND, etc) into
just BPF_ATOMIC32 and BPF_ATOMIC64.
* Removed some references to BPF_STX_XADD from tools/, samples/ and
lib/ that I missed before.
Differences from v2->v3 [1]:
* More minor fixes and naming/comment changes
* Dropped atomic subtract: compilers can implement this by preceding
an atomic add with a NEG instruction (which is what the x86 JIT did
under the hood anyway).
* Dropped the use of -mcpu=v4 in the Clang BPF command-line; there is
no longer an architecture version bump. Instead a feature test is
added to Kbuild - it builds a source file to check if Clang
supports BPF atomics.
* Fixed the prog_test so it no longer breaks
test_progs-no_alu32. This requires some ifdef acrobatics to avoid
complicating the prog_tests model where the same userspace code
exercises both the normal and no_alu32 BPF test objects, using the
same skeleton header.
Differences from v1->v2 [1]:
* Fixed mistakes in the netronome driver
* Added sub, add, or, xor operations
* The above led to some refactors to keep things readable. (Maybe I
should have just waited until I'd implemented these before starting
the review...)
* Replaced BPF_[CMP]SET | BPF_FETCH with just BPF_[CMP]XCHG, which
include the BPF_FETCH flag
* Added a bit of documentation. Suggestions welcome for more places
to dump this info...
The prog_test that's added depends on Clang/LLVM features added by
Yonghong in commit 286daafd6512 (was
https://reviews.llvm.org/D72184).
This only includes a JIT implementation for x86_64 - I don't plan to
implement JIT support myself for other architectures.
Operations
==========
This patchset adds atomic operations to the eBPF instruction set. The
use-case that motivated this work was a trivial and efficient way to
generate globally-unique cookies in BPF progs, but I think it's
obvious that these features are pretty widely applicable. The
instructions that are added here can be summarised with this list of
kernel operations:
The following are left out of scope for this effort:
* 16 and 8 bit operations
* Explicit memory barriers
Encoding
========
I originally planned to add new values for bpf_insn.opcode. This was
rather unpleasant: the opcode space has holes in it but no entire
instruction classes[2]. Yonghong Song had a better idea: use the
immediate field of the existing STX XADD instruction to encode the
operation. This works nicely, without breaking existing programs,
because the immediate field is currently reserved-must-be-zero, and
extra-nicely because BPF_ADD happens to be zero.
Note that this of course makes immediate-source atomic operations
impossible. It's hard to imagine a measurable speedup from such
instructions, and if it existed it would certainly not benefit x86,
which has no support for them.
The BPF_OP opcode fields are re-used in the immediate, and an
additional flag BPF_FETCH is used to mark instructions that should
fetch a pre-modification value from memory.
So, BPF_XADD is now called BPF_ATOMIC (the old name is kept to avoid
breaking userspace builds), and where we previously had .imm = 0, we
now have .imm = BPF_ADD (which is 0).
Operands
========
Reg-source eBPF instructions only have two operands, while these
atomic operations have up to four. To avoid needing to encode
additional operands, then:
- One of the input registers is re-used as an output register
(e.g. atomic_fetch_add both reads from and writes to the source
register).
- Where necessary (i.e. for cmpxchg), R0 is "hard-coded" as one of
the operands.
This approach also allows the new eBPF instructions to map directly
to single x86 instructions.
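The operand convention can be emulated in userspace with a tiny register file. This is a sketch under assumptions: the `emu_*` names are hypothetical, the memory operand is passed as a plain pointer rather than through a destination register plus offset, and the GCC/Clang `__atomic` builtins stand in for the kernel's atomics.

```c
#include <assert.h>
#include <stdint.h>

enum { R0, R1, R2, NUM_REGS };

/* fetch_add: the source register is read as the addend, then overwritten
 * with the value that was in memory before the add. */
static void emu_fetch_add(uint64_t regs[], uint64_t *mem, int src)
{
	assert(src >= 0 && src < NUM_REGS);
	regs[src] = __atomic_fetch_add(mem, regs[src], __ATOMIC_SEQ_CST);
}

/* cmpxchg: R0 is the implicit "old" operand and also receives the value
 * found in memory, whether or not the exchange happened. */
static void emu_cmpxchg(uint64_t regs[], uint64_t *mem, int src)
{
	uint64_t expected = regs[R0];

	assert(src >= 0 && src < NUM_REGS);
	/* On failure the builtin writes the current memory value into
	 * `expected`; on success `expected` already equals the old value. */
	__atomic_compare_exchange_n(mem, &expected, regs[src], 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	regs[R0] = expected;
}
```

After emu_cmpxchg(), comparing R0 with the value it held before the call tells the program whether the exchange took place.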
[3] Comment from John about propagating bounds in verifier:
https://lore.kernel.org/bpf/[email protected]/
[4] Mail from Andrii about not supporting old Clang in selftests:
https://lore.kernel.org/bpf/CAEf4BzYBddPaEzRUs=jaWSo5kbf=LZdb7geAUVj85GxLQztuAQ@mail.gmail.com/
====================
Brendan Jackman [Thu, 14 Jan 2021 18:17:50 +0000 (18:17 +0000)]
bpf: Add tests for new BPF atomic operations
The prog_test that's added depends on Clang/LLVM features added by
Yonghong in commit 286daafd6512 (was https://reviews.llvm.org/D72184).
Note the use of a define called ENABLE_ATOMICS_TESTS: this is used
to:
- Avoid breaking the build for people on old versions of Clang
- Avoid needing separate lists of test objects for no_alu32, where
atomics are not supported even if Clang has the feature.
The atomics_test.o BPF object is built unconditionally both for
test_progs and test_progs-no_alu32. For test_progs, if Clang supports
atomics, ENABLE_ATOMICS_TESTS is defined, so it includes the proper
test code. Otherwise, progs and global vars are defined anyway, as
stubs; this means that the skeleton user code still builds.
The atomics_test.o userspace object is built once and used for both
test_progs and test_progs-no_alu32. A variable called skip_tests is
defined in the BPF object's data section, which tells the userspace
object whether to skip the atomics test.
All these operations are isomorphic enough to implement with the same
verifier, interpreter, and x86 JIT code, hence being a single commit.
The main interesting thing here is that x86 doesn't directly support
the fetch_ versions of these operations, so we need to generate a CMPXCHG
loop in the JIT. This requires the use of two temporary registers,
IIUC it's safe to use BPF_REG_AX and x86's AUX_REG for this purpose.
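The shape of such a loop, written in C rather than emitted machine code, might look like the sketch below. `fetch_op()` and `apply()` are hypothetical names and the GCC/Clang `__atomic` builtins are an assumption of the example; the JIT emits the equivalent x86 instructions directly.

```c
#include <assert.h>
#include <stdint.h>

enum alu_op { OP_AND, OP_OR, OP_XOR };

static uint64_t apply(enum alu_op op, uint64_t a, uint64_t b)
{
	switch (op) {
	case OP_AND: return a & b;
	case OP_OR:  return a | b;
	case OP_XOR: return a ^ b;
	}
	assert(0 && "unknown op");
	return a;
}

/* Atomically perform *mem = *mem <op> val and return the previous value. */
static uint64_t fetch_op(uint64_t *mem, uint64_t val, enum alu_op op)
{
	uint64_t old = __atomic_load_n(mem, __ATOMIC_RELAXED);

	/* If another thread changed *mem, the builtin refreshes `old` with
	 * the current value and the new result is recomputed on retry. */
	while (!__atomic_compare_exchange_n(mem, &old, apply(op, old, val), 0,
					    __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
		;
	return old;
}
```

The loop needs one temporary for the old value and one for the computed result, which matches the two scratch registers mentioned above.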
Brendan Jackman [Thu, 14 Jan 2021 18:17:48 +0000 (18:17 +0000)]
bpf: Pull out a macro for interpreting atomic ALU operations
Since the atomic operations that are added in subsequent commits are
all isomorphic with BPF_ADD, pull out a macro to avoid the
interpreter becoming dominated by lines of atomic-related code.
Note that this sacrifices interpreter performance (combining
STX_ATOMIC_W and STX_ATOMIC_DW into a single switch case means that we
need an extra conditional branch to differentiate them) in favour of
compact and (relatively!) simple C code.
Brendan Jackman [Thu, 14 Jan 2021 18:17:47 +0000 (18:17 +0000)]
bpf: Add instructions for atomic_[cmp]xchg
This adds two atomic opcodes, both of which include the BPF_FETCH
flag. XCHG without the BPF_FETCH flag would naturally encode
atomic_set. This is not supported because it would be of limited
value to userspace (it doesn't imply any barriers). CMPXCHG without
BPF_FETCH would be an atomic compare-and-write. We don't have such
an operation in the kernel so it isn't provided to BPF either.
There are two significant design decisions made for the CMPXCHG
instruction:
- This operation fundamentally has 3 operands, but we only have two
register fields. Therefore the operand we compare against (the
kernel's API calls it 'old') is hard-coded to be R0. x86 has a
similar design (and A64 doesn't have this problem).
A potential alternative might be to encode the other operand's
register number in the immediate field.
- The kernel's atomic_cmpxchg returns the old value, while the C11
userspace APIs return a boolean indicating the comparison
result. Which should BPF do? A64 returns the old value. x86 returns
the old value in the hard-coded register (and also sets a
flag). That means return-old-value is easier to JIT, so that's
what we use.
Brendan Jackman [Thu, 14 Jan 2021 18:17:46 +0000 (18:17 +0000)]
bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
The BPF_FETCH field can be set in bpf_insn.imm, for BPF_ATOMIC
instructions, in order to have the previous value of the
atomically-modified memory location loaded into the src register
after an atomic op is carried out.
Brendan Jackman [Thu, 14 Jan 2021 18:17:44 +0000 (18:17 +0000)]
bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
A subsequent patch will add additional atomic operations. These new
operations will use the same opcode field as the existing XADD, with
the immediate discriminating different operations.
In preparation, rename the instruction mode BPF_ATOMIC and start
calling the zero immediate BPF_ADD.
This is possible (doesn't break existing valid BPF progs) because the
immediate field is currently reserved MBZ and BPF_ADD is zero.
All uses are removed from the tree but the BPF_XADD definition is
kept around to avoid breaking builds for people including kernel
headers.
Brendan Jackman [Thu, 14 Jan 2021 18:17:42 +0000 (18:17 +0000)]
bpf: x86: Factor out emission of REX byte
The JIT case for encoding atomic ops is about to get more
complicated. In order to make the review & resulting code easier,
let's factor out some shared helpers.
Jakub Kicinski [Fri, 15 Jan 2021 02:24:55 +0000 (18:24 -0800)]
Merge branch 'dissect-ptp-l2-packet-header'
Eran Ben Elisha says:
====================
Dissect PTP L2 packet header
This series adds support for dissecting PTP L2 packet
header (EtherType 0x88F7).
For packet header dissecting, skb->protocol is needed. Add protocol
parsing operation to vlan ops, to guarantee skb->protocol is set,
as EtherType 0x88F7 occasionally follows a vlan header.
====================
Eran Ben Elisha [Tue, 12 Jan 2021 19:07:13 +0000 (21:07 +0200)]
net: flow_dissector: Parse PTP L2 packet header
Add support for parsing PTP L2 packet header. Such packet consists
of an L2 header (with ethertype of ETH_P_1588), PTP header, body
and an optional suffix.
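Locating the PTP header in a raw frame can be sketched as below. This is not the flow dissector code: `ptp_l2_offset()` is a hypothetical helper, only a single VLAN tag is handled, and the 34-byte length is that of the common PTP message header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN     14
#define VLAN_HLEN    4
#define ETH_P_8021Q  0x8100
#define ETH_P_1588   0x88F7
#define PTP_HDR_LEN  34		/* common PTP message header */

static uint16_t be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns the byte offset of the PTP header, or -1 if the frame is not a
 * complete PTP-over-Ethernet frame. */
static int ptp_l2_offset(const uint8_t *frame, size_t len)
{
	size_t off = ETH_HLEN;
	uint16_t proto;

	assert(frame != NULL);
	if (len < ETH_HLEN)
		return -1;
	proto = be16(frame + 12);
	if (proto == ETH_P_8021Q) {	/* EtherType follows the VLAN tag */
		if (len < ETH_HLEN + VLAN_HLEN)
			return -1;
		proto = be16(frame + 16);
		off += VLAN_HLEN;
	}
	if (proto != ETH_P_1588 || len < off + PTP_HDR_LEN)
		return -1;
	return (int)off;
}
```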
Eran Ben Elisha [Tue, 12 Jan 2021 19:07:12 +0000 (21:07 +0200)]
net: vlan: Add parse protocol header ops
Add parse protocol header ops for vlan device. Before this patch, vlan
tagged packet transmitted by af_packet had skb->protocol unset. Some
kernel methods (like __skb_flow_dissect()) rely on this information
for their packet processing.
This series implements some updates for the GSI interrupt code,
building on some bug fixes implemented last month.
The first two are simple changes made to improve readability and
consistency. The third replaces all msleep() calls with comparable
usleep_range() calls.
The remainder make some more substantive changes to make the code
align with recommendations from Qualcomm. The fourth implements a
much shorter timeout for completion of GSI commands, and the fifth
implements a longer delay between retries of the STOP channel
command. Finally, the last implements retries for stopping TX
channels (in addition to RX channels).
====================
Alex Elder [Wed, 13 Jan 2021 17:15:29 +0000 (11:15 -0600)]
net: ipa: use usleep_range()
The use of msleep() for small periods (less than 20 milliseconds) is
not recommended because the actual delay can be much different than
expected.
We use msleep(1) in several places in the IPA driver to insert short
delays. Replace them with usleep_range calls, which should reliably
delay a period in the range requested.
Alex Elder [Wed, 13 Jan 2021 17:15:28 +0000 (11:15 -0600)]
net: ipa: introduce some interrupt helpers
Create a new function gsi_irq_ev_ctrl_enable() that encapsulates
enabling the event ring control GSI interrupt type, and enables a
single event ring to signal that interrupt. When an event ring
changes state as a result of an event ring command, it triggers this
interrupt.
Create an inverse function gsi_irq_ev_ctrl_disable() as well.
Because only one event ring at a time is enabled for this interrupt,
we can simply disable the interrupt for *all* channels.
Create a pair of helpers that serve the same purpose for channel
commands.
The first two patches update the MAINTAINERS file. Lukas Bulwahn's patch fixes
the files entry for the tcan4x5x driver, which was broken by me in net-next.
A patch by me adds a missing header file to the CAN Networking Layer.
The next 5 patches are by me and split the CAN driver related
infrastructure code into more files in a separate subdir. The next two patches
by me clean up the CAN length related code. This is followed by 6 patches by
Vincent Mailhol and me; they add helper code for CAN frame length
calculation needed for BQL support.
A patch by Vincent Mailhol adds software TX timestamp support.
The last patch is by me, targets the tcan4x5x driver, and removes the unneeded
__packed attribute from the struct tcan4x5x_map_buf.
* tag 'linux-can-next-for-5.12-20210114' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: tcan4x5x: remove __packed attribute from struct tcan4x5x_map_buf
can: dev: can_put_echo_skb(): add software tx timestamps
can: dev: can_rx_offload_get_echo_skb(): extend to return can frame length
can: dev: can_get_echo_skb(): extend to return can frame length
can: dev: can_put_echo_skb(): extend to handle frame_len
can: dev: extend struct can_skb_priv to hold CAN frame length
can: length: can_skb_get_frame_len(): introduce function to get data length of frame in data link layer
can: length: canfd_sanitize_len(): add function to sanitize CAN-FD data length
can: length: can_fd_len2dlc(): simplify length calculcation
can: length: convert to kernel coding style
can: dev: move netlink related code into seperate file
can: dev: move skb related into seperate file
can: dev: move length related code into seperate file
can: dev: move bittiming related code into seperate file
can: dev: move driver related infrastructure into separate subdir
MAINTAINERS: CAN network layer: add missing header file can-ml.h
MAINTAINERS: adjust entry to tcan4x5x file split
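The length handling touched by the can: length patches above rests on the CAN FD rule that only certain payload lengths exist above 8 bytes. A minimal sketch, with hypothetical function names and not the kernel implementation:

```c
#include <assert.h>
#include <stdint.h>

#define CANFD_MAX_DLEN 64

/* CAN FD data length code -> payload length in bytes. */
static const uint8_t dlc2len[16] = {
	0, 1, 2, 3, 4, 5, 6, 7, 8, 12, 16, 20, 24, 32, 48, 64
};

_Static_assert(sizeof(dlc2len) == 16, "one entry per 4-bit DLC");

static uint8_t fd_dlc2len(uint8_t dlc)
{
	return dlc2len[dlc & 0x0F];
}

/* Smallest DLC whose payload length can hold `len` bytes. */
static uint8_t fd_len2dlc(uint8_t len)
{
	uint8_t dlc = 0;

	if (len > CANFD_MAX_DLEN)
		return 15;
	while (dlc2len[dlc] < len)
		dlc++;
	return dlc;
}

/* Round an arbitrary length up to the next valid CAN FD length. */
static uint8_t fd_sanitize_len(uint8_t len)
{
	return fd_dlc2len(fd_len2dlc(len));
}
```

Round-tripping through the DLC is what makes the sanitised value a length that can actually appear on the wire.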
====================
Jakub Kicinski [Fri, 15 Jan 2021 01:11:59 +0000 (17:11 -0800)]
Merge branch 'net-dsa-link-aggregation-support'
Tobias Waldekranz says:
====================
net: dsa: Link aggregation support
Start off by adding an extra notification when adding a port to a bond,
this allows static LAGs to be offloaded using the bonding driver.
Then add the generic support required to offload link aggregates to
drivers built on top of the DSA subsystem.
Finally, implement offloading for the mv88e6xxx driver, i.e. Marvell's
LinkStreet family.
Supported LAG implementations:
- Bonding
- Team
Supported modes:
- Isolated. The LAG may be used as a regular interface outside of any
bridge.
- Bridged. The LAG may be added to a bridge, in which case switching
is offloaded between the LAG and any other switch ports. I.e. the
LAG behaves just like a port from this perspective.
In bridged mode, the following is supported:
- STP filtering.
- VLAN filtering.
- Multicast filtering. The bridge correctly snoops IGMP and configures
the proper groups if snooping is enabled. Static groups can also be
configured. MLD seems to work, but has not been extensively tested.
- Unicast filtering. Automatic learning works. Static entries are
_not_ supported. This will be added in a later series as it requires
some more general refactoring in mv88e6xxx before I can test it.
v4 -> v5:
- Cleanup PVT configuration for LAGed ports in mv88e6xxx (Vladimir)
- Document dsa_lag_{map,unmap} (Vladimir)
====================
net: dsa: tag_dsa: Support reception of packets from LAG devices
Packets ingressing on a LAG that egress on the CPU port, which are not
classified as management, will have a FORWARD tag that does not
contain the normal source device/port tuple. Instead the trunk bit
will be set, and the port field holds the LAG id.
Since the exact source port information is not available in the tag,
frames are injected directly on the LAG interface and thus never
pass through any DSA port interface on ingress.
Management frames (TO_CPU) are not affected and will pass through the
DSA port interface as usual.
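Decoding the source of a FORWARD tag can be sketched as follows. The bit positions used here (command in the top two bits of byte 0, device in the low five bits of byte 0, port or LAG id in bits 7:3 of byte 1, trunk flag in bit 2 of byte 1) are assumptions of this example and should be checked against the tagger source; `struct fwd_src` and `dsa_forward_src()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DSA_HLEN        4
#define DSA_CMD_FORWARD 3	/* assumed command encoding */

struct fwd_src {
	bool trunk;		/* frame arrived on a LAG */
	unsigned int device;	/* valid when !trunk */
	unsigned int port;	/* valid when !trunk */
	unsigned int lag_id;	/* valid when trunk */
};

/* Returns 0 and fills `src` for FORWARD tags, -1 for any other command. */
static int dsa_forward_src(const uint8_t tag[DSA_HLEN], struct fwd_src *src)
{
	unsigned int field;

	assert(tag != NULL && src != NULL);
	if ((tag[0] >> 6) != DSA_CMD_FORWARD)
		return -1;
	field = (tag[1] >> 3) & 0x1f;
	src->trunk = (tag[1] & 0x04) != 0;
	src->device = tag[0] & 0x1f;
	src->port = src->trunk ? 0 : field;
	src->lag_id = src->trunk ? field : 0;
	return 0;
}
```

With the trunk flag set, the receiver has only a LAG id to go on, which is why the frame is handed to the LAG interface instead of a port interface.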
Monitor the following events and notify the driver when:
- A DSA port joins/leaves a LAG.
- A LAG, made up of DSA ports, joins/leaves a bridge.
- A DSA port in a LAG is enabled/disabled (enabled meaning
"distributing" in 802.3ad LACP terms).
When a LAG joins a bridge, the DSA subsystem will treat that as each
individual port joining the bridge. The driver may look at the port's
LAG device pointer to see if it is associated with any LAG, if that is
required. This is analogous to how switchdev events are replicated out
to all lower devices when reaching e.g. a LAG.
Drivers can optionally request that DSA maintain a linear mapping from
a LAG ID to the corresponding netdev by setting ds->num_lag_ids to the
desired size.
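Such a linear mapping could look like the sketch below. The table type and the `lag_*` helpers are hypothetical, NUM_LAG_IDS stands in for ds->num_lag_ids, and opaque pointers stand in for netdevs.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_LAG_IDS 4	/* stands in for ds->num_lag_ids */

struct lag_table {
	const void *dev[NUM_LAG_IDS];	/* index is the LAG ID */
};

/* Return the ID already assigned to `lag_dev`, or claim the first free
 * one. Returns -1 when the table is full. */
static int lag_map(struct lag_table *t, const void *lag_dev)
{
	int id, free_id = -1;

	assert(t != NULL && lag_dev != NULL);
	for (id = 0; id < NUM_LAG_IDS; id++) {
		if (t->dev[id] == lag_dev)
			return id;
		if (t->dev[id] == NULL && free_id < 0)
			free_id = id;
	}
	if (free_id >= 0)
		t->dev[free_id] = lag_dev;
	return free_id;
}

/* Release the ID held by `lag_dev`, if any. */
static void lag_unmap(struct lag_table *t, const void *lag_dev)
{
	int id;

	assert(t != NULL);
	for (id = 0; id < NUM_LAG_IDS; id++)
		if (t->dev[id] == lag_dev)
			t->dev[id] = NULL;
}

/* Look up the device for an ID, e.g. one taken from a received tag. */
static const void *lag_dev_by_id(const struct lag_table *t, int id)
{
	assert(t != NULL);
	return (id >= 0 && id < NUM_LAG_IDS) ? t->dev[id] : NULL;
}
```

A flat array keeps the receive-path lookup to a single index operation.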
In the event that the hardware is not capable of offloading a
particular LAG for any reason (the typical case being use of exotic
modes like broadcast), DSA will take a hands-off approach, allowing
the LAG to be formed as a pure software construct. This is reported
back through the extended ACK, but is otherwise transparent to the
user.
net: dsa: Don't offload port attributes on standalone ports
In a situation where a standalone port is indirectly attached to a
bridge (e.g. via a LAG) which is not offloaded, do not offload any
port attributes either. The port should behave as a standard NIC.
Previously, on mv88e6xxx, this meant that in the following setup:
br0
/
team0
/ \
swp0 swp1
If vlan filtering was enabled on br0, swp0's and swp1's QMode was set
to "secure". This caused all untagged packets to be dropped, as their
default VID (0) was not loaded into the VTU.
net: bonding: Notify ports about their initial state
When creating a static bond (e.g. balance-xor), all ports will always
be enabled. This is set, and the corresponding notification is sent
out, before the port is linked to the bond upper.
In the offloaded case, this ordering is hard to deal with.
The lower will first see a notification that it can not associate with
any bond. Then the bond is joined. After that point no more
notifications are sent, so all ports remain disabled.
This change simply sends an extra notification once the port has been
linked to the upper to synchronize the initial state.
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'mlxsw_pci_queue_init()' and
'mlxsw_pci_fw_area_init()' GFP_KERNEL can be used because both functions
are already using this flag and no lock is acquired.
When memory is allocated in 'mlxsw_pci_mbox_alloc()' GFP_KERNEL can be used
because it is only called from the probe function and no lock is acquired
in between.
The call chain is:
--> mlxsw_pci_probe()
--> mlxsw_pci_cmd_init()
--> mlxsw_pci_mbox_alloc()
While at it, also replace the 'dma_set_mask/dma_set_coherent_mask' sequence
by a less verbose 'dma_set_mask_and_coherent()' call.
Vladimir Oltean [Thu, 14 Jan 2021 08:35:56 +0000 (10:35 +0200)]
net: marvell: prestera: fix uninitialized vid in prestera_port_vlans_add
prestera_bridge_port_vlan_add should have been called with vlan->vid,
however this was masked by the presence of the local vid variable and I
did not notice the build warning.
====================
selftests: Updates to allow single instance of nettest for client and server
Update nettest to handle namespace change internally to allow a
single instance to run both client and server modes. Device validation
needs to be moved after the namespace change and a few run time
options need to be split to allow values for client and server.
v4
- really fix the memory leak with stdout/stderr buffers
v3
- send proper status in do_server for UDP sockets
- fix memory leak with stdout/stderr buffers
- new patch with separate option for address binding
- new patch to remove unnecessary newline
David Ahern [Thu, 14 Jan 2021 03:09:47 +0000 (20:09 -0700)]
selftests: Add separate options for server device bindings
Add new options to nettest to specify device binding and expected
device binding for server mode, and update fcnal-test script. This
is needed to allow a single instance of nettest running both server
and client modes to use different device bindings.
David Ahern [Thu, 14 Jan 2021 03:09:46 +0000 (20:09 -0700)]
selftests: Add new option for client-side passwords
Add new option to nettest to specify MD5 password to use for client
side. Update fcnal-test script. This is needed for a single instance
running both server and client modes to test password mismatches.
David Ahern [Thu, 14 Jan 2021 03:09:45 +0000 (20:09 -0700)]
selftests: Consistently specify address for MD5 protection
nettest started with -r as the remote address for MD5 passwords.
The -m argument was added to use prefixes with a length when that
feature was added to the kernel. Since -r is used to specify
remote address for client mode, change nettest to only use -m
for MD5 passwords and update fcnal-test script.
David Ahern [Thu, 14 Jan 2021 03:09:42 +0000 (20:09 -0700)]
selftests: Use separate stdout and stderr buffers in nettest
When a single instance of nettest is doing both client and
server modes, stdout and stderr messages can get interlaced
and become unreadable. Allocate a new set of buffers for the
child process handling server mode.
David Ahern [Thu, 14 Jan 2021 03:09:41 +0000 (20:09 -0700)]
selftests: Add support to nettest to run both client and server
Add option to nettest to run both client and server within a
single instance. Client forks a child process to run the server
code. A pipe is used for the server to tell the client it has
initialized and is ready or had an error. This avoids unnecessary
sleeps to handle such a race when the commands are separately launched.
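The fork-and-pipe handshake can be sketched in a few lines of POSIX C. This is not the nettest code: `run_with_server()` and the two init callbacks are hypothetical, and the server and client bodies are reduced to placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical server setup routines: 0 means "ready", non-zero an error. */
static int init_ok(void)   { return 0; }
static int init_fail(void) { return 1; }

/* Fork a server child, wait for its status over a pipe, then return that
 * status. Returns -1 if the handshake itself fails. */
static int run_with_server(int (*server_init)(void))
{
	int fd[2], status = -1;
	pid_t pid;

	assert(server_init != NULL);
	if (pipe(fd) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fd[0]);
		close(fd[1]);
		return -1;
	}
	if (pid == 0) {			/* child: server side */
		int rc = server_init();

		close(fd[0]);
		if (write(fd[1], &rc, sizeof(rc)) != (ssize_t)sizeof(rc))
			_exit(2);
		close(fd[1]);
		/* a real server would now start accepting connections */
		_exit(rc ? 1 : 0);
	}
	close(fd[1]);			/* parent: client side */
	if (read(fd[0], &status, sizeof(status)) != (ssize_t)sizeof(status))
		status = -1;
	close(fd[0]);
	/* the client would connect here if status == 0 */
	waitpid(pid, NULL, 0);
	return status;
}
```

The blocking read() replaces a fixed sleep: the client proceeds as soon as the server reports in, and it also learns about a failed setup.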
David Ahern [Thu, 14 Jan 2021 03:09:39 +0000 (20:09 -0700)]
selftests: Move address validation in nettest
IPv6 addresses can have a device name to declare a scope (e.g.,
fe80::5054:ff:fe12:3456%eth0). The next patch adds support to
switch network namespace before running client or server code
(or both), so move the address validation to the server and
client functions.
IPv4 multicast groups do not have the device scope in the address
specification, so they can be validated inline with option parsing.
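Splitting a scoped IPv6 address into its parts can be sketched as below. `parse_scoped_addr()` is a hypothetical helper; it only separates and syntax-checks the pieces, leaving the device lookup to be done later, after any namespace switch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Parse "addr" or "addr%dev". On success stores the address and the device
 * name (empty string if none) and returns 0; returns -1 on bad input. */
static int parse_scoped_addr(const char *str, struct in6_addr *addr,
			     char *dev, size_t devlen)
{
	char buf[INET6_ADDRSTRLEN];
	const char *pct;
	size_t alen;

	assert(str != NULL && addr != NULL && dev != NULL && devlen > 0);
	pct = strchr(str, '%');
	alen = pct ? (size_t)(pct - str) : strlen(str);
	if (alen >= sizeof(buf))
		return -1;
	memcpy(buf, str, alen);
	buf[alen] = '\0';

	dev[0] = '\0';
	if (pct) {
		size_t dlen = strlen(pct + 1);

		if (dlen == 0 || dlen >= devlen)
			return -1;
		memcpy(dev, pct + 1, dlen + 1);
	}
	/* The device is not resolved here: it may only exist in the
	 * namespace the client or server runs in. */
	return inet_pton(AF_INET6, buf, addr) == 1 ? 0 : -1;
}
```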
David Ahern [Thu, 14 Jan 2021 03:09:37 +0000 (20:09 -0700)]
selftests: Move device validation in nettest
Later patch adds support for switching network namespaces before
running client, server or both. Device validations need to be
done after the network namespace switch, so add a helper to do it
and invoke in server and client code versus inline with argument
parsing. Move related argument checks as well.
Jakub Kicinski [Thu, 14 Jan 2021 23:40:36 +0000 (15:40 -0800)]
Merge branch 'add-100-base-x-mode'
Bjarni Jonasson says:
====================
Add 100 base-x mode
Adding support for 100 base-x in phylink.
The Sparx5 switch supports 100 base-x pcs (IEEE 802.3 Clause 24) 4b5b encoded.
These patches add phylink support for that mode.
Tested in Sparx5, using sfp modules:
Axcen 100fx AXFE-1314-0521 (base-fx)
Axcen 100lx AXFE-1314-0551 (base-lx)
HP SFP 100FX J9054C (bx-10)
Excom SFP-SX-M1002 (base-lx)
v1 -> v2:
Added description to Documentation/networking/phy.rst
Moved PHY_INTERFACE_MODE_100BASEX to above 1000BASEX
Patching against net-next
====================
Russell King [Tue, 12 Jan 2021 22:59:43 +0000 (22:59 +0000)]
net: phy: ar803x: disable extended next page bit
This bit is enabled by default and advertises support for extended
next page support. XNP is only needed for 10GBase-T and MultiGig
support which is not supported. Additionally, Cisco MultiGig switches
will read this bit and attempt 10Gb negotiation even though Next Page
support is disabled. This will cause timeouts when the interface is
forced to 100Mbps and auto-negotiation will fail. The interfaces are
only 1000Base-T and supporting auto-negotiation for this only requires
the Next Page bit to be set.
Linus Torvalds [Thu, 14 Jan 2021 21:54:09 +0000 (13:54 -0800)]
Merge tag 'linux-kselftest-fixes-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"One single fix to skip BPF selftests by default.
BPF selftests have a hard dependency on cutting edge versions of tools
in the BPF ecosystem including LLVM.
Skipping BPF by default will make it easier for users
interested in running kselftest as a whole. Users can include BPF in
the Kselftest build via the SKIP_TARGETS variable"
* tag 'linux-kselftest-fixes-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: Skip BPF seftests by default
Linus Torvalds [Thu, 14 Jan 2021 21:31:07 +0000 (13:31 -0800)]
Merge tag 'net-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"We have a few fixes for long standing issues, in particular Eric's fix
to not underestimate the skb sizes, and my fix for brokenness of
register_netdevice() error path. They may uncover other bugs so we
will keep an eye on them. Also included are Willem's fixes for
kmap(_atomic).
Looking at the "current release" fixes, it seems we are about one rc
behind a normal cycle. We've previously seen an uptick of "people had
run their test suites" / "humans actually tried to use new features"
fixes between rc2 and rc3.
Summary:
Current release - regressions:
- fix feature enforcement to allow NETIF_F_HW_TLS_TX if IP_CSUM &&
IPV6_CSUM
- dcb: accept RTM_GETDCB messages carrying set-like DCB commands if
user is admin for backward-compatibility
- selftests/tls: fix selftests build after adding ChaCha20-Poly1305
Current release - always broken:
- ppp: fix refcount underflow on channel unbridge
- bnxt_en: clear DEFRAG flag in firmware message when retry flashing
- smc: fix out of bound access in the new netlink interface
Previous releases - regressions:
- fix use-after-free with UDP GRO by frags
- mptcp: better msk-level shutdown
- rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM
request
- avoid 32 x truesize under-estimation for tiny skbs
- fix issues around register_netdevice() failures
- udp: prevent reuseport_select_sock from reading uninitialized socks
- dsa: unbind all switches from tree when DSA master unbinds
- dsa: clear devlink port type before unregistering slave netdevs
- can: isotp: isotp_getname(): fix kernel information leak
- mlxsw: core: Thermal control fixes
- ipv6: validate GSO SKB against MTU before finish IPv6 processing
- stmmac: use __napi_schedule() for PREEMPT_RT
- net: mvpp2: remove Pause and Asym_Pause support
Misc:
- remove from MAINTAINERS folks who had been inactive for >5yrs"
* tag 'net-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
mptcp: fix locking in mptcp_disconnect()
net: Allow NETIF_F_HW_TLS_TX if IP_CSUM && IPV6_CSUM
MAINTAINERS: dccp: move Gerrit Renker to CREDITS
MAINTAINERS: ipvs: move Wensong Zhang to CREDITS
MAINTAINERS: tls: move Aviad to CREDITS
MAINTAINERS: ena: remove Zorik Machulsky from reviewers
MAINTAINERS: vrf: move Shrijeet to CREDITS
MAINTAINERS: net: move Alexey Kuznetsov to CREDITS
MAINTAINERS: altx: move Jay Cliburn to CREDITS
net: avoid 32 x truesize under-estimation for tiny skbs
nt: usb: USB_RTL8153_ECM should not default to y
net: stmmac: fix taprio configuration when base_time is in the past
net: stmmac: fix taprio schedule configuration
net: tip: fix a couple kernel-doc markups
net: sit: unregister_netdevice on newlink's error path
net: stmmac: Fixed mtu channged by cache aligned
cxgb4/chtls: Fix tid stuck due to wrong update of qid
i40e: fix potential NULL pointer dereferencing
net: stmmac: use __napi_schedule() for PREEMPT_RT
can: mcp251xfd: mcp251xfd_handle_rxif_one(): fix wrong NULL pointer check
...
Linus Torvalds [Thu, 14 Jan 2021 19:10:12 +0000 (11:10 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- memory leak fix for Wacom driver (Ping Cheng)
- various trivial small fixes, cleanups and device ID additions
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: logitech-hidpp: Add product ID for MX Ergo in Bluetooth mode
HID: Ignore battery for Elan touchscreen on ASUS UX550
HID: logitech-dj: add the G602 receiver
HID: wiimote: remove h from printk format specifier
HID: uclogic: remove h from printk format specifier
HID: sony: select CONFIG_CRC32
HID: sfh: fix address space confusion
HID: multitouch: Enable multi-input for Synaptics pointstick/touchpad device
HID: wacom: Fix memory leakage caused by kfifo_alloc
Tariq Toukan [Thu, 14 Jan 2021 15:12:15 +0000 (17:12 +0200)]
net: Allow NETIF_F_HW_TLS_TX if IP_CSUM && IPV6_CSUM
Cited patch below blocked the TLS TX device offload unless HW_CSUM
is set. This broke devices that use IP_CSUM && IPV6_CSUM.
Here we fix it.
Note that the single HW_TLS_TX feature flag indicates support for
both IPv4/6, hence it should still be disabled in case only one of
(IP_CSUM | IPV6_CSUM) is set.
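The rule described above can be sketched as a small feature-fixup function. This is only an illustration in Python, not the kernel code: the bit values and the name fix_features() are hypothetical, and only the flag names come from the message.

```python
# Hypothetical bit assignments; the real kernel values differ.
HW_CSUM = 1 << 0
IP_CSUM = 1 << 1
IPV6_CSUM = 1 << 2
HW_TLS_TX = 1 << 3


def fix_features(features):
    """Drop HW_TLS_TX unless checksum offload covers both IPv4 and IPv6."""
    both_ip_csums = IP_CSUM | IPV6_CSUM
    has_full_csum = bool(features & HW_CSUM) or (features & both_ip_csums) == both_ip_csums
    if (features & HW_TLS_TX) and not has_full_csum:
        features &= ~HW_TLS_TX
    return features
```

With only one of IP_CSUM or IPV6_CSUM set, the TLS TX flag is cleared, because the single flag stands for both address families.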
To make maintainers' lives easier we're trying to nudge people
towards CCing all the relevant folks on patches, in an attempt
to improve review rate. We have a check in patchwork which validates
the CC list against get_maintainers.pl. It's a little awkward, however,
to force people to CC maintainers who we haven't seen on the mailing
list for years. This series removes from maintainers folks who didn't
provide any tag (incl. authoring a patch) in the last 5 years.
To ensure reasonable signal to noise ratio we only considered
MAINTAINERS entries which had more than 100 patches fall under
them in that time period.
All this is purely a process-greasing exercise, I hope nobody
sees this series as an affront. Most folks are moved to CREDITS,
a couple entries are simply removed.
The following inactive maintainers are kept, because they indicated
the intention to come back in the near future:
- Veaceslav Falico (bonding)
- Christian Benvenuti (Cisco drivers)
- Felix Fietkau (mtk-eth)
- Mirko Linder (skge/sky2)
Patches in this series contain report from a script which did
the analysis. Big thanks to Jonathan Corbet for help and writing
the script (although I feel like I used it differently than Jon
may have intended ;)). The output format is thus:
Subsystem $name
Changes $reviewed / $total ($percent%) // how many changes to the subsystem had at least one ack/review
Last activity: $date_of_most_recent_patch
$maintainer/reviewer1:
Author $last_commit_authored_by_the_person $how_many_in_5yrs
Committer $last_committed $how_many
Tags $last_tag_like_review_signoff_etc $how_many
$maintainer/reviewer2:
Author $last_commit_authored_by_the_person $how_many_in_5yrs
Committer $last_committed $how_many
Tags $last_tag_like_review_signoff_etc $how_many
Top reviewers: // Top 3 reviewers (who are not listed in MAINTAINERS)
[$count_of_reviews_and_acks]: $email
INACTIVE MAINTAINER $name // maintainer / reviewer who has done nothing in last 5yrs
Jakub Kicinski [Thu, 14 Jan 2021 01:49:12 +0000 (17:49 -0800)]
MAINTAINERS: dccp: move Gerrit Renker to CREDITS
As far as I can tell we haven't heard from Gerrit for roughly
5 years now. DCCP patch would really benefit from some review.
Gerrit was the last maintainer so mark this entry as orphaned.
Jakub Kicinski [Thu, 14 Jan 2021 01:49:09 +0000 (17:49 -0800)]
MAINTAINERS: ena: remove Zorik Machulsky from reviewers
While ENA has 3 reviewers and 2 maintainers, we mostly see review
tags and comments from the maintainers. While we very much appreciate
Zorik's involvement in the community, let's trim the reviewer list
down to folks we've seen tags from.
Eric Dumazet [Wed, 13 Jan 2021 16:18:19 +0000 (08:18 -0800)]
net: avoid 32 x truesize under-estimation for tiny skbs
Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head.
While using page fragments instead of a kmalloc backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
underestimating memory usage.
For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (order-3 page in x86), or even 64 on PowerPC.
We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
but consuming far more memory for TCP buffers than instructed in tcp_mem[2].
Even if we force napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768.
This patch makes sure that small skb heads are kmalloc backed, so that
other objects in the slab page can be reused instead of being held as long
as skbs are sitting in socket queues.
Note that we might in the future use the sk_buff napi cache,
instead of going through a more expensive __alloc_skb().
Another idea would be to use separate page sizes depending
on the allocated length (to never have more than 4 frags per page).
I would like to thank Greg Thelen for his precious help on this matter,
analysing crash dumps is always a time consuming task.
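The "32 x" figure in the title can be reproduced with a little arithmetic. The sketch below is an illustration only: the 1024-byte footprint per tiny skb->head is an assumption inferred from "32 allocations per order-3 page", and the 64 KiB PowerPC page size is likewise assumed. Neither number is taken from the patch.

```python
# Assumed sizes, inferred from the figures quoted in the message.
X86_FRAG_PAGE_BYTES = 4096 << 3   # order-3 allocation with 4 KiB pages: 32768 bytes
PPC_FRAG_PAGE_BYTES = 65536       # assumed 64 KiB page on PowerPC
SMALL_HEAD_BYTES = 1024           # assumed footprint of one tiny skb->head


def worst_case_underestimation(frag_page_bytes, head_bytes=SMALL_HEAD_BYTES):
    """Ratio of memory pinned by one surviving head to the memory accounted for it.

    A page fragment page cannot be freed while any fragment carved out of it
    is still referenced, so a single queued skb can pin the whole page.
    """
    heads_per_page = frag_page_bytes // head_bytes
    pinned_bytes = heads_per_page * head_bytes
    return pinned_bytes // head_bytes
```

Under these assumptions the ratio is 32 on x86 and 64 on PowerPC, matching the allocation counts quoted in the message.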
Yannick Vignon [Wed, 13 Jan 2021 13:15:57 +0000 (14:15 +0100)]
net: stmmac: fix taprio configuration when base_time is in the past
The Synopsys TSN MAC supports Qbv base times in the past, but only up to a
certain limit. As a result, a taprio qdisc configuration with a small
base time (for example when treating the base time as a simple phase
offset) is not applied by the hardware and silently ignored.
This was observed on an NXP i.MX8MPlus device, but likely affects all
TSN-variants of the MAC.
Fix the issue by making sure the base time is in the future, pushing it by
an integer number of cycle times if needed (a similar check is already
done in several other taprio implementations, see for example
drivers/net/ethernet/intel/igc/igc_tsn.c#L116 or
drivers/net/dsa/sja1105/sja1105_ptp.h#L39).
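The base time adjustment can be sketched as follows. This is a minimal Python illustration of the idea, not the driver code; the function name and the nanosecond integer representation are assumptions.

```python
def push_base_time(base_time_ns, cycle_time_ns, now_ns):
    """Return a base time strictly in the future, keeping the same phase.

    If base_time_ns is already in the future it is returned unchanged;
    otherwise it is advanced by a whole number of cycle times.
    """
    if base_time_ns > now_ns:
        return base_time_ns
    cycles_to_skip = (now_ns - base_time_ns) // cycle_time_ns + 1
    return base_time_ns + cycles_to_skip * cycle_time_ns
```

Because only whole cycles are added, a small base time used as a phase offset keeps its meaning: the schedule still starts at the same position within the cycle.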
Yannick Vignon [Wed, 13 Jan 2021 13:15:56 +0000 (14:15 +0100)]
net: stmmac: fix taprio schedule configuration
When configuring a 802.1Qbv schedule through the tc taprio qdisc on an NXP
i.MX8MPlus device, the effective cycle time differed from the requested one
by N*96ns, where N is the number of entries in the Qbv Gate Control List. This is
because the driver was adding a 96ns margin to each interval of the GCL,
apparently to account for the IPG. The problem was observed on NXP
i.MX8MPlus devices but likely affected all devices relying on the same
configuration callback (dwmac 4.00, 4.10, 5.10 variants).
Fix the issue by removing the margins and simply setting up the MAC with the
provided cycle time value. This is the behavior expected by the user-space
API, as altering the Qbv schedule timings would break standards conformance.
This is also the behavior of several other Ethernet MAC implementations
supporting taprio, including the dwxgmac variant of stmmac.
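The N*96ns discrepancy is easy to see with a worked example. The sketch below is illustrative Python with hypothetical function names and example intervals; only the 96 ns margin per GCL entry comes from the message.

```python
IPG_MARGIN_NS = 96  # margin the driver used to add to every GCL interval


def cycle_time_with_margins(intervals_ns):
    """Effective cycle time before the fix: every entry is stretched by 96 ns."""
    return sum(interval + IPG_MARGIN_NS for interval in intervals_ns)


def cycle_time_as_requested(intervals_ns):
    """Effective cycle time after the fix: intervals are programmed unchanged."""
    return sum(intervals_ns)


# Example schedule with N = 3 entries and a requested cycle time of 1 ms.
gcl_intervals_ns = [250_000, 250_000, 500_000]
drift_ns = cycle_time_with_margins(gcl_intervals_ns) - cycle_time_as_requested(gcl_intervals_ns)
```

For this three-entry schedule the old behaviour lengthens each cycle by 3 * 96 = 288 ns, so the schedule drifts steadily against any other device using the requested cycle time.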
A function has a different name in its prototype
than in its kernel-doc markup:
../net/tipc/link.c:2551: warning: expecting prototype for link_reset_stats(). Prototype was for tipc_link_reset_stats() instead
../net/tipc/node.c:1678: warning: expecting prototype for is the general link level function for message sending(). Prototype was for tipc_node_xmit() instead
Jakub Kicinski [Thu, 14 Jan 2021 01:29:47 +0000 (17:29 -0800)]
net: sit: unregister_netdevice on newlink's error path
We need to unregister the netdevice if config failed.
.ndo_uninit takes care of most of the heavy lifting.
This was uncovered by recent commit c269a24ce057 ("net: make
free_netdev() more lenient with unregistering devices").
Previously the partially-initialized device would be left
in the system.
Nicholas Miell [Mon, 11 Jan 2021 06:09:25 +0000 (22:09 -0800)]
HID: logitech-hidpp: Add product ID for MX Ergo in Bluetooth mode
The Logitech MX Ergo trackball supports HID++ 4.5 over Bluetooth. Add its
product ID to the table so we can get battery monitoring support.
(The hid-logitech-hidpp driver already recognizes it when connected via
a Unifying Receiver.)
can: tcan4x5x: remove __packed attribute from struct tcan4x5x_map_buf
The first member of struct tcan4x5x_map_buf is the struct tcan4x5x_buf_cmd,
which has a size of 4 bytes. It's followed by an array of u8. The compiler
places the array directly after the struct tcan4x5x_buf_cmd.
This patch removes the unneeded __packed attribute from struct
tcan4x5x_map_buf.
Call skb_tx_timestamp() within can_put_echo_skb() so that a software tx
timestamp gets attached to the skb.
There are two main reasons to include this call in can_put_echo_skb():
* It makes it easy to enable the tx timestamp on all devices with
just one small change.
* According to Documentation/networking/timestamping.rst, the tx
timestamps should be generated in the device driver as close as possible,
but always prior to passing the packet to the network interface. During the
call to can_put_echo_skb(), the skb gets cloned meaning that the driver
should not dereference the skb variable anymore after can_put_echo_skb()
returns. This makes can_put_echo_skb() the very last place we can use the
skb without having to access the echo_skb[] array.
Remark: by default, skb_tx_timestamp() does nothing. It needs to be activated
by passing the SOF_TIMESTAMPING_TX_SOFTWARE flag either through socket options
or control messages.
References:
* Support for the error queue in CAN RAW sockets (which is needed for
tx timestamps) was introduced in:
https://git.kernel.org//torvalds/c/eb88531bdbfaafb827192d1fc6c5a3fcc4fadd96
* Put the call to skb_tx_timestamp() just before adding it to the
array:
https://lore.kernel.org/r/043c3ea1-6bdd-59c0-0269-27b2b5b36cec@victronenergy.com
can: dev: can_rx_offload_get_echo_skb(): extend to return can frame length
In order to implement byte queue limits (bql) in CAN drivers, the length of the
CAN frame needs to be passed into the networking stack after queueing and after
transmission completion.
To avoid calculating this length twice, extend can_rx_offload_get_echo_skb()
to return that value. Convert all users of this function, too.
can: dev: can_get_echo_skb(): extend to return can frame length
In order to implement byte queue limits (bql) in CAN drivers, the length of the
CAN frame needs to be passed into the networking stack after queueing and after
transmission completion.
To avoid calculating this length twice, extend can_get_echo_skb() to return
that value. Convert all users of this function, too.
Vincent Mailhol [Mon, 11 Jan 2021 14:19:27 +0000 (15:19 +0100)]
can: dev: can_put_echo_skb(): extend to handle frame_len
Add a frame_len argument to can_put_echo_skb() which is used to save length of
the CAN frame into field frame_len of struct can_skb_priv so that it can be
later used after transmission completion. Convert all users of this function,
too.
Drivers which implement BQL call can_put_echo_skb() with the output of
can_skb_get_frame_len(skb), while drivers which do not simply pass zero as an
input (in the same way that NULL would be given to can_get_echo_skb()). This
way, we have a nice symmetry between the two echo functions.
can: dev: extend struct can_skb_priv to hold CAN frame length
In order to implement byte queue limits (bql) in CAN drivers, the length of the
CAN frame needs to be passed into the networking stack after queueing and after
transmission completion.
To avoid calculating this length twice, extend the struct can_skb_priv to hold
the length of the CAN frame and extend __can_get_echo_skb() to return that
value.
Vincent Mailhol [Mon, 11 Jan 2021 14:19:25 +0000 (15:19 +0100)]
can: length: can_skb_get_frame_len(): introduce function to get data length of frame in data link layer
This patch adds the function can_skb_get_frame_len() which returns the length
of a CAN frame on the data link layer, including Start-of-frame, Identifier,
various other bits, the actual data, the CRC, the End-of-frame, and the Inter
frame spacing.
can: length: canfd_sanitize_len(): add function to sanitize CAN-FD data length
The data field in CAN-FD frames has specific lengths (0, 1, 2, 3, 4, 5,
6, 7, 8, 12, 16, 20, 24, 32, 48, 64). This function "rounds" up a given length
to the next valid CAN-FD frame length.
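The rounding can be sketched with a sorted table lookup. This is a Python illustration rather than the kernel implementation; the function name is hypothetical, and clamping lengths above 64 to 64 is an assumption of this sketch.

```python
from bisect import bisect_left

# Valid CAN-FD data field lengths, as listed in the commit message.
CANFD_VALID_LENGTHS = (0, 1, 2, 3, 4, 5, 6, 7, 8, 12, 16, 20, 24, 32, 48, 64)


def sanitize_canfd_len(length):
    """Round length up to the next valid CAN-FD data length.

    Lengths above 64 are clamped to 64 (an assumption of this sketch).
    """
    if length >= CANFD_VALID_LENGTHS[-1]:
        return CANFD_VALID_LENGTHS[-1]
    # bisect_left finds the first valid length that is >= length.
    return CANFD_VALID_LENGTHS[bisect_left(CANFD_VALID_LENGTHS, max(length, 0))]
```

Lengths that are already valid are returned unchanged, so applying the function twice gives the same result as applying it once.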
David Wu [Wed, 13 Jan 2021 03:41:09 +0000 (11:41 +0800)]
net: stmmac: Fixed mtu channged by cache aligned
Since the original mtu is not used when the mtu is updated,
the cache-aligned mtu is stored instead, which yields an incorrect value.
For example, if you configure the mtu to be 1500,
an mtu of 1536 is actually configured.
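The 1500 to 1536 jump follows directly from rounding up to a cache line. The Python sketch below assumes a 64-byte cache line, and change_mtu() is a hypothetical helper that shows one plausible reading of the fix: keep the requested value as the interface MTU and use the aligned value only for buffer sizing.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size


def cache_align(value, alignment=CACHE_LINE_BYTES):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def change_mtu(requested_mtu):
    """Hypothetical helper: return (interface MTU, MTU used for buffer sizing)."""
    return requested_mtu, cache_align(requested_mtu)
```

Storing the second value as the interface MTU reproduces the bug: a request for 1500 ends up reported as 1536.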
Ayush Sawal [Tue, 12 Jan 2021 05:36:00 +0000 (11:06 +0530)]
cxgb4/chtls: Fix tid stuck due to wrong update of qid
A stuck TID is seen when there is a race in
CPL_PASS_ACCEPT_RPL/CPL_ABORT_REQ and the abort arrives
before the accept reply, which sets the queue number.
In this case HW ends up sending CPL_ABORT_RPL_RSS to an
incorrect ingress queue.
V1->V2:
- Removed the unused variable len in chtls_set_quiesce_ctrl().
V2->V3:
- As kfree_skb() already checks for a NULL skb, removed this
check before calling kfree_skb() in func chtls_send_reset().
Yuchung Cheng [Mon, 11 Jan 2021 23:05:52 +0000 (15:05 -0800)]
tcp: assign skb hash after tcp_event_data_sent
Move skb_set_hash_from_sk s.t. it's called after instead of before
tcp_event_data_sent is called. This enables congestion control
modules to change the socket hash right before restarting from
idle (via the TX_START congestion event).
Currently, the function i40e_construct_skb_zc only frees the input xdp
buffer when the output skb is successfully built. On error, the
function i40e_clean_rx_irq_zc does not commit anything for the current
packet descriptor and simply exits the packet descriptor processing
loop, with the plan to restart the processing of this descriptor on
the next invocation. Therefore, on error the ring next-to-clean
pointer should not advance, the xdp i.e. *bi buffer should not be
freed and the current buffer info should not be invalidated by setting
*bi to NULL. Therefore, the *bi should only be set to NULL when the
function i40e_construct_skb_zc is successful, otherwise a NULL *bi
will be dereferenced when the work for the current descriptor is
eventually restarted.
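The rule described above (invalidate the buffer slot and advance only on success) can be modelled with a toy receive loop. This is a Python illustration, not the i40e code: the ring is a plain list, and clean_rx() and construct_skb are hypothetical stand-ins for i40e_clean_rx_irq_zc and i40e_construct_skb_zc.

```python
def clean_rx(ring, next_to_clean, construct_skb):
    """Process ring entries until one fails; return (next_to_clean, delivered).

    On failure the slot is left untouched and next_to_clean does not advance,
    so the same descriptor can be retried on the next invocation.
    """
    delivered = []
    while next_to_clean < len(ring) and ring[next_to_clean] is not None:
        skb = construct_skb(ring[next_to_clean])
        if skb is None:
            break  # keep the buffer: it is needed again on retry
        ring[next_to_clean] = None  # invalidate only after success
        delivered.append(skb)
        next_to_clean += 1
    return next_to_clean, delivered
```

Clearing the slot before checking the result would leave a None entry at the position that is retried later, which is the toy equivalent of the NULL dereference described in the message.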
Ioana Ciornei [Mon, 11 Jan 2021 17:18:02 +0000 (19:18 +0200)]
dpaa2-mac: fix the remove path for non-MAC interfaces
Check if the interface is indeed connected to a MAC before trying to
close the DPMAC object representing it. Without this check we end up
working with a NULL pointer.
Declare Rx VLAN filtering as supported and user-changeable only when
there are VLAN filtering entries available on the DPNI object. Even
then, rx-vlan-filtering is by default disabled.
Also, populate the .ndo_vlan_rx_add_vid() and .ndo_vlan_rx_kill_vid()
callbacks for adding and removing a specific VLAN from the VLAN table.