In an attempt to actually support shared IRQs in phylib, we now move the
responsibility of triggering the phylib state machine or just returning
IRQ_NONE, based on the IRQ status register, to the PHY driver. Having
3 different IRQ handling callbacks (.handle_interrupt(),
.did_interrupt() and .ack_interrupt()) is confusing, so let the PHY
driver directly implement an IRQ handler like any other device driver.
Make this driver follow the new convention.
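A minimal user-space sketch of the convention described above, assuming a
hypothetical PHY with a single read-to-clear interrupt status register. The
names fake_phy, irq_status and machine_kicked are illustrative only; in the
kernel the handler would call phy_trigger_machine() rather than set a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the kernel's IRQ_NONE/IRQ_HANDLED values for user space. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

/* Hypothetical PHY model, not a real phylib structure. */
struct fake_phy {
	uint16_t irq_status;     /* latched sources, cleared when read */
	uint16_t irq_mask;       /* sources this driver enabled */
	bool     machine_kicked; /* stands in for phy_trigger_machine() */
};

/* Reading the status register also acknowledges it on this fake PHY. */
static uint16_t fake_phy_read_irq_status(struct fake_phy *phy)
{
	uint16_t status = phy->irq_status;

	phy->irq_status = 0;
	return status;
}

static irqreturn_t fake_phy_handle_interrupt(struct fake_phy *phy)
{
	uint16_t status;

	assert(phy);
	status = fake_phy_read_irq_status(phy);

	/* Not our interrupt: leave it to the other users of the shared line. */
	if (!(status & phy->irq_mask))
		return IRQ_NONE;

	phy->machine_kicked = true;
	return IRQ_HANDLED;
}
```

Returning IRQ_NONE when no enabled source is pending is what lets the IRQ
core try the other handlers registered on the same line.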
Ioana Ciornei [Sun, 1 Nov 2020 12:50:58 +0000 (14:50 +0200)]
net: phy: make .ack_interrupt() optional
As a first step towards making phylib and all PHY drivers actually
support shared IRQs, make the .ack_interrupt() callback
optional.
After all drivers have been moved to implement the generic
interrupt handler, the phy_drv_supports_irq() check will be
changed again to only require the .handle_interrupt() callback.
Ioana Ciornei [Sun, 1 Nov 2020 12:50:57 +0000 (14:50 +0200)]
net: phy: add a shutdown procedure
In case of a board which uses a shared IRQ we can easily end up with an
IRQ storm after a forced reboot.
For example, a 'reboot -f' will trigger a call to the .shutdown()
callbacks of all devices. Because phylib does not implement that hook,
the PHY is not quiesced, thus it can very well leave its IRQ enabled.
At the next boot, if that IRQ line is found asserted by the first PHY
driver that uses it, but _before_ the driver that is _actually_ keeping
the shared IRQ asserted is probed, the IRQ is not going to be
acknowledged, thus it will keep firing, preventing the kernel's boot
process from continuing. This is even worse when the second PHY driver
is a module.
To fix this, implement the .shutdown() callback and disable the
interrupts if these are used.
Note that we are still susceptible to IRQ storms if the previous kernel
exited with a panic or if the bootloader left the shared IRQ active, but
there is absolutely nothing we can do about these cases.
Ioana Ciornei [Sun, 1 Nov 2020 12:50:56 +0000 (14:50 +0200)]
net: phy: export phy_error and phy_trigger_machine
These functions are currently used by phy_interrupt() to either signal
an error condition or to trigger the link state machine. In an attempt
to actually support shared PHY IRQs, export these two functions so that
the actual PHY drivers can use them.
Xin Long [Wed, 4 Nov 2020 06:55:32 +0000 (14:55 +0800)]
sctp: bring inet(6)_skb_parm back to sctp_input_cb
inet(6)_skb_parm was removed from sctp_input_cb by Commit a1dd2cf2f1ae
("sctp: allow changing transport encap_port by peer packets"), as it
thought sctp_input_cb->header is not used any more in SCTP.
This series adds a DSA driver for the Hirschmann Hellcreek TSN switch
IP. Characteristics of that IP:
* Full duplex Ethernet interface at 100/1000 Mbps on three ports
* IEEE 802.1Q-compliant Ethernet Switch
* IEEE 802.1Qbv Time-Aware scheduling support
* IEEE 1588 and IEEE 802.1AS support
That IP is used e.g. in
https://www.arrow.com/en/campaigns/arrow-kairos
Due to the hardware setup the switch driver is implemented using DSA. A special
tagging protocol is leveraged. Furthermore, this driver supports PTP and
hardware timestamping.
This work is part of the AccessTSN project: https://www.accesstsn.com/
* Drop TAPRIO support (David Miller)
=> Switch to mutexes due to the lack of hrtimers
* Use more specific compatible strings and add platform data (Andrew Lunn)
* Fix Kconfig ordering (Andrew Lunn)
Changes since v2:
* Make it compile by getting all requirements merged first (Jakub Kicinski, David Miller)
* Use "tsn" for TSN register set (Rob Herring)
* Fix DT binding issues (Rob Herring)
Changes since v1:
* Code simplifications (Florian Fainelli, Vladimir Oltean)
* Fix issues with hellcreek.yaml bindings (Florian Fainelli)
* Clear reserved field in ptp v2 event messages (Richard Cochran)
* Make use of generic ptp parsing function (Richard Cochran, Vladimir Oltean)
* Fix Kconfig (Florian Fainelli)
* Add tags (Florian Fainelli, Rob Herring, Richard Cochran)
Changes since RFC ordered by reviewers:
* Andrew Lunn
* Use dev_dbg for debug messages
* Get rid of __ function names where possible
* Use reverse xmas tree variable ordering
* Remove redundant/useless checks
* Improve comments e.g. for PTP
* Fix Kconfig ordering
* Make LED handling more generic and provide info via DT
* Setup advertisement of PHYs according to hardware
* Drop debugfs patch
* Jakub Kicinski
* Fix compiler warnings
* Florian Fainelli
* Switch to YAML DT bindings
* Richard Cochran
* Fix typo
* Add missing NULL checks
====================
Kamil Alkhouri [Tue, 3 Nov 2020 07:10:58 +0000 (08:10 +0100)]
net: dsa: hellcreek: Add support for hardware timestamping
The switch has the ability to take hardware generated time stamps per port for
PTPv2 event messages in Rx and Tx direction. That is useful for achieving needed
time synchronization precision for TSN devices/switches. So add support for it.
There are two directions:
* RX
The switch has a single register per port to capture a timestamp. That
mechanism is not used due to correlation problems. If the software processing
is too slow and a PTPv2 event message is received before the previous one has
been processed, false timestamps will be captured. Therefore, the switch can
do "inline" timestamping which means it can insert the nanoseconds part of
the timestamp directly into the PTPv2 event message. The reserved field (4
bytes) is leveraged for that. This might not be in accordance with (older)
PTP standards, but is the only way to get reliable results.
* TX
In Tx direction there is no correlation problem, because the software and the
driver have to ensure that only one event message is "on the fly". However,
the switch also provides a mechanism to check whether a timestamp is
lost. That can only happen when a timestamp is read and at this point another
message is timestamped. So, that lost bit is checked just in case to indicate
to the user that the driver or the software is somewhat buggy.
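The RX path above can be sketched in user space as follows. This is an
illustration under stated assumptions, not the driver's code: it assumes the
nanoseconds are stored big-endian in the 4-byte reserved field at offset 16 of
the PTPv2 header, and that the seconds come from a clock reading taken shortly
after reception. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PTP_RESERVED_OFFSET 16           /* 4-byte reserved field, PTPv2 header */
#define NSEC_PER_SEC        1000000000ULL

/* Fetch the inline nanoseconds and clear the field again afterwards. */
static uint32_t ptp_take_inline_ns(uint8_t *hdr, size_t len)
{
	uint8_t *p = hdr + PTP_RESERVED_OFFSET;
	uint32_t ns;

	assert(len >= PTP_RESERVED_OFFSET + 4);
	/* Assumption: the switch writes the value in network byte order. */
	ns = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	     ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	memset(p, 0, 4);
	return ns;
}

/* Combine the inline nanoseconds with a later reading of the clock. */
static uint64_t ptp_rebuild_rx_timestamp(uint64_t now_ns, uint32_t inline_ns)
{
	uint64_t sec = now_ns / NSEC_PER_SEC;

	/*
	 * The frame was stamped before "now". A larger nanoseconds part
	 * therefore means the seconds counter wrapped in between.
	 */
	if (inline_ns > now_ns % NSEC_PER_SEC && sec > 0)
		sec--;
	return sec * NSEC_PER_SEC + inline_ns;
}
```

Because the nanoseconds travel inside the frame itself, a slow consumer cannot
pair them with the wrong message, which is the correlation problem the
per-port register suffers from.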
Kamil Alkhouri [Tue, 3 Nov 2020 07:10:57 +0000 (08:10 +0100)]
net: dsa: hellcreek: Add PTP clock support
The switch has internal PTP hardware clocks. Add support for them. There are three
clocks:
* Synchronized
* Syntonized
* Free running
Currently the synchronized clock is exported to user space which is a good
default for the beginning. The free running clock might be exported later
e.g. for implementing 802.1AS-2011/2020 Time Aware Bridges (TAB). The switch
also supports cross time stamping for that purpose.
The implementation adds support for setting/getting the time as well as offset and
frequency adjustments. However, the clock only holds a partial timeofday
timestamp. This is why we track the seconds completely in software (see overflow
work and last_ts).
Furthermore, add the PTP multicast addresses into the FDB to forward those
packets only to the CPU port where they are processed by a PTP program.
Kurt Kanzenbach [Tue, 3 Nov 2020 07:10:56 +0000 (08:10 +0100)]
net: dsa: Add DSA driver for Hirschmann Hellcreek switches
Add a basic DSA driver for Hirschmann Hellcreek switches. Those switches
implement features needed for Time Sensitive Networking (TSN) such as support
for the Precision Time Protocol and various shapers like the Time Aware Shaper.
This driver includes basic support for networking:
Vladimir Oltean [Tue, 3 Nov 2020 07:10:55 +0000 (08:10 +0100)]
net: dsa: Give drivers the chance to veto certain upper devices
Some switches rely on unique pvids to ensure port separation in
standalone mode, because they don't have a port forwarding matrix
configurable in hardware. So, setups like a group of 2 uppers with the
same VLAN, swp0.100 and swp1.100, will cause traffic tagged with VLAN
100 to be autonomously forwarded between these switch ports, in spite
of there being no bridge between swp0 and swp1.
These drivers need to prevent this from happening. They need to have
VLAN filtering enabled in standalone mode (so they'll drop frames tagged
with unknown VLANs) and they can only accept an 8021q upper on a port as
long as it isn't installed on any other port too. So give them the
chance to veto bad user requests.
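A rough sketch of such a veto, assuming a hypothetical per-switch table that
records which ports already carry an 8021q upper for a given VID. The
structure and function names are made up for illustration and are not the DSA
API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 4
#define MAX_VID   4095

/* vid_users[vid]: bitmask of ports that have an 8021q upper for vid. */
struct fake_switch {
	uint8_t vid_users[MAX_VID + 1];
};

/* Returns false (veto) if another port already uses the same VID. */
static bool fake_switch_add_vlan_upper(struct fake_switch *sw, int port,
				       uint16_t vid)
{
	uint8_t others;

	assert(port >= 0 && port < NUM_PORTS && vid <= MAX_VID);
	others = sw->vid_users[vid] & (uint8_t)~(1u << port);
	if (others)
		return false;

	sw->vid_users[vid] |= (uint8_t)(1u << port);
	return true;
}
```

With this check, creating swp0.100 succeeds while a later swp1.100 is refused,
so the hardware never sees the same pvid on two standalone ports.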
Kurt Kanzenbach [Tue, 3 Nov 2020 07:10:54 +0000 (08:10 +0100)]
net: dsa: Add tag handling for Hirschmann Hellcreek switches
The Hirschmann Hellcreek TSN switches have a special tagging protocol for frames
exchanged between the CPU port and the master interface. The format is a one
byte trailer indicating the destination or origin port.
It's quite similar to the Micrel KSZ tagging. That's why the implementation is
based on that code.
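To illustrate a one-byte trailer tag, here is a small user-space sketch. The
encoding is an assumption made for the example (one bit per destination port
on transmit, a port number in the low two bits on receive); the message above
does not spell out the bit layout, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append the trailer; returns the new length, or 0 if there is no room. */
static size_t trailer_tag_xmit(uint8_t *frame, size_t len, size_t cap,
			       unsigned int port)
{
	assert(port < 8);
	if (len + 1 > cap)
		return 0;
	frame[len] = (uint8_t)(1u << port); /* assumed: destination bitmask */
	return len + 1;
}

/* Strip the trailer; returns the new length, or 0 for an empty frame. */
static size_t trailer_tag_rcv(const uint8_t *frame, size_t len,
			      unsigned int *port)
{
	if (len < 1)
		return 0;
	*port = frame[len - 1] & 0x03; /* assumed: origin port number */
	return len - 1;
}
```

A trailer keeps the Ethernet header untouched, which is why the same approach
works for both this switch and the KSZ family it is modelled on.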
Benjamin Gwin [Tue, 3 Nov 2020 20:11:06 +0000 (12:11 -0800)]
arm64: kexec_file: try more regions if loading segments fails
It's possible that the first region picked for the new kernel will make
it impossible to fit the other segments in the required 32GB window,
especially if we have a very large initrd.
Instead of giving up, we can keep testing other regions for the kernel
until we find one that works.
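The retry idea can be sketched with a simplified memory model. This is not
the arm64 kexec_file code: mem_region, the placement rule (kernel at the head
of a region, initrd anywhere inside a window starting at the kernel) and the
function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mem_region {
	uint64_t start;
	uint64_t size;
};

/* Can the initrd be placed inside the window that starts at the kernel? */
static bool initrd_fits(const struct mem_region *r, size_t n, size_t kidx,
			uint64_t kernel_size, uint64_t initrd_size,
			uint64_t window)
{
	uint64_t win_start = r[kidx].start;
	uint64_t win_end = win_start + window;

	for (size_t j = 0; j < n; j++) {
		uint64_t lo = r[j].start;
		uint64_t hi = r[j].start + r[j].size;

		if (j == kidx)
			lo += kernel_size; /* head is taken by the kernel */
		if (lo < win_start)
			lo = win_start;
		if (hi > win_end)
			hi = win_end;
		if (hi > lo && hi - lo >= initrd_size)
			return true;
	}
	return false;
}

/* Returns the index of the first workable region, or -1 if none works. */
static int pick_kernel_region(const struct mem_region *r, size_t n,
			      uint64_t kernel_size, uint64_t initrd_size,
			      uint64_t window)
{
	assert(r);
	for (size_t i = 0; i < n; i++) {
		if (r[i].size < kernel_size)
			continue;
		if (initrd_fits(r, n, i, kernel_size, initrd_size, window))
			return (int)i;
		/* Giving up here was the old behaviour; try the next region. */
	}
	return -1;
}
```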
mlx5_eq_async_int() uses in_irq() to decide whether eq::lock needs to be
acquired and released with spin_[un]lock() or the irq saving/restoring
variants.
The usage of in_*() in drivers is being phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.
mlx5_eq_async_int() knows the context via the action argument already so
using it for the lock variant decision is a straightforward replacement
for in_irq().
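A user-space sketch of conveying the context through an argument. The enum
values and the counting "lock" are hypothetical stand-ins; the real driver
uses its own action values and real spinlocks.

```c
#include <assert.h>

/* Hypothetical caller contexts passed in by the caller. */
enum eq_action { EQ_ACTION_IRQ_HANDLER, EQ_ACTION_RECOVER };

/* Stand-in for eq::lock that only counts which variant was taken. */
struct fake_eq {
	int plain_locks;   /* models spin_lock()/spin_unlock() */
	int irqsave_locks; /* models the irq saving/restoring variants */
	int held;
};

static void fake_eq_lock(struct fake_eq *eq, enum eq_action action)
{
	assert(!eq->held);
	if (action == EQ_ACTION_IRQ_HANDLER)
		eq->plain_locks++;   /* caller says it runs in hard IRQ context */
	else
		eq->irqsave_locks++; /* any other caller: save and disable IRQs */
	eq->held = 1;
}

static void fake_eq_unlock(struct fake_eq *eq)
{
	eq->held = 0;
}

static int fake_eq_async_int(struct fake_eq *eq, enum eq_action action)
{
	fake_eq_lock(eq, action);
	/* ... event queue entries would be consumed here ... */
	fake_eq_unlock(eq);
	return 0;
}
```

The decision depends only on what the caller passed in, so no in_irq() style
probing of the execution context is needed.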
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:57:
warning: Enum value 'MLX5_FPGA_ACCESS_TYPE_I2C' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:57:
warning: Enum value 'MLX5_FPGA_ACCESS_TYPE_DONTCARE' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:118:
warning: Function parameter or member 'cb_arg' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:160:
warning: Function parameter or member 'conn' not described ...
drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h:160:
warning: Excess function parameter 'fdev' description ...
drivers/net/ethernet/mellanox/mlx4/fw_qos.h:144:
warning: Function parameter or member 'in_param' not described ...
drivers/net/ethernet/mellanox/mlx4/fw_qos.h:144:
warning: Excess function parameter 'out_param' description ...
net/mlx5e: Validate stop_room size upon user input
Stop room is a space that may be taken by WQEs in the SQ during a packet
transmit. It is used to check if next packet has enough room in the SQ.
Stop room guarantees this packet can be served and if not, the queue is
stopped, so no more packets are passed to the driver until it's ready.
Currently, stop_room size is calculated and validated upon tx queues
allocation. This makes it impossible to know if user provided valid
input for certain parameters when interface is down.
Instead, store stop_room in mlx5e_sq_param and create
mlx5e_validate_params(), to validate its fields upon user input even
when the interface is down.
Track buddy's used ICM memory, and free it if all
of the buddy's memory became unused.
Do this only for STEs.
MODIFY_ACTION buddies are much smaller, so if there
is a large number of modify_header actions, which results
in a large number of MODIFY_ACTION buddies, doing this
cleanup during sync would cause a performance hit while
not freeing a significant amount of memory.
Track the pool's hot ICM memory when freeing/allocating a
chunk, so that checking whether a sync is required only needs to
compare the pool's hot memory against the sync threshold.
When freeing chunks, we want to sync the steering
so that all the "hot" memory will be written to ICM
and all the chunks that are in the hot_list will be
actually destroyed.
When allocating from the pool, we don't have a need
to sync the steering, as we're not freeing anything,
and sync might just hurt the performance in terms of
flow-per-second offloaded.
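The two notes above (track hot memory, sync only when freeing) can be
sketched like this. The structure, the threshold field and the function names
are hypothetical; the counter reset inside the sync is an assumption about
what "written to ICM" implies for the bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct fake_icm_pool {
	uint64_t hot_bytes;      /* freed, but not yet synced */
	uint64_t sync_threshold; /* hypothetical tunable */
	unsigned int syncs;      /* number of syncs performed */
};

/* Cheap check: compare one counter instead of walking the hot list. */
static bool pool_sync_required(const struct fake_icm_pool *pool)
{
	return pool->hot_bytes >= pool->sync_threshold;
}

static void pool_sync(struct fake_icm_pool *pool)
{
	pool->syncs++;
	pool->hot_bytes = 0; /* assumed: hot chunks are destroyed by the sync */
}

/* Only the free path may trigger a sync; allocation never does. */
static void pool_free_chunk(struct fake_icm_pool *pool, uint64_t size)
{
	assert(pool->sync_threshold > 0);
	pool->hot_bytes += size;
	if (pool_sync_required(pool))
		pool_sync(pool);
}
```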
net/mlx5: DR, Handle ICM memory via buddy allocation instead of buckets
Until now, we managed the ICM memory with a bucket
mechanism, which kept a bucket per specified size (sizes ranged
from 1 block to 2^21 blocks).
Replace that with a buddy-system mechanism, which gives us a much
more flexible way to manage the ICM memory.
Its biggest advantage over the buckets is that it uses the same ICM memory
area for blocks of all sizes, which reduces memory consumption.
Add implementation of SW Steering variation of buddy allocator.
The buddy system for ICM memory uses 2 main data structures:
- Bitmap per order, that keeps the current state of allocated
blocks for this order
- Indicator for the number of available blocks per each order
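The two data structures listed above can be illustrated with a tiny generic
buddy allocator. This is a textbook sketch, not the SW Steering
implementation: it manages only 16 blocks, and it assumes a set bit in the
per-order bitmap means "free".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ORDER 4 /* manages 2^4 = 16 order-0 blocks */

struct buddy {
	uint32_t     free_map[MAX_ORDER + 1]; /* bit i set: block i of this order is free */
	unsigned int num_free[MAX_ORDER + 1]; /* free blocks per order */
};

static void buddy_init(struct buddy *b)
{
	memset(b, 0, sizeof(*b));
	b->free_map[MAX_ORDER] = 1u; /* one free block covers everything */
	b->num_free[MAX_ORDER] = 1;
}

/* Returns the first order-0 block index of the allocation, or -1. */
static int buddy_alloc(struct buddy *b, unsigned int order)
{
	unsigned int o, idx;

	assert(order <= MAX_ORDER);
	/* The per-order counters let us skip empty orders without scanning. */
	for (o = order; o <= MAX_ORDER; o++)
		if (b->num_free[o])
			break;
	if (o > MAX_ORDER)
		return -1;

	for (idx = 0; !(b->free_map[o] & (1u << idx)); idx++)
		;
	b->free_map[o] &= ~(1u << idx);
	b->num_free[o]--;

	/* Split down to the requested order, freeing the upper half each time. */
	while (o > order) {
		o--;
		idx <<= 1;
		b->free_map[o] |= 1u << (idx ^ 1);
		b->num_free[o]++;
	}
	return (int)(idx << order);
}

static void buddy_free(struct buddy *b, int seg, unsigned int order)
{
	unsigned int idx = (unsigned int)seg >> order;

	/* Merge with the buddy for as long as the buddy is free too. */
	while (order < MAX_ORDER && (b->free_map[order] & (1u << (idx ^ 1)))) {
		b->free_map[order] &= ~(1u << (idx ^ 1));
		b->num_free[order]--;
		idx >>= 1;
		order++;
	}
	b->free_map[order] |= 1u << idx;
	b->num_free[order]++;
}
```

All orders carve their blocks out of the same area, which is the memory
saving over keeping a separate bucket per size.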
We will support multiple STE versions.
The existing naming is not suitable for newer versions.
Remove the HW-specific details and rename with more
general names.
Linus Torvalds [Thu, 5 Nov 2020 19:52:17 +0000 (11:52 -0800)]
Merge tag 'linux-kselftest-kunit-fixes-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kunit fixes from Shuah Khan:
"Several kunit_tool and documentation fixes"
* tag 'linux-kselftest-kunit-fixes-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: tools: fix kunit_tool tests for parsing test plans
Documentation: kunit: Update Kconfig parts for KUNIT's module support
kunit: test: fix remaining kernel-doc warnings
kunit: Don't fail test suites if one of them is empty
kunit: Fix kunit.py --raw_output option
Linus Torvalds [Thu, 5 Nov 2020 19:41:38 +0000 (11:41 -0800)]
Merge tag 'trace-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix off-by-one error in retrieving the context buffer for
trace_printk()
- Fix off-by-one error in stack nesting limit
- Fix recursion to not make all NMI code false positive as recursing
- Stop losing events in function tracing when transitioning between irq
context
- Stop losing events in ring buffer when transitioning between irq
context
- Fix return code of error pointer in parse_synth_field() to prevent
NULL pointer dereference.
- Fix false positive of NMI recursion in kprobe event handling
* tag 'trace-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
kprobes: Tell lockdep about kprobe nesting
tracing: Make -ENOMEM the default error for parse_synth_field()
ring-buffer: Fix recursion protection transitions between interrupt context
tracing: Fix the checking of stackidx in __ftrace_trace_stack
ftrace: Handle tracing when switching between context
ftrace: Fix recursion check for NMI test
tracing: Fix out of bounds write in get_trace_buf
Linus Torvalds [Thu, 5 Nov 2020 19:32:03 +0000 (11:32 -0800)]
Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- clarify a comment (Michael Kelley)
- change a pr_warn() to pr_info() (Olaf Hering)
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Clarify comment on x2apic mode
hv_balloon: disable warning when floor reached
Linus Torvalds [Thu, 5 Nov 2020 19:25:02 +0000 (11:25 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"A few more merge window regressions that didn't make rc1:
- New validation in the DMA layer triggers wrong use of the DMA layer
in rxe, siw and rdmavt
- Accidental change of a hypervisor facing ABI when widening the port
speed u8 to u16 in vmw_pvrdma
- Memory leak on error unwind in SRP target"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/srpt: Fix typo in srpt_unregister_mad_agent docstring
RDMA/vmw_pvrdma: Fix the active_speed and phys_state value
IB/srpt: Fix memory leak in srpt_add_one
RDMA: Fix software RDMA drivers for dma mapping error
Linus Torvalds [Thu, 5 Nov 2020 19:16:34 +0000 (11:16 -0800)]
Merge tag 'spi-fix-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A small collection of driver specific fixes that have come in since
the merge window, nothing too major here but all good to have"
* tag 'spi-fix-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fsl-dspi: fix wrong pointer in suspend/resume
spi: bcm2835: fix gpio cs level inversion
spi: imx: fix runtime pm support for !CONFIG_PM
Linus Torvalds [Thu, 5 Nov 2020 19:11:40 +0000 (11:11 -0800)]
Merge tag 'regulator-fix-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"An addition to MAINTAINERS plus a fix for a nasty bootstrapping
problem which caused problems when we need to read the voltage of a
regulator that is not yet available during initialization, we were not
correctly distinguishing between this case and the case where a
regulator is put into a bypass mode"
* tag 'regulator-fix-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: defer probe when trying to get voltage from unresolved supply
MAINTAINERS: Add entry for Qualcomm IPQ4019 VQMMC regulator
Linus Torvalds [Thu, 5 Nov 2020 19:04:29 +0000 (11:04 -0800)]
Merge tag 'pm-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix the device links support in runtime PM, correct mistakes in
the cpuidle documentation, fix the handling of policy limits changes
in the schedutil cpufreq governor, fix assorted issues in the OPP
(operating performance points) framework and make one janitorial
change.
Specifics:
- Unify the handling of managed and stateless device links in the
runtime PM framework and prevent runtime PM references to devices
from being leaked after device link removal (Rafael Wysocki).
- Fix two mistakes in the cpuidle documentation (Julia Lawall).
- Prevent the schedutil cpufreq governor from missing policy limits
updates in some cases (Viresh Kumar).
- Prevent static OPPs from being dropped by mistake (Viresh Kumar).
- Prevent helper function in the OPP framework from returning
prematurely (Viresh Kumar).
- Prevent opp_table_lock from being held too long during removal of
OPP tables with no more active references (Viresh Kumar).
- Drop redundant semicolon from the Intel RAPL power capping driver
(Tom Rix)"
* tag 'pm-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: runtime: Resume the device earlier in __device_release_driver()
PM: runtime: Drop pm_runtime_clean_up_links()
PM: runtime: Drop runtime PM references to supplier on link removal
powercap/intel_rapl: remove unneeded semicolon
Documentation: PM: cpuidle: correct path name
Documentation: PM: cpuidle: correct typo
cpufreq: schedutil: Don't skip freq update if need_freq_update is set
opp: Reduce the size of critical section in _opp_table_kref_release()
opp: Fix early exit from dev_pm_opp_register_set_opp_helper()
opp: Don't always remove static OPPs in _of_add_opp_table_v1()
Linus Torvalds [Thu, 5 Nov 2020 18:57:01 +0000 (10:57 -0800)]
Merge tag 'fixes-2020-11-05' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull highmem initialization fix from Mike Rapoport:
"Fix highmem initialization on arm and xtensa
Recent refactoring of memblock iterators has broken initialization of
highmem on arm and xtensa because it changed the way beginning and end
of memory regions are rounded to PFNs. This fix restores the original
behaviour"
* tag 'fixes-2020-11-05' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
ARM, xtensa: highmem: avoid clobbering non-page aligned memory reservations
Linus Torvalds [Thu, 5 Nov 2020 18:51:51 +0000 (10:51 -0800)]
Merge tag 'gfs2-v5.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
"Various gfs2 fixes"
* tag 'gfs2-v5.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Wake up when sd_glock_disposal becomes zero
gfs2: Don't call cancel_delayed_work_sync from within delete work function
gfs2: check for live vs. read-only file system in gfs2_fitrim
gfs2: don't initialize statfs_change inodes in spectator mode
gfs2: Split up gfs2_meta_sync into inode and rgrp versions
gfs2: init_journal's undo directive should also undo the statfs inodes
gfs2: Add missing truncate_inode_pages_final for sd_aspace
gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free
Linus Torvalds [Thu, 5 Nov 2020 18:41:14 +0000 (10:41 -0800)]
Merge tag 'pci-v5.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Fix ACS regression that broke device pass-through (Rajat Jain)
- Revert DesignWare ATU memory resource to use last entry to fix
Tegra194 regression (Rob Herring)
- Remove duplicate mvebu resource requests to fix regression on Turris
Omnia (Rob Herring)
* tag 'pci-v5.10-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: mvebu: Fix duplicate resource requests
PCI: dwc: Restore ATU memory resource setup to use last entry
PCI: Always enable ACS even if no ACS Capability
Vlad Buslov [Mon, 2 Nov 2020 20:12:43 +0000 (22:12 +0200)]
net: sched: implement action-specific terse dump
Allow user to request action terse dump with new flag value
TCA_FLAG_TERSE_DUMP. Only output essential action info in terse dump (kind,
stats, index and cookie, if set by the user when creating the action). This
is different from filter terse dump where index is excluded (filter can be
identified by its own handle).
Move tcf_action_dump_terse() function to the beginning of source file in
order to call it from tcf_dump_walker().
* pm-opp:
opp: Reduce the size of critical section in _opp_table_kref_release()
opp: Fix early exit from dev_pm_opp_register_set_opp_helper()
opp: Don't always remove static OPPs in _of_add_opp_table_v1()
====================
Netfilter updates for net-next
1) Move existing bridge packet reject infra to nf_reject_{ipv4,ipv6}.c
from Jose M. Guisado.
2) Consolidate nft_reject_inet initialization and dump, also from Jose.
3) Add the netdev reject action, from Jose.
4) Allow to combine the exist flag and the destroy command in ipset,
from Jozsef Kadlecsik.
5) Expose bucket size parameter for hashtables, also from Jozsef.
6) Expose the init value for reproducible ipset listings, from Jozsef.
7) Use __printf attribute in nft_request_module, from Andrew Lunn.
8) Allow to use reject from the inet ingress chain.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
netfilter: nft_reject_inet: allow to use reject from inet ingress
netfilter: nftables: Add __printf() attribute
netfilter: ipset: Expose the initval hash parameter to userspace
netfilter: ipset: Add bucketsize parameter to all hash types
netfilter: ipset: Support the -exist flag with the destroy command
netfilter: nft_reject: add reject verdict support for netdev
netfilter: nft_reject: unify reject init and dump into nft_reject
netfilter: nf_reject: add reject skbuff creation helpers
====================
This is a collection of small fixup and minor enhancement patches that
have accumulated in the MPTCP tree while net-next was closed. These are
prerequisites for larger changes we have queued up.
Patch 1 refines receive buffer autotuning.
Patches 2 and 4 are some minor locking and refactoring changes.
Patch 3 improves GRO and RX coalescing with MPTCP skbs.
Patches 5-7 add a sysctl for tuning ADD_ADDR retransmission timeout,
corresponding test code, and documentation.
v2: Add sysctl documentation and fix signoff tags.
====================
Geliang Tang [Tue, 3 Nov 2020 19:05:09 +0000 (11:05 -0800)]
selftests: mptcp: add ADD_ADDR timeout test case
This patch adds a test case for retransmitting ADD_ADDR when a timeout
occurs. It sets NS1's add_addr_timeout to 1 second, and drops NS2's ADD_ADDR
echo packets.
Here we need to slow down the transfer of all data so that the
ADD_ADDR suboptions can be retransmitted three times. So we add a new
parameter "speed" for do_transfer, which can be set to fast or slow.
We also add three new optional parameters for run_tests, and drop the
run_remove_tests function.
Since we added the netfilter rules in this test case, we need to update
the "config" file.
Paolo Abeni [Tue, 3 Nov 2020 19:05:05 +0000 (11:05 -0800)]
tcp: propagate MPTCP skb extensions on xmit splits
When the TCP stack splits a packet on the write queue, the tail
half currently loses the associated skb extensions, and will not
carry the DSM on the wire.
The above does not cause functional problems and is allowed by
the RFC, but interacts badly with GRO and RX coalescing, as possible
candidates for aggregation will carry different TCP options.
This change tries to improve the MPTCP behavior, propagating the
skb extensions on split.
Additionally, we must prevent the MPTCP stack from updating the
mapping after the split occurs: that would both violate the RFC and
fool the reader.
mptcp: adjust mptcp receive buffer limit if subflow has larger one
In addition to tcp autotuning during read, TCP may also increase the
receive buffer in tcp_clamp_window().
In this case, mptcp should adjust its receive buffer size as well so
it can move all pending skbs from the subflow socket to the mptcp socket.
At this time, TCP can have more skbs ready for processing than what the
mptcp receive buffer size allows.
In the mptcp case, the receive window announced is based on the free
space of the mptcp parent socket instead of the individual subflows.
Following the subflow allows mptcp to grow its receive buffer.
This is especially noticeable for loopback traffic where two skbs are
enough to fill the initial receive window.
In mptcp_data_ready() we do not hold the mptcp socket lock, so modifying
mptcp_sk->sk_rcvbuf is racy. Do it when moving skbs from subflow to
mptcp socket, both sockets are locked in this case.
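A minimal sketch of "following the subflow", assuming simplified socket
objects with a single rcvbuf field. The fake_sock type and the helper name
are hypothetical; the locked flag only models the requirement that both
sockets are locked when the value is updated.

```c
#include <assert.h>

struct fake_sock {
	int rcvbuf; /* receive buffer limit in bytes */
	int locked; /* 1 while the socket lock is held */
};

/* Called while moving skbs from the subflow to the mptcp socket. */
static void mptcp_follow_subflow_rcvbuf(struct fake_sock *msk,
					const struct fake_sock *ssk)
{
	/* Both locks are held here, so the update cannot race. */
	assert(msk->locked && ssk->locked);
	if (ssk->rcvbuf > msk->rcvbuf)
		msk->rcvbuf = ssk->rcvbuf;
}
```

The mptcp limit only ever grows in this sketch, mirroring the intent that the
parent socket can take over everything the subflow has already accepted.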
Heiner Kallweit [Tue, 3 Nov 2020 17:52:18 +0000 (18:52 +0100)]
r8169: work around short packet hw bug on RTL8125
Network problems with RTL8125B have been reported [0] and with help
from Realtek it turned out that this chip version has a hw problem
with short packets (similar to RTL8168evl). Therefore, activate
the same workaround as for RTL8168evl.
Realtek suggested activating the workaround for RTL8125A too, even
though they're not 100% sure yet which RTL8125 versions are affected.
Claudiu Manoil [Tue, 3 Nov 2020 14:02:13 +0000 (16:02 +0200)]
enetc: Remove Tx checksumming offload code
Tx checksumming has been defeatured and completely removed
from the h/w reference manual. Made a little cleanup for the
TSE case as this is complementary code.
The ADIN1300/ADIN1200 support cable diagnostics using TDR.
The cable fault detection is automatically run on all four pairs looking at
all combinations of pair faults by first putting the PHY in standby (clear
the LINK_EN bit, PHY_CTRL_3 register, Address 0x0017) and then enabling the
diagnostic clock (set the DIAG_CLK_EN bit, PHY_CTRL_1 register, Address
0x0012).
Cable diagnostics can then be run (set the CDIAG_RUN bit in the
CDIAG_RUN register, Address 0xBA1B). The results are reported for each pair
in the cable diagnostics results registers (CDIAG_DTLD_RSLTS_0,
CDIAG_DTLD_RSLTS_1, CDIAG_DTLD_RSLTS_2, and CDIAG_DTLD_RSLTS_3, Address
0xBA1D to Address 0xBA20).
The distance to the first fault for each pair is reported in the cable
fault distance registers (CDIAG_FLT_DIST_0, CDIAG_FLT_DIST_1,
CDIAG_FLT_DIST_2, and CDIAG_FLT_DIST_3, Address 0xBA21 to Address 0xBA24).
This change implements support for this using phylib's cable-test support.
When the PHY powers up, the diagnostics clock isn't enabled (bit 2 in
register PHY_CTRL_1 (0x0012)).
Also, the PHY is not in standby mode, so bit 13 in PHY_CTRL_3 (0x0017) is
always set at power up.
The standby mode and the diagnostics clock are both meant to be for the
cable diagnostics feature of the PHY (in phylib this would be equivalent to
the cable-test support), and for the frame-generator feature of the PHY.
In standby mode, the PHY doesn't negotiate links or manage links.
To use the cable diagnostics/test (or frame-generator), the PHY must be
first set in standby mode, so that the link operation doesn't interfere.
Then, the diagnostics clock must be enabled.
For the cable-test feature, when the operation finishes, the PHY goes into
PHY_UP state, and the config_aneg hook is called.
For the ADIN PHY, we need to make sure that during autonegotiation
configuration/setup the PHY is removed from standby mode and the
diagnostics clock is disabled, so that normal operation is resumed.
This change does that by moving the setting of the ADIN1300_LINKING_EN bit (13)
into config_aneg (to disable standby mode).
Previously, this was set in the downshift setup, because the downshift
retry value and the ADIN1300_LINKING_EN are in the same register.
And the ADIN1300_DIAG_CLK_EN bit (2) is cleared, to disable the
diagnostics clock.
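The register sequence described in the two ADIN messages above can be sketched
against a simulated register file. Register addresses are taken from the text;
the bit positions follow the power-up description (diagnostics clock is bit 2
of PHY_CTRL_1, link enable is bit 13 of PHY_CTRL_3). The position of the
CDIAG_RUN bit and all helper names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PHY_CTRL_1    0x0012
#define PHY_CTRL_3    0x0017
#define CDIAG_RUN_REG 0xBA1B

#define DIAG_CLK_EN   (1u << 2)
#define LINK_EN       (1u << 13)
#define CDIAG_RUN_BIT (1u << 0) /* assumed bit position */

/* Simulated PHY holding only the three registers used here. */
struct fake_adin {
	uint16_t phy_ctrl_1;
	uint16_t phy_ctrl_3;
	uint16_t cdiag_run;
};

static uint16_t *fake_adin_reg(struct fake_adin *phy, uint16_t addr)
{
	switch (addr) {
	case PHY_CTRL_1:    return &phy->phy_ctrl_1;
	case PHY_CTRL_3:    return &phy->phy_ctrl_3;
	case CDIAG_RUN_REG: return &phy->cdiag_run;
	default:            return NULL;
	}
}

static void fake_adin_modify(struct fake_adin *phy, uint16_t addr,
			     uint16_t clear, uint16_t set)
{
	uint16_t *reg = fake_adin_reg(phy, addr);

	assert(reg);
	*reg = (uint16_t)((*reg & ~clear) | set);
}

/* Standby first, then the diagnostics clock, then start the test. */
static void fake_adin_cable_test_start(struct fake_adin *phy)
{
	fake_adin_modify(phy, PHY_CTRL_3, LINK_EN, 0);
	fake_adin_modify(phy, PHY_CTRL_1, 0, DIAG_CLK_EN);
	fake_adin_modify(phy, CDIAG_RUN_REG, 0, CDIAG_RUN_BIT);
}

/* What config_aneg has to undo so that normal linking resumes. */
static void fake_adin_resume_linking(struct fake_adin *phy)
{
	fake_adin_modify(phy, PHY_CTRL_1, DIAG_CLK_EN, 0);
	fake_adin_modify(phy, PHY_CTRL_3, 0, LINK_EN);
}
```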
====================
selftests: net: bridge: add tests for MLDv2
This is the second selftests patch-set for the new multicast functionality
which adds tests for the bridge's MLDv2 support. The tests use full
precooked packets which are sent via mausezahn and the resulting state
after each test is checked for proper X,Y sets, (*,G) source list, source
list entry timers, (S,G) existence and flags, packet forwarding and
blocking, exclude group expiration and (*,G) auto-add. The first 3 patches
factor out common functions which are used by IGMPv3 tests in lib.sh and
add support for IPv6 test UDP packet, then patch 4 adds the first test with
the initial MLDv2 setup.
The following new tests are added:
- base case: MLDv2 report ff02::cc is_include
- include -> allow report
- include -> is_include report
- include -> is_exclude report
- include -> to_exclude report
- exclude -> allow report
- exclude -> is_include report
- exclude -> is_exclude report
- exclude -> to_exclude report
- include -> block report
- exclude -> block report
- exclude timeout (move to include + entry deletion)
- S,G port entry automatic add to a *,G,exclude port
The variable names and set notation are the same as per RFC 3810,
for more information check RFC 3810 sections 2.3 and 7.
====================
selftests: net: bridge: add test for mldv2 *,g auto-add
When we have *,G ports in exclude mode and a new S,G,port is added
the kernel has to automatically create an S,G entry for each exclude
port to get proper forwarding.
selftests: net: bridge: add test for mldv2 exc -> block report
The test checks for the following case:
Router State    Report Received  New Router State      Actions
EXCLUDE (X,Y)   BLOCK (A)        EXCLUDE (X+(A-Y),Y)   (A-X-Y) = Filter Timer
                                                       Send Q(MA,A-Y)
selftests: net: bridge: add test for mldv2 exc -> to_exclude report
The test checks for the following case:
Router State    Report Received  New Router State      Actions
EXCLUDE (X,Y)   TO_EX (A)        EXCLUDE (A-Y,Y*A)     (A-X-Y) = Filter Timer
                                                       Delete (X-A)
                                                       Delete (Y-A)
                                                       Send Q(MA,A-Y)
                                                       Filter Timer=MALI
selftests: net: bridge: add test for mldv2 exc -> is_exclude report
The test checks for the following case:
Router State    Report Received  New Router State      Actions
EXCLUDE (X,Y)   IS_EX (A)        EXCLUDE (A-Y, Y*A)    (A-X-Y)=MALI
                                                       Delete (X-A)
                                                       Delete (Y-A)
                                                       Filter Timer=MALI
selftests: net: bridge: add test for mldv2 inc -> to_exclude report
The test checks for the following case:
Router State    Report Received  New Router State      Actions
INCLUDE (A)     TO_EX (B)        EXCLUDE (A*B,B-A)     (B-A)=0
                                                       Delete (A-B)
                                                       Send Q(MA,A*B)
                                                       Filter Timer=MALI
selftests: net: bridge: add test for mldv2 inc -> is_exclude report
The test checks for the following case:
Router State    Report Received  New Router State      Actions
INCLUDE (A)     IS_EX (B)        EXCLUDE (A*B, B-A)    (B-A)=0
                                                       Delete (A-B)
                                                       Filter Timer=MALI
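The "New Router State" column of the transitions quoted in these test
descriptions can be computed with plain set operations. The sketch below
models source sets as bitmasks and covers only the state change, not the
timers or queries. The type and function names are hypothetical; "+" is
union, "*" is intersection and "-" is set difference, as in RFC 3810.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t srcset; /* bit i set: source i is in the set */

struct router_state {
	bool   exclude; /* false: INCLUDE (x), true: EXCLUDE (x, y) */
	srcset x;       /* INCLUDE list, or requested list X */
	srcset y;       /* exclude list Y, unused in INCLUDE mode */
};

/* IS_EX and TO_EX lead to the same new state in both modes. */
static struct router_state mld_on_exclude_report(struct router_state s,
						 srcset rep)
{
	struct router_state n = { .exclude = true };

	if (!s.exclude) {
		n.x = s.x & rep;  /* A*B */
		n.y = rep & ~s.x; /* B-A */
	} else {
		n.x = rep & ~s.y; /* A-Y */
		n.y = s.y & rep;  /* Y*A */
	}
	return n;
}

/* BLOCK only changes the state when the router is in EXCLUDE mode. */
static struct router_state mld_on_block_report(struct router_state s,
					       srcset rep)
{
	assert(!(s.exclude == false && s.y != 0));
	if (s.exclude)
		s.x |= rep & ~s.y; /* X+(A-Y), Y unchanged */
	return s;
}
```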
selftests: net: bridge: add initial MLDv2 include test
Add the initial setup for MLDv2 tests with the first test of a simple
is_include report. For MLDv2 we need to setup the bridge properly and we
also send the full precooked packets instead of relying on mausezahn to
fill in some parts. For verification we use the generic S,G state checking
functions from lib.sh.
selftests: net: bridge: factor out and rename sg state functions
Factor out S,G entry state checking functions for existence, forwarding,
blocking and timer to lib.sh so they can be later used by MLDv2 tests.
Add a brmcast_ prefix to their names to make the relation to the bridge
explicit.
selftests: net: lib: add support for IPv6 mcast packet test
In order to test an IPv6 multicast packet we need to pass different tc
and mausezahn protocols only, so add a simple check for the destination
address which decides if we should generate an IPv4 or IPv6 mcast
packet.
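The address-family decision described above can be sketched as follows. The selftest itself is a shell script, so this Python version only illustrates the idea; the function name `mcast_packet_family` and the returned labels are assumptions, not the names used in lib.sh.

```python
import ipaddress


def mcast_packet_family(ip_dst):
    """Pick the packet family to generate from the destination address."""
    addr = ipaddress.ip_address(ip_dst)
    if not addr.is_multicast:
        raise ValueError(f"{ip_dst} is not a multicast address")
    # The caller would map this label to the matching tc and mausezahn protocol.
    return "ipv6" if addr.version == 6 else "ipv4"
```

For instance, `mcast_packet_family("ff0e::3")` selects the IPv6 path, while `mcast_packet_family("239.10.10.10")` selects the IPv4 path.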
Jakub Kicinski [Thu, 5 Nov 2020 00:28:07 +0000 (16:28 -0800)]
Merge branch 'net-ipa-tell-gsi-the-ipa-version'
Alex Elder says:
====================
net: ipa: tell GSI the IPA version
The GSI code that supports IPA avoids having knowledge about the
IPA layer it serves. One result of this is that Boolean flags are
used during GSI initialization to convey that certain hardware
version-dependent special behaviors should be used.
A given version of IPA hardware uses a fixed/well-defined version
of GSI, so the IPA version really implies the GSI version.
If given only the IPA version, the GSI code supporting IPA can
use it to implement certain special behaviors required for IPA
*or* GSI. This avoids the need to pass and maintain numerous
Boolean flags.
====================
Alex Elder [Mon, 2 Nov 2020 17:54:00 +0000 (11:54 -0600)]
net: ipa: eliminate legacy arguments
We enable a channel doorbell engine only for IPA v3.5.1, and that is
now handled directly by gsi_channel_program().
When initially setting up a channel, we want that doorbell engine
enabled, and we can request that independent of the IPA version.
Doing that makes the "legacy" argument to gsi_channel_setup_one()
unnecessary. And with that gone we can get rid of the "legacy"
argument to gsi_channel_setup(), and gsi_setup() as well.
Alex Elder [Mon, 2 Nov 2020 17:53:59 +0000 (11:53 -0600)]
net: ipa: use version in gsi_channel_program()
Use the IPA version in gsi_channel_program() to determine whether
we should enable the GSI doorbell engine when requested. This way,
callers only say whether or not it should be enabled if needed,
regardless of hardware version.
Rename the "legacy" argument to gsi_channel_reset(), and have
it indicate whether the doorbell engine should be enabled when
reprogramming following the reset.
Change all callers of gsi_channel_reset() to indicate whether to
enable the doorbell engine after reset, independent of hardware
version.
Rework a little logic in ipa_endpoint_reset() to get rid of the
"legacy" variable previously passed to gsi_channel_reset().
Alex Elder [Mon, 2 Nov 2020 17:53:58 +0000 (11:53 -0600)]
net: ipa: use version in gsi_channel_reset()
A quirk of IPA v3.5.1 requires a channel reset on an RX channel to
be performed twice. Use the IPA version in gsi_channel_reset()
rather than the passed-in legacy flag to determine that.
This is actually a bug fix, because this double reset is supposed
to occur independent of whether we're enabling the doorbell engine.
Now they will be independent.
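A small model may help show why the two decisions are independent. It is a sketch under assumptions, not driver code: `IpaVersion` and `channel_reset_plan` are hypothetical names, and only the two hardware versions mentioned in this series are listed.

```python
from enum import Enum


class IpaVersion(Enum):
    # Only the versions named in this series; the enum itself is illustrative.
    V3_5_1 = "3.5.1"
    V4_2 = "4.2"


def channel_reset_plan(version, is_rx, doorbell):
    """Return (number_of_resets, doorbell_enabled) for one channel reset."""
    # IPA v3.5.1 quirk: an RX channel is reset twice, whatever `doorbell` says.
    resets = 2 if (version is IpaVersion.V3_5_1 and is_rx) else 1
    # The doorbell engine is enabled only on IPA v3.5.1, and only on request.
    doorbell_enabled = doorbell and version is IpaVersion.V3_5_1
    return resets, doorbell_enabled
```

In this model an RX channel on IPA v3.5.1 is reset twice even when the caller does not ask for the doorbell engine, which is the case the old legacy flag got wrong.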
Alex Elder [Mon, 2 Nov 2020 17:53:57 +0000 (11:53 -0600)]
net: ipa: use version in gsi_channel_init()
A quirk of IPA v4.2 requires the AP to allocate the GSI channels
that are owned by the modem.
Rather than pass a flag argument to gsi_channel_init(), use the
IPA version directly in that function to determine whether modem
channels need to be allocated.
Alex Elder [Mon, 2 Nov 2020 17:53:56 +0000 (11:53 -0600)]
net: ipa: record IPA version in GSI structure
Record the IPA version passed to gsi_init() in the GSI structure.
This allows that value to be used directly where needed, rather than
passing and storing certain flag arguments through the code.
In particular, for all but one supported version of IPA, the command
channel is programmed to only use an "escape buffer". By storing
the IPA version, we can do a simple version check in one location,
and avoid storing a flag field in every channel (and passing a flag
along while initializing channels to set that field properly).
Alex Elder [Mon, 2 Nov 2020 17:53:55 +0000 (11:53 -0600)]
net: ipa: expose IPA version to the GSI layer
Although GSI is integral to IPA, it is a separate hardware component
and the IPA code supporting it has been structured to avoid explicit
dependence on IPA details. An example of this is that gsi_init() is
passed a number of Boolean flags to indicate special behaviors,
whose values are dependent on the IPA hardware version. Looking
ahead, newer hardware versions would require even more such special
behaviors.
For any given version of IPA hardware (like 3.5.1 or 4.2), the GSI
hardware version is fixed (in this case, 1.3 and 2.2, respectively).
So the IPA version *implies* the GSI version, and the IPA version
can be used as effectively the equivalent of the GSI hardware version.
Rather than proliferating new special behavior flags, just provide
the IPA version to the GSI layer when it is initialized. The GSI
code can then use that directly to determine whether special
behaviors are required. The IPA version enumerated type is already
isolated to its own header file, so the exposure of this IPA detail
is very limited.
For now, just change gsi_init() to pass the version rather than the
Boolean flags, and set the flag values internal to that function.
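The idea of deriving special behaviors from the version can be sketched like this. It is a simplified illustration, assuming only the version pairs and the modem-channel quirk mentioned in this series; `IPA_TO_GSI_VERSION`, `gsi_init_flags` and the `modem_alloc` key are names chosen for the example.

```python
# IPA hardware version -> GSI hardware version, as stated in the text above.
IPA_TO_GSI_VERSION = {"3.5.1": "1.3", "4.2": "2.2"}


def gsi_init_flags(ipa_version):
    """Derive GSI behaviors from the IPA version instead of passed-in flags."""
    if ipa_version not in IPA_TO_GSI_VERSION:
        raise ValueError(f"unsupported IPA version: {ipa_version}")
    return {
        "gsi_version": IPA_TO_GSI_VERSION[ipa_version],
        # IPA v4.2 quirk: the AP allocates the modem-owned GSI channels.
        "modem_alloc": ipa_version == "4.2",
    }
```

A caller then passes a single value, and any new quirk becomes one more version check inside the function rather than one more argument.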
Rob Herring [Fri, 23 Oct 2020 14:52:52 +0000 (09:52 -0500)]
PCI: mvebu: Fix duplicate resource requests
With commit 669cbc708122 ("PCI: Move DT resource setup into
devm_pci_alloc_host_bridge()"), the DT 'ranges' is parsed and populated
into resources when the host bridge is allocated. The resources are
requested as well, but that happens a second time for the mvebu driver in
mvebu_pcie_parse_request_resources(). We should only be requesting the
additional resources added in mvebu_pcie_parse_request_resources(). These
are not added by default because they use custom properties rather than
standard DT address translation.
Also, the bus range is already populated by default, so we can remove it
from mvebu_pci_host_probe().
Rob Herring [Mon, 26 Oct 2020 15:48:52 +0000 (10:48 -0500)]
PCI: dwc: Restore ATU memory resource setup to use last entry
Prior to commit 0f71c60ffd26 ("PCI: dwc: Remove storing of PCI resources"),
the DWC driver was setting up the last memory resource rather than the
first memory resource. This doesn't matter for most platforms which only
have 1 memory resource, but it broke Tegra194 which has a 2nd
(prefetchable) memory region that requires an ATU entry. The first region
on Tegra194 relies on the default 1:1 pass-thru of outbound transactions
and doesn't need an ATU entry.
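The difference between picking the first and the last memory resource can be sketched as follows. This is an assumption-laden model rather than the DWC driver logic: the resource list, its dictionary layout and the function name `last_memory_resource` are invented for the example.

```python
def last_memory_resource(resources):
    """Return the last memory window in the bridge resource list, if any."""
    mem = [res for res in resources if res["type"] == "mem"]
    return mem[-1] if mem else None


# Hypothetical Tegra194-like layout: two memory windows, the second prefetchable.
windows = [
    {"type": "io", "name": "io"},
    {"type": "mem", "name": "non-prefetchable"},
    {"type": "mem", "name": "prefetchable"},
]
```

Here `last_memory_resource(windows)` returns the prefetchable window, the one that needs an ATU entry, whereas taking the first memory entry would select the window that relies on the default 1:1 pass-through.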
Jakub Kicinski [Wed, 4 Nov 2020 18:36:37 +0000 (10:36 -0800)]
Merge tag 'linux-can-fixes-for-5.10-20201103' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2020-11-03
The first two patches are by Oleksij Rempel and they add a generic
can-controller Device Tree yaml binding and convert the text based binding
of the flexcan driver to a yaml based binding.
Zhang Changzhong's patch fixes a remove_proc_entry warning in the AF_CAN
core.
A patch by me fixes a kfree_skb() call from IRQ context in the rx-offload
helper.
Vincent Mailhol contributes a patch to prevent a call to kfree_skb() in
hard IRQ context in can_get_echo_skb().
Oliver Hartkopp's patch fixes the length calculation for RTR CAN frames
in the __can_get_echo_skb() helper.
Oleksij Rempel's patch fixes a use-after-free that shows up with j1939 in
can_create_echo_skb().
Yegor Yefremov contributes 4 patches to enhance the j1939 documentation.
Zhang Changzhong's patch fixes a hanging task problem in j1939_sk_bind()
if the netdev is down.
Then there are three patches for the newly added CAN_ISOTP protocol. Geert
Uytterhoeven enhances the kconfig help text. Oliver Hartkopp's patch adds
missing RX timeout handling in listen-only mode and Colin Ian King's patch
decreases the generated object code by 926 bytes.
Zhang Changzhong contributes a patch for the ti_hecc driver that fixes the
error path in the probe function.
Navid Emamdoost's patch for the xilinx_can driver fixes the error handling
in case of failing pm_runtime_get_sync().
There are two patches for the peak_usb driver. Dan Carpenter adds range
checking in decode operations and Stephane Grosjean's patch fixes
a timestamp wrapping problem.
Stephane Grosjean's patch for the peak_canfd driver fixes echo management if
loopback is on.
The next three patches all target the mcp251xfd driver. The first one is
by me and it increases the severity of CRC read error messages. The kernel
test robot removes an unneeded semicolon and Tom Rix removes unneeded
break statements in several switch-cases.
The last 4 patches are by Joakim Zhang and target the flexcan driver,
the first three fix ECC related device specific quirks for the LS1021A,
LX2160A and the VF610 SoC. The last patch disables wakeup completely upon
driver remove.
* tag 'linux-can-fixes-for-5.10-20201103' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: (27 commits)
can: flexcan: flexcan_remove(): disable wakeup completely
can: flexcan: add ECC initialization for VF610
can: flexcan: add ECC initialization for LX2160A
can: flexcan: remove FLEXCAN_QUIRK_DISABLE_MECR quirk for LS1021A
can: mcp251xfd: remove unneeded break
can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings
can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages
can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is on
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
can: peak_usb: add range checking in decode operations
can: xilinx_can: handle failure cases of pm_runtime_get_sync
can: ti_hecc: ti_hecc_probe(): add missed clk_disable_unprepare() in error path
can: isotp: padlen(): make const array static, makes object smaller
can: isotp: isotp_rcv_cf(): enable RX timeout handling in listen-only mode
can: isotp: Explain PDU in CAN_ISOTP help text
can: j1939: j1939_sk_bind(): return failure if netdev is down
can: j1939: use backquotes for code samples
can: j1939: swap addr and pgn in the send example
can: j1939: fix syntax and spelling
can: j1939: rename jacd tool
...
====================
Zhao Qiang [Tue, 3 Nov 2020 02:05:46 +0000 (10:05 +0800)]
spi: fsl-dspi: fix wrong pointer in suspend/resume
Since commit 530b5affc675 ("spi: fsl-dspi: fix use-after-free in
remove path"), this driver causes a "NULL pointer dereference"
in dspi_suspend/resume.
This is because, since that commit, the driver's private data points to
"dspi" instead of "ctlr", but the code in the suspend and resume functions
was not modified correspondingly.
The kprobe handlers have protection that prohibits other handlers from
executing in other contexts (for example, if an NMI comes in while processing
a kprobe and executes the same kprobe, it will fail with a "busy" return),
but lockdep is unaware of this protection. Use lockdep's nesting API to
differentiate between locks taken in INT3 context and other contexts to
suppress the false warnings.