This adds the missing MODULE_DEVICE_TABLE() for SDIO IDs. While certain
platforms using this driver indeed have HW issues causing problems if
the module is loaded too early - this should be handled from user-space
by blacklisting it or delaying the loading.
Claire Chang [Thu, 31 Oct 2019 10:46:14 +0000 (18:46 +0800)]
Bluetooth: hci_qca: add PM support
Add PM suspend/resume callbacks for hci_qca driver.
BT host will make sure both Rx and Tx go into sleep state in
qca_suspend. Without this, Tx may still remain in awake state, which
prevents BTSOC from entering deep sleep. For example, BlueZ will send
Set Event Mask to device when suspending and this will wake the device
Rx up. However, the Tx idle timeout on the host side is 2000 ms. If the
host is suspended before its Tx idle times out, it won't send
HCI_IBS_SLEEP_IND to the device and the device Rx will remain awake.
We implement this by canceling the relevant work in the workqueue, sending
HCI_IBS_SLEEP_IND to the device and then waiting for HCI_IBS_SLEEP_IND to be
sent by the device.
In order to prevent the device from being awakened again after qca_suspend
is called, we introduce QCA_SUSPEND flag. QCA_SUSPEND is set in the
beginning of qca_suspend to indicate system is suspending and that we'd
like to ignore any further wake events.
With QCA_SUSPEND and the spinlock, we can avoid race conditions, e.g. if
qca_enqueue acquires qca->hci_ibs_lock before qca_suspend calls
cancel_work_sync and then qca_enqueue adds a new qca->ws_awake_device
work after the previous one is cancelled.
If BTSOC wants to wake the whole system up after qca_suspend is called,
it will keep sending HCI_IBS_WAKE_IND and uart driver will take care of
waking the system. For example, uart driver will reconfigure its Rx pin
to a normal GPIO pin and enable irq wake on that pin when suspending.
Once host detects Rx falling, the system will begin resuming. Then, the
BT host clears QCA_SUSPEND flag in qca_resume and begins dealing with
normal HCI packets. By doing so, only a few HCI_IBS_WAKE_IND packets are
lost and there is no data packet loss.
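The flag-plus-lock ordering described above can be sketched in user-space Python. This is only a toy model under stated assumptions: the class and method names are hypothetical, a threading.Lock stands in for qca->hci_ibs_lock, and a plain list stands in for the workqueue. It is not the driver's code.

```python
import threading


class QcaModel:
    """Toy model of the QCA_SUSPEND flag; hypothetical, not the driver code."""

    def __init__(self):
        self.lock = threading.Lock()  # stands in for qca->hci_ibs_lock
        self.suspended = False        # stands in for the QCA_SUSPEND flag
        self.pending_work = []        # stands in for queued ws_awake_device work

    def enqueue(self):
        """Schedule wake-up work unless a suspend is in progress."""
        with self.lock:
            if self.suspended:
                return False          # wake events are ignored while suspending
            self.pending_work.append("ws_awake_device")
            return True

    def suspend(self):
        """Set the flag under the lock first, then cancel outstanding work."""
        with self.lock:
            self.suspended = True
        # Every enqueue() that takes the lock from now on sees the flag and
        # schedules nothing, so a single cancellation is sufficient.
        self.pending_work.clear()     # stands in for cancel_work_sync()

    def resume(self):
        """Clear the flag so normal traffic can schedule work again."""
        with self.lock:
            self.suspended = False
```

Setting the flag before cancelling is the point of the sketch: work queued by a caller that won the lock just before suspend is still cancelled, and no caller can queue new work afterwards.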
An instance may have flags set as part of its data, in which case the code
should not attempt to add them again; otherwise the flags are duplicated:
< HCI Command: LE Set Extended Advertising Data (0x08|0x0037) plen 35
Handle: 0x00
Operation: Complete extended advertising data (0x03)
Fragment preference: Minimize fragmentation (0x01)
Data length: 0x06
Flags: 0x04
BR/EDR Not Supported
Flags: 0x06
LE General Discoverable Mode
BR/EDR Not Supported
Bluetooth: Fix using advertising instance duration as timeout
When using the LE Set Extended Advertising Enable command, the duration
refers to the lifetime of the instance, not the length, which is actually
controlled by the interval_min and interval_max when setting the
parameters.
Bluetooth: hci_bcm: Add compatible string for BCM43540
The BCM43540 chip is an 802.11 a/b/g/n/ac + Bluetooth 4.1 combo module.
This patch adds a compatible string match to the serdev driver for the
Bluetooth part of the chip.
David S. Miller [Sat, 26 Oct 2019 03:52:36 +0000 (20:52 -0700)]
Merge branch 'ionic-updates'
Shannon Nelson says:
====================
ionic updates
These are a few of the driver updates we've been working on internally.
These clean up a few mismatched struct comments, add checking for dead
firmware, fix an initialization bug, and change the Rx buffer management.
Shannon Nelson [Thu, 24 Oct 2019 00:48:59 +0000 (17:48 -0700)]
ionic: implement support for rx sgl
Even out Rx performance across MTU sizes by changing from full
skb allocations to page-based frag allocations. The device
supports a form of scatter-gather in the Rx path, so we can
set up a number of pages for each descriptor, all of which are
easier to alloc and pass around than the standard kzalloc'd
buffer. An skb is wrapped around the pages while processing
the received packets, and pages are recycled as needed, or
left alone if they weren't used in the Rx.
Shannon Nelson [Thu, 24 Oct 2019 00:48:57 +0000 (17:48 -0700)]
ionic: add heartbeat check
Most of our firmware has a heartbeat feature that the driver
can watch for to see if the FW is still alive and likely to
answer a dev_cmd or AdminQ request.
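A heartbeat check of this kind can be sketched as follows. The sketch assumes the heartbeat is a value that changes between polls while the firmware is alive; that assumption, the class name and the method name are illustrative only and are not taken from the driver.

```python
class HeartbeatWatcher:
    """Hypothetical watcher: FW is presumed dead if the heartbeat stalls."""

    def __init__(self):
        self.last_seen = None  # no heartbeat value observed yet

    def check(self, hb_value):
        """Return True if the heartbeat moved since the previous check."""
        alive = self.last_seen is None or hb_value != self.last_seen
        self.last_seen = hb_value
        return alive
```

A caller would poll check() periodically and skip issuing requests once it returns False.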
Shannon Nelson [Thu, 24 Oct 2019 00:48:56 +0000 (17:48 -0700)]
ionic: reverse an interrupt coalesce calculation
Fix the initial interrupt coalesce usec-to-hw setting
to actually be usec-to-hw.
Fixes: 780eded34ccc ("ionic: report users coalesce request")
Signed-off-by: Shannon Nelson
Signed-off-by: David S. Miller
Heiner Kallweit [Wed, 23 Oct 2019 19:36:14 +0000 (21:36 +0200)]
r8169: improve rtl8169_rx_fill
We have only one user of the error path, so we can inline it.
In addition the call to rtl8169_make_unusable_by_asic() can be removed
because rtl8169_alloc_rx_data() didn't call rtl8169_mark_to_asic() yet
for the respective index if returning NULL.
Heiner Kallweit [Wed, 23 Oct 2019 19:09:34 +0000 (21:09 +0200)]
r8169: align fix_features callback with vendor driver
This patch aligns the fix_features callback with the vendor driver and
also disables IPv6 HW checksumming and TSO if jumbo packets are used
on RTL8101/RTL8168/RTL8125.
Here's the main bluetooth-next pull request for the 5.5 kernel:
- Multiple fixes to hci_qca driver
- Fix for HCI_USER_CHANNEL initialization
- btwilink: drop superseded driver
- Add support for Intel FW download error recovery
- Various other smaller fixes & improvements
Please let me know if there are any issues pulling. Thanks.
====================
Jason Baron [Wed, 23 Oct 2019 15:09:26 +0000 (11:09 -0400)]
tcp: add TCP_INFO status for failed client TFO
The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether
or not data-in-SYN was ack'd on both the client and server side. We'd like
to gather more information on the client-side in the failure case in order
to indicate the reason for the failure. This can be useful for not only
debugging TFO, but also for creating TFO socket policies. For example, if
a middle box removes the TFO option or drops a data-in-SYN, we can
detect this case, and turn off TFO for these connections saving the
extra retransmits.
The newly added tcpi_fastopen_client_fail status is 2 bits and has the
following 4 states:
1) TFO_STATUS_UNSPEC
Catch-all state which includes when TFO is disabled via black hole
detection, which is indicated via LINUX_MIB_TCPFASTOPENBLACKHOLE.
2) TFO_COOKIE_UNAVAILABLE
If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie
is available in the cache.
3) TFO_DATA_NOT_ACKED
Data was sent with SYN, we received a SYN/ACK but it did not cover the data
portion. Cookie is not accepted by server because the cookie may be invalid
or the server may be overloaded.
4) TFO_SYN_RETRANSMITTED
Data was sent with SYN, we received a SYN/ACK which did not cover the data
after at least 1 additional SYN was sent (without data). It may be the case
that a middle-box is dropping data-in-SYN packets. Thus, it would be more
efficient to not use TFO on this connection to avoid extra retransmits
during connection establishment.
These new fields do not cover all the cases where TFO may fail, but other
failures, such as SYN/ACK + data being dropped, will result in the
connection not becoming established. And a connection blackhole after
session establishment shows up as a stalled connection.
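As a rough illustration of how a user-space policy might consume the 2-bit status, consider the sketch below. The numeric values simply follow the order in which the states are listed above and are an assumption, as are the helper names decode_client_fail() and should_disable_tfo().

```python
from enum import IntEnum


class TfoClientFail(IntEnum):
    # Values assumed from the listing order above, not confirmed by the text.
    TFO_STATUS_UNSPEC = 0
    TFO_COOKIE_UNAVAILABLE = 1
    TFO_DATA_NOT_ACKED = 2
    TFO_SYN_RETRANSMITTED = 3


def decode_client_fail(bits):
    """Decode a 2-bit field into its named state."""
    return TfoClientFail(bits & 0x3)


def should_disable_tfo(status):
    """Hypothetical policy: give up on TFO when a middle box is suspected."""
    return status == TfoClientFail.TFO_SYN_RETRANSMITTED
```

Only TFO_SYN_RETRANSMITTED triggers the policy here because it is the state the text ties to middle boxes dropping data-in-SYN packets; a real policy could weigh the other states differently.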
The link detection timeouts can be observed (or link might not be detected
at all) when dp83867 PHY is configured in manual mode (speed/duplex).
The CFG3[9] Robust Auto-MDIX option significantly improves link detection
when dp83867 is configured in manual mode and reduces link detection
time.
As per DM: "If link partners are configured to operational modes that are
not supported by normal Auto MDI/MDIX mode (like Auto-Neg versus Force
100Base-TX or Force 100Base-TX versus Force 100Base-TX), this Robust Auto
MDI/MDIX mode allows MDI/MDIX resolution and prevents deadlock."
Hence, enable this option by default as there are no known reasons
not to do so.
Vincent Prince [Wed, 23 Oct 2019 13:44:20 +0000 (15:44 +0200)]
net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
There is networking hardware that isn't based on Ethernet for layers 1 and 2.
For example CAN.
CAN is a multi-master serial bus standard for connecting Electronic Control
Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
of payload. Frame corruption is detected by a CRC. Frame loss due to
corruption is possible, but quite unusual.
While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
legacy protocols on top of CAN, which are not built with flow control or high
CAN frame drop rates in mind.
When using fq_codel, as soon as the queue reaches a certain delay-based length,
skbs from the head of the queue are silently dropped. Silently means that the
user space using a send() or similar syscall doesn't get an error. However,
TCP's flow control algorithm will detect dropped packets and adjust the
bandwidth accordingly.
When using fq_codel and sending raw frames over CAN, which is the common use
case, the user space thinks the frame has been sent without problems, because
send() returned without an error. pfifo_fast will also drop skbs if the queue
length exceeds the maximum, but with this scheduler the skbs at the tail are
dropped and an error (-ENOBUFS) is propagated to user space, so that the user
space can slow down frame generation.
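The difference in what the sender observes can be sketched with two toy queues. This is a simplification: a fixed length limit stands in for fq_codel's delay-based criterion, and both class names are hypothetical.

```python
import errno
from collections import deque


class TailDropQueue:
    """Bounded FIFO that rejects new frames when full (pfifo_fast-like)."""

    def __init__(self, limit):
        self.limit = limit
        self.frames = deque()

    def send(self, frame):
        if len(self.frames) >= self.limit:
            return -errno.ENOBUFS  # the sender is told about the drop
        self.frames.append(frame)
        return 0


class SilentHeadDropQueue:
    """Bounded FIFO that discards the oldest frame without telling the sender."""

    def __init__(self, limit):
        self.limit = limit
        self.frames = deque()
        self.dropped = 0

    def send(self, frame):
        if len(self.frames) >= self.limit:
            self.frames.popleft()  # head drop; the sender never learns of it
            self.dropped += 1
        self.frames.append(frame)
        return 0
```

With the tail-drop queue the third send() on a two-frame queue returns -ENOBUFS and the sender can back off; with the head-drop queue every send() returns 0 even though a frame has been lost.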
On distributions where fq_codel is made the default via CONFIG_DEFAULT_NET_SCH
at compile time, or set as default at runtime with sysctl
net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
with pfifo_fast, I can transfer thousands of millions of CAN frames without a
frame drop. On the other hand, with fq_codel there is more than one lost CAN
frame per thousand frames.
As pointed out fq_codel is not suited for CAN hardware, so this patch changes
attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices.
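The selection rule can be sketched as below. It is a deliberately reduced model: pick_default_qdisc() is a hypothetical name, the qdiscs are represented by their names only, and the other cases the kernel function handles are left out. The ARPHRD_* values are the ones defined in linux/if_arp.h.

```python
ARPHRD_ETHER = 1    # Ethernet, from linux/if_arp.h
ARPHRD_CAN = 280    # Controller Area Network, from linux/if_arp.h


def pick_default_qdisc(dev_type, configured_default="fq_codel"):
    """Return the qdisc name to attach; CAN devices always get pfifo_fast."""
    if dev_type == ARPHRD_CAN:
        return "pfifo_fast"
    return configured_default
```

The configured default still applies to every other device type, so only CAN interfaces are affected.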
During transition of a netdev from down to up state the default queuing
discipline is attached by attach_default_qdiscs() with the help of
attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
attach the pfifo_fast (pfifo_fast_ops) if the network device type is
"ARPHRD_CAN".
David S. Miller [Thu, 24 Oct 2019 22:21:58 +0000 (15:21 -0700)]
Merge branch 'DPAA-Ethernet-changes'
Madalin Bucur says:
====================
DPAA Ethernet changes
v3: add newline at the end of error messages
v2: resending with From: field matching signed-off-by
Here's a series of changes for the DPAA Ethernet, addressing minor
or unapparent issues in the codebase, adding probe ordering based on
a recently added DPAA QMan API, removing some redundant code.
====================
Madalin Bucur [Wed, 23 Oct 2019 09:08:45 +0000 (12:08 +0300)]
fsl/fman: remove unused struct member
Remove unused struct member second_largest_buf_size. Also, an out of
bounds access would have occurred in the removed code if there was only
one buffer pool in use.
Madalin Bucur [Wed, 23 Oct 2019 09:08:44 +0000 (12:08 +0300)]
dpaa_eth: change DMA device
The DPAA Ethernet driver is using the FMan MAC as the device for DMA
mapping. This is not actually correct, as the real DMA device is the
FMan port (the FMan Rx port for reception and the FMan Tx port for
transmission). Changing the device used for DMA mapping to the Fman
Rx and Tx port devices.
Laurentiu Tudor [Wed, 23 Oct 2019 09:08:43 +0000 (12:08 +0300)]
fsl/fman: add API to get the device behind a fman port
Add an API that retrieves the 'struct device' that the specified FMan
port probed against. The new API will be used in a subsequent patch
that corrects the DMA devices used by the dpaa_eth driver.
Laurentiu Tudor [Wed, 23 Oct 2019 09:08:41 +0000 (12:08 +0300)]
dpaa_eth: defer probing after qbman
If the DPAA 1 Ethernet driver gets probed before the QBMan driver it will
cause a boot crash. Add predictability in the probing order by deferring
the Ethernet driver probe after QBMan and portals by using the recently
introduced QBMan APIs.
====================
net: aquantia: PTP support for AQC devices
This patchset introduces PTP feature support in Aquantia AQC atlantic driver.
This implementation is a joint effort of aquantia developers:
Egor is the main designer and driver/firmware architect on PTP,
Sergey and Dmitry are included as co-developers.
Dmitry also helped me in the overall patchset preparations.
Feature was verified on AQC hardware with testptp tool, linuxptp,
gptp and with Motu hardware unit.
version3 updates:
- Review comments applied: error handling, various fixes
version2 updates:
- Fixing issues from Andrew's review: replacing self with
ptp var name, making ptp_clk_offset a field in the ptp instance.
devm_kzalloc advice is actually non applicable, because ptp object gets
created/destroyed on each network device close/open and it should not be
linked with dev lifecycle.
- Rearranging commit authorship, adding Egor as a ptp module main maintainer
- Fixing kbuild 32bit division issues
====================
Igor Russkikh [Tue, 22 Oct 2019 09:53:49 +0000 (09:53 +0000)]
net: aquantia: adding atlantic ptp maintainer
PTP implementation is designed and maintained by Egor Pomozov, adding
him as this module maintainer. Egor is the author of the core
functionality and the architect, and is to be contacted for
all Aquantia PTP/AVB functionality.
Dmitry Bezrukov [Tue, 22 Oct 2019 09:53:47 +0000 (09:53 +0000)]
net: aquantia: add support for PIN funcs
Depending on FW configuration we can manage from 0 to 3 PINs for periodic output
and from 0 to 1 ext ts PIN for getting TS for external event.
Ext TS PIN functionality is implemented via periodic timestamps polling
directly from PHY, because right now there is no way to receive the
PIN trigger interrupt from phy.
Dmitry Bezrukov [Tue, 22 Oct 2019 09:53:45 +0000 (09:53 +0000)]
net: aquantia: add support for Phy access
GPIO PIN control and access is done by direct phy manipulation.
Here we add an aq_phy module which is able to access phy registers
via MDIO access mailbox.
Dmitry Bezrukov [Tue, 22 Oct 2019 09:53:38 +0000 (09:53 +0000)]
net: aquantia: rx filters for ptp
We implement HW filter reservation for PTP traffic. Special location
in filters table is marked as reserved, because incoming ptp traffic
should be directed only to PTP designated queue. This way HW will do PTP
timestamping and proper processing.
Egor Pomozov [Tue, 22 Oct 2019 09:53:27 +0000 (09:53 +0000)]
net: aquantia: add basic ptp_clock callbacks
Basic HW functions implemented for adjusting frequency,
adjusting time, getting and setting time.
With these callbacks we now do register ptp clock in the system.
Firmware interface parts are defined for PTP requests and interactions.
Enable/disable PTP counters in HW on clock register/unregister.
Egor Pomozov [Tue, 22 Oct 2019 09:53:22 +0000 (09:53 +0000)]
net: aquantia: PTP skeleton declarations and callbacks
Here we add basic function for PTP clock register/unregister.
We also declare FW/HW capability bits used to control PTP feature on device.
PTP device is created if network card has appropriate FW that has PTP
enabled in config. HW supports timestamping for PTPv2 802.1AS and
PTPv2 IPv4 UDP packets.
It also supports basic PTP callbacks for getting/setting time, adjusting
frequency and time as well.
====================
mlxsw: Update main pool computation and pool size limits
Petr says:
In Spectrum ASICs, the shared buffer is an area of memory where packets are
kept until they can be transmitted. There are two resources associated with
shared buffer size: cap_total_buffer_size and cap_guaranteed_shared_buffer.
So far, mlxsw has been using the former as a limit when validating shared
buffer pool size configuration. However, the total size also includes
headrooms and reserved space, which really cannot be used for shared buffer
pools. Patch #1 mends this and has mlxsw use the guaranteed size.
To configure default pool sizes, mlxsw has historically hard-coded one or
two smallish pools, and one "main" pool that took most of the shared buffer
(that would be pool 0 on ingress and pool 4 on egress). During the
development of Spectrum-2, it became clear that the shared buffer size
keeps shrinking as bugs are identified and worked around. In order to
prevent having to tweak the size of pools 0 and 4 to catch up with updates
to values reported by the FW, patch #2 changes the way these pools are set.
Instead of hard-coding a fixed value, the main pool now takes whatever is
left from the guaranteed size after the smaller pool(s) are taken into
account.
====================
Petr Machata [Wed, 23 Oct 2019 06:05:00 +0000 (09:05 +0300)]
mlxsw: spectrum_buffers: Calculate the size of the main pool
Instead of hard-coding the size of the largest pool, calculate it from the
reported guaranteed shared buffer size and sizes of other pools (currently
only the CPU port pool).
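The calculation amounts to a subtraction, sketched here with a hypothetical helper main_pool_size(). The 256000-byte CPU port pool comes from this series; the guaranteed size used in the usage note is a made-up number, not a value reported by any firmware.

```python
def main_pool_size(guaranteed_size, other_pool_sizes):
    """Return what is left of the guaranteed shared buffer for the main pool."""
    remaining = guaranteed_size - sum(other_pool_sizes)
    if remaining < 0:
        raise ValueError("smaller pools exceed the guaranteed shared buffer")
    return remaining
```

For example, main_pool_size(1_000_000, [256_000]) gives 744_000, and the result follows automatically if the reported guaranteed size changes.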
Petr Machata [Wed, 23 Oct 2019 06:04:59 +0000 (09:04 +0300)]
mlxsw: spectrum: Use guaranteed buffer size as pool size limit
There are two resources associated with shared buffer size:
cap_total_buffer_size, and cap_guaranteed_shared_buffer. So far, mlxsw has
been using the former as a limit to determine how large a pool size is
allowed to be. However, the total size also includes headrooms and reserved
space, which really cannot be used for shared buffer pools.
Therefore convert mlxsw to use the latter resource as a limit. Adjust
hard-coded pool sizes to be the guaranteed size minus 256000 bytes for CPU
port pool. On Spectrum-1 that actually leads to an increase. A follow-up
patch will have this size calculated automatically.
Heiner Kallweit [Tue, 22 Oct 2019 19:30:57 +0000 (21:30 +0200)]
r8169: never set PCI_EXP_DEVCTL_NOSNOOP_EN
Setting PCI_EXP_DEVCTL_NOSNOOP_EN for certain chip versions had been
added to the vendor driver more than 10 years ago, and copied from
there to r8169. It has been removed from the vendor driver meanwhile
and I think we can safely remove this too.
====================
net: phy: support 1000Base-X auto-negotiation for BCM54616S
This patch series aims at supporting auto negotiation when BCM54616S is
running in 1000Base-X mode: without the patch series, BCM54616S PHY driver
would report incorrect link speed in 1000Base-X mode.
Patch #1 (of 3) modifies assignment to OR when dealing with dev_flags in
phy_attach_direct function, so that dev_flags updated in BCM54616S PHY's
probe callback won't be lost.
Patch #2 (of 3) adds several genphy_c37_* functions to support clause 37
1000Base-X auto-negotiation, and these functions are called in BCM54616S
PHY driver.
Patch #3 (of 3) detects BCM54616S PHY's operation mode and calls the
corresponding genphy_c37_* functions to configure auto-negotiation and parse
link attributes (speed, duplex, etc.) in 1000Base-X mode.
====================
Tao Ren [Tue, 22 Oct 2019 18:31:08 +0000 (11:31 -0700)]
net: phy: broadcom: add 1000Base-X support for BCM54616S
The BCM54616S PHY cannot work properly in RGMII->1000Base-X mode, mainly
because genphy functions are designed for copper links, and 1000Base-X
(clause 37) auto negotiation needs to be handled differently.
This patch enables 1000Base-X support for BCM54616S by customizing 3
driver callbacks, and it's verified to be working on Facebook CMM BMC
platform (RGMII->1000Base-KX):
- probe: probe callback detects PHY's operation mode based on
INTERF_SEL[1:0] pins and 1000X/100FX selection bit in SerDES 100-FX
Control register.
- config_aneg: calls genphy_c37_config_aneg when the PHY is running in
1000Base-X mode; otherwise, genphy_config_aneg will be called.
- read_status: calls genphy_c37_read_status when the PHY is running in
1000Base-X mode; otherwise, genphy_read_status will be called.
Note: BCM54616S PHY can also be configured in RGMII->100Base-FX mode, and
100Base-FX support is not available as of now.
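The per-mode dispatch of the two callbacks can be summarised in a small sketch. pick_callbacks() is a hypothetical helper that only returns the names of the generic helpers mentioned above; it does not model the probe-time mode detection.

```python
def pick_callbacks(mode_1000bx):
    """Return the generic helper names used for the detected operation mode."""
    if mode_1000bx:
        return {
            "config_aneg": "genphy_c37_config_aneg",
            "read_status": "genphy_c37_read_status",
        }
    return {
        "config_aneg": "genphy_config_aneg",
        "read_status": "genphy_read_status",
    }
```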
Tao Ren [Tue, 22 Oct 2019 18:31:06 +0000 (11:31 -0700)]
net: phy: modify assignment to OR for dev_flags in phy_attach_direct
Modify the assignment to OR when dealing with phydev->dev_flags in
phy_attach_direct function, and this is to make sure dev_flags set in
driver's probe callback won't be lost.
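The effect of the change is easy to see on plain integers. The two flag values and both function names below are hypothetical and exist only for illustration.

```python
PROBE_FLAG = 0x1   # hypothetical flag set by the driver's probe callback
ATTACH_FLAG = 0x2  # hypothetical flag passed in at attach time


def attach_assign(dev_flags, flags):
    """Old behaviour: plain assignment discards what probe stored."""
    return flags


def attach_or(dev_flags, flags):
    """New behaviour: OR keeps the probe-time flags and adds the new ones."""
    return dev_flags | flags
```

Starting from PROBE_FLAG, attach_assign() yields only ATTACH_FLAG, while attach_or() yields both bits.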
====================
The dsa_switch structure represents the physical switch device itself,
and is allocated by the driver. The dsa_switch_tree and dsa_port structures
represent the logical switch fabric (eventually composed of multiple switch
devices) and its ports, and are allocated by the DSA core.
This branch lists the logical ports directly in the fabric which simplifies
the iteration over all ports when assigning the default CPU port or configuring
the D in DSA in drivers like mv88e6xxx.
This also removes the unique dst->cpu_dp pointer and is a first step towards
supporting multiple CPU ports and dropping the DSA_MAX_PORTS limitation.
Because the dsa_port structures are not tied to the dsa_switch structure
anymore, we do not need to provide a helper for the drivers to allocate a
switch structure. Like in many other subsystems, drivers can now embed their
dsa_switch structure as they wish into their private structure. This will
be particularly interesting for the Broadcom drivers which are currently
limited by the dynamically allocated array of DSA ports.
The series implements the list of dsa_port structures, makes use of it,
then drops dst->cpu_dp and the dsa_switch_alloc helper.
====================
Vivien Didelot [Mon, 21 Oct 2019 20:51:30 +0000 (16:51 -0400)]
net: dsa: remove dsa_switch_alloc helper
Now that ports are dynamically listed in the fabric, there is no need
to provide a special helper to allocate the dsa_switch structure. This
will give more flexibility to drivers to embed this structure as they
wish in their private structure.
Vivien Didelot [Mon, 21 Oct 2019 20:51:28 +0000 (16:51 -0400)]
net: dsa: sja1105: register switch before assigning port private data
Like the dsa_switch_tree structures, the dsa_port structures will be
allocated on switch registration.
The SJA1105 driver is the only one accessing the dsa_port structure
after the switch allocation and before the switch registration.
For that reason, move switch registration prior to assigning the priv
member of the dsa_port structures.
Vivien Didelot [Mon, 21 Oct 2019 20:51:27 +0000 (16:51 -0400)]
net: dsa: mv88e6xxx: use ports list to map bridge
Instead of digging into the other dsa_switch structures of the fabric
and relying too much on the dsa_to_port helper, use the new list
of switch fabric ports to remap the Port VLAN Map of local bridge
group members or remap the Port VLAN Table entry of external bridge
group members.
Vivien Didelot [Mon, 21 Oct 2019 20:51:26 +0000 (16:51 -0400)]
net: dsa: mv88e6xxx: use ports list to map port VLAN
Instead of digging into the other dsa_switch structures of the fabric
and relying too much on the dsa_to_port helper, use the new list of
switch fabric ports to define the mask of the local ports allowed to
receive frames from another port of the fabric.
Vivien Didelot [Mon, 21 Oct 2019 20:51:16 +0000 (16:51 -0400)]
net: dsa: add ports list in the switch fabric
Add a list of switch ports within the switch fabric. This will help the
lookup of a port inside the whole fabric, and it is the first step
towards supporting multiple CPU ports, before deprecating the usage of
the unique dst->cpu_dp pointer.
In preparation for a future allocation of the dsa_port structures,
return -ENOMEM in case no structure is returned, even though this
error cannot be reached yet.
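A fabric-wide ports list might look like the following sketch. All class, field and method names are hypothetical and only mirror the idea of one flat list that is searched by switch and port index; they are not the kernel structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Port:
    switch_index: int
    port_index: int
    is_cpu: bool = False


@dataclass
class SwitchTree:
    ports: List[Port] = field(default_factory=list)

    def find_port(self, switch_index, port_index) -> Optional[Port]:
        """Look a port up anywhere in the fabric."""
        for p in self.ports:
            if p.switch_index == switch_index and p.port_index == port_index:
                return p
        return None

    def add_port(self, switch_index, port_index, is_cpu=False) -> Port:
        """Return the existing entry, or append a new one to the list."""
        existing = self.find_port(switch_index, port_index)
        if existing is not None:
            return existing
        port = Port(switch_index, port_index, is_cpu)
        self.ports.append(port)
        return port

    def cpu_ports(self) -> List[Port]:
        """All CPU ports of the fabric, not just a single one."""
        return [p for p in self.ports if p.is_cpu]
```

Because cpu_ports() returns a list, nothing in this layout assumes a single CPU port, which is the direction the series describes.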
====================
The attempt to improve performance by changing the PCIe max read request
size was added in the vendor driver more than 10 years back and copied
to r8169 driver. In the vendor driver this has been removed long ago.
Obviously it had no effect, also in my tests I didn't see any
difference. Typically the max payload size is less than 512 bytes
anyway, and the PCI core takes care that the maximum supported value
is set. So let's remove fiddling with PCIe max read request size from
r8169 too. This change allows simplifying the driver in the subsequent
three patches of this series.
====================
Heiner Kallweit [Mon, 21 Oct 2019 19:24:15 +0000 (21:24 +0200)]
r8169: remove rtl_hw_start_8168bef
We can remove rtl_hw_start_8168bef() and use rtl_hw_start_8168b()
instead because setting register Config4 is done in
rtl_jumbo_config(), being called from rtl_hw_start().
Heiner Kallweit [Mon, 21 Oct 2019 19:22:42 +0000 (21:22 +0200)]
r8169: simplify setting PCI_EXP_DEVCTL_NOSNOOP_EN
r8168b_0_hw_jumbo_enable() and r8168b_0_hw_jumbo_disable() both do the
same and just set PCI_EXP_DEVCTL_NOSNOOP_EN. We can simplify the code
by moving this setting for RTL8168B to rtl_hw_start_8168().
Heiner Kallweit [Mon, 21 Oct 2019 19:22:07 +0000 (21:22 +0200)]
r8169: remove fiddling with the PCIe max read request size
The attempt to improve performance by changing the PCIe max read request
size was added in the vendor driver more than 10 years back and copied
to r8169 driver. In the vendor driver this has been removed long ago.
Obviously it had no effect, also in my tests I didn't see any
difference. Typically the max payload size is less than 512 bytes
anyway, and the PCI core takes care that the maximum supported value
is set. So let's remove fiddling with PCIe max read request size from
r8169 too.
Ursula Braun [Mon, 21 Oct 2019 14:13:15 +0000 (16:13 +0200)]
net/smc: remove close abort worker
With the introduction of the link group termination worker there is
no longer a need to postpone smc_close_active_abort() to a worker.
To protect socket destruction due to normal and abnormal socket
closing, the socket refcount is increased.
Ursula Braun [Mon, 21 Oct 2019 14:13:13 +0000 (16:13 +0200)]
net/smc: improve abnormal termination of link groups
If a link group and its connections must be terminated,
* wake up socket waiters
* do not enable buffer reuse
A linkgroup might be terminated while normal connection closing
is running. Avoid buffer reuse and its related LLC DELETE RKEY
call, if linkgroup termination has started. And use the earliest
indication of linkgroup termination possible, namely the removal
from the linkgroup list.
Ursula Braun [Mon, 21 Oct 2019 14:13:12 +0000 (16:13 +0200)]
net/smc: tell peers about abnormal link group termination
There are lots of link group termination scenarios. Most of them
still allow the peer of the terminating sockets to be informed about the
abort. This patch tries to call smc_close_abort() for terminating sockets,
and the internal TCP socket is reset with tcp_abort().
Ursula Braun [Mon, 21 Oct 2019 14:13:11 +0000 (16:13 +0200)]
net/smc: improve link group freeing
Link groups are usually freed with a delay to enable quick connection
creation for a follow-on SMC socket. Terminated link groups are
freed faster. This patch makes sure that a fast schedule of link group
freeing is not overridden by a delayed schedule. It also makes sure
link group freeing is not rescheduled if the real freeing is already
running.
Ursula Braun [Mon, 21 Oct 2019 14:13:10 +0000 (16:13 +0200)]
net/smc: improve abnormal termination locking
Locking hierarchy requires that the link group conns_lock can be
taken if the socket lock is held, but not vice versa. Nevertheless
socket termination during abnormal link group termination should
be protected by the socket lock.
This patch reduces the time segments the link group conns_lock is
held to enable usage of lock_sock in smc_lgr_terminate().
Ursula Braun [Mon, 21 Oct 2019 14:13:09 +0000 (16:13 +0200)]
net/smc: terminate link group without holding lgr lock
When a link group is to be terminated, it is sufficient to hold
the lgr lock when unlinking the link group from its list.
Move the lock-protected link group unlinking into smc_lgr_terminate().
Ursula Braun [Mon, 21 Oct 2019 14:13:08 +0000 (16:13 +0200)]
net/smc: cancel send and receive for terminated socket
The resources for a terminated socket are being cleaned up.
This patch makes sure
* no more data is received for an actively terminated socket
* no more data is sent for an actively or passively terminated socket
Jakub Kicinski [Tue, 22 Oct 2019 17:31:54 +0000 (10:31 -0700)]
Merge branch 'mlxsw-core-extend-qsfp-eeprom-size'
Ido Schimmel says:
====================
Vadim says:
This patch set extends the size of QSFP EEPROM for the cable types
SFF-8436 and SFF-8636 from 256 bytes to 640 bytes. This allows ethtool
to show correct information for these cable types (more details below).
Patch #1 adds a macro that computes the EEPROM page number from the
provided offset specified in the request.
Patch #2 teaches the driver to access the information stored in the
upper pages of the QSFP memory map.
Details and examples:
SFF-8436 specification defines pages 0, 1, 2 and 3. Page 0 contains
lower memory page offsets (from 0x00 to 0x7f) and upper page offsets
(from 0x80 to 0xfe). Upper pages 1, 2 and 3 are optional and can be
empty.
Page 1 is provided if upper page 0 byte 0xc3 bit 6 is set.
Page 2 is provided if upper page 0 byte 0xc3 bit 7 is set.
Page 3 is provided if lower page 0 byte 0x02 bit 2 is cleared.
Offset 0xc3 for the upper page is provided as 0x43 = 0xc3 - 0x80.
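The page layout and the presence bits above can be sketched as follows. The flat layout (256 bytes for page 0, then 128 bytes for each of upper pages 1 to 3) is an assumption that matches the 640-byte total and the page 3 offset of 0x0200 quoted in this cover letter; the function names are hypothetical and do not reproduce the driver's macro.

```python
PAGE0_LEN = 256       # lower page 0 (0x00-0x7f) plus upper page 0 (0x80-0xff)
UPPER_PAGE_LEN = 128  # each of the optional upper pages 1, 2 and 3
MAX_LEN = 640         # PAGE0_LEN + 3 * UPPER_PAGE_LEN


def page_of(offset):
    """Map a flat EEPROM offset to the page number that holds it."""
    if not 0 <= offset < MAX_LEN:
        raise ValueError("offset outside the 640-byte map")
    if offset < PAGE0_LEN:
        return 0
    return (offset - PAGE0_LEN) // UPPER_PAGE_LEN + 1


def optional_pages(page0):
    """List the optional pages advertised by the first 256 bytes."""
    pages = []
    if page0[0xC3] & (1 << 6):        # upper page 0, byte 0xc3, bit 6 set
        pages.append(1)
    if page0[0xC3] & (1 << 7):        # upper page 0, byte 0xc3, bit 7 set
        pages.append(2)
    if not page0[0x02] & (1 << 2):    # lower page 0, byte 0x02, bit 2 cleared
        pages.append(3)
    return pages
```

Note that the page 3 test is inverted relative to the other two: the page is present when the bit is cleared.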
As a result of exposing 256 bytes only, ethtool shows wrong information
for pages 1, 2 and 3. In the below hex dump from ethtool for a cable
compliant to SFF-8636 specification, it can be seen that EEPROM of this
device contains optical diagnostic page (lower page 0 byte 0x02 bit 2 is
cleared), but it is not exposed, as the length defined for this type is
256 bytes.
After changing the length returned by get_module_info() callback from
256 bytes to 640 bytes, the upper pages 1, 2 and 3 are exposed by
ethtool. In the below hex dump from the same cable it can be seen that
the optical diagnostic page (page 3, from offset 0x0200) has non-zero
data.
And 'ethtool -m sfp42' shows the real values for the below fields, while
before it exposed zeros for these fields:
Laser bias current high alarm threshold : 8.500 mA
Laser bias current low alarm threshold : 5.492 mA
Laser bias current high warning threshold : 8.000 mA
Laser bias current low warning threshold : 6.000 mA
Laser output power high alarm threshold : 3.4673 mW / 5.40 dBm
Laser output power low alarm threshold : 0.0724 mW / -11.40 dBm
Laser output power high warning threshold : 1.7378 mW / 2.40 dBm
Laser output power low warning threshold : 0.1445 mW / -8.40 dBm
Module temperature high alarm threshold : 80.00 degrees C / 176.00 F
Module temperature low alarm threshold : -10.00 degrees C / 14.00 F
Module temperature high warning threshold : 70.00 degrees C / 158.00 F
Module temperature low warning threshold : 0.00 degrees C / 32.00 F
Module voltage high alarm threshold : 3.5000 V
Module voltage low alarm threshold : 3.1000 V
====================
Vadim Pasternak [Mon, 21 Oct 2019 10:30:31 +0000 (13:30 +0300)]
mlxsw: core: Extend QSFP EEPROM size for ethtool
Extend the size of QSFP EEPROM for the cable types SFF8436 and SFF8636
from 256 to 640 bytes in order to expose all the EEPROM pages by
ethtool.
For SFF-8636 and SFF-8436 specifications, the driver exposes 256 bytes
of data for ethtool's get_module_eeprom() callback. This is because the
driver uses the below defines to specify SFF module length in ethtool's
get_module_info() callback:
'ETH_MODULE_SFF_8636_LEN' and 'ETH_MODULE_SFF_8436_LEN' (both are 256).
As a result of exposing 256 bytes only, ethtool shows wrong "zero" info
for pages 1, 2, 3.
The patch changes the length returned by callback for get_module_info()
to the values from the next defines: 'ETH_MODULE_SFF_8636_MAX_LEN' and
'ETH_MODULE_SFF_8436_MAX_LEN' (both are 640) to allow exposing of upper
page 1, 2 and 3.
Juergen Gross [Mon, 21 Oct 2019 05:30:52 +0000 (07:30 +0200)]
xen/netback: cleanup init and deinit code
Do some cleanup of the netback init and deinit code:
- add an omnipotent queue deinit function usable from
xenvif_disconnect_data() and the error path of xenvif_connect_data()
- only install the irq handlers after initializing all relevant items
(especially the kthreads related to the queue)
- there is no need to use get_task_struct() after creating a kthread
and using put_task_struct() again after having stopped it.
- use kthread_run() instead of kthread_create() to spare the call of
wake_up_process().
Jakub Kicinski [Tue, 22 Oct 2019 03:16:12 +0000 (20:16 -0700)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2019-10-21
This series contains updates to e1000e and igc only.
Sasha adds stream control transmission protocol (SCTP) CRC checksum
support for igc. Also added S0ix support to the e1000e driver. Then
added multicast support by adding the address list to the MTA table and
providing the option for IPv6 address for igc. In addition, added
receive checksum support to igc as well. Lastly, cleaned up some code
that was not fully implemented yet for the VLAN filter table array.
v2: Dropped patch 1 & 2 from the original series. Patch 1 is being sent
to 'net' tree as a fix and patch 2 implementation needs to be
re-worked. Updated the patch to add support for S0ix to fix the
reverse Xmas tree issues and made the entry/exit functions void
since they constantly returned success. All based on community
feedback.
v3: Cleaned up patch 4 of the series based on feedback from the
community. Cleaned up a stray comma in a code comment and removed
the 'inline' of a function that would be inlined by the compiler
anyways.
====================