mac80211: Use proper smps_mode enum in sta opmode event
SMPS_MODE change value notified via nl80211 contains mac80211
specific value(ieee80211_smps_mode) and user space application
will not know those values. This patch add support to map
the mac80211 enum value to nl80211_smps_mode which will be
understood by the userspace application.
cfg80211: fix data type of sta_opmode_info parameter
Currently bw and smps_mode are u8 type value in sta_opmode_info
structure. This values filled in mac80211 from ieee80211_sta_rx_bandwidth
and ieee80211_smps_mode. These enum values are specific to mac80211 and
userspace/cfg80211 doesn't know about that. This will lead to incorrect
result/assumption by the user space application.
Change bw and smps_mode parameters to their respective enums in nl80211.
mac80211: notify driver for change in multicast rates
With drivers implementing rate control in driver or firmware
rate_control_send_low() may not get called, and thus the
driver needs to know about changes in the multicast rate.
Dmitry Lebed [Thu, 1 Mar 2018 09:39:15 +0000 (12:39 +0300)]
cfg80211/nl80211: add DFS offload flag
Add wiphy EXT_FEATURE flag to indicate that HW or driver does
all DFS actions by itself.
User-space functionality already implemented in hostapd using
vendor-specific (QCA) OUI to advertise DFS offload support.
Need to introduce generic flag to inform about DFS offload support.
For devices with DFS_OFFLOAD flag set user-space will no longer
need to issue CAC or do any actions in response to
"radar detected" events. HW will do everything by itself and send
events to user-space to indicate that CAC was started/finished, etc.
mac80211_hwsim: fix use-after-free bug in hwsim_exit_net
When destroying a net namespace, all hwsim interfaces, which are not
created in default namespace are deleted. But the async deletion of the
interfaces could last longer than the actual destruction of the
namespace, which results to an use after free bug. Therefore use
synchronous deletion in this case.
Fixes: 100cb9ff40e0 ("mac80211_hwsim: Allow managing radios from non-initial namespaces") Reported-by: [email protected] Signed-off-by: Benjamin Beichler <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
David S. Miller [Tue, 20 Mar 2018 16:29:58 +0000 (12:29 -0400)]
Merge branch 'dsa-mv88e6xxx-some-fixes'
Uwe Kleine-König says:
====================
net: dsa: mv88e6xxx: some fixes
these patches target net-next and got approved by Andrew Lunn.
Compared to (implicit) v1, I dropped the patch that I didn't know if it
was right because of missing documentation on my side. But Andrew
already cared for that in a patch that is now adfccf118211 in net-next.
====================
The switch name is emitted in the kernel log, so having the right name
there is nice.
Fixes: 1558727a1c1b ("net: dsa: mv88e6xxx: Add support for ethernet switch 88E6141") Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Uwe Kleine-König <[email protected]> Signed-off-by: David S. Miller <[email protected]>
====================
mlxsw: Adapt driver to upcoming firmware versions
The first two patches make sure that reserved fields are set to zero, as
required by the device's programmer's reference manual (PRM).
Last two patches prevent the driver from performing an invalid operation
that is going to be denied by upcoming firmware versions.
====================
Ido Schimmel [Mon, 19 Mar 2018 07:51:03 +0000 (09:51 +0200)]
mlxsw: spectrum_acl: Do not invalidate already invalid ACL groups
When a new ACL group is created its region (ACL) list is initially
empty. Thus, the call to mlxsw_sp_acl_tcam_group_update() would
basically invalidate an already invalid (non-existent) group.
Remove the unnecessary call and make the function symmetric to its del()
counterpart.
Matthew Wilcox [Thu, 15 Mar 2018 02:57:24 +0000 (19:57 -0700)]
mlx5: Remove call to ida_pre_get
The mlx5 driver calls ida_pre_get() in a loop for no readily apparent
reason. The driver uses ida_simple_get() which will call ida_pre_get()
by itself and there's no need to use ida_pre_get() unless using
ida_get_new().
David S. Miller [Mon, 19 Mar 2018 19:12:03 +0000 (15:12 -0400)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2018-03-19
This series contains updates to i40e and i40evf only.
Alex fixes a potential deadlock in the configure_clsflower function in
i40evf, where we exit with the "IN_CRITICAL_TASK" bit set while
notifying the PF of flower filters.
Jan fixed an issue where it was possible to set a mode that is not
allowed which resulted in link being down, so fixed the parity between
i40e_set_link_ksettings() and i40e_get_link_ksettings().
Patryk fixes a bug where a backplane device was allowing the setting of
link settings, which is not allowed.
Shiraz fixes a crash when entering S3 because the client interface was
freeing the MSIx vectors while they are still in use.
Jake fixes up a function header comment to document a newly added
parameter. Also cleaned up flags that were never used.
Doug fixes the incorrect return type for i40e_aq_add_cloud_filters().
====================
Paweł Jabłoński [Mon, 19 Mar 2018 16:28:04 +0000 (09:28 -0700)]
i40e: Fix the polling mechanism of GLGEN_RSTAT.DEVSTATE
This fixes the polling mechanism of GLGEN_RSTAT.DEVSTATE in the
PF Reset path when Global Reset is in progress. While the driver
is polling for the end of the PF Reset and the Global Reset is
triggered, abandon the PF Reset path and prepare for the
upcoming Global Reset.
Jacob Keller [Mon, 19 Mar 2018 16:28:04 +0000 (09:28 -0700)]
i40e: add doxygen comment for new mode parameter
A recent patch updated the signature for i40e_aq_set_switch_config() to
add a new parameter 'mode'. It forgot to document the parameter in the
doxygen function header comment. Add the parameter to the function
description now.
The i40e_set_link_ksettings and i40e_get_link_ksettings use different
codepaths to check available and supported advertisement modes. This
creates scenarios where it's possible to set a mode that's not allowed,
resulting in a link down.
Fix setting advertisement in i40e_set_link_ksettings by calling
i40e_get_link_ksettings to check what modes are allowed.
Alexander Duyck [Mon, 19 Mar 2018 16:28:03 +0000 (09:28 -0700)]
i40evf: Reorder configure_clsflower to avoid deadlock on error
While doing some code review I noticed that we can get into a state where
we exit with the "IN_CRITICAL_TASK" bit set while notifying the PF of
flower filters. This patch is meant to address that plus tweak the ordering
of the while loop waiting on it slightly so that we don't wait an extra
period after we have failed for the last time.
David S. Miller [Sun, 18 Mar 2018 20:52:59 +0000 (16:52 -0400)]
Merge branch 'mv88e6xxx-auto-phy-intr'
Andrew Lunn says:
====================
Automatic PHY interrupts
Now that the mv88e6xxx driver either installs in interrupt handler, or
polls for interrupts, it is possible to always handle PHY interrupts,
rather than have phylib perform the polling. This speeds up detection
of link changes and reduces the load on the MDIO bus, which is
beneficial for PTP.
====================
Andrew Lunn [Sat, 17 Mar 2018 19:32:05 +0000 (20:32 +0100)]
net: dsa: mv88e6xxx: Add MDIO interrupts for internal PHYs
When registering an MDIO bus, it is possible to pass an array of
interrupts, one per address on the bus. phylib will then associate the
interrupt to the PHY device, if no other interrupt is provided.
Some of the global2 interrupts are PHY interrupts. Place them into the
MDIO bus structure.
Andrew Lunn [Sat, 17 Mar 2018 19:32:04 +0000 (20:32 +0100)]
net: dsa: mv88e6xxx: Add number of internal PHYs
Add to the info structure the number of internal PHYs, if they generate
interrupts. Some of the older generations of switches have internal
PHYs, but no interrupt registers. In this case, set the count to zero.
Andrew Lunn [Sat, 17 Mar 2018 19:32:03 +0000 (20:32 +0100)]
net: dsa: mv88e6xxx: Add missing g1 IRQ numbers
With the recent change to polling for interrupts, it is important that
the number of global 1 interrupts is listed. Without it, the driver
requests an interrupt domain for zero interrupts, which returns
EINVAL, and the probe fails.
mv88e6xxx_get_ethtool_stats() calls mv88e6xxx_get_stats() which calls both
chip->info->ops->stats_get_stats(), which holds the register lock, and
chip->info->ops->serdes_get_stats() which does not. Have
chip->info->ops->serdes_get_stats() be running with the register lock held to
avoid such assertions.
Fixes: 436fe17d273b ("net: dsa: mv88e6xxx: Allow the SERDES interfaces to have statistics") Signed-off-by: Florian Fainelli <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Andrew Lunn [Sat, 17 Mar 2018 19:21:09 +0000 (20:21 +0100)]
net: dsa: mv88e6xxx: Fix IRQ when loading module
Handle polled interrupts correctly when loading the module.
Signed-off-by: Andrew Lunn <[email protected]> Fixes: 294d711ee8c0 ("net: dsa: mv88e6xxx: Poll when no interrupt defined") Signed-off-by: David S. Miller <[email protected]>
====================
selftests: pmtu: Add further vti/vti6 MTU and PMTU tests
Patches 5/10 to 10/10 add tests to verify default MTU assignment
for vti4 and vti6 interfaces, to check that MTU values set on new
link and link changes are properly taken and validated, and to
verify PMTU exceptions on vti4 interfaces.
Patch 1/10 reverses function return codes as suggested by David
Ahern.
Patch 2/10 fixes the helper to fetch exceptions MTU to run in the
passed namespace.
Patches 3/10 and 4/10 are preparation work to make it easier to
introduce those tests.
v2: Reverse return codes, and make output prettier in 4/9 by
using padded printf, test descriptions and buffered error
strings. Remove accidental output to /dev/kmsg from 10/10
(was 9/9).
====================
Stefano Brivio [Sat, 17 Mar 2018 01:31:47 +0000 (02:31 +0100)]
selftests: pmtu: Add pmtu_vti6_link_change_mtu test
This test checks that MTU configured from userspace is used on
link creation and changes, and that when it's not passed from
userspace, it's calculated properly from the MTU of the lower
layer.
Stefano Brivio [Sat, 17 Mar 2018 01:31:42 +0000 (02:31 +0100)]
selftests: pmtu: Add pmtu_vti4_default_mtu test
This test checks that the MTU assigned by default to a vti (IPv4)
interface created on top of veth is simply veth's MTU minus the
length of the encapsulated IPv4 header.
Stefano Brivio [Sat, 17 Mar 2018 01:31:41 +0000 (02:31 +0100)]
selftests: pmtu: Introduce support for multiple tests
Introduce list of tests and their descriptions, and loop on it
in main body.
Tests will now just take care of calling setup with a list of
"units" they need, and return 0 on success, 1 on failure, 2 if
the test had to be skipped.
Main script body will take care of displaying results and
cleaning up after every test. Introduce guard variable so that
we don't clean up twice in case of interrupts or unexpected
failures.
The pmtu_vti6_exception test can now run its third step even if
the previous one failed, as we can return values from it.
Also introduce support to display test descriptions, and display
aligned OK/FAIL/SKIP test outcomes. Buffer error strings so that
in case of failure we can display them right under the outcome
for each test.
Stefano Brivio [Sat, 17 Mar 2018 01:31:39 +0000 (02:31 +0100)]
selftests: pmtu: Use namespace command prefix to fetch route mtu
In 7af137b72131 ("selftests: net: Introduce first PMTU test") I
accidentally assumed route_get_* helpers would run from a single
namespace. Make them a bit more generic, by passing the
namespace command prefix as a parameter instead.
Fixes: 7af137b72131 ("selftests: net: Introduce first PMTU test") Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: David S. Miller <[email protected]>
====================
ibmvnic: Update TX pool and TX routines
This patch restructures the TX pool data structure and provides a
separate TX pool array for TSO transmissions. This is already used
in some way due to our unique DMA situation, namely that we cannot
use single DMA mappings for packet data. Previously, both buffer
arrays used the same pool entry. This restructuring allows for
some additional cleanup in the driver code, especially in some
places in the device transmit routine.
In addition, it allows us to more easily track the consumer
and producer indexes of a particular pool. This has been
further improved by better tracking of in-use buffers to
prevent possible data corruption in case an invalid buffer
entry is used.
v5: Fix bisectability mistake in the first patch. Removed
TSO-specific data in a later patch when it is no longer used.
v4: Fix error in 7th patch that causes an oops by using
the older fixed value for number of buffers instead
of the respective field in the tx pool data structure
v3: Forgot to update TX pool cleaning function to handle new data
structures. Included 7th patch for that.
v2: Fix typo in 3/6 commit subject line
====================
Thomas Falcon [Sat, 17 Mar 2018 01:00:31 +0000 (20:00 -0500)]
ibmvnic: Remove unused TSO resources in TX pool structure
Finally, remove the TSO-specific fields in the TX pool
strcutures. These are no longer needed with the introduction
of separate buffer pools for TSO transmissions.
Thomas Falcon [Sat, 17 Mar 2018 01:00:30 +0000 (20:00 -0500)]
ibmvnic: Update TX pool cleaning routine
Update routine that cleans up any outstanding transmits that
have not received completions when the device needs to close.
Introduces a helper function that cleans one TX pool to make
code more readable.
Thomas Falcon [Sat, 17 Mar 2018 01:00:29 +0000 (20:00 -0500)]
ibmvnic: Improve TX buffer accounting
Improve TX pool buffer accounting to prevent the producer
index from overruning the consumer. First, set the next free
index to an invalid value if it is in use. If next buffer
to be consumed is in use, drop the packet.
Finally, if the transmit fails for some other reason, roll
back the consumer index and set the free map entry to its original
value. This should also be done if the DMA map fails.
Thomas Falcon [Sat, 17 Mar 2018 01:00:28 +0000 (20:00 -0500)]
ibmvnic: Update TX and TX completion routines
Update TX and TX completion routines to account for TX pool
restructuring. TX routine first chooses the pool depending
on whether a packet is GSO or not, then uses it accordingly.
For the completion routine to know which pool it needs to use,
set the most significant bit of the correlator index to one
if the packet uses the TSO pool. On completion, unset the bit
and use the correlator index to release the buffer pool entry.
Thomas Falcon [Sat, 17 Mar 2018 01:00:25 +0000 (20:00 -0500)]
ibmvnic: Update and clean up reset TX pool routine
Update TX pool reset routine to accommodate new TSO pool array. Introduce
a function that resets one TX pool, and use that function to initialize
each pool in both pool arrays.
Thomas Falcon [Sat, 17 Mar 2018 01:00:24 +0000 (20:00 -0500)]
ibmvnic: Generalize TX pool structure
Remove some unused fields in the structure and include values
describing the individual buffer size and number of buffers in
a TX pool. This allows us to use these fields for TX pool buffer
accounting as opposed to using hard coded values. Include a new
pool array for TSO transmissions.
In VLAN_AWARE mode CPSW can insert VLAN header encapsulation word on Host
port 0 egress (RX) before the packet data if RX_VLAN_ENCAP bit is set in
CPSW_CONTROL register. VLAN header encapsulation word has following format:
This feature can be used to implement TX VLAN offload in case of
VLAN-tagged packets and to insert VLAN tag in case Non-tagged packet was
received on port with PVID set. As per documentation, CPSW never modifies
packet data on Host egress (RX) and as result, without this feature
enabled, Host port will not be able to receive properly packets which
entered switch non-tagged through external Port with PVID set (when
non-tagged packet forwarded from external Port with PVID set to another
external Port - packet will be VLAN tagged properly).
Implementation details:
- on RX driver will check CPDMA status bit RX_VLAN_ENCAP BIT(19) in CPPI
descriptor to identify when VLAN header encapsulation word is present.
- PKT_Type = 0x01 or 0x02 then ignore VLAN header encapsulation word and
pass packet as is;
- if HDR_PKT_Vid = 0 then ignore VLAN header encapsulation word and pass
packet as is;
- In dual mac mode traffic is separated between ports using default port
vlans, which are not be visible to Host and so should not be reported.
Hence, check for default port vlans in dual mac mode and ignore VLAN header
encapsulation word;
- otherwise fill SKB with VLAN info using __vlan_hwaccel_put_tag();
- PKT_Type = 0x00 (VLAN-tagged) then strip out VLAN header from SKB.
Arjun Vynipadath [Thu, 15 Mar 2018 12:04:14 +0000 (17:34 +0530)]
cxgb4: Fix queue free path of ULD drivers
Setting sge_uld_rxq_info to NULL in free_queues_uld().
We are referencing sge_uld_rxq_info in cxgb_up(). This
will fix a panic when interface is brought up after a
ULDq creation failure.
Fixes: 94cdb8bb993a (cxgb4: Add support for dynamic allocation
of resources for ULD) Signed-off-by: Arjun Vynipadath <[email protected]> Signed-off-by: Casey Leedom <[email protected]> Signed-off-by: Ganesh Goudhar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Sowmini Varadhan [Thu, 15 Mar 2018 10:54:26 +0000 (03:54 -0700)]
rds: tcp: must use spin_lock_irq* and not spin_lock_bh with rds_tcp_conn_lock
rds_tcp_connection allocation/free management has the potential to be
called from __rds_conn_create after IRQs have been disabled, so
spin_[un]lock_bh cannot be used with rds_tcp_conn_lock.
Bottom-halves that need to synchronize for critical sections protected
by rds_tcp_conn_lock should instead use rds_destroy_pending() correctly.
Reported-by: [email protected] Fixes: ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
netns/module teardown and rds connection/workq management") Signed-off-by: Sowmini Varadhan <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Sat, 17 Mar 2018 21:11:47 +0000 (17:11 -0400)]
Merge branch 'tipc-obsolete-zone-concept'
Jon Maloy says:
====================
tipc: obsolete zone concept
Functionality related to the 'zone' concept was never implemented in
TIPC. In this series we eliminate the remaining traces of it in the
code, and can hence take a first step in reducing the footprint and
complexity of the binding table.
====================
Jon Maloy [Thu, 15 Mar 2018 15:48:55 +0000 (16:48 +0100)]
tipc: some name changes
We rename some lists and fields in struct publication both to make
the naming more consistent and to better reflect their roles. We
also update the descriptions of those lists.
Jon Maloy [Thu, 15 Mar 2018 15:48:54 +0000 (16:48 +0100)]
tipc: merge two lists in struct publication
The size of struct publication can be reduced further. Membership in
lists 'nodesub_list' and 'local_list' is mutually exlusive, in that
remote publications use the former and local publications the latter.
We replace the two lists with one single, named 'binding_node' which
reflects what it really is.
Jon Maloy [Thu, 15 Mar 2018 15:48:53 +0000 (16:48 +0100)]
tipc: remove zone_list member in struct publication
As a further consequence of the previous commits, we can also remove
the member 'zone_list 'in struct name_info and struct publication.
Instead, we now let the member cluster_list take over the role a
container of all publications of a given <type,lower, upper>.
We also remove the counters for the size of those lists, since
they don't serve any purpose.
Jon Maloy [Thu, 15 Mar 2018 15:48:52 +0000 (16:48 +0100)]
tipc: remove zone publication list in name table
As a consequence of the previous commit we nan now eliminate zone scope
related lists in the name table. We start with name_table::publ_list[3],
which can now be replaced with two lists, one for node scope publications
and one for cluster scope publications.
Jon Maloy [Thu, 15 Mar 2018 15:48:51 +0000 (16:48 +0100)]
tipc: obsolete TIPC_ZONE_SCOPE
Publications for TIPC_CLUSTER_SCOPE and TIPC_ZONE_SCOPE are in all
aspects handled the same way, both on the publishing node and on the
receiving nodes.
Despite previous ambitions to the contrary, this is never going to change,
so we take the conseqeunce of this and obsolete TIPC_ZONE_SCOPE and related
macros/functions. Whenever a user is doing a bind() or a sendmsg() attempt
using ZONE_SCOPE we translate this internally to CLUSTER_SCOPE, while we
remain compatible with users and remote nodes still using ZONE_SCOPE.
Furthermore, the non-formalized scope value 0 has always been permitted
for use during lookup, with the same meaning as ZONE_SCOPE/CLUSTER_SCOPE.
We now permit it even as binding scope, but for compatibility reasons we
choose to not change the value of TIPC_CLUSTER_SCOPE.
this series continues to review and to convert pernet_operations
to make them possible to be executed in parallel for several
net namespaces at the same time. There are different operations
over the tree, mostly are ipvs.
====================
Kirill Tkhai [Thu, 15 Mar 2018 09:11:44 +0000 (12:11 +0300)]
net: Convert ip_vs_ftp_ops
These pernet_operations register and unregister ipvs app.
register_ip_vs_app(), unregister_ip_vs_app() and
register_ip_vs_app_inc() modify per-net structures,
and there are no global structures touched. So,
this looks safe to be marked as async.
Kirill Tkhai [Thu, 15 Mar 2018 09:11:25 +0000 (12:11 +0300)]
net: Convert ipvs_core_ops
These pernet_operations register and unregister nf hooks,
/proc entries, sysctl, percpu statistics. There are several
global lists, and the only list modified without exclusive
locks is ip_vs_conn_tab in ip_vs_conn_flush(). We iterate
the list and force the timers expire at the moment. Since
there were possible several timer expirations before this
patch, and since they are safe, the patch does not invent
new parallelism of their destruction. These pernet_operations
look safe to be converted.
Kirill Tkhai [Thu, 15 Mar 2018 09:11:16 +0000 (12:11 +0300)]
net: Convert ovs_net_ops
These pernet_operations initialize and destroy net_generic()
data pointed by ovs_net_id. Exit method destroys vports from
alive net to exiting net. Since they are only pernet_operations
interested in this data, and exit method is executed under
exclusive global lock (ovs_mutex), they are safe to be executed
in parallel.
Kirill Tkhai [Thu, 15 Mar 2018 09:11:06 +0000 (12:11 +0300)]
net: Convert mpls_net_ops
These pernet_operations register and unregister sysctl table.
Exit methods frees platform_labels from net::mpls::platform_label.
Everything is per-net, and they looks safe to be marked async.
Yousuk Seung [Fri, 16 Mar 2018 17:51:49 +0000 (10:51 -0700)]
net-tcp_bbr: set tp->snd_ssthresh to BDP upon STARTUP exit
Set tp->snd_ssthresh to BDP upon STARTUP exit. This allows us
to check if a BBR flow exited STARTUP and the BDP at the
time of STARTUP exit with SCM_TIMESTAMPING_OPT_STATS. Since BBR does not
use snd_ssthresh this fix has no impact on BBR's behavior.
selftests/txtimestamp: Add more configurable parameters
Add a way to configure if poll() should wait forever for an event, the
number of packets that should be sent for each and if there should be
any delay between packets.
Intiyaz Basha [Fri, 16 Mar 2018 17:21:31 +0000 (10:21 -0700)]
liquidio: Simplified napi poll
1) Moved interrupt enable related code from octeon_process_droq_poll_cmd()
to separate function octeon_enable_irq().
2) Removed wrapper function octeon_process_droq_poll_cmd(), and directlyi
using octeon_droq_process_poll_pkts().
3) Removed unused macros POLL_EVENT_XXX.
Karsten Graul [Fri, 16 Mar 2018 14:06:41 +0000 (15:06 +0100)]
net/smc: enable ipv6 support for smc
Add ipv6 support to the smc socket layer functions. Make use of the
updated clc layer functions to retrieve and match ipv6 information.
The indicator for ipv4 or ipv6 is the protocol constant that is provided
in the socket() call with address family AF_SMC.
Karsten Graul [Fri, 16 Mar 2018 14:06:40 +0000 (15:06 +0100)]
net/smc: add ipv6 support to CLC layer
The CLC layer is updated to support ipv6 proposal messages from peers and
to match incoming proposal messages against the ipv6 addresses of the net
device. struct smc_clc_ipv6_prefix is updated to provide the space for an
ipv6 address (struct was not used before). SMC_CLC_MAX_LEN is updated to
include the size of the proposal prefix. Existing code in net is not
affected, the previous SMC_CLC_MAX_LEN value is large enough to hold ipv4
proposal messages.
Karsten Graul [Fri, 16 Mar 2018 14:06:39 +0000 (15:06 +0100)]
net/smc: restructure netinfo for CLC proposal msgs
Introduce functions smc_clc_prfx_set to retrieve IP information for the
CLC proposal msg and smc_clc_prfx_match to match the contents of a
proposal message against the IP addresses of the net device. The new
functions replace the functionality provided by smc_clc_netinfo_by_tcpsk,
which is removed by this patch. The match functionality is extended to
scan all ipv4 addresses of the net device for a match against the
ipv4 subnet from the proposal msg.
rtnl_lock() is widely used mutex in kernel. Some of kernel code
does memory allocations under it. In case of memory deficit this
may invoke OOM killer, but the problem is a killed task can't
exit if it's waiting for the mutex. This may be a reason of deadlock
and panic.
This patchset adds a new primitive, which responds on SIGKILL,
and it allows to use it in the places, where we don't want
to sleep forever. Also, the first place is made to use it.
====================
Kirill Tkhai [Wed, 14 Mar 2018 19:17:20 +0000 (22:17 +0300)]
net: Add rtnl_lock_killable()
rtnl_lock() is widely used mutex in kernel. Some of kernel code
does memory allocations under it. In case of memory deficit this
may invoke OOM killer, but the problem is a killed task can't
exit if it's waiting for the mutex. This may be a reason of deadlock
and panic.
This patch adds a new primitive, which responds on SIGKILL, and
it allows to use it in the places, where we don't want to sleep
forever.
====================
net/ipv6: Address checks need to consider the L3 domain
IPv6 prohibits a local address from being used as a gateway for a route.
However, it is ok for the gateway to be a local address in a different L3
domain (e.g., VRF). This allows, for example, veth pairs to connect VRFs.
ip6_route_info_create calls ipv6_chk_addr_and_flags for gateway addresses
to determine if the address is a local one, but ipv6_chk_addr_and_flags
does not currently consider L3 domains. As a result routes can not be
added in one VRF with a nexthop that points to a local address in a
second VRF.
Resolve by comparing the l3mdev for the passed in device and requiring an
l3mdev match with the device containing an address. The intent of checking
for an address on the specified device versus any device in the domain is
mantained by a new argument to skip the check between the passed in device
and the device with the address.
Patch 1 moves the gateway validation from ip6_route_info_create into a
helper; the function is long enough and refactoring drops the indent
level.
Patch 2 adds a skip_dev_check argument to ipv6_chk_addr_and_flags to
allow a device to always be passed yet skip the device check when
looking at addresses and fixes up a few ipv6_chk_addr callers that
pass a NULL device.
Patch 3 adds l3mdev checks to ipv6_chk_addr_and_flags.
Patches 4 and 5 do some refactoring to the fib_tests script and then
patch 6 adds nexthop validation tests.
v4
- separated l3mdev check into a separate patch (patch 3 of this set)
as suggested by Kirill
- consolidated dev and ipv6_chk_addr_and_flags call into 1 if (Kirill)
- added a temp variable for gw type (Kirill)
v3
- set skip_dev_check in ipv6_chk_addr based on dev == NULL (per
comment from Ido)
v2
- handle 2 variations of route spec with sane error path
- add test cases
====================
David Ahern [Tue, 13 Mar 2018 15:29:41 +0000 (08:29 -0700)]
selftests: fib_tests: Add IPv6 nexthop spec tests
Add series of tests for valid and invalid nexthop specs for IPv6.
$ TEST=fib_nexthop_test ./fib_tests.sh
...
IPv6 nexthop tests
TEST: Directly connected nexthop, unicast address [ OK ]
TEST: Directly connected nexthop, unicast address with device [ OK ]
TEST: Gateway is linklocal address [ OK ]
TEST: Gateway is linklocal address, no device [ OK ]
TEST: Gateway can not be local unicast address [ OK ]
TEST: Gateway can not be local unicast address, with device [ OK ]
TEST: Gateway can not be a local linklocal address [ OK ]
TEST: Gateway can be local address in a VRF [ OK ]
TEST: Gateway can be local address in a VRF, with device [ OK ]
TEST: Gateway can be local linklocal address in a VRF [ OK ]
TEST: Redirect to VRF lookup [ OK ]
TEST: VRF route, gateway can be local address in default VRF [ OK ]
TEST: VRF route, gateway can not be a local address [ OK ]
TEST: VRF route, gateway can not be a local addr with device [ OK ]
David Ahern [Tue, 13 Mar 2018 15:29:38 +0000 (08:29 -0700)]
net/ipv6: Add l3mdev check to ipv6_chk_addr_and_flags
Lookup the L3 master device for the passed in device. Only consider
addresses on netdev's with the same master device. If the device is
not enslaved or is NULL, then the l3mdev is NULL which means only
devices not enslaved (ie, in the default domain) are considered.
David Ahern [Tue, 13 Mar 2018 15:29:37 +0000 (08:29 -0700)]
net/ipv6: Change address check to always take a device argument
ipv6_chk_addr_and_flags determines if an address is a local address and
optionally if it is an address on a specific device. For example, it is
called by ip6_route_info_create to determine if a given gateway address
is a local address. The address check currently does not consider L3
domains and as a result does not allow a route to be added in one VRF
if the nexthop points to an address in a second VRF. e.g.,
$ ip route add 2001:db8:1::/64 vrf r2 via 2001:db8:102::23
Error: Invalid gateway address.
where 2001:db8:102::23 is an address on an interface in vrf r1.
ipv6_chk_addr_and_flags needs to allow callers to always pass in a device
with a separate argument to not limit the address to the specific device.
The device is used used to determine the L3 domain of interest.
To that end add an argument to skip the device check and update callers
to always pass a device where possible and use the new argument to mean
any address in the domain.
Update a handful of users of ipv6_chk_addr with a NULL dev argument. This
patch handles the change to these callers without adding the domain check.
ip6_validate_gw needs to handle 2 cases - one where the device is given
as part of the nexthop spec and the other where the device is resolved.
There is at least 1 VRF case where deferring the check to only after
the route lookup has resolved the device fails with an unintuitive error
"RTNETLINK answers: No route to host" as opposed to the preferred
"Error: Gateway can not be a local address." The 'no route to host'
error is because of the fallback to a full lookup. The check is done
twice to avoid this error.
David Ahern [Tue, 13 Mar 2018 15:29:36 +0000 (08:29 -0700)]
net/ipv6: Refactor gateway validation on route add
Move gateway validation code from ip6_route_info_create into
ip6_validate_gw. Code move plus adjustments to handle the potential
reset of dev and idev and to make checkpatch happy.
Consider the situation where a macb netdev is connected through
a phydev that sits on a mii bus other than the one provided to
this particular netdev. This situation is what this patchset aims
to accomplish through the existing phy-handle optional binding.
This optional binding (as described in the ethernet DT bindings doc)
directs the netdev to the phydev to use. This is precisely the
situation this patchset aims to solve, so it makes sense to introduce
the functionality to this driver (where the physical layout discussed
was encountered).
The devicetree snippet would look something like this:
ethernet@deadbeef {
...
phy-handle = <&phy1> // tells the driver to use phy1 on the
// first mac's mdio bus (it's wired thusly)
...
}
...
The work done to add the phy_node in the first place (dacdbb4dfc1a1:
"net: macb: add fixed-link node support") will consume the
device_node (if found).
v2: Reorganization of mii probe/init functions, suggested by Andrew Lunn
v3: Moved some of the bus init code back into init (erroneously moved to probe)
some style issues, and an unintialized variable warning addressed.
v4: Add Reviewed-by: tags
Skip fallback code if phy-handle phandle is found
v5: Cleanup formatting issues
Fix compile failure introduced in 1/4 "net: macb: Reorganize macb_mii
bringup"
Fix typo in "Documentation: macb: Document phy-handle binding"
====================
Brad Mouring [Tue, 13 Mar 2018 21:32:15 +0000 (16:32 -0500)]
net: macb: Add phy-handle DT support
This optional binding (as described in the ethernet DT bindings doc)
directs the netdev to the phydev to use. This is useful for a phy
chip that has >1 phy in it, and two netdevs are using the same phy
chip (i.e. the second mac's phy lives on the first mac's MDIO bus)
The devicetree snippet would look something like this:
ethernet@deadbeef {
...
phy-handle = <&phy1> // tells the driver to use phy1 on the
// first mac's mdio bus (it's wired thusly)
...
}
The work done to add the phy_node in the first place (dacdbb4dfc1a1:
"net: macb: add fixed-link node support") will consume the
device_node (if found).