David S. Miller [Fri, 25 May 2018 20:41:23 +0000 (16:41 -0400)]
Merge tag 'mlx5e-updates-2018-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5e-updates-2018-05-19
This series contains updates for mlx5e netdevice driver with one subject,
DSCP to priority mapping, in the first patch Huy adds the needed API in
dcbnl, the second patch adds the needed mlx5 core capability bits for the
feature, and all other patches are mlx5e (netdev) only changes to add
support for the feature.
From: Huy Nguyen
Dscp to priority mapping for Ethernet packet:
These patches enable differentiated services code point (dscp) to
priority mapping for Ethernet packet. Once this feature is
enabled, the packet is routed to the corresponding priority based on its
dscp. User can combine this feature with priority flow control (pfc)
feature to have priority flow control based on the dscp.
Firmware interface:
Mellanox firmware provides two control knobs for this feature:
QPTS register allow changing the trust state between dscp and
pcp mode. The default is pcp mode. Once in dscp mode, firmware will
route the packet based on its dscp value if the dscp field exists.
QPDPM register allow mapping a specific dscp (0 to 63) to a
specific priority (0 to 7). By default, all the dscps are mapped to
priority zero.
Software interface:
This feature is controlled via application priority TLV. IEEE
specification P802.1Qcd/D2.1 defines priority selector id 5 for
application priority TLV. This APP TLV selector defines DSCP to priority
map. This APP TLV can be sent by the switch or can be set locally using
software such as lldptool. In mlx5 drivers, we add the support for net
dcb's getapp and setapp call back. Mlx5 driver only handles the selector
id 5 application entry (dscp application priority application entry).
If user sends multiple dscp to priority APP TLV entries on the same
dscp, the last sent one will take effect. All the previous sent will be
deleted.
This attribute combined with pfc attribute allows advanced user to
fine tune the qos setting for specific priority queue. For example,
user can give dedicated buffer for one or more priorities or user
can give large buffer to certain priorities.
The dcb buffer configuration will be controlled by lldptool.
>> lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6
maps priorities 0,1,2,3,4,5,6,7 to receive buffer 0,2,5,7,1,2,3,6
>> lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0
sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively
After discussion on mailing list with Jakub, Jiri, Ido and John, we agreed to
choose dcbnl over devlink interface since this feature is intended to set
port attributes which are governed by the netdev instance of that port, where
devlink API is more suitable for global ASIC configurations.
The firmware trust state (in QPTS register) is changed based on the
number of dscp to priority application entries. When the first dscp to
priority application entry is added by the user, the trust state is
changed to dscp. When the last dscp to priority application entry is
deleted by the user, the trust state is changed to pcp.
When the port is in DSCP trust state, the transmit queue is selected
based on the dscp of the skb.
When the port is in DSCP trust state and vport inline mode is not NONE,
firmware requires mlx5 driver to copy the IP header to the
wqe ethernet segment inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features
such as xdpsq, icosq are not modified.
====================
Thomas Falcon [Thu, 24 May 2018 19:37:53 +0000 (14:37 -0500)]
ibmvnic: Fix partial success login retries
In its current state, the driver will handle backing device
login in a loop for a certain number of retries while the
device returns a partial success, indicating that the driver
may need to try again using a smaller number of resources.
The variable it checks to continue retrying may change
over the course of operations, resulting in reallocation
of resources but exits without sending the login attempt.
Guard against this by introducing a boolean variable that
will retain the state indicating that the driver needs to
reattempt login with backing device firmware.
This series re-structures the driver's ethtool rx flow
classification flow, following that it adds other flow
profiles and rx flow classification enhancements
via "ethtool -N/-U"
Please consider applying this to "net-next"
====================
Manish Chopra [Thu, 24 May 2018 16:54:53 +0000 (09:54 -0700)]
qed*: Support drop action classification
With this patch, User can configure for the supported
flows to be dropped. Added a stat "gft_filter_drop"
as well to be populated in ethtool for the dropped flows.
Manish Chopra [Thu, 24 May 2018 16:54:52 +0000 (09:54 -0700)]
qede: Support flow classification to the VFs.
With the supported classification modes [4 tuples based,
udp port based, src-ip based], flows can be classified
to the VFs as well. With this patch, flows can be re-directed
to the requested VF provided in "action" field of command.
Please note that driver doesn't really care about the queue bits
in "action" field for the VFs. Since queue will be still chosen
by FW using RSS hash. [I.e., the classification would be done
according to vport-only]
Manish Chopra [Thu, 24 May 2018 16:54:51 +0000 (09:54 -0700)]
qed*: Support other classification modes.
Currently, driver supports flow classification to PF
receive queues based on TCP/UDP 4 tuples [src_ip, dst_ip,
src_port, dst_port] only.
This patch enables to configure different flow profiles
[For example - only UDP dest port or src_ip based] on the
adapter so that classification can be done according to
just those fields as well. Although, at a time just one
type of flow configuration is supported due to limited
number of flow profiles available on the device.
Manish Chopra [Thu, 24 May 2018 16:54:50 +0000 (09:54 -0700)]
qede: Validate unsupported configurations
Validate and prevent some of the configurations for
unsupported [by firmware] inputs [for example - mac ext,
vlans, masks/prefix, tos/tclass] via ethtool -N/-U.
Manish Chopra [Thu, 24 May 2018 16:54:49 +0000 (09:54 -0700)]
qede: Refactor ethtool rx classification flow.
This patch simplifies the ethtool rx flow configuration
[via ethtool -U/-N] flow code base by dividing it logically
into various APIs based on given protocols. It also separates
various validations and calculations done along the flow
in their own APIs.
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a bug in the original fix to prevent out of bounds speculation when
multiple tail call maps from different branches or calls end up at the
same tail call helper invocation, from Daniel.
2) Two selftest fixes, one in reuseport_bpf_numa where test is skipped in
case of missing numa support and another one to update kernel config to
properly support xdp_meta.sh test, from Anders.
...
Would be great if you have a chance to merge net into net-next after that.
The verifier fix would be needed later as a dependency in bpf-next for
upcomig work there. When you do the merge there's a trivial conflict on
BPF side with 849fa50662fb ("bpf/verifier: refine retval R0 state for
bpf_get_stack helper"): Resolution is to keep both functions, the
do_refine_retval_range() and record_func_map().
====================
Arjun Vynipadath [Thu, 24 May 2018 14:03:37 +0000 (19:33 +0530)]
cxgb4/cxgb4vf: Notify link changes to OS-dependent code
We have a confusion of two different abstractions in the Common
Code: Physical Link (Port) and Logical Network Interface (Virtual
Interface), and we haven't been properly managing the state of the
intersection of those two abstractions.
On the one hand we have the Physical state of the Link -- up or down --
and on the other we have the logical state of the VI, enabled or not.
{ethN} refers to both the Physical and Logical State. In this case,
ifconfig only affects/interrogates the Logical State of a VI,
and ethtool only deals with the Physical State. And these are different.
So, just because we disable the VI, we don't really want to change the
Physical Link Up/Down state. Thus, the previous hack to set
"lc->link_ok = 0" when we disable a VI is completely incorrect.
Where we get into trouble is where the Physical Link State and the
Logical VI State cross swords. And that happens in
t4_handle_get_port_info() where we need to manage/safe the Physical
Link State, but we also need to know when the Logical VI State has
changed and pass that back up to the OS-dependent Driver routine
t4_os_link_changed() which is concerned about the Logical Interface.
So we enable a VI and that causes Firmware to send us a new Port
Information message, but if none of the Physical Link State
particulars have changed, we don't call t4_os_link_changed().
This fix uses the existing OS Contract APIs for the Common Code to
inform the OS-dependent portion of the Host Driver when the "Link" (really
Logical Network Interface) is "up" or "down". A new API
t4_enable_pi_params() is added which calls t4_enable_vi_params() and,
if that is successful, then calls back to the OS Contract API
t4_os_link_changed() notifying the OS-dependent layer of the
potential Link State change.
Ganesh Goudar [Thu, 24 May 2018 12:19:30 +0000 (17:49 +0530)]
cxgb4/cxgb4vf: link management changes for new SFP
newer SFPs like SFP28 and QSFP28 Transceiver Modules present
several new possibilities which we haven't faced before. Fix the
assumptions in the code reflecting the more limited capabilities
of previous Transceiver Module systems
YueHaibing [Thu, 24 May 2018 11:27:07 +0000 (19:27 +0800)]
net: fec: remove stale comment
This comment is outdated as fec_ptp_ioctl has been replaced by fec_ptp_set/fec_ptp_get
since commit 1d5244d0e43b ("fec: Implement the SIOCGHWTSTAMP ioctl")
Martin Habets [Thu, 24 May 2018 09:14:00 +0000 (10:14 +0100)]
sfc: stop the TX queue before pushing new buffers
efx_enqueue_skb() can push new buffers for the xmit_more functionality.
We must stops the TX queue before this or else the TX queue does not get
restarted and we get a netdev watchdog.
In the error handling we may now need to unwind more than 1 packet, and
we may need to push the new buffers onto the partner queue.
v2: In the error leg also push this queue if xmit_more is set
This patch adds support for a new port flag - BR_ISOLATED. If it is set
then isolated ports cannot communicate between each other, but they can
still communicate with non-isolated ports. The same can be achieved via
ACLs but they can't scale with large number of ports and also the
complexity of the rules grows. This feature can be used to achieve
isolated vlan functionality (similar to pvlan) as well, though currently
it will be port-wide (for all vlans on the port). The new test in
should_deliver uses data that is already cache hot and the new boolean
is used to avoid an additional source port test in should_deliver.
Linus Torvalds [Fri, 25 May 2018 16:35:11 +0000 (09:35 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 fixes from Will Deacon:
- fix application of read-only permissions to kernel section mappings
- sanitise reported ESR values for signals delivered on a kernel
address
- ensure tishift GCC helpers are exported to modules
- fix inline asm constraints for some LSE atomics
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Make sure permission updates happen for pmd/pud
arm64: fault: Don't leak data in ESR context for user fault on kernel VA
arm64: export tishift functions to modules
arm64: lse: Add early clobbers to some input/output asm operands
Linus Torvalds [Fri, 25 May 2018 16:32:00 +0000 (09:32 -0700)]
Merge tag 'powerpc-4.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
"Just one fix, to make sure the PCR (Processor Compatibility Register)
is reset on boot.
Otherwise if we're running in compat mode in a guest (eg. pretending a
Power9 is a Power8) and the host kernel oopses and kdumps then the
kdump kernel's userspace will be running in Power8 mode, and will
SIGILL if it uses Power9-only instructions.
Thanks to Michael Neuling"
* tag 'powerpc-4.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Clear PCR on boot
Linus Torvalds [Fri, 25 May 2018 16:29:17 +0000 (09:29 -0700)]
Merge tag 'mmc-v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Propagate correct error code for RPMB requests
MMC host:
- sdhci-iproc: Drop hard coded cap for 1.8v
- sdhci-iproc: Fix 32bit writes for transfer mode
- sdhci-iproc: Enable SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus"
* tag 'mmc-v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
mmc: block: propagate correct returned value in mmc_rpmb_ioctl
Qing Huang [Wed, 23 May 2018 23:22:46 +0000 (16:22 -0700)]
mlx4_core: allocate ICM memory in page size chunks
When a system is under memory presure (high usage with fragments),
the original 256KB ICM chunk allocations will likely trigger kernel
memory management to enter slow path doing memory compact/migration
ops in order to complete high order memory allocations.
When that happens, user processes calling uverb APIs may get stuck
for more than 120s easily even though there are a lot of free pages
in smaller chunks available in the system.
Syslog:
...
Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
oracle_205573_e:205573 blocked for more than 120 seconds.
...
With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.
However in order to support smaller ICM chunk size, we need to fix
another issue in large size kcalloc allocations.
E.g.
Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
entry). So we need a 16MB allocation for a table->icm pointer array to
hold 2M pointers which can easily cause kcalloc to fail.
The solution is to use kvzalloc to replace kcalloc which will fall back
to vmalloc automatically if kmalloc fails.
====================
nfp: offload LAG for tc flower egress
This series from John adds bond offload to the nfp driver. Patch 5
exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
hashing matches that of the software LAG. This may be unnecessarily
conservative, let's see what LAG maintainers think :)
John says:
This patchset sets up the infrastructure and offloads output actions for
when a TC flower rule attempts to egress a packet to a LAG port.
Firstly it adds some of the infrastructure required to the flower app and
to the nfp core. This includes the ability to change the MAC address of a
repr, a function for combining lookup and write to a FW symbol, and the
addition of private data to a repr on a per app basis.
Patch 6 continues by implementing notifiers that track Linux bonds and
communicates to the FW those which enslave reprs, along with the current
state of reprs within the bond.
Patch 7 ensures bonds are synchronised with FW by receiving and acting
upon cmsgs sent to the kernel. These may request that a bond message is
retransmitted when FW can process it, or may request a full sync of the
bonds defined in the kernel.
Patch 8 offloads a flower action when that action requires egressing to a
pre-defined Linux bond.
====================
John Hurley [Thu, 24 May 2018 02:22:55 +0000 (19:22 -0700)]
nfp: flower: compute link aggregation action
If the egress device of an offloaded rule is a LAG port, then encode the
output port to the NFP with a LAG identifier and the offloaded group ID.
A prelag action is also offloaded which must be the first action of the
series (although may appear after other pre-actions - e.g. tunnels). This
causes the FW to check that it has the necessary information to output to
the requested LAG port. If it does not, the packet is sent to the kernel
before any other actions are applied to it.
John Hurley [Thu, 24 May 2018 02:22:54 +0000 (19:22 -0700)]
nfp: flower: implement host cmsg handler for LAG
Adds the control message handler to synchronize offloaded group config
with that of the kernel. Such messages are sent from fw to driver and
feature the following 3 flags:
- Data: an attached cmsg could not be processed - store for retransmission
- Xon: FW can accept new messages - retransmit any stored cmsgs
- Sync: full sync requested so retransmit all kernel LAG group info
John Hurley [Thu, 24 May 2018 02:22:53 +0000 (19:22 -0700)]
nfp: flower: monitor and offload LAG groups
Monitor LAG events via the NETDEV_CHANGEUPPER/NETDEV_CHANGELOWERSTATE
notifiers to maintain a list of offloadable groups. Sync these groups with
HW via a delayed workqueue to prevent excessive re-configuration. When the
workqueue is triggered it may generate multiple control messages for
different groups. These messages are linked via a batch ID and flags to
indicate a new batch and the end of a batch.
Update private data in each repr to track their LAG lower state flags. The
state of a repr is used to determine the active netdevs that can be
offloaded. For example, in active-backup mode, we only offload the netdev
currently active.
John Hurley [Thu, 24 May 2018 02:22:52 +0000 (19:22 -0700)]
net: include hash policy in LAG changeupper info
LAG upper event notifiers contain the tx type used by the LAG device.
Extend this to also include the hash policy used for tx types that
utilize hashing.
John Hurley [Thu, 24 May 2018 02:22:51 +0000 (19:22 -0700)]
nfp: flower: add per repr private data for LAG offload
Add a bitmap to each flower repr to track its state if it is enslaved by a
bond. This LAG state may be different to the port state - for example, the
port may be up but LAG state may be down due to the selection in an
active/backup bond.
John Hurley [Thu, 24 May 2018 02:22:50 +0000 (19:22 -0700)]
nfp: flower: check for/turn on LAG support in firmware
Check if the fw contains the _abi_flower_balance_sync_enable symbol. If it
does then write a 1 to this indicating that the driver is willing to
receive NIC to kernel LAG related control messages.
If the write is successful, update the list of extra features supported by
the fw and add a stub to accept LAG cmsgs.
If the guest network adapter is not configured with DeviceNaming
enabled on the host, then the query for friendly name will return
success but with a zero length name. Which then leads to a garbage value
(stack contents) for ifalias.
Fix is simple, just don't set name if host doesn't return it.
Fixes: 0fe554a46a0f ("hv_netvsc: propogate Hyper-V friendly name into interface alias") Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
In commit 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then
failover to DMA") DMA mask was changed from 40 bits to 64 bits.
Hardware actually supports only 47 bits.
Fixes: 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then failover to DMA") Signed-off-by: Govindarajulu Varadarajan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Update the FIB lookup tracepoints to include ip proto and port fields
from the flow struct. In the process make the IPv4 tracepoint inline
with IPv6 which is much easier to use and follow the lookup and result.
Remove the tracepoint in fib_validate_source which does not provide
value above the fib_table_lookup which immediately follows it.
v2
- move CREATE_TRACE_POINTS for the v6 tracepoint to route.c to handle
its need for an internal function to convert route type to error and
handle IPv6 as a module or builtin. Reported by kbuild robot.
====================
David Ahern [Thu, 24 May 2018 00:08:48 +0000 (17:08 -0700)]
net/ipv6: Udate fib6_table_lookup tracepoint
Commit bb0ad1987e96 ("ipv6: fib6_rules: support for match on sport, dport
and ip proto") added support for protocol and ports to FIB rules.
Update the FIB lookup tracepoint to dump the parameters.
David Ahern [Thu, 24 May 2018 00:08:47 +0000 (17:08 -0700)]
net/ipv4: Udate fib_table_lookup tracepoint
Commit 4a2d73a4fb36 ("ipv4: fib_rules: support match on sport, dport
and ip proto") added support for protocol and ports to FIB rules.
Update the FIB lookup tracepoint to dump the parameters.
In addition, make the IPv4 tracepoint similar to the IPv6 one where
the lookup parameters and result are dumped in 1 event. It is much
easier to use and understand the outcome of the lookup.
Cong Wang [Wed, 23 May 2018 22:26:53 +0000 (15:26 -0700)]
net_sched: switch to rcu_work
Commit 05f0fe6b74db ("RCU, workqueue: Implement rcu_work") introduces
new API's for dispatching work in a RCU callback. Now we can just
switch to the new API's for tc filters. This could get rid of a lot
of code.
Eric Biggers [Wed, 23 May 2018 21:37:38 +0000 (14:37 -0700)]
ppp: remove the PPPIOCDETACH ioctl
The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
before f_count has reached 0, which is fundamentally a bad idea. It
does check 'f_count < 2', which excludes concurrent operations on the
file since they would only be possible with a shared fd table, in which
case each fdget() would take a file reference. However, it fails to
account for the fact that even with 'f_count == 1' the file can still be
linked into epoll instances. As reported by syzbot, this can trivially
be used to cause a use-after-free.
Yet, the only known user of PPPIOCDETACH is pppd versions older than
ppp-2.4.2, which was released almost 15 years ago (November 2003).
Also, PPPIOCDETACH apparently stopped working reliably at around the
same time, when the f_count check was added to the kernel, e.g. see
https://lkml.org/lkml/2002/12/31/83. Also, the current 'f_count < 2'
check makes PPPIOCDETACH only work in single-threaded applications; it
always fails if called from a multithreaded application.
All pppd versions released in the last 15 years just close() the file
descriptor instead.
Therefore, instead of hacking around this bug by exporting epoll
internals to modules, and probably missing other related bugs, just
remove the PPPIOCDETACH ioctl and see if anyone actually notices. Leave
a stub in place that prints a one-time warning and returns EINVAL.
This patchset tests mirror-to-gretap with various underlay
configurations involving VLAN netdevice in particular. Some of the tests
involve bridges as well, but tests aimed specifically at testing bridges
(i.e. FDB, STP) are not part of this patchset.
In patches #1-#6, the codebase is adapted to support the new tests.
In patch #7, a test for mirroring to VLAN is introduced.
Patches #8-#10 add three tests where VLAN is part of underlay path after
gretap encapsulation.
====================
Petr Machata [Thu, 24 May 2018 14:28:06 +0000 (16:28 +0200)]
selftests: forwarding: Test mirror-to-gre w/ UL 802.1d+VLAN
Test for "tc action mirred egress mirror" that mirrors to GRE when the
underlay route points at an 802.1d bridge and packet egresses through a
VLAN device.
Besides testing basic connectivity, this also tests that the traffic is
properly tagged.
Petr Machata [Thu, 24 May 2018 14:27:55 +0000 (16:27 +0200)]
selftests: forwarding: Test mirror-to-gre w/ UL VLAN+802.1q
Test for "tc action mirred egress mirror" that mirrors to GRE when the
underlay route points at a vlan device on top of a bridge device with
vlan filtering (802.1q).
Petr Machata [Thu, 24 May 2018 14:27:48 +0000 (16:27 +0200)]
selftests: forwarding: Test mirror-to-vlan
Test for "tc action mirred egress mirror" that mirrors to a vlan device.
- test_vlan() tests that the packets get mirrored
- test_tagged_vlan() tests that the mirrored packets have correct inner
VLAN tag.
A mirror-to-vlan test that's coming next needs to install the trap
unconditionally. Therefore extract from slow_path_trap_{,un}install()
a more generic functions trap_install() and trap_uninstall(), and covert
the former two to conditional wrappers around these.
Petr Machata [Thu, 24 May 2018 14:27:21 +0000 (16:27 +0200)]
selftests: forwarding: Add $h3's clsact to mirror_topo_lib.sh
Having a clsact qdisc on $h3 is useful in several tests, and will be
useful in more tests to come. Move the registration from all the tests
that need it into the topology file itself.
For non-GRE mirroring tests, a functions along the lines of
do_test_span_gre_dir_ips() and test_span_gre_dir_ips() are necessary,
but such that they don't assume tunnels are involved. Extract the code
from mirror_gre_lib.sh to mirror_lib.sh and convert to just use a given
device without assuming it's named "h3-$tundev". Convert the two
above-mentioned functions to wrappers that pass along the correct device
name.
Add test_span_dir() and fail_test_span_dir() to round up the API for use
by following patches.
Move generic parts of mirror_gre_topo_lib.sh into a new file
mirror_topo_lib.sh. Reuse the functions in GRE topo, adding the tunnel
devices as necessary.
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc).
2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers.
3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit.
4) Jiong Wang adds support for indirect and arithmetic shifts to NFP
5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible.
6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing
to edit/grow/shrink a SRH and apply on a packet generic SRv6 actions.
7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT.
8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events.
9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson
====================
David S. Miller [Fri, 25 May 2018 02:19:26 +0000 (22:19 -0400)]
Merge branch 'ibmvnic-Failover-hardening'
Thomas Falcon says:
====================
ibmvnic: Failover hardening
Introduce additional transport event hardening to handle
events during device reset. In the driver's current state,
if a transport event is received during device reset, it can
cause the device to become unresponsive as invalid operations
are processed as the backing device context changes. After
a transport event, the device expects a request to begin the
initialization process. If the driver is still processing
a previously queued device reset in this state, it is likely
to fail as firmware will reject any commands other than the
one to initialize the client driver's Command-Response Queue.
Instead of failing and becoming dormant, the driver will make
one more attempt to recover and continue operation. This is
achieved by setting a state flag, which if true will direct
the driver to clean up all allocated resources and perform
a hard reset in an attempt to bring the driver back to an
operational state.
====================
Thomas Falcon [Wed, 23 May 2018 18:38:02 +0000 (13:38 -0500)]
ibmvnic: Introduce hard reset recovery
Introduce a recovery hard reset to handle reset failure as a result of
change of device context following a transport event, such as a
backing device failover or partition migration. These operations reset
the device context to its initial state. If this occurs during a reset,
any initialization commands are likely to fail with an invalid state
error as backing device firmware requests reinitialization.
When this happens, make one more attempt by performing a hard reset,
which frees any resources currently allocated and performs device
initialization. If a transport event occurs during a device reset, a
flag is set which will trigger a new hard reset following the
completionof the current reset event.
Thomas Falcon [Wed, 23 May 2018 18:38:01 +0000 (13:38 -0500)]
ibmvnic: Set resetting state at earliest possible point
Set device resetting state at the earliest possible point: as soon as a
reset is successfully scheduled. The reset state is toggled off when
all resets have been processed to completion.
Thomas Falcon [Wed, 23 May 2018 18:38:00 +0000 (13:38 -0500)]
ibmvnic: Create separate initialization routine for resets
Instead of having one initialization routine for all cases, create
a separate, simpler function for standard initialization, such as during
device probe. Use the original initialization function to handle
device reset scenarios. The goal of this patch is to avoid having
a single, cluttered init function to handle all possible
scenarios.
Thomas Falcon [Wed, 23 May 2018 18:37:57 +0000 (13:37 -0500)]
ibmvnic: Check CRQ command return codes
Check whether CRQ command is successful before awaiting a response
from the management partition. If the command was not successful, the
driver may hang waiting for a response that will never come.
Thomas Falcon [Wed, 23 May 2018 18:37:56 +0000 (13:37 -0500)]
ibmvnic: Introduce active CRQ state
Introduce an "active" state for a IBM vNIC Command-Response Queue. A CRQ
is considered active once it has initialized or linked with its partner by
sending an initialization request and getting a successful response back
from the management partition. Until this has happened, do not allow CRQ
commands to be sent other than the initialization request.
This change will avoid a protocol error in case of a device transport
event occurring during a initialization. When the driver receives a
transport event notification indicating that the backing hardware
has changed and needs reinitialization, any further commands other
than the initialization handshake with the VIOS management partition
will result in an invalid state error. Instead of sending a command
that will be returned with an error, print a warning and return an
error that will be handled by the caller.
Willem de Bruijn [Wed, 23 May 2018 18:29:52 +0000 (14:29 -0400)]
ipv4: remove warning in ip_recv_error
A precondition check in ip_recv_error triggered on an otherwise benign
race. Remove the warning.
The warning triggers when passing an ipv6 socket to this ipv4 error
handling function. RaceFuzzer was able to trigger it due to a race
in setsockopt IPV6_ADDRFORM.
This socket option converts a v6 socket that is connected to a v4 peer
to an v4 socket. It updates the socket on the fly, changing fields in
sk as well as other structs. This is inherently non-atomic. It races
with the lockless udp_recvmsg path.
No other code makes an assumption that these fields are updated
atomically. It is benign here, too, as ip_recv_error cares only about
the protocol of the skbs enqueued on the error queue, for which
sk_family is not a precise predictor (thanks to another isue with
IPV6_ADDRFORM).
David S. Miller [Fri, 25 May 2018 02:14:37 +0000 (22:14 -0400)]
Merge branch 'gretap-mirroring-selftests'
Petr Machata says:
====================
selftests: forwarding: Additions to mirror-to-gretap tests
This patchset is for a handful of edge cases in mirror-to-gretap
scenarios: removal of mirrored-to netdevice (#1), removal of underlay
route for tunnel remote endpoint (#2) and cessation of mirroring upon
removal of flower mirroring rule (#3).
====================
Or Gerlitz [Wed, 23 May 2018 16:24:48 +0000 (19:24 +0300)]
net : sched: cls_api: deal with egdev path only if needed
When dealing with ingress rule on a netdev, if we did fine through the
conventional path, there's no need to continue into the egdev route,
and we can stop right there.
Not doing so may cause a 2nd rule to be added by the cls api layer
with the ingress being the egdev.
For example, under sriov switchdev scheme, a user rule of VFR A --> VFR B
will end up with two HW rules (1) VF A --> VF B and (2) uplink --> VF B
Fixes: 208c0f4b5237 ('net: sched: use tc_setup_cb_call to call per-block callbacks') Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Willem de Bruijn [Thu, 24 May 2018 22:10:30 +0000 (18:10 -0400)]
packet: fix reserve calculation
Commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link
layer allocation") ensures that packet_snd always starts writing
the link layer header in reserved headroom allocated for this
purpose.
This is needed because packets may be shorter than hard_header_len,
in which case the space up to hard_header_len may be zeroed. But
that necessary padding is not accounted for in skb->len.
The fix, however, is buggy. It calls skb_push, which grows skb->len
when moving skb->data back. But in this case packet length should not
change.
Instead, call skb_reserve, which moves both skb->data and skb->tail
back, without changing length.
Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation") Reported-by: Tariq Toukan <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
====================
This patchset change ndo_xdp_xmit API to take a bulk of xdp frames.
When kernel is compiled with CONFIG_RETPOLINE, every indirect function
pointer (branch) call hurts performance. For XDP this have a huge
negative performance impact.
This patchset reduce the needed (indirect) calls to ndo_xdp_xmit, but
also prepares for further optimizations. The DMA APIs use of indirect
function pointer calls is the primary source the regression. It is
left for a followup patchset, to use bulking calls towards the DMA API
(via the scatter-gatter calls).
The other advantage of this API change is that drivers can easier
amortize the cost of any sync/locking scheme, over the bulk of
packets. The assumption of the current API is that the driver
implemementing the NDO will also allocate a dedicated XDP TX queue for
every CPU in the system. Which is not always possible or practical to
configure. E.g. ixgbe cannot load an XDP program on a machine with
more than 96 CPUs, due to limited hardware TX queues. E.g. virtio_net
is hard to configure as it requires manually increasing the
queues. E.g. tun driver chooses to use a per XDP frame producer lock
modulo smp_processor_id over avail queues.
I'm considered adding 'flags' to ndo_xdp_xmit, but it's not part of
this patchset. This will be a followup patchset, once we know if this
will be needed (e.g. for non-map xdp_redirect flush-flag, and if
AF_XDP chooses to use ndo_xdp_xmit for TX).
---
V5: Fixed up issues spotted by Daniel and John
V4: Splitout the patches from 4 to 8 patches. I cannot split the
driver changes from the NDO change, but I've tried to isolated the NDO
change together with the driver change as much as possible.
====================
samples/bpf: xdp_monitor use err code from tracepoint xdp:xdp_devmap_xmit
Update xdp_monitor to use the recently added err code introduced
in tracepoint xdp:xdp_devmap_xmit, to show if the drop count is
caused by some driver general delivery problem. Other kind of drops
will likely just be more normal TX space issues.
xdp/trace: extend tracepoint in devmap with an err
Extending tracepoint xdp:xdp_devmap_xmit in devmap with an err code
allow people to easier identify the reason behind the ndo_xdp_xmit
call to a given driver is failing.
This patch change the API for ndo_xdp_xmit to support bulking
xdp_frames.
When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
Most of the slowdown is caused by DMA API indirect function calls, but
also the net_device->ndo_xdp_xmit() call.
Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
performance improved:
for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
With frames avail as a bulk inside the driver ndo_xdp_xmit call,
further optimizations are possible, like bulk DMA-mapping for TX.
Testing without CONFIG_RETPOLINE show the same performance for
physical NIC drivers.
The virtual NIC driver tun sees a huge performance boost, as it can
avoid doing per frame producer locking, but instead amortize the
locking cost over the bulk.
V2: Fix compile errors reported by kbuild test robot <[email protected]>
V4: Isolated ndo, driver changes and callers.
When sending an xdp_frame through xdp_do_redirect call, then error
cases can happen where the xdp_frame needs to be dropped, and
returning an -errno code isn't sufficient/possible any-longer
(e.g. for cpumap case). This is already fully supported, by simply
calling xdp_return_frame.
This patch is an optimization, which provides xdp_return_frame_rx_napi,
which is a faster variant for these error cases. It take advantage of
the protection provided by XDP RX running under NAPI protection.
This change is mostly relevant for drivers using the page_pool
allocator as it can take advantage of this. (Tested with mlx5).
Notice how this allow us get XDP statistic without affecting the XDP
performance, as tracepoint is no-longer activated on a per packet basis.
V5: Spotted by John Fastabend.
Fix 'sent' also counted 'drops' in this patch, a later patch corrected
this, but it was a mistake in this intermediate step.
Like cpumap create queue for xdp frames that will be bulked. For now,
this patch simply invoke ndo_xdp_xmit foreach frame. This happens,
either when the map flush operation is envoked, or when the limit
DEV_MAP_BULK_SIZE is reached.
V5: Avoid memleak on error path in dev_map_update_elem()
Functionality is the same, but the ndo_xdp_xmit call is now
simply invoked from inside the devmap.c code.
V2: Fix compile issue reported by kbuild test robot <[email protected]>
V5: Cleanups requested by Daniel
- Newlines before func definition
- Use BUILD_BUG_ON checks
- Remove unnecessary use return value store in dev_map_enqueue
====================
Currently, suppose a userspace application has loaded a bpf program
and attached it to a tracepoint/kprobe/uprobe, and a bpf
introspection tool, e.g., bpftool, wants to show which bpf program
is attached to which tracepoint/kprobe/uprobe. Such attachment
information will be really useful to understand the overall bpf
deployment in the system.
There is a name field (16 bytes) for each program, which could
be used to encode the attachment point. There are some drawbacks
for this approaches. First, bpftool user (e.g., an admin) may not
really understand the association between the name and the
attachment point. Second, if one program is attached to multiple
places, encoding a proper name which can imply all these
attachments becomes difficult.
This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
Given a pid and fd, this command will return bpf related information
to user space. Right now it only supports tracepoint/kprobe/uprobe
perf event fd's. For such a fd, BPF_TASK_FD_QUERY will return
. prog_id
. tracepoint name, or
. k[ret]probe funcname + offset or kernel addr, or
. u[ret]probe filename + offset
to the userspace.
The user can use "bpftool prog" to find more information about
bpf program itself with prog_id.
Patch #1 adds function perf_get_event() in kernel/events/core.c.
Patch #2 implements the bpf subcommand BPF_TASK_FD_QUERY.
Patch #3 syncs tools bpf.h header and also add bpf_task_fd_query()
in the libbpf library for samples/selftests/bpftool to use.
Patch #4 adds ksym_get_addr() utility function.
Patch #5 add a test in samples/bpf for querying k[ret]probes and
u[ret]probes.
Patch #6 add a test in tools/testing/selftests/bpf for querying
raw_tracepoint and tracepoint.
Patch #7 add a new subcommand "perf" to bpftool.
Changelogs:
v4 -> v5:
. return strlen(buf) instead of strlen(buf) + 1
in the attr.buf_len. As long as user provides
non-empty buffer, it will be filed with empty
string, truncated string, or full string
based on the buffer size and the length of
to-be-copied string.
v3 -> v4:
. made attr buf_len input/output. The length of
actual buffter is written to buf_len so user space knows
what is actually needed. If user provides a buffer
with length >= 1 but less than required, do partial
copy and return -ENOSPC.
. code simplification with put_user.
. changed query result attach_info to fd_type.
. add tests at selftests/bpf to test zero len, null buf and
insufficient buf.
v2 -> v3:
. made perf_get_event() return perf_event pointer const.
this was to ensure that event fields are not meddled.
. detect whether newly BPF_TASK_FD_QUERY is supported or
not in "bpftool perf" and warn users if it is not.
v1 -> v2:
. changed bpf subcommand name from BPF_PERF_EVENT_QUERY
to BPF_TASK_FD_QUERY.
. fixed various "bpftool perf" issues and added documentation
and auto-completion.
====================
Yonghong Song [Thu, 24 May 2018 18:21:58 +0000 (11:21 -0700)]
tools/bpftool: add perf subcommand
The new command "bpftool perf [show | list]" will traverse
all processes under /proc, and if any fd is associated
with a perf event, it will print out related perf event
information. Documentation is also added.
Below is an example to show the results using bcc commands.
Running the following 4 bcc commands:
kprobe: trace.py '__x64_sys_nanosleep'
kretprobe: trace.py 'r::__x64_sys_nanosleep'
tracepoint: trace.py 't:syscalls:sys_enter_nanosleep'
uprobe: trace.py 'p:/home/yhs/a.out:main'
Yonghong Song [Thu, 24 May 2018 18:21:57 +0000 (11:21 -0700)]
tools/bpf: add two BPF_TASK_FD_QUERY tests in test_progs
The new tests are added to query perf_event information
for raw_tracepoint and tracepoint attachment. For tracepoint,
both syscalls and non-syscalls tracepoints are queries as
they are treated slightly differently inside the kernel.
Yonghong Song [Thu, 24 May 2018 18:21:11 +0000 (11:21 -0700)]
tools/bpf: add ksym_get_addr() in trace_helpers
Given a kernel function name, ksym_get_addr() will return the kernel
address for this function, or 0 if it cannot find this function name
in /proc/kallsyms. This function will be used later when a kernel
address is used to initiate a kprobe perf event.
Yonghong Song [Thu, 24 May 2018 18:21:10 +0000 (11:21 -0700)]
tools/bpf: sync kernel header bpf.h and add bpf_task_fd_query in libbpf
Sync kernel header bpf.h to tools/include/uapi/linux/bpf.h and
implement bpf_task_fd_query() in libbpf. The test programs
in samples/bpf and tools/testing/selftests/bpf, and later bpftool
will use this libbpf function to query kernel.
Yonghong Song [Thu, 24 May 2018 18:21:09 +0000 (11:21 -0700)]
bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
Currently, suppose a userspace application has loaded a bpf program
and attached it to a tracepoint/kprobe/uprobe, and a bpf
introspection tool, e.g., bpftool, wants to show which bpf program
is attached to which tracepoint/kprobe/uprobe. Such attachment
information will be really useful to understand the overall bpf
deployment in the system.
There is a name field (16 bytes) for each program, which could
be used to encode the attachment point. There are some drawbacks
for this approaches. First, bpftool user (e.g., an admin) may not
really understand the association between the name and the
attachment point. Second, if one program is attached to multiple
places, encoding a proper name which can imply all these
attachments becomes difficult.
This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
Given a pid and fd, if the <pid, fd> is associated with a
tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return
. prog_id
. tracepoint name, or
. k[ret]probe funcname + offset or kernel addr, or
. u[ret]probe filename + offset
to the userspace.
The user can use "bpftool prog" to find more information about
bpf program itself with prog_id.
Dave Airlie [Thu, 24 May 2018 23:47:56 +0000 (09:47 +1000)]
Merge branch 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux into drm-fixes
Three fixes for vmwgfx. Two are cc'd stable and fix host logging and its
error paths on 32-bit VMs. One is a fix for a hibernate flaw
introduced with the 4.17 merge window.
* 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Schedule an fb dirty update after resume
drm/vmwgfx: Fix host logging / guestinfo reading error paths
drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
Linus Torvalds [Thu, 24 May 2018 21:42:43 +0000 (14:42 -0700)]
Merge branch 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fix from Konrad Rzeszutek Wilk:
"One single fix in here: under Xen the DMA32 heap (in the hypervisor)
would end up looking like swiss cheese.
The reason being that for every coherent DMA allocation we didn't do
the proper hypercall to tell Xen to return the page back to the DMA32
heap. End result was (eventually) no DMA32 space if you (for example)
continously unloaded and loaded modules"
* 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
Yossi Kuperman [Tue, 17 Oct 2017 17:39:17 +0000 (20:39 +0300)]
net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
Sandbox QP Commands are retired in the order they are sent. Outstanding
commands are stored in a linked-list in the order they appear. Once a
response is received and the callback gets called, we pull the first
element off the pending list, assuming they correspond.
Sending a message and adding it to the pending list is not done atomically,
hence there is an opportunity for a race between concurrent requests.
Eran Ben Elisha [Tue, 1 May 2018 13:25:07 +0000 (16:25 +0300)]
net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
When RXFCS feature is enabled, the HW do not strip the FCS data,
however it is not present in the checksum calculated by the HW.
Fix that by manually calculating the FCS checksum and adding it to the SKB
checksum field.
Add helper function to find the FCS data for all SKB forms (linear,
one fragment or more).
Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag") Signed-off-by: Eran Ben Elisha <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
Huy Nguyen [Thu, 22 Mar 2018 02:39:17 +0000 (21:39 -0500)]
net/mlx5e: Receive buffer support for DCBX
Add dcbnl's set/get buffer configuration callback that allows user to
set/get buffer size configuration and priority to buffer mapping.
By default, firmware controls receive buffer configuration and priority
of buffer mapping based on the changes in pfc settings. When set buffer
call back is triggered, the buffer configuration changes to manual mode.
The manual mode means mlx5 driver will adjust the buffer configuration
accordingly based on the changes in pfc settings.
ConnectX buffer stride is 128 Bytes. If the buffer size is not multiple
of 128, the buffer size will be rounded down to the nearest multiple of
128.
Huy Nguyen [Thu, 22 Mar 2018 02:10:22 +0000 (21:10 -0500)]
net/mlx5e: Receive buffer configuration
Add APIs for buffer configuration based on the changes in
pfc configuration, cable len, buffer size configuration,
and priority to buffer mapping.
Note that the xoff fomula is as below
xoff = ((301+2.16 * len [m]) * speed [Gbps] + 2.72 MTU [B]
xoff_threshold = buffer_size - xoff
xon_threshold = xoff_threshold - MTU
Huy Nguyen [Thu, 22 Feb 2018 19:22:56 +0000 (13:22 -0600)]
net/mlx5e: Move port speed code from en_ethtool.c to en/port.c
Move four below functions from en_ethtool.c to en/port.c. These
functions are used by both en_ethtool.c and en_main.c. Future code
can use these functions without ethtool link mode dependency.
u32 mlx5e_port_ptys2speed(u32 eth_proto_oper);
int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
u32 mlx5e_port_speed2linkmodes(u32 speed);
Delete the speed field from table mlx5e_build_ptys2ethtool_map. This
table only keeps the mapping between the mlx5e link mode and
ethtool link mode. Add new table mlx5e_link_speed for translation
from mlx5e link mode to actual speed.
Huy Nguyen [Thu, 22 Feb 2018 17:57:10 +0000 (11:57 -0600)]
net/dcb: Add dcbnl buffer attribute
In this patch, we add dcbnl buffer attribute to allow user
change the NIC's buffer configuration such as priority
to buffer mapping and buffer size of individual buffer.
This attribute combined with pfc attribute allows advanced user to
fine tune the qos setting for specific priority queue. For example,
user can give dedicated buffer for one or more priorities or user
can give large buffer to certain priorities.
The dcb buffer configuration will be controlled by lldptool.
lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6
maps priorities 0,1,2,3,4,5,6,7 to receive buffer 0,2,5,7,1,2,3,6
lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0
sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively
After discussion on mailing list with Jakub, Jiri, Ido and John, we agreed to
choose dcbnl over devlink interface since this feature is intended to set
port attributes which are governed by the netdev instance of that port, where
devlink API is more suitable for global ASIC configurations.
We present an use case scenario where dcbnl buffer attribute configured
by advance user helps reduce the latency of messages of different sizes.
Scenarios description:
On ConnectX-5, we run latency sensitive traffic with
small/medium message sizes ranging from 64B to 256KB and bandwidth sensitive
traffic with large messages sizes 512KB and 1MB. We group small, medium,
and large message sizes to their own pfc enables priorities as follow.
Priorities 1 & 2 (64B, 256B and 1KB)
Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
Priorities 5 & 6 (512KB and 1MB)
By default, ConnectX-5 maps all pfc enabled priorities to a single
lossless fixed buffer size of 50% of total available buffer space. The
other 50% is assigned to lossy buffer. Using dcbnl buffer attribute,
we create three equal size lossless buffers. Each buffer has 25% of total
available buffer space. Thus, the lossy buffer size reduces to 25%. Priority
to lossless buffer mappings are set as follow.
Priorities 1 & 2 on lossless buffer #1
Priorities 3 & 4 on lossless buffer #2
Priorities 5 & 6 on lossless buffer #3
We observe improvements in latency for small and medium message sizes
as follows. Please note that the large message sizes bandwidth performance is
reduced but the total bandwidth remains the same.
256B message size (42 % latency reduction)
4K message size (21% latency reduction)
64K message size (16% latency reduction)
Linus Torvalds [Thu, 24 May 2018 21:12:05 +0000 (14:12 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"This is pretty much just the usual array of smallish driver bugs.
- remove bouncing addresses from the MAINTAINERS file
- kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and
hns drivers
- various small LOC behavioral/operational bugs in mlx5, hns, qedr
and i40iw drivers
- two fixes for patches already sent during the merge window
- a long-standing bug related to not decreasing the pinned pages
count in the right MM was found and fixed"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits)
RDMA/hns: Move the location for initializing tmp_len
RDMA/hns: Bugfix for cq record db for kernel
IB/uverbs: Fix uverbs_attr_get_obj
RDMA/qedr: Fix doorbell bar mapping for dpi > 1
IB/umem: Use the correct mm during ib_umem_release
iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()'
RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint
RDMA/i40iw: Avoid reference leaks when processing the AEQ
RDMA/i40iw: Avoid panic when objects are being created and destroyed
RDMA/hns: Fix the bug with NULL pointer
RDMA/hns: Set NULL for __internal_mr
RDMA/hns: Enable inner_pa_vld filed of mpt
RDMA/hns: Set desc_dma_addr for zero when free cmq desc
RDMA/hns: Fix the bug with rq sge
RDMA/hns: Not support qp transition from reset to reset for hip06
RDMA/hns: Add return operation when configured global param fail
RDMA/hns: Update convert function of endian format
RDMA/hns: Load the RoCE dirver automatically
RDMA/hns: Bugfix for rq record db for kernel
RDMA/hns: Add rq inline flags judgement
...