qede: place ethtool_rx_flow_spec after code after TC flower codebase
This is a preparation patch to reuse the existing TC flower codebase
from ethtool_rx_flow_spec.
This patch is merely moving the core ethtool_rx_flow_spec parser after
tc flower offload driver code so we can skip a few forward function
declarations in the follow up patch.
Update this driver to use the flow_rule infrastructure, hence we can use
the same code to populate hardware IR from ethtool_rx_flow and the
cls_flower interfaces.
ethtool: add ethtool_rx_flow_spec to flow_rule structure translator
This patch adds a function to translate the ethtool_rx_flow_spec
structure to the flow_rule representation.
This allows us to reuse code from the driver side given that both flower
and ethtool_rx_flow interfaces use the same representation.
This patch also includes support for the flow type flags FLOW_EXT,
FLOW_MAC_EXT and FLOW_RSS.
The ethtool_rx_flow_spec_input wrapper structure is used to convey the
rss_context field, that is away from the ethtool_rx_flow_spec structure,
and the ethtool_rx_flow_spec structure.
flow_offload: add wake-up-on-lan and queue to flow_action
These actions need to be added to support the ethtool_rx_flow interface.
The queue action includes a field to specify the RSS context, that is
set via FLOW_RSS flow type flag and the rss_context field in struct
ethtool_rxnfc, plus the corresponding queue index. FLOW_RSS implies that
rss_context is non-zero, therefore, queue.ctx == 0 means that FLOW_RSS
was not set. Also add a field to store the vf index which is stored in
the ethtool_rxnfc ring_cookie field.
flow_offload: add statistics retrieval infrastructure and use it
This patch provides the flow_stats structure that acts as container for
tc_cls_flower_offload, then we can use to restore the statistics on the
existing TC actions. Hence, tcf_exts_stats_update() is not used from
drivers anymore.
cls_api: add translator to flow_action representation
This patch implements a new function to translate from native TC action
to the new flow_action representation. Moreover, this patch also updates
cls_flower to use this new function.
This new infrastructure defines the nic actions that you can perform
from existing network drivers. This infrastructure allows us to avoid a
direct dependency with the native software TC action representation.
net/mlx5e: support for two independent packet edit actions
This patch adds pedit_headers_action structure to store the result of
parsing tc pedit actions. Then, it calls alloc_tc_pedit_action() to
populate the mlx5e hardware intermediate representation once all actions
have been parsed.
This patch comes in preparation for the new flow_action infrastructure,
where each packet mangling comes in an separated action, ie. not packed
as in tc pedit.
flow_offload: add flow_rule and flow_match structures and use them
This patch wraps the dissector key and mask - that flower uses to
represent the matching side - around the flow_match structure.
To avoid a follow up patch that would edit the same LoCs in the drivers,
this patch also wraps this new flow match structure around the flow rule
object. This new structure will also contain the flow actions in follow
up patches.
This introduces two new interfaces:
bool flow_rule_match_key(rule, dissector_id)
that returns true if a given matching key is set on, and:
flow_rule_match_XYZ(rule, &match);
To fetch the matching side XYZ into the match container structure, to
retrieve the key and the mask with one single call.
Breno Leitao [Tue, 5 Feb 2019 17:12:34 +0000 (15:12 -0200)]
bpf: test_maps: fix possible out of bound access warning
When compiling test_maps selftest with GCC-8, it warns that an array
might be indexed with a negative value, which could cause a negative
out of bound access, depending on parameters of the function. This
is the GCC-8 warning:
gcc -Wall -O2 -I../../../include/uapi -I../../../lib -I../../../lib/bpf -I../../../../include/generated -DHAVE_GENHDR -I../../../include test_maps.c /home/breno/Devel/linux/tools/testing/selftests/bpf/libbpf.a -lcap -lelf -lrt -lpthread -o /home/breno/Devel/linux/tools/testing/selftests/bpf/test_maps
In file included from test_maps.c:16:
test_maps.c: In function ‘run_all_tests’:
test_maps.c:1079:10: warning: array subscript -1 is below array bounds of ‘pid_t[<Ube20> + 1]’ [-Warray-bounds]
assert(waitpid(pid[i], &status, 0) == pid[i]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
test_maps.c:1059:6: warning: array subscript -1 is below array bounds of ‘pid_t[<Ube20> + 1]’ [-Warray-bounds]
pid[i] = fork();
~~~^~~
This patch simply guarantees that the task(s) variables are unsigned,
thus, they could never be a negative number (which they are not in
current code anyway), hence avoiding an out of bound access warning.
Prashant Bhole [Wed, 6 Feb 2019 01:47:23 +0000 (10:47 +0900)]
tools: bpftool: doc, fix incorrect text
Documentation about cgroup, feature, prog uses wrong header
'MAP COMMANDS' while listing commands. This patch corrects the header
in respective doc files.
Daniel Borkmann [Wed, 6 Feb 2019 14:35:43 +0000 (15:35 +0100)]
Merge branch 'bpf-xdp-hw-plus-generic'
Jakub Kicinski says:
====================
Offloaded and native/driver XDP programs can already coexist.
Allow offload and generic hook to coexist as well, there seem
to be no reason why not to do so.
====================
Jakub Kicinski [Wed, 6 Feb 2019 04:03:21 +0000 (20:03 -0800)]
net: xdp: allow generic and driver XDP on one interface
Since commit a25717d2b604 ("xdp: support simultaneous driver and
hw XDP attachment") users can load an XDP program for offload and
in native driver mode simultaneously. Allow a similar mix of
offload and SKB mode/generic XDP.
Jakub Kicinski [Wed, 6 Feb 2019 04:03:20 +0000 (20:03 -0800)]
selftests/bpf: fix the expected messages
Recent changes added extack to program replacement path,
expect extack instead of generic messages.
Fixes: 01dde20ce04b ("xdp: Provide extack messages when prog attachment failed") Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
Yonghong Song [Wed, 6 Feb 2019 05:38:30 +0000 (21:38 -0800)]
tools/bpf: silence a libbpf unnecessary warning
Commit 96408c43447a ("tools/bpf: implement libbpf
btf__get_map_kv_tids() API function") refactored
function bpf_map_find_btf_info() and moved bulk of
implementation into btf.c as btf__get_map_kv_tids().
This change introduced a bug such that test_btf will
print out the following warning although the test passed:
BTF libbpf test[2] (test_btf_nokv.o): libbpf: map:btf_map
container_name:____btf_map_btf_map cannot be found
in BTF. Missing BPF_ANNOTATE_KV_PAIR?
Previously, the error message is guarded with pr_debug().
Commit 96408c43447a changed it to pr_warning() and
hence caused the warning.
Restoring to pr_debug() for the message fixed the issue.
Fixes: 96408c43447a ("tools/bpf: implement libbpf btf__get_map_kv_tids() API function") Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
David S. Miller [Wed, 6 Feb 2019 04:15:30 +0000 (20:15 -0800)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2019-02-05
This series contains updates to igc, e1000e, ixgbe, fm10k and driver
documentation.
Kai-Heng Feng fixes an e1000e issue where the Wake-On-LAN settings where
being set incorrectly during a system suspend.
Sasha addresses community feedback on the igc driver and provides a
number of code cleanups to remove either unreachable or unused code. In
addition, added basic ethtool support for the igc driver.
Mike Rapoport fixes the formatting of the kernel driver documentation so
that the title is properly formatted and does not get lumped with the
document sections in the HTML kernel documents generated.
Jiri Kosina updates a hard coded RAR entries value with the existing
define IXGBE_82599_RAR_ENTRIES.
Jake fixes up whitespace in the fm10k driver.
Konstantin Khlebnikov fixes an issue where in some cases, the e1000e
driver will continually reset during a system boot because the watchdog
task sees items in the transmit buffer but the carrier is off (trying to
establish link) causing the device reset to flush the buffer. To
resolve, just move this check/flush into the watchdog section for when
the carrier is off.
Todd bumps the igb driver version to reflect the recent driver changes.
====================
Yonghong Song [Tue, 5 Feb 2019 19:48:22 +0000 (11:48 -0800)]
tools/bpf: add const qualifier to btf__get_map_kv_tids() map_name parameter
Commit 96408c43447a ("tools/bpf: implement libbpf btf__get_map_kv_tids() API function")
added the API function btf__get_map_kv_tids():
btf__get_map_kv_tids(const struct btf *btf, char *map_name, ...)
The parameter map_name has type "char *". This is okay inside libbpf library since
the map_name is from bpf_map->name which also has type "char *".
This will be problematic if the caller for map_name already has attribute "const",
e.g., from C++ string.c_str(). It will result in either a warning or an error.
/home/yhs/work/bcc/src/cc/btf.cc:166:51:
error: invalid conversion from ‘const char*’ to ‘char*’ [-fpermissive]
return btf__get_map_kv_tids(btf_, map_name.c_str()
This patch added "const" attributes to map_name parameter.
Yonghong Song [Tue, 5 Feb 2019 22:28:44 +0000 (14:28 -0800)]
tools/bpf: fix a selftest test_btf failure
Commit 9c651127445c ("selftests/btf: add initial BTF dedup tests")
added dedup tests in test_btf.c.
It broke the raw test:
BTF raw test[71] (func proto (Bad arg name_off)):
btf_raw_create:2905:FAIL Error getting string #65535, strs_cnt:1
The test itself encodes invalid func_proto parameter name
offset 0xffffFFFF as a negative test for the kernel.
The above commit changed the meaning of that offset and
resulted in a user space error.
#define NAME_NTH(N) (0xffff0000 | N)
#define IS_NAME_NTH(X) ((X & 0xffff0000) == 0xffff0000)
#define GET_NAME_NTH_IDX(X) (X & 0x0000ffff)
Currently, the kernel permits maximum name offset 0xffff.
Set the test name off as 0x0fffFFFF to trigger the kernel
verification failure.
e1000e: fix cyclic resets at link up with active tx
I'm seeing series of e1000e resets (sometimes endless) at system boot
if something generates tx traffic at this time. In my case this is
netconsole who sends message "e1000e 0000:02:00.0: Some CPU C-states
have been disabled in order to enable jumbo frames" from e1000e itself.
As result e1000_watchdog_task sees used tx buffer while carrier is off
and start this reset cycle again.
[ 17.794359] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 17.794714] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[ 22.936455] e1000e 0000:02:00.0 eth1: changing MTU from 1500 to 9000
[ 23.033336] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[ 26.102364] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 27.174495] 8021q: 802.1Q VLAN Support v1.8
[ 27.174513] 8021q: adding VLAN 0 to HW filter on device eth1
[ 30.671724] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation
[ 30.898564] netpoll: netconsole: local port 6666
[ 30.898566] netpoll: netconsole: local IPv6 address 2a02:6b8:0:80b:beae:c5ff:fe28:23f8
[ 30.898567] netpoll: netconsole: interface 'eth1'
[ 30.898568] netpoll: netconsole: remote port 6666
[ 30.898568] netpoll: netconsole: remote IPv6 address 2a02:6b8:b000:605c:e61d:2dff:fe03:3790
[ 30.898569] netpoll: netconsole: remote ethernet address b0:a8:6e:f4:ff:c0
[ 30.917747] console [netcon0] enabled
[ 30.917749] netconsole: network logging started
[ 31.453353] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[ 34.185730] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[ 34.321840] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[ 34.465822] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[ 34.597423] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[ 34.745417] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[ 34.877356] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[ 35.005441] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[ 35.157376] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[ 35.289362] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[ 35.417441] e1000e 0000:02:00.0: Some CPU C-states have been disabled in order to enable jumbo frames
[ 37.790342] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
This patch flushes tx buffers only once when carrier is off
rather than at each watchdog iteration.
Sasha Neftin [Sun, 13 Jan 2019 09:22:31 +0000 (11:22 +0200)]
igc: Remove unneeded code
Remove the 'igc_get_link_up_info_base method' from igc_base.c file.
Use the 'igc_get_speed_and_duplex_copper' method directly and reduce
the code redundancy.
Jeff Kirsher [Fri, 4 Jan 2019 01:17:03 +0000 (17:17 -0800)]
e1000e: fix a missing check for return value
The change is based on the issue found by Kangjie Lu <[email protected]> where
we not checking the return value of a register read/write which could result
in a NULL pointer dereference if the read/write fails.
Since we are only trying to disable the far-end loopback, if the read
and write of register fails, we do not want to bail out of the function.
We just want to log that it failed to disable and continue on.
Jiri Kosina [Wed, 2 Jan 2019 19:20:33 +0000 (20:20 +0100)]
ixgbe: remove magic constant in ixgbe_reset_hw_82599()
ixgbe_reset_hw_82599() resets the value of hw->mac.num_rar_entries to
pre-defined value of 128. Let's get rid of that hardcoded literal, and use
IXGBE_82599_RAR_ENTRIES instead, the same way the normal initialization
path does.
Sasha Neftin [Tue, 18 Dec 2018 09:29:54 +0000 (11:29 +0200)]
igc: Fix code redundancy
Remove redundant igc_check_for_link_base code and replace it with
an igc_check_for_copper_link method.
Fix duplication of IGC_ADVTXD_PAYLEN_SHIFT mask declaration.
Remove obsolete IGC_SCVPC register definition.
Sasha Neftin [Tue, 11 Dec 2018 16:55:41 +0000 (18:55 +0200)]
igc: Remove unreachable code from igc_phy.c file
Address community comment.
Remove the unreachable code leads to the static checker warning.
PHY functionality will be added later per demand.
Reported by Dan Carpenter.
Kai-Heng Feng [Tue, 11 Dec 2018 07:59:37 +0000 (15:59 +0800)]
e1000e: Exclude device from suspend direct complete optimization
e1000e sets different WoL settings in system suspend callback and
runtime suspend callback.
The suspend direct complete optimization leaves e1000e in runtime
suspended state with wrong WoL setting during system suspend.
To fix this, we need to disable suspend direct complete optimization to
let e1000e always use suspend callback to set correct WoL during system
suspend.
Russell King [Mon, 4 Feb 2019 23:35:59 +0000 (23:35 +0000)]
net: marvell: mvpp2: fix lack of link interrupts
Sven Auhagen reports that if he changes a SFP+ module for a SFP module
on the Macchiatobin Single Shot, the link does not come back up. For
Sven, it is as easy as:
- Insert a SFP+ module connected, and use ping6 to verify link is up.
- Remove SFP+ module
- Insert SFP 1000base-X module use ping6 to verify link is up: Link
up event did not trigger and the link is down
but that doesn't show the problem for me. Locally, this has been
reproduced by:
- Boot with no modules.
- Insert SFP+ module, confirm link is up.
- Replace module with 25000base-X module. Confirm link is up.
- Set remote end down, link is reported as dropped at both ends.
- Set remote end up, link is reported up at remote end, but not local
end due to lack of link interrupt.
Fix this by setting up both GMAC and XLG interrupts for port 0, but
only unmasking the appropriate interrupt according to the current mode
set in the mac_config() method. However, only do the mask/unmask
dance when we are really changing the link mode to avoid missing any
link interrupts.
Russell King [Mon, 4 Feb 2019 23:35:54 +0000 (23:35 +0000)]
net: marvell: mvpp2: use phy_interface_mode_is_8023z() helper
Use the phy_interface_mode_is_8023z() helper for detecting interface
modes that use 802.3z serial encoding. This is equivalent to testing
for both 1000base-X and 2500base-X.
David S. Miller [Tue, 5 Feb 2019 18:34:34 +0000 (10:34 -0800)]
Merge branch 'nixge-Fixed-link-support'
Moritz Fischer says:
====================
nixge: Fixed-link support
This series adds fixed-link support to nixge.
The first patch corrects the binding to correctly reflect
hardware that does not come with MDIO cores instantiated.
The second patch adds fixed link support to the driver.
The third patch updates the binding document with the now
optional (formerly required) phy-handle property and references
the fixed-link docs.
====================
Moritz Fischer [Mon, 4 Feb 2019 17:30:38 +0000 (09:30 -0800)]
net: nixge: Make mdio child node optional
Make MDIO child optional and only instantiate the
MDIO bus if the child is actually present.
There are currently no (in-tree) users of this
binding; all (out-of-tree) users use overlays that
get shipped together with the FPGA images that contain
the IP.
This will significantly increase maintainabilty
of future revisions of this IP.
Daniel Borkmann [Tue, 5 Feb 2019 15:56:11 +0000 (16:56 +0100)]
Merge branch 'bpf-riscv-jit'
Björn Töpel says:
====================
This v2 series adds an RV64G BPF JIT to the kernel.
At the moment the RISC-V Linux port does not support
CONFIG_HAVE_KPROBES (Patrick Stählin sent out an RFC last year), which
means that CONFIG_BPF_EVENTS is not supported. Thus, no tests
involving BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT,
BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_RAW_TRACEPOINT passes.
The implementation does not support "far branching" (>4KiB).
Note that "test_verifier" was run with one build with
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise
many of the the tests that require unaligned access were skipped.
The two failing test_verifier tests are:
"ld_abs: vlan + abs, test 1"
"ld_abs: jump around ld_abs"
This is due to that "far branching" involved in those tests.
All tests where done on QEMU emulator version 3.1.50
(v3.1.0-688-g8ae951fbc106). I'll test it on real hardware, when I get
access to it.
I'm routing this patch via bpf-next/netdev mailing list (after a
conversation with Palmer at FOSDEM), mainly because the other JITs
went that path.
RFCv1 -> v1:
* Cleaned up the Kconfig and net/Makefile. (Christoph)
* Removed the entry-stub and squashed the build/config changes to be
part of the JIT implementation. (Christoph)
* Simplified the register tracking code. (Daniel)
* Removed unused macros. (Daniel)
* Added myself as maintainer and updated documentation. (Daniel)
* Removed HAVE_EFFICIENT_UNALIGNED_ACCESS. (Christoph, Palmer)
* Added tail-calls and cleaned up the code.
====================
Björn Töpel [Tue, 5 Feb 2019 12:41:25 +0000 (13:41 +0100)]
selftests/bpf: add "any alignment" annotation for some tests
RISC-V does, in-general, not have "efficient unaligned access". When
testing the RISC-V BPF JIT, some selftests failed in the verification
due to misaligned access. Annotate these tests with the
F_NEEDS_EFFICIENT_UNALIGNED_ACCESS flag.
Björn Töpel [Tue, 5 Feb 2019 12:41:22 +0000 (13:41 +0100)]
bpf, riscv: add BPF JIT for RV64G
This commit adds a BPF JIT for RV64G.
The JIT is a two-pass JIT, and has a dynamic prolog/epilogue (similar
to the MIPS64 BPF JIT) instead of static ones (e.g. x86_64).
At the moment the RISC-V Linux port does not support
CONFIG_HAVE_KPROBES, which means that CONFIG_BPF_EVENTS is not
supported. Thus, no tests involving BPF_PROG_TYPE_TRACEPOINT,
BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_KPROBE and
BPF_PROG_TYPE_RAW_TRACEPOINT passes.
The implementation does not support "far branching" (>4KiB).
Note that "test_verifier" was run with one build with
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y and one without, otherwise
many of the the tests that require unaligned access were skipped.
Daniel Borkmann [Tue, 5 Feb 2019 15:52:58 +0000 (16:52 +0100)]
Merge branch 'bpf-btf-dedup'
Andrii Nakryiko says:
====================
This patch series adds BTF deduplication algorithm to libbpf. This algorithm
allows to take BTF type information containing duplicate per-compilation unit
information and reduce it to equivalent set of BTF types with no duplication without
loss of information. It also deduplicates strings and removes those strings that
are not referenced from any BTF type (and line information in .BTF.ext section,
if any).
Algorithm also resolves struct/union forward declarations into concrete BTF types
across multiple compilation units to facilitate better deduplication ratio. If
undesired, this resolution can be disabled through specifying corresponding options.
When applied to BTF data emitted by pahole's DWARF->BTF converter, it reduces
the overall size of .BTF section by about 65x, from about 112MB to 1.75MB, leaving
only 29247 out of initial 3073497 BTF type descriptors.
Algorithm with minor differences and preliminary results before FUNC/FUNC_PROTO
support is also described more verbosely at:
v1->v2:
- rebase on latest bpf-next
- err_log/elog -> pr_debug
- btf__dedup, btf__get_strings, btf__get_nr_types listed under 0.0.2 version
====================
Andrii Nakryiko [Tue, 5 Feb 2019 01:29:46 +0000 (17:29 -0800)]
selftests/btf: add initial BTF dedup tests
This patch sets up a new kind of tests (BTF dedup tests) and tests few aspects of
BTF dedup algorithm. More complete set of tests will come in follow up patches.
Andrii Nakryiko [Tue, 5 Feb 2019 01:29:45 +0000 (17:29 -0800)]
btf: add BTF types deduplication algorithm
This patch implements BTF types deduplication algorithm. It allows to
greatly compress typical output of pahole's DWARF-to-BTF conversion or
LLVM's compilation output by detecting and collapsing identical types emitted in
isolation per compilation unit. Algorithm also resolves struct/union forward
declarations into concrete BTF types representing referenced struct/union. If
undesired, this resolution can be disabled through specifying corresponding options.
Algorithm itself and its application to Linux kernel's BTF types is
described in details at:
https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html
Linus Walleij [Mon, 4 Feb 2019 10:26:18 +0000 (11:26 +0100)]
net: phy: fixed-phy: Drop GPIO from fixed_phy_add()
All users of the fixed_phy_add() pass -1 as GPIO number
to the fixed phy driver, and all users of fixed_phy_register()
pass -1 as GPIO number as well, except for the device
tree MDIO bus.
Any new users should create a proper device and pass the
GPIO as a descriptor associated with the device so delete
the GPIO argument from the calls and drop the code looking
requesting a GPIO in fixed_phy_add().
In fixed phy_register(), investigate the "fixed-link"
node and pick the GPIO descriptor from "link-gpios" if
this property exists. Move the corresponding code out
of of_mdio.c as the fixed phy code anyways requires
OF to be in use.
With the recent print rework we now have the following problem:
pr_{warning,info,debug} expand to __pr which calls libbpf_print.
libbpf_print does va_start and calls __libbpf_pr with va_list argument.
In __base_pr we again do va_start. Because the next argument is a
va_list, we don't get correct pointer to the argument (and print noting
in my case, I don't know why it doesn't crash tbh).
Fix this by changing libbpf_print_fn_t signature to accept va_list and
remove unneeded calls to va_start in the existing users.
Alternatively, this can we solved by exporting __libbpf_pr and
changing __pr macro to (and killing libbpf_print):
{
if (__libbpf_pr)
__libbpf_pr(level, "libbpf: " fmt, ##__VA_ARGS__)
}
For RDMA transports, RDS TOS is an extension of IB QoS(Annex A13)
to provide clients the ability to segregate traffic flows for
different type of data. RDMA CM abstract it for ULPs using
rdma_set_service_type(). Internally, each traffic flow is
represented by a connection with all of its independent resources
like that of a normal connection, and is differentiated by
service type. In other words, there can be multiple qp connections
between an IP pair and each supports a unique service type.
The feature has been added from RDSv4.1 onwards and supports
rolling upgrades. RDMA connection metadata also carries the tos
information to set up SL on end to end context. The original
code was developed by Bang Nguyen in downstream kernel back in
2.6.32 kernel days and it has evolved over period of time.
RDMA transport maps user tos to underline virtual lanes(VL)
for IB or DSCP values. RDMA CM transport abstract thats for
RDS. TCP transport makes use of default priority 0 and maps
all user tos values to it.
RDS Service type (TOS) is user-defined and needs to be configured
via RDS IOCTL interface. It must be set before initiating any
traffic and once set the TOS can not be changed. All out-going
traffic from the socket will be associated with its TOS.
For legacy protocol version incompatibility with non linux RDS,
consumer reject reason being used to convey it to peer. But the
choice of reject reason value as '1' was really poor.
Anyway for interoperability reasons with shipping products,
it needs to be supported. For any future versions, properly
encoded reject reason should to be used.
Mark RDSv3.1 as compat version and add v4.1 version macro's.
Subsequent patches enable TOS(Type of Service) feature which is
tied with v4.1 for RDMA transport.
Here's a set of 7 patches against DaveM's 'net-next.git' repo. I'm implemeting
the simple RX checksum offload (like was done for the 'ravb' driver by Simon
Horman); it has been only tested on the R8A7740 and R8A77980 SoCs, the other
SoCs should just work (according to their manuals)...
====================
Sergei Shtylyov [Mon, 4 Feb 2019 18:06:52 +0000 (21:06 +0300)]
sh_eth: RX checksum offload support
Add support for the RX checksum offload. This is enabled by default and
may be disabled and re-enabled using 'ethtool':
# ethtool -K eth0 rx off
# ethtool -K eth0 rx on
Some Ether MACs provide a simple checksumming scheme which appears to be
completely compatible with CHECKSUM_COMPLETE: sum of all packet data after
the L2 header is appended to packet data; this may be trivially read by
the driver and used to update the skb accordingly. The same checksumming
scheme is implemented in the EtherAVB MACs and now supported by the 'ravb'
driver.
In terms of performance, throughput is close to gigabit line rate with the
RX checksum offload both enabled and disabled. The 'perf' output, however,
appears to indicate that significantly less time is spent in do_csum() --
this is as expected.
Test results with RX checksum offload enabled:
~/netperf-2.2pl4# perf record -a ./netperf -t TCP_MAERTS -H 192.168.2.4
TCP MAERTS TEST to 192.168.2.4
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
Sergei Shtylyov [Mon, 4 Feb 2019 18:05:55 +0000 (21:05 +0300)]
sh_eth: rename sh_eth_cpu_data::hw_checksum
Commit 62e04b7e0e3c ("sh_eth: rename 'sh_eth_cpu_data::hw_crc'") renamed
the field to 'hw_checksum' for the Ether DMAC "intelligent checksum",
however some Ether MACs implement a simpler checksumming scheme, so that
name now seems misleading. Rename that field to 'csmr' as the "intelligent
checksum" is always controlled by the CSMR register.
====================
This patch set exposed a few functions in libbpf.
All these newly added API functions are helpful for
JIT based bpf compilation where .BTF and .BTF.ext
are available as in-memory data blobs.
Patch #1 exposed several btf_ext__* API functions which
are used to handle .BTF.ext ELF sections.
Patch #2 refactored the function bpf_map_find_btf_info()
and exposed API function btf__get_map_kv_tids() to
retrieve the map key/value type id's generated by
bpf program through BPF_ANNOTATE_KV_PAIR macro.
====================
Yonghong Song [Mon, 4 Feb 2019 19:00:58 +0000 (11:00 -0800)]
tools/bpf: implement libbpf btf__get_map_kv_tids() API function
Currently, to get map key/value type id's, the macro
BPF_ANNOTATE_KV_PAIR(<map_name>, <key_type>, <value_type>)
needs to be defined in the bpf program for the
corresponding map.
During program/map loading time,
the local static function bpf_map_find_btf_info()
in libbpf.c is implemented to retrieve the key/value
type ids given the map name.
The patch refactored function bpf_map_find_btf_info()
to create an API btf__get_map_kv_tids() which includes
the bulk of implementation for the original function.
The API btf__get_map_kv_tids() can be used by bcc,
a JIT based bpf compilation system, which uses the
same BPF_ANNOTATE_KV_PAIR to record map key/value types.
Yonghong Song [Mon, 4 Feb 2019 19:00:57 +0000 (11:00 -0800)]
tools/bpf: expose functions btf_ext__* as API functions
The following set of functions, which manipulates .BTF.ext
section, are exposed as API functions:
. btf_ext__new
. btf_ext__free
. btf_ext__reloc_func_info
. btf_ext__reloc_line_info
. btf_ext__func_info_rec_size
. btf_ext__line_info_rec_size
These functions are useful for JIT based bpf codegen, e.g.,
bcc, to manipulate in-memory .BTF.ext sections.
The signature of function btf_ext__reloc_func_info()
is also changed to be the same as its definition in btf.c.
Heiko Carstens [Mon, 4 Feb 2019 15:44:55 +0000 (16:44 +0100)]
s390: bpf: fix JMP32 code-gen
Commit 626a5f66da0d19 ("s390: bpf: implement jitting of JMP32") added
JMP32 code-gen support for s390. However it triggers the warning below
due to some unusual gotos in the original s390 bpf jit code.
Add a couple of additional "is_jmp32" initializations to fix this.
Also fix the wrong opcode for the "llilf" instruction that was
introduced with the same commit.
arch/s390/net/bpf_jit_comp.c: In function 'bpf_jit_insn':
arch/s390/net/bpf_jit_comp.c:248:55: warning: 'is_jmp32' may be used uninitialized in this function [-Wmaybe-uninitialized]
_EMIT6(op1 | reg(b1, b2) << 16 | (rel & 0xffff), op2 | mask); \
^
arch/s390/net/bpf_jit_comp.c:1211:8: note: 'is_jmp32' was declared here
bool is_jmp32 = BPF_CLASS(insn->code) == BPF_JMP32;
====================
These are patches responding to my comments for
Magnus's patch (https://patchwork.ozlabs.org/patch/1032848/).
The goal is to make pr_* macros available to other C files
than libbpf.c, and to simplify API function libbpf_set_print().
Specifically, Patch #1 used global functions
to facilitate pr_* macros in the header files so they
are available in different C files.
Patch #2 removes the global function libbpf_print_level_available()
which is added in Patch 1.
Patch #3 simplified libbpf_set_print() which takes only one print
function with a debug level argument among others.
Changelogs:
v3 -> v4:
. rename libbpf internal header util.h to libbpf_util.h
. rename libbpf internal function libbpf_debug_print() to libbpf_print()
v2 -> v3:
. bailed out earlier in libbpf_debug_print() if __libbpf_pr is NULL
. added missing LIBBPF_DEBUG level check in libbpf.c __base_pr().
v1 -> v2:
. Renamed global function libbpf_dprint() to libbpf_debug_print()
to be more expressive.
. Removed libbpf_dprint_level_available() as it is used only
once in btf.c and we can remove it by optimizing for common cases.
====================
Yonghong Song [Sat, 2 Feb 2019 00:14:17 +0000 (16:14 -0800)]
tools/bpf: simplify libbpf API function libbpf_set_print()
Currently, the libbpf API function libbpf_set_print()
takes three function pointer parameters for warning, info
and debug printout respectively.
This patch changes the API to have just one function pointer
parameter and the function pointer has one additional
parameter "debugging level". So if in the future, if
the debug level is increased, the function signature
won't change.
Yonghong Song [Sat, 2 Feb 2019 00:14:15 +0000 (16:14 -0800)]
tools/bpf: print out btf log at LIBBPF_WARN level
Currently, the btf log is allocated and printed out in case
of error at LIBBPF_DEBUG level.
Such logs from kernel are very important for debugging.
For example, bpf syscall BPF_PROG_LOAD command can get
verifier logs back to user space. In function load_program()
of libbpf.c, the log buffer is allocated unconditionally
and printed out at pr_warning() level.
Let us do the similar thing here for btf. Allocate buffer
unconditionally and print out error logs at pr_warning() level.
This can reduce one global function and
optimize for common situations where pr_warning()
is activated either by default or by user supplied
debug output function.
Yonghong Song [Sat, 2 Feb 2019 00:14:14 +0000 (16:14 -0800)]
tools/bpf: move libbpf pr_* debug print functions to headers
A global function libbpf_print, which is invisible
outside the shared library, is defined to print based
on levels. The pr_warning, pr_info and pr_debug
macros are moved into the newly created header
common.h. So any .c file including common.h can
use these macros directly.
Currently btf__new and btf_ext__new API has an argument getting
__pr_debug function pointer into btf.c so the debugging information
can be printed there. This patch removed this parameter
from btf__new and btf_ext__new and directly using pr_debug in btf.c.
Another global function libbpf_print_level_available, also
invisible outside the shared library, can test
whether a particular level debug printing is
available or not. It is used in btf.c to
test whether DEBUG level debug printing is availabl or not,
based on which the log buffer will be allocated when loading
btf to the kernel.
Kees Cook [Wed, 16 Jan 2019 00:02:23 +0000 (16:02 -0800)]
ath9k: eeprom: Use scnprintf instead of snprintf
Change snprintf to scnprintf. There are generally two cases where using
snprintf causes problems.
1) Uses of size += snprintf(buf, SIZE - size, fmt, ...) In this case,
if snprintf would have written more characters than what the buffer
size (SIZE) is, then size will end up larger than SIZE. In later uses
of snprintf, SIZE - size will result in a negative number, leading to
problems. Note that size might already be too large by using size =
snprintf before the code reaches a case of size += snprintf.
2) If size is ultimately used as a length parameter for a copy back to
user space, then it will potentially allow for a buffer overflow and
information disclosure when size is greater than SIZE. When the size is
used to index the buffer directly, we can have memory corruption. This
also means when size = snprintf... is used, it may also cause problems
since size may become large. Copying to userspace is mitigated by the
HARDENED_USERCOPY kernel configuration.
The solution to these issues is to use scnprintf which returns the number
of characters actually written to the buffer, so the size variable will
never exceed SIZE.
Govind Singh [Wed, 16 Jan 2019 11:29:42 +0000 (16:59 +0530)]
ath10k: Add support for extended HTT aggr msg support
HTT aggr message parameter in HL2.0 fw are different in comparison
to legacy fw version. Fill correct HTT aggr msg parameter for
targets using HL2.0 firmware.
Yu Wang [Tue, 15 Jan 2019 06:31:07 +0000 (14:31 +0800)]
ath10k: fix S5 power consumption issue for QCA9377
After system entering S5 (shut down but system still
providing power to QCA9377) on Ubuntu platform, power
consumption of QCA9377 is 69mA, which is too high.
The root cause is pci_soft_reset is not set for QCA9377
during pci probe.
To fix this issue, set 'pci_soft_reset' to 'th10k_pci_warm_reset',
and then the power consumption drops to a normal value(10mA).
Verified on Dell Ubuntu platform with firmware:
WLAN.TF.1.0-00002-QCATFSWPZ-5
ath10k: Set DMA address mask to 35 bit for WCN3990
WCN3990 is a 37-bit target but can address memory range
only upto 35 bits. The 36th bit is used to control the
smmu/iommu translation and the 37th bit is used by the
internal bus masters to access the wifi subsystem internal
SRAM. With the DMA mask set to 37i-bit, the host driver
can get 37-bit dma address, which leads to incorrect
address access in the target.
Hence the host driver can used addresses upto 35-bit
for WCN3990. Fix the dma mask for wcn3990 to 35-bit,
instead of 37-bit.
Johannes Berg [Tue, 11 Dec 2018 20:20:43 +0000 (21:20 +0100)]
iwlwifi: mvm: fix RFH config command with >=10 CPUs
If we have >=10 (logical) CPUs, our command size exceeds the
internal buffer size and the command fails; fix that by using
IWL_HCMD_DFL_NOCOPY for the command that's allocated anyway.
While at it, also fix the leak of cmd, and use struct_size()
to calculate its size.
Signed-off-by: Johannes Berg <[email protected]> Fixes: 8edbfaa19835 ("iwlwifi: mvm: configure multi RX queue") Signed-off-by: Luca Coelho <[email protected]>
Both iwl_trans_fw_error and iwl_force_nmi initiate async recovery flow.
Calling them both is redundant and causing a race.
Solve this by removing the call to iwl_trans_fw_error.
Signed-off-by: Shahar S Matityahu <[email protected]> Fixes: cfadc3ffccd5 ("iwlwifi: pcie: stop the firmware when we restart it") Signed-off-by: Luca Coelho <[email protected]>
Implement paging memory dump in the new dump mechanism.
To support this change, moved iwl_self_init_dram strcut from trans_pcie
to trans so that it will accessible via fw_runtime.
Johannes Berg [Mon, 10 Dec 2018 09:36:29 +0000 (10:36 +0100)]
iwlwifi: mvm: don't hide HE radiotap data in SKB
Hiding the HE radiotap data for further processing of the SKB just
caused another bug when adding the L-SIG data. Simply stop doing
this and adjust the skb->data pointer accordingly when we need to
get the 802.11 header.
While at it, also verify and fix the data alignment, we need to add
2 bytes padding with the vendor data to ensure the whole length of
all radiotap headers is a multiple of 4.
Signed-off-by: Johannes Berg <[email protected]> Fixes: 6721039d5b8a ("iwlwifi: mvm: add L-SIG length to radiotap") Signed-off-by: Luca Coelho <[email protected]>
Johannes Berg [Mon, 10 Dec 2018 08:04:55 +0000 (09:04 +0100)]
iwlwifi: move config structs to C file
Even if they're static const, there's no need to duplicate
the structs every time they're included and used. Move them
to an appropriate C file instead.