Currently mlx5_core_dev contains array of capabilities. It contains 19
valid capabilities of the device, 2 reserved entries and 12 holes.
Due to this for 14 unused entries, mlx5_core_dev allocates 14 * 8K = 112K
bytes of memory which is never used. Due to this mlx5_core_dev structure
size is 270Kbytes odd. This allocation further aligns to next power of 2
to 512Kbytes.
By skipping non-existent entries,
(a) 112Kbyte is saved,
(b) mlx5_core_dev reduces to 8KB with alignment
(c) 350KB saved in alignment
In future individual capability allocation can be used to skip its
allocation when such capability is disabled at the device level. This
patch prepares mlx5_core_dev to hold capability using a pointer instead
of inline array.
net/mlx5: Reorganize current and maximal capabilities to be per-type
In the current code, the current and maximal capabilities are
maintained in separate arrays which are both per type. In order to
allow the creation of such a basic structure as a dynamically
allocated array, we move curr and max fields to a unified
structure so that specific capabilities can be allocated as one unit.
Shay Drory [Tue, 29 Jun 2021 11:47:30 +0000 (14:47 +0300)]
net/mlx5: Change SF missing dedicated MSI-X err message to dbg
When MSI-X vectors allocated are not enough for SFs to have dedicated,
MSI-X, kernel log buffer has too many entries.
Hence only enable such log with debug level.
Leon Romanovsky [Sun, 1 Aug 2021 08:37:57 +0000 (11:37 +0300)]
net/mlx5: Delete impossible dev->state checks
New mlx5_core device structure is allocated through devlink_alloc
with\ kzalloc and that ensures that all fields are equal to zero
and it includes ->state too.
That means that checks of that field in the mlx5_init_one() is
completely redundant, because that function is called only once
in the begging of mlx5_core_dev lifetime.
PCI:
.probe()
-> probe_one()
-> mlx5_init_one()
The recovery flow can't run at that time or before it, because relevant
work initialized later in mlx5_init_once().
Such initialization flow ensures that dev->state can't be
MLX5_DEVICE_STATE_UNINITIALIZED at all, so remove such impossible
checks.
David S. Miller [Wed, 11 Aug 2021 13:44:59 +0000 (14:44 +0100)]
Merge branch 'dsa-tagger-helpers'
Vladimir Oltean says:
====================
DSA tagger helpers
The goal of this series is to minimize the use of memmove and skb->data
in the DSA tagging protocol drivers. Unfiltered access to this level of
information is not very friendly to drive-by contributors, and sometimes
is also not the easiest to review.
For starters, I have converted the most common form of DSA tagging
protocols: the DSA headers which are placed where the EtherType is.
The helper functions introduced by this series are:
- dsa_alloc_etype_header
- dsa_strip_etype_header
- dsa_etype_header_pos_rx
- dsa_etype_header_pos_tx
This series is just a resend as non-RFC of v1.
====================
Vladimir Oltean [Tue, 10 Aug 2021 13:13:55 +0000 (16:13 +0300)]
net: dsa: create a helper for locating EtherType DSA headers on RX
It seems that protocol tagging driver writers are always surprised about
the formula they use to reach their EtherType header on RX, which
becomes apparent from the fact that there are comments in multiple
drivers that mention the same information.
Create a helper that returns a void pointer to skb->data - 2, as well as
centralize the explanation why that is the case.
Vladimir Oltean [Tue, 10 Aug 2021 13:13:54 +0000 (16:13 +0300)]
net: dsa: create a helper which allocates space for EtherType DSA headers
Hide away the memmove used by DSA EtherType header taggers to shift the
MAC SA and DA to the left to make room for the header, after they've
called skb_push(). The call to skb_push() is still left explicit in
drivers, to be symmetric with dsa_strip_etype_header, and because not
all callers can be refactored to do it (for example, brcm_tag_xmit_ll
has common code for a pre-Ethernet DSA tag and an EtherType DSA tag).
Vladimir Oltean [Tue, 10 Aug 2021 13:13:53 +0000 (16:13 +0300)]
net: dsa: create a helper that strips EtherType DSA headers on RX
All header taggers open-code a memmove that is fairly not all that
obvious, and we can hide the details behind a helper function, since the
only thing specific to the driver is the length of the header tag.
David S. Miller [Wed, 11 Aug 2021 13:34:22 +0000 (14:34 +0100)]
Merge branch 'devlink-aux-devices'
Parav Pandit says:
====================
devlink: Control auxiliary devices
Currently, for mlx5 multi-function device, a user is not able to control
which functionality to enable/disable. For example, each PCI
PF, VF, SF function by default has netdevice, RDMA and vdpa-net
devices always enabled.
Hence, enable user to control which device functionality to enable/disable.
This is achieved by using existing devlink params [1] to
enable/disable eth, rdma and vdpa net functionality control knob.
For example user interested in only vdpa device function: performs,
$ devlink dev param set pci/0000:06:00.0 name enable_rdma value false \
cmode driverinit
$ devlink dev param set pci/0000:06:00.0 name enable_eth value false \
cmode driverinit
$ devlink dev param set pci/0000:06:00.0 name enable_vnet value true \
cmode driverinit
$ devlink dev reload pci/0000:06:00.0
Reload command honors parameters set, initializes the device that user
has composed using devlink dev params and resources.
Devices before reload:
Parav Pandit [Tue, 10 Aug 2021 13:24:19 +0000 (16:24 +0300)]
devlink: Add API to register and unregister single parameter
Currently device configuration parameters can be registered as an array.
Due to this a constant array must be registered. A single driver
supporting multiple devices each with different device capabilities end
up registering all parameters even if it doesn't support it.
One possible workaround a driver can do is, it registers multiple single
entry arrays to overcome such limitation.
Better is to provide a API that enables driver to register/unregister a
single parameter. This also further helps in two ways.
(1) to reduce the memory of devlink_param_entry by avoiding in registering
parameters which are not supported by the device.
(2) avoid generating multiple parameter add, delete, publish, unpublish,
init value notifications for such unsupported parameters
Parav Pandit [Tue, 10 Aug 2021 13:24:18 +0000 (16:24 +0300)]
devlink: Create a helper function for one parameter registration
Create and use a helper function for one parameter registration.
Subsequent patch also will reuse this for driver facing routine to
register a single parameter.
David S. Miller [Wed, 11 Aug 2021 12:34:41 +0000 (13:34 +0100)]
Merge branch 'bridge-global-mcast'
Nikolay Aleksandrov says:
====================
net: bridge: vlan: add global mcast options
This is the first follow-up set after the support for per-vlan multicast
contexts which extends global vlan options to support bridge's multicast
config per-vlan, it enables user-space to change and dump the already
existing bridge vlan multicast context options. The global option patches
(01 - 09 and 12-13) follow a similar pattern of changing current mcast
functions to take multicast context instead of a port/bridge directly.
Option equality checks have been added for dumping vlan range compression.
The last 2 patches extend the mcast router dump support so it can be
re-used when dumping vlan config.
patches 01 - 09: add support for various mcast options
patches 10 - 11: prepare for per-vlan querier control
patches 12 - 13: add support for querier control and router control
patches 14 - 15: add support for dumping per-vlan router ports
Next patch-sets:
- per-port/vlan router option config
- iproute2 support for all new vlan options
- selftests
====================
net: bridge: vlan: use br_rports_fill_info() to export mcast router ports
Embed the standard multicast router port export by br_rports_fill_info()
into a new global vlan attribute BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS.
In order to have the same format for the global bridge mcast context and
the per-vlan mcast context we need a double-nesting:
- BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS
- MDBA_ROUTER
Currently we don't compare router lists, if any router port exists in
the bridge mcast contexts we consider their option sets as different and
export them separately.
In addition we export the router port vlan id when dumping similar to
the router port notification format.
net: bridge: mcast: use the proper multicast context when dumping router ports
When we are dumping the router ports of a vlan mcast context we need to
use the bridge/vlan and port/vlan's multicast contexts to check if
IPv4/IPv6 router port is present and later to dump the vlan id.
net: bridge: vlan: add support for mcast router global option
Add support to change and retrieve global vlan multicast router state
which is used for the bridge itself. We just need to pass multicast context
to br_multicast_set_router instead of bridge device and the rest of the
logic remains the same.
net: bridge: vlan: add support for mcast querier global option
Add support to change and retrieve global vlan multicast querier state.
We just need to pass multicast context to br_multicast_set_querier
instead of bridge device and the rest of the logic remains the same.
net: bridge: mcast: querier and query state affect only current context type
It is a minor optimization and better behaviour to make sure querier and
query sending routines affect only the matching multicast context
depending if vlan snooping is enabled (vlan ctx vs bridge ctx).
It also avoids sending unnecessary extra query packets.
net: bridge: mcast: move querier state to the multicast context
We need to have the querier state per multicast context in order to have
per-vlan control, so remove the internal option bit and move it to the
multicast context. Also annotate the lockless reads of the new variable.
David S. Miller [Wed, 11 Aug 2021 12:31:56 +0000 (13:31 +0100)]
Merge branch 'ipa-runtime-pm'
Alex Elder says:
====================
net: ipa: use runtime PM reference counting
This series does further rework of the IPA clock code so that we
rely on some of the core runtime power management code (including
its referencing counting) instead.
The first patch makes ipa_clock_get() act like pm_runtime_get_sync().
The second patch makes system suspend occur regardless of the
current reference count value, which is again more like how the
runtime PM core code behaves.
The third patch creates functions to encapsulate all hardware
suspend and resume activity. The fourth uses those functions as
the ->runtime_suspend and ->runtime_resume power callbacks. With
that in place, ipa_clock_get() and ipa_clock_put() are changed to
use runtime PM get and put functions when needed.
The fifth patch eliminates an extra clock reference previously used
to control system suspend. The sixth eliminates the "IPA clock"
reference count and mutex.
The final patch replaces the one call to ipa_clock_get_additional()
with a call to pm_runtime_get_if_active(), making the former
unnecessary.
====================
Alex Elder [Tue, 10 Aug 2021 19:27:04 +0000 (14:27 -0500)]
net: ipa: kill ipa_clock_get_additional()
Now that ipa_clock_get_additional() is a trivial wrapper around
pm_runtime_get_if_active(), just open-code it in its only caller
and delete the function.
Alex Elder [Tue, 10 Aug 2021 19:27:03 +0000 (14:27 -0500)]
net: ipa: kill IPA clock reference count
The runtime power management core code maintains a usage count. This
count mirrors the IPA clock reference count, and there's no need to
maintain both. So get rid of the IPA clock reference count and just
rely on the runtime PM usage count to determine when the hardware
should be suspended or resumed.
Use pm_runtime_get_if_active() in ipa_clock_get_additional(). We
care whether power is active, regardless of whether it's in use, so
pass true for its ign_usage_count argument.
The IPA clock mutex is just used to make enabling/disabling the
clock and updating the reference count occur atomically. Without
the reference count, there's no need for the mutex, so get rid of
that too.
Alex Elder [Tue, 10 Aug 2021 19:27:02 +0000 (14:27 -0500)]
net: ipa: get rid of extra clock reference
Suspending the IPA hardware is now managed by the runtime PM core
code. The ->runtime_idle callback returns a non-zero value, so it
will never suspend except when forced. As a result, there's no need
to take an extra "do not suspend" clock reference.
Alex Elder [Tue, 10 Aug 2021 19:27:01 +0000 (14:27 -0500)]
net: ipa: use runtime PM core
Use the runtime power management core to cause hardware suspend and
resume to occur. Enable it in ipa_clock_init() (without autosuspend),
and disable it in ipa_clock_exit().
Use ipa_runtime_suspend() as the ->runtime_suspend power operation,
and arrange for it to be called by having ipa_clock_get() call
pm_runtime_get_sync() when the first clock reference is taken.
Similarly, use ipa_runtime_resume() as the ->runtime_resume power
operation, and pm_runtime_put() when the last IPA clock reference
is dropped.
Introduce ipa_runtime_idle() as the ->runtime_idle power operation,
and have it return a non-zero value; this way suspend will never
occur except when forced.
Use pm_runtime_force_suspend() and pm_runtime_force_resume() as the
system suspend and resume callbacks, and remove ipa_suspend() and
ipa_resume().
Store a pointer to the device structure passed to ipa_clock_init(),
so it can be used by ipa_clock_exit() to disable runtime power
management.
Alex Elder [Tue, 10 Aug 2021 19:27:00 +0000 (14:27 -0500)]
net: ipa: resume in ipa_clock_get()
Introduce ipa_runtime_suspend() and ipa_runtime_resume(), which
encapsulate the activities necessary for suspending and resuming
the IPA hardware. Call these functions from ipa_clock_get() and
ipa_clock_put() when the first reference is taken or last one is
dropped.
When the very first clock reference is taken (for ipa_config()),
setup isn't complete yet, so (as before) only the core clock gets
enabled.
When the last clock reference is dropped (after ipa_deconfig()),
ipa_teardown() will have made the setup_complete flag false, so
there too, the core clock will be stopped without affecting GSI
or the endpoints.
Otherwise these new functions will perform the desired suspend and
resume actions once setup is complete.
Alex Elder [Tue, 10 Aug 2021 19:26:59 +0000 (14:26 -0500)]
net: ipa: disable clock in suspend
Disable the IPA clock rather than dropping a reference to it in the
system suspend callback. This forces the suspend to occur without
affecting existing references.
Similarly, enable the clock rather than taking a reference in
ipa_resume(), forcing a resume without changing the reference count.
Alex Elder [Tue, 10 Aug 2021 19:26:58 +0000 (14:26 -0500)]
net: ipa: have ipa_clock_get() return a value
We currently assume no errors occur when enabling or disabling the
IPA core clock and interconnects. And although this commit exposes
errors that could occur, we generally assume this won't happen in
practice.
This commit changes ipa_clock_get() and ipa_clock_put() so each
returns a value. The values returned are meant to mimic what the
runtime power management functions return, so we can set up error
handling here before we make the switch. Have ipa_clock_get()
increment the reference count even if it returns an error, to match
the behavior of pm_runtime_get().
More details follow.
When taking a reference in ipa_clock_get(), return 0 for the first
reference, 1 for subsequent references, or a negative error code if
an error occurs. Note that if ipa_clock_get() returns an error, we
must not touch hardware; in some cases such errors now cause entire
blocks of code to be skipped.
When dropping a reference in ipa_clock_put(), we return 0 or an
error code. The error would come from ipa_clock_disable(), which
now returns what ipa_interconnect_disable() returns (either 0 or a
negative error code). For now, callers ignore the return value;
if an error occurs, a message will have already been logged, and
little more can actually be done to improve the situation.
Linus Torvalds [Wed, 11 Aug 2021 02:34:34 +0000 (16:34 -1000)]
Merge tag 'arc-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- Fix FPU_STATUS update
- Update my email address
- Other spellos and fixes
* tag 'arc-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
MAINTAINERS: update Vineet's email address
ARC: fp: set FPU_STATUS.FWE to enable FPU_STATUS update on context switch
ARC: Fix CONFIG_STACKDEPOT
arc: Fix spelling mistake and grammar in Kconfig
arc: Prefer unsigned int to bare use of unsigned
Currently there's support for filtering neighbours/links for interfaces
which have a specific master device (using the IFLA_MASTER/NDA_MASTER
attributes).
This patch adds support for filtering interfaces/neighbours dump for
interfaces that *don't* have a master.
Mark Bloch [Tue, 10 Aug 2021 03:43:05 +0000 (03:43 +0000)]
net/sched: cls_api, reset flags on replay
tc_new_tfilter() can replay a request if it got EAGAIN. The cited commit
didn't account for this when it converted TC action ->init() API
to use flags instead of parameters. This can lead to passing stale flags
down the call chain which results in trying to lock rtnl when it's
already locked, deadlocking the entire system.
Fix by making sure to reset flags on each replay.
============================================
WARNING: possible recursive locking detected 5.14.0-rc3-custom-49011-g3d2bbb4f104d #447 Not tainted
--------------------------------------------
tc/37605 is trying to acquire lock: ffffffff841df2f0 (rtnl_mutex){+.+.}-{3:3}, at: tc_setup_cb_add+0x14b/0x4d0
but task is already holding lock: ffffffff841df2f0 (rtnl_mutex){+.+.}-{3:3}, at: tc_new_tfilter+0xb12/0x22e0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(rtnl_mutex);
lock(rtnl_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by tc/37605:
#0: ffffffff841df2f0 (rtnl_mutex){+.+.}-{3:3}, at: tc_new_tfilter+0xb12/0x22e0
Vladimir Oltean [Tue, 10 Aug 2021 11:50:24 +0000 (14:50 +0300)]
net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge
The blamed commit added a new field to struct switchdev_notifier_fdb_info,
but did not make sure that all call paths set it to something valid.
For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE
notifier, and since the 'is_local' flag is not set, it contains junk
from the stack, so the bridge might interpret those notifications as
being for local FDB entries when that was not intended.
To avoid that now and in the future, zero-initialize all
switchdev_notifier_fdb_info structures created by drivers such that all
newly added fields to not need to touch drivers again.
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: Lag, Create shared FDB when in switchdev mode
net/mlx5: E-Switch, add logic to enable shared FDB
net/mlx5: Lag, move lag destruction to a workqueue
net/mlx5: Lag, properly lock eswitch if needed
net/mlx5: Add send to vport rules on paired device
net/mlx5: E-Switch, Add event callback for representors
net/mlx5e: Use shared mappings for restoring from metadata
net/mlx5e: Add an option to create a shared mapping
net/mlx5: E-Switch, set flow source for send to uplink rule
RDMA/mlx5: Add shared FDB support
{net, RDMA}/mlx5: Extend send to vport rules
RDMA/mlx5: Fill port info based on the relevant eswitch
net/mlx5: Lag, add initial logic for shared FDB
net/mlx5: Return mdev from eswitch
IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
net/mlx5: Add DCS caps & fields support
====================
net: bridge: fix flags interpretation for extern learn fdb entries
Ignore fdb flags when adding port extern learn entries and always set
BR_FDB_LOCAL flag when adding bridge extern learn entries. This is
closest to the behaviour we had before and avoids breaking any use cases
which were allowed.
This patch fixes iproute2 calls which assume NUD_PERMANENT and were
allowed before, example:
$ bridge fdb add 00:11:22:33:44:55 dev swp1 extern_learn
Extern learn entries are allowed to roam, but do not expire, so static
or dynamic flags make no sense for them.
Linus Torvalds [Tue, 10 Aug 2021 16:46:33 +0000 (09:46 -0700)]
Merge tag 'platform-drivers-x86-v5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Small set of pdx86 fixes for 5.14"
* tag 'platform-drivers-x86-v5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: pcengines-apuv2: Add missing terminating entries to gpio-lookup tables
platform/x86: Make dual_accel_detect() KIOX010A + KIOX020A detect more robust
platform/x86: Add and use a dual_accel_detect() helper
Linus Torvalds [Tue, 10 Aug 2021 16:40:09 +0000 (09:40 -0700)]
Merge tag 'ovl-fixes-5.14-rc6-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Fix several bugs in overlayfs"
* tag 'ovl-fixes-5.14-rc6-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: prevent private clone if bind mount is not allowed
ovl: fix uninitialized pointer read in ovl_lookup_real_one()
ovl: fix deadlock in splice write
ovl: skip stale entries in merge dir cache iteration
We've added 31 non-merge commits during the last 8 day(s) which contain
a total of 28 files changed, 3644 insertions(+), 519 deletions(-).
1) Native XDP support for bonding driver & related BPF selftests, from Jussi Maki.
2) Large batch of new BPF JIT tests for test_bpf.ko that came out as a result from
32-bit MIPS JIT development, from Johan Almbladh.
3) Rewrite of netcnt BPF selftest and merge into test_progs, from Stanislav Fomichev.
4) Fix XDP bpf_prog_test_run infra after net to net-next merge, from Andrii Nakryiko.
5) Follow-up fix in unix_bpf_update_proto() to enforce socket type, from Cong Wang.
6) Fix bpf-iter-tcp4 selftest to print the correct dest IP, from Jose Blanquicet.
7) Various misc BPF XDP sample improvements, from Niklas Söderlund, Matthew Cover,
and Muhammad Falak R Wani.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (31 commits)
bpf, tests: Add tail call test suite
bpf, tests: Add tests for BPF_CMPXCHG
bpf, tests: Add tests for atomic operations
bpf, tests: Add test for 32-bit context pointer argument passing
bpf, tests: Add branch conversion JIT test
bpf, tests: Add word-order tests for load/store of double words
bpf, tests: Add tests for ALU operations implemented with function calls
bpf, tests: Add more ALU64 BPF_MUL tests
bpf, tests: Add more BPF_LSH/RSH/ARSH tests for ALU64
bpf, tests: Add more ALU32 tests for BPF_LSH/RSH/ARSH
bpf, tests: Add more tests of ALU32 and ALU64 bitwise operations
bpf, tests: Fix typos in test case descriptions
bpf, tests: Add BPF_MOV tests for zero and sign extension
bpf, tests: Add BPF_JMP32 test cases
samples, bpf: Add an explict comment to handle nested vlan tagging.
selftests/bpf: Add tests for XDP bonding
selftests/bpf: Fix xdp_tx.c prog section name
net, core: Allow netdev_lower_get_next_private_rcu in bh context
bpf, devmap: Exclude XDP broadcast to master device
net, bonding: Add XDP support to the bonding driver
...
====================
Kenneth Feng [Fri, 6 Aug 2021 02:28:04 +0000 (10:28 +0800)]
drm/amd/pm: bug fix for the runtime pm BACO
In some systems only MACO is supported. This is to fix the problem
that runtime pm is enabled but BACO is not supported. MACO will be
handled seperately.
Jeremy Szu [Tue, 10 Aug 2021 10:08:45 +0000 (18:08 +0800)]
ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 650 G8 Notebook PC
The HP ProBook 650 G8 Notebook PC is using ALC236 codec which is
using 0x02 to control mute LED and 0x01 to control micmute LED.
Therefore, add a quirk to make it works.
David S. Miller [Tue, 10 Aug 2021 12:17:22 +0000 (13:17 +0100)]
Merge branch 'fdb-backpressure-fixes'
Vladimir Oltean says:
====================
Fix broken backpressure during FDB dump in DSA drivers
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
multiple netlink skbs if the buffer provided by user space is too small
(one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in
dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
point, and then the dump resumes on the same port with a new skb, and
FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
drivers, then drivers must check for the -EMSGSIZE error code returned
by it. Otherwise, when a netlink skb becomes full, DSA will no longer
save newly dumped FDB entries to it, but the driver will continue
dumping. So FDB entries will be missing from the dump.
DSA is one of the few switchdev drivers that have an .ndo_fdb_dump
implementation, because of the assumption that the hardware and software
FDBs cannot be efficiently kept in sync via SWITCHDEV_FDB_ADD_TO_BRIDGE.
Other drivers with a home-cooked .ndo_fdb_dump implementation are
ocelot and dpaa2-switch. These appear to do the correct thing, as do the
other DSA drivers, so nothing else appears to need fixing.
====================
Vladimir Oltean [Tue, 10 Aug 2021 11:19:56 +0000 (14:19 +0300)]
net: dsa: sja1105: fix broken backpressure in .port_fdb_dump
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
multiple netlink skbs if the buffer provided by user space is too small
(one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in
dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
point, and then the dump resumes on the same port with a new skb, and
FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
drivers, then drivers must check for the -EMSGSIZE error code returned
by it. Otherwise, when a netlink skb becomes full, DSA will no longer
save newly dumped FDB entries to it, but the driver will continue
dumping. So FDB entries will be missing from the dump.
Fix the broken backpressure by propagating the "cb" return code and
allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
Fixes: 291d1e72b756 ("net: dsa: sja1105: Add support for FDB and MDB management") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Tue, 10 Aug 2021 11:19:55 +0000 (14:19 +0300)]
net: dsa: lantiq: fix broken backpressure in .port_fdb_dump
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
multiple netlink skbs if the buffer provided by user space is too small
(one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in
dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
point, and then the dump resumes on the same port with a new skb, and
FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
drivers, then drivers must check for the -EMSGSIZE error code returned
by it. Otherwise, when a netlink skb becomes full, DSA will no longer
save newly dumped FDB entries to it, but the driver will continue
dumping. So FDB entries will be missing from the dump.
Fix the broken backpressure by propagating the "cb" return code and
allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
Fixes: 58c59ef9e930 ("net: dsa: lantiq: Add Forwarding Database access") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Tue, 10 Aug 2021 11:19:54 +0000 (14:19 +0300)]
net: dsa: lan9303: fix broken backpressure in .port_fdb_dump
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
multiple netlink skbs if the buffer provided by user space is too small
(one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in
dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
point, and then the dump resumes on the same port with a new skb, and
FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
drivers, then drivers must check for the -EMSGSIZE error code returned
by it. Otherwise, when a netlink skb becomes full, DSA will no longer
save newly dumped FDB entries to it, but the driver will continue
dumping. So FDB entries will be missing from the dump.
Fix the broken backpressure by propagating the "cb" return code and
allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
Fixes: ab335349b852 ("net: dsa: lan9303: Add port_fast_age and port_fdb_dump methods") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Tue, 10 Aug 2021 11:19:53 +0000 (14:19 +0300)]
net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump
rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
multiple netlink skbs if the buffer provided by user space is too small
(one buffer will typically handle a few hundred FDB entries).
When the current buffer becomes full, nlmsg_put() in
dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
point, and then the dump resumes on the same port with a new skb, and
FDB entries up to the saved index are simply skipped.
Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
drivers, then drivers must check for the -EMSGSIZE error code returned
by it. Otherwise, when a netlink skb becomes full, DSA will no longer
save newly dumped FDB entries to it, but the driver will continue
dumping. So FDB entries will be missing from the dump.
Fix the broken backpressure by propagating the "cb" return code and
allow rtnl_fdb_dump() to restart the FDB dump with a new skb.
Fixes: e4b27ebc780f ("net: dsa: Add DSA driver for Hirschmann Hellcreek switches") Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Kurt Kanzenbach <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Randy Dunlap [Mon, 9 Aug 2021 21:52:29 +0000 (14:52 -0700)]
bpf, core: Fix kernel-doc notation
Fix kernel-doc warnings in kernel/bpf/core.c (found by scripts/kernel-doc
and W=1 builds). That is, correct a function name in a comment and add
return descriptions for 2 functions.
Fixes these kernel-doc warnings:
kernel/bpf/core.c:1372: warning: expecting prototype for __bpf_prog_run(). Prototype was for ___bpf_prog_run() instead
kernel/bpf/core.c:1372: warning: No description found for return value of '___bpf_prog_run'
kernel/bpf/core.c:1883: warning: No description found for return value of 'bpf_prog_select_runtime'
Eric Dumazet [Tue, 10 Aug 2021 09:45:47 +0000 (02:45 -0700)]
net: igmp: fix data-race in igmp_ifc_timer_expire()
Fix the data-race reported by syzbot [1]
Issue here is that igmp_ifc_timer_expire() can update in_dev->mr_ifc_count
while another change just occured from another context.
in_dev->mr_ifc_count is only 8bit wide, so the race had little
consequences.
[1]
BUG: KCSAN: data-race in igmp_ifc_event / igmp_ifc_timer_expire
write to 0xffff8881051e3062 of 1 bytes by task 12547 on cpu 0:
igmp_ifc_event+0x1d5/0x290 net/ipv4/igmp.c:821
igmp_group_added+0x462/0x490 net/ipv4/igmp.c:1356
____ip_mc_inc_group+0x3ff/0x500 net/ipv4/igmp.c:1461
__ip_mc_join_group+0x24d/0x2c0 net/ipv4/igmp.c:2199
ip_mc_join_group_ssm+0x20/0x30 net/ipv4/igmp.c:2218
do_ip_setsockopt net/ipv4/ip_sockglue.c:1285 [inline]
ip_setsockopt+0x1827/0x2a80 net/ipv4/ip_sockglue.c:1423
tcp_setsockopt+0x8c/0xa0 net/ipv4/tcp.c:3657
sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3362
__sys_setsockopt+0x18f/0x200 net/socket.c:2159
__do_sys_setsockopt net/socket.c:2170 [inline]
__se_sys_setsockopt net/socket.c:2167 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2167
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 12539 Comm: syz-executor.1 Not tainted 5.14.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Johan Almbladh [Mon, 9 Aug 2021 09:18:29 +0000 (11:18 +0200)]
bpf, tests: Add tail call test suite
While BPF_CALL instructions were tested implicitly by the cBPF-to-eBPF
translation, there has not been any tests for BPF_TAIL_CALL instructions.
The new test suite includes tests for tail call chaining, tail call count
tracking and error paths. It is mainly intended for JIT development and
testing.
Johan Almbladh [Mon, 9 Aug 2021 09:18:28 +0000 (11:18 +0200)]
bpf, tests: Add tests for BPF_CMPXCHG
Tests for BPF_CMPXCHG with both word and double word operands. As with
the tests for other atomic operations, these tests only check the result
of the arithmetic operation. The atomicity of the operations is not tested.
Johan Almbladh [Mon, 9 Aug 2021 09:18:25 +0000 (11:18 +0200)]
bpf, tests: Add branch conversion JIT test
Some JITs may need to convert a conditional jump instruction to
to short PC-relative branch and a long unconditional jump, if the
PC-relative offset exceeds offset field width in the CPU instruction.
This test triggers such branch conversion on the 32-bit MIPS JIT.
Johan Almbladh [Mon, 9 Aug 2021 09:18:24 +0000 (11:18 +0200)]
bpf, tests: Add word-order tests for load/store of double words
A double word (64-bit) load/store may be implemented as two successive
32-bit operations, one for each word. Check that the order of those
operations is consistent with the machine endianness.
Johan Almbladh [Mon, 9 Aug 2021 09:18:23 +0000 (11:18 +0200)]
bpf, tests: Add tests for ALU operations implemented with function calls
32-bit JITs may implement complex ALU64 instructions using function calls.
The new tests check aspects related to this, such as register clobbering
and register argument re-ordering.
Johan Almbladh [Mon, 9 Aug 2021 09:18:22 +0000 (11:18 +0200)]
bpf, tests: Add more ALU64 BPF_MUL tests
This patch adds BPF_MUL tests for 64x32 and 64x64 multiply. Mainly
testing 32-bit JITs that implement ALU64 operations with two 32-bit
CPU registers per operand.
Johan Almbladh [Mon, 9 Aug 2021 09:18:21 +0000 (11:18 +0200)]
bpf, tests: Add more BPF_LSH/RSH/ARSH tests for ALU64
This patch adds a number of tests for BPF_LSH, BPF_RSH amd BPF_ARSH
ALU64 operations with values that may trigger different JIT code paths.
Mainly testing 32-bit JITs that implement ALU64 operations with two
32-bit CPU registers per operand.
Johan Almbladh [Mon, 9 Aug 2021 09:18:20 +0000 (11:18 +0200)]
bpf, tests: Add more ALU32 tests for BPF_LSH/RSH/ARSH
This patch adds more tests of ALU32 shift operations BPF_LSH and BPF_RSH,
including the special case of a zero immediate. Also add corresponding
BPF_ARSH tests which were missing for ALU32.
Johan Almbladh [Mon, 9 Aug 2021 09:18:19 +0000 (11:18 +0200)]
bpf, tests: Add more tests of ALU32 and ALU64 bitwise operations
This patch adds tests of BPF_AND, BPF_OR and BPF_XOR with different
magnitude of the immediate value. Mainly checking 32-bit JIT sub-word
handling and zero/sign extension.
Johan Almbladh [Mon, 9 Aug 2021 09:18:17 +0000 (11:18 +0200)]
bpf, tests: Add BPF_MOV tests for zero and sign extension
Tests for ALU32 and ALU64 MOV with different sizes of the immediate
value. Depending on the immediate field width of the native CPU
instructions, a JIT may generate code differently depending on the
immediate value. Test that zero or sign extension is performed as
expected. Mainly for JIT testing.
Johan Almbladh [Mon, 9 Aug 2021 09:18:16 +0000 (11:18 +0200)]
bpf, tests: Add BPF_JMP32 test cases
An eBPF JIT may implement JMP32 operations in a different way than JMP,
especially on 32-bit architectures. This patch adds a series of tests
for JMP32 operations, mainly for testing JITs.
David S. Miller [Tue, 10 Aug 2021 08:58:15 +0000 (09:58 +0100)]
Merge branch 'ks8795-vlan-fixes'
Ben Hutchings says:
====================
ksz8795 VLAN fixes
This series fixes a number of bugs in the ksz8795 driver that affect
VLAN filtering, tag insertion, and tag removal.
I've tested these on the KSZ8795CLXD evaluation board, and checked the
register usage against the datasheets for the other supported chips.
====================
Ben Hutchings [Mon, 9 Aug 2021 23:00:15 +0000 (01:00 +0200)]
net: dsa: microchip: ksz8795: Don't use phy_port_cnt in VLAN table lookup
The magic number 4 in VLAN table lookup was the number of entries we
can read and write at once. Using phy_port_cnt here doesn't make
sense and presumably broke VLAN filtering for 3-port switches. Change
it back to 4.
Fixes: 4ce2a984abd8 ("net: dsa: microchip: ksz8795: use phy_port_cnt ...") Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Ben Hutchings [Mon, 9 Aug 2021 23:00:06 +0000 (01:00 +0200)]
net: dsa: microchip: ksz8795: Fix VLAN filtering
Currently ksz8_port_vlan_filtering() sets or clears the VLAN Enable
hardware flag. That controls discarding of packets with a VID that
has not been enabled for any port on the switch.
Since it is a global flag, set the dsa_switch::vlan_filtering_is_global
flag so that the DSA core understands this can't be controlled per
port.
When VLAN filtering is enabled, the switch should also discard packets
with a VID that's not enabled on the ingress port. Set or clear each
external port's VLAN Ingress Filter flag in ksz8_port_vlan_filtering()
to make that happen.
Fixes: e66f840c08a2 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Ben Hutchings [Mon, 9 Aug 2021 22:59:57 +0000 (00:59 +0200)]
net: dsa: microchip: ksz8795: Use software untagging on CPU port
On the CPU port, we can support both tagged and untagged VLANs at the
same time by doing any necessary untagging in software rather than
hardware. To enable that, keep the CPU port's Remove Tag flag cleared
and set the dsa_switch::untag_bridge_pvid flag.
Fixes: e66f840c08a2 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Ben Hutchings [Mon, 9 Aug 2021 22:59:47 +0000 (00:59 +0200)]
net: dsa: microchip: ksz8795: Fix VLAN untagged flag change on deletion
When a VLAN is deleted from a port, the flags in struct
switchdev_obj_port_vlan are always 0. ksz8_port_vlan_del() copies the
BRIDGE_VLAN_INFO_UNTAGGED flag to the port's Tag Removal flag, and
therefore always clears it.
In case there are multiple VLANs configured as untagged on this port -
which seems useless, but is allowed - deleting one of them changes the
remaining VLANs to be tagged.
It's only ever necessary to change this flag when a VLAN is added to
the port, so leave it unchanged in ksz8_port_vlan_del().
Fixes: e66f840c08a2 ("net: dsa: ksz: Add Microchip KSZ8795 DSA driver") Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: David S. Miller <[email protected]>