Arun Ramadoss [Tue, 16 Aug 2022 10:55:16 +0000 (16:25 +0530)]
net: dsa: microchip: ksz9477: fix fdb_dump last invalid entry
In the ksz9477_fdb_dump function it reads the ALU control register and
exit from the timeout loop if there is valid entry or search is
complete. After exiting the loop, it reads the alu entry and report to
the user space irrespective of entry is valid. It works till the valid
entry. If the loop exited when search is complete, it reads the alu
table. The table returns all ones and it is reported to user space. So
bridge fdb show gives ff:ff:ff:ff:ff:ff as last entry for every port.
To fix it, after exiting the loop the entry is reported only if it is
valid one.
====================
net: dsa: bcm_sf2: Utilize PHYLINK for all ports
This patch series has the bcm_sf2 driver utilize PHYLINK to configure
the CPU port link parameters to unify the configuration and pave the way
for DSA to utilize PHYLINK for all ports in the future.
Tested on BCM7445 and BCM7278
====================
Florian Fainelli [Mon, 15 Aug 2022 17:50:09 +0000 (10:50 -0700)]
net: dsa: bcm_sf2: Have PHYLINK configure CPU/IMP port(s)
Remove the artificial limitations imposed upon
bcm_sf2_sw_mac_link_{up,down} and allow us to override the link
parameters for IMP port(s) as well as regular ports by accounting for
the special differences that exist there.
Remove the code that did override the link parameters in
bcm_sf2_imp_setup().
Florian Fainelli [Mon, 15 Aug 2022 17:50:08 +0000 (10:50 -0700)]
net: dsa: bcm_sf2: Introduce helper for port override offset
Depending upon the generation of switches, we have different offsets for
configuring a given port's status override where link parameters are
applied. Introduce a helper function that we re-use throughout the code
in order to let phylink callbacks configure the IMP/CPU port(s) in
subsequent changes.
Hao Luo [Tue, 16 Aug 2022 23:40:11 +0000 (16:40 -0700)]
libbpf: Allows disabling auto attach
Adds libbpf APIs for disabling auto-attach for individual functions.
This is motivated by the use case of cgroup iter [1]. Some iter
types require their parameters to be non-zero, therefore applying
auto-attach on them will fail. With these two new APIs, users who
want to use auto-attach and these types of iters can disable
auto-attach on the program and perform manual attach.
ice: Fix VF not able to send tagged traffic with no VLAN filters
VF was not able to send tagged traffic when it didn't
have any VLAN interfaces and VLAN anti-spoofing was enabled.
Fix this by allowing VFs with no VLAN filters to send tagged
traffic. After VF adds a VLAN interface it will be able to
send tagged traffic matching VLAN filters only.
Testing hints:
1. Spawn VF
2. Send tagged packet from a VF
3. The packet should be sent out and not dropped
4. Add a VLAN interface on VF
5. Send tagged packet on that VLAN interface
6. Packet should be sent out and not dropped
7. Send tagged packet with id different than VLAN interface
8. Packet should be dropped
ice: Ignore error message when setting same promiscuous mode
Commit 1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling")
introduced new checks when setting/clearing promiscuous mode. But if the
requested promiscuous mode setting already exists, an -EEXIST error
message would be printed. This is incorrect because promiscuous mode is
either on/off and shouldn't print an error when the requested
configuration is already set.
This can happen when removing a bridge with two bonded interfaces and
promiscuous most isn't fully cleared from VLAN VSI in hardware.
Fix this by ignoring cases where requested promiscuous mode exists.
Grzegorz Siwik [Fri, 12 Aug 2022 13:25:49 +0000 (15:25 +0200)]
ice: Fix clearing of promisc mode with bridge over bond
When at least two interfaces are bonded and a bridge is enabled on the
bond, an error can occur when the bridge is removed and re-added. The
reason for the error is because promiscuous mode was not fully cleared from
the VLAN VSI in the hardware. With this change, promiscuous mode is
properly removed when the bridge disconnects from bonding.
[ 1033.676359] bond1: link status definitely down for interface enp95s0f0, disabling it
[ 1033.676366] bond1: making interface enp175s0f0 the new active one
[ 1033.676369] device enp95s0f0 left promiscuous mode
[ 1033.676522] device enp175s0f0 entered promiscuous mode
[ 1033.676901] ice 0000:af:00.0 enp175s0f0: Error setting Multicast promiscuous mode on VSI 6
[ 1041.795662] ice 0000:af:00.0 enp175s0f0: Error setting Multicast promiscuous mode on VSI 6
[ 1041.944826] bond1: link status definitely down for interface enp175s0f0, disabling it
[ 1041.944874] device enp175s0f0 left promiscuous mode
[ 1041.944918] bond1: now running without any active interface!
Grzegorz Siwik [Fri, 12 Aug 2022 13:25:48 +0000 (15:25 +0200)]
ice: Ignore EEXIST when setting promisc mode
Ignore EEXIST error when setting promiscuous mode.
This fix is needed because the driver could set promiscuous mode
when it still has not cleared properly.
Promiscuous mode could be set only once, so setting it second
time will be rejected.
Grzegorz Siwik [Fri, 12 Aug 2022 13:25:47 +0000 (15:25 +0200)]
ice: Fix double VLAN error when entering promisc mode
Avoid enabling or disabling VLAN 0 when trying to set promiscuous
VLAN mode if double VLAN mode is enabled. This fix is needed
because the driver tries to add the VLAN 0 filter twice (once for
inner and once for outer) when double VLAN mode is enabled. The
filter program is rejected by the firmware when double VLAN is
enabled, because the promiscuous filter only needs to be set once.
This issue was missed in the initial implementation of double VLAN
mode.
Linus Torvalds [Wed, 17 Aug 2022 15:58:54 +0000 (08:58 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Most notably this drops the commits that trip up google cloud (turns
out, any legacy device).
Plus a kerneldoc patch"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio: kerneldocs fixes and enhancements
virtio: Revert "virtio: find_vqs() add arg sizes"
virtio_vdpa: Revert "virtio_vdpa: support the arg sizes of find_vqs()"
virtio_pci: Revert "virtio_pci: support the arg sizes of find_vqs()"
virtio-mmio: Revert "virtio_mmio: support the arg sizes of find_vqs()"
virtio: Revert "virtio: add helper virtio_find_vqs_ctx_size()"
virtio_net: Revert "virtio_net: set the default max ring size by find_vqs()"
Florian Westphal [Tue, 16 Aug 2022 12:15:22 +0000 (14:15 +0200)]
testing: selftests: nft_flowtable.sh: rework test to detect offload failure
This test fails on current kernel releases because the flotwable path
now calls dst_check from packet path and will then remove the offload.
Test script has two purposes:
1. check that file (random content) can be sent to other netns (and vv)
2. check that the flow is offloaded (rather than handled by classic
forwarding path).
Since dst_check is in place, 2) fails because the nftables ruleset in
router namespace 1 intentionally blocks traffic under the assumption
that packets are not passed via classic path at all.
Rework this: Instead of blocking traffic, create two named counters, one
for original and one for reverse direction.
The first three test cases are handled by classic forwarding path
(path mtu discovery is disabled and packets exceed MTU).
But all other tests enable PMTUD, so the originator and responder are
expected to lower packet size and flowtable is expected to do the packet
forwarding.
For those tests, check that the packet counters (which are only
incremented for packets that are passed up to classic forward path)
are significantly lower than the file size transferred.
I've tested that the counter-checks fail as expected when the 'flow add'
statement is removed from the ruleset.
M Chetan Kumar [Tue, 16 Aug 2022 04:24:05 +0000 (09:54 +0530)]
net: wwan: t7xx: Enable devlink based fw flashing and coredump collection
This patch brings-in support for t7xx wwan device firmware flashing &
coredump collection using devlink.
Driver Registers with Devlink framework.
Implements devlink ops flash_update callback that programs modem firmware.
Creates region & snapshot required for device coredump log collection.
On early detection of wwan device in fastboot mode driver sets up CLDMA0 HW
tx/rx queues for raw data transfer then registers with devlink framework.
Upon receiving firmware image & partition details driver sends fastboot
commands for flashing the firmware.
In this flow the fastboot command & response gets exchanged between driver
and device. Once firmware flashing is success completion status is reported
to user space application.
Below is the devlink command usage for firmware flashing
$devlink dev flash pci/$BDF file ABC.img component ABC
Note: ABC.img is the firmware to be programmed to "ABC" partition.
In case of coredump collection when wwan device encounters an exception
it reboots & stays in fastboot mode for coredump collection by host driver.
On detecting exception state driver collects the core dump, creates the
devlink region & reports an event to user space application for dump
collection. The user space application invokes devlink region read command
for dump collection.
Below are the devlink commands used for coredump collection.
devlink region new pci/$BDF/mr_dump
devlink region read pci/$BDF/mr_dump snapshot $ID address $ADD length $LEN
devlink region del pci/$BDF/mr_dump snapshot $ID
Haijun Liu [Tue, 16 Aug 2022 04:23:53 +0000 (09:53 +0530)]
net: wwan: t7xx: PCIe reset rescan
PCI rescan module implements "rescan work queue". In firmware flashing
or coredump collection procedure WWAN device is programmed to boot in
fastboot mode and a work item is scheduled for removal & detection.
The WWAN device is reset using APCI call as part driver removal flow.
Work queue rescans pci bus at fixed interval for device detection,
later when device is detect work queue exits.
Haijun Liu [Tue, 16 Aug 2022 04:23:40 +0000 (09:53 +0530)]
net: wwan: t7xx: Infrastructure for early port configuration
To support cases such as FW update or Core dump, the t7xx device
is capable of signaling the host that a special port needs
to be created before the handshake phase.
This patch adds the infrastructure required to create the
early ports which also requires a different configuration of
CLDMA queues.
Haijun Liu [Tue, 16 Aug 2022 04:23:28 +0000 (09:53 +0530)]
net: wwan: t7xx: Add AP CLDMA
The t7xx device contains two Cross Layer DMA (CLDMA) interfaces to
communicate with AP and Modem processors respectively. So far only
MD-CLDMA was being used, this patch enables AP-CLDMA.
Florian Fainelli [Mon, 15 Aug 2022 19:07:47 +0000 (12:07 -0700)]
net: phy: broadcom: Implement suspend/resume for AC131 and BCM5241
Implement the suspend/resume procedure for the Broadcom AC131 and BCM5241 type
of PHYs (10/100 only) by entering the standard power down followed by the
proprietary standby mode in the auxiliary mode 4 shadow register. On resume,
the PHY software reset is enough to make it come out of standby mode so we can
utilize brcm_fet_config_init() as the resume hook.
Jakub Kicinski [Tue, 16 Aug 2022 00:23:58 +0000 (17:23 -0700)]
tls: rx: react to strparser initialization errors
Even though the normal strparser's init function has a return
value we got away with ignoring errors until now, as it only
validates the parameters and we were passing correct parameters.
tls_strp can fail to init on memory allocation errors, which
syzbot duly induced and reported.
David S. Miller [Wed, 17 Aug 2022 09:20:45 +0000 (10:20 +0100)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex
t-queue
Tony Nguyen says:
====================
ice: detect and report PTP timestamp issues
Jacob Keller says:
This series fixes a few small issues with the cached PTP Hardware Clock
timestamp used for timestamp extension. It also introduces extra checks to
help detect issues with this logic, such as if the cached timestamp is not
updated within the 2 second window.
This introduces a few statistics similar to the ones already available in
other Intel drivers, including tx_hwtstamp_skipped and tx_hwtstamp_timeouts.
It is intended to aid in debugging issues we're seeing with some setups
which might be related to incorrect cached timestamp values.
====================
Florian Westphal [Tue, 16 Aug 2022 12:15:21 +0000 (14:15 +0200)]
testing: selftests: nft_flowtable.sh: use random netns names
"ns1" is a too generic name, use a random suffix to avoid
errors when such a netns exists. Also allows to run multiple
instances of the script in parallel.
Linus Torvalds [Tue, 16 Aug 2022 18:42:11 +0000 (11:42 -0700)]
Merge tag 'nios2_fixes_v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux
Pull NIOS2 fixes from Dinh Nguyen:
- Security fixes from Al Viro
* tag 'nios2_fixes_v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
nios2: add force_successful_syscall_return()
nios2: restarts apply only to the first sigframe we build...
nios2: fix syscall restart checks
nios2: traced syscall does need to check the syscall number
nios2: don't leave NULLs in sys_call_table[]
nios2: page fault et.al. are *not* restartable syscalls...
Linus Torvalds [Tue, 16 Aug 2022 18:40:15 +0000 (11:40 -0700)]
Merge tag 'spi-fix-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few fixes that came in since my pull request, the Meson fix is a
little large since it's fixing all possible cases of the problem that
was observed with the driver and clock API trying to share
configuration by integrating the device clocking fully with the clock
API rather than spot fixing the one instance that was observed"
* tag 'spi-fix-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: dt-bindings: Drop Pratyush Yadav
spi: meson-spicc: add local pow2 clock ops to preserve rate between messages
MAINTAINERS: rectify entry for ARM/HPE GXP ARCHITECTURE
spi: spi.c: Add missing __percpu annotations in users of spi_statistics
Linus Torvalds [Tue, 16 Aug 2022 18:36:38 +0000 (11:36 -0700)]
Merge tag 'regulator-fix-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of small fixes that came in since my pull request, nothing
major here"
* tag 'regulator-fix-v6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Fix missing error return from regulator_bulk_get()
regulator: pca9450: Remove restrictions for regulator-name
The exception for the "unaligned access at the end of the page, next
page not mapped" never happens, but the fixup code ends up causing
trouble for compilers to optimize well.
clang in particular ends up seeing it being in the middle of a loop, and
tries desperately to optimize the exception fixup code that is never
really reached.
The simple solution is to just move all the fixups into the exception
handler itself, which moves it all out of the hot case code, and means
that the compiler never sees it or needs to worry about it.
Hector Martin [Tue, 16 Aug 2022 07:03:11 +0000 (16:03 +0900)]
locking/atomic: Make test_and_*_bit() ordered on failure
These operations are documented as always ordered in
include/asm-generic/bitops/instrumented-atomic.h, and producer-consumer
type use cases where one side needs to ensure a flag is left pending
after some shared data was updated rely on this ordering, even in the
failure case.
This is the case with the workqueue code, which currently suffers from a
reproducible ordering violation on Apple M1 platforms (which are
notoriously out-of-order) that ends up causing the TTY layer to fail to
deliver data to userspace properly under the right conditions. This
change fixes that bug.
Change the documentation to restrict the "no order on failure" story to
the _lock() variant (for which it makes sense), and remove the
early-exit from the generic implementation, which is what causes the
missing barrier semantics in that case. Without this, the remaining
atomic op is fully ordered (including on ARM64 LSE, as of recent
versions of the architecture spec).
Suggested-by: Linus Torvalds <[email protected]> Cc: [email protected] Fixes: e986a0d6cb36 ("locking/atomics, asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIs") Fixes: 61e02392d3c7 ("locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()") Signed-off-by: Hector Martin <[email protected]> Acked-by: Will Deacon <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Jacob Keller [Wed, 27 Jul 2022 23:16:02 +0000 (16:16 -0700)]
ice: introduce ice_ptp_reset_cached_phctime function
If the PTP hardware clock is adjusted, the ice driver must update the
cached PHC timestamp. This is required in order to perform timestamp
extension on the shorter timestamps captured by the PHY.
Currently, we simply call ice_ptp_update_cached_phctime in the settime and
adjtime callbacks. This has a few issues:
1) if ICE_CFG_BUSY is set because another thread is updating the Rx rings,
we will exit with an error. This is not checked, and the functions do
not re-schedule the update. This could leave the cached timestamp
incorrect until the next scheduled work item execution.
2) even if we did handle an update, any currently outstanding Tx timestamp
would be extended using the wrong cached PHC time. This would produce
incorrect results.
To fix these issues, introduce a new ice_ptp_reset_cached_phctime function.
This function calls the ice_ptp_update_cached_phctime, and discards
outstanding Tx timestamps.
If the ice_ptp_update_cached_phctime function fails because ICE_CFG_BUSY is
set, we log a warning and schedule the thread to execute soon. The update
function is modified so that it always updates the cached copy in the PF
regardless. This ensures we have the most up to date values possible and
minimizes the risk of a packet timestamp being extended with the wrong
value.
It would be nice if we could skip reporting Rx timestamps until the cached
values are up to date. However, we can't access the Rx rings while
ICE_CFG_BUSY is set because they are actively being updated by another
thread.
Jacob Keller [Wed, 27 Jul 2022 23:16:01 +0000 (16:16 -0700)]
ice: re-arrange some static functions in ice_ptp.c
A following change is going to want to make use of ice_ptp_flush_tx_tracker
earlier in the ice_ptp.c file. To make this work, move the Tx timestamp
tracking functions higher up in the file, and pull the
ice_ptp_update_cached_timestamp function below them. This should have no
functional change.
Jacob Keller [Wed, 27 Jul 2022 23:16:00 +0000 (16:16 -0700)]
ice: track and warn when PHC update is late
The ice driver requires a cached copy of the PHC time in order to perform
timestamp extension on Tx and Rx hardware timestamp values. This cached PHC
time must always be updated at least once every 2 seconds. Otherwise, the
math used to perform the extension would produce invalid results.
The updates are supposed to occur periodically in the PTP kthread work
item, which is scheduled to run every half second. Thus, we do not expect
an update to be delayed for so long. However, there are error conditions
which can cause the update to be delayed.
Track this situation by using jiffies to determine approximately how long
ago the last update occurred. Add a new statistic and a dev_warn when we
have failed to update the cached PHC time. This makes the error case more
obvious.
Jacob Keller [Wed, 27 Jul 2022 23:15:59 +0000 (16:15 -0700)]
ice: track Tx timestamp stats similar to other Intel drivers
Several Intel networking drivers which support PTP track when Tx timestamps
are skipped or when they timeout without a timestamp from hardware. The
conditions which could cause these events are rare, but it can be useful to
know when and how often they occur.
Implement similar statistics for the ice driver, tx_hwtstamp_skipped,
tx_hwtstamp_timeouts, and tx_hwtstamp_flushed.
Jacob Keller [Wed, 27 Jul 2022 23:15:58 +0000 (16:15 -0700)]
ice: initialize cached_phctime when creating Rx rings
When we create new Rx rings, the cached_phctime field is zero initialized.
This could result in incorrect timestamp reporting due to the cached value
not yet being updated. Although a background task will periodically update
the cached value, ensure it matches the existing cached value in the PF
structure at ring initialization.
Jacob Keller [Wed, 27 Jul 2022 23:15:57 +0000 (16:15 -0700)]
ice: set tx_tstamps when creating new Tx rings via ethtool
When the user changes the number of queues via ethtool, the driver
allocates new rings. This allocation did not initialize tx_tstamps. This
results in the tx_tstamps field being zero (due to kcalloc allocation), and
would result in a NULL pointer dereference when attempting a transmit
timestamp on the new ring.
Alan Brady [Tue, 2 Aug 2022 08:19:17 +0000 (10:19 +0200)]
i40e: Fix to stop tx_timeout recovery if GLOBR fails
When a tx_timeout fires, the PF attempts to recover by incrementally
resetting. First we try a PFR, then CORER and finally a GLOBR. If the
GLOBR fails, then we keep hitting the tx_timeout and incrementing the
recovery level and issuing dmesgs, which is both annoying to the user
and accomplishes nothing.
If the GLOBR fails, then we're pretty much totally hosed, and there's
not much else we can do to recover, so this makes it such that we just
kill the VSI and stop hitting the tx_timeout in such a case.
i40e: Fix tunnel checksum offload with fragmented traffic
Fix checksum offload on VXLAN tunnels.
In case, when mpls protocol is not used, set l4 header to transport
header of skb. This fixes case, when user tries to offload checksums
of VXLAN tunneled traffic.
Steps for reproduction (requires link partner with tunnels):
ip l s enp130s0f0 up
ip a f enp130s0f0
ip a a 10.10.110.2/24 dev enp130s0f0
ip l s enp130s0f0 mtu 1600
ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev \
enp130s0f0 dstport 4789
ip l s vxlan12_sut up
ip a a 20.10.110.2/24 dev vxlan12_sut
iperf3 -c 20.10.110.1 #should connect
Without this patch, TX descriptor was using wrong data, due to
l4 header pointing wrong address. NIC would then drop those packets
internally, due to incorrect TX descriptor data, which increased
GLV_TEPC register.
virtio_pci: Revert "virtio_pci: support the arg sizes of find_vqs()"
This reverts commit cdb44806fca2d0ad29ca644cbf1505433902ee0c: the legacy
path is wrong and in fact can not support the proposed API since for a
legacy device we never communicate the vq size to the hypervisor.
This has been reported to trip up guests on GCP (Google Cloud).
The reason is that virtio_find_vqs_ctx_size is broken on legacy
devices. We can in theory fix virtio_find_vqs_ctx_size but
in fact the patch itself has several other issues:
- It treats unknown speed as < 10G
- It leaves userspace no way to find out the ring size set by hypervisor
- It tests speed when link is down
- It ignores the virtio spec advice:
Both \field{speed} and \field{duplex} can change, thus the driver
is expected to re-read these values after receiving a
configuration change notification.
- It is not clear the performance impact has been tested properly
Jakub Kicinski [Tue, 16 Aug 2022 03:14:39 +0000 (20:14 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-08-12 (iavf)
This series contains updates to iavf driver only.
Przemyslaw frees memory for admin queues in initialization error paths,
prevents freeing of vf_res which is causing null pointer dereference,
and adjusts calls in error path of reset to avoid iavf_close() which
could cause deadlock.
Ivan Vecera avoids deadlock that can occur when driver if part of
failover.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: Fix deadlock in initialization
iavf: Fix reset error handling
iavf: Fix NULL pointer dereference in iavf_get_link_ksettings
iavf: Fix adminq error handling
====================
Zhengchao Shao [Mon, 15 Aug 2022 02:46:29 +0000 (10:46 +0800)]
net: rtnetlink: fix module reference count leak issue in rtnetlink_rcv_msg
When bulk delete command is received in the rtnetlink_rcv_msg function,
if bulk delete is not supported, module_put is not called to release
the reference counting. As a result, module reference count is leaked.
Hangbin Liu [Sat, 13 Aug 2022 00:09:36 +0000 (08:09 +0800)]
libbpf: Making bpf_prog_load() ignore name if kernel doesn't support
Similar with commit 10b62d6a38f7 ("libbpf: Add names for auxiliary maps"),
let's make bpf_prog_load() also ignore name if kernel doesn't support
program name.
To achieve this, we need to call sys_bpf_prog_load() directly in
probe_kern_prog_name() to avoid circular dependency. sys_bpf_prog_load()
also need to be exported in the libbpf_internal.h file.
Daniel Xu [Thu, 11 Aug 2022 21:55:25 +0000 (15:55 -0600)]
selftests/bpf: Add existing connection bpf_*_ct_lookup() test
Add a test where we do a conntrack lookup on an existing connection.
This is nice because it's a more realistic test than artifically
creating a ct entry and looking it up afterwards.
Quentin Monnet [Mon, 15 Aug 2022 16:22:05 +0000 (17:22 +0100)]
bpftool: Clear errno after libcap's checks
When bpftool is linked against libcap, the library runs a "constructor"
function to compute the number of capabilities of the running kernel
[0], at the beginning of the execution of the program. As part of this,
it performs multiple calls to prctl(). Some of these may fail, and set
errno to a non-zero value:
# strace -e prctl ./bpftool version
prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 1
prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 1
prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
** fprintf added at the top of main(): we have errno == 1
./bpftool v7.0.0
using libbpf v1.0
features: libbfd, libbpf_strict, skeletons
+++ exited with 0 +++
This has been addressed in libcap 2.63 [1], but until this version is
available everywhere, we can fix it on bpftool side.
Let's clean errno at the beginning of the main() function, to make sure
that these checks do not interfere with the batch mode, where we error
out if errno is set after a bpftool command.
selftests/landlock: fix broken include of linux/landlock.h
Revert part of the earlier changes to fix the kselftest build when
using a sub-directory from the top of the tree as this broke the
landlock test build as a side-effect when building with "make -C
tools/testing/selftests/landlock".
netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified
Since f3a2181e16f1 ("netfilter: nf_tables: Support for sets with
multiple ranged fields"), it possible to combine intervals and
concatenations. Later on, ef516e8625dd ("netfilter: nf_tables:
reintroduce the NFT_SET_CONCAT flag") provides the NFT_SET_CONCAT flag
for userspace to report that the set stores a concatenation.
Make sure NFT_SET_CONCAT is set on if field_count is specified for
consistency. Otherwise, if NFT_SET_CONCAT is specified with no
field_count, bail out with EINVAL.
Fixes: ef516e8625dd ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag") Signed-off-by: Pablo Neira Ayuso <[email protected]>
Al Viro [Mon, 8 Aug 2022 15:09:45 +0000 (16:09 +0100)]
nios2: add force_successful_syscall_return()
If we use the ancient SysV syscall ABI, we'd better have tell the
kernel how to claim that a negative return value is a success.
Use ->orig_r2 for that - it's inaccessible via ptrace, so it's
a fair game for changes and it's normally[*] non-negative on return
from syscall. Set to -1; syscall is not going to be restart-worthy
by definition, so we won't interfere with that use either.
[*] the only exception is rt_sigreturn(), where we skip the entire
messing with r1/r2 anyway.
netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags
If the NFT_SET_CONCAT|NFT_SET_INTERVAL flags are set on, then the
netlink attribute NFTA_SET_ELEM_KEY_END must be specified. Otherwise,
NFTA_SET_ELEM_KEY_END should not be present.
For catch-all element, NFTA_SET_ELEM_KEY_END should not be present.
The NFT_SET_ELEM_INTERVAL_END is never used with this set flags
combination.
Quentin Monnet [Fri, 12 Aug 2022 15:37:27 +0000 (16:37 +0100)]
bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation
Adding or removing room space _below_ layers 2 or 3, as the description
mentions, is ambiguous. This was written with a mental image of the
packet with layer 2 at the top, layer 3 under it, and so on. But it has
led users to believe that it was on lower layers (before the beginning
of the L2 and L3 headers respectively).
Let's make it more explicit, and specify between which layers the room
space is adjusted.
David S. Miller [Mon, 15 Aug 2022 10:49:58 +0000 (11:49 +0100)]
Merge branch 'mlxsw-fixes'
Petr Machata says:
====================
mlxsw: Fixes for PTP support
This set fixes several issues in mlxsw PTP code.
- Patch #1 fixes compilation warnings.
- Patch #2 adjusts the order of operation during cleanup, thereby
closing the window after PTP state was already cleaned in the ASIC
for the given port, but before the port is removed, when the user
could still in theory make changes to the configuration.
- Patch #3 protects the PTP configuration with a custom mutex, instead
of relying on RTNL, which is not held in all access paths.
- Patch #4 forbids enablement of PTP only in RX or only in TX. The
driver implicitly assumed this would be the case, but neglected to
sanitize the configuration.
====================
Amit Cohen [Fri, 12 Aug 2022 15:32:03 +0000 (17:32 +0200)]
mlxsw: spectrum_ptp: Forbid PTP enablement only in RX or in TX
Currently mlxsw driver configures one global PTP configuration for all
ports. The reason is that the switch behaves like a transparent clock
between CPU port and front-panel ports. When time stamp is enabled in
any port, the hardware is configured to update the correction field. The
fact that the configuration of CPU port affects all the ports, makes the
correction field update to be global for all ports. Otherwise, user will
see odd values in the correction field, as the switch will update the
correction field in the CPU port, but not in all the front-panel ports.
The CPU port is relevant in both RX and TX, so to avoid problematic
configuration, forbid PTP enablement only in one direction, i.e., only in
RX or TX.
Without the change:
$ hwstamp_ctl -i swp1 -r 12 -t 0
current settings:
tx_type 0
rx_filter 0
new settings:
tx_type 0
rx_filter 2
$ echo $?
0
With the change:
$ hwstamp_ctl -i swp1 -r 12 -t 0
current settings:
tx_type 1
rx_filter 2
SIOCSHWTSTAMP failed: Invalid argument
Fixes: 08ef8bc825d96 ("mlxsw: spectrum_ptp: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Signed-off-by: Amit Cohen <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Amit Cohen [Fri, 12 Aug 2022 15:32:02 +0000 (17:32 +0200)]
mlxsw: spectrum_ptp: Protect PTP configuration with a mutex
Currently the functions mlxsw_sp2_ptp_{configure, deconfigure}_port()
assume that they are called when RTNL is locked and they warn otherwise.
The deconfigure function can be called when port is removed, for example
as part of device reload, then there is no locked RTNL and the function
warns [1].
To avoid such case, do not assume that RTNL protects this code, add a
dedicated mutex instead. The mutex protects 'ptp_state->config' which
stores the existing global configuration in hardware. Use this mutex also
to protect the code which configures the hardware. Then, there will be
only one configuration in any time, which will be updated in 'ptp_state'
and a race will be avoided.
Amit Cohen [Fri, 12 Aug 2022 15:32:01 +0000 (17:32 +0200)]
mlxsw: spectrum: Clear PTP configuration after unregistering the netdevice
Currently as part of removing port, PTP API is called to clear the
existing configuration and set the 'rx_filter' and 'tx_type' to zero.
The clearing is done before unregistering the netdevice, which means that
there is a window of time in which the user can reconfigure PTP in the
port, and this configuration will not be cleared.
Reorder the operations, clear PTP configuration after unregistering the
netdevice.
Fixes: 8748642751ede ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Signed-off-by: Amit Cohen <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Amit Cohen [Fri, 12 Aug 2022 15:32:00 +0000 (17:32 +0200)]
mlxsw: spectrum_ptp: Fix compilation warnings
In case that 'CONFIG_PTP_1588_CLOCK' is not enabled in the config file,
there are implementations for the functions
mlxsw_{sp,sp2}_ptp_txhdr_construct() as part of 'spectrum_ptp.h'. In this
case, they should be defined as 'static' as they are not supposed to be
used out of this file. Make the functions 'static', otherwise the following
warnings are returned:
"warning: no previous prototype for 'mlxsw_sp_ptp_txhdr_construct'"
"warning: no previous prototype for 'mlxsw_sp2_ptp_txhdr_construct'"
In addition, make the functions 'inline' for case that 'spectrum_ptp.h'
will be included anywhere else and the functions would probably not be
used, so compilation warnings about unused static will be returned.
handle of 0 implies from/to of universe realm which is not very
sensible.
Lets see what this patch will do:
$sudo tc qdisc add dev $DEV root handle 1:0 prio
//lets manufacture a way to insert handle of 0
$sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 \
route to 0 from 0 classid 1:10 action ok
//gets rejected...
Error: handle of 0 is not valid.
We have an error talking to the kernel, -1
//lets create a legit entry..
sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 route from 10 \
classid 1:10 action ok
//what did the kernel insert?
$sudo tc filter ls dev $DEV parent 1:0
filter protocol ip pref 100 route chain 0
filter protocol ip pref 100 route chain 0 fh 0x000a8000 flowid 1:10 from 10
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
//Lets try to replace that legit entry with a handle of 0
$ sudo tc filter replace dev $DEV parent 1:0 protocol ip prio 100 \
handle 0x000a8000 route to 0 from 0 classid 1:10 action drop
Error: Replacing with handle of 0 is invalid.
We have an error talking to the kernel, -1
And last, lets run Cascardo's POC:
$ ./poc
0
0
-22
-22
-22
Xin Xiong [Sat, 13 Aug 2022 12:49:08 +0000 (20:49 +0800)]
net: fix potential refcount leak in ndisc_router_discovery()
The issue happens on specific paths in the function. After both the
object `rt` and `neigh` are grabbed successfully, when `lifetime` is
nonzero but the metric needs change, the function just deletes the
route and set `rt` to NULL. Then, it may try grabbing `rt` and `neigh`
again if above conditions hold. The function simply overwrite `neigh`
if succeeds or returns if fails, without decreasing the reference
count of previous `neigh`. This may result in memory leaks.
Fix it by decrementing the reference count of `neigh` in place.
Fixes: 6b2e04bc240f ("net: allow user to set metric on default route learned via Router Advertisement") Signed-off-by: Xin Xiong <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Right now we have a neigh_param PROXY_QLEN which specifies maximum length
of neigh_table->proxy_queue. But in fact, this limitation doesn't work well
because check condition looks like:
tbl->proxy_queue.qlen > NEIGH_VAR(p, PROXY_QLEN)
The problem is that p (struct neigh_parms) is a per-device thing,
but tbl (struct neigh_table) is a system-wide global thing.
It seems reasonable to make proxy_queue limit per-device based.
v2:
- nothing changed in this patch
v3:
- rebase to net tree
Denis V. Lunev [Thu, 11 Aug 2022 15:20:11 +0000 (18:20 +0300)]
neigh: fix possible DoS due to net iface start/stop loop
Normal processing of ARP request (usually this is Ethernet broadcast
packet) coming to the host is looking like the following:
* the packet comes to arp_process() call and is passed through routing
procedure
* the request is put into the queue using pneigh_enqueue() if
corresponding ARP record is not local (common case for container
records on the host)
* the request is processed by timer (within 80 jiffies by default) and
ARP reply is sent from the same arp_process() using
NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside
pneigh_enqueue())
And here the problem comes. Linux kernel calls pneigh_queue_purge()
which destroys the whole queue of ARP requests on ANY network interface
start/stop event through __neigh_ifdown().
This is actually not a problem within the original world as network
interface start/stop was accessible to the host 'root' only, which
could do more destructive things. But the world is changed and there
are Linux containers available. Here container 'root' has an access
to this API and could be considered as untrusted user in the hosting
(container's) world.
Thus there is an attack vector to other containers on node when
container's root will endlessly start/stop interfaces. We have observed
similar situation on a real production node when docker container was
doing such activity and thus other containers on the node become not
accessible.
The patch proposed doing very simple thing. It drops only packets from
the same namespace in the pneigh_queue_purge() where network interface
state change is detected. This is enough to prevent the problem for the
whole node preserving original semantics of the code.
v2:
- do del_timer_sync() if queue is empty after pneigh_queue_purge()
v3:
- rebase to net tree
Maxim Kochetkov [Thu, 11 Aug 2022 09:48:40 +0000 (12:48 +0300)]
net: qrtr: start MHI channel after endpoit creation
MHI channel may generates event/interrupt right after enabling.
It may leads to 2 race conditions issues.
1)
Such event may be dropped by qcom_mhi_qrtr_dl_callback() at check:
if (!qdev || mhi_res->transaction_status)
return;
Because dev_set_drvdata(&mhi_dev->dev, qdev) may be not performed at
this moment. In this situation qrtr-ns will be unable to enumerate
services in device.
---------------------------------------------------------------
2)
Such event may come at the moment after dev_set_drvdata() and
before qrtr_endpoint_register(). In this case kernel will panic with
accessing wrong pointer at qcom_mhi_qrtr_dl_callback():
Because endpoint is not created yet.
--------------------------------------------------------------
So move mhi_prepare_for_transfer_autoqueue after endpoint creation
to fix it.
Yury Norov [Fri, 12 Aug 2022 05:34:25 +0000 (22:34 -0700)]
radix-tree: replace gfp.h inclusion with gfp_types.h
Radix tree header includes gfp.h for __GFP_BITS_SHIFT only. Now we
have gfp_types.h for this.
Fixes powerpc allmodconfig build:
In file included from include/linux/nodemask.h:97,
from include/linux/mmzone.h:17,
from include/linux/gfp.h:7,
from include/linux/radix-tree.h:12,
from include/linux/idr.h:15,
from include/linux/kernfs.h:12,
from include/linux/sysfs.h:16,
from include/linux/kobject.h:20,
from include/linux/pci.h:35,
from arch/powerpc/kernel/prom_init.c:24:
include/linux/random.h: In function 'add_latent_entropy':
>> include/linux/random.h:25:46: error: 'latent_entropy' undeclared (first use in this function); did you mean 'add_latent_entropy'?
25 | add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy));
| ^~~~~~~~~~~~~~
| add_latent_entropy
include/linux/random.h:25:46: note: each undeclared identifier is reported only once for each function it appears in
Linus Torvalds [Sun, 14 Aug 2022 20:03:53 +0000 (13:03 -0700)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs lseek fix from Al Viro:
"Fix proc_reg_llseek() breakage. Always had been possible if somebody
left NULL ->proc_lseek, became a practical issue now"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
take care to handle NULL ->proc_lseek()
Linus Torvalds [Sun, 14 Aug 2022 16:22:11 +0000 (09:22 -0700)]
Merge tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tool updates from Arnaldo Carvalho de Melo:
- 'perf c2c' now supports ARM64, adjust its output to cope with
differences with what is in x86_64. Now go find false sharing on
ARM64 (at least Neoverse) as well!
- Refactor the JSON processing, making the output more compact and thus
reducing the size of the resulting perf binary
- Improvements for 'perf offcpu' profiling, including tracking child
processes
- Update Intel JSON metrics and events files for broadwellde,
broadwellx, cascadelakex, haswellx, icelakex, ivytown, jaketown,
knightslanding, sapphirerapids, skylakex and snowridgex
- Add 'perf stat' JSON output and a 'perf test' entry for it
- Ignore memfd and anonymous mmap events if jitdump present
- Fix an error handling path in 'parse_perf_probe_command()'
- Fixes for the guest Intel PT tracing patchkit in the 1st batch of
this merge window
- Print debuginfod queries if -v option is used, to explain delays in
processing when debuginfo servers are enabled to fetch DSOs with
richer symbol tables
- Improve error message for 'perf record -p not_existing_pid'
- Fix openssl and libbpf feature detection
- Add PMU pai_crypto event description for IBM z16 on 'perf list'
- Fix typos and duplicated words on comments in various places
* tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (81 commits)
perf test: Refactor shell tests allowing subdirs
perf vendor events: Update events for snowridgex
perf vendor events: Update events and metrics for skylakex
perf vendor events: Update metrics for sapphirerapids
perf vendor events: Update events for knightslanding
perf vendor events: Update metrics for jaketown
perf vendor events: Update metrics for ivytown
perf vendor events: Update events and metrics for icelakex
perf vendor events: Update events and metrics for haswellx
perf vendor events: Update events and metrics for cascadelakex
perf vendor events: Update events and metrics for broadwellx
perf vendor events: Update metrics for broadwellde
perf jevents: Fold strings optimization
perf jevents: Compress the pmu_events_table
perf metrics: Copy entire pmu_event in find metric
perf pmu-events: Hide the pmu_events
perf pmu-events: Don't assume pmu_event is an array
perf pmu-events: Move test events/metrics to JSON
perf test: Use full metric resolution
perf pmu-events: Hide pmu_events_map
...
Linus Torvalds [Sun, 14 Aug 2022 15:48:13 +0000 (08:48 -0700)]
Merge tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Ensure we never emit lwarx with EH=1 on 32-bit, because some 32-bit
CPUs trap on it rather than ignoring it as they should.
- Fix ftrace when building with clang, which was broken by some
refactoring.
- A couple of other minor fixes.
Thanks to Christophe Leroy, Naveen N. Rao, Nick Desaulniers, Ondrej
Mosnacek, Pali Rohár, Russell Currey, and Segher Boessenkool.
* tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/kexec: Fix build failure from uninitialised variable
powerpc/ppc-opcode: Fix PPC_RAW_TW()
powerpc64/ftrace: Fix ftrace for clang builds
powerpc: Make eh value more explicit when using lwarx
powerpc: Don't hide eh field of lwarx behind a macro
powerpc: Fix eh field when calling lwarx on PPC32
Linus Torvalds [Sun, 14 Aug 2022 00:31:18 +0000 (17:31 -0700)]
Merge tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more cifs updates from Steve French:
- two fixes for stable, one for a lock length miscalculation, and
another fixes a lease break timeout bug
- improvement to handle leases, allows the close timeout to be
configured more safely
- five restructuring/cleanup patches
* tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Do not access tcon->cfids->cfid directly from is_path_accessible
cifs: Add constructor/destructors for tcon->cfid
SMB3: fix lease break timeout when multiple deferred close handles for the same file.
smb3: allow deferred close timeout to be configurable
cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir
cifs: Move cached-dir functions into a separate file
cifs: Remove {cifs,nfs}_fscache_release_page()
cifs: fix lock length calculation
David Howells [Wed, 10 Aug 2022 17:52:47 +0000 (18:52 +0100)]
afs: Enable multipage folio support
Enable multipage folio support for the afs filesystem.
Support has already been implemented in netfslib, fscache and cachefiles
and in most of afs, but I've waited for Matthew Wilcox's latest folio
changes.
Note that it does require a change to afs_write_begin() to return the
correct subpage. This is a "temporary" change as we're working on
getting rid of the need for ->write_begin() and ->write_end()
completely, at least as far as network filesystems are concerned - but
it doesn't prevent afs from making use of the capability.
Linus Torvalds [Sat, 13 Aug 2022 21:38:22 +0000 (14:38 -0700)]
Merge tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Misc timer fixes:
- fix a potential use-after-free bug in posix timers
- correct a prototype
- address a build warning"
* tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Cleanup CPU timers before freeing them during exec
time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64
posix-timers: Make do_clock_gettime() static
Linus Torvalds [Sat, 13 Aug 2022 21:24:12 +0000 (14:24 -0700)]
Merge tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"Fix the 'IBPB mitigated RETBleed' mode of operation on AMD CPUs (not
turned on by default), which also need STIBP enabled (if available) to
be '100% safe' on even the shortest speculation windows"
* tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Enable STIBP for IBPB mitigated RETBleed
Linus Torvalds [Sat, 13 Aug 2022 21:00:45 +0000 (14:00 -0700)]
Merge tag 'ntb-5.20' of https://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
"Non-Transparent Bridge updates.
Fix of heap data and clang warnings, support for a new Intel NTB
device, and NTB EndPoint Function (EPF) support and the various fixes
for that"
* tag 'ntb-5.20' of https://github.com/jonmason/ntb:
MAINTAINERS: add PCI Endpoint NTB drivers to NTB files
NTB: EPF: Tidy up some bounds checks
NTB: EPF: Fix error code in epf_ntb_bind()
PCI: endpoint: pci-epf-vntb: reduce several globals to statics
PCI: endpoint: pci-epf-vntb: fix error handle in epf_ntb_mw_bar_init()
PCI: endpoint: Fix Kconfig dependency
NTB: EPF: set pointer addr to null using NULL rather than 0
Documentation: PCI: extend subheading underline for "lspci output" section
Documentation: PCI: Use code-block block for scratchpad registers diagram
Documentation: PCI: Add specification for the PCI vNTB function device
PCI: endpoint: Support NTB transfer between RC and EP
NTB: epf: Allow more flexibility in the memory BAR map method
PCI: designware-ep: Allow pci_epc_set_bar() update inbound map address
ntb: intel: add GNR support for Intel PCIe gen5 NTB
NTB: ntb_tool: uninitialized heap data in tool_fn_write()
ntb: idt: fix clang -Wformat warnings
Linus Torvalds [Sat, 13 Aug 2022 20:50:11 +0000 (13:50 -0700)]
Merge tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull more xfs updates from Darrick Wong:
"There's not a lot this time around, just the usual bug fixes and
corrections for missing error returns.
- Return error codes from block device flushes to userspace
- Fix a deadlock between reclaim and mount time quotacheck
- Fix an unnecessary ENOSPC return when doing COW on a filesystem
with severe free space fragmentation
- Fix a miscalculation in the transaction reservation computations
for file removal operations"
* tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix inode reservation space for removing transaction
xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork
xfs: fix intermittent hang during quotacheck
xfs: check return codes when flushing block devices