Adding filters with the same values inside for VXLAN and Geneve causes HW
error, because it looks exactly the same. To choose between different
type of tunnels new recipe is needed. Add storing tunnel types in
creating recipes function and start checking it in finding function.
Change getting open tunnels function to return port on correct tunnel
type. This is needed to copy correct port to dummy packet.
Block user from adding enc_dst_port via tc flower, because VXLAN and
Geneve filters can be created only with destination port which was
previously opened.
Fixes: 8b032a55c1bd5 ("ice: low level support for tunnels") Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
In tunnels packet there can be two UDP headers:
- outer which for hw should be mark as ICE_UDP_OF
- inner which for hw should be mark as ICE_UDP_ILOS or as ICE_TCP_IL if
inner header is of TCP type
In none tunnels packet header can be:
- UDP, which for hw should be mark as ICE_UDP_ILOS
- TCP, which for hw should be mark as ICE_TCP_IL
Change incorrect ICE_UDP_OF for none tunnel packets to ICE_UDP_ILOS.
ICE_UDP_OF is incorrect for none tunnel packets and setting it leads to
error from hw while adding this kind of recipe.
In summary, for tunnel outer port type should always be set to
ICE_UDP_OF, for none tunnel outer and tunnel inner it should always be
set to ICE_UDP_ILOS.
Fixes: 9e300987d4a8 ("ice: VXLAN and Geneve TC support") Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Sandeep Penigalapati <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
Jesse Brandeburg [Sat, 23 Oct 2021 00:28:17 +0000 (17:28 -0700)]
ice: ignore dropped packets during init
If the hardware is constantly receiving unicast or broadcast packets
during driver load, the device previously counted many GLV_RDPC (VSI
dropped packets) events during init. This causes confusing dropped
packet statistics during driver load. The dropped packets counter
incrementing does stop once the driver finishes loading.
Avoid this problem by baselining our statistics at the end of driver
open instead of the end of probe.
Dave Ertman [Tue, 12 Oct 2021 20:31:21 +0000 (13:31 -0700)]
ice: Fix problems with DSCP QoS implementation
The patch that implemented DSCP QoS implementation removed a
bandwidth check that was used to check for a specific condition
caused by some corner cases. This check should not of been
removed.
The same patch also added a check for when the DCBx state could
be changed in relation to DSCP, but the check was erroneously
added nested in a check for CEE mode, which made the check useless.
Fix these problems by re-adding the bandwidth check and relocating
the DSCP mode check earlier in the function that changes DCBx state
in the driver.
Paul Greenwalt [Mon, 12 Jul 2021 11:54:25 +0000 (07:54 -0400)]
ice: rearm other interrupt cause register after enabling VFs
The other interrupt cause register (OICR), global interrupt 0, is
disabled when enabling VFs to prevent handling VFLR. If the OICR is
not rearmed then the VF cannot communicate with the PF.
Rearm the OICR after enabling VFs.
Fixes: 916c7fdf5e93 ("ice: Separate VF VSI initialization/creation from reset flow") Signed-off-by: Paul Greenwalt <[email protected]> Tested-by: Tony Brelinski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
Marc Zyngier [Tue, 23 Nov 2021 18:06:36 +0000 (18:06 +0000)]
PCI: apple: Fix PERST# polarity
Now that PERST# is properly defined as active-low in the device tree, fix
the driver to correctly drive the line independently of the implied
polarity.
Dan Carpenter [Tue, 7 Dec 2021 08:24:16 +0000 (11:24 +0300)]
net/qla3xxx: fix an error code in ql_adapter_up()
The ql_wait_for_drvr_lock() fails and returns false, then this
function should return an error code instead of returning success.
The other problem is that the success path prints an error message
netdev_err(ndev, "Releasing driver lock\n"); Delete that and
re-order the code a little to make it more clear.
Jakub Kicinski [Tue, 7 Dec 2021 18:32:04 +0000 (10:32 -0800)]
Merge tag 'linux-can-fixes-for-5.16-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
can 2021-12-07
The 1st patch is by Vincent Mailhol and fixes a use after free in the
pch_can driver.
Dan Carpenter fixes a use after free in the ems_pcmcia sja1000 driver.
The remaining 7 patches target the m_can driver. Brian Silverman
contributes a patch to disable and ignore the ELO interrupt, which is
currently not handled in the driver and may lead to an interrupt
storm. Vincent Mailhol's patch fixes a memory leak in the error path
of the m_can_read_fifo() function. The remaining patches are
contributed by Matthias Schiffer, first a iomap_read_fifo() and
iomap_write_fifo() functions are fixed in the PCI glue driver, then
the clock rate for the Intel Ekhart Lake platform is fixed, the last 3
patches add support for the custom bit timings on the Elkhart Lake
platform.
* tag 'linux-can-fixes-for-5.16-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: m_can: pci: use custom bit timings for Elkhart Lake
can: m_can: make custom bittiming fields const
Revert "can: m_can: remove support for custom bit timing"
can: m_can: pci: fix incorrect reference clock rate
can: m_can: pci: fix iomap_read_fifo() and iomap_write_fifo()
can: m_can: m_can_read_fifo: fix memory leak in error branch
can: m_can: Disable and ignore ELO interrupt
can: sja1000: fix use after free in ems_pcmcia_add_card()
can: pch_can: pch_can_rx_normal: fix use after free
====================
Darrick J. Wong [Mon, 6 Dec 2021 23:38:20 +0000 (15:38 -0800)]
xfs: remove all COW fork extents when remounting readonly
As part of multiple customer escalations due to file data corruption
after copy on write operations, I wrote some fstests that use fsstress
to hammer on COW to shake things loose. Regrettably, I caught some
filesystem shutdowns due to incorrect rmap operations with the following
loop:
mount <filesystem> # (0)
fsstress <run only readonly ops> & # (1)
while true; do
fsstress <run all ops>
mount -o remount,ro # (2)
fsstress <run only readonly ops>
mount -o remount,rw # (3)
done
When (2) happens, notice that (1) is still running. xfs_remount_ro will
call xfs_blockgc_stop to walk the inode cache to free all the COW
extents, but the blockgc mechanism races with (1)'s reader threads to
take IOLOCKs and loses, which means that it doesn't clean them all out.
Call such a file (A).
When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which
walks the ondisk refcount btree and frees any COW extent that it finds.
This function does not check the inode cache, which means that incore
COW forks of inode (A) is now inconsistent with the ondisk metadata. If
one of those former COW extents are allocated and mapped into another
file (B) and someone triggers a COW to the stale reservation in (A), A's
dirty data will be written into (B) and once that's done, those blocks
will be transferred to (A)'s data fork without bumping the refcount.
The results are catastrophic -- file (B) and the refcount btree are now
corrupt. Solve this race by forcing the xfs_blockgc_free_space to run
synchronously, which causes xfs_icwalk to return to inodes that were
skipped because the blockgc code couldn't take the IOLOCK. This is safe
to do here because the VFS has already prohibited new writer threads.
Fixes: 10ddf64e420f ("xfs: remove leftover CoW reservations when remounting ro") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Chandan Babu R <[email protected]>
Linus Torvalds [Tue, 7 Dec 2021 18:10:20 +0000 (10:10 -0800)]
Merge tag 'platform-drivers-x86-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Various bug-fixes and hardware-id additions"
* tag 'platform-drivers-x86-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel: hid: add quirk to support Surface Go 3
platform/x86: amd-pmc: Fix s2idle failures on certain AMD laptops
platform/x86: touchscreen_dmi: Add TrekStor SurfTab duo W1 touchscreen info
platform/x86: lg-laptop: Recognize more models
platform/x86: thinkpad_acpi: Add lid_logo_dot to the list of safe LEDs
platform/x86: thinkpad_acpi: Restore missing hotkey_tablet_mode and hotkey_radio_sw sysfs-attr
RDMA/irdma: Don't arm the CQ more than two times if no CE for this CQ
Completion events (CEs) are lost if the application is allowed to arm the
CQ more than two times when no new CE for this CQ has been generated by
the HW.
Check if arming has been done for the CQ and if not, arm the CQ for any
event otherwise promote to arm the CQ for any event only when the last arm
event was solicited.
RDMA/irdma: Fix a potential memory allocation issue in 'irdma_prm_add_pble_mem()'
'pchunk->bitmapbuf' is a bitmap. Its size (in number of bits) is stored in
'pchunk->sizeofbitmap'.
When it is allocated, the size (in bytes) is computed by:
size_in_bits >> 3
There are 2 issues (numbers bellow assume that longs are 64 bits):
- there is no guarantee here that 'pchunk->bitmapmem.size' is modulo
BITS_PER_LONG but bitmaps are stored as longs
(sizeofbitmap=8 bits will only allocate 1 byte, instead of 8 (1 long))
- the number of bytes is computed with a shift, not a round up, so we
may allocate less memory than needed
(sizeofbitmap=65 bits will only allocate 8 bytes (i.e. 1 long), when 2
longs are needed = 16 bytes)
Fix both issues by using 'bitmap_zalloc()' and remove the useless
'bitmapmem' from 'struct irdma_chunk'.
While at it, remove some useless NULL test before calling
kfree/bitmap_free.
The free_irq() results in a callback to the registered interrupt handler,
and rcd->do_interrupt is NULL because the receive context data structures
are not fully initialized.
Fix by ensuring that the do_interrupt is always assigned and adding a
guards in the slow path handler to detect and handle a partially
initialized receive context and noop the receive.
Ruozhu Li [Thu, 4 Nov 2021 07:13:32 +0000 (15:13 +0800)]
nvme: fix use after free when disconnecting a reconnecting ctrl
A crash happens when trying to disconnect a reconnecting ctrl:
1) The network was cut off when the connection was just established,
scan work hang there waiting for some IOs complete. Those I/Os were
retried because we return BLK_STS_RESOURCE to blk in reconnecting.
2) After a while, I tried to disconnect this connection. This
procedure also hangs because it tried to obtain ctrl->scan_lock.
It should be noted that now we have switched the controller state
to NVME_CTRL_DELETING.
3) In nvme_check_ready(), we always return true when ctrl->state is
NVME_CTRL_DELETING, so those retrying I/Os were issued to the bottom
device which was already freed.
To fix this, when ctrl->state is NVME_CTRL_DELETING, issue cmd to bottom
device only when queue state is live. If not, return host path error to
the block layer
Marc Zyngier [Tue, 23 Nov 2021 18:06:34 +0000 (18:06 +0000)]
PCI: apple: Follow the PCIe specifications when resetting the port
While the Apple PCIe driver works correctly when directly booted from the
firmware, it fails to initialise when the kernel is booted from a
bootloader using PCIe such as u-boot.
That's because we're missing a proper reset of the port (we only clear the
reset, but never assert it).
The PCIe spec requirements are two-fold:
- PERST# must be asserted before setting up the clocks and stay asserted
for at least 100us (Tperst-clk)
- Once PERST# is deasserted, the OS must wait for at least 100ms "from
the end of a Conventional Reset" before we can start talking to the
devices
Jeffle Xu [Tue, 7 Dec 2021 03:14:49 +0000 (11:14 +0800)]
netfs: fix parameter of cleanup()
The order of these two parameters is just reversed. gcc didn't warn on
that, probably because 'void *' can be converted from or to other
pointer types without warning.
David Howells [Tue, 7 Dec 2021 09:53:24 +0000 (09:53 +0000)]
netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lock
Taking sb_writers whilst holding mmap_lock isn't allowed and will result in
a lockdep warning like that below. The problem comes from cachefiles
needing to take the sb_writers lock in order to do a write to the cache,
but being asked to do this by netfslib called from readpage, readahead or
write_begin[1].
Fix this by always offloading the write to the cache off to a worker
thread. The main thread doesn't need to wait for it, so deadlock can be
avoided.
This can be tested by running the quick xfstests on something like afs or
ceph with lockdep enabled.
WARNING: possible circular locking dependency detected
5.15.0-rc1-build2+ #292 Not tainted
------------------------------------------------------
holetest/65517 is trying to acquire lock: ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5
but task is already holding lock: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
can: m_can: pci: use custom bit timings for Elkhart Lake
The relevant datasheet [1] specifies nonstandard limits for the bit timing
parameters. While it is unclear what the exact effect of violating these
limits is, it seems like a good idea to adhere to the documentation.
[1] Intel Atom® x6000E Series, and Intel® Pentium® and Celeron® N and J
Series Processors for IoT Applications Datasheet,
Volume 2 (Book 3 of 3), July 2021, Revision 001
When testing the CAN controller on our Ekhart Lake hardware, we
determined that all communication was running with twice the configured
bitrate. Changing the reference clock rate from 100MHz to 200MHz fixed
this. Intel's support has confirmed to us that 200MHz is indeed the
correct clock rate.
can: m_can: pci: fix iomap_read_fifo() and iomap_write_fifo()
The same fix that was previously done in m_can_platform in commit 99d173fbe894 ("can: m_can: fix iomap_read_fifo() and iomap_write_fifo()")
is required in m_can_pci as well to make iomap_read_fifo() and
iomap_write_fifo() work for val_count > 1.
Vincent Mailhol [Sun, 7 Nov 2021 05:07:55 +0000 (14:07 +0900)]
can: m_can: m_can_read_fifo: fix memory leak in error branch
In m_can_read_fifo(), if the second call to m_can_fifo_read() fails,
the function jump to the out_fail label and returns without calling
m_can_receive_skb(). This means that the skb previously allocated by
alloc_can_skb() is not freed. In other terms, this is a memory leak.
This patch adds a goto label to destroy the skb if an error occurs.
Issue was found with GCC -fanalyzer, please follow the link below for
details.
Brian Silverman [Mon, 29 Nov 2021 22:26:28 +0000 (14:26 -0800)]
can: m_can: Disable and ignore ELO interrupt
With the design of this driver, this condition is often triggered.
However, the counter that this interrupt indicates an overflow is never
read either, so overflowing is harmless.
On my system, when a CAN bus starts flapping up and down, this locks up
the whole system with lots of interrupts and printks.
Specifically, this interrupt indicates the CEL field of ECR has
overflowed. All reads of ECR mask out CEL.
Vincent Mailhol [Tue, 23 Nov 2021 11:16:54 +0000 (20:16 +0900)]
can: pch_can: pch_can_rx_normal: fix use after free
After calling netif_receive_skb(skb), dereferencing skb is unsafe.
Especially, the can_frame cf which aliases skb memory is dereferenced
just after the call netif_receive_skb(skb).
Roman Bolshakov [Fri, 12 Nov 2021 14:54:46 +0000 (17:54 +0300)]
scsi: qla2xxx: Format log strings only if needed
Commit 598a90f2002c ("scsi: qla2xxx: add ring buffer for tracing debug
logs") introduced unconditional log string formatting to ql_dbg() even if
ql_dbg_log event is disabled. It harms performance because some strings are
formatted in fastpath and/or interrupt context.
scsi: scsi_debug: Fix buffer size of REPORT ZONES command
According to ZBC and SPC specifications, the unit of ALLOCATION LENGTH
field of REPORT ZONES command is byte. However, current scsi_debug
implementation handles it as number of zones to calculate buffer size to
report zones. When the ALLOCATION LENGTH has a large number, this results
in too large buffer size and causes memory allocation failure. Fix the
failure by handling ALLOCATION LENGTH as byte unit.
When issued LUN reset under heavy I/O we hit the qedi WARN_ON because of a
mismatch in firmware I/O cmd cleanup request count and I/O cmd cleanup
response count received. The mismatch is because of a race caused by the
postfix increment of cmd_cleanup_cmpl.
Song Liu [Fri, 3 Dec 2021 19:32:34 +0000 (19:32 +0000)]
perf bpf: Fix building perf with BUILD_BPF_SKEL=1 by default in more distros
Arnaldo reported that building all his containers with BUILD_BPF_SKEL=1
to then make this the default he found problems in some distros where
the system linux/bpf.h file was being used and lacked this:
util/bpf_skel/bperf_leader.bpf.c:13:20: error: use of undeclared identifier 'BPF_F_PRESERVE_ELEMS'
__uint(map_flags, BPF_F_PRESERVE_ELEMS);
So use instead the vmlinux.h file generated by bpftool from BTF info.
This fixed these as well, getting the build back working on debian:11,
debian:experimental and ubuntu:21.10:
In file included from In file included from util/bpf_skel/bperf_leader.bpf.cutil/bpf_skel/bpf_prog_profiler.bpf.c::33:
:
In file included from In file included from /usr/include/linux/bpf.h/usr/include/linux/bpf.h::1111:
:
/usr/include/linux/types.h/usr/include/linux/types.h::55::1010:: In file included from util/bpf_skel/bperf_follower.bpf.c:3fatal errorfatal error:
: : In file included from /usr/include/linux/bpf.h:'asm/types.h' file not found11'asm/types.h' file not found:
/usr/include/linux/types.h:5:10: fatal error: 'asm/types.h' file not found
#include <asm/types.h>#include <asm/types.h>
Ian Rogers [Sun, 28 Nov 2021 08:58:10 +0000 (00:58 -0800)]
perf test: Reset shadow counts before loading
Otherwise load counting is an average. Without this change
duration_time in test_memory_bandwidth will alter its value if an
earlier test contains duration_time.
This patch fixes an issue that's introduced in the proposed patch:
https://lore.kernel.org/lkml/20211124015226.3317994[email protected]/
in perf test "Parse and process metrics".
if (uname(&uts) < 0)
return false;
if (strncmp(uts.machine, "x86_64", 6))
return false;
....
}
which always returns false on s390. The caller build_cpu_topology()
checks has_die_topology() return value. On false the the struct
cpu_topology::die_cpu_list is not contructed and has zero entries. This
leads to the failing comparison: #num_dies >= #num_packages. s390 of
course has a positive number of packages.
Fix this and check if the function build_cpu_topology() did build up
a die_cpus_list. The number of entries in this list should be larger
than 0. If the number of list element is zero, the die_cpus_list has
not been created and the check in function test__expr():
tools build: Remove needless libpython-version feature check that breaks test-all fast path
Since 66dfdff03d196e51 ("perf tools: Add Python 3 support") we don't use
the tools/build/feature/test-libpython-version.c version in any Makefile
feature check:
$ find tools/ -type f | xargs grep feature-libpython-version
$
The only place where this was used was removed in 66dfdff03d196e51:
And nowadays we either build with PYTHON=python3 or just install the
python3 devel packages and perf will build against it.
But the leftover feature-libpython-version check made the fast path
feature detection to break in all cases except when python2 devel files
were installed:
As python3 is the norm these days, fix this by just removing the unused
feature-libpython-version feature check, making the test-all fast path
to work with the common case.
Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
Adrian Hunter [Thu, 25 Nov 2021 07:14:57 +0000 (09:14 +0200)]
perf inject: Fix itrace space allowed for new attributes
The space allowed for new attributes can be too small if existing header
information is large. That can happen, for example, if there are very
many CPUs, due to having an event ID per CPU per event being stored in the
header information.
Fix by adding the existing header.data_offset. Also increase the extra
space allowed to 8KiB and align to a 4KiB boundary for neatness.
That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.
$ grep futex tools/perf/arch/s390/entry/syscalls/syscall.tbl
238 common futex sys_futex sys_futex_time32
422 32 futex_time64 - sys_futex
449 common futex_waitv sys_futex_waitv sys_futex_waitv
$
This addresses this perf build warnings:
Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRRR FAILED!
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRRR FAILED!
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRR FAILED!
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRRRRRRRR Ok
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRR FAILED!
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRR Ok
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRRRRR Ok
yep, it seems the perf bench is broken so the counts won't correlated if
I revert this one:
92723ea0f11d perf bench: Fix two memory leaks detected with ASan
it works for me again.. it seems to break -t option
Antoine Tenart [Fri, 3 Dec 2021 10:13:18 +0000 (11:13 +0100)]
ethtool: do not perform operations on net devices being unregistered
There is a short period between a net device starts to be unregistered
and when it is actually gone. In that time frame ethtool operations
could still be performed, which might end up in unwanted or undefined
behaviours[1].
Do not allow ethtool operations after a net device starts its
unregistration. This patch targets the netlink part as the ioctl one
isn't affected: the reference to the net device is taken and the
operation is executed within an rtnl lock section and the net device
won't be found after unregister.
[1] For example adding Tx queues after unregister ends up in NULL
pointer exceptions and UaFs, such as:
BUG: KASAN: use-after-free in kobject_get+0x14/0x90
Read of size 1 at addr ffff88801961248c by task ethtool/755
Makefile: Do not quote value for CONFIG_CC_IMPLICIT_FALLTHROUGH
Andreas reported that a specific build environment for an external
module, being a bit broken, does pass CC_IMPLICIT_FALLTHROUGH quoted as
argument to gcc, causing an error
gcc-11: error: "-Wimplicit-fallthrough=5": linker input file not found: No such file or directory
Until this is more generally fixed as outlined in [1], by fixing
scripts/link-vmlinux.sh, scripts/gen_autoksyms.sh, etc to not directly
include the include/config/auto.conf, and in a second step, change
Kconfig to generate the auto.conf without "", workaround the issue by
explicitly unquoting CC_IMPLICIT_FALLTHROUGH.
dt-bindings: input: gpio-keys: Fix interrupts in example
The "interrupts" property in the example looks weird:
- The type is not in the last cell,
- Level interrupts don't work well with gpio-keys, as they keep the
interrupt asserted as long as the key is pressed, causing an
interrupt storm.
Use a more realistic falling-edge interrupt instead.
Alexander Stein [Tue, 30 Nov 2021 08:27:56 +0000 (09:27 +0100)]
dt-bindings: net: Reintroduce PHY no lane swap binding
This binding was already documented in phy.txt, commit 252ae5330daa
("Documentation: devicetree: Add PHY no lane swap binding"), but got
accidently removed during YAML conversion in commit d8704342c109
("dt-bindings: net: Add a YAML schemas for the generic PHY options").
Note: 'enet-phy-lane-no-swap' and the absence of 'enet-phy-lane-swap' are
not identical, as the former one disable this feature, while the latter
one doesn't change anything.
Linus Torvalds [Mon, 6 Dec 2021 18:46:20 +0000 (10:46 -0800)]
Merge tag 'docs-5.16-3' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A few important documentation fixes, including breakage that comes
with v1.0 of the ReadTheDocs theme"
* tag 'docs-5.16-3' of git://git.lwn.net/linux:
Documentation: Add minimum pahole version
Documentation/process: fix self reference
docs: admin-guide/blockdev: Remove digraph of node-states
docs: conf.py: fix support for Readthedocs v 1.0.0
Jens Axboe [Mon, 6 Dec 2021 17:49:04 +0000 (10:49 -0700)]
io-wq: remove spurious bit clear on task_work addition
There's a small race here where the task_work could finish and drop
the worker itself, so that by the time that task_work_add() returns
with a successful addition we've already put the worker.
The worker callbacks clear this bit themselves, so we don't actually
need to manually clear it in the caller. Get rid of it.
Norbert Zulinski [Mon, 22 Nov 2021 11:29:05 +0000 (12:29 +0100)]
i40e: Fix NULL pointer dereference in i40e_dbg_dump_desc
When trying to dump VFs VSI RX/TX descriptors
using debugfs there was a crash
due to NULL pointer dereference in i40e_dbg_dump_desc.
Added a check to i40e_dbg_dump_desc that checks if
VSI type is correct for dumping RX/TX descriptors.
After setting pre-set combined to 16 queues and reserving 16 queues by
tc qdisc, pre-set maximum combined queues returned to default value
after VF reset being 4 and this generated errors during removing tc.
Fixed by removing clear num_req_queues before reset VF.
Karen Sornek [Fri, 14 May 2021 09:43:13 +0000 (11:43 +0200)]
i40e: Fix failed opcode appearing if handling messages from VF
Fix failed operation code appearing if handling messages from VF.
Implemented by waiting for VF appropriate state if request starts
handle while VF reset.
Without this patch the message handling request while VF is in
a reset state ends with error -5 (I40E_ERR_PARAM).
iavf_set_ringparams doesn't communicate to the user that
1. The user requested descriptor count is out of range. Instead it
just quietly sets descriptors to the "clamped" value and calls it
done. This makes it look an invalid value was successfully set as
the descriptor count when this isn't actually true.
2. The user provided descriptor count needs to be inflated for alignment
reasons.
This behavior is confusing. The ice driver has already addressed this
by rejecting invalid values for descriptor count and
messaging for alignment adjustments.
Do the same thing here by adding the error and info messages.
Takashi Iwai [Mon, 6 Dec 2021 16:25:10 +0000 (17:25 +0100)]
Merge tag 'asoc-fix-v5.16-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.16
A relatively large collection of updates, the size is increased quite a
bit by there being some repetitive changes for similar issues that occur
multiple times with both notifying control value changes and runtime PM.
The Rockchip update looks at first glance like a cleanup but fixes
instantiation of the hardware on some systems.
Olivia Mackintosh has posted to alsa-devel reporting that
there's a potential bug that could break mixer quirks for Pioneer
devices introduced by 6d27788160362a7ee6c0d317636fe4b1ddbe59a7
"ALSA: usb-audio: Add support for the Pioneer DJM 750MK2
Mixer/Soundcard".
This happened because the DJM 750 MK2 was added last to the Pioneer DJM
device table index and defined as 0x4 but was added to snd_djm_devices[]
just after the DJM 750 (MK1) entry instead of last, after the DJM 900
NXS2. This escaped review.
To prevent that from ever happening again, Takashi Iwai suggested to use
C99 array designators in snd_djm_devices[] instead of simply reordering
the entries.
Niklas Cassel [Fri, 26 Nov 2021 10:42:44 +0000 (10:42 +0000)]
nvme: report write pointer for a full zone as zone start + zone len
The write pointer in NVMe ZNS is invalid for a zone in zone state full.
The same also holds true for ZAC/ZBC.
The current behavior for NVMe is to simply propagate the wp reported by
the drive, even for full zones. Since the wp is invalid for a full zone,
the wp reported by the drive may be any value.
The way that the sd_zbc driver handles a full zone is to always report
the wp as zone start + zone len, regardless of what the drive reported.
null_blk also follows this convention.
Do the same for NVMe, so that a BLKREPORTZONE ioctl reports the write
pointer for a full zone in a consistent way, regardless of the interface
of the underlying zoned block device.
blkzone report before patch:
start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0xfffffffffffbfff8
reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]
blkzone report after patch:
start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0x040000 reset:0
non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]
Keith Busch [Tue, 30 Nov 2021 16:14:54 +0000 (08:14 -0800)]
nvme: disable namespace access for unsupported metadata
The only fabrics target that supports metadata handling through the
separate integrity buffer is RDMA. It is currently usable only if the
size is 8B per block and formatted for protection information. If an
rdma target were to export a namespace with a different format (ex:
4k+64B), the driver will not be able to submit valid read/write commands
for that namespace.
Suppress setting the metadata feature in the namespace so that the
gendisk capacity will be set to 0. This will prevent read/write access
through the block stack, but will continue to allow ioctl passthrough
commands.
Keith Busch [Mon, 29 Nov 2021 16:24:34 +0000 (08:24 -0800)]
nvme: show subsys nqn for duplicate cntlids
The driver assigned nvme handle isn't persistent across reboots, so is
not enough information to match up where the collisions are occuring.
Add the subsys nqn string to the output so that it can more easily be
identified later.
Damien Le Moal [Thu, 2 Dec 2021 06:27:08 +0000 (15:27 +0900)]
ata: ahci_ceva: Fix id array access in ceva_ahci_read_id()
ATA IDENTIFY command returns an array of le16 words. Accessing it as a
u16 array triggers the following sparse warning:
drivers/ata/ahci_ceva.c:107:33: warning: invalid assignment: &=
drivers/ata/ahci_ceva.c:107:33: left side has type unsigned short
drivers/ata/ahci_ceva.c:107:33: right side has type restricted __le16
Use a local variable to explicitly cast the id array to __le16 to avoid
this warning.
Linus Torvalds [Sun, 5 Dec 2021 20:58:18 +0000 (12:58 -0800)]
Merge tag 'for-5.16/parisc-6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"Some bug and warning fixes:
- Fix "make install" to use debians "installkernel" script which is
now in /usr/sbin
- Fix the bindeb-pkg make target by giving the correct KBUILD_IMAGE
file name
- Fix compiler warnings by annotating parisc agp init functions with
__init
- Fix timekeeping on SMP machines with dual-core CPUs
- Enable some more config options in the 64-bit defconfig"
* tag 'for-5.16/parisc-6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Mark cr16 CPU clocksource unstable on all SMP machines
parisc: Fix "make install" on newer debian releases
parisc/agp: Annotate parisc agp init functions with __init
parisc: Enable sata sil, audit and usb support on 64-bit defconfig
parisc: Fix KBUILD_IMAGE for self-extracting kernel
Linus Torvalds [Sun, 5 Dec 2021 17:34:57 +0000 (09:34 -0800)]
Merge tag 'usb-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for a few reported issues. Included in
here are:
- xhci fix for a _much_ reported regression. I don't think there's a
community distro that has not reported this problem yet :(
- new USB quirk addition
- cdns3 minor fixes
- typec regression fix.
All of these have been in linux-next with no reported problems, and
the xhci fix has been reported by many to resolve their reported
problem"
* tag 'usb-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: cdnsp: Fix a NULL pointer dereference in cdnsp_endpoint_init()
usb: cdns3: gadget: fix new urb never complete if ep cancel previous requests
usb: typec: tcpm: Wait in SNK_DEBOUNCED until disconnect
USB: NO_LPM quirk Lenovo Powered USB-C Travel Hub
xhci: Fix commad ring abort, write all 64 bits to CRCR register.
Linus Torvalds [Sun, 5 Dec 2021 17:13:20 +0000 (09:13 -0800)]
Merge tag 'tty-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small TTY and Serial driver fixes for 5.16-rc4 to
resolve a number of reported problems.
They include:
- liteuart serial driver fixes
- 8250_pci serial driver fixes for pericom devices
- 8250 RTS line control fix while in RS-485 mode
- tegra serial driver fix
- msm_serial driver fix
- pl011 serial driver new id
- fsl_lpuart revert of broken change
- 8250_bcm7271 serial driver fix
- MAINTAINERS file update for rpmsg tty driver that came in 5.16-rc1
- vgacon fix for reported problem
All of these, except for the 8250_bcm7271 fix have been in linux-next
with no reported problem. The 8250_bcm7271 fix was added to the tree
on Friday so no chance to be linux-next yet. But it should be fine as
the affected developers submitted it"
* tag 'tty-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250_bcm7271: UART errors after resuming from S2
serial: 8250_pci: rewrite pericom_do_set_divisor()
serial: 8250_pci: Fix ACCES entries in pci_serial_quirks array
serial: 8250: Fix RTS modem control while in rs485 mode
Revert "tty: serial: fsl_lpuart: drop earlycon entry for i.MX8QXP"
serial: tegra: Change lower tolerance baud rate limit for tegra20 and tegra30
serial: liteuart: relax compile-test dependencies
serial: liteuart: fix minor-number leak on probe errors
serial: liteuart: fix use-after-free and memleak on unbind
serial: liteuart: Fix NULL pointer dereference in ->remove()
vgacon: Propagate console boot parameters before calling `vc_resize'
tty: serial: msm_serial: Deactivate RX DMA for polling support
serial: pl011: Add ACPI SBSA UART match id
serial: core: fix transmit-buffer reset and memleak
MAINTAINERS: Add rpmsg tty driver maintainer
Linus Torvalds [Sun, 5 Dec 2021 16:58:52 +0000 (08:58 -0800)]
Merge tag 'timers_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Borislav Petkov:
- Prevent a tick storm when a dedicated timekeeper CPU in nohz_full
mode runs for prolonged periods with interrupts disabled and ends up
programming the next tick in the past, leading to that storm
* tag 'timers_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers/nohz: Last resort update jiffies on nohz_full IRQ entry
Linus Torvalds [Sun, 5 Dec 2021 16:53:31 +0000 (08:53 -0800)]
Merge tag 'sched_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Properly init uclamp_flags of a runqueue, on first enqueuing
- Fix preempt= callback return values
- Correct utime/stime resource usage reporting on nohz_full to return
the proper times instead of shorter ones
* tag 'sched_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/uclamp: Fix rq->uclamp_max not set on first enqueue
preempt/dynamic: Fix setup_preempt_mode() return value
sched/cputime: Fix getrusage(RUSAGE_THREAD) with nohz_full
Linus Torvalds [Sun, 5 Dec 2021 16:43:35 +0000 (08:43 -0800)]
Merge tag 'x86_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Fix a couple of SWAPGS fencing issues in the x86 entry code
- Use the proper operand types in __{get,put}_user() to prevent
truncation in SEV-ES string io
- Make sure the kernel mappings are present in trampoline_pgd in order
to prevent any potential accesses to unmapped memory after switching
to it
- Fix a trivial list corruption in objtool's pv_ops validation
- Disable the clocksource watchdog for TSC on platforms which claim
that the TSC is constant, doesn't stop in sleep states, CPU has TSC
adjust and the number of sockets of the platform are max 2, to
prevent erroneous markings of the TSC as unstable.
- Make sure TSC adjust is always checked not only when going idle
- Prevent a stack leak by initializing struct _fpx_sw_bytes properly in
the FPU code
- Fix INTEL_FAM6_RAPTORLAKE define naming to adhere to the convention
* tag 'x86_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen: Add xenpv_restore_regs_and_return_to_usermode()
x86/entry: Use the correct fence macro after swapgs in kernel CR3
x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry()
x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword
x86/64/mm: Map all kernel memory into trampoline_pgd
objtool: Fix pv_ops noinstr validation
x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
x86/tsc: Add a timer to make sure TSC_adjust is always checked
x86/fpu/signal: Initialize sw_bytes in save_xstate_epilog()
x86/cpu: Drop spurious underscore from RAPTOR_LAKE #define
Linus Torvalds [Sun, 5 Dec 2021 16:25:33 +0000 (08:25 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm fixes from Paolo Bonzini:
- Static analysis fix
- New SEV-ES protocol for communicating invalid VMGEXIT requests
- Ensure APICv is considered inactive if there is no APIC
- Fix reserved bits for AMD PerfEvtSeln register
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure
KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessary
KVM: SEV: Return appropriate error codes if SEV-ES scratch setup fails
KVM: x86/mmu: Retry page fault if root is invalidated by memslot update
KVM: VMX: Set failure code in prepare_vmcs02()
KVM: ensure APICv is considered inactive if there is no APIC
KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register
Tom Lendacky [Wed, 20 Oct 2021 18:02:11 +0000 (13:02 -0500)]
x86/sme: Explicitly map new EFI memmap table as encrypted
Reserving memory using efi_mem_reserve() calls into the x86
efi_arch_mem_reserve() function. This function will insert a new EFI
memory descriptor into the EFI memory map representing the area of
memory to be reserved and marking it as EFI runtime memory. As part
of adding this new entry, a new EFI memory map is allocated and mapped.
The mapping is where a problem can occur. This new memory map is mapped
using early_memremap() and generally mapped encrypted, unless the new
memory for the mapping happens to come from an area of memory that is
marked as EFI_BOOT_SERVICES_DATA memory. In this case, the new memory will
be mapped unencrypted. However, during replacement of the old memory map,
efi_mem_type() is disabled, so the new memory map will now be long-term
mapped encrypted (in efi.memmap), resulting in the map containing invalid
data and causing the kernel boot to crash.
Since it is known that the area will be mapped encrypted going forward,
explicitly map the new memory map as encrypted using early_memremap_prot().
Tom Lendacky [Thu, 2 Dec 2021 18:52:05 +0000 (12:52 -0600)]
KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure
Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT
exit code or exit parameters fails.
The VMGEXIT instruction can be issued from userspace, even though
userspace (likely) can't update the GHCB. To prevent userspace from being
able to kill the guest, return an error through the GHCB when validation
fails rather than terminating the guest. For cases where the GHCB can't be
updated (e.g. the GHCB can't be mapped, etc.), just return back to the
guest.
The new error codes are documented in the lasest update to the GHCB
specification.
KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessary
Use kvzalloc() to allocate KVM's buffer for SEV-ES's GHCB scratch area so
that KVM falls back to __vmalloc() if physically contiguous memory isn't
available. The buffer is purely a KVM software construct, i.e. there's
no need for it to be physically contiguous.
Return appropriate error codes if setting up the GHCB scratch area for an
SEV-ES guest fails. In particular, returning -EINVAL instead of -ENOMEM
when allocating the kernel buffer could be confusing as userspace would
likely suspect a guest issue.
Linus Torvalds [Sat, 4 Dec 2021 21:43:52 +0000 (13:43 -0800)]
Merge tag '5.16-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Three SMB3 multichannel/fscache fixes and a DFS fix.
In testing multichannel reconnect scenarios recently various problems
with the cifs.ko implementation of fscache were found (e.g. incorrect
initialization of fscache cookies in some cases)"
* tag '5.16-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: avoid use of dstaddr as key for fscache client cookie
cifs: add server conn_id to fscache client cookie
cifs: wait for tcon resource_id before getting fscache super
cifs: fix missed refcounting of ipc tcon
Helge Deller [Sat, 4 Dec 2021 20:21:46 +0000 (21:21 +0100)]
parisc: Mark cr16 CPU clocksource unstable on all SMP machines
In commit c8c3735997a3 ("parisc: Enhance detection of synchronous cr16
clocksources") I assumed that CPUs on the same physical core are syncronous.
While booting up the kernel on two different C8000 machines, one with a
dual-core PA8800 and one with a dual-core PA8900 CPU, this turned out to be
wrong. The symptom was that I saw a jump in the internal clocks printed to the
syslog and strange overall behaviour. On machines which have 4 cores (2
dual-cores) the problem isn't visible, because the current logic already marked
the cr16 clocksource unstable in this case.
This patch now marks the cr16 interval timers unstable if we have more than one
CPU in the system, and it fixes this issue.
Helge Deller [Sat, 4 Dec 2021 20:14:40 +0000 (21:14 +0100)]
parisc: Fix "make install" on newer debian releases
On newer debian releases the debian-provided "installkernel" script is
installed in /usr/sbin. Fix the kernel install.sh script to look for the
script in this directory as well.
Linus Torvalds [Sat, 4 Dec 2021 16:34:59 +0000 (08:34 -0800)]
Merge tag 'io_uring-5.16-2021-12-03' of git://git.kernel.dk/linux-block
Pull io_uring fix from Jens Axboe:
"Just a single fix preventing repeated retries of task_work based io-wq
thread creation, fixing a regression from when io-wq was made more (a
bit too much) resilient against signals"
* tag 'io_uring-5.16-2021-12-03' of git://git.kernel.dk/linux-block:
io-wq: don't retry task_work creation failure on fatal conditions
Linus Torvalds [Sat, 4 Dec 2021 16:28:42 +0000 (08:28 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two patches, both in drivers.
One is a fix to FC recovery (lpfc) and the other is an enhancement to
support the Intel Alder Motherboard with the UFS driver which comes
under the -rc exception process for hardware enabling"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: ufs-pci: Add support for Intel ADL
scsi: lpfc: Fix non-recovery of remote ports following an unsolicited LOGO
Linus Torvalds [Sat, 4 Dec 2021 16:13:20 +0000 (08:13 -0800)]
Merge tag 'gfs2-v5.16-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
- Since commit 486408d690e1 ("gfs2: Cancel remote delete work
asynchronously"), inode create and lookup-by-number can overlap more
easily and we can end up with temporary duplicate inodes. Fix the
code to prevent that.
- Fix a BUG demoting weak glock holders from a remote node.
* tag 'gfs2-v5.16-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: gfs2_create_inode rework
gfs2: gfs2_inode_lookup rework
gfs2: gfs2_inode_lookup cleanup
gfs2: Fix remote demote of weak glock holders
Qais Yousef [Thu, 2 Dec 2021 11:20:33 +0000 (11:20 +0000)]
sched/uclamp: Fix rq->uclamp_max not set on first enqueue
Commit d81ae8aac85c ("sched/uclamp: Fix initialization of struct
uclamp_rq") introduced a bug where uclamp_max of the rq is not reset to
match the woken up task's uclamp_max when the rq is idle.
The code was relying on rq->uclamp_max initialized to zero, so on first
enqueue
if (uc_se->value > READ_ONCE(uc_rq->value))
WRITE_ONCE(uc_rq->value, uc_se->value);
}
was actually resetting it. But since commit d81ae8aac85c changed the
default to 1024, this no longer works. And since rq->uclamp_flags is
also initialized to 0, neither above code path nor uclamp_idle_reset()
update the rq->uclamp_max on first wake up from idle.
This is only visible from first wake up(s) until the first dequeue to
idle after enabling the static key. And it only matters if the
uclamp_max of this task is < 1024 since only then its uclamp_max will be
effectively ignored.
Fix it by properly initializing rq->uclamp_flags = UCLAMP_FLAG_IDLE to
ensure uclamp_idle_reset() is called which then will update the rq
uclamp_max value as expected.
Lee Jones [Thu, 2 Dec 2021 14:34:37 +0000 (14:34 +0000)]
net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero
Currently, due to the sequential use of min_t() and clamp_t() macros,
in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is not set, the logic
sets tx_max to 0. This is then used to allocate the data area of the
SKB requested later in cdc_ncm_fill_tx_frame().
This does not cause an issue presently because when memory is
allocated during initialisation phase of SKB creation, more memory
(512b) is allocated than is required for the SKB headers alone (320b),
leaving some space (512b - 320b = 192b) for CDC data (172b).
However, if more elements (for example 3 x u64 = [24b]) were added to
one of the SKB header structs, say 'struct skb_shared_info',
increasing its original size (320b [320b aligned]) to something larger
(344b [384b aligned]), then suddenly the CDC data (172b) no longer
fits in the spare SKB data area (512b - 384b = 128b).
Consequently the SKB bounds checking semantics fails and panics:
By overriding the max value with the default CDC_NCM_NTB_MAX_SIZE_TX
when not offered through the system provided params, we ensure enough
data space is allocated to handle the CDC data, meaning no crash will
occur.