net: sundance: Replace one-element array with non-array object
This one-element array does not appear to be used as an array of
variable size, so replace it with a non-array object of type
struct desc_frag and refactor the rest of the code accordingly.
This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
This issue was found with the help of Coccinelle and audited and fixed
manually.
bnx2x: Replace one-element array with flexible-array member
There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].
This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().
This issue was found with the help of Coccinelle and audited and fixed
manually.
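As a rough illustration of the flexible-array-member style, here is a minimal user-space sketch. The structure and helper names are hypothetical and are not taken from bnx2x; in kernel code the size computation would normally go through the struct_size() helper rather than open-coded arithmetic.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure, for illustration only (not from bnx2x). */
struct frag_list {
	unsigned int count;
	unsigned int frags[];	/* flexible-array member: must be last */
};

/* Bytes needed for the header plus n trailing elements. */
static size_t frag_list_size(size_t n)
{
	return offsetof(struct frag_list, frags) + n * sizeof(unsigned int);
}

/* Allocate a zeroed list with room for n trailing elements. */
static struct frag_list *frag_list_alloc(unsigned int n)
{
	struct frag_list *fl = calloc(1, frag_list_size(n));

	if (fl)
		fl->count = n;
	return fl;
}
```

With a one-element array, sizeof() of the structure already counts one element, which makes size computations error-prone; with a flexible-array member the trailing storage is always accounted for explicitly.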
Vladimir Oltean [Fri, 4 Feb 2022 23:03:21 +0000 (01:03 +0200)]
net: mscc: ocelot: fix all IP traffic getting trapped to CPU with PTP over IP
The filters for the PTP trap keys are incorrectly configured, in the
sense that is2_entry_set() only looks at trap->key.ipv4.dport or
trap->key.ipv6.dport if trap->key.ipv4.proto or trap->key.ipv6.proto is
set to IPPROTO_TCP or IPPROTO_UDP.
But we don't do that, so is2_entry_set() goes through the "else" branch
of the IP protocol check, and ends up installing a rule for "Any IP
protocol match" (because msk is also 0). The UDP port is ignored.
This means that when we run "ptp4l -i swp0 -4", all IP traffic is
trapped to the CPU, which hinders bridging.
Fix this by specifying the IP protocol in the VCAP IS2 filters for PTP
over UDP.
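The branch selection described above can be modelled roughly as follows. This is a simplified sketch, not the driver's is2_entry_set(): struct ip_key, enum rule_kind and classify_key() are hypothetical names introduced only for illustration.

```c
#include <stdint.h>
#include <netinet/in.h>	/* IPPROTO_TCP, IPPROTO_UDP */

/* Hypothetical, simplified filter key made of value/mask pairs. */
struct ip_key {
	uint8_t proto_value, proto_mask;
	uint16_t dport_value, dport_mask;
};

enum rule_kind { RULE_L4_PORT, RULE_ANY_IP_PROTO, RULE_OTHER_IP_PROTO };

/* Decide which kind of rule a key would end up installing. */
static enum rule_kind classify_key(const struct ip_key *k)
{
	if (k->proto_value == IPPROTO_TCP || k->proto_value == IPPROTO_UDP)
		return RULE_L4_PORT;		/* dport is taken into account */
	if (k->proto_mask == 0)
		return RULE_ANY_IP_PROTO;	/* matches all IP traffic, dport ignored */
	return RULE_OTHER_IP_PROTO;
}
```

A key that sets only the destination port falls into the "any IP protocol" case, while a key that also sets the protocol to UDP reaches the branch that honours the port.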
Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets")
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Main goal of this series is to be able to detect the following case
which apparently is still haunting us.
dev_hold_track(dev, tracker_1, GFP_ATOMIC);
dev_hold(dev);
dev_put(dev);
dev_put(dev); // Should complain loudly here.
dev_put_track(dev, tracker_1); // instead of here (as before this series)
v2: third patch:
I replaced the dev_put() in linkwatch_do_dev() with __dev_put().
====================
Eric Dumazet [Fri, 4 Feb 2022 22:42:37 +0000 (14:42 -0800)]
net: refine dev_put()/dev_hold() debugging
We are still chasing some syzbot reports where we think a rogue dev_put()
is called with no corresponding prior dev_hold().
Unfortunately it eats a reference on dev->dev_refcnt taken by innocent
dev_hold_track(), meaning that the refcount saturation splat comes
too late to be useful.
Make sure that untracked dev_put() and dev_hold() calls make better use of
the CONFIG_NET_DEV_REFCNT_TRACKER=y debug infrastructure:
The prior patch in the series allowed ref_tracker_alloc() and ref_tracker_free()
to be called with a NULL @trackerp parameter, and to use a separate refcount
dedicated to detecting too many put() calls, even in the following case:
dev_hold_track(dev, tracker_1, GFP_ATOMIC);
dev_hold(dev);
dev_put(dev);
dev_put(dev); // Should complain loudly here.
dev_put_track(dev, tracker_1); // instead of here
Add clarification about netdev_tracker_alloc() role.
v2: I replaced the dev_put() in linkwatch_do_dev()
with __dev_put() because callers called netdev_tracker_free().
Eric Dumazet [Fri, 4 Feb 2022 22:42:36 +0000 (14:42 -0800)]
ref_tracker: add a count of untracked references
We are still chasing a netdev refcount imbalance, and we suspect
we have one rogue dev_put() that is consuming a reference taken
from a dev_hold_track().
To detect this case, allow ref_tracker_alloc() and ref_tracker_free()
to be called with a NULL @trackerp parameter, and use a dedicated
refcount_t just for them.
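The idea of counting untracked references separately can be sketched with a small user-space model. The names below (struct ref_model, model_hold(), model_put()) are hypothetical and are not the ref_tracker API; the kernel uses a refcount_t and emits a warning where this sketch returns false.

```c
#include <stdbool.h>

/* Hypothetical user-space model, not the kernel's ref_tracker API. */
struct ref_model {
	int tracked;	/* holds taken with a tracker */
	int untracked;	/* holds taken with a NULL tracker */
};

static void model_hold(struct ref_model *m, bool with_tracker)
{
	if (with_tracker)
		m->tracked++;
	else
		m->untracked++;
}

/* Returns false when a put has no matching hold of the same kind. */
static bool model_put(struct ref_model *m, bool with_tracker)
{
	int *cnt = with_tracker ? &m->tracked : &m->untracked;

	if (*cnt == 0)
		return false;	/* the rogue put is caught here */
	(*cnt)--;
	return true;
}
```

Because the untracked count is separate, a rogue untracked put can no longer silently consume a reference that was taken with a tracker.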
Jakub Kicinski [Fri, 4 Feb 2022 15:59:27 +0000 (07:59 -0800)]
net: dsa: realtek: don't default Kconfigs to y
We generally default the vendor to y and the drivers themselves
to n. NET_DSA_REALTEK, however, selects a whole bunch of things,
so it's not a pure "vendor selection" knob. Let's default it all
to n.
David S. Miller [Sat, 5 Feb 2022 15:13:52 +0000 (15:13 +0000)]
Merge branch 'gro-minor-opts'
Paolo Abeni says:
====================
gro: a couple of minor optimizations
This series collects a couple of small optimizations for the GRO engine,
slightly reducing the number of cycles for dev_gro_receive().
The delta is within noise range in tput tests, but with big TCP coming,
every cycle saved from the GRO engine will count - I hope ;)
v1 -> v2:
- a few cleanups suggested by Alexander(s)
- moved away the more controversial 3rd patch
====================
Paolo Abeni [Fri, 4 Feb 2022 11:28:37 +0000 (12:28 +0100)]
net: gro: minor optimization for dev_gro_receive()
While inspecting some perf reports, I noticed that the compiler
emits suboptimal code for the napi CB initialization, fetching
and storing the memory for the flags bitfield multiple times.
This is with gcc 10.3.1, but I observed the same with older compiler
versions.
We can help the compiler do a better job by clearing several
fields at once using a u32 alias. The generated code is considerably
smaller, with the same number of conditionals.
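The u32 alias technique can be sketched as below. The layout and field names are hypothetical and only loosely modelled on a control block with several small flags; they are not the actual napi CB definition.

```c
#include <stdint.h>

/* Hypothetical flags, small enough to fit in 32 bits. */
struct cb_flags {
	uint16_t flush;
	uint8_t same_flow:1;
	uint8_t encap_mark:1;
	uint8_t is_fou:1;
	uint8_t pad;
};

_Static_assert(sizeof(struct cb_flags) <= sizeof(uint32_t),
	       "alias must cover every flag field");

struct cb {
	uint32_t len;
	union {
		struct cb_flags f;
		uint32_t zeroed;	/* alias over all of the fields in f */
	} u;
};

/* One 32-bit store instead of several bitfield read-modify-writes. */
static void cb_reset_flags(struct cb *c)
{
	c->u.zeroed = 0;
}
```

The static assertion guards the assumption that the alias really spans all the fields being cleared; without it, adding a field could silently leave part of the state uninitialized.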
Paolo Abeni [Fri, 4 Feb 2022 11:28:36 +0000 (12:28 +0100)]
net: gro: avoid re-computing truesize twice on recycle
After commit 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb
carring sock reference") and commit af352460b465 ("net: fix GRO
skb truesize update") the truesize of the skb with stolen head is
properly updated by the GRO engine, so we no longer need to reset
it at recycle time.
v1 -> v2:
- clarify the commit message (Alexander)
Dan Carpenter [Fri, 4 Feb 2022 10:03:36 +0000 (13:03 +0300)]
net: dsa: qca8k: check correct variable in qca8k_phy_eth_command()
This is a copy and paste bug. It was supposed to check "clear_skb"
instead of "write_skb".
Fixes: 2cd548566384 ("net: dsa: qca8k: add support for phy read/write with mgmt Ethernet")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Horatiu Vultur [Fri, 4 Feb 2022 09:14:52 +0000 (10:14 +0100)]
net: lan966x: Update mdb when enabling/disabling mcast_snooping
When multicast snooping is disabled, the mdb entries should be
removed from the HW, but they still need to be kept in memory so that
they can be restored when mcast_snooping is enabled again.
Horatiu Vultur [Fri, 4 Feb 2022 09:14:51 +0000 (10:14 +0100)]
net: lan966x: Implement the callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED
The callback allows multicast snooping to be enabled or disabled.
When the snooping is enabled, all IGMP and MLD frames are redirected to
the CPU, therefore make sure not to set the skb flag 'offload_fwd_mark'.
The HW will not flood multicast ipv4/ipv6 data frames.
When the snooping is disabled, the HW will flood IGMP, MLD and multicast
ipv4/ipv6 frames according to the mcast_flood flag.
Paul Blakey [Thu, 3 Feb 2022 08:44:30 +0000 (10:44 +0200)]
net/sched: Enable tc skb ext allocation on chain miss only when needed
Currently the tc skb extension is used to send miss info from
tc to the ovs datapath module, and from the driver to tc. For the
tc-to-ovs miss it is currently always allocated, even if it will not
be used by the ovs datapath (as it depends on a requested feature).
Export the static key which is used by the openvswitch module to
guard this code path as well, so it will be skipped if the ovs
datapath doesn't need it. Enable this code path once the
ovs datapath needs it.
====================
mptcp: Improve set-flags command and update self tests
Patches 1-3 allow more flexibility in the combinations of features and
flags allowed with the MPTCP_PM_CMD_SET_FLAGS netlink command, and add
self test case coverage for the new functionality.
Patches 4-6 and 9 refactor the mptcp_join.sh self tests to allow them to
configure all of the test cases using either the pm_nl_ctl utility (part
of the mptcp self tests) or the 'ip mptcp' command (from iproute2). The
default remains to use pm_nl_ctl.
Patches 7 and 8 update the pm_netlink.sh self tests to cover the use of
endpoint ids to set endpoint flags (instead of just addresses).
====================
Geliang Tang [Sat, 5 Feb 2022 00:03:37 +0000 (16:03 -0800)]
selftests: mptcp: set ip_mptcp in command line
This patch added a command line option '-i' for mptcp_join.sh to use
'ip mptcp' commands instead of using 'pm_nl_ctl' commands to deal with
PM netlink.
Geliang Tang [Sat, 5 Feb 2022 00:03:34 +0000 (16:03 -0800)]
selftests: mptcp: add wrapper for setting flags
This patch implemented a new function named pm_nl_set_endpoint(), wrapped
the PM netlink commands 'ip mptcp endpoint change flags' and 'pm_nl_ctl
set flags' in it, and used a new argument 'ip_mptcp' to choose which one
to use to set the flags of the PM endpoint.
'ip mptcp' used the ID number argument to find out the address to change
flags, while 'pm_nl_ctl' used the address and port number arguments. So
we need to parse the address ID from the PM dump output as well as the
address and port number.
Used this wrapper in do_transfer() instead of using the pm_nl_ctl command
directly.
Geliang Tang [Sat, 5 Feb 2022 00:03:33 +0000 (16:03 -0800)]
selftests: mptcp: add wrapper for showing addrs
This patch implemented a new function named pm_nl_show_endpoints(), wrapped
the PM netlink commands 'ip mptcp endpoint show' and 'pm_nl_ctl dump' in
it, and used a new argument 'ip_mptcp' to choose which one to use to show all
the PM endpoints.
Used this wrapper in do_transfer() instead of using the pm_nl_ctl commands
directly.
The original 'pos+=5' in the removing tests only works for the output of
'pm_nl_ctl show':
id 1 flags subflow 10.0.1.1
It doesn't work for the output of 'ip mptcp endpoint show':
10.0.1.1 id 1 subflow
So implemented a more flexible approach to get the address ID from the PM
dump output to fit for both commands.
Wrapped the PM netlink commands 'ip mptcp' and 'pm_nl_ctl' in them, and
used a new argument 'ip_mptcp' to choose which one to use for setting the
PM limits, adding or deleting the PM endpoint.
Used the wrappers in all the selftests in mptcp_join.sh instead of using
the pm_nl_ctl commands directly.
Geliang Tang [Sat, 5 Feb 2022 00:03:31 +0000 (16:03 -0800)]
selftests: mptcp: add backup with port testcase
This patch added the backup testcase using an address with a port number.
The original backup tests only work for the output of 'pm_nl_ctl dump'
without the port number. They choose the last item in the dump to parse
the address in it, and in this case, the address is shown at the end
of the item.
But this doesn't work for the dump with the port number. In this case, the
port number is shown at the end of the item, not the address.
So implemented a more flexible approach to get the address and the port
number from the dump, one that also handles the port number case.
Geliang Tang [Sat, 5 Feb 2022 00:03:29 +0000 (16:03 -0800)]
mptcp: allow to use port and non-signal in set_flags
It's illegal to use both port and non-signal flags when adding an address.
But it's legal to use both of them for setting flags, which always uses
non-signal flags, backup or fullmesh.
This patch moves this non-signal flag with port check from
mptcp_pm_parse_addr() to mptcp_nl_cmd_add_addr(). Do the check only when
adding addresses, not setting flags or deleting addresses.
====================
Support for the IOAM insertion frequency
The insertion frequency is represented as "k/n", meaning IOAM will be
added to {k} packets over {n} packets, with 0 < k <= n and
1 <= {k,n} <= 1000000. Therefore, it provides the following percentages
of insertion frequency: [0.0001% (min) ... 100% (max)].
Not only does this solution allow an operator to apply dynamic frequencies
based on the current traffic load, but it also provides some
flexibility, i.e., by distinguishing similar cases (e.g., "1/2" and
"2/4").
"1/2" = Y N Y N Y N Y N ...
"2/4" = Y Y N N Y Y N N ...
====================
Justin Iurman [Wed, 2 Feb 2022 14:25:54 +0000 (15:25 +0100)]
ipv6: ioam: Insertion frequency in lwtunnel output
Add support for the IOAM insertion frequency inside its lwtunnel output
function. This patch introduces a new (atomic) counter for packets,
based on which the algorithm will decide if IOAM should be added or not.
Default frequency is "1/1" (i.e., applied to all packets) for backward
compatibility. The iproute2 patch is ready and will be submitted as soon
as this one is accepted.
Previous iproute2 command:
ip -6 ro ad fc00::1/128 encap ioam6 [ mode ... ] ...
New iproute2 command:
ip -6 ro ad fc00::1/128 encap ioam6 [ freq k/n ] [ mode ... ] ...
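A counter-based "k/n" decision can be sketched as follows. This is one way to obtain the "Y N Y N" and "Y Y N N" patterns from a packet counter, assuming a user-space C11 atomic; struct ioam_freq and ioam_should_insert() are hypothetical names and not necessarily what the patch implements.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-route state: frequency "k/n" plus a packet counter. */
struct ioam_freq {
	uint32_t k, n;
	atomic_uint pkt_cnt;
};

/* Insert IOAM into the first k packets of every window of n packets. */
static bool ioam_should_insert(struct ioam_freq *f)
{
	unsigned int seq = atomic_fetch_add(&f->pkt_cnt, 1);

	return seq % f->n < f->k;
}
```

With k = n = 1 every packet gets IOAM, which corresponds to the backward-compatible default of "1/1".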
Herbert Xu [Wed, 2 Feb 2022 06:46:48 +0000 (17:46 +1100)]
crypto: api - Move cryptomgr soft dependency into algapi
The soft dependency on cryptomgr is only needed in algapi because
if algapi isn't present then no algorithms can be loaded. This
also fixes the case where api is built-in but algapi is built as
a module as the soft dependency would otherwise get lost.
Eric Dumazet [Thu, 3 Feb 2022 22:55:47 +0000 (14:55 -0800)]
tcp: take care of mixed splice()/sendmsg(MSG_ZEROCOPY) case
syzbot found that mixing sendpage() and sendmsg(MSG_ZEROCOPY)
calls over the same TCP socket would again trigger the
infamous warning in inet_sock_destruct()
WARN_ON(sk_forward_alloc_get(sk));
While Talal took into account a mix of regular copied data
and MSG_ZEROCOPY one in the same skb, the sendpage() path
has been forgotten.
We want the charging to happen for sendpage(), because
pages could be coming from a pipe. What is missing is the
downgrading of pure zerocopy status to make sure
sk_forward_alloc will stay synced.
Add tcp_downgrade_zcopy_pure() helper so that we can
use it from the two callers.
Yonghong Song [Fri, 4 Feb 2022 21:43:55 +0000 (13:43 -0800)]
libbpf: Fix build issue with llvm-readelf
There are cases where the clang compiler is packaged in such a way that
readelf is a symbolic link to llvm-readelf. In such cases,
llvm-readelf will be used instead of the default binutils readelf,
and the following error will appear during the libbpf build:
Warning: Num of global symbols in
/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/sharedobjs/libbpf-in.o (367)
does NOT match with num of versioned symbols in
/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf.so libbpf.map (383).
Please make sure all LIBBPF_API symbols are versioned in libbpf.map.
--- /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_global_syms.tmp ...
+++ /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_versioned_syms.tmp ...
@@ -324,6 +324,22 @@
btf__str_by_offset
btf__type_by_id
btf__type_cnt
+LIBBPF_0.0.1
+LIBBPF_0.0.2
+LIBBPF_0.0.3
+LIBBPF_0.0.4
+LIBBPF_0.0.5
+LIBBPF_0.0.6
+LIBBPF_0.0.7
+LIBBPF_0.0.8
+LIBBPF_0.0.9
+LIBBPF_0.1.0
+LIBBPF_0.2.0
+LIBBPF_0.3.0
+LIBBPF_0.4.0
+LIBBPF_0.5.0
+LIBBPF_0.6.0
+LIBBPF_0.7.0
libbpf_attach_type_by_name
libbpf_find_kernel_btf
libbpf_find_vmlinux_btf_id
make[2]: *** [Makefile:184: check_abi] Error 1
make[1]: *** [Makefile:140: all] Error 2
The above failure is due to different printouts for some ABS
versioned symbols. For example, with the same libbpf.so,
$ /bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so | grep "LIBBPF" | grep ABS
134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0
202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0
...
$ /opt/llvm/bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so | grep "LIBBPF" | grep ABS
134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0@@LIBBPF_0.5.0
202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0@@LIBBPF_0.6.0
...
The binutils readelf doesn't print out the symbol LIBBPF_* version and llvm-readelf does.
Such a difference caused libbpf build failure with llvm-readelf.
The proposed fix filters out all ABS symbols as they are not part of the comparison.
This works for both binutils readelf and llvm-readelf.
Linus Torvalds [Fri, 4 Feb 2022 23:27:45 +0000 (15:27 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Seven fixes, six of which are fairly obvious driver fixes.
The one core change to the device budget depth is to try to ensure
that if the default depth is large (which can produce quite a sizeable
bitmap allocation per device), we give back the memory we don't need
if there's a queue size reduction in slave_configure (which happens to
a lot of devices)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: hisi_sas: Fix setting of hisi_sas_slot.is_internal
scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task
scsi: pm8001: Fix use-after-free for aborted TMF sas_task
scsi: pm8001: Fix warning for undescribed param in process_one_iomb()
scsi: core: Reallocate device's budget map on queue depth change
scsi: bnx2fc: Make bnx2fc_recv_frame() mp safe
scsi: pm80xx: Fix double completion for SATA devices
Linus Torvalds [Fri, 4 Feb 2022 23:22:35 +0000 (15:22 -0800)]
Merge tag 'pci-v5.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:
- Restructure j721e_pcie_probe() so we don't dereference a NULL pointer
(Bjorn Helgaas)
- Add a kirin_pcie_data struct to identify different Kirin variants to
fix probe failure for controllers with an internal PHY (Bjorn
Helgaas)
* tag 'pci-v5.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: kirin: Add dev struct for of_device_get_match_data()
PCI: j721e: Initialize pcie->cdns_pcie before using it
Bjorn Helgaas [Wed, 2 Feb 2022 15:52:41 +0000 (09:52 -0600)]
PCI: kirin: Add dev struct for of_device_get_match_data()
Bean reported that a622435fbe1a ("PCI: kirin: Prefer
of_device_get_match_data()") broke kirin_pcie_probe() because it assumed
match data of 0 was a failure when in fact, it meant the match data was
"(void *)PCIE_KIRIN_INTERNAL_PHY".
Therefore, probing of "hisilicon,kirin960-pcie" devices failed with -EINVAL
and an "OF data missing" message.
Add a struct kirin_pcie_data to encode the PHY type. Then the result of
of_device_get_match_data() should always be a non-NULL pointer to a struct
kirin_pcie_data that contains the PHY type.
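The pitfall can be reproduced in a few lines of user-space C. The names below (enum phy_type, struct pcie_data and the two lookup helpers) are hypothetical stand-ins, not the kirin driver's definitions.

```c
#include <stddef.h>
#include <stdint.h>

enum phy_type { PHY_INTERNAL /* == 0 */, PHY_EXTERNAL };

/* Old scheme: the enum value itself is stored in the void * match data. */
static const void *old_match_data(enum phy_type t)
{
	return (const void *)(uintptr_t)t;	/* PHY_INTERNAL becomes NULL */
}

/* New scheme: match data always points at a real object. */
struct pcie_data {
	enum phy_type phy_type;
};

static const struct pcie_data internal_phy = { PHY_INTERNAL };
static const struct pcie_data external_phy = { PHY_EXTERNAL };

static const void *new_match_data(enum phy_type t)
{
	return t == PHY_INTERNAL ? &internal_phy : &external_phy;
}
```

Under the old scheme a caller cannot tell "no match data" from "internal PHY", since both are a NULL pointer; with a struct, NULL unambiguously means failure.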
Linus Torvalds [Fri, 4 Feb 2022 20:14:58 +0000 (12:14 -0800)]
Merge tag 'for-5.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few fixes and error handling improvements:
- fix deadlock between quota disable and qgroup rescan worker
- fix use-after-free after failure to create a snapshot
- skip warning on unmount after log cleanup failure
- don't start transaction for scrub if the fs is mounted read-only
- tree checker verifies item sizes"
* tag 'for-5.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: skip reserved bytes warning on unmount after log cleanup failure
btrfs: fix use of uninitialized variable at rm device ioctl
btrfs: fix use-after-free after failure to create a snapshot
btrfs: tree-checker: check item_size for dev_item
btrfs: tree-checker: check item_size for inode_item
btrfs: fix deadlock between quota disable and qgroup rescan worker
btrfs: don't start transaction for scrub if the fs is mounted read-only
Sean Young [Tue, 1 Feb 2022 18:38:36 +0000 (18:38 +0000)]
selftests/ir: fix build with ancient kernel headers
Since commit e2bcbd7769ee ("tools headers UAPI: remove stale lirc.h"),
the build of the selftests fails on rhel 8 since its version of
/usr/include/linux/lirc.h has no definition of RC_PROTO_RCMM32, etc [1].
Linus Torvalds [Fri, 4 Feb 2022 20:08:49 +0000 (12:08 -0800)]
Merge tag 'erofs-for-5.17-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"Two fixes related to fsdax cleanup in this cycle and ztailpacking to
fix small compressed data inlining. There is also a trivial cleanup to
rearrange code for better reading.
Summary:
- fix fsdax partition offset misbehavior
- clean up z_erofs_decompressqueue_work() declaration
- fix up EOF lcluster inlining, especially for small compressed data"
* tag 'erofs-for-5.17-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix small compressed files inlining
erofs: avoid unnecessary z_erofs_decompressqueue_work() declaration
erofs: fix fsdax partition offset handling
Axel Rasmussen [Thu, 27 Jan 2022 22:11:15 +0000 (14:11 -0800)]
selftests: fixup build warnings in pidfd / clone3 tests
These are some trivial fixups, which were needed to build the tests with
clang and -Werror. The following issues are fixed:
- Remove various unused variables.
- In child_poll_leader_exit_test, clang isn't smart enough to realize
syscall(SYS_exit, 0) won't return, so it complains we never return
from a non-void function. Add an extra exit(0) to appease it.
- In test_pidfd_poll_leader_exit, ret may be branched on despite being
uninitialized, if we have !use_waitpid. Initialize it to zero to get
the right behavior in that case.
Axel Rasmussen [Thu, 27 Jan 2022 21:29:51 +0000 (13:29 -0800)]
pidfd: fix test failure due to stack overflow on some arches
When running the pidfd_fdinfo_test on arm64, it fails for me. After some
digging, the reason is that the child exits due to SIGBUS, because it
overflows the 1024 byte stack we've reserved for it.
To fix the issue, increase the stack size to 8192 bytes (this number is
somewhat arbitrary, and was arrived at through experimentation -- I kept
doubling until the failure no longer occurred).
Also, let's make the issue easier to debug. wait_for_pid() returns an
ambiguous value: it may return -1 in all of these cases:
1. waitpid() itself returned -1
2. waitpid() returned success, but we found !WIFEXITED(status).
3. The child process exited, but it did so with a -1 exit code.
There's no way for the caller to tell the difference. So, at least log
which occurred, so the test runner can debug things.
While debugging this, I found that we had !WIFEXITED(), because the
child exited due to a signal. This seems like a reasonably common case,
so also print out whether or not we have WIFSIGNALED(), and the
associated WTERMSIG() (if any). This lets us see the SIGBUS I'm fixing
clearly when it occurs.
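A helper that reports which of these cases occurred might look roughly like this. It is a sketch under assumptions: enum wait_outcome, classify_wait() and spawn_child() are hypothetical names and do not reproduce the selftest's wait_for_pid().

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum wait_outcome { WAIT_ERROR, WAIT_EXITED, WAIT_SIGNALED, WAIT_OTHER };

/* Wait for pid and log which case occurred; *detail gets the code or signal. */
static enum wait_outcome classify_wait(pid_t pid, int *detail)
{
	int status;

	if (waitpid(pid, &status, 0) < 0) {
		perror("waitpid");
		return WAIT_ERROR;
	}
	if (WIFEXITED(status)) {
		*detail = WEXITSTATUS(status);
		fprintf(stderr, "child exited with code %d\n", *detail);
		return WAIT_EXITED;
	}
	if (WIFSIGNALED(status)) {
		*detail = WTERMSIG(status);
		fprintf(stderr, "child killed by signal %d\n", *detail);
		return WAIT_SIGNALED;
	}
	fprintf(stderr, "child neither exited nor was signaled\n");
	return WAIT_OTHER;
}

/* Demo helper: child exits with code 3, or raises sig first if sig != 0. */
static pid_t spawn_child(int sig)
{
	pid_t pid = fork();

	if (pid == 0) {
		if (sig)
			raise(sig);
		_exit(3);
	}
	return pid;
}
```

Returning a distinct outcome per case, rather than a bare -1, is what lets a test runner see a signal such as SIGBUS directly in the log.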
Finally, I'm suspicious of allocating the child's stack on our stack.
man clone(2) suggests that the correct way to do this is with mmap(),
and in particular by setting MAP_STACK. So, switch to doing it that way
instead.
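Allocating the child's stack with mmap() could look roughly like the sketch below. The helper names and the CHILD_STACK_SIZE macro are hypothetical; only the 8192-byte size and the use of MAP_STACK come from the description above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

#define CHILD_STACK_SIZE 8192	/* size mentioned in the commit message */

/* Map an anonymous region suitable for use as a child stack. */
static void *alloc_child_stack(size_t size)
{
	void *stack = mmap(NULL, size, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

	return stack == MAP_FAILED ? NULL : stack;
}

/* On architectures where the stack grows down, clone() is given the top. */
static void *child_stack_top(void *stack, size_t size)
{
	return (char *)stack + size;
}
```

The mapping is released with munmap(stack, size) once the child has been reaped.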
Linus Torvalds [Fri, 4 Feb 2022 20:01:57 +0000 (12:01 -0800)]
Merge tag 'block-5.17-2022-02-04' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request
- fix use-after-free in rdma and tcp controller reset (Sagi Grimberg)
- fix the state check in nvmf_ctlr_matches_baseopts (Uday Shankar)
- MD nowait null pointer fix (Song)
- blk-integrity seed advance fix (Martin)
- Fix a dio regression in this merge window (Ilya)
* tag 'block-5.17-2022-02-04' of git://git.kernel.dk/linux-block:
block: bio-integrity: Advance seed correctly for larger interval sizes
nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts()
md: fix NULL pointer deref with nowait but no mddev->queue
block: fix DIO handling regressions in blkdev_read_iter()
nvme-rdma: fix possible use-after-free in transport error_recovery work
nvme-tcp: fix possible use-after-free in transport error_recovery work
nvme: fix a possible use-after-free in controller reset during load
Linus Torvalds [Fri, 4 Feb 2022 19:52:37 +0000 (11:52 -0800)]
Merge tag 'ata-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fixes from Damien Le Moal:
- Sergey volunteered to be a reviewer for the Renesas R-Car SATA driver
and PATA drivers. Update the MAINTAINERS file accordingly.
- Regression fix: add a horkage flag to prevent accessing the log
directory log page with SATADOM-ML 3ME SATA devices as they react
badly to reading that log page (from Anton).
* tag 'ata-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkage
MAINTAINERS: add myself as Renesas R-Car SATA driver reviewer
MAINTAINERS: add myself as PATA drivers reviewer
Linus Torvalds [Fri, 4 Feb 2022 19:45:16 +0000 (11:45 -0800)]
Merge tag 'iommu-fixes-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Warning fixes and a fix for a potential use-after-free in IOMMU core
code
- Another potential memory leak fix for the Intel VT-d driver
- Fix for an IO polling loop timeout issue in the AMD IOMMU driver
* tag 'iommu-fixes-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()
iommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping()
iommu: Fix some W=1 warnings
iommu: Fix potential use-after-free during probe
Linus Torvalds [Fri, 4 Feb 2022 19:38:01 +0000 (11:38 -0800)]
Merge tag 'random-5.17-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
"For this week, we have:
- A fix to make more frequent use of hwgenerator randomness, from
Dominik.
- More cleanups to the boot initialization sequence, from Dominik.
- A fix for an old shortcoming with the ZAP ioctl, from me.
- A workaround for a still unfixed Clang CFI/FullLTO compiler bug,
from me. On one hand, it's a bummer to commit workarounds for
experimental compiler features that have bugs. But on the other, I
think this actually improves the code somewhat, independent of the
bug. So a win-win"
* tag 'random-5.17-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: only call crng_finalize_init() for primary_crng
random: access primary_pool directly rather than through pointer
random: wake up /dev/random writers after zap
random: continually use hwgenerator randomness
lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI
Linus Torvalds [Fri, 4 Feb 2022 19:32:46 +0000 (11:32 -0800)]
Merge tag 'acpi-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix compilation in the case when ACPI is selected and CRC32, depended
on by ACPI after recent changes, is not (Randy Dunlap)"
* tag 'acpi-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: require CRC32 to build
Matteo Croce [Fri, 4 Feb 2022 00:55:19 +0000 (01:55 +0100)]
selftests/bpf: Test bpf_core_types_are_compat() functionality.
Add several tests to check bpf_core_types_are_compat() functionality:
- candidate type name exists and types match
- candidate type name exists but types don't match
- nested func protos at kernel recursion limit
- nested func protos above kernel recursion limit. Such bpf prog
is rejected during the load.
Matteo Croce [Fri, 4 Feb 2022 00:55:18 +0000 (01:55 +0100)]
bpf: Implement bpf_core_types_are_compat().
Adopt libbpf's bpf_core_types_are_compat() for kernel duty by adding
explicit recursion limit of 2 which is enough to handle 2 levels of
function prototypes.
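How an explicit recursion limit bounds nested function prototypes can be sketched with a toy type model. struct type, enum kind and types_are_compat() are hypothetical and far simpler than BTF; the real function also compares parameters and other kinds.

```c
#include <stddef.h>

enum kind { KIND_INT, KIND_FUNC_PROTO };

/* Hypothetical toy type: a func proto only records its return type. */
struct type {
	enum kind kind;
	const struct type *ret;	/* used by KIND_FUNC_PROTO only */
};

/* Returns 1 if compatible, 0 if not, -1 if nesting exceeds the limit. */
static int types_are_compat(const struct type *a, const struct type *b,
			    int level)
{
	if (a->kind != b->kind)
		return 0;
	if (a->kind != KIND_FUNC_PROTO)
		return 1;
	if (level <= 0)
		return -1;	/* too deeply nested: reject */
	return types_are_compat(a->ret, b->ret, level - 1);
}
```

Called with a level of 2, two nested function prototypes are still compared, while a third level of nesting is rejected.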
Linus Torvalds [Fri, 4 Feb 2022 19:24:28 +0000 (11:24 -0800)]
Merge tag 'sound-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes.
The major changes are ASoC core fixes, addressing the DPCM locking
issue after the recent code changes and the potentially invalid
register accesses via control API. Also, HD-audio got a core fix for
Oops at dynamic unbinding.
The rest are device-specific small fixes, including the usual stuff
like HD-audio and USB-audio quirks"
* tag 'sound-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (31 commits)
ALSA: hda: Skip codec shutdown in case the codec is not registered
ALSA: usb-audio: Correct quirk for VF0770
ALSA: Replace acpi_bus_get_device()
Input: wm97xx: Simplify resource management
ALSA: hda/realtek: Add quirk for ASUS GU603
ALSA: hda/realtek: Fix silent output on Gigabyte X570 Aorus Xtreme after reboot from Windows
ALSA: hda/realtek: Fix silent output on Gigabyte X570S Aorus Master (newer chipset)
ALSA: hda/realtek: Add missing fixup-model entry for Gigabyte X570 ALC1220 quirks
ALSA: hda: realtek: Fix race at concurrent COEF updates
ASoC: ops: Check for negative values before reading them
ASoC: rt5682: Fix deadlock on resume
ASoC: hdmi-codec: Fix OOB memory accesses
ASoC: soc-pcm: Move debugfs removal out of spinlock
ASoC: soc-pcm: Fix DPCM lockdep warning due to nested stream locks
ASoC: fsl: Add missing error handling in pcm030_fabric_probe
ALSA: hda: Fix signedness of sscanf() arguments
ALSA: usb-audio: initialize variables that could ignore errors
ALSA: hda: Fix UAF of leds class devs at unbinding
ASoC: qdsp6: q6apm-dai: only stop graphs that are started
ASoC: codecs: wcd938x: fix return value of mixer put function
...
Linus Torvalds [Fri, 4 Feb 2022 19:13:54 +0000 (11:13 -0800)]
Merge tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular fixes for the week. Daniel has agreed to bring back the fbcon
hw acceleration under a CONFIG option for the non-drm fbdev users. We
don't advise turning this on unless you are in the niche that is old
fbdev drivers. Since it's essentially a revert and shouldn't be high
impact, it seemed like a good time to do it now.
Otherwise, i915 and amdgpu fixes are most of it, along with some minor
fixes elsewhere.
fbdev:
- readd fbcon acceleration
i915:
- fix DP monitor via type-c dock
- fix for engine busyness and read timeout with GuC
- use ALLOW_FAIL for error capture buffer allocs
- don't use interruptible lock on error paths
- smatch fix to reject zero sized overlays.
amdgpu:
- mGPU fan boost fix for beige goby
- S0ix fixes
- Cyan skillfish hang fix
- DCN fixes for DCN 3.1
- DCN fixes for DCN 3.01
- Apple retina panel fix
- ttm logic inversion fix
dma-buf:
- heaps: fix potential spectre v1 gadget
kmb:
- fix potential oob access
mxsfb:
- fix NULL ptr deref
nouveau:
- fix potential oob access during BIOS decode"
* tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drm: (24 commits)
drm: mxsfb: Fix NULL pointer dereference
drm/amdgpu: fix logic inversion in check
drm/amd: avoid suspend on dGPUs w/ s2idle support when runtime PM enabled
drm/amd/display: Force link_rate as LINK_RATE_RBR2 for 2018 15" Apple Retina panels
drm/amd/display: revert "Reset fifo after enable otg"
drm/amd/display: watermark latencies is not enough on DCN31
drm/amd/display: Update watermark values for DCN301
drm/amdgpu: fix a potential GPU hang on cyan skillfish
drm/amd: Only run s3 or s0ix if system is configured properly
drm/amd: add support to check whether the system is set to s3
fbcon: Add option to enable legacy hardware acceleration
Revert "fbcon: Disable accelerated scrolling"
Revert "fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)"
drm/i915/pmu: Fix KMD and GuC race on accessing busyness
dma-buf: heaps: Fix potential spectre v1 gadget
drm/amd: Warn users about potential s0ix problems
drm/amd/pm: correct the MGpuFanBoost support for Beige Goby
drm/nouveau: fix off by one in BIOS boundary checking
drm/i915/adlp: Fix TypeC PHY-ready status readout
drm/i915/pmu: Use PM timestamp instead of RING TIMESTAMP for reference
...
Linus Torvalds [Fri, 4 Feb 2022 18:34:19 +0000 (10:34 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"10 patches.
Subsystems affected by this patch series: ipc, MAINTAINERS, and mm
(vmscan, debug, pagemap, kmemleak, and selftests)"
* emailed patches from Andrew Morton <[email protected]>:
kselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner"
MAINTAINERS: update rppt's email
mm/kmemleak: avoid scanning potential huge holes
ipc/sem: do not sleep with a spin lock held
mm/pgtable: define pte_index so that preprocessor could recognize it
mm/page_table_check: check entries at pmd levels
mm/khugepaged: unify collapse pmd clear, flush and free
mm/page_table_check: use unsigned long for page counters and cleanup
mm/debug_vm_pgtable: remove pte entry from the page table
Revert "mm/page_isolation: unset migratetype directly for non Buddy page"
random: only call crng_finalize_init() for primary_crng
crng_finalize_init() returns instantly if it is called for another pool
than primary_crng. The test whether crng_finalize_init() is still required
can be moved to the relevant caller in crng_reseed(), and
crng_need_final_init can be reset to false if crng_finalize_init() is
called with workqueues ready. Then, no previous callsite will call
crng_finalize_init() unless it is needed, and we can get rid of the
superfluous function parameter.
random: access primary_pool directly rather than through pointer
Both crng_initialize_primary() and crng_init_try_arch_early() are
only called for the primary_pool. Accessing it directly instead of
through a function parameter simplifies the code.
When account() is called, and the amount of entropy dips below
random_write_wakeup_bits, we wake up the random writers, so that they
can write some more in. However, the RNDZAPENTCNT/RNDCLEARPOOL ioctl
sets the entropy count to zero -- a potential reduction just like
account() -- but does not unblock writers. This commit adds the missing
logic to that ioctl to unblock waiting writers.
The rngd kernel thread may sleep indefinitely if the entropy count is
kept above random_write_wakeup_bits by other entropy sources. To make
best use of multiple sources of randomness, mix entropy from hardware
RNGs into the pool at least once within CRNG_RESEED_INTERVAL.
lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI
blake2s_compress_generic is weakly aliased by blake2s_compress. The
current harness for function selection uses a function pointer, which is
ordinarily inlined and resolved at compile time. But when Clang's CFI is
enabled, CFI still triggers when making an indirect call via a weak
symbol. This seems like a bug in Clang's CFI, as though it's bucketing
weak symbols and strong symbols differently. It also only seems to
trigger when "full LTO" mode is used, rather than "thin LTO".
Nonetheless, the function pointer method isn't so terrific anyway, so
this patch replaces it with a simple boolean, which also gets inlined
away. This successfully works around the Clang bug.
In general, I'm not too keen on all of the indirection involved here; it
clearly does more harm than good. Hopefully the whole thing can get
cleaned up down the road when lib/crypto is overhauled more
comprehensively. But for now, we go with a simple bandaid.
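The change in shape described above can be sketched in user space roughly as follows. This is not the lib/crypto code: compress_generic(), compress_arch(), hash_update_ptr() and hash_update_flag() are hypothetical stand-ins, and the "compression" is a toy mixing function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for two compression back ends. */
static inline uint32_t compress_generic(uint32_t state, const uint8_t *block,
					size_t len)
{
	for (size_t i = 0; i < len; i++)
		state = state * 31u + block[i];
	return state;
}

static inline uint32_t compress_arch(uint32_t state, const uint8_t *block,
				     size_t len)
{
	/* Same result as the generic version; a real one would use SIMD. */
	return compress_generic(state, block, len);
}

/* Old shape: the caller selects the back end through a function pointer. */
typedef uint32_t (*compress_fn)(uint32_t, const uint8_t *, size_t);

static inline uint32_t hash_update_ptr(uint32_t state, const uint8_t *in,
				       size_t len, compress_fn compress)
{
	return compress(state, in, len);
}

/* New shape: the caller passes a boolean, so both targets are direct calls. */
static inline uint32_t hash_update_flag(uint32_t state, const uint8_t *in,
					size_t len, bool force_generic)
{
	if (force_generic)
		return compress_generic(state, in, len);
	return compress_arch(state, in, len);
}
```

When the boolean is a compile-time constant at the call site, the compiler can drop the unused branch after inlining, which is the property the commit message relies on.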
Linus Torvalds [Fri, 4 Feb 2022 17:54:02 +0000 (09:54 -0800)]
Merge tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A patch to make it possible to disable zero copy path in the messenger
to avoid checksum or authentication tag mismatches and ensuing session
resets in case the destination buffer isn't guaranteed to be stable"
* tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client:
libceph: optionally use bounce buffer on recv path in crc mode
libceph: make recv path in secure mode work the same as send path
Linus Torvalds [Fri, 4 Feb 2022 17:44:42 +0000 (09:44 -0800)]
Merge tag '9p-for-5.17-rc3' of git://github.com/martinetd/linux
Pull 9p fix from Dominique Martinet:
"Fix 'cannot walk open fid' rule
The 9p 'walk' operation requires fid arguments to not originate from
an open or create call and we've missed that for a while as the
servers we regularly run tests with don't enforce the check and no
active reviewer knew about the rule.
Both reporters confirmed reverting this patch fixes things for them
and looking at it further wasn't actually required... Will take more
time for follow up and enforcing the rule more thoroughly later"
* tag '9p-for-5.17-rc3' of git://github.com/martinetd/linux:
Revert "fs/9p: search open fids first"
Linus Torvalds [Fri, 4 Feb 2022 17:34:37 +0000 (09:34 -0800)]
Merge tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"SMB3 client fixes including:
- multiple fscache related fixes, reenabling ability to read/write to
cached files for cifs.ko (that was temporarily disabled for cifs.ko
a few weeks ago due to the recent fscache changes)
- also includes a new fscache helper function ("query_occupancy")
used by above
- fix for multiuser mounts and NTLMSSP auth (workstation name) for
stable
- fix locking ordering problem in multichannel code
- trivial malformed comment fix"
* tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix workstation_name for multiuser mounts
Invalidate fscache cookie only when inode attributes are changed.
cifs: Fix the readahead conversion to manage the batch when reading from cache
cifs: Implement cache I/O by accessing the cache directly
netfs, cachefiles: Add a method to query presence of data in the cache
cifs: Transition from ->readpages() to ->readahead()
cifs: unlock chan_lock before calling cifs_put_tcp_session
Fix a warning about a malformed kernel doc comment in cifs
Shuah Khan [Fri, 4 Feb 2022 04:49:45 +0000 (20:49 -0800)]
kselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner"
With this change, userfaultfd fails to build with an undefined reference
to swap():
userfaultfd.c: In function `userfaultfd_stress':
userfaultfd.c:1530:17: warning: implicit declaration of function `swap'; did you mean `swab'? [-Wimplicit-function-declaration]
1530 | swap(area_src, area_dst);
| ^~~~
| swab
/usr/bin/ld: /tmp/ccDGOAdV.o: in function `userfaultfd_stress':
userfaultfd.c:(.text+0x549e): undefined reference to `swap'
/usr/bin/ld: userfaultfd.c:(.text+0x54bc): undefined reference to `swap'
collect2: error: ld returned 1 exit status
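The compiler output above suggests that no header visible to this userspace test declares swap(). For illustration only, a local definition along the following lines would be one way to provide it. The macro body and the swap_areas() helper are assumptions, not part of the revert, and the macro relies on the GNU __typeof__ extension (GCC or Clang).

```c
/* Local stand-in for a swap() helper; assumes GCC or Clang (__typeof__). */
#define swap(a, b) \
	do { __typeof__(a) tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Hypothetical helper mirroring the swap(area_src, area_dst) call site. */
static void swap_areas(char **area_src, char **area_dst)
{
	swap(*area_src, *area_dst);
}
```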
Lang Yu [Fri, 4 Feb 2022 04:49:37 +0000 (20:49 -0800)]
mm/kmemleak: avoid scanning potential huge holes
When using devm_request_free_mem_region() and devm_memremap_pages() to
add ZONE_DEVICE memory, if the requested free mem region's end pfn is
huge (e.g., 0x400000000), node_end_pfn() will also be huge (see
move_pfn_range_to_zone()). This creates a huge hole between
node_start_pfn() and node_end_pfn().
We found that on some AMD APUs, amdkfd requested such a free mem region
and created a huge hole. In that case, the following code snippet just
busy-loops on test_bit() over the huge hole.
  for (pfn = start_pfn; pfn < end_pfn; pfn++) {
	struct page *page = pfn_to_online_page(pfn);
	if (!page)
		continue;
	...
  }
Mike Rapoport [Fri, 4 Feb 2022 04:49:29 +0000 (20:49 -0800)]
mm/pgtable: define pte_index so that preprocessor could recognize it
Since commit 974b9b2c68f3 ("mm: consolidate pte_index() and
pte_offset_*() definitions") pte_index is a static inline and there is
no define for it that can be recognized by the preprocessor. As a
result, vm_insert_pages() uses a slower loop over vm_insert_page() instead
of insert_pages() that amortizes the cost of spinlock operations when
inserting multiple pages.
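The preprocessor issue can be illustrated with a small sketch. A static inline function is invisible to #ifdef, so a self-referential define is a common way to make it detectable. The constants and the HAVE_BATCHED_INSERT flag below are hypothetical and only stand in for the fast-path selection.

```c
/* Hypothetical constants for illustration; real values are per-architecture. */
#define PAGE_SHIFT   12
#define PTRS_PER_PTE 512

static inline unsigned long pte_index(unsigned long address)
{
	return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}
/* Self-referential define: "#ifdef pte_index" is now true, calls are unchanged. */
#define pte_index pte_index

#ifdef pte_index
#define HAVE_BATCHED_INSERT 1	/* hypothetical flag standing in for the fast path */
#else
#define HAVE_BATCHED_INSERT 0
#endif
```

Without the define, the #ifdef test fails even though the function exists, and the slower fallback is compiled in.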
Pasha Tatashin [Fri, 4 Feb 2022 04:49:24 +0000 (20:49 -0800)]
mm/page_table_check: check entries at pmd levels
syzbot detected a case where the page table counters were not properly
updated.
syzkaller login: ------------[ cut here ]------------
kernel BUG at mm/page_table_check.c:162!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 3099 Comm: pasha Not tainted 5.16.0+ #48
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO4
RIP: 0010:__page_table_check_zero+0x159/0x1a0
Call Trace:
free_pcp_prepare+0x3be/0xaa0
free_unref_page+0x1c/0x650
free_compound_page+0xec/0x130
free_transhuge_page+0x1be/0x260
__put_compound_page+0x90/0xd0
release_pages+0x54c/0x1060
__pagevec_release+0x7c/0x110
shmem_undo_range+0x85e/0x1250
...
The repro involved having a huge page that is split due to a uprobe event
temporarily replacing one of the pages in the huge page. Later the huge
page was combined again, but the counters were off, as the PTE level was
not properly updated.
Make sure that when PMD is cleared and prior to freeing the level the
PTEs are updated.
Pasha Tatashin [Fri, 4 Feb 2022 04:49:20 +0000 (20:49 -0800)]
mm/khugepaged: unify collapse pmd clear, flush and free
Unify the code that flushes, clears pmd entry, and frees the PTE table
level into a new function collapse_and_free_pmd().
This cleanup is useful as in the next patch we will add another call to
this function to iterate through PTE prior to freeing the level for page
table check.
Commit 721fb891ad0b ("mm/page_isolation: unset migratetype directly for
non Buddy page") causes memory that should be in buddy to disappear by
mistake. move_freepages_block() moves all pages in the pageblock instead
of only the pages indicated by the input parameter, so if the input pages
are not in buddy but other pages in the pageblock are, those pages end up
out of control.
Hou Tao [Sun, 30 Jan 2022 09:29:15 +0000 (17:29 +0800)]
bpf, arm64: Enable kfunc call
Since commit b2eed9b58811 ("arm64/kernel: kaslr: reduce module
randomization range to 2 GB"), for arm64 whether KASLR is enabled
or not, the module is placed within 2GB of the kernel region, so
s32 in bpf_kfunc_desc is sufficient to represent the offset of a
module function relative to __bpf_call_base. The only thing needed
is to override bpf_jit_supports_kfunc_call().
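The range argument can be checked with a short sketch. offset_fits_s32() is a hypothetical helper, not a kernel function. It only shows that any two addresses less than 2GB apart yield a difference that fits in a signed 32-bit field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when (func - base) can be stored in a signed 32-bit field. */
static bool offset_fits_s32(uint64_t func, uint64_t base)
{
	int64_t off = (int64_t)(func - base);

	return off >= INT32_MIN && off <= INT32_MAX;
}
```

An offset of exactly 2GB in the positive direction no longer fits, which is why the 2GB placement guarantee matters.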
Joerg Roedel [Fri, 4 Feb 2022 11:55:37 +0000 (12:55 +0100)]
iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()
The polling loop for the register change in iommu_ga_log_enable() needs
to have a udelay() in it. Otherwise the CPU might be faster than the
IOMMU hardware and wrongly trigger the WARN_ON() further down the code
stream. Use a 10us udelay(), as there is some hardware where
activation of the GA log can take more than 100ms.
A future optimization should move the activation check of the GA log
to the point where it gets used for the first time. But that is a
bigger change and not suitable for a fix.
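The effect of the delay on the polling budget can be modelled in user space. The sketch below is a simulation under assumptions, not the driver code: struct fake_dev, the iteration cap and the per-iteration cost are all hypothetical, and simulated time advances by step_us on each pass in place of a real udelay().

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated device: reports "running" once now_us reaches ready_at_us. */
struct fake_dev {
	uint64_t now_us;
	uint64_t ready_at_us;
};

static bool dev_running(const struct fake_dev *d)
{
	return d->now_us >= d->ready_at_us;
}

/*
 * Poll up to max_iter times. step_us is the time one iteration takes,
 * i.e. the register read plus any udelay() between reads.
 */
static bool poll_enable(struct fake_dev *d, unsigned int max_iter,
			unsigned int step_us)
{
	for (unsigned int i = 0; i < max_iter; i++) {
		if (dev_running(d))
			return true;
		d->now_us += step_us;
	}
	return false;	/* the caller would warn here */
}
```

With a hypothetical cap of 100000 iterations, a 10us step gives a budget of about one second, while a 1us step (no delay, only the read) gives about 100ms, which a device needing 150ms would exceed.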
ixgbevf: Require large buffers for build_skb on 82599VF
From 4.17 onwards the ixgbevf driver uses build_skb() to build an skb
around new data in the page buffer shared with the ixgbe PF.
This uses either a 2K or 3K buffer, and offsets the DMA mapping by
NET_SKB_PAD + NET_IP_ALIGN. When using a smaller buffer RXDCTL is set to
ensure the PF does not write a full 2K bytes into the buffer, which is
actually 2K minus the offset.
However on the 82599 virtual function, the RXDCTL mechanism is not
available. The driver attempts to work around this by using the SET_LPE
mailbox method to lower the maximum frame size, but the ixgbe PF driver
ignores this in order to keep the PF and all VFs in sync[0].
This means the PF will write up to the full 2K set in SRRCTL, causing it
to write NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the buffer.
With 4K pages split into two buffers, this means it either writes
NET_SKB_PAD + NET_IP_ALIGN bytes past the first buffer (and into the
second), or NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the DMA
mapping.
Avoid this by only enabling build_skb when using "large" buffers (3K).
These are placed in each half of an order-1 page, preventing the PF from
writing past the end of the mapping.
[0]: Technically it only ever raises the max frame size, see
ixgbe_set_vf_lpe() in ixgbe_sriov.c
Fixes: f15c5ba5b6cd ("ixgbevf: add support for using order 1 pages to receive large frames")
Signed-off-by: Samuel Mendoza-Jonas <[email protected]>
Tested-by: Konrad Jankowski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
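The overrun arithmetic in the ixgbevf message above can be worked through with a small sketch. The values of NET_SKB_PAD and NET_IP_ALIGN are assumptions chosen for illustration (both are architecture-dependent in the kernel), each half of an order-1 page is assumed to be 4096 bytes, and overrun() is a hypothetical helper.

```c
#include <stddef.h>

/* Assumed values for illustration; both are architecture-dependent. */
#define NET_SKB_PAD	64
#define NET_IP_ALIGN	2
#define BUF_2K		2048
#define BUF_3K		3072
#define HALF_ORDER1	4096	/* assumed: half of an 8K order-1 page */

/*
 * Bytes written past the end of a region of region_size bytes when the
 * device starts at `offset` and may write up to max_write bytes.
 */
static size_t overrun(size_t region_size, size_t offset, size_t max_write)
{
	size_t end = offset + max_write;

	return end > region_size ? end - region_size : 0;
}
```

With these assumed values, a full 2K write into a 2K buffer starting at the offset overruns by exactly the offset (66 bytes), while a 3K write into a 4K half-page stays inside it.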
This series revises the algorithm used for replenishing receive
buffers on RX endpoints. Currently there are two atomic variables
that track how many receive buffers can be sent to the hardware.
The new algorithm obviates the need for those, by just assuming we
always want to provide the hardware with buffers until it can hold
no more.
The first patch eliminates an atomic variable that's not required.
The next moves some code into the main replenish function's caller,
making one of the called function's arguments unnecessary. The
next six refactor things a bit more, adding a new helper function
that allows us to eliminate an additional atomic variable. And the
final two implement two more minor improvements.
====================
Rather than tracking the number of receive buffer transactions that
have been submitted without a doorbell, just track the total number
of transactions that have been issued. Then ring the doorbell when
that number modulo the replenish batch size is 0.
The effect is roughly the same, but the new count is slightly more
interesting, and this approach will someday allow the replenish
batch size to be tuned at runtime.
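A minimal sketch of the counting scheme, assuming a hypothetical batch size of 16 and a hypothetical struct rx_endpoint with a single running counter:

```c
#include <stdbool.h>
#include <stdint.h>

#define REPLENISH_BATCH 16	/* hypothetical batch size */

/* Hypothetical endpoint state: total replenish transactions issued. */
struct rx_endpoint {
	uint64_t replenish_count;
};

/* Count one issued transaction; report whether the doorbell should ring. */
static bool replenish_issue(struct rx_endpoint *ep)
{
	return ++ep->replenish_count % REPLENISH_BATCH == 0;
}
```

Because only the modulus decides when to ring, the batch size could become a variable without touching the counter.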
Alex Elder [Thu, 3 Feb 2022 17:09:26 +0000 (11:09 -0600)]
net: ipa: replenish after delivering payload
Replenishing is now solely driven by whether transactions are
available for a channel, and it doesn't really matter whether
we replenish before or after we deliver received packets to the
network stack.
Replenishing before delivering the payload adds a little latency.
Eliminate that by requesting a replenish after the payload is
delivered.
Alex Elder [Thu, 3 Feb 2022 17:09:25 +0000 (11:09 -0600)]
net: ipa: kill replenish_backlog
We no longer use the replenish_backlog atomic variable to decide
when we've got work to do providing receive buffers to hardware.
Basically, we try to keep the hardware as full as possible, all the
time. We keep supplying buffers until the hardware has no more
space for them.
As a result, we can get rid of the replenish_backlog field and the
atomic operations performed on it.
Alex Elder [Thu, 3 Feb 2022 17:09:24 +0000 (11:09 -0600)]
net: ipa: introduce gsi_channel_trans_idle()
Create a new function that returns true if all transactions for a
channel are available for use.
Use it in ipa_endpoint_replenish_enable() to see whether to start
replenishing, and in ipa_endpoint_replenish() to determine whether
it's necessary after a failure to schedule delayed work to ensure a
future replenish attempt occurs.
Alex Elder [Thu, 3 Feb 2022 17:09:22 +0000 (11:09 -0600)]
net: ipa: allocate transaction in replenish loop
When replenishing, have ipa_endpoint_replenish() allocate a
transaction, and pass that to ipa_endpoint_replenish_one() to fill.
Then, if that produces no error, commit the transaction within the
replenish loop as well. In this way we can distinguish between
transaction failures and buffer allocation/mapping failures.
Failure to allocate a transaction simply means the hardware already
has as many receive buffers as it can hold. In that case we can
break out of the replenish loop because there's nothing more to do.
If we fail to allocate or map pages for the receive buffer, just
try again later.
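The two failure cases can be modelled in user space as follows. This is a sketch under assumptions, not the driver code: struct channel, its fields, and the helper names are hypothetical, and "pages" are just a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a channel with fixed pools. */
struct channel {
	size_t trans_avail;	/* transactions the hardware can still accept */
	size_t pages_avail;	/* pages the allocator can still hand out */
	size_t queued;		/* buffers committed to hardware */
};

enum replenish_result { HW_FULL, TRY_LATER };

static bool trans_alloc(struct channel *ch)
{
	if (!ch->trans_avail)
		return false;
	ch->trans_avail--;
	return true;
}

static void trans_release(struct channel *ch)
{
	ch->trans_avail++;
}

static bool buffer_fill(struct channel *ch)
{
	if (!ch->pages_avail)
		return false;
	ch->pages_avail--;
	return true;
}

static enum replenish_result replenish(struct channel *ch)
{
	for (;;) {
		if (!trans_alloc(ch))
			return HW_FULL;		/* nothing more to do */
		if (!buffer_fill(ch)) {
			trans_release(ch);	/* return the unused transaction */
			return TRY_LATER;	/* caller would schedule a retry */
		}
		ch->queued++;			/* commit the transaction */
	}
}
```

Allocating the transaction first means the cheap check runs before any page allocation is attempted.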
Alex Elder [Thu, 3 Feb 2022 17:09:21 +0000 (11:09 -0600)]
net: ipa: decide on doorbell in replenish loop
Decide whether the doorbell should be signaled when committing a
replenish transaction in the main replenish loop, rather than in
ipa_endpoint_replenish_one(). This is a step to facilitate the
next patch.
Alex Elder [Thu, 3 Feb 2022 17:09:19 +0000 (11:09 -0600)]
net: ipa: allocate transaction before pages when replenishing
A transaction failure only occurs if no more transactions are
available for an endpoint. It's a very cheap test.
When replenishing an RX endpoint buffer, there's no point in
allocating pages if transactions are exhausted. So don't bother
doing so unless the transaction allocation succeeds.
Alex Elder [Thu, 3 Feb 2022 17:09:18 +0000 (11:09 -0600)]
net: ipa: kill replenish_saved
The replenish_saved field keeps track of the number of times a new
buffer is added to the backlog when replenishing is disabled. We
don't really use it though, so there's no need for us to track it
separately. Whether replenishing is enabled or not, we can simply
increment the backlog.
Get rid of replenish_saved, and initialize and increment the backlog
where it would have otherwise been used.
Jakub Kicinski [Wed, 2 Feb 2022 22:20:31 +0000 (14:20 -0800)]
tls: cap the output scatter list to something reasonable
TLS recvmsg() passes user pages as destination for decrypt.
The decrypt operation is repeated record by record, each
record being 16kB, max. TLS allocates an sg_table and uses
iov_iter_get_pages() to populate it with enough pages to
fit the decrypted record.
Even though we decrypt a single message at a time we size
the sg_table based on the entire length of the iovec.
This leads to unnecessarily large allocations, risking
triggering OOM conditions.
Use iov_iter_truncate() / iov_iter_reexpand() to construct
a "capped" version of iov_iter_npages(). Alternatively we
could parametrize iov_iter_npages() to take the size as
arg instead of using i->count, or do something else.
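The sizing problem can be illustrated with a simplified page count for a single contiguous buffer. This is not iov_iter_npages(), which handles multi-segment iterators. npages(), npages_cap() and PAGE_SZ are hypothetical names, and a 4096-byte page is assumed.

```c
#include <stddef.h>

#define PAGE_SZ		4096u	/* assumed page size */
#define MAX_RECORD_LEN	16384u	/* "16kB, max" per the text above */

/* Pages touched by `len` bytes starting `offset` bytes into the first page. */
static size_t npages(size_t offset, size_t len)
{
	if (len == 0)
		return 0;
	return (offset % PAGE_SZ + len + PAGE_SZ - 1) / PAGE_SZ;
}

/* Same count, but never sized for more than `cap` bytes. */
static size_t npages_cap(size_t offset, size_t len, size_t cap)
{
	return npages(offset, len < cap ? len : cap);
}
```

For a 1MB destination buffer the uncapped count is 256 pages, while capping at one record needs at most 5.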
net: dsa: realtek: convert to phylink_generic_validate()
Populate the supported interfaces and MAC capabilities for the Realtek
rtl8365 DSA switch and remove the old validate implementation to allow
DSA to use phylink_generic_validate() for this switch driver.
David S. Miller [Fri, 4 Feb 2022 10:09:42 +0000 (10:09 +0000)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
40GbE Intel Wired LAN Driver Updates 2022-02-03
This series contains updates to the i40e client header file and driver.
Mateusz disables HW TC offload by default.
Joe Damato removes a no longer used statistic.
Jakub Kicinski removes an unused enum from the client header file.
Jedrzej changes some admin queue commands to occur under atomic context
and adds new functions for admin queue MAC VLAN filters to avoid a
potential race that could occur due to storing results in a structure that
could be overwritten by the next admin queue call.
====================
Thomas Gleixner [Mon, 31 Jan 2022 21:02:46 +0000 (22:02 +0100)]
PCI/MSI: Remove bogus warning in pci_irq_get_affinity()
The recent overhaul of pci_irq_get_affinity() introduced a regression when
pci_irq_get_affinity() is called for an MSI-X interrupt which was not
allocated with affinity descriptor information.
The original code just returned a NULL pointer in that case, but the rework
added a WARN_ON() under the assumption that the corresponding WARN_ON() in
the MSI case can be applied to MSI-X as well.
In fact the MSI warning in the original code does not make sense either
because it's legitimate to invoke pci_irq_get_affinity() for an MSI
interrupt which was not allocated with affinity descriptor information.
Remove it and just return NULL as the original code did.
KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointer
Use ERR_PTR_USR() when returning -EFAULT from kvm_get_attr_addr(), as
sparse complains about implicitly casting the kernel pointer from
ERR_PTR() into a __user pointer.
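A user-space sketch of the error-pointer encoding may help here. The ERR_PTR(), PTR_ERR() and IS_ERR() bodies mimic the kernel helpers, the ERR_PTR_USR() definition is an assumed shape (an explicit cast to a __user pointer), __user is defined as empty because only sparse interprets it, and get_attr_addr() is a hypothetical stand-in for the real function.

```c
#include <errno.h>
#include <stdbool.h>

#define __user			/* sparse annotation; empty for a normal compiler */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Assumed shape of the helper: the cast to __user is made explicit. */
#define ERR_PTR_USR(e) ((void __user *)ERR_PTR(e))

/* Hypothetical lookup that returns a user pointer or an encoded error. */
static void __user *get_attr_addr(unsigned long addr)
{
	if (addr == 0)
		return ERR_PTR_USR(-EFAULT);
	return (void __user *)addr;
}
```

The explicit cast changes nothing at run time; it only tells sparse that the address-space conversion is intentional.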