====================
net/sched: cls_api: Support hardware miss to tc action
This series adds support for hardware miss to instruct tc to continue execution
in a specific tc action instance on a filter's action list. The mlx5 driver patch
(besides the refactors) shows its usage instead of using just chain restore.
Currently a filter's action list must be executed all together or
not at all as driver are only able to tell tc to continue executing from a
specific tc chain, and not a specific filter/action.
This is troublesome with regards to action CT, where new connections should
be sent to software (via tc chain restore), and established connections can
be handled in hardware.
Checking for new connections is done when executing the ct action in hardware
(by checking the packet's tuple against known established tuples).
But if there is a packet modification (pedit) action before action CT and the
checked tuple is a new connection, hardware will need to revert the previous
packet modifications before sending it back to software so it can
re-match the same tc filter in software and re-execute its CT action.
The following is an example configuration of stateless nat
on mlx5 driver that isn't supported before this patchet:
#Setup corrosponding mlx5 VFs in namespaces
$ ip netns add ns0
$ ip netns add ns1
$ ip link set dev enp8s0f0v0 netns ns0
$ ip netns exec ns0 ifconfig enp8s0f0v0 1.1.1.1/24 up
$ ip link set dev enp8s0f0v1 netns ns1
$ ip netns exec ns1 ifconfig enp8s0f0v1 1.1.1.2/24 up
#Setup tc arp and ct rules on mxl5 VF representors
$ tc qdisc add dev enp8s0f0_0 ingress
$ tc qdisc add dev enp8s0f0_1 ingress
$ ifconfig enp8s0f0_0 up
$ ifconfig enp8s0f0_1 up
#Original side
$ tc filter add dev enp8s0f0_0 ingress chain 0 proto ip flower \
ct_state -trk ip_proto tcp dst_port 8888 \
action pedit ex munge tcp dport set 5001 pipe \
action csum ip tcp pipe \
action ct pipe \
action goto chain 1
$ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
ct_state +trk+est \
action mirred egress redirect dev enp8s0f0_1
$ tc filter add dev enp8s0f0_0 ingress chain 1 proto ip flower \
ct_state +trk+new \
action ct commit pipe \
action mirred egress redirect dev enp8s0f0_1
$ tc filter add dev enp8s0f0_0 ingress chain 0 proto arp flower \
action mirred egress redirect dev enp8s0f0_1
#Reply side
$ tc filter add dev enp8s0f0_1 ingress chain 0 proto arp flower \
action mirred egress redirect dev enp8s0f0_0
$ tc filter add dev enp8s0f0_1 ingress chain 0 proto ip flower \
ct_state -trk ip_proto tcp \
action ct pipe \
action pedit ex munge tcp sport set 8888 pipe \
action csum ip tcp pipe \
action mirred egress redirect dev enp8s0f0_0
#Run traffic
$ ip netns exec ns1 iperf -s -p 5001&
$ sleep 2 #wait for iperf to fully open
$ ip netns exec ns0 iperf -c 1.1.1.2 -p 8888
#dump tc filter stats on enp8s0f0_0 chain 0 rule and see hardware packets:
$ tc -s filter show dev enp8s0f0_0 ingress chain 0 proto ip | grep "hardware.*pkt"
Sent hardware 9310116832 bytes 6149672 pkt
Sent hardware 9310116832 bytes 6149672 pkt
Sent hardware 9310116832 bytes 6149672 pkt
A new connection executing the first filter in hardware will first rewrite
the dst port to the new port, and then the ct action is executed,
because this is a new connection, hardware will need to be send this back
to software, on chain 0, to execute the first filter again in software.
The dst port needs to be reverted otherwise it won't re-match the old
dst port in the first filter. Because of that, currently mlx5 driver will
reject offloading the above action ct rule.
This series adds support for hardware partially executing a filter's action list,
and letting tc software continue processing in the specific action instance
where hardware left off (in the above case after the "action pedit ex munge tcp
dport... of the first rule") allowing support for scenarios such as the above.
====================
Paul Blakey [Fri, 17 Feb 2023 22:36:20 +0000 (00:36 +0200)]
net/mlx5e: TC, Set CT miss to the specific ct action instance
Currently, CT misses restore the missed chain on the tc skb extension so
tc will continue from the relevant chain. Instead, restore the CT action's
miss cookie on the extension, which will instruct tc to continue from the
this specific CT action instance on the relevant filter's action list.
Map the CT action's miss_cookie to a new miss object (ACT_MISS), and use
this miss mapping instead of the current chain miss object (CHAIN_MISS)
for CT action misses.
To restore this new miss mapping value, add a RX restore rule for each
such mapping value.
Paul Blakey [Fri, 17 Feb 2023 22:36:16 +0000 (00:36 +0200)]
net/sched: flower: Support hardware miss to tc action
To support hardware miss to tc action in actions on the flower
classifier, implement the required getting of filter actions,
and setup filter exts (actions) miss by giving it the filter's
handle and actions.
Paul Blakey [Fri, 17 Feb 2023 22:36:14 +0000 (00:36 +0200)]
net/sched: cls_api: Support hardware miss to tc action
For drivers to support partial offload of a filter's action list,
add support for action miss to specify an action instance to
continue from in sw.
CT action in particular can't be fully offloaded, as new connections
need to be handled in software. This imposes other limitations on
the actions that can be offloaded together with the CT action, such
as packet modifications.
Assign each action on a filter's action list a unique miss_cookie
which drivers can then use to fill action_miss part of the tc skb
extension. On getting back this miss_cookie, find the action
instance with relevant cookie and continue classifying from there.
Paul Blakey [Fri, 17 Feb 2023 22:36:13 +0000 (00:36 +0200)]
net/sched: Rename user cookie and act cookie
struct tc_action->act_cookie is a user defined cookie,
and the related struct flow_action_entry->act_cookie is
used as an handle similar to struct flow_cls_offload->cookie.
Rename tc_action->act_cookie to user_cookie, and
flow_action_entry->act_cookie to cookie so their names
would better fit their usage.
Miquel Raynal build upon his earlier work and introduced two new
features into the ieee802154 stack. Beaconing to announce existing
PAN's and passive scanning to discover the beacons and associated
PAN's. The matching changes to the userspace configuration tool
have been posted as well and will be released together with the
kernel release.
Arnd Bergmann and Dmitry Torokhov worked on converting the
at86rf230 and cc2520 drivers away from the unused platform_data
usage and towards the new gpiod API. (I had to add a revert as
Dmitry found a regression on an already pushed tree on my side).
Changes since v1 (pull request 2023-02-02)
- Netlink API extack and NLA_POLICY* usage as suggested by Jakub
- Removed always true condition found by kernel test robot
- Simplify device removal with running background job for scanning
- Fix problems with beacon sending in some cases by using the MLME
tx path
* tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
ieee802154: Drop device trackers
mac802154: Fix an always true condition
mac802154: Send beacons using the MLME Tx path
ieee802154: Change error code on monitor scan netlink request
ieee802154: Convert scan error messages to extack
ieee802154: Use netlink policies when relevant on scan parameters
ieee802154: at86rf230: switch to using gpiod API
ieee802154: at86rf230: drop support for platform data
Revert "at86rf230: convert to gpio descriptors"
cc2520: move to gpio descriptors
mac802154: Avoid superfluous endianness handling
at86rf230: convert to gpio descriptors
mac802154: Handle basic beaconing
ieee802154: Add support for user beaconing requests
mac802154: Handle passive scanning
mac802154: Add MLME Tx locked helpers
mac802154: Prepare forcing specific symbol duration
ieee802154: Introduce a helper to validate a channel
ieee802154: Define a beacon frame header
ieee802154: Add support for user scanning requests
====================
Alejandro Lucero [Mon, 20 Feb 2023 11:01:33 +0000 (11:01 +0000)]
sfc: fix builds without CONFIG_RTC_LIB
Add an embarrassingly missed semicolon plus and embarrassingly missed
parenthesis breaking kernel building when CONFIG_RTC_LIB is not set
like the one reported with ia64 config.
Kees Cook [Sat, 18 Feb 2023 18:38:50 +0000 (10:38 -0800)]
net/mlx4_en: Introduce flexible array to silence overflow warning
The call "skb_copy_from_linear_data(skb, inl + 1, spc)" triggers a FORTIFY
memcpy() warning on ppc64 platform:
In function ‘fortify_memcpy_chk’,
inlined from ‘skb_copy_from_linear_data’ at ./include/linux/skbuff.h:4029:2,
inlined from ‘build_inline_wqe’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:722:4,
inlined from ‘mlx4_en_xmit’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:1066:3:
./include/linux/fortify-string.h:513:25: error: call to ‘__write_overflow_field’ declared with
attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()?
[-Werror=attribute-warning]
513 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Same behaviour on x86 you can get if you use "__always_inline" instead of
"inline" for skb_copy_from_linear_data() in skbuff.h
The call here copies data into inlined tx destricptor, which has 104
bytes (MAX_INLINE) space for data payload. In this case "spc" is known
in compile-time but the destination is used with hidden knowledge
(real structure of destination is different from that the compiler
can see). That cause the fortify warning because compiler can check
bounds, but the real bounds are different. "spc" can't be bigger than
64 bytes (MLX4_INLINE_ALIGN), so the data can always fit into inlined
tx descriptor. The fact that "inl" points into inlined tx descriptor is
determined earlier in mlx4_en_xmit().
Avoid confusing the compiler with "inl + 1" constructions to get to past
the inl header by introducing a flexible array "data" to the struct so
that the compiler can see that we are not dealing with an array of inl
structs, but rather, arbitrary data following the structure. There are
no changes to the structure layout reported by pahole, and the resulting
machine code is actually smaller.
Horatiu Vultur [Fri, 17 Feb 2023 21:09:17 +0000 (22:09 +0100)]
net: lan966x: Fix possible deadlock inside PTP
When doing timestamping in lan966x and having PROVE_LOCKING
enabled the following warning is shown.
========================================================
WARNING: possible irq lock inversion dependency detected 6.2.0-rc7-01749-gc54e1f7f7e36 #2786 Tainted: G N
--------------------------------------------------------
swapper/0/0 just changed the state of lock: c2609f50 (_xmit_ETHER#2){+.-.}-{2:2}, at: sch_direct_xmit+0x16c/0x2e8
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&lan966x->ptp_ts_id_lock){+.+.}-{2:2}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G N 6.2.0-rc7-01749-gc54e1f7f7e36 #2786
Hardware name: Generic DT based system
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x58/0x70
dump_stack_lvl from mark_lock.part.0+0x59c/0x93c
mark_lock.part.0 from __lock_acquire+0x978/0x2978
__lock_acquire from lock_acquire.part.0+0xb0/0x248
lock_acquire.part.0 from _raw_spin_lock+0x38/0x48
_raw_spin_lock from sch_direct_xmit+0x16c/0x2e8
sch_direct_xmit from __dev_queue_xmit+0x41c/0x1224
__dev_queue_xmit from ip6_finish_output2+0x5f4/0xc64
ip6_finish_output2 from ndisc_send_skb+0x4cc/0x81c
ndisc_send_skb from addrconf_rs_timer+0xb0/0x2f8
addrconf_rs_timer from call_timer_fn+0xb4/0x33c
call_timer_fn from expire_timers+0xb4/0x10c
expire_timers from run_timer_softirq+0xf8/0x2a8
run_timer_softirq from __do_softirq+0xd4/0x5fc
__do_softirq from __irq_exit_rcu+0x138/0x17c
__irq_exit_rcu from irq_exit+0x8/0x28
irq_exit from __irq_svc+0x90/0xbc
Exception stack(0xc1001f20 to 0xc1001f68)
1f20: ffffffffffffffff00000001c011f840c100e000c100e000c1009314c1009370
1f40: c10f0c1ac0d5e564c0f5da8c0000000000000000c1001f70c010f0bcc010f0c0
1f60: 600f0013ffffffff
__irq_svc from arch_cpu_idle+0x30/0x3c
arch_cpu_idle from default_idle_call+0x44/0xac
default_idle_call from do_idle+0xc8/0x138
do_idle from cpu_startup_entry+0x18/0x1c
cpu_startup_entry from rest_init+0xcc/0x168
rest_init from arch_post_acpi_subsys_init+0x0/0x8
Fix this by using spin_lock_irqsave/spin_lock_irqrestore also
inside lan966x_ptp_irq_handler.
net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
Commit 2c02d41d71f9 ("net/ulp: prevent ULP without clone op from entering
the LISTEN status") guarantees that all ULP listeners have clone() op, so
we no longer need to test it in inet_clone_ulp().
We've added 64 non-merge commits during the last 7 day(s) which contain
a total of 158 files changed, 4190 insertions(+), 988 deletions(-).
The main changes are:
1) Add a rbtree data structure following the "next-gen data structure"
precedent set by recently-added linked-list, that is, by using
kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.
2) Add a new benchmark for hashmap lookups to BPF selftests,
from Anton Protopopov.
3) Fix bpf_fib_lookup to only return valid neighbors and add an option
to skip the neigh table lookup, from Martin KaFai Lau.
4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
accouting for container environments, from Yafang Shao.
5) Batch of ice multi-buffer and driver performance fixes,
from Alexander Lobakin.
6) Fix a bug in determining whether global subprog's argument is
PTR_TO_CTX, which is based on type names which breaks kprobe progs,
from Andrii Nakryiko.
7) Prep work for future -mcpu=v4 LLVM option which includes usage of
BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
from Eduard Zingerman.
8) More prep work for later building selftests with Memory Sanitizer
in order to detect usages of undefined memory, from Ilya Leoshkevich.
9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
dereference via sendmsg(), from Maciej Fijalkowski.
10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.
11) Fix BPF memory allocator in combination with BPF hashtab where it could
corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.
12) Fix LoongArch BPF JIT to always use 4 instructions for function
address so that instruction sequences don't change between passes,
from Hengqi Chen.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
selftests/bpf: Add bpf_fib_lookup test
bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
riscv, bpf: Add bpf trampoline support for RV64
riscv, bpf: Add bpf_arch_text_poke support for RV64
riscv, bpf: Factor out emit_call for kernel and bpf context
riscv: Extend patch_text for multiple instructions
Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
selftests/bpf: Add global subprog context passing tests
selftests/bpf: Convert test_global_funcs test to test_loader framework
bpf: Fix global subprog context argument resolution logic
LoongArch, bpf: Use 4 instructions for function address in JIT
bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
bpf: Disable bh in bpf_test_run for xdp and tc prog
xsk: check IFF_UP earlier in Tx path
Fix typos in selftest/bpf files
selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
...
====================
Stefan Schmidt [Sat, 18 Feb 2023 21:13:17 +0000 (22:13 +0100)]
MAINTAINERS: Add Miquel Raynal as additional maintainer for ieee802154
We are growing the maintainer team for ieee802154 to spread the load for
review and general maintenance. Miquel has been driving the subsystem
forward over the last year and we would like to welcome him as a
maintainer.
Stefan Schmidt [Sat, 18 Feb 2023 21:13:16 +0000 (22:13 +0100)]
MAINTAINERS: Switch maintenance for mrf24j40 driver over
Alan Ott has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Alan for his work on the driver.
Stefan Schmidt [Sat, 18 Feb 2023 21:13:15 +0000 (22:13 +0100)]
MAINTAINERS: Switch maintenance for mcr20a driver over
Xue Liu has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Xue Liu for his work on the driver.
Stefan Schmidt [Sat, 18 Feb 2023 21:13:14 +0000 (22:13 +0100)]
MAINTAINERS: Switch maintenance for cc2520 driver over
Varka Bhadram has not been actively working on the driver or reviewing
patches for several years. I have been taking odd fixes in through the
wpan/ieee802154 tree. Update the MAINTAINERS file to reflect this
reality. I wanted to thank Varka for his work on the driver.
Linus Torvalds [Mon, 20 Feb 2023 23:55:47 +0000 (15:55 -0800)]
Merge tag 'asm-generic-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanups from Arnd Bergmann:
"Only three minor changes: a cross-platform series from Mike Rapoport
to consolidate asm/agp.h between architectures, and a correctness
change for __generic_cmpxchg_local() from Matt Evans"
* tag 'asm-generic-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
char/agp: introduce asm-generic/agp.h
char/agp: consolidate {alloc,free}_gatt_pages()
locking/atomic: cmpxchg: Make __generic_cmpxchg_local compare against zero-extended 'old' value
Linus Torvalds [Mon, 20 Feb 2023 23:49:56 +0000 (15:49 -0800)]
Merge tag 'soc-dt-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC DT updates from Arnd Bergmann:
"About a quarter of the changes are for 32-bit arm, mostly filling in
device support for existing machines and adding minor cleanups, mostly
for Qualcomm and Samsung based machines.
Two new 32-bit SoCs are added, both are quad-core Cortex-A7 chips from
Rockchips that have been around for a while but were lacking kernel
support so far: RV1126 is a Vision SoC with an NPU and is used in the
Edgeble Neural Compute Module 2(Neu2) board, while RK3128 is design
for TV boxes and so far only comes with a dts for its refernece
design.
The other 32-bit boards that were added are two ASpeed AST2600 based
BMC boards, the Microchip sam9x60_curiosity development board (Armv5
based!), the Enclustra PE1 FPGA-SoM baseboard, and a few more boards
for i.MX53 and i.MX6ULL.
On the RISC-V side, there are fewer patches, but a total of ten new
single-board computers based on variations of the Allwinner D1/T113
chip, plus one more board based on Microchip Polarfire.
As usual, arm64 has by far the most changes here, with over 700
non-merge changesets, among them over 400 alone for Qualcomm. The
newly added SoCs this time are all recent high-end embedded SoCs for
various markets, each on comes with support for its reference board:
- Qualcomm SM8550 (Snapdragon 8 Gen 2) for mobile phones
- Qualcomm QDU1000/QRU1000 5G RAN platform
- Rockchips RK3588/RK3588s for tablets, chromebooks and SBCs
- TI J784S4 for industrial and automotive applications
In total, there are 46 new arm64 machines:
- Reference platforms for each of the five new SoCs
- Three Amlogic based development boards
- Six embedded machines based on NXP i.MX8MM and i.MX8MP
- The Mediatek mt7986a based Banana Pi R3 router
- Six tablets based on Qualcomm MSM8916 (Snapdragon 410), SM6115
(Snapdragon 662) and SM8250 (Snapdragon 865)
- Two LTE dongles, also based on MSM8916
- Seven mobile phones, based on Qualcomm MSM8953 (Snapdragon 610),
SDM450 and SDM632
- Three chromebooks based on Qualcomm SC7280 (Snapdragon 7c)
- Nine development boards based on Rockchips RK3588, RK3568, RK3566
and RK3328.
- Five development machines based on TI K3 (AM642/AM654/AM68/AM69)
The cleanup of dtc warnings continues across all platforms, adding to
the total number of changes"
Linus Torvalds [Mon, 20 Feb 2023 23:43:36 +0000 (15:43 -0800)]
Merge tag 'soc-defconfig-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM defconfigs updates from Arnd Bergmann:
"As usual, this contains all the patches to enable options for newly
added device drivers in the 32-bit and 64-bit defconfig files.
I have sorted the files according to the changes to Kconfig files,
to make it easier to check what has changed compared to the 'make
savedefconfig' output.
The most notable change this time is a series from Mark Brown to add
a 'virtconfig' target for arm64, which is for the moment the same as
the 'defconfig' target but disables all the top-level SoC specific
options in order to have a smaller and faster kernel build"
* tag 'soc-defconfig-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (39 commits)
arm64: defconfig: enable drivers required by the Qualcomm SA8775P platform
arm64: defconfig: Enable DisplayPort on SC8280XP laptops
arm64: configs: Add virtconfig
kbuild: Provide a version of merge_into_defconfig without override warnings
scripts: merge_config: Add option to suppress warning on overrides
ARM: reorder defconfig files
arm64: reorder defconfig
arm64: defconfig: enable Qualcomm SDAM nvmem driver
arm64: defconfig: enable SM8450 DISPCC clock driver
ARM: defconfig: Add IOSCHED_BFQ to the default configs
ARM: configs: multi_v7: enable NVMEM driver for STM32
ARM: Add wpcm450_defconfig for Nuvoton WPCM450
arm64: defconfig: Enable DMA_RESTRICTED_POOL
arm64: defconfig: Enable missing configs for mt8192-asurada
riscv: defconfig: Enable the Allwinner D1 platform and drivers
ARM: imx_v6_v7_defconfig: Don't enable PROVE_LOCKING
ARM: multi_v7_defconfig: Add GXP Fan and SPI support
ARM: add multi_v7_lpae_defconfig
kbuild: Add config fragment merge functionality
ARM: multi_v7_defconfig: Add options to support TQMLS102xA series
...
Linus Torvalds [Mon, 20 Feb 2023 23:36:37 +0000 (15:36 -0800)]
Merge tag 'arm-soc-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC updates from Arnd Bergmann:
"The majority of the changes are for the OMAP2 platform, mostly
removing some dead code that got left behind from previous cleanups.
Aside from that, there are very minor updates and correctness fixes
for Zynq, i.MX, Samsung, Broadcom, AT91, ep93xx, and OMAP1"
* tag 'arm-soc-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (26 commits)
dt-bindings: soc: samsung: exynos-pmu: allow phys as child
ARM: imx: mach-imx6ul: add imx6ulz support
ARM: imx: Call ida_simple_remove() for ida_simple_get
arm64: drop redundant "ARMv8" from Kconfig option title
ARM: ep93xx: Convert to use descriptors for GPIO LEDs
ARM: s3c: fix s3c64xx_set_timer_source prototype
ARM: OMAP2+: Fix spelling typos in comment
ARM: OMAP2+: Remove unneeded #include <linux/pinctrl/machine.h>
ARM: OMAP2+: Remove unneeded #include <linux/pinctrl/pinmux.h>
ARM: OMAP1: call platform_device_put() in error case in omap1_dm_timer_init()
ARM: BCM63xx: remove useless goto statement
ARM: omap2: make functions static
ARM: omap2: remove unused omap2_pm_init
ARM: omap2: remove unused declarations
ARM: omap2: remove unused functions
ARM: omap2: smartreflex: remove on_init control
ARM: omap2: remove APLL control
ARM: omap2: simplify clock2xxx header
ARM: omap2: remove unused omap_hwmod_reset.c
ARM: omap2: remove unused headers
...
Linus Torvalds [Mon, 20 Feb 2023 23:28:57 +0000 (15:28 -0800)]
Merge tag 'arm-boardfile-remove-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC boardfile updates from Arnd Bergmann
"Unused boardfile removal for 6.3
This is a follow-up to the deprecation of most of the old-style board
files that was merged in linux-6.0, removing them for good.
This branch is almost exclusively dead code removal based on those
annotations. Some device driver removals went through separate
subsystem trees, but the majority is in the same branch, in order to
better handle dependencies between the patches and avoid breaking
bisection.
Unfortunately that leads to merge conflicts against other changes in
the subsystem trees, but they should all be trivial to resolve by
removing the files.
See commit 7d0d3fa7339e ("Merge tag 'arm-boardfiles-6.0' of
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc") for the
description of which machines were marked unused and are now removed.
The only removals that got postponed are Terastation WXL (mv78xx0) and
Jornada720 (StrongARM1100), which turned out to still have potential
users"
* tag 'arm-boardfile-remove-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (91 commits)
mmc: omap: drop TPS65010 dependency
ARM: pxa: restore mfp-pxa320.h
usb: ohci-omap: avoid unused-variable warning
ARM: debug: remove references in DEBUG_UART_8250_SHIFT to removed configs
ARM: s3c: remove obsolete s3c-cpu-freq header
MAINTAINERS: adjust SAMSUNG SOC CLOCK DRIVERS after s3c24xx support removal
MAINTAINERS: update file entries after arm multi-platform rework and mach-pxa removal
ARM: remove CONFIG_UNUSED_BOARD_FILES
mfd: remove htc-pasic3 driver
w1: remove ds1wm driver
usb: remove ohci-tmio driver
fbdev: remove w100fb driver
fbdev: remove tmiofb driver
mmc: remove tmio_mmc driver
mfd: remove ucb1400 support
mfd: remove toshiba tmio drivers
rtc: remove v3020 driver
power: remove pda_power supply driver
ASoC: pxa: remove unused board support
pcmcia: remove unused pxa/sa1100 drivers
...
Linus Torvalds [Mon, 20 Feb 2023 22:27:21 +0000 (14:27 -0800)]
Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- NVMe updates via Christoph:
- Small improvements to the logging functionality (Amit Engel)
- Authentication cleanups (Hannes Reinecke)
- Cleanup and optimize the DMA mapping cod in the PCIe driver
(Keith Busch)
- Work around the command effects for Format NVM (Keith Busch)
- Misc cleanups (Keith Busch, Christoph Hellwig)
- Fix and cleanup freeing single sgl (Keith Busch)
- MD updates via Song:
- Fix a rare crash during the takeover process
- Don't update recovery_cp when curr_resync is ACTIVE
- Free writes_pending in md_stop
- Change active_io to percpu
- Updates to drbd, inching us closer to unifying the out-of-tree driver
with the in-tree one (Andreas, Christoph, Lars, Robert)
- BFQ update adding support for multi-actuator drives (Paolo, Federico,
Davide)
- Make brd compliant with REQ_NOWAIT (me)
- Fix for IOPOLL and queue entering, fixing stalled IO waiting on
timeouts (me)
- Fix for REQ_NOWAIT with multiple bios (me)
- Fix memory leak in blktrace cleanup (Greg)
- Clean up sbitmap and fix a potential hang (Kemeng)
- Clean up some bits in BFQ, and fix a bug in the request injection
(Kemeng)
- Clean up the request allocation and issue code, and fix some bugs
related to that (Kemeng)
- ublk updates and fixes:
- Add support for unprivileged ublk (Ming)
- Improve device deletion handling (Ming)
- Misc (Liu, Ziyang)
- s390 dasd fixes (Alexander, Qiheng)
- Improve utility of request caching and fixes (Anuj, Xiao)
- zoned cleanups (Pankaj)
- More constification for kobjs (Thomas)
- blk-iocost cleanups (Yu)
- Remove bio splitting from drivers that don't need it (Christoph)
- Switch blk-cgroups to use struct gendisk. Some of this is now
incomplete as select late reverts were done. (Christoph)
- Add bvec initialization helpers, and convert callers to use that
rather than open-coding it (Christoph)
* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
brd: use radix_tree_maybe_preload instead of radix_tree_preload
block: use proper return value from bio_failfast()
block: bio-integrity: Copy flags when bio_integrity_payload is cloned
block: Fix io statistics for cgroup in throttle path
brd: mark as nowait compatible
brd: check for REQ_NOWAIT and set correct page allocation mask
brd: return 0/-error from brd_insert_page()
block: sync mixed merged request's failfast with 1st bio's
Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
Revert "blk-cgroup: pass a gendisk to blkg_lookup"
Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
Revert "blk-cgroup: move the cgroup information to struct gendisk"
nvme-pci: remove iod use_sgls
nvme-pci: fix freeing single sgl
block: ublk: check IO buffer based on flag need_get_data
s390/dasd: Fix potential memleak in dasd_eckd_init()
s390/dasd: sort out physical vs virtual pointers usage
block: Remove the ALLOC_CACHE_SLACK constant
block: make kobj_type structures constant
...
Linus Torvalds [Mon, 20 Feb 2023 22:10:36 +0000 (14:10 -0800)]
Merge tag 'for-6.3/dio-2023-02-16' of git://git.kernel.dk/linux
Pull legacy dio update from Jens Axboe:
"We only have a few file systems that use the old dio code, make them
select it rather than build it unconditionally"
* tag 'for-6.3/dio-2023-02-16' of git://git.kernel.dk/linux:
fs: build the legacy direct I/O code conditionally
fs: move sb_init_dio_done_wq out of direct-io.c
Linus Torvalds [Mon, 20 Feb 2023 22:03:57 +0000 (14:03 -0800)]
Merge tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linux
Pull io_uring ITER_UBUF conversion from Jens Axboe:
"Since we now have ITER_UBUF available, switch to using it for single
ranges as it's more efficient than ITER_IOVEC for that"
* tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linux:
block: use iter_ubuf for single range
iov_iter: move iter_ubuf check inside restore WARN
io_uring: use iter_ubuf for single range imports
io_uring: switch network send/recv to ITER_UBUF
iov: add import_ubuf()
* tag 'for-6.3/io_uring-2023-02-16' of git://git.kernel.dk/linux: (51 commits)
io_uring: Support calling io_uring_register with a registered ring fd
io_uring,audit: don't log IORING_OP_MADVISE
io_uring: mark task TASK_RUNNING before handling resume/task work
io_uring: always go async for unsupported open flags
io_uring: always go async for unsupported fadvise flags
io_uring: for requests that require async, force it
io_uring: if a linked request has REQ_F_FORCE_ASYNC then run it async
io_uring: add reschedule point to handle_tw_list()
io_uring: add a conditional reschedule to the IOPOLL cancelation loop
io_uring: return normal tw run linking optimisation
io_uring: refactor tctx_task_work
io_uring: refactor io_put_task helpers
io_uring: refactor req allocation
io_uring: improve io_get_sqe
io_uring: kill outdated comment about overflow flush
io_uring: use user visible tail in io_uring_poll()
io_uring: pass in io_issue_def to io_assign_file()
io_uring: Enable KASAN for request cache
io_uring: handle TIF_NOTIFY_RESUME when checking for task_work
io_uring/msg-ring: ensure flags passing works for task_work completions
...
Linus Torvalds [Mon, 20 Feb 2023 21:05:24 +0000 (13:05 -0800)]
Merge tag 'dlm-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This fixes some races in the lowcomms startup and shutdown code that
were found by targeted stress testing that quickly and repeatedly
joins and leaves lockspaces"
* tag 'dlm-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
fs: dlm: remove unnecessary waker_up() calls
fs: dlm: move state change into else branch
fs: dlm: remove newline in log_print
fs: dlm: reduce the shutdown timeout to 5 secs
fs: dlm: make dlm sequence id more robust
fs: dlm: wait until all midcomms nodes detect version
fs: dlm: ignore unexpected non dlm opts msgs
fs: dlm: bring back previous shutdown handling
fs: dlm: send FIN ack back in right cases
fs: dlm: move sending fin message into state change handling
fs: dlm: don't set stop rx flag after node reset
fs: dlm: fix race setting stop tx flag
fs: dlm: be sure to call dlm_send_queue_flush()
fs: dlm: fix use after free in midcomms commit
fs: dlm: start midcomms before scand
fs/dlm: Remove "select SRCU"
fs: dlm: fix return value check in dlm_memory_init()
Linus Torvalds [Mon, 20 Feb 2023 20:54:27 +0000 (12:54 -0800)]
Merge tag 'for-6.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"The usual mix of performance improvements and new features.
The core change is reworking how checksums are processed, with
followup cleanups and simplifications. There are two minor changes in
block layer and iomap code.
Features:
- block group allocation class heuristics:
- pack files by size (up to 128k, up to 8M, more) to avoid
fragmentation in block groups, assuming that file size and life
time is correlated, in particular this may help during balance
- with tracepoints and extensible in the future
Performance:
- send: cache directory utimes and only emit the command when
necessary
- speedup up to 10x
- smaller final stream produced (no redundant utimes commands
issued)
- compatibility not affected
- fiemap: skip backref checks for shared leaves
- speedup 3x on sample filesystem with all leaves shared (e.g. on
snapshots)
- micro optimized b-tree key lookup, speedup in metadata operations
(sample benchmark: fs_mark +10% of files/sec)
Core changes:
- change where checksumming is done in the io path:
- checksum and read repair does verification at lower layer
- cascaded cleanups and simplifications
- raid56 refactoring and cleanups
Fixes:
- sysfs: make sure that a run-time change of a feature is correctly
tracked by the feature files
- scrub: better reporting of tree block errors
Other:
- locally enable -Wmaybe-uninitialized after fixing all warnings
- misc cleanups, spelling fixes
Other code:
- block: export bio_split_rw
- iomap: remove IOMAP_F_ZONE_APPEND"
* tag 'for-6.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (109 commits)
btrfs: make kobj_type structures constant
btrfs: remove the bdev argument to btrfs_rmap_block
btrfs: don't rely on unchanging ->bi_bdev for zone append remaps
btrfs: never return true for reads in btrfs_use_zone_append
btrfs: pass a btrfs_bio to btrfs_use_append
btrfs: set bbio->file_offset in alloc_new_bio
btrfs: use file_offset to limit bios size in calc_bio_boundaries
btrfs: do unsigned integer division in the extent buffer binary search loop
btrfs: eliminate extra call when doing binary search on extent buffer
btrfs: raid56: handle endio in scrub_rbio
btrfs: raid56: handle endio in recover_rbio
btrfs: raid56: handle endio in rmw_rbio
btrfs: raid56: submit the read bios from scrub_assemble_read_bios
btrfs: raid56: fold rmw_read_wait_recover into rmw_read_bios
btrfs: raid56: fold recover_assemble_read_bios into recover_rbio
btrfs: raid56: add a bio_list_put helper
btrfs: raid56: wait for I/O completion in submit_read_bios
btrfs: raid56: simplify code flow in rmw_rbio
btrfs: raid56: simplify error handling and code flow in raid56_parity_write
btrfs: replace btrfs_wait_tree_block_writeback by wait_on_extent_buffer_writeback
...
Linus Torvalds [Mon, 20 Feb 2023 20:44:08 +0000 (12:44 -0800)]
Merge tag 'fixes_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF and ext2 fixes from Jan Kara:
- Rewrite of udf directory iteration code to address multiple syzbot
reports
- Fixes to udf extent handling and block mapping code to address
several syzbot reports and filesystem corruption issues uncovered by
fsx & fsstress
- Convert udf to kmap_local()
- Add sanity checks when loading udf bitmaps
- Drop old VARCONV support which I've never seen used and which was
broken for quite some years without anybody noticing
- Finish conversion of ext2 to kmap_local()
- One fix to mpage_writepages() on which other udf fixes depend
* tag 'fixes_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (78 commits)
udf: Avoid directory type conversion failure due to ENOMEM
udf: Use unsigned variables for size calculations
udf: remove reporting loc in debug output
udf: Check consistency of Space Bitmap Descriptor
udf: Fix file counting in LVID
udf: Limit file size to 4TB
udf: Don't return bh from udf_expand_dir_adinicb()
udf: Convert udf_expand_file_adinicb() to avoid kmap_atomic()
udf: Convert udf_adinicb_writepage() to memcpy_to_page()
udf: Switch udf_adinicb_readpage() to kmap_local_page()
udf: Move udf_adinicb_readpage() to inode.c
udf: Mark aops implementation static
udf: Switch to single address_space_operations
udf: Add handling of in-ICB files to udf_bmap()
udf: Convert all file types to use udf_write_end()
udf: Convert in-ICB files to use udf_write_begin()
udf: Convert in-ICB files to use udf_direct_IO()
udf: Convert in-ICB files to use udf_writepages()
udf: Unify .read_folio for normal and in-ICB files
udf: Fix off-by-one error when discarding preallocation
...
Linus Torvalds [Mon, 20 Feb 2023 20:38:27 +0000 (12:38 -0800)]
Merge tag 'fsnotify_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"Support for auditing decisions regarding fanotify permission events"
* tag 'fsnotify_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify,audit: Allow audit to use the full permission event response
fanotify: define struct members to hold response decision context
fanotify: Ensure consistent variable type for response
Linus Torvalds [Mon, 20 Feb 2023 20:33:41 +0000 (12:33 -0800)]
Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux
Pull fsverity updates from Eric Biggers:
"Fix the longstanding implementation limitation that fsverity was only
supported when the Merkle tree block size, filesystem block size, and
PAGE_SIZE were all equal.
Specifically, add support for Merkle tree block sizes less than
PAGE_SIZE, and make ext4 support fsverity on filesystems where the
filesystem block size is less than PAGE_SIZE.
Effectively, this means that fsverity can now be used on systems with
non-4K pages, at least on ext4. These changes have been tested using
the verity group of xfstests, newly updated to cover the new code
paths.
Also update fs/verity/ to support verifying data from large folios.
There's also a similar patch for fs/crypto/, to support decrypting
data from large folios, which I'm including in here to avoid a merge
conflict between the fscrypt and fsverity branches"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
fscrypt: support decrypting data from large folios
fsverity: support verifying data from large folios
fsverity.rst: update git repo URL for fsverity-utils
ext4: allow verity with fs block size < PAGE_SIZE
fs/buffer.c: support fsverity in block_read_full_folio()
f2fs: simplify f2fs_readpage_limit()
ext4: simplify ext4_readpage_limit()
fsverity: support enabling with tree block size < PAGE_SIZE
fsverity: support verification with tree block size < PAGE_SIZE
fsverity: replace fsverity_hash_page() with fsverity_hash_block()
fsverity: use EFBIG for file too large to enable verity
fsverity: store log2(digest_size) precomputed
fsverity: simplify Merkle tree readahead size calculation
fsverity: use unsigned long for level_start
fsverity: remove debug messages and CONFIG_FS_VERITY_DEBUG
fsverity: pass pos and size to ->write_merkle_tree_block
fsverity: optimize fsverity_cleanup_inode() on non-verity files
fsverity: optimize fsverity_prepare_setattr() on non-verity files
fsverity: optimize fsverity_file_open() on non-verity files
Linus Torvalds [Mon, 20 Feb 2023 20:29:27 +0000 (12:29 -0800)]
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux
Pull fscrypt updates from Eric Biggers:
"Simplify the implementation of the test_dummy_encryption mount option
by adding the 'test dummy key' on-demand"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: clean up fscrypt_add_test_dummy_key()
fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super()
f2fs: stop calling fscrypt_add_test_dummy_key()
ext4: stop calling fscrypt_add_test_dummy_key()
fscrypt: add the test dummy encryption key on-demand
Linus Torvalds [Mon, 20 Feb 2023 20:23:40 +0000 (12:23 -0800)]
Merge tag 'erofs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"The most noticeable feature for this cycle is per-CPU kthread
decompression since Android use cases need low-latency I/O handling in
order to ensure the app runtime performance, currently unbounded
workqueue latencies are not quite good for production on many aarch64
hardwares and thus we need to introduce a deterministic expectation
for these. Decompression is CPU-intensive and it is sleepable for
EROFS, so other alternatives like decompression under softirq contexts
are not considered. More details are in the corresponding commit
message.
Others are random cleanups around the whole codebase and we will
continue to clean up further in the next few months.
Due to Lunar New Year holidays, some other new features were not
completely reviewed and solidified as expected and we may delay them
into the next version.
Summary:
- Add per-cpu kthreads for low-latency decompression for Android use
cases
- Get rid of tagged pointer helpers since they are rarely used now
- Several code cleanups to reduce codebase
- Documentation and MAINTAINERS updates"
* tag 'erofs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (21 commits)
erofs: fix an error code in z_erofs_init_zip_subsystem()
erofs: unify anonymous inodes for blob
erofs: relinquish volume with mutex held
erofs: maintain cookies of share domain in self-contained list
erofs: remove unused device mapping in meta routine
MAINTAINERS: erofs: Add Documentation/ABI/testing/sysfs-fs-erofs
Documentation/ABI: sysfs-fs-erofs: update supported features
erofs: remove unused EROFS_GET_BLOCKS_RAW flag
erofs: update print symbols for various flags in trace
erofs: make kobj_type structures constant
erofs: add per-cpu threads for decompression as an option
erofs: tidy up internal.h
erofs: get rid of z_erofs_do_map_blocks() forward declaration
erofs: move zdata.h into zdata.c
erofs: remove tagged pointer helpers
erofs: avoid tagged pointers to mark sync decompression
erofs: get rid of erofs_inode_datablocks()
erofs: simplify iloc()
erofs: get rid of debug_one_dentry()
erofs: remove linux/buffer_head.h dependency
...
Linus Torvalds [Mon, 20 Feb 2023 20:14:33 +0000 (12:14 -0800)]
Merge tag 'fs.acl.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfs acl update from Christian Brauner:
"This contains a single update to the internal get acl method and
replaces an open-coded cmpxchg() comparison with with try_cmpxchg().
It's clearer and also beneficial on some architectures"
* tag 'fs.acl.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
posix_acl: Use try_cmpxchg in get_acl
Linus Torvalds [Mon, 20 Feb 2023 20:03:55 +0000 (12:03 -0800)]
Merge tag 'fs.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfs hardening update from Christian Brauner:
"Jan pointed out that during shutdown both filp_close() and super block
destruction will use basic printk logging when bugs are detected. This
causes issues in a few scenarios:
- Tools like syzkaller cannot figure out that the logged message
indicates a bug.
- Users that explicitly opt in to have the kernel bug on data
corruption by selecting CONFIG_BUG_ON_DATA_CORRUPTION should see
the kernel crash when they did actually select that option.
- When there are busy inodes after the superblock is shut down later
access to such a busy inodes walks through freed memory. It would
be better to cleanly crash instead.
All of this can be addressed by using the already existing
CHECK_DATA_CORRUPTION() macro in these places when kernel bugs are
detected. Its logging improvement is useful for all users.
Otherwise this only has a meaningful behavioral effect when users do
select CONFIG_BUG_ON_DATA_CORRUPTION which means this is backward
compatible for regular users"
* tag 'fs.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fs: Use CHECK_DATA_CORRUPTION() when kernel bugs are detected
Linus Torvalds [Mon, 20 Feb 2023 19:53:11 +0000 (11:53 -0800)]
Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfs idmapping updates from Christian Brauner:
- Last cycle we introduced the dedicated struct mnt_idmap type for
mount idmapping and the required infrastucture in 256c8aed2b42 ("fs:
introduce dedicated idmap type for mounts"). As promised in last
cycle's pull request message this converts everything to rely on
struct mnt_idmap.
Currently we still pass around the plain namespace that was attached
to a mount. This is in general pretty convenient but it makes it easy
to conflate namespaces that are relevant on the filesystem with
namespaces that are relevant on the mount level. Especially for
non-vfs developers without detailed knowledge in this area this was a
potential source for bugs.
This finishes the conversion. Instead of passing the plain namespace
around this updates all places that currently take a pointer to a
mnt_userns with a pointer to struct mnt_idmap.
Now that the conversion is done all helpers down to the really
low-level helpers only accept a struct mnt_idmap argument instead of
two namespace arguments.
Conflating mount and other idmappings will now cause the compiler to
complain loudly thus eliminating the possibility of any bugs. This
makes it impossible for filesystem developers to mix up mount and
filesystem idmappings as they are two distinct types and require
distinct helpers that cannot be used interchangeably.
Everything associated with struct mnt_idmap is moved into a single
separate file. With that change no code can poke around in struct
mnt_idmap. It can only be interacted with through dedicated helpers.
That means all filesystems are and all of the vfs is completely
oblivious to the actual implementation of idmappings.
We are now also able to extend struct mnt_idmap as we see fit. For
example, we can decouple it completely from namespaces for users that
don't require or don't want to use them at all. We can also extend
the concept of idmappings so we can cover filesystem specific
requirements.
In combination with the vfs{g,u}id_t work we finished in v6.2 this
makes this feature substantially more robust and thus difficult to
implement wrong by a given filesystem and also protects the vfs.
- Enable idmapped mounts for tmpfs and fulfill a longstanding request.
A long-standing request from users had been to make it possible to
create idmapped mounts for tmpfs. For example, to share the host's
tmpfs mount between multiple sandboxes. This is a prerequisite for
some advanced Kubernetes cases. Systemd also has a range of use-cases
to increase service isolation. And there are more users of this.
However, with all of the other work going on this was way down on the
priority list but luckily someone other than ourselves picked this
up.
As usual the patch is tiny as all the infrastructure work had been
done multiple kernel releases ago. In addition to all the tests that
we already have I requested that Rodrigo add a dedicated tmpfs
testsuite for idmapped mounts to xfstests. It is to be included into
xfstests during the v6.3 development cycle. This should add a slew of
additional tests.
* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
shmem: support idmapped mounts for tmpfs
fs: move mnt_idmap
fs: port vfs{g,u}id helpers to mnt_idmap
fs: port fs{g,u}id helpers to mnt_idmap
fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
fs: port i_{g,u}id_{needs_}update() to mnt_idmap
quota: port to mnt_idmap
fs: port privilege checking helpers to mnt_idmap
fs: port inode_owner_or_capable() to mnt_idmap
fs: port inode_init_owner() to mnt_idmap
fs: port acl to mnt_idmap
fs: port xattr to mnt_idmap
fs: port ->permission() to pass mnt_idmap
fs: port ->fileattr_set() to pass mnt_idmap
fs: port ->set_acl() to pass mnt_idmap
fs: port ->get_acl() to pass mnt_idmap
fs: port ->tmpfile() to pass mnt_idmap
fs: port ->rename() to pass mnt_idmap
fs: port ->mknod() to pass mnt_idmap
fs: port ->mkdir() to pass mnt_idmap
...
Yury Norov [Fri, 17 Feb 2023 01:39:08 +0000 (17:39 -0800)]
sched/topology: fix KASAN warning in hop_cmp()
Despite that prev_hop is used conditionally on cur_hop
is not the first hop, it's initialized unconditionally.
Because initialization implies dereferencing, it might happen
that the code dereferences uninitialized memory, which has been
spotted by KASAN. Fix it by reorganizing hop_cmp() logic.
Linus Torvalds [Mon, 20 Feb 2023 19:21:02 +0000 (11:21 -0800)]
Merge tag 'iversion-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull i_version updates from Jeff Layton:
"This overhauls how we handle i_version queries from nfsd.
Instead of having special routines and grabbing the i_version field
directly out of the inode in some cases, we've moved most of the
handling into the various filesystems' getattr operations. As a bonus,
this makes ceph's change attribute usable by knfsd as well.
This should pave the way for future work to make this value queryable
by userland, and to make it more resilient against rolling back on a
crash"
* tag 'iversion-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
nfsd: remove fetch_iversion export operation
nfsd: use the getattr operation to fetch i_version
nfsd: move nfsd4_change_attribute to nfsfh.c
ceph: report the inode version in getattr if requested
nfs: report the inode version in getattr if requested
vfs: plumb i_version handling into struct kstat
fs: clarify when the i_version counter must be updated
fs: uninline inode_query_iversion
Linus Torvalds [Mon, 20 Feb 2023 19:10:38 +0000 (11:10 -0800)]
Merge tag 'locks-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking updates from Jeff Layton:
"The main change here is that I've broken out most of the file locking
definitions into a new header file. I also went ahead and completed
the removal of locks_inode function"
* tag 'locks-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fs: remove locks_inode
filelock: move file locking definitions to separate header file
Linus Torvalds [Mon, 20 Feb 2023 19:02:05 +0000 (11:02 -0800)]
Merge tag 'tpm-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"In additon to bug fixes, these are noteworthy changes:
- In TPM I2C drivers, migrate from probe() to probe_new() (a new
driver model in I2C).
- TPM CRB: Pluton support
- Add duplicate hash detection to the blacklist keyring in order to
give more meaningful klog output than e.g. [1]"
Link: https://askubuntu.com/questions/1436856/ubuntu-22-10-blacklist-problem-blacklisting-hash-13-message-on-boot
* tag 'tpm-v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: add vendor flag to command code validation
tpm: Add reserved memory event log
tpm: Use managed allocation for bios event log
tpm: tis_i2c: Convert to i2c's .probe_new()
tpm: tpm_i2c_nuvoton: Convert to i2c's .probe_new()
tpm: tpm_i2c_infineon: Convert to i2c's .probe_new()
tpm: tpm_i2c_atmel: Convert to i2c's .probe_new()
tpm: st33zp24: Convert to i2c's .probe_new()
KEYS: asymmetric: Fix ECDSA use via keyctl uapi
certs: don't try to update blacklist keys
KEYS: Add new function key_create()
certs: make blacklisted hash available in klog
tpm_crb: Add support for CRB devices based on Pluton
crypto: certs: fix FIPS selftest dependency
Linus Torvalds [Mon, 20 Feb 2023 18:40:42 +0000 (10:40 -0800)]
Merge tag 'rust-6.3' of https://github.com/Rust-for-Linux/linux
Pull Rust updates from Miguel Ojeda:
"More core additions, getting closer to a point where the first Rust
modules can be upstreamed. The major ones being:
- Sync: new types 'Arc', 'ArcBorrow' and 'UniqueArc'.
- Types: new trait 'ForeignOwnable' and new type 'ScopeGuard'.
There is also a substantial removal in terms of lines:
- 'alloc' crate: remove the 'borrow' module (type 'Cow' and trait
'ToOwned')"
* tag 'rust-6.3' of https://github.com/Rust-for-Linux/linux:
rust: delete rust-project.json when running make clean
rust: MAINTAINERS: Add the zulip link
rust: types: implement `ForeignOwnable` for `Arc<T>`
rust: types: implement `ForeignOwnable` for the unit type
rust: types: implement `ForeignOwnable` for `Box<T>`
rust: types: introduce `ForeignOwnable`
rust: types: introduce `ScopeGuard`
rust: prelude: prevent doc inline of external imports
rust: sync: add support for dispatching on Arc and ArcBorrow.
rust: sync: introduce `UniqueArc`
rust: sync: allow type of `self` to be `ArcBorrow<T>`
rust: sync: introduce `ArcBorrow`
rust: sync: allow coercion from `Arc<T>` to `Arc<U>`
rust: sync: allow type of `self` to be `Arc<T>` or variants
rust: sync: add `Arc` for ref-counted allocations
rust: compiler_builtins: make stubs non-global
rust: alloc: remove the `borrow` module (`ToOwned`, `Cow`)
Mark Rutland [Mon, 20 Feb 2023 16:23:17 +0000 (16:23 +0000)]
arm64: fix .idmap.text assertion for large kernels
When building a kernel with many debug options enabled (which happens in
test configurations use by myself and syzbot), the kernel can become
large enough that portions of .text can be more than 128M away from
.idmap.text (which is placed inside the .rodata section). Where idmap
code branches into .text, the linker will place veneers in the
.idmap.text section to make those branches possible.
Unfortunately, as Ard reports, GNU LD has bseen observed to add 4K of
padding when adding such veneers, e.g.
This makes the __idmap_text_start .. __idmap_text_end region bigger than
the 4K we require it to fit within, and triggers an assertion in arm64's
vmlinux.lds.S, which breaks the build:
| LD .tmp_vmlinux.kallsyms1
| aarch64-linux-gnu-ld: ID map text too big or misaligned
| make[1]: *** [scripts/Makefile.vmlinux:35: vmlinux] Error 1
| make: *** [Makefile:1264: vmlinux] Error 2
Avoid this by using an `ADRP+ADD+BLR` sequence for branches out of
.idmap.text, which avoids the need for veneers. These branches are only
executed once per boot, and only when the MMU is on, so there should be
no noticeable performance penalty in replacing `BL` with `ADRP+ADD+BLR`.
At the same time, remove the "x" and "w" attributes when placing code in
.idmap.text, as these are not necessary, and this will prevent the
linker from assuming that it is safe to place PLTs into .idmap.text,
causing it to warn if and when there are out-of-range branches within
.idmap.text, e.g.
| LD .tmp_vmlinux.kallsyms1
| arch/arm64/kernel/head.o: in function `primary_entry':
| (.idmap.text+0x1c): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dcache_clean_poc' defined in .text section in arch/arm64/mm/cache.o
| arch/arm64/kernel/head.o: in function `init_el2':
| (.idmap.text+0x88): relocation truncated to fit: R_AARCH64_CALL26 against symbol `dcache_clean_poc' defined in .text section in arch/arm64/mm/cache.o
| make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1
| make: *** [Makefile:1252: vmlinux] Error 2
Thus, if future changes add out-of-range branches in .idmap.text, it
should be easy enough to identify those from the resulting linker
errors.
Linus Torvalds [Mon, 20 Feb 2023 17:27:39 +0000 (09:27 -0800)]
Merge tag 'remove-get_kernel_pages-for-6.3' of https://git.linaro.org/people/jens.wiklander/linux-tee
Pull TEE update from Jens Wiklander:
"Remove get_kernel_pages()
Vmalloc page support is removed from shm_get_kernel_pages() and the
get_kernel_pages() call is replaced by calls to get_page(). With no
remaining callers of get_kernel_pages() the function is removed"
[ This looks like it's just some random 'tee' cleanup, but the bigger
picture impetus for this is really to to to remove historical
confusion with mixed use of kernel virtual addresses and 'struct page'
pointers.
Kernel virtual pointers in the vmalloc space is then particularly
confusing - both for looking up a page pointer (when trying to then
unify a "virtual address or page" interface) and _particularly_ when
mixed with HIGHMEM support and the kmap*() family of remapping.
This is particularly true with HIGHMEM getting much less test coverage
with 32-bit architectures being increasingly legacy targets.
So we actively wanted to remove get_kernel_pages() to make sure nobody
else used it too, and thus the 'tee' part is "finally remove last
user".
See also commit 6647e76ab623 ("v4l2: don't fall back to follow_pfn()
if pin_user_pages_fast() fails") for a totally different version of a
conceptually similar "let's stop this confusion of different ways of
referring to memory". - Linus ]
* tag 'remove-get_kernel_pages-for-6.3' of https://git.linaro.org/people/jens.wiklander/linux-tee:
mm: Remove get_kernel_pages()
tee: Remove call to get_kernel_pages()
tee: Remove vmalloc page support
highmem: Enhance is_kmap_addr() to check kmap_local_page() mappings
Florian Fainelli [Fri, 17 Feb 2023 18:34:14 +0000 (10:34 -0800)]
net: bcmgenet: Support wake-up from s2idle
When we suspend into s2idle we also need to enable the interrupt line
that generates the MPD and HFB interrupts towards the host CPU interrupt
controller (typically the ARM GIC or MIPS L1) to make it exit s2idle.
When we suspend into other modes such as "standby" or "mem" we engage a
power management state machine which will gate off the CPU L1 controller
(priv->irq0) and ungate the side band wake-up interrupt (priv->wol_irq).
It is safe to have both enabled as wake-up sources because they are
mutually exclusive given any suspend mode.
Eric Dumazet [Fri, 17 Feb 2023 18:24:54 +0000 (18:24 +0000)]
scm: add user copy checks to put_cmsg()
This is a followup of commit 2558b8039d05 ("net: use a bounce
buffer for copying skb->mark")
x86 and powerpc define user_access_begin, meaning
that they are not able to perform user copy checks
when using user_write_access_begin() / unsafe_copy_to_user()
and friends [1]
Instead of waiting bugs to trigger on other arches,
add a check_object_size() in put_cmsg() to make sure
that new code tested on x86 with CONFIG_HARDENED_USERCOPY=y
will perform more security checks.
[1] We can not generically call check_object_size() from
unsafe_copy_to_user() because UACCESS is enabled at this point.
this is a pull request of 4 patches for net-next/master.
The first patch is by Yang Li and converts the ctucanfd driver to
devm_platform_ioremap_resource().
The last 3 patches are by Frank Jungclaus, target the esd_usb driver
and contains preparations for the upcoming support of the esd
CAN-USB/3 hardware.
====================
Horatiu Vultur [Fri, 17 Feb 2023 13:28:31 +0000 (14:28 +0100)]
net: lan966x: Use automatic selection of VCAP rule actionset
Since commit 81e164c4aec5 ("net: microchip: sparx5: Add automatic
selection of VCAP rule actionset") the VCAP API has the capability to
select automatically the actionset based on the actions that are attached
to the rule. So it is not needed anymore to hardcode the actionset in the
driver, therefore it is OK to remove this.
The first patch namespacify the setting. In the common case, once
proper isolation is in place in the main namespace, forwarding
to/from each child netns will allways happen on the desidered CPUs.
Any additional RPS stage inside the child namespace will not provide
additional isolation and could hurt performance badly if picking a
CPU on a remote node.
The 2nd patch adds more self-tests coverage.
====================
Paolo Abeni [Fri, 17 Feb 2023 12:28:49 +0000 (13:28 +0100)]
net: make default_rps_mask a per netns attribute
That really was meant to be a per netns attribute from the beginning.
The idea is that once proper isolation is in place in the main
namespace, additional demux in the child namespaces will be redundant.
Let's make child netns default rps mask empty by default.
To avoid bloating the netns with a possibly large cpumask, allocate
it on-demand during the first write operation.
Arnd Bergmann [Fri, 17 Feb 2023 09:58:06 +0000 (10:58 +0100)]
net: microchip: sparx5: reduce stack usage
The vcap_admin structures in vcap_api_next_lookup_advanced_test()
take several hundred bytes of stack frame, but when CONFIG_KASAN_STACK
is enabled, each one of them also has extra padding before and after
it, which ends up blowing the warning limit:
In file included from drivers/net/ethernet/microchip/vcap/vcap_api.c:3521:
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c: In function 'vcap_api_next_lookup_advanced_test':
drivers/net/ethernet/microchip/vcap/vcap_api_kunit.c:1954:1: error: the frame size of 1448 bytes is larger than 1400 bytes [-Werror=frame-larger-than=]
1954 | }
Reduce the total stack usage by replacing the five structures with
an array that only needs one pair of padding areas.
Fixes: 1f741f001160 ("net: microchip: sparx5: Add KUNIT tests for enabling/disabling chains") Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Arnd Bergmann [Fri, 17 Feb 2023 09:56:39 +0000 (10:56 +0100)]
sfc: use IS_ENABLED() checks for CONFIG_SFC_SRIOV
One local variable has become unused after a recent change:
drivers/net/ethernet/sfc/ef100_nic.c: In function 'ef100_probe_netdev_pf':
drivers/net/ethernet/sfc/ef100_nic.c:1155:21: error: unused variable 'net_dev' [-Werror=unused-variable]
struct net_device *net_dev = efx->net_dev;
^~~~~~~
The variable is still used in an #ifdef. Replace the #ifdef with
an if(IS_ENABLED()) check that lets the compiler see where it is
used, rather than adding another #ifdef.
This also fixes an uninitialized return value in ef100_probe_netdev_pf()
that gcc did not spot.
Fixes: 7e056e2360d9 ("sfc: obtain device mac address based on firmware handle for ef100") Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Devlink reload patchset introduced regression. ICE_VSI_LB wasn't
taken into account when doing default allocation. Fix it by adding a
case for ICE_VSI_LB in ice_vsi_alloc_def().
Geetha sowjanya [Fri, 17 Feb 2023 05:51:12 +0000 (11:21 +0530)]
octeontx2-af: Add NIX Errata workaround on CN10K silicon
This patch adds workaround for below 2 HW erratas
1. Due to improper clock gating, NIXRX may free the same
NPA buffer multiple times.. to avoid this, always enable
NIX RX conditional clock.
2. NIX FIFO does not get initialized on reset, if the SMQ
flush is triggered before the first packet is processed, it
will lead to undefined state. The workaround to perform SMQ
flush only if packet count is non-zero in MDQ.
Andrew Lunn [Fri, 17 Feb 2023 03:15:20 +0000 (04:15 +0100)]
net: phy: Read EEE abilities when using .features
A PHY driver can use a static integer value to indicate what link mode
features it supports, i.e, its abilities.. This is the old way, but
useful when dynamically determining the devices features does not
work, e.g. support of fibre.
EEE support has been moved into phydev->supported_eee. This needs to
be set otherwise the code assumes EEE is not supported. It is normally
set as part of reading the devices abilities. However if a static
integer value was used, the dynamic reading of the abilities is not
performed. Add a call to genphy_c45_read_eee_abilities() to read the
EEE abilities.
Fixes: 8b68710a3121 ("net: phy: start using genphy_c45_ethtool_get/set_eee()") Signed-off-by: Andrew Lunn <[email protected]> Reviewed-by: Oleksij Rempel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Mon, 20 Feb 2023 10:04:22 +0000 (10:04 +0000)]
Merge branch 'phydev-locks'
Andrew Lunn says:
====================
Add additional phydev locks
The phydev lock should be held when accessing members of phydev, or
calling into the driver. Some of the phy_ethtool_ functions are
missing locks. Add them. To avoid deadlock the marvell driver is
modified since it calls one of the functions which gain locks, which
would result in a deadlock.
The missing locks have not caused noticeable issues, so these patches
are for net-next.
====================
Andrew Lunn [Fri, 17 Feb 2023 03:07:13 +0000 (04:07 +0100)]
net: phy: marvell: Use the unlocked genphy_c45_ethtool_get_eee()
phy_ethtool_get_eee() is about to gain locking of the phydev lock.
This means it cannot be used within a PHY driver without causing a
deadlock. Swap to using genphy_c45_ethtool_get_eee() which assumes the
lock has already been taken.
Doug Berger [Thu, 16 Feb 2023 19:41:28 +0000 (11:41 -0800)]
net: bcmgenet: fix MoCA LED control
When the bcmgenet_mii_config() code was refactored it was missed
that the LED control for the MoCA interface got overwritten by
the port_ctrl value. Its previous programming is restored here.
Fixes: 4f8d81b77e66 ("net: bcmgenet: Refactor register access in bcmgenet_mii_config") Signed-off-by: Doug Berger <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Shigeru Yoshida [Thu, 16 Feb 2023 16:37:10 +0000 (01:37 +0900)]
l2tp: Avoid possible recursive deadlock in l2tp_tunnel_register()
When a file descriptor of pppol2tp socket is passed as file descriptor
of UDP socket, a recursive deadlock occurs in l2tp_tunnel_register().
This situation is reproduced by the following program:
int main(void)
{
int sock;
struct sockaddr_pppol2tp addr;
Manish Chopra [Thu, 16 Feb 2023 11:54:47 +0000 (03:54 -0800)]
qede: fix interrupt coalescing configuration
On default driver load device gets configured with unexpected
higher interrupt coalescing values instead of default expected
values as memory allocated from krealloc() is not supposed to
be zeroed out and may contain garbage values.
Fix this by allocating the memory of required size first with
kcalloc() and then use krealloc() to resize and preserve the
contents across down/up of the interface.
Xuan Zhuo [Thu, 16 Feb 2023 08:30:47 +0000 (16:30 +0800)]
xsk: support use vaddr as ring
When we try to start AF_XDP on some machines with long running time, due
to the machine's memory fragmentation problem, there is no sufficient
contiguous physical memory that will cause the start failure.
If the size of the queue is 8 * 1024, then the size of the desc[] is
8 * 1024 * 8 = 16 * PAGE, but we also add struct xdp_ring size, so it is
16page+. This is necessary to apply for a 4-order memory. If there are a
lot of queues, it is difficult to these machine with long running time.
Here, that we actually waste 15 pages. 4-Order memory is 32 pages, but
we only use 17 pages.
This patch replaces __get_free_pages() by vmalloc() to allocate memory
to solve these problems.
This will cause the data received by a victim connection to be cleared,
thus triggering an HTTP 400 error in the server.
This patch exchange the order between clear used and memset, add
barrier to ensure memory consistency.
Fixes: 1c5526968e27 ("net/smc: Clear memory when release and reuse buffer") Signed-off-by: D. Wythe <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
The reason here is that when the server tries to create a second link,
smc_llc_srv_add_link() has no protection and may add a new link to
link group. This breaks the security environment protected by
llc_conf_mutex.
Paolo Abeni [Mon, 20 Feb 2023 07:46:59 +0000 (08:46 +0100)]
Merge branch 'taprio-queuemaxsdu-fixes'
Vladimir Oltean says:
====================
taprio queueMaxSDU fixes
This fixes 3 issues noticed while attempting to reoffload the
dynamically calculated queueMaxSDU values. These are:
- Dynamic queueMaxSDU is not calculated correctly due to a lost patch
- Dynamically calculated queueMaxSDU needs to be clamped on the low end
- Dynamically calculated queueMaxSDU needs to be clamped on the high end
====================
Vladimir Oltean [Wed, 15 Feb 2023 22:46:32 +0000 (00:46 +0200)]
net/sched: taprio: dynamic max_sdu larger than the max_mtu is unlimited
It makes no sense to keep randomly large max_sdu values, especially if
larger than the device's max_mtu. These are visible in "tc qdisc show".
Such a max_sdu is practically unlimited and will cause no packets for
that traffic class to be dropped on enqueue.
Just set max_sdu_dynamic to U32_MAX, which in the logic below causes
taprio to save a max_frm_len of U32_MAX and a max_sdu presented to user
space of 0 (unlimited).
Vladimir Oltean [Wed, 15 Feb 2023 22:46:31 +0000 (00:46 +0200)]
net/sched: taprio: don't allow dynamic max_sdu to go negative after stab adjustment
The overhead specified in the size table comes from the user. With small
time intervals (or gates always closed), the overhead can be larger than
the max interval for that traffic class, and their difference is
negative.
What we want to happen is for max_sdu_dynamic to have the smallest
non-zero value possible (1) which means that all packets on that traffic
class are dropped on enqueue. However, since max_sdu_dynamic is u32, a
negative is represented as a large value and oversized dropping never
happens.
Use max_t with int to force a truncation of max_frm_len to no smaller
than dev->hard_header_len + 1, which in turn makes max_sdu_dynamic no
smaller than 1.
Fixes: fed87cc6718a ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Kurt Kanzenbach <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
Vladimir Oltean [Wed, 15 Feb 2023 22:46:30 +0000 (00:46 +0200)]
net/sched: taprio: fix calculation of maximum gate durations
taprio_calculate_gate_durations() depends on netdev_get_num_tc() and
this returns 0. So it calculates the maximum gate durations for no
traffic class.
I had tested the blamed commit only with another patch in my tree, one
which in the end I decided isn't valuable enough to submit ("net/sched:
taprio: mask off bits in gate mask that exceed number of TCs").
The problem is that having this patch threw off my testing. By moving
the netdev_set_num_tc() call earlier, we implicitly gave to
taprio_calculate_gate_durations() the information it needed.
Extract only the portion from the unsubmitted change which applies the
mqprio configuration to the netdev earlier.
David Howells [Wed, 15 Feb 2023 21:48:05 +0000 (21:48 +0000)]
rxrpc: Fix overproduction of wakeups to recvmsg()
Fix three cases of overproduction of wakeups:
(1) rxrpc_input_split_jumbo() conditionally notifies the app that there's
data for recvmsg() to collect if it queues some data - and then its
only caller, rxrpc_input_data(), goes and wakes up recvmsg() anyway.
Fix the rxrpc_input_data() to only do the wakeup in failure cases.
(2) If a DATA packet is received for a call by the I/O thread whilst
recvmsg() is busy draining the call's rx queue in the app thread, the
call will left on the recvmsg() queue for recvmsg() to pick up, even
though there isn't any data on it.
This can cause an unexpected recvmsg() with a 0 return and no MSG_EOR
set after the reply has been posted to a service call.
Fix this by discarding pending calls from the recvmsg() queue that
don't need servicing yet.
(3) Not-yet-completed calls get requeued after having data read from them,
even if they have no data to read.
Fix this by only requeuing them if they have data waiting on them; if
they don't, the I/O thread will requeue them when data arrives or they
fail.
Paolo Abeni [Mon, 20 Feb 2023 07:14:22 +0000 (08:14 +0100)]
Merge branch 'net-final-gsi-register-updates'
Alex Elder says:
====================
net: final GSI register updates
I believe this is the last set of changes required to allow IPA v5.0
to be supported. There is a little cleanup work remaining, but that
can happen in the next Linux release cycle. Otherwise we just need
config data and register definitions for IPA v5.0 (and DTS updates).
These are ready but won't be posted without further testing.
The first patch in this series fixes a minor bug in a patch just
posted, which I found too late. The second eliminates the GSI
memory "adjustment"; this was done previously to avoid/delay the
need to implement a more general way to define GSI register offsets.
Note that this patch causes "checkpatch" warnings due to indentation
that aligns with an open parenthesis.
The third patch makes use of the newly-defined register offsets, to
eliminate the need for a function that hid a few details. The next
modifies a different helper function to work properly for IPA v5.0+.
The fifth patch changes the way the event ring size is specified
based on how it's now done for IPA v5.0+. And the last defines a
new register required for IPA v5.0+.
====================
Alex Elder [Wed, 15 Feb 2023 19:53:52 +0000 (13:53 -0600)]
net: ipa: add HW_PARAM_4 GSI register
Starting at IPA v5.0, the number of event rings per EE is defined
in a field in a new HW_PARAM_4 GSI register rather than HW_PARAM_2.
Define this new register and its fields, and update the code that
checks the number of rings supported by hardware to use the proper
field based on IPA version.
Alex Elder [Wed, 15 Feb 2023 19:53:51 +0000 (13:53 -0600)]
net: ipa: support different event ring encoding
Starting with IPA v5.0, a channel's event ring index is encoded in
a field in the CH_C_CNTXT_1 GSI register rather than CH_C_CNTXT_0.
Define a new field ID for the former register and encode the event
ring in the appropriate register.
Alex Elder [Wed, 15 Feb 2023 19:53:50 +0000 (13:53 -0600)]
net: ipa: avoid setting an undefined field
The GSI channel protocol field in the CH_C_CNTXT_0 GSI register is
widened starting IPA v5.0, making the CHTYPE_PROTOCOL_MSB field
added in IPA v4.5 unnecessary. Update the code to reflect this.
Alex Elder [Wed, 15 Feb 2023 19:53:49 +0000 (13:53 -0600)]
net: ipa: kill ev_ch_e_cntxt_1_length_encode()
Now that we explicitly define each register field width there is no
need to have a special encoding function for the event ring length.
Add a field for this to the EV_CH_E_CNTXT_1 GSI register, and use it
in place of ev_ch_e_cntxt_1_length_encode() (which can be removed).
Alex Elder [Wed, 15 Feb 2023 19:53:48 +0000 (13:53 -0600)]
net: ipa: kill gsi->virt_raw
Starting at IPA v4.5, almost all GSI registers had their offsets
changed by a fixed amount (shifted downward by 0xd000). Rather than
defining offsets for all those registers dependent on version, an
adjustment was applied for most register accesses. This was
implemented in commit cdeee49f3ef7f ("net: ipa: adjust GSI register
addresses"). It was later modified to be a bit more obvious about
the adjusment, in commit 571b1e7e58ad3 ("net: ipa: use a separate
pointer for adjusted GSI memory").
We now are able to define every GSI register with its own offset, so
there's no need to implement this special adjustment.
So get rid of the "virt_raw" pointer, and just maintain "virt" as
the (non-adjusted) base address of I/O mapped GSI register memory.
Redefine the offsets of all GSI registers (other than the INTER_EE
ones, which were not subject to the adjustment) for IPA v4.5+,
subtracting 0xd000 from their defined offsets instead.
Move the ERROR_LOG and ERROR_LOG_CLR definitions further down in the
register definition files so all registers are defined in order of
their offset.
Alex Elder [Wed, 15 Feb 2023 19:53:47 +0000 (13:53 -0600)]
net: ipa: fix an incorrect assignment
I spotted an error in a patch posted this week, unfortunately just
after it got accepted. The effect of the bug is that time-based
interrupt moderation is disabled. This is not technically a bug,
but it is not what is intended. The problem is that a |= assignment
got implemented as a simple assignment, so the previously assigned
value was ignored.
Fixes: edc6158b18af ("net: ipa: define fields for event-ring related registers") Signed-off-by: Alex Elder <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
Linus Torvalds [Sun, 19 Feb 2023 01:57:16 +0000 (17:57 -0800)]
Merge tag 'x86-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for x86.
Revert the recent change to the MTRR code which aimed to support
SEV-SNP guests on Hyper-V. It caused a regression on XEN Dom0 kernels.
The underlying issue of MTTR (mis)handling in the x86 code needs some
deeper investigation and is definitely not 6.2 material"
* tag 'x86-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Revert 90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case")
Linus Torvalds [Sun, 19 Feb 2023 01:46:50 +0000 (17:46 -0800)]
Merge tag 'timers-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A fix for a long standing issue in the alarmtimer code.
Posix-timers armed with a short interval with an ignored signal result
in an unpriviledged DoS. Due to the ignored signal the timer switches
into self rearm mode. This issue had been "fixed" before but a rework
of the alarmtimer code 5 years ago lost that workaround.
There is no real good solution for this issue, which is also worked
around in the core posix-timer code in the same way, but it certainly
moved way up on the ever growing todo list"
* tag 'timers-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimer: Prevent starvation by small intervals and SIG_IGN
Linus Torvalds [Sun, 19 Feb 2023 01:38:18 +0000 (17:38 -0800)]
Merge tag 'irq-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single build fix for the PCI/MSI infrastructure.
The addition of the new alloc/free interfaces in this cycle forgot to
add stub functions for pci_msix_alloc_irq_at() and pci_msix_free_irq()
for the CONFIG_PCI_MSI=n case"
* tag 'irq-urgent-2023-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Provide missing stubs for CONFIG_PCI_MSI=n