Andrii Nakryiko [Thu, 18 Jan 2024 03:31:41 +0000 (19:31 -0800)]
bpf: enforce types for __arg_ctx-tagged arguments in global subprogs
Add enforcement of expected types for context arguments tagged with
arg:ctx (__arg_ctx) tag.
First, any program type will accept generic `void *` context type when
combined with __arg_ctx tag.
Besides accepting "canonical" struct names and `void *`, for a bunch of
program types for which program context is actually a named struct, we
allows a bunch of pragmatic exceptions to match real-world and expected
usage:
- for both kprobes and perf_event we allow `bpf_user_pt_regs_t *` as
canonical context argument type, where `bpf_user_pt_regs_t` is a
*typedef*, not a struct;
- for kprobes, we also always accept `struct pt_regs *`, as that's what
actually is passed as a context to any kprobe program;
- for perf_event, we resolve typedefs (unless it's `bpf_user_pt_regs_t`)
down to actual struct type and accept `struct pt_regs *`, or
`struct user_pt_regs *`, or `struct user_regs_struct *`, depending
on the actual struct type kernel architecture points `bpf_user_pt_regs_t`
typedef to; otherwise, canonical `struct bpf_perf_event_data *` is
expected;
- for raw_tp/raw_tp.w programs, `u64/long *` are accepted, as that's
what's expected with BPF_PROG() usage; otherwise, canonical
`struct bpf_raw_tracepoint_args *` is expected;
- tp_btf supports both `struct bpf_raw_tracepoint_args *` and `u64 *`
formats, both are coded as expections as tp_btf is actually a TRACING
program type, which has no canonical context type;
- iterator programs accept `struct bpf_iter__xxx *` structs, currently
with no further iterator-type specific enforcement;
- fentry/fexit/fmod_ret/lsm/struct_ops all accept `u64 *`;
- classic tracepoint programs, as well as syscall and freplace
programs allow any user-provided type.
In all other cases kernel will enforce exact match of struct name to
expected canonical type. And if user-provided type doesn't match that
expectation, verifier will emit helpful message with expected type name.
Note a bit unnatural way the check is done after processing all the
arguments. This is done to avoid conflict between bpf and bpf-next
trees. Once trees converge, a small follow up patch will place a simple
btf_validate_prog_ctx_type() check into a proper ARG_PTR_TO_CTX branch
(which bpf-next tree patch refactored already), removing duplicated
arg:ctx detection logic.
Andrii Nakryiko [Thu, 18 Jan 2024 03:31:40 +0000 (19:31 -0800)]
bpf: extract bpf_ctx_convert_map logic and make it more reusable
Refactor btf_get_prog_ctx_type() a bit to allow reuse of
bpf_ctx_convert_map logic in more than one places. Simplify interface by
returning btf_type instead of btf_member (field reference in BTF).
To do the above we need to touch and start untangling
btf_translate_to_vmlinux() implementation. We do the bare minimum to
not regress anything for btf_translate_to_vmlinux(), but its
implementation is very questionable for what it claims to be doing.
Mapping kfunc argument types to kernel corresponding types conceptually
is quite different from recognizing program context types. Fixing this
is out of scope for this change though.
Andrii Nakryiko [Thu, 18 Jan 2024 03:31:39 +0000 (19:31 -0800)]
libbpf: feature-detect arg:ctx tag support in kernel
Add feature detector of kernel-side arg:ctx (__arg_ctx) tag support. If
this is detected, libbpf will avoid doing any __arg_ctx-related BTF
rewriting and checks in favor of letting kernel handle this completely.
test_global_funcs/ctx_arg_rewrite subtest is adjusted to do the same
feature detection (albeit in much simpler, though round-about and
inefficient, way), and skip the tests. This is done to still be able to
execute this test on older kernels (like in libbpf CI).
Note, BPF token series ([0]) does a major refactor and code moving of
libbpf-internal feature detection "framework", so to avoid unnecessary
conflicts we keep newly added feature detection stand-alone with ad-hoc
result caching. Once things settle, there will be a small follow up to
re-integrate everything back and move code into its final place in
newly-added (by BPF token series) features.c file.
Hao Sun [Mon, 15 Jan 2024 08:20:28 +0000 (09:20 +0100)]
selftests/bpf: Add test for alu on PTR_TO_FLOW_KEYS
Add a test case for PTR_TO_FLOW_KEYS alu. Testing if alu with variable
offset on flow_keys is rejected. For the fixed offset success case, we
already have C code coverage to verify (e.g. via bpf_flow.c).
Hao Sun [Mon, 15 Jan 2024 08:20:27 +0000 (09:20 +0100)]
bpf: Reject variable offset alu on PTR_TO_FLOW_KEYS
For PTR_TO_FLOW_KEYS, check_flow_keys_access() only uses fixed off
for validation. However, variable offset ptr alu is not prohibited
for this ptr kind. So the variable offset is not checked.
Fix this by rejecting ptr alu with variable offset on flow_keys.
Applying the patch rejects the program with "R7 pointer arithmetic
on flow_keys prohibited".
This patch set fixes an issue in bpf_iter_udp that makes backward
progress and prevents the user space process from finishing. There is
a test at the end to reproduce the bug.
Please see individual patches for details.
v3:
- Fixed the iter_fd check and local_port check in the
patch 3 selftest. (Yonghong)
- Moved jhash2 to test_jhash.h in the patch 3. (Yonghong)
- Added explanation in the bucket selection in the patch 3. (Yonghong)
v2:
- Added patch 1 to fix another bug that goes back to
the previous bucket
- Simplify the fix in patch 2 to always reset iter->offset to 0
- Add a test case to close all udp_sk in a bucket while
in the middle of the iteration.
====================
Martin KaFai Lau [Fri, 12 Jan 2024 19:05:30 +0000 (11:05 -0800)]
selftests/bpf: Test udp and tcp iter batching
The patch adds a test to exercise the bpf_iter_udp batching
logic. It specifically tests the case that there are multiple
so_reuseport udp_sk in a bucket of the udp_table.
The test creates two sets of so_reuseport sockets and
each set on a different port. Meaning there will be
two buckets in the udp_table.
The test does the following:
1. read() 3 out of 4 sockets in the first bucket.
2. close() all sockets in the first bucket. This
will ensure the current bucket's offset in
the kernel does not affect the read() of the
following bucket.
3. read() all 4 sockets in the second bucket.
The test also reads one udp_sk at a time from
the bpf_iter_udp prog. The true case in
"do_test(..., bool onebyone)". This is the buggy case
that the previous patch fixed.
It also tests the "false" case in "do_test(..., bool onebyone)",
meaning the userspace reads the whole bucket. There is
no bug in this case but adding this test also while
at it.
Considering the way to have multiple tcp_sk in the same
bucket is similar (by using so_reuseport),
this patch also tests the bpf_iter_tcp even though the
bpf_iter_tcp batching logic works correctly.
Both IP v4 and v6 are exercising the same bpf_iter batching
code path, so only v6 is tested.
Martin KaFai Lau [Fri, 12 Jan 2024 19:05:29 +0000 (11:05 -0800)]
bpf: Avoid iter->offset making backward progress in bpf_iter_udp
There is a bug in the bpf_iter_udp_batch() function that stops
the userspace from making forward progress.
The case that triggers the bug is the userspace passed in
a very small read buffer. When the bpf prog does bpf_seq_printf,
the userspace read buffer is not enough to capture the whole bucket.
When the read buffer is not large enough, the kernel will remember
the offset of the bucket in iter->offset such that the next userspace
read() can continue from where it left off.
The kernel will skip the number (== "iter->offset") of sockets in
the next read(). However, the code directly decrements the
"--iter->offset". This is incorrect because the next read() may
not consume the whole bucket either and then the next-next read()
will start from offset 0. The net effect is the userspace will
keep reading from the beginning of a bucket and the process will
never finish. "iter->offset" must always go forward until the
whole bucket is consumed.
This patch fixes it by using a local variable "resume_offset"
and "resume_bucket". "iter->offset" is always reset to 0 before
it may be used. "iter->offset" will be advanced to the
"resume_offset" when it continues from the "resume_bucket" (i.e.
"state->bucket == resume_bucket"). This brings it closer to
the bpf_iter_tcp's offset handling which does not suffer
the same bug.
Martin KaFai Lau [Fri, 12 Jan 2024 19:05:28 +0000 (11:05 -0800)]
bpf: iter_udp: Retry with a larger batch size without going back to the previous bucket
The current logic is to use a default size 16 to batch the whole bucket.
If it is too small, it will retry with a larger batch size.
The current code accidentally does a state->bucket-- before retrying.
This goes back to retry with the previous bucket which has already
been done. This patch fixed it.
It is hard to create a selftest. I added a WARN_ON(state->bucket < 0),
forced a particular port to be hashed to the first bucket,
created >16 sockets, and observed the for-loop went back
to the "-1" bucket.
net: netdev_queue: netdev_txq_completed_mb(): fix wake condition
netif_txq_try_stop() uses "get_desc >= start_thrs" as the check for
the call to netif_tx_start_queue().
Use ">=" i netdev_txq_completed_mb(), too.
Fixes: c91c46de6bbc ("net: provide macros for commonly copied lockless queue stop/wake code") Signed-off-by: Marc Kleine-Budde <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Eric Dumazet [Fri, 12 Jan 2024 12:28:16 +0000 (12:28 +0000)]
net: add more sanity check in virtio_net_hdr_to_skb()
syzbot/KMSAN reports access to uninitialized data from gso_features_check() [1]
The repro use af_packet, injecting a gso packet and hdrlen == 0.
We could fix the issue making gso_features_check() more careful
while dealing with NETIF_F_TSO_MANGLEID in fast path.
Or we can make sure virtio_net_hdr_to_skb() pulls minimal network and
transport headers as intended.
Note that for GSO packets coming from untrusted sources, SKB_GSO_DODGY
bit forces a proper header validation (and pull) before the packet can
hit any device ndo_start_xmit(), thus we do not need a precise disection
at virtio_net_hdr_to_skb() stage.
Jiri Pirko [Fri, 12 Jan 2024 11:39:30 +0000 (12:39 +0100)]
net: sched: track device in tcf_block_get/put_ext() only for clsact binder types
Clsact/ingress qdisc is not the only one using shared block,
red is also using it. The device tracking was originally introduced
by commit 913b47d3424e ("net/sched: Introduce tc block netdev
tracking infra") for clsact/ingress only. Commit 94e2557d086a ("net:
sched: move block device tracking into tcf_block_get/put_ext()")
mistakenly enabled that for red as well.
Fix that by adding a check for the binder type being clsact when adding
device to the block->ports xarray.
Reported-by: Ido Schimmel <[email protected]> Closes: https://lore.kernel.org/all/ZZ6JE0odnu1lLPtu@shredder/ Fixes: 94e2557d086a ("net: sched: move block device tracking into tcf_block_get/put_ext()") Signed-off-by: Jiri Pirko <[email protected]> Tested-by: Ido Schimmel <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Tested-by: Victor Nogueira <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Eric Dumazet [Fri, 12 Jan 2024 10:44:27 +0000 (10:44 +0000)]
udp: annotate data-races around up->pending
up->pending can be read without holding the socket lock,
as pointed out by syzbot [1]
Add READ_ONCE() in lockless contexts, and WRITE_ONCE()
on write side.
[1]
BUG: KCSAN: data-race in udpv6_sendmsg / udpv6_sendmsg
write to 0xffff88814e5eadf0 of 4 bytes by task 15547 on cpu 1:
udpv6_sendmsg+0x1405/0x1530 net/ipv6/udp.c:1596
inet6_sendmsg+0x63/0x80 net/ipv6/af_inet6.c:657
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x257/0x310 net/socket.c:2192
__do_sys_sendto net/socket.c:2204 [inline]
__se_sys_sendto net/socket.c:2200 [inline]
__x64_sys_sendto+0x78/0x90 net/socket.c:2200
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
read to 0xffff88814e5eadf0 of 4 bytes by task 15551 on cpu 0:
udpv6_sendmsg+0x22c/0x1530 net/ipv6/udp.c:1373
inet6_sendmsg+0x63/0x80 net/ipv6/af_inet6.c:657
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
____sys_sendmsg+0x37c/0x4d0 net/socket.c:2586
___sys_sendmsg net/socket.c:2640 [inline]
__sys_sendmmsg+0x269/0x500 net/socket.c:2726
__do_sys_sendmmsg net/socket.c:2755 [inline]
__se_sys_sendmmsg net/socket.c:2752 [inline]
__x64_sys_sendmmsg+0x57/0x60 net/socket.c:2752
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
value changed: 0x00000000 -> 0x0000000a
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 15551 Comm: syz-executor.1 Tainted: G W 6.7.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Sneh Shah [Tue, 9 Jan 2024 14:47:29 +0000 (20:17 +0530)]
net: stmmac: Fix ethool link settings ops for integrated PCS
Currently get/set_link_ksettings ethtool ops are dependent on PCS.
When PCS is integrated, it will not have separate link config.
Bypass configuring and checking PCS for integrated PCS.
Fixes: aa571b6275fb ("net: stmmac: add new switch to struct plat_stmmacenet_data") Tested-by: Andrew Halaney <[email protected]> # sa8775p-ride Signed-off-by: Sneh Shah <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Eric Dumazet [Thu, 11 Jan 2024 19:49:15 +0000 (19:49 +0000)]
mptcp: use OPTION_MPTCP_MPJ_SYNACK in subflow_finish_connect()
subflow_finish_connect() uses four fields (backup, join_id, thmac, none)
that may contain garbage unless OPTION_MPTCP_MPJ_SYNACK has been set
in mptcp_parse_option()
Claudiu Beznea [Fri, 5 Jan 2024 08:52:42 +0000 (10:52 +0200)]
net: phy: micrel: populate .soft_reset for KSZ9131
The RZ/G3S SMARC Module has 2 KSZ9131 PHYs. In this setup, the KSZ9131 PHY
is used with the ravb Ethernet driver. It has been discovered that when
bringing the Ethernet interface down/up continuously, e.g., with the
following sh script:
$ while :; do ifconfig eth0 down; ifconfig eth0 up; done
the link speed and duplex are wrong after interrupting the bring down/up
operation even though the Ethernet interface is up. To recover from this
state the following configuration sequence is necessary (executed
manually):
$ ifconfig eth0 down
$ ifconfig eth0 up
The behavior has been identified also on the Microchip SAMA7G5-EK board
which runs the macb driver and uses the same PHY.
The order of PHY-related operations in ravb_open() is as follows:
ravb_open() ->
ravb_phy_start() ->
ravb_phy_init() ->
of_phy_connect() ->
phy_connect_direct() ->
phy_attach_direct() ->
phy_init_hw() ->
phydev->drv->soft_reset()
phydev->drv->config_init()
phydev->drv->config_intr()
phy_resume()
kszphy_resume()
The order of PHY-related operations in ravb_close is as follows:
ravb_close() ->
phy_stop() ->
phy_suspend() ->
kszphy_suspend() ->
genphy_suspend()
// set BMCR_PDOWN bit in MII_BMCR
In genphy_suspend() setting the BMCR_PDWN bit in MII_BMCR switches the PHY
to Software Power-Down (SPD) mode (according to the KSZ9131 datasheet).
Thus, when opening the interface after it has been previously closed (via
ravb_close()), the phydev->drv->config_init() and
phydev->drv->config_intr() reach the KSZ9131 PHY driver via the
ksz9131_config_init() and kszphy_config_intr() functions.
KSZ9131 specifies that the MII management interface remains operational
during SPD (Software Power-Down), but (according to manual):
- Only access to the standard registers (0 through 31) is supported.
- Access to MMD address spaces other than MMD address space 1 is possible
if the spd_clock_gate_override bit is set.
- Access to MMD address space 1 is not possible.
The spd_clock_gate_override bit is not used in the KSZ9131 driver.
ksz9131_config_init() configures RGMII delay, pad skews and LEDs by
accessesing MMD registers other than those in address space 1.
The datasheet for the KSZ9131 does not specify what happens if registers
from an unsupported address space are accessed while the PHY is in SPD.
To fix the issue the .soft_reset method has been instantiated for KSZ9131,
too. This resets the PHY to the default state before doing any
configurations to it, thus switching it out of SPD.
Fixes: bff5b4b37372 ("net: phy: micrel: add Microchip KSZ9131 initial driver") Signed-off-by: Claudiu Beznea <[email protected]> Reviewed-by: Maxime Chevallier <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Horatiu Vultur [Wed, 10 Jan 2024 11:37:30 +0000 (12:37 +0100)]
net: micrel: Fix PTP frame parsing for lan8841
The HW has the capability to check each frame if it is a PTP frame,
which domain it is, which ptp frame type it is, different ip address in
the frame. And if one of these checks fail then the frame is not
timestamp. Most of these checks were disabled except checking the field
minorVersionPTP inside the PTP header. Meaning that once a partner sends
a frame compliant to 8021AS which has minorVersionPTP set to 1, then the
frame was not timestamp because the HW expected by default a value of 0
in minorVersionPTP.
Fix this issue by removing this check so the userspace can decide on this.
Taehee Yoo [Sun, 7 Jan 2024 14:42:41 +0000 (14:42 +0000)]
amt: do not use overwrapped cb area
amt driver uses skb->cb for storing tunnel information.
This job is worked before TC layer and then amt driver load tunnel info
from skb->cb after TC layer.
So, its cb area should not be overwrapped with CB area used by TC.
In order to not use cb area used by TC, it skips the biggest cb
structure used by TC, which was qdisc_skb_cb.
But it's not anymore.
Currently, biggest structure of TC's CB is tc_skb_cb.
So, it should skip size of tc_skb_cb instead of qdisc_skb_cb.
====================
net: ethernet: ti: am65-cpsw: Allow for MTU values
The am65-cpsw-nuss driver has a fixed definition for the maximum ethernet
frame length of 1522 bytes (AM65_CPSW_MAX_PACKET_SIZE). This limits the switch
ports to only operate at a maximum MTU of 1500 bytes. When combining this CPSW
switch with a DSA switch connected to one of its ports this limitation shows up.
The extra 8 bytes the DSA subsystem adds internally to the ethernet frame
create resulting frames bigger than 1522 bytes (1518 for non VLAN + 8 for DSA
stuff) so they get dropped by the switch.
One of the issues with the the am65-cpsw-nuss driver is that the network device
max_mtu was being set to the same fixed value defined for the max total frame
length (1522 bytes). This makes the DSA subsystem believe that the MTU of the
interface can be set to 1508 bytes to make room for the extra 8 bytes of the DSA
headers. However, all packages created assuming the 1500 bytes payload get
dropped by the switch as oversized.
This series offers a solution to this problem. The max_mtu advertised on the
network device and the actual max frame size configured on the switch registers
are made consistent by letting the extra room needed for the ethernet headers
and the frame checksum (22 bytes including VLAN).
====================
net: ethernet: ti: am65-cpsw: Fix max mtu to fit ethernet frames
The value of AM65_CPSW_MAX_PACKET_SIZE represents the maximum length
of a received frame. This value is written to the register
AM65_CPSW_PORT_REG_RX_MAXLEN.
The maximum MTU configured on the network device should then leave
some room for the ethernet headers and frame check. Otherwise, if
the network interface is configured to its maximum mtu possible,
the frames will be larger than AM65_CPSW_MAX_PACKET_SIZE and will
get dropped as oversized.
The switch supports ethernet frame sizes between 64 and 2024 bytes
(including VLAN) as stated in the technical reference manual, so
define AM65_CPSW_MAX_PACKET_SIZE with that maximum size.
Zhu Yanjun [Thu, 4 Jan 2024 02:09:02 +0000 (10:09 +0800)]
virtio_net: Fix "‘%d’ directive writing between 1 and 11 bytes into a region of size 10" warnings
Fix the warnings when building virtio_net driver.
"
drivers/net/virtio_net.c: In function ‘init_vqs’:
drivers/net/virtio_net.c:4551:48: warning: ‘%d’ directive writing between 1 and 11 bytes into a region of size 10 [-Wformat-overflow=]
4551 | sprintf(vi->rq[i].name, "input.%d", i);
| ^~
In function ‘virtnet_find_vqs’,
inlined from ‘init_vqs’ at drivers/net/virtio_net.c:4645:8:
drivers/net/virtio_net.c:4551:41: note: directive argument in the range [-2147483643, 65534]
4551 | sprintf(vi->rq[i].name, "input.%d", i);
| ^~~~~~~~~~
drivers/net/virtio_net.c:4551:17: note: ‘sprintf’ output between 8 and 18 bytes into a destination of size 16
4551 | sprintf(vi->rq[i].name, "input.%d", i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/virtio_net.c: In function ‘init_vqs’:
drivers/net/virtio_net.c:4552:49: warning: ‘%d’ directive writing between 1 and 11 bytes into a region of size 9 [-Wformat-overflow=]
4552 | sprintf(vi->sq[i].name, "output.%d", i);
| ^~
In function ‘virtnet_find_vqs’,
inlined from ‘init_vqs’ at drivers/net/virtio_net.c:4645:8:
drivers/net/virtio_net.c:4552:41: note: directive argument in the range [-2147483643, 65534]
4552 | sprintf(vi->sq[i].name, "output.%d", i);
| ^~~~~~~~~~~
drivers/net/virtio_net.c:4552:17: note: ‘sprintf’ output between 9 and 19 bytes into a destination of size 16
4552 | sprintf(vi->sq[i].name, "output.%d", i);
octeontx2-af: CN10KB: Fix FIFO length calculation for RPM2
RPM0 and RPM1 on the CN10KB SoC have 8 LMACs each, whereas RPM2
has only 4 LMACs. Similarly, the RPM0 and RPM1 have 256KB FIFO,
whereas RPM2 has 128KB FIFO. This patch fixes an issue with
improper TX credit programming for the RPM2 link.
====================
rtnetlink: allow to enslave with one msg an up interface
The first patch fixes a regression, introduced in linux v6.1, by reverting
a patch. The second patch adds a test to verify this API.
====================
The patch broke:
> ip link set dummy0 up
> ip link set dummy0 master bond0 down
This last command is useful to be able to enslave an interface with only
one netlink message.
After discussion, there is no good reason to support:
> ip link set dummy0 down
> ip link set dummy0 master bond0 up
because the bond interface already set the slave up when it is up.
David Howells [Tue, 9 Jan 2024 15:10:48 +0000 (15:10 +0000)]
rxrpc: Fix use of Don't Fragment flag
rxrpc normally has the Don't Fragment flag set on the UDP packets it
transmits, except when it has decided that DATA packets aren't getting
through - in which case it turns it off just for the DATA transmissions.
This can be a problem, however, for RESPONSE packets that convey
authentication and crypto data from the client to the server as ticket may
be larger than can fit in the MTU.
In such a case, rxrpc gets itself into an infinite loop as the sendmsg
returns an error (EMSGSIZE), which causes rxkad_send_response() to return
-EAGAIN - and the CHALLENGE packet is put back on the Rx queue to retry,
leading to the I/O thread endlessly attempting to perform the transmission.
Fix this by disabling DF on RESPONSE packets for now. The use of DF and
best data MTU determination needs reconsidering at some point in the
future.
Which is obviously bogus, because not all net_devices have a netdev_priv()
of type struct dsa_user_priv. But struct dsa_user_priv is fairly small,
and p->dp means dereferencing 8 bytes starting with offset 16. Most
drivers allocate that much private memory anyway, making our access not
fault, and we discard the bogus data quickly afterwards, so this wasn't
caught.
But the dummy interface is somewhat special in that it calls
alloc_netdev() with a priv size of 0. So every netdev_priv() dereference
is invalid, and we get this when we emit a NETDEV_PRECHANGEUPPER event
with a VLAN as its new upper:
$ ip link add dummy1 type dummy
$ ip link add link dummy1 name dummy1.100 type vlan id 100
[ 43.309174] ==================================================================
[ 43.316456] BUG: KASAN: slab-out-of-bounds in dsa_user_prechangeupper+0x30/0xe8
[ 43.323835] Read of size 8 at addr ffff3f86481d2990 by task ip/374
[ 43.330058]
[ 43.342436] Call trace:
[ 43.366542] dsa_user_prechangeupper+0x30/0xe8
[ 43.371024] dsa_user_netdevice_event+0xb38/0xee8
[ 43.375768] notifier_call_chain+0xa4/0x210
[ 43.379985] raw_notifier_call_chain+0x24/0x38
[ 43.384464] __netdev_upper_dev_link+0x3ec/0x5d8
[ 43.389120] netdev_upper_dev_link+0x70/0xa8
[ 43.393424] register_vlan_dev+0x1bc/0x310
[ 43.397554] vlan_newlink+0x210/0x248
[ 43.401247] rtnl_newlink+0x9fc/0xe30
[ 43.404942] rtnetlink_rcv_msg+0x378/0x580
Avoid the kernel oops by dereferencing after the type check, as customary.
Lin Ma [Wed, 10 Jan 2024 06:14:00 +0000 (14:14 +0800)]
net: qualcomm: rmnet: fix global oob in rmnet_policy
The variable rmnet_link_ops assign a *bigger* maxtype which leads to a
global out-of-bounds read when parsing the netlink attributes. See bug
trace below:
==================================================================
BUG: KASAN: global-out-of-bounds in validate_nla lib/nlattr.c:386 [inline]
BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x24af/0x2750 lib/nlattr.c:600
Read of size 1 at addr ffffffff92c438d0 by task syz-executor.6/84207
The intel test robot uses only selftest's Makefile, not the top linux
Makefile:
> make W=1 O=/tmp/kselftest -C tools/testing/selftests
So, $(LINK.c) is determined by environment, rather than by kernel
Makefiles. On my machine (as well as other people that ran tcp-ao
selftests) GNU/Make implicit definition does use $(LDFLAGS):
But, according to build robot report, it's not the case for them.
While I could just avoid using pre-defined $(LINK.c), it's also used by
selftests/lib.mk by default.
Anyways, according to GNU/Make documentation [1], I should have used
$(LDLIBS) instead of $(LDFLAGS) in the first place, so let's just do it:
> LDFLAGS
> Extra flags to give to compilers when they are supposed to invoke
> the linker, ‘ld’, such as -L. Libraries (-lfoo) should be added
> to the LDLIBS variable instead.
> LDLIBS
> Library flags or names given to compilers when they are supposed
> to invoke the linker, ‘ld’. LOADLIBES is a deprecated (but still
> supported) alternative to LDLIBS. Non-library linker flags, such
> as -L, should go in the LDFLAGS variable.
Arnd Bergmann [Thu, 11 Jan 2024 16:27:53 +0000 (17:27 +0100)]
wangxunx: select CONFIG_PHYLINK where needed
The ngbe driver needs phylink:
arm-linux-gnueabi-ld: drivers/net/ethernet/wangxun/libwx/wx_ethtool.o: in function `wx_nway_reset':
wx_ethtool.c:(.text+0x458): undefined reference to `phylink_ethtool_nway_reset'
arm-linux-gnueabi-ld: drivers/net/ethernet/wangxun/ngbe/ngbe_main.o: in function `ngbe_remove':
ngbe_main.c:(.text+0x7c): undefined reference to `phylink_destroy'
arm-linux-gnueabi-ld: drivers/net/ethernet/wangxun/ngbe/ngbe_main.o: in function `ngbe_open':
ngbe_main.c:(.text+0xf90): undefined reference to `phylink_connect_phy'
arm-linux-gnueabi-ld: drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.o: in function `ngbe_mdio_init':
ngbe_mdio.c:(.text+0x314): undefined reference to `phylink_create'
Jakub Kicinski [Tue, 9 Jan 2024 16:45:17 +0000 (08:45 -0800)]
MAINTAINERS: ibmvnic: drop Dany from reviewers
I missed that Dany uses a different email address
when tagging patches ([email protected])
and asked him if he's still actively working on ibmvnic.
He doesn't really fall under our removal criteria,
but he admitted that he already moved on to other projects.
Jakub Kicinski [Tue, 9 Jan 2024 16:45:16 +0000 (08:45 -0800)]
MAINTAINERS: mark ax25 as Orphan
We haven't heard from Ralf for two years, according to lore.
We get a constant stream of "fixes" to ax25 from people using
code analysis tools. Nobody is reviewing those, let's reflect
this reality in MAINTAINERS.
Jakub Kicinski [Tue, 9 Jan 2024 16:45:15 +0000 (08:45 -0800)]
MAINTAINERS: Bluetooth: retire Johan (for now?)
Johan moved to maintaining the Zephyr Bluetooth stack,
and we haven't heard from him on the ML in 3 years
(according to lore), and seen any tags in git in 4 years.
Trade the MAINTAINER entry for CREDITS, we can revert
whenever Johan comes back to Linux hacking :)
Jakub Kicinski [Tue, 9 Jan 2024 16:45:12 +0000 (08:45 -0800)]
MAINTAINERS: eth: mt7530: move Landen Chao to CREDITS
mt7530 is a pretty active driver and last we have heard
from Landen Chao on the list was March. There were total
of 4 message from them in the last 2.5 years.
I think it's time to move to CREDITS.
Linus Torvalds [Thu, 11 Jan 2024 18:07:29 +0000 (10:07 -0800)]
Merge tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"The most interesting thing is probably the networking structs
reorganization and a significant amount of changes is around
self-tests.
Core & protocols:
- Analyze and reorganize core networking structs (socks, netdev,
netns, mibs) to optimize cacheline consumption and set up build
time warnings to safeguard against future header changes
This improves TCP performances with many concurrent connections up
to 40%
- Add page-pool netlink-based introspection, exposing the memory
usage and recycling stats. This helps indentify bad PP users and
possible leaks
- Refine TCP/DCCP source port selection to no longer favor even
source port at connect() time when IP_LOCAL_PORT_RANGE is set. This
lowers the time taken by connect() for hosts having many active
connections to the same destination
- Refactor the TCP bind conflict code, shrinking related socket
structs
- Refactor TCP SYN-Cookie handling, as a preparation step to allow
arbitrary SYN-Cookie processing via eBPF
- Tune optmem_max for 0-copy usage, increasing the default value to
128KB and namespecifying it
- Allow coalescing for cloned skbs coming from page pools, improving
RX performances with some common configurations
- Reduce extension header parsing overhead at GRO time
- Add bridge MDB bulk deletion support, allowing user-space to
request the deletion of matching entries
- Reorder nftables struct members, to keep data accessed by the
datapath first
- Introduce TC block ports tracking and use. This allows supporting
multicast-like behavior at the TC layer
- Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and
classifiers (RSVP and tcindex)
- More data-race annotations
- Extend the diag interface to dump TCP bound-only sockets
- Conditional notification of events for TC qdisc class and actions
- Support for WPAN dynamic associations with nearby devices, to form
a sub-network using a specific PAN ID
- Implement SMCv2.1 virtual ISM device support
- Add support for Batman-avd mulicast packet type
BPF:
- Tons of verifier improvements:
- BPF register bounds logic and range support along with a large
test suite
- log improvements
- complete precision tracking support for register spills
- track aligned STACK_ZERO cases as imprecise spilled registers.
This improves the verifier "instructions processed" metric from
single digit to 50-60% for some programs
- support for user's global BPF subprogram arguments with few
commonly requested annotations for a better developer
experience
- support tracking of BPF_JNE which helps cases when the compiler
transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the
like
- several fixes
- Add initial TX metadata implementation for AF_XDP with support in
mlx5 and stmmac drivers. Two types of offloads are supported right
now, that is, TX timestamp and TX checksum offload
- Fix kCFI bugs in BPF all forms of indirect calls from BPF into
kernel and from kernel into BPF work with CFI enabled. This allows
BPF to work with CONFIG_FINEIBT=y
- Change BPF verifier logic to validate global subprograms lazily
instead of unconditionally before the main program, so they can be
guarded using BPF CO-RE techniques
- Support uid/gid options when mounting bpffs
- Add a new kfunc which acquires the associated cgroup of a task
within a specific cgroup v1 hierarchy where the latter is
identified by its id
- Extend verifier to allow bpf_refcount_acquire() of a map value
field obtained via direct load which is a use-case needed in
sched_ext
- Add BPF link_info support for uprobe multi link along with bpftool
integration for the latter
- Support for VLAN tag in XDP hints
- Remove deprecated bpfilter kernel leftovers given the project is
developed in user-space (https://github.com/facebook/bpfilter)
Misc:
- Support for parellel TC self-tests execution
- Increase MPTCP self-tests coverage
- Updated the bridge documentation, including several so-far
undocumented features
- Convert all the net self-tests to run in unique netns, to avoid
random failures due to conflict and allow concurrent runs
- Add TCP-AO self-tests
- Add kunit tests for both cfg80211 and mac80211
- Autogenerate Netlink families documentation from YAML spec
- Add yml-gen support for fixed headers and recursive nests, the tool
can now generate user-space code for all genetlink families for
which we have specs
- A bunch of additional module descriptions fixes
- Catch incorrect freeing of pages belonging to a page pool
Driver API:
- Rust abstractions for network PHY drivers; do not cover yet the
full C API, but already allow implementing functional PHY drivers
in rust
- Introduce queue and NAPI support in the netdev Netlink interface,
allowing complete access to the device <> NAPIs <> queues
relationship
- Introduce notifications filtering for devlink to allow control
application scale to thousands of instances
- Improve PHY validation, requesting rate matching information for
each ethtool link mode supported by both the PHY and host
- Add support for ethtool symmetric-xor RSS hash
- ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD
platform
- Expose pin fractional frequency offset value over new DPLL generic
netlink attribute
- Convert older drivers to platform remove callback returning void
- Ethernet high-speed NICs:
- Intel (100G, ice, idpf):
- allow one by one port representors creation and removal
- add temperature and clock information reporting
- add get/set for ethtool's header split ringparam
- add again FW logging
- adds support switchdev hardware packet mirroring
- iavf: implement symmetric-xor RSS hash
- igc: add support for concurrent physical and free-running
timers
- i40e: increase the allowable descriptors
- nVidia/Mellanox:
- Preparation for Socket-Direct multi-dev netdev. That will
allow in future releases combining multiple PFs devices
attached to different NUMA nodes under the same netdev
- Broadcom (bnxt):
- TX completion handling improvements
- add basic ntuple filter support
- reduce MSIX vectors usage for MQPRIO offload
- add VXLAN support, USO offload and TX coalesce completion
for P7
- Marvell Octeon EP:
- xmit-more support
- add PF-VF mailbox support and use it for FW notifications
for VFs
- Wangxun (ngbe/txgbe):
- implement ethtool functions to operate pause param, ring
param, coalesce channel number and msglevel
- Netronome/Corigine (nfp):
- add flow-steering support
- support UDP segmentation offload
- Ethernet NICs embedded, slower, virtual:
- Xilinx AXI: remove duplicate DMA code adopting the dma engine
driver
- stmmac: add support for HW-accelerated VLAN stripping
- TI AM654x sw: add mqprio, frame preemption & coalescing
- gve: add support for non-4k page sizes.
- virtio-net: support dynamic coalescing moderation
- nVidia/Mellanox Ethernet datacenter switches:
- allow firmware upgrade without a reboot
- more flexible support for bridge flooding via the compressed
FID flooding mode
- Ethernet embedded switches:
- Microchip:
- fine-tune flow control and speed configurations in KSZ8xxx
- KSZ88X3: enable setting rmii reference
- Renesas:
- add jumbo frames support
- Marvell:
- 88E6xxx: add "eth-mac" and "rmon" stats support
- Ethernet PHYs:
- aquantia: add firmware load support
- at803x: refactor the driver to simplify adding support for more
chip variants
- NXP C45 TJA11xx: Add MACsec offload support
- Wifi:
- MediaTek (mt76):
- NVMEM EEPROM improvements
- mt7996 Extremely High Throughput (EHT) improvements
- mt7996 Wireless Ethernet Dispatcher (WED) support
- mt7996 36-bit DMA support
- Qualcomm (ath12k):
- support for a single MSI vector
- WCN7850: support AP mode
- Intel (iwlwifi):
- new debugfs file fw_dbg_clear
- allow concurrent P2P operation on DFS channels
- Bluetooth:
- QCA2066: support HFP offload
- ISO: more broadcast-related improvements
- NXP: better recovery in case receiver/transmitter get out of sync"
* tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits)
lan78xx: remove redundant statement in lan78xx_get_eee
lan743x: remove redundant statement in lan743x_ethtool_get_eee
bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer()
bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel()
bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter()
tcp: Revert no longer abort SYN_SENT when receiving some ICMP
Revert "mlx5 updates 2023-12-20"
Revert "net: stmmac: Enable Per DMA Channel interrupt"
ipvlan: Remove usage of the deprecated ida_simple_xx() API
ipvlan: Fix a typo in a comment
net/sched: Remove ipt action tests
net: stmmac: Use interrupt mode INTM=1 for per channel irq
net: stmmac: Add support for TX/RX channel interrupt
net: stmmac: Make MSI interrupt routine generic
dt-bindings: net: snps,dwmac: per channel irq
net: phy: at803x: make read_status more generic
net: phy: at803x: add support for cdt cross short test for qca808x
net: phy: at803x: refactor qca808x cable test get status function
net: phy: at803x: generalize cdt fault length function
net: ethernet: cortina: Drop TSO support
...
Linus Torvalds [Thu, 11 Jan 2024 02:18:20 +0000 (18:18 -0800)]
Merge tag 's390-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Add machine variable capacity information to /proc/sysinfo.
- Limit the waste of page tables and always align vmalloc area size and
base address on segment boundary.
- Fix a memory leak when an attempt to register interruption sub class
(ISC) for the adjunct-processor (AP) guest failed.
- Reset response code AP_RESPONSE_INVALID_GISA to understandable by
guest AP_RESPONSE_INVALID_ADDRESS in response to a failed
interruption sub class (ISC) registration attempt.
- Improve reaction to adjunct-processor (AP)
AP_RESPONSE_OTHERWISE_CHANGED response code when enabling interrupts
on behalf of a guest.
- Fix incorrect sysfs 'status' attribute of adjunct-processor (AP)
queue device bound to the vfio_ap device driver when the mediated
device is attached to a guest, but the queue device is not passed
through.
- Rework struct ap_card to hold the whole adjunct-processor (AP) card
hardware information. As result, all the ugly bit checks are replaced
by simple evaluations of the required bit fields.
- Improve handling of some weird scenarios between service element (SE)
host and SE guest with adjunct-processor (AP) pass-through support.
- Change local_ctl_set_bit() and local_ctl_clear_bit() so they return
the previous value of the to be changed control register. This is
useful if a bit is only changed temporarily and the previous content
needs to be restored.
- The kernel starts with machine checks disabled and is expected to
enable it once trap_init() is called. However the implementation
allows machine checks early. Consistently enable it in trap_init()
only.
- local_mcck_disable() and local_mcck_enable() assume that machine
checks are always enabled. Instead implement and use
local_mcck_save() and local_mcck_restore() to disable machine checks
and restore the previous state.
- Modification of floating point control (FPC) register of a traced
process using ptrace interface may lead to corruption of the FPC
register of the tracing process. Fix this.
- kvm_arch_vcpu_ioctl_set_fpu() allows to set the floating point
control (FPC) register in vCPU, but may lead to corruption of the FPC
register of the host process. Fix this.
- Use READ_ONCE() to read a vCPU floating point register value from the
memory mapped area. This avoids that, depending on code generation, a
different value is tested for validity than the one that is used.
- Get rid of test_fp_ctl(), since it is quite subtle to use it
correctly. Instead copy a new floating point control register value
into its save area and test the validity of the new value when
loading it.
- Remove superfluous save_fpu_regs() call.
- Remove s390 support for ARCH_WANTS_DYNAMIC_TASK_STRUCT. All machines
provide the vector facility since many years and the need to make the
task structure size dependent on the vector facility does not exist.
- Remove the "novx" kernel command line option, as the vector code runs
without any problems since many years.
- Add the vector facility to the z13 architecture level set (ALS). All
hypervisors support the vector facility since many years. This allows
compile time optimizations of the kernel.
- Get rid of MACHINE_HAS_VX and replace it with cpu_has_vx(). As
result, the compiled code will have less runtime checks and less
code.
- Convert pgste_get_lock() and pgste_set_unlock() ASM inlines to C.
- Convert the struct subchannel spinlock from pointer to member.
* tag 's390-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (24 commits)
Revert "s390: update defconfigs"
s390/cio: make sch->lock spinlock pointer a member
s390: update defconfigs
s390/mm: convert pgste locking functions to C
s390/fpu: get rid of MACHINE_HAS_VX
s390/als: add vector facility to z13 architecture level set
s390/fpu: remove "novx" option
s390/fpu: remove ARCH_WANTS_DYNAMIC_TASK_STRUCT support
KVM: s390: remove superfluous save_fpu_regs() call
s390/fpu: get rid of test_fp_ctl()
KVM: s390: use READ_ONCE() to read fpc register value
KVM: s390: fix setting of fpc register
s390/ptrace: handle setting of fpc register correctly
s390/nmi: implement and use local_mcck_save() / local_mcck_restore()
s390/nmi: consistently enable machine checks in trap_init()
s390/ctlreg: return old register contents when changing bits
s390/ap: handle outband SE bind state change
s390/ap: store TAPQ hwinfo in struct ap_card
s390/vfio-ap: fix sysfs status attribute for AP queue devices
s390/vfio-ap: improve reaction to response code 07 from PQAP(AQIC) command
...
Linus Torvalds [Thu, 11 Jan 2024 02:13:44 +0000 (18:13 -0800)]
Merge tag 'asm-generic-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanups from Arnd Bergmann:
"A series from Baoquan He cleans up the asm-generic/io.h to remove the
ioremap_uc() definition from everything except x86, which still needs
it for pre-PAT systems. This series notably contains a patch from
Jiaxun Yang that converts MIPS to use asm-generic/io.h like every
other architecture does, enabling future cleanups.
Some of my own patches fix -Wmissing-prototype warnings in
architecture specific code across several architectures. This is now
needed as the warning is enabled by default. There are still some
remaining warnings in minor platforms, but the series should catch
most of the widely used ones make them more consistent with one
another.
David McKay fixes a bug in __generic_cmpxchg_local() when this is used
on 64-bit architectures. This could currently only affect parisc64 and
sparc64.
Additional cleanups address from Linus Walleij, Uwe Kleine-König,
Thomas Huth, and Kefeng Wang help reduce unnecessary inconsistencies
between architectures"
* tag 'asm-generic-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: Fix 32 bit __generic_cmpxchg_local
Hexagon: Make pfn accessors statics inlines
ARC: mm: Make virt_to_pfn() a static inline
mips: remove extraneous asm-generic/iomap.h include
sparc: Use $(kecho) to announce kernel images being ready
arm64: vdso32: Define BUILD_VDSO32_64 to correct prototypes
csky: fix arch_jump_label_transform_static override
arch: add do_page_fault prototypes
arch: add missing prepare_ftrace_return() prototypes
arch: vdso: consolidate gettime prototypes
arch: include linux/cpu.h for trap_init() prototype
arch: fix asm-offsets.c building with -Wmissing-prototypes
arch: consolidate arch_irq_work_raise prototypes
hexagon: Remove CONFIG_HEXAGON_ARCH_VERSION from uapi header
asm/io: remove unnecessary xlate_dev_mem_ptr() and unxlate_dev_mem_ptr()
mips: io: remove duplicated codes
arch/*/io.h: remove ioremap_uc in some architectures
mips: add <asm-generic/io.h> including
Linus Torvalds [Thu, 11 Jan 2024 02:00:18 +0000 (18:00 -0800)]
Merge tag 'modules-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
"Just one cleanup and one documentation improvement change. No
functional changes"
* tag 'modules-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
kernel/module: improve documentation for try_module_get()
module: Remove redundant TASK_UNINTERRUPTIBLE
Linus Torvalds [Thu, 11 Jan 2024 01:44:36 +0000 (17:44 -0800)]
Merge tag 'sysctl-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
"To help make the move of sysctls out of kernel/sysctl.c not incur a
size penalty sysctl has been changed to allow us to not require the
sentinel, the final empty element on the sysctl array. Joel Granados
has been doing all this work.
In the v6.6 kernel we got the major infrastructure changes required to
support this. For v6.7 we had all arch/ and drivers/ modified to
remove the sentinel. For v6.8-rc1 we get a few more updates for fs/
directory only.
The kernel/ directory is left but we'll save that for v6.9-rc1 as
those patches are still being reviewed. After that we then can expect
also the removal of the no longer needed check for procname == NULL.
Let us recap the purpose of this work:
- this helps reduce the overall build time size of the kernel and run
time memory consumed by the kernel by about ~64 bytes per array
- the extra 64-byte penalty is no longer inncurred now when we move
sysctls out from kernel/sysctl.c to their own files
Thomas Weißschuh also sent a few cleanups, for v6.9-rc1 we expect to
see further work by Thomas Weißschuh with the constificatin of the
struct ctl_table.
Due to Joel Granados's work, and to help bring in new blood, I have
suggested for him to become a maintainer and he's accepted. So for
v6.9-rc1 I look forward to seeing him sent you a pull request for
further sysctl changes. This also removes Iurii Zaikin as a maintainer
as he has moved on to other projects and has had no time to help at
all"
* tag 'sysctl-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
sysctl: remove struct ctl_path
sysctl: delete unused define SYSCTL_PERM_EMPTY_DIR
coda: Remove the now superfluous sentinel elements from ctl_table array
sysctl: Remove the now superfluous sentinel elements from ctl_table array
fs: Remove the now superfluous sentinel elements from ctl_table array
cachefiles: Remove the now superfluous sentinel element from ctl_table array
sysclt: Clarify the results of selftest run
sysctl: Add a selftest for handling empty dirs
sysctl: Fix out of bounds access for empty sysctl registers
MAINTAINERS: Add Joel Granados as co-maintainer for proc sysctl
MAINTAINERS: remove Iurii Zaikin from proc sysctl
Linus Torvalds [Thu, 11 Jan 2024 00:43:55 +0000 (16:43 -0800)]
Merge tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs
Pull header cleanups from Kent Overstreet:
"The goal is to get sched.h down to a type only header, so the main
thing happening in this patchset is splitting out various _types.h
headers and dependency fixups, as well as moving some things out of
sched.h to better locations.
This is prep work for the memory allocation profiling patchset which
adds new sched.h interdepencencies"
* tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (51 commits)
Kill sched.h dependency on rcupdate.h
kill unnecessary thread_info.h include
Kill unnecessary kernel.h include
preempt.h: Kill dependency on list.h
rseq: Split out rseq.h from sched.h
LoongArch: signal.c: add header file to fix build error
restart_block: Trim includes
lockdep: move held_lock to lockdep_types.h
sem: Split out sem_types.h
uidgid: Split out uidgid_types.h
seccomp: Split out seccomp_types.h
refcount: Split out refcount_types.h
uapi/linux/resource.h: fix include
x86/signal: kill dependency on time.h
syscall_user_dispatch.h: split out *_types.h
mm_types_task.h: Trim dependencies
Split out irqflags_types.h
ipc: Kill bogus dependency on spinlock.h
shm: Slim down dependencies
workqueue: Split out workqueue_types.h
...
Linus Torvalds [Thu, 11 Jan 2024 00:34:17 +0000 (16:34 -0800)]
Merge tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs updates from Kent Overstreet:
- btree write buffer rewrite: instead of adding keys to the btree write
buffer at transaction commit time, we now journal them with a
different journal entry type and copy them from the journal to the
write buffer just prior to journal write.
This reduces the number of atomic operations on shared cachelines in
the transaction commit path and is a signicant performance
improvement on some workloads: multithreaded 4k random writes went
from ~650k iops to ~850k iops.
- Bring back optimistic spinning for six locks: the new implementation
doesn't use osq locks; instead we add to the lock waitlist as normal,
and then spin on the lock_acquired bit in the waitlist entry, _not_
the lock itself.
- New ioctls:
- BCH_IOCTL_DEV_USAGE_V2, which allows for new data types
- BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of
fsck but without mounting: useful for transparently using the
kernel version of fsck from 'bcachefs fsck' when the kernel
version is a better match for the on disk filesystem.
- BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported
yet, but the passes that are supported are fully featured - errors
may be corrected as normal.
The new ioctls use the new 'thread_with_file' abstraction for kicking
off a kthread that's tied to a file descriptor returned to userspace
via the ioctl.
- btree_paths within a btree_trans are now dynamically growable,
instead of being limited to 64. This is important for the
check_directory_structure phase of fsck, and also fixes some issues
we were having with btree path overflow in the reflink btree.
- Trigger refactoring; prep work for the upcoming disk space accounting
rewrite
- Numerous bugfixes :)
* tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (226 commits)
bcachefs: eytzinger0_find() search should be const
bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent()
bcachefs: fix simulateously upgrading & downgrading
bcachefs: Restart recovery passes more reliably
bcachefs: bch2_dump_bset() doesn't choke on u64s == 0
bcachefs: improve checksum error messages
bcachefs: improve validate_bset_keys()
bcachefs: print sb magic when relevant
bcachefs: __bch2_sb_field_to_text()
bcachefs: %pg is banished
bcachefs: Improve would_deadlock trace event
bcachefs: fsck_err()s don't need to manually check c->sb.version anymore
bcachefs: Upgrades now specify errors to fix, like downgrades
bcachefs: no thread_with_file in userspace
bcachefs: Don't autofix errors we can't fix
bcachefs: add missing bch2_latency_acct() call
bcachefs: increase max_active on io_complete_wq
bcachefs: add time_stats for btree_node_read_done()
bcachefs: don't clear accessed bit in btree node fill
bcachefs: Add an option to control btree node prefetching
...
Linus Torvalds [Thu, 11 Jan 2024 00:23:30 +0000 (16:23 -0800)]
Merge tag 'v6.8-rc-part1-smb-client' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Various smb client fixes, most related to better handling special file
types:
- Improve handling of special file types:
- performance improvement (better compounding and better caching
of readdir entries that are reparse points)
- extend support for creating special files (sockets, fifos,
block/char devices)
- fix renaming and hardlinking of reparse points
- extend support for creating symlinks with IO_REPARSE_TAG_SYMLINK
- Multichannel logging improvement
- Exception handling fix
- Minor cleanups"
* tag 'v6.8-rc-part1-smb-client' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number for cifs.ko
cifs: remove unneeded return statement
cifs: make cifs_chan_update_iface() a void function
cifs: delete unnecessary NULL checks in cifs_chan_update_iface()
cifs: get rid of dup length check in parse_reparse_point()
smb: client: stop revalidating reparse points unnecessarily
cifs: Pass unbyteswapped eof value into SMB2_set_eof()
smb3: Improve exception handling in allocate_mr_list()
cifs: fix in logging in cifs_chan_update_iface
smb: client: handle special files and symlinks in SMB3 POSIX
smb: client: cleanup smb2_query_reparse_point()
smb: client: allow creating symlinks via reparse points
smb: client: fix hardlinking of reparse points
smb: client: fix renaming of reparse points
smb: client: optimise reparse point querying
smb: client: allow creating special files via reparse points
smb: client: extend smb2_compound_op() to accept more commands
smb: client: Fix minor whitespace errors and warnings
Linus Torvalds [Thu, 11 Jan 2024 00:13:57 +0000 (16:13 -0800)]
Merge tag 'nfs-for-6.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull nfs client updates from Anna Schumaker:
"New Features:
- Always ask for type with READDIR
- Remove nfs_writepage()
Bugfixes:
- Fix a suspicious RCU usage warning
- Fix a blocklayoutdriver reference leak
- Fix the block driver's calculation of layoutget size
- Fix handling NFS4ERR_RETURNCONFLICT
- Fix _xprt_switch_find_current_entry()
- Fix v4.1 backchannel request timeouts
- Don't add zero-length pnfs block devices
- Use the parent cred in nfs_access_login_time()
Cleanups:
- A few improvements when dealing with referring calls from the
server
- Clean up various unused variables, struct fields, and function
calls
- Various tracepoint improvements"
* tag 'nfs-for-6.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (21 commits)
NFSv4.1: Use the nfs_client's rpc timeouts for backchannel
SUNRPC: Fixup v4.1 backchannel request timeouts
rpc_pipefs: Replace one label in bl_resolve_deviceid()
nfs: Remove writepage
NFS: drop unused nfs_direct_req bytes_left
pNFS: Fix the pnfs block driver's calculation of layoutget size
nfs: print fileid in lookup tracepoints
nfs: rename the nfs_async_rename_done tracepoint
nfs: add new tracepoint at nfs4 revalidate entry point
SUNRPC: fix _xprt_switch_find_current_entry logic
NFSv4.1/pnfs: Ensure we handle the error NFS4ERR_RETURNCONFLICT
NFSv4.1: if referring calls are complete, trust the stateid argument
NFSv4: Track the number of referring calls in struct cb_process_state
NFS: Use parent's objective cred in nfs_access_login_time()
NFSv4: Always ask for type with READDIR
pnfs/blocklayout: Don't add zero-length pnfs_block_dev
blocklayoutdriver: Fix reference leak of pnfs_device_node
SUNRPC: Fix a suspicious RCU usage warning
SUNRPC: Create a helper function for accessing the rpc_clnt's xprt_switch
SUNRPC: Remove unused function rpc_clnt_xprt_switch_put()
...
Linus Torvalds [Thu, 11 Jan 2024 00:09:14 +0000 (16:09 -0800)]
Merge tag 'ext4_for_linus-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Various ext4 bug fixes and cleanups. The fixes are mostly in the
fstrim and mballoc code paths.
Also enable dioread_nolock in the case where the block size is less
than the page size (dioread_nolock has been default in the bs == ps
case for quite some time)"
* tag 'ext4_for_linus-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix inconsistent between segment fstrim and full fstrim
ext4: fallback to complex scan if aligned scan doesn't work
ext4: convert ext4_da_do_write_end() to take a folio
ext4: allow for the last group to be marked as trimmed
ext4: move ext4_check_bdev_write_error() into nojournal mode
jbd2: abort journal when detecting metadata writeback error of fs dev
jbd2: remove unused 'JBD2_CHECKPOINT_IO_ERROR' and 'j_atomic_flags'
jbd2: replace journal state flag by checking errseq
jbd2: add errseq to detect client fs's bdev writeback error
ext4: improving calculation of 'fe_{len|start}' in mb_find_extent()
ext4: clarify handling of unwritten bh in __ext4_block_zero_page_range()
ext4: treat end of range as exclusive in ext4_zero_range()
ext4: enable dioread_nolock as default for bs < ps case
ext4: delete redundant calculations in ext4_mb_get_buddy_page_lock()
ext4: reduce unnecessary memory allocation in alloc_flex_gd()
ext4: avoid online resizing failures due to oversized flex bg
ext4: remove unnecessary check from alloc_flex_gd()
ext4: unify the type of flexbg_size to unsigned int
Linus Torvalds [Thu, 11 Jan 2024 00:06:58 +0000 (16:06 -0800)]
Merge tag 'unicode-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode
Pull unicode updates from Gabriel Krisman Bertazi:
"Other than the update to MAINTAINERS, this PR has only a fix to stop
ecryptfs from inadvertently mounting case-insensitive filesystems that
it cannot handle, which would otherwise caused post-mount failures"
* tag 'unicode-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
MAINTAINERS: update unicode maintainer e-mail address
ecryptfs: Reject casefold directory inodes
David Howells [Wed, 10 Jan 2024 21:11:40 +0000 (21:11 +0000)]
keys, dns: Fix size check of V1 server-list header
Fix the size check added to dns_resolver_preparse() for the V1 server-list
header so that it doesn't give EINVAL if the size supplied is the same as
the size of the header struct (which should be valid).
Linus Torvalds [Wed, 10 Jan 2024 19:37:38 +0000 (11:37 -0800)]
Merge tag 'tpmdd-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"Just a couple fixes and no new features"
* tag 'tpmdd-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: cr50: fix kernel-doc warning and spelling
tpm: nuvoton: Use i2c_get_match_data()
Linus Torvalds [Wed, 10 Jan 2024 19:03:52 +0000 (11:03 -0800)]
Merge tag 'hardening-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
- Introduce the param_unknown_fn type and other clean ups (Andy
Shevchenko)
- Various __counted_by annotations (Christophe JAILLET, Gustavo A. R.
Silva, Kees Cook)
- Add KFENCE test to LKDTM (Stephen Boyd)
- Various strncpy() refactorings (Justin Stitt)
- Fix qnx4 to avoid writing into the smaller of two overlapping buffers
- Various strlcpy() refactorings
* tag 'hardening-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
qnx4: Use get_directory_fname() in qnx4_match()
qnx4: Extract dir entry filename processing into helper
atags_proc: Add __counted_by for struct buffer and use struct_size()
tracing/uprobe: Replace strlcpy() with strscpy()
params: Fix multi-line comment style
params: Sort headers
params: Use size_add() for kmalloc()
params: Do not go over the limit when getting the string length
params: Introduce the param_unknown_fn type
lkdtm: Add kfence read after free crash type
nvme-fc: replace deprecated strncpy with strscpy
nvdimm/btt: replace deprecated strncpy with strscpy
nvme-fabrics: replace deprecated strncpy with strscpy
drm/modes: replace deprecated strncpy with strscpy_pad
afs: Add __counted_by for struct afs_acl and use struct_size()
VMCI: Annotate struct vmci_handle_arr with __counted_by
i40e: Annotate struct i40e_qvlist_info with __counted_by
HID: uhid: replace deprecated strncpy with strscpy
samples: Replace strlcpy() with strscpy()
SUNRPC: Replace strlcpy() with strscpy()
Ye Bin [Sat, 16 Dec 2023 01:09:19 +0000 (09:09 +0800)]
ext4: fix inconsistent between segment fstrim and full fstrim
Suppose we issue two FITRIM ioctls for ranges [0,15] and [16,31] with
mininum length of trimmed range set to 8 blocks. If we have say a range of
blocks 10-22 free, this range will not be trimmed because it straddles the
boundary of the two FITRIM ranges and neither part is big enough. This is a
bit surprising to some users that call FITRIM on smaller ranges of blocks
to limit impact on the system. Also XFS trims all free space extents that
overlap with the specified range so we are inconsistent among filesystems.
Let's change ext4_try_to_trim_range() to consider for trimming the whole
free space extent that straddles the end of specified range, not just the
part of it within the range.
Ojaswin Mujoo [Fri, 15 Dec 2023 11:19:50 +0000 (16:49 +0530)]
ext4: fallback to complex scan if aligned scan doesn't work
Currently in case the goal length is a multiple of stripe size we use
ext4_mb_scan_aligned() to find the stripe size aligned physical blocks.
In case we are not able to find any, we again go back to calling
ext4_mb_choose_next_group() to search for a different suitable block
group. However, since the linear search always begins from the start,
most of the times we end up with the same BG and the cycle continues.
With large fliesystems, the CPU can be stuck in this loop for hours
which can slow down the whole system. Hence, until we figure out a
better way to continue the search (rather than starting from beginning)
in ext4_mb_choose_next_group(), lets just fallback to
ext4_mb_complex_scan_group() in case aligned scan fails, as it is much
more likely to find the needed blocks.
ext4: convert ext4_da_do_write_end() to take a folio
There's nothing page-specific happening in ext4_da_do_write_end();
it's merely used for its refcount & lock, both of which are folio
properties. Saves four calls to compound_head().
ext4: allow for the last group to be marked as trimmed
The ext4 filesystem tracks the trim status of blocks at the group
level. When an entire group has been trimmed then it is marked as
such and subsequent trim invocations with the same minimum trim size
will not be attempted on that group unless it is marked as able to be
trimmed again such as when a block is freed.
Currently the last group can't be marked as trimmed due to incorrect
logic in ext4_last_grp_cluster(). ext4_last_grp_cluster() is supposed
to return the zero based index of the last cluster in a group. This is
then used by ext4_try_to_trim_range() to determine if the trim
operation spans the entire group and as such if the trim status of the
group should be recorded.
ext4_last_grp_cluster() takes a 0 based group index, thus the valid
values for grp are 0..(ext4_get_groups_count - 1). Any group index
less than (ext4_get_groups_count - 1) is not the last group and must
have EXT4_CLUSTERS_PER_GROUP(sb) clusters. For the last group we need
to calculate the number of clusters based on the number of blocks in
the group. Finally subtract 1 from the number of clusters as zero
based indexing is expected. Rearrange the function slightly to make
it clear what we are calculating and returning.
Reproducer:
// Create file system where the last group has fewer blocks than
// blocks per group
$ mkfs.ext4 -b 4096 -g 8192 /dev/nvme0n1 8191
$ mount /dev/nvme0n1 /mnt
Before Patch:
$ fstrim -v /mnt
/mnt: 25.9 MiB (27156480 bytes) trimmed
// Group not marked as trimmed so second invocation still discards blocks
$ fstrim -v /mnt
/mnt: 25.9 MiB (27156480 bytes) trimmed
After Patch:
fstrim -v /mnt
/mnt: 25.9 MiB (27156480 bytes) trimmed
// Group marked as trimmed so second invocation DOESN'T discard any blocks
fstrim -v /mnt
/mnt: 0 B (0 bytes) trimmed
Linus Torvalds [Wed, 10 Jan 2024 18:53:02 +0000 (10:53 -0800)]
Merge tag 'pstore-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore updates from Kees Cook:
- Do not allow misconfigured ECC sizes (Sergey Shtylyov)
- Allow for odd number of CPUs (Weichen Chen)
- Refactor error handling to use cleanup.h
* tag 'pstore-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore: inode: Use cleanup.h for struct pstore_private
pstore: inode: Use __free(pstore_iput) for inode allocations
pstore: inode: Convert mutex usage to guard(mutex)
pstore: inode: Convert kfree() usage to __free(kfree)
pstore: ram_core: fix possible overflow in persistent_ram_init_ecc()
pstore/ram: Fix crash when setting number of cpus to an odd number
Linus Torvalds [Wed, 10 Jan 2024 18:39:56 +0000 (10:39 -0800)]
Merge tag 'erofs-for-6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, we'd like to enable basic sub-page compressed data
support for Android ecosystem (for vendors to try out 16k page size
with 4k-block images in their compatibility mode) as well as container
images (so that 4k-block images can be parsed on arm64 cloud servers
using 64k page size.)
In addition, there are several bugfixes and cleanups as usual. All
commits have been in -next for a while and no potential merge conflict
is observed.
Summary:
- Add basic sub-page compressed data support
- Fix a memory leak on MicroLZMA and DEFLATE compression
- Fix a rare LZ4 inplace decompression issue on recent x86 CPUs
- Fix a KASAN issue reported by syzbot around crafted images
- Some cleanups"
* tag 'erofs-for-6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: make erofs_{err,info}() support NULL sb parameter
erofs: avoid debugging output for (de)compressed data
erofs: allow partially filled compressed bvecs
erofs: enable sub-page compressed block support
erofs: refine z_erofs_transform_plain() for sub-page block support
erofs: fix ztailpacking for subpage compressed blocks
erofs: fix up compacted indexes for block size < 4096
erofs: record `pclustersize` in bytes instead of pages
erofs: support I/O submission for sub-page compressed blocks
erofs: fix lz4 inplace decompression
erofs: fix memory leak on short-lived bounced pages
Linus Torvalds [Wed, 10 Jan 2024 18:24:49 +0000 (10:24 -0800)]
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux
Pull fscrypt updates from Eric Biggers:
"Adjust the timing of the fscrypt keyring destruction, to prepare for
btrfs's fscrypt support.
Also document that CephFS supports fscrypt now"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fs: move fscrypt keyring destruction to after ->put_super
f2fs: move release of block devices to after kill_block_super()
fscrypt: document that CephFS supports fscrypt now
fscrypt: update comment for do_remove_key()
fscrypt.rst: update definition of struct fscrypt_context_v2
Linus Torvalds [Wed, 10 Jan 2024 18:20:08 +0000 (10:20 -0800)]
Merge tag 'nfsd-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"The bulk of the patches for this release are clean-ups and minor bug
fixes.
There is one significant revert to mention: support for RDMA Read
operations in the server's RPC-over-RDMA transport implementation has
been fixed so it waits for Read completion in a way that avoids tying
up an nfsd thread. This prevents a possible DoS vector if an
RPC-over-RDMA client should become unresponsive during RDMA Read
operations.
As always I am grateful to NFSD contributors, reviewers, and testers"
* tag 'nfsd-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (56 commits)
nfsd: rename nfsd_last_thread() to nfsd_destroy_serv()
SUNRPC: discard sv_refcnt, and svc_get/svc_put
svc: don't hold reference for poolstats, only mutex.
SUNRPC: remove printk when back channel request not found
svcrdma: Implement multi-stage Read completion again
svcrdma: Copy construction of svc_rqst::rq_arg to rdma_read_complete()
svcrdma: Add back svcxprt_rdma::sc_read_complete_q
svcrdma: Add back svc_rdma_recv_ctxt::rc_pages
svcrdma: Clean up comment in svc_rdma_accept()
svcrdma: Remove queue-shortening warnings
svcrdma: Remove pointer addresses shown in dprintk()
svcrdma: Optimize svc_rdma_cc_init()
svcrdma: De-duplicate completion ID initialization helpers
svcrdma: Move the svc_rdma_cc_init() call
svcrdma: Remove struct svc_rdma_read_info
svcrdma: Update the synopsis of svc_rdma_read_special()
svcrdma: Update the synopsis of svc_rdma_read_call_chunk()
svcrdma: Update synopsis of svc_rdma_read_multiple_chunks()
svcrdma: Update synopsis of svc_rdma_copy_inline_range()
svcrdma: Update the synopsis of svc_rdma_read_data_item()
...
Linus Torvalds [Wed, 10 Jan 2024 18:17:23 +0000 (10:17 -0800)]
Merge tag 'dlm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This set cleans up the interface between nfs lockd and dlm, which is
handling nfs file locking for gfs2 and ocfs2. Very basic lockd
functionality is fixed, in which the fl owner was using the lockd pid
instead of the owner value from nfs"
* tag 'dlm-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: update format header reflect current format
dlm: fix format seq ops type 4
dlm: implement EXPORT_OP_ASYNC_LOCK
dlm: use FL_SLEEP to determine blocking vs non-blocking
dlm: use fl_owner from lockd
dlm: use kernel_connect() and kernel_bind()
Linus Torvalds [Wed, 10 Jan 2024 18:11:01 +0000 (10:11 -0800)]
Merge tag 'afs-fix-rotation-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull afs updates from David Howells:
"The majority of the patches are aimed at fixing and improving the AFS
filesystem's rotation over server IP addresses, but there are also
some fixes from Oleg Nesterov for the use of read_seqbegin_or_lock().
- Fix fileserver probe handling so that the next round of probes
doesn't break ongoing server/address rotation by clearing all the
probe result tracking. This could occasionally cause the rotation
algorithm to drop straight through, give a 'successful' result
without actually emitting any RPC calls, leaving the reply buffer
in an undefined state.
Instead, detach the probe results into a separate struct and
allocate a new one each time we start probing and update the
pointer to it. Probes are also sent in order of address preference
to try and improve the chance that the preferred one will complete
first.
- Fix server rotation so that it uses configurable address
preferences across on the probes that have completed so far than
ranking them by RTT as the latter doesn't necessarily give the best
route. The preference list can be altered by writing into
/proc/net/afs/addr_prefs.
- Fix the handling of Read-Only (and Backup) volume callbacks as
there is one per volume, not one per file, so if someone performs a
command that, say, offlines the volume but doesn't change it, when
it comes back online we don't spam the server with a status fetch
for every vnode we're using. Instead, check the Creation timestamp
in the VolSync record when prompted by a callback break.
- Handle volume regression (ie. a RW volume being restored from a
backup) by scrubbing all cache data for that volume. This is
detected from the VolSync creation timestamp.
- Adjust abort handling and abort -> error mapping to match better
with what other AFS clients do.
- Fix offline and busy volume state handling as they only apply to
individual server instances and not entire volumes and the rotation
algorithm should go and look at other servers if available. Also
make it sleep briefly before each retry if all the volume instances
are unavailable"
* tag 'afs-fix-rotation-20240105' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (40 commits)
afs: trace: Log afs_make_call(), including server address
afs: Fix offline and busy message emission
afs: Fix fileserver rotation
afs: Overhaul invalidation handling to better support RO volumes
afs: Parse the VolSync record in the reply of a number of RPC ops
afs: Don't leave DONTUSE/NEWREPSITE servers out of server list
afs: Fix comment in afs_do_lookup()
afs: Apply server breaks to mmap'd files in the call processor
afs: Move the vnode/volume validity checking code into its own file
afs: Defer volume record destruction to a workqueue
afs: Make it possible to find the volumes that are using a server
afs: Combine the endpoint state bools into a bitmask
afs: Keep a record of the current fileserver endpoint state
afs: Dispatch vlserver probes in priority order
afs: Dispatch fileserver probes in priority order
afs: Mark address lists with configured priorities
afs: Provide a way to configure address priorities
afs: Remove the unimplemented afs_cmp_addr_list()
afs: Add some more info to /proc/net/afs/servers
rxrpc: Create a procfile to display outstanding client conn bundles
...
Linus Torvalds [Wed, 10 Jan 2024 18:04:36 +0000 (10:04 -0800)]
Merge tag 'jfs-6.8' of github.com:kleikamp/linux-shaggy
Pull jfs updates from David Kleikamp:
"Stability improvements"
* tag 'jfs-6.8' of github.com:kleikamp/linux-shaggy:
jfs: Add missing set_freezable() for freezable kthread
jfs: fix array-index-out-of-bounds in diNewExt
jfs: fix shift-out-of-bounds in dbJoin
jfs: fix uaf in jfs_evict_inode
jfs: fix array-index-out-of-bounds in dbAdjTree
jfs: fix slab-out-of-bounds Read in dtSearch
UBSAN: array-index-out-of-bounds in dtSplitRoot
FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTree
Linus Torvalds [Wed, 10 Jan 2024 17:36:40 +0000 (09:36 -0800)]
Merge tag 'gfs2-v6.7-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Add support for non-blocking lookup (MAY_NOT_BLOCK / LOOKUP_RCU)
- Various minor fixes and cleanups
* tag 'gfs2-v6.7-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix freeze consistency check in log_write_header
gfs2: Refcounting fix in gfs2_thaw_super
gfs2: Minor gfs2_{freeze,thaw}_super cleanup
gfs2: Use wait_event_freezable_timeout() for freezable kthread
gfs2: Add missing set_freezable() for freezable kthread
gfs2: Remove use of error flag in journal reads
gfs2: Lift withdraw check out of gfs2_ail1_empty
gfs2: Rename gfs2_withdrawn to gfs2_withdrawing_or_withdrawn
gfs2: Mark withdraws as unlikely
gfs2: Minor gfs2_ail1_empty cleanup
gfs2: use is_subdir()
gfs2: d_obtain_alias(ERR_PTR(...)) will do the right thing
gfs2: Use GL_NOBLOCK flag for non-blocking lookups
gfs2: Add GL_NOBLOCK flag
gfs2: rgrp: fix kernel-doc warnings
gfs2: fix kernel BUG in gfs2_quota_cleanup
gfs2: Fix inode_go_instantiate description
gfs2: Fix kernel NULL pointer dereference in gfs2_rgrp_dump
Linus Torvalds [Wed, 10 Jan 2024 17:27:40 +0000 (09:27 -0800)]
Merge tag 'for-6.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"There are no exciting changes for users, it's been mostly API
conversions and some fixes or refactoring.
The mount API conversion is a base for future improvements that would
come with VFS. Metadata processing has been converted to folios, not
yet enabling the large folios but it's one patch away once everything
gets tested enough.
Core changes:
- convert extent buffers to folios:
- direct API conversion where possible
- performance can drop by a few percent on metadata heavy
workloads, the folio sizes are not constant and the calculations
add up in the item helpers
- both regular and subpage modes
- data cannot be converted yet, we need to port that to iomap and
there are some other generic changes required
- convert mount to the new API, should not be user visible:
- options deprecated long time ago have been removed: inode_cache,
recovery
- the new logic that splits mount to two phases slightly changes
timing of device scanning for multi-device filesystems
- LSM options will now work (like for selinux)
- convert delayed nodes radix tree to xarray, preserving the
preload-like logic that still allows to allocate with GFP_NOFS
- more validation of sysfs value of scrub_speed_max
- refactor chunk map structure, reduce size and improve performance
- extent map refactoring, smaller data structures, improved
performance
- reduce size of struct extent_io_tree, embedded in several
structures
- temporary pages used for compression are cached and attached to a
shrinker, this may slightly improve performance
- in zoned mode, remove redirty extent buffer tracking, zeros are
written in case an out-of-order is detected and proper data are
written to the actual write pointer
* tag 'for-6.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (89 commits)
btrfs: pass btrfs_io_geometry into btrfs_max_io_len
btrfs: pass struct btrfs_io_geometry to set_io_stripe
btrfs: open code set_io_stripe for RAID56
btrfs: change block mapping to switch/case in btrfs_map_block
btrfs: factor out block mapping for single profiles
btrfs: factor out block mapping for RAID5/6
btrfs: reduce scope of data_stripes in btrfs_map_block
btrfs: factor out block mapping for RAID10
btrfs: factor out block mapping for DUP profiles
btrfs: factor out RAID1 block mapping
btrfs: factor out block-mapping for RAID0
btrfs: re-introduce struct btrfs_io_geometry
btrfs: factor out helper for single device IO check
btrfs: migrate btrfs_repair_io_failure() to folio interfaces
btrfs: migrate eb_bitmap_offset() to folio interfaces
btrfs: migrate various end io functions to folios
btrfs: migrate subpage code to folio interfaces
btrfs: migrate get_eb_page_index() and get_eb_offset_in_page() to folios
btrfs: don't double put our subpage reference in alloc_extent_buffer
btrfs: cleanup metadata page pointer usage
...
Linus Torvalds [Wed, 10 Jan 2024 16:45:22 +0000 (08:45 -0800)]
Merge tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Chandan Babu:
"New features/functionality:
- Online repair:
- Reserve disk space for online repairs
- Fix misinteraction between the AIL and btree bulkloader because
of which the bulk load fails to queue a buffer for writeback if
it happens to be on the AIL list
- Prevent transaction reservation overflows when reaping blocks
during online repair
- Whenever possible, bulkloader now copies multiple records into
a block
- Support repairing of
1. Per-AG free space, inode and refcount btrees
2. Ondisk inodes
3. File data and attribute fork mappings
- Verify the contents of
1. Inode and data fork of realtime bitmap file
2. Quota files
- Introduce MF_MEM_PRE_REMOVE. This will be used to notify tasks
about a pmem device being removed
Bug fixes:
- Fix memory leak of recovered attri intent items
- Fix UAF during log intent recovery
- Fix realtime geometry integer overflows
- Prevent scrub from live locking in xchk_iget
- Prevent fs shutdown when removing files during low free disk space
- Prevent transaction reservation overflow when extending an RT
device
- Prevent incorrect warning from being printed when extending a
filesystem
- Fix an off-by-one error in xreap_agextent_binval
- Serialize access to perag radix tree during deletion operation
- Fix perag memory leak during growfs
- Allow allocation of minlen realtime extent when the maximum sized
realtime free extent is minlen in size
Cleanups:
- Remove duplicate boilerplate code spread across functionality
associated with different log items
- Cleanup resblks interfaces
- Pass defer ops pointer to defer helpers instead of an enum
- Initialize di_crc in xfs_log_dinode to prevent KMSAN warnings
- Use static_assert() instead of BUILD_BUG_ON_MSG() to validate size
of structures and structure member offsets. This is done in order
to be able to share the code with userspace
- Move XFS documentation under a new directory specific to XFS
- Do not invoke deferred ops' ->create_done callback if the deferred
operation does not have an intent item associated with it
- Remove duplicate inclusion of header files from scrub/health.c
- Refactor Realtime code
- Cleanup attr code"
* tag 'xfs-6.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (123 commits)
xfs: use the op name in trace_xlog_intent_recovery_failed
xfs: fix a use after free in xfs_defer_finish_recovery
xfs: turn the XFS_DA_OP_REPLACE checks in xfs_attr_shortform_addname into asserts
xfs: remove xfs_attr_sf_hdr_t
xfs: remove struct xfs_attr_shortform
xfs: use xfs_attr_sf_findname in xfs_attr_shortform_getvalue
xfs: remove xfs_attr_shortform_lookup
xfs: simplify xfs_attr_sf_findname
xfs: move the xfs_attr_sf_lookup tracepoint
xfs: return if_data from xfs_idata_realloc
xfs: make if_data a void pointer
xfs: fold xfs_rtallocate_extent into xfs_bmap_rtalloc
xfs: simplify and optimize the RT allocation fallback cascade
xfs: reorder the minlen and prod calculations in xfs_bmap_rtalloc
xfs: remove XFS_RTMIN/XFS_RTMAX
xfs: remove rt-wrappers from xfs_format.h
xfs: factor out a xfs_rtalloc_sumlevel helper
xfs: tidy up xfs_rtallocate_extent_exact
xfs: merge the calls to xfs_rtallocate_range in xfs_rtallocate_block
xfs: reflow the tail end of xfs_rtallocate_extent_block
...
Linus Torvalds [Wed, 10 Jan 2024 16:38:33 +0000 (08:38 -0800)]
Merge tag 'fsnotify_for_v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"fanotify changes allowing use of fanotify directory events even for
filesystems such as FUSE which don't report proper fsid"
* tag 'fsnotify_for_v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: allow "weak" fsid when watching a single filesystem
fanotify: store fsid in mark instead of in connector
The root cause is that the printed decompressed buffer may be filled
incompletely due to decompression failure. Since they were once only
used for debugging, get rid of them now.
Linus Torvalds [Wed, 10 Jan 2024 01:28:41 +0000 (17:28 -0800)]
Merge tag 'linux_kselftest-next-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest update from Shuah Khan:
"Enhancements to reporting test results, fixes to root and user run
behavior and fixing ksft_print_msg() calls"
* tag 'linux_kselftest-next-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
tracing/selftests: Add ownership modification tests for eventfs
selftests: sched: Remove initialization to 0 for a static variable
selftests: capabilities: namespace create varies for root and normal user
selftests: prctl: Add prctl test for PR_GET_NAME
kselftest/vDSO: Use ksft_print_msg() rather than printf in vdso_test_abi
kselftest/vDSO: Fix message formatting for clock_id logging
kselftest/vDSO: Make test name reporting for vdso_abi_test tooling friendly
selftests:x86: Fix Format String Warnings in lam.c
selftests/breakpoints: Fix format specifier in ksft_print_msg in step_after_suspend_test.c
selftests:breakpoints: Fix Format String Warning in breakpoint_test
Linus Torvalds [Wed, 10 Jan 2024 01:16:58 +0000 (17:16 -0800)]
Merge tag 'linux_kselftest-kunit-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
- a new feature that adds APIs for managing devices introducing a set
of helper functions which allow devices (internally a struct
kunit_device) to be created and managed by KUnit.
These devices will be automatically unregistered on test exit. These
helpers can either use a user-provided struct device_driver, or have
one automatically created and managed by KUnit. In both cases, the
device lives on a new kunit_bus.
- changes to switch drm/tests to use kunit devices
- several fixes and enhancements to attribute feature
- changes to reorganize deferred action function introducing
KUNIT_DEFINE_ACTION_WRAPPER
- new feature adds ability to run tests after boot using debugfs
- fixes and enhancements to string-stream-test:
- parse ERR_PTR in string_stream_destroy()
- unchecked dereference in bug fix in debugfs_print_results()
- handling errors from alloc_string_stream()
- NULL-dereference bug fix in kunit_init_suite()
* tag 'linux_kselftest-kunit-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (27 commits)
kunit: Fix some comments which were mistakenly kerneldoc
kunit: Protect string comparisons against NULL
kunit: Add example of kunit_activate_static_stub() with pointer-to-function
kunit: Allow passing function pointer to kunit_activate_static_stub()
kunit: Fix NULL-dereference in kunit_init_suite() if suite->log is NULL
kunit: Reset test->priv after each param iteration
kunit: Add example for using test->priv
drm/tests: Switch to kunit devices
ASoC: topology: Replace fake root_device with kunit_device in tests
overflow: Replace fake root_device with kunit_device
fortify: test: Use kunit_device
kunit: Add APIs for managing devices
Documentation: Add debugfs docs with run after boot
kunit: add ability to run tests after boot using debugfs
kunit: add is_init test attribute
kunit: add example suite to test init suites
kunit: add KUNIT_INIT_TABLE to init linker section
kunit: move KUNIT_TABLE out of INIT_DATA
kunit: tool: add test for parsing attributes
kunit: tool: fix parsing of test attributes
...
Linus Torvalds [Wed, 10 Jan 2024 01:14:32 +0000 (17:14 -0800)]
Merge tag 'linux_kselftest-nolibc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull nolibc updates from Shuah Khan:
- Support for PIC mode on MIPS
- Support for getrlimit()/setrlimit()
- Replace some custom declarations with UAPI includes
- A new script "run-tests.sh" to run the testsuite over different
architectures and configurations
- A few non-functional code cleanups
- Minor improvements to nolibc-test, primarily to support the test
script
* tag 'linux_kselftest-nolibc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (22 commits)
selftests/nolibc: disable coredump via setrlimit
tools/nolibc: add support for getrlimit/setrlimit
tools/nolibc: drop custom definition of struct rusage
tools/nolibc: drop duplicated testcase ioctl_tiocinq
tools/nolibc: annotate va_list printf formats
selftests/nolibc: make result alignment more robust
tools/nolibc: mips: add support for PIC
selftests/nolibc: run-tests.sh: enable testing via qemu-user
selftests/nolibc: introduce QEMU_ARCH_USER
selftests/nolibc: fix testcase status alignment
selftests/nolibc: add configuration for mipso32be
selftests/nolibc: extraconfig support
selftests/nolibc: explicitly specify ABI for MIPS
selftests/nolibc: use XARCH for MIPS
tools/nolibc: move MIPS ABI validation into arch-mips.h
tools/nolibc: error out on unsupported architecture
selftests/nolibc: add script to run testsuite
selftests/nolibc: support out-of-tree builds
selftests/nolibc: anchor paths in $(srcdir) if possible
selftests/nolibc: use EFI -bios for LoongArch qemu
...
Linus Torvalds [Wed, 10 Jan 2024 01:11:27 +0000 (17:11 -0800)]
Merge tag 'efi-next-for-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
- Fix a syzbot reported issue in efivarfs where concurrent accesses to
the file system resulted in list corruption
- Add support for accessing EFI variables via the TEE subsystem (and a
trusted application in the secure world) instead of via EFI runtime
firmware running in the OS's execution context
- Avoid linker tricks to discover the image base on LoongArch
* tag 'efi-next-for-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: memmap: fix kernel-doc warnings
efi/loongarch: Directly position the loaded image file
efivarfs: automatically update super block flag
efi: Add tee-based EFI variable driver
efi: Add EFI_ACCESS_DENIED status code
efi: expose efivar generic ops register function
efivarfs: Move efivarfs list into superblock s_fs_info
efivarfs: Free s_fs_info on unmount
efivarfs: Move efivar availability check into FS context init
efivarfs: force RO when remounting if SetVariable is not supported
Linus Torvalds [Wed, 10 Jan 2024 01:09:44 +0000 (17:09 -0800)]
Merge tag 'for-linus-6.8-1' of https://github.com/cminyard/linux-ipmi
Pull IPMI updates from Corey Minyard:
"Some small fixes. Nothing big, just aligning things with some changes"
* tag 'for-linus-6.8-1' of https://github.com/cminyard/linux-ipmi:
ipmi: Remove usage of the deprecated ida_simple_xx() API
ipmi: Use regspacings passed as a module parameter
ipmi: si: Use device_get_match_data()
Linus Torvalds [Wed, 10 Jan 2024 01:07:12 +0000 (17:07 -0800)]
Merge tag 'platform-drivers-x86-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
- Intel PMC / PMT / TPMI / uncore-freq / vsec improvements and new
platform support
- AMD PMC / PMF improvements and new platform support
- AMD ACPI based Wifi band RFI mitigation feature (WBRF)
- WMI bus driver cleanups and improvements (Armin Wolf)
- acer-wmi Predator PHN16-71 support
- New Silicom network appliance EC LEDs / GPIOs driver
* tag 'platform-drivers-x86-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (96 commits)
platform/x86/amd/pmc: Modify SMU message port for latest AMD platform
platform/x86/amd/pmc: Add 1Ah family series to STB support list
platform/x86/amd/pmc: Add idlemask support for 1Ah family
platform/x86/amd/pmc: call amd_pmc_get_ip_info() during driver probe
platform/x86/amd/pmc: Add VPE information for AMDI000A platform
platform/x86/amd/pmc: Send OS_HINT command for AMDI000A platform
platform/x86/amd/pmf: Return a status code only as a constant in two functions
platform/x86/amd/pmf: Return directly after a failed apmf_if_call() in apmf_sbios_heartbeat_notify()
platform/x86: wmi: linux/wmi.h: fix Excess kernel-doc description warning
platform/x86/intel/pmc: Add missing extern
platform/x86/intel/pmc/lnl: Add GBE LTR ignore during suspend
platform/x86/intel/pmc/arl: Add GBE LTR ignore during suspend
platform/x86: intel-uncore-freq: Add additional client processors
platform/x86: Remove "X86 PLATFORM DRIVERS - ARCH" from MAINTAINERS
platform/x86: hp-bioscfg: Removed needless asm-generic
platform/x86/intel/pmc: Add Lunar Lake M support to intel_pmc_core driver
platform/x86/intel/pmc: Add Arrow Lake S support to intel_pmc_core driver
platform/x86/intel/pmc: Add ssram_init flag in PMC discovery in Meteor Lake
platform/x86/intel/pmc: Move common code to core.c
platform/x86/intel/pmc: Add PSON residency counter for Alder Lake
...
Linus Torvalds [Wed, 10 Jan 2024 00:55:25 +0000 (16:55 -0800)]
Merge tag 'tag-chrome-platform-for-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Tzung-Bi Shih:
- Implement quickselect for median in cros-ec-sensorhub
- Fix an out of boundary array access in cros-ec-vbc
- Cleanups and fix typos
* tag 'tag-chrome-platform-for-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome/wilco_ec: Remove usage of the deprecated ida_simple_xx() API
platform/chrome: cros_ec_vbc: Fix -Warray-bounds warnings
platform/chrome: sensorhub: Implement quickselect for median calculation
platform/chrome: sensorhub: Fix typos
Linus Torvalds [Wed, 10 Jan 2024 00:32:11 +0000 (16:32 -0800)]
Merge tag 'pm-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add support for new processors (Sierra Forest, Grand Ridge and
Meteor Lake) to the intel_idle driver, make intel_pstate run on
Emerald Rapids without HWP support and adjust it to utilize EPP values
supplied by the platform firmware, fix issues, clean up code and
improve documentation.
The most significant fix addresses deadlocks in the core system-wide
resume code that occur if async_schedule_dev() attempts to run its
argument function synchronously (for example, due to a memory
allocation failure). It rearranges the code in question which may
increase the system resume time in some cases, but this basically is a
removal of a premature optimization. That optimization will be added
back later, but properly this time.
Specifics:
- Add support for the Sierra Forest, Grand Ridge and Meteorlake SoCs
to the intel_idle cpuidle driver (Artem Bityutskiy, Zhang Rui)
- Do not enable interrupts when entering idle in the haltpoll cpuidle
driver (Borislav Petkov)
- Add Emerald Rapids support in no-HWP mode to the intel_pstate
cpufreq driver (Zhenguo Yao)
- Use EPP values programmed by the platform firmware as balanced
performance ones by default in intel_pstate (Srinivas Pandruvada)
- Add a missing function return value check to the SCMI cpufreq
driver to avoid unexpected behavior (Alexandra Diupina)
- Fix parameter type warning in the armada-8k cpufreq driver (Gregory
CLEMENT)
- Rework trans_stat_show() in the devfreq core code to avoid buffer
overflows (Christian Marangi)
- Synchronize devfreq_monitor_[start/stop] so as to prevent a timer
list corruption from occurring when devfreq governors are switched
frequently (Mukesh Ojha)
- Fix possible deadlocks in the core system-wide PM code that occur
if device-handling functions cannot be executed asynchronously
during resume from system-wide suspend (Rafael J. Wysocki)
- Clean up unnecessary local variable initializations in multiple
places in the hibernation code (Wang chaodong, Li zeming)
- Adjust core hibernation code to avoid missing wakeup events that
occur after saving an image to persistent storage (Chris Feng)
- Update hibernation code to enforce correct ordering during image
compression and decompression (Hongchen Zhang)
- Use kmap_local_page() instead of kmap_atomic() in copy_data_page()
during hibernation and restore (Chen Haonan)
- Adjust documentation and code comments to reflect recent tasks
freezer changes (Kevin Hao)
- Repair excess function parameter description warning in the
hibernation image-saving code (Randy Dunlap)
- Fix _set_required_opps when opp is NULL (Bryan O'Donoghue)
- Use device_get_match_data() in the OPP code for TI (Rob Herring)
- Clean up OPP level and other parts and call dev_pm_opp_set_opp()
recursively for required OPPs (Viresh Kumar)"
* tag 'pm-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits)
OPP: Rename 'rate_clk_single'
OPP: Pass rounded rate to _set_opp()
OPP: Relocate dev_pm_opp_sync_regulators()
PM: sleep: Fix possible deadlocks in core system-wide PM code
OPP: Move dev_pm_opp_icc_bw to internal opp.h
async: Introduce async_schedule_dev_nocall()
async: Split async_schedule_node_domain()
cpuidle: haltpoll: Do not enable interrupts when entering idle
OPP: Fix _set_required_opps when opp is NULL
OPP: The level field is always of unsigned int type
PM: hibernate: Repair excess function parameter description warning
PM: sleep: Remove obsolete comment from unlock_system_sleep()
cpufreq: intel_pstate: Add Emerald Rapids support in no-HWP mode
Documentation: PM: Adjust freezing-of-tasks.rst to the freezer changes
PM: hibernate: Use kmap_local_page() in copy_data_page()
intel_idle: add Sierra Forest SoC support
intel_idle: add Grand Ridge SoC support
PM / devfreq: Synchronize devfreq_monitor_[start/stop]
cpufreq: armada-8k: Fix parameter type warning
PM: hibernate: Enforce ordering during image compression/decompression
...
Linus Torvalds [Wed, 10 Jan 2024 00:20:17 +0000 (16:20 -0800)]
Merge tag 'thermal-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These add support for the D1/T113s THS controller to the sun8i driver
and a DT-based mechanism for platforms to indicate a preference to
reboot (instead of shutting down) on crossing a critical trip point,
fix issues, make other improvements (in the IPA governor, the Intel
HFI driver, the exynos driver and the thermal netlink interface among
other places) and clean up code.
One long-standing issue addressed here is that trip point crossing
notifications sent to user space might be unreliable due to the
incorrect handling of trip point hysteresis in the thermal core:
multiple notifications might be sent for the same event or there might
be events without any notification at all.
Specifics:
- Add dynamic thresholds for trip point crossing detection to prevent
trip point crossing notifications from being sent at incorrect
times or not at all in some cases (Rafael J. Wysocki)
- Fix synchronization issues related to the resume of thermal zones
during a system-wide resume and allow thermal zones to be resumed
concurrently (Rafael J. Wysocki)
- Modify the thermal zone unregistration to wait for the given zone
to go away completely before returning to the caller and rework the
sysfs interface for trip points on top of that (Rafael J. Wysocki)
- Fix a possible NULL pointer dereference in thermal zone
registration error path (Rafael J. Wysocki)
- Clean up the IPA thermal governor and modify it (with the help of a
new governor callback) to avoid allocating and freeing memory every
time its throttling callback is invoked (Lukasz Luba)
- Make the IPA thermal governor handle thermal instance weight
changes via sysfs correctly (Lukasz Luba)
- Update the thermal netlink code to avoid sending messages if there
are no recipients (Stanislaw Gruszka)
- Convert Mediatek Thermal to the json-schema (Rafał Miłecki)
- Fix thermal DT bindings issue on Loongson (Binbin Zhou)
- Fix returning NULL instead of -ENODEV during thermal probe on
Loogsoon (Binbin Zhou)
- Add thermal DT binding for tsens on the SM8650 platform (Neil
Armstrong)
- Add reboot on the critical trip point crossing option feature
(Fabio Estevam)
- Use DEFINE_SIMPLE_DEV_PM_OPS do define PM functions for thermal
suspend/resume on AmLogic (Uwe Kleine-König)
- Add D1/T113s THS controller support to the Sun8i thermal control
driver (Maxim Kiselev)
- Fix example in the thermal DT binding for QCom SPMI (Johan Hovold)
- Fix compilation warning in the tmon utility (Florian Eckert)
- Add support for interrupt-based thermal configuration on Exynos
along with a set of related cleanups (Mateusz Majewski)
- Make the Intel HFI thermal driver enable an HFI instance (eg.
processor package) from its first online CPU and disable it when
the last CPU in it goes offline (Ricardo Neri)
- Fix a kernel-doc warning and a spello in the cpuidle_cooling
thermal driver (Randy Dunlap)
- Move the .get_temp() thermal zone callback presence check to the
thermal zone registration code (Daniel Lezcano)
- Use the for_each_trip() macro for trip points table walks in a few
places in the thermal core (Rafael J. Wysocki)
- Make all trip point updates (via sysfs as well as from the platform
firmware) trigger trip change notifications (Rafael J. Wysocki)
- Drop redundant code from the thermal core and make one function in
it take a const pointer argument (Rafael J. Wysocki)"
* tag 'thermal-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits)
thermal: trip: Constify thermal zone argument of thermal_zone_trip_id()
thermal: intel: hfi: Disable an HFI instance when all its CPUs go offline
thermal: intel: hfi: Enable an HFI instance from its first online CPU
thermal: intel: hfi: Refactor enabling code into helper functions
thermal/drivers/exynos: Use set_trips ops
thermal/drivers/exynos: Use BIT wherever possible
thermal/drivers/exynos: Split initialization of TMU and the thermal zone
thermal/drivers/exynos: Stop using the threshold mechanism on Exynos 4210
thermal/drivers/exynos: Simplify regulator (de)initialization
thermal/drivers/exynos: Handle devm_regulator_get_optional return value correctly
thermal/drivers/exynos: Wwitch from workqueue-driven interrupt handling to threaded interrupts
thermal/drivers/exynos: Drop id field
thermal/drivers/exynos: Remove an unnecessary field description
tools/thermal/tmon: Fix compilation warning for wrong format
dt-bindings: thermal: qcom-spmi-adc-tm5/hc: Clean up examples
dt-bindings: thermal: qcom-spmi-adc-tm5/hc: Fix example node names
thermal/drivers/sun8i: Add D1/T113s THS controller support
dt-bindings: thermal: sun8i: Add binding for D1/T113s THS controller
thermal: amlogic: Use DEFINE_SIMPLE_DEV_PM_OPS for PM functions
thermal: amlogic: Make amlogic_thermal_disable() return void
...
Linus Torvalds [Wed, 10 Jan 2024 00:12:44 +0000 (16:12 -0800)]
Merge tag 'acpi-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"From the new features standpoint, the most significant change here is
the addition of CSI-2 and MIPI DisCo for Imaging support to the ACPI
device enumeration code that will allow MIPI cameras to be enumerated
through the platform firmware on systems using ACPI.
Also significant is the switch-over to threaded interrupt handlers for
the ACPI SCI and the dedicated EC interrupt (on systems where the
former is not used) which essentially allows all ACPI code to run with
local interrupts enabled. That should improve responsiveness
significantly on systems where multiple GPEs are enabled and the
handling of one SCI involves many I/O address space accesses which
previously had to be carried out in one go with disabled interrupts on
the local CPU.
Apart from the above, the ACPI thermal zone driver will use the
Thermal fast Sampling Period (_TFP) object if available, which should
allow temperature changes to be followed more accurately on some
systems, the ACPI Notify () handlers can run on all CPUs (not just on
CPU0), which should generally speed up the processing of events
signaled through the ACPI SCI, and the ACPI power button driver will
trigger wakeup key events via the input subsystem (on systems where it
is a system wakeup device)
In addition to that, there are the usual bunch of fixes and cleanups.
Specifics:
- Add CSI-2 and DisCo for Imaging support to the ACPI device
enumeration code (Sakari Ailus, Rafael J. Wysocki)
- Adjust the cpufreq thermal reduction algorithm in the ACPI
processor driver for Tegra241 (Srikar Srimath Tirumala, Arnd
Bergmann)
- Make acpi_proc_quirk_mwait_check() x86-specific (Rafael J. Wysocki)
- Switch over ACPI to using a threaded interrupt handler for the SCI
(Rafael J. Wysocki)
- Allow ACPI Notify () handlers to run on all CPUs and clean up the
ACPI interface for deferred events processing (Rafael J. Wysocki)
- Switch over the ACPI EC driver to using a threaded handler for the
dedicated IRQ on systems without the EC GPE (Rafael J. Wysocki)
- Adjust code using ACPICA spinlocks and the ACPI EC driver spinlock
to keep local interrupts on (Rafael J. Wysocki)
- Adjust the USB4 _OSC handshake to correctly handle cases in which
certain types of OS control are denied by the platform (Mika
Westerberg)
- Correct and clean up the generic function for parsing ACPI
data-only tables with array structure (Yuntao Wang)
- Modify acpi_dev_uid_match() to support different types of its
second argument and adjust its users accordingly (Raag Jadav)
- Clean up code related to acpi_evaluate_reference() and ACPI device
lists (Rafael J. Wysocki)
- Use generic ACPI helpers for evaluating trip point temperature
objects in the ACPI thermal zone driver (Rafael J. Wysockii, Arnd
Bergmann)
- Add Thermal fast Sampling Period (_TFP) support to the ACPI thermal
zone driver (Jeff Brasen)
- Modify the ACPI LPIT table handling code to avoid u32
multiplication overflows in state residency computations (Nikita
Kiryushin)
- Drop an unused helper function from the ACPI backlight (video)
driver and add a clarifying comment to it (Hans de Goede)
- Update the ACPI backlight driver to avoid using uninitialized
memory in some cases (Nikita Kiryushin)
- Add ACPI backlight quirk for the Colorful X15 AT 23 laptop (Yuluo
Qiu)
- Add support for vendor-defined error types to the ACPI APEI error
injection code (Avadhut Naik)
- Adjust APEI to properly set MF_ACTION_REQUIRED on synchronous
memory failure events, so they are handled differently from the
asynchronous ones (Shuai Xue)
- Fix NULL pointer dereference check in the ACPI extlog driver
(Prarit Bhargava)
- Adjust the ACPI extlog driver to clear the Extended Error Log
status when RAS_CEC handled the error (Tony Luck)
- Add IRQ override quirks for some Infinity laptops and for TongFang
GMxXGxx (David McFarland, Hans de Goede)
- Clean up the ACPI NUMA code and fix it to ensure that fake_pxm is
not the same as one of the real pxm values (Yuntao Wang)
- Fix the fractional clock divider flags in the ACPI LPSS (Intel SoC)
driver so as to prevent miscalculation of the values in the clock
divider (Andy Shevchenko)
- Adjust comments in the ACPI watchdog driver to prevent kernel-doc
from complaining during documentation builds (Randy Dunlap)
- Make the ACPI button driver send wakeup key events to user space in
addition to power button events on systems that can be woken up by
the power button (Ken Xue)
- Adjust pnpacpi_parse_allocated_vendor() to use memcpy() on a full
structure field (Dmitry Antipov)"
* tag 'acpi-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (56 commits)
ACPI: resource: Add Infinity laptops to irq1_edge_low_force_override
ACPI: button: trigger wakeup key events
ACPI: resource: Add another DMI match for the TongFang GMxXGxx
ACPI: EC: Use a spin lock without disabing interrupts
ACPI: EC: Use a threaded handler for dedicated IRQ
ACPI: OSL: Use spin locks without disabling interrupts
ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events
ACPI: utils: Introduce helper for _DEP list lookup
ACPI: utils: Fix white space in struct acpi_handle_list definition
ACPI: utils: Refine acpi_handle_list_equal() slightly
ACPI: utils: Return bool from acpi_evaluate_reference()
ACPI: utils: Rearrange in acpi_evaluate_reference()
ACPI: arm64: export acpi_arch_thermal_cpufreq_pctg()
ACPI: extlog: Clear Extended Error Log status when RAS_CEC handled the error
ACPI: LPSS: Fix the fractional clock divider flags
ACPI: NUMA: Fix the logic of getting the fake_pxm value
ACPI: NUMA: Optimize the check for the availability of node values
ACPI: NUMA: Remove unnecessary check in acpi_parse_gi_affinity()
ACPI: watchdog: fix kernel-doc warnings
ACPI: extlog: fix NULL pointer dereference check
...
Linus Torvalds [Tue, 9 Jan 2024 23:40:59 +0000 (15:40 -0800)]
Merge tag 'mtd/for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd updates from Miquel Raynal:
"MTD:
- Apart from preventing the mtdblk to run on top of ftl or ubiblk
(which may cause security issues and has no meaning anyway), there
are a few misc fixes.
Raw NAND:
- Two meaningful changes this time. The conversion of the brcmnand
driver to the ->exec_op() API, this series brought additional
changes to the core in order to help controller drivers to handle
themselves the WP pin during destructive operations when relevant.
- There is also a series bringing important fixes to the sequential
read feature.
- As always, there is as well a whole bunch of miscellaneous W=1
fixes, together with a few runtime fixes (double free, timeout
value, OOB layout, missing register initialization) and the usual
load of remove callbacks turned into void (which led to switch the
txx9ndfmc driver to use module_platform_driver()).
SPI NOR:
- SPI NOR comes with die erase support for multi die flashes, with
new octal protocols (1-1-8 and 1-8-8) parsed from SFDP and with an
updated documentation about what the contributors shall consider
when proposing flash additions or updates.
- Michael Walle stepped out from the reviewer role to maintainer"
* tag 'mtd/for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (39 commits)
mtd: rawnand: Clarify conditions to enable continuous reads
mtd: rawnand: Prevent sequential reads with on-die ECC engines
mtd: rawnand: Fix core interference with sequential reads
mtd: rawnand: Prevent crossing LUN boundaries during sequential reads
mtd: Fix gluebi NULL pointer dereference caused by ftl notifier
dt-bindings: mtd: partitions: u-boot: Fix typo
mtd: rawnand: s3c2410: fix Excess struct member description kernel-doc warnings
MAINTAINERS: change my mail to the kernel.org one
mtd: spi-nor: sfdp: get the 1-1-8 and 1-8-8 protocol from SFDP
mtd: spi-nor: drop superfluous debug prints
mtd: spi-nor: sysfs: hide the flash name if not set
mtd: spi-nor: mark the flash name as obsolete
mtd: spi-nor: print flash ID instead of name
mtd: maps: vmu-flash: Fix the (mtd core) switch to ref counters
mtd: ssfdc: Remove an unused variable
mtd: rawnand: diskonchip: fix a potential double free in doc_probe
mtd: rawnand: rockchip: Add missing title to a kernel doc comment
mtd: rawnand: rockchip: Rename a structure
mtd: rawnand: pl353: Fix kernel doc
mtd: spi-nor: micron-st: Add support for mt25qu01g
...
Linus Torvalds [Tue, 9 Jan 2024 23:02:12 +0000 (15:02 -0800)]
Merge tag 'spi-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"A moderately busy release for SPI, the main core update was the
merging of support for multiple chip selects, used in some flash
configurations. There were also big overhauls for the AXI SPI Engine
and PL022 drivers, plus some new device support for ST.
There's a few patches for other trees, API updates to allow the
multiple chip select support and one of the naming modernisations
touched a controller embedded in the USB code.
- Support for multiple chip selects.
- A big overhaul for the AXI SPI engine driver, modernising it and
adding a bunch of new features.
- Modernisation of the PL022 driver, fixing some issues with
submitting messages while in atomic context in the process.
- Many drivers were converted to use new APIs which avoid outdated
terminology for devices and controllers.
- Support for ST Microelectronics STM32F7 and STM32MP25, and Renesas
RZ/Five"
* tag 'spi-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (83 commits)
spi: stm32: add st,stm32mp25-spi compatible supporting STM32MP25 soc
dt-bindings: spi: stm32: add st,stm32mp25-spi compatible
spi: stm32: use dma_get_slave_caps prior to configuring dma channel
spi: axi-spi-engine: fix struct member doc warnings
spi: pl022: update description of internal_cs_control()
spi: pl022: delete description of cur_msg
spi: dw: Remove Intel Thunder Bay SOC support
spi: dw: Remove Intel Thunder Bay SOC support
spi: sh-msiof: Enforce fixed DTDL for R-Car H3
spi: ljca: switch to use devm_spi_alloc_host()
spi: cs42l43: switch to use devm_spi_alloc_host()
spi: zynqmp-gqspi: switch to use modern name
spi: zynq-qspi: switch to use modern name
spi: xtensa-xtfpga: switch to use modern name
spi: xlp: switch to use modern name
spi: xilinx: switch to use modern name
spi: xcomm: switch to use modern name
spi: uniphier: switch to use modern name
spi: topcliff-pch: switch to use modern name
spi: wpcm-fiu: switch to use devm_spi_alloc_host()
...
Linus Torvalds [Tue, 9 Jan 2024 22:41:21 +0000 (14:41 -0800)]
Merge tag 'regulator-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"The main updates for this release are around monitoring of regulators,
largely for error handling purposes. We allow the stream of regulator
events to be seen by userspace as netlink events and allow system
integrators to describe individual regulators as system critical with
information on how long the system is expected to last on error. The
system level error handling is very much about best effort problem
mitigation rather than providing something fully robust, the initial
drive was to provide a mechanism for trying to avoid initiating any
new writes to flash once we notice the power going out.
Otherwise it's very quiet, mainly several new Qualcomm devices.
- Support for marking regulators as system critical and providing
information on how long the system might last with those regulators
in a failure state, hooked into the existing critical shutdown
error handling.
- Optional support for generating netlink events for events, there
are use cases for system monitoring UIs and error handling.
- A command line option to leave unused controllable regulators
enabled, useful for debugging. We already only disable regulators
we were explicitly given permission to control.
- Support for Quacomm MP5496, PM8010 and PM8937"
* tag 'regulator-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (31 commits)
regulator: event: Ensure atomicity for sequence number
uapi: regulator: Fix typo
regulator: Reuse LINEAR_RANGE() in REGULATOR_LINEAR_RANGE()
dt-bindings: regulator: qcom,usb-vbus-regulator: clean up example
regulator: qcom_smd: Add LDO5 MP5496 regulator
regulator: qcom-rpmh: add support for pm8010 regulators
regulator: dt-bindings: qcom,rpmh: add compatible for pm8010
regulator: qcom-rpmh: extend to support multiple linear voltage ranges
regulator: wm8350: Convert to platform remove callback returning void
regulator: virtual: Convert to platform remove callback returning void
regulator: userspace-consumer: Convert to platform remove callback returning void
regulator: uniphier: Convert to platform remove callback returning void
regulator: stm32-vrefbuf: Convert to platform remove callback returning void
regulator: db8500-prcmu: Convert to platform remove callback returning void
regulator: bd9571mwv: Convert to platform remove callback returning void
regulator: arizona-ldo1: Convert to platform remove callback returning void
regulator: event: Add regulator netlink event support
regulator: event: Add regulator netlink event support
regulator: stpmic1: Fix kernel-doc notation warnings
regulator: palmas: remove redundant initialization of pointer pdata
...