Linus Torvalds [Fri, 12 May 2023 22:10:32 +0000 (17:10 -0500)]
Merge tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull more btrfs fixes from David Sterba:
- fix incorrect number of bitmap entries for space cache if loading is
interrupted by some error
- fix backref walking, this breaks a mode of LOGICAL_INO_V2 ioctl that
is used in deduplication tools
- zoned mode fixes:
- properly finish zone reserved for relocation
- correctly calculate super block zone end on ZNS
- properly initialize new extent buffer for redirty
- make mount option clear_cache work with block-group-tree, to rebuild
free-space-tree instead of temporarily disabling it that would lead
to a forced read-only mount
- fix alignment check for offset when printing extent item
* tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: make clear_cache mount option to rebuild FST without disabling it
btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add
btrfs: zoned: fix full zone super block reading on ZNS
btrfs: zoned: zone finish data relocation BG with last IO
btrfs: fix backref walking not returning all inode refs
btrfs: fix space cache inconsistency after error loading it from disk
btrfs: print-tree: parent bytenr must be aligned to sector size
Linus Torvalds [Fri, 12 May 2023 22:01:36 +0000 (17:01 -0500)]
Merge tag '6.4-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs client fixes from Steve French:
- fix for copy_file_range bug for very large files that are multiples
of rsize
- do not ignore "isolated transport" flag if set on share
- set rasize default better
- three fixes related to shutdown and freezing (fixes 4 xfstests, and
closes deferred handles faster in some places that were missed)
* tag '6.4-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: release leases for deferred close handles when freezing
smb3: fix problem remounting a share after shutdown
SMB3: force unmount was failing to close deferred close files
smb3: improve parallel reads of large files
do not reuse connection if share marked as isolated
cifs: fix pcchunk length type in smb2_copychunk_range
Linus Torvalds [Fri, 12 May 2023 21:56:09 +0000 (16:56 -0500)]
Merge tag 'vfs/v6.4-rc1/pipe' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fix from Christian Brauner:
"During the pipe nonblock rework the check for both O_NONBLOCK and
IOCB_NOWAIT was dropped. Both checks need to be performed to ensure
that files without O_NONBLOCK but IOCB_NOWAIT don't block when writing
to or reading from a pipe.
This just contains the fix adding the check for IOCB_NOWAIT back in"
* tag 'vfs/v6.4-rc1/pipe' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
pipe: check for IOCB_NOWAIT alongside O_NONBLOCK
Linus Torvalds [Fri, 12 May 2023 21:39:05 +0000 (16:39 -0500)]
Merge tag 'io_uring-6.4-2023-05-12' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
"Just a single fix making io_uring_sqe_cmd() available regardless of
CONFIG_IO_URING, fixing a regression introduced during the merge
window if nvme was selected but io_uring was not"
* tag 'io_uring-6.4-2023-05-12' of git://git.kernel.dk/linux:
io_uring: make io_uring_sqe_cmd() unconditionally available
Linus Torvalds [Fri, 12 May 2023 21:31:55 +0000 (16:31 -0500)]
Merge tag 'riscv-for-linus-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fix from Palmer Dabbelt:
"Just a single fix this week for a build issue. That'd usually be a
good sign, but we've started to get some reports of boot failures on
some hardware/bootloader configurations. Nothing concrete yet, but
I've got a funny feeling that's where much of the bug hunting is going
right now.
Nothing's reproducing on my end, though, and this fixes some pretty
concrete issues so I figured there's no reason to delay it:
- a fix to the linker script to avoid orpahaned sections in
kernel/pi"
* tag 'riscv-for-linus-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix orphan section warnings caused by kernel/pi
Jens Axboe [Tue, 9 May 2023 15:12:24 +0000 (09:12 -0600)]
pipe: check for IOCB_NOWAIT alongside O_NONBLOCK
Pipe reads or writes need to enable nonblocking attempts, if either
O_NONBLOCK is set on the file, or IOCB_NOWAIT is set in the iocb being
passed in. The latter isn't currently true, ensure we check for both
before waiting on data or space.
Ming Lei [Fri, 5 May 2023 15:31:42 +0000 (23:31 +0800)]
ublk: fix command op code check
In case of CONFIG_BLKDEV_UBLK_LEGACY_OPCODES, type of cmd opcode could
be 0 or 'u'; and type can only be 'u' if CONFIG_BLKDEV_UBLK_LEGACY_OPCODES
isn't set.
Ivan Orlov [Fri, 12 May 2023 13:05:32 +0000 (17:05 +0400)]
nbd: Fix debugfs_create_dir error checking
The debugfs_create_dir function returns ERR_PTR in case of error, and the
only correct way to check if an error occurred is 'IS_ERR' inline function.
This patch will replace the null-comparison with IS_ERR.
Linus Torvalds [Fri, 12 May 2023 12:59:08 +0000 (07:59 -0500)]
Merge tag 'firewire-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fix from Takashi Sakamoto:
- fix early release of request packet
* tag 'firewire-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: net: fix unexpected release of object for asynchronous request packet
i915:
- taint kernel when force_probe is used
- NULL deref and div-by-zero fixes for display
- GuC error capture fix for Xe devices"
* tag 'drm-fixes-2023-05-12' of git://anongit.freedesktop.org/drm/drm: (24 commits)
drm/amdgpu: change gfx 11.0.4 external_id range
drm/amdgpu/jpeg: Remove harvest checking for JPEG3
drm/amdgpu/gfx: disable gfx9 cp_ecc_error_irq only when enabling legacy gfx ras
drm/amd/pm: avoid potential UBSAN issue on legacy asics
drm/i915: taint kernel when force probing unsupported devices
drm/i915/dp: prevent potential div-by-zero
drm/i915: Fix NULL ptr deref by checking new_crtc_state
drm/i915/guc: Don't capture Gen8 regs on Xe devices
drm/amdgpu: disable sdma ecc irq only when sdma RAS is enabled in suspend
drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)
drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcs
drm/amd/display: Enforce 60us prefetch for 200Mhz DCFCLK modes
drm/amd/display: Add symclk workaround during disable link output
drm/amd/pm: parse pp_handle under appropriate conditions
drm/amdgpu: set gfx9 onwards APU atomics support to be true
drm/amdgpu/nv: update VCN 3 max HEVC encoding resolution
drm/sched: Check scheduler work queue before calling timeout handling
drm/mipi-dsi: Set the fwnode for mipi_dsi_device
drm/nouveau/disp: More DP_RECEIVER_CAP_SIZE array fixes
drm/dsc: fix DP_DSC_MAX_BPP_DELTA_* macro values
...
Linus Torvalds [Thu, 11 May 2023 21:51:11 +0000 (16:51 -0500)]
Merge tag 'xfs-6.4-rc1-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs bug fixes from Dave Chinner:
"Largely minor bug fixes and cleanups, th emost important of which are
probably the fixes for regressions in the extent allocation code:
- fixes for inode garbage collection shutdown racing with work queue
updates
- ensure inodegc workers run on the CPU they are supposed to
- disable counter scrubbing until we can exclusively freeze the
filesystem from the kernel
- regression fixes for new allocation related bugs
- a couple of minor cleanups"
* tag 'xfs-6.4-rc1-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix xfs_inodegc_stop racing with mod_delayed_work
xfs: disable reaping in fscounters scrub
xfs: check that per-cpu inodegc workers actually run on that cpu
xfs: explicitly specify cpu when forcing inodegc delayed work to run immediately
xfs: fix negative array access in xfs_getbmap
xfs: don't allocate into the data fork for an unshare request
xfs: flush dirty data and drain directios before scrubbing cow fork
xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
xfs: don't unconditionally null args->pag in xfs_bmap_btalloc_at_eof
Zheng Wang [Thu, 27 Apr 2023 03:08:41 +0000 (11:08 +0800)]
fbdev: imsttfb: Fix use after free bug in imsttfb_probe
A use-after-free bug may occur if init_imstt invokes framebuffer_release
and free the info ptr. The caller, imsttfb_probe didn't notice that and
still keep the ptr as private data in pdev.
If we remove the driver which will call imsttfb_remove to make cleanup,
UAF happens.
Fix it by return error code if bad case happens in init_imstt.
Dave Airlie [Thu, 11 May 2023 19:32:36 +0000 (05:32 +1000)]
Merge tag 'drm-misc-fixes-2023-05-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.4-rc2:
- More DSC macro fixes.
- Small mipi-dsi fix.
- Scheduler timeout handling fix.
---
drm-misc-fixes for v6.4-rc1:
- Fix DSC macros.
- Fix VESA format for simplefb.
- Prohibit potential out-of-bounds access in generic fbdev emulation.
- Improve AST2500+ compat on ARM.
Linus Torvalds [Thu, 11 May 2023 14:01:40 +0000 (09:01 -0500)]
Merge tag 'dt-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt
Pull devicetree binding fixes from Krzysztof Kozlowski:
"A few fixes for Devicetree bindings and related docs, all for issues
introduced in v6.4-rc1 commits:
- media/ov2685: fix number of possible data lanes, as old binding
explicitly mentioned one data lane. This fixes dt_binding_check
warnings like:
Documentation/devicetree/bindings/media/rockchip-isp1.example.dtb: camera@3c: port:endpoint:data-lanes: [[1]] is too short
From schema: Documentation/devicetree/bindings/media/i2c/ovti,ov2685.yaml
- PCI/fsl,imx6q: correct parsing of assigned-clocks and related
properties and make the clocks more specific per PCI device (host
or endpoint). This fixes dtschema limitation and dt_binding_check
warnings like:
Documentation/devicetree/bindings/pci/fsl,imx6q-pcie-ep.example.dtb: pcie-ep@33800000: Unevaluated properties are not allowed
From schema: Documentation/devicetree/bindings/pci/fsl,imx6q-pcie-ep.yaml
- Maintainers: correct path of Apple PWM binding. This fixes
refcheckdocs warning"
* tag 'dt-fixes-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt:
dt-bindings: PCI: fsl,imx6q: fix assigned-clocks warning
MAINTAINERS: adjust file entry for ARM/APPLE MACHINE SUPPORT
media: dt-bindings: ov2685: Correct data-lanes attribute
Linus Torvalds [Thu, 11 May 2023 13:42:47 +0000 (08:42 -0500)]
Merge tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter.
Current release - regressions:
- mtk_eth_soc: fix NULL pointer dereference
Previous releases - regressions:
- core:
- skb_partial_csum_set() fix against transport header magic value
- fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
- annotate sk->sk_err write from do_recvmmsg()
- add vlan_get_protocol_and_depth() helper
- netlink: annotate accesses to nlk->cb_running
- netfilter: always release netdev hooks from notifier
Previous releases - always broken:
- core: deal with most data-races in sk_wait_event()
- netfilter: fix possible bug_on with enable_hooks=1
- eth: bonding: fix send_peer_notif overflow
- eth: xpcs: fix incorrect number of interfaces
- eth: ipvlan: fix out-of-bounds caused by unclear skb->cb
* tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
af_unix: Fix data races around sk->sk_shutdown.
af_unix: Fix a data race of sk->sk_receive_queue->qlen.
net: datagram: fix data-races in datagram_poll()
net: mscc: ocelot: fix stat counter register values
ipvlan:Fix out-of-bounds caused by unclear skb->cb
docs: networking: fix x25-iface.rst heading & index order
gve: Remove the code of clearing PBA bit
tcp: add annotations around sk->sk_shutdown accesses
net: add vlan_get_protocol_and_depth() helper
net: pcs: xpcs: fix incorrect number of interfaces
net: deal with most data-races in sk_wait_event()
net: annotate sk->sk_err write from do_recvmmsg()
netlink: annotate accesses to nlk->cb_running
kselftest: bonding: add num_grat_arp test
selftests: forwarding: lib: add netns support for tc rule handle stats get
Documentation: bonding: fix the doc of peer_notif_delay
bonding: fix send_peer_notif overflow
net: ethernet: mtk_eth_soc: fix NULL pointer dereference
selftests: nft_flowtable.sh: check ingress/egress chain too
selftests: nft_flowtable.sh: monitor result file sizes
...
Zongjie Li [Tue, 9 May 2023 11:27:26 +0000 (19:27 +0800)]
fbdev: arcfb: Fix error handling in arcfb_probe()
Smatch complains that:
arcfb_probe() warn: 'irq' from request_irq() not released on lines: 587.
Fix error handling in the arcfb_probe() function. If IO addresses are
not provided or framebuffer registration fails, the code will jump to
the err_addr or err_register_fb label to release resources.
If IRQ request fails, previously allocated resources will be freed.
Fixes: 1154ea7dcd8e ("[PATCH] Framebuffer driver for Arc LCD board") Signed-off-by: Zongjie Li <[email protected]> Reviewed-by: Dongliang Mu <[email protected]> Signed-off-by: Helge Deller <[email protected]>
Guchun Chen [Sat, 6 May 2023 12:06:45 +0000 (20:06 +0800)]
drm/amdgpu/gfx: disable gfx9 cp_ecc_error_irq only when enabling legacy gfx ras
gfx9 cp_ecc_error_irq is only enabled when legacy gfx ras is assert.
So in gfx_v9_0_hw_fini, interrupt disablement for cp_ecc_error_irq
should be executed under such condition, otherwise, an amdgpu_irq_put
calltrace will occur.
Jani Nikula [Thu, 4 May 2023 10:35:08 +0000 (13:35 +0300)]
drm/i915: taint kernel when force probing unsupported devices
For development and testing purposes, the i915.force_probe module
parameter and DRM_I915_FORCE_PROBE kconfig option allow probing of
devices that aren't supported by the driver.
The i915.force_probe module parameter is "unsafe" and setting it taints
the kernel. However, using the kconfig option does not.
Always taint the kernel when force probing a device that is not
supported.
v2: Drop "depends on EXPERT" to avoid build breakage (kernel test robot)
drm_dp_dsc_sink_max_slice_count() may return 0 if something goes
wrong on the part of the DSC sink and its DPCD register. This null
value may be later used as a divisor in intel_dsc_compute_params(),
which will lead to an error.
In the unlikely event that this issue occurs, fix it by testing the
return value of drm_dp_dsc_sink_max_slice_count() against zero.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
drm/i915: Fix NULL ptr deref by checking new_crtc_state
intel_atomic_get_new_crtc_state can return NULL, unless crtc state wasn't
obtained previously with intel_atomic_get_crtc_state, so we must check it
for NULLness here, just as in many other places, where we can't guarantee
that intel_atomic_get_crtc_state was called.
We are currently getting NULL ptr deref because of that, so this fix was
confirmed to help.
John Harrison [Fri, 28 Apr 2023 18:56:33 +0000 (11:56 -0700)]
drm/i915/guc: Don't capture Gen8 regs on Xe devices
A pair of pre-Xe registers were being included in the Xe capture list.
GuC was rejecting those as being invalid and logging errors about
them. So, stop doing it.
Guchun Chen [Sat, 6 May 2023 08:52:59 +0000 (16:52 +0800)]
drm/amdgpu: disable sdma ecc irq only when sdma RAS is enabled in suspend
sdma_v4_0_ip is shared on a few asics, but in sdma_v4_0_hw_fini,
driver unconditionally disables ecc_irq which is only enabled on
those asics enabling sdma ecc. This will introduce a warning in
suspend cycle on those chips with sdma ip v4.0, while without
sdma ecc. So this patch correct this.
Lin.Cao [Mon, 8 May 2023 09:28:41 +0000 (17:28 +0800)]
drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)
v1: Vmbo->shadow is used to back vram bo up when vram lost. So that we
should set shadow as vmbo->shadow to recover vmbo->bo
v2: Modify if(vmbo->shadow) shadow = vmbo->shadow as if(!vmbo->shadow)
continue;
Horatio Zhang [Thu, 4 May 2023 05:46:12 +0000 (01:46 -0400)]
drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcs
The gfx.cp_ecc_error_irq is retired in gfx11. In gfx_v11_0_hw_fini still
use amdgpu_irq_put to disable this interrupt, which caused the call trace
in this function.
v2:
- Handle umc and gfx ras cases in separated patch
- Retired the gfx_v11_0_cp_ecc_error_irq_funcs in gfx11
v3:
- Improve the subject and code comments
- Add judgment on gfx11 in the function of amdgpu_gfx_ras_late_init
v4:
- Drop the define of CP_ME1_PIPE_INST_ADDR_INTERVAL and
SET_ECC_ME_PIPE_STATE which using in gfx_v11_0_set_cp_ecc_error_state
- Check cp_ecc_error_irq.funcs rather than ip version for a more
sustainable life
Alvin Lee [Thu, 27 Apr 2023 19:10:13 +0000 (15:10 -0400)]
drm/amd/display: Enforce 60us prefetch for 200Mhz DCFCLK modes
[Description]
- Due to bandwidth / arbitration issues at 200Mhz DCFCLK,
we want to enforce minimum 60us of prefetch to avoid
intermittent underflow issues
- Since 60us prefetch is already enforced for UCLK DPM0,
and many DCFCLK's > 200Mhz are mapped to UCLK DPM1, in
theory there should not be any UCLK DPM regressions by
enforcing greater prefetch
Leo Chen [Wed, 26 Apr 2023 20:02:28 +0000 (16:02 -0400)]
drm/amd/display: Add symclk workaround during disable link output
[Why & How]
This is originally a change (9c75891f) in DCN32 because of the lack
of interface to set TX while keeping symclk on. Adding this workaround
to DCN314 will resolve the current issue.
Guchun Chen [Fri, 5 May 2023 05:20:11 +0000 (13:20 +0800)]
drm/amd/pm: parse pp_handle under appropriate conditions
amdgpu_dpm_is_overdrive_supported is a common API across all
asics, so we should cast pp_handle into correct structure
under different power frameworks.
v2: using return directly to simplify code
v3: SI asic does not carry od_enabled member in pp_handle, and update Fixes tag
Jakub Kicinski [Thu, 11 May 2023 02:08:58 +0000 (19:08 -0700)]
Merge tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter updates for net
The following patchset contains Netfilter fixes for net:
1) Fix UAF when releasing netnamespace, from Florian Westphal.
2) Fix possible BUG_ON when nf_conntrack is enabled with enable_hooks,
from Florian Westphal.
3) Fixes for nft_flowtable.sh selftest, from Boris Sukholitko.
4) Extend nft_flowtable.sh selftest to cover integration with
ingress/egress hooks, from Florian Westphal.
* tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: nft_flowtable.sh: check ingress/egress chain too
selftests: nft_flowtable.sh: monitor result file sizes
selftests: nft_flowtable.sh: wait for specific nc pids
selftests: nft_flowtable.sh: no need for ps -x option
selftests: nft_flowtable.sh: use /proc for pid checking
netfilter: conntrack: fix possible bug_on with enable_hooks=1
netfilter: nf_tables: always release netdev hooks from notifier
====================
KCSAN found a data race around sk->sk_shutdown where unix_release_sock()
and unix_shutdown() update it under unix_state_lock(), OTOH unix_poll()
and unix_dgram_poll() read it locklessly.
We need to annotate the writes and reads with WRITE_ONCE() and READ_ONCE().
BUG: KCSAN: data-race in unix_poll / unix_release_sock
write to 0xffff88800d0f8aec of 1 bytes by task 264 on cpu 0:
unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631
unix_release+0x59/0x80 net/unix/af_unix.c:1042
__sock_release+0x7d/0x170 net/socket.c:653
sock_close+0x19/0x30 net/socket.c:1397
__fput+0x179/0x5e0 fs/file_table.c:321
____fput+0x15/0x20 fs/file_table.c:349
task_work_run+0x116/0x1a0 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffff88800d0f8aec of 1 bytes by task 222 on cpu 1:
unix_poll+0xa3/0x2a0 net/unix/af_unix.c:3170
sock_poll+0xcf/0x2b0 net/socket.c:1385
vfs_poll include/linux/poll.h:88 [inline]
ep_item_poll.isra.0+0x78/0xc0 fs/eventpoll.c:855
ep_send_events fs/eventpoll.c:1694 [inline]
ep_poll fs/eventpoll.c:1823 [inline]
do_epoll_wait+0x6c4/0xea0 fs/eventpoll.c:2258
__do_sys_epoll_wait fs/eventpoll.c:2270 [inline]
__se_sys_epoll_wait fs/eventpoll.c:2265 [inline]
__x64_sys_epoll_wait+0xcc/0x190 fs/eventpoll.c:2265
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x00 -> 0x03
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 222 Comm: dbus-broker Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Fixes: 3c73419c09a5 ("af_unix: fix 'poll for write'/ connected DGRAM sockets") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <[email protected]> Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: Michal Kubiak <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
af_unix: Fix a data race of sk->sk_receive_queue->qlen.
KCSAN found a data race of sk->sk_receive_queue->qlen where recvmsg()
updates qlen under the queue lock and sendmsg() checks qlen under
unix_state_sock(), not the queue lock, so the reader side needs
READ_ONCE().
BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_wait_for_peer
write (marked) to 0xffff888019fe7c68 of 4 bytes by task 49792 on cpu 0:
__skb_unlink include/linux/skbuff.h:2347 [inline]
__skb_try_recv_from_queue+0x3de/0x470 net/core/datagram.c:197
__skb_try_recv_datagram+0xf7/0x390 net/core/datagram.c:263
__unix_dgram_recvmsg+0x109/0x8a0 net/unix/af_unix.c:2452
unix_dgram_recvmsg+0x94/0xa0 net/unix/af_unix.c:2549
sock_recvmsg_nosec net/socket.c:1019 [inline]
____sys_recvmsg+0x3a3/0x3b0 net/socket.c:2720
___sys_recvmsg+0xc8/0x150 net/socket.c:2764
do_recvmmsg+0x182/0x560 net/socket.c:2858
__sys_recvmmsg net/socket.c:2937 [inline]
__do_sys_recvmmsg net/socket.c:2960 [inline]
__se_sys_recvmmsg net/socket.c:2953 [inline]
__x64_sys_recvmmsg+0x153/0x170 net/socket.c:2953
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
read to 0xffff888019fe7c68 of 4 bytes by task 49793 on cpu 1:
skb_queue_len include/linux/skbuff.h:2127 [inline]
unix_recvq_full net/unix/af_unix.c:229 [inline]
unix_wait_for_peer+0x154/0x1a0 net/unix/af_unix.c:1445
unix_dgram_sendmsg+0x13bc/0x14b0 net/unix/af_unix.c:2048
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x148/0x160 net/socket.c:747
____sys_sendmsg+0x20e/0x620 net/socket.c:2503
___sys_sendmsg+0xc6/0x140 net/socket.c:2557
__sys_sendmmsg+0x11d/0x370 net/socket.c:2643
__do_sys_sendmmsg net/socket.c:2672 [inline]
__se_sys_sendmmsg net/socket.c:2669 [inline]
__x64_sys_sendmmsg+0x58/0x70 net/socket.c:2669
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
value changed: 0x0000000b -> 0x00000001
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 49793 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Linus Torvalds [Thu, 11 May 2023 01:04:04 +0000 (20:04 -0500)]
MAINTAINERS: re-sort all entries and fields
It's been a few years since we've sorted this thing, and the end result
is that we've added MAINTAINERS entries in the wrong order, and a number
of entries have their fields in non-canonical order too.
So roll this boulder up the hill one more time by re-running
./scripts/parse-maintainers.pl --order
on it.
This file ends up being fairly painful for merge conflicts even
normally, since unlike almost all other kernel files it's one of those
"everybody touches the same thing", and re-ordering all entries is only
going to make that worse. But the alternative is to never do it at all,
and just let it all rot..
The rc2 week is likely the quietest and least painful time to do this.
Takashi Sakamoto [Wed, 10 May 2023 01:35:33 +0000 (10:35 +0900)]
firewire: net: fix unexpected release of object for asynchronous request packet
The lifetime of object for asynchronous request packet is now maintained
by reference counting, while current implementation of firewire-net
releases the passed object in the handler.
Bob Peterson [Fri, 28 Apr 2023 16:07:46 +0000 (12:07 -0400)]
gfs2: Don't deref jdesc in evict
On corrupt gfs2 file systems the evict code can try to reference the
journal descriptor structure, jdesc, after it has been freed and set to
NULL. The sequence of events is:
init_journal()
...
fail_jindex:
gfs2_jindex_free(sdp); <------frees journals, sets jdesc = NULL
if (gfs2_holder_initialized(&ji_gh))
gfs2_glock_dq_uninit(&ji_gh);
fail:
iput(sdp->sd_jindex); <--references jdesc in evict_linked_inode
evict()
gfs2_evict_inode()
evict_linked_inode()
ret = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
<------references the now freed/zeroed sd_jdesc pointer.
The call to gfs2_trans_begin is done because the truncate_inode_pages
call can cause gfs2 events that require a transaction, such as removing
journaled data (jdata) blocks from the journal.
This patch fixes the problem by adding a check for sdp->sd_jdesc to
function gfs2_evict_inode. In theory, this should only happen to corrupt
gfs2 file systems, when gfs2 detects the problem, reports it, then tries
to evict all the system inodes it has read in up to that point.
Linus Torvalds [Wed, 10 May 2023 14:36:42 +0000 (09:36 -0500)]
Merge tag 'platform-drivers-x86-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Nothing special to report just various small fixes:
- thinkpad_acpi: Fix profile (performance/bal/low-power) regression
on T490
- misc other small fixes / hw-id additions"
* tag 'platform-drivers-x86-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/mellanox: fix potential race in mlxbf-tmfifo driver
platform/x86: touchscreen_dmi: Add info for the Dexp Ursus KX210i
platform/x86: touchscreen_dmi: Add upside-down quirk for GDIX1002 ts on the Juno Tablet
platform/x86: thinkpad_acpi: Add profile force ability
platform/x86: thinkpad_acpi: Fix platform profiles on T490
platform/x86: hp-wmi: add micmute to hp_wmi_keymap struct
platform/x86/intel-uncore-freq: Return error on write frequency
platform/x86: intel_scu_pcidrv: Add back PCI ID for Medfield
Vitaly Prosyak [Wed, 10 May 2023 13:51:11 +0000 (09:51 -0400)]
drm/sched: Check scheduler work queue before calling timeout handling
During an IGT GPU reset test we see again oops despite of
commit 0c8c901aaaebc9 (drm/sched: Check scheduler ready before calling
timeout handling).
It uses ready condition whether to call drm_sched_fault which unwind
the TDR leads to GPU reset.
However it looks the ready condition is overloaded with other meanings,
for example, for the following stack is related GPU reset :
does the following:
/* start the ring */
gfx_v9_0_cp_gfx_start(adev);
ring->sched.ready = true;
The same approach is for other ASICs as well :
gfx_v8_0_cp_gfx_resume
gfx_v10_0_kiq_resume, etc...
As a result, our GPU reset test causes GPU fault which calls unconditionally gfx_v9_0_fault
and then drm_sched_fault. However now it depends on whether the interrupt service routine
drm_sched_fault is executed after gfx_v9_0_cp_gfx_start is completed which sets the ready
field of the scheduler to true even for uninitialized schedulers and causes oops vs
no fault or when ISR drm_sched_fault is completed prior gfx_v9_0_cp_gfx_start and
NULL pointer dereference does not occur.
Use the field timeout_wq to prevent oops for uninitialized schedulers.
The field could be initialized by the work queue of resetting the domain.
btrfs: make clear_cache mount option to rebuild FST without disabling it
Previously clear_cache mount option would simply disable free-space-tree
feature temporarily then re-enable it to rebuild the whole free space
tree.
But this is problematic for block-group-tree feature, as we have an
artificial dependency on free-space-tree feature.
If we go the existing method, after clearing the free-space-tree
feature, we would flip the filesystem to read-only mode, as we detect a
super block write with block-group-tree but no free-space-tree feature.
This patch would change the behavior by properly rebuilding the free
space tree without disabling this feature, thus allowing clear_cache
mount option to work with block group tree.
Now we can mount a filesystem with block-group-tree feature and
clear_mount option:
$ mkfs.btrfs -O block-group-tree /dev/test/scratch1 -f
$ sudo mount /dev/test/scratch1 /mnt/btrfs -o clear_cache
$ sudo dmesg -t | head -n 5
BTRFS info (device dm-1): force clearing of disk cache
BTRFS info (device dm-1): using free space tree
BTRFS info (device dm-1): auto enabling async discard
BTRFS info (device dm-1): rebuilding free space tree
BTRFS info (device dm-1): checking UUID tree
btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add
btrfs_redirty_list_add zeroes the buffer data and sets the
EXTENT_BUFFER_NO_CHECK to make sure writeback is fine with a bogus
header. But it does that after already marking the buffer dirty, which
means that writeback could already be looking at the buffer.
Switch the order of operations around so that the buffer is only marked
dirty when we're ready to write it.
Naohiro Aota [Tue, 9 May 2023 18:29:15 +0000 (18:29 +0000)]
btrfs: zoned: fix full zone super block reading on ZNS
When both of the superblock zones are full, we need to check which
superblock is newer. The calculation of last superblock position is wrong
as it does not consider zone_capacity and uses the length.
Naohiro Aota [Mon, 8 May 2023 22:14:20 +0000 (22:14 +0000)]
btrfs: zoned: zone finish data relocation BG with last IO
For data block groups, we zone finish a zone (or, just deactivate it) when
seeing the last IO in btrfs_finish_ordered_io(). That is only called for
IOs using ZONE_APPEND, but we use a regular WRITE command for data
relocation IOs. Detect it and call btrfs_zone_finish_endio() properly.
Colin Foster [Wed, 10 May 2023 04:48:51 +0000 (21:48 -0700)]
net: mscc: ocelot: fix stat counter register values
Commit d4c367650704 ("net: mscc: ocelot: keep ocelot_stat_layout by reg
address, not offset") organized the stats counters for Ocelot chips, namely
the VSC7512 and VSC7514. A few of the counter offsets were incorrect, and
were caught by this warning:
WARNING: CPU: 0 PID: 24 at drivers/net/ethernet/mscc/ocelot_stats.c:909
ocelot_stats_init+0x1fc/0x2d8
reg 0x5000078 had address 0x220 but reg 0x5000079 has address 0x214,
bulking broken!
Fix these register offsets.
Fixes: d4c367650704 ("net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset") Signed-off-by: Colin Foster <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Ard Biesheuvel [Mon, 8 May 2023 12:40:19 +0000 (13:40 +0100)]
ARM: 9297/1: vfp: avoid unbalanced stack on 'success' return path
Commit c76c6c4ecbec0deb5 ("ARM: 9294/2: vfp: Fix broken softirq handling
with instrumentation enabled") updated the VFP exception entry logic to
go via a C function, so that we get the compiler's version of
local_bh_disable(), which may be instrumented, and isn't generally
callable from assembler.
However, this assumes that passing an alternative 'success' return
address works in C as it does in asm, and this is only the case if the C
calls in question are tail calls, as otherwise, the stack will need some
unwinding as well.
I have already sent patches to the list that replace most of the asm
logic with C code, and so it is preferable to have a minimal fix that
addresses the issue and can be backported along with the commit that it
fixes to v6.3 from v6.4. Hopefully, we can land the C conversion for v6.5.
So instead of passing the 'success' return address as a function
argument, pass the stack address from where to pop it so that both LR
and SP have the expected value.
t.feng [Wed, 10 May 2023 03:50:44 +0000 (11:50 +0800)]
ipvlan:Fix out-of-bounds caused by unclear skb->cb
If skb enqueue the qdisc, fq_skb_cb(skb)->time_to_send is changed which
is actually skb->cb, and IPCB(skb_in)->opt will be used in
__ip_options_echo. It is possible that memcpy is out of bounds and lead
to stack overflow.
We should clear skb->cb before ip_local_out or ip6_local_out.
v2:
1. clean the stack info
2. use IPCB/IP6CB instead of skb->cb
To reproduce(ipvlan with IPVLAN_MODE_L3):
Env setting:
=======================================================
modprobe ipvlan ipvlan_default_mode=1
sysctl net.ipv4.conf.eth0.forwarding=1
iptables -t nat -A POSTROUTING -s 20.0.0.0/255.255.255.0 -o eth0 -j
MASQUERADE
ip link add gw link eth0 type ipvlan
ip -4 addr add 20.0.0.254/24 dev gw
ip netns add net1
ip link add ipv1 link eth0 type ipvlan
ip link set ipv1 netns net1
ip netns exec net1 ip link set ipv1 up
ip netns exec net1 ip -4 addr add 20.0.0.4/24 dev ipv1
ip netns exec net1 route add default gw 20.0.0.254
ip netns exec net1 tc qdisc add dev ipv1 root netem loss 10%
ifconfig gw up
iptables -t filter -A OUTPUT -p tcp --dport 8888 -j REJECT --reject-with
icmp-port-unreachable
=======================================================
And then excute the shell(curl any address of eth0 can reach):
for((i=1;i<=100000;i++))
do
ip netns exec net1 curl x.x.x.x:8888
done
=======================================================
Randy Dunlap [Wed, 10 May 2023 02:29:14 +0000 (19:29 -0700)]
docs: networking: fix x25-iface.rst heading & index order
Fix the chapter heading for "X.25 Device Driver Interface" so that it
does not contain a trailing '-' character, which makes Sphinx
omit this heading from the contents.
Reverse the order of the x25.rst and x25-iface.rst files in the index
so that the project introduction (x25.rst) comes first.
Ziwei Xiao [Tue, 9 May 2023 22:51:23 +0000 (15:51 -0700)]
gve: Remove the code of clearing PBA bit
Clearing the PBA bit from the driver is race prone and it may lead to
dropped interrupt events. This could potentially lead to the traffic
being completely halted.
net: pcs: xpcs: fix incorrect number of interfaces
In synopsys_xpcs_compat[], the DW_XPCS_2500BASEX entry was setting
the number of interfaces using the xpcs_2500basex_features array
rather than xpcs_2500basex_interfaces. This causes us to overflow
the array of interfaces. Fix this.
Fixes: f27abde3042a ("net: pcs: add 2500BASEX support for Intel mGbE controller") Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: David S. Miller <[email protected]>
read to 0xffff88813ea4db59 of 1 bytes by task 28222 on cpu 1:
netlink_recvmsg+0x3b4/0x730 net/netlink/af_netlink.c:2022
sock_recvmsg_nosec+0x4c/0x80 net/socket.c:1017
____sys_recvmsg+0x2db/0x310 net/socket.c:2718
___sys_recvmsg net/socket.c:2762 [inline]
do_recvmmsg+0x2e5/0x710 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x00 -> 0x01
Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Bonding send_peer_notif was defined as u8. But the value is
num_peer_notif multiplied by peer_notif_delay, which is u8 * u32.
This would cause the send_peer_notif overflow.
Before the fix:
TEST: num_grat_arp (active-backup miimon num_grat_arp 10) [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 20) [ OK ]
4 garp packets sent on active slave eth1
TEST: num_grat_arp (active-backup miimon num_grat_arp 30) [FAIL]
24 garp packets sent on active slave eth1
TEST: num_grat_arp (active-backup miimon num_grat_arp 50) [FAIL]
After the fix:
TEST: num_grat_arp (active-backup miimon num_grat_arp 10) [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 20) [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 30) [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 50) [ OK ]
====================
Hangbin Liu [Tue, 9 May 2023 03:11:59 +0000 (11:11 +0800)]
selftests: forwarding: lib: add netns support for tc rule handle stats get
When run the test in netns, it's not easy to get the tc stats via
tc_rule_handle_stats_get(). With the new netns parameter, we can get
stats from specific netns like
Hangbin Liu [Tue, 9 May 2023 03:11:58 +0000 (11:11 +0800)]
Documentation: bonding: fix the doc of peer_notif_delay
Bonding only supports setting peer_notif_delay with miimon set.
Fixes: 0307d589c4d6 ("bonding: add documentation for peer_notif_delay") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Hangbin Liu [Tue, 9 May 2023 03:11:57 +0000 (11:11 +0800)]
bonding: fix send_peer_notif overflow
Bonding send_peer_notif was defined as u8. Since commit 07a4ddec3ce9
("bonding: add an option to specify a delay between peer notifications").
the bond->send_peer_notif will be num_peer_notif multiplied by
peer_notif_delay, which is u8 * u32. This would cause the send_peer_notif
overflow easily. e.g.
ip link add bond0 type bond mode 1 miimon 100 num_grat_arp 30 peer_notify_delay 1000
To fix the overflow, let's set the send_peer_notif to u32 and limit
peer_notif_delay to 300s.
Reported-by: Liang Li <[email protected]> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2090053 Fixes: 07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Check for NULL pointer to avoid kernel crashing in case of missing WO
firmware in case only a single WEDv2 device has been initialized, e.g. on
MT7981 which can connect just one wireless frontend.
Fixes: 86ce0d09e424 ("net: ethernet: mtk_eth_soc: use WO firmware for MT7981") Signed-off-by: Daniel Golle <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
selftests: nft_flowtable.sh: wait for specific nc pids
Doing wait with no parameters may interfere with some of the tests
having their own background processes.
Although no such test is currently present, the cleanup is useful
to rely on the nft_flowtable.sh for local development (e.g. running
background tcpdump command during the tests).
First turn this BUG_ON into a WARN. I think it was triggered
via enable_hooks=1 flag.
When this flag is turned on, the conntrack hooks are registered
before nf_ct_hook pointer gets assigned.
This opens a short window where packets enter the conntrack machinery,
can have skb->_nfct set up and a subsequent kfree_skb might occur
before nf_ct_hook is set.
Call nf_conntrack_init_end() to set nf_ct_hook before we register the
pernet ops.
Fixes: ba3fbe663635 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
netfilter: nf_tables: always release netdev hooks from notifier
This reverts "netfilter: nf_tables: skip netdev events generated on netns removal".
The problem is that when a veth device is released, the veth release
callback will also queue the peer netns device for removal.
Its possible that the peer netns is also slated for removal. In this
case, the device memory is already released before the pre_exit hook of
the peer netns runs:
BUG: KASAN: slab-use-after-free in nf_hook_entry_head+0x1b8/0x1d0
Read of size 8 at addr ffff88812c0124f0 by task kworker/u8:1/45
Workqueue: netns cleanup_net
Call Trace:
nf_hook_entry_head+0x1b8/0x1d0
__nf_unregister_net_hook+0x76/0x510
nft_netdev_unregister_hooks+0xa0/0x220
__nft_release_hook+0x184/0x490
nf_tables_pre_exit_net+0x12f/0x1b0
..
Order is:
1. First netns is released, veth_dellink() queues peer netns device
for removal
2. peer netns is queued for removal
3. peer netns device is released, unreg event is triggered
4. unreg event is ignored because netns is going down
5. pre_exit hook calls nft_netdev_unregister_hooks but device memory
might be free'd already.
net: phy: bcm7xx: Correct read from expansion register
Since the driver works in the "legacy" addressing mode, we need to write
to the expansion register (0x17) with bits 11:8 set to 0xf to properly
select the expansion register passed as argument.