Sungwoo Kim [Thu, 2 May 2024 16:09:31 +0000 (12:09 -0400)]
Bluetooth: HCI: Fix potential null-ptr-deref
Fix potential null-ptr-deref in hci_le_big_sync_established_evt().
Fixes: f777d8827817 (Bluetooth: ISO: Notify user space about failed bis connections) Signed-off-by: Sungwoo Kim <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Johan Hovold [Wed, 1 May 2024 12:34:52 +0000 (14:34 +0200)]
Bluetooth: qca: fix info leak when fetching fw build id
Add the missing sanity checks and move the 255-byte build-id buffer off
the stack to avoid leaking stack data through debugfs in case the
build-info reply is malformed.
Fixes: c0187b0bd3e9 ("Bluetooth: btqca: Add support to read FW build version for WCN3991 BTSoC") Cc: [email protected] # 5.12 Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Johan Hovold [Tue, 30 Apr 2024 17:07:40 +0000 (19:07 +0200)]
Bluetooth: qca: fix NVM configuration parsing
The NVM configuration files used by WCN3988 and WCN3990/1/8 have two
sets of configuration tags that are enclosed by a type-length header of
type four which the current parser fails to account for.
Instead the driver happily parses random data as if it were valid tags,
something which can lead to the configuration data being corrupted if it
ever encounters the words 0x0011 or 0x001b.
As is clear from commit b63882549b2b ("Bluetooth: btqca: Fix the NVM
baudrate tag offcet for wcn3991") the intention has always been to
process the configuration data also for WCN3991 and WCN3998 which
encodes the baud rate at a different offset.
Fix the parser so that it can handle the WCN3xxx configuration files,
which has an enclosing type-length header of type four and two sets of
TLV tags enclosed by a type-length header of type two and three,
respectively.
Note that only the first set, which contains the tags the driver is
currently looking for, will be parsed for now.
With the parser fixed, the software in-band sleep bit will now be set
for WCN3991 and WCN3998 (as it is for later controllers) and the default
baud rate 3200000 may be updated by the driver also for WCN3xxx
controllers.
Notably the deep-sleep feature bit is already set by default in all
configuration files in linux-firmware.
Add the missing sanity checks when parsing the firmware files before
downloading them to avoid accessing and corrupting memory beyond the
vmalloced buffer.
Fixes: 83e81961ff7e ("Bluetooth: btqca: Introduce generic QCA ROME support") Cc: [email protected] # 4.10 Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
==================================================================
BUG: KASAN: slab-use-after-free in __mutex_lock_common
kernel/locking/mutex.c:587 [inline]
BUG: KASAN: slab-use-after-free in __mutex_lock+0x8f/0xc30
kernel/locking/mutex.c:752
Read of size 8 at addr ffff888106cbbca8 by task kworker/u5:2/309
Fixes: bf6a4e30ffbd ("Bluetooth: disable advertisement filters during suspend") Signed-off-by: Sungwoo Kim <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Sungwoo Kim [Tue, 30 Apr 2024 06:32:10 +0000 (02:32 -0400)]
Bluetooth: L2CAP: Fix slab-use-after-free in l2cap_connect()
Extend a critical section to prevent chan from early freeing.
Also make the l2cap_connect() return type void. Nothing is using the
returned value but it is ugly to return a potentially freed pointer.
Making it void will help with backports because earlier kernels did use
the return value. Now the compile will break for kernels where this
patch is not a complete fix.
Johan Hovold [Thu, 25 Apr 2024 07:55:03 +0000 (09:55 +0200)]
Bluetooth: qca: fix wcn3991 device address check
Qualcomm Bluetooth controllers may not have been provisioned with a
valid device address and instead end up using the default address
00:00:00:00:5a:ad.
This address is now used to determine if a controller has a valid
address or if one needs to be provided through devicetree or by user
space before the controller can be used.
It turns out that the WCN3991 controllers used in Chromium Trogdor
machines use a different default address, 39:98:00:00:5a:ad, which also
needs to be marked as invalid so that the correct address is fetched
from the devicetree.
Qualcomm has unfortunately not yet provided any answers as to whether
the 39:98 encodes a hardware id and if there are other variants of the
default address that needs to be handled by the driver.
For now, add the Trogdor WCN3991 default address to the device address
check to avoid having these controllers start with the default address
instead of their assigned addresses.
Bluetooth: Fix use-after-free bugs caused by sco_sock_timeout
When the sco connection is established and then, the sco socket
is releasing, timeout_work will be scheduled to judge whether
the sco disconnection is timeout. The sock will be deallocated
later, but it is dereferenced again in sco_sock_timeout. As a
result, the use-after-free bugs will happen. The root cause is
shown below:
[ 95.890016] ==================================================================
[ 95.890496] BUG: KASAN: slab-use-after-free in sco_sock_timeout+0x5e/0x1c0
[ 95.890755] Write of size 4 at addr ffff88800c388080 by task kworker/0:0/7
...
[ 95.890755] Workqueue: events sco_sock_timeout
[ 95.890755] Call Trace:
[ 95.890755] <TASK>
[ 95.890755] dump_stack_lvl+0x45/0x110
[ 95.890755] print_address_description+0x78/0x390
[ 95.890755] print_report+0x11b/0x250
[ 95.890755] ? __virt_addr_valid+0xbe/0xf0
[ 95.890755] ? sco_sock_timeout+0x5e/0x1c0
[ 95.890755] kasan_report+0x139/0x170
[ 95.890755] ? update_load_avg+0xe5/0x9f0
[ 95.890755] ? sco_sock_timeout+0x5e/0x1c0
[ 95.890755] kasan_check_range+0x2c3/0x2e0
[ 95.890755] sco_sock_timeout+0x5e/0x1c0
[ 95.890755] process_one_work+0x561/0xc50
[ 95.890755] worker_thread+0xab2/0x13c0
[ 95.890755] ? pr_cont_work+0x490/0x490
[ 95.890755] kthread+0x279/0x300
[ 95.890755] ? pr_cont_work+0x490/0x490
[ 95.890755] ? kthread_blkcg+0xa0/0xa0
[ 95.890755] ret_from_fork+0x34/0x60
[ 95.890755] ? kthread_blkcg+0xa0/0xa0
[ 95.890755] ret_from_fork_asm+0x11/0x20
[ 95.890755] </TASK>
[ 95.890755]
[ 95.890755] Allocated by task 506:
[ 95.890755] kasan_save_track+0x3f/0x70
[ 95.890755] __kasan_kmalloc+0x86/0x90
[ 95.890755] __kmalloc+0x17f/0x360
[ 95.890755] sk_prot_alloc+0xe1/0x1a0
[ 95.890755] sk_alloc+0x31/0x4e0
[ 95.890755] bt_sock_alloc+0x2b/0x2a0
[ 95.890755] sco_sock_create+0xad/0x320
[ 95.890755] bt_sock_create+0x145/0x320
[ 95.890755] __sock_create+0x2e1/0x650
[ 95.890755] __sys_socket+0xd0/0x280
[ 95.890755] __x64_sys_socket+0x75/0x80
[ 95.890755] do_syscall_64+0xc4/0x1b0
[ 95.890755] entry_SYSCALL_64_after_hwframe+0x67/0x6f
[ 95.890755]
[ 95.890755] Freed by task 506:
[ 95.890755] kasan_save_track+0x3f/0x70
[ 95.890755] kasan_save_free_info+0x40/0x50
[ 95.890755] poison_slab_object+0x118/0x180
[ 95.890755] __kasan_slab_free+0x12/0x30
[ 95.890755] kfree+0xb2/0x240
[ 95.890755] __sk_destruct+0x317/0x410
[ 95.890755] sco_sock_release+0x232/0x280
[ 95.890755] sock_close+0xb2/0x210
[ 95.890755] __fput+0x37f/0x770
[ 95.890755] task_work_run+0x1ae/0x210
[ 95.890755] get_signal+0xe17/0xf70
[ 95.890755] arch_do_signal_or_restart+0x3f/0x520
[ 95.890755] syscall_exit_to_user_mode+0x55/0x120
[ 95.890755] do_syscall_64+0xd1/0x1b0
[ 95.890755] entry_SYSCALL_64_after_hwframe+0x67/0x6f
[ 95.890755]
[ 95.890755] The buggy address belongs to the object at ffff88800c388000
[ 95.890755] which belongs to the cache kmalloc-1k of size 1024
[ 95.890755] The buggy address is located 128 bytes inside of
[ 95.890755] freed 1024-byte region [ffff88800c388000, ffff88800c388400)
[ 95.890755]
[ 95.890755] The buggy address belongs to the physical page:
[ 95.890755] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88800c38a800 pfn:0xc388
[ 95.890755] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 95.890755] anon flags: 0x100000000000840(slab|head|node=0|zone=1)
[ 95.890755] page_type: 0xffffffff()
[ 95.890755] raw: 0100000000000840ffff888006842dc000000000000000000000000000000001
[ 95.890755] raw: ffff88800c38a800000000000010000a00000001ffffffff0000000000000000
[ 95.890755] head: 0100000000000840ffff888006842dc000000000000000000000000000000001
[ 95.890755] head: ffff88800c38a800000000000010000a00000001ffffffff0000000000000000
[ 95.890755] head: 0100000000000003ffffea000030e201ffffea000030e24800000000ffffffff
[ 95.890755] head: 0000000800000000000000000000000000000000ffffffff0000000000000000
[ 95.890755] page dumped because: kasan: bad access detected
[ 95.890755]
[ 95.890755] Memory state around the buggy address:
[ 95.890755] ffff88800c387f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 95.890755] ffff88800c388000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 95.890755] >ffff88800c388080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 95.890755] ^
[ 95.890755] ffff88800c388100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 95.890755] ffff88800c388180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 95.890755] ==================================================================
Fix this problem by adding a check protected by sco_conn_lock to judget
whether the conn->hcon is null. Because the conn->hcon will be set to null,
when the sock is releasing.
Fixes: ba316be1b6a0 ("Bluetooth: schedule SCO timeouts with delayed_work") Signed-off-by: Duoming Zhou <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
PCI/ASPM: Clarify that pcie_aspm=off means leave ASPM untouched
Previously we claimed "pcie_aspm=off" meant that ASPM would be disabled,
which is wrong.
Correct this to say that with "pcie_aspm=off", Linux doesn't touch any ASPM
configuration at all. ASPM may have been enabled by firmware, and that
will be left unchanged. See "aspm_support_enabled".
Linus Torvalds [Fri, 3 May 2024 16:33:59 +0000 (09:33 -0700)]
Merge tag 'block-6.9-20240503' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Nothing major in here - an nvme pull request with mostly auth/tcp
fixes, and a single fix for ublk not setting segment count and size
limits"
* tag 'block-6.9-20240503' of git://git.kernel.dk/linux:
nvme-tcp: strict pdu pacing to avoid send stalls on TLS
nvmet: fix nvme status code when namespace is disabled
nvmet-tcp: fix possible memory leak when tearing down a controller
nvme: cancel pending I/O if nvme controller is in terminal state
nvmet-auth: replace pr_debug() with pr_err() to report an error.
nvmet-auth: return the error code to the nvmet_auth_host_hash() callers
nvme: find numa distance only if controller has valid numa id
ublk: remove segment count and size limits
nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH
Linus Torvalds [Fri, 3 May 2024 16:24:46 +0000 (09:24 -0700)]
Merge tag 'sound-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"As usual in a late stage, we received a fair amount of fixes for ASoC,
and it became bigger than wished. But all fixes are rather device-
specific, and they look pretty safe to apply.
A major par of changes are series of fixes for ASoC meson and SOF
drivers as well as for Realtek and Cirrus codecs. In addition, recent
emu10k1 regression fixes and usual HD-audio quirks are included"
* tag 'sound-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (46 commits)
ALSA: hda/realtek: Fix build error without CONFIG_PM
ALSA: hda/realtek: Fix conflicting PCI SSID 17aa:386f for Lenovo Legion models
ALSA: hda/realtek - Set GPIO3 to default at S4 state for Thinkpad with ALC1318
ALSA: hda: intel-sdw-acpi: fix usage of device_get_named_child_node()
ALSA: hda: intel-dsp-config: harden I2C/I2S codec detection
ASoC: cs35l56: fix usages of device_get_named_child_node()
ASoC: da7219-aad: fix usage of device_get_named_child_node()
ASoC: meson: cards: select SND_DYNAMIC_MINORS
ASoC: meson: axg-tdm: add continuous clock support
ASoC: meson: axg-tdm-interface: manage formatters in trigger
ASoC: meson: axg-card: make links nonatomic
ASoC: meson: axg-fifo: use threaded irq to check periods
ALSA: hda/realtek: Fix mute led of HP Laptop 15-da3001TU
ALSA: emu10k1: make E-MU FPGA writes potentially more reliable
ALSA: emu10k1: fix E-MU dock initialization
ALSA: emu10k1: use mutex for E-MU FPGA access locking
ALSA: emu10k1: move the whole GPIO event handling to the workqueue
ALSA: emu10k1: factor out snd_emu1010_load_dock_firmware()
ALSA: emu10k1: fix E-MU card dock presence monitoring
ASoC: rt715-sdca: volume step modification
...
panel:
- ili9341: avoid OF for device properties; respect deferred probe;
fix usage of errno codes
ttm:
- fix status output
vmwgfx:
- fix legacy display unit
- fix read length in fence signalling"
* tag 'drm-fixes-2024-05-03' of https://gitlab.freedesktop.org/drm/kernel: (25 commits)
drm/xe/display: Fix ADL-N detection
drm/panel: ili9341: Use predefined error codes
drm/panel: ili9341: Respect deferred probe
drm/panel: ili9341: Correct use of device property APIs
drm/xe/vm: prevent UAF in rebind_work_func()
drm/amd/display: Disable panel replay by default for now
drm/amdgpu: fix doorbell regression
drm/amdkfd: Flush the process wq before creating a kfd_process
drm/amd/display: Disable seamless boot on 128b/132b encoding
drm/amd/display: Fix DC mode screen flickering on DCN321
drm/amd/display: Add VCO speed parameter for DCN31 FPU
drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2
drm/amd/display: Allocate zero bw after bw alloc enable
drm/amd/display: Fix incorrect DSC instance for MST
drm/amd/display: Atom Integrated System Info v2_2 for DCN35
drm/amd/display: Add dtbclk access to dcn315
drm/amd/display: Ensure that dmcub support flag is set for DCN20
drm/amd/display: Handle Y carry-over in VCP X.Y calculation
drm/amdgpu: Fix VRAM memory accounting
drm/vmwgfx: Fix invalid reads in fence signaled events
...
Linus Torvalds [Fri, 3 May 2024 16:12:28 +0000 (09:12 -0700)]
Merge tag 'spi-fix-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few small fixes for v6.9,
The core fix is for issues with reuse of a spi_message in the case
where we've got queued messages (a relatively rare occurrence with
modern code so it wasn't noticed in testing).
We also avoid an issue with the Kunpeng driver by simply removing the
debug interface that could trigger it, and address issues with
confusing and corrupted output when printing the IP version of the AXI
SPI engine"
* tag 'spi-fix-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fix null pointer dereference within spi_sync
spi: hisi-kunpeng: Delete the dump interface of data registers in debugfs
spi: axi-spi-engine: fix version format string
tcp: Use refcount_inc_not_zero() in tcp_twsk_unique().
Anderson Nascimento reported a use-after-free splat in tcp_twsk_unique()
with nice analysis.
Since commit ec94c2696f0b ("tcp/dccp: avoid one atomic operation for
timewait hashdance"), inet_twsk_hashdance() sets TIME-WAIT socket's
sk_refcnt after putting it into ehash and releasing the bucket lock.
Thus, there is a small race window where other threads could try to
reuse the port during connect() and call sock_hold() in tcp_twsk_unique()
for the TIME-WAIT socket with zero refcnt.
If that happens, the refcnt taken by tcp_twsk_unique() is overwritten
and sock_put() will cause underflow, triggering a real use-after-free
somewhere else.
To avoid the use-after-free, we need to use refcount_inc_not_zero() in
tcp_twsk_unique() and give up on reusing the port if it returns false.
To fix this issue, change tcp_shutdown() to not
perform a TCP_SYN_RECV -> TCP_FIN_WAIT1 transition,
which makes no sense anyway.
When tcp_rcv_state_process() later changes socket state
from TCP_SYN_RECV to TCP_ESTABLISH, then look at
sk->sk_shutdown to finally enter TCP_FIN_WAIT1 state,
and send a FIN packet from a sane socket state.
This means tcp_send_fin() can now be called from BH
context, and must use GFP_ATOMIC allocations.
Josef Bacik [Mon, 29 Apr 2024 13:03:35 +0000 (09:03 -0400)]
btrfs: make sure that WRITTEN is set on all metadata blocks
We previously would call btrfs_check_leaf() if we had the check
integrity code enabled, which meant that we could only run the extended
leaf checks if we had WRITTEN set on the header flags.
This leaves a gap in our checking, because we could end up with
corruption on disk where WRITTEN isn't set on the leaf, and then the
extended leaf checks don't get run which we rely on to validate all of
the item pointers to make sure we don't access memory outside of the
extent buffer.
However, since 732fab95abe2 ("btrfs: check-integrity: remove
CONFIG_BTRFS_FS_CHECK_INTEGRITY option") we no longer call
btrfs_check_leaf() from btrfs_mark_buffer_dirty(), which means we only
ever call it on blocks that are being written out, and thus have WRITTEN
set, or that are being read in, which should have WRITTEN set.
Add checks to make sure we have WRITTEN set appropriately, and then make
sure __btrfs_check_leaf() always does the item checking. This will
protect us from file systems that have been corrupted and no longer have
WRITTEN set on some of the blocks.
This was hit on a crafted image tweaking the WRITTEN bit and reported by
KASAN as out-of-bound access in the eb accessors. The example is a dir
item at the end of an eb.
btrfs: qgroup: do not check qgroup inherit if qgroup is disabled
[BUG]
After kernel commit 86211eea8ae1 ("btrfs: qgroup: validate
btrfs_qgroup_inherit parameter"), user space tool snapper will fail to
create snapshot using its timeline feature.
[CAUSE]
It turns out that, if using timeline snapper would unconditionally pass
btrfs_qgroup_inherit parameter (assigning the new snapshot to qgroup 1/0)
for snapshot creation.
In that case, since qgroup is disabled there would be no qgroup 1/0, and
btrfs_qgroup_check_inherit() would return -ENOENT and fail the whole
snapshot creation.
[FIX]
Just skip the check if qgroup is not enabled.
This is to keep the older behavior for user space tools, as if the
kernel behavior changed for user space, it is a regression of kernel.
Thankfully snapper is also fixing the behavior by detecting if qgroup is
running in the first place, so the effect should not be that huge.
Linus Torvalds [Thu, 2 May 2024 17:49:12 +0000 (10:49 -0700)]
Merge tag 'for-6.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- set correct ram_bytes when splitting ordered extent. This can be
inconsistent on-disk but harmless as it's not used for calculations
and it's only advisory for compression
- fix lockdep splat when taking cleaner mutex in qgroups disable ioctl
- fix missing mutex unlock on error path when looking up sys chunk for
relocation
* tag 'for-6.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: set correct ram_bytes when splitting ordered extent
btrfs: take the cleaner_mutex earlier in qgroup disable
btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks()
Linus Torvalds [Thu, 2 May 2024 17:43:35 +0000 (10:43 -0700)]
Merge tag 's390-6.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- The function __storage_key_init_range() expects the end address to be
the first byte outside the range to be initialized. Fix the callers
that provide the last byte within the range instead.
- 3270 Channel Command Word (CCW) may contain zero data address in case
there is no data in the request. Add data availability check to avoid
erroneous non-zero value as result of virt_to_dma32(NULL) application
in cases there is no data
- Add missing CFI directives for an unwinder to restore the return
address in the vDSO assembler code
- NUL-terminate kernel buffer when duplicating user space memory region
on Channel IO (CIO) debugfs write inject
- Fix wrong format string in zcrypt debug output
- Return -EBUSY code when a CCA card is temporarily unavailabile
- Restore a loop that retries derivation of a protected key from a
secure key in cases the low level reports temporarily unavailability
with -EBUSY code
* tag 's390-6.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/paes: Reestablish retry loop in paes
s390/zcrypt: Use EBUSY to indicate temp unavailability
s390/zcrypt: Handle ep11 cprb return code
s390/zcrypt: Fix wrong format string in debug feature printout
s390/cio: Ensure the copied buf is NUL terminated
s390/vdso: Add CFI for RA register to asm macro vdso_func
s390/3270: Fix buffer assignment
s390/mm: Fix clearing storage keys for huge pages
s390/mm: Fix storage key clearing for guest huge pages
Linus Torvalds [Thu, 2 May 2024 17:41:28 +0000 (10:41 -0700)]
Merge tag 'xtensa-20240502' of https://github.com/jcmvbkbc/linux-xtensa
Pull xtensa fixes from Max Filippov:
- fix unused variable warning caused by empty flush_dcache_page()
definition
- fix stack unwinding on windowed noMMU XIP configurations
- fix Coccinelle warning 'opportunity for min()' in xtensa ISS platform
code
* tag 'xtensa-20240502' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: remove redundant flush_dcache_page and ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE macros
tty: xtensa/iss: Use min() to fix Coccinelle warning
xtensa: fix MAKE_PC_FROM_RA second argument
x86/xen: return a sane initial apic id when running as PV guest
With recent sanity checks for topology information added, there are now
warnings issued for APs when running as a Xen PV guest:
[Firmware Bug]: CPU 1: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0001
This is due to the initial APIC ID obtained via CPUID for PV guests is
always 0.
Avoid the warnings by synthesizing the CPUID data to contain the same
initial APIC ID as xen_pv_smp_config() is using for registering the
APIC IDs of all CPUs.
Fixes: 52128a7a21f7 ("86/cpu/topology: Make the APIC mismatch warnings complete") Signed-off-by: Juergen Gross <[email protected]>
Lucas De Marchi [Thu, 25 Apr 2024 18:16:09 +0000 (11:16 -0700)]
drm/xe/display: Fix ADL-N detection
Contrary to i915, in xe ADL-N is kept as a different platform, not a
subplatform of ADL-P. Since the display side doesn't need to
differentiate between P and N, i.e. IS_ALDERLAKE_P_N() is never called,
just fixup the compat header to check for both P and N.
Moving ADL-N to be a subplatform would be more complex as the firmware
loading in xe only handles platforms, not subplatforms, as going forward
the direction is to check on IP version rather than
platforms/subplatforms.
Fix warning when initializing display:
xe 0000:00:02.0: [drm:intel_pch_type [xe]] Found Alder Lake PCH
------------[ cut here ]------------
xe 0000:00:02.0: drm_WARN_ON(!((dev_priv)->info.platform == XE_ALDERLAKE_S) && !((dev_priv)->info.platform == XE_ALDERLAKE_P))
Linus Torvalds [Thu, 2 May 2024 16:05:21 +0000 (09:05 -0700)]
Merge tag 'firewire-fixes-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire fixes from Takashi Sakamoto:
"Two driver fixes:
- The firewire-ohci driver for 1394 OHCI hardware does not fill time
stamp for response packet when handling asynchronous transaction to
local destination. This brings an inconvenience that the response
packet is not equivalent between the transaction to local and
remote. It is fixed by fulfilling the time stamp with hardware
time. The fix should be applied to Linux kernel v6.5 or later as
well.
- The nosy driver for Texas Instruments TSB12LV21A (PCILynx) has
long-standing issue about the behaviour when user space application
passes less size of buffer than expected. It is fixed by returning
zero according to the convention of UNIX-like systems"
* tag 'firewire-fixes-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: ohci: fulfill timestamp for some local asynchronous transaction
firewire: nosy: ensure user_length is taken into account when fetching packet contents
Thomas Gleixner [Thu, 2 May 2024 14:39:47 +0000 (16:39 +0200)]
x86/xen/smp_pv: Register the boot CPU APIC properly
The topology core expects the boot APIC to be registered from earhy APIC
detection first and then again when the firmware tables are evaluated. This
is used for detecting the real BSP CPU on a kexec kernel.
The recent conversion of XEN/PV to register fake APIC IDs failed to
register the boot CPU APIC correctly as it only registers it once. This
causes the BSP detection mechanism to trigger wrongly:
CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1
Additionally this results in one CPU being ignored.
Register the boot CPU APIC twice so that the XEN/PV fake enumeration
behaves like real firmware.
Linus Torvalds [Thu, 2 May 2024 16:01:27 +0000 (09:01 -0700)]
Merge tag 'thermal-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"Fix a memory leak and a few locking issues (that may cause the kernel
to crash in principle if all goes wrong) in the thermal debug code
introduced during the 6.8 development cycle"
* tag 'thermal-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/debugfs: Prevent use-after-free from occurring after cdev removal
thermal/debugfs: Fix two locking issues with thermal zone debug
thermal/debugfs: Free all thermal zone debug memory on zone removal
Linus Torvalds [Thu, 2 May 2024 15:51:47 +0000 (08:51 -0700)]
Merge tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf.
Relatively calm week, likely due to public holiday in most places. No
known outstanding regressions.
Current release - regressions:
- rxrpc: fix wrong alignmask in __page_frag_alloc_align()
- eth: e1000e: change usleep_range to udelay in PHY mdic access
Previous releases - regressions:
- gro: fix udp bad offset in socket lookup
- bpf: fix incorrect runtime stat for arm64
- tipc: fix UAF in error path
- netfs: fix a potential infinite loop in extract_user_to_sg()
- eth: ice: ensure the copied buf is NUL terminated
- eth: qeth: fix kernel panic after setting hsuid
Previous releases - always broken:
- bpf:
- verifier: prevent userspace memory access
- xdp: use flags field to disambiguate broadcast redirect
- bridge: fix multicast-to-unicast with fraglist GSO
- mptcp: ensure snd_nxt is properly initialized on connect
- nsh: fix outer header access in nsh_gso_segment().
- eth: bcmgenet: fix racing registers access
- eth: vxlan: fix stats counters.
Misc:
- a bunch of MAINTAINERS file updates"
* tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
MAINTAINERS: mark MYRICOM MYRI-10G as Orphan
MAINTAINERS: remove Ariel Elior
net: gro: add flush check in udp_gro_receive_segment
net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb
ipv4: Fix uninit-value access in __ip_make_skb()
s390/qeth: Fix kernel panic after setting hsuid
vxlan: Pull inner IP header in vxlan_rcv().
tipc: fix a possible memleak in tipc_buf_append
tipc: fix UAF in error path
rxrpc: Clients must accept conn from any address
net: core: reject skb_copy(_expand) for fraglist GSO skbs
net: bridge: fix multicast-to-unicast with fraglist GSO
mptcp: ensure snd_nxt is properly initialized on connect
e1000e: change usleep_range to udelay in PHY mdic access
net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341
cxgb4: Properly lock TX queue for the selftest.
rxrpc: Fix using alignmask being zero for __page_frag_alloc_align()
vxlan: Add missing VNI filter counter update in arp_reduce().
vxlan: Fix racy device stats updates.
net: qede: use return from qede_parse_actions()
...
* git://git.infradead.org/nvme:
nvme-tcp: strict pdu pacing to avoid send stalls on TLS
nvmet: fix nvme status code when namespace is disabled
nvmet-tcp: fix possible memory leak when tearing down a controller
nvme: cancel pending I/O if nvme controller is in terminal state
nvmet-auth: replace pr_debug() with pr_err() to report an error.
nvmet-auth: return the error code to the nvmet_auth_host_hash() callers
nvme: find numa distance only if controller has valid numa id
nvme: fix warn output about shared namespaces without CONFIG_NVME_MULTIPATH
Will Deacon [Thu, 2 May 2024 09:37:23 +0000 (10:37 +0100)]
swiotlb: initialise restricted pool list_head when SWIOTLB_DYNAMIC=y
Using restricted DMA pools (CONFIG_DMA_RESTRICTED_POOL=y) in conjunction
with dynamic SWIOTLB (CONFIG_SWIOTLB_DYNAMIC=y) leads to the following
crash when initialising the restricted pools at boot-time:
| Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
| Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
| pc : rmem_swiotlb_device_init+0xfc/0x1ec
| lr : rmem_swiotlb_device_init+0xf0/0x1ec
| Call trace:
| rmem_swiotlb_device_init+0xfc/0x1ec
| of_reserved_mem_device_init_by_idx+0x18c/0x238
| of_dma_configure_id+0x31c/0x33c
| platform_dma_configure+0x34/0x80
faddr2line reveals that the crash is in the list validation code:
because add_mem_pool() is trying to list_add_rcu() to a NULL
'mem->pools'.
Fix the crash by initialising the 'mem->pools' list_head in
rmem_swiotlb_device_init() before calling add_mem_pool().
Reported-by: Nikita Ioffe <[email protected]> Tested-by: Nikita Ioffe <[email protected]> Fixes: 1aaa736815eb ("swiotlb: allocate a new memory pool when existing pools are full") Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
====================
net: gro: add flush/flush_id checks and fix wrong offset in udp
This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to add network_offset and inner_network_offset to
napi_gro_cb and use these offsets for socket lookup.
In addition p->flush/flush_id should be checked in all UDP flows. The
same logic from tcp_gro_receive is applied for all flows in
udp_gro_receive_segment. This prevents packets with mismatching network
headers (flush/flush_id turned on) from merging in UDP GRO.
The original series includes a change to vxlan test which adds the local
parameter to prevent similar future bugs. I plan to submit it separately to
net-next.
This series is part of a previously submitted series to net-next:
https://lore.kernel.org/all/20240408141720[email protected]/
v3 -> v4:
- Store network offsets, and use them only in udp_gro_complete flows
- Correct commit hash used in Fixes tag
- v3:
https://lore.kernel.org/netdev/20240424163045[email protected]/
v2 -> v3:
- Add network_offsets and fix udp bug in a single commit to make backporting easier
- Write to inner_network_offset in {inet,ipv6}_gro_receive
- Use network_offsets union in tcp[46]_gro_complete as well
- v2:
https://lore.kernel.org/netdev/20240419153542[email protected]/
v1 -> v2:
- Use network_offsets instead of p_poff param as suggested by Willem
- Check flush before postpull, and for all UDP GRO flows
- v1:
https://lore.kernel.org/netdev/20240412152120[email protected]/
====================
Richard Gobert [Tue, 30 Apr 2024 14:35:55 +0000 (16:35 +0200)]
net: gro: add flush check in udp_gro_receive_segment
GRO-GSO path is supposed to be transparent and as such L3 flush checks are
relevant to all UDP flows merging in GRO. This patch uses the same logic
and code from tcp_gro_receive, terminating merge if flush is non zero.
Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Signed-off-by: Richard Gobert <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
Richard Gobert [Tue, 30 Apr 2024 14:35:54 +0000 (16:35 +0200)]
net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb
Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp:
additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the
complete phase of gro. The functions always return skb->network_header,
which in the case of encapsulated packets at the gro complete phase, is
always set to the innermost L3 of the packet. That means that calling
{ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in
gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_
L3/L4 may return an unexpected value.
This incorrect usage leads to a bug in GRO's UDP socket lookup.
udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These
*_hdr functions return network_header which will point to the innermost L3,
resulting in the wrong offset being used in __udp{4,6}_lib_lookup with
encapsulated packets.
This patch adds network_offset and inner_network_offset to napi_gro_cb, and
makes sure both are set correctly.
To fix the issue, network_offsets union is used inside napi_gro_cb, in
which both the outer and the inner network offsets are saved.
Reproduction example:
Endpoint configuration example (fou + local address bind)
# ip fou add port 6666 ipproto 4
# ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip
# ip link set tun1 up
# ip a add 1.1.1.2/24 dev tun1
Netperf TCP_STREAM result on net-next before patch is applied:
Shigeru Yoshida [Tue, 30 Apr 2024 12:39:45 +0000 (21:39 +0900)]
ipv4: Fix uninit-value access in __ip_make_skb()
KMSAN reported uninit-value access in __ip_make_skb() [1]. __ip_make_skb()
tests HDRINCL to know if the skb has icmphdr. However, HDRINCL can cause a
race condition. If calling setsockopt(2) with IP_HDRINCL changes HDRINCL
while __ip_make_skb() is running, the function will access icmphdr in the
skb even if it is not included. This causes the issue reported by KMSAN.
Check FLOWI_FLAG_KNOWN_NH on fl4->flowi4_flags instead of testing HDRINCL
on the socket.
Also, fl4->fl4_icmp_type and fl4->fl4_icmp_code are not initialized. These
are union in struct flowi4 and are implicitly initialized by
flowi4_init_output(), but we should not rely on specific union layout.
Andy Shevchenko [Thu, 25 Apr 2024 14:26:19 +0000 (17:26 +0300)]
drm/panel: ili9341: Use predefined error codes
In one case the -1 is returned which is quite confusing code for
the wrong device ID, in another the ret is returning instead of
plain 0 that also confusing as readed may ask the possible meaning
of positive codes, which are never the case there. Convert both
to use explicit predefined error codes to make it clear what's going
on there.
Andy Shevchenko [Thu, 25 Apr 2024 14:26:18 +0000 (17:26 +0300)]
drm/panel: ili9341: Respect deferred probe
GPIO controller might not be available when driver is being probed.
There are plenty of reasons why, one of which is deferred probe.
Since GPIOs are optional, return any error code we got to the upper
layer, including deferred probe. With that in mind, use dev_err_probe()
in order to avoid spamming the logs.
Alexandra Winter [Tue, 30 Apr 2024 09:10:04 +0000 (11:10 +0200)]
s390/qeth: Fix kernel panic after setting hsuid
Symptom:
When the hsuid attribute is set for the first time on an IQD Layer3
device while the corresponding network interface is already UP,
the kernel will try to execute a napi function pointer that is NULL.
Analysis:
There is one napi structure per out_q: card->qdio.out_qs[i].napi
The napi.poll functions are set during qeth_open().
Since
commit 1cfef80d4c2b ("s390/qeth: Don't call dev_close/dev_open (DOWN/UP)")
qeth_set_offline()/qeth_set_online() no longer call dev_close()/
dev_open(). So if qeth_free_qdio_queues() cleared
card->qdio.out_qs[i].napi.poll while the network interface was UP and the
card was offline, they are not set again.
Reproduction:
chzdev -e $devno layer2=0
ip link set dev $network_interface up
echo 0 > /sys/bus/ccwgroup/devices/0.0.$devno/online
echo foo > /sys/bus/ccwgroup/devices/0.0.$devno/hsuid
echo 1 > /sys/bus/ccwgroup/devices/0.0.$devno/online
-> Crash (can be enforced e.g. by af_iucv connect(), ip link down/up, ...)
Note that a Completion Queue (CQ) is only enabled or disabled, when hsuid
is set for the first time or when it is removed.
Workarounds:
- Set hsuid before setting the device online for the first time
or
- Use chzdev -d $devno; chzdev $devno hsuid=xxx; chzdev -e $devno;
to set hsuid on an existing device. (this will remove and recreate the
network interface)
Fix:
There is no need to free the output queues when a completion queue is
added or removed.
card->qdio.state now indicates whether the inbound buffer pool and the
outbound queues are allocated.
card->qdio.c_q indicates whether a CQ is allocated.
Takashi Iwai [Thu, 2 May 2024 06:24:42 +0000 (08:24 +0200)]
ALSA: hda/realtek: Fix build error without CONFIG_PM
The alc_spec.power_hook is defined only with CONFIG_PM, and the recent
fix overlooked it, resulting in a build error without CONFIG_PM.
Fix it with the simple ifdef and set __maybe_unused for the function.
We may drop the whole CONFIG_PM dependency there, but it should be
done in a separate cleanup patch later.
Ensure the inner IP header is part of skb's linear data before reading
its ECN bits. Otherwise we might read garbage.
One symptom is the system erroneously logging errors like
"vxlan: non-ECT from xxx.xxx.xxx.xxx with TOS=xxxx".
Similar bugs have been fixed in geneve, ip_tunnel and ip6_tunnel (see
commit 1ca1ba465e55 ("geneve: make sure to pull inner header in
geneve_rx()") for example). So let's reuse the same code structure for
consistency. Maybe we'll can add a common helper in the future.
Paolo Abeni [Tue, 30 Apr 2024 13:53:37 +0000 (15:53 +0200)]
tipc: fix UAF in error path
Sam Page (sam4k) working with Trend Micro Zero Day Initiative reported
a UAF in the tipc_buf_append() error path:
BUG: KASAN: slab-use-after-free in kfree_skb_list_reason+0x47e/0x4c0
linux/net/core/skbuff.c:1183
Read of size 8 at addr ffff88804d2a7c80 by task poc/8034
In the critical scenario, either the relevant skb is freed or its
ownership is transferred into a frag_lists. In both cases, the cleanup
code must not free it again: we need to clear the skb reference earlier.
The find connection logic of Transarc's Rx was modified in the mid-1990s
to support multi-homed servers which might send a response packet from
an address other than the destination address in the received packet.
The rules for accepting a packet by an Rx initiator (RX_CLIENT_CONNECTION)
were altered to permit acceptance of a packet from any address provided
that the port number was unchanged and all of the connection identifiers
matched (Epoch, CID, SecurityClass, ...).
This change applies the same rules to the Linux implementation which makes
it consistent with IBM AFS 3.6, Arla, OpenAFS and AuriStorFS.
Takashi Iwai [Wed, 1 May 2024 16:05:13 +0000 (18:05 +0200)]
Merge tag 'asoc-fix-v6.9-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.9
This is much larger than is ideal, partly due to your holiday but also
due to several vendors having come in with relatively large fixes at
similar times. It's all driver specific stuff.
The meson fixes from Jerome fix some rare timing issues with blocking
operations happening in triggers, plus the continuous clock support
which fixes clocking for some platforms. The SOF series from Peter
builds to the fix to avoid spurious resets of ChainDMA which triggered
errors in cleanup paths with both PulseAudio and PipeWire, and there's
also some simple new debugfs files from Pierre which make support a lot
eaiser.
Linus Torvalds [Wed, 1 May 2024 15:58:56 +0000 (08:58 -0700)]
Merge tag 'regulator-fix-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"There's a few simple driver specific fixes here, plus some core
cleanups from Matti which fix issues found with client drivers due to
the API being confusing.
The two fixes for the stubs provide more constructive behaviour with
!REGULATOR configurations, issues were noticed with some hwmon drivers
which would otherwise have needed confusing bodges in the users.
The irq_helpers fix to duplicate the provided name for the interrupt
controller was found because a driver got this wrong and it's again a
case where the core is the sensible place to put the fix"
* tag 'regulator-fix-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: change devm_regulator_get_enable_optional() stub to return Ok
regulator: change stubbed devm_regulator_get_enable to return Ok
regulator: vqmmc-ipq4019: fix module autoloading
regulator: qcom-refgen: fix module autoloading
regulator: mt6360: De-capitalize devicetree regulator subnodes
regulator: irq_helpers: duplicate IRQ name
mm/slub: avoid zeroing outside-object freepointer for single free
Commit 284f17ac13fe ("mm/slub: handle bulk and single object freeing
separately") splits single and bulk object freeing in two functions
slab_free() and slab_free_bulk() which leads slab_free() to call
slab_free_hook() directly instead of slab_free_freelist_hook().
If `init_on_free` is set, slab_free_hook() zeroes the object.
Afterward, if `slub_debug=F` and `CONFIG_SLAB_FREELIST_HARDENED` are
set, the do_slab_free() slowpath executes freelist consistency
checks and try to decode a zeroed freepointer which leads to a
"Freepointer corrupt" detection in check_object().
During bulk free, slab_free_freelist_hook() isn't affected as it always
sets it objects freepointer using set_freepointer() to maintain its
reconstructed freelist after `init_on_free`.
For single free, object's freepointer thus needs to be avoided when
stored outside the object if `init_on_free` is set. The freepointer left
as is, check_object() may later detect an invalid pointer value due to
objects overflow.
To reproduce, set `slub_debug=FU init_on_free=1 log_level=7` on the
command line of a kernel build with `CONFIG_SLAB_FREELIST_HARDENED=y`.
Matthew Auld [Tue, 23 Apr 2024 07:47:23 +0000 (08:47 +0100)]
drm/xe/vm: prevent UAF in rebind_work_func()
We flush the rebind worker during the vm close phase, however in places
like preempt_fence_work_func() we seem to queue the rebind worker
without first checking if the vm has already been closed. The concern
here is the vm being closed with the worker flushed, but then being
rearmed later, which looks like potential uaf, since there is no actual
refcounting to track the queued worker. We can't take the vm->lock here
in preempt_rebind_work_func() to first check if the vm is closed since
that will deadlock, so instead flush the worker again when the vm
refcount reaches zero.
v2:
- Grabbing vm->lock in the preempt worker creates a deadlock, so
checking the closed state is tricky. Instead flush the worker when
the refcount reaches zero. It should be impossible to queue the
preempt worker without already holding vm ref.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1676 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1591 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1364 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1304 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1249 Signed-off-by: Matthew Auld <[email protected]> Cc: Matthew Brost <[email protected]> Cc: <[email protected]> # v6.8+ Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 3d44d67c441a9fe6f81a1d705f7de009a32a5b35) Signed-off-by: Lucas De Marchi <[email protected]>
drm/amd/display: Disable panel replay by default for now
Panel replay was enabled by default in commit 5950efe25ee0
("drm/amd/display: Enable Panel Replay for static screen use case"), but
it isn't working properly at least on some BOE and AUO panels. Instead
of being static the screen is solid black when active. As it's a new
feature that was just introduced that regressed VRR disable it for now
so that problem can be properly root caused.
Cc: Tom Chung <[email protected]> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3344 Fixes: 5950efe25ee0 ("drm/amd/display: Enable Panel Replay for static screen use case") Signed-off-by: Mario Limonciello <[email protected]> Acked-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Felix Fietkau [Sat, 27 Apr 2024 18:24:19 +0000 (20:24 +0200)]
net: core: reject skb_copy(_expand) for fraglist GSO skbs
SKB_GSO_FRAGLIST skbs must not be linearized, otherwise they become
invalid. Return NULL if such an skb is passed to skb_copy or
skb_copy_expand, in order to prevent a crash on a potential later
call to skb_gso_segment.
Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Felix Fietkau [Sat, 27 Apr 2024 18:24:18 +0000 (20:24 +0200)]
net: bridge: fix multicast-to-unicast with fraglist GSO
Calling skb_copy on a SKB_GSO_FRAGLIST skb is not valid, since it returns
an invalid linearized skb. This code only needs to change the ethernet
header, so pskb_copy is the right function to call here.
Hannes Reinecke [Thu, 18 Apr 2024 10:39:45 +0000 (12:39 +0200)]
nvme-tcp: strict pdu pacing to avoid send stalls on TLS
TLS requires a strict pdu pacing via MSG_EOR to signal the end
of a record and subsequent encryption. If we do not set MSG_EOR
at the end of a sequence the record won't be closed, encryption
doesn't start, and we end up with a send stall as the message
will never be passed on to the TCP layer.
So do not check for the queue status when TLS is enabled but
rather make the MSG_MORE setting dependent on the current
request only.
nvmet: fix nvme status code when namespace is disabled
If the user disabled a nvmet namespace, it is removed from the subsystem
namespaces list. When nvmet processes a command directed to an nsid that
was disabled, it cannot differentiate between a nsid that is disabled
vs. a non-existent namespace, and resorts to return NVME_SC_INVALID_NS
with the dnr bit set.
This translates to a non-retryable status for the host, which translates
to a user error. We should expect disabled namespaces to not cause an
I/O error in a multipath environment.
Address this by searching a configfs item for the namespace nvmet failed
to find, and if we found one, conclude that the namespace is disabled
(perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR in this case
and keep DNR bit cleared.
nvmet-tcp: fix possible memory leak when tearing down a controller
When we teardown the controller, we wait for pending I/Os to complete
(sq->ref on all queues to drop to zero) and then we go over the commands,
and free their command buffers in case they are still fetching data from
the host (e.g. processing nvme writes) and have yet to take a reference
on the sq.
However, we may miss the case where commands have failed before executing
and are queued for sending a response, but will never occur because the
queue socket is already down. In this case we may miss deallocating command
buffers.
Solve this by freeing all commands buffers as nvmet_tcp_free_cmd_buffers is
idempotent anyways.
nvme: cancel pending I/O if nvme controller is in terminal state
While I/O is running, if the pci bus error occurs then
in-flight I/O can not complete. Worst, if at this time,
user (logically) hot-unplug the nvme disk then the
nvme_remove() code path can't forward progress until
in-flight I/O is cancelled. So these sequence of events
may potentially hang hot-unplug code path indefinitely.
This patch helps cancel the pending/in-flight I/O from the
nvme request timeout handler in case the nvme controller
is in the terminal (DEAD/DELETING/DELETING_NOIO) state and
that helps nvme_remove() code path forward progress and
finish successfully.
nvme: find numa distance only if controller has valid numa id
On system where native nvme multipath is configured and iopolicy
is set to numa but the nvme controller numa node id is undefined
or -1 (NUMA_NO_NODE) then avoid calculating node distance for
finding optimal io path. In such case we may access numa distance
table with invalid index and that may potentially refer to incorrect
memory. So this patch ensures that if the nvme controller numa node
id is -1 then instead of calculating node distance for finding optimal
io path, we set the numa node distance of such controller to default 10
(LOCAL_DISTANCE).
With commit ed6776c96c60 ("s390/crypto: remove retry
loop with sleep from PAES pkey invocation") the retry
loop to retry derivation of a protected key from a
secure key has been removed. This was based on the
assumption that theses retries are not needed any
more as proper retries are done in the zcrypt layer.
However, tests have revealed that there exist some
cases with master key change in the HSM and immediately
(< 1 second) attempt to derive a protected key from a
secure key with exact this HSM may eventually fail.
The low level functions in zcrypt_ccamisc.c and
zcrypt_ep11misc.c detect and report this temporary
failure and report it to the caller as -EBUSY. The
re-established retry loop in the paes implementation
catches exactly this -EBUSY and eventually may run
some retries.
Fixes: ed6776c96c60 ("s390/crypto: remove retry loop with sleep from PAES pkey invocation") Signed-off-by: Harald Freudenberger <[email protected]> Reviewed-by: Ingo Franzki <[email protected]> Reviewed-by: Holger Dengler <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
s390/zcrypt: Use EBUSY to indicate temp unavailability
Use -EBUSY instead of -EAGAIN in zcrypt_ccamisc.c
in cases where the CCA card returns 8/2290 to indicate
a temporarily unavailability of this function.
Fixes: ed6776c96c60 ("s390/crypto: remove retry loop with sleep from PAES pkey invocation") Signed-off-by: Harald Freudenberger <[email protected]> Reviewed-by: Ingo Franzki <[email protected]> Reviewed-by: Holger Dengler <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
An EP11 reply cprb contains a field ret_code which may
hold an error code different than the error code stored
in the payload of the cprb. As of now all the EP11 misc
functions do not evaluate this field but focus on the
error code in the payload.
Before checking the payload error, first the cprb error
field should be evaluated which is introduced with this
patch.
If the return code value 0x000c0003 is seen, this
indicates a busy situation which is reflected by
-EBUSY in the zcrpyt_ep11misc.c low level function.
A higher level caller should consider to retry after
waiting a dedicated duration (say 1 second).
Fixes: ed6776c96c60 ("s390/crypto: remove retry loop with sleep from PAES pkey invocation") Signed-off-by: Harald Freudenberger <[email protected]> Reviewed-by: Ingo Franzki <[email protected]> Reviewed-by: Holger Dengler <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
... and I think that's actually the important thing here:
- the first page fault is from user space, and triggers the vsyscall
emulation.
- the second page fault is from __do_sys_gettimeofday(), and that should
just have caused the exception that then sets the return value to
-EFAULT
- the third nested page fault is due to _raw_spin_unlock_irqrestore() ->
preempt_schedule() -> trace_sched_switch(), which then causes a BPF
trace program to run, which does that bpf_probe_read_compat(), which
causes that page fault under pagefault_disable().
It's quite the nasty backtrace, and there's a lot going on.
The problem is literally the vsyscall emulation, which sets
current->thread.sig_on_uaccess_err = 1;
and that causes the fixup_exception() code to send the signal *despite* the
exception being caught.
And I think that is in fact completely bogus. It's completely bogus
exactly because it sends that signal even when it *shouldn't* be sent -
like for the BPF user mode trace gathering.
In other words, I think the whole "sig_on_uaccess_err" thing is entirely
broken, because it makes any nested page-faults do all the wrong things.
Now, arguably, I don't think anybody should enable vsyscall emulation any
more, but this test case clearly does.
I think we should just make the "send SIGSEGV" be something that the
vsyscall emulation does on its own, not this broken per-thread state for
something that isn't actually per thread.
The x86 page fault code actually tried to deal with the "incorrect nesting"
by having that:
if (in_interrupt())
return;
which ignores the sig_on_uaccess_err case when it happens in interrupts,
but as shown by this example, these nested page faults do not need to be
about interrupts at all.
IOW, I think the only right thing is to remove that horrendously broken
code.
The attached patch looks like the ObviouslyCorrect(tm) thing to do.
NOTE! This broken code goes back to this commit in 2011:
4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults")
... and back then the reason was to get all the siginfo details right.
Honestly, I do not for a moment believe that it's worth getting the siginfo
details right here, but part of the commit says:
This fixes issues with UML when vsyscall=emulate.
... and so my patch to remove this garbage will probably break UML in this
situation.
I do not believe that anybody should be running with vsyscall=emulate in
2024 in the first place, much less if you are doing things like UML. But
let's see if somebody screams.
Mans Rullgard [Tue, 30 Apr 2024 18:27:05 +0000 (19:27 +0100)]
spi: fix null pointer dereference within spi_sync
If spi_sync() is called with the non-empty queue and the same spi_message
is then reused, the complete callback for the message remains set while
the context is cleared, leading to a null pointer dereference when the
callback is invoked from spi_finalize_current_message().
With function inlining disabled, the call stack might look like this:
_raw_spin_lock_irqsave from complete_with_flags+0x18/0x58
complete_with_flags from spi_complete+0x8/0xc
spi_complete from spi_finalize_current_message+0xec/0x184
spi_finalize_current_message from spi_transfer_one_message+0x2a8/0x474
spi_transfer_one_message from __spi_pump_transfer_message+0x104/0x230
__spi_pump_transfer_message from __spi_transfer_message_noqueue+0x30/0xc4
__spi_transfer_message_noqueue from __spi_sync+0x204/0x248
__spi_sync from spi_sync+0x24/0x3c
spi_sync from mcp251xfd_regmap_crc_read+0x124/0x28c [mcp251xfd]
mcp251xfd_regmap_crc_read [mcp251xfd] from _regmap_raw_read+0xf8/0x154
_regmap_raw_read from _regmap_bus_read+0x44/0x70
_regmap_bus_read from _regmap_read+0x60/0xd8
_regmap_read from regmap_read+0x3c/0x5c
regmap_read from mcp251xfd_alloc_can_err_skb+0x1c/0x54 [mcp251xfd]
mcp251xfd_alloc_can_err_skb [mcp251xfd] from mcp251xfd_irq+0x194/0xe70 [mcp251xfd]
mcp251xfd_irq [mcp251xfd] from irq_thread_fn+0x1c/0x78
irq_thread_fn from irq_thread+0x118/0x1f4
irq_thread from kthread+0xd8/0xf4
kthread from ret_from_fork+0x14/0x28
Fix this by also setting message->complete to NULL when the transfer is
complete.
Lancelot SIX [Wed, 10 Apr 2024 13:14:13 +0000 (14:14 +0100)]
drm/amdkfd: Flush the process wq before creating a kfd_process
There is a race condition when re-creating a kfd_process for a process.
This has been observed when a process under the debugger executes
exec(3). In this scenario:
- The process executes exec.
- This will eventually release the process's mm, which will cause the
kfd_process object associated with the process to be freed
(kfd_process_free_notifier decrements the reference count to the
kfd_process to 0). This causes kfd_process_ref_release to enqueue
kfd_process_wq_release to the kfd_process_wq.
- The debugger receives the PTRACE_EVENT_EXEC notification, and tries to
re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE).
- When handling this request, KFD tries to re-create a kfd_process.
This eventually calls kfd_create_process and kobject_init_and_add.
At this point the call to kobject_init_and_add can fail because the
old kfd_process.kobj has not been freed yet by kfd_process_wq_release.
This patch proposes to avoid this race by making sure to drain
kfd_process_wq before creating a new kfd_process object. This way, we
know that any cleanup task is done executing when we reach
kobject_init_and_add.
When fallback to TCP happens early on a client socket, snd_nxt
is not yet initialized and any incoming ack will copy such value
into snd_una. If the mptcp worker (dumbly) tries mptcp-level
re-injection after such ack, that would unconditionally trigger a send
buffer cleanup using 'bad' snd_una values.
We could easily disable re-injection for fallback sockets, but such
dumb behavior already helped catching a few subtle issues and a very
low to zero impact in practice.
Instead address the issue always initializing snd_nxt (and write_seq,
for consistency) at connect time.
Leo Ma [Thu, 11 Apr 2024 21:17:04 +0000 (17:17 -0400)]
drm/amd/display: Fix DC mode screen flickering on DCN321
[Why && How]
Screen flickering saw on 4K@60 eDP with high refresh rate external
monitor when booting up in DC mode. DC Mode Capping is disabled
which caused wrong UCLK being used.
e1000e: change usleep_range to udelay in PHY mdic access
This is a partial revert of commit 6dbdd4de0362 ("e1000e: Workaround
for sporadic MDI error on Meteor Lake systems"). The referenced commit
used usleep_range inside the PHY access routines, which are sometimes
called from an atomic context. This can lead to a kernel panic in some
scenarios, such as cable disconnection and reconnection on vPro systems.
Solve this by changing the usleep_range calls back to udelay.
Christian König [Thu, 21 Mar 2024 10:32:02 +0000 (11:32 +0100)]
drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2
This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move
on same heap. The basic problem here is that after the move the old
location is simply not available any more.
Some fixes were suggested, but essentially we should call the move
notification before actually moving things because only this way we have
the correct order for DMA-buf and VM move notifications as well.
Also rework the statistic handling so that we don't update the eviction
counter before the move.
v2: add missing NULL check
Signed-off-by: Christian König <[email protected]> Fixes: 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171 Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> CC: [email protected]
drm/amd/display: Allocate zero bw after bw alloc enable
[Why]
During DP tunnel creation, CM preallocates BW and reduces
estimated BW of other DPIA. CM release preallocation only
when allocation is complete. Display mode validation logic
validates timings based on bw available per host router.
In multi display setup, this causes bw allocation failure
when allocation greater than estimated bw.
[How]
Do zero alloc to make the CM to release preallocation and
update estimated BW correctly for all DPIAs per host router.
Hersen Wu [Tue, 13 Feb 2024 19:26:06 +0000 (14:26 -0500)]
drm/amd/display: Fix incorrect DSC instance for MST
[Why] DSC debugfs, such as dp_dsc_clock_en_read,
use aconnector->dc_link to find pipe_ctx for display.
Displays connected to MST hub share the same dc_link.
DSC instance is from pipe_ctx. This causes incorrect
DSC instance for display connected to MST hub.
[How] Add aconnector->sink check to find pipe_ctx.
Gabe Teeger [Tue, 9 Apr 2024 14:38:58 +0000 (10:38 -0400)]
drm/amd/display: Atom Integrated System Info v2_2 for DCN35
New request from KMD/VBIOS in order to support new UMA carveout
model. This fixes a null dereference from accessing
Ctx->dc_bios->integrated_info while it was NULL.
DAL parses through the BIOS and extracts the necessary
integrated_info but was missing a case for the new BIOS
version 2.3.
Currently DCN315 clk manager is missing code to enable/disable dtbclk.
Because of this, "optimized_required" flag is constantly set
and this prevents FreeSync from engaging for certain high bandwidth
display Modes which require DTBCLK.
The selftest for the driver sends a dummy packet and checks if the
packet will be received properly as it should be. The regular TX path
and the selftest can use the same network queue so locking is required
and was missing in the selftest path. This was addressed in the commit
cited below.
Unfortunately locking the TX queue requires BH to be disabled which is
not the case in selftest path which is invoked in process context.
Lockdep should be complaining about this.
Yunsheng Lin [Sun, 28 Apr 2024 11:16:38 +0000 (19:16 +0800)]
rxrpc: Fix using alignmask being zero for __page_frag_alloc_align()
rxrpc_alloc_data_txbuf() may be called with data_align being
zero in none_alloc_txbuf() and rxkad_alloc_txbuf(), data_align
is supposed to be an order-based alignment value, but zero is
not a valid order-based alignment value, and '~(data_align - 1)'
doesn't result in a valid mask-based alignment value for
__page_frag_alloc_align().
Fix it by passing a valid order-based alignment value in
none_alloc_txbuf() and rxkad_alloc_txbuf().
Also use page_frag_alloc_align() expecting an order-based
alignment value in rxrpc_alloc_data_txbuf() to avoid doing the
alignment converting operation and to catch possible invalid
alignment value in the future. Remove the 'if (data_align)'
checking too, as it is always true for a valid order-based
alignment value.
Subtract the VRAM pinned memory when checking for available memory
in amdgpu_amdkfd_reserve_mem_limit function since that memory is not
available for use.
ublk_drv currently creates block devices with the default max_segments
and max_segment_size limits of BLK_MAX_SEGMENTS (128) and
BLK_MAX_SEGMENT_SIZE (65536) respectively. These defaults can
artificially constrain the I/O size seen by the ublk server - for
example, suppose that the ublk server has configured itself to accept
I/Os up to 1M and the application is also issuing 1M sized I/Os. If the
I/O buffer used by the application is backed by 4K pages, the buffer
could consist of up to 1M / 4K = 256 physically discontiguous segments
(even if the buffer is virtually contiguous). As such, the I/O could
exceed the default max_segments limit and get split. This can cause
unnecessary performance issues if the ublk server is optimized to handle
1M I/Os. The block layer's segment count/size limits exist to model
hardware constraints which don't exist in ublk_drv's case, so just
remove those limits for the block devices created by ublk_drv.
Marek Szyprowski [Tue, 30 Apr 2024 18:46:56 +0000 (20:46 +0200)]
clk: samsung: Revert "clk: Use device_get_match_data()"
device_get_match_data() function should not be used on the device other
than the one matched to the given driver, because it always returns the
match_data of the matched driver. In case of exynos-clkout driver, the
original code matches the OF IDs on the PARENT device, so replacing it
with of_device_get_match_data() broke the driver.
This has been already pointed once in commit 2bc5febd05ab ("clk: samsung:
Revert "clk: samsung: exynos-clkout: Use of_device_get_match_data()"").
To avoid further confusion, add a comment about this special case, which
requires direct of_match_device() call to pass custom IDs array.
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fix from Paolo Bonzini:
"A pretty straightforward fix for a NULL pointer dereference, plus the
accompanying reproducer"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Add test for uaccesses to non-existent vgic-v2 CPUIF
KVM: arm64: vgic-v2: Check for non-NULL vCPU in vgic_v2_parse_attr()
Input: amimouse - mark driver struct with __refdata to prevent section mismatch
As described in the added code comment, a reference to .exit.text is ok
for drivers registered via module_platform_driver_probe(). Make this
explicit to prevent the following section mismatch warning
usb: typec: tcpm: Check for port partner validity before consuming it
typec_register_partner() does not guarantee partner registration
to always succeed. In the event of failure, port->partner is set
to the error value or NULL. Given that port->partner validity is
not checked, this results in the following crash:
Unable to handle kernel NULL pointer dereference at virtual address xx
pc : run_state_machine+0x1bc8/0x1c08
lr : run_state_machine+0x1b90/0x1c08
..
Call trace:
run_state_machine+0x1bc8/0x1c08
tcpm_state_machine_work+0x94/0xe4
kthread_worker_fn+0x118/0x328
kthread+0x1d0/0x23c
ret_from_fork+0x10/0x20
To prevent the crash, check for port->partner validity before
derefencing it in all the call sites.
usb: typec: tcpm: enforce ready state when queueing alt mode vdm
Before sending Enter Mode for an Alt Mode, there is a gap between Discover
Modes and the Alt Mode driver queueing the Enter Mode VDM for the port
partner to send a message to the port.
If this message results in unregistering Alt Modes such as in a DR_SWAP,
then the following deadlock can occur with respect to the DisplayPort Alt
Mode driver:
1. The DR_SWAP state holds port->lock. Unregistering the Alt Mode driver
results in a cancel_work_sync() that waits for the current dp_altmode_work
to finish.
2. dp_altmode_work makes a call to tcpm_altmode_enter. The deadlock occurs
because tcpm_queue_vdm_unlock attempts to hold port->lock.
Before attempting to grab the lock, ensure that the port is in a state
vdm_run_state_machine can run in. Alt Mode unregistration will not occur
in these states.
usb: typec: tcpm: unregister existing source caps before re-registration
Check and unregister existing source caps in tcpm_register_source_caps
function before registering new ones. This change fixes following
warning when port partner resends source caps after negotiating PD contract
for the purpose of re-negotiation.
[ 343.135030][ T151] sysfs: cannot create duplicate filename '/devices/virtual/usb_power_delivery/pd1/source-capabilities'
[ 343.135071][ T151] Call trace:
[ 343.135076][ T151] dump_backtrace+0xe8/0x108
[ 343.135099][ T151] show_stack+0x18/0x24
[ 343.135106][ T151] dump_stack_lvl+0x50/0x6c
[ 343.135119][ T151] dump_stack+0x18/0x24
[ 343.135126][ T151] sysfs_create_dir_ns+0xe0/0x140
[ 343.135137][ T151] kobject_add_internal+0x228/0x424
[ 343.135146][ T151] kobject_add+0x94/0x10c
[ 343.135152][ T151] device_add+0x1b0/0x4c0
[ 343.135187][ T151] device_register+0x20/0x34
[ 343.135195][ T151] usb_power_delivery_register_capabilities+0x90/0x20c
[ 343.135209][ T151] tcpm_pd_rx_handler+0x9f0/0x15b8
[ 343.135216][ T151] kthread_worker_fn+0x11c/0x260
[ 343.135227][ T151] kthread+0x114/0x1bc
[ 343.135235][ T151] ret_from_fork+0x10/0x20
[ 343.135265][ T151] kobject: kobject_add_internal failed for source-capabilities with -EEXIST, don't try to register things with the same name in the same directory.
usb: typec: tcpm: clear pd_event queue in PORT_RESET
When a Fast Role Swap control message attempt results in a transition
to ERROR_RECOVERY, the TCPC can still queue a TCPM_SOURCING_VBUS event.
If the event is queued but processed after the tcpm_reset_port() call
in the PORT_RESET state, then the following occurs:
1. tcpm_reset_port() calls tcpm_init_vbus() to reset the vbus sourcing and
sinking state
2. tcpm_pd_event_handler() turns VBUS on before the port is in the default
state.
3. The port resolves as a sink. In the SNK_DISCOVERY state,
tcpm_set_charge() cannot set vbus to charge.
Clear pd events within PORT_RESET to get rid of non-applicable events.
drm/vmwgfx: Fix invalid reads in fence signaled events
Correctly set the length of the drm_event to the size of the structure
that's actually used.
The length of the drm_event was set to the parent structure instead of
to the drm_vmw_event_fence which is supposed to be read. drm_read
uses the length parameter to copy the event to the user space thus
resuling in oob reads.