Linus Torvalds [Sat, 21 Dec 2024 17:29:46 +0000 (09:29 -0800)]
Merge tag 'ceph-for-6.13-rc4' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A handful of important CephFS fixes from Max, Alex and myself: memory
corruption due to a buffer overrun, potential infinite loop and
several memory leaks on the error paths. All but one marked for
stable"
* tag 'ceph-for-6.13-rc4' of https://github.com/ceph/ceph-client:
ceph: allocate sparse_ext map only for sparse reads
ceph: fix memory leak in ceph_direct_read_write()
ceph: improve error handling and short/overflow-read logic in __ceph_sync_read()
ceph: validate snapdirname option length when mounting
ceph: give up on paths longer than PATH_MAX
ceph: fix memory leaks in __ceph_sync_read()
Linus Torvalds [Fri, 20 Dec 2024 21:48:41 +0000 (13:48 -0800)]
Merge tag 'hwmon-for-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix reporting of negative temperature, current, and voltage values in
the tmp513 driver
* tag 'hwmon-for-v6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (tmp513) Fix interpretation of values of Temperature Result and Limit Registers
hwmon: (tmp513) Fix Current Register value interpretation
hwmon: (tmp513) Fix interpretation of values of Shunt Voltage and Limit Registers
Linus Torvalds [Fri, 20 Dec 2024 21:37:58 +0000 (13:37 -0800)]
Merge tag 'block-6.13-20241220' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Minor cleanups for bdev/nvme using the helpers introduced
- Revert of a deadlock fix that still needs more work
- Fix a UAF of hctx in the cpu hotplug code
* tag 'block-6.13-20241220' of git://git.kernel.dk/linux:
block: avoid to reuse `hctx` not removed from cpuhp callback list
block: Revert "block: Fix potential deadlock while freezing queue and acquiring sysfs_lock"
nvme: use blk_validate_block_size() for max LBA check
block/bdev: use helper for max block size check
Linus Torvalds [Fri, 20 Dec 2024 21:32:43 +0000 (13:32 -0800)]
Merge tag 'io_uring-6.13-20241220' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix for a file ref leak for registered ring fds
- Turn the ->timeout_lock into a raw spinlock, as it nests under the
io-wq lock which is a raw spinlock as it's called from the scheduler
side
- Limit ring resizing to DEFER_TASKRUN for now. We will broaden this in
the future, but for now, ensure that it's only feasible on rings with
a single user
- Add sanity check for io-wq enqueuing
* tag 'io_uring-6.13-20241220' of git://git.kernel.dk/linux:
io_uring: check if iowq is killed before queuing
io_uring/register: limit ring resizing to DEFER_TASKRUN
io_uring: Fix registered ring file refcount leak
io_uring: make ctx->timeout_lock a raw spinlock
Linus Torvalds [Fri, 20 Dec 2024 19:09:40 +0000 (11:09 -0800)]
Merge tag 'usb-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
"Here are some important, and small, fixes for USB and Thunderbolt
issues that have come up in the -rc releases. And some new device ids
for good measure. Included in here are:
- Much reported xhci bugfix for usb-storage devices (and other
devices as well, tripped me up on a video camera)
- thunderbolt fixes for some small reported issues
- new usb-serial device ids
All of these have been in linux-next this week with no reported issues"
* tag 'usb-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: xhci: fix ring expansion regression in 6.13-rc1
xhci: Turn NEC specific quirk for handling Stop Endpoint errors generic
thunderbolt: Improve redrive mode handling
USB: serial: option: add Telit FE910C04 rmnet compositions
USB: serial: option: add MediaTek T7XX compositions
USB: serial: option: add Netprisma LCUK54 modules for WWAN Ready
USB: serial: option: add MeiG Smart SLM770A
USB: serial: option: add TCL IK512 MBIM & ECM
thunderbolt: Don't display nvm_version unless upgrade supported
thunderbolt: Add support for Intel Panther Lake-M/P
Linus Torvalds [Fri, 20 Dec 2024 19:04:02 +0000 (11:04 -0800)]
Merge tag 'regulator-fix-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"The recently added regulator-uv-survival-time-ms property was renamed
during the review of the series that added it, but unfortunately only
in the DT binding and not in the code that parses the binding.
This brings the code in line with the binding, if someone started
using the original name we can add compat support for it but there's
nothing upstream yet and it's a very niche feature so hopefully not"
* tag 'regulator-fix-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: rename regulator-uv-survival-time-ms according to DT binding
Linus Torvalds [Fri, 20 Dec 2024 18:17:53 +0000 (10:17 -0800)]
Merge tag 'drm-fixes-2024-12-20' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Probably the last pull before Christmas holidays, I'll still be around
for most of the time anyways, nothing too major in here, bunch of
amdgpu and i915 along with a smattering of fixes across the board.
core:
- fix FB dependency
- avoid div by 0 more in vrefresh
- maintainers update
display:
- fix DP tunnel error path
dma-buf:
- fix !DEBUG_FS
sched:
- docs warning fix
panel:
- collection of misc panel fixes
i915:
- Reset engine utilization buffer before registration
- Ensure busyness counter increases motonically
- Accumulate active runtime on gt reset
amdgpu:
- Disable BOCO when CONFIG_HOTPLUG_PCI_PCIE is not enabled
- scheduler job fixes
- IP version check fixes
- devcoredump fix
- GPUVM update fix
- NBIO 2.5 fix
udmabuf:
- fix memory leak on last export
- sealing fixes
* tag 'drm-fixes-2024-12-20' of https://gitlab.freedesktop.org/drm/kernel: (33 commits)
drm/sched: Fix drm_sched_fini() docu generation
accel/ivpu: Fix WARN in ivpu_ipc_send_receive_internal()
accel/ivpu: Fix memory leak in ivpu_mmu_reserved_context_init()
accel/ivpu: Fix general protection fault in ivpu_bo_list()
drm/amdgpu/nbio7.0: fix IP version check
drm/amd: Update strapping for NBIO 2.5.0
drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_update
drm/amdgpu: fix amdgpu_coredump
drm/amdgpu/smu14.0.2: fix IP version check
drm/amdgpu/gfx12: fix IP version check
drm/amdgpu/mmhub4.1: fix IP version check
drm/amdgpu/nbio7.11: fix IP version check
drm/amdgpu/nbio7.7: fix IP version check
drm/amdgpu: don't access invalid sched
drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCO
drm: rework FB_CORE dependency
drm/fbdev: Select FB_CORE dependency for fbdev on DMA and TTM
fbdev: Fix recursive dependencies wrt BACKLIGHT_CLASS_DEVICE
i915/guc: Accumulate active runtime on gt reset
i915/guc: Ensure busyness counter increases motonically
...
Linus Torvalds [Fri, 20 Dec 2024 18:13:26 +0000 (10:13 -0800)]
Merge tag 'trace-ringbuffer-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer fixes from Steven Rostedt:
- Fix possible overflow of mmapped ring buffer with bad offset
If the mmap() to the ring buffer passes in a start address that is
passed the end of the mmapped file, it is not caught and a
slab-out-of-bounds is triggered.
Add a check to make sure the start address is within the bounds
- Do not use TP_printk() to boot mapped ring buffers
As a boot mapped ring buffer's data may have pointers that map to the
previous boot's memory map, it is unsafe to allow the TP_printk() to
be used to read the boot mapped buffer's events. If a TP_printk()
points to a static string from within the kernel it will not match
the current kernel mapping if KASLR is active, and it can fault.
Have it simply print out the raw fields.
* tag 'trace-ringbuffer-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
trace/ring-buffer: Do not use TP_printk() formatting for boot mapped buffers
ring-buffer: Fix overflow in __rb_map_vma
We are seeing a sparse warning in gcs_restore_signal():
arch/arm64/kernel/signal.c:1054:9: sparse: sparse: cast removes address space '__user' of expression
when storing the final GCSPR_EL0 value back into the register, caused by
the fact that write_sysreg_s() casts the value it writes to a u64 which
sparse sees as discarding the __userness of the pointer.
Avoid this by treating the address as an integer, casting to a pointer only
when using it to write to userspace.
While we're at it also inline gcs_signal_cap_valid() into it's one user
and make equivalent updates to gcs_signal_entry().
Dave Airlie [Thu, 19 Dec 2024 21:13:44 +0000 (07:13 +1000)]
Merge tag 'drm-misc-fixes-2024-12-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.13-rc4:
- udma-buf fixes related to sealing.
- dma-buf build warning fix when debugfs is not enabled.
- Assorted drm/panel fixes.
- Correct error return in drm_dp_tunnel_mgr_create.
- Fix even more divide by zero in drm_mode_vrefresh.
- Fix FBDEV dependencies in Kconfig.
- Documentation fix for drm_sched_fini.
- IVPU NULL pointer, memory leak and WARN fix.
Pavel Begunkov [Thu, 19 Dec 2024 19:52:58 +0000 (19:52 +0000)]
io_uring: check if iowq is killed before queuing
task work can be executed after the task has gone through io_uring
termination, whether it's the final task_work run or the fallback path.
In this case, task work will find ->io_wq being already killed and
null'ed, which is a problem if it then tries to forward the request to
io_queue_iowq(). Make io_queue_iowq() fail requests in this case.
Note that it also checks PF_KTHREAD, because the user can first close
a DEFER_TASKRUN ring and shortly after kill the task, in which case
->iowq check would race.
* tag 'net-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
net: mctp: handle skb cleanup on sock_queue failures
net: mdiobus: fix an OF node reference leak
octeontx2-pf: fix error handling of devlink port in rvu_rep_create()
octeontx2-pf: fix netdev memory leak in rvu_rep_create()
psample: adjust size if rate_as_probability is set
netdev-genl: avoid empty messages in queue dump
net: dsa: restore dsa_software_vlan_untag() ability to operate on VLAN-untagged traffic
selftests: openvswitch: fix tcpdump execution
net: usb: qmi_wwan: add Quectel RG255C
net: phy: avoid undefined behavior in *_led_polarity_set()
netfilter: ipset: Fix for recursive locking warning
ipvs: Fix clamp() of ip_vs_conn_tab on small memory systems
can: m_can: fix missed interrupts with m_can_pci
can: m_can: set init flag earlier in probe
rtnetlink: Try the outer netns attribute in rtnl_get_peer_net().
net: netdevsim: fix nsim_pp_hold_write()
idpf: trigger SW interrupt when exiting wb_on_itr mode
idpf: add support for SW triggered interrupts
qed: fix possible uninit pointer read in qed_mcp_nvm_info_populate()
net: ethernet: bgmac-platform: fix an OF node reference leak
...
Linus Torvalds [Thu, 19 Dec 2024 16:53:51 +0000 (08:53 -0800)]
Merge tag 'mmc-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- mtk-sd: Cleanup the wakeup configuration in error/remove-path
- sdhci-tegra: Correct quirk for ADMA2 length
* tag 'mmc-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: mtk-sd: disable wakeup in .remove() and in the error path of .probe()
mmc: sdhci-tegra: Remove SDHCI_QUIRK_BROKEN_ADMA_ZEROLEN_DESC quirk
Linus Torvalds [Thu, 19 Dec 2024 16:50:05 +0000 (08:50 -0800)]
Merge tag 'pwm/for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fix from Uwe Kleine-König:
"Fix regression in pwm-stm32 driver when converting to new waveform
support
Fabrice Gasnier found and fixed a regression I introduced with
v6.13-rc1 when converting the stm32 pwm driver to support the new
waveform stuff. On some hardware variants this completely broke the
driver"
* tag 'pwm/for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: stm32: Fix complementary output in round_waveform_tohw()
Linus Torvalds [Thu, 19 Dec 2024 16:45:37 +0000 (08:45 -0800)]
Merge tag 'v6.13-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Two fixes for better handling maximum outstanding requests
- Fix simultaneous negotiate protocol race
* tag 'v6.13-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: conn lock to serialize smb2 negotiate
ksmbd: fix broken transfers when exceeding max simultaneous operations
ksmbd: count all requests in req_running counter
Jens Axboe [Thu, 19 Dec 2024 16:32:26 +0000 (09:32 -0700)]
io_uring/register: limit ring resizing to DEFER_TASKRUN
With DEFER_TASKRUN, we know the ring can't be both waited upon and
resized at the same time. This is important for CQ resizing. Allowing SQ
ring resizing is more trivial, but isn't the interesting use case. Hence
limit ring resizing in general to DEFER_TASKRUN only for now. This isn't
a huge problem as CQ ring resizing is generally the most useful on
networking type of workloads where it can be hard to size the ring
appropriately upfront, and those should be using DEFER_TASKRUN for
better performance.
Bagas Sanjaya [Tue, 17 Dec 2024 03:49:15 +0000 (10:49 +0700)]
drm/sched: Fix drm_sched_fini() docu generation
Commit baf4afc5831438 ("drm/sched: Improve teardown documentation")
added a list of drm_sched_fini()'s problems. The list triggers htmldocs
warning (but renders correctly in htmldocs output):
accel/ivpu: Fix WARN in ivpu_ipc_send_receive_internal()
Move pm_runtime_set_active() to ivpu_pm_init() so when
ivpu_ipc_send_receive_internal() is executed before ivpu_pm_enable()
it already has correct runtime state, even if last resume was
not successful.
Ahmad Fatoum [Wed, 18 Dec 2024 19:54:53 +0000 (20:54 +0100)]
regulator: rename regulator-uv-survival-time-ms according to DT binding
The regulator bindings don't document regulator-uv-survival-time-ms, but
the more descriptive regulator-uv-less-critical-window-ms instead.
Looking back at v3[1] and v4[2] of the series adding the support,
the property was indeed renamed between these patch series, but
unfortunately the rename only made it into the DT bindings with the
driver code still using the old name.
Let's therefore rename the property in the driver code to follow suit.
This will break backwards compatibility, but there are no upstream
device trees using the property and we never documented the old name
of the property anyway. ¯\_(ツ)_/¯"
Jeremy Kerr [Wed, 18 Dec 2024 03:53:01 +0000 (11:53 +0800)]
net: mctp: handle skb cleanup on sock_queue failures
Currently, we don't use the return value from sock_queue_rcv_skb, which
means we may leak skbs if a message is not successfully queued to a
socket.
Instead, ensure that we're freeing the skb where the sock hasn't
otherwise taken ownership of the skb by adding checks on the
sock_queue_rcv_skb() to invoke a kfree on failure.
In doing so, rather than using the 'rc' value to trigger the
kfree_skb(), use the skb pointer itself, which is more explicit.
Also, add a kunit test for the sock delivery failure cases.
Joe Hattori [Wed, 18 Dec 2024 03:51:06 +0000 (12:51 +0900)]
net: mdiobus: fix an OF node reference leak
fwnode_find_mii_timestamper() calls of_parse_phandle_with_fixed_args()
but does not decrement the refcount of the obtained OF node. Add an
of_node_put() call before returning from the function.
This bug was detected by an experimental static analysis tool that I am
developing.
Paolo Abeni [Thu, 19 Dec 2024 08:55:21 +0000 (09:55 +0100)]
Merge tag 'nf-24-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following series contains two fixes for Netfilter/IPVS:
1) Possible build failure in IPVS on systems with less than 512MB
memory due to incorrect use of clamp(), from David Laight.
2) Fix bogus lockdep nesting splat with ipset list:set type,
from Phil Sutter.
netfilter pull request 24-12-19
* tag 'nf-24-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: ipset: Fix for recursive locking warning
ipvs: Fix clamp() of ip_vs_conn_tab on small memory systems
====================
Jakub Kicinski [Wed, 18 Dec 2024 02:25:08 +0000 (18:25 -0800)]
netdev-genl: avoid empty messages in queue dump
Empty netlink responses from do() are not correct (as opposed to
dump() where not dumping anything is perfectly fine).
We should return an error if the target object does not exist,
in this case if the netdev is down it has no queues.
Vladimir Oltean [Mon, 16 Dec 2024 13:50:59 +0000 (15:50 +0200)]
net: dsa: restore dsa_software_vlan_untag() ability to operate on VLAN-untagged traffic
Robert Hodaszi reports that locally terminated traffic towards
VLAN-unaware bridge ports is broken with ocelot-8021q. He is describing
the same symptoms as for commit 1f9fc48fd302 ("net: dsa: sja1105: fix
reception from VLAN-unaware bridges").
was developed in a slightly different form earlier this year, in January.
Initially, the switch was unconditionally configured to set OCELOT_ES0_TAG
when using ocelot-8021q, regardless of port operating mode.
This led to the situation where VLAN-unaware bridge ports would always
push their PVID - see ocelot_vlan_unaware_pvid() - a negligible value
anyway - into RX packets. To strip this in software, we would have needed
DSA to know what private VID the switch chose for VLAN-unaware bridge
ports, and pushed into the packets. This was implemented downstream, and
a remnant of it remains in the form of a comment mentioning
ds->ops->get_private_vid(), as something which would maybe need to be
considered in the future.
However, for upstream, it was deemed inappropriate, because it would
mean introducing yet another behavior for stripping VLAN tags from
VLAN-unaware bridge ports, when one already existed (ds->untag_bridge_pvid).
The latter has been marked as obsolete along with an explanation why it
is logically broken, but still, it would have been confusing.
So, for upstream, felix_update_tag_8021q_rx_rule() was developed, which
essentially changed the state of affairs from "Felix with ocelot-8021q
delivers all packets as VLAN-tagged towards the CPU" into "Felix with
ocelot-8021q delivers all packets from VLAN-aware bridge ports towards
the CPU". This was done on the premise that in VLAN-unaware mode,
there's nothing useful in the VLAN tags, and we can avoid introducing
ds->ops->get_private_vid() in the DSA receive path if we configure the
switch to not push those VLAN tags into packets in the first place.
Unfortunately, and this is when the trainwreck started, the selftests
developed initially and posted with the series were not re-ran.
dsa_software_vlan_untag() was initially written given the assumption
that users of this feature would send _all_ traffic as VLAN-tagged.
It was only partially adapted to the new scheme, by removing
ds->ops->get_private_vid(), which also used to be necessary in
standalone ports mode.
Where the trainwreck became even worse is that I had a second opportunity
to think about this, when the dsa_software_vlan_untag() logic change
initially broke sja1105, in commit 1f9fc48fd302 ("net: dsa: sja1105: fix
reception from VLAN-unaware bridges"). I did not connect the dots that
it also breaks ocelot-8021q, for pretty much the same reason that not
all received packets will be VLAN-tagged.
To be compatible with the optimized Felix control path which runs
felix_update_tag_8021q_rx_rule() to only push VLAN tags when useful (in
VLAN-aware mode), we need to restore the old dsa_software_vlan_untag()
logic. The blamed commit introduced the assumption that
dsa_software_vlan_untag() will see only VLAN-tagged packets, assumption
which is false. What corrupts RX traffic is the fact that we call
skb_vlan_untag() on packets which are not VLAN-tagged in the first
place.
Jakub Kicinski [Thu, 19 Dec 2024 03:20:20 +0000 (19:20 -0800)]
Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
idpf: trigger SW interrupt when exiting wb_on_itr mode
Joshua Hay says:
This patch series introduces SW triggered interrupt support for idpf,
then uses said interrupt to fix a race condition between completion
writebacks and re-enabling interrupts.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
idpf: trigger SW interrupt when exiting wb_on_itr mode
idpf: add support for SW triggered interrupts
====================
Adrian Moreno [Tue, 17 Dec 2024 21:16:51 +0000 (22:16 +0100)]
selftests: openvswitch: fix tcpdump execution
Fix the way tcpdump is executed by:
- Using the right variable for the namespace. Currently the use of the
empty "ns" makes the command fail.
- Waiting until it starts to capture to ensure the interesting traffic
is caught on slow systems.
- Using line-buffered output to ensure logs are available when the test
is paused with "-p". Otherwise the last chunk of data might only be
written when tcpdump is killed.
Jakub Kicinski [Thu, 19 Dec 2024 01:51:38 +0000 (17:51 -0800)]
Merge tag 'linux-can-fixes-for-6.13-20241218' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2024-12-18
There are 2 patches by Matthias Schiffer for the m_can_pci driver that
handles the m_can cores found on the Intel Elkhart Lake processor.
They fix the initialization and the interrupt handling under high CAN
bus load.
* tag 'linux-can-fixes-for-6.13-20241218' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: m_can: fix missed interrupts with m_can_pci
can: m_can: set init flag earlier in probe
====================
Jann Horn [Wed, 18 Dec 2024 16:56:25 +0000 (17:56 +0100)]
io_uring: Fix registered ring file refcount leak
Currently, io_uring_unreg_ringfd() (which cleans up registered rings) is
only called on exit, but __io_uring_free (which frees the tctx in which the
registered ring pointers are stored) is also called on execve (via
begin_new_exec -> io_uring_task_cancel -> __io_uring_cancel ->
io_uring_cancel_generic -> __io_uring_free).
This means: A process going through execve while having registered rings
will leak references to the rings' `struct file`.
Fix it by zapping registered rings on execve(). This is implemented by
moving the io_uring_unreg_ringfd() from io_uring_files_cancel() into its
callee __io_uring_cancel(), which is called from io_uring_task_cancel() on
execve.
This could probably be exploited *on 32-bit kernels* by leaking 2^32
references to the same ring, because the file refcount is stored in a
pointer-sized field and get_file() doesn't have protection against
refcount overflow, just a WARN_ONCE(); but on 64-bit it should have no
impact beyond a memory leak.
Arnd Bergmann [Tue, 17 Dec 2024 08:10:34 +0000 (09:10 +0100)]
net: phy: avoid undefined behavior in *_led_polarity_set()
gcc runs into undefined behavior at the end of the three led_polarity_set()
callback functions if it were called with a zero 'modes' argument and it
just ends the function there without returning from it.
This gets flagged by 'objtool' as a function that continues on
to the next one:
drivers/net/phy/aquantia/aquantia_leds.o: warning: objtool: aqr_phy_led_polarity_set+0xf: can't find jump dest instruction at .text+0x5d9
drivers/net/phy/intel-xway.o: warning: objtool: xway_gphy_led_polarity_set() falls through to next function xway_gphy_config_init()
drivers/net/phy/mxl-gpy.o: warning: objtool: gpy_led_polarity_set() falls through to next function gpy_led_hw_control_get()
There is no point to micro-optimize the behavior here to save a single-digit
number of bytes in the kernel, so just change this to a "return -EINVAL"
as we do when any unexpected bits are set.
Fixes: 1758af47b98c ("net: phy: intel-xway: add support for PHY LEDs") Fixes: 9d55e68b19f2 ("net: phy: aquantia: correctly describe LED polarity override") Fixes: eb89c79c1b8f ("net: phy: mxl-gpy: correctly describe LED polarity") Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
Phil Sutter [Tue, 17 Dec 2024 19:56:55 +0000 (20:56 +0100)]
netfilter: ipset: Fix for recursive locking warning
With CONFIG_PROVE_LOCKING, when creating a set of type bitmap:ip, adding
it to a set of type list:set and populating it from iptables SET target
triggers a kernel warning:
| WARNING: possible recursive locking detected
| 6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted
| --------------------------------------------
| ping/4018 is trying to acquire lock:
| ffff8881094a6848 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set]
|
| but task is already holding lock:
| ffff88811034c048 (&set->lock){+.-.}-{2:2}, at: ip_set_add+0x28c/0x360 [ip_set]
This is a false alarm: ipset does not allow nested list:set type, so the
loop in list_set_kadd() can never encounter the outer set itself. No
other set type supports embedded sets, so this is the only case to
consider.
To avoid the false report, create a distinct lock class for list:set
type ipset locks.
Fixes: f830837f0eed ("netfilter: ipset: list:set set type support") Signed-off-by: Phil Sutter <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
David Laight [Sat, 14 Dec 2024 17:30:53 +0000 (17:30 +0000)]
ipvs: Fix clamp() of ip_vs_conn_tab on small memory systems
The 'max_avail' value is calculated from the system memory
size using order_base_2().
order_base_2(x) is defined as '(x) ? fn(x) : 0'.
The compiler generates two copies of the code that follows
and then expands clamp(max, min, PAGE_SHIFT - 12) (11 on 32bit).
This triggers a compile-time assert since min is 5.
In reality a system would have to have less than 512MB memory
for the bounds passed to clamp to be reversed.
Swap the order of the arguments to clamp() to avoid the warning.
Replace the clamp_val() on the line below with clamp().
clamp_val() is just 'an accident waiting to happen' and not needed here.
Detected by compile time checks added to clamp(), specifically:
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
Linus Torvalds [Wed, 18 Dec 2024 22:17:21 +0000 (14:17 -0800)]
Merge tag 'for-6.13-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- tree-checker catches invalid number of inline extent references
- zoned mode fixes:
- enhance zone append IO command so it also detects emulated writes
- handle bio splitting at sectorsize boundary
- when deleting a snapshot, fix a condition for visiting nodes in reloc
trees
* tag 'for-6.13-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: tree-checker: reject inline extent items with 0 ref count
btrfs: split bios to the fs sector size boundary
btrfs: use bio_is_zone_append() in the completion handler
btrfs: fix improper generation check in snapshot delete
Linus Torvalds [Wed, 18 Dec 2024 20:52:57 +0000 (12:52 -0800)]
Merge tag 'cxl-fixes-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Ira Weiny:
- prevent probe failure when non-critical RAS unmasking fails
- fix CXL 1.1 link status sysfs attribute
- fix 4 way (and greater) switch interleave region creation
* tag 'cxl-fixes-6.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Fix region creation for greater than x2 switches
cxl/pci: Check dport->regs.rcd_pcie_cap availability before accessing
cxl/pci: Fix potential bogus return value upon successful probing
Steven Rostedt [Wed, 18 Dec 2024 19:15:07 +0000 (14:15 -0500)]
trace/ring-buffer: Do not use TP_printk() formatting for boot mapped buffers
The TP_printk() of a TRACE_EVENT() is a generic printf format that any
developer can create for their event. It may include pointers to strings
and such. A boot mapped buffer may contain data from a previous kernel
where the strings addresses are different.
One solution is to copy the event content and update the pointers by the
recorded delta, but a simpler solution (for now) is to just use the
print_fields() function to print these events. The print_fields() function
just iterates the fields and prints them according to what type they are,
and ignores the TP_printk() format from the event itself.
To understand the difference, when printing via TP_printk() the output
looks like this:
Add a check before the calculation to avoid this problem.
syzbot reported this as a slab-out-of-bounds in __rb_map_vma:
BUG: KASAN: slab-out-of-bounds in __rb_map_vma+0x9ab/0xae0 kernel/trace/ring_buffer.c:7058
Read of size 8 at addr ffff8880767dd2b8 by task syz-executor187/5836
Linus Torvalds [Wed, 18 Dec 2024 18:03:33 +0000 (10:03 -0800)]
Merge tag 'trace-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"Replace trace_check_vprintf() with test_event_printk() and
ignore_event()
The function test_event_printk() checks on boot up if the trace event
printf() formats dereference any pointers, and if they do, it then
looks at the arguments to make sure that the pointers they dereference
will exist in the event on the ring buffer. If they do not, it issues
a WARN_ON() as it is a likely bug.
But this isn't the case for the strings that can be dereferenced with
"%s", as some trace events (notably RCU and some IPI events) save a
pointer to a static string in the ring buffer. As the string it points
to lives as long as the kernel is running, it is not a bug to
reference it, as it is guaranteed to be there when the event is read.
But it is also possible (and a common bug) to point to some allocated
string that could be freed before the trace event is read and the
dereference is to bad memory. This case requires a run time check.
The previous way to handle this was with trace_check_vprintf() that
would process the printf format piece by piece and send what it didn't
care about to vsnprintf() to handle arguments that were not strings.
This kept it from having to reimplement vsnprintf(). But it relied on
va_list implementation and for architectures that copied the va_list
and did not pass it by reference, it wasn't even possible to do this
check and it would be skipped. As 64bit x86 passed va_list by
reference, most events were tested and this kept out bugs where
strings would have been dereferenced after being freed.
Instead of relying on the implementation of va_list, extend the boot
up test_event_printk() function to validate all the "%s" strings that
can be validated at boot, and for the few events that point to strings
outside the ring buffer, flag both the event and the field that is
dereferenced as "needs_test". Then before the event is printed, a call
to ignore_event() is made, and if the event has the flag set, it
iterates all its fields and for every field that is to be tested, it
will read the pointer directly from the event in the ring buffer and
make sure that it is valid. If the pointer is not valid, it will print
a WARN_ON(), print out to the trace that the event has unsafe memory
and ignore the print format.
With this new update, the trace_check_vprintf() can be safely removed
and now all events can be verified regardless of architecture"
* tag 'trace-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Check "%s" dereference via the field and not the TP_printk format
tracing: Add "%s" check in test_event_printk()
tracing: Add missing helper functions in event pointer dereference check
tracing: Fix test_event_printk() to process entire print argument
Since 2320c9e6a768 ("drm/sched: memset() 'job' in drm_sched_job_init()")
accessing job->base.sched can produce unexpected results as the initialisation
of (*job)->base.sched done in amdgpu_job_alloc is overwritten by the
memset.
This commit fixes an issue when a CS would fail validation and would
be rejected after job->num_ibs is incremented. In this case,
amdgpu_ib_free(ring->adev, ...) will be called, which would crash the
machine because the ring value is bogus.
To fix this, pass a NULL pointer to amdgpu_ib_free(): we can do this
because the device is actually not used in this function.
The next commit will remove the ring argument completely.
If the kernel hasn't been compiled with PCIe hotplug support this
can lead to problems with dGPUs that use BOCO because they effectively
drop off the bus.
To prevent issues, disable BOCO support when compiled without PCIe hotplug.
Linus Torvalds [Wed, 18 Dec 2024 17:55:55 +0000 (09:55 -0800)]
Merge tag 'hyperv-fixes-signed-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Various fixes to Hyper-V tools in the kernel tree (Dexuan Cui, Olaf
Hering, Vitaly Kuznetsov)
- Fix a bug in the Hyper-V TSC page based sched_clock() (Naman Jain)
- Two bug fixes in the Hyper-V utility functions (Michael Kelley)
- Convert open-coded timeouts to secs_to_jiffies() in Hyper-V drivers
(Easwar Hariharan)
* tag 'hyperv-fixes-signed-20241217' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
tools/hv: reduce resource usage in hv_kvp_daemon
tools/hv: add a .gitignore file
tools/hv: reduce resouce usage in hv_get_dns_info helper
hv/hv_kvp_daemon: Pass NIC name to hv_get_dns_info as well
Drivers: hv: util: Avoid accessing a ringbuffer not initialized yet
Drivers: hv: util: Don't force error code to ENODEV in util_probe()
tools/hv: terminate fcopy daemon if read from uio fails
drivers: hv: Convert open-coded timeouts to secs_to_jiffies()
tools: hv: change permissions of NetworkManager configuration file
x86/hyperv: Fix hv tsc page based sched_clock for hibernation
tools: hv: Fix a complier warning in the fcopy uio daemon
Juergen Gross [Wed, 18 Dec 2024 08:02:28 +0000 (09:02 +0100)]
x86/static-call: fix 32-bit build
In 32-bit x86 builds CONFIG_STATIC_CALL_INLINE isn't set, leading to
static_call_initialized not being available.
Define it as "0" in that case.
Reported-by: Stephen Rothwell <[email protected]> Fixes: 0ef8047b737d ("x86/static-call: provide a way to do very early static-call updates") Signed-off-by: Juergen Gross <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
Commit be26ba96421a ("block: Fix potential deadlock while freezing queue and
acquiring sysfs_loc") actually reverts commit 22465bbac53c ("blk-mq: move cpuhp
callback registering out of q->sysfs_lock"), and causes the original resctrl
lockdep warning.
So revert it and we need to fix the issue in another way.
Fabrice Gasnier [Tue, 17 Dec 2024 15:00:21 +0000 (16:00 +0100)]
pwm: stm32: Fix complementary output in round_waveform_tohw()
When the timer supports complementary output, the CCxNE bit must be set
additionally to the CCxE bit. So to not overwrite the latter use |=
instead of = to set the former.
The interrupt line of PCI devices is interpreted as edge-triggered,
however the interrupt signal of the m_can controller integrated in Intel
Elkhart Lake CPUs appears to be generated level-triggered.
Consider the following sequence of events:
- IR register is read, interrupt X is set
- A new interrupt Y is triggered in the m_can controller
- IR register is written to acknowledge interrupt X. Y remains set in IR
As at no point in this sequence no interrupt flag is set in IR, the
m_can interrupt line will never become deasserted, and no edge will ever
be observed to trigger another run of the ISR. This was observed to
result in the TX queue of the EHL m_can to get stuck under high load,
because frames were queued to the hardware in m_can_start_xmit(), but
m_can_finish_tx() was never run to account for their successful
transmission.
On an Elkhart Lake based board with the two CAN interfaces connected to
each other, the following script can reproduce the issue:
ip link set can0 up type can bitrate 1000000
ip link set can1 up type can bitrate 1000000
To fix the issue, repeatedly read and acknowledge interrupts at the
start of the ISR until no interrupt flags are set, so the next incoming
interrupt will also result in an edge on the interrupt line.
While we have received a report that even with this patch, the TX queue
can become stuck under certain (currently unknown) circumstances on the
Elkhart Lake, this patch completely fixes the issue with the above
reproducer, and it is unclear whether the remaining issue has a similar
cause at all.
While an m_can controller usually already has the init flag from a
hardware reset, no such reset happens on the integrated m_can_pci of the
Intel Elkhart Lake. If the CAN controller is found in an active state,
m_can_dev_setup() would fail because m_can_niso_supported() calls
m_can_cccr_update_bits(), which refuses to modify any other configuration
bits when CCCR_INIT is not set.
To avoid this issue, set CCCR_INIT before attempting to modify any other
configuration flags.
rtnetlink: Try the outer netns attribute in rtnl_get_peer_net().
Xiao Liang reported that the cited commit changed netns handling
in newlink() of netkit, veth, and vxcan.
Before the patch, if we don't find a netns attribute in the peer
device attributes, we tried to find another netns attribute in
the outer netlink attributes by passing it to rtnl_link_get_net().
hexagon: Disable constant extender optimization for LLVM prior to 19.1.0
The Hexagon-specific constant extender optimization in LLVM may crash on
Linux kernel code [1], such as fs/bcache/btree_io.c after
commit 32ed4a620c54 ("bcachefs: Btree path tracepoints") in 6.12:
clang: llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp:745: bool (anonymous namespace)::HexagonConstExtenders::ExtRoot::operator<(const HCE::ExtRoot &) const: Assertion `ThisB->getParent() == OtherB->getParent()' failed.
Stack dump:
0. Program arguments: clang --target=hexagon-linux-musl ... fs/bcachefs/btree_io.c
1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module 'fs/bcachefs/btree_io.c'.
4. Running pass 'Hexagon constant-extender optimization' on function '@__btree_node_lock_nopath'
Without assertions enabled, there is just a hang during compilation.
This has been resolved in LLVM main (20.0.0) [2] and backported to LLVM
19.1.0 but the kernel supports LLVM 13.0.1 and newer, so disable the
constant expander optimization using the '-mllvm' option when using a
toolchain that is not fixed.
Joshua Hay [Mon, 25 Nov 2024 23:58:55 +0000 (15:58 -0800)]
idpf: trigger SW interrupt when exiting wb_on_itr mode
There is a race condition between exiting wb_on_itr and completion write
backs. For example, we are in wb_on_itr mode and a Tx completion is
generated by HW, ready to be written back, as we are re-enabling
interrupts:
That tx_completion_wb happens before the vector is fully re-enabled.
Continuing with this example, it is a UDP stream and the
tx_completion_wb is the last one in the flow (there are no rx packets).
Because the HW generated the completion before the interrupt is fully
enabled, the HW will not fire the interrupt once the timer expires and
the write back will not happen. NAPI poll won't be called. We have
indicated we're back in interrupt mode but nothing else will trigger the
interrupt. Therefore, the completion goes unprocessed, triggering a Tx
timeout.
To mitigate this, fire a SW triggered interrupt upon exiting wb_on_itr.
This interrupt will catch the rogue completion and avoid the timeout.
Add logic to set the appropriate bits in the vector's dyn_ctl register.
Joshua Hay [Mon, 25 Nov 2024 23:58:54 +0000 (15:58 -0800)]
idpf: add support for SW triggered interrupts
SW triggered interrupts are guaranteed to fire after their timer
expires, unlike Tx and Rx interrupts which will only fire after the
timer expires _and_ a descriptor write back is available to be processed
by the driver.
Add the necessary fields, defines, and initializations to enable a SW
triggered interrupt in the vector's dyn_ctl register.
Qu Wenruo [Wed, 4 Dec 2024 03:00:46 +0000 (13:30 +1030)]
btrfs: tree-checker: reject inline extent items with 0 ref count
[BUG]
There is a bug report in the mailing list where btrfs_run_delayed_refs()
failed to drop the ref count for logical 25870311358464 num_bytes 2113536.
[CAUSE]
There is no concrete evidence yet, but considering 0 -> 1 is also a
single bit flipped, it's possible that hardware memory bitflip is
involved, causing the on-disk extent tree to be corrupted.
[FIX]
To prevent us reading such corrupted extent item, or writing such
damaged extent item back to disk, enhance the handling of
BTRFS_EXTENT_DATA_REF_KEY and BTRFS_SHARED_DATA_REF_KEY keys for both
inlined and key items, to detect such 0 ref count and reject them.
Btrfs like other file systems can't really deal with I/O not aligned to
it's internal block size (which strangely is called sector size in
btrfs, for historical reasons), but the block layer split helper doesn't
even know about that.
Round down the split boundary so that all I/Os are aligned.
btrfs: use bio_is_zone_append() in the completion handler
Otherwise it won't catch bios turned into regular writes by the block
level zone write plugging. The additional test it adds is for emulated
zone append.
Josef Bacik [Wed, 13 Nov 2024 16:11:55 +0000 (11:11 -0500)]
btrfs: fix improper generation check in snapshot delete
We have been using the following check
if (generation <= root->root_key.offset)
to make decisions about whether or not to visit a node during snapshot
delete. This is because for normal subvolumes this is set to 0, and for
snapshots it's set to the creation generation. The idea being that if
the generation of the node is less than or equal to our creation
generation then we don't need to visit that node, because it doesn't
belong to us, we can simply drop our reference and move on.
However reloc roots don't have their generation stored in
root->root_key.offset, instead that is the objectid of their
corresponding fs root. This means we can incorrectly not walk into
nodes that need to be dropped when deleting a reloc root.
There are a variety of consequences to making the wrong choice in two
distinct areas.
visit_node_for_delete()
1. False positive. We think we are newer than the block when we really
aren't. We don't visit the node and drop our reference to the node
and carry on. This would result in leaked space.
2. False negative. We do decide to walk down into a block that we
should have just dropped our reference to. However this means that
the child node will have refs > 1, so we will switch to
UPDATE_BACKREF, and then the subsequent walk_down_proc() will notice
that btrfs_header_owner(node) != root->root_key.objectid and it'll
break out of the loop, and then walk_up_proc() will drop our reference,
so this appears to be ok.
do_walk_down()
1. False positive. We are in UPDATE_BACKREF and incorrectly decide that
we are done and don't need to update the backref for our lower nodes.
This is another case that simply won't happen with relocation, as we
only have to do UPDATE_BACKREF if the node below us was shared and
didn't have FULL_BACKREF set, and since we don't own that node
because we're a reloc root we actually won't end up in this case.
2. False negative. Again this is tricky because as described above, we
simply wouldn't be here from relocation, because we don't own any of
the nodes because we never set btrfs_header_owner() to the reloc root
objectid, and we always use FULL_BACKREF, we never actually need to
set FULL_BACKREF on any children.
Having spent a lot of time stressing relocation/snapshot delete recently
I've not seen this pop in practice. But this is objectively incorrect,
so fix this to get the correct starting generation based on the root
we're dropping to keep me from thinking there's a problem here.
Arnd Bergmann [Mon, 16 Dec 2024 07:42:49 +0000 (08:42 +0100)]
drm: rework FB_CORE dependency
The 'select FB_CORE' statement moved from CONFIG_DRM to DRM_CLIENT_LIB,
but there are now configurations that have code calling into fb_core
as built-in even though the client_lib itself is a loadable module:
x86_64-linux-ld: drivers/gpu/drm/drm_fb_helper.o: in function `drm_fb_helper_set_suspend':
drm_fb_helper.c:(.text+0x2c6): undefined reference to `fb_set_suspend'
x86_64-linux-ld: drivers/gpu/drm/drm_fb_helper.o: in function `drm_fb_helper_resume_worker':
drm_fb_helper.c:(.text+0x2e1): undefined reference to `fb_set_suspend'
In addition to DRM_CLIENT_LIB, the 'select' needs to be at least in
DRM_KMS_HELPER and DRM_GEM_SHMEM_HELPER, so add it here.
Linus Torvalds [Tue, 17 Dec 2024 17:14:31 +0000 (09:14 -0800)]
Merge tag 'ftrace-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ftrace fixes from Steven Rostedt:
- Always try to initialize the idle functions when graph tracer starts
A bug was found that when a CPU is offline when graph tracing starts
and then comes online, that CPU is not traced. The fix to that was to
move the initialization of the idle shadow stack over to the hot plug
online logic, which also handle onlined CPUs. The issue was that it
removed the initialization of the shadow stack when graph tracing
starts, but the callbacks to the hot plug logic do nothing if graph
tracing isn't currently running. Although that fix fixed the onlining
of a CPU during tracing, it broke the CPUs that were already online.
- Have microblaze not try to get the "true parent" in function tracing
If function tracing and graph tracing are both enabled at the same
time the parent of the functions traced by the function tracer may
sometimes be the graph tracing trampoline. The graph tracing hijacks
the return pointer of the function to trace it, but that can
interfere with the function tracing parent output.
This was fixed by using the ftrace_graph_ret_addr() function passing
in the kernel stack pointer using the ftrace_regs_get_stack_pointer()
function. But Al Viro reported that Microblaze does not implement the
kernel_stack_pointer(regs) helper function that
ftrace_regs_get_stack_pointer() uses and fails to compile when
function graph tracing is enabled.
It was first thought that this was a microblaze issue, but the real
cause is that this only works when an architecture implements
HAVE_DYNAMIC_FTRACE_WITH_ARGS, as a requirement for that config is to
have ftrace always pass a valid ftrace_regs to the callbacks. That
also means that the architecture supports
ftrace_regs_get_stack_pointer()
Microblaze does not set HAVE_DYNAMIC_FTRACE_WITH_ARGS nor does it
implement ftrace_regs_get_stack_pointer() which caused it to fail to
build. Only implement the "true parent" logic if an architecture has
that config set"
* tag 'ftrace-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Do not find "true_parent" if HAVE_DYNAMIC_FTRACE_WITH_ARGS is not set
fgraph: Still initialize idle shadow stacks when starting
Linus Torvalds [Tue, 17 Dec 2024 17:09:32 +0000 (09:09 -0800)]
Merge tag 's390-6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Fix DirectMap accounting in /proc/meminfo file
- Fix strscpy() return code handling that led to "unsigned 'len' is
never less than zero" warning
- Fix the calculation determining whether to use three- or four-level
paging: account KMSAN modules metadata
* tag 's390-6.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: Consider KMSAN modules metadata for paging levels
s390/ipl: Fix never less than zero warning
s390/mm: Fix DirectMap accounting
Do not select BACKLIGHT_CLASS_DEVICE from FB_BACKLIGHT. The latter
only controls backlight support within fbdev core code and data
structures.
Make fbdev drivers depend on BACKLIGHT_CLASS_DEVICE and let users
select it explicitly. Fixes warnings about recursive dependencies,
such as
error: recursive dependency detected!
symbol BACKLIGHT_CLASS_DEVICE is selected by FB_BACKLIGHT
symbol FB_BACKLIGHT is selected by FB_SH_MOBILE_LCDC
symbol FB_SH_MOBILE_LCDC depends on FB_DEVICE
symbol FB_DEVICE depends on FB_CORE
symbol FB_CORE is selected by DRM_GEM_DMA_HELPER
symbol DRM_GEM_DMA_HELPER is selected by DRM_PANEL_ILITEK_ILI9341
symbol DRM_PANEL_ILITEK_ILI9341 depends on BACKLIGHT_CLASS_DEVICE
BACKLIGHT_CLASS_DEVICE is user-selectable, so making drivers adapt to
it is the correct approach in any case. For most drivers, backlight
support is also configurable separately.
Linus Torvalds [Tue, 17 Dec 2024 17:04:42 +0000 (09:04 -0800)]
Merge tag 'erofs-for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"The first one fixes a syzbot UAF report caused by a commit introduced
in this cycle, but it also addresses a longstanding memory leak. The
second one resolves a PSI memstall mis-accounting issue.
The remaining patches switch file-backed mounts to use buffered I/Os
by default instead of direct I/Os, since the page cache of underlay
files is typically valid and maybe even dirty. This change also aligns
with the default policy of loopback devices. A mount option has been
added to try to use direct I/Os explicitly.
Summary:
- Fix (pcluster) memory leak and (sbi) UAF after umounting
- Fix a case of PSI memstall mis-accounting
- Use buffered I/Os by default for file-backed mounts"
* tag 'erofs-for-6.13-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: use buffered I/O for file-backed mounts by default
erofs: reference `struct erofs_device_info` for erofs_map_dev
erofs: use `struct erofs_device_info` for the primary device
erofs: add erofs_sb_free() helper
MAINTAINERS: erofs: update Yue Hu's email address
erofs: fix PSI memstall accounting
erofs: fix rare pcluster memory leak after unmounting
Steven Rostedt [Tue, 17 Dec 2024 02:41:22 +0000 (21:41 -0500)]
tracing: Check "%s" dereference via the field and not the TP_printk format
The TP_printk() portion of a trace event is executed at the time a event
is read from the trace. This can happen seconds, minutes, hours, days,
months, years possibly later since the event was recorded. If the print
format contains a dereference to a string via "%s", and that string was
allocated, there's a chance that string could be freed before it is read
by the trace file.
To protect against such bugs, there are two functions that verify the
event. The first one is test_event_printk(), which is called when the
event is created. It reads the TP_printk() format as well as its arguments
to make sure nothing may be dereferencing a pointer that was not copied
into the ring buffer along with the event. If it is, it will trigger a
WARN_ON().
For strings that use "%s", it is not so easy. The string may not reside in
the ring buffer but may still be valid. Strings that are static and part
of the kernel proper which will not be freed for the life of the running
system, are safe to dereference. But to know if it is a pointer to a
static string or to something on the heap can not be determined until the
event is triggered.
This brings us to the second function that tests for the bad dereferencing
of strings, trace_check_vprintf(). It would walk through the printf format
looking for "%s", and when it finds it, it would validate that the pointer
is safe to read. If not, it would produces a WARN_ON() as well and write
into the ring buffer "[UNSAFE-MEMORY]".
The problem with this is how it used va_list to have vsnprintf() handle
all the cases that it didn't need to check. Instead of re-implementing
vsnprintf(), it would make a copy of the format up to the %s part, and
call vsnprintf() with the current va_list ap variable, where the ap would
then be ready to point at the string in question.
For architectures that passed va_list by reference this was possible. For
architectures that passed it by copy it was not. A test_can_verify()
function was used to differentiate between the two, and if it wasn't
possible, it would disable it.
Even for architectures where this was feasible, it was a stretch to rely
on such a method that is undocumented, and could cause issues later on
with new optimizations of the compiler.
Instead, the first function test_event_printk() was updated to look at
"%s" as well. If the "%s" argument is a pointer outside the event in the
ring buffer, it would find the field type of the event that is the problem
and mark the structure with a new flag called "needs_test". The event
itself will be marked by TRACE_EVENT_FL_TEST_STR to let it be known that
this event has a field that needs to be verified before the event can be
printed using the printf format.
When the event fields are created from the field type structure, the
fields would copy the field type's "needs_test" value.
Finally, before being printed, a new function ignore_event() is called
which will check if the event has the TEST_STR flag set (if not, it
returns false). If the flag is set, it then iterates through the events
fields looking for the ones that have the "needs_test" flag set.
Then it uses the offset field from the field structure to find the pointer
in the ring buffer event. It runs the tests to make sure that pointer is
safe to print and if not, it triggers the WARN_ON() and also adds to the
trace output that the event in question has an unsafe memory access.
The ignore_event() makes the trace_check_vprintf() obsolete so it is
removed.
Steven Rostedt [Tue, 17 Dec 2024 02:41:21 +0000 (21:41 -0500)]
tracing: Add "%s" check in test_event_printk()
The test_event_printk() code makes sure that when a trace event is
registered, any dereferenced pointers in from the event's TP_printk() are
pointing to content in the ring buffer. But currently it does not handle
"%s", as there's cases where the string pointer saved in the ring buffer
points to a static string in the kernel that will never be freed. As that
is a valid case, the pointer needs to be checked at runtime.
Currently the runtime check is done via trace_check_vprintf(), but to not
have to replicate everything in vsnprintf() it does some logic with the
va_list that may not be reliable across architectures. In order to get rid
of that logic, more work in the test_event_printk() needs to be done. Some
of the strings can be validated at this time when it is obvious the string
is valid because the string will be saved in the ring buffer content.
Do all the validation of strings in the ring buffer at boot in
test_event_printk(), and make sure that the field of the strings that
point into the kernel are accessible. This will allow adding checks at
runtime that will validate the fields themselves and not rely on paring
the TP_printk() format at runtime.
Steven Rostedt [Tue, 17 Dec 2024 02:41:20 +0000 (21:41 -0500)]
tracing: Add missing helper functions in event pointer dereference check
The process_pointer() helper function looks to see if various trace event
macros are used. These macros are for storing data in the event. This
makes it safe to dereference as the dereference will then point into the
event on the ring buffer where the content of the data stays with the
event itself.
Steven Rostedt [Tue, 17 Dec 2024 02:41:19 +0000 (21:41 -0500)]
tracing: Fix test_event_printk() to process entire print argument
The test_event_printk() analyzes print formats of trace events looking for
cases where it may dereference a pointer that is not in the ring buffer
which can possibly be a bug when the trace event is read from the ring
buffer and the content of that pointer no longer exists.
The function needs to accurately go from one print format argument to the
next. It handles quotes and parenthesis that may be included in an
argument. When it finds the start of the next argument, it uses a simple
"c = strstr(fmt + i, ',')" to find the end of that argument!
In order to include "%s" dereferencing, it needs to process the entire
content of the print format argument and not just the content of the first
',' it finds. As there may be content like:
Which is an example of a full argument of an existing event. As the code
already handles finding the next print format argument, process the
argument at the end of it and not the start of it. This way it has both
the start of the argument as well as the end of it.
Add a helper function "process_pointer()" that will do the processing during
the loop as well as at the end. It also makes the code cleaner and easier
to read.
Linus Torvalds [Tue, 17 Dec 2024 16:29:58 +0000 (08:29 -0800)]
Merge tag 'xsa465+xsa466-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Fix xen netfront crash (XSA-465) and avoid using the hypercall page
that doesn't do speculation mitigations (XSA-466)"
* tag 'xsa465+xsa466-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: remove hypercall page
x86/xen: use new hypercall functions instead of hypercall page
x86/xen: add central hypercall functions
x86/xen: don't do PV iret hypercall through hypercall page
x86/static-call: provide a way to do very early static-call updates
objtool/x86: allow syscall instruction
x86: make get_cpu_vendor() accessible from Xen code
xen/netfront: fix crash when removing device
which is because io_uring has ctx->timeout_lock nesting inside the
io-wq acct lock, the latter of which is used from inside the scheduler
and hence is a raw spinlock, while the former is a "normal" spinlock
and can hence be sleeping on PREEMPT_RT.
Change ctx->timeout_lock to be a raw spinlock to solve this nesting
dependency on PREEMPT_RT=y.
Gianfranco Trad [Sun, 15 Dec 2024 01:17:34 +0000 (02:17 +0100)]
qed: fix possible uninit pointer read in qed_mcp_nvm_info_populate()
Coverity reports an uninit pointer read in qed_mcp_nvm_info_populate().
If EOPNOTSUPP is returned from qed_mcp_bist_nvm_get_num_images() ensure
nvm_info.num_images is set to 0 to avoid possible uninit assignment
to p_hwfn->nvm_info.image_att later on in out label.
====================
Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib
This patch series contain the below fixes.
- Infinite loop error when tx credits becomes 0.
- Race condition between tx skb reference pointers.
v2:
- Added mutex lock to protect tx skb reference handling.
v3:
- Added mutex protection in assigning new tx skb to waiting_tx_skb
pointer.
- Explained the possible scenario for the race condition with the time
diagram in the commit message.
v4:
- Replaced mutex with spin_lock_bh() variants as the start_xmit runs in
BH/softirq context which can't take sleeping locks.
====================
There are two skb pointers to manage tx skb's enqueued from n/w stack.
waiting_tx_skb pointer points to the tx skb which needs to be processed
and ongoing_tx_skb pointer points to the tx skb which is being processed.
SPI thread prepares the tx data chunks from the tx skb pointed by the
ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is
processed, the tx skb pointed by the waiting_tx_skb is assigned to
ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL.
Whenever there is a new tx skb from n/w stack, it will be assigned to
waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb
handled in two different threads.
Consider a scenario where the SPI thread processed an ongoing_tx_skb and
it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer
without doing any NULL check. At this time, if the waiting_tx_skb pointer
is NULL then ongoing_tx_skb pointer is also assigned with NULL. After
that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w
stack and there is a chance to overwrite the tx skb pointer with NULL in
the SPI thread. Finally one of the tx skb will be left as unhandled,
resulting packet missing and memory leak.
- Consider the below scenario where the TXC reported from the previous
transfer is 10 and ongoing_tx_skb holds an tx ethernet frame which can be
transported in 20 TXCs and waiting_tx_skb is still NULL.
tx_credits = 10; /* 21 are filled in the previous transfer */
ongoing_tx_skb = 20;
waiting_tx_skb = NULL; /* Still NULL */
- So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true.
- After oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
ongoing_tx_skb = 10;
waiting_tx_skb = NULL; /* Still NULL */
- Perform SPI transfer.
- Process SPI rx buffer to get the TXC from footers.
- Now let's assume previously filled 21 TXCs are freed so we are good to
transport the next remaining 10 tx chunks from ongoing_tx_skb.
tx_credits = 21;
ongoing_tx_skb = 10;
waiting_tx_skb = NULL;
- So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true again.
- In the oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
ongoing_tx_skb = NULL;
waiting_tx_skb = NULL;
- Now the below bad case might happen,
Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler)
--------------------------- -----------------------------------
- if waiting_tx_skb is NULL
- if ongoing_tx_skb is NULL
- ongoing_tx_skb = waiting_tx_skb
- waiting_tx_skb = skb
- waiting_tx_skb = NULL
...
- ongoing_tx_skb = NULL
- if waiting_tx_skb is NULL
- waiting_tx_skb = skb
To overcome the above issue, protect the moving of tx skb reference from
waiting_tx_skb pointer to ongoing_tx_skb pointer and assigning new tx skb
to waiting_tx_skb pointer, so that the other thread can't access the
waiting_tx_skb pointer until the current thread completes moving the tx
skb reference safely.
Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames") Signed-off-by: Parthiban Veerasooran <[email protected]> Reviewed-by: Larysa Zaremba <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
SPI thread wakes up to perform SPI transfer whenever there is an TX skb
from n/w stack or interrupt from MAC-PHY. Ethernet frame from TX skb is
transferred based on the availability tx credits in the MAC-PHY which is
reported from the previous SPI transfer. Sometimes there is a possibility
that TX skb is available to transmit but there is no tx credits from
MAC-PHY. In this case, there will not be any SPI transfer but the thread
will be running in an endless loop until tx credits available again.
So checking the availability of tx credits along with TX skb will prevent
the above infinite loop. When the tx credits available again that will be
notified through interrupt which will trigger the SPI transfer to get the
available tx credits.
Fixes: 53fbde8ab21e ("net: ethernet: oa_tc6: implement transmit path to transfer tx ethernet frames") Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Parthiban Veerasooran <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
Niklas Neronin [Tue, 17 Dec 2024 10:21:22 +0000 (12:21 +0200)]
usb: xhci: fix ring expansion regression in 6.13-rc1
The source and destination rings were incorrectly assigned during the ring
linking process. The "source" ring, which contains the new segments,
was not spliced into the "destination" ring, leading to incorrect ring
expansion.
Fixes: fe688e500613 ("usb: xhci: refactor xhci_link_rings() to use source and destination rings") Reported-by: Jeff Chua <[email protected]> Closes: https://lore.kernel.org/lkml/CAAJw_ZtppNqC9XA=-WVQDr+vaAS=di7jo15CzSqONeX48H75MA@mail.gmail.com/ Signed-off-by: Niklas Neronin <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Mathias Nyman [Tue, 17 Dec 2024 10:21:21 +0000 (12:21 +0200)]
xhci: Turn NEC specific quirk for handling Stop Endpoint errors generic
xHC hosts from several vendors have the same issue where endpoints start
so slowly that a later queued 'Stop Endpoint' command may complete before
endpoint is up and running.
The 'Stop Endpoint' command fails with context state error as the endpoint
still appears as stopped.
See commit 42b758137601 ("usb: xhci: Limit Stop Endpoint retries") for
details
On gt reset, if a context is running, then accumulate it's active time
into the busyness counter since there will be no chance for the context
to switch out and update it's run time.