Steven Rostedt [Wed, 7 Aug 2024 22:54:02 +0000 (18:54 -0400)]
tracefs: Use generic inode RCU for synchronizing freeing
With structure layout randomization enabled for 'struct inode' we need to
avoid overlapping any of the RCU-used / initialized-only-once members,
e.g. i_lru or i_sb_list to not corrupt related list traversals when making
use of the rcu_head.
For an unlucky structure layout of 'struct inode' we may end up with the
following splat when running the ftrace selftests:
The list debug message as well as RBX's symbolic value point out that the
object in question was allocated from 'tracefs_inode_cache' and that the
list's '->next' member is at offset 0. Dumping the layout of the relevant
parts of 'struct tracefs_inode' gives the following:
Above shows that 'vfs_inode.i_lru' overlaps with 'rcu' which will
destroy the 'i_lru' list as soon as the 'rcu' member gets used, e.g. in
call_rcu() or later when calling the RCU callback. This will disturb
concurrent list traversals as well as object reuse which assumes these
list heads will keep their integrity.
For reproduction, the following diff manually overlays 'i_lru' with
'rcu' as, otherwise, one would require some good portion of luck for
gambling an unlucky RANDSTRUCT seed:
@@ -690,7 +691,6 @@ struct inode {
u16 i_wb_frn_avg_time;
u16 i_wb_frn_history;
#endif
- struct list_head i_lru; /* inode LRU list */
struct list_head i_sb_list;
struct list_head i_wb_list; /* backing dev writeback list */
union {
The tracefs inode does not need to supply its own RCU delayed destruction
of its inode. The inode code itself offers both a "destroy_inode()"
callback that gets called when the last reference of the inode is
released, and the "free_inode()" which is called after a RCU
synchronization period from the "destroy_inode()".
The tracefs code can unlink the inode from its list in the destroy_inode()
callback, and the simply free it from the free_inode() callback. This
should provide the same protection.
Jianhui Zhou [Mon, 5 Aug 2024 11:36:31 +0000 (19:36 +0800)]
ring-buffer: Remove unused function ring_buffer_nr_pages()
Because ring_buffer_nr_pages() is not an inline function and user accesses
buffer->buffers[cpu]->nr_pages directly, the function ring_buffer_nr_pages
is removed.
Tze-nan Wu [Mon, 5 Aug 2024 05:59:22 +0000 (13:59 +0800)]
tracing: Fix overflow in get_free_elt()
"tracing_map->next_elt" in get_free_elt() is at risk of overflowing.
Once it overflows, new elements can still be inserted into the tracing_map
even though the maximum number of elements (`max_elts`) has been reached.
Continuing to insert elements after the overflow could result in the
tracing_map containing "tracing_map->max_size" elements, leaving no empty
entries.
If any attempt is made to insert an element into a full tracing_map using
`__tracing_map_insert()`, it will cause an infinite loop with preemption
disabled, leading to a CPU hang problem.
Fix this by preventing any further increments to "tracing_map->next_elt"
once it reaches "tracing_map->max_elt".
Petr Pavlu [Sat, 3 Aug 2024 13:09:26 +0000 (15:09 +0200)]
function_graph: Fix the ret_stack used by ftrace_graph_ret_addr()
When ftrace_graph_ret_addr() is invoked to convert a found stack return
address to its original value, the function can end up producing the
following crash:
cd linux/tools/testing/selftests
make TARGETS='ftrace livepatch'
(cd ftrace; ./ftracetest test.d/ftrace/fgraph-filter.tc)
(cd livepatch; ./test-livepatch.sh)
The problem is that ftrace_graph_ret_addr() is supposed to operate on the
ret_stack of a selected task but wrongly accesses the ret_stack of the
current task. Specifically, the above NULL dereference occurs when
task->curr_ret_stack is non-zero, but current->ret_stack is NULL.
Correct ftrace_graph_ret_addr() to work with the right ret_stack.
eventfs: Don't return NULL in eventfs_create_dir()
Commit 77a06c33a22d ("eventfs: Test for ei->is_freed when accessing
ei->dentry") added another check, testing if the parent was freed after
we released the mutex. If so, the function returns NULL. However, all
callers expect it to either return a valid pointer or an error pointer,
at least since commit 5264a2f4bb3b ("tracing: Fix a NULL vs IS_ERR() bug
in event_subsystem_dir()"). Returning NULL will therefore fail the error
condition check in the caller.
Fix this by substituting the NULL return value with a fitting error
pointer.
Stefan Wahren [Sun, 4 Aug 2024 11:36:11 +0000 (13:36 +0200)]
spi: spi-fsl-lpspi: Fix scldiv calculation
The effective SPI clock frequency should never exceed speed_hz
otherwise this might result in undefined behavior of the SPI device.
Currently the scldiv calculation could violate this constraint.
For the example parameters perclk_rate = 24 MHz and speed_hz = 7 MHz,
the function fsl_lpspi_set_bitrate will determine perscale = 0 and
scldiv = 1, which is a effective SPI clock of 8 MHz.
So fix this by rounding up the quotient of perclk_rate and speed_hz.
While this never change within the loop, we can pull this out.
drm/amdgpu: Add DCC GFX12 flag to enable address alignment
We require this flag AMDGPU_GEM_CREATE_GFX12_DCC or any other
kernel level GFX12 DCC flag to differentiate the DCC buffers and other
pinned display buffers(which has TTM_PL_FLAG_CONTIGUOUS enabled).
If we use the TTM_PL_FLAG_CONTIGUOUS flag for DCC buffers, we may over
allocate for all the pinned display buffers unnecessarily that leads to
memory allocation failure.
drm/amdgpu: Add address alignment support to DCC buffers
Add address alignment support to the DCC VRAM buffers.
v2:
- adjust size based on the max_texture_channel_caches values
only for GFX12 DCC buffers.
- used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only
for DCC buffers.
- roundup non power of two DCC buffer adjusted size to nearest
power of two number as the buddy allocator does not support non
power of two alignments. This applies only to the contiguous
DCC buffers.
v3:(Alex)
- rewrite the max texture channel caches comparison code in an
algorithmic way to determine the alignment size.
v4:(Alex)
- Move the logic from amdgpu_vram_mgr_dcc_alignment() to gmc_v12_0.c
and add a new gmc func callback for dcc alignment. If the callback
is non-NULL, call it to get the alignment, otherwise, use the default.
v5:(Alex)
- Set the Alignment to a default value if the callback doesn't exist.
- Add the callback to amdgpu_gmc_funcs.
v6:
- Fix checkpatch warning reported by Intel CI.
v7:(Christian)
- remove the AMDGPU_GEM_CREATE_GFX12_DCC flag and keep a flag that
checks the BO pinning and for a specific hw generation.
v8:(Christian)
- move this check into gmc_v12_0_get_dcc_alignment.
[how]
dsc recompute should be skipped if no mode change detected on the new
request. If detected, keep checking whether the stream is already on
current state or not.
Joshua Ashton [Thu, 7 Mar 2024 19:04:31 +0000 (19:04 +0000)]
drm/amdgpu: Forward soft recovery errors to userspace
As we discussed before[1], soft recovery should be
forwarded to userspace, or we can get into a really
bad state where apps will keep submitting hanging
command buffers cascading us to a hard reset.
drm/buddy: Add start address support to trim function
- Add a new start parameter in trim function to specify exact
address from where to start the trimming. This would help us
in situations like if drivers would like to do address alignment
for specific requirements.
- Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this
flag to disable the allocator trimming part. This patch enables
the drivers control trimming and they can do it themselves
based on the application requirements.
v1:(Matthew)
- check new_start alignment with min chunk_size
- use range_overflows()
drm/amd/display: Add missing DET segments programming
The commit 5034b935f62a ("drm/amd/display: Modify DHCUB waterwark
structures and functions") introduced a code refactor for DCHUB, but
during the merge process into amd-staging-drm-next, the program det
segments were removed. This commit adds the DET segment programming for
DCN35.
drm/amd/display: Replace dm_execute_dmub_cmd with dc_wake_and_execute_dmub_cmd
In the commit c2cec7a872b6 ("drm/amd/display: Wake DMCUB before sending
a command for replay feature"), replaced dm_execute_dmub_cmd with
dc_wake_and_execute_dmub_cmd in multiple areas, but due to merge issues
the replacement of this function in the dmub_replay_copy_settings was
missed. This commit replaces the old dm_execute_dmub_cmd with
dc_wake_and_execute_dmub_cmd.
Steven Rostedt [Fri, 26 Jul 2024 18:42:08 +0000 (14:42 -0400)]
tracing: Use refcount for trace_event_file reference counter
Instead of using an atomic counter for the trace_event_file reference
counter, use the refcount interface. It has various checks to make sure
the reference counting is correct, and will warn if it detects an error
(like refcount_inc() on '0').
Steven Rostedt [Tue, 30 Jul 2024 15:06:57 +0000 (11:06 -0400)]
tracing: Have format file honor EVENT_FILE_FL_FREED
When eventfs was introduced, special care had to be done to coordinate the
freeing of the file meta data with the files that are exposed to user
space. The file meta data would have a ref count that is set when the file
is created and would be decremented and freed after the last user that
opened the file closed it. When the file meta data was to be freed, it
would set a flag (EVENT_FILE_FL_FREED) to denote that the file is freed,
and any new references made (like new opens or reads) would fail as it is
marked freed. This allowed other meta data to be freed after this flag was
set (under the event_mutex).
All the files that were dynamically created in the events directory had a
pointer to the file meta data and would call event_release() when the last
reference to the user space file was closed. This would be the time that it
is safe to free the file meta data.
A shortcut was made for the "format" file. It's i_private would point to
the "call" entry directly and not point to the file's meta data. This is
because all format files are the same for the same "call", so it was
thought there was no reason to differentiate them. The other files
maintain state (like the "enable", "trigger", etc). But this meant if the
file were to disappear, the "format" file would be unaware of it.
This caused a race that could be trigger via the user_events test (that
would create dynamic events and free them), and running a loop that would
read the user_events format files:
In one console run:
# cd tools/testing/selftests/user_events
# while true; do ./ftrace_test; done
And in another console run:
# cd /sys/kernel/tracing/
# while true; do cat events/user_events/__test_event/format; done 2>/dev/null
With KASAN memory checking, it would trigger a use-after-free bug report
(which was a real bug). This was because the format file was not checking
the file's meta data flag "EVENT_FILE_FL_FREED", so it would access the
event that the file meta data pointed to after the event was freed.
After inspection, there are other locations that were found to not check
the EVENT_FILE_FL_FREED flag when accessing the trace_event_file. Add a
new helper function: event_file_file() that will make sure that the
event_mutex is held, and will return NULL if the trace_event_file has the
EVENT_FILE_FL_FREED flag set. Have the first reference of the struct file
pointer use event_file_file() and check for NULL. Later uses can still use
the event_file_data() helper function if the event_mutex is still held and
was not released since the event_file_file() call.
Jens Axboe [Wed, 7 Aug 2024 21:08:17 +0000 (15:08 -0600)]
io_uring/net: ensure expanded bundle send gets marked for cleanup
If the iovec inside the kmsg isn't already allocated AND one gets
expanded beyond the fixed size, then the request may not already have
been marked for cleanup. Ensure that it is.
Jens Axboe [Wed, 7 Aug 2024 21:06:45 +0000 (15:06 -0600)]
io_uring/net: ensure expanded bundle recv gets marked for cleanup
If the iovec inside the kmsg isn't already allocated AND one gets
expanded beyond the fixed size, then the request may not already have
been marked for cleanup. Ensure that it is.
Anton Khirnov [Mon, 29 Jul 2024 19:58:10 +0000 (21:58 +0200)]
Bluetooth: hci_sync: avoid dup filtering when passive scanning with adv monitor
This restores behaviour (including the comment) from now-removed
hci_request.c, and also matches existing code for active scanning.
Without this, the duplicates filter is always active when passive
scanning, which makes it impossible to work with devices that send
nontrivial dynamic data in their advertisement reports.
Fixes: abfeea476c68 ("Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY") Signed-off-by: Anton Khirnov <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Bluetooth: hci_qca: fix a NULL-pointer derefence at shutdown
Unlike qca_regulator_init(), qca_power_shutdown() may be called for
QCA_ROME which does not have qcadev->bt_power assigned. Add a
NULL-pointer check before dereferencing the struct qca_power pointer.
Fixes: eba1718717b0 ("Bluetooth: hci_qca: make pwrseq calls the default if available") Reported-by: Dmitry Baryshkov <[email protected]> Closes: https://lore.kernel.org/linux-bluetooth/su3wp6s44hrxf4ijvsdfzbvv4unu4ycb7kkvwbx6ltdafkldir@4g7ydqm2ap5j/ Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Bluetooth: hci_qca: fix QCA6390 support on non-DT platforms
QCA6390 can albo be used on non-DT systems so we must not make the power
sequencing the only option. Check if the serdev device consumes an OF
node. If so: honor the new contract as per the DT bindings. If not: fall
back to the previous behavior by falling through to the existing
default label.
Fixes: 9a15ce685706 ("Bluetooth: qca: use the power sequencer for QCA6390") Reported-by: Wren Turkal <[email protected]> Closes: https://lore.kernel.org/linux-bluetooth/[email protected]/ Signed-off-by: Bartosz Golaszewski <[email protected]> Reviewed-by: Paul Menzel <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Bluetooth: hci_qca: don't call pwrseq_power_off() twice for QCA6390
Now that we call pwrseq_power_off() for all models that hold a valid
power sequencing handle, we can remove the switch case for QCA_6390. The
switch will now use the default label for this model but that's fine: if
it has the BT-enable GPIO than we should use it.
Fixes: eba1718717b0 ("Bluetooth: hci_qca: make pwrseq calls the default if available") Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Chen Yu [Tue, 6 Aug 2024 11:22:07 +0000 (19:22 +0800)]
x86/paravirt: Fix incorrect virt spinlock setting on bare metal
The kernel can change spinlock behavior when running as a guest. But this
guest-friendly behavior causes performance problems on bare metal.
The kernel uses a static key to switch between the two modes.
In theory, the static key is enabled by default (run in guest mode) and
should be disabled for bare metal (and in some guests that want native
behavior or paravirt spinlock).
A performance drop is reported when running encode/decode workload and
BenchSEE cache sub-workload.
Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused
native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS is
disabled the virt_spin_lock_key is incorrectly set to true on bare
metal. The qspinlock degenerates to test-and-set spinlock, which decreases
the performance on bare metal.
Set the default value of virt_spin_lock_key to false. If booting in a VM,
enable this key. Later during the VM initialization, if other
high-efficient spinlock is preferred (e.g. paravirt-spinlock), or the user
wants the native qspinlock (via nopvspin boot commandline), the
virt_spin_lock_key is disabled accordingly.
This results in the following decision matrix:
X86_FEATURE_HYPERVISOR Y Y Y N
CONFIG_PARAVIRT_SPINLOCKS Y Y N Y/N
PV spinlock Y N N Y/N
Zhiquan Li [Mon, 5 Aug 2024 10:35:31 +0000 (18:35 +0800)]
x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailbox
On a platform using the "Multiprocessor Wakeup Structure"[1] to startup
secondary CPUs the control processor needs to memremap() the physical
address of the MP Wakeup Structure mailbox to the variable
acpi_mp_wake_mailbox, which holds the virtual address of mailbox.
To wake up the AP the control processor writes the APIC ID of AP, the
wakeup vector and the ACPI_MP_WAKE_COMMAND_WAKEUP command into the mailbox.
Current implementation doesn't consider the case which restricts boot time
CPU bringup to 1 with the kernel parameter "maxcpus=1" and brings other
CPUs online later from user space as it sets acpi_mp_wake_mailbox to
read-only after init. So when the first AP is tried to brought online
after init, the attempt to update the variable results in a kernel panic.
The memremap() call that initializes the variable cannot be moved into
acpi_parse_mp_wake() because memremap() is not functional at that point in
the boot process. Also as the APs might never be brought up, keep the
memremap() call in acpi_wakeup_cpu() so that the operation only takes place
when needed.
Commit ac21add2540e ("ice: Implement driver functionality to dump fec
statistics") introduces obtaining FEC correctable and uncorrectable
stats per netdev in ICE driver. Unfortunately the assignment of values
to fec_stats structure has been done incorrectly. This commit fixes the
assignments.
Fixes: ac21add2540e ("ice: Implement driver functionality to dump fec statistics") Reviewed-by: Wojciech Drewek <[email protected]> Signed-off-by: Mateusz Polchlopek <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
Grzegorz Nitka [Mon, 15 Jul 2024 15:39:11 +0000 (17:39 +0200)]
ice: Skip PTP HW writes during PTP reset procedure
Block HW write access for the driver while the device is in reset to
avoid potential race condition and access to the PTP HW in
non-nominal state which could lead to undesired effects
Linus Torvalds [Wed, 7 Aug 2024 16:53:41 +0000 (09:53 -0700)]
Merge tag 'for-6.11-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix double inode unlock for direct IO sync writes (reported by
syzbot)
- fix root tree id/name map definitions, don't use fixed size buffers
for name (reported by -Werror=unterminated-string-initialization)
- fix qgroup reserve leaks in bufferd write path
- update scrub status structure more often so it can be reported in
user space more accurately and let 'resume' not repeat work
- in preparation to remove space cache v1 in the future print a warning
if it's detected
* tag 'for-6.11-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: avoid using fixed char array size for tree names
btrfs: fix double inode unlock for direct IO sync writes
btrfs: emit a warning about space cache v1 being deprecated
btrfs: fix qgroup reserve leaks in cow_file_range
btrfs: implement launder_folio for clearing dirty page reserve
btrfs: scrub: update last_physical after scrubbing one stripe
btrfs: factor out stripe length calculation into a helper
Linus Torvalds [Wed, 7 Aug 2024 16:45:21 +0000 (09:45 -0700)]
Merge tag 'for-v6.11-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:
"rt5033:
- fix driver regression causing kernel oops
axp288-charger:
- fix charge voltage setup
qcom-battmgr:
- fix thermal zone spamming errors
- fix init on Qualcomm X Elite"
* tag 'for-v6.11-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: qcom_battmgr: Ignore extra __le32 in info payload
power: supply: qcom_battmgr: return EAGAIN when firmware service is not up
power: supply: axp288_charger: Round constant_charge_voltage writes down
power: supply: axp288_charger: Fix constant_charge_voltage writes
power: supply: rt5033: Bring back i2c_set_clientdata
Shay Drory [Tue, 6 Aug 2024 07:20:44 +0000 (10:20 +0300)]
genirq/irqdesc: Honor caller provided affinity in alloc_desc()
Currently, whenever a caller is providing an affinity hint for an
interrupt, the allocation code uses it to calculate the node and copies the
cpumask into irq_desc::affinity.
If the affinity for the interrupt is not marked 'managed' then the startup
of the interrupt ignores irq_desc::affinity and uses the system default
affinity mask.
Prevent this by setting the IRQD_AFFINITY_SET flag for the interrupt in the
allocator, which causes irq_setup_affinity() to use irq_desc::affinity on
interrupt startup if the mask contains an online CPU.
Thomas Gleixner [Tue, 6 Aug 2024 18:48:43 +0000 (20:48 +0200)]
x86/mm: Fix PTI for i386 some more
So it turns out that we have to do two passes of
pti_clone_entry_text(), once before initcalls, such that device and
late initcalls can use user-mode-helper / modprobe and once after
free_initmem() / mark_readonly().
Now obviously mark_readonly() can cause PMD splits, and
pti_clone_pgtable() doesn't like that much.
Allow the late clone to split PMDs so that pagetables stay in sync.
Dmitry Torokhov [Tue, 6 Aug 2024 16:12:58 +0000 (09:12 -0700)]
ARM: pxa/gumstix: fix attaching properties to vbus gpio device
Commit f1d6588af93b tried to convert GPIO lookup tables to software
properties for the vbus gpio device, bit forgot the most important
step: actually attaching the new properties to the device.
Also fix up the name of the property array to reflect the board name,
and add missing gpio/property.h and devices.h includes absence of which
causes compile failures on some configurations.
Switch "#ifdef CONFIG_USB_PXA25X" to "#if IS_ENABLED(CONFIG_USB_PXA25X)"
because it should not matter if the driver is buolt in or a module, it
still need vbus controls.
Marek Behún [Tue, 30 Jul 2024 14:49:24 +0000 (16:49 +0200)]
doc: platform: cznic: turris-omnia-mcu: Use double backticks for attribute value
Use double backticks instead of quotes for sysfs attribute value.
This makes sphinx generate the "mcu" and "cpu" values in monospace when
rendering to HTML.
Marek Behún [Fri, 19 Jul 2024 08:57:56 +0000 (10:57 +0200)]
platform: cznic: turris-omnia-mcu: Make GPIO code optional
Make the GPIO part of the driver optional, under a boolean config
option. Move the dependency to GPIOLIB and OF and the selection of
GPIOLIB_IRQCHIP to this new option.
This makes the turris-omnia-mcu driver available for compilation even if
GPIOLIB or OF are disabled.
Marek Behún [Fri, 19 Jul 2024 08:57:55 +0000 (10:57 +0200)]
platform: cznic: turris-omnia-mcu: Make poweroff and wakeup code optional
Make the system poweroff and RTC wakeup part of the driver optional,
under a boolean config option. Move the dependency to RTC_CLASS to this
new option.
This makes the turris-omnia-mcu driver available for compilation even if
RTC_CLASS is disabled.
Marek Behún [Fri, 19 Jul 2024 08:57:53 +0000 (10:57 +0200)]
platform: cznic: turris-omnia-mcu: Make watchdog code optional
Make the watchdog part of the driver optional, under a boolean config
option. Move the dependency to WATCHDOG to this new option, and change
the WATCHDOG_CORE dependency to selection, as is done in most watchdog
drivers.
This makes the turris-omnia-mcu driver available for compilation even if
WATCHDOG is disabled.
Kent Overstreet [Wed, 31 Jul 2024 00:35:59 +0000 (20:35 -0400)]
bcachefs: Add a comment for bucket helper types
We've had bugs in the past with incorrect integer conversions in disk
accounting code, which is why bucket helpers now always return s64s; add
a comment explaining this.
Kent Overstreet [Wed, 31 Jul 2024 00:33:25 +0000 (20:33 -0400)]
bcachefs: Don't rely on implicit unsigned -> signed integer conversion
implicit integer conversion is a fertile source of bugs, and we really
would rather not have the min()/max() macros doing it implicitly.
bcachefs appears to be the only place in the kernel where this happens,
so let's fix it.
Xu Yang [Fri, 2 Aug 2024 06:41:56 +0000 (14:41 +0800)]
usb: typec: tcpm: avoid sink goto SNK_UNATTACHED state if not received source capability message
Since commit (122968f8dda8 usb: typec: tcpm: avoid resets for missing
source capability messages), state will change from SNK_WAIT_CAPABILITIES
to SNK_WAIT_CAPABILITIES_TIMEOUT. We need to change SNK_WAIT_CAPABILITIES
-> SNK_READY path to SNK_WAIT_CAPABILITIES_TIMEOUT -> SNK_READY
accordingly. Otherwise, the sink port will never change to SNK_READY state
if the source does't have PD capability.
Tudor Ambarus [Fri, 2 Aug 2024 14:04:28 +0000 (14:04 +0000)]
usb: gadget: f_fs: pull out f->disable() from ffs_func_set_alt()
The ``alt`` parameter was used as a way to differentiate between
f->disable() and f->set_alt(). As the code paths diverge quite a bit,
pull out the f->disable() code from ffs_func_set_alt(), everything will
become clearer and less error prone. No change in functionality
intended.
The blamed commit made ffs_func_disable() always return -EINVAL as the
method calls ffs_func_set_alt() with the ``alt`` argument being
``(unsigned)-1``, which is always greater than MAX_ALT_SETTINGS.
Use the MAX_ALT_SETTINGS check just in the f->set_alt() code path,
f->disable() doesn't care about the ``alt`` parameter.
Make a surgical fix, but really the f->disable() code shall be pulled
out from ffs_func_set_alt(), the code will become clearer. A patch will
follow.
Note that ffs_func_disable() always returning -EINVAL made pixel6 crash
on USB disconnect.
Huacai Chen [Wed, 7 Aug 2024 09:37:11 +0000 (17:37 +0800)]
LoongArch: Use accessors to page table entries instead of direct dereference
As very well explained in commit 20a004e7b017cce282 ("arm64: mm: Use
READ_ONCE/WRITE_ONCE when accessing page tables"), an architecture whose
page table walker can modify the PTE in parallel must use READ_ONCE()/
WRITE_ONCE() macro to avoid any compiler transformation.
So apply that to LoongArch which is such an architecture, in order to
avoid potential problems.
Similar to commit edf955647269422e ("riscv: Use accessors to page table
entries instead of direct dereference").
Miao Wang [Wed, 7 Aug 2024 09:37:11 +0000 (17:37 +0800)]
LoongArch: Enable general EFI poweroff method
efi_shutdown_init() can register a general sys_off handler named
efi_power_off(). Enable this by providing efi_poweroff_required(),
like arm and x86. Since EFI poweroff is also supported on LoongArch,
and the enablement makes the poweroff function usable for hardwares
which lack ACPI S5.
We prefer ACPI poweroff rather than EFI poweroff (like x86), so we only
require EFI poweroff if acpi_gbl_reduced_hardware or acpi_no_s5 is true.
David Gow [Sun, 4 Aug 2024 09:18:48 +0000 (17:18 +0800)]
drm/i915: Attempt to get pages without eviction first
In commit a78a8da51b36 ("drm/ttm: replace busy placement with flags v6"),
__i915_ttm_get_pages was updated to use flags instead of the separate
'busy' placement list. However, the behaviour was subtly changed.
Originally, the function would attempt to use the preferred placement
without eviction, and give an opportunity to restart the operation
before falling back to allowing eviction.
This was unintentionally changed, as the preferred placement was not
given the TTM_PL_FLAG_DESIRED flag, and so eviction could be triggered
in that first pass. This caused thrashing, and a significant performance
regression on DG2 systems with small BAR. For example, Minecraft and
Team Fortress 2 would drop to single-digit framerates.
Restore the original behaviour by marking the initial placement as
desired on that first attempt. Also, rework this to use a separate
struct ttm_palcement, as the individual placements are marked 'const',
so hot-patching the flags is even more dodgy than before.
David Gow [Sun, 4 Aug 2024 09:18:47 +0000 (17:18 +0800)]
drm/i915: Allow evicting to use the requested placement
In commit a78a8da51b36 ("drm/ttm: replace busy placement with flags v6"),
the old system of having a separate placement list (for placements
which should be used without eviction) and a 'busy' placement list (for
placements which should be attempted if eviction is required) was
replaced with a new one where placements could be marked 'FALLBACK' (to
be attempted if eviction is required) or 'DESIRED' (to be attempted
first, but not if eviction is required).
i915 had always included the requested placement in the list of
'busy' placements: i.e., the placement could be used either if eviction
is required or not. But when the new system was put in place, the
requested (first) placement was marked 'DESIRED', so would never be used
if eviction became necessary. While a bug in the original commit
prevented this flag from working, when this was fixed in 4a0e7b3c ("drm/i915: fix applying placement flag"), it caused long hangs
on DG2 systems with small BAR.
Don't mark the requested placement DESIRED (or FALLBACK), allowing it to
be used in both situations. This matches the old behaviour, and resolves
the hangs.
The Framework Laptop 13 (Intel Core Ultra) has an ALC285 that ships in a
similar configuration to the ALC295 in previous models. It requires the
same quirk for headset detection.
Tristram Ha [Mon, 5 Aug 2024 23:52:00 +0000 (16:52 -0700)]
net: dsa: microchip: Fix Wake-on-LAN check to not return an error
The wol variable in ksz_port_set_mac_address() is declared with random
data, but the code in ksz_get_wol call may not be executed so the
WAKE_MAGIC check may be invalid resulting in an error message when
setting a MAC address after starting the DSA driver.
Simon Ser [Wed, 31 Jul 2024 19:10:20 +0000 (19:10 +0000)]
drm/atomic: allow no-op FB_ID updates for async flips
User-space is allowed to submit any property in an async flip as
long as the value doesn't change. However we missed one case:
as things stand, the kernel rejects no-op FB_ID changes on
non-primary planes. Fix this by changing the conditional and
skipping drm_atomic_check_prop_changes() only for FB_ID on the
primary plane (instead of skipping for FB_ID on any plane).
Rik van Riel [Tue, 6 Aug 2024 15:56:45 +0000 (11:56 -0400)]
dma-debug: avoid deadlock between dma debug vs printk and netconsole
Currently the dma debugging code can end up indirectly calling printk
under the radix_lock. This happens when a radix tree node allocation
fails.
This is a problem because the printk code, when used together with
netconsole, can end up inside the dma debugging code while trying to
transmit a message over netcons.
This creates the possibility of either a circular deadlock on the same
CPU, with that CPU trying to grab the radix_lock twice, or an ABBA
deadlock between different CPUs, where one CPU grabs the console lock
first and then waits for the radix_lock, while the other CPU is holding
the radix_lock and is waiting for the console lock.
The trace captured by lockdep is of the ABBA variant.
Linus Torvalds [Tue, 6 Aug 2024 14:52:10 +0000 (07:52 -0700)]
Merge tag 'platform-drivers-x86-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
"Fixes:
- Fix ACPI notifier racing with itself (intel-vbtn)
- Initialize local variable to cover a timeout corner case
(intel/ifs)
- WMI docs spelling
New device IDs:
- amd/{pmc,pmf}: AMD 1Ah model 60h series.
- amd/pmf: SPS quirk support for ASUS ROG Ally X"
* tag 'platform-drivers-x86-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel/ifs: Initialize union ifs_status to zero
platform/x86: msi-wmi-platform: Fix spelling mistakes
platform/x86/amd/pmf: Add new ACPI ID AMDI0107
platform/x86/amd/pmc: Send OS_HINT command for new AMD platform
platform/x86/amd: pmf: Add quirk for ROG Ally X
platform/x86: intel-vbtn: Protect ACPI notify handler against recursion
This pair of patches extend wm_adsp to add a callback that can be used
to control whether ALSA controls are added and then tweak cs35l56 to use
it to suppress controls made from firmware coefficients.
Calculating the size of the mapped area as the lesser value
between the requested size and the actual size does not consider
the partial mapping offset. This can cause page fault access.
Fix the calculation of the starting and ending addresses, the
total size is now deduced from the difference between the end and
start addresses.
Additionally, the calculations have been rewritten in a clearer
and more understandable form.
Andi Shyti [Fri, 2 Aug 2024 08:38:49 +0000 (10:38 +0200)]
drm/i915/gem: Adjust vma offset for framebuffer mmap offset
When mapping a framebuffer object, the virtual memory area (VMA)
offset ('vm_pgoff') should be adjusted by the start of the
'vma_node' associated with the object. This ensures that the VMA
offset is correctly aligned with the corresponding offset within
the GGTT aperture.
Increment vm_pgoff by the start of the vma_node with the offset=
provided by the user.
Correct the McASP nodes - mcasp3 and mcasp4 with the right
DMAs thread IDs as per TISCI documentation [1] for J784s4.
This fixes the related McASPs probe failure due to incorrect
DMA IDs.
Arnd Bergmann [Mon, 5 Aug 2024 20:38:29 +0000 (22:38 +0200)]
syscalls: add back legacy __NR_nfsservctl macro
The conversion from the old unistd.h file to syscall.tbl dropped the
nfsservctl macro. This one was handled inconsistently across architectures
in the original introduction of the syscall.tbl format, and I went the
other way on this.
The syscall was already gone in linux-3.1 before the current users
of the generic table (other than openrisc) first appeared, so nobody
could actally use it, but putting the number back helps for consistency
since there are build scripts that check the presence of all these
macros.
ALSA: hda: Add HP MP9 G4 Retail System AMS to force connect list
In recent HP UEFI firmware (likely v2.15 and above, tested on 2.27),
these pins are incorrectly set for HDMI/DP audio. Tested on
HP MP9 G4 Retail System AMS. Tested audio with two monitors connected
via DisplayPort.
net: bridge: mcast: wait for previous gc cycles when removing port
syzbot hit a use-after-free[1] which is caused because the bridge doesn't
make sure that all previous garbage has been collected when removing a
port. What happens is:
CPU 1 CPU 2
start gc cycle remove port
acquire gc lock first
wait for lock
call br_multicasg_gc() directly
acquire lock now but free port
the port can be freed
while grp timers still
running
Make sure all previous gc cycles have finished by using flush_work before
freeing the port.
[1]
BUG: KASAN: slab-use-after-free in br_multicast_port_group_expired+0x4c0/0x550 net/bridge/br_multicast.c:861
Read of size 8 at addr ffff888071d6d000 by task syz.5.1232/9699
Linus Torvalds [Mon, 5 Aug 2024 21:31:12 +0000 (14:31 -0700)]
Merge tag 'linux_kselftest-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
"A single fix to the conditional in ksft.py script which incorrectly
flags a test suite failed when there are skipped tests in the mix.
The logic is fixed to take skipped tests into account and report the
test as passed"
* tag 'linux_kselftest-fixes-6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: ksft: Fix finished() helper exit code on skipped tests
Arnd Bergmann [Mon, 5 Aug 2024 20:35:43 +0000 (22:35 +0200)]
syscalls: fix fstat() entry again
The previous patch to fix the newfstatat() syscall entry ended up breaking
fstat() instead. Unfortunately these two are not handled the same way, so
I messed this one up the exact opposite way.
Jared McArthur [Thu, 1 Aug 2024 21:04:14 +0000 (16:04 -0500)]
arm64: dts: ti: k3-j722s: Fix gpio-range for main_pmx0
Commit 5e5c50964e2e ("arm64: dts: ti: k3-j722s: Add gpio-ranges
properties") introduced pinmux range definition for gpio-ranges, however
missed a hole within gpio-range for main_pmx0. As a result, automatic
mapping of GPIO to pin control for gpios within the main_pmx0 domain is
broken. Fix this by correcting the gpio-range.
Jared McArthur [Thu, 1 Aug 2024 21:04:13 +0000 (16:04 -0500)]
arm64: dts: ti: k3-am62p: Fix gpio-range for main_pmx0
Commit d72d73a44c3c ("arm64: dts: ti: k3-am62p: Add gpio-ranges
properties") introduced pinmux range definition for gpio-ranges, however
missed a hole within gpio-range for main_pmx0. As a result, automatic
mapping of GPIO to pin control for gpios within the main_pmx0 domain is
broken. Fix this by correcting the gpio-range.
Jared McArthur [Thu, 1 Aug 2024 21:04:12 +0000 (16:04 -0500)]
arm64: dts: ti: k3-am62p: Add gpio-ranges for mcu_gpio0
Commit d72d73a44c3c ("arm64: dts: ti: k3-am62p: Add gpio-ranges
properties") introduced pinmux range definition for gpio-ranges, however
missed introducing the range description for the mcu_gpio node. As a
result, automatic mapping of GPIO to pin control for mcu gpios is
broken. Fix this by introducing the proper ranges.
ASoC: cs35l56: Handle OTP read latency over SoundWire
Use the late-read buffer in the CS35L56 SoundWire interface to
read OTP memory.
The OTP memory has a longer access latency than chip registers
and cannot guarantee to return the data value in the SoundWire
control response if the bus clock is >4.8 MHz. The Cirrus
SoundWire peripheral IP exposes the bridge-to-bus read buffer
and status bits. For a read from OTP the bridge status bits are
polled to wait for the OTP data to be loaded into the read buffer
and the data is then read from there.
Linus Torvalds [Mon, 5 Aug 2024 16:23:00 +0000 (09:23 -0700)]
Merge tag 'slab-fixes-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
"Since v6.8 we've had a subtle breakage in SLUB with KFENCE enabled,
that can cause a crash. It hasn't been found earlier due to quite
specific conditions necessary (OOM during kmem_cache_alloc_bulk())"
* tag 'slab-fixes-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm, slub: do not call do_slab_free for kfence object
Thomas Gleixner [Sat, 3 Aug 2024 15:07:51 +0000 (17:07 +0200)]
timekeeping: Fix bogus clock_was_set() invocation in do_adjtimex()
The addition of the bases argument to clock_was_set() fixed up all call
sites correctly except for do_adjtimex(). This uses CLOCK_REALTIME
instead of CLOCK_SET_WALL as argument. CLOCK_REALTIME is 0.
As a result the effect of that clock_was_set() notification is incomplete
and might result in timers expiring late because the hrtimer code does
not re-evaluate the affected clock bases.
Use CLOCK_SET_WALL instead of CLOCK_REALTIME to tell the hrtimers code
which clock bases need to be re-evaluated.
Justin Stitt [Fri, 17 May 2024 00:47:10 +0000 (00:47 +0000)]
ntp: Safeguard against time_constant overflow
Using syzkaller with the recently reintroduced signed integer overflow
sanitizer produces this UBSAN report:
UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:738:18 9223372036854775806 + 4 cannot be represented in type 'long'
Call Trace:
handle_overflow+0x171/0x1b0
__do_adjtimex+0x1236/0x1440
do_adjtimex+0x2be/0x740
The user supplied time_constant value is incremented by four and then
clamped to the operating range.
Before commit eea83d896e31 ("ntp: NTP4 user space bits update") the user
supplied value was sanity checked to be in the operating range. That change
removed the sanity check and relied on clamping after incrementing which
does not work correctly when the user supplied value is in the overflow
zone of the '+ 4' operation.
The operation requires CAP_SYS_TIME and the side effect of the overflow is
NTP getting out of sync.
Similar to the fixups for time_maxerror and time_esterror, clamp the user
space supplied value to the operating range.