Git Repo - linux.git/log

btrfs: tree-checker: fix the error message for transid error

The error message for inode transid is the same as for inode generation,
which makes us unable to detect the real problem.

Reported-by: Tyler Richmond <[email protected]>
Fixes: 496245cac57e ("btrfs: tree-checker: Verify inode item")
CC: [email protected] # 5.4+
Reviewed-by: Marcos Paulo de Souza <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: set the lockdep class for log tree extent buffers

These are special extent buffers that get rewound in order to lookup
the state of the tree at a specific point in time. As such they do not
go through the normal initialization paths that set their lockdep class,
so handle them appropriately when they are created and before they are
locked.

CC: [email protected] # 4.4+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: set the correct lockdep class for new nodes

When flipping over to the rw_semaphore I noticed I'd get a lockdep splat
in replace_path(), which is weird because we're swapping the reloc root
with the actual target root. Turns out this is because we're using the
root->root_key.objectid as the root id for the newly allocated tree
block when setting the lockdep class, however we need to be using the
actual owner of this new block, which is saved in owner.

The affected path is through btrfs_copy_root as all other callers of
btrfs_alloc_tree_block (which calls init_new_buffer) have root_objectid
== root->root_key.objectid .

CC: [email protected] # 5.4+
Reviewed-by: Filipe Manana <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: allocate scrub workqueues outside of locks

I got the following lockdep splat while testing:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.8.0-rc7-00172-g021118712e59 #932 Not tainted
  ------------------------------------------------------
  btrfs/229626 is trying to acquire lock:
  ffffffff828513f0 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue+0x378/0x450

  but task is already holding lock:
  ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #7 (&fs_info->scrub_lock){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
btrfs_scrub_dev+0x11c/0x630
btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
btrfs_ioctl+0x2799/0x30a0
ksys_ioctl+0x83/0xc0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x50/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #6 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
btrfs_run_dev_stats+0x49/0x480
commit_cowonly_roots+0xb5/0x2a0
btrfs_commit_transaction+0x516/0xa60
sync_filesystem+0x6b/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0xe/0x30
btrfs_kill_super+0x12/0x20
deactivate_locked_super+0x29/0x60
cleanup_mnt+0xb8/0x140
task_work_run+0x6d/0xb0
__prepare_exit_to_usermode+0x1cc/0x1e0
do_syscall_64+0x5c/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #5 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
btrfs_commit_transaction+0x4bb/0xa60
sync_filesystem+0x6b/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0xe/0x30
btrfs_kill_super+0x12/0x20
deactivate_locked_super+0x29/0x60
cleanup_mnt+0xb8/0x140
task_work_run+0x6d/0xb0
__prepare_exit_to_usermode+0x1cc/0x1e0
do_syscall_64+0x5c/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #4 (&fs_info->reloc_mutex){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
btrfs_record_root_in_trans+0x43/0x70
start_transaction+0xd1/0x5d0
btrfs_dirty_inode+0x42/0xd0
touch_atime+0xa1/0xd0
btrfs_file_mmap+0x3f/0x60
mmap_region+0x3a4/0x640
do_mmap+0x376/0x580
vm_mmap_pgoff+0xd5/0x120
ksys_mmap_pgoff+0x193/0x230
do_syscall_64+0x50/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #3 (&mm->mmap_lock#2){++++}-{3:3}:
__might_fault+0x68/0x90
_copy_to_user+0x1e/0x80
perf_read+0x141/0x2c0
vfs_read+0xad/0x1b0
ksys_read+0x5f/0xe0
do_syscall_64+0x50/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #2 (&cpuctx_mutex){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
perf_event_init_cpu+0x88/0x150
perf_event_init+0x1db/0x20b
start_kernel+0x3ae/0x53c
secondary_startup_64+0xa4/0xb0

  -> #1 (pmus_lock){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
perf_event_init_cpu+0x4f/0x150
cpuhp_invoke_callback+0xb1/0x900
_cpu_up.constprop.26+0x9f/0x130
cpu_up+0x7b/0xc0
bringup_nonboot_cpus+0x4f/0x60
smp_init+0x26/0x71
kernel_init_freeable+0x110/0x258
kernel_init+0xa/0x103
ret_from_fork+0x1f/0x30

  -> #0 (cpu_hotplug_lock){++++}-{0:0}:
__lock_acquire+0x1272/0x2310
lock_acquire+0x9e/0x360
cpus_read_lock+0x39/0xb0
alloc_workqueue+0x378/0x450
__btrfs_alloc_workqueue+0x15d/0x200
btrfs_alloc_workqueue+0x51/0x160
scrub_workers_get+0x5a/0x170
btrfs_scrub_dev+0x18c/0x630
btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
btrfs_ioctl+0x2799/0x30a0
ksys_ioctl+0x83/0xc0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x50/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9

  other info that might help us debug this:

  Chain exists of:
    cpu_hotplug_lock --> &fs_devs->device_list_mutex --> &fs_info->scrub_lock

   Possible unsafe locking scenario:

CPU0                    CPU1
----                    ----
    lock(&fs_info->scrub_lock);
lock(&fs_devs->device_list_mutex);
lock(&fs_info->scrub_lock);
    lock(cpu_hotplug_lock);

   *** DEADLOCK ***

  2 locks held by btrfs/229626:
   #0: ffff88bfe8bb86e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_scrub_dev+0xbd/0x630
   #1: ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630

  stack backtrace:
  CPU: 15 PID: 229626 Comm: btrfs Kdump: loaded Not tainted 5.8.0-rc7-00172-g021118712e59 #932
  Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
  Call Trace:
   dump_stack+0x78/0xa0
   check_noncircular+0x165/0x180
   __lock_acquire+0x1272/0x2310
   lock_acquire+0x9e/0x360
   ? alloc_workqueue+0x378/0x450
   cpus_read_lock+0x39/0xb0
   ? alloc_workqueue+0x378/0x450
   alloc_workqueue+0x378/0x450
   ? rcu_read_lock_sched_held+0x52/0x80
   __btrfs_alloc_workqueue+0x15d/0x200
   btrfs_alloc_workqueue+0x51/0x160
   scrub_workers_get+0x5a/0x170
   btrfs_scrub_dev+0x18c/0x630
   ? start_transaction+0xd1/0x5d0
   btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
   btrfs_ioctl+0x2799/0x30a0
   ? do_sigaction+0x102/0x250
   ? lockdep_hardirqs_on_prepare+0xca/0x160
   ? _raw_spin_unlock_irq+0x24/0x30
   ? trace_hardirqs_on+0x1c/0xe0
   ? _raw_spin_unlock_irq+0x24/0x30
   ? do_sigaction+0x102/0x250
   ? ksys_ioctl+0x83/0xc0
   ksys_ioctl+0x83/0xc0
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x50/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happens because we're allocating the scrub workqueues under the
scrub and device list mutex, which brings in a whole host of other
dependencies.

Because the work queue allocation is done with GFP_KERNEL, it can
trigger reclaim, which can lead to a transaction commit, which in turns
needs the device_list_mutex, it can lead to a deadlock. A different
problem for which this fix is a solution.

Fix this by moving the actual allocation outside of the
scrub lock, and then only take the lock once we're ready to actually
assign them to the fs_info.  We'll now have to cleanup the workqueues in
a few more places, so I've added a helper to do the refcount dance to
safely free the workqueues.

CC: [email protected] # 5.4+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: fix potential deadlock in the search ioctl

With the conversion of the tree locks to rwsem I got the following
lockdep splat:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.8.0-rc7-00165-g04ec4da5f45f-dirty #922 Not tainted
  ------------------------------------------------------
  compsize/11122 is trying to acquire lock:
  ffff889fabca8768 (&mm->mmap_lock#2){++++}-{3:3}, at: __might_fault+0x3e/0x90

  but task is already holding lock:
  ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (btrfs-fs-00){++++}-{3:3}:
down_write_nested+0x3b/0x70
__btrfs_tree_lock+0x24/0x120
btrfs_search_slot+0x756/0x990
btrfs_lookup_inode+0x3a/0xb4
__btrfs_update_delayed_inode+0x93/0x270
btrfs_async_run_delayed_root+0x168/0x230
btrfs_work_helper+0xd4/0x570
process_one_work+0x2ad/0x5f0
worker_thread+0x3a/0x3d0
kthread+0x133/0x150
ret_from_fork+0x1f/0x30

  -> #1 (&delayed_node->mutex){+.+.}-{3:3}:
__mutex_lock+0x9f/0x930
btrfs_delayed_update_inode+0x50/0x440
btrfs_update_inode+0x8a/0xf0
btrfs_dirty_inode+0x5b/0xd0
touch_atime+0xa1/0xd0
btrfs_file_mmap+0x3f/0x60
mmap_region+0x3a4/0x640
do_mmap+0x376/0x580
vm_mmap_pgoff+0xd5/0x120
ksys_mmap_pgoff+0x193/0x230
do_syscall_64+0x50/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (&mm->mmap_lock#2){++++}-{3:3}:
__lock_acquire+0x1272/0x2310
lock_acquire+0x9e/0x360
__might_fault+0x68/0x90
_copy_to_user+0x1e/0x80
copy_to_sk.isra.32+0x121/0x300
search_ioctl+0x106/0x200
btrfs_ioctl_tree_search_v2+0x7b/0xf0
btrfs_ioctl+0x106f/0x30a0
ksys_ioctl+0x83/0xc0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x50/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9

  other info that might help us debug this:

  Chain exists of:
    &mm->mmap_lock#2 --> &delayed_node->mutex --> btrfs-fs-00

   Possible unsafe locking scenario:

CPU0                    CPU1
----                    ----
    lock(btrfs-fs-00);
lock(&delayed_node->mutex);
lock(btrfs-fs-00);
    lock(&mm->mmap_lock#2);

   *** DEADLOCK ***

  1 lock held by compsize/11122:
   #0: ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180

  stack backtrace:
  CPU: 17 PID: 11122 Comm: compsize Kdump: loaded Not tainted 5.8.0-rc7-00165-g04ec4da5f45f-dirty #922
  Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
  Call Trace:
   dump_stack+0x78/0xa0
   check_noncircular+0x165/0x180
   __lock_acquire+0x1272/0x2310
   lock_acquire+0x9e/0x360
   ? __might_fault+0x3e/0x90
   ? find_held_lock+0x72/0x90
   __might_fault+0x68/0x90
   ? __might_fault+0x3e/0x90
   _copy_to_user+0x1e/0x80
   copy_to_sk.isra.32+0x121/0x300
   ? btrfs_search_forward+0x2a6/0x360
   search_ioctl+0x106/0x200
   btrfs_ioctl_tree_search_v2+0x7b/0xf0
   btrfs_ioctl+0x106f/0x30a0
   ? __do_sys_newfstat+0x5a/0x70
   ? ksys_ioctl+0x83/0xc0
   ksys_ioctl+0x83/0xc0
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x50/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

The problem is we're doing a copy_to_user() while holding tree locks,
which can deadlock if we have to do a page fault for the copy_to_user().
This exists even without my locking changes, so it needs to be fixed.
Rework the search ioctl to do the pre-fault and then
copy_to_user_nofault for the copying.

CC: [email protected] # 4.4+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: drop path before adding new uuid tree entry

With the conversion of the tree locks to rwsem I got the following
lockdep splat:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.8.0-rc7-00167-g0d7ba0c5b375-dirty #925 Not tainted
  ------------------------------------------------------
  btrfs-uuid/7955 is trying to acquire lock:
  ffff88bfbafec0f8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180

  but task is already holding lock:
  ffff88bfbafef2a8 (btrfs-uuid-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #1 (btrfs-uuid-00){++++}-{3:3}:
down_read_nested+0x3e/0x140
__btrfs_tree_read_lock+0x39/0x180
__btrfs_read_lock_root_node+0x3a/0x50
btrfs_search_slot+0x4bd/0x990
btrfs_uuid_tree_add+0x89/0x2d0
btrfs_uuid_scan_kthread+0x330/0x390
kthread+0x133/0x150
ret_from_fork+0x1f/0x30

  -> #0 (btrfs-root-00){++++}-{3:3}:
__lock_acquire+0x1272/0x2310
lock_acquire+0x9e/0x360
down_read_nested+0x3e/0x140
__btrfs_tree_read_lock+0x39/0x180
__btrfs_read_lock_root_node+0x3a/0x50
btrfs_search_slot+0x4bd/0x990
btrfs_find_root+0x45/0x1b0
btrfs_read_tree_root+0x61/0x100
btrfs_get_root_ref.part.50+0x143/0x630
btrfs_uuid_tree_iterate+0x207/0x314
btrfs_uuid_rescan_kthread+0x12/0x50
kthread+0x133/0x150
ret_from_fork+0x1f/0x30

  other info that might help us debug this:

   Possible unsafe locking scenario:

CPU0                    CPU1
----                    ----
    lock(btrfs-uuid-00);
lock(btrfs-root-00);
lock(btrfs-uuid-00);
    lock(btrfs-root-00);

   *** DEADLOCK ***

  1 lock held by btrfs-uuid/7955:
   #0: ffff88bfbafef2a8 (btrfs-uuid-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180

  stack backtrace:
  CPU: 73 PID: 7955 Comm: btrfs-uuid Kdump: loaded Not tainted 5.8.0-rc7-00167-g0d7ba0c5b375-dirty #925
  Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
  Call Trace:
   dump_stack+0x78/0xa0
   check_noncircular+0x165/0x180
   __lock_acquire+0x1272/0x2310
   lock_acquire+0x9e/0x360
   ? __btrfs_tree_read_lock+0x39/0x180
   ? btrfs_root_node+0x1c/0x1d0
   down_read_nested+0x3e/0x140
   ? __btrfs_tree_read_lock+0x39/0x180
   __btrfs_tree_read_lock+0x39/0x180
   __btrfs_read_lock_root_node+0x3a/0x50
   btrfs_search_slot+0x4bd/0x990
   btrfs_find_root+0x45/0x1b0
   btrfs_read_tree_root+0x61/0x100
   btrfs_get_root_ref.part.50+0x143/0x630
   btrfs_uuid_tree_iterate+0x207/0x314
   ? btree_readpage+0x20/0x20
   btrfs_uuid_rescan_kthread+0x12/0x50
   kthread+0x133/0x150
   ? kthread_create_on_node+0x60/0x60
   ret_from_fork+0x1f/0x30

This problem exists because we have two different rescan threads,
btrfs_uuid_scan_kthread which creates the uuid tree, and
btrfs_uuid_tree_iterate that goes through and updates or deletes any out
of date roots.  The problem is they both do things in different order.
btrfs_uuid_scan_kthread() reads the tree_root, and then inserts entries
into the uuid_root.  btrfs_uuid_tree_iterate() scans the uuid_root, but
then does a btrfs_get_fs_root() which can read from the tree_root.

It's actually easy enough to not be holding the path in
btrfs_uuid_scan_kthread() when we add a uuid entry, as we already drop
it further down and re-start the search when we loop.  So simply move
the path release before we add our entry to the uuid tree.

This also fixes a problem where we're holding a path open after we do
btrfs_end_transaction(), which has it's own problems.

CC: [email protected] # 4.4+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: block-group: fix free-space bitmap threshold

[BUG]
After commit 9afc66498a0b ("btrfs: block-group: refactor how we read one
block group item"), cache->length is being assigned after calling
btrfs_create_block_group_cache. This causes a problem since
set_free_space_tree_thresholds calculates the free-space threshold to
decide if the free-space tree should convert from extents to bitmaps.

The current code calls set_free_space_tree_thresholds with cache->length
being 0, which then makes cache->bitmap_high_thresh zero. This implies
the system will always use bitmap instead of extents, which is not
desired if the block group is not fragmented.

This behavior can be seen by a test that expects to repair systems
with FREE_SPACE_EXTENT and FREE_SPACE_BITMAP, but the current code only
created FREE_SPACE_BITMAP.

[FIX]
Call set_free_space_tree_thresholds after setting cache->length. There
is now a WARN_ON in set_free_space_tree_thresholds to help preventing
the same mistake to happen again in the future.

Link: https://github.com/kdave/btrfs-progs/issues/251
Fixes: 9afc66498a0b ("btrfs: block-group: refactor how we read one block group item")
CC: [email protected] # 5.8+
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Marcos Paulo de Souza <[email protected]>
Signed-off-by: David Sterba <[email protected]>

cpufreq: Use WARN_ON_ONCE() for invalid relation

The relation can't be invalid here, so if it turns out to be invalid,
just WARN_ON_ONCE() and return 0.

Signed-off-by: Viresh Kumar <[email protected]>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>

cpufreq: No need to verify cpufreq_driver in show_scaling_cur_freq()

"cpufreq_driver" is guaranteed to be valid here, no need to check it
here.

Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

Revert "powerpc/powernv/idle: Replace CPU feature check with PVR check"

cpuidle stop state implementation has minor optimizations for P10
where hardware preserves more SPR registers compared to P9. The
current P9 driver works for P10, although does few extra
save-restores. P9 driver can provide the required power management
features like SMT thread folding and core level power savings on a P10
platform.

Until the P10 stop driver is available, revert the commit which allows
for only P9 systems to utilize cpuidle and blocks all idle stop states
for P10. CPU idle states are enabled and tested on the P10 platform
with this fix.

This reverts commit 8747bf36f312356f8a295a0c39ff092d65ce75ae.

Fixes: 8747bf36f312 ("powerpc/powernv/idle: Replace CPU feature check with PVR check")
Signed-off-by: Pratik Rajesh Sampat <[email protected]>
Reviewed-by: Vaidyanathan Srinivasan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/perf: Fix reading of MSR[HV/PR] bits in trace-imc

IMC trace-mode uses MSR[HV/PR] bits to set the cpumode for the
instruction pointer captured in each sample. The bits are fetched from
the third double word of the trace record. Reading third double word
from IMC trace record should use be64_to_cpu() along with READ_ONCE
inorder to fetch correct MSR[HV/PR] bits. Patch addresses this change.

Currently we are using PERF_RECORD_MISC_HYPERVISOR as cpumode if MSR
HV is 1 and PR is 0 which means the address is from host counter. But
using PERF_RECORD_MISC_HYPERVISOR for host counter data will fail to
resolve the address -> symbol during "perf report" because perf tools
side uses PERF_RECORD_MISC_KERNEL to represent the host counter data.
Therefore, fix the trace imc sample data to use
PERF_RECORD_MISC_KERNEL as cpumode for host kernel information.

Fixes: 77ca3951cc37 ("powerpc/perf: Add kernel support for new MSR[HV PR] bits in trace-imc")
Signed-off-by: Athira Rajeev <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/perf: Fix crashes with generic_compat_pmu & BHRB

The bhrb_filter_map ("The Branch History Rolling Buffer") callback is
only defined in raw CPUs' power_pmu structs. The "architected" CPUs
use generic_compat_pmu, which does not have this callback, and crashes
occur if a user tries to enable branch stack for an event.

This add a NULL pointer check for bhrb_filter_map() which behaves as
if the callback returned an error.

This does not add the same check for config_bhrb() as the only caller
checks for cpuhw->bhrb_users which remains zero if bhrb_filter_map==0.

Fixes: be80e758d0c2 ("powerpc/perf: Add generic compat mode pmu driver")
Cc: [email protected] # v5.2+
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Madhavan Srinivasan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode

The recent commit 01eb01877f33 ("powerpc/64s: Fix restore_math
unnecessarily changing MSR") changed some of the handling of floating
point/vector restore.

In particular it caused current->thread.fpexc_mode to be copied into
the current MSR (via msr_check_and_set()), rather than just into
regs->msr (which is moved into MSR on return to userspace).

This can lead to a crash in the kernel if we take a floating point
exception when restoring FPSCR:

  Oops: Exception in kernel mode, sig: 8 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 3 PID: 101213 Comm: ld64.so.2 Not tainted 5.9.0-rc1-00098-g18445bf405cb-dirty #9
  NIP:  c00000000000fbb4 LR: c00000000001a7ac CTR: c000000000183570
  REGS: c0000016b7cfb3b0 TRAP: 0700   Not tainted  (5.9.0-rc1-00098-g18445bf405cb-dirty)
  MSR:  900000000290b933 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 44002444  XER: 00000000
  CFAR: c00000000001a7a8 IRQMASK: 1
  GPR00: c00000000001ae40 c0000016b7cfb640 c0000000011b7f00 c000001542a0f740
  GPR04: c000001542a0f720 c000001542a0eb00 0000000000000900 c000001542a0eb00
  GPR08: 000000000000000a 0000000000002000 9000000000009033 0000000000000000
  GPR12: 0000000000004000 c0000017ffffd900 0000000000000001 c000000000df5a58
  GPR16: c000000000e19c18 c0000000010e1123 0000000000000001 c000000000e1a638
  GPR20: 0000000000000000 c0000000044b1d00 0000000000000000 c000001542a0f2a0
  GPR24: 00000016c7fe0000 c000001542a0f720 c000000001c93da0 c000000000fe5f28
  GPR28: c000001542a0f720 0000000000800000 c0000016b7cfbe90 0000000002802900
  NIP load_fp_state+0x4/0x214
  LR  restore_math+0x17c/0x1f0
  Call Trace:
    0xc0000016b7cfb680 (unreliable)
    __switch_to+0x330/0x460
    __schedule+0x318/0x920
    schedule+0x74/0x140
    schedule_timeout+0x318/0x3f0
    wait_for_completion+0xc8/0x210
    call_usermodehelper_exec+0x234/0x280
    do_coredump+0xedc/0x13c0
    get_signal+0x1d4/0xbe0
    do_notify_resume+0x1a0/0x490
    interrupt_exit_user_prepare+0x1c4/0x230
    interrupt_return+0x14/0x1c0
  Instruction dump:
  ebe10168 e88101a0 7c8ff120 382101e0 e8010010 7c0803a6 4e800020 790605c4
  782905c4 7c0008a8 7c0008a8 c8030200 <fffe058e> 48000088 c8030000 c8230010

Fix it by only loading the fpexc_mode value into regs->msr.

Also add a comment to explain that although VSX is subject to the
value of fpexc_mode, we don't have to handle that separately because
we only allow VSX to be enabled if FP is also enabled.

Fixes: 01eb01877f33 ("powerpc/64s: Fix restore_math unnecessarily changing MSR")
Reported-by: Milton Miller <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/64s: scv entry should set PPR

Kernel entry sets PPR to HMT_MEDIUM by convention. The scv entry
path missed this.

Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions")
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Documentation/powerpc: fix malformed table in syscall64-abi

Fix malformed table warning in powerpc/syscall64-abi.rst by making
two tables and moving the headings.

Documentation/powerpc/syscall64-abi.rst:53: WARNING: Malformed table.
Text in column margin in table line 2.

  =========== ============= ========================================
  --- For the sc instruction, differences with the ELF ABI ---
  r0          Volatile      (System call number.)
  r3          Volatile      (Parameter 1, and return value.)
  r4-r8       Volatile      (Parameters 2-6.)
  cr0         Volatile      (cr0.SO is the return error condition.)
  cr1, cr5-7  Nonvolatile
  lr          Nonvolatile

  --- For the scv 0 instruction, differences with the ELF ABI ---
  r0          Volatile      (System call number.)
  r3          Volatile      (Parameter 1, and return value.)
  r4-r8       Volatile      (Parameters 2-6.)
  =========== ============= ========================================

Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions")
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

video: fbdev: controlfb: Fix build for COMPILE_TEST=y && PPC_PMAC=n

The build is currently broken, if COMPILE_TEST=y and PPC_PMAC=n:

  linux/drivers/video/fbdev/controlfb.c: In function ‘control_set_hardware’:
  linux/drivers/video/fbdev/controlfb.c:276:2: error: implicit declaration of function ‘btext_update_display’
    276 |  btext_update_display(p->frame_buffer_phys + CTRLFB_OFF,
        |  ^~~~~~~~~~~~~~~~~~~~

Fix it by including btext.h whenever CONFIG_BOOTX_TEXT is enabled.

Fixes: a07a63b0e24d ("video: fbdev: controlfb: add COMPILE_TEST support")
Signed-off-by: Michael Ellerman <[email protected]>
Acked-by: Bartlomiej Zolnierkiewicz <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

x86/irq: Unbreak interrupt affinity setting

Several people reported that 5.8 broke the interrupt affinity setting
mechanism.

The consolidation of the entry code reused the regular exception entry code
for device interrupts and changed the way how the vector number is conveyed
from ptregs->orig_ax to a function argument.

The low level entry uses the hardware error code slot to push the vector
number onto the stack which is retrieved from there into a function
argument and the slot on stack is set to -1.

The reason for setting it to -1 is that the error code slot is at the
position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax
indicates that the entry came via a syscall. If it's not set to a negative
value then a signal delivery on return to userspace would try to restart a
syscall. But there are other places which rely on pt_regs::orig_ax being a
valid indicator for syscall entry.

But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the
interrupt affinity setting mechanism, which was overlooked when this change
was made.

Moving interrupts on x86 happens in several steps. A new vector on a
different CPU is allocated and the relevant interrupt source is
reprogrammed to that. But that's racy and there might be an interrupt
already in flight to the old vector. So the old vector is preserved until
the first interrupt arrives on the new vector and the new target CPU. Once
that happens the old vector is cleaned up, but this cleanup still depends
on the vector number being stored in pt_regs::orig_ax, which is now -1.

That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector
always false. As a consequence the interrupt is moved once, but then it
cannot be moved anymore because the cleanup of the old vector never
happens.

There would be several ways to convey the vector information to that place
in the guts of the interrupt handling, but on deeper inspection it turned
out that this check is pointless and a leftover from the old affinity model
of X86 which supported multi-CPU affinities. Under this model it was
possible that an interrupt had an old and a new vector on the same CPU, so
the vector match was required.

Under the new model the effective affinity of an interrupt is always a
single CPU from the requested affinity mask. If the affinity mask changes
then either the interrupt stays on the CPU and on the same vector when that
CPU is still in the new affinity mask or it is moved to a different CPU, but
it is never moved to a different vector on the same CPU.

Ergo the cleanup check for the matching vector number is not required and
can be removed which makes the dependency on pt_regs:orig_ax go away.

The remaining check for new_cpu == smp_processsor_id() is completely
sufficient. If it matches then the interrupt was successfully migrated and
the cleanup can proceed.

For paranoia sake add a warning into the vector assignment code to
validate that the assumption of never moving to a different vector on
the same CPU holds.

Fixes: 633260fa143 ("x86/irq: Convey vector as argument and not in ptregs")
Reported-by: Alex bykov <[email protected]>
Reported-by: Avi Kivity <[email protected]>
Reported-by: Alexander Graf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Alexander Graf <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]

x86/hotplug: Silence APIC only after all interrupts are migrated

There is a race when taking a CPU offline. Current code looks like this:

native_cpu_disable()
{
...
apic_soft_disable();
/*
* Any existing set bits for pending interrupt to
* this CPU are preserved and will be sent via IPI
* to another CPU by fixup_irqs().
*/
cpu_disable_common();
{
....
/*
* Race window happens here. Once local APIC has been
* disabled any new interrupts from the device to
* the old CPU are lost
*/
fixup_irqs(); // Too late to capture anything in IRR.
...
}
}

The fix is to disable the APIC *after* cpu_disable_common().

Testing was done with a USB NIC that provided a source of frequent
interrupts. A script migrated interrupts to a specific CPU and
then took that CPU offline.

Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead")
Reported-by: Evan Green <[email protected]>
Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Mathias Nyman <[email protected]>
Tested-by: Evan Green <[email protected]>
Reviewed-by: Evan Green <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/r/[email protected]

USB: Ignore UAS for JMicron JMS567 ATA/ATAPI Bridge

This device does not support UAS properly and a similar entry already
exists in drivers/usb/storage/unusual_uas.h. Without this patch,
storage_probe() defers the handling of this device to UAS, which cannot
handle it either.

Tested-by: Brice Goglin <[email protected]>
Fixes: bc3bdb12bbb3 ("usb-storage: Disable UAS on JMicron SATA enclosure")
Acked-by: Alan Stern <[email protected]>
CC: <[email protected]>
Signed-off-by: Cyril Roelandt <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: host: ohci-exynos: Fix error handling in exynos_ohci_probe()

If the function platform_get_irq() failed, the negative value
returned will not be detected here. So fix error handling in
exynos_ohci_probe(). And when get irq failed, the function
platform_get_irq() logs an error message, so remove redundant
message here.

Fixes: 62194244cf87 ("USB: Add Samsung Exynos OHCI diver")
Signed-off-by: Zhang Shengju <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Tang Bin <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: gadget: u_f: Unbreak offset calculation in VLAs

Inadvertently the commit b1cd1b65afba ("USB: gadget: u_f: add overflow checks
to VLA macros") makes VLA macros to always return 0 due to different scope of
two variables of the same name. Obviously we need to have only one.

Fixes: b1cd1b65afba ("USB: gadget: u_f: add overflow checks to VLA macros")
Reported-by: Marek Szyprowski <[email protected]>
Tested-by: Marek Szyprowski <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Brooke Basile <[email protected]>
Cc: stable <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: quirks: Ignore duplicate endpoint on Sound Devices MixPre-D

The Sound Devices MixPre-D audio card suffers from the same defect
as the Sound Devices USBPre2: an endpoint shared between a normal
audio interface and a vendor-specific interface, in violation of the
USB spec. Since the USB core now treats duplicated endpoints as bugs
and ignores them, the audio endpoint isn't available and the card
can't be used for audio capture.

Along the same lines as commit bdd1b147b802 ("USB: quirks: blacklist
duplicate ep on Sound Devices USBPre2"), this patch adds a quirks
entry saying to ignore ep5in for interface 1, leaving it available for
use with standard audio interface 2.

Reported-and-tested-by: Jean-Christophe Barnoud <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
CC: <[email protected]>
Fixes: 3e4f8e21c4f2 ("USB: core: fix check for duplicate endpoints")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

dma-pool: Fix an uninitialized variable bug in atomic_pool_expand()

The "page" pointer can be used with out being initialized.

Fixes: d7e673ec2c8e ("dma-pool: Only allocate from CMA when in same memory zone")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

arm/xen: Add misuse warning to virt_to_gfn

As virt_to_gfn uses virt_to_phys, it will return invalid addresses when
used with vmalloc'd addresses. This patch introduces a warning, when
virt_to_gfn is used in this way.

Signed-off-by: Simon Leiner <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>

xen/xenbus: Fix granting of vmalloc'd memory

On some architectures (like ARM), virt_to_gfn cannot be used for
vmalloc'd memory because of its reliance on virt_to_phys. This patch
introduces a check for vmalloc'd addresses and obtains the PFN using
vmalloc_to_pfn in that case.

Signed-off-by: Simon Leiner <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>

XEN uses irqdesc::irq_data_common::handler_data to store a per interrupt
XEN data pointer which contains XEN specific information.

handler data is meant for interrupt handlers and not for storing irq chip
specific information as some devices require handler data to store internal
per interrupt information, e.g. pinctrl/GPIO chained interrupt handlers.

This obviously creates a conflict of interests and crashes the machine
because the XEN pointer is overwritten by the driver pointer.

As the XEN data is not handler specific it should be stored in
irqdesc::irq_data::chip_data instead.

A simple sed s/irq_[sg]et_handler_data/irq_[sg]et_chip_data/ cures that.

Cc: [email protected]
Reported-by: Roman Shaposhnik <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Roman Shaposhnik <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>

Merge tag 'amd-drm-fixes-5.9-2020-08-26' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

amd-drm-fixes-5.9-2020-08-26:

amdgpu:
- Misc display fixes
- Backlight fixes
- MPO fix for DCN1
- Fixes for Sienna Cichlid
- Fixes for Navy Flounder
- Vega SW CTF fixes
- SMU fix for Raven
- Fix a possible overflow in INFO ioctl
- Gfx10 clockgating fix

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-msm-fixes-2020-08-24' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

Some fixes for v5.9 plus the one opp/bandwidth scaling patch ("drm:
msm: a6xx: use dev_pm_opp_set_bw to scale DDR") which was not included
in the initial pull due to dependency on patch landing thru OPP tree

Signed-off-by: Dave Airlie <[email protected]>
From: Rob Clark <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/

Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes

Two fixes:
One fixes a bad interaction with the DRM scheduler, leading to some dma
fences not getting signalled after hitting the job timeout. The other
one fixes a GPU init regression, as apparently one old core doesn't
likes us reading some of the identification registers.

Signed-off-by: Dave Airlie <[email protected]>
From: Lucas Stach <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'exynos-drm-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes

One fixup
- Just drop __iommu annotation to fix sparse warning.

Signed-off-by: Dave Airlie <[email protected]>
From: Inki Dae <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

io_uring: clear req->result on IOPOLL re-issue

Make sure we clear req->result, which was set to -EAGAIN for retry
purposes, when moving it to the reissue list. Otherwise we can end up
retrying a request more than once, which leads to weird results in
the io-wq handling (and other spots).

Cc: [email protected]
Reported-by: Andres Freund <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

xfs: fix boundary test in xfs_attr_shortform_verify

The boundary test for the fixed-offset parts of xfs_attr_sf_entry in
xfs_attr_shortform_verify is off by one, because the variable array
at the end is defined as nameval[1] not nameval[].
Hence we need to subtract 1 from the calculation.

This can be shown by:

# touch file
# setfattr -n root.a file

and verifications will fail when it's written to disk.

This only matters for a last attribute which has a single-byte name
and no value, otherwise the combination of namelen & valuelen will
push endp further out and this test won't fail.

Fixes: 1e1bbd8e7ee06 ("xfs: create structure verifier function for shortform xattrs")
Signed-off-by: Eric Sandeen <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>

xfs: fix off-by-one in inode alloc block reservation calculation

The inode chunk allocation transaction reserves inobt_maxlevels-1
blocks to accommodate a full split of the inode btree. A full split
requires an allocation for every existing level and a new root
block, which means inobt_maxlevels is the worst case block
requirement for a transaction that inserts to the inobt. This can
lead to a transaction block reservation overrun when tmpfile
creation allocates an inode chunk and expands the inobt to its
maximum depth. This problem has been observed in conjunction with
overlayfs, which makes frequent use of tmpfiles internally.

The existing reservation code goes back as far as the Linux git repo
history (v2.6.12). It was likely never observed as a problem because
the traditional file/directory creation transactions also include
worst case block reservation for directory modifications, which most
likely is able to make up for a single block deficiency in the inode
allocation portion of the calculation. tmpfile support is relatively
more recent (v3.15), less heavily used, and only includes the inode
allocation block reservation as tmpfiles aren't linked into the
directory tree on creation.

Fix up the inode alloc block reservation macro and a couple of the
block allocator minleft parameters that enforce an allocation to
leave enough free blocks in the AG for a full inobt split.

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

xfs: finish dfops on every insert range shift iteration

The recent change to make insert range an atomic operation used the
incorrect transaction rolling mechanism. The explicit transaction
roll does not finish deferred operations. This means that intents
for rmapbt updates caused by extent shifts are not logged until the
final transaction commits. Thus if a crash occurs during an insert
range, log recovery might leave the rmapbt in an inconsistent state.
This was discovered by repeated runs of generic/455.

Update insert range to finish dfops on every shift iteration. This
is similar to collapse range and ensures that intents are logged
with the transactions that make associated changes.

Fixes: dd87f87d87fa ("xfs: rework insert range into an atomic operation")
Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

drm/amd/display: Fix memleak in amdgpu_dm_mode_config_init

When amdgpu_display_modeset_create_props() fails, state and
state->context should be freed to prevent memleak. It's the
same when amdgpu_dm_audio_init() fails.

Signed-off-by: Dinghao Liu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: disable runtime pm for navy_flounder

Disable runtime pm for navy_flounder temporarily.

Signed-off-by: Jiansong Chen <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Retry AUX write when fail occurs

[Why]
In dm_dp_aux_transfer() now, we forget to handle AUX_WR fail cases. We
suppose every write wil get done successfully and hence some AUX
commands might not sent out indeed.

[How]
Check if AUX_WR success. If not, retry it.

Signed-off-by: Wayne Lin <[email protected]>
Reviewed-by: Hersen Wu <[email protected]>
Acked-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: Fix buffer overflow in INFO ioctl

The values for "se_num" and "sh_num" come from the user in the ioctl.
They can be in the 0-255 range but if they're more than
AMDGPU_GFX_MAX_SE (4) or AMDGPU_GFX_MAX_SH_PER_SE (2) then it results in
an out of bounds read.

Reported-by: Dan Carpenter <[email protected]>
Acked-by: Dan Carpenter <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/powerplay: Fix hardmins not being sent to SMU for RV

[Why]
DC uses these to raise the voltage as needed for higher dispclk/dppclk
and to ensure that we have enough bandwidth to drive the displays.

There's a bug preventing these from actuially sending messages since
it's checking the actual clock (which is 0) instead of the incoming
clock (which shouldn't be 0) when deciding to send the hardmin.

[How]
Check the clocks != 0 instead of the actual clocks.

Fixes: 9ed9203c3ee7 ("drm/amd/powerplay: rv dal-pplib interface refactor powerplay part")
Signed-off-by: Nicholas Kazlauskas <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amdgpu: use MODE1 reset for navy_flounder by default

Switch default gpu reset method to MODE1 for navy_flounder.

Signed-off-by: Jiansong Chen <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: correct the thermal alert temperature limit settings

Do the maths in celsius degree. This can fix the issues caused
by the changes below:

drm/amd/pm: correct Vega20 swctf limit setting
drm/amd/pm: correct Vega12 swctf limit setting
drm/amd/pm: correct Vega10 swctf limit setting

Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amdgpu: add asd fw check before loading asd

asd is not ready for some ASICs in early stage, and psp->asd_fw is more generic than ASIC name in the check.

Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Reviewed-by: Jiansong Chen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Keep current gain when ABM disable immediately

[Why]
When system enters s3/s0i3, backlight PWM would set user level.

[How]
ABM disable function add keep current gain to avoid it.

Signed-off-by: Brandon Syu <[email protected]>
Reviewed-by: Josip Pavic <[email protected]>
Acked-by: Eryk Brol <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Fix passive dongle mistaken as active dongle in EDID emulation

[Why]
dongle_type is set during dongle connection but for passive dongles,
dongle_type is not set. If user starts with an active dongle and
then switches to a passive dongle, it will still report as an active
dongle. Trying to emulate the wrong connecter type results in display
not lighting up.

[How]
Set dpcd_caps.dongle_type for passive dongles in detect_dp().

Signed-off-by: Samson Tam <[email protected]>
Reviewed-by: Joshua Aberback <[email protected]>
Acked-by: Eryk Brol <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Revert HDCP disable sequence change

[Why]
Revert HDCP disable sequence change that blanks stream before
disabling HDCP. PSP and HW teams are currently investigating the
root cause of why HDCP cannot be disabled before stream blank,
which is expected to work without issues.

Signed-off-by: Jaehyun Chung <[email protected]>
Reviewed-by: Wenjing Liu <[email protected]>
Acked-by: Eryk Brol <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Send DISPLAY_OFF after power down on boot

[WHY]
update_clocks might not be called on headless adapters. This means
DISPLAY_OFF may not be sent in headless cases.

[HOW]
If hardware is powered down on boot because it is headless (mode set
does not happen on that adapter) also send DISPLAY_OFF notification.

Signed-off-by: Sung Lee <[email protected]>
Reviewed-by: Yongqiang Sun <[email protected]>
Acked-by: Eryk Brol <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/gfx10: refine mgcg setting

1. enable ENABLE_CGTS_LEGACY to fix specviewperf11 random hang.
2. remove obsolete RLC_CGTT_SCLK_OVERRIDE workaround.

Signed-off-by: Jiansong Chen <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/pm: correct Vega20 swctf limit setting

Correct the Vega20 thermal swctf limit.

Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/pm: correct Vega12 swctf limit setting

Correct the Vega12 thermal swctf limit.

Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/pm: correct Vega10 swctf limit setting

Correct the Vega10 thermal swctf limit.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1267

Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/pm: set VCN pg per instances

When deciding whether to set pg for vcn1, instances
number is more generic than chip name.

Signed-off-by: Jiansong Chen <[email protected]>
Reviewed-by: Tao Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: enable run_btc callback for sienna_cichlid

DC BTC support for sienna_cichlid is added, it provides
the DC tolerance and aging measurements.

Signed-off-by: Jiansong Chen <[email protected]>
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drivers: gpu: amd: Initialize amdgpu_dm_backlight_caps object to 0 in amdgpu_dm_update_backlight_caps

In `amdgpu_dm_update_backlight_caps()`, there is a local
`amdgpu_dm_backlight_caps` object that is filled in by
`amdgpu_acpi_get_backlight_caps()`. However, this object is
uninitialized before the call and hence the subsequent check for
aux_support can fail since it is not initialized by
`amdgpu_acpi_get_backlight_caps()` as well. This change initializes
this local `amdgpu_dm_backlight_caps` object to 0.

Reviewed-by: Christian König <[email protected]>
Signed-off-by: Furquan Shaikh <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: Reject overlay plane configurations in multi-display scenarios

[Why]
These aren't stable on some platform configurations when driving
multiple displays, especially on higher resolution.

In particular the delay in asserting p-state and validating from
x86 outweights any power or performance benefit from the hardware
composition.

Under some configurations this will manifest itself as extreme stutter
or unresponsiveness especially when combined with cursor movement.

[How]
Disable these for now. Exposing overlays to userspace doesn't guarantee
that they'll be able to use them in any and all configurations and it's
part of the DRM contract to have userspace gracefully handle validation
failures when they occur.

Valdiation occurs as part of DC and this in particular affects RV, so
disable this in dcn10_global_validation.

Signed-off-by: Nicholas Kazlauskas <[email protected]>
Reviewed-by: Hersen Wu <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/display: use correct scale for actual_brightness

Documentation for sysfs backlight level interface requires that
values in both 'brightness' and 'actual_brightness' files are
interpreted to be in range from 0 to the value given in the
'max_brightness' file.

With amdgpu, max_brightness gives 255, and values written by the user
into 'brightness' are internally rescaled to a wider range. However,
reading from 'actual_brightness' gives the raw register value without
inverse rescaling. This causes issues for various userspace tools such
as PowerTop and systemd that expect the value to be in the correct
range.

Introduce a helper to retrieve internal backlight range. Use it to
reimplement 'convert_brightness' as 'convert_brightness_from_user' and
introduce 'convert_brightness_to_user'.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=203905
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1242
Cc: Alex Deucher <[email protected]>
Cc: Nicholas Kazlauskas <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alexander Monakov <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/display: should check error using DC_OK

core_link_read_dpcd returns only DC_OK(1) and DC_ERROR_UNEXPECTED(-1),
the caller should check error using DC_OK instead of checking against 0

Signed-off-by: Tong Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

iio: dpot-dac: fix code comment in dpot_dac_read_raw()

After the replacement of the /* fall through */ comment with the
fallthrough pseudo-keyword macro, the natural reading of a code
comment was broken.

Fix the natural reading of such a comment and make it intelligible.

Reported-by: Peter Rosin <[email protected]>
Acked-by: Peter Rosin <[email protected]>
Signed-off-by: Gustavo A. R. Silva <[email protected]>

Merge tag 'tty-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
"Here are a few small TTY/Serial/vt fixes for 5.9-rc3

  Included in here are:
   - qcom serial fixes
   - vt ioctl and core bugfixes
   - pl011 serial driver fixes
   - 8250 serial driver fixes
   - other misc serial driver fixes

  and for good measure:
   - fbcon fix for syzbot found problem.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: serial: imx: add dependence and build for earlycon
  serial: samsung: Removes the IRQ not found warning
  serial: 8250: change lock order in serial8250_do_startup()
  serial: stm32: avoid kernel warning on absence of optional IRQ
  serial: pl011: Fix oops on -EPROBE_DEFER
  serial: pl011: Don't leak amba_ports entry on driver register error
  serial: 8250_exar: Fix number of ports for Commtech PCIe cards
  tty: serial: qcom_geni_serial: Drop __init from qcom_geni_console_setup
  serial: qcom_geni_serial: Fix recent kdb hang
  vt_ioctl: change VT_RESIZEX ioctl to check for error return from vc_resize()
  fbcon: prevent user font height or width change from causing potential out-of-bounds access
  vt: defer kfree() of vc_screenbuf in vc_do_resize()

Merge tag 'char-misc-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are some small char and misc and other driver subsystem fixes for
  5.9-rc3.

  The majority of these are tiny habanalabs driver fixes, but also in
  here are:

   - speakup build fixes now that it is out of staging and got exposed
     to more build systems all of a sudden

   - mei driver fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  habanalabs: correctly report inbound pci region cfg error
  habanalabs: check correct vmalloc return code
  habanalabs: validate FW file size
  habanalabs: fix incorrect check on failed workqueue create
  habanalabs: set max power according to card type
  habanalabs: proper handling of alloc size in coresight
  habanalabs: set clock gating according to mask
  habanalabs: verify user input in cs_ioctl_signal_wait
  habanalabs: Fix a loop in gaudi_extract_ecc_info()
  habanalabs: Fix memory corruption in debugfs
  habanalabs: validate packet id during CB parse
  habanalabs: Validate user address before mapping
  habanalabs: unmap PCI bars upon iATU failure
  mei: hdcp: fix mei_hdcp_verify_mprime() input parameter
  speakup: only build serialio when ISA is enabled
  speakup: Fix wait_for_xmitr for ttyio case

Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:
"Two patches from Vineeth to improve Hyper-V timesync facility"

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hv_utils: drain the timesync packets on onchannelcallback
hv_utils: return error if host timesysnc update is stale

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio bugfixes from Michael Tsirkin:
"A couple vdpa and vhost bugfixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa/mlx5: Avoid warnings about shifts on 32-bit platforms
  vhost-iotlb: fix vhost_iotlb_itree_next() documentation
  vdpa: ifcvf: free config irq in ifcvf_free_irq()
  vdpa: ifcvf: return err when fail to request config irq

io_uring: make offset == -1 consistent with preadv2/pwritev2

The man page for io_uring generally claims were consistent with what
preadv2 and pwritev2 accept, but turns out there's a slight discrepancy
in how offset == -1 is handled for pipes/streams. preadv doesn't allow
it, but preadv2 does. This currently causes io_uring to return -EINVAL
if that is attempted, but we should allow that as documented.

This change makes us consistent with preadv2/pwritev2 for just passing
in a NULL ppos for streams if the offset is -1.

Cc: [email protected] # v5.7+
Reported-by: Benedikt Ames <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers

Add an entry and email addresses for people at HPE who are supporting
Linux on the Superdome Flex (a.k.a) UV platform.

[ bp: Capitalize "linux" too :) ]

Signed-off-by: Steve Wahl <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

s390/vmem: fix vmem_add_range for 4-level paging

The kernel currently crashes if 4-level paging is used. Add missing
p4d_populate for just allocated pud entry.

Fixes: 3e0d3e408e63 ("s390/vmem: consolidate vmem_add_range() and vmem_remove_range()")
Reviewed-by: Gerald Schaefer <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390: don't trace preemption in percpu macros

Since commit a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context}
to per-cpu variables") the lockdep code itself uses percpu variables. This
leads to recursions because the percpu macros are calling preempt_enable()
which might call trace_preempt_on().

Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Vasily Gorbik <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

loop: Set correct device size when using LOOP_CONFIGURE

The device size calculation was done before processing the loop
configuration, which meant that the we set the size on the underlying
block device incorrectly in case lo_offset/lo_sizelimit were set in the
configuration. Delay computing the size until we've setup the device
parameters correctly.

Fixes: 3448914e8cc5("loop: Add LOOP_CONFIGURE ioctl")
Reported-by: Lennart Poettering <[email protected]>
Tested-by: Yang Xu <[email protected]>
Signed-off-by: Martijn Coenen <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

nbd: restore default timeout when setting it to zero

If we configured io timeout of nbd0 to 100s. Later after we
finished using it, we configured nbd0 again and set the io
timeout to 0. We expect it would timeout after 30 seconds
and keep retry. But in fact we could not change the timeout
when we set it to 0. the timeout is still the original 100s.

So change the timeout to default 30s when we set it to zero.
It also behaves same as commit 2da22da57348 ("nbd: fix zero
cmd timeout handling v2").

It becomes more important if we were reconfigure a nbd device
and the io timeout it set to zero. Because it could take 30s
to detect the new socket and thus io could be completed more
quickly compared to 100s.

Signed-off-by: Hou Pu <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

vdpa/mlx5: Avoid warnings about shifts on 32-bit platforms

Clang warns several times when building for 32-bit ARM along the lines
of:

drivers/vdpa/mlx5/net/mlx5_vnet.c:1462:31: warning: shift count >= width
of type [-Wshift-count-overflow]
ndev->mvdev.mlx_features |= BIT(VIRTIO_F_VERSION_1);
^~~~~~~~~~~~~~~~~~~~~~~

This is related to the BIT macro, which uses an unsigned long literal,
which is 32-bit on ARM so having a shift equal to or larger than 32 will
cause this warning, such as the above, where VIRTIO_F_VERSION_1 is 32.
To avoid this, use BIT_ULL, which will be an unsigned long long. This
matches the size of the features field throughout this driver, which is
u64 so there should be no functional change.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Link: https://github.com/ClangBuiltLinux/linux/issues/1140
Signed-off-by: Nathan Chancellor <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reported-by: Randy Dunlap <[email protected]>
Acked-by: Randy Dunlap <[email protected]> # build-tested
Acked-by: Eli Cohen <[email protected]>

vhost-iotlb: fix vhost_iotlb_itree_next() documentation

This patch contains trivial changes for the vhost_iotlb_itree_next()
documentation, fixing the function name and the description of
first argument (@map).

Signed-off-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>

vdpa: ifcvf: free config irq in ifcvf_free_irq()

We don't free config irq in ifcvf_free_irq() which will trigger a
BUG() in pci core since we try to free the vectors that has an
action. Fixing this by recording the config irq in ifcvf_hw structure
and free it in ifcvf_free_irq().

Fixes: e7991f376a4d ("ifcvf: implement config interrupt in IFCVF")
Cc: Zhu Lingshan <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Zhu Lingshan <[email protected]>
Fixes: e7991f376a4d ("ifcvf: implement config interrupt in IFCVF")
Cc: Zhu Lingshan <a class="moz-txt-link-rfc2396E" href="mailto:[email protected]"><[email protected]></a>
Signed-off-by: Jason Wang <a class="moz-txt-link-rfc2396E" href="mailto:[email protected]"><[email protected]></a>

vdpa: ifcvf: return err when fail to request config irq

We ignore the err of requesting config interrupt, fix this.

Fixes: e7991f376a4d ("ifcvf: implement config interrupt in IFCVF")
Cc: Zhu Lingshan <[email protected]>
Signed-off-by: Jason Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: Zhu Lingshan <[email protected]>
Fixes: e7991f376a4d ("ifcvf: implement config interrupt in IFCVF")
Cc: Zhu Lingshan <a class="moz-txt-link-rfc2396E" href="mailto:[email protected]"><[email protected]></a>
Signed-off-by: Jason Wang <a class="moz-txt-link-rfc2396E" href="mailto:[email protected]"><[email protected]></a>
Tested-by: Maxime Coquelin <[email protected]>

lockdep,trace: Expose tracepoints

The lockdep tracepoints are under the lockdep recursion counter, this
has a bunch of nasty side effects:

- TRACE_IRQFLAGS doesn't work across the entire tracepoint

- RCU-lockdep doesn't see the tracepoints either, hiding numerous
"suspicious RCU usage" warnings.

Pull the trace_lock_*() tracepoints completely out from under the
lockdep recursion handling and completely rely on the trace level
recusion handling -- also, tracing *SHOULD* not be taking locks in any
case.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: Marco Elver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

lockdep: Only trace IRQ edges

Problem:

  raw_local_irq_save(); // software state on
  local_irq_save(); // software state off
  ...
  local_irq_restore(); // software state still off, because we don't enable IRQs
  raw_local_irq_restore(); // software state still off, *whoopsie*

existing instances:

- lock_acquire()
     raw_local_irq_save()
     __lock_acquire()
       arch_spin_lock(&graph_lock)
         pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
           local_irq_save()

- trace_clock_global()
     raw_local_irq_save()
     arch_spin_lock()
       pv_wait() := kvm_wait()
local_irq_save()

- apic_retrigger_irq()
     raw_local_irq_save()
     apic->send_IPI() := default_send_IPI_single_phys()
       local_irq_save()

Possible solutions:

A) make it work by enabling the tracing inside raw_*()
B) make it work by keeping tracing disabled inside raw_*()
C) call it broken and clean it up now

Now, given that the only reason to use the raw_* variant is because you don't
want tracing. Therefore A) seems like a weird option (although it can be done).
C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just
because there is one raw user, this strips the validation/tracing off for all
the other users.

So we pick B) and declare any code that ends up doing:

raw_local_irq_save()
local_irq_save()
lockdep_assert_irqs_disabled();

broken. AFAICT this problem has existed forever, the only reason it came
up is because commit: 859d069ee1dd ("lockdep: Prepare for NMI IRQ
state tracking") changed IRQ tracing vs lockdep recursion and the
first instance is fairly common, the other cases hardly ever happen.

Signed-off-by: Nicholas Piggin <[email protected]>
[rewrote changelog]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: Marco Elver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

mips: Implement arch_irqs_disabled()

Cc: Thomas Bogendoerfer <[email protected]>
Cc: Paul Burton <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

arm64: Implement arch_irqs_disabled()

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

nds32: Implement arch_irqs_disabled()

Cc: Nick Hu <[email protected]>
Cc: Greentime Hu <[email protected]>
Cc: Vincent Chen <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: Marco Elver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

locking/lockdep: Cleanup

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: Marco Elver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

x86/entry: Remove unused THUNKs

Unused remnants

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: Marco Elver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

cpuidle: Move trace_cpu_idle() into generic code

Remove trace_cpu_idle() from the arch_cpu_idle() implementations and
put it in the generic code, right before disabling RCU. Gets rid of
more trace_*_rcuidle() users.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: Marco Elver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic

This allows moving the leave_mm() call into generic code before
rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: Marco Elver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

sched,idle,rcu: Push rcu_idle deeper into the idle path

Lots of things take locks, due to a wee bug, rcu_lockdep didn't notice
that the locking tracepoints were using RCU.

Push rcu_idle_{enter,exit}() as deep as possible into the idle paths,
this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.

Specifically, sched_clock_idle_wakeup_event() will use ktime which
will use seqlocks which will tickle lockdep, and
stop_critical_timings() uses lock.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: Marco Elver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

cpuidle: Fixup IRQ state

Match the pattern elsewhere in this file.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: Marco Elver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

lockdep: Use raw_cpu_*() for per-cpu variables

Sven reported that commit a21ee6055c30 ("lockdep: Change
hardirq{s_enabled,_context} to per-cpu variables") caused trouble on
s390 because their this_cpu_*() primitives disable preemption which
then lands back tracing.

On the one hand, per-cpu ops should use preempt_*able_notrace() and
raw_local_irq_*(), on the other hand, we can trivialy use raw_cpu_*()
ops for this.

Fixes: a21ee6055c30 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
Reported-by: Sven Schnelle <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Tested-by: Marco Elver <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

sched: Use __always_inline on is_idle_task()

is_idle_task() may be used from noinstr functions such as
irqentry_enter(). Since the compiler is free to not inline regular
inline functions, switch to using __always_inline.

Signed-off-by: Marco Elver <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

drm/exynos: gem: Fix sparse warning

kvaddr element of the exynos_gem object points to a memory buffer, thus
it should not have a __iomem annotation. Then, to avoid a warning or
casting on assignment to fbi structure, the screen_buffer element of the
union should be used instead of the screen_base.

Reported-by: kernel test robot <[email protected]>
Suggested-by: Sam Ravnborg <[email protected]>
Signed-off-by: Marek Szyprowski <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Reviewed-by: Sam Ravnborg <[email protected]>
Signed-off-by: Inki Dae <[email protected]>

Merge tag 'nfsd-5.9-1' of git://git.linux-nfs.org/projects/cel/cel-2.6

Pull nfs server fixes from Chuck Lever:

- Eliminate an oops introduced in v5.8

- Remove a duplicate #include added by nfsd-5.9

* tag 'nfsd-5.9-1' of git://git.linux-nfs.org/projects/cel/cel-2.6:
SUNRPC: remove duplicate include
nfsd: fix oops on mixed NFSv4/NFSv3 client access

Merge tag 'irqchip-fixes-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

Pull irqchip fixes from Marc Zyngier:

- Revert the wholesale conversion to platform drivers of the pdc, sysirq
   and cirq drivers, as it breaks a number of platforms even when the
   driver is built-in (probe ordering bites you).

- Prevent interrupt from being lost with the STM32 exti driver

- Fix wake-up interrupts for the MIPS Ingenic driver

- Fix an embarassing typo in the new module helpers, leading to the probe
   failing most of the time

- The promised TI firmware rework that couldn't make it into the merge
   window due to a very badly managed set of dependencies

Merge tag 'm68knommu-for-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

Pull m68knommu fix from Greg Ungerer:
"Only a single fix for the binfmt_flat loader (reverting a recent
change)"

* tag 'm68knommu-for-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
binfmt_flat: revert "binfmt_flat: don't offset the data start"

io_uring: ensure read requests go through -ERESTART* transformation

We need to call kiocb_done() for any ret < 0 to ensure that we always
get the proper -ERESTARTSYS (and friends) transformation done.

At some point this should be tied into general error handling, so we
can get rid of the various (mostly network) related commands that check
and perform this substitution.

Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'libnvdimm-fix-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Vishal Verma:
"A couple of minor fixes for things merged in 5.9-rc1.

  One is an out-of-bounds access caught by KASAN, and the second is a
  tweak to some overzealous logging about dax support even for
  traditional block devices which was unnecessary"

* tag 'libnvdimm-fix-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: do not print error message for non-persistent memory block device
  libnvdimm: KASAN: global-out-of-bounds Read in internal_create_group

io_uring: don't use poll handler if file can't be nonblocking read/written

There's no point in using the poll handler if we can't do a nonblocking
IO attempt of the operation, since we'll need to go async anyway. In
fact this is actively harmful, as reading from eg pipes won't return 0
to indicate EOF.

Cc: [email protected] # v5.7+
Reported-by: Benedikt Ames <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

- regression fix / revert of a commit that intended to reduce probing
   delay by ~50ms, but introduced a race that causes quite a few devices
   not to enumerate, or get stuck on first IRQ

- buffer overflow fix in hiddev, from Peilin Ye

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  Revert "HID: usbhid: do not sleep when opening device"
  HID: hiddev: Fix slab-out-of-bounds write in hiddev_ioctl_usage()
  HID: quirks: Always poll three more Lenovo PixArt mice
  HID: i2c-hid: Always sleep 60ms after I2C_HID_PWR_ON commands
  HID: macally: Constify macally_id_table
  HID: cougar: Constify cougar_id_table

io_uring: fix imbalanced sqo_mm accounting

We do the initial accounting of locked_vm and pinned_vm before we have
setup ctx->sqo_mm, which means we can end up having not accounted the
memory at setup time, but still decrement it when we exit. This causes
an imbalance in the accounting.

Setup ctx->sqo_mm earlier in io_uring_create(), before we do the first
accounting of mm->{locked,pinned}_vm. This also unifies the state
grabbing for the ctx, and eliminates a failure case in
io_sq_offload_start().

Fixes: f74441e6311a ("io_uring: account locked memory before potential error case")
Reported-by: Robert M. Muncrief <[email protected]>
Reported-by: Niklas Schnelle <[email protected]>
Tested-by: Niklas Schnelle <[email protected]>
Tested-by: Robert M. Muncrief <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

PM: sleep: core: Fix the handling of pending runtime resume requests

It has been reported that system-wide suspend may be aborted in the
absence of any wakeup events due to unforseen interactions of it with
the runtume PM framework.

One failing scenario is when there are multiple devices sharing an
ACPI power resource and runtime-resume needs to be carried out for
one of them during system-wide suspend (for example, because it needs
to be reconfigured before the whole system goes to sleep).  In that
case, the runtime-resume of that device involves turning the ACPI
power resource "on" which in turn causes runtime-resume requests
to be queued up for all of the other devices sharing it.  Those
requests go to the runtime PM workqueue which is frozen during
system-wide suspend, so they are not actually taken care of until
the resume of the whole system, but the pm_runtime_barrier()
call in __device_suspend() sees them and triggers system wakeup
events for them which then cause the system-wide suspend to be
aborted if wakeup source objects are in active use.

Of course, the logic that leads to triggering those wakeup events is
questionable in the first place, because clearly there are cases in
which a pending runtime resume request for a device is not connected
to any real wakeup events in any way (like the one above).  Moreover,
it is racy, because the device may be resuming already by the time
the pm_runtime_barrier() runs and so if the driver doesn't take care
of signaling the wakeup event as appropriate, it will be lost.
However, if the driver does take care of that, the extra
pm_wakeup_event() call in the core is redundant.

Accordingly, drop the conditional pm_wakeup_event() call fron
__device_suspend() and make the latter call pm_runtime_barrier()
alone.  Also modify the comment next to that call to reflect the new
code and extend it to mention the need to avoid unwanted interactions
between runtime PM and system-wide device suspend callbacks.

Fixes: 1e2ef05bb8cf8 ("PM: Limit race conditions between runtime PM and system sleep (v2)")
Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Alan Stern <[email protected]>
Reported-by: Utkarsh H Patel <[email protected]>
Tested-by: Utkarsh H Patel <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Cc: All applicable <[email protected]>

ACPI: OSL: Prevent acpi_release_memory() from returning too early

After commit 1757659d022b ("ACPI: OSL: Implement deferred unmapping
of ACPI memory") in some cases acpi_release_memory() may return
before the target memory mappings actually go away, because they
are released asynchronously now.

Prevent it from returning prematurely by making it wait for the next
RCU grace period to elapse, for all of the RCU callbacks to complete
and for all of the scheduled work items to be flushed before
returning.

Fixes: 1757659d022b ("ACPI: OSL: Implement deferred unmapping of ACPI memory")
Reported-by: Kenneth R. Crudup <[email protected]>
Tested-by: Kenneth R. Crudup <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Heikki Krogerus <[email protected]>
Reviewed-by: Mika Westerberg <[email protected]>

usb: typec: tcpm: Fix Fix source hard reset response for TDA 2.3.1.1 and TDA 2.3.1.2 failures

The patch addresses the compliance test failures while running  TDA
2.3.1.1 and  TDA 2.3.1.2 of the "PD Communications Engine USB PD
Compliance MOI" test plan published in https://www.usb.org/usbc.
For a product to be Type-C compliant, it's expected that these tests
are run on usb.org certified Type-C compliance tester as mentioned in
https://www.usb.org/usbc.

While the purpose of TDA 2.3.1.1 and  TDA 2.3.1.2 is to verify that
the static and dynamic electrical capabilities of a Source meet the
requirements for each PDO offered,  while doing so, the tests also
monitor that the timing of the VBUS waveform versus the messages meets
the requirements for Hard Reset defined in PROT-PROC-HR-TSTR as
mentioned in step 11 of TDA.2.3.1.1 and step 15 of TDA.2.3.1.2.

TDB.2.2.13.1: PROT-PROC-HR-TSTR Procedure and Checks for Tester
Originated Hard Reset
Purpose: To perform the appropriate protocol checks relating to any
circumstance in which the Hard Reset signal is sent by the Tester.

UUT is behaving as source:
The Tester sends a Hard Reset signal.
1. Check VBUS stays within present valid voltage range for
tPSHardReset min (25ms) after last bit of Hard Reset signal.
[PROT_PROC_HR_TSTR_1]
2. Check that VBUS starts to fall below present valid voltage range by
tPSHardReset max (35ms). [PROT_PROC_HR_TSTR_2]
3. Check that VBUS reaches vSafe0V within tSafe0v max (650 ms).
[PROT_PROC_HR_TSTR_3]
4. Check that VBUS starts rising to vSafe5V after a delay of
tSrcRecover (0.66s - 1s) from reaching vSafe0V. [PROT_PROC_HR_TSTR_4]
5. Check that VBUS reaches vSafe5V within tSrcTurnOn max (275ms) of
rising above vSafe0v max (0.8V). [PROT_PROC_HR_TSTR_5] Power Delivery
Compliance Plan 139 6. Check that Source Capabilities are finished
sending within tFirstSourceCap max (250ms) of VBUS reaching vSafe5v
min. [PROT_PROC_HR_TSTR_6].

This is in line with 7.1.5 Response to Hard Resets of the USB Power
Delivery Specification Revision 3.0, Version 1.2,
"Hard Reset Signaling indicates a communication failure has occurred
and the Source Shall stop driving VCONN, Shall remove Rp from the
VCONN pin and Shall drive VBUS to vSafe0V as shown in Figure 7-9. The
USB connection May reset during a Hard Reset since the VBUS voltage
will be less than vSafe5V for an extended period of time. After
establishing the vSafe0V voltage condition on VBUS, the Source Shall
wait tSrcRecover before re-applying VCONN and restoring VBUS to
vSafe5V. A Source Shall conform to the VCONN timing as specified in
[USB Type-C 1.3]."

With the above guidelines from the spec in mind, TCPM does not turn
off VCONN while entering SRC_HARD_RESET_VBUS_OFF. The patch makes TCPM
turn off VCONN while entering SRC_HARD_RESET_VBUS_OFF and turn it back
on while entering SRC_HARD_RESET_VBUS_ON along with vbus instead of
having VCONN on through hardreset.

Also, the spec clearly states that "After establishing the vSafe0V
voltage condition on VBUS",  the Source Shall wait tSrcRecover before
re-applying VCONN and restoring VBUS to vSafe5V.
TCPM does not conform to this requirement. If the TCPC driver calls
tcpm_vbus_change with vbus off signal, TCPM right away enters
SRC_HARD_RESET_VBUS_ON without waiting for tSrcRecover.
For TCPC's which are buggy/does not call tcpm_vbus_change, TCPM
assumes that the vsafe0v is instantaneous as TCPM only waits
tSrcRecover instead of waiting for tSafe0v + tSrcRecover.
This patch also fixes this behavior by making sure that TCPM waits for
tSrcRecover before transitioning into SRC_HARD_RESET_VBUS_ON when
tcpm_vbus_change is called by TCPC.
When TCPC does not call tcpm_vbus_change, TCPM assumes the worst case
i.e.  tSafe0v + tSrcRecover before transitioning into
SRC_HARD_RESET_VBUS_ON.

Signed-off-by: Badhri Jagan Sridharan <[email protected]>
Reviewed-by: Guenter Roeck <[email protected]>
Reviewed-by: Heikki Krogerus <[email protected]>
Cc: stable <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: PHY: JZ4770: Fix static checker warning.

The commit 2a6c0b82e651 ("USB: PHY: JZ4770: Add support for new
Ingenic SoCs.") introduced the initialization function for different
chips, but left the relevant code involved in the resetting process
in the original function, resulting in uninitialized variable calls.

Fixes: 2a6c0b82e651 ("USB: PHY: JZ4770: Add support for new Ingenic SoCs.").
Signed-off-by: 周琰杰 (Zhou Yanjie) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: gadget: f_ncm: add bounds checks to ncm_unwrap_ntb()

Some values extracted by ncm_unwrap_ntb() could possibly lead to several
different out of bounds reads of memory. Specifically the values passed
to netdev_alloc_skb_ip_align() need to be checked so that memory is not
overflowed.

Resolve this by applying bounds checking to a number of different
indexes and lengths of the structure parsing logic.

Reported-by: Ilja Van Sprundel <[email protected]>
Signed-off-by: Brooke Basile <[email protected]>
Acked-by: Felipe Balbi <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: gadget: u_f: add overflow checks to VLA macros

size can potentially hold an overflowed value if its assigned expression
is left unchecked, leading to a smaller than needed allocation when
vla_group_size() is used by callers to allocate memory.
To fix this, add a test for saturation before declaring variables and an
overflow check to (n) * sizeof(type).
If the expression results in overflow, vla_group_size() will return SIZE_MAX.

Reported-by: Ilja Van Sprundel <[email protected]>
Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Brooke Basile <[email protected]>
Acked-by: Felipe Balbi <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

io_uring: revert consumed iov_iter bytes on error

Some consumers of the iov_iter will return an error, but still have
bytes consumed in the iterator. This is an issue for -EAGAIN, since we
rely on a sane iov_iter state across retries.

Fix this by ensuring that we revert consumed bytes, if any, if the file
operations have consumed any bytes from iterator. This is similar to what
generic_file_read_iter() does, and is always safe as we have the previous
bytes count handy already.

Fixes: ff6165b2d7f6 ("io_uring: retain iov_iter state over io_read/io_write calls")
Reported-by: Dmitry Shulyak <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>