Git Repo - linux.git/log

btrfs: fix lockdep splat when reading qgroup config on mount

Lockdep reported the following splat when running test btrfs/190 from
fstests:

  [ 9482.126098] ======================================================
  [ 9482.126184] WARNING: possible circular locking dependency detected
  [ 9482.126281] 5.10.0-rc4-btrfs-next-73 #1 Not tainted
  [ 9482.126365] ------------------------------------------------------
  [ 9482.126456] mount/24187 is trying to acquire lock:
  [ 9482.126534] ffffa0c869a7dac0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}, at: qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.126647]
but task is already holding lock:
  [ 9482.126777] ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.126886]
which lock already depends on the new lock.

  [ 9482.127078]
the existing dependency chain (in reverse order) is:
  [ 9482.127213]
-> #1 (btrfs-quota-00){++++}-{3:3}:
  [ 9482.127366]        lock_acquire+0xd8/0x490
  [ 9482.127436]        down_read_nested+0x45/0x220
  [ 9482.127528]        __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.127613]        btrfs_read_lock_root_node+0x41/0x130 [btrfs]
  [ 9482.127702]        btrfs_search_slot+0x514/0xc30 [btrfs]
  [ 9482.127788]        update_qgroup_status_item+0x72/0x140 [btrfs]
  [ 9482.127877]        btrfs_qgroup_rescan_worker+0xde/0x680 [btrfs]
  [ 9482.127964]        btrfs_work_helper+0xf1/0x600 [btrfs]
  [ 9482.128039]        process_one_work+0x24e/0x5e0
  [ 9482.128110]        worker_thread+0x50/0x3b0
  [ 9482.128181]        kthread+0x153/0x170
  [ 9482.128256]        ret_from_fork+0x22/0x30
  [ 9482.128327]
-> #0 (&fs_info->qgroup_rescan_lock){+.+.}-{3:3}:
  [ 9482.128464]        check_prev_add+0x91/0xc60
  [ 9482.128551]        __lock_acquire+0x1740/0x3110
  [ 9482.128623]        lock_acquire+0xd8/0x490
  [ 9482.130029]        __mutex_lock+0xa3/0xb30
  [ 9482.130590]        qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.131577]        btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
  [ 9482.132175]        open_ctree+0x1228/0x18a0 [btrfs]
  [ 9482.132756]        btrfs_mount_root.cold+0x13/0xed [btrfs]
  [ 9482.133325]        legacy_get_tree+0x30/0x60
  [ 9482.133866]        vfs_get_tree+0x28/0xe0
  [ 9482.134392]        fc_mount+0xe/0x40
  [ 9482.134908]        vfs_kern_mount.part.0+0x71/0x90
  [ 9482.135428]        btrfs_mount+0x13b/0x3e0 [btrfs]
  [ 9482.135942]        legacy_get_tree+0x30/0x60
  [ 9482.136444]        vfs_get_tree+0x28/0xe0
  [ 9482.136949]        path_mount+0x2d7/0xa70
  [ 9482.137438]        do_mount+0x75/0x90
  [ 9482.137923]        __x64_sys_mount+0x8e/0xd0
  [ 9482.138400]        do_syscall_64+0x33/0x80
  [ 9482.138873]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 9482.139346]
other info that might help us debug this:

  [ 9482.140735]  Possible unsafe locking scenario:

  [ 9482.141594]        CPU0                    CPU1
  [ 9482.142011]        ----                    ----
  [ 9482.142411]   lock(btrfs-quota-00);
  [ 9482.142806]                                lock(&fs_info->qgroup_rescan_lock);
  [ 9482.143216]                                lock(btrfs-quota-00);
  [ 9482.143629]   lock(&fs_info->qgroup_rescan_lock);
  [ 9482.144056]
  *** DEADLOCK ***

  [ 9482.145242] 2 locks held by mount/24187:
  [ 9482.145637]  #0: ffffa0c8411c40e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xb9/0x400
  [ 9482.146061]  #1: ffffa0c892ebd3a0 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.146509]
stack backtrace:
  [ 9482.147350] CPU: 1 PID: 24187 Comm: mount Not tainted 5.10.0-rc4-btrfs-next-73 #1
  [ 9482.147788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  [ 9482.148709] Call Trace:
  [ 9482.149169]  dump_stack+0x8d/0xb5
  [ 9482.149628]  check_noncircular+0xff/0x110
  [ 9482.150090]  check_prev_add+0x91/0xc60
  [ 9482.150561]  ? kvm_clock_read+0x14/0x30
  [ 9482.151017]  ? kvm_sched_clock_read+0x5/0x10
  [ 9482.151470]  __lock_acquire+0x1740/0x3110
  [ 9482.151941]  ? __btrfs_tree_read_lock+0x27/0x120 [btrfs]
  [ 9482.152402]  lock_acquire+0xd8/0x490
  [ 9482.152887]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.153354]  __mutex_lock+0xa3/0xb30
  [ 9482.153826]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.154301]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.154768]  ? qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.155226]  qgroup_rescan_init+0x43/0xf0 [btrfs]
  [ 9482.155690]  btrfs_read_qgroup_config+0x43a/0x550 [btrfs]
  [ 9482.156160]  open_ctree+0x1228/0x18a0 [btrfs]
  [ 9482.156643]  btrfs_mount_root.cold+0x13/0xed [btrfs]
  [ 9482.157108]  ? rcu_read_lock_sched_held+0x5d/0x90
  [ 9482.157567]  ? kfree+0x31f/0x3e0
  [ 9482.158030]  legacy_get_tree+0x30/0x60
  [ 9482.158489]  vfs_get_tree+0x28/0xe0
  [ 9482.158947]  fc_mount+0xe/0x40
  [ 9482.159403]  vfs_kern_mount.part.0+0x71/0x90
  [ 9482.159875]  btrfs_mount+0x13b/0x3e0 [btrfs]
  [ 9482.160335]  ? rcu_read_lock_sched_held+0x5d/0x90
  [ 9482.160805]  ? kfree+0x31f/0x3e0
  [ 9482.161260]  ? legacy_get_tree+0x30/0x60
  [ 9482.161714]  legacy_get_tree+0x30/0x60
  [ 9482.162166]  vfs_get_tree+0x28/0xe0
  [ 9482.162616]  path_mount+0x2d7/0xa70
  [ 9482.163070]  do_mount+0x75/0x90
  [ 9482.163525]  __x64_sys_mount+0x8e/0xd0
  [ 9482.163986]  do_syscall_64+0x33/0x80
  [ 9482.164437]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 9482.164902] RIP: 0033:0x7f51e907caaa

This happens because at btrfs_read_qgroup_config() we can call
qgroup_rescan_init() while holding a read lock on a quota btree leaf,
acquired by the previous call to btrfs_search_slot_for_read(), and
qgroup_rescan_init() acquires the mutex qgroup_rescan_lock.

A qgroup rescan worker does the opposite: it acquires the mutex
qgroup_rescan_lock, at btrfs_qgroup_rescan_worker(), and then tries to
update the qgroup status item in the quota btree through the call to
update_qgroup_status_item(). This inversion of locking order
between the qgroup_rescan_lock mutex and quota btree locks causes the
splat.

Fix this simply by releasing and freeing the path before calling
qgroup_rescan_init() at btrfs_read_qgroup_config().

CC: [email protected] # 4.4+
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: tree-checker: add missing returns after data_ref alignment checks

There are sectorsize alignment checks that are reported but then
check_extent_data_ref continues. This was not intended, wrong alignment
is not a minor problem and we should return with error.

CC: [email protected] # 5.4+
Fixes: 0785a9aacf9d ("btrfs: tree-checker: Add EXTENT_DATA_REF check")
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: don't access possibly stale fs_info data for printing duplicate device

Syzbot reported a possible use-after-free when printing a duplicate device
warning device_list_add().

At this point it can happen that a btrfs_device::fs_info is not correctly
setup yet, so we're accessing stale data, when printing the warning
message using the btrfs_printk() wrappers.

  ==================================================================
  BUG: KASAN: use-after-free in btrfs_printk+0x3eb/0x435 fs/btrfs/super.c:245
  Read of size 8 at addr ffff8880878e06a8 by task syz-executor225/7068

  CPU: 1 PID: 7068 Comm: syz-executor225 Not tainted 5.9.0-rc5-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x1d6/0x29e lib/dump_stack.c:118
   print_address_description+0x66/0x620 mm/kasan/report.c:383
   __kasan_report mm/kasan/report.c:513 [inline]
   kasan_report+0x132/0x1d0 mm/kasan/report.c:530
   btrfs_printk+0x3eb/0x435 fs/btrfs/super.c:245
   device_list_add+0x1a88/0x1d60 fs/btrfs/volumes.c:943
   btrfs_scan_one_device+0x196/0x490 fs/btrfs/volumes.c:1359
   btrfs_mount_root+0x48f/0xb60 fs/btrfs/super.c:1634
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x88/0x270 fs/super.c:1547
   fc_mount fs/namespace.c:978 [inline]
   vfs_kern_mount+0xc9/0x160 fs/namespace.c:1008
   btrfs_mount+0x33c/0xae0 fs/btrfs/super.c:1732
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x88/0x270 fs/super.c:1547
   do_new_mount fs/namespace.c:2875 [inline]
   path_mount+0x179d/0x29e0 fs/namespace.c:3192
   do_mount fs/namespace.c:3205 [inline]
   __do_sys_mount fs/namespace.c:3413 [inline]
   __se_sys_mount+0x126/0x180 fs/namespace.c:3390
   do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x44840a
  RSP: 002b:00007ffedfffd608 EFLAGS: 00000293 ORIG_RAX: 00000000000000a5
  RAX: ffffffffffffffda RBX: 00007ffedfffd670 RCX: 000000000044840a
  RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffedfffd630
  RBP: 00007ffedfffd630 R08: 00007ffedfffd670 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000001a
  R13: 0000000000000004 R14: 0000000000000003 R15: 0000000000000003

  Allocated by task 6945:
   kasan_save_stack mm/kasan/common.c:48 [inline]
   kasan_set_track mm/kasan/common.c:56 [inline]
   __kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461
   kmalloc_node include/linux/slab.h:577 [inline]
   kvmalloc_node+0x81/0x110 mm/util.c:574
   kvmalloc include/linux/mm.h:757 [inline]
   kvzalloc include/linux/mm.h:765 [inline]
   btrfs_mount_root+0xd0/0xb60 fs/btrfs/super.c:1613
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x88/0x270 fs/super.c:1547
   fc_mount fs/namespace.c:978 [inline]
   vfs_kern_mount+0xc9/0x160 fs/namespace.c:1008
   btrfs_mount+0x33c/0xae0 fs/btrfs/super.c:1732
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x88/0x270 fs/super.c:1547
   do_new_mount fs/namespace.c:2875 [inline]
   path_mount+0x179d/0x29e0 fs/namespace.c:3192
   do_mount fs/namespace.c:3205 [inline]
   __do_sys_mount fs/namespace.c:3413 [inline]
   __se_sys_mount+0x126/0x180 fs/namespace.c:3390
   do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Freed by task 6945:
   kasan_save_stack mm/kasan/common.c:48 [inline]
   kasan_set_track+0x3d/0x70 mm/kasan/common.c:56
   kasan_set_free_info+0x17/0x30 mm/kasan/generic.c:355
   __kasan_slab_free+0xdd/0x110 mm/kasan/common.c:422
   __cache_free mm/slab.c:3418 [inline]
   kfree+0x113/0x200 mm/slab.c:3756
   deactivate_locked_super+0xa7/0xf0 fs/super.c:335
   btrfs_mount_root+0x72b/0xb60 fs/btrfs/super.c:1678
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x88/0x270 fs/super.c:1547
   fc_mount fs/namespace.c:978 [inline]
   vfs_kern_mount+0xc9/0x160 fs/namespace.c:1008
   btrfs_mount+0x33c/0xae0 fs/btrfs/super.c:1732
   legacy_get_tree+0xea/0x180 fs/fs_context.c:592
   vfs_get_tree+0x88/0x270 fs/super.c:1547
   do_new_mount fs/namespace.c:2875 [inline]
   path_mount+0x179d/0x29e0 fs/namespace.c:3192
   do_mount fs/namespace.c:3205 [inline]
   __do_sys_mount fs/namespace.c:3413 [inline]
   __se_sys_mount+0x126/0x180 fs/namespace.c:3390
   do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  The buggy address belongs to the object at ffff8880878e0000
   which belongs to the cache kmalloc-16k of size 16384
  The buggy address is located 1704 bytes inside of
   16384-byte region [ffff8880878e0000, ffff8880878e4000)
  The buggy address belongs to the page:
  page:0000000060704f30 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x878e0
  head:0000000060704f30 order:3 compound_mapcount:0 compound_pincount:0
  flags: 0xfffe0000010200(slab|head)
  raw: 00fffe0000010200 ffffea00028e9a08 ffffea00021e3608 ffff8880aa440b00
  raw: 0000000000000000 ffff8880878e0000 0000000100000001 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
   ffff8880878e0580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff8880878e0600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  >ffff8880878e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    ^
   ffff8880878e0700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff8880878e0780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ==================================================================

The syzkaller reproducer for this use-after-free crafts a filesystem image
and loop mounts it twice in a loop. The mount will fail as the crafted
image has an invalid chunk tree. When this happens btrfs_mount_root() will
call deactivate_locked_super(), which then cleans up fs_info and
fs_info::sb. If a second thread now adds the same block-device to the
filesystem, it will get detected as a duplicate device and
device_list_add() will reject the duplicate and print a warning. But as
the fs_info pointer passed in is non-NULL this will result in a
use-after-free.

Instead of printing possibly uninitialized or already freed memory in
btrfs_printk(), explicitly pass in a NULL fs_info so the printing of the
device name will be skipped altogether.

There was a slightly different approach discussed in
https://lore.kernel.org/linux-btrfs/20200114060920 [email protected]/t/#u

Link: https://lore.kernel.org/linux-btrfs/[email protected]
Reported-by: [email protected]
CC: [email protected] # 4.19+
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

Merge tag 'misc-habanalabs-fixes-2020-11-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-linus

Oded writes:

This tag contains the following habanalabs driver fix for 5.10-rc6:

- Add missing statements and break; in case switch of ECC handling. Without
this fix, the handling of that interrupt will be erroneous.

* tag 'misc-habanalabs-fixes-2020-11-23' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux:
habanalabs/gaudi: fix missing code in ECC handling

ASoC: qcom: Fix enabling BCLK and LRCLK in LPAIF invalid state

Fix enabling BCLK and LRCLK only when LPAIF is invalid state and
bit clock in enable state.
In device suspend/resume scenario LPAIF is going to reset state.
which is causing LRCLK disable and BCLK enable.
Avoid such inconsitency by removing unnecessary cpu dai prepare API,
which is doing LRCLK enable, and by maintaining BLCK state information.

Fixes: 7e6799d8f87d ("ASoC: qcom: lpass-cpu: Enable MI2S BCLK and LRCLK together")
Signed-off-by: V Sujith Kumar Reddy <[email protected]>
Signed-off-by: Srinivasa Rao Mandadapu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

habanalabs/gaudi: fix missing code in ECC handling

There is missing statement and missing "break;" in the ECC handling
code in gaudi.c
This will cause a wrong behavior upon certain ECC interrupts.

Signed-off-by: Oded Gabbay <[email protected]>

drm/vc4: kms: Don't disable the muxing of an active CRTC

The current HVS muxing code will consider the CRTCs in a given state to
setup their muxing in the HVS, and disable the other CRTCs muxes.

However, it's valid to only update a single CRTC with a state, and in this
situation we would mux out a CRTC that was enabled but left untouched by
the new state.

Fix this by setting a flag on the CRTC state when the muxing has been
changed, and only change the muxing configuration when that flag is there.

Fixes: 87ebcd42fb7b ("drm/vc4: crtc: Assign output to channel automatically")
Signed-off-by: Maxime Ripard <[email protected]>
Tested-by: Hoegeun Kwon <[email protected]>
Reviewed-by: Hoegeun Kwon <[email protected]>
Reviewed-by: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/vc4: kms: Store the unassigned channel list in the state

If a CRTC is enabled but not active, and that we're then doing a page
flip on another CRTC, drm_atomic_get_crtc_state will bring the first
CRTC state into the global state, and will make us wait for its vblank
as well, even though that might never occur.

Instead of creating the list of the free channels each time atomic_check
is called, and calling drm_atomic_get_crtc_state to retrieve the
allocated channels, let's create a private state object in the main
atomic state, and use it to store the available channels.

Since vc4 has a semaphore (with a value of 1, so a lock) in its commit
implementation to serialize all the commits, even the nonblocking ones, we
are free from the use-after-free race if two subsequent commits are not ran
in their submission order.

Fixes: 87ebcd42fb7b ("drm/vc4: crtc: Assign output to channel automatically")
Signed-off-by: Maxime Ripard <[email protected]>
Tested-by: Hoegeun Kwon <[email protected]>
Reviewed-by: Hoegeun Kwon <[email protected]>
Reviewed-by: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'icc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-linus

Georgi writes:

interconnect fixes for v5.10

This contains a few driver fixes and one core fix:
- Fix an excessive of_node_put() in the core.
- Fix boot regression and integer overflow on msm8974 platforms.
- Fix a minor issue on qcs404 and msm8916 platforms.

Signed-off-by: Georgi Djakov <[email protected]>
* tag 'icc-5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
  interconnect: fix memory trashing in of_count_icc_providers()
  interconnect: qcom: qcs404: Remove GPU and display RPM IDs
  interconnect: qcom: msm8916: Remove rpm-ids from non-RPM nodes
  interconnect: qcom: msm8974: Don't boost the NoC rate during boot
  interconnect: qcom: msm8974: Prevent integer overflow in rate

Merge tag 'v5.10-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes

Fixed ordering for MMC devices on rk3399, due to a mmc change jumbling
all ordering, a fix to make the Odroig Go Advance actually power down
and using the correct clock name on the NanoPi R2S.

* tag 'v5.10-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
  arm64: dts: rockchip: Reorder LED triggers from mmc devices on rk3399-roc-pc.
  arm64: dts: rockchip: Assign a fixed index to mmc devices on rk3399 boards.
  arm64: dts: rockchip: Remove system-power-controller from pmic on Odroid Go Advance
  arm64: dts: rockchip: fix NanoPi R2S GMAC clock name

Link: https://lore.kernel.org/r/11641389.O9o76ZdvQC@phil
Signed-off-by: Arnd Bergmann <[email protected]>

arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()

With hardware dirty bit management, calling pte_wrprotect() on a writable,
dirty PTE will lose the dirty state and return a read-only, clean entry.

Move the logic from ptep_set_wrprotect() into pte_wrprotect() to ensure that
the dirty bit is preserved for writable entries, as this is required for
soft-dirty bit management if we enable it in the future.

Cc: <[email protected]>
Fixes: 2f4b829c625e ("arm64: Add support for hardware updates of the access and dirty pte bits")
Reviewed-by: Catalin Marinas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

arm64: pgtable: Fix pte_accessible()

pte_accessible() is used by ptep_clear_flush() to figure out whether TLB
invalidation is necessary when unmapping pages for reclaim. Although our
implementation is correct according to the architecture, returning true
only for valid, young ptes in the absence of racing page-table
modifications, this is in fact flawed due to lazy invalidation of old
ptes in ptep_clear_flush_young() where we elide the expensive DSB
instruction for completing the TLB invalidation.

Rather than penalise the aging path, adjust pte_accessible() to return
true for any valid pte, even if the access flag is cleared.

Cc: <[email protected]>
Fixes: 76c714be0e5e ("arm64: pgtable: implement pte_accessible()")
Reported-by: Yu Zhao <[email protected]>
Acked-by: Yu Zhao <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

iommu: Check return of __iommu_attach_device()

Currently iommu_create_device_direct_mappings() is called
without checking the return of __iommu_attach_device(). This
may result in failures in iommu driver if dev attach returns
error.

Fixes: ce574c27ae27 ("iommu: Move iommu_group_create_direct_mappings() out of iommu_group_add_device()")
Signed-off-by: Shameer Kolothum <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

arm-smmu-qcom: Ensure the qcom_scm driver has finished probing

Robin Murphy pointed out that if the arm-smmu driver probes before
the qcom_scm driver, we may call qcom_scm_qsmmu500_wait_safe_toggle()
before the __scm is initialized.

Now, getting this to happen is a bit contrived, as in my efforts it
required enabling asynchronous probing for both drivers, moving the
firmware dts node to the end of the dtsi file, as well as forcing a
long delay in the qcom_scm_probe function.

With those tweaks we ran into the following crash:
[    2.631040] arm-smmu 15000000.iommu:         Stage-1: 48-bit VA -> 48-bit IPA
[    2.633372] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
[    2.633402] [0000000000000000] user address but active_mm is swapper
[    2.633409] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[    2.633415] Modules linked in:
[    2.633427] CPU: 5 PID: 117 Comm: kworker/u16:2 Tainted: G        W         5.10.0-rc1-mainline-00025-g272a618fc36-dirty #3971
[    2.633430] Hardware name: Thundercomm Dragonboard 845c (DT)
[    2.633448] Workqueue: events_unbound async_run_entry_fn
[    2.633456] pstate: 80c00005 (Nzcv daif +PAN +UAO -TCO BTYPE=--)
[    2.633465] pc : qcom_scm_qsmmu500_wait_safe_toggle+0x78/0xb0
[    2.633473] lr : qcom_smmu500_reset+0x58/0x78
[    2.633476] sp : ffffffc0105a3b60
...
[    2.633567] Call trace:
[    2.633572]  qcom_scm_qsmmu500_wait_safe_toggle+0x78/0xb0
[    2.633576]  qcom_smmu500_reset+0x58/0x78
[    2.633581]  arm_smmu_device_reset+0x194/0x270
[    2.633585]  arm_smmu_device_probe+0xc94/0xeb8
[    2.633592]  platform_drv_probe+0x58/0xa8
[    2.633597]  really_probe+0xec/0x398
[    2.633601]  driver_probe_device+0x5c/0xb8
[    2.633606]  __driver_attach_async_helper+0x64/0x88
[    2.633610]  async_run_entry_fn+0x4c/0x118
[    2.633617]  process_one_work+0x20c/0x4b0
[    2.633621]  worker_thread+0x48/0x460
[    2.633628]  kthread+0x14c/0x158
[    2.633634]  ret_from_fork+0x10/0x18
[    2.633642] Code: a9034fa0 d0007f73 29107fa0 91342273 (f9400020)

To avoid this, this patch adds a check on qcom_scm_is_available() in
the qcom_smmu_impl_init() function, returning -EPROBE_DEFER if its
not ready.

This allows the driver to try to probe again later after qcom_scm has
finished probing.

Reported-by: Robin Murphy <[email protected]>
Signed-off-by: John Stultz <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Andy Gross <[email protected]>
Cc: Maulik Shah <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Saravana Kannan <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Lina Iyer <[email protected]>
Cc: [email protected]
Cc: linux-arm-msm <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

spi: spi-nxp-fspi: fix fspi panic by unexpected interrupts

Given the case that bootloader(such as UEFI)'s FSPI driver might not
handle all interrupts before loading kernel, those legacy interrupts
would assert immidiately once kernel's FSPI driver enable them. Further,
if it was FSPI_INTR_IPCMDDONE, the irq handler nxp_fspi_irq_handler()
would call complete(&f->c) to notify others. However, f->c might not be
initialized yet at that time, then cause kernel panic.

Of cause, we should fix this issue within bootloader. But it would be
better to have this pacth to make dirver more robust (by clearing all
interrupt status bits before enabling interrupts).

Suggested-by: Han Xu <[email protected]>
Signed-off-by: Ran Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

iommu/amd: Enforce 4k mapping for certain IOMMU data structures

AMD IOMMU requires 4k-aligned pages for the event log, the PPR log,
and the completion wait write-back regions. However, when allocating
the pages, they could be part of large mapping (e.g. 2M) page.
This causes #PF due to the SNP RMP hardware enforces the check based
on the page level for these data structures.

So, fix by calling set_memory_4k() on the allocated pages.

Fixes: c69d89aff393 ("iommu/amd: Use 4K page for completion wait write-back semaphore")
Signed-off-by: Suravee Suthikulpanit <[email protected]>
Cc: Brijesh Singh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

xsk: Fix incorrect netdev reference count

Fix incorrect netdev reference count in xsk_bind operation. Incorrect
reference count of the device appears when a user calls bind with the
XDP_ZEROCOPY flag on an interface which does not support zero-copy.
In such a case, an error is returned but the reference count is not
decreased. This change fixes the fault, by decreasing the reference count
in case of such an error.

The problem being corrected appeared in '162c820ed896' for the first time,
and the code was moved to new file location over the time with commit
'c2d3d6a47462'. This specific patch applies to all version starting
from 'c2d3d6a47462'. The same solution should be applied but on different
file (net/xdp/xdp_umem.c) and function (xdp_umem_assign_dev) for versions
from '162c820ed896' to 'c2d3d6a47462' excluded.

Fixes: 162c820ed896 ("xdp: hold device for umem regardless of zero-copy mode")
Signed-off-by: Marek Majtyka <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Magnus Karlsson <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Merge branch 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm

Pull SCMI cpufreq driver fix for 5.10-rc6 from Viresh Kumar:

"This fixes a build issues with SCMI cpufreq driver in the
!CONFIG_COMMON_CLK case."

* 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK

ACPI/IORT: Fix doc warnings in iort.c

Fix following warnings caused by mismatch between
function parameters and function comments.

drivers/acpi/arm64/iort.c:55: warning: Function parameter or member 'iort_node' not described in 'iort_set_fwnode'
drivers/acpi/arm64/iort.c:55: warning: Excess function parameter 'node' description in 'iort_set_fwnode'
drivers/acpi/arm64/iort.c:682: warning: Function parameter or member 'id' not described in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:682: warning: Function parameter or member 'bus_token' not described in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:682: warning: Excess function parameter 'req_id' description in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:1142: warning: Function parameter or member 'dma_size' not described in 'iort_dma_setup'
drivers/acpi/arm64/iort.c:1142: warning: Excess function parameter 'size' description in 'iort_dma_setup'
drivers/acpi/arm64/iort.c:1534: warning: Function parameter or member 'ops' not described in 'iort_add_platform_device'

Signed-off-by: Shiju Jose <[email protected]>
Acked-by: Hanjun Guo <[email protected]>
Acked-by: Lorenzo Pieralisi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

arm64/fpsimd: add <asm/insn.h> to <asm/kprobes.h> to fix fpsimd build

Adding <asm/exception.h> brought in <asm/kprobes.h> which uses
<asm/probes.h>, which uses 'pstate_check_t' so the latter needs to
#include <asm/insn.h> for this typedef.

Fixes this build error:

  In file included from arch/arm64/include/asm/kprobes.h:24,
                    from arch/arm64/include/asm/exception.h:11,
                    from arch/arm64/kernel/fpsimd.c:35:
  arch/arm64/include/asm/probes.h:16:2: error: unknown type name 'pstate_check_t'
      16 |  pstate_check_t *pstate_cc;

Fixes: c6b90d5cf637 ("arm64/fpsimd: Fix missing-prototypes in fpsimd.c")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Tian Tao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>

s390: fix fpu restore in entry.S

We need to disable interrupts in load_fpu_regs(). Otherwise an
interrupt might come in after the registers are loaded, but before
CIF_FPU is cleared in load_fpu_regs(). When the interrupt returns,
CIF_FPU will be cleared and the registers will never be restored.

The entry.S code usually saves the interrupt state in __SF_EMPTY on the
stack when disabling/restoring interrupts. sie64a however saves the pointer
to the sie control block in __SF_SIE_CONTROL, which references the same
location. This is non-obvious to the reader. To avoid thrashing the sie
control block pointer in load_fpu_regs(), move the __SIE_* offsets eight
bytes after __SF_EMPTY on the stack.

Cc: <[email protected]> # 5.8
Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Reported-by: Pierre Morel <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Acked-by: Christian Borntraeger <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>

powerpc/64s: Fix allnoconfig build since uaccess flush

Using DECLARE_STATIC_KEY_FALSE needs linux/jump_table.h.

Otherwise the build fails with eg:

arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: data definition has no type or storage class
66 | DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);

Fixes: 9a32a7e78bd0 ("powerpc/64s: flush L1D after user accesses")
Signed-off-by: Stephen Rothwell <[email protected]>
[mpe: Massage change log]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'powerpc-cve-2020-4788' into fixes

From Daniel's cover letter:

IBM Power9 processors can speculatively operate on data in the L1 cache
before it has been completely validated, via a way-prediction mechanism. It
is not possible for an attacker to determine the contents of impermissible
memory using this method, since these systems implement a combination of
hardware and software security measures to prevent scenarios where
protected data could be leaked.

However these measures don't address the scenario where an attacker induces
the operating system to speculatively execute instructions using data that
the attacker controls. This can be used for example to speculatively bypass
"kernel user access prevention" techniques, as discovered by Anthony
Steinhauser of Google's Safeside Project. This is not an attack by itself,
but there is a possibility it could be used in conjunction with
side-channels or other weaknesses in the privileged code to construct an
attack.

This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern.

This patch series flushes the L1 cache on kernel entry (patch 2) and after the
kernel performs any user accesses (patch 3). It also adds a self-test and
performs some related cleanups.

cpufreq: scmi: Fix build for !CONFIG_COMMON_CLK

Commit 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a
dummy clock provider") registers a dummy clock provider using
devm_of_clk_add_hw_provider. These *_hw_provider functions are defined
only when CONFIG_COMMON_CLK=y. One possible fix is to add the Kconfig
dependency, but since we plan to move away from the clock dependency
for scmi cpufreq, it is preferrable to avoid that.

Let us just conditionally compile out the offending call to
devm_of_clk_add_hw_provider. It also uses the variable 'dev' outside
of the #ifdef block to avoid build warning.

Fixes: 8410e7f3b31e ("cpufreq: scmi: Fix OPP addition failure with a dummy clock provider")
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Sudeep Holla <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>

drm/exynos: depend on COMMON_CLK to fix compile tests

The Exynos DRM uses Common Clock Framework thus it cannot be built on
platforms without it (e.g. compile test on MIPS with RALINK and
SOC_RT305X):

/usr/bin/mips-linux-gnu-ld: drivers/gpu/drm/exynos/exynos_mixer.o: in function `mixer_bind':
exynos_mixer.c:(.text+0x958): undefined reference to `clk_set_parent'

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Inki Dae <[email protected]>

Linux 5.10-rc5

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

- Various functionality / regression fixes for Logitech devices from
   Hans de Goede

- Fix for (recently added) GPIO support in mcp2221 driver from Lars
   Povlsen

- Power management handling fix/quirk in i2c-hid driver for certain
   BIOSes that have strange aproach to power-cycle from Hans de Goede

- a few device ID additions and device-specific quirks

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: logitech-dj: Fix Dinovo Mini when paired with a MX5x00 receiver
  HID: logitech-dj: Fix an error in mse_bluetooth_descriptor
  HID: Add Logitech Dinovo Edge battery quirk
  HID: logitech-hidpp: Add HIDPP_CONSUMER_VENDOR_KEYS quirk for the Dinovo Edge
  HID: logitech-dj: Handle quad/bluetooth keyboards with a builtin trackpad
  HID: add HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE for Gamevice devices
  HID: mcp2221: Fix GPIO output handling
  HID: hid-sensor-hub: Fix issue with devices with no report ID
  HID: i2c-hid: Put ACPI enumerated devices in D3 on shutdown
  HID: add support for Sega Saturn
  HID: cypress: Support Varmilo Keyboards' media hotkeys
  HID: ite: Replace ABS_MISC 120/121 events with touchpad on/off keypresses
  HID: logitech-hidpp: Add PID for MX Anywhere 2
  HID: uclogic: Add ID for Trust Flex Design Tablet

Merge tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:
"A couple of scheduler fixes:

   - Make the conditional update of the overutilized state work
     correctly by caching the relevant flags state before overwriting
     them and checking them afterwards.

   - Fix a data race in the wakeup path which caused loadavg on ARM64
     platforms to become a random number generator.

   - Fix the ordering of the iowaiter accounting operations so it can't
     be decremented before it is incremented.

   - Fix a bug in the deadline scheduler vs. priority inheritance when a
     non-deadline task A has inherited the parameters of a deadline task
     B and then blocks on a non-deadline task C.

     The second inheritance step used the static deadline parameters of
     task A, which are usually 0, instead of further propagating task
     B's parameters. The zero initialized parameters trigger a bug in
     the deadline scheduler"

* tag 'sched-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix priority inheritance with multiple scheduling classes
  sched: Fix rq->nr_iowait ordering
  sched: Fix data-race in wakeup
  sched/fair: Fix overutilized update in enqueue_task_fair()

Merge tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Thomas Gleixner:
"A single fix for the x86 perf sysfs interfaces which used kobject
  attributes instead of device attributes and therefore making clang's
  control flow integrity checker upset"

* tag 'perf-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: fix sysfs type mismatches

Merge tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fix from Thomas Gleixner:
"A single fix for lockdep which makes the recursion protection cover
graph lock/unlock"

* tag 'locking-urgent-2020-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Put graph lock/unlock under lock_recursion protection

Merge tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Borislav Petkov:
"Forwarded EFI fixes from Ard Biesheuvel:

   - fix memory leak in efivarfs driver

   - fix HYP mode issue in 32-bit ARM version of the EFI stub when built
     in Thumb2 mode

   - avoid leaking EFI pgd pages on allocation failure"

* tag 'efi-urgent-for-v5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/x86: Free efi_pgd with free_pages()
  efivarfs: fix memory leak in efivarfs_create()
  efi/arm: set HSCTLR Thumb2 bit correctly for HVC calls from HYP

Merge tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

- An IOMMU VT-d build fix when CONFIG_PCI_ATS=n along with a revert of
   same because the proper one is going through the IOMMU tree (Thomas
   Gleixner)

- An Intel microcode loader fix to save the correct microcode patch to
   apply during resume (Chen Yu)

- A fix to not access user memory of other processes when dumping
   opcode bytes (Thomas Gleixner)

* tag 'x86_urgent_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "iommu/vt-d: Take CONFIG_PCI_ATS into account"
  x86/dumpstack: Do not try to access user space code of other tasks
  x86/microcode/intel: Check patch signature before saving microcode for early loading
  iommu/vt-d: Take CONFIG_PCI_ATS into account

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"8 patches.

  Subsystems affected by this patch series: mm (madvise, pagemap,
  readahead, memcg, userfaultfd), kbuild, and vfs"

* emailed patches from Andrew Morton <[email protected]>:
  mm: fix madvise WILLNEED performance problem
  libfs: fix error cast of negative value in simple_attr_write()
  mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
  mm: memcg/slab: fix root memcg vmstats
  mm: fix readahead_page_batch for retry entries
  mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
  compiler-clang: remove version check for BPF Tracing
  mm/madvise: fix memory leak from process_madvise

Merge tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO fixes from Greg KH:
"Here are some small Staging and IIO driver fixes for 5.10-rc5. They
  include:

   - IIO fixes for reported regressions and problems

   - new device ids for IIO drivers

   - new device id for rtl8723bs driver

   - staging ralink driver Kconfig dependency fix

   - staging mt7621-pci bus resource fix

  All of these have been in linux-next all week with no reported issues"

* tag 'staging-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: accel: kxcjk1013: Add support for KIOX010A ACPI DSM for setting tablet-mode
  iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum
  docs: ABI: testing: iio: stm32: remove re-introduced unsupported ABI
  iio: light: fix kconfig dependency bug for VCNL4035
  iio/adc: ingenic: Fix AUX/VBAT readings when touchscreen is used
  iio/adc: ingenic: Fix battery VREF for JZ4770 SoC
  staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids
  staging: ralink-gdma: fix kconfig dependency bug for DMA_RALINK
  staging: mt7621-pci: avoid to request pci bus resources
  iio: imu: st_lsm6dsx: set 10ms as min shub slave timeout
  counter/ti-eqep: Fix regmap max_register
  iio: adc: stm32-adc: fix a regression when using dma and irq
  iio: adc: mediatek: fix unset field
  iio: cros_ec: Use default frequencies when EC returns invalid information

Merge tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty fixes from Greg KH:
"Here are some small tty/serial fixes for 5.10-rc5 that resolve some
  reported issues:

   - speakup crash when telling the kernel to use a device that isn't
     really there

   - imx serial driver fixes for reported problems

   - ar933x_uart driver fix for probe error handling path

  All have been in linux-next for a while with no reported issues"

* tag 'tty-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: ar933x_uart: disable clk on error handling path in probe
  tty: serial: imx: keep console clocks always on
  speakup: Do not let the line discipline be used several times
  tty: serial: imx: fix potential deadlock

Merge tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"A final set of miscellaneous bug fixes for ext4"

* tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix bogus warning in ext4_update_dx_flag()
  jbd2: fix kernel-doc markups
  ext4: drop fast_commit from /proc/mounts

afs: Fix speculative status fetch going out of order wrt to modifications

When doing a lookup in a directory, the afs filesystem uses a bulk
status fetch to speculatively retrieve the statuses of up to 48 other
vnodes found in the same directory and it will then either update extant
inodes or create new ones - effectively doing 'lookup ahead'.

To avoid the possibility of deadlocking itself, however, the filesystem
doesn't lock all of those inodes; rather just the directory inode is
locked (by the VFS).

When the operation completes, afs_inode_init_from_status() or
afs_apply_status() is called, depending on whether the inode already
exists, to commit the new status.

A case exists, however, where the speculative status fetch operation may
straddle a modification operation on one of those vnodes.  What can then
happen is that the speculative bulk status RPC retrieves the old status,
and whilst that is happening, the modification happens - which returns
an updated status, then the modification status is committed, then we
attempt to commit the speculative status.

This results in something like the following being seen in dmesg:

kAFS: vnode modified {100058:861} 8->9 YFS.InlineBulkStatus

showing that for vnode 861 on volume 100058, we saw YFS.InlineBulkStatus
say that the vnode had data version 8 when we'd already recorded version
9 due to a local modification.  This was causing the cache to be
invalidated for that vnode when it shouldn't have been.  If it happens
on a data file, this might lead to local changes being lost.

Fix this by ignoring speculative status updates if the data version
doesn't match the expected value.

Note that it is possible to get a DV regression if a volume gets
restored from a backup - but we should get a callback break in such a
case that should trigger a recheck anyway.  It might be worth checking
the volume creation time in the volsync info and, if a change is
observed in that (as would happen on a restore), invalidate all caches
associated with the volume.

Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix madvise WILLNEED performance problem

The calculation of the end page index was incorrect, leading to a
regression of 70% when running stress-ng.

With this fix, we instead see a performance improvement of 3%.

Fixes: e6e88712e43b ("mm: optimise madvise WILLNEED")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Xing Zhengjun <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: William Kucharski <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: "Chen, Rong A" <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

libfs: fix error cast of negative value in simple_attr_write()

The attr->set() receive a value of u64, but simple_strtoll() is used for
doing the conversion.  It will lead to the error cast if user inputs a
negative value.

Use kstrtoull() instead of simple_strtoll() to convert a string got from
the user to an unsigned value.  The former will return '-EINVAL' if it
gets a negetive value, but the latter can't handle the situation
correctly.  Make 'val' unsigned long long as what kstrtoull() takes,
this will eliminate the compile warning on no 64-bit architectures.

Fixes: f7b88631a897 ("fs/libfs.c: fix simple_attr_write() on 32bit machines")
Signed-off-by: Yicong Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Al Viro <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()

Alexander reported a syzkaller / KASAN finding on s390, see below for
complete output.

In do_huge_pmd_anonymous_page(), the pre-allocated pagetable will be
freed in some cases.  In the case of userfaultfd_missing(), this will
happen after calling handle_userfault(), which might have released the
mmap_lock.  Therefore, the following pte_free(vma->vm_mm, pgtable) will
access an unstable vma->vm_mm, which could have been freed or re-used
already.

For all architectures other than s390 this will go w/o any negative
impact, because pte_free() simply frees the page and ignores the
passed-in mm.  The implementation for SPARC32 would also access
mm->page_table_lock for pte_free(), but there is no THP support in
SPARC32, so the buggy code path will not be used there.

For s390, the mm->context.pgtable_list is being used to maintain the 2K
pagetable fragments, and operating on an already freed or even re-used
mm could result in various more or less subtle bugs due to list /
pagetable corruption.

Fix this by calling pte_free() before handle_userfault(), similar to how
it is already done in __do_huge_pmd_anonymous_page() for the WRITE /
non-huge_zero_page case.

Commit 6b251fc96cf2c ("userfaultfd: call handle_userfault() for
userfaultfd_missing() faults") actually introduced both, the
do_huge_pmd_anonymous_page() and also __do_huge_pmd_anonymous_page()
changes wrt to calling handle_userfault(), but only in the latter case
it put the pte_free() before calling handle_userfault().

  BUG: KASAN: use-after-free in do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
  Read of size 8 at addr 00000000962d6988 by task syz-executor.0/9334

  CPU: 1 PID: 9334 Comm: syz-executor.0 Not tainted 5.10.0-rc1-syzkaller-07083-g4c9720875573 #0
  Hardware name: IBM 3906 M04 701 (KVM/Linux)
  Call Trace:
    do_huge_pmd_anonymous_page+0xcda/0xd90 mm/huge_memory.c:744
    create_huge_pmd mm/memory.c:4256 [inline]
    __handle_mm_fault+0xe6e/0x1068 mm/memory.c:4480
    handle_mm_fault+0x288/0x748 mm/memory.c:4607
    do_exception+0x394/0xae0 arch/s390/mm/fault.c:479
    do_dat_exception+0x34/0x80 arch/s390/mm/fault.c:567
    pgm_check_handler+0x1da/0x22c arch/s390/kernel/entry.S:706
    copy_from_user_mvcos arch/s390/lib/uaccess.c:111 [inline]
    raw_copy_from_user+0x3a/0x88 arch/s390/lib/uaccess.c:174
    _copy_from_user+0x48/0xa8 lib/usercopy.c:16
    copy_from_user include/linux/uaccess.h:192 [inline]
    __do_sys_sigaltstack kernel/signal.c:4064 [inline]
    __s390x_sys_sigaltstack+0xc8/0x240 kernel/signal.c:4060
    system_call+0xe0/0x28c arch/s390/kernel/entry.S:415

  Allocated by task 9334:
    slab_alloc_node mm/slub.c:2891 [inline]
    slab_alloc mm/slub.c:2899 [inline]
    kmem_cache_alloc+0x118/0x348 mm/slub.c:2904
    vm_area_dup+0x9c/0x2b8 kernel/fork.c:356
    __split_vma+0xba/0x560 mm/mmap.c:2742
    split_vma+0xca/0x108 mm/mmap.c:2800
    mlock_fixup+0x4ae/0x600 mm/mlock.c:550
    apply_vma_lock_flags+0x2c6/0x398 mm/mlock.c:619
    do_mlock+0x1aa/0x718 mm/mlock.c:711
    __do_sys_mlock2 mm/mlock.c:738 [inline]
    __s390x_sys_mlock2+0x86/0xa8 mm/mlock.c:728
    system_call+0xe0/0x28c arch/s390/kernel/entry.S:415

  Freed by task 9333:
    slab_free mm/slub.c:3142 [inline]
    kmem_cache_free+0x7c/0x4b8 mm/slub.c:3158
    __vma_adjust+0x7b2/0x2508 mm/mmap.c:960
    vma_merge+0x87e/0xce0 mm/mmap.c:1209
    userfaultfd_release+0x412/0x6b8 fs/userfaultfd.c:868
    __fput+0x22c/0x7a8 fs/file_table.c:281
    task_work_run+0x200/0x320 kernel/task_work.c:151
    tracehook_notify_resume include/linux/tracehook.h:188 [inline]
    do_notify_resume+0x100/0x148 arch/s390/kernel/signal.c:538
    system_call+0xe6/0x28c arch/s390/kernel/entry.S:416

  The buggy address belongs to the object at 00000000962d6948 which belongs to the cache vm_area_struct of size 200
  The buggy address is located 64 bytes inside of 200-byte region [00000000962d6948, 00000000962d6a10)
  The buggy address belongs to the page: page:00000000313a09fe refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x962d6 flags: 0x3ffff00000000200(slab)
  raw: 3ffff00000000200 000040000257e080 0000000c0000000c 000000008020ba00
  raw: 0000000000000000 000f001e00000000 ffffffff00000001 0000000096959501
  page dumped because: kasan: bad access detected
  page->mem_cgroup:0000000096959501

  Memory state around the buggy address:
   00000000962d6880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   00000000962d6900: 00 fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb
  >00000000962d6980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                        ^
   00000000962d6a00: fb fb fc fc fc fc fc fc fc fc 00 00 00 00 00 00
   00000000962d6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ==================================================================

Fixes: 6b251fc96cf2c ("userfaultfd: call handle_userfault() for userfaultfd_missing() faults")
Reported-by: Alexander Egorenkov <[email protected]>
Signed-off-by: Gerald Schaefer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: <[email protected]> [4.3+]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcg/slab: fix root memcg vmstats

If we reparent the slab objects to the root memcg, when we free the slab
object, we need to update the per-memcg vmstats to keep it correct for
the root memcg. Now this at least affects the vmstat of
NR_KERNEL_STACK_KB for !CONFIG_VMAP_STACK when the thread stack size is
smaller than the PAGE_SIZE.

David said:
"I assume that without this fix that the root memcg's vmstat would
always be inflated if we reparented"

Fixes: ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes")
Signed-off-by: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yafang Shao <[email protected]>
Cc: Chris Down <[email protected]>
Cc: <[email protected]> [5.3+]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix readahead_page_batch for retry entries

Both btrfs and fuse have reported faults caused by seeing a retry entry
instead of the page they were looking for.  This was caused by a missing
check in the iterator.

As can be seen in the below panic log, the accessing 0x402 causes a
panic.  In the xarray.h, 0x402 means RETRY_ENTRY.

  BUG: kernel NULL pointer dereference, address: 0000000000000402
  CPU: 14 PID: 306003 Comm: as Not tainted 5.9.0-1-amd64 #1 Debian 5.9.1-1
  Hardware name: Lenovo ThinkSystem SR665/7D2VCTO1WW, BIOS D8E106Q-1.01 05/30/2020
  RIP: 0010:fuse_readahead+0x152/0x470 [fuse]
  Code: 41 8b 57 18 4c 8d 54 10 ff 4c 89 d6 48 8d 7c 24 10 e8 d2 e3 28 f9 48 85 c0 0f 84 fe 00 00 00 44 89 f2 49 89 04 d4 44 8d 72 01 <48> 8b 10 41 8b 4f 1c 48 c1 ea 10 83 e2 01 80 fa 01 19 d2 81 e2 01
  RSP: 0018:ffffad99ceaebc50 EFLAGS: 00010246
  RAX: 0000000000000402 RBX: 0000000000000001 RCX: 0000000000000002
  RDX: 0000000000000000 RSI: ffff94c5af90bd98 RDI: ffffad99ceaebc60
  RBP: ffff94ddc1749a00 R08: 0000000000000402 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000100 R12: ffff94de6c429ce0
  R13: ffff94de6c4d3700 R14: 0000000000000001 R15: ffffad99ceaebd68
  FS:  00007f228c5c7040(0000) GS:ffff94de8ed80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000402 CR3: 0000001dbd9b4000 CR4: 0000000000350ee0
  Call Trace:
    read_pages+0x83/0x270
    page_cache_readahead_unbounded+0x197/0x230
    generic_file_buffered_read+0x57a/0xa20
    new_sync_read+0x112/0x1a0
    vfs_read+0xf8/0x180
    ksys_read+0x5f/0xe0
    do_syscall_64+0x33/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 042124cc64c3 ("mm: add new readahead_control API")
Reported-by: David Sterba <[email protected]>
Reported-by: Wonhyuk Yang <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports

The core-mm has a default __weak implementation of phys_to_target_node()
to mirror the weak definition of memory_add_physaddr_to_nid().  That
symbol is exported for modules.  However, while the export in
mm/memory_hotplug.c exported the symbol in the configuration cases of:

CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=y

...and:

CONFIG_NUMA_KEEP_MEMINFO=n
CONFIG_MEMORY_HOTPLUG=y

...it failed to export the symbol in the case of:

CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_MEMORY_HOTPLUG=n

Not only is that broken, but Christoph points out that the kernel should
not be exporting any __weak symbol, which means that
memory_add_physaddr_to_nid() example that phys_to_target_node() copied
is broken too.

Rework the definition of phys_to_target_node() and
memory_add_physaddr_to_nid() to not require weak symbols.  Move to the
common arch override design-pattern of an asm header defining a symbol
to replace the default implementation.

The only common header that all memory_add_physaddr_to_nid() producing
architectures implement is asm/sparsemem.h.  In fact, powerpc already
defines its memory_add_physaddr_to_nid() helper in sparsemem.h.
Double-down on that observation and define phys_to_target_node() where
necessary in asm/sparsemem.h.  An alternate consideration that was
discarded was to put this override in asm/numa.h, but that entangles
with the definition of MAX_NUMNODES relative to the inclusion of
linux/nodemask.h, and requires powerpc to grow a new header.

The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid
now that the symbol is properly exported / stubbed in all combinations
of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.

[[email protected]: v4]
Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[[email protected]: powerpc: fix create_section_mapping compile warning]
Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
Reported-by: Randy Dunlap <[email protected]>
Reported-by: Thomas Gleixner <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reported-by: Christoph Hellwig <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Tested-by: Thomas Gleixner <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Joao Martins <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <[email protected]>

compiler-clang: remove version check for BPF Tracing

bpftrace parses the kernel headers and uses Clang under the hood.

Remove the version check when __BPF_TRACING__ is defined (as bpftrace
does) so that this tool can continue to parse kernel headers, even with
older clang sources.

Fixes: commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1")
Reported-by: Chen Yu <[email protected]>
Reported-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Jarkko Sakkinen <[email protected]>
Acked-by: Jarkko Sakkinen <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Nathan Chancellor <[email protected]>
Acked-by: Miguel Ojeda <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/madvise: fix memory leak from process_madvise

The early return in process_madvise() will produce a memory leak.

Fix it.

Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

irqchip/gic-v3-its: Unconditionally save/restore the ITS state on suspend

On systems without HW-based collections (i.e. anything except GIC-500),
we rely on firmware to perform the ITS save/restore. This doesn't
really work, as although FW can properly save everything, it cannot
fully restore the state of the command queue (the read-side is reset
to the head of the queue). This results in the ITS consuming previously
processed commands, potentially corrupting the state.

Instead, let's always save the ITS state on suspend, disabling it in the
process, and restore the full state on resume. This saves us from broken
FW as long as it doesn't enable the ITS by itself (for which we can't do
anything).

This amounts to simply dropping the ITS_FLAGS_SAVE_SUSPEND_STATE.

Signed-off-by: Xu Qiang <[email protected]>
[maz: added warning on resume, rewrote commit message]
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

irqchip/exiu: Fix the index of fwspec for IRQ type

Since fwspec->param_count of ACPI node is two, the index of IRQ type
in fwspec->param[] should be 1 rather than 2.

Fixes: 3d090a36c8c8 ("irqchip/exiu: Implement ACPI support")
Signed-off-by: Chen Baozi <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: [email protected]

Merge branch 'ibmvnic-fixes-in-reset-path'

Lijun Pan says:

====================
ibmvnic: fixes in reset path

Patch 1/3 and 2/3 notify peers in failover and migration reset.
Patch 3/3 skips timeout reset if it is already resetting.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

ibmvnic: skip tx timeout reset while in resetting

Sometimes it takes longer than 5 seconds (watchdog timeout) to complete
failover, migration, and other resets. In stead of scheduling another
timeout reset, we wait for the current one to complete.

Suggested-by: Brian King <[email protected]>
Signed-off-by: Lijun Pan <[email protected]>
Reviewed-by: Dany Madden <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

ibmvnic: notify peers when failover and migration happen

Commit 61d3e1d9bc2a ("ibmvnic: Remove netdev notify for failover resets")
excluded the failover case for notify call because it said
netdev_notify_peers() can cause network traffic to stall or halt.
Current testing does not show network traffic stall
or halt because of the notify call for failover event.
netdev_notify_peers may be used when a device wants to inform the
rest of the network about some sort of a reconfiguration
such as failover or migration.

It is unnecessary to call that in other events like
FATAL, NON_FATAL, CHANGE_PARAM, and TIMEOUT resets
since in those scenarios the hardware does not change.
If the driver must do a hard reset, it is necessary to notify peers.

Fixes: 61d3e1d9bc2a ("ibmvnic: Remove netdev notify for failover resets")
Suggested-by: Brian King <[email protected]>
Suggested-by: Pradeep Satyanarayana <[email protected]>
Signed-off-by: Dany Madden <[email protected]>
Signed-off-by: Lijun Pan <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

ibmvnic: fix call_netdevice_notifiers in do_reset

When netdev_notify_peers was substituted in
commit 986103e7920c ("net/ibmvnic: Fix RTNL deadlock during device reset"),
call_netdevice_notifiers(NETDEV_RESEND_IGMP, dev) was missed.
Fix it now.

Fixes: 986103e7920c ("net/ibmvnic: Fix RTNL deadlock during device reset")
Signed-off-by: Lijun Pan <[email protected]>
Reviewed-by: Dany Madden <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

tun: honor IOCB_NOWAIT flag

tun only checks the file O_NONBLOCK flag, but it should also be checking
the iocb IOCB_NOWAIT flag. Any fops using ->read/write_iter() should check
both, otherwise it breaks users that correctly expect O_NONBLOCK semantics
if IOCB_NOWAIT is set.

Signed-off-by: Jens Axboe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net/af_iucv: set correct sk_protocol for child sockets

Child sockets erroneously inherit their parent's sk_type (ie. SOCK_*),
instead of the PF_IUCV protocol that the parent was created with in
iucv_sock_create().

We're currently not using sk->sk_protocol ourselves, so this shouldn't
have much impact (except eg. getting the output in skb_dump() right).

Fixes: eac3731bd04c ("[S390]: Add AF_IUCV socket support")
Signed-off-by: Julian Wiedmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

usbnet: ipheth: fix connectivity with iOS 14

Starting with iOS 14 released in September 2020, connectivity using the
personal hotspot USB tethering function of iOS devices is broken.

Communication between the host and the device (for example ICMP traffic
or DNS resolution using the DNS service running in the device itself)
works fine, but communication to endpoints further away doesn't work.

Investigation on the matter shows that no UDP and ICMP traffic from the
tethered host is reaching the Internet at all. For TCP traffic there are
exchanges between tethered host and server but packets are modified in
transit leading to impossible communication.

After some trials Matti Vuorela discovered that reducing the URB buffer
size by two bytes restored the previous behavior. While a better
solution might exist to fix the issue, since the protocol is not
publicly documented and considering the small size of the fix, let's do
that.

Tested-by: Matti Vuorela <[email protected]>
Signed-off-by: Yves-Alexis Perez <[email protected]>
Link: https://lore.kernel.org/linux-usb/CAAn0qaXmysJ9vx3ZEMkViv_B19ju-_ExN8Yn_uSefxpjS6g4Lw@mail.gmail.com/
Link: https://github.com/libimobiledevice/libimobiledevice/issues/1038
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

cxgb4: Fix build failure when CONFIG_TLS=m

After commit 9d2e5e9eeb59 ("cxgb4/ch_ktls: decrypted bit is not enough")
whenever CONFIG_TLS=m and CONFIG_CHELSIO_T4=y, the following build
failure occurs:

ld: drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.o: in function
`cxgb_select_queue':
cxgb4_main.c:(.text+0x2dac): undefined reference to `tls_validate_xmit_skb'

Fix this by ensuring that if TLS is set to be a module, CHELSIO_T4 will
also be compiled as a module. As otherwise the cxgb4 driver will not be
able to access TLS' symbols.

Fixes: 9d2e5e9eeb59 ("cxgb4/ch_ktls: decrypted bit is not enough")
Signed-off-by: Tom Seewald <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bonding: wait for sysfs kobject destruction before freeing struct slave

syzkaller found that with CONFIG_DEBUG_KOBJECT_RELEASE=y, releasing a
struct slave device could result in the following splat:

  kobject: 'bonding_slave' (00000000cecdd4fe): kobject_release, parent 0000000074ceb2b2 (delayed 1000)
  bond0 (unregistering): (slave bond_slave_1): Releasing backup interface
  ------------[ cut here ]------------
  ODEBUG: free active (active state 0) object type: timer_list hint: workqueue_select_cpu_near kernel/workqueue.c:1549 [inline]
  ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x98 kernel/workqueue.c:1600
  WARNING: CPU: 1 PID: 842 at lib/debugobjects.c:485 debug_print_object+0x180/0x240 lib/debugobjects.c:485
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 1 PID: 842 Comm: kworker/u4:4 Tainted: G S                5.9.0-rc8+ #96
  Hardware name: linux,dummy-virt (DT)
  Workqueue: netns cleanup_net
  Call trace:
   dump_backtrace+0x0/0x4d8 include/linux/bitmap.h:239
   show_stack+0x34/0x48 arch/arm64/kernel/traps.c:142
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x174/0x1f8 lib/dump_stack.c:118
   panic+0x360/0x7a0 kernel/panic.c:231
   __warn+0x244/0x2ec kernel/panic.c:600
   report_bug+0x240/0x398 lib/bug.c:198
   bug_handler+0x50/0xc0 arch/arm64/kernel/traps.c:974
   call_break_hook+0x160/0x1d8 arch/arm64/kernel/debug-monitors.c:322
   brk_handler+0x30/0xc0 arch/arm64/kernel/debug-monitors.c:329
   do_debug_exception+0x184/0x340 arch/arm64/mm/fault.c:864
   el1_dbg+0x48/0xb0 arch/arm64/kernel/entry-common.c:65
   el1_sync_handler+0x170/0x1c8 arch/arm64/kernel/entry-common.c:93
   el1_sync+0x80/0x100 arch/arm64/kernel/entry.S:594
   debug_print_object+0x180/0x240 lib/debugobjects.c:485
   __debug_check_no_obj_freed lib/debugobjects.c:967 [inline]
   debug_check_no_obj_freed+0x200/0x430 lib/debugobjects.c:998
   slab_free_hook mm/slub.c:1536 [inline]
   slab_free_freelist_hook+0x190/0x210 mm/slub.c:1577
   slab_free mm/slub.c:3138 [inline]
   kfree+0x13c/0x460 mm/slub.c:4119
   bond_free_slave+0x8c/0xf8 drivers/net/bonding/bond_main.c:1492
   __bond_release_one+0xe0c/0xec8 drivers/net/bonding/bond_main.c:2190
   bond_slave_netdev_event drivers/net/bonding/bond_main.c:3309 [inline]
   bond_netdev_event+0x8f0/0xa70 drivers/net/bonding/bond_main.c:3420
   notifier_call_chain+0xf0/0x200 kernel/notifier.c:83
   __raw_notifier_call_chain kernel/notifier.c:361 [inline]
   raw_notifier_call_chain+0x44/0x58 kernel/notifier.c:368
   call_netdevice_notifiers_info+0xbc/0x150 net/core/dev.c:2033
   call_netdevice_notifiers_extack net/core/dev.c:2045 [inline]
   call_netdevice_notifiers net/core/dev.c:2059 [inline]
   rollback_registered_many+0x6a4/0xec0 net/core/dev.c:9347
   unregister_netdevice_many.part.0+0x2c/0x1c0 net/core/dev.c:10509
   unregister_netdevice_many net/core/dev.c:10508 [inline]
   default_device_exit_batch+0x294/0x338 net/core/dev.c:10992
   ops_exit_list.isra.0+0xec/0x150 net/core/net_namespace.c:189
   cleanup_net+0x44c/0x888 net/core/net_namespace.c:603
   process_one_work+0x96c/0x18c0 kernel/workqueue.c:2269
   worker_thread+0x3f0/0xc30 kernel/workqueue.c:2415
   kthread+0x390/0x498 kernel/kthread.c:292
   ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:925

This is a potential use-after-free if the sysfs nodes are being accessed
whilst removing the struct slave, so wait for the object destruction to
complete before freeing the struct slave itself.

Fixes: 07699f9a7c8d ("bonding: add sysfs /slave dir for bond slave devices.")
Fixes: a068aab42258 ("bonding: Fix reference count leak in bond_sysfs_slave_add.")
Cc: Qiushi Wu <[email protected]>
Cc: Jay Vosburgh <[email protected]>
Cc: Veaceslav Falico <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Signed-off-by: Jamie Iles <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
"The critical fixes are for a crash that someone reported in the xattr
  code on 32-bit arm last week; and a revert of the rmap key comparison
  change from last week as it was totally wrong. I need a vacation. :(

  Summary:

   - Fix various deficiencies in online fsck's metadata checking code

   - Fix an integer casting bug in the xattr code on 32-bit systems

   - Fix a hang in an inode walk when the inode index is corrupt

   - Fix error codes being dropped when initializing per-AG structures

   - Fix nowait directio writes that partially succeed but return EAGAIN

   - Revert last week's rmap comparison patch because it was wrong"

* tag 'xfs-5.10-fixes-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: revert "xfs: fix rmap key and record comparison functions"
  xfs: don't allow NOWAIT DIO across extent boundaries
  xfs: return corresponding errcode if xfs_initialize_perag() fail
  xfs: ensure inobt record walks always make forward progress
  xfs: fix forkoff miscalculation related to XFS_LITINO(mp)
  xfs: directory scrub should check the null bestfree entries too
  xfs: strengthen rmap record flags checking
  xfs: fix the minrecs logic when dealing with inode root child blocks

Merge tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fanotify fix from Jan Kara:
"A single fanotify fix from Amir"

* tag 'fsnotify_for_v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: fix logic of reporting name info with watched parent

Merge tag 'seccomp-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fixes from Kees Cook:
"This gets the seccomp selftests running again on powerpc and sh, and
  fixes an audit reporting oversight noticed in both seccomp and ptrace.

   - Fix typos in seccomp selftests on powerpc and sh (Kees Cook)

   - Fix PF_SUPERPRIV audit marking in seccomp and ptrace (Mickaël
     Salaün)"

* tag 'seccomp-v5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests/seccomp: sh: Fix register names
  selftests/seccomp: powerpc: Fix typo in macro variable name
  seccomp: Set PF_SUPERPRIV when checking capability
  ptrace: Set PF_SUPERPRIV when checking capability

drm/mediatek: dsi: Modify horizontal front/back porch byte formula

In the patch to be fixed, horizontal_backporch_byte become too large
for some panel, so roll back that patch. For small hfp or hbp panel,
using vm->hfront_porch + vm->hback_porch to calculate
horizontal_backporch_byte would make it negtive, so
use horizontal_backporch_byte itself to make it positive.

Fixes: 35bf948f1edb ("drm/mediatek: dsi: Fix scrolling of panel with small hfp or hbp")
Signed-off-by: CK Hu <[email protected]>
Signed-off-by: Chun-Kuang Hu <[email protected]>
Tested-by: Bilal Wasim <[email protected]>

Merge branch 's390-qeth-fixes-2020-11-20'

Julian Wiedmann says:

====================
s390/qeth: fixes 2020-11-20

This brings several fixes for qeth's af_iucv-specific code paths.

Also one fix by Alexandra for the recently added BR_LEARNING_SYNC
support. We want to trust the feature indication bit, so that HW can
mask it out if there's any issues on their end.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

s390/qeth: fix tear down of async TX buffers

When qeth_iqd_tx_complete() detects that a TX buffer requires additional
async completion via QAOB, it might fail to replace the queue entry's
metadata (and ends up triggering recovery).

Assume now that the device gets torn down, overruling the recovery.
If the QAOB notification then arrives before the tear down has
sufficiently progressed, the buffer state is changed to
QETH_QDIO_BUF_HANDLED_DELAYED by qeth_qdio_handle_aob().

The tear down code calls qeth_drain_output_queue(), where
qeth_cleanup_handled_pending() will then attempt to replace such a
buffer _again_. If it succeeds this time, the buffer ends up dangling in
its replacement's ->next_pending list ... where it will never be freed,
since there's no further call to qeth_cleanup_handled_pending().

But the second attempt isn't actually needed, we can simply leave the
buffer on the queue and re-use it after a potential recovery has
completed. The qeth_clear_output_buffer() in qeth_drain_output_queue()
will ensure that it's in a clean state again.

Fixes: 72861ae792c2 ("qeth: recovery through asynchronous delivery")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

s390/qeth: fix af_iucv notification race

The two expected notification sequences are
1. TX_NOTIFY_PENDING with a subsequent TX_NOTIFY_DELAYED_*, when
   our TX completion code first observed the pending TX and the QAOB
   then completes at a later time; or
2. TX_NOTIFY_OK, when qeth_qdio_handle_aob() picked up the QAOB
   completion before our TX completion code even noticed that the TX
   was pending.

But as qeth_iqd_tx_complete() and qeth_qdio_handle_aob() can run
concurrently, we may end up with a race that results in a sequence of
TX_NOTIFY_DELAYED_* followed by TX_NOTIFY_PENDING. Which would confuse
the af_iucv code in its tracking of pending transmits.

Rework the notification code, so that qeth_qdio_handle_aob() defers its
notification if the TX completion code is still active.

Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

s390/qeth: make af_iucv TX notification call more robust

Calling into socket code is ugly already, at least check whether we are
dealing with the expected sk_family. Only looking at skb->protocol is
bound to cause troubles (consider eg. af_packet).

Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

s390/qeth: Remove pnso workaround

Remove workaround that supported early hardware implementations
of PNSO OC3. Rely on the 'enarf' feature bit instead.

Fixes: fa115adff2c1 ("s390/qeth: Detect PNSO OC3 capability")
Signed-off-by: Alexandra Winter <[email protected]>
Reviewed-by: Julian Wiedmann <[email protected]>
[jwi: use logical instead of bit-wise AND]
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge branch 'tcp-address-issues-with-ect0-not-being-set-in-dctcp-packets'

Alexander Duyck says:

====================
tcp: Address issues with ECT0 not being set in DCTCP packets

This patch set is meant to address issues seen with SYN/ACK packets not
containing the ECT0 bit when DCTCP is configured as the congestion control
algorithm for a TCP socket.

A simple test using "tcpdump" and "test_progs -t bpf_tcp_ca" makes the
issue obvious. Looking at the packets will result in the SYN/ACK packet
with an ECT0 bit that does not match the other packets for the flow when
the congestion control agorithm is switch from the default. So for example
going from non-DCTCP to a DCTCP congestion control algorithm we will see
the SYN/ACK IPV6 header will not have ECT0 set while the other packets in
the flow will. Likewise if we switch from a default of DCTCP to cubic we
will see the ECT0 bit set in the SYN/ACK while the other packets in the
flow will not.
====================

Link: https://lore.kernel.org/r/160582070138.66684.11785214534154816097.stgit@localhost.localdomain
Signed-off-by: Jakub Kicinski <[email protected]>

tcp: Set INET_ECN_xmit configuration in tcp_reinit_congestion_control

When setting congestion control via a BPF program it is seen that the
SYN/ACK for packets within a given flow will not include the ECT0 flag. A
bit of simple printk debugging shows that when this is configured without
BPF we will see the value INET_ECN_xmit value initialized in
tcp_assign_congestion_control however when we configure this via BPF the
socket is in the closed state and as such it isn't configured, and I do not
see it being initialized when we transition the socket into the listen
state. The result of this is that the ECT0 bit is configured based on
whatever the default state is for the socket.

Any easy way to reproduce this is to monitor the following with tcpdump:
tools/testing/selftests/bpf/test_progs -t bpf_tcp_ca

Without this patch the SYN/ACK will follow whatever the default is. If dctcp
all SYN/ACK packets will have the ECT0 bit set, and if it is not then ECT0
will be cleared on all SYN/ACK packets. With this patch applied the SYN/ACK
bit matches the value seen on the other packets in the given stream.

Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

tcp: Allow full IP tos/IPv6 tclass to be reflected in L3 header

An issue was recently found where DCTCP SYN/ACK packets did not have the
ECT bit set in the L3 header. A bit of code review found that the recent
change referenced below had gone though and added a mask that prevented the
ECN bits from being populated in the L3 header.

This patch addresses that by rolling back the mask so that it is only
applied to the flags coming from the incoming TCP request instead of
applying it to the socket tos/tclass field. Doing this the ECT bits were
restored in the SYN/ACK packets in my testing.

One thing that is not addressed by this patch set is the fact that
tcp_reflect_tos appears to be incompatible with ECN based congestion
avoidance algorithms. At a minimum the feature should likely be documented
which it currently isn't.

Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket")
Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Wei Wang <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Fixes for two fairly obscure but annoying when triggered races in
  iSCSI"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: iscsi: Fix cmd abort fabric stop race
  scsi: libiscsi: Fix NOP race condition

dpaa2-eth: select XGMAC_MDIO for MDIO bus support

Explicitly enable the FSL_XGMAC_MDIO Kconfig option in order to have
MDIO access to internal and external PHYs.

Fixes: 719479230893 ("dpaa2-eth: add MAC/PHY support through phylink")
Signed-off-by: Ioana Ciornei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- NVMe pull request from Christoph:
      - Doorbell Buffer freeing fix (Minwoo Im)
      - CSE log leak fix (Keith Busch)

- blk-cgroup hd_struct leak fix (Christoph)

- Flush request state fix (Ming)

- dasd NULL deref fix (Stefan)

* tag 'block-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
  s390/dasd: fix null pointer dereference for ERP requests
  blk-cgroup: fix a hd_struct leak in blkcg_fill_root_iostats
  nvme: fix memory leak freeing command effects
  nvme: directly cache command effects log
  nvme: free sq/cq dbbuf pointers when dbbuf set fails
  block: mark flush request as IDLE when it is really finished

Merge tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"Mostly regression or stable fodder:

   - Disallow async path resolution of /proc/self

   - Tighten constraints for segmented async buffered reads

   - Fix double completion for a retry error case

   - Fix for fixed file life times (Pavel)"

* tag 'io_uring-5.10-2020-11-20' of git://git.kernel.dk/linux-block:
  io_uring: order refnode recycling
  io_uring: get an active ref_node from files_data
  io_uring: don't double complete failed reissue request
  mm: never attempt async page lock if we've transferred data already
  io_uring: handle -EOPNOTSUPP on path resolution
  proc: don't allow async path resolution of /proc/self components

cxgb4: fix the panic caused by non smac rewrite

SMT entry is allocated only when loopback Source MAC
rewriting is requested. Accessing SMT entry for non
smac rewrite cases results in kernel panic.

Fix the panic caused by non smac rewrite

Fixes: 937d84205884 ("cxgb4: set up filter action after rewrites")
Signed-off-by: Raju Rangoju <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

selftests/seccomp: sh: Fix register names

It looks like the seccomp selftests was never actually built for sh.
This fixes it, though I don't have an environment to do a runtime test
of it yet.

Fixes: 0bb605c2c7f2b4b3 ("sh: Add SECCOMP_FILTER")
Tested-by: John Paul Adrian Glaubitz <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Kees Cook <[email protected]>

selftests/seccomp: powerpc: Fix typo in macro variable name

A typo sneaked into the powerpc selftest. Fix the name so it builds again.

Fixes: 46138329faea ("selftests/seccomp: powerpc: Fix seccomp return value testing")
Acked-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Kees Cook <[email protected]>

block/keyslot-manager: prevent crash when num_slots=1

If there is only one keyslot, then blk_ksm_init() computes
slot_hashtable_size=1 and log_slot_ht_size=0.  This causes
blk_ksm_find_keyslot() to crash later because it uses
hash_ptr(key, log_slot_ht_size) to find the hash bucket containing the
key, and hash_ptr() doesn't support the bits == 0 case.

Fix this by making the hash table always have at least 2 buckets.

Tested by running:

    kvm-xfstests -c ext4 -g encrypt -m inlinecrypt \
                 -o blk-crypto-fallback.num_keyslots=1

Fixes: 1b2628397058 ("block: Keyslot Manager for Inline Encryption")
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'for-linus-5.10b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fix from Juergen Gross:
"A single fix for avoiding WARN splats when booting a Xen guest with
nosmt"

* tag 'for-linus-5.10b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: don't unbind uninitialized lock_kicker_irq

net/tls: missing received data after fast remote close

In case when tcp socket received FIN after some data and the
parser haven't started before reading data caller will receive
an empty buffer. This behavior differs from plain TCP socket and
leads to special treating in user-space.
The flow that triggers the race is simple. Server sends small
amount of data right after the connection is configured to use TLS
and closes the connection. In this case receiver sees TLS Handshake
data, configures TLS socket right after Change Cipher Spec record.
While the configuration is in process, TCP socket receives small
Application Data record, Encrypted Alert record and FIN packet. So
the TCP socket changes sk_shutdown to RCV_SHUTDOWN and sk_flag with
SK_DONE bit set. The received data is not parsed upon arrival and is
never sent to user-space.

Patch unpauses parser directly if we have unparsed data in tcp
receive queue.

Fixes: fcf4793e278e ("tls: check RCV_SHUTDOWN in tls_wait_data")
Signed-off-by: Vadim Fedorenko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'dmaengine-fix-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:
"A solitary core fix and a few driver fixes:

  Core:

   - channel_register error handling

  Driver fixes:

   - idxd: wq config registers programming and mapping of portal size

   - ioatdma: unused fn removal

   - pl330: fix burst size

   - ti: pm fix on busy and -Wenum-conversion warns

   - xilinx: SG capability check, usage of xilinx_aximcdma_tx_segment,
     readl_poll_timeout_atomic variant"

* tag 'dmaengine-fix-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: fix error codes in channel_register()
  dmaengine: pl330: _prep_dma_memcpy: Fix wrong burst size
  dmaengine: ioatdma: remove unused function missed during dma_v2 removal
  dmaengine: idxd: fix mapping of portal size
  dmaengine: ti: omap-dma: Block PM if SDMA is busy to fix audio
  dmaengine: xilinx_dma: Fix SG capability check for MCDMA
  dmaengine: xilinx_dma: Fix usage of xilinx_aximcdma_tx_segment
  dmaengine: xilinx_dma: use readl_poll_timeout_atomic variant
  dmaengine: ti: k3-udma: fix -Wenum-conversion warning
  dmaengine: idxd: fix wq config registers offset programming

Merge tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull iommu fixes from Will Deacon:
"Two straightforward vt-d fixes:

   - Fix boot when intel iommu initialisation fails under TXT (tboot)

   - Fix intel iommu compilation error when DMAR is enabled without ATS

  and temporarily update IOMMU MAINTAINERs entry"

* tag 'iommu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  MAINTAINERS: Temporarily add myself to the IOMMU entry
  iommu/vt-d: Fix compile error with CONFIG_PCI_ATS not set
  iommu/vt-d: Avoid panic if iommu init fails in tboot system

Merge tag 'mmc-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes:

   - sdhci-of-arasan: Stabilize communication by fixing tap value configs

   - sdhci-pci: Use SDR25 timing for HS mode for BYT-based Intel HWs"

* tag 'mmc-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-of-arasan: Issue DLL reset explicitly
  mmc: sdhci-of-arasan: Use Mask writes for Tap delays
  mmc: sdhci-of-arasan: Allow configuring zero tap values
  mmc: sdhci-pci: Prefer SDR25 timing for High Speed mode for BYT-based Intel controllers

bnxt_en: Release PCI regions when DMA mask setup fails during probe.

Jump to init_err_release to cleanup. bnxt_unmap_bars() will also be
called but it will do nothing if the BARs are not mapped yet.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Reported-by: Jakub Kicinski <[email protected]>
Signed-off-by: Michael Chan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

rose: Fix Null pointer dereference in rose_send_frame()

rose_send_frame() dereferences `neigh->dev` when called from
rose_transmit_clear_request(), and the first occurrence of the
`neigh` is in rose_loopback_timer() as `rose_loopback_neigh`,
and it is initialized in rose_add_loopback_neigh() as NULL.
i.e when `rose_loopback_neigh` used in rose_loopback_timer()
its `->dev` was still NULL and rose_loopback_timer() was calling
rose_rx_call_request() without checking for NULL.

- net/rose/rose_link.c
This bug seems to get triggered in this line:

rose_call = (ax25_address *)neigh->dev->dev_addr;

Fix it by adding NULL checking for `rose_loopback_neigh->dev`
in rose_loopback_timer().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Jakub Kicinski <[email protected]>
Reported-by: [email protected]
Tested-by: [email protected]
Link: https://syzkaller.appspot.com/bug?id=9d2a7ca8c7f2e4b682c97578dfa3f236258300b3
Signed-off-by: Anmol Karn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'sound-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes: the only core change is a minor error
  code handling in the control API, and all the rest are device-specific
  fixes, mostly quirks, fixups and ASoC Intel fixes.

  It looks boring, and good so"

* tag 'sound-5.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: mixart: Fix mutex deadlock
  ALSA: hda/ca0132: Fix compile warning without PCI
  ASOC: Intel: kbl_rt5663_rt5514_max98927: Do not try to disable disabled clock
  ALSA: usb-audio: Add delay quirk for all Logitech USB devices
  ASoC: Intel: catpt: Correct clock selection for dai trigger
  ASoC: Intel: catpt: Skip position update for unprepared streams
  ASoC: qcom: lpass-platform: Fix memory leak
  ASoC: Intel: KMB: Fix S24_LE configuration
  ALSA: hda: Add Alderlake-S PCI ID and HDMI codec vid
  ALSA: usb-audio: Use ALC1220-VB-DT mapping for ASUS ROG Strix TRX40 mobo
  ALSA: firewire: Clean up a locking issue in copy_resp_to_buf()
  ASoC: rt1015: increase the time to detect BCLK
  ALSA: ctl: fix error path at adding user-defined element set
  ALSA: hda/realtek - HP Headset Mic can't detect after boot
  ALSA: hda/realtek - Add supported mute Led for HP
  ALSA: hda/realtek: Add some Clove SSID in the ALC293(ALC1220)
  ALSA: hda/realtek - Add supported for Lenovo ThinkPad Headset Button
  ASoC: rt1015: add delay to fix pop noise from speaker

Merge tag 'drm-fixes-2020-11-20-2' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Weekly fixes pull.

  This contains some fixes for sun4i/dw-hdmi probing, then amdgpu
  enables arcturus hw without experimental flag and two other fixes and
  a group of i915 fixes.

  It also has a backported from next fix for the warn on reported in
  ast/drm_gem_vram_helper code in the merge window. There's a separate
  report which initially looked to be the same problem, but I'm going to
  chase that up next week a bit more as I don't think the bisect landed
  anywhere useful.

  Summary:

  core:
   - vram helper TTM regression fix

  amdgpu:
   - Pageflip fix for navi1x with 5 or 6 displays
   - Remove experimental flag for Arcturus
   - Fix regression in atomic commit tail rework

  i915:
   - Fix tgl power gating issue
   - Memory leak fixes
   - Selftest fixes
   - Display bpc fix
   - Fix TGL MOCS for PTE tracking

  dw-hdmi:
   - probing fix

  sun4i:
   - probing fix"

* tag 'drm-fixes-2020-11-20-2' of git://anongit.freedesktop.org/drm/drm:
  drm/i915/gt: Fixup tgl mocs for PTE tracking
  drm/vram-helper: Fix use of top-down placement
  drm/i915/gt: Remember to free the virtual breadcrumbs
  drm/i915: Handle max_bpc==16
  drm/amd/display: Always get CRTC updated constant values inside commit tail
  drm/sun4i: backend: Fix probe failure with multiple backends
  drm/sun4i: dw-hdmi: fix error return code in sun8i_dw_hdmi_bind()
  drm/i915/selftests: Fix wrong return value of perf_request_latency()
  drm/i915/selftests: Fix wrong return value of perf_series_engines()
  drm/i915: Avoid memory leak with more than 16 workarounds on a list
  drm/i915/tgl: Fix Media power gate sequence.
  drm/amdgpu: remove experimental flag from arcturus
  drm/amd/display: Add missing pflip irq for dcn2.0
  drm/i915/gvt: return error when failing to take the module reference
  drm: bridge: dw-hdmi: Avoid resetting force in the detect function
  drm/i915/gvt: Set ENHANCED_FRAME_CAP bit
  drm/i915/gvt: Temporarily disable vfio_edid for BXT/APL

MAINTAINERS: Change Solarflare maintainers

Email from solarflare.com will stop working. Update the maintainers.
A replacement for [email protected] is not working yet,
for now remove it.

Signed-off-by: Martin Habets <[email protected]>
Signed-off-by: Edward Cree <[email protected]>
Link: https://lore.kernel.org/r/20201120113207.GA1605547@mh-desktop
Signed-off-by: Jakub Kicinski <[email protected]>

spi: Take the SPI IO-mutex in the spi_setup() method

I've discovered that due to the recent commit 49d7d695ca4b ("spi: dw:
Explicitly de-assert CS on SPI transfer completion") a concurrent usage of
the spidev devices with different chip-selects causes the "SPI transfer
timed out" error. The root cause of the problem has turned to be in a race
condition of the SPI-transfer execution procedure and the spi_setup()
method being called at the same time. In particular in calling the
spi_set_cs(false) while there is an SPI-transfer being executed. In my
case due to the commit cited above all CSs get to be switched off by
calling the spi_setup() for /dev/spidev0.1 while there is an concurrent
SPI-transfer execution performed on /dev/spidev0.0. Of course a situation
of the spi_setup() being called while there is an SPI-transfer being
executed for two different SPI peripheral devices of the same controller
may happen not only for the spidev driver, but for instance for MMC SPI +
some another device, or spi_setup() being called from an SPI-peripheral
probe method while some other device has already been probed and is being
used by a corresponding driver...

Of course I could have provided a fix affecting the DW APB SSI driver
only, for instance, by creating a mutual exclusive access to the set_cs
callback and setting/clearing only the bit responsible for the
corresponding chip-select. But after a short research I've discovered that
the problem most likely affects a lot of the other drivers:
- drivers/spi/spi-sun4i.c - RMW the chip-select register;
- drivers/spi/spi-rockchip.c - RMW the chip-select register;
- drivers/spi/spi-qup.c - RMW a generic force-CS flag in a CSR.
- drivers/spi/spi-sifive.c - set a generic CS-mode flag in a CSR.
- drivers/spi/spi-bcm63xx-hsspi.c - uses an internal mutex to serialize
  the bus config changes, but still isn't protected from the race
  condition described above;
- drivers/spi/spi-geni-qcom.c - RMW a chip-select internal flag and set the
  CS state in HW;
- drivers/spi/spi-orion.c - RMW a chip-select register;
- drivers/spi/spi-cadence.c - RMW a chip-select register;
- drivers/spi/spi-armada-3700.c - RMW a chip-select register;
- drivers/spi/spi-lantiq-ssc.c - overwrites the chip-select register;
- drivers/spi/spi-sun6i.c - RMW a chip-select register;
- drivers/spi/spi-synquacer.c - RMW a chip-select register;
- drivers/spi/spi-altera.c - directly sets the chip-select state;
- drivers/spi/spi-omap2-mcspi.c - RMW an internally cached CS state and
  writes it to HW;
- drivers/spi/spi-mt65xx.c - RMW some CSR;
- drivers/spi/spi-jcore.c - directly sets the chip-selects state;
- drivers/spi/spi-mt7621.c - RMW a chip-select register;

I could have missed some drivers, but a scale of the problem is obvious.
As you can see most of the drivers perform an unprotected
Read-modify-write chip-select register modification in the set_cs callback.
Seeing the spi_setup() function is calling the spi_set_cs() and it can be
executed concurrently with SPI-transfers exec procedure, which also calls
spi_set_cs() in the SPI core spi_transfer_one_message() method, the race
condition of the register modification turns to be obvious.

To sum up the problem denoted above affects each driver for a controller
having more than one chip-select lane and which:
1) performs the RMW to some CS-related register with no serialization;
2) directly disables any CS on spi_set_cs(dev, false).
* the later is the case of the DW APB SSI driver.

The controllers which equipped with a single CS theoretically can also
experience the problem, but in practice will not since normally the
spi_setup() isn't called concurrently with the SPI-transfers executed on
the same SPI peripheral device.

In order to generically fix the denoted bug I'd suggest to serialize an
access to the controller IO by taking the IO mutex in the spi_setup()
callback. The mutex is held while there is an SPI communication going on
on the SPI-bus of the corresponding SPI-controller. So calling the
spi_setup() method and disabling/updating the CS state within it would be
safe while there is no any SPI-transfers being executed. Also note I
suppose it would be safer to protect the spi_controller->setup() callback
invocation too, seeing some of the SPI-controller drivers update a HW
state in there.

Fixes: 49d7d695ca4b ("spi: dw: Explicitly de-assert CS on SPI transfer completion")
Signed-off-by: Serge Semin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

USB: core: Change %pK for __user pointers to %px

Commit 2f964780c03b ("USB: core: replace %p with %pK") used the %pK
format specifier for a bunch of __user pointers.  But as the 'K' in
the specifier indicates, it is meant for kernel pointers.  The reason
for the %pK specifier is to avoid leaks of kernel addresses, but when
the pointer is to an address in userspace the security implications
are minimal.  In particular, no kernel information is leaked.

This patch changes the __user %pK specifiers (used in a bunch of
debugging output lines) to %px, which will always print the actual
address with no mangling.  (Notably, there is no printk format
specifier particularly intended for __user pointers.)

Fixes: 2f964780c03b ("USB: core: replace %p with %pK")
CC: Vamsi Krishna Samavedam <[email protected]>
CC: <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

MAINTAINERS: Update email address for Sean Christopherson

Update my email address to one provided by my new benefactor.

Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jarkko Sakkinen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <20201119183707 [email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

USB: core: Fix regression in Hercules audio card

Commit 3e4f8e21c4f2 ("USB: core: fix check for duplicate endpoints")
aimed to make the USB stack more reliable by detecting and skipping
over endpoints that are duplicated between interfaces. This caused a
regression for a Hercules audio card (reported as Bugzilla #208357),
which contains such non-compliant duplications. Although the
duplications are harmless, skipping the valid endpoints prevented the
device from working.

This patch fixes the regression by adding ENDPOINT_IGNORE quirks for
the Hercules card, telling the kernel to ignore the invalid duplicate
endpoints and thereby allowing the valid endpoints to be used as
intended.

Fixes: 3e4f8e21c4f2 ("USB: core: fix check for duplicate endpoints")
CC: <[email protected]>
Reported-by: Alexander Chalikiopoulos <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: gadget: Fix memleak in gadgetfs_fill_super

usb_get_gadget_udc_name will alloc memory for CHIP
in "Enomem" branch. we should free it before error
returns to prevent memleak.

Fixes: 175f712119c57 ("usb: gadget: provide interface for legacy gadgets to get UDC name")
Reported-by: Hulk Robot <[email protected]>
Acked-by: Alan Stern <[email protected]>
Signed-off-by: Zhang Qilong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usb: gadget: f_midi: Fix memleak in f_midi_alloc

In the error path, if midi is not null, we should
free the midi->id if necessary to prevent memleak.

Fixes: b85e9de9e818d ("usb: gadget: f_midi: convert to new function interface with backward compatibility")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Zhang Qilong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

USB: quirks: Add USB_QUIRK_DISCONNECT_SUSPEND quirk for Lenovo A630Z TIO built-in usb-audio card

Add a USB_QUIRK_DISCONNECT_SUSPEND quirk for the Lenovo TIO built-in
usb-audio. when A630Z going into S3,the system immediately wakeup 7-8
seconds later by usb-audio disconnect interrupt to avoids the issue.
eg dmesg:
....
[  626.974091 ] usb 7-1.1: USB disconnect, device number 3
....
....
[ 1774.486691] usb 7-1.1: new full-speed USB device number 5 using xhci_hcd
[ 1774.947742] usb 7-1.1: New USB device found, idVendor=17ef, idProduct=a012, bcdDevice= 0.55
[ 1774.956588] usb 7-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1774.964339] usb 7-1.1: Product: Thinkcentre TIO24Gen3 for USB-audio
[ 1774.970999] usb 7-1.1: Manufacturer: Lenovo
[ 1774.975447] usb 7-1.1: SerialNumber: 000000000000
[ 1775.048590] usb 7-1.1: 2:1: cannot get freq at ep 0x1
.......
Seeking a better fix, we've tried a lot of things, including:
- Check that the device's power/wakeup is disabled
- Check that remote wakeup is off at the USB level
- All the quirks in drivers/usb/core/quirks.c
   e.g. USB_QUIRK_RESET_RESUME,
        USB_QUIRK_RESET,
        USB_QUIRK_IGNORE_REMOTE_WAKEUP,
        USB_QUIRK_NO_LPM.

but none of that makes any difference.

There are no errors in the logs showing any suspend/resume-related issues.
When the system wakes up due to the modem, log-wise it appears to be a
normal resume.

Introduce a quirk to disable the port during suspend when the modem is
detected.

Signed-off-by: penghao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge tag 'phy-fixes-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy into usb-linus

Vinod writes:

phy: fixes for 5.10

Bunch of fixes for phy drivers:
*) USB phy incorrect clearing of bits
*) Tegra xusb dangling pointer
*) qcom-qmp null ptr initialization
*) cpcap-usb irq flags
*) intel kkembay kconfig depends
*) qualcomm OF dependency
*) mediatek typo

* tag 'phy-fixes-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: mediatek: fix spelling mistake in Kconfig "veriosn" -> "version"
  phy: qualcomm: Fix 28 nm Hi-Speed USB PHY OF dependency
  phy: qualcomm: usb: Fix SuperSpeed PHY OF dependency
  phy: intel: PHY_INTEL_KEEMBAY_EMMC should depend on ARCH_KEEMBAY
  phy: cpcap-usb: Use IRQF_ONESHOT
  phy: qcom-qmp: Initialize another pointer to NULL
  phy: tegra: xusb: Fix dangling pointer on probe failure
  phy: usb: Fix incorrect clearing of tca_drv_sel bit in SETUP reg for 7211

xsk: Fix umem cleanup bug at socket destruct

Fix a bug that is triggered when a partially setup socket is
destroyed. For a fully setup socket, a socket that has been bound to a
device, the cleanup of the umem is performed at the end of the buffer
pool's cleanup work queue item. This has to be performed in a work
queue, and not in RCU cleanup, as it is doing a vunmap that cannot
execute in interrupt context. However, when a socket has only been
partially set up so that a umem has been created but the buffer pool
has not, the code erroneously directly calls the umem cleanup function
instead of using a work queue, and this leads to a BUG_ON() in
vunmap().

As there in this case is no buffer pool, we cannot use its work queue,
so we need to introduce a work queue for the umem and schedule this for
the cleanup. So in the case there is no pool, we are going to use the
umem's own work queue to schedule the cleanup. But if there is a
pool, the cleanup of the umem is still being performed by the pool's
work queue, as it is important that the umem is cleaned up after the
pool.

Fixes: e5e1a4bc916d ("xsk: Fix possible memory leak at socket close")
Reported-by: Marek Majtyka <[email protected]>
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Tested-by: Marek Majtyka <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

MAINTAINERS: Update XDP and AF_XDP entries

Getting too many false positive matches with current use
of the content regex K: and file regex N: patterns.

This patch drops file match N: and makes K: more restricted.
Some more normal F: file wildcards are added.

Notice that AF_XDP forgot to some F: files that is also
updated in this patch.

Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/bpf/160586238944.2808432.4401269290440394008.stgit@firesoul

interconnect: fix memory trashing in of_count_icc_providers()

of_count_icc_providers() function uses for_each_available_child_of_node()
helper to recursively check all the available nodes. This helper already
properly handles child nodes' reference count, so there is no need to do
it explicitly. Remove the excessive call to of_node_put(). This fixes
memory trashing when CONFIG_OF_DYNAMIC is enabled (for example
arm/multi_v7_defconfig).

Fixes: b1d681d8d324 ("interconnect: Add sync state support")
Signed-off-by: Marek Szyprowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Georgi Djakov <[email protected]>

ALSA: hda/realtek - Fixed Dell AIO wrong sound tone

This platform only had one audio jack.
If it plugged speaker then replug with speaker or headset, the sound
tone will change to abnormal.
Headset Mic also can't record when this issue was happen.

[ Added a short comment about the COEF by tiwai ]

Signed-off-by: Kailang Yang <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

interconnect: qcom: qcs404: Remove GPU and display RPM IDs

The following errors are noticed during boot on a QCS404 board:
[ 2.926647] qcom_icc_rpm_smd_send mas 6 error -6
[ 2.934573] qcom_icc_rpm_smd_send mas 8 error -6

These errors show when we try to configure the GPU and display nodes.
Since these particular nodes aren't supported on RPM and are purely
local, we should just change their mas_rpm_id to -1 to avoid any
requests being sent for these master IDs.

Reviewed-by: Mike Tipton <[email protected]>
Reviewed-by: Bjorn Andersson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Georgi Djakov <[email protected]>

interconnect: qcom: msm8916: Remove rpm-ids from non-RPM nodes

Some nodes are incorrectly marked as RPM-controlled (they have RPM
master and slave ids assigned), but are actually controlled by the
application CPU instead. The RPM complains when we send requests for
resources that it can't control. Let's fix this by replacing the IDs,
with the default "-1" in which case no requests are sent.

Reviewed-by: Mike Tipton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Georgi Djakov <[email protected]>