Git Repo - linux.git/log

net: core: Distribute switch variables for initialization

Variables declared in a switch statement before any case statements
cannot be automatically initialized with compiler instrumentation (as
they are not part of any execution flow). With GCC's proposed automatic
stack variable initialization feature, this triggers a warning (and they
don't get initialized). Clang's automatic stack variable initialization
(via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
doesn't initialize such variables[1]. Note that these warnings (or silent
skipping) happen before the dead-store elimination optimization phase,
so even when the automatic initializations are later elided in favor of
direct initializations, the warnings remain.

To avoid these problems, move such variables into the "case" where
they're used or lift them up into the main function body.

net/core/skbuff.c: In function ‘skb_checksum_setup_ip’:
net/core/skbuff.c:4809:7: warning: statement will never be executed [-Wswitch-unreachable]
4809 | int err;
| ^~~

[1] https://bugs.llvm.org/show_bug.cgi?id=44916

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

x86/xen: Distribute switch variables for initialization

Variables declared in a switch statement before any case statements
cannot be automatically initialized with compiler instrumentation (as
they are not part of any execution flow). With GCC's proposed automatic
stack variable initialization feature, this triggers a warning (and they
don't get initialized). Clang's automatic stack variable initialization
(via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
doesn't initialize such variables[1]. Note that these warnings (or silent
skipping) happen before the dead-store elimination optimization phase,
so even when the automatic initializations are later elided in favor of
direct initializations, the warnings remain.

To avoid these problems, move such variables into the "case" where
they're used or lift them up into the main function body.

arch/x86/xen/enlighten_pv.c: In function ‘xen_write_msr_safe’:
arch/x86/xen/enlighten_pv.c:904:12: warning: statement will never be executed [-Wswitch-unreachable]
904 | unsigned which;
| ^~~~~

[1] https://bugs.llvm.org/show_bug.cgi?id=44916

Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Juergen Gross <[email protected]>
[boris: made @which an 'unsigned int']
Signed-off-by: Boris Ostrovsky <[email protected]>

mm: Avoid creating virtual address aliases in brk()/mmap()/mremap()

Currently the arm64 kernel ignores the top address byte passed to brk(),
mmap() and mremap(). When the user is not aware of the 56-bit address
limit or relies on the kernel to return an error, untagging such
pointers has the potential to create address aliases in user-space.
Passing a tagged address to munmap(), madvise() is permitted since the
tagged pointer is expected to be inside an existing mapping.

The current behaviour breaks the existing glibc malloc() implementation
which relies on brk() with an address beyond 56-bit to be rejected by
the kernel.

Remove untagging in the above functions by partially reverting commit
ce18d171cb73 ("mm: untag user pointers in mmap/munmap/mremap/brk"). In
addition, update the arm64 tagged-address-abi.rst document accordingly.

Link: https://bugzilla.redhat.com/1797052
Fixes: ce18d171cb73 ("mm: untag user pointers in mmap/munmap/mremap/brk")
Cc: <[email protected]> # 5.4.x-
Cc: Florian Weimer <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reported-by: Victor Stinner <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Andrey Konovalov <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

docs: arm64: fix trivial spelling enought to enough in memory.rst

Fix trivial spelling error enought to enough in memory.rst.

Cc: [email protected]
Signed-off-by: Scott Branden <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

ext4: add cond_resched() to __ext4_find_entry()

We tested a soft lockup problem in linux 4.19 which could also
be found in linux 5.x.

When dir inode takes up a large number of blocks, and if the
directory is growing when we are searching, it's possible the
restart branch could be called many times, and the do while loop
could hold cpu a long time.

Here is the call trace in linux 4.19.

[  473.756186] Call trace:
[  473.756196]  dump_backtrace+0x0/0x198
[  473.756199]  show_stack+0x24/0x30
[  473.756205]  dump_stack+0xa4/0xcc
[  473.756210]  watchdog_timer_fn+0x300/0x3e8
[  473.756215]  __hrtimer_run_queues+0x114/0x358
[  473.756217]  hrtimer_interrupt+0x104/0x2d8
[  473.756222]  arch_timer_handler_virt+0x38/0x58
[  473.756226]  handle_percpu_devid_irq+0x90/0x248
[  473.756231]  generic_handle_irq+0x34/0x50
[  473.756234]  __handle_domain_irq+0x68/0xc0
[  473.756236]  gic_handle_irq+0x6c/0x150
[  473.756238]  el1_irq+0xb8/0x140
[  473.756286]  ext4_es_lookup_extent+0xdc/0x258 [ext4]
[  473.756310]  ext4_map_blocks+0x64/0x5c0 [ext4]
[  473.756333]  ext4_getblk+0x6c/0x1d0 [ext4]
[  473.756356]  ext4_bread_batch+0x7c/0x1f8 [ext4]
[  473.756379]  ext4_find_entry+0x124/0x3f8 [ext4]
[  473.756402]  ext4_lookup+0x8c/0x258 [ext4]
[  473.756407]  __lookup_hash+0x8c/0xe8
[  473.756411]  filename_create+0xa0/0x170
[  473.756413]  do_mkdirat+0x6c/0x140
[  473.756415]  __arm64_sys_mkdirat+0x28/0x38
[  473.756419]  el0_svc_common+0x78/0x130
[  473.756421]  el0_svc_handler+0x38/0x78
[  473.756423]  el0_svc+0x8/0xc
[  485.755156] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [tmp:5149]

Add cond_resched() to avoid soft lockup and to provide a better
system responding.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Shijie Luo <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: [email protected]

ext4: fix a data race in EXT4_I(inode)->i_disksize

EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
KCSAN,

BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]

write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
  ext4_write_end+0x4e3/0x750 [ext4]
  ext4_update_i_disksize at fs/ext4/ext4.h:3032
  (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
  (inlined by) ext4_write_end at fs/ext4/inode.c:1287
  generic_perform_write+0x208/0x2a0
  ext4_buffered_write_iter+0x11f/0x210 [ext4]
  ext4_file_write_iter+0xce/0x9e0 [ext4]
  new_sync_write+0x29c/0x3b0
  __vfs_write+0x92/0xa0
  vfs_write+0x103/0x260
  ksys_write+0x9d/0x130
  __x64_sys_write+0x4c/0x60
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
  ext4_writepages+0x10ac/0x1d00 [ext4]
  mpage_map_and_submit_extent at fs/ext4/inode.c:2468
  (inlined by) ext4_writepages at fs/ext4/inode.c:2772
  do_writepages+0x5e/0x130
  __writeback_single_inode+0xeb/0xb20
  writeback_sb_inodes+0x429/0x900
  __writeback_inodes_wb+0xc4/0x150
  wb_writeback+0x4bd/0x870
  wb_workfn+0x6b4/0x960
  process_one_work+0x54c/0xbe0
  worker_thread+0x80/0x650
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

Reported by Kernel Concurrency Sanitizer on:
CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
Workqueue: writeback wb_workfn (flush-7:0)

Since only the read is operating as lockless (outside of the
"i_data_sem"), load tearing could introduce a logic bug. Fix it by
adding READ_ONCE() for the read and WRITE_ONCE() for the write.

Signed-off-by: Qian Cai <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]

Merge branch 'linux-5.6' of git://github.com/skeggsb/linux into drm-fixes

Nothing major here, another TU1xx modesetting fix, and hooking up
ACR/GR support on TU11x now that NVIDIA have made the firmware
available.

Signed-off-by: Dave Airlie <[email protected]>
From: Ben Skeggs <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/

hwmon: (acpi_power_meter) Fix lockdep splat

Damien Le Moal reports a lockdep splat with the acpi_power_meter,
observed with Linux v5.5 and later.

======================================================
WARNING: possible circular locking dependency detected
5.6.0-rc2+ #629 Not tainted
------------------------------------------------------
python/1397 is trying to acquire lock:
ffff888619080070 (&resource->lock){+.+.}, at: show_power+0x3c/0xa0 [acpi_power_meter]

               but task is already holding lock:
ffff88881643f188 (kn->count#119){++++}, at: kernfs_seq_start+0x6a/0x160

               which lock already depends on the new lock.

               the existing dependency chain (in reverse order) is:

               -> #1 (kn->count#119){++++}:
       __kernfs_remove+0x626/0x7e0
       kernfs_remove_by_name_ns+0x41/0x80
       remove_attrs+0xcb/0x3c0 [acpi_power_meter]
       acpi_power_meter_notify+0x1f7/0x310 [acpi_power_meter]
       acpi_ev_notify_dispatch+0x198/0x1f3
       acpi_os_execute_deferred+0x4d/0x70
       process_one_work+0x7c8/0x1340
       worker_thread+0x94/0xc70
       kthread+0x2ed/0x3f0
       ret_from_fork+0x24/0x30

               -> #0 (&resource->lock){+.+.}:
       __lock_acquire+0x20be/0x49b0
       lock_acquire+0x127/0x340
       __mutex_lock+0x15b/0x1350
       show_power+0x3c/0xa0 [acpi_power_meter]
       dev_attr_show+0x3f/0x80
       sysfs_kf_seq_show+0x216/0x410
       seq_read+0x407/0xf90
       vfs_read+0x152/0x2c0
       ksys_read+0xf3/0x1d0
       do_syscall_64+0x95/0x1010
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

               other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(kn->count#119);
                               lock(&resource->lock);
                               lock(kn->count#119);
  lock(&resource->lock);

                *** DEADLOCK ***
4 locks held by python/1397:
#0: ffff8890242d64e0 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0x9b/0xb0
#1: ffff889040be74e0 (&p->lock){+.+.}, at: seq_read+0x6b/0xf90
#2: ffff8890448eb880 (&of->mutex){+.+.}, at: kernfs_seq_start+0x47/0x160
#3: ffff88881643f188 (kn->count#119){++++}, at: kernfs_seq_start+0x6a/0x160

               stack backtrace:
CPU: 10 PID: 1397 Comm: python Not tainted 5.6.0-rc2+ #629
Hardware name: Supermicro Super Server/X11DPL-i, BIOS 3.1 05/21/2019
Call Trace:
dump_stack+0x97/0xe0
check_noncircular+0x32e/0x3e0
? print_circular_bug.isra.0+0x1e0/0x1e0
? unwind_next_frame+0xb9a/0x1890
? entry_SYSCALL_64_after_hwframe+0x49/0xbe
? graph_lock+0x79/0x170
? __lockdep_reset_lock+0x3c0/0x3c0
? mark_lock+0xbc/0x1150
__lock_acquire+0x20be/0x49b0
? mark_held_locks+0xe0/0xe0
? stack_trace_save+0x91/0xc0
lock_acquire+0x127/0x340
? show_power+0x3c/0xa0 [acpi_power_meter]
? device_remove_bin_file+0x10/0x10
? device_remove_bin_file+0x10/0x10
__mutex_lock+0x15b/0x1350
? show_power+0x3c/0xa0 [acpi_power_meter]
? show_power+0x3c/0xa0 [acpi_power_meter]
? mutex_lock_io_nested+0x11f0/0x11f0
? lock_downgrade+0x6a0/0x6a0
? kernfs_seq_start+0x47/0x160
? lock_acquire+0x127/0x340
? kernfs_seq_start+0x6a/0x160
? device_remove_bin_file+0x10/0x10
? show_power+0x3c/0xa0 [acpi_power_meter]
show_power+0x3c/0xa0 [acpi_power_meter]
dev_attr_show+0x3f/0x80
? memset+0x20/0x40
sysfs_kf_seq_show+0x216/0x410
seq_read+0x407/0xf90
? security_file_permission+0x16f/0x2c0
vfs_read+0x152/0x2c0

Problem is that reading an attribute takes the kernfs lock in the kernfs
code, then resource->lock in the driver. During an ACPI notification, the
opposite happens: The resource lock is taken first, followed by the kernfs
lock when sysfs attributes are removed and re-created. Presumably this is
now seen due to some locking related changes in kernfs after v5.4, but it
was likely always a problem.

Fix the problem by not blindly acquiring the lock in the notification
function. It is only needed to protect the various update functions.
However, those update functions are called anyway when sysfs attributes
are read. This means that we can just stop calling those functions from
the notifier, and the resource lock in the notifier function is no longer
needed.

That leaves two situations:

First, METER_NOTIFY_CONFIG removes and re-allocates capability strings.
While it did so under the resource lock, _displaying_ those strings was not
protected, creating a race condition. To solve this problem, selectively
protect both removal/creation and reporting of capability attributes with
the resource lock.

Second, removing and re-creating the attribute files is no longer protected
by the resource lock. That doesn't matter since access to each individual
attribute is protected by the kernfs lock. Userspace may get messed up if
attributes disappear and reappear under its nose, but that is not different
than today, and there is nothing we can do about it without major driver
restructuring.

Last but not least, when removing the driver, remove attribute functions
first, then release capability strings. This avoids yet another race
condition.

Reported-by: Damien Le Moal <[email protected]>
Cc: Damien Le Moal <[email protected]>
Cc: [email protected] # v5.5+
Tested-by: Damien Le Moal <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>

Merge tag 'linux-kselftest-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:
"Fixes to build failures and other test bugs"

* tag 'linux-kselftest-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: openat2: fix build error on newer glibc
  selftests: use LDLIBS for libraries instead of LDFLAGS
  selftests: fix too long argument
  selftests: allow detection of build failures
  Kernel selftests: tpm2: check for tpm support
  selftests/ftrace: Have pid filter test use instance flag
  selftests: fix spelling mistaked "chaigned" -> "chained"

dt-bindings: media: csi: Fix clocks description

Commit 1de243b07666 ("media: dt-bindings: media: sun4i-csi: Add compatible
for CSI1 on A10/A20") introduced support for the CSI1 controller on A10 and
A20 that unlike CSI0 doesn't have an ISP and therefore only have two
clocks, the bus and module clocks.

The clocks and clock-names properties have thus been modified to allow
either two or tree clocks. However, the current list has the ISP clock at
the second position, which means the bindings expects a list of either
bus and isp, or bus, isp and mod. The initial intent of the patch was
obviously to have bus and mod in the former case.

Let's fix the binding so that it validates properly.

Fixes: 1de243b07666 ("media: dt-bindings: media: sun4i-csi: Add compatible for CSI1 on A10/A20")
Signed-off-by: Maxime Ripard <[email protected]>
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: media: csi: Add interconnects properties

The Allwinner CSI controller is sitting beside the MBUS that is represented
as an interconnect.

Make sure that the interconnect properties are valid in the binding.

Fixes: 7866d6903ce8 ("media: dt-bindings: media: sun4i-csi: Add compatible for CSI0 on R40")
Signed-off-by: Maxime Ripard <[email protected]>
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: net: mdio: remove compatible string from example

Remove vendor specific compatible string from example, otherwise DT YAML
schemas validation may trigger warnings specific to TI ti,davinci_mdio
and not to the generic MDIO example.

For example, the "bus_freq" is required for davinci_mdio, but not required for
generic mdio example. As result following warning will be produced:
mdio.example.dt.yaml: mdio@5c030000: 'bus_freq' is a required property

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Rob Herring <[email protected]>

dt-bindings: memory-controller: Update example for Tegra124 EMC

The example in the Tegra124 EMC device tree binding looks like an old
version that doesn't contain all the required fields. Update it with a
version from the current DTS files to fix the make dt_binding_check
target.

Reported-by: Rob Herring <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
[robh: also fix missing '#reset-cells']
Signed-off-by: Rob Herring <[email protected]>

Merge tag 'drm-msm-fixes-2020-02-16' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

+ fix UBWC on GPU and display side for sc7180
+ fix DSI suspend/resume issue encountered on sc7180
+ fix some breakage on so called "linux-android" devices
(fallout from sc7180/a618 support, not seen earlier
due to bootloader/firmware differences)
+ couple other misc fixes

Signed-off-by: Dave Airlie <[email protected]>
From: Rob Clark <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/

Merge tag 'amd-drm-fixes-5.6-2020-02-19' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

amd-drm-fixes-5.6-2020-02-19:

amdgpu:
- HDCP fixes
- xclk fix for raven
- GFXOFF fixes

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2020-02-19

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 10 day(s) which contain
a total of 10 files changed, 93 insertions(+), 31 deletions(-).

The main changes are:

1) batched bpf hashtab fixes from Brian and Yonghong.

2) various selftests and libbpf fixes.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2020-02-19

This series contains fixes to the ice driver.

Brett fixes an issue where if a user sets an odd [tx|rx]-usecs value
through ethtool, the request is denied because the hardware is set to
have an ITR with 2us granularity. Also fix an issue where the VF has
not been completely removed/reset after being unbound from the host
driver, so resolve this by waiting for the VF remove/reset process to
happen before checking if the VF is disabled.

Michal fixes an issue, where when the user changes flow control via
ethtool, the OS is told the link is going down when that may not be the
case. Before the fix, the only way to get out of this state was to take
the interface down and up again.
====================

Signed-off-by: David S. Miller <[email protected]>

udp: rehash on disconnect

As of the below commit, udp sockets bound to a specific address can
coexist with one bound to the any addr for the same port.

The commit also phased out the use of socket hashing based only on
port (hslot), in favor of always hashing on {addr, port} (hslot2).

The change broke the following behavior with disconnect (AF_UNSPEC):

    server binds to 0.0.0.0:1337
    server connects to 127.0.0.1:80
    server disconnects
    client connects to 127.0.0.1:1337
    client sends "hello"
    server reads "hello" // times out, packet did not find sk

On connect the server acquires a specific source addr suitable for
routing to its destination. On disconnect it reverts to the any addr.

The connect call triggers a rehash to a different hslot2. On
disconnect, add the same to return to the original hslot2.

Skip this step if the socket is going to be unhashed completely.

Fixes: 4cdeeee9252a ("net: udp: prefer listeners bound to an address")
Reported-by: Pavel Roskin <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/tls: Fix to avoid gettig invalid tls record

Current code doesn't check if tcp sequence number is starting from (/after)
1st record's start sequnce number. It only checks if seq number is before
1st record's end sequnce number. This problem will always be a possibility
in re-transmit case. If a record which belongs to a requested seq number is
already deleted, tls_get_record will start looking into list and as per the
check it will look if seq number is before the end seq of 1st record, which
will always be true and will return 1st record always, it should in fact
return NULL.
As part of the fix, start looking each record only if the sequence number
lies in the list else return NULL.
There is one more check added, driver look for the start marker record to
handle tcp packets which are before the tls offload start sequence number,
hence return 1st record if the record is tls start marker and seq number is
before the 1st record's starting sequence number.

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Rohit Maheshwari <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

dt-bindings: mmc: omap-hsmmc: Fix SDIO interrupt

SDIO interrupt must be specified correctly as IRQ_TYPE_LEVEL_LOW instead of GPIO_ACTIVE_LOW.

Signed-off-by: Tomas Paukrt <[email protected]>
Signed-off-by: Rob Herring <[email protected]>

bpf: Fix a potential deadlock with bpf_map_do_batch

Commit 057996380a42 ("bpf: Add batch ops to all htab bpf map")
added lookup_and_delete batch operation for hash table.
The current implementation has bpf_lru_push_free() inside
the bucket lock, which may cause a deadlock.

syzbot reports:
   -> #2 (&htab->buckets[i].lock#2){....}:
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
       htab_lru_map_delete_node+0xce/0x2f0 kernel/bpf/hashtab.c:593
       __bpf_lru_list_shrink_inactive kernel/bpf/bpf_lru_list.c:220 [inline]
       __bpf_lru_list_shrink+0xf9/0x470 kernel/bpf/bpf_lru_list.c:266
       bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:340 [inline]
       bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline]
       bpf_lru_pop_free+0x87c/0x1670 kernel/bpf/bpf_lru_list.c:499
       prealloc_lru_pop+0x2c/0xa0 kernel/bpf/hashtab.c:132
       __htab_lru_percpu_map_update_elem+0x67e/0xa90 kernel/bpf/hashtab.c:1069
       bpf_percpu_hash_update+0x16e/0x210 kernel/bpf/hashtab.c:1585
       bpf_map_update_value.isra.0+0x2d7/0x8e0 kernel/bpf/syscall.c:181
       generic_map_update_batch+0x41f/0x610 kernel/bpf/syscall.c:1319
       bpf_map_do_batch+0x3f5/0x510 kernel/bpf/syscall.c:3348
       __do_sys_bpf+0x9b7/0x41e0 kernel/bpf/syscall.c:3460
       __se_sys_bpf kernel/bpf/syscall.c:3355 [inline]
       __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:3355
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

   -> #0 (&loc_l->lock){....}:
       check_prev_add kernel/locking/lockdep.c:2475 [inline]
       check_prevs_add kernel/locking/lockdep.c:2580 [inline]
       validate_chain kernel/locking/lockdep.c:2970 [inline]
       __lock_acquire+0x2596/0x4a00 kernel/locking/lockdep.c:3954
       lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4484
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
       bpf_common_lru_push_free kernel/bpf/bpf_lru_list.c:516 [inline]
       bpf_lru_push_free+0x250/0x5b0 kernel/bpf/bpf_lru_list.c:555
       __htab_map_lookup_and_delete_batch+0x8d4/0x1540 kernel/bpf/hashtab.c:1374
       htab_lru_map_lookup_and_delete_batch+0x34/0x40 kernel/bpf/hashtab.c:1491
       bpf_map_do_batch+0x3f5/0x510 kernel/bpf/syscall.c:3348
       __do_sys_bpf+0x1f7d/0x41e0 kernel/bpf/syscall.c:3456
       __se_sys_bpf kernel/bpf/syscall.c:3355 [inline]
       __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:3355
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Possible unsafe locking scenario:

          CPU0                    CPU2
          ----                    ----
     lock(&htab->buckets[i].lock#2);
                                  lock(&l->lock);
                                  lock(&htab->buckets[i].lock#2);
     lock(&loc_l->lock);

    *** DEADLOCK ***

To fix the issue, for htab_lru_map_lookup_and_delete_batch() in CPU0,
let us do bpf_lru_push_free() out of the htab bucket lock. This can
avoid the above deadlock scenario.

Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map")
Reported-by: [email protected]
Reported-by: [email protected]
Suggested-by: Hillf Danton <[email protected]>
Suggested-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Reviewed-by: Jakub Sitnicki <[email protected]>
Acked-by: Brian Vazquez <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf: Do not grab the bucket spinlock by default on htab batch ops

Grabbing the spinlock for every bucket even if it's empty, was causing
significant perfomance cost when traversing htab maps that have only a
few entries. This patch addresses the issue by checking first the
bucket_cnt, if the bucket has some entries then we go and grab the
spinlock and proceed with the batching.

Tested with a htab of size 50K and different value of populated entries.

Before:
  Benchmark             Time(ns)        CPU(ns)
  ---------------------------------------------
  BM_DumpHashMap/1       2759655        2752033
  BM_DumpHashMap/10      2933722        2930825
  BM_DumpHashMap/200     3171680        3170265
  BM_DumpHashMap/500     3639607        3635511
  BM_DumpHashMap/1000    4369008        4364981
  BM_DumpHashMap/5k     11171919       11134028
  BM_DumpHashMap/20k    69150080       69033496
  BM_DumpHashMap/39k   190501036      190226162

After:
  Benchmark             Time(ns)        CPU(ns)
  ---------------------------------------------
  BM_DumpHashMap/1        202707         200109
  BM_DumpHashMap/10       213441         210569
  BM_DumpHashMap/200      478641         472350
  BM_DumpHashMap/500      980061         967102
  BM_DumpHashMap/1000    1863835        1839575
  BM_DumpHashMap/5k      8961836        8902540
  BM_DumpHashMap/20k    69761497       69322756
  BM_DumpHashMap/39k   187437830      186551111

Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map")
Signed-off-by: Brian Vazquez <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

ice: Wait for VF to be reset/ready before configuration

The configuration/command below is failing when the VF in the xml
file is already bound to the host iavf driver.

pci_0000_af_0_0.xml:

<interface type='hostdev' managed='yes'>
<source>
<address type='pci' domain='0x0000' bus='0xaf' slot='0x0' function='0x0'/>
</source>
<mac address='00:de:ad:00:11:01'/>
</interface>

> virsh attach-device domain_name pci_0000_af_0_0.xml
error: Failed to attach device from pci_0000_af_0_0.xml
error: Cannot set interface MAC/vlanid to 00:de:ad:00:11:01/0 for
ifname ens1f1 vf 0: Device or resource busy

This is failing because the VF has not been completely removed/reset
after being unbound (via the virsh command above) from the host iavf
driver and ice_set_vf_mac() checks if the VF is disabled before waiting
for the reset to finish.

Fix this by waiting for the VF remove/reset process to happen before
checking if the VF is disabled. Also, since many functions for VF
administration on the PF were more or less calling the same 3 functions
(ice_wait_on_vf_reset(), ice_is_vf_disabled(), and ice_check_vf_init())
move these into the helper function ice_check_vf_ready_for_cfg(). Then
call this function in any flow that attempts to configure/query a VF
from the PF.

Lastly, increase the maximum wait time in ice_wait_on_vf_reset() to
800ms, and modify/add the #define(s) that determine the wait time.
This was done for robustness because in rare/stress cases VF removal can
take a max of ~800ms and previously the wait was a max of ~300ms.

Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

ice: Don't tell the OS that link is going down

Remove code that tell the OS that link is going down when user
change flow control via ethtool. When link is up it isn't certain
that link goes down after 0x0605 aq command. If link doesn't go
down, OS thinks that link is down, but physical link is up. To
reset this state user have to take interface down and up.

If link goes down after 0x0605 command, FW send information
about that and after that driver tells the OS that the link goes
down. So this code in ethtool is unnecessary.

Signed-off-by: Michal Swiatkowski <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

ice: Don't reject odd values of usecs set by user

Currently if a user sets an odd [tx|rx]-usecs value through ethtool,
the request is denied because the hardware is set to have an ITR
granularity of 2us. This caused poor customer experience. Fix this by
aligning to a register allowed value, which results in rounding down.
Also, print a once per ring container type message to be clear about
our intentions.

Also, change the ITR_TO_REG define to be the bitwise and of the ITR
setting and the ICE_ITR_MASK. This makes the purpose of ITR_TO_REG more
obvious.

Signed-off-by: Brett Creeley <[email protected]>
Tested-by: Andrew Bowers <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

bridge: br_stp: Use built-in RCU list checking

list_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nfc: pn544: Fix occasional HW initialization failure

The PN544 driver checks the "enable" polarity during of driver's probe and
it's doing that by turning ON and OFF NFC with different polarities until
enabling succeeds. It takes some time for the hardware to power-down, and
thus, to deassert the IRQ that is raised by turning ON the hardware.
Since the delay after last power-down of the polarity-checking process is
missed in the code, the interrupt may trigger immediately after installing
the IRQ handler (right after the checking is done), which results in IRQ
handler trying to touch the disabled HW and ends with marking NFC as
'DEAD' during of the driver's probe:

  pn544_hci_i2c 1-002a: NFC: nfc_en polarity : active high
  pn544_hci_i2c 1-002a: NFC: invalid len byte
  shdlc: llc_shdlc_recv_frame: NULL Frame -> link is dead

This patch fixes the occasional NFC initialization failure on Nexus 7
device.

Signed-off-by: Dmitry Osipenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF

Commit

  aaf248848db50 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired)
  performance counter")

added support for access to the free-running counter via 'perf -e
msr/irperf/', but when exercised, it always returns a 0 count:

BEFORE:

  $ perf stat -e instructions,msr/irperf/ true

   Performance counter stats for 'true':

             624,833      instructions
                   0      msr/irperf/

Simply set its enable bit - HWCR bit 30 - to make it start counting.

Enablement is restricted to all machines advertising IRPERF capability,
except those susceptible to an erratum that makes the IRPERF return
bad values.

That erratum occurs in Family 17h models 00-1fh [1], but not in F17h
models 20h and above [2].

AFTER (on a family 17h model 31h machine):

  $ perf stat -e instructions,msr/irperf/ true

   Performance counter stats for 'true':

             621,690      instructions
             622,490      msr/irperf/

[1] Revision Guide for AMD Family 17h Models 00h-0Fh Processors
[2] Revision Guide for AMD Family 17h Models 30h-3Fh Processors

The revision guides are available from the bugzilla Link below.

[ bp: Massage commit message. ]

Fixes: aaf248848db50 ("perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter")
Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: http://lkml.kernel.org/r/[email protected]

net: hsr: Pass lockdep expression to RCU lists

node_db is traversed using list_for_each_entry_rcu
outside an RCU read-side critical section but under the protection
of hsr->list_lock.

Hence, add corresponding lockdep expression to silence false-positive
warnings, and harden RCU lists.

Signed-off-by: Amol Grover <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'mlx5-fixes-2020-02-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2020-02-18

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

For -stable v5.3
('net/mlx5: Fix sleep while atomic in mlx5_eswitch_get_vepa')

For -stable v5.4
('net/mlx5: DR, Fix matching on vport gvmi')
('net/mlx5e: Fix crash in recovery flow without devlink reporter')

For -stable v5.5
('net/mlx5e: Reset RQ doorbell counter before moving RQ state from RST to RDY')
('net/mlx5e: Don't clear the whole vf config when switching modes')
====================

Signed-off-by: David S. Miller <[email protected]>

Merge tag 'iommu-fixes-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

- Compile warning fix for the Intel IOMMU driver

- Fix kdump boot with Intel IOMMU enabled and in passthrough mode

- Disable AMD IOMMU on a Laptop/Embedded platform because the delay it
   introduces in DMA transactions causes screen flickering there with 4k
   monitors

- Make domain_free function in QCOM IOMMU driver robust and not leak
   memory/dereference NULL pointers

- Fix ARM-SMMU module parameter prefix names

* tag 'iommu-fixes-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/arm-smmu: Restore naming of driver parameter prefix
  iommu/qcom: Fix bogus detach logic
  iommu/amd: Disable IOMMU on Stoney Ridge systems
  iommu/vt-d: Simplify check in identity_mapping()
  iommu/vt-d: Remove deferred_attach_domain()
  iommu/vt-d: Do deferred attachment in iommu_need_mapping()
  iommu/vt-d: Move deferred device attachment into helper function
  iommu/vt-d: Add attach_deferred() helper
  iommu/vt-d: Fix compile warning from intel-svm.h

Merge tag 'sound-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"The only largish change in this pull request is about the revert of
  the recent max98090 and its relevant patches due to regressions.

  Other than that, all small fixes for ALSA core (covering KCSAN fuzzer
  warnings in ALSA sequencer and rawmidi), Intel SOF HD-audio fixes, AMD
  ACP fixes, usual HD-audio quirks, and various ASoC fixes"

* tag 'sound-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda: Use scnprintf() for printing texts for sysfs/procfs
  ALSA: hda/realtek - Apply quirk for yet another MSI laptop
  ASoC: sun8i-codec: Fix setting DAI data format
  ALSA: hda/realtek - Apply quirk for MSI GP63, too
  ASoC: amd: ACP needs to be powered off in BIOS.
  ASoC: hdmi-codec: set plugged_cb to NULL when component removing
  ASoC: dapm: remove snd_soc_dapm_put_enum_double_locked
  ASoC: max98090: revert invalid fix for handling SHDN
  ALSA: rawmidi: Avoid bit fields for state flags
  ALSA: seq: Fix concurrent access to queue current tick/time
  ALSA: seq: Avoid concurrent access to queue flags
  ASoC: codec2codec: avoid invalid/double-free of pcm runtime
  ASoC: amd: Buffer Size instead of MAX Buffer
  ASoC: SOF: Intel: hda: move i915 init earlier
  ASoC: SOF: Intel: hda: fix ordering bug in resume flow
  ALSA: hda: do not override bus codec_mask in link_get()
  ASoC: atmel: fix atmel_ssc_set_audio link failure
  ASoC: fsl_sai: Fix exiting path on probing failure

nvme: Fix uninitialized-variable warning

gcc may detect a false positive on nvme using an unintialized variable
if setting features fails. Since this is not a fast path, explicitly
initialize this variable to suppress the warning.

Reported-by: Arnd Bergmann <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

s390/qdio: fill SBALEs with absolute addresses

sbale->addr holds an absolute address (or for some FCP usage, an opaque
request ID), and should only be used with proper virt/phys translation.

Signed-off-by: Julian Wiedmann <[email protected]>
Reviewed-by: Benjamin Block <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/qdio: fill SL with absolute addresses

As the comment says, sl->sbal holds an absolute address. qeth currently
solves this through wild casting, while zfcp doesn't care.

Handle this properly in the code that actually builds the SL.

Signed-off-by: Julian Wiedmann <[email protected]>
Reviewed-by: Alexandra Winter <[email protected]>
Reviewed-by: Steffen Maier <[email protected]> [for qdio]
Reviewed-by: Benjamin Block <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

x86/boot/compressed: Don't declare __force_order in kaslr_64.c

GCC 10 changed the default to -fno-common, which leads to

    LD      arch/x86/boot/compressed/vmlinux
  ld: arch/x86/boot/compressed/pgtable_64.o:(.bss+0x0): multiple definition of `__force_order'; \
    arch/x86/boot/compressed/kaslr_64.o:(.bss+0x0): first defined here
  make[2]: *** [arch/x86/boot/compressed/Makefile:119: arch/x86/boot/compressed/vmlinux] Error 1

Since __force_order is already provided in pgtable_64.c, there is no
need to declare __force_order in kaslr_64.c.

Signed-off-by: H.J. Lu <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

drm/amdgpu/display: clean up hdcp workqueue handling

Use the existence of the workqueue itself to determine when to
enable HDCP features rather than sprinkling asic checks all over
the code. Also add a check for the existence of the hdcp
workqueue in the irq handling on the off chance we get and HPD
RX interrupt with the CP bit set. This avoids a crash if
the driver doesn't support HDCP for a particular asic.

Fixes: 96a3b32e67236f ("drm/amd/display: only enable HDCP for DCN+")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206519
Reviewed-by: Bhawanpreet Lakha <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: add is_raven_kicker judgement for raven1

The rlc version of raven_kicer_rlc is different from the legacy rlc
version of raven_rlc. So it needs to add a judgement function for
raven_kicer_rlc and avoid disable GFXOFF when loading raven_kicer_rlc.

Signed-off-by: changzhu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

nvme-pci: Use single IRQ vector for old Apple models

People reported that old Apple machines are not working properly
if the non-first IRQ vector is in use.

Set quirk for that models to limit IRQ to use first vector only.

Based on original patch by GitHub user npx001.

Link: https://github.com/Dunedan/mbp-2016-linux/issues/9
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Leif Liddy <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

nvme/pci: Add sleep quirk for Samsung and Toshiba drives

The Samsung SSD SM981/PM981 and Toshiba SSD KBG40ZNT256G on the Lenovo
C640 platform experience runtime resume issues when the SSDs are kept in
sleep/suspend mode for long time.

This patch applies the 'Simple Suspend' quirk to these configurations.
With this patch, the issue had not been observed in a 1+ day test.

Reviewed-by: Jon Derrick <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Shyjumon N <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

arm64: memory: Add missing brackets to untagged_addr() macro

Add brackets around the evaluation of the 'addr' parameter to the
untagged_addr() macro so that the cast to 'u64' applies to the result
of the expression.

Cc: <[email protected]>
Fixes: 597399d0cb91 ("arm64: tags: Preserve tags for addresses translated via TTBR1")
Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

s390: remove obsolete ieee_emulation_warnings

s390 math emulation was removed with commit 5a79859ae0f3 ("s390:
remove 31 bit support"), rendering ieee_emulation_warnings useless.
The code still built because it was protected by CONFIG_MATHEMU, which
was no longer selectable.

This patch removes the sysctl_ieee_emulation_warnings declaration and
the sysctl entry declaration.

Link: https://lkml.kernel.org/r/[email protected]
Reviewed-by: Vasily Gorbik <[email protected]>
Signed-off-by: Stephen Kitt <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

iommu/arm-smmu: Restore naming of driver parameter prefix

Extending the Arm SMMU driver to allow for modular builds changed
KBUILD_MODNAME to be "arm_smmu_mod" so that a single module could be
built from the multiple existing object files without the need to rename
any source files.

This inadvertently changed the name of the driver parameters, which may
lead to runtime issues if bootloaders are relying on the old names for
correctness (e.g. "arm-smmu.disable_bypass=0").

Although MODULE_PARAM_PREFIX can be overridden to restore the old naming
for builtin parameters, only the new name is matched by modprobe and so
loading the driver as a module would cause parameters specified on the
kernel command line to be ignored. Instead, rename "arm_smmu_mod" to
"arm_smmu". Whilst it's a bit of a bodge, this allows us to create a
single module without renaming any files and makes use of the fact that
underscores and hyphens can be used interchangeably in parameter names.

Cc: Robin Murphy <[email protected]>
Cc: Russell King <[email protected]>
Reported-by: Li Yang <[email protected]>
Fixes: cd221bd24ff5 ("iommu/arm-smmu: Allow building as a module")
Signed-off-by: Will Deacon <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

iommu/qcom: Fix bogus detach logic

Currently, the implementation of qcom_iommu_domain_free() is guaranteed
to do one of two things: WARN() and leak everything, or dereference NULL
and crash. That alone is terrible, but in fact the whole idea of trying
to track the liveness of a domain via the qcom_domain->iommu pointer as
a sanity check is full of fundamentally flawed assumptions. Make things
robust and actually functional by not trying to be quite so clever.

Reported-by: Brian Masney <[email protected]>
Tested-by: Brian Masney <[email protected]>
Reported-by: Naresh Kamboju <[email protected]>
Fixes: 0ae349a0f33f ("iommu/qcom: Add qcom_iommu")
Signed-off-by: Robin Murphy <[email protected]>
Tested-by: Stephan Gerhold <[email protected]>
Cc: [email protected] # v4.14+
Signed-off-by: Joerg Roedel <[email protected]>

iommu/amd: Disable IOMMU on Stoney Ridge systems

Serious screen flickering when Stoney Ridge outputs to a 4K monitor.

Use identity-mapping and PCI ATS doesn't help this issue.

According to Alex Deucher, IOMMU isn't enabled on Windows, so let's do
the same here to avoid screen flickering on 4K monitor.

Cc: Alex Deucher <[email protected]>
Bug: https://gitlab.freedesktop.org/drm/amd/issues/961
Signed-off-by: Kai-Heng Feng <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

net/mlx5: DR, Handle reformat capability over sw-steering tables

On flow table creation, send the relevant flags according to what the FW
currently supports.
When FW doesn't support reformat option over SW-steering managed table,
the driver shouldn't pass this.

Fixes: 988fd6b32d07 ("net/mlx5: DR, Pass table flags at creation to lower layer")
Signed-off-by: Erez Shitrit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Fix lowest FDB pool size

The pool sizes represent the pool sizes in the fw. when we request
a pool size from fw, it will return the next possible group.
We track how many pools the fw has left and start requesting groups
from the big to the small.
When we start request 4k group, which doesn't exists in fw, fw
wants to allocate the next possible size, 64k, but will fail since
its exhausted. The correct smallest pool size in fw is 128 and not 4k.

Fixes: 39ac237ce009 ("net/mlx5: E-Switch, Refactor chains and priorities")
Signed-off-by: Paul Blakey <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Don't clear the whole vf config when switching modes

There is no need to reset all vf config (except link state) between
legacy and switchdev modes changes.
Also, set link state to AUTO, when legacy enabled.

Fixes: 3b83b6c2e024 ("net/mlx5e: Clear VF config when switching modes")
Signed-off-by: Dmytro Linkin <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: DR, Fix matching on vport gvmi

Set vport gvmi in the tag, only when source gvmi is set in the bit mask.

Fixes: 26d688e3 ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Hamdan Igbaria <[email protected]>
Reviewed-by: Alex Vesker <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Fix crash in recovery flow without devlink reporter

When health reporters are not supported, recovery function is invoked
directly, not via devlink health reporters.

In this direct flow, the recover function input parameter was passed
incorrectly and is causing a kernel oops. This patch is fixing the input
parameter.

Following call trace is observed on rx error health reporting.

Internal error: Oops: 96000007 [#1] PREEMPT SMP
Process kworker/u16:4 (pid: 4584, stack limit = 0x00000000c9e45703)
Call trace:
mlx5e_rx_reporter_err_rq_cqe_recover+0x30/0x164 [mlx5_core]
mlx5e_health_report+0x60/0x6c [mlx5_core]
mlx5e_reporter_rq_cqe_err+0x6c/0x90 [mlx5_core]
mlx5e_rq_err_cqe_work+0x20/0x2c [mlx5_core]
process_one_work+0x168/0x3d0
worker_thread+0x58/0x3d0
kthread+0x108/0x134

Fixes: c50de4af1d63 ("net/mlx5e: Generalize tx reporter's functionality")
Signed-off-by: Aya Levin <[email protected]>
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5e: Reset RQ doorbell counter before moving RQ state from RST to RDY

Initialize RQ doorbell counters to zero prior to moving an RQ from RST
to RDY state. Per HW spec, when RQ is back to RDY state, the descriptor
ID on the completion is reset. The doorbell record must comply.

Fixes: 8276ea1353a4 ("net/mlx5e: Report and recover from CQE with error on RQ")
Signed-off-by: Aya Levin <[email protected]>
Reported-by: Tariq Toukan <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

net/mlx5: Fix sleep while atomic in mlx5_eswitch_get_vepa

rtnl_bridge_getlink is protected by rcu lock, so mlx5_eswitch_get_vepa
cannot take mutex lock. Two possible issues can happen:
1. User at the same time change vepa mode via RTM_SETLINK command.
2. User at the same time change the switchdev mode via devlink netlink
interface.

Case 1 cannot happen because rtnl executes one message in order.
Case 2 can happen but we do not expect user to change the switchdev mode
when changing vepa. Even if a user does it, so he will read a value
which is no longer valid.

Fixes: 8da202b24913 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
Signed-off-by: Huy Nguyen <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>

io_uring: fix use-after-free by io_cleanup_req()

io_cleanup_req() should be called before req->io is freed, and so
shouldn't be after __io_free_req() -> __io_req_aux_free(). Also,
it will be ignored for in io_free_req_many(), which use
__io_req_aux_free().

Place cleanup_req() into __io_req_aux_free().

Fixes: 99bc4c38537d774 ("io_uring: fix iovec leaks")
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

This batch contains Netfilter fixes for net:

1) Restrict hashlimit size to 1048576, from Cong Wang.

2) Check for offload flags from nf_flow_table_offload_setup(),
   this fixes a crash in case the hardware offload is disabled.
   From Florian Westphal.

3) Three preparation patches to extend the conntrack clash resolution,
   from Florian.

4) Extend clash resolution to deal with DNS packets from the same flow
   racing to set up the NAT configuration.

5) Small documentation fix in pipapo, from Stefano Brivio.

6) Remove misleading unlikely() from pipapo_refill(), also from Stefano.

7) Reduce hashlimit mutex scope, from Cong Wang. This patch is actually
   triggering another problem, still under discussion, another patch to
   fix this will follow up.
====================

Signed-off-by: David S. Miller <[email protected]>

Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents

In btrfs_wait_ordered_range() once we find an ordered extent that has
finished with an error we exit the loop and don't wait for any other
ordered extents that might be still in progress.

All the users of btrfs_wait_ordered_range() expect that there are no more
ordered extents in progress after that function returns. So past fixes
such like the ones from the two following commits:

  ff612ba7849964 ("btrfs: fix panic during relocation after ENOSPC before
                   writeback happens")

  28aeeac1dd3080 ("Btrfs: fix panic when starting bg cache writeout after
                   IO error")

don't work when there are multiple ordered extents in the range.

Fix that by making btrfs_wait_ordered_range() wait for all ordered extents
even after it finds one that had an error.

Link: https://github.com/kdave/btrfs-progs/issues/228#issuecomment-569777554
CC: [email protected] # 4.4+
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: fix bytes_may_use underflow in prealloc error condtition

I hit the following warning while running my error injection stress
testing:

  WARNING: CPU: 3 PID: 1453 at fs/btrfs/space-info.h:108 btrfs_free_reserved_data_space_noquota+0xfd/0x160 [btrfs]
  RIP: 0010:btrfs_free_reserved_data_space_noquota+0xfd/0x160 [btrfs]
  Call Trace:
  btrfs_free_reserved_data_space+0x4f/0x70 [btrfs]
  __btrfs_prealloc_file_range+0x378/0x470 [btrfs]
  elfcorehdr_read+0x40/0x40
  ? elfcorehdr_read+0x40/0x40
  ? btrfs_commit_transaction+0xca/0xa50 [btrfs]
  ? dput+0xb4/0x2a0
  ? btrfs_log_dentry_safe+0x55/0x70 [btrfs]
  ? btrfs_sync_file+0x30e/0x420 [btrfs]
  ? do_fsync+0x38/0x70
  ? __x64_sys_fdatasync+0x13/0x20
  ? do_syscall_64+0x5b/0x1b0
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happens if we fail to insert our reserved file extent.  At this
point we've already converted our reservation from ->bytes_may_use to
->bytes_reserved.  However once we break we will attempt to free
everything from [cur_offset, end] from ->bytes_may_use, but our extent
reservation will overlap part of this.

Fix this problem by adding ins.offset (our extent allocation size) to
cur_offset so we remove the actual remaining part from ->bytes_may_use.

I validated this fix using my inject-error.py script

python inject-error.py -o should_fail_bio -t cache_save_setup -t \
__btrfs_prealloc_file_range \
-t insert_reserved_file_extent.constprop.0 \
-r "-5" ./run-fsstress.sh

where run-fsstress.sh simply mounts and runs fsstress on a disk.

CC: [email protected] # 4.4+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: handle logged extent failure properly

If we're allocating a logged extent we attempt to insert an extent
record for the file extent directly.  We increase
space_info->bytes_reserved, because the extent entry addition will call
btrfs_update_block_group(), which will convert the ->bytes_reserved to
->bytes_used.  However if we fail at any point while inserting the
extent entry we will bail and leave space on ->bytes_reserved, which
will trigger a WARN_ON() on umount.  Fix this by pinning the space if we
fail to insert, which is what happens in every other failure case that
involves adding the extent entry.

CC: [email protected] # 5.4+
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: do not check delayed items are empty for single transaction cleanup

btrfs_assert_delayed_root_empty() will check if the delayed root is
completely empty, but this is a filesystem-wide check. On cleanup we
may have allowed other transactions to begin, for whatever reason, and
thus the delayed root is not empty.

So remove this check from cleanup_one_transation(). This however can
stay in btrfs_cleanup_transaction(), because it checks only after all of
the transactions have been properly cleaned up, and thus is valid.

CC: [email protected] # 4.4+
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: reset fs_root to NULL on error in open_ctree

While running my error injection script I hit a panic when we tried to
clean up the fs_root when freeing the fs_root. This is because
fs_info->fs_root == PTR_ERR(-EIO), which isn't great. Fix this by
setting fs_info->fs_root = NULL; if we fail to read the root.

CC: [email protected] # 4.4+
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: destroy qgroup extent records on transaction abort

We clean up the delayed references when we abort a transaction but we
leave the pending qgroup extent records behind, leaking memory.

This patch destroys the extent records when we destroy the delayed refs
and makes sure ensure they're gone before releasing the transaction.

Fixes: 3368d001ba5d ("btrfs: qgroup: Record possible quota-related extent for qgroup.")
CC: [email protected] # 4.4+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Jeff Mahoney <[email protected]>
[ Rebased to latest upstream, remove to_qgroup() helper, use
rbtree_postorder_for_each_entry_safe() wrapper ]
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

powerpc/entry: Fix an #if which should be an #ifdef in entry_32.S

Fixes: 12c3f1fd87bf ("powerpc/32s: get rid of CPU_FTR_601 feature")
Cc: [email protected] # v5.4+
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/a99fc0ad65b87a1ba51cfa3e0e9034ee294c3e07.1582034961.git.christophe.leroy@c-s.fr

Merge tag 'dma-mapping-5.6' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

- give command line cma= precedence over the CONFIG_ option (Nicolas
   Saenz Julienne)

- always allow 32-bit DMA, even for weirdly placed ZONE_DMA

- improve the debug printks when memory is not addressable, to help
   find problems with swiotlb initialization

* tag 'dma-mapping-5.6' of git://git.infradead.org/users/hch/dma-mapping:
  dma-direct: improve DMA mask overflow reporting
  dma-direct: improve swiotlb error reporting
  dma-direct: relax addressability checks in dma_direct_supported
  dma-contiguous: CMA: give precedence to cmdline

Merge tag 'tpmdd-next-20200217' of git://git.infradead.org/users/jjs/linux-tpmdd

Pull tpm fixes from Jarkko Sakkinen:
"Two bug fixes"

* tag 'tpmdd-next-20200217' of git://git.infradead.org/users/jjs/linux-tpmdd:
tpm: Initialize crypto_id of allocated_banks to HASH_ALGO__LAST
tpm: Revert tpm_tis_spi_mod.ko to tpm_tis_spi.ko.

pipe: make sure to wake up everybody when the last reader/writer closes

Andrei Vagin reported that commit 0ddad21d3e99 ("pipe: use exclusive
waits when reading or writing") broke one of the CRIU tests.  He even
has a trivial reproducer:

    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main()
    {
            int p[2];
            pid_t p1, p2;
            int status;

            if (pipe(p) == -1)
                    return 1;

            p1 = fork();
            if (p1 == 0) {
                    close(p[1]);
                    read(p[0], &status, sizeof(status));
                    return 0;
            }
            p2 = fork();
            if (p2 == 0) {
                    close(p[1]);
                    read(p[0], &status, sizeof(status));
                    return 0;
            }
            sleep(1);
            close(p[1]);
            wait(&status);
            wait(&status);

            return 0;
    }

and the problem - once he points it out - is obvious.  We use these nice
exclusive waits, but when the last writer goes away, it then needs to
wake up _every_ reader (and conversely, the last reader disappearing
needs to wake every writer, of course).

In fact, when going through this, we had several small oddities around
how to wake things.  We did in fact wake every reader when we changed
the size of the pipe buffers.  But that's entirely pointless, since that
just acts as a possible source of new space - no new data to read.

And when we change the size of the buffer, we don't need to wake all
writers even when we add space - that case acts just as if somebody made
space by reading, and any writer that finds itself not filling it up
entirely will wake the next one.

On the other hand, on the exit path, we tried to limit the wakeups with
the proper poll keys etc, which is entirely pointless, because at that
point we obviously need to wake up everybody.  So don't do that: just
wake up everybody - but only do that if the counts changed to zero.

So fix those non-IO wakeups to be more proper: space change doesn't add
any new data, but it might make room for writers, so it wakes up a
writer.  And the actual changes to reader/writer counts should wake up
everybody, since everybody is affected (ie readers will all see EOF if
the writers have gone away, and writers will all get EPIPE if all
readers have gone away).

Fixes: 0ddad21d3e99 ("pipe: use exclusive waits when reading or writing")
Reported-and-tested-by: Andrei Vagin <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

netfilter: nft_set_pipapo: Don't abuse unlikely() in pipapo_refill()

I originally used unlikely() in the if (match_only) clause, which
we hit on the mapping table for the last field in a set, to ensure
we avoid branching to the rest of for loop body, which is executed
more frequently.

However, Pablo reports, this is confusing as it gives the impression
that this is not a common case, and it's actually not the intended
usage of unlikely().

I couldn't observe any statistical difference in matching rates on
x864_64 and aarch64 without it, so just drop it.

Reported-by: Pablo Neira Ayuso <[email protected]>
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Stefano Brivio <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nft_set_pipapo: Fix mapping table example in comments

In both insertion and lookup examples, the two element pointers
of rule mapping tables were swapped. Fix that.

Reported-by: Pablo Neira Ayuso <[email protected]>
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Stefano Brivio <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

flow_table.c: Use built-in RCU list checking

hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

datapath.c: Use built-in RCU list checking

hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vport.c: Use built-in RCU list checking

hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

meter.c: Use built-in RCU list checking

hlist_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netlabel_domainhash.c: Use built-in RCU list checking

list_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: netlabel: Use built-in RCU list checking

list_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled
by default.

Signed-off-by: Madhuparna Bhowmik <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Revert "net: dev: introduce support for sch BYPASS for lockless qdisc"

This reverts commit ba27b4cdaaa66561aaedb2101876e563738d36fe

Ahmed reported ouf-of-order issues bisected to commit ba27b4cdaaa6
("net: dev: introduce support for sch BYPASS for lockless qdisc").
I can't find any working solution other than a plain revert.

This will introduce some minor performance regressions for
pfifo_fast qdisc. I plan to address them in net-next with more
indirect call wrapper boilerplate for qdiscs.

Reported-by: Ahmad Fatoum <[email protected]>
Fixes: ba27b4cdaaa6 ("net: dev: introduce support for sch BYPASS for lockless qdisc")
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: broadcom: Fix a typo ("firsly")

Signed-off-by: Jonathan Neuschäfer <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: fix bogus socket flag values

Dan Carpenter reports static checker warnings due to bogus BIT() usage:

net/mptcp/subflow.c:571 subflow_write_space() warn: test_bit() takes a bit number
net/mptcp/subflow.c:694 subflow_state_change() warn: test_bit() takes a bit number
net/mptcp/protocol.c:261 ssk_check_wmem() warn: test_bit() takes a bit number
[..]

This is harmless (we use bits 1 & 2 instead of 0 and 1), but would
break eventually when adding BIT(5) (or 6, depends on size of 'long').

Just use 0 and 1, the values are only passed to test/set/clear_bit
functions.

Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qede: Fix race between rdma destroy workqueue and link change event

If an event is added while the rdma workqueue is being destroyed
it could lead to several races, list corruption, null pointer
dereference during queue_work or init_queue.
This fixes the race between the two flows which can occur during
shutdown.

A kref object and a completion object are added to the rdma_dev
structure, these are initialized before the workqueue is created.
The refcnt is used to indicate work is being added to the
workqueue and ensures the cleanup flow won't start while we're in
the middle of adding the event.
Once the work is added, the refcnt is decreased and the cleanup flow
is safe to run.

Fixes: cee9fbd8e2e ("qede: Add qedr framework")
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'thunderbolt-fix-for-v5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-linus

Mika writes:

thunderbolt: Fix for v5.6-rc3

Single fix that orders the THUNDERBOLT MAINTAINERS record according to
parse-maintainers.pl.

* tag 'thunderbolt-fix-for-v5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
MAINTAINERS: Sort entries in database for THUNDERBOLT

io_uring: remove unnecessary NULL checks

The "kmsg" pointer can't be NULL and we have already dereferenced it so
a check here would be useless.

Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

arm64: lse: Fix LSE atomics with LLVM

Commit e0d5896bd356 ("arm64: lse: fix LSE atomics with LLVM's integrated
assembler") broke the build when clang is used in connjunction with the
binutils assembler ("-no-integrated-as"). This happens because
__LSE_PREAMBLE is defined as ".arch armv8-a+lse", which overrides the
version of the CPU architecture passed via the "-march" paramter to gas:

$ aarch64-none-linux-gnu-as -EL -I ./arch/arm64/include
                                -I ./arch/arm64/include/generated
                                -I ./include -I ./include
                                -I ./arch/arm64/include/uapi
                                -I ./arch/arm64/include/generated/uapi
                                -I ./include/uapi -I ./include/generated/uapi
                                -I ./init -I ./init
                                -march=armv8.3-a -o init/do_mounts.o
                                /tmp/do_mounts-d7992a.s
/tmp/do_mounts-d7992a.s: Assembler messages:
/tmp/do_mounts-d7992a.s:1959: Error: selected processor does not support `autiasp'
/tmp/do_mounts-d7992a.s:2021: Error: selected processor does not support `paciasp'
/tmp/do_mounts-d7992a.s:2157: Error: selected processor does not support `autiasp'
/tmp/do_mounts-d7992a.s:2175: Error: selected processor does not support `paciasp'
/tmp/do_mounts-d7992a.s:2494: Error: selected processor does not support `autiasp'

Fix the issue by replacing ".arch armv8-a+lse" with ".arch_extension lse".
Sami confirms that the clang integrated assembler does now support the
'.arch_extension' directive, so this change will be fine even for LTO
builds in future.

Fixes: e0d5896bd356cd ("arm64: lse: fix LSE atomics with LLVM's integrated assembler")
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Reported-by: Amit Kachhap <[email protected]>
Tested-by: Sami Tolvanen <[email protected]>
Signed-off-by: Vincenzo Frascino <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

iommu/vt-d: Simplify check in identity_mapping()

The function only has one call-site and there it is never called with
dummy or deferred devices. Simplify the check in the function to
account for that.

Fixes: 1ee0186b9a12 ("iommu/vt-d: Refactor find_domain() helper")
Cc: [email protected] # v5.5
Reviewed-by: Jerry Snitselaar <[email protected]>
Acked-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

iommu/vt-d: Remove deferred_attach_domain()

The function is now only a wrapper around find_domain(). Remove the
function and call find_domain() directly at the call-sites.

Fixes: 1ee0186b9a12 ("iommu/vt-d: Refactor find_domain() helper")
Cc: [email protected] # v5.5
Reviewed-by: Jerry Snitselaar <[email protected]>
Acked-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

iommu/vt-d: Do deferred attachment in iommu_need_mapping()

The attachment of deferred devices needs to happen before the check
whether the device is identity mapped or not. Otherwise the check will
return wrong results, cause warnings boot failures in kdump kernels, like

WARNING: CPU: 0 PID: 318 at ../drivers/iommu/intel-iommu.c:592 domain_get_iommu+0x61/0x70

[...]

Call Trace:
  __intel_map_single+0x55/0x190
  intel_alloc_coherent+0xac/0x110
  dmam_alloc_attrs+0x50/0xa0
  ahci_port_start+0xfb/0x1f0 [libahci]
  ata_host_start.part.39+0x104/0x1e0 [libata]

With the earlier check the kdump boot succeeds and a crashdump is written.

Fixes: 1ee0186b9a12 ("iommu/vt-d: Refactor find_domain() helper")
Cc: [email protected] # v5.5
Reviewed-by: Jerry Snitselaar <[email protected]>
Acked-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

iommu/vt-d: Move deferred device attachment into helper function

Move the code that does the deferred device attachment into a separate
helper function.

Fixes: 1ee0186b9a12 ("iommu/vt-d: Refactor find_domain() helper")
Cc: [email protected] # v5.5
Reviewed-by: Jerry Snitselaar <[email protected]>
Acked-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

iommu/vt-d: Add attach_deferred() helper

Implement a helper function to check whether a device's attach process
is deferred.

Fixes: 1ee0186b9a12 ("iommu/vt-d: Refactor find_domain() helper")
Cc: [email protected] # v5.5
Reviewed-by: Jerry Snitselaar <[email protected]>
Acked-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>

libbpf: Sanitise internal map names so they are not rejected by the kernel

The kernel only accepts map names with alphanumeric characters, underscores
and periods in their name. However, the auto-generated internal map names
used by libbpf takes their prefix from the user-supplied BPF object name,
which has no such restriction. This can lead to "Invalid argument" errors
when trying to load a BPF program using global variables.

Fix this by sanitising the map names, replacing any non-allowed characters
with underscores.

Fixes: d859900c4c56 ("bpf, libbpf: support global data/bss/rodata sections")
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

bpf, uapi: Remove text about bpf_redirect_map() giving higher performance

The performance of bpf_redirect() is now roughly the same as that of
bpf_redirect_map(). However, David Ahern pointed out that the header file
has not been updated to reflect this, and still says that a significant
performance increase is possible when using bpf_redirect_map(). Remove this
text from the bpf_redirect_map() description, and reword the description in
bpf_redirect() slightly. Also fix the 'Return' section of the
bpf_redirect_map() documentation.

Fixes: 1d233886dd90 ("xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths")
Reported-by: David Ahern <[email protected]>
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

ima: add sm3 algorithm to hash algorithm configuration list

sm3 has been supported by the ima hash algorithm, but it is not
yet in the Kconfig configuration list. After adding, both ima and tpm2
can support sm3 well.

Signed-off-by: Tianjia Zhang <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>

crypto: rename sm3-256 to sm3 in hash_algo_name

The name sm3-256 is defined in hash_algo_name in hash_info, but the
algorithm name implemented in sm3_generic.c is sm3, which will cause
the sm3-256 algorithm to be not found in some application scenarios of
the hash algorithm, and an ENOENT error will occur. For example,
IMA, keys, and other subsystems that reference hash_algo_name all use
the hash algorithm of sm3.

Fixes: 5ca4c20cfd37 ("keys, trusted: select hash algorithm for TPM2 chips")
Signed-off-by: Tianjia Zhang <[email protected]>
Reviewed-by: Pascal van Leeuwen <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>

efi: Only print errors about failing to get certs if EFI vars are found

If CONFIG_LOAD_UEFI_KEYS is enabled, the kernel attempts to load the certs
from the db, dbx and MokListRT EFI variables into the appropriate keyrings.

But it just assumes that the variables will be present and prints an error
if the certs can't be loaded, even when is possible that the variables may
not exist. For example the MokListRT variable will only be present if shim
is used.

So only print an error message about failing to get the certs list from an
EFI variable if this is found. Otherwise these printed errors just pollute
the kernel log ring buffer with confusing messages like the following:

[    5.427251] Couldn't get size: 0x800000000000000e
[    5.427261] MODSIGN: Couldn't get UEFI db list
[    5.428012] Couldn't get size: 0x800000000000000e
[    5.428023] Couldn't get UEFI MokListRT

Reported-by: Hans de Goede <[email protected]>
Signed-off-by: Javier Martinez Canillas <[email protected]>
Tested-by: Hans de Goede <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Mimi Zohar <[email protected]>

Merge tag 'asoc-fix-v5.6-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v5.6

A few fixes sent in since the merge window, none of them with global
impact but all important for the users they affect.

powerpc/xmon: Fix whitespace handling in getstring()

The ls (lookup symbol) and zr (reboot) commands use xmon's getstring()
helper to read a string argument from the xmon prompt. This function
skips over leading whitespace, but doesn't check if the first
"non-whitespace" character is a newline which causes some odd
behaviour (<enter> indicates a the enter key was pressed):

  0:mon> ls printk<enter>
  printk: c0000000001680c4

  0:mon> ls<enter>
  printk<enter>
  Symbol '
  printk' not found.
  0:mon>

With commit 2d9b332d99b ("powerpc/xmon: Allow passing an argument to
ppc_md.restart()") we have a similar problem with the zr command.
Previously zr took no arguments so "zr<enter> would trigger a reboot.
With that patch applied a second newline needs to be sent in order for
the reboot to occur. Fix this by checking if the leading whitespace
ended on a newline:

  0:mon> ls<enter>
  Symbol '' not found.

Fixes: 2d9b332d99b2 ("powerpc/xmon: Allow passing an argument to ppc_md.restart()")
Reported-by: Michael Ellerman <[email protected]>
Signed-off-by: Oliver O'Halloran <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

powerpc/6xx: Fix power_save_ppc32_restore() with CONFIG_VMAP_STACK

power_save_ppc32_restore() is called during exception entry, before
re-enabling the MMU. It substracts KERNELBASE from the address
of nap_save_msscr0 to access it.

With CONFIG_VMAP_STACK enabled, data MMU translation has already been
re-enabled, so power_save_ppc32_restore() has to access
nap_save_msscr0 by its virtual address.

Reported-by: Larry Finger <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Tested-by: Larry Finger <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/7bce32ccbab3ba3e3e0f27da6961bf6313df97ed.1581663140.git.christophe.leroy@c-s.fr

powerpc/chrp: Fix enter_rtas() with CONFIG_VMAP_STACK

With CONFIG_VMAP_STACK, data MMU has to be enabled
to read data on the stack.

Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/d2330584f8c42d3039896e2b56f5d39676dc919c.1581669558.git.christophe.leroy@c-s.fr

powerpc/32s: Fix DSI and ISI exceptions for CONFIG_VMAP_STACK

hash_page() needs to read page tables from kernel memory. When entire
kernel memory is mapped by BATs, which is normally the case when
CONFIG_STRICT_KERNEL_RWX is not set, it works even if the page hosting
the page table is not referenced in the MMU hash table.

However, if the page where the page table resides is not covered by
a BAT, a DSI fault can be encountered from hash_page(), and it loops
forever. This can happen when CONFIG_STRICT_KERNEL_RWX is selected
and the alignment of the different regions is too small to allow
covering the entire memory with BATs. This also happens when
CONFIG_DEBUG_PAGEALLOC is selected or when booting with 'nobats'
flag.

Also, if the page containing the kernel stack is not present in the
MMU hash table, registers cannot be saved and a recursive DSI fault
is encountered.

To allow hash_page() to properly do its job at all time and load the
MMU hash table whenever needed, it must run with data MMU disabled.
This means it must be called before re-enabling data MMU. To allow
this, registers clobbered by hash_page() and create_hpte() have to
be saved in the thread struct together with SRR0, SSR1, DAR and DSISR.
It is also necessary to ensure that DSI prolog doesn't overwrite
regs saved by prolog of the current running exception. That means:
- DSI can only use SPRN_SPRG_SCRATCH0
- Exceptions must free SPRN_SPRG_SCRATCH0 before writing to the stack.

This also fixes the Oops reported by Erhard when create_hpte() is
called by add_hash_page().

Due to prolog size increase, a few more exceptions had to get split
in two parts.

Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Reported-by: Erhard F. <[email protected]>
Signed-off-by: Christophe Leroy <[email protected]>
Tested-by: Erhard F. <[email protected]>
Tested-by: Larry Finger <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206501
Link: https://lore.kernel.org/r/64a4aa44686e9fd4b01333401367029771d9b231.1581761633.git.christophe.leroy@c-s.fr

powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery

After a treclaim, we expect to be in non-transactional state. If we
don't clear the current thread's MSR[TS] before we get preempted, then
tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in
suspended transaction state.

When handling a signal caught in transactional state,
handle_rt_signal64() calls get_tm_stackpointer() that treclaims the
transaction using tm_reclaim_current() but without clearing the
thread's MSR[TS]. This can cause the TM Bad Thing exception below if
later we pagefault and get preempted trying to access the user's
sigframe, using __put_user(). Afterwards, when we are rescheduled back
into do_page_fault() (but now in suspended state since the thread's
MSR[TS] was not cleared), upon executing 'rfid' after completion of
the page fault handling, the exception is raised because a transition
from suspended to non-transactional state is invalid.

  Unexpected TM Bad Thing exception at c00000000000de44 (msr 0x8000000302a03031) tm_scratch=800000010280b033
  Oops: Unrecoverable exception, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32
  NIP:  c00000000000de44 LR: c000000000034728 CTR: 0000000000000000
  REGS: c00000003fe7bd70 TRAP: 0700   Not tainted  (5.4.0-rc2)
  MSR:  8000000302a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[SE]>  CR: 44000884  XER: 00000000
  CFAR: c00000000000dda4 IRQMASK: 0
  PACATMSCRATCH: 800000010280b033
  GPR00: c000000000034728 c000000f65a17c80 c000000001662800 00007fffacf3fd78
  GPR04: 0000000000001000 0000000000001000 0000000000000000 c000000f611f8af0
  GPR08: 0000000000000000 0000000078006001 0000000000000000 000c000000000000
  GPR12: c000000f611f84b0 c00000003ffcb200 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000f611f8140
  GPR24: 0000000000000000 00007fffacf3fd68 c000000f65a17d90 c000000f611f7800
  GPR28: c000000f65a17e90 c000000f65a17e90 c000000001685e18 00007fffacf3f000
  NIP [c00000000000de44] fast_exception_return+0xf4/0x1b0
  LR [c000000000034728] handle_rt_signal64+0x78/0xc50
  Call Trace:
  [c000000f65a17c80] [c000000000034710] handle_rt_signal64+0x60/0xc50 (unreliable)
  [c000000f65a17d30] [c000000000023640] do_notify_resume+0x330/0x460
  [c000000f65a17e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74
  Instruction dump:
  7c4ff120 e8410170 7c5a03a6 38400000 f8410060 e8010070 e8410080 e8610088
  60000000 60000000 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed0989
  ---[ end trace 93094aa44b442f87 ]---

The simplified sequence of events that triggers the above exception is:

  ... # userspace in NON-TRANSACTIONAL state
  tbegin # userspace in TRANSACTIONAL state
  signal delivery # kernelspace in SUSPENDED state
  handle_rt_signal64()
    get_tm_stackpointer()
      treclaim # kernelspace in NON-TRANSACTIONAL state
    __put_user()
      page fault happens. We will never get back here because of the TM Bad Thing exception.

  page fault handling kicks in and we voluntarily preempt ourselves
  do_page_fault()
    __schedule()
      __switch_to(other_task)

  our task is rescheduled and we recheckpoint because the thread's MSR[TS] was not cleared
  __switch_to(our_task)
    switch_to_tm()
      tm_recheckpoint_new_task()
        trechkpt # kernelspace in SUSPENDED state

  The page fault handling resumes, but now we are in suspended transaction state
  do_page_fault()    completes
  rfid     <----- trying to get back where the page fault happened (we were non-transactional back then)
  TM Bad Thing # illegal transition from suspended to non-transactional

This patch fixes that issue by clearing the current thread's MSR[TS]
just after treclaim in get_tm_stackpointer() so that we stay in
non-transactional state in case we are preempted. In order to make
treclaim and clearing the thread's MSR[TS] atomic from a preemption
perspective when CONFIG_PREEMPT is set, preempt_disable/enable() is
used. It's also necessary to save the previous value of the thread's
MSR before get_tm_stackpointer() is called so that it can be exposed
to the signal handler later in setup_tm_sigcontexts() to inform the
userspace MSR at the moment of the signal delivery.

Found with tm-signal-context-force-tm kernel selftest.

Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context")
Cc: [email protected] # v3.9
Signed-off-by: Gustavo Luiz Duarte <[email protected]>
Acked-by: Michael Neuling <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ALSA: hda: Use scnprintf() for printing texts for sysfs/procfs

Some code in HD-audio driver calls snprintf() in a loop and still
expects that the return value were actually written size, while
snprintf() returns the expected would-be length instead. When the
given buffer limit were small, this leads to a buffer overflow.

Use scnprintf() for addressing those issues. It returns the actually
written size unlike snprintf().

Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: hda/realtek - Apply quirk for yet another MSI laptop

MSI GP65 laptop with SSID 1462:1293 requires the same quirk as other
MSI models.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204159
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

drm/i915/gt: Avoid resetting ring->head outside of its timeline mutex

We manipulate ring->head while active in i915_request_retire underneath
the timeline manipulation. We cannot rely on a stable ring->head outside
of the timeline->mutex, in particular while setting up the context for
resume and reset.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1126
Fixes: 0881954965e3 ("drm/i915: Introduce intel_context.pin_mutex for pin management")
Fixes: e5dadff4b093 ("drm/i915: Protect request retirement with timeline->mutex")
References: f3c0efc9fe7a ("drm/i915/execlists: Leave resetting ring to intel_ring")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Reviewed-by: Mika Kuoppala <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 42827350f75c56d0fe9f15d8425a1390528958b6)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/execlists: Always force a context reload when rewinding RING_TAIL

If we rewind the RING_TAIL on a context, due to a preemption event, we
must force the context restore for the RING_TAIL update to be properly
handled. Rather than note which preemption events may cause us to rewind
the tail, compare the new request's tail with the previously submitted
RING_TAIL, as it turns out that timeslicing was causing unexpected
rewinds.

   <idle>-0       0d.s2 1280851190us : __execlists_submission_tasklet: 0000:00:02.0 rcs0: expired last=130:4698, prio=3, hint=3
   <idle>-0       0d.s2 1280851192us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 66:119966, current 119964
   <idle>-0       0d.s2 1280851195us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 130:4698, current 4695
   <idle>-0       0d.s2 1280851198us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 130:4696, current 4695
^----  Note we unwind 2 requests from the same context

   <idle>-0       0d.s2 1280851208us : __i915_request_submit: 0000:00:02.0 rcs0: fence 130:4696, current 4695
   <idle>-0       0d.s2 1280851213us : __i915_request_submit: 0000:00:02.0 rcs0: fence 134:1508, current 1506
^---- But to apply the new timeslice, we have to replay the first request
      before the new client can start -- the unexpected RING_TAIL rewind

   <idle>-0       0d.s2 1280851219us : trace_ports: 0000:00:02.0 rcs0: submit { 130:4696*, 134:1508 }
synmark2-5425    2..s. 1280851239us : process_csb: 0000:00:02.0 rcs0: cs-irq head=5, tail=0
synmark2-5425    2..s. 1280851240us : process_csb: 0000:00:02.0 rcs0: csb[0]: status=0x00008002:0x00000000
^---- Preemption event for the ELSP update; note the lite-restore

synmark2-5425    2..s. 1280851243us : trace_ports: 0000:00:02.0 rcs0: preempted { 130:4698, 66:119966 }
synmark2-5425    2..s. 1280851246us : trace_ports: 0000:00:02.0 rcs0: promote { 130:4696*, 134:1508 }
synmark2-5425    2.... 1280851462us : __i915_request_commit: 0000:00:02.0 rcs0: fence 130:4700, current 4695
synmark2-5425    2.... 1280852111us : __i915_request_commit: 0000:00:02.0 rcs0: fence 130:4702, current 4695
synmark2-5425    2.Ns1 1280852296us : process_csb: 0000:00:02.0 rcs0: cs-irq head=0, tail=2
synmark2-5425    2.Ns1 1280852297us : process_csb: 0000:00:02.0 rcs0: csb[1]: status=0x00000814:0x00000000
synmark2-5425    2.Ns1 1280852299us : trace_ports: 0000:00:02.0 rcs0: completed { 130:4696!, 134:1508 }
synmark2-5425    2.Ns1 1280852301us : process_csb: 0000:00:02.0 rcs0: csb[2]: status=0x00000818:0x00000040
synmark2-5425    2.Ns1 1280852302us : trace_ports: 0000:00:02.0 rcs0: completed { 134:1508, 0:0 }
synmark2-5425    2.Ns1 1280852313us : process_csb: process_csb:2336 GEM_BUG_ON(!i915_request_completed(*execlists->active) && !reset_in_progress(execlists))

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Referenecs: 82c69bf58650 ("drm/i915/gt: Detect if we miss WaIdleLiteRestore")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Reviewed-by: Mika Kuoppala <[email protected]>
Cc: <[email protected]> # v5.4+
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 5ba32c7be81e53ea8a27190b0f6be98e6c6779af)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Wean off drm_pci_alloc/drm_pci_free

drm_pci_alloc and drm_pci_free are just very thin wrappers around
dma_alloc_coherent, with a note that we should be removing them.
Furthermore since

commit de09d31dd38a50fdce106c15abd68432eebbd014
Author: Kirill A. Shutemov <[email protected]>
Date:   Fri Jan 15 16:51:42 2016 -0800

    page-flags: define PG_reserved behavior on compound pages

    As far as I can see there's no users of PG_reserved on compound pages.
    Let's use PF_NO_COMPOUND here.

drm_pci_alloc has been declared broken since it mixes GFP_COMP and
SetPageReserved. Avoid this conflict by weaning ourselves off using the
abstraction and using the dma functions directly.

Reported-by: Taketo Kabe
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1027
Fixes: de09d31dd38a ("page-flags: define PG_reserved behavior on compound pages")
Signed-off-by: Chris Wilson <[email protected]>
Cc: <[email protected]> # v4.5+
Reviewed-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit c6790dc22312f592c1434577258b31c48c72d52a)
Signed-off-by: Jani Nikula <[email protected]>