Git Repo - linux.git/log

s390/qeth: don't return -ENOTSUPP to userspace

ENOTSUPP is not part of the UAPI; use EOPNOTSUPP instead.
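ENOTSUPP (524) is defined only in the kernel-internal include/linux/errno.h, so libc has neither a name nor a message for it. The patch simply returns -EOPNOTSUPP at the source. The sketch below only illustrates the mapping, using a hypothetical helper to_uapi_errno() that is not part of qeth:

```c
#include <errno.h>

/* Kernel-internal value from include/linux/errno.h. It is not exported
 * through the UAPI headers, so libc has no name or message for it. */
#define KERNEL_ENOTSUPP 524

/* Hypothetical helper (not part of qeth): map the internal code onto the
 * UAPI equivalent before the value can reach userspace. */
static int to_uapi_errno(int err)
{
	if (err == -KERNEL_ENOTSUPP)
		return -EOPNOTSUPP;
	return err;
}
```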

Fixes: d66cb37e9664 ("qeth: Add new priority queueing options")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

s390/qeth: fix promiscuous mode after reset

When managing the promiscuous mode during an RX modeset, qeth caches the
current HW state to avoid repeated programming of the same state on each
modeset.

But while tearing down a device, we forget to clear the cached state. So
when the device is later set online again, the initial RX modeset
doesn't program the promiscuous mode since we believe it is already
enabled.
Fix this by clearing the cached state in the tear-down path.

Note that for the SBP variant of promiscuous mode, this accidentally
works right now because we unconditionally restore the SBP role while
re-initializing.
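The caching bug can be reproduced with a small model. This is a sketch under simplifying assumptions: struct card, rx_modeset() and tear_down() are hypothetical names, not the qeth code. The fix corresponds to the cached_promisc reset in tear_down().

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a driver that caches the HW state. */
struct card {
	bool hw_promisc;      /* what the hardware is actually programmed to */
	bool cached_promisc;  /* what the driver believes is programmed */
	int  hw_writes;       /* how often the hardware was programmed */
};

/* RX modeset: only program the hardware when the cached state differs. */
static void rx_modeset(struct card *c, bool want_promisc)
{
	if (c->cached_promisc == want_promisc)
		return;               /* believed to be in place already */
	c->hw_promisc = want_promisc;
	c->cached_promisc = want_promisc;
	c->hw_writes++;
}

/* Tear-down: the device loses its HW state, so the cache must be reset
 * too. Dropping the cached_promisc reset reproduces the bug. */
static void tear_down(struct card *c)
{
	c->hw_promisc = false;
	c->cached_promisc = false;
}
```

After tear_down(), the next rx_modeset(c, true) programs the hardware again instead of being skipped.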

Fixes: 4a71df50047f ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <[email protected]>
Reviewed-by: Alexandra Winter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

s390/qeth: handle error due to unsupported transport mode

Along with z/VM NICs, there are additional device types that only support
a specific transport mode (e.g. external-bridged IQD).
Identify the corresponding error code, and raise a fitting error message
so that the user knows to adjust their device configuration.

On top of that, also fix the subsequent error path, so that the rejected
cmd doesn't need to wait for a timeout but gets cancelled straight away.

Fixes: 4a71df50047f ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sbitmap: only queue kyber's wait callback if not already active

Under heavy loads where the kyber I/O scheduler hits the token limits for
its scheduling domains, kyber can become stuck.  When active requests
complete, kyber may not be woken up, leaving the I/O requests in kyber
stuck.

This stuck state is due to a race condition with kyber and the sbitmap
functions it uses to run a callback when enough requests have completed.
The running of a sbt_wait callback can race with the attempt to insert the
sbt_wait.  Since sbitmap_del_wait_queue removes the sbt_wait from the list
first then sets the sbq field to NULL, kyber can see the item as not on a
list but the call to sbitmap_add_wait_queue will see sbq as non-NULL. This
results in the sbt_wait being inserted onto the wait list but ws_active
doesn't get incremented.  So the sbitmap queue does not know there is a
waiter on a wait list.

Since sbitmap doesn't think there is a waiter, kyber may never be
informed that there are domain tokens available and the I/O never advances.
With the sbt_wait on a wait list, kyber believes it has an active waiter
so cannot insert a new waiter when reaching the domain's full state.

This race can be fixed by only adding the sbt_wait to the queue if the
sbq field is NULL.  If sbq is not NULL, there is already an action active
which will trigger the re-running of kyber.  Let it run and add the
sbt_wait to the wait list if still needing to wait.
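The interleaving can be replayed in a single-threaded model. This is a sketch with hypothetical types that stand in for struct sbitmap_queue and struct sbq_wait. The delete is split into its two steps, unlink first and then clear sbq, so that the insert attempt can be placed between them:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, single-threaded model; not the real sbitmap structures. */
struct sbq_model {
	int ws_active;               /* waiters the queue knows about */
};

struct sbq_wait_model {
	struct sbq_model *sbq;       /* non-NULL while accounted as a waiter */
	bool on_list;                /* stands in for "on a wait list" */
};

/* Old behaviour: insert unconditionally, account only if sbq is NULL. */
static void add_wait_old(struct sbq_model *q, struct sbq_wait_model *w)
{
	if (!w->sbq) {
		w->sbq = q;
		q->ws_active++;
	}
	w->on_list = true;
}

/* Fixed behaviour: insert only together with the accounting. If sbq is
 * still set, an action is already active and will re-run the caller. */
static void add_wait_fixed(struct sbq_model *q, struct sbq_wait_model *w)
{
	if (!w->sbq) {
		w->sbq = q;
		q->ws_active++;
		w->on_list = true;
	}
}

/* Delete, step 1: unlink the waiter from the list. */
static void del_wait_unlink(struct sbq_wait_model *w)
{
	w->on_list = false;
}

/* Delete, step 2: drop the accounting and clear sbq. */
static void del_wait_clear(struct sbq_wait_model *w)
{
	if (w->sbq) {
		w->sbq->ws_active--;
		w->sbq = NULL;
	}
}
```

With add_wait_old() between the two delete steps, the waiter ends up on the list while ws_active is 0, which is the stuck state described above. With add_wait_fixed(), the insert is skipped and a later attempt succeeds cleanly.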

Reviewed-by: Omar Sandoval <[email protected]>
Signed-off-by: David Jeffery <[email protected]>
Reported-by: John Pittman <[email protected]>
Tested-by: John Pittman <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

cxgb4: fix refcount init for TC-MQPRIO offload

Properly initialize refcount to 1 when hardware queue arrays for
TC-MQPRIO offload have been freshly allocated. Otherwise, the following
warning is observed. Also fix up the error path to only free hardware
queue arrays when refcount reaches 0.

[  130.075342] ------------[ cut here ]------------
[  130.075343] refcount_t: addition on 0; use-after-free.
[  130.075355] WARNING: CPU: 0 PID: 10870 at lib/refcount.c:25
refcount_warn_saturate+0xe1/0x100
[  130.075356] Modules linked in: sch_mqprio iptable_nat ib_iser
libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_umad iw_cxgb4 libcxgb
ib_uverbs x86_pkg_temp_thermal cxgb4 igb
[  130.075361] CPU: 0 PID: 10870 Comm: tc Kdump: loaded Not tainted
5.5.0-rc1+ #11
[  130.075362] Hardware name: Supermicro
X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.2
01/16/2015
[  130.075363] RIP: 0010:refcount_warn_saturate+0xe1/0x100
[  130.075364] Code: e8 14 41 c1 ff 0f 0b c3 80 3d 44 f4 10 01 00 0f 85
63 ff ff ff 48 c7 c7 38 9f 83 8c 31 c0 c6 05 2e f4 10 01 01 e8 ef 40 c1
ff <0f> 0b c3 48 c7 c7 10 9f 83 8c 31 c0 c6 05 17 f4 10 01 01 e8 d7 40
[  130.075365] RSP: 0018:ffffa48d00c0b768 EFLAGS: 00010286
[  130.075366] RAX: 0000000000000000 RBX: 0000000000000008 RCX:
0000000000000001
[  130.075366] RDX: 0000000000000001 RSI: 0000000000000096 RDI:
ffff8a2e9fa187d0
[  130.075367] RBP: ffff8a2e93890000 R08: 0000000000000398 R09:
000000000000003c
[  130.075367] R10: 00000000000142a0 R11: 0000000000000397 R12:
ffffa48d00c0b848
[  130.075368] R13: ffff8a2e94746498 R14: ffff8a2e966f7000 R15:
0000000000000031
[  130.075368] FS:  00007f689015f840(0000) GS:ffff8a2e9fa00000(0000)
knlGS:0000000000000000
[  130.075369] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  130.075369] CR2: 00000000006762a0 CR3: 00000007cf164005 CR4:
00000000001606f0
[  130.075370] Call Trace:
[  130.075377]  cxgb4_setup_tc_mqprio+0xbee/0xc30 [cxgb4]
[  130.075382]  ? cxgb4_ethofld_restart+0x50/0x50 [cxgb4]
[  130.075384]  ? pfifo_fast_init+0x7e/0xf0
[  130.075386]  mqprio_init+0x5f4/0x630 [sch_mqprio]
[  130.075389]  qdisc_create+0x1bf/0x4a0
[  130.075390]  tc_modify_qdisc+0x1ff/0x770
[  130.075392]  rtnetlink_rcv_msg+0x28b/0x350
[  130.075394]  ? rtnl_calcit.isra.32+0x110/0x110
[  130.075395]  netlink_rcv_skb+0xc6/0x100
[  130.075396]  netlink_unicast+0x1db/0x330
[  130.075397]  netlink_sendmsg+0x2f5/0x460
[  130.075399]  ? _copy_from_user+0x2e/0x60
[  130.075400]  sock_sendmsg+0x59/0x70
[  130.075401]  ____sys_sendmsg+0x1f0/0x230
[  130.075402]  ? copy_msghdr_from_user+0xd7/0x140
[  130.075403]  ___sys_sendmsg+0x77/0xb0
[  130.075404]  ? ___sys_recvmsg+0x84/0xb0
[  130.075406]  ? __handle_mm_fault+0x377/0xaf0
[  130.075407]  __sys_sendmsg+0x53/0xa0
[  130.075409]  do_syscall_64+0x44/0x130
[  130.075412]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  130.075413] RIP: 0033:0x7f688f13af10
[  130.075414] Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff
eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
[  130.075414] RSP: 002b:00007ffe6c7d9988 EFLAGS: 00000246 ORIG_RAX:
000000000000002e
[  130.075415] RAX: ffffffffffffffda RBX: 00000000006703a0 RCX:
00007f688f13af10
[  130.075415] RDX: 0000000000000000 RSI: 00007ffe6c7d99f0 RDI:
0000000000000003
[  130.075416] RBP: 000000005df38312 R08: 0000000000000002 R09:
0000000000008000
[  130.075416] R10: 00007ffe6c7d93e0 R11: 0000000000000246 R12:
0000000000000000
[  130.075417] R13: 00007ffe6c7e9c50 R14: 0000000000000001 R15:
000000000067c600
[  130.075418] ---[ end trace 8fbb3bf36a8671db ]---

v2:
- Move the refcount_set() closer to where the hardware queue arrays
  are being allocated.
- Fix up error path to only free hardware queue arrays when refcount
  reaches 0.
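The get/put pattern described above can be sketched as follows. The sketch assumes a plain int in place of the kernel's refcount_t, and hwq_arrays, hwq_get() and hwq_put() are hypothetical names, not cxgb4 functions:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared hardware queue arrays. */
struct hwq_arrays {
	int refcnt;
	int *queues;
};

/* Take a reference, allocating the arrays on first use. The count is set
 * to 1 right where the memory is freshly allocated; incrementing a count
 * of 0 is what triggers "refcount_t: addition on 0". */
static bool hwq_get(struct hwq_arrays *h, size_t n)
{
	if (!h->queues) {
		h->queues = calloc(n, sizeof(*h->queues));
		if (!h->queues)
			return false;
		h->refcnt = 1;
		return true;
	}
	h->refcnt++;
	return true;
}

/* Drop a reference (also on error paths): free only when it reaches 0. */
static void hwq_put(struct hwq_arrays *h)
{
	if (--h->refcnt == 0) {
		free(h->queues);
		h->queues = NULL;
	}
}
```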

Fixes: 2d0cb84dd973 ("cxgb4: add ETHOFLD hardware queue support")
Signed-off-by: Rahul Lakkireddy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'devicetree-fixes-for-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull Devicetree fix from Rob Herring:
"Add missing 'properties' keyword enclosing 'snps,tso' in snps,dwmac
binding"

* tag 'devicetree-fixes-for-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: Add missing 'properties' keyword enclosing 'snps,tso'

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

- Leftover put_cpu() in the perf/smmuv3 error path.

- Add Hisilicon TSV110 to spectre-v2 safe list

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cpu_errata: Add Hisilicon TSV110 to spectre-v2 safe list
perf/smmuv3: Remove the leftover put_cpu() in error path

Merge tag 'drm-fixes-2019-12-21' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"Probably the last one before Christmas, I'll see if there is much
  demand over the next few weeks for more fixes, I expect it'll be quiet
  enough.

  This has one exynos fix, and a bunch of i915 core and i915 GVT fixes.

  Summary:

  exynos:
   - component delete fix

  i915:
   - Fix to drop an unused and harmful display W/A
   - Fix to define EHL power wells independent of ICL
   - Fix for priority inversion on bonded requests
   - Fix in mmio offset calculation of DSB instance
   - Fix memory leak from get_task_pid when banning clients
   - Fixes to avoid dereference of uninitialized ops in dma_fence
     tracing and keep reference to execbuf object until submitted.
   - vGPU state setting locking fix (Zhenyu)
   - Fix vGPU display dmabuf as read-only (Zhenyu)
   - Properly handle vGPU display dmabuf page pin when rendering (Tina)
   - Fix one guest boot warning to handle guc reset state (Fred)"

* tag 'drm-fixes-2019-12-21' of git://anongit.freedesktop.org/drm/drm:
  drm/exynos: gsc: add missed component_del
  drm/i915: Fix pid leak with banned clients
  drm/i915/gem: Keep request alive while attaching fences
  drm/i915: Fix WARN_ON condition for cursor plane ddb allocation
  drm/i915/gvt: Fix guest boot warning
  drm/i915/tgl: Drop Wa#1178
  drm/i915/ehl: Define EHL powerwells independently of ICL
  drm/i915: Set fence_work.ops before dma_fence_init
  drm/i915: Copy across scheduler behaviour flags across submit fences
  drm/i915/dsb: Fix in mmio offset calculation of DSB instance
  drm/i915/gvt: Pin vgpu dma address before using
  drm/i915/gvt: set guest display buffer as readonly
  drm/i915/gvt: use vgpu lock for active state setting

Merge tag 'io_uring-5.5-20191220' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"Here's a set of fixes that should go into 5.5-rc3 for io_uring.

  This is bigger than I'd like it to be, mainly because we're fixing the
  case where an application reuses sqe data right after issue. This
  really must work, or it's confusing. With 5.5 we're flagging us as
  submit stable for the actual data, so this must also be the case for
  SQEs.

  Honestly, I'd really like to add another series on top of this, since
  it cleans it up considerably and prevents any SQE reuse by design. I
  posted that here:

    https://lore.kernel.org/io-uring/20191220174742 [email protected]/T/#u

  and may still send it your way early next week once it's been looked
  at and had some more soak time (does pass all regression tests). With
  that series, we've unified the prep+issue handling, and only the prep
  phase even has access to the SQE.

  Anyway, outside of that, fixes in here for a few other issues that
  have been hit in testing or production"

* tag 'io_uring-5.5-20191220' of git://git.kernel.dk/linux-block:
  io_uring: io_wq_submit_work() should not touch req->rw
  io_uring: don't wait when under-submitting
  io_uring: warn about unhandled opcode
  io_uring: read opcode and user_data from SQE exactly once
  io_uring: make IORING_OP_TIMEOUT_REMOVE deferrable
  io_uring: make IORING_OP_CANCEL_ASYNC deferrable
  io_uring: make IORING_POLL_ADD and IORING_POLL_REMOVE deferrable
  io_uring: make HARDLINK imply LINK
  io_uring: any deferred command must have stable sqe data
  io_uring: remove 'sqe' parameter to the OP helpers that take it
  io_uring: fix pre-prepped issue with force_nonblock == true
  io-wq: re-add io_wq_current_is_worker()
  io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG
  io_uring: fix stale comment and a few typos

Merge tag 'drm-intel-fixes-2019-12-19' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Fix to drop an unused and harmful display W/A
- Fix to define EHL power wells independent of ICL
- Fix for priority inversion on bonded requests
- Fix in mmio offset calculation of DSB instance
- Fix memory leak from get_task_pid when banning clients
- Fixes to avoid dereference of uninitialized ops in dma_fence tracing
and keep reference to execbuf object until submitted.

- Includes gvt-fixes-2019-12-18

Signed-off-by: Dave Airlie <[email protected]>
From: Joonas Lahtinen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'exynos-drm-fixes-for-v5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes

Just one bug fixup:
- Make sure to unregister a component for the Exynos gscaler driver
  when the driver is removed.

Signed-off-by: Dave Airlie <[email protected]>
From: Inki Dae <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

parisc: Fix compiler warnings in debug_core.c

Fix this compiler warning:
kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
arch/parisc/include/asm/cmpxchg.h:48:3: warning: value computed is not used [-Wunused-value]
   48 |  ((__typeof__(*(ptr)))__xchg((unsigned long)(x), (ptr), sizeof(*(ptr))))
arch/parisc/include/asm/atomic.h:78:30: note: in expansion of macro ‘xchg’
   78 | #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
      |                              ^~~~
kernel/debug/debug_core.c:596:4: note: in expansion of macro ‘atomic_xchg’
  596 |    atomic_xchg(&kgdb_active, cpu);
      |    ^~~~~~~~~~~
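The warning arises because xchg() expands to a bare cast expression, and atomic_xchg() is used here only for its side effect. The diff is not shown here. The sketch below shows one common way to silence this kind of warning: rewrite the macro as a GNU C statement expression, whose value may be discarded. xchg_impl() and xchg_model() are hypothetical, non-atomic stand-ins, not the parisc code.

```c
/* Non-atomic stand-in for the architecture's __xchg() helper, supporting
 * int-sized and long-sized objects only (illustration, not parisc code). */
static unsigned long xchg_impl(unsigned long x, volatile void *ptr, int size)
{
	unsigned long old;

	if (size == (int)sizeof(int)) {
		volatile int *p = ptr;
		old = (unsigned long)*p;
		*p = (int)x;
	} else {
		volatile unsigned long *p = ptr;
		old = *p;
		*p = x;
	}
	return old;
}

/* Statement-expression form: the value of ({ ... }) may be discarded
 * without -Wunused-value, unlike a bare cast expression. */
#define xchg_model(ptr, x)						\
({									\
	__typeof__(*(ptr)) __ret;					\
	__typeof__(*(ptr)) _x_ = (x);					\
	__ret = (__typeof__(*(ptr)))					\
		xchg_impl((unsigned long)_x_, (ptr), sizeof(*(ptr)));	\
	__ret;								\
})
```

A call such as `xchg_model(&v, 9);` used as a plain statement then compiles cleanly.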

Signed-off-by: Helge Deller <[email protected]>

block: fix memleak when __blk_rq_map_user_iov() is failed

When doing fuzz testing, I got the following memleak report:

BUG: memory leak
unreferenced object 0xffff88837af80000 (size 4096):
  comm "memleak", pid 3557, jiffies 4294817681 (age 112.499s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    20 00 00 00 10 01 00 00 00 00 00 00 01 00 00 00   ...............
  backtrace:
    [<000000001c894df8>] bio_alloc_bioset+0x393/0x590
    [<000000008b139a3c>] bio_copy_user_iov+0x300/0xcd0
    [<00000000a998bd8c>] blk_rq_map_user_iov+0x2f1/0x5f0
    [<000000005ceb7f05>] blk_rq_map_user+0xf2/0x160
    [<000000006454da92>] sg_common_write.isra.21+0x1094/0x1870
    [<00000000064bb208>] sg_write.part.25+0x5d9/0x950
    [<000000004fc670f6>] sg_write+0x5f/0x8c
    [<00000000b0d05c7b>] __vfs_write+0x7c/0x100
    [<000000008e177714>] vfs_write+0x1c3/0x500
    [<0000000087d23f34>] ksys_write+0xf9/0x200
    [<000000002c8dbc9d>] do_syscall_64+0x9f/0x4f0
    [<00000000678d8e9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe

If __blk_rq_map_user_iov() fails in blk_rq_map_user_iov(), the bio(s)
allocated before the failure will leak. The refcount of each bio is
initialized to 1 and increased to 2 by calling bio_get(), but
__blk_rq_unmap_user() only decreases it to 1, so the bio cannot be
freed. Fix it by calling blk_rq_unmap_user().
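The reference counting described above can be replayed with a counter-only model. This sketch assumes that blk_rq_unmap_user() performs the inner unmap and then one extra put per bio, which is consistent with the description. All names here are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a bio's reference count. */
struct bio_model {
	int refcnt;
	bool freed;
};

static void model_bio_get(struct bio_model *b)
{
	b->refcnt++;
}

static void model_bio_put(struct bio_model *b)
{
	if (--b->refcnt == 0)
		b->freed = true;
}

/* Models __blk_rq_unmap_user(): drops only one reference. */
static void model_unmap_inner(struct bio_model *b)
{
	model_bio_put(b);
}

/* Models blk_rq_unmap_user(): the inner unmap plus a final put. */
static void model_unmap_full(struct bio_model *b)
{
	model_unmap_inner(b);
	model_bio_put(b);
}
```

Starting from refcnt 1 and calling model_bio_get(), only model_unmap_full() brings the count back to 0.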

Reviewed-by: Bob Liu <[email protected]>
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

s390/dasd: fix typo in copyright statement

coypright -> copyright

Reported-by: Kate Stewart <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

s390/dasd: fix memleak in path handling error case

If for whatever reason the dasd_eckd_check_characteristics() function
exits after configuration data has been allocated for at least some
paths, that data is never freed. In the error case, the
device->private pointer is set to NULL, and dasd_eckd_uncheck_device()
exits without freeing the path data because of this NULL pointer.

Fix by calling dasd_eckd_clear_conf_data() for error cases.

Also use dasd_eckd_clear_conf_data() in dasd_eckd_uncheck_device()
to avoid code duplication.
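The ordering problem can be sketched as a minimal model: per-path data must be freed before the private pointer is dropped. The structures, MAX_PATHS, and the fail_at parameter are hypothetical. clear_conf_data() only plays the role of dasd_eckd_clear_conf_data().

```c
#include <stdlib.h>

#define MAX_PATHS 8

/* Hypothetical model: per-path configuration data in a private struct. */
struct eckd_private_model {
	void *conf_data[MAX_PATHS];
};

struct device_model {
	struct eckd_private_model *private;
};

/* Plays the role of dasd_eckd_clear_conf_data(): free all path data. */
static void clear_conf_data(struct eckd_private_model *p)
{
	for (int i = 0; i < MAX_PATHS; i++) {
		free(p->conf_data[i]);
		p->conf_data[i] = NULL;
	}
}

/* Allocate data per path; fail_at simulates an error on that path. On
 * error, the path data is freed *before* private is set to NULL, since a
 * later uncheck would otherwise see NULL and free nothing. */
static int check_characteristics(struct device_model *dev, int fail_at)
{
	struct eckd_private_model *p = calloc(1, sizeof(*p));

	if (!p)
		return -1;
	dev->private = p;
	for (int i = 0; i < MAX_PATHS; i++) {
		if (i == fail_at)
			goto out_err;
		p->conf_data[i] = malloc(64);
		if (!p->conf_data[i])
			goto out_err;
	}
	return 0;
out_err:
	clear_conf_data(p);
	free(p);
	dev->private = NULL;
	return -1;
}
```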

Reported-by: Qian Cai <[email protected]>
Reviewed-by: Jan Hoeppner <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly

The max data count (mdc) is an unsigned 16-bit integer value as per AR
documentation and is received via ccw_device_get_mdc() for a specific
path mask from the CIO layer. The function itself also always returns a
positive mdc value or 0 in case mdc isn't supported or couldn't be
determined.

However, the comment for this function describes a negative return value
to indicate failures.

As a result, the DASD device driver interprets the return value of
ccw_device_get_mdc() incorrectly. The error case is essentially a dead
code path.

To fix this behaviour, check explicitly for a return value of 0 and
change the comment for ccw_device_get_mdc() accordingly.

This fix merely enables the error code path in the DASD functions
get_fcx_max_data() and verify_fcx_max_data(). The actual functionality
stays the same and is still correct.
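The corrected caller check can be sketched as follows. get_mdc_stub() is a hypothetical stand-in for ccw_device_get_mdc(), and -1 is a placeholder error code, not the value the DASD functions return:

```c
#include <stdint.h>

/* Hypothetical stub for ccw_device_get_mdc(): the max data count is an
 * unsigned 16-bit value, and 0 means "not supported or not determined".
 * The result is never negative. */
static int get_mdc_stub(uint16_t reported_mdc)
{
	return reported_mdc;
}

/* Caller pattern after the fix. The old check, `if (mdc < 0)`, can never
 * be true for this function, so its error branch was dead code. */
static int checked_mdc(uint16_t reported_mdc)
{
	int mdc = get_mdc_stub(reported_mdc);

	if (mdc == 0)
		return -1;   /* placeholder error code for the sketch */
	return mdc;
}
```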

Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Jan Höppner <[email protected]>
Acked-by: Peter Oberparleiter <[email protected]>
Reviewed-by: Stefan Haberland <[email protected]>
Signed-off-by: Stefan Haberland <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

block: Fix a lockdep complaint triggered by request queue flushing

Prevent test nvme/012 from the blktests suite from triggering the
following false positive lockdep complaint:

============================================
WARNING: possible recursive locking detected
5.0.0-rc3-xfstests-00015-g1236f7d60242 #841 Not tainted
--------------------------------------------
ksoftirqd/1/16 is trying to acquire lock:
000000000282032e (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

but task is already holding lock:
00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

other info that might help us debug this:
Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&fq->mq_flush_lock)->rlock);
  lock(&(&fq->mq_flush_lock)->rlock);

*** DEADLOCK ***

May be due to missing lock nesting notation

1 lock held by ksoftirqd/1/16:
#0: 00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

stack backtrace:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3-xfstests-00015-g1236f7d60242 #841
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
dump_stack+0x67/0x90
__lock_acquire.cold.45+0x2b4/0x313
lock_acquire+0x98/0x160
_raw_spin_lock_irqsave+0x3b/0x80
flush_end_io+0x4e/0x1d0
blk_mq_complete_request+0x76/0x110
nvmet_req_complete+0x15/0x110 [nvmet]
nvmet_bio_done+0x27/0x50 [nvmet]
blk_update_request+0xd7/0x2d0
blk_mq_end_request+0x1a/0x100
blk_flush_complete_seq+0xe5/0x350
flush_end_io+0x12f/0x1d0
blk_done_softirq+0x9f/0xd0
__do_softirq+0xca/0x440
run_ksoftirqd+0x24/0x50
smpboot_thread_fn+0x113/0x1e0
kthread+0x121/0x140
ret_from_fork+0x3a/0x50

Cc: Christoph Hellwig <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

block: Fix the type of 'sts' in bsg_queue_rq()

This patch fixes the following sparse warnings:

block/bsg-lib.c:269:19: warning: incorrect type in initializer (different base types)
block/bsg-lib.c:269:19:    expected int sts
block/bsg-lib.c:269:19:    got restricted blk_status_t [usertype]
block/bsg-lib.c:286:16: warning: incorrect type in return expression (different base types)
block/bsg-lib.c:286:16:    expected restricted blk_status_t
block/bsg-lib.c:286:16:    got int [assigned] sts
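Sparse reports these warnings because blk_status_t is a __bitwise type, which sparse treats as incompatible with plain int. A minimal sketch of the fixed shape follows. The __bitwise/__force definitions mimic the kernel's sparse annotations, the status values are illustrative, and queue_rq_model() is hypothetical:

```c
/* Under sparse (__CHECKER__), __bitwise makes a typedef incompatible with
 * plain integers; for a normal compiler both macros expand to nothing. */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

typedef unsigned char __bitwise blk_status_t;

/* Illustrative values; the kernel defines its own in linux/blk_types.h. */
#define BLK_STS_OK    ((__force blk_status_t)0)
#define BLK_STS_IOERR ((__force blk_status_t)10)

/* Declaring the local as blk_status_t (not int) keeps sparse quiet both
 * at the initializer and at the return statement. */
static blk_status_t queue_rq_model(int dispatch_ok)
{
	blk_status_t sts = BLK_STS_IOERR;

	if (dispatch_ok)
		sts = BLK_STS_OK;
	return sts;
}
```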

Cc: Martin Wilck <[email protected]>
Fixes: d46fe2cb2dce ("block: drop device references in bsg_queue_rq()")
Signed-off-by: Bart Van Assche <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

parisc: soft_offline_page() now takes the pfn

Switch page deallocation table (pdt) driver to use pfn instead of a page
pointer in soft_offline_page().

Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn")
Signed-off-by: Helge Deller <[email protected]>

Merge tag 'iommu-fixes-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

- Fix kmemleak warning in IOVA code

- Fix compile warnings on ARM32/64 in dma-iommu code due to dma_mask
   type mismatches

- Make ISA reserved regions relaxable, so that VFIO can assign devices
   which have such regions defined

- Fix mapping errors resulting in IO page-faults in the VT-d driver

- Make sure direct mappings for a domain are created after the default
   domain is updated

- Map ISA reserved regions in the VT-d driver with correct permissions

- Remove unneeded check for PSI capability in the IOTLB flush code of
   the VT-d driver

- Lockdep fix for iommu_dma_prepare_msi()

* tag 'iommu-fixes-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/dma: Relax locking in iommu_dma_prepare_msi()
  iommu/vt-d: Remove incorrect PSI capability check
  iommu/vt-d: Allocate reserved region for ISA with correct permission
  iommu: set group default domain before creating direct mappings
  iommu/vt-d: Fix dmar pte read access not set error
  iommu/vt-d: Set ISA bridge reserved region as relaxable
  iommu/dma: Rationalise types for DMA masks
  iommu/iova: Init the struct iova to fix the possible memleak

Merge tag 'platform-drivers-x86-v5.5-2' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver fixes from Andy Shevchenko:
"Bucket of fixes for PDx86. Note, that there is no ABI breakage in
  Mellanox driver because it has been introduced in v5.5-rc1, so we can
  change it.

  Summary:

   - Add support for APUv4 and fix the assignment of the simswap GPIO

   - Add Siemens CONNECT X300 to DMI table to avoid getting stuck during boot

   - Correct arguments of WMI call on HP Envy x360 15-cp0xxx model

   - Fix the mlx-bootctl sysfs attributes to be device related"

* tag 'platform-drivers-x86-v5.5-2' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: pcengines-apuv2: Spelling fixes in the driver
  platform/x86: pcengines-apuv2: detect apuv4 board
  platform/x86: pcengines-apuv2: fix simswap GPIO assignment
  platform/x86: pmc_atom: Add Siemens CONNECT X300 to critclk_systems DMI table
  platform/x86: hp-wmi: Make buffer for HPWMI_FEATURE2_QUERY 128 bytes
  platform/mellanox: fix the mlx-bootctl sysfs

Merge tag 'mmc-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull MMC host fixes from Ulf Hansson:

- mtk-sd: Fix tuning for MT8173 HS200/HS400 mode

- sdhci: Revert a fix for incorrect switch to HS mode

- sdhci-msm: Fixup accesses to the DDR_CONFIG register

- sdhci-of-esdhc: Revert a bad fix for erratum A-009204

- sdhci-of-esdhc: Re-implement fix for erratum A-009204

- sdhci-of-esdhc: Fixup P2020 errata handling

- sdhci-pci: Disable broken CMDQ on Intel GLK based Lenovo systems

* tag 'mmc-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-of-esdhc: re-implement erratum A-009204 workaround
  mmc: sdhci: Add a quirk for broken command queuing
  mmc: sdhci: Workaround broken command queuing on Intel GLK
  mmc: sdhci-of-esdhc: fix P2020 errata handling
  mmc: sdhci: Update the tuning failed messages to pr_debug level
  mmc: sdhci-of-esdhc: Revert "mmc: sdhci-of-esdhc: add erratum A-009204 support"
  mmc: mediatek: fix CMD_TA to 2 for MT8173 HS200/HS400 mode
  mmc: sdhci-msm: Correct the offset and value for DDR_CONFIG register
  Revert "mmc: sdhci: Fix incorrect switch to HS mode"

Merge tag 'char-misc-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
"Here are some small char and other driver fixes for 5.5-rc3.

  The most noticeable one is a much-reported fix for a random driver
  issue that came up from 5.5-rc1 compat_ioctl cleanups. The others are
  a chunk of habanalabs driver fixes and intel_th driver fixes and new
  device ids.

  All have been in linux-next with no reported issues"

* tag 'char-misc-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  random: don't forget compat_ioctl on urandom
  intel_th: msu: Fix window switching without windows
  intel_th: Fix freeing IRQs
  intel_th: pci: Add Elkhart Lake SOC support
  intel_th: pci: Add Comet Lake PCH-V support
  habanalabs: remove variable 'val' set but not used
  habanalabs: rate limit error msg on waiting for CS

Merge tag 'staging-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for a number of reported
  issues.

  The majority here are some fixes for the wfx driver, but also in here
  is a comedi driver fix found during some code review, and an axis-fifo
  build dependency issue to resolve some reported testing problems.

  All of these have been in linux-next with no reported issues"

* tag 'staging-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: wfx: fix wrong error message
  staging: wfx: fix hif_set_mfp() with big endian hosts
  staging: wfx: detect race condition in WEP authentication
  staging: wfx: ensure that retry policy always fallbacks to MCS0 / 1Mbps
  staging: wfx: fix rate control handling
  staging: wfx: firmware does not support more than 32 total retries
  staging: wfx: use boolean appropriately
  staging: wfx: fix counter overflow
  staging: wfx: fix case of lack of tx_retry_policies
  staging: wfx: fix the cache of rate policies on interface reset
  staging: axis-fifo: add unspecified HAS_IOMEM dependency
  staging: comedi: gsc_hpdi: check dma_alloc_coherent() return value

arm64: cpu_errata: Add Hisilicon TSV110 to spectre-v2 safe list

HiSilicon Taishan v110 CPUs don't implement the CSV2 field of
ID_AA64PFR0_EL1, but spectre-v2 is mitigated by hardware, so
whitelist the MIDR in the safe list.
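A safe-list lookup typically matches on the implementer and part number fields of MIDR_EL1 and ignores variant and revision. The sketch below assumes implementer 0x48 / part 0xd01 for TSV110 and 0x41 / 0xd03 for Cortex-A53. The table and function names are hypothetical, and the kernel's own list and helpers in cpu_errata.c differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 fields: implementer [31:24], variant [23:20],
 * architecture [19:16], part number [15:4], revision [3:0]. */
#define MIDR_IMPLEMENTER(m)  (((m) >> 24) & 0xff)
#define MIDR_PARTNUM(m)      (((m) >> 4) & 0xfff)

struct midr_model {
	uint32_t implementer;
	uint32_t partnum;
};

/* Hypothetical safe list for the sketch. */
static const struct midr_model spectre_v2_safe[] = {
	{ 0x41, 0xd03 },   /* Arm Cortex-A53 */
	{ 0x48, 0xd01 },   /* HiSilicon TSV110 */
};

/* Match on implementer and part number, i.e. all variants/revisions. */
static bool midr_in_safe_list(uint32_t midr)
{
	size_t n = sizeof(spectre_v2_safe) / sizeof(spectre_v2_safe[0]);

	for (size_t i = 0; i < n; i++)
		if (MIDR_IMPLEMENTER(midr) == spectre_v2_safe[i].implementer &&
		    MIDR_PARTNUM(midr) == spectre_v2_safe[i].partnum)
			return true;
	return false;
}
```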

Signed-off-by: Wei Li <[email protected]>
[hanjun: re-write the commit log]
Signed-off-by: Hanjun Guo <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>

Merge tag 'tty-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial driver fixes for 5.5-rc3.

  Only four small patches here:

   - atmel serial driver fix

   - msm_serial driver fix

   - sprd serial driver fix

   - tty core port fix

  The last tty core fix should resolve a long-standing bug with a race
  at port creation time that some people would see, and Sudip finally
  tracked down.

  All of these have been in linux-next with no reported issues"

* tag 'tty-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty/serial: atmel: fix out of range clock divider handling
  tty: link tty and port before configuring it as console
  serial: sprd: Add clearing break interrupt operation
  tty: serial: msm_serial: Fix lockup for sysrq and oops

Merge tag 'usb-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are some small USB fixes for some reported issues.

  Included in here are:

   - xhci build warning fix

   - ehci disconnect warning fix

   - usbip lockup fix and error cleanup fix

   - typec build fix

  All of these have been in linux-next with no reported issues"

* tag 'usb-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: xhci: Fix build warning seen with CONFIG_PM=n
  usbip: Fix error path of vhci_recv_ret_submit()
  usbip: Fix receive error in vhci-hcd when using scatter-gather
  USB: EHCI: Do not return -EPIPE when hub is disconnected
  usb: typec: fusb302: Fix an undefined reference to 'extcon_get_state'

Merge tag 'pinctrl-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
"Sorry that this fixes pull request took a while. Too much christmas
  business going on.

  This contains a few really important Intel fixes and some odd fixes:

   - A host of fixes for the Intel baytrail and cherryview: properly
     serialize all register accesses and add the irqchip with the
     gpiochip as we need to, fix some pin lists and initialize the
     hardware in the right order.

   - Fix the Aspeed G6 LPC configuration.

   - Handle a possible NULL pointer exception in the core.

   - Fix the Kconfig dependencies for the Equilibrium driver"

* tag 'pinctrl-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: ingenic: Fixup PIN_CONFIG_OUTPUT config
  pinctrl: Modify Kconfig to fix linker error
  pinctrl: pinmux: fix a possible null pointer in pinmux_can_be_used_for_gpio
  pinctrl: aspeed-g6: Fix LPC/eSPI mux configuration
  pinctrl: cherryview: Pass irqchip when adding gpiochip
  pinctrl: cherryview: Add GPIO <-> pin mapping ranges via callback
  pinctrl: cherryview: Split out irq hw-init into a separate helper function
  pinctrl: baytrail: Pass irqchip when adding gpiochip
  pinctrl: baytrail: Add GPIO <-> pin mapping ranges via callback
  pinctrl: baytrail: Update North Community pin list
  pinctrl: baytrail: Really serialize all register accesses

io_uring: pass in 'sqe' to the prep handlers

This moves the prep handlers outside of the opcode handlers, and allows
us to pass in the sqe directly. If the sqe is non-NULL, it means that
the request should be prepared for the first time.

With the opcode handlers not having access to the sqe at all, we are
guaranteed that the prep handler has set up the request fully by the
time we get there. As before, for opcodes that need to copy in more
data than the io_kiocb allows for, the io_async_ctx holds that info. If
a prep handler is invoked with req->io set, it must use that to retain
information for later.

Finally, we can remove io_kiocb->sqe as well.
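The prep/issue split can be sketched with simplified types. The SQE is read only in prep, and a NULL sqe means "already prepared". struct sqe_model, struct req_model, prep_model() and issue_model() are hypothetical, trimmed-down stand-ins, not the io_uring layouts or handlers.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down SQE and request; not the real layouts. */
struct sqe_model {
	uint8_t  opcode;
	uint64_t addr;
	uint32_t len;
	uint64_t user_data;
};

struct req_model {
	uint8_t  opcode;
	uint64_t addr;
	uint32_t len;
	uint64_t user_data;
	int      prepped;
};

/* Prep handler: the only place that reads the SQE, copying each field
 * once. A NULL sqe means the request was already prepared. */
static void prep_model(struct req_model *req, const struct sqe_model *sqe)
{
	if (!sqe)
		return;
	req->opcode    = sqe->opcode;
	req->addr      = sqe->addr;
	req->len       = sqe->len;
	req->user_data = sqe->user_data;
	req->prepped   = 1;
}

/* Issue handler: uses only the request, never the SQE, so the application
 * may reuse the SQE as soon as submission returns. */
static uint32_t issue_model(const struct req_model *req)
{
	return req->prepped ? req->len : 0;
}
```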

Signed-off-by: Jens Axboe <[email protected]>

io_uring: standardize the prep methods

We currently have a mix of use cases. Most of the newer ones are pretty
uniform, but we have some older ones that use different calling
conventions. This is confusing.

For the opcodes that currently rely on the req->io->sqe copy saving
them from reuse, add a request type struct in the io_kiocb command
union to store the data they need.

Prepare for all opcodes having a standard prep method, so we can call
it in a uniform fashion and outside of the opcode handler. This is in
preparation for passing in the 'sqe' pointer, rather than storing it
in the io_kiocb. Once we have uniform prep handlers, we can leave all
the prep work to that part, and not even pass in the sqe to the opcode
handler. This ensures that we don't reuse sqe data inadvertently.

Signed-off-by: Jens Axboe <[email protected]>

platform/x86: pcengines-apuv2: Spelling fixes in the driver

Mainly does:
- capitalize gpio and bios to GPIO and BIOS
- capitalize beginning of comments
- add periods in multi-line comments

Signed-off-by: Andy Shevchenko <[email protected]>

platform/x86: pcengines-apuv2: detect apuv4 board

GPIO stuff on APUv4 seems to be the same as on APUv2, so we just
need to match on DMI data.

Signed-off-by: Enrico Weigelt, metux IT consult <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>

platform/x86: pcengines-apuv2: fix simswap GPIO assignment

The mapping entry has to hold the GPIO line index instead of the
controller's register number.

Fixes: 5037d4ddda31 ("platform/x86: pcengines-apuv2: wire up simswitch gpio as led")
Signed-off-by: Enrico Weigelt, metux IT consult <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>

platform/x86: pmc_atom: Add Siemens CONNECT X300 to critclk_systems DMI table

The CONNECT X300 uses the PMC clock for on-board components and gets
stuck during boot if the clock is disabled. Therefore, add this
device to the critical systems list.
Tested on CONNECT X300.

Fixes: 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL")
Signed-off-by: Michael Haener <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>

platform/x86: hp-wmi: Make buffer for HPWMI_FEATURE2_QUERY 128 bytes

At least on the HP Envy x360 15-cp0xxx model the WMI interface
for HPWMI_FEATURE2_QUERY requires an outsize of at least 128 bytes,
otherwise it fails with an error code 5 (HPWMI_RET_INVALID_PARAMETERS):

Dec 06 00:59:38 kernel: hp_wmi: query 0xd returned error 0x5

We do not care about the contents of the buffer, we just want to know
if the HPWMI_FEATURE2_QUERY command is supported.

This commit bumps the buffer size, fixing the error.

Fixes: 8a1513b4932 ("hp-wmi: limit hotkey enable")
Cc: [email protected]
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1520703
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>

platform/mellanox: fix the mlx-bootctl sysfs

This is a follow-up commit for the sysfs attributes to change
from DRIVER_ATTR to DEVICE_ATTR according to some initial comments.
In such case, it's better to point the sysfs path to the device
itself instead of the driver. The ABI document is also updated.

Fixes: 79e29cb8fbc5 ("platform/mellanox: Add bootctl driver for Mellanox BlueField Soc")
Signed-off-by: Liming Sun <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>

io_uring: read 'count' for IORING_OP_TIMEOUT in prep handler

Add the count field to struct io_timeout, and ensure the prep handler
has read it. Timeout also needs an async context always, set it up
in the prep handler if we don't have one.

Signed-off-by: Jens Axboe <[email protected]>

io_uring: move all prep state for IORING_OP_{SEND,RECV}MSG to prep handler

Add struct io_sr_msg in our io_kiocb per-command union, and ensure that
the send/recvmsg prep handlers have grabbed what they need from the SQE
by the time prep is done.

Signed-off-by: Jens Axboe <[email protected]>

io_uring: move all prep state for IORING_OP_CONNECT to prep handler

Add struct io_connect in our io_kiocb per-command union, and ensure
that io_connect_prep() has grabbed what it needs from the SQE.

Signed-off-by: Jens Axboe <[email protected]>

io_uring: add and use struct io_rw for read/writes

Put the kiocb in struct io_rw, and add the addr/len for the request as
well. Use the kiocb->private field for the buffer index for fixed reads
and writes.

Any use of kiocb->ki_filp is flipped to req->file. It's the same thing,
and less confusing.

Signed-off-by: Jens Axboe <[email protected]>

xfs: Make the symbol 'xfs_rtalloc_log_count' static

Fix the following sparse warning:

fs/xfs/libxfs/xfs_trans_resv.c:206:1: warning: symbol 'xfs_rtalloc_log_count' was not declared. Should it be static?

Fixes: b1de6fc7520f ("xfs: fix log reservation overflows when allocating large rt extents")
Signed-off-by: Chen Wandun <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

io_uring: use u64_to_user_ptr() consistently

We use it in some spots, but not consistently. Convert the rest over,
makes it easier to read as well.

No functional changes in this patch.
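For readers outside the kernel: u64_to_user_ptr() turns a 64-bit value carried in a uAPI structure into a user-space pointer by going through `uintptr_t`. The following is a plain user-space approximation, deliberately named `u64_to_ptr()` to make clear it is not the kernel macro (which also type-checks its argument and adds the `__user` annotation); `ptr_to_u64()` is likewise a hypothetical helper for the producer side.

```c
#include <stdint.h>

/*
 * User-space approximation of u64_to_user_ptr(): convert through
 * uintptr_t so the cast is well defined on 32- and 64-bit builds.
 * The kernel version also typechecks that x is a u64 and marks the
 * result __user; both are omitted here.
 */
static inline void *u64_to_ptr(uint64_t x)
{
	return (void *)(uintptr_t)x;
}

/* The reverse direction, as a user-space producer would fill the field. */
static inline uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}
```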

Signed-off-by: Jens Axboe <[email protected]>

xen/grant-table: remove multiple BUG_ON on gnttab_interface

gnttab_request_version() always sets the gnttab_interface variable
and the assertions to check for an empty gnttab_interface are unnecessary.
The patch eliminates multiple such assertions.

Signed-off-by: Aditya Pakki <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

xen-blkback: support dynamic unbind/bind

By simply re-attaching to shared rings during connect_ring() rather than
assuming they are freshly allocated (i.e. assuming the counters are zero)
it is possible for vbd instances to be unbound and re-bound from and to
(respectively) a running guest.

This has been tested by running:

while true;
  do fio --name=randwrite --ioengine=libaio --iodepth=16 \
  --rw=randwrite --bs=4k --direct=1 --size=1G --verify=crc32;
  done

in a PV guest whilst running:

while true;
  do echo vbd-$DOMID-$VBD >unbind;
  echo unbound;
  sleep 5;
  echo vbd-$DOMID-$VBD >bind;
  echo bound;
  sleep 3;
  done

in dom0 from /sys/bus/xen-backend/drivers/vbd to continuously unbind and
re-bind its system disk image.

This is a highly useful feature for a backend module as it allows it to be
unloaded and re-loaded (i.e. updated) without requiring domUs to be halted.
This was also tested by running:

while true;
  do echo vbd-$DOMID-$VBD >unbind;
  echo unbound;
  sleep 5;
  rmmod xen-blkback;
  echo unloaded;
  sleep 1;
  modprobe xen-blkback;
  echo bound;
  cd $(pwd);
  sleep 3;
  done

in dom0 whilst running the same loop as above in the (single) PV guest.

Some (less stressful) testing has also been done using a Windows HVM guest
with the latest 9.0 PV drivers installed.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Reviewed-by: Roger Pau Monné <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

xen/interface: re-define FRONT/BACK_RING_ATTACH()

Currently these macros are defined to re-initialize a front/back ring
(respectively) to values read from the shared ring in such a way that any
requests/responses that are added to the shared ring whilst the front/back
is detached will be skipped over. This, in general, is not a desirable
semantic since most frontend implementations will eventually block waiting
for a response which would either never appear or never be processed.

Since the macros are currently unused, take this opportunity to re-define
them to re-initialize a front/back ring using specified values. This also
allows FRONT/BACK_RING_INIT() to be re-defined in terms of
FRONT/BACK_RING_ATTACH() using a specified value of 0.
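A rough user-space model of the redefined attach/init pair, assuming simplified structures (`sring_sketch` and `back_ring_sketch` are hypothetical and hold only the indices, not the ring entries). Init becomes attach at index 0, while a back end re-attaching to a live ring can pass an index read from the shared ring instead. Passing the shared `rsp_prod` is one plausible choice for illustration: requests the front end posted but that have not been answered yet remain visible rather than being skipped.

```c
typedef unsigned int RING_IDX;

/* Hypothetical shared ring header: indices only, no entries array. */
struct sring_sketch {
	RING_IDX req_prod;
	RING_IDX rsp_prod;
};

/* Hypothetical back-end private view of the ring. */
struct back_ring_sketch {
	RING_IDX rsp_prod_pvt;
	RING_IDX req_cons;
	unsigned int nr_ents;
	struct sring_sketch *sring;
};

/* Attach at a caller-specified index. */
static void back_ring_attach(struct back_ring_sketch *r,
			     struct sring_sketch *s, RING_IDX i,
			     unsigned int nr_ents)
{
	r->rsp_prod_pvt = i;
	r->req_cons = i;
	r->nr_ents = nr_ents;
	r->sring = s;
}

/* Init is simply attach at index 0. */
static void back_ring_init(struct back_ring_sketch *r,
			   struct sring_sketch *s, unsigned int nr_ents)
{
	back_ring_attach(r, s, 0, nr_ents);
}

/* Requests published by the front end that the back end has not consumed. */
static RING_IDX unconsumed_requests(const struct back_ring_sketch *r)
{
	return r->sring->req_prod - r->req_cons;
}
```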

NOTE: BACK_RING_ATTACH() will be used directly in a subsequent patch.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

xenbus: limit when state is forced to closed

If a driver probe() fails then leave the xenstore state alone. There is no
reason to modify it as the failure may be due to transient resource
allocation issues and hence a subsequent probe() may succeed.

If the driver supports re-binding then force state to closed during
remove() only in the case when the toolstack may need to clean up. This can
be detected by checking whether the state in xenstore has been set to
closing prior to device removal.
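The remove() decision above boils down to a small predicate. The sketch is hypothetical: the helper and its `allow_rebind` parameter are illustrative names standing in for the new boolean in struct xenbus_driver, and only the few XenbusState values needed here are listed.

```c
#include <stdbool.h>

/* Subset of the XenbusState values. */
enum xenbus_state_sketch {
	XS_CONNECTED = 4,
	XS_CLOSING = 5,
	XS_CLOSED = 6,
};

/*
 * Should remove() force the xenstore state to Closed?
 * Drivers without re-bind support always do. Drivers with it only do
 * so when the toolstack has already set the state to Closing, i.e.
 * when it may need to clean up.
 */
bool should_force_closed_on_remove(bool allow_rebind,
				   enum xenbus_state_sketch state)
{
	if (!allow_rebind)
		return true;
	return state == XS_CLOSING;
}
```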

NOTE: Re-bind support is indicated by a new boolean in struct xenbus_driver,
which defaults to false. Subsequent patches will add support to
some backend drivers.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

xenbus: move xenbus_dev_shutdown() into frontend code...

...and make it static

xenbus_dev_shutdown() is seemingly intended to cause clean shutdown of PV
frontends when a guest is rebooted. Indeed the function waits for a
completion which is only set by a call to xenbus_frontend_closed().

This patch removes the shutdown() method from backends and moves
xenbus_dev_shutdown() from xenbus_probe.c into xenbus_probe_frontend.c,
renaming it appropriately and making it static.

NOTE: In the case where the backend is running in a driver domain, the
      toolstack should have already terminated any frontends that may be
      using it (since Xen does not support re-startable PV driver domains)
      so xenbus_dev_shutdown() should never be called.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk

Clang warns:

../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation;
statement is not part of the previous 'if' [-Wmisleading-indentation]
                nr_parts = PARTS_PER_DISK;
                ^
../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here
                if (err)
                ^

This is because there is a space at the beginning of this line; remove
it so that the indentation is consistent according to the Linux kernel
coding style and clang no longer warns.

While we are here, the previous line has some trailing whitespace; clean
that up as well.

Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than XENVBD")
Link: https://github.com/ClangBuiltLinux/linux/issues/791
Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Acked-by: Roger Pau Monné <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

riscv: move sifive_l2_cache.c to drivers/soc

sifive_l2_cache.c is in no way related to RISC-V architecture
memory management. It is a little stub driver working around the fact
that the EDAC maintainers prefer their drivers to be structured in a
certain way that doesn't fit the SiFive SOCs.

Move the file to drivers/soc and add a Kconfig option for it, as well
as the whole drivers/soc boilerplate for CONFIG_SOC_SIFIVE.

Fixes: a967a289f169 ("RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs")
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
[[email protected]: keep the MAINTAINERS change specific to the L2$ controller code]
Signed-off-by: Paul Walmsley <[email protected]>

riscv: define vmemmap before pfn_to_page calls

pfn_to_page() and page_to_pfn() depend on vmemmap being defined before
they are used if the kernel is configured with CONFIG_SPARSEMEM_VMEMMAP=y.
This was broken by the NOMMU changes, which moved the vmemmap definition
below function definitions calling pfn_to_page() and page_to_pfn().
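The ordering problem can be reproduced in user space. This sketch assumes a static array stands in for the vmemmap region (in the kernel it is a fixed virtual address) and mirrors the generic SPARSEMEM_VMEMMAP conversions, which are plain pointer arithmetic on `vmemmap`. Because macros expand where they are used, the `static inline` helper only compiles if `vmemmap` is already defined above it; moving the `#define vmemmap` line below the helper reproduces the build failure. The `_sketch` names are hypothetical.

```c
struct page {
	unsigned long flags;
};

/* Stand-in for the memory map region. */
static struct page mem_map_sketch[16];

/* vmemmap must be defined before anything below expands pfn_to_page(). */
#define vmemmap (mem_map_sketch)

/* SPARSEMEM_VMEMMAP-style conversions: pointer arithmetic on vmemmap. */
#define pfn_to_page(pfn) (vmemmap + (pfn))
#define page_to_pfn(page) ((unsigned long)((page) - vmemmap))

/* Fails to compile if the vmemmap definition is moved below this point. */
static inline struct page *pfn_to_page_inline_sketch(unsigned long pfn)
{
	return pfn_to_page(pfn);
}
```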

Noticed while compiling the 5.5-rc2 kernel for Fedora/RISCV.

v2:
- Add a comment for vmemmap in source

Signed-off-by: David Abdurachmanov <[email protected]>
Fixes: 6bd33e1ece52 ("riscv: add nommu support")
Reviewed-by: Anup Patel <[email protected]>
Signed-off-by: Paul Walmsley <[email protected]>

riscv: fix scratch register clearing in M-mode.

This patch fixes the scratch register clearing in M-mode. The code cleared
the sscratch register in M-mode, but it should clear the mscratch register.
Accessing sscratch causes a kernel trap if the CPU core doesn't support
S-mode.

Fixes: 9e80635619b5 ("riscv: clear the instruction cache and all registers when booting")
Signed-off-by: Greentime Hu <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Signed-off-by: Paul Walmsley <[email protected]>

riscv: Fix use of undefined config option CONFIG_CONFIG_MMU

In Kconfig files, config options are written without the CONFIG_ prefix.

Fixes: 6bd33e1ece52 ("riscv: add nommu support")
Signed-off-by: Andreas Schwab <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Signed-off-by: Paul Walmsley <[email protected]>

Merge branch 'cls_u32-fix-refcount-leak'

Davide Caratti says:

====================
net/sched: cls_u32: fix refcount leak

A refcount leak in the error path of u32_change() was recently
introduced. It can be observed with the following commands:

  [root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 97 \
  > u32 match ip src 127.0.0.1/32 indev notexist20 flowid 1:1 action drop
  RTNETLINK answers: Invalid argument
  We have an error talking to the kernel
  [root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 98 \
  > handle 42:42 u32 divisor 256
  Error: cls_u32: Divisor can only be used on a hash table.
  We have an error talking to the kernel
  [root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 99 \
  > u32 ht 47:47
  Error: cls_u32: Specified hash table not found.
  We have an error talking to the kernel

They all legitimately return -EINVAL; however, they leave semi-configured
filters at eth0 tc ingress:

[root@f31 ~]# tc filter show dev eth0 ingress
filter protocol ip pref 97 u32 chain 0
filter protocol ip pref 97 u32 chain 0 fh 800: ht divisor 1
filter protocol ip pref 98 u32 chain 0
filter protocol ip pref 98 u32 chain 0 fh 801: ht divisor 1
filter protocol ip pref 99 u32 chain 0
filter protocol ip pref 99 u32 chain 0 fh 802: ht divisor 1

With older kernels, filters were unconditionally considered empty (and
thus de-refcounted) on the error path of ->change().
After commit 8b64678e0af8 ("net: sched: refactor tp insert/delete for
concurrent execution"), filters were considered empty when the walk()
function didn't set 'walker.stop' to 1.
Finally, with commit 6676d5e416ee ("net: sched: set dedicated tcf_walker
flag when tp is empty"), tc filters are considered empty unless the walker
function is called with a non-NULL handle. This last change doesn't fit
cls_u32 design, because at least the "root hnode" is (almost) always
non-NULL, as it's allocated in u32_init().

- patch 1/2 is a proposal to restore the original kernel behavior, where
  no filter was installed in the error path of u32_change().
- patch 2/2 adds tdc selftests that can be used to verify the correct
  behavior of u32 in the error path of ->change().
====================

Signed-off-by: David S. Miller <[email protected]>

tc-testing: initial tdc selftests for cls_u32

- move test "e9a3 - Add u32 with source match" to u32.json, and change the
match pattern to catch all hnodes
- add testcases for relevant error paths of cls_u32 module

Signed-off-by: Davide Caratti <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/sched: cls_u32: fix refcount leak in the error path of u32_change()

When users replace cls_u32 filters with new ones having wrong parameters,
so that u32_change() fails to validate them, the kernel doesn't roll back
correctly, and leaves semi-configured rules.

Fix this in u32_walk(), avoiding a call to the walker function on filters
that don't have a match rule connected. The side effect is, these "empty"
filters are not even dumped when present; but that shouldn't be a problem
as long as we are restoring the original behaviour, where semi-configured
filters were not even added in the error path of u32_change().
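A simplified model of the emptiness rule described in the cover letter: the classifier counts as empty unless the walk reaches the walker callback. Skipping nodes without a match rule (for instance the root hash table allocated at init time) therefore makes a half-configured instance walk as empty again. The structures and functions are hypothetical and are not the cls_u32 or tcf_walker API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened model of the nodes a cls_u32 instance holds. */
struct node_sketch {
	bool has_match;		/* true for a node with a match rule */
	struct node_sketch *next;
};

struct walker_sketch {
	bool nonempty;		/* set once the callback sees any node */
};

static void walker_fn(struct node_sketch *n, struct walker_sketch *w)
{
	(void)n;
	w->nonempty = true;
}

/* Invoke the callback only for nodes that have a match rule attached. */
void walk_sketch(struct node_sketch *head, struct walker_sketch *w)
{
	for (struct node_sketch *n = head; n; n = n->next) {
		if (!n->has_match)
			continue;	/* e.g. the root hash table */
		walker_fn(n, w);
	}
}

/* Empty means the walk never reached the callback. */
bool tp_is_empty_sketch(struct node_sketch *head)
{
	struct walker_sketch w = { .nonempty = false };

	walk_sketch(head, &w);
	return !w.nonempty;
}
```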

Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty")
Signed-off-by: Davide Caratti <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nfc: s3fwrn5: replace the assertion with a WARN_ON

In s3fwrn5_fw_recv_frame, if fw_info->rsp is not empty, the
current code causes a crash via BUG_ON. However, s3fwrn5_fw_send_msg
does not crash in such a scenario. The patch replaces the BUG_ON
with returning an error to the callers and freeing the skb.
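In user-space terms, the change turns an assertion into an error return plus cleanup. A sketch with hypothetical stand-ins for the sk_buff and the firmware state; the `-EINVAL` return code is an assumption for illustration, and the WARN_ON mentioned in the subject (which would additionally log a warning) is not modeled.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an sk_buff and the firmware state. */
struct skb_sketch {
	unsigned char *data;
};

struct fw_info_sketch {
	struct skb_sketch *rsp;	/* pending response, NULL when free */
};

static void free_skb_sketch(struct skb_sketch *skb)
{
	if (!skb)
		return;
	free(skb->data);
	free(skb);
}

/*
 * Instead of crashing when a response is already pending, drop the new
 * frame and report an error to the caller.
 */
int fw_recv_frame_sketch(struct fw_info_sketch *fw, struct skb_sketch *skb)
{
	if (fw->rsp) {
		free_skb_sketch(skb);
		return -EINVAL;
	}
	fw->rsp = skb;
	return 0;
}
```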

Signed-off-by: Aditya Pakki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'macb-fix-probing-of-PHY-not-described-in-the-dt'

Antoine Tenart says:

====================
net: macb: fix probing of PHY not described in the dt

The macb Ethernet driver supports various ways of referencing its
network PHY. When a device tree is used the PHY can be referenced with
a phy-handle or, if connected to its internal MDIO bus, described in
a child node. Some platforms omitted the PHY description while
connecting the PHY to the internal MDIO bus and in such cases the MDIO
bus has to be scanned "manually" by the macb driver.

Prior to the phylink conversion the driver registered the MDIO bus with
of_mdiobus_register and then in case the PHY couldn't be retrieved
using dt or using phy_find_first (because registering an MDIO bus with
of_mdiobus_register masks all PHYs) the macb driver was "manually"
scanning the MDIO bus (like mdiobus_register does). The phylink
conversion did break this particular case but reimplementing the manual
scan of the bus in the macb driver wouldn't be very clean. The solution
seems to be registering the MDIO bus based on whether the PHYs are described
in the device tree or not.

There are multiple ways to do this, none is perfect. I chose to check if
any of the child nodes of the macb node was a network PHY and based on
this to register the MDIO bus with the of_ helper or not. The drawback
is that boards referencing the PHY through phy-handle would scan the entire
MDIO bus of the macb at boot time (as the MDIO bus would be registered
with mdiobus_register). For this solution to work properly
of_mdiobus_child_is_phy has to be exported, which means the patch doing
so has to be backported to -stable as well.

Another possible solution could have been to simply check if the macb
node has a child node by counting its sub-nodes. This isn't technically
perfect, as there could be other sub-nodes (in practice this should be
fine, fixed-link being taken care of in the driver). We could also
simply s/of_mdiobus_register/mdiobus_register/ but that could break
boards using the PHY description in child node as a selector (which
really would be not a proper way to do this...).

The real issue here is having PHYs not described in the dt, but we
have dt backward compatibility, so we have to live with that.
====================

Signed-off-by: David S. Miller <[email protected]>

net: macb: fix probing of PHY not described in the dt

This patch fixes the case where the PHY isn't described in the device
tree. This is due to the way the MDIO bus is registered in the driver:
whether the PHY is described in the device tree or not, the bus is
registered through of_mdiobus_register. The function masks all the PHYs
and only allows probing the ones described in the device tree. Prior to
the Phylink conversion this was also done but later on in the driver
the MDIO bus was manually scanned to circumvent the fact that the PHY
wasn't described.

This patch fixes it in a proper way, by registering the MDIO bus based
on whether the PHY attached to a given interface is described in the device
tree or not.
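The selection logic can be sketched as a scan over the MAC node's children. This is a hypothetical model: `is_phy` stands in for what of_mdiobus_child_is_phy() would report for each child, and the enum only names which registration helper would be called.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened device tree node. */
struct dt_node_sketch {
	bool is_phy;	/* what of_mdiobus_child_is_phy() would report */
	struct dt_node_sketch *sibling;
};

struct macb_node_sketch {
	struct dt_node_sketch *first_child;
};

enum mdio_reg_kind {
	REGISTER_OF_MDIOBUS,	/* of_mdiobus_register(): DT-described PHYs only */
	REGISTER_MDIOBUS,	/* mdiobus_register(): scan the whole bus */
};

/* Use the of_ helper only if some child node is a PHY. */
enum mdio_reg_kind choose_mdio_registration(const struct macb_node_sketch *np)
{
	for (const struct dt_node_sketch *c = np->first_child; c; c = c->sibling)
		if (c->is_phy)
			return REGISTER_OF_MDIOBUS;
	return REGISTER_MDIOBUS;
}
```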

Fixes: 7897b071ac3b ("net: macb: convert to phylink")
Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

of: mdio: export of_mdiobus_child_is_phy

This patch exports of_mdiobus_child_is_phy, allowing callers to check
whether a child node is a network PHY.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

scsi: target/iblock: Fix protection error with blocks greater than 512B

The sector size of the block layer is 512 bytes, but integrity interval
size might be different (in case of 4K block size of the media). At the
initiator side the virtual start sector is the one that was originally
submitted by the block layer (512 bytes) for the Reftag usage. The
initiator converts the Reftag to integrity interval units and sends it to
the target. So the target virtual start sector should be calculated in
integrity interval units. prepare_fn() and complete_fn() don't correctly
remap the Reftag when the virtual start sector is in the wrong units,
which leads to the following protection error at the device:

"blk_update_request: protection error, dev sdb, sector 2048 op 0x0:(READ)
flags 0x10000 phys_seg 1 prio class 0"

To fix that, set the seed in integrity interval units.
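The unit conversion amounts to shifting the 512-byte sector number right by the difference between the integrity interval exponent and the sector shift. A sketch of the arithmetic (the helper name is hypothetical): with 4096-byte intervals (exponent 12), sector 2048 from the error above maps to interval 2048 >> 3 = 256.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9	/* block layer sectors are 512 bytes */

/*
 * Convert a 512-byte sector number into integrity interval units.
 * interval_exp is log2 of the protection interval size, e.g. 12 for
 * 4096-byte intervals. Assumes interval_exp >= SECTOR_SHIFT.
 */
static inline uint64_t seed_from_sector(uint64_t sector, unsigned int interval_exp)
{
	return sector >> (interval_exp - SECTOR_SHIFT);
}
```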

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Israel Rukshin <[email protected]>
Reviewed-by: Max Gurtovoy <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: libcxgbi: fix NULL pointer dereference in cxgbi_device_destroy()

If cxgb4i_ddp_init() fails then cdev->cdev2ppm will be NULL, so add a check
for NULL pointer before dereferencing it.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Varun Prakash <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: lpfc: fix spelling mistakes of asynchronous

There are misspellings of "asynchronous" in an lpfc_printf_log message
and in comments. Fix these.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: James Smart <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

tracing: Have the histogram compare functions convert to u64 first

The compare functions of the histogram code would be specific for the size
of the value being compared (byte, short, int, long long). It would
reference the value from the array via the type of the compare, but the
value was stored in a 64 bit number. This is fine for little endian
machines, but for big endian machines, it would end up comparing zeros or
all ones (depending on the sign) for anything but 64 bit numbers.

To fix this, first dereference the value as a u64, then convert it to the type
being compared.
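A user-space sketch of the corrected compare functions, assuming each value is stored in a 64-bit slot as described above. The buggy form effectively read `*(type *)val`, which on a big-endian machine picks up the high-order bytes of the slot; reading the full u64 first and then narrowing keeps the low-order bytes on any byte order. The macro and function names are illustrative.

```c
#include <stdint.h>

/*
 * Read the 64-bit slot as a u64 first, then narrow it to the compared
 * type, so the result does not depend on the machine's byte order.
 */
#define DEFINE_CMP_FN(type)						\
static int cmp_##type(const void *val_a, const void *val_b)		\
{									\
	type a = (type)(*(const uint64_t *)val_a);			\
	type b = (type)(*(const uint64_t *)val_b);			\
									\
	return (a > b) ? 1 : ((a < b) ? -1 : 0);			\
}

DEFINE_CMP_FN(int8_t)
DEFINE_CMP_FN(int32_t)
DEFINE_CMP_FN(uint16_t)
```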

Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 08d43a5fa063e ("tracing: Add lock-free tracing_map")
Acked-by: Tom Zanussi <[email protected]>
Reported-by: Sven Schnelle <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Avoid memory leak in process_system_preds()

When failing in the allocation of filter_item, process_system_preds()
goes to fail_mem, where the allocated filter is freed.

However, this leads to memory leak of filter->filter_string and
filter->prog, which are allocated before and in process_preds().
This bug has been detected by kmemleak as well.

Fix this by changing kfree to __free_filter.

unreferenced object 0xffff8880658007c0 (size 32):
  comm "bash", pid 579, jiffies 4295096372 (age 17.752s)
  hex dump (first 32 bytes):
    63 6f 6d 6d 6f 6e 5f 70 69 64 20 20 3e 20 31 30  common_pid  > 10
    00 00 00 00 00 00 00 00 65 73 00 00 00 00 00 00  ........es......
  backtrace:
    [<0000000067441602>] kstrdup+0x2d/0x60
    [<00000000141cf7b7>] apply_subsystem_event_filter+0x378/0x932
    [<000000009ca32334>] subsystem_filter_write+0x5a/0x90
    [<0000000072da2bee>] vfs_write+0xe1/0x240
    [<000000004f14f473>] ksys_write+0xb4/0x150
    [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0
    [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff888060c22d00 (size 64):
  comm "bash", pid 579, jiffies 4295096372 (age 17.752s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 e8 d7 41 80 88 ff ff  ...........A....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000b8c1b109>] process_preds+0x243/0x1820
    [<000000003972c7f0>] apply_subsystem_event_filter+0x3be/0x932
    [<000000009ca32334>] subsystem_filter_write+0x5a/0x90
    [<0000000072da2bee>] vfs_write+0xe1/0x240
    [<000000004f14f473>] ksys_write+0xb4/0x150
    [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0
    [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff888041d7e800 (size 512):
  comm "bash", pid 579, jiffies 4295096372 (age 17.752s)
  hex dump (first 32 bytes):
    70 bc 85 97 ff ff ff ff 0a 00 00 00 00 00 00 00  p...............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000001e04af34>] process_preds+0x71a/0x1820
    [<000000003972c7f0>] apply_subsystem_event_filter+0x3be/0x932
    [<000000009ca32334>] subsystem_filter_write+0x5a/0x90
    [<0000000072da2bee>] vfs_write+0xe1/0x240
    [<000000004f14f473>] ksys_write+0xb4/0x150
    [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0
    [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
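A sketch of why a plain free of the container leaks: the filter string and program are separate allocations, so only a helper that frees the members first (the role __free_filter() plays) releases everything. The structure is a hypothetical simplification, and the allocation counter exists only to make leaks observable.

```c
#include <stdlib.h>
#include <string.h>

/* Counts live allocations so a leak is visible. */
static int live_allocs;

static void *counted_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical filter: string and prog are separate allocations. */
struct filter_sketch {
	char *filter_string;
	void *prog;
};

struct filter_sketch *alloc_filter_sketch(const char *str)
{
	struct filter_sketch *f = counted_alloc(sizeof(*f));

	if (!f)
		return NULL;
	f->filter_string = counted_alloc(strlen(str) + 1);
	if (f->filter_string)
		strcpy(f->filter_string, str);
	f->prog = counted_alloc(64);
	return f;
}

/* Release the members, then the filter itself. */
void free_filter_sketch(struct filter_sketch *filter)
{
	if (!filter)
		return;
	counted_free(filter->prog);
	counted_free(filter->filter_string);
	counted_free(filter);
}
```

Calling `counted_free()` on the container alone, as the old error path did with kfree(), would leave two allocations live.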

Link: http://lkml.kernel.org/r/[email protected]
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Fixes: 404a3add43c9c ("tracing: Only add filter list when needed")
Signed-off-by: Keita Suzuki <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2019-12-19

The following pull-request contains BPF updates for your *net* tree.

We've added 10 non-merge commits during the last 8 day(s) which contain
a total of 21 files changed, 269 insertions(+), 108 deletions(-).

The main changes are:

1) Fix lack of synchronization between xsk wakeup and destroying resources
   used by xsk wakeup, from Maxim Mikityanskiy.

2) Fix pruning with tail call patching, untrack programs in case of verifier
   error and fix a cgroup local storage tracking bug, from Daniel Borkmann.

3) Fix clearing skb->tstamp in bpf_redirect() when going from ingress to
   egress which otherwise causes issues e.g. on fq qdisc, from Lorenz Bauer.

4) Fix compile warning of unused proc_dointvec_minmax_bpf_restricted() when
   only cBPF is present, from Alexander Lobakin.
====================

Signed-off-by: David S. Miller <[email protected]>

bpf: Add further test_verifier cases for record_func_key

Expand dummy prog generation such that we can easily check on return
codes and add a few more test cases to make sure we keep on tracking
pruning behavior.

  # ./test_verifier
  [...]
  #1066/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
  #1067/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
  Summary: 1580 PASSED, 0 SKIPPED, 0 FAILED

Also verified that JIT dump of added test cases looks good.

Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/df7200b6021444fd369376d227de917357285b65.1576789878.git.daniel@iogearbox.net

bpf: Fix record_func_key to perform backtracking on r3

While testing Cilium with /unreleased/ Linus' tree under BPF-based NodePort
implementation, I noticed a strange BPF SNAT engine behavior from time to
time. In some cases it would do the correct SNAT/DNAT service translation,
but at a random point in time it would just stop and perform an unexpected
translation after SYN, SYN/ACK and stack would send a RST back. While initially
assuming that there is some sort of a race condition in BPF code, adding
trace_printk()s for debugging purposes at some point seemed to have resolved
the issue auto-magically.

Digging deeper on this Heisenbug and reducing the trace_printk() calls to
an absolute minimum, it turns out that a single call would suffice to
trigger / not trigger the seen RST issue, even though the logic of the
program itself remains unchanged. Turns out the single call changed verifier
pruning behavior to get everything to work. Reconstructing a minimal test
case, the incorrect JIT dump looked as follows:

  # bpftool p d j i 11346
  0xffffffffc0cba96c:
  [...]
    21:   movzbq 0x30(%rdi),%rax
    26:   cmp    $0xd,%rax
    2a:   je     0x000000000000003a
    2c:   xor    %edx,%edx
    2e:   movabs $0xffff89cc74e85800,%rsi
    38:   jmp    0x0000000000000049
    3a:   mov    $0x2,%edx
    3f:   movabs $0xffff89cc74e85800,%rsi
    49:   mov    -0x224(%rbp),%eax
    4f:   cmp    $0x20,%eax
    52:   ja     0x0000000000000062
    54:   add    $0x1,%eax
    57:   mov    %eax,-0x224(%rbp)
    5d:   jmpq   0xffffffffffff6911
    62:   mov    $0x1,%eax
  [...]

Hence, unexpectedly, the JIT emitted a direct jump even though a
retpoline-based one would have been needed, since at 2c and 3a we have
different slot keys in BPF reg r3. The verifier log of the test case reveals
what happened:

  0: (b7) r0 = 14
  1: (73) *(u8 *)(r1 +48) = r0
  2: (71) r0 = *(u8 *)(r1 +48)
  3: (15) if r0 == 0xd goto pc+4
   R0_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=ctx(id=0,off=0,imm=0) R10=fp0
  4: (b7) r3 = 0
  5: (18) r2 = 0xffff89cc74d54a00
  7: (05) goto pc+3
  11: (85) call bpf_tail_call#12
  12: (b7) r0 = 1
  13: (95) exit
  from 3 to 8: R0_w=inv13 R1=ctx(id=0,off=0,imm=0) R10=fp0
  8: (b7) r3 = 2
  9: (18) r2 = 0xffff89cc74d54a00
  11: safe
  processed 13 insns (limit 1000000) [...]

The second branch is pruned by the verifier since it is considered safe, but the issue is that
record_func_key() couldn't have seen the index in line 3a and therefore
decided that emitting a direct jump at this location was okay.
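A simplified model of the per-call-site key tracking, with hypothetical names (this is not the verifier's internal API): a direct jump is only safe if every path reaching the tail call supplied the same constant key. When the second path is pruned, its key is never recorded, which is exactly the failure described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-call-site state for the tail call slot key. */
struct key_state {
	bool seen;	/* a constant key has been recorded */
	bool poisoned;	/* a conflicting or non-constant key was seen */
	uint64_t key;
};

/*
 * Record the key observed on one verification path. A non-constant key,
 * or a constant that differs from the one already recorded, poisons the
 * site so the generic (retpoline) call must be emitted.
 */
void record_key(struct key_state *ks, bool is_const, uint64_t key)
{
	if (ks->poisoned)
		return;
	if (!is_const || (ks->seen && ks->key != key)) {
		ks->poisoned = true;
		return;
	}
	ks->seen = true;
	ks->key = key;
}

bool can_emit_direct_jump(const struct key_state *ks)
{
	return ks->seen && !ks->poisoned;
}
```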

Fix this by reusing our backtracking logic for precise scalar verification
in order to prevent pruning on the slot key. This means verifier will track
content of r3 all the way backwards and only prune if both scalars were
unknown in state equivalence check and therefore poisoned in the first place
in record_func_key(). The range is [x,x] in record_func_key() case since
the slot always would have to be constant immediate. Correct verification
after fix:

  0: (b7) r0 = 14
  1: (73) *(u8 *)(r1 +48) = r0
  2: (71) r0 = *(u8 *)(r1 +48)
  3: (15) if r0 == 0xd goto pc+4
   R0_w=invP(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=ctx(id=0,off=0,imm=0) R10=fp0
  4: (b7) r3 = 0
  5: (18) r2 = 0x0
  7: (05) goto pc+3
  11: (85) call bpf_tail_call#12
  12: (b7) r0 = 1
  13: (95) exit
  from 3 to 8: R0_w=invP13 R1=ctx(id=0,off=0,imm=0) R10=fp0
  8: (b7) r3 = 2
  9: (18) r2 = 0x0
  11: (85) call bpf_tail_call#12
  12: (b7) r0 = 1
  13: (95) exit
  processed 15 insns (limit 1000000) [...]

And correct corresponding JIT dump:

  # bpftool p d j i 11
  0xffffffffc0dc34c4:
  [...]
    21:   movzbq 0x30(%rdi),%rax
    26:   cmp    $0xd,%rax
    2a:   je     0x000000000000003a
    2c:   xor    %edx,%edx
    2e:   movabs $0xffff9928b4c02200,%rsi
    38:   jmp    0x0000000000000049
    3a:   mov    $0x2,%edx
    3f:   movabs $0xffff9928b4c02200,%rsi
    49:   cmp    $0x4,%rdx
    4d:   jae    0x0000000000000093
    4f:   and    $0x3,%edx
    52:   mov    %edx,%edx
    54:   cmp    %edx,0x24(%rsi)
    57:   jbe    0x0000000000000093
    59:   mov    -0x224(%rbp),%eax
    5f:   cmp    $0x20,%eax
    62:   ja     0x0000000000000093
    64:   add    $0x1,%eax
    67:   mov    %eax,-0x224(%rbp)
    6d:   mov    0x110(%rsi,%rdx,8),%rax
    75:   test   %rax,%rax
    78:   je     0x0000000000000093
    7a:   mov    0x30(%rax),%rax
    7e:   add    $0x19,%rax
    82:   callq  0x000000000000008e
    87:   pause
    89:   lfence
    8c:   jmp    0x0000000000000087
    8e:   mov    %rax,(%rsp)
    92:   retq
    93:   mov    $0x1,%eax
  [...]

Also explicitly add env->allow_ptr_leaks to fixup_bpf_calls(), since
backtracking is enabled under the former (direct jumps as well, but they use a
different test). In case of only tracking different map pointers as in
c93552c443eb ("bpf: properly enforce index mask to prevent out-of-bounds
speculation"), pruning cannot make such short-cuts, nor if there are paths
with scalar and non-scalar types as r3. mark_chain_precision() is only needed
after we know that register_is_const(). If that is not the case, we already
poison the key on the first path, and non-const keys in later paths do not
match the scalar range in regsafe() either. Cilium NodePort testing passes
fine as well now. Note that released kernels are not affected.

Fixes: d2e4c1e6c294 ("bpf: Constant map key tracking for prog array pokes")
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/ac43ffdeb7386c5bd688761ed266f3722bb39823.1576789878.git.daniel@iogearbox.net

net, sysctl: Fix compiler warning when only cBPF is present

proc_dointvec_minmax_bpf_restricted() was first introduced
in commit 2e4a30983b0f ("bpf: restrict access to core bpf sysctls")
under CONFIG_HAVE_EBPF_JIT. Then, this ifdef has been removed in
ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv
allocations"), because a new sysctl, bpf_jit_limit, made use of it.
Finally, this parameter has become long instead of integer with
fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
and thus, a new proc_dolongvec_minmax_bpf_restricted() has been
added.

With this last change, we are back to the situation where
proc_dointvec_minmax_bpf_restricted() is used only under
CONFIG_HAVE_EBPF_JIT, but the corresponding ifdef has not been
brought back.

So, in configurations like CONFIG_BPF_JIT=y && CONFIG_HAVE_EBPF_JIT=n
since v4.20 we have:

  CC      net/core/sysctl_net_core.o
net/core/sysctl_net_core.c:292:1: warning: ‘proc_dointvec_minmax_bpf_restricted’ defined but not used [-Wunused-function]
  292 | proc_dointvec_minmax_bpf_restricted(struct ctl_table *table, int write,
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Suppress this by guarding it with CONFIG_HAVE_EBPF_JIT again.
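A minimal sketch of the pattern, using hypothetical demo_* names rather
than the real sysctl_net_core.c symbols: the helper is defined under the
same guard as the only table entries that reference it, so a build
without CONFIG_HAVE_EBPF_JIT defines neither and -Wunused-function has
nothing to report.

```c
#include <stddef.h>

/* Leave undefined to model CONFIG_BPF_JIT=y && CONFIG_HAVE_EBPF_JIT=n. */
/* #define CONFIG_HAVE_EBPF_JIT 1 */

struct demo_ctl_entry {
	const char *name;
	int (*handler)(int value);
};

/* Handler used unconditionally. */
static int demo_plain_handler(int value)
{
	return value;
}

#ifdef CONFIG_HAVE_EBPF_JIT
/* Hypothetical restricted handler: its only users are the guarded
 * entries below, so it carries the very same guard. */
static int demo_restricted_handler(int value)
{
	return value < 0 ? -1 : value;
}
#endif

static const struct demo_ctl_entry demo_table[] = {
	{ "always_present", demo_plain_handler },
#ifdef CONFIG_HAVE_EBPF_JIT
	{ "jit_only_a", demo_restricted_handler },
	{ "jit_only_b", demo_restricted_handler },
#endif
};

static size_t demo_table_size(void)
{
	return sizeof(demo_table) / sizeof(demo_table[0]);
}
```

Guarding only the table entries and not the helper would reproduce the
warning quoted above when the symbol is undefined.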

Fixes: fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
"6 fixes"

* emailed patches from Andrew Morton <[email protected]>:
  lib/Kconfig.debug: fix some messed up configurations
  mm: vmscan: protect shrinker idr replace with CONFIG_MEMCG
  kasan: don't assume percpu shadow allocations will succeed
  kasan: use apply_to_existing_page_range() for releasing vmalloc shadow
  mm/memory.c: add apply_to_existing_page_range() helper
  kasan: fix crashes on access to memory mapped by vm_map_ram()

Merge tag 'pm-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"Fix a problem related to CPU offline/online and cpufreq governors that
  in some system configurations may lead to a system-wide deadlock
  during CPU online"

* tag 'pm-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Avoid leaving stale IRQ work items during CPU offline

xfs: don't commit sunit/swidth updates to disk if that would cause repair failures

Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
and swidth values could cause xfs_repair to fail loudly.  The problem
here is that repair calculates where mkfs should have allocated the
root inode, based on the superblock geometry.  The allocation decisions
depend on sunit, which means that we really can't go updating sunit if
it would lead to a subsequent repair failure on an otherwise correct
filesystem.

Port from xfs_repair some code that computes the location of the root
inode and teach mount to skip the ondisk update if it would cause
problems for repair.  Along the way we'll update the documentation,
provide a function for computing the minimum AGFL size instead of
open-coding it, and cut down some indenting in the mount code.

Note that we allow the mount to proceed (and new allocations will
reflect this new geometry) because we've never screened this kind of
thing before.  We'll have to wait for a new future incompat feature to
enforce correct behavior, alas.

Note that the geometry reporting always uses the superblock values, not
the incore ones, so that is what xfs_info and xfs_growfs will report.

[1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d

Reported-by: Alex Lyakas <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

xfs: split the sunit parameter update into two parts

If the administrator provided a sunit= mount option, we need to validate
the raw parameter, convert the mount option units (512b blocks) into the
internal unit (fs blocks), and then validate that the (now cooked)
parameter doesn't screw anything up on disk. The incore inode geometry
computation can depend on the new sunit option, but a subsequent patch
will make validation of the cooked value depend on the computed inode
geometry, so break the sunit update into two steps.
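A simplified sketch of the first step (validate the raw value, then
convert 512-byte units into fs blocks), assuming hypothetical demo_*
names and omitting the second, geometry-dependent step. The "both or
neither" and "width is a multiple of unit" rules are common stripe
geometry checks, stated here as assumptions rather than as the exact
kernel logic.

```c
#include <errno.h>
#include <stdint.h>

/* sunit=/swidth= are given in 512-byte basic blocks (1 << 9). */
#define DEMO_BBSHIFT 9

/* Hypothetical, simplified mount state. */
struct demo_mount {
	unsigned int blocklog;   /* log2 of fs block size, >= DEMO_BBSHIFT */
	uint32_t sunit_bb;       /* raw sunit=, 512-byte units */
	uint32_t swidth_bb;      /* raw swidth=, 512-byte units */
	uint32_t sunit_fsb;      /* cooked sunit, fs blocks */
	uint32_t swidth_fsb;     /* cooked swidth, fs blocks */
};

/*
 * Step 1: validate the raw option values and convert them to fs blocks.
 * Step 2 (not shown) would validate the cooked values against the
 * inode geometry computed in between.
 */
static int demo_update_sunit_part1(struct demo_mount *mp)
{
	unsigned int shift = mp->blocklog - DEMO_BBSHIFT;
	uint32_t mask = (1u << shift) - 1;   /* basic blocks per fs block - 1 */

	/* Both or neither must be set; width must be a multiple of unit. */
	if ((mp->sunit_bb == 0) != (mp->swidth_bb == 0))
		return -EINVAL;
	if (mp->sunit_bb && mp->swidth_bb % mp->sunit_bb)
		return -EINVAL;

	/* Raw values must describe whole fs blocks before converting. */
	if ((mp->sunit_bb & mask) || (mp->swidth_bb & mask))
		return -EINVAL;

	mp->sunit_fsb = mp->sunit_bb >> shift;
	mp->swidth_fsb = mp->swidth_bb >> shift;
	return 0;
}
```

For a 4096-byte block size (blocklog 12), sunit=128 becomes 16 fs blocks.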

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

xfs: refactor agfl length computation function

Refactor xfs_alloc_min_freelist to accept a NULL @pag argument, in which
case it returns the largest possible minimum length. This will be used
in an upcoming patch to compute the length of the AGFL at mkfs time.
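A simplified model of the NULL-@pag convention, assuming hypothetical
demo_* names and only two free-space btrees; it is not the libxfs code.
Without a live AG to inspect, the sketch assumes every tree is already
at its maximum height, which yields the largest possible minimum.

```c
#include <stddef.h>

/* Hypothetical, simplified per-AG state: current btree heights. */
struct demo_perag {
	unsigned int bno_levels;
	unsigned int cnt_levels;
};

static unsigned int demo_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Blocks to keep on the AGFL so that a split of each tree can succeed.
 * With pag == NULL there is no AG to look at, so assume the worst case
 * (trees at maxlevels) and return the largest possible minimum.
 */
static unsigned int demo_min_freelist(const struct demo_perag *pag,
				      unsigned int maxlevels)
{
	unsigned int bno = pag ? pag->bno_levels : maxlevels;
	unsigned int cnt = pag ? pag->cnt_levels : maxlevels;

	return demo_min(bno + 1, maxlevels) + demo_min(cnt + 1, maxlevels);
}
```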

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Brian Foster <[email protected]>

libxfs: resync with the userspace libxfs

Prepare to resync the userspace libxfs with the kernel libxfs. There
were a few things I missed -- a couple of static inline directory
functions that have to be exported for xfs_repair; a couple of directory
naming functions that make porting much easier if they're /not/ static
inline; and a u16 usage that should have been uint16_t.

None of these things are bugs in their own right; this just makes
porting xfsprogs easier.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Eric Sandeen <[email protected]>

xfs: use bitops interface for buf log item AIL flag check

The xfs_log_item flags were converted to atomic bitops as of commit
22525c17ed ("xfs: log item flags are racy"). The assert check for
AIL presence in xfs_buf_item_relse() still uses the old value-based
check. This likely went unnoticed because XFS_LI_IN_AIL evaluates to 0,
which makes the assert pass unconditionally. Fix up the check.
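A minimal userspace sketch of why the old check could never fire,
assuming a hypothetical DEMO_LI_IN_AIL bit number of 0 and a local
stand-in for the kernel's test_bit(): masking with a bit number of 0
always yields 0.

```c
#include <stdbool.h>

/* After the conversion, flags are bit NUMBERS, not masks. */
#define DEMO_LI_IN_AIL 0

/* Userspace stand-in for the kernel's test_bit(nr, addr). */
static bool demo_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* Old, value-based check: uses a bit number as if it were a mask,
 * so flags & 0 is always 0 and this always returns true. */
static bool demo_not_in_ail_old(unsigned long flags)
{
	return !(flags & DEMO_LI_IN_AIL);
}

/* Fixed check through the bitops interface. */
static bool demo_not_in_ail_new(const unsigned long *flags)
{
	return !demo_test_bit(DEMO_LI_IN_AIL, flags);
}
```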

Signed-off-by: Brian Foster <[email protected]>
Fixes: 22525c17ed ("xfs: log item flags are racy")
Reviewed-by: Eric Sandeen <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

Merge branch 'bpf-fix-xsk-wakeup'

Maxim Mikityanskiy says:

====================
This series addresses the issue described in the commit message of the
first patch: lack of synchronization between XSK wakeup and destroying
the resources used by XSK wakeup. The idea is similar to napi_synchronize.
The series contains fixes for the drivers that implement XSK.

v2 incorporates changes suggested by Björn:

1. Call synchronize_rcu in Intel drivers only if the XDP program is
being unloaded.
2. Don't forget rcu_read_lock when wakeup is called from xsk_poll.
3. Use xs->zc as the condition to call ndo_xsk_wakeup.
====================

Signed-off-by: Daniel Borkmann <[email protected]>

net/ixgbe: Fix concurrency issues between config flow and XSK

Use synchronize_rcu to wait until the XSK wakeup function finishes
before destroying the resources it uses:

1. ixgbe_down already calls synchronize_rcu after setting __IXGBE_DOWN.

2. After switching the XDP program, call synchronize_rcu to let
ixgbe_xsk_wakeup exit before the XDP program is freed.

3. Changing the number of channels brings the interface down.

4. Disabling UMEM sets __IXGBE_TX_DISABLED before closing hardware
resources and resetting xsk_umem. Check that bit in ixgbe_xsk_wakeup to
avoid using the XDP ring when it's already destroyed. synchronize_rcu is
called from ixgbe_txrx_ring_disable.

Signed-off-by: Maxim Mikityanskiy <[email protected]>
Signed-off-by: Björn Töpel <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

net/i40e: Fix concurrency issues between config flow and XSK

Use synchronize_rcu to wait until the XSK wakeup function finishes
before destroying the resources it uses:

1. i40e_down already calls synchronize_rcu. On i40e_down either
__I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in
i40e_xsk_wakeup (the former is already checked there).

2. After switching the XDP program, call synchronize_rcu to let
i40e_xsk_wakeup exit before the XDP program is freed.

3. Changing the number of channels brings the interface down (see
i40e_prep_for_reset and i40e_pf_quiesce_all_vsi).

4. Disabling UMEM sets __I40E_CONFIG_BUSY, too.

Signed-off-by: Maxim Mikityanskiy <[email protected]>
Signed-off-by: Björn Töpel <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

net/mlx5e: Fix concurrency issues between config flow and XSK

After disabling resources necessary for XSK (the XDP program, channels,
XSK queues), use synchronize_rcu to wait until the XSK wakeup function
finishes, before freeing the resources.

Suspend XSK wakeups during switching channels. If the XDP program is
being removed, synchronize_rcu before closing the old channels to allow
XSK wakeup to complete.

Signed-off-by: Maxim Mikityanskiy <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

xsk: Add rcu_read_lock around the XSK wakeup

The XSK wakeup callback in drivers makes some sanity checks before
triggering NAPI. However, some configuration changes may occur during
this function that affect the result of those checks. For example, the
interface can go down, and all the resources will be destroyed after the
checks in the wakeup function, but before it attempts to use these
resources. Wrap this callback in rcu_read_lock to allow the driver to
call synchronize_rcu before actually destroying the resources.

xsk_wakeup is a new function that encapsulates calling ndo_xsk_wakeup
wrapped into the RCU lock. After this commit, xsk_poll starts using
xsk_wakeup and checks xs->zc instead of ndo_xsk_wakeup != NULL to decide
whether ndo_xsk_wakeup should be called. It also fixes a bug introduced with the
need_wakeup feature: a non-zero-copy socket may be used with a driver
supporting zero-copy, and in this case ndo_xsk_wakeup should not be
called, so the xs->zc check is the correct one.
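A sketch of the decision change only, with hypothetical demo_* types
standing in for the socket and netdev ops; the rcu_read_lock() and
rcu_read_unlock() wrapping that xsk_wakeup adds in the kernel is
deliberately not modelled. It shows the case described above: a
copy-mode socket on a zero-copy-capable driver.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_netdev_ops {
	int (*ndo_xsk_wakeup)(int queue_id);
};

struct demo_xsk {
	bool zc;                            /* socket bound in zero-copy mode */
	const struct demo_netdev_ops *ops;
	int queue_id;
};

static int demo_calls;                  /* counts driver wakeups */

static int demo_driver_wakeup(int queue_id)
{
	(void)queue_id;
	demo_calls++;
	return 0;
}

/* Old condition: the driver implements the callback. */
static bool demo_should_wake_old(const struct demo_xsk *xs)
{
	return xs->ops->ndo_xsk_wakeup != NULL;
}

/* New condition: the socket itself is in zero-copy mode. */
static bool demo_should_wake_new(const struct demo_xsk *xs)
{
	return xs->zc;
}

static int demo_xsk_wakeup(const struct demo_xsk *xs)
{
	if (!demo_should_wake_new(xs))
		return 0;
	return xs->ops->ndo_xsk_wakeup(xs->queue_id);
}
```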

Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Signed-off-by: Björn Töpel <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Merge branch 'pm-cpufreq'

* pm-cpufreq:
  cpufreq: Avoid leaving stale IRQ work items during CPU offline

mmc: sdhci-of-esdhc: re-implement erratum A-009204 workaround

The erratum A-009204 workaround patch was reverted because of an
incorrect implementation.

8b6dc6b mmc: sdhci-of-esdhc: Revert "mmc: sdhci-of-esdhc: add
erratum A-009204 support"

This patch re-implements the workaround (add a 5 ms delay
before setting SYSCTL[RSTD] to make sure all the DMA transfers
are finished).

Signed-off-by: Yangbo Lu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: 5dd195522562 ("mmc: sdhci-of-esdhc: add erratum A-009204 support")
Cc: [email protected]
Signed-off-by: Ulf Hansson <[email protected]>

clk: qcom: Avoid SMMU/cx gdsc corner cases

Mark the msm8998 cpu CX gdsc as votable and use the hw control to avoid
corner cases with SMMU per hardware documentation.

Fixes: 3f7df5baa259 ("clk: qcom: Add MSM8998 GPU Clock Controller (GPUCC) driver")
Signed-off-by: Jeffrey Hugo <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Stephen Boyd <[email protected]>

clk: qcom: gcc-sc7180: Fix setting flag for votable GDSCs

Commit 17269568f7267 ("clk: qcom: Add Global Clock controller (GCC)
driver for SC7180") sets the VOTABLE flag in .pwrsts, but it needs
to be set in .flags. Fix this.

Fixes: 17269568f7267 ("clk: qcom: Add Global Clock controller (GCC) driver for SC7180")
Signed-off-by: Matthias Kaehlcke <[email protected]>
Link: https://lkml.kernel.org/r/20191204120341.1.I9971817e83ee890d1096c43c5a6ce6ced53d5bd3@changeid
Signed-off-by: Stephen Boyd <[email protected]>

Merge tag 'tpmdd-next-20191219' of git://git.infradead.org/users/jjs/linux-tpmdd

Pull tpm fixes from Jarkko Sakkinen:
"Bunch of fixes for rc3"

* tag 'tpmdd-next-20191219' of git://git.infradead.org/users/jjs/linux-tpmdd:
  tpm/tpm_ftpm_tee: add shutdown call back
  tpm: selftest: cleanup after unseal with wrong auth/policy test
  tpm: selftest: add test covering async mode
  tpm: fix invalid locking in NONBLOCKING mode
  security: keys: trusted: fix lost handle flush
  tpm_tis: reserve chip for duration of tpm_tis_core_init
  KEYS: asymmetric: return ENOMEM if akcipher_request_alloc() fails
  KEYS: remove CONFIG_KEYS_COMPAT

tpm/tpm_ftpm_tee: add shutdown call back

Add a shutdown callback to close the existing session with the fTPM TA
to support the kexec scenario.

Add parentheses to function names in comments as specified in kdoc.

Signed-off-by: Thirupathaiah Annapureddy <[email protected]>
Signed-off-by: Pavel Tatashin <[email protected]>
Reviewed-by: Jarkko Sakkinen <[email protected]>
Tested-by: Sasha Levin <[email protected]>
Signed-off-by: Jarkko Sakkinen <[email protected]>

drm/exynos: gsc: add missed component_del

The driver forgets to call component_del in remove to match component_add
in probe.
Add the missed call to fix it.

Signed-off-by: Chuhong Yuan <[email protected]>
Signed-off-by: Inki Dae <[email protected]>

s390/ftrace: save traced function caller

A typical backtrace acquired from an ftraced function currently looks
like the following (e.g. for "path_openat"):

arch_stack_walk+0x15c/0x2d8
stack_trace_save+0x50/0x68
stack_trace_call+0x15a/0x3b8
ftrace_graph_caller+0x0/0x1c
0x3e0007e3c98 <- ftraced function caller (should be do_filp_open+0x7c/0xe8)
do_open_execat+0x70/0x1b8
__do_execve_file.isra.0+0x7d8/0x860
__s390x_sys_execve+0x56/0x68
system_call+0xdc/0x2d8

Note the random "0x3e0007e3c98" stack value reported as the ftraced
function caller. This value causes either an imprecise unwinder result
or an unwinding failure. "0x3e0007e3c98" comes from the r14 slot of the
ftraced function's stack frame, which the function hasn't had a chance
to initialize, since its very first instruction calls into ftrace code
("ftrace_caller"). (The ftraced function might also never save r14 at
all.) Nevertheless, according to the s390 ABI, any function is called
with a stack frame allocated for it and with r14 containing the return
address. "ftrace_caller" itself is called with "brasl %r0,ftrace_caller",
so r14 still holds the traced function's return address at that point.
So, to fix this issue, simply always save the traced function's caller
onto the ftraced function's stack frame.

Reported-by: Sven Schnelle <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/unwind: stop gracefully at user mode pt_regs in irq stack

Consider reaching user mode pt_regs at the bottom of the irq stack a
graceful unwinder termination. This is the case when an irq/mcck/ext
interrupt arrives while in user mode.

Signed-off-by: Vasily Gorbik <[email protected]>

s390/purgatory: do not build purgatory with kcov, kasan and friends

The purgatory must not rely on functions from the "old" kernel,
so we must disable kasan and friends. We also need to have a
separate copy of string.c as the default does not build memcmp
with KASAN.

Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Reviewed-by: Vasily Gorbik <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/purgatory: Make sure we fail the build if purgatory has missing symbols

Since we link purgatory with -r, i.e. we enable "incremental linking",
no checks for unresolved symbols are done while linking the purgatory.

This commit adds an extra check for unresolved symbols by calling ld
without -r before running objcopy to generate purgatory.ro.

This will help us catch missing symbols in the purgatory sooner.

Note this commit also removes --no-undefined from LDFLAGS_purgatory
as that has no effect.

Signed-off-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Tested-by: Philipp Rudo <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/ftrace: fix endless recursion in function_graph tracer

The following sequence triggers a kernel stack overflow on s390x:

mount -t tracefs tracefs /sys/kernel/tracing
cd /sys/kernel/tracing
echo function_graph > current_tracer
[crash]

This is because preempt_count_{add,sub} are in the list of traced
functions, which can be demonstrated by:

echo preempt_count_add >set_ftrace_filter
echo function_graph > current_tracer
[crash]

The stack overflow happens because get_tod_clock_monotonic() gets called
by ftrace but itself calls preempt_{disable,enable}(), which leads to an
endless recursion. Fix this by using preempt_{disable,enable}_notrace().

Fixes: 011620688a71 ("s390/time: ensure get_clock_monotonic() returns monotonic values")
Signed-off-by: Sven Schnelle <[email protected]>
Reviewed-by: Vasily Gorbik <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

Merge branch 'stmmac-fixes'

Jose Abreu says:

====================
net: stmmac: Fixes for -net

Fixes for stmmac.

1) Fixes the filtering selftests (again) for cases when the number of multicast
filters is not enough.

2) Fixes SPH feature for MTU > default.

3) Fixes the behavior of accepting invalid MTU values.

4) Fixes FCS stripping for multi-descriptor packets.

5) Fixes the change of RX buffer size in XGMAC.

6) Fixes RX buffer size alignment.

7) Fixes the 16KB buffer alignment.

8) Fixes the enabling of 16KB buffer size feature.

9) Always arm the TX coalesce timer so that missed interrupts do not cause
a TX queue timeout.
====================

Signed-off-by: David S. Miller <[email protected]>

net: stmmac: Always arm TX Timer at end of transmission start

If the TX coalesce timer is enabled, we should always arm it; otherwise we
may hit the case where an interrupt is missed and the TX queue will
time out.

Arming the timer does not necessarily mean it will run tx_clean(),
because this function is wrapped around the NAPI launcher.

Fixes: 9125cdd1be11 ("stmmac: add the initial tx coalesce schema")
Signed-off-by: Jose Abreu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: Enable 16KB buffer size

XGMAC supports a maximum MTU of up to 16KB. Let's add this check to
the calculation of the RX buffer size.
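A sketch of tiered RX buffer size selection, assuming hypothetical
demo_* constants (including a 1536-byte default) rather than the
driver's own; the first branch is the kind of 16KB tier this patch
describes. Under this model, without it a 9000-byte MTU would only get
an 8KB buffer.

```c
/* Hypothetical size tiers; the default value is an assumption. */
#define DEMO_DEFAULT_BUFSIZE 1536
#define DEMO_BUF_SIZE_2KiB   2048
#define DEMO_BUF_SIZE_4KiB   4096
#define DEMO_BUF_SIZE_8KiB   8192
#define DEMO_BUF_SIZE_16KiB  16384

/* Pick the smallest tier that can hold a frame of the given MTU. */
static int demo_rx_bufsize(int mtu)
{
	if (mtu >= DEMO_BUF_SIZE_8KiB)      /* 16KB tier for jumbo MTUs */
		return DEMO_BUF_SIZE_16KiB;
	if (mtu >= DEMO_BUF_SIZE_4KiB)
		return DEMO_BUF_SIZE_8KiB;
	if (mtu >= DEMO_BUF_SIZE_2KiB)
		return DEMO_BUF_SIZE_4KiB;
	if (mtu > DEMO_DEFAULT_BUFSIZE)
		return DEMO_BUF_SIZE_2KiB;
	return DEMO_DEFAULT_BUFSIZE;
}
```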

Fixes: 7ac6653a085b ("stmmac: Move the STMicroelectronics driver")
Signed-off-by: Jose Abreu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: 16KB buffer must be 16 byte aligned

The 16KB RX Buffer must also be 16 byte aligned. Fix it.

Fixes: 7ac6653a085b ("stmmac: Move the STMicroelectronics driver")
Signed-off-by: Jose Abreu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: RX buffer size must be 16 byte aligned

We need to align the RX buffer size to at least 16 bytes so that the IP
doesn't misbehave. This is required by HW.

Changes from v2:
- Align UP and not DOWN (David)
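A minimal sketch of rounding up versus rounding down to a 16-byte
boundary, assuming a power-of-two alignment; demo_align_up() and
demo_align_down() are illustrative helpers, not driver functions.
Rounding down, as the earlier revision did, can leave the buffer
smaller than requested.

```c
#include <stddef.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static size_t demo_align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Round x down to a multiple of a; may shrink the buffer. */
static size_t demo_align_down(size_t x, size_t a)
{
	return x & ~(a - 1);
}
```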

Fixes: 7ac6653a085b ("stmmac: Move the STMicroelectronics driver")
Signed-off-by: Jose Abreu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: xgmac: Clear previous RX buffer size

When switching between buffer sizes we need to clear the previous value.
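A sketch of the read-modify-write pattern behind this fix, assuming a
hypothetical register layout with the RX buffer size in bits [15:1];
the real XGMAC field position and width may differ. ORing a new size on
top of the old one leaves stale bits set, whereas clearing the field
first does not.

```c
#include <stdint.h>

/* Hypothetical field: RX buffer size in bits [15:1]. */
#define DEMO_RBSZ_SHIFT 1
#define DEMO_RBSZ_MASK  (0x7fffu << DEMO_RBSZ_SHIFT)

/* Buggy: ORs the new size on top of whatever was programmed before. */
static uint32_t demo_set_rbsz_buggy(uint32_t reg, uint32_t bfsize)
{
	return reg | ((bfsize << DEMO_RBSZ_SHIFT) & DEMO_RBSZ_MASK);
}

/* Fixed: clear the field first, keep unrelated bits, set new size. */
static uint32_t demo_set_rbsz(uint32_t reg, uint32_t bfsize)
{
	reg &= ~DEMO_RBSZ_MASK;
	reg |= (bfsize << DEMO_RBSZ_SHIFT) & DEMO_RBSZ_MASK;
	return reg;
}

static uint32_t demo_get_rbsz(uint32_t reg)
{
	return (reg & DEMO_RBSZ_MASK) >> DEMO_RBSZ_SHIFT;
}
```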

Fixes: d6ddfacd95c7 ("net: stmmac: Add DMA related callbacks for XGMAC2")
Signed-off-by: Jose Abreu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: Only the last buffer has the FCS field

Only the last received buffer contains the FCS field. Check for end of
packet before trying to strip the FCS field.
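A minimal sketch, assuming a hypothetical demo_rx_desc that records the
bytes written to each buffer and an end-of-packet flag: the 4-byte FCS
is subtracted once, when the last descriptor of the frame is reached,
instead of from every buffer as in the contrasting variant.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ETH_FCS_LEN 4   /* Ethernet FCS (CRC-32) length in bytes */

/* Hypothetical per-buffer view of one received frame. */
struct demo_rx_desc {
	size_t len;   /* bytes written into this buffer */
	bool last;    /* end of packet: this buffer completes the frame */
};

/* Contrasting variant: strips the FCS from every buffer. */
static size_t demo_frame_len_buggy(const struct demo_rx_desc *d, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += d[i].len - DEMO_ETH_FCS_LEN;
	return total;
}

/* Strip the FCS only once the end of the packet has been seen. */
static size_t demo_frame_len(const struct demo_rx_desc *d, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		total += d[i].len;
		if (d[i].last) {
			if (total >= DEMO_ETH_FCS_LEN)
				total -= DEMO_ETH_FCS_LEN;
			break;
		}
	}
	return total;
}
```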

Fixes: 88ebe2cf7f3f ("net: stmmac: Rework stmmac_rx()")
Signed-off-by: Jose Abreu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: Do not accept invalid MTU values

The maximum MTU value is determined by the maximum size of TX FIFO so
that a full packet can fit in the FIFO. Add a check for this in the MTU
change callback.

Also check that the provided and rounded MTU does not exceed the maximum
limit of 16K.

Changes from v2:
- Align MTU before checking if it's valid
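A sketch of the validation order described above, assuming 16-byte MTU
alignment and hypothetical demo_* names; how the real driver derives the
usable TX FIFO size is not modelled. The MTU is aligned first and only
then compared against both limits.

```c
#include <errno.h>

#define DEMO_MAX_MTU_16K 16384   /* assumed 16K upper bound */

/* Round up to a 16-byte multiple (alignment is an assumption). */
static int demo_align16(int x)
{
	return (x + 15) & ~15;
}

/*
 * Hypothetical MTU check: the aligned MTU must fit in the TX FIFO so a
 * full packet can be buffered, and must not exceed 16K. Returns the
 * aligned MTU on success or -EINVAL.
 */
static int demo_change_mtu(int new_mtu, int tx_fifo_size)
{
	new_mtu = demo_align16(new_mtu);   /* align before validating */
	if (new_mtu > tx_fifo_size || new_mtu > DEMO_MAX_MTU_16K)
		return -EINVAL;
	return new_mtu;
}
```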

Fixes: 7ac6653a085b ("stmmac: Move the STMicroelectronics driver")
Signed-off-by: Jose Abreu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>