Git Repo - linux.git/log

drm/amd/powerplay: add new sensor type for VCN powergate status

VCN is widely used in new ASICs and different from tranditional
UVD and VCE.

Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix a potential information leaking bug

Coccinelle reports a path that the array "data" is never initialized.
The path skips the checks in the conditional branches when either
of callback functions, read_wave_vgprs and read_wave_sgprs, is not
registered. Later, the uninitialized "data" array is read
in the while-loop below and passed to put_user().

Fix the path by allocating the array with kcalloc().

The patch is simplier than adding a fall-back branch that explicitly
calls memset(data, 0, ...). Also it does not need the multiplication
1024*sizeof(*data) as the size parameter for memset() though there is
no risk of integer overflow.

Signed-off-by: Wang Xiayang <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Reviewed-by: Christian König <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: fix error handling in amdgpu_cs_process_fence_dep

We always need to drop the ctx reference and should check
for errors first and then dereference the fence pointer.

Signed-off-by: Christian König <[email protected]>
Reviewed-by: Chunming Zhou <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

xen: avoid link error on ARM

Building the privcmd code as a loadable module on ARM, we get
a link error due to the private cache management functions:

ERROR: "__sync_icache_dcache" [drivers/xen/xen-privcmd.ko] undefined!

Move the code into a new that is always built in when Xen is enabled,
as suggested by Juergen Gross and Boris Ostrovsky.

Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

xen/gntdev.c: Replace vm_map_pages() with vm_map_pages_zero()

'commit df9bde015a72 ("xen/gntdev.c: convert to use vm_map_pages()")'
breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages()
will:
- use map->pages starting at vma->vm_pgoff instead of 0
- verify map->count against vma_pages()+vma->vm_pgoff instead of just
   vma_pages().

In practice, this breaks using a single gntdev FD for mapping multiple
grants.

relevant strace output:
[pid   857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid   857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0) =
0x777f1211b000
[pid   857] ioctl(7, IOCTL_GNTDEV_SET_UNMAP_NOTIFY, 0x7ffd3407b710) = 0
[pid   857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid   857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7,
0x1000) = -1 ENXIO (No such device or address)

details here:
https://github.com/QubesOS/qubes-issues/issues/5199

The reason is -> ( copying Marek's word from discussion)

vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's
basically using this parameter for "which grant reference to map".
map struct returned by gntdev_find_map_index() describes just the pages
to be mapped. Specifically map->pages[0] should be mapped at
vma->vm_start, not vma->vm_start+vma->vm_pgoff*PAGE_SIZE.

When trying to map grant with index (aka vma->vm_pgoff) > 1,
__vm_map_pages() will refuse to map it because it will expect map->count
to be at least vma_pages(vma)+vma->vm_pgoff, while it is exactly
vma_pages(vma).

Converting vm_map_pages() to use vm_map_pages_zero() will fix the
problem.

Marek has tested and confirmed the same.

Cc: [email protected] # v5.2+
Fixes: df9bde015a72 ("xen/gntdev.c: convert to use vm_map_pages()")
Reported-by: Marek Marczykowski-Górecki <[email protected]>
Signed-off-by: Souptick Joarder <[email protected]>
Tested-by: Marek Marczykowski-Górecki <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>

drm/amd/powerplay: enable SW SMU reset functionality

Move SMU irq handler register to sw_init as that's totally
software related. Otherwise, it will prevent SMU reset working.

Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/powerplay: fix null pointer dereference around dpm state relates

DPM state relates are not supported on the new SW SMU ASICs. But still
it's not OK to trigger null pointer dereference on accessing them.

Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/powerplay: use proper revision id for navi

The PCI revision id determines the sku.

Reviewed-by: Feifei Xu <[email protected]>
Reviewed-by: Kevin Wang <[email protected]>
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/powerplay: fix temperature granularity error in smu11

in this patch,
drm/amd/powerplay: add callback function of get_thermal_temperature_range
the driver missed temperature granularity change on other temperature.

Signed-off-by: Kevin Wang <[email protected]>
Reviewed-by: Evan Quan <[email protected]>
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/powerplay: add callback function of get_thermal_temperature_range

1. the thermal temperature is asic related data, move the code logic to
xxx_ppt.c.
2. replace data structure PP_TemperatureRange with
smu_temperature_range.
3. change temperature uint from temp*1000 to temp (temperature uint).

Signed-off-by: Kevin Wang <[email protected]>
Signed-off-by: Kenneth Feng <[email protected]>
Acked-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdkfd: Fix byte align on VegaM

This was missed during the addition of VegaM support

Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Kent Russell <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

fgraph: Remove redundant ftrace_graph_notrace_addr() test

We already have tested it before. The second one should be removed.
With this change, the performance should have little improvement.

Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 9cd2992f2d6c ("fgraph: Have set_graph_notrace only affect function_graph tracer")
Signed-off-by: Changbin Du <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

tracing: Fix header include guards in trace event headers

These include guards are broken.

Match the #if !define() and #define lines so that they work correctly.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: f54d1867005c3 ("dma-buf: Rename struct fence to dma_fence")
Fixes: 2e26ca7150a4f ("tracing: Fix tracepoint.h DECLARE_TRACE() to allow more than one header")
Fixes: e543002f77f46 ("qdisc: add tracepoint qdisc:qdisc_dequeue for dequeued SKBs")
Fixes: 95f295f9fe081 ("dmaengine: tegra: add tracepoints to driver")
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

Merge branch 'dax-fix-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull dax fix from Dan Williams:
"Fix a botched manual patch update that got dropped between testing and
application"

* 'dax-fix-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: Fix missed wakeup in put_unlocked_entry()

dm table: fix various whitespace issues with recent DAX code

Also, rename device_synchronous to device_dax_synchronous.

Signed-off-by: Mike Snitzer <[email protected]>

dm table: fix dax_dev NULL dereference in device_synchronous()

If a device doesn't support DAX its 'dax_dev' is NULL. Fix
device_synchronous() to first check if dax_dev is NULL before
dereferencing it.

Fixes: 2e9ee0955d3c ("dm: enable synchronous dax")
Reported-by: [email protected]
Signed-off-by: Pankaj Gupta <[email protected]>
Acked-by: Dan Williams <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>

arm64: compat: vdso: Use legacy syscalls as fallback

The generic VDSO implementation uses the Y2038 safe clock_gettime64() and
clock_getres_time64() syscalls as fallback for 32bit VDSO. This breaks
seccomp setups because these syscalls might be not (yet) allowed.

Implement the 32bit variants which use the legacy syscalls and select the
variant in the core library.

The 64bit time variants are not removed because they are required for the
time64 based vdso accessors.

Fixes: 00b26474c2f1 ("lib/vdso: Provide generic VDSO implementation")
Reported-by: Sean Christopherson <[email protected]>
Reported-by: Paul Bolle <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

x86/vdso/32: Use 32bit syscall fallback

The generic VDSO implementation uses the Y2038 safe clock_gettime64() and
clock_getres_time64() syscalls as fallback for 32bit VDSO. This breaks
seccomp setups because these syscalls might be not (yet) allowed.

Implement the 32bit variants which use the legacy syscalls and select the
variant in the core library.

The 64bit time variants are not removed because they are required for the
time64 based vdso accessors.

Fixes: 7ac870747988 ("x86/vdso: Switch to generic vDSO implementation")
Reported-by: Sean Christopherson <[email protected]>
Reported-by: Paul Bolle <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

lib/vdso/32: Provide legacy syscall fallbacks

To address the regression which causes seccomp to deny applications the
access to clock_gettime64() and clock_getres64() syscalls because they
are not enabled in the existing filters.

That trips over the fact that 32bit VDSOs use the new clock_gettime64() and
clock_getres64() syscalls in the fallback path.

Add a conditional to invoke the 32bit legacy fallback syscalls instead of
the new 64bit variants. The conditional can go away once all architectures
are converted.

Fixes: 00b26474c2f1 ("lib/vdso: Provide generic VDSO implementation")
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Sean Christopherson <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

lib/vdso: Move fallback invocation to the callers

To allow syscall fallbacks using the legacy 32bit syscall for 32bit VDSO
builds, move the fallback invocation out into the callers.

Split the common code out of __cvdso_clock_gettime/getres() and invoke the
syscall fallback in the 64bit and 32bit variants.

Preparatory work for using legacy syscalls in 32bit VDSO. No functional
change.

Fixes: 00b26474c2f1 ("lib/vdso: Provide generic VDSO implementation")
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

lib/vdso/32: Remove inconsistent NULL pointer checks

The 32bit variants of vdso_clock_gettime()/getres() have a NULL pointer
check for the timespec pointer. That's inconsistent vs. 64bit.

But the vdso implementation will never be consistent versus the syscall
because the only case which it can handle is NULL. Any other invalid
pointer will cause a segfault. So special casing NULL is not really useful.

Remove it along with the superflouos syscall fallback invocation as that
will return -EFAULT anyway. That also gets rid of the dubious typecast
which only works because the pointer is NULL.

Fixes: 00b26474c2f1 ("lib/vdso: Provide generic VDSO implementation")
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

Merge tag 'for-linus-20190730' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull pidfd fixes from Christian Brauner:
"This makes setting the exit_state in exit_notify() consistent after
  fixing the pidfd polling race pre-rc1. Related to the race fix, this
  adds a WARN_ON() to do_notify_pidfd() to catch any future exit_state
  races.

  Last, this removes an obsolete comment from the pidfd tests"

* tag 'for-linus-20190730' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  exit: make setting exit_state consistent
  pidfd: Add warning if exit_state is 0 during notification
  pidfd: remove obsolete comments from test

Merge tag 'f2fs-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs fixes from Jaegeuk Kim:
"This set of patches adjust to follow recent setflags changes and fix
  two regressions"

* tag 'f2fs-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: use EINVAL for superblock with invalid magic
  f2fs: fix to read source block before invalidating it
  f2fs: remove redundant check from f2fs_setflags_common()
  f2fs: use generic checking function for FS_IOC_FSSETXATTR
  f2fs: use generic checking and prep function for FS_IOC_SETFLAGS

Merge tag 'linux-kselftest-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
"Minor fixes to tests and one major fix to livepatch test to add skip
  handling to avoid false fail reports when livepatch is disabled"

* tag 'linux-kselftest-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/livepatch: add test skip handling
  selftests: mlxsw: Fix typo in qos_mc_aware.sh
  selftests/x86: fix spelling mistake "FAILT" -> "FAIL"
  selftests: kmod: Fix typo in kmod.sh

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"A few regression and bug fixes for the patches merged in the last
  cycle:

   - hns fixes a subtle crash from the ib core SGL rework

   - hfi1 fixes various error handling, oops and protocol errors

   - bnxt_re fixes a regression where nvmeof doesn't work on some
     configurations

   - mlx5 fixes a serious 'use after free' bug in how MR caching is
     handled

   - some edge case crashers in the new statistic core code

   - more siw static checker fixups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification
  IB/counters: Always initialize the port counter object
  IB/core: Fix querying total rdma stats
  IB/mlx5: Prevent concurrent MR updates during invalidation
  IB/mlx5: Fix clean_mr() to work in the expected order
  IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache
  IB/mlx5: Use direct mkey destroy command upon UMR unreg failure
  IB/mlx5: Fix unreg_umr to ignore the mkey state
  RDMA/siw: Remove set but not used variables 'rv'
  IB/mlx5: Replace kfree with kvfree
  RDMA/bnxt_re: Honor vlan_id in GID entry comparison
  IB/hfi1: Drop all TID RDMA READ RESP packets after r_next_psn
  IB/hfi1: Field not zero-ed when allocating TID flow memory
  IB/hfi1: Unreserve a flushed OPFN request
  IB/hfi1: Check for error on call to alloc_rsm_map_table
  RDMA/hns: Fix sg offset non-zero issue
  RDMA/siw: Fix error return code in siw_init_module()

Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull HMM fixes from Jason Gunthorpe:
"Fix the locking around nouveau's use of the hmm_range_* APIs. It works
  correctly in the success case, but many of the the edge cases have
  missing unlocks or double unlocks.

  The diffstat is a bit big as Christoph did a comprehensive job to move
  the obsolete API from the core header and into the driver before
  fixing its flow, but the risk of regression from this code motion is
  low"

* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  nouveau: unlock mmap_sem on all errors from nouveau_range_fault
  nouveau: remove the block parameter to nouveau_range_fault
  mm/hmm: move hmm_vma_range_done and hmm_vma_fault to nouveau
  mm/hmm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}

loop: Fix mount(2) failure due to race with LOOP_SET_FD

Commit 33ec3e53e7b1 ("loop: Don't change loop device under exclusive
opener") made LOOP_SET_FD ioctl acquire exclusive block device reference
while it updates loop device binding. However this can make perfectly
valid mount(2) fail with EBUSY due to racing LOOP_SET_FD holding
temporarily the exclusive bdev reference in cases like this:

for i in {a..z}{a..z}; do
        dd if=/dev/zero of=$i.image bs=1k count=0 seek=1024
        mkfs.ext2 $i.image
        mkdir mnt$i
done

echo "Run"
for i in {a..z}{a..z}; do
        mount -o loop -t ext2 $i.image mnt$i &
done

Fix the problem by not getting full exclusive bdev reference in
LOOP_SET_FD but instead just mark the bdev as being claimed while we
update the binding information. This just blocks new exclusive openers
instead of failing them with EBUSY thus fixing the problem.

Fixes: 33ec3e53e7b1 ("loop: Don't change loop device under exclusive opener")
Cc: [email protected]
Tested-by: Kai-Heng Feng <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

xfs: Fix possible null-pointer dereferences in xchk_da_btree_block_check_sibling()

In xchk_da_btree_block_check_sibling(), there is an if statement on
line 274 to check whether ds->state->altpath.blk[level].bp is NULL:
    if (ds->state->altpath.blk[level].bp)

When ds->state->altpath.blk[level].bp is NULL, it is used on line 281:
    xfs_trans_brelse(..., ds->state->altpath.blk[level].bp);
        struct xfs_buf_log_item *bip = bp->b_log_item;
        ASSERT(bp->b_transp == tp);

Thus, possible null-pointer dereferences may occur.

To fix these bugs, ds->state->altpath.blk[level].bp is checked before
being used.

These bugs are found by a static analysis tool STCheck written by us.

Signed-off-by: Jia-Ju Bai <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

exit: make setting exit_state consistent

Since commit b191d6491be6 ("pidfd: fix a poll race when setting exit_state")
we unconditionally set exit_state to EXIT_ZOMBIE before calling into
do_notify_parent(). This was done to eliminate a race when querying
exit_state in do_notify_pidfd().
Back then we decided to do the absolute minimal thing to fix this and
not touch the rest of the exit_notify() function where exit_state is
set.
Since this fix has not caused any issues change the setting of
exit_state to EXIT_DEAD in the autoreap case to account for the fact hat
exit_state is set to EXIT_ZOMBIE unconditionally. This fix was planned
but also explicitly requested in [1] and makes the whole code more
consistent.

/* References */
[1]: https://lore.kernel.org/lkml/CAHk-=wigcxGFR2szue4wavJtH5cYTTeNES=toUBVGsmX0rzX+g@mail.gmail.com

Signed-off-by: Christian Brauner <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Cc: Linus Torvalds <[email protected]>

scsi: qla2xxx: Fix possible fcport null-pointer dereferences

In qla2x00_alloc_fcport(), fcport is assigned to NULL in the error
handling code on line 4880:
fcport = NULL;

Then fcport is used on lines 4883-4886:
INIT_WORK(&fcport->del_work, qla24xx_delete_sess_fn);
INIT_WORK(&fcport->reg_work, qla_register_fcport_fn);
INIT_LIST_HEAD(&fcport->gnl_entry);
INIT_LIST_HEAD(&fcport->list);

Thus, possible null-pointer dereferences may occur.

To fix these bugs, qla2x00_alloc_fcport() directly returns NULL
in the error handling code.

These bugs are found by a static analysis tool STCheck written by us.

Signed-off-by: Jia-Ju Bai <[email protected]>
Acked-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA

Although SAS3 & SAS3.5 IT HBA controllers support 64-bit DMA addressing, as
per hardware design, if DMA-able range contains all 64-bits
set (0xFFFFFFFF-FFFFFFFF) then it results in a firmware fault.

E.g. SGE's start address is 0xFFFFFFFF-FFFF000 and data length is 0x1000
bytes. when HBA tries to DMA the data at 0xFFFFFFFF-FFFFFFFF location then
HBA will fault the firmware.

Driver will set 63-bit DMA mask to ensure the above address will not be
used.

Cc: <[email protected]> # 5.1.20+
Signed-off-by: Suganath Prabu <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: hpsa: remove printing internal cdb on tag collision

Remove racy printing of internal commands. Completion thread can be
cleaning up the command in parallel.

Reviewed-by: Bader Ali - Saleh <[email protected]>
Reviewed-by: Scott Teel <[email protected]>
Reviewed-by: Scott Benesh <[email protected]>
Reviewed-by: Kevin Barnett <[email protected]>
Signed-off-by: Don Brace <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: hpsa: correct scsi command status issue after reset

Reviewed-by: Bader Ali - Saleh <[email protected]>
Reviewed-by: Scott Teel <[email protected]>
Reviewed-by: Scott Benesh <[email protected]>
Reviewed-by: Kevin Barnett <[email protected]>
Signed-off-by: Don Brace <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

Btrfs: fix deadlock between fiemap and transaction commits

The fiemap handler locks a file range that can have unflushed delalloc,
and after locking the range, it tries to attach to a running transaction.
If the running transaction started its commit, that is, it is in state
TRANS_STATE_COMMIT_START, and either the filesystem was mounted with the
flushoncommit option or the transaction is creating a snapshot for the
subvolume that contains the file that fiemap is operating on, we end up
deadlocking. This happens because fiemap is blocked on the transaction,
waiting for it to complete, and the transaction is waiting for the flushed
dealloc to complete, which requires locking the file range that the fiemap
task already locked. The following stack traces serve as an example of
when this deadlock happens:

  (...)
  [404571.515510] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
  [404571.515956] Call Trace:
  [404571.516360]  ? __schedule+0x3ae/0x7b0
  [404571.516730]  schedule+0x3a/0xb0
  [404571.517104]  lock_extent_bits+0x1ec/0x2a0 [btrfs]
  [404571.517465]  ? remove_wait_queue+0x60/0x60
  [404571.517832]  btrfs_finish_ordered_io+0x292/0x800 [btrfs]
  [404571.518202]  normal_work_helper+0xea/0x530 [btrfs]
  [404571.518566]  process_one_work+0x21e/0x5c0
  [404571.518990]  worker_thread+0x4f/0x3b0
  [404571.519413]  ? process_one_work+0x5c0/0x5c0
  [404571.519829]  kthread+0x103/0x140
  [404571.520191]  ? kthread_create_worker_on_cpu+0x70/0x70
  [404571.520565]  ret_from_fork+0x3a/0x50
  [404571.520915] kworker/u8:6    D    0 31651      2 0x80004000
  [404571.521290] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs]
  (...)
  [404571.537000] fsstress        D    0 13117  13115 0x00004000
  [404571.537263] Call Trace:
  [404571.537524]  ? __schedule+0x3ae/0x7b0
  [404571.537788]  schedule+0x3a/0xb0
  [404571.538066]  wait_current_trans+0xc8/0x100 [btrfs]
  [404571.538349]  ? remove_wait_queue+0x60/0x60
  [404571.538680]  start_transaction+0x33c/0x500 [btrfs]
  [404571.539076]  btrfs_check_shared+0xa3/0x1f0 [btrfs]
  [404571.539513]  ? extent_fiemap+0x2ce/0x650 [btrfs]
  [404571.539866]  extent_fiemap+0x2ce/0x650 [btrfs]
  [404571.540170]  do_vfs_ioctl+0x526/0x6f0
  [404571.540436]  ksys_ioctl+0x70/0x80
  [404571.540734]  __x64_sys_ioctl+0x16/0x20
  [404571.540997]  do_syscall_64+0x60/0x1d0
  [404571.541279]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  (...)
  [404571.543729] btrfs           D    0 14210  14208 0x00004000
  [404571.544023] Call Trace:
  [404571.544275]  ? __schedule+0x3ae/0x7b0
  [404571.544526]  ? wait_for_completion+0x112/0x1a0
  [404571.544795]  schedule+0x3a/0xb0
  [404571.545064]  schedule_timeout+0x1ff/0x390
  [404571.545351]  ? lock_acquire+0xa6/0x190
  [404571.545638]  ? wait_for_completion+0x49/0x1a0
  [404571.545890]  ? wait_for_completion+0x112/0x1a0
  [404571.546228]  wait_for_completion+0x131/0x1a0
  [404571.546503]  ? wake_up_q+0x70/0x70
  [404571.546775]  btrfs_wait_ordered_extents+0x27c/0x400 [btrfs]
  [404571.547159]  btrfs_commit_transaction+0x3b0/0xae0 [btrfs]
  [404571.547449]  ? btrfs_mksubvol+0x4a4/0x640 [btrfs]
  [404571.547703]  ? remove_wait_queue+0x60/0x60
  [404571.547969]  btrfs_mksubvol+0x605/0x640 [btrfs]
  [404571.548226]  ? __sb_start_write+0xd4/0x1c0
  [404571.548512]  ? mnt_want_write_file+0x24/0x50
  [404571.548789]  btrfs_ioctl_snap_create_transid+0x169/0x1a0 [btrfs]
  [404571.549048]  btrfs_ioctl_snap_create_v2+0x11d/0x170 [btrfs]
  [404571.549307]  btrfs_ioctl+0x133f/0x3150 [btrfs]
  [404571.549549]  ? mem_cgroup_charge_statistics+0x4c/0xd0
  [404571.549792]  ? mem_cgroup_commit_charge+0x84/0x4b0
  [404571.550064]  ? __handle_mm_fault+0xe3e/0x11f0
  [404571.550306]  ? do_raw_spin_unlock+0x49/0xc0
  [404571.550608]  ? _raw_spin_unlock+0x24/0x30
  [404571.550976]  ? __handle_mm_fault+0xedf/0x11f0
  [404571.551319]  ? do_vfs_ioctl+0xa2/0x6f0
  [404571.551659]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
  [404571.552087]  do_vfs_ioctl+0xa2/0x6f0
  [404571.552355]  ksys_ioctl+0x70/0x80
  [404571.552621]  __x64_sys_ioctl+0x16/0x20
  [404571.552864]  do_syscall_64+0x60/0x1d0
  [404571.553104]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  (...)

If we were joining the transaction instead of attaching to it, we would
not risk a deadlock because a join only blocks if the transaction is in a
state greater then or equals to TRANS_STATE_COMMIT_DOING, and the delalloc
flush performed by a transaction is done before it reaches that state,
when it is in the state TRANS_STATE_COMMIT_START. However a transaction
join is intended for use cases where we do modify the filesystem, and
fiemap only needs to peek at delayed references from the current
transaction in order to determine if extents are shared, and, besides
that, when there is no current transaction or when it blocks to wait for
a current committing transaction to complete, it creates a new transaction
without reserving any space. Such unnecessary transactions, besides doing
unnecessary IO, can cause transaction aborts (-ENOSPC) and unnecessary
rotation of the precious backup roots.

So fix this by adding a new transaction join variant, named join_nostart,
which behaves like the regular join, but it does not create a transaction
when none currently exists or after waiting for a committing transaction
to complete.

Fixes: 03628cdbc64db6 ("Btrfs: do not start a transaction during fiemap")
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>

Btrfs: fix race leading to fs corruption after transaction abort

When one transaction is finishing its commit, it is possible for another
transaction to start and enter its initial commit phase as well. If the
first ends up getting aborted, we have a small time window where the second
transaction commit does not notice that the previous transaction aborted
and ends up committing, writing a superblock that points to btrees that
reference extent buffers (nodes and leafs) that were not persisted to disk.
The consequence is that after mounting the filesystem again, we will be
unable to load some btree nodes/leafs, either because the content on disk
is either garbage (or just zeroes) or corresponds to the old content of a
previouly COWed or deleted node/leaf, resulting in the well known error
messages "parent transid verify failed on ...".
The following sequence diagram illustrates how this can happen.

        CPU 1                                           CPU 2

<at transaction N>

btrfs_commit_transaction()
   (...)
   --> sets transaction state to
       TRANS_STATE_UNBLOCKED
   --> sets fs_info->running_transaction
       to NULL

                                                    (...)
                                                    btrfs_start_transaction()
                                                      start_transaction()
                                                        wait_current_trans()
                                                          --> returns immediately
                                                              because
                                                              fs_info->running_transaction
                                                              is NULL
                                                        join_transaction()
                                                          --> creates transaction N + 1
                                                          --> sets
                                                              fs_info->running_transaction
                                                              to transaction N + 1
                                                          --> adds transaction N + 1 to
                                                              the fs_info->trans_list list
                                                        --> returns transaction handle
                                                            pointing to the new
                                                            transaction N + 1
                                                    (...)

                                                    btrfs_sync_file()
                                                      btrfs_start_transaction()
                                                        --> returns handle to
                                                            transaction N + 1
                                                      (...)

   btrfs_write_and_wait_transaction()
     --> writeback of some extent
         buffer fails, returns an
error
   btrfs_handle_fs_error()
     --> sets BTRFS_FS_STATE_ERROR in
         fs_info->fs_state
   --> jumps to label "scrub_continue"
   cleanup_transaction()
     btrfs_abort_transaction(N)
       --> sets BTRFS_FS_STATE_TRANS_ABORTED
           flag in fs_info->fs_state
       --> sets aborted field in the
           transaction and transaction
   handle structures, for
           transaction N only
     --> removes transaction from the
         list fs_info->trans_list
                                                      btrfs_commit_transaction(N + 1)
                                                        --> transaction N + 1 was not
    aborted, so it proceeds
                                                        (...)
                                                        --> sets the transaction's state
                                                            to TRANS_STATE_COMMIT_START
                                                        --> does not find the previous
                                                            transaction (N) in the
                                                            fs_info->trans_list, so it
                                                            doesn't know that transaction
                                                            was aborted, and the commit
                                                            of transaction N + 1 proceeds
                                                        (...)
                                                        --> sets transaction N + 1 state
                                                            to TRANS_STATE_UNBLOCKED
                                                        btrfs_write_and_wait_transaction()
                                                          --> succeeds writing all extent
                                                              buffers created in the
                                                              transaction N + 1
                                                        write_all_supers()
                                                           --> succeeds
                                                           --> we now have a superblock on
                                                               disk that points to trees
                                                               that refer to at least one
                                                               extent buffer that was
                                                               never persisted

So fix this by updating the transaction commit path to check if the flag
BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting
the transaction to the TRANS_STATE_COMMIT_START we do not find any previous
transaction in the fs_info->trans_list. If the flag is set, just fail the
transaction commit with -EROFS, as we do in other places. The exact error
code for the previous transaction abort was already logged and reported.

Fixes: 49b25e0540904b ("btrfs: enhance transaction abort infrastructure")
CC: [email protected] # 4.4+
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

Btrfs: fix incremental send failure after deduplication

When doing an incremental send operation we can fail if we previously did
deduplication operations against a file that exists in both snapshots. In
that case we will fail the send operation with -EIO and print a message
to dmesg/syslog like the following:

  BTRFS error (device sdc): Send: inconsistent snapshot, found updated \
  extent for inode 257 without updated inode item, send root is 258, \
  parent root is 257

This requires that we deduplicate to the same file in both snapshots for
the same amount of times on each snapshot. The issue happens because a
deduplication only updates the iversion of an inode and does not update
any other field of the inode, therefore if we deduplicate the file on
each snapshot for the same amount of time, the inode will have the same
iversion value (stored as the "sequence" field on the inode item) on both
snapshots, therefore it will be seen as unchanged between in the send
snapshot while there are new/updated/deleted extent items when comparing
to the parent snapshot. This makes the send operation return -EIO and
print an error message.

Example reproducer:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  # Create our first file. The first half of the file has several 64Kb
  # extents while the second half as a single 512Kb extent.
  $ xfs_io -f -s -c "pwrite -S 0xb8 -b 64K 0 512K" /mnt/foo
  $ xfs_io -c "pwrite -S 0xb8 512K 512K" /mnt/foo

  # Create the base snapshot and the parent send stream from it.
  $ btrfs subvolume snapshot -r /mnt /mnt/mysnap1
  $ btrfs send -f /tmp/1.snap /mnt/mysnap1

  # Create our second file, that has exactly the same data as the first
  # file.
  $ xfs_io -f -c "pwrite -S 0xb8 0 1M" /mnt/bar

  # Create the second snapshot, used for the incremental send, before
  # doing the file deduplication.
  $ btrfs subvolume snapshot -r /mnt /mnt/mysnap2

  # Now before creating the incremental send stream:
  #
  # 1) Deduplicate into a subrange of file foo in snapshot mysnap1. This
  #    will drop several extent items and add a new one, also updating
  #    the inode's iversion (sequence field in inode item) by 1, but not
  #    any other field of the inode;
  #
  # 2) Deduplicate into a different subrange of file foo in snapshot
  #    mysnap2. This will replace an extent item with a new one, also
  #    updating the inode's iversion by 1 but not any other field of the
  #    inode.
  #
  # After these two deduplication operations, the inode items, for file
  # foo, are identical in both snapshots, but we have different extent
  # items for this inode in both snapshots. We want to check this doesn't
  # cause send to fail with an error or produce an incorrect stream.

  $ xfs_io -r -c "dedupe /mnt/bar 0 0 512K" /mnt/mysnap1/foo
  $ xfs_io -r -c "dedupe /mnt/bar 512K 512K 512K" /mnt/mysnap2/foo

  # Create the incremental send stream.
  $ btrfs send -p /mnt/mysnap1 -f /tmp/2.snap /mnt/mysnap2
  ERROR: send ioctl failed with -5: Input/output error

This issue started happening back in 2015 when deduplication was updated
to not update the inode's ctime and mtime and update only the iversion.
Back then we would hit a BUG_ON() in send, but later in 2016 send was
updated to return -EIO and print the error message instead of doing the
BUG_ON().

A test case for fstests follows soon.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203933
Fixes: 1c919a5e13702c ("btrfs: don't update mtime/ctime on deduped inodes")
CC: [email protected] # 4.4+
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>

kbuild: initialize CLANG_FLAGS correctly in the top Makefile

CLANG_FLAGS is initialized by the following line:

CLANG_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%))

..., which is run only when CROSS_COMPILE is set.

Some build targets (bindeb-pkg etc.) recurse to the top Makefile.

When you build the kernel with Clang but without CROSS_COMPILE,
the same compiler flags such as -no-integrated-as are accumulated
into CLANG_FLAGS.

If you run 'make CC=clang' and then 'make CC=clang bindeb-pkg',
Kbuild will recompile everything needlessly due to the build command
change.

Fix this by correctly initializing CLANG_FLAGS.

Fixes: 238bcbc4e07f ("kbuild: consolidate Clang compiler flags")
Cc: <[email protected]> # v5.0+
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Acked-by: Nick Desaulniers <[email protected]>

powerpc/spe: Mark expected switch fall-throughs

Mark switch cases where we are expecting to fall through.

Fixes errors such as below, seen with mpc85xx_defconfig:

  arch/powerpc/kernel/align.c: In function 'emulate_spe':
  arch/powerpc/kernel/align.c:178:8: error: this statement may fall through
    ret |= __get_user_inatomic(temp.v[3], p++);
        ^~

Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

drm/bridge: tc358764: Fix build error

If CONFIG_DRM_TOSHIBA_TC358764=y but CONFIG_DRM_KMS_HELPER=m,
building fails:

drivers/gpu/drm/bridge/tc358764.o:(.rodata+0x228): undefined reference to `drm_atomic_helper_connector_reset'
drivers/gpu/drm/bridge/tc358764.o:(.rodata+0x240): undefined reference to `drm_helper_probe_single_connector_modes'
drivers/gpu/drm/bridge/tc358764.o:(.rodata+0x268): undefined reference to `drm_atomic_helper_connector_duplicate_state'
drivers/gpu/drm/bridge/tc358764.o:(.rodata+0x270): undefined reference to `drm_atomic_helper_connector_destroy_state'

Like TC358767, select DRM_KMS_HELPER to fix this, and
change to select DRM_PANEL to avoid recursive dependency.

Reported-by: Hulk Robot <[email protected]>
Fixes: f38b7cca6d0e ("drm/bridge: tc358764: Add DSI to LVDS bridge driver")
Signed-off-by: YueHaibing <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
Reviewed-by: Neil Armstrong <[email protected]>
Signed-off-by: Neil Armstrong <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/bridge: lvds-encoder: Fix build error while CONFIG_DRM_KMS_HELPER=m

If DRM_LVDS_ENCODER=y but CONFIG_DRM_KMS_HELPER=m,
build fails:

drivers/gpu/drm/bridge/lvds-encoder.o: In function `lvds_encoder_probe':
lvds-encoder.c:(.text+0x155): undefined reference to `devm_drm_panel_bridge_add'

Reported-by: Hulk Robot <[email protected]>
Fixes: dbb58bfd9ae6 ("drm/bridge: Fix lvds-encoder since the panel_bridge rework.")
Signed-off-by: YueHaibing <[email protected]>
Reviewed-by: Neil Armstrong <[email protected]>
Signed-off-by: Neil Armstrong <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

powerpc/nvdimm: Pick nearby online node if the device node is not online

Currently, nvdimm subsystem expects the device numa node for SCM device to be
an online node. It also doesn't try to bring the device numa node online. Hence
if we use a non-online numa node as device node we hit crashes like below. This
is because we try to access uninitialized NODE_DATA in different code paths.

cpu 0x0: Vector: 300 (Data Access) at [c0000000fac53170]
    pc: c0000000004bbc50: ___slab_alloc+0x120/0xca0
    lr: c0000000004bc834: __slab_alloc+0x64/0xc0
    sp: c0000000fac53400
   msr: 8000000002009033
   dar: 73e8
dsisr: 80000
  current = 0xc0000000fabb6d80
  paca    = 0xc000000003870000   irqmask: 0x03   irq_happened: 0x01
    pid   = 7, comm = kworker/u16:0
Linux version 5.2.0-06234-g76bd729b2644 (kvaneesh@ltc-boston123) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #135 SMP Thu Jul 11 05:36:30 CDT 2019
enter ? for help
[link register   ] c0000000004bc834 __slab_alloc+0x64/0xc0
[c0000000fac53400] c0000000fac53480 (unreliable)
[c0000000fac53500] c0000000004bc818 __slab_alloc+0x48/0xc0
[c0000000fac53560] c0000000004c30a0 __kmalloc_node_track_caller+0x3c0/0x6b0
[c0000000fac535d0] c000000000cfafe4 devm_kmalloc+0x74/0xc0
[c0000000fac53600] c000000000d69434 nd_region_activate+0x144/0x560
[c0000000fac536d0] c000000000d6b19c nd_region_probe+0x17c/0x370
[c0000000fac537b0] c000000000d6349c nvdimm_bus_probe+0x10c/0x230
[c0000000fac53840] c000000000cf3cc4 really_probe+0x254/0x4e0
[c0000000fac538d0] c000000000cf429c driver_probe_device+0x16c/0x1e0
[c0000000fac53950] c000000000cf0b44 bus_for_each_drv+0x94/0x130
[c0000000fac539b0] c000000000cf392c __device_attach+0xdc/0x200
[c0000000fac53a50] c000000000cf231c bus_probe_device+0x4c/0xf0
[c0000000fac53a90] c000000000ced268 device_add+0x528/0x810
[c0000000fac53b60] c000000000d62a58 nd_async_device_register+0x28/0xa0
[c0000000fac53bd0] c0000000001ccb8c async_run_entry_fn+0xcc/0x1f0
[c0000000fac53c50] c0000000001bcd9c process_one_work+0x46c/0x860
[c0000000fac53d20] c0000000001bd4f4 worker_thread+0x364/0x5f0
[c0000000fac53db0] c0000000001c7260 kthread+0x1b0/0x1c0
[c0000000fac53e20] c00000000000b954 ret_from_kernel_thread+0x5c/0x68

The patch tries to fix this by picking the nearest online node as the SCM node.
This does have a problem of us losing the information that SCM node is
equidistant from two other online nodes. If applications need to understand these
fine-grained details we should express then like x86 does via
/sys/devices/system/node/nodeX/accessY/initiators/

With the patch we get

# numactl -H
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 130865 MB
node 1 free: 129130 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
# cat /sys/bus/nd/devices/region0/numa_node
0
# dmesg | grep papr_scm
[   91.332305] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Region registered with target node 2 and online node 0

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

ALSA: usb-audio: Fix gpf in snd_usb_pipe_sanity_check

syzbot found the following crash on:

  general protection fault: 0000 [#1] SMP KASAN
  RIP: 0010:snd_usb_pipe_sanity_check+0x80/0x130 sound/usb/helper.c:75
  Call Trace:
    snd_usb_motu_microbookii_communicate.constprop.0+0xa0/0x2fb  sound/usb/quirks.c:1007
    snd_usb_motu_microbookii_boot_quirk sound/usb/quirks.c:1051 [inline]
    snd_usb_apply_boot_quirk.cold+0x163/0x370 sound/usb/quirks.c:1280
    usb_audio_probe+0x2ec/0x2010 sound/usb/card.c:576
    usb_probe_interface+0x305/0x7a0 drivers/usb/core/driver.c:361
    really_probe+0x281/0x650 drivers/base/dd.c:548
    ....

It was introduced in commit 801ebf1043ae for checking pipe and endpoint
types. It is fixed by adding a check of the ep pointer in question.

BugLink: https://syzkaller.appspot.com/bug?extid=d59c4387bfb6eced94e2
Reported-by: syzbot <[email protected]>
Fixes: 801ebf1043ae ("ALSA: usb-audio: Sanity checks for each pipe and EP types")
Cc: Andrey Konovalov <[email protected]>
Signed-off-by: Hillf Danton <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

Merge tag 'gvt-fixes-2019-07-30' of https://github.com/intel/gvt-linux into drm-intel-fixes

gvt-fixes-2019-07-30

- Guard against potential ggtt access error (Xiong)
- Fix includecheck (Zhenyu)
- Fix cache entry for guest page mapping found by 2M ppgtt guest (Xiaolin)
- Fix runtime pm warning (Xiaolin)
- Fix shadow mm settlement for Windows guest reset failure (Colin)

Signed-off-by: Jani Nikula <[email protected]>
From: Zhenyu Wang <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/i915/gvt: Adding ppgtt to GVT GEM context after shadow pdps settled.

Windows guest can't run after force-TDR with host log:
...
gvt: vgpu 1: workload shadow ppgtt isn't ready
gvt: vgpu 1: fail to dispatch workload, skip
...

The error is raised by set_context_ppgtt_from_shadow(), when it checks
and found the shadow_mm isn't marked as shadowed.

In work thread before each submission, a shadow_mm is set to shadowed in:
shadow_ppgtt_mm()
<-intel_vgpu_pin_mm()
<-prepare_workload()
<-dispatch_workload()
<-workload_thread()
However checking whether or not shadow_mm is shadowed is prior to it:
set_context_ppgtt_from_shadow()
<-dispatch_workload()
<-workload_thread()

In normal case, create workload will check the existence of shadow_mm,
if not it will create a new one and marked as shadowed. If already exist
it will reuse the old one. Since shadow_mm is reused, checking of shadowed
in set_context_ppgtt_from_shadow() actually always see the state set in
creation, but not the state set in intel_vgpu_pin_mm().

When force-TDR, all engines are reset, since it's not dmlr level, all
ppgtt_mm are invalidated but not destroyed. Invalidation will mark all
reused shadow_mm as not shadowed but still keeps in ppgtt_mm_list_head.
If workload submission phase those shadow_mm are reused with shadowed
not set, then set_context_ppgtt_from_shadow() will report error.

Pin for context after shadow_mm pinned and shadow pdps settled.

v2:
Move set_context_ppgtt_from_shadow() after prepare_workload(). (zhenyu)
v3:
Move set_context_ppgtt_from_shadow() after shadow pdps updated.(zhenyu)

Fixes: 4f15665ccbba ("drm/i915: Add ppgtt to GVT GEM context")
Cc: [email protected]
Signed-off-by: Colin Xu <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>

drm/i915/gvt: grab runtime pm first for forcewake use

in workload_thread, it should grab runtime pm wakelock and later
uncore forcewake get will check rpm wakelock held successfully.
otherwise, sometimes, rpm wakelock not hold and print call trace below:

Call Trace:
  intel_uncore_forcewake_get+0x15/0x20 [i915]
  workload_thread+0x5f9/0x16f0 [i915]
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70
  ? __switch_to_asm+0x34/0x70
  ? __switch_to+0x85/0x3f0
  ? __switch_to_asm+0x40/0x70
  ? do_wait_intr_irq+0x90/0x90
  kthread+0x121/0x140
  ? intel_vgpu_clean_workloads+0x100/0x100 [i915]
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
--[ end trace 86525f742a02e12c ]--

v2: adapted to use rpm structure.

Fixes: 251d46b0875c ("drm/i915/gvt: Pin the per-engine GVT shadow contexts")
Reviewed-by: Zhenyu Wang <[email protected]>
Signed-off-by: Xiaolin Zhang <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>

drm/i915/gvt: fix incorrect cache entry for guest page mapping

GPU hang observed during the guest OCL conformance test which is caused
by THP GTT feature used durning the test.

It was observed the same GFN with different size (4K and 2M) requested
from the guest in GVT. So during the guest page dma map stage, it is
required to unmap first with orginal size and then remap again with
requested size.

Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Cc: [email protected]
Reviewed-by: Zhenyu Wang <[email protected]>
Signed-off-by: Xiaolin Zhang <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>

drm/i915/gvt: Checking workload's gma earlier

Workload contains RB and WA_CTX which are in ggtt space,
if they aren't in valid ggtt space, the workload shouldn't be
shadowed and scanned. So checking them earlier to avoid shadow
them.

Reviewed-by: Zhenyu Wang <[email protected]>
Signed-off-by: Xiong Zhang <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>

drm/i915/gvt: Don't use ggtt_validdate_range() with size=0

Use vgpu_gmadr_is_valid() directly instead.

Reviewed-by: Zhenyu Wang <[email protected]>
Signed-off-by: Xiong Zhang <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>

drm/i915/gvt: Warning for invalid ggtt access

Instead of silently return virtual ggtt entries that guest is allowed
to access, this patch add extra range check. If guest read out of
range, it will print a warning and return 0. If guest write out
of range, the write will be dropped without any message.

Reviewed-by: Zhenyu Wang <[email protected]>
Signed-off-by: Xiong Zhang <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>

drm/i915/gvt: remove duplicate include of trace.h

This removes duplicate include of trace.h. Found by Hariprasad Kelam
with includecheck.

Reported-by: Hariprasad Kelam <[email protected]>
Reviewed-by: Yan Zhao <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>

scsi: fcoe: pass in fcoe_rport structure instead of fc_rport_priv

Instead of using the generic 'fc_rport_priv' structure as argument and then
having to painstakingly outcast this to fcoe_rport we should be passing the
fcoe_rport structure itself and reduce complexity.

Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: fcoe: Embed fc_rport_priv in fcoe_rport structure

Gcc-9 complains for a memset across pointer boundaries, which happens as
the code tries to allocate a flexible array on the stack. Turns out we
cannot do this without relying on gcc-isms, so with this patch we'll embed
the fc_rport_priv structure into fcoe_rport, can use the normal
'container_of' outcast, and will only have to do a memset over one
structure.

Signed-off-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: libfc: Whitespace cleanup in libfc.h

No functional change.

[mkp: typo]

Signed-off-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

libata: zpodd: Fix small read overflow in zpodd_get_mech_type()

Jeffrin reported a KASAN issue:

  BUG: KASAN: global-out-of-bounds in ata_exec_internal_sg+0x50f/0xc70
  Read of size 16 at addr ffffffff91f41f80 by task scsi_eh_1/149
  ...
  The buggy address belongs to the variable:
    cdb.48319+0x0/0x40

Much like commit 18c9a99bce2a ("libata: zpodd: small read overflow in
eject_tray()"), this fixes a cdb[] buffer length, this time in
zpodd_get_mech_type():

We read from the cdb[] buffer in ata_exec_internal_sg(). It has to be
ATAPI_CDB_LEN (16) bytes long, but this buffer is only 12 bytes.

Reported-by: Jeffrin Jose T <[email protected]>
Fixes: afe759511808c ("libata: identify and init ZPODD devices")
Link: https://lore.kernel.org/lkml/201907181423.E808958@keescook/
Tested-by: Jeffrin Jose T <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

ataflop: Mark expected switch fall-through

Mark switch cases where we are expecting to fall through.

This patch fixes the following warning (Building: m68k):

drivers/block/ataflop.c: In function ‘fd_locked_ioctl’:
drivers/block/ataflop.c:1728:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
   set_capacity(floppy->disk, MAX_DISK_SIZE * 2);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/block/ataflop.c:1729:2: note: here
  case FDFMTEND:
  ^~~~

Signed-off-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'perf-urgent-for-mingo-5.3-20190729' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

perf header:

  Vince Weaver:

  - Fix divide by zero error if f_header.attr_size==0, found using a perf tool fuzzer.

  Numfor Mbiziwo-Tiapo:

  - Silence use of uninitialized value warning pointed out by clang's MSAN tool.

libbpf:

  Andrii Nakryiko:

  - Fix missing __WORDSIZE definition in some systems, such as musl libc (Alpine Linux).

tools header UAPI:

  Arnaldo Carvalho de Melo:

  - Sync headers to address perf build warnings:

    - syscalls_64.tbl and generic unistd.h to pick up clone3 and pidfd_open.

    - With new ioctls: kvm.h, drm.h and usbdevice_fs.h.

    - No tooling change: mman.h, sched.h and if_link.h.

Documentation:

  Vince Weaver:

  - Fix perf.data documentation units for memory size, its kB, not bytes.

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/vhost fixes from Michael Tsirkin:

- Fixes in the iommu and balloon devices.

- Disable the meta-data optimization for now - I hope we can get it
   fixed shortly, but there's no point in making users suffer crashes
   while we are working on that.

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: disable metadata prefetch optimization
  iommu/virtio: Update to most recent specification
  balloon: fix up comments
  mm/balloon_compaction: avoid duplicate page removal

Merge tag 'platform-drivers-x86-v5.3-3' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver fixes from Andy Shevchenko:
"Business as usual, a few fixes and new IDs:

   - PC Engines APU got one fix for software dependencies to
     automatically load them and another fix for mapping of key button
     in the front to issue restart event.

   - OLPC driver is now probed automatically based on module device
     table.

   - Intel PMC core driver supports Intel Ice Lake NNPI processor.

   - WMI driver missed description of a new field in the structure that
     has been added"

* tag 'platform-drivers-x86-v5.3-3' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: pcengines-apuv2: use KEY_RESTART for front button
  platform/x86: intel_pmc_core: Add ICL-NNPI support to PMC Core
  Platform: OLPC: add SPI MODULE_DEVICE_TABLE
  platform/x86: wmi: add missing struct parameter description
  platform/x86: pcengines-apuv2: Fix softdep statement

Do not dereference 'siw_crypto_shash' before checking

Reported-by: "Dan Carpenter" <[email protected]>
Fixes: f29dd55b0236 ("rdma/siw: queue pair methods")
Link: https://lore.kernel.org/r/OF61E386ED.49A73798-ON00258444.003BD6A6-00258444.003CC8D9@notes.na.collabserv.com
Signed-off-by: Bernard Metzler <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

ALSA: pcm: fix lost wakeup event scenarios in snd_pcm_drain

lost wakeup can occur after enabling irq, therefore put task
into interruptible before enabling interrupts,

without this change, task can be put to sleep and snd_pcm_drain
will delay

Fixes: f2b3614cefb6 ("ALSA: PCM - Don't check DMA time-out too shortly")
Signed-off-by: Yuki Tsunashima <[email protected]>
Signed-off-by: Suresh Udipi <[email protected]>
[ported from 4.9]
Signed-off-by: Adam Miartus <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

RDMA/qedr: Fix the hca_type and hca_rev returned in device attributes

There was a place holder for hca_type and vendor was returned
in hca_rev. Fix the hca_rev to return the hw revision and fix
the hca_type to return an informative string representing the
hca.

Signed-off-by: Michal Kalderon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Doug Ledford <[email protected]>

dax: Fix missed wakeup in put_unlocked_entry()

The condition checking whether put_unlocked_entry() needs to wake up
following waiter got broken by commit 23c84eb78375 ("dax: Fix missed
wakeup with PMD faults"). We need to wake the waiter whenever the passed
entry is valid (i.e., non-NULL and not special conflict entry). This
could lead to processes never being woken up when waiting for entry
lock. Fix the condition.

Cc: <[email protected]>
Link: http://lore.kernel.org/r/[email protected]
Fixes: 23c84eb78375 ("dax: Fix missed wakeup with PMD faults")
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Dan Williams <[email protected]>

RDMA/hns: Fix build error

If INFINIBAND_HNS_HIP08 is selected and HNS3 is m,
but INFINIBAND_HNS is y, building fails:

drivers/infiniband/hw/hns/hns_roce_hw_v2.o: In function `hns_roce_hw_v2_exit':
hns_roce_hw_v2.c:(.exit.text+0xd): undefined reference to `hnae3_unregister_client'
drivers/infiniband/hw/hns/hns_roce_hw_v2.o: In function `hns_roce_hw_v2_init':
hns_roce_hw_v2.c:(.init.text+0xd): undefined reference to `hnae3_register_client'

Also if INFINIBAND_HNS_HIP06 is selected and HNS_DSAF
is m, but INFINIBAND_HNS is y, building fails:

drivers/infiniband/hw/hns/hns_roce_hw_v1.o: In function `hns_roce_v1_reset':
hns_roce_hw_v1.c:(.text+0x39fa): undefined reference to `hns_dsaf_roce_reset'
hns_roce_hw_v1.c:(.text+0x3a25): undefined reference to `hns_dsaf_roce_reset'

Reported-by: Hulk Robot <[email protected]>
Fixes: dd74282df573 ("RDMA/hns: Initialize the PCI device for hip08 RoCE")
Fixes: 08805fdbeb2d ("RDMA/hns: Split hw v1 driver from hns roce driver")
Signed-off-by: YueHaibing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Doug Ledford <[email protected]>

vfio-ccw: make vfio_ccw_async_region_ops static

Since vfio_ccw_async_region_ops is not exported and has no reason to be
globally visible make it static to avoid the following sparse warning:
drivers/s390/cio/vfio_ccw_async.c:73:30: warning: symbol 'vfio_ccw_async_region_ops' was not declared. Should it be static?

Fixes: d5afd5d135c8 ("vfio-ccw: add handling for async channel instructions")
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/3215: add switch fall through comment for -Wimplicit-fallthrough

Silence the following warning when built with -Wimplicit-fallthrough=3
enabled by default since 5.3-rc2:
drivers/s390/char/con3215.c: In function 'raw3215_irq':
drivers/s390/char/con3215.c:399:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
  399 |   if (dstat == 0x08)
      |      ^
drivers/s390/char/con3215.c:401:2: note: here
  401 |  case 0x04:
      |  ^~~~

Signed-off-by: Vasily Gorbik <[email protected]>

s390/tape: add fallthrough annotations

Commit a035d552a93b ("Makefile: Globally enable fall-through warning")
enables fall-through warnings globally. Add missing annotations.

Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/mm: add fallthrough annotations

Commit a035d552a93b ("Makefile: Globally enable fall-through warning")
enables fall-through warnings globally. Add missing annotations.

Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390/mm: make gmap_test_and_clear_dirty_pmd static

Since gmap_test_and_clear_dirty_pmd is not exported and has no reason to
be globally visible make it static to avoid the following sparse warning:
arch/s390/mm/gmap.c:2427:6: warning: symbol 'gmap_test_and_clear_dirty_pmd' was not declared. Should it be static?

Signed-off-by: Vasily Gorbik <[email protected]>

s390/kexec: add missing include to machine_kexec_reloc.c

Include <asm/kexec.h> into machine_kexec_reloc.c to expose
arch_kexec_do_relocs declaration and avoid the following sparse warnings:
arch/s390/kernel/machine_kexec_reloc.c:4:5: warning: symbol 'arch_kexec_do_relocs' was not declared. Should it be static?
arch/s390/boot/../kernel/machine_kexec_reloc.c:4:5: warning: symbol 'arch_kexec_do_relocs' was not declared. Should it be static?

Signed-off-by: Vasily Gorbik <[email protected]>

s390/perf: make cf_diag_csd static

Since there is really no reason for cf_diag_csd per cpu variable to be
globally visible make it static to avoid the following sparse warning:
arch/s390/kernel/perf_cpum_cf_diag.c:37:1: warning: symbol 'cf_diag_csd' was not declared. Should it be static?

Signed-off-by: Vasily Gorbik <[email protected]>

s390/lib: add missing include

Include <asm/xor.h> into arch/s390/lib/xor.c to expose xor_block_xc
declaration and avoid the following sparse warning:
arch/s390/lib/xor.c:128:27: warning: symbol 'xor_block_xc' was not declared. Should it be static?

Signed-off-by: Vasily Gorbik <[email protected]>

s390/boot: add missing declarations and includes

Add __swsusp_reset_dma declaration to avoid the following sparse warnings:
arch/s390/kernel/setup.c:107:15: warning: symbol '__swsusp_reset_dma' was not declared. Should it be static?
arch/s390/boot/startup.c:52:15: warning: symbol '__swsusp_reset_dma' was not declared. Should it be static?

Add verify_facilities declaration to avoid the following sparse warning:
arch/s390/boot/als.c:105:6: warning: symbol 'verify_facilities' was not declared. Should it be static?

Include "boot.h" into arch/s390/boot/kaslr.c to expose get_random_base
function declaration and avoid the following sparse warning:
arch/s390/boot/kaslr.c:90:15: warning: symbol 'get_random_base' was not declared. Should it be static?

Signed-off-by: Vasily Gorbik <[email protected]>

s390: update configs

Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

s390: clean up qdio.h

Fix two typos, document missing fields in the driver initialization
data and remove the copy&pasted 'pfmt' field from the qdr struct.

Signed-off-by: Julian Wiedmann <[email protected]>
Reviewed-by: Jens Remus <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>

platform/x86: pcengines-apuv2: use KEY_RESTART for front button

The keycode KEY_RESTART is more appropriate for the front button,
as most people use it for things like restart or factory reset.

Signed-off-by: Enrico Weigelt <[email protected]>
Fixes: f8eb0235f659 ("x86: pcengines apuv2 gpio/leds/keys platform driver")
Signed-off-by: Andy Shevchenko <[email protected]>

pidfd: Add warning if exit_state is 0 during notification

Previously a condition got missed where the pidfd waiters are awakened
before the exit_state gets set. This can result in a missed notification
[1] and the polling thread waiting forever.

It is fixed now, however it would be nice to avoid this kind of issue
going unnoticed in the future. So just add a warning to catch it in the
future.

/* References */
[1]: https://lore.kernel.org/lkml/20190717172100 [email protected]/

Signed-off-by: Joel Fernandes (Google) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>

pidfd: remove obsolete comments from test

Since the introduction of CLONE_PIDFD pidfd_send_signal() is independent
of CONFIG_PROC_FS.

Signed-off-by: Christian Brauner <[email protected]>

libbpf: fix missing __WORDSIZE definition

hashmap.h depends on __WORDSIZE being defined. It is defined by
glibc/musl in different headers. It's an explicit goal for musl to be
"non-detectable" at compilation time, so instead include glibc header if
glibc is explicitly detected and fall back to musl header otherwise.

Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Fixes: e3b924224028 ("libbpf: add resizable non-thread safe internal hashmap")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

drm/i915: Fix the TBT AUX power well enabling

Fix the mapping from a TBT AUX power well index to the DP_AUX_CH_CTL
register.

Fixes: c7375d9542f1 ("drm/i915: Configure AUX_CH_CTL when enabling the AUX power domain")
Cc: José Roberto de Souza <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Signed-off-by: Imre Deak <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 29ae36abf08f943b76a2959f5000c44efa335be7)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Fix GEN8_MCR_SELECTOR programming

fls returns bit positions starting from one for the lsb and the MCR
register expects zero based (sub)slice addressing.

Incorrent MCR programming can have the effect of directing MMIO reads of
registers in the 0xb100-0xb3ff range to invalid subslice returning zeroes
instead of actual content.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Fixes: 1e40d4aea57b ("drm/i915/cnl: Implement WaProgramMgsrForCorrectSliceSpecificMmioReads")
Reviewed-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 15160879d47213c32f357bc67b6014d9aaf14ed7)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/vbt: Fix VBT parsing for the PSR section

A single 32-bit PSR2 training pattern field follows the sixteen element
array of PSR table entries in the VBT spec. But, we incorrectly define
this PSR2 field for each of the PSR table entries. As a result, the PSR1
training pattern duration for any panel_type != 0 will be parsed
incorrectly. Secondly, PSR2 training pattern durations for VBTs with bdb
version >= 226 will also be wrong.

Cc: Rodrigo Vivi <[email protected]>
Cc: José Roberto de Souza <[email protected]>
Cc: [email protected]
Cc: [email protected] #v5.2
Fixes: 88a0d9606aff ("drm/i915/vbt: Parse and use the new field with PSR2 TP2/3 wakeup time")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111088
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204183
Signed-off-by: Dhinakaran Pandiyan <[email protected]>
Reviewed-by: Ville Syrjälä <[email protected]>
Reviewed-by: José Roberto de Souza <[email protected]>
Acked-by: Rodrigo Vivi <[email protected]>
Tested-by: François Guerraz <[email protected]>
Signed-off-by: Rodrigo Vivi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit b5ea9c9337007d6e700280c8a60b4e10d070fb53)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Make sure cdclk is high enough for DP audio on VLV/CHV

On VLV/CHV there is some kind of linkage between the cdclk frequency
and the DP link frequency. The spec says:
"For DP audio configuration, cdclk frequency shall be set to
meet the following requirements:
DP Link Frequency(MHz) | Cdclk frequency(MHz)
270 | 320 or higher
162 | 200 or higher"

I suspect that would more accurately be expressed as
"cdclk >= DP link clock", and in any case we can express it like
that in the code because of the limited set of cdclk (200, 266,
320, 400 MHz) and link frequencies (162 and 270 MHz) we support.

Without this we can end up in a situation where the cdclk
is too low and enabling DP audio will kill the pipe. Happens
eg. with 2560x1440 modes where the 266MHz cdclk is sufficient
to pump the pixels (241.5 MHz dotclock) but is too low for
the DP audio due to the link frequency being 270 MHz.

v2: Spell out the cdclk and link frequencies we actually support

Cc: [email protected]
Tested-by: Stefan Gottwald <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111149
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Acked-by: Chris Wilson <[email protected]>
(cherry picked from commit bffb31f73b29a60ef693842d8744950c2819851d)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Lock the engine while dumping the active request

We cannot let the request be retired and freed while we are trying to
dump it during error capture. It is not sufficient just to grab a
reference to the request, as during retirement we may free the ring
which we are also dumping. So take the engine lock to prevent retiring
and freeing of the request.

Reported-by: Alex Shumsky <[email protected]>
Fixes: 83c317832eb1 ("drm/i915: Dump the ringbuffer of the active request for debugging")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Alex Shumsky <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit cfe7288c276e359eebf057699fe86c2f8af14224)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/perf: add missing delay for OA muxes configuration

This was dropped from the original patch series, we weren't sure
whether it was needed at the time. More recent tests show it's
definitely needed to have acurate performance data.

Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 19f81df2859eb1 ("drm/i915/perf: Add OA unit support for Gen 8+")
Acked-by: Chris Wilson <[email protected]>
[ickle: combine duplicate code and comments]
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 14bfcd3e0daeb0f757a02aac85fd03e0933ab37e)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/perf: ensure we keep a reference on the driver

The i915 perf stream has its own file descriptor and is tied to
reference of the driver. We haven't taken care of keep the driver
alive.

Signed-off-by: Lionel Landwerlin <[email protected]>
Suggested-by: Chris Wilson <[email protected]>
Fixes: eec688e1420da5 ("drm/i915: Add i915 perf infrastructure")
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit a5af1df716c123a09341351008fc497bea137b77)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/userptr: Acquire the page lock around set_page_dirty()

set_page_dirty says:

For pages with a mapping this should be done under the page lock
for the benefit of asynchronous memory errors who prefer a
consistent dirty state. This rule can be broken in some special
cases, but should be better not to.

Under those rules, it is only safe for us to use the plain set_page_dirty
calls for shmemfs/anonymous memory. Userptr may be used with real
mappings and so needs to use the locked version (set_page_dirty_lock).

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203317
Fixes: 5cc9ed4b9a7a ("drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl")
References: 6dcc693bc57f ("ext4: warn when page is dirtied without buffers")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: [email protected]
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit cb6d7c7dc7ff8cace666ddec66334117a6068ce2)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/gtt: Mark the freed page table entries with scratch

On unwinding the allocation error path and having freed the page table
entry, it is imperative that we mark it as scratch.

<4> [416.075569] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4> [416.075801] CPU: 0 PID: 2385 Comm: kworker/u2:11 Tainted: G     U            5.2.0-rc7-CI-Patchwork_13534+ #1
<4> [416.076162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
<4> [416.076522] Workqueue: i915 __i915_vm_release [i915]
<4> [416.076754] RIP: 0010:gen8_ppgtt_cleanup_3lvl+0x58/0xb0 [i915]
<4> [416.077023] Code: 81 e2 04 fe ff ff 81 c2 ff 01 00 00 4c 8d 74 d6 58 4d 8b 65 00 4d 3b a7 28 02 00 00 74 40 49 8d 5c 24 50 49 81 c4 50 10 00 00 <48> 8b 2b 49 3b af 20 02 00 00 74 13 4c 89 ff 48 89 ee e8 01 fb ff
<4> [416.077445] RSP: 0018:ffffc9000046bd98 EFLAGS: 00010206
<4> [416.077625] RAX: 0001000000000000 RBX: 6b6b6b6b6b6b6bbb RCX: 8b4b56d500000000
<4> [416.077838] RDX: 00000000000001ff RSI: ffff88805a578008 RDI: ffff88805bd0efc8
<4> [416.078167] RBP: ffff88805bd0efc8 R08: 0000000004e42b93 R09: 0000000000000001
<4> [416.078381] R10: 0000000000000000 R11: ffff888077a1b0b8 R12: 6b6b6b6b6b6b7bbb
<4> [416.078594] R13: ffff88805a578058 R14: ffff88805a579058 R15: ffff88805bd0efc8
<4> [416.078815] FS:  0000000000000000(0000) GS:ffff88807da00000(0000) knlGS:0000000000000000
<4> [416.079395] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [416.079851] CR2: 000056160fec2b14 CR3: 0000000071bbc003 CR4: 00000000003606f0
<4> [416.080388] Call Trace:
<4> [416.080828]  gen8_ppgtt_cleanup+0x64/0x100 [i915]
<4> [416.081399]  __i915_vm_release+0xfc/0x1d0 [i915]

Fixes: 1d1b5490b91c ("drm/i915/gtt: Replace struct_mutex serialisation for allocation")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit e7539b79f703a6b533385088fc15cb5c9ab3f56f)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/gtt: Defer the free for alloc error paths

If we hit an error while allocating the page tables, we have to unwind
the incomplete updates, and wish to free the unused pd. However, we are
not allowed to be hoding the spinlock at that point, and so must use the
later free to defer it until after we drop the lock.

<3> [414.363795] BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/i915_gem_gtt.c:472
<3> [414.364167] in_atomic(): 1, irqs_disabled(): 0, pid: 3905, name: i915_selftest
<4> [414.364406] 3 locks held by i915_selftest/3905:
<4> [414.364408]  #0: 0000000034fe8aa8 (&dev->mutex){....}, at: device_driver_attach+0x18/0x50
<4> [414.364415]  #1: 000000006bd8a560 (&dev->struct_mutex){+.+.}, at: igt_ctx_exec+0xb7/0x410 [i915]
<4> [414.364476]  #2: 000000003dfdc766 (&(&pd->lock)->rlock){+.+.}, at: gen8_ppgtt_alloc_pdp+0x448/0x540 [i915]
<3> [414.364529] Preemption disabled at:
<4> [414.364530] [<0000000000000000>] 0x0
<4> [414.364696] CPU: 0 PID: 3905 Comm: i915_selftest Tainted: G     U            5.2.0-rc7-CI-CI_DRM_6403+ #1
<4> [414.364698] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
<4> [414.364699] Call Trace:
<4> [414.364704]  dump_stack+0x67/0x9b
<4> [414.364708]  ___might_sleep+0x167/0x250
<4> [414.364777]  vm_free_page+0x24/0xc0 [i915]
<4> [414.364852]  free_pd+0xf/0x20 [i915]
<4> [414.364897]  gen8_ppgtt_alloc_pdp+0x489/0x540 [i915]
<4> [414.364946]  gen8_ppgtt_alloc_4lvl+0x8e/0x2e0 [i915]
<4> [414.364992]  ppgtt_bind_vma+0x2e/0x60 [i915]
<4> [414.365039]  i915_vma_bind+0xe8/0x2c0 [i915]
<4> [414.365088]  __i915_vma_do_pin+0xa1/0xd20 [i915]

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111050
Fixes: 1d1b5490b91c ("drm/i915/gtt: Replace struct_mutex serialisation for allocation")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Reviewed-by: Mika Kuoppala <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 068610895ebd4bd86f496f01eb7b97e56d7269b2)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Deal with machines that expose less than three QGV points

When SAGV is forced to disabled/min/med/max in the BIOS pcode will
only hand us a single QGV point instead of the normal three. Fix
the code to deal with that instead declaring the bandwidth limit
to be 0 MB/s (and thus preventing any planes from being enabled).

Also shrink the max_bw sturct a bit while at it, and change the
deratedbw type to unsigned since the code returns the bw as
an unsigned int.

Since we now keep track of how many qgv points we got from pcode
we can drop the earlier check added for the "pcode doesn't
support the memory subsystem query" case.

Cc: [email protected]
Cc: Mark Janes <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Clint Taylor <[email protected]>
Fixes: c457d9cf256e ("drm/i915: Make sure we have enough memory bandwidth on ICL")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110838
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Matt Roper <[email protected]>
(cherry picked from commit 56e9371bc3f3e7d6c1a197a45d550b2ce6af25f6)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Fix memleak in runtime wakeref tracking

If we untrack wakerefs, the actual count may reach zero.
However the krealloced owners array is still there and
needs to be taken care of. Free the owners unconditionally
to fix the leak.

Fixes: bd780f37a361 ("drm/i915: Track all held rpm wakerefs")
Reported-by: Juha-Pekka Heikkila <[email protected]>
Cc: Juha-Pekka Heikkila <[email protected]>
Cc: Chris Wilson <[email protected]>
Signed-off-by: Mika Kuoppala <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit c5f846eed2a1856b78e988eeef08215c70598ecd)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/icl: whitelist PS_(DEPTH|INVOCATION)_COUNT

The same tests failing on CFL+ platforms are also failing on ICL.
Documentation doesn't list the
WaAllowPMDepthAndInvocationCountAccessFromUMD workaround for ICL but
applying it fixes the same tests as CFL.

v2: Use only one whitelist entry (Lionel)

Signed-off-by: Lionel Landwerlin <[email protected]>
Tested-by: Anuj Phogat <[email protected]>
Cc: [email protected] # 6883eab27481: drm/i915: Support flags in whitlist WAs
Cc: [email protected]
Acked-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 3fe0107e45ab396342497e06b8924cdd485cde3b)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: whitelist PS_(DEPTH|INVOCATION)_COUNT

CFL:C0+ changed the status of those registers which are now
blacklisted by default.

This is breaking a number of CTS tests on GL & Vulkan :

KHR-GL45.pipeline_statistics_query_tests_ARB.functional_fragment_shader_invocations (GL)

dEQP-VK.query_pool.statistics_query.fragment_shader_invocations.* (Vulkan)

v2: Only use one whitelist entry (Lionel)

Bspec: 14091
Signed-off-by: Lionel Landwerlin <[email protected]>
Cc: [email protected] # 6883eab27481: drm/i915: Support flags in whitlist WAs
Cc: [email protected]
Acked-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 2c903da50f5a9522b134e488bd0f92646c46f3c0)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: fix whitelist selftests with readonly registers

When a register is readonly there is not much we can tell about its
value (apart from its default value?). This can be covered by tests
exercising the value of the register from userspace.

For PS_INVOCATION_COUNT we've got the following piglit tests :

KHR-GL45.pipeline_statistics_query_tests_ARB.functional_fragment_shader_invocations

Vulkan CTS tests :

dEQP-VK.query_pool.statistics_query.fragment_shader_invocations.*

v2: Use a local to shrink under 80cols.

Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 86554f48e511 ("drm/i915/selftests: Verify whitelist of context registers")
Tested-by: Anuj Phogat <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 361b69051326ed0e07553315227678d00d651a9e)
Signed-off-by: Jani Nikula <[email protected]>

powerpc/kvm: Fall through switch case explicitly

Implicit fallthrough warning was enabled globally which broke
the build. Make it explicit with a `fall through` comment.

Signed-off-by: Santosh Sivaraj <[email protected]>
Reviewed-by: Stephen Rothwell <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

perf tools: Fix perf.data documentation units for memory size

The perf.data-file-format documentation incorrectly says the
HEADER_TOTAL_MEM results are in bytes. The results are in kilobytes
(perf reads the value from /proc/meminfo)

Signed-off-by: Vince Weaver <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1907251155500.22624@macbook-air
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf header: Fix use of unitialized value warning

When building our local version of perf with MSAN (Memory Sanitizer) and
running the perf record command, MSAN throws a use of uninitialized
value warning in "tools/perf/util/util.c:333:6".

This warning stems from the "buf" variable being passed into "write".
It originated as the variable "ev" with the type union perf_event*
defined in the "perf_event__synthesize_attr" function in
"tools/perf/util/header.c".

In the "perf_event__synthesize_attr" function they allocate space with a malloc
call using ev, then go on to only assign some of the member variables before
passing "ev" on as a parameter to the "process" function therefore "ev"
contains uninitialized memory. Changing the malloc call to zalloc to initialize
all the members of "ev" which gets rid of the warning.

To reproduce this warning, build perf by running:
make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-fsanitize=memory\
-fsanitize-memory-track-origins"

(Additionally, llvm might have to be installed and clang might have to
be specified as the compiler - export CC=/usr/bin/clang)

then running:
tools/perf/perf record -o - ls / | tools/perf/perf --no-pager annotate\
-i - --stdio

Please see the cover letter for why false positive warnings may be
generated.

Signed-off-by: Numfor Mbiziwo-Tiapo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Drayton <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf header: Fix divide by zero error if f_header.attr_size==0

So I have been having lots of trouble with hand-crafted perf.data files
causing segfaults and the like, so I have started fuzzing the perf tool.

First issue found:

If f_header.attr_size is 0 in the perf.data file, then perf will crash
with a divide-by-zero error.

Committer note:

Added a pr_err() to tell the user why the command failed.

Signed-off-by: Vince Weaver <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1907231100440.14532@macbook-air
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers UAPI: Sync if_link.h with the kernel

To pick the changes in:

07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications")

And silence this build warning:

Kernel ABI header at 'tools/include/uapi/linux/if_link.h' differs from latest version at 'include/uapi/linux/if_link.h'

Cc: Adrian Hunter <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Luis Cláudio Gonçalves <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Vincent Bernat <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers UAPI: Sync sched.h with the kernel

To get the changes in:

  a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
  1d6362fa0cfc ("sched/core: Allow sched_setattr() to use the current policy")
  7f192e3cd316 ("fork: add clone3")

And silence this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/sched.h' differs from latest version at 'include/uapi/linux/sched.h'
  diff -u tools/include/uapi/linux/sched.h include/uapi/linux/sched.h

No changes in tools/ due to the above.

Cc: Adrian Hunter <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Luis Cláudio Gonçalves <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Patrick Bellasi <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers UAPI: Sync usbdevice_fs.h with the kernels to get new ioctl

To get the changes in:

  6d101f24f1dd ("USB: add usbfs ioctl to retrieve the connection parameters")

And address this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/usbdevice_fs.h' differs from latest version at 'include/uapi/linux/usbdevice_fs.h'
  diff -u tools/include/uapi/linux/usbdevice_fs.h include/uapi/linux/usbdevice_fs.h

Which ends up autogenerating a ioctl_cmd->string table used by 'perf
trace':

  $ tools/perf/trace/beauty/usbdevfs_ioctl.sh > before
  $ cp include/uapi/linux/usbdevice_fs.h tools/include/uapi/linux/usbdevice_fs.h
  $ tools/perf/trace/beauty/usbdevfs_ioctl.sh > after
  $ diff -u before after
  --- before 2019-07-26 15:26:55.513636844 -0300
  +++ after 2019-07-26 15:29:11.650518677 -0300
  @@ -23,6 +23,7 @@
          [2] = "BULK",
          [30] = "DROP_PRIVILEGES",
          [31] = "GET_SPEED",
  +       [32] = "CONNINFO_EX",
          [3] = "RESETEP",
          [4] = "SETINTERFACE",
          [5] = "SETCONFIGURATION",
  $

Now 'perf trace' ioctl beautifier will translate this new ioctl to a
string and at some point will allow filtering the 'ioctl' syscall with
something like this in a system wide strace-like sessin:

  # perf trace -e ioctl/cmd=USBDEVFS_CONNINFO_EX/

Cc: Adrian Hunter <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Luis Cláudio Gonçalves <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>