Josef Bacik [Fri, 6 Dec 2019 14:37:18 +0000 (09:37 -0500)]
btrfs: do not leak reloc root if we fail to read the fs root
If we fail to read the fs root corresponding with a reloc root we'll
just break out and free the reloc roots. But we remove our current
reloc_root from this list higher up, which means we'll leak this
reloc_root. Fix this by adding ourselves back to the reloc_roots list
so we are properly cleaned up.
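A minimal sketch of the recovery loop's error path, assuming the structure described above (helper and variable names approximate fs/btrfs/relocation.c, not the exact upstream code):

  while (!list_empty(&reloc_roots)) {
      reloc_root = list_entry(reloc_roots.next, struct btrfs_root,
                              root_list);
      /* Taken off the list before the fs root is looked up. */
      list_del(&reloc_root->root_list);

      fs_root = read_fs_root(fs_info, reloc_root->root_key.offset);
      if (IS_ERR(fs_root)) {
          err = PTR_ERR(fs_root);
          /*
           * Put ourselves back on reloc_roots so the common error
           * cleanup frees this reloc root along with the rest.
           */
          list_add_tail(&reloc_root->root_list, &reloc_roots);
          goto out_free;
      }
      /* otherwise: link the reloc root to the fs root as usual */
  }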
Josef Bacik [Fri, 6 Dec 2019 14:37:17 +0000 (09:37 -0500)]
btrfs: skip log replay on orphaned roots
My fsstress modifications coupled with generic/475 uncovered a failure
to mount and replay the log if we hit an orphaned root. We do not want
to replay the log for an orphan root, but it's completely legitimate to
have an orphaned root with a log attached. Fix this by simply skipping
replaying the log. We still need to pin its root node so that we do
not overwrite it while replaying other logs, as we re-read the log root
at every stage of the replay.
Josef Bacik [Fri, 6 Dec 2019 16:39:00 +0000 (11:39 -0500)]
btrfs: handle ENOENT in btrfs_uuid_tree_iterate
If we get an -ENOENT back from btrfs_uuid_iter_rem when iterating the
uuid tree we'll just continue and do btrfs_next_item(). However we've
done a btrfs_release_path() at this point and no longer have a valid
path. So increment the key and go back and do a normal search.
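A hedged sketch of the retry, roughly in the shape of btrfs_uuid_tree_iterate() (the label name and the surrounding loop are assumptions):

  ret = btrfs_uuid_iter_rem(uuid_root, &uuid, key.type, key.offset);
  if (ret == -ENOENT) {
      /*
       * btrfs_uuid_iter_rem() released the path, so calling
       * btrfs_next_item() here would walk a stale path.  Bump the
       * key and redo a full search instead.
       */
      key.offset++;
      goto again_search_slot;
  }
  if (ret < 0)
      goto out;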
Filipe Manana [Thu, 5 Dec 2019 16:58:41 +0000 (16:58 +0000)]
Btrfs: fix hole extent items with a zero size after range cloning
Normally when cloning a file range if we find an implicit hole at the end
of the range we assume it is because the NO_HOLES feature is enabled.
However that is not always the case. One well known case [1] is when we
have a power failure after mixing buffered and direct IO writes against
the same file.
In such cases we need to punch a hole in the destination file, and if
the NO_HOLES feature is not enabled, we need to insert explicit file
extent items to represent the hole. After commit 690a5dbfc51315
("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning
extents"), we started to insert file extent items representing the hole
with an item size of 0, which is invalid and should be 53 bytes (the size
of a btrfs_file_extent_item structure), resulting in all sorts of
corruptions and invalid memory accesses. This is detected by the tree
checker when we attempt to write a leaf to disk.
The problem can be sporadically triggered by test case generic/561 from
fstests. That test case does not exercise power failure and creates a new
filesystem when it starts, so it does not use a filesystem created by any
previous test that tests power failure. However the test does both
buffered and direct IO writes (through fsstress) and it's precisely that
which is creating the implicit holes in files. That happens even before
the commit mentioned earlier. I need to investigate why we get those
implicit holes to check if there is a real problem or not. For now this
change fixes the regression of introducing file extent items with an item
size of 0 bytes.
Fix the issue by calling btrfs_punch_hole_range() without passing a
btrfs_clone_extent_info structure, which ensures file extent items are
inserted to represent the hole with a correct item size. We were passing
a btrfs_clone_extent_info with a value of 0 for its 'item_size' field,
which was causing the insertion of file extent items with an item size
of 0.
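A sketch of the call described above, with a NULL btrfs_clone_extent_info so hole file extent items get their normal size (variable names such as last_dest_end are illustrative):

  if (last_dest_end < destoff + len) {
      /* Implicit hole at the end of the cloned range. */
      ret = btrfs_punch_hole_range(inode, path,
                                   last_dest_end, destoff + len - 1,
                                   NULL,      /* no clone_info */
                                   &trans);
  }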
Reported-by: David Sterba <[email protected]> Fixes: 690a5dbfc51315 ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents") Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>
Filipe Manana [Fri, 6 Dec 2019 12:27:39 +0000 (12:27 +0000)]
Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues
When a tree mod log user no longer needs to use the tree it calls
btrfs_put_tree_mod_seq() to remove itself from the list of users and
delete all no longer used elements of the tree's red black tree, which
should be all elements with a sequence number less than or equal to
the caller's sequence number. However the logic is broken because it
can delete and free elements from the red black tree that have a
sequence number greater than the caller's sequence number:
1) At a point in time we have sequence numbers 1, 2, 3 and 4 in the
tree mod log;
2) The task which got assigned the sequence number 1 calls
btrfs_put_tree_mod_seq();
3) Sequence number 1 is deleted from the list of sequence numbers;
4) The current minimum sequence number is computed to be the sequence
number 2;
5) A task using sequence number 2 is at tree_mod_log_rewind() and gets
a pointer to one of its elements from the red black tree through
a call to tree_mod_log_search();
6) The task with sequence number 1 iterates the red black tree of tree
modification elements and deletes (and frees) all elements with a
sequence number less than or equal to 2 (the computed minimum sequence
number) - it ends up only leaving elements with sequence numbers of 3
and 4;
7) The task with sequence number 2 now uses the pointer to its element,
already freed by the other task, at __tree_mod_log_rewind(), resulting
in a use-after-free issue. When CONFIG_DEBUG_PAGEALLOC=y it produces
a trace like the following:
Fix this by making btrfs_put_tree_mod_seq() skip deletion of elements that
have a sequence number equal to the computed minimum sequence number, and
not just elements with a sequence number greater than that minimum.
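A sketch of the corrected deletion loop in btrfs_put_tree_mod_seq(); field and variable names are approximations. The point is that entries with a sequence number equal to the new minimum are kept, since another user may still hold pointers to them:

  for (node = rb_first(tm_root); node; node = next) {
      struct tree_mod_elem *tm = rb_entry(node, struct tree_mod_elem, node);

      next = rb_next(node);
      if (tm->seq >= min_seq)
          continue;    /* still reachable by some tree mod log user */
      rb_erase(node, tm_root);
      kfree(tm);
  }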
Filipe Manana [Mon, 2 Dec 2019 11:01:03 +0000 (11:01 +0000)]
Btrfs: make tree checker detect checksum items with overlapping ranges
Having checksum items, either on the checksums tree or in a log tree, that
represent ranges that overlap each other is a sign of corruption. Such a
case confuses the checksum lookup code and can result in not being able to
find checksums or find stale checksums.
So add a check for such a case.
This is motivated by a recent fix for a case where a log tree had checksum
items covering ranges that overlap each other due to extent cloning, and
resulted in missing checksums after replaying the log tree. It also helps
detect past issues such as stale and outdated checksums due to overlapping ranges, see
commit 27b9a8122ff71a ("Btrfs: fix csum tree corruption, duplicate and
outdated checksums").
Filipe Manana [Thu, 5 Dec 2019 16:58:30 +0000 (16:58 +0000)]
Btrfs: fix missing data checksums after replaying a log tree
When logging a file that has shared extents (reflinked with other files or
with itself), we can end up logging multiple checksum items that cover
overlapping ranges. This confuses the search for checksums at log replay
time causing some checksums to never be added to the fs/subvolume tree.
Consider the following example of a file that shares the same extent at
offsets 0 and 256Kb:
file range [64Kb, 256Kb):  [ bytenr 13631488, offset 64Kb, len 192Kb ]
file range [256Kb, 512Kb): [ bytenr 13893632, offset 0, len 256Kb ]
When logging the inode, at tree-log.c:copy_items(), when processing the
file extent item at offset 0, we log a checksum item covering the range 13959168 to 14024704, which corresponds to 13893632 + 64Kb and 13893632 +
64Kb + 64Kb, respectively.
Later when processing the extent item at offset 256K, we log the checksums
for the range from 13893632 to 14155776 (which corresponds to 13893632 +
256Kb). These checksums get merged with the checksum item for the range
from 13631488 to 13893632 (13631488 + 256Kb), logged by a previous fsync.
So after this we get the two following checksum items in the log tree:
(...)
item 6 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 3095 itemsize 512
range start 13631488 end 14155776 length 524288
item 7 key (EXTENT_CSUM EXTENT_CSUM 13959168) itemoff 3031 itemsize 64
range start 13959168 end 14024704 length 65536
The first one covers the range of the second one, so they overlap.
So far this does not cause a problem after replaying the log, because
when replaying the file extent item for offset 256K, we copy all the
checksums for the extent 13893632 from the log tree to the fs/subvolume
tree, since searching for a checksum item for bytenr 13893632 leaves us
at the first checksum item, which covers the whole range of the extent.
However if we write 64Kb to file offset 256Kb for example, we will
not be able to find and copy the checksums for the last 128Kb of the
extent at bytenr 13893632, referenced by the file range 384Kb to 512Kb.
After writing 64Kb into file offset 256Kb we get the following extent
layout for our file:
file range [64Kb, 256Kb):  [ bytenr 13631488, offset 64Kb, len 192Kb ]
file range [256Kb, 320Kb): [ bytenr 14155776, offset 0, len 64Kb ]
file range [320Kb, 512Kb): [ bytenr 13893632, offset 64Kb, len 192Kb ]
After fsync'ing the file, if we have a power failure and then mount
the filesystem to replay the log, the following happens:
1) When replaying the file extent item for file offset 320Kb, we
look up the checksums for the extent range from 13959168
(13893632 + 64Kb) to 14155776 (13893632 + 256Kb), through a call
to btrfs_lookup_csums_range();
2) btrfs_lookup_csums_range() finds the checksum item that starts
precisely at offset 13959168 (item 7 in the log tree, shown before);
3) However that checksum item only covers 64Kb of data, and not 192Kb
of data;
4) As a result only the checksums for the first 64Kb of data referenced
by the file extent item are found and copied to the fs/subvolume tree.
The remaining 128Kb of data, file range 384Kb to 512Kb, doesn't get
the corresponding data checksums found and copied to the fs/subvolume
tree.
5) After replaying the log userspace will not be able to read the file
range from 384Kb to 512Kb, because the checksums are missing,
resulting in an -EIO error.
The following steps reproduce this scenario:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt/sdc
$ dmesg | tail
[165305.003464] BTRFS info (device sdc): no csum found for inode 257 start 401408
[165305.004014] BTRFS info (device sdc): no csum found for inode 257 start 405504
[165305.004559] BTRFS info (device sdc): no csum found for inode 257 start 409600
[165305.005101] BTRFS info (device sdc): no csum found for inode 257 start 413696
[165305.005627] BTRFS info (device sdc): no csum found for inode 257 start 417792
[165305.006134] BTRFS info (device sdc): no csum found for inode 257 start 421888
[165305.006625] BTRFS info (device sdc): no csum found for inode 257 start 425984
[165305.007278] BTRFS info (device sdc): no csum found for inode 257 start 430080
[165305.008248] BTRFS warning (device sdc): csum failed root 5 ino 257 off 393216 csum 0x1337385e expected csum 0x00000000 mirror 1
[165305.009550] BTRFS warning (device sdc): csum failed root 5 ino 257 off 393216 csum 0x1337385e expected csum 0x00000000 mirror 1
Fix this simply by first deleting, from the log tree, any checksums for the
range of the extent we are logging at copy_items(). This ensures we do not
get checksum items in the log tree that have overlapping ranges.
This is a long time issue that has been present since we have the clone
(and deduplication) ioctl, and can happen both when an extent is shared
between different files and within the same file.
Dan Carpenter [Tue, 3 Dec 2019 11:24:58 +0000 (14:24 +0300)]
btrfs: return error pointer from alloc_test_extent_buffer
Callers of alloc_test_extent_buffer have not correctly interpreted the
return value as an error pointer, as alloc_test_extent_buffer should behave
as alloc_extent_buffer. The self-tests were unaffected but
btrfs_find_create_tree_block could call both functions and that would
cause problems up in the call chain.
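A sketch of the intended behaviour, mirroring alloc_extent_buffer() (the registration step is elided and the names are approximations):

  struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
                                                 u64 start)
  {
      struct extent_buffer *eb;

      eb = find_extent_buffer(fs_info, start);
      if (eb)
          return eb;
      eb = alloc_dummy_extent_buffer(fs_info, start);
      if (!eb)
          return ERR_PTR(-ENOMEM);    /* previously returned NULL here */
      /* ... insertion into the buffer radix tree elided ... */
      return eb;
  }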
David Sterba [Wed, 27 Nov 2019 15:10:54 +0000 (16:10 +0100)]
btrfs: fix devs_max constraints for raid1c3 and raid1c4
The value 0 for devs_max means to spread the allocated chunks over all
available devices, e.g. stripe for RAID0 or RAID5. This got mistakenly
copied to the RAID1C3/4 profiles. The intention is to have exactly 3 and
4 copies respectively.
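A sketch of the intended entries in the btrfs_raid_array table (only the relevant fields are shown; the full initializers contain more members):

  [BTRFS_RAID_RAID1C3] = {
      .devs_min  = 3,
      .devs_max  = 3,    /* was 0, i.e. "spread over all devices" */
      .ncopies   = 3,
  },
  [BTRFS_RAID_RAID1C4] = {
      .devs_min  = 4,
      .devs_max  = 4,    /* was 0 */
      .ncopies   = 4,
  },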
Fixes: 47e6f7423b91 ("btrfs: add support for 3-copy replication (raid1c3)") Fixes: 8d6fac0087e5 ("btrfs: add support for 4-copy replication (raid1c4)") Signed-off-by: David Sterba <[email protected]>
Andreas Färber [Fri, 8 Nov 2019 21:38:52 +0000 (22:38 +0100)]
btrfs: tree-checker: Fix error format string for size_t
Argument BTRFS_FILE_EXTENT_INLINE_DATA_START is defined as offsetof(),
which returns type size_t, so we need %zu instead of %lu.
This fixes a build warning on 32-bit ARM:
../fs/btrfs/tree-checker.c: In function 'check_extent_data_item':
../fs/btrfs/tree-checker.c:230:43: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=]
230 | "invalid item size, have %u expect [%lu, %u)",
| ~~^
| long unsigned int
| %u
Fixes: 153a6d299956 ("btrfs: tree-checker: Check item size before reading file extent type") Acked-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Andreas Färber <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
Josef Bacik [Tue, 19 Nov 2019 18:59:00 +0000 (13:59 -0500)]
btrfs: handle error in btrfs_cache_block_group
We have a BUG_ON(ret < 0) in find_free_extent from
btrfs_cache_block_group. If we fail to allocate our ctl we'll just
panic, which is not good. Instead just go on to another block group.
If we fail to find a block group we don't want to return ENOSPC, because
really we got an ENOMEM and that's the root of the problem. Save our
return from btrfs_cache_block_group(), and then if we still fail to make
our allocation return that ret so we get the right error back.
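A sketch of the error propagation (the surrounding find_free_extent() loop is elided and variable names are approximations):

  ret = btrfs_cache_block_group(block_group, 0);
  if (ret < 0) {
      /* Remember the error (likely -ENOMEM) instead of BUG_ON(). */
      cache_block_group_error = ret;
      goto loop;    /* move on to the next block group */
  }

  /* ... later, when no allocation could be made at all ... */
  if (ret == -ENOSPC && cache_block_group_error)
      ret = cache_block_group_error;    /* report the root cause */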
Josef Bacik [Tue, 19 Nov 2019 18:59:35 +0000 (13:59 -0500)]
btrfs: do not call synchronize_srcu() in inode_tree_del
Testing with the new fsstress uncovered a pretty nasty deadlock with
lookup and snapshot deletion.
Process A
unlink
-> final iput
-> inode_tree_del
-> synchronize_srcu(subvol_srcu)
Process B
btrfs_lookup <- srcu_read_lock() acquired here
-> btrfs_iget
-> find inode that has I_FREEING set
-> __wait_on_freeing_inode()
We're holding the srcu_read_lock() while doing the iget in order to make
sure our fs root doesn't go away, and then we are waiting for the inode
to finish freeing. However because the freeing process is doing a
synchronize_srcu() we deadlock.
Fix this by dropping the synchronize_srcu() in inode_tree_del(). We
don't need people to stop accessing the fs root at this point, we're
only adding our empty root to the dead roots list.
A larger much more invasive fix is forthcoming to address how we deal
with fs roots, but this fixes the immediate problem.
Since commit 0e4a459f56c32d3e ("tracing: Remove unnecessary DEBUG_FS
dependency"), CONFIG_DEBUG_FS is no longer auto-enabled. This breaks
booting Debian 9, as systemd needs debugfs:
[FAILED] Failed to mount /sys/kernel/debug.
See 'systemctl status sys-kernel-debug.mount' for details.
[DEPEND] Dependency failed for Local File Systems.
...
You are in emergency mode. ...
Give root password for maintenance
(or press Control-D to continue):
Fix this by enabling CONFIG_DEBUG_FS explicitly.
See also commit 18977008f44c66bd ("ARM: multi_v7_defconfig: Restore
debugfs support").
Filipe Manana [Thu, 5 Dec 2019 16:57:39 +0000 (16:57 +0000)]
Btrfs: fix cloning range with a hole when using the NO_HOLES feature
When using the NO_HOLES feature, if we clone a range that contains a hole
and a temporary ENOSPC happens while dropping extents from the target
inode's range, we can end up failing and aborting the transaction with
-EEXIST or with a corrupt file extent item that has a length greater
than it should have and overlaps with other extents. For example, when cloning
the following range from inode A to inode B:
Target range: [1MB, 6MB) (same as source, to make it easier to explain)
The following can happen:
1) btrfs_punch_hole_range() gets -ENOSPC from __btrfs_drop_extents();
2) At that point, 'cur_offset' is set to 1MB and __btrfs_drop_extents()
set 'drop_end' to 2MB, meaning it was able to drop only extent B2;
3) We then compute 'clone_len' as 'drop_end' - 'cur_offset' = 2MB - 1MB =
1MB;
4) We then attempt to insert a file extent item at inode B with a file
offset of 5MB, which is the value of clone_info->file_offset. This
fails with error -EEXIST because there's already an extent at that
offset (extent B4);
5) We abort the current transaction with -EEXIST and return that error
to user space as well.
Another example, this time resulting in a corrupt file extent item:
Target range: [1MB, 12MB) (same as source, to make it easier to explain)
1) btrfs_punch_hole_range() gets -ENOSPC from __btrfs_drop_extents();
2) At that point, 'cur_offset' is set to 1MB and __btrfs_drop_extents()
set 'drop_end' to 5MB, meaning it was able to drop only extent B2;
3) We then compute 'clone_len' as 'drop_end' - 'cur_offset' = 5MB - 1MB =
4MB;
4) We then insert a file extent item at inode B with a file offset of 11MB
which is the value of clone_info->file_offset, and a length of 4MB (the
value of 'clone_len'). So we get 2 extent items with ranges that
overlap and an extent length of 4MB, larger than the extent A2 from
inode A (1MB length);
5) After that we end the transaction, balance the btree dirty pages and
then start another or join the previous transaction. It might happen
that the transaction which inserted the incorrect extent was committed
by another task so we end up with extent corruption if a power failure
happens.
So fix this by making sure we attempt to insert the extent to clone at
the destination inode only if we are past dropping the sub-range that
corresponds to a hole.
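A sketch of the guard described above, roughly in the shape of the 5.5-era btrfs_punch_hole_range() (names are approximations): only insert the extent being cloned once 'drop_end' has moved past the start of the destination range, i.e. past the hole sub-range:

  if (clone_info && drop_end > clone_info->file_offset) {
      u64 clone_len = drop_end - clone_info->file_offset;

      ret = btrfs_insert_clone_extent(trans, inode, path,
                                      clone_info, clone_len);
      if (ret)
          break;
  }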
Fixes: 690a5dbfc51315 ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents") Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
Nikolay Borisov [Mon, 18 Nov 2019 12:16:44 +0000 (14:16 +0200)]
btrfs: Fix error messages in qgroup_rescan_init
The branch of qgroup_rescan_init which is executed from the mount
path prints the wrong error messages: the messages printed when
BTRFS_QGROUP_STATUS_FLAG_RESCAN or BTRFS_QGROUP_STATUS_FLAG_ON is not
set are transposed. Fix it by swapping them.
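Roughly, the corrected pairing looks like this (message strings and flow approximate fs/btrfs/qgroup.c):

  if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)) {
      btrfs_warn(fs_info,
                 "qgroup rescan init failed, qgroup rescan is not queued");
      return -EINVAL;
  } else if (!(fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_ON)) {
      btrfs_warn(fs_info,
                 "qgroup rescan init failed, qgroup is not enabled");
      return -EINVAL;
  }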
Mike Rapoport [Wed, 4 Dec 2019 12:35:24 +0000 (14:35 +0200)]
powerpc: Ensure that swiotlb buffer is allocated from low memory
Some powerpc platforms (e.g. 85xx) limit DMA-able memory way below 4G.
If a system has more physical memory than this limit, the swiotlb
buffer is not addressable because it is allocated from memblock using
top-down mode.
Force memblock to bottom-up mode before calling swiotlb_init() to
ensure that the swiotlb buffer is DMA-able.
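The idea, sketched with the generic memblock/swiotlb calls (the exact call site in the arch/powerpc setup code is an assumption):

  /* Before the swiotlb bounce buffer is allocated: */
  memblock_set_bottom_up(true);
  swiotlb_init(0);
  /* Restore the default top-down policy for later allocations. */
  memblock_set_bottom_up(false);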
powerpc/vcpu: Assume dedicated processors as non-preempt
With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on
pre-empted vCPUs"), the scheduler avoids preempted vCPUs to schedule
tasks on wakeup. This leads to a wrong choice of CPU, which in turn
leads to larger wakeup latencies. Eventually, it leads to performance
regression in latency sensitive benchmarks like soltp, schbench etc.
On Powerpc, vcpu_is_preempted() only looks at yield_count. If the
yield_count is odd, the vCPU is assumed to be preempted. However
yield_count is increased whenever the LPAR enters CEDE state (idle).
So any CPU that has entered CEDE state is assumed to be preempted.
Even if the vCPU of a dedicated LPAR is preempted/donated, it should have
the right of first use since the LPAR is supposed to own the vCPU.
On a Power9 System with 32 cores:
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 16
NUMA node(s): 2
Model: 2.2 (pvr 004e 0202)
Model name: POWER9 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-63
NUMA node1 CPU(s): 64-127
Rahul Tanwar [Thu, 5 Dec 2019 03:01:31 +0000 (11:01 +0800)]
pinctrl: Modify Kconfig to fix linker error
Fix the linker error below
ld: drivers/pinctrl/pinctrl-equilibrium.o: in function
`pinconf_generic_dt_node_to_map_all':
pinctrl-equilibrium.c:(.text+0xb): undefined reference
to `pinconf_generic_dt_node_to_map'
The error is caused by the commit below
1948d5c51dba ("pinctrl: Add pinmux & GPIO controller driver for a new SoC")
Fix it by adding 'depends on OF' to the driver's Kconfig entry.
Linus Walleij [Fri, 13 Dec 2019 10:01:10 +0000 (11:01 +0100)]
Merge tag 'intel-pinctrl-v5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/intel into fixes
intel-pinctrl for v5.5-2
* Fix Baytrail silicon issue by using a global lock
* Fix North community pin names so that users can infer their functions
* Convert Cherryview and Baytrail to pass IRQ chip along with GPIO one
The following is an automated git shortlog grouped by driver:
baytrail:
- Pass irqchip when adding gpiochip
- Add GPIO <-> pin mapping ranges via callback
- Update North Community pin list
- Really serialize all register accesses
cherryview:
- Pass irqchip when adding gpiochip
- Add GPIO <-> pin mapping ranges via callback
- Split out irq hw-init into a separate helper function
pinctrl: pinmux: fix a possible null pointer in pinmux_can_be_used_for_gpio
This commit adds a check on the ops pointer to avoid a kernel panic when
ops->strict is used. Indeed, on some pinctrl drivers (at least
pinctrl-stmfx) the pinmux ops are not implemented. Let's assume that GPIO
can be used in this case.
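A sketch of the guarded helper (close to the shape of pinctrl's pinmux_can_be_used_for_gpio(), with the added ops check; treat the details as approximations):

  bool pinmux_can_be_used_for_gpio(struct pinctrl_dev *pctldev, unsigned pin)
  {
      struct pin_desc *desc = pin_desc_get(pctldev, pin);
      const struct pinmux_ops *ops = pctldev->desc->pmxops;

      /* Can't inspect the pin or no pinmux ops: assume GPIO is fine. */
      if (!desc || !ops)
          return true;

      if (ops->strict && desc->mux_usecount)
          return false;

      return !(ops->strict && !!desc->gpio_owner);
  }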
Instead of just having an airtime flag in debugfs, turn AQL into a proper
NL80211_EXT_FEATURE, so drivers can turn it on when they are ready, and so
we also expose the presence of the feature to userspace.
This also has the effect of flipping the default, so drivers have to opt in
to using AQL instead of getting it by default with TXQs. To keep
functionality the same as pre-patch, we set this feature for ath10k (which
is where it is needed the most).
While we're at it, split out the debugfs interface so AQL gets its own
per-station debugfs file instead of using the 'airtime' file.
[Johannes:]
This effectively disables AQL for iwlwifi, where it fixes a number of
issues:
* TSO in iwlwifi is causing underflows and associated warnings in AQL
* HE (802.11ax) rates aren't reported properly so at HE rates, AQL could
never have a valid estimate (it'd use 6 Mbps instead of up to 2400!)
* pm-cpuidle:
cpuidle: Drop unnecessary type cast in cpuidle_poll_time()
cpuidle: Fix cpuidle_driver_state_disabled()
cpuidle: use first valid target residency as poll time
* acpi-pm:
ACPI: PM: Avoid attaching ACPI PM domain to certain devices
Dan Carpenter [Tue, 26 Nov 2019 12:09:39 +0000 (15:09 +0300)]
mac80211: airtime: Fix an off by one in ieee80211_calc_rx_airtime()
This code was copied from mt76 and inherited an off by one bug from
there. The > should be >= so that we don't read one element beyond
the end of the array.
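The pattern of the fix, sketched with illustrative names (the real code indexes the sband bitrate table):

  /* rate_idx indexes sband->bitrates[], which has n_bitrates entries. */
  if (WARN_ON_ONCE(status->rate_idx >= sband->n_bitrates))
      return 0;    /* '>' would have let rate_idx == n_bitrates through */

  rate = &sband->bitrates[status->rate_idx];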
Stefan Bühler [Tue, 26 Nov 2019 10:05:44 +0000 (11:05 +0100)]
cfg80211: fix double-free after changing network namespace
If wdev->wext.keys was initialized it didn't get reset to NULL on
unregister (and it doesn't get set in cfg80211_init_wdev either), but
wdev is reused if unregister was triggered through
cfg80211_switch_netns.
The next unregister (for whatever reason) will try to free
wdev->wext.keys again.
Fredrik Olofsson [Tue, 19 Nov 2019 13:34:51 +0000 (14:34 +0100)]
mac80211: fix TID field in monitor mode transmit
Fix overwriting of the qos_ctrl.tid field for encrypted frames injected on
a monitor interface. While qos_ctrl.tid is not encrypted, it's used as an
input into the encryption algorithm so it's protected, and thus cannot be
modified after encryption. For injected frames, the encryption may already
have been done in userspace, so we cannot change any fields.
Before passing the frame to the driver, the qos_ctrl.tid field is updated
from skb->priority. Prior to dbd50a851c50 skb->priority was updated in
ieee80211_select_queue_80211(), but this function is no longer always
called.
Update skb->priority in ieee80211_monitor_start_xmit() so that the value
is stored, and when later code 'modifies' the TID it really sets it to
the same value as before, preserving the encryption.
Juergen Gross [Thu, 12 Dec 2019 14:17:50 +0000 (15:17 +0100)]
xen/balloon: fix ballooned page accounting without hotplug enabled
When CONFIG_XEN_BALLOON_MEMORY_HOTPLUG is not defined
reserve_additional_memory() will set balloon_stats.target_pages to a
wrong value in case there are still some ballooned pages allocated via
alloc_xenballooned_pages().
This will result in balloon_process() no longer being triggered when
ballooned pages are freed in batches.
Paul Durrant [Tue, 10 Dec 2019 14:53:05 +0000 (14:53 +0000)]
xen-blkback: prevent premature module unload
Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem
cache. This cache is destroyed when xen-blkif is unloaded so it is
necessary to wait for the deferred free routine used for such objects to
complete. This necessity was missed in commit 14855954f636 "xen-blkback:
allow module to be cleanly unloaded". This patch fixes the problem by
taking/releasing extra module references in xen_blkif_alloc/free()
respectively.
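A sketch of the reference pairing (function and cache names follow the driver, but the details are approximations):

  static struct xen_blkif *xen_blkif_alloc(domid_t domid)
  {
      struct xen_blkif *blkif;

      blkif = kmem_cache_zalloc(xen_blkif_cachep, GFP_KERNEL);
      if (!blkif)
          return ERR_PTR(-ENOMEM);
      /* Keep the module, and thus blkif_cache, alive until the free. */
      __module_get(THIS_MODULE);
      blkif->domid = domid;
      return blkif;
  }

  static void xen_blkif_free(struct xen_blkif *blkif)
  {
      kmem_cache_free(xen_blkif_cachep, blkif);
      module_put(THIS_MODULE);
  }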
Pavel Shilovsky [Tue, 10 Dec 2019 19:44:52 +0000 (11:44 -0800)]
CIFS: Close cached root handle only if it has a lease
SMB2_tdis() checks if a root handle is valid in order to decide
whether it needs to close the handle or not. However if another
thread has a reference to the handle, we may end up putting
the reference twice. The extra reference that we want to put
during the tree disconnect is the reference that has a directory
lease. So, track the fact that we have a directory lease and
close the handle only in that case.
Steve French [Tue, 10 Dec 2019 04:34:10 +0000 (22:34 -0600)]
SMB3: Fix crash in SMB2_open_init due to uninitialized field in compounding path
Ran into an intermittent crash in
SMB2_open_init+0x2f6/0x970
due to oparms.cifs_sb not being initialized when called from:
smb2_compound_op+0x45d/0x1690
Zero the whole oparms struct in the compounding path before setting up the
oparms so we don't risk any uninitialized fields.
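A sketch of the change in the compounding path (the field list is illustrative, not the full set assigned by smb2_compound_op()):

  struct cifs_open_parms oparms;

  /* Zero the whole struct so no field is left uninitialized. */
  memset(&oparms, 0, sizeof(oparms));
  oparms.tcon = tcon;
  oparms.desired_access = desired_access;
  oparms.disposition = FILE_OPEN;
  oparms.create_options = create_options;
  oparms.fid = &fid;
  oparms.reconnect = false;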
Fixes: fdef665ba44a ("smb3: fix mode passed in on create for modetosid mount option") Signed-off-by: Steve French <[email protected]> Acked-by: Ronnie Sahlberg <[email protected]> Reviewed-by: Pavel Shilovsky <[email protected]>
Dave Airlie [Fri, 13 Dec 2019 04:50:01 +0000 (14:50 +1000)]
Merge tag 'drm-fixes-5.5-2019-12-12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
drm-fixes-5.5-2019-12-12:
amdgpu:
- DC fixes for renoir
- Gfx8 fence flush align with mesa
- Power profile fix for arcturus
- Freesync fix
- DC I2c over aux fix
- DC aux defer fix
- GPU reset fix
- GPUVM invalidation semaphore fixes for PCO and SR-IOV
- Golden settings updates for gfx10
Dave Airlie [Fri, 13 Dec 2019 04:44:08 +0000 (14:44 +1000)]
Merge tag 'drm-intel-fixes-2019-12-12' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix user reported issue #673: GPU hang on transition to idle
- Avoid corruption on the top of the screen on GLK+ by disabling FBC
- Fix non-privileged access to OA on Tigerlake
- Fix HDCP code not to touch global state when just computing commit
- Fix CI splat by saving irqstate around virtual_context_destroy
- Serialise context retirement possibly on another CPU
Tina Zhang [Fri, 13 Dec 2019 03:23:14 +0000 (11:23 +0800)]
drm/i915/gvt: Pin vgpu dma address before using
Dma-buf display uses the vgpu dma address saved in the guest part of the GGTT
table, which is updated by the vCPU thread. On the host side, when the dma
address is used by the QEMU UI thread, gvt-g must make sure the dma address
is validated before letting it go to the HW. An invalid guest dma address
will easily cause a DMA fault and hang the GPU.
Zhenyu Wang [Thu, 12 Dec 2019 08:46:14 +0000 (16:46 +0800)]
drm/i915/gvt: set guest display buffer as readonly
We shouldn't allow writes to the exposed guest display buffer, as that
doesn't make sense. So explicitly set the read-only flag on the
dmabuf object allocated for the display.
Stephen Boyd [Fri, 13 Dec 2019 03:00:01 +0000 (19:00 -0800)]
Merge tag 'imx-clk-fixes-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into clk-fixes
Pull i.MX clk fixes from Shawn Guo:
- Add missing lock to divider in the composite driver for exclusive
register access
- Add missing sentinel for ulp_div_table in clk-imx7ulp driver
- Fix clk_pll14xx_wait_lock() function which calls into
readl_poll_timeout() with incorrect parameter
* tag 'imx-clk-fixes-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
clk: imx: pll14xx: fix clk_pll14xx_wait_lock
clk: imx: clk-imx7ulp: Add missing sentinel of ulp_div_table
clk: imx: clk-composite-8m: add lock to gate/mux
Jerome Brunet [Tue, 3 Dec 2019 08:08:05 +0000 (09:08 +0100)]
clk: walk orphan list on clock provider registration
So far, we walked the orphan list every time a new clock was registered
in CCF. This was fine since the clocks were only referenced by name.
Now that the clock can be referenced through DT, it is not enough:
* Controller A first registers a reference to a clock from controller B
through DT.
* Controller B registers all its clocks and then registers the provider.
Each time controller B registers a new clock, the orphan list is walked
but it can't match since the provider is not registered yet. When the
provider is finally registered, the orphan list is not walked unless
another clock is registered afterward.
This can lead to a situation where some clocks remain orphaned even if
the parent is available.
Walking the orphan list on provider registration solves the problem.
Saravana Kannan [Mon, 9 Dec 2019 19:31:19 +0000 (11:31 -0800)]
of/platform: Unconditionally pause/resume sync state during kernel init
Commit 5e6669387e22 ("of/platform: Pause/resume sync state during init
and of_platform_populate()") paused/resumed sync state during init only
if Linux had parsed and populated a devicetree.
However, the check for that (of_have_populated_dt()) can change after
of_platform_default_populate_init() executes. One example of this is
when devicetree unittests are enabled. This causes an unmatched
pause/resume of sync state. To avoid this, just unconditionally
pause/resume sync state during init.
Fixes: 5e6669387e22 ("of/platform: Pause/resume sync state during init and of_platform_populate()") Reported-by: kernel test robot <[email protected]> Signed-off-by: Saravana Kannan <[email protected]> Reviewed-by: Frank Rowand <[email protected]> Signed-off-by: Rob Herring <[email protected]>
Rob Herring [Wed, 11 Dec 2019 14:51:01 +0000 (08:51 -0600)]
dt-bindings: memory-controllers: tegra: Fix type references
Json-schema requires a $ref to be under an 'allOf' if there are
additional constraints otherwise the additional constraints are
ignored. (Note that this behavior will be changed in draft8.)
Yishai Hadas [Thu, 12 Dec 2019 10:02:37 +0000 (12:02 +0200)]
IB/mlx5: Fix device memory flows
Fix device memory flows so that the matching object is destroyed only
once there is no live mmaped VA for a given allocation.
This prevents a potential scenario where an existing VA that was mmaped by
one process might still be used after its deallocation even though the
allocation is now owned by another process.
The above is achieved by integrating with the IB core APIs that manage
mmap/munmap. Only once the refcount drops to 0 are the DM object and its
underlying area freed.
Eric Biggers [Wed, 9 Oct 2019 23:04:43 +0000 (16:04 -0700)]
KEYS: remove CONFIG_KEYS_COMPAT
KEYS_COMPAT now always takes the value of COMPAT && KEYS. But the
security/keys/ directory is only compiled if KEYS is enabled, so in
practice KEYS_COMPAT is the same as COMPAT. Therefore, remove the
unnecessary KEYS_COMPAT and just use COMPAT directly.
PCI: rockchip: Fix IO outbound ATU register number
Since 62240a88004b ("PCI: rockchip: Drop storing driver private outbound
resource data"), the offset calculation used to access the register
number that programs the IO outbound ATU has been wrong.
Fix this by computing the ATU IO register number based on the number of MEM
registers, not the size of the IO region.
This causes 'synchronous external aborts' like the following:
changzhu [Tue, 10 Dec 2019 14:00:59 +0000 (22:00 +0800)]
drm/amdgpu: add invalidate semaphore limit for SRIOV and picasso in gmc9
It may fail to load the guest driver in round 2 or cause an Xstart problem
when using the invalidate semaphore for SRIOV or picasso. So we need to avoid
using the invalidate semaphore for SRIOV and picasso.
Mark Zhang [Thu, 12 Dec 2019 09:12:12 +0000 (11:12 +0200)]
RDMA/counter: Prevent auto-binding a QP which are not tracked with res
Some QPs (e.g. XRC QPs) are not tracked in the kernel; in this case they have
an invalid res and should not be bound to any dynamically-allocated
counter in auto mode.
Manish Chopra [Thu, 12 Dec 2019 14:49:28 +0000 (06:49 -0800)]
qede: Fix multicast mac configuration
The driver doesn't handle a configuration that exceeds the maximum number
of multicast mac addresses; in that particular case it leaves
the device in an improper/invalid multicast configuration state,
causing connectivity issues (in LACP bonding like scenarios).
The lan78xx driver accesses the PHY registers through the MDIO bus over a USB
connection. When performing a suspend/resume, the PHY registers can be
accessed before the USB connection is resumed. This will generate an
error and will prevent the device from resuming correctly.
This patch adds the dependency between the MDIO bus and USB device to
allow correct handling of suspend/resume.
Fixes: ce85e13ad6ef ("lan78xx: Update to use phylib instead of mii_if_info.") Signed-off-by: Cristian Birsan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Thu, 12 Dec 2019 18:56:37 +0000 (10:56 -0800)]
Merge tag 'ceph-for-5.5-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A fix to avoid a corner case when scheduling cap reclaim in batches
from Xiubo, a patch to add some observability into cap waiters from
Jeff and a couple of cleanups"
* tag 'ceph-for-5.5-rc2' of git://github.com/ceph/ceph-client:
ceph: add more debug info when decoding mdsmap
ceph: switch to global cap helper
ceph: trigger the reclaim work once there has enough pending caps
ceph: show tasks waiting on caps in debugfs caps file
ceph: convert int fields in ceph_mount_options to unsigned int
ksys_dup() is used only at one place in the kernel, namely to duplicate
fd 0 of /dev/console to stdout and stderr. The same functionality can be
achieved by using functions already available within the kernel namespace.
Olof Johansson [Thu, 12 Dec 2019 17:38:22 +0000 (09:38 -0800)]
Merge tag 'imx-fixes-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 5.5:
- Add missing jedec,spi-nor compatible for imx6ul-14x14-evk board,
so that SPI NOR device can be probed.
- Fix power button of E60K02 board by removing LDORTC2 regulator.
- A couple of fixes on serial number support of i.MX6ULL/ULZ SoCs to
remove the boot regression caused by 8267ff89b713 ("ARM: imx: Add
serial number support for i.MX6/7 SoCs").
- A couple of fixes on LS1028A SoC TMU regarding to calibration data
and reboot register configuration.
- Fix a regression seen on imx6ul-evk board by marking always-on for
the regulator that is shared by many peripherals.
- Explicitly restore CONFIG_DEBUG_FS in imx_v6_v7_defconfig.
* tag 'imx-fixes-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: imx: Fix boot crash if ocotp is not found
ARM: imx_v6_v7_defconfig: Explicitly restore CONFIG_DEBUG_FS
ARM: dts: imx6ul-evk: Fix peripheral regulator
arm64: dts: ls1028a: fix reboot node
arm64: dts: ls1028a: fix typo in TMU calibration data
ARM: imx: Correct ocotp id for serial number support of i.MX6ULL/ULZ SoCs
ARM: dts: e60k02: fix power button
ARM: dts: imx6ul: imx6ul-14x14-evk.dtsi: Fix SPI NOR probing
cpufreq: Avoid leaving stale IRQ work items during CPU offline
The scheduler code calling cpufreq_update_util() may run during CPU
offline on the target CPU after the IRQ work lists have been flushed
for it, so the target CPU should be prevented from running code that
may queue up an IRQ work item on it at that point.
Unfortunately, that may not be the case if dvfs_possible_from_any_cpu
is set for at least one cpufreq policy in the system, because that
allows the CPU going offline to run the utilization update callback
of the cpufreq governor on behalf of another (online) CPU in some
cases.
If that happens, the cpufreq governor callback may queue up an IRQ
work on the CPU running it, which is going offline, and the IRQ work
may not be flushed after that point. Moreover, that IRQ work cannot
be flushed until the "offlining" CPU goes back online, so if any
other CPU calls irq_work_sync() to wait for the completion of that
IRQ work, it will have to wait until the "offlining" CPU is back
online, which may never happen. In particular, a system-wide
deadlock may occur during CPU online as a result of that.
The failing scenario is as follows. CPU0 is the boot CPU, so it
creates a cpufreq policy and becomes the "leader" of it
(policy->cpu). It cannot go offline, because it is the boot CPU.
Next, other CPUs join the cpufreq policy as they go online and they
leave it when they go offline. The last CPU to go offline, say CPU3,
may queue up an IRQ work while running the governor callback on
behalf of CPU0 after leaving the cpufreq policy because of the
dvfs_possible_from_any_cpu effect described above. Then, CPU0 is
the only online CPU in the system and the stale IRQ work is still
queued on CPU3. When, say, CPU1 goes back online, it will run
irq_work_sync() to wait for that IRQ work to complete and so it
will wait for CPU3 to go back online (which may never happen even
in principle), but (worse yet) CPU0 is waiting for CPU1 at that
point too and a system-wide deadlock occurs.
To address this problem notice that CPUs which cannot run cpufreq
utilization update code for themselves (for example, because they
have left the cpufreq policies that they belonged to), should also
be prevented from running that code on behalf of the other CPUs that
belong to a cpufreq policy with dvfs_possible_from_any_cpu set and so
in that case the cpufreq_update_util_data pointer of the CPU running
the code must not be NULL as well as for the CPU which is the target
of the cpufreq utilization update in progress.
Accordingly, change cpufreq_this_cpu_can_update() into a regular
function in kernel/sched/cpufreq.c (instead of a static inline in a
header file) and make it check the cpufreq_update_util_data pointer
of the local CPU if dvfs_possible_from_any_cpu is set for the target
cpufreq policy.
Also update the schedutil governor to do the
cpufreq_this_cpu_can_update() check in the non-fast-switch
case too to avoid the stale IRQ work issues.
Marc Zyngier [Wed, 11 Dec 2019 16:56:48 +0000 (16:56 +0000)]
KVM: arm/arm64: Properly handle faulting of device mappings
A device mapping is normally always mapped at Stage-2, since there
is very little gain in having it faulted in.
Nonetheless, it is possible to end up in a situation where the device
mapping has been removed from Stage-2 (userspace munmaped the VFIO
region, and the MMU notifier did its job), but present in a userspace
mapping (userspace has mapped it back at the same address). In such
a situation, the device mapping will be demand-paged as the guest
performs memory accesses.
This requires us to be careful when dealing with mapping size and cache
management, and to handle potential execution of a device mapping.
Tony Lindgren [Thu, 12 Dec 2019 16:20:10 +0000 (08:20 -0800)]
bus: ti-sysc: Fix missing reset delay handling
We have a dts property for "ti,sysc-delay-us", and we're using it, but the
wait after OCP softreset only happens if devices are probed in legacy mode.
Let's add a delay after writing the OCP softreset when specified.
Fixes: e0db94fe87da ("bus: ti-sysc: Make OCP reset work for sysstatus and sysconfig reset bits") Cc: Keerthy <[email protected]> Signed-off-by: Tony Lindgren <[email protected]>
Early revisions of the AST2600 datasheet are conflicted about the state
of the LPC/eSPI strapping bit (SCU510[6]). Conversations with ASPEED
determined that the reference pinmux configuration tables were in error
and the SCU documentation contained the correct configuration. Update
the driver to reflect the state described in the SCU documentation.
Logan Gunthorpe [Tue, 10 Dec 2019 18:47:04 +0000 (11:47 -0700)]
block: fix NULL pointer dereference in account statistics with IDE
The IDE driver creates some passthru requests which never get
submitted to the block layer in such a way that blk_account_io_start()
gets called. However, the driver still calls __blk_mq_end_request() in
ide_end_rq(), which will call blk_account_io_completion(), which tries
to dereference req->part, which is never set. See ide_prep_sense() for
an example of where these requests come from.
To fix this, blk_account_io_completion() and blk_account_io_done()
should do nothing if req->part is not set.
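A sketch of the guard, roughly the shape of blk_account_io_completion() in this era (the same req->part check applies to blk_account_io_done()):

  void blk_account_io_completion(struct request *req, unsigned int bytes)
  {
      if (req->part && blk_do_io_stat(req)) {
          const int sgrp = op_stat_group(req_op(req));

          part_stat_lock();
          part_stat_add(req->part, sectors[sgrp], bytes >> 9);
          part_stat_unlock();
      }
  }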
changzhu [Tue, 10 Dec 2019 02:23:09 +0000 (10:23 +0800)]
drm/amdgpu: avoid using invalidate semaphore for picasso
It may cause a timeout waiting for sem acquire in VM flush when using the
invalidate semaphore for picasso. So we need to avoid using the invalidate
semaphore for picasso.
In prepare_namespace(), do_mount() can be used instead of ksys_mount()
as the first and third argument are const strings in the kernel, the
second and fourth argument are passed through anyway, and the fifth
argument is NULL.
In do_mount_root(), ksys_mount() is called with the first and third
argument being already kernelspace strings, which do not need to be
copied over from userspace to kernelspace (again). The second and
fourth arguments are passed through to do_mount() anyway. The fifth
argument, while already residing in kernelspace, needs to be put into
a page of its own. Then, do_mount() can be used instead of
ksys_mount().
Once this is done, there are no in-kernel users of ksys_mount() left,
which can therefore be removed.
All three calls to ksys_mount() in initrd-related kernel code can
be switched over to do_mount():
- the first and third arguments are const strings in the kernel,
and do not need to be copied over from userspace;
- the fifth argument is NULL, and therefore no page needs to be
copied over from userspace;
- the second and fourth argument are passed through anyway.
In devtmpfs, do_mount() can be called directly instead of complex wrapping
by ksys_mount():
- the first and third arguments are const strings in the kernel,
and do not need to be copied over from userspace;
- the fifth argument is NULL, and therefore no page needs to be
copied over from userspace;
- the second and fourth argument are passed through anyway.
A break interrupt will be generated if the RX line was pulled low, which
means some abnormal behavior occurred on the UART. In this case, we still
need to clear this break interrupt status, otherwise it will cause an
interrupt storm and crash the whole system.
Leo Yan [Wed, 27 Nov 2019 14:15:43 +0000 (22:15 +0800)]
tty: serial: msm_serial: Fix lockup for sysrq and oops
As the commit 677fe555cbfb ("serial: imx: Fix recursive locking bug")
has mentioned the uart driver might cause recursive locking between
normal printing and the kernel debugging facilities (e.g. sysrq and
oops). The commit gave a suggestion for fixing the recursive
locking issue: "The solution is to avoid locking in the sysrq case
and trylock in the oops_in_progress case."
This patch follows the suggestion (using exactly the same code as
other serial drivers, e.g. amba-pl011.c) to fix the recursive locking
issue; this avoids getting stuck in a deadlock and lets the log be printed
for sysrq and oops.
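The pattern referenced above, as used by several serial console write paths (a sketch, not the exact msm_serial hunk):

  unsigned long flags;
  int locked = 1;

  if (port->sysrq)
      locked = 0;                 /* port lock is already held */
  else if (oops_in_progress)
      locked = spin_trylock_irqsave(&port->lock, flags);
  else
      spin_lock_irqsave(&port->lock, flags);

  /* ... write the console characters out ... */

  if (locked)
      spin_unlock_irqrestore(&port->lock, flags);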
Heikki Krogerus [Thu, 12 Dec 2019 09:37:13 +0000 (12:37 +0300)]
usb: dwc3: pci: add ID for the Intel Comet Lake -H variant
The original ID that was added for Comet Lake PCH was
actually for the -LP (low power) variant even though the
constant for it said CMLH. Changing that while at it.
Will Deacon [Thu, 12 Dec 2019 09:40:49 +0000 (09:40 +0000)]
KVM: arm64: Ensure 'params' is initialised when looking up sys register
Commit 4b927b94d5df ("KVM: arm/arm64: vgic: Introduce find_reg_by_id()")
introduced 'find_reg_by_id()', which looks up a system register only if
the 'id' index parameter identifies a valid system register. As part of
the patch, existing callers of 'find_reg()' were ported over to the new
interface, but this breaks 'index_to_sys_reg_desc()' in the case that the
initial lookup in the vCPU target table fails because we will then call
into 'find_reg()' for the system register table with an uninitialised
'param' as the key to the lookup.
GCC 10 is bright enough to spot this (amongst a tonne of false positives,
but hey!):
| arch/arm64/kvm/sys_regs.c: In function ‘index_to_sys_reg_desc.part.0.isra’:
| arch/arm64/kvm/sys_regs.c:983:33: warning: ‘params.Op2’ may be used uninitialized in this function [-Wmaybe-uninitialized]
| 983 | (u32)(x)->CRn, (u32)(x)->CRm, (u32)(x)->Op2);
| [...]
Revert the hunk of 4b927b94d5df which breaks 'index_to_sys_reg_desc()' so
that the old behaviour of checking the index upfront is restored.
Guenter Roeck [Thu, 12 Dec 2019 08:34:03 +0000 (16:34 +0800)]
nios2: Fix ioremap
Commit 5ace77e0b41a ("nios2: remove __ioremap") removed the following code,
with the argument that cacheflag is always 0 and the expression would
therefore always be false.
crypto: arm/curve25519 - add arch-specific key generation function
Somehow this was forgotten when Zinc was being split into oddly shaped
pieces, resulting in linker errors. The x86_64 glue has a specific key
generation implementation, but the Arm one does not. However, it can
still receive the NEON speedups by calling the ordinary DH function
using the base point.
Jens Axboe [Thu, 12 Dec 2019 03:49:58 +0000 (20:49 -0700)]
Merge branch 'md-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-linus
Pull MD fixes from Song.
* 'md-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: make sure desc_nr less than MD_SB_DISKS
md: raid1: check rdev before reference in raid1_sync_request func
raid5: need to set STRIPE_HANDLE for batch head
Leonard Crestez [Tue, 10 Dec 2019 21:49:28 +0000 (23:49 +0200)]
ARM: imx: Fix boot crash if ocotp is not found
The imx_soc_device_init function tries to fetch the ocotp regmap in
order to read the SoC serial number. If the regmap fetch fails then a message is
printed but regmap_read is called anyway and the system crashes.
Failing to look up the ocotp regmap shouldn't be a fatal boot error, so check
that the pointer is valid.
The only side effect of an ocotp lookup failure now is that the serial number
will be reported as all zeros, which is acceptable.
Linus Torvalds [Thu, 12 Dec 2019 02:10:40 +0000 (18:10 -0800)]
Merge tag 'afs-fixes-20191211' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
"Fixes for AFS plus one patch to make debugging easier:
- Fix how addresses are matched to server records. This is currently
incorrect which means cache invalidation callbacks from the server
don't necessarily get delivered correctly. This causes stale data
and metadata to be seen under some circumstances.
- Make the dynamic root superblock R/W so that rpm/dnf can reapply
the SELinux label to it when upgrading the Fedora filesystem-afs
package. If the filesystem is R/O, this fails and the upgrade
fails.
It might be better in future to allow setxattr from an LSM to
bypass the R/O protections, if only for pseudo-filesystems.
- Fix the parsing of mountpoint strings. The mountpoint object has to
have a terminal dot, whereas the source/device string passed to
mount should not. This confuses type-forcing suffix detection
leading to the wrong volume variant being mounted.
- Make lookups in the dynamic root superblock for creation events
(such as mkdir) fail with EOPNOTSUPP rather than something like
EEXIST. The dynamic root only allows implicit creation by the
->lookup() method - and only if the target cell exists.
- Fix the looking up of an AFS superblock to include the cell in the
matching key - otherwise all volumes with the same ID number are
treated as the same thing, irrespective of which cell they're in.
- Show the volume name of each volume in the volume records displayed
in /proc/net/afs/<cell>/volumes. This proved useful in debugging as
it provides a way to map the volume IDs to names, where the names
are what appear in /proc/mounts"
* tag 'afs-fixes-20191211' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Show volume name in /proc/net/afs/<cell>/volumes
afs: Fix missing cell comparison in afs_test_super()
afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP
afs: Fix mountpoint parsing
afs: Fix SELinux setting security label on /afs
afs: Fix afs_find_server lookups for ipv4 peers
Dan Williams [Thu, 14 Nov 2019 00:22:06 +0000 (16:22 -0800)]
tools/testing/nvdimm: Fix mock support for ioremap
After commit d092a8707326 "arch: rely on asm-generic/io.h for default
ioremap_* definitions" the ioremap_nocache() symbol has been replaced
with ioremap(). Update the mocked symbol list for nvdimm testing.
Daniel T. Lee [Thu, 5 Dec 2019 08:01:14 +0000 (17:01 +0900)]
samples: bpf: fix syscall_tp due to unused syscall
Currently, open() is called from the user program and it invokes the syscall
'sys_openat', not 'sys_open'. This leads to an error on the user side,
because the counter maps stay at zero since no function such as
'sys_open' is called.
This commit adds kernel bpf programs which are attached to the
tracepoints 'sys_enter_openat' and 'sys_exit_openat'.
Fixes: 1da236b6be963 ("bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints") Signed-off-by: Daniel T. Lee <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
Daniel T. Lee [Thu, 5 Dec 2019 08:01:13 +0000 (17:01 +0900)]
samples: bpf: Replace symbol compare of trace_event
Previously, when this sample was added in commit 1c47910ef8013
("samples/bpf: add perf_event+bpf example"), the symbols 'sys_read' and
'sys_write' were used without prefixes. But currently there are
no exact symbols with these names under kallsyms and this leads to failure.
This commit changes the exact compare to a substring compare, to stay
compatible with both exact and prefixed symbols.
bpf: Make BPF trampoline use register_ftrace_direct() API
Make BPF trampoline attach its generated assembly code to kernel functions via
register_ftrace_direct() API. It helps ftrace-based tracers co-exist with BPF
trampoline on the same kernel function. It also switches attaching logic from
arch specific text_poke to generic ftrace that is available on many
architectures. text_poke is still necessary for bpf-to-bpf attach and for
bpf_tail_call optimization.
Jens Axboe [Wed, 11 Dec 2019 22:55:43 +0000 (15:55 -0700)]
io_uring: ensure we return -EINVAL on unknown opcode
If we submit an unknown opcode and have fd == -1, io_op_needs_file()
will return true as we default to needing a file. Then when we go and
assign the file, we find the 'fd' invalid and return -EBADF. We really
should be returning -EINVAL for that case, as we normally do for
unsupported opcodes.
Change io_op_needs_file() to have the following return values:
0 - does not need a file
1 - does need a file
< 0 - error value
and use this to pass back the right value for this invalid case.
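A sketch of the new convention (the opcode lists are illustrative, not the complete set io_uring handled in 5.5); the caller returns the negative value directly instead of falling through to the -EBADF path:

  static int io_op_needs_file(const struct io_uring_sqe *sqe)
  {
      switch (sqe->opcode) {
      case IORING_OP_NOP:
      case IORING_OP_POLL_REMOVE:
      case IORING_OP_TIMEOUT:
          return 0;            /* no file needed */
      case IORING_OP_READV:
      case IORING_OP_WRITEV:
      case IORING_OP_FSYNC:
          return 1;            /* needs a file */
      default:
          return -EINVAL;      /* unknown opcode */
      }
  }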
Brian Foster [Wed, 11 Dec 2019 21:18:38 +0000 (13:18 -0800)]
xfs: stabilize insert range start boundary to avoid COW writeback race
generic/522 (fsx) occasionally fails with a file corruption due to
an insert range operation. The primary characteristic of the
corruption is a misplaced insert range operation that differs from
the requested target offset. The reason for this behavior is a race
between the extent shift sequence of an insert range and a COW
writeback completion that causes a front merge with the first extent
in the shift.
The shift preparation function flushes and unmaps from the target
offset of the operation to the end of the file to ensure no
modifications can be made and page cache is invalidated before file
data is shifted. An insert range operation then splits the extent at
the target offset, if necessary, and begins to shift the start
offset of each extent starting from the end of the file to the start
offset. The shift sequence operates at extent level and so depends
on the preparation sequence to guarantee no changes can be made to
the target range during the shift. If the block immediately prior to
the target offset was dirty and shared, however, it can undergo
writeback and move from the COW fork to the data fork at any point
during the shift. If the block is contiguous with the block at the
start offset of the insert range, it can front merge and alter the
start offset of the extent. Once the shift sequence reaches the
target offset, it shifts based on the latest start offset and
silently changes the target offset of the operation and corrupts the
file.
To address this problem, update the shift preparation code to
stabilize the start boundary along with the full range of the
insert. Also update the existing corruption check to fail if any
extent is shifted with a start offset behind the target offset of
the insert range. This prevents insert from racing with COW
writeback completion and fails loudly in the event of an unexpected
extent shift.
Linus Torvalds [Wed, 11 Dec 2019 20:25:32 +0000 (12:25 -0800)]
Merge tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"Mainly address a regression reported by David recently observed
together with overlayfs due to the improper return value of
listxattr() without xattr. Update outdated expressions in document as
well.
Summary:
- Fix improper return value of listxattr() with no xattr
- Keep up documentation with latest code"
* tag 'erofs-for-5.5-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: update documentation
erofs: zero out when listxattr is called with no xattr
Linus Torvalds [Wed, 11 Dec 2019 20:22:38 +0000 (12:22 -0800)]
Merge tag 'trace-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Remove code I accidentally applied when doing a minor fix up to a
patch, and then using "git commit -a --amend", which pulled in some
other changes I was playing with.
- Remove an used variable in trace_events_inject code
- Fix function graph tracer when it traces a ftrace direct function.
It will now ignore tracing a function that has a ftrace direct
trampoline attached. This is needed for eBPF to use the ftrace direct
code.
* tag 'trace-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix function_graph tracer interaction with BPF trampoline
tracing: remove set but not used variable 'buffer'
module: Remove accidental change of module_enable_x()
Linus Torvalds [Wed, 11 Dec 2019 19:46:19 +0000 (11:46 -0800)]
pipe: simplify signal handling in pipe_read() and add comments
There's no need to separately check for signals while inside the locked
region, since we're going to do "wait_event_interruptible()" right
afterwards anyway, and the error handling is much simpler there.
The check for whether we had already read anything was also redundant,
since we no longer do the odd merging of reads when there are pending
writers.
But perhaps more importantly, this adds commentary about why we still
need to wake up possible writers even though we didn't read any data,
and why we can skip all the finishing touches now if we get a signal (or
had a signal pending) while waiting for more data.
[ This is a split-out cleanup from my "make pipe IO use exclusive wait
queues" thing, which I can't apply because it triggers a nasty bug in
the GNU make jobserver - Linus ]