Git Repo - linux.git/log

x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}

Function bodies are very similar and are going to grow more almost
identical code. Add a bool arg to determine whether SPEC_CTRL is being set
for the guest or restored to the host.

No functional changes.

Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>

x86/speculation: Rework speculative_store_bypass_update()

The upcoming support for the virtual SPEC_CTRL MSR on AMD needs to reuse
speculative_store_bypass_update() to avoid code duplication. Add an
argument for supplying a thread info (TIF) value and create a wrapper
speculative_store_bypass_update_current() which is used at the existing
call site.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>

x86/speculation: Add virtualized speculative store bypass disable support

Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD). To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.

Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present.
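
As a rough illustration of the enumeration step, the following Python sketch
tests bit 25 of a raw CPUID 0x80000008 EBX value. Only the leaf, the bit
position and the MSR number come from the text above; the helper name, the
constant names and any sample register values are hypothetical.

```python
# Numbers taken from the commit message; the names are illustrative only.
CPUID_LEAF = 0x80000008
VIRT_SSBD_EBX_BIT = 25
VIRT_SPEC_CTRL_MSR = 0xC001011F


def has_virt_ssbd(ebx: int) -> bool:
    """Return True if bit 25 of the given CPUID 0x80000008 EBX value is set."""
    return bool((ebx >> VIRT_SSBD_EBX_BIT) & 1)
```

A caller would pass the EBX value obtained from CPUID and only touch the MSR
when the helper returns True.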

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>

x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL

AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store
Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care
about the bit position of the SSBD bit and thus facilitate migration.
Also, the sibling coordination on Family 17H CPUs can only be done on
the host.

Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an
extra argument for the VIRT_SPEC_CTRL MSR.

Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU
data structure which is going to be used in later patches for the actual
implementation.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>

x86/speculation: Handle HT correctly on AMD

The AMD64_LS_CFG MSR is a per core MSR on Family 17H CPUs. That means when
hyperthreading is enabled the SSBD bit toggle needs to take both sibling
threads into account. Otherwise the following situation can happen:

CPU0            CPU1

disable SSB
                disable SSB
                enable  SSB <- Enables it for the Core, i.e. for CPU0 as well

So after the SSB enable on CPU1 the task on CPU0 runs with SSB enabled
again.

On Intel the SSBD control is per core as well, but the synchronization
logic is implemented behind the per thread SPEC_CTRL MSR. It works like
this:

CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL

i.e. if one of the threads enables a mitigation then this affects both and
the mitigation is only disabled in the core when both threads disabled it.

Add the necessary synchronization logic for AMD family 17H. Unfortunately
that requires a spinlock to serialize the access to the MSR, but the locks
are only shared between siblings.
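
The OR semantics described above can be modelled with a small Python sketch.
This is not the kernel implementation: the class names, the counter and the
use of threading.Lock in place of a spinlock are all assumptions made for
illustration. "SSBD requested" here means "disable SSB" in the diagram above.

```python
import threading


class CoreSsbdState:
    """Hypothetical per-core state shared by the sibling threads of one core."""

    def __init__(self):
        self.lock = threading.Lock()  # stands in for the spinlock shared by siblings
        self.requests = 0             # number of siblings currently requesting SSBD
        self.core_ssbd = False        # models the per-core SSBD bit


class SiblingThread:
    def __init__(self, core):
        self.core = core
        self.wants_ssbd = False

    def set_ssbd(self, wanted):
        """Request or drop SSBD for this thread and update the shared core bit."""
        if wanted == self.wants_ssbd:
            return
        with self.core.lock:
            self.wants_ssbd = wanted
            self.core.requests += 1 if wanted else -1
            # CORE = THREAD0 | THREAD1: the bit stays set while any sibling wants it.
            self.core.core_ssbd = self.core.requests > 0
```

With two SiblingThread objects sharing one CoreSsbdState, dropping the
request on one sibling leaves core_ssbd set as long as the other sibling
still requests it, which is the behaviour the diagram shows to be missing.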

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>

x86/cpufeatures: Add FEATURE_ZEN

Add a ZEN feature bit so family-dependent static_cpu_has() optimizations
can be built for ZEN.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>

x86/cpufeatures: Disentangle SSBD enumeration

The SSBD enumeration is, similarly to the other bits, magically shared
between Intel and AMD even though the mechanisms are different.

Make X86_FEATURE_SSBD synthetic and set it depending on the vendor specific
features or family dependent setup.

Change the Intel bit to X86_FEATURE_SPEC_CTRL_SSBD to denote that SSBD is
controlled via MSR_SPEC_CTRL and fix up the usage sites.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>

x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS

The availability of the SPEC_CTRL MSR is enumerated by a CPUID bit on
Intel and implied by IBRS or STIBP support on AMD. That's just confusing
and in case an AMD CPU has IBRS not supported because the underlying
problem has been fixed but has another bit valid in the SPEC_CTRL MSR,
the thing falls apart.

Add a synthetic feature bit X86_FEATURE_MSR_SPEC_CTRL to denote the
availability on both Intel and AMD.

While at it replace the boot_cpu_has() checks with static_cpu_has() where
possible. This prevents late microcode loading from exposing SPEC_CTRL, but
late loading is already very limited as it does not reevaluate the
mitigation options and other bits and pieces. Having static_cpu_has() is
the simplest and least fragile solution.

Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>

x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP

Intel and AMD have different CPUID bits for these features, hence use
synthetic bits which get set for the respective vendor in
init_speculation_control(), so that debacles like the one described in the
commit message of

c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload")

don't happen anymore.

Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Tested-by: Jörg Otte <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]

KVM: SVM: Move spec control call after restore of GS

svm_vcpu_run() invokes x86_spec_ctrl_restore_host() after VMEXIT, but
before the host GS is restored. x86_spec_ctrl_restore_host() uses 'current'
to determine the host SSBD state of the thread. 'current' is GS based, but
host GS is not yet restored and the access causes a triple fault.

Move the call after the host GS restore.

Fixes: 885f82bfbc6f ("x86/process: Allow runtime control of Speculative Store Bypass")
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Borislav Petkov <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>

powerpc/powernv: Fix NVRAM sleep in invalid context when crashing

Similarly to opal_event_shutdown, opal_nvram_write can be called in
the crash path with irqs disabled. Special case the delay to avoid
sleeping in invalid context.

Fixes: 3b8070335f75 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops")
Cc: [email protected] # v3.2
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

MAINTAINERS: add entry for STM32 I2C driver

Add I2C/SMBUS Driver entry for STM32 family from ST Microelectronics.

Signed-off-by: Pierre-Yves MORDRET <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>

Makefile: disable PIE before testing asm goto

Since commit e501ce957a78 ("x86: Force asm-goto"), the aarch64 build on
distributions which enable PIE by default (e.g. openSUSE Tumbleweed) does
not detect support for asm goto correctly. The problem is that the ARM
specific part of scripts/gcc-goto.sh fails with PIE even with recent gcc
versions. Moving the asm goto detection up in the Makefile put it before the
place where we disable PIE. As a result, the kernel is built without jump
label support.

Move the lines disabling PIE before the asm goto test to make it work.

Fixes: e501ce957a78 ("x86: Force asm-goto")
Reported-by: Andreas Faerber <[email protected]>
Signed-off-by: Michal Kubecek <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

kbuild: gcov: enable -fno-tree-loop-im if supported

Clang does not recognize this compiler option.

Reported-by: Prasad Sodagudi <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>

btrfs: fix crash when trying to resume balance without the resume flag

We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance()
only, which isn't called during the remount. So when resuming from
the paused balance we hit the bug:

kernel: kernel BUG at fs/btrfs/volumes.c:3890!
::
kernel:  balance_kthread+0x51/0x60 [btrfs]
kernel:  kthread+0x111/0x130
::
kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8

Reproducer:
  On a mounted filesystem:

  btrfs balance start --full-balance /btrfs
  btrfs balance pause /btrfs
  mount -o remount,ro /dev/sdb /btrfs
  mount -o remount,rw /dev/sdb /btrfs

To fix this set the BTRFS_BALANCE_RESUME flag in
btrfs_resume_balance_async().

CC: [email protected] # 4.4+
Signed-off-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: Fix delalloc inodes invalidation during transaction abort

When a transaction is aborted btrfs_cleanup_transaction is called to
clean up all the various in-flight bits and pieces which might be
active. One of those is delalloc inodes - inodes which have dirty
pages which haven't been persisted yet. Currently the process of
freeing such delalloc inodes in exceptional circumstances such as
transaction abort boils down to calling btrfs_invalidate_inodes whose
sole job is to invalidate the dentries for all inodes related to a
root. This is in fact wrong and insufficient since such delalloc inodes
will likely have pending pages or ordered-extents and will be linked to
the sb->s_inode_list. This means that unmounting a btrfs instance with
an aborted transaction could potentially leave inodes/their pages
visible to the system long after their superblock has been freed. This
in turn leads to a "use-after-free" situation once page shrink is
triggered. This situation could be simulated by running generic/019
which would cause such inodes to be left hanging, followed by
generic/176 which causes memory pressure and page eviction which leads
to touching the freed super block instance. This situation is
additionally detected by the unmount code of VFS with the following
message:

"VFS: Busy inodes after unmount of Self-destruct in 5 seconds. Have a nice day..."

Additionally btrfs hits WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
in free_fs_root for the same reason.

This patch aims to rectify the situation by doing the following:

1. Change btrfs_destroy_delalloc_inodes so that it calls
invalidate_inode_pages2 for every inode on the delalloc list, this
ensures that all the pages of the inode are released. This function
boils down to calling btrfs_releasepage. During testing I observed cases
where inodes on the delalloc list had an i_count of 0, so this
necessitates using igrab to be sure we are working on a non-freed inode.

2. Since calling btrfs_releasepage might queue delayed iputs move the
call out to btrfs_cleanup_transaction in btrfs_error_commit_super before
calling run_delayed_iputs for the last time. This is necessary to ensure
that delayed iputs are run.

Note: this patch is tagged for 4.14 stable. The fix applies to older
versions too, but needs to be backported manually due to conflicts.

CC: [email protected] # 4.14.x: 2b8773313494: btrfs: Split btrfs_del_delalloc_inode into 2 functions
CC: [email protected] # 4.14.x
Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ add comment to igrab ]
Signed-off-by: David Sterba <[email protected]>

btrfs: Split btrfs_del_delalloc_inode into 2 functions

This is in preparation of fixing delalloc inodes leakage on transaction
abort. Also export the new function.

Signed-off-by: Nikolay Borisov <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: fix reading stale metadata blocks after degraded raid1 mounts

If a btree block, aka. extent buffer, is not available in the extent
buffer cache, it'll be read out from the disk instead, i.e.

btrfs_search_slot()
  read_block_for_search()  # hold parent and its lock, go to read child
    btrfs_release_path()
    read_tree_block()  # read child

Unfortunately, the parent lock got released before reading child, so
commit 5bdd3536cbbe ("Btrfs: Fix block generation verification race") had
used 0 as parent transid to read the child block.  It forces
read_tree_block() not to check whether the parent transid differs from the
generation id of the child that it reads out from disk.

A simple PoC is included in btrfs/124,

0. A two-disk raid1 btrfs,

1. Right after mkfs.btrfs, block A is allocated to be device tree's root.

2. Mount this filesystem and put it in use, after a while, device tree's
   root got COW but block A hasn't been allocated/overwritten yet.

3. Umount it and reload the btrfs module to remove both disks from the
   global @fs_devices list.

4. mount -odegraded dev1 and write some data, so now block A is allocated
   to be a leaf in checksum tree.  Note that only dev1 has the latest
   metadata of this filesystem.

5. Umount it and mount it again normally (with both disks), since raid1
   can pick up one disk by the writer task's pid, if btrfs_search_slot()
   needs to read block A, dev2 which does NOT have the latest metadata
   might be read for block A, then we got a stale block A.

6. As parent transid is not checked, block A is marked as uptodate and
   put into the extent buffer cache, so the future search won't bother
   to read disk again, which means it'll make changes on this stale
   one and make it dirty and flush it onto disk.

To avoid the problem, parent transid needs to be passed to
read_tree_block().

In order to get a valid parent transid, we need to hold the parent's
lock until finishing reading child.

This patch needs to be slightly adapted for stable kernels, the
&first_key parameter added to read_tree_block() is from 4.16+
(581c1760415c4). The fix is to replace 0 by 'gen'.
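
The effect of passing 0 versus a real parent transid can be modelled with a
toy Python sketch. It is not btrfs code: the Block type, the list of mirror
copies and this simplified read_tree_block() are assumptions that only mimic
the check described above.

```python
from dataclasses import dataclass


@dataclass
class Block:
    generation: int
    payload: str


def read_tree_block(mirrors, parent_transid):
    """Return the first mirror copy that passes the generation check.

    A parent_transid of 0 disables the check, which models the behaviour
    the commit message describes for the unfixed code.
    """
    for copy in mirrors:
        if parent_transid and copy.generation != parent_transid:
            continue  # stale copy, try the next mirror
        return copy
    raise OSError("no mirror copy has the expected generation")


# dev2 still holds the old block A, dev1 holds the copy written while degraded.
stale = Block(generation=5, payload="old device tree root")
fresh = Block(generation=9, payload="checksum tree leaf")
mirrors = [stale, fresh]
```

In this model read_tree_block(mirrors, 0) hands back the stale copy, while
read_tree_block(mirrors, 9) skips it and returns the current one.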

Fixes: 5bdd3536cbbe ("Btrfs: Fix block generation verification race")
CC: [email protected] # 4.4+
Signed-off-by: Liu Bo <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
[ update changelog ]
Signed-off-by: David Sterba <[email protected]>

btrfs: property: Set incompat flag if lzo/zstd compression is set

Incompat flag of LZO/ZSTD compression should be set at:

1. mount time (-o compress/compress-force)
2. when defrag is done
3. when property is set

Currently 3. is missing and this commit adds this.

This could lead to a filesystem that uses ZSTD but is not marked as
such. If a kernel without ZSTD support encounters a ZSTD compressed
extent, it will handle that but this could be confusing to the user.

Typically the filesystem is mounted with the ZSTD option, but the
discrepancy can arise when a filesystem is never mounted with ZSTD and
then the property on some file is set (and some new extents are
written). A simple mount with -o compress=zstd will fix that up on an
unpatched kernel.

Same goes for LZO, but this has been around for a very long time
(2.6.37) so it's unlikely that a pre-LZO kernel would be used.

Fixes: 5c1aab1dd544 ("btrfs: Add zstd support")
CC: [email protected] # 4.14+
Signed-off-by: Tomohiro Misono <[email protected]>
Reviewed-by: Anand Jain <[email protected]>
Reviewed-by: David Sterba <[email protected]>
[ add user visible impact ]
Signed-off-by: David Sterba <[email protected]>

Btrfs: fix duplicate extents after fsync of file with prealloc extents

In commit 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size
after fsync log replay"), on fsync,  we started to always log all prealloc
extents beyond an inode's i_size in order to avoid losing them after a
power failure. However in some cases this can lead the log replay
code to create duplicate extent items, with different lengths, in the
extent tree. That happens because, as of that commit, we can now log
extent items based on extent maps that are not on the "modified" list
of extent maps of the inode's extent map tree. Logging extent items based
on extent maps is used during the fast fsync path to save time and for
this to work reliably it requires that the extent maps are not merged
with other adjacent extent maps - having the extent maps in the list
of modified extents gives such guarantee.

Consider the following example, captured during a long run of fsstress,
which illustrates this problem.

We have inode 271, in the filesystem tree (root 5), to which all of the
following operations and discussion apply.

A buffered write starts at offset 312391 with a length of 933471 bytes
(end offset at 1245862). At this point we have, for this inode, the
following extent maps with their field values:

em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613,
      block_len 0, orig_block_len 0
em B, start 40960, orig_start 40960, len 376832, block_start 1106399232,
      block_len 376832, orig_block_len 376832
em C, start 417792, orig_start 417792, len 782336, block_start
      18446744073709551613, block_len 0, orig_block_len 0
em D, start 1200128, orig_start 1200128, len 835584, block_start
      1106776064, block_len 835584, orig_block_len 835584
em E, start 2035712, orig_start 2035712, len 245760, block_start
      1107611648, block_len 245760, orig_block_len 245760

Extent map A corresponds to a hole and extent maps D and E correspond to
preallocated extents.

Extent map D ends where extent map E begins (1106776064 + 835584 =
1107611648), but these extent maps were not merged because they are in
the inode's list of modified extent maps.

An fsync against this inode is made, which triggers the fast path
(BTRFS_INODE_NEEDS_FULL_SYNC is not set). This fsync triggers writeback
of the data previously written using buffered IO, and when the respective
ordered extent finishes, btrfs_drop_extents() is called against the
(aligned) range 311296..1249279. This causes a split of extent map D at
btrfs_drop_extent_cache(), replacing extent map D with a new extent map
D', also added to the list of modified extents,  with the following
values:

em D', start 1249280, orig_start of 1200128,
       block_start 1106825216 (= 1106776064 + 1249280 - 1200128),
       orig_block_len 835584,
       block_len 786432 (835584 - (1249280 - 1200128))
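
The numbers for the dropped range and for extent map D' can be cross-checked
with a few lines of Python. The 4096 byte sector size is an assumption (it is
consistent with the offsets quoted above), and the dictionary layout is only
for illustration.

```python
SECTORSIZE = 4096  # assumption: 4 KiB sectors, consistent with the offsets above


def align_down(x, a):
    return x - (x % a)


def align_up(x, a):
    return align_down(x + a - 1, a)


# Buffered write from the example.
write_start, write_len = 312391, 933471
drop_start = align_down(write_start, SECTORSIZE)
drop_end = align_up(write_start + write_len, SECTORSIZE) - 1

# Extent maps D and E before the split.
d = {"start": 1200128, "len": 835584, "block_start": 1106776064, "block_len": 835584}
e = {"start": 2035712, "block_start": 1107611648}

# D' keeps only the part of D that lies after the dropped range.
cut = (drop_end + 1) - d["start"]
d_prime = {
    "start": drop_end + 1,
    "orig_start": d["start"],
    "block_start": d["block_start"] + cut,
    "block_len": d["block_len"] - cut,
    "orig_block_len": d["block_len"],
}
```

The same dictionaries also confirm that D and E are adjacent both in the file
(d["start"] + d["len"] equals e["start"]) and on disk.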

Then, during the fast fsync, btrfs_log_changed_extents() is called and
extent maps D' and E are removed from the list of modified extents. The
flag EXTENT_FLAG_LOGGING is also set on them. After the extents are logged
clear_em_logging() is called on each of them, and that causes extent map E
to be merged with extent map D' (try_merge_map()), resulting in D' being
deleted and E adjusted to:

em E, start 1249280, orig_start 1200128, len 1032192,
      block_start 1106825216, block_len 1032192,
      orig_block_len 245760

A direct IO write at offset 1847296 and length of 360448 bytes (end offset
at 2207744) starts, and at that moment the following extent maps exist for
our inode:

em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613,
      block_len 0, orig_block_len 0
em B, start 40960, orig_start 40960, len 270336, block_start 1106399232,
      block_len 270336, orig_block_len 376832
em C, start 311296, orig_start 311296, len 937984, block_start 1112842240,
      block_len 937984, orig_block_len 937984
em E (prealloc), start 1249280, orig_start 1200128, len 1032192,
      block_start 1106825216, block_len 1032192, orig_block_len 245760

The dio write results in drop_extent_cache() being called twice. The first
time for a range that starts at offset 1847296 and ends at offset 2035711
(length of 188416), which results in a double split of extent map E,
replacing it with two new extent maps:

em F, start 1249280, orig_start 1200128, block_start 1106825216,
      block_len 598016, orig_block_len 598016
em G, start 2035712, orig_start 1200128, block_start 1107611648,
      block_len 245760, orig_block_len 1032192

It also creates a new extent map that represents a part of the requested
IO (through create_io_em()):

em H, start 1847296, len 188416, block_start 1107423232, block_len 188416

The second call to drop_extent_cache() has a range with a start offset of
2035712 and end offset of 2207743 (length of 172032). This leads to
replacing extent map G with a new extent map I with the following values:

em I, start 2207744, orig_start 1200128, block_start 1107783680,
      block_len 73728, orig_block_len 1032192

It also creates a new extent map that represents the second part of the
requested IO (through create_io_em()):

em J, start 2035712, len 172032, block_start 1107611648, block_len 172032

The dio write set the inode's i_size to 2207744 bytes.

After the dio write the inode has the following extent maps:

em A, start 0, orig_start 0, len 40960, block_start 18446744073709551613,
      block_len 0, orig_block_len 0
em B, start 40960, orig_start 40960, len 270336, block_start 1106399232,
      block_len 270336, orig_block_len 376832
em C, start 311296, orig_start 311296, len 937984, block_start 1112842240,
      block_len 937984, orig_block_len 937984
em F, start 1249280, orig_start 1200128, len 598016,
      block_start 1106825216, block_len 598016, orig_block_len 598016
em H, start 1847296, orig_start 1200128, len 188416,
      block_start 1107423232, block_len 188416, orig_block_len 835584
em J, start 2035712, orig_start 2035712, len 172032,
      block_start 1107611648, block_len 172032, orig_block_len 245760
em I, start 2207744, orig_start 1200128, len 73728,
      block_start 1107783680, block_len 73728, orig_block_len 1032192

Now make some change to the file, like adding an xattr for example, and then
fsync it again. This triggers a fast fsync path, and as of commit
471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync
log replay"), we use the extent map I to log a file extent item because
it's a prealloc extent and it starts at an offset matching the inode's
i_size. However when we log it, we create a file extent item with a value
for the disk byte location that is wrong, as can be seen from the
following output of "btrfs inspect-internal dump-tree":

item 1 key (271 EXTENT_DATA 2207744) itemoff 3782 itemsize 53
     generation 22 type 2 (prealloc)
     prealloc data disk byte 1106776064 nr 1032192
     prealloc data offset 1007616 nr 73728

Here the disk byte value corresponds to calculation based on some fields
from the extent map I:

  1106776064 = block_start (1107783680) - 1007616 (extent_offset)
  extent_offset = 2207744 (start) - 1200128 (orig_start) = 1007616

The disk byte value of 1106776064 clashes with disk byte values of the
file extent items at offsets 1249280 and 1847296 in the fs tree:

        item 6 key (271 EXTENT_DATA 1249280) itemoff 3568 itemsize 53
                generation 20 type 2 (prealloc)
                prealloc data disk byte 1106776064 nr 835584
                prealloc data offset 49152 nr 598016
        item 7 key (271 EXTENT_DATA 1847296) itemoff 3515 itemsize 53
                generation 20 type 1 (regular)
                extent data disk byte 1106776064 nr 835584
                extent data offset 647168 nr 188416 ram 835584
                extent compression 0 (none)
        item 8 key (271 EXTENT_DATA 2035712) itemoff 3462 itemsize 53
                generation 20 type 1 (regular)
                extent data disk byte 1107611648 nr 245760
                extent data offset 0 nr 172032 ram 245760
                extent compression 0 (none)
        item 9 key (271 EXTENT_DATA 2207744) itemoff 3409 itemsize 53
                generation 20 type 2 (prealloc)
                prealloc data disk byte 1107611648 nr 245760
                prealloc data offset 172032 nr 73728

Instead of the disk byte value of 1106776064, the value of 1107611648
should have been logged. Also the data offset value should have been
172032 and not 1007616.
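
The wrong and the expected values can be reproduced with the same formula,
changing only orig_start. The helper below is a Python sketch written for
this example and is not a btrfs function.

```python
def logged_disk_bytenr(block_start, start, orig_start):
    """Return (disk byte, data offset) as derived from extent map fields."""
    extent_offset = start - orig_start
    return block_start - extent_offset, extent_offset


# Extent map I as it exists after the merge and the splits: orig_start is
# inherited from the former extent map D.
wrong = logged_disk_bytenr(1107783680, 2207744, 1200128)

# Same file range, but with the start of the extent that item 9 in the fs
# tree actually refers to (file offset 2035712).
right = logged_disk_bytenr(1107783680, 2207744, 2035712)
```

Here wrong evaluates to (1106776064, 1007616) and right to
(1107611648, 172032), matching the logged item and item 9 respectively.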

After a log replay we end up getting two extent items in the extent tree
with different lengths, one of 835584, which is correct and existed
before the log replay, and another one of 1032192 which is wrong and is
based on the logged file extent item:

item 12 key (1106776064 EXTENT_ITEM 835584) itemoff 3406 itemsize 53
    refs 2 gen 15 flags DATA
    extent data backref root 5 objectid 271 offset 1200128 count 2
item 13 key (1106776064 EXTENT_ITEM 1032192) itemoff 3353 itemsize 53
    refs 1 gen 22 flags DATA
    extent data backref root 5 objectid 271 offset 1200128 count 1

Obviously this leads to many problems and a filesystem check reports many
errors:

(...)
checking extents
Extent back ref already exists for 1106776064 parent 0 root 5 owner 271 offset 1200128 num_refs 1
extent item 1106776064 has multiple extent items
ref mismatch on [1106776064 835584] extent item 2, found 3
Incorrect local backref count on 1106776064 root 5 owner 271 offset 1200128 found 2 wanted 1 back 0x55b1d0ad7680
Backref 1106776064 root 5 owner 271 offset 1200128 num_refs 0 not found in extent tree
Incorrect local backref count on 1106776064 root 5 owner 271 offset 1200128 found 1 wanted 0 back 0x55b1d0ad4e70
Backref bytes do not match extent backref, bytenr=1106776064, ref bytes=835584, backref bytes=1032192
backpointer mismatch on [1106776064 835584]
checking free space cache
block group 1103101952 has wrong amount of free space
failed to load free space cache for block group 1103101952
checking fs roots
(...)

So fix this by logging the prealloc extents beyond the inode's i_size
based on searches in the subvolume tree instead of the extent maps.

Fixes: 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay")
CC: [email protected] # 4.14+
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>

dmaengine: qcom: bam_dma: check if the runtime pm enabled

Disabling runtime pm at probe is not sufficient to get BAM working
on remotely controlled instances. pm_runtime_get_sync() would return
-EACCES in such cases.
So check if runtime pm is enabled before returning an error from bam functions.
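
The guard can be pictured with a toy Python model. Nothing here is the driver
code or the kernel runtime PM API: the Device class, toy_get_sync() and
guarded_get_sync() are hypothetical stand-ins that only mimic the -EACCES
behaviour described above.

```python
EACCES = 13


class Device:
    def __init__(self, runtime_pm_enabled):
        self.runtime_pm_enabled = runtime_pm_enabled
        self.usage_count = 0


def toy_get_sync(dev):
    """Toy stand-in for a runtime PM get: fails when runtime PM is disabled."""
    if not dev.runtime_pm_enabled:
        return -EACCES
    dev.usage_count += 1
    return 0


def guarded_get_sync(dev):
    """Hypothetical wrapper: only call into runtime PM when it is enabled."""
    if dev.runtime_pm_enabled:
        return toy_get_sync(dev)
    return 0  # nothing to resume, so do not report an error
```

For a Device created with runtime_pm_enabled=False the unguarded call yields
-13 while the guarded one yields 0, so callers no longer bail out.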

Fixes: 5b4a68952a89 ("dmaengine: qcom: bam_dma: disable runtime pm on remote controlled")
Signed-off-by: Srinivas Kandagatla <[email protected]>
Signed-off-by: Vinod Koul <[email protected]>

KVM: s390: vsie: fix < 8k check for the itdba

By missing an "L", we might detect some addresses to be <8k,
although they are not.

e.g. for itdba = 100001fff
!(gpa & ~0x1fffU) -> 1
!(gpa & ~0x1fffUL) -> 0

So we would report a SIE validity intercept although everything is fine.
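
The difference between the two masks can be modelled in Python, which has
unbounded integers, by truncating explicitly. The sketch assumes a 32-bit
unsigned int and a 64-bit unsigned long, as on s390x; the constant and
function names are made up for this example.

```python
U32 = 0xFFFFFFFF
U64 = 0xFFFFFFFFFFFFFFFF

MASK_U = ~0x1FFF & U32    # models ~0x1fffU after zero-extension to 64 bits
MASK_UL = ~0x1FFF & U64   # models ~0x1fffUL with a 64-bit unsigned long


def below_8k(gpa, mask):
    """Model of the !(gpa & mask) check."""
    return not (gpa & mask)


itdba = 0x100001FFF
```

below_8k(itdba, MASK_U) is True because the 32-bit mask has no bits set above
bit 31, whereas below_8k(itdba, MASK_UL) is False as intended.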

Fixes: 166ecb3 ("KVM: s390: vsie: support transactional execution")
Reported-by: Dan Carpenter <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Janosch Frank <[email protected]>
Cc: [email protected] # v4.8+
Signed-off-by: Christian Borntraeger <[email protected]>

KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path

A radix guest can execute tlbie instructions to invalidate TLB entries.
After a tlbie or a group of tlbies, it must then do the architected
sequence eieio; tlbsync; ptesync to ensure that the TLB invalidation
has been processed by all CPUs in the system before it can rely on
no CPU using any translation that it just invalidated.

In fact it is the ptesync which does the actual synchronization in
this sequence, and hardware has a requirement that the ptesync must
be executed on the same CPU thread as the tlbies which it is expected
to order. Thus, if a vCPU gets moved from one physical CPU to
another after it has done some tlbies but before it can get to do the
ptesync, the ptesync will not have the desired effect when it is
executed on the second physical CPU.

To fix this, we do a ptesync in the exit path for radix guests. If
there are any pending tlbies, this will wait for them to complete.
If there aren't, then ptesync will just do the same as sync.

Signed-off-by: Paul Mackerras <[email protected]>

KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change

When a vcpu priority (CPPR) is set to a lower value (masking more
interrupts), we stop processing interrupts already in the queue
for the priorities that have now been masked.

If those interrupts were previously re-routed to a different
CPU, they might still be stuck until the older one that has
them in its queue processes them. In the case of guest CPU
unplug, that can be never.

To address that without creating additional overhead for
the normal interrupt processing path, this changes H_CPPR
handling so that when such a priority change occurs, we
scan the interrupt queue for that vCPU, and for any
interrupt in there that has been re-routed, we replace it
with a dummy and force a re-trigger.
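
The queue scan can be outlined in Python as follows. This is only a sketch of
the idea: the QueuedIrq type, the use of None as the dummy entry and the
returned list of interrupts to re-trigger are assumptions, not the XIVE data
structures.

```python
from dataclasses import dataclass

DUMMY = None  # hypothetical representation of the dummy queue entry


@dataclass
class QueuedIrq:
    number: int
    rerouted: bool  # True if the interrupt has since been routed elsewhere


def rescan_queue(queue):
    """Replace re-routed entries with a dummy; return the irqs to re-trigger."""
    retrigger = []
    for i, entry in enumerate(queue):
        if entry is not DUMMY and entry.rerouted:
            retrigger.append(entry.number)
            queue[i] = DUMMY
    return retrigger
```

Entries that still target this vCPU stay in the queue untouched, so the
normal processing path pays no extra cost.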

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Tested-by: Alexey Kardashevskiy <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>

KVM: PPC: Book3S HV: Make radix clear pte when unmapping

The current partition table unmap code clears the _PAGE_PRESENT bit
out of the pte, which leaves pud_huge/pmd_huge true and does not
clear pud_present/pmd_present. This can confuse subsequent page
faults and possibly lead to the guest looping doing continual
hypervisor page faults.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>

KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page

The standard eieio ; tlbsync ; ptesync must follow tlbie to ensure it
is ordered with respect to subsequent operations.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>

KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry

Currently, the HV KVM guest entry/exit code adds the timebase offset
from the vcore struct to the timebase on guest entry, and subtracts
it on guest exit.  That is fine, except that it is possible for
userspace to change the offset using the SET_ONE_REG interface while
the vcore is running, as there is only one timebase offset per vcore
but potentially multiple VCPUs in the vcore.  If that were to happen,
KVM would subtract a different offset on guest exit from that which
it had added on guest entry, leading to the timebase being out of sync
between cores in the host, which then leads to bad things happening
such as hangs and spurious watchdog timeouts.

To fix this, we add a new field 'tb_offset_applied' to the vcore struct
which stores the offset that is currently applied to the timebase.
This value is set from the vcore tb_offset field on guest entry, and
is what is subtracted from the timebase on guest exit.  Since it is
zero when the timebase offset is not applied, we can simplify the
logic in kvmhv_start_timing and kvmhv_accumulate_time.
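
A small Python model shows why snapshotting the applied offset keeps the host
timebase consistent. The class and function names are hypothetical, time does
not advance in the model, and only the tb_offset and tb_offset_applied field
names come from the text above.

```python
class Host:
    def __init__(self, timebase):
        self.timebase = timebase


class VCore:
    def __init__(self, tb_offset):
        self.tb_offset = tb_offset   # may be changed by userspace at any time
        self.tb_offset_applied = 0   # zero while no offset is applied


def guest_entry(host, vc):
    if vc.tb_offset:
        host.timebase += vc.tb_offset
        vc.tb_offset_applied = vc.tb_offset  # remember what was actually added


def guest_exit(host, vc):
    if vc.tb_offset_applied:
        host.timebase -= vc.tb_offset_applied  # undo exactly what entry added
        vc.tb_offset_applied = 0
```

If tb_offset is changed between guest_entry() and guest_exit(), the host
timebase still returns to its original value, because the exit path never
looks at tb_offset.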

In addition, we had secondary threads reading the timebase while
running concurrently with code on the primary thread which would
eventually add or subtract the timebase offset from the timebase.
This occurred while saving or restoring the DEC register value on
the secondary threads.  Although no specific incorrect behaviour has
been observed, this is a race which should be fixed.  To fix it, we
move the DEC saving code to just before we call kvmhv_commence_exit,
and the DEC restoring code to after the point where we have waited
for the primary thread to switch the MMU context and add the timebase
offset.  That way we are sure that the timebase contains the guest
timebase value in both cases.
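
The snapshot idea can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct vcore_model, guest_entry(), guest_exit() and the plain 'timebase' variable are hypothetical stand-ins rather than the KVM code; only the field names tb_offset and tb_offset_applied come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the per-vcore state described above. */
struct vcore_model {
	int64_t tb_offset;		/* may be changed by userspace at any time */
	int64_t tb_offset_applied;	/* offset currently folded into the timebase */
};

/* Stand-in for the hardware timebase register. */
int64_t timebase;

void guest_entry(struct vcore_model *vc)
{
	/* The applied offset is zero whenever no offset is in effect. */
	assert(vc->tb_offset_applied == 0);

	/* Snapshot the offset once and remember exactly what was added. */
	vc->tb_offset_applied = vc->tb_offset;
	timebase += vc->tb_offset_applied;
}

void guest_exit(struct vcore_model *vc)
{
	/* Subtract the snapshot, not the possibly updated tb_offset. */
	timebase -= vc->tb_offset_applied;
	vc->tb_offset_applied = 0;
}

/*
 * Runs one entry/exit cycle in which tb_offset is rewritten while the
 * guest is running and returns the resulting host timebase error.
 * Zero means the host timebase is still in sync.
 */
int64_t host_drift_after_offset_change(void)
{
	struct vcore_model vc = { .tb_offset = 1000, .tb_offset_applied = 0 };
	int64_t before;

	timebase = 5000;
	before = timebase;
	guest_entry(&vc);
	vc.tb_offset = 7000;	/* models SET_ONE_REG racing with the guest */
	guest_exit(&vc);
	return timebase - before;
}
```

Subtracting vc->tb_offset in guest_exit() instead would leave the model's timebase off by the difference between the old and new offsets, which is the failure mode described above.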

Signed-off-by: Paul Mackerras <[email protected]>

Merge branch 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux into drm-fixes

A single fix for a recent regression.

* 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Set dmabuf_size when vmw_dmabuf_init is successful

Merge tag 'drm-misc-fixes-2018-05-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

- core: Fix regression in dev node offsets (Haneen)
- vc4: Fix memory leak on driver close (Eric)
- dumb-buffers: Prevent overflow in DIV_ROUND_UP() (Dan)

Cc: Haneen Mohammed <[email protected]>
Cc: Eric Anholt <[email protected]>
Cc: Dan Carpenter <[email protected]>
* tag 'drm-misc-fixes-2018-05-16' of git://anongit.freedesktop.org/drm/drm-misc:
  drm/dumb-buffers: Integer overflow in drm_mode_create_ioctl()
  drm/vc4: Fix leak of the file_priv that stored the perfmon.
  drm: Match sysfs name in link removal to link creation

Merge tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
"Some of the ftrace internal events use a zero for a data size of a
  field event. This is increasingly important for the histogram trigger
  work that is being extended.

  While auditing trace events, I found that a couple of the xen events
  were used as just marking that a function was called, by creating a
  static array of size zero. This can play havoc with the tracing
  features if these events are used, because a zero size of a static
  array is denoted as a special nul terminated dynamic array (this is
  what the trace_marker code uses). But since the xen events have no
  size, they are not nul terminated, and unexpected results may occur.

  As trace events were never intended on being a marker to denote that a
  function was hit or not, especially since function tracing and kprobes
  can trivially do the same, the best course of action is to simply
  remove these events"

* tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}

tuntap: fix use after free during release

After commit b196d88aba8a ("tun: fix use after free for ptr_ring") we
need to clean up the tx ring during release(). Unfortunately, it does
the cleanup blindly after the socket has been destroyed, which leads to
another use-after-free. Fix this by doing the cleanup before dropping
the last reference to the socket in __tun_detach().

Reported-by: Andrei Vagin <[email protected]>
Acked-by: Andrei Vagin <[email protected]>
Fixes: b196d88aba8a ("tun: fix use after free for ptr_ring")
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'qed-LL2-fixes'

Michal Kalderon says:

====================
qed: LL2 fixes

This series fixes some issues in ll2 related to synchronization
and resource freeing
====================

Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: Fix LL2 race during connection terminate

Stress testing qedi/qedr load/unload led to list_del corruption.
This is due to ll2 connection terminate freeing resources without
verifying that no more ll2 processing will occur.

This patch unregisters the ll2 status block before terminating
the connection to ensure this race does not occur.

Fixes: 1d6cff4fca4366 ("qed: Add iSCSI out of order packet handling")
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: Fix possibility of list corruption during rmmod flows

The ll2 flows of flushing the txq/rxq need to be synchronized with the
regular fp processing. The missing synchronization caused list
corruption during load/unload stress tests.

Fixes: 0a7fb11c23c0f ("qed: Add Light L2 support")
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

qed: LL2 flush isles when connection is closed

Driver should free all pending isles once it gets a FLUSH cqe from FW.
Part of iSCSI out of order flow.

Fixes: 1d6cff4fca4366 ("qed: Add iSCSI out of order packet handling")
Signed-off-by: Ariel Elior <[email protected]>
Signed-off-by: Michal Kalderon <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/sched: fix refcnt leak in the error path of tcf_vlan_init()

Similarly to what was done with commit a52956dfc503 ("net sched actions:
fix refcnt leak in skbmod"), fix the error path of tcf_vlan_init() to avoid
refcnt leaks when wrong value of TCA_VLAN_PUSH_VLAN_PROTOCOL is given.

Fixes: 5026c9b1bafc ("net sched: vlan action fix late binding")
CC: Roman Mashak <[email protected]>
Signed-off-by: Davide Caratti <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: 8390: ne: Fix accidentally removed RBTX4927 support

The configuration settings for RBTX4927 were accidentally removed,
leading to a silently broken network interface.

Re-add the missing settings to fix this.

Fixes: 8eb97ff5a4ec941d ("net: 8390: remove m32r specific bits")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'dsa-bcm_sf2-CFP-fixes'

Florian Fainelli says:

====================
net: dsa: bcm_sf2: CFP fixes

This patch series fixes a number of usability issues with the SF2 Compact Field
Processor code:

- we would not be properly bounds checking the location when we let the kernel
  automatically place rules with RX_CLS_LOC_ANY

- when using IPv6 rules and user space specifies a location identifier we
  would be off by one in what the chain ID (within the Broadcom tag) indicates

- it would be possible to delete one of the two slices of an IPv6 rule while
  leaving the other one programmed, leading to various problems
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: bcm_sf2: Fix IPv6 rule half deletion

It was possible to delete only one half of an IPv6 rule, which would leave
the second half still programmed and possibly in use. Instead of
checking for the unused bitmap, we need to check the unique bitmap, and
refuse any deletion that does not match that criterion. We also need to
move that check from bcm_sf2_cfp_rule_del_one() into its caller,
bcm_sf2_cfp_rule_del(); otherwise we would no longer be able to delete
second halves, since they would not pass the first test.

Fixes: ba0696c22e7c ("net: dsa: bcm_sf2: Add support for IPv6 CFP rules")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: bcm_sf2: Fix IPv6 rules and chain ID

We had several issues that would make the programming of IPv6 rules both
inconsistent and error prone:

- the chain ID that we would be asking the hardware to put in the
  packet's Broadcom tag would be off by one, it would return one of the
  two indexes, but not the one user-space specified

- when a user specified a particular location to insert a CFP rule at,
  we would not be returning the same index, which would be confusing if
  nothing else

- finally, like IPv4, it would be possible to overflow the last entry by
  re-programming it

Fix this by swapping the usage of rule_index[0] and rule_index[1] where
relevant in order to return a consistent and correct user-space
experience.

Fixes: ba0696c22e7c ("net: dsa: bcm_sf2: Add support for IPv6 CFP rules")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: bcm_sf2: Fix RX_CLS_LOC_ANY overwrite for last rule

When we let the kernel pick up a rule location with RX_CLS_LOC_ANY, we
would be able to overwrite the last rules because of a number of issues.

The IPv4 code path would not be checking that rule_index is within
bounds, and it would also only be allowed to pick up rules from range
0..126 instead of the full 0..127 range. This would lead us to allow
overwriting the last rule when we let the kernel pick-up the location.
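
The bounds problem can be pictured with a small allocator model. This is a hedged sketch, not the driver code: struct rule_table, pick_any_location() and NUM_RULES are hypothetical names, and the table size of 128 is taken only from the 0..127 range mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RULES 128	/* assumed table size, matching the 0..127 range above */

/* Hypothetical occupancy table; the driver itself uses kernel bitmaps. */
struct rule_table {
	bool used[NUM_RULES];
};

/*
 * Returns a free index in 0..NUM_RULES-1 and marks it used, or -1 when
 * the table is full, so that an existing rule is never overwritten.
 */
int pick_any_location(struct rule_table *t)
{
	int i;

	for (i = 0; i < NUM_RULES; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Fills the table and reports what the next automatic placement yields. */
int placement_when_full(void)
{
	struct rule_table t = { { false } };
	int i, last = -1;

	for (i = 0; i < NUM_RULES; i++)
		last = pick_any_location(&t);
	assert(last == NUM_RULES - 1);	/* the full 0..127 range is usable */
	return pick_any_location(&t);
}
```

The two properties the message asks for are visible here: every index up to 127 can be handed out, and a full table yields an error instead of silently reusing the last slot.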

Fixes: 3306145866b6 ("net: dsa: bcm_sf2: Move IPv4 CFP processing to specific functions")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'trace-v4.17-rc5-vsprintf' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull memory barrier fix from Steven Rostedt:
"The memory barrier usage in updating the random ptr hash for %p in
  vsprintf is incorrect.

  Instead of adding the read memory barrier into vsprintf() which will
  cause a slight degradation to a commonly used function in the kernel
  just to solve a very unlikely race condition that can only happen at
  boot up, change the code from using a variable branch to a
  static_branch.

  Not only does this solve the race condition, it actually will improve
  the performance of vsprintf() by removing the conditional branch that
  is only needed at boot"

* tag 'trace-v4.17-rc5-vsprintf' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  vsprintf: Replace memory barrier with static_key for random_ptr_key update

usbip: usbip_host: fix bad unlock balance during stub_probe()

stub_probe() calls put_busid_priv() in an error path when device isn't
found in the busid_table. Fix it by making put_busid_priv() safe to be
called with null struct bus_id_priv pointer.

This problem happens when "usbip bind" is run without the usbip_host
driver loaded and modprobe is run afterwards. The first failed bind
attempt unbinds the device from the original driver, and when usbip_host
is modprobed, stub_probe() runs, doesn't find the device in its busid
table and calls put_busid_priv() with a null bus_id_priv pointer.
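
A minimal sketch of the NULL-safe release follows. It assumes that the lookup returns its entry locked and that the put helper is what unlocks it; struct bus_id_priv_model and both helpers are hypothetical userspace stand-ins, with an integer standing in for the spinlock.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the driver's per-device entry; lock_depth
 * models the lock (1 = held, 0 = free).
 */
struct bus_id_priv_model {
	int lock_depth;
};

/* Models a lookup that returns the entry locked, or NULL if not found. */
struct bus_id_priv_model *get_busid_priv_model(struct bus_id_priv_model *entry)
{
	if (entry)
		entry->lock_depth++;
	return entry;
}

/* NULL-safe release: only unlock when an entry was actually returned. */
void put_busid_priv_model(struct bus_id_priv_model *bid)
{
	if (!bid)
		return;
	assert(bid->lock_depth == 1);	/* unlocking a free lock is the bug */
	bid->lock_depth--;
}
```

With the NULL check in place, an error path can call the put helper unconditionally, whether or not the lookup succeeded.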

usbip-host 3-10.2: 3-10.2 is not in match_busid table...  skip!

[  367.359679] =====================================
[  367.359681] WARNING: bad unlock balance detected!
[  367.359683] 4.17.0-rc4+ #5 Not tainted
[  367.359685] -------------------------------------
[  367.359688] modprobe/2768 is trying to release lock (
[  367.359689]
==================================================================
[  367.359696] BUG: KASAN: null-ptr-deref in print_unlock_imbalance_bug+0x99/0x110
[  367.359699] Read of size 8 at addr 0000000000000058 by task modprobe/2768

[  367.359705] CPU: 4 PID: 2768 Comm: modprobe Not tainted 4.17.0-rc4+ #5

Fixes: 22076557b07c ("usbip: usbip_host: fix NULL-ptr deref and use-after-free errors") in usb-linus
Signed-off-by: Shuah Khan (Samsung OSG) <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: phy: micrel: add 125MHz reference clock workaround

The micrel KSZ9031 phy has an optional clock pin (CLK125_NDO) which can be
used as a reference clock for the MAC unit. The clock signal must meet the
RGMII requirements to ensure correct data transmission between the
MAC and the PHY. The KSZ9031 phy does not fulfill the duty cycle
requirement if the phy is configured as slave. For a complete
description see the errata sheets: DS80000691D or DS80000692D.

The errata sheet recommends forcing the phy into master mode whenever
there is a 1000Base-T link-up as a workaround. Only set the
"micrel,force-master" property if you use the phy reference clock provided
by the CLK125_NDO pin as the MAC reference clock in your application.

Attention: this workaround is only usable if the link partner can
be configured to slave mode for 1000Base-T.

Signed-off-by: Markus Niebel <[email protected]>
[[email protected]: fix dt-binding documentation]
[[email protected]: use already existing result var for read/write]
[[email protected]: add error handling]
[[email protected]: add more comments]
Signed-off-by: Marco Felsch <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: purge write queue in tcp_connect_init()

syzkaller found a reliable way to crash the host, hitting a BUG()
in __tcp_retransmit_skb().

A malicious MSG_FASTOPEN is the root cause. We need to purge the write
queue in tcp_connect_init() at the point we init snd_una/write_seq.

This patch also replaces the BUG() with a less intrusive WARN_ON_ONCE().

kernel BUG at net/ipv4/tcp_output.c:2837!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 5276 Comm: syz-executor0 Not tainted 4.17.0-rc3+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__tcp_retransmit_skb+0x2992/0x2eb0 net/ipv4/tcp_output.c:2837
RSP: 0000:ffff8801dae06ff8 EFLAGS: 00010206
RAX: ffff8801b9fe61c0 RBX: 00000000ffc18a16 RCX: ffffffff864e1a49
RDX: 0000000000000100 RSI: ffffffff864e2e12 RDI: 0000000000000005
RBP: ffff8801dae073a0 R08: ffff8801b9fe61c0 R09: ffffed0039c40dd2
R10: ffffed0039c40dd2 R11: ffff8801ce206e93 R12: 00000000421eeaad
R13: ffff8801ce206d4e R14: ffff8801ce206cc0 R15: ffff8801cd4f4a80
FS:  0000000000000000(0000) GS:ffff8801dae00000(0063) knlGS:00000000096bc900
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000020000000 CR3: 00000001c47b6000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
tcp_retransmit_skb+0x2e/0x250 net/ipv4/tcp_output.c:2923
tcp_retransmit_timer+0xc50/0x3060 net/ipv4/tcp_timer.c:488
tcp_write_timer_handler+0x339/0x960 net/ipv4/tcp_timer.c:573
tcp_write_timer+0x111/0x1d0 net/ipv4/tcp_timer.c:593
call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
expire_timers kernel/time/timer.c:1363 [inline]
__run_timers+0x79e/0xc50 kernel/time/timer.c:1666
run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
__do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
invoke_softirq kernel/softirq.c:365 [inline]
irq_exit+0x1d1/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:525 [inline]
smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863

Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Neal Cardwell <[email protected]>
Reported-by: syzbot <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx5: Fix build break when CONFIG_SMP=n

Avoid using the kernel's irq_descriptor and return IRQ vector affinity
directly from the driver.

This fixes the following build break when CONFIG_SMP=n

include/linux/mlx5/driver.h: In function ‘mlx5_get_vector_affinity_hint’:
include/linux/mlx5/driver.h:1299:13: error:
‘struct irq_desc’ has no member named ‘affinity_hint’

Fixes: 6082d9c9c94a ("net/mlx5: Fix mlx5_get_vector_affinity function")
Signed-off-by: Saeed Mahameed <[email protected]>
CC: Randy Dunlap <[email protected]>
CC: Guenter Roeck <[email protected]>
CC: Thomas Gleixner <[email protected]>
Tested-by: Israel Rukshin <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Reported-by: Randy Dunlap <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipvlan: call netdevice notifier when master mac address changed

When the master device's MAC address is changed, commit
32c10bbfe914 ("ipvlan: always use the current L2 addr of the
master") changes the IPVlan devices' MAC addresses as well, but it
doesn't do the related work, such as flushing the IPVlan devices'
ARP tables.

Signed-off-by: Keefe Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

drm/dumb-buffers: Integer overflow in drm_mode_create_ioctl()

There is a comment here which says that DIV_ROUND_UP() can overflow, and
that's where the problem comes from. Say you pick:

args->bpp = UINT_MAX - 7;
args->width = 4;
args->height = 1;

The integer overflow in DIV_ROUND_UP() means "cpp" is UINT_MAX / 8 and
because of how we picked args->width that means cpp < UINT_MAX / 4.

I've fixed it by preventing the integer overflow in DIV_ROUND_UP(). I
removed the check for !cpp because it's not possible after this change.
I also changed all the 0xffffffffU references to U32_MAX.
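
The wraparound is easy to reproduce in isolation. The sketch below is not the DRM code: div_round_up_u32() reproduces the usual (n + d - 1) / d rounding in 32 bits, and bpp_to_cpp() is a hypothetical helper showing one way to reject inputs before the addition can wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as a DIV_ROUND_UP()-style macro, fixed to 32 bits. */
uint32_t div_round_up_u32(uint32_t n, uint32_t d)
{
	assert(d != 0);
	return (n + d - 1) / d;	/* n + d - 1 wraps when n is near UINT32_MAX */
}

/*
 * Hypothetical checked helper: converts bits per pixel to bytes per
 * pixel and refuses values for which the rounding addition would wrap.
 */
bool bpp_to_cpp(uint32_t bpp, uint32_t *cpp)
{
	if (bpp > UINT32_MAX - 7)
		return false;
	*cpp = div_round_up_u32(bpp, 8);
	return true;
}
```

For example, div_round_up_u32(UINT32_MAX - 6, 8) evaluates to 0 because the intermediate sum wraps to zero, while bpp_to_cpp() rejects the same input. Once the range check is in front, a zero result is no longer reachable for a non-zero bpp, which is why a separate zero check becomes redundant.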

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516140026.GA19340@mwanda

vsprintf: Replace memory barrier with static_key for random_ptr_key update

Reviewing Tobin's patches for getting pointers out early before
entropy has been established, I noticed that there's a lone smp_mb() in
the code. As with most lone memory barriers, this one appears to be
incorrectly used.

We currently basically have this:

	get_random_bytes(&ptr_key, sizeof(ptr_key));
	/*
	 * have_filled_random_ptr_key==true is dependent on get_random_bytes().
	 * ptr_to_id() needs to see have_filled_random_ptr_key==true
	 * after get_random_bytes() returns.
	 */
	smp_mb();
	WRITE_ONCE(have_filled_random_ptr_key, true);

And later we have:

	if (unlikely(!have_filled_random_ptr_key))
		return string(buf, end, "(ptrval)", spec);

	/* Missing memory barrier here. */

	hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key);

As the CPU can perform speculative loads, we could have a situation
with the following:

    CPU0                                  CPU1
    ----                                  ----
                                          load ptr_key = 0
    store ptr_key = random
    smp_mb()
    store have_filled_random_ptr_key

                                          load have_filled_random_ptr_key = true

                                          BAD BAD BAD! (you're so bad!)

Because nothing prevents CPU1 from loading ptr_key before loading
have_filled_random_ptr_key.
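
For illustration only, the ordering that the reader side lacks can be written with C11 atomics as a release/acquire pair. This is a userspace sketch and not the approach this commit takes (the commit avoids a read-side barrier altogether); ptr_key_model, key_ready, publish_key() and read_key() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

uint64_t ptr_key_model;		/* stands in for ptr_key */
atomic_bool key_ready;		/* stands in for have_filled_random_ptr_key */

/* Writer side: the release store orders the key write before the flag. */
void publish_key(uint64_t key)
{
	ptr_key_model = key;
	atomic_store_explicit(&key_ready, true, memory_order_release);
}

/*
 * Reader side: the acquire load keeps the key read from being performed
 * ahead of the flag check, which is the ordering the original code lacked.
 */
bool read_key(uint64_t *key)
{
	assert(key != NULL);
	if (!atomic_load_explicit(&key_ready, memory_order_acquire))
		return false;
	*key = ptr_key_model;
	return true;
}
```

A write-side barrier only helps when the reader has a matching ordering constraint; the acquire load in read_key() is that matching half.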

This race is very unlikely, but we can't keep an incorrect smp_mb() in
place. Instead, replace have_filled_random_ptr_key with a static_branch,
not_filled_random_ptr_key, that is initialized to true and changed to false
when we get enough entropy. If the update happens in early boot, the
static_key is updated immediately. Otherwise it has to wait until entropy
is filled, which happens in an interrupt handler that can't update a
static_key, as that requires a preemptible context. In that case a
work_queue is used to update it; since entropy already took too long to
establish in the first place, waiting a little more shouldn't hurt anything.

The benefit of using the static key is that the unlikely branch in
vsprintf() now becomes a nop.

Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: ad67b74d2469d ("printk: hash addresses printed with %p")
Acked-by: Linus Torvalds <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

x86/boot/compressed/64: Fix moving page table out of trampoline memory

cleanup_trampoline() relocates the top-level page table out of
trampoline memory. We use 'top_pgtable' as our new top-level page table.

But if 'top_pgtable' is referenced from C in the usual way,
the address of the table is calculated relative to RIP.
After the kernel gets relocated, that address will be in the middle of
the decompression buffer and the page table may get overwritten.
This leads to a crash.

We calculate the address of other page tables relative to the relocation
address. It makes them safe. We should do the same for 'top_pgtable'.

Calculate the address of 'top_pgtable' in assembly and pass down to
cleanup_trampoline().

Move the page table to the .pgtable section where the rest of the page
tables are. The section is @nobits so we save 4k in the kernel image.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()

Eric and Hugh have reported an instant reboot due to my recent changes
in the decompression code.

The root cause is that I didn't realize that we need to adjust the GOT to
be able to run C code that early.

The problem is only visible with an older toolchain. Binutils >= 2.24 is
able to eliminate GOT references by replacing them with RIP-relative
address loads:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=80d873266dec

We need to adjust GOT two times:

- before calling paging_prepare() using the initial load address
- before calling C code from the relocated kernel

Reported-by: Eric Dumazet <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: 194a9749c73d ("x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

locking/percpu-rwsem: Annotate rwsem ownership transfer by setting RWSEM_OWNER_UNKNOWN

The filesystem freezing code needs to transfer ownership of a rwsem
embedded in a percpu-rwsem from the task that does the freezing to
another one that does the thawing by calling percpu_rwsem_release()
after freezing and percpu_rwsem_acquire() before thawing.

However, the new rwsem debug code runs afoul of this scheme by warning
that the task that releases the rwsem isn't the one that acquired it,
as reported by Amir Goldstein:

  DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
  WARNING: CPU: 1 PID: 1401 at /home/amir/build/src/linux/kernel/locking/rwsem.c:133 up_write+0x59/0x79

  Call Trace:
   percpu_up_write+0x1f/0x28
   thaw_super_locked+0xdf/0x120
   do_vfs_ioctl+0x270/0x5f1
   ksys_ioctl+0x52/0x71
   __x64_sys_ioctl+0x16/0x19
   do_syscall_64+0x5d/0x167
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

To work properly with the rwsem debug code, we need to annotate that the
rwsem ownership is unknown during the transfer period until a brave soul
comes forward to acquire the ownership. During that period, optimistic
spinning will be disabled.

Reported-by: Amir Goldstein <[email protected]>
Tested-by: Amir Goldstein <[email protected]>
Signed-off-by: Waiman Long <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Theodore Y. Ts'o <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

locking/rwsem: Add a new RWSEM_ANONYMOUSLY_OWNED flag

There are use cases where a rwsem can be acquired by one task, but
released by another task. In these cases, optimistic spinning may need
to be disabled. One example is the filesystem freeze/thaw code,
where the task that freezes the filesystem acquires a write lock
on a rwsem and then un-owns it before returning to userspace. Later on,
another task comes along, acquires the ownership, thaws the filesystem
and releases the rwsem.

Bit 0 of the owner field was used to designate that it is a reader
owned rwsem. It is now repurposed to mean that the owner of the rwsem
is not known. If only bit 0 is set, the rwsem is reader owned. If bit
0 and other bits are set, it is writer owned with an unknown owner.
One such value for the latter case is (-1L). So we can set owner to 1 for
reader-owned, -1 for writer-owned. The owner is unknown in both cases.

To handle transfer of rwsem ownership, the higher level code should
set the owner field to -1 to indicate a write-locked rwsem with unknown
owner. Optimistic spinning will be disabled in this case.

Once the higher level code figures who the new owner is, it can then
set the owner field accordingly.
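
The owner encoding described above can be sketched with plain integers. The macro and function names here are hypothetical and uintptr_t stands in for the task pointer stored in the owner field; only the values 1 and -1 and the meaning of bit 0 are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ANONYMOUSLY_OWNED	((uintptr_t)1)	/* bit 0: owner is not known */
#define READER_OWNED		((uintptr_t)1)	/* only bit 0 set */
#define OWNER_UNKNOWN		((uintptr_t)-1)	/* bit 0 plus other bits set */

/* Encodes a known owner; real task pointers are aligned, so bit 0 is clear. */
uintptr_t owner_from_task(const void *task)
{
	uintptr_t owner = (uintptr_t)task;

	assert(!(owner & ANONYMOUSLY_OWNED));
	return owner;
}

/* Optimistic spinning only makes sense when the owner is a known task. */
bool owner_is_spinnable(uintptr_t owner)
{
	return !(owner & ANONYMOUSLY_OWNED);
}

bool owner_is_reader(uintptr_t owner)
{
	return owner == READER_OWNED;
}

bool owner_is_unknown_writer(uintptr_t owner)
{
	return (owner & ANONYMOUSLY_OWNED) && owner != READER_OWNED;
}
```

Testing bit 0 alone is enough to decide whether spinning is allowed, while comparing the whole value against 1 separates the reader-owned case from a writer with an unknown owner.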

Tested-by: Amir Goldstein <[email protected]>
Signed-off-by: Waiman Long <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Theodore Y. Ts'o <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

drm/i915/gen9: Add WaClearHIZ_WM_CHICKEN3 for bxt and glk

Factor in clear values wherever required while updating destination
min/max.

References: HSDES#1604444184
Signed-off-by: Michel Thierry <[email protected]>
Cc: [email protected]
Cc: Mika Kuoppala <[email protected]>
Cc: Oscar Mateo <[email protected]>
Reviewed-by: Mika Kuoppala <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Cc: [email protected]
Cc: Joonas Lahtinen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(backported from commit 0c79f9cb77eae28d48a4f9fc1b3341aacbbd260c)
Signed-off-by: Joonas Lahtinen <[email protected]>

drm/vmwgfx: Set dmabuf_size when vmw_dmabuf_init is successful

The SOU primary plane prepare_fb hook depends upon dmabuf_size to pin the
BO (and not call a new vmw_dmabuf_init) when the new fb size is the same
as the current fb. This was changed in a recent commit, which causes
page_flip to fail on VMs with low display memory and causes multi-mon
failures when cycling monitors from the secondary display.

Cc: <[email protected]> # 4.14, 4.16
Fixes: 20fb5a635a0c ("drm/vmwgfx: Unpin the screen object backup buffer when not used")
Signed-off-by: Deepak Rawat <[email protected]>
Reviewed-by: Sinclair Yeh <[email protected]>
Signed-off-by: Thomas Hellstrom <[email protected]>

IB/umem: Use the correct mm during ib_umem_release

User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads.

If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has
exited, get_pid_task will return NULL and ib_umem_release will not
decrease mm->pinned_vm.

Instead of using threads to locate the mm, use the overall tgid from the
ib_ucontext struct. This matches the behavior of ODP and
disassociate in handling the mm of the process that called ibv_reg_mr.

Cc: <[email protected]>
Fixes: 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
Signed-off-by: Lidong Chen <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

clk: stm32: fix: stm32 clock drivers are not compiled by default

The clock driver is mandatory if the machine is selected, so use
'def_bool' with the machine(s) instead of the 'bool' and 'depends on'
commands.

Fixes: da32d3539fca ("clk: stm32: add configuration flags for each of the stm32 drivers")
Signed-off-by: Gabriel Fernandez <[email protected]>
Acked-by: Alexandre TORGUE <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

clk: imx6ull: use OSC clock during AXI rate change

On i.MX6 ULL using PLL3 seems to cause a freeze when setting
the parent to IMX6UL_CLK_PLL3_USB_OTG. This only seems to appear
since commit 6f9575e55632 ("clk: imx: Add CLK_IS_CRITICAL flag
for busy divider and busy mux"), probably because the clock is
now forced to be on.

Fixes: 6f9575e55632 ("clk: imx: Add CLK_IS_CRITICAL flag for busy divider and busy mux")
Signed-off-by: Stefan Agner <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>

Merge tag 'davinci-fixes-for-v4.17-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into fixes

Second set of fixes for TI DaVinci.

They are needed for DM6467 EVM to work. The first patch fixes an
issue with timer interrupt and the second two are needed for video
driver to probe successfully.

* tag 'davinci-fixes-for-v4.17-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
  ARM: davinci: board-dm646x-evm: set VPIF capture card name
  ARM: davinci: board-dm646x-evm: pass correct I2C adapter id for VPIF
  ARM: davinci: dm646x: fix timer interrupt generation

Signed-off-by: Olof Johansson <[email protected]>

tick/broadcast: Use for_each_cpu() specially on UP kernels

for_each_cpu() unintuitively reports CPU0 as set independent of the actual
cpumask content on UP kernels. This causes an unexpected PIT interrupt
storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as
a result, the virtual machine can suffer from a strange random delay of 1~20
minutes during boot-up, and sometimes it can hang forever.

Protect it by checking whether the cpumask is empty before entering the
for_each_cpu() loop.
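
The UP behaviour and the guard can be modelled in a few lines. This is a sketch under stated assumptions: for_each_cpu_up() mimics a loop that always visits CPU0 without consulting the mask, and struct cpumask_model, cpumask_model_empty() and cpus_visited() are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models a one-CPU cpumask: bit 0 set means CPU0 is in the mask. */
struct cpumask_model {
	unsigned long bits;
};

/*
 * Mimics the UP behaviour described above: the body runs once for CPU0
 * and the mask is never consulted.
 */
#define for_each_cpu_up(cpu, mask) \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

bool cpumask_model_empty(const struct cpumask_model *m)
{
	return m->bits == 0;
}

/* Counts loop iterations; 'guarded' adds the emptiness check first. */
int cpus_visited(const struct cpumask_model *m, bool guarded)
{
	int cpu, visited = 0;

	assert(m != NULL);
	if (guarded && cpumask_model_empty(m))
		return 0;
	for_each_cpu_up(cpu, m)
		visited++;
	return visited;
}
```

With an empty mask the unguarded loop still visits CPU0 once, while the guarded variant visits nothing.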

[ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ]

Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Josh Poulson <[email protected]>
Cc: "Michael Kelley (EOSG)" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: [email protected]
Cc: Rakib Mullick <[email protected]>
Cc: Jork Loeser <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: KY Srinivasan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM

Merge tag 'afs-fixes-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull AFS fixes from David Howells:
"Here's a set of patches that fix a number of bugs in the in-kernel AFS
  client, including:

   - Fix directory locking to not use individual page locks for
     directory reading/scanning but rather to use a semaphore on the
     afs_vnode struct as the directory contents must be read in a single
     blob and data from different reads must not be mixed as the entire
     contents may be shuffled about between reads.

   - Fix address list parsing to handle port specifiers correctly.

   - Only give up callback records on a server if we actually talked to
     that server (we might not be able to access a server).

   - Fix some callback handling bugs, including refcounting,
     whole-volume callbacks and when callbacks actually get broken in
     response to a CB.CallBack op.

   - Fix some server/address rotation bugs, including giving up if we
     can't probe a server; giving up if a server says it doesn't have a
     volume, but there are more servers to try.

   - Fix the decoding of fetched statuses to be OpenAFS compatible.

   - Fix the handling of server lookups in Cache Manager ops (such as
     CB.InitCallBackState3) to use a UUID if possible and to handle no
     server being found.

   - Fix a bug in server lookup where not all addresses are compared.

   - Fix the non-encryption of calls that prevents some servers from
     being accessed (this also requires an AF_RXRPC patch that has
     already gone in through the net tree).

  There's also a patch that adds tracepoints to log Cache Manager ops
  that don't find a matching server, either by UUID or by address"

* tag 'afs-fixes-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix the non-encryption of calls
  afs: Fix CB.CallBack handling
  afs: Fix whole-volume callback handling
  afs: Fix afs_find_server search loop
  afs: Fix the handling of an unfound server in CM operations
  afs: Add a tracepoint to record callbacks from unlisted servers
  afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID
  afs: Fix VNOVOL handling in address rotation
  afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility
  afs: Fix server rotation's handling of fileserver probe failure
  afs: Fix refcounting in callback registration
  afs: Fix giving up callbacks on server destruction
  afs: Fix address list parsing
  afs: Fix directory page locking

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Two small driver fixes: aacraid to fix an unknown IU type on task
  management functions which causes a firmware fault and vmw_pvscsi to
  change a return code to retry the operation instead of causing an
  immediate error"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: aacraid: Correct hba_send to include iu_type
  scsi: vmw-pvscsi: return DID_BUS_BUSY for adapter-initated aborts

Merge tag 'drm-fixes-for-v4.17-rc6-urgent' of git://people.freedesktop.org/~airlied/linux

Pull drm fix from Dave Airlie:
"This fixes the mmap regression reported to me on irc by an i686 kernel
  user today, he's tested the fix works, and I've audited all the drm
  drivers for the bad mmap usage and since we use the mmap offset as a
  lookup in a table we aren't inclined to have anything bad in there"

[ See commit be83bbf80682 ("mmap: introduce sane default mmap limits")
  for details and the note on why the GPU drivers were expected to be a
  special case.    - Linus ]

* tag 'drm-fixes-for-v4.17-rc6-urgent' of git://people.freedesktop.org/~airlied/linux:
  drm: set FMODE_UNSIGNED_OFFSET for drm files

mtd: rawnand: Fix return type of __DIVIDE() when called with 32-bit

The __DIVIDE() macro checks whether it is called with a 32-bit or 64-bit
dividend, to select the appropriate divide-and-round-up routine.
As the check uses the ternary operator, the result will always be
promoted to a type that can hold both results, i.e. unsigned long long.

When using this result in a division on a 32-bit system, this may lead
to link errors like:

ERROR: "__udivdi3" [drivers/mtd/nand/raw/nand.ko] undefined!

Fix this by casting the result of the division to the type of the
dividend.
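
The promotion can be reproduced outside the kernel. The following is a
standalone sketch, not the rawnand code: DEMO_DIV_UP_32, DEMO_DIV_UP_64 and
both DEMO_DIVIDE_* macros are hypothetical stand-ins, and __typeof__ is a
GCC/Clang extension.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the 32-bit and 64-bit round-up helpers. */
#define DEMO_DIV_UP_32(n, d) ((uint32_t)(((n) + (d) - 1) / (d)))
#define DEMO_DIV_UP_64(n, d) ((uint64_t)(((n) + (d) - 1) / (d)))

/*
 * Both arms of ?: are converted to a common type, so the result of this
 * expression is 64-bit even when the 32-bit arm is the one selected.
 */
#define DEMO_DIVIDE_PROMOTED(dividend, divisor)		\
	(sizeof(dividend) == sizeof(uint32_t) ?		\
	 DEMO_DIV_UP_32(dividend, divisor) :		\
	 DEMO_DIV_UP_64(dividend, divisor))

/* Casting back to the dividend's type keeps 32-bit callers 32-bit. */
#define DEMO_DIVIDE_CAST(dividend, divisor)		\
	((__typeof__(dividend))DEMO_DIVIDE_PROMOTED(dividend, divisor))
```

With a uint32_t dividend, sizeof(DEMO_DIVIDE_PROMOTED(x, 3u)) is 8 while
sizeof(DEMO_DIVIDE_CAST(x, 3u)) is 4, so a later division by the result
stays a 32-bit operation and does not need a 64-bit division helper.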

Fixes: 8878b126df769831 ("mtd: nand: add ->exec_op() implementation")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>

KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls

kvm_read_guest() will eventually look up in kvm_memslots(), which requires
either to hold the kvm->slots_lock or to be inside a kvm->srcu critical
section.
In contrast to x86 and s390 we don't take the SRCU lock on every guest
exit, so we have to do it individually for each kvm_read_guest() call.
Use the newly introduced wrapper for that.

Cc: Stable <[email protected]> # 4.12+
Reported-by: Jan Glauber <[email protected]>
Signed-off-by: Andre Przywara <[email protected]>
Acked-by: Christoffer Dall <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock

kvm_read_guest() will eventually look up in kvm_memslots(), which requires
either to hold the kvm->slots_lock or to be inside a kvm->srcu critical
section.
In contrast to x86 and s390 we don't take the SRCU lock on every guest
exit, so we have to do it individually for each kvm_read_guest() call.

Provide a wrapper which does that and use that everywhere.

Note that ending the SRCU critical section before returning from the
kvm_read_guest() wrapper is safe, because the data has been *copied*, so
we don't need to rely on valid references to the memslot anymore.

Cc: Stable <[email protected]> # 4.8+
Reported-by: Jan Glauber <[email protected]>
Signed-off-by: Andre Przywara <[email protected]>
Acked-by: Christoffer Dall <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: arm/arm64: VGIC/ITS: Promote irq_lock() in update_affinity

Apparently the development of update_affinity() overlapped with the
promotion of irq_lock to be _irqsave, so the patch didn't convert this
lock over. This will make lockdep complain.

Fix this by disabling IRQs around the lock.

Cc: [email protected]
Fixes: 08c9fd042117 ("KVM: arm/arm64: vITS: Add a helper to update the affinity of an LPI")
Reported-by: Jan Glauber <[email protected]>
Signed-off-by: Andre Przywara <[email protected]>
Acked-by: Christoffer Dall <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: arm/arm64: Properly protect VGIC locks from IRQs

As Jan reported [1], lockdep complains about the VGIC not being bullet
proof. This seems to be due to two issues:
- When commit 006df0f34930 ("KVM: arm/arm64: Support calling
  vgic_update_irq_pending from irq context") promoted irq_lock and
  ap_list_lock to _irqsave, we forgot two instances of irq_lock.
  lockdep seems to pick those up.
- If a lock is _irqsave, any other locks we take inside them should be
  _irqsave as well. So the lpi_list_lock needs to be promoted also.

This fixes both issues by simply making the remaining instances of those
locks _irqsave.
One irq_lock is addressed in a separate patch, to simplify backporting.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/575718.html

Cc: [email protected]
Fixes: 006df0f34930 ("KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context")
Reported-by: Jan Glauber <[email protected]>
Acked-by: Christoffer Dall <[email protected]>
Signed-off-by: Andre Przywara <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

cxl: Report the tunneled operations status

Failure to synchronize the tunneled operations does not prevent
the initialization of the cxl card. This patch reports the tunneled
operations status via /sys.

Signed-off-by: Philippe Bergheaud <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

cxl: Set the PBCQ Tunnel BAR register when enabling capi mode

Skiboot used to set the default Tunnel BAR register value when capi
mode was enabled. This approach was ok for the cxl driver, but
prevented other drivers from choosing different values.

Skiboot versions > 5.11 will not set the default value any longer.
This patch modifies the cxl driver to set/reset the Tunnel BAR
register when entering/exiting the cxl mode, with
pnv_pci_set_tunnel_bar().

That should work with old skiboot (since we are re-writing the value
already set) and new skiboot.

mpe: The tunnel support was only merged into Linux recently, in commit
d6a90bb83b50 ("powerpc/powernv: Enable tunneled operations")
(v4.17-rc1), so with new skiboot, kernels between that commit and this
one will not work correctly.

Fixes: d6a90bb83b50 ("powerpc/powernv: Enable tunneled operations")
Signed-off-by: Philippe Bergheaud <[email protected]>
Reviewed-by: Christophe Lombard <[email protected]>
Acked-by: Frederic Barrat <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

drm/vc4: Fix leak of the file_priv that stored the perfmon.

Signed-off-by: Eric Anholt <[email protected]>
Fixes: 65101d8c9108 ("drm/vc4: Expose performance counters to userspace")
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Boris Brezillon <[email protected]>
Signed-off-by: Maarten Lankhorst <[email protected]>

KVM: X86: Lower the default timer frequency limit to 200us

Anthoine reported:
The period used by Windows change over time but it can be 1
milliseconds or less. I saw the limit_periodic_timer_frequency
print so 500 microseconds is sometimes reached.

As suggested by Paolo, lower the default timer frequency limit to a
smaller interval of 200 us (5000 Hz) to leave some headroom. This
is required due to Windows 10 changing the scheduler tick limit
from 1024 Hz to 2048 Hz.
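
The arithmetic behind these numbers can be checked with a small standalone
sketch. The helper names are hypothetical and this is not the KVM code; the
500 us value is taken from the report above as the limit that was being
reached.

```c
#include <stdint.h>

/* Convert a minimum timer period in microseconds to a maximum rate in Hz. */
static inline uint64_t demo_period_us_to_hz(uint64_t period_us)
{
	return 1000000ULL / period_us;
}

/* Convert a tick rate in Hz to a period in nanoseconds (rounded down). */
static inline uint64_t demo_hz_to_period_ns(uint64_t hz)
{
	return 1000000000ULL / hz;
}

/* A requested period is allowed if it is not shorter than the limit. */
static inline int demo_period_allowed(uint64_t period_ns, uint64_t min_period_us)
{
	return period_ns >= min_period_us * 1000ULL;
}
```

A 2048 Hz tick has a period of about 488 us, which is shorter than a 500 us
limit (2000 Hz) but comfortably longer than the new 200 us limit (5000 Hz).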

Reported-by: Anthoine Bourgeois <[email protected]>
Suggested-by: Paolo Bonzini <[email protected]>
Reviewed-by: Darren Kenny <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Anthoine Bourgeois <[email protected]>
Cc: Darren Kenny <[email protected]>
Cc: Jan Kiszka <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

ARM: davinci: board-dm646x-evm: set VPIF capture card name

VPIF capture driver expects card name to be set since it
uses it without checking for NULL. The commit which
introduced VPIF display and capture support added card
name only for display, not for capture.

Set it in platform data to probe driver successfully.

While at it, also fix the display card name to something more
appropriate.

Fixes: 85609c1ccda6 ("DaVinci: DM646x - platform changes for vpif capture and display drivers")
Signed-off-by: Sekhar Nori <[email protected]>

ARM: davinci: board-dm646x-evm: pass correct I2C adapter id for VPIF

commit a16cb91ad9c4 ("[media] media: vpif: use a configurable
i2c_adapter_id for vpif display") removed hardcoded I2C adaptor
setting in VPIF driver, but missed updating platform data passed
from DM646x board.

Fix it.

Fixes: a16cb91ad9c4 ("[media] media: vpif: use a configurable i2c_adapter_id for vpif display")
Signed-off-by: Sekhar Nori <[email protected]>

ARM: davinci: dm646x: fix timer interrupt generation

commit b38434145b34 ("ARM: davinci: irqs: Correct McASP1 TX interrupt
definition for DM646x") inadvertently removed priority setting for
timer0_12 (bottom half of timer0). This timer is used as clockevent.

When INTPRIn register setting for an interrupt is left at 0, it is
mapped to FIQ by the AINTC causing the timer interrupt to not get
generated.

Fix it by including an entry for timer0_12 in interrupt priority map
array. While at it, move the clockevent comment to the right place.

Fixes: b38434145b34 ("ARM: davinci: irqs: Correct McASP1 TX interrupt definition for DM646x")
Signed-off-by: Sekhar Nori <[email protected]>

usbip: usbip_host: fix NULL-ptr deref and use-after-free errors

usbip_host updates device status without holding lock from stub probe,
disconnect and rebind code paths. When multiple requests to import a
device are received, these unprotected code paths step all over each
other and the driver fails with NULL-ptr deref and use-after-free errors.

The driver uses a table lock to protect the busid array for adding and
deleting busids to the table. However, the probe, disconnect and rebind
paths get the busid table entry and update the status without holding
the busid table lock. Add a new finer grain lock to protect the busid
entry. This new lock will be held to search and update the busid entry
fields from get_busid_idx(), add_match_busid() and del_match_busid().

match_busid_show() does the same to access the busid entry fields.

get_busid_priv() is changed to return the pointer to the busid entry while
holding the busid lock. stub_probe(), stub_disconnect() and stub_device_rebind()
call put_busid_priv() to release the busid lock before returning. This
change fixes the unprotected code paths, eliminating the race conditions
in updating the busid entries.
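
The get/put pattern described above can be sketched in userspace as
follows. This is an illustration only, not the usbip_host code: every demo_*
name is hypothetical, a C11 atomic_flag spin loop stands in for the kernel
spinlock, and the separate table lock is left out.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_BUSID 2

struct demo_busid {
	char name[16];
	int status;
	atomic_flag lock;	/* per-entry lock protecting name and status */
};

static struct demo_busid demo_table[DEMO_MAX_BUSID] = {
	{ "1-1", 0, ATOMIC_FLAG_INIT },
	{ "1-2", 0, ATOMIC_FLAG_INIT },
};

static void demo_lock(struct demo_busid *e)
{
	while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
		;	/* spin until the entry lock is free */
}

static void demo_unlock(struct demo_busid *e)
{
	atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Returns the matching entry with its lock held, or NULL if none matches. */
static struct demo_busid *demo_get_busid(const char *name)
{
	for (size_t i = 0; i < DEMO_MAX_BUSID; i++) {
		struct demo_busid *e = &demo_table[i];

		demo_lock(e);
		if (strcmp(e->name, name) == 0)
			return e;	/* caller must call demo_put_busid() */
		demo_unlock(e);
	}
	return NULL;
}

/* Releases the lock taken by a successful demo_get_busid(). */
static void demo_put_busid(struct demo_busid *e)
{
	demo_unlock(e);
}
```

A caller updates the status only between demo_get_busid() and
demo_put_busid(), so two concurrent paths cannot interleave their updates
to the same entry.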

Reported-by: Jakub Jirasek
Signed-off-by: Shuah Khan (Samsung OSG) <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usbip: usbip_host: run rebind from exit when module is removed

After removing usbip_host module, devices it releases are left without
a driver. For example, when a keyboard or a mass storage device is
bound to usbip_host when it is removed, these devices are no longer
bound to any driver.

Fix it to run device_attach() from the module exit routine to restore
the devices to their original drivers. This includes cleanup changes
and moving device_attach() code to a common routine to be called from
rebind_store() and usbip_host_exit().

Signed-off-by: Shuah Khan (Samsung OSG) <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usbip: usbip_host: delete device from busid_table after rebind

Device is left in the busid_table after unbind and rebind. Rebind
initiates usb bus scan and the original driver claims the device.
After rescan the device should be deleted from the busid_table as
it no longer belongs to usbip_host.

Fix it to delete the device after device_attach() succeeds.

Signed-off-by: Shuah Khan (Samsung OSG) <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

usbip: usbip_host: refine probe and disconnect debug msgs to be useful

Refine probe and disconnect debug msgs to be useful and say what is
in progress.

Signed-off-by: Shuah Khan <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

i2c: viperboard: return message count on master_xfer success

Returning zero is wrong in this case.

Signed-off-by: Peter Rosin <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Fixes: 174a13aa8669 ("i2c: Add viperboard i2c master driver")

i2c: pmcmsp: fix error return from master_xfer

Returning -1 (-EPERM) is not appropriate here, go with -EIO.
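
As a reminder of the convention, master_xfer returns the number of messages
processed on success and a negative errno on failure. The sketch below is a
userspace illustration with hypothetical demo_* names, not the pmcmsp
driver.

```c
#include <errno.h>
#include <stddef.h>

struct demo_msg {
	unsigned char *buf;
	size_t len;
};

/* Hypothetical per-message hook: returns 0 on success, non-zero on failure. */
typedef int (*demo_xfer_one_fn)(struct demo_msg *msg);

static int demo_xfer_ok(struct demo_msg *msg)
{
	(void)msg;
	return 0;
}

static int demo_xfer_fail(struct demo_msg *msg)
{
	(void)msg;
	return 1;
}

/* Returns the message count on success, -EIO if any transfer fails. */
static int demo_master_xfer(struct demo_msg *msgs, int num,
			    demo_xfer_one_fn xfer_one)
{
	for (int i = 0; i < num; i++) {
		if (xfer_one(&msgs[i]))
			return -EIO;	/* a bare -1 would read as -EPERM */
	}
	return num;
}
```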

Signed-off-by: Peter Rosin <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Fixes: 1b144df1d7d6 ("i2c: New PMC MSP71xx TWI bus driver")

i2c: pmcmsp: return message count on master_xfer success

Returning zero is wrong in this case.

Signed-off-by: Peter Rosin <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Fixes: 1b144df1d7d6 ("i2c: New PMC MSP71xx TWI bus driver")

Merge tag 'perf-urgent-for-mingo-4.17-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

- Fix segfault when processing unknown threads in cs-etm (Leo Yan)

- Fix "perf test inet_pton" on s390 failing due to missing inline (Thomas Richter)

- Display all available events on 'perf annotate --stdio' (Jin Yao)

- Add missing newline when parsing empty BPF proggie (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

s390/qdio: don't release memory in qdio_setup_irq()

Calling qdio_release_memory() on error is just plain wrong. It frees
the main qdio_irq struct, when following code still uses it.

Also, no other error path in qdio_establish() does this. So trust
callers to clean up via qdio_free() if some step of the QDIO
initialization fails.

Fixes: 779e6e1c724d ("[S390] qdio: new qdio driver.")
Cc: <[email protected]> #v2.6.27+
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

s390/qdio: fix access to uninitialized qdio_q fields

Ever since CQ/QAOB support was added, calling qdio_free() straight after
qdio_alloc() results in qdio_release_memory() accessing uninitialized
memory (i.e. q->u.out.use_cq and q->u.out.aobs), followed by a
kmem_cache_free() on the random AOB addresses.

For older kernels that don't have 6e30c549f6ca, the same applies if
qdio_establish() fails in the DEV_STATE_ONLINE check.

While initializing q->u.out.use_cq would be enough to fix this
particular bug, the more future-proof change is to just zero-alloc the
whole struct.
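
Why zero-allocation makes the free path safe can be shown with a small
userspace sketch. It assumes a simplified, hypothetical struct demo_q and
uses calloc() in place of the kernel allocator; it is not the qdio code.

```c
#include <stdlib.h>

struct demo_q {
	int use_cq;	/* non-zero once completion queues are set up */
	void **aobs;	/* array of per-buffer objects, valid only if use_cq */
	size_t nr_aobs;
};

/* Zero-allocate so that every field has a defined value from the start. */
static struct demo_q *demo_q_alloc(void)
{
	return calloc(1, sizeof(struct demo_q));
}

/* Frees the queue; returns how many per-buffer objects were released. */
static size_t demo_q_release(struct demo_q *q)
{
	size_t freed = 0;

	if (q->use_cq) {
		for (size_t i = 0; i < q->nr_aobs; i++) {
			free(q->aobs[i]);
			freed++;
		}
		free(q->aobs);
	}
	free(q);
	return freed;
}
```

Calling demo_q_release() straight after demo_q_alloc() sees use_cq == 0 and
never touches aobs, which is the situation the commit wants to guarantee.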

Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks")
Cc: <[email protected]> #v3.2+
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>

objtool: Detect RIP-relative switch table references

Typically a switch table can be found by detecting a .rodata access
followed by an indirect jump:

    1969: 4a 8b 0c e5 00 00 00 mov    0x0(,%r12,8),%rcx
    1970: 00
            196d: R_X86_64_32S .rodata+0x438
    1971: e9 00 00 00 00        jmpq   1976 <dispc_runtime_suspend+0xb6a>
            1972: R_X86_64_PC32 __x86_indirect_thunk_rcx-0x4

Randy Dunlap reported a case (seen with GCC 4.8) where the .rodata
access uses RIP-relative addressing:

    19bd: 48 8b 3d 00 00 00 00 mov    0x0(%rip),%rdi        # 19c4 <dispc_runtime_suspend+0xbb8>
            19c0: R_X86_64_PC32 .rodata+0x45c
    19c4: e9 00 00 00 00        jmpq   19c9 <dispc_runtime_suspend+0xbbd>
            19c5: R_X86_64_PC32 __x86_indirect_thunk_rdi-0x4

In this case the relocation addend needs to be adjusted accordingly in
order to find the location of the switch table.

The fix is for case 3 (as described in the comments), but also make the
existing case 1 & 2 checks more precise by only adjusting the addend for
R_X86_64_PC32 relocations.
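
The addend adjustment can be sketched as below. This is an illustration
under stated assumptions, not the objtool source: the demo_* names are
hypothetical, and the adjustment of 4 assumes the 32-bit displacement is the
last field of the instruction, as in the mov above, so that RIP points 4
bytes past the relocation offset.

```c
/* Hypothetical stand-ins for the two relocation types discussed above. */
enum demo_reloc_type {
	DEMO_R_X86_64_32S,	/* absolute: addend is the .rodata offset */
	DEMO_R_X86_64_PC32,	/* PC-relative: addend is biased by -4 */
};

/*
 * Return the offset of the switch table within .rodata for a relocation
 * with the given type and addend.
 */
static long demo_table_offset(enum demo_reloc_type type, long addend)
{
	if (type == DEMO_R_X86_64_PC32)
		return addend + 4;
	return addend;
}
```

For the RIP-relative example above, an addend of 0x45c corresponds to a
table at .rodata+0x460, while the absolute R_X86_64_32S addend of 0x438 is
used unchanged.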

This fixes the following warnings:

  drivers/video/fbdev/omap2/omapfb/dss/dispc.o: warning: objtool: dispc_runtime_suspend()+0xbb8: sibling call from callable instruction with modified stack frame
  drivers/video/fbdev/omap2/omapfb/dss/dispc.o: warning: objtool: dispc_runtime_resume()+0xcc5: sibling call from callable instruction with modified stack frame

Reported-by: Randy Dunlap <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/b6098294fd67afb69af8c47c9883d7a68bf0f8ea.1526305958.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <[email protected]>

ALSA: usb-audio: Use Class Specific EP for UAC3 devices.

bmAttributes offset doesn't exist in the UAC3 CS_EP descriptor.
Hence, checking for pitch control as if it was UAC2 doesn't make
any sense. Use the defined UAC3 offsets instead.

Fixes: 9a2fe9b801f5 ("ALSA: usb: initial USB Audio Device Class 3.0 support")
Signed-off-by: Jorge Sanjuan <[email protected]>
Reviewed-by: Ruslan Bilovol <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

drm: set FMODE_UNSIGNED_OFFSET for drm files

Since we have the ttm and gem vma managers using a subset
of the file address space for objects, and these start at
0x100000000, they will overflow the new mmap checks.

I've checked all the mmap routines I could see for any
bad behaviour, but overall most people use GEM/TTM VMA
managers; even the legacy drivers have a hashtable.

Reported-and-Tested-by: Arthur Marsh (amarsh04 on #radeon)
Fixes: be83bbf8068 (mmap: introduce sane default mmap limits)
Signed-off-by: Dave Airlie <[email protected]>

scsi: core: clean up generated file scsi_devinfo_tbl.c

"make clean" should remove the generated file "scsi_devinfo_tbl.c", so
list it in the clean-files variable so that the file gets cleaned up.

Fixes: 345e29608b4b ("scsi: scsi: Export blacklist flags to sysfs")
Cc: Hannes Reinecke <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: target: tcmu: fix error resetting qfull_time_out to default

Problem:

$ cat /sys/kernel/config/target/core/user_0/block/attrib/qfull_time_out
-1

$ echo "-1" > /sys/kernel/config/target/core/user_0/block/attrib/qfull_time_out
-bash: echo: write error: Invalid argument

Fix:

This patch will help reset qfull_time_out to its default
i.e. qfull_time_out=-1.

Signed-off-by: Prasanna Kumar Kalever <[email protected]>
Acked-by: Mike Christie <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

vmxnet3: use DMA memory barriers where required

The gen bits must be read first from (resp. written last to) DMA memory.
The proper way to enforce this on Linux is to call dma_rmb() (resp.
dma_wmb()).
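
The ordering rule can be illustrated with a userspace analogy. The sketch
below is not the vmxnet3 code: struct demo_desc and the demo_* functions are
hypothetical, and C11 release/acquire fences stand in for dma_wmb() and
dma_rmb(), which only exist in the kernel.

```c
#include <stdatomic.h>
#include <stdint.h>

struct demo_desc {
	uint32_t len;		/* payload field, valid once gen matches */
	_Atomic uint32_t gen;	/* generation value, written last */
};

/* Producer: write the payload first, then publish it via the gen field. */
static void demo_publish(struct demo_desc *d, uint32_t len, uint32_t gen)
{
	d->len = len;
	atomic_thread_fence(memory_order_release);	/* analogue of dma_wmb() */
	atomic_store_explicit(&d->gen, gen, memory_order_relaxed);
}

/* Consumer: read gen first, and only then read the payload. */
static int demo_consume(struct demo_desc *d, uint32_t expected_gen,
			uint32_t *len)
{
	if (atomic_load_explicit(&d->gen, memory_order_relaxed) != expected_gen)
		return 0;	/* descriptor not yet owned by the consumer */
	atomic_thread_fence(memory_order_acquire);	/* analogue of dma_rmb() */
	*len = d->len;
	return 1;
}
```

Without the fences, the payload read could be reordered before the gen
check, or the gen write before the payload write, and the consumer could
see a matching gen value alongside stale payload data.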

Signed-off-by: Regis Duchesne <[email protected]>
Acked-by: Ronak Doshi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vmxnet3: set the DMA mask before the first DMA map operation

The DMA mask must be set before, not after, the first DMA map operation, or
the first DMA map operation could in theory fail on some systems.

Fixes: b0eb57cb97e78 ("VMXNET3: Add support for virtual IOMMU")
Signed-off-by: Regis Duchesne <[email protected]>
Acked-by: Ronak Doshi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cxgb4: Correct ntuple mask validation for hash filters

Earlier code of doing bitwise AND with field width bits was wrong.
Instead, simplify code to calculate ntuple_mask based on supplied
fields and then compare with mask configured in hw - which is the
correct and simpler way to validate ntuple mask.

Fixes: 3eb8b62d5a26 ("cxgb4: add support to create hash-filters via tc-flower offload")
Signed-off-by: Kumar Sanghvi <[email protected]>
Signed-off-by: Ganesh Goudar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs

Check the TIF_32BIT_FPREGS task setting of the tracee rather than the
tracer in determining the layout of floating-point general registers in
the floating-point context, correcting access to odd-numbered registers
for o32 tracees where the setting disagrees between the two processes.

Fixes: 597ce1723e0f ("MIPS: Support for 64-bit FP with O32 binaries")
Signed-off-by: Maciej W. Rozycki <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 3.14+
Signed-off-by: James Hogan <[email protected]>

MIPS: xilfpga: Actually include FDT in fitImage

Commit b35565bb16a5 ("MIPS: generic: Add support for MIPSfpga") added
an its.S file for xilfpga but forgot to add it to
arch/mips/generic/Platform so it is never used.

Fixes: b35565bb16a5 ("MIPS: generic: Add support for MIPSfpga")
Signed-off-by: Alexandre Belloni <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 4.15+
Patchwork: https://patchwork.linux-mips.org/patch/19245/
Signed-off-by: James Hogan <[email protected]>

MIPS: xilfpga: Stop generating useless dtb.o

A dtb.o is generated from nexys4ddr.dts but this is never used since it
has been moved to mips/generic with commit b35565bb16a5 ("MIPS: generic:
Add support for MIPSfpga").

Fixes: b35565bb16a5 ("MIPS: generic: Add support for MIPSfpga")
Signed-off-by: Alexandre Belloni <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 4.15+
Patchwork: https://patchwork.linux-mips.org/patch/19244/
Signed-off-by: James Hogan <[email protected]>

KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"

Trivial fix to spelling mistake in debugfs_entries text.

Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: Colin Ian King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: <[email protected]> # 3.10+
Signed-off-by: James Hogan <[email protected]>

MIPS: ptrace: Expose FIR register through FP regset

Correct commit 7aeb753b5353 ("MIPS: Implement task_user_regset_view.")
and expose the FIR register using the unused 4 bytes at the end of the
NT_PRFPREG regset.  Without that register included clients cannot use
the PTRACE_GETREGSET request to retrieve the complete FPU register set
and have to resort to one of the older interfaces, either PTRACE_PEEKUSR
or PTRACE_GETFPREGS, to retrieve the missing piece of data.  Also the
register is irreversibly missing from core dumps.

This register is architecturally hardwired and read-only so the write
path does not matter.  Ignore data supplied on writes then.

Fixes: 7aeb753b5353 ("MIPS: Implement task_user_regset_view.")
Signed-off-by: James Hogan <[email protected]>
Signed-off-by: Maciej W. Rozycki <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 3.13+
Patchwork: https://patchwork.linux-mips.org/patch/19273/
Signed-off-by: James Hogan <[email protected]>

MIPS: Fix build with DEBUG_ZBOOT and MACH_JZ4770

The debug definitions were missing for MACH_JZ4770, resulting in a build
failure when DEBUG_ZBOOT was set.

Since the UART addresses are the same across all Ingenic SoCs, we just
use a #ifdef CONFIG_MACH_INGENIC instead of checking for individual
Ingenic SoCs.

Additionally, I added a #define for the UART0 address in-code and
dropped the <asm/mach-jz4740/base.h> include, for the reason that this
include file is slowly being phased out as the whole platform is being
moved to devicetree.

Fixes: 9be5f3e92ed5 ("MIPS: ingenic: Initial JZ4770 support")
Signed-off-by: Paul Cercueil <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 4.16
Patchwork: https://patchwork.linux-mips.org/patch/18957/
Signed-off-by: James Hogan <[email protected]>

MIPS: c-r4k: Fix data corruption related to cache coherence

When DMA will be performed to a MIPS32 1004K CPS, the L1-cache for the
range needs to be flushed and invalidated first.
The code currently takes one of two approaches.
1/ If the range is less than the size of the dcache, then HIT type
   requests flush/invalidate cache lines for the particular addresses.
   HIT-type requests are globalised by the CPS so this is safe on SMP.

2/ If the range is larger than the size of dcache, then INDEX type
   requests flush/invalidate the whole cache. INDEX type requests affect
   the local cache only. CPS does not propagate them in any way. So this
   invalidation is not safe on SMP CPS systems.

Data corruption due to '2' can quite easily be demonstrated by
repeatedly "echo 3 > /proc/sys/vm/drop_caches" and then sha1sum a file
that is several times the size of available memory. Dropping caches
means that large contiguous extents (larger than dcache) are more likely.

This was not a problem before Linux-4.8 because option 2 was never used
if CONFIG_MIPS_CPS was defined. The commit which removed that apparently
didn't appreciate the full consequence of the change.

We could, in theory, globalize the INDEX based flush by sending an IPI
to other cores. However, these cache invalidation routines can be called
with interrupts disabled, and synchronous IPIs require interrupts to be
enabled. Asynchronous IPIs may not trigger writeback soon enough. So we
cannot use IPIs in practice.

We can already test if IPI would be needed for an INDEX operation with
r4k_op_needs_ipi(R4K_INDEX). If this is true then we mustn't try the
INDEX approach as we cannot use IPI. If this is false (e.g. when there
is only one core and hence one L1 cache) then it is safe to use the
INDEX approach without IPI.

This patch avoids option 2 if r4k_op_needs_ipi(R4K_INDEX), and so
eliminates the corruption.
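
The resulting decision can be summarized in a short sketch. The names are
hypothetical, the index_needs_ipi argument stands in for the result of
r4k_op_needs_ipi(R4K_INDEX), and treating a range exactly equal to the
dcache size as "large" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_flush {
	DEMO_FLUSH_HIT,		/* per-address ops, globalised by the CPS */
	DEMO_FLUSH_INDEX,	/* whole-cache ops, local cache only */
};

/*
 * Pick the flush strategy for a DMA range: INDEX ops are only chosen when
 * the range is large and they would not need an IPI to reach other cores.
 */
static enum demo_flush demo_choose_flush(size_t range, size_t dcache_size,
					 bool index_needs_ipi)
{
	if (range >= dcache_size && !index_needs_ipi)
		return DEMO_FLUSH_INDEX;
	return DEMO_FLUSH_HIT;
}
```

On a multi-core CPS system index_needs_ipi is true, so even ranges larger
than the dcache fall back to HIT-type operations.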

Fixes: c00ab4896ed5 ("MIPS: Remove cpu_has_safe_index_cacheops")
Signed-off-by: NeilBrown <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 4.8+
Patchwork: https://patchwork.linux-mips.org/patch/19259/
Signed-off-by: James Hogan <[email protected]>