Git Repo - linux.git/log

loop: make do_req_filebacked more robust

Use a switch statement to iterate over the possible operations and
error out if it's an incorrect one.
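
As a rough userspace sketch of the pattern described above (not the loop driver's actual code), the dispatch can look like the following. The `demo_*` names, the operation codes, and the choice of `-EIO` as the error value are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation codes standing in for the kernel's request ops. */
enum demo_op { DEMO_OP_READ, DEMO_OP_WRITE, DEMO_OP_FLUSH, DEMO_OP_DISCARD };

struct demo_req {
	int op;
};

/* Placeholder handlers; each one returns 0 for success. */
static int demo_read(const struct demo_req *rq)    { (void)rq; return 0; }
static int demo_write(const struct demo_req *rq)   { (void)rq; return 0; }
static int demo_flush(const struct demo_req *rq)   { (void)rq; return 0; }
static int demo_discard(const struct demo_req *rq) { (void)rq; return 0; }

/*
 * Every supported operation gets its own case. Anything else is rejected
 * with an error instead of falling through to some default handler.
 */
static int demo_do_req(const struct demo_req *rq)
{
	assert(rq != NULL);

	switch (rq->op) {
	case DEMO_OP_READ:
		return demo_read(rq);
	case DEMO_OP_WRITE:
		return demo_write(rq);
	case DEMO_OP_FLUSH:
		return demo_flush(rq);
	case DEMO_OP_DISCARD:
		return demo_discard(rq);
	default:
		/* Unknown operation: fail loudly rather than guess. */
		return -EIO;
	}
}
```

The point of the explicit `default` is that a newly introduced operation cannot be mistaken for a read or a write by accident.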

Signed-off-by: Jens Axboe <[email protected]>

loop: don't try to use AIO for discards

Fix a fat-fingered conversion to the req_op accessors, and also
use a switch statement to make it more obvious what is being checked.

Signed-off-by: Christoph Hellwig <[email protected]>
Reported-by: Dave Chinner <[email protected]>
Fixes: c2df40 ("drivers: use req op accessor")
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

blk-mq: fix deadlock in blk_mq_register_disk() error path

If we fail registering any of the hardware queues, we call
into blk_mq_unregister_disk() with the hotplug mutex already
held. Since blk_mq_unregister_disk() attempts to acquire the
same mutex, we end up in a less than happy place.
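
One common way out of this kind of self-deadlock is to split the teardown into a variant that expects the lock to be held and a wrapper that takes it. The sketch below models that in userspace with a toy non-recursive lock; every `demo_*` name is hypothetical and this is not the blk-mq code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: taking it twice on one path is the bug. */
struct demo_mutex {
	bool held;
};

static bool demo_lock(struct demo_mutex *m)
{
	if (m->held)
		return false;	/* a real mutex would block forever here */
	m->held = true;
	return true;
}

static void demo_unlock(struct demo_mutex *m)
{
	assert(m->held);
	m->held = false;
}

struct demo_disk {
	struct demo_mutex hotplug;
	int registered_queues;
};

/* Teardown body; the caller must already hold d->hotplug. */
static void demo_unregister_locked(struct demo_disk *d)
{
	assert(d->hotplug.held);
	d->registered_queues = 0;
}

/* Public teardown: takes the lock itself. */
static bool demo_unregister(struct demo_disk *d)
{
	if (!demo_lock(&d->hotplug))
		return false;
	demo_unregister_locked(d);
	demo_unlock(&d->hotplug);
	return true;
}

/* Register nr queues; the queue at index fail_at (if < nr) fails. */
static int demo_register(struct demo_disk *d, int nr, int fail_at)
{
	int ret = 0;

	if (!demo_lock(&d->hotplug))
		return -1;
	for (int i = 0; i < nr; i++) {
		if (i == fail_at) {
			ret = -1;
			break;
		}
		d->registered_queues++;
	}
	if (ret)
		demo_unregister_locked(d);	/* lock is held: use the locked variant */
	demo_unlock(&d->hotplug);
	return ret;
}
```

Calling `demo_unregister()` from the error path instead would try to take `hotplug` a second time, which is the situation the commit message describes.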

Reported-by: Jinpu Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Include: blkdev: Removed duplicate 'struct request;' declaration.

In include/linux/blkdev.h duplicate declarations of the request
struct exist. Cleaned up by removing the second, unneeded
declaration.

Signed-off-by: John Pittman <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Fixup direct bi_rw modifiers

Callers should use bio_set_op_attrs() to set bi_rw rather than modifying
it directly.

Signed-off-by: Shaun Tancheff <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Mike Christie <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

block: fix bdi vs gendisk lifetime mismatch

The name for a bdi of a gendisk is derived from the gendisk's devt.
However, since the gendisk is destroyed before the bdi it leaves a
window where a new gendisk could dynamically reuse the same devt while a
bdi with the same name is still live.  Arrange for the bdi to hold a
reference against its "owner" disk device while it is registered.
Otherwise we can hit sysfs duplicate name collisions like the following:

WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80
sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1'

Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015
  0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec
  ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351
  0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8
Call Trace:
  [<ffffffff8134caec>] dump_stack+0x63/0x87
  [<ffffffff8108c351>] __warn+0xd1/0xf0
  [<ffffffff8108c3cf>] warn_slowpath_fmt+0x5f/0x80
  [<ffffffff812a0d34>] sysfs_warn_dup+0x64/0x80
  [<ffffffff812a0e1e>] sysfs_create_dir_ns+0x7e/0x90
  [<ffffffff8134faaa>] kobject_add_internal+0xaa/0x320
  [<ffffffff81358d4e>] ? vsnprintf+0x34e/0x4d0
  [<ffffffff8134ff55>] kobject_add+0x75/0xd0
  [<ffffffff816e66b2>] ? mutex_lock+0x12/0x2f
  [<ffffffff8148b0a5>] device_add+0x125/0x610
  [<ffffffff8148b788>] device_create_groups_vargs+0xd8/0x100
  [<ffffffff8148b7cc>] device_create_vargs+0x1c/0x20
  [<ffffffff811b775c>] bdi_register+0x8c/0x180
  [<ffffffff811b7877>] bdi_register_dev+0x27/0x30
  [<ffffffff813317f5>] add_disk+0x175/0x4a0

Cc: <[email protected]>
Reported-by: Yi Zhang <[email protected]>
Tested-by: Yi Zhang <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Fixed up missing 0 return in bdi_register_owner().

Signed-off-by: Jens Axboe <[email protected]>

blk-mq: Allow timeouts to run while queue is freezing

In case a submitted request gets stuck for some reason, the block layer
can prevent the request starvation by starting the scheduled timeout work.
If this stuck request occurs at the same time another thread has started
a queue freeze, the blk_mq_timeout_work will not be able to acquire the
queue reference and will return silently, thus not issuing the timeout.
But since the request is already holding a q_usage_counter reference and
is unable to complete, it will never release its reference, preventing
the queue from completing the freeze started by first thread.  This puts
the request_queue in a hung state, forever waiting for the freeze
completion.

This was observed while running IO to an NVMe device at the same time we
toggled the CPU hotplug code. Eventually, once a request got stuck
requiring a timeout during a queue freeze, we saw the CPU Hotplug
notification code get stuck inside blk_mq_freeze_queue_wait, as shown in
the trace below.

[c000000deaf13690] [c000000deaf13738] 0xc000000deaf13738 (unreliable)
[c000000deaf13860] [c000000000015ce8] __switch_to+0x1f8/0x350
[c000000deaf138b0] [c000000000ade0e4] __schedule+0x314/0x990
[c000000deaf13940] [c000000000ade7a8] schedule+0x48/0xc0
[c000000deaf13970] [c0000000005492a4] blk_mq_freeze_queue_wait+0x74/0x110
[c000000deaf139e0] [c00000000054b6a8] blk_mq_queue_reinit_notify+0x1a8/0x2e0
[c000000deaf13a40] [c0000000000e7878] notifier_call_chain+0x98/0x100
[c000000deaf13a90] [c0000000000b8e08] cpu_notify_nofail+0x48/0xa0
[c000000deaf13ac0] [c0000000000b92f0] _cpu_down+0x2a0/0x400
[c000000deaf13b90] [c0000000000b94a8] cpu_down+0x58/0xa0
[c000000deaf13bc0] [c0000000006d5dcc] cpu_subsys_offline+0x2c/0x50
[c000000deaf13bf0] [c0000000006cd244] device_offline+0x104/0x140
[c000000deaf13c30] [c0000000006cd40c] online_store+0x6c/0xc0
[c000000deaf13c80] [c0000000006c8c78] dev_attr_store+0x68/0xa0
[c000000deaf13cc0] [c0000000003974d0] sysfs_kf_write+0x80/0xb0
[c000000deaf13d00] [c0000000003963e8] kernfs_fop_write+0x188/0x200
[c000000deaf13d50] [c0000000002e0f6c] __vfs_write+0x6c/0xe0
[c000000deaf13d90] [c0000000002e1ca0] vfs_write+0xc0/0x230
[c000000deaf13de0] [c0000000002e2cdc] SyS_write+0x6c/0x110
[c000000deaf13e30] [c000000000009204] system_call+0x38/0xb4

The fix is to allow the timeout work to execute in the window between
dropping the initial refcount reference and the release of the last
reference, which actually marks the freeze completion.  This can be
achieved with percpu_ref_tryget, which does not require the counter
to be alive.  This way the timeout work can do its job and terminate a
stuck request even during a freeze, returning its reference and avoiding
the deadlock.
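
The difference between the two kinds of "get" can be illustrated with a deliberately simplified, single-threaded model. The kernel's percpu_ref_tryget() and percpu_ref_tryget_live() are per-CPU and atomic; the `demo_*` functions below are hypothetical stand-ins that only mirror the semantics described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified reference counter: one initial reference plus a dying flag. */
struct demo_ref {
	long count;
	bool dying;
};

/* Succeeds only while the counter has not been marked as dying. */
static bool demo_tryget_live(struct demo_ref *r)
{
	if (r->dying || r->count == 0)
		return false;
	r->count++;
	return true;
}

/* Succeeds as long as at least one reference is still outstanding. */
static bool demo_tryget(struct demo_ref *r)
{
	if (r->count == 0)
		return false;
	r->count++;
	return true;
}

static void demo_put(struct demo_ref *r)
{
	assert(r->count > 0);
	r->count--;
}

/* Start a freeze: mark the counter dying and drop the initial reference. */
static void demo_kill(struct demo_ref *r)
{
	r->dying = true;
	demo_put(r);
}

/* The freeze is complete once the last reference has been released. */
static bool demo_frozen(const struct demo_ref *r)
{
	return r->dying && r->count == 0;
}
```

With a stuck request holding one reference, `demo_kill()` leaves the count at 1. In this model `demo_tryget_live()` now fails, which corresponds to the timeout work returning silently, while `demo_tryget()` still succeeds, so the timeout can run, terminate the request, and let the count reach zero.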

Allowing the timeout to run is just a part of the fix, since for some
devices, we might get stuck again inside the device driver's timeout
handler, should it attempt to allocate a new request in that path -
which is a quite common action for Abort commands, which need to be sent
after a timeout.  In NVMe, for instance, we call blk_mq_alloc_request
from inside the timeout handler, which will fail during a freeze, since
it also tries to acquire a queue reference.

I considered a similar change to blk_mq_alloc_request as a generic
solution for further device driver hangs, but we can't do that, since it
would allow new requests to disturb the freeze process.  I thought about
creating a new function in the block layer to support unfreezable
requests for these occasions, but after working on it for a while, I
feel like this should be handled on a per-driver basis.  I'm now
experimenting with changes to the NVMe timeout path, but I'm open to
suggestions of ways to make this generic.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Cc: Brian King <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

nbd: fix race in ioctl

Quentin ran into this bug:

WARNING: CPU: 64 PID: 10085 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x65/0x80
sysfs: cannot create duplicate filename '/devices/virtual/block/nbd3/pid'
Modules linked in: nbd
CPU: 64 PID: 10085 Comm: qemu-nbd Tainted: G D 4.6.0+ #7
0000000000000000 ffff8820330bba68 ffffffff814b8791 ffff8820330bbac8
0000000000000000 ffff8820330bbab8 ffffffff810d04ab ffff8820330bbaa8
0000001f00000296 0000000000017681 ffff8810380bf000 ffffffffa0001790
Call Trace:
[<ffffffff814b8791>] dump_stack+0x4d/0x6c
[<ffffffff810d04ab>] __warn+0xdb/0x100
[<ffffffff810d0574>] warn_slowpath_fmt+0x44/0x50
[<ffffffff81218c65>] sysfs_warn_dup+0x65/0x80
[<ffffffff81218a02>] sysfs_add_file_mode_ns+0x172/0x180
[<ffffffff81218a35>] sysfs_create_file_ns+0x25/0x30
[<ffffffff81594a76>] device_create_file+0x36/0x90
[<ffffffffa0000e8d>] __nbd_ioctl+0x32d/0x9b0 [nbd]
[<ffffffff814cc8e8>] ? find_next_bit+0x18/0x20
[<ffffffff810f7c29>] ? select_idle_sibling+0xe9/0x120
[<ffffffff810f6cd7>] ? __enqueue_entity+0x67/0x70
[<ffffffff810f9bf0>] ? enqueue_task_fair+0x630/0xe20
[<ffffffff810efa76>] ? resched_curr+0x36/0x70
[<ffffffff810f0078>] ? check_preempt_curr+0x78/0x90
[<ffffffff810f00a2>] ? ttwu_do_wakeup+0x12/0x80
[<ffffffff810f01b1>] ? ttwu_do_activate.constprop.86+0x61/0x70
[<ffffffff810f0c15>] ? try_to_wake_up+0x185/0x2d0
[<ffffffff810f0d6d>] ? default_wake_function+0xd/0x10
[<ffffffff81105471>] ? autoremove_wake_function+0x11/0x40
[<ffffffffa0001577>] nbd_ioctl+0x67/0x94 [nbd]
[<ffffffff814ac0fd>] blkdev_ioctl+0x14d/0x940
[<ffffffff811b0da2>] ? put_pipe_info+0x22/0x60
[<ffffffff811d96cc>] block_ioctl+0x3c/0x40
[<ffffffff811ba08d>] do_vfs_ioctl+0x8d/0x5e0
[<ffffffff811aa329>] ? ____fput+0x9/0x10
[<ffffffff810e9092>] ? task_work_run+0x72/0x90
[<ffffffff811ba627>] SyS_ioctl+0x47/0x80
[<ffffffff8185f5df>] entry_SYSCALL_64_fastpath+0x17/0x93
---[ end trace 7899b295e4f850c8 ]---

It seems fairly obvious that device_create_file() is not being protected
from being run concurrently on the same nbd.

Quentin found the following relevant commits:

1a2ad21 nbd: add locking to nbd_ioctl
90b8f28 [PATCH] end of methods switch: remove the old ones
d4430d6 [PATCH] beginning of methods conversion
08f8585 [PATCH] move block_device_operations to blkdev.h

It would seem that the race was introduced in the process of moving nbd
from BKL to unlocked ioctls.

By setting nbd->task_recv while the mutex is held, we can prevent other
processes from running concurrently (since nbd->task_recv is also checked
while the mutex is held).

Reported-and-tested-by: Quentin Casasnovas <[email protected]>
Cc: Markus Pargmann <[email protected]>
Cc: Paul Clements <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

block: fix use-after-free in seq file

I got a KASAN report of use-after-free:

    ==================================================================
    BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
    Read of size 8 by task trinity-c1/315
    =============================================================================
    BUG kmalloc-32 (Not tainted): kasan: bad access detected
    -----------------------------------------------------------------------------

    Disabling lock debugging due to kernel taint
    INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
            ___slab_alloc+0x4f1/0x520
            __slab_alloc.isra.58+0x56/0x80
            kmem_cache_alloc_trace+0x260/0x2a0
            disk_seqf_start+0x66/0x110
            traverse+0x176/0x860
            seq_read+0x7e3/0x11a0
            proc_reg_read+0xbc/0x180
            do_loop_readv_writev+0x134/0x210
            do_readv_writev+0x565/0x660
            vfs_readv+0x67/0xa0
            do_preadv+0x126/0x170
            SyS_preadv+0xc/0x10
            do_syscall_64+0x1a1/0x460
            return_from_SYSCALL_64+0x0/0x6a
    INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
            __slab_free+0x17a/0x2c0
            kfree+0x20a/0x220
            disk_seqf_stop+0x42/0x50
            traverse+0x3b5/0x860
            seq_read+0x7e3/0x11a0
            proc_reg_read+0xbc/0x180
            do_loop_readv_writev+0x134/0x210
            do_readv_writev+0x565/0x660
            vfs_readv+0x67/0xa0
            do_preadv+0x126/0x170
            SyS_preadv+0xc/0x10
            do_syscall_64+0x1a1/0x460
            return_from_SYSCALL_64+0x0/0x6a

    CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G    B           4.7.0+ #62
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
     ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
     ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
     ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
    Call Trace:
     [<ffffffff81d6ce81>] dump_stack+0x65/0x84
     [<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0
     [<ffffffff814704ff>] object_err+0x2f/0x40
     [<ffffffff814754d1>] kasan_report_error+0x221/0x520
     [<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40
     [<ffffffff83888161>] klist_iter_exit+0x61/0x70
     [<ffffffff82404389>] class_dev_iter_exit+0x9/0x10
     [<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50
     [<ffffffff8151f812>] seq_read+0x4b2/0x11a0
     [<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180
     [<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210
     [<ffffffff814b4c45>] do_readv_writev+0x565/0x660
     [<ffffffff814b8a17>] vfs_readv+0x67/0xa0
     [<ffffffff814b8de6>] do_preadv+0x126/0x170
     [<ffffffff814b92ec>] SyS_preadv+0xc/0x10

This problem can occur in the following situation:

open()
- pread()
    - .seq_start()
       - iter = kmalloc() // succeeds
       - seqf->private = iter
    - .seq_stop()
       - kfree(seqf->private)
- pread()
    - .seq_start()
       - iter = kmalloc() // fails
    - .seq_stop()
       - class_dev_iter_exit(seqf->private) // boom! old pointer

As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.

An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.
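
The sequence above can be modelled in userspace as follows. This is a sketch under stated assumptions, not the genhd code: `demo_start()` and `demo_stop()` are hypothetical stand-ins for the seq_file callbacks, and the `fail_alloc` flag simulates a failing kmalloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_iter {
	int pos;
};

struct demo_seq {
	void *private;
};

/* Start: allocate an iterator; fail_alloc simulates kmalloc() failing. */
static int demo_start(struct demo_seq *s, bool fail_alloc)
{
	struct demo_iter *it;

	/* The previous stop must have cleared the pointer. */
	assert(s->private == NULL);

	it = fail_alloc ? NULL : malloc(sizeof(*it));
	if (!it)
		return -1;	/* s->private is left untouched on failure */
	it->pos = 0;
	s->private = it;
	return 0;
}

/*
 * Stop: called even when start failed. Returns true if an iterator was
 * actually released.
 */
static bool demo_stop(struct demo_seq *s)
{
	struct demo_iter *it = s->private;

	if (!it)
		return false;
	free(it);
	s->private = NULL;	/* the fix: never leave a dangling pointer behind */
	return true;
}
```

Without the `s->private = NULL` line, the second `demo_stop()` after a failed `demo_start()` would free the stale pointer again, which is the use-after-free reported by KASAN.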

Cc: [email protected]
Signed-off-by: Vegard Nossum <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

f2fs: drop bio->bi_rw manual assignment

Merge 4fc29c1aa375 included this extra line, but it's not needed (or
useful) since we'll call bio_set_op_attrs() right after to properly set
the op and flags for the bio.

Signed-off-by: Jens Axboe <[email protected]>

block: add missing group association in bio-cloning functions

When a bio is cloned, the newly created bio must be associated with
the same blkcg as the original bio (if BLK_CGROUP is enabled). If
this operation is not performed, then the new bio is not associated
with any group, and the group of the current task is returned when
the group of the bio is requested.

Depending on the cloning frequency, this may cause a large
percentage of the bios belonging to a given group to be treated
as if belonging to other groups (in most cases as if belonging to
the root group). The expected group isolation may thereby be broken.

This commit adds the missing association in bio-cloning functions.
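
A toy model of the fallback behaviour may make the effect easier to see. All types and functions here are hypothetical, and the `associate` flag exists only to contrast the behaviour with and without the association.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_cgroup {
	const char *name;
};

struct demo_bio {
	struct demo_cgroup *cg;	/* NULL means "not associated" */
	unsigned int size;
};

/* Stand-in for the group of the task doing the cloning (often root). */
static struct demo_cgroup demo_root = { "root" };

/* Lookup: fall back to the current task's group if the bio has none. */
static struct demo_cgroup *demo_bio_group(const struct demo_bio *bio,
					  struct demo_cgroup *current_cg)
{
	assert(bio != NULL);
	return bio->cg ? bio->cg : current_cg;
}

/* Clone src into dst, optionally carrying over the group association. */
static void demo_clone(struct demo_bio *dst, const struct demo_bio *src,
		       bool associate)
{
	assert(dst != NULL && src != NULL);
	dst->size = src->size;
	dst->cg = associate ? src->cg : NULL;
}
```

With `associate` set to false, a clone of a bio owned by some group is accounted to whatever group the cloning task belongs to, which is how the isolation described above gets lost.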

Fixes: da2f0f74cf7d ("Btrfs: add support for blkio controllers")
Cc: [email protected] # v4.3+
Signed-off-by: Paolo Valente <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Jeff Moyer <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

blkcg: kill unused field nr_undestroyed_grps

'nr_undestroyed_grps' in struct throtl_data was used to count
the number of throtl_grp associated with throtl_data, but now
throtl_grp is tracked by blkcg_gq, so the field is no longer needed.

Signed-off-by: Hou Tao <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

writeback: Write dirty times for WB_SYNC_ALL writeback

Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
queue_io() so that inodes which have only dirty timestamps are properly
written on fsync(2) and sync(2). However there are other call sites -
most notably going through write_inode_now() - which expect the inode to be
clean after WB_SYNC_ALL writeback. This is not currently true as we do
not clear I_DIRTY_TIME in __writeback_single_inode() even for
WB_SYNC_ALL writeback in all the cases. This then resulted in the
following oops because bdev_write_inode() did not clean the inode and
writeback code later stumbled over a dirty inode with detached wb.

  general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
  Modules linked in:
  CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: writeback wb_workfn (flush-11:0)
  task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000
  RIP: 0010:[<ffffffff818884d2>]  [<ffffffff818884d2>]
  locked_inode_to_wb_and_lock_list+0xa2/0x750
  RSP: 0018:ffff88006cdaf7d0  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050
  RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286
  RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000
  R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0
  R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40
  FS:  0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0
  DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
  Stack:
   ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948
   ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948
   ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e
  Call Trace:
   [<     inline     >] inode_to_wb_and_lock_list fs/fs-writeback.c:309
   [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554
   [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600
   [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709
   [<     inline     >] wb_do_writeback fs/fs-writeback.c:1844
   [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884
   [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094
   [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228
   [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
   [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
  Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e
  00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42>
  80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3
  RIP  [<     inline     >] wb_get include/linux/backing-dev-defs.h:212
  RIP  [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750
  fs/fs-writeback.c:281
   RSP <ffff88006cdaf7d0>
  ---[ end trace 986a4d314dcb2694 ]---

Fix the problem by making sure __writeback_single_inode() also writes
inodes that have only dirty timestamps when in WB_SYNC_ALL mode.

Reported-by: Dmitry Vyukov <[email protected]>
Tested-by: Laurent Dufour <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

floppy: fix open(O_ACCMODE) for ioctl-only open

Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used by the setfdprm userspace utility for ioctl-only open().

Restore the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.

Cc: [email protected] # v4.5+
Reported-by: Wim Osterholt <[email protected]>
Tested-by: Wim Osterholt <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

metag: Fix __cmpxchg_u32 asm constraint for CMP

The LNKGET based atomic sequence in __cmpxchg_u32 has slightly incorrect
constraints for the return value which under certain circumstances can
allow an address unit register to be used as the first operand of a CMP
instruction. This isn't a valid instruction however as the encodings
only allow a data unit to be specified. This would result in an
assembler error like the following:

Error: failed to assemble instruction: "CMP A0.2,D0Ar6"

Fix by changing the constraint from "=&da" (assigned, early clobbered,
data or address unit register) to "=&d" (data unit register only).

The constraint for the second operand, "bd" (an op2 register where op1
is a data unit register and the instruction supports O2R) is already
correct assuming the first operand is a data unit register.

Other cases of CMP in inline asm have had their constraints checked, and
appear to all be fine.

Fixes: 6006c0d8ce94 ("metag: Atomics, locks and bitops")
Signed-off-by: James Hogan <[email protected]>
Cc: [email protected]
Cc: <[email protected]> # 3.9.x-

Input: silead - remove some dead code

buf[0] is an unsigned char. touch_nr is an int. The test for negative
here doesn't make sense so I have removed it.
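
The reasoning can be checked with a few lines of standard C. The function name below is hypothetical; only the types (an unsigned char buffer and an int count) come from the description above.

```c
#include <assert.h>

/*
 * Reading an unsigned char into an int goes through integer promotion,
 * so the result is always in the range 0..UCHAR_MAX and a "< 0" test on
 * it can never be true.
 */
static int demo_touch_count(const unsigned char *buf)
{
	int touch_nr = buf[0];

	assert(touch_nr >= 0);	/* always holds, hence the removed check */
	return touch_nr;
}
```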

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>

Input: sis-i2c - select CONFIG_CRC_ITU_T

The newly added sis_i2c driver fails to link without the CRC_ITU_T
driver enabled:

drivers/input/touchscreen/sis_i2c.o: In function `sis_ts_irq_handler':
sis_i2c.c:(.text+0xc0): undefined reference to `crc_itu_t'

This adds a Kconfig select statement.

Signed-off-by: Arnd Bergmann <[email protected]>
Fixes: a485cb037fe6 ("Input: add driver for SiS 9200 family I2C touchscreen controllers")
Signed-off-by: Dmitry Torokhov <[email protected]>

Merge branches 'misc' and 'rxe' into k.o/for-4.8-1

Soft RoCE driver

Soft RoCE (RXE) - The software RoCE driver

ib_rxe implements the RDMA transport and registers to the RDMA core
device as a kernel verbs provider. It also implements the packet IO
layer. On the other hand ib_rxe registers to the Linux netdev stack
as a UDP encapsulating protocol, in this case for RDMA, for sending and
receiving packets over any Ethernet device.  This yields an RDMA
transport over the UDP/Ethernet network layer, forming a RoCEv2
compatible device.

The configuration procedure of the Soft RoCE drivers requires
binding to an existing Ethernet network device. This is done through the
/sys interface.

A userspace Soft RoCE library (librxe) provides user applications
the ability to run with Soft RoCE devices.  The use of rxe verbs in
user space requires the inclusion of librxe as a device-specific
plug-in to libibverbs. librxe is packaged separately.

Architecture:

     +-----------------------------------------------------------+
     |                          Application                      |
     +-----------------------------------------------------------+
                            +-----------------------------------+
                            |             libibverbs            |
User                        +-----------------------------------+
                            +----------------+ +----------------+
                            | librxe         | | HW RoCE lib    |
                            +----------------+ +----------------+
+---------------------------------------------------------------+
     +--------------+                           +------------+
     | Sockets      |                           | RDMA ULP   |
     +--------------+                           +------------+
     +--------------+                  +---------------------+
     | TCP/IP       |                  | ib_core             |
     +--------------+                  +---------------------+
                             +------------+ +----------------+
Kernel                       | ib_rxe     | | HW RoCE driver |
                             +------------+ +----------------+
     +------------------------------------+
     | NIC driver                         |
     +------------------------------------+

Soft RoCE resources:

[1] https://github.com/SoftRoCE/librxe-dev librxe - source code in
GitHub
[2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
Wiki page
[3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library

Signed-off-by: Kamal Heib <[email protected]>
Signed-off-by: Amir Vadai <[email protected]>
Signed-off-by: Moni Shoua <[email protected]>
Reviewed-by: Haggai Eran <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

dm raid: fix use of wrong status char during resynchronization

During a resynchronization, device status char 'a' is output on the raid
status line for every device of a RAID set. It changes from 'a' to 'A'
(unless the device has failed) when the resynchronization completes.

Interrupting and restarting a resynchronization, by reloading the DM
table, erroneously led to status char 'A'.

Fix this by avoiding setting the MD_RECOVERY_REQUESTED flag in
raid_preresume().

Signed-off-by: Heinz Mauelshagen <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>

Merge tag 'media/v4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media DocBook removal and some fixups from Mauro Carvalho Chehab:

  - removal of the media DocBook (since it's all in Sphinx now)

  - videobuf2: Fix an allocation regression

  - a few fixes related to the CEC drivers

* tag 'media/v4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] cec: fix off-by-one memset
  [media] staging: add MEDIA_SUPPORT dependency
  [media] vivid: don't handle CEC_MSG_SET_STREAM_PATH
  [media] media: adv7180: Fix broken interrupt register access
  [media] vb2: Fix allocation size of dma_parms
  [media] vim2m: copy the other colorspace-related fields as well
  [media] adv7511: fix VIC autodetect
  doc-rst: Remove the media docbook

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
"The only interesting thing here is Jessica's patch to add
  ro_after_init support to modules.  The rest are all trivia"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  extable.h: add stddef.h so "NULL" definition is not implicit
  modules: add ro_after_init support
  jump_label: disable preemption around __module_text_address().
  exceptions: fork exception table content from module.h into extable.h
  modules: Add kernel parameter to blacklist modules
  module: Do a WARN_ON_ONCE() for assert module mutex not held
  Documentation/module-signing.txt: Note need for version info if reusing a key
  module: Invalidate signatures on force-loaded modules
  module: Issue warnings when tainting kernel
  module: fix redundant test.
  module: fix noreturn attribute for __module_put_and_exit()

Merge branch 'akpm' (patches from Andrew)

Merge even more updates from Andrew Morton:

- dma-mapping API cleanup

- a few cleanups and misc things

- use jump labels in dynamic-debug

* emailed patches from Andrew Morton <[email protected]>:
  dynamic_debug: add jump label support
  jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL
  arm: jump label may reference text in __exit
  tile: support static_key usage in non-module __exit sections
  sparc: support static_key usage in non-module __exit sections
  powerpc: add explicit #include <asm/asm-compat.h> for jump label
  drivers/media/dvb-frontends/cxd2841er.c: avoid misleading gcc warning
  MAINTAINERS: update email and list of Samsung HW driver maintainers
  block: remove BLK_DEV_DAX config option
  samples/kretprobe: fix the wrong type
  samples/kretprobe: convert the printk to pr_info/pr_err
  samples/jprobe: convert the printk to pr_info/pr_err
  samples/kprobe: convert the printk to pr_info/pr_err
  dma-mapping: use unsigned long for dma_attrs
  media: mtk-vcodec: remove unused dma_attrs
  include/linux/bitmap.h: cleanup
  tree-wide: replace config_enabled() with IS_ENABLED()
  drivers/fpga/Kconfig: fix build failure

dynamic_debug: add jump label support

Although dynamic debug is often only used for debug builds, sometimes
it is enabled for production builds as well.  Minimize its impact by using
jump labels.  This reduces the text section by 7000+ bytes in the kernel
image below.  It does increase data, but this should only be referenced
when changing the direction of the branches, and hence usually not in
cache.

     text     data     bss       dec     hex  filename
  8194852  4879776  925696  14000324  d5a0c4  vmlinux.pre
  8187337  4960224  925696  14073257  d6bda9  vmlinux.post
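
Working through the table above: text shrinks by 7515 bytes, data grows by 80448 bytes, and the total grows by 72933 bytes. The small helper below, with hypothetical names, reproduces the "dec" column from the three section sizes.

```c
#include <assert.h>

/* Section sizes in bytes, as printed by size(1). */
struct demo_sizes {
	long text;
	long data;
	long bss;
};

/* The "dec" column is simply the sum of the three sections. */
static long demo_total(struct demo_sizes s)
{
	assert(s.text >= 0 && s.data >= 0 && s.bss >= 0);
	return s.text + s.data + s.bss;
}
```

For example, `demo_total()` applied to the vmlinux.pre row (8194852, 4879776, 925696) gives 14000324, matching the table.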

Link: http://lkml.kernel.org/r/d165b465e8c89bc582d973758d40be44c33f018b.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL

The current jump_label.h includes bug.h for things such as WARN_ON().
This makes the header problematic for inclusion by kernel.h or any
headers that kernel.h includes, since bug.h includes kernel.h (circular
dependency). The inclusion of atomic.h is similarly problematic. Thus,
this should make jump_label.h 'includable' from most places.

Link: http://lkml.kernel.org/r/7060ce35ddd0d20b33bf170685e6b0fab816bdf2.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arm: jump label may reference text in __exit

The jump table can reference text found in an __exit section. Thus,
instead of discarding it at build time, include EXIT_TEXT as part of
__init and it will be released when the system boots.

Link: http://lkml.kernel.org/r/60284113bb759121e8ae3e99af1535647e52123f.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

tile: support static_key usage in non-module __exit sections

Previously, all the __exit sections were just dropped by the link phase.
However, if there are static_key (jump label) constructs in __exit
sections that are not modules, the link fails with the message:

`.exit.text' referenced in section `__jump_table' of xxx.o:
defined in discarded section `.exit.text' of xxx.o

Support this usage by keeping the .exit.text sections in the final image
if JUMP_LABEL is defined, then discarding them once initialization is
complete.

Link: http://lkml.kernel.org/r/bfd7c107c610c30e992868ebfe2a5d796a097464.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <[email protected]>
Signed-off-by: Chris Metcalf <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

sparc: support static_key usage in non-module __exit sections

The jump table can reference text found in an __exit section.  Thus,
instead of discarding it at build/link time, include EXIT_TEXT as part
of __init and release it at system boot time.

Without this patch the link fails with:

    `.exit.text' referenced in section `__jump_table' of xxx.o:
    defined in discarded section `.exit.text' of xxx.o

Link: http://lkml.kernel.org/r/d822da427ab07a02a394602eca687104ff682f83.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

powerpc: add explicit #include <asm/asm-compat.h> for jump label

The stringify_in_c() macro may not be included. Make the dependency
explicit.

Link: http://lkml.kernel.org/r/564720c5328edd53c9d56db325be7215440eec3e.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/media/dvb-frontends/cxd2841er.c: avoid misleading gcc warning

The addition of jump label support in dynamic_debug caused an unexpected
warning in exactly one file in the kernel:

  drivers/media/dvb-frontends/cxd2841er.c: In function 'cxd2841er_tune_tc':
  include/linux/dynamic_debug.h:134:3: error: 'carrier_offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     __dynamic_dev_dbg(&descriptor, dev, fmt, \
     ^~~~~~~~~~~~~~~~~
  drivers/media/dvb-frontends/cxd2841er.c:3177:11: note: 'carrier_offset' was declared here
    int ret, carrier_offset;
             ^~~~~~~~~~~~~~

The problem seems to be that the compiler gets confused by the extra
conditionals in static_branch_unlikely, to the point where it can no
longer keep track of which branches have already been taken, and it
doesn't realize that this variable is now always initialized when it
gets used.

I have done lots of randconfig kernel builds and could not find any
other file with this behavior, so I assume it's a rare enough glitch
that we don't need to change the jump label support but instead just
work around the warning in the driver.

To achieve that, I'm moving the check for the return value into the
switch() statement, which is an obvious transformation, but is enough to
un-confuse the compiler here.  The resulting code is not as nice to
read, but at least we retain the behavior of warning if it gets changed
to actually access an uninitialized carrier offset value in the future.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Abylay Ospan <[email protected]>
Cc: Sergey Kozlov <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Jason Baron <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

MAINTAINERS: update email and list of Samsung HW driver maintainers

Change my email address in the MAINTAINERS file.
Add new maintainers of selected Samsung HW drivers.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Kamil Debski <[email protected]>
Reviewed-by: Jean Delvare <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Kishon Vijay Abraham I <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Andrzej Hajda <[email protected]>
Cc: Lukasz Majewski <[email protected]>
Cc: Sylwester Nawrocki <[email protected]>
Cc: Kamil Debski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

block: remove BLK_DEV_DAX config option

The functionality for block device DAX was already removed with commit
acc93d30d7d4 ("Revert "block: enable dax for raw block devices"")

However, we still had a config option hanging around that was always
disabled because it depended on CONFIG_BROKEN. This config option was
introduced in commit 03cdadb04077 ("block: disable block device DAX by
default")

This change reverts that commit, removing the dead config option.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Cc: Dave Hansen <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

samples/kretprobe: fix the wrong type

regs_return_value() returns an "unsigned long" or "long" value, but retval
is currently an int, which may overflow. The log may then read:

[ 2911.078869] do_brk returned -2003877888 and took 4620 ns to execute

This patch converts retval to "unsigned long", which fixes the overflow
issue.
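
The narrowing can be reproduced in user space. The following is a minimal
sketch, not the sample's code: the ex_* names are hypothetical, and the
input 0x888f4000 is simply a value whose low 32 bits produce the number
shown in the log above. Converting an out-of-range value to int is
implementation-defined in C; GCC and Clang wrap modulo 2^32.

```c
#include <assert.h>
#include <stdio.h>

/* Old sample style (hypothetical helper): store the value in an int. */
static int ex_retval_as_int(unsigned long raw)
{
	/* Implementation-defined narrowing; GCC and Clang wrap modulo 2^32. */
	return (int)raw;
}

/* New sample style (hypothetical helper): keep the full width. */
static unsigned long ex_retval_as_ulong(unsigned long raw)
{
	return raw;
}

static void ex_report(unsigned long raw)
{
	printf("as int: %d, as unsigned long: %#lx\n",
	       ex_retval_as_int(raw), ex_retval_as_ulong(raw));
}

/* Usage: the narrowed value matches the negative number in the log. */
void ex_retval_demo(void)
{
	assert(ex_retval_as_int(0x888f4000UL) == -2003877888);
	assert(ex_retval_as_ulong(0x888f4000UL) == 0x888f4000UL);
	ex_report(0x888f4000UL);
}
```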

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Huang Shijie <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

samples/kretprobe: convert the printk to pr_info/pr_err

The pr_* helpers are now preferred for logging, so this patch converts
printk to pr_info. In the error path, pr_err replaces printk.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Huang Shijie <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

samples/jprobe: convert the printk to pr_info/pr_err

The pr_* helpers are now preferred for logging, so this patch converts
printk to pr_info. In the error path, pr_err replaces printk.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Huang Shijie <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

samples/kprobe: convert the printk to pr_info/pr_err

The pr_* helpers are now preferred for logging, so this patch converts
printk to pr_info. In the error path, pr_err replaces printk.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Huang Shijie <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Steve Capper <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dma-mapping: use unsigned long for dma_attrs

The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer.  Thus the pointer can point to const data.
However the attributes do not have to be a bitfield.  Instead unsigned
long will do fine:

1. This is just simpler, both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing a pointer to them to dma_set_attr(), just set the bits.

2. It brings safety and const-correctness checking because the
   attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )
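
The difference between the two calling styles described above can be
sketched in user space. This is an illustration only: the ex_* functions,
the struct and the EX_ATTR_* bits are hypothetical stand-ins, not the
kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute bits; names and values are illustrative only. */
#define EX_ATTR_WRITE_COMBINE     (1UL << 0)
#define EX_ATTR_NO_KERNEL_MAPPING (1UL << 1)

/* Old style: attributes live in a struct and travel by pointer (NULL = none). */
struct ex_dma_attrs {
	unsigned long flags;
};

static bool ex_old_has_attr(unsigned long attr, const struct ex_dma_attrs *attrs)
{
	return attrs != NULL && (attrs->flags & attr) != 0;
}

/* New style: attributes are a plain bit mask passed by value (0 = none). */
static bool ex_new_has_attr(unsigned long attr, unsigned long attrs)
{
	return (attrs & attr) != 0;
}

void ex_attrs_demo(void)
{
	/* Old: build a struct on the stack, then pass its address. */
	struct ex_dma_attrs old = { 0 };

	old.flags |= EX_ATTR_WRITE_COMBINE;
	assert(ex_old_has_attr(EX_ATTR_WRITE_COMBINE, &old));
	assert(!ex_old_has_attr(EX_ATTR_WRITE_COMBINE, NULL));

	/* New: OR the bits together at the call site; the callee gets a copy. */
	assert(ex_new_has_attr(EX_ATTR_WRITE_COMBINE,
			       EX_ATTR_WRITE_COMBINE | EX_ATTR_NO_KERNEL_MAPPING));
	assert(!ex_new_has_attr(EX_ATTR_WRITE_COMBINE, 0));
}
```

Because the mask is copied into the callee, the callee cannot modify the
caller's attributes, which is the const-correctness point made in item 2.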

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Acked-by: Vineet Gupta <[email protected]>
Acked-by: Robin Murphy <[email protected]>
Acked-by: Hans-Christian Noren Egtvedt <[email protected]>
Acked-by: Mark Salter <[email protected]> [c6x]
Acked-by: Jesper Nilsson <[email protected]> [cris]
Acked-by: Daniel Vetter <[email protected]> [drm]
Reviewed-by: Bart Van Assche <[email protected]>
Acked-by: Joerg Roedel <[email protected]> [iommu]
Acked-by: Fabien Dessenne <[email protected]> [bdisp]
Reviewed-by: Marek Szyprowski <[email protected]> [vb2-core]
Acked-by: David Vrabel <[email protected]> [xen]
Acked-by: Konrad Rzeszutek Wilk <[email protected]> [xen swiotlb]
Acked-by: Richard Kuo <[email protected]> [hexagon]
Acked-by: Geert Uytterhoeven <[email protected]> [m68k]
Acked-by: Gerald Schaefer <[email protected]> [s390]
Acked-by: Bjorn Andersson <[email protected]>
Acked-by: Hans-Christian Noren Egtvedt <[email protected]> [avr32]
Acked-by: Vineet Gupta <[email protected]> [arc]
Acked-by: Robin Murphy <[email protected]> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

media: mtk-vcodec: remove unused dma_attrs

The local variable dma_attrs is set but never read.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

include/linux/bitmap.h: cleanup

Remove two unneeded `else's.

Cc: David Hildenbrand <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

tree-wide: replace config_enabled() with IS_ENABLED()

The use of config_enabled() against config options is ambiguous.  In
practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
author might have used it for the meaning of IS_ENABLED().  Using
IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
clearer.
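
The distinction between the three macros can be tried out in user space.
The sketch below is modelled loosely on the preprocessor trick used by the
kernel's kconfig macros; the EX_* names and the CONFIG_FOO, CONFIG_BAR and
CONFIG_BAZ options are hypothetical, and it assumes the usual convention
that a built-in option defines CONFIG_X while a modular one defines
CONFIG_X_MODULE.

```c
#include <assert.h>

/* Pretend Kconfig output: FOO is built in, BAR is a module, BAZ is off. */
#define CONFIG_FOO 1
#define CONFIG_BAR_MODULE 1

/* A defined option expands to 1, which selects this "0," placeholder. */
#define EX_ARG_PLACEHOLDER_1 0,
#define EX_TAKE_SECOND(ignored, val, ...) val
#define EX_IS_DEFINED(x) EX_IS_DEFINED2(x)
#define EX_IS_DEFINED2(val) EX_IS_DEFINED3(EX_ARG_PLACEHOLDER_##val)
/* Defined: (0, 1, 0, 0) yields 1. Undefined: (junk 1, 0, 0) yields 0. */
#define EX_IS_DEFINED3(arg1_or_junk) EX_TAKE_SECOND(arg1_or_junk 1, 0, 0)

#define EX_IS_BUILTIN(option) EX_IS_DEFINED(option)
#define EX_IS_MODULE(option)  EX_IS_DEFINED(option##_MODULE)
#define EX_IS_ENABLED(option) (EX_IS_BUILTIN(option) || EX_IS_MODULE(option))

void ex_kconfig_demo(void)
{
	/* Built-in option: built in and enabled, not a module. */
	assert(EX_IS_BUILTIN(CONFIG_FOO) == 1);
	assert(EX_IS_MODULE(CONFIG_FOO) == 0);
	assert(EX_IS_ENABLED(CONFIG_FOO) == 1);

	/* Modular option: enabled, but not built in. */
	assert(EX_IS_BUILTIN(CONFIG_BAR) == 0);
	assert(EX_IS_ENABLED(CONFIG_BAR) == 1);

	/* Disabled option: all three are 0. */
	assert(EX_IS_ENABLED(CONFIG_BAZ) == 0);
}
```

For a bool option the built-in and enabled answers always agree, which is
why the replacement is safe there; they differ only for a tristate option
set to module, as CONFIG_BAR shows.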

This commit replaces config_enabled() with IS_ENABLED() where possible.
This commit is only touching bool config options.

I noticed two cases where config_enabled() is used against a tristate
option:

- config_enabled(CONFIG_HWMON)
  [ drivers/net/wireless/ath/ath10k/thermal.c ]

- config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
  [ drivers/gpu/drm/gma500/opregion.c ]

I did not touch them because they should be converted to IS_BUILTIN()
in order to keep the logic, but I was not sure it was the authors'
intention.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Stas Sergeev <[email protected]>
Cc: Matt Redfearn <[email protected]>
Cc: Joshua Kinard <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Markos Chandras <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: yu-cheng yu <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Will Drewry <[email protected]>
Cc: Nikolay Martynov <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Leonid Yegoshin <[email protected]>
Cc: Rafal Milecki <[email protected]>
Cc: James Cowgill <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Alex Smith <[email protected]>
Cc: Adam Buchbinder <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Mikko Rapeli <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Brian Norris <[email protected]>
Cc: Hidehiro Kawai <[email protected]>
Cc: "Luis R. Rodriguez" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Roland McGrath <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Tony Wu <[email protected]>
Cc: Huaitong Han <[email protected]>
Cc: Sumit Semwal <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Andrea Gelmini <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Rabin Vincent <[email protected]>
Cc: "Maciej W. Rozycki" <[email protected]>
Cc: David Daney <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/fpga/Kconfig: fix build failure

While building m32r allmodconfig the build is failing with the error:

ERROR: "bad_dma_ops" [drivers/fpga/zynq-fpga.ko] undefined!

The Xilinx Zynq FPGA driver uses DMA, but its Kconfig entry did not
declare that dependency.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Sudip Mukherjee <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
Cc: Alan Tull <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

arm64: Fix copy-on-write referencing in HugeTLB

set_pte_at(.) will set or unset the PTE_RDONLY hardware bit before
writing the entry to the table.

This can cause problems with the copy-on-write logic in hugetlb_cow:
*) hugetlb_cow(.) is called to handle a write fault on a read-only pte,
*) Before the copy-on-write updates the new page table, a call is
made to pte_same(huge_ptep_get(ptep), pte) to check for a race,
*) Because set_pte_at(.) changed the pte, *ptep != pte, and the
hugetlb_cow(.) code erroneously assumes that it lost the race,
*) The new page is subsequently freed without being used.

On arm64 this problem only becomes apparent when we apply:
67961f9 mm/hugetlb: fix huge page reserve accounting for private
mappings

When one runs the libhugetlbfs test suite, there are allocation errors
and hugetlbfs pages become erroneously locked in memory as reserved.
(There is a high HugePages_Rsvd: count).

In this patch we introduce pte_same which ignores the PTE_RDONLY bit,
allowing for the libhugetlbfs test suite to pass as expected and
without leaking any reserved HugeTLB pages.
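
The idea of a comparison that ignores one hardware bit can be sketched in
user space as follows. This is not the arm64 implementation: the ex_*
names are hypothetical, and the bit position chosen for EX_PTE_RDONLY is
an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ex_pte_t;

/* Assumed position of the hardware read-only bit; illustrative only. */
#define EX_PTE_RDONLY ((ex_pte_t)1 << 7)

/* Strict comparison: any differing bit makes the entries unequal. */
static bool ex_pte_same_strict(ex_pte_t a, ex_pte_t b)
{
	return a == b;
}

/* Relaxed comparison: mask out the read-only bit on both sides first. */
static bool ex_pte_same(ex_pte_t a, ex_pte_t b)
{
	return (a & ~EX_PTE_RDONLY) == (b & ~EX_PTE_RDONLY);
}

void ex_pte_demo(void)
{
	ex_pte_t held = 0x1000 | EX_PTE_RDONLY; /* value the fault handler holds */
	ex_pte_t in_table = 0x1000;             /* same entry, bit changed on write */

	/* The strict check reports a lost race that never happened. */
	assert(!ex_pte_same_strict(held, in_table));
	/* The relaxed check treats the two entries as the same. */
	assert(ex_pte_same(held, in_table));
	/* A real change in any other bit is still detected. */
	assert(!ex_pte_same(held, 0x2000));
}
```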

Reported-by: Huang Shijie <[email protected]>
Signed-off-by: Steve Capper <[email protected]>
Signed-off-by: Will Deacon <[email protected]>

nvmx: mark ept single context invalidation as supported

Commit 4b855078601f ("KVM: nVMX: Don't advertise single
context invalidation for invept") removed advertising
single context invalidation since the spec does not mandate it.
However, some hypervisors (such as ESX) require it to be present
before they are willing to use EPT in a nested environment. Advertise
it and fall back to the global case.

Signed-off-by: Bandan Das <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

nvmx: remove comment about missing nested vpid support

Nested vpid is already supported and both single/global
modes are advertised to the guest.

Signed-off-by: Bandan Das <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: lapic: fix access preemption timer stuff even if kernel_irqchip=off

BUG: unable to handle kernel NULL pointer dereference at 000000000000008c
IP: [<ffffffffc04e0180>] kvm_lapic_hv_timer_in_use+0x10/0x20 [kvm]
PGD 0
Oops: 0000 [#1] SMP
Call Trace:
kvm_arch_vcpu_load+0x86/0x260 [kvm]
vcpu_load+0x46/0x60 [kvm]
kvm_vcpu_ioctl+0x79/0x7c0 [kvm]
? __lock_is_held+0x54/0x70
do_vfs_ioctl+0x96/0x6a0
? __fget_light+0x2a/0x90
SyS_ioctl+0x79/0x90
do_syscall_64+0x7c/0x1e0
entry_SYSCALL64_slow_path+0x25/0x25
RIP [<ffffffffc04e0180>] kvm_lapic_hv_timer_in_use+0x10/0x20 [kvm]
RSP <ffff8800db1f3d70>
CR2: 000000000000008c
---[ end trace a55fb79d2b3b4ee8 ]---

This can be reproduced reliably with kernel_irqchip=off.

We should not access preemption timer state if the lapic is emulated in userspace.
This patch fixes it by avoiding preemption timer accesses when kernel_irqchip=off.
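
The shape of such a guard can be sketched in user space. The structs and
ex_* functions below are hypothetical stand-ins, not the KVM code; the
sketch assumes only that the in-kernel lapic pointer is NULL when the
lapic is emulated in userspace, which is consistent with the NULL pointer
dereference in the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ex_lapic {
	bool hv_timer_in_use;
};

struct ex_vcpu {
	struct ex_lapic *apic; /* assumed NULL when the lapic is in userspace */
};

static bool ex_lapic_in_kernel(const struct ex_vcpu *vcpu)
{
	return vcpu->apic != NULL;
}

/* Guarded accessor: never touches the lapic when it is not in the kernel. */
static bool ex_hv_timer_in_use(const struct ex_vcpu *vcpu)
{
	if (!ex_lapic_in_kernel(vcpu))
		return false;

	return vcpu->apic->hv_timer_in_use;
}

void ex_lapic_demo(void)
{
	struct ex_lapic apic = { .hv_timer_in_use = true };
	struct ex_vcpu with_lapic = { .apic = &apic };
	struct ex_vcpu without_lapic = { .apic = NULL };

	assert(ex_hv_timer_in_use(&with_lapic));
	/* Without the guard this call would dereference NULL. */
	assert(!ex_hv_timer_in_use(&without_lapic));
}
```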

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Yunhong Jiang <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

KVM: documentation: fix KVM_CAP_X2APIC_API information

The KVM_X2APIC_API_USE_32BIT_IDS feature applies to both
KVM_SET_GSI_ROUTING and KVM_SIGNAL_MSI, but was not mentioned in the
documentation for the latter ioctl.

Signed-off-by: Paolo Bonzini <[email protected]>

Merge tag 'kvm-arm-for-4.8-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Changes for v4.8 - Take 2

Includes GSI routing support to go along with the new VGIC and a small fix that
has been cooking in -next for a while.

x86: vdso: use __pvclock_read_cycles

The new simplified __pvclock_read_cycles does the same computation
as vread_pvclock, except that (because it takes the pvclock_vcpu_time_info
pointer) it has to be moved inside the loop. Since the loop is expected to
never roll, this makes no difference.

Acked-by: Andy Lutomirski <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

pvclock: introduce seqcount-like API

The version field in struct pvclock_vcpu_time_info basically implements
a seqcount. Wrap it with the usual read_begin and read_retry functions,
and use these APIs instead of peppering the code with smp_rmb()s.
While at it, change it to the more pedantically correct virt_rmb().

With this change, __pvclock_read_cycles can be simplified noticeably.
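
A seqcount-style read side built on a version field can be sketched in
user space with C11 atomics. This is an illustration under stated
assumptions, not the pvclock code: the struct and ex_* names are
hypothetical, the acquire load and fence stand in for the kernel
barriers, and the demo is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct ex_time_info {
	atomic_uint version;  /* odd while a writer is in the middle of an update */
	uint64_t system_time; /* payload protected by the version field */
};

/* Start a read: sample the version with the low bit cleared. */
static unsigned ex_read_begin(struct ex_time_info *src)
{
	unsigned version = atomic_load_explicit(&src->version,
						memory_order_acquire);

	return version & ~1u;
}

/* Finish a read: true if the version moved (or was odd), so retry. */
static bool ex_read_retry(struct ex_time_info *src, unsigned version)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&src->version,
				    memory_order_relaxed) != version;
}

/* Reader loop: repeat until a consistent snapshot is observed. */
static uint64_t ex_read_time(struct ex_time_info *src)
{
	unsigned version;
	uint64_t t;

	do {
		version = ex_read_begin(src);
		t = src->system_time;
	} while (ex_read_retry(src, version));

	return t;
}

void ex_seq_demo(void)
{
	struct ex_time_info ti;
	unsigned v;

	atomic_init(&ti.version, 4);
	ti.system_time = 1234;

	v = ex_read_begin(&ti);
	assert(v == 4 && !ex_read_retry(&ti, v));
	assert(ex_read_time(&ti) == 1234);

	/* Odd version simulates a writer mid-update: the reader must retry. */
	atomic_store(&ti.version, 5);
	v = ex_read_begin(&ti);
	assert(v == 4 && ex_read_retry(&ti, v));
}
```

Masking the low bit in the begin helper means an in-progress update never
compares equal in the retry helper, so the reader loop needs no separate
check for an odd version.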

Signed-off-by: Paolo Bonzini <[email protected]>

powerpc/mm: Move register_process_table() out of ppc_md

We want to initialise register_process_table() before ppc_md is setup,
so that it can be called as part of MMU init (at least on Radix ATM).

That no longer works because probe_machine() requires that ppc_md be
empty before it's called, and we now do probe_machine() much later.

So make register_process_table a global for now. It will probably move
into a mmu_radix_ops struct at some point in the future.

This was broken by me when applying commit 7025776ed1eb "powerpc/mm:
Move hash table ops to a separate structure" due to conflicts with other
patches.

Fixes: 7025776ed1eb ("powerpc/mm: Move hash table ops to a separate structure")
Signed-off-by: Michael Ellerman <[email protected]>

powerpc/perf: Fix incorrect event codes in power9-event-list

These have been changed in the hardware, so update Linux's version.

Signed-off-by: Madhavan Srinivasan <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>

ALSA: hda - Fix headset mic detection problem for two dell machines

One of the machines has an ALC255 codec; the other has an ALC298.

The machine with the ALC298 codec also has the speaker volume
problem, so we chain the fixup to ALC298_FIXUP_SPK_VOLUME rather
than adding a group of pin definitions to the pin quirk table, since
the speaker volume problem has not been seen on other machines yet.

Cc: <[email protected]>
Signed-off-by: Hui Wang <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>

Merge tag 'perf-core-for-mingo-20160803' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

New features:

- Add --sample-cpu to 'perf record', to explicitly ask for sampling
  the CPU (Jiri Olsa)

Fixes:

- Fix processing of multi byte chunks in objdump output, fixing
  disassemble processing for annotation on at least ARM64 (Jan Stancek)

- Use SyS_epoll_wait in a BPF 'perf test' entry instead of sys_epoll_wait, that
  is not present in the DWARF info in vmlinux files (Arnaldo Carvalho de Melo)

- Add -Wno-shadow when processing files using perl headers, fixing
  the build on Fedora Rawhide and Arch Linux (Namhyung Kim)

Infrastructure changes:

- Annotate prep work to better catch and report errors related to
  using objdump to disassemble DSOs (Arnaldo Carvalho de Melo)

- Add 'alloc', 'scnprintf' and 'and' methods for bitmap processing (Jiri Olsa)

- Add nested output resorting callback in hists processing (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

Revert "ACPI / hotplug / PCI: Runtime resume bridge before rescan"

This reverts commit 16468c783cb4cf72475dcda23fabecb4a4bb0e17.

Bisection showed that it was the root cause for a resume hang on a
bog-standard all-Intel laptop (Sony Vaio Pro 11), and reverting fixes
the hang.

Signed-off-by: Linus Torvalds <[email protected]>

Merge branches 'hfi1' and 'sge-limit' into k.o/for-4.8-2

Merge branch 'next' into for-linus

Prepare second round of input updates for 4.8 merge window.

IB/core: Support for CMA multicast join flags

Added UCMA and CMA support for multicast join flags. Flags are
passed in previously reserved fields of the UCMA CM join command.
Two join flags are currently supported, indicating two different
multicast JoinStates:

1. Full Member:
   The initiator creates the Multicast group(MCG) if it wasn't
   previously created, can send Multicast messages to the group
   and receive messages from the MCG.

2. Send Only Full Member:
   The initiator creates the Multicast group(MCG) if it wasn't
   previously created, can send Multicast messages to the group
   but doesn't receive any messages from the MCG.

   IB: Send Only Full Member requires a query of ClassPortInfo
       to determine if SM/SA supports this option. If SM/SA
       doesn't support Send-Only there will be no join request
       sent and an error will be returned.

   ETH: When Send Only Full Member is requested no IGMP join
        will be sent.

Signed-off-by: Alex Vesker <[email protected]>
Reviewed-by: Hal Rosenstock <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/sa: Add cached attribute containing SM information to SA port

Added a new SA port attribute containing SM ClassPortInfo fields
(see Table 126 of IB Spec 1.3). This is useful for checking SM
support for specific features. The attribute is cached to avoid
resending queries; caching is done when a successful ClassPortInfo
reply is received on the port. The attribute is invalidated on SM
change events, SM re-registration events, and SM LID change events.
The fields in ClassPortInfo should not change during SM runtime
without an event.

Signed-off-by: Alex Vesker <[email protected]>
Reviewed-by: Hal Rosenstock <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/uverbs: Fix race between uverbs_close and remove_one

Fixes an oops that might happen if uverbs_close races with
remove_one.

Either context may run ib_uverbs_cleanup_ucontext, depending
on the flow.

Currently, nothing protects against the case where remove_one
skips the cleanup and runs to completion: the underlying
ib_device is freed, then uverbs_close calls
ib_uverbs_cleanup_ucontext and oopses.

This can happen if uverbs_close deleted the file from the list,
so that remove_one did not find it and ran to completion.

Protect against that case with a new cleanup lock, so that
ib_uverbs_cleanup_ucontext is always called before remove_one
finishes.

Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one")
Reported-by: Devesh Sharma <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Yishai Hadas <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mthca: Clean up error unwind flow in mthca_reset()

The kfree() function was called in a few cases by the mthca_reset()
function during error handling even if the passed variables "bridge_header"
and "hca_header" contained a null pointer.

Adjust jump targets according to the Linux coding style convention.
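
The staged-label pattern referred to above can be sketched in user space.
This is a generic illustration, not the mthca_reset() code: ex_reset(),
its fail_at parameter and the buffer sizes are hypothetical, and malloc()
and free() stand in for the kernel allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * fail_at is a hypothetical knob that forces step N to fail (0 = no
 * failure), so that every unwind path can be exercised.
 */
int ex_reset(int fail_at)
{
	unsigned int *hca_header;
	unsigned int *bridge_header;
	int err = 0;

	hca_header = (fail_at == 1) ? NULL : malloc(256);
	if (!hca_header) {
		err = -ENOMEM;
		goto out; /* nothing allocated yet, nothing to free */
	}

	bridge_header = (fail_at == 2) ? NULL : malloc(256);
	if (!bridge_header) {
		err = -ENOMEM;
		goto free_hca; /* only the first buffer exists */
	}

	if (fail_at == 3)
		err = -EIO; /* late failure: fall through and free both */

	free(bridge_header);
free_hca:
	free(hca_header);
out:
	return err;
}

void ex_unwind_demo(void)
{
	assert(ex_reset(0) == 0);
	assert(ex_reset(1) == -ENOMEM);
	assert(ex_reset(2) == -ENOMEM);
	assert(ex_reset(3) == -EIO);
}
```

Each label releases only what had been allocated when the jump was taken,
so no free call is ever made on a pointer that is known to be null.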

Signed-off-by: Markus Elfring <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mthca: NULL arg to pci_dev_put is OK

The pci_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.
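
Why the caller's test is redundant can be shown with a small user-space
sketch. The struct and ex_* functions are hypothetical; only the
NULL-tolerant behaviour of the release function is taken from the
description above.

```c
#include <assert.h>
#include <stddef.h>

struct ex_dev {
	int refcount;
};

/* NULL-tolerant release: returns immediately when given NULL. */
static void ex_dev_put(struct ex_dev *dev)
{
	if (!dev)
		return;

	dev->refcount--;
}

/* Before: the caller repeats a check the callee already performs. */
static void ex_cleanup_before(struct ex_dev *bridge)
{
	if (bridge)
		ex_dev_put(bridge);
}

/* After: call unconditionally; behaviour is unchanged. */
static void ex_cleanup_after(struct ex_dev *bridge)
{
	ex_dev_put(bridge);
}

void ex_put_demo(void)
{
	struct ex_dev a = { .refcount = 1 };
	struct ex_dev b = { .refcount = 1 };

	ex_cleanup_before(&a);
	ex_cleanup_after(&b);
	assert(a.refcount == 0 && b.refcount == 0);

	/* Both variants are safe with NULL. */
	ex_cleanup_before(NULL);
	ex_cleanup_after(NULL);
}
```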

Signed-off-by: Markus Elfring <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/hfi1: NULL arg to sc_return_credits is OK

The sc_return_credits() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx4: Add diagnostic hardware counters

Expose IB diagnostic hardware counters.
The counters count IB events and are applicable for IB and RoCE.

The counters can be divided into two groups, per device and per port.
Device counters are always exposed.
Port counters are exposed only if the firmware supports per port counters.

rq_num_dup and sq_num_to are exposed only if the firmware supports
them; if it does, we expose them per device and per port.
rq_num_udsdprd and num_cqovf are device only counters.

rq - denotes responder.
sq - denotes requester.

|-----------------------|---------------------------------------|
|Name                   | Description                           |
|-----------------------|---------------------------------------|
|rq_num_lle             | Number of local length errors         |
|-----------------------|---------------------------------------|
|sq_num_lle             | Number of local length errors         |
|-----------------------|---------------------------------------|
|rq_num_lqpoe           | Number of local QP operation errors   |
|-----------------------|---------------------------------------|
|sq_num_lqpoe           | Number of local QP operation errors   |
|-----------------------|---------------------------------------|
|rq_num_lpe             | Number of local protection errors     |
|-----------------------|---------------------------------------|
|sq_num_lpe             | Number of local protection errors     |
|-----------------------|---------------------------------------|
|rq_num_wrfe            | Number of CQEs with error             |
|-----------------------|---------------------------------------|
|sq_num_wrfe            | Number of CQEs with error             |
|-----------------------|---------------------------------------|
|sq_num_mwbe            | Number of Memory Window bind errors   |
|-----------------------|---------------------------------------|
|sq_num_bre             | Number of bad response errors         |
|-----------------------|---------------------------------------|
|sq_num_rire            | Number of Remote Invalid request      |
|                       | errors                                |
|-----------------------|---------------------------------------|
|rq_num_rire            | Number of Remote Invalid request      |
|                       | errors                                |
|-----------------------|---------------------------------------|
|sq_num_rae             | Number of remote access errors        |
|-----------------------|---------------------------------------|
|rq_num_rae             | Number of remote access errors        |
|-----------------------|---------------------------------------|
|sq_num_roe             | Number of remote operation errors     |
|-----------------------|---------------------------------------|
|sq_num_tree            | Number of transport retries exceeded  |
|                       | errors                                |
|-----------------------|---------------------------------------|
|sq_num_rree            | Number of RNR NAK retries exceeded    |
|                       | errors                                |
|-----------------------|---------------------------------------|
|rq_num_rnr             | Number of RNR NAKs sent               |
|-----------------------|---------------------------------------|
|sq_num_rnr             | Number of RNR NAKs received           |
|-----------------------|---------------------------------------|
|rq_num_oos             | Number of Out of Sequence requests    |
|                       | received                              |
|-----------------------|---------------------------------------|
|sq_num_oos             | Number of Out of Sequence NAKs        |
|                       | received                              |
|-----------------------|---------------------------------------|
|rq_num_udsdprd         | Number of UD packets silently         |
|                       | discarded on the Receive Queue due to |
|                       | lack of receive descriptor            |
|-----------------------|---------------------------------------|
|rq_num_dup             | Number of duplicate requests received |
|-----------------------|---------------------------------------|
|sq_num_to              | Number of timeouts received           |
|-----------------------|---------------------------------------|
|num_cqovf              | Number of CQ overflows                |
|-----------------------|---------------------------------------|

Signed-off-by: Mark Bloch <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

net/mlx4: Query performance and diagnostics counters

Add a function to query diagnostics counters from the firmware.

Signed-off-by: Mark Bloch <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

net/mlx4: Add diagnostic counters capability bit

Add a bit that indicates if the firmware supports per port
diagnostic counters.

Signed-off-by: Mark Bloch <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

Use smaller 512 byte messages for portmapper messages

Portmapper messages are short and do not occupy more than 512 bytes.
Lower portmapper message size to 512 bytes. This change significantly
reduces the amount of memory needed when trying to establish a large
number of connections simultaneously. The old value is based on page
size.

Signed-off-by: Faisal Latif <[email protected]>
Signed-off-by: Mustafa Ismail <[email protected]>
Signed-off-by: Shiraz Saleem <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/ipoib: Report SG feature regardless of HW UD CSUM capability

Decouple SG support from HW ability to do UD checksum.
The coupling existed for historical reasons and was removed by commit
ec5f06156423 ("net: Kill link between CSUM and SG features.").

During driver load it is assumed that the device does not support SG. The
final decision is taken after creating the UD QP, based on device capability.

Signed-off-by: Yuval Shaia <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct

We allocate a small tracking structure as part of mlx4_ib_resize_cq().
However, we don't need to use GFP_ATOMIC -- immediately after the
allocation, we call mlx4_cq_resize(), which allocates a command
mailbox with GFP_KERNEL and then sleeps on a firmware command, so we
better not be in an atomic context.

This actually has a real impact, because when this GFP_ATOMIC
allocation fails (and GFP_ATOMIC does fail in practice) then a
userspace consumer resizing a CQ will get a spurious failure that we
can easily avoid.

Signed-off-by: Roland Dreier <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/hfi1: Disable by default

There is a strict policy in the Linux kernel that new drivers must be
disabled by default. Hence leave out the "default m" line from Kconfig.

Fixes: f48ad614c100 ("IB/hfi1: Move driver out of staging")
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Jubin John <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Mike Marciniszyn <[email protected]>
Cc: <[email protected]> # v4.7+
Acked-by: Dennis Dalessandro <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>

IB/rdmavt: Disable by default

There is a strict policy in the Linux kernel that new drivers must be
disabled by default. Hence leave out the "default m" line from Kconfig.

Fixes: 0194621b2253 ("IB/rdmavt: Create module framework and handle driver registration")
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Jubin John <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Mike Marciniszyn <[email protected]>
Cc: <[email protected]> # v4.6+
Signed-off-by: Doug Ledford <[email protected]>

Merge branch 'i40iw' into k.o/for-4.8

Merge branches 'cxgb4' and 'mlx5' into k.o/for-4.8

extable.h: add stddef.h so "NULL" definition is not implicit

While not an issue now, eventually we will have independent users of
the extable.h file and we will stop sourcing it via module.h header.

In testing that pending work, with very sparse builds, characteristic
of an "allnoconfig" on various architectures, we can sometimes hit an
instance where the very basic standard definitions aren't present,
resulting in:

include/linux/extable.h:26:9: error: 'NULL' undeclared (first use in this function)

To be clear, this isn't a regression, since currently extable.h is
only used by module.h -- however, we will need this addition present
before we start migrating exception table users off module.h and onto
extable.h during the next release cycle.
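
As an illustration of why the explicit include matters, here is a minimal
userspace sketch, not the kernel header itself. The struct and function
names are hypothetical. The point is only that a header which mentions NULL
pulls in stddef.h directly instead of relying on whichever file included it:

```c
#include <stddef.h>	/* NULL and size_t, included directly on purpose */

/* Hypothetical, simplified stand-in for an exception table entry. */
struct sketch_extable_entry {
	unsigned long insn;	/* address of the faulting instruction */
	unsigned long fixup;	/* address to continue at after the fault */
};

/*
 * Linear search for the entry covering addr.
 * Returns NULL when nothing matches, which is the use of NULL that
 * needs stddef.h when this code is built on its own.
 */
static inline const struct sketch_extable_entry *
sketch_search_extable(const struct sketch_extable_entry *table, size_t n,
		      unsigned long addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].insn == addr)
			return &table[i];
	}
	return NULL;
}
```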

Cc: Rusty Russell <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>

modules: add ro_after_init support

Add ro_after_init support for modules by adding a new page-aligned section
in the module layout (after rodata) for ro_after_init data and enabling RO
protection for that section after module init runs.

Signed-off-by: Jessica Yu <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>

jump_label: disable preemption around __module_text_address().

Steven reported a warning caused by not holding module_mutex or
rcu_read_lock_sched: his backtrace was corrupted but a quick audit
found this possible cause. It's wrong anyway...

Reported-by: Steven Rostedt <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>

exceptions: fork exception table content from module.h into extable.h

For historical reasons (i.e. pre-git) the exception table stuff was
buried in the middle of the module.h file. I noticed this while
doing an audit for needless includes of module.h and found core
kernel files (both arch specific and arch independent) were just
including module.h for this.

The converse is also true, in that conventional drivers, be they
for filesystems or actual hardware peripherals or similar, do not
normally care about the exception tables.

Here we fork the exception table content out of module.h into a
new file called extable.h -- and temporarily include it into the
module.h itself.

Then we will work our way across the arch independent and arch
specific files needing just exception table content, and move
them off module.h and onto extable.h

Once that is done, we can remove the extable.h from module.h
and in doing it like this, we avoid introducing build failures
into the git history.

The gain here is that module.h gets a bit smaller, across all
modular drivers that we build for allmodconfig. Also the core
files that only need exception table stuff don't have an include
of module.h that brings in lots of extra stuff and just looks
generally out of place.

Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: Paul Gortmaker <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>

modules: Add kernel parameter to blacklist modules

Blacklisting a module in Linux has long been a problem. The current
procedure is to use rd.blacklist=module_name, however, that doesn't
cover the case after the initramfs and before a boot prompt (where one
is supposed to use /etc/modprobe.d/blacklist.conf to blacklist
runtime loading). Using rd.shell to get an early prompt is hit-or-miss,
and doesn't cover all situations AFAICT.

This patch adds this functionality of permanently blacklisting a module
by its name via the kernel parameter module_blacklist=module_name.

[v2]: Rusty, use core_param() instead of __setup() which simplifies
things.

[v3]: Rusty, undo wreckage from strsep()

[v4]: Rusty, simpler version of blacklisted()
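
A name check of this kind can be sketched in userspace C as below. This is
an illustration only, not the code from the patch: the function name is
hypothetical, and it assumes the parameter value may hold several module
names separated by commas, which the text above does not state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if module_name is one of the entries in list.
 * Assumption: list is a comma-separated string of module names, or NULL
 * when no blacklist was given.
 */
static bool name_in_blacklist(const char *list, const char *module_name)
{
	const char *p;
	size_t len;
	size_t name_len = strlen(module_name);

	if (!list)
		return false;

	for (p = list; *p; p += len) {
		/* Length of the current entry, up to the next comma or the end. */
		len = strcspn(p, ",");
		/* Compare whole entries only, so "foo" does not match "foobar". */
		if (len == name_len && memcmp(p, module_name, len) == 0)
			return true;
		/* Step over the separator so the loop always advances. */
		if (p[len] == ',')
			len++;
	}
	return false;
}
```

Scanning the string in place with strcspn() leaves the parameter value
unmodified, unlike strsep(), which writes terminators into its input.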

Signed-off-by: Prarit Bhargava <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: [email protected]
Signed-off-by: Rusty Russell <[email protected]>

module: Do a WARN_ON_ONCE() for assert module mutex not held

When running with lockdep enabled, I triggered the WARN_ON() in the
module code that asserts when module_mutex or rcu_read_lock_sched are
not held. The issue I have is that this can also be called from the
dump_stack() code, causing us to enter an infinite loop...

------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
  ffff880215e8fa70 ffff880215e8fa70 ffffffff812fc8e3 0000000000000000
  ffffffff81d3e55b ffff880215e8fac0 ffffffff8104fc88 ffffffff8104fcab
  0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
Call Trace:
  [<ffffffff812fc8e3>] dump_stack+0x67/0x90
  [<ffffffff8104fc88>] __warn+0xcb/0xe9
  [<ffffffff8104fcab>] ? warn_slowpath_null+0x5/0x1f
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
  ffff880215e8f7a0 ffff880215e8f7a0 ffffffff812fc8e3 0000000000000000
  ffffffff81d3e55b ffff880215e8f7f0 ffffffff8104fc88 ffffffff8104fcab
  0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
Call Trace:
  [<ffffffff812fc8e3>] dump_stack+0x67/0x90
  [<ffffffff8104fc88>] __warn+0xcb/0xe9
  [<ffffffff8104fcab>] ? warn_slowpath_null+0x5/0x1f
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375253c #14
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
  ffff880215e8f4d0 ffff880215e8f4d0 ffffffff812fc8e3 0000000000000000
  ffffffff81d3e55b ffff880215e8f520 ffffffff8104fc88 ffffffff8104fcab
  0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
Call Trace:
  [<ffffffff812fc8e3>] dump_stack+0x67/0x90
  [<ffffffff8104fc88>] __warn+0xcb/0xe9
  [<ffffffff8104fcab>] ? warn_slowpath_null+0x5/0x1f
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
[...]

Which gives us rather useless information. Worse yet, there's some race
that causes this, and I seldom trigger it, so I have no idea what
happened.

This would not be an issue if that warning was a WARN_ON_ONCE().
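
The warn-once idea can be sketched in userspace C as follows. This is not
the kernel's macro. The names are hypothetical, and it relies on GNU C
statement expressions. The flag is set before the report is emitted, so a
report path that re-enters the same check finds the flag already set:

```c
#include <stdbool.h>
#include <stdio.h>

static int warnings_printed;	/* counts reports, for demonstration only */

/* Hypothetical stand-in for the code that prints the warning. */
static void report_warning(const char *what)
{
	warnings_printed++;
	fprintf(stderr, "WARNING: %s\n", what);
}

/*
 * Report only the first time cond is true at this call site.
 * The flag is static per expansion, so each call site warns on its own.
 * The macro evaluates to the truth value of cond, as a plain check would.
 */
#define WARN_ONCE_SKETCH(cond) ({			\
	static bool warned__;				\
	bool cond__ = !!(cond);				\
	if (cond__ && !warned__) {			\
		warned__ = true;			\
		report_warning(#cond);			\
	}						\
	cond__;						\
})

/* Hypothetical assertion helper that may be called many times. */
static bool check_lock_held(bool held)
{
	return WARN_ONCE_SKETCH(!held);
}
```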

Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>

ACPI / EC: Work around method reentrancy limit in ACPICA for _Qxx

A regression is caused by the following commit:

  Commit: 02b771b64b73226052d6e731a0987db3b47281e9
  Subject: ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations

In this commit, the use of the system workqueue allows the number of
parallel _Qxx executions to exceed 255. This violates the method reentrancy
limit in ACPICA and generates the following error log:

  ACPI Error: Method reached maximum reentrancy limit (255) (20150818/dsmethod-341)

This patch creates a separate workqueue and limits the number of parallel
_Qxx evaluations to a configurable value (which can be tuned against the
number of online CPUs).

Since EC events are handled after driver probe, we can create the workqueue
in acpi_ec_init().

Fixes: 02b771b64b73 (ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=135691
Cc: 4.3+ <[email protected]> # 4.3+
Reported-and-tested-by: Helen Buus <[email protected]>
Signed-off-by: Lv Zheng <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>

perf tests bpf: Use SyS_epoll_wait alias

Something caused the sys_epoll_wait() function alias to go missing from
the vmlinux DWARF info, leaving it only in /proc/kallsyms, which made
the BPF perf tests fail:

  [root@jouet ~]# perf test BPF
  37: Test BPF filter                                          :
  37.1: Test basic BPF filtering                               : FAILED!
  37.2: Test BPF prologue generation                           : Skip
  37.3: Test BPF relocation checker                            : Skip
  [root@jouet ~]#

Using -v we can see it is failing to find DWARF info for the probed function,
sys_epoll_wait, which we can find in /proc/kallsyms but not in vmlinux with
CONFIG_DEBUG_INFO:

  [root@jouet ~]# grep -w sys_epoll_wait /proc/kallsyms
  ffffffffbd295b50 T sys_epoll_wait
  [root@jouet ~]#

  [root@jouet ~]# readelf -wi /lib/modules/4.7.0+/build/vmlinux | grep -w sys_epoll_wait
  [root@jouet ~]#

If we try to use perf probe:

  [root@jouet ~]# perf probe sys_epoll_wait
  Failed to find debug information for address ffffffffbd295b50
  Probe point 'sys_epoll_wait' not found.
    Error: Failed to add events.
  [root@jouet ~]#

It all works if we use SyS_epoll_wait, which is just an alias to the probed
function:

  [root@jouet ~]# grep -i sys_epoll_wait /proc/kallsyms
  ffffffffbd295b50 T SyS_epoll_wait
  ffffffffbd295b50 T sys_epoll_wait
  [root@jouet ~]#

So use it:

  [root@jouet ~]# perf test BPF
  37: Test BPF filter                                          :
  37.1: Test basic BPF filtering                               : Ok
  37.2: Test BPF prologue generation                           : Ok
  37.3: Test BPF relocation checker                            : Ok
  [root@jouet ~]#

Further info:

  [root@jouet ~]# gcc --version
  gcc (GCC) 6.1.1 20160621 (Red Hat 6.1.1-3)
  [acme@jouet linux]$ cat /etc/fedora-release
  Fedora release 24 (Twenty Four)

Investigation as to why it fails is still underway, but it was always
going from sys_epoll_wait to SyS_epoll_wait when looking up the DWARF
info in vmlinux, and this is what is breaking now.

Switching to SyS_epoll_wait allows this test to proceed and test the
BPF code it was designed for, so let's have this in to allow passing this
test while we fix the root cause.

Cc: Adrian Hunter <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

shmem: Fix link error if huge pages support is disabled

If CONFIG_TRANSPARENT_HUGE_PAGECACHE=n, HPAGE_PMD_NR evaluates to
BUILD_BUG_ON(), and may cause (e.g. with gcc 4.1.2):

mm/built-in.o: In function `shmem_alloc_hugepage':
shmem.c:(.text+0x17570): undefined reference to `__compiletime_assert_1365'

To fix this, move the assignment to hindex after the check for huge
pages support.

Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()

We can't pass error pointers to kfree() or it causes an oops.
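
The pattern can be sketched in userspace C as below. The ERR_PTR(),
PTR_ERR() and IS_ERR() helpers are simplified re-implementations for
illustration, free() stands in for kfree(), and lookup_name() and
fill_sb_sketch() are hypothetical names:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical helper: returns an allocated string or an error pointer. */
static char *lookup_name(bool fail)
{
	char *name;

	if (fail)
		return ERR_PTR(-ENOMEM);

	name = malloc(8);
	if (!name)
		return ERR_PTR(-ENOMEM);
	strcpy(name, "hostfs");
	return name;
}

/* Check for an error pointer first, and free only a real allocation. */
static int fill_sb_sketch(bool fail)
{
	char *name = lookup_name(fail);

	if (IS_ERR(name))
		return (int)PTR_ERR(name);	/* nothing was allocated */

	free(name);
	return 0;
}
```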

Fixes: 52b209f7b848 ('get rid of hostfs_read_inode()')
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

um: Support kcov

This adds support for kcov to UML.

There is a small problem where UML will randomly segfault during boot;
this is because current_thread_info() occasionally returns an invalid
(non-NULL) pointer and we try to dereference it in
__sanitizer_cov_trace_pc(). I consider this a bug in UML itself and this
patch merely exposes it.

[v2: disable instrumentation in UML-specific code]

Cc: Quentin Casasnovas <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Thomas Meyer <[email protected]>
Cc: user-mode-linux-devel <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Vegard Nossum <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

um: Enable TRACE_IRQFLAGS_SUPPORT

Now we have everything we need, so enable
TRACE_IRQFLAGS_SUPPORT.

Signed-off-by: Richard Weinberger <[email protected]>

um: Use asm-generic/irqflags.h

Instead of providing its own arch_local_irq_save() and arch_irqs_disabled()
versions, use the generic ones from asm-generic/irqflags.h.

A nice side effect is that um gets a few additional arch_ functions
as well.

Signed-off-by: Daniel Wagner <[email protected]>
[rw: Massaged commit message]
Signed-off-by: Richard Weinberger <[email protected]>

um: Fix possible deadlock in sig_handler_common()

We are in atomic context and must not sleep.
Sleeping here is possible since malloc() maps
to kmalloc() with GFP_KERNEL.

Cc: [email protected]
Fixes: b6024b21 ("um: extend fpstate to _xstate to support YMM registers")
Signed-off-by: Richard Weinberger <[email protected]>

drm: i915: fix build when DEBUG_FS is disabled

This clearly had never gotten tested, probably because you need a fairly
minimal configuration in order to disable DEBUG_FS (several other
options select it).

The dummy inline functions that were used for the no-DEBUG_FS case were
missing the argument names in the declarations.
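
A minimal sketch of the stub pattern, with hypothetical names rather than
the i915 ones: the macro CONFIG_DEBUG_FS_SKETCH and the sketch_* identifiers
are made up for illustration. A prototype may omit parameter names, but a
function definition in C (before C23) must name them, so the inline stubs
need names even though they ignore their arguments:

```c
/* Hypothetical device type used only by this sketch. */
struct sketch_device {
	int id;
};

#ifdef CONFIG_DEBUG_FS_SKETCH
/* Real implementations live elsewhere; prototypes may omit names. */
int sketch_debugfs_register(struct sketch_device *);
void sketch_debugfs_unregister(struct sketch_device *);
#else
/* Stubs are definitions, so each parameter must be named. */
static inline int sketch_debugfs_register(struct sketch_device *dev)
{
	(void)dev;	/* unused when the feature is compiled out */
	return 0;
}

static inline void sketch_debugfs_unregister(struct sketch_device *dev)
{
	(void)dev;	/* unused when the feature is compiled out */
}
#endif
```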

Fixes: 1dac891c1c95 ("drm/i915: Register debugfs interface last")
Reported-and-tested-by: Jörg Otte <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

um: Select HAVE_DEBUG_KMEMLEAK

Now we have the infrastructure to support kmemleak.
Enable the HAVE flag.

Signed-off-by: Richard Weinberger <[email protected]>

um: Setup physical memory in setup_arch()

Currently UML sets up physical memory very early,
long before setup_arch() is called by the kernel main
function.
This can cause problems when code paths in UML's memory setup
code assume that the kernel is already running.
For example, when kmemleak is enabled it will evaluate current()
in free_bootmem(). That early current() is undefined and
UML explodes.

Solve the problem by setting up physical memory in setup_arch().
At this stage the kernel has materialized and basic infrastructure
such as current() works.

Signed-off-by: Richard Weinberger <[email protected]>

um: Eliminate null test after alloc_bootmem

The alloc_bootmem() function never returns NULL. Thus a NULL test after a
call to this function is unnecessary.

The Coccinelle semantic patch used to make this change is as follows:
@@
expression E;
statement S;
@@

E =
alloc_bootmem(...)
... when != E
- if (E == NULL) S

Signed-off-by: Amitoj Kaur Chawla <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

Documentation: update cgroup's document path

cgroup's document path has changed to "cgroup-v1". Update it.

Signed-off-by: seokhoon.yoon <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>

Documentation/sphinx: do not warn about missing tools in 'make help'

Simply move the dochelp rule outside of the HAVE_SPHINX check,
overriding the .DEFAULT rule for HAVE_SPHINX=0.

Cc: Jonathan Corbet <[email protected]>
Cc: Christian Kujau <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>

Btrfs: fix __MAX_CSUM_ITEMS

Jeff Mahoney's cleanup commit (14a1e067b4) wasn't correct for csums on
machines where the pagesize >= metadata blocksize.

This just reverts the relevant hunks to bring the old math back.

Signed-off-by: Chris Mason <[email protected]>

cachefiles: Fix race between inactivating and culling a cache object

There's a race between cachefiles_mark_object_inactive() and
cachefiles_cull():

(1) cachefiles_cull() can't delete a backing file until the cache object
     is marked inactive, but as soon as that's the case it's fair game.

(2) cachefiles_mark_object_inactive() marks the object as being inactive
     and *only then* reads the i_blocks on the backing inode - but
     cachefiles_cull() might've managed to delete it by this point.

Fix this by making sure cachefiles_mark_object_inactive() gets any data it
needs from the backing inode before deactivating the object.
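
The ordering can be sketched in userspace C as below. The types and names
are hypothetical and much simpler than the cachefiles structures. The point
is only that the value is read while the object is still marked active, and
the inactive state is published afterwards:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the inode and cache object. */
struct sketch_inode {
	unsigned long i_blocks;
};

struct sketch_object {
	atomic_bool active;		/* a culler may act once this is false */
	struct sketch_inode *backing;	/* may go away once the object is inactive */
};

/*
 * Mark the object inactive and return the block count of its backing inode.
 * The read happens first, while the object is still active and the backing
 * inode therefore cannot have been deleted by a culler.
 */
static unsigned long mark_inactive_sketch(struct sketch_object *obj)
{
	unsigned long blocks = obj->backing->i_blocks;

	/* From this store on, obj->backing must no longer be dereferenced. */
	atomic_store(&obj->active, false);

	return blocks;
}
```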

Without this, the following oops may occur:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
...
CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G          I    ------------   3.10.0-470.el7.x86_64 #1
Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011
Workqueue: fscache_object fscache_object_work_func [fscache]
task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000
RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
RSP: 0018:ffff8800b77c3d70  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034
RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8
RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000
R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600
R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498
FS:  0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0
ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658
ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600
Call Trace:
[<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles]
[<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache]
[<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache]
[<ffffffff810a605b>] process_one_work+0x17b/0x470
[<ffffffff810a6e96>] worker_thread+0x126/0x410
[<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460
[<ffffffff810ae64f>] kthread+0xcf/0xe0
[<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
[<ffffffff81695418>] ret_from_fork+0x58/0x90
[<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140

The oopsing code shows:

callq  0xffffffff810af6a0 <wake_up_bit>
mov    0xf8(%r12),%rax
mov    0x30(%rax),%rax
mov    0x98(%rax),%rax   <---- oops here
lock add %rax,0x130(%rbx)

where this is:

d_backing_inode(object->dentry)->i_blocks

Fixes: a5b3a80b899bda0f456f1246c4c5a1191ea01519 (CacheFiles: Provide read-and-reset release counters for cachefilesd)
Reported-by: Jianhong Yin <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Steve Dickson <[email protected]>
cc: [email protected]
Signed-off-by: Al Viro <[email protected]>

Merge branch 'for-viro' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into for-linus

Merge tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"A few updates and fixes:

   - move the suppressing of the __builtin_return_address >0 warning to
     the tracing directory only.

   - metag recordmcount fix for newer glibc's

   - two tracing histogram fixes that were reported by KASAN"

* tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix use-after-free in hist_register_trigger()
  tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all
  Makefile: Mute warning for __builtin_return_address(>0) for tracing only
  ftrace/recordmcount: Work around for addition of metag magic but not relocations

fs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2

With gcc < 4.2 (e.g. 4.1.2):

CC fs/proc/task_mmu.o
cc1: error: unrecognized command line option "-Wno-override-init"

To fix this, only enable the compiler option when it is actually
supported by the compiler.

Fixes: ca52953f5f24 ("fs/proc/task_mmu.c: suppress compilation warnings with W=1")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Valdis Kletnieks <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dm raid: constructor fails on non-zero incompat_features

When lvm2 userspace requests a RaidLV repair, it sets the rebuild
constructor flag on the new replacement DataLVs but does not clear the
respective MetaLVs. Hence the superblock that is loaded from such new
MetaLVs may have a non-zero incompat_features member and the constructor
will fail with a false positive on incompat_features.

Solve by initializing the incompat_features member properly.

Signed-off-by: Heinz Mauelshagen <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>

9p: use clone_fid()

It cleans things up in a bunch of places.

Signed-off-by: Al Viro <[email protected]>

9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"

In v9fs_vfs_rename() we need to clone the parents' fids, not just
find them.

Spotted-by: Johannes Berg <[email protected]>
Signed-off-by: Al Viro <[email protected]>