Git Repo - linux.git/log

mm/rmap: annotate a data race at tlb_flush_batched

mm->tlb_flush_batched could be accessed concurrently as noticed by
KCSAN,

BUG: KCSAN: data-race in flush_tlb_batched_pending / try_to_unmap_one

write to 0xffff93f754880bd0 of 1 bytes by task 822 on cpu 6:
  try_to_unmap_one+0x59a/0x1ab0
  set_tlb_ubc_flush_pending at mm/rmap.c:635
  (inlined by) try_to_unmap_one at mm/rmap.c:1538
  rmap_walk_anon+0x296/0x650
  rmap_walk+0xdf/0x100
  try_to_unmap+0x18a/0x2f0
  shrink_page_list+0xef6/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  balance_pgdat+0x652/0xd90
  kswapd+0x396/0x8d0
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

read to 0xffff93f754880bd0 of 1 bytes by task 6364 on cpu 4:
  flush_tlb_batched_pending+0x29/0x90
  flush_tlb_batched_pending at mm/rmap.c:682
  change_p4d_range+0x5dd/0x1030
  change_pte_range at mm/mprotect.c:44
  (inlined by) change_pmd_range at mm/mprotect.c:212
  (inlined by) change_pud_range at mm/mprotect.c:240
  (inlined by) change_p4d_range at mm/mprotect.c:260
  change_protection+0x222/0x310
  change_prot_numa+0x3e/0x60
  task_numa_work+0x219/0x350
  task_work_run+0xed/0x140
  prepare_exit_to_usermode+0x2cc/0x2e0
  ret_from_intr+0x32/0x42

Reported by Kernel Concurrency Sanitizer on:
CPU: 4 PID: 6364 Comm: mtest01 Tainted: G        W    L 5.5.0-next-20200210+ #5
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

flush_tlb_batched_pending() is under PTL but the write is not, but
mm->tlb_flush_batched is only a bool type, so the value is unlikely to be
shattered.  Thus, mark it as an intentional data race by using the data
race macro.

Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Marco Elver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/mempool: fix a data race in mempool_free()

mempool_t pool.curr_nr could be accessed concurrently as noticed by
KCSAN,

BUG: KCSAN: data-race in mempool_free / remove_element

write to 0xffffffffa937638c of 4 bytes by task 6359 on cpu 113:
  remove_element+0x4a/0x1c0
  remove_element at mm/mempool.c:132
  mempool_alloc+0x102/0x210
  (inlined by) mempool_alloc at mm/mempool.c:399
  bio_alloc_bioset+0x106/0x2c0
  get_swap_bio+0x49/0x230
  __swap_writepage+0x680/0xc30
  swap_writepage+0x9c/0xf0
  pageout+0x33e/0xae0
  shrink_page_list+0x1f57/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  <snip>

read to 0xffffffffa937638c of 4 bytes by interrupt on cpu 64:
  mempool_free+0x3e/0x150
  mempool_free at mm/mempool.c:492
  bio_free+0x192/0x280
  bio_put+0x91/0xd0
  end_swap_bio_write+0x1d8/0x280
  bio_endio+0x2c2/0x5b0
  dec_pending+0x22b/0x440 [dm_mod]
  clone_endio+0xe4/0x2c0 [dm_mod]
  bio_endio+0x2c2/0x5b0
  blk_update_request+0x217/0x940
  scsi_end_request+0x6b/0x4d0
  scsi_io_completion+0xb7/0x7e0
  scsi_finish_command+0x223/0x310
  scsi_softirq_done+0x1d5/0x210
  blk_mq_complete_request+0x224/0x250
  scsi_mq_done+0xc2/0x250
  pqi_raid_io_complete+0x5a/0x70 [smartpqi]
  pqi_irq_handler+0x150/0x1410 [smartpqi]
  __handle_irq_event_percpu+0x90/0x540
  handle_irq_event_percpu+0x49/0xd0
  handle_irq_event+0x85/0xca
  handle_edge_irq+0x13f/0x3e0
  do_IRQ+0x86/0x190
  <snip>

Since the write is under pool->lock but the read is done as lockless.
Even though the commit 5b990546e334 ("mempool: fix and document
synchronization and memory barrier usage") introduced the smp_wmb() and
smp_rmb() pair to improve the situation, it is adequate to protect it
from data races which could lead to a logic bug, so fix it by adding
READ_ONCE() for the read.

Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/list_lru: fix a data race in list_lru_count_one

struct list_lru_one l.nr_items could be accessed concurrently as noticed
by KCSAN,

BUG: KCSAN: data-race in list_lru_count_one / list_lru_isolate_move

write to 0xffffa102789c4510 of 8 bytes by task 823 on cpu 39:
  list_lru_isolate_move+0xf9/0x130
  list_lru_isolate_move at mm/list_lru.c:180
  inode_lru_isolate+0x12b/0x2a0
  __list_lru_walk_one+0x122/0x3d0
  list_lru_walk_one+0x75/0xa0
  prune_icache_sb+0x8b/0xc0
  super_cache_scan+0x1b8/0x250
  do_shrink_slab+0x256/0x6d0
  shrink_slab+0x41b/0x4a0
  shrink_node+0x35c/0xd80
  balance_pgdat+0x652/0xd90
  kswapd+0x396/0x8d0
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

read to 0xffffa102789c4510 of 8 bytes by task 6345 on cpu 56:
  list_lru_count_one+0x116/0x2f0
  list_lru_count_one at mm/list_lru.c:193
  super_cache_count+0xe8/0x170
  do_shrink_slab+0x95/0x6d0
  shrink_slab+0x41b/0x4a0
  shrink_node+0x35c/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

Reported by Kernel Concurrency Sanitizer on:
CPU: 56 PID: 6345 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #4
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

A shattered l.nr_items could affect the shrinker behaviour due to a data
race. Fix it by adding READ_ONCE() for the read. Since the writes are
aligned and up to word-size, assume those are safe from data races to
avoid readability issues of writing WRITE_ONCE(var, var + val).

Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/memcontrol: fix a data race in scan count

struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
accessed concurrently as noticed by KCSAN,

BUG: KCSAN: data-race in lruvec_lru_size / mem_cgroup_update_lru_size

write to 0xffff9c804ca285f8 of 8 bytes by task 50951 on cpu 12:
  mem_cgroup_update_lru_size+0x11c/0x1d0
  mem_cgroup_update_lru_size at mm/memcontrol.c:1266
  isolate_lru_pages+0x6a9/0xf30
  shrink_active_list+0x123/0xcc0
  shrink_lruvec+0x8fd/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

read to 0xffff9c804ca285f8 of 8 bytes by task 50964 on cpu 95:
  lruvec_lru_size+0xbb/0x270
  mem_cgroup_get_zone_lru_size at include/linux/memcontrol.h:536
  (inlined by) lruvec_lru_size at mm/vmscan.c:326
  shrink_lruvec+0x1d0/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_current+0xa6/0x120
  alloc_slab_page+0x3b1/0x540
  allocate_slab+0x70/0x660
  new_slab+0x46/0x70
  ___slab_alloc+0x4ad/0x7d0
  __slab_alloc+0x43/0x70
  kmem_cache_alloc+0x2c3/0x420
  getname_flags+0x4c/0x230
  getname+0x22/0x30
  do_sys_openat2+0x205/0x3b0
  do_sys_open+0x9a/0xf0
  __x64_sys_openat+0x62/0x80
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reported by Kernel Concurrency Sanitizer on:
CPU: 95 PID: 50964 Comm: cc1 Tainted: G        W  O L    5.5.0-next-20200204+ #6
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

The write is under lru_lock, but the read is done as lockless.  The scan
count is used to determine how aggressively the anon and file LRU lists
should be scanned.  Load tearing could generate an inefficient heuristic,
so fix it by adding READ_ONCE() for the read.

Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/page_counter: fix various data races at memsw

Commit 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") could had
memcg->memsw->watermark and memcg->memsw->failcnt been accessed
concurrently as reported by KCSAN,

BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge

read to 0xffff8fb18c4cd190 of 8 bytes by task 1081 on cpu 59:
  page_counter_try_charge+0x4d/0x150 mm/page_counter.c:138
  try_charge+0x131/0xd50 mm/memcontrol.c:2405
  __memcg_kmem_charge_memcg+0x58/0x140
  __memcg_kmem_charge+0xcc/0x280
  __alloc_pages_nodemask+0x1e1/0x450
  alloc_pages_current+0xa6/0x120
  pte_alloc_one+0x17/0xd0
  __pte_alloc+0x3a/0x1f0
  copy_p4d_range+0xc36/0x1990
  copy_page_range+0x21d/0x360
  dup_mmap+0x5f5/0x7a0
  dup_mm+0xa2/0x240
  copy_process+0x1b3f/0x3460
  _do_fork+0xaa/0xa20
  __x64_sys_clone+0x13b/0x170
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

write to 0xffff8fb18c4cd190 of 8 bytes by task 1153 on cpu 120:
  page_counter_try_charge+0x5b/0x150 mm/page_counter.c:139
  try_charge+0x131/0xd50 mm/memcontrol.c:2405
  mem_cgroup_try_charge+0x159/0x460
  mem_cgroup_try_charge_delay+0x3d/0xa0
  wp_page_copy+0x14d/0x930
  do_wp_page+0x107/0x7b0
  __handle_mm_fault+0xce6/0xd40
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

BUG: KCSAN: data-race in page_counter_try_charge / page_counter_try_charge

write to 0xffff88809bbf2158 of 8 bytes by task 11782 on cpu 0:
  page_counter_try_charge+0x100/0x170 mm/page_counter.c:129
  try_charge+0x185/0xbf0 mm/memcontrol.c:2405
  __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
  __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
  __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780

read to 0xffff88809bbf2158 of 8 bytes by task 11814 on cpu 1:
  page_counter_try_charge+0xef/0x170 mm/page_counter.c:129
  try_charge+0x185/0xbf0 mm/memcontrol.c:2405
  __memcg_kmem_charge_memcg+0x4a/0xe0 mm/memcontrol.c:2837
  __memcg_kmem_charge+0xcf/0x1b0 mm/memcontrol.c:2877
  __alloc_pages_nodemask+0x26c/0x310 mm/page_alloc.c:4780

Since watermark could be compared or set to garbage due to a data race
which would change the code logic, fix it by adding a pair of READ_ONCE()
and WRITE_ONCE() in those places.

The "failcnt" counter is tolerant of some degree of inaccuracy and is only
used to report stats, a data race will not be harmful, thus mark it as an
intentional data race using the data_race() macro.

Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
Reported-by: [email protected]
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Johannes Weiner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/swapfile: fix and annotate various data races

swap_info_struct si.highest_bit, si.swap_map[offset] and si.flags could
be accessed concurrently separately as noticed by KCSAN,

=== si.highest_bit ===

write to 0xffff8d5abccdc4d4 of 4 bytes by task 5353 on cpu 24:
  swap_range_alloc+0x81/0x130
  swap_range_alloc at mm/swapfile.c:681
  scan_swap_map_slots+0x371/0xb90
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0xf2/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1795/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

read to 0xffff8d5abccdc4d4 of 4 bytes by task 6672 on cpu 70:
  scan_swap_map_slots+0x4a6/0xb90
  scan_swap_map_slots at mm/swapfile.c:892
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0xf2/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1795/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

Reported by Kernel Concurrency Sanitizer on:
CPU: 70 PID: 6672 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #3
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

=== si.swap_map[offset] ===

write to 0xffffbc370c29a64c of 1 bytes by task 6856 on cpu 86:
  __swap_entry_free_locked+0x8c/0x100
  __swap_entry_free_locked at mm/swapfile.c:1209 (discriminator 4)
  __swap_entry_free.constprop.20+0x69/0xb0
  free_swap_and_cache+0x53/0xa0
  unmap_page_range+0x7f8/0x1d70
  unmap_single_vma+0xcd/0x170
  unmap_vmas+0x18b/0x220
  exit_mmap+0xee/0x220
  mmput+0x10e/0x270
  do_exit+0x59b/0xf40
  do_group_exit+0x8b/0x180

read to 0xffffbc370c29a64c of 1 bytes by task 6855 on cpu 20:
  _swap_info_get+0x81/0xa0
  _swap_info_get at mm/swapfile.c:1140
  free_swap_and_cache+0x40/0xa0
  unmap_page_range+0x7f8/0x1d70
  unmap_single_vma+0xcd/0x170
  unmap_vmas+0x18b/0x220
  exit_mmap+0xee/0x220
  mmput+0x10e/0x270
  do_exit+0x59b/0xf40
  do_group_exit+0x8b/0x180

=== si.flags ===

write to 0xffff956c8fc6c400 of 8 bytes by task 6087 on cpu 23:
  scan_swap_map_slots+0x6fe/0xb50
  scan_swap_map_slots at mm/swapfile.c:887
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0x377/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1795/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

read to 0xffff956c8fc6c400 of 8 bytes by task 6207 on cpu 63:
  _swap_info_get+0x41/0xa0
  __swap_info_get at mm/swapfile.c:1114
  put_swap_page+0x84/0x490
  __remove_mapping+0x384/0x5f0
  shrink_page_list+0xff1/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

The writes are under si->lock but the reads are not. For si.highest_bit
and si.swap_map[offset], data race could trigger logic bugs, so fix them
by having WRITE_ONCE() for the writes and READ_ONCE() for the reads
except those isolated reads where they compare against zero which a data
race would cause no harm. Thus, annotate them as intentional data races
using the data_race() macro.

For si.flags, the readers are only interested in a single bit where a
data race there would cause no issue there.

[[email protected]: add a missing annotation for si->flags in memory.c]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Hugh Dickins <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/filemap.c: fix a data race in filemap_fault()

struct file_ra_state ra.mmap_miss could be accessed concurrently during
page faults as noticed by KCSAN,

BUG: KCSAN: data-race in filemap_fault / filemap_map_pages

write to 0xffff9b1700a2c1b4 of 4 bytes by task 3292 on cpu 30:
  filemap_fault+0x920/0xfc0
  do_sync_mmap_readahead at mm/filemap.c:2384
  (inlined by) filemap_fault at mm/filemap.c:2486
  __xfs_filemap_fault+0x112/0x3e0 [xfs]
  xfs_filemap_fault+0x74/0x90 [xfs]
  __do_fault+0x9e/0x220
  do_fault+0x4a0/0x920
  __handle_mm_fault+0xc69/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

read to 0xffff9b1700a2c1b4 of 4 bytes by task 3313 on cpu 32:
  filemap_map_pages+0xc2e/0xd80
  filemap_map_pages at mm/filemap.c:2625
  do_fault+0x3da/0x920
  __handle_mm_fault+0xc69/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

Reported by Kernel Concurrency Sanitizer on:
CPU: 32 PID: 3313 Comm: systemd-udevd Tainted: G        W    L 5.5.0-next-20200210+ #1
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

ra.mmap_miss is used to contribute the readahead decisions, a data race
could be undesirable.  Both the read and write is only under non-exclusive
mmap_sem, two concurrent writers could even underflow the counter.  Fix
the underflow by writing to a local variable before committing a final
store to ra.mmap_miss given a small inaccuracy of the counter should be
acceptable.

Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Qian Cai <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Marco Elver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/swap_state: mark various intentional data races

swap_cache_info.* could be accessed concurrently as noticed by
KCSAN,

BUG: KCSAN: data-race in lookup_swap_cache / lookup_swap_cache

write to 0xffffffff85517318 of 8 bytes by task 94138 on cpu 101:
  lookup_swap_cache+0x12e/0x460
  lookup_swap_cache at mm/swap_state.c:322
  do_swap_page+0x112/0xeb0
  __handle_mm_fault+0xc7a/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

read to 0xffffffff85517318 of 8 bytes by task 91655 on cpu 100:
  lookup_swap_cache+0x117/0x460
  lookup_swap_cache at mm/swap_state.c:322
  shmem_swapin_page+0xc7/0x9e0
  shmem_getpage_gfp+0x2ca/0x16c0
  shmem_fault+0xef/0x3c0
  __do_fault+0x9e/0x220
  do_fault+0x4a0/0x920
  __handle_mm_fault+0xc69/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

Reported by Kernel Concurrency Sanitizer on:
CPU: 100 PID: 91655 Comm: systemd-journal Tainted: G        W  O L 5.5.0-next-20200204+ #6
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

write to 0xffffffff8d717308 of 8 bytes by task 11365 on cpu 87:
   __delete_from_swap_cache+0x681/0x8b0
   __delete_from_swap_cache at mm/swap_state.c:178

read to 0xffffffff8d717308 of 8 bytes by task 11275 on cpu 53:
   __delete_from_swap_cache+0x66e/0x8b0
   __delete_from_swap_cache at mm/swap_state.c:178

Both the read and write are done as lockless. Since swap_cache_info.*
are only used to print out counter information, even if any of them
missed a few incremental due to data races, it will be harmless, so just
mark it as an intentional data race using the data_race() macro.

While at it, fix a checkpatch.pl warning,

WARNING: Single statement macros should not use a do {} while (0) loop

Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Marco Elver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/page_io: mark various intentional data races

struct swap_info_struct si.flags could be accessed concurrently as noticed
by KCSAN,

BUG: KCSAN: data-race in scan_swap_map_slots / swap_readpage

write to 0xffff9c77b80ac400 of 8 bytes by task 91325 on cpu 16:
  scan_swap_map_slots+0x6fe/0xb50
  scan_swap_map_slots at mm/swapfile.c:887
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0x377/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1740/0x2820
  shrink_inactive_list+0x316/0x8b0
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

read to 0xffff9c77b80ac400 of 8 bytes by task 5422 on cpu 7:
  swap_readpage+0x204/0x6a0
  swap_readpage at mm/page_io.c:380
  read_swap_cache_async+0xa2/0xb0
  swapin_readahead+0x6a0/0x890
  do_swap_page+0x465/0xeb0
  __handle_mm_fault+0xc7a/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

Reported by Kernel Concurrency Sanitizer on:
CPU: 7 PID: 5422 Comm: gmain Tainted: G        W  O L 5.5.0-next-20200204+ #6
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

Other reads,

read to 0xffff91ea33eac400 of 8 bytes by task 11276 on cpu 120:
  __swap_writepage+0x140/0xc20
  __swap_writepage at mm/page_io.c:289

read to 0xffff91ea33eac400 of 8 bytes by task 11264 on cpu 16:
  swap_set_page_dirty+0x44/0x1f4
  swap_set_page_dirty at mm/page_io.c:442

The write is under &si->lock, but the reads are done as lockless.  Since
the reads only check for a specific bit in the flag, it is harmless even
if load tearing happens.  Thus, just mark them as intentional data races
using the data_race() macro.

[[email protected]: add a missing annotation]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Marco Elver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/frontswap: mark various intentional data races

There are a few information counters that are intentionally not protected
against increment races, so just annotate them using the data_race()
macro.

BUG: KCSAN: data-race in __frontswap_store / __frontswap_store

write to 0xffffffff8b7174d8 of 8 bytes by task 6396 on cpu 103:
  __frontswap_store+0x2d0/0x344
  inc_frontswap_failed_stores at mm/frontswap.c:70
  (inlined by) __frontswap_store at mm/frontswap.c:280
  swap_writepage+0x83/0xf0
  pageout+0x33e/0xae0
  shrink_page_list+0x1f57/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

read to 0xffffffff8b7174d8 of 8 bytes by task 6405 on cpu 47:
  __frontswap_store+0x2b9/0x344
  inc_frontswap_failed_stores at mm/frontswap.c:70
  (inlined by) __frontswap_store at mm/frontswap.c:280
  swap_writepage+0x83/0xf0
  pageout+0x33e/0xae0
  shrink_page_list+0x1f57/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290
  __alloc_pages_nodemask+0x3bb/0x450
  alloc_pages_vma+0x8a/0x2c0
  do_anonymous_page+0x170/0x700
  __handle_mm_fault+0xc9f/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm/kmemleak: silence KCSAN splats in checksum

Even if KCSAN is disabled for kmemleak, update_checksum() could still call
crc32() (which is outside of kmemleak.c) to dereference object->pointer.
Thus, the value of object->pointer could be accessed concurrently as
noticed by KCSAN,

BUG: KCSAN: data-race in crc32_le_base / do_raw_spin_lock

write to 0xffffb0ea683a7d50 of 4 bytes by task 23575 on cpu 12:
  do_raw_spin_lock+0x114/0x200
  debug_spin_lock_after at kernel/locking/spinlock_debug.c:91
  (inlined by) do_raw_spin_lock at kernel/locking/spinlock_debug.c:115
  _raw_spin_lock+0x40/0x50
  __handle_mm_fault+0xa9e/0xd00
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x6f9
  page_fault+0x34/0x40

read to 0xffffb0ea683a7d50 of 4 bytes by task 839 on cpu 60:
  crc32_le_base+0x67/0x350
  crc32_le_base+0x67/0x350:
  crc32_body at lib/crc32.c:106
  (inlined by) crc32_le_generic at lib/crc32.c:179
  (inlined by) crc32_le at lib/crc32.c:197
  kmemleak_scan+0x528/0xd90
  update_checksum at mm/kmemleak.c:1172
  (inlined by) kmemleak_scan at mm/kmemleak.c:1497
  kmemleak_scan_thread+0xcc/0xfa
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

If a shattered value was returned due to a data race, it will be corrected
in the next scan.  Thus, let KCSAN ignore all reads in the region to
silence KCSAN in case the write side is non-atomic.

Suggested-by: Marco Elver <[email protected]>
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Marco Elver <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

all arch: remove system call sys_sysctl

Since commit 61a47c1ad3a4dc ("sysctl: Remove the sysctl system call"),
sys_sysctl is actually unavailable: any input can only return an error.

We have been warning about people using the sysctl system call for years
and believe there are no more users. Even if there are users of this
interface if they have not complained or fixed their code by now they
probably are not going to, so there is no point in warning them any
longer.

So completely remove sys_sysctl on all architectures.

[[email protected]: s390: fix build error for sys_call_table_emu]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Xiaoming Ni <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Will Deacon <[email protected]> [arm/arm64]
Acked-by: "Eric W. Biederman" <[email protected]>
Cc: Aleksa Sarai <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Bin Meng <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: chenzefeng <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: David Howells <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Diego Elio Pettenò <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Iurii Zaikin <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kars de Jong <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Olof Johansson <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Russell King <[email protected]>
Cc: Sami Tolvanen <[email protected]>
Cc: Sargun Dhillon <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Thiago Jung Bauermann <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Zhou Yanjie <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

fs: autofs: delete repeated words in comments

Drop duplicated words {the, at} in comments.

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Ian Kent <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: introduce offset_in_thp

Mirroring offset_in_page(), this gives you the offset within this
particular page, no matter what size page it is. It optimises down to
offset_in_page() if CONFIG_TRANSPARENT_HUGEPAGE is not set.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: add thp_head

This is like compound_head() but compiles away when
CONFIG_TRANSPARENT_HUGEPAGE is not enabled.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: replace hpage_nr_pages with thp_nr_pages

The thp prefix is more frequently used than hpage and we should be
consistent between the various functions.

[[email protected]: fix mm/migrate.c]

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: add thp_size

This function returns the number of bytes in a THP. It is like
page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE
is disabled.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: add thp_order

This function returns the order of a transparent huge page. It compiles
to 0 if CONFIG_TRANSPARENT_HUGEPAGE is disabled.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: move page-flags include to top of file

Give up on the notion that we can remove page-flags.h from mm.h.  There
are currently 14 inline functions which use a PageFoo function.  Also, two
of the files directly included by mm.h include page-flags.h themselves,
and there are probably more indirect inclusions.  So just include it at
the top like any other header file.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mm: store compound_nr as well as compound_order

Patch series "THP prep patches".

These are some generic cleanups and improvements, which I would like
merged into mmotm soon.  The first one should be a performance improvement
for all users of compound pages, and the others are aimed at getting code
to compile away when CONFIG_TRANSPARENT_HUGEPAGE is disabled (ie small
systems).  Also better documented / less confusing than the current prefix
mixture of compound, hpage and thp.

This patch (of 7):

This removes a few instructions from functions which need to know how many
pages are in a compound page.  The storage used is either page->mapping on
64-bit or page->index on 32-bit.  Both of these are fine to overlay on
tail pages.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

mailmap: add entry for Greg Kurz

I had stopped using [email protected] a while back already but this
email address was shutdown last June when I quit IBM. It's about time to
map it to [email protected].

Signed-off-by: Greg Kurz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

selftests/exec: add file type errno tests

Make sure execve() returns the expected errno values for non-regular
files.

Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Marc Zyngier <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

exec: restore EACCES of S_ISDIR execve()

Patch series "Fix S_ISDIR execve() errno".

Fix an errno change for execve() of directories, noticed by Marc Zyngier.
Along with the fix, include a regression test to avoid seeing this return
in the future.

This patch (of 2):

The return code for attempting to execute a directory has always been
EACCES. Adjust the S_ISDIR exec test to reflect the old errno instead of
the general EISDIR for other kinds of "open" attempts on directories.

Fixes: 633fb6ac3980 ("exec: move S_ISREG() check earlier")
Reported-by: Marc Zyngier <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/lkml/20200813151305.6191993b@why
Signed-off-by: Linus Torvalds <[email protected]>

lz4: fix kernel decompression speed

This patch replaces all memcpy() calls with LZ4_memcpy() which calls
__builtin_memcpy() so the compiler can inline it.

LZ4 relies heavily on memcpy() with a constant size being inlined.  In x86
and i386 pre-boot environments memcpy() cannot be inlined because memcpy()
doesn't get defined as __builtin_memcpy().

An equivalent patch has been applied upstream so that the next import
won't lose this change [1].

I've measured the kernel decompression speed using QEMU before and after
this patch for the x86_64 and i386 architectures.  The speed-up is about
10x as shown below.

Code Arch Kernel Size Time Speed
v5.8 x86_64 11504832 B 148 ms 79 MB/s
patch x86_64 11503872 B 13 ms 885 MB/s
v5.8 i386 9621216 B 91 ms 106 MB/s
patch i386 9620224 B 10 ms 962 MB/s

I also measured the time to decompress the initramfs on x86_64, i386, and
arm.  All three show the same decompression speed before and after, as
expected.

[1] https://github.com/lz4/lz4/pull/890

Signed-off-by: Nick Terrell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Yann Collet <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Sven Schmidt <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arvind Sankar <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

Revert "mm/vmstat.c: do not show lowmem reserve protection information of empty zone"

This reverts commit 26e7deadaae175.

Sonny reported that one of their tests started failing on the latest
kernel on their Chrome OS platform. The root cause is that the above
commit removed the protection line of empty zone, while the parser used in
the test relies on the protection line to mark the end of each zone.

Let's revert it to avoid breaking userspace testing or applications.

Fixes: 26e7deadaae175 ("mm/vmstat.c: do not show lowmem reserve protection information of empty zone)"
Reported-by: Sonny Rao <[email protected]>
Signed-off-by: Baoquan He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: <[email protected]> [5.8.x]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

asm-generic: pgalloc.h: use correct #ifdef to enable pud_alloc_one()

The #ifdef statement that guards the generic version of pud_alloc_one() by
mistake used __HAVE_ARCH_PUD_FREE instead of __HAVE_ARCH_PUD_ALLOC_ONE.

Fix it.

Fixes: d9e8b929670b ("asm-generic: pgalloc: provide generic pud_alloc_one() and pud_free_one()")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>

sh: landisk: Add missing initialization of sh_io_port_base

The Landisk setup code maps the CF IDE area using ioremap_prot(), and
passes the resulting virtual addresses to the pata_platform driver,
disguising them as I/O port addresses. Hence the pata_platform driver
translates them again using ioport_map().
As CONFIG_GENERIC_IOMAP=n, and CONFIG_HAS_IOPORT_MAP=y, the
SuperH-specific mapping code in arch/sh/kernel/ioport.c translates
I/O port addresses to virtual addresses by adding sh_io_port_base, which
defaults to -1, thus breaking the assumption of an identity mapping.

Fix this by setting sh_io_port_base to zero.

Fixes: 37b7a97884ba64bf ("sh: machvec IO death.")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: bring syscall_set_return_value in line with other architectures

Other architectures expect that syscall_set_return_value gets an already
negative value as error. That's also what kernel/seccomp.c provides.

Signed-off-by: Michael Karcher <[email protected]>
Tested-by: John Paul Adrian Glaubitz <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: Add SECCOMP_FILTER

Port sh to use the new SECCOMP_FILTER code.

Signed-off-by: Michael Karcher <[email protected]>
Tested-by: John Paul Adrian Glaubitz <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: Rearrange blocks in entry-common.S

This avoids out-of-range jumps that get auto-replaced by the assembler
and prepares for the changes needed to implement SECCOMP_FILTER cleanly.

Signed-off-by: Michael Karcher <[email protected]>
Tested-by: John Paul Adrian Glaubitz <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: switch to copy_thread_tls()

Use the copy_thread_tls() calling convention which passes tls through a
register. This is required so we can remove the copy_thread{_tls}() split
and remove the HAVE_COPY_THREAD_TLS macro.

Cc: Rich Felker <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: use the generic dma coherent remap allocator

This switches to using common code for the DMA allocations, including
potential use of the CMA allocator if configured.

Switching to the generic code enables DMA allocations from atomic
context, which is required by the DMA API documentation, and also
adds various other minor features drivers start relying upon. It
also makes sure we have on tested code base for all architectures
that require uncached pte bits for coherent DMA allocations.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: don't allow non-coherent DMA for NOMMU

The code handling non-coherent DMA depends on being able to remap code
as non-cached. But that can't be done without an MMU, so using this
option on NOMMU builds is broken.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

dma-mapping: consolidate the NO_DMA definition in kernel/dma/Kconfig

Have a single definition that architetures can select.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: unexport register_trapped_io and match_trapped_io_handler

Both functions are only used by compiled in core code.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: don't include <asm/io_trapped.h> in <asm/io.h>

No need to expose the details of trapped I/O to drivers.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: move the ioremap implementation out of line

Move the internal implementation details of ioremap out of line, no need
to expose any of this to drivers for a slow path API.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: move ioremap_fixed details out of <asm/io.h>

ioremap_fixed is an internal implementation detail and should not be
exposed to drivers.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: remove __KERNEL__ ifdefs from non-UAPI headers

There is no point in having __KERNEL__ ifdefs in headers not exported to
userspace.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: sort the selects for SUPERH alphabetically

Ensure there is an order for the selects. Also remove a duplicate
one.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: remove -Werror from Makefiles

The sh build is full of warnings when building with gcc 9.2.1. While
fixing those would be great, at least avoid failing the build.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: Replace HTTP links with HTTPS ones

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
  If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

arch/sh/configs: remove obsolete CONFIG_SOC_CAMERA*

Drop all configs with the CONFIG_SOC_CAMERA prefix since those
have been removed.

SOC_CAMERA support for the sh architecture was removed a long time ago.
Drop it from the configs.

Signed-off-by: Hans Verkuil <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: stacktrace: Remove stacktrace_ops.stack()

The SH implementation never called stacktrace_ops.stack().
Presumably this was copied from the x86 implementation.

Hence remove the method, and all implementations (most of them are
dummies).

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: machvec: Modernize printing of kernel messages

- Convert from printk() to pr_*(),
- Add missing continuations.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: pci: Modernize printing of kernel messages

  - Convert from printk() to pr_*(),
  - Add missing continuations,
  - Join broken messages.

Note that printk(KERN_DEBUG ...) is retained, to preserve behavior
(pr_debug() is a dummy if DEBUG is not defined).

Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: sh2007: Modernize printing of kernel messages

- Convert from printk() to pr_*(),
- Add missing continuation.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: process: Fix broken lines in register dumps

Rejoin the broken lines by using pr_cont().
Convert the remaining printk() calls to pr_*() while at it.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: dump_stack: Fix broken lines and ptrval in calltrace dumps

Rejoin the broken lines by dropping the log level parameters and using
pr_cont().
Use "%px" to print sensible addresses in call traces.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: kernel: disassemble: Fix broken lines in disassembly dumps

Rejoin the broken lines by using pr_cont().
Convert the remaining printk() calls to pr_*() while at it.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

Revert "sh: remove needless printk()"

This reverts commit 8b92f34877225c8eb85e3ab7f1177fc248ba26d0.

"data" became the log level in commit 539e786cc37ee5cb ("sh: add loglvl
to show_trace()"), so we do need to keep the printk() before the
continuation in print_trace_address().

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

Revert "sh: add loglvl to printk_address()"

This reverts commit 2deebe4d56d638269a4a728086d64de5734b460a.

printk_address() is always used as a continuation of the previous
logging, hence it should not include a log level.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: fault: Fix duplicate printing of "PC:"

Somewhere along the patch handling path, both the old "printk(KERN_ALERT
....)" and the new "pr_alert(...)" were retained, leading to the
duplicate printing of "PC:".

Drop the old one.

Fixes: eaabf98b0932a540 ("sh: fault: modernize printing of kernel messages")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

input: i8042 - Remove special Cayman handling

As of commit 37744feebc086908 ("sh: remove sh5 support"), support for
the SH5-based Cayman platform can no longer be selected.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: Remove SH5-based Cayman platform

Since the removal of core support for SH5, Cayman support can no longer
be selected.

Fixes: 37744feebc086908 ("sh: remove sh5 support")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

arch: sh: smc37c93x: fix spelling mistake

Fix typo: "triger" --> "trigger"

Signed-off-by: Flavio Suligoi <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

Revert "sh: add missing EXPORT_SYMBOL() for __delay"

This reverts commit d1f56f318d234fc5db230af2f3e0088f689ab3c0.

__delay() is an internal implementation detail on several architectures.
Drivers should not call __delay() directly, as it has non-standardized
semantics, or may not even exist.
Hence there is no need to export __delay() to modules.

See also include/asm-generic/delay.h:

    /* Undefined functions to get compile-time errors */
    ...
    extern void __delay(unsigned long loops);

Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: Implement __get_user_u64() required for 64-bit get_user()

Trying to build the kernel with CONFIG_INFINIBAND_USER_ACCESS enabled fails

ERROR: "__get_user_unknown" [drivers/infiniband/core/ib_uverbs.ko] undefined!

with on SH since the kernel misses a 64-bit implementation of get_user().

Implement the missing 64-bit get_user() as __get_user_u64(), matching the
already existing __put_user_u64() which implements the 64-bit put_user().

Signed-off-by: John Paul Adrian Glaubitz <[email protected]>
Acked-by: Yoshinori Sato <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: remove call to memset after dma_alloc_coherent

Function dma_alloc_coherent use in buf already zeroes out memory,
so memset is not needed.

Signed-off-by: Chen Zhou <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

sh: Fix unneeded constructor in page table allocation

The pgd kmem_cache allocation both specified __GFP_ZERO and had a
constructor which makes no sense. Remove __GFP_ZERO and zero the user
parts of the pgd explicitly.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Rich Felker <[email protected]>

riscv: Setup exception vector for nommu platform

Exception vector is missing on nommu platform and that is an issue.
This patch is tested in Sipeed Maix Bit Dev Board.

Fixes: 79b1feba5455 ("RISC-V: Setup exception vector early")
Suggested-by: Anup Patel <[email protected]>
Suggested-by: Atish Patra <[email protected]>
Signed-off-by: Qiu Wenbo <[email protected]>
Reviewed-by: Atish Patra <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
"This is the set of patches which arrived too late to stabilise in
  -next for the first pull.

  It's really just an lpfc driver update and an assortment of minor
  fixes, all in drivers. The only core update is to the zone block
  device driver, which isn't the one most people use"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: lpfc: Update lpfc version to 12.8.0.3
  scsi: lpfc: Fix LUN loss after cable pull
  scsi: lpfc: Fix validation of bsg reply lengths
  scsi: lpfc: Fix retry of PRLI when status indicates its unsupported
  scsi: lpfc: Fix oops when unloading driver while running mds diags
  scsi: lpfc: Fix RSCN timeout due to incorrect gidft counter
  scsi: lpfc: Fix no message shown for lpfc_hdw_queue out of range value
  scsi: lpfc: Fix FCoE speed reporting
  scsi: lpfc: Add missing misc_deregister() for lpfc_init()
  scsi: lpfc: nvmet: Avoid hang / use-after-free again when destroying targetport
  scsi: scsi_transport_sas: Add spaces around binary operator "|"
  scsi: sd_zbc: Improve zone revalidation
  scsi: libfc: Free skb in fc_disc_gpn_id_resp() for valid cases
  scsi: fcoe: Memory leak fix in fcoe_sysfs_fcf_del()
  scsi: target: Make iscsit_register_transport() return void

Merge tag 'pwm/for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm

Pull pwm updates from Thierry Reding:
"The majority of this batch is conversion of the PWM period and duty
  cycle to 64-bit unsigned integers, which is required so that some
  types of hardware can generate the full range of signals that they're
  capable of.

  The remainder is mostly minor fixes and cleanups"

* tag 'pwm/for-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
  pwm: bcm-iproc: handle clk_get_rate() return
  pwm: Replace HTTP links with HTTPS ones
  pwm: omap-dmtimer: Repair pwm_omap_dmtimer_chip's broken kerneldoc header
  pwm: mediatek: Provide missing kerneldoc description for 'soc' arg
  pwm: bcm-kona: Remove impossible comparison when validating duty cycle
  pwm: bcm-iproc: Remove impossible comparison when validating duty cycle
  pwm: iqs620a: Use lowercase hexadecimal literals for consistency
  pwm: Convert period and duty cycle to u64
  clk: pwm: Use 64-bit division function
  backlight: pwm_bl: Use 64-bit division function
  pwm: sun4i: Use nsecs_to_jiffies to avoid a division
  pwm: sifive: Use 64-bit division macro
  pwm: iqs620a: Use 64-bit division
  pwm: imx27: Use 64-bit division macro
  pwm: imx-tpm: Use 64-bit division macro
  pwm: clps711x: Use 64-bit division macro
  hwmon: pwm-fan: Use 64-bit division macro
  drm/i915: Use 64-bit division macro

Merge tag 'sound-fix-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"All device-specific small fixes and quirks mostly for usual suspects,
  USB-audio and HD-audio"

* tag 'sound-fix-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: echoaudio: Fix potential Oops in snd_echo_resume()
  ALSA: hda/hdmi: Use force connectivity quirk on another HP desktop
  ALSA: hda/realtek - Fix unused variable warning
  ALSA: hda - reverse the setting value in the micmute_led_set
  ALSA: echoaduio: Drop superfluous volatile modifier
  ALSA: usb-audio: Disable Lenovo P620 Rear line-in volume control
  ALSA: usb-audio: add quirk for Pioneer DDJ-RB
  ALSA: usb-audio: work around streaming quirk for MacroSilicon MS2109
  ALSA: hda - fix the micmute led status for Lenovo ThinkCentre AIO
  ALSA: usb-audio: fix overeager device match for MacroSilicon MS2109
  ALSA: hda/realtek: Fix pin default on Intel NUC 8 Rugged
  ALSA: usb-audio: Creative USB X-Fi Pro SB1095 volume knob support
  ALSA: usb-audio: fix spelling mistake "buss" -> "bus"

dma-debug: remove debug_dma_assert_idle() function

This remoes the code from the COW path to call debug_dma_assert_idle(),
which was added many years ago.

Google shows that it hasn't caught anything in the 6+ years we've had it
apart from a false positive, and Hugh just noticed how it had a very
unfortunate spinlock serialization in the COW path.

He fixed that issue the previous commit (a85ffd59bd36: "dma-debug: fix
debug_dma_assert_idle(), use rcu_read_lock()"), but let's see if anybody
even notices when we remove this function entirely.

NOTE! We keep the dma tracking infrastructure that was added by the
commit that introduced it. Partly to make it easier to resurrect this
debug code if we ever deside to, and partly because that tracking by pfn
and offset looks quite reasonable.

The problem with this debug code was simply that it was expensive and
didn't seem worth it, not that it was wrong per se.

Acked-by: Dan Williams <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

dma-debug: fix debug_dma_assert_idle(), use rcu_read_lock()

Since commit 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common()
logic") improved unlock_page(), it has become more noticeable how
cow_user_page() in a kernel with CONFIG_DMA_API_DEBUG=y can create and
suffer from heavy contention on DMA debug's radix_lock in
debug_dma_assert_idle().

It is only doing a lookup: use rcu_read_lock() and rcu_read_unlock()
instead; though that does require the static ents[] to be moved
onstack...

...but, hold on, isn't that radix_tree_gang_lookup() and loop doing
quite the wrong thing: searching CACHELINES_PER_PAGE entries for an
exact match with the first cacheline of the page in question?
radix_tree_gang_lookup() is the right tool for the job, but we need
nothing more than to check the first entry it can find, reporting if
that falls anywhere within the page.

(Is RCU safe here? As safe as using the spinlock was. The entries are
never freed, so don't need to be freed by RCU. They may be reused, and
there is a faint chance of a race, with an offending entry reused while
printing its error info; but the spinlock did not prevent that either,
and I agree that it's not worth worrying about. ]

[ Side noe: this patch is a clear improvement to the status quo, but the
  next patch will be removing this debug function entirely.

  But just in case we decide we want to resurrect the debugging code
  some day, I'm first applying this improvement patch so that it doesn't
  get lost    - Linus ]

Fixes: 3b7a6418c749 ("dma debug: account for cachelines and read-only mappings in overlap tracking")
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Dan Williams <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'timers-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timekeeping updates from Thomas Gleixner:
"A set of timekeeping/VDSO updates:

   - Preparatory work to allow S390 to switch over to the generic VDSO
     implementation.

     S390 requires that the VDSO data pointer is handed in to the
     counter read function when time namespace support is enabled.
     Adding the pointer is a NOOP for all other architectures because
     the compiler is supposed to optimize that out when it is unused in
     the architecture specific inline. The change also solved a similar
     problem for MIPS which fortunately has time namespaces not yet
     enabled.

     S390 needs to update clock related VDSO data independent of the
     timekeeping updates. This was solved so far with yet another
     sequence counter in the S390 implementation. A better solution is
     to utilize the already existing VDSO sequence count for this. The
     core code now exposes helper functions which allow to serialize
     against the timekeeper code and against concurrent readers.

     S390 needs extra data for their clock readout function. The initial
     common VDSO data structure did not provide a way to add that. It
     now has an embedded architecture specific struct embedded which
     defaults to an empty struct.

     Doing this now avoids tree dependencies and conflicts post rc1 and
     allows all other architectures which work on generic VDSO support
     to work from a common upstream base.

   - A trivial comment fix"

* tag 'timers-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Delete repeated words in comments
  lib/vdso: Allow to add architecture-specific vdso data
  timekeeping/vsyscall: Provide vdso_update_begin/end()
  vdso/treewide: Add vdso_data pointer argument to __arch_get_hw_counter()

Merge tag 'timers-core-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more timer updates from Thomas Gleixner:
"A set of posix CPU timer changes which allows to defer the heavy work
  of posix CPU timers into task work context. The tick interrupt is
  reduced to a quick check which queues the work which is doing the
  heavy lifting before returning to user space or going back to guest
  mode. Moving this out is deferring the signal delivery slightly but
  posix CPU timers are inaccurate by nature as they depend on the tick
  so there is no real damage. The relevant test cases all passed.

  This lifts the last offender for RT out of the hard interrupt context
  tick handler, but it also has the general benefit that the actual
  heavy work is accounted to the task/process and not to the tick
  interrupt itself.

  Further optimizations are possible to break long sighand lock hold and
  interrupt disabled (on !RT kernels) times when a massive amount of
  posix CPU timers (which are unpriviledged) is armed for a
  task/process.

  This is currently only enabled for x86 because the architecture has to
  ensure that task work is handled in KVM before entering a guest, which
  was just established for x86 with the new common entry/exit code which
  got merged post 5.8 and is not the case for other KVM architectures"

* tag 'timers-core-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Select POSIX_CPU_TIMERS_TASK_WORK
  posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
  posix-cpu-timers: Split run_posix_cpu_timers()

Merge tag 'irq-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
"Two fixes in the core interrupt code which ensure that all error exits
  unlock the descriptor lock"

* tag 'irq-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Unlock irq descriptor after errors
  genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq()

Merge tag 'for-linus' of git://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:
"A few patches all over the place during this cycle, mostly bug and
  sparse warning fixes for OpenRISC, but a few enhancements too. Note,
  there are 2 non OpenRISC specific fixups.

  Non OpenRISC fixes:

   - In init we need to align the init_task correctly to fix an issue
     with MUTEX_FLAGS, reviewed by Peter Z. No one picked this up so I
     kept it on my tree.

   - In asm-generic/io.h I fixed up some sparse warnings, OK'd by Arnd.
     Arnd asked to merge it via my tree.

  OpenRISC fixes:

   - Many fixes for OpenRISC sprase warnings.

   - Add support OpenRISC SMP tlb flushing rather than always flushing
     the entire TLB on every CPU.

   - Fix bug when dumping stack via /proc/xxx/stack of user threads"

* tag 'for-linus' of git://github.com/openrisc/linux:
  openrisc: uaccess: Add user address space check to access_ok
  openrisc: signal: Fix sparse address space warnings
  openrisc: uaccess: Remove unused macro __addr_ok
  openrisc: uaccess: Use static inline function in access_ok
  openrisc: uaccess: Fix sparse address space warnings
  openrisc: io: Fixup defines and move include to the end
  asm-generic/io.h: Fix sparse warnings on big-endian architectures
  openrisc: Implement proper SMP tlb flushing
  openrisc: Fix oops caused when dumping stack
  openrisc: Add support for external initrd images
  init: Align init_task to avoid conflict with MUTEX_FLAGS
  openrisc: fix __user in raw_copy_to_user()'s prototype

Merge tag 'powerpc-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
"One fix for a boot crash on some platforms introduced by the recent
  pkey refactoring.

  Thanks to Christian Zigotzky and Aneesh Kumar K.V"

* tag 'powerpc-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pkeys: Fix boot failures with Nemo board (A-EON AmigaOne X1000)

Merge tag 'for-linus-5.9-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull more xen updates from Juergen Gross:

- Remove support for running as 32-bit Xen PV-guest.

   32-bit PV guests are rarely used, are lacking security fixes for
   Meltdown, and can be easily replaced by PVH mode. Another series for
   doing more cleanup will follow soon (removal of 32-bit-only pvops
   functionality).

- Fixes and additional features for the Xen display frontend driver.

* tag 'for-linus-5.9-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  drm/xen-front: Pass dumb buffer data offset to the backend
  xen: Sync up with the canonical protocol definition in Xen
  drm/xen-front: Add YUYV to supported formats
  drm/xen-front: Fix misused IS_ERR_OR_NULL checks
  xen/gntdev: Fix dmabuf import with non-zero sgt offset
  x86/xen: drop tests for highmem in pv code
  x86/xen: eliminate xen-asm_64.S
  x86/xen: remove 32-bit Xen PV guest support

Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyper-v fixes from Wei Liu:

- fix oops reporting on Hyper-V

- make objtool happy

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Make hv_setup_sched_clock inline
Drivers: hv: vmbus: Only notify Hyper-V for die events that are oops

x86/fsgsbase/64: Fix NULL deref in 86_fsgsbase_read_task

syzbot found its way in 86_fsgsbase_read_task() and triggered this oops:

   KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
   CPU: 0 PID: 6866 Comm: syz-executor262 Not tainted 5.8.0-syzkaller #0
   RIP: 0010:x86_fsgsbase_read_task+0x16d/0x310 arch/x86/kernel/process_64.c:393
   Call Trace:
     putreg32+0x3ab/0x530 arch/x86/kernel/ptrace.c:876
     genregs32_set arch/x86/kernel/ptrace.c:1026 [inline]
     genregs32_set+0xa4/0x100 arch/x86/kernel/ptrace.c:1006
     copy_regset_from_user include/linux/regset.h:326 [inline]
     ia32_arch_ptrace arch/x86/kernel/ptrace.c:1061 [inline]
     compat_arch_ptrace+0x36c/0xd90 arch/x86/kernel/ptrace.c:1198
     __do_compat_sys_ptrace kernel/ptrace.c:1420 [inline]
     __se_compat_sys_ptrace kernel/ptrace.c:1389 [inline]
     __ia32_compat_sys_ptrace+0x220/0x2f0 kernel/ptrace.c:1389
     do_syscall_32_irqs_on arch/x86/entry/common.c:84 [inline]
     __do_fast_syscall_32+0x57/0x80 arch/x86/entry/common.c:126
     do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:149
     entry_SYSENTER_compat_after_hwframe+0x4d/0x5c

This can happen if ptrace() or sigreturn() pokes an LDT selector into FS
or GS for a task with no LDT and something tries to read the base before
a return to usermode notices the bad selector and fixes it.

The fix is to make sure ldt pointer is not NULL.

Fixes: 07e1d88adaae ("x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately")
Co-developed-by: Jann Horn <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: Chang S. Bae <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Markus T Metzger <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Shankar <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"This fixes a regression in af_alg"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: algif_aead - fix uninitialized ctx->init

Merge tag 'modules-for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull module updates from Jessica Yu:
"The most important change would be Christoph Hellwig's patch
  implementing proprietary taint inheritance, in an effort to discourage
  the creation of GPL "shim" modules that interface between GPL symbols
  and proprietary symbols.

  Summary:

   - Have modules that use symbols from proprietary modules inherit the
     TAINT_PROPRIETARY_MODULE taint, in an effort to prevent GPL shim
     modules that are used to circumvent _GPL exports. These are modules
     that claim to be GPL licensed while also using symbols from
     proprietary modules. Such modules will be rejected while non-GPL
     modules will inherit the proprietary taint.

   - Module export space cleanup. Unexport symbols that are unused
     outside of module.c or otherwise used in only built-in code"

* tag 'modules-for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  modules: inherit TAINT_PROPRIETARY_MODULE
  modules: return licensing information from find_symbol
  modules: rename the licence field in struct symsearch to license
  modules: unexport __module_address
  modules: unexport __module_text_address
  modules: mark each_symbol_section static
  modules: mark find_symbol static
  modules: mark ref_module static
  modules: linux/moduleparam.h: drop duplicated word in a comment

Merge tag 'kconfig-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kconfig updates from Masahiro Yamada:

- remove '---help---' keyword support

- fix mouse events for 'menuconfig' symbols in search view of qconf

- code cleanups of qconf

* tag 'kconfig-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (24 commits)
  kconfig: qconf: move setOptionMode() to ConfigList from ConfigView
  kconfig: qconf: do not limit the pop-up menu to the first row
  kconfig: qconf: refactor icon setups
  kconfig: qconf: remove unused voidPix, menuInvPix
  kconfig: qconf: remove ConfigItem::text/setText
  kconfig: qconf: remove ConfigList::addColumn/removeColumn
  kconfig: qconf: remove ConfigItem::pixmap/setPixmap
  kconfig: qconf: drop more localization code
  kconfig: qconf: remove 'parent' from ConfigList::updateMenuList()
  kconfig: qconf: remove unused argument from ConfigView::updateList()
  kconfig: qconf: remove unused argument from ConfigList::updateList()
  kconfig: qconf: omit parent to QHBoxLayout()
  kconfig: qconf: remove name from ConfigSearchWindow constructor
  kconfig: qconf: remove unused ConfigList::listView()
  kconfig: qconf: overload addToolBar() to create and insert toolbar
  kconfig: qconf: remove toolBar from ConfigMainWindow members
  kconfig: qconf: use 'menu' variable for (QMenu *)
  kconfig: qconf: do not use 'menu' variable for (QMenuBar *)
  kconfig: qconf: remove ->addSeparator() to menuBar
  kconfig: add 'static' to some file-local data
  ...

Merge branch 'pm-cpufreq'

* pm-cpufreq:
cpufreq: intel_pstate: Implement passive mode with HWP enabled

dt-bindings: Remove more cases of 'allOf' containing a '$ref'

Another wack-a-mole pass of killing off unnecessary 'allOf + $ref'
usage.

json-schema versions draft7 and earlier have a weird behavior in that
any keywords combined with a '$ref' are ignored (silently). The correct
form was to put a '$ref' under an 'allOf'. This behavior is now changed
in the 2019-09 json-schema spec and '$ref' can be mixed with other
keywords. The json-schema library doesn't yet support this, but the
tooling now does a fixup for this and either way works.

This has been a constant source of review comments, so let's change this
treewide so everyone copies the simpler syntax.

Signed-off-by: Rob Herring <[email protected]>

dt-bindings: Whitespace clean-ups in schema files

Clean-up incorrect indentation, extra spaces, long lines, and missing
EOF newline in schema files. Most of the clean-ups are for list
indentation which should always be 2 spaces more than the preceding
keyword.

Found with yamllint (which I plan to integrate into the checks).

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Acked-by: Sam Ravnborg <[email protected]>
Signed-off-by: Rob Herring <[email protected]>

perf ftrace: Make option description initials all capital letters

And improve a bit the -m description to state that a B/K/M/G suffix is
needed.

Cc: Changbin Du <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf build-ids: Fall back to debuginfod query if debuginfo not found

During a perf-record, use the -ldebuginfod API to query a debuginfod
server, should the debug data not be found in the usual system
locations.  If successful, the usual $HOME/.debug dir is populated.

Tested with:

  $ find .
  .
  ./ctags-debuginfo-5.8-26.fc31.x86_64.rpm
  ./usr
  ./usr/lib
  ./usr/lib/debug
  ./usr/lib/debug/.build-id
  ./usr/lib/debug/.build-id/ca
  ./usr/lib/debug/.build-id/ca/46f6ae6a0cee57d85abc1d461c49074248908d
  ./usr/lib/debug/.build-id/ca/46f6ae6a0cee57d85abc1d461c49074248908d.debug
  ./usr/lib/debug/usr
  ./usr/lib/debug/usr/bin
  ./usr/lib/debug/usr/bin/ctags-5.8-26.fc31.x86_64.debug

  $ debuginfod  -F .
  ...

  $ rm -rf ~/.debug/ ; mkdir ~/.debug

  $ perf record make tags
    BUILD:   Doing 'make -j8' parallel build
    GEN      tags
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.107 MB perf.data (1483 samples) ]

  $ find ~/.debug | grep ctags
  /home/jolsa/.debug/usr/bin/ctags
  /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d
  /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d/elf
  /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d/probes

  $ rm -rf ~/.debug/ ; mkdir ~/.debug

  $ DEBUGINFOD_URLS=http://localhost:8002 perf record make tags
    BUILD:   Doing 'make -j8' parallel build
    GEN      tags
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.108 MB perf.data (1531 samples) ]

  $ find ~/.debug | grep ctag
  /home/jolsa/.debug/usr/bin/ctags
  /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d
  /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d/debug
  /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d/elf
  /home/jolsa/.debug/usr/bin/ctags/ca46f6ae6a0cee57d85abc1d461c49074248908d/probes

Note the 'debug' file is created in the last run.

Note that currently the debuginfo data are downloaded only on record path,
we still need add this support to perf build-id/report.. and test ;-)

Tested-by: Jiri Olsa <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Frank Ch. Eigler <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf bench numa: Remove dead code in parse_nodes_opt()

In the function parse_nodes_opt(), the statement "return 0;" is dead
code, remove it.

Signed-off-by: Peng Fan <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf stat: Update POWER9 metrics to utilize other metrics

These changes take advantage of the new capability added in merge commit
00e4db51259a5f936fec1424b884f029479d3981 "Allow using computed metrics
in calculating other metrics".

The net is a simplification of the expressions for a handful of metrics,
but no functional change.

Signed-off-by: Paul Clarke <[email protected]>
Reviewed-by: Kajol Jain <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Add change log

Add a change log after previous enhancements.

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf: ftrace: Add set_tracing_options() to set all trace options

Now the __cmd_ftrace() becomes a bit long. This moves the trace option
setting code to a separate function set_tracing_options().

Suggested-by: Namhyung Kim <[email protected]>
Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Add option --tid to filter by thread id

This allows us to trace single thread instead of the whole process.

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Add option -D/--delay to delay tracing

This adds an option '-D/--delay' to allow us to start tracing some times
later after workload is launched.

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf: ftrace: Allow set graph depth by '--graph-opts'

This is to have a consistent view of all graph tracer options.
The original option '--graph-depth' is marked as deprecated.

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Add support for trace option tracing_thresh

This adds an option '--graph-opts thresh' to setup trace duration
threshold for funcgraph tracer.

  $ sudo ./perf ftrace -G '*' --graph-opts thresh=100
   3) ! 184.060 us  |    } /* schedule */
   3) ! 185.600 us  |  } /* exit_to_usermode_loop */
   2) ! 225.989 us  |    } /* schedule_idle */
   2) # 4140.051 us |  } /* do_idle */

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Add option 'verbose' to show more info for graph tracer

Sometimes we want ftrace display more and longer information about the
trace.

  $ sudo perf ftrace -G '*'
   2)   0.979 us    |  mutex_unlock();
   2)   1.540 us    |  __fsnotify_parent();
   2)   0.433 us    |  fsnotify();

  $ sudo perf ftrace -G '*' --graph-opts verbose
  14160.770883 |   0)  <...>-47814   |  .... |   1.289 us    |  mutex_unlock();
  14160.770886 |   0)  <...>-47814   |  .... |   1.624 us    |  __fsnotify_parent();
  14160.770887 |   0)  <...>-47814   |  .... |   0.636 us    |  fsnotify();
  14160.770888 |   0)  <...>-47814   |  .... |   0.328 us    |  __sb_end_write();
  14160.770888 |   0)  <...>-47814   |  d... |   0.430 us    |  fpregs_assert_state_consistent();
  14160.770889 |   0)  <...>-47814   |  d... |               |  do_syscall_64() {
  14160.770889 |   0)  <...>-47814   |  .... |               |    __x64_sys_close() {

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Add support for tracing option 'irq-info'

This adds support to display irq context info for function tracer. To do
this, just specify a '--func-opts irq-info' option.

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Add support for trace option funcgraph-irqs

This adds an option '--graph-opts noirqs' to filter out functions executed
in irq context.

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Add support for trace option sleep-time

This adds an option '--graph-opts nosleep-time' which allow us only to
measure on-CPU time. This option is function_graph tracer only.

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Add support for tracing option 'func_stack_trace'

This adds support to display call trace for function tracer. To do this,
just specify a '--func-opts call-graph' option.

Example:

  $ sudo perf ftrace -T vfs_read --func-opts call-graph
   iio-sensor-prox-855   [003]   6168.369657: vfs_read <-ksys_read
   iio-sensor-prox-855   [003]   6168.369677: <stack trace>
   => vfs_read
   => ksys_read
   => __x64_sys_read
   => do_syscall_64
   => entry_SYSCALL_64_after_hwframe
   ...

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf tools: Add general function to parse sublevel options

This factors out a general function perf_parse_sublevel_options() to
parse sublevel options. The 'sublevel' options is something like the
'--debug' options which allow more sublevel options.

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Add option '--inherit' to trace children processes

This adds an option '--inherit' to allow us trace children
processes spawned by our target.

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Show trace column header

This makes 'perf ftrace' display column header before printing trace.

  $ sudo perf ftrace
  # tracer: function
  #
  # entries-in-buffer/entries-written: 0/0   #P:8
  #
  #            TASK-PID     CPU#   TIMESTAMP  FUNCTION
  #              | |         |       |         |
             <...>-9246  [006]  10726.262760: mutex_unlock <-rb_simple_write
             <...>-9246  [006]  10726.262764: __fsnotify_parent <-vfs_write
             <...>-9246  [006]  10726.262765: fsnotify <-vfs_write
             <...>-9246  [006]  10726.262766: __sb_end_write <-vfs_write
             <...>-9246  [006]  10726.262767: fpregs_assert_state_consistent <-do_syscall_64

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Add option '-m/--buffer-size' to set per-cpu buffer size

This adds an option '-m/--buffer-size' to allow us set the size of per-cpu
tracing buffer.

Committer testing:

Before running with this option:

  # find /sys/kernel/tracing/ -name buffer_size_kb | xargs cat
  1408
  1408
  1408
  1408
  1408
  1408
  1408
  1408
  1408
  #

Then, run:

  # perf ftrace -m 2048K | head -10
   2)               |  mutex_unlock() {
   2)   ==========> |
   2)               |    smp_irq_work_interrupt() {
   2)               |      irq_enter() {
   2)   0.121 us    |        rcu_irq_enter();
   2)   0.128 us    |        irqtime_account_irq();
   2)   0.719 us    |      }
   2)               |      __wake_up() {
   2)               |        __wake_up_common_lock() {
   2)   0.105 us    |          _raw_spin_lock_irqsave();
  #

Now look at those tracefs knobs:

  # find /sys/kernel/tracing/ -name buffer_size_kb | xargs cat
  2048
  2048
  2048
  2048
  2048
  2048
  2048
  2048
  2048
  #

This should be similar to the -m option in the other perf tools, such as
'perf record', 'perf trace', etc.

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf ftrace: Factor out function write_tracing_file_int()

We will reuse this function later.

Signed-off-by: Changbin Du <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (VMware) <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>