Git Repo - linux.git/log

tc-testing: add ETS scheduler to tdc build configuration

add CONFIG_NET_SCH_ETS to 'config', otherwise test suites using this file
to perform a full tdc run will encounter the following warning:

ok 645 e90e - Add ETS qdisc using bands # skipped - "-----> teardown stage" did not complete successfully

Fixes: 82c664b69c8b ("selftests: qdiscs: Add test coverage for ETS Qdisc")
Reported-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Davide Caratti <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: phy: fix MDIO bus PM PHY resuming

So far we have the unfortunate situation that mdio_bus_phy_may_suspend()
is called in suspend AND resume path, assuming that function result is
the same. After the original change this is no longer the case,
resulting in broken resume as reported by Geert.

To fix this call mdio_bus_phy_may_suspend() in the suspend path only,
and let the phy_device store the info whether it was suspended by
MDIO bus PM.

Fixes: 503ba7c69610 ("net: phy: Avoid multiple suspends")
Reported-by: Geert Uytterhoeven <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cifs_atomic_open(): fix double-put on late allocation failure

several iterations of ->atomic_open() calling conventions ago, we
used to need fput() if ->atomic_open() failed at some point after
successful finish_open().  Now (since 2016) it's not needed -
struct file carries enough state to make fput() work regardless
of the point in struct file lifecycle and discarding it on
failure exits in open() got unified.  Unfortunately, I'd missed
the fact that we had an instance of ->atomic_open() (cifs one)
that used to need that fput(), as well as the stale comment in
finish_open() demanding such late failure handling.  Trivially
fixed...

Fixes: fe9ec8291fca "do_last(): take fput() on error after opening to out:"
Cc: [email protected] # v4.7+
Signed-off-by: Al Viro <[email protected]>

gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache

with the way fs/namei.c:do_last() had been done, ->atomic_open()
instances needed to recognize the case when existing file got
found with O_EXCL|O_CREAT, either by falling back to finish_no_open()
or failing themselves. gfs2 one didn't.

Fixes: 6d4ade986f9c (GFS2: Add atomic_open support)
Cc: [email protected] # v3.11
Signed-off-by: Al Viro <[email protected]>

Merge branch 'hns3-fixes'

Huazhong Tan says:

====================
net: hns3: fixes for -net

This series includes several bugfixes for the HNS3 ethernet driver.

[patch 1] fixes an "tc qdisc del" failure.
[patch 2] fixes SW & HW VLAN table not consistent issue.
[patch 3] fixes a RMW issue related to VLAN filter switch.
[patch 4] clears port based VLAN when uploading PF.
====================

Signed-off-by: David S. Miller <[email protected]>

net: hns3: clear port base VLAN when unload PF

Currently, PF missed to clear the port base VLAN for VF when
unload. In this case, the VLAN id will remain in the VLAN
table. This patch fixes it.

Fixes: 92f11ea177cd ("net: hns3: fix set port based VLAN issue for VF")
Signed-off-by: Jian Shen <[email protected]>
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: fix RMW issue for VLAN filter switch

According to the user manual, the ingress and egress VLAN filter
are configured at the same time. Currently, hclge_init_vlan_config()
and hclge_set_vlan_spoofchk() will both change the VLAN filter
switch. So it's necessary to read the old configuration before
modifying it.

Fixes: 22044f95faa0 ("net: hns3: add support for spoof check setting")
Signed-off-by: Jian Shen <[email protected]>
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: fix VF VLAN table entries inconsistent issue

Currently, if VF is loaded on the host side, the host doesn't
clear the VF's VLAN table entries when VF removing. In this
case, when doing reset and disabling sriov at the same time the
VLAN device over VF will be removed, but the VLAN table entries
in hardware are remained.

This patch fixes it by asking PF to clear the VLAN table entries for
VF when VF is removing. It also clears the VLAN table full bit
after VF VLAN table entries being cleared.

Fixes: c6075b193462 ("net: hns3: Record VF vlan tables")
Signed-off-by: Jian Shen <[email protected]>
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: hns3: fix "tc qdisc del" failed issue

The HNS3 driver supports to configure TC numbers and TC to priority
map via "tc" tool. But when delete the rule, will fail, because
the HNS3 driver needs at least one TC, but the "tc" tool sets TC
number to zero when delete.

This patch makes sure that the TC number is at least one.

Fixes: 30d240dfa2e8 ("net: hns3: Add mqprio hardware offload support in hns3 driver")
Signed-off-by: Yonglong Liu <[email protected]>
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

taprio: Fix sending packets without dequeueing them

There was a bug that was causing packets to be sent to the driver
without first calling dequeue() on the "child" qdisc. And the KASAN
report below shows that sending a packet without calling dequeue()
leads to bad results.

The problem is that when checking the last qdisc "child" we do not set
the returned skb to NULL, which can cause it to be sent to the driver,
and so after the skb is sent, it may be freed, and in some situations a
reference to it may still be in the child qdisc, because it was never
dequeued.

The crash log looks like this:

[   19.937538] ==================================================================
[   19.938300] BUG: KASAN: use-after-free in taprio_dequeue_soft+0x620/0x780
[   19.938968] Read of size 4 at addr ffff8881128628cc by task swapper/1/0
[   19.939612]
[   19.939772] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc3+ #97
[   19.940397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qe4
[   19.941523] Call Trace:
[   19.941774]  <IRQ>
[   19.941985]  dump_stack+0x97/0xe0
[   19.942323]  print_address_description.constprop.0+0x3b/0x60
[   19.942884]  ? taprio_dequeue_soft+0x620/0x780
[   19.943325]  ? taprio_dequeue_soft+0x620/0x780
[   19.943767]  __kasan_report.cold+0x1a/0x32
[   19.944173]  ? taprio_dequeue_soft+0x620/0x780
[   19.944612]  kasan_report+0xe/0x20
[   19.944954]  taprio_dequeue_soft+0x620/0x780
[   19.945380]  __qdisc_run+0x164/0x18d0
[   19.945749]  net_tx_action+0x2c4/0x730
[   19.946124]  __do_softirq+0x268/0x7bc
[   19.946491]  irq_exit+0x17d/0x1b0
[   19.946824]  smp_apic_timer_interrupt+0xeb/0x380
[   19.947280]  apic_timer_interrupt+0xf/0x20
[   19.947687]  </IRQ>
[   19.947912] RIP: 0010:default_idle+0x2d/0x2d0
[   19.948345] Code: 00 00 41 56 41 55 65 44 8b 2d 3f 8d 7c 7c 41 54 55 53 0f 1f 44 00 00 e8 b1 b2 c5 fd e9 07 00 3
[   19.950166] RSP: 0018:ffff88811a3efda0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
[   19.950909] RAX: 0000000080000000 RBX: ffff88811a3a9600 RCX: ffffffff8385327e
[   19.951608] RDX: 1ffff110234752c0 RSI: 0000000000000000 RDI: ffffffff8385262f
[   19.952309] RBP: ffffed10234752c0 R08: 0000000000000001 R09: ffffed10234752c1
[   19.953009] R10: ffffed10234752c0 R11: ffff88811a3a9607 R12: 0000000000000001
[   19.953709] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[   19.954408]  ? default_idle_call+0x2e/0x70
[   19.954816]  ? default_idle+0x1f/0x2d0
[   19.955192]  default_idle_call+0x5e/0x70
[   19.955584]  do_idle+0x3d4/0x500
[   19.955909]  ? arch_cpu_idle_exit+0x40/0x40
[   19.956325]  ? _raw_spin_unlock_irqrestore+0x23/0x30
[   19.956829]  ? trace_hardirqs_on+0x30/0x160
[   19.957242]  cpu_startup_entry+0x19/0x20
[   19.957633]  start_secondary+0x2a6/0x380
[   19.958026]  ? set_cpu_sibling_map+0x18b0/0x18b0
[   19.958486]  secondary_startup_64+0xa4/0xb0
[   19.958921]
[   19.959078] Allocated by task 33:
[   19.959412]  save_stack+0x1b/0x80
[   19.959747]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[   19.960222]  kmem_cache_alloc+0xe4/0x230
[   19.960617]  __alloc_skb+0x91/0x510
[   19.960967]  ndisc_alloc_skb+0x133/0x330
[   19.961358]  ndisc_send_ns+0x134/0x810
[   19.961735]  addrconf_dad_work+0xad5/0xf80
[   19.962144]  process_one_work+0x78e/0x13a0
[   19.962551]  worker_thread+0x8f/0xfa0
[   19.962919]  kthread+0x2ba/0x3b0
[   19.963242]  ret_from_fork+0x3a/0x50
[   19.963596]
[   19.963753] Freed by task 33:
[   19.964055]  save_stack+0x1b/0x80
[   19.964386]  __kasan_slab_free+0x12f/0x180
[   19.964830]  kmem_cache_free+0x80/0x290
[   19.965231]  ip6_mc_input+0x38a/0x4d0
[   19.965617]  ipv6_rcv+0x1a4/0x1d0
[   19.965948]  __netif_receive_skb_one_core+0xf2/0x180
[   19.966437]  netif_receive_skb+0x8c/0x3c0
[   19.966846]  br_handle_frame_finish+0x779/0x1310
[   19.967302]  br_handle_frame+0x42a/0x830
[   19.967694]  __netif_receive_skb_core+0xf0e/0x2a90
[   19.968167]  __netif_receive_skb_one_core+0x96/0x180
[   19.968658]  process_backlog+0x198/0x650
[   19.969047]  net_rx_action+0x2fa/0xaa0
[   19.969420]  __do_softirq+0x268/0x7bc
[   19.969785]
[   19.969940] The buggy address belongs to the object at ffff888112862840
[   19.969940]  which belongs to the cache skbuff_head_cache of size 224
[   19.971202] The buggy address is located 140 bytes inside of
[   19.971202]  224-byte region [ffff888112862840, ffff888112862920)
[   19.972344] The buggy address belongs to the page:
[   19.972820] page:ffffea00044a1800 refcount:1 mapcount:0 mapping:ffff88811a2bd1c0 index:0xffff8881128625c0 compo0
[   19.973930] flags: 0x8000000000010200(slab|head)
[   19.974388] raw: 8000000000010200 ffff88811a2ed650 ffff88811a2ed650 ffff88811a2bd1c0
[   19.975151] raw: ffff8881128625c0 0000000000190013 00000001ffffffff 0000000000000000
[   19.975915] page dumped because: kasan: bad access detected
[   19.976461] page_owner tracks the page as allocated
[   19.976946] page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NO)
[   19.978332]  prep_new_page+0x24b/0x330
[   19.978707]  get_page_from_freelist+0x2057/0x2c90
[   19.979170]  __alloc_pages_nodemask+0x218/0x590
[   19.979619]  new_slab+0x9d/0x300
[   19.979948]  ___slab_alloc.constprop.0+0x2f9/0x6f0
[   19.980421]  __slab_alloc.constprop.0+0x30/0x60
[   19.980870]  kmem_cache_alloc+0x201/0x230
[   19.981269]  __alloc_skb+0x91/0x510
[   19.981620]  alloc_skb_with_frags+0x78/0x4a0
[   19.982043]  sock_alloc_send_pskb+0x5eb/0x750
[   19.982476]  unix_stream_sendmsg+0x399/0x7f0
[   19.982904]  sock_sendmsg+0xe2/0x110
[   19.983262]  ____sys_sendmsg+0x4de/0x6d0
[   19.983660]  ___sys_sendmsg+0xe4/0x160
[   19.984032]  __sys_sendmsg+0xab/0x130
[   19.984396]  do_syscall_64+0xe7/0xae0
[   19.984761] page last free stack trace:
[   19.985142]  __free_pages_ok+0x432/0xbc0
[   19.985533]  qlist_free_all+0x56/0xc0
[   19.985907]  quarantine_reduce+0x149/0x170
[   19.986315]  __kasan_kmalloc.constprop.0+0x9e/0xd0
[   19.986791]  kmem_cache_alloc+0xe4/0x230
[   19.987182]  prepare_creds+0x24/0x440
[   19.987548]  do_faccessat+0x80/0x590
[   19.987906]  do_syscall_64+0xe7/0xae0
[   19.988276]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   19.988775]
[   19.988930] Memory state around the buggy address:
[   19.989402]  ffff888112862780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   19.990111]  ffff888112862800: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[   19.990822] >ffff888112862880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   19.991529]                                               ^
[   19.992081]  ffff888112862900: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
[   19.992796]  ffff888112862980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Reported-by: Michael Schmidt <[email protected]>
Signed-off-by: Vinicius Costa Gomes <[email protected]>
Acked-by: Andre Guedes <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'for-linus-5.6-2' of git://github.com/cminyard/linux-ipmi

Pull IPMI fix from Corey Minyard:
"Fix a message spew on some system

  The call to platform_get_irq() was changed to print a log if the
  interrupt was not available, and that was causing bogus messages to
  spew out for the IPMI driver. People have requested that this get in
  to 5.6 so I'm sending it along"

* tag 'for-linus-5.6-2' of git://github.com/cminyard/linux-ipmi:
  ipmi_si: Avoid spurious errors for optional IRQs

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
"Fix a build problem with x86/curve25519"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/curve25519 - support assemblers with no adx support

ovl: fix lock in ovl_llseek()

ovl_inode_lock() is interruptible. When inode_lock() in ovl_llseek()
was replaced with ovl_inode_lock(), we did not add a check for error.

Fix this by making ovl_inode_lock() uninterruptible and change the
existing call sites to use an _interruptible variant.

Reported-by: [email protected]
Fixes: b1f9d3858f72 ("ovl: use ovl_inode_lock in ovl_llseek()")
Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>

perf record: Fix binding of AIO user space buffers to nodes

Correct maxnode parameter value passed to mbind() syscall to be the
amount of node mask bits to analyze plus 1. Dynamically allocate node
mask memory depending on the index of node of cpu being profiled.

Fixes: c44a8b44ca9f ("perf record: Bind the AIO user space buffers to nodes")
Signed-off-by: Alexey Budankov <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
[ Remove leftover nr_bits + 1 comment in mbind() call ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

block: Fix partition support for host aware zoned block devices

Commit b72053072c0b ("block: allow partitions on host aware zone
devices") introduced the helper function disk_has_partitions() to check
if a given disk has valid partitions. However, since this function result
directly depends on the disk partition table length rather than the
actual existence of valid partitions in the table, it returns true even
after all partitions are removed from the disk. For host aware zoned
block devices, this results in zone management support to be kept
disabled even after removing all partitions.

Fix this by changing disk_has_partitions() to walk through the partition
table entries and return true if and only if a valid non-zero size
partition is found.

Fixes: b72053072c0b ("block: allow partitions on host aware zone devices")
Cc: [email protected] # 5.5
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

blk-mq: insert flush request to the front of dispatch queue

commit 01e99aeca397 ("blk-mq: insert passthrough request into
hctx->dispatch directly") may change to add flush request to the tail
of dispatch by applying the 'add_head' parameter of
blk_mq_sched_insert_request.

Turns out this way causes performance regression on NCQ controller because
flush is non-NCQ command, which can't be queued when there is any in-flight
NCQ command. When adding flush rq to the front of hctx->dispatch, it is
easier to introduce extra time to flush rq's latency compared with adding
to the tail of dispatch queue because of S_SCHED_RESTART, then chance of
flush merge is increased, and less flush requests may be issued to
controller.

So always insert flush request to the front of dispatch queue just like
before applying commit 01e99aeca397 ("blk-mq: insert passthrough request
into hctx->dispatch directly").

Cc: Damien Le Moal <[email protected]>
Cc: Shinichiro Kawasaki <[email protected]>
Reported-by: Shinichiro Kawasaki <[email protected]>
Fixes: 01e99aeca397 ("blk-mq: insert passthrough request into hctx->dispatch directly")
Signed-off-by: Ming Lei <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

s390/dasd: fix data corruption for thin provisioned devices

Devices are formatted in multiple of tracks.
For an Extent Space Efficient (ESE) volume we get errors when accessing
unformatted tracks. In this case the driver either formats the track on
the flight for write requests or returns zero data for read requests.

In case a request spans multiple tracks, the indication of an unformatted
track presented for the first track is incorrectly applied to all tracks
covered by the request. As a result, tracks containing data will be handled
as empty, resulting in zero data being returned on read, or overwriting
existing data with zero on write.

Fix by determining the track that gets the NRF error.
For write requests only format the track that is surely not formatted.
For Read requests all tracks before have returned valid data and should not
be touched.
All tracks after the unformatted track might be formatted or not. Those are
returned to the blocklayer to build a new request.

When using alias devices there is a chance that multiple write requests
trigger a format of the same track which might lead to data loss. Ensure
that a track is formatted only once by maintaining a list of currently
processed tracks.

Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes")
Cc: [email protected] # 5.3+
Signed-off-by: Stefan Haberland <[email protected]>
Reviewed-by: Jan Hoeppner <[email protected]>
Reviewed-by: Peter Oberparleiter <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag

Enable the sampling check in kernel/events/core.c::perf_event_open(),
which returns the more appropriate -EOPNOTSUPP.

BEFORE:

  $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (l3_request_g1.caching_l3_cache_accesses).
  /bin/dmesg | grep -i perf may provide additional information.

With nothing relevant in dmesg.

AFTER:

  $ sudo perf record -a -e instructions,l3_request_g1.caching_l3_cache_accesses true
  Error:
  l3_request_g1.caching_l3_cache_accesses: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

Fixes: c43ca5091a37 ("perf/x86/amd: Add support for AMD NB and L2I "uncore" counters")
Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for eMMC sleep command

The busy timeout for the CMD5 to put the eMMC into sleep state, is specific
to the card. Potentially the timeout may exceed the host->max_busy_timeout.
If that becomes the case, mmc_sleep() converts from using an R1B response
to an R1 response, as to prevent the host from doing HW busy detection.

However, it has turned out that some hosts requires an R1B response no
matter what, so let's respect that via checking MMC_CAP_NEED_RSP_BUSY. Note
that, if the R1B gets enforced, the host becomes fully responsible of
managing the needed busy timeout, in one way or the other.

Suggested-by: Sowjanya Komatineni <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>

misc: eeprom: at24: fix regulator underflow

The at24 driver attempts to read a byte from the device to validate that
it's actually present, and if not, disables the vcc regulator and
returns -ENODEV. However, between the read and the error handling path,
pm_runtime_idle() is called and invokes the driver's suspend callback,
which also disables the vcc regulator. This leads to an underflow of the
regulator enable count if the EEPROM is not present.

Move the pm_runtime_suspend() call to be after the error handling path
to resolve this.

Fixes: cd5676db0574 ("misc: eeprom: at24: support pm_runtime control")
Signed-off-by: Michael Auchter <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>

net: mvmdio: avoid error message for optional IRQ

Per the dt-binding the interrupt is optional so use
platform_get_irq_optional() instead of platform_get_irq(). Since
commit 7723f4c5ecdb ("driver core: platform: Add an error message to
platform_get_irq*()") platform_get_irq() produces an error message

orion-mdio f1072004.mdio: IRQ index 0 not found

which is perfectly normal if one hasn't specified the optional property
in the device tree.

Signed-off-by: Chris Packham <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Add missing mask of ATU occupancy register

Only the bottom 12 bits contain the ATU bin occupancy statistics. The
upper bits need masking off.

Fixes: e0c69ca7dfbb ("net: dsa: mv88e6xxx: Add ATU occupancy via devlink resources")
Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: memcg: fix lockdep splat in inet_csk_accept()

Locking newsk while still holding the listener lock triggered
a lockdep splat [1]

We can simply move the memcg code after we release the listener lock,
as this can also help if multiple threads are sharing a common listener.

Also fix a typo while reading socket sk_rmem_alloc.

[1]
WARNING: possible recursive locking detected
5.6.0-rc3-syzkaller #0 Not tainted
--------------------------------------------
syz-executor598/9524 is trying to acquire lock:
ffff88808b5b8b90 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline]
ffff88808b5b8b90 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x69f/0xd30 net/ipv4/inet_connection_sock.c:492

but task is already holding lock:
ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline]
ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x8d/0xd30 net/ipv4/inet_connection_sock.c:445

other info that might help us debug this:
Possible unsafe locking scenario:

       CPU0
       ----
  lock(sk_lock-AF_INET6);
  lock(sk_lock-AF_INET6);

*** DEADLOCK ***

May be due to missing lock nesting notation

1 lock held by syz-executor598/9524:
#0: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: lock_sock include/net/sock.h:1541 [inline]
#0: ffff88808b5b9590 (sk_lock-AF_INET6){+.+.}, at: inet_csk_accept+0x8d/0xd30 net/ipv4/inet_connection_sock.c:445

stack backtrace:
CPU: 0 PID: 9524 Comm: syz-executor598 Not tainted 5.6.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x188/0x20d lib/dump_stack.c:118
print_deadlock_bug kernel/locking/lockdep.c:2370 [inline]
check_deadlock kernel/locking/lockdep.c:2411 [inline]
validate_chain kernel/locking/lockdep.c:2954 [inline]
__lock_acquire.cold+0x114/0x288 kernel/locking/lockdep.c:3954
lock_acquire+0x197/0x420 kernel/locking/lockdep.c:4484
lock_sock_nested+0xc5/0x110 net/core/sock.c:2947
lock_sock include/net/sock.h:1541 [inline]
inet_csk_accept+0x69f/0xd30 net/ipv4/inet_connection_sock.c:492
inet_accept+0xe9/0x7c0 net/ipv4/af_inet.c:734
__sys_accept4_file+0x3ac/0x5b0 net/socket.c:1758
__sys_accept4+0x53/0x90 net/socket.c:1809
__do_sys_accept4 net/socket.c:1821 [inline]
__se_sys_accept4 net/socket.c:1818 [inline]
__x64_sys_accept4+0x93/0xf0 net/socket.c:1818
do_syscall_64+0xf6/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4445c9
Code: e8 0c 0d 03 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffc35b37608 EFLAGS: 00000246 ORIG_RAX: 0000000000000120
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004445c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000306777 R09: 0000000000306777
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00000000004053d0 R14: 0000000000000000 R15: 0000000000000000

Fixes: d752a4986532 ("net: memcg: late association of sock to memcg")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Shakeel Butt <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 's390-qeth-fixes'

Julian Wiedmann says:

====================
s390/qeth: fixes 2020-03-11

please apply the following patch series for qeth to netdev's net tree.

Just one fix to get the RX buffer pool resizing right, with two
preparatory cleanups.
This is on the larger side given where we are in the -rc cycle, but a
big chunk of the delta is just refactoring to make the fix look nice.

I intentionally split these off from yesterday's series. No objections
if you'd rather punt them to net-next, the series should apply cleanly.
====================

Signed-off-by: David S. Miller <[email protected]>

s390/qeth: implement smarter resizing of the RX buffer pool

The RX buffer pool is allocated in qeth_alloc_qdio_queues().
A subsequent pool resizing is then handled in a very simple way:
first free the current pool, then allocate a new pool of the requested
size.

There's two ways where this can go wrong:
1. if the resize action happens _before_ the initial pool was allocated,
   then a subsequent initialization will call qeth_alloc_qdio_queues()
   and fill the pool with a second(!) set of pages. We consume twice the
   planned amount of memory.
   This is easy to fix - just skip the resizing if the queues haven't
   been allocated yet.
2. if the initial pool was created by qeth_alloc_qdio_queues() but a
   subsequent resizing fails, then the device has no(!) RX buffer pool.
   The next initialization will _not_ call qeth_alloc_qdio_queues(), and
   attempting to back the RX buffers with pages in
   qeth_init_qdio_queues() will fail.
   Not very difficult to fix either - instead of re-allocating the whole
   pool, just allocate/free as many entries to match the desired size.

Fixes: 4a71df50047f ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

s390/qeth: refactor buffer pool code

In preparation for a subsequent fix, split out helpers to allocate/free
individual pool entries.

Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

s390/qeth: use page pointers to manage RX buffer pool

The RX buffer elements are always backed with full pages, reflect this
in the pointer type.

Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number

The Internet Assigned Numbers Authority (IANA) has recently assigned
a protocol number value of 143 for Ethernet [1].

Before this assignment, encapsulation mechanisms such as Segment Routing
used the IPv6-NoNxt protocol number (59) to indicate that the encapsulated
payload is an Ethernet frame.

In this patch, we add the definition of the Ethernet protocol number to the
kernel headers and update the SRv6 L2 tunnels to use it.

[1] https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml

Signed-off-by: Paolo Lungaroni <[email protected]>
Reviewed-by: Andrea Mayer <[email protected]>
Acked-by: Ahmed Abdelsalam <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed

By default, DSA drivers should configure CPU and DSA ports to their
maximum speed. In many configurations this is sufficient to make the
link work.

In some cases it is necessary to configure the link to run slower,
e.g. because of limitations of the SoC it is connected to. Or back to
back PHYs are used and the PHY needs to be driven in order to
establish link. In this case, phylink is used.

Only instantiate phylink if it is required. If there is no PHY, or no
fixed link properties, phylink can upset a link which works in the
default configuration.

Fixes: 0e27921816ad ("net: dsa: Use PHYLINK for the CPU/DSA ports")
Signed-off-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/packet: tpacket_rcv: do not increment ring index on drop

In one error case, tpacket_rcv drops packets after incrementing the
ring producer index.

If this happens, it does not update tp_status to TP_STATUS_USER and
thus the reader is stalled for an iteration of the ring, causing out
of order arrival.

The only such error path is when virtio_net_hdr_from_skb fails due
to encountering an unknown GSO type.

Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sxgbe: Fix off by one in samsung driver strncpy size arg

This patch fixes an off-by-one error in strncpy size argument in
drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c. The issue is that in:

strncmp(opt, "eee_timer:", 6)

the passed string literal: "eee_timer:" has 10 bytes (without the NULL
byte) and the passed size argument is 6. As a result, the logic will
also accept other, malformed strings, e.g. "eee_tiXXX:".

This bug doesn't seem to have any security impact since its present in
module's cmdline parsing code.

Signed-off-by: Dominik Czarnota <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: caif: Add lockdep expression to RCU traversal primitive

caifdevs->list is traversed using list_for_each_entry_rcu()
outside an RCU read-side critical section but under the
protection of rtnl_mutex. Hence, add the corresponding lockdep
expression to silence the following false-positive warning:

[   10.868467] =============================
[   10.869082] WARNING: suspicious RCU usage
[   10.869817] 5.6.0-rc1-00177-g06ec0a154aae4 #1 Not tainted
[   10.870804] -----------------------------
[   10.871557] net/caif/caif_dev.c:115 RCU-list traversed in non-reader section!!

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Amol Grover <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

MAINTAINERS: remove Sathya Perla as Emulex NIC maintainer

Remove Sathya Perla, [email protected] is bouncing.
The driver has 3 more maintainers.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fec: validate the new settings in fec_enet_set_coalesce()

fec_enet_set_coalesce() validates the previously set params
and if they are within range proceeds to apply the new ones.
The new ones, however, are not validated. This seems backwards,
probably a copy-paste error?

Compile tested only.

Fixes: d851b47b22fc ("net: fec: add interrupt coalescence feature support")
Signed-off-by: Jakub Kicinski <[email protected]>
Acked-by: Fugang Duan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipmi_si: Avoid spurious errors for optional IRQs

Although the IRQ assignment in ipmi_si driver is optional,
platform_get_irq() spews error messages unnecessarily:
ipmi_si dmi-ipmi-si.0: IRQ index 0 not found

Fix this by switching to platform_get_irq_optional().

Cc: [email protected] # 5.4.x
Cc: John Donnelly <[email protected]>
Fixes: 7723f4c5ecdb ("driver core: platform: Add an error message to platform_get_irq*()")
Reported-and-tested-by: Patrick Vo <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Message-Id: <20200205093146 [email protected]>
Signed-off-by: Corey Minyard <[email protected]>

Merge tag 'exynos-drm-fixes-for-v5.6-rc5-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes

Fix IOMMU initialization failure when Exynos DRM driver is rebound,
and also fix memory leak to iommu mapping object, which was
detected by kmemleak detector.

Signed-off-by: Dave Airlie <[email protected]>
From: Inki Dae <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

drm/i915: Defer semaphore priority bumping to a workqueue

Since the semaphore fence may be signaled from inside an interrupt
handler from inside a request holding its request->lock, we cannot then
enter into the engine->active.lock for processing the semaphore priority
bump as we may traverse our call tree and end up on another held
request.

CPU 0:
[ 2243.218864]  _raw_spin_lock_irqsave+0x9a/0xb0
[ 2243.218867]  i915_schedule_bump_priority+0x49/0x80 [i915]
[ 2243.218869]  semaphore_notify+0x6d/0x98 [i915]
[ 2243.218871]  __i915_sw_fence_complete+0x61/0x420 [i915]
[ 2243.218874]  ? kmem_cache_free+0x211/0x290
[ 2243.218876]  i915_sw_fence_complete+0x58/0x80 [i915]
[ 2243.218879]  dma_i915_sw_fence_wake+0x3e/0x80 [i915]
[ 2243.218881]  signal_irq_work+0x571/0x690 [i915]
[ 2243.218883]  irq_work_run_list+0xd7/0x120
[ 2243.218885]  irq_work_run+0x1d/0x50
[ 2243.218887]  smp_irq_work_interrupt+0x21/0x30
[ 2243.218889]  irq_work_interrupt+0xf/0x20

CPU 1:
[ 2242.173107]  _raw_spin_lock+0x8f/0xa0
[ 2242.173110]  __i915_request_submit+0x64/0x4a0 [i915]
[ 2242.173112]  __execlists_submission_tasklet+0x8ee/0x2120 [i915]
[ 2242.173114]  ? i915_sched_lookup_priolist+0x1e3/0x2b0 [i915]
[ 2242.173117]  execlists_submit_request+0x2e8/0x2f0 [i915]
[ 2242.173119]  submit_notify+0x8f/0xc0 [i915]
[ 2242.173121]  __i915_sw_fence_complete+0x61/0x420 [i915]
[ 2242.173124]  ? _raw_spin_unlock_irqrestore+0x39/0x40
[ 2242.173137]  i915_sw_fence_complete+0x58/0x80 [i915]
[ 2242.173140]  i915_sw_fence_commit+0x16/0x20 [i915]

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1318
Fixes: b7404c7ecb38 ("drm/i915: Bump ready tasks ahead of busywaits")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: <[email protected]> # v5.2+
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 209df10bb4536c81c2540df96c02cd079435357f)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/gt: Close race between cacheline_retire and free

If the cacheline may still be busy, atomically mark it for future
release, and only if we can determine that it will never be used again,
immediately free it.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1392
Fixes: ebece7539242 ("drm/i915: Keep timeline HWSP allocated until idle across the system")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Cc: Matthew Auld <[email protected]>
Reviewed-by: Mika Kuoppala <[email protected]>
Cc: <[email protected]> # v5.2+
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 2d4bd971f5baa51418625f379a69f5d58b5a0450)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/execlists: Enable timeslice on partial virtual engine dequeue

If we stop filling the ELSP due to an incompatible virtual engine
request, check if we should enable the timeslice on behalf of the queue.

This fixes the case where we are inspecting the last->next element when
we know that the last element is the last request in the execution queue,
and so decided we did not need to enable timeslicing despite the intent
to do so!

Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: <[email protected]> # v5.4+
Reviewed-by: Mika Kuoppala <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 3df2deed411e0f1b7312baf0139aab8bba4c0410)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: be more solid in checking the alignment

The alignment is u64, and yet is_power_of_2() assumes unsigned long,
which might give different results between 32b and 64b kernel.

Signed-off-by: Matthew Auld <[email protected]>
Cc: Chris Wilson <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Cc: [email protected]
(cherry picked from commit 2920516b2f719546f55079bc39a7fe409d9e80ab)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915/gvt: Fix dma-buf display blur issue on CFL

Commit c3b5a8430daad ("drm/i915/gvt: Enable gfx virtualiztion for CFL")
added the support on CFL. The vgpu emulation hotplug support on CFL was
supposed to be included in that patch. Without the vgpu emulation
hotplug support, the dma-buf based display gives us a blur face.

So fix this issue by adding the vgpu emulation hotplug support on CFL.

Fixes: c3b5a8430daad ("drm/i915/gvt: Enable gfx virtualiztion for CFL")
Signed-off-by: Tina Zhang <[email protected]>
Acked-by: Zhenyu Wang <[email protected]>
Signed-off-by: Zhenyu Wang <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 135dde8853c7e00f6002e710f7e4787ed8585c0e)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Return early for await_start on same timeline

Requests within a timeline are ordered by that timeline, so awaiting for
the start of a request within the timeline is a no-op. This used to work
by falling out of the mutex_trylock() as the signaler and waiter had the
same timeline and not returning an error.

Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: <[email protected]> # v5.5+
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit ab7a69020fb5d5c7ba19fba60f62fd6f9ca9f779)
Signed-off-by: Jani Nikula <[email protected]>

drm/i915: Actually emit the await_start

Fix the inverted test to emit the wait on the end of the previous
request if we /haven't/ already.

Fixes: 6a79d848403d ("drm/i915: Lock signaler timeline while navigating")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: <[email protected]> # v5.5+
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 07e9c59d63df6a1c44c1975c01827ba18b69270a)
Signed-off-by: Jani Nikula <[email protected]>

dpaa_eth: Remove unnecessary boolean expression in dpaa_get_headroom

Clang warns:

drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:2860:9: warning:
converting the result of '?:' with integer constants to a boolean always
evaluates to 'true' [-Wtautological-constant-compare]
        return DPAA_FD_DATA_ALIGNMENT ? ALIGN(headroom,
               ^
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:131:34: note: expanded
from macro 'DPAA_FD_DATA_ALIGNMENT'
\#define DPAA_FD_DATA_ALIGNMENT  (fman_has_errata_a050385() ? 64 : 16)
                                 ^
1 warning generated.

This was exposed by commit 3c68b8fffb48 ("dpaa_eth: FMan erratum A050385
workaround") even though it appears to have been an issue since the
introductory commit 9ad1a3749333 ("dpaa_eth: add support for DPAA
Ethernet") since DPAA_FD_DATA_ALIGNMENT has never been able to be zero.

Just replace the whole boolean expression with the true branch, as it is
always been true.

Link: https://github.com/ClangBuiltLinux/linux/issues/928
Signed-off-by: Nathan Chancellor <[email protected]>
Reviewed-by: Madalin Bucur <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt

Pull fscrypt fix from Eric Biggers:
"Fix a bug where if userspace is writing to encrypted files while the
  FS_IOC_REMOVE_ENCRYPTION_KEY ioctl (introduced in v5.4) is running,
  dirty inodes could be evicted, causing writes could be lost or the
  filesystem to hang due to a use-after-free. This was encountered
  during real-world use, not just theoretical.

  Tested with the existing fscrypt xfstests, and with a new xfstest I
  wrote to reproduce this bug. This fix does expose an existing bug with
  '-o lazytime' that Ted is working on fixing, but this fix is more
  critical and needed anyway regardless of the lazytime fix"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
  fscrypt: don't evict dirty inodes after removing key

Merge tag 'mac80211-for-net-2020-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A couple of fixes:
* three netlink validation fixes
* a mesh path selection fix
====================

Signed-off-by: David S. Miller <[email protected]>

ARC: define __ALIGN_STR and __ALIGN symbols for ARC

The default defintions use fill pattern 0x90 for padding which for ARC
generates unintended "ldh_s r12,[r0,0x20]" corresponding to opcode 0x9090

So use ".align 4" which insert a "nop_s" instruction instead.

Cc: [email protected]
Acked-by: Vineet Gupta <[email protected]>
Signed-off-by: Eugeniy Paltsev <[email protected]>
Signed-off-by: Vineet Gupta <[email protected]>

ARC: show_regs: reduce lines of output

Before
------

| CPU: 1 PID: 29061 Comm: tst-dynarray-at Not tainted 5.6.0-rc1-00002-g941fcc018ca6-dirty #12
|
| [ECR   ]: 0x00090000 =>
| [EFA   ]: 0x00000000
| [ERET  ]: 0x2004aa6c
|     @off 0x2aa6c in [/lib/libc-2.31.9000.so]
      VMA: 0x20020000 to 0x20122000
| [STAT32]: 0x80080a82 [IE U     ]
| BTA: 0x2004aa18 SP: 0x5ffff8a8  FP: 0x5ffff8fc
| LPS: 0x2008788e LPE: 0x20087896 LPC: 0x00000000
| r00: 0x00000000 r01: 0x5ffff8a8 r02: 0x00000000
| r03: 0x00000008 r04: 0xffffffff r05: 0x00000000
| r06: 0x00000000 r07: 0x00000000 r08: 0x00000087
| r09: 0x00000000 r10: 0x2010691c r11: 0x00000020
| r12: 0x2003b214 r13: 0x5ffff8a8 r14: 0x20126e68
| r15: 0x2001f26c r16: 0x2012a000 r17: 0x00000001
| r18: 0x5ffff8fc r19: 0x00000000 r20: 0x5ffff948
| r21: 0x00000001 r22: 0xffffffff r23: 0x5fffff8c
| r24: 0x4008c2a8 r25: 0x2001f6e0

After
-----

| CPU: 1 PID: 29061 Comm: tst-dynarray-at Not tainted 5.6.0-rc1-00002-g941fcc018ca6-dirty #12
|   @off 0x2aa6c in [/lib/libc-2.31.9000.so]  VMA: 0x20020000 to 0x20122000
| ECR: 0x00090000 EFA: 0x00000000 ERET: 0x2004aa6c
| STAT32: 0x80080a82 [IE U     ]  BTA: 0x2004aa18
| BLK: 0x2003b214  SP: 0x5ffff8a8  FP: 0x5ffff8fc
| LPS: 0x2008788e LPE: 0x20087896 LPC: 0x00000000
| r00: 0x00000000 r01: 0x5ffff8a8 r02: 0x00000000
| r03: 0x00000008 r04: 0xffffffff r05: 0x00000000
| r06: 0x00000000 r07: 0x00000000 r08: 0x00000087
| r09: 0x00000000 r10: 0x2010691c r11: 0x00000020
| r12: 0x2003b214 r13: 0x5ffff8a8 r14: 0x20126e68
| r15: 0x2001f26c r16: 0x2012a000 r17: 0x00000001
| r18: 0x5ffff8fc r19: 0x00000000 r20: 0x5ffff948
| r21: 0x00000001 r22: 0xffffffff r23: 0x5fffff8c
| r24: 0x4008c2a8 r25: 0x2001f6e0 BTA: 0x2004aa18

Signed-off-by: Vineet Gupta <[email protected]>

Merge tag 'for-linus-2020-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread fix from Christian Brauner:
"This contains a single fix for a regression which was introduced when
  we introduced the ability to select a specific pid at process creation
  time.

  When this feature is requested, the error value will be set to -EPERM
  after exiting the pid allocation loop. This caused EPERM to be
  returned when e.g. the init process/child subreaper of the pid
  namespace has already died where we used to return ENOMEM before.

  The first patch here simply fixes the regression by unconditionally
  setting the return value back to ENOMEM again once we've successfully
  allocated the requested pid number. This should be easy to backport to
  v5.5.

  The second patch adds a comment explaining that we must keep returning
  ENOMEM since we've been doing it for a long time and have explicitly
  documented this behavior for userspace. This seemed worthwhile because
  we now have at least two separate example where people tried to change
  the return value to something other than ENOMEM (The first version of
  the regression fix did that too and the commit message links to an
  earlier patch that tried to do the same.).

  I have a simple regression test to make sure we catch this regression
  in the future but since that introduces a whole new selftest subdir
  and test files I'll keep this for v5.7"

* tag 'for-linus-2020-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  pid: make ENOMEM return value more obvious
  pid: Fix error return value in some cases

Merge tag 'trace-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fix from Steven Rostedt:
"Have ftrace lookup_rec() return a consistent record otherwise it can
break live patching"

* tag 'trace-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Return the first found result in lookup_rec()

Merge tag 'mips_fixes_5.6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:
"A few MIPS fixes:

   - DT fixes for CI20

   - Fix command line handling

   - Correct patchwork URL"

* tag 'mips_fixes_5.6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MAINTAINERS: Correct MIPS patchwork URL
  MIPS: DTS: CI20: fix interrupt for pcf8563 RTC
  MIPS: DTS: CI20: fix PMU definitions for ACT8600
  MIPS: Fix CONFIG_MIPS_CMDLINE_DTB_EXTEND handling

Merge tag 'pinctrl-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
"Some pin control fixes for the v5.6 series.

  It comes down to memory leaks in the core and driver fixes. Some
  should have been sent earlier but they kept piling up and the world is
  just so full of distractions these days.

   - Fix some inverted pins in the Meson GLX driver.

   - Align the i.MX SC message structs causing warnings from KASan.

   - Balance the kref in pinctrl hogs so they are actually free:d when
     removing a pin control module. We haven't seen it before as people
     don't use modules for pin control that much, I think.

   - Add a missing call to pinctrl_unregister_mappings() another memory
     leak when using modules.

   - Fix the fwspec parsing in the Qualcomm driver.

   - Fix a syntax error in the Falcon driver.

   - Assign .irq_eoi conditionally in the Qualcomm driver, fixing a bug
     affecting elder Qualcomm platforms"

* tag 'pinctrl-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: qcom: Assign irq_eoi conditionally
  pinctrl: falcon: fix syntax error
  pinctrl: qcom: ssbi-gpio: Fix fwspec parsing bug
  pinctrl: madera: Add missing call to pinctrl_unregister_mappings
  pinctrl: core: Remove extra kref_get which blocks hogs being freed
  pinctrl: imx: scu: Align imx sc msg structs to 4
  pinctrl: meson-gxl: fix GPIOX sdio pins

driver code: clarify and fix platform device DMA mask allocation

This does three inter-related things to clarify the usage of the
platform device dma_mask field. In the process, fix the bug introduced
by cdfee5623290 ("driver core: initialize a default DMA mask for
platform device") that caused Artem Tashkinov's laptop to not boot with
newer Fedora kernels.

This does:

- First off, rename the field to "platform_dma_mask" to make it
   greppable.

   We have way too many different random fields called "dma_mask" in
   various data structures, where some of them are actual masks, and
   some of them are just pointers to the mask. And the structures all
   have pointers to each other, or embed each other inside themselves,
   and "pdev" sometimes means "platform device" and sometimes it means
   "PCI device".

   So to make it clear in the code when you actually use this new field,
   give it a unique name (it really should be something even more unique
   like "platform_device_dma_mask", since it's per platform device, not
   per platform, but that gets old really fast, and this is unique
   enough in context).

   To further clarify when the field gets used, initialize it when we
   actually start using it with the default value.

- Then, use this field instead of the random one-off allocation in
   platform_device_register_full() that is now unnecessary since we now
   already have a perfectly fine allocation for it in the platform
   device structure.

- The above then allows us to fix the actual bug, where the error path
   of platform_device_register_full() would unconditionally free the
   platform device DMA allocation with 'kfree()'.

   That kfree() was dont regardless of whether the allocation had been
   done earlier with the (now removed) kmalloc, or whether
   setup_pdev_dma_masks() had already been used and the dma_mask pointer
   pointed to the mask that was part of the platform device.

It seems most people never triggered the error path, or only triggered
it from a call chain that set an explicit pdevinfo->dma_mask value (and
thus caused the unnecessary allocation that was "cleaned up" in the
error path) before calling platform_device_register_full().

Robin Murphy points out that in Artem's case the wdat_wdt driver failed
in platform_device_add(), and that was the one that had called
platform_device_register_full() with pdevinfo.dma_mask = 0, and would
have caused that kfree() of pdev.dma_mask corrupting the heap.

A later unrelated kmalloc() then oopsed due to the heap corruption.

Fixes: cdfee5623290 ("driver core: initialize a default DMA mask for platform device")
Reported-bisected-and-tested-by: Artem S. Tashkinov <[email protected]>
Reviewed-by: Robin Murphy <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mmc: sdhci-tegra: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY

It has turned out that the sdhci-tegra controller requires the R1B response,
for commands that has this response associated with them. So, converting
from an R1B to an R1 response for a CMD6 for example, leads to problems
with the HW busy detection support.

Fix this by informing the mmc core about the requirement, via setting the
host cap, MMC_CAP_NEED_RSP_BUSY.

Reported-by: Bitan Biswas <[email protected]>
Reported-by: Peter Geis <[email protected]>
Suggested-by: Sowjanya Komatineni <[email protected]>
Cc: <[email protected]>
Tested-by: Sowjanya Komatineni <[email protected]>
Tested-By: Peter Geis <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>

mmc: sdhci-omap: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY

It has turned out that the sdhci-omap controller requires the R1B response,
for commands that has this response associated with them. So, converting
from an R1B to an R1 response for a CMD6 for example, leads to problems
with the HW busy detection support.

Fix this by informing the mmc core about the requirement, via setting the
host cap, MMC_CAP_NEED_RSP_BUSY.

Reported-by: Naresh Kamboju <[email protected]>
Reported-by: Anders Roxell <[email protected]>
Reported-by: Faiz Abbas <[email protected]>
Cc: <[email protected]>
Tested-by: Anders Roxell <[email protected]>
Tested-by: Faiz Abbas <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>

mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for erase/trim/discard

The busy timeout that is computed for each erase/trim/discard operation,
can become quite long and may thus exceed the host->max_busy_timeout. If
that becomes the case, mmc_do_erase() converts from using an R1B response
to an R1 response, as to prevent the host from doing HW busy detection.

However, it has turned out that some hosts requires an R1B response no
matter what, so let's respect that via checking MMC_CAP_NEED_RSP_BUSY. Note
that, if the R1B gets enforced, the host becomes fully responsible of
managing the needed busy timeout, in one way or the other.

Suggested-by: Sowjanya Komatineni <[email protected]>
Cc: <[email protected]>
Tested-by: Anders Roxell <[email protected]>
Tested-by: Sowjanya Komatineni <[email protected]>
Tested-by: Faiz Abbas <[email protected]>
Tested-By: Peter Geis <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>

mmc: core: Allow host controllers to require R1B for CMD6

It has turned out that some host controllers can't use R1B for CMD6 and
other commands that have R1B associated with them. Therefore invent a new
host cap, MMC_CAP_NEED_RSP_BUSY to let them specify this.

In __mmc_switch(), let's check the flag and use it to prevent R1B responses
from being converted into R1. Note that, this also means that the host are
on its own, when it comes to manage the busy timeout.

Suggested-by: Sowjanya Komatineni <[email protected]>
Cc: <[email protected]>
Tested-by: Anders Roxell <[email protected]>
Tested-by: Sowjanya Komatineni <[email protected]>
Tested-by: Faiz Abbas <[email protected]>
Tested-By: Peter Geis <[email protected]>
Signed-off-by: Ulf Hansson <[email protected]>

x86/ioremap: Map EFI runtime services data as encrypted for SEV

The dmidecode program fails to properly decode the SMBIOS data supplied
by OVMF/UEFI when running in an SEV guest. The SMBIOS area, under SEV, is
encrypted and resides in reserved memory that is marked as EFI runtime
services data.

As a result, when memremap() is attempted for the SMBIOS data, it
can't be mapped as regular RAM (through try_ram_remap()) and, since
the address isn't part of the iomem resources list, it isn't mapped
encrypted through the fallback ioremap().

Add a new __ioremap_check_other() to deal with memory types like
EFI_RUNTIME_SERVICES_DATA which are not covered by the resource ranges.

This allows any runtime services data which has been created encrypted,
to be mapped encrypted too.

[ bp: Move functionality to a separate function. ]

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Joerg Roedel <[email protected]>
Tested-by: Joerg Roedel <[email protected]>
Cc: <[email protected]> # 5.3
Link: https://lkml.kernel.org/r/2d9e16eb5b53dc82665c95c6764b7407719df7a0.1582645327.git.thomas.lendacky@amd.com

ftrace: Return the first found result in lookup_rec()

It appears that ip ranges can overlap so. In that case lookup_rec()
returns whatever results it got last even if it found nothing in last
searched page.

This breaks an obscure livepatch late module patching usecase:
  - load livepatch
  - load the patched module
  - unload livepatch
  - try to load livepatch again

To fix this return from lookup_rec() as soon as it found the record
containing searched-for ip. This used to be this way prior lookup_rec()
introduction.

Link: http://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 7e16f581a817 ("ftrace: Separate out functionality from ftrace_location_range()")
Signed-off-by: Artem Savkov <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>

perf scripting perl: Add common_callchain to fix argument order

Since common_callchain has been added to the argument array, we need to
reflect it in perl-based scripts, because otherwise the following args
would be shifted and thus incorrect. E.g. rw-by-pid and calculation of
read and written bytes:

Before:

  read counts by pid:
     pid                  comm     # reads  bytes_requested  bytes_read
  ------  --------------------  -----------  ----------  ----------
   19301  dd                             4  424510450039736           0

After:

  read counts by pid:
     pid                  comm     # reads  bytes_requested  bytes_read
  ------  --------------------  -----------  ----------  ----------
   19301  dd                             4        9536             4341

Committer testing:

To see before after first do:

  # perf script record rw-by-pid
  ^C

Now you'll have a perf.data file to report on, then do before and after
using:

  # perf script report rw-by-pid

Anbd notice the bytes_request/bytes_read, as above.

Signed-off-by: Michael Petlan <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Benjamin Salon <[email protected]>
Cc: Jiri Olsa <[email protected]>
LPU-Reference: 20200311132836 [email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf intel-pt: Update intel-pt.txt file with new location of the documentation

Make it easy for people looking in intel-pt.txt to find the new file.

Signed-off-by: Adrian Hunter <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf intel-pt: Add Intel PT man page references

Add references to Intel PT man page in man pages of associated tools.

Signed-off-by: Adrian Hunter <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf intel-pt: Rename intel-pt.txt and put it in man page format

Make the Intel PT documentation into a man page.

Signed-off-by: Adrian Hunter <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf doc: Set man page date to last git commit

Currently the man page dates reflect the date the man pages were built.
This patch adjusts the date so that the date is when then man page
last had a commit against it. The date is generated using 'git log'.

Committer testing:

  $ git log -1 --pretty="format:%cd" --date=short tools/perf/Documentation/perf-top.txt
  2020-01-14

Before:

  rm -rf /tmp/build/perf
  mkdir -p /tmp/build/perf
  make -C tools/perf O=/tmp/build/perf/ install
  $ date
  Wed 11 Mar 2020 10:21:19 AM -03
  $ man perf-top | tail -1
  perf                    03/11/2020           PERF-TOP(1)
  $

After:

  rm -rf /tmp/build/perf
  mkdir -p /tmp/build/perf
  make -C tools/perf O=/tmp/build/perf/ install
  $ date
  $ date
  Wed 11 Mar 2020 10:24:06 AM -03
  $ man perf-top | tail -1
  perf                    2020-01-14           PERF-TOP(1)
  $

Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masanari Iida <[email protected]>
Cc: Mukesh Ojha <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Fix unsigned variable comparison to zero

The variable 'offset' in function cs_etm__sample() is u64 type, it's not
appropriate to check it with 'while (offset > 0)'; this patch changes to
'while (offset)'.

Signed-off-by: Leo Yan <[email protected]>
Reviewed-by: Mathieu Poirier <[email protected]>
Reviewed-by: Mike Leach <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Walker <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: coresight ml <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Optimize copying last branches

If an instruction range packet can generate multiple instruction
samples, these samples share the same last branches; it's not necessary
to copy the same last branches repeatedly for these samples within the
same packet.

This patch moves out the last branches copying from function
cs_etm__synth_instruction_sample(), and execute it prior to generating
instruction samples.

Signed-off-by: Leo Yan <[email protected]>
Reviewed-by: Mathieu Poirier <[email protected]>
Reviewed-by: Mike Leach <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Walker <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: coresight ml <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Correct synthesizing instruction samples

When 'etm->instructions_sample_period' is less than
'tidq->period_instructions', the function cs_etm__sample() cannot handle
this case properly with its logic.

Let's see below flow as an example:

- If we set itrace option '--itrace=i4', then function cs_etm__sample()
  has variables with initialized values:

  tidq->period_instructions = 0
  etm->instructions_sample_period = 4

- When the first packet is coming:

  packet->instr_count = 10; the number of instructions executed in this
  packet is 10, thus update period_instructions as below:

  tidq->period_instructions = 0 + 10 = 10
  instrs_over = 10 - 4 = 6
  offset = 10 - 6 - 1 = 3
  tidq->period_instructions = instrs_over = 6

- When the second packet is coming:

  packet->instr_count = 10; in the second pass, assume 10 instructions
  in the trace sample again:

  tidq->period_instructions = 6 + 10 = 16
  instrs_over = 16 - 4 = 12
  offset = 10 - 12 - 1 = -3  -> the negative value
  tidq->period_instructions = instrs_over = 12

So after handle these two packets, there have below issues:

The first issue is that cs_etm__instr_addr() returns the address within
the current trace sample of the instruction related to offset, so the
offset is supposed to be always unsigned value.  But in fact, function
cs_etm__sample() might calculate a negative offset value (in handling
the second packet, the offset is -3) and pass to cs_etm__instr_addr()
with u64 type with a big positive integer.

The second issue is it only synthesizes 2 samples for sample period = 4.
In theory, every packet has 10 instructions so the two packets have
total 20 instructions, 20 instructions should generate 5 samples
(4 x 5 = 20).  This is because cs_etm__sample() only calls once
cs_etm__synth_instruction_sample() to generate instruction sample per
range packet.

This patch fixes the logic in function cs_etm__sample(); the basic
idea for handling coming packet is:

- To synthesize the first instruction sample, it combines the left
  instructions from the previous packet and the head of the new
  packet; then generate continuous samples with sample period;
- At the tail of the new packet, if it has the rest instructions,
  these instructions will be left for the sequential sample.

Suggested-by: Mike Leach <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
Reviewed-by: Mathieu Poirier <[email protected]>
Reviewed-by: Mike Leach <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Walker <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: coresight ml <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Continuously record last branch

Every time synthesize instruction sample, the last branch recording will
be reset. This is fine if the instruction period is big enough, for
example if use the option '--itrace=i100000', the last branch array is
reset for every sample with 100000 instructions per period; before
generate the next instruction sample, there has the sufficient packets
coming to fill the last branch array.

On the other hand, if set a very small period, the packets will be
significantly reduced between two continuous instruction samples, thus
the last branch array is almost empty for new instruction sample by
frequently resetting.

To allow the last branches to work properly for any instruction periods,
this patch avoids to reset the last branch for every instruction sample
and only reset it when flush the trace data. The last branches will be
reset only for two cases, one is for trace starting, another case is for
discontinuous trace; other cases can keep recording last branches for
continuous instruction samples.

Signed-off-by: Leo Yan <[email protected]>
Reviewed-by: Mathieu Poirier <[email protected]>
Reviewed-by: Mike Leach <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Walker <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: coresight ml <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf cs-etm: Swap packets for instruction samples

If use option '--itrace=iNNN' with Arm CoreSight trace data, perf tool
fails inject instruction samples; the root cause is the packets are only
swapped for branch samples and last branches but not for instruction
samples, so the new coming packets cannot be properly handled for only
synthesizing instruction samples.

To fix this issue, this patch refactors the code with a new function
cs_etm__packet_swap() which is used to swap packets and adds the
condition for instruction samples.

Signed-off-by: Leo Yan <[email protected]>
Reviewed-by: Mathieu Poirier <[email protected]>
Reviewed-by: Mike Leach <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Walker <[email protected]>
Cc: Suzuki Poulouse <[email protected]>
Cc: coresight ml <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf map: Use strstarts() to look for Android libraries

And add the '/' to avoid looking at things like "/system/libsomething",
when all we want to know if it is like "/system/lib/something", i.e. if
it is in that system library dir.

Using strstarts() avoids off-by-one errors like recently fixed in this
file.

Since this adds the '/' I separated this patch, another patch will make
this consistent by removing other strncmp(str, prefix, manually
calculated prefix length) usage.

Reported-by: Dominik Czarnota <[email protected]>
Acked-by: Dominik Czarnota <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: http://lore.kernel.org/lkml/CABEVAa0_q-uC0vrrqpkqRHy_9RLOSXOJxizMLm1n5faHRy2AeA@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf map: Fix off by one in strncpy() size argument

This patch fixes an off-by-one error in strncpy size argument in
tools/perf/util/map.c. The issue is that in:

strncmp(filename, "/system/lib/", 11)

the passed string literal: "/system/lib/" has 12 bytes (without the NULL
byte) and the passed size argument is 11. As a result, the logic won't
match the ending "/" byte and will pass filepaths that are stored in
other directories e.g. "/system/libmalicious/bin" or just
"/system/libmalicious".

This functionality seems to be present only on Android. I assume the
/system/ directory is only writable by the root user, so I don't think
this bug has much (or any) security impact.

Fixes: eca818369996 ("perf tools: Add automatic remapping of Android libraries")
Signed-off-by: disconnect3d <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Changbin Du <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Keeping <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Lentine <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

HID: hid-sensor-custom: Use scnprintf() for avoiding potential buffer overflow

Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().

Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>

HID: hid-picolcd_fb: Use scnprintf() for avoiding potential buffer overflow

Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().

Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>

mac80211: Do not send mesh HWMP PREQ if HWMP is disabled

When trying to transmit to an unknown destination, the mesh code would
unconditionally transmit a HWMP PREQ even if HWMP is not the current
path selection algorithm.

Signed-off-by: Nicolas Cavallari <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

nl80211: add missing attribute validation for channel switch

Add missing attribute validation for NL80211_ATTR_OPER_CLASS
to the netlink policy.

Fixes: 1057d35ede5d ("cfg80211: introduce TDLS channel switch commands")
Signed-off-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

nl80211: add missing attribute validation for beacon report scanning

Add missing attribute validation for beacon report scanning
to the netlink policy.

Fixes: 1d76250bd34a ("nl80211: support beacon report scanning")
Signed-off-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

nl80211: add missing attribute validation for critical protocol indication

Add missing attribute validation for critical protocol fields
to the netlink policy.

Fixes: 5de17984898c ("cfg80211: introduce critical protocol indication from user-space")
Signed-off-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>

KVM: s390: Also reset registers in sync regs for initial cpu reset

When we do the initial CPU reset we must not only clear the registers
in the internal data structures but also in kvm_run sync_regs. For
modern userspace sync_regs is the only place that it looks at.

Fixes: 7de3f1423ff9 ("KVM: s390: Add new reset vcpu API")
Acked-by: David Hildenbrand <[email protected]>
Signed-off-by: Christian Borntraeger <[email protected]>

scsi: ipr: Fix softlockup when rescanning devices in petitboot

When trying to rescan disks in petitboot shell, we hit the following
softlockup stacktrace:

Kernel panic - not syncing: System is deadlocked on memory
[  241.223394] CPU: 32 PID: 693 Comm: sh Not tainted 5.4.16-openpower1 #1
[  241.223406] Call Trace:
[  241.223415] [c0000003f07c3180] [c000000000493fc4] dump_stack+0xa4/0xd8 (unreliable)
[  241.223432] [c0000003f07c31c0] [c00000000007d4ac] panic+0x148/0x3cc
[  241.223446] [c0000003f07c3260] [c000000000114b10] out_of_memory+0x468/0x4c4
[  241.223461] [c0000003f07c3300] [c0000000001472b0] __alloc_pages_slowpath+0x594/0x6d8
[  241.223476] [c0000003f07c3420] [c00000000014757c] __alloc_pages_nodemask+0x188/0x1a4
[  241.223492] [c0000003f07c34a0] [c000000000153e10] alloc_pages_current+0xcc/0xd8
[  241.223508] [c0000003f07c34e0] [c0000000001577ac] alloc_slab_page+0x30/0x98
[  241.223524] [c0000003f07c3520] [c0000000001597fc] new_slab+0x138/0x40c
[  241.223538] [c0000003f07c35f0] [c00000000015b204] ___slab_alloc+0x1e4/0x404
[  241.223552] [c0000003f07c36c0] [c00000000015b450] __slab_alloc+0x2c/0x48
[  241.223566] [c0000003f07c36f0] [c00000000015b754] kmem_cache_alloc_node+0x9c/0x1b4
[  241.223582] [c0000003f07c3760] [c000000000218c48] blk_alloc_queue_node+0x34/0x270
[  241.223599] [c0000003f07c37b0] [c000000000226574] blk_mq_init_queue+0x2c/0x78
[  241.223615] [c0000003f07c37e0] [c0000000002ff710] scsi_mq_alloc_queue+0x28/0x70
[  241.223631] [c0000003f07c3810] [c0000000003005b8] scsi_alloc_sdev+0x184/0x264
[  241.223647] [c0000003f07c38a0] [c000000000300ba0] scsi_probe_and_add_lun+0x288/0xa3c
[  241.223663] [c0000003f07c3a00] [c000000000301768] __scsi_scan_target+0xcc/0x478
[  241.223679] [c0000003f07c3b20] [c000000000301c64] scsi_scan_channel.part.9+0x74/0x7c
[  241.223696] [c0000003f07c3b70] [c000000000301df4] scsi_scan_host_selected+0xe0/0x158
[  241.223712] [c0000003f07c3bd0] [c000000000303f04] store_scan+0x104/0x114
[  241.223727] [c0000003f07c3cb0] [c0000000002d5ac4] dev_attr_store+0x30/0x4c
[  241.223741] [c0000003f07c3cd0] [c0000000001dbc34] sysfs_kf_write+0x64/0x78
[  241.223756] [c0000003f07c3cf0] [c0000000001da858] kernfs_fop_write+0x170/0x1b8
[  241.223773] [c0000003f07c3d40] [c0000000001621fc] __vfs_write+0x34/0x60
[  241.223787] [c0000003f07c3d60] [c000000000163c2c] vfs_write+0xa8/0xcc
[  241.223802] [c0000003f07c3db0] [c000000000163df4] ksys_write+0x70/0xbc
[  241.223816] [c0000003f07c3e20] [c00000000000b40c] system_call+0x5c/0x68

As a part of the scan process Linux will allocate and configure a
scsi_device for each target to be scanned. If the device is not present,
then the scsi_device is torn down. As a part of scsi_device teardown a
workqueue item will be scheduled and the lockups we see are because there
are 250k workqueue items to be processed.  Accoding to the specification of
SIS-64 sas controller, max_channel should be decreased on SIS-64 adapters
to 4.

The patch fixes softlockup issue.

Thanks for Oliver Halloran's help with debugging and explanation!

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wen Xiong <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

Merge branch 's390-qeth-fixes'

Julian Wiedmann says:

====================
s390/qeth: fixes 2020-03-10

This fixes three minor issues:
1) a setup parameter gets cleared unnecessarily when the HW config
changes,
2) insufficient error handling when initially filling the RX ring, and
3) a rarely used worker that needs to be cancelled during tear down.
====================

Signed-off-by: David S. Miller <[email protected]>

s390/qeth: cancel RX reclaim work earlier

When qeth's napi poll code fails to refill an entirely empty RX ring, it
kicks off buffer_reclaim_work to try again later.

Make sure that this worker is cancelled when setting the qeth device
offline. Otherwise a RX refill action can unexpectedly end up running
concurrently to bigger re-configurations (eg. resizing the buffer pool),
without any locking.

Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

s390/qeth: handle error when backing RX buffer

qeth_init_qdio_queues() fills the RX ring with an initial set of
RX buffers. If qeth_init_input_buffer() fails to back one of the RX
buffers with memory, we need to bail out and report the error.

Fixes: 4a71df50047f ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

s390/qeth: don't reset default_out_queue

When an OSA device in prio-queue setup is reduced to 1 TX queue due to
HW restrictions, we reset its the default_out_queue to 0.

In the old code this was needed so that qeth_get_priority_queue() gets
the queue selection right. But with proper multiqueue support we already
reduced dev->real_num_tx_queues to 1, and so the stack puts all traffic
on txq 0 without even calling .ndo_select_queue.

Thus we can preserve the user's configuration, and apply it if the OSA
device later re-gains support for multiple TX queues.

Fixes: 73dc2daf110f ("s390/qeth: add TX multiqueue support for OSA devices")
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

scsi: ufs: Fix possible unclocked access to auto hibern8 timer register

Before access auto hibner8 timer register, make sure power and clock are
properly configured to avoid unclocked register access.

Link: https://lore.kernel.org/r/[email protected]
Fixes: ba7af5ec5126 ("scsi: ufs: export ufshcd_auto_hibern8_update for vendor usage")
Reviewed-by: Stanley Chu <[email protected]>
Signed-off-by: Can Guo <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

Merge branch 'MACSec-bugfixes-related-to-MAC-address-change'

Igor Russkikh says:

====================
MACSec bugfixes related to MAC address change

We found out that there's an issue in MACSec code when the MAC address
is changed.
Both s/w and offloaded implementations don't update SCI when the MAC
address changes at the moment, but they should do so, because SCI contains
MAC in its first 6 octets.
====================

Signed-off-by: David S. Miller <[email protected]>

net: macsec: invoke mdo_upd_secy callback when mac address changed

Notify the offload engine about MAC address change to reconfigure it
accordingly.

Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure")
Signed-off-by: Dmitry Bogdanov <[email protected]>
Signed-off-by: Mark Starovoytov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: macsec: update SCI upon MAC address change.

SCI should be updated, because it contains MAC in its first 6 octets.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Dmitry Bogdanov <[email protected]>
Signed-off-by: Mark Starovoytov <[email protected]>
Signed-off-by: Igor Russkikh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: Do not process device remove during device reset

The ibmvnic driver does not check the device state when the device
is removed. If the device is removed while a device reset is being
processed, the remove may free structures needed by the reset,
causing an oops.

Fix this by checking the device state before processing device remove.

Signed-off-by: Juliet Kim <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/smc: cancel event worker during device removal

During IB device removal, cancel the event worker before the device
structure is freed.

Fixes: a4cf0443c414 ("smc: introduce SMC as an IB-client")
Reported-by: [email protected]
Signed-off-by: Karsten Graul <[email protected]>
Reviewed-by: Ursula Braun <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface

Rafał found an issue that for non-Ethernet interface, if we down and up
frequently, the memory will be consumed slowly.

The reason is we add allnodes/allrouters addressed in multicast list in
ipv6_add_dev(). When link down, we call ipv6_mc_down(), store all multicast
addresses via mld_add_delrec(). But when link up, we don't call ipv6_mc_up()
for non-Ethernet interface to remove the addresses. This makes idev->mc_tomb
getting bigger and bigger. The call stack looks like:

addrconf_notify(NETDEV_REGISTER)
ipv6_add_dev
ipv6_dev_mc_inc(ff01::1)
ipv6_dev_mc_inc(ff02::1)
ipv6_dev_mc_inc(ff02::2)

addrconf_notify(NETDEV_UP)
addrconf_dev_config
/* Alas, we support only Ethernet autoconfiguration. */
return;

addrconf_notify(NETDEV_DOWN)
addrconf_ifdown
ipv6_mc_down
igmp6_group_dropped(ff02::2)
mld_add_delrec(ff02::2)
igmp6_group_dropped(ff02::1)
igmp6_group_dropped(ff01::1)

After investigating, I can't found a rule to disable multicast on
non-Ethernet interface. In RFC2460, the link could be Ethernet, PPP, ATM,
tunnels, etc. In IPv4, it doesn't check the dev type when calls ip_mc_up()
in inetdev_event(). Even for IPv6, we don't check the dev type and call
ipv6_add_dev(), ipv6_dev_mc_inc() after register device.

So I think it's OK to fix this memory consumer by calling ipv6_mc_up() for
non-Ethernet interface.

v2: Also check IFF_MULTICAST flag to make sure the interface supports
multicast

Reported-by: Rafał Miłecki <[email protected]>
Tested-by: Rafał Miłecki <[email protected]>
Fixes: 74235a25c673 ("[IPV6] addrconf: Fix IPv6 on tuntap tunnels")
Fixes: 1666d49e1d41 ("mld: do not remove mld souce list info when set link down")
Signed-off-by: Hangbin Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'clang-format-for-linus-v5.6-rc6' of git://github.com/ojeda/linux

Pull clang-format update from Miguel Ojeda:
"Another update for the .clang-format macro list

It has been a while since the last time I sent one!"

* tag 'clang-format-for-linus-v5.6-rc6' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list

net: memcg: late association of sock to memcg

If a TCP socket is allocated in IRQ context or cloned from unassociated
(i.e. not associated to a memcg) in IRQ context then it will remain
unassociated for its whole life. Almost half of the TCPs created on the
system are created in IRQ context, so, memory used by such sockets will
not be accounted by the memcg.

This issue is more widespread in cgroup v1 where network memory
accounting is opt-in but it can happen in cgroup v2 if the source socket
for the cloning was created in root memcg.

To fix the issue, just do the association of the sockets at the accept()
time in the process context and then force charge the memory buffer
already used and reserved by the socket.

Signed-off-by: Shakeel Butt <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cgroup: memcg: net: do not associate sock with unrelated cgroup

We are testing network memory accounting in our setup and noticed
inconsistent network memory usage and often unrelated cgroups network
usage correlates with testing workload. On further inspection, it
seems like mem_cgroup_sk_alloc() and cgroup_sk_alloc() are broken in
irq context specially for cgroup v1.

mem_cgroup_sk_alloc() and cgroup_sk_alloc() can be called in irq context
and kind of assumes that this can only happen from sk_clone_lock()
and the source sock object has already associated cgroup. However in
cgroup v1, where network memory accounting is opt-in, the source sock
can be unassociated with any cgroup and the new cloned sock can get
associated with unrelated interrupted cgroup.

Cgroup v2 can also suffer if the source sock object was created by
process in the root cgroup or if sk_alloc() is called in irq context.
The fix is to just do nothing in interrupt.

WARNING: Please note that about half of the TCP sockets are allocated
from the IRQ context, so, memory used by such sockets will not be
accouted by the memcg.

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:

CPU: 70 PID: 12720 Comm: ssh Tainted: 5.6.0-smp-DEV #1
Hardware name: ...
Call Trace:
<IRQ>
dump_stack+0x57/0x75
mem_cgroup_sk_alloc+0xe9/0xf0
sk_clone_lock+0x2a7/0x420
inet_csk_clone_lock+0x1b/0x110
tcp_create_openreq_child+0x23/0x3b0
tcp_v6_syn_recv_sock+0x88/0x730
tcp_check_req+0x429/0x560
tcp_v6_rcv+0x72d/0xa40
ip6_protocol_deliver_rcu+0xc9/0x400
ip6_input+0x44/0xd0
? ip6_protocol_deliver_rcu+0x400/0x400
ip6_rcv_finish+0x71/0x80
ipv6_rcv+0x5b/0xe0
? ip6_sublist_rcv+0x2e0/0x2e0
process_backlog+0x108/0x1e0
net_rx_action+0x26b/0x460
__do_softirq+0x104/0x2a6
do_softirq_own_stack+0x2a/0x40
</IRQ>
do_softirq.part.19+0x40/0x50
__local_bh_enable_ip+0x51/0x60
ip6_finish_output2+0x23d/0x520
? ip6table_mangle_hook+0x55/0x160
__ip6_finish_output+0xa1/0x100
ip6_finish_output+0x30/0xd0
ip6_output+0x73/0x120
? __ip6_finish_output+0x100/0x100
ip6_xmit+0x2e3/0x600
? ipv6_anycast_cleanup+0x50/0x50
? inet6_csk_route_socket+0x136/0x1e0
? skb_free_head+0x1e/0x30
inet6_csk_xmit+0x95/0xf0
__tcp_transmit_skb+0x5b4/0xb20
__tcp_send_ack.part.60+0xa3/0x110
tcp_send_ack+0x1d/0x20
tcp_rcv_state_process+0xe64/0xe80
? tcp_v6_connect+0x5d1/0x5f0
tcp_v6_do_rcv+0x1b1/0x3f0
? tcp_v6_do_rcv+0x1b1/0x3f0
__release_sock+0x7f/0xd0
release_sock+0x30/0xa0
__inet_stream_connect+0x1c3/0x3b0
? prepare_to_wait+0xb0/0xb0
inet_stream_connect+0x3b/0x60
__sys_connect+0x101/0x120
? __sys_getsockopt+0x11b/0x140
__x64_sys_connect+0x1a/0x20
do_syscall_64+0x51/0x200
entry_SYSCALL_64_after_hwframe+0x44/0xa9

The stack trace of mem_cgroup_sk_alloc() from IRQ-context:
Fixes: 2d7580738345 ("mm: memcontrol: consolidate cgroup socket tracking")
Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: Shakeel Butt <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'auxdisplay-for-linus-v5.6-rc6' of git://github.com/ojeda/linux

Pull auxdisplay updates from Miguel Ojeda:
"A few minor auxdisplay improvements:

   - charlcd: replace zero-length array with flexible-array member
     (kernel-wide cleanup by Gustavo A. R. Silva)

   - img-ascii-lcd: convert to devm_platform_ioremap_resource (Yangtao
     Li)

   - Fix Kconfig indentation (Krzysztof Kozlowski)

* tag 'auxdisplay-for-linus-v5.6-rc6' of git://github.com/ojeda/linux:
  auxdisplay: charlcd: replace zero-length array with flexible-array member
  auxdisplay: img-ascii-lcd: convert to devm_platform_ioremap_resource
  auxdisplay: Fix Kconfig indentation

MAINTAINERS: update cxgb4vf maintainer to Vishal

Casey Leedomn <[email protected]> is bouncing,
Vishal indicated he's happy to take the role.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

- cgroup.procs listing related fixes.

   It didn't interlock properly with exiting tasks leaving a short
   window where a cgroup has empty cgroup.procs but still can't be
   removed and misbehaved on short reads.

- psi_show() crash fix on 32bit ino archs

- Empty release_agent handling fix

* 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup1: don't call release_agent when it is ""
  cgroup: fix psi_show() crash on 32bit ino archs
  cgroup: Iterate tasks that did not finish do_exit()
  cgroup: cgroup_procs_next should increase position index
  cgroup-v1: cgroup_pidlist_next should update position index

Merge branch 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue fixes from Tejun Heo:
"Workqueue has been incorrectly round-robining per-cpu work items.
  Hillf's patch fixes that.

  The other patch documents memory-ordering properties of workqueue
  operations"

* 'for-5.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: don't use wq_select_unbound_cpu() for bound works
  workqueue: Document (some) memory-ordering properties of {queue,schedule}_work()

drm/amdgpu/powerplay: nv1x, renior copy dcn clock settings of watermark to smu during boot up

dc to pplib interface is changed for navi1x, renoir.
display_config_changed is not called by dc anymore.
smu_write_watermarks_table is not executed for navi1x, renoir
during boot up.

solution: call smu_write_watermarks_table just after dc pass
watermark clock settings to pplib

Signed-off-by: Hersen Wu <[email protected]>
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

perf vendor events intel: Add NO_NMI_WATCHDOG metric constraint

Add NO_NMI_WATCHDOG metric constraint to Page_Walks_Utilization for Sky Lake
and Cascade Lake.

Committer testing:

On a Lenovo T480S, Intel(R) Core(TM) i7-8650U Kaby Lake, that looking at x86's
mapfile.csv file is a:

  $ grep -w skylake tools/perf/pmu-events/arch/x86/mapfile.csv
  GenuineIntel-6-[4589]E,v24,skylake,core
  $

So uses the constraint added in this patch in this file:

  tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json

Before:

  # perf stat -a -M Page_Walks_Utilization sleep 2

   Performance counter stats for 'system wide':

       <not counted>      itlb_misses.walk_pending                                      (0.00%)
       <not counted>      dtlb_load_misses.walk_pending                                     (0.00%)
       <not counted>      dtlb_store_misses.walk_pending                                     (0.00%)
       <not counted>      ept.walk_pending                                              (0.00%)
       <not counted>      cycles                                                        (0.00%)

         2.001750514 seconds time elapsed

  Some events weren't counted. Try disabling the NMI watchdog:
   echo 0 > /proc/sys/kernel/nmi_watchdog
   perf stat ...
   echo 1 > /proc/sys/kernel/nmi_watchdog
  The events in group usually have to be from the same PMU. Try reorganizing the group.
  #

After:

  # perf stat -a -M Page_Walks_Utilization sleep 2
  Splitting metric group Page_Walks_Utilization into standalone metrics.
  Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
      echo 0 > /proc/sys/kernel/nmi_watchdog
      perf stat ...
      echo 1 > /proc/sys/kernel/nmi_watchdog
  ,
   Performance counter stats for 'system wide':

          36,883,102      itlb_misses.walk_pending  #      0.1 Page_Walks_Utilization   (79.99%)
         123,104,146      dtlb_load_misses.walk_pending                                     (80.02%)
          13,720,795      dtlb_store_misses.walk_pending                                     (79.99%)
                   0      ept.walk_pending                                              (79.99%)
       1,519,948,400      cycles                                                        (80.01%)

         2.002170780 seconds time elapsed

  #

Before and after, if we disable the nmi_watchdog we get:

  # echo 0 > /proc/sys/kernel/nmi_watchdog
  # perf stat -a -M Page_Walks_Utilization sleep 2

   Performance counter stats for 'system wide':

          33,721,658      itlb_misses.walk_pending  #      0.1 Page_Walks_Utilization
          84,070,996      dtlb_load_misses.walk_pending
           9,816,071      dtlb_store_misses.walk_pending
                   0      ept.walk_pending
         704,920,899      cycles

         2.002331670 seconds time elapsed

  #

  More information about the metric expressions:

  # perf stat -v -a -M Page_Walks_Utilization sleep 2
  Using CPUID GenuineIntel-6-8E-A
  metric expr ( itlb_misses.walk_pending + dtlb_load_misses.walk_pending + dtlb_store_misses.walk_pending + ept.walk_pending ) / ( 2 * cycles ) for Page_Walks_Utilization
  found event itlb_misses.walk_pending
  found event dtlb_load_misses.walk_pending
  found event dtlb_store_misses.walk_pending
  found event ept.walk_pending
  found event cycles
  adding {itlb_misses.walk_pending,dtlb_load_misses.walk_pending,dtlb_store_misses.walk_pending,ept.walk_pending,cycles}:W
   -> cpu/umask=0x10,(null)=0x186a3,event=0x85/
   -> cpu/umask=0x10,(null)=0x1e8483,event=0x8/
   -> cpu/umask=0x10,(null)=0x1e8483,event=0x49/
   -> cpu/umask=0x10,(null)=0x1e8483,event=0x4f/
  itlb_misses.walk_pending: 8085772 16010162799 16010162799
  dtlb_load_misses.walk_pending: 28134579 16010162799 16010162799
  dtlb_store_misses.walk_pending: 7276535 16010162799 16010162799
  ept.walk_pending: 2 16010162799 16010162799
  cycles: 315140605 16010162799 16010162799

   Performance counter stats for 'system wide':

           8,085,772      itlb_misses.walk_pending  #      0.1 Page_Walks_Utilization
          28,134,579      dtlb_load_misses.walk_pending
           7,276,535      dtlb_store_misses.walk_pending
                   2      ept.walk_pending
         315,140,605      cycles

         2.002333181 seconds time elapsed

  #

Signed-off-by: Kan Liang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

perf metricgroup: Support metric constraint

Some metric groups have metric constraints. A metric group can be
scheduled as a group only when some constraints are applied. For
example, Page_Walks_Utilization has a metric constraint,
"NO_NMI_WATCHDOG".

When NMI watchdog is disabled, the metric group can be scheduled as a
group. Otherwise, splitting the metric group into standalone metrics.

Add a new function, metricgroup__has_constraint(), to check whether all
constraints are applied. If not, splitting the metric group into
standalone metrics.

Currently, only one constraint, "NO_NMI_WATCHDOG", is checked. Print a
warning for the metric group with the constraint, when NMI WATCHDOG is
enabled.

Signed-off-by: Kan Liang <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>