Git Repo - linux.git/log

perf arch powerpc: Sync powerpc syscall.tbl with the kernel sources

To get the changes in:

fbcee2ebe8edbb6a ("powerpc/32: Always save non volatile GPRs at syscall entry")

That shouldn't cause any change in tooling, just silences the following
tools/perf/ build warning:

Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'

Cc: Christophe Leroy <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers UAPI: Sync openat2.h with the kernel sources

To pick the changes in:

  99668f618062816c ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")

That don't result in any change in tooling, only silences this perf
build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/openat2.h' differs from latest version at 'include/uapi/linux/openat2.h'
  diff -u tools/include/uapi/linux/openat2.h include/uapi/linux/openat2.h

Cc: Al Viro <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers UAPI: Sync drm/i915_drm.h with the kernel sources

To pick the changes in:

  8c3b1ba0e7ea9a80 ("drm/i915/gt: Track the overall awake/busy time")
  348fb0cb0a79bce0 ("drm/i915/pmu: Deprecate I915_PMU_LAST and optimize state tracking")

That don't result in any change in tooling:

  $ tools/perf/trace/beauty/drm_ioctl.sh > before
  $ cp include/uapi/drm/i915_drm.h tools/include/uapi/drm/i915_drm.h
  $ tools/perf/trace/beauty/drm_ioctl.sh > after
  $ diff -u before after
  $

Only silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
  diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h

Cc: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers UAPI: Update tools's copy of drm.h headers

Picking the changes from:

  0e0dc448005583a6 ("drm/doc: demote old doc-comments in drm.h")

Silencing these perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h'
  diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h

No changes in tooling as these are just C comment documentation changes.

Cc: Simon Ser <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

cifs: ask for more credit on async read/write code paths

When doing a large read or write workload we only
very gradually increase the number of credits
which can cause problems with parallelizing large i/o
(I/O ramps up more slowly than it should for large
read/write workloads) especially with multichannel
when the number of credits on the secondary channels
starts out low (e.g. less than about 130) or when
recovering after server throttled back the number
of credit.

Signed-off-by: Aurelien Aptel <[email protected]>
Reviewed-by: Shyam Prasad N <[email protected]>
Signed-off-by: Steve French <[email protected]>

cifs: fix credit accounting for extra channel

With multichannel, operations like the queries
from "ls -lR" can cause all credits to be used and
errors to be returned since max_credits was not
being set correctly on the secondary channels and
thus the client was requesting 0 credits incorrectly
in some cases (which can lead to not having
enough credits to perform any operation on that
channel).

Signed-off-by: Aurelien Aptel <[email protected]>
CC: <[email protected]> # v5.8+
Reviewed-by: Shyam Prasad N <[email protected]>
Signed-off-by: Steve French <[email protected]>

m68k: Fix virt_addr_valid() W=1 compiler warnings

If CONFIG_DEBUG_SG=y, and CONFIG_MMU=y:

    include/linux/scatterlist.h: In function ‘sg_set_buf’:
    arch/m68k/include/asm/page_mm.h:174:49: warning: ordered comparison of pointer with null pointer [-Wextra]
      174 | #define virt_addr_valid(kaddr) ((void *)(kaddr) >= (void *)PAGE_OFFSET && (void *)(kaddr) < high_memory)
  |                                                 ^~

or CONFIG_MMU=n:

    include/linux/scatterlist.h: In function ‘sg_set_buf’:
    arch/m68k/include/asm/page_no.h:33:50: warning: ordered comparison of pointer with null pointer [-Wextra]
       33 | #define virt_addr_valid(kaddr) (((void *)(kaddr) >= (void *)PAGE_OFFSET) && \
  |                                                  ^~

Fix this by doing the comparison in the "unsigned long" instead of the
"void *" domain.

Note that for now this is only seen when compiling btrfs, due to commit
e9aa7c285d20a69c ("btrfs: enable W=1 checks for btrfs"), but as people
are doing more W=1 compile testing, it will start to show up elsewhere,
too.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Greg Ungerer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Linux 5.12-rc2

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
"Nothing special here, though Bob's regression fixes for rxe would have
  made it before the rc cycle had there not been such strong winter
  weather!

   - Fix corner cases in the rxe reference counting cleanup that are
     causing regressions in blktests for SRP

   - Two kdoc fixes so W=1 is clean

   - Missing error return in error unwind for mlx5

   - Wrong lock type nesting in IB CM"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()
  RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()
  RDMA/rxe: Fix missed IB reference counting in loopback
  RDMA/uverbs: Fix kernel-doc warning of _uverbs_alloc
  RDMA/mlx5: Set correct kernel-doc identifier
  IB/mlx5: Add missing error code
  RDMA/rxe: Fix missing kconfig dependency on CRYPTO
  RDMA/cm: Fix IRQ restore in ib_send_cm_sidr_rep

Merge tag 'gcc-plugins-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc-plugins fixes from Kees Cook:
"Tiny gcc-plugin fixes for v5.12-rc2. These issues are small but have
  been reported a couple times now by static analyzers, so best to get
  them fixed to reduce the noise. :)

   - Fix coding style issues (Jason Yan)"

* tag 'gcc-plugins-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: latent_entropy: remove unneeded semicolon
  gcc-plugins: structleak: remove unneeded variable 'ret'

Merge tag 'pstore-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore fixes from Kees Cook:

- Rate-limit ECC warnings (Dmitry Osipenko)

- Fix error path check for NULL (Tetsuo Handa)

* tag 'pstore-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Rate-limit "uncorrectable error in header" message
pstore: Fix warning in pstore_kill_sb()

ethernet: alx: fix order of calls on resume

netif_device_attach() will unpause the queues so we can't call
it before __alx_open(). This went undetected until
commit b0999223f224 ("alx: add ability to allocate and free
alx_napi structures") but now if stack tries to xmit immediately
on resume before __alx_open() we'll crash on the NAPI being null:

BUG: kernel NULL pointer dereference, address: 0000000000000198
CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G           OE 5.10.0-3-amd64 #1 Debian 5.10.13-1
Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77-D3H, BIOS F15 11/14/2013
RIP: 0010:alx_start_xmit+0x34/0x650 [alx]
Code: 41 56 41 55 41 54 55 53 48 83 ec 20 0f b7 57 7c 8b 8e b0
0b 00 00 39 ca 72 06 89 d0 31 d2 f7 f1 89 d2 48 8b 84 df
RSP: 0018:ffffb09240083d28 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffffa04d80ae7800 RCX: 0000000000000004
RDX: 0000000000000000 RSI: ffffa04d80afa000 RDI: ffffa04e92e92a00
RBP: 0000000000000042 R08: 0000000000000100 R09: ffffa04ea3146700
R10: 0000000000000014 R11: 0000000000000000 R12: ffffa04e92e92100
R13: 0000000000000001 R14: ffffa04e92e92a00 R15: ffffa04e92e92a00
FS:  0000000000000000(0000) GS:ffffa0508f600000(0000) knlGS:0000000000000000
i915 0000:00:02.0: vblank wait timed out on crtc 0
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000198 CR3: 000000004460a001 CR4: 00000000001706f0
Call Trace:
  dev_hard_start_xmit+0xc7/0x1e0
  sch_direct_xmit+0x10f/0x310

Cc: <[email protected]> # 4.9+
Fixes: bc2bebe8de8e ("alx: remove WoL support")
Reported-by: Zbynek Michl <[email protected]>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983595
Signed-off-by: Jakub Kicinski <[email protected]>
Tested-by: Zbynek Michl <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

lan743x: trim all 4 bytes of the FCS; not just 2

Trim all 4 bytes of the received FCS; not just 2 of them. Leaving 2
bytes of the FCS on the frame breaks DSA tailing tag drivers.

Fixes: a8db76d40e4d ("lan743x: boost performance on cpu archs w/o dma cache snooping")
Signed-off-by: George McCollister <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'for-5.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:
"Fix DM verity target's optional Forward Error Correction (FEC) for
  Reed-Solomon roots that are unaligned to block size"

* tag 'for-5.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm verity: fix FEC for RS roots unaligned to block size
  dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size

gianfar: fix jumbo packets+napi+rx overrun crash

When using jumbo packets and overrunning rx queue with napi enabled,
the following sequence is observed in gfar_add_rx_frag:

   | lstatus                              |       | skb                   |
t  | lstatus,  size, flags                | first | len, data_len, *ptr   |
---+--------------------------------------+-------+-----------------------+
13 | 18002348, 9032, INTERRUPT LAST       | 0     | 9600, 8000,  f554c12e |
12 | 10000640, 1600, INTERRUPT            | 0     | 8000, 6400,  f554c12e |
11 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  f554c12e |
10 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  f554c12e |
09 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  f554c12e |
08 | 14000640, 1600, INTERRUPT FIRST      | 0     | 1600, 0,     f554c12e |
07 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     f554c12e |
06 | 1c000080, 128,  INTERRUPT LAST FIRST | 1     | 0,    0,     abf3bd6e |
05 | 18002348, 9032, INTERRUPT LAST       | 0     | 8000, 6400,  c5a57780 |
04 | 10000640, 1600, INTERRUPT            | 0     | 6400, 4800,  c5a57780 |
03 | 10000640, 1600, INTERRUPT            | 0     | 4800, 3200,  c5a57780 |
02 | 10000640, 1600, INTERRUPT            | 0     | 3200, 1600,  c5a57780 |
01 | 10000640, 1600, INTERRUPT            | 0     | 1600, 0,     c5a57780 |
00 | 14000640, 1600, INTERRUPT FIRST      | 1     | 0,    0,     c5a57780 |

So at t=7 a new packets is started but not finished, probably due to rx
overrun - but rx overrun is not indicated in the flags. Instead a new
packets starts at t=8. This results in skb->len to exceed size for the LAST
fragment at t=13 and thus a negative fragment size added to the skb.

This then crashes:

kernel BUG at include/linux/skbuff.h:2277!
Oops: Exception in kernel mode, sig: 5 [#1]
...
NIP [c04689f4] skb_pull+0x2c/0x48
LR [c03f62ac] gfar_clean_rx_ring+0x2e4/0x844
Call Trace:
[ec4bfd38] [c06a84c4] _raw_spin_unlock_irqrestore+0x60/0x7c (unreliable)
[ec4bfda8] [c03f6a44] gfar_poll_rx_sq+0x48/0xe4
[ec4bfdc8] [c048d504] __napi_poll+0x54/0x26c
[ec4bfdf8] [c048d908] net_rx_action+0x138/0x2c0
[ec4bfe68] [c06a8f34] __do_softirq+0x3a4/0x4fc
[ec4bfed8] [c0040150] run_ksoftirqd+0x58/0x70
[ec4bfee8] [c0066ecc] smpboot_thread_fn+0x184/0x1cc
[ec4bff08] [c0062718] kthread+0x140/0x144
[ec4bff38] [c0012350] ret_from_kernel_thread+0x14/0x1c

This patch fixes this by checking for computed LAST fragment size, so a
negative sized fragment is never added.
In order to prevent the newer rx frame from getting corrupted, the FIRST
flag is checked to discard the incomplete older frame.

Signed-off-by: Michael Braun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sun/niu: fix wrong RXMAC_BC_FRM_CNT_COUNT count

RXMAC_BC_FRM_CNT_COUNT added to mp->rx_bcasts twice in a row
in niu_xmac_interrupt(). Remove the second addition.

Signed-off-by: Denis Efremov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/hamradio/6pack: remove redundant check in sp_encaps()

"len > sp->mtu" checked twice in a row in sp_encaps().
Remove the second check.

Signed-off-by: Denis Efremov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8169: fix r8168fp_adjust_ocp_cmd function

The (0xBAF70000 & 0x00FFF000) << 6 should be (0xf70 << 18).

Fixes: 561535b0f239 ("r8169: fix OCP access on RTL8117")
Signed-off-by: Hayes Wang <[email protected]>
Acked-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftest/net/ipsec.c: Remove unneeded semicolon

fix semicolon.cocci warning:
tools/testing/selftests/net/ipsec.c:1788:2-3: Unneeded semicolon

Signed-off-by: Xu Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ibmvnic: remove excessive irqsave

ibmvnic_remove locks multiple spinlocks while disabling interrupts:
spin_lock_irqsave(&adapter->state_lock, flags);
spin_lock_irqsave(&adapter->rwi_lock, flags);

As reported by coccinelle, the second _irqsave() overwrites the value
saved in 'flags' by the first _irqsave(), therefore when the second
_irqrestore() comes,the value in 'flags' is not valid,the value saved
by the first _irqsave() has been lost.
This likely leads to IRQs remaining disabled. So remove the second
_irqsave():
spin_lock_irqsave(&adapter->state_lock, flags);
spin_lock(&adapter->rwi_lock);

Generated by: ./scripts/coccinelle/locks/flags.cocci
./drivers/net/ethernet/ibm/ibmvnic.c:5413:1-18:
ERROR: nested lock+irqsave that reuses flags from line 5404.

Fixes: 4a41c421f367 ("ibmvnic: serialize access to work queue on remove")
Signed-off-by: Junlin Yang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

CIPSO: Fix unaligned memory access in cipso_v4_gentag_hdr

We need to use put_unaligned when writing 32-bit DOI value
in cipso_v4_gentag_hdr to avoid unaligned memory access.

v2: unneeded type cast removed as Ondrej Mosnacek suggested.

Signed-off-by: Sergey Nazarov <[email protected]>
Acked-by: Paul Moore <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

- NVMe fixes:
      - more device quirks (Julian Einwag, Zoltán Böszörményi, Pascal
        Terjan)
      - fix a hwmon error return (Daniel Wagner)
      - fix the keep alive timeout initialization (Martin George)
      - ensure the model_number can't be changed on a used subsystem
        (Max Gurtovoy)

- rsxx missing -EFAULT on copy_to_user() failure (Dan)

- rsxx remove unused linux.h include (Tian)

- kill unused RQF_SORTED (Jean)

- updated outdated BFQ comments (Joseph)

- revert work-around commit for bd_size_lock, since we removed the
   offending user in this merge window (Damien)

* tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-block:
  nvmet: model_number must be immutable once set
  nvme-fabrics: fix kato initialization
  nvme-hwmon: Return error code when registration fails
  nvme-pci: add quirks for Lexar 256GB SSD
  nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
  nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
  rsxx: Return -EFAULT if copy_to_user() fails
  block/bfq: update comments and default value in docs for fifo_expire
  rsxx: remove unused including <linux/version.h>
  block: Drop leftover references to RQF_SORTED
  block: revert "block: fix bd_size_lock use"

stmmac: intel: Fixes clock registration error seen for multiple interfaces

Issue seen when enumerating multiple Intel mGbE interfaces in EHL.

[    6.898141] intel-eth-pci 0000:00:1d.2: enabling device (0000 -> 0002)
[    6.900971] intel-eth-pci 0000:00:1d.2: Fail to register stmmac-clk
[    6.906434] intel-eth-pci 0000:00:1d.2: User ID: 0x51, Synopsys ID: 0x52

We fix it by making the clock name to be unique following the format
of stmmac-pci_name(pci_dev) so that we can differentiate the clock for
these Intel mGbE interfaces in EHL platform as follow:

  /sys/kernel/debug/clk/stmmac-0000:00:1d.1
  /sys/kernel/debug/clk/stmmac-0000:00:1d.2
  /sys/kernel/debug/clk/stmmac-0000:00:1e.4

Fixes: 58da0cfa6cf1 ("net: stmmac: create dwmac-intel.c to contain all Intel platform")
Signed-off-by: Wong Vee Khee <[email protected]>
Signed-off-by: Voon Weifeng <[email protected]>
Co-developed-by: Ong Boon Leong <[email protected]>
Signed-off-by: Ong Boon Leong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: stmmac: Fix VLAN filter delete timeout issue in Intel mGBE SGMII

For Intel mGbE controller, MAC VLAN filter delete operation will time-out
if serdes power-down sequence happened first during driver remove() with
below message.

[82294.764958] intel-eth-pci 0000:00:1e.4 eth2: stmmac_dvr_remove: removing driver
[82294.778677] intel-eth-pci 0000:00:1e.4 eth2: Timeout accessing MAC_VLAN_Tag_Filter
[82294.779997] intel-eth-pci 0000:00:1e.4 eth2: failed to kill vid 0081/0
[82294.947053] intel-eth-pci 0000:00:1d.2 eth1: stmmac_dvr_remove: removing driver
[82295.002091] intel-eth-pci 0000:00:1d.1 eth0: stmmac_dvr_remove: removing driver

Therefore, we delay the serdes power-down to be after unregister_netdev()
which triggers the VLAN filter delete.

Fixes: b9663b7ca6ff ("net: stmmac: Enable SERDES power up/down sequence")
Signed-off-by: Ong Boon Leong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: intel: iavf: fix error return code of iavf_init_get_resources()

When iavf_process_config() fails, no error return code of
iavf_init_get_resources() is assigned.
To fix this bug, err is assigned with the return value of
iavf_process_config(), and then err is checked.

Reported-by: TOTE Robot <[email protected]>
Signed-off-by: Jia-Ju Bai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: tehuti: fix error return code in bdx_probe()

When bdx_read_mac() fails, no error return code of bdx_probe()
is assigned.
To fix this bug, err is assigned with -EFAULT as error return code.

Reported-by: TOTE Robot <[email protected]>
Signed-off-by: Jia-Ju Bai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
"A bit of a mix between fallout from the worker change, cleanups and
  reductions now possible from that change, and fixes in general. In
  detail:

   - Fully serialize manager and worker creation, fixing races due to
     that.

   - Clean up some naming that had gone stale.

   - SQPOLL fixes.

   - Fix race condition around task_work rework that went into this
     merge window.

   - Implement unshare. Used for when the original task does unshare(2)
     or setuid/seteuid and friends, drops the original workers and forks
     new ones.

   - Drop the only remaining piece of state shuffling we had left, which
     was cred. Move it into issue instead, and we can drop all of that
     code too.

   - Kill f_op->flush() usage. That was such a nasty hack that we had
     out of necessity, we no longer need it.

   - Following from ->flush() removal, we can also drop various bits of
     ctx state related to SQPOLL and cancelations.

   - Fix an issue with IOPOLL retry, which originally was fallout from a
     filemap change (removing iov_iter_revert()), but uncovered an issue
     with iovec re-import too late.

   - Fix an issue with system suspend.

   - Use xchg() for fallback work, instead of cmpxchg().

   - Properly destroy io-wq on exec.

   - Add create_io_thread() core helper, and use that in io-wq and
     io_uring. This allows us to remove various silly completion events
     related to thread setup.

   - A few error handling fixes.

  This should be the grunt of fixes necessary for the new workers, next
  week should be quieter. We've got a pending series from Pavel on
  cancelations, and how tasks and rings are indexed. Outside of that,
  should just be minor fixes. Even with these fixes, we're still killing
  a net ~80 lines"

* tag 'io_uring-5.12-2021-03-05' of git://git.kernel.dk/linux-block: (41 commits)
  io_uring: don't restrict issue_flags for io_openat
  io_uring: make SQPOLL thread parking saner
  io-wq: kill hashed waitqueue before manager exits
  io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return
  io_uring: don't keep looping for more events if we can't flush overflow
  io_uring: move to using create_io_thread()
  kernel: provide create_io_thread() helper
  io_uring: reliably cancel linked timeouts
  io_uring: cancel-match based on flags
  io-wq: ensure all pending work is canceled on exit
  io_uring: ensure that threads freeze on suspend
  io_uring: remove extra in_idle wake up
  io_uring: inline __io_queue_async_work()
  io_uring: inline io_req_clean_work()
  io_uring: choose right tctx->io_wq for try cancel
  io_uring: fix -EAGAIN retry with IOPOLL
  io-wq: fix error path leak of buffered write hash map
  io_uring: remove sqo_task
  io_uring: kill sqo_dead and sqo submission halting
  io_uring: ignore double poll add on the same waitqueue head
  ...

net/mlx4_en: update moderation when config reset

This patch fixes a bug that the moderation config will not be
applied when calling mlx4_en_reset_config. For example, when
turning on rx timestamping, mlx4_en_reset_config() will be called,
causing the NIC to forget previous moderation config.

This fix is in phase with a previous fix:
commit 79c54b6bbf06 ("net/mlx4_en: Fix TX moderation info loss
after set_ringparam is called")

Tested: Before this patch, on a host with NIC using mlx4, run
netserver and stream TCP to the host at full utilization.
$ sar -I SUM 1
                 INTR    intr/s
14:03:56          sum  48758.00

After rx hwtstamp is enabled:
$ sar -I SUM 1
14:10:38          sum 317771.00
We see the moderation is not working properly and issued 7x more
interrupts.

After the patch, and turned on rx hwtstamp, the rate of interrupts
is as expected:
$ sar -I SUM 1
14:52:11          sum  49332.00

Fixes: 79c54b6bbf06 ("net/mlx4_en: Fix TX moderation info loss after set_ringparam is called")
Signed-off-by: Kevin(Yudong) Yang <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Neal Cardwell <[email protected]>
CC: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'pm-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix the usage of device links in the runtime PM core code and
  update the DTPM (Dynamic Thermal Power Management) feature added
  recently.

  Specifics:

   - Make the runtime PM core code avoid attempting to suspend supplier
     devices before updating the PM-runtime status of a consumer to
     'suspended' (Rafael Wysocki).

   - Fix DTPM (Dynamic Thermal Power Management) root node
     initialization and label that feature as EXPERIMENTAL in Kconfig
     (Daniel Lezcano)"

* tag 'pm-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  powercap/drivers/dtpm: Add the experimental label to the option description
  powercap/drivers/dtpm: Fix root node initialization
  PM: runtime: Update device status before letting suppliers suspend

Merge tag 'acpi-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"Make the empty stubs of some helper functions used when CONFIG_ACPI is
not set actually match those functions (Andy Shevchenko)"

* tag 'acpi-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: bus: Constify is_acpi_node() and friends (part 2)

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2021-03-04

The following pull-request contains BPF updates for your *net* tree.

We've added 7 non-merge commits during the last 4 day(s) which contain
a total of 9 files changed, 128 insertions(+), 40 deletions(-).

The main changes are:

1) Fix 32-bit cmpxchg, from Brendan.

2) Fix atomic+fetch logic, from Ilya.

3) Fix usage of bpf_csum_diff in selftests, from Yauheni.
====================

Merge tag 'iommu-fixes-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

- Fix a sleeping-while-atomic issue in the AMD IOMMU code

- Disable lazy IOTLB flush for untrusted devices in the Intel VT-d
   driver

- Fix status code definitions for Intel VT-d

- Fix IO Page Fault issue in Tegra IOMMU driver

* tag 'iommu-fixes-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix status code for Allocate/Free PASID command
  iommu: Don't use lazy flush for untrusted device
  iommu/tegra-smmu: Fix mc errors on tegra124-nyan
  iommu/amd: Fix sleeping in atomic in increase_address_space()

Merge tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"More regression fixes and stabilization.

  Regressions:

   - zoned mode
      - count zone sizes in wider int types
      - fix space accounting for read-only block groups

   - subpage: fix page tail zeroing

  Fixes:

   - fix spurious warning when remounting with free space tree

   - fix warning when creating a directory with smack enabled

   - ioctl checks for qgroup inheritance when creating a snapshot

   - qgroup
      - fix missing unlock on error path in zero range
      - fix amount of released reservation on error
      - fix flushing from unsafe context with open transaction,
        potentially deadlocking

   - minor build warning fixes"

* tag 'for-5.12-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: do not account freed region of read-only block group as zone_unusable
  btrfs: zoned: use sector_t for zone sectors
  btrfs: subpage: fix the false data csum mismatch error
  btrfs: fix warning when creating a directory with smack enabled
  btrfs: don't flush from btrfs_delayed_inode_reserve_metadata
  btrfs: export and rename qgroup_reserve_meta
  btrfs: free correct amount of space in btrfs_delayed_inode_reserve_metadata
  btrfs: fix spurious free_space_tree remount warning
  btrfs: validate qgroup inherit for SNAP_CREATE_V2 ioctl
  btrfs: unlock extents in btrfs_zero_range in case of quota reservation errors
  btrfs: ref-verify: use 'inline void' keyword ordering

Merge tag 'devicetree-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

- Another batch of graph and video-interfaces schema conversions

- Drop DT header symlink for dropped C6X arch

- Fix bcm2711-hdmi schema error

* tag 'devicetree-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: media: Use graph and video-interfaces schemas, round 2
  dts: drop dangling c6x symlink
  dt-bindings: bcm2711-hdmi: Fix broken schema

Merge tag 'trace-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"Functional fixes:

   - Fix big endian conversion for arm64 in recordmcount processing

   - Fix timestamp corruption in ring buffer on discarding events

   - Fix memory leak in __create_synth_event()

   - Skip selftests if tracing is disabled as it will cause them to
     fail.

  Non-functional fixes:

   - Fix help text in Kconfig

   - Remove duplicate prototype for trace_empty()

   - Fix stale comment about the trace_event_call flags.

  Self test update:

   - Add more information to the validation output of when a corrupt
     timestamp is found in the ring buffer, and also trigger a warning
     to make sure that tests catch it"

* tag 'trace-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix comment about the trace_event_call flags
  tracing: Skip selftests if tracing is disabled
  tracing: Fix memory leak in __create_synth_event()
  ring-buffer: Add a little more information and a WARN when time stamp going backwards is detected
  ring-buffer: Force before_stamp and write_stamp to be different on discard
  tracing: Fix help text of TRACEPOINT_BENCHMARK in Kconfig
  tracing: Remove duplicate declaration from trace.h
  ftrace: Have recordmcount use w8 to read relp->r_info in arm64_is_fake_mcount

RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()

In rxe_comp.c in rxe_completer() the function free_pkt() did not clear skb
which triggered a warning at 'done:' and could possibly at 'exit:'. The
WARN_ONCE() calls are not actually needed. The call to free_pkt() is
moved to the end to clearly show that all skbs are freed.

Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()

rxe_rcv_mcast_pkt() dropped a reference to ib_device when no error
occurred causing an underflow on the reference counter. This code is
cleaned up to be clearer and easier to read.

Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

RDMA/rxe: Fix missed IB reference counting in loopback

When the noted patch below extending the reference taken by
rxe_get_dev_from_net() in rxe_udp_encap_recv() until each skb is freed it
was not matched by a reference in the loopback path resulting in
underflows.

Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>

io_uring: don't restrict issue_flags for io_openat

45d189c606292 ("io_uring: replace force_nonblock with flags") did
something strange for io_openat() slicing all issue_flags but
IO_URING_F_NONBLOCK. Not a bug for now, but better to just forward the
flags.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'nvme-5.12-2021-03-05' of git://git.infradead.org/nvme into block-5.12

Pull NVMe fixes from Christoph:

"nvme fixes for 5.12:

- more device quirks (Julian Einwag, Zoltán Böszörményi, Pascal Terjan)
- fix a hwmon error return (Daniel Wagner)
- fix the keep alive timeout initialization (Martin George)
- ensure the model_number can't be changed on a used subsystem
   (Max Gurtovoy)"

* tag 'nvme-5.12-2021-03-05' of git://git.infradead.org/nvme:
  nvmet: model_number must be immutable once set
  nvme-fabrics: fix kato initialization
  nvme-hwmon: Return error code when registration fails
  nvme-pci: add quirks for Lexar 256GB SSD
  nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
  nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.

io_uring: make SQPOLL thread parking saner

We have this weird true/false return from parking, and then some of the
callers decide to look at that. It can lead to unbalanced parks and
sqd locking. Have the callers check the thread status once it's parked.
We know we have the lock at that point, so it's either valid or it's NULL.

Fix race with parking on thread exit. We need to be careful here with
ordering of the sdq->lock and the IO_SQ_THREAD_SHOULD_PARK bit.

Rename sqd->completion to sqd->parked to reflect that this is the only
thing this completion event doesn.

Signed-off-by: Jens Axboe <[email protected]>

io-wq: kill hashed waitqueue before manager exits

If we race with shutting down the io-wq context and someone queueing
a hashed entry, then we can exit the manager with it armed. If it then
triggers after the manager has exited, we can have a use-after-free where
io_wqe_hash_wake() attempts to wake a now gone manager process.

Move the killing of the hashed write queue into the manager itself, so
that we know we've killed it before the task exits.

Fixes: e941894eae31 ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: Jens Axboe <[email protected]>

io_uring: clear IOCB_WAITQ for non -EIOCBQUEUED return

The callback can only be armed, if we get -EIOCBQUEUED returned. It's
important that we clear the WAITQ bit for other cases, otherwise we can
queue for async retry and filemap will assume that we're armed and
return -EAGAIN instead of just blocking for the IO.

Cc: [email protected] # 5.9+
Signed-off-by: Jens Axboe <[email protected]>

io_uring: don't keep looping for more events if we can't flush overflow

It doesn't make sense to wait for more events to come in, if we can't
even flush the overflow we already have to the ring. Return -EBUSY for
that condition, just like we do for attempts to submit with overflow
pending.

Cc: [email protected] # 5.11
Signed-off-by: Jens Axboe <[email protected]>

io_uring: move to using create_io_thread()

This allows us to do task creation and setup without needing to use
completions to try and synchronize with the starting thread. Get rid of
the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup
completion events - we can now do setup before the task is running.

Signed-off-by: Jens Axboe <[email protected]>

Merge branch 'powercap'

* powercap:
powercap/drivers/dtpm: Add the experimental label to the option description
powercap/drivers/dtpm: Fix root node initialization

nvmet: model_number must be immutable once set

In case we have already established connection to nvmf target, it
shouldn't be allowed to change the model_number. E.g. if someone will
identify ctrl and get model_number of "my_model" later on will change
the model_numbel via configfs to "my_new_model" this will break the NVMe
specification for "Get Log Page – Persistent Event Log" that refers to
Model Number as: "This field contains the same value as reported in the
Model Number field of the Identify Controller data structure, bytes
63:24."

Although it doesn't mentioned explicitly that this field can't be
changed, we can assume it.

So allow setting this field only once: using configfs or in the first
identify ctrl operation.

Signed-off-by: Max Gurtovoy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-fabrics: fix kato initialization

Currently kato is initialized to NVME_DEFAULT_KATO for both
discovery & i/o controllers. This is a problem specifically
for non-persistent discovery controllers since it always ends
up with a non-zero kato value. Fix this by initializing kato
to zero instead, and ensuring various controllers are assigned
appropriate kato values as follows:

non-persistent controllers  - kato set to zero
persistent controllers      - kato set to NVMF_DEV_DISC_TMO
                              (or any positive int via nvme-cli)
i/o controllers             - kato set to NVME_DEFAULT_KATO
                              (or any positive int via nvme-cli)

Signed-off-by: Martin George <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-hwmon: Return error code when registration fails

The hwmon pointer wont be NULL if the registration fails. Though the
exit code path will assign it to ctrl->hwmon_device. Later
nvme_hwmon_exit() will try to free the invalid pointer. Avoid this by
returning the error code from hwmon_device_register_with_info().

Fixes: ed7770f66286 ("nvme/hwmon: rework to avoid devm allocation")
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-pci: add quirks for Lexar 256GB SSD

Add the NVME_QUIRK_NO_NS_DESC_LIST and NVME_QUIRK_IGNORE_DEV_SUBNQN
quirks for this buggy device.

Reported and tested in https://bugs.mageia.org/show_bug.cgi?id=28417

Signed-off-by: Pascal Terjan <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state

My 2TB SKC2000 showed the exact same symptoms that were provided
in 538e4a8c57 ("nvme-pci: avoid the deepest sleep state on
Kingston A2000 SSDs"), i.e. a complete NVME lockup that needed
cold boot to get it back.

According to some sources, the A2000 is simply a rebadged
SKC2000 with a slightly optimized firmware.

Adding the SKC2000 PCI ID to the quirk list with the same workaround
as the A2000 made my laptop survive a 5 hours long Yocto bootstrap
buildfest which reliably triggered the SSD lockup previously.

Signed-off-by: Zoltán Böszörményi <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>

nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.

The kernel fails to fully detect these SSDs, only the character devices
are present:

[   10.785605] nvme nvme0: pci function 0000:04:00.0
[   10.876787] nvme nvme1: pci function 0000:81:00.0
[   13.198614] nvme nvme0: missing or invalid SUBNQN field.
[   13.198658] nvme nvme1: missing or invalid SUBNQN field.
[   13.206896] nvme nvme0: Shutdown timeout set to 20 seconds
[   13.215035] nvme nvme1: Shutdown timeout set to 20 seconds
[   13.225407] nvme nvme0: 16/0/0 default/read/poll queues
[   13.233602] nvme nvme1: 16/0/0 default/read/poll queues
[   13.239627] nvme nvme0: Identify Descriptors failed (8194)
[   13.246315] nvme nvme1: Identify Descriptors failed (8194)

Adding the NVME_QUIRK_NO_NS_DESC_LIST fixes this problem.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205679
Signed-off-by: Julian Einwag <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>

Merge tag 'drm-fixes-2021-03-05' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"More may show up but this is what I have at this stage: just a single
  nouveau regression fix, and a bunch of amdgpu fixes.

  amdgpu:
   - S0ix fix
   - Handle new NV12 SKU
   - Misc power fixes
   - Display uninitialized value fix
   - PCIE debugfs register access fix

  nouveau:
   - regression fix for gk104"

* tag 'drm-fixes-2021-03-05' of git://anongit.freedesktop.org/drm/drm:
  drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie
  drm/amd/display: fix the return of the uninitialized value in ret
  drm/amdgpu: enable BACO runpm by default on sienna cichlid and navy flounder
  drm/amd/pm: correct Arcturus mmTHM_BACO_CNTL register address
  drm/amdgpu/swsmu/vangogh: Only use RLCPowerNotify msg for disable
  drm/amdgpu/pm: make unsupported power profile messages debug
  drm/amdgpu:disable VCN for Navi12 SKU
  drm/amdgpu: Only check for S0ix if AMD_PMC is configured
  drm/nouveau/fifo/gk104-gp1xx: fix creation of sw class

bpf: Explicitly zero-extend R0 after 32-bit cmpxchg

As pointed out by Ilya and explained in the new comment, there's a
discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
the value from memory into r0, while x86 only does so when r0 and the
value in memory are different. The same issue affects s390.

At first this might sound like pure semantics, but it makes a real
difference when the comparison is 32-bit, since the load will
zero-extend r0/rax.

The fix is to explicitly zero-extend rax after doing such a
CMPXCHG. Since this problem affects multiple archs, this is done in
the verifier by patching in a BPF_ZEXT_REG instruction after every
32-bit cmpxchg. Any archs that don't need such manual zero-extension
can do a look-ahead with insn_is_zext to skip the unnecessary mov.

Note this still goes on top of Ilya's patch:

https://lore.kernel.org/bpf/20210301154019 [email protected]/T/#u

Differences v5->v6[1]:
- Moved is_cmpxchg_insn and ensured it can be safely re-used. Also renamed it
   and removed 'inline' to match the style of the is_*_function helpers.
- Fixed up comments in verifier test (thanks for the careful review, Martin!)

Differences v4->v5[1]:
- Moved the logic entirely into opt_subreg_zext_lo32_rnd_hi32, thanks to Martin
   for suggesting this.

Differences v3->v4[1]:
- Moved the optimization against pointless zext into the correct place:
   opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.

Differences v2->v3[1]:
- Moved patching into fixup_bpf_calls (patch incoming to rename this function)
- Added extra commentary on bpf_jit_needs_zext
- Added check to avoid adding a pointless zext(r0) if there's already one there.

Difference v1->v2[1]: Now solved centrally in the verifier instead of
  specifically for the x86 JIT. Thanks to Ilya and Daniel for the suggestions!

[1] v5: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t
    v4: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t
    v3: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
    v2: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
    v1: https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@iogearbox.net/T/#t

Reported-by: Ilya Leoshkevich <[email protected]>
Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
Signed-off-by: Brendan Jackman <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Acked-by: Ilya Leoshkevich <[email protected]>
Tested-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>

Merge tag 'mkp-scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi

Pull iSCSI fixes from Martin Petersen:
"Three fixes for missed iSCSI verification checks (and make the sysfs
  files use "sysfs_emit()" - that's what it is there for)"

* tag 'mkp-scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
  scsi: iscsi: Verify lengths on passthrough PDUs
  scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE
  scsi: iscsi: Restrict sessions and handles to admin capabilities

Merge tag 'amd-drm-fixes-5.12-2021-03-03' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.12-2021-03-03:

amdgpu:
- S0ix fix
- Handle new NV12 SKU
- Misc power fixes
- Display uninitialized value fix
- PCIE debugfs register access fix

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge branch '00.00-inst' of git://github.com/skeggsb/linux into drm-fixes

A single regression fix here that I noticed while testing a bunch of
boards for something else, not sure where this got lost! Prevents 3D
driver from initialising on some GPUs.

Signed-off-by: Dave Airlie <[email protected]>
From: Ben Skeggs <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv5gmq14BrDmkMncfd=tHVSSaU89BdBEWfs6Jy-aRz03GQ@mail.gmail.com

scsi: iscsi: Verify lengths on passthrough PDUs

Open-iSCSI sends passthrough PDUs over netlink, but the kernel should be
verifying that the provided PDU header and data lengths fall within the
netlink message to prevent accessing beyond that in memory.

Cc: [email protected]
Reported-by: Adam Nichols <[email protected]>
Reviewed-by: Lee Duncan <[email protected]>
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: Chris Leech <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE

As the iSCSI parameters are exported back through sysfs, it should be
enforcing that they never are more than PAGE_SIZE (which should be more
than enough) before accepting updates through netlink.

Change all iSCSI sysfs attributes to use sysfs_emit().

Cc: [email protected]
Reported-by: Adam Nichols <[email protected]>
Reviewed-by: Lee Duncan <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: Chris Leech <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

scsi: iscsi: Restrict sessions and handles to admin capabilities

Protect the iSCSI transport handle, available in sysfs, by requiring
CAP_SYS_ADMIN to read it. Also protect the netlink socket by restricting
reception of messages to ones sent with CAP_SYS_ADMIN. This disables
normal users from being able to end arbitrary iSCSI sessions.

Cc: [email protected]
Reported-by: Adam Nichols <[email protected]>
Reviewed-by: Chris Leech <[email protected]>
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: Lee Duncan <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

cipso,calipso: resolve a number of problems with the DOI refcounts

The current CIPSO and CALIPSO refcounting scheme for the DOI
definitions is a bit flawed in that we:

1. Don't correctly match gets/puts in netlbl_cipsov4_list().
2. Decrement the refcount on each attempt to remove the DOI from the
   DOI list, only removing it from the list once the refcount drops
   to zero.

This patch fixes these problems by adding the missing "puts" to
netlbl_cipsov4_list() and introduces a more conventional, i.e.
not-buggy, refcounting mechanism to the DOI definitions.  Upon the
addition of a DOI to the DOI list, it is initialized with a refcount
of one, removing a DOI from the list removes it from the list and
drops the refcount by one; "gets" and "puts" behave as expected with
respect to refcounts, increasing and decreasing the DOI's refcount by
one.

Fixes: b1edeb102397 ("netlabel: Replace protocol/NetLabel linking with refrerence counts")
Fixes: d7cce01504a0 ("netlabel: Add support for removing a CALIPSO DOI.")
Reported-by: [email protected]
Signed-off-by: Paul Moore <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

kernel: provide create_io_thread() helper

Provide a generic helper for setting up an io_uring worker. Returns a
task_struct so that the caller can do whatever setup is needed, then call
wake_up_new_task() to kick it into gear.

Add a kernel_clone_args member, io_thread, which tells copy_process() to
mark the task with PF_IO_WORKER.

Signed-off-by: Jens Axboe <[email protected]>

io_uring: reliably cancel linked timeouts

Linked timeouts are fired asynchronously (i.e. soft-irq), and use
generic cancellation paths to do its stuff, including poking into io-wq.
The problem is that it's racy to access tctx->io_wq, as
io_uring_task_cancel() and others may be happening at this exact moment.
Mark linked timeouts with REQ_F_INLIFGHT for now, making sure there are
no timeouts before io-wq destraction.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

io_uring: cancel-match based on flags

Instead of going into request internals, like checking req->file->f_op,
do match them based on REQ_F_INFLIGHT, it's set only when we want it to
be reliably cancelled.

Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

ibmvnic: always store valid MAC address

The last change to ibmvnic_set_mac(), 8fc3672a8ad3, meant to prevent
users from setting an invalid MAC address on an ibmvnic interface
that has not been brought up yet. The change also prevented the
requested MAC address from being stored by the adapter object for an
ibmvnic interface when the state of the ibmvnic interface is
VNIC_PROBED - that is after probing has finished but before the
ibmvnic interface is brought up. The MAC address stored by the
adapter object is used and sent to the hypervisor for checking when
an ibmvnic interface is brought up.

The ibmvnic driver ignoring the requested MAC address when in
VNIC_PROBED state caused LACP bonds (bonds in 802.3ad mode) with more
than one slave to malfunction. The bonding code must be able to
change the MAC address of its slaves before they are brought up
during enslaving. The inability of kernels with 8fc3672a8ad3 to set
the MAC addresses of bonding slaves is observable in the output of
"ip address show". The MAC addresses of the slaves are the same as
the MAC address of the bond on a working system whereas the slaves
retain their original MAC addresses on a system with a malfunctioning
LACP bond.

Fixes: 8fc3672a8ad3 ("ibmvnic: fix ibmvnic_set_mac")
Signed-off-by: Jiri Wiesner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

netdevsim: init u64 stats for 32bit hardware

Init the u64 stats in order to avoid the lockdep prints on the 32bit
hardware like

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 4695 Comm: syz-executor.0 Not tainted 5.11.0-rc5-syzkaller #0
Hardware name: ARM-Versatile Express
Backtrace:
[<826fc5b8>] (dump_backtrace) from [<826fc82c>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
[<826fc814>] (show_stack) from [<8270d1f8>] (__dump_stack lib/dump_stack.c:79 [inline])
[<826fc814>] (show_stack) from [<8270d1f8>] (dump_stack+0xa8/0xc8 lib/dump_stack.c:120)
[<8270d150>] (dump_stack) from [<802bf9c0>] (assign_lock_key kernel/locking/lockdep.c:935 [inline])
[<8270d150>] (dump_stack) from [<802bf9c0>] (register_lock_class+0xabc/0xb68 kernel/locking/lockdep.c:1247)
[<802bef04>] (register_lock_class) from [<802baa2c>] (__lock_acquire+0x84/0x32d4 kernel/locking/lockdep.c:4711)
[<802ba9a8>] (__lock_acquire) from [<802be840>] (lock_acquire.part.0+0xf0/0x554 kernel/locking/lockdep.c:5442)
[<802be750>] (lock_acquire.part.0) from [<802bed10>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5415)
[<802beca4>] (lock_acquire) from [<81560548>] (seqcount_lockdep_reader_access include/linux/seqlock.h:103 [inline])
[<802beca4>] (lock_acquire) from [<81560548>] (__u64_stats_fetch_begin include/linux/u64_stats_sync.h:164 [inline])
[<802beca4>] (lock_acquire) from [<81560548>] (u64_stats_fetch_begin include/linux/u64_stats_sync.h:175 [inline])
[<802beca4>] (lock_acquire) from [<81560548>] (nsim_get_stats64+0xdc/0xf0 drivers/net/netdevsim/netdev.c:70)
[<8156046c>] (nsim_get_stats64) from [<81e2efa0>] (dev_get_stats+0x44/0xd0 net/core/dev.c:10405)
[<81e2ef5c>] (dev_get_stats) from [<81e53204>] (rtnl_fill_stats+0x38/0x120 net/core/rtnetlink.c:1211)
[<81e531cc>] (rtnl_fill_stats) from [<81e59d58>] (rtnl_fill_ifinfo+0x6d4/0x148c net/core/rtnetlink.c:1783)
[<81e59684>] (rtnl_fill_ifinfo) from [<81e5ceb4>] (rtmsg_ifinfo_build_skb+0x9c/0x108 net/core/rtnetlink.c:3798)
[<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3830 [inline])
[<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3821 [inline])
[<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo+0x44/0x70 net/core/rtnetlink.c:3839)
[<81e5d068>] (rtmsg_ifinfo) from [<81e45c2c>] (register_netdevice+0x664/0x68c net/core/dev.c:10103)
[<81e455c8>] (register_netdevice) from [<815608bc>] (nsim_create+0xf8/0x124 drivers/net/netdevsim/netdev.c:317)
[<815607c4>] (nsim_create) from [<81561184>] (__nsim_dev_port_add+0x108/0x188 drivers/net/netdevsim/dev.c:941)
[<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_port_add_all drivers/net/netdevsim/dev.c:990 [inline])
[<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_probe+0x5cc/0x750 drivers/net/netdevsim/dev.c:1119)
[<81561b0c>] (nsim_dev_probe) from [<815661dc>] (nsim_bus_probe+0x10/0x14 drivers/net/netdevsim/bus.c:287)
[<815661cc>] (nsim_bus_probe) from [<811724c0>] (really_probe+0x100/0x50c drivers/base/dd.c:554)
[<811723c0>] (really_probe) from [<811729c4>] (driver_probe_device+0xf8/0x1c8 drivers/base/dd.c:740)
[<811728cc>] (driver_probe_device) from [<81172fe4>] (__device_attach_driver+0x8c/0xf0 drivers/base/dd.c:846)
[<81172f58>] (__device_attach_driver) from [<8116fee0>] (bus_for_each_drv+0x88/0xd8 drivers/base/bus.c:431)
[<8116fe58>] (bus_for_each_drv) from [<81172c6c>] (__device_attach+0xdc/0x1d0 drivers/base/dd.c:914)
[<81172b90>] (__device_attach) from [<8117305c>] (device_initial_probe+0x14/0x18 drivers/base/dd.c:961)
[<81173048>] (device_initial_probe) from [<81171358>] (bus_probe_device+0x90/0x98 drivers/base/bus.c:491)
[<811712c8>] (bus_probe_device) from [<8116e77c>] (device_add+0x320/0x824 drivers/base/core.c:3109)
[<8116e45c>] (device_add) from [<8116ec9c>] (device_register+0x1c/0x20 drivers/base/core.c:3182)
[<8116ec80>] (device_register) from [<81566710>] (nsim_bus_dev_new drivers/net/netdevsim/bus.c:336 [inline])
[<8116ec80>] (device_register) from [<81566710>] (new_device_store+0x178/0x208 drivers/net/netdevsim/bus.c:215)
[<81566598>] (new_device_store) from [<8116fcb4>] (bus_attr_store+0x2c/0x38 drivers/base/bus.c:122)
[<8116fc88>] (bus_attr_store) from [<805b4b8c>] (sysfs_kf_write+0x48/0x54 fs/sysfs/file.c:139)
[<805b4b44>] (sysfs_kf_write) from [<805b3c90>] (kernfs_fop_write_iter+0x128/0x1ec fs/kernfs/file.c:296)
[<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (call_write_iter include/linux/fs.h:1901 [inline])
[<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (new_sync_write fs/read_write.c:518 [inline])
[<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (vfs_write+0x3dc/0x57c fs/read_write.c:605)
[<804d1f20>] (vfs_write) from [<804d2604>] (ksys_write+0x68/0xec fs/read_write.c:658)
[<804d259c>] (ksys_write) from [<804d2698>] (__do_sys_write fs/read_write.c:670 [inline])
[<804d259c>] (ksys_write) from [<804d2698>] (sys_write+0x10/0x14 fs/read_write.c:667)
[<804d2688>] (sys_write) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)

Fixes: 83c9e13aa39a ("netdevsim: add software driver for testing offloads")
Reported-by: [email protected]
Tested-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Hillf Danton <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mptcp-fixes'

Mat Martineau says:

====================
mptcp: Fixes for v5.12

These patches from the MPTCP tree fix a few multipath TCP issues:

Patches 1 and 5 clear some stale pointers when subflows close.

Patches 2, 4, and 9 plug some memory leaks.

Patch 3 fixes a memory accounting error identified by syzkaller.

Patches 6 and 7 fix a race condition that slowed data transmission.

Patch 8 adds missing wakeups when write buffer space is freed.
====================

Signed-off-by: David S. Miller <[email protected]>

mptcp: free resources when the port number is mismatched

When the port number is mismatched with the announced ones, use
'goto dispose_child' to free the resources instead of using 'goto out'.

This patch also moves the port number checking code in
subflow_syn_recv_sock before mptcp_finish_join, otherwise subflow_drop_ctx
will fail in dispose_child.

Fixes: 5bc56388c74f ("mptcp: add port number check for MP_JOIN")
Reported-by: Paolo Abeni <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: fix missing wakeup

__mptcp_clean_una() can free write memory and should wake-up
user-space processes when needed.

When such function is invoked by the MPTCP receive path, the wakeup
is not needed, as the TCP stack will later trigger subflow_write_space
which will do the wakeup as needed.

Other __mptcp_clean_una() call sites need an additional wakeup check
Let's bundle the relevant code in a new helper and use it.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/165
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
Fixes: 64b9cea7a0af ("mptcp: fix spurious retransmissions")
Tested-by: Matthieu Baerts <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: fix race in release_cb

If we receive a MPTCP_PUSH_PENDING even from a subflow when
mptcp_release_cb() is serving the previous one, the latter
will be delayed up to the next release_sock(msk).

Address the issue implementing a test/serve loop for such
event.

Additionally rename the push helper to __mptcp_push_pending()
to be more consistent with the existing code.

Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: factor out __mptcp_retrans helper()

Will simplify the following patch, no functional change
intended.

Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: reset 'first' and ack_hint on subflow close

Just like with last_snd, we have to NULL 'first' on subflow close.

ack_hint isn't strictly required (its never dereferenced), but better to
clear this explicitly as well instead of making it an exception.

msk->first is dereferenced unconditionally at accept time, but
at that point the ssk is not on the conn_list yet -- this means
worker can't see it when iterating the conn_list.

Reported-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: dispose initial struct socket when its subflow is closed

Christoph Paasch reported following crash:
dst_release underflow
WARNING: CPU: 0 PID: 1319 at net/core/dst.c:175 dst_release+0xc1/0xd0 net/core/dst.c:175
CPU: 0 PID: 1319 Comm: syz-executor217 Not tainted 5.11.0-rc6af8e85128b4d0d24083c5cac646e891227052e0c #70
Call Trace:
rt_cache_route+0x12e/0x140 net/ipv4/route.c:1503
rt_set_nexthop.constprop.0+0x1fc/0x590 net/ipv4/route.c:1612
__mkroute_output net/ipv4/route.c:2484 [inline]
...

The worker leaves msk->subflow alone even when it
happened to close the subflow ssk associated with it.

Fixes: 866f26f2a9c33b ("mptcp: always graft subflow socket to parent")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/157
Reported-by: Christoph Paasch <[email protected]>
Suggested-by: Paolo Abeni <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: fix memory accounting on allocation error

In case of memory pressure the MPTCP xmit path keeps
at most a single skb in the tx cache, eventually freeing
additional ones.

The associated counter for forward memory is not update
accordingly, and that causes the following splat:

WARNING: CPU: 0 PID: 12 at net/core/stream.c:208 sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208
Modules linked in:
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.11.0-rc2 #59
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208
Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 00 01 00 00 e9 60 ff ff ff e8 2f 24 d3 fe 0f 0b eb 97 e8 26 24 d3 fe <0f> 0b eb a0 e8 1d 24 d3 fe 0f 0b e9 a5 fe ff ff 4c 89 e7 e8 0e d0
RSP: 0018:ffffc900000c7bc8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88810030ac40 RSI: ffffffff8262ca4a RDI: 0000000000000003
RBP: 0000000000000d00 R08: 0000000000000000 R09: ffffffff85095aa7
R10: ffffffff8262c9ea R11: 0000000000000001 R12: ffff888108908100
R13: ffffffff85095aa0 R14: ffffc900000c7c48 R15: 1ffff92000018f85
FS: 0000000000000000(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa7444baef8 CR3: 0000000035ee9005 CR4: 0000000000170ef0
Call Trace:
__mptcp_destroy_sock+0x4a7/0x6c0 net/mptcp/protocol.c:2547
mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272
process_one_work+0x896/0x1170 kernel/workqueue.c:2275
worker_thread+0x605/0x1350 kernel/workqueue.c:2421
kthread+0x344/0x410 kernel/kthread.c:292
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296

At close time, as reported by syzkaller/Christoph.

This change address the issue properly updating the fwd
allocated memory counter in the error path.

Reported-by: Christoph Paasch <[email protected]>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/136
Fixes: 724cfd2ee8aa ("mptcp: allocate TX skbs in msk context")
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: put subflow sock on connect error

mptcp_add_pending_subflow() performs a sock_hold() on the subflow,
then adds the subflow to the join list.

Without a sock_put the subflow sk won't be freed in case connect() fails.

unreferenced object 0xffff88810c03b100 (size 3000):
[..]
    sk_prot_alloc.isra.0+0x2f/0x110
    sk_alloc+0x5d/0xc20
    inet6_create+0x2b7/0xd30
    __sock_create+0x17f/0x410
    mptcp_subflow_create_socket+0xff/0x9c0
    __mptcp_subflow_connect+0x1da/0xaf0
    mptcp_pm_nl_work+0x6e0/0x1120
    mptcp_worker+0x508/0x9a0

Fixes: 5b950ff4331ddda ("mptcp: link MPC subflow into msk only after accept")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

mptcp: reset last_snd on subflow close

Send logic caches last active subflow in the msk, so it needs to be
cleared when the cached subflow is closed.

Fixes: d5f49190def61c ("mptcp: allow picking different xmit subflows")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/155
Reported-by: Christoph Paasch <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sched: avoid duplicates in classes dump

This is a follow up of commit ea3274695353 ("net: sched: avoid
duplicates in qdisc dump") which has fixed the issue only for the qdisc
dump.

The duplicate printing also occurs when dumping the classes via
tc class show dev eth0

Fixes: 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable")
Signed-off-by: Maximilian Heyne <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: usb: qmi_wwan: allow qmimux add/del with master up

There's no reason for preventing the creation and removal
of qmimux network interfaces when the underlying interface
is up.

This makes qmi_wwan mux implementation more similar to the
rmnet one, simplifying userspace management of the same
logical interfaces.

Fixes: c6adf77953bc ("net: usb: qmi_wwan: add qmap mux protocol support")
Reported-by: Aleksander Morgado <[email protected]>
Signed-off-by: Daniele Palmas <[email protected]>
Acked-by: Bjørn Mork <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: sja1105: fix ucast/bcast flooding always remaining enabled

In the blamed patch I managed to introduce a bug while moving code
around: the same logic is applied to the ucast_egress_floods and
bcast_egress_floods variables both on the "if" and the "else" branches.

This is clearly an unintended change compared to how the code used to be
prior to that bugfix, so restore it.

Fixes: 7f7ccdea8c73 ("net: dsa: sja1105: fix leakage of flooded frames outside bridging domain")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10

When using MLO_AN_PHY or MLO_AN_FIXED, the MII_BMCR of the SGMII PCS is
read before resetting the switch so it can be reprogrammed afterwards.
This works for the speeds of 1Gbps and 100Mbps, but not for 10Mbps,
because SPEED_10 is actually 0, so AND-ing anything with 0 is false,
therefore that last branch is dead code.

Do what others do (genphy_read_status_fixed, phy_mii_ioctl) and just
remove the check for SPEED_10, let it fall into the default case.

Fixes: ffe10e679cec ("net: dsa: sja1105: Add support for the SGMII port")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: mscc: ocelot: properly reject destination IP keys in VCAP IS1

An attempt is made to warn the user about the fact that VCAP IS1 cannot
offload keys matching on destination IP (at least given the current half
key format), but sadly that warning fails miserably in practice, due to
the fact that it operates on an uninitialized "match" variable. We must
first decode the keys from the flow rule.

Fixes: 75944fda1dfe ("net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1")
Reported-by: Colin Ian King <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'nexthop-blackhole'

Ido Schimmel says:

====================
nexthop: Do not flush blackhole nexthops when loopback goes down

Patch #1 prevents blackhole nexthops from being flushed when the
loopback device goes down given that as far as user space is concerned,
these nexthops do not have a nexthop device.

Patch #2 adds a test case.

There are no regressions in fib_nexthops.sh with this change:

# ./fib_nexthops.sh
...
Tests passed: 165
Tests failed: 0
====================

Signed-off-by: David S. Miller <[email protected]>

selftests: fib_nexthops: Test blackhole nexthops when loopback goes down

Test that blackhole nexthops are not flushed when the loopback device
goes down.

Output without previous patch:

# ./fib_nexthops.sh -t basic

Basic functional tests
----------------------
TEST: List with nothing defined                                     [ OK ]
TEST: Nexthop get on non-existent id                                [ OK ]
TEST: Nexthop with no device or gateway                             [ OK ]
TEST: Nexthop with down device                                      [ OK ]
TEST: Nexthop with device that is linkdown                          [ OK ]
TEST: Nexthop with device only                                      [ OK ]
TEST: Nexthop with duplicate id                                     [ OK ]
TEST: Blackhole nexthop                                             [ OK ]
TEST: Blackhole nexthop with other attributes                       [ OK ]
TEST: Blackhole nexthop with loopback device down                   [FAIL]
TEST: Create group                                                  [ OK ]
TEST: Create group with blackhole nexthop                           [FAIL]
TEST: Create multipath group where 1 path is a blackhole            [ OK ]
TEST: Multipath group can not have a member replaced by blackhole   [ OK ]
TEST: Create group with non-existent nexthop                        [ OK ]
TEST: Create group with same nexthop multiple times                 [ OK ]
TEST: Replace nexthop with nexthop group                            [ OK ]
TEST: Replace nexthop group with nexthop                            [ OK ]
TEST: Nexthop group and device                                      [ OK ]
TEST: Test proto flush                                              [ OK ]
TEST: Nexthop group and blackhole                                   [ OK ]

Tests passed:  19
Tests failed:   2

Output with previous patch:

# ./fib_nexthops.sh -t basic

Basic functional tests
----------------------
TEST: List with nothing defined                                     [ OK ]
TEST: Nexthop get on non-existent id                                [ OK ]
TEST: Nexthop with no device or gateway                             [ OK ]
TEST: Nexthop with down device                                      [ OK ]
TEST: Nexthop with device that is linkdown                          [ OK ]
TEST: Nexthop with device only                                      [ OK ]
TEST: Nexthop with duplicate id                                     [ OK ]
TEST: Blackhole nexthop                                             [ OK ]
TEST: Blackhole nexthop with other attributes                       [ OK ]
TEST: Blackhole nexthop with loopback device down                   [ OK ]
TEST: Create group                                                  [ OK ]
TEST: Create group with blackhole nexthop                           [ OK ]
TEST: Create multipath group where 1 path is a blackhole            [ OK ]
TEST: Multipath group can not have a member replaced by blackhole   [ OK ]
TEST: Create group with non-existent nexthop                        [ OK ]
TEST: Create group with same nexthop multiple times                 [ OK ]
TEST: Replace nexthop with nexthop group                            [ OK ]
TEST: Replace nexthop group with nexthop                            [ OK ]
TEST: Nexthop group and device                                      [ OK ]
TEST: Test proto flush                                              [ OK ]
TEST: Nexthop group and blackhole                                   [ OK ]

Tests passed:  21
Tests failed:   0

Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nexthop: Do not flush blackhole nexthops when loopback goes down

As far as user space is concerned, blackhole nexthops do not have a
nexthop device and therefore should not be affected by the
administrative or carrier state of any netdev.

However, when the loopback netdev goes down all the blackhole nexthops
are flushed. This happens because internally the kernel associates
blackhole nexthops with the loopback netdev.

This behavior is both confusing to those not familiar with kernel
internals and also diverges from the legacy API where blackhole IPv4
routes are not flushed when the loopback netdev goes down:

# ip route add blackhole 198.51.100.0/24
# ip link set dev lo down
# ip route show 198.51.100.0/24
blackhole 198.51.100.0/24

Blackhole IPv6 routes are flushed, but at least user space knows that
they are associated with the loopback netdev:

# ip -6 route show 2001:db8:1::/64
blackhole 2001:db8:1::/64 dev lo metric 1024 pref medium

Fix this by only flushing blackhole nexthops when the loopback netdev is
unregistered.

Fixes: ab84be7e54fc ("net: Initial nexthop code")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-by: Donald Sharp <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: sctp: trivial: fix typo in comment

Fix typo of 'overflow' for comment in sctp_tsnmap_check().

Reported-by: Gustavo A. R. Silva <[email protected]>
Signed-off-by: Drew Fustini <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-03-03

This series contains updates to ixgbe and ixgbevf drivers.

Bartosz Golaszewski does not error on -ENODEV from ixgbe_mii_bus_init()
as this is valid for some devices with a shared bus for ixgbe.

Antony Antony adds a check to fail for non transport mode SA with
offload as this is not supported for ixgbe and ixgbevf.

Dinghao Liu fixes a memory leak on failure to program a perfect filter
for ixgbe.
====================

Signed-off-by: David S. Miller <[email protected]>

Merge tag 'tpmdd-next-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm fixes Jarkko Sakkinen:
"Three fixes for rc2"

* tag 'tpmdd-next-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: Remove unintentional dump_stack() call
  tpm, tpm_tis: Decorate tpm_tis_gen_interrupt() with request_locality()
  tpm, tpm_tis: Decorate tpm_get_timeouts() with request_locality()

dm verity: fix FEC for RS roots unaligned to block size

Optional Forward Error Correction (FEC) code in dm-verity uses
Reed-Solomon code and should support roots from 2 to 24.

The error correction parity bytes (of roots lengths per RS block) are
stored on a separate device in sequence without any padding.

Currently, to access FEC device, the dm-verity-fec code uses dm-bufio
client with block size set to verity data block (usually 4096 or 512
bytes).

Because this block size is not divisible by some (most!) of the roots
supported lengths, data repair cannot work for partially stored parity
bytes.

This fix changes FEC device dm-bufio block size to "roots << SECTOR_SHIFT"
where we can be sure that the full parity data is always available.
(There cannot be partial FEC blocks because parity must cover whole
sectors.)

Because the optional FEC starting offset could be unaligned to this
new block size, we have to use dm_bufio_set_sector_offset() to
configure it.

The problem is easily reproduced using veritysetup, e.g. for roots=13:

  # create verity device with RS FEC
  dd if=/dev/urandom of=data.img bs=4096 count=8 status=none
  veritysetup format data.img hash.img --fec-device=fec.img --fec-roots=13 | awk '/^Root hash/{ print $3 }' >roothash

  # create an erasure that should be always repairable with this roots setting
  dd if=/dev/zero of=data.img conv=notrunc bs=1 count=8 seek=4088 status=none

  # try to read it through dm-verity
  veritysetup open data.img test hash.img --fec-device=fec.img --fec-roots=13 $(cat roothash)
  dd if=/dev/mapper/test of=/dev/null bs=4096 status=noxfer
  # wait for possible recursive recovery in kernel
  udevadm settle
  veritysetup close test

With this fix, errors are properly repaired.
  device-mapper: verity-fec: 7:1: FEC 0: corrected 8 errors
  ...

Without it, FEC code usually ends on unrecoverable failure in RS decoder:
  device-mapper: verity-fec: 7:1: FEC 0: failed to correct: -74
  ...

This problem is present in all kernels since the FEC code's
introduction (kernel 4.5).

It is thought that this problem is not visible in Android ecosystem
because it always uses a default RS roots=2.

Depends-on: a14e5ec66a7a ("dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size")
Signed-off-by: Milan Broz <[email protected]>
Tested-by: Jérôme Carretero <[email protected]>
Reviewed-by: Sami Tolvanen <[email protected]>
Cc: [email protected] # 4.5+
Signed-off-by: Mike Snitzer <[email protected]>

dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size

dm_bufio_get_device_size returns the device size in blocks. Before
returning the value, we must subtract the nubmer of starting
sectors. The number of starting sectors may not be divisible by block
size.

Note that currently, no target is using dm_bufio_set_sector_offset and
dm_bufio_get_device_size simultaneously, so this change has no effect.
However, an upcoming dm-verity-fec fix needs this change.

Signed-off-by: Mikulas Patocka <[email protected]>
Reviewed-by: Milan Broz <[email protected]>
Cc: [email protected]
Signed-off-by: Mike Snitzer <[email protected]>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:

- Doc fixes

- selftests fixes

- Add runstate information to the new Xen support

- Allow compiling out the Xen interface

- 32-bit PAE without EPT bugfix

- NULL pointer dereference bugfix

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Clear the CR4 register on reset
  KVM: x86/xen: Add support for vCPU runstate information
  KVM: x86/xen: Fix return code when clearing vcpu_info and vcpu_time_info
  selftests: kvm: Mmap the entire vcpu mmap area
  KVM: Documentation: Fix index for KVM_CAP_PPC_DAWR1
  KVM: x86: allow compiling out the Xen hypercall interface
  KVM: xen: flush deferred static key before checking it
  KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled
  KVM: x86: hyper-v: Fix Hyper-V context null-ptr-deref
  KVM: x86: remove misplaced comment on active_mmu_pages
  KVM: Documentation: rectify rst markup in kvm_run->flags
  Documentation: kvm: fix messy conversion from .txt to .rst

Merge tag 'for-linus-5.12b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"Two security issues (XSA-367 and XSA-369)"

* tag 'for-linus-5.12b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: fix p2m size in dom0 for disabled memory hotplug case
  xen-netback: respect gnttab_map_refs()'s return value
  Xen/gnttab: handle p2m update errors on a per-slot basis

Merge tag 'sound-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Slightly bulky changes are seen at this time, mostly for dealing with
  the messed up Kconfig for ASoC Intel SOF stuff. The driver and its
  code was split to each module per platform now, which is far more
  straightforward. This should cover the randconfig problems, and more
  importantly, improve the actual device handling as well.

  Other than that, nothing particular stands out: the HDMI PCM
  assignment fix for Intel Tigerlake, MIPS n64 error handling fix, and
  the usual suspects, HD-audio / USB-audio quirks"

* tag 'sound-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
  ALSA: hda/realtek: Apply dual codec quirks for MSI Godlike X570 board
  ALSA: hda/realtek: Add quirk for Intel NUC 10
  ALSA: hda/hdmi: let new platforms assign the pcm slot dynamically
  ALSA: hda/realtek: Add quirk for Clevo NH55RZQ
  ALSA: hda: intel-sdw-acpi: add missing include files
  ALSA: hda: move Intel SoundWire ACPI scan to dedicated module
  ASoC: SOF: Intel: SoundWire: simplify Kconfig
  ASoC: SOF: pci: move DSP_CONFIG use to platform-specific drivers
  ASoC: SOF: pci: split PCI into different drivers
  ASoC: SOF: ACPI: avoid reverse module dependency
  ASoC: soc-acpi: allow for partial match in parent name
  ALSA: hda: intel-nhlt: verify config type
  ALSA: hda: fix kernel-doc warnings
  ALSA: usb-audio: Fix Pioneer DJM devices URB_CONTROL request direction to set samplerate
  ALSA: usb-audio: use Corsair Virtuoso mapping for Corsair Virtuoso SE
  ALSA: hda/realtek: Enable headset mic of Acer SWIFT with ALC256
  ALSA: ctxfi: cthw20k2: fix mask on conf to allow 4 bits
  ALSA: usb-audio: Allow modifying parameters with succeeding hw_params calls
  ALSA: usb-audio: Drop bogus dB range in too low level
  ALSA: usb-audio: Don't abort even if the clock rate differs
  ...

ixgbe: Fix memleak in ixgbe_configure_clsu32

When ixgbe_fdir_write_perfect_filter_82599() fails,
input allocated by kzalloc() has not been freed,
which leads to memleak.

Signed-off-by: Dinghao Liu <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

ixgbe: fail to create xfrm offload of IPsec tunnel mode SA

Based on talks and indirect references ixgbe IPsec offlod do not
support IPsec tunnel mode offload. It can only support IPsec transport
mode offload. Now explicitly fail when creating non transport mode SA
with offload to avoid false performance expectations.

Fixes: 63a67fe229ea ("ixgbe: add ipsec offload add and remove SA")
Signed-off-by: Antony Antony <[email protected]>
Acked-by: Shannon Nelson <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>

btrfs: zoned: do not account freed region of read-only block group as zone_unusable

We migrate zone unusable bytes to read-only bytes when a block group is
set to read-only, and account all the free region as bytes_readonly.
Thus, we should not increase block_group->zone_unusable when the block
group is read-only.

Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Naohiro Aota <[email protected]>
Signed-off-by: David Sterba <[email protected]>

btrfs: zoned: use sector_t for zone sectors

We need to use sector_t for zone_sectors, or it would set the zone size
to zero when the size >= 4GB (= 2^24 sectors) by shifting the
zone_sectors value by SECTOR_SHIFT. We're assuming zones sizes up to
8GiB.

Fixes: 5b316468983d ("btrfs: get zone information of zoned block devices")
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Naohiro Aota <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>

bpf: Account for BPF_FETCH in insn_has_def32()

insn_has_def32() returns false for 32-bit BPF_FETCH insns. This makes
adjust_insn_aux_data() incorrectly set zext_dst, as can be seen in [1].
This happens because insn_no_def() does not know about the BPF_FETCH
variants of BPF_STX.

Fix in two steps.

First, replace insn_no_def() with insn_def_regno(), which returns the
register an insn defines. Normally insn_no_def() calls are followed by
insn->dst_reg uses; replace those with the insn_def_regno() return
value.

Second, adjust the BPF_STX special case in is_reg64() to deal with
queries made from opt_subreg_zext_lo32_rnd_hi32(), where the state
information is no longer available. Add a comment, since the purpose
of this special case is not clear at first glance.

[1] https://lore.kernel.org/bpf/20210223150845.1857620 [email protected]/

Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Acked-by: Brendan Jackman <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

libbpf: Clear map_info before each bpf_obj_get_info_by_fd

xsk_lookup_bpf_maps, based on prog_fd, looks whether current prog has a
reference to XSKMAP. BPF prog can include insns that work on various BPF
maps and this is covered by iterating through map_ids.

The bpf_map_info that is passed to bpf_obj_get_info_by_fd for filling
needs to be cleared at each iteration, so that it doesn't contain any
outdated fields and that is currently missing in the function of
interest.

To fix that, zero-init map_info via memset before each
bpf_obj_get_info_by_fd call.

Also, since the area of this code is touched, in general strcmp is
considered harmful, so let's convert it to strncmp and provide the
size of the array name for current map_info.

While at it, do s/continue/break/ once we have found the xsks_map to
terminate the search.

Fixes: 5750902a6e9b ("libbpf: proper XSKMAP cleanup")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

samples, bpf: Add missing munmap in xdpsock

We mmap the umem region, but we never munmap it.
Add the missing call at the end of the cleanup.

Fixes: 3945b37a975d ("samples/bpf: use hugepages in xdpsock app")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

xsk: Remove dangling function declaration from header file

xdp_umem_query() is dead for a long time, drop the declaration from
include/linux/netdevice.h

Fixes: c9b47cc1fabc ("xsk: fix bug when trying to use both copy and zero-copy on one queue id")
Signed-off-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]