Git Repo - linux.git/log

sfc: fix devlink info error handling

Avoid early devlink info return if errors arise with MCDI commands
executed for getting the required info from the device. The rationale
is some commands can fail but later ones could still give useful data.
Moreover, some nvram partitions could not be present which needs to be
handled as a non error.

The specific errors are reported through system messages and if any
error appears, it will be reported generically through extack.

Fixes 14743ddd2495 ("sfc: add devlink info support for ef100")
Signed-off-by: Alejandro Lucero <[email protected]>
Acked-by: Martin Habets <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/smc: Reset connection when trying to use SMCRv2 fails.

We found a crash when using SMCRv2 with 2 Mellanox ConnectX-4. It
can be reproduced by:

- smc_run nginx
- smc_run wrk -t 32 -c 500 -d 30 http://<ip>:<port>

BUG: kernel NULL pointer dereference, address: 0000000000000014
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8000000108713067 P4D 8000000108713067 PUD 151127067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 4 PID: 2441 Comm: kworker/4:249 Kdump: loaded Tainted: G        W   E      6.4.0-rc1+ #42
Workqueue: smc_hs_wq smc_listen_work [smc]
RIP: 0010:smc_clc_send_confirm_accept+0x284/0x580 [smc]
RSP: 0018:ffffb8294b2d7c78 EFLAGS: 00010a06
RAX: ffff8f1873238880 RBX: ffffb8294b2d7dc8 RCX: 0000000000000000
RDX: 00000000000000b4 RSI: 0000000000000001 RDI: 0000000000b40c00
RBP: ffffb8294b2d7db8 R08: ffff8f1815c5860c R09: 0000000000000000
R10: 0000000000000400 R11: 0000000000000000 R12: ffff8f1846f56180
R13: ffff8f1815c5860c R14: 0000000000000001 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8f1aefd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000014 CR3: 00000001027a0001 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  <TASK>
  ? mlx5_ib_map_mr_sg+0xa1/0xd0 [mlx5_ib]
  ? smcr_buf_map_link+0x24b/0x290 [smc]
  ? __smc_buf_create+0x4ee/0x9b0 [smc]
  smc_clc_send_accept+0x4c/0xb0 [smc]
  smc_listen_work+0x346/0x650 [smc]
  ? __schedule+0x279/0x820
  process_one_work+0x1e5/0x3f0
  worker_thread+0x4d/0x2f0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xe5/0x120
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x2c/0x50
  </TASK>

During the CLC handshake, server sequentially tries available SMCRv2
and SMCRv1 devices in smc_listen_work().

If an SMCRv2 device is found. SMCv2 based link group and link will be
assigned to the connection. Then assumed that some buffer assignment
errors happen later in the CLC handshake, such as RMB registration
failure, server will give up SMCRv2 and try SMCRv1 device instead. But
the resources assigned to the connection won't be reset.

When server tries SMCRv1 device, the connection creation process will
be executed again. Since conn->lnk has been assigned when trying SMCRv2,
it will not be set to the correct SMCRv1 link in
smcr_lgr_conn_assign_link(). So in such situation, conn->lgr points to
correct SMCRv1 link group but conn->lnk points to the SMCRv2 link
mistakenly.

Then in smc_clc_send_confirm_accept(), conn->rmb_desc->mr[link->link_idx]
will be accessed. Since the link->link_idx is not correct, the related
MR may not have been initialized, so crash happens.

| Try SMCRv2 device first
|     |-> conn->lgr: assign existed SMCRv2 link group;
|     |-> conn->link: assign existed SMCRv2 link (link_idx may be 1 in SMC_LGR_SYMMETRIC);
|     |-> sndbuf & RMB creation fails, quit;
|
| Try SMCRv1 device then
|     |-> conn->lgr: create SMCRv1 link group and assign;
|     |-> conn->link: keep SMCRv2 link mistakenly;
|     |-> sndbuf & RMB creation succeed, only RMB->mr[link_idx = 0]
|         initialized.
|
| Then smc_clc_send_confirm_accept() accesses
| conn->rmb_desc->mr[conn->link->link_idx, which is 1], then crash.
v

This patch tries to fix this by cleaning conn->lnk before assigning
link. In addition, it is better to reset the connection and clean the
resources assigned if trying SMCRv2 failed in buffer creation or
registration.

Fixes: e49300a6bf62 ("net/smc: add listen processing for SMC-Rv2")
Link: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Wen Gu <[email protected]>
Reviewed-by: Tony Lu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests: fib_tests: mute cleanup error message

In the end of the test, there will be an error message induced by the
`ip netns del ns1` command in cleanup()

  Tests passed: 201
  Tests failed:   0
  Cannot remove namespace file "/run/netns/ns1": No such file or directory

This can even be reproduced with just `./fib_tests.sh -h` as we're
calling cleanup() on exit.

Redirect the error message to /dev/null to mute it.

V2: Update commit message and fixes tag.
V3: resubmit due to missing netdev ML in V2

Fixes: b60417a9f2b8 ("selftest: fib_tests: Always cleanup before exit")
Signed-off-by: Po-Hsu Lin <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx5e: do as little as possible in napi poll when budget is 0

NAPI gets called with budget of 0 from netpoll, which has interrupts
disabled. We should try to free some space on Tx rings and nothing
else.

Specifically do not try to handle XDP TX or try to refill Rx buffers -
we can't use the page pool from IRQ context. Don't check if IRQs moved,
either, that makes no sense in netpoll. Netpoll calls _all_ the rings
from whatever CPU it happens to be invoked on.

In general do as little as possible, the work quickly adds up when
there's tens of rings to poll.

The immediate stack trace I was seeing is:

    __do_softirq+0xd1/0x2c0
    __local_bh_enable_ip+0xc7/0x120
    </IRQ>
    <TASK>
    page_pool_put_defragged_page+0x267/0x320
    mlx5e_free_xdpsq_desc+0x99/0xd0
    mlx5e_poll_xdpsq_cq+0x138/0x3b0
    mlx5e_napi_poll+0xc3/0x8b0
    netpoll_poll_dev+0xce/0x150

AFAIU page pool takes a BH lock, releases it and since BH is now
enabled tries to run softirqs.

Reviewed-by: Tariq Toukan <[email protected]>
Fixes: 60bbf7eeef10 ("mlx5: use page_pool for xdp_return_frame call")
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'tls-fixes'

Jakub Kicinski says:

====================
tls: rx: strp: fix inline crypto offload

The local strparser version I added to TLS does not preserve
decryption status, which breaks inline crypto (NIC offload).
====================

Signed-off-by: David S. Miller <[email protected]>

tls: rx: strp: don't use GFP_KERNEL in softirq context

When receive buffer is small, or the TCP rx queue looks too
complicated to bother using it directly - we allocate a new
skb and copy data into it.

We already use sk->sk_allocation... but nothing actually
sets it to GFP_ATOMIC on the ->sk_data_ready() path.

Users of HW offload are far more likely to experience problems
due to scheduling while atomic. "Copy mode" is very rarely
triggered with SW crypto.

Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tls: rx: strp: preserve decryption status of skbs when needed

When receive buffer is small we try to copy out the data from
TCP into a skb maintained by TLS to prevent connection from
stalling. Unfortunately if a single record is made up of a mix
of decrypted and non-decrypted skbs combining them into a single
skb leads to loss of decryption status, resulting in decryption
errors or data corruption.

Similarly when trying to use TCP receive queue directly we need
to make sure that all the skbs within the record have the same
status. If we don't the mixed status will be detected correctly
but we'll CoW the anchor, again collapsing it into a single paged
skb without decrypted status preserved. So the "fixup" code will
not know which parts of skb to re-encrypt.

Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tls: rx: strp: factor out copying skb data

We'll need to copy input skbs individually in the next patch.
Factor that code out (without assuming we're copying a full record).

Tested-by: Shai Amiram <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tls: rx: strp: fix determining record length in copy mode

We call tls_rx_msg_size(skb) before doing skb->len += chunk.
So the tls_rx_msg_size() code will see old skb->len, most
likely leading to an over-read.

Worst case we will over read an entire record, next iteration
will try to trim the skb but may end up turning frag len negative
or discarding the subsequent record (since we already told TCP
we've read it during previous read but now we'll trim it out of
the skb).

Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tls: rx: strp: force mixed decrypted records into copy mode

If a record is partially decrypted we'll have to CoW it, anyway,
so go into copy mode and allocate a writable skb right away.

This will make subsequent fix simpler because we won't have to
teach tls_strp_msg_make_copy() how to copy skbs while preserving
decrypt status.

Tested-by: Shai Amiram <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tls: rx: strp: set the skb->len of detached / CoW'ed skbs

alloc_skb_with_frags() fills in page frag sizes but does not
set skb->len and skb->data_len. Set those correctly otherwise
device offload will most likely generate an empty skb and
hit the BUG() at the end of __skb_nsg().

Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Shai Amiram <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tls: rx: device: fix checking decryption status

skb->len covers the entire skb, including the frag_list.
In fact we're guaranteed that rxm->full_len <= skb->len,
so since the change under Fixes we were not checking decrypt
status of any skb but the first.

Note that the skb_pagelen() added here may feel a bit costly,
but it's removed by subsequent fixes, anyway.

Reported-by: Tariq Toukan <[email protected]>
Fixes: 86b259f6f888 ("tls: rx: device: bound the frag walk")
Tested-by: Shai Amiram <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: cdc_ncm: Deal with too low values of dwNtbOutMaxSize

Currently in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is lower than
the calculated "min" value, but greater than zero, the logic sets
tx_max to dwNtbOutMaxSize. This is then used to allocate a new SKB in
cdc_ncm_fill_tx_frame() where all the data is handled.

For small values of dwNtbOutMaxSize the memory allocated during
alloc_skb(dwNtbOutMaxSize, GFP_ATOMIC) will have the same size, due to
how size is aligned at alloc time:
size = SKB_DATA_ALIGN(size);
size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
Thus we hit the same bug that we tried to squash with
commit 2be6d4d16a084 ("net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero")

Low values of dwNtbOutMaxSize do not cause an issue presently because at
alloc_skb() time more memory (512b) is allocated than required for the
SKB headers alone (320b), leaving some space (512b - 320b = 192b)
for CDC data (172b).

However, if more elements (for example 3 x u64 = [24b]) were added to
one of the SKB header structs, say 'struct skb_shared_info',
increasing its original size (320b [320b aligned]) to something larger
(344b [384b aligned]), then suddenly the CDC data (172b) no longer
fits in the spare SKB data area (512b - 384b = 128b).

Consequently the SKB bounds checking semantics fails and panics:

skbuff: skb_over_panic: text:ffffffff831f755b len:184 put:172 head:ffff88811f1c6c00 data:ffff88811f1c6c00 tail:0xb8 end:0x80 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:113!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 57 Comm: kworker/0:2 Not tainted 5.15.106-syzkaller-00249-g19c0ed55a470 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
Workqueue: mld mld_ifc_work
RIP: 0010:skb_panic net/core/skbuff.c:113 [inline]
RIP: 0010:skb_over_panic+0x14c/0x150 net/core/skbuff.c:118
[snip]
Call Trace:
<TASK>
skb_put+0x151/0x210 net/core/skbuff.c:2047
skb_put_zero include/linux/skbuff.h:2422 [inline]
cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1131 [inline]
cdc_ncm_fill_tx_frame+0x11ab/0x3da0 drivers/net/usb/cdc_ncm.c:1308
cdc_ncm_tx_fixup+0xa3/0x100

Deal with too low values of dwNtbOutMaxSize, clamp it in the range
[USB_CDC_NCM_NTB_MIN_OUT_SIZE, CDC_NCM_NTB_MAX_SIZE_TX]. We ensure
enough data space is allocated to handle CDC data by making sure
dwNtbOutMaxSize is not smaller than USB_CDC_NCM_NTB_MIN_OUT_SIZE.

Fixes: 289507d3364f ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning")
Cc: [email protected]
Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?extid=b982f1059506db48409d
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Tudor Ambarus <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'nvme-6.4-2023-05-18' of git://git.infradead.org/nvme into block-6.4

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.4

- More device quirks (Sagi, Hristo, Adrian, Daniel)
- Controller delete race (Maurizo)
- Multipath cleanup fix (Christoph)"

* tag 'nvme-6.4-2023-05-18' of git://git.infradead.org/nvme:
  nvme-pci: Add quirk for Teamgroup MP33 SSD
  nvme: do not let the user delete a ctrl before a complete initialization
  nvme-multipath: don't call blk_mark_disk_dead in nvme_mpath_remove_disk
  nvme-pci: clamp max_hw_sectors based on DMA optimized limitation
  nvme-pci: add quirk for missing secondary temperature thresholds
  nvme-pci: add NVME_QUIRK_BOGUS_NID for HS-SSD-FUTURE 2048G

Merge tag 'amd-drm-fixes-6.4-2023-05-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.4-2023-05-18:

amdgpu:
- update gfx11 clock counter logic
- Fix a race when disabling gfxoff on gfx10/11 for profiling
- Raven/Raven2/PCO clock counter fix
- Add missing get_vbios_fb_size for GMC 11
- Fix a spurious irq warning in the device remove case
- Fix possible power mode mismatch between driver and PMFW
- USB4 fix

Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'drm-msm-fixes-2023-05-17' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

msm-fixes for v6.4-rc3

Display Fixes:

+ Catalog fixes:
- fix the programmable fetch lines and qos settings of msm8998
   to match what is present downstream
- fix the LM pairs for msm8998 to match what is present downstream.
   The current settings are not right as LMs with incompatible
   connected blocks are paired
- remove unused INTF0 interrupt mask from SM6115/QCM2290 as there
   is no INTF0 present on those chipsets. There is only one DSI on
   index 1
- remove TE2 block from relevant chipsets because this is mainly
   used for ping-pong split feature which is not supported upstream
   and also for the chipsets where we are removing them in this
   change, that block is not present as the tear check has been moved
   to the intf block
- relocate non-MDP_TOP INTF_INTR offsets from dpu_hwio.h to
   dpu_hw_interrupts.c to match where they belong
- fix the indentation for REV_7xxx interrupt masks
- fix the offset and version for dither blocks of SM8[34]50/SC8280XP
   chipsets as it was incorrect
- make the ping-pong blk length 0 for appropriate chipsets as those
   chipsets only have a dither ping-pong dither block but no other
   functionality in the base ping-pong
- remove some duplicate register defines from INTF
+ Fix the log mask for the writeback block so that it can be enabled
  correctly via debugfs
+ unregister the hdmi codec for dp during unbind otherwise it leaks
  audio codec devices
+ Yaml change to fix warnings related to 'qcom,master-dsi' and
  'qcom,sync-dual-dsi'

GPU Fixes:

+ fix submit error path leak
+ arm-smmu-qcom fix for regression that broke per-process page tables
+ fix no-iommu crash

Signed-off-by: Dave Airlie <[email protected]>
From: Rob Clark <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvHEcJfp=k6qatmb_SvAeyvy3CBpaPfwLqtNthuEzA_7w@mail.gmail.com

Merge tag 'drm-intel-fixes-2023-05-17' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

Add missing null check for HDCP code.

Signed-off-by: Dave Airlie <[email protected]>
From: Joonas Lahtinen <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

nvme-pci: Add quirk for Teamgroup MP33 SSD

Add a quirk for Teamgroup MP33 that reports duplicate ids for disk.

Signed-off-by: Daniel Smith <[email protected]>
[kch: patch formatting]
Signed-off-by: Chaitanya Kulkarni <[email protected]>
Tested-by: Daniel Smith <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

Merge tag 'exynos-drm-fixes-for-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes

Just one fixup to exynos_drm_g2d module.
- Fix below build warning by marking them as 'static inline'
drivers/gpu/drm/exynos/exynos_drm_g2d.h:37:5: error: no previous prototype for 'g2d_open'
drivers/gpu/drm/exynos/exynos_drm_g2d.h:42:5: error: no previous prototype for 'g2d_close'

Signed-off-by: Dave Airlie <[email protected]>
From: Inki Dae <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]

Merge tag 'mm-hotfixes-stable-2023-05-18-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
"Eight hotfixes. Four are cc:stable, the other four are for post-6.4
  issues, or aren't considered suitable for backporting"

* tag 'mm-hotfixes-stable-2023-05-18-15-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  MAINTAINERS: Cleanup Arm Display IP maintainers
  MAINTAINERS: repair pattern in DIALOG SEMICONDUCTOR DRIVERS
  nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()
  mm: fix zswap writeback race condition
  mm: kfence: fix false positives on big endian
  zsmalloc: move LRU update from zs_map_object() to zs_malloc()
  mm: shrinkers: fix race condition on debugfs cleanup
  maple_tree: make maple state reusable after mas_empty_area()

ASoC: soc-pcm: test if a BE can be prepared

In the BE hw_params configuration, the existing code checks if any of the
existing FEs are prepared, running, paused or suspended - and skips the
configuration in those cases. This allows multiple calls of hw_params
which the ALSA state machine supports.

This check is not handled for the prepare stage, which can lead to the
same BE being prepared multiple times. This patch adds a check similar to
that of the hw_params, with the main difference being that the suspended
state is allowed: the ALSA state machine allows a transition from
suspended to prepared with hw_params skipped.

This problem was detected on Intel IPC4/SoundWire devices, where the BE
dailink .prepare stage is used to configure the SoundWire stream with a
bank switch. Multiple .prepare calls lead to conflicts with the .trigger
operation with IPC4 configurations. This problem was not detected earlier
on Intel devices, HDaudio BE dailinks detect that the link is already
prepared and skip the configuration, and for IPC3 devices there is no BE
trigger.

Link: https://github.com/thesofproject/sof/issues/7596
Signed-off-by: Ranjani Sridharan <[email protected]
Signed-off-by: Bard Liao <[email protected]
Signed-off-by: Pierre-Louis Bossart <[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]

Merge tag 'probes-fixes-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

- Initialize 'ret' local variables on fprobe_handler() to fix the
   smatch warning. With this, fprobe function exit handler is not
   working randomly.

- Fix to use preempt_enable/disable_notrace for rethook handler to
   prevent recursive call of fprobe exit handler (which is based on
   rethook)

- Fix recursive call issue on fprobe_kprobe_handler()

- Fix to detect recursive call on fprobe_exit_handler()

- Fix to make all arch-dependent rethook code notrace (the
   arch-independent code is already notrace)"

* tag 'probes-fixes-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rethook, fprobe: do not trace rethook related functions
  fprobe: add recursion detection in fprobe_exit_handler
  fprobe: make fprobe_kprobe_handler recursion free
  rethook: use preempt_{disable, enable}_notrace in rethook_trampoline_handler
  tracing: fprobe: Initialize ret valiable to fix smatch error

Merge tag 'net-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from can, xfrm, bluetooth and netfilter.

  Current release - regressions:

   - ipv6: fix RCU splat in ipv6_route_seq_show()

   - wifi: iwlwifi: disable RFI feature

  Previous releases - regressions:

   - tcp: fix possible sk_priority leak in tcp_v4_send_reset()

   - tipc: do not update mtu if msg_max is too small in mtu negotiation

   - netfilter: fix null deref on element insertion

   - devlink: change per-devlink netdev notifier to static one

   - phylink: fix ksettings_set() ethtool call

   - wifi: mac80211: fortify the spinlock against deadlock by interrupt

   - wifi: brcmfmac: check for probe() id argument being NULL

   - eth: ice:
      - fix undersized tx_flags variable
      - fix ice VF reset during iavf initialization

   - eth: hns3: fix sending pfc frames after reset issue

  Previous releases - always broken:

   - xfrm: release all offloaded policy memory

   - nsh: use correct mac_offset to unwind gso skb in nsh_gso_segment()

   - vsock: avoid to close connected socket after the timeout

   - dsa: rzn1-a5psw: enable management frames for CPU port

   - eth: virtio_net: fix error unwinding of XDP initialization

   - eth: tun: fix memory leak for detached NAPI queue.

  Misc:

   - MAINTAINERS: sctp: move Neil to CREDITS"

* tag 'net-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
  MAINTAINERS: skip CCing netdev for Bluetooth patches
  mdio_bus: unhide mdio_bus_init prototype
  bridge: always declare tunnel functions
  atm: hide unused procfs functions
  net: isa: include net/Space.h
  Revert "ARM: dts: stm32: add CAN support on stm32f746"
  netfilter: nft_set_rbtree: fix null deref on element insertion
  netfilter: nf_tables: fix nft_trans type confusion
  netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT
  net: wwan: t7xx: Ensure init is completed before system sleep
  net: selftests: Fix optstring
  net: pcs: xpcs: fix C73 AN not getting enabled
  net: wwan: iosm: fix NULL pointer dereference when removing device
  vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit()
  mailmap: add entries for Nikolay Aleksandrov
  igb: fix bit_shift to be in [1..8] range
  net: dsa: mv88e6xxx: Fix mv88e6393x EPC write command offset
  cassini: Fix a memory leak in the error handling path of cas_init_one()
  tun: Fix memory leak for detached NAPI queue.
  can: kvaser_pciefd: Disable interrupts in probe error path
  ...

Merge tag 'media/v6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
"Several fixes for the dvb core and drivers:

   - fix UAF and null pointer de-reference in DVB core

   - fix kernel runtime warning for blocking operation in wait_event*()
     in dvb core

   - fix write size bug in DVB conditional access core

   - fix dvb demux continuity counter debug check logic

   - randconfig build fixes in pvrusb2 and mn88443x

   - fix memory leak in ttusb-dec

   - fix netup_unidvb probe-time error check logic

   - improve error handling in dw2102 if it can't retrieve DVB MAC
     address"

* tag 'media/v6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: dvb-core: Fix use-after-free due to race condition at dvb_ca_en50221
  media: dvb-core: Fix kernel WARNING for blocking operation in wait_event*()
  media: dvb-core: Fix use-after-free due to race at dvb_register_device()
  media: dvb-core: Fix use-after-free due on race condition at dvb_net
  media: dvb-core: Fix use-after-free on race condition at dvb_frontend
  media: mn88443x: fix !CONFIG_OF error by drop of_match_ptr from ID table
  media: ttusb-dec: fix memory leak in ttusb_dec_exit_dvb()
  media: dvb_ca_en50221: fix a size write bug
  media: netup_unidvb: fix irq init by register it at the end of probe
  media: dvb-usb: dw2102: fix uninit-value in su3000_read_mac_address
  media: dvb-usb: digitv: fix null-ptr-deref in digitv_i2c_xfer()
  media: dvb-usb-v2: rtl28xxu: fix null-ptr-deref in rtl28xxu_i2c_xfer
  media: dvb-usb-v2: ce6230: fix null-ptr-deref in ce6230_i2c_master_xfer()
  media: dvb-usb-v2: ec168: fix null-ptr-deref in ec168_i2c_xfer()
  media: dvb-usb: az6027: fix three null-ptr-deref in az6027_i2c_xfer()
  media: netup_unidvb: fix use-after-free at del_timer()
  media: dvb_demux: fix a bug for the continuity counter
  media: pvrusb2: fix DVB_CORE dependency

ublk: fix AB-BA lockdep warning

When handling UBLK_IO_FETCH_REQ, ctx->uring_lock is grabbed first, then
ub->mutex is acquired.

When handling UBLK_CMD_STOP_DEV or UBLK_CMD_DEL_DEV, ub->mutex is
grabbed first, then calling io_uring_cmd_done() for canceling uring
command, in which ctx->uring_lock may be required.

Real deadlock only happens when all the above commands are issued from
same uring context, and in reality different uring contexts are often used
for handing control command and IO command.

Fix the issue by using io_uring_cmd_complete_in_task() to cancel command
in ublk_cancel_dev(ublk_cancel_queue).

Reported-by: Shinichiro Kawasaki <[email protected]>
Closes: https://lore.kernel.org/linux-block/becol2g7sawl4rsjq2dztsbc7mqypfqko6wzsyoyazqydoasml@rcxarzwidrhk
Cc: Ziyang Zhang <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Tested-by: Shinichiro Kawasaki <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

drm/amd/display: enable dpia validate

Use dpia_validate_usb4_bw() function

Fixes: a8b537605e22 ("drm/amd/display: Add function pointer for validate bw usb4")
Reviewed-by: Roman Li <[email protected]>
Reviewed-by: Meenakshikumar Somasundaram <[email protected]>
Acked-by: Aurabindo Pillai <[email protected]>
Signed-off-by: Mustapha Ghaddar <[email protected]>
Tested-by: Daniel Wheeler <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/pm: fix possible power mode mismatch between driver and PMFW

PMFW may boots the ASIC with a different power mode from the system's
real one. Notify PMFW explicitly the power mode the system in. This
is needed only when ACDC switch via gpio is not supported.

Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Kenneth Feng <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged

When performing device unbind or halt, we have disabled all irqs at the
very begining like amdgpu_pci_remove or amdgpu_device_halt. So
amdgpu_irq_put for irqs stored in fence driver should not be called
any more, otherwise, below calltrace will arrive.

[  139.114088] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:616 amdgpu_irq_put+0xf6/0x110 [amdgpu]
[  139.114655] Call Trace:
[  139.114655]  <TASK>
[  139.114657]  amdgpu_fence_driver_hw_fini+0x93/0x130 [amdgpu]
[  139.114836]  amdgpu_device_fini_hw+0xb6/0x350 [amdgpu]
[  139.114955]  amdgpu_driver_unload_kms+0x51/0x70 [amdgpu]
[  139.115075]  amdgpu_pci_remove+0x63/0x160 [amdgpu]
[  139.115193]  ? __pm_runtime_resume+0x64/0x90
[  139.115195]  pci_device_remove+0x3a/0xb0
[  139.115197]  device_remove+0x43/0x70
[  139.115198]  device_release_driver_internal+0xbd/0x140

Signed-off-by: Guchun Chen <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/gmc11: implement get_vbios_fb_size()

Implement get_vbios_fb_size() so we can properly reserve
the vbios splash screen to avoid potential artifacts on the
screen during the transition from the pre-OS console to the
OS console.

Acked-by: Sunil Khatri <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.1.x

drm/amdgpu: Differentiate between Raven2 and Raven/Picasso according to revision id

Due to the raven2 and raven/picasso maybe have the same GC_HWIP version.
So differentiate them by revision id.

Signed-off-by: shanshengwang <[email protected]>
Signed-off-by: Jesse Zhang <[email protected]>
Acked-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu/gfx11: Adjust gfxoff before powergating on gfx11 as well

(Bas: speculative change to mirror gfx10/gfx9)

Signed-off-by: Guilherme G. Piccoli <[email protected]>
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.1.x

drm/amdgpu/gfx10: Disable gfxoff before disabling powergating.

Otherwise we get a full system lock (looks like a FW mess).

Copied the order from the GFX9 powergating code.

Fixes: 366468ff6c34 ("drm/amdgpu: Allow GfxOff on Vangogh as default")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2545
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
Tested-by: Guilherme G. Piccoli <[email protected]>
Cc: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

regulator: mt6359: add read check for PMIC MT6359

Add hardware version read check for PMIC MT6359

Signed-off-by: Sen Chu <[email protected]
Fixes: 4cfc96547512 ("regulator: mt6359: Add support for MT6359P regulator")
Reviewed-by: AngeloGioacchino Del Regno <[email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]

ceph: force updating the msg pointer in non-split case

When the MClientSnap reqeust's op is not CEPH_SNAP_OP_SPLIT the
request may still contain a list of 'split_realms', and we need
to skip it anyway. Or it will be parsed as a corrupt snaptrace.

Cc: [email protected]
Link: https://tracker.ceph.com/issues/61200
Reported-by: Frank Schilder <[email protected]>
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

ceph: silence smatch warning in reconnect_caps_cb()

Smatch static checker warning:

  fs/ceph/mds_client.c:3968 reconnect_caps_cb()
  warn: missing error code here? '__get_cap_for_mds()' failed. 'err' = '0'

[ idryomov: Dan says that Smatch considers it intentional only if the
  "ret = 0;" assignment is within 4 or 5 lines of the goto. ]

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

Merge tag 'linux-can-fixes-for-6.4-20230518' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2023-05-18

this is a pull request of 7 patches for net/master.

The first 6 patches are by Jimmy Assarsson and fix several bugs in the
kvaser_pciefd driver.

The latest patch is from me and reverts a change in stm32f746.dtsi
that causes build errors due to a missing dependent patch.

* tag 'linux-can-fixes-for-6.4-20230518' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  Revert "ARM: dts: stm32: add CAN support on stm32f746"
  can: kvaser_pciefd: Disable interrupts in probe error path
  can: kvaser_pciefd: Do not send EFLUSH command on TFD interrupt
  can: kvaser_pciefd: Empty SRB buffer in probe
  can: kvaser_pciefd: Call request_irq() before enabling interrupts
  can: kvaser_pciefd: Clear listen-only bit if not explicitly requested
  can: kvaser_pciefd: Set CAN_STATE_STOPPED in kvaser_pciefd_stop()
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'nf-23-05-17' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Florian Westphal says:

====================
Netfilter fixes for net

1. Silence warning about unused variable when CONFIG_NF_NAT=n, from Tom Rix.
2. nftables: Fix possible out-of-bounds access, from myself.
3. nftables: fix null deref+UAF during element insertion into rbtree,
   also from myself.

* tag 'nf-23-05-17' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_set_rbtree: fix null deref on element insertion
  netfilter: nf_tables: fix nft_trans type confusion
  netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'wireless-2023-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.4

A lot of fixes this time, for both the stack and the drivers. The
brcmfmac resume fix has been reported by several people so I would say
it's the most important here. The iwlwifi RFI workaround is also
something which was reported as a regression recently.

* tag 'wireless-2023-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (31 commits)
  wifi: b43: fix incorrect __packed annotation
  wifi: rtw88: sdio: Always use two consecutive bytes for word operations
  mac80211_hwsim: fix memory leak in hwsim_new_radio_nl
  wifi: iwlwifi: mvm: Add locking to the rate read flow
  wifi: iwlwifi: Don't use valid_links to iterate sta links
  wifi: iwlwifi: mvm: don't trust firmware n_channels
  wifi: iwlwifi: mvm: fix OEM's name in the tas approved list
  wifi: iwlwifi: fix OEM's name in the ppag approved list
  wifi: iwlwifi: mvm: fix initialization of a return value
  wifi: iwlwifi: mvm: fix access to fw_id_to_mac_id
  wifi: iwlwifi: fw: fix DBGI dump
  wifi: iwlwifi: mvm: fix number of concurrent link checks
  wifi: iwlwifi: mvm: fix cancel_delayed_work_sync() deadlock
  wifi: iwlwifi: mvm: don't double-init spinlock
  wifi: iwlwifi: mvm: always free dup_data
  wifi: mac80211: recalc chanctx mindef before assigning
  wifi: mac80211: consider reserved chanctx for mindef
  wifi: mac80211: simplify chanctx allocation
  wifi: mac80211: Abort running color change when stopping the AP
  wifi: mac80211: fix min center freq offset tracing
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: skip CCing netdev for Bluetooth patches

As requested by Marcel skip netdev for Bluetooth patches.
Bluetooth has its own mailing list and overloading netdev
leads to fewer people reading it.

Link: https://lore.kernel.org/netdev/[email protected]/
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

mdio_bus: unhide mdio_bus_init prototype

mdio_bus_init() is either used as a local module_init() entry,
or it gets called in phy_device.c. In the former case, there
is no declaration, which causes a warning:

drivers/net/phy/mdio_bus.c:1371:12: error: no previous prototype for 'mdio_bus_init' [-Werror=missing-prototypes]

Remove the #ifdef around the declaration to avoid the warning..

Signed-off-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

bridge: always declare tunnel functions

When CONFIG_BRIDGE_VLAN_FILTERING is disabled, two functions are still
defined but have no prototype or caller. This causes a W=1 warning for
the missing prototypes:

net/bridge/br_netlink_tunnel.c:29:6: error: no previous prototype for 'vlan_tunid_inrange' [-Werror=missing-prototypes]
net/bridge/br_netlink_tunnel.c:199:5: error: no previous prototype for 'br_vlan_tunnel_info' [-Werror=missing-prototypes]

The functions are already contitional on CONFIG_BRIDGE_VLAN_FILTERING,
and I coulnd't easily figure out the right set of #ifdefs, so just
move the declarations out of the #ifdef to avoid the warning,
at a small cost in code size over a more elaborate fix.

Fixes: 188c67dd1906 ("net: bridge: vlan options: add support for tunnel id dumping")
Fixes: 569da0822808 ("net: bridge: vlan options: add support for tunnel mapping set/del")
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

atm: hide unused procfs functions

When CONFIG_PROC_FS is disabled, the function declarations for some
procfs functions are hidden, but the definitions are still build,
as shown by this compiler warning:

net/atm/resources.c:403:7: error: no previous prototype for 'atm_dev_seq_start' [-Werror=missing-prototypes]
net/atm/resources.c:409:6: error: no previous prototype for 'atm_dev_seq_stop' [-Werror=missing-prototypes]
net/atm/resources.c:414:7: error: no previous prototype for 'atm_dev_seq_next' [-Werror=missing-prototypes]

Add another #ifdef to leave these out of the build.

Signed-off-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

net: isa: include net/Space.h

The legacy drivers that still get called from net/Space.c have prototypes
in net/Space, but this header is not included in most of the files that
define those functions:

drivers/net/ethernet/cirrus/cs89x0.c:1649:28: error: no previous prototype for 'cs89x0_probe' [-Werror=missing-prototypes]
drivers/net/ethernet/8390/ne.c:947:28: error: no previous prototype for 'ne_probe' [-Werror=missing-prototypes]
drivers/net/ethernet/8390/smc-ultra.c:167:28: error: no previous prototype for 'ultra_probe' [-Werror=missing-prototypes]
drivers/net/ethernet/amd/lance.c:438:28: error: no previous prototype for 'lance_probe' [-Werror=missing-prototypes]
drivers/net/ethernet/3com/3c515.c:422:20: error: no previous prototype for 'tc515_probe' [-Werror=missing-prototypes]

Add the inclusion to avoids the warnings.

Signed-off-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

MAINTAINERS: Cleanup Arm Display IP maintainers

Some people have moved to different roles and are no longer involved in
the upstream development. As there is only one person left, remove the
mailing list as well as it serves no purpose.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liviu Dudau <[email protected]>
Acked-by: Brian Starkey <[email protected]>
Cc: Mihail Atanassov <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Joe Perches <[email protected]> # "Please use --order"
Cc: Mihail Atanassov <[email protected]>
Cc: Uwe Kleine-König <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

MAINTAINERS: repair pattern in DIALOG SEMICONDUCTOR DRIVERS

Commit 361104b05684c ("dt-bindings: mfd: Convert da9063 to yaml") converts
da9063.txt to dlg,da9063.yaml and adds a new file pattern in MAINTAINERS.
Unfortunately, the file pattern matches da90*.yaml, but the yaml file is
prefixed with dlg,da90.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken file pattern.

Repair this file pattern in DIALOG SEMICONDUCTOR DRIVERS.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 361104b05684c ("dt-bindings: mfd: Convert da9063 to yaml")
Signed-off-by: Lukas Bulwahn <[email protected]>
Acked-by: Conor Dooley <[email protected]>
Cc: Lee Jones <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()

During unmount process of nilfs2, nothing holds nilfs_root structure after
nilfs2 detaches its writer in nilfs_detach_log_writer(). However, since
nilfs_evict_inode() uses nilfs_root for some cleanup operations, it may
cause use-after-free read if inodes are left in "garbage_list" and
released by nilfs_dispose_list() at the end of nilfs_detach_log_writer().

Fix this issue by modifying nilfs_evict_inode() to only clear inode
without additional metadata changes that use nilfs_root if the file system
is degraded to read-only or the writer is detached.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ryusuke Konishi <[email protected]>
Reported-by: [email protected]
Closes: https://lkml.kernel.org/r/[email protected]
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: fix zswap writeback race condition

The zswap writeback mechanism can cause a race condition resulting in
memory corruption, where a swapped out page gets swapped in with data that
was written to a different page.

The race unfolds like this:
1. a page with data A and swap offset X is stored in zswap
2. page A is removed off the LRU by zpool driver for writeback in
   zswap-shrink work, data for A is mapped by zpool driver
3. user space program faults and invalidates page entry A, offset X is
   considered free
4. kswapd stores page B at offset X in zswap (zswap could also be
   full, if so, page B would then be IOed to X, then skip step 5.)
5. entry A is replaced by B in tree->rbroot, this doesn't affect the
   local reference held by zswap-shrink work
6. zswap-shrink work writes back A at X, and frees zswap entry A
7. swapin of slot X brings A in memory instead of B

The fix:
Once the swap page cache has been allocated (case ZSWAP_SWAPCACHE_NEW),
zswap-shrink work just checks that the local zswap_entry reference is
still the same as the one in the tree.  If it's not the same it means that
it's either been invalidated or replaced, in both cases the writeback is
aborted because the local entry contains stale data.

Reproducer:
I originally found this by running `stress` overnight to validate my work
on the zswap writeback mechanism, it manifested after hours on my test
machine.  The key to make it happen is having zswap writebacks, so
whatever setup pumps /sys/kernel/debug/zswap/written_back_pages should do
the trick.

In order to reproduce this faster on a vm, I setup a system with ~100M of
available memory and a 500M swap file, then running `stress --vm 1
--vm-bytes 300000000 --vm-stride 4000` makes it happen in matter of tens
of minutes.  One can speed things up even more by swinging
/sys/module/zswap/parameters/max_pool_percent up and down between, say, 20
and 1; this makes it reproduce in tens of seconds.  It's crucial to set
`--vm-stride` to something other than 4096 otherwise `stress` won't
realize that memory has been corrupted because all pages would have the
same data.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Domenico Cerasuolo <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Chris Li (Google) <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: kfence: fix false positives on big endian

Since commit 1ba3cbf3ec3b ("mm: kfence: improve the performance of
__kfence_alloc() and __kfence_free()"), kfence reports failures in random
places at boot on big endian machines.

The problem is that the new KFENCE_CANARY_PATTERN_U64 encodes the address
of each byte in its value, so it needs to be byte swapped on big endian
machines.

The compiler is smart enough to do the le64_to_cpu() at compile time, so
there is no runtime overhead.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1ba3cbf3ec3b ("mm: kfence: improve the performance of __kfence_alloc() and __kfence_free()")
Signed-off-by: Michael Ellerman <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Cc: Peng Zhang <[email protected]>
Cc: David Laight <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

zsmalloc: move LRU update from zs_map_object() to zs_malloc()

Under memory pressure, we sometimes observe the following crash:

[ 5694.832838] ------------[ cut here ]------------
[ 5694.842093] list_del corruption, ffff888014b6a448->next is LIST_POISON1 (dead000000000100)
[ 5694.858677] WARNING: CPU: 33 PID: 418824 at lib/list_debug.c:47 __list_del_entry_valid+0x42/0x80
[ 5694.961820] CPU: 33 PID: 418824 Comm: fuse_counters.s Kdump: loaded Tainted: G S                5.19.0-0_fbk3_rc3_hoangnhatpzsdynshrv41_10870_g85a9558a25de #1
[ 5694.990194] Hardware name: Wiwynn Twin Lakes MP/Twin Lakes Passive MP, BIOS YMM16 05/24/2021
[ 5695.007072] RIP: 0010:__list_del_entry_valid+0x42/0x80
[ 5695.017351] Code: 08 48 83 c2 22 48 39 d0 74 24 48 8b 10 48 39 f2 75 2c 48 8b 51 08 b0 01 48 39 f2 75 34 c3 48 c7 c7 55 d7 78 82 e8 4e 45 3b 00 <0f> 0b eb 31 48 c7 c7 27 a8 70 82 e8 3e 45 3b 00 0f 0b eb 21 48 c7
[ 5695.054919] RSP: 0018:ffffc90027aef4f0 EFLAGS: 00010246
[ 5695.065366] RAX: 41fe484987275300 RBX: ffff888008988180 RCX: 0000000000000000
[ 5695.079636] RDX: ffff88886006c280 RSI: ffff888860060480 RDI: ffff888860060480
[ 5695.093904] RBP: 0000000000000002 R08: 0000000000000000 R09: ffffc90027aef370
[ 5695.108175] R10: 0000000000000000 R11: ffffffff82fdf1c0 R12: 0000000010000002
[ 5695.122447] R13: ffff888014b6a448 R14: ffff888014b6a420 R15: 00000000138dc240
[ 5695.136717] FS:  00007f23a7d3f740(0000) GS:ffff888860040000(0000) knlGS:0000000000000000
[ 5695.152899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5695.164388] CR2: 0000560ceaab6ac0 CR3: 000000001c06c001 CR4: 00000000007706e0
[ 5695.178659] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5695.192927] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5695.207197] PKRU: 55555554
[ 5695.212602] Call Trace:
[ 5695.217486]  <TASK>
[ 5695.221674]  zs_map_object+0x91/0x270
[ 5695.229000]  zswap_frontswap_store+0x33d/0x870
[ 5695.237885]  ? do_raw_spin_lock+0x5d/0xa0
[ 5695.245899]  __frontswap_store+0x51/0xb0
[ 5695.253742]  swap_writepage+0x3c/0x60
[ 5695.261063]  shrink_page_list+0x738/0x1230
[ 5695.269255]  shrink_lruvec+0x5ec/0xcd0
[ 5695.276749]  ? shrink_slab+0x187/0x5f0
[ 5695.284240]  ? mem_cgroup_iter+0x6e/0x120
[ 5695.292255]  shrink_node+0x293/0x7b0
[ 5695.299402]  do_try_to_free_pages+0xea/0x550
[ 5695.307940]  try_to_free_pages+0x19a/0x490
[ 5695.316126]  __folio_alloc+0x19ff/0x3e40
[ 5695.323971]  ? __filemap_get_folio+0x8a/0x4e0
[ 5695.332681]  ? walk_component+0x2a8/0xb50
[ 5695.340697]  ? generic_permission+0xda/0x2a0
[ 5695.349231]  ? __filemap_get_folio+0x8a/0x4e0
[ 5695.357940]  ? walk_component+0x2a8/0xb50
[ 5695.365955]  vma_alloc_folio+0x10e/0x570
[ 5695.373796]  ? walk_component+0x52/0xb50
[ 5695.381634]  wp_page_copy+0x38c/0xc10
[ 5695.388953]  ? filename_lookup+0x378/0xbc0
[ 5695.397140]  handle_mm_fault+0x87f/0x1800
[ 5695.405157]  do_user_addr_fault+0x1bd/0x570
[ 5695.413520]  exc_page_fault+0x5d/0x110
[ 5695.421017]  asm_exc_page_fault+0x22/0x30

After some investigation, I have found the following issue: unlike other
zswap backends, zsmalloc performs the LRU list update at the object
mapping time, rather than when the slot for the object is allocated.
This deviation was discussed and agreed upon during the review process
of the zsmalloc writeback patch series:

https://lore.kernel.org/lkml/[email protected]/

Unfortunately, this introduces a subtle bug that occurs when there is a
concurrent store and reclaim, which interleave as follows:

zswap_frontswap_store()            shrink_worker()
  zs_malloc()                        zs_zpool_shrink()
    spin_lock(&pool->lock)             zs_reclaim_page()
    zspage = find_get_zspage()
    spin_unlock(&pool->lock)
                                         spin_lock(&pool->lock)
                                         zspage = list_first_entry(&pool->lru)
                                         list_del(&zspage->lru)
                                           zspage->lru.next = LIST_POISON1
                                           zspage->lru.prev = LIST_POISON2
                                         spin_unlock(&pool->lock)
  zs_map_object()
    spin_lock(&pool->lock)
    if (!list_empty(&zspage->lru))
      list_del(&zspage->lru)
        CHECK_DATA_CORRUPTION(next == LIST_POISON1) /* BOOM */

With the current upstream code, this issue rarely happens. zswap only
triggers writeback when the pool is already full, at which point all
further store attempts are short-circuited. This creates an implicit
pseudo-serialization between reclaim and store. I am working on a new
zswap shrinking mechanism, which makes interleaving reclaim and store
more likely, exposing this bug.

zbud and z3fold do not have this problem, because they perform the LRU
list update in the alloc function, while still holding the pool's lock.
This patch fixes the aforementioned bug by moving the LRU update back to
zs_malloc(), analogous to zbud and z3fold.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 64f768c6b32e ("zsmalloc: add a LRU to zs_pool to keep track of zspages in LRU order")
Signed-off-by: Nhat Pham <[email protected]>
Suggested-by: Johannes Weiner <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Vitaly Wool <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

mm: shrinkers: fix race condition on debugfs cleanup

When something registers and unregisters many shrinkers, such as:
for x in $(seq 10000); do unshare -Ui true; done

Sometimes the following error is printed to the kernel log:
debugfs: Directory '...' with parent 'shrinker' already present!

This occurs since commit badc28d4924b ("mm: shrinkers: fix deadlock in
shrinker debugfs") / v6.2: Since the call to `debugfs_remove_recursive`
was moved outside the `shrinker_rwsem`/`shrinker_mutex` lock, but the call
to `ida_free` stayed inside, a newly registered shrinker can be
re-assigned that ID and attempt to create the debugfs directory before the
directory from the previous shrinker has been removed.

The locking changes in commit f95bdb700bc6 ("mm: vmscan: make global slab
shrink lockless") made the race condition more likely, though it existed
before then.

Commit badc28d4924b ("mm: shrinkers: fix deadlock in shrinker debugfs")
could be reverted since the issue is addressed should no longer occur
since the count and scan operations are lockless since commit 20cd1892fcc3
("mm: shrinkers: make count and scan in shrinker debugfs lockless").
However, since this is a contended lock, prefer instead moving `ida_free`
outside the lock to avoid the race.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: badc28d4924b ("mm: shrinkers: fix deadlock in shrinker debugfs")
Signed-off-by: Joan Bruguera Micó <[email protected]>
Cc: Qi Zheng <[email protected]>
Cc: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

maple_tree: make maple state reusable after mas_empty_area()

Make mas->min and mas->max point to a node range instead of a leaf entry
range. This allows mas to still be usable after mas_empty_area() returns.
Users would get unexpected results from other operations on the maple
state after calling the affected function.

For example, x86 MAP_32BIT mmap() acts as if there is no suitable gap when
there should be one.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <[email protected]>
Reported-by: "Edgecombe, Rick P" <[email protected]>
Reported-by: Tad <[email protected]>
Reported-by: Michael Keyes <[email protected]>
Link: https://lore.kernel.org/linux-mm/[email protected]/
Link: https://lore.kernel.org/linux-mm/[email protected]/
Reviewed-by: Liam R. Howlett <[email protected]>
Tested-by: Rick Edgecombe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

rethook, fprobe: do not trace rethook related functions

These functions are already marked as NOKPROBE to prevent recursion and
we have the same reason to blacklist them if rethook is used with fprobe,
since they are beyond the recursion-free region ftrace can guard.

Link: https://lore.kernel.org/all/[email protected]/
Fixes: f3a112c0c40d ("x86,rethook,kprobes: Replace kretprobe with rethook on x86")
Signed-off-by: Ze Gao <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Cc: [email protected]
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

fprobe: add recursion detection in fprobe_exit_handler

fprobe_hander and fprobe_kprobe_handler has guarded ftrace recursion
detection but fprobe_exit_handler has not, which possibly introduce
recursive calls if the fprobe exit callback calls any traceable
functions. Checking in fprobe_hander or fprobe_kprobe_handler
is not enough and misses this case.

So add recursion free guard the same way as fprobe_hander. Since
ftrace recursion check does not employ ip(s), so here use entry_ip and
entry_parent_ip the same as fprobe_handler.

Link: https://lore.kernel.org/all/[email protected]/
Fixes: 5b0ab78998e3 ("fprobe: Add exit_handler support")
Signed-off-by: Ze Gao <[email protected]>
Cc: [email protected]
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

fprobe: make fprobe_kprobe_handler recursion free

Current implementation calls kprobe related functions before doing
ftrace recursion check in fprobe_kprobe_handler, which opens door
to kernel crash due to stack recursion if preempt_count_{add, sub}
is traceable in kprobe_busy_{begin, end}.

Things goes like this without this patch quoted from Steven:
"
fprobe_kprobe_handler() {
   kprobe_busy_begin() {
      preempt_disable() {
         preempt_count_add() {  <-- trace
            fprobe_kprobe_handler() {
[ wash, rinse, repeat, CRASH!!! ]
"

By refactoring the common part out of fprobe_kprobe_handler and
fprobe_handler and call ftrace recursion detection at the very beginning,
the whole fprobe_kprobe_handler is free from recursion.

[ Fix the indentation of __fprobe_handler() parameters. ]

Link: https://lore.kernel.org/all/[email protected]/
Fixes: ab51e15d535e ("fprobe: Introduce FPROBE_FL_KPROBE_SHARED flag for fprobe")
Signed-off-by: Ze Gao <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Cc: [email protected]
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

rethook: use preempt_{disable, enable}_notrace in rethook_trampoline_handler

This patch replaces preempt_{disable, enable} with its corresponding
notrace version in rethook_trampoline_handler so no worries about stack
recursion or overflow introduced by preempt_count_{add, sub} under
fprobe + rethook context.

Link: https://lore.kernel.org/all/[email protected]/
Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
Signed-off-by: Ze Gao <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

Revert "ARM: dts: stm32: add CAN support on stm32f746"

This reverts commit 0920ccdf41e3078a4dd2567eb905ea154bc826e6.

The commit 0920ccdf41e3 ("ARM: dts: stm32: add CAN support on
stm32f746") depends on the patch "dt-bindings: mfd: stm32f7: add
binding definition for CAN3" [1], which is not in net/main, yet. This
results in a parsing error of "stm32f746.dtsi".

So revert this commit.

[1] https://lore.kernel.org/all/20230423172528.1398158 [email protected]

Cc: Dario Binacchi <[email protected]>
Cc: Alexandre TORGUE <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]
Fixes: 0920ccdf41e3 ("ARM: dts: stm32: add CAN support on stm32f746")
Suggested-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

Merge tag 'linux-kselftest-fixes-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:

- sgx test fix for false negatives

- ftrace output is hard to parses and it masks inappropriate skips etc.
   This fix addresses the problems by integrating with kselftest runner

* tag 'linux-kselftest-fixes-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ftrace: Improve integration with kselftest runner
  selftests/sgx: Add "test_encl.elf" to TEST_FILES

Merge tag 'kvmarm-fixes-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.4, take #1

- Plug a race in the stage-2 mapping code where the IPA and the PA
would end up being out of sync

- Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...)

- FP/SVE/SME documentation update, in the hope that this field
becomes clearer...

- Add workaround for the usual Apple SEIS brokenness

- Random comment fixes

SMB3: drop reference to cfile before sending oplock break

In cifs_oplock_break function we drop reference to a cfile at
the end of function, due to which close command goes on wire
after lease break acknowledgment even if file is already closed
by application but we had deferred the handle close.
If other client with limited file shareaccess waiting on lease
break ack proceeds operation on that file as soon as first client
sends ack, then we may encounter status sharing violation error
because of open handle.
Solution is to put reference to cfile(send close on wire if last ref)
and then send oplock acknowledgment to server.

Fixes: 9e31678fb403 ("SMB3: fix lease break timeout when multiple deferred close handles for the same file.")
Cc: [email protected]
Signed-off-by: Bharath SM <[email protected]>
Reviewed-by: Shyam Prasad N <[email protected]>
Signed-off-by: Steve French <[email protected]>

SMB3: Close all deferred handles of inode in case of handle lease break

Oplock break may occur for different file handle than the deferred
handle. Check for inode deferred closes list, if it's not empty then
close all the deferred handles of inode because we should not cache
handles if we dont have handle lease.

Eg: If openfilelist has one deferred file handle and another open file
handle from app for a same file, then on a lease break we choose the
first handle in openfile list. The first handle in list can be deferred
handle or actual open file handle from app. In case if it is actual open
handle then today, we don't close deferred handles if we lose handle lease
on a file. Problem with this is, later if app decides to close the existing
open handle then we still be caching deferred handles until deferred close
timeout. Leaving open handle may result in sharing violation when windows
client tries to open a file with limited file share access.

So we should check for deferred list of inode and walk through the list of
deferred files in inode and close all deferred files.

Fixes: 9e31678fb403 ("SMB3: fix lease break timeout when multiple deferred close handles for the same file.")
Cc: [email protected]
Signed-off-by: Bharath SM <[email protected]>
Signed-off-by: Steve French <[email protected]>

Merge tag 'nfsd-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- A collection of minor bug fixes

* tag 'nfsd-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Remove open coding of string copy
  SUNRPC: Fix trace_svc_register() call site
  SUNRPC: always free ctxt when freeing deferred request
  SUNRPC: double free xprt_ctxt while still in use
  SUNRPC: Fix error handling in svc_setup_socket()
  SUNRPC: Fix encoding of accepted but unsuccessful RPC replies
  lockd: define nlm_port_min,max with CONFIG_SYSCTL
  nfsd: define exports_proc_ops with CONFIG_PROC_FS
  SUNRPC: Avoid relying on crypto API to derive CBC-CTS output IV

Merge tag 'tpmdd-v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm fixes from Jarkko Sakkinen:
"Three bug fixes for recently discovered issues"

* tag 'tpmdd-v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm/tpm_tis: Disable interrupts for more Lenovo devices
  tpm: Prevent hwrng from activating during resume
  tpm_tis: Use tpm_chip_{start,stop} decoration inside tpm_tis_resume

drm/amdgpu/gfx11: update gpu_clock_counter logic

This code was written prior to previous updates to this
logic for other chips. The RSC registers are part of
SMUIO which is an always on block so there is no need
to disable gfxoff. Additionally add the carryover and
preemption checks.

v2: rebase

Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.1.y: 5591a051b86b: drm/amdgpu: refine get gpu clock counter method
Cc: [email protected] # 6.2.y: 5591a051b86b: drm/amdgpu: refine get gpu clock counter method
Cc: [email protected] # 6.3.y: 5591a051b86b: drm/amdgpu: refine get gpu clock counter method

tracing: make ftrace_likely_update() declaration visible

This function is only used when CONFIG_TRACE_BRANCH_PROFILING is set and
DISABLE_BRANCH_PROFILING is not set, and the declaration is hidden
behind this combination of tests.

But that causes a warning when building with CONFIG_TRACING_BRANCHES,
since that sets DISABLE_BRANCH_PROFILING for the tracing code, and the
declaration is thus hidden:

kernel/trace/trace_branch.c:205:6: error: no previous prototype for 'ftrace_likely_update' [-Werror=missing-prototypes]

Move the declaration out of the #ifdef to avoid the warning.

Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

x86/mm: Avoid incomplete Global INVLPG flushes

The INVLPG instruction is used to invalidate TLB entries for a
specified virtual address.  When PCIDs are enabled, INVLPG is supposed
to invalidate TLB entries for the specified address for both the
current PCID *and* Global entries.  (Note: Only kernel mappings set
Global=1.)

Unfortunately, some INVLPG implementations can leave Global
translations unflushed when PCIDs are enabled.

As a workaround, never enable PCIDs on affected processors.

I expect there to eventually be microcode mitigations to replace this
software workaround.  However, the exact version numbers where that
will happen are not known today.  Once the version numbers are set in
stone, the processor list can be tweaked to only disable PCIDs on
affected processors with affected microcode.

Note: if anyone wants a quick fix that doesn't require patching, just
stick 'nopcid' on your kernel command-line.

Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: [email protected]

drm/msm: Be more shouty if per-process pgtables aren't working

Otherwise it is not always obvious if a dt or iommu change is causing us
to fall back to global pgtable.

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/537359/
Link: https://lore.kernel.org/r/[email protected]

iommu/arm-smmu-qcom: Fix missing adreno_smmu's

When the special handling of qcom,adreno-smmu was moved into
qcom_smmu_create(), it was overlooked that we didn't have all the
required entries in qcom_smmu_impl_of_match. So we stopped getting
adreno_smmu_priv on sc7180, breaking per-process pgtables.

Fixes: 30b912a03d91 ("iommu/arm-smmu-qcom: Move the qcom,adreno-smmu check into qcom_smmu_create")
Cc: <[email protected]>
Suggested-by: Lepton Wu <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Konrad Dybcio <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/537357/
Link: https://lore.kernel.org/r/[email protected]

ALSA: hda: Add NVIDIA codec IDs a3 through a7 to patch table

These IDs are for AD102, AD103, AD104, AD106, and AD107 gpus with
audio functions that are largely similar to the existing ones.

Tested audio using gnome-settings, over HDMI, DP-SST and DP-MST
connections on AD106 gpu.

Signed-off-by: Nikhil Mahale <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: oss: avoid missing-prototype warnings

Two functions are defined and used in pcm_oss.c but also optionally
used from io.c, with an optional prototype. If CONFIG_SND_PCM_OSS_PLUGINS
is disabled, this causes a warning as the functions are not static
and have no prototype:

sound/core/oss/pcm_oss.c:1235:19: error: no previous prototype for 'snd_pcm_oss_write3' [-Werror=missing-prototypes]
sound/core/oss/pcm_oss.c:1266:19: error: no previous prototype for 'snd_pcm_oss_read3' [-Werror=missing-prototypes]

Avoid this by making the prototypes unconditional.

Signed-off-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

ALSA: cs46xx: mark snd_cs46xx_download_image as static

snd_cs46xx_download_image() was originally called from dsp_spos.c, but
is now local to cs46xx_lib.c. Mark it as 'static' to avoid a warning
about it lacking a declaration, and '__maybe_unused' to avoid a warning
about it being unused when CONFIG_SND_CS46XX_NEW_DSP is disabled:

sound/pci/cs46xx/cs46xx_lib.c:534:5: error: no previous prototype for 'snd_cs46xx_download_image'

Fixes: 89f157d9e6bf ("[ALSA] cs46xx - Fix PM resume")
Signed-off-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>

tools headers disabled-features: Sync with the kernel sources

To pick the changes from:

  e0bddc19ba9578bc ("x86/mm: Reduce untagged_addr() overhead for systems without LAM")

This only causes these perf files to be rebuilt:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
  diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h

Cc: Adrian Hunter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

nvme: do not let the user delete a ctrl before a complete initialization

If a userspace application performes a "delete_controller" command
early during the ctrl initialization, the delete operation
may race against the init code and the kernel will crash.

nvme nvme5: Connect command failed: host path error
nvme nvme5: failed to connect queue: 0 ret=880
PF: supervisor write access in kernel mode
PF: error_code(0x0002) - not-present page
blk_mq_quiesce_queue+0x18/0x90
nvme_tcp_delete_ctrl+0x24/0x40 [nvme_tcp]
nvme_do_delete_ctrl+0x7f/0x8b [nvme_core]
nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
kernfs_fop_write_iter+0x124/0x1b0
new_sync_write+0xff/0x190
vfs_write+0x1ef/0x280

Fix the crash by checking the NVME_CTRL_STARTED_ONCE bit;
if it's not set it means that the nvme controller is still
in the process of getting initialized and the kernel
will return an -EBUSY error to userspace.
Set the NVME_CTRL_STARTED_ONCE later in the nvme_start_ctrl()
function, after the controller start operation is completed.

Signed-off-by: Maurizio Lombardi <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

nvme-multipath: don't call blk_mark_disk_dead in nvme_mpath_remove_disk

nvme_mpath_remove_disk is called after del_gendisk, at which point a
blk_mark_disk_dead call doesn't make any sense.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

tools headers UAPI: Sync arch prctl headers with the kernel sources

To pick the changes in this cset:

  a03c376ebaf38394 ("x86/arch_prctl: Add AMX feature numbers as ABI constants")
  23e5d9ec2bab53c4 ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive")
  2f8794bd087e7958 ("x86/mm: Provide arch_prctl() interface for LAM")

This picks these new prctls in a third range, that was also added to the
tools/perf/trace/beauty/arch_prctl.c beautifier.

  $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/before
  $ cp arch/x86/include/uapi/asm/prctl.h tools/arch/x86/include/uapi/asm/prctl.h
  $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/after
  $ diff -u /tmp/before /tmp/after
  @@ -20,3 +20,11 @@
    [0x2003 - 0x2001]= "MAP_VDSO_64",
   };

  +#define x86_arch_prctl_codes_3_offset 0x4001
  +static const char *x86_arch_prctl_codes_3[] = {
  + [0x4001 - 0x4001]= "GET_UNTAG_MASK",
  + [0x4002 - 0x4001]= "ENABLE_TAGGED_ADDR",
  + [0x4003 - 0x4001]= "GET_MAX_TAG_BITS",
  + [0x4004 - 0x4001]= "FORCE_TAGGED_SVA",
  +};
  +
  $

With this 'perf trace' can translate those numbers into strings and use
the strings in filter expressions:

  # perf trace -e prctl
       0.000 ( 0.011 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9c014b7df5)     = 0
       0.032 ( 0.002 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9bb6b51580)     = 0
       5.452 ( 0.003 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfeb70) = 0
       5.468 ( 0.002 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfea70) = 0
      24.494 ( 0.009 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f562a32ae28) = 0
      24.540 ( 0.002 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f563c6d4b30) = 0
     670.281 ( 0.008 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30805c8) = 0
     670.293 ( 0.002 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30800f0) = 0
  ^C#

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/prctl.h' differs from latest version at 'arch/x86/include/uapi/asm/prctl.h'
  diff -u tools/arch/x86/include/uapi/asm/prctl.h arch/x86/include/uapi/asm/prctl.h

Cc: Adrian Hunter <[email protected]>
Cc: Chang S. Bae <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'

This is to get the changes from:

  68674f94ffc9dddc ("x86: don't use REP_GOOD or ERMS for small memory copies")
  20f3337d350c4e1b ("x86: don't use REP_GOOD or ERMS for small memory clearing")

This also make the 'perf bench mem' files stop referring to the erms
versions that gone away with the above patches.

That addresses these perf tools build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S'
  diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S
  Warning: Kernel ABI header at 'tools/arch/x86/lib/memset_64.S' differs from latest version at 'arch/x86/lib/memset_64.S'
  diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

tools headers x86 cpufeatures: Sync with the kernel sources

To pick the changes from:

  3d8f61bf8bcd69bc ("x86: KVM: Add common feature flag for AMD's PSFD")
  3763bf58029f3459 ("x86/cpufeatures: Redefine synthetic virtual NMI bit as AMD's "real" vNMI")
  6449dcb0cac73821 ("x86: CPUID and CR3/CR4 flags for Linear Address Masking")
  be8de49bea505e77 ("x86/speculation: Identify processors vulnerable to SMT RSB predictions")
  e7862eda309ecfcc ("x86/cpu: Support AMD Automatic IBRS")
  faabfcb194a8d068 ("x86/cpu, kvm: Add the SMM_CTL MSR not present feature")
  5b909d4ae59aedc7 ("x86/cpu, kvm: Add the Null Selector Clears Base feature")
  84168ae786f8a15a ("x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf")
  a9dc9ec5a1fafc3d ("x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature")
  f8df91e73a6827a4 ("x86/cpufeatures: Add macros for Intel's new fast rep string features")
  78335aac6156eada ("x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag")
  f334f723a63cfc25 ("x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag")
  a018d2e3d4b1abc4 ("x86/cpufeatures: Add Architectural PerfMon Extension bit")

This causes these perf files to be rebuilt and brings some X86_FEATURE
that will be used when updating the copies of
tools/arch/x86/lib/mem{cpy,set}_64.S with the kernel sources:

  CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
  CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Kim Phillips <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Babu Moger <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Tom Lendacky <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

s390/iommu: get rid of S390_CCW_IOMMU and S390_AP_IOMMU

These don't do anything anymore, the only user of the symbol was
VFIO_CCW/AP which already "depends on VFIO" and VFIO itself selects
IOMMU_API.

When this was added VFIO was wrongly doing "depends on IOMMU_API" which
required some contortions like this to ensure IOMMU_API was turned on.

Reviewed-by: Eric Farman <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexander Gordeev <[email protected]>

s390/Kconfig: remove obsolete configs SCHED_{BOOK,DRAWER}

Commit f1045056c726 ("topology/sysfs: rework book and drawer topology
ifdefery") activates the book and drawer topology, previously activated by
CONFIG_SCHED_{BOOK,DRAWER}, dependent on the existence of certain macro
definitions. Hence, since then, CONFIG_SCHED_{BOOK,DRAWER} have no effect
and any further purpose.

Remove the obsolete configs SCHED_{BOOK,DRAWER}.

Signed-off-by: Lukas Bulwahn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexander Gordeev <[email protected]>

s390/uapi: cover statfs padding by growing f_spare

pahole says:

struct compat_statfs64 {
...
u32 f_spare[4]; /*    68    16 */
/* size: 88, cachelines: 1, members: 12 */
/* padding: 4 */

struct statfs {
...
unsigned int f_spare[4]; /*    68    16 */
/* size: 88, cachelines: 1, members: 12 */
/* padding: 4 */

struct statfs64 {
...
unsigned int f_spare[4]; /*    68    16 */
/* size: 88, cachelines: 1, members: 12 */
/* padding: 4 */

One has to keep the existence of padding in mind when working with
these structs. Grow f_spare arrays to 5 in order to simplify things.

Acked-by: Heiko Carstens <[email protected]>
Signed-off-by: Ilya Leoshkevich <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexander Gordeev <[email protected]>

statfs: enforce statfs[64] structure initialization

s390's struct statfs and struct statfs64 contain padding, which
field-by-field copying does not set. Initialize the respective structs
with zeros before filling them and copying them to userspace, like it's
already done for the compat versions of these structs.

Found by KMSAN.

[[email protected]: fixed typo in patch description]
Acked-by: Heiko Carstens <[email protected]>
Cc: [email protected] # v4.14+
Signed-off-by: Ilya Leoshkevich <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexander Gordeev <[email protected]>

s390/qdio: fix do_sqbs() inline assembly constraint

Use "a" constraint instead of "d" constraint to pass the state parameter to
the do_sqbs() inline assembly. This prevents that general purpose register
zero is used for the state parameter.

If the compiler would select general purpose register zero this would be
problematic for the used instruction in rsy format: the register used for
the state parameter is a base register. If the base register is general
purpose register zero the contents of the register are unexpectedly ignored
when the instruction is executed.

This only applies to z/VM guests using QIOASSIST with dedicated (pass through)
QDIO-based devices such as FCP [zfcp driver] as well as real OSA or
HiperSockets [qeth driver].

A possible symptom for this case using zfcp is the following repeating kernel
message pattern:

zfcp <devbusid>: A QDIO problem occurred
zfcp <devbusid>: A QDIO problem occurred
zfcp <devbusid>: qdio: ZFCP on SC <sc> using AI:1 QEBSM:1 PRI:1 TDD:1 SIGA: W
zfcp <devbusid>: A QDIO problem occurred
zfcp <devbusid>: A QDIO problem occurred

Each of the qdio problem message can be accompanied by the following entries
for the affected subchannel <sc> in
/sys/kernel/debug/s390dbf/qdio_error/hex_ascii for zfcp or qeth:

<sc> ccq: 69....
<sc> SQBS ERROR.

Reviewed-by: Benjamin Block <[email protected]>
Cc: Steffen Maier <[email protected]>
Fixes: 8129ee164267 ("[PATCH] s390: qdio V=V pass-through")
Cc: <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>

Merge tag 'thunderbolt-for-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-linus

Mika writes:

thunderbolt: Fix for v6.4-rc3

This includes a single fix that fixes an error when resuming from
hibernation if the driver is built into the kernel. This has been in
linux-next with no reported issues.

* tag 'thunderbolt-for-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Clear registers properly when auto clear isn't in use

netfilter: nft_set_rbtree: fix null deref on element insertion

There is no guarantee that rb_prev() will not return NULL in nft_rbtree_gc_elem():

general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
nft_add_set_elem+0x14b0/0x2990
nf_tables_newsetelem+0x528/0xb30

Furthermore, there is a possible use-after-free while iterating,
'node' can be free'd so we need to cache the next value to use.

Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
Signed-off-by: Florian Westphal <[email protected]>

netfilter: nf_tables: fix nft_trans type confusion

nft_trans_FOO objects all share a common nft_trans base structure, but
trailing fields depend on the real object size. Access is only safe after
trans->msg_type check.

Check for rule type first. Found by code inspection.

Fixes: 1a94e38d254b ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
Signed-off-by: Florian Westphal <[email protected]>

netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT

gcc with W=1 and ! CONFIG_NF_NAT
net/netfilter/nf_conntrack_netlink.c:3463:32: error:
  ‘exp_nat_nla_policy’ defined but not used [-Werror=unused-const-variable=]
3463 | static const struct nla_policy exp_nat_nla_policy[CTA_EXPECT_NAT_MAX+1] = {
      |                                ^~~~~~~~~~~~~~~~~~
net/netfilter/nf_conntrack_netlink.c:2979:33: error:
  ‘any_addr’ defined but not used [-Werror=unused-const-variable=]
2979 | static const union nf_inet_addr any_addr;
      |                                 ^~~~~~~~

These variables use is controlled by CONFIG_NF_NAT, so should their definitions.

Signed-off-by: Tom Rix <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>

net: wwan: t7xx: Ensure init is completed before system sleep

When the system attempts to sleep while mtk_t7xx is not ready, the driver
cannot put the device to sleep:
[   12.472918] mtk_t7xx 0000:57:00.0: [PM] Exiting suspend, modem in invalid state
[   12.472936] mtk_t7xx 0000:57:00.0: PM: pci_pm_suspend(): t7xx_pci_pm_suspend+0x0/0x20 [mtk_t7xx] returns -14
[   12.473678] mtk_t7xx 0000:57:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1b0 returns -14
[   12.473711] mtk_t7xx 0000:57:00.0: PM: failed to suspend async: error -14
[   12.764776] PM: Some devices failed to suspend, or early wake event detected

Mediatek confirmed the device can take a rather long time to complete
its initialization, so wait for up to 20 seconds until init is done.

Signed-off-by: Kai-Heng Feng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: selftests: Fix optstring

The cited commit added a stray colon to the 'v' option. That makes the
option work incorrectly.

ex:
tools/testing/selftests/net# ./fib_nexthops.sh -v
(should enable verbose mode, instead it shows help text due to missing arg)

Fixes: 5feba4727395 ("selftests: fib_nexthops: Make ping timeout configurable")
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: Benjamin Poirier <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: pcs: xpcs: fix C73 AN not getting enabled

The XPCS expects clause 73 (copper backplane) autoneg to follow the
ethtool autoneg bit. It actually did that until the blamed
commit inaptly replaced state->an_enabled (coming from ethtool) with
phylink_autoneg_inband() (coming from the device tree or struct
phylink_config), as part of an unrelated phylink_pcs API conversion.

Russell King suggests that state->an_enabled from the original code was
just a proxy for the ethtool Autoneg bit, and that the correct way of
restoring the functionality is to check for this bit in the advertising
mask.

Fixes: 11059740e616 ("net: pcs: xpcs: convert to phylink_pcs_ops")
Link: https://lore.kernel.org/netdev/[email protected]/
Suggested-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: wwan: iosm: fix NULL pointer dereference when removing device

In suspend and resume cycle, the removal and rescan of device ends
up in NULL pointer dereference.

During driver initialization, if the ipc_imem_wwan_channel_init()
fails to get the valid device capabilities it returns an error and
further no resource (wwan struct) will be allocated. Now in this
situation if driver removal procedure is initiated it would result
in NULL pointer exception since unallocated wwan struct is dereferenced
inside ipc_wwan_deinit().

ipc_imem_run_state_worker() to handle the called functions return value
and to release the resource in failure case. It also reports the link
down event in failure cases. The user space application can handle this
event to do a device reset for restoring the device communication.

Fixes: 3670970dd8c6 ("net: iosm: shared memory IPC interface")
Reported-by: Samuel Wein PhD <[email protected]>
Closes: https://lore.kernel.org/netdev/[email protected]/T/
Signed-off-by: M Chetan Kumar <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit()

syzbot triggered the following splat [1], sending an empty message
through pppoe_sendmsg().

When VLAN_FLAG_REORDER_HDR flag is set, vlan_dev_hard_header()
does not push extra bytes for the VLAN header, because vlan is offloaded.

Unfortunately vlan_dev_hard_start_xmit() first reads veth->h_vlan_proto
before testing (vlan->flags & VLAN_FLAG_REORDER_HDR).

We need to swap the two conditions.

[1]
BUG: KMSAN: uninit-value in vlan_dev_hard_start_xmit+0x171/0x7f0 net/8021q/vlan_dev.c:111
vlan_dev_hard_start_xmit+0x171/0x7f0 net/8021q/vlan_dev.c:111
__netdev_start_xmit include/linux/netdevice.h:4883 [inline]
netdev_start_xmit include/linux/netdevice.h:4897 [inline]
xmit_one net/core/dev.c:3580 [inline]
dev_hard_start_xmit+0x253/0xa20 net/core/dev.c:3596
__dev_queue_xmit+0x3c7f/0x5ac0 net/core/dev.c:4246
dev_queue_xmit include/linux/netdevice.h:3053 [inline]
pppoe_sendmsg+0xa93/0xb80 drivers/net/ppp/pppoe.c:900
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0xa24/0xe40 net/socket.c:2501
___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2555
__sys_sendmmsg+0x411/0xa50 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Uninit was created at:
slab_post_alloc_hook+0x12d/0xb60 mm/slab.h:774
slab_alloc_node mm/slub.c:3452 [inline]
kmem_cache_alloc_node+0x543/0xab0 mm/slub.c:3497
kmalloc_reserve+0x148/0x470 net/core/skbuff.c:520
__alloc_skb+0x3a7/0x850 net/core/skbuff.c:606
alloc_skb include/linux/skbuff.h:1277 [inline]
sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2583
pppoe_sendmsg+0x3af/0xb80 drivers/net/ppp/pppoe.c:867
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0xa24/0xe40 net/socket.c:2501
___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2555
__sys_sendmmsg+0x411/0xa50 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

CPU: 0 PID: 29770 Comm: syz-executor.0 Not tainted 6.3.0-rc6-syzkaller-gc478e5b17829 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tracing: fprobe: Initialize ret valiable to fix smatch error

The commit 39d954200bf6 ("fprobe: Skip exit_handler if entry_handler returns
!0") introduced a hidden dependency of 'ret' local variable in the
fprobe_handler(), Smatch warns the `ret` can be accessed without
initialization.

kernel/trace/fprobe.c:59 fprobe_handler()
error: uninitialized symbol 'ret'.

kernel/trace/fprobe.c
    49                 fpr->entry_ip = ip;
    50                 if (fp->entry_data_size)
    51                         entry_data = fpr->data;
    52         }
    53
    54         if (fp->entry_handler)
    55                 ret = fp->entry_handler(fp, ip, ftrace_get_regs(fregs), entry_data);

ret is only initialized if there is an ->entry_handler

    56
    57         /* If entry_handler returns !0, nmissed is not counted. */
    58         if (rh) {

rh is only true if there is an ->exit_handler.  Presumably if you have
and ->exit_handler that means you also have a ->entry_handler but Smatch
is not smart enough to figure it out.

--> 59                 if (ret)
                           ^^^
Warning here.

    60                         rethook_recycle(rh);
    61                 else
    62                         rethook_hook(rh, ftrace_get_regs(fregs), true);
    63         }
    64 out:
    65         ftrace_test_recursion_unlock(bit);
    66 }

Link: https://lore.kernel.org/all/168100731160.79534.374827110083836722.stgit@devnote2/
Reported-by: Dan Carpenter <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Fixes: 39d954200bf6 ("fprobe: Skip exit_handler if entry_handler returns !0")
Acked-by: Dan Carpenter <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

mailmap: add entries for Nikolay Aleksandrov

Turns out I missed a few patches due to use of old addresses by
senders. Add a mailmap entry with my old addresses.

Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

igb: fix bit_shift to be in [1..8] range

In igb_hash_mc_addr() the expression:
"mc_addr[4] >> 8 - bit_shift", right shifting "mc_addr[4]"
shift by more than 7 bits always yields zero, so hash becomes not so different.
Add initialization with bit_shift = 1 and add a loop condition to ensure
bit_shift will be always in [1..8] range.

Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Aleksandr Loktionov <[email protected]>
Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony nguyen says:

====================
Intel Wired LAN Driver Updates 2023-05-16

This series contains updates to ice and iavf drivers.

Ahmed adds setting of missed condition for statistics which caused
incorrect reporting of values for ice. For iavf, he removes a call to set
VLAN offloads during re-initialization which can cause incorrect values
to be set.

Dawid adds checks to ensure VF is ready to be reset before executing
commands that will require it to be reset on ice.
---
v2:
Patch 2
- Redo commit message
====================

Signed-off-by: David S. Miller <[email protected]>

net: dsa: mv88e6xxx: Fix mv88e6393x EPC write command offset

According to datasheet, the command opcode must be specified
into bits [14:12] of the Extended Port Control register (EPC).

Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
Signed-off-by: Marco Migliore <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

cassini: Fix a memory leak in the error handling path of cas_init_one()

cas_saturn_firmware_init() allocates some memory using vmalloc(). This
memory is freed in the .remove() function but not it the error handling
path of the probe.

Add the missing vfree() to avoid a memory leak, should an error occur.

Fixes: fcaa40669cd7 ("cassini: use request_firmware")
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: Pavan Chebbi <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tun: Fix memory leak for detached NAPI queue.

syzkaller reported [0] memory leaks of sk and skb related to the TUN
device with no repro, but we can reproduce it easily with:

  struct ifreq ifr = {}
  int fd_tun, fd_tmp;
  char buf[4] = {};

  fd_tun = openat(AT_FDCWD, "/dev/net/tun", O_WRONLY, 0);
  ifr.ifr_flags = IFF_TUN | IFF_NAPI | IFF_MULTI_QUEUE;
  ioctl(fd_tun, TUNSETIFF, &ifr);

  ifr.ifr_flags = IFF_DETACH_QUEUE;
  ioctl(fd_tun, TUNSETQUEUE, &ifr);

  fd_tmp = socket(AF_PACKET, SOCK_PACKET, 0);
  ifr.ifr_flags = IFF_UP;
  ioctl(fd_tmp, SIOCSIFFLAGS, &ifr);

  write(fd_tun, buf, sizeof(buf));
  close(fd_tun);

If we enable NAPI and multi-queue on a TUN device, we can put skb into
tfile->sk.sk_write_queue after the queue is detached.  We should prevent
it by checking tfile->detached before queuing skb.

Note this must be done under tfile->sk.sk_write_queue.lock because write()
and ioctl(IFF_DETACH_QUEUE) can run concurrently.  Otherwise, there would
be a small race window:

  write()                             ioctl(IFF_DETACH_QUEUE)
  `- tun_get_user                     `- __tun_detach
     |- if (tfile->detached)             |- tun_disable_queue
     |  `-> false                        |  `- tfile->detached = tun
     |                                   `- tun_queue_purge
     |- spin_lock_bh(&queue->lock)
     `- __skb_queue_tail(queue, skb)

Another solution is to call tun_queue_purge() when closing and
reattaching the detached queue, but it could paper over another
problems.  Also, we do the same kind of test for IFF_NAPI_FRAGS.

[0]:
unreferenced object 0xffff88801edbc800 (size 2048):
  comm "syz-executor.1", pid 33269, jiffies 4295743834 (age 18.756s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
  backtrace:
    [<000000008c16ea3d>] __do_kmalloc_node mm/slab_common.c:965 [inline]
    [<000000008c16ea3d>] __kmalloc+0x4a/0x130 mm/slab_common.c:979
    [<000000003addde56>] kmalloc include/linux/slab.h:563 [inline]
    [<000000003addde56>] sk_prot_alloc+0xef/0x1b0 net/core/sock.c:2035
    [<000000003e20621f>] sk_alloc+0x36/0x2f0 net/core/sock.c:2088
    [<0000000028e43843>] tun_chr_open+0x3d/0x190 drivers/net/tun.c:3438
    [<000000001b0f1f28>] misc_open+0x1a6/0x1f0 drivers/char/misc.c:165
    [<000000004376f706>] chrdev_open+0x111/0x300 fs/char_dev.c:414
    [<00000000614d379f>] do_dentry_open+0x2f9/0x750 fs/open.c:920
    [<000000008eb24774>] do_open fs/namei.c:3636 [inline]
    [<000000008eb24774>] path_openat+0x143f/0x1a30 fs/namei.c:3791
    [<00000000955077b5>] do_filp_open+0xce/0x1c0 fs/namei.c:3818
    [<00000000b78973b0>] do_sys_openat2+0xf0/0x260 fs/open.c:1356
    [<00000000057be699>] do_sys_open fs/open.c:1372 [inline]
    [<00000000057be699>] __do_sys_openat fs/open.c:1388 [inline]
    [<00000000057be699>] __se_sys_openat fs/open.c:1383 [inline]
    [<00000000057be699>] __x64_sys_openat+0x83/0xf0 fs/open.c:1383
    [<00000000a7d2182d>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<00000000a7d2182d>] do_syscall_64+0x3c/0x90 arch/x86/entry/common.c:80
    [<000000004cc4e8c4>] entry_SYSCALL_64_after_hwframe+0x72/0xdc

unreferenced object 0xffff88802f671700 (size 240):
  comm "syz-executor.1", pid 33269, jiffies 4295743854 (age 18.736s)
  hex dump (first 32 bytes):
    68 c9 db 1e 80 88 ff ff 68 c9 db 1e 80 88 ff ff  h.......h.......
    00 c0 7b 2f 80 88 ff ff 00 c8 db 1e 80 88 ff ff  ..{/............
  backtrace:
    [<00000000e9d9fdb6>] __alloc_skb+0x223/0x250 net/core/skbuff.c:644
    [<000000002c3e4e0b>] alloc_skb include/linux/skbuff.h:1288 [inline]
    [<000000002c3e4e0b>] alloc_skb_with_frags+0x6f/0x350 net/core/skbuff.c:6378
    [<00000000825f98d7>] sock_alloc_send_pskb+0x3ac/0x3e0 net/core/sock.c:2729
    [<00000000e9eb3df3>] tun_alloc_skb drivers/net/tun.c:1529 [inline]
    [<00000000e9eb3df3>] tun_get_user+0x5e1/0x1f90 drivers/net/tun.c:1841
    [<0000000053096912>] tun_chr_write_iter+0xac/0x120 drivers/net/tun.c:2035
    [<00000000b9282ae0>] call_write_iter include/linux/fs.h:1868 [inline]
    [<00000000b9282ae0>] new_sync_write fs/read_write.c:491 [inline]
    [<00000000b9282ae0>] vfs_write+0x40f/0x530 fs/read_write.c:584
    [<00000000524566e4>] ksys_write+0xa1/0x170 fs/read_write.c:637
    [<00000000a7d2182d>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<00000000a7d2182d>] do_syscall_64+0x3c/0x90 arch/x86/entry/common.c:80
    [<000000004cc4e8c4>] entry_SYSCALL_64_after_hwframe+0x72/0xdc

Fixes: cde8b15f1aab ("tuntap: add ioctl to attach or detach a file form tuntap device")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge patch series "can: kvaser_pciefd: Bug fixes"

Jimmy Assarsson <[email protected]> says:

This patch series contains various bug fixes for the kvaser_pciefd
driver.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: kvaser_pciefd: Disable interrupts in probe error path

Disable interrupts in error path of probe function.

Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
Cc: [email protected]
Signed-off-by: Jimmy Assarsson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>

can: kvaser_pciefd: Do not send EFLUSH command on TFD interrupt

Under certain circumstances we send two EFLUSH commands, resulting in two
EFLUSH ack packets, while only expecting a single EFLUSH ack.
This can cause the driver Tx flush completion to get out of sync.

To avoid this problem, don't enable the "Transmit buffer flush done" (TFD)
interrupt and remove the code handling it.
Now we only send EFLUSH command after receiving status packet with
"Init detected" (IDET) bit set.

Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices")
Cc: [email protected]
Signed-off-by: Jimmy Assarsson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>